Hash table: Difference between revisions

Revision as of 03:15, 16 January 2024 edit Félix An (talk \| contribs) Extended confirmed users, IP block exemptions 2,796 edits →Implementations: dictionary .net Tag: Visual edit ← Previous edit		Latest revision as of 15:26, 12 November 2024 edit undo 98.7.197.219 (talk) →History
(40 intermediate revisions by 24 users not shown)
Line 34:
[[File:Hash table 3 1 1 0 1 0 0 SP.svg\|thumb\|315px\|right\|A small phone book as a hash table]]
In [[computing]], a '''hash table''' is a [[data structure]] that implements an [[associative array]], also called a '''dictionary''' or simply '''map'''; an associative array is an [[abstract data type]] that maps [[Unique key\|keys]] to [[Value (computer science)\|values]].<ref name="ms">{{cite book \|doi=10.1007/978-3-540-77978-0_4 \|chapter=Hash Tables and Associative Arrays \|title=Algorithms and Data Structures \|first1=Kurt \|last1=Mehlhorn \|author1-link=Kurt Mehlhorn \|first2=Peter \|last2=Sanders \|author2-link=Peter Sanders (computer scientist) \|publisher=Springer \|date=2008 \|pages=81–98 \|isbn=978-3-540-77977-3 \|chapter-url=https://people.mpi-inf.mpg.de/~mehlhorn/ftp/Toolbox/HashTables.pdf}}</ref> A hash table uses a [[hash function]] to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a '''hash map'''.

Most hash table designs employ an [[Perfect hash function\|imperfect hash function]]. [[Hash collision\|Hash collisions]], where the hash function generates the same index for more than one key, must therefore typically be accommodated in some way.

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table.
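The lookup path described above (hash the key, reduce the hash code to a bucket index, then search that bucket) can be sketched as follows. This is only an illustrative sketch: the class name <code>ChainedHashMap</code>, its method names, and the choice of per-bucket lists to absorb collisions are assumptions made for the example, not part of the article.

```python
class ChainedHashMap:
    """Toy hash map: an array of buckets, each holding (key, value) pairs.

    Hypothetical illustration; collisions are absorbed by per-bucket lists.
    """

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Hash the key, then reduce the hash code to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # keys are unique: replace the value
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket selected by the hash is searched.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


phone_book = ChainedHashMap()
phone_book.put("John Smith", "521-1234")
phone_book.put("Lisa Smith", "521-8976")
print(phone_book.get("Lisa Smith"))
```

With few buckets, several keys necessarily share a bucket, which is the collision case that the per-bucket list handles.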
Many hash table designs also allow arbitrary insertions and deletions of [[name–value pair\|key–value pairs]], at [[amortized analysis\|amortized]] constant average cost per operation.<ref name="leiser">{{cite web \|first=Charles E. \|last=Leiserson \|author-link=Charles E. Leiserson \|url=http://videolectures.net/mit6046jf05_leiserson_lec13/ \|title=Lecture 13: Amortized Algorithms, Table Doubling, Potential Method \|archive-url=https://web.archive.org/web/20090807022046/http://videolectures.net/mit6046jf05_leiserson_lec13/ \|archive-date=August 7, 2009 \|work=course MIT 6.046J/18.410J Introduction to Algorithms \|date=Fall 2005 \|url-status=live}}</ref><ref name="knuth">{{cite book \| first=Donald \|last=Knuth \|author1-link=Donald Knuth \| title = The Art of Computer Programming \| volume = 3: ''Sorting and Searching'' \| edition = 2nd \| publisher = Addison-Wesley \| year = 1998 \| isbn = 978-0-201-89685-5 \| pages = 513–558 }}</ref><ref name="cormen">{{cite book \|last1=Cormen \|first1=Thomas H. \|author1-link=Thomas H. Cormen \|last2=Leiserson \|first2=Charles E. \|author2-link=Charles E. Leiserson \|last3=Rivest \|first3=Ronald L. \|author3-link=Ronald L. Rivest \|last4=Stein \|first4=Clifford \|author4-link=Clifford Stein \| title = Introduction to Algorithms \| publisher = MIT Press and McGraw-Hill \| year= 2001 \| isbn = 978-0-262-53196-2 \| edition = 2nd \| pages=[https://archive.org/details/introductiontoal00corm_691/page/n243 221]–252 \| chapter = Chapter 11: Hash Tables \|title-link=Introduction to Algorithms }}</ref> Line 45: ==History== The idea of hashing arose independently in different places. In January 1953, [[Hans Peter Luhn]] wrote an internal [[IBM]] memorandum that used hashing with chaining. The first example of [[open addressing]] was proposed by A. D. 
Linh, building on Luhn's memorandum.<ref name="knuth"/>{{rp\|p=547}} Around the same time, [[Gene Amdahl]], [[Elaine M. McGraw]], [[Nathaniel Rochester (computer scientist)\|Nathaniel Rochester]], and [[Arthur Samuel (computer scientist)\|Arthur Samuel]] of [[IBM Research]] implemented hashing for the [[IBM 701]] [[Assembly language#Assembler\|assembler]].{{r\|Konheim\|p=124}} Open addressing with linear probing is credited to Amdahl, although [[Andrey Ershov]] independently had the same idea.<ref name="Konheim">{{cite book \|doi=10.1002/9780470630617 \|title=Hashing in Computer Science \|date=2010 \|last1=Konheim \|first1=Alan G. \|isbn=978-0-470-34473-6 }}</ref>{{rp\|pp=124–125}} The term "open addressing" was coined by [[W. Wesley Peterson]] in an article that discusses the problem of search in large files.<ref name="hashhist">{{cite book \|doi=10.1201/9781420035179 \|title=Handbook of Data Structures and Applications \|date=2004 \|isbn=978-0-429-14701-2 \|editor-last1=Mehta \|editor-last2=Mehta \|editor-last3=Sahni \|editor-first1=Dinesh P. 
\|editor-first2=Dinesh P. \|editor-first3=Sartaj }}</ref>{{rp\|p=15}} The first [[Academic publishing\|published]] work on hashing with chaining is credited to [[Arnold Dumey]], who discussed the idea of using the remainder modulo a prime as a hash function.{{r\|hashhist\|p=15}} The word "hashing" was first published in an article by Robert Morris.{{r\|Konheim\|p=126}} A [[Analysis of algorithms\|theoretical analysis]] of linear probing was submitted originally by Konheim and Weiss.{{r\|hashhist\|p=15}}

== Overview ==
An [[associative array]] stores a [[Set (abstract data type)\|set]] of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of [[unique key]]s. In the hash table implementation of associative arrays, an array <math>A</math> of length <math>m</math> is partially filled with <math>n</math> elements, where <math>m \ge n</math>. A value <math>x</math> gets stored at an index location <math>A[h(x)]</math>, where <math>h</math> is a hash function, and <math>h(x) < m</math>.{{r\|hashhist\|p=2}} Under reasonable assumptions, hash tables have better [[time complexity]] bounds on search, delete, and insert operations in comparison to [[self-balancing binary search tree]]s.{{r\|hashhist\|p=1}}

Hash tables are also commonly used to implement sets, by omitting the stored value for each key and merely tracking whether the key is present.{{r\|hashhist\|p=1}}

Line 63:
The performance of the hash table deteriorates as the load factor <math>\alpha</math> increases.{{r\|hashhist\|p=2}}

The software typically ensures that the load factor <math>\alpha</math> remains below a certain constant, <math>\alpha_{\max}</math>. This helps maintain good performance. Therefore, a common approach is to resize or "rehash" the hash table whenever the load factor <math>\alpha</math> reaches <math>\alpha_{\max}</math>.
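The resize-on-threshold policy can be sketched as below. The sketch assumes the usual definition of the load factor as <math>\alpha = n/m</math> (stored entries divided by number of buckets), a growth factor of two, and a threshold of 0.75; the class <code>ResizingTable</code> and all of its names are hypothetical and chosen only for illustration.

```python
class ResizingTable:
    """Chained table that doubles its bucket array when alpha reaches alpha_max.

    Illustrative sketch: alpha is taken to be n / m, and doubling is an
    assumed growth policy, not the only possible one.
    """

    def __init__(self, capacity=8, max_load=0.75):
        self.slots = [[] for _ in range(capacity)]
        self.count = 0            # n: number of stored entries
        self.max_load = max_load  # alpha_max

    def load_factor(self):
        return self.count / len(self.slots)  # alpha = n / m

    def insert(self, key, value):
        bucket = self.slots[hash(key) % len(self.slots)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        # Rehash as soon as the load factor reaches the threshold.
        if self.load_factor() >= self.max_load:
            self._rehash(2 * len(self.slots))

    def _rehash(self, new_capacity):
        old_slots = self.slots
        self.slots = [[] for _ in range(new_capacity)]
        # Every entry must be re-indexed, because the index depends on m.
        for bucket in old_slots:
            for key, value in bucket:
                self.slots[hash(key) % new_capacity].append((key, value))


table = ResizingTable(capacity=4)
for k in range(10):
    table.insert(k, str(k))
print(len(table.slots), table.load_factor())
```

Because each rehash touches every stored entry, an individual insertion can be expensive even though the cost averaged over many insertions stays constant.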
Similarly, the table may also be resized if the load factor drops below <math>\alpha_{\max}/4</math>.<ref name="cornell08">{{cite web\|url=https://www.cs.cornell.edu/courses/cs312/2008sp/lectures/lec20.html\|title=CS 312: Hash tables and amortized analysis\|publisher=[[Cornell University]], Department of Computer Science\|first=Andrew\|last=Mayers\|access-date=26 October 2021\|year=2008\|archive-url=https://web.archive.org/web/20210426052033/http://www.cs.cornell.edu/courses/cs312/2008sp/lectures/lec20.html\|archive-date=26 April 2021\|url-status=live\|via=cs.cornell.edu}}</ref>

==== Load factor for separate chaining ====
Line 88 ⟶ 84:
Therefore a hash table that uses open addressing ''must'' be resized or ''rehashed'' if the load factor <math>\alpha</math> approaches 1.<ref name="cornell08" /> With open addressing, acceptable figures of max load factor <math>\alpha_{\max}</math> should range around 0.6 to 0.75.<ref>{{cite journal 
\|last1=Maurer \|first1=W. D. \|last2=Lewis \|first2=T. G. \|title=Hash Table Methods \|journal=ACM Computing Surveys \|date=1 March 1975 \|volume=7 \|issue=1 \|pages=5–19 \|doi=10.1145/356643.356645 \|s2cid=17874775 }}</ref>{{r\|owo03\|p=110}}

==Hash function==
A [[hash function]] <math>h : U \rightarrow \{0, ..., m-1\}</math> maps the universe <math>U</math> of keys to indices or slots within the table, that is, <math>h(x) \in \{0, ..., m-1\}</math> for <math>x \in U</math>. The conventional implementations of hash functions are based on the ''integer universe assumption'' that all elements of the table stem from the universe <math>U = \{0, ..., u - 1\}</math>, where the [[bit length]] of <math>u</math> is confined within the [[Word (computer architecture)\|word size]] of a [[computer architecture]].{{r\|hashhist\|p=2}}

A hash function <math>h</math> is said to be [[perfect hash function\|perfect]] for a given set <math>S</math> if it is [[injective function\|injective]] on <math>S</math>, that is, if each element <math>x \in S</math> maps to a different value in <math>\{0, ..., m-1\}</math>.<ref name="Yi06">{{cite conference \| last1 = Lu \| first1 = Yi \| last2 = Prabhakar \| first2 = Balaji \| last3 = Bonomi \| first3 = Flavio \| doi = 10.1109/ISIT.2006.261567 \| conference = 2006 IEEE International Symposium on Information Theory \| pages = 2774–2778 \| title = Perfect Hashing for Network Applications \| year = 2006\| isbn = 1-4244-0505-X \| s2cid = 1494710 }}</ref><ref name="CHD">{{cite 
conference \| last1 = Belazzougui \| first1 = Djamal \| last2 = Botelho \| first2 = Fabiano C. \| last3 = Dietzfelbinger \| first3 = Martin \| title = Hash, displace, and compress \| url = http://cmph.sourceforge.net/papers/esa09.pdf \| doi = 10.1007/978-3-642-04128-0_61 \| location = Berlin \| mr = 2557794 \| pages = 682–693 \| publisher = Springer \| series = [[Lecture Notes in Computer Science]] \| book-title = Algorithms - ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7–9, 2009, Proceedings \| volume = 5757 \| year = 2009\| citeseerx = 10.1.1.568.130}}</ref> A perfect hash function can be created if all the keys are known ahead of time.<ref name="Yi06" />

=== Integer universe assumption ===
The hashing schemes used under the ''integer universe assumption'' include hashing by division, hashing by multiplication, [[universal hashing]], [[dynamic perfect hashing]], and [[Static Hashing\|static perfect hashing]].{{r\|hashhist\|p=2}} However, hashing by division is the scheme most commonly used.{{r\|cormenalgo01\|p=264}}<ref name="owo03">{{cite journal \|last1=Owolabi \|first1=Olumide \|title=Empirical studies of some hashing functions \|journal=Information and Software Technology \|date=February 2003 \|volume=45 \|issue=2 \|pages=109–112 \|doi=10.1016/S0950-5849(02)00174-X }}</ref>{{rp\|p=110}}

==== Hashing by division ====
The scheme in hashing by division is as follows:{{r\|hashhist\|p=2}}
<math display="block">h(x)\ =\ x\, \bmod\, m</math>
where <math>h(x)</math> is the hash value of <math>x \in S</math> and <math>m</math> is the size of the table.
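As a small worked sketch of the division scheme, reading it as reducing an integer key <math>x</math> modulo the table size <math>m</math>: the function name <code>hash_by_division</code>, the sample keys, and the table size 7 are all assumptions chosen for illustration.

```python
def hash_by_division(x: int, m: int) -> int:
    """Map an integer key x to a slot in {0, ..., m-1} by taking x mod m."""
    if m <= 0:
        raise ValueError("table size m must be positive")
    return x % m


# Hypothetical keys hashed into a table of size m = 7.
m = 7
for key in (10, 22, 31, 4, 15):
    print(key, "->", hash_by_division(key, m))
```

In this sample, keys 10 and 31 both land in slot 3, and keys 22 and 15 both land in slot 1, so even a tiny example shows why collisions have to be handled.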
==== Hashing by multiplication ====
The scheme in hashing by multiplication is as follows:{{r\|hashhist\|pp=2–3}}
<math display="block">h(x) = \lfloor m \bigl((xA) \bmod 1\bigr) \rfloor</math>
where <math>A</math> is a non-integer [[Real number\|real-valued constant]] and <math>m</math> is the size of the table. An advantage of hashing by multiplication is that the choice of <math>m</math> is not critical.{{r\|hashhist\|pp=2–3}} Although any value <math>A</math> produces a hash function, [[Donald Knuth]] suggests using the [[golden ratio]].{{r\|hashhist\|p=3}}

===Choosing a hash function===
Line 115 ⟶ 111:
The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a [[power of two]]. Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have the size be a [[prime number]].<ref name=":0">{{Cite web\|title = Prime Double Hash Table\|url = https://www.concentric.net/~Ttwang/tech/primehash.htm\|date = March 1997\|access-date = 2015-05-10\|last = Wang\|first = Thomas\|archive-url = https://web.archive.org/web/19990903133921/http://www.concentric.net/~Ttwang/tech/primehash.htm\|archive-date = 1999-09-03\|url-status=dead}}</ref>

For [[open addressing]] schemes, the hash function should also avoid ''[[Primary clustering\|clustering]]'', the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. 
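The hashing-by-multiplication scheme given earlier in this section can be sketched in floating point as follows. The constant is taken here to be <math>(\sqrt{5}-1)/2</math>, which has the same fractional part as the golden ratio; that choice, the function name <code>hash_by_multiplication</code>, and the table size 8 are assumptions made for the example.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, i.e. the fractional part of the golden ratio.
A = (math.sqrt(5) - 1) / 2


def hash_by_multiplication(x: int, m: int, a: float = A) -> int:
    """Compute floor(m * ((x * a) mod 1)) for a non-negative integer key x."""
    fractional = (x * a) % 1.0          # keep only the fractional part of x * a
    index = math.floor(m * fractional)  # scale the fraction up to the table size
    return min(index, m - 1)            # defensive clamp against rounding


for key in range(4):
    print(key, "->", hash_by_multiplication(key, 8))
```

This floating-point form is only suitable for small keys, since precision is lost once <code>x * a</code> becomes large; practical implementations usually work with fixed-width integer arithmetic instead.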
The popular multiplicative hash is claimed to have particularly poor clustering behavior.<ref name=":0" /><ref name="knuth"/> [[K-independent hashing]] offers a way to prove a certain hash function does not have bad keysets for a given type of hashtable. A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing. Since K-independence can prove a hash function works, one can then focus on finding the fastest possible such hash function.<ref>{{cite journal \| last1 = Wegman \| first1 = Mark N. \| ~~author1-link = Mark N. Wegman \|~~ last2 = Carter \| first2 = J. Lawrence \| title = New hash functions and their use in authentication and set equality \| journal = Journal of Computer and System Sciences \|date=June 1981 \|volume = 22 \| issue = 3 \| pages = 265–279 \| ~~year = 1981 \|~~ doi = 10.1016/0022-0000(81)90033-7 \| ~~id = Conference version in FOCS'79 \| url = http://www.fi.muni.cz/~xbouda1/teaching/2009/IV111/Wegman_Carter_1981_New_hash_functions.pdf \| accessdate = 9 February 2011 \|~~ doi-access = free }}</ref> ==Collision resolution== Line 149 ⟶ 145: ==== Caching and locality of reference ==== The linked list of separate chaining implementation may not be [[Cache-oblivious algorithm\|cache-conscious]] due to [[spatial locality]]—[[locality of reference]]—when the nodes of the linked list are scattered across memory, thus the list traversal during insert and search may entail [[CPU cache]] inefficiencies.<ref name="nick05">{{cite ~~conference~~book \|~~conference~~doi=10.1007/11575832_1 ~~International~~\|chapter=Enhanced ~~Symposium~~Byte onCodes with Restricted Prefix Properties \|title=String Processing and Information Retrieval \|~~title~~series=~~Cache-Conscious~~Lecture ~~Collision Resolution~~Notes in ~~String~~Computer ~~Hash~~Science ~~Tables~~\| ~~first1~~date=~~Nikolas~~2005 \|last1=~~Askitis~~Culpepper \|~~first2~~first1=~~Justin~~J. 
Shane \|last2=~~Zobel~~Moffat \|~~url~~first2=~~https://link.springer.com/chapter/10.1007/11575832_11~~Alistair \|volume=3772 ~~doi=10.1007/11575832_1~~\|~~publisher~~pages=~~[[Springer Science+Business~~1–12 ~~Media]]~~\|isbn= 978-3-540-29740-6~~\|year=2005\|pages=91–102~~ }}</ref>{{rp\|p=91}} In [[cache-oblivious algorithm\|cache-conscious variants]] of collision resolution through separate chaining, a [[dynamic array]] found to be more [[CPU cache\|cache-friendly]] is used in the place where a linked list or self-balancing binary search trees is usually deployed, since the [[Memory management (operating systems)#Single contiguous allocation\|contiguous allocation]] pattern of the array could be exploited by [[Cache prefetching\|hardware-cache prefetchers]]—such as [[translation lookaside buffer]]—resulting in reduced access time and memory consumption.<ref>{{~~Cite~~cite journal \|last1=Askitis \|first1=Nikolas \|last2=Sinha \|first2=Ranjan \|title=Engineering scalable, cache and space efficient tries for strings\| ~~first1=Nikolas~~\| ~~last1~~journal=~~Askitis\|~~The VLDB Journal ~~first2=Ranjan~~\| ~~last2~~date=~~Sinha\|~~October ~~year=~~2010 \|volume=19 ~~issn~~\|issue=~~1066-8888~~5 \|pages=633–660 \|doi=10.1007/s00778-010-0183-9\| ~~journal=The VLDB Journal\| volume=17\| issue=5\| s2cid=432572\|page=634~~}}</ref><ref>{{Cite conference \| title=Cache-conscious Collision Resolution in String Hash Tables \| first1=Nikolas \| last1=Askitis \| first2=Justin \| last2=Zobel \|date=October 2005 \| isbn=978-3-540-29740-6 \| pages=91–102 \| book-title=Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005) \| doi=10.1007/11575832_11 \| volume=3772/2005}}</ref><ref>{{Cite conference \|title = Fast and Compact Hash Tables for Integer Keys \|first1 = Nikolas \|last1 = Askitis \|year = 2009 \|isbn = 978-1-920682-72-9 \|url = http://crpit.com/confpapers/CRPITV91Askitis.pdf 
\|pages = 113–122 \|book-title = Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009) \|volume = 91 \|url-status = dead \|archive-url = https://web.archive.org/web/20110216180225/http://crpit.com/confpapers/CRPITV91Askitis.pdf \|archive-date = February 16, 2011 \|df = mdy-all \|access-date = June 13, 2010 }}</ref> ===Open addressing=== Line 184 ⟶ 180: {{main\| Hopscotch hashing}} [[Hopscotch hashing]] is an open addressing based algorithm which combines the elements of [[cuckoo hashing]], [[linear probing]] and chaining through the notion of a ''neighbourhood'' of buckets—the subsequent buckets around any given occupied bucket, also called a "virtual" bucket.<ref name="nir08">{{cite ~~conference~~book \|doi=10.1007/978-3-540-87779-0_24 \|~~isbn~~chapter=Hopscotch ~~978-3-540-87778-3~~Hashing \|~~publisher~~title=~~[[Springer~~Distributed Computing ~~Publishing]]~~\|~~conference~~series=Lecture ~~International~~Notes ~~Symposium~~in onComputer ~~Distributed Computing~~Science \|~~year~~date=2008 \|last1=Herlihy \|first1=Maurice \|last2=Shavit \|first2=Nir \|last3=Tzafrir \|first3=Moran~~\|title=Hopscotch~~ ~~Hashing~~\|volume=5218~~\|via=Springer~~ ~~Link\|series= Distributed Computing~~\|pages= 350–364 \|~~url~~isbn=~~https://link.springer.com/chapter/10.1007/~~978-3-540-~~87779~~87778-~~0_24\|location=Berlin,~~3 ~~Heidelberg~~}}</ref>{{rp\|pp=351–352}} The algorithm is designed to deliver better performance when the load factor of the hash table grows beyond 90%; it also provides high throughput in [[Concurrent computing\|concurrent settings]], thus well suited for implementing resizable [[concurrent hash table]].{{r\|nir08\|p=350}} The neighbourhood characteristic of hopscotch hashing guarantees a property that, the cost of finding the desired item from any given buckets within the neighbourhood is very close to the cost of finding it in the bucket itself; the algorithm 
attempts to bring an item into its neighbourhood, with a possible cost involved in displacing other items.{{r\|nir08\|p=352}}

Each bucket within the hash table includes an additional "hop-information", an ''H''-bit [[bit array]] for indicating the [[Euclidean distance#One dimension\|relative distance]] of the item which was originally hashed into the current virtual bucket within ''H''-1 entries.{{r\|nir08\|p=352}} Let <math>k</math> and <math>Bk</math> be the key to be inserted and the bucket to which the key is hashed, respectively; several cases are involved in the insertion procedure such that the neighbourhood property of the algorithm is preserved:{{r\|nir08\|pp=352–353}} if <math>Bk</math> is empty, the element is inserted, and the leftmost bit of the bitmap is [[Bitwise operation\|set]] to 1; if not empty, linear probing is used for finding an empty slot in the table, and the bitmap of the bucket is updated, followed by the insertion; if the empty slot is not within the range of the ''neighbourhood'', i.e. ''H''-1, subsequent swap and hop-info bit array manipulation of each bucket is performed in accordance with its neighbourhood [[Invariant (mathematics)\|invariant properties]].{{r\|nir08\|p=353}}

=====Robin Hood hashing=====
Robin Hood hashing is an open addressing based collision resolution algorithm; the collisions are resolved through favouring the displacement of the element that is farthest (that is, has the longest ''probe sequence length'', PSL) from its "home location", i.e. the bucket to which the item was hashed.<ref name="waterloo86">{{cite book\|title=Robin Hood Hashing\|first=Pedro\|last=Celis\|publisher=[[University of Waterloo]], Dept. 
of Computer Science\|year=1986\|url=https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf \|location=Ontario, Canada\|isbn=978-0-315-29700-5 \|oclc= 14083698\|archive-url=https://web.archive.org/web/20211101071032/https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf\|archive-date=1 November 2021\|access-date=2 November 2021\|url-status=live}}</ref>{{rp\|p=12}} Although Robin Hood hashing does not change the [[Computational complexity theory\|theoretical search cost]], it significantly affects the [[variance]] of the [[Probability distribution\|distribution]] of the items on the buckets,<ref>{{cite journal \|last1=Poblete \|first1=P. V. \|last2=Viola \|first2=A. \|title=Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions \|journal=Combinatorics, Probability and Computing \|date=July 2019 \|volume=28 \|issue=4 \|pages=600–617 \|doi=10.1017/S0963548318000408 \|s2cid=125374363 }}</ref>{{rp\|p=2}} i.e. 
dealing with [[Cluster analysis\|cluster]] formation in the hash table.<ref name="cornell14">{{cite web\|url=https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html\|title= Lecture 13: Hash tables\|publisher=[[Cornell University]], Department of Computer Science\|first=Michael\|last=Clarkson\|access-date=1 November 2021\|year=2014\|archive-url=https://web.archive.org/web/20211007011300/https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html\|archive-date=7 October 2021\|url-status=live\|via=cs.cornell.edu}}</ref> Each node within the hash table that uses Robin Hood hashing should be augmented to store an extra PSL value.<ref>{{cite web\|publisher=[[Cornell University]], Department of Computer Science\|url=https://www.cs.cornell.edu/courses/JavaAndDS/files/hashing_RobinHood.pdf\|title=JavaHyperText and Data Structure: Robin Hood Hashing\|access-date=2 November 2021\|first=David\|last=Gries\|year=2017\|archive-url=https://web.archive.org/web/20210426051503/http://www.cs.cornell.edu/courses/JavaAndDS/files/hashing_RobinHood.pdf\|archive-date=26 April 2021\|url-status=live\|via=cs.cornell.edu}}</ref> Let <math>x</math> be the key to be inserted, <math>x.psl</math> be the (incremental) PSL length of <math>x</math>, <math>T</math> be the hash table and <math>j</math> be the index, the insertion procedure is as follows:{{r\|waterloo86\|pp=12-13}}<ref name="indiana88">{{cite tech report\|first=Pedro\|last=Celis\|date=28 March 1988\| number=246\|institution=[[Indiana University]], Department of Computer Science\|location=Bloomington, Indiana\| url=https://legacy.cs.indiana.edu/ftp/techreports/TR246.pdf\|archive-url=https://web.archive.org/web/20211103013505/https://legacy.cs.indiana.edu/ftp/techreports/TR246.pdf\|archive-date=3 November 
2021\|access-date=2 November 2021\|url-status=live\| title=External Robin Hood Hashing}}</ref>{{rp\|p=5}}
* If <math>x.psl\ \le\ T[j].psl</math>: the iteration goes into the next bucket without attempting an external probe.
* If <math>x.psl\ >\ T[j].psl</math>: insert the item <math>x</math> into the bucket <math>j</math>; swap <math>x</math> with <math>T[j]</math>, calling the displaced item <math>x'</math>; continue the probe from the <math>j+1</math>st bucket to insert <math>x'</math>; repeat the procedure until every element is inserted.

Line 197 ⟶ 193:
===Resizing by moving all entries===
Generally, a new hash table with a size double that of the original hash table gets [[dynamic memory allocation\|allocated]] privately and every item in the original hash table gets moved to the newly allocated one by computing the hash values of the items followed by the insertion operation. Rehashing is simple, but computationally expensive.<ref>{{cite book \|last1=Thareja \|first1=Reema \|title=Data Structures Using C \|date=2014 \|publisher=[[Oxford University Press]] \|isbn=978-0-19-809930-7 \|chapter=Hashing and Collision \|pages=464–488 }}</ref>{{rp\|pp=478–479}}

===Alternatives to all-at-once rehashing===
Line 222 ⟶ 218:
===Caches===
{{Main\|Cache (computing) }}
Hash tables can be used to implement [[cache (computing)\|caches]], auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media.
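A hash-table-backed cache of this kind might be sketched as follows, assuming the simplest possible policy: each key maps to exactly one slot, and a newly stored entry overwrites whatever that slot already holds. The class <code>DirectMappedCache</code> and its method names are hypothetical.

```python
class DirectMappedCache:
    """Fixed-size cache with one entry per slot; a colliding insert evicts the old entry.

    Illustrative sketch only; real caches often add replacement policies,
    expiry, or several entries per slot.
    """

    def __init__(self, num_slots=64):
        self.slots = [None] * num_slots

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        # No probing or chaining: the new entry replaces the slot's occupant.
        self.slots[self._index(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]  # cache hit
        return None          # cache miss: caller falls back to the slow store


cache = DirectMappedCache(num_slots=4)
cache.put(1, "one")
cache.put(5, "five")  # 5 % 4 == 1 % 4, so this evicts the entry for key 1
print(cache.get(1), cache.get(5))
```

Losing an entry is acceptable here because the cache is only an accelerator; the authoritative copy of the data remains in the slower medium.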
In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.<ref>{{~~Cite~~cite journal \|last1=Zhong \|first1=Liang \|last2=Zheng \|first2=Xueqian \|last3=Liu \|first3=Yong \|last4=Wang \|first4=Mengting \|last5=Cao \|first5=Yang~~\|date=February~~ ~~2020~~\|title=Cache hit ratio maximization in device-to-device communications overlaying cellular networks~~\|url=http://dx.doi.org/10.23919/jcc.2020.02.018~~ \|journal=China Communications \|date=February 2020 \|volume=17 \|issue=2 \|pages=232–238 \|doi=10.23919/jcc.2020.02.018 \|s2cid=212649328~~\|issn=1673-5447~~ }}</ref><ref>{{cite web\|url=https://www.linuxjournal.com/article/7105\|publisher=[[Linux Journal]]\|access-date=16 April 2022\|date=1 January 2004\|title=Understanding Caching\|first=James\|last=Bottommley\|url-status=live\|archive-url=https://web.archive.org/web/20201204195114/https://www.linuxjournal.com/article/7105\|archive-date=4 December 2020}}</ref> ===Sets=== Line 233 ⟶ 229: ==Implementations== Many programming languages provide hash table functionality, either as built-in associative arrays or as [[standard library]] modules. In [[JavaScript]], an "object" is a mutable collection of key-value pairs (called "properties"), where each key is either a string or a guaranteed-unique "symbol"; any other value, when used as a key, is first [[Type conversion\|coerced]] to a string. 
Aside from the seven "primitive" data types, every value in JavaScript is an object.<ref>{{cite web \|title=JavaScript data types and data structures - JavaScript {{!}} MDN \|url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#objects \|website=developer.mozilla.org \|access-date=24 July 2022}}</ref> ECMAScript 2015 also added the <code>Map</code> data structure, which accepts arbitrary values as keys.<ref>{{Cite web \|date=2023-06-20 \|title=Map - JavaScript {{!}} MDN \|url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map \|access-date=2023-07-15 \|website=developer.mozilla.org \|language=en-US}}</ref> [[C++11]] includes <code>[[unordered map (C++)\|unordered_map]]</code> in its standard library for storing keys and values of [[~~Template_~~Template (C~~%2B%2B~~++)\|arbitrary types]].<ref>{{cite web\|url=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf\|title=Programming language C++ - Technical Specification\|access-date=8 February 2022\|publisher=[[International Organization for Standardization]]\|archive-url=https://web.archive.org/web/20220121061142/http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3690.pdf\|archive-date=21 January 2022\|pages=812–813}}</ref> [[~~Go_~~Go (~~programming_language~~programming language)\|Go]]'s built-in <code>map</code> implements a hash table in the form of a [[~~Primitive_data_type~~Primitive data type\|type]].<ref>{{cite web\|url=https://go.dev/ref/spec#Map_types\|title=The Go Programming Language Specification\|website=go.dev\|access-date=January 1, 2023}}</ref> [[Java (programming language)\|Java]] programming language includes the <code>HashSet</code>, <code>HashMap</code>, <code>LinkedHashSet</code>, and <code>LinkedHashMap</code> [[Generics in Java\|generic]] collections.<ref>{{cite 
web\|url=https://docs.oracle.com/javase/tutorial/collections/implementations/index.html\|title=Lesson: Implementations (The Java™ Tutorials > Collections)\|website=docs.oracle.com\|access-date=April 27, 2018\|url-status=live\|archive-url=https://web.archive.org/web/20170118041252/https://docs.oracle.com/javase/tutorial/collections/implementations/index.html\|archive-date=January 18, 2017\|df=mdy-all}}</ref> [[Python (programming language)\|Python]]'s built-in <code>dict</code> implements a hash table in the form of a [[~~Primitive_data_type~~Primitive data type\|type]].<ref>{{cite journal\|journal=[[Journal of Physics: Conference Series]]\|first1=Juan\|last1=Zhang\|first2=Yunwei\|last2=Jia\|title=Redis rehash optimization based on machine learning\|volume=1453\|year=2020\|issue=1 \|page=3\|doi=10.1088/1742-6596/1453/1/012048 \|bibcode=2020JPhCS1453a2048Z \|s2cid=215943738 \|doi-access=free}}</ref> [[Ruby (programming language)\|Ruby]]'s built-in <code>Hash</code> uses the open addressing model from Ruby 2.4 onwards.<ref>{{cite web\|url=https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding#hash-changes\|title=Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding\|author=Jonan Scheffler\|date=December 25, 2016\|website=heroku.com\|access-date=July 3, 2019\|df=mdy-all\|archive-date=July 3, 2019\|archive-url=https://web.archive.org/web/20190703145530/https://blog.heroku.com/ruby-2-4-features-hashes-integers-rounding#hash-changes\|url-status=live}}</ref> [[Rust (programming language)\|Rust]] programming language includes <code>HashMap</code>, <code>HashSet</code> as part of the Rust Standard Library. 
<ref>{{cite web \|title=doc.rust-lang.org \|url=https://doc.rust-lang.org/std/index.html \|url-status=live \|archive-url=https://web.archive.org/web/20221208155205/https://doc.rust-lang.org/std/index.html \|archive-date=December 8, 2022 \|access-date=December 14, 2022 \|df=mdy-all}}</ref>

The [[.NET]] standard library includes <code>HashSet</code> and <code>Dictionary</code>,<ref>{{cite web \|title=HashSet Class (System.Collections.Generic) \|url=https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1?view=net-7.0 \|website=learn.microsoft.com \|access-date=1 July 2023 \|language=en-us}}</ref><ref>{{Cite web \|last=dotnet-bot \|title=Dictionary Class (System.Collections.Generic) \|url=https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=net-8.0 \|access-date=2024-01-16 \|website=learn.microsoft.com \|language=en-us}}</ref> so they can be used from languages such as [[C Sharp (programming language)\|C#]] and [[VB.NET]].<ref>{{cite web \|url=https://www.dotnetperls.com/hashset-vbnet \|title=VB.NET HashSet Example \|website=Dot Net Perls}}</ref>

==See also==
Line 266 ⟶ 260:
* [[Search data structure]]
* [[Stable hashing]]
* [[Succinct hash table]]
{{div col end}}
Line 287 ⟶ 282:
[[Category:Articles with example C code]]
[[Category:Hash-based data structures]]
[[Category:1953 in computing]]