Basics of Hashing
Complete Course on Data Structures - GATE 2024 & 2025
Sanchit Jain
Lesson 25, Apr 6, 2023

Introduction to hashing

* The main idea of a data structure is to help us store data. But the most common operation on any data structure is not insert or delete but search, since even insertion and deletion require a search.
* In any data structure, the search time depends first on the number of elements the structure contains and then on the type of structure. For example:
  * Unsorted array: O(n)
  * Sorted array: O(log n)
  * Linked list: O(n)
  * Binary tree (BT): O(n)
  * BST: O(n)
  * AVL tree: O(log n)
* Hashing is a technique where the search time is independent of the number of items among which we are searching for a data value.
* The basic idea is to use the key itself to find the address in memory, to make searching easy. For example, take a phone number, roll number, Aadhaar card number, voter ID or any other key and convert it into a smaller, practical number (modified so that a great deal of space is not wasted), then use that small number as an index into a table called a hash table.
* The values are then stored in the hash table. Using the key, you can access the element in O(1) time.

[Figure: keys mapped by a hash function H(K) to buckets and entries of a hash table, with an overflow area]

* In simple terms, a hash function maps a big number to a small integer that can be used as an index in the hash table. The hash table is an array that stores pointers to the records corresponding to a search key. The remaining entries can be nil.
* Collision: It is possible that two different keys yield the same hash address. This situation is called a collision. The technique used to resolve a collision is called collision resolution.
* Characteristics of a good hash function:
  * Easy to compute and understand.
  * Efficiently computable: it must take little time to compute.
  * Should distribute the keys uniformly (each table position equally likely for each key) and should not result in clustering.
  * Must have a low collision rate.

Most popular hash function

* Division-remainder method: The number of items in the table is estimated. That number is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism must be able to recognize a collision and offer an alternate search mechanism.)
  * H(K) = K (mod m)
  * H(K) = K (mod m) + 1
* Note: Irrespective of how good a hash function is, collisions are bound to occur. Therefore, to maintain the performance of a hash table, it is important to manage collisions through various collision resolution techniques.

Q Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which of the following statements are true? (GATE 2004) (1 Mark)
i. 9679, 1989, 4199 hash to the same value
ii. 1471, 6171 hash to the same value
iii. All elements hash to the same value
iv. Each element hashes to a different value
(A) i only (B) ii only (C) i and ii only (D) iii or iv

Collision resolution technique

* Open addressing (closed hashing): In open addressing, all elements are stored in the hash table itself, i.e. a collision is resolved by probing, or searching through alternate locations in the hash table itself, in sequence.
* When searching for an element, we examine table slots one by one until the desired element is found or it is clear that the element is not in the table. So, at any point, the size of the table must be greater than or equal to the total number of keys.
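The slot-by-slot search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `probe`, `oa_insert` and `oa_search` are hypothetical, the hash is the division method H(K) = K mod m, and the probe sequence simply steps to the next slot, which is only one possible choice of sequence.

```python
from typing import List, Optional

EMPTY = None  # marker for an unused slot (assumption: None is never stored as a key)


def probe(key: int, i: int, m: int) -> int:
    """Return the i-th slot to examine for `key`.

    Starts at key mod m (division method) and steps by one slot per probe.
    Stepping by one is an illustrative choice of probe sequence.
    """
    return (key % m + i) % m


def oa_insert(table: List[Optional[int]], key: int) -> int:
    """Store `key` in the first empty slot of its probe sequence; return that index."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    # All m slots are occupied: the number of keys cannot exceed the table size.
    raise OverflowError("hash table is full")


def oa_search(table: List[Optional[int]], key: int) -> Optional[int]:
    """Examine slots one by one; return the index of `key`, or None if it is absent."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is EMPTY:
            return None  # an empty slot shows the key was never stored
        if table[slot] == key:
            return slot
    return None  # every slot examined without finding the key


# Keys from the GATE 2004 question, inserted into a table of 10 slots.
table: List[Optional[int]] = [EMPTY] * 10
for k in (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199):
    oa_insert(table, k)
```

In this run 9679 and 1989 both hash to 9, so 1989 wraps around and is stored in slot 0. Everything lives in the one array, which is why the table can hold at most as many keys as it has slots.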
* Open addressing is of three types: linear probing, quadratic probing and double hashing.
* Performance of open addressing: Like chaining, the performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing).
  * m = number of slots in the hash table
  * n = number of keys to be inserted in the hash table
  * Load factor α = n/m (< 1)
  * Expected time to search/insert/delete < 1/(1 - α)
  * So search, insert and delete take 1/(1 - α) time.

Q Given a hash table T with 25 slots that stores 2000 elements, the load factor α for T is (GATE 2015) (1 Mark)
(A) 80 (B) 0.0125 (C) 8000 (D) 1.25

Q Consider a hash function that distributes keys uniformly. The hash table size is 20. After hashing of how many keys will the probability that any new key hashed collides with an existing one exceed 0.5? (GATE 2007) (2 Marks)
(A) 5 (B) 6 (C) 7 (D) 10

Q Which among the following statement(s) is (are) true?
a. A hash function takes a message of arbitrary length and generates a fixed length code
b. A hash function takes a message of fixed length and generates a code of variable length
c. A hash function may give the same hash value for distinct messages
Choose the correct answer from the options given below: (NET 2020 Oct)
(A) (a) only (B) (b) and (c) only (C) (a) and (c) only (D) (b) only

Linear probing

* Linear probing searches the table sequentially, starting at the position given by the hash function, until it finds a cell with a matching key or an empty cell.
* When implemented using a random hash function, search, insertion and deletion each take constant expected time, i.e. O(1), as long as the load factor of the hash table is a constant strictly less than one.
* Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
* Search(k): Keep probing until the slot's key equals k or an empty slot is reached.
* Delete(k): The delete operation is interesting. If we simply delete a key, a later search may fail. So slots of deleted keys are marked specially as "deleted". Insert can place an item in a deleted slot, but search does not stop at a deleted slot.
* In the linear probing method, in case of a collision we find the next free space and store the key that is causing the collision in it.
* The method of linear probing uses the hash function h(k, i) = (h'(k) + i) mod m, for i = 0, 1, ..., m-1.

Example: Let us take the previous example, where the key value 13 was causing a collision at location 3.
* h(13, 0) = (h'(13) + 0) mod 10 = 3; since this causes a collision, we consider the next value of i.
* h(13, 1) = (h'(13) + 1) mod 10 = 4; there is no collision at this location, so we place the value 13 at location 4.

[Figure: resulting hash table, labelled "Primary Clustering"]

* Advantages
  * The most popular implementation on standard hardware uses linear probing, which is fast, simple and easy to implement.
  * Linear probing can provide high performance because of its good locality of reference.
* Disadvantages
  * It is more sensitive to the quality of its hash function than some other collision resolution schemes, so good performance requires a higher-quality hash function.
  * Its performance degrades more quickly at high load factors because of primary clustering, a tendency for one collision to cause more nearby collisions. Basic operations then take more time.

Q Consider a hash table of size seven, with starting index zero, and a hash function (7x+3) mod 4. Assuming the hash table is initially empty, which of the following is the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed hashing? Here "_" denotes an empty location in the table.
(NET July 2018)
(a) [illegible in the source] (b) 1, 3, 8, 10, _, _, _ (c) 1, _, 3, _, 8, _, 10 (d) 3, 10, _, _, 8, _, _

Q A hash table contains 10 buckets and uses linear probing to resolve collisions. The key values are integers and the hash function used is key % 10. If the values 43, 165, 62, 123, 142 are inserted in the table, in what location would the key value 142 be inserted? (GATE 2005) (1 Mark)
(a) 2 (b) 3 (c) 4 (d) 6

Q The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table? (2 Marks)

Q Consider a hash table of size 11 that uses open addressing with linear probing. Let h(k) = k mod 11 be the hash function used. A sequence of records with keys 43, 36, 92, 87, 11, 4, 71, 13, 14 is inserted into an initially empty hash table, indexed from 0. What is the index of the bin into which the last record is inserted? (GATE 2008) (2 Marks)
(a) 3 [remaining options illegible in the source]

Q Consider a hash table of size seven, with starting index zero, and a hash function (3x + 4) mod 7. Assuming the hash table is initially empty, which of the following is the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed hashing? Note that "_" denotes an empty location in the table. (GATE 2007) (2 Marks)
(A) 8, _, _, _, _, _, 10 (B) 1, 8, 10, _, _, _, 3 (C) 1, _, _, _, _, _, 3 (D) 1, 10, 8, _, _, _, 3

Q A hash table of length 10 uses open addressing with hash function h(k) = k mod 10, and linear probing. After inserting 6 values into an empty hash table, the table is as shown below. Which one of the following choices gives a possible order in which the key values could have been inserted in the table?
(GATE 2010) (2 Marks)
(A) 46, 42, 34, 52, 23, 33 (B) 34, 42, 23, 52, 33, 46 (C) 46, 34, 42, 23, 52, 33 (D) 42, 46, 33, 23, 34, 52

[Table: contents of the hash table, not recoverable from the source]

Q A hash table with ten buckets with one slot per bucket is shown in the following figure. The symbols S1 to S7 were initially entered using a hashing function with linear probing. The maximum number of comparisons needed in searching for an item that is not present is (GATE 1989) (2 Marks)

[Figure: hash table with buckets 0 to 9, not recoverable from the source]

* Primary clustering is one of two major failure modes of open-addressing hash tables, especially those using linear probing. In linear probing, a record involved in a collision is always moved to the next available hash table cell after the position given by its hash function, creating a contiguous cluster of occupied cells. Whenever another record is hashed to anywhere within the cluster, the cluster grows in size by one cell.
* A related phenomenon, secondary clustering, occurs more generally with open addressing modes, including linear probing and quadratic probing, in which the probe sequence is independent of the key. In this phenomenon, a low-quality hash function may cause many keys to hash to the same location, after which they all follow the same probe sequence or are placed in the same hash chain as each other, giving them slow access times.

Quadratic probing

* Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
* Quadratic probing uses a hash function of the form h(k, i) = (h'(k) + f(i^2)) mod m, where h' is an auxiliary hash function and i = 0, 1, ..., m-1.

Example: Consider the key values 8, 3, 13, 23 and a hash table of size 10.
* 8 will be placed at: h(8, 0) = [h'(8) + f(0^2)] mod 10 = 8, so it gets placed at location 8.
* 3 will be placed at: h(3, 0) = [h'(3) + f(0^2)] mod 10 = 3, no collision, so it gets placed at location 3.
* 13 will be placed at: h(13, 0) = [h'(13) + f(0^2)] mod 10 = 3, a collision occurred, so we increase the value of i. h(13, 1) = [h'(13) + f(1^2)] mod 10 = 4, no collision, so it gets placed at location 4.
* 23 will be placed at: h(23, 0) = [h'(23) + f(0^2)] mod 10 = 3, a collision occurred, so we increase the value of i. h(23, 1) = [h'(23) + f(1^2)] mod 10 = 4, again a collision occurred, so we increase the value of i. h(23, 2) = [h'(23) + f(2^2)] mod 10 = 3 + 4 = 7, no collision occurred, so it gets placed at location 7.
* Quadratic probing reduces the clustering of elements and thus improves the searching time.
* Advantages
  * Quadratic probing can be a more efficient algorithm in a closed hashing table, since it better avoids the clustering problem that can occur with linear probing, although it is not immune.
  * It also provides good memory caching because it preserves some locality of reference; however, linear probing has greater locality and, thus, better cache performance.
* Disadvantage
  * Quadratic probing lies between linear probing and double hashing in terms of cache performance and clustering.

Chaining

* The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. In chaining, we place all the elements that hash to the same slot into the same linked list.
* Advantage: Chaining is simple.
* Disadvantage: It requires additional memory outside the table.

Chaining compared with open addressing

Chaining:
1. Chaining is simpler to implement.
2. In chaining, the hash table never fills up; we can always add more elements to a chain.
3. Chaining is less sensitive to the hash function or load factors.
4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
5. Cache performance of chaining is not good, as keys are stored using linked lists.
6. Wastage of space (some parts of the hash table in chaining are never used).
7. Chaining uses extra space for links.

Open addressing:
1. Open addressing requires more computation.
2. In open addressing, the table may become full.
3. Open addressing requires extra care to avoid clustering and to control the load factor.
4. Open addressing is used when the frequency and number of keys are known.
5. Open addressing provides better cache performance, as everything is stored in the same table.
6. In open addressing, a slot can be used even if no input maps to it.
7. There are no links in open addressing.

Q An advantage of a chained hash table (external hashing) over the open addressing scheme is (GATE 1996) (1 Mark)
(A) Worst case complexity of search operations is less (B) Space used is less (C) Deletion is easier (D) None of the above

Q Consider a hash table with 9 slots. The hash function is h(k) = k mod 9. The collisions are resolved by chaining. The following 9 keys are inserted in the order: 5, 28, 19, 15, 20, 33, 12, 17, 10. The maximum, minimum, and average chain lengths in the hash table, respectively, are (GATE 2014) (2 Marks) [Asked in Accenture]
(A) 3, 0, and 1 (B) 3, 3, and 3 (C) 4, 0, and 1 (D) 3, 0, and 2

Q Consider a hash table with 100 slots. Collisions are resolved using chaining. Assuming simple uniform hashing, what is the probability that the first 3 slots are unfilled after the first 3 insertions? (GATE 2014) (2 Marks)
(A) (97 x 97 x 97)/100^3 (B) (99 x 98 x 97)/100^3 (C) (97 x 96 x 95)/100^3 (D) (97 x 96 x 95)/(3! x 100^3)

Double Hashing

* Double hashing uses a probe sequence of the form h(k, i) = (h1(k) + i * h2(k)) mod m, for i = 0, 1, ..., m-1, where h1 and h2 are two hash functions.

[Remaining text of this slide is illegible in the source]

Q Consider a double hashing scheme in which the primary hash function is h1(k) = k mod 23 and the secondary hash function is h2(k) = 1 + (k mod 19). Assume that the table size is 23. Then the address returned by probe 1 in the probe sequence (assume that the probe sequence begins at probe 0) for key value k = 90 is ____? (GATE 2020) (2 Marks)

Q Consider double hashing of the form h(k, i) = (h1(k) + i * h2(k)) mod m, where h1(k) = k mod m, h2(k) = 1 + (k mod n), n = m - 1 and m = 701. For k = 123456, what is the difference between the first and second probes in terms of slots? (NET 2019 June)
(A) 255 (B) 256 (C) 257 (D) 258

Q Suppose we are given n keys, m hash table slots, and two simple uniform hash functions h1 and h2. Further suppose our hashing scheme uses h1 for the odd keys and h2 for the even keys. What is the expected number of keys in a slot? (GATE 2022) (1 Mark)
(A) m/n (B) n/m (C) 2n/m (D) n/2m

Q Consider a dynamic hashing approach for 4-bit integer keys:
* There is a main hash table of size 4.
* The 2 least significant bits of a key are used to index into the main hash table.
* Initially, the main hash table entries are empty.
* Thereafter, when more keys are hashed into it, to resolve collisions, the set of all keys corresponding to a main hash table entry is organized as a binary tree that grows on demand.
* First, the 3rd least significant bit is used to divide the keys into left and right subtrees.
* To resolve more collisions, each node of the binary tree is further sub-divided into left and right subtrees based on the 4th least significant bit.
* A split is done only if it is needed, i.e. only when there is a collision.
Consider the following state of the hash table.

[Figure: state of the hash table, not recoverable from the source]

Which of the following sequences of key insertions can cause the above state of the hash table (assume the keys are in decimal notation)?
(GATE 2021) (2 Marks)
(A) 5, 9, 4, 13, 10, 7 (B) 9, 5, 10, 6, 7, 2 (C) 10, 9, 6, 7, 5, 13 (D) 9, 5, 13, 6, 10, 14

Q Which one of the following hash functions on integers will distribute keys most uniformly over 10 buckets numbered 0 to 9 for i ranging from 0 to 2020? (GATE 2015) (2 Marks)
(A) h(i) = i^2 mod 10 (B) h(i) = i^3 mod 10 (C) h(i) = (11 * i^2) mod 10 (D) h(i) = (12 * i) mod 10
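One way to check the uniformity question above is to count how many keys each candidate function sends to each bucket. The sketch below is illustrative only: it reads the exponents in options (A), (B) and (C) as 2, 3 and 2 (an assumption wherever the source text is unclear), and the names `candidates` and `bucket_counts` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict

# Candidate hash functions for options (A)-(D).
# Assumption: the exponents are 2, 3 and 2 respectively.
candidates: Dict[str, Callable[[int], int]] = {
    "A": lambda i: (i ** 2) % 10,
    "B": lambda i: (i ** 3) % 10,
    "C": lambda i: (11 * i ** 2) % 10,
    "D": lambda i: (12 * i) % 10,
}


def bucket_counts(h: Callable[[int], int], lo: int = 0, hi: int = 2020) -> Counter:
    """Count how many keys in the range [lo, hi] land in each bucket under h."""
    return Counter(h(i) for i in range(lo, hi + 1))


for name, h in candidates.items():
    counts = bucket_counts(h)
    # Report how many of the 10 buckets are ever used and the size of the fullest one.
    print(name, "buckets used:", len(counts), "largest bucket:", max(counts.values()))
```

Under this reading only the cube reaches all ten buckets. The squares in (A) and (C) reach six buckets (0, 1, 4, 5, 6, 9), and 12 * i reaches only the five even-numbered ones.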
