Hashing

These are important topic notes for 4th semester DAA.
Data Structures & Applications: Sorting and Searching

Hashing

Hashing is an effective way to store elements in a data structure. It reduces the number of comparisons needed to find an element and gives direct access to a stored record. Two concepts are central to hashing: the hash table and the hash function.

Hash Table Organization

Hash Table

* A hash table is a data structure used for storing and retrieving data very quickly. Insertion of data into the hash table is based on the key value, so every entry in the hash table is associated with some key. For example, when storing an employee record in the hash table, the employee ID works as the key.
* Using the hash key, the required piece of data can be found in the hash table with a small number of key comparisons. The search time then depends on the size of the hash table.
* A dictionary can be represented effectively using a hash table. The dictionary entries (key and value pairs) are placed in the hash table using the hash function.

Hash Functions

* A hash function is a function used to place data in the hash table. The same hash function is then used to retrieve the data from the hash table. Thus the hash function is what implements the hash table.
* The integer returned by the hash function is called the hash key.

For example: Consider that we want to place some employee records in the hash table. Each record is placed using its key, the employee ID, which is a 7-digit number. To place a record, the 7-digit key is converted into 3 digits by taking only the last three digits of the key. If the key is 4967000, the record is stored at position 0. The second key is 8421003; its record is placed at position 3 in the array.
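Taking the last three digits of a 7-digit key is the same as computing key % 1000. The short sketch below illustrates the idea under a few assumptions: the name `hash_key` and the list-based table are hypothetical, and every ID other than 8421003 is made up for illustration.

```python
TABLE_SIZE = 1000  # three-digit hash keys, so positions 0..999


def hash_key(employee_id: int) -> int:
    """Return the last three digits of the employee ID (hypothetical helper)."""
    return employee_id % TABLE_SIZE


# One slot per possible hash key; None marks an empty position.
table = [None] * TABLE_SIZE

# 8421003 comes from the example in the notes; the other IDs are illustrative.
for emp_id in (8421003, 4967000, 1234567):
    table[hash_key(emp_id)] = emp_id

print(hash_key(8421003))  # 3
print(hash_key(4967000))  # 0
print(table[567])         # 1234567
```

Retrieval uses the same function: `table[hash_key(8421003)]` goes straight to position 3 without scanning the other slots. This sketch does not handle two IDs that share the same last three digits.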
Hence the hash function will be

H(key) = key % 1000

where key % 1000 is the hash function and the value obtained from the hash function is called the hash key. The hash table is then an array with positions 0 to 999, and each employee ID is stored at the position given by its hash key.

TECHNICAL PUBLICATIONS - An up thrust for knowledge

Bucket and home bucket: The hash function H(key) is used to map several dictionary entries into the hash table. Each position of the hash table is called a bucket. H(key) is the home bucket for the dictionary pair whose key is key.

1. Division method: The hash key is the remainder obtained when the key is divided by some divisor. Typically the divisor is the table length.
For example: If the records 54, 72, 89, 37 are to be placed in the hash table and the table size is 10, then

H(key) = record % table size
4 = 54 % 10
2 = 72 % 10
9 = 89 % 10
7 = 37 % 10

Index   Record
0
1
2       72
3
4       54
5
6
7       37
8
9       89

2. Mid square: In the mid square method, the key is squared and the middle part of the result is used as the index. If the key is a string, it has to be preprocessed to produce a number.
Consider that we want to place the record 3111. Then

3111^2 = 9678321

For a hash table of size 1000,

H(3111) = 783 (the middle 3 digits)

3. Multiplicative hash function: The given record is multiplied by some constant value. The formula for computing the hash key is

H(key) = floor(p * (fractional part of key * A))

where p is an integer constant and A is a real constant. Donald Knuth suggested using the constant A = 0.6180339887.
If key = 107 and p = 50, then

key * A = 107 * 0.6180339887 = 66.1296367909
H(key) = floor(50 * 0.1296367909) = floor(6.4818395...) = 6

The record 107 will be placed at location 6 in the hash table. (Multiplying by the whole product instead of its fractional part would give 3306, which does not follow the formula.)

4. Digit folding: The key is divided into separate parts, and these parts are combined using some simple operation to produce the hash key.
For example, consider the record 12365412. It is divided into the separate parts 123, 654 and 12, and these are added together:

H(key) = 123 + 654 + 12 = 789

The record will be placed at location 789 in the hash table.

5. Digit analysis: Digit analysis is used when all the identifiers are known in advance. We first transform the identifiers into numbers using some radix r. Then we examine the digits of each identifier. The digits having the most skewed distributions are deleted. This deletion continues until the number of remaining digits is small enough to give an address in the range of the hash table. The remaining digits are then used to calculate the hash address.

Review Question
1. What do you mean by hashing?

Static and Dynamic Hashing

Static Hashing
This is a hashing technique in which the resultant bucket size remains the same, with no change in the bucket address. As the buckets are fixed, this type of hashing cannot handle overflow.
(Figure: example hash table with indices 0 to 9.)

Dynamic Hashing
This is a hashing technique in which the bucket size is not fixed; it can grow or shrink. A typical example of dynamic hashing is the extendible hashing technique.
* Extendible hashing is a technique that handles a large amount of data. The position of the data in the hash table is determined by extracting a certain number of bits from the data.
* Extendible hashing grows and shrinks in a manner similar to B-trees.
* In extendible hashing the elements are placed in buckets by referring to the size of the directory. The levels are indicated in parentheses. For example:
(Fig. 9.8.1: directory entries with their levels in parentheses, pointing to buckets that hold the data to be placed, such as 001, 111 and 010.)
* The bucket can hold the data of its global depth.
If the data in a bucket exceeds the global depth, split the bucket and double the directory.

Consider that we have to insert 1, 4, 5, 7, 8, 10. Assume each page can hold 2 data entries (2 is the depth).

Step 1: Insert 1, 4.
1 = 001
4 = 100
We examine the last bit of the data and insert the data into the corresponding bucket.
(Fig. 9.8.2: directory entries 0 and 1 at level (1); 100 is in bucket 0 and 001 is in bucket 1.)

Insert 5.
5 = 101
Its last bit is 1, so it is placed with 001. That bucket is now full, so the next insertion into it will require doubling the directory.
(Fig. 9.8.3: bucket 0 holds 100; bucket 1 holds 001 and 101.)

Step 2: Insert 7.
7 = 111
Its last bit is 1, but that bucket is full, so 7 cannot be inserted there. Double the directory and split the bucket. (Fig. 9.8.4)
After the insertion of 7 we consider the last two bits.
(Fig. 9.8.5: directory entries 00, 01, 10 and 11 at level (2); 100 is in bucket 00, 001 and 101 are in bucket 01, and 111 is in bucket 11.)

Step 3: Insert 8, i.e. 1000. Its last two bits are 00.
(Fig. 9.8.6: 1000 joins 100 in bucket 00.)

Step 4: Insert 10, i.e. 1010. Its last two bits are 10.
(Fig. 9.8.7: 1010 is placed in bucket 10.)

Thus the data is inserted using extendible hashing.

Concept of Collision

The hash function returns the hash key, using which the record is placed in the hash table. Thus the function helps place the record at the appropriate position in the hash table, and because of this the record can be retrieved directly from that location. The function needs to be designed very carefully: it should not return the same hash key for two different records, which is an undesirable situation in hashing.

Definition: The situation in which the hash function returns the same hash key (home bucket) for more than one record is called a collision, and two different records that receive the same hash key are called synonyms.

Similarly, when there is no room for a new pair in the hash table, the situation is called overflow. Handling a collision may sometimes lead to an overflow condition. Frequent collisions and overflows indicate a poor hash function.

For example, consider the hash function
H(key) = recordkey % 10

with a hash table of size 10. The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77.

Index   Record key
0
1       131
2
3       43
4       44
5
6       36
7       57
8       78
9       19

Now if we try to place 77 in the hash table, we get the hash key 7, and index 7 already holds the record key 57. This situation is called collision. If we look for the next vacant position at the subsequent indices 8 and 9, we find that there is no room to place 77 in the hash table. This situation is called overflow.

Characteristics of a good hashing function:
1. The hash function should be simple to compute.
2. The number of collisions should be small while placing records in the hash table. Ideally no collision should occur. Such a function is called a perfect hash function.
3. The hash function should produce keys (buckets) that are distributed uniformly over the array.
4. The hash function should depend upon every bit of the key. Thus a hash function that simply extracts a portion of the key is not suitable.

Collision Resolution Techniques

If a collision occurs, it should be handled by applying some technique. Such a technique is called a collision handling technique.
There are two methods for handling collisions and overflows in the hash table:
1. Chaining
2. Open addressing (linear probing)
Two more difficult collision handling techniques are:
1. Quadratic probing
2. Double hashing

Open Addressing - Linear Probing

This is the easiest method of handling collision. When a collision occurs, i.e. when two records demand the same home bucket in the hash table, the collision is resolved by placing the second record linearly down, in the next empty bucket that is found.
When we use linear probing (open addressing), the hash table is represented as a one-dimensional array with indices that range from 0 to the desired table size - 1. Before inserting any elements into this table, we must initialize the table to represent the situation where all slots are empty. The elements can then be inserted into the hash table.
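The procedure just described can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `insert_linear` and the `EMPTY` marker are hypothetical, the home bucket is assumed to come from the division method (key % table size), and the search is assumed to wrap around to index 0 after the last slot. The key sequence is the one used in the notes' worked example for linear probing.

```python
EMPTY = None  # marker for a slot that holds no key


def insert_linear(table, key):
    """Insert key using linear probing and return the index used.

    Raises OverflowError when every slot is already occupied.
    """
    size = len(table)
    home = key % size  # home bucket from the division hash function
    for step in range(size):
        index = (home + step) % size  # wrap around past the last index
        if table[index] is EMPTY:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


table = [EMPTY] * 10  # initialise every slot as empty before inserting
for key in (131, 4, 8, 7, 21, 5, 31, 61, 9, 29):
    insert_linear(table, key)

print(table)  # [29, 131, 21, 31, 4, 5, 61, 7, 8, 9]
```

Initialising every slot to `EMPTY` is what lets the loop tell a free bucket from an occupied one; without it, neither collisions nor overflow could be detected.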
For example: Consider that the following keys are to be inserted in the hash table:

131, 4, 8, 7, 21, 5, 31, 61, 9, 29

Initially, we will put the following keys in the hash table: 131, 4, 8, 7.
We will use the division hash function. That means the keys are placed using the formula

H(key) = key % tablesize
H(key) = key % 10

For instance, the element 131 is placed at

H(key) = 131 % 10 = 1

Index 1 is the home bucket for 131. Continuing in this fashion we place 4, 8 and 7.

Now the next key to be inserted is 21. According to the hash function,

H(key) = 21 % 10 = 1

But index 1 is already occupied by 131, i.e. a collision occurs. To resolve this collision we move linearly down and place the element at the next empty location. Therefore 21 is placed at index 2. The next element is 5; its home bucket is index 5, which is empty, so we put 5 at index 5. After placing record keys 31 and 61 (which probe down to indices 3 and 6), the hash table is as shown in Fig. 9.10 (middle column of the table below).

The next record key is 9. According to the division hash function it demands home bucket 9, so we place 9 at index 9. The final record key is 29, which also hashes to 9. But home bucket 9 is already occupied, and there is no next empty bucket because the table ends at index 9. Overflow occurs. To handle it we move back to bucket 0, and since that location is empty, 29 is placed at index 0.

Index   After 131, 4, 8, 7   After 21, 5, 31, 61   After 9, 29
0                                                   29
1       131                  131                    131
2                            21                     21
3                            31                     31
4       4                    4                      4
5                            5                      5
6                            61                     61
7       7                    7                      7
8       8                    8                      8
9                                                   9

Problem with linear probing

One major problem with linear probing is primary clustering. Primary clustering is a process in which a block of data is formed in the hash table when collisions are resolved.
For example:
19 % 10 = 9
18 % 10 = 8
39 % 10 = 9
29 % 10 = 9
8 % 10 = 8

With linear probing these five keys occupy indices 8, 9, 0, 1 and 2, so a cluster is formed while the rest of the table is empty. This clustering problem can be solved by quadratic probing.

9.10.2 Separate Chaining

In collision handling, chaining is a concept that introduces an additional field with the data, i.e. the chain. A separate chain table is maintained for colliding data. When a collision occurs, a linked list (chain) is maintained at the home bucket.
For example: Consider that the keys to be placed in their home buckets are

131, 3, 4, 21, 61, 24, 7, 97, 8, 9

Then we apply the hash function

H(key) = key % D

where D is the size of the table. Here D = 10. The hash table will be:

Index   Chain
0
1       131 -> 21 -> 61
2
3       3
4       4 -> 24
5
6
7       7 -> 97
8       8
9       9

Fig. 9.10.2 Chaining

A chain is maintained for colliding elements. For instance, 131 has home bucket 1. Similarly, keys 21 and 61 demand home bucket 1. Hence a chain is maintained at index 1. Similarly, chains are maintained at index 4 and index 7.

Double Hashing

Double hashing is a technique in which a second hash function is applied to the key when a collision occurs. Applying the second hash function gives the number of positions to move from the point of collision to insert the key.
There are two important rules to be followed for the second function:
* It must never evaluate to zero.
* It must make sure that all cells can be probed.

H1(key) = key mod tablesize
H2(key) = M - (key mod M)

where M is a prime number smaller than the size of the table.

Consider the following elements to be placed in a hash table of size 10:

37, 90, 45, 22, 17, 49, 55

Initially insert the elements using the formula for H1(key): insert 37, 90, 45, 22.
Now if 17 is to be inserted, then

H1(17) = 17 % 10 = 7
H2(key) = M - (key % M)

Here M is a prime number smaller than the size of the table. The prime number smaller than table size 10 is 7. Hence M = 7.

(Fig. 9.10.3: hash table holding 90 at index 0, 22 at index 2, 45 at index 5 and 37 at index 7.)

H2(17) = 7 - (17 % 7) = 7 - 3 = 4

That means we have to insert the element 17 four places from 37. In short, we have to take 4 jumps. Therefore 17 will be placed at index 1. The key 49 goes directly to its home bucket, index 9.
Now to insert the number 55:

H1(55) = 55 % 10 = 5   (collision)
H2(55) = 7 - (55 % 7) = 7 - 6 = 1

That means we have to take one jump from index 5 to place 55. Finally the hash table will be:

Index   Key
0       90
1       17
2       22
3
4
5       45
6       55
7       37
8
9       49

Rehashing

Rehashing is a technique in which the table is resized, i.e. the size of the table is doubled by creating a new table. It is preferable for the total size of the table to be a prime number. There are situations in which rehashing is required:
* When the table is completely full.
* With quadratic probing, when the table is half full.
* When insertions fail due to overflow.
In such situations we have to transfer the entries from the old table to the new table by recomputing their positions using a suitable hash function.

(Fig. 9.10.4 Rehashing: the contents of the old table are transferred to the new table.)

Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and we will use the hash function

H(key) = key mod tablesize

37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7   (collision, solved by linear probing by placing 17 at index 8)
49 % 10 = 9

Index   Key
0       90
1
2       22
3
4
5       55
6
7       37
8       17
9       49

Now this table is almost full, and if we try to insert more elements, collisions will occur and eventually further insertions will fail. Hence we will rehash by doubling the table size. The old table size is 10, so doubling it gives 20 for the new table. But 20 is not a prime number, so we prefer to make the table size 23. The new hash function will be

H(key) = key mod 23

37 % 23 = 14
90 % 23 = 21
55 % 23 = 9
22 % 23 = 22
17 % 23 = 17
49 % 23 = 3
87 % 23 = 18

Index   Key
3       49
9       55
14      37
17      17
18      87
21      90
22      22
(all other indices from 0 to 22 are empty)

Now the hash table is sufficiently large to accommodate new insertions.
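The rehashing example above can be reproduced with a short sketch. It is an illustration under stated assumptions rather than a definitive implementation: the helper names `insert_linear` and `rehash` are hypothetical, collisions are assumed to be resolved by linear probing as in the example, and the new size 23 is passed in directly instead of being computed as the next prime after 20.

```python
def insert_linear(table, key):
    """Place key at its home bucket, or at the next free slot (linear probing)."""
    size = len(table)
    for step in range(size):
        index = (key % size + step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


def rehash(old_table, new_size):
    """Build a table of new_size slots and re-insert every key from old_table."""
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            # The position is recomputed with key % new_size.
            insert_linear(new_table, key)
    return new_table


old = [None] * 10
for key in (37, 90, 55, 22, 17, 49):
    insert_linear(old, key)
print(old)  # [90, None, 22, None, None, 55, None, 37, 17, 49]

new = rehash(old, 23)  # 23 is the prime chosen in the example above
insert_linear(new, 87)
print({i: k for i, k in enumerate(new) if k is not None})
# {3: 49, 9: 55, 14: 37, 17: 17, 18: 87, 21: 90, 22: 22}
```

Every key has to be re-inserted rather than copied to the same index, because its home bucket depends on the table size: 17 collided with 37 in the table of size 10 but has its own bucket in the table of size 23.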
Advantages
1. This technique gives the programmer the flexibility to enlarge the table size if required.
2. Only the space gets doubled, with a simple hash function, which avoids the occurrence of collisions.

Review Questions
1. Discuss in brief the linear probing collision resolution technique. What are the disadvantages of this technique? How could they be overcome?
2. Define hash function. Explain collision resolution strategies.

Applications of Hashing
1. In compilers, to keep track of declared variables.
2. For online spell checking, hashing functions are used.
3. Hashing helps game-playing programs store the moves made.
4. In browser programs, hashing is used while caching web pages.
