9 DictionaryandHashing-1

Uploaded by

Rohan Work
Copyright © All Rights Reserved
DICTIONARY (MAP) AND HASHING

J.Govindarajan, Asst.Prof. (S.G) , CSE


Map

• A map is an abstract data type designed to efficiently store and retrieve values based upon a uniquely identifying search key for each.
• A map stores key-value pairs (k,v), which are called entries.
• Also known as an associative array: an entry's key serves as the index, and the key need not be numeric.

Map : Applications

• A university's information system relies on some form of a student ID as a key
• The domain-name system (DNS) maps a host name, such as www.wiley.com, to an Internet-Protocol (IP) address, such as 208.215.179.146
• A social media site typically relies on a (nonnumeric) username as a key
• A company's customer base may be stored as a map
• A computer graphics system may map a color name to RGB numbers


Multimap ADT

• Allows multiple entries to have the same key
• A multimap may contain entries (k,v) and (k,v′) having the same key

Dictionary

• Models a searchable collection of key-element items
• Multiple items with the same key are allowed
• Main operations include insertion, searching, and deletion
• Applications
  – Telephone directory
  – Mapping student information to roll numbers

Dictionary : ADT

• find(k): if the dictionary has an item with key k, returns the position of this item; otherwise returns a null position
• insertItem(k, o): inserts item o with key k into the dictionary
• removeElement(k): removes the item with key k from the dictionary; raises an exception if there is no such element
• Other functions
  – size(), isEmpty()
  – keys(), elements()
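
The operations above can be sketched as a small Java class. This is only an illustrative, list-based unordered dictionary: the class name ListDictionary, the nested Item class, and the choice of returning null as the "null position" are assumptions made for the example, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical unordered dictionary kept in a list; duplicate keys are allowed. */
class ListDictionary<K, V> {
    /** One key-element item. */
    static final class Item<K, V> {
        final K key;
        final V element;
        Item(K key, V element) { this.key = key; this.element = element; }
    }

    private final List<Item<K, V>> items = new ArrayList<>();

    int size() { return items.size(); }

    boolean isEmpty() { return items.isEmpty(); }

    /** Returns the first item with key k, or null (standing in for the null position). */
    Item<K, V> find(K k) {
        for (Item<K, V> item : items) {
            if (item.key.equals(k)) return item;
        }
        return null;
    }

    /** Appends a new item; an existing item with the same key is left in place. */
    void insertItem(K k, V o) { items.add(new Item<>(k, o)); }

    /** Removes the first item with key k and returns its element. */
    V removeElement(K k) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i).key.equals(k)) return items.remove(i).element;
        }
        throw new NoSuchElementException("no item with key " + k);
    }

    List<K> keys() {
        List<K> out = new ArrayList<>();
        for (Item<K, V> item : items) out.add(item.key);
        return out;
    }

    List<V> elements() {
        List<V> out = new ArrayList<>();
        for (Item<K, V> item : items) out.add(item.element);
        return out;
    }
}
```

Every operation except insertItem scans the list, so this sketch takes time proportional to the number of items; it only illustrates the interface.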
Dictionary

• Types
  – Ordered Dictionaries: a total order relation is defined on the keys
  – Unordered Dictionaries: no order relation is assumed on the keys; only equality testing between keys is used
• Associative Stores
  – When keys are unique, keys are like addresses to the location where the element is stored

Dictionary : Using Direct Addressing

• Key k is stored in slot k
• Applied when the number of keys is small and the keys are unique
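
A minimal sketch of direct addressing, assuming integer keys drawn from the range 0 to universeSize − 1. The class name DirectAddressTable and its method names are hypothetical.

```java
/** Hypothetical direct-address table: the value for key k lives in slot k. */
class DirectAddressTable<V> {
    private final Object[] slots;

    /** Allocates one slot for every possible key in [0, universeSize). */
    DirectAddressTable(int universeSize) { slots = new Object[universeSize]; }

    void insert(int k, V value) { slots[k] = value; }

    /** Returns the value stored for key k, or null if the slot is empty. */
    @SuppressWarnings("unchecked")
    V find(int k) { return (V) slots[k]; }

    void remove(int k) { slots[k] = null; }
}
```

Each operation is a single array access, but the array needs one slot per possible key, whether or not that key is ever stored.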
Dictionary : Using Direct Addressing - not suitable

• If the universe U is large, storing a table T of size |U| may be impractical, or even impossible, given the memory available on a typical computer.
• The set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted.

Dictionary : Using Hashing

• A hash function maps a key into an index:

index = hash(key)

Hashing : An Example

Telephone directory
Hashing Function

key k → integer → index

Hash Code Maps

• Bit Representation as an Integer
  – Combine in some way the high-order and low-order portions of a 64-bit key to form a 32-bit hash code, for example by summing the two 32-bit halves
• Polynomial Hash Codes
• Cyclic-Shift Hash Codes
  – A variant of the polynomial hash code that replaces multiplication by a with a cyclic shift of a partial sum by a certain number of bits
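
For the bit-representation approach listed above, one possible way to fold a 64-bit key into a 32-bit hash code is shown below. The class and method names are made up for the example. The exclusive-or variant is the same formula Java documents for Long.hashCode; the sum variant adds the two halves and lets overflow wrap around.

```java
/** Hypothetical helpers that fold a 64-bit key into a 32-bit hash code. */
class BitHashCodes {
    /** Exclusive-or of the high-order and low-order 32-bit halves. */
    static int foldXor(long key) {
        return (int) (key ^ (key >>> 32));
    }

    /** Sum of the two 32-bit halves; int overflow simply wraps around. */
    static int foldSum(long key) {
        return (int) (key >>> 32) + (int) key;
    }
}
```

Both variants use all 64 bits of the key, unlike a plain cast to int, which would discard the high-order half.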
HashCode

• Memory address: interpret the memory address of the key object as an integer
• Casting to an Integer
  – E.g., Float.floatToIntBits(x) in Java
  – Suitable for keys whose bit length is no greater than that of an integer
• Summing the Components: for an object x whose binary representation can be viewed as an n-tuple $(x_0, x_1, \ldots, x_{n-1})$ of 32-bit integers, use the sum of the components as the hash code

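Summing the Components

• Example: summing the ASCII codes
  – STOP: 83 + 84 + 79 + 80 = 326
  – POTS: 80 + 79 + 84 + 83 = 326
• Both words receive the same hash code, because a plain sum ignores the order of the characters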
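
The collision can be reproduced with a few lines of Java. The class name SumHash is hypothetical; the method simply adds the character codes.

```java
/** Hypothetical component-sum hash code for strings. */
class SumHash {
    static int hashCode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);   // add the character's numeric code
        }
        return h;
    }
}
```

Calling SumHash.hashCode on "STOP" and on "POTS" returns 326 in both cases, since the two strings contain the same characters.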
Polynomial Accumulation

• Hash code: $x_0 a^{n-1} + x_1 a^{n-2} + \cdots + x_{n-2} a + x_{n-1}$
• Choosing a = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 words
• By Horner's rule, the polynomial can be computed as
  $x_{n-1} + a(x_{n-2} + a(x_{n-3} + \cdots + a(x_2 + a(x_1 + a x_0)) \cdots))$


Example

• $x_{n-1} + a(x_{n-2} + a(x_{n-3} + \cdots + a(x_2 + a(x_1 + a x_0)) \cdots))$
• STOP (if a = 33): 80 + 33*(79 + 33*(84 + 33*83)) = 3076934
• POTS (if a = 33): 83 + 33*(84 + 33*(79 + 33*80)) = 2963846
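
A short Java sketch of the polynomial hash code $x_0 a^{n-1} + x_1 a^{n-2} + \cdots + x_{n-1}$, evaluated with Horner's rule in a single left-to-right pass. The class name PolynomialHash is hypothetical, and the characters of the string are taken as the components $x_0, \ldots, x_{n-1}$.

```java
/** Hypothetical polynomial hash code evaluated with Horner's rule. */
class PolynomialHash {
    static int hashCode(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // After this step h = x_0*a^i + x_1*a^(i-1) + ... + x_i
            h = a * h + s.charAt(i);
        }
        return h;   // for long strings int overflow wraps around, which is acceptable for a hash code
    }
}
```

Unlike a plain sum, the result depends on the order of the characters, so anagrams such as STOP and POTS generally receive different codes.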
Cyclic-Shift Hash Codes

• Example:

static int hashCode(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = (h << 5) | (h >>> 27);   // 5-bit cyclic shift of the running sum
        h += (int) s.charAt(i);      // add in next character
    }
    return h;
}

• Results:
  – STOP: 2808368
  – POTS: 2705107

Hash Code Exercise

What would be a good hash code for a vehicle identification number that is a string of numbers and letters of the form "9X9XX99X9XX999999", where a "9" represents a digit and an "X" represents a letter?

Answer:

• Summing components, polynomial hash codes, or cyclic-shift hash codes would all be appropriate.
• The key consists of 6 letters and 11 digits, which can be broken into pieces: two groups of three letters, two groups of four digits, and one group of three digits. This gives five numbers or components $(x_0, x_1, x_2, x_3, x_4)$ whose maximum values are 17575 ($26^3 - 1$), 17575 ($26^3 - 1$), 9999, 9999, and 999, respectively.
• For the size of our hash table, we can choose a prime number near 20000 (for example, N = 19997).



Answer (continued)

If we use summing of components, the hash function would be
$(x_0 + x_1 + x_2 + x_3 + x_4) \bmod N$
For a polynomial hash code, the hash function is
$(x_4 a^4 + x_3 a^3 + x_2 a^2 + x_1 a + x_0) \bmod N$
which can be calculated using Horner's rule as
$(x_0 + a(x_1 + a(x_2 + a(x_3 + a x_4)))) \bmod N$
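
One possible Java rendering of this answer is sketched below. The class name VehicleIdHash, the helper names, and the decision to read each group of three uppercase letters as a base-26 number are assumptions made for illustration; the sketch also assumes the input already matches the 17-character pattern.

```java
/** Hypothetical hash functions for ids of the form 9X9XX99X9XX999999. */
class VehicleIdHash {
    static final int N = 19997;   // table size taken from the answer above

    /** Splits the id into the five components (x0, x1, x2, x3, x4). */
    static int[] components(String id) {
        StringBuilder letters = new StringBuilder();
        StringBuilder digits = new StringBuilder();
        for (char c : id.toCharArray()) {
            if (Character.isDigit(c)) digits.append(c); else letters.append(c);
        }
        return new int[] {
            lettersToInt(letters.substring(0, 3)),      // x0: first three letters
            lettersToInt(letters.substring(3, 6)),      // x1: last three letters
            Integer.parseInt(digits.substring(0, 4)),   // x2: first four digits
            Integer.parseInt(digits.substring(4, 8)),   // x3: next four digits
            Integer.parseInt(digits.substring(8, 11))   // x4: last three digits
        };
    }

    /** Reads three uppercase letters as a base-26 number in 0..17575. */
    static int lettersToInt(String s) {
        int v = 0;
        for (int i = 0; i < s.length(); i++) v = 26 * v + (s.charAt(i) - 'A');
        return v;
    }

    /** (x0 + x1 + x2 + x3 + x4) mod N */
    static int sumHash(String id) {
        int sum = 0;
        for (int x : components(id)) sum += x;
        return sum % N;
    }

    /** (x0 + a(x1 + a(x2 + a(x3 + a*x4)))) mod N, reducing mod N at every step. */
    static int polynomialHash(String id, int a) {
        int[] x = components(id);
        long h = 0;
        for (int i = x.length - 1; i >= 0; i--) h = (x[i] + a * h) % N;
        return (int) h;
    }
}
```

Reducing modulo N after each Horner step keeps the intermediate values small and gives the same result as reducing once at the end.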
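Compression Functions

• Division Method: $i \bmod N$
• MAD (Multiply-Add-and-Divide) Method: $[(a i + b) \bmod p] \bmod N$, where N is the size of the bucket array, p is a prime number larger than N, and a and b are integers chosen at random from [0, p − 1], with a > 0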
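
Both compression functions can be sketched in Java as follows. The class and method names are hypothetical, and the caller is assumed to supply suitable values of p, a, and b (typically a prime p larger than N and random a, b below p with a > 0). Math.floorMod is used so that a negative hash code still maps to a valid index.

```java
/** Hypothetical compression functions mapping a hash code i to an index in [0, n). */
class Compression {
    /** Division method: i mod n, kept non-negative even when i is negative. */
    static int division(int i, int n) {
        return Math.floorMod(i, n);
    }

    /** MAD method: [(a*i + b) mod p] mod n, computed in 64-bit arithmetic. */
    static int mad(int i, int n, long p, long a, long b) {
        long inner = Math.floorMod(a * i + b, p);   // value in [0, p)
        return (int) (inner % n);
    }
}
```

For example, with n = 11, p = 13, a = 3, and b = 5, the hash code 10 gives (3·10 + 5) mod 13 = 9, and 9 mod 11 = 9.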


Collision-Handling Schemes

• Separate Chaining
• Open Addressing
  – linear probing
  – quadratic probing
  – double hashing
  – using a pseudorandom number generator

Separate Chaining

• Each bucket A[j] stores its own secondary container (for example, a list) holding the entries whose keys hash to j
• With a good hash function, the core map operations run in O(⌈n/N⌉), where λ = n/N is called the load factor of the hash table
• As long as λ is O(1), the core operations on the hash table run in O(1) expected time

Figure: hash table of size 13, storing 10 entries
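
A compact Java sketch of separate chaining, assuming integer keys, string values, and the division method h(k) = k mod N as the hash function. The class name ChainHashTable and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical hash table with separate chaining (int keys, String values). */
class ChainHashTable {
    private static final class Entry {
        final int key;
        String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();
    private int n = 0;   // number of entries stored

    ChainHashTable(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    /** The chain for this key: bucket number h(k) = k mod N. */
    private List<Entry> bucket(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    String get(int key) {
        for (Entry e : bucket(key)) if (e.key == key) return e.value;
        return null;
    }

    /** Stores (key, value); returns the previous value for key, or null. */
    String put(int key, String value) {
        for (Entry e : bucket(key)) {
            if (e.key == key) { String old = e.value; e.value = value; return old; }
        }
        bucket(key).add(new Entry(key, value));
        n++;
        return null;
    }

    /** Removes the entry for key; returns its value, or null if absent. */
    String remove(int key) {
        Iterator<Entry> it = bucket(key).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key == key) { it.remove(); n--; return e.value; }
        }
        return null;
    }

    int size() { return n; }

    /** Load factor n/N. */
    double loadFactor() { return (double) n / buckets.size(); }
}
```

Each operation only walks the one chain selected by h(k), which is why the running time depends on the load factor rather than on the total number of entries.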
Open Addressing

• This approach saves space because no auxiliary structures are employed.
• Open addressing requires that the load factor is always at most 1 and that entries are stored directly in the cells of the bucket array itself.
Linear Probing

Example hash function: h(k) = k mod 11.

Finding an empty bucket: if A[j] is occupied, where j = h(k), next try A[(j+1) mod N]; if that is also occupied, try A[(j+2) mod N], and so on.
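
The probe sequence can be sketched in Java as below, assuming integer keys, a table in which null marks an empty bucket, and h(k) = k mod N. The class and method names are hypothetical.

```java
/** Hypothetical linear-probing insertion into an array of Integer keys. */
class LinearProbingDemo {
    /** Stores key in the first free bucket of its probe sequence; returns the slot, or -1 if the table is full. */
    static int insert(Integer[] A, int key) {
        int N = A.length;
        int j = Math.floorMod(key, N);        // j = h(k) = k mod N
        for (int i = 0; i < N; i++) {
            int slot = (j + i) % N;           // A[j], A[(j+1) mod N], A[(j+2) mod N], ...
            if (A[slot] == null) {
                A[slot] = key;
                return slot;
            }
        }
        return -1;                            // every bucket was occupied
    }
}
```

With N = 11, inserting 13, 24, and 35 (all of which hash to bucket 2) places them in buckets 2, 3, and 4, which illustrates how runs of occupied cells build up.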
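Linear Probing

• The get, put, and remove operations must be modified to support deletion
• Replace a deleted entry with a special "defunct" sentinel object, so that a later search does not stop early at the emptied cell
• Put should remember a defunct location encountered during the search for k, since it is a valid place to store the new entry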
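
A minimal Java sketch of these rules for a set of integer keys with h(k) = k mod N. Instead of a sentinel object it keeps a per-cell state array (EMPTY, FULL, DEFUNCT), which is an implementation choice made for this example; all names are hypothetical.

```java
/** Hypothetical linear-probing set of int keys with defunct markers for deleted cells. */
class LinearProbeSet {
    private static final byte EMPTY = 0, FULL = 1, DEFUNCT = 2;
    private final int[] keys;
    private final byte[] state;

    LinearProbeSet(int capacity) {
        keys = new int[capacity];
        state = new byte[capacity];   // all cells start EMPTY
    }

    /** Returns the slot holding key, or -1 if the key is absent. */
    int slotOf(int key) {
        int N = keys.length, j = Math.floorMod(key, N);
        for (int i = 0; i < N; i++) {
            int slot = (j + i) % N;
            if (state[slot] == EMPTY) return -1;                        // a never-used cell ends the search
            if (state[slot] == FULL && keys[slot] == key) return slot;
            // DEFUNCT cells are skipped and the search continues
        }
        return -1;
    }

    boolean contains(int key) { return slotOf(key) >= 0; }

    /** Adds key; returns false if it is already present or no cell is available. */
    boolean put(int key) {
        int N = keys.length, j = Math.floorMod(key, N);
        int avail = -1;                                  // first reusable cell seen so far
        for (int i = 0; i < N; i++) {
            int slot = (j + i) % N;
            if (state[slot] == FULL) {
                if (keys[slot] == key) return false;     // already stored
            } else {
                if (avail < 0) avail = slot;             // remember a defunct or empty cell
                if (state[slot] == EMPTY) break;         // key cannot appear beyond this point
            }
        }
        if (avail < 0) return false;                     // table is full
        keys[avail] = key;
        state[avail] = FULL;
        return true;
    }

    /** Marks the cell of key as DEFUNCT; returns false if the key is absent. */
    boolean remove(int key) {
        int slot = slotOf(key);
        if (slot < 0) return false;
        state[slot] = DEFUNCT;
        return true;
    }
}
```

With capacity 11, inserting 13, 24, and 35 fills cells 2, 3, and 4. After removing 24, a search for 35 still succeeds because the defunct cell 3 is skipped, and a later insertion of 46 reuses cell 3.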
Quadratic probing

Finding an empty bucket: try A[(h(k) + f(i)) mod N], for i = 0, 1, 2, . . ., where $f(i) = i^2$

• Complicates the removal operation
• Avoids the kinds of clustering patterns that occur with linear probing
• Creates its own kind of clustering, called secondary clustering
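
The quadratic probe sequence can be listed with a short Java helper. The names are hypothetical, and h(k) = k mod N is assumed as the primary hash function.

```java
/** Hypothetical helper that lists the buckets visited by quadratic probing. */
class QuadraticProbing {
    /** Returns the first count buckets probed for key: (h(k) + i*i) mod N for i = 0, 1, 2, ... */
    static int[] probeSequence(int key, int N, int count) {
        int[] seq = new int[count];
        int h = Math.floorMod(key, N);          // h(k) = k mod N
        for (int i = 0; i < count; i++) {
            seq[i] = (h + i * i) % N;
        }
        return seq;
    }
}
```

For key 13 and N = 11 the first five probes are buckets 2, 3, 6, 0, and 7. Over i = 0, ..., 10 the sequence reaches only 6 distinct buckets of the 11, so a free bucket can be missed when the table is fairly full.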
Double hashing

• Uses a secondary hash function, h′
• If bucket A[h(k)] is occupied, next try A[(h(k) + f(i)) mod N], for i = 1, 2, 3, . . ., where f(i) = i · h′(k)
• A common choice is h′(k) = q − (k mod q), for a chosen constant q < N


Double Hashing example

• h(k) = k mod 13 and h′(k) = 8 − (k mod 8)
• h(k) = k mod 7 and h′(k) = 5 − (k mod 5)

In both cases the next bucket tried is A[(h(k) + f(i)) mod N], for i = 1, 2, 3, . . ., where f(i) = i · h′(k)
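
The probe sequences for these two examples can be generated with the Java sketch below. The class and method names are hypothetical; i = 0 is included so that the first element of the result is the home bucket h(k).

```java
/** Hypothetical helper that lists the buckets visited by double hashing. */
class DoubleHashing {
    /** Returns the first count buckets probed for key, using h(k) = k mod N and h'(k) = q - (k mod q). */
    static int[] probeSequence(int key, int N, int q, int count) {
        int h = Math.floorMod(key, N);
        int step = q - Math.floorMod(key, q);   // h'(k), always between 1 and q
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (h + i * step) % N;        // i = 0 is the home bucket
        }
        return seq;
    }
}
```

For key 18 with N = 13 and q = 8, h(18) = 5 and h′(18) = 6, so the probes visit buckets 5, 11, 4, and 10. Because h′(k) is never 0, the sequence always moves away from the home bucket.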
Exercise

• Draw the 11-entry hash table that results from using the hash function h(i) = (3i + 5) mod 11 to hash the keys 12, 44, 13, 88, 23, 94, 11, 39, 20, 16, and 5, assuming collisions are handled by chaining.
