
Hash Table

Hash tables provide constant-time insertion, deletion and search by using a hash function to map keys to array indices. Collisions occur when different keys hash to the same index, and separate chaining resolves collisions by storing keys in linked lists at each index. The document discusses using a hash table to count integer frequencies, considers appropriate data structures for associating names to phone numbers, and reviews how separate chaining works by inserting example keys into a hash table and linking collided keys into buckets.

Uploaded by

Ram C. Gudavalli

Motivating Hash Tables

For a dictionary with n key-value pairs

                       insert     find       delete
Unsorted linked list   O(1)       O(n)       O(n)
Unsorted array         O(1)       O(n)       O(n)
Sorted linked list     O(n)       O(n)       O(n)
Sorted array           O(n)       O(log n)   O(n)
Balanced tree          O(log n)   O(log n)   O(log n)
Magic array            O(1)       O(1)       O(1)

Sufficient magic:
Use key to compute array index for an item in O(1) time [doable]
Have a different index for every item [magic]

11/3/16

Motivating Hash Tables


Let's say you are tasked with counting the frequency of integers in a text file. You are guaranteed that only the integers 0 through 100 will occur:
For example: 5, 7, 8, 9, 9, 5, 0, 0, 1, 12
Result (value, count): 0 2, 1 1, 5 2, 7 1, 8 1, 9 2, 12 1
What structure is appropriate?
Tree?
List?
Array?
[Figure: the counts stored in each candidate structure]
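The array option can be sketched as follows. This is a minimal illustration, not the slide author's code: the class name FrequencyCount and the method countFrequencies are hypothetical, and the sketch assumes every input value really does lie in 0 through 100.

```java
class FrequencyCount {
    // Assumes every value lies in 0..100, as the slide guarantees.
    static int[] countFrequencies(int[] values) {
        int[] counts = new int[101];      // one slot per possible integer
        for (int v : values) {
            counts[v]++;                  // the value itself is the array index
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] input = {5, 7, 8, 9, 9, 5, 0, 0, 1, 12};
        int[] counts = countFrequencies(input);
        // Print only the values that actually occurred.
        for (int value = 0; value < counts.length; value++) {
            if (counts[value] > 0) {
                System.out.println(value + " " + counts[value]);
            }
        }
    }
}
```

Because the key is used directly as the index, each update is O(1). This only works because the key space (0 through 100) is small enough to allocate in full.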

Motivating Hash Tables


Now what if we want to associate a name with a phone number?
Suppose keys are first and last names: how big is the key space?

Maybe we only care about students

Hash Tables

Aim for constant-time (i.e., O(1)) find, insert, and delete
On average under some often-reasonable assumptions

A hash table is an array of some fixed size

Basic idea:
key space (e.g., integers, strings) -> hash function: index = h(key) -> hash table (indices 0 to TableSize - 1)

Hash Tables vs. Balanced Trees

In terms of a Dictionary ADT for just insert, find, delete, hash tables and balanced trees are just different data structures
Hash tables O(1) on average (assuming we follow good practices)
Balanced trees O(log n) worst-case

Constant-time is better, right?
Yes, but you need hashing to behave (must avoid collisions)
Yes, but findMin, findMax, predecessor, and successor go from O(log n) to O(n), printSorted from O(n) to O(n log n)
Why your textbook considers this to be a different ADT
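The findMin trade-off can be illustrated with the standard library, taking java.util.HashMap as the hash table and java.util.TreeMap (a red-black tree) as the balanced tree. The class MinLookup and its method names are hypothetical, chosen only for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MinLookup {
    // Hash table: keys have no useful order, so finding the minimum scans all n keys.
    static int findMinHash(Map<Integer, String> table) {
        return Collections.min(table.keySet());
    }

    // Balanced tree: the smallest key is found by walking down the tree, O(log n).
    static int findMinTree(TreeMap<Integer, String> tree) {
        return tree.firstKey();
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int key : new int[]{41, 7, 18}) {
            table.put(key, "value" + key);
            tree.put(key, "value" + key);
        }
        System.out.println(findMinHash(table));
        System.out.println(findMinTree(tree));
    }
}
```

Both calls return the same key. The difference is the work done to find it, which is why order-based operations are usually left out of what a hash table promises.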

Hash Tables
There are m possible keys (m typically large, even
infinite)
We expect our table to have only n items
n is much less than m (often written n << m)
Many dictionaries have this property
Compiler: All possible identifiers allowed by the language vs.
those used in some file of one program
Database: All possible student names vs. students enrolled
AI: All possible chess-board configurations vs. those
considered by the current player


Hash functions
An ideal hash function:
Fast to compute
Rarely hashes two used keys to the same index
Often impossible in theory but easy in practice
Will handle collisions later

[Figure: key space (e.g., integers, strings) -> hash function: index = h(key) -> hash table (indices 0 to TableSize - 1)]

Simple Integer Hash Functions


key space K = integers
TableSize = 7
h(K) = K % 7
Insert: 7, 18, 41

Index  Key
0      7
1
2
3
4      18
5
6      41

Simple Integer Hash Functions


key space K = integers
TableSize = 10
h(K) = ??
Insert: 7, 18, 41, 34
What happens when we insert 44?

Index  Key
0
1      41
2
3
4      34
5
6
7      7
8      18
9
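A small sketch of the mapping used in these integer examples, assuming h(K) = K % TableSize and non-negative keys. The class name ModHash is hypothetical, and the sketch only reports a collision without resolving it.

```java
class ModHash {
    // h(K) = K % TableSize; assumes K is non-negative, as in the slide examples.
    static int h(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10;
        Integer[] table = new Integer[tableSize];  // null marks an empty slot
        int[] keys = {7, 18, 41, 34, 44};
        for (int key : keys) {
            int index = h(key, tableSize);
            if (table[index] == null) {
                table[index] = key;
                System.out.println(key + " -> index " + index);
            } else {
                // Two different keys mapped to the same index: a collision.
                System.out.println(key + " collides with " + table[index] + " at index " + index);
            }
        }
    }
}
```

With TableSize = 10, both 34 and 44 map to index 4, so inserting 44 needs some collision-handling strategy.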

Aside: Properties of Mod


To keep hashed values within the size of the table, we will generally do:

h(K) = function(K) % TableSize

(In the previous examples, function(K) = K.)

Useful properties of mod:

(a + b) % c = [(a % c) + (b % c)] % c
(a * b) % c = [(a % c) * (b % c)] % c
a % c = b % c  =>  (a - b) % c = 0
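These properties can be spot-checked numerically. The sketch below assumes non-negative a and b, a positive c, and values small enough that long arithmetic does not overflow. The class ModProperties and its method names are hypothetical.

```java
class ModProperties {
    // (a + b) % c == [(a % c) + (b % c)] % c
    static boolean sumProperty(long a, long b, long c) {
        return (a + b) % c == ((a % c) + (b % c)) % c;
    }

    // (a * b) % c == [(a % c) * (b % c)] % c
    static boolean productProperty(long a, long b, long c) {
        return (a * b) % c == ((a % c) * (b % c)) % c;
    }

    // Checks only the implication: equal remainders imply (a - b) % c == 0.
    static boolean differenceProperty(long a, long b, long c) {
        return a % c != b % c || (a - b) % c == 0;
    }

    public static void main(String[] args) {
        boolean allHold = true;
        for (long a = 0; a < 50; a++) {
            for (long b = 0; b < 50; b++) {
                for (long c = 1; c < 12; c++) {
                    allHold &= sumProperty(a, b, c)
                            && productProperty(a, b, c)
                            && differenceProperty(a, b, c);
                }
            }
        }
        System.out.println("all properties hold: " + allHold);
    }
}
```

The sum and product properties matter in practice because they let a hash function reduce intermediate results with % as it goes, keeping the running value small.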

Designing Hash Functions


Often based on modular hashing:

h(K) = f(K) % P

P is typically the TableSize
P is often chosen to be prime:
Reduces likelihood of collisions due to patterns in data
Is useful for guarantees on certain hashing strategies (as we'll see)

Equivalent objects MUST hash to the same location

Some String Hash Functions

key space = strings
$K = s_0 s_1 s_2 \ldots s_{m-1}$ (where the $s_i$ are chars: $s_i \in [0, 128]$)

1. $h(K) = s_0 \,\%\, \text{TableSize}$
   H("batman") = H("ballgame")

2. $h(K) = \left(\sum_{i=0}^{m-1} s_i\right) \%\, \text{TableSize}$
   H("spot") = H("pots")

3. $h(K) = \left(\sum_{i=0}^{m-1} s_i \cdot 37^i\right) \%\, \text{TableSize}$
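The three string hash functions can be sketched in Java as follows. The class StringHashes and the names h1, h2, h3 are hypothetical. The sketch assumes a non-empty key and a positive table size, and it applies % at every step (using the mod properties) so intermediate values stay small.

```java
class StringHashes {
    // 1. First character only: every string starting with the same letter collides.
    static int h1(String key, int tableSize) {
        return key.charAt(0) % tableSize;
    }

    // 2. Sum of all characters: anagrams always collide.
    static int h2(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum = (sum + key.charAt(i)) % tableSize;
        }
        return sum;
    }

    // 3. Characters weighted by powers of 37: position now affects the result.
    static int h3(String key, int tableSize) {
        long hash = 0;
        long power = 1;  // holds 37^i % tableSize
        for (int i = 0; i < key.length(); i++) {
            hash = (hash + key.charAt(i) * power) % tableSize;
            power = (power * 37) % tableSize;
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        int tableSize = 101;  // an arbitrary prime chosen for this sketch
        System.out.println(h1("batman", tableSize) == h1("ballgame", tableSize));
        System.out.println(h2("spot", tableSize) == h2("pots", tableSize));
        System.out.println(h3("spot", tableSize) == h3("pots", tableSize));
    }
}
```

With a table size of 101, the first two comparisons are true (collisions) and the third is false: the positional weights separate "spot" from "pots".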

What to hash?
We will focus on the two most common things to hash: ints and strings
For objects with several fields, usually best to have most of the identifying fields contribute to the hash to avoid collisions
Example:
class Person {
    String first; String middle; String last;
    Date birthdate;
}
An inherent trade-off: hashing-time vs. collision-avoidance
Bad idea(?): Use only first name
Good idea(?): Use only middle initial? Combination of fields?
Admittedly, what-to-hash-with is often unprincipled
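One possible way to let all identifying fields contribute is sketched below. The constructor and the choice of java.util.Objects.hash are assumptions made for illustration and are not taken from the slides. equals is overridden together with hashCode so that equivalent objects hash to the same value.

```java
import java.util.Date;
import java.util.Objects;

class Person {
    String first; String middle; String last;
    Date birthdate;

    // Hypothetical constructor, added only so the sketch can build instances.
    Person(String first, String middle, String last, Date birthdate) {
        this.first = first;
        this.middle = middle;
        this.last = last;
        this.birthdate = birthdate;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person p = (Person) other;
        return Objects.equals(first, p.first)
                && Objects.equals(middle, p.middle)
                && Objects.equals(last, p.last)
                && Objects.equals(birthdate, p.birthdate);
    }

    // Every field compared in equals also feeds the hash.
    @Override
    public int hashCode() {
        return Objects.hash(first, middle, last, birthdate);
    }

    public static void main(String[] args) {
        Person a = new Person("Pat", "Q", "Smith", new Date(0L));
        Person b = new Person("Pat", "Q", "Smith", new Date(0L));
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Hashing four fields costs more than hashing one, which is the hashing-time versus collision-avoidance trade-off named on the slide.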

Deep Breath
Recap


Hash Tables: Review


Aim for constant-time (i.e., O(1)) find, insert, and delete
On average under some reasonable assumptions

A hash table is an array of some fixed size
But growable, as we'll see

[Figure: the client maps a key of type E to an int; the hash table library maps the int to a table index (0 to TableSize - 1) and applies collision resolution on a collision]

Collision resolution
Collision:
When two keys map to the same location in the hash table
We try to avoid it, but number-of-keys exceeds table size
So hash tables should support collision resolution
Ideas?

Separate Chaining

Chaining:
All keys that map to the same table location are kept in a list (a.k.a. a "chain" or "bucket")
As easy as it sounds

Example:
insert 10, 22, 107, 12, 42 with mod hashing and TableSize = 10

After insert 10:   bucket 0: 10 /
After insert 22:   bucket 2: 22 /
After insert 107:  bucket 7: 107 /
After insert 12:   bucket 2: 12 -> 22 /
After insert 42:   bucket 2: 42 -> 12 -> 22 /

Final table ("/" marks the end of a chain; a bare "/" is an empty bucket):
0: 10 /
1: /
2: 42 -> 12 -> 22 /
3: /
4: /
5: /
6: /
7: 107 /
8: /
9: /
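The chaining example can be reproduced with a minimal sketch like the one below. The class ChainedHashSet and its methods are hypothetical. It stores integer keys only, uses mod hashing, and inserts at the front of each chain to match the order shown in the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());   // every bucket starts as an empty chain
        }
    }

    // Mod hashing; floorMod keeps the index non-negative even for negative keys.
    private int h(int key) {
        return Math.floorMod(key, buckets.size());
    }

    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(h(key));
        if (!chain.contains(key)) {
            chain.addFirst(key);               // new keys go to the front of the chain
        }
    }

    boolean find(int key) {
        return buckets.get(h(key)).contains(key);
    }

    // Exposes one chain so the example can be inspected.
    List<Integer> chain(int index) {
        return buckets.get(index);
    }

    public static void main(String[] args) {
        ChainedHashSet table = new ChainedHashSet(10);
        for (int key : new int[]{10, 22, 107, 12, 42}) {
            table.insert(key);
        }
        System.out.println(table.chain(2));   // prints [42, 12, 22]
        System.out.println(table.find(107));  // prints true
    }
}
```

Inserting at the front avoids walking to the end of the chain, although this sketch still scans the chain once to skip duplicates.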

More rigorous chaining analysis

Definition: The load factor, $\lambda$, of a hash table is

$\lambda = \frac{N}{\text{TableSize}}$, where $N$ is the number of elements

Under chaining, the average number of elements per bucket is $\lambda$
So if some inserts are followed by random finds, then on average:
Each unsuccessful find compares against $\lambda$ items

So we like to keep $\lambda$ fairly low (e.g., 1 or 1.5 or 2) for chaining
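As a quick worked instance, take the separate chaining example from the earlier slides: five keys (10, 22, 107, 12, 42) in a table of size 10. The load factor is then:

```latex
\lambda = \frac{N}{\text{TableSize}} = \frac{5}{10} = 0.5
```

Assuming every bucket is equally likely to be probed, an unsuccessful find in that table compares against 0.5 items on average, even though one bucket holds a chain of three keys.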
