1/8/23

Hashing

Lecture No. 8

Contents

1 What is Hashing?

2 Hash Function

3 Collisions Reduction


What is Hashing?


Introduction
• Hashing is a useful searching technique that can be used to implement indexes.
• The main motivation for hashing is to improve search time.
• The search time for hashing compares with that of other methods as follows:
  - Simple indexes (using binary search): O(log_2 N)
  - B-trees and B+ trees: O(log_k N)
  - Hashing: O(1) on average


What is Hashing?
• The idea is to discover the location of a key simply by examining the key. To do that, we need to design a hash function.
• A hash function is a function h(k) that transforms a key k into an address.
• An address space is chosen beforehand. For example, we may decide that the file will have 1,000 available addresses.
• If U is the set of all possible keys, the hash function maps U to {0, 1, ..., 999}, that is, h : U → {0, 1, ..., 999}.

Example

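Home address = last three digits of the product of the ASCII codes of the first two letters of the name:

NAME      ASCII CODES (FIRST TWO LETTERS)   PRODUCT           HOME ADDRESS
BALL      66  65                            66 × 65 = 4290    290
LOWELL    76  79                            76 × 79 = 6004    004
TREE      84  82                            84 × 82 = 6888    888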
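A minimal Python sketch of the hash used in this example. The function name `first_two_letters_hash` is hypothetical, and taking the product modulo 1,000 (its last three digits) is an assumption that matches the three rows above and the 1,000-address space.

```python
def first_two_letters_hash(name: str, address_space: int = 1000) -> int:
    # Multiply the ASCII codes of the first two letters and keep the
    # last three digits (product mod 1000) as the home address.
    product = ord(name[0]) * ord(name[1])
    return product % address_space


for name in ("BALL", "LOWELL", "TREE"):
    # Print with three digits so that 4 appears as 004, as on the slide.
    print(name, f"{first_two_letters_hash(name):03d}")
```

Calling it on LOCK or OLIVER also returns 4, because their first two letters give the same product as LOWELL.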


What is Hashing?

Each record is stored at its home address (RRN = relative record number):

RRN   File
000
001
⁞     ⁞
004   LOWELL
⁞     ⁞
290   BALL
⁞     ⁞
888   TREE
⁞     ⁞
999


What is Hashing?
• There is no obvious connection between the key and its location (the hash function "randomizes" the keys).
• Two different keys may be sent to the same address, generating a collision.
• Can you give an example of a collision for the hash function in the previous example?


What is Hashing?
• LOWELL, LOCK, OLIVER, and any other word whose first two letters are L and O (in either order, since the product of the two codes is the same) will be mapped to the same address:
  h(LOWELL) = h(LOCK) = h(OLIVER) = 004
• These keys are called synonyms. The address 004 is said to be the home address of each of these keys.
• Avoiding collisions entirely is extremely difficult, so we need techniques for dealing with them.


Reducing Collisions
1. Spread out the records by choosing a good hash function.
2. Use extra memory: increase the size of the address space (for example, reserve 5,000 available addresses rather than 1,000).
3. Put more than one record at a single address: use buckets.


Hash Function


A Simple Hash Function

• To compute this hash function, apply three steps:
  Step 1: Transform the key into a number.
  Step 2: Fold and add (chop off pieces of the number and add them together), taking the running sum modulo a prime number so that it stays small.
  Step 3: Divide by the size of the address space (preferably a prime number) and use the remainder as the address.
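The three steps can be sketched in Python as follows. The concrete choices are assumptions for illustration, not taken from the slides: keys consist of upper-case letters, digits, and spaces, padded to 12 characters; each pair of characters becomes one four-digit piece built from their two-digit ASCII codes; the fold-and-add step uses the prime 19937; and the address space has 101 addresses. The function name is hypothetical.

```python
def fold_and_add_hash(key: str, num_addresses: int = 101,
                      key_len: int = 12, fold_prime: int = 19937) -> int:
    # Assumption: every character has a two-digit ASCII code
    # (upper-case letters, digits, space); longer keys are truncated.
    assert key_len % 2 == 0, "key_len must be even to fold in pairs"
    # Step 1: transform the key into numbers (pad to a fixed length first).
    padded = key.upper().ljust(key_len)[:key_len]
    codes = [ord(ch) for ch in padded]
    # Step 2: fold and add. Each pair of codes becomes one 4-digit piece,
    # e.g. 'L', 'O' -> 76, 79 -> 7679. The running sum is reduced
    # modulo a prime after every addition to keep it small.
    total = 0
    for i in range(0, key_len, 2):
        piece = codes[i] * 100 + codes[i + 1]
        total = (total + piece) % fold_prime
    # Step 3: divide by the address-space size and keep the remainder.
    return total % num_addresses


print(fold_and_add_hash("LOWELL"))  # 46
```

For LOWELL the pieces are 7679, 8769, 7676, 3232, 3232, 3232; the reduced sum is 13883, and 13883 mod 101 = 46.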


Distribution of Records among Addresses


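• There are three possibilities: a uniform distribution (best case: no synonyms), a random distribution (acceptable: a few synonyms), and the worst case, in which all keys are synonyms of one another.
• Uniform distributions are extremely rare.
• Random distributions are acceptable and more easily obtained.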

Better than Random Distribution


• Examine keys for patterns
  - Example: numerical keys that are naturally spread out, such as years between 1970 and 2004:
    f(year) = (year - 1970) mod (2004 - 1970 + 1)
    f(1970) = 0, f(1971) = 1, ..., f(2004) = 34
• Fold parts of the key
  - Folding means extracting digits from a key and adding the parts together, as in the previous example.
  - If the keys have a natural separation, folding may, in some cases, preserve it.
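A small sketch of both ideas: the year function above, and a simple digit-folding function. The function names and the choice to fold into two-digit pieces are illustrative assumptions.

```python
def year_hash(year: int, first: int = 1970, last: int = 2004) -> int:
    # Subtracting the first year exploits the keys' natural spread:
    # each year in [first, last] gets its own address, 0 .. last - first.
    return (year - first) % (last - first + 1)


def fold_digits(key: int, piece_digits: int = 2) -> int:
    # Folding: chop the decimal digits into fixed-size pieces and add them,
    # e.g. 123456 -> 12 + 34 + 56 = 102.
    digits = str(key)
    # Left-pad with zeros so the length is a multiple of piece_digits.
    width = -(-len(digits) // piece_digits) * piece_digits
    digits = digits.zfill(width)
    return sum(int(digits[i:i + piece_digits])
               for i in range(0, width, piece_digits))


print(year_hash(1970), year_hash(1971), year_hash(2004))  # 0 1 34
print(fold_digits(123456))                                 # 102
```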


Better than Random Distribution


• Use a prime number when dividing the key
  - Dividing by a number works well when the keys form sequences of consecutive numbers, since consecutive keys then map to consecutive addresses.
  - If there are many different sequences of consecutive numbers, dividing by a number that has many small factors may result in lots of collisions. A prime number is a better choice.
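The effect of the divisor can be checked with a small experiment. The key set is a hypothetical one chosen for illustration: 100 keys that are all multiples of 10. Dividing by 100, which shares the factors 2 and 5 with every key, uses only 10 addresses, while dividing by the prime 101 gives every key its own address.

```python
def distinct_addresses(keys, divisor: int) -> int:
    # Count how many different addresses (key mod divisor) the keys use.
    return len({k % divisor for k in keys})


keys = [10 * i for i in range(100)]  # 0, 10, 20, ..., 990

print(distinct_addresses(keys, 100))  # 10: heavy collisions
print(distinct_addresses(keys, 101))  # 100: no collisions
```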


Randomization
• When there is no natural separation between keys, try randomization.
• You can use the following hash functions:
1. Square the key and take the middle digits (the mid-square method)
   Example:
   key = 453, 453² = 205209
   Extract the middle two digits: 52
   This address is between 00 and 99.
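A minimal sketch of the mid-square method, with a hypothetical function name. When the square has an odd number of digits, "the middle" is ambiguous; this sketch assumes the window that starts just left of center.

```python
def mid_square_hash(key: int, num_digits: int = 2) -> int:
    # Square the key and take num_digits digits from the middle.
    squared = str(key * key).zfill(num_digits)
    start = (len(squared) - num_digits) // 2
    return int(squared[start:start + num_digits])


print(mid_square_hash(453))  # 453^2 = 205209 -> middle digits "52" -> 52
```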


Randomization

2. Radix transformation:
   Transform the number into another base, then divide by the maximum address and keep the remainder.
   Example:
   Addresses from 0 to 99
   key = 453 in base 11 = 382 (since 3 × 11² + 8 × 11 + 2 = 453)
   hash address = 382 mod 99 = 85
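A sketch of the radix transformation, under the assumption (matching the example) that the base-11 digits are read back as an ordinary decimal numeral. The slides do not say how a base-11 digit of value 10 should be handled; here it simply contributes 10 times its place value. The function name is hypothetical.

```python
def radix_hash(key: int, base: int = 11, max_address: int = 99) -> int:
    # Write the key in the new base (least significant digit first).
    digits = []
    while key > 0:
        key, remainder = divmod(key, base)
        digits.append(remainder)
    # Read those digits back as a decimal number: 453 -> 3, 8, 2 -> 382.
    value = 0
    for d in reversed(digits):
        # Assumption: a digit of value 10 is added as 10 x place value.
        value = value * 10 + d
    return value % max_address


print(radix_hash(453))  # 382 mod 99 = 85
```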


Collisions Reduction


Collision Resolution: Progressive Overflow


• Progressive overflow (linear probing) works as follows:

1. Insertion of key k:
  - Go to the home address of k: h(k).
  - If it is free, place the key there.
  - If it is occupied, try the next position until an empty position is found (the "next" position after the last one is position 0, i.e., wrap around).


Collision Resolution: Progressive Overflow


• Example: complete a table of size 23 (addresses 0 to 22) by inserting the following keys in order:

Key K    Home address h(k)
COLE     20
BATES    21
ADAMS    21
DEAN     22
EVANS    20


Collision Resolution: Progressive Overflow


• Example: the completed table (table size = 23) for the keys COLE (h = 20), BATES (21), ADAMS (21), DEAN (22), EVANS (20):

Address   Record
0         DEAN
1         EVANS
2
⁞         ⁞
19
20        COLE
21        BATES
22        ADAMS
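A minimal sketch of progressive-overflow insertion that reproduces this table. The home addresses are taken from the example rather than computed, `None` marks an empty slot, and the function name `probe_insert` is hypothetical. Bounding the loop by the table size means a full table raises an error instead of looping forever.

```python
from typing import List, Optional


def probe_insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Insert key using progressive overflow (linear probing) and
    return the address where it was stored."""
    size = len(table)
    for step in range(size):              # visit each address at most once
        address = (home + step) % size    # wrap around after the last one
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


table: List[Optional[str]] = [None] * 23
for key, home in [("COLE", 20), ("BATES", 21), ("ADAMS", 21),
                  ("DEAN", 22), ("EVANS", 20)]:
    probe_insert(table, key, home)

print({addr: k for addr, k in enumerate(table) if k is not None})
# {0: 'DEAN', 1: 'EVANS', 20: 'COLE', 21: 'BATES', 22: 'ADAMS'}
```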


Collision Resolution: Progressive Overflow


2. Searching for key k:
  - Go to the home address of k: h(k).
  - If k is at the home address, we are done.
  - Otherwise, try the next position until the key is found, an empty position is found, or the home address is reached again (in the last two cases, the key is not in the table).


Collision Resolution: Progressive Overflow


• Example (completed table, size 23: 0 DEAN, 1 EVANS, 20 COLE, 21 BATES, 22 ADAMS; all other positions empty):
  - A search for ‘EVANS’ probes positions 20, 21, 22, 0, 1, finding the record at position 1.
  - A search for ‘MOURA’, if h(MOURA) = 22, probes positions 22, 0, 1, 2, where it concludes that ‘MOURA’ is not in the table.
  - A search for ‘SMITH’, if h(SMITH) = 19, probes position 19 and concludes that ‘SMITH’ is not in the table.
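The three searches can be reproduced with a short sketch. The home addresses for MOURA and SMITH are the hypothetical values given in the example, and the function name `probe_search` is hypothetical; it returns the address found (or `None`) together with the list of probed positions.

```python
from typing import List, Optional, Tuple


def probe_search(table: List[Optional[str]], key: str,
                 home: int) -> Tuple[Optional[int], List[int]]:
    """Search with progressive overflow; return (address or None, probes)."""
    size = len(table)
    probed = []
    for step in range(size):
        address = (home + step) % size
        probed.append(address)
        if table[address] == key:
            return address, probed        # key found
        if table[address] is None:
            return None, probed           # empty slot: key is absent
    return None, probed                   # back at home address: absent


table: List[Optional[str]] = [None] * 23
table[0], table[1] = "DEAN", "EVANS"
table[20], table[21], table[22] = "COLE", "BATES", "ADAMS"

print(probe_search(table, "EVANS", 20))  # (1, [20, 21, 22, 0, 1])
print(probe_search(table, "MOURA", 22))  # (None, [22, 0, 1, 2])
print(probe_search(table, "SMITH", 19))  # (None, [19])
```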


Collision Resolution: Progressive Overflow


• Advantage:
  - Simplicity
• Disadvantage:
  - If there are many collisions, records cluster together and search lengths grow, as in the previous example, where finding ‘EVANS’ took five probes.


Collision Resolution: Progressive Overflow


• Search length:
  - The number of accesses required to retrieve a record.


Collision Resolution: Progressive Overflow


• Example (table size = 23):

Key K    h(k)   Stored at   Search length
COLE     20     20          1
BATES    21     21          1
ADAMS    21     22          2
DEAN     22     0           2
EVANS    20     1           5

Average search length = (1 + 1 + 2 + 2 + 5) / 5 = 2.2
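Under progressive overflow, a record stored d positions past its home address (counting the wrap-around) needs d + 1 accesses. A short sketch that recomputes the average above from each key's home address and final position; the helper name `search_length` is hypothetical.

```python
def search_length(home: int, address: int, table_size: int) -> int:
    # Probes from the home address to where the record sits, counting
    # the wrap-around from the last address back to 0.
    return (address - home) % table_size + 1


records = {  # key: (home address, address where it was stored)
    "COLE": (20, 20), "BATES": (21, 21), "ADAMS": (21, 22),
    "DEAN": (22, 0), "EVANS": (20, 1),
}
lengths = {k: search_length(h, a, 23) for k, (h, a) in records.items()}
average = sum(lengths.values()) / len(lengths)
print(lengths)   # {'COLE': 1, 'BATES': 1, 'ADAMS': 2, 'DEAN': 2, 'EVANS': 5}
print(average)   # 2.2
```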


Hashing with Buckets


• This is a variation of hashed files in which more than one record/key is stored per hash address.
• Bucket = block of records corresponding to one address in the hash table.
• The hash function gives the bucket address.


Hashing with Buckets


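• Example:
  - For buckets holding 3 records each, insert the following keys in order and complete the table:

Key K    Home address h(k)
LOYD     34
KING     33
LAND     33
MARX     33
MUTT     33
PLUM     34
REES     34

Partially completed table (after LOYD, KING, LAND, and MARX):

Address   Bucket contents
0
⁞         ⁞
33        KING, LAND, MARX
34        LOYD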


Hashing with Buckets


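• Example (continued): the completed table. Bucket 33 is already full when MUTT arrives, so MUTT goes to the next bucket, 34; bucket 34 is full when REES arrives, so REES wraps around to bucket 0.

Address   Bucket contents
0         REES
⁞         ⁞
33        KING, LAND, MARX
34        LOYD, MUTT, PLUM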
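A sketch of bucket insertion with progressive overflow between buckets. It assumes 35 bucket addresses (0 to 34), which is what the wrap-around of REES to address 0 implies; each bucket is a Python list whose length plays the role of the bucket counter. The function name is hypothetical.

```python
from typing import List


def bucket_insert(buckets: List[List[str]], key: str, home: int,
                  capacity: int = 3) -> int:
    """Store key in the first non-full bucket, starting at its home bucket
    and moving forward with wrap-around. Returns the bucket address."""
    n = len(buckets)
    for step in range(n):                 # bounded: no infinite loop when full
        address = (home + step) % n
        if len(buckets[address]) < capacity:
            buckets[address].append(key)
            return address
    raise OverflowError("every bucket is full")


buckets: List[List[str]] = [[] for _ in range(35)]   # addresses 0..34
for key, home in [("LOYD", 34), ("KING", 33), ("LAND", 33), ("MARX", 33),
                  ("MUTT", 33), ("PLUM", 34), ("REES", 34)]:
    bucket_insert(buckets, key, home)

print(buckets[33], buckets[34], buckets[0])
# ['KING', 'LAND', 'MARX'] ['LOYD', 'MUTT', 'PLUM'] ['REES']
```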


Hashing with Buckets: Implementation issues


1. Bucket structure:
  - A bucket should contain a counter that keeps track of the number of records stored in it.
  - Empty slots in a bucket may be marked with slashes ('/////').
  - Example: a bucket of size 3 holding 2 records:
    2 | JONES | ///// | ARNSWORTH


Hashing with Buckets: Implementation issues


2. Initializing a file for hashing:
  - Decide on the logical size (number of available addresses) and on the number of records per bucket.
  - Create a file of empty buckets before storing records.
  - An empty bucket will look like:
    0 | ///// | ///// | /////
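A sketch of initializing a file of empty buckets with Python's `struct` module. The record layout is an assumption for illustration: a 4-byte little-endian counter followed by three fixed-width 10-byte key slots, with empty slots filled with '/'. An `io.BytesIO` object stands in for a real binary file, and the function names are hypothetical.

```python
import io
import struct

KEY_LEN = 10                 # assumed fixed key width in bytes
BUCKET_SIZE = 3              # records per bucket
EMPTY = b"/" * KEY_LEN       # marker for an empty slot
# Assumed layout: 4-byte counter, then BUCKET_SIZE fixed-width key slots.
BUCKET_FORMAT = "<i" + f"{KEY_LEN}s" * BUCKET_SIZE
BUCKET_BYTES = struct.calcsize(BUCKET_FORMAT)


def init_hash_file(f, num_addresses: int) -> None:
    """Write num_addresses empty buckets: counter 0, every slot '/'."""
    empty_bucket = struct.pack(BUCKET_FORMAT, 0, *([EMPTY] * BUCKET_SIZE))
    for _ in range(num_addresses):
        f.write(empty_bucket)


def read_bucket(f, address: int):
    """Return (counter, list of raw slots) for the bucket at address."""
    f.seek(address * BUCKET_BYTES)       # fixed-size records: direct access
    count, *slots = struct.unpack(BUCKET_FORMAT, f.read(BUCKET_BYTES))
    return count, slots


hash_file = io.BytesIO()
init_hash_file(hash_file, 100)
print(len(hash_file.getvalue()))    # 100 buckets x 34 bytes = 3400
print(read_bucket(hash_file, 42)[0])  # 0
```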


Hashing with Buckets: Implementation issues


3. Loading a hash file:
  - When inserting a key, remember to:
    - Be careful with infinite loops when the hash file is full.
    - Create a file of empty buckets before storing records.


Making Deletions
• Deletions in a hashed file have to be made with care:

Hashed file using progressive overflow

Key K    Home address h(k)
ADAMS    5
JONES    6
MORRIS   6
SMITH    5

Address   Record
⁞         ⁞
4         /////
5         ADAMS
6         JONES
7         MORRIS
8         SMITH
⁞         ⁞


Making Deletions: Delete ‘MORRIS’


• If ‘MORRIS’ is simply erased, a search for ‘SMITH’ will be unsuccessful:

Before            After erasing ‘MORRIS’
⁞  ⁞              ⁞  ⁞
4  /////          4  /////   (empty slot)
5  ADAMS          5  ADAMS
6  JONES          6  JONES
7  MORRIS         7  /////   (empty slot)
8  SMITH          8  SMITH
⁞  ⁞              ⁞  ⁞

• A search for ‘SMITH’ would start at its home address (position 5) and, on reaching the empty position 7, would conclude that ‘SMITH’ is not in the file!


Making Deletions: Delete ‘MORRIS’


• Replace deleted records with a marker (a tombstone) indicating that a record once lived there:

⁞  ⁞
4  /////        Empty slot
5  ADAMS
6  JONES
7  #########    Deleted slot (tombstone)
8  SMITH        ‘SMITH’ can now be found
⁞  ⁞

• A search must continue when it finds a tombstone, but can stop whenever an empty slot is found.

Be Careful When Deleting and Adding a Record

• Only insert a tombstone when the next position is occupied or is itself a tombstone; if the next position is empty, the deleted slot can simply be marked empty.
• Insertions should be modified to work with tombstones: if either an empty slot or a tombstone is reached, place the new record there.
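A sketch of tombstone-aware search, deletion, and insertion for a progressive-overflow table. The table size of 11 is an assumption, `None` marks a never-used slot, a unique `object()` serves as the tombstone, and BRUCE is a hypothetical extra key. This insertion reuses the first empty slot or tombstone it reaches; it does not check whether the same key already appears later in the probe sequence.

```python
EMPTY = None           # never-used slot: searches may stop here
TOMBSTONE = object()   # deleted slot: searches must continue past it


def search(table, key, home):
    n = len(table)
    for step in range(n):
        address = (home + step) % n
        if table[address] is EMPTY:
            return None                   # stop at a truly empty slot
        if table[address] == key:
            return address
        # a tombstone or a different key: keep probing
    return None


def delete(table, key, home):
    address = search(table, key, home)
    if address is None:
        return False
    following = table[(address + 1) % len(table)]
    # Leave a tombstone only if the next slot is occupied or a tombstone;
    # if it is empty, no search needs to pass through this slot.
    table[address] = TOMBSTONE if following is not EMPTY else EMPTY
    return True


def insert(table, key, home):
    n = len(table)
    for step in range(n):
        address = (home + step) % n
        if table[address] is EMPTY or table[address] is TOMBSTONE:
            table[address] = key          # reuse empty or deleted slots
            return address
    raise OverflowError("hash table is full")


table = [EMPTY] * 11
for key, home in [("ADAMS", 5), ("JONES", 6), ("MORRIS", 6), ("SMITH", 5)]:
    insert(table, key, home)
delete(table, "MORRIS", 6)         # slot 7 becomes a tombstone
print(search(table, "SMITH", 5))   # 8: the search probes past the tombstone
print(insert(table, "BRUCE", 6))   # 7: the tombstone is reused
```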


Effects of Deletions and Additions on Performance


• The presence of too many tombstones increases search length.
• Solutions to the problem of deteriorating average search lengths:
1. The deletion algorithm may try to move records that follow a tombstone backwards towards their home addresses.
2. Complete reorganization: rehashing.
3. Use a different type of collision resolution technique.
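One way to picture complete reorganization is to rebuild the table from its live records only. This sketch assumes each key's home address is known (the `homes` dictionary is hypothetical) and re-inserts the live records in address order with progressive overflow; the function name is also hypothetical.

```python
TOMBSTONE = object()   # marks a deleted slot


def rehash(table, homes):
    """Rebuild a progressive-overflow table without tombstones."""
    n = len(table)
    fresh = [None] * n
    live_keys = [k for k in table if k is not None and k is not TOMBSTONE]
    for key in live_keys:
        for step in range(n):
            address = (homes[key] + step) % n
            if fresh[address] is None:
                fresh[address] = key
                break
    return fresh


homes = {"ADAMS": 5, "JONES": 6, "SMITH": 5}
old = [None] * 11
old[5], old[6], old[7], old[8] = "ADAMS", "JONES", TOMBSTONE, "SMITH"
new = rehash(old, homes)
print(new[5:9])   # ['ADAMS', 'JONES', 'SMITH', None]
```

After rehashing, SMITH sits at position 7, so its search length drops from 4 to 3.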


Other Collision Resolution Techniques


1. Double hashing:
  - The first hash function, h1, determines the home address.
  - If the home address is occupied, apply a second hash function, h2, to get a number c, where c is relatively prime to N, the table size (this guarantees that the probe sequence eventually visits every address).
  - c is added to the home address to produce an overflow address; if that is occupied, keep adding c to the overflow address (modulo N) until an empty spot is found.


Other Collision Resolution Techniques


Hashed file using double hashing (table size N = 11):

Key K    h1(k) (home address)   h2(k) = c
ADAMS    5                      2
JONES    6                      3
MORRIS   6                      4
SMITH    5                      3

Address   Record
0
1
2
3
4
5         ADAMS
6         JONES
7
8         SMITH
9
10        MORRIS


Other Collision Resolution Techniques


• Suppose the above table is full (every address from 0 to 10 is occupied) and that a key k has h1(k) = 6 and h2(k) = 3.
• In what order would the addresses be probed when trying to insert k?
• Answer: 6, 9, 1, 4, 7, 10, 2, 5, 8, 0, 3 (each step adds 3 modulo 11; because 3 and 11 are relatively prime, every address is probed exactly once).
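A sketch of double hashing over the same four keys. The h1 and h2 values are taken from the table above rather than computed, the table size 11 is read off the example, and the function names are hypothetical.

```python
def probe_sequence(h1: int, c: int, n: int):
    # Addresses probed by double hashing: h1, h1 + c, h1 + 2c, ... (mod n).
    # When c is relatively prime to n, each address appears exactly once.
    return [(h1 + i * c) % n for i in range(n)]


def double_hash_insert(table, key, h1, h2):
    for address in probe_sequence(h1, h2, len(table)):
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


table = [None] * 11
for key, h1, h2 in [("ADAMS", 5, 2), ("JONES", 6, 3),
                    ("MORRIS", 6, 4), ("SMITH", 5, 3)]:
    double_hash_insert(table, key, h1, h2)

print(table.index("MORRIS"), table.index("SMITH"))  # 10 8
print(probe_sequence(6, 3, 11))  # [6, 9, 1, 4, 7, 10, 2, 5, 8, 0, 3]
```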


Other Collision Resolution Techniques


2. Chained progressive overflow:
  - Similar to progressive overflow, except that synonyms are linked together with pointers.
  - The objective is to reduce the search length for records within clusters.


Other Collision Resolution Techniques


• Example (search lengths):

Key K    Home address   Progressive overflow   Chained progressive overflow
ADAMS    20             1                      1
BATES    21             1                      1
COLES    20             3                      2
DEAN     21             3                      2
EVANS    24             1                      1
FLINT    20             6                      3
Average search length   2.5                    1.7


Other Collision Resolution Techniques


• Example:

         Progressive overflow    Chained progressive overflow
Address  data                    data      next
⁞        ⁞                       ⁞         ⁞
20       ADAMS                   ADAMS     22
21       BATES                   BATES     23
22       COLES                   COLES     25
23       DEAN                    DEAN      -1
24       EVANS                   EVANS     -1
25       FLINT                   FLINT     -1
⁞        ⁞                       ⁞         ⁞

(next = -1 marks the end of a chain of synonyms.)
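A sketch of chained progressive overflow that reproduces the example. It relies on a simplifying assumption that holds here: whenever a key's home slot is occupied, it holds one of that key's synonyms, so the new record can be appended to that chain. The table size of 30 and the function names are hypothetical.

```python
def chained_insert(data, next_ptr, key, home):
    n = len(data)
    if data[home] is None:
        data[home] = key
        return home
    # Walk the chain of synonyms to its last record.
    last = home
    while next_ptr[last] != -1:
        last = next_ptr[last]
    # Find a free slot by linear probing from the home address.
    for step in range(1, n):
        address = (home + step) % n
        if data[address] is None:
            data[address] = key
            next_ptr[last] = address      # link it into the chain
            return address
    raise OverflowError("hash table is full")


def chained_search_length(data, next_ptr, key, home):
    # Number of records visited when following the chain from home.
    length, address = 1, home
    while data[address] != key:
        address = next_ptr[address]
        if address == -1:
            raise KeyError(key)
        length += 1
    return length


size = 30
data, next_ptr = [None] * size, [-1] * size
keys = [("ADAMS", 20), ("BATES", 21), ("COLES", 20),
        ("DEAN", 21), ("EVANS", 24), ("FLINT", 20)]
for key, home in keys:
    chained_insert(data, next_ptr, key, home)

print(next_ptr[20:26])  # [22, 23, 25, -1, -1, -1]
lengths = [chained_search_length(data, next_ptr, k, h) for k, h in keys]
print(lengths, round(sum(lengths) / len(lengths), 1))  # [1, 1, 2, 2, 1, 3] 1.7
```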
⁞ ⁞ ⁞ ⁞ ⁞


Other Collision Resolution Techniques


3. Chained with a separate overflow area:
  - Move overflow records to a separate overflow area.
  - A linked list of synonyms starts at their home address in the primary data area and continues in the separate overflow area.


Other Collision Resolution Techniques


• Example:

Primary data area           Overflow area
Address  data    next       Address  data    next
⁞        ⁞       ⁞          ⁞        ⁞       ⁞
20       ADAMS   0          0        COLES   2
21       BATES   1          1        DEAN    -1
22                          2        FLINT   -1
23                          3
24       EVANS   -1         ⁞        ⁞       ⁞
25
⁞        ⁞       ⁞
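A sketch of chaining with a separate overflow area that reproduces the example. Primary slots hold [key, next], where next is an index into the overflow area (-1 ends the chain); the overflow area simply grows as synonyms arrive. The class name, the table size of 30, and the key GRAY used in the test are hypothetical.

```python
class SeparateOverflowFile:
    def __init__(self, num_addresses: int):
        self.primary = [[None, -1] for _ in range(num_addresses)]
        self.overflow = []            # records that did not fit at home

    def insert(self, key: str, home: int) -> None:
        slot = self.primary[home]
        if slot[0] is None:
            slot[0] = key             # home address is free
            return
        self.overflow.append([key, -1])
        new_index = len(self.overflow) - 1
        # Walk to the end of the synonym chain and link the new record.
        node = slot
        while node[1] != -1:
            node = self.overflow[node[1]]
        node[1] = new_index

    def search(self, key: str, home: int) -> bool:
        node = self.primary[home]
        if node[0] is None:
            return False
        while True:
            if node[0] == key:
                return True
            if node[1] == -1:
                return False
            node = self.overflow[node[1]]


hashed_file = SeparateOverflowFile(30)
for key, home in [("ADAMS", 20), ("BATES", 21), ("COLES", 20),
                  ("DEAN", 21), ("EVANS", 24), ("FLINT", 20)]:
    hashed_file.insert(key, home)

print(hashed_file.primary[20], hashed_file.primary[21], hashed_file.primary[24])
# ['ADAMS', 0] ['BATES', 1] ['EVANS', -1]
print(hashed_file.overflow)
# [['COLES', 2], ['DEAN', -1], ['FLINT', -1]]
```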
