
5 Hash - New

This document provides an introduction to hashing, focusing on its application in managing sparse key-based data using hash tables. It covers essential concepts such as hash functions, collision resolution methods (open addressing and separate chaining), and the importance of table size, particularly prime sizes. Additionally, it discusses practical applications of hashing in areas like dictionaries, data integrity, and programming resources for further learning.

Uploaded by

jackyko0319

Data Structures

Introduction to Hashing

Objective
• Sparse Data
• Key Based Data
• Hash Table
• Hash Functions
• Collision Resolution
• Applications

Sparse data
• There are many players in a complex game
• Each player has an identification number (key)
• Keys can range from 1 to 1,000,000
• No two players have the same key
• Suppose we now have 3 players
– 954,323
– 447,829
– 194,332
• Their keys are far apart from each other
• The player information is called key-based data
Sparse data
• How can we store this data in the computer so that a player’s information can be retrieved easily by key?
– Array:
• A lot of memory space is wasted
[Figure: an array indexed by key in which only cells 194,332, 447,829 and 954,323 are occupied]
– Linked List:
• Slow to search if we have 10,000 players
– Hash Table
• Best solution in this case!
Basic Hash Table
• Advantages:
– Quickly store sparse key-based data in a reasonable
amount of space
– Quickly determine if a certain key is within the table

[Figure: a 10-slot table (slots 0 to 9) with 194,332 in slot 2, 954,323 in slot 3 and 447,829 in slot 9]

194,332 % 10 = 2, or 194,332 ≡ 2 (mod 10)
447,829 % 10 = 9, or 447,829 ≡ 9 (mod 10)
954,323 % 10 = 3, or 954,323 ≡ 3 (mod 10)

To get the information, we use:

player = table[key % 10];
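The lookup above can be sketched in C++ as follows. This is a minimal illustration, not the course's code: the `Player` struct, the `Table` alias and the helper names are assumptions, and collisions are not handled yet.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical record type; the slides only speak of "player information".
struct Player {
    long key;
};

constexpr int kTableSize = 10;  // table size used in the slide's example

// Slot index for a key: the remainder after dividing by the table size.
int slotFor(long key) {
    assert(key >= 0);
    return static_cast<int>(key % kTableSize);
}

// One optional player per slot (no collision handling in this sketch).
using Table = std::array<std::optional<Player>, kTableSize>;

// Store a player in the slot chosen by its key.
void store(Table& table, const Player& p) {
    table[slotFor(p.key)] = p;
}

// A key is present only if its slot is occupied by that exact key.
bool contains(const Table& table, long key) {
    const auto& cell = table[slotFor(key)];
    return cell.has_value() && cell->key == key;
}
```

Storing the three example keys fills slots 2, 9 and 3; a lookup for 194,333 lands on slot 3, finds a different key there, and reports that the key is absent.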
Collisions
• Two players are mapped to the same cell
[Figure: keys 194,333 and 954,323 both map to slot 3 of a 10-slot table (slots 0 to 9)]
• Methods to deal with collisions
– Change the table
– Hash functions
• ‘Hash’ in the dictionary: chop (meat) into small pieces
• Here, we ‘hash’ numbers
Hash Functions
Good hash function:
Fast to compute, minimizes collisions

Kinds of hash functions:

• Division: Slot_id = Key % table_size
• Others, e.g. Slot_id = (Key^2 + Key + 41) % table_size
• table_size should preferably be a prime number.
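The two hash functions listed above could be written as below. The function names are illustrative, and the sketch assumes the table size is below 2^32 so that the squared term cannot overflow a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Division method: Slot_id = Key % table_size.
std::uint64_t divisionHash(std::uint64_t key, std::uint64_t tableSize) {
    assert(tableSize > 0);
    return key % tableSize;
}

// The second example: Slot_id = (Key^2 + Key + 41) % table_size.
// The key is reduced modulo tableSize first; this gives the same result
// and keeps the square small (assumes tableSize < 2^32).
std::uint64_t polynomialHash(std::uint64_t key, std::uint64_t tableSize) {
    assert(tableSize > 0);
    const std::uint64_t k = key % tableSize;
    return (k * k + k + 41) % tableSize;
}
```

For example, with a table of size 10 the key 194,332 goes to slot 2 under the division method and to slot 7 under the polynomial one, since (4 + 2 + 41) % 10 = 7.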

Combination of Hash Functions
• Collisions happen easily if we use only a single % function
• Combination:
– Apply hash function h1 on key to obtain mid_key
– Apply hash function h2 on mid_key to obtain Slot_id

• Example:
– We apply %101 on 12320324111220 and get 79
– We apply %10 on the result 79 obtained by %101
• 79 % 10 =9
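A small sketch of this two-stage hashing, with h1 and h2 both taken to be remainder operations as in the example. The function name and parameter names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Applies h1 (key % midModulus) to get mid_key, then h2 (mid_key % tableSize)
// to get the slot id.
std::uint64_t combinedHash(std::uint64_t key, std::uint64_t midModulus,
                           std::uint64_t tableSize) {
    assert(midModulus > 0 && tableSize > 0);
    const std::uint64_t midKey = key % midModulus;  // h1
    return midKey % tableSize;                      // h2
}
```

Calling `combinedHash(12320324111220ULL, 101, 10)` reproduces the example: the intermediate key is 79 and the final slot is 9.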

Collision Resolution - Open Addressing
• Linear Probing
– On a collision, try Slot_id+1, Slot_id+2, …
• Quadratic Probing
– On a collision, try Slot_id+1, Slot_id+4, …
• Double Hashing
– On a collision, try Slot_id+h2(x), Slot_id+2h2(x), … (prime size important)
[Figure: key 954,323 arriving at a 10-slot table (slots 0 to 9) in which slots 3 and 4 are full]
• General rule: on a collision, try other slots in a fixed order
• How to find data?
– If the key is not found at a position, try the next position according to the same probing rule
– Every key has a preference order over all the positions
– To find a key, just search the positions in its order of preference
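The three probing rules differ only in the offset added on the i-th attempt, which the following sketch makes explicit. The enum, function names and the `step` parameter (standing in for h2(x)) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Probing { Linear, Quadratic, Double };

// Slot tried on the i-th attempt (i = 0 is the home slot).
// step is only used for double hashing and stands for h2(x).
std::size_t probe(Probing rule, std::size_t home, std::size_t i,
                  std::size_t step, std::size_t tableSize) {
    assert(tableSize > 0);
    switch (rule) {
        case Probing::Linear:    return (home + i) % tableSize;
        case Probing::Quadratic: return (home + i * i) % tableSize;
        case Probing::Double:    return (home + i * step) % tableSize;
    }
    return home % tableSize;  // not reached; keeps compilers quiet
}

// First free slot in probe order, or full.size() if none is found
// within full.size() attempts.
std::size_t firstFreeSlot(Probing rule, std::size_t home, std::size_t step,
                          const std::vector<bool>& full) {
    const std::size_t n = full.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t slot = probe(rule, home, i, step, n);
        if (!full[slot]) return slot;
    }
    return n;
}
```

With slots 3 and 4 full and key 954,323 (home slot 3), linear probing settles on slot 5 and quadratic probing on slot 7. Searching for a key follows exactly the same sequence as inserting it.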

Collision Resolution - Separate Chaining
• Problems with Open Addressing?
• Use linked lists to resolve collisions
– Every slot in the hash table is a linked list
– Collision → insert into the corresponding list
– Find data → search the corresponding list

1 → 441 → 361 → 91
2 → 512
3 → 63 → 723
4 → 74
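A minimal separate-chaining sketch, assuming non-negative integer keys and the division hash key % table_size. The class and member names are illustrative, and `std::list` stands in for a hand-written linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t slots) : table_(slots) {
        assert(slots > 0);
    }

    // Collision or not, the key goes into the list for its slot.
    void insert(int key) {
        if (!contains(key)) table_[slotFor(key)].push_back(key);
    }

    // Finding a key means searching only the corresponding list.
    bool contains(int key) const {
        for (int stored : table_[slotFor(key)]) {
            if (stored == key) return true;
        }
        return false;
    }

    // Number of keys currently chained in one slot.
    std::size_t chainLength(std::size_t slot) const {
        return table_[slot].size();
    }

private:
    // Division hash; assumes non-negative keys.
    std::size_t slotFor(int key) const {
        return static_cast<std::size_t>(key) % table_.size();
    }

    std::vector<std::list<int>> table_;  // every slot is a linked list
};
```

Inserting 441, 361, 91, 512, 63, 723 and 74 into a 10-slot table gives chains of length 3, 1, 2 and 1 in slots 1 to 4, matching the figure.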
Collision Resolution
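• Example: insert 11, 22, 33, 44, 55, 66, 77, 88, 99, 21 into a table of size 10
– Using linear probing
Slot: 0  1  2  3  4  5  6  7  8  9
Key:  21 11 22 33 44 55 66 77 88 99

– Using separate chaining
1 → 11 → 21
2 → 22
3 → 33
4 → 44
5 → 55
(slots 6 to 9 hold 66, 77, 88 and 99 in the same way)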
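The linear probing row of this example can be reproduced with a short simulation. It assumes a table of size 10 and the hash key % 10; the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Inserts each key with linear probing: start at key % tableSize, then move
// one slot to the right (wrapping around) until a free slot is found.
std::vector<std::optional<int>> buildLinearProbingTable(
        const std::vector<int>& keys, std::size_t tableSize) {
    assert(keys.size() <= tableSize);  // otherwise some key has no free slot
    std::vector<std::optional<int>> table(tableSize);
    for (int key : keys) {
        std::size_t slot = static_cast<std::size_t>(key) % tableSize;
        while (table[slot].has_value()) {
            slot = (slot + 1) % tableSize;
        }
        table[slot] = key;
    }
    return table;
}
```

The key 21 starts at slot 1, finds slots 1 to 9 occupied, wraps around, and ends up in slot 0.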

More on Hash Table Size
• A table of prime size is important in the following cases:
a) For quadratic probing, we have the following property:
– If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty (why does this hold only for prime sizes?).
See Section 5.4.2

b) For double hashing, we have the following property:
– If double hashing is used and the table size is prime, then a new element can always be inserted if the table is not full (is this correct?).
See Section 5.4.3

Section numbers refer to Data Structures and Algorithm Analysis in C++, 4th Edition.
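Both properties come down to how many distinct slots a probe sequence can reach. For a prime size p, two quadratic probes i and j with 0 ≤ i < j ≤ p/2 cannot collide, because p would have to divide (j − i)(j + i), and both factors are between 1 and p − 1. For double hashing, a step that is not a multiple of p shares no factor with p, so the sequence visits every slot. The sketch below simply counts the slots reached; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Distinct slots visited by the first ceil(tableSize / 2) quadratic probes
// home, home + 1, home + 4, ... (all taken modulo tableSize).
std::size_t distinctQuadraticProbes(std::size_t home, std::size_t tableSize) {
    assert(tableSize > 0);
    const std::size_t attempts = (tableSize + 1) / 2;
    std::set<std::size_t> seen;
    for (std::size_t i = 0; i < attempts; ++i) {
        seen.insert((home + i * i) % tableSize);
    }
    return seen.size();
}

// Distinct slots visited by tableSize double-hashing probes
// home, home + step, home + 2*step, ... (all taken modulo tableSize).
std::size_t distinctDoubleHashProbes(std::size_t home, std::size_t step,
                                     std::size_t tableSize) {
    assert(tableSize > 0);
    std::set<std::size_t> seen;
    for (std::size_t i = 0; i < tableSize; ++i) {
        seen.insert((home + i * step) % tableSize);
    }
    return seen.size();
}
```

With size 11, the first 6 quadratic probes hit 6 different slots, while with size 16 the first 8 probes hit only 4. With size 7 and step 3, double hashing reaches all 7 slots, while size 10 with step 4 reaches only 5. A step of 0 never leaves the home slot, so the double hashing property also needs h2(x) to be non-zero.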
Rehashing
• Too many elements in the table → too many collisions when inserting
• Load factor = number of slots occupied / total number of slots
• When the table is half full, rehash all the elements into a table of double the size
• In an interactive system, the user who triggers rehashing is unlucky
• In total, only O(n) cost is incurred for a hash table of size n
• Example: the initial hash table size is 2; by the time the size has grown to 32, how many elements have been rehashed?
– 2 → 4: 1 number rehashed
– 4 → 8: 2 numbers rehashed
– 8 → 16: 4 numbers rehashed
– 16 → 32: 8 numbers rehashed
– In total, 15 numbers rehashed, and 15 < 16 = 32/2
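The count in this example can be checked with a few lines of code. The sketch assumes, as the slide does, that every rehash is triggered when the table is exactly half full and that the size doubles each time; the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Total number of elements moved while a table doubles from initialSize
// up to finalSize, rehashing each time it becomes half full.
std::size_t totalRehashed(std::size_t initialSize, std::size_t finalSize) {
    assert(initialSize > 0);
    std::size_t moved = 0;
    for (std::size_t size = initialSize; size < finalSize; size *= 2) {
        moved += size / 2;  // a half-full table of this size holds size / 2 elements
    }
    return moved;
}
```

`totalRehashed(2, 32)` adds 1 + 2 + 4 + 8 = 15, which stays below half of the final size. This geometric sum is why the total rehashing cost is O(n).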

More Questions
• How can rehashing be used?
– If we allow rehashing, then quadratic probing
can always succeed in inserting new items
because the table will always be at least half
empty.
• How can the table size be kept prime when rehashing?
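One common answer to the last question is to grow to the first prime at or above twice the old size rather than to exactly double it. The sketch below uses simple trial division; the function names are illustrative and this is one possible approach, not necessarily the one intended in the course.

```cpp
#include <cassert>
#include <cstddef>

// Trial-division primality test; adequate for table sizes.
bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Smallest prime greater than or equal to n.
std::size_t nextPrime(std::size_t n) {
    while (!isPrime(n)) ++n;
    return n;
}

// New capacity when rehashing: the first prime at or above double the old size.
std::size_t grownCapacity(std::size_t oldCapacity) {
    assert(oldCapacity > 0);
    return nextPrime(2 * oldCapacity);
}
```

Starting from a table of size 11, this grows to 23; from 101 it grows to 211.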

Application: Dictionary
• How does Word perform spell checking?
• A dictionary (large hash table) is kept
• Words are hashed into that dictionary
• The way to hash words
– Establish a map between characters and numbers
– E.g. A → 136, F → 356, T → 927, E → 442, R → 091
– “AFTER” corresponds to the key 136,356,927,442,091
– Hashing “AFTER” is equivalent to hashing this key
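A sketch of hashing a word through its character codes. The five codes are the ones given in the example (091 is written as 91, since a leading zero is not valid in a decimal C++ literal); the map name, the function name and the use of Horner's rule to avoid building the long key are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Character codes from the example; only these five letters are defined there.
const std::map<char, std::uint64_t> kCharCode = {
    {'A', 136}, {'F', 356}, {'T', 927}, {'E', 442}, {'R', 91}};

// Hash of a word = (its three-digit codes written one after another) % tableSize.
// Horner's rule reduces after every character, so the full key is never built.
// Throws std::out_of_range for a character without a code.
std::uint64_t hashWord(const std::string& word, std::uint64_t tableSize) {
    assert(tableSize > 0);
    std::uint64_t h = 0;
    for (char c : word) {
        h = (h * 1000 + kCharCode.at(c)) % tableSize;
    }
    return h;
}
```

`hashWord("AFTER", m)` gives the same slot as computing 136356927442091 % m directly, but it also works for words whose concatenated key would not fit in a 64-bit integer.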

How to write a Hash Class?
• Exercises:
– 1. Use linear probing to write a hash class
– 2. Use this class to implement your own small dictionary

Hash implementation (linear probing)

[Code listings for these slides were not recovered; only the comments below survive.]

// arr is a pointer to pointers, or practically a two-dimensional array; each element of the array is also a pointer
// hash table size
// Linear probing
// Insert new node
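A minimal linear probing hash class might look like the sketch below. It is not the slides' original listing: it reuses the names `arr`, `capacity`, `hashCode` and `hashIndex` that appear in the surviving fragments, while `HashNode`, `HashMap`, `insertNode` and `get` are assumed names. Smart pointers replace raw pointers, keys are assumed non-negative, and deletion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct HashNode {
    int key;
    std::string value;
};

class HashMap {
public:
    explicit HashMap(std::size_t size) : arr(size), capacity(size) {
        assert(size > 0);
    }

    // Inserts a new node, or overwrites the value if the key already exists.
    // Returns false when every slot has been tried without success.
    bool insertNode(int key, const std::string& value) {
        std::size_t hashIndex = hashCode(key);
        // Linear probing: move to the next slot until a free slot or the key is found.
        for (std::size_t j = 0; j < capacity; ++j) {
            if (arr[hashIndex] == nullptr) {
                arr[hashIndex] = std::make_unique<HashNode>(HashNode{key, value});
                return true;
            }
            if (arr[hashIndex]->key == key) {
                arr[hashIndex]->value = value;
                return true;
            }
            hashIndex = (hashIndex + 1) % capacity;
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::string* get(int key) const {
        std::size_t hashIndex = hashCode(key);
        for (std::size_t j = 0; j < capacity; ++j) {
            if (arr[hashIndex] == nullptr) return nullptr;  // empty slot ends the search
            if (arr[hashIndex]->key == key) return &arr[hashIndex]->value;
            hashIndex = (hashIndex + 1) % capacity;
        }
        return nullptr;
    }

private:
    // Division hash; assumes non-negative keys.
    std::size_t hashCode(int key) const {
        return static_cast<std::size_t>(key) % capacity;
    }

    std::vector<std::unique_ptr<HashNode>> arr;  // each element is a pointer to a node
    std::size_t capacity;                        // hash table size
};
```

For the dictionary exercise, the integer key could be replaced by a word and `hashCode` by a string hash. Supporting deletion would need a marker for removed slots, because simply emptying a slot would cut off the probe sequence of keys stored after it.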

Hash implementation (quadratic probing)
• Modify the code to implement quadratic probing

// j counts the probes made so far
j = 0;
while (arr[hashIndex] != NULL && arr[hashIndex]->key != key && j < capacity) {
    j++;
    // Compute the new hash value: home slot plus j squared
    hashIndex = (hashCode(key) + j * j) % capacity;
}

Hash implementation (double hashing)
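• Modify the code to implement double hashing

// j counts the probes made so far
j = 0;
// The step size depends only on the key, so compute it once; it must be non-zero
hash2 = hashCode2(key);
while (arr[hashIndex] != NULL && arr[hashIndex]->key != key && j < capacity) {
    j++;
    // Compute the new hash value: home slot plus j steps of size hash2
    hashIndex = (hashCode(key) + j * hash2) % capacity;
}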

Applications of hashing

• Data Structures (Programming Languages)


• Pattern Searching
• Hashing in Database
• Data Integrity Check
• Password Verification
• Compiler Operation
• Finding similar items in huge space
• Linking File name and path together

Learning Objectives
1. Understand the concept of hashing
2. Be able to insert step by step into a hash table, given the data and the probing rule
3. Know the properties of Quadratic Probing and Double Hashing
4. Be able to implement a Hash Table

D:1; C:1,2; B:1,2,3; A:1,2,3,4

Programming Resources
• Data Structure/Algorithm Visualization
– https://www.cs.usfca.edu/~galles/visualization/Algorithms.html

• Online Judge
– PKU OJ, USACO, Leetcode

• ACM ICPC

