10 More Hashing

CSE 326: Data Structures
More Hashing Techniques

Hannah Tang and Brian Tjaden
Summer Quarter 2002

Remember This List?

• How should we resolve collisions?
• What should the table size be?
• What should the hash function be?
• How well does hashing work in the real world?
– We’ll see a case study today!

Hashing Dilemma

Suppose your WorstEnemy 1) knows your hash function; 2) gets to decide which keys to send you?

Faced with this enticing possibility, WorstEnemy decides to:
a) Send you keys which maximize collisions for your hash function.
b) Take a nap.

Moral: No single hash function can protect you!

Faced with this dilemma, you:
a) Give up and use a linked list for your Dictionary.
b) Drop out of software, and choose a career in fast foods.
c) Run and hide.
d) Proceed to the next slide, in hope of a better alternative.

Universal Hashing [1]

Suppose we have a set K of possible keys, and a finite set H of hash functions that map keys to entries in a hashtable of size m.

[Figure: hash functions hi, hj from H map keys k1, k2 in K to table slots 0 … m-1]

Definition:
H is a universal collection of hash functions if and only if …
For any two distinct keys k1, k2 in K, there are at most |H|/m functions in H for which h(k1) = h(k2).

• So … if we randomly choose a hash function from H, our chances of collision are no more than if we get to choose hash table entries at random!

[1] Motivation: see previous slide (or visit http://www.burgerking.com/jobs)
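
The universal-collection definition can be checked by brute force on a small family. The sketch below is only an illustration: the family h(k) = ((a*k + b) mod p) mod m and the parameters p = 17, m = 5 are assumptions chosen for the demo (the family is the textbook one from the cited Cormen et al. reference, not one defined on these slides), and all function names are hypothetical.

```python
from itertools import combinations

def make_family(p):
    """All parameter pairs (a, b) with 1 <= a < p and 0 <= b < p."""
    return [(a, b) for a in range(1, p) for b in range(p)]

def h(params, k, p, m):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m; p is assumed to be prime."""
    a, b = params
    return ((a * k + b) % p) % m

def max_colliding_functions(p, m):
    """For every pair of distinct keys, count the functions in the family
    that send both keys to the same slot, and return the largest count."""
    family = make_family(p)
    worst = 0
    for k1, k2 in combinations(range(p), 2):
        colliding = sum(1 for f in family if h(f, k1, p, m) == h(f, k2, p, m))
        worst = max(worst, colliding)
    return worst, len(family)

# Hypothetical small parameters: keys are 0..16, the table has 5 slots.
p, m = 17, 5
worst, family_size = max_colliding_functions(p, m)
print(worst, "colliding functions at most; bound |H|/m =", family_size / m)
```

If the largest count stays at or below |H|/m, the family satisfies the definition for this p and m. An exhaustive check like this is only practical for tiny key sets.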

Random Hashing – Not!

How can we “randomly choose a hash function”?
– Certainly we cannot randomly choose hash functions at runtime, interspersed amongst the inserts, finds, deletes! Why not?
• We can, however, randomly choose a hash function each time we initialize a new hashtable.

Conclusions
– WorstEnemy never knows which hash function we will choose – neither do we!
– No single input (set of keys) can always evoke worst-case behavior

Good Hashing: Universal Hash Function A (UHFa)

Parameterized by prime table size and vector of r integers:
a = <a1 … ar> where 0 <= ai < size

Represent each key as a vector k of r integers, where ki < size
– size = 11, key = 39752 ==> <3,9,7,5,2>
– size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4>

h_a(k) = \left( \sum_{i=1}^{r} a_i k_i \right) \bmod \text{size}
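
A minimal sketch of UHFa, assuming keys are either non-negative integers (split into decimal digits) or strings (letters mapped to 1..26, other characters dropped), which is one mapping consistent with the two key examples above. The names `key_to_vector`, `random_vector`, and `uhf_a` are hypothetical. The random vector is drawn once, when the table is initialized, not per operation.

```python
import random

def key_to_vector(key):
    """Turn a key into a list of small integers (assumed mapping).
    Integers become their decimal digits; strings become letter positions
    (a=1 ... z=26) with non-letters dropped."""
    if isinstance(key, int):
        return [int(d) for d in str(key)]
    return [ord(c) - ord("a") + 1 for c in key.lower() if "a" <= c <= "z"]

def random_vector(r, size, rng=random):
    """Pick a = <a1 ... ar> with 0 <= ai < size."""
    return [rng.randrange(size) for _ in range(r)]

def uhf_a(a, k, size):
    """h_a(k) = (sum of a_i * k_i) mod size; size should be prime."""
    if len(a) != len(k):
        raise ValueError("a and k must have the same length r")
    if any(ki >= size for ki in k):
        raise ValueError("every k_i must be smaller than the table size")
    return sum(ai * ki for ai, ki in zip(a, k)) % size

# Chosen once per table, at initialization time.
size = 11
k = key_to_vector(39752)
a = random_vector(len(k), size)
print(k, a, uhf_a(a, k, size))
```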

UHFa: Example

• Context: hash strings of length 3 in a table of size 131

Let a = <35, 100, 21>
h_a(“xyz”) = (35*120 + 100*121 + 21*122) % 131
           = 129

Let b = <25, 90, 83>
h_b(“xyz”) = (25*120 + 90*121 + 83*122) % 131
           = 43

Thinking about UHFa

Strengths:
– Works on any type as long as you can map keys to vectors
– If we’re building a static table, we can try many values of the hash vector <a>
– Random <a> has guaranteed good properties no matter what we’re hashing

Weaknesses:
– Must choose prime table size larger than any ki
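
The "try many values of the hash vector" strength can be sketched as a small search. This is only an illustration: the word list, the trial count, and the helper names are assumptions, and keys are encoded as ASCII codes as in the "xyz" example.

```python
import random

def uhf_a(a, k, size):
    """h_a(k) = (sum of a_i * k_i) mod size."""
    return sum(ai * ki for ai, ki in zip(a, k)) % size

def count_collisions(a, keys, size):
    """Number of keys that land in a slot already taken by an earlier key."""
    seen = set()
    collisions = 0
    for k in keys:
        slot = uhf_a(a, k, size)
        if slot in seen:
            collisions += 1
        seen.add(slot)
    return collisions

def best_vector(keys, size, trials=50, seed=0):
    """Try several random vectors and keep the one with the fewest collisions."""
    rng = random.Random(seed)
    r = len(keys[0])
    best_a, best_c = None, None
    for _ in range(trials):
        a = [rng.randrange(size) for _ in range(r)]
        c = count_collisions(a, keys, size)
        if best_c is None or c < best_c:
            best_a, best_c = a, c
    return best_a, best_c

words = ["cat", "dog", "emu", "fox", "owl", "yak"]   # hypothetical static key set
keys = [[ord(ch) for ch in w] for w in words]        # ASCII codes, all below 131
a, collisions = best_vector(keys, size=131)
print(a, collisions)
```

This only makes sense for a static table, since the search has to see every key before the vector is fixed.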

Good Hashing: Universal Hash Function B (UHFb)

Parameterized by j, a, and b:
– j * size should fit into an int
– a and b must be less than size

h_{j,a,b}(k) = ((ak + b) mod (j*size)) / j   (integer division)

UHFb: Example

Context: hash integers in a table of size 160

Let j = 32, a = 13, b = 142
h_{j,a,b}(1000) = ((13*1000 + 142) % (32*160)) / 32
                = (13142 % 5120) / 32
                = 2902 / 32
                = 90

Let j = 31, a = 82, b = 112
h_{j,a,b}(1000) = ((82*1000 + 112) % (31*160)) / 31
                = (82112 % 4960) / 31
                = 2752 / 31
                = 88
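
A minimal sketch of UHFb using integer (floor) division, with hypothetical function names. With floor division, 2902 // 32 evaluates to 90 and 2752 // 31 evaluates to 88. The second function is a variant for the case where j and size are both powers of two, where the modulo and the division reduce to a bit mask and a shift.

```python
def uhf_b(k, j, a, b, size):
    """h_{j,a,b}(k) = ((a*k + b) mod (j*size)) // j, always in 0..size-1."""
    return ((a * k + b) % (j * size)) // j

def uhf_b_pow2(k, log2_j, a, b, log2_size):
    """Same function when j = 2**log2_j and size = 2**log2_size:
    the modulo becomes a mask and the division becomes a right shift."""
    mask = (1 << (log2_j + log2_size)) - 1
    return ((a * k + b) & mask) >> log2_j

print(uhf_b(1000, j=32, a=13, b=142, size=160))
print(uhf_b(1000, j=31, a=82, b=112, size=160))
```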

Thinking about UHFb

Strengths
– If we’re building a static table, we can try many parameter values
– Random a, b has guaranteed good properties no matter what we’re hashing
– Can choose any size table
– Very efficient if j and size are powers of 2 - why?

Weaknesses
– Need to turn non-integer keys into integers

Perfect Hashing

When we know the entire key set in advance …
– Examples: programming language keywords, CD-ROM file list, spelling dictionary, etc.

… then perfect hashing lets us achieve:
– Worst-case O(1) time complexity!
– Worst-case O(n) space complexity!
Perfect Hashing Technique

• Static set of n known keys
• Separate chaining, two-level hash
• Primary hash table size = n
• j-th secondary hash table size = n_j^2 (where n_j keys hash to slot j in primary hash table)
• Universal hash functions in all hash tables
• Conduct (a few!) random trials, until we get collision-free hash functions

[Figure: primary hash table with slots 0 … 6, each slot pointing to its secondary hash table]

Perfect Hashing Theorems [2]

Theorem: If we store n keys in a hash table of size n^2 using a randomly chosen universal hash function, then the probability of any collision is < ½.

Theorem: If we store n keys in a hash table of size m = n using a randomly chosen universal hash function, then

E\left[ \sum_{j=0}^{m-1} n_j^2 \right] < 2n

where n_j is the number of keys hashing to slot j.

Corollary: If we store n keys in a hash table of size m = n using a randomly chosen universal hash function and we set the size of each secondary hash table to m_j = n_j^2, then:
a) The probability that the total storage used for all secondary hash tables exceeds 4n is less than ½.
b) The expected amount of storage required for all secondary hash tables is less than 2n.

[2] Intro to Algorithms, 2nd ed. Cormen, Leiserson, Rivest, Stein
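
The two-level technique and the retry loops implied by the theorems can be sketched as follows. This is a sketch under stated assumptions, not the course's implementation: keys are assumed to be distinct non-negative integers below the prime P, the hash family ((a*k + b) mod P) mod m is the textbook universal family from the cited reference, and all names are hypothetical.

```python
import random

P = (1 << 61) - 1   # a Mersenne prime; every key is assumed to be an int in [0, P)

def random_hash(m, rng):
    """Pick h(k) = ((a*k + b) mod P) mod m at random from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build_perfect_table(keys, seed=0):
    """Build a static two-level table for a non-empty set of integer keys."""
    rng = random.Random(seed)
    keys = list(set(keys))
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    # Primary level (size n): retry until the secondaries need < 4n slots in total.
    while True:
        h1 = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:
            break
    # Secondary level: bucket j gets n_j^2 slots; retry until collision-free.
    secondary = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h2 = random_hash(size, rng) if size else None
            slots = [None] * size
            collision = False
            for k in bucket:
                i = h2(k)
                if slots[i] is not None:
                    collision = True
                    break
                slots[i] = k
            if not collision:
                break
        secondary.append((h2, slots))
    return h1, secondary

def contains(table, key):
    """Worst-case O(1) lookup: one primary hash, one secondary hash."""
    h1, secondary = table
    h2, slots = secondary[h1(key)]
    return bool(slots) and slots[h2(key)] == key

keys = [0, 3, 17, 42, 1000, 39752, 123456789]   # hypothetical key set
table = build_perfect_table(keys)
print(all(contains(table, k) for k in keys), contains(table, 5))
```

Each retry loop succeeds with probability above ½ per trial according to the theorems above, so only a few trials are expected.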

Perfect Hashing Conclusions

Perfect hashing theorems set tight expected bounds on sizes and collision behavior of all the hash tables (primary and all secondaries).

→ Conduct a few random trials of universal hash functions, by simply varying UHF parameters, until we get a set of UHFs and associated table sizes which deliver …
– Worst-case O(1) time complexity!
– Worst-case O(n) space complexity!

Extendible Hashing: Cost of a Database Query

I/O to CPU ratio is 300-to-1!

Extendible Hashing

Hashing technique for huge data sets
– Optimizes to reduce disk accesses
– Each hash bucket fits on one disk block
– Better than B-Trees if order is not important – why?

Table contains:
– Buckets, each fitting in one disk block, with the data
– A directory that fits in one disk block is used to hash to the correct bucket

Extendible Hash Table

• Directory entry: key prefix (first k bits) and a pointer to the bucket with all keys starting with its prefix
• Each bucket contains keys matching on first j ≤ k bits, plus the value associated with each key

directory for k = 3
000  001  010  011  100  101  110  111

(j = 2)   (j = 2)   (j = 3)   (j = 3)   (j = 2)
00001     01001     10001     10101     11001
00011     01011     10011     10110     11011
00100     01100               10111     11100
00110                                   11110
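
The directory-then-bucket lookup can be sketched with the table above. Keys are written as 5-character bit strings and the associated values are left out; both are simplifying assumptions, and the names are hypothetical.

```python
K = 3   # the directory is indexed by the first K bits of a key

# Buckets from the slide, keyed by the j-bit prefix that all their keys share.
buckets = {
    "00":  ["00001", "00011", "00100", "00110"],
    "01":  ["01001", "01011", "01100"],
    "100": ["10001", "10011"],
    "101": ["10101", "10110", "10111"],
    "11":  ["11001", "11011", "11100", "11110"],
}

# Each of the 2**K directory entries points at the bucket whose prefix it extends,
# so a bucket with j < K is shared by several entries.
directory = {}
for i in range(2 ** K):
    entry = format(i, f"0{K}b")
    directory[entry] = next(b for prefix, b in buckets.items() if entry.startswith(prefix))

def find(key):
    """One directory access plus one bucket access."""
    return key in directory[key[:K]]

print(find("10110"), find("11000"))
```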

Inserting (easy case)

000  001  010  011  100  101  110  111

(2)       (2)       (3)       (3)       (2)
00001     01001     10001     10101     11001
00011     01011     10011     10110     11100
00100     01100               10111     11110
00110

insert(11011)

000  001  010  011  100  101  110  111

(2)       (2)       (3)       (3)       (2)
00001     01001     10001     10101     11001
00011     01011     10011     10110     11011
00100     01100               10111     11100
00110                                   11110

Splitting a Leaf

000  001  010  011  100  101  110  111

(2)       (2)       (3)       (3)       (2)
00001     01001     10001     10101     11001
00011     01011     10011     10110     11011
00100     01100               10111     11100
00110                                   11110

insert(11000)

000  001  010  011  100  101  110  111

(2)       (2)       (3)       (3)       (3)       (3)
00001     01001     10001     10101     11000     11100
00011     01011     10011     10110     11001     11110
00100     01100               10111     11011
00110

Splitting the Directory

00  01  10  11

(2)       (2)       (2)
01101     10000     11001
          10001     11110
          10011
          10111

1. insert(10010)
   But, no room to insert and no adoption!
2. Solution: Expand directory
3. Then, it’s just a normal split.

000  001  010  011  100  101  110  111

If Extendible Hashing Doesn’t Cut It

Store only pointers/references to the items: (key, value) pairs are on disk
+ (Potentially) much smaller M
+ Fewer items in the directory
– One extra disk access!

Rehash
+ Potentially better distribution over the buckets
+ Fewer unnecessary items in the directory
– Can’t solve the problem if there’s simply too much data

What if these don’t work?
– Use a B-Tree to store the directory!
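
The three insert cases from the extendible hashing slides (easy insert, leaf split, directory split) can be sketched in memory as follows. This is a sketch under assumptions: keys are 5-bit integers, a bucket holds at most 4 keys (matching the full buckets in the figures), values and disk blocks are not modeled, and the class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # j: number of leading bits shared by all keys here
        self.keys = []

class ExtendibleHashTable:
    def __init__(self, capacity=4, key_bits=5):
        self.capacity = capacity       # keys per bucket (one "disk block")
        self.key_bits = key_bits       # total bits in every key
        self.k = 0                     # the directory uses the first k bits
        self.directory = [Bucket(0)]   # always 2**k entries

    def _index(self, key):
        """Directory slot: the first k bits of the key."""
        return key >> (self.key_bits - self.k)

    def find(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < self.capacity:      # easy case
                bucket.keys.append(key)
                return
            if bucket.depth == self.k:                # no spare directory bit
                if self.k == self.key_bits:
                    raise OverflowError("cannot split any further")
                # Expand the directory: every entry is duplicated side by side.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.k += 1
            self._split(bucket)                       # then a normal leaf split

    def _split(self, bucket):
        """Redistribute a full bucket's keys on their next prefix bit."""
        depth = bucket.depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        shift = self.key_bits - depth
        for key in bucket.keys:
            (one if (key >> shift) & 1 else zero).keys.append(key)
        for i, b in enumerate(self.directory):
            if b is bucket:
                bit = (i >> (self.k - depth)) & 1
                self.directory[i] = one if bit else zero

# Rebuild the "Splitting the Directory" situation, then insert 10010.
t = ExtendibleHashTable(capacity=4, key_bits=5)
for key in (0b01101, 0b10000, 0b10001, 0b10011, 0b10111, 0b11001, 0b11110):
    t.insert(key)
k_before = t.k
t.insert(0b10010)   # bucket 10 is full and already uses all k bits
print(k_before, "->", t.k)
```

A split may leave all keys on one side, which is why `insert` loops and retries instead of assuming a single split always makes room.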

Hash Wrap-up

Hash function: maps keys to integers; table size should be prime

Collision resolution
• Separate Chaining
– Expand beyond hashtable via secondary Dictionaries
– Allows λ > 1
• Open Addressing
– Expand within hashtable
– Secondary probing: {linear, quadratic, double hash}
– λ ≤ 1 (by definition!)
– λ ≤ ½ (by preference!)
• Rehashing
– Tunes up hashtable when λ crosses the line

Hash Wrap-up (part 2)

Choosing a Hash Function
• Universal hashing
– Guarantees no (always) bad input
• Perfect hashing
– Requires known, fixed keyset
– Achieves O(1) time, O(n) space - guaranteed!
• Also: Extendible hashing
– For disk-based data
– Combine with B-tree directory if needed
Dictionary ADT Wrapup: Case Study

• Your company, Procrastinators Inc., will release its highly hyped word-processing program, WordMaster2000 (yeah, they’re a little behind the times), next month.
• Your highly successful alpha-test was marred by user requests for a spell-checker.
• Your mission: write and test a spell-checker module before WordMaster2000 is released.
• For now, you only need to worry about the English language, although if WordMaster2000 is successful, you may need to port your spell-checker to other languages/character sets.

Case Study: Assumptions

You will be given a spelling dictionary of English words
– 30,000 words
– Static (i.e., does not support adding user-supplied words yet)
– Arbitrary(ish) preprocessing time

Practical notes
– Almost all searches are successful – Why?
– Words average about 8 characters in length
– 30,000 words at 8 bytes/word ~ .25 MB
– There are many regularities in the structure of English words

Case Study: Design Considerations

Issues:
– Which data structure should we use?
– What are our design goals?

Possible Solutions?
