Advanced Algorithms Course. Lecture Notes. Part 10: Hashing
Hashing
(This part may be skipped if you already know hashing very well from Data
Structures courses.)
Let U be a universe (a huge set) of elements. A dictionary is a data
structure that keeps track of a set S ⊆ U and supports the following operations:
insert, delete, lookup. That is, a dictionary enables us to quickly
insert elements into a set, delete elements from a set, or retrieve elements of
the set. Hash tables are among the most well known implementations of
dictionaries. In the following, n is always some fixed size bound much smaller
than |U|. A hash table H is an array of size n, with indices 0, . . . , n − 1,
where n ≥ |S|. That is, H allocates enough space for storing sets S of at
most n elements. However, several elements may be stored in the same entry
of H, for example as a list. Then we speak of collisions.
A hash function h maps U onto this index set. In order to execute any
of the dictionary operations for an element, we compute the index of that
element and access the corresponding entry of H. Of course, h must be
easily computable, and it is essential that our hash function keeps collisions
to a minimum: If many elements are stored in the same entry, we still
have to search for the desired element there, and this would slow down the
dictionary operation. Since U is much larger than n, collisions cannot be
avoided, but with a good randomized approach we can keep their expected
number small. In the following, note again that randomness is only in the
algorithm (here: in the design of our hash function h), but we do not make
any assumptions on the set S we want to store, other than |S| ≤ n.
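To make the dictionary operations concrete, here is a minimal Python sketch
of a hash table with chaining. The class name ChainedHashTable and its
interface are our own illustration; the hash function h is simply passed in as
a parameter, and a concrete randomized choice of h is the topic of the rest
of this part.

class ChainedHashTable:
    """Minimal sketch of a hash table with chaining (illustration only)."""

    def __init__(self, n, h):
        self.n = n                              # number of entries
        self.h = h                              # hash function: element -> 0..n-1
        self.table = [[] for _ in range(n)]     # one chain (list) per entry

    def insert(self, x):
        chain = self.table[self.h(x)]
        if x not in chain:                      # avoid storing duplicates
            chain.append(x)

    def delete(self, x):
        chain = self.table[self.h(x)]
        if x in chain:
            chain.remove(x)

    def lookup(self, x):
        return x in self.table[self.h(x)]       # search only one chain

# Toy usage with a fixed (non-random) hash function, just for demonstration:
T = ChainedHashTable(10, lambda x: x % 10)
T.insert(42); T.insert(7)
print(T.lookup(42), T.lookup(13))               # True False

Every operation first computes h(x) and then works only on the chain of that
single entry, so the cost is dominated by the length of that chain.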
Here is a classical simple hashing scheme, along with a rigorous analysis
of its performance. We will choose h at random from a certain class of easily
computable functions. We call a function class universal if for any pair of
distinct elements u, v ∈ U the probability of h(u) = h(v), over the random
choice of h from the class, is at most 1/n. This is a good
property for hashing because, if we pick a random h from a universal class
then, for any fixed element u, the expected number of other elements s ∈ S
with h(s) = h(u) is at most 1, so we rarely get large clusters of elements in
the same entry of H. Thus our dictionary will be able to do any operation
in O(1) expected time.
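Spelled out, the expectation bound behind this claim is the following, for a
fixed element u ∈ S and h drawn at random from a universal class:

E[number of s ∈ S with s ≠ u and h(s) = h(u)] = ∑_{s ∈ S, s ≠ u} Pr[h(s) = h(u)] ≤ (|S| − 1)/n < 1,

by linearity of expectation and |S| ≤ n.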
But do such universal classes of functions exist? Trivially, the class of
all functions from U into the index set has this property. But what would
it mean to choose a random h from the class of all functions? Since the
values of such h are random and independent, h has no structure, and we
can compute the values of h for given elements only by looking them up in
a table of size |U|, which is against the very idea of hashing. We need
a restricted class of functions which are easily computable but still spread
out the elements of any subset of at most n elements well. One construction
comes from elementary number theory.
We choose a prime number p slightly larger than our n. (Prime numbers
are dense enough in the set of integers, so we will always find such a p. We do
not go into details of this preprocessing step.) We represent the elements of
U as vectors x = (x1, . . . , xr) with 0 ≤ xi < p for all i. The dimensionality
we need is clearly r ≈ log |U| / log p. (This may look complicated, but note
that these vectors can be seen as arbitrary names of the elements.) For
every a = (a1, . . . , ar) with 0 ≤ ai < p we define a hash function
ha(x) = (∑_{i=1}^{r} ai xi) mod p.
For any given x ∈ U these values are really easy to compute. It remains
to analyze the collisions. We will see that the class of all functions ha is
universal.
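As an illustration, here is a small Python sketch of this family. The helper
names make_hash and to_vector and the use of Python's random module are our
own choices; only the formula ha(x) = (∑ ai xi) mod p is taken from the
construction above. Note that the hash values lie in 0, . . . , p − 1, so the
table would get p entries, where p is only slightly larger than n.

import random

def make_hash(p, r):
    """Pick a random member h_a of the family, i.e. a random vector a."""
    a = [random.randrange(p) for _ in range(r)]
    return lambda x: sum(ai * xi for ai, xi in zip(a, x)) % p

def to_vector(u, p, r):
    """Write an integer name u in base p, giving the vector (x_1, ..., x_r)."""
    x = []
    for _ in range(r):
        x.append(u % p)
        u //= p
    return x

# Example usage: p = 101 is a prime slightly larger than n = 100,
# and r = 4 handles a universe of size up to p**r.
p, r = 101, 4
h = make_hash(p, r)
print(h(to_vector(123456, p, r)))   # some index in 0..p-1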
Very little help from number theory is needed: If p is a prime number,
and z ≠ 0 mod p, then az = bz mod p implies a = b mod p for any two
numbers a, b. (The proof is straightforward: since p is prime and does not
divide z, z has a multiplicative inverse mod p, and we can multiply both
sides by it.)
Using this fact we show, for any two distinct x, y ∈ U, that ha(x) = ha(y)
happens with probability at most 1/p. (Recall where this probability comes
from: We took some random a.) Since x ≠ y, their vectors must differ
somewhere. Hence, let j be some position where xj ≠ yj. A nice trick
makes the probability calculation extremely simple: Instead of considering a
random a, we fix all ai, i ≠ j, and choose only the component aj randomly,
where 0 ≤ aj < p. Then the probability result applies also to a random
vector a. (Why?) By the construction of ha, a collision ha(x) = ha(y)
appears if and only if aj(yj − xj) = ∑_{i≠j} ai(xi − yi) mod p. Since we have
fixed the right-hand side, we can treat it as a constant, say m. Now define
z := yj − xj. Due to the above number-theoretic fact, there exists exactly
one aj with aj z = m mod p. Hence the probability of a collision is 1/p ≤ 1/n,
and our hash table can execute dictionary operations in O(1) expected time.
A final remark: There is often confusion about the time complexity of
hash table operations. O(1) is the expected number of arithmetic operations.
But the bit complexity is not constant; it grows logarithmically in the size
of the sets we want to deal with. Thus, hash tables are asymptotically not
faster than other dictionary implementations such as balanced search trees.
The real advantage of hash tables is elsewhere: They are easy to implement
(just evaluation of some simple functions) and use only arithmetic, which
is physically faster than manipulations with pointers, etc., that would be
needed to implement trees.
Closest Points
For the problem of finding a closest pair of n points in the plane there
exists a divide-and-conquer algorithm running in O(n log n) time. It follows
a simple idea but is a bit complicated when it comes to the implementation
details. A simpler randomized algorithm solves the problem already in O(n)
expected time plus O(n) dictionary operations.
We can always assume that our n points are in a unit square. In our
algorithm we maintain a real number d which is the smallest distance be-
tween two points known so far. We consider the n points in random order.
For every new point p we test whether p has distance smaller than d to
some earlier point, and in this case we update d. For an efficient test we
have to avoid computing the distances to all earlier points. Therefore we
divide the unit square into squares of side length d/2. Since d is the smallest
distance, at most one earlier point can be located in each square. Moreover,
those points which might have a distance smaller than d to p are in squares
close to the square containing p, more precisely, they are in a 5 × 5 grid of
squares. Thus we have to test at most 24 candidates in every step. Hence
O(n) computations are enough, for all n points. So far we have not even
used the fact that points are processed in random order.
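As a sketch of this test in Python: we assume a dictionary grid that maps a
cell (a pair of integer indices) to the single earlier point stored in it; the
names cell_of and closer_point are our own, and a Python dict stands in for
the hash table discussed next.

import math

def cell_of(point, d):
    """Indices of the square of side length d/2 containing the point."""
    x, y = point
    return (int(x // (d / 2)), int(y // (d / 2)))

def closer_point(p, d, grid):
    """Return the nearest earlier point at distance < d from p, or None.
    Only the 5 x 5 block of cells around p's cell has to be inspected."""
    ci, cj = cell_of(p, d)
    best = None
    for i in range(ci - 2, ci + 3):
        for j in range(cj - 2, cj + 3):
            q = grid.get((i, j))                # at most one point per cell
            if q is not None and math.dist(p, q) < d:
                if best is None or math.dist(p, q) < math.dist(p, best):
                    best = q
    return best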
However, some complications begin here: We need to know which points
are in the candidate squares! For this purpose we may use a hash table, with
an entry for every point. But whenever d is diminished, our partitioning into
squares of side length d/2 changes totally, and we have to create a new
hash table from scratch. How often do we have to insert our points into
the various hash tables? Only here does the randomized order of the points
become important.
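Continuing the sketch from above, the rebuilding step and the overall loop
might look as follows. The names rebuild and closest_pair_distance are again
our own, cell_of and closer_point are reused from the previous sketch, and we
assume at least two distinct points (so that d > 0).

import math
import random

def rebuild(points_so_far, d):
    """Fresh dictionary for the new side length d/2; every processed point
    is inserted again.  Since d is the smallest distance seen so far,
    each cell receives at most one point."""
    grid = {}
    for q in points_so_far:
        grid[cell_of(q, d)] = q
    return grid

def closest_pair_distance(points):
    points = list(points)
    random.shuffle(points)                      # random insertion order
    d = math.dist(points[0], points[1])
    grid = rebuild(points[:2], d)
    for i in range(2, len(points)):
        p = points[i]
        q = closer_point(p, d, grid)
        if q is not None:                       # p is closer than d to some earlier point
            d = math.dist(p, q)                 # update d ...
            grid = rebuild(points[:i + 1], d)   # ... and rebuild from scratch
        else:
            grid[cell_of(p, d)] = p             # just insert p
    return d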
Let X be a random variable for the total number of insertions. Let Xi
be another random variable, with Xi = 1 if the ith point causes an update,
and Xi = 0 else. Clearly, X = n + ∑_i i·Xi, since every point is inserted
once anyway, and an update at the ith point forces us to reinsert the first
i points into the new hash table. The key fact is that Xi = 1
with probability at most 2/i: For each i, the first i points are randomly
ordered as well, hence the event that one of the two points of a closest
pair among them is the ith point has probability at most 2/i. Linearity of
expectation gives E[X] = n + ∑_i i·E[Xi] ≤ n + ∑_i i·(2/i) ≤ 3n. Thus, the
expected number of dictionary operations is O(n), and every dictionary
operation needs O(1) expected time. From these two facts it follows that the
total expected time is O(n). Stop! The latter conclusion seems obvious at
first glance. But we have to point out that a strict proof needs a careful
analysis of conditional expectations, since we combine here two different
sources of randomness. However, we omit this technical proof. We only
wanted to stress the efficiency and simplicity of a randomized approach.