4101 Assignment 9

The document outlines a homework assignment for York University's EECS 4101/5101 course, focusing on designing a hash table for storing strings with a specific hash function to minimize collisions. It includes problems related to the properties of the hash function, assumptions about prime number sizes, and the independence of coefficients. Additionally, it discusses the expected time complexity for searching strings in the hash table using chaining for collision resolution.

Uploaded by

rajpunjabi47


York University EECS 4101/5101

November 22, 2024

Homework Assignment #9
Due: November 29, 2024 at 5:00 p.m.
November 29, 2024

Problem 1: Georgy’s Hash Table

Georgy is working on designing a hash table to store strings of various lengths. Since the
length of each string may vary, Georgy is trying to devise a general method for selecting a
hash function such that the probability of collision between any two strings is minimized.
A string $x$ of length $\ell$ is represented as a sequence of characters $\langle x_0, x_1, \ldots, x_\ell \rangle$, where
the character $x_\ell$ is a special End of Text (ETX) character that is not allowed to appear
earlier in the string. Each character is encoded in ASCII, with values between 0 and 127
(with ETX encoded as 3).
For simplicity, Georgy decides to use a hash table size $p$, where $p$ is a prime number
greater than 128. Georgy generates random numbers $a_0, a_1, a_2, \ldots$, where each
$a_i \in \{0, 1, 2, \ldots, p - 1\}$, and defines the hash function $h$ as:

$$h(\langle x_0, x_1, \ldots, x_\ell \rangle) = \left( \sum_{i=0}^{\ell} a_i x_i \right) \bmod p$$
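As a rough illustration of the definition above, the following Python sketch evaluates the hash for one string and one fixed list of coefficients. The function name `string_hash` and the example coefficients are assumptions made for illustration only; in Georgy's scheme the coefficients are drawn at random.

```python
ETX = 3  # ASCII code of the End of Text terminator


def string_hash(s, a, p):
    """Return (sum of a[i] * x[i]) mod p for the ETX-terminated encoding of s.

    `a` is a list of coefficients; it must cover every position of the
    encoded string, including the final ETX.
    """
    codes = [ord(c) for c in s] + [ETX]
    if len(a) < len(codes):
        raise ValueError("not enough coefficients for this string")
    # zip stops at the end of `codes`, so extra coefficients are ignored.
    return sum(ai * xi for ai, xi in zip(a, codes)) % p


# Hypothetical coefficients, chosen by hand so the arithmetic is easy to follow:
# "hi" encodes to [104, 105, 3], and 5*104 + 17*105 + 100*3 = 2605 = 19*131 + 116.
print(string_hash("hi", [5, 17, 100], 131))  # 116
```

Note that a string of length $\ell$ consumes $\ell + 1$ coefficients, because the ETX terminator is hashed as well.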

Answer (a)
Consider two different strings $x = \langle x_0, x_1, \ldots, x_\ell \rangle$ and $y = \langle y_0, y_1, \ldots, y_m \rangle$ (possibly with
different lengths):

(i) Why must there be some index $k$ such that $0 \le k \le \min(\ell, m)$ and $x_k \ne y_k$?
Suppose, for contradiction, that no such index exists, so that $x_i = y_i$ for every $i$
with $0 \le i \le \min(\ell, m)$. There are two cases:
• If the strings are of the same length, $\ell = m$, then they agree at every index from
$0$ to $\ell$, so $x = y$. This contradicts the assumption that $x$ and $y$ are different.
• If the strings are of different lengths, assume without loss of generality that $\ell < m$,
so $\min(\ell, m) = \ell$. Then $y_\ell = x_\ell = \text{ETX}$. But ETX may appear in $y$ only
at its last index $m > \ell$, so $y_\ell \ne \text{ETX}$, which is a contradiction.
• Therefore, in both cases there is an index $k$ with $0 \le k \le \min(\ell, m)$ and
$x_k \ne y_k$. Note that index $\min(\ell, m)$ is a valid position in both strings, because
a string of length $\ell$ occupies the indices $0$ through $\ell$.
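The argument in part (i) can be checked on concrete strings with a small Python sketch. The helper names `encode` and `first_difference` are assumptions introduced here for illustration; they are not part of the assignment.

```python
ETX = 3  # ASCII code of the End of Text terminator


def encode(s):
    """Return the ASCII codes of s followed by the ETX terminator."""
    codes = [ord(c) for c in s]
    if any(c > 127 or c == ETX for c in codes):
        raise ValueError("characters must be ASCII and must not be ETX")
    return codes + [ETX]


def first_difference(x, y):
    """Smallest k with x[k] != y[k], or None if no such k exists.

    Only indices 0 .. min(l, m) are inspected, where l and m are the
    string lengths (the encoded lists have l + 1 and m + 1 entries).
    """
    for k in range(min(len(x), len(y))):
        if x[k] != y[k]:
            return k
    return None


print(first_difference(encode("cat"), encode("car")))   # 2
print(first_difference(encode("cat"), encode("cats")))  # 3: ETX versus 's'
```

In the second call one string is a prefix of the other, and the difference is found exactly at index $\min(\ell, m)$, where the ETX of the shorter string meets an ordinary character of the longer one.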
(ii) Show that $h(x) = h(y)$ if and only if

$$a_k (x_k - y_k) \bmod p = \left( \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \right) \bmod p.$$
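We are given two strings $x = \langle x_0, x_1, \ldots, x_\ell \rangle$ and $y = \langle y_0, y_1, \ldots, y_m \rangle$, and we need to
show that the hash values $h(x)$ and $h(y)$ are equal if and only if the condition

$$a_k (x_k - y_k) \bmod p = \left( \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \right) \bmod p$$

holds, where $k$ is an index with $x_k \ne y_k$ as found in part (i).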

The hash function $h(x)$ for the string $x$ is defined as:

$$h(x) = \left( \sum_{i=0}^{\ell} a_i x_i \right) \bmod p.$$

Similarly, for the string $y$, the hash function is:

$$h(y) = \left( \sum_{i=0}^{m} a_i y_i \right) \bmod p.$$

Now, let us assume that $h(x) = h(y)$. This means:

$$\left( \sum_{i=0}^{\ell} a_i x_i \right) \bmod p = \left( \sum_{i=0}^{m} a_i y_i \right) \bmod p.$$

This can be rewritten as:

$$\sum_{i=0}^{\ell} a_i x_i - \sum_{i=0}^{m} a_i y_i \equiv 0 \pmod{p}.$$

We now separate the sums at the index $k$ where $x_k \ne y_k$. Let's write the sum for each
side, taking care to exclude the term at $k$:

$$\sum_{i=0}^{\ell} a_i x_i = a_k x_k + \sum_{i=0,\, i \ne k}^{\ell} a_i x_i$$

and

$$\sum_{i=0}^{m} a_i y_i = a_k y_k + \sum_{i=0,\, i \ne k}^{m} a_i y_i.$$

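Thus, the difference becomes:

$$\left( a_k x_k + \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \right) - \left( a_k y_k + \sum_{i=0,\, i \ne k}^{m} a_i y_i \right) \equiv 0 \pmod{p}.$$

Simplifying the expression:

$$a_k (x_k - y_k) + \sum_{i=0,\, i \ne k}^{\ell} a_i x_i - \sum_{i=0,\, i \ne k}^{m} a_i y_i \equiv 0 \pmod{p}.$$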

Rearranging terms:

$$a_k (x_k - y_k) \equiv \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \pmod{p}.$$

This completes the proof for the forward direction: if $h(x) = h(y)$, then the condition
holds.
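Now, we prove the reverse direction: if the condition holds, then $h(x) = h(y)$.
Assume that:

$$a_k (x_k - y_k) \equiv \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \pmod{p}.$$

Adding $\sum_{i=0,\, i \ne k}^{\ell} a_i x_i - \sum_{i=0,\, i \ne k}^{m} a_i y_i$ to both sides, and absorbing the terms
$a_k x_k$ and $a_k y_k$ back into their sums, gives:

$$\sum_{i=0}^{\ell} a_i x_i - \sum_{i=0}^{m} a_i y_i \equiv 0 \pmod{p}.$$

Thus:

$$\left( \sum_{i=0}^{\ell} a_i x_i \right) \bmod p = \left( \sum_{i=0}^{m} a_i y_i \right) \bmod p.$$

This shows that $h(x) = h(y)$, completing the proof for the reverse direction.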
Therefore, we have shown that $h(x) = h(y)$ if and only if the condition

$$a_k (x_k - y_k) \bmod p = \left( \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i \right) \bmod p$$

holds, where $k$ is an index with $x_k \ne y_k$.


(b) Assumptions and Dependencies

(i) Where did you use the assumption that $p > 128$?
The assumption is used when solving the congruence from part (a)(ii) for $a_k$. Since
$x_k \ne y_k$ and both values lie between 0 and 127, we have $0 < |x_k - y_k| \le 127 < p$,
so $x_k - y_k \not\equiv 0 \pmod{p}$. Because $p$ is prime, $x_k - y_k$ therefore has a
multiplicative inverse modulo $p$, and exactly one value of $a_k$ satisfies the congruence.

(ii) Where did you use the assumption that the $a_i$'s are independent?
Independence is used to treat the right-hand side of the congruence as a fixed constant.
Once the coefficients $a_i$ with $i \ne k$ are fixed, $a_k$ is still uniformly distributed
over $\{0, 1, 2, \ldots, p - 1\}$, so it equals the single solution with probability $1/p$.

(iii) Show that the probability that $h(x) = h(y)$ is $1/p$.

The hash values $h(x)$ and $h(y)$ depend on the random coefficients
$a_0, a_1, \ldots, a_{\max(\ell, m)}$, each picked from the set $\{0, 1, 2, \ldots, p - 1\}$.
These numbers are chosen independently of each other, and each has an equal chance of
being any number from 0 to $p - 1$.
Let $k$ be an index with $x_k \ne y_k$, which exists by part (i). By part (ii),
$h(x) = h(y)$ exactly when $a_k$ satisfies:

$$a_k (x_k - y_k) \equiv c \pmod{p}, \quad \text{where } c = \sum_{i=0,\, i \ne k}^{m} a_i y_i - \sum_{i=0,\, i \ne k}^{\ell} a_i x_i.$$

Fix the values of all coefficients $a_i$ with $i \ne k$, so that $c$ is a constant. Since
$0 < |x_k - y_k| \le 127 < p$ and $p$ is prime, $x_k - y_k$ has a multiplicative inverse
modulo $p$, and the congruence has exactly one solution, $a_k \equiv c \, (x_k - y_k)^{-1} \pmod{p}$.
Because $a_k$ is independent of the other coefficients and uniform over $p$ possible values,
the chance that it equals this one solution is $1/p$. This holds for every choice of the
other coefficients, so the probability that $h(x) = h(y)$ is $1/p$.
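The $1/p$ claim can be sanity-checked by brute force on very short strings. The sketch below enumerates every possible coefficient vector and counts collisions exactly; the helper names and the choice $p = 131$ (the smallest prime above 128) are assumptions made for illustration, and the enumeration is only feasible for tiny inputs because it visits $p^{n}$ vectors.

```python
from fractions import Fraction
from itertools import product

ETX = 3  # ASCII code of the End of Text terminator


def encode(s):
    """ASCII codes of s followed by the ETX terminator."""
    return [ord(c) for c in s] + [ETX]


def h(codes, a, p):
    """(sum of a[i] * codes[i]) mod p; zip ignores unused coefficients."""
    return sum(ai * xi for ai, xi in zip(a, codes)) % p


def collision_probability(s, t, p):
    """Exact Pr[h(s) = h(t)] over all coefficient vectors in {0..p-1}^n."""
    x, y = encode(s), encode(t)
    n = max(len(x), len(y))  # number of coefficients the longer string needs
    hits = sum(h(x, a, p) == h(y, a, p) for a in product(range(p), repeat=n))
    return Fraction(hits, p ** n)


print(collision_probability("a", "b", 131))  # 1/131 (same length)
print(collision_probability("", "a", 131))   # 1/131 (different lengths)
```

Both cases give exactly $1/131$, matching the argument: for each setting of the other coefficients, one value of $a_k$ produces a collision.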

(b) Where, exactly, in part (a) did you use the assumption that $p > 128$?
Where, exactly, did you use the assumption that the $a_i$'s are independent?
In part (a), the assumption that $p > 128$ was used in the following context:
- It guarantees that $x_k - y_k$ is nonzero modulo $p$. Each character is encoded in
ASCII, which uses values between 0 and 127, so two different characters differ by at
most 127 in absolute value, which is smaller than $p$. Together with the primality of
$p$, this means the congruence $a_k (x_k - y_k) \equiv c \pmod{p}$, where $c$ is the
right-hand side from part (a)(ii), has exactly one solution for $a_k$. If $p \le 127$
were allowed, two different characters could be congruent modulo $p$ (for example, the
codes 0 and $p$); then $x_k - y_k$ would vanish modulo $p$, and the congruence would have
either $p$ solutions or none.
As for the assumption that the $a_i$'s are independent:
- It is used in the probability calculation. The constant $c$ depends on the
coefficients $a_i$ with $i \ne k$. Because $a_k$ is independent of those coefficients,
fixing them does not change the distribution of $a_k$, which stays uniform over
$\{0, 1, 2, \ldots, p - 1\}$. This is what lets us conclude that $a_k$ hits the single
solution with probability exactly $1/p$. If the $a_i$'s were not independent, the value
of $a_k$ could be correlated with $c$, and the collision probability could be larger
than $1/p$.
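To see concretely why the size of $p$ matters, the following Python sketch counts the solutions of $a_k d \equiv c \pmod{p}$ by brute force, where $d$ stands for the character difference $x_k - y_k$. The function name `solutions_for_ak` and the sample values are assumptions chosen for illustration.

```python
def solutions_for_ak(d, c, p):
    """All a_k in {0, ..., p-1} with a_k * d congruent to c modulo p."""
    return [a for a in range(p) if (a * d - c) % p == 0]


# With p = 131 > 128, every possible nonzero character difference
# (between -127 and 127) gives exactly one solution, whatever c is.
p = 131
unique = all(
    len(solutions_for_ak(d, c, p)) == 1
    for d in range(-127, 128) if d != 0
    for c in (0, 1, 57, 130)
)
print(unique)  # True

# With a small prime such as p = 7, the characters 'a' (97) and 'h' (104)
# differ by 7, which is 0 modulo 7: the congruence has 7 solutions or none.
print(len(solutions_for_ak(104 - 97, 0, 7)))  # 7
print(len(solutions_for_ak(104 - 97, 1, 7)))  # 0
```

The second half shows the failure mode: when the difference vanishes modulo $p$, the coefficient $a_k$ no longer influences whether the two strings collide.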

(c) Storing Strings in a Hash Table

Now, Georgy wants to store a set S of n strings in a hash table of size p, where p > n.
He will use chaining to resolve collisions. Answer the following questions:

(i) How should Georgy store the coefficients $a_0, a_1, \ldots$? When should each $a_i$
be chosen?
Georgy should store the coefficients in an array (or another growable list) that is kept
for the lifetime of the hash table. The same coefficients must be used for every insertion
and every search: if a string were hashed with fresh random values each time, a later
search would compute a different hash value and look in the wrong bucket.
There is no need to choose all the $a_i$'s in advance, and since strings may be
arbitrarily long it is not possible to do so. Instead, each $a_i$ can be chosen lazily:
the first time a string of length at least $i$ is hashed, $a_i$ is selected independently
and uniformly at random from the set $\{0, 1, 2, \ldots, p - 1\}$ and appended to the
array. Once chosen, $a_i$ is never changed or discarded.
Therefore, at any time the array holds $a_0, a_1, \ldots, a_L$, where $L$ is the length of
the longest string hashed so far.
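One way to keep the coefficients persistent while choosing them on demand is sketched below in Python. The class name `LazyCoefficients` and its `get` method are hypothetical, and the optional `seed` parameter exists only to make the example reproducible.

```python
import random


class LazyCoefficients:
    """Growable list of coefficients a_0, a_1, ... drawn uniformly from {0..p-1}.

    Each a_i is drawn the first time it is requested and then kept forever,
    so every later hash computation sees the same value.
    """

    def __init__(self, p, seed=None):
        self.p = p
        self._rng = random.Random(seed)
        self._a = []

    def get(self, i):
        # Extend the list up to index i; existing entries are never redrawn.
        while len(self._a) <= i:
            self._a.append(self._rng.randrange(self.p))
        return self._a[i]


coeffs = LazyCoefficients(131, seed=1)
first = coeffs.get(5)          # draws a_0 .. a_5
assert coeffs.get(5) == first  # the stored value is reused, not redrawn
```

The space used grows with the longest string seen so far rather than with the number of strings stored.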

(ii) What is the expected time to search for a string of length $\ell$ in the hash
table, and how can you justify this?
Once the hash table for the set $S$ is built, the expected time to perform a search for
a string of length $\ell$ can be broken down into two parts: the time to compute the hash
function and the time to search within the bucket.
1. **Time to Compute the Hash Function**: The hash function $h(x)$ involves computing
the sum $\sum_{i=0}^{\ell} a_i x_i$ modulo $p$. Assuming each arithmetic operation on values
in $\{0, 1, 2, \ldots, p - 1\}$ takes constant time, the time to compute the hash is
proportional to the length of the string, i.e., $O(\ell)$.
2. **Time to Search in the Bucket**: After computing the hash, we check the corresponding
bucket for the string. Since chaining is used to resolve collisions, the search within a
bucket traverses a linked list of strings. Each string in $S$ other than the one being
sought collides with it with probability $1/p$, so by linearity of expectation the
expected number of such strings in the bucket is at most $n/p$, where $n$ is the number of
strings and $p$ is the size of the hash table. Comparing the sought string with one stored
string takes $O(\ell)$ time, since the comparison stops at the first mismatch or at ETX.
The expected time to search within the bucket is thus $O\left(\ell \left(1 + \frac{n}{p}\right)\right)$:
at most one comparison with the string itself (if it is in $S$) plus an expected $n/p$
comparisons with colliding strings.
Combining both parts, the total expected time to search for a string of length $\ell$ is:

$$O(\ell) + O\left(\ell \left(1 + \frac{n}{p}\right)\right) = O(\ell),$$

where the last step uses $p > n$, so that $n/p < 1$. Thus, the expected time to perform
the search is $O(\ell)$, which covers both the hash computation and the scan of the bucket.
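The search procedure analysed above can be put together in a minimal Python sketch, assuming chaining with plain lists as buckets and lazily drawn coefficients. The class name `ChainedStringTable` and its methods are hypothetical, and Python's built-in string comparison stands in for the character-by-character comparison in the analysis.

```python
import random

ETX = 3  # ASCII code of the End of Text terminator


class ChainedStringTable:
    """Hash table of size p for ASCII strings, with chaining."""

    def __init__(self, p, seed=None):
        self.p = p
        self._rng = random.Random(seed)
        self._a = []                             # coefficients, kept forever
        self._buckets = [[] for _ in range(p)]   # one chain per slot

    def _hash(self, s):
        codes = [ord(c) for c in s] + [ETX]
        # Draw any coefficients not chosen yet; existing ones are reused.
        while len(self._a) < len(codes):
            self._a.append(self._rng.randrange(self.p))
        return sum(a * x for a, x in zip(self._a, codes)) % self.p

    def insert(self, s):
        bucket = self._buckets[self._hash(s)]
        if s not in bucket:
            bucket.append(s)

    def contains(self, s):
        # O(l) to hash, then one comparison per string in the chain.
        return s in self._buckets[self._hash(s)]


table = ChainedStringTable(131, seed=0)
for word in ["hash", "table", "chain", "prime"]:
    table.insert(word)
print(table.contains("chain"))  # True
print(table.contains("zebra"))  # False
```

Because the coefficients are stored in the table object, a search hashes a string to the same bucket that the insertion used.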
