Hashing
So, for example, if you were storing Strings, the hash function
would have to map an arbitrary String to an integer in the
range of the hash table array. Here is an example of such a
hash function (this is a very poor hash function):
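A minimal sketch of such a poor hash function (assuming Java, non-empty lowercase alphabetic strings, and that it looks only at the first letter; the details are an assumption, not the exact example) is:

    // A very poor hash function: map a string to the alphabet position of its
    // first letter. It can only produce the values 0..25, so it is only suited
    // to a table of size 26, and it ignores every character after the first.
    public static int badHash(String key) {
        return key.charAt(0) - 'a';   // assumes a non-empty, all-lowercase string
    }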
What should we do when two values hash to the same location (a collision)? Here is one thing NOT to do:
1) Don't: just keep the old value stored in the hash table and throw away the new value you are trying to insert (or overwrite the old value with the new one); either way, information is lost.
Since good hash functions are difficult to truly analyze, I won't get into much
detail here. (The book does not either.) Ideally, a hash function
should work well for any type of input it could receive. With
that in mind, the ideal hash function simply maps an element
to a random integer in the given range. Thus, the probability
that a randomly chosen element maps to any one of the
possible array locations should be equal.
1) It's designed for an array of only size 26. (Or maybe a bit
bigger if we allow non-alphabetic characters.) Usually hash
tables are larger than this.
Simply put, the mod operator can help us deal with both of
these issues: it keeps the hash value within the bounds of the
table, and (applied carefully) it avoids integer overflow. Change
the function as follows:
f(c0c1...cn) =
(ascii(c0)*128^0 + ascii(c1)*128^1 + ... + ascii(cn)*128^n) mod tablesize
If you apply the mod at each step, then all intermediate values
calculated remain low and no overflow occurs.
Also, using Horner's rule can aid in this computation. The rule states that

c(n)*x^n + c(n-1)*x^(n-1) + ... + c1*x + c0 = c0 + x*(c1 + x*(c2 + ... + x*(c(n-1) + x*c(n))...))

so the hash value can be built up one character at a time.
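A sketch of this computation in Java (assuming a String key and an int tableSize; the method name is illustrative):

    // Computes (ascii(c0)*128^0 + ascii(c1)*128^1 + ... + ascii(cn)*128^n) mod tableSize
    // using Horner's rule, applying the mod at each step so that no intermediate
    // value ever exceeds roughly 128*tableSize (avoiding overflow).
    public static int hash(String key, int tableSize) {
        int h = 0;
        // Work from the last character back to the first, since c0 carries the
        // smallest weight (128^0) and cn the largest (128^n).
        for (int i = key.length() - 1; i >= 0; i--)
            h = (h * 128 + key.charAt(i)) % tableSize;
        return h;
    }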
The problem with simply adding up the ASCII values of the characters
(without the powers of 128) is that if the table size is big, say even
just 10000, then the highest value an 8-letter string could possibly
hash to is 8*127 = 1016. In this situation, you would NOT be using
nearly 90% of the hash locations at all, which would most definitely
result in many values hashing to the same location.
One final note: I have just shown you how to use mod to get a
hash function into the desired range. Another technique that
can be used is the "MAD" method, or the multiply-add-and-divide
method. This uses mod as well. You can read about it on page
348 of the text.
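A rough sketch of a MAD-style compression function (the exact formulation is the one in the text; the constants and names here are made up for illustration):

    // MAD ("multiply-add-and-divide") compression: h(k) = ((a*k + b) mod p) mod N,
    // where N is the table size, p is a prime larger than N, and a, b are fixed
    // constants with a mod p != 0. Assumes k >= 0.
    public static int madCompress(int k, int tableSize) {
        final long p = 1000000007L;   // a prime larger than the table size (illustrative)
        final long a = 123457L;       // illustrative multiplier
        final long b = 98765L;        // illustrative additive constant
        return (int) (((a * k + b) % p) % tableSize);
    }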
Linear Probing
Let's say that our hash table can hold up to 10 integer values
and that it currently looks like this:
index   0    1    2    3    4    5    6    7    8    9
value   -    -    -    173  281  -    461  -    -    -

(A dash marks an empty location.) Now suppose we want to insert 352, and
that 352 hashes to index 3. Location 3 is already full, so with linear
probing we simply try the next location, 4. That is full as well, so we
try location 5, which is empty, and store 352 there:

index   0    1    2    3    4    5    6    7    8    9
value   -    -    -    173  281  352  461  -    -    -
Now, if we want to search for 352, we'd first figure out that its hash
value is 3. Since 352 is NOT stored there, we'd look at location 4.
Since it's not stored there either, we'd finally find 352 in location 5.
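A sketch of this search in Java (assuming the table is an Integer[] where null marks an empty cell, and some hash(int) helper; these names, and the choice of storing Integers, are assumptions, and the table is assumed never to be completely full):

    // Linear probing search: start at the value's hash location and keep moving
    // to the next cell (wrapping around) until we find the value or hit an empty cell.
    public static int find(Integer[] table, int value) {
        int index = hash(value) % table.length;     // e.g., 352 hashes to index 3
        while (table[index] != null) {
            if (table[index] == value)
                return index;                       // found it (352 turns up at index 5)
            index = (index + 1) % table.length;     // location full: try the next one
        }
        return -1;                                  // empty cell reached: value is not stored
    }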
How long does such a search take on average? Let λ be the fraction of the
locations that are filled in the hash table. This means that a (1-λ)
fraction of the locations in the hash table are free. If each location we
probe is equally likely to be filled, then with probability (1-λ) the very
first location we check is empty and the search stops after one probe.
But what if that's not the case? Then the probability that the
first location we check is full but the second is empty is λ(1-λ).
Similarly, the probability that the first two locations are full
but the third is empty is λ²(1-λ). In general, the probability that a
search takes exactly i probes is λ^(i-1)(1-λ), so the expected number
of probes is
\[
\sum_{i=1}^{\infty} i\,\lambda^{i-1}(1-\lambda)
  = (1-\lambda)\sum_{i=1}^{\infty} i\,\lambda^{i-1}
  = (1-\lambda)\sum_{i=1}^{\infty} \frac{d}{d\lambda}\,\lambda^{i}
  = (1-\lambda)\,\frac{d}{d\lambda}\left(\sum_{i=0}^{\infty} \lambda^{i}\right)
  = (1-\lambda)\,\frac{d}{d\lambda}\left(\frac{1}{1-\lambda}\right)
  = (1-\lambda)\cdot\frac{1}{(1-\lambda)^{2}}
  = \frac{1}{1-\lambda}
\]
This analysis assumes each probe is to an independent, random location.
Linear probing tends to create clusters of filled locations, which makes
matters worse: the expected number of probes for an insertion (an
unsuccessful search) is roughly (1/2)(1 + 1/(1-λ)^2). This isn't so bad
if λ is only .5, but as λ gets close to 1, say .9, the average number of
locations to search is (1/2)(1 + 100) = 50 or so, which certainly is not
efficient.
Quadratic Probing

Rather than trying locations hash, hash+1, hash+2, ... when a collision
occurs, quadratic probing tries hash+1^2, hash+2^2, hash+3^2, ... (all
taken mod the table size). Using the same initial table as before:

index   0    1    2    3    4    5    6    7    8    9
value   -    -    -    173  281  -    461  -    -    -

To insert 352, which hashes to 3, we check location 3 (full), then
3+1^2 = 4 (full), then 3+2^2 = 7, which is empty, so 352 is stored in
location 7:

index   0    1    2    3    4    5    6    7    8    9
value   -    -    -    173  281  -    461  352  -    -
The idea when searching is the same here: you keep looking for the
element until you find it or you hit an empty location in your probe
sequence. (In our situation, if we were searching for 863, and let's say
that 863 also hashed to location 3, then we would check location 3,
then location 4, then location 7, and finally location 2, since the
next place to check is (3+3^2) % 10 = 2.)
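A sketch of the corresponding search (same assumptions as the linear probing sketch earlier):

    // Quadratic probing search: the i-th probe looks at (hash + i^2) mod table.length,
    // i.e. locations hash, hash+1, hash+4, hash+9, ... (all reduced mod the table size).
    public static int findQuadratic(Integer[] table, int value) {
        int home = hash(value) % table.length;             // e.g., 863 hashes to index 3
        for (int i = 0; i < table.length; i++) {
            // the (long) cast avoids overflow of i*i for very large tables
            int index = (int) ((home + (long) i * i) % table.length);
            if (table[index] == null)
                return -1;                                 // empty cell: value is not stored
            if (table[index] == value)
                return index;                              // found it
        }
        return -1;                                         // give up after table.length probes
    }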
It can be proven that if we keep our hash table at least half empty
(and the table size is a prime number) and use quadratic probing, we
will ALWAYS be able to find a location for a value to hash to. (Notice
that with linear probing we are guaranteed to cycle through every
possible location when we probe; here it is more difficult to prove
such a property.)
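For reference, a sketch of the standard argument (my summary, not the book's exact proof): suppose the table size p is prime and consider the probe locations h + i^2 (mod p) for i = 0, 1, ..., (p-1)/2. If two of these were equal, say h + i^2 ≡ h + j^2 (mod p) with 0 ≤ i < j ≤ (p-1)/2, then (j-i)(j+i) ≡ 0 (mod p), so the prime p would have to divide either j-i or j+i; but both of those are strictly between 0 and p, which is impossible. So these first ⌈p/2⌉ probe locations are all distinct, and if the table is at least half empty, at least one of them must be free.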
Dynamic Table Expansion
The obvious fix when the hash table gets too full is to expand it:
allocate a new, larger array (typically at least double the old size).
But there are more issues here to think about. How will our hash
function change? And once we change it, what do we have to do with each
value already stored in the hash table?
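A sketch of the expansion step (assuming the same Integer[] representation, hash(int) helper, and linear probing as in the earlier sketches; the choice of new size is up to the caller):

    // Dynamic expansion: allocate a bigger table and re-insert every stored value,
    // since each value's hash location depends on the table size.
    public static Integer[] expand(Integer[] oldTable, int newSize) {
        Integer[] newTable = new Integer[newSize];   // every cell starts out empty (null)
        for (Integer value : oldTable) {
            if (value == null) continue;             // skip empty cells in the old table
            int index = hash(value) % newSize;       // hash with respect to the NEW size
            while (newTable[index] != null)          // resolve collisions (linear probing here)
                index = (index + 1) % newSize;
            newTable[index] = value;
        }
        return newTable;
    }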