Lecture18 PDF

This document contains questions and explanations about hash tables and collision resolution. It discusses hash functions, load factors, rehashing, separate chaining, and implementing a generic hash table library. The key topics covered are what causes collisions, how to select a hash table size, when rehashing is needed, and the pros and cons of separate chaining to resolve collisions.

Uploaded by

dinban1

15-123

Systems Skills in C and Unix

Questions?
• Why do we need hashing?
• Can there be entries in the hash table with the same key?
• Can there be entries in the hash table with the same value?
• Can there be two entries in the hash table with the same key and the same value?

Questions
• What would be a good table size to select, given n keys to insert?
• What is the load factor?
• What would be a good load factor?
• What would you do if the load factor is too high?

Questions
• How would you select a hash function?
• How do you know if your hash function is a good one?
• Is it possible to pick a function that is 1-1? How difficult is it to find one?

What is a collision?
• A collision occurs when two keys map to the same location
• Why do collisions occur?
  • Mainly due to bad hash functions
  • E.g.: imagine hashing 1000 keys, where each key is on average 6 characters long, using a simple function like H(s) = Σ characters and a table size of at least 1001. How many collisions can be expected per cell? (A collision occurs only when the cell is taken and another key wants to map into the same place.)
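
The example above can be tried out with a small experiment. The sketch below assumes that H(s) means "add up the character codes of s", reduced modulo the table size; the names `sum_hash` and `count_collisions` are hypothetical and not part of the lecture's library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed reading of H(s): sum of the character codes,
   reduced modulo the table size. */
size_t sum_hash(const char *s, size_t table_size) {
    assert(table_size > 0);
    size_t sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned char)*s;
    return sum % table_size;
}

/* Count the keys that map to a cell that is already taken.
   Returns (size_t)-1 if the occupancy array cannot be allocated. */
size_t count_collisions(const char *keys[], size_t n, size_t table_size) {
    unsigned char *taken = calloc(table_size, 1);
    if (taken == NULL)
        return (size_t)-1;
    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        size_t cell = sum_hash(keys[i], table_size);
        if (taken[cell])
            collisions++;      /* cell already used by an earlier key */
        else
            taken[cell] = 1;   /* first key to reach this cell */
    }
    free(taken);
    return collisions;
}
```

Under this reading the weakness is easy to see: if every key were exactly six lowercase ASCII letters, the sums would lie between 6 × 97 = 582 and 6 × 122 = 732, so at most 151 of the 1001 cells could ever be used and at least 849 of the 1000 keys would collide. Anagrams such as "abc" and "cba" always collide as well.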

How to resolve collisions

Separate Chaining
• Pros
  • No probing necessary
    • Each node has a place in the chain for its hash code
  • The list never gets full
    • Performance can go down, though, as the chains grow longer
• Cons
  • More complicated implementation: an array of linked lists
  • A large number of collisions can still produce a poorly performing hash table
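
A minimal sketch of separate chaining may make the trade-offs above more concrete. It assumes string keys, int values, and a fixed number of buckets; `NBUCKETS`, `chain_node`, `bucket_of`, `chain_insert` and `chain_find` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8   /* hypothetical fixed table size */

typedef struct chain_node {
    const char *key;            /* not copied; the caller keeps it alive */
    int value;
    struct chain_node *next;    /* next node in the same bucket */
} chain_node;

/* One possible string hash, reduced to a bucket index. */
static size_t bucket_of(const char *key) {
    assert(key != NULL);
    size_t h = 0;
    for (; *key != '\0'; key++)
        h = 31 * h + (unsigned char)*key;
    return h % NBUCKETS;
}

/* Insert at the head of the bucket's list; no probing is needed.
   Returns 0 on success, -1 if memory runs out. */
int chain_insert(chain_node *table[NBUCKETS], const char *key, int value) {
    chain_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    size_t b = bucket_of(key);
    node->key = key;
    node->value = value;
    node->next = table[b];
    table[b] = node;
    return 0;
}

/* Walk the chain for the key's bucket.
   Returns a pointer to the stored value, or NULL if the key is absent. */
int *chain_find(chain_node *table[NBUCKETS], const char *key) {
    for (chain_node *n = table[bucket_of(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return &n->value;
    return NULL;
}
```

Insertion always succeeds while memory is available, which is the sense in which the list never gets full. Lookup cost grows with the length of the chain, which is where the performance penalty comes from. This sketch does not check for duplicate keys: a second insert with the same key simply shadows the first.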

Load factor
• Need to keep the load factor (number of stored keys divided by table size) reasonably under control
• If the load factor becomes too large, rehash

Rehash
• The process of creating a larger table and reinserting the keys so that they are distributed better
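
The two ideas above can be sketched together. The code assumes unsigned integer keys, chained buckets, a growth rule of 2m + 1, and a threshold `MAX_LOAD` of 1.0; all of these, and every name in the block, are illustrative assumptions rather than values given in the lecture.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LOAD 1.0   /* hypothetical threshold */

typedef struct lf_node { unsigned key; struct lf_node *next; } lf_node;

typedef struct {
    lf_node **buckets;
    size_t size;    /* number of buckets, m */
    size_t count;   /* number of stored keys, n */
} lf_table;

/* Load factor = n / m. */
double load_factor(const lf_table *t) {
    assert(t->size > 0);
    return (double)t->count / (double)t->size;
}

/* Move every node into a new bucket array of new_size cells.
   Returns 0 on success, -1 (table unchanged) if allocation fails. */
int rehash(lf_table *t, size_t new_size) {
    assert(new_size > 0);
    lf_node **fresh = calloc(new_size, sizeof *fresh);
    if (fresh == NULL)
        return -1;
    for (size_t i = 0; i < t->size; i++) {
        lf_node *n = t->buckets[i];
        while (n != NULL) {
            lf_node *next = n->next;
            size_t b = n->key % new_size;   /* index in the larger table */
            n->next = fresh[b];
            fresh[b] = n;
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = fresh;
    t->size = new_size;
    return 0;
}

/* Insert a key, then grow the table if the load factor exceeds MAX_LOAD. */
int lf_insert(lf_table *t, unsigned key) {
    lf_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t b = key % t->size;
    n->key = key;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    t->count++;
    if (load_factor(t) > MAX_LOAD)
        (void)rehash(t, 2 * t->size + 1);   /* on failure the table stays valid */
    return 0;
}

/* Returns 1 if the key is stored, 0 otherwise. */
int lf_contains(const lf_table *t, unsigned key) {
    for (const lf_node *n = t->buckets[key % t->size]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Because a key's bucket index depends on the table size, every key has to be re-indexed when the table grows; copying the old array is not enough.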

Implementing a generic hash table
• Library design considerations
  • hash_node: a node that contains (key, value, next)
  • A struct that contains
    • Array of hash_node*s
    • Size of the table
    • Function pointers
      • equal: compare two elements and return success (equal) or failure (not equal)
      • free_key, free_value

Client considerations
• Must provide a hash function
  • It is also possible to provide a generic hash function, as the Java API does
• Must allocate memory for key and value (if necessary)

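Implementation
• hashlib.h: library interface, included by client.c and hashlib.c
• hashlib.c: library implementation, compiled to hashlib.o
• client.c: client program
• a.out: executable produced by linking the compiled client with hashlib.o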

Data Structures
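
The node and table described under the library design considerations might be declared along the following lines. This is a sketch, not the course's actual hashlib.h: the field names, the convention that `equal` returns nonzero for equal keys, and the helper `free_chain` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* A node holding (key, value, next); void* keeps it generic. */
typedef struct hash_node {
    void *key;
    void *value;
    struct hash_node *next;
} hash_node;

/* The table: array of hash_node*, its size, and client callbacks. */
typedef struct hash_table {
    hash_node **table;                 /* one chain head per cell */
    int size;                          /* number of cells */
    int (*equal)(void *a, void *b);    /* assumed: nonzero means equal */
    void (*free_key)(void *key);       /* may be NULL if keys are not owned */
    void (*free_value)(void *value);   /* may be NULL if values are not owned */
} hash_table;

/* Hypothetical helper: release one chain using the client's callbacks.
   Returns the number of nodes released. */
int free_chain(const hash_table *t, hash_node *n) {
    assert(t != NULL);
    int released = 0;
    while (n != NULL) {
        hash_node *next = n->next;
        if (t->free_key != NULL)
            t->free_key(n->key);
        if (t->free_value != NULL)
            t->free_value(n->value);
        free(n);
        released++;
        n = next;
    }
    return released;
}
```

Storing the callbacks in the table is what lets the library stay generic: it never needs to know what a key or value really is, only how to compare and release them.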

Library Interface
• ht_init
• ht_insert
• ht_retrieve
• ht_rehash
• ht_set functions
  • equal, free_key, free_value
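
To show how a client might use such an interface, here is a rough sketch of three of the functions listed above. Only the names `ht_init`, `ht_insert` and `ht_retrieve` come from the slides; their signatures, the `ht` and `ht_node` types, and the callbacks `int_hash` and `int_equal` are assumptions. The hash callback here takes only the key and the library reduces the result modulo the table size, a simplification relative to the two-argument `hashcode` in the client implementation.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ht_node {
    void *key;
    void *value;
    struct ht_node *next;
} ht_node;

typedef struct {
    ht_node **cells;
    int size;
    unsigned (*hash)(void *key);       /* client-supplied (assumed shape) */
    int (*equal)(void *a, void *b);    /* assumed: nonzero means equal */
} ht;

/* Hypothetical signature. Returns 0 on success, -1 if out of memory. */
int ht_init(ht *t, int size, unsigned (*hash)(void *),
            int (*equal)(void *, void *)) {
    assert(t != NULL && size > 0 && hash != NULL && equal != NULL);
    t->cells = calloc((size_t)size, sizeof *t->cells);
    if (t->cells == NULL)
        return -1;
    t->size = size;
    t->hash = hash;
    t->equal = equal;
    return 0;
}

/* Hypothetical signature. Chains the new entry at the head of its cell. */
int ht_insert(ht *t, void *key, void *value) {
    ht_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    unsigned cell = t->hash(key) % (unsigned)t->size;
    n->key = key;
    n->value = value;
    n->next = t->cells[cell];
    t->cells[cell] = n;
    return 0;
}

/* Hypothetical signature. Returns the stored value, or NULL if absent. */
void *ht_retrieve(const ht *t, void *key) {
    unsigned cell = t->hash(key) % (unsigned)t->size;
    for (ht_node *n = t->cells[cell]; n != NULL; n = n->next)
        if (t->equal(n->key, key))
            return n->value;
    return NULL;
}

/* Example client callbacks for keys that point to int. */
unsigned int_hash(void *key) { return (unsigned)*(int *)key; }
int int_equal(void *a, void *b) { return *(int *)a == *(int *)b; }
```

With a table of 7 cells, the keys 10 and 17 both land in cell 3, so retrieving both exercises the chain and the client's `equal` callback.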

Client implementation
int hashcode(void* s, int m) {
    /* this takes a pointer to a key and
       computes the hash code. m is string size */
}

Code Examples
