CS235102 Data Structures
Chapter 8 Hashing
Implementations
Binary search tree: O(n) time per operation in the worst case.
Some other binary search trees (Chapter 10): O(log n).
Hashing
A technique for search, insert, and delete operations
that has very good expected performance.
Search Techniques
Search tree methods: rely on identifier comparisons.
Hashing methods: rely on a formula called the hash function.
Types of hashing
Static hashing
Dynamic hashing
Hash table, ht
Stored in sequential memory locations that are partitioned into b buckets, ht[0], ..., ht[b-1].
Each bucket has s slots.
The hash function f(x) maps each identifier x to a bucket address in the range 0 ... (b-1).
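[Figure: hash table ht with b buckets, numbered 0 to b-1, each holding s slots, numbered 1 to s]
Synonyms: identifiers x1 and x2 with f(x1) = f(x2), i.e., identifiers that hash to the same bucket.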
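Folding: partition the identifier x = 12320324111220 into P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20.
Shift folding: f(x) = 123 + 203 + 241 + 112 + 20 = 699
Folding at the boundaries: every other partition is reversed (MSD ---> LSD, LSD <--- MSD), so P2 and P4 become 302 and 211:
f(x) = 123 + 302 + 241 + 211 + 20 = 897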
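Digit analysis: each of the m identifiers is a string of n digits
X1: d11 d12 ... d1n
X2: d21 d22 ... d2n
...
Xm: dm1 dm2 ... dmn
Select 3 of the n digit positions to form the bucket address.
Criterion: delete the digit positions having the most skewed distributions.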
[Figure: insertion into a hash table with linear probing; the last identifier entered is ctime]
Drawback of linear probing: identifiers tend to cluster together, and adjacent clusters tend to coalesce; both effects increase the search time.
Example: suppose we enter the C built-in functions into a 26-bucket hash table in order. The hash function uses the first character of each function name.
Enter sequence: acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime
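Hash table with linear probing (26 buckets, 1 slot per bucket):
bucket   identifier   buckets searched
0        acos         1
1        atoi         2
2        char         1
3        define       1
4        exp          1
5        ceil         4
6        cos          5
7        float        3
8        atol         9
9        floor        5
10       ctime        9
11-25    (empty)
# of key comparisons = 41/11 ≈ 3.73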
Other overflow-handling methods:
Quadratic probing
Rehashing
Random probing
Rehashing: try f1, f2, ..., fm in sequence if a collision occurs.
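Disadvantage of these open-addressing methods: a search may compare the identifier with identifiers that have different hash values.
Chaining: keep a linked chain of identifiers for each bucket to resolve collisions, so a search compares only synonyms.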
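Quadratic probing examines buckets f(x), (f(x) + i^2) % b and (f(x) - i^2) % b for 1 <= i <= (b-1)/2. When b is a prime of the form 4j+3, this sequence examines every bucket.
Some primes of the form 4j+3:
Prime   j      Prime   j
3       0      43      10
7       1      59      14
11      2      127     31
19      4      251     62
23      5      503     125
31      7      1019    254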
Chaining with the same 11 identifiers and 26 buckets: # of key comparisons = 21/11 ≈ 1.91
Comparison:
Dynamic Hashing
Dynamic hashing using directories
Analysis of directory dynamic hashing
simulation