
Sinhgad Technical Education Society’s

SINHGAD INSTITUTE OF TECHNOLOGY


Gat no. 309/310, Off Mumbai-Pune Expressway, Kusgaon (Bk.) Lonavala -410401.

UNIT- V
TABLES
CONTENTS

1. Symbol Tables: Notion of symbol table, concept of red-black trees, AVL trees
2. OBST, Huffman's algorithm
3. Heap data structure, min and max heap, heapsort implementation, applications of heap
4. Hash tables and scattered tables: basic concepts, hash function, characteristics of a good hash function, different key-to-address transformation techniques
5. Synonyms or collisions, collision resolution techniques: linear probing, quadratic probing, rehashing
6. Chaining without replacement and chaining with replacement
.

PART-1

.

Symbol Table

A symbol table is an ADT for storing names encountered in the source
program along with their attributes.
Typically used in compilers.
Types of symbol table:
Static
Dynamic
Operations on a symbol table:
1) Insertion
2) Deletion
3) Searching
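A minimal sketch of how such a table might be organized in C; the struct
fields, sizes, and function names here are illustrative assumptions, not
a prescribed layout:

/* Sketch of a static symbol table (fields and sizes are assumptions). */
#include <string.h>

#define MAXSYM 100

struct symbol {
    char name[32];   /* identifier encountered in the source program */
    char type[16];   /* attribute: data type */
    int  scope;      /* attribute: scope level */
};

struct symtab {
    struct symbol entries[MAXSYM];
    int count;
};

/* Searching: linear scan; returns the index or -1 if absent. */
int st_search(struct symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return i;
    return -1;
}

/* Insertion: append if not already present; returns index or -1 if full. */
int st_insert(struct symtab *t, const char *name, const char *type, int scope) {
    int i = st_search(t, name);
    if (i >= 0) return i;
    if (t->count == MAXSYM) return -1;
    strcpy(t->entries[t->count].name, name);
    strcpy(t->entries[t->count].type, type);
    t->entries[t->count].scope = scope;
    return t->count++;
}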
.

AVL TREES

An AVL tree is a height-balanced binary search tree.
AVL is named for its inventors: Adel'son-Vel'skii and Landis.

In an AVL tree:
Balance factor = height of left subtree – height of right subtree
The balance factor of every node must always be -1, 0, or 1.


.

Balance Factor
The balance factor of a node is defined as the height of the left
subtree of the node minus the height of the right subtree of the node.
B.F of a node = ht(left subtree) – ht(right subtree)
For AVL trees, the B.F, which is a measure of imbalance, must be in
the range -1, 0, or 1.
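As a sketch in C (the node layout is an assumption; an empty subtree is
given height -1 here so that a leaf has height 0):

struct node { int key; struct node *lc, *rc; };

/* Height of a subtree; the empty subtree counts as -1, a leaf as 0. */
int height(struct node *t) {
    if (t == NULL) return -1;
    int hl = height(t->lc), hr = height(t->rc);
    return (hl > hr ? hl : hr) + 1;
}

/* B.F = ht(left subtree) - ht(right subtree); AVL requires -1, 0, or 1. */
int balance_factor(struct node *t) {
    return height(t->lc) - height(t->rc);
}

In practice an AVL node caches its balance factor in a field rather than
recomputing heights on every check.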
.

[Figure: example BST. Root 20; second level 15, 35; third level 10, 18,
25, 40; fourth level 30, 38, 45; fifth level 50]
.

[Figure: the same tree annotated with balance factors. The root 20 has
balance factor -2, so the tree is not height balanced]
.

4 Types of Rotations to be applied when the tree is imbalanced

Single rotations
LL Rotation
RR Rotation
Double rotations
LR Rotation
RL Rotation
.

LL Rotations

When a node is added to the left of the left son, such that the
balance factor of the current node becomes 2, balance is restored by
an LL rotation.
.

LL Rotations
[Figure: T1 (B.F. 2) with left child T2 (B.F. 1) and right subtree d;
T2 has left child T3 (over subtrees a and b) and right subtree c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes 2 and T2's B.F. is 1, an LL rotation must be applied, since
the new node was added to the left of T1 and to the left of T2.
.

LL Rotations
[Figure: before balancing, T1 (B.F. 2) has left child T2 (B.F. 1);
T2's left child T3 sits over subtrees a and b, with subtree c under T2
and subtree d under T1. After the LL rotation, T2 is the root with
children T3 and T1; the subtrees left to right are a, b, c, d]


.

LL Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(d) = x.
Since B.F(T1) = 2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(c) = x.

After balancing:
B.F(T1) = ht(c) – ht(d) = x – x = 0
B.F(T3) = original (unchanged)
B.F(T2) = ht(T3) – ht(T1) = (x + 1) – (x + 1) = 0
.

LL Rotation implementation
1. Let T1 be the node with B.F. 2 and let its parent be par
2. T2 = T1->lc (T2 has B.F. 1)
3. T1->lc = T2->rc
4. T2->rc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par
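The same steps transcribed into C, as a sketch; the node struct and its
bf field are assumptions, and the symmetric RR rotation is obtained by
swapping lc and rc throughout:

struct avlnode { int key, bf; struct avlnode *lc, *rc; };

/* LL rotation: T1 has bf 2 and its left child T2 has bf 1.
   Returns the new subtree root (T2) so the caller can attach it
   to the parent node par. */
struct avlnode *rotate_ll(struct avlnode *t1) {
    struct avlnode *t2 = t1->lc;  /* step 2 */
    t1->lc = t2->rc;              /* step 3 */
    t2->rc = t1;                  /* step 4 */
    t1->bf = 0;                   /* step 5 */
    t2->bf = 0;                   /* step 6 */
    return t2;                    /* step 7: attach to par */
}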
.

RR Rotations
When a node is added to the right of the right son, such that the
balance factor of the current node becomes -2, balance is restored by
an RR rotation.
.

RR Rotations
[Figure: T1 (B.F. -2) with left subtree a and right child T2 (B.F. -1);
T2 has left subtree b and right child T3 over subtrees c and d]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes -2 and T2's B.F. is -1, an RR rotation must be applied, since
the new node was added to the right of T1 and to the right of T2.
.

RR Rotations
[Figure: before balancing, T1 (B.F. -2) has right child T2 (B.F. -1);
T2's right child T3 sits over subtrees c and d, with subtree b under T2
and subtree a under T1. After the RR rotation, T2 is the root with
children T1 and T3; the subtrees left to right are a, b, c, d]


.

RR Rotation implementation
1. Let T1 be the node with B.F. -2 and let its parent be par
2. T2 = T1->rc (T2 has B.F. -1)
3. T1->rc = T2->lc
4. T2->lc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par
.

LR Rotations
When a node is added to the right of the left son, such that the
balance factor of the current node becomes 2 and that of its left son
becomes -1, balance is restored by an LR rotation.
.

LR Rotations
[Figure: T1 (B.F. 2) with right subtree d and left child T2 (B.F. -1);
T2 has left subtree a and right child T3 over subtrees b and c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes 2 and T2's B.F. is -1, an LR rotation must be applied, since
the new node was added to the left of T1 and to the right of T2.
.

LR Rotations
[Figure: before balancing, T1 (B.F. 2) has left child T2 (B.F. -1);
T2's right child T3 sits over subtrees b and c, with subtree a under T2
and subtree d under T1. After the LR rotation, T3 is the root with
children T2 and T1; the subtrees left to right are a, b, c, d]


.

LR Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(d) = x.
Since B.F(T1) = 2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(a) = x.

After balancing:
ht(T2) = max(ht(a), ht(b)) + 1 = max(x, value not greater than x) + 1 = x + 1
ht(T1) = max(ht(c), ht(d)) + 1 = max(value not greater than x, x) + 1 = x + 1
B.F(T3) = 0
.
In fact, the B.F. of T3 can be +1 or -1 after the node is added.
Case 1: B.F(T3) = 1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(b) = ht(c) + 1
But ht(T3) = x + 1, so ht(b) = x and ht(c) = x – 1.

After balancing:
B.F(T2) = ht(a) – ht(b) = x – x = 0
B.F(T1) = ht(c) – ht(d) = (x – 1) – x = -1
.
Case 2: B.F(T3) = -1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(c) = ht(b) + 1
But ht(T3) = x + 1, so ht(c) = x and ht(b) = x – 1.

After balancing:
B.F(T2) = ht(a) – ht(b) = x – (x – 1) = 1
B.F(T1) = ht(c) – ht(d) = x – x = 0
.
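The LR double rotation can also be written directly, as in this sketch
(assuming the avlnode struct from the LL sketch); the balance factors
are set according to the three cases just derived, and the RL rotation
is the mirror image:

/* LR rotation: T1 has bf 2, its left child T2 has bf -1, T3 = T2->rc.
   Returns the new subtree root (T3) to be attached to the parent. */
struct avlnode *rotate_lr(struct avlnode *t1) {
    struct avlnode *t2 = t1->lc;
    struct avlnode *t3 = t2->rc;
    t2->rc = t3->lc;                   /* subtree b moves under T2 */
    t1->lc = t3->rc;                   /* subtree c moves under T1 */
    t3->lc = t2;
    t3->rc = t1;
    /* balance factors follow from T3's old value (the cases above) */
    t2->bf = (t3->bf == -1) ? 1 : 0;   /* case 2 gives B.F(T2) = 1 */
    t1->bf = (t3->bf == 1) ? -1 : 0;   /* case 1 gives B.F(T1) = -1 */
    t3->bf = 0;
    return t3;
}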

RL Rotations
When a node is added to the left of the right son, such that the
balance factor of the current node becomes -2 and that of its right
son becomes 1, balance is restored by an RL rotation.
.

RL Rotations
[Figure: T1 (B.F. -2) with left subtree a and right child T2 (B.F. 1);
T2 has right subtree d and left child T3 over subtrees b and c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes -2 and T2's B.F. is 1, an RL rotation must be applied, since
the new node was added to the right of T1 and to the left of T2.
.

RL Rotations
[Figure: before balancing, T1 (B.F. -2) has right child T2 (B.F. 1);
T2's left child T3 sits over subtrees b and c, with subtree d under T2
and subtree a under T1. After the RL rotation, T3 is the root with
children T1 and T2; the subtrees left to right are a, b, c, d]


RL Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(a) = x.
Since B.F(T1) = -2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(d) = x.

After balancing:
ht(T1) = max(ht(a), ht(b)) + 1 = max(x, value not greater than x) + 1 = x + 1
ht(T2) = max(ht(c), ht(d)) + 1 = max(value not greater than x, x) + 1 = x + 1
B.F(T3) = 0
In fact, the B.F. of T3 can be +1 or -1 after the node is added.
Case 1: B.F(T3) = 1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(b) = ht(c) + 1
But ht(T3) = x + 1, so ht(b) = x and ht(c) = x – 1.

After balancing:
B.F(T2) = ht(c) – ht(d) = (x – 1) – x = -1
B.F(T1) = ht(a) – ht(b) = x – x = 0
Case 2: B.F(T3) = -1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(c) = ht(b) + 1
But ht(T3) = x + 1, so ht(c) = x and ht(b) = x – 1.

After balancing:
B.F(T2) = ht(c) – ht(d) = x – x = 0
B.F(T1) = ht(a) – ht(b) = x – (x – 1) = 1
.

AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL tree.
[Figure: root 14; children 7, 17; then 4, 11, 53; 13 as the right
child of 11]
.

AVL Tree Example:
• Now insert 12.
[Figure: 12 is inserted as the left child of 13, below 11; node 11
becomes unbalanced]
.

AVL Tree Example:
• A double rotation is applied at node 11.
[Figure: intermediate step in the subtree 11, 13, 12; 12 is now the
right child of 11 with 13 as its right child]
.

AVL Tree Example:
• Now the AVL tree is balanced.
[Figure: root 14; children 7, 17; then 4, 12, 53; 11 and 13 as the
children of 12]
.

AVL Tree Example:
• Now insert 8.
[Figure: the balanced tree from the previous step, before 8 is
inserted]
.

AVL Tree Example:
• Inserting 8 below 11 unbalances the tree.
[Figure: intermediate step of the double rotation; 11 now holds 8 and
12 as children, with 13 under 12]
.

AVL Tree Example:
• Now the AVL tree is balanced.
[Figure: root 14; children 11, 17; 11 has children 7 and 12; 7 has
children 4 and 8; 13 under 12; 53 under 17]
.

AVL Tree Example:
• Now remove 53.
[Figure: the same tree, with 53 still present as the right child of 17]
.

AVL Tree Example:
• Removing 53 leaves the tree unbalanced.
[Figure: 17 is now a leaf, and node 14 is unbalanced]
.

AVL Tree Example:
• Balanced!
[Figure: root 11; children 7, 14; 7 has children 4 and 8; 14 has
children 12 and 17; 13 under 12]
.

Red and black tree

It is a binary search tree with the following properties:
1) Every node is either red or black.
2) Every leaf is black.
3) Both children of a red node are black.
4) Every path from a node to a descendant leaf contains the same
number of black nodes.
.

Red & Black Trees

Red-black trees offer worst-case time complexity O(lg n) for
insertion, deletion, and search. This makes them valuable in
time-sensitive applications.
.

Optimal Binary Search Trees (***)

Given a fixed set of identifiers, we wish to create a binary search
tree organization.
We may expect different binary search trees for the same identifier
set to have different performance characteristics.
Two possible BSTs

[Figure: two different binary search trees over the identifiers
do, for, if, int, while]

Average number of comparisons:
Tree (a): (1 + 2 + 2 + 3 + 4) / 5 = 12/5
Tree (b): (1 + 2 + 2 + 3 + 3) / 5 = 11/5

Here each identifier is searched for with equal probability and no
unsuccessful searches are made.
.

In a general situation, we can expect different identifiers to be
searched for with different frequencies (or probabilities).
In addition, we can expect unsuccessful searches to be made.
.

Let us assume that the given set of identifiers is {a1, a2, ..., an}
with a1 < a2 < ... < an.
Let p(i) be the probability with which we search for ai.
Let q(i) be the probability that the identifier x being searched for
satisfies ai < x < ai+1, 0 ≤ i ≤ n
(assume a0 = -∞ and an+1 = +∞).
∑ 1 ≤ i ≤ n p(i) is the probability of a successful search.
∑ 0 ≤ i ≤ n q(i) is the probability of an unsuccessful search.
Clearly ∑ 1 ≤ i ≤ n p(i) + ∑ 0 ≤ i ≤ n q(i) = 1.
Given this data, we wish to construct an optimal binary search tree
for {a1, a2, ..., an}.
.

Cost function for BST

It will be useful to add a fictitious node in place of every empty
subtree in the search tree. Such nodes, called external nodes, are
drawn as squares.
All other nodes are internal nodes.
If a BST represents n identifiers, then there will be exactly n
internal nodes and n+1 (fictitious) external nodes.
Every internal node represents a point where a successful search may
terminate.
Every external node represents a point where an unsuccessful search
may terminate.
BST with external nodes

[Figure: one of the BSTs above with its internal nodes drawn as
circles and the fictitious external nodes drawn as squares in place of
every empty subtree]
.

If a successful search terminates at an internal node at level l, then
l comparisons are needed.
Unsuccessful searches terminate at external nodes.
The identifiers not present in the BST can be partitioned into n+1
equivalence classes Ei, 0 ≤ i ≤ n.
The class E0 contains all identifiers x such that x < a1.
The class Ei, 1 ≤ i < n, contains all identifiers x such that
ai < x < ai+1, and the class En contains all identifiers x > an.
If the failure node for Ei is at level l, then only l – 1 comparisons
are needed.
.

Finding the cost (performance)

Expected cost contribution from the internal node for ai:
p(i) * level(ai)
Expected cost contribution from the external node for Ei:
q(i) * (level(Ei) – 1)
.

PART-2

.

OBST
Formula for the expected cost of a BST:
∑ 1 ≤ i ≤ n p(i) * level(ai) + ∑ 0 ≤ i ≤ n q(i) * (level(Ei) – 1)
We define an optimal binary search tree for the identifier set
{a1, a2, ..., an} to be a binary search tree for which this cost is
minimum.
.

Example
Identifier set {a1, a2, a3} = {do, if, while}
Case 1: equal probabilities, p(i) = q(i) = 1/7 for all i.
Case 2: {p1, p2, p3} = {0.5, 0.1, 0.05} and
{q0, q1, q2, q3} = {0.15, 0.1, 0.05, 0.05}.
.

[Figure: the five possible binary search trees (a) to (e) over
{do, if, while}]
.

Case 1 with equal probabilities


Cost ( tree a ) = 15 / 7
Cost ( tree b ) = 13 / 7
Cost ( tree c ) = 15 / 7
Cost ( tree d ) = 15 / 7
Cost ( tree e ) = 15 / 7
Case 2
Cost ( tree a ) = 2.65
Cost ( tree b ) = 1.9
Cost ( tree c ) = 1.5
Cost ( tree d ) = 2.05
Cost ( tree e ) = 1.6
.

Tabular method (dynamic programming)

w(i, j) = p(j) + q(j) + w(i, j-1)
c(i, j) = min over i < k ≤ j of { c(i, k-1) + c(k, j) } + w(i, j)
r(i, j) = the value of k that minimizes the above equation
Initial values:
w(i, i) = q(i)
c(i, i) = 0
r(i, i) = 0
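A sketch of the tabular method in C, under the assumption that p[1..n]
and q[0..n] are supplied as doubles; the array names follow the
recurrences above and N is a sketch-only size limit:

#include <stdio.h>

#define N 3

double w[N+1][N+1], c[N+1][N+1];
int    r[N+1][N+1];

void obst(const double p[], const double q[], int n) {
    for (int i = 0; i <= n; i++) {       /* initial values */
        w[i][i] = q[i];
        c[i][i] = 0.0;
        r[i][i] = 0;
    }
    for (int len = 1; len <= n; len++) {
        for (int i = 0; i + len <= n; i++) {
            int j = i + len;
            w[i][j] = p[j] + q[j] + w[i][j-1];
            c[i][j] = -1.0;
            for (int k = i + 1; k <= j; k++) {   /* try each root ak */
                double cost = c[i][k-1] + c[k][j];
                if (c[i][j] < 0.0 || cost < c[i][j]) {
                    c[i][j] = cost;
                    r[i][j] = k;   /* the k that minimizes the equation */
                }
            }
            c[i][j] += w[i][j];
        }
    }
}

int main(void) {
    /* Case 1 of the example: {do, if, while}, all p(i) = q(i) = 1/7 */
    double p[] = {0.0, 1.0/7, 1.0/7, 1.0/7};
    double q[] = {1.0/7, 1.0/7, 1.0/7, 1.0/7};
    obst(p, q, 3);
    printf("minimum cost = %f, root = a%d\n", c[0][3], r[0][3]);
    return 0;
}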
.

Optimal search tree for the example

[Figure: root if; children do and int; while as the right child of int]

Minimum cost of the BST = 32


.

Huffman coding
Proposed by Dr. David A. Huffman in 1952 in "A Method for the
Construction of Minimum-Redundancy Codes".
Applicable to many forms of data transmission.

Basic Huffman algorithm (a sketch of step 3 follows this list):
1. Scan the text to be compressed.
2. Sort characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the sorted list.
4. Perform a traversal of the tree to determine all code words.
5. Create the new file using the Huffman codes.
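A sketch of the tree-building step in C; for brevity it repeatedly
selects the two smallest live frequencies instead of using a heap, and
all names are illustrative:

#include <stdlib.h>

struct hnode {
    int freq;
    struct hnode *left, *right;   /* both NULL for leaves */
};

/* Index of the smallest-frequency node still alive in the pool. */
static int pick_min(struct hnode *pool[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (pool[i] && (best < 0 || pool[i]->freq < pool[best]->freq))
            best = i;
    return best;
}

/* Builds the Huffman tree over n leaf frequencies; returns the root. */
struct hnode *huffman(const int freq[], int n) {
    struct hnode **pool = malloc(n * sizeof *pool);
    for (int i = 0; i < n; i++) {
        pool[i] = malloc(sizeof **pool);
        pool[i]->freq = freq[i];
        pool[i]->left = pool[i]->right = NULL;
    }
    for (int live = n; live > 1; live--) {   /* n-1 merges in total */
        int a = pick_min(pool, n);
        struct hnode *x = pool[a]; pool[a] = NULL;
        int b = pick_min(pool, n);
        struct hnode *m = malloc(sizeof *m);
        m->freq = x->freq + pool[b]->freq;   /* merged frequency */
        m->left = x; m->right = pool[b];
        pool[b] = m;                         /* merged node replaces y */
    }
    struct hnode *root = pool[pick_min(pool, n)];
    free(pool);
    return root;
}

The code words are then read off by a traversal (step 4), emitting 0
for a left branch and 1 for a right branch.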
.

A text message can be converted into a sequence of 0s and 1s by
replacing each character of the message with its code.

Huffman coding produces a prefix code, i.e., no code word is a prefix
of another code word.
.

Draw the Huffman tree for the given data set (total frequency = 112).

Data   Frequency   Huffman Code
P      18
Q      8
R      15
S      2
T      25
U      13
V      5
W      26
.

Draw the Huffman tree for the frequencies
10, 3, 4, 15, 2, 4, 2, 3, 6, 8, 7, 5, 12, 5 (total = 86).
.

Draw the Huffman tree for the given data set (total probability = 1).

Data   Probability   Huffman Code
A      0.07
B      0.09
C      0.12
D      0.22
E      0.23
F      0.27
.

PART-3

.

Heapsort
Combines the best features of two sorting algorithms.
Introduces another algorithm design technique: the use of a data
structure called a heap.
The heap data structure is useful for heapsort and also makes an
efficient priority queue.
.

Heap
The (binary) heap data structure is an array object that
can be viewed as a nearly complete binary tree.
Each node of the tree corresponds to an element of the
array that stores the value in the node.
The tree is completely filled on all levels except
possibly the lowest, which is filled from the left up to a
point.
.

Heap
An array A that represents a heap is an object with two attributes:
lengthA, which is the number of elements in the array, and heapSizeA,
the number of elements of the heap stored within array A.
That is, although A[1 .. lengthA] may contain valid numbers, no
element past A[heapSizeA], where heapSizeA ≤ lengthA, is an element of
the heap.
A max-heap viewed as a binary tree and as an array:

Index:  1   2   3   4   5   6   7   8   9   10
Value: 16  14  10   8   7   9   3   2   4    1

[Figure: the corresponding complete binary tree, with 16 at the root]
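The Parent, Left, and Right references used below follow directly from
this array layout; a sketch with 1-based indexing:

/* 1-based array indexing of a heap, matching the figure above. */
#define PARENT(i) ((i) / 2)
#define LEFT(i)   (2 * (i))
#define RIGHT(i)  (2 * (i) + 1)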
.

Types
Max-heaps and min-heaps.
In both kinds, the values in the nodes satisfy a heap property, the
specifics of which depend on the kind of heap.
In a max-heap, the max-heap property is that for every node i other
than the root,
A[Parent(i)] >= A[i]
In a min-heap, the min-heap property is that for every node i other
than the root,
A[Parent(i)] <= A[i]
Five Basic Procedures
.

The MAX-HEAPIFY procedure, which runs in O(lg n)


time, is the key to maintaining the max-heap property.
The BUILD-MAX-HEAP procedure, which runs in
linear time, produces a max-heap from an unordered
input array.
The HEAPSORT procedure, which runs in O(n lg n)
time, sorts an array in place.
The MAX-HEAP-INSERT and HEAP-EXTRACT-MAX procedures, which run in
O(lg n) time, allow the heap data structure to be used as a priority
queue.
.

Maintaining the heap property


MAX-HEAPIFY is an important subroutine for manipulating max-heaps.
Its inputs are an array A and an index i into the array.
When MAX-HEAPIFY is called, it is assumed that the binary trees rooted
at Left(i) and Right(i) are max-heaps, but A[i] may be smaller than
its children, thus violating the max-heap property.
The function of MAX-HEAPIFY is to let the value at A[i] float down in
the max-heap so that the subtree rooted at index i becomes a max-heap.
MAX-HEAPIFY(A, i)
.

1. l ← Left(i)
2. r ← Right(i)
3. if l ≤ heapSizeA and A[l] > A[i]
4.     then largest ← l
5.     else largest ← i
6. if r ≤ heapSizeA and A[r] > A[largest]
7.     then largest ← r
8. if largest ≠ i
9.     then exchange A[i] ↔ A[largest]
10.         MAX-HEAPIFY(A, largest)
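The same procedure in C, as a sketch; it assumes a 1-based array (A[0]
unused) and the index macros given earlier:

/* Let the value at A[i] float down so the subtree rooted at i
   becomes a max-heap; the subtrees at Left(i) and Right(i) are
   assumed to already be max-heaps. */
void max_heapify(int A[], int heapsize, int i) {
    int l = LEFT(i), r = RIGHT(i);
    int largest = i;
    if (l <= heapsize && A[l] > A[i])
        largest = l;
    if (r <= heapsize && A[r] > A[largest])
        largest = r;
    if (largest != i) {
        int tmp = A[i]; A[i] = A[largest]; A[largest] = tmp;  /* exchange */
        max_heapify(A, heapsize, largest);
    }
}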
.

[Figures: the action of MAX-HEAPIFY(A, 2) on the array above.
(a) Node i = 2 holds 4, violating the max-heap property.
(b) 4 and 14 are exchanged, and i moves down to node 4.
(c) 4 and 8 are exchanged, and the subtree is again a max-heap.]

Analysis
The running time of MAX-HEAPIFY on a subtree of size n rooted at a
given node i is the Θ(1) time to fix up the relationships among the
elements A[i], A[Left(i)], and A[Right(i)], plus the time to run
MAX-HEAPIFY on a subtree rooted at one of the children of node i.
The children's subtrees each have size at most 2n/3 (the worst case
occurs when the last row of the tree is exactly half full), and the
running time of MAX-HEAPIFY can therefore be described by the
recurrence
T(n) ≤ T(2n/3) + Θ(1)
The solution to this recurrence, by the master theorem, is
T(n) = O(lg n).
We can also characterize the running time of MAX-HEAPIFY on a node of
height h as O(h).
.

Building a heap
We can use the procedure MAX-HEAPIFY in a bottom-up manner to convert
an array A[1 .. n], where n = lengthA, into a max-heap.
The elements in the subarray A[⌊n/2⌋+1 .. n] are all leaves of the
tree, and so each is a 1-element heap to begin with.
The procedure BUILD-MAX-HEAP goes through the remaining nodes of the
tree and runs MAX-HEAPIFY on each one.
.

BUILD-MAX-HEAP(A)
1. heapsizeA ← lengthA
2. for i ← ⌊lengthA / 2⌋ downto 1
3.     do MAX-HEAPIFY(A, i)
Build-heap example:

Index:  1   2   3   4   5   6   7   8   9   10
Value:  4   1   3   2  16   9  10  14   8    7

[Figures (a) to (f): BUILD-MAX-HEAP runs MAX-HEAPIFY at i = 5, 4, 3,
2, 1 in turn, transforming the array above into the max-heap
16 14 10 8 7 9 3 2 4 1]
The Heapsort Algorithm
.

Heapsort starts by using BUILD-MAX-HEAP to build a max-heap on the
input array A[1..n], where n = lengthA.
Since the maximum element of the array is stored at the root A[1], it
can be put into its correct final position by exchanging it with A[n].
If we now "discard" node n from the heap by decrementing heapsizeA, we
observe that A[1 .. (n-1)] can easily be made into a max-heap.
The children of the root remain max-heaps, but the new root element
may violate the max-heap property.
All that is needed to restore the max-heap property is a call to
MAX-HEAPIFY(A, 1), which leaves a max-heap in A[1 .. (n-1)].
The heapsort algorithm then repeats this process for the max-heap of
size n-1 down to a heap of size 2.
.

HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← lengthA downto 2
3.     do exchange A[1] ↔ A[i]
4.        heapsizeA ← heapsizeA – 1
5.        MAX-HEAPIFY(A, 1)
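BUILD-MAX-HEAP and HEAPSORT together in C, as a sketch reusing
max_heapify from above (1-based array as before):

void build_max_heap(int A[], int n) {
    for (int i = n / 2; i >= 1; i--)     /* nodes n/2+1 .. n are leaves */
        max_heapify(A, n, i);
}

void heapsort(int A[], int n) {
    build_max_heap(A, n);
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  /* max to final spot */
        max_heapify(A, i - 1, 1);     /* restore heap on A[1 .. i-1] */
    }
}

Run on the example array above (4 1 3 2 16 9 10 14 8 7), this yields
1 2 3 4 7 8 9 10 14 16.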
.

Analysis
The HeapSort procedure takes time O(n lg n), since the
call to BUILD_MAX_HEAP takes time O(n) and each of
the n-1 calls to MAX_HEAPIFY takes time O(lg n).
.

[Figures (a) to (j): the operation of HEAPSORT on the max-heap above.
At each step the root is exchanged with the last element of the heap,
the heap size is reduced by one, and MAX-HEAPIFY(A, 1) restores the
max-heap property]

Final sorted array:

Index:  1   2   3   4   5   6   7   8   9   10
Value:  1   2   3   4   7   8   9  10  14   16
.

Priority Queues
The most popular application of the heap is its use as an efficient
priority queue.
A priority queue is a data structure for maintaining a set S of
elements, each with an associated value called a key.
Operations:
INSERT(S, x) inserts the element x into the set S.
EXTRACT-MAX(S) removes and returns the element of S with the largest
key.
MAXIMUM(S) returns the element of S with the largest key.
.

HEAP-EXTRACT-MAX(A)
1. if heapsizeA < 1
2.     then error "heap underflow"
3. max ← A[1]
4. A[1] ← A[heapsizeA]
5. heapsizeA ← heapsizeA – 1
6. MAX-HEAPIFY(A, 1)
7. return max
.

HEAP-INSERT(A, key)
1. heapsizeA ← heapsizeA + 1
2. i ← heapsizeA
3. while i > 1 and A[Parent(i)] < key
4.     do A[i] ← A[Parent(i)]
5.        i ← Parent(i)
6. A[i] ← key
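The two priority-queue operations in C, as a sketch; it reuses
max_heapify and the PARENT macro, and assumes the caller guarantees
array capacity:

#include <limits.h>

/* Removes and returns the largest key; INT_MIN signals underflow. */
int heap_extract_max(int A[], int *heapsize) {
    if (*heapsize < 1) return INT_MIN;   /* "heap underflow" */
    int max = A[1];
    A[1] = A[*heapsize];
    (*heapsize)--;
    max_heapify(A, *heapsize, 1);
    return max;
}

/* Inserts key, floating it up past smaller parents. */
void heap_insert(int A[], int *heapsize, int key) {
    int i = ++(*heapsize);
    while (i > 1 && A[PARENT(i)] < key) {
        A[i] = A[PARENT(i)];
        i = PARENT(i);
    }
    A[i] = key;
}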
.

PART-4

.

Hashing
A table organization and search technique that can retrieve a key in a
single access would be very efficient: we want to search for an
element in constant time, with few key comparisons. To achieve this,
the position of a key in the table should not depend on the other
keys; instead, the location should be calculated from the key itself.
Such an organization and search technique is called hashing.
.

In hashing, the address or location of an identifier X is obtained by
using some function f(X) which gives the address of X in a table.

For storing a record: key, generate the array index, store the record
at that array index.
For accessing a record: key, generate the array index, get the record
from that array index.
.

Hashing Terminology
Hash Function : A function that transforms a key X into
a table index is called a hash function.
Hash Address : The address of X computed by the hash
function is called the hash address.
Synonyms : Two identifiers I1 and I2 are synonyms if
f(I1) = f(I2)
.

Hashing Terminology
Collision: when two non-identical identifiers are mapped to the same
location in the hash table, a collision is said to occur, i.e.,
f(I1) = f(I2).
Overflow: an overflow is said to occur when an identifier gets mapped
onto a full bucket. When s = 1, i.e., a bucket contains only one
record, collision and overflow occur simultaneously. Such a situation
is called a hash clash.
Bucket: each hash table is partitioned into b buckets
ht[0] ... ht[b-1].
Each bucket is capable of holding s records; thus a bucket consists of
s slots. When s = 1, each bucket can hold one record.
The function f(X) maps an identifier X into one of the b buckets,
i.e., from 0 to b-1.
.

Hash Tables
A hash table is an array of some fixed size, usually a prime number.
General idea: a hash function h(K) maps the key space (e.g., integers,
strings) to the table indices 0 .. TableSize – 1, giving constant-time
accesses.
.

Example
Key space = integers, TableSize = 10
h(K) = K mod 10
Insert: 7, 18, 41, 94
[Figure: 41 goes to slot 1, 94 to slot 4, 7 to slot 7, 18 to slot 8]
.

Another Example
Key space = integers, TableSize = 6
h(K) = K mod 6
Insert: 7, 18, 41, 34
[Figure: 18 goes to slot 0, 7 to slot 1, 34 to slot 4, 41 to slot 5]
.

Hash Functions
A hashing function f transforms an identifier X into a bucket address
in the hash table.

Characteristics:
1. Simple and fast to compute.
2. Avoids collisions.
3. Distributes the keys evenly among the cells.
.

Hash Functions

Truncation Method
Mid Square Method
Folding Method
Modular Method
Hash function for floating point numbers
Hash function for strings.
.

Truncation Method
The easiest method.
Take only part of the key as the address; it can be some rightmost or
leftmost digits.
Example: take some 8-digit keys
82394561, 87139465, 83567271, 85943228
Hash addresses: 61, 65, 71 and 28 (rightmost two digits, for a hash
table of size 100).
Easy to compute, but the chance of collision is higher because the
last two digits can be the same in different keys.
.

Mid-Square Method

We square the key and take some digits from the middle of that number
as the address.
This is a very widely used function in symbol table applications.
Since the middle bits of the square depend on all the bits of the
identifier/key, different identifiers tend to give different hash
addresses, minimizing collisions.
Example: choose the middle two digits (size of hash table = 100).
If X = 225, X² = 050625, hash address = 06
If X = 3205, X² = 10272025, hash address = 72
Folding Method
.

In this method, the identifier X is partitioned into several parts,
all of the same length except possibly the last.
These parts are added together to obtain the hash address.
The addition is done in two ways:
Shift folding: all parts except the last are shifted so that their
least significant bits correspond to each other.
Folding at the boundaries: the identifier is folded at the part
boundaries, and the bits falling together are added.
.

Example

X = 12320324211220, partitioned into parts of three digits.

Shift folding:            Folding at the boundaries:
P1   123                  P1   123
P2   203                  P2   302
P3   241                  P3   241
P4   112                  P4   211
P5    20                  P5    20
    -----                     -----
     699                       897
.

Modular Method
Take the key, perform the modulus operation, and use the remainder as
the address in the hash table.
This ensures that the address is within the range of the hash table.
Keep in mind that the table size should not be a power of two,
otherwise there will be more collisions.
The best way to minimize collisions is to make the table size a prime
number.
Example: table size = 31
X = 134
f(X) = X % 31 = 134 % 31 = 10
.

Table size: Why prime?

Suppose the data stored in the hash table are: 7160, 493, 60, 55, 321,
900, 810.
Real-life data tend to have a pattern, and being a multiple of 11 is
usually not that pattern.
tableSize = 10: the data hash to 0, 3, 0, 5, 1, 0, 0
tableSize = 11: the data hash to 10, 9, 5, 0, 2, 9, 7
.

Hash function for floating-point numbers
Getting the hash address for floating-point numbers takes a slightly
different approach, but it also requires a modulus operation at the
end to bring the hash address into the range of the hash table. The
whole operation can be defined as:
Take the fractional part of the key.
Multiply the fractional part by the size of the hash table.
Take the integer part of the product as the hash address of the key.
Example: X = 19.463, hash table size = 97
0.463 x 97 = 44.911
f(X) = 44
.

Hash function for strings

Every character has an ASCII value that can be used in computing the
hash value, followed by a modulus operation to map it into the hash
table.
Example: X = "suresh", hash table size = 97
suresh = s + u + r + e + s + h
       = 115 + 117 + 114 + 101 + 115 + 104
       = 666
After the modulus operation:
f(X) = 666 % 97 = 84
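The same computation in C, as a sketch:

/* String hash: sum of character codes, then modulus by table size. */
unsigned string_hash(const char *s, unsigned table_size) {
    unsigned sum = 0;
    while (*s)
        sum += (unsigned char)*s++;   /* "suresh" sums to 666 */
    return sum % table_size;          /* 666 % 97 = 84 */
}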
.

Characteristics of a good hash function

1. A good hash function avoids collisions.
2. A good hash function is easy to compute.
3. It spreads the keys evenly in the array.
.

PART-5

.

Collision Resolution
Collision: two keys map to the same location in the hash table.

Two ways to resolve collisions:
1. Separate chaining
2. Open addressing (linear probing, quadratic probing, double hashing)
Collision Resolution Policies
.

Two classes:
(1) "Open hashing", also called "separate chaining"
(2) "Closed hashing", also called "open addressing"
The difference is whether collisions are stored outside the table
(open hashing) or whether a collision results in storing one of the
records at another slot in the table (closed hashing).
.

Separate Chaining

Insert: 10, 22, 107, 12, 42
Separate chaining: all keys that map to the same hash value are kept
in a list (or "bucket").
[Figure: with h(K) = K mod 10, 10 goes to the list at slot 0; 22, 12,
and 42 all go to the list at slot 2; 107 goes to slot 7]
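A sketch of chained insertion in C; the names and the fixed table size
are assumptions:

#include <stdlib.h>

#define TABLE_SIZE 10

struct chain_node {
    int key;
    struct chain_node *next;
};

struct chain_node *table[TABLE_SIZE];   /* each slot heads a bucket list */

/* Insert a key at the front of the bucket it hashes to. */
void chain_insert(int key) {
    int h = key % TABLE_SIZE;
    struct chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[h];
    table[h] = n;
}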
.

Open Addressing

Insert: 38, 19, 8, 109, 10
Linear probing: after checking spot h(k), try spot h(k)+1; if that is
full, try h(k)+2, then h(k)+3, etc.
[Figure: with h(K) = K mod 10, 38 goes to slot 8 and 19 to slot 9;
8 finds slots 8 and 9 full and wraps around to slot 0; 109 probes to
slot 1; 10 probes to slot 2]
Linear Probing

Hash table size: 11
Hashing function: F(X) = X mod HTsize
Insert: 29, 18, 43, 10, 36, 25, 46
[Figure: final table. Slot 0: 10, slot 2: 46, slot 3: 36, slot 4: 25,
slot 7: 29, slot 8: 18, slot 10: 43]
.

Linear Probing
f(i) = i

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + 1) mod TableSize
2nd probe = (h(k) + 2) mod TableSize
...
ith probe = (h(k) + i) mod TableSize
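A sketch of insertion with linear probing in C (an empty slot is marked
by a sentinel value); quadratic probing and double hashing differ only
in the increment added at each probe:

#define HT_SIZE 11
#define EMPTY  -1

int ht[HT_SIZE];

/* Call once before any insertions. */
void ht_init(void) {
    for (int i = 0; i < HT_SIZE; i++) ht[i] = EMPTY;
}

/* Linear probing: try h(k), h(k)+1, h(k)+2, ... (mod table size).
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int key) {
    int h = key % HT_SIZE;
    for (int i = 0; i < HT_SIZE; i++) {
        int pos = (h + i) % HT_SIZE;   /* the ith probe */
        if (ht[pos] == EMPTY) {
            ht[pos] = key;
            return pos;
        }
    }
    return -1;   /* table full */
}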

.

Linear Probing – Clustering

[Figure (R. Sedgewick): a key that collides in a small cluster finds a
free slot quickly, but one that collides in a large cluster must probe
past the whole cluster, and each such insertion makes the cluster grow
further]
.

Quadratic Probing
This method is used to avoid the clustering problem of linear probing.
If the hash address is h, then on a collision linear probing searches
locations h, h+1, h+2, ... (mod table size).
In quadratic probing, a quadratic function of i is used as the
increment: instead of checking the (h + i)th index, this method checks
the index computed from a quadratic equation. This ensures that the
identifiers are spread out fairly evenly in the table.
.

Quadratic Probing
Less likely to encounter primary clustering.
f(i) = i²

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + 1) mod TableSize
2nd probe = (h(k) + 4) mod TableSize
3rd probe = (h(k) + 9) mod TableSize
...
ith probe = (h(k) + i²) mod TableSize
.

Quadratic Probing

Insert: 89, 18, 49, 58, 79 (table size 10)
[Figure: 89 goes to slot 9 and 18 to slot 8; 49 collides at 9 and
probes to slot 0; 58 collides at 8 and probes to slot 2; 79 collides
at 9 and probes to slot 3]
.

Quadratic Probing

Hash table size: 11, quadratic increment i²
Insert: 29, 18, 43, 10, 46, 54
[Figure: final table. Slot 0: 10, slot 2: 46, slot 3: 54, slot 7: 29,
slot 8: 18, slot 10: 43]
.

Quadratic Probing Example (table size 7)

insert(76): 76%7 = 6
insert(40): 40%7 = 5
insert(48): 48%7 = 6, probes to slot 0
insert(5):  5%7 = 5, probes to slot 2
insert(55): 55%7 = 6, probes to slot 3
But insert(47): 47%7 = 5 keeps probing slots 5, 6, 2, 0 and never
finds an empty slot, even though the table still has free space.
Double hashing
.

In this method, if an overflow occurs, a new address is computed by
using another hash function. A series of hash functions f1, f2, ...,
fn is used; the hashed values f1(X), f2(X), ..., fn(X) are examined in
order until an empty slot is found.
Example: H = key % 13, H' = 11 – (key % 11)
On a collision, the hash address for the next probe is
(H + H') % 13 = ((key % 13) + (11 – (key % 11))) % 13
.

Double Hashing
f(i) = i * g(k), where g is a second hash function

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + g(k)) mod TableSize
2nd probe = (h(k) + 2*g(k)) mod TableSize
3rd probe = (h(k) + 3*g(k)) mod TableSize
...
ith probe = (h(k) + i*g(k)) mod TableSize
.

Double Hashing Example

h(k) = k mod 7 and g(k) = 5 – (k mod 5)
Insert: 76, 93, 40, 47, 10, 55
[Figure: 76 goes to slot 6, 93 to slot 2, 40 to slot 5; 47 collides at
5 and, with g(47) = 3, goes to slot 1; 10 goes to slot 3; 55 collides
at 6 and, with g(55) = 5, goes to slot 4]
Probes per insertion: 1, 1, 1, 2, 1, 2
.
Double Hashing

Hash table size: 13
H = key % 13
H' = 11 – (key % 11)
(H + H') % 13 = ((key % 13) + (11 – (key % 11))) % 13
Insert: 8, 55, 48, 68
[Figure: 8 goes to slot 8, 55 to slot 3, 48 to slot 9; 68 collides at
slot 3 and, with H' = 9, goes to slot 12]
Resolving Collisions with Double Hashing

Insert these values into the hash table in this order, resolving any
collisions with double hashing:
13, 28, 33, 147, 43
.

Rehashing
Idea: when the table gets too full, create a bigger table (usually
twice as large) and hash all the items from the original table into
the new table.
When to rehash?
When the table is half full (load factor λ = 0.5)
When an insertion fails
At some other threshold
Cost of rehashing?
Rehashing

Insertion can fail when the hash table is full. The solution in that
case is to create a new hash table with double the size of the
previous hash table.
We then use a new hash function and insert all the elements of the
previous hash table: we scan the elements of the previous table one by
one, calculate each hash key with the new hash function, and insert
the element into the new table.
This technique is called rehashing.
It ensures that elements can always be inserted into the hash table.
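A sketch of the rehashing step in C under these assumptions: the new
table is twice the size (often rounded up to a prime in practice),
empty slots are marked -1, and collisions are resolved by linear
probing:

#include <stdlib.h>

/* Rehash: allocate a table of double the size and re-insert every
   element with the new modulus. Returns the new table; *size is
   updated through the pointer. */
int *rehash(int *old, int *size) {
    int old_size = *size;
    int new_size = old_size * 2;
    int *fresh = malloc(new_size * sizeof *fresh);
    for (int i = 0; i < new_size; i++)
        fresh[i] = -1;                    /* -1 marks an empty slot */
    for (int i = 0; i < old_size; i++) {
        if (old[i] == -1) continue;
        int h = old[i] % new_size;        /* new hash function */
        while (fresh[h] != -1)            /* linear probing */
            h = (h + 1) % new_size;
        fresh[h] = old[i];
    }
    free(old);
    *size = new_size;
    return fresh;
}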
.

PART-6

Chaining
.

The complexity of the search algorithm is still an issue, since we are
using linear search for some portion of it.
To avoid this, the concept of a chain comes into the picture.
We provide an extra field with each record which points to the record
number of the next record having the same hash value.
Chaining means remembering the record number where the record with the
same hash value is stored.
.

Chaining without Replacement

The hash address of the identifier is computed.
If that position is vacant, the identifier is placed there.
If the position is occupied, the identifier is put in the next vacant
position, and a chain is formed to the new position.
.

Resolving Collisions with Chaining without Replacement

Hash function: f(X) = X % 10, table size = 10
Insert these values into the hash table in this order:
11, 32, 41, 54, 33
[Figure: an empty table of slots 0 to 9, each with a Value field and a
Chain field]
Resolving Collisions with Chaining without Replacement

Hash function: f(X) = X % 10, table size = 10
Inserted in this order: 11, 32, 41, 54, 33

Index   Value   Chain
0               -1
1       11       3
2       32      -1
3       41       5
4       54      -1
5       33      -1
6 to 9          -1

(41 hashes to 1, which is occupied by 11, so it is stored at the next
vacant slot 3 and chained from slot 1; 33 hashes to 3, which is
occupied by 41, so it is stored at slot 5 and chained from slot 3.)
.

Disadvantage
The main idea is to chain all identifiers having the same hash address
(synonyms).
However, since an identifier may occupy the position of another
identifier, even non-synonyms get chained together, thereby increasing
complexity.
.

Chaining with Replacement

In this method, if another identifier Y is occupying the home position
of an identifier X, X replaces it, and Y is then relocated to a new
position.
Resolving Collisions with Chaining with Replacement

Hash function: f(X) = X % 10, table size = 10
Insert these values into the hash table in this order:
11, 32, 41, 54, 33

Index   Value   Chain
0               -1
1       11       5
2       32      -1
3       33      -1
4       54      -1
5       41      -1
6 to 9          -1

(41 hashes to 1 and is chained to the next vacant slot as before; when
33 arrives, its home slot 3 is occupied by 41, which does not belong
there, so 41 is relocated to slot 5, 33 takes slot 3, and the chain at
slot 1 is updated to point to 5.)
Collision Resolution – Open Addressing

Linear probing:
new position = (current position + 1) MOD hash size
[Figure: a table before and after linear probing]
Problem: clustering occurs, that is, the used spaces tend to appear in
groups which tend to grow, increasing the search time needed to reach
an open space.
.

Quadratic probing:
new position = (collision position + j²) MOD hash size,
j = 1, 2, 3, 4, ...
[Figure: a table before and after quadratic probing]
Problem: an overflow may occur while there is still space in the hash
table.
.

Collision Resolution – Chaining

[Figure: a table before and after chaining]
.

Rehashing
Rehashing is done when the hash table is almost full.
The size of the table is increased and all keys are rearranged.
