
Sinhgad Technical Education Society’s

SINHGAD INSTITUTE OF TECHNOLOGY


Gat no. 309/310, Off Mumbai-Pune Expressway, Kusgaon (Bk.) Lonavala -410401.

UNIT- V
TABLES
CONTENTS

1. Symbol Tables: Notion of symbol table, concept of red-black trees, AVL trees
2. OBST, Huffman's algorithm
3. Heap data structure, min and max heap, heapsort implementation, applications of heap
4. Hash tables and scattered tables: basic concepts, hash function, characteristics of a good hash function, different key-to-address transformation techniques
5. Synonyms or collisions, collision resolution techniques: linear probing, quadratic probing, rehashing
6. Chaining without replacement and chaining with replacement
.

PART-1

.

Symbol Table

A symbol table is an ADT for storing names encountered in the source
program along with their attributes.
Typically used in compilers.
Types of symbol table:
Static
Dynamic
Operations on a symbol table:
1) Insertion
2) Deletion
3) Searching
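A minimal sketch of how such a table might be organized in C; the struct
fields, sizes, and function names here are illustrative assumptions, not
a prescribed layout:

/* Sketch of a static symbol table (fields and sizes are assumptions). */
#include <string.h>

#define MAXSYM 100

struct symbol {
    char name[32];   /* identifier encountered in the source program */
    char type[16];   /* attribute: data type */
    int  scope;      /* attribute: scope level */
};

struct symtab {
    struct symbol entries[MAXSYM];
    int count;
};

/* Searching: linear scan; returns the index or -1 if absent. */
int st_search(struct symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return i;
    return -1;
}

/* Insertion: append if not already present; returns index or -1 if full. */
int st_insert(struct symtab *t, const char *name, const char *type, int scope) {
    int i = st_search(t, name);
    if (i >= 0) return i;
    if (t->count == MAXSYM) return -1;
    strcpy(t->entries[t->count].name, name);
    strcpy(t->entries[t->count].type, type);
    t->entries[t->count].scope = scope;
    return t->count++;
}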
.

AVL TREES

An AVL tree is a height-balanced binary search tree.
AVL is named for its inventors: Adel'son-Vel'skii and Landis.

In an AVL tree:
Balance factor = height of left subtree – height of right subtree
The balance factor of every node must always be -1, 0, or 1.


.

Balance Factor
The balance factor of a node is defined as the height of the left
subtree of the node minus the height of the right subtree of the node.
B.F of a node = ht(left subtree) – ht(right subtree)
For AVL trees, the B.F, which is a measure of imbalance, must be in
the range -1, 0, or 1.
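As a sketch in C (the node layout is an assumption; an empty subtree is
given height -1 here so that a leaf has height 0):

struct node { int key; struct node *lc, *rc; };

/* Height of a subtree; the empty subtree counts as -1, a leaf as 0. */
int height(struct node *t) {
    if (t == NULL) return -1;
    int hl = height(t->lc), hr = height(t->rc);
    return (hl > hr ? hl : hr) + 1;
}

/* B.F = ht(left subtree) - ht(right subtree); AVL requires -1, 0, or 1. */
int balance_factor(struct node *t) {
    return height(t->lc) - height(t->rc);
}

In practice an AVL node caches its balance factor in a field rather than
recomputing heights on every check.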
.

[Figure: example BST. Root 20; second level 15, 35; third level 10, 18,
25, 40; fourth level 30, 38, 45; fifth level 50]
.

[Figure: the same tree annotated with balance factors. The root 20 has
balance factor -2, so the tree is not height balanced]
.

4 Types of Rotations to be applied when the tree is imbalanced

Single rotations
LL Rotation
RR Rotation
Double rotations
LR Rotation
RL Rotation
.

LL Rotations

When a node is added to the left of the left son, such that the
balance factor of the current node becomes 2, balance is restored by
an LL rotation.
.

LL Rotations
[Figure: T1 (B.F. 2) with left child T2 (B.F. 1) and right subtree d;
T2 has left child T3 (over subtrees a and b) and right subtree c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes 2 and T2's B.F. is 1, an LL rotation must be applied, since
the new node was added to the left of T1 and to the left of T2.
.

LL Rotations
[Figure: before balancing, T1 (B.F. 2) has left child T2 (B.F. 1);
T2's left child T3 sits over subtrees a and b, with subtree c under T2
and subtree d under T1. After the LL rotation, T2 is the root with
children T3 and T1; the subtrees left to right are a, b, c, d]


.

LL Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(d) = x.
Since B.F(T1) = 2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(c) = x.

After balancing:
B.F(T1) = ht(c) – ht(d) = x – x = 0
B.F(T3) = original (unchanged)
B.F(T2) = ht(T3) – ht(T1) = (x + 1) – (x + 1) = 0
.

LL Rotation implementation
1. Let T1 be the node with B.F. 2 and let its parent be par
2. T2 = T1->lc (T2 has B.F. 1)
3. T1->lc = T2->rc
4. T2->rc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par
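The same steps transcribed into C, as a sketch; the node struct and its
bf field are assumptions, and the symmetric RR rotation is obtained by
swapping lc and rc throughout:

struct avlnode { int key, bf; struct avlnode *lc, *rc; };

/* LL rotation: T1 has bf 2 and its left child T2 has bf 1.
   Returns the new subtree root (T2) so the caller can attach it
   to the parent node par. */
struct avlnode *rotate_ll(struct avlnode *t1) {
    struct avlnode *t2 = t1->lc;  /* step 2 */
    t1->lc = t2->rc;              /* step 3 */
    t2->rc = t1;                  /* step 4 */
    t1->bf = 0;                   /* step 5 */
    t2->bf = 0;                   /* step 6 */
    return t2;                    /* step 7: attach to par */
}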
.

RR Rotations
When a node is added to the right of the right son, such that the
balance factor of the current node becomes -2, balance is restored by
an RR rotation.
.

RR Rotations
[Figure: T1 (B.F. -2) with left subtree a and right child T2 (B.F. -1);
T2 has left subtree b and right child T3 over subtrees c and d]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes -2 and T2's B.F. is -1, an RR rotation must be applied, since
the new node was added to the right of T1 and to the right of T2.
.

RR Rotations
[Figure: before balancing, T1 (B.F. -2) has right child T2 (B.F. -1);
T2's right child T3 sits over subtrees c and d, with subtree b under T2
and subtree a under T1. After the RR rotation, T2 is the root with
children T1 and T3; the subtrees left to right are a, b, c, d]


.

RR Rotation implementation
1. Let T1 be the node with B.F. -2 and let its parent be par
2. T2 = T1->rc (T2 has B.F. -1)
3. T1->rc = T2->lc
4. T2->lc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par
.

LR Rotations
When a node is added to the right of the left son, such that the
balance factor of the current node becomes 2 and that of its left son
becomes -1, balance is restored by an LR rotation.
.

LR Rotations
[Figure: T1 (B.F. 2) with right subtree d and left child T2 (B.F. -1);
T2 has left subtree a and right child T3 over subtrees b and c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes 2 and T2's B.F. is -1, an LR rotation must be applied, since
the new node was added to the left of T1 and to the right of T2.
.

LR Rotations
[Figure: before balancing, T1 (B.F. 2) has left child T2 (B.F. -1);
T2's right child T3 sits over subtrees b and c, with subtree a under T2
and subtree d under T1. After the LR rotation, T3 is the root with
children T2 and T1; the subtrees left to right are a, b, c, d]


.

LR Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(d) = x.
Since B.F(T1) = 2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(a) = x.

After balancing:
ht(T2) = max(ht(a), ht(b)) + 1 = max(x, value not greater than x) + 1 = x + 1
ht(T1) = max(ht(c), ht(d)) + 1 = max(value not greater than x, x) + 1 = x + 1
B.F(T3) = 0
.
In fact, the B.F. of T3 can be +1 or -1 after the node is added.
Case 1: B.F(T3) = 1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(b) = ht(c) + 1
But ht(T3) = x + 1, so ht(b) = x and ht(c) = x – 1.

After balancing:
B.F(T2) = ht(a) – ht(b) = x – x = 0
B.F(T1) = ht(c) – ht(d) = (x – 1) – x = -1
.
Case 2: B.F(T3) = -1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(c) = ht(b) + 1
But ht(T3) = x + 1, so ht(c) = x and ht(b) = x – 1.

After balancing:
B.F(T2) = ht(a) – ht(b) = x – (x – 1) = 1
B.F(T1) = ht(c) – ht(d) = x – x = 0
.
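The LR double rotation can also be written directly, as in this sketch
(assuming the avlnode struct from the LL sketch); the balance factors
are set according to the three cases just derived, and the RL rotation
is the mirror image:

/* LR rotation: T1 has bf 2, its left child T2 has bf -1, T3 = T2->rc.
   Returns the new subtree root (T3) to be attached to the parent. */
struct avlnode *rotate_lr(struct avlnode *t1) {
    struct avlnode *t2 = t1->lc;
    struct avlnode *t3 = t2->rc;
    t2->rc = t3->lc;                   /* subtree b moves under T2 */
    t1->lc = t3->rc;                   /* subtree c moves under T1 */
    t3->lc = t2;
    t3->rc = t1;
    /* balance factors follow from T3's old value (the cases above) */
    t2->bf = (t3->bf == -1) ? 1 : 0;   /* case 2 gives B.F(T2) = 1 */
    t1->bf = (t3->bf == 1) ? -1 : 0;   /* case 1 gives B.F(T1) = -1 */
    t3->bf = 0;
    return t3;
}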

RL Rotations
When a node is added to the left of the right son, such that the
balance factor of the current node becomes -2 and that of its right
son becomes 1, balance is restored by an RL rotation.
.

RL Rotations
[Figure: T1 (B.F. -2) with left subtree a and right child T2 (B.F. 1);
T2 has right subtree d and left child T3 over subtrees b and c]

The new node is added somewhere in the subtree rooted at T3. While
updating the balance factors of the ancestor nodes, if T1's B.F.
becomes -2 and T2's B.F. is 1, an RL rotation must be applied, since
the new node was added to the right of T1 and to the left of T2.
.

RL Rotations
[Figure: before balancing, T1 (B.F. -2) has right child T2 (B.F. 1);
T2's left child T3 sits over subtrees b and c, with subtree d under T2
and subtree a under T1. After the RL rotation, T3 is the root with
children T1 and T2; the subtrees left to right are a, b, c, d]


RL Rotations: Proof
[Figure: the same rotation, before and after balancing]

Assumptions (before balancing):
Let ht(a) = x.
Since B.F(T1) = -2, ht(T2) = x + 2.
So ht(T3) = x + 1, and ht(d) = x.

After balancing:
ht(T1) = max(ht(a), ht(b)) + 1 = max(x, value not greater than x) + 1 = x + 1
ht(T2) = max(ht(c), ht(d)) + 1 = max(value not greater than x, x) + 1 = x + 1
B.F(T3) = 0
In fact, the B.F. of T3 can be +1 or -1 after the node is added.
Case 1: B.F(T3) = 1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(b) = ht(c) + 1
But ht(T3) = x + 1, so ht(b) = x and ht(c) = x – 1.

After balancing:
B.F(T2) = ht(c) – ht(d) = (x – 1) – x = -1
B.F(T1) = ht(a) – ht(b) = x – x = 0
Case 2: B.F(T3) = -1
[Figure: the same rotation, before and after balancing]

Before balancing:
ht(c) = ht(b) + 1
But ht(T3) = x + 1, so ht(c) = x and ht(b) = x – 1.

After balancing:
B.F(T2) = ht(c) – ht(d) = x – x = 0
B.F(T1) = ht(a) – ht(b) = x – (x – 1) = 1
.

AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL tree.
[Figure: root 14; children 7, 17; then 4, 11, 53; 13 as the right
child of 11]
.

AVL Tree Example:
• Now insert 12.
[Figure: 12 is inserted as the left child of 13, below 11; node 11
becomes unbalanced]
.

AVL Tree Example:
• A double rotation is applied at node 11.
[Figure: intermediate step in the subtree 11, 13, 12; 12 is now the
right child of 11 with 13 as its right child]
.

AVL Tree Example:
• Now the AVL tree is balanced.
[Figure: root 14; children 7, 17; then 4, 12, 53; 11 and 13 as the
children of 12]
.

AVL Tree Example:
• Now insert 8.
[Figure: the balanced tree from the previous step, before 8 is
inserted]
.

AVL Tree Example:
• Inserting 8 below 11 unbalances the tree.
[Figure: intermediate step of the double rotation; 11 now holds 8 and
12 as children, with 13 under 12]
.

AVL Tree Example:
• Now the AVL tree is balanced.
[Figure: root 14; children 11, 17; 11 has children 7 and 12; 7 has
children 4 and 8; 13 under 12; 53 under 17]
.

AVL Tree Example:
• Now remove 53.
[Figure: the same tree, with 53 still present as the right child of 17]
.

AVL Tree Example:
• Removing 53 leaves the tree unbalanced.
[Figure: 17 is now a leaf, and node 14 is unbalanced]
.

AVL Tree Example:
• Balanced!
[Figure: root 11; children 7, 14; 7 has children 4 and 8; 14 has
children 12 and 17; 13 under 12]
.

Red and black tree

It is a binary search tree with the following properties:
1) Every node is either red or black.
2) Every leaf is black.
3) Both children of a red node are black.
4) Every path from a node to a descendant leaf contains the same
number of black nodes.
.

Red & Black Trees

Red-black trees offer worst-case time complexity O(lg n) for
insertion, deletion, and search. This makes them valuable in
time-sensitive applications.
.

Optimal Binary Search Trees (***)

Given a fixed set of identifiers, we wish to create a binary search
tree organization.
We may expect different binary search trees for the same identifier
set to have different performance characteristics.
Two possible BSTs

[Figure: two different binary search trees over the identifiers
do, for, if, int, while]

Average number of comparisons:
Tree (a): (1 + 2 + 2 + 3 + 4) / 5 = 12/5
Tree (b): (1 + 2 + 2 + 3 + 3) / 5 = 11/5

Here each identifier is searched for with equal probability and no
unsuccessful searches are made.
.

In a general situation, we can expect different identifiers to be
searched for with different frequencies (or probabilities).
In addition, we can expect unsuccessful searches to be made.
.

Let us assume that the given set of identifiers is {a1, a2, ..., an}
with a1 < a2 < ... < an.
Let p(i) be the probability with which we search for ai.
Let q(i) be the probability that the identifier x being searched for
satisfies ai < x < ai+1, 0 ≤ i ≤ n
(assume a0 = -∞ and an+1 = +∞).
∑ 1 ≤ i ≤ n p(i) is the probability of a successful search.
∑ 0 ≤ i ≤ n q(i) is the probability of an unsuccessful search.
Clearly ∑ 1 ≤ i ≤ n p(i) + ∑ 0 ≤ i ≤ n q(i) = 1.
Given this data, we wish to construct an optimal binary search tree
for {a1, a2, ..., an}.
.

Cost function for BST

It will be useful to add a fictitious node in place of every empty
subtree in the search tree. Such nodes, called external nodes, are
drawn as squares.
All other nodes are internal nodes.
If a BST represents n identifiers, then there will be exactly n
internal nodes and n+1 (fictitious) external nodes.
Every internal node represents a point where a successful search may
terminate.
Every external node represents a point where an unsuccessful search
may terminate.
BST with external nodes

[Figure: one of the BSTs above with its internal nodes drawn as
circles and the fictitious external nodes drawn as squares in place of
every empty subtree]
.

If a successful search terminates at an internal node at level l, then
l comparisons are needed.
Unsuccessful searches terminate at external nodes.
The identifiers not present in the BST can be partitioned into n+1
equivalence classes Ei, 0 ≤ i ≤ n.
The class E0 contains all identifiers x such that x < a1.
The class Ei, 1 ≤ i < n, contains all identifiers x such that
ai < x < ai+1, and the class En contains all identifiers x > an.
If the failure node for Ei is at level l, then only l – 1 comparisons
are needed.
.

Finding the cost (performance)

Expected cost contribution from the internal node for ai:
p(i) * level(ai)
Expected cost contribution from the external node for Ei:
q(i) * (level(Ei) – 1)
.

PART-2

.

OBST
Formula for the expected cost of a BST:
∑ 1 ≤ i ≤ n p(i) * level(ai) + ∑ 0 ≤ i ≤ n q(i) * (level(Ei) – 1)
We define an optimal binary search tree for the identifier set
{a1, a2, ..., an} to be a binary search tree for which this cost is
minimum.
.

Example
Identifier set {a1, a2, a3} = {do, if, while}
Case 1: equal probabilities, p(i) = q(i) = 1/7 for all i.
Case 2: {p1, p2, p3} = {0.5, 0.1, 0.05} and
{q0, q1, q2, q3} = {0.15, 0.1, 0.05, 0.05}.
.

[Figure: the five possible binary search trees (a) to (e) over
{do, if, while}]
.

Case 1 with equal probabilities


Cost ( tree a ) = 15 / 7
Cost ( tree b ) = 13 / 7
Cost ( tree c ) = 15 / 7
Cost ( tree d ) = 15 / 7
Cost ( tree e ) = 15 / 7
Case 2
Cost ( tree a ) = 2.65
Cost ( tree b ) = 1.9
Cost ( tree c ) = 1.5
Cost ( tree d ) = 2.05
Cost ( tree e ) = 1.6
.

Tabular method (dynamic programming)

w(i, j) = p(j) + q(j) + w(i, j-1)
c(i, j) = min over i < k ≤ j of { c(i, k-1) + c(k, j) } + w(i, j)
r(i, j) = the value of k that minimizes the above equation
Initial values:
w(i, i) = q(i)
c(i, i) = 0
r(i, i) = 0
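A sketch of the tabular method in C, under the assumption that p[1..n]
and q[0..n] are supplied as doubles; the array names follow the
recurrences above and N is a sketch-only size limit:

#include <stdio.h>

#define N 3

double w[N+1][N+1], c[N+1][N+1];
int    r[N+1][N+1];

void obst(const double p[], const double q[], int n) {
    for (int i = 0; i <= n; i++) {       /* initial values */
        w[i][i] = q[i];
        c[i][i] = 0.0;
        r[i][i] = 0;
    }
    for (int len = 1; len <= n; len++) {
        for (int i = 0; i + len <= n; i++) {
            int j = i + len;
            w[i][j] = p[j] + q[j] + w[i][j-1];
            c[i][j] = -1.0;
            for (int k = i + 1; k <= j; k++) {   /* try each root ak */
                double cost = c[i][k-1] + c[k][j];
                if (c[i][j] < 0.0 || cost < c[i][j]) {
                    c[i][j] = cost;
                    r[i][j] = k;   /* the k that minimizes the equation */
                }
            }
            c[i][j] += w[i][j];
        }
    }
}

int main(void) {
    /* Case 1 of the example: {do, if, while}, all p(i) = q(i) = 1/7 */
    double p[] = {0.0, 1.0/7, 1.0/7, 1.0/7};
    double q[] = {1.0/7, 1.0/7, 1.0/7, 1.0/7};
    obst(p, q, 3);
    printf("minimum cost = %f, root = a%d\n", c[0][3], r[0][3]);
    return 0;
}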
.

Optimal search tree for the example

[Figure: root if; children do and int; while as the right child of int]

Minimum cost of the BST = 32


.

Huffman coding
Proposed by Dr. David A. Huffman in 1952 in "A Method for the
Construction of Minimum-Redundancy Codes".
Applicable to many forms of data transmission.

Basic Huffman algorithm (a sketch of step 3 follows this list):
1. Scan the text to be compressed.
2. Sort characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the sorted list.
4. Perform a traversal of the tree to determine all code words.
5. Create the new file using the Huffman codes.
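A sketch of the tree-building step in C; for brevity it repeatedly
selects the two smallest live frequencies instead of using a heap, and
all names are illustrative:

#include <stdlib.h>

struct hnode {
    int freq;
    struct hnode *left, *right;   /* both NULL for leaves */
};

/* Index of the smallest-frequency node still alive in the pool. */
static int pick_min(struct hnode *pool[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (pool[i] && (best < 0 || pool[i]->freq < pool[best]->freq))
            best = i;
    return best;
}

/* Builds the Huffman tree over n leaf frequencies; returns the root. */
struct hnode *huffman(const int freq[], int n) {
    struct hnode **pool = malloc(n * sizeof *pool);
    for (int i = 0; i < n; i++) {
        pool[i] = malloc(sizeof **pool);
        pool[i]->freq = freq[i];
        pool[i]->left = pool[i]->right = NULL;
    }
    for (int live = n; live > 1; live--) {   /* n-1 merges in total */
        int a = pick_min(pool, n);
        struct hnode *x = pool[a]; pool[a] = NULL;
        int b = pick_min(pool, n);
        struct hnode *m = malloc(sizeof *m);
        m->freq = x->freq + pool[b]->freq;   /* merged frequency */
        m->left = x; m->right = pool[b];
        pool[b] = m;                         /* merged node replaces y */
    }
    struct hnode *root = pool[pick_min(pool, n)];
    free(pool);
    return root;
}

The code words are then read off by a traversal (step 4), emitting 0
for a left branch and 1 for a right branch.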
.

A text message can be converted into a sequence of 0s and 1s by
replacing each character of the message with its code.

Huffman coding produces a prefix code, i.e., no code word is a prefix
of another code word.
.

Draw the Huffman tree for the given data set (total frequency = 112).

Data   Frequency   Huffman Code
P      18
Q      8
R      15
S      2
T      25
U      13
V      5
W      26
.

Draw the Huffman tree for the frequencies
10, 3, 4, 15, 2, 4, 2, 3, 6, 8, 7, 5, 12, 5 (total = 86).
.

Draw the Huffman tree for the given data set (total probability = 1).

Data   Probability   Huffman Code
A      0.07
B      0.09
C      0.12
D      0.22
E      0.23
F      0.27
.

PART-3

.

Heapsort
Combines the best features of two sorting algorithms.
Introduces another algorithm design technique: the use of a data
structure called a heap.
The heap data structure is useful for heapsort and also makes an
efficient priority queue.
.

Heap
The (binary) heap data structure is an array object that
can be viewed as a nearly complete binary tree.
Each node of the tree corresponds to an element of the
array that stores the value in the node.
The tree is completely filled on all levels except
possibly the lowest, which is filled from the left up to a
point.
.

Heap
An array A that represents a heap is an object with two attributes:
lengthA, which is the number of elements in the array, and heapSizeA,
the number of elements of the heap stored within array A.
That is, although A[1 .. lengthA] may contain valid numbers, no
element past A[heapSizeA], where heapSizeA ≤ lengthA, is an element of
the heap.
A max-heap viewed as a binary tree and as an array:

Index:  1   2   3   4   5   6   7   8   9   10
Value: 16  14  10   8   7   9   3   2   4    1

[Figure: the corresponding complete binary tree, with 16 at the root]
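The Parent, Left, and Right references used below follow directly from
this array layout; a sketch with 1-based indexing:

/* 1-based array indexing of a heap, matching the figure above. */
#define PARENT(i) ((i) / 2)
#define LEFT(i)   (2 * (i))
#define RIGHT(i)  (2 * (i) + 1)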
.

Types
Max-heaps and min-heaps.
In both kinds, the values in the nodes satisfy a heap property, the
specifics of which depend on the kind of heap.
In a max-heap, the max-heap property is that for every node i other
than the root,
A[Parent(i)] >= A[i]
In a min-heap, the min-heap property is that for every node i other
than the root,
A[Parent(i)] <= A[i]
Five Basic Procedures
.

The MAX-HEAPIFY procedure, which runs in O(lg n)


time, is the key to maintaining the max-heap property.
The BUILD-MAX-HEAP procedure, which runs in
linear time, produces a max-heap from an unordered
input array.
The HEAPSORT procedure, which runs in O(n lg n)
time, sorts an array in place.
The MAX-HEAP-INSERT and HEAP-EXTRACT-MAX procedures, which run in
O(lg n) time, allow the heap data structure to be used as a priority
queue.
.

Maintaining the heap property


MAX-HEAPIFY is an important subroutine for manipulating max-heaps.
Its inputs are an array A and an index i into the array.
When MAX-HEAPIFY is called, it is assumed that the binary trees rooted
at Left(i) and Right(i) are max-heaps, but A[i] may be smaller than
its children, thus violating the max-heap property.
The function of MAX-HEAPIFY is to let the value at A[i] float down in
the max-heap so that the subtree rooted at index i becomes a max-heap.
MAX-HEAPIFY(A, i)
.

1. l ← Left(i)
2. r ← Right(i)
3. if l ≤ heapSizeA and A[l] > A[i]
4.     then largest ← l
5.     else largest ← i
6. if r ≤ heapSizeA and A[r] > A[largest]
7.     then largest ← r
8. if largest ≠ i
9.     then exchange A[i] ↔ A[largest]
10.         MAX-HEAPIFY(A, largest)
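The same procedure in C, as a sketch; it assumes a 1-based array (A[0]
unused) and the index macros given earlier:

/* Let the value at A[i] float down so the subtree rooted at i
   becomes a max-heap; the subtrees at Left(i) and Right(i) are
   assumed to already be max-heaps. */
void max_heapify(int A[], int heapsize, int i) {
    int l = LEFT(i), r = RIGHT(i);
    int largest = i;
    if (l <= heapsize && A[l] > A[i])
        largest = l;
    if (r <= heapsize && A[r] > A[largest])
        largest = r;
    if (largest != i) {
        int tmp = A[i]; A[i] = A[largest]; A[largest] = tmp;  /* exchange */
        max_heapify(A, heapsize, largest);
    }
}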
.

[Figures: the action of MAX-HEAPIFY(A, 2) on the array above.
(a) Node i = 2 holds 4, violating the max-heap property.
(b) 4 and 14 are exchanged, and i moves down to node 4.
(c) 4 and 8 are exchanged, and the subtree is again a max-heap.]

Analysis
The running time of MAX-HEAPIFY on a subtree of size n rooted at a
given node i is the Θ(1) time to fix up the relationships among the
elements A[i], A[Left(i)], and A[Right(i)], plus the time to run
MAX-HEAPIFY on a subtree rooted at one of the children of node i.
The children's subtrees each have size at most 2n/3 (the worst case
occurs when the last row of the tree is exactly half full), and the
running time of MAX-HEAPIFY can therefore be described by the
recurrence
T(n) ≤ T(2n/3) + Θ(1)
The solution to this recurrence, by the master theorem, is
T(n) = O(lg n).
We can also characterize the running time of MAX-HEAPIFY on a node of
height h as O(h).
.

Building a heap
We can use the procedure MAX-HEAPIFY in a bottom-up manner to convert
an array A[1 .. n], where n = lengthA, into a max-heap.
The elements in the subarray A[⌊n/2⌋+1 .. n] are all leaves of the
tree, and so each is a 1-element heap to begin with.
The procedure BUILD-MAX-HEAP goes through the remaining nodes of the
tree and runs MAX-HEAPIFY on each one.
.

BUILD-MAX-HEAP(A)
1. heapsizeA ← lengthA
2. for i ← ⌊lengthA / 2⌋ downto 1
3.     do MAX-HEAPIFY(A, i)
Build-heap example:

Index:  1   2   3   4   5   6   7   8   9   10
Value:  4   1   3   2  16   9  10  14   8    7

[Figures (a) to (f): BUILD-MAX-HEAP runs MAX-HEAPIFY at i = 5, 4, 3,
2, 1 in turn, transforming the array above into the max-heap
16 14 10 8 7 9 3 2 4 1]
The Heapsort Algorithm
.

Heapsort starts by using BUILD-MAX-HEAP to build a max-heap on the
input array A[1..n], where n = lengthA.
Since the maximum element of the array is stored at the root A[1], it
can be put into its correct final position by exchanging it with A[n].
If we now "discard" node n from the heap by decrementing heapsizeA, we
observe that A[1 .. (n-1)] can easily be made into a max-heap.
The children of the root remain max-heaps, but the new root element
may violate the max-heap property.
All that is needed to restore the max-heap property is a call to
MAX-HEAPIFY(A, 1), which leaves a max-heap in A[1 .. (n-1)].
The heapsort algorithm then repeats this process for the max-heap of
size n-1 down to a heap of size 2.
.

HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← lengthA downto 2
3.     do exchange A[1] ↔ A[i]
4.        heapsizeA ← heapsizeA – 1
5.        MAX-HEAPIFY(A, 1)
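BUILD-MAX-HEAP and HEAPSORT together in C, as a sketch reusing
max_heapify from above (1-based array as before):

void build_max_heap(int A[], int n) {
    for (int i = n / 2; i >= 1; i--)     /* nodes n/2+1 .. n are leaves */
        max_heapify(A, n, i);
}

void heapsort(int A[], int n) {
    build_max_heap(A, n);
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  /* max to final spot */
        max_heapify(A, i - 1, 1);     /* restore heap on A[1 .. i-1] */
    }
}

Run on the example array above (4 1 3 2 16 9 10 14 8 7), this yields
1 2 3 4 7 8 9 10 14 16.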
.

Analysis
The HeapSort procedure takes time O(n lg n), since the
call to BUILD_MAX_HEAP takes time O(n) and each of
the n-1 calls to MAX_HEAPIFY takes time O(lg n).
.

[Figures (a) to (j): the operation of HEAPSORT on the max-heap above.
At each step the root is exchanged with the last element of the heap,
the heap size is reduced by one, and MAX-HEAPIFY(A, 1) restores the
max-heap property]

Final sorted array:

Index:  1   2   3   4   5   6   7   8   9   10
Value:  1   2   3   4   7   8   9  10  14   16
.

Priority Queues
The most popular application of the heap is its use as an efficient
priority queue.
A priority queue is a data structure for maintaining a set S of
elements, each with an associated value called a key.
Operations:
INSERT(S, x) inserts the element x into the set S.
EXTRACT-MAX(S) removes and returns the element of S with the largest
key.
MAXIMUM(S) returns the element of S with the largest key.
.

HEAP-EXTRACT-MAX(A)
1. if heapsizeA < 1
2.     then error "heap underflow"
3. max ← A[1]
4. A[1] ← A[heapsizeA]
5. heapsizeA ← heapsizeA – 1
6. MAX-HEAPIFY(A, 1)
7. return max
.

HEAP-INSERT(A, key)
1. heapsizeA ← heapsizeA + 1
2. i ← heapsizeA
3. while i > 1 and A[Parent(i)] < key
4.     do A[i] ← A[Parent(i)]
5.        i ← Parent(i)
6. A[i] ← key
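The two priority-queue operations in C, as a sketch; it reuses
max_heapify and the PARENT macro, and assumes the caller guarantees
array capacity:

#include <limits.h>

/* Removes and returns the largest key; INT_MIN signals underflow. */
int heap_extract_max(int A[], int *heapsize) {
    if (*heapsize < 1) return INT_MIN;   /* "heap underflow" */
    int max = A[1];
    A[1] = A[*heapsize];
    (*heapsize)--;
    max_heapify(A, *heapsize, 1);
    return max;
}

/* Inserts key, floating it up past smaller parents. */
void heap_insert(int A[], int *heapsize, int key) {
    int i = ++(*heapsize);
    while (i > 1 && A[PARENT(i)] < key) {
        A[i] = A[PARENT(i)];
        i = PARENT(i);
    }
    A[i] = key;
}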
.

PART-4

.

Hashing
A table organization and search technique that can retrieve a key in a
single access would be very efficient: we want to search for an
element in constant time, with few key comparisons. To achieve this,
the position of a key in the table should not depend on the other
keys; instead, the location should be calculated from the key itself.
Such an organization and search technique is called hashing.
.

In hashing, the address or location of an identifier X is obtained by
using some function f(X) which gives the address of X in a table.

For storing a record: key, generate the array index, store the record
at that array index.
For accessing a record: key, generate the array index, get the record
from that array index.
.

Hashing Terminology
Hash Function : A function that transforms a key X into
a table index is called a hash function.
Hash Address : The address of X computed by the hash
function is called the hash address.
Synonyms : Two identifiers I1 and I2 are synonyms if
f(I1) = f(I2)
.

Hashing Terminology
Collision: when two non-identical identifiers are mapped to the same
location in the hash table, a collision is said to occur, i.e.,
f(I1) = f(I2).
Overflow: an overflow is said to occur when an identifier gets mapped
onto a full bucket. When s = 1, i.e., a bucket contains only one
record, collision and overflow occur simultaneously. Such a situation
is called a hash clash.
Bucket: each hash table is partitioned into b buckets
ht[0] ... ht[b-1].
Each bucket is capable of holding s records; thus a bucket consists of
s slots. When s = 1, each bucket can hold one record.
The function f(X) maps an identifier X into one of the b buckets,
i.e., from 0 to b-1.
.

Hash Tables
A hash table is an array of some fixed size, usually a prime number.
General idea: a hash function h(K) maps the key space (e.g., integers,
strings) to the table indices 0 .. TableSize – 1, giving constant-time
accesses.
.

Example
Key space = integers, TableSize = 10
h(K) = K mod 10
Insert: 7, 18, 41, 94
[Figure: 41 goes to slot 1, 94 to slot 4, 7 to slot 7, 18 to slot 8]
.

Another Example
Key space = integers, TableSize = 6
h(K) = K mod 6
Insert: 7, 18, 41, 34
[Figure: 18 goes to slot 0, 7 to slot 1, 34 to slot 4, 41 to slot 5]
.

Hash Functions
A hashing function f transforms an identifier X into a bucket address
in the hash table.

Characteristics:
1. Simple and fast to compute.
2. Avoids collisions.
3. Distributes the keys evenly among the cells.
.

Hash Functions

Truncation Method
Mid Square Method
Folding Method
Modular Method
Hash function for floating point numbers
Hash function for strings.
.

Truncation Method
The easiest method.
Take only part of the key as the address; it can be some rightmost or
leftmost digits.
Example: take some 8-digit keys
82394561, 87139465, 83567271, 85943228
Hash addresses: 61, 65, 71 and 28 (rightmost two digits, for a hash
table of size 100).
Easy to compute, but the chance of collision is higher because the
last two digits can be the same in different keys.
.

Mid-Square Method

We square the key and take some digits from the middle of that number
as the address.
This is a very widely used function in symbol table applications.
Since the middle bits of the square depend on all the bits of the
identifier/key, different identifiers tend to give different hash
addresses, minimizing collisions.
Example: choose the middle two digits (size of hash table = 100).
If X = 225, X² = 050625, hash address = 06
If X = 3205, X² = 10272025, hash address = 72
Folding Method
.

In this method, the identifier X is partitioned into several parts,
all of the same length except possibly the last.
These parts are added together to obtain the hash address.
The addition is done in two ways:
Shift folding: all parts except the last are shifted so that their
least significant bits correspond to each other.
Folding at the boundaries: the identifier is folded at the part
boundaries, and the bits falling together are added.
.

Example

X = 12320324211220, partitioned into parts of three digits.

Shift folding:            Folding at the boundaries:
P1   123                  P1   123
P2   203                  P2   302
P3   241                  P3   241
P4   112                  P4   211
P5    20                  P5    20
    -----                     -----
     699                       897
.

Modular Method
Take the key, perform the modulus operation, and use the remainder as
the address in the hash table.
This ensures that the address is within the range of the hash table.
Keep in mind that the table size should not be a power of two,
otherwise there will be more collisions.
The best way to minimize collisions is to make the table size a prime
number.
Example: table size = 31
X = 134
f(X) = X % 31 = 134 % 31 = 10
.

Table size: Why prime?

Suppose the data stored in the hash table are: 7160, 493, 60, 55, 321,
900, 810.
Real-life data tend to have a pattern, and being a multiple of 11 is
usually not that pattern.
tableSize = 10: the data hash to 0, 3, 0, 5, 1, 0, 0
tableSize = 11: the data hash to 10, 9, 5, 0, 2, 9, 7
.

Hash function for floating-point numbers
Getting the hash address for floating-point numbers takes a slightly
different approach, but it also requires a modulus operation at the
end to bring the hash address into the range of the hash table. The
whole operation can be defined as:
Take the fractional part of the key.
Multiply the fractional part by the size of the hash table.
Take the integer part of the product as the hash address of the key.
Example: X = 19.463, hash table size = 97
0.463 x 97 = 44.911
f(X) = 44
.

Hash function for strings

Every character has an ASCII value that can be used in computing the
hash value, followed by a modulus operation to map it into the hash
table.
Example: X = "suresh", hash table size = 97
suresh = s + u + r + e + s + h
       = 115 + 117 + 114 + 101 + 115 + 104
       = 666
After the modulus operation:
f(X) = 666 % 97 = 84
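The same computation in C, as a sketch:

/* String hash: sum of character codes, then modulus by table size. */
unsigned string_hash(const char *s, unsigned table_size) {
    unsigned sum = 0;
    while (*s)
        sum += (unsigned char)*s++;   /* "suresh" sums to 666 */
    return sum % table_size;          /* 666 % 97 = 84 */
}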
.

Characteristics of a good hash function

1. A good hash function avoids collisions.
2. A good hash function is easy to compute.
3. It spreads the keys evenly in the array.
.

PART-5

.

Collision Resolution
Collision: two keys map to the same location in the hash table.

Two ways to resolve collisions:
1. Separate chaining
2. Open addressing (linear probing, quadratic probing, double hashing)
Collision Resolution Policies
.

Two classes:
(1) "Open hashing", also called "separate chaining"
(2) "Closed hashing", also called "open addressing"
The difference is whether collisions are stored outside the table
(open hashing) or whether a collision results in storing one of the
records at another slot in the table (closed hashing).
.

Separate Chaining

Insert: 10, 22, 107, 12, 42
Separate chaining: all keys that map to the same hash value are kept
in a list (or "bucket").
[Figure: with h(K) = K mod 10, 10 goes to the list at slot 0; 22, 12,
and 42 all go to the list at slot 2; 107 goes to slot 7]
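A sketch of chained insertion in C; the names and the fixed table size
are assumptions:

#include <stdlib.h>

#define TABLE_SIZE 10

struct chain_node {
    int key;
    struct chain_node *next;
};

struct chain_node *table[TABLE_SIZE];   /* each slot heads a bucket list */

/* Insert a key at the front of the bucket it hashes to. */
void chain_insert(int key) {
    int h = key % TABLE_SIZE;
    struct chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[h];
    table[h] = n;
}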
.

Open Addressing

Insert: 38, 19, 8, 109, 10
Linear probing: after checking spot h(k), try spot h(k)+1; if that is
full, try h(k)+2, then h(k)+3, etc.
[Figure: with h(K) = K mod 10, 38 goes to slot 8 and 19 to slot 9;
8 finds slots 8 and 9 full and wraps around to slot 0; 109 probes to
slot 1; 10 probes to slot 2]
Linear Probing

Hash table size: 11
Hashing function: F(X) = X mod HTsize
Insert: 29, 18, 43, 10, 36, 25, 46
[Figure: final table. Slot 0: 10, slot 2: 46, slot 3: 36, slot 4: 25,
slot 7: 29, slot 8: 18, slot 10: 43]
.

Linear Probing
f(i) = i

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + 1) mod TableSize
2nd probe = (h(k) + 2) mod TableSize
...
ith probe = (h(k) + i) mod TableSize
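A sketch of insertion with linear probing in C (an empty slot is marked
by a sentinel value); quadratic probing and double hashing differ only
in the increment added at each probe:

#define HT_SIZE 11
#define EMPTY  -1

int ht[HT_SIZE];

/* Call once before any insertions. */
void ht_init(void) {
    for (int i = 0; i < HT_SIZE; i++) ht[i] = EMPTY;
}

/* Linear probing: try h(k), h(k)+1, h(k)+2, ... (mod table size).
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int key) {
    int h = key % HT_SIZE;
    for (int i = 0; i < HT_SIZE; i++) {
        int pos = (h + i) % HT_SIZE;   /* the ith probe */
        if (ht[pos] == EMPTY) {
            ht[pos] = key;
            return pos;
        }
    }
    return -1;   /* table full */
}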

.

Linear Probing – Clustering

[Figure (R. Sedgewick): a key that collides in a small cluster finds a
free slot quickly, but one that collides in a large cluster must probe
past the whole cluster, and each such insertion makes the cluster grow
further]
.

Quadratic Probing
This method is used to avoid the clustering problem of linear probing.
If the hash address is h, then on a collision linear probing searches
locations h, h+1, h+2, ... (mod table size).
In quadratic probing, a quadratic function of i is used as the
increment: instead of checking the (h + i)th index, this method checks
the index computed from a quadratic equation. This ensures that the
identifiers are spread out fairly evenly in the table.
.

Quadratic Probing
Less likely to encounter primary clustering.
f(i) = i²

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + 1) mod TableSize
2nd probe = (h(k) + 4) mod TableSize
3rd probe = (h(k) + 9) mod TableSize
...
ith probe = (h(k) + i²) mod TableSize
.

Quadratic Probing

Insert: 89, 18, 49, 58, 79 (table size 10)
[Figure: 89 goes to slot 9 and 18 to slot 8; 49 collides at 9 and
probes to slot 0; 58 collides at 8 and probes to slot 2; 79 collides
at 9 and probes to slot 3]
.

Quadratic Probing

Hash table size: 11, quadratic increment i²
Insert: 29, 18, 43, 10, 46, 54
[Figure: final table. Slot 0: 10, slot 2: 46, slot 3: 54, slot 7: 29,
slot 8: 18, slot 10: 43]
.

Quadratic Probing Example (table size 7)

insert(76): 76%7 = 6
insert(40): 40%7 = 5
insert(48): 48%7 = 6, probes to slot 0
insert(5):  5%7 = 5, probes to slot 2
insert(55): 55%7 = 6, probes to slot 3
But insert(47): 47%7 = 5 keeps probing slots 5, 6, 2, 0 and never
finds an empty slot, even though the table still has free space.
Double hashing
.

In this method, if an overflow occurs, a new address is computed by
using another hash function. A series of hash functions f1, f2, ...,
fn is used; the hashed values f1(X), f2(X), ..., fn(X) are examined in
order until an empty slot is found.
Example: H = key % 13, H' = 11 – (key % 11)
On a collision, the hash address for the next probe is
(H + H') % 13 = ((key % 13) + (11 – (key % 11))) % 13
.

Double Hashing
f(i) = i * g(k), where g is a second hash function

Probe sequence:
0th probe = h(k) mod TableSize
1st probe = (h(k) + g(k)) mod TableSize
2nd probe = (h(k) + 2*g(k)) mod TableSize
3rd probe = (h(k) + 3*g(k)) mod TableSize
...
ith probe = (h(k) + i*g(k)) mod TableSize
.

Double Hashing Example

h(k) = k mod 7 and g(k) = 5 – (k mod 5)
Insert: 76, 93, 40, 47, 10, 55
[Figure: 76 goes to slot 6, 93 to slot 2, 40 to slot 5; 47 collides at
5 and, with g(47) = 3, goes to slot 1; 10 goes to slot 3; 55 collides
at 6 and, with g(55) = 5, goes to slot 4]
Probes per insertion: 1, 1, 1, 2, 1, 2
.
Double Hashing

Hash table size: 13
H = key % 13
H' = 11 – (key % 11)
(H + H') % 13 = ((key % 13) + (11 – (key % 11))) % 13
Insert: 8, 55, 48, 68
[Figure: 8 goes to slot 8, 55 to slot 3, 48 to slot 9; 68 collides at
slot 3 and, with H' = 9, goes to slot 12]
Resolving Collisions with Double Hashing

Insert these values into the hash table in this order, resolving any
collisions with double hashing:
13, 28, 33, 147, 43
.

Rehashing
Idea: when the table gets too full, create a bigger table (usually
twice as large) and hash all the items from the original table into
the new table.
When to rehash?
When the table is half full (load factor λ = 0.5)
When an insertion fails
At some other threshold
Cost of rehashing?
Rehashing

Insertion can fail when the hash table is full. The solution in that
case is to create a new hash table with double the size of the
previous hash table.
We then use a new hash function and insert all the elements of the
previous hash table: we scan the elements of the previous table one by
one, calculate each hash key with the new hash function, and insert
the element into the new table.
This technique is called rehashing.
It ensures that elements can always be inserted into the hash table.
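A sketch of the rehashing step in C under these assumptions: the new
table is twice the size (often rounded up to a prime in practice),
empty slots are marked -1, and collisions are resolved by linear
probing:

#include <stdlib.h>

/* Rehash: allocate a table of double the size and re-insert every
   element with the new modulus. Returns the new table; *size is
   updated through the pointer. */
int *rehash(int *old, int *size) {
    int old_size = *size;
    int new_size = old_size * 2;
    int *fresh = malloc(new_size * sizeof *fresh);
    for (int i = 0; i < new_size; i++)
        fresh[i] = -1;                    /* -1 marks an empty slot */
    for (int i = 0; i < old_size; i++) {
        if (old[i] == -1) continue;
        int h = old[i] % new_size;        /* new hash function */
        while (fresh[h] != -1)            /* linear probing */
            h = (h + 1) % new_size;
        fresh[h] = old[i];
    }
    free(old);
    *size = new_size;
    return fresh;
}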
.

PART-6

Chaining
.

The complexity of the search algorithm is still an issue, since we are
using linear search for some portion of it.
To avoid this, the concept of a chain comes into the picture.
We provide an extra field with each record which points to the record
number of the next record having the same hash value.
Chaining means remembering the record number where the record with the
same hash value is stored.
.

Chaining without Replacement

The hash address of the identifier is computed.
If that position is vacant, the identifier is placed there.
If the position is occupied, the identifier is put in the next vacant
position, and a chain is formed to the new position.
.

Resolving Collisions with Chaining without Replacement

Hash function: f(X) = X % 10, table size = 10
Insert these values into the hash table in this order:
11, 32, 41, 54, 33
[Figure: an empty table of slots 0 to 9, each with a Value field and a
Chain field]
Resolving Collisions with Chaining without Replacement

Hash function: f(X) = X % 10, table size = 10
Inserted in this order: 11, 32, 41, 54, 33

Index   Value   Chain
0               -1
1       11       3
2       32      -1
3       41       5
4       54      -1
5       33      -1
6 to 9          -1

(41 hashes to 1, which is occupied by 11, so it is stored at the next
vacant slot 3 and chained from slot 1; 33 hashes to 3, which is
occupied by 41, so it is stored at slot 5 and chained from slot 3.)
.

Disadvantage
The main idea is to chain all identifiers having the same hash address
(synonyms).
However, since an identifier may occupy the position of another
identifier, even non-synonyms get chained together, thereby increasing
complexity.
.

Chaining with Replacement

In this method, if another identifier Y is occupying the home position
of an identifier X, X replaces it, and Y is then relocated to a new
position.
Resolving Collisions with Chaining with Replacement

Hash function: f(X) = X % 10, table size = 10
Insert these values into the hash table in this order:
11, 32, 41, 54, 33

Index   Value   Chain
0               -1
1       11       5
2       32      -1
3       33      -1
4       54      -1
5       41      -1
6 to 9          -1

(41 hashes to 1 and is chained to the next vacant slot as before; when
33 arrives, its home slot 3 is occupied by 41, which does not belong
there, so 41 is relocated to slot 5, 33 takes slot 3, and the chain at
slot 1 is updated to point to 5.)
Collision Resolution – Open Addressing

Linear probing:
new position = (current position + 1) MOD hash size
[Figure: a table before and after linear probing]
Problem: clustering occurs, that is, the used spaces tend to appear in
groups which tend to grow, increasing the search time needed to reach
an open space.
.

Quadratic probing:
new position = (collision position + j²) MOD hash size,
j = 1, 2, 3, 4, ...
[Figure: a table before and after quadratic probing]
Problem: an overflow may occur while there is still space in the hash
table.
.

Collision Resolution – Chaining

[Figure: a table before and after chaining]
.

Rehashing
Rehashing is done when the hash table is almost full.
The size of the table is increased and all keys are rearranged.
