UNIT- V
TABLES
CONTENTS

PART-1
Symbol Table
AVL TREES
Balance Factor
In an AVL tree, the balance factor of a node is defined as the height of the node's left subtree minus the height of its right subtree:

BF(node) = ht(left subtree) - ht(right subtree)

For AVL trees, the balance factor, which is a measure of imbalance, must be -1, 0, or +1 at every node.
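To make the later rotation steps concrete, here is a minimal C sketch of a node structure and of the balance factor computed exactly as defined above. The field names lc, rc, and bf follow the rotation steps later in this unit; the height helper (with an empty subtree taken as height -1) is an illustrative assumption, not part of the original slides.

    typedef struct node {
        int key;
        int bf;               /* balance factor: ht(left) - ht(right) */
        struct node *lc, *rc; /* left and right children */
    } node;

    /* height of a subtree; an empty subtree is taken to have height -1 */
    int height(const node *t) {
        if (t == NULL) return -1;
        int hl = height(t->lc), hr = height(t->rc);
        return (hl > hr ? hl : hr) + 1;
    }

    /* B.F of a node = ht(left subtree) - ht(right subtree) */
    int balance_factor(const node *t) {
        return height(t->lc) - height(t->rc);
    }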
[Figure: an example BST with keys 20, 15, 35, 10, 18, 25, 40, 30, 38, 45, 50, shown first without and then with balance factors. The root 20 has balance factor -2, so the tree is not height-balanced.]
Single rotations
LL Rotations
RR Rotations
Double Rotations
LR Rotations
RL Rotations
.
LL Rotation

[Figure: T1 (balance factor 2) with left child T2 (balance factor 1) and right subtree d; T2 has left child T3 (with subtrees a and b) and right subtree c.]

A new node is added somewhere in the subtree rooted at T3. While updating the balance factors in the parent nodes, if T1's balance factor becomes 2 and T2's balance factor is 1, an LL rotation is needed, because the new node was added to the left of T1 and to the left of T2.
LL Rotation (before and after)

[Figure: Before the rotation, T1 (balance factor 2) has left child T2 (balance factor 1) and right subtree d, and T2 has left child T3 (subtrees a, b) and right subtree c. After the LL rotation, T2 is the subtree root, with left child T3 (a, b) and right child T1 (c, d).]
LL Rotation implementation
1. Let T1 be the node with balance factor 2 and let par be its parent.
2. T2 = T1->lc (T2 has balance factor 1).
3. T1->lc = T2->rc
4. T2->rc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par.
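A minimal C sketch of these seven steps, using the node struct sketched earlier; returning the new subtree root stands in for step 7, since the caller attaches the returned node to the parent par.

    /* LL rotation: T1 has bf 2 and its left child T2 has bf 1. */
    node *ll_rotate(node *t1) {
        node *t2 = t1->lc;   /* step 2 */
        t1->lc = t2->rc;     /* step 3 */
        t2->rc = t1;         /* step 4 */
        t1->bf = 0;          /* step 5 */
        t2->bf = 0;          /* step 6 */
        return t2;           /* step 7: caller attaches T2 to par */
    }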
RR Rotation
When a node is added in the right subtree of the right son, such that the balance factor of the current node becomes -2, balance is restored by an RR rotation.
RR Rotation

[Figure: T1 (balance factor -2) with left subtree a and right child T2 (balance factor -1); T2 has left subtree b and right child T3 (with subtrees c and d).]

A new node is added somewhere in the subtree rooted at T3. While updating the balance factors in the parent nodes, if T1's balance factor becomes -2 and T2's balance factor is -1, an RR rotation is needed, because the new node was added to the right of T1 and to the right of T2.
RR Rotation (before and after)

[Figure: Before the rotation, T1 (balance factor -2) has left subtree a and right child T2 (balance factor -1), and T2 has left subtree b and right child T3 (subtrees c, d). After the RR rotation, T2 is the subtree root, with left child T1 (a, b) and right child T3 (c, d).]
RR Rotation implementation
1. Let T1 be the node with balance factor -2 and let par be its parent.
2. T2 = T1->rc (T2 has balance factor -1).
3. T1->rc = T2->lc
4. T2->lc = T1
5. T1->bf = 0
6. T2->bf = 0
7. Attach T2 to the parent node par.
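The corresponding C sketch, the exact mirror of ll_rotate above:

    /* RR rotation: T1 has bf -2 and its right child T2 has bf -1. */
    node *rr_rotate(node *t1) {
        node *t2 = t1->rc;   /* step 2 */
        t1->rc = t2->lc;     /* step 3 */
        t2->lc = t1;         /* step 4 */
        t1->bf = 0;          /* step 5 */
        t2->bf = 0;          /* step 6 */
        return t2;           /* step 7: caller attaches T2 to par */
    }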
LR Rotation
When a node is added in the right subtree of the left son, such that the balance factor of the current node becomes 2 and that of its left son becomes -1, balance is restored by an LR rotation.
LR Rotation

[Figure: T1 (balance factor 2) with left child T2 (balance factor -1) and right subtree d; T2 has left subtree a and right child T3 (with subtrees b and c).]

A new node is added somewhere in the subtree rooted at T3. While updating the balance factors in the parent nodes, if T1's balance factor becomes 2 and T2's balance factor is -1, an LR rotation is needed, because the new node was added to the left of T1 and to the right of T2.
LR Rotation (before and after)

[Figure: Before the rotation, T1 (balance factor 2) has left child T2 (balance factor -1) and right subtree d; T2 has left subtree a and right child T3 (subtrees b, c). After the LR rotation, T3 is the subtree root, with left child T2 (a, b) and right child T1 (c, d).]
LR Rotation: height analysis

Before balancing, assume ht(d) = x. Since T1's balance factor is 2, ht(T2) = x + 2, and so ht(T3) = x + 1 and ht(a) = x.

After balancing:
ht(T2) = max(ht(a), ht(b)) + 1 = max(x, value not exceeding x) + 1 = x + 1
ht(T1) = max(ht(c), ht(d)) + 1 = max(value not exceeding x, x) + 1 = x + 1
BF(T3) = 0
Actually, the balance factor of T3 can be +1 or -1 after adding the node.

Case 1: BF(T3) = 1

Before balancing: ht(b) = ht(c) + 1. Since ht(T3) = x + 1, we get ht(b) = x and ht(c) = x - 1.

After balancing:
BF(T2) = ht(a) - ht(b) = x - x = 0
BF(T1) = ht(c) - ht(d) = (x - 1) - x = -1
Case 2: BF(T3) = -1

Before balancing: ht(c) = ht(b) + 1. Since ht(T3) = x + 1, we get ht(c) = x and ht(b) = x - 1.

After balancing:
BF(T2) = ht(a) - ht(b) = x - (x - 1) = 1
BF(T1) = ht(c) - ht(d) = x - x = 0
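Putting the two cases together, here is a hedged C sketch of the LR rotation; the final balance factors follow the case analysis above, and the bf == 0 branch covers the situation where T3 itself is the newly inserted node.

    /* LR rotation: T1 has bf 2, its left child T2 has bf -1,
       and T3 = T2->rc becomes the new subtree root. */
    node *lr_rotate(node *t1) {
        node *t2 = t1->lc;
        node *t3 = t2->rc;
        t2->rc = t3->lc;           /* b becomes T2's right subtree */
        t1->lc = t3->rc;           /* c becomes T1's left subtree */
        t3->lc = t2;
        t3->rc = t1;
        if (t3->bf == 1) {         /* case 1: new node went into b */
            t2->bf = 0; t1->bf = -1;
        } else if (t3->bf == -1) { /* case 2: new node went into c */
            t2->bf = 1; t1->bf = 0;
        } else {                   /* T3 itself is the new node */
            t2->bf = 0; t1->bf = 0;
        }
        t3->bf = 0;
        return t3;                 /* caller attaches T3 to par */
    }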
RL Rotation
When a node is added in the left subtree of the right son, such that the balance factor of the current node becomes -2 and that of its right son becomes 1, balance is restored by an RL rotation.
RL Rotation

[Figure: T1 (balance factor -2) with left subtree a and right child T2 (balance factor 1); T2 has left child T3 (with subtrees b and c) and right subtree d.]

A new node is added somewhere in the subtree rooted at T3. While updating the balance factors in the parent nodes, if T1's balance factor becomes -2 and T2's balance factor is 1, an RL rotation is needed, because the new node was added to the right of T1 and to the left of T2.
RL Rotation (before and after)

[Figure: Before the rotation, T1 (balance factor -2) has left subtree a and right child T2 (balance factor 1); T2 has left child T3 (subtrees b, c) and right subtree d. After the RL rotation, T3 is the subtree root, with left child T1 (a, b) and right child T2 (c, d).]

Height analysis:

Before balancing, assume ht(a) = x. Since T1's balance factor is -2, ht(T2) = x + 2, and so ht(T3) = x + 1 and ht(d) = x.

After balancing:
ht(T1) = max(ht(a), ht(b)) + 1 = max(x, value not exceeding x) + 1 = x + 1
ht(T2) = max(ht(c), ht(d)) + 1 = max(value not exceeding x, x) + 1 = x + 1
BF(T3) = 0
Actually, the balance factor of T3 can be +1 or -1 after adding the node.

Case 1: BF(T3) = 1

Before balancing: ht(b) = ht(c) + 1. Since ht(T3) = x + 1, we get ht(b) = x and ht(c) = x - 1.

After balancing:
BF(T2) = ht(c) - ht(d) = (x - 1) - x = -1
BF(T1) = ht(a) - ht(b) = x - x = 0
Case 2: BF(T3) = -1

Before balancing: ht(c) = ht(b) + 1. Since ht(T3) = x + 1, we get ht(c) = x and ht(b) = x - 1.

After balancing:
BF(T2) = ht(c) - ht(d) = x - x = 0
BF(T1) = ht(a) - ht(b) = x - (x - 1) = 1
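As with LR, the two cases can be folded into one C sketch, mirrored for the RL rotation:

    /* RL rotation: T1 has bf -2, its right child T2 has bf 1,
       and T3 = T2->lc becomes the new subtree root. */
    node *rl_rotate(node *t1) {
        node *t2 = t1->rc;
        node *t3 = t2->lc;
        t1->rc = t3->lc;           /* b becomes T1's right subtree */
        t2->lc = t3->rc;           /* c becomes T2's left subtree */
        t3->lc = t1;
        t3->rc = t2;
        if (t3->bf == 1) {         /* case 1: new node went into b */
            t1->bf = 0; t2->bf = -1;
        } else if (t3->bf == -1) { /* case 2: new node went into c */
            t1->bf = 1; t2->bf = 0;
        } else {                   /* T3 itself is the new node */
            t1->bf = 0; t2->bf = 0;
        }
        t3->bf = 0;
        return t3;                 /* caller attaches T3 to par */
    }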
.
[Figures: a step-by-step AVL insertion example involving the keys 4, 7, 8, 11, 12, 13, 14, 17, 53; after each insertion the balance factors are updated and the appropriate rotation restores balance (for instance, after 12 is inserted, a rotation makes 12 the parent of 11 and 13).]
.
[Figure: two binary search trees over the identifiers {do, for, if, int, while}, both with for at the root.]

Assuming each identifier is searched with equal probability and no unsuccessful searches are made, the average number of comparisons is:
Tree (a): (1 + 2 + 2 + 3 + 4) / 5 = 12/5
Tree (b): (1 + 2 + 2 + 3 + 3) / 5 = 11/5

[Figure: the second tree redrawn with external nodes added, to account for unsuccessful searches.]
PART-2
.
OBST
Formula for the expected cost of a BST:

cost = SUM (1 <= i <= n) p(i) * level(ai) + SUM (0 <= i <= n) q(i) * (level(Ei) - 1)

We define an optimal binary search tree for the identifier set {a1, a2, ..., an} to be a binary search tree for which this cost is minimum.
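A small C sketch of this cost formula; the level arrays (root at level 1) are hypothetical inputs added here for illustration, not part of the original slides.

    /* Expected cost of a BST: p[1..n] are success probabilities for
       the identifiers a_i, q[0..n] failure probabilities for the
       external nodes E_i; a_level and e_level hold node levels. */
    double bst_cost(int n, const double p[], const int a_level[],
                    const double q[], const int e_level[]) {
        double cost = 0.0;
        for (int i = 1; i <= n; i++)
            cost += p[i] * a_level[i];        /* successful searches */
        for (int i = 0; i <= n; i++)
            cost += q[i] * (e_level[i] - 1);  /* unsuccessful searches */
        return cost;
    }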
.
Example
Identifier set {a1, a2, a3} = {do, if, while}, with equal probabilities p(i) = q(i) = 1/7 for all i.

[Figures (a)-(e): the five distinct binary search trees on {do, if, while}; tree (b) is the balanced one, with if at the root and do and while as its children.]
.
[Figure: a binary search tree over the identifiers do, if, int, while.]
Huffman Coding
Proposed by David A. Huffman in 1952 in "A Method for the Construction of Minimum-Redundancy Codes".
Applicable to many forms of data transmission.
.
Example: symbol frequencies
P 18, Q 8, R 15, S 2, T 25, U 13, V 5, W 26
.
[Figure: the Huffman tree built from these frequencies; each internal node is labeled with the sum of its children's frequencies.]
A second example, with symbol probabilities:
A 0.07, B 0.09, C 0.12, D 0.22, E 0.23, F 0.27
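A self-contained C sketch of Huffman's construction for tables like the ones above; it repeatedly merges the two least-frequent trees, using a simple O(n^2) minimum search instead of a priority queue for brevity. The function and variable names are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct hnode {
        int freq;
        char sym;                   /* '\0' for internal nodes */
        struct hnode *left, *right;
    } hnode;

    static hnode *new_node(int freq, char sym, hnode *l, hnode *r) {
        hnode *n = malloc(sizeof *n);
        n->freq = freq; n->sym = sym; n->left = l; n->right = r;
        return n;
    }

    /* index of the live tree with the smallest frequency, skipping one */
    static int min_index(hnode *a[], int n, int skip) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (a[i] && i != skip && (best < 0 || a[i]->freq < a[best]->freq))
                best = i;
        return best;
    }

    hnode *build_huffman(const char syms[], const int freqs[], int n) {
        hnode *a[64];                      /* assumes n <= 64 symbols */
        for (int i = 0; i < n; i++)
            a[i] = new_node(freqs[i], syms[i], NULL, NULL);
        for (int m = 0; m < n - 1; m++) {  /* n-1 merges build the tree */
            int i = min_index(a, n, -1);
            int j = min_index(a, n, i);
            a[i] = new_node(a[i]->freq + a[j]->freq, '\0', a[i], a[j]);
            a[j] = NULL;
        }
        return a[min_index(a, n, -1)];
    }

    /* walk the tree, printing 0 for a left edge and 1 for a right edge */
    void print_codes(const hnode *t, char *buf, int depth) {
        if (!t->left && !t->right) {
            buf[depth] = '\0';
            printf("%c: %s\n", t->sym, buf);
            return;
        }
        buf[depth] = '0'; print_codes(t->left, buf, depth + 1);
        buf[depth] = '1'; print_codes(t->right, buf, depth + 1);
    }

For example, build_huffman("PQRSTUVW", (int[]){18, 8, 15, 2, 25, 13, 5, 26}, 8) builds a tree for the first table; the rarest symbols, S and V, end up with the longest codes.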
.
PART-3
.
Heapsort
Combines the best features of two sorting algorithms.
Introduces another algorithm design technique: the use of a data structure called a heap.
The heap data structure is useful for heapsort and also makes an efficient priority queue.
.
Heap
The (binary) heap data structure is an array object that
can be viewed as a nearly complete binary tree.
Each node of the tree corresponds to an element of the
array that stores the value in the node.
The tree is completely filled on all levels except
possibly the lowest, which is filled from the left up to a
point.
.
Heap
An array A that represents a heap is an object with two attributes: lengthA, the number of elements in the array, and heapSizeA, the number of elements in the heap stored within array A.
That is, although A[1 .. lengthA] may contain valid numbers, no element past A[heapSizeA], where heapSizeA <= lengthA, is an element of the heap.
A max-heap viewed as a binary tree and as an array:

index: 1  2  3  4  5  6  7  8  9  10
value: 16 14 10 8  7  9  3  2  4  1

[Figure: the same heap drawn as a binary tree, with 16 at the root.]
Types
Max-heaps and Min-heaps
In both kinds the values in the nodes satisfy a heap
property, the specifics of which depend on the kind
of heap.
In a max-heap, the max-heap property is that for
every node other than the root,
A[Parent(i)] >= A[i]
In a min-heap, the min-heap property is that for
every node other than the root,
A[Parent(i)] <= A[i]
Five Basic Procedures

MAX-HEAPIFY(A, i)
1. l ← Left(i)
2. r ← Right(i)
3. if l ≤ heapSizeA and A[l] > A[i]
4.   then largest ← l
5.   else largest ← i
6. if r ≤ heapSizeA and A[r] > A[largest]
7.   then largest ← r
8. if largest ≠ i
9.   then exchange A[i] ↔ A[largest]
10.       MAX-HEAPIFY(A, largest)
.
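A minimal C sketch of MAX-HEAPIFY on a 1-indexed array, following the pseudocode above; heap_size plays the role of heapSizeA.

    #define LEFT(i)  (2 * (i))
    #define RIGHT(i) (2 * (i) + 1)

    void max_heapify(int A[], int heap_size, int i) {
        int l = LEFT(i), r = RIGHT(i);
        int largest = (l <= heap_size && A[l] > A[i]) ? l : i;
        if (r <= heap_size && A[r] > A[largest])
            largest = r;
        if (largest != i) {
            int tmp = A[i]; A[i] = A[largest]; A[largest] = tmp;
            max_heapify(A, heap_size, largest);
        }
    }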
[Figures: the action of MAX-HEAPIFY(A, 2) on A = 16, 4, 10, 14, 7, 9, 3, 2, 8, 1: the 4 at node i = 2 is exchanged with the 14 at node 4, then with the 8 at node 9, after which the max-heap property holds.]
.
Analysis
The running time of MAX-HEAPIFY on a subtree of size n rooted at a given node i is the Θ(1) time to fix up the relationships among the elements A[i], A[Left(i)], and A[Right(i)], plus the time to run MAX-HEAPIFY on a subtree rooted at one of the children of node i.
The children's subtrees each have size at most 2n/3 (the worst case occurs when the last row of the tree is exactly half full), so the running time of MAX-HEAPIFY can be described by the recurrence

T(n) ≤ T(2n/3) + Θ(1)

whose solution, by the master theorem, is T(n) = O(lg n).
We can also characterize the running time of MAX-HEAPIFY on a node of height h as O(h).
.
Building a heap
We can use the procedure MAX-HEAPIFY in a bottom-up manner to convert an array A[1 .. n], where n = lengthA, into a max-heap.
The elements in the subarray A[⌊n/2⌋ + 1 .. n] are all leaves of the tree, so each is a 1-element heap to begin with.
The procedure BUILD-MAX-HEAP goes through the remaining nodes of the tree and runs MAX-HEAPIFY on each one.
.
BUILD-MAX-HEAP(A)
1. heapSizeA ← lengthA
2. for i ← ⌊lengthA / 2⌋ downto 1
3.   do MAX-HEAPIFY(A, i)
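The same procedure as a C sketch, reusing max_heapify from the earlier sketch:

    /* BUILD-MAX-HEAP: heapify every non-leaf node, bottom-up */
    void build_max_heap(int A[], int n) {   /* n = lengthA */
        for (int i = n / 2; i >= 1; i--)
            max_heapify(A, n, i);
    }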
Build heap example

Initial array:
index: 1 2 3 4 5 6 7 8 9 10
value: 4 1 3 2 16 9 10 14 8 7

[Figures (a)-(f): BUILD-MAX-HEAP run on this array; MAX-HEAPIFY is applied to nodes 5, 4, 3, 2, and 1 in turn, producing the max-heap 16, 14, 10, 8, 7, 9, 3, 2, 4, 1.]

The Heapsort Algorithm
.
HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← lengthA downto 2
3.   do exchange A[1] ↔ A[i]
4.      heapSizeA ← heapSizeA − 1
5.      MAX-HEAPIFY(A, 1)
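A C sketch of HEAPSORT built on the two sketches above:

    void heapsort_max(int A[], int n) {
        build_max_heap(A, n);
        for (int i = n; i >= 2; i--) {
            int tmp = A[1]; A[1] = A[i]; A[i] = tmp; /* exchange A[1] <-> A[i] */
            max_heapify(A, i - 1, 1);                /* heap shrinks by one */
        }
    }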
Analysis
The HeapSort procedure takes time O(n lg n), since the
call to BUILD_MAX_HEAP takes time O(n) and each of
the n-1 calls to MAX_HEAPIFY takes time O(lg n).
.
[Figures (a)-(j): HEAPSORT running on the max-heap 16, 14, 10, 8, 7, 9, 3, 2, 4, 1; in each step the root is exchanged with the last element of the heap, the heap size shrinks by one, and MAX-HEAPIFY restores the max-heap property.]

Final sorted array:
index: 1 2 3 4 5 6 7 8 9 10
value: 1 2 3 4 7 8 9 10 14 16
.
Priority Queues
The most popular application of a heap is its use as an efficient priority queue.
A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key.
Operations
INSERT(S,x) inserts the element x into the Set S.
EXTRACT_MAX(S) removes and returns the
element of S with the largest key.
MAXIMUM(S) returns the element of S with the
largest key.
.
HEAP-EXTRACT-MAX(A)
1. if heapSizeA < 1
2.   then error "heap underflow"
3. max ← A[1]
4. A[1] ← A[heapSizeA]
5. heapSizeA ← heapSizeA − 1
6. MAX-HEAPIFY(A, 1)
7. return max
.
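A C sketch of HEAP-EXTRACT-MAX, again reusing max_heapify; reporting the underflow via stderr and exit is an illustrative choice, not part of the original pseudocode.

    #include <stdio.h>
    #include <stdlib.h>

    int heap_extract_max(int A[], int *heap_size) {
        if (*heap_size < 1) {               /* heap underflow */
            fprintf(stderr, "heap underflow\n");
            exit(EXIT_FAILURE);
        }
        int max = A[1];
        A[1] = A[*heap_size];
        (*heap_size)--;
        max_heapify(A, *heap_size, 1);
        return max;
    }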
HEAP-INSERT(A, key)
1. heapSizeA ← heapSizeA + 1
2. i ← heapSizeA
3. while i > 1 and A[Parent(i)] < key
4.   do A[i] ← A[Parent(i)]
5.      i ← Parent(i)
6. A[i] ← key
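And a matching C sketch of HEAP-INSERT; in the 1-indexed array layout, Parent(i) is simply i/2.

    void heap_insert(int A[], int *heap_size, int key) {
        (*heap_size)++;
        int i = *heap_size;
        while (i > 1 && A[i / 2] < key) { /* sift up past smaller parents */
            A[i] = A[i / 2];
            i = i / 2;
        }
        A[i] = key;
    }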
.
PART-4
.
Hashing
If we have a table organization and a search technique that can retrieve a key in a single access, it is very efficient: the element is found in constant time and few key comparisons are involved. To achieve this, the position of a key in the table should not depend upon the other keys; instead, the location should be calculated on the basis of the key itself. Such an organization and search technique is called hashing.
.
Hashing Terminology
Hash Function : A function that transforms a key X into
a table index is called a hash function.
Hash Address : The address of X computed by the hash
function is called the hash address.
Synonyms : Two identifiers I1 and I2 are synonyms if
f(I1) = f(I2)
.
Hashing Terminology
Collision : When two non identical identifiers are
mapped into the same location in the hash table, a
collision is said to occur, i.e. f(I1) = f(I2)
Overflow: An overflow is said to occur when an identifier gets mapped onto a full bucket. When s = 1, i.e. each bucket holds only one record, collision and overflow occur simultaneously; such a situation is called a hash clash.
Bucket: Each hash table is partitioned into b buckets, ht[0] ... ht[b-1]. Each bucket is capable of holding s records, i.e. a bucket consists of s slots; when s = 1, each bucket can hold one record.
The function f(X) maps an identifier X into one of the b buckets, i.e. to an index from 0 to b-1.
.
Hash Tables

General idea: a hash table is an array of some fixed size, usually a prime number. A hash function h(K) maps each key K from the key space (e.g., integers, strings) to a table index in the range 0 to TableSize - 1, giving constant-time accesses.

[Figure: the key space mapped by h(K) into an array indexed 0 .. TableSize - 1.]
.
Example
Key space = integers, TableSize = 10, h(K) = K mod 10.
Insert: 7, 18, 41, 94.

[Figure: a table with slots 0-9; 41 lands in slot 1, 94 in slot 4, 7 in slot 7, and 18 in slot 8.]
.
.
Hash Functions
A hashing function f transforms an identifier X into a bucket address in the hash table.
Characteristics of a good hash function:
1. simple and fast to compute,
2. avoids collisions,
3. distributes the keys evenly among the cells.
.
Hash Functions
Truncation Method
Mid Square Method
Folding Method
Modular Method
Hash function for floating point numbers
Hash function for strings.
.
Truncation Method
The easiest method: take only part of the key as the address, for example some rightmost or leftmost digits.
Example: take these 8-digit keys:
82394561, 87139465, 83567271, 85943228
Hash addresses: 61, 65, 71 and 28 (the rightmost two digits, for hash table size 100).
Easy to compute, but the chance of collision is higher because the last two digits can be the same in different keys.
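A one-line C sketch of truncation as used in the example above (rightmost two decimal digits, table size 100):

    unsigned hash_truncate(unsigned long key) {
        return key % 100;   /* e.g. 82394561 -> 61 */
    }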
.
Example (Folding Method)
X = 12320324211220, split into parts of three digits:

Shift folding:        Folding at the boundaries:
P1 = 123              P1 = 123
P2 = 203              P2 = 302 (reversed)
P3 = 241              P3 = 241
P4 = 112              P4 = 211 (reversed)
P5 = 20               P5 = 20
Sum = 699             Sum = 897
Modular Method
Take the key, perform the modulus operation, and use the remainder as the address in the hash table.
This ensures that the address will be in the range of the hash table.
One thing to keep in mind: the table size should not be a power of two, otherwise there will be more collisions.
The best way to minimize collisions is to make the table size a prime number.
Example: table size = 31,
X = 134
f(X) = X % 31 = 134 % 31 = 10
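The method as a small C sketch (the function name is illustrative):

    unsigned hash_mod(unsigned key, unsigned table_size) {
        return key % table_size;  /* always in 0 .. table_size-1 */
    }

    /* hash_mod(134, 31) == 10, matching the example above */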
.
Example: with tableSize = 11, the data hashes to 10, 9, 5, 0, 2, 9, 7.
.
PART-5
.
Collision Resolution
Collision: when two keys map to the same location in the hash table.

Collision Resolution Policies
.
Two classes:
(1) "Open hashing", also called "separate chaining"
(2) "Closed hashing", also called "open addressing"
The difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing).
.
Separate chaining: all keys that map to the same hash value are kept in a list (or "bucket").

[Figure: a hash table whose slots each point to a linked list of the keys hashing there.]
.
Open Addressing

Linear probing: after checking spot h(k), try spot h(k)+1; if that is full, try h(k)+2, then h(k)+3, and so on.

Insert: 38, 19, 8, 109, 10.

[Figure: a table with slots 0-9 after the keys are placed by linear probing; assuming h(k) = k mod 10, 38 lands in slot 8 and 19 in slot 9, while 8 wraps around to slot 0, 109 to slot 1, and 10 to slot 2.]
Linear Probing

Insert: 29, 18, 43, 10, 36, 25, 46
Hash table size: 11
Hash function: F(X) = X mod HTsize

Resulting table: slot 0 -> 10, slot 2 -> 46, slot 3 -> 36, slot 4 -> 25, slot 7 -> 29, slot 8 -> 18, slot 10 -> 43.
(18 collides with 29 at slot 7 and moves to 8; 10 collides with 43 at slot 10 and wraps to 0; 25 collides with 36 at slot 3 and moves to 4.)
.
Linear Probing
f(i) = i
Probe sequence:
0th probe: h(k) mod TableSize
1st probe: (h(k) + 1) mod TableSize
2nd probe: (h(k) + 2) mod TableSize
...
ith probe: (h(k) + i) mod TableSize
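A C sketch of insertion with linear probing; EMPTY is an assumed sentinel for unused slots, and h(k) is taken to be k mod TableSize as in the examples above.

    #define EMPTY (-1)

    /* returns the slot used, or -1 if the table is full */
    int lp_insert(int table[], int size, int key) {
        int h = key % size;
        for (int i = 0; i < size; i++) {    /* ith probe: (h + i) mod size */
            int slot = (h + i) % size;
            if (table[slot] == EMPTY) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;  /* every slot probed and occupied */
    }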
.
[Figure: linear probing and clustering (after R. Sedgewick): an insert that misses a cluster succeeds immediately, while an insert into a small cluster must probe to the end of the cluster, which makes clusters grow.]
.
Quadratic Probing
This method is used to avoid the clustering problem of linear probing.
Suppose the hash address is h; on a collision, linear probing searches the locations h, h+1, h+2, ... (mod size).
In quadratic probing, a quadratic function of i is used as the increment: instead of checking the (i+1)th index, this method checks the index computed from a quadratic equation. This ensures that the identifiers are fairly spread out in the table.
.
f(i) = i²
Probe sequence:
0th probe: h(k) mod TableSize
1st probe: (h(k) + 1) mod TableSize
2nd probe: (h(k) + 4) mod TableSize
3rd probe: (h(k) + 9) mod TableSize
...
ith probe: (h(k) + i²) mod TableSize
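The same insert sketch with a quadratic increment (reusing the EMPTY sentinel from the linear-probing sketch):

    /* ith probe: (h + i*i) mod size; can miss empty slots, see below */
    int qp_insert(int table[], int size, int key) {
        int h = key % size;
        for (int i = 0; i < size; i++) {
            int slot = (h + i * i) % size;
            if (table[slot] == EMPTY) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;  /* probe sequence found no empty slot */
    }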
.
Quadratic Probing
Insert: 89, 18, 49, 58, 79 (table size 10, h(k) = k mod 10).

[Figure: 89 -> slot 9 and 18 -> slot 8 directly; 49, 58, and 79 collide and are placed by quadratic probing in slots 0, 2, and 3 respectively.]
.
Quadratic Probing
Insert: 29, 18, 43, 10, 46, 54
Hash table size: 11, quadratic increment i².

Resulting table: slot 0 -> 10, slot 2 -> 46, slot 3 -> 54, slot 7 -> 29, slot 8 -> 18, slot 10 -> 43.
(18 collides at slot 7 and moves to 8; 10 collides at slot 10 and wraps to 0; 54 collides at slots 10 and 0 and lands in slot 3.)
.
But consider insert(47) into a size-7 table that already holds 76 and other keys: 47 % 7 = 5, yet every quadratic probe lands on an occupied slot, so the insertion fails even though the table still has empty slots.

Double Hashing
.
Double Hashing
f(i) = i · g(k), where g is a second hash function.
Probe sequence:
0th probe: h(k) mod TableSize
1st probe: (h(k) + g(k)) mod TableSize
2nd probe: (h(k) + 2·g(k)) mod TableSize
3rd probe: (h(k) + 3·g(k)) mod TableSize
...
ith probe: (h(k) + i·g(k)) mod TableSize
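A C sketch with the second hash folded in; R is an assumed prime smaller than the table size, matching the H' = 11 - (key % 11) style used in the example that follows.

    /* ith probe: (h + i*g) mod size, where g = R - (key % R) != 0 */
    int dh_insert(int table[], int size, int key, int R) {
        int h = key % size;
        int g = R - (key % R);
        for (int i = 0; i < size; i++) {
            int slot = (h + i * g) % size;
            if (table[slot] == EMPTY) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }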
.
Example: insert 76, 93, 40, 47, 10, 55 into a table of size 7 with h(k) = k mod 7 (the placements are consistent with g(k) = 5 - (k mod 5)).

Final table: slot 1 -> 47, slot 2 -> 93, slot 3 -> 10, slot 4 -> 55, slot 5 -> 40, slot 6 -> 76.
Probes per insert: 1, 1, 1, 2, 1, 2 (47 and 55 each collide once and are placed by one application of g).
.
Double Hashing

Insert: 8, 55, 48, 68
Hash table size: 13
H = key % 13
H' = 11 - (key % 11)
On a collision, probe at (H + H') % 13 = ((key % 13) + (11 - (key % 11))) % 13.

Resulting table: slot 3 -> 55, slot 8 -> 8, slot 9 -> 48, slot 12 -> 68.
(68 collides with 55 at slot 3; H'(68) = 9, so 68 is placed at (3 + 9) % 13 = 12.)

Resolving Collisions with Double Hashing
.
[Exercise: insert the values 13, 28, 33, 147, 43 into the hash table in this order, resolving any collisions with double hashing.]
.
Rehashing
Idea: when the table gets too full, create a bigger table (usually about twice as large) and hash all the items from the original table into the new table.
When to rehash?
half full (λ = 0.5)
when an insertion fails
some other threshold
Cost of rehashing?
PART-6
Chaining
.
Resolving Collisions with Chaining without Replacement

Hash function: f(X) = X % 10, table size = 10.
Insert these values into the hash table in this order: 11, 32, 41, 54, 33.

Index  Value  Chain
0      -      -1
1      11     3
2      32     -1
3      41     5
4      54     -1
5      33     -1
6      -      -1
7      -      -1
8      -      -1
9      -      -1
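A C sketch of chaining without replacement that reproduces this table, assuming every slot starts with value UNUSED and chain NO_CHAIN; the free-slot search scans forward from the home slot (wrapping around), which is how 41 lands in slot 3 and 33 in slot 5 above.

    #define TSIZE 10
    #define NO_CHAIN (-1)
    #define UNUSED (-1)

    typedef struct { int value; int chain; } slot;  /* chain = next index */

    void chain_insert(slot t[], int key) {
        int h = key % TSIZE;
        if (t[h].value == UNUSED) {        /* home slot free: done */
            t[h].value = key;
            return;
        }
        int f = (h + 1) % TSIZE;           /* find the next free slot */
        while (t[f].value != UNUSED)
            f = (f + 1) % TSIZE;
        t[f].value = key;
        int i = h;                         /* append f to the chain at h */
        while (t[i].chain != NO_CHAIN)
            i = t[i].chain;
        t[i].chain = f;
    }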
.
Disadvantage
The main idea is to chain all identifiers having the same hash address (synonyms).
However, since an identifier can occupy the position that belongs to another identifier's chain, even non-synonyms get chained together, thereby increasing complexity.
.
Resolving Collisions with Chaining with Replacement

Hash function: f(X) = X % 10, table size = 10.
Insert these values into the hash table in this order: 11, 32, 41, 54, 33.

(41 first lands in slot 3 because its home slot 1 is taken; when 33, whose home slot is 3, arrives, the non-synonym 41 is moved to the next free slot 5 and 33 takes slot 3.)

Index  Value  Chain
0      -      -1
1      11     5
2      32     -1
3      33     -1
4      54     -1
5      41     -1
6      -      -1
7      -      -1
8      -      -1
9      -      -1
Collision Resolution: Open Addressing

Linear Probing
new position = (current position + 1) MOD hash size

[Figure: an example table before and after linear probing.]

Problem: clustering occurs, that is, the used spaces tend to appear in groups, which tend to grow and thus increase the search time to reach an open space.
.
Quadratic Probing
new position = (collision position + j²) MOD hash size, j = 1, 2, 3, 4, ...

[Figures: example tables before and after quadratic probing, and before and after chaining.]
.
Rehashing
Rehashing is done when the hash table is almost full.
The size of the table is increased and all keys are rearranged.
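A closing C sketch of rehashing; it assumes an open-addressing insert such as the lp_insert sketch from the linear-probing section, along with its EMPTY sentinel.

    #include <stdlib.h>

    int *rehash(int *old_table, int old_size, int new_size) {
        int *new_table = malloc(new_size * sizeof *new_table);
        for (int i = 0; i < new_size; i++)
            new_table[i] = EMPTY;              /* start with an empty table */
        for (int i = 0; i < old_size; i++)     /* re-insert every old key */
            if (old_table[i] != EMPTY)
                lp_insert(new_table, new_size, old_table[i]);
        free(old_table);
        return new_table;
    }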