
MIT Art Design and Technology University

MIT School of Computing, Pune


Department of Computer Science and Engineering

21BTCS402-Advanced Data Structures & Algorithms

Class - S.Y. (SEM-IV), <11 & 12>

Unit - IV
SEARCH TREES, INDEXING AND MULTIWAY TREES
Prof. Neha V. Chaube
AY 2022-2023 SEM-II
L1
Unit IV - Syllabus

Unit IV–SEARCH TREES, INDEXING AND MULTIWAY TREES 09 hours


• Symbol Table-Representation of Symbol Tables, Static tree table and Dynamic tree table,
• Introduction to Dynamic Programming, Weight balanced tree, Optimal Binary Search Tree
(OBST), OBST as an example of Dynamic Programming
• Height Balanced Tree- AVL tree, Indexing and Multiway Trees- Indexing, indexing
techniques, Types of search tree- Multiway search tree, B-Tree, Trie Tree, Splay Tree, Red-
Black Tree, K-dimensional tree,
• Heap-Basic concepts, realization of heap and operations, Heap as a priority queue, heap sort
Symbol Table
◻ Symbol table is an important data structure defined as the set of
<Name, Value> pairs.

Operations or Functions of symbol table :

1. create() : symbol_table
2. destroy() : symbol_table
3. enter (name, value) : pointer to an entry
4. find (name, table) : pointer to an entry
5. set_attributes (*entry, attributes)
6. get_attributes (*entry) : attributes
4
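A minimal sketch of this interface in Python, assuming a hash-based backing store (the class and method names simply mirror the list above and are illustrative, not taken from any particular compiler):

class SymbolTable:
    """A set of <name, value> pairs with the operations listed above."""

    def __init__(self):                        # create()
        self._entries = {}

    def enter(self, name, value):              # enter(name, value) -> entry
        entry = {"name": name, "value": value, "attributes": {}}
        self._entries[name] = entry
        return entry

    def find(self, name):                      # find(name) -> entry or None
        return self._entries.get(name)

    def set_attributes(self, entry, attributes):
        entry["attributes"].update(attributes)

    def get_attributes(self, entry):
        return entry["attributes"]

With a dict (hash table) as the backing store, enter and find are expected O(1), which matches the hash-table organization described on the following slides.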
Symbol Table – Implementation Techniques
◻ Basic Operations : enter() and find()
◻ Considerations:
🞑 Number of names
🞑 Storage space
🞑 Retrieval time
◻ Organizations:
🞑 Unordered List - (linked list/array)
🞑 Ordered List – Binary search on arrays
🞑 Binary Search Trees
🞑 Hash Tables – Most common approach (constant time)

5
Symbol table representation using List
• Array is used to store names and associated information
• A pointer “available” is maintained just past the last stored record, and new
names are added in the order in which they arrive
• To search for a name we scan from the beginning of the list up to the “available”
pointer; if the name is not found we report the error “use of undeclared name”
• While inserting a new name we must ensure that it is not already present,
otherwise we report the error “multiply defined name”
• Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
• The advantage is that it takes a minimum amount of space.
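A sketch of this linear-list organization in Python (illustrative; a fixed-size array of records plus an “available” index):

class ListSymbolTable:
    def __init__(self, capacity=100):
        self.records = [None] * capacity    # array storing names and associated information
        self.available = 0                  # points just past the last stored record

    def find(self, name):
        # linear scan from the start up to the 'available' pointer: O(n)
        for i in range(self.available):
            if self.records[i]["name"] == name:
                return self.records[i]
        return None                         # caller reports "use of undeclared name"

    def enter(self, name, value):
        if self.find(name) is not None:
            raise KeyError("multiply defined name: " + name)
        self.records[self.available] = {"name": name, "value": value}
        self.available += 1                 # O(1): appended in arrival order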
Symbol table representation using Linked List

• A link field is added to each record.


• Searching of names is done in the order pointed to by the link field.
• A pointer “First” is maintained to point to the first record of the
symbol table.
• Insertion is fast O(1), but lookup is slow for large tables – O(n) on
average
Symbol table representation using Hash Table
• Two tables are maintained – a hash table and symbol table
• A hash table is an array with an index range: 0 to table size – 1. These
entries are pointers pointing to the names of the symbol table.
• To search for a name we use a hash function that will result in an
integer between 0 to table size – 1.
• Insertion and lookup can be made very fast – O(1).
• The advantage is that quick search is possible.
• Disadvantage is that hashing is complicated to implement.
Symbol table representation using BST

• We add two link fields i.e. left and right child.

• All names are inserted as descendants of the root node, always maintaining the binary search tree property.

• Insertion and lookup are O(log2 n) on average.
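A sketch of the BST organization (illustrative Python; each record carries the two link fields mentioned above):

class BSTNode:
    def __init__(self, name, value):
        self.name, self.value = name, value
        self.left = self.right = None          # the two link fields

def bst_enter(root, name, value):
    """Insert a <name, value> pair keeping the BST property; returns the (possibly new) root."""
    if root is None:
        return BSTNode(name, value)
    if name < root.name:
        root.left = bst_enter(root.left, name, value)
    elif name > root.name:
        root.right = bst_enter(root.right, name, value)
    # equal name: already present; a compiler would report "multiply defined name"
    return root

def bst_find(root, name):
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root                                # None means the name is undeclared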


Operations of Symbol table

10
Use of symbol table
◻ A data structure used by a language translator such as a compiler or
interpreter.
◻ Where, each symbol in a program's source code is associated with
information relating to its –
🞑 Declaration
🞑 Scope or appearance in the source
◻ Store information about the occurrence of various entities such as
variable names, function names, objects, classes, interfaces, etc.

[Diagram: Source Program → COMPILER → Target Program, with Errors reported as a side output]
11
Lexical Analyzer
15

◻ Lexical Analyzer converts a sequence of characters into a sequence of tokens.
◻ A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters and so on).
◻ Puts information about identifiers into the symbol table.
◻ Ex: newval = oldval + 12 → tokens:
  newval – identifier
  =      – assignment operator
  oldval – identifier
  +      – add operator
  12     – a number
Syntax Analyzer
16

◻ A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
◻ A syntax analyzer is also called a parser.
◻ A parse tree describes a syntactic structure.
Semantic Analyzer
18

◻ A semantic analyzer checks the source program for semantic errors and collects the type information for the code generation.
◻ Type-checking is an important part of semantic
analyzer.
◻ Ex:
newval := oldval + 12

■ The type of the identifier newval must match the type of the expression (oldval + 12).
Intermediate Code Generation
19
◻ A compiler may produce explicit intermediate code representing the source program.
◻ These intermediate codes are generally machine (architecture) independent.
◻ Ex:
newval := oldval * fact + 1

id1 := id2 * id3 + 1

MULT id2, id3, temp1


ADD temp1, #1, temp2
MOV temp2, id1
Code Optimizer (for the Intermediate Code Generator)
20

◻ The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.

◻ Ex:

MULT id2,id3,temp1
ADD temp1,#1,id1
Code Generator
21

◻ Produces the target language for a specific architecture.
◻ The target program is normally a relocatable object file containing the machine codes.

◻ Ex:

MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
Symbol tables
22

◻ REPOSITORY of all information within a compiler. All parts of a compiler communicate through this table and access the data symbols.
◻ Symbol Table is a “scratch pad” where the compiler keeps
information about the “objects” in a program
🞑 Variables (storage areas)
🞑 Functions, procedures

◻ Enables the compiler to do type checking and determine/control the scope of a variable.
Symbol Table types
23

◻ Static symbol table
  🞑 Stores a fixed amount of information.
  🞑 No insertion or deletion of entries.
  🞑 Example – OBST with a fixed set of identifiers, used to build a minimum-cost tree.
  🞑 Huffman coding can also be implemented using a static symbol table.
◻ Dynamic symbol table
  🞑 Stores dynamic information.
  🞑 Entries are constantly changing, with frequent insertions and deletions of nodes.
  🞑 Example – AVL tree, i.e. a height-balanced tree.
◻ Used to implement static and/or dynamic data structures.
L2
Introduction to Dynamic Programming
• Dynamic Programming is mainly an optimization over plain recursion.
• Wherever we see a recursive solution that has repeated calls for same
inputs, we can optimize it using Dynamic Programming.
• The idea is to simply store the results of sub-problems, so that we do
not have to re-compute them when needed later.
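A minimal illustration of the idea, using Fibonacci numbers (a standard example, separate from the OBST material that follows): the plain recursion keeps re-solving the same sub-problems, while the memoized version stores each result the first time it is computed.

from functools import lru_cache

def fib_plain(n):                # plain recursion: repeated calls for the same inputs
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_dp(n):                   # same recursion + stored sub-problem results
    return n if n < 2 else fib_dp(n - 1) + fib_dp(n - 2)

print(fib_dp(40))                # fast; fib_plain(40) would make hundreds of millions of calls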

[Diagram: recursion with repeated sub-problem calls vs. dynamic programming with stored results]


Weight Balanced Tree
• The aim is to balance the weight of a given tree
in terms of number of leaves.
• The weight of a node depends on the weight of
its children.
• A Binary Search Tree is weight balanced if for
each node the number of nodes in the left
subtree is at least half and at most twice the
number of nodes in the right subtree.
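A sketch of a routine that checks this condition (illustrative Python; real weight-balanced trees such as BB[alpha] trees maintain the balance during insertion with rotations rather than checking it afterwards, and counting the empty subtree as one node here is only a small tweak to avoid division by zero, not part of the definition above):

class WBNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def is_weight_balanced(node):
    """Left subtree weight must be at least half and at most twice the right subtree weight."""
    if node is None:
        return True
    ls, rs = size(node.left) + 1, size(node.right) + 1
    return (rs / 2 <= ls <= 2 * rs
            and is_weight_balanced(node.left)
            and is_weight_balanced(node.right))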
Features
• Depth of a Weight-Balanced Binary Search Tree is always of the order
log(N), where N is the total number of nodes inserted into the tree.
• Search operations in a Weight Balanced Tree take O(Log(N)) time. The
function is the same as the search operation in a Binary Search Tree.
However, it is more efficient because the height of the tree is balanced.
• Insertion operations take O(log(N)) time. It takes O(Log(N)) time to search
for the appropriate location, O(1) time to insert, and O(Log(N)) time to
rebalance.
• Similar to the insertion operation, deletion operation takes O(Log(N)) time.
Deletion requires O(Log(N)) time to search for the node, O(1) time to
delete, and O(Log(N)) time to rebalance.
OBST – Optimal BST (Need)
28

◻ Symbol tables are used to store information and can be implemented using various data structures.
◻ For a BST –
  🞑 Searching – the arrangement of nodes affects the searching time.
  🞑 Arrange the BST in some specific manner to achieve quick and efficient search of any node.
◻ Example – Consider 3 nodes, say A, B, C (3! = 6 insertion orders, giving 5 structurally distinct BSTs)
Optimal BST ?
Tree 1 Tree 2
29

Tree 1 – node A: 3 comparisons, B: 2, C: 1 (total 6)
Tree 2 – nodes A and C: 2 comparisons each, B: 1 (total 5) → Optimal

What about unsuccessful searches?


Unsuccessful searches;
optimality, and OBST
30

P1, P2, P3 are the probabilities of successful searches.
q0, q1, q2, q3 are the probabilities of unsuccessful searches.
Assume all probabilities are 1/7.
Optimal Binary Search Tree
31
◻ OBST is one special kind of advanced tree.
◻ It focuses on how to reduce the searching cost of the BST.
◻ It requires calculations to record probabilities, weights, costs, and roots.
root.
◻ Example – Dictionary in the form of BST
🞑 Aim : Make this BST efficient by arranging frequently used words
nearer to the root and less frequently used words away from the root.
OR
🞑 Element having more probability of appearance should be placed
nearer to the root and the Element having lesser probability should be
placed away from the root.
Optimal Binary Search Tree
32
◻ n identifiers: a1 < a2 < a3 < … < an
◻ Pi, 1 ≤ i ≤ n : the probability that ai is searched (successful search)
◻ Qi, 0 ≤ i ≤ n : the probability that an x with ai < x < ai+1 is searched (unsuccessful search), where a0 = -∞ and an+1 = +∞
Optimal Binary Search Tree
33
Optimal Binary Search Tree
Dynamic Programming Approach
34

Notations:
◻ Tij = OBST (ai+1, …., aj)
◻ Cij = Cost of Tij
◻ Wij = Weight of each of Tij
◻ rij = Root values of Tij
Formulae:
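The standard recurrences behind these tables (consistent with the worked example on the next slides) are:

W(i, i) = q_i,   C(i, i) = 0,   r(i, i) = 0
W(i, j) = W(i, j-1) + p_j + q_j
C(i, j) = min over i < k ≤ j of { C(i, k-1) + C(k, j) } + W(i, j)
r(i, j) = the k that attains this minimum
The cost of the optimal tree on all n keys is C(0, n).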
Optimal Binary Search Tree
Dynamic Programming Approach
35

Example:
◻ Consider n = 4 and keys = (do, if, int, while); p(1:4) = (3,3,1,1)
and q(0:4) = (2,3,1,1,1). Construct the OBST.
Formulae:
Optimal Binary Search Tree
Dynamic Programming Approach
36

▪ Consider n = 4 and keys = (do, if, int, while);


▪ p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1). Construct the OBST.

Optimal Binary Search Tree
Dynamic Programming Approach
37

▪ Consider n = 4 and keys = (do, if, int, while);


▪ p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).

j - i = 0:
  W00 = q0 = 2;  W11 = q1 = 3;  W22 = q2 = 1;  W33 = q3 = 1;  W44 = q4 = 1
j - i = 1:
  W01 = q0 + q1 + p1 = 2 + 3 + 3 = 8
  W12 = q1 + q2 + p2 = 3 + 1 + 3 = 7
  W23 = q2 + q3 + p3 = 1 + 1 + 1 = 3
  W34 = q3 + q4 + p4 = 1 + 1 + 1 = 3
j - i = 2:
  W02 = W01 + p2 + q2 = 8 + 3 + 1 = 12
  W13 = W12 + p3 + q3 = 7 + 1 + 1 = 9
  W24 = W23 + p4 + q4 = 3 + 1 + 1 = 5
j - i = 3:
  W03 = W02 + p3 + q3 = 12 + 1 + 1 = 14
  W14 = W13 + p4 + q4 = 9 + 1 + 1 = 11
j - i = 4:
  W04 = W03 + p4 + q4 = 14 + 1 + 1 = 16
Optimal Binary Search Tree
Dynamic Programming Approach
39

▪ Consider n = 4 and keys = (do, if, int, while);


▪ p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1). Construct the OBST.

[Table: the calculated Wij values]
Optimal Binary Search Tree
Compute C and r
40

▪ Keys = (do, if, int, while); p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).

Optimal Binary Search Tree
Compute C and r
41

▪ Keys = (do, if, int, while); p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).

Cij, rij where j - i = 0:
C00 = C11 = C22 = C33 = C44 = 0
r00 = r11 = r22 = r33 = r44 = 0

Cij, rij where j - i = 1:
C01 = q0 + q1 + p1 = 2 + 3 + 3 = 8;  r01 = 1
C12 = q1 + q2 + p2 = 3 + 1 + 3 = 7;  r12 = 2
C23 = q2 + q3 + p3 = 1 + 1 + 1 = 3;  r23 = 3
C34 = q3 + q4 + p4 = 1 + 1 + 1 = 3;  r34 = 4

Still to compute (j - i ≥ 2): C02, C03, C04; C13, C14; C24
Optimal Binary Search Tree
Compute C and r
42

▪ Keys = (do, if, int, while); p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).

Already computed (j - i ≤ 1):
C00 = C11 = C22 = C33 = C44 = 0;  r00 = r11 = r22 = r33 = r44 = 0
C01 = 8, r01 = 1;  C12 = 7, r12 = 2;  C23 = 3, r23 = 3;  C34 = 3, r34 = 4

Cij where j - i ≥ 2: C02 (k ranges over 1 and 2, since i < k ≤ j)
i = 0, j = 2, k = 1:  C02 = C00 + C12 = 0 + 7 = 7
i = 0, j = 2, k = 2:  C02 = C01 + C22 = 8 + 0 = 8
C02 = min { 7, 8 } + W02 = 7 + 12 = 19
r02 = 1 (as we get the minimum value with k = 1)
Optimal Binary Search Tree
Compute C and r
43

▪ Keys = (do, if, int, while); p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).

Cij where j - i ≥ 2: C03 (k ranges over 1, 2 and 3)
i = 0, j = 3, k = 1:  C03 = C00 + C13 = 0 + 12 = 12
i = 0, j = 3, k = 2:  C03 = C01 + C23 = 8 + 3 = 11
i = 0, j = 3, k = 3:  C03 = C02 + C33 = 19 + 0 = 19
C03 = min { 12, 11, 19 } + W03 = 11 + 14 = 25
r03 = 2 (as we get the minimum value with k = 2)
Optimal Binary Search Tree
Dynamic Programming Approach
44

▪ OBST for keys = (do, if, int, while);

Root of tree T04 is r04 = 2 → the key with index 2 ("if").
In general, if rij = k:
  r(i, k-1) becomes the left child
  r(k, j) becomes the right child
L2
Optimal Binary Search Tree
Dynamic Programming Approach
46

▪ OBST for keys = (do, if, int, while);

Final OBST: "if" is the root; "do" is its left child, "int" is its right child, and "while" is the right child of "int".
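A sketch of the whole computation in Python (illustrative names; indices follow the convention above, with keys numbered 1..n):

def obst(p, q):
    """p[1..n]: success frequencies, q[0..n]: failure frequencies (p[0] is unused padding).
    Returns (w, c, r); c[0][n] is the optimal cost and r[i][j] the root of T(i, j)."""
    n = len(p) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]                       # empty tree T(i, i)
    for d in range(1, n + 1):                # d = j - i
        for i in range(n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            best_k, best = i + 1, None
            for k in range(i + 1, j + 1):    # try every key a_k as the root of T(i, j)
                cost = c[i][k - 1] + c[k][j]
                if best is None or cost < best:
                    best, best_k = cost, k
            c[i][j] = best + w[i][j]
            r[i][j] = best_k
    return w, c, r

# The worked example: keys (do, if, int, while)
w, c, r = obst(p=[0, 3, 3, 1, 1], q=[2, 3, 1, 1, 1])
print(c[0][4], r[0][4])                      # 32 2 -> minimum cost 32, root key #2 ("if")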
Balanced binary tree
◻ The disadvantage of a BST - its height can be as large as N-1
◻ The time needed to perform insertion and deletion and many
other operations can be O(N) in the worst case.
◻ We want a tree with small height.
◻ Goal is to keep the height of a binary search tree O(log N)
◻ Such trees are called balanced binary search trees.
◻ Examples
🞑 AVL tree
🞑 Red-black tree
AVL - First dynamic balanced trees

▪ AVL tree is a self-balancing BST, where the difference between the heights of the left and right subtrees cannot be more than one for all nodes.
▪ Named after their inventors, Adelson-Velskii and Landis.
▪ Good for lookup-intensive applications
▪ Searching of desired node is faster due to balancing of tree height.

Definition-
▪ An AVL tree is a balanced binary search tree.
▪ In an AVL tree, balance factor of every node is either -1, 0 or +1.
▪ Every subtree is an AVL tree.

Balance Factor = Height of Left Subtree – Height of Right Subtree
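A small sketch of how height and balance factor are usually maintained, assuming each node caches its height (illustrative Python; the rotation and insertion sketch follows the rotation slides below):

class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.height = 1                       # a leaf has height 1

def height(node):
    return node.height if node else 0

def balance_factor(node):
    # height(left subtree) - height(right subtree); must stay in {-1, 0, +1}
    return height(node.left) - height(node.right)

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))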
Height of BST
Insertion in AVL tree
50

STEPS-
◻ Insert the element in the AVL tree in the same way the insertion is
performed in BST.
◻ After insertion, check the balance factor of each node of the resulting
tree.
CASES
Case 1: Balance factor of each node IS in {-1, 0, 1}
  → The tree is considered to be balanced.
  → Conclude the operation.
  → Insert the next element, if any.
Case 2: Balance factor of some node is NOT in {-1, 0, 1}
  → The tree is considered to be imbalanced.
  → Perform the suitable ROTATION(s) to balance the tree.
  → After the tree is balanced, insert the next element, if any.
Searching and Deletion in AVL tree
51
STEPS-
◻ Searching is similar to searching in BST.
◻ Deletion is also similar to deletion in BST.
🞑 But after every deletion operation we need to check with the
Balance Factor condition.
🞑 If the tree is balanced after deletion go for next operation;
🞑 otherwise perform suitable rotation to make the tree Balanced.
AVL Rotations
◻ Rotation is the process of moving nodes either to left or to right to
make the tree balanced.
◻ Rotation types:

Single Rotations
• Left Rotation (LL Rotation)
• Right Rotation (RR Rotation)
Double Rotations

• Left Right Rotation (LR Rotation)


• Right Left Rotation (RL Rotation)
LL Rotation – Single Left Rotation
53
◻ In LL Rotation, every node moves one position to left from the current
position.
◻ Tree is unbalanced because of adding right child in the right subtree of an
unbalanced node.
RR Rotation – Single Right Rotation
54
◻ In RR Rotation, every node moves one position to right from the current
position.
◻ Tree is unbalanced because of adding left child in the left subtree of an
unbalanced node.
LR Rotation – Double Rotation
55
◻ The LR Rotation is sequence of single left rotation followed by single right
rotation.
◻ Tree is unbalanced because of adding right child in the left subtree of an
unbalanced node.
LR Rotation – Double Rotation
56
RL Rotation – Double Rotation
57
◻ The RL Rotation is sequence of single right rotation followed by single left
rotation.
◻ Tree is unbalanced because of adding left child in the right subtree of an
unbalanced node.
RL Rotation – Double Rotation
58
Rotations - Possibilities
59
[Diagram: a balanced tree with root 15, children 5 and 20, and 17, 25 as children of 20. Adding 16 below 17 makes the tree unbalanced.]
The pattern of adding 16 is R-L-L, but only 3 nodes are considered for the rotation, including the unbalanced node.
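A sketch of AVL insertion with the four rebalancing cases, building on the AVLNode helpers introduced earlier (illustrative Python; duplicate keys are not handled):

def rotate_right(y):                          # single right rotation
    x = y.left
    y.left, x.right = x.right, y
    update_height(y); update_height(x)
    return x

def rotate_left(x):                           # single left rotation
    y = x.right
    x.right, y.left = y.left, x
    update_height(x); update_height(y)
    return y

def avl_insert(node, key):
    # 1. ordinary BST insertion
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    else:
        node.right = avl_insert(node.right, key)
    # 2. update the height and check the balance factor
    update_height(node)
    bf = balance_factor(node)
    # 3. rebalance: only the 3 nodes nearest the unbalanced node take part
    if bf > 1 and key < node.left.key:        # inserted in left subtree of left child
        return rotate_right(node)             # single right rotation
    if bf < -1 and key > node.right.key:      # inserted in right subtree of right child
        return rotate_left(node)              # single left rotation
    if bf > 1 and key > node.left.key:        # inserted in right subtree of left child
        node.left = rotate_left(node.left)    # double rotation: left then right
        return rotate_right(node)
    if bf < -1 and key < node.right.key:      # inserted in left subtree of right child
        node.right = rotate_right(node.right) # double rotation: right then left
        return rotate_left(node)
    return node

# Example: root = None; for k in (50, 20, 60, 10, 8): root = avl_insert(root, k)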
60

◻ AVL Animation

https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
AVL Examples
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48

Insert 50 Insert 20 Insert 60 Insert 10


AVL Examples
62

Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48

Insert 8
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48
63

Insert 15
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48
64

Insert 46 and 11

Insert 32
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48
65

Insert 48
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48
66
Splay Trees
◻ A splay tree is a self-balancing binary search tree with the additional
property that recently accessed elements are quick to access again.
◻ It performs basic operations such as insertion, look-up and removal in O(log n) amortized time.
◻ Splaying: making the most recently accessed node the ROOT node.
❑ All operations on a splay tree are done with a common operation, SPLAYING.
❑ Unlike AVL trees, splay trees are not strictly balanced.
❑ Easy to implement.
❑ The most widely known application is caches.
❑ The splaying operation is done using some predefined rotation methods.
Splay Trees
Splaying
◻ Arranging the tree elements such that the most recently accessed node is placed as the root of the tree.
◻ i.e. the node on which any type of operation is performed is then moved up to make it the root node.
◻ This makes it easier to access the next time that particular node is needed.

19
20
Splay 19 20
19 21
21
Types of rotations:
1) Zig rotation – a single right rotation.
2) Zag rotation – a single left rotation.
3) Zig-Zig rotation – a single zig rotation followed by another single zig rotation.
4) Zag-Zag rotation – a single zag rotation followed by another zag rotation.
5) Zig-Zag rotation – a single zig followed by a zag rotation.
6) Zag-Zig rotation – a single zag followed by another zig rotation.
Zig Rotation

• A zig rotation is a single right rotation.
Zag Rotation

• A zag rotation is a single left rotation.
Zig-Zig Rotation
• A zig-zig rotation is a double right rotation.
Zag-Zag Rotation
• A zag-zag rotation is a double left rotation.
Zig-Zag Rotation
• A zig-zag rotation is a right rotation followed by a left rotation.

OR Tricky way
Zag-Zig Rotation
• A zag-zig rotation is a left rotation followed by a right rotation.

OR Tricky way
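A compact sketch of the splay step using these rotations, in a common recursive formulation (illustrative Python; the insert, delete and search operations described on the following slides all call splay first):

class SplayNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def zig(node):                                # single right rotation
    l = node.left
    node.left, l.right = l.right, node
    return l

def zag(node):                                # single left rotation
    r = node.right
    node.right, r.left = r.left, node
    return r

def splay(root, key):
    """Bring the node holding `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:               # zig-zig
            root.left.left = splay(root.left.left, key)
            root = zig(root)
        elif key > root.left.key:             # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right:
                root.left = zag(root.left)
        return zig(root) if root.left else root
    else:
        if root.right is None:
            return root
        if key > root.right.key:              # zag-zag
            root.right.right = splay(root.right.right, key)
            root = zag(root)
        elif key < root.right.key:            # zag-zig
            root.right.left = splay(root.right.left, key)
            if root.right.left:
                root.right = zig(root.right)
        return zag(root) if root.right else root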
AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL
tree

14

11 17

7 53

RR Rotation
AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL
tree

14

7 17

4 11 53

13
AVL Tree Example:
• Now insert 12

14

7 17

4 11 53

13

12

RL Rotation
AVL Tree Example:
• Now insert 12

14

7 17

4 11 53

12

13
AVL Tree Example:
• Now the AVL tree is balanced.

14

7 17

4 12 53

11 13
AVL Tree Example:
• Now insert 8

14

7 17

4 12 53

11 13

8 RL Rotation
AVL Tree Example:
• Now insert 8

14

7 17

4 11 53

8 12

13
AVL Tree Example:
• Now the AVL tree is balanced.

14

11 17

7 12 53

4 8 13
AVL Tree Example:
• Now remove 53

14

11 17

7 12 53

4 8 13
AVL Tree Example:
• Now remove 53, unbalanced

14

11 17

7 12

4 8 13
AVL Tree Example:
• Balanced! Remove 11

11

7 14

4 8 12 17

13
AVL Tree Example:
• Remove 11, replace it with the largest in its left branch

7 14

4 12 17

13
AVL Tree Example:
• Remove 8, unbalanced

4 14

12 17

13
AVL Tree Example:
• Remove 8, unbalanced

4 12

14

13 17
AVL Tree Example:
• Balanced!!

12

7 14

4 13 17
L4
In Class Exercises
◻ Build an AVL tree with the following values:
15, 20, 24, 10, 13, 7, 30, 36, 25
15, 20, 24, 10, 13, 7, 30, 36, 25
20

15
15 24
20
10
24

13

20 20

13 24 15 24

10 15 13

10
15, 20, 24, 10, 13, 7, 30, 36, 25

20
13

13 24 10 20

10 15 7 15 24

7 30

13 36

10 20

7 15 30

24 36
15, 20, 24, 10, 13, 7, 30, 36, 25

13 13

10 20 10 20

7 15 30 7 15 24

24 36 30

25 13 25 36

10 24

7 20 30

15 25 36
Remove 24 and 20 from the AVL tree.

13 13

10 24 10 20

7 20 30 7 15 30

15 25 36 25 36

13 13

10 30 10 15

7 15 36 7 30

25 25 36
AVL Examples
97

Construct AVL Tree for the following sequence –

40, 20, 10, 25, 30, 22, 50, 41, 42


Indexing and Multiway trees
98

Indexing Techniques
• Cylinder-Surface Indexing
• Hashed Indexing
• Tree Indexing
Search trees
• Binary Search Trees
• Height balanced trees (AVL)
• Multiway search trees (Btree, B+ Tree)
• Red Black Trees
• Splay Trees
• Trie Trees
• AA Trees
Indexing
99

◻ It is used to speed up retrieval of a node/record.


◻ Helps to recognize a particular record uniquely.
◻ The index is a collection of pairs of the form (key value, address)
◻ Advantages:
🞑 Useful in managing large data.
🞑 Any record can be easily searched.
🞑 Indexing helps to reduce unwanted memory access.
🞑 There is proper memory allocation due to indexing
◻ Techniques:
🞑 Cylinder-surface indexing – not suitable for multiple key files
🞑 Hashed Indexing – Open addressing, chaining, rehashing
🞑 Tree Indexing – AVL tree
Cylinder surface Indexing –
Simplest type of index organization on sequential files
100

◻ In sequential files, the physical


sequence of records is ordered by
the key, called the primary key.
◻ The sequentially ordered file can be
stored on a tape or a disk.
◻ Disk memory has many surfaces,
each surface having tracks.
◻ A cylinder j consists of track j on all
the surfaces.
◻ First, all tracks on cylinder 1 are
accessed, then cylinder 2, and so
on.
◻ The read/write heads are moved
one cylinder at a time.
Cylinder surface Indexing –
Example
101

◻ Let there be two surfaces and two records stored per track.
◻ The file is organized sequentially on the field ‘Emp. name’.

The surface index for cylinder 1 is


the surface highest key value:
Anand, Amol.
The surface index for cylinder 2 is
the surface highest key value:
Santosh, Shila.
Example – search for the record of Rohit?
As per the cylinder index, the record is either on cylinder 2 or not in the file.
As per the highest keys of cylinder 2, the record is either on surface 1 or not in the file.
Multiway search trees
◻ A binary search tree has one value in each node and two subtrees.
◻ M-way search tree has (M-1) values per node and M subtrees.
◻ M is called the degree of the tree.
◻ A multiway search tree of order m is an ordered tree where each node has at most m children.
◻ If a node of an M-way tree holds K keys, then it has K+1 children.
◻ RULES-
  🞑 The root node should have at least 2 children.
  🞑 All leaf nodes are on the bottom level.
  🞑 Each node must contain at least ceil(m/2)-1 keys.
  🞑 All internal nodes except the root node have at least ceil(m/2) non-empty children.

102
Multiway search trees - Example
103
◻ 3-way search tree:
◻ Every node MAY NOT contain exactly (M-1) values and exactly M subtrees.
◻ In an M-way search tree, a node can have anywhere from 1 to (M-1) values.
Multi-way Searching

◻ Similar to binary searching
  🞑 If the search key s < k1, search the leftmost child.
  🞑 If s > kd-1, search the rightmost child.
◻ That's it in a binary tree; what about if d > 2?
  🞑 Find the two keys ki-1 and ki between which s falls, and search that child.
This leads to a reduction in the overall height of the tree.
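A sketch of the search step at one node of an m-way search tree (illustrative Python; a node holds a sorted key list and one more child pointer than keys):

from bisect import bisect_left

class MWayNode:
    def __init__(self, keys, children=None):
        self.keys = keys                       # sorted, at most m-1 keys
        self.children = children or []         # empty for a leaf, else len(keys) + 1 subtrees

def mway_search(node, s):
    while node is not None:
        i = bisect_left(node.keys, s)          # first key >= s
        if i < len(node.keys) and node.keys[i] == s:
            return node                        # found in this node
        if not node.children:
            return None                        # reached a leaf: not present
        node = node.children[i]                # descend between k(i-1) and k(i)
    return None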
Example

[Diagram: a 3-way search tree holding the keys 3, 8, 10, 25, 38, 45, 70, 90, 100; within each node, every key is separated by a pointer to a subtree.]
Insertion
◻ Insertion:
◻ Find the appropriate leaf. If there is only one or two
items, just add to leaf.
◻ If no room, move middle item to parent and split
remaining two items among two children.
Insertion
Insert 80:
[Diagram: 80 belongs in the rightmost leaf, which is already full – Overflow!
The leaf is split and its middle element is moved up to the parent, giving root (10, 45, 80) with the remaining keys divided between two leaves.]
L5
B-Trees (Balanced Trees)

◻ All data that has been stored in the tree has been in memory.
◻ If data gets too big for main memory, what do we do?
◻ If we keep a pointer to the tree in main memory, we could bring
in just the nodes that we need.
◻ For instance, to do an INSERT with a BST, if we need the left child,
we do a disk access and retrieve the left child.
◻ If the left child is NULL, then we can do the insert, and store the
child node on the disk.
◻ Not too good for a BST.
B-Trees

◻ The problem with BST: storing the data requires disk accesses,
which is expensive, compared to execution of machine
instructions.
◻ If we can reduce the number of disk accesses, then the
procedures run faster.
◻ The only way to reduce the number of disk accesses is to
increase the number of keys in a node.
◻ The BST allows only one key per node.
◻ Very good and often used for Search Engines!
🞑 (when collection size gets very big 🡪 the index does not fit in memory)
B-Trees
111
(generalization of a BST in that a node can have more than 2 children)
◻ B-tree is a specialized multiway tree used to store the records in a disk; as it
REDUCES the number of disk reads.
◻ Height of the tree is relatively small.
◻ B-tree is well suited for storage systems that read and write relatively large
blocks of data.
◻ It is commonly used in databases and file systems.

RULES (all the key values in a node must be in ascending order):
• The root node should have at least 2 children.
• All leaf nodes are on the same level.
• Each node must contain at least ceil(m/2)-1 keys and at most m-1 keys.
• All internal nodes except the root node have at least ceil(m/2) non-empty children.
Insertion operation in B-Trees
112
◻ In a B-Tree, new element must be added only at the leaf node.
◻ The new keyValue is always attached to the leaf node only.
◻ The insertion operation is performed as follows.
◻ STEPS –
🞑 Step 1 - Check whether tree is Empty.
🞑 Step 2 - If tree is Empty, then create a new node with new key value and insert it
into the tree as a root node.
🞑 Step 3 - If tree is NOT Empty, then find the suitable leaf node to which the new
key value is added using Binary Search Tree logic.
🞑 Step 4 - If that leaf node has empty position, add the new key value to that leaf
node in ascending order of key value within the node.
🞑 Step 5 - If that leaf node is already full, split that leaf node by sending middle
value to its parent node. Repeat the same until the sending value is fixed into a
node.
🞑 Step 6 - If the splitting is performed at root node then the middle value
becomes new root node for the tree and the height of the tree is increased by
one.
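A sketch of this insert-and-split logic in Python, in the standard formulation based on a minimum degree t (each node then holds at most 2t - 1 keys, so t = 2 corresponds to the order-4 example that follows). Names are illustrative; full nodes are split on the way down, so intermediate shapes may differ slightly from a split-after-overflow trace:

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):
        self.t = t                                 # minimum degree: at most 2t - 1 keys per node
        self.root = BTreeNode(leaf=True)

    def split_child(self, parent, i):
        t = self.t
        child = parent.children[i]                 # full node with 2t - 1 keys
        right = BTreeNode(leaf=child.leaf)
        parent.keys.insert(i, child.keys[t - 1])   # middle key moves up to the parent
        parent.children.insert(i + 1, right)
        right.keys = child.keys[t:]
        child.keys = child.keys[:t - 1]
        if not child.leaf:
            right.children = child.children[t:]
            child.children = child.children[:t]

    def insert(self, key):
        root = self.root
        if len(root.keys) == 2 * self.t - 1:       # root is full: tree grows one level
            new_root = BTreeNode(leaf=False)
            new_root.children.append(root)
            self.root = new_root
            self.split_child(new_root, 0)
            self._insert_nonfull(new_root, key)
        else:
            self._insert_nonfull(root, key)

    def _insert_nonfull(self, node, key):
        i = len(node.keys) - 1
        if node.leaf:
            node.keys.append(None)                 # make room, then shift larger keys right
            while i >= 0 and key < node.keys[i]:
                node.keys[i + 1] = node.keys[i]
                i -= 1
            node.keys[i + 1] = key                 # keys stay in ascending order
        else:
            while i >= 0 and key < node.keys[i]:
                i -= 1
            i += 1
            if len(node.children[i].keys) == 2 * self.t - 1:
                self.split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            self._insert_nonfull(node.children[i], key)

For example, BTree(t=2) behaves like the order-4 tree used below: tree = BTree(t=2); for k in (5, 3, 21, 9, 1, 13, 2, 7, 10, 12, 4, 8): tree.insert(k).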
Example B-Trees
113

◻ Construct a B-tree of order 3 for the numbers 1 to 10 (order 3 means at most 2 keys and 3 children per node).
◻ Numbers – { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }
Create a B-Tree of order 4.
Insert 5, 3, 21, 9, 1, 13, 2, 7, 10, 12, 4, 8

*5*

*3*5* Insert
9

* 3 * 5 * 21
*
Treat * as pointers
Insert 1, 13

*9* a
b c
*1*3*5 * 13 * 21 *
*
Nodes b and c have room to insert more
elements
Insert
2

*3*9* a
b d c
*1*2* *5* * 13 * 21 *

Node b has no more room, so it splits creating node d.


Insert 7,
10

*3*9* a
b d c
*1*2* *5*7* * 10 * 13 * 21 *

Nodes d and c have room to add more elements


Insert
12

* 3 * 9 * 13 * a

b d c e
*1*2* *5*7* * 10 * 12 * * 21 *

Nodes c must split into nodes c and e


* 3 * 9 * 13 * a

b d c e
*1*2* *5*7* * 10 * 12 * 21 *
*

Insert 4

a
* 3 * 9 * 13 *

b d c e
*1*2* *4*5*7* * 10 * 12 * 21 *
*

Node d has room for another element


Insert
8

a
*9*

f g
*3*7* * 13 *

b d h c e
*1*2* *4*5* *8* * 10 * 12 * * 21 *

Node d must split into 2 nodes. This causes node a to split into 2 nodes and
the tree grows a level.
a
*9*

f g
*3*7* * 13 *

b d h c e
*1*2* *4*5* *8* * 10 * 12 * * 21 *

RULES (all the key values in a node must be in ascending order):
• The root node should have at least 2 children.
• All leaf nodes are on the same level.
• All internal nodes except the root node have at least ceil(m/2) non-empty children.
• Each node must contain at least ceil(m/2)-1 keys and at most m-1 keys.
• For an order-m tree, a non-leaf node with n children must have (n-1) keys.
L6
Delete 2, 21,10,
3, 4

*9* a

f g
*3*7* * 13 *

b d h c e
*1* *4*5* *8* * 10 * 12 * * 21 *

Node b can lose an element without underflow.
Delete 21
Delete 21

*9* a

f g
*3*7* * 12 *

b d h c e
*1* *4*5* *8* * 10 * * 13 *

Deleting 21 causes node e to underflow, so elements are redistributed between nodes c, g, and e.
Delete 10: leaf node; underflow; pull 12 down so that all leaf nodes stay at the same level.
Delete 10
*3*7*9* a

b d h e
*1* *4*5* *8* * 12 * 13
*

Deleting 10 causes node c to underflow. This causes the parent, node g, to recombine with nodes f and a. This causes the tree to shrink one level.

Delete 3 (for an order-m tree, a non-leaf node with n children must have n-1 keys):
Pull the successor
Delete 3
*4*7*9* a

b d h e
*1* *5* *8* * 12 * 13
*

Because 3 is an internal key with nodes below it, deleting 3 requires keys to be redistributed between nodes a and d.

Delete 4 (for an order-m tree, a non-leaf node with n children must have n-1 keys): pull the successor, i.e. 5; but a leaf node cannot be empty, so merge 1 and 5.
Delete 4

*7*9* a

b h e
*1*5 *8* * 12 * 13
* *

Deleting 4 requires a redistribution of the keys in the subtrees of 4; however, nodes b and d do not have enough keys to redistribute without causing an underflow. Thus, nodes b and d must be combined.
Insertion & Deletion Operation of
B Tree
Construct B-tree of order 5 for the given input 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23,
12, 20, 26, 4, 16, 18, 24, 25, 19

Delete – 8, 20, 18, 5

RULES for the insertion operation (all the key values in a node must be in ascending order):
• The root node should have at least 2 children.
• All leaf nodes are on the same level.
• All internal nodes except the root node have at least ceil(m/2) non-empty children.
• Each node must contain at least ceil(m/2)-1 keys and at most m-1 keys.
• For an order-m tree, a non-leaf node with n children must have (n-1) keys.
3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20, 26, 4, 16, 18,
24, 25, 19

131
132
26 is inserted to the rightmost leaf node; overflow; split at 20 and move
20 as a parent node

4 is inserted to the leftmost leaf node; overflow; split at 4 and move 4 as a parent
node
Later, Insert 16, 18, 24, 25
134
From the same B-Tree, delete 8, 20, 18, 5.
Delete – 8, 20, 18, 5

136

Delete 20

137
Delete 18

138

Delete 5 (the sibling doesn't have an extra key)
Combine 5's node with 1 & 3
139
Bring parent down
Searching B-tree
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with first key value of
root node in the tree.
Step 3 - If both are matched, then display "Given node is
found!!!" and terminate the function
Step 4 - If both are not matched, then check whether search
element is smaller or larger than that key value.
Step 5 - If search element is smaller, then continue the
search process in left subtree.
Step 6 - If the search element is larger, then compare the search
element with the next key value in the same node and repeat
steps 3, 4, 5 and 6 until an exact match is found or until the
search element has been compared with the last key value in the leaf
node.
Step 7 - If the last key value in the leaf node is also not
matched then display "Element is not found" and terminate
the function.
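A sketch of this procedure over the BTreeNode structure from the insertion sketch above (illustrative Python):

def btree_search(node, key):
    """Return (node, index) holding the key, or None if the key is absent."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                                    # compare with the next key in the same node
    if i < len(node.keys) and node.keys[i] == key:
        return node, i                            # "Given node is found!!!"
    if node.leaf:
        return None                               # "Element is not found"
    return btree_search(node.children[i], key)    # continue in the appropriate subtree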
Searching B-tree
L7
Btree Variants

◻ B+ Tree
🞑 B-tree in which the data is stored only in the leaf nodes.
🞑 Data access is more efficient
◻ B* Tree
🞑 B-tree in which each node except the root node is at least 2/3 full rather than half full.
B+Trees
◻ A balanced tree, where each node can have at most m key fields and m+1 pointer fields.
◻ In a B-tree
🞑 Both keys and data are stored in the internal and leaf nodes,
🞑 Scanning require a traversal of every level in the tree.
🞑 Require more cache misses
◻ In a B+ tree
🞑 Data is stored in the leaf nodes only.
🞑 The leaf nodes are linked, so doing a full scan of all objects in a tree
requires just one linear pass through all the leaf nodes.
🞑 Require fewer cache misses in order to access data on a leaf node.
🞑 They store redundant search keys.
◻ Because B+ trees don't have data associated with interior nodes, more keys can fit on a page of memory.
B+Trees
◻ Construct B+ tree for {1, 3, 5, 7, 9, 2, 4, 6, 8, 10} with nodes
storing 4 pointers and 3 keys.

Split; Copy & Link
◻ Construct B+ tree for {1, 3, 5, 7, 9, 2, 4, 6, 8, 10} with nodes storing 4 pointers and 3 keys.

Split; Copy & Link
◻ Construct B+ tree for {1, 3, 5, 7, 9, 2, 4, 6, 8, 10} with nodes storing 4 pointers and 3 keys.
B+Trees
◻ Construct B+ tree for {4, 9, 16, 25, 1, 20, 13, 15, 10,11, 12}
◻ Insert 4,9,16,25,1
◻ Construct B+ tree for {4, 9, 16, 25, 1, 20, 13, 15, 10,11, 12}
◻ Construct B+ tree for {4, 9, 16, 25, 1, 20, 13, 15, 10,11, 12}
B+Trees

Construct B+ tree for {70, 83, 81, 75, 67, 76, 72, 84, 86, 87, 77, 82}
Trie Trees
◻ Troogle – Google auto prediction
◻ Trie is an efficient information reTrieval data structure.
Trie Trees
◻ Used for storing strings over an
alphabet.
◻ Used to store large amount of
strings.
◻ Efficient for pattern matching.
◻ They are used in building dictionaries and spell-checking software.
◻ Keys are searched using common
prefixes.
◻ It is faster and can contain a large
number of short strings.
Trie Trees
◻ Every node of Trie consists of multiple branches.
◻ Each branch represents a possible character of keys.
◻ A Trie node field isWord is used to mark a node as the end-of-word node.
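A sketch of this node structure with insert and find operations (illustrative Python; the branches are kept in a dict rather than a fixed array, one entry per possible next character):

class TrieNode:
    def __init__(self):
        self.children = {}            # one branch per character
        self.is_word = False          # the isWord flag: marks the end of a stored word

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def trie_find(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False              # no branch for this character
    return node.is_word               # sharing a common prefix is not enough

# Build a tiny dictionary and query it
root = TrieNode()
for w in ("tree", "trie", "try"):
    trie_insert(root, w)
print(trie_find(root, "trie"), trie_find(root, "tr"))   # True False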
Trie Trees
Trie Trees
Trie Trees
Types of operations:
⮚ Insert node
⮚ Delete node
⮚ Find node
Insert Operation (example: Insert 7, then Splay 7, i.e. a zag-zag rotation)
• Check if the tree is empty.
• If the tree is empty, enter the new node as the root node.
• If the tree is not empty, insert the new node as a leaf node using the BST insert logic.
• After inserting, splay the new node.
Delete Operation (example: Delete 4 – Splay 4, i.e. a zag-zig rotation; then delete 4 and join the tree again)
• In a splay tree, the delete operation is similar to the delete operation in a BST.
• Before deleting the element we need to splay that node.
• Then delete it from the root location and join the entire tree again.
Search Operation
• Similar to the search operation in a BST.
• Locate the required node.
• Then splay the required node.
Example: Consider an empty splay tree and insert 0, 2, 4, 6, 8, 13, 11

Insert 0 Insert 2
2
0 0 Zag
Rotation

0
2
Insert 4 4
2
Zag 2
Rotation

0 4
0
Example: Consider an empty splay tree and insert 0, 2, 4, 6, 8, 13, 11
4
4
2
2 Insert 6 6 Zag
Rotation

0
0

Insert 8
Zag
Rotation
Example: Consider an empty splay tree and insert 0, 2, 4, 6, 8, 13, 11

Insert
13
Zag
Rotation

Insert
11 Zag-Zig
Rotation
Example: Consider an empty splay tree and insert 0, 2, 4, 6, 8, 13, 11

Zag-Zig
Rotation
Red Black Trees
◻ Red - Black Tree is another variant of self-balancing Binary Search Tree in
which every node is colored either RED or BLACK.
◻ Used in process scheduling in Linux kernels, databases, dictionaries, and web
searching.
◻ Properties of Red Black Tree
🞑 Tree Property: Red - Black Tree must be a Binary Search Tree.
🞑 Root Property: The ROOT node must be colored BLACK.
🞑 Internal Property: The children of Red node are BLACK.
🞑 External Property: Every leaf must be colored BLACK.
🞑 Path Property : In all the paths of the tree, there should be same number
of BLACK nodes.
🞑 Depth Property: Every leaf node has same BLACK depth.
🞑 Addition Property: Every new node must be inserted with RED color.
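A sketch of a checker for these properties (illustrative Python; None stands in for the black NIL leaves, and the BST ordering itself is assumed to be checked separately):

RED, BLACK = "RED", "BLACK"

class RBNode:
    def __init__(self, key, color=RED, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def black_height(node):
    """Return the black height of the subtree, raising if a red-black property is violated."""
    if node is None:
        return 1                                   # external NIL leaves count as BLACK
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node with a red child")
    lh, rh = black_height(node.left), black_height(node.right)
    if lh != rh:
        raise ValueError("paths with different numbers of black nodes")
    return lh + (1 if node.color == BLACK else 0)

def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False                               # root property
    try:
        black_height(root)
        return True
    except ValueError:
        return False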
Example Red Black Trees

[Diagram: an example red-black tree; every external position is a black NIL node, and every root-to-NIL path contains the same number of black nodes.]
L9
Red Black Trees -
Operations
◻ Insertion
◻ Deletion
◻ Searching

◻ Performance of all operations – O(log n)


🞑Because of balanced BST
172
Red Black Trees - Insertion
◻ Every new node must be inserted with color RED.
◻ The insertion operation in Red Black Tree is similar to insertion
operation in BST with color property.
◻ After every insertion operation,
🞑 Check all the properties of Red Black Tree.
🞑 If tree is balanced; go to next operation
🞑 Otherwise, perform following operations to make it Red Black
Tree.
■ Recolor
■ Rotation
Red Black Trees - Insertion
Steps for insertion operation-
1. Check whether tree is Empty.
a) If tree is Empty then insert the newNode as Root node with color
Black and stop.
b) If tree is not Empty then insert the newNode as leaf node with color
Red.
2. If the parent of newNode is Black then exit from the operation.
3. If the parent of newNode is Red then check the color of
parentnode's sibling (uncle) of newNode.
🞑 If uncle is colored Black or NULL then make suitable Rotation and
Recolor it.
🞑 If uncle is colored Red then perform Recolor.
◻ Repeat the same until tree becomes Red Black Tree.
Red Black Trees - Insertion
Parent of NewNode is RED (RED-RED conflict)
Uncle is RED

NewNode is x
Red Black Trees - Insertion
Parent of NewNode is RED; RED-RED node
Uncle is BLACK - Similar to AVL Rotations – LL, RR, LR, RL

RR rotation
NewNode is x
Red Black Trees - Insertion
Parent of NewNode is RED; RED-RED node
Uncle is BLACK - Similar to AVL Rotations – LL, RR, LR, RL and
RECOLOR

LR rotation
NewNode is x
Red Black Trees - Insertion
Parent of NewNode is RED; RED-RED node
Uncle is BLACK - Similar to AVL Rotations – LL, RR, LR, RL and
RECOLOR

LL rotation
NewNode is x
Red Black Trees - Insertion
Parent of NewNode is RED; RED-RED node
Uncle is BLACK - Similar to AVL Rotations – LL, RR, LR, RL and
RECOLOR

RL rotation
NewNode is x
Red Black Trees - Example
Create a RED-BLACK tree by inserting: 8, 18, 5, 15, 17, 25, 40, 80
K – DIMENSIONAL TREE
◻ kd-Trees
◻ Invented in the 1970s by Jon Bentley.
◻ The name originally meant "3d-trees, 4d-trees, etc.", where k was the number of dimensions.
◻ Now, people say "kd-tree of dimension d".
◻ Idea: each level of the tree compares against one dimension.
◻ This lets us have only two children at each node (instead of 2^d).
◻ Each level has a "cutting dimension".
◻ Cycle through the dimensions as you walk down the tree.
◻ Each node contains a point P = (x, y).
◻ To find (x', y') you only compare the coordinate from the cutting dimension.
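A sketch of 2-d insertion and exact search with the cutting dimension cycling by depth (illustrative Python; it generalizes to k dimensions via depth % k):

class KDNode:
    def __init__(self, point):
        self.point = point                   # e.g. (x, y)
        self.left = self.right = None

def kd_insert(node, point, depth=0, k=2):
    if node is None:
        return KDNode(point)
    cd = depth % k                           # cutting dimension at this level
    if point[cd] < node.point[cd]:
        node.left = kd_insert(node.left, point, depth + 1, k)
    else:
        node.right = kd_insert(node.right, point, depth + 1, k)
    return node

def kd_search(node, point, depth=0, k=2):
    if node is None:
        return False
    if node.point == point:
        return True
    cd = depth % k                           # compare only the cutting-dimension coordinate
    if point[cd] < node.point[cd]:
        return kd_search(node.left, point, depth + 1, k)
    return kd_search(node.right, point, depth + 1, k)

# The example from the slide:
root = None
for p in [(30, 40), (5, 25), (10, 12), (70, 70), (50, 30), (35, 45)]:
    root = kd_insert(root, p)
print(kd_search(root, (50, 30)))             # True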
◻ kd-tree example
insert: (30,40), (5,25), (10,12), (70,70), (50,30), (35,45)
◻ Insert (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
◻ Draw K-D tree
(5,8) (18,14) (12,14) (8,11) (10,2) (3,6) (11,20)
Heap Sort
• The drawbacks of the binary tree sort are remedied by the heapsort, an in-place sort
that requires only O(n log n) operations regardless of the order of the input.
• Define a descending heap (also called a max heap or a descending partially ordered tree)
of size n as an almost complete binary tree of n nodes such that the content of each
node is less than or equal to the content of its father.
• It is clear from this definition of a descending heap that the root of the tree (or the first element of the array) contains the largest element in the heap. Also note that any path from the root to a leaf (or indeed, any path in the tree that includes no more than one node at any level) is an ordered list in descending order. It is also possible to define an ascending heap (or a min heap) as an almost complete binary tree such that the content of each node is greater than or equal to the content of its father.
• In an ascending heap the root contains the smallest element of the heap, and any path from the root to a leaf is an ascending ordered list.
• A heap allows a very efficient implementation of a priority queue.
• Although priority queue insertion using a binary search tree could require as few as log2 n node accesses, it could require as many as n accesses if the tree is unbalanced. Thus a selection sort using a binary search tree could require O(n²) operations, although on average only O(n log n) are needed.
• A heap allows both insertion and deletion to be implemented in O(log n) operations. Thus a selection sort consisting of n insertions and n deletions can be implemented using a heap in O(n log n) operations, even in the worst case. An additional bonus is that the heap itself can be implemented within the input array x, using the sequential implementation of an almost complete binary tree. The only additional space required is for program variables. The heapsort is, therefore, an O(n log n) in-place sort.
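A short illustration of the heap-as-priority-queue idea using Python's heapq module (heapq implements an ascending/min heap inside an ordinary list; negating the keys gives max-heap behaviour):

import heapq

pq = []                                  # the heap lives inside an ordinary array
for key in [25, 57, 48, 37, 12, 92, 86, 33]:
    heapq.heappush(pq, key)              # O(log n) insertion
print(heapq.heappop(pq))                 # 12: the smallest key, removed in O(log n)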
Heap as a Priority Queue
Dpq=array, q=size of heap,
Sorting Using a Heap
creation of a heap of size 8 from the original file
25 57 48 37 12 92 86 33
The figure illustrates the adjustment of the heap as x[0] is repeatedly selected and placed into its proper position in the array, and the heap is readjusted, until all the heap elements are processed. Note that after an element has been "deleted" from the heap, it remains in the array; it is merely ignored in subsequent processing.
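A sketch of heapsort along these lines (illustrative Python): the descending (max) heap is built inside the input array x, and x[0] is then repeatedly swapped into its final position while the remaining prefix is readjusted.

def sift_down(x, root, end):
    """Re-establish the max-heap property for the region x[root..end-1]."""
    while True:
        child = 2 * root + 1                  # left son in the sequential representation
        if child >= end:
            return
        if child + 1 < end and x[child + 1] > x[child]:
            child += 1                        # pick the larger son
        if x[root] >= x[child]:
            return
        x[root], x[child] = x[child], x[root]
        root = child

def heapsort(x):
    n = len(x)
    for i in range(n // 2 - 1, -1, -1):       # create the initial heap
        sift_down(x, i, n)
    for end in range(n - 1, 0, -1):
        x[0], x[end] = x[end], x[0]           # "delete" the maximum: it stays in the array
        sift_down(x, 0, end)                  # readjust the remaining heap

data = [25, 57, 48, 37, 12, 92, 86, 33]       # the original file from the slide
heapsort(data)
print(data)                                   # [12, 25, 33, 37, 48, 57, 86, 92]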
