Unit -4 Search Trees
Unit - IV
SEARCH TREES, INDEXING AND MULTIWAY TREES
Prof. Neha V. Chaube
AY 2022-2023 SEM-II
L1
Unit IV - Syllabus
1. create() : symbol_table
2. destroy() : symbol_table
3. enter (name, value) : pointer to an entry
4. find (name, table) : pointer to an entry
5. set_attributes (*entry, attributes)
6. get_attributes (*entry) : attributes
Symbol Table – Implementation
techniques
◻ Basic Operations : enter() and find()
◻ Considerations:
🞑 Number of names
🞑 Storage space
🞑 Retrieval time
◻ Organizations:
🞑 Unordered List - (linked list/array)
🞑 Ordered List – Binary search on arrays
🞑 Binary Search Trees
🞑 Hash Tables – Most common approach (constant time)
Symbol table representation using List
• Array is used to store names and associated information
• A pointer “available” is maintained at end of all stored records and new
names are added in the order as they arrive
• To search for a name we start from the beginning of the list till available
pointer and if not found we get an error “use of the undeclared name”
• While inserting a new name we must ensure that it is not already present
otherwise an error occurs i.e. “Multiple defined names”
• Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
• The advantage is that it takes a minimum amount of space.
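The list organization described above can be sketched as follows. This is a minimal illustration, not a compiler's actual implementation: the class name `ListSymbolTable` and the choice of Python exceptions for the two error messages are assumptions made for the example.

```python
class ListSymbolTable:
    """Unordered-list symbol table: names are stored in arrival order."""

    def __init__(self):
        # Each entry is [name, attributes]; the end of the list plays the
        # role of the "available" pointer from the slide.
        self.entries = []

    def find(self, name):
        # Linear scan from the beginning up to "available": O(n).
        for entry in self.entries:
            if entry[0] == name:
                return entry
        raise KeyError(f"use of the undeclared name: {name}")

    def enter(self, name, attributes=None):
        # A name may be declared only once, so check for duplicates first.
        for entry in self.entries:
            if entry[0] == name:
                raise ValueError(f"multiply defined name: {name}")
        entry = [name, attributes]
        self.entries.append(entry)  # appending itself is O(1)
        return entry


table = ListSymbolTable()
table.enter("newval", "identifier")
table.enter("oldval", "identifier")
print(table.find("oldval"))
```

Note that the append is O(1), but the duplicate check before it is a linear scan, so a checked insertion costs O(n) as well.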
Symbol table representation using
Binary Search Tree
• All names are inserted as descendants of the root node, so that the tree
always satisfies the binary search tree property.
Use of symbol table
◻ A data structure used by a language translator such as a compiler or
interpreter.
◻ Where, each symbol in a program's source code is associated with
information relating to its –
🞑 Declaration
🞑 Scope or appearance in the source
◻ Store information about the occurrence of various entities such as
variable names, function names, objects, classes, interfaces, etc.
[Figure: Source Program -> COMPILER -> Target Program; the compiler also reports Errors]
Lexical Analyzer
newval identifier
= assignment operator
oldval identifier
+ add operator
12 a number
Syntax Analyzer
◻ Ex:
MULT id2,id3,temp1
ADD temp1,#1,id1
Code Generator
◻ Ex:
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
Symbol tables
First tree:
Node    # comparisons
A       3
B       2
C       1
Second tree:
Node    # comparisons
A, C    2
B       1
Optimal
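Notations:
◻ Tij = OBST (a(i+1), ..., a(j))
◻ Cij = Cost of Tij
◻ Wij = Weight of Tij
◻ rij = Root of Tij
Formulae:
◻ Wii = q(i) ; Cii = 0 ; rii = 0
◻ Wij = Wi,j-1 + p(j) + q(j)
◻ Cij = min over i < k <= j of { Ci,k-1 + Ck,j } + Wij
◻ rij = the value of k that gives the minimum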
Optimal Binary Search Tree
Dynamic Programming Approach
Example:
◻ Consider n = 4 and keys = (do, if, int, while); p(1:4) = (3,3,1,1)
and q(0:4) = (2,3,1,1,1). Construct the OBST.
Calculated Wij
Optimal Binary Search Tree
Compute C and r
▪ Keys = (do, if, int, while); p(1:4) = (3,3,1,1) & q(0:4) = (2,3,1,1,1).
C00 = C11 = C22 = C33 = C44 = 0
r00 = r11 = r22 = r33 = r44 = 0
Cij; where, j-i = 1
C12 = 7 ; r12 = 2
C34 = 3 ; r34 = 4
Remaining entries: C02, C03, C04 ; C13, C14 ; C24
For C02:
k = 1: C02 = C00 + C12 = 0 + 7 = 7
k = 2: C02 = C01 + C22 = 8 + 0 = 8
C02 = min { 7, 8 } + W02 = 7 + 12 = 19
r02 = 1 ( as we get min value with k = 1 )
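The table computation above can be checked with a short dynamic-programming sketch. It assumes the usual recurrences, W(i,j) = W(i,j-1) + p(j) + q(j) and C(i,j) = min over i < k <= j of C(i,k-1) + C(k,j), plus W(i,j), which agree with the C02 calculation shown. The function name `optimal_bst` is hypothetical, and ties are broken in favour of the smallest k.

```python
def optimal_bst(p, q):
    """p[1..n] are success weights (p[0] is unused); q[0..n] are failure weights."""
    n = len(q) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        w[i][i] = q[i]          # empty tree: only the failure weight

    for length in range(1, n + 1):          # j - i = 1, 2, ..., n
        for i in range(n - length + 1):
            j = i + length
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # Try every key k in (i, j] as the root; min() keeps the first minimum.
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][best_k - 1] + c[best_k][j] + w[i][j]
            r[i][j] = best_k
    return w, c, r


keys = ["do", "if", "int", "while"]
w, c, r = optimal_bst([0, 3, 3, 1, 1], [2, 3, 1, 1, 1])
print("C02 =", c[0][2], "r02 =", r[0][2])
print("C04 =", c[0][4], "root =", keys[r[0][4] - 1])
```

The root of the whole tree is read from r[0][n]; the left and right subtrees are then found recursively from r[0][k-1] and r[k][n].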
k is between 0 & 3
i=0; j = 3; k = 1 i=0; j = 3; k = 2 i=0; j = 3; k = 3
Resulting OBST:
      if
     /  \
   do    int
            \
            while
Balanced binary tree
◻ The disadvantage of a BST - its height can be as large as N-1
◻ The time needed to perform insertion and deletion and many
other operations can be O(N) in the worst case.
◻ We want a tree with small height.
◻ Goal is to keep the height of a binary search tree O(log N)
◻ Such trees are called balanced binary search trees.
◻ Examples
🞑 AVL tree
🞑 Red-black tree
AVL - First dynamic balanced trees
Definition-
▪ An AVL tree is a balanced binary search tree.
▪ In an AVL tree, balance factor of every node is either -1, 0 or +1.
▪ Every subtree is an AVL tree.
STEPS-
◻ Insert the element in the AVL tree in the same way the insertion is
performed in BST.
◻ After insertion, check the balance factor of each node of the resulting
tree.
CASES
Case 1: Balance factor of each node IS in {-1, 0, 1}.
        The tree is considered to be balanced.
Case 2: Balance factor of some node is NOT in {-1, 0, 1}.
        The tree is considered to be imbalanced.
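The steps and cases above can be sketched as a recursive insertion routine. This is an illustrative sketch, assuming the balance factor is defined as height(left) minus height(right); the names `Node`, `rotate_left`, `rotate_right` and `insert` are chosen for the example only.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def height(n):
    return n.height if n else 0


def balance(n):
    # Assumed convention: height(left) - height(right).
    return height(n.left) - height(n.right)


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, key):
    # Step 1: ordinary BST insertion.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                      # duplicates are ignored
    # Step 2: check the balance factor on the way back up.
    update(node)
    bf = balance(node)
    if bf > 1:                           # left-heavy
        if balance(node.left) < 0:       # left-right case: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1:                          # right-heavy
        if balance(node.right) > 0:      # right-left case: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for k in [14, 17, 11, 7, 53, 4, 13, 12, 8]:
    root = insert(root, k)
print(root.key, root.left.key, root.right.key)
```

Only the nodes on the path from the new leaf to the root can become imbalanced, which is why the check is done while the recursion unwinds.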
Single Rotations
• Left Rotation (LL Rotation)
• Right Rotation (RR Rotation)
Double Rotations
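[Figure: the balanced tree 15 (left child 5; right child 20 with children 17 and 25),
before and after adding 16, which becomes the left child of 17]
Add 16
The pattern of adding 16 is RLL, but consider only 3 nodes for rotation,
including the unbalanced node.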
◻ AVL Animation
https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
AVL Examples
Construct AVL Tree for the following sequence - 50, 20, 60, 10, 8,
15, 32, 46, 11, 48
Insert 8
Insert 15
Insert 32
Insert 46 and 11
Insert 48
[Figures: the tree after each of the insertion steps above]
Splay Trees
◻ A splay tree is a self-adjusting binary search tree with the additional
property that recently accessed elements are quick to access again.
◻ It performs basic operations such as insertion, look-up and removal in
O(log n) amortized time.
◻ Splaying :- Making the most recently accessed node the ‘ROOT NODE’.
❑ All operations on a splay tree are done with a common operation
‘SPLAYING’.
❑ Unlike AVL trees, ‘SPLAY TREES’ are not strictly balanced.
❑ Easy to implement.
❑ The most widely known application is caches.
❑ The operation of splaying is done using some predefined rotation
methods.
Splay Trees
Splaying
◻ Arranging the tree elements such that the most recently accessed node is
placed at the root of the tree.
◻ i.e. the node on which any type of action is done is then operated on to
make it the root node.
◻ This makes it easier to access next time when the particular node is needed.
[Figure: Splay 19 in a tree containing the keys 19, 20 and 21]
1) Zig rotation :- Single right rotation.
Zag-Zig
Rotation
• A zag-zig rotation is a left (zag) rotation followed by a right (zig)
rotation.
AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL
tree
14
11 17
7 53
RR Rotation
AVL Tree Example:
• Insert 14, 17, 11, 7, 53, 4, 13 into an empty AVL
tree
14
7 17
4 11 53
13
AVL Tree Example:
• Now insert 12
14
7 17
4 11 53
13
12
RL Rotation
AVL Tree Example:
• Now insert 12
14
7 17
4 11 53
12
13
AVL Tree Example:
• Now the AVL tree is balanced.
14
7 17
4 12 53
11 13
AVL Tree Example:
• Now insert 8
14
7 17
4 12 53
11 13
8
RL Rotation
AVL Tree Example:
• Now insert 8
14
7 17
4 11 53
8 12
13
AVL Tree Example:
• Now the AVL tree is balanced.
14
11 17
7 12 53
4 8 13
AVL Tree Example:
• Now remove 53
14
11 17
7 12 53
4 8 13
AVL Tree Example:
• Now remove 53, unbalanced
14
11 17
7 12
4 8 13
AVL Tree Example:
• Balanced! Remove 11
11
7 14
4 8 12 17
13
AVL Tree Example:
• Remove 11, replace it with the largest in its left branch
8
7 14
4 12 17
13
AVL Tree Example:
• Remove 8, unbalanced
7
4 14
12 17
13
AVL Tree Example:
• Remove 8, unbalanced (after the first rotation)
7
4 12
14
13 17
AVL Tree Example:
• Balanced!!
12
7 14
4 13 17
L4
In Class Exercises
◻ Build an AVL tree with the following values:
15, 20, 24, 10, 13, 7, 30, 36, 25
20
15
15 24
20
10
24
13
20 20
13 24 15 24
10 15 13
10
15, 20, 24, 10, 13, 7, 30, 36, 25
20
13
13 24 10 20
10 15 7 15 24
7 30
13 36
10 20
7 15 30
24 36
15, 20, 24, 10, 13, 7, 30, 36, 25
13 13
10 20 10 20
7 15 30 7 15 24
24 36 30
25 13 25 36
10 24
7 20 30
15 25 36
Remove 24 and 20 from the AVL tree.
13 13
10 24 10 20
7 20 30 7 15 30
15 25 36 25 36
13 13
10 30 10 15
7 15 36 7 30
25 25 36
Indexing Techniques
• Cylinder-Surface Indexing
• Hashed Indexing
• Tree Indexing
Search trees
• Binary Search Trees
• Height balanced trees (AVL)
• Multiway search trees (Btree, B+ Tree)
• Red Black Trees
• Splay Trees
• Trie Trees
• AA Trees
Indexing
◻ Let there be two surfaces and two records stored per track.
◻ The file is organized sequentially on the field ‘Emp. name’.
Multiway search trees -
Example
◻ 3-way search tree:
◻ Every node MAY NOT contain exactly (M-1) values and exactly M subtrees.
◻ In an M-way search tree a node can have anywhere from 1 to (M-1) values.
Multi-way Searching
[Figure: each node holds keys and, between them, pointers to subtrees]
Root:     10 45
Children: 3 8 | 25 38 | 70 90 100
Insertion
◻ Insertion:
◻ Find the appropriate leaf. If there is only one or two
items, just add to leaf.
◻ If no room, move middle item to parent and split
remaining two items among two children.
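Insertion
Insert 80
Before:   root 10 45 ; children 3 8 | 25 38 | 70 90 100
80 goes to the leaf 70 90 100, giving 70 80 90 100: Overflow!
Split & move middle element (80) to parent
After:    root 10 45 80 ; children 3 8 | 25 38 | 70 | 90 100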
L5
B-Trees (Balanced Trees)
◻ So far, all data that has been stored in the tree has been in main memory.
◻ If data gets too big for main memory, what do we do?
◻ If we keep a pointer to the tree in main memory, we could bring
in just the nodes that we need.
◻ For instance, to do an INSERT with a BST, if we need the left child,
we do a disk access and retrieve the left child.
◻ If the left child is NULL, then we can do the insert, and store the
child node on the disk.
◻ Not too good for a BST.
B-Trees
◻ The problem with BST: storing the data requires disk accesses,
which is expensive, compared to execution of machine
instructions.
◻ If we can reduce the number of disk accesses, then the
procedures run faster.
◻ The only way to reduce the number of disk accesses is to
increase the number of keys in a node.
◻ The BST allows only one key per node.
◻ Very good and often used for Search Engines!
🞑 (when collection size gets very big 🡪 the index does not fit in memory)
B-Trees
(generalization of a BST in that a node can have more than 2
children)
◻ B-tree is a specialized multiway tree used to store the records in a disk; as it
REDUCES the number of disk reads.
◻ Height of the tree is relatively small.
◻ B-tree is well suited for storage systems that read and write relatively large
blocks of data.
◻ It is commonly used in databases and file systems.
B-Tree example (treat * as pointers)
Start:
  *5*
  *3*5*
  *3*5*21*
Insert 9, then insert 1, 13:
  a: *9*
  b: *1*3*5*   c: *13*21*
  Nodes b and c have room to insert more elements
Insert 2:
  a: *3*9*
  b: *1*2*   d: *5*   c: *13*21*
Next state shown (7 and 10 added):
  a: *3*9*
  b: *1*2*   d: *5*7*   c: *10*13*21*
Next state shown (12 added; 13 moves up):
  a: *3*9*13*
  b: *1*2*   d: *5*7*   c: *10*12*   e: *21*
Insert 4:
  a: *3*9*13*
  b: *1*2*   d: *4*5*7*   c: *10*12*   e: *21*
Next state shown (8 added):
  a: *9*
  f: *3*7*   g: *13*
  b: *1*2*   d: *4*5*   h: *8*   c: *10*12*   e: *21*
  Node d must split into 2 nodes. This causes node a to split into 2 nodes and
  the tree grows a level.
Next state shown (2 removed):
  a: *9*
  f: *3*7*   g: *13*
  b: *1*   d: *4*5*   h: *8*   c: *10*12*   e: *21*
Next state shown (21 removed):
  a: *9*
  f: *3*7*   g: *12*
  b: *1*   d: *4*5*   h: *8*   c: *10*   e: *13*
Next state shown (10 removed; upper levels not shown):
  b: *1*   d: *4*5*   h: *8*   e: *12*13*
Next state shown (4 removed; upper levels not shown):
  b: *1*   d: *5*   h: *8*   e: *12*13*
Next state shown (3 removed):
  a: *7*9*
  b: *1*5*   h: *8*   e: *12*13*
RULES for insertion operation - (All the key values in a node must be
in ascending order)
• The root node should have at least 2 children (unless it is a leaf).
• All leaf nodes are on the same level.
• All the internal nodes except the root node have at least ceil(m/2) non-empty children.
• Each node contains at most m-1 keys, and each node except the root contains at least ceil(m/2)-1 keys.
• For an order-m tree, a non-leaf node with n children must have (n-1) keys.
3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20, 26, 4, 16, 18,
24, 25, 19
26 is inserted to the rightmost leaf node; overflow; split at 20 and move
20 as a parent node
4 is inserted to the leftmost leaf node; overflow; split at 4 and move 4 as a parent
node
Later, Insert 16, 18, 24, 25
From the same B-Tree; Delete – 8, 20, 18, 5
Delete 20
Delete 18
Delete 5 (sibling doesn’t have extra key)
Combining 5 with 1 & 3
Bring parent down
Searching B-tree
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the first key value of
the root node in the tree.
Step 3 - If both match, display "Given node is found!!!" and
terminate the function.
Step 4 - If they do not match, check whether the search
element is smaller or larger than that key value.
Step 5 - If the search element is smaller, continue the
search process in the left subtree of that key.
Step 6 - If the search element is larger, compare it with the
next key value in the same node and repeat steps 3, 4, 5 and 6
until an exact match is found or the search element has been
compared with the last key value in a leaf node. If it is larger
than the last key of an internal node, continue in that node's
rightmost subtree.
Step 7 - If the last key value in the leaf node also does not
match, display "Element is not found" and terminate
the function.
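The search steps can be sketched as a loop that scans the keys of a node from left to right and then descends into the matching child. The `BTreeNode` class and the hand-built tree are assumptions made for the illustration; the tree mirrors the three-level example with root *9* shown earlier.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # keys in ascending order
        self.children = children or []    # empty list for a leaf


def btree_search(node, target):
    while node is not None:
        # Steps 2-6: move right past every key smaller than the target.
        i = 0
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        # Step 3: exact match in this node.
        if i < len(node.keys) and node.keys[i] == target:
            return True
        # Step 7: reached a leaf without a match.
        if not node.children:
            return False
        # Step 5: child i lies to the left of keys[i]
        # (or is the rightmost child when i == len(keys)).
        node = node.children[i]
    return False


f = BTreeNode([3, 7], [BTreeNode([1, 2]), BTreeNode([4, 5]), BTreeNode([8])])
g = BTreeNode([13], [BTreeNode([10, 12]), BTreeNode([21])])
root = BTreeNode([9], [f, g])

print(btree_search(root, 8), btree_search(root, 6))
```

Each iteration of the outer loop corresponds to one node read, so the number of disk accesses is bounded by the height of the tree.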
L7
Btree Variants
◻ B+ Tree
🞑 B-tree in which the data is stored only in the leaf nodes.
🞑 Data access is more efficient
◻ B* Tree
🞑 B-tree in which each node except the root node is at least 2/3 full
rather than half full.
B+Trees
◻ A balanced tree, where each node can have at most m key fields and
m+1 pointer fields.
◻ In a B-tree
🞑 Both keys and data are stored in the internal and leaf nodes.
🞑 Scanning requires a traversal of every level in the tree.
🞑 Incurs more cache misses.
◻ In a B+ tree
🞑 Data is stored in the leaf nodes only.
🞑 The leaf nodes are linked, so doing a full scan of all objects in a tree
requires just one linear pass through all the leaf nodes.
🞑 Incurs fewer cache misses when accessing data on a leaf node.
🞑 Search keys are stored redundantly (keys in internal nodes are repeated in the leaves).
◻ Because B+ trees don't have data associated with interior nodes, more keys can
fit on a page of memory.
B+Trees
◻ Construct B+ tree for {1, 3, 5, 7, 9, 2, 4, 6, 8, 10} with nodes
storing 4 pointers and 3 keys.
Construct B+ tree for {70, 83, 81, 75, 67, 76, 72, 84, 86, 87, 77, 82}
Trie Trees
◻ Troogle – Google auto prediction
◻ Trie is an efficient information reTrieval data structure.
Trie Trees
◻ Used for storing strings over an
alphabet.
◻ Used to store large amount of
strings.
◻ Efficient for pattern matching.
◻ They are used in building
dictionaries and spell-checking
software.
◻ Keys are searched using common
prefixes.
◻ It is faster and can contain a large
number of short strings.
Trie Trees
◻ Every node of Trie consists of multiple branches.
◻ Each branch represents a possible character of keys.
◻ A Trie node field isWord is used to distinguish the node as end of word
node.
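The node structure described above can be sketched as follows. It is a minimal illustration that uses a dictionary for the branches (an assumption; a fixed-size array indexed by character is equally common) and an `is_word` flag in the role of the isWord field.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # one branch per possible next character
        self.is_word = False   # marks the end-of-word node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Follow the branch for ch, creating it if it does not exist yet.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        # Return the node reached by following prefix, or None.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


trie = Trie()
for w in ["tree", "trie", "try"]:
    trie.insert(w)
print(trie.search("trie"), trie.search("tri"), trie.starts_with("tr"))
```

Words that share a prefix share the same path from the root, which is what makes prefix queries such as auto-completion cheap.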
Splay Trees - Operations
Insert Operation
• Check if the tree is empty.
• If the tree is empty, enter the new node as the root node.
• If the tree is not empty, insert the new node as a leaf node by the BST
insert logic.
• After inserting, splay the new node.
[Figure: insert node, then Splay 7, i.e. zag-zag rotation]
Delete Operation
• In a splay tree, the delete operation is similar to the delete operation in a BST.
• Before deleting the element we need to splay that node.
• Then delete it from the root location and then join the entire tree again.
[Figure: Delete 4: Splay 4, i.e. zag-zig rotation; then delete 4 and join the tree again]
Search Operation
• Similar to the search operation in BST.
• Locate the node required.
• Then splay the required node.
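The three operations can be sketched around a single `splay` routine. This is a compact recursive formulation chosen for brevity, not necessarily the one behind the figures; the function names are assumptions, and the slide's convention of zig = right rotation and zag = left rotation is followed in the comments.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def rotate_right(x):                      # zig
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x):                       # zag
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root, key):
    """Bring key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:           # two right rotations
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:         # left rotation, then right rotation
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:              # two left rotations
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:            # right rotation, then left rotation
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root, key):
    if root is None:
        return Node(key)
    node = root
    while key != node.key:                # plain BST insertion as a leaf
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
            node = node.right
    return splay(root, key)               # then splay the new node


def delete(root, key):
    root = splay(root, key)               # splay first
    if root is None or root.key != key:
        return root
    if root.left is None:
        return root.right
    # key is larger than everything on the left, so this brings the
    # maximum of the left subtree to its root; then join the right subtree.
    new_root = splay(root.left, key)
    new_root.right = root.right
    return new_root


def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []


root = None
for k in [0, 2, 4, 6, 8, 13, 11]:
    root = insert(root, k)
print(root.key, inorder(root))
```

A search is simply `root = splay(root, key)` followed by a check of `root.key == key`.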
Example
Consider an empty splay tree and insert 0, 2, 4, 6, 8, 13, 11
Insert 0:  0 becomes the root
Insert 2:  Zag Rotation (2 becomes the root, with 0 as its left child)
Insert 4:  Zag Rotation (4 becomes the root, with 2 as its left child)
Insert 6:  Zag Rotation
Insert 8:  Zag Rotation
Insert 13: Zag Rotation
Insert 11: Zag-Zig Rotation
[Figures: the tree before and after each rotation]
Red Black Trees
◻ Red - Black Tree is another variant of self-balancing Binary Search Tree in
which every node is colored either RED or BLACK.
◻ Used in process scheduling in Linux kernels, databases, dictionaries, and web
searching.
◻ Properties of Red Black Tree
🞑 Tree Property: Red - Black Tree must be a Binary Search Tree.
🞑 Root Property: The ROOT node must be colored BLACK.
🞑 Internal Property: The children of Red node are BLACK.
🞑 External Property: Every leaf must be colored BLACK.
🞑 Path Property : In all the paths of the tree, there should be same number
of BLACK nodes.
🞑 Depth Property: Every leaf node has same BLACK depth.
🞑 Addition Property: Every new node must be inserted with RED color.
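The properties listed above can be checked mechanically. The sketch below is a validator only, not an insertion routine; `RBNode`, `black_height` and `is_red_black` are hypothetical names, and a missing child (`None`) is treated as a BLACK NIL leaf.

```python
RED, BLACK = "R", "B"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def black_height(node, lo=float("-inf"), hi=float("inf")):
    """Return the black height of the subtree, or -1 if a property is violated."""
    if node is None:
        return 1                                  # External Property: NIL is BLACK
    if not (lo < node.key < hi):
        return -1                                 # Tree Property: must be a BST
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1                         # Internal Property: no RED-RED
    left = black_height(node.left, lo, node.key)
    right = black_height(node.right, node.key, hi)
    if left == -1 or right == -1 or left != right:
        return -1                                 # Path / Depth Property
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is None:
        return True
    return root.color == BLACK and black_height(root) != -1   # Root Property


good = RBNode(10, BLACK, RBNode(5, RED), RBNode(15, RED))
bad = RBNode(10, BLACK, RBNode(5, RED, RBNode(3, RED)))
print(is_red_black(good), is_red_black(bad))
```

Such a checker is handy for the "Recognize Red Black Trees" style of exercise: build the candidate tree by hand and see which property fails.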
Example Red Black Trees
[Figure: example red-black trees drawn with their NIL leaves]
Recognize Red Black Trees
L9
Red Black Trees -
Operations
◻ Insertion
◻ Deletion
◻ Searching
Red Black Trees - Insertion
NewNode is x
Parent of NewNode is RED; RED-RED node
Uncle is BLACK - Similar to AVL Rotations – LL, RR, LR, RL and RECOLOR
[Figures: one example each of RR rotation, LR rotation, LL rotation and RL rotation]
Red Black Trees - Example
Create a Red Black tree by inserting:
8, 18, 5, 15, 17, 25, 40, 80
K – DIMENSIONAL TREE
◻ kd-Trees
🞑 Invented in the 1970s by Jon Bentley
🞑 Name originally meant “3d-trees, 4d-trees, etc.”,
where k was the # of dimensions
🞑 Now, people say “kd-tree of dimension d”
🞑 Idea: Each level of the tree compares against 1 dimension.
🞑 Lets us have only two children at each node (instead of 2^d)
◻ Each level has a “cutting
dimension”
◻ Cycle through the
dimensions as you walk
down the tree.
◻ Each node contains a point
P = (x,y)
◻ To find (x’,y’) you only
compare coordinate from
the cutting dimension
◻ kd-tree example
insert: (30,40), (5,25), (10,12), (70,70), (50,30), (35,45)
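The insertion rule can be sketched as below for the example points. The sketch cycles the cutting dimension with `depth % k` and, as an assumed tie-breaking rule, sends points whose coordinate equals the node's coordinate to the right subtree; `KDNode`, `kd_insert` and `kd_search` are names chosen for the illustration.

```python
class KDNode:
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None


def kd_insert(node, point, depth=0):
    if node is None:
        return KDNode(point)
    axis = depth % len(point)             # cutting dimension for this level
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, depth + 1)
    else:                                 # assumption: ties go to the right
        node.right = kd_insert(node.right, point, depth + 1)
    return node


def kd_search(node, point):
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % len(point)         # compare only the cutting dimension
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False


root = None
for p in [(30, 40), (5, 25), (10, 12), (70, 70), (50, 30), (35, 45)]:
    root = kd_insert(root, p)

print(root.point, root.left.point, root.right.point)
print(kd_search(root, (50, 30)), kd_search(root, (1, 1)))
```

With these points, level 0 compares x, level 1 compares y, and level 2 compares x again.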
◻ Insert (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
◻ Draw K-D tree
(5,8) (18,14) (12,14) (8,11) (10,2) (3,6) (11,20)
Heap Sort
• The drawbacks of the binary tree sort are remedied by the heapsort, an in-place sort
that requires only O(n log n) operations regardless of the order of the input.
• Define a descending heap (also called a max heap or a descending partially ordered tree)
of size n as an almost complete binary tree of n nodes such that the content of each
node is less than or equal to the content of its father.
• It is clear from this definition of a descending heap that the root of the tree (or the first
element of the array) contains the largest element in the heap. Also note that any path
from the root to a leaf (or indeed, any path in the tree that includes no more than one
node at any level) is an ordered list in descending order.
• It is also possible to define an ascending heap (or a min heap) as an almost complete
binary tree such that the content of each node is greater than or equal to the content
of its father.
• In an ascending heap the root contains the smallest element of the heap, and any path
from the root to a leaf is an ascending ordered list.
• A heap allows a very efficient implementation of a priority queue.
• Although priority queue insertion using a binary search tree could require as few
as log2 n node accesses, it could require as many as n accesses if the tree is
unbalanced. Thus a selection sort using a binary search tree could also require
O(n^2) operations, although on the average only O(n log n) are needed.
• A heap allows both insertion and deletion to be implemented in O(log n) operations.
Thus a selection sort consisting of n insertions and n deletions can be implemented
using a heap in O(n log n) operations, even in the worst case. An additional bonus is
that the heap itself can be implemented within the input array x using the sequential
implementation of an almost complete binary tree. The only additional space
required is for program variables. The heapsort is, therefore, an O(n log n) in-place
sort.
Heap as a Priority Queue
Dpq=array, q=size of heap,
Sorting Using a Heap
Creation of a heap of size 8 from the original file
25 57 48 37 12 92 86 33
The figure illustrates the adjustment of the heap as x[0] is repeatedly selected
and placed into its proper position in the array and the heap is
readjusted, until all the heap elements are processed. Note that after
an element has been “deleted" from the heap, it remains in the array;
it is merely ignored in subsequent processing.
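The two phases described above, heap creation and repeated selection of x[0], can be sketched as follows. The sketch builds the max heap bottom-up with a sift-down helper, which is an assumption: the figure may build the heap by successive insertions instead, and the intermediate arrays can differ even though the sorted result is the same.

```python
def sift_down(x, root, end):
    """Restore the max-heap property for the subtree at root within x[:end]."""
    while True:
        child = 2 * root + 1                  # left child in the array layout
        if child >= end:
            return
        if child + 1 < end and x[child + 1] > x[child]:
            child += 1                        # pick the larger child
        if x[root] >= x[child]:
            return
        x[root], x[child] = x[child], x[root]
        root = child


def heapsort(x):
    n = len(x)
    # Phase 1: turn the array into a descending (max) heap in place.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(x, i, n)
    # Phase 2: repeatedly move x[0] (the largest) to its final position;
    # the "deleted" element stays in the array but is ignored from then on.
    for end in range(n - 1, 0, -1):
        x[0], x[end] = x[end], x[0]
        sift_down(x, 0, end)
    return x


print(heapsort([25, 57, 48, 37, 12, 92, 86, 33]))
```

No auxiliary array is used: the heap occupies the front of x and the sorted elements accumulate at the back.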