MODULE 5 GraphHashing
(21CS32)
Dr. Hemavathi P
Associate Professor
Dept. of CSE
BIT
Course Learning Objectives
1. Explain the fundamentals of data structures and their applications essential for
implementing solutions to problems.
4. Explore the applications of trees and graphs to model and solve the real-world
problem.
5. Make use of Hashing techniques and resolve collisions during mapping of key
value pairs
Text Books
1. Ellis Horowitz and Sartaj Sahni, Fundamentals of
Data Structures in C, 2nd Ed, Universities Press, 2014.
Textbook 1: Chapter 10:10.2, 10.3, 10.4, Textbook 2:7.10 – 7.12, 7.15 Chapter
11: 11.2, Textbook 1: Chapter 6 : 6.1–6.2, Chapter 8 : 8.1-8.3,
Textbook 2: 8.1 – 8.3, 8.5, 8.7
Textbook 3: Chapter 15:15.1, 15.2,15.3, 15.4,15.5 and 15.7
Introduction to Graphs
Graphs
• A graph G is defined as follows:
G=(V,E)
V(G): a finite, nonempty set of vertices
E(G): a set of edges (pairs of vertices)
Types of Graph
• Directed Graph
• Undirected Graph
• Weighted Graph
undirected graphs
• When the edges in a graph have no direction, the
graph is called undirected
Directed graph
• When the edges in a graph have a direction, the
graph is called directed (or digraph)
Figure 6.3: (a) a graph with a self edge; (b) a multigraph with multiple occurrences of the same edge.
Examples for Graph
G1 (a complete graph), G2 (an incomplete graph) and G3 (a directed graph):
V(G1)={0,1,2,3} E(G1)={(0,1),(0,2),(0,3),(1,2),(1,3),(2,3)}
V(G2)={0,1,2,3,4,5,6} E(G2)={(0,1),(0,2),(1,3),(1,4),(2,5),(2,6)}
V(G3)={0,1,2} E(G3)={<0,1>,<1,0>,<1,2>}
complete undirected graph: n(n-1)/2 edges
complete directed graph: n(n-1) edges
Graph terminology (cont.)
• What is the number of edges in a complete
directed graph with N vertices?
N * (N-1)
Graph terminology (cont.)
• What is the number of edges in a complete
undirected graph with N vertices?
N * (N-1) / 2
Subgraph and Path
A subgraph of G is a graph G' such that V(G') is a subset of V(G) and E(G') is a subset of E(G)
CHAPTER 6
Figure 6.4: subgraphs of G1 and G3 (p.261)
(a) some subgraphs of G1; (b) some subgraphs of G3
Degree
The degree of a vertex is the number of edges
incident to that vertex
For directed graph,
– in-degree of a vertex v is the number of edges coming into (incident to) v
– out-degree of a vertex v is the number of edges
going out of v
Examples for Adjacency Matrix
Adjacency matrix of G1 (vertices 0–3; entry [i][j] is 1 iff (i,j) is an edge):
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
The matrices for G2 and G4 follow the same construction from their edge sets.
The matrix of an undirected graph is symmetric, so only about n²/2 entries need to be stored; a directed graph needs all n² entries.
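As a sketch of this representation (assuming the 4-vertex complete graph G1 from the example; the helper names are ours):

```c
#include <assert.h>

#define N 4 /* number of vertices in G1 */

int a[N][N]; /* adjacency matrix */

/* Build the matrix of the complete undirected graph G1:
   every pair of distinct vertices is joined by an edge. */
void build_G1(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i != j);
}

/* Degree of a vertex = number of 1s in its row. */
int degree(int v)
{
    int d = 0;
    for (int w = 0; w < N; w++)
        d += a[v][w];
    return d;
}
```

Note how the symmetry of the matrix for an undirected graph means only about n²/2 entries carry independent information.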
Data Structures for Adjacency Lists
Each row in adjacency matrix is represented as an adjacency list.
#define MAX_VERTICES 50
typedef struct node { int vertex; struct node *link; } *node_pointer;
node_pointer graph[MAX_VERTICES];
Adjacency lists of G1 (each vertex's list holds its neighbours):
vertex 0: 1 → 2 → 3
vertex 1: 0 → 2 → 3
vertex 2: 0 → 1 → 3
vertex 3: 0 → 1 → 2
The lists for G3 and G4 are built the same way from their edge sets.
An undirected graph with n vertices and e edges ==> n head nodes and 2e list nodes
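A minimal C sketch of this representation (the `add_edge` helper is an assumption, not from the textbook): inserting one undirected edge creates one list node on each endpoint's list, so e edges yield 2e list nodes.

```c
#include <stdlib.h>
#include <assert.h>

#define MAX_VERTICES 50

typedef struct node {
    int vertex;
    struct node *link;
} *node_pointer;

node_pointer graph[MAX_VERTICES]; /* head nodes, all NULL initially */

/* Insert undirected edge (u,v): one list node on each list. */
void add_edge(int u, int v)
{
    node_pointer p = malloc(sizeof *p);
    p->vertex = v; p->link = graph[u]; graph[u] = p;

    node_pointer q = malloc(sizeof *q);
    q->vertex = u; q->link = graph[v]; graph[v] = q;
}

/* Number of list nodes on vertex v's list (= its degree). */
int list_length(int v)
{
    int n = 0;
    for (node_pointer p = graph[v]; p; p = p->link)
        n++;
    return n;
}
```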
Alternate order adjacency list for G1
Order is of no significance.
headnodes (each list node holds a vertex field and a link field):
0: 3 → 1 → 2 → NULL
1: 2 → 0 → 3 → NULL
2: 3 → 0 → 1 → NULL
3: 2 → 1 → 0 → NULL
Figure 6.12: Orthogonal representation for graph G3 (p.268). Each edge node carries four fields: tail, head, a column link for the head list and a row link for the tail list.
Adjacency Multilists
An edge in an undirected graph is represented by two
nodes in adjacency list representation.
Adjacency Multilists
– lists in which nodes may be shared among several lists
(an edge node is shared by the lists of its two endpoints)
Adjacency multilist for G1 (six edges):
N1: edge (0,1), links N2 (list of 0) and N4 (list of 1)
N2: edge (0,2), links N3 (list of 0) and N4 (list of 2)
N3: edge (0,3), links NULL and N5 (list of 3)
N4: edge (1,2), links N5 (list of 1) and N6 (list of 2)
N5: edge (1,3), links N6 (list of 1) and NULL
N6: edge (2,3), links NULL and NULL
Some Graph Operations
Traversal
Given G=(V,E) and vertex v, find all w ∈ V such that w is connected to v.
– Depth First Search (DFS)
preorder tree traversal
– Breadth First Search (BFS)
level order tree traversal
Depth First Search
A standard DFS implementation puts each vertex of the graph into one of
two categories:
1.Visited
2.Not Visited
• The purpose of the algorithm is to mark each vertex as visited while
avoiding cycles.
The DFS algorithm works as follows:
1.Start by putting any one of the graph's vertices on top of a stack.
2.Take the top item of the stack and add it to the visited list.
3.Create a list of that vertex's adjacent nodes. Add the ones which aren't in
the visited list to the top of the stack.
4.Keep repeating steps 2 and 3 until the stack is empty.
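The four steps above can be sketched with an explicit stack in C (the 5-vertex adjacency matrix `g` is an assumed example, not the figure from the slides):

```c
#include <assert.h>

#define N 5

int g[N][N];    /* adjacency matrix of the example graph */
int visited[N];

/* Stack-based DFS following the steps above. */
void dfs_stack(int start)
{
    int stack[N * N]; /* a vertex may be pushed more than once */
    int top = -1;
    stack[++top] = start;                  /* step 1: push a start vertex */
    while (top >= 0) {                     /* step 4: repeat until empty */
        int v = stack[top--];              /* step 2: pop and visit */
        if (visited[v])
            continue;
        visited[v] = 1;
        for (int w = N - 1; w >= 0; w--)   /* step 3: push unvisited neighbours */
            if (g[v][w] && !visited[w])
                stack[++top] = w;
    }
}
```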
Example: Undirected graph with 5 vertices
We start from vertex 0, the DFS algorithm starts by putting it in the Visited list and
putting all its adjacent vertices in the stack.
Next, we visit the element at the top of stack i.e. 1 and go to its adjacent nodes. Since 0 has already
been visited, we visit 2 instead.
Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the top of the stack and visit it.
After we visit the last element 3, it doesn't have any unvisited adjacent nodes, so we have completed
the Depth First Traversal of the graph.
DFS using adjacency matrix
• Create a recursive function that takes the index of the node
and a visited array.
• Mark the current node as visited and print the node.
• Traverse all the adjacent and unmarked nodes and call the
recursive function with the index of the adjacent node.
DFS using adjacency matrix
void DFS(int v)
{
    int w;
    visited[v] = 1;
    printf(" %d", v);
    for (w = 1; w <= n; w++)
        if (visited[w] == 0 && a[v][w] == 1)
            DFS(w);
}
Figure 6.19: Graph G and its adjacency lists (p.274)
depth first search: v0, v1, v3, v7, v4, v5, v2, v6
breadth first search: v0, v1, v2, v3, v4, v5, v6, v7
Depth First Search
#define FALSE 0
#define TRUE 1
short int visited[MAX_VERTICES];

void dfs(int v)
{
    node_pointer w;
    visited[v] = TRUE;
    printf("%5d", v);
    for (w = graph[v]; w; w = w->link)
        if (!visited[w->vertex])
            dfs(w->vertex);
}
Depth First Search
Complexity of Depth First Search
With adjacency lists, DFS examines each vertex and each edge once, so it takes O(V + E) time; with an adjacency matrix it takes O(V²).
void bfs(int v)
{
    node_pointer w;
    int q[MAX_VERTICES];      /* queue of vertices */
    int front = 0, rear = -1;
    printf("%5d", v);
    visited[v] = TRUE;
    q[++rear] = v;
    while (front <= rear)
    {
        v = q[front++];
        for (w = graph[v]; w; w = w->link)
            if (!visited[w->vertex])
            {
                printf("%5d", w->vertex);
                q[++rear] = w->vertex;
                visited[w->vertex] = TRUE;
            }
    }
}
Breadth First Search
BFS Algorithm Complexity
The time complexity of the BFS algorithm is represented in the form of O(V +
E), where V is the number of nodes and E is the number of edges.
• Linear Search takes O(n) time to perform the search in unsorted arrays consisting of n elements.
• Binary Search takes O(log n) time to perform the search in sorted arrays consisting of n elements.
• It takes O(log n) time to perform the search in a Binary Search Tree consisting of n elements.
A hash table is organised as b buckets (0, 1, …, b−1), each containing s slots (0, 1, …, s−1).
Properties of a good hash function
1. Low cost
• Execution cost and searching cost should be less
2. Determinism
• Hash procedure must be deterministic, i.e. the same hash value must be generated for a given input, independent of external state such as the time of day or the object's memory address
3. Uniformity
• Must map the keys as evenly as possible over its output range which
minimizes the number of collisions
Types of Hash Functions
• There are various types of hash functions available such as-
• Division Method
• Mid Square Method
• Folding Method
• Multiplication Method
1. Division Method
• The hash value is the remainder of a division: h(k) = k mod M. It is best if M is a prime number, as that helps the keys distribute more uniformly.
Ex:
• k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
• k = 1276
M = 11
h(1276) = 1276 mod 11
=0
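Both worked examples can be checked with a one-line C function (the function name is ours):

```c
#include <assert.h>

/* Division method: h(k) = k mod M.
   A prime M helps spread the keys more uniformly. */
unsigned int hash_div(unsigned int k, unsigned int M)
{
    return k % M;
}
```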
• Pros:
1.This method is quite good for any value of M.
2.The division method is very fast since it requires only a single
division operation.
• Cons:
1.This method leads to poor performance since consecutive keys map to
consecutive hash values in the hash table.
2.Sometimes extra care should be taken to choose the value of M.
2. Mid square method
• The mid-square method is a very good hashing method. It involves two steps to compute the hash value:
1. Square the key value, i.e. compute k × k.
2. Take the required number of middle digits of k² as the hash value.
Here, k is the key value.
• The performance of this method is good as most or all digits of the key
value contribute to the result. This is because all digits in the key
contribute to generating the middle digits of the squared result.
• The result is not dominated by the distribution of the top digit or
bottom digit of the original key value.
Cons:
• The size of the key is one of the limitations of this method, as the key is
of big size then its square will double the number of digits.
• Another disadvantage is that there will be collisions but we can try to
reduce collisions.
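A sketch of the mid-square method in C. Which middle digits to keep is a convention; here (an assumption of ours) we drop an equal number of digits, rounded down, from each end of k²:

```c
#include <assert.h>

/* Mid-square method: square the key, keep r middle digits of k*k. */
unsigned long hash_midsquare(unsigned long k, int r)
{
    unsigned long sq = k * k;

    int digits = 0;                  /* count decimal digits of the square */
    for (unsigned long t = sq; t > 0; t /= 10)
        digits++;

    int drop = (digits - r) / 2;     /* low-order digits to discard */
    for (int i = 0; i < drop; i++)
        sq /= 10;

    unsigned long mod = 1;           /* keep only r digits */
    for (int i = 0; i < r; i++)
        mod *= 10;
    return sq % mod;
}
```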
3. Digit folding method
• Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where each
part has the same number of digits except for the last part that can have lesser
digits than the other parts.
• Add the individual parts. The hash value is obtained by ignoring the last carry if
any.
Formula:
h(K) = s
Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
Note:
The number of digits in each part varies depending upon the size of the hash
table. Suppose for example the size of the hash table is 100, then each part
must have two digits except for the last part which can have a lesser number
of digits.
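The worked example (k = 12345, parts 12, 34, 5) can be reproduced in C; following the note above, the key is split into two-digit parts from the left, with a possibly shorter final part (the helper name is ours):

```c
#include <assert.h>
#include <stdio.h>

/* Digit folding: split the key into 2-digit parts from the left
   (the last part may be shorter), add them, and keep 2 digits. */
unsigned int hash_fold(unsigned long k)
{
    char buf[32];
    int len = sprintf(buf, "%lu", k); /* decimal digits of k */
    unsigned int s = 0;
    for (int i = 0; i < len; i += 2) {
        unsigned int part = (unsigned int)(buf[i] - '0');
        if (i + 1 < len)
            part = part * 10 + (unsigned int)(buf[i + 1] - '0');
        s += part;                    /* 12 + 34 + 5 for k = 12345 */
    }
    return s % 100;                   /* ignore any carry beyond 2 digits */
}
```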
4. Multiplication Method
This method involves the following steps:
1.Choose a constant value A such that 0 < A < 1.
2.Multiply the key value with A.
3.Extract the fractional part of kA.
4.Multiply the result of the above step by the size of the hash table i.e. M.
5.The resulting hash value is obtained by taking the floor of the result obtained in
step 4.
• Formula:
h(K) = floor (M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
Pros:
• The advantage of the multiplication method is that it can work with any value between
0 and 1, although there are some values that tend to give better results than the rest.
Cons:
• The method is efficient mainly when the table size M is a power of two, since then the index can be extracted from the product with cheap bit operations.
Types of Hashing
• Static Hashing
• Dynamic Hashing
Static Hashing
• It is a hashing technique that enables users to look up a definite data set. Meaning, the data in the directory does not change; it is "static" or fixed. In this hashing technique, the resulting number of data buckets in memory remains constant.
• Operations Provided by Static Hashing
• Delete − Search a record address and delete a record at the same address or
delete a chunk of records from records for that address in memory.
• Insertion − While entering a new record using static hashing, the hash
function (h) calculates bucket address "h(K)" for the search key (k), where the
record is going to be stored.
• Search − A record can be obtained using a hash function by locating the
address of the bucket where the data is stored.
• Update − It supports updating a record once it is traced in the data bucket.
Static Hashing
• Advantages of Static Hashing
• Static hashing is advantageous in the following ways −
• Offers unparalleled performance for small-size databases.
• Allows Primary Key value to be used as a Hash Key.
• Disadvantages of Static Hashing
• Static hashing comes with the following disadvantages −
• It cannot work efficiently with the databases that can be scaled.
• It is not a good option for large-size databases.
• Bucket overflow issue occurs if there is more data and less memory.
Dynamic Hashing
• It is a hashing technique that enables users to look up a dynamic data set. Meaning, the data set is modified by adding or removing data on demand, hence the name 'Dynamic' hashing. Thus, the resulting number of data buckets keeps increasing or decreasing depending on the number of records.
• In this hashing technique, the resulting number of data buckets in
memory is ever-changing.
• Operations Provided by Dynamic Hashing
• Delete − Locate the desired location and support deleting data (or a chunk of
data) at that location.
• Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
• Query − Perform querying to compute the bucket address.
• Update − Perform a query to update the data.
Dynamic Hashing
• Advantages of Dynamic Hashing
• Dynamic hashing is advantageous in the following ways −
• It works well with scalable data.
• It can handle addressing large amount of memory in which data size is always
changing.
• Bucket overflow issue comes rarely or very late.
• Disadvantages of Dynamic Hashing
• Dynamic hashing comes with the following disadvantage −
• The location of the data in memory keeps changing according to the bucket
size. Hence if there is a phenomenal increase in data, then maintaining the
bucket address table becomes a challenge.
Differences between Static and Dynamic Hashing
Collision in Hashing
• Hash function is used to compute the hash value for a key.
• Hash value is then used as an index to store the key in the hash table.
• Hash function may return the same hash value for two or more keys.
• When the hash value of a key maps to an already occupied bucket of
the hash table, it is called as a Collision.
Collision Resolution Techniques
• Collision Resolution Techniques are the techniques used for resolving
or handling the collision.
Collision Resolution
Techniques
Linear Probing
Quadratic Probing
Double Hashing
Separate Chaining
• To handle the collision,
• This technique creates a linked list to the slot for which collision occurs.
• The new key is then inserted in the linked list.
• These linked lists to the slots appear like chains.
• That is why, this technique is called as separate chaining.
Example-Separate Chaining
• Using the hash function ‘key mod 7’, insert the following sequence of
keys in the hash table-
• 50, 700, 76, 85, 92, 73 and 101
Step-1
• Draw an empty hash table.
• For the given hash function, the possible range of hash values is [0, 6].
• So, draw an empty hash table consisting of 7 buckets as-
Step-2
• Insert the given keys in the hash table one by one.
• The first key to be inserted in the hash table = 50.
• Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
• So, key 50 will be inserted in bucket-1 of the hash table as-
Step-3
• The next key to be inserted in the hash table = 700.
• Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
• So, key 700 will be inserted in bucket-0 of the hash table as-
Step-4
• The next key to be inserted in the hash table = 76.
• Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
• So, key 76 will be inserted in bucket-6 of the hash table as-
Step-5
• The next key to be inserted in the hash table = 85.
• Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
• Since bucket-1 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-1.
• So, key 85 will be inserted in bucket-1 of the hash table as-
Step-6
• The next key to be inserted in the hash table = 92.
• Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
• Since bucket-1 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-1.
• So, key 92 will be inserted in bucket-1 of the hash table as-
Step-7
• The next key to be inserted in the hash table = 101.
• Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
• Since bucket-3 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-3.
• So, key 101 will be inserted in bucket-3 of the hash table as-
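The seven steps above can be sketched in C: each bucket holds the head of a singly linked chain, and new keys are inserted at the head (so within bucket-1, 92 precedes 85, which precedes 50). The helper names are ours:

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 7 /* hash function: key mod 7 */

typedef struct chain_node {
    int key;
    struct chain_node *next;
} chain_node;

chain_node *table[TABLE_SIZE]; /* all chains start empty */

/* Insert a key at the head of its bucket's chain. */
void chain_insert(int key)
{
    chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[key % TABLE_SIZE];
    table[key % TABLE_SIZE] = n;
}

/* Return 1 if key is present in the table, 0 otherwise. */
int chain_search(int key)
{
    for (chain_node *p = table[key % TABLE_SIZE]; p; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```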
Algorithm to insert an item using Chaining
Approach
Open Addressing
• In case of collision,
• Probing is performed until an empty bucket is found.
• Once an empty bucket is found, the key is inserted.
• Probing is performed in accordance with the technique used for open
addressing.
1. Linear Probing
• In linear probing,
• When collision occurs, we linearly probe for the next bucket.
• We keep probing until an empty bucket is found.
• Advantage-
• It is easy to compute.
• Disadvantage-
• It suffers from primary clustering: runs of occupied buckets build up, and later insertions must probe past ever-longer clusters.
/* copy all keys from the chained buckets b[0..9] into array a */
for (i = j = 0; i < 10; i++)
{
    temp = b[i];
    while (temp != NULL)
    {
        a[j++] = temp->info;
        temp = temp->link;
    }
}
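Linear probing itself can be sketched as follows (table size 7 and the keys 50, 700 and 85 match the earlier 'key mod 7' example; the function name is ours):

```c
#include <assert.h>

#define SIZE 7
#define EMPTY (-1)

int ht[SIZE] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* Linear probing: on collision, try the next bucket (wrapping around)
   until an empty one is found.
   Returns the index used, or -1 if the table is full. */
int probe_insert(int key)
{
    int start = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int b = (start + i) % SIZE;
        if (ht[b] == EMPTY) {
            ht[b] = key;
            return b;
        }
    }
    return -1;
}
```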
Tree is one of the most important data structures, used for efficiently performing operations like insertion, deletion and searching of values. However, while working with a large volume of data, constructing a well-balanced tree for sorting all data is not feasible. Thus only useful data is stored as a tree, and the actual volume of data being used continually changes through the insertion of new data and deletion of existing data. In some cases the NULL links of a binary tree are replaced by special links called threads, which make it possible to perform traversals, insertions and deletions without using either a stack or recursion. In this chapter, you will learn about the height-balanced tree, also known as the AVL tree.
What is AVL Tree
• AVL tree is a binary search tree in which the difference of heights of the left and right subtrees of any node is at most one. The technique of balancing the height of binary trees was developed by Adelson-Velskii and Landis, hence the short form AVL tree (also called a Balanced Binary Tree).
• An AVL tree can be defined as follows:
• Let T be a non-empty binary tree with TL and TR as its left and right subtrees. The tree is height balanced if:
• TL and TR are height balanced
• |hL - hR| <= 1, where hL and hR are the heights of TL and TR
• The Balance factor of a node in a binary tree can have value 1, -1 or 0.
• If balance factor of any node is 1, it means that the left sub-tree is one level higher than the
right sub-tree.
• If balance factor of any node is 0, it means that the left sub-tree and right sub-tree have equal height.
• If balance factor of any node is -1, it means that the left sub-tree is one level lower than the
right sub-tree.
AVL tree controls the height of a binary search tree and it prevents
it from becoming skewed. Because when a binary tree becomes
skewed, it is the worst case (O (n)) for all the operations.
By using the balance factor, AVL tree imposes a limit on the binary
tree and thus keeps all the operations at O (log n).
Advantages of AVL Tree
• Since AVL trees are height balance trees, operations like insertion and
deletion have low time complexity. Let us consider an example:
• If you have the following tree having keys 1, 2, 3, 4, 5, 6, 7 and then the
binary tree will be like the second figure:
Advantages of AVL Tree
Advantages of AVL Tree
struct AVLNode
{
    int data;
    struct AVLNode *left, *right;
    int balfactor;
};
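Given this node type, the balance factor can be computed from subtree heights; a minimal sketch, assuming the convention that an empty subtree has height 0 (some texts use -1):

```c
#include <assert.h>
#include <stddef.h>

struct AVLNode {
    int data;
    struct AVLNode *left, *right;
    int balfactor;
};

/* Height of a subtree; an empty subtree has height 0 here. */
int height(struct AVLNode *t)
{
    if (t == NULL)
        return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height(left subtree) - height(right subtree). */
int balance_factor(struct AVLNode *t)
{
    return t ? height(t->left) - height(t->right) : 0;
}
```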
Insertion
• Step 1: First, insert a new element into the tree using BST's (Binary
Search Tree) insertion logic.
• Step 2: After inserting the elements you have to check the Balance
Factor of each node.
• Step 3: When the Balance Factor of every node will be found like 0
or 1 or -1 then the algorithm will proceed for the next operation.
• Step 4: When the balance factor of any node comes other than the
above three values then the tree is said to be imbalanced. Then
perform the suitable Rotation to make it balanced and then the
algorithm will proceed for the next operation.
AVL Rotations
We perform rotation in AVL tree only in case if Balance Factor is other than -1, 0, and
1. There are basically four types of rotations which are as follows:
1. L L rotation: Inserted node is in the left subtree of left subtree of A
2. R R rotation : Inserted node is in the right subtree of right subtree of A
3. L R rotation : Inserted node is in the right subtree of left subtree of A
4. R L rotation : Inserted node is in the left subtree of right subtree of A
Where node A is the node whose balance Factor is other than -1, 0, 1.
• The first two rotations LL and RR are single rotations and the next two
rotations LR and RL are double rotations. For a tree to be unbalanced,
minimum height must be at least 2, Let us understand each rotation
1. RR Rotation
• When BST becomes unbalanced, due to a node is inserted into the
right subtree of the right subtree of A, then we perform RR
rotation, RR rotation is an anticlockwise rotation, which is applied
on the edge below a node having balance factor -2
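The RR case is a single left (anticlockwise) rotation; a sketch in C, with balance-factor bookkeeping omitted for brevity (the struct and function names are ours):

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int data;
    struct Node *left, *right;
};

/* RR rotation: rotate left around node a; returns the new
   subtree root (a's right child). */
struct Node *rotate_left(struct Node *a)
{
    struct Node *b = a->right;
    a->right = b->left; /* b's left subtree moves under a */
    b->left = a;
    return b;
}
```

For a right-skewed chain such as H → I → J, one left rotation around H makes I the subtree root with H and J as its children.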
H, I, J, B, A, E, C, F, D, G, K, L
• Insert H, I, J
• On inserting the above elements, especially in the case of H, the BST
becomes unbalanced as the Balance Factor of H is -2. Since the BST is
right-skewed, we will perform RR Rotation on node H.
• Insert B, A
• On inserting the above elements, especially in case of A, the BST
becomes unbalanced as the Balance Factor of H and I is 2, we consider
the first node from the last inserted node i.e. H. Since the BST from H
is left-skewed, we will perform LL Rotation on node H.
• Insert E
• On inserting E, the BST becomes unbalanced as the Balance Factor of I is 2. Travelling from E to I, we find that E was inserted in the right subtree of the left subtree of I, so we will perform LR Rotation on node I.
LR = RR + LL rotation
• On inserting C, F, D, the BST becomes unbalanced as the Balance Factor of B and H is -2. Travelling from D to B, we find that D was inserted in the left subtree of the right subtree of B, so we will perform RL Rotation. RL = LL + RR rotation.
• 3b) We first perform LL rotation on the node I
• Insert C, F, D
Deletion
Step 1: First, find the node where key k is stored.
Step 2: Delete the contents of that node (suppose the node is x).
Step 3: Claim: deleting a node in an AVL tree can be reduced to deleting a leaf. There are three possible cases:
Most of the BST operations (e.g., search, max, min, insert, delete, etc.)
take O(h) time where h is the height of the BST. The cost of these
operations may become O(n) for a skewed Binary tree. If we make sure
that the height of the tree remains O(log n) after every insertion and
deletion, then we can guarantee an upper bound of O(log n) for all these
operations. The height of a Red-Black tree is always O(log n) where n is
the number of nodes in the tree.
Properties of Red Black Tree:
The Red-Black tree satisfies all the properties of binary search tree in addition to that it
satisfies following additional properties –
1. Root property: The root is black.
2. External property: Every leaf (Leaf is a NULL child of a node) is black in Red-
Black tree.
3. Internal (red) property: The children of a red node are black. Hence the possible parent of a red node is
a black node.
4. Depth property: All the leaves have the same black depth.
5. Path property: Every simple path from root to descendant leaf node contains same
number of black nodes.
The result of all these above-mentioned properties is that the Red-Black tree is
roughly balanced.
Rules That Every Red-Black Tree
Follows:
1.Every node has a color either red or black.
2.The root of the tree is always black.
3.There are no two adjacent red nodes (A red node cannot have a red
parent or red child).
4.Every path from a node (including root) to any of its descendants
NULL nodes has the same number of black nodes.
5. Every leaf (i.e. NULL node) must be colored BLACK.
Comparison with AVL Tree:
• The AVL trees are more balanced compared to Red-Black Trees, but
they may cause more rotations during insertion and deletion. So if
your application involves frequent insertions and deletions, then Red-
Black trees should be preferred. And if the insertions and deletions are
less frequent and search is a more frequent operation, then AVL tree
should be preferred over the Red-Black Tree.
Applications:
1.Most of the self-balancing BST library functions like map, multiset, and multimap in C++ ( or java
packages like java.util.TreeMap and java.util.TreeSet ) use Red-Black Trees.
2.It is used to implement CPU Scheduling Linux. Completely Fair Scheduler uses it.
3. It is also used in the K-mean clustering algorithm in machine learning for reducing time complexity.
4. Moreover, MySQL also uses the Red-Black tree for indexes on tables in order to reduce the
searching and insertion time.
5.Red Black Trees are used in the implementation of the virtual memory manager in some operating
systems, to keep track of memory pages and their usage.
6. Many programming languages such as Java, C++, and Python have implemented Red Black Trees as
a built-in data structure for efficient searching and sorting of data.
7.Red Black Trees are used in the implementation of graph algorithms such as Dijkstra’s shortest path
algorithm and Prim’s minimum spanning tree algorithm.
8.Red Black Trees are used in the implementation of game engines.
Disadvantages:
1.Red Black Trees require one extra bit of storage for each node to store
the color of the node (red or black).
2.They are comparatively complex to implement.
3. Although Red Black Trees provide efficient performance for basic
operations, they may not be the best choice for certain types of data or
specific use cases.
Insertion in Red-Black Tree
• Logic:
• First, insert the node as in a binary search tree and colour it red. If the node is the root, change its colour to black. If it is not the root, check the colour of its parent: if the parent is black, leave the colours unchanged; if the parent is red, check the colour of the node's uncle. If the uncle is red, change the colour of the parent and uncle to black and that of the grandfather to red, and repeat the same process for the grandfather (if the grandfather is the root, do not change it to red).
• If the node's uncle has black colour then there are 4 possible cases:
• Left Left Case (LL rotation):
• Left Right Case (LR rotation):
• Right Right Case (RR rotation):
• Right Left Case (RL rotation):
Insertion
1. If the tree is empty, create a new node as root node with colour
black.
2. if the tree is not empty, create a new node as leaf node with colour
Red
3. If the parent of new node is black then exit
4. If the parent of new node is Red, then check the colour of the
parent’s siblings of the new node
a) If colour is black or NULL then do suitable rotation (AVL Tree) and recolour
b) If colour is Red, then recolour and also check if parent’s parent of the new
node is not root node then recolour it and recheck
Splay Trees
Splay Tree: Introduction
● Splay trees are the self-balancing or self-adjusted binary search trees. In other words,
we can say that the splay trees are the variants of the binary search trees.
● A splay tree is a self-balancing tree, but AVL and Red-Black trees are also self-
balancing trees. Splay tree has one extra property that makes it unique which is
splaying.
● A splay tree supports the same operations as a binary search tree, i.e., insertion, deletion and searching, but it also contains one more operation, i.e., splaying. So, all the operations in the splay tree are followed by splaying.
The splay tree can be defined as the self-adjusted tree in which any operation
performed on the element would rearrange the tree so that the element on which
operation has been performed becomes the root node of the tree.
Zig rotation
We use Zig rotation when the item to be searched is either the root node or the left child of the root node.
In the following example, let us search for node 9, which is the left child of the root node. Here a right rotation is performed, so that 9 becomes the root node of the tree.
Zig Zig rotation
This is done in cases where the node to be searched has both a parent and a grandparent.
Let's say we have to search for 13, which is present on the right of the root node of the tree.
Zag Zag Rotation
Here we perform zag rotation two times. Every node moves two positions to the left of its current position.
Zig Zag rotation
This type of rotation is a sequence of a zig rotation followed by a zag rotation. Every node moves one position to the right, followed by one position to the left of its current position.
Zag Zig rotation
This rotation is similar to the Zig-Zag rotation; the only difference is that here every node moves one position to the left, followed by one position to the right of its current position.
● The splay tree is one of the fastest binary search trees in practice on workloads with locality of reference, and it is used in a variety of practical applications such as the GCC compiler.
The main disadvantage of the splay tree is that trees are not strictly balanced, but rather roughly balanced. When a splay tree degenerates into a linear chain, a single operation can take O(n) time.
B TREE
• A B-tree of order m is a tree which satisfies the following
properties:
1.Every node has at most m children.
2.Every internal node, except the root, has at least ⌈m/2⌉ children.
3.The root has at least two children if it is not a leaf.
4.All leaves appear on the same level.
5.A non-leaf node with k children contains k−1 keys
PROPERTIES OF B-TREES
• All leaves are at the same level.
• If m is the order of the tree, each internal node can contain at most
m - 1 keys along with a pointer to each child.
• Each node except root can have at most m children and at
least m/2 children.
• The root has at least 2 children and contains a minimum of 1 key.
Construction of B-tree of order 3
• Insert 1 to 8 into a B-tree of order 3
• Step 1: insert 1 → [1]
• Step 2: insert 2 → [1 2]
• Step 3: insert 3 → [1 2 3] overflows; split about the median 2 → root [2] with children [1] and [3]
• Step 4: insert 4 → root [2], children [1] and [3 4]
• Step 5: insert 5 → [3 4 5] overflows; split about the median 4 → root [2 4], children [1], [3] and [5]
• Step 6: insert 6 → root [2 4], children [1], [3] and [5 6]
• Step 7: insert 7 → [5 6 7] overflows; split about 6 → root becomes [2 4 6], which also overflows; split about 4 → root [4], children [2] and [6], leaves [1], [3], [5] and [7]
• Step 8: insert 8 → root [4], children [2] and [6], leaves [1], [3], [5] and [7 8]
Insert an elements 8, 9, 10, 11, 15, 20, 17
in B tree of order 3
Application of B tree
• It is used in large databases to access data stored on the disk
• Searching for data in a data set can be achieved in significantly less time using
the B-Tree
• With the indexing feature, multilevel indexing can be achieved.
• Most of the servers also use the B-tree approach.
• B-Trees are used in CAD systems to organize and search geometric data.
• B-Trees are also used in other areas such as natural language processing,
computer networks, and cryptography.
Advantages of B tree
• B-Trees are self-balancing.
• High-concurrency and high-throughput.
• Efficient storage utilization.
• suitable for large data sets and real-time applications.
Disadvantages of B trees
• B-Trees are disk-based data structures and can have a high disk
usage.
• Not the best choice for all cases.
• Slower than simpler in-memory structures, such as hash tables, for point lookups.