ADS Unit-3
Unit-III
Trees Part-II:
Red-Black Trees, Splay Trees, Applications.
Hash Tables: Introduction, Hash Structures, Hash Functions, Linear Open Addressing, Chaining and Applications.
Red-Black Tree:
A red-black tree is a type of binary search tree. It is self-balancing like the AVL tree, though it uses different properties to maintain the invariant of being balanced. Balanced binary search trees are much more efficient to search than unbalanced ones, so the complexity needed to maintain balance is often worth it. They are called red-black trees because each node in the tree is labeled as red or black.
A red-black tree is a category of self-balancing binary search tree. It was created in 1972 by Rudolf Bayer, who termed them "symmetric binary B-trees."
A red-black tree is a binary search tree in which each node has a color, either red or black, as an extra attribute. By constraining the node colors on any simple path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, so the tree is approximately balanced. The constraints (the red-black properties) are: every node is red or black; the root is black; every leaf (NIL) is black; a red node has no red children; and every path from a node down to a leaf contains the same number of black nodes (the black-height).
Clearly, the in-order sequence (A, x, B, y, C) is preserved by the rotation operation. Therefore, if we start with a BST and only restructure using rotations, we still have a BST, i.e., rotations do not break the BST property.
Algorithm:
LEFT-ROTATE (T, x)
1. y ← right[x]
2. right[x] ← left[y]
3. p[left[y]] ← x
4. p[y] ← p[x]
5. if p[x] = nil[T]
      then root[T] ← y
      else if x = left[p[x]]
         then left[p[x]] ← y
         else right[p[x]] ← y
6. left[y] ← x
7. p[x] ← y
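The same procedure can be sketched in Python as follows (a minimal illustration, assuming nodes with left, right, and parent fields and a tree object with a root attribute; these names are chosen here for illustration). RIGHT-ROTATE is the mirror image with left and right exchanged.

def left_rotate(tree, x):
    y = x.right                   # step 1: y <- right[x]
    x.right = y.left              # step 2: turn y's left subtree into x's right subtree
    if y.left is not None:
        y.left.parent = x         # step 3: p[left[y]] <- x
    y.parent = x.parent           # step 4: link x's parent to y
    if x.parent is None:
        tree.root = y             # step 5: x was the root
    elif x == x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                    # step 6: put x on y's left
    x.parent = y                  # step 7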
Example: Draw the complete binary tree of height 3 on the keys {1, 2, 3, ..., 15}. Add the NIL leaves and color the nodes in three different ways such that the black-heights of the resulting trees are 2, 3, and 4.
Solution:
Insertion:
After inserting a new node, coloring the new node black may violate the black-height condition, and coloring it red may violate the coloring conditions, i.e., the root is black and a red node has no red children. Black-height violations are harder to repair, so we color the new node red. After this, if there is any color violation, we correct it by the RB-INSERT-FIXUP procedure. A sketch of this procedure follows.
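Below is a minimal Python sketch of the fix-up logic (a simplified rendering of the standard CLRS procedure, not this text's exact pseudocode; left_rotate and right_rotate are as sketched earlier, and None children are treated as black NIL leaves):

RED, BLACK = "R", "B"

def rb_insert_fixup(tree, z):
    # z is the newly inserted node, colored red.
    while z.parent is not None and z.parent.color == RED:
        if z.parent == z.parent.parent.left:
            y = z.parent.parent.right               # the uncle
            if y is not None and y.color == RED:    # Case 1: recolor, move up
                z.parent.color = BLACK
                y.color = BLACK
                z.parent.parent.color = RED
                z = z.parent.parent
            else:
                if z == z.parent.right:             # Case 2: rotate into Case 3
                    z = z.parent
                    left_rotate(tree, z)
                z.parent.color = BLACK              # Case 3: recolor and rotate
                z.parent.parent.color = RED
                right_rotate(tree, z.parent.parent)
        else:                                       # mirror image: left/right swapped
            y = z.parent.parent.left
            if y is not None and y.color == RED:
                z.parent.color = BLACK
                y.color = BLACK
                z.parent.parent.color = RED
                z = z.parent.parent
            else:
                if z == z.parent.left:
                    z = z.parent
                    right_rotate(tree, z)
                z.parent.color = BLACK
                z.parent.parent.color = RED
                left_rotate(tree, z.parent.parent)
    tree.root.color = BLACK                         # the root is always black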
Example: Show the red-black trees that result after successively inserting the keys 41,38,31,12,19,8 into
an initially empty red-black tree.
1. Insert 41
(Figures omitted: the tree after each of the remaining insertions 38, 31, 12, 19, and 8.)
3. Deletion:
First, search for the element to be deleted.
o If the element to be deleted is in a node with only a left child, swap this node with the one containing the largest element in the left subtree. (That node has no right child.)
o If the element to be deleted is in a node with only a right child, swap this node with the one containing the smallest element in the right subtree. (That node has no left child.)
o If the element to be deleted is in a node with both a left child and a right child, then swap in either of the above two ways. While swapping, swap only the keys, not the colors.
o The item to be deleted now has only a left child or only a right child. Replace this node with its sole child. This may violate the red constraint or the black constraint. Violations of the red constraint can be easily fixed.
o If the deleted node is black, the black constraint is violated: the elimination of a black node y causes any path that contained y to have one fewer black node.
o Two cases arise:
  o The replacing node is red, in which case we merely color it black to make up for the loss of one black node.
  o The replacing node is black.
The procedure RB-DELETE is a minor modification of the TREE-DELETE procedure. After splicing out a node, it calls an auxiliary procedure RB-DELETE-FIXUP that changes colors and performs rotations to restore the red-black properties.
RB-DELETE (T, z)
1. if left[z] = nil[T] or right[z] = nil[T]
2.    then y ← z
3.    else y ← TREE-SUCCESSOR (z)
4. if left[y] ≠ nil[T]
5.    then x ← left[y]
6.    else x ← right[y]
7. p[x] ← p[y]
8. if p[y] = nil[T]
9.    then root[T] ← x
10.   else if y = left[p[y]]
11.      then left[p[y]] ← x
12.      else right[p[y]] ← x
13. if y ≠ z
14.    then key[z] ← key[y]
15.       copy y's satellite data into z
16. if color[y] = BLACK
17.    then RB-DELETE-FIXUP (T, x)
18. return y

RB-DELETE-FIXUP (T, x)
1. while x ≠ root[T] and color[x] = BLACK
2.    do if x = left[p[x]]
3.       then w ← right[p[x]]
4.          if color[w] = RED
5.             then color[w] ← BLACK                        // Case 1
6.                color[p[x]] ← RED                         // Case 1
7.                LEFT-ROTATE (T, p[x])                     // Case 1
8.                w ← right[p[x]]                           // Case 1
9.          if color[left[w]] = BLACK and color[right[w]] = BLACK
10.            then color[w] ← RED                          // Case 2
11.               x ← p[x]                                  // Case 2
12.            else if color[right[w]] = BLACK
13.               then color[left[w]] ← BLACK               // Case 3
14.                  color[w] ← RED                         // Case 3
15.                  RIGHT-ROTATE (T, w)                    // Case 3
16.                  w ← right[p[x]]                        // Case 3
17.               color[w] ← color[p[x]]                    // Case 4
18.               color[p[x]] ← BLACK                       // Case 4
19.               color[right[w]] ← BLACK                   // Case 4
20.               LEFT-ROTATE (T, p[x])                     // Case 4
21.               x ← root[T]                               // Case 4
22.    else (same as then clause with "right" and "left" exchanged)
23. color[x] ← BLACK
Example: In a previous example, we found the red-black tree that results from successively inserting the keys 41, 38, 31, 12, 19, 8 into an initially empty tree. Now show the red-black trees that result from the successive deletion of the keys in the order 8, 12, 19, 31, 38, 41.
(Figures omitted: the tree after each deletion. After deleting 41, no tree remains.)
Splay Trees:
Splay trees are self-balancing or self-adjusting binary search trees. In other words, we can say that splay trees are variants of the binary search tree. The prerequisite for splay trees is that we should know about binary search trees.
Splay trees are not strictly balanced trees, but they are roughly balanced.
A splay tree supports the same operations as a binary search tree, i.e., insertion, deletion, and searching, but it also supports one more operation, i.e., splaying.
Splaying an element is the process of bringing it to the root position by performing suitable rotation operations.
In a splay tree, splaying an element rearranges the elements in the tree so that the splayed element is placed at the root of the tree.
By splaying elements, we bring more frequently used elements closer to the root of the tree so that any operation on those elements is performed quickly. That means the splaying operation automatically brings more frequently used elements closer to the root of the tree.
Advantages of Splay tree:
o In the splay tree, we do not need to store extra information. In contrast, AVL trees need to store the balance factor of each node, which requires extra space, and red-black trees require one extra bit of information per node that denotes its color, either red or black.
o It is the fastest type of binary search tree for various practical applications. It is used in Windows NT and in the GCC compiler.
o It provides better performance because frequently accessed nodes move nearer to the root, so those elements can be accessed quickly. It is used in cache implementations, as recently accessed data is stored in the cache so that we do not need to go to main memory to access it, which takes less time.
Drawback of Splay tree:
The major drawback of the splay tree is that the trees are not strictly balanced, i.e., they are only roughly balanced. Sometimes a splay tree degenerates into a linear chain, so an operation can take O(n) time.
In a splay tree, to splay an element we use the following rotation operations.
Rotations in Splay Tree:
1. Zig Rotation
2. Zag Rotation
3. Zig - Zig Rotation
4. Zag - Zag Rotation
5. Zig - Zag Rotation
6. Zag - Zig Rotation
Zig Rotation:
The zig rotation in a splay tree is similar to the single right rotation in AVL tree rotations. In a zig rotation, every node moves one position to the right from its current position. Consider the following example.
Zag Rotation:
The zag rotation in a splay tree is similar to the single left rotation in AVL tree rotations. In a zag rotation, every node moves one position to the left from its current position. Consider the following example.
Zig-Zig Rotation:
The zig-zig rotation in a splay tree is a double zig rotation. In a zig-zig rotation, every node moves two positions to the right from its current position. Consider the following example.
Zag-Zag Rotation:
The zag-zag rotation in a splay tree is a double zag rotation. In a zag-zag rotation, every node moves two positions to the left from its current position. Consider the following example.
Zig-Zag Rotation:
The zig-zag rotation in a splay tree is a sequence of a zig rotation followed by a zag rotation. In a zig-zag rotation, every node moves one position to the right followed by one position to the left from its current position. Consider the following example.
Zag-Zig Rotation:
The zag-zig rotation in a splay tree is a sequence of a zag rotation followed by a zig rotation. In a zag-zig rotation, every node moves one position to the left followed by one position to the right from its current position. Consider the following example.
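The two primitive rotations can be sketched in Python as follows (a minimal illustration on plain BST nodes without parent pointers; the compound cases simply apply these twice in the orders described above):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def zig(y):
    # Single right rotation: y's left child x moves up one position.
    x = y.left
    y.left = x.right
    x.right = y
    return x          # x is the new root of this subtree

def zag(y):
    # Single left rotation: y's right child x moves up one position.
    x = y.right
    y.right = x.left
    x.left = y
    return x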
Insertion operation:
In the insertion operation, we first insert the element into the tree as in a BST and then perform the splaying operation on the inserted element.
Step 1: First, we insert node 15 into the tree. After insertion, we need to perform splaying. As 15 is the root node, no splaying is needed.
Step 2: The next element is 10. As 10 is less than 15, node 10 will be the left child of node 15, as shown below. Splaying (a single zig rotation) then makes node 10 the root.
Step 3: The next element is 17. As 17 is greater than 10 and 15, it becomes the right child of node 15. Now we perform splaying. As 17 has a parent as well as a grandparent, and it is the right child of a right child, we perform a zag-zag rotation (two left rotations).
Step 4: The next element is 7. As 7 is less than 17, 15, and 10, node 7 will be the left child of 10.
Now we have to splay the tree. As 7 has a parent as well as a grandparent, we perform two right rotations, as shown below:
Still, node 7 is not the root node; it is the left child of the root node, i.e., 17. So we need to perform one more right rotation to make node 7 the root, as shown below:
The BST-style insertion followed by splaying can be written as (n is the new node):

temp = T_root
y = NULL
while (temp != NULL)
    y = temp
    if (n->data < temp->data)
        temp = temp->left
    else
        temp = temp->right
n->parent = y
if (y == NULL)              // tree was empty
    T_root = n
else if (n->data < y->data)
    y->left = n
else
    y->right = n
Splay(T, n)                 // bring the new node to the root
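The Splay(T, n) call above is assumed. Below is a sketch of a common recursive splay-by-key routine (not the exact bottom-up procedure above; it is built on the zig/zag helpers from the earlier sketch, so it needs no parent pointers, and it splays the last node visited on the search path to the root):

def splay(root, key):
    # Base case: empty subtree, or the key is already at the root.
    if root is None or root.key == key:
        return root
    if key < root.key:                          # key lies in the left subtree
        if root.left is None:
            return root                         # key not present; stop here
        if key < root.left.key:                 # zig-zig: splay deep, rotate at root
            root.left.left = splay(root.left.left, key)
            root = zig(root)
        elif key > root.left.key:               # zig-zag: zag at the child first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = zag(root.left)
        return root if root.left is None else zig(root)
    else:                                       # mirror image for the right subtree
        if root.right is None:
            return root
        if key > root.right.key:                # zag-zag
            root.right.right = splay(root.right.right, key)
            root = zag(root)
        elif key < root.right.key:              # zag-zig
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = zig(root.right)
        return root if root.right is None else zag(root)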
Still, node 10 is not the root node; node 10 is the left child of the root node. So we need to perform a right rotation on the root node, i.e., 14, to make node 10 the root, as shown below:
Now, we have to delete the element 14 from the tree, which is shown below.
As we know, we cannot simply delete an internal node. We replace the value of the node using either its in-order predecessor or its in-order successor. Suppose we use the in-order successor, in which we replace the value with the lowest value that exists in the right subtree. The lowest value in the right subtree of node 14 is 15, so we replace the value 14 with 15. Since the successor's node is now a leaf node, we can simply delete it, as shown below:
Top-down splaying:
In top-down splaying, we first perform splaying on the node on which the deletion is to be performed and then delete the node from the tree. Once the element is deleted, we perform the join operation. Let's understand top-down splaying through an example.
Suppose we want to delete 16 from the tree shown below:
Step 4: The first step is to find the maximum element in the left subtree. In the left subtree, the maximum element is 15, and then we need to perform the splaying operation on 15.
As we can observe in the above tree, the element 15 has a parent as well as a grandparent. The node is the right child of its parent, and the parent node is also the right child of its parent, so we need to perform two left rotations to make node 15 the root node, as shown below:
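This delete-then-join strategy can be sketched compactly in Python using the splay routine sketched earlier: splay the key to the root, detach it, then splay the maximum of the left subtree to the top so it has no right child and the right subtree can be joined.

def splay_delete(root, key):
    if root is None:
        return None
    root = splay(root, key)       # bring the target (or its neighbor) to the root
    if root.key != key:
        return root               # the key was not present
    left, right = root.left, root.right
    if left is None:
        return right              # no left subtree: the right subtree is the result
    left = splay(left, key)       # key exceeds everything in left, so its maximum
                                  # is splayed to the root and has no right child
    left.right = right            # join the two subtrees
    return left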
Hashing:
Hashing is a searching technique that takes constant time. The time complexity of hashing is O(1). Until now, we have seen two techniques for searching, i.e., linear search and binary search. The worst-case time complexity of linear search is O(n), and that of binary search is O(log n). In both searching techniques, the search time depends on the number of elements, but we want a technique that takes constant time. Hashing is the technique that provides constant time.
In the hashing technique, a hash table and a hash function are used. Using the hash function, we can calculate the address at which the value can be stored.
The main idea behind hashing is to create (key, value) pairs. Given a key, the algorithm computes the index at which the value is stored. It can be written as:
Hash(key) = index;
For example, suppose the key is "Sai" and the value is a phone number; when we pass the key to the hash function:
Hash("Sai") = 3;
the function returns the index, and the entry for "Sai" is stored at index 3.
Hash function:
A hash function is any function that can be used to map a data set of arbitrary size to a data set of fixed size, which falls into the hash table. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.
To achieve a good hashing mechanism, it is important to have a good hash function with the following basic requirements (a small example is sketched after this list):
1. Easy to compute: It should be easy to compute and must not become an algorithm in itself.
2. Uniform distribution: It should provide a uniform distribution across the hash table and should not result in clustering.
3. Less collision: Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided.
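As a small illustration of these requirements, here is a division-method hash (the helper name is ours): it is trivial to compute, and choosing a prime table size such as 11 helps spread the keys uniformly.

def hash_division(key, m=11):
    # Division method: map an integer key into an index in [0, m-1].
    return key % m

print([hash_division(k) for k in (123, 432, 213, 654)])   # [2, 3, 4, 5]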
Hash Collision:
When the hash function generates the same index for multiple keys, there is a conflict about which value should be stored at that index. This is called a hash collision.
We can resolve a hash collision using one of the following techniques:
o Collision resolution by chaining (a sketch follows this list)
o Open addressing: linear/quadratic probing and double hashing
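A minimal Python sketch of collision resolution by chaining (class and method names are illustrative): each slot of the table holds a list, and all keys that hash to the same index are appended to that slot's chain.

class ChainedHashTable:
    def __init__(self, m=10):
        self.m = m
        self.slots = [[] for _ in range(m)]    # one independent chain per slot

    def put(self, key, value):
        chain = self.slots[key % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:                       # key already present: update it
                chain[i] = (key, value)
                return
        chain.append((key, value))             # a collision simply extends the chain

    def get(self, key):
        for k, v in self.slots[key % self.m]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put(2, "a"); t.put(42, "b")                  # 2 and 42 both hash to index 2
print(t.slots[2])                              # [(2, 'a'), (42, 'b')]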
Linear Probing:
In linear probing, when a collision occurs, we sequentially search forward for the next free slot. With a table of size 20 and the hash function key % 20:

Sr.No.  Key  Hash           Array Index  After Linear Probing
1       1    1 % 20 = 1     1            1
2       2    2 % 20 = 2     2            2
3       42   42 % 20 = 2    2            3
4       4    4 % 20 = 4     4            4
5       12   12 % 20 = 12   12           12
6       14   14 % 20 = 14   14           14
7       17   17 % 20 = 17   17           17
8       13   13 % 20 = 13   13           13
9       37   37 % 20 = 17   17           18
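A sketch reproducing the table above: on a collision, linear probing steps forward one slot at a time until a free slot is found (the sketch assumes the table is never completely full).

def linear_probe_insert(table, key):
    m = len(table)
    index = key % m
    while table[index] is not None:           # slot occupied: try the next one
        index = (index + 1) % m
    table[index] = key
    return index

table = [None] * 20
for key in (1, 2, 42, 4, 12, 14, 17, 13, 37):
    print(key, "->", linear_probe_insert(table, key))
# 42 hashes to 2 (occupied) and lands at 3; 37 hashes to 17 and lands at 18.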
Quadratic Probing:
Quadratic probing is similar to linear probing; the only difference is the interval between successive probes or entry slots. Here, when the slot at a hashed index for an entry record is already occupied, you must keep probing until you find an unoccupied slot. The interval between slots is computed by adding successive values of an arbitrary polynomial in the original hashed index.
Let us assume that the hashed index for an entry is index and that the slot at index is already occupied. The probe sequence (taking index as the original hashed index) will be as follows:
(index + 1*1) % hashTableSize
(index + 2*2) % hashTableSize
(index + 3*3) % hashTableSize
...
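A sketch of this probe sequence in Python (note that, unlike linear probing, quadratic probing is not guaranteed to visit every slot, so the sketch gives up after m attempts):

def quadratic_probe_insert(table, key):
    m = len(table)
    for i in range(m):
        index = (key % m + i * i) % m      # i-th probe: (h(k) + i^2) % m
        if table[index] is None:
            table[index] = key
            return index
    return None                            # no free slot reached

table = [None] * 20
for key in (2, 42, 22):                    # all three hash to index 2
    print(key, "->", quadratic_probe_insert(table, key))
# 42 probes 2 (taken), then 3; 22 probes 2 and 3 (taken), then 6.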
The key values 3, 2, 9, 6 are stored at indexes 9, 7, 1, 5 respectively (using the hash function h1(k) = (2k+3) % 10). The calculated index for 11 is 5, which is already occupied by another key value, i.e., 6. If linear probing were applied, the nearest empty cell to index 5 would be index 6; therefore, the value 11 would be added at index 6.
Double Hashing:
Double hashing is similar to linear probing; the only difference is the interval between successive probes. Here, the interval between probes is computed by using two hash functions.
Let us say that the hashed index for an entry record is an index computed by one hash function, and the slot at that index is already occupied. You must start traversing in a specific probing sequence to look for an unoccupied slot. The probing sequence will be (h1(k) + i*h2(k)) % m for i = 0, 1, 2, ....
Double hashing is an open addressing technique used to avoid collisions. When a collision occurs, this technique uses a secondary hash of the key. It uses one hash value as an index and moves forward until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is one of the hash functions, used to calculate the location, whereas h2(k) is the other. The rule can be stated as "insert ki at the first free place from (u + v*i) % m, where i = 0 to m-1". In this case, u is the location computed using the first hash function, and v is equal to h2(k) % m.
For this example, the two hash functions are h1(k) = (2k+3) % 10 and h2(k) = (3k+1) % 10:

Key  h1(k) = (2k+3) % 10    h2(k) = (3k+1) % 10
13   ((2*13)+3) % 10 = 9    ((3*13)+1) % 10 = 0
7    ((2*7)+3) % 10 = 7     ((3*7)+1) % 10 = 2
12   ((2*12)+3) % 10 = 7    ((3*12)+1) % 10 = 7
As we know that no collision occurs while inserting the keys 3, 2, 9, 6, we do not apply double hashing to these key values.
On inserting the key 11 into the hash table, a collision occurs because the calculated index for 11 is 5, which is already occupied by another value. Therefore, we apply the double hashing technique to key 11. When the key value is 11, the value of v is h2(11) % 10 = ((3*11)+1) % 10 = 4.
Now, substituting the values of u and v in (u + v*i) % m:
When i=0: index = (5 + 4*0) % 10 = 5
When i=1: index = (5 + 4*1) % 10 = 9
When i=2: index = (5 + 4*2) % 10 = 3
Since location 3 is empty in the hash table, the key 11 is added at index 3.
The next element is 13. The calculated index for 13 is 9, which is already occupied by another key value. So we use the double hashing technique to find a free location. The value of v is 0.
Now, substituting the values of u and v in (u + v*i) % m:
When i=0: index = (9 + 0*0) % 10 = 9
We get index 9 in all iterations from 0 to m-1 because the value of v is zero. Therefore, we cannot insert 13 into the hash table. (This is why the secondary hash function should never evaluate to 0.)
The next element is 7. The calculated index for 7 is 7, which is already occupied by another key value. So we use the double hashing technique to find a free location. The value of v is 2.
Now, substituting the values of u and v in (u + v*i) % m:
When i=0: index = (7 + 2*0) % 10 = 7
When i=1: index = (7 + 2*1) % 10 = 9
When i=2: index = (7 + 2*2) % 10 = 1
When i=3: index = (7 + 2*3) % 10 = 3
When i=4: index = (7 + 2*4) % 10 = 5
When i=5: index = (7 + 2*5) % 10 = 7
When i=6: index = (7 + 2*6) % 10 = 9
When i=7: index = (7 + 2*7) % 10 = 1
When i=8: index = (7 + 2*8) % 10 = 3
When i=9: index = (7 + 2*9) % 10 = 5
We checked all the cases of i (from 0 to 9) but did not find a suitable place to insert 7. Therefore, key 7 cannot be inserted into the hash table.
The next element is 12. The calculated index for 12 is 7, which is already occupied by another key value. So we use the double hashing technique to find a free location. The value of v is 7.
Now, substituting the values of u and v in (u + v*i) % m:
When i=0: index = (7 + 7*0) % 10 = 7
When i=1: index = (7 + 7*1) % 10 = 4
Since location 4 is empty, the key 12 is inserted at index 4.
The final hash table would be:

Index:  0    1    2    3    4    5    6    7    8    9
Key:    -    9    -    11   12   6    -    2    -    3
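The whole walk can be sketched in Python with the example's two hash functions:

def double_hash_insert(table, key):
    m = len(table)
    u = (2 * key + 3) % m                  # primary index h1(k)
    v = (3 * key + 1) % m                  # probe step h2(k); if 0, probing cannot advance
    for i in range(m):
        index = (u + v * i) % m
        if table[index] is None:
            table[index] = key
            return index
    return None                            # e.g. keys 13 (v = 0) and 7 above

table = [None] * 10
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    print(key, "->", double_hash_insert(table, key))
# 11 lands at 3, 13 and 7 cannot be placed, 12 lands at 4, as worked out above.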
Applications:
Associative arrays: Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects).
Database indexing: Hash tables may also be used as disk-based data structures and database indices (such as in dbm).
Caches: Hash tables can be used to implement caches, i.e., auxiliary data tables that speed up access to data that is primarily stored in slower media.
Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby, use hash tables to implement objects.
Hash functions are also used in various algorithms to make their computation faster.
Example program:
# Hash table using the division method; the table size is the first prime >= 10.
def checkPrime(n):
    if n == 1 or n == 0:
        return 0
    for i in range(2, n // 2 + 1):          # bug fix: include n // 2 in the range
        if n % i == 0:
            return 0
    return 1

def getPrime(n):
    if n % 2 == 0:
        n = n + 1
    while not checkPrime(n):
        n += 2
    return n

capacity = getPrime(10)                     # 11; the table size must match the modulus
hashTable = [[] for _ in range(capacity)]   # independent empty slots

def hashFunction(key):
    return key % capacity

def insertData(key, data):
    index = hashFunction(key)
    hashTable[index] = [key, data]

def removeData(key):
    index = hashFunction(key)
    hashTable[index] = []

insertData(123, "apple")
insertData(432, "mango")
insertData(213, "banana")
insertData(654, "guava")
print(hashTable)
removeData(123)
print(hashTable)