DECAP770: Advanced Data Structures
Edited by: Balraj Kumar
CONTENT
Unit 1: Introduction
Ashwani Kumar, Lovely Professional University
Objectives
After studying this unit, you will be able to:
Introduction
The static representation of a linear ordered list using an array wastes resources and, in some
situations, causes overflows. We no longer want to pre-allocate memory to any linear list; instead,
we want to allocate memory to elements as they are added to the list. This necessitates dynamic
memory allocation.
Semantically, data can exist in either of two forms – atomic or structured. In most programming
problems, the data to be read, processed and written are often related to each other. Data items are
related in a variety of different ways. Whereas basic data types such as integers, characters, etc. can
be directly created and manipulated in a programming language, the responsibility of creating
structured data items remains with the programmers themselves. Accordingly, programming
languages provide mechanisms to create and manipulate structured data items.
A data structure is a way of organizing and storing data on a computer so that it can be accessed
and modified efficiently.
It is critical to choose the correct data structure based on your requirements and the needs of your
project. If you wish to store data sequentially in memory, for example, you can use the Array data
structure.
Linear data structure: The data elements are arranged sequentially, and only one data element can
directly be reached from a given element. Ex: Arrays, Stacks, Queues, Linked Lists.
Non-linear data structure: Every data item is attached to several other data items in a way that is
specific for reflecting relationships. The data items are not arranged in a sequential structure.
Ex: Trees, Graphs.
Trees: Trees are multilevel data structures with a hierarchical relationship among its elements
known as nodes.
Graphs: Graphs can be defined as the pictorial representation of a set of elements (represented by
vertices) connected by links known as edges.
Basic Terminology
Data: Data can be defined as an elementary value or the collection of values, for example, student's
name and its id are the data about the student.
Group Items: Data items which have subordinate data items are called Group item, for example,
name of a student can have first name and the last name.
Record: Record can be defined as the collection of various data items, for example, if we talk about
the student entity, then its name, address, course and marks can be grouped together to form the
record for the student.
Field: A field is a single elementary unit of information representing an attribute of an entity, for
example, the name or the roll number of a student.
File: A file is a collection of various records of one type of entity, for example, if there are 60
students in the class, then there will be 60 records in the related file where each record contains the
data about each student.
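These ideas map directly onto a programming language. As a small illustration (a minimal sketch; the member names below are ours and only illustrative), a student record can be described in C as a structure whose members are its fields:
#include <stdio.h>
/* a record: a collection of data items describing one student */
struct student {
    char name[50];   /* group item: could itself be split into first name and last name */
    int  roll_no;    /* field: a single elementary item */
    float marks;     /* field */
};
int main(void)
{
    struct student s = {"Asha Kumar", 12, 87.5f};   /* one record */
    printf("%s (%d) scored %.1f\n", s.name, s.roll_no, s.marks);
    return 0;
}
A file of students would then simply be a collection of such records, one per student.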
As shown above, Address1 is unstructured address data. In this form you cannot access individual
items from it. You can at best refer to the entire address at one time. While in the second form, i.e.,
Address2, you can access and manipulate individual fields of the address – House No., Street, PIN,
etc. Given hereunder are two instances of the address1 and address2 variables.
Precondition specifies any condition that may apply as a pre-requisite for the operation definition.
There are certain operations that can be carried out only if certain conditions are satisfied. For
example, in the case of the division operation the divisor should never be equal to zero. Only if this
condition is satisfied is the division operation carried out. Hence, this becomes a precondition. In
that case & (ampersand) should be mentioned in the operation definition.
Postcondition specifies what the operation does. One can say that it specifies the state after the
operation is performed. In the addition operation, the postcondition will give the addition of the
two integers.
Component of ADT
As an example, let us consider the representation of integer data type as an ADT. We will consider
only two operations addition and division.
Value Definition
1. Definition clause: The values must be in between the minimum and maximum values
specified for the particular computer.
2. Condition clause: Values should not include decimal point.
Operations
1. add (a, b)
Function: add the two integers a and b.
Precondition: no precondition.
Postcondition: output = a + b
2. Div (a, b)
Function: Divide a by b.
Precondition: b != 0
Postcondition: output = a/b.
There are two ways of implementing a data structure viz. static and dynamic. In static
implementation, the memory is allocated at the compile time. If there are more elements than the
specified memory then the program crashes. In dynamic implementation, the memory is allocated
as and when required during run time.
Any type of data structure will have certain basic operations to be performed on its data like insert,
delete, modify, sort, search etc. depending on the requirement. These are the entities that decide the
representation of data and distinguish data structures from each other.
Let us see why user defined data structures are essential. Consider a problem where we need to
create a list of elements. Any new element added to the list must be added at the end of the list and
whenever an element is retrieved, it should be the last element of the list. One can compare this to a
pile of plates kept on a table. Whenever one needs a plate, the last one on the pile is taken and if a
plate is to be added on the pile, it will be kept on the top. The description wants us to implement a
stack. Let us try to solve this problem using arrays.
We will have to keep track of the index of the last element entered in the list. Initially, it will be set
to –1. Whenever we insert an element into the list, we will increment the index and insert the value
into the new index position. To remove an element, the value of current index will be the output
and the index will be decremented by one. In the above representation, we have satisfied the
insertion and deletion conditions.
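A rough sketch of this index-tracking idea in C (a minimal illustration; the array size and the names list, top, add and removeLast are ours, not taken from the discussion above):
#include <stdio.h>
#define MAX 100
int list[MAX];      /* storage for the list */
int top = -1;       /* index of the last element entered; -1 means the list is empty */
void add(int value)           /* insert at the end of the list */
{
    if (top < MAX - 1)
        list[++top] = value;  /* increment the index, then store the value there */
}
int removeLast(void)          /* remove and return the last element */
{
    if (top == -1)
        return -1;            /* nothing to remove (sentinel value) */
    return list[top--];       /* return the value at the current index, then decrement */
}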
Using arrays we could handle our data properly, but arrays allow access to other values in
addition to the topmost one. We can insert an element at the end of the list but there is no way to
ensure that insertion will be done only at the end. This is because an array as a data structure allows
access to any of its values. At this point we can think of another representation: a list of elements
where one can add at the end, remove from the end, and elements other than the top one are not
accessible. As already discussed, this data structure is called a STACK. The insertion operation is
known as push and removal as pop. You can try to write an ADT for stacks.
Another situation where we would like to create a data structure is while working with complex
numbers. The operations add, subtract, divide and multiply will have to be created as per the
rules of complex numbers. An ADT for complex numbers is sketched below. Only the addition and
multiplication operations are considered here; you can try to write the remaining operations.
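A minimal sketch of such an ADT in C, covering only addition and multiplication (the type and function names are ours and only illustrative):
/* value definition: a complex number has a real part and an imaginary part */
typedef struct {
    double re;
    double im;
} Complex;
/* add(a, b): precondition - none; postcondition - result = a + b */
Complex add(Complex a, Complex b)
{
    Complex c = { a.re + b.re, a.im + b.im };
    return c;
}
/* mul(a, b): precondition - none; postcondition - result = a * b,
   computed as (ac - bd) + (ad + bc)i */
Complex mul(Complex a, Complex b)
{
    Complex c = { a.re * b.re - a.im * b.im,
                  a.re * b.im + a.im * b.re };
    return c;
}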
Abstract Data Type (ADT)
1. A framework for an object interface.
2. What kind of data it would be made of (with no implementation details).
3. What kind of messages it would receive and what kind of action it will perform when properly
triggered.
The cell is the basic building block of data structures. We can picture a cell as a box that is capable
of holding a value drawn from some basic or composite data type. Data structures are created by
giving names to aggregates of cells and (optionally) interpreting the values of some cells as
representing relationships or connections (e.g., pointers) among cells.
1.4 Algorithm
An algorithm is a set of rules or instructions that define, step by step, how a task is to be executed
in order to get the expected results. It is a systematic procedure that produces, in a finite number of
steps, the answer to a question or the solution of a problem.
Computer algorithms work via input and output. They take the input and apply each step of the
algorithm to that information to generate an output.
E.g. a search engine is an algorithm that takes a search query as an input and searches its database
for items relevant to the words in the query. It then outputs the results.
Financial companies use algorithms in areas such as loan pricing, stock trading, asset-liability
management, and many automated functions. For example, algorithmic trading, known as algo
trading, is used for deciding the timing, pricing, and quantity of stock orders. Also referred to as
automated trading or black-box trading, algo trading uses computer programs to buy or sell
securities at a pace not possible for humans.
Computer algorithms make life easier by trimming the time it takes to manually do things. In the
world of automation, algorithms allow workers to be more proficient and focused. Algorithms
make slow processes more proficient. In many cases, especially in automation, algos can save
companies money.
Example:
Step 1: Start
Step 2: Declare variables num1, num2 and sum.
Step 3: Read values num1 and num2.
Step 4: Add num1 and num2 and assign the result to sum.
sum = num1 + num2
Step 5: Display sum
Step 6: Stop
The commonly used asymptotic notations are:
Big-O notation
Omega notation
Theta notation
Big-O notation represents the upper bound of the running time of an algorithm. It gives the worst-
case complexity of an algorithm.
Big-O notation is useful when we only have an upper bound on the time complexity of an algorithm.
It is widely used to analyse an algorithm, as we are usually interested in the worst-case scenario.
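Formally, f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) <= c * g(n) for all n >= n0. As a small worked example, 3n + 5 is O(n), since 3n + 5 <= 4n for all n >= 5.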
Summary
Data structure is a method or technique of data organization, management, and storage format
in the computer so that we can perform operations on the stored data more efficiently.
Data structure is a combination of one or more basic data types to form a single addressable
data type.
An algorithm is a finite set of instructions which, when followed, accomplishes a particular
task, the termination of which is guaranteed under all cases, i.e. the termination is
guaranteed for every input.
The instructions must be unambiguous and the algorithm must produce the output within a
finite number of executions of its instructions.
Abstract data type (ADT) is a mathematical model with a collection of operations defined on
that model. Although the terms ‘data type’, ‘data structure’ and ‘abstract data type’ sound
alike, they have different meanings.
Keywords
Data: Data can be defined as an elementary value or the collection of values, for example,
student's name and its id are the data about the student.
Group Items: Data items which have subordinate data items are called Group item, for
example, name of a student can have first name and the last name.
Linear Data Structure: A linear data structure traverses the data elements sequentially, in
which only one data element can directly be reached.
Non-linear Data Structure: Every data item is attached to several other data items in a way
that is specific for reflecting relationships. The data items are not arranged in a sequential
structure.
Searching: Finding the location of the record with a given key value, or finding the
locations of all records which satisfy one or more conditions.
Traversing: Accessing each record exactly once so that certain items in the record may
be processed.
Self Assessment
1. Which is a type of data structure?
A. Primitive
B. Non-primitive
C. Both primitive and non-primitive
D. None of above
A. Trees
B. Arrays
C. Graphs
D. None of these
A. Array
B. Linked lists
C. Stacks
D. None of these
A. Primitive
B. Identifier
C. Non-primitive
D. None of these
A. FIFO
B. LIFO
C. Push
D. None of the Above
7. A procedure for solving a problem in terms of actions and their order is called
A. Program instruction
B. Algorithm
C. Process
D. Template
A. Pseudocode
B. Flowchart
C. None of the above
D. Both Pseudocode and Flowchart
A. Space Complexity
B. Time Complexity
C. Both space and time complexity
D. None of these
A. Big-O notation
B. Omega notation
C. Theta notation
D. All of above
A. Space complexity
B. Upper bound of the running time of an algorithm
C. Lower bound of the running time of an algorithm
D. None of above
A. Reflexive
B. Symmetric
C. Transpose Symmetric
D. All of these
A. Value definition
B. Operation definition
C. Both value and operation definition
D. None of above.
Answers for Self Assessment
6. D 7. B 8. D 9. D 10. C
Review Questions
1. Define data structure and its application.
2. What are the advantages of data structure?
3. Discuss abstract data type.
4. What is the significance of space and time complexity in an algorithm?
5. Explain different types of algorithms.
6. Discuss Asymptotic notations with example.
7. Define record and file.
Further Readings
Data Structures and Algorithms, Shi-Kuo Chang, World Scientific.
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Mark Allen Weiss, Data Structures & Algorithm Analysis in C, Second Edition, Addison-Wesley
Publishing.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest, Introduction to Algorithms,
Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Unit 02: Arrays vs Linked Lists
Ashwani Kumar, Lovely Professional University
Objectives
After studying this unit, you will be able to:
• Learn basic concepts of arrays
• Understand the basics of linked list
• Describe the types of array operations
• Discuss the operations of linked lists
Introduction
A data structure consists of a group of data elements bound by the same set of rules. The data
elements also known as members are of different types and lengths. We can manipulate data stored
in the memory with the help of data structures. The study of data structures involves examining the
merging of simple structures to form composite structures and accessing definite components from
composite structures. An array is an example of one such composite data structure that is derived
from a primitive data structure.
An array is a set of similar data elements grouped together. Arrays can be one-dimensional or
multidimensional. Arrays store the entries sequentially. Elements in an array are stored in
continuous locations and are identified using the location of the first element of the array.
2.1 Arrays
An array is a data type, much like a variable as both array and variable hold information. However,
unlike a variable, an array can hold several pieces of data called elements. Arrays can hold any type
of data, which includes string, integers, Boolean, and so on. An array can also handle other
variables as well as other arrays. An integer index identifies the individual elements of an array.
Arrays are allocated memory in a strictly contiguous fashion. The simplest array is the one-
dimensional array, which is a list of variables of the same data type. An array of one-dimensional
arrays is called a two-dimensional array.
Initializing an Array
We can initialize an array by assigning values to the elements during declaration. We can access the
element by specifying its index. While initializing an array, the initial values are given sequentially
separated by commas and enclosed in braces.
Example:
Consider the elements 10, 20, 30, and 40. The array can be represented as:
a[4]={10, 20, 30, 40}
The elements can be stored in an array as shown below:
a[0] = 10
a[1] = 20
a[2] = 30
a[3] = 40
The element 20 can be accessed by referencing a[1].
Now, consider n number of elements in an array. Hence, to access any element
within the array, we use a[i], where i is the value between 0 to n-1.
The corresponding code used in C language to read n number of integers in an
array is:
for(i= 0; i<n; i++)
{
scanf(“%d”,&a[i]);
}
Example:
int value = 10;
Here, the value 10 is called an initializer.
Similar to a variable, we can initialize an array at the time of its declaration. The following example
shows an array initialization.
Example:
int a[5] = {10, 11, 12, 13, 14};
In this declaration, a[0] is initialized to 10, a[1] is initialized to 11, and so on. There must be at least
one
initial value between braces. If the number of initialized array elements is lesser than the declared
size,
then the remaining array elements are assigned the value 0.
If we provide all the array elements during initialization, it is not necessary to specify the array size.
Thecompiler automatically counts the number of elements and reserves the space in the memory
for thearray.
Example:
int a[] = {10, 20, 30, 40};
Here the compiler reserves four spaces for array a.
Linear Array
A linear or one-dimensional array is a structured collection of elements (often called array
elements). It can be accessed individually by specifying the position of each element by an index
value.
Example: If we want to store a set of five numbers by an array variable number. Then it will be
accomplished in the following way:
int number [5];
This declaration will reserve five contiguous memory locations capable of storing an integer type
value each, as shown below:
Now let us see how individual elements of a linear array are accessed. The syntax for accessing an
array component is:
ArrayName[IndexExpression]
The IndexExpression must be an integer value. The integer value can be of char, short int, long int,
or Boolean type because these are integral data types. The simplest form of index expression is a
constant.
Example:
If we consider an array number[25], then,
number[0] specifies the 1st component of the array
number[1] specifies the 2nd component of the array
number[2] specifies the 3rd component of the array
number[3] specifies the 4th component of the array
number[4] specifies the 5th component of the array
.
.
.
number[23] specifies the 24th component of the array
number[24] specifies the 25th component of the array
To store and print values from the number array, we can perform the following:
for(int i=0; i< 25; i++)
{
number[i]=i; // Storing a number in each array element
printf("%d", number[i]); //Printing the value
}
Multidimensional Array
Multidimensional arrays are also known as "arrays of arrays." Programming languages often need
to store and manipulate two or more dimensional data structures such as, matrices, tables, and so
on. When programming languages use two subscripts they are known as two-dimensional arrays.
One subscript denotes a row and the other denotes a column.
The declaration of two-dimension array is as follows:
data_type array_name[row_size][column_size];
Example:
int m[5][10];
Here, m is declared as a two-dimensional array having 5 rows (numbered from 0 to 4) and 10
columns (numbered from 0 to 9). The first element of the array is m[0][0] and the last-row, last-
column element is m[4][9].
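As a small illustration (a minimal sketch; the array contents are arbitrary), the elements of such a two-dimensional array can be stored and printed with two nested loops:
#include <stdio.h>
int main(void)
{
    int m[2][3] = { {1, 2, 3}, {4, 5, 6} };   /* 2 rows, 3 columns */
    int i, j;
    for (i = 0; i < 2; i++)        /* one pass per row */
    {
        for (j = 0; j < 3; j++)    /* visit every column of the current row */
            printf("%d ", m[i][j]);
        printf("\n");
    }
    return 0;
}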
Now let us discuss a three-dimensional array. A three-dimensional array is considered as an array
of two-dimensional arrays.
Example:
A three dimensional array is created as follows:
Example:
Suppose we want to assign a value 312 to the element at position 3 down, 7 across, and 2 in, then
we write it as:
bigArray [2][6][1] = 312;
Example:
int table[2][3]={0,0,0,1,1,1};
The table array initializes the elements of the first row to 0 and the second row to 1. The
initialization is done row by row. The above statement can be equivalently written as:
int table[2][3]={{0,0,0},{1,1,1}};
Three or four-dimensional arrays are more complicated. They can also be initialized by declaring a
list of initial values enclosed in braces.
Example:
int table[3][3][3] = {1, 2, 3, 4, 5, 6, 7, 8, ..., 27};
This will create an array named table containing 27 integers. We can access any element of this
array by using 3 indices.
The element table[1][1][1] can be accessed using the statement: printf("%d", table[1][1][1]);
The values for the array table[3][3][3] are as follows:
{1, 2, 3}
{4, 5, 6}
{7, 8, 9}
{10, 11, 12}
{13, 14, 15}
{16, 17, 18}
{19, 20, 21}
{22, 23, 24}
{25, 26, 27}
The values in the array can be accessed using three for loops. The loop contains three variables i, j,
and k respectively. This is as shown below:
for(i=0; i<3; i++)
{
    for(j=0; j<3; j++)
    {
        for(k=0; k<3; k++)
        {
            printf("%d\t", table[i][j][k]);
        }
        printf("\n");
    }
}
For every iteration of the i, j and k loops, the values printed are:
[0][0][0] = 1
[0][0][1] =2
[0][0][2] =3
[1][1][1] =14
Adding Operation
Adding elements into an array is known as insertion. The insertion of data elements is done at the
end of an array. This is possible only if there is enough space in the array to add the additional
elements. The elements can also be inserted in the middle of the array. In that case, on average half
of the array elements are moved to the next location to make room for the new element.
Example:
#include<stdio.h>
#include<conio.h>
void main()
{
int n, i, data, po_indx, a[50]; //Variable declaration
clrscr();
printf("Enter number of elements in the array\n");
/*Get the number of elements to be added to the array from the user*/
scanf("%d", &n);
printf("\nEnter %d elements\n\n", n); //Print the number of elements
for(i=0;i<n;i++) //Iterations using for loop
scanf("%d",&a[i]); //Accepting the values in the array
printf("\nEnter a data to be inserted\n");
scanf("%d",&data); //Reads the data added by user
printf("\nEnter the position of the item \n");
scanf("%d",&po_indx); //Reads the position where the data is inserted
/* Checking if the position is greater than the size of the array*/
if(po_indx-1>n)
printf("\nposition not valid\n"); //If the condition is true this will be printed
else //If the condition is false the ‘else’ part will get executed
{
for(i=n;i>=po_indx;i--) //Iterations using for loop
a[i]=a[i-1]; //Value of a[i-1] is assigned to a[i]
/*Value of data will be assigned to [po_indx-1] position*/
a[po_indx-1]=data;
n=n+1; //Incrementing the value of n
printf("\nArray after insertion\n"); //Print the array list after insertion
for(i=0;i<n;i++) //Use for loop and
printf("%d\t",a[i]); //Print the final array after insertion
}
getch(); //Waits until a key is pressed
}
Output:
Enter number of elements in the array
5
Enter 5 elements
15 20 32 45 62
Enter a data to be inserted
77
Enter the position of the item
2
Array after insertion
15 77 20 32 45 62
In this example:
1. First, the header files are included using #include directive.
Sorting Operation
Sorting operation arranges the elements of a list in a certain order. Efficient sorting is important for
optimizing the use of other algorithms that require sorted lists to work correctly.
Sorting an array efficiently is quite complicated. There are different sorting algorithms to perform
the task of sorting, but here we will discuss only Bubble Sort.
Bubble Sort
Bubble sort is a simple sorting technique when compared to other sorting techniques. The bubble
sort algorithm starts from the very first element of the data set. In order to sort elements in the
ascending order, the algorithm compares the first two elements of the data set. If the first element is
greater than the second, then the numbers are swapped.
This process is carried out for each pair of adjacent elements until the end of the data set is reached,
and passes are repeated until no swaps occur on a pass. This algorithm's average and worst-case
performance is O(n²), hence it is rarely used to sort large, unordered data sets.
Bubble sort can always be used to sort a small number of items where efficiency is not a high
priority. Bubble sort may also be effectively used to sort a partially sorted list.
Example:
#include <stdio.h>
#include <conio.h>
int A[8] = {55, 22, 2, 43, 12, 8, 32, 15}; //Declaring the array with 8 elements
int N = 8; //Size of the array
void BUBBLE (void); //BUBBLE Function declaration
void main()
{
int i; //Variable declaration
clrscr();
/*Printing the values in the array*/
printf("\n\nValues present in array A =");
for (i=0; i<8; i++) //Iterations using for loop
printf(" %d, ", A[i]); //Printing the array
BUBBLE(); //BUBBLE function is called
/*Printing the values from the array after sorting*/
printf("\n\nValues present in the array after sorting =");
for (i=0; i<8; i++) //Iterations
printf(" %d, ", A[i]); // Printing the array after sorting
getch(); // waits for a key to be pressed
}
void BUBBLE(void) //BUBBLE Function definition
{
int K, PTR, TEMP; //Declaration variables
for(K=0; K <= (N-2); K++) //Iterations
{
PTR = 0; //Assign 0 to variable PTR
while(PTR <= (N-K-1-1)) //Checking if PTR <= (N-K-1-1)
{
/* Checking if the element at A[PTR] is greater than A[PTR+1]*/
if(A[PTR] > A[PTR+1])
{
TEMP = A[PTR];
A[PTR] = A[PTR+1];
A[ PTR +1] = TEMP;
}
/*Increment the array index*/
PTR = PTR+1;
}
}
}
Output:
Values present in A[8] = 55, 22, 2, 43, 12, 8, 32, 15
Values present in A[8] after sorting = 2, 8, 12, 15, 22, 32, 43, 55
In this example:
1. First, the header files are included using #include directive.
2. Then, the array A is declared globally along with the array elements and the size.
3. Then, inside the main function the variable i is declared.
4. The values in the array are printed using a for loop.
5. Next, the Bubble function is called. The sorting operation is carried out and values present in the
array are printed.
6. getch() prompts the user to press a key. Then the program terminates.
7. In the BUBBLE function the variables K, PTR and TEMP are declared as integers.
8. PTR is set to 0.
9. Within the while loop the adjacent array elements are compared. If the element at a lower
position is greater than the element at the next position, both the elements are interchanged.
10. The array index is then incremented.
Searching Operation
Searching is an operation used for finding an item with specified properties among a collection of
items. In a database, the items are stored individually as records, or as elements of a search space
addressed by a mathematical formula or procedure. The mathematical formula or procedure may
be the root of an equation containing integer variables.
Search operation is closely related to the concept of dictionaries. Dictionaries are a type of data
structure that support operations such as, search, insert, and delete.
Computer systems are used to store large amounts of data, from which individual records are
retrieved based on some search criterion. Efficient storage of data is an important issue in
facilitating fast searching.
There are many different searching techniques or algorithms. The selection of algorithm depends
on the way the information is organized in memory. Now, we will discuss linear searching
technique.
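Linear search examines the elements one by one, from the first to the last, until the required item is found or the list is exhausted. A minimal sketch in C (the function name and test values are illustrative):
#include <stdio.h>
/* returns the index of key in a[0..n-1], or -1 if it is not present */
int linear_search(int a[], int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == key)
            return i;      /* found at position i */
    return -1;             /* key not present in the array */
}
int main(void)
{
    int a[] = {15, 20, 32, 45, 62};
    printf("%d\n", linear_search(a, 5, 32));   /* prints 2 */
    printf("%d\n", linear_search(a, 5, 7));    /* prints -1 */
    return 0;
}
In the worst case the loop examines all n elements, so linear search is O(n).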
Traversing Operation
Traversing an array refers to accessing each element of the array exactly once so that it can be
processed. To traverse an array, one can use a for loop. The array elements are accessed using an array
index or a pointer of type similar to that of array elements. To access the elements using a pointer,
the pointer must be initialized with the base address of the array. Traversing operation also
involves printing the elements in an array.
Example:
#include<stdio.h>
#include<conio.h>
#define SIZE 20 //Define array size
void main()
{
float sum(float[], int); //Function declaration
float x[SIZE], Sum_total=0.0;
int i, n; //Variable declaration
clrscr();
printf("Enter the number of elements in array\n");
scanf(" %d", &n); //Reads the data added by user
printf("Enter %d elements:\n", n); //Printing the values in the array
for(i=0; i<n; i++) //Iterations using for loop
/* Input the elements of the array (Traverse operation)*/
scanf(" %f", &x[i]);
printf("The elements of array are:\n\n"); //Printing the elements of the array
for(i=0; i<n; i++) //Iterations using for loop
/*print the elements of array in floating point form(Traverse operation)*/
printf(" %.2f\t", x[i]);
/*Call the function sum and store the value returned in Sum_total*/
Sum_total = sum(x, n);
/*Printing the sum*/
printf("\n\nSum of the given array is: %.2f\n", Sum_total);
getch(); //wait until a key is pressed
}
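The program above calls a helper function sum(); a definition consistent with the declaration float sum(float[], int) used in main (a minimal sketch that simply totals the first n elements) is:
float sum(float x[], int n)
{
    int i;
    float total = 0.0f;
    for (i = 0; i < n; i++)     /* traverse the array, accumulating each element */
        total = total + x[i];
    return total;
}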
Insertion or deletion at any arbitrary position in an array is a costly operation, because this involves
the movement of some of the existing elements.
When we want to represent several lists by using arrays of varying size, either we have to represent
each list using a separate array of maximum size or we have to represent each of the lists using one
single array. The first one will lead to wastage of storage, and the second will involve a lot of data
movement.
So we have to use an alternative representation to overcome these disadvantages. One alternative is
a linked representation. In a linked representation, it is not necessary that the elements be at a fixed
distance apart. Instead, we can place elements anywhere in memory, but to make it a part of the
same list, an element is required to be linked with a previous element of the list. This can be done
by storing the address of the next element in the previous element itself. This requires that every
element be capable of holding the data as well as the address of the next element. Thus every
element must be a structure with a minimum of two fields, one for holding the data value, which
we call a data field, and the other for holding the address of the next element, which we call link
field.
Therefore, a linked list is a list of elements in which the elements of the list can be placed anywhere
in memory, and these elements are linked with each other using an explicit link field, that is, by
storing the address of the next element in the link field of the previous element.
This program uses a strategy of inserting a node in an existing list to get the list created. An insert
function is used for this. The insert function takes a pointer to an existing list as the first parameter,
and a data value with which the new node is to be created as a second parameter, creates a new
node by using the data value, appends it to the end of the list, and returns a pointer to the first node
of the list. Initially the list is empty, so the pointer to the starting node is NULL.
Therefore, when insert is called first time, the new node created by the insert becomes the start
node. Subsequently, the insert traverses the list to get the pointer to the last node of the existing list,
and puts the address of the newly created node in the link field of the last node, thereby appending
the new node to the existing list. The main function reads the number of nodes in the list and calls
insert that many times, going through a while loop, to create a list with the specified number of
nodes.
Example:
Program:
# include <stdio.h>
# include <stdlib.h>
struct node
{
int data;
struct node *link;
};
struct node *insert(struct node *p, int n)
{
struct node *temp;
/* if the existing list is empty then insert a new node as the starting node */
if(p == NULL)
{
p = (struct node *)malloc(sizeof(struct node)); /* creates a new node using the data value passed as parameter */
if(p == NULL)
{
printf("Error\n");
exit(0);
}
p->data = n;
p->link = p; /* makes the pointer point to itself because it is a circular list */
}
else
{
temp = p;
/* traverses the existing list to get the pointer to its last node */
while (temp->link != p)
temp = temp->link;
/* creates a new node using the data value passed as parameter and puts its address
in the link field of the last node of the existing list */
temp->link = (struct node *)malloc(sizeof(struct node));
if(temp->link == NULL)
{
printf("Error\n");
exit(0);
}
temp = temp->link;
temp->data = n;
temp->link = p;
}
return (p);
}
void printlist( struct node *p )
{
struct node *temp;
temp = p;
printf("The data values in the list are\n");
if(p != NULL)
{
do
{
printf("%d\t", temp->data);
temp = temp->link;
} while (temp != p);
}
else
printf("The list is empty\n");
}
void main()
{
int n;
int x;
struct node *start = NULL;
printf("Enter the nodes to be created \n");
scanf("%d", &n);
while (n-- > 0)
{
printf("Enter the data values to be placed in a node\n");
scanf("%d", &x);
start = insert(start, x);
}
printf("The created list is\n");
printlist(start);
}
To delete a node, we first locate it. Then the link field of the node that appears before the node to be
deleted is made to point to the node that appears after the node to be deleted, and the node to be
deleted is freed.
Lab Exercise:
# include <stdio.h>
# include <stdlib.h>
struct node *delet( struct node *, int );
int length ( struct node * );
struct node
{
int data;
struct node *link;
};
struct node *insert(struct node *p, int n)
{
struct node *temp;
if(p==NULL)
{
p=(struct node *)malloc(sizeof(struct node));
if(p==NULL)
{
printf("Error\n");
exit(0);
}
p-> data = n;
p-> link = NULL;
}
else
{
temp = p;
while (temp->link != NULL)
temp = temp->link;
temp-> link = (struct node *)malloc(sizeof(struct node));
if(temp -> link == NULL)
{
printf("Error\n");
exit(0);
}
temp = temp->link;
temp-> data = n;
temp-> link = NULL;
}
return (p);
}
void printlist( struct node *p )
{
printf("The data values in the list are\n");
while (p != NULL)
{
printf("%d\t", p->data);
p = p->link;
}
}
void main()
{
int n;
int x;
struct node *start = NULL;
printf("Enter the nodes to be created \n");
scanf("%d", &n);
while (n-- > 0)
{
printf("Enter the data values to be placed in a node\n");
scanf("%d", &x);
start = insert(start, x);
}
printf("The list before deletion is\n");
printlist(start);
printf("\nEnter the node no\n");
scanf("%d", &n);
start = delet(start, n);
printf("The list after deletion is\n");
printlist(start);
}
/* a function to delete the specified node*/
struct node *delet( struct node *p, int node_no )
{
struct node *prev, *curr ;
int i;
if (p == NULL )
{
printf("There is no node to be deleted \n");
}
else
{
if ( node_no> length (p))
{
printf("Error\n");
}
else
{
prev = NULL;
curr = p;
i=1;
while ( i<node_no )
{
prev = curr;
curr = curr->link;
i = i+1;
}
if ( prev == NULL )
{
p = curr ->link;
free ( curr );
}
else
{
prev -> link = curr ->link ;
free ( curr );
}
}
}
return(p);
}
/* a function to compute the length of a linked list */
int length ( struct node *p )
{
int count = 0 ;
while ( p != NULL )
{
count++;
p = p->link;
}
return ( count ) ;
}
To reverse a list, we maintain a pointer each to the previous and the next node; then we make the
link field of the current node point to the previous node, make the previous equal to the current, and
the current equal to the next.
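A minimal sketch of this reversal for the singly linked list used above (assuming the same struct node with a data field and a link field):
struct node *reverse(struct node *p)
{
    struct node *prev = NULL;   /* node before the current one */
    struct node *curr = p;      /* current node */
    struct node *next;          /* node after the current one */
    while (curr != NULL)
    {
        next = curr->link;      /* remember the rest of the list */
        curr->link = prev;      /* make the current node point to the previous one */
        prev = curr;            /* previous becomes current */
        curr = next;            /* current becomes next */
    }
    return prev;                /* prev now points to the new first node */
}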
Array: Memory is allocated during compile time (static memory allocation). Linked List: Memory is
allocated during run time (dynamic memory allocation).
Array: The size of the array must be specified at the time of array declaration/initialization. Linked
List: The size of a linked list grows/shrinks as and when new elements are inserted/deleted.
Summary
An array is a set of similar data elements grouped together. Arrays can be one-dimensional
or multidimensional.
A linear or one-dimensional array is a structured collection of elements (often called as
array elements) that are accessed individually by specifying the position of each element
with a single index value.
Multidimensional arrays are nothing but "arrays of arrays". Two subscripts are used to
refer to the elements.
The operations that are performed on an array are adding, sorting, searching, and
traversing.
Traversing an array refers to accessing each element in the array exactly once so that it can
be processed.
Linked list is a technique of dynamically implementing a list using pointers. A linked list
contains two fields namely, data field and link field.
A singly-linked list consists of only one pointer to point to another node and the last node
always points to NULL to indicate the end of the list.
A doubly-linked list consists of two pointers, one to point to the next node and the other to
point to the previous node.
In a circular singly-linked list, the last node always points to the first node to indicate the
circular nature of the list.
A circular doubly-linked list consists of two pointers for forward and backward traversal
and the last node points to the first node.
Searching operation involves searching for a specific element in the list using an associated
key.
Insertion operation involves inserting a node at the beginning or end of a list.
Deletion operation involves deleting a node at the beginning or following a given node or
at the end of a list.
Keywords
Non-linear Data Structure: Every data item is attached to several other data items in a way
that is specific for reflecting relationships. The data items are not arranged in a sequential structure.
Searching: Finding the location of the record with a given key value, or finding the locations of all
records which satisfy one or more conditions.
Traversing: Accessing each record exactly once so that certain items in the record may
be processed.
Circular Linked List: A linear linked list in which the last element points to the first element,
thus forming a circle.
Doubly Linked List: A linear linked list in which each element is connected to the two
nearest elements through pointers.
Self-Assessment
1. Which one is correct statement?
A. True
B. False
A. Sorting
B. Merging
C. Traversing
D. All of above
A. 1
B. 2
C. 5
D. 3
A. Random
B. Sequential
C. Both random and sequential
D. None of above
A. True
B. False
A. 1
B. 2
C. 3
D. 4
A. 1
B. 2
C. -1
D. 0
A. Data field
B. Link field
C. Both data and link field
D. None of above
14. What are the shortcomings of array?
A. Memory allocation
B. Memory efficiency
C. Execution time
D. All of above
A. Single
B. Double
C. Circular
D. All of above
A. Insertion
B. Deletion
C. Search
D. All of above
A. Static
B. Heap
C. Dynamic
D. Compile time
Answers for Self Assessment
6. B 7. D 8. A 9. B 10. D
Review Questions
1. Define array and its types.
2. Give an example of multidimensional array.
3. Discuss any two types of array initialization methods with example.
4. Discuss different sorting methods.
5. Write a program to sort the elements of a linked list.
6. Differentiate between array and linked list with suitable example.
7. Discuss the different operations performed on a linked list.
8. Discuss advantages of linked list as compared to arrays.
Further Readings
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Kruse, Data Structures & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structures & Algorithm Analysis in C, Second Edition, Addison-Wesley
Publishing.
Sorenson and Tremblay, An Introduction to Data Structures with Algorithms.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest, Introduction to Algorithms,
Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.geeksforgeeks.org/data-structures/
https://www.programiz.com/dsa/data-structure-types
Objectives
After studying this unit, you will be able to:
• Learn fundamentals of stacks
• Explain the basic operations of stack
• Explain the implementation and applications of stacks
Introduction
Stacks are simple data structures and an important tool in programming languages. Stacks are linear
lists which have restrictions on the insertion and deletion operations. These are special cases of
ordered lists in which insertion and deletion are done only at one end.
The basic operations performed on stack are push and pop. Stack implementation can be done in
two ways - static implementation or dynamic implementation. Stack can be represented in the
memory using a one-dimensional array or a singly linked list.
Stack is another linear data structure having a very interesting property. Unlike arrays and linked
lists, an element can be inserted and deleted not at any arbitrary position but only at one end. Thus,
one end of a stack is sealed for insertion and deletion while the other end allows both the
operations.
A stack is said to possess LIFO (Last In First Out) property. A data structure has LIFO property if
the element that can be retrieved first is the one that was inserted last.
Push Operation
The procedure to insert a new element to the stack is called push operation. The push operation
adds an element on the top of the stack. ‘Top’ refers to the element on the top of stack. Push makes
the ‘Top’ point to the recently added element on the stack. After every push operation, the ’Top’ is
incremented by one. When the array is full, the status of stack is FULL and the condition is called
stack overflow. No element can be inserted when the stack is full.
Pop Operation
The procedure to delete an element from the top of the stack is called pop operation. After every
pop operation, the ‘Top’ is decremented by one. When there is no element in the stack, the status of
the stack is called empty stack or stack underflow. The pop operation cannot be performed when it
is in stack underflow condition.
Array-based Implementation
A stack is a sequence of data elements. To implement a stack structure, an array can be used as it is
a storage structure. Each element of the stack occupies one array element. Static implementation of
stack can be achieved using arrays. The size of the array, once declared, cannot be changed during
the program execution. Memory is allocated according to the array size. The memory requirement
is determined before the compilation. The compiler provides the required memory. This is suitable
when the exact number of elements is known. The static allocation is an inefficient memory
allocation technique because if fewer elements are stored than declared, the memory is wasted and
if more elements need to be stored than declared, the array cannot expand. In both the cases, there
is inefficient use of memory.
The following pseudo-code shows the array-based implementation of a stack. In this, the elements
of the stack are of type T.
struct stk
{ T array[max_size];
/* max_size is the maximum size */
int top = -1;
/* stack top initially given value -1 */
} stack;
void push(T e)
/*inserts an element e into the stack s*/
{
if (stack.top == max_size - 1)
printf("Stack is full - insertion not possible");
else
{
stack.top = stack.top + 1;
stack.array[stack.top] = e;
}
}
T pop()
/*Returns the top element from the stack */
{
T x;
if(stack.top == -1)
printf("Stack is empty");
else
{
x = stack.array[stack.top];
stack.top = stack.top - 1;
return(x);
}
}
boolean empty()
/* checks if the stack is empty */
{
boolean empty = false;
if(stack.top == -1)
empty = true; else empty = false;
return(empty);
}
void initialise()
/* This procedure initializes the stack s */
{
stack.top = -1;
}
The above implementation strategy is easy and fast since it does not have run-time overheads. At
the same time it is not flexible since it cannot handle a situation when the number of elements
exceeds max_size. Also, let us say, if max_size is decided statically to be 100 and a stack actually has
only 10 elements, then memory space for the rest of the 90 elements would be wasted.
Pointer-based Implementation
Here the memory is used dynamically. For every push operation, the memory space for one
element is allocated at run-time and the element is inserted into the stack. For every pop operation,
the memory space for the deleted element is de-allocated and returned to the free space pool.
Hence the shortcomings of the array-based implementation are overcome. But since, this allocates
memory dynamically, the execution is slowed down.
The following pseudo-code is for the pointer-based implementation of a stack. Each element of the
stack is of type T.
struct stk
{
T element;
struct stk *next;
};
struct stk *stack;
void push(struct stk *p, T e)
{
struct stk *x;
x = new(stk);
x.element = e;
x.next = p; /* the new node points to the old top of the stack */
p = x;
}
Here the stack full condition is checked by the call to new which would give an error if no memory
space could be allocated.
T pop(struct stk *p)
{
struct stk *x;
if (p == NULL)
printf("Stack is empty");
else
{
x = p; /* x points to the current top node */
p = p.next; /* the node below becomes the new top */
return(x.element);
}
}
boolean empty(struct stk *p)
{
if (p == NULL)
return(true);
else
return(false);
}
void initialize(struct stk *p)
{
p = NULL;
}
Parenthesis checker
Parenthesis checker is used for checking balanced brackets in an expression. An expression has
balanced parentheses when every opening parenthesis has a matching closing parenthesis and the
pairs are properly nested.
(a+b*(c/d))
[10+20*(6+7)]
(x+y)/(c-d)
Balanced parenthesis
A = (50+25)
In the above expression there is one opening and one closing parenthesis, which means that the
opening and closing brackets are equal; therefore, the above expression is a balanced parenthesis.
Unbalanced parenthesis
A= [(15+25)
The above expression has two opening brackets and one closing bracket, which means that both
opening and closing brackets are not equal; therefore, the above expression is unbalanced.
Algorithm
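A stack-based checker scans the expression from left to right, pushes every opening bracket, and pops on every closing bracket; the expression is balanced if every pop finds a matching opening bracket and the stack is empty at the end. A minimal sketch in C (handling only round brackets; extending it to [] and {} follows the same pattern, and the function name is illustrative):
#include <stdio.h>
int is_balanced(const char *expr)
{
    char stack[100];                     /* stack of unmatched '(' characters */
    int top = -1;
    int i;
    for (i = 0; expr[i] != '\0'; i++)
    {
        if (expr[i] == '(')
            stack[++top] = '(';          /* push an opening bracket */
        else if (expr[i] == ')')
        {
            if (top == -1)
                return 0;                /* closing bracket with no match */
            top--;                       /* pop the matching opening bracket */
        }
    }
    return top == -1;                    /* balanced only if the stack ends empty */
}
int main(void)
{
    printf("%d\n", is_balanced("(a+b*(c/d))"));  /* prints 1: balanced */
    printf("%d\n", is_balanced("(50+25"));       /* prints 0: unbalanced */
    return 0;
}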
Infix Notation
Infix Notation can be represented as:
operand1 operator operand2
Example: 15 + 26
a + b
Postfix Notation
Postfix Notation can be represented as
operand1 operand2 operator
Example: 15 29 +
ab+
Prefix notation
Prefix notation can be represented as
operator operand1 operand2
Example: + 10 20
+ab
Infix notation is used most frequently in our day to day tasks. Machines find infix notations
tougher to process than prefix/postfix notations. Hence, compilers convert infix notations to
prefix/postfix before the expression is evaluated.
The precedence of operators needs to be taken care of as per the hierarchy:
(^) > (*, /) > (+, -)
Brackets have the highest priority.
To evaluate an infix expression, we need to perform two steps: first, convert the infix expression into
its equivalent postfix expression; then, evaluate the postfix expression using a stack.
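As an illustration of the second step (a minimal sketch, assuming single-digit operands and only the operators +, -, * and /; the function name is illustrative), a postfix expression can be evaluated with a stack of operands:
#include <stdio.h>
#include <ctype.h>
int eval_postfix(const char *expr)
{
    int stack[100], top = -1, i, a, b;
    for (i = 0; expr[i] != '\0'; i++)
    {
        if (isdigit(expr[i]))
            stack[++top] = expr[i] - '0';       /* push the operand */
        else
        {
            b = stack[top--];                   /* right operand */
            a = stack[top--];                   /* left operand */
            switch (expr[i])
            {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                          /* final value left on the stack */
}
int main(void)
{
    printf("%d\n", eval_postfix("23*4+"));      /* 2*3 + 4 = 10 */
    return 0;
}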
Sorting
A sorting process rearranges the elements of a given array or list based upon a selected algorithm or
sort function.
Quick Sort is used for sorting a list of data elements. Quicksort is a sorting algorithm based on the
divide and conquer approach. An array is divided into subarrays by selecting a pivot element.
While dividing the array, the pivot element should be positioned in such a way that elements
less than the pivot are kept on the left side and elements greater than the pivot are on the right side
of the pivot. The left and right subarrays are also divided using the same approach. This process
continues until each subarray contains a single element.
There are many different versions of quick Sort that pick pivot in different ways.
Always pick first element as pivot.
Always pick last element as pivot
Pick a random element as pivot.
Pick median as pivot.
Algorithm
quickSort(arr, beg, end)
  if (beg < end)
    pivotIndex = partition(arr, beg, end)
    quickSort(arr, beg, pivotIndex - 1)
    quickSort(arr, pivotIndex + 1, end)
partition(arr, beg, end)
  set arr[end] as pivot
  pIndex = beg - 1
  for i = beg to end-1
    if arr[i] < pivot
      pIndex++
      swap arr[i] and arr[pIndex]
  swap pivot and arr[pIndex+1]
  return pIndex + 1
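A compact C version of this scheme (a minimal sketch using the last element as the pivot, mirroring the pseudocode above; the helper swap is ours):
void swap(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}
/* places the pivot (arr[end]) at its final position and returns that position */
int partition(int arr[], int beg, int end)
{
    int pivot = arr[end];
    int pIndex = beg - 1;
    int i;
    for (i = beg; i < end; i++)
        if (arr[i] < pivot)
            swap(&arr[++pIndex], &arr[i]);   /* move smaller elements to the left */
    swap(&arr[pIndex + 1], &arr[end]);       /* put the pivot just after them */
    return pIndex + 1;
}
void quickSort(int arr[], int beg, int end)
{
    if (beg < end)
    {
        int pivotIndex = partition(arr, beg, end);
        quickSort(arr, beg, pivotIndex - 1);  /* sort the left subarray */
        quickSort(arr, pivotIndex + 1, end);  /* sort the right subarray */
    }
}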
Tower of Hanoi
The Tower of Hanoi is a mathematical problem which consists of three rods and multiple disks.
Initially, all the disks are placed on one rod, one over the other, in ascending order of size, similar
to a cone-shaped tower.
The objective of this problem is to move the stack of disks from the source to destination, following
these rules:
1. Only one disk can be moved at a time.
2. Only the top disk can be removed.
3. No large disk can sit over a small disk.
Iterative Algorithm
1. At First Calculate the number of moves required i.e. "pow(2,n) - 1" where "n" is number of discs.
2. If the number of discs i.e n is even then swap Destination Rod and Auxiliary Rod.
3. for i = 1 upto number of moves:
Check if "i mod 3" == 1:
Perform Movement of top disc between Source Rod and Destination Rod.
Check if "i mod 3" == 2:
Perform Movement of top disc between Source Rod and Auxiliary Rod.
Check if "i mod 3" == 0:
Perform Movement of top disc between Auxiliary Rod and Destination Rod.
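The same movements can also be generated recursively (a minimal sketch; rod names are passed as characters and the function name is illustrative):
#include <stdio.h>
/* moves n discs from rod 'from' to rod 'to', using rod 'aux' as the auxiliary rod */
void hanoi(int n, char from, char to, char aux)
{
    if (n == 0)
        return;
    hanoi(n - 1, from, aux, to);                 /* move n-1 discs out of the way */
    printf("Move disc %d from %c to %c\n", n, from, to);
    hanoi(n - 1, aux, to, from);                 /* move them onto the destination */
}
int main(void)
{
    hanoi(3, 'A', 'C', 'B');   /* prints the 2^3 - 1 = 7 moves */
    return 0;
}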
When a function is given control, it must eventually restore control to the calling routine by means of a branch.
However, it cannot execute that branch unless it knows the location to which it must return. Since
this location is within the calling routine and not within the function, the only way that the function
can know this address is to have it passed as an argument. This is exactly what happens. Aside
from the explicit arguments specified by the programmer, there is also a set of implicit arguments
that contain information necessary for the function to execute and return correctly. Chief among
these implicit arguments is the return address. The function stores this address within its own data
area. When it is ready to return control to the calling program, the function retrieves the return
address and branches to that location. Once the arguments and the return address have been
passed, control may be transferred to the function, since everything required has been done to
ensure that the function can operate on the appropriate data and then return to the calling routine
safely.
Summary
A stack is a linear data structure in which allocation and deallocation are made in a last-in-
first-out (LIFO) method.
The basic operations of stack are inserting an element on the stack (push operation) and
deleting an element from the stack (pop operation).
Stacks are represented in main memory by using one-dimensional array or a singly linked
list.
To implement a stack structure, an array can be used as its storage structure. Each element
of the stack occupies one array element. Static implementation of stack can be achieved
using arrays.
Stack is used to store return information in the case of function/procedure/subroutine
calls. Hence, one would find a stack in the architecture of any Central Processing Unit (CPU).
In infix notation operators come in between the operands. An expression can be evaluated
using stack data structure.
Keywords
LIFO: (Last In First Out) the property of a list, such as a stack, in which the element that can be
retrieved first is the last element to enter it.
Pop: Stack operation which retrieves a value from the stack.
Infix: Notation of an arithmetic expression in which operators come in between their operands.
Postfix: Notation of an arithmetic expression in which operators come after their operands.
Prefix: Notation of an arithmetic expression in which operators come before their operands.
Push: Stack operation which puts a value on the stack.
Stack: A linear data structure where insertion and deletion of elements can take place only at one
end.
Self Assessment
1. Which method is followed by Stack?
A. FILO
B. LIFO
C. Both FILO and LIFO
D. None of above
A. count()
B. peek()
C. getche()
D. display()
A. Overflow
B. Enqueue
C. Underflow
D. None of above
A. isEmpty()
B. isFull()
C. peek()
D. display()
A. Array
B. Linked list
C. Both using array and linked list
D. None of above
A. Balanced Brackets
B. Unbalanced Brackets
C. Both balanced and unbalanced
D. None of above
A. (a+b*(c/d))
B. [10+20*(6+7)}]
C. (x+y)/(c-d)
D. None of above
A. Bubble
B. Merge
C. Insertion
D. Linear
A. Queue
B. Stack
C. Array
D. None of above
A. Infix notation
B. Postfix notation
C. Prefix notation
D. All of above
A. Statically
B. Dynamically
C. Both Statically and Dynamically
D. None of above
A. peek()
B. pop()
C. display()
D. printf()
A. Full
B. Empty
C. Half
D. None of above
Answers for Self Assessment
6. C 7. B 8. D 9. B 10. D
Review Questions
1. Explain how function calls may be implemented using stacks for return values.
2. What are the advantages of implementing a stack using the dynamic memory allocation method?
3. Explain the concept of the Tower of Hanoi.
4. What are the different methods for implementing stacks?
5. Give an example of push and pop operations using a stack.
6. Write an algorithm to reverse an input string of characters using a stack.
Further Readings
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Kruse, Data Structures & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structures & Algorithm Analysis in C, Second Edition, Addison-Wesley
Publishing.
R. G. Dromey, How to Solve it by Computer, Cambridge University Press.
Shi-Kuo Chang, Data Structures and Algorithms, World Scientific.
Sorenson and Tremblay, An Introduction to Data Structures with Algorithms.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest, Introduction to Algorithms,
Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Web Links
www.en.wikipedia.org
www.webopedia.com
https://www.programiz.com/
https://www.javatpoint.com/data-structure-stack
https://www.tutorialspoint.com/data_structures_algorithms/stack_algorithm.htm
Objectives
After studying this unit, you will be able to:
• Learn implementation of queues
• Explain priority queue
• Discuss applications of queues
Introduction
A queue is a linear list of elements that consists of two ends known as front and rear. We can delete
elements from the front end and insert elements at the rear end of a queue. A queue in an
application is used to maintain a list of items that are ordered not by their values but by their order
of arrival.
The queue abstract data type is also a widely used one with applications very common in real life.
An example comes from the operating system software where the scheduler picks up the next
process to be executed on the system from a queue data structure. In this unit, we would study the
various properties of queues, their operations and implementation strategies.
procedure dequeue
if queue is empty
return underflow
end if
data = queue[front]
front ← front + 1
return true
end procedure
Example:
/*Program of queue using array*/
/*insertion and deletion in a queue*/
# include <stdio.h>
# include <stdlib.h>
# define MAX 50
int queue_arr[MAX];
int rear = -1;
int front = -1;
void ins_delete();
void insert();
void display();
void main()
{
int choice;
while(1)
{
printf("1.Insert\n");
printf("2.Delete\n");
printf("3.Display\n");
printf("4.Quit\n");
printf("Enter your choice : \n");
scanf("%d",&choice);
switch(choice)
{
case 1 : insert();
break;
case 2 :ins_delete();
break;
case 3: display();
break;
case 4: exit(1);
default:
printf("Wrong choice\n");
}/*End of switch*/
}/*End of while*/
}/*End of main()*/
void insert()
{
int added_item;
if (rear==MAX-1)
printf("Queue overflow\n");
else
{
if (front==-1) /*If queue is initially empty */
front=0;
printf("Enter an element to add in the queue : ");
scanf("%d", &added_item);
rear=rear+1;
queue_arr[rear] = added_item ;
}
} /*End of insert()*/
void ins_delete()
{
if (front == -1 || front > rear)
{
printf("Queue underflow\n");
return ;
}
else
{
printf("Element deleted from queue is : %d\n", queue_arr[front]);
front=front+1;
}
} /*End of delete() */
void display()
{
int i;
if (front == -1)
printf("Queue is empty\n");
else
{
printf("Elements in the queue:\n");
for(i=front;i<= rear;i++)
printf("%d ",queue_arr[i]);
printf("\n");
}
} /*End of display() */
Output:
1. Insert
2. Delete
3. Display
4. Quit
Enter your choice: 1
Enter an element to add in the queue: 25
Enter your choice: 1
Enter an element to add in the queue: 36
Enter your choice: 3
Elements in the queue: 25, 36
Enter your choice: 2
Element deleted from the queue is: 25
In this example:
1. The preprocessor directives #include are given. MAX is defined as 50 using the #define
statement.
2. The queue is declared as an array using the declaration int queue_arr[MAX].
3. In the while loop, the different options are displayed on the screen and the value entered in the
variable choice is accepted.
4. The switch case compares the value entered and calls the method corresponding to it. If the value
entered is invalid, it displays the message “Wrong choice”.
5. Insert method: The insert method inserts item in the queue. The if condition checks whether the
queue is full or not. If the queue is full, the “Queue overflow” message is displayed. If the queue is
not full, the item is inserted in the queue and the rear is incremented by 1.
6. Delete method: The delete method deletes item from the queue. The if condition checks whether
the queue is empty or not. If the queue is empty, the “Queue underflow” message is displayed. If
the queue is not empty, the item is deleted and front is incremented by 1.
7. Display method: The display method displays the contents of the queue. The if condition checks
whether the queue is empty or not. If the queue is not empty, it displays all the items in the queue.
Circular Queue
In a circular queue, the last element points to the first element making a circular link. In a circular
queue, the rear end is connected to the front end forming a circular loop. An advantage of circular
queue is that, the insertion and deletion operations are independent of one another. This prevents
an interrupt handler from performing an insertion operation at the same time when the main
function is performing a deletion operation.
Double ended queue is also known as deque. It is a type of queue where the insertions and
deletions happen at the front or the rear end of the queue. The various operations that can be
performed on the double ended queue are:
1. Insert an element at the front end
2. Insert an element at the rear end
3. Delete an element at the front end
4. Delete an element at the rear end
Example:
Program for Implementation of Circular Queue.
#include<stdio.h>
#include<stdlib.h>
#include<conio.h>
#define SIZE 5
int Q_F(int COUNT)
{
return (COUNT==SIZE)? 1:0;
}
int Q_E(int COUNT)
{
return (COUNT==0)? 1:0;
}
void rear_insert(int item, int Q[], int *R, int *COUNT)
{
if(Q_F(*COUNT))
{
printf("Queue overflow");
return;
}
*R=(*R+1) % SIZE;
Q[*R]=item;
*COUNT+=1;
}
void front_delete(int Q[], int *F, int *COUNT)
{
if(Q_E(*COUNT))
{
printf("Queue underflow");
return;
}
printf("The deleted element is %d\n", Q[*F]);
*F=(*F+1) % SIZE;
*COUNT-=1;
}
void display(int Q[], int F, int COUNT)
{
int i,j;
if(Q_E(COUNT))
{
printf("Queue is empty\n");
return;
}
printf("The contents of the queue are:\n");
i=F;
for(j=1;j<=COUNT; j++)
{
printf("%d\n", Q[i]);
i=(i+1) % SIZE;
}
printf("\n");
}
void main()
{
int choice, num, COUNT, F, R, Q[20];
clrscr();
F=0;
R=-1;
COUNT=0;
for(;;)
{
printf("1. iInsert at front\n");
printf("2. Delete at rear end\n");
printf("3. Display\n");
printf("4. Exit\n");
scanf("%d", &choice);
switch(choice)
{
case 1: printf("Enter the number to be inserted\n");
scanf("%d", &num);
rear_insert(num, Q, &R, &COUNT);
break;
case 2: front_delete(Q, &F, &COUNT);
break;
case 3: display(Q, F, COUNT);
break;
case 4: exit(0);
default: printf("Wrong choice\n");
}/*End of switch*/
}/*End of for*/
}/*End of main()*/
In this example:
1. The header files are included and a constant value 5 is defined for variable SIZE using #define
statement. The SIZE defines the size of the queue.
2. A queue is created using an array named Q with an element capacity of 20. A variable named
COUNT is declared to store the count of number elements present in the queue.
3. Four functions are created namely, Q_F(), Q_E(), rear_insert(), front_delete(),and display(). The
user has to select an appropriate function to be performed.
4. The switch statement is used to call the rear_insert(), front_delete(), and display() functions.
5. When the user enters 1, the rear_insert() function is called. In the rear_insert() function, the if
statement checks whether the queue is full. If the condition is true, the program prints the message
"Queue overflow". Else, R is advanced circularly and the element (num) entered by the user is stored
at position R; since R is initially -1, the first element is stored at index 0. After every insertion, the
variable COUNT is incremented.
6. When the user enters 2, the front_delete() function is called. In this function, the if statement
checks whether the queue is empty. If the condition is true, the program prints the message "Queue
underflow". Else, the element at the front position (index F) is deleted, F is advanced circularly, and
COUNT is decremented by 1.
7. When the user enters 3, the display() function is called. In this function, the if statement checks
whether the value of COUNT is 0. If the condition is true, the program prints the message "Queue is
empty". Else, the value of F is assigned to the variable i and the for loop displays the elements
present in the queue.
8. When the user enters 4, the program terminates.
Priority Queue
In priority queue, the elements are inserted and deleted based on their priority. Each element is
assigned a priority and the element with the highest priority is given importance and processed
first. If all the elements present in the queue have the same priority, then the first element is given
importance.
A priority queue is an abstract data type in which each element is associated with a priority value.
Elements are served on the basis of their priority. An element with high priority is dequeued before
an element with low priority. If two elements have the same priority, they are served according to
their order in the queue.
The priority queue keeps the highest priority elements at the beginning of the priority queue and
the lowest priority elements at the back of the priority queue. It supports only those elements that
are comparable. A priority queue in the data structure arranges the elements in either ascending or
descending order.
Example:
List: 5 6 20 22 10
Arrange these numbers in ascending order.
List 5 6 10 20 22
Example:
List: 5 6 35 22 10
Arrange these numbers in descending order.
List: 35 22 10 6 5
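A minimal C sketch of this idea, assuming an array-based priority queue kept in descending order so that the highest-priority element is always served first; the names pq, pq_insert and pq_remove are illustrative and not taken from the text.
#include <stdio.h>
#define PQ_MAX 20

int pq[PQ_MAX];
int pq_count = 0;

/* Insert item so that pq[] stays in descending order of priority. */
void pq_insert(int item)
{
    int i;
    if (pq_count == PQ_MAX) { printf("Priority queue overflow\n"); return; }
    i = pq_count - 1;
    while (i >= 0 && pq[i] < item) {    /* shift smaller elements one place right */
        pq[i + 1] = pq[i];
        i--;
    }
    pq[i + 1] = item;
    pq_count++;
}

/* Remove and return the highest-priority element (always at index 0). */
int pq_remove(void)
{
    int i, top;
    if (pq_count == 0) { printf("Priority queue underflow\n"); return -1; }
    top = pq[0];
    for (i = 1; i < pq_count; i++)      /* close the gap left by the removal */
        pq[i - 1] = pq[i];
    pq_count--;
    return top;
}

int main(void)
{
    int vals[] = {5, 6, 35, 22, 10}, i;
    for (i = 0; i < 5; i++)
        pq_insert(vals[i]);
    while (pq_count > 0)
        printf("%d ", pq_remove());     /* prints 35 22 10 6 5 */
    printf("\n");
    return 0;
}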
A queue can be implemented using either an array or a linked list.
Queue Implementation Using an Array
To represent a queue, we require a one-dimensional array of some maximum size, say n, to hold the
data items and two other variables, front and rear, to point to the beginning and the end of the
queue.
A queue implemented using an array can store only a fixed number of data values. The two
variables front and rear point to the positions from which deletions and insertions are performed in
the queue.
Initially both front and rear are set to -1. To insert a new value into the queue, increment rear by one
and then insert at that position. To delete a value from the queue, delete the element which is at the
front position and increment front by one.
Enqueue operation
Enqueue() function is used to insert a new element into the queue. In a queue, the new element is
always inserted at rear position. The enQueue() function takes one integer value as a parameter and
inserts that value into the queue.
Dequeue operation
Dequeue() is a function used to delete an element from the queue. In a queue, the element is always
deleted from front position. The Dequeue() function does not take any value as parameter.
Insert operation
In the linked representation of a queue, a new node ptr is inserted at the rear. There are two
scenarios for inserting this new node ptr into the linked queue. In the first scenario, we insert an
element into an empty queue; in this case, the condition front = NULL is true. In the second
scenario, the queue already contains at least one element, and the condition front = NULL is false.
Algorithm
Step 1: Allocate the space for the new node PTR
Step 2: SET PTR -> DATA = VAL
Step 3: IF FRONT = NULL
SET FRONT = REAR = PTR
SET FRONT -> NEXT = REAR -> NEXT = NULL
ELSE
SET REAR -> NEXT = PTR
SET REAR = PTR
SET REAR -> NEXT = NULL
[END OF IF]
Step 4: END
Delete operation
Deletion operation removes the element that is first inserted among all the queue elements. The
condition front == NULL becomes true if the list is empty. Otherwise, we will delete the element
that is pointed by the pointer front.
Algorithm
Step 1: IF FRONT = NULL
Write " Underflow "
Go to Step 5
[END OF IF]
Step 2: SET PTR = FRONT
Step 3: SET FRONT = FRONT -> NEXT
Step 4: FREE PTR
Step 5: END
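A minimal C sketch of the two algorithms above, assuming a node structure with data and next fields; the names enqueue and dequeue are illustrative.
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

struct node *front = NULL, *rear = NULL;

/* Insert at the rear, as in the insertion algorithm above. */
void enqueue(int val)
{
    struct node *ptr = (struct node *)malloc(sizeof(struct node));
    ptr->data = val;
    ptr->next = NULL;
    if (front == NULL)          /* first scenario: empty queue */
        front = rear = ptr;
    else {                      /* second scenario: append after rear */
        rear->next = ptr;
        rear = ptr;
    }
}

/* Delete from the front, as in the deletion algorithm above. */
void dequeue(void)
{
    struct node *ptr;
    if (front == NULL) {
        printf("Underflow\n");
        return;
    }
    ptr = front;
    front = front->next;
    if (front == NULL)          /* queue became empty */
        rear = NULL;
    printf("Deleted %d\n", ptr->data);
    free(ptr);
}

int main(void)
{
    enqueue(10); enqueue(20); enqueue(30);
    dequeue();                  /* prints Deleted 10 */
    dequeue();                  /* prints Deleted 20 */
    return 0;
}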
Summary
A queue is an ordered collection of items in which deletion takes place at the front and
insertion at the rear of the queue.
In a memory, a queue can be represented in two ways; by representing the way in which the
elements are stored in the memory, and by naming the address to which the front and rear
pointers point to.
The different types of queues are double ended queue, circular queue, and priority queue.
The basic operations performed on a queue include inserting an element at the rear end and
deleting an element at the front end.
A priority queue is a collection of elements such that each element has been assigned a
priority. An element of higher priority is processed before any element of lower priority.
Two elements with the same priority are processed according to the order in which they
were inserted into the queue.
Keywords
FIFO: (First In First Out) The property of a linear data structure which ensures that the element
retrieved from it is the first element that went into it.
Front: The end of a queue from where elements are retrieved.
Queue: A linear data structure in which the element is inserted at one end while retrieved from
another end.
Rear: The end of a queue where new elements are inserted.
Dequeue: Process of deleting elements from the queue.
Enqueue: Process of inserting elements into queue.
SelfAssessment
1. Which technique is followed by queue?
A. FIFO
B. LIFO
C. Both LIFO and FIFO
D. None of above
A. Deletion
B. Insertion
C. Display
D. All of above
A. Peek
B. isFull
C. isEmpty
D. All of above
A. Array
B. Stack
C. Linked List
D. All of above
A. Circular
B. Simple
C. Complex
D. Priority
A. Static
B. Linear
C. Dynamic
D. None of above
A. It checks if there exists any item before popping from the queue.
B. It checks whether all variables are initialized.
C. It checks if the queue is full before enqueueing any element.
D. All of above
A. Static
B. Linear
C. Abstract
D. All of above
A. Ascending order
B. Descending order
C. Both ascending and descending order
D. None of above
A. Linked list
B. Heap data structure
C. Binary search tree
D. All of above
A. Max heap
B. Min-heap.
C. Both max and min heap
D. None of above
6. B 7. A 8. B 9. C 10. A
Review Questions
1. "Using double ended queues is more advantageous than using circular queues." Discuss.
2. "Stacks are different from queues." Justify.
3. "Using priority queues is advantageous in job scheduling algorithms." Analyze.
4. Can a basic queue be implemented to function as a dynamic queue? Discuss.
5. Describe the applications of queues.
6. How will you insert and delete an element in a queue?
7. Explain the advantages of dynamic memory allocation.
Further Readings
Data Structures and Algorithms; Shi-Kuo Chang; World Scientific.
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss: Data Structure & Algorithm Analysis in C, Second Edition. Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Shi-kuo Chang, Data Structures and Algorithms, World Scientific.
Sorenson and Tremblay: An Introduction to Data Structure with Algorithms.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest: Introduction to Algorithms. Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.geeksforgeeks.org/
https://www.javatpoint.com/data-structure-queue
https://www.tutorialspoint.com/data_structures_algorithms/dsa_queue.htm
Objectives
After studying this unit, you will be able to:
Introduction
We know that data structure is a set of data elements grouped together under one name. A data
structure can be considered as a set of rules that hold the data together. Almost all computer
programs use data structures. Data structures are an essential part of algorithms. We can use it
to manage huge amount of data in large databases. Some modern programming languages
emphasize more on data structures than algorithms.
There are many data structures that help us to manipulate the data stored in the memory,
which we have discussed in the previous units. These include array, stack, queue, and linked-
list.
Choosing the best data structure for a program is a challenging task. Similar tasks may require
different data structures. We derive new data structures for complex tasks using the already
existing ones. We need to compare the characteristics of the data structures before choosing the
right data structure. A tree is a hierarchical data structure suitable for representing hierarchical
information. The tree data structure has the characteristics of quick search, quick inserts, and
quick deletes.
This is a tree because it is a set of nodes {A, B, C, D, E, F, G, H, I}, with node A as a root node,
and the remaining nodes are partitioned into three disjoint sets: {B, G, H, I}, {C, E, F} and {D}
respectively. Each of these sets is a tree individually because each of these sets satisfies the
above properties.
This is not a tree because it is a set of nodes {A, B, C, D, E, F, G, H, I}, with node A as a root
node, but the remaining nodes cannot be partitioned into disjoint sets, because the node I is
shared.
Given below are some of the important definitions, which are used in connection with trees.
Degree of a Node of a Tree: The degree of a node of a tree is the number of sub-trees having this
node as a root, that is, the number of children of the node. If the degree is zero, the node is called a
terminal node or leaf node of the tree.
Degree of a Tree: It is defined as the maximum of the degrees of the nodes of the tree, i.e. degree of
tree = max(degree(node i)) for i = 1 to n.
Level of a Node: We define the level of a node by taking the level of the root node to be 1, and
incrementing it by 1 as we move from the root towards the sub-trees, i.e. the level of all the
descendants of the root node will be 2, the level of their descendants will be 3, and so on. We then
define the depth of the tree to be the maximum value of the level over the nodes of the tree.
Root Node: The root of a tree is called a root node. A root node occurs only once in the whole
tree.
Parent Node: The parent of a node is the immediate predecessor of that node.
Child Node: Child nodes are the immediate successors of a node.
Leaf Node: A node which does not have any child nodes is known as a leaf node.
Link: The pointer to a node in the tree is known as a link. A node can have more than one link.
Path: Every node in the tree is reachable from the root node through a unique sequence of
links. This sequence of links is known as a path. The number of links in a path is considered to
be the length of the path.
Levels: The level of a node in the tree is considered to be its hierarchical rank in the tree.
Height: The height of a non-empty tree is the maximum level of a node in the tree. The height
of an empty tree (no node in a tree) is 0. The height of a tree containing a single node is 1. The
longest path in the tree has to be considered to measure the height of the tree.
Height of a tree (h) = lmax, where lmax is the maximum level of any node in the tree when the root is taken to be at level 1 (equivalently, h = lmax + 1 if levels are counted from 0).
Siblings: The nodes which have the same parent node are known as siblings.
Graphs consist of a set of nodes and edges, just like trees. But for graphs, there are no rules for
the connections between nodes. In graphs, there is no concept of a root node, nor a concept of
parents and children. Rather, a graph is just a collection of interconnected nodes. All trees are
graphs. A tree is a special case of a graph in which there are no cycles and every node is reachable
from a single starting node (the root) by exactly one path.
In a formal way, we can define a binary tree as a finite set of nodes which is either empty or
partitioned into three disjoint sets T0, Tl and Tr, where T0 contains a single node, the root, and Tl
and Tr are the left and right binary sub-trees, respectively.
So, for a binary tree we find that:
1. The maximum number of nodes at level i is 2^(i-1), taking the root to be at level 1.
2. If k is the depth of the tree, then the maximum number of nodes that the tree can have is
2^k - 1 = 2^(k-1) + 2^(k-2) + ... + 2^0.
3. If we have a binary tree containing n nodes, then the height of the tree is at most n and at
least ceiling(log2(n + 1)).
4. If a binary tree has n nodes at a level l, then it has at most 2n nodes at level l + 1.
5. The maximum total number of nodes in a binary tree with depth k (root has depth zero) is
N = 2^0 + 2^1 + 2^2 + ... + 2^k = 2^(k+1) - 1. For example, a full binary tree of depth 3 has 2^4 - 1 = 15 nodes.
The value to be inserted and the root pointer are passed as parameters. The procedure first checks
whether the tree whose root pointer is passed as a parameter is empty. If it is empty, then the newly
created node is inserted as the root node. If it is not empty, it copies the root pointer into a variable
temp1 and then repeats the following: it stores the value of temp1 in another variable temp2 and
compares the data value of the node pointed to by temp1 with the data value supplied as a
parameter; if the supplied value is smaller than the data value of the node pointed to by temp1, it
copies the left link of the node pointed to by temp1 into temp1 (goes to the left), otherwise it copies
the right link of the node pointed to by temp1 into temp1 (goes to the right). It repeats
this process till temp1 becomes nil.
When temp1 becomes nil, the new node is inserted as a left child of the node pointed to by
temp2 if data value of the node pointed to by temp2 is greater than data value supplied as
parameter. Otherwise the new node is inserted as a right child of node pointed to by temp2.
Therefore the insert procedure is
void insert(tnode *&p, int val) /* p is passed by reference so that inserting into an empty tree updates the root; tnode is assumed to have members data, left and right */
{
tnode *temp1, *temp2;
if (p == NULL)
{
p = new tnode;
p->data = val;
p->left = NULL;
p->right = NULL;
}
else
{
temp1 = p;
while(temp1 != NULL)
{
temp2 = temp1;
if(temp1->data > val)
temp1 = temp1->left;
else
temp1 = temp1->right;
}
if(temp2->data > val)
{
temp2->left = new tnode;
temp2 = temp2->left;
temp2->data = val;
temp2->left = NULL;
temp2->right = NULL;
}
else
{
temp2->right = new tnode;
temp2 = temp2->right;
temp2->data = val;
temp2->left = NULL;
temp2->right = NULL;
}
}
}
When inserting any node in a binary search tree, it is necessary to look for its proper position in the
binary search tree. The new node is compared with the nodes along a path starting at the root. If the
value of the node which is to be inserted is more than the value of the current node, then the right
sub-tree is considered, else the left sub-tree is considered. Once the proper position is identified, the
new node is attached as the left or right child node. Let us now discuss the pseudo code for
inserting a new element in a binary search tree.
Pseudocode for Inserting a Value in a Binary Search Tree
//Purpose: Insert data object X into the Tree
//Inputs: Data object X (to be inserted), binary-search-tree node
//Effect: Do nothing if tree already contains X;
// otherwise, update binary search tree by adding a new node containing data object X
insert(X, node){
if(node = NULL){
node = new binaryNode(X,NULL,NULL)
return
}
if(X = node:data)
return
else if(X<node:data)
insert(X, node:leftChild)
else // X>node:data
insert(X, node:rightChild)
}
Pseudocode for Deleting a Value in a Binary Search Tree
//Purpose: Delete data object X from the Tree
//Effect: Do nothing if the tree does not contain X; otherwise remove the node containing X
delete(X, node){
if(node = NULL)
return // X is not in the tree
if(X<node:data)
delete(X, node:leftChild)
else if(X>node:data)
delete(X, node:rightChild)
else { // found the node to be deleted. Take action based on number of node children
if(node:leftChild = NULL and node:rightChild = NULL){
delete node
node = NULL
return
}
else if(node:leftChild = NULL){
tempNode = node
node = node:rightChild
delete tempNode}
else if(node:rightChild = NULL){
tempNode = node
node = node:leftChild
delete tempNode
}
else { //replace node:data with minimum data from right sub-tree
tempNode = findMin(node.rightChild)
node:data = tempNode:data
delete(node:data,node:rightChild)
}
}
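A minimal C sketch of deletion that follows the pseudocode above, replacing a node that has two children by the minimum of its right sub-tree; the node layout and the names bst_delete, find_min and new_node are illustrative.
#include <stdio.h>
#include <stdlib.h>

struct tnode {
    int data;
    struct tnode *left, *right;
};

struct tnode *new_node(int v)
{
    struct tnode *n = (struct tnode *)malloc(sizeof(struct tnode));
    n->data = v;
    n->left = n->right = NULL;
    return n;
}

/* Smallest key in a non-empty subtree: keep going left. */
struct tnode *find_min(struct tnode *p)
{
    while (p->left != NULL)
        p = p->left;
    return p;
}

/* Delete val from the subtree rooted at p and return the new subtree root. */
struct tnode *bst_delete(struct tnode *p, int val)
{
    if (p == NULL)
        return NULL;                      /* val not present */
    if (val < p->data)
        p->left = bst_delete(p->left, val);
    else if (val > p->data)
        p->right = bst_delete(p->right, val);
    else {                                /* found the node to delete */
        if (p->left == NULL) {            /* zero or one child */
            struct tnode *r = p->right;
            free(p);
            return r;
        } else if (p->right == NULL) {
            struct tnode *l = p->left;
            free(p);
            return l;
        } else {                          /* two children: copy the successor */
            struct tnode *s = find_min(p->right);
            p->data = s->data;
            p->right = bst_delete(p->right, s->data);
        }
    }
    return p;
}

int main(void)
{
    struct tnode *root = new_node(20);
    root->left = new_node(10);
    root->right = new_node(30);
    root = bst_delete(root, 20);          /* 20 has two children: replaced by 30 */
    printf("new root: %d\n", root->data); /* prints 30 */
    return 0;
}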
Step 1 - If the node to be deleted is a leaf node (case 1) or has only one child (case 2), delete it
directly using the corresponding logic.
Step 2 - If it has two children, then find the largest node in its left subtree (OR) the smallest
node in its right subtree.
Step 3 - Swap both deleting node and node which is found in the above step.
Step 4 - Then check whether deleting node came to case 1 or case 2 or else goto step 2
Step 5 - If it comes to case 1, then delete using case 1 logic.
Step 6- If it comes to case 2, then delete using case 2 logic.
Step 7 - Repeat the same process until the node is deleted from the tree.
Summary
Search trees are data structures that support many dynamic-set operations such as
searching, finding the minimum or maximum value, inserting, or deleting a value.
In a binary search tree, for a given node n, each node to the left has a value lesser than n
and each node to the right has a value greater than n.
The time taken to perform operations on a binary search tree is directly proportional to
the height of the tree.
Binary trees provide an excellent solution to this problem. By making the entries of an
ordered list into the nodes of a binary tree, we shall find that we can search for a target
key in O(log n) steps, just as with binary search, and we shall obtain algorithms for
inserting and deleting entries also in time O(log n).
Keywords
Binary Search Tree: A binary search tree is a binary tree which may be empty; in a non-empty
binary search tree every node contains a key, all keys in the left sub-tree of a node are smaller than
the key of that node, and all keys in its right sub-tree are larger.
Searching: Searching for the key in the given binary search tree, start with the root
node and compare the key with the data value of the root node.
Degree of a tree: The highest degree of a node appearing in the tree.
Inorder: A tree traversing method in which the tree is traversed in the order of left-
tree, node andthen right-tree.
Level of a node: The number of nodes that must be traversed to reach the node from
the root.
Root node: The node in a tree which does not have a parent node.
Tree: A non-linear, hierarchical data structure comprising nodes, where one node is the
root and the remaining nodes form disjoint sets, each of which is itself a tree.
Self Assessment
1. Tree is a _______ hierarchical data structure.
A. Linear
B. Nonlinear
C. Abstract
D. All of above
A. Binary Tree
B. Binary Search Tree
C. AVL Tree
D. All of above
A. Ordered
B. Sorted
C. Both ordered and sorted binary tree
D. None of above
5. A tree in which the value of the nodes in the left sub-tree is less than the value of the root is a
A. General Tree
B. Binary Tree
C. Binary Search Tree
D. None of above
A. Search
B. Peek
C. Insertion
D. Deletion
A. O(n)
B. O(1)
C. O(2)
D. None of above
A. O(0)
B. O(1)
C. O(2)
D. None of above
11. What is the worst case time complexity for Delete operation?
A. O(0)
B. O(1)
C. O(n)
D. None of above
A. The left and right sub-trees should also be binary search trees
B. In order sequence gives decreasing order of elements
C. The left child is always lesser than its parent
D. The right child is always greater than its parent
A. Data element
B. Pointer to right subtree
C. Super tree
D. Pointer to left subtree
A. General Tree
B. Primary Tree
C. Binary Tree
D. Binary Search Tree
6. D 7. B 8. D 9. A 10. B
Review Questions
1. Define a tree with a suitable example.
2. Draw a binary tree with six child nodes.
4. Discuss the degree of a tree.
5. Explain the representation of a tree.
6. What are the applications of a tree?
7. Discuss the time complexity of a binary search tree.
Further Readings
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss: Data Structure & Algorithm Analysis in C, Second Edition. Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Shi-kuo Chang, Data Structures and Algorithms, World Scientific.
Sorenson and Tremblay: An Introduction to Data Structure with Algorithms.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest: Introduction to Algorithms. Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Web Links
www.en.wikipedia.org
www.webopedia.com
https://www.programiz.com/
https://www.javatpoint.com/data-structure-stack
https://www.tutorialspoint.com/data_structures_algorithms/stack_algorithm.htm
Objectives
After studying this unit, you will be able to:
Introduction
One of the most essential data structures is the tree, which is used to conduct operations like
insertion, deletion, and searching of items efficiently. Constructing a well-balanced tree holding all
the data is not practicable when working with a huge amount of data, though. As a result, only
valuable data is stored in the tree, and the actual volume of data used changes over time as new
data is inserted and old data is deleted. In some circumstances it is possible to conduct traversals,
insertions, and deletions without using either a stack or recursion, by converting the NULL links of
a binary tree into special links referred to as threads.
Here, the height of the tree is h. Height of one subtree is h–1 while that of another subtree of the
same node is h–2, differing from each other by just 1. Therefore, it is an AVL tree.
AVL Tree is defined as height balanced binary search tree. In AVL tree each node is associated with
a balance factor which is calculated by subtracting the height of its right sub-tree from that of its left
sub-tree.
Balance Factor
Balance factor of a node in an AVL tree is the difference between the height of the left sub tree and
that of the right sub tree of that node.
Balance Factor = Height of Left Sub-tree - Height of Right Sub-tree. In an AVL tree the balance
factor of every node must be -1, 0 or 1.
If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right
sub-tree.
If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal
height.
If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right
sub-tree.
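Using the convention balance factor = height of left sub-tree - height of right sub-tree, a minimal C sketch for computing heights and balance factors is given below; the node layout and names are illustrative, not from the text.
#include <stdio.h>

struct avlnode {
    int data;
    struct avlnode *left, *right;
};

/* Height of a subtree: an empty tree has height 0, a single node height 1. */
int height(struct avlnode *n)
{
    int hl, hr;
    if (n == NULL)
        return 0;
    hl = height(n->left);
    hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height of left subtree - height of right subtree. */
int balance_factor(struct avlnode *n)
{
    if (n == NULL)
        return 0;
    return height(n->left) - height(n->right);
}

int main(void)
{
    struct avlnode a = {10, NULL, NULL};
    struct avlnode b = {30, NULL, NULL};
    struct avlnode r = {20, &a, &b};
    printf("height=%d balance=%d\n", height(&r), balance_factor(&r)); /* 2 and 0 */
    return 0;
}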
Balanced Tree
Left rotation
If a tree becomes unbalanced, when a node is inserted into the right subtree of the right subtree,
then we perform a single left rotation.
Right rotation
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left subtree. The
tree then needs a right rotation.
Left-Right rotation
Double rotations are slightly complex rotations. To understand them better, we should take note of
each action performed while rotation. A left-right rotation is a combination of left rotation followed
by right rotation.
Right-Left rotation
It is a combination of right rotation followed by left rotation.
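A minimal C sketch of the two single rotations and the left-right double rotation built from them, using the same kind of illustrative node layout as above; height updates and balance-factor bookkeeping are omitted for brevity.
#include <stdio.h>

struct avlnode {
    int data;
    struct avlnode *left, *right;
};

/* Single right rotation: the left child y becomes the new subtree root. */
struct avlnode *rotate_right(struct avlnode *x)
{
    struct avlnode *y = x->left;
    x->left = y->right;        /* y's right subtree moves under x */
    y->right = x;
    return y;                  /* new root of this subtree */
}

/* Single left rotation: the right child y becomes the new subtree root. */
struct avlnode *rotate_left(struct avlnode *x)
{
    struct avlnode *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Left-Right rotation: left rotation on the left child, then right rotation. */
struct avlnode *rotate_left_right(struct avlnode *x)
{
    x->left = rotate_left(x->left);
    return rotate_right(x);
}

int main(void)
{
    struct avlnode a = {10, NULL, NULL};
    struct avlnode b = {20, &a, NULL};
    struct avlnode c = {30, &b, NULL};          /* left-left chain 30-20-10 */
    struct avlnode *root = rotate_right(&c);
    printf("new root: %d\n", root->data);       /* prints 20 */
    return 0;
}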
Deletion
When you delete a node, there are three things that can happen to the parent:
1. Its height is decremented by one.
2. Its height doesn’t change and it stays balanced.
3. Its height doesn’t change, but it becomes imbalanced.
You handle these three cases in different ways:
1. The parent’s height is decremented by one. When this happens, you check the parent’s parent:
you keep doing this until you return or you reach the root of the tree.
2. The parent’s height doesn’t change and it stays balanced. When this happens you may return –
deletion is over.
3. The parent’s height doesn’t change, but it becomes imbalanced. When this happens, you have to
rebalance the subtree rooted at the parent. After rebalancing, the subtree’s height may be one
smaller than it was originally. If so, you must continue checking the parent’s parent.
To rebalance, you need to identify whether you are in a zig-zig situation or a zig-zag situation and
rebalance accordingly.
6.4 B-tree
A B-tree is a tree data structure that keeps data sorted and allows insertions and deletions in time
logarithmically proportional to the file size. It is commonly used in databases and file systems.
In B-trees, internal nodes can have a variable number of child nodes within some pre-defined
range. When data is inserted or removed from a node, its number of child nodes changes. In order
to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child
nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search
trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on
the number of child nodes are typically fixed for a particular implementation.
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will
increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent,
and results in all leaf nodes being one more hop further removed from the root.
B-trees are balanced trees that are optimized for situations when part or the entire tree must be
maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time
consuming) operations, a b-tree tries to minimize the number of disk accesses.
Structure of B-trees
Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children. The
keys are stored in non-decreasing order. Each key has an associated child that is the root of a
subtree containing all nodes with keys less than or equal to the key but greater than the preceding
key. A node also has an additional rightmost child that is the root for a subtree containing all keys
greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization
factor. If t is this minimization factor, every node must have at least t – 1 keys. Under certain
circumstances, the root node is allowed to violate this property by having fewer than t – 1 keys.
Every node may have at most 2t – 1 keys or, equivalently, 2t children. Since each node tends to
have a large branching factor (a large number of children), it is typically necessary to traverse
relatively few nodes before locating the desired key. If access to each node requires a disk access,
then a b-tree will minimize the number of disk accesses required.
The minimization factor is usually chosen so that the total size of each node corresponds to a
multiple of the block size of the underlying storage device. This choice simplifies and optimizes
disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot
reside in primary storage and accesses to secondary storage are comparatively expensive (or time
consuming).
Why B Tree
B-Trees are used to reduce the number of disk accesses. Most of the tree operations (search, insert,
delete, max, min) require O(h) disk accesses, where h is the height of the tree. A B-tree is a fat tree:
the height of a B-Tree is kept low by putting the maximum possible number of keys in each node.
Data structures like the binary search tree, AVL tree, red-black tree, etc. can store only one key in
one node. If you have to store a large number of keys, then the height of such trees becomes very
large and the access time increases.
The height of the B-tree is low so total disk accesses for most of the operations are reduced
significantly compared to balanced Binary Search Trees like AVL Tree, Red-Black Tree,etc.
B Tree properties
B-Tree of Order m has the following properties:
1 - All leaf nodes must be at the same level.
2 - All nodes except the root must have at least ⌈m/2⌉ - 1 keys and a maximum of m - 1 keys.
3 - All non-leaf nodes except the root (i.e. all internal nodes) must have at least ⌈m/2⌉ children.
4 - If the root node is a non-leaf node, then it must have at least 2 children.
5 - A non-leaf node with n - 1 keys must have n children.
6 - All the key values in a node must be in ascending order.
Search Operation
The search operation on a b-tree is analogous to a search on a binary tree. Instead of choosing
between a left and a right child as in a binary tree, a b-tree search must make an n-way choice. The
correct child is chosen by performing a linear search of the values in the node. After finding the
value greater than or equal to the desired value, the child pointer to the immediate left of that value
is followed. If all values are less than the desired value, the rightmost child pointer is followed. Of
course, the search can be terminated as soon as the desired node is found. Since the running time of
the search operation depends upon the height of the tree, B-Tree-Search is O(log_t n).
B-Tree-Search(x, k)
i <- 1
while i <= n[x] and k > key_i[x]
do i <- i + 1
if i <= n[x] and k = key_i[x]
then return (x, i)
if leaf[x]
then return NIL
else Disk-Read(c_i[x])
return B-Tree-Search(c_i[x], k)
Search algorithm
Let the key (the value) to be searched be “X”.
Start searching from the root and recursively traverse down.
If X is less than the root value, search the left sub-tree; if X is greater than the root value, search the
right sub-tree.
If the node contains X, simply return the node.
If X is not found in the node, traverse down to the child pointer that lies just before the first key
greater than X (or the rightmost child if every key in the node is smaller).
If X is not found in the tree, we return NULL.
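A minimal C sketch of this search, assuming an in-memory node layout; the field names n, key, child and leaf and the constant T (the minimization factor) are illustrative, and the Disk-Read step of the pseudocode is omitted because the nodes here are already in memory.
#include <stdio.h>

#define T 2                       /* minimization factor: 1..2T-1 keys per node */

struct bnode {
    int n;                        /* number of keys currently stored */
    int key[2 * T - 1];           /* keys in non-decreasing order */
    struct bnode *child[2 * T];   /* child pointers (unused when leaf) */
    int leaf;                     /* 1 if this node is a leaf */
};

/* Returns the node containing k (and stores its index in *pos), or NULL. */
struct bnode *btree_search(struct bnode *x, int k, int *pos)
{
    int i = 0;
    if (x == NULL)
        return NULL;
    while (i < x->n && k > x->key[i])          /* linear scan inside the node */
        i++;
    if (i < x->n && k == x->key[i]) {
        *pos = i;
        return x;
    }
    if (x->leaf)
        return NULL;                           /* reached a leaf without finding k */
    return btree_search(x->child[i], k, pos);  /* descend into the proper child */
}

int main(void)
{
    struct bnode root = { 3, {10, 20, 30}, {NULL}, 1 };   /* a single leaf node */
    int pos;
    struct bnode *hit = btree_search(&root, 20, &pos);
    if (hit)
        printf("found 20 at index %d\n", pos);
    else
        printf("not found\n");
    return 0;
}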
Insertion Operation
Insertions in a B-Tree are performed only at the leaf node level. The insert operation is performed in
two steps: searching for the appropriate node to insert the element, and splitting the node if required.
Insertion algorithm
Check whether tree is Empty.
If tree is Empty, then create a new node with new key value and insert it into the tree as a root node
If tree is not empty, then, find the appropriate leaf node at which the node can be inserted.
If the leaf node contains less than m-1 keys, then insert the element in increasing order.
Else, if the leaf node contains m-1 keys, then follow the following steps:
- Insert the new element in the increasing order of elements.
- Split the node into two nodes at the median.
- Push the median element up to its parent node.
- If the parent node also contains m-1 keys, then split it too by following the same steps.
Deletion operation
In the case of deletion from a B-Tree, more rules need to be followed than for search and insertion.
There are three cases for deletion from a B-Tree:
If the key is in a leaf node
If the key is in an internal node
If the key is in the root node
If, after deletion, the target node has fewer than the minimum number of keys, then the target node
borrows a key from its sibling via the sibling's parent: the separating key in the parent moves down
into the target node, and the extreme key of the sibling (its maximum or minimum, depending on
the side) moves up into the parent.
Summary
AVL tree controls the height of the binary search tree. The time taken for all operations in a
binary search tree of height h is O (h).
In AVL tree each node is associated with a balance factor which is calculated by subtracting
the height of its right sub-tree from that of its left sub-tree.
B-trees are balanced trees that are optimized for situations when part or the entire tree must
be maintained in secondary storage such as a magnetic disk.
A B-tree is a specialized multiway tree designed especially for use on disk. In a B-tree each
node may contain a large number of keys. The number of subtrees of each node, then, may
also be large.
A B-tree is designed to branch out in this large number of directions and to contain a lot of
keys in each node so that the height of the tree is relatively small.
This means that only a small number of nodes must be read from disk to retrieve an item.
The goal is to get fast access to the data, and with disk drives this means reading a very
small number of records. Note that a large node size (with lots of keys in the node) also fits
with the fact that with a disk drive one can usually read a fair amount of data at once.
Keywords
B-Tree Algorithms: A B-tree is a data structure that maintains an ordered set of data and allows
efficient operations to find, delete, insert, and browse the data.
B-trees: B-trees are balanced trees that are optimized for situations when part or the entire tree
must be maintained in secondary storage such as a magnetic disk.
SelfAssessment
1. AVL Tree is invented in
A. 1955
B. 1966
C. 1962
D. None of above
A. -1
B. 0
C. 1
D. All of above
A. 1
B. 0
C. 2
D. None of above
A. Right rotation
B. Left-Right rotation
C. Right-Left rotation
D. All of above
A. Double rotations
B. Single rotation
C. Triple rotation
D. None of above
A. Single rotation
B. Triple rotation
C. Double rotations
D. None of above
A. 1
B. 0
C. 2
D. None of above
A. True
B. False
A. Search
B. Insert
C. Data manipulation
D. Delete
A. Root node
B. Leaf node
C. Both root and leaf node
D. None of above
15. What are the different cases for deletion from B-Tree?
6. B 7. D 8. C 9. A 10. B
Review Questions
1. define AVL tree and its advantages.
2. How AVL tree is different from B-tree.
3. Describe the deletion of an item from b-trees.
4. Describe of structure of B-tree. Also explain the operation of B-tree.
5. Explain insertion of an item in b-trees.
6. Differentiate between Left Heavy Tree and right Heavy Tree with example.
7. Discuss different AVL tree rotations with suitable diagram.
Further Readings
Data Structures and Efficient Algorithms, Burkhard Monien, Thomas Ottmann, Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss: Data Structure & Algorithm Analysis in C, Second Edition. Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Shi-kuo Chang, Data Structures and Algorithms, World Scientific.
Sorenson and Tremblay: An Introduction to Data Structure with Algorithms.
Thomas H. Cormen, Charles E. Leiserson & Ronald L. Rivest: Introduction to Algorithms. Prentice-Hall of India Pvt. Limited, New Delhi.
Timothy A. Budd, Classic Data Structures in C++, Addison Wesley.
Web Links
www.en.wikipedia.org
www.webopedia.com
https://www.programiz.com/
https://www.javatpoint.com/data-structure-stack
https://www.tutorialspoint.com/data_structures_algorithms/stack_algorithm.htm
Objectives
After studying this unit, you will be able to:
Introduction
Recall that, for binary search trees, although the average-case times for the lookup, insert, and
delete methods are all O(log N), where N is the number of nodes in the tree, the worst-case time is
O(N). We can guarantee O(log N) time for all three methods by using a balanced tree -- a tree that
always has height O(log N)-- instead of a binary search tree.
A number of different balanced trees have been defined, including AVL trees, 2-4 trees, and B trees.
Here we will look at yet another kind of balanced tree called a red-black tree. The important idea
behind all of these trees is that the insert and delete operations may restructure the tree to keep it
balanced. So lookup, insert, and delete will always be logarithmic in the number of nodes but insert
and delete may be more complicated than for binary search trees.
Red-black tree
Three Invariants
A red/black tree is a binary search tree in which each node is colored either red or black. At the
interface, we maintain three invariants:
Ordering Invariant This is the same as for binary search trees: all the keys to left of a node are
smaller, and all the keys to the right of a node are larger than the key at the node itself.
Height Invariant The number of black nodes on every path from the root to each leaf is the same.
We call this the black height of the tree.
Color Invariant No two consecutive nodes on any path are red. The height and color invariants
together imply that the longest path from the root to a leaf is at most twice as long as the shortest
path. Since insert and search in a binary search tree take time proportional to the length of the path
from the root to the leaf, this guarantees O(log n) time for these operations, even if the tree is not
perfectly balanced. We therefore refer to the height and color invariants collectively as the balance
invariant.
Insertion operation
The insertion operation in a Red-Black Tree is similar to that in a Binary Search Tree. Every new
node must be inserted with the color RED.
After every insertion operation, we need to check all the properties of Red-Black Tree. If all the
properties are satisfied then we go to next operation otherwise we perform the following operation
to make it Red Black Tree.
1. Recolor
2. Rotation
3. Rotation followed by Recolor
Check if the tree is empty (i.e. whether the root is NIL). If yes, insert newNode as the root node and
color it black.
Else, repeat the following steps until a leaf (NIL) is reached.
Compare newKey with rootKey.
If newKey is greater than rootKey, traverse through the right subtree.
Else traverse through the left subtree.
Assign the parent of the leaf as a parent of newNode.
If leafKey is greater than newKey, make newNode as rightChild.
Else, make newNode as leftChild.
Assign NULL to the left and rightChild of newNode.
Assign RED color to newNode.
Deletion operation
The deletion operation in Red-Black Tree is similar to the BST. In deletion operation, we need to
check with the Red-Black Tree properties. If any of the properties are violated then make suitable
operations like Recolor, Rotation and Rotation followed by Recolor to make it Red-Black Tree.
Lab Exercise:
Implementation of red-black tree
#include <iostream>
using namespace std;
struct Node {
int data;
Node *parent;
Node *left;
Node *right;
int color;
};
typedef Node *NodePtr;
class RedBlackTree {
private:
NodePtr root;
NodePtr TNULL;
void initializeNULLNode(NodePtr node, NodePtr parent) {
node->data = 0;
node->parent = parent;
node->left = nullptr;
node->right = nullptr;
node->color = 0;
}
// Preorder
void preOrderHelper(NodePtr node) {
if (node != TNULL) {
cout << node->data << " ";
preOrderHelper(node->left);
preOrderHelper(node->right);
}
}
// Inorder
void inOrderHelper(NodePtr node) {
if (node != TNULL) {
inOrderHelper(node->left);
cout << node->data << " ";
inOrderHelper(node->right);
}
}
// Postorder
void postOrderHelper(NodePtr node) {
if (node != TNULL) {
postOrderHelper(node->left);
postOrderHelper(node->right);
cout << node->data << " ";
}
}
NodePtr searchTreeHelper(NodePtr node, int key) {
if (node == TNULL || key == node->data) {
return node;
}
if (key < node->data) {
return searchTreeHelper(node->left, key);
}
return searchTreeHelper(node->right, key);
}
// For balancing the tree after deletion
void deleteFix(NodePtr x) {
NodePtr s;
while (x != root && x->color == 0) {
if (x == x->parent->left) {
s = x->parent->right;
if (s->color == 1) {
s->color = 0;
x->parent->color = 1;
leftRotate(x->parent);
s = x->parent->right;
}
if (s->left->color == 0 && s->right->color == 0) {
s->color = 1;
x = x->parent;
} else {
if (s->right->color == 0) {
s->left->color = 0;
s->color = 1;
rightRotate(s);
s = x->parent->right;
}
s->color = x->parent->color;
x->parent->color = 0;
s->right->color = 0;
leftRotate(x->parent);
x = root;
}
} else {
s = x->parent->left;
if (s->color == 1) {
s->color = 0;
x->parent->color = 1;
rightRotate(x->parent);
s = x->parent->left;
}
if (s->right->color == 0 && s->left->color == 0) {
s->color = 1;
x = x->parent;
} else {
if (s->left->color == 0) {
s->right->color = 0;
s->color = 1;
leftRotate(s);
s = x->parent->left;
}
s->color = x->parent->color;
x->parent->color = 0;
s->left->color = 0;
rightRotate(x->parent);
x = root;
}
}
}
x->color = 0;
}
void rbTransplant(NodePtr u, NodePtr v) {
if (u->parent == nullptr) {
root = v;
} else if (u == u->parent->left) {
u->parent->left = v;
} else {
u->parent->right = v;
}
v->parent = u->parent;
}
void deleteNodeHelper(NodePtr node, int key) {
NodePtr z = TNULL;
NodePtr x, y;
while (node != TNULL) {
if (node->data == key) {
z = node;
}
if (node->data <= key) {
node = node->right;
} else {
node = node->left;
}
}
if (z == TNULL) {
cout << "Key not found in the tree" << endl;
return;
}
y = z;
int y_original_color = y->color;
if (z->left == TNULL) {
x = z->right;
rbTransplant(z, z->right);
} else if (z->right == TNULL) {
x = z->left;
rbTransplant(z, z->left);
} else {
y = minimum(z->right);
y_original_color = y->color;
x = y->right;
if (y->parent == z) {
x->parent = y;
} else {
rbTransplant(y, y->right);
y->right = z->right;
y->right->parent = y;
}
rbTransplant(z, y);
y->left = z->left;
y->left->parent = y;
y->color = z->color;
}
delete z;
if (y_original_color == 0) {
deleteFix(x);
}
}
// For balancing the tree after insertion
void insertFix(NodePtr k) {
NodePtr u;
while (k->parent->color == 1) {
if (k->parent == k->parent->parent->right) {
u = k->parent->parent->left;
if (u->color == 1) {
u->color = 0;
k->parent->color = 0;
k->parent->parent->color = 1;
k = k->parent->parent;
} else {
if (k == k->parent->left) {
k = k->parent;
rightRotate(k);
}
k->parent->color = 0;
k->parent->parent->color = 1;
leftRotate(k->parent->parent);
}
} else {
u = k->parent->parent->right;
if (u->color == 1) {
u->color = 0;
k->parent->color = 0;
k->parent->parent->color = 1;
k = k->parent->parent;
} else {
if (k == k->parent->right) {
k = k->parent;
leftRotate(k);
}
k->parent->color = 0;
k->parent->parent->color = 1;
rightRotate(k->parent->parent);
}
}
if (k == root) {
break;
}
}
root->color = 0;
}
void printHelper(NodePtr root, string indent, bool last) {
if (root != TNULL) {
cout << indent;
if (last) {
cout << "R----";
indent += " ";
} else {
cout << "L----";
indent += "| ";
}
string sColor = root->color ? "RED" : "BLACK";
cout << root->data << "(" << sColor << ")" << endl;
printHelper(root->left, indent, false);
printHelper(root->right, indent, true);
}
}
public:
RedBlackTree() {
TNULL = new Node;
TNULL->color = 0;
TNULL->left = nullptr;
TNULL->right = nullptr;
root = TNULL;
}
void preorder() {
preOrderHelper(this->root);
}
void inorder() {
inOrderHelper(this->root);
}
void postorder() {
postOrderHelper(this->root);
}
NodePtr searchTree(int k) {
return searchTreeHelper(this->root, k);
}
NodePtr minimum(NodePtr node) {
while (node->left != TNULL) {
node = node->left;
}
return node;
}
NodePtr maximum(NodePtr node) {
while (node->right != TNULL) {
node = node->right;
}
return node;
}
NodePtr successor(NodePtr x) {
if (x->right != TNULL) {
return minimum(x->right);
}
NodePtr y = x->parent;
while (y != TNULL && x == y->right) {
x = y;
y = y->parent;
}
return y;
}
NodePtr predecessor(NodePtr x) {
if (x->left != TNULL) {
return maximum(x->left);
}
NodePtr y = x->parent;
while (y != TNULL && x == y->left) {
x = y;
y = y->parent;
}
return y;
}
void leftRotate(NodePtr x) {
NodePtr y = x->right;
x->right = y->left;
if (y->left != TNULL) {
y->left->parent = x;
}
y->parent = x->parent;
if (x->parent == nullptr) {
this->root = y;
} else if (x == x->parent->left) {
x->parent->left = y;
} else {
x->parent->right = y;
}
y->left = x;
x->parent = y;
}
void rightRotate(NodePtr x) {
NodePtr y = x->left;
x->left = y->right;
if (y->right != TNULL) {
y->right->parent = x;
}
y->parent = x->parent;
if (x->parent == nullptr) {
this->root = y;
} else if (x == x->parent->right) {
x->parent->right = y;
} else {
x->parent->left = y;
}
y->right = x;
x->parent = y;
}
// Inserting a node
void insert(int key) {
NodePtr node = new Node;
node->parent = nullptr;
node->data = key;
node->left = TNULL;
node->right = TNULL;
node->color = 1;
NodePtr y = nullptr;
NodePtr x = this->root;
while (x != TNULL) {
y = x;
if (node->data < x->data) {
x = x->left;
} else {
x = x->right;
}
}
node->parent = y;
if (y == nullptr) {
root = node;
} else if (node->data < y->data) {
y->left = node;
} else {
y->right = node;
}
if (node->parent == nullptr) {
node->color = 0;
return;
}
if (node->parent->parent == nullptr) {
return;
}
insertFix(node);
}
NodePtr getRoot() {
return this->root;
}
void deleteNode(int data) {
deleteNodeHelper(this->root, data);
}
void printTree() {
if (root) {
printHelper(this->root, "", true);
}
}
};

int main() {
RedBlackTree bst;
bst.insert(55);
bst.insert(40);
bst.insert(65);
bst.insert(60);
bst.insert(75);
bst.insert(57);
bst.printTree();
cout << endl << "After deleting 40:" << endl;
bst.deleteNode(40);
bst.printTree();
return 0;
}
Splaying
Splaying is a process in which a node is transferred to the root by performing suitable rotations. In
a splay tree, whenever we access any node (searching, inserting or deleting a node), it is splayed to
the root.
7.5 Operations
Splay trees support the following operations. We write S for sets, x for elements and k for key
values.
splay(S, k) returns an access to an element x with key k in the set S. In case no such element exists,
we return an access to the next smaller or larger element.
split(S, k) returns (S_1,S_2), where for each x in S_1 holds: key[x] <= k , and for each y in S_2 holds:
k < key[y].
join(S_1,S_2) returns the union S = S_1 + S_2. Condition: for each x in S_1 and each y in S_2: x
<= y.
insert(S,x) augments S by x.
delete(S,x) removes x from S.
Each split, join, delete and insert operation can be reduced to splay operations and modifications of
the tree at the root which take only constant time. Thus, the run time for each operation is
essentially the same as for a splay operation.
The most important tree operation is splay(x), which moves an element x to the root of the tree. In
case x is not present in the tree, the last element on the search path for x is moved instead. The run
time for a splay(x) operation is proportional to the length of the search path for x. While searching
for x we traverse the search path top-down. Let y be the last node on that path. In a second step, we
move y along that path by applying rotations as described later.
The time complexity of maintaining a splay tree is analyzed using an Amortized Analysis. Consider
a sequence of operations op_1, op_2, ..., op_m. Assume that our data structure has a potential. One
can think of the potential as a bank account. Each tree operation op_i has actual costs proportional
to its running time. We’re paying for the costs c_i of op_i with its amortized costs a_i. The
difference between concrete and amortized costs is charged against the potential of the data
structure. This means that we’re investing in the potential if the amortized costs are higher that the
actual costs, otherwise we’re decreasing the potential.
Thus, we’re paying for the sequence op_1, op_2, ..., op_m no more than the initial potential plus the
sum of the amortized costs a_1 + a_2 + ... + a_m.
The trick of the analysis is to define a potential function and to show that each splay operation has
amortized costs O (ln (n)). It follows that the sequence has costs O (m ln (n) + n ln (n))
Zig Rotation
The Zig Rotation in a splay tree is like the single right rotation in AVL Tree rotations. In a zig
rotation, every node moves one position to the right from its current position.
Zag Rotation
In zag rotation, every node moves one position to the left from its current position.
Zig-Zig Rotation
The Zig-Zig Rotation in splay tree is a double zig rotation. In zig-zig rotation, every node moves
two positions to the right from its current position.
Zig-Zag Rotation
The Zig-Zag Rotation is a double rotation consisting of a zig rotation followed by a zag rotation; it is
applied when the node being splayed and its parent are children on opposite sides (one a left child,
the other a right child).
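A minimal C sketch of the basic zig (single right) and zag (single left) steps used in splaying, together with a zig-zig built from two zig steps; the node layout and names are illustrative, and a full splay routine would repeat such steps until the accessed node reaches the root.
#include <stdio.h>

struct snode {
    int key;
    struct snode *left, *right;
};

/* Zig: single right rotation about p, moving its left child up one level. */
struct snode *zig(struct snode *p)
{
    struct snode *x = p->left;
    p->left = x->right;
    x->right = p;
    return x;                  /* x is now one step closer to the root */
}

/* Zag: single left rotation about p, moving its right child up one level. */
struct snode *zag(struct snode *p)
{
    struct snode *x = p->right;
    p->right = x->left;
    x->left = p;
    return x;
}

/* Zig-zig: rotate at the grandparent first, then at the (new) parent. */
struct snode *zig_zig(struct snode *g)
{
    g = zig(g);
    return zig(g);
}

int main(void)
{
    struct snode a = {10, NULL, NULL};
    struct snode b = {20, &a, NULL};
    struct snode c = {30, &b, NULL};            /* left chain 30-20-10 */
    struct snode *root = zig_zig(&c);
    printf("root after zig-zig: %d\n", root->key);   /* prints 10 */
    return 0;
}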
Disadvantages
A splay tree can arrange itself linearly. Therefore, the worst-case performance of a splay tree is
O(n).
Multithreaded operations can be complicated since, even in a read-only configuration, splay trees
can reorganize themselves.
2-Node
2-Node: A node with a single data element that has two child nodes.
1. Every value appearing in the child (b) 38 must be ≤ (a) 40.
2. Every value appearing in the child (c) 55 must be ≥ (a) 40.
3. The length of the path from the root of a 2-node to every leaf in its child must be the same.
3-Node
3-Node: A node with two data elements that has three child nodes.
1. Every value appearing in child P must be ≤ X.
2. Every value appearing in child Q must be in between X and Y.
3. Every value appearing in child R must be ≥ Y.
4. The length of the path from the root of a 3-node to every leaf in its child must be the same.
Insertion operation
If the tree is empty, create a node and put value into the node
Otherwise find the leaf node where the value belongs.
If the leaf node has only one value, put the new value into the node.
If the leaf node already has two values, add the new value, split the node and promote the median
of the three values to the parent.
If the parent then has three values, continue to split and promote, forming a new root node if
necessary
Search operation
If Tree is empty, return False (data item cannot be found in the tree).
If current node contains data value which is equal to data, return True.
If we reach the leaf-node and it doesn’t contain the required key value, return False.
Recursive Calls
If data < currentNode.leftVal, we explore the left sub tree of the current node.
Else if currentNode.leftVal < data < currentNode.rightVal, we explore the middle sub tree of the
current node.
Else if data > currentNode.rightVal, we explore the right sub tree of the current node.
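A minimal C sketch of this search, assuming an illustrative 2-3 node layout in which a 2-node uses one key and two children while a 3-node uses two keys and three children; the names node23 and search23 are not from the text.
#include <stdio.h>

struct node23 {
    int nkeys;                  /* 1 for a 2-node, 2 for a 3-node */
    int keys[2];                /* keys[0] <= keys[1] when nkeys == 2 */
    struct node23 *child[3];    /* left, middle, right (all NULL at a leaf) */
};

/* Returns 1 if data is present in the 2-3 tree rooted at t, 0 otherwise. */
int search23(const struct node23 *t, int data)
{
    if (t == NULL)
        return 0;                               /* empty tree / fell off a leaf */
    if (data == t->keys[0] || (t->nkeys == 2 && data == t->keys[1]))
        return 1;                               /* found in this node */
    if (data < t->keys[0])
        return search23(t->child[0], data);     /* left subtree */
    if (t->nkeys == 1 || data < t->keys[1])
        return search23(t->child[1], data);     /* middle (or right of a 2-node) */
    return search23(t->child[2], data);         /* right subtree of a 3-node */
}

int main(void)
{
    struct node23 l = {1, {10, 0}, {NULL, NULL, NULL}};
    struct node23 r = {2, {30, 40}, {NULL, NULL, NULL}};
    struct node23 root = {1, {20, 0}, {&l, &r, NULL}};
    printf("%d %d\n", search23(&root, 40), search23(&root, 25));  /* prints 1 0 */
    return 0;
}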
Deletion process
There are three cases in deletion process
1. When the record is to be removed from a leaf node containing two records.
In this case, the record is simply removed, and no other nodes are affected.
2. When the only record in a leaf node is to be removed.
3. When a record is to be removed from an internal node.
In both the second and the third cases, the deleted record is replaced with another that can take its
place while maintaining the correct order of 2-3 tree.
Summary
A Red Black Tree is a self-balancing binary search tree in which each node has a red or black
colour.
The red black tree satisfies all of the features of the binary search tree, but it also has several
additional properties.
Splay trees are self-adjusting binary search trees in which every access for insertion or retrieval
of a node, lifts that node all the way up to become the root, pushing the other nodes out of the
way to make room for this new root of the modified tree. Hence, the frequently accessed nodes
will frequently be lifted up and remain around the root position; while the most infrequently
accessed nodes would move farther and farther away from the root.
A 2-3 tree data structure is a specific form of a B tree where every node with children has either
two children and one data element or three children and two data elements.
Keywords
AVL tree
2-3 tree
Zag rotation / Left rotation
Zig zag / Zig followed by zag
Zag zig / Zag followed by zig
Zig zig / two right rotations
Zag zag / two left rotations
2-Node, 3-Node
Self Assessment
1. Red Black Tree invented in
A. 1960
B. 1972
C. 1976
D. None of above
A. O(1)
B. O(log n)
C. O(0)
D. None of above
A. Red
B. Black
C. Green
D. All of above
A. AVL tree
B. BST
C. Binary tree
D. All of above
A. Red
B. Black
C. Blue
D. None of above
A. Insertion
B. Deletion
C. Both insertion and deletion
D. Search
A. BST
B. Binary tree
C. Splay tree
D. All of above
A. Zig rotation
B. Zag rotation
C. Zag-zag rotation
D. None of above
A. Zag zag
B. Zag zig
C. Zig zig
D. All of above
10. What are factors for selecting a type of rotation
A. Insertion
B. Deletion
C. Search
D. All of above
12. 2-node is
A. A node with a double data element that has two child nodes.
B. A node with a single data element that has two child nodes.
C. A node with a single data element that has one child nodes.
D. All of above
13. 3-node is
A. A node with two data elements that has three child nodes
B. A node with three data elements that has three child nodes
C. A node with two data elements that has two child nodes
D. None of above
A. True
B. False
6. C 7. C 8. B 9. C 10. D
Review Questions
1. Discuss red black tree properties.
2. Define recolor and rotation process.
3. Differentiate between zig and zag rotations.
4. Discuss concept of 2-node and 3-node with suitable diagram.
5. Explain splay operation in splay trees.
6. "The time complexity of maintaining a splay tree is analyzed using an Amortized Analysis."
Explain.
7. "A splay tree does not keep track of heights and does not use any balance factors like an
AVL tree." Explain.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann, Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Edition,
Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private
Limited
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.cs.auckland.ac.nz/software/AlgAnim/red_black.html
https://www.javatpoint.com/daa-red-black-tree
http://www.cs.cornell.edu/courses/cs3110/2011sp/Recitations/rec25-splay/splay.htm
http://www.btechsmartclass.com/data_structures/splay-trees.html
Objectives
After studying this unit, you will be able to:
Introduction
The heap data structure is a complete binary tree where each node of the tree has an orderly
relationship with its successors. Binary search trees are totally ordered, but the heap data structure
is only partially ordered. It is well suited to operations such as inserting an element and deleting
the minimum (or maximum) value.
Heap is an array object that is considered as a complete binary tree. Each node of the tree
corresponds to an element of the array that stores the value in the node. The tree is completely
filled at all levels except possibly the lowest, which is filled from the left upwards to a point. Heap
data structures are suitable for implementing priority queues. The heap serves as a foundation of a
theoretically important sorting algorithm called heap sort, which we will discuss after defining the
heap.
8.1 Heap
A heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node
of A, then key (A) ≥ key(B). This implies that an element with the greatest key is always in the root
node, and so such a heap is sometimes called a max-heap. (Alternatively, if the comparison is
reversed, the smallest element is always in the root node, which results in a min-heap.) The heap is
one maximally-efficient implementation of an abstract data type called a priority queue. Heaps are
crucial in several efficient graph algorithms.
A heap is a storage pool in which regions of memory are dynamically allocated. For example, in
C++ the space for a variable is allocated essentially in one of three possible places: Global variables
are allocated in the space of initialized static variables; the local variables of a procedure are
allocated in the procedure’s activation record, which is typically found in the processor stack; and
dynamically allocated variables are allocated in the heap. In this unit, the term heap is taken to
mean the storage pool for dynamically allocated variables.
We consider heaps and heap-ordered trees in the context of priority queue implementations. While
it may be possible to use a heap to manage a dynamic storage pool, typical implementations do not.
In this context, the technical meaning of the term heap is closer to its dictionary definition: "a pile
of many things."
A binary tree has the heap property iff
1. It is empty or
2. The key in the root is larger than that in either child and both subtrees have the heap property.
A heap can be used as a priority queue: the highest priority item is at the root and is trivially
extracted. But if the root is deleted, you are left with two sub-trees and you must efficiently re-
create a single tree with the heap property.
The value of the heap structure is that you can both extract the highest priority item and insert a
new one in O(logn) time.
A heap can be defined as binary trees with keys assigned to its nodes (one key per node). The two
types of heaps are:
1. Max heaps
2. Min heaps
Max heaps
The key present at the root node must be greatest or equal to the keys present at all of its
children.The same property must be true for all sub-trees in that Binary Tree.
Max heap
Min heaps
The key present at the root node must be less than or equal to the keys present at all of its
children. The same property must be true for all sub-trees in that binary tree.
Example:Max heap
Method 1
1. Remove root node.
2. Move the last element of last level to root.
3. Compare the value of this node with its children, swap with the larger child if needed, and repeat until the heap property is restored.
Method 2
1. Select the element to be deleted.
2. Swap it with the last element.
3. Remove the last element.
4. Heapify the tree.
Priority queue
In a priority queue, a key is associated with every element. The element with the highest priority is
moved to the front of the queue and the one with the lowest priority moves to the back of the
queue. The queue returns elements according to priority. However, if elements with the same priority
occur, they are served according to their order in the queue.
One of the most important applications of priority queues is in discrete event simulation.
Simulation is a tool which is used to study the behavior of complex systems. The first step in
simulation is modeling. You construct a mathematical model of the system you wish to study. Then
you write a computer program to evaluate the model.
The systems studied using discrete event simulation have the following characteristics: The system
has a state which evolves or changes with time. Changes in state occur at distinct points in
simulation time. A state change moves the system from one state to another instantaneously. State
changes are called events.
A priority queue is a queue with items having an orderable characteristic called priority. The
objects having the highest priority are always removed first from the priority queues. A priority
queue can be obtained by creating a heap. First call a function that creates an ascending heap.
After creating the heap, delete the root node and call a function to recreate the heap for the
remaining elements. This method helps in implementing an ascending priority queue. In the same
way, we can implement a descending priority queue.
A max-priority queue returns the element with maximum key first. A max-heap is used for a
max-priority queue.
A min-priority queue returns the element with the smallest key first. A min-heap is used for a
min-priority queue.
Ascending order priority queue: In an ascending order priority queue, a lower priority number is
treated as a higher priority.
Descending order priority queue: In a descending order priority queue, a higher priority number
is treated as a higher priority.
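As a concrete illustration of these two orderings, the short C++ sketch below uses the standard library container std::priority_queue, which is typically built on a binary heap; the keys shown are illustrative. By default the container behaves as a max-priority queue, and supplying the greater<int> comparator turns it into a min-priority queue. Both push() and pop() take O(log n) time, matching the heap operations described in this unit.
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
using namespace std;

int main() {
    priority_queue<int> maxPQ;                               // max-priority queue (max-heap)
    priority_queue<int, vector<int>, greater<int>> minPQ;    // min-priority queue (min-heap)
    int keys[] = {4, 8, 3, 7};                               // illustrative keys
    for (int key : keys) {
        maxPQ.push(key);                                     // O(log n) insertion
        minPQ.push(key);
    }
    cout << "Highest priority (max): " << maxPQ.top() << endl;   // prints 8
    cout << "Highest priority (min): " << minPQ.top() << endl;   // prints 3
    maxPQ.pop();                                             // O(log n) removal of the root
    cout << "Next highest (max): " << maxPQ.top() << endl;       // prints 7
    return 0;
}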
Insertion operation
To insert a new key into a heap, add a new node with key K after the last leaf of the existing heap.
Then, shift K up to its suitable place in the new heap. Consider inserting value 8 into the heap
shown in the figure
Compare 8 with its parent key. Stop if the parent key is greater than or equal to 8. Else, swap these
two keys and compare 8 with its new parent (Refer to figure 14.8). This swapping continues until 8
is not greater than its last parent or it reaches the root. In this algorithm too, we can shift up an
empty node until it reaches its proper position, where it acquires the value 8.
This insertion operation does not require more key comparisons than the heap’s height. Since the
height of a heap with n nodes is about log2n, the time efficiency of insertion is in O(log n).
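A minimal array-based sketch of this sift-up insertion is shown below; the array heap, the counter heapSize and the function insertKey are illustrative names, and 1-based indexing is used so that the parent of position i is i/2.
#include <iostream>
#include <algorithm>
using namespace std;

const int MAX_SIZE = 100;        // illustrative fixed capacity
int heap[MAX_SIZE + 1];          // heap[1..heapSize]; the parent of position i is i/2
int heapSize = 0;

// Add a new node with key K after the last leaf, then sift K up to its proper place.
void insertKey(int K) {
    heap[++heapSize] = K;
    int i = heapSize;
    while (i > 1 && heap[i / 2] < heap[i]) {   // parent is smaller: swap and move up one level
        swap(heap[i], heap[i / 2]);
        i /= 2;
    }
}

int main() {
    int keys[] = {10, 5, 12, 7};               // illustrative keys
    for (int k : keys) insertKey(k);
    insertKey(8);                              // inserting the value 8, as discussed above
    cout << "Root (largest key): " << heap[1] << endl;   // prints 12
    return 0;
}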
Delete operation
If nodeToBeDeleted is the leafNode
    remove the node
Else
    swap nodeToBeDeleted with the lastLeafNode
    remove nodeToBeDeleted
    heapify the array
Delete operation
We can determine the efficiency of deletion by the number of key comparisons required to
“heapify” the tree after the swap is done, and the size of the tree is decreased by 1. Since it does not
need more key comparisons than twice the heap’s height, the time efficiency of deletion is in O(log
n).
Summary
A heap is a partially sorted binary tree. Although a heap is not completely in order, it conforms
to a sorting principle: every node has a value less (for the sake of simplicity, we assume that
all orderings are from least to greatest) than either of its children.
The heap data structure is a complete binary tree where each node of the tree relates to an
element of the array that stores the value in the node.
The two principal ways to construct a heap are by using the bottom-up heap construction
algorithm and the top-down heap construction algorithm
A heap is used to implement heapsort. Heapsort is a comparison-based sorting algorithm which
has a worst-case of O (n log n) runtime.
A priority queue is a queue with items having an orderable characteristic called priority. The
objects having the highest priority are always removed first from the priority queues.
Priority queue can be attained by creating a heap.
Keywords
Ascending Heap: It is a complete binary tree in which the value of each node is greater
than or equal to the value of its parent.
Heapify: Heapify is a procedure for manipulating heap data structures.
N-ary Tree: An n-ary tree is either an empty tree, or a non-empty set of nodes which
consists of a root and exactly N sub-trees. The degree of each node of an N-ary tree is
either zero or N.
Heap: A heap is a specialized tree-based data structure that satisfi es the heap property: if
B is a child node of A, then key(A) ≥ key(B).
Binary Heap: A binary heap is a heap-ordered binary tree which has a very special shape
called a complete tree.
Discrete Event Simulation: One of the most important applications of priority queues is
in discrete event simulation.
Self Assessment
1. Heap satisfy following properties
A. Structural property
B. Ordering property
C. Both Structural and Ordering property
D. None of above
A. Max-Heap
B. Min-Heap
5. In which heap the root node must be greatest among the keys present at all of its children?
A. Max-heap
B. Min-heap
C. Both A and B
D. None of the above
A. O(log n)
B. O(log h)
C. Both O(log n) and O(log h)
D. None of above
8. In the worst case, the time complexity of inserting a node in a heap would be
A. O(logN)
B. O(1)
C. O(H)
D. None of above
A. Max
B. Min
C. Descending order
D. All of above
A. Delete
B. Peeking from the Priority Queue
C. Isfull
D. Extract-Max/Min from the Priority Queue
A. Arrays,
B. Linked list,
C. Heap data structure and binary search tree
D. All of above
A. Arrays,
B. Linked list,
C. Heap data structure
D. Binary search tree
15. What is the time complexity to insert a node based on key in a priority queue?
A. O(nlogn)
B. O(n)
C. O(1)
D. O(n2)
6. C 7. C 8. A 9. D 10. D
Review Questions
1. Discuss heap properties.
2. “A heap can be implemented as an array by recording its elements in top-down left-to-
right manner”. Describe in detail.
3. “Binary search property is different from heap property”. Justify.
4. Describe priority heap with example.
5. What are the applications of Priority Queue?
6. Represent the max heap and min heap for the data 3, 8, 20, 28, 42, 54.
7. Differentiate between max heap and min heap with example.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed.,
Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private
Limited
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.programiz.com/dsa/heap-data-structure
https://www.javatpoint.com/heap-data-structure
https://www.tutorialspoint.com/data_structures_algorithms/heap_data_structure.htm
Objectives
After studying this unit, you will be able to:
Introduction
A heap is a complete binary tree, and a binary tree is one in which each node can have no more
than two children. A complete binary tree is one in which all levels except the last, i.e., the leaf
node, are completely filled and all nodes are justified to the left.
The elements of a heap sort are processed by generating a min-heap or max-heap with the items of
the provided array. The ordering of an array in which the root element reflects the array's minimal
or maximum element is known as min-heap or max-heap.Heapsort is a well-liked and efficient
sorting method. The idea behind heap sort is to remove elements from the heap part of the list one
by one and then insert them into the sorted part.
Algorithm
HeapSort(arr)
    BuildMaxHeap(arr)
    for i = length(arr) downto 2
        swap arr[1] with arr[i]
        heap_size[arr] = heap_size[arr] - 1
        MaxHeapify(arr, 1)
End

BuildMaxHeap(arr)
    heap_size[arr] = length(arr)
    for i = length(arr)/2 downto 1
        MaxHeapify(arr, i)
End

MaxHeapify(arr, i)
    L = left(i)
    R = right(i)
    if L <= heap_size[arr] and arr[L] > arr[i]
        largest = L
    else
        largest = i
    if R <= heap_size[arr] and arr[R] > arr[largest]
        largest = R
    if largest != i
        swap arr[i] with arr[largest]
        MaxHeapify(arr, largest)
End
Heap sort first converts the initial array into a heap. The heapsort algorithm uses the ‘heapify’ method
to complete the task. The heapify algorithm, as given in the above code, receives a binary tree as
input and converts it to a heap. The root is compared with its two children, and if the larger child is
greater than the root, the two are swapped. This may result in the corresponding left or right sub-tree
losing the heap property. As a result, the heapify algorithm is recursively applied to the sub-tree rooted at
the node whose value was swapped with the root. This process continues until a leaf node is
reached, or until the heap property is satisfied in the sub-tree.
#include <iostream>
#include <algorithm>
using namespace std;
/* Heapify the subtree rooted at index i; n is the size of the heap */
void heapify(int a[], int n, int i)
{
    int largest = i, left = 2 * i + 1, right = 2 * i + 2;
    if (left < n && a[left] > a[largest]) largest = left;
    if (right < n && a[right] > a[largest]) largest = right;
    if (largest != i) {
        swap(a[i], a[largest]);
        heapify(a, n, largest);
    }
}
/*Function to implement the heap sort*/
void heapSort(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)   // build the initial max-heap
        heapify(a, n, i);
    for (int i = n - 1; i > 0; i--) {      // move the current maximum to the end
        swap(a[0], a[i]);
        heapify(a, i, 0);
    }
}
/* function to print the array elements */
void printArr(int a[], int n)
{
    for (int i = 0; i < n; ++i)
        cout << a[i] << " ";
}
int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};   // sample data (illustrative)
    int n = sizeof(a) / sizeof(a[0]);
    heapSort(a, n);
    printArr(a, n);
    return 0;
}
If the binomial tree is represented as B0, then the tree consists of a single node. In general terms, Bk
consists of two binomial trees, i.e., two copies of Bk-1 are linked together such that one tree becomes the
left sub-tree of the other.
Binomial Tree B0
If B0, k= 0, there would be only one node in the tree
Binomial Tree B1
If B1, k= 1, means k-1 equal to 0. Therefore, there would be two binomial trees of B0 in which one
B0 becomes the left sub tree of another B0.
Binomial Tree B2
If B2, k= 2, means k-1 equal to 1. Therefore, there would be two binomial trees of B1 in which one
B1 becomes the left sub tree of another B1.
Binomial Tree B3
If B3 , k= 3, means k-1 equal to 2. Therefore, there would be two binomial trees of B2 in which one
B2 becomes the left sub tree of another B2.
Find minimum
To find the minimum element of the heap, find the minimum among the roots of the binomial
trees. This requires O(log n) time. It can be optimized to O(1) by maintaining a pointer to the
minimum-key root.
Decrease Key
We compare the decreased key with its parent and, if the parent's key is greater, we swap the keys and
recur for the parent. The swapping stops when we either reach a node whose parent has a smaller key or
we hit the root node. The time complexity of decreaseKey() is O(log n).
The child nodes of a parent node are connected to each other through a circular doubly linked
list. Deleting a node from the tree takes O(1) time. The concatenation of two such lists takes O(1)
time.
Fibonacci heaps have a faster amortized running time than other heap types and a less rigid
structure as compared to binomial heaps. Fibonacci heaps are used to implement the priority queue
in Dijkstra's algorithm. The reduced time complexity of Decrease-Key is important in the Dijkstra
and Prim algorithms. With a binary heap, the time complexity of these algorithms is
O(V log V + E log V). If a Fibonacci heap is used, the time complexity improves to O(V log V + E).
Insertion
Algorithm: Insertion
insert(H, x)
degree[x] = 0
p[x] = NIL
child[x] = NIL
left[x] = x
right[x] = x
mark[x] = FALSE
concatenate the root list containing x with root list H
if min[H] == NIL or key[x] < key[min[H]]
then min[H] = x
n[H] = n[H] + 1
Union
Steps for Union of two Fibonacci heaps.
Extract Min
In extract min minimum value is removed from the heap and the tree is re-adjusted.
Steps for Extract Min
Decrease Key
In decreasing a key operation, the value of a key is decreased to a lower value.
Decrease the value of the node ‘x’ to the new chosen value.
CASE 1 - If min heap order not violated,
Update min pointer if necessary.
Deleting a Node
This process makes use of decrease-key and extract-min operations. The following steps are
followed for deleting a node.
Let k be the node to be deleted.
Apply decrease-key operation to decrease the value of k to the lowest possible value (i.e. -∞).
Apply extract-min operation to remove this node.
struct node {
    int n;                         // key value stored in the node
    int degree;
    char mark;                     // 'T' if the node has lost a child, otherwise 'F'
    char C;                        // visit flag used by Find()
    node *parent, *child, *left, *right;
};
class FibonacciHeap {
    int nH = 0;                    // number of nodes currently in the heap
    node *H;
public:
node *InitializeHeap();
int Fibonnaci_link(node *, node *, node *);
node *Create_node(int);
node *Insert(node *, node *);
node *Union(node *, node *);
node *Extract_Min(node *);
int Consolidate(node *);
int Display(node *);
node *Find(node *, int);
int Decrease_key(node *, int, int);
int Delete_key(node *, int);
int Cut(node *, node *, node *);
int Cascade_cut(node *, node *);
FibonacciHeap() { H = InitializeHeap(); }
};
// Initialize heap
node *FibonacciHeap::InitializeHeap() {
node *np;
np = NULL;
return np;
}
// Create node
node *FibonacciHeap::Create_node(int value) {
node *x = new node;
x->n = value;
return x;
}
// Insert node
node *FibonacciHeap::Insert(node *H, node *x) {
x->degree = 0;
x->parent = NULL;
x->child = NULL;
x->left = x;
x->right = x;
x->mark = 'F';
x->C = 'N';
if (H != NULL) {
(H->left)->right = x;
x->right = H;
x->left = H->left;
H->left = x;
if (x->n < H->n)
H = x;
} else {
H = x;
}
nH = nH + 1;
return H;
}
// Create linking
int FibonacciHeap::Fibonnaci_link(node *H1, node *y, node *z) {
(y->left)->right = y->right;
(y->right)->left = y->left;
if (z->right == z)
H1 = z;
y->left = y;
y->right = y;
y->parent = z;
if (z->child == NULL)
z->child = y;
y->right = z->child;
y->left = (z->child)->left;
((z->child)->left)->right = y;
(z->child)->left = y;
if (y->n < (z->child)->n)
z->child = y;
z->degree++;
}
// Union Operation
node *FibonacciHeap::Union(node *H1, node *H2) {
node *np;
node *H = InitializeHeap();
H = H1;
(H->left)->right = H2;
(H2->left)->right = H;
np = H->left;
H->left = H2->left;
H2->left = np;
return H;
}
// Display the heap
int FibonacciHeap::Display(node *H) {
node *p = H;
if (p == NULL) {
(z->left)->right = z->right;
(z->right)->left = z->left;
H1 = z->right;
if (x->right == x)
H1 = x;
A[d] = NULL;
d = d + 1;
}
A[d] = x;
x = x->right;
}
while (x != H1);
H = NULL;
for (int j = 0; j <= D; j++) {
if (A[j] != NULL) {
A[j]->left = A[j];
A[j]->right = A[j];
if (H != NULL) {
(H->left)->right = A[j];
A[j]->right = H;
A[j]->left = H->left;
H->left = A[j];
if (A[j]->n < H->n)
H = A[j];
} else {
H = A[j];
}
if (H == NULL)
H = A[j];
else if (A[j]->n < H->n)
H = A[j];
}
}
}
y->mark = 'T';
} else
{
Cut(H1, y, z);
Cascade_cut(H1, z);
}
}
}
// Search function
node *FibonacciHeap::Find(node *H, int k) {
node *x = H;
x->C = 'Y';
node *p = NULL;
if (x->n == k) {
p = x;
x->C = 'N';
return p;
}
if (p == NULL) {
if (x->child != NULL)
p = Find(x->child, k);
if ((x->right)->C != 'Y')
p = Find(x->right, k);
}
x->C = 'N';
return p;
}
// Deleting key
int FibonacciHeap::Delete_key(node *H1, int k) {
node *np = NULL;
int t;
t = Decrease_key(H1, k, -5000);
if (!t)
np = Extract_Min(H);
if (np != NULL)
cout << "Key Deleted" << endl;
else
cout << "Key not Deleted" << endl;
return 0;
}
int main() {
int n, m, l;
FibonacciHeap fh;
node *p;
node *H;
H = fh.InitializeHeap();
p = fh.Create_node(7);
H = fh.Insert(H, p);
p = fh.Create_node(3);
H = fh.Insert(H, p);
p = fh.Create_node(17);
H = fh.Insert(H, p);
p = fh.Create_node(24);
H = fh.Insert(H, p);
fh.Display(H);
p = fh.Extract_Min(H);
if (p != NULL)
cout << "The node with minimum key: " << p->n << endl;
else
cout << "Heap is empty" << endl;
m = 26;
l = 16;
fh.Decrease_key(H, m, l);
m = 16;
fh.Delete_key(H, m);
}
Complexities
Insertion O(1)
Find Min O(1)
Union O(1)
Extract Min O(log n)
Decrease Key O(1)
Delete Node O(log n)
Summary
Heap sort is a sorting technique based upon the binary heap data structure. It is a comparison-based
sorting technique.
The elements of a heap sort are processed by generating a min-heap or max-heap with the
items of the provided array.
Keywords
Max heap, Min heap, Binomial heap, Heap sort, Extract Min, Decrease Key
Self Assessment
1. Heap sort is___
A. Max heap
B. Min heap
C. Both Max and Min heap
D. None of above
A. Max heap
B. Min heap
C. Both Max and Min heap
D. None of above
A. Max heap
B. Min heap
C. Both Max and Min heap
D. None of above
5. Complexity of the Heap Sort in worst case is___
A. (log 1)
B. (log n)
C. (n log n)
D. None of above
A. Min heap
B. Max heap
C. Both max and min heap
D. None of above
A. Binary trees.
B. AVL trees.
C. Binomial trees.
D. None of above
A. 0
B. 1
C. 2
D. 3
A. 1
B. 2
C. 3
D. None of above
A. Union
B. Extract Min
C. Peek
D. Decrease a key
A. Maximum
B. Minimum
C. Both minimum and maximum
D. None of above
14. The child nodes of a parent node are connected to each other through______
A. (1)
B. (0)
C. (log n)
D. None of above
6. B 7. C 8. C 9. D 10. C
Review Questions
1. What are the steps for heap sort operation?
2. Write algorithm for heap sort.
3. Explain complexity of heap sort.
4. Define binomial Heap with suitable example.
5. Discuss different operations of binomial heap
6. Describe insert and union operations in Fibonacci Heap.
7. Explain different cases of Decrease Key.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed.,
Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private
Limited
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.tutorialspoint.com/fibonacci-heaps-in-data-structure
https://www.cl.cam.ac.uk/teaching/1415/Algorithms/fibonacci.pdf
http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap20.htm
http://www.cs.toronto.edu/~anikolov/CSC265F18/binomial-heaps.pdf
Objectives
After studying this unit, you will be able to:
Introduction
In this unit, we introduce you to an important mathematical structure called Graph. Graphs have
found applications in subjects as diverse as Sociology, Chemistry, Geography and Engineering
Sciences. They are also widely used in solving games and puzzles. In computer science, graphs are
used in many areas one of which is computer design. In day-to-day applications, graphs find their
importance as representations of many kinds of physical structure.
We use graphs as models of practical situations involving routes: the vertices represent the cities
and edges represent the roads or some other links, specially in transportation management,
Assignment problems and many more optimization problems. Electric circuits are another obvious
example where interconnections between objects play a central role. Circuit’s elements like
transistors, resistors, and capacitors are intricately wired together. Such circuits can be represented
and processed within a computer in order to answer simple questions like “Is everything connected
together?” as well as complicated questions like “If this circuit is built, will it work?”
10.1 Graphs
A Graph G consists of a set V of vertices (nodes) and a set E of edges (arcs). We write G = (V, E). V
is a finite and non-empty set of vertices. E is a set of pairs of vertices; these pairs are called edges.
Therefore, a graph is completely specified by the pair (V, E).
In an undirected graph, the pair of vertices representing any edge is unordered. Thus (v,w) and (w,v)
represent the same edge. In a directed graph each edge is an ordered pair of vertices, i.e. each edge
is represented by a directed pair. If e = (v,w), then v is the tail or initial vertex and w is the head or
final vertex. Consequently, (v,w) and (w,v) represent two different edges.
A directed graph may be pictorially represented as given in the figure.
Directed graph
The direction is indicated by an arrow. The set of vertices for this graph remains the same as that
of the graph in the earlier example, i.e.
V(G) = {1, 2, 3, 4, 5}
However the set of edges would be
E(G) = {(1,2), (2,3), (3,4), (5,4), (5,1), (1,3), (5,3)}
Vertices
Edges
Path
Closed path
Degree of the Node
Adjacent Nodes/ Adjacency
Vertices: Each node of the graph is represented as a vertex.
Edge: It is used to represent the relationships between various nodes in a graph. An edge between
two nodes expresses a one-way or two-way relationship between the nodes.
Path: A path represents a sequence of edges between two vertices, e.g. A-B-C.
Closed Path: A path will be called as closed path if the initial node is same as terminal node.
Degree of the Node: A degree of a node is the number of edges that are connected with that node.
Degree of A=3.
Adjacent Nodes/Adjacency: If two nodes are connected to each other through an edge, they are called
neighbors or adjacent nodes.
Undirected graph
In an undirected graph the nodes are connected by bi-directional edges, i.e. the edges do not
point in any specific direction.
Directed graph
A directed graph is a graph in which all the edges are uni-directional i.e. the edges point in a single
direction. It is also called a digraph.
Weighted graph
In a weighted graph, edges or paths have values or costs associated with them. The values associated
with the edges are called weights.
Un-weighted graph
In an un-weighted graph there is no value or weight associated with an edge. By default, all
graphs are un-weighted.
Complete graph
A complete graph is one in which every node is connected with all other nodes. A complete
graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.
Finite graph
The graph G=(V, E) is called a finite graph if the number of vertices and edges in the graph is
limited in number
Trivial Graph
A graph G= (V, E) is trivial if it contains only a single vertex and no edges.
Multi Graph
If there are numerous edges between a pair of vertices in a graph G= (V, E), the graph is referred to
as a multi graph. There are no self-loops in a Multi graph.
Pseudo graph
If a graph G= (V, E) contains a self-loop besides other edges, it is a pseudo graph.
Labeled graph
A graph G=(V, E) is called a labeled graph if its edges are labeled with some name or data.
Adjacency Matrix
Two vertices are called adjacent, or neighbors, if they share at least one common edge. A finite graph
can be represented in the form of a square matrix. A Boolean value (0, 1) in the matrix indicates
whether there is a direct path between two vertices.
The adjacency matrix is also called a 2D matrix that is used to map the association between the graph
nodes. If a graph has n vertices, then the adjacency matrix of that graph is n x n, and each entry of the
matrix represents the number of edges from one vertex to another.
Applications of the adjacency matrix include navigation tasks, representing finite graphs, and creating
routing tables in networks.
The adjacency list representation maintains, for each node of the graph, a list of the nodes adjacent to it.
The adjacency list representation is better for sparse graphs because the space required is O(V + E),
as contrasted with the O(V²) required by the adjacency matrix representation.
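As a small illustration of this representation, the sketch below stores, for each vertex, a list of its neighbours; the vertex count, the sample edges and the helper addEdge are assumed for the example, and the total space grows as O(V + E).
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int V = 5;                                  // illustrative number of vertices 0..4
    vector<vector<int>> adj(V);                 // adj[u] holds the neighbours of u

    // For an undirected graph, store each edge in both adjacency lists.
    auto addEdge = [&](int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    };
    addEdge(0, 1); addEdge(0, 2); addEdge(1, 3); addEdge(3, 4);

    for (int u = 0; u < V; ++u) {               // print each vertex and its adjacency list
        cout << u << ":";
        for (int v : adj[u]) cout << " " << v;
        cout << endl;
    }
    return 0;
}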
An undirected connected graph can have more than one spanning tree.
All the possible spanning trees of a graph have the same number of edges and vertices.
The spanning tree does not have any cycle / loops.
Any connected and undirected graph will always have at least one spanning tree.
Spanning tree is always minimally connected. Removing one edge from the spanning tree
will make the graph disconnected
A spanning tree is maximally acyclic. Adding one edge to the spanning tree will create a
cycle or loop.
Spanning tree has n-1 edges, where n is the number of nodes (vertices).
A complete graph can have a maximum of n^(n-2) spanning trees.
From a complete graph, by removing maximum e - n + 1 edges, we can construct a spanning
tree.
Example: If we have n = 4, the maximum number of possible spanning trees is equal to 4^(4-2) =
16. Thus, 16 spanning trees can be formed from a complete graph with 4 vertices.
(Figures a–e: a weighted graph and its candidate spanning trees.)
Minimum spanning tree = (c), sum = 9.
Summary
A Graph G consists of a set V of vertices (nodes) and a set E of edges (arcs).
In an undirected graph the nodes are connected by bi-directional edges, i.e. the edges do not
point in any specific direction.
If there are multiple edges between a pair of vertices in a graph G = (V, E), the graph is
referred to as a multi graph. There are no self-loops in a multi graph.
Two vertices are called adjacent, or neighbors, if they share at least one common edge. A finite
graph can be represented in the form of a square matrix.
Every undirected and connected graph has a minimum of one spanning tree.
Keywords
Vertices, Edges, Path, Closed path, Degree of the Node, Spanning tree
Self Assessment
1. Graph is collection of ___
A. Vertices
B. Edges
C. Both vertices and edges
D. None of above
A. Path
B. Extract Min
C. Edge
D. Closed path
A. Pseudo
B. Trivial
C. Disconnected
D. All of above
A. Adjacency Matrix
B. Adjacency List
C. Both Adjacency Matrix and Adjacency List
D. None of above
A. Navigation tasks
B. It is used to represent finite graphs
C. Creating routing table in networks
D. All of above
A. Connected
B. Directed
C. Centrality
D. Bidirectional
6. C 7. D 8. A 9. A 10. B
Review Questions
1. Define graph and its different types.
2. Discuss edge and vertices with example.
3. How to find degree of node?
4. Differentiate between directed and weighted graph with example.
5. Give an example of Adjacency List representation.
6. How spanning tree is different from minimum spanning tree?
7. What are the applications of spanning tree?
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas
Ottmann, Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed.,
Addison-Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Web Links
www.en.wikipedia.org
www.web-source.net
www.webopedia.com
https://www.javatpoint.com/spanning-tree
https://www.programiz.com/dsa/graph
Objectives
After studying this unit, you will be able to:
Introduction
Graph traversal entails visiting each vertex and edge in a predetermined order. You must verify
that each vertex of the graph is visited exactly once when utilizing certain graph algorithms. The
sequence in which the vertices are visited is crucial, and it may be determined by the algorithm or
question you are working on. It is critical to keep track of which vertices have been visited throughout
a traversal; marking vertices is the most popular way of tracking them.
Graph traversal means visiting every vertex and edge exactly once in a well-defined order. In graph
algorithms, you must ensure that each vertex of the graph is visited exactly once. The order in
which the vertices are visited may depend upon the algorithm or the type of problem to be solved.
Two common elementary algorithms for searching a graph are
– Breadth-first search (BFS)
– Depth-first search (DFS).
Both of these algorithms work on directed or undirected graphs. Many advanced graph algorithms
are based on the ideas of BFS or DFS. Each of these algorithms traverses edges in the graph,
discovering new vertices as it proceeds. The difference is in the order in which each algorithm
discovers the edges.
Algorithm Complexity
The time complexity of the BFS algorithm is represented in the form of O(V + E), where V is the
number of nodes and E is the number of edges.
The space complexity of the algorithm is O(V).
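The sketch below shows the standard queue-based BFS on an adjacency-list graph (the graph itself is illustrative). Every vertex is marked when it is first discovered, so it is enqueued at most once, which is what gives the O(V + E) running time noted above.
#include <iostream>
#include <vector>
#include <queue>
using namespace std;

// Breadth-first search from a source vertex on an adjacency-list graph.
void bfs(const vector<vector<int>>& adj, int source) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;                        // BFS uses a FIFO queue
    visited[source] = true;
    q.push(source);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        cout << u << " ";
        for (int v : adj[u]) {           // explore all neighbours of u
            if (!visited[v]) {
                visited[v] = true;       // mark before enqueueing so v is enqueued only once
                q.push(v);
            }
        }
    }
    cout << endl;
}

int main() {
    // Illustrative undirected graph on 5 vertices
    vector<vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    bfs(adj, 0);                         // prints the vertices in breadth-first order from 0
    return 0;
}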
BFS Applications
Step 5: Push on the stack all the neighbours of N that are in the ready state (whose STATUS = 1)
and set their
STATUS = 2 (waiting state)
[END OF LOOP]
Step 6: EXIT
Algorithm Complexity
Time complexity: O(V + E), where V is the number of vertices and E is the number of edges in the
graph.
Space Complexity: O(V).
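A minimal recursive sketch of DFS on the same kind of adjacency-list graph is given below (the graph is illustrative); here the recursion stack plays the role of the explicit stack used in the step-wise algorithm above.
#include <iostream>
#include <vector>
using namespace std;

// Depth-first search: go as deep as possible along each branch before backtracking.
void dfs(const vector<vector<int>>& adj, int u, vector<bool>& visited) {
    visited[u] = true;
    cout << u << " ";
    for (int v : adj[u])
        if (!visited[v])
            dfs(adj, v, visited);
}

int main() {
    // Illustrative undirected graph on 5 vertices
    vector<vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    vector<bool> visited(adj.size(), false);
    dfs(adj, 0, visited);                // prints the vertices in depth-first order from 0
    cout << endl;
    return 0;
}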
DFS Applications
The network simplex algorithm, a method based on linear programming but specialized for
network flow
The out-of-kilter algorithm for minimum-cost flow
The push–relabel maximum flow algorithm, one of the most efficient known techniques for
maximum flow
Bottleneck capacity of a path is the minimum capacity of any edge on the path.
An augmenting path is a simple path from source to sink which does not include any cycles and
passes only through edges with positive weight.
Residual capacity is equal to the original capacity of the edge minus the current flow; it is basically
the remaining capacity of the edge.
Path: A-B-C-G
Flow = 4
Path: A-D-E-G
Flow= 4+3
Path: A-B-F-C-G
Flow = 4+3+2
Path: A-D-F-C-G
Flow = 4+3+2+1 = 10
Ford-Fulkerson Applications
Floyd-Warshall Algorithm
n = number of vertices
A = matrix of dimension n x n
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            A^k[i, j] = min(A^(k-1)[i, j], A^(k-1)[i, k] + A^(k-1)[k, j])
return A
(Figures: intermediate distance matrices D2, D3 and D4.)
Complexity
Time complexity = O(|V|³)
Space complexity = O(|V|²)
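A compact C++ sketch of the algorithm is shown below; the 4-vertex weight matrix is illustrative and INF stands for a missing edge.
#include <iostream>
#include <vector>
using namespace std;

const int INF = 100000000;   // "no edge"; small enough that INF + INF does not overflow an int

int main() {
    // Illustrative weighted directed graph given as an adjacency matrix
    vector<vector<int>> A = {
        {0,   3,   INF, 7},
        {8,   0,   2,   INF},
        {5,   INF, 0,   1},
        {2,   INF, INF, 0}};
    int n = (int)A.size();

    // After iteration k, A[i][j] holds the shortest i-to-j distance that uses only the
    // first k vertices as intermediate vertices.
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];

    for (int i = 0; i < n; ++i) {        // print the all-pairs shortest-path matrix
        for (int j = 0; j < n; ++j) cout << A[i][j] << " ";
        cout << endl;
    }
    return 0;
}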
Visited vertex: 1 2 3 5 6 4
Visited vertex: 1 3 2 5 4 6
Visited vertex: 1 2 3 5 4 6
Visited vertex: 1 3 2 5 6 4
Summary
Graphs provide an excellent way to describe the essential features of many applications.
Graphs are mathematical structures and are found to be useful in problem solving. They may
be implemented in many ways by the use of different kinds of data structures.
Graph traversals, Depth first as well as Breadth First, are also required in many applications.
Breadth first search is a graph traversal algorithm that starts traversing the graph from root
node and explores all the neighboring nodes.
DFS traversal is a recursive algorithm for searching all the vertices/ nodes of a graph or tree
using stack data structure.
The Floyd Warshall Algorithm is used for solving the All Pairs Shortest Path problem
Keywords
Network flow, BFS, DFS, Floyd-Warshall algorithm, Topological sort, Network simplex algorithm
Self Assessment
1. Which is not Graph traversal algorithm___
A. Log n
B. (V + E)
C. (log 1)
D. log n
A. Stack
B. Queue
C. Linked list
D. All of above
A. Path Finding.
B. Cycle detection in graphs.
C. Topological Sorting.
D. All of above
A. Log n
B. (V + E)
C. (V)
D. log n
A. Queue
B. Linked list
C. Stack
D. All of above
A. Multi-commodity
B. Minimum-cost
C. Travelling salesman problem
D. Nowhere-zero
A. Dinic's algorithm
B. Edmonds–Karp algorithm
C. Out-of-kilter
D. All of above
A. log n
B. O(|V|3)
C. O(|V|2)
D. All of above
A. (log 2)
B. log 1
C. O(|V|3)
D. O(|V|2)
A. Data Serialization
B. Instruction Scheduling
C. Scheduling jobs from the given dependencies among jobs
D. All of above
A. 1
B. 0
C. 2
D. 4
6. C 7. B 8. C 9. D 10. C
Review Questions
1. Discuss graph traversal with example.
2. Why queue and stack data structure is used with BFS and DFS?
3. Give an example of BFS with example.
4. Describe different types of network flow problem.
5. What are the applications of topological sort?
6. Discuss all pair shortest path problem.
7. DefineBottleneck capacity and Augmenting path.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed., Addison-
Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private Limited
Web Links
www.en.wikipedia.org
www.web-source.net
https://www.brainkart.com/article/Topological-Sort_10158
https://www.geeksforgeeks.org/topological-sorting
https://www.tutorialspoint.com/difference-between-bfs-and-dfs
Objectives
After studying this unit, you will be able to:
Introduction
The search time of most searching algorithms depends on the number n of elements in the collection S of
data. Hashing, or hash addressing, is a searching technique whose running time is essentially independent
of the number n.
Hashing is the transformation of a string of characters into a usually shorter fixed-length value or
key that represents the original string. Hashing is used to index and retrieve items in a database
because it is faster to find the item using the shorter hashed key than to find it using the original
value. It is also used in many encryption algorithms.
A Hash Function is a Unary Function that is used by Hashed Associative Containers: it maps its
argument to a result of type size_t. A Hash Function must be deterministic and stateless. That is, the
return value must depend only on the argument, and equal arguments must yield equal results.
12.1 Hashing
In many applications we require a data object called a symbol table. A symbol table is nothing but a set of pairs (name, value), where value represents the collection of attributes associated with the name, and this collection of attributes depends upon the program element identified by the name. For example, if a name x is used to identify an array in a program, then the attributes associated with x are the number of dimensions, the lower bound and upper bound of each dimension, and the element type. Therefore a symbol table can be thought of as a linear list of pairs (name, value), and hence you can use a list data object for realizing a symbol table. A symbol table is referred to or accessed frequently, either for adding a name, for storing the attributes of a name, or for retrieving the attributes of a name. Therefore, accessing efficiency is a prime concern while designing a symbol table. Hence the most common way of implementing a symbol table is to use a hash table. Hashing is a method of directly computing the index of the table by using some suitable mathematical function called a hash function. The hash function operates on the name to be stored in the symbol table, or whose attributes are to be retrieved from the symbol table. If h is a hash function and x is a name, then h(x) gives the index of the table where x along with its attributes can be stored. If x is already stored in the table, then h(x) gives the index of the table where it is stored, to retrieve the attributes of x from the table.
There are various methods of defining a hash function, such as the division method. In this method, you take the sum of the values of the characters, divide it by the size of the table, and take the remainder. This gives an integer value lying in the range 0 to (n - 1) if the size of the table is n. Another method is the mid-square method. In this method, the identifier is first squared and then the appropriate number of bits from the middle of the square is used as the hash value. Since the middle bits of the square usually depend on all the characters in the identifier, it is expected that different identifiers will result in different values. The number of middle bits that you select depends on the table size. Therefore, if r is the number of middle bits used to form the hash value, then the table size will be 2^r. Hence, when you use this method, the table size is required to be a power of 2. Another method is folding, in which the identifier is partitioned into several parts, all but the last part being of the same length. These parts are then added together to obtain the hash value.
To store a name or to add attributes of the name, you compute the hash value of the name, and place the name or attributes, as the case may be, at the place in the table whose index is the hash value of the name. For retrieving the attribute values of a name kept in the symbol table, you apply the hash function to the name to obtain the index of the table where the attributes of the name are found. Hence no comparisons are required, and the time required for retrieval is independent of the table size. Therefore, retrieval is possible in a constant amount of time, which is the time taken for computing the hash function. Therefore, the hash table seems to be the best realization of the symbol table, but there is one problem associated with hashing, and it is that of collisions. A hash collision occurs when two identifiers are mapped into the same hash value. This happens because a hash function defines a mapping from a set of valid identifiers to the set of those integers which are used as indices of the table. Therefore, the domain of the mapping defined by the hash function is much larger than the range of the mapping, and hence the mapping is of a many-to-one nature. Therefore, when a hash table is implemented, a suitable collision handling mechanism is to be provided, which will be activated whenever there is a collision.
Collision handling involves finding an alternative location for one of the two colliding symbols. For example, if x and y are different identifiers and h(x) = h(y), then x and y are the colliding symbols. If x is encountered before y, then the ith entry of the table will be used for accommodating the symbol x, but later on when y comes there is a hash collision, and therefore you have to find an alternative location either for x or y. This means you find a suitable alternative location and either accommodate y in that location, or you can move x to that location and place y in the ith location of the table. There are various methods available to obtain an alternative location to handle the collision. They differ from each other in the way the search is made for an alternative location. The following are the commonly used collision handling techniques.
Double Hashing –Double hashing is a computer programming method used in hash tables to
resolve the issues of has a collision.
Quadratic probing– It helps you to determine the new bucket address. It helps you to add
Interval between probes by adding the consecutive output of quadratic polynomial to starting
value given by the original computation.
Time complexity
Time complexity in linear search is O(n)
Time complexity in binary search is O(log n)
Time complexity in hashing is O(1)
Example:
m=30; k=80
h(k) = k mod m = 20
-The left and right numbers are folded on fixed boundary between them and the centre.
-The two outside values are then reversed.
Case 1: if hash table size is 100 (0-99) and sum in 3 digits, then we will ignore last carry, if any.
Or
Case 2: if hash table size is 100 (0-99) and sum in 3 digits, then we need to perform the extra step of
dividing by 100(size of table) and keeping the remainder.
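The sketch below illustrates the three constructions discussed in this unit (division, mid-square and folding); the function names, the choice of middle digits in the mid-square method and the two-digit parts used for folding are illustrative choices rather than fixed conventions.
#include <iostream>
#include <string>
using namespace std;

// Division method: sum the character values and take the remainder modulo the table size.
int divisionHash(const string& key, int tableSize) {
    int sum = 0;
    for (char c : key) sum += c;
    return sum % tableSize;                       // index in the range 0 .. tableSize - 1
}

// Mid-square method (numeric key): square the key and take digits from the middle.
int midSquareHash(long long key, int tableSize) {
    long long squared = key * key;
    long long middle = (squared / 100) % 10000;   // illustrative choice of middle digits
    return (int)(middle % tableSize);
}

// Folding method: split the key into fixed-size parts and add the parts together.
int foldingHash(long long key, int tableSize) {
    int sum = 0;
    while (key > 0) {
        sum += key % 100;                         // take two digits at a time
        key /= 100;
    }
    return sum % tableSize;                       // keep the remainder (Case 2 above)
}

int main() {
    cout << divisionHash("hello", 30) << endl;
    cout << midSquareHash(1234, 100) << endl;
    cout << foldingHash(123456789, 100) << endl;
    return 0;
}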
Summary
• Hash functions are mostly used in hash tables, to quickly locate a data record (for example, a
dictionary definition) given its search key (the headword).
• Specifically, the hash function is used to map the search key to the index of a slot in the table
where the corresponding record is supposedly stored.
• A hash function is any function that can be used to map a data set of an arbitrary size to a data
set of a fixed size, which falls into the hash table.
• The folding method for constructing hash functions begins by dividing the item into equal-
size pieces (the last piece may not be of equal size).
Keywords
Hash function, Division method, Mid square method, Fold boundary, Hash table, Hashes, Modular arithmetic
Self Assessment
1. Time complexity in hashing is ___
A. (log o)
B. (log n)
C. (1)
D. None of above
A. Hash values
B. Hash codes
C. Hash sums
D. All of above
a) Division method
b) Folding method
c) Mid square method
d) All of above
A. Encrypt data
B. Converting an input of any length into a fixed size string
C. Calculate mean of number
D. None of above
A. Key
B. Value
C. Both key and value
D. None of above
A. +
B. *
C. %
D. /
A. Less collisions
B. Uniform Distribution
C. Static
D. All of above
A. Folding
B. Mid square
C. Square
D. Division
A. Key value
B. Hash table size
C. Hash function
D. None of above
A. Division method
B. Mid square method
C. Multiplication Method
D. None of above
A. Division method
B. Multiplication Method
C. Mid square method
D. Pairing method
A. 0.61803
B. 0.62347
C. 0.71803
D. None of above
A. Smith K
B. Knuth
C. George S
D. None of above
A. 2 and 4
B. 2 and 3
C. 0 and 1
D. None of above
6. C 7. C 8. C 9. C 10. A
Review Question
1. Discuss hashing.
2. What is the significance of hashing in data structure?
3. Define hash function with suitable example.
4. Give an example mid square method.
5. Differentiate between division method and multiplication method.
6. Define Linear Probing and data bucket.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed., Addison-
Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private Limited
Web Links
www.en.wikipedia.org
www.web-source.net
https://www.tutorialspoint.com/Hash-Functions-and-Hash-Tables
https://www.ee.ryerson.ca/~courses/coe428/structures/hash.html
https://www.techopedia.com/definition/19744/hash-function
https://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html
Objectives
After studying this unit, you will be able to:
Introduction
The implementation of hash tables is frequently called hashing. Hashing is a technique used for
performing insertions, deletions, and finds in constant average time. Tree operations that require
any ordering information among the elements are not supported efficiently. Thus, operations such
as findMin, findMax, and the printing of the entire table in sorted order in linear time are not
supported.
Note that straightforward hashing is not without its problems, because for almost all hash
functions, more than one key can be assigned to the same position. For example, if the hash
function h1 applied to names returns the ASCII value of the first letter of each name (i.e., h1(name)
= name[0]), then all names starting with the same letter are hashed to the same position. This
problem can be solved by finding a function that distributes names more uniformly in the table. For
example, the function h2 could add the first two letters (i.e., h2(name) = name[0] + name[1]), which
is better than h1. But even if all the letters are considered (i.e., h3(name) = name[0] + · · · +
name[strlen(name) – 1]), the possibility of hashing different names to the same location still exists.
The function h3 is the best of the three because it distributes the names most uniformly for the three
defined functions, but it also tacitly assumes that the size of the table has been increased. If the table
has only 26 positions, which is the number of different values returned by h1, there is no
improvement using h3 instead of h1. Therefore, one more factor can contribute to avoiding conflicts
between hashed keys, namely, the size of the table. Increasing this size may lead to better hashing,
but not always! These two factors—hash function and table size—may minimize the number of
collisions, but they cannot completely eliminate them. The problem of collision has to be dealt with
in a way that always guarantees a solution.
Open Hashing
The simplest form of open hashing defines each slot in the hash table to be the head of a linked list.
All records that hash to a particular slot are placed on that slot’s linked list. The figure below
illustrates a hash table where each slot stores one record and a link pointer to the rest of the list.
Records within a slot’s list can be ordered in several ways: by insertion order, by key value order,
or by frequency-of-access order. Ordering the list by key value provides an advantage in the case of
an unsuccessful search, because you know to stop searching the list once you encounter a key that is
greater than the one being searched for. If records on the list are unordered or ordered by
frequency, then an unsuccessful search will need to visit every record on the list.
Given a table of size M storing N records, the hash function will (ideally) spread the records evenly
among the M positions in the table, yielding on average N/M records for each list. Assuming that
the table has more slots than there are records to be stored, you can hope that few slots will contain
more than one record. In the case where a list is empty or has only one record, a search requires
only one access to the list. Thus, the average cost for hashing should be Θ(1). However, if clustering
causes many records to hash to only a few of the slots, then the cost to access a record will be much
higher because many elements on the linked list must be searched.
Open hashing is most appropriate when the hash table is kept in main memory, with the lists
implemented by a standard in-memory linked list. Storing an open hash table on disk in an efficient
way is difficult, because members of a given linked list might be stored on different disk blocks.
This would result in multiple disk accesses when searching for a particular key value, which
defeats the purpose of using hashing.
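A minimal in-memory sketch of such an open hash table (separate chaining) is given below; the class name ChainedHashTable and the simple key % M hash function are illustrative.
#include <iostream>
#include <list>
#include <vector>
using namespace std;

// Open hashing: each slot of the table is the head of a linked list of colliding keys.
class ChainedHashTable {
    vector<list<int>> table;
    int M;                                        // number of slots
public:
    ChainedHashTable(int slots) : table(slots), M(slots) {}
    int hash(int key) const { return key % M; }

    void insert(int key) { table[hash(key)].push_back(key); }

    bool search(int key) const {
        for (int k : table[hash(key)])            // walk the list attached to the slot
            if (k == key) return true;
        return false;
    }

    void remove(int key) { table[hash(key)].remove(key); }
};

int main() {
    ChainedHashTable h(7);                        // illustrative table with M = 7 slots
    int keys[] = {10, 17, 24, 5};                 // 10, 17 and 24 all hash to slot 3
    for (int k : keys) h.insert(k);
    cout << boolalpha << h.search(17) << " " << h.search(6) << endl;   // true false
    return 0;
}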
Let:
1. U be the universe of keys:
(a) Integers
(b) Character strings
(c) Complex bit patterns
2. B be the set of hash values (also called the buckets or bins). Let B = {0, 1, ..., m - 1}, where m > 0 is a
positive integer.
A hash function h: U → B associates buckets (hash values) to keys.
Two main issues:
Collisions
If x1 and x2 are two different keys, it is possible that h(x1) = h(x2). This is called a collision.
Collision resolution is the most important issue in hash table implementations.
Hash Functions
Choosing a hash function that minimizes the number of collisions and also hashes uniformly is
another critical issue.
Closed Hashing
1. All elements are stored in the hash table itself
2. Avoids pointers; only computes the sequence of slots to be examined.
3. Collisions are handled by generating a sequence of rehash values.
h: U × {0, 1, ..., m - 1} → {0, 1, 2, ..., m - 1}
(the first argument is a key from the universe U of primary keys; the second is the probe number)
4. Given a key x, it has a hash value h(x,0) and a set of rehash values
h(x, 1), h(x,2), . . . , h(x, m-1)
5. We require that for every key x, the probe sequence
< h(x,0), h(x, 1), h(x,2), . . . , h(x, m-1)>
be a permutation of <0, 1, ..., m-1>.
This ensures that every hash table position is eventually considered as a slot for storing a record
with a key value x.
Search (x, T)
Search will continue until you find the element x (successful search) or an empty slot (unsuccessful
search).
Delete (x, T)
1. No delete if the search is unsuccessful.
2. If the search is successful, then put the label DELETED (different from an empty slot).
Insert (x, T)
1. No need to insert if the search is successful.
2. If the search is unsuccessful, insert at the first position with a DELETED tag.
Performance of Chaining
Load factor α = n/m
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity
For Searching
In worst case, all the keys might map to the same bucket of the hash table.
In such a case, all the keys will be present in a single linked list.
Sequential search will have to be performed on the linked list to perform the search.
Worst case complexity for searching is O(n).
For Deletion
In worst case, the key might have to be searched first and then deleted.
Linear Probing
Quadratic probing
Double hashing
Advantage
It is easy to compute.
Disadvantage
The clustering is major problem with linear probing
Many consecutive elements form groups.
It takes too much time to find an empty slot.
Time complexity
Worst time to search for an element is O(table size).
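For illustration, the sketch below implements closed hashing with linear probing (the class name and table size are assumed for the example); on a collision the probe sequence simply tries (h(key) + 1) mod M, (h(key) + 2) mod M, and so on, which is exactly the behaviour that produces the clustering noted above.
#include <iostream>
#include <vector>
using namespace std;

const int EMPTY = -1;                             // sentinel for an unused slot (keys are non-negative)

// Closed hashing with linear probing: on a collision, try the next slot cyclically.
class LinearProbingTable {
    vector<int> table;
    int M;
public:
    LinearProbingTable(int slots) : table(slots, EMPTY), M(slots) {}

    bool insert(int key) {
        for (int probe = 0; probe < M; ++probe) {
            int slot = (key % M + probe) % M;     // h(key, probe) = (h(key) + probe) mod M
            if (table[slot] == EMPTY) { table[slot] = key; return true; }
        }
        return false;                             // the table is full
    }

    bool search(int key) const {
        for (int probe = 0; probe < M; ++probe) {
            int slot = (key % M + probe) % M;
            if (table[slot] == EMPTY) return false;   // an empty slot ends the probe sequence
            if (table[slot] == key) return true;
        }
        return false;
    }
};

int main() {
    LinearProbingTable h(7);
    int keys[] = {10, 17, 24};                    // all hash to slot 3; probing places them in 3, 4, 5
    for (int k : keys) h.insert(k);
    cout << boolalpha << h.search(24) << " " << h.search(9) << endl;   // true false
    return 0;
}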
Advantage:
Primary clustering problem resolved
Disadvantage:
Secondary clustering
No guarantee for finding slots
Open hashing (separate chaining): The number of keys to be stored in the hash table can even exceed the
size of the hash table. Some buckets of the hash table are never used, which leads to wastage of space.
Closed hashing (open addressing): The number of keys to be stored in the hash table can never exceed the
size of the hash table. Buckets may be used even if no key maps to those particular buckets.
Summary
When two keys or hash values compete for a single hash table slot, a collision occurs.
To resolve collisions we use collision resolution techniques. Collisions can be reduced by the
selection of a good hash function.
Hash functions are mostly used in hash tables, to quickly locate a data record (for example, a
dictionary definition) given its search key (the headword).
Specifically, the hash function is used to map the search key to the index of a slot in the table
where the corresponding record is supposedly stored.
In linear probing a fixed-size hash table is used; when a hash collision occurs,
we linearly traverse the table in a cyclic manner to find the next empty slot.
Quadratic probing is a collision resolution method that eliminates the primary clustering
problem of linear probing.
Keywords
Separate chaining, Quadratic probing, Linear probing, Open addressing, Hash function, Load factor
Self Assessment
1. Which is part of collision resolution technique ___
A. Separate chaining
B. Open addressing
C. Double hashing
D. All of above
A. Linear probing
B. Quadratic probing
C. Double hashing
D. All of above
A. Number of keys
B. Number of slots in hash table
C. Hash function
D. None of above
A. Log (1)
B. (n)
C. log (0)
D. None of above
A. (n)
B. log (1)
C. log (2)
D. None of above
A. Open hashing
B. Closed hashing
C. Linear probing
D. None of above
A. Open hashing
B. Closed hashing
C. Linear probing
D. All of above
A. Linear probing
B. Quadratic probing
C. Open hashing
D. All of above
A. Insert
B. Delete
C. Search
D. All of above
A. Linear probing
B. Quadratic probing
C. Double hashing
D. None of above
A. h(k) = k mod 10
B. h(k, i) = (h(k)+i2) mod 10
C. h(k, i) = (h(k)+i) mod 10
D. None of above
6. A 7. D 8. B 9. B 10. C
Review Questions
1. Discuss significance of collision resolution.
2. Differentiate between open hashing and closed hashing.
3. Explain cluster problem and its solution in hashing.
4. What are the advantages of quadratic probing?
5. Give an example of linear probing.
6. Discuss load factor in hashing.
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed., Addison-
Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private Limited
Web Links
www.en.wikipedia.org
https://www.gatevidyalay.com/collision-resolution-techniques-separate-chaining/
http://users.csc.calpoly.edu/~gfisher/classes/103/lectures/week5.2.html
https://www.tutorialandexample.com/collision-resolution-techniques-in-data-structure
Objectives
After studying this unit, you will be able to:
Introduction
When two keys or hash values compete for a single hash table slot, a collision occurs. To
resolve collisions we use collision resolution techniques. Collisions can be reduced by the selection
of a good hash function.
The open addressing technique requires a hash table with a fixed and known size. All elements are
stored in the hash table itself. The size of the table must be greater than or equal to the total number
of keys. During insertion, if a collision is encountered, alternative cells are tried until an empty
bucket is found.
In case of a collision: probing is performed until an empty bucket is found; once an empty bucket is
found, the key is inserted. Probing is performed in accordance with the technique used for open
addressing.
Two common ways to ensure that h2(k) is relatively prime to m are: either make m = 2^k and design h2(k)
so that it is always odd, or make m prime and ensure h2(k) < m. Of course, h2(k) cannot equal zero.
Double hashing can be performed using:
(hash1(key) + i * hash2(key)) mod TABLE_SIZE
Here hash1() and hash2() are hash functions
First hash function:
hash1(key) = key mod TABLE_SIZE
Second hash function is :
hash2(key) = PRIME – (key mod PRIME)
Where PRIME is a prime smaller than the TABLE_SIZE
A good second hash function must never evaluate to zero and must make sure that all cells can be
probed.
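Putting the two functions together, a minimal sketch of insertion with double hashing is shown below; TABLE_SIZE = 10 and PRIME = 7 are illustrative values consistent with the formulas above.
#include <iostream>
#include <vector>
using namespace std;

const int TABLE_SIZE = 10;                        // illustrative table size
const int PRIME = 7;                              // a prime smaller than TABLE_SIZE
const int EMPTY = -1;

int hash1(int key) { return key % TABLE_SIZE; }
int hash2(int key) { return PRIME - (key % PRIME); }   // never evaluates to zero

// Probe sequence: (hash1(key) + i * hash2(key)) mod TABLE_SIZE for i = 0, 1, 2, ...
bool insertKey(vector<int>& table, int key) {
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = (hash1(key) + i * hash2(key)) % TABLE_SIZE;
        if (table[slot] == EMPTY) { table[slot] = key; return true; }
    }
    return false;                                 // no free slot found along the probe sequence
}

int main() {
    vector<int> table(TABLE_SIZE, EMPTY);
    int keys[] = {19, 27, 36, 10};                // illustrative keys
    for (int k : keys) insertKey(table, k);
    for (int i = 0; i < TABLE_SIZE; ++i)          // print the final slot contents
        cout << i << ": " << table[i] << endl;
    return 0;
}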
14.2 Rehashing
This is another method of collision handling. In this method you find an alternative empty location
by modifying the hash function and applying the modified hash function to the colliding symbol.
For example, if x is a symbol and h(x) = i, and the ith location is already occupied, then we modify the
hash function h to h1 and find h1(x); if h1(x) = j and the jth location is empty, then we accommodate
x in the jth location. Otherwise we once again modify h1 to some h2 and repeat the process till the
collision gets handled. Once the collision is handled we revert to the original hash function
before considering the next symbol.
Rehashing is the process of re-calculating the hash codes of already stored entries. The hash table provides
constant time complexity for insertion and searching, provided the hash function is able to
distribute the input load evenly. In case of collisions, the time complexity can go up to O(N) in the
worst case. Rehashing of a hash map is done when the number of elements in the map reaches the
maximum threshold value.
When the load factor increases to more than its predefined value, complexity increases. To overcome
this problem, the size of the array is increased and all the values are hashed again and stored in a new
array of double the size, to maintain a low load factor and low complexity.
Load factor
Load factor is the number of elements (n) divided by the number of buckets (m):
Load factor (λ) = n / m
Ideally λ < 1, i.e., m > n.
if λ < 1 then no need to apply rehashing
if λ > 1 then we need to increase number of buckets
Increase in bucket size is known as rehashing.The Load Factor decides “when to increase the size of
the hash Table.”
Rehashing steps
-Increase number of buckets.
-Modify hash function
Hash function before rehashing : x mod m
after rehashing x mod m’
-apply changed hash function to existing elements.
m’ calculation
m’ = closet prime number of 2m
Example:
m=3 m’= 2(3) = 6
Closet prime number = 5 or 7.
Example: Rehashing
Elements: 12, 13, 14 table size = 3
m’ = 2n
m’ = 2x3 => 6
Complexity of Rehashing
Time complexity – O(n)
Space complexity – O(n)
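The sketch below ties the pieces together: a chained hash table whose insert operation checks the load factor and rehashes into a larger bucket array when λ exceeds 1. The class name is illustrative, and for simplicity the bucket count is simply doubled rather than set to the prime closest to 2m as suggested above.
#include <iostream>
#include <list>
#include <vector>
using namespace std;

class RehashingTable {
    vector<list<int>> buckets;
    int n = 0;                                            // number of stored elements
public:
    RehashingTable(int m) : buckets(m) {}

    double loadFactor() const { return (double)n / buckets.size(); }

    void insert(int key) {
        buckets[key % buckets.size()].push_back(key);     // hash function: key mod m
        n++;
        if (loadFactor() > 1.0) rehash();                 // lambda > 1: grow and re-hash
    }

    void rehash() {
        vector<list<int>> old = buckets;
        buckets.assign(2 * old.size(), list<int>());      // double the number of buckets
        for (const auto& chain : old)
            for (int key : chain)
                buckets[key % buckets.size()].push_back(key);   // apply the new hash function
        cout << "Rehashed to " << buckets.size() << " buckets" << endl;
    }
};

int main() {
    RehashingTable h(3);                                  // as in the example above, m = 3
    int keys[] = {12, 13, 14, 15};                        // the 4th insert pushes lambda above 1
    for (int k : keys) h.insert(k);
    return 0;
}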
Summary
Rehashing schemes use a second hashing operation when there is a collision.
The open addressing technique requires a hash table with fixed and known size.
Double Hashing is a hashing collision resolution technique in open addressed Hash tables.
Rehashing is process of re-calculating the hash code of already stored entries.
The load factor in HashMap is basically a measure that decides when exactly to increase the
size of the HashMap to maintain the same time complexity of O(1).
The Load Factor decides “when to increase the size of the hash Table.”
Keywords
Rehashing Load factor
Hash map Open addressing
Double hashing Prime number
Clustering
Self Assessment
1. Which statement is correct about open addressing?
A. 4
B. 2
C. 1
D. 3
A. The second hash function is used to provide an offset value in case the first function causes a
collision.
B. In double hashing, there are two hash functions.
C. Second hash function used to remove the collision when you encountered the collision.
D. All of above
A. Table size
B. Key value
C. Prime number
D. None of above
A. No primary clustering
B. No secondary clustering
C. Double hashing can find the next free slot faster than the linear probing approach
D. All of above
A. Number of element
B. Number of key values
C. Number of bucket
D. None of above
a) Less than 1
b) Greater then 1
c) Equal to 0
d) All of above
A. Number of bucket
B. Number of element
C. Number of key values
D. None of above
14. Table size is 3 and elements are 12, 13, and 14. Load factor is_
A. 3
B. 2
C. 1
D. 0
A. λ
B. ∞
C. µ
D. Ω
6. B 7. C 8. D 9. D 10. C
Review Questions
1. What are the conditions for double hashing?
2. Discuss double hashing technique with example.
3. What are the two hash functions used in double hashing?
4. What is significance of load factor in hashing?
5. What are the advantages of double hashing?
6. Discuss steps for rehashing.
7. What is time and space complexity of rehashing?
Further Readings
Burkhard Monien, Data Structures and Efficient Algorithms, Thomas Ottmann,
Springer.
Kruse, Data Structure & Program Design, Prentice Hall of India, New Delhi.
Mark Allen Weiss, Data Structure & Algorithm Analysis in C, Second Ed., Addison-
Wesley Publishing.
RG Dromey, How to Solve it by Computer, Cambridge University Press.
Lipschutz. S. (2011). Data Structures with C. Delhi: Tata McGraw hill
Reddy. P. (1999). Data Structures Using C. Bangalore: Sri Nandi Publications
Samantha. D (2009). Classic Data Structures. New Delhi: PHI Learning Private Limited
Web Links
www.en.wikipedia.org
https://learningsolo.com/what-is-rehashing-and-load-factor-in-hashmap/
https://www.scaler.com/topics/data-structures/load-factor-and-rehashing/
https://www.javatpoint.com/double-hashing-in-java
https://www.educative.io/edpresso/what-is-double-hashing