Data Structure Notes UGCA 1915
Predefined Process / Function: used to represent a group of statements performing one processing task.
Big-O Notation
Big-O notation is used to describe the time complexity of an
algorithm, i.e. how much time an algorithm needs to complete its
execution for an input of size N. For example, a sorting algorithm
takes longer to sort 5000 elements than to sort 50.
Following are the commonly used orders of an algorithm.
(1) O(1): An algorithm that always executes in the same time
regardless of the size of the input data has complexity O(1).
(2) O(n): An algorithm whose running time is directly proportional to
the size of the input data has complexity O(n). This is also
known as linear complexity. If an algorithm uses a single loop
over the data, then it has linear complexity O(n). Linear
Search is an example of O(n).
(3) O(n²): An algorithm whose running time is directly proportional
to the square of the size of the input data has complexity
O(n²). If an algorithm uses nested loops over the data,
then it has quadratic complexity O(n²). Bubble Sort and
Selection Sort are examples of O(n²).
(4) O(log n): An algorithm in which the input data set is partitioned
into two sub-parts during each iteration has complexity
O(log n). Binary Search is an example of O(log n) complexity;
divide-and-conquer sorts such as Quick Sort rely on the same
partitioning idea (their overall average complexity is O(n log n)).
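As an illustration of O(n), the linear search mentioned above scans the input once, so its running time grows in direct proportion to the number of elements. A minimal sketch (the array contents and the searched key are only illustrative):

#include <stdio.h>

/* Linear search: examines each element once, so the work grows
   linearly with n, i.e. the time complexity is O(n). */
int linear_search(const int a[], int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == key)
            return i;      /* index of the first match */
    return -1;             /* key not present */
}

int main(void)
{
    int a[] = {14, 7, 25, 3, 42};
    int n = sizeof(a) / sizeof(a[0]);
    printf("index of 25 = %d\n", linear_search(a, n, 25));
    return 0;
}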
Arrays
Array is a container which can hold a fixed number of items, and these
items should all be of the same type. Most data structures make
use of arrays to implement their algorithms. Following are the
important terms needed to understand the concept of an array.
Element − Each item stored in an array is called an element.
Index − Each location of an element in an array has a numerical
index, which is used to identify the element.
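A short sketch of these two terms in C (the values are arbitrary): each stored value is an element, and the position used to reach it is its index.

#include <stdio.h>

int main(void)
{
    int marks[5] = {56, 78, 90, 64, 81};   /* five elements of the same type */

    /* indexes run from 0 to 4; marks[2] identifies the third element */
    printf("element at index 2 = %d\n", marks[2]);   /* prints 90 */
    return 0;
}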
Floating point
The float data type is used to represent numbers having a decimal
point.
For example: 3.2, 4.56, etc.
In a computer, floating point numbers can be represented using
normalized floating point representation. In this type of
representation, a floating point number is expressed as a
combination of a mantissa and an exponent (for example, 4.56 = 0.456 × 10¹).
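The C library function frexpf() performs a similar decomposition, into a base-2 mantissa in [0.5, 1) and an integer exponent, which can be used to illustrate the mantissa/exponent idea (the value 4.56 is only an example):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int exponent;
    float value = 4.56f;

    /* frexpf splits value into mantissa * 2^exponent, with 0.5 <= mantissa < 1 */
    float mantissa = frexpf(value, &exponent);
    printf("%f = %f * 2^%d\n", value, mantissa, exponent);
    return 0;
}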
Character
Character is used to represent a single character enclosed in
single quotes.
It can store letters (a-z, A-Z), digits (0-9) and special symbols.
Non Primitive Data Structure
"The Data Structure which is not directly operated by machine
level instruction is known as Non Primitive Data Structure."
Non Primitive Data Structure is derived from Primitive Data
Structure.
Non Primitive Data Structure are classified into two categories:
(1) Linear Data Structure
(2) Non Linear Data Structure
Linear data structure
"The Data Structure in which elements are arranged such that we
can process them in linear fashion (sequentially) is called linear
data structure."
Following are Examples of Linear Data Structure:
Array
Array is a collection of variables of the same data type that share a
common name. An array is an ordered set consisting of a fixed number of
elements. In an array, memory is allocated sequentially to each element,
so it is also known as a sequential list.
Stack
Stack is a linear data structure in which insertion and deletion
operations are performed at the same end.
Queue
Queue is a linear data structure in which the insertion operation is
performed at one end, called the rear end, and the deletion operation is
performed at the other end, called the front end.
Linked list
Linked list is an ordered set consisting of a variable number of
elements. In a linked list, elements are logically adjacent to each other
but they are not physically adjacent; that is, the elements of a linked
list are not stored sequentially in memory.
Non-Linear data structure
"The Data Structure in which elements are arranged such that we
can not process them in linear fashion (sequentially) is called
Non-Linear data structure."
Non Linear Data Structures are useful to represent more complex
relationships as compared to Linear Data Structures.
Following are Examples of Non-Linear Data Structure:
Tree
A tree is a collection of one or more nodes such that:
(1) There is a special node called root node.
(2) All the nodes in a tree, except the root node, have exactly one
predecessor.
(3) Each node in a tree has 0 or more successors.
A tree is used to represent hierarchical information.
Graph
A graph is a collection of nodes and edges which is used to represent
relationships between pairs of nodes. A graph G consists of a set of nodes,
a set of edges, and a mapping from the set of edges to a set of pairs
of nodes.
1.1.6 Static and Dynamic Memory Allocation
Memory allocation is the process of setting aside sections of memory
in a program to be used to store variables, and instances of structures
and classes.
There are two types of memory allocations possible in C:
1. Compile-time or Static allocation.
2. Run-time or Dynamic allocation (using pointers).
Compile-time or Static allocation
Static memory allocation is performed by the compiler. The exact size and
type of memory must be known at compile time.
int x,y;
float a[5];
When the first statement is encountered, the compiler will allocate
two bytes to each of the variables x and y (on a compiler where int
occupies two bytes). The second statement results in the allocation of
20 bytes to the array a (5*4, since there are five elements and each
element of float type takes four bytes). Note that there is no bounds
checking in C for array boundaries; i.e., if you have declared an array
of five elements, as above, and by mistake you read more than five values
into the array a, it will still compile and run without any error being
reported (although the behaviour is undefined).
For example, suppose we read the above array as follows:
int i;
for (i = 0; i < 10; i++)
{
    scanf("%f", &a[i]);   /* writes past the end of a[5] */
}
Run-time or Dynamic allocation
Dynamic memory allocation is when an executing program requests
that the operating system give it a block of main memory. The
program then uses this memory for some purpose. Usually the
purpose is to add a node to a data structure. In object-oriented
languages, dynamic memory allocation is used to get the memory for
a new object.
The memory comes from above the static part of the data segment.
Programs may request memory and may also return previously
dynamically allocated memory. Memory may be returned whenever it
is no longer needed. Memory can be returned in any order without
any relation to the order in which it was allocated. The heap may
develop "holes" where previously allocated memory has been
returned between blocks of memory still in use.
A new dynamic request for memory might return a range of
addresses out of one of the holes. But it might not use up all the hole,
so further dynamic requests might be satisfied out of the original hole.
C provides the following dynamic allocation and de-allocation functions:
malloc( )
calloc( )
free( )
realloc( )
The malloc( ) Function
The malloc( ) function allocates a block of memory in bytes. The user
should explicitly give the block size required. A call to
malloc( ) is a request to the system to allocate that much memory
at run time.
Syntax:-
malloc (number of elements * size of each element);
Example :
int *ptr;
ptr = malloc (10*sizeof(int));
The calloc( ) Function
This function works similarly to the malloc( ) function, except
that it takes two arguments (the number of elements and the size of each
element) as against the one argument required by malloc( ), and it
initializes the allocated memory to zero.
Example:-
int *ptr;
ptr = (int*) calloc (10, 2); /* 10 elements of 2 bytes each */
The free( ) Function
The free( ) function is used to de-allocate memory previously allocated
using the malloc( ) or calloc( ) functions.
Syntax:-
free(ptr_var);
/* where ptr_var is the pointer holding the address of the allocated
memory block */
The realloc( ) Function
This function is used to resize a memory block which has already been
allocated. It is useful in two situations:
If the allocated memory block is insufficient for the current application.
If the allocated memory is much more than what is required by the
current application.
Syntax:-
ptr_var = realloc (ptr_var,new_size);
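A minimal sketch that exercises all four functions together; the element counts and values are illustrative only.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, *ptr;

    ptr = (int *) malloc(5 * sizeof(int));        /* room for 5 ints */
    if (ptr == NULL) return 1;                    /* allocation failed */

    for (i = 0; i < 5; i++)
        ptr[i] = i * 10;

    ptr = (int *) realloc(ptr, 10 * sizeof(int)); /* grow the block to 10 ints */
    if (ptr == NULL) return 1;

    for (i = 5; i < 10; i++)
        ptr[i] = i * 10;

    for (i = 0; i < 10; i++)
        printf("%d ", ptr[i]);
    printf("\n");

    free(ptr);                                    /* return the block to the heap */

    /* calloc allocates and zero-fills: 10 ints, all 0 */
    ptr = (int *) calloc(10, sizeof(int));
    if (ptr != NULL)
        free(ptr);

    return 0;
}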
1.1.7 Function
A function is a set of statements that takes inputs, does some specific
computation and produces output.
Need of functions
Functions help us in reducing code redundancy. If some functionality is
needed at multiple places in a piece of software, then rather than writing
the same code again and again, we create a function and call it
everywhere. This also helps in maintenance: we have to change the code at
only one place if we make future changes to the functionality.
Functions make code modular. Consider a big file having many lines
of code. It becomes much simpler to read and use the code if it
is divided into functions.
Functions provide abstraction. For example, we can use library
functions without worrying about their internal working.
Function Declaration
A function declaration tells the compiler about the number of
parameters a function takes, the data types of the parameters and the
return type of the function. Putting parameter names in the function
declaration is optional, but it is necessary to put them in the
definition.
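A short sketch of this difference: the declaration below omits parameter names (which is allowed), while the definition must name them. The function add() is only an illustrative example.

#include <stdio.h>

/* declaration (prototype): parameter names are optional here */
int add(int, int);

int main(void)
{
    printf("%d\n", add(10, 20));   /* prints 30 */
    return 0;
}

/* definition: parameter names are required here */
int add(int x, int y)
{
    return x + y;
}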
In the linked stack example above, the last inserted node is 99 and the
first inserted node is 25. The order of elements inserted is 25, 32, 50 and 99.
PUSH OPERATION
This procedure pushes an ITEM into a linked stack.
PUSH_LINKSTACK (INFO, LINK, TOP, AVAIL, ITEM)
1. [Available space?] If AVAIL = NULL, then: Write: OVERFLOW, and Exit.
2. [Remove first node from AVAIL list.]
Set NEW := AVAIL and AVAIL := LINK[AVAIL].
3. Set INFO[NEW] := ITEM. [Copies ITEM into the new node.]
4. Set LINK[NEW] := TOP. [New node points to the original top
node in the stack.]
5. Set TOP := NEW. [Resets TOP to point to the new node at the top
of the stack.]
6. Exit.
POP OPERATION
This procedure deletes the top element of a linked stack and assigns it
to the variable ITEM.
POP_LINKSTACK (INFO, LINK, TOP, AVAIL, ITEM)
1. [Stack has an element to be removed?]
If TOP = NULL, then: Write: UNDERFLOW, and Exit.
2. Set ITEM := INFO[TOP]. [Copies the top element of the stack into
ITEM.]
3. Set TEMP := TOP and TOP := LINK[TOP].
[Remember the old value of the TOP pointer in TEMP and reset TOP
to point to the next element in the stack.]
4. [Return deleted node to the AVAIL list.]
Set LINK[TEMP] := AVAIL and AVAIL := TEMP.
5. Exit.
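The two procedures above can be sketched in C with one dynamically allocated node per element (malloc and free play the role of the AVAIL list; the names are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* PUSH: allocate a node, copy ITEM into it, link it in front of TOP */
void push(struct node **top, int item)
{
    struct node *new_node = malloc(sizeof *new_node);
    if (new_node == NULL) {                 /* no free node available */
        printf("OVERFLOW\n");
        return;
    }
    new_node->info = item;
    new_node->link = *top;                  /* new node points to old top */
    *top = new_node;                        /* reset TOP */
}

/* POP: remove the top node and return its value through item */
int pop(struct node **top, int *item)
{
    struct node *temp;
    if (*top == NULL) {                     /* stack has no element */
        printf("UNDERFLOW\n");
        return 0;
    }
    *item = (*top)->info;
    temp = *top;
    *top = (*top)->link;                    /* TOP now points to the next node */
    free(temp);                             /* return the node to free storage */
    return 1;
}

int main(void)
{
    struct node *top = NULL;
    int item;
    push(&top, 25); push(&top, 32); push(&top, 50); push(&top, 99);
    while (pop(&top, &item))
        printf("%d ", item);                /* prints 99 50 32 25 */
    return 0;
}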
Example of Infix to Prefix Conversion
1. (A + B) * C + D
[+ AB] * C + D
[*+ ABC] + D
+*+ ABCD
2. (A + B) * (C + D)
[+ AB] * [+ CD]
*+ AB + CD
Example of Infix to Postfix Conversion
1. (A + B) * C + D
[AB + ] * C + D
[AB + C * ] + D
AB + C * D +
2. A + B - (C * D) + E / F + G * H
A + B - [CD * ] + E / F + G * H
A + B - [CD * ] + [EF / ] + [GH * ]
[AB + ] - [CD * ] + [EF / ] + [GH * ]
[AB + CD *- ] + [EF / GH *+ ]
AB + CD *- EF / GH *++
Conversion of Infix to Postfix Expression using Stack
For converting an infix expression to postfix using a stack, certain steps
are to be followed. The steps are:
1) Add a left parenthesis '(' at the start of the expression and a right
parenthesis ')' at the end of the expression.
2) Initially push a symbol of lowest precedence onto the stack.
3) Each operand is simply added to the (postfix) expression and
does not change the state of the stack.
4) If an operator of higher or equal precedence is on the stack
and the next symbol read is an operator of lower precedence, then pop
the higher-precedence operator (appending it to the postfix expression)
and push the lower-precedence operator onto the stack.
5) When a left parenthesis is on the stack, it is removed only when a
right parenthesis of the same level is encountered.
6) Exit
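A compact sketch of this stack-based conversion for single-character operands and the operators + - * / with parentheses. It is a simplified illustration, not a full parser; the input string in main is the first worked example above.

#include <stdio.h>
#include <ctype.h>

static char stack[100];
static int top = -1;

static int prec(char op)             /* precedence: * and / above + and - */
{
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                        /* '(' has lowest precedence on the stack */
}

void infix_to_postfix(const char *infix, char *postfix)
{
    int j = 0;
    for (int i = 0; infix[i] != '\0'; i++) {
        char c = infix[i];
        if (isalnum((unsigned char)c)) {
            postfix[j++] = c;                     /* operands go straight to output */
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(') /* pop until the matching '(' */
                postfix[j++] = stack[top--];
            if (top >= 0) top--;                  /* discard the '(' itself */
        } else {                                  /* operator */
            while (top >= 0 && prec(stack[top]) >= prec(c))
                postfix[j++] = stack[top--];      /* pop higher/equal precedence */
            stack[++top] = c;
        }
    }
    while (top >= 0)
        postfix[j++] = stack[top--];              /* empty the stack at the end */
    postfix[j] = '\0';
}

int main(void)
{
    char out[100];
    infix_to_postfix("(A+B)*C+D", out);
    printf("%s\n", out);                          /* prints AB+C*D+ */
    return 0;
}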
Conversion of Infix to Prefix Expression using Stack
For converting an infix expression to prefix using a stack, certain steps
are to be followed. The steps are:
1) Reverse the given infix expression.
2) Replace every open bracket with a close bracket and vice versa.
3) Convert the resulting expression into postfix form.
4) Reverse the resulting expression.
5) The expression obtained is the prefix expression.
2. Recursion
A procedure that calls itself, directly or indirectly, is said to use recursion.
Every recursive procedure must satisfy the following recursive
properties:
1) Every recursion must have a base criterion at which the
recursive procedure terminates.
2) In each step, the recursive procedure must move closer to the
base criterion.
A procedure that follows the above recursive properties is said to
be a well-defined recursive procedure.
Basic requirements of Recursion
For implementing and designing a good recursive program we must
make certain assumptions, which are as follows:
1) Base case: the terminating condition for the problem.
While designing any recursive algorithm, we must choose a
proper terminating condition for the problem.
2) An if condition defines the terminating condition.
3) Every time a new recursive call is made, new memory is
automatically allocated for each variable used by the recursive
routine.
4) Each time a recursive call is made, copies of the
local variables of that call are pushed onto the
stack within the respective call, and all these values are
available to the respective function call when it is popped off
the stack.
5) Recursive case: the else part of the recursive definition calls
the function recursively (a factorial sketch follows this list).
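A minimal sketch showing the base case and the recursive case described above, using the factorial function as the example:

#include <stdio.h>

/* factorial(n) = n * factorial(n - 1), with the base case factorial(0) = 1 */
unsigned long factorial(unsigned int n)
{
    if (n == 0)                     /* base case: terminating condition */
        return 1;
    return n * factorial(n - 1);    /* recursive case: moves closer to the base case */
}

int main(void)
{
    printf("5! = %lu\n", factorial(5));   /* prints 120 */
    return 0;
}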
Types of Recursion
The characterization is based on:
1) Whether the function calls itself or not (Direct or Indirect
Recursion).
2) Whether there are pending operations at each recursive
calls (Tail Recursion or not).
3) The shape of the calling pattern whether pending
operations are also recursive (Linear or Tree Recursion)
3. Tower of Hanoi
In the Tower of Hanoi problem there are 3 pegs (posts or towers) and n
discs of different sizes. Each disc has a hole in the middle so that it can
fit on any peg. At the beginning of the game, all n discs are on the first
peg, arranged such that the largest is at the bottom and the smallest is
on top.
The goal of the game is to end up with all discs on the third peg in the
same order, i.e. smallest on top and increasing in size towards the
bottom. There are some restrictions on how the discs are moved:
1) The only allowed type of move is to take one disc from
the top of one peg and drop it on another peg, i.e. only one disc
can be moved at a time.
2) A larger disc can never lie above a smaller disc, under any
circumstances.
3) The solution of the problem takes 2^n - 1 moves,
where n is the number of discs.
Tower of Hanoi problem for n = 3 and pegs A, B, C:
2^n - 1 = 2^3 - 1 = 7 steps
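A recursive sketch for n discs and pegs A, B, C; it prints the 2^n - 1 moves (7 moves for n = 3):

#include <stdio.h>

/* Move n discs from peg 'from' to peg 'to', using peg 'aux' as helper. */
void hanoi(int n, char from, char to, char aux)
{
    if (n == 0)
        return;
    hanoi(n - 1, from, aux, to);              /* move n-1 discs out of the way */
    printf("Move disc %d from %c to %c\n", n, from, to);
    hanoi(n - 1, aux, to, from);              /* move them onto the target peg */
}

int main(void)
{
    hanoi(3, 'A', 'C', 'B');                  /* 2^3 - 1 = 7 moves */
    return 0;
}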
2.5 Implementation of Multiple stack
When a stack is created using a single array, we cannot
store a large amount of data; this problem is rectified by keeping
more than one stack in the same array of sufficient size. This
technique is called Multiple Stacks.
To implement multiple stacks in a single array, one approach is to
divide the array into k slots of size n/k each and fix the slots for the
different stacks: we can use arr[0] to arr[n/k-1] for the first stack,
arr[n/k] to arr[2n/k-1] for the second stack, and so on, where arr[] is
the array of size n (a two-stack sketch of this scheme is given after the
algorithm below).
Although this method is easy to understand, the problem with it is
inefficient use of array space. A stack push operation may
result in stack overflow even if there is space available elsewhere in arr[].
Algorithm:
1. Here we use 2 arrays min[] and max[] to represent the lower
and upper bounds for a stack
2. Array s[] stores the elements of the stack
3. Array top[] is used to store the top index for each stack
4. Variable ns represents the stack number
5. Variable size represents the size for each stack in an array
6. First we build a function init() to initialize the starting values
7. Then we have a function createstack() to create the stack
8. Function Push() & Pop() are used to push and pop an element to
and from the stack
9. Function Display() is used to display the elements in a particular
stack
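A simplified two-stack sketch of the slot-division scheme described above (array names are shortened and only push is shown; the pushed values are illustrative):

#include <stdio.h>

#define N 10                  /* total array size */
#define K 2                   /* number of stacks */

int s[N];                     /* shared storage for all stacks */
int min[K], max[K], top[K];   /* lower bound, upper bound and top index per stack */

void init(void)
{
    int size = N / K, ns;
    for (ns = 0; ns < K; ns++) {
        min[ns] = ns * size;           /* first slot reserved for stack ns */
        max[ns] = (ns + 1) * size - 1; /* last slot reserved for stack ns  */
        top[ns] = min[ns] - 1;         /* empty stack */
    }
}

void push(int ns, int value)
{
    if (top[ns] == max[ns]) {          /* this slot is full, even if others have room */
        printf("Overflow in stack %d\n", ns);
        return;
    }
    s[++top[ns]] = value;
}

int main(void)
{
    init();
    push(0, 11); push(0, 12);          /* goes into s[0..4] */
    push(1, 21); push(1, 22);          /* goes into s[5..9] */
    printf("top of stack 0 = %d, top of stack 1 = %d\n", s[top[0]], s[top[1]]);
    return 0;
}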
2.6 Introduction to Queue
Like stacks, queues are also an ordered collection of items. But unlike
a stack, which has only one end for insertion and deletion, a
queue has 2 ends: one end for insertion and the other for deletion. The
end at which we insert an item is called the Rear End (Back) and the end
where we remove an item is called the Front End.
The items are removed from the queue in the same order as they were
inserted in the queue i.e the first item inserted into the queue will be
serviced first and so it has to be removed first from the queue i.e the
queue operations are performed in FIFO(First in First Out) basis.
Whenever an item is removed or deleted from the queue, then the
value of front is incremented by 1.
Example: at the ticket window, people are served on a First In First
Out (FIFO) basis.
Queue Representation
2.7 Queue implementation
A queue can be implemented using an array, a stack or a linked list. The
easiest way of implementing a queue is by using an array.
Initially the head (FRONT) and the tail (REAR) of the queue point at
the first index of the array (starting the array index from 0). As we
add elements to the queue, the tail keeps moving ahead, always
pointing to the position where the next element will be inserted, while
the head remains at the first index.
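A minimal array-based sketch with FRONT and REAR as described above (fixed capacity, no wrap-around; the values are illustrative):

#include <stdio.h>

#define MAX 5

int queue[MAX];
int front = 0;      /* index of the element to be removed next       */
int rear  = 0;      /* index where the next element will be inserted */

void enqueue(int value)
{
    if (rear == MAX) {
        printf("Queue is full\n");
        return;
    }
    queue[rear++] = value;      /* tail moves ahead after every insertion */
}

int dequeue(void)
{
    if (front == rear) {
        printf("Queue is empty\n");
        return -1;
    }
    return queue[front++];      /* front is incremented by 1 after removal */
}

int main(void)
{
    enqueue(10); enqueue(20); enqueue(30);
    printf("%d\n", dequeue());  /* FIFO: prints 10 */
    printf("%d\n", dequeue());  /* then 20 */
    return 0;
}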
2.10 De-queue
Double Ended Queue (DEQue)
In a DEQue, both insertion and deletion operations are performed at
either end, i.e. at both ends of the queue: we can insert an element at
the rear or the front end, and deletion is also possible from both ends.
A DEQue can be used both as a stack and as a queue. There are
various ways in which a DEQue can be represented:
i. using a circular array
ii. using a doubly linked list
Types of DEQueues
Input Restricted DEQue :
In this, elements can be added only at one end, but we can delete
elements from both ends.
Output Restricted DEQue :
In this, elements can be deleted only from one end, but insertion is
allowed at both ends.
2.11 Priority Queue
Priority Queue is a more specialized data structure than Queue. Like an
ordinary queue, a priority queue has the same methods, but with a major
difference: in a priority queue, items are ordered by key value, so that
the item with the lowest key is at the front and the item with the highest
key is at the rear, or vice versa. So we assign a priority to each item
based on its key value: the lower the value, the higher the priority.
Following are the principal methods of a Priority Queue.
Basic Operations
insert / enqueue − add an item to the rear of the queue.
remove / dequeue − remove an item from the front of the queue.
Priority Queue Representation
We're going to implement the queue using an array in this article. There
are a few more operations supported by a queue, which are the following.
Peek − get the element at front of the queue.
isFull − check if queue is full.
isEmpty − check if queue is empty.
Insert / Enqueue Operation
Whenever an element is inserted into the queue, the priority queue inserts
the item according to its order. Here we're assuming that data with a
higher value has a lower priority.
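A small sketch of the ordered insert: each new item is placed so that the array stays sorted, with the lowest key (highest priority) kept at the front. Names, sizes and values are illustrative.

#include <stdio.h>

#define MAX 10

int pq[MAX];
int count = 0;

/* Insert so that the array stays sorted: the lowest key (highest
   priority) is always at the front, pq[0]. */
void pq_insert(int item)
{
    int i;
    if (count == MAX) {
        printf("Priority queue is full\n");
        return;
    }
    i = count - 1;
    while (i >= 0 && pq[i] > item) {   /* shift larger keys one place right */
        pq[i + 1] = pq[i];
        i--;
    }
    pq[i + 1] = item;
    count++;
}

/* Remove the highest-priority item (the front of the array). */
int pq_remove(void)
{
    int i, item;
    if (count == 0) {
        printf("Priority queue is empty\n");
        return -1;
    }
    item = pq[0];
    for (i = 1; i < count; i++)        /* shift the remaining items left */
        pq[i - 1] = pq[i];
    count--;
    return item;
}

int main(void)
{
    pq_insert(30); pq_insert(10); pq_insert(20);
    printf("%d\n", pq_remove());       /* prints 10: lowest key, highest priority */
    return 0;
}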
Now, the next node at the left should point to the new node.
LeftNode.next −> NewNode;
This will put the new node in the middle of the two. The new list
should look like this −
Similar steps should be taken if the node is being inserted at the
beginning of the list. While inserting it at the end, the second last
node of the list should point to the new node and the new node will
point to NULL.
Insertion algorithms
a) Insertion at the beginning of list
b) Insertion after a given node
c) Inserting into a sorted linked list.
We assume that the linked list is in memory in the form
LIST(INFO,LINK,START,AVAIL) and that the variable ITEM
contains the new information to be added to the list.
Since our insertion algorithms will use a node from the AVAIL list, all
of the algorithms will include the following steps:
(a) Checking to see if space is available in the AVAIL list. If not,
that is, if AVAIL = NULL, then the algorithm will print the message
OVERFLOW.
(b) Removing the first node from the AVAIL list. Using the
variable NEW to keep track of the location of the new node, this step can
be implemented by the pair of assignments
NEW := AVAIL, AVAIL := LINK[AVAIL]
(c) Copying the new information into the new node:
INFO[NEW] := ITEM
Insertion at the beginning of list
The easiest place to insert the node is at the beginning of the list.
Algorithm: INSFIRST(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM as the first node in the list.
1. [OVERFLOW?]If AVAIL=NULL, then: Write: OVERFLOW,
and EXIT.
2. [Remove first node from AVAIL list.]
Set NEW: =AVAIL and AVAIL:= LINK[AVAIL].
3. Set INFO [NEW] := ITEM. [ copies new data into new node]
4. Set LINK [NEW] := START. [New node now points to original
first node.]
5. Set START := NEW.[ Changes START so it points to the new
node.]
6. EXIT.
Insertion after a given node
Suppose we are given the value of LOC, where either LOC is the
location of a node A in the linked LIST or LOC = NULL. When LOC =
NULL, ITEM is inserted as the first node.
Let N denote the new node. If LOC = NULL, then N is inserted as the
first node in LIST. Otherwise we let N point to the node that originally
followed node A by the assignment
LINK[NEW] := LINK[LOC]
and we let node A point to the new node N by the assignment
LINK[LOC] := NEW
Algorithm:
INSLOC(INFO, LINK, START, AVAIL, LOC, ITEM)
This algorithm inserts ITEM so that ITEM follows the node with
location LOC, or inserts ITEM as the first node when LOC = NULL.
1. [OVERFLOW?] If AVAIL = NULL, then: Write: OVERFLOW,
and Exit.
2. [Remove first node from AVAIL list.]
Set NEW := AVAIL and AVAIL := LINK[AVAIL].
3. Set INFO[NEW] := ITEM .[copies new data into new node.]
4. If LOC=NULL, then:[insert as first node.]
Set LINK[NEW] := START and START := NEW.
Else: [insert after node with location LOC.]
Set LINK[NEW] := LINK[LOC] and LINK[LOC] := NEW.
[End of if structure.]
5. Exit.
Inserting into a sorted list
Suppose ITEM is to be inserted into a sorted linked LIST. Then ITEM
must be inserted between nodes A and B so that
INFO(A) < ITEM < INFO(B)
The following is a procedure which finds the location LOC of node
A , that is, which finds the location LOC of the last node in LIST
whose value is less than ITEM.
Traverse the list, using a pointer variable PTR and comparing ITEM
with INFO[PTR] at each node. While traversing, keep track of the
location of the preceding node by using a variable SAVE.
Thus, SAVE and PTR are updated by the assignments
SAVE := PTR and PTR := LINK[PTR]
The traversal continues as long as ITEM > INFO[PTR]; in other
words, the traversal stops as soon as ITEM <= INFO[PTR]. Then
PTR points to node B, so SAVE will contain the location of the node
A.
The formal statement of procedure follows. The cases where the list is
empty or where ITEM < INFO[START], so LOC= NULL, are treated
separately, since they do not involve the variable SAVE.
PROCEDURE: FINDA (INFO, LINK, START, ITEM, LOC)
This procedure finds the location LOC of the last node in a sorted list
such that INFO[LOC]<ITEM, or sets LOC=NULL.
1. [list empty?] If START = NULL, then : LOC := NULL, and
return.
2. [special case?] if ITEM < INFO[START] , then Set LOC:=
NULL, and Return .
3. Set SAVE := START and PTR := LINK[START].
[Initialize pointers.]
4. Repeat steps 5 and 6 while PTR ≠ NULL
5. If ITEM < INFO[PTR],then
Set LOC := SAVE, and Return .
[end of if statement.]
6. Set SAVE:= PTR and PTR:= LINK[PTR].[Update pointers]
[end of step 4 loop.]
7. Set LOC:= SAVE.
8. Return.
Now we have all the components needed to present an algorithm which
inserts ITEM into a sorted linked list. The simplicity of the algorithm
comes from using the previous two procedures.
Algorithm: INSERT(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM into a sorted list.
1. [use procedure to find the location of the node preceding ITEM.]
Call FINDA (INFO, LINK, START, ITEM,LOC)
2. [use algorithm to insert ITEM after the node with location
LOC.]
Call INSLOC( INFO, LINK, START, AVAIL, LOC , ITEM).
3. Exit.
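The FINDA and INSLOC steps above can be sketched together in C as a single sorted-insert routine (malloc replaces the AVAIL list; the names and the values in main are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* Insert item into an ascending sorted list and return the new START pointer. */
struct node *insert_sorted(struct node *start, int item)
{
    struct node *new_node = malloc(sizeof *new_node);
    struct node *save, *ptr;

    if (new_node == NULL) {                      /* OVERFLOW: no space available */
        printf("OVERFLOW\n");
        return start;
    }
    new_node->info = item;

    /* FINDA: find the last node whose value is less than item */
    if (start == NULL || item < start->info) {   /* LOC = NULL: insert as first node */
        new_node->link = start;
        return new_node;
    }
    save = start;
    ptr = start->link;
    while (ptr != NULL && ptr->info < item) {    /* stop as soon as item <= INFO[PTR] */
        save = ptr;
        ptr = ptr->link;
    }

    /* INSLOC: insert after the node at location save */
    new_node->link = save->link;
    save->link = new_node;
    return start;
}

int main(void)
{
    struct node *start = NULL, *p;
    start = insert_sorted(start, 50);
    start = insert_sorted(start, 25);
    start = insert_sorted(start, 99);
    start = insert_sorted(start, 32);
    for (p = start; p != NULL; p = p->link)
        printf("%d ", p->info);                  /* prints 25 32 50 99 */
    return 0;
}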
Deletion Operation
Deletion is also a more-than-one-step process; we shall learn it with a
pictorial representation. First, locate the target node to be removed by
using a searching algorithm.
The left (previous) node of the target node now should point to the
next node of the target node −
LeftNode.next −> TargetNode.next;
This will remove the link that was pointing to the target node. Now,
using the following code, we will remove what the target node is
pointing at.
TargetNode.next −> NULL;
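A hedged sketch of the same deletion in C for a singly linked list: locate the target by value, make the previous node point past it, then free it. The names and values are illustrative.

#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Delete the first node whose data equals key; return the (possibly new) head. */
struct node *delete_node(struct node *head, int key)
{
    struct node *prev = NULL, *target = head;

    while (target != NULL && target->data != key) {  /* locate the target node */
        prev = target;
        target = target->next;
    }
    if (target == NULL)                 /* key not found: nothing to delete */
        return head;

    if (prev == NULL)                   /* target is the first node */
        head = target->next;
    else
        prev->next = target->next;      /* LeftNode.next -> TargetNode.next */

    target->next = NULL;                /* TargetNode.next -> NULL */
    free(target);
    return head;
}

int main(void)
{
    struct node *head = NULL, *p;
    int values[] = {30, 20, 10};
    for (int i = 0; i < 3; i++) {       /* build 10 -> 20 -> 30 */
        p = malloc(sizeof *p);
        p->data = values[i];
        p->next = head;
        head = p;
    }
    head = delete_node(head, 20);
    for (p = head; p != NULL; p = p->next)
        printf("%d ", p->data);         /* prints 10 30 */
    return 0;
}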
Reverse Operation
We have to make sure that the last node is not lost. So we'll
keep some temp node, which, like the head node, points to the
last node. Now, we shall make all left-side nodes point to their
previous nodes one by one.
Except for the node (first node) pointed to by the head node, all nodes
should point to their predecessor, making it their new successor.
The first node will point to NULL.
We'll make the head node point to the new first node by using the
temp node.
In the above figure, Link1 field stores the address of the previous
node and Link2 field stores the address of the next node. The Data
Item field stores the actual value of that node. If we insert data into
the linked list, it will look as follows:
Note:
First node is always pointed by head. In doubly linked list, previous
field of the first node is always NULL (it must be NULL) and the
next field of the last must be NULL.
In the above figure we see that, doubly linked list contains three
fields. In this, link of two nodes allow traversal of the list in either
direction. There is no need to traverse the list to find the previous
node. We can traverse from head to tail as well as tail to head.
Advantages of Doubly Linked List
Doubly linked list can be traversed in both forward and backward
directions.
To delete a node in singly linked list, the previous node is required,
while in doubly linked list, we can get the previous node using
previous pointer.
It is more convenient than a singly linked list. A doubly linked list
maintains links for bidirectional traversal.
Disadvantages of Doubly Linked List
In doubly linked list, each node requires extra space for previous
pointer.
All operations such as Insert, Delete, Traverse etc. require extra
previous pointer to be maintained.
In the above figure we see that each node points to its next node in
the sequence, but the last node points to the first node in the list. Each
element stores the address of the next element, and the last
element stores the address of the starting element. It forms a circular
chain because the elements point to each other in a circular way.
In circular linked list, the memory can be allocated when it is required
because it has a dynamic size.
Circular linked list is used in personal computers, where multiple
applications are running. The operating system provides a fixed time
slot for all running applications and the running applications are kept
in a circular linked list until all the applications are completed. This is
a real life example of circular linked list.
We can insert elements anywhere in a circular linked list, but in an
array we cannot insert elements anywhere in the list because it occupies
contiguous memory.
3.4.4 Doubly circular Linked List
Doubly circular linked list is a linked data structure which consists of
a set of sequentially linked records called nodes.
Doubly circular linked list can be conceptualized as two singly linked
lists formed from the same data items, but in opposite sequential
orders.
We're going to implement the tree using node objects, connecting them
through references.
Tree Node
The code to write a tree node would be similar to what is given below.
It has a data part and references to its left and right child nodes.
struct node
{
    int data;
    struct node *leftChild;
    struct node *rightChild;
};
In a tree, all nodes share a common construct.
BST Basic Operations
The basic operations that can be performed on a binary search tree
data structure, are the following −
Insert − Inserts an element in a tree/create a tree.
Search − Searches an element in a tree.
Preorder Traversal − Traverses a tree in a pre-order manner.
Inorder Traversal − Traverses a tree in an in-order manner.
Postorder Traversal − Traverses a tree in a post-order manner.
We shall learn creating (inserting into) a tree structure and searching a
data item in a tree in this chapter. We shall learn about tree traversing
methods in the coming chapter.
Insert Operation
The very first insertion creates the tree. Afterwards, whenever an
element is to be inserted, first locate its proper location. Start
searching from the root node, then if the data is less than the key
value, search for the empty location in the left subtree and insert the
data. Otherwise, search for the empty location in the right subtree and
insert the data.
Algorithm
If root is NULL
then create root node
return
If root exists then
compare the data with node.data
while until insertion position is located
If data is greater than node.data
goto right subtree
else
goto left subtree
endwhile
insert data
end If
Search Operation
Whenever an element is to be searched, start searching from the root
node, then if the data is less than the key value, search for the element
in the left subtree. Otherwise, search for the element in the right
subtree. Follow the same algorithm for each node.
Algorithm
If root.data is equal to search.data
return root
else
while data not found
If data is greater than node.data
goto right subtree
else
goto left subtree
If data found
return node
endwhile
return data not found
end if
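A hedged C sketch of both routines on the node structure shown earlier (recursive versions; the key values in main are only illustrative):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *leftChild;
    struct node *rightChild;
};

/* Insert: the very first insertion creates the root; otherwise descend
   left or right until an empty position is found. */
struct node *insert(struct node *root, int data)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        n->data = data;
        n->leftChild = n->rightChild = NULL;
        return n;
    }
    if (data < root->data)
        root->leftChild = insert(root->leftChild, data);
    else
        root->rightChild = insert(root->rightChild, data);
    return root;
}

/* Search: follow the same comparisons until the key is found or a
   NULL child is reached. */
struct node *search(struct node *root, int data)
{
    if (root == NULL || root->data == data)
        return root;
    if (data < root->data)
        return search(root->leftChild, data);
    return search(root->rightChild, data);
}

int main(void)
{
    struct node *root = NULL;
    int keys[] = {27, 14, 35, 10, 19, 31, 42};
    for (int i = 0; i < 7; i++)
        root = insert(root, keys[i]);
    printf("%s\n", search(root, 31) ? "Given node is found" : "Element is not found");
    return 0;
}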
3.13 B Tree
B-Tree is a self-balancing search tree. In most of the other self-
balancing search trees (like AVL and Red-Black Trees), it is assumed
that everything is in main memory. To understand the use of B-Trees,
we must think of the huge amount of data that cannot fit in main
memory. When the number of keys is high, the data is read from disk
in the form of blocks. Disk access time is very high compared to main
memory access time. The main idea of using B-Trees is to reduce the
number of disk accesses. Most of the tree operations (search, insert,
delete, max, min, ..etc ) require O(h) disk accesses where h is the
height of the tree. B-tree is a fat tree. The height of B-Trees is kept
low by putting maximum possible keys in a B-Tree node. Generally, a
B-Tree node size is kept equal to the disk block size. Since h is low
for B-Tree, total disk accesses for most of the operations are reduced
significantly compared to balanced Binary Search Trees like AVL
Tree, Red-Black Tree, ..etc.
Properties of B-Tree
1) All leaves are at same level.
2) A B-Tree is defined by the term minimum degree ‘t’. The value of t
depends upon disk block size.
3) Every node except root must contain at least t-1 keys. Root may
contain minimum 1 key.
4) All nodes (including root) may contain at most 2t – 1 keys.
5) Number of children of a node is equal to the number of keys in it
plus 1.
6) All keys of a node are sorted in increasing order. The child
between two keys k1 and k2 contains all keys in the range from k1
to k2.
7) B-Tree grows and shrinks from the root which is unlike Binary
Search Tree. Binary Search Trees grow downward and also shrink
from downward.
8) Like other balanced Binary Search Trees, the time complexity to
search, insert and delete is O(log n).
Following is an example B-Tree of minimum degree 3. Note that in
practical B-Trees, the value of the minimum degree is much greater than 3.
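A node layout that follows properties (2) to (5) above might look like this in C; it is only a structural sketch (T and the field names are illustrative), not a full implementation.

#define T 3                            /* minimum degree t */

struct btree_node {
    int n;                             /* current number of keys */
    int keys[2 * T - 1];               /* at most 2t - 1 keys, kept sorted */
    struct btree_node *child[2 * T];   /* number of children = number of keys + 1 */
    int leaf;                          /* non-zero if this node is a leaf */
};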
Operations on a B-Tree
The following operations are performed on a B-Tree...
1. Search
2. Insertion
3. Deletion
Search Operation in B-Tree
The search operation in B-Tree is similar to the search operation in
Binary Search Tree. In a Binary search tree, the search process starts
from the root node and we make a 2-way decision every time (we go
to either left subtree or right subtree). In B-Tree also search process
starts from the root node but here we make an n-way decision every
time. Where 'n' is the total number of children the node has. In a B-
Tree, the search operation is performed with O(log n) time
complexity. The search operation is performed as follows...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with first key value of root
node in the tree.
Step 3 - If both are matched, then display "Given node is found!!!"
and terminate the function
Step 4 - If both are not matched, then check whether search element
is smaller or larger than that key value.
Step 5 - If search element is smaller, then continue the search
process in left subtree.
Step 6 - If the search element is larger, then compare the search
element with the next key value in the same node and repeat steps 3, 4, 5
and 6 until an exact match is found or until the search element has been
compared with the last key value in a leaf node.
Step 7 - If the last key value in the leaf node is also not matched
then display "Element is not found" and terminate the function.
Insertion Operation in B-Tree
In a B-Tree, a new element must be added only at a leaf node. That
means the new key value is always attached to a leaf node.
The insertion operation is performed as follows...
Step 1 - Check whether tree is Empty.
Step 2 - If tree is Empty, then create a new node with new key
value and insert it into the tree as a root node.
Step 3 - If tree is Not Empty, then find the suitable leaf node to
which the new key value is added using Binary Search Tree logic.
Step 4 - If that leaf node has empty position, add the new key value
to that leaf node in ascending order of key value within the node.
Step 5 - If that leaf node is already full, split the leaf node by
sending its middle value to its parent node. Repeat the same until the
value being sent up fits into a node.
Step 6 - If the splitting is performed at the root node, then the middle
value becomes the new root node of the tree and the height of the tree is
increased by one.
Example
Construct a B-Tree of Order 3 by inserting numbers from 1 to 10.
Unit 4 Graphs, Searching, Sorting and Hashing
3.1 Introduction to Graphs
A graph can be defined as a group of vertices and edges that are used to
connect these vertices. A graph can be seen as a cyclic tree, where the
vertices (nodes) may maintain any complex relationship among them
instead of having a parent-child relationship.
Definition
A graph G can be defined as an ordered set G(V, E) where V(G)
represents the set of vertices and E(G) represents the set of edges
which are used to connect these vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B),
(B,C), (C,E), (E,D), (D,B), (D,A)) is shown in the following figure.
Graph Terminology
Path
A path can be defined as the sequence of nodes that are followed in
order to reach some terminal node V from the initial node U.
Closed Path
A path will be called a closed path if the initial node is the same as the
terminal node, i.e. if V0 = VN.
Simple Path
If all the nodes of the graph are distinct, with the exception that V0
may equal VN, then such a path P is called a closed simple path.
Cycle
A cycle can be defined as the path which has no repeated edges or
vertices except the first and last vertices.
Connected Graph
A connected graph is the one in which some path exists between
every two vertices (u, v) in V. There are no isolated nodes in
connected graph.
Complete Graph
A complete graph is one in which every node is connected with all
other nodes. A complete graph contains n(n-1)/2 edges, where n is the
number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned with some data such as
length or weight. The weight of an edge e can be given as w(e) which
must be a positive (+) value indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph in which each edge of the graph is
associated with some direction and the traversing can be done only in
the specified direction.
Loop
An edge whose two end points are the same node is called a
loop.
Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and
v are called as neighbours or adjacent nodes.
Degree of the Node
The degree of a node is the number of edges that are connected to
that node. A node with degree 0 is called an isolated node.
3.2 Representation to Graphs
Graphs are mathematical structures that represent pairwise
relationships between objects. A graph is a flow structure that
represents the relationship between various objects. It can be
visualized by using the following two basic components:
Nodes: These are the most important components in any graph.
Nodes are entities whose relationships are expressed using edges. If a
graph comprises 2 nodes A and B and an undirected edge between
them, then it expresses a bi-directional relationship between the two
nodes.
Edges: Edges are the components that are used to represent the
relationships between various nodes in a graph. An edge between two
nodes expresses a one-way or two-way relationship between the
nodes.
Types of nodes
Root node: The root node is the ancestor of all other nodes in a
graph. It does not have any ancestor. Each graph consists of exactly
one root node. Generally, you must start traversing a graph from the
root node.
Leaf nodes: In a graph, leaf nodes represent the nodes that do not
have any successors. These nodes only have ancestor nodes. They can
have any number of incoming edges but they will not have any
outgoing edges.
Types of graphs
Undirected: An undirected graph is a graph in which all the edges
are bi-directional i.e. the edges do not point in any specific direction.
Directed: A directed graph is a graph in which all the edges are uni-
directional i.e. the edges point in a single direction.
Graph representation
You can represent a graph in many ways. The two most common
ways of representing a graph are as follows:
Adjacency matrix
An adjacency matrix is a V × V binary matrix A. Element A[i][j] is 1 if
there is an edge from vertex i to vertex j; otherwise A[i][j] is 0.
Note: A binary matrix is a matrix in which each cell can have only one
of two possible values - either a 0 or a 1.
The adjacency matrix can also be modified for a weighted graph: in
that case, instead of storing 0 or 1 in A[i][j], the weight or cost of the
edge is stored.
In an undirected graph, if A[i][j] = 1, then A[j][i] = 1. In a directed
graph, if A[i][j] = 1, then A[j][i] may or may not be 1.
An adjacency matrix provides constant-time access (O(1)) to determine
whether there is an edge between two nodes. The space complexity of the
adjacency matrix is O(V²).
The adjacency matrix of the following graph is:
i/j : 1 2 3 4
 1  : 0 1 0 1
 2  : 1 0 1 0
 3  : 0 1 0 1
 4  : 1 0 1 0
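The same matrix can be stored directly as a 2-D array in C; checking for an edge is then a single O(1) lookup (the array below encodes the 4-vertex matrix above):

#include <stdio.h>

#define V 4

/* adj[i][j] is 1 if there is an edge between vertex i+1 and vertex j+1 */
int adj[V][V] = {
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0}
};

int main(void)
{
    /* constant-time edge queries */
    printf("edge 1-2? %s\n", adj[0][1] ? "yes" : "no");   /* yes */
    printf("edge 1-3? %s\n", adj[0][2] ? "yes" : "no");   /* no  */
    return 0;
}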
Follow the steps below to find the shortest path between all the pairs
of vertices.
1. Create a matrix A1 of dimension n*n where n is the number of
vertices. The row and the column are indexed
as i and j respectively. i and j are the vertices of the graph.
Each cell A[i][j] is filled with the distance from the ith vertex to
the jth vertex. If there is no path from ith vertex to jth vertex, the cell is
left as infinity.
2. Now, create a matrix A1 using matrix A0. The elements in the first
column and the first row are left as they are; in this step, k is the first
vertex (i.e. vertex 1). Each remaining cell is filled as follows: if the
direct distance from the source to the destination is greater than the
path through the vertex k, then the cell is filled with A[i][k] + A[k][j].
For example, for A1[2, 4]: the direct distance from vertex 2 to 4 is 4,
and the sum of the distances from vertex 2 to 4 through vertex 1 (i.e.
from vertex 2 to 1 and from vertex 1 to 4) is 7. Since 4 < 7, A1[2, 4] is
filled with 4.
3. In a similar way, A2 is created using A1. The elements in the
second column and the second row are left as they are.
In this step, k is the second vertex (i.e. vertex 2). The remaining steps
are the same as in step 2.
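The procedure described in these steps is the Floyd-Warshall all-pairs shortest path algorithm. A compact sketch follows; the edge weights are chosen so that the direct distance from vertex 2 to 4 is 4 and the route through vertex 1 costs 7, matching the example above (INF marks "no direct edge").

#include <stdio.h>

#define V   4
#define INF 99999          /* stands for "no path" / infinity */

int main(void)
{
    /* A[i][j]: direct distance from vertex i+1 to vertex j+1 (illustrative weights) */
    int A[V][V] = {
        {0,   3,   INF, 5},
        {2,   0,   INF, 4},
        {INF, 1,   0,   INF},
        {INF, INF, 2,   0}
    };

    /* For each intermediate vertex k, keep the smaller of the direct
       distance and the distance through k: A[i][k] + A[k][j]. */
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];

    for (int i = 0; i < V; i++) {      /* print the final distance matrix */
        for (int j = 0; j < V; j++)
            printf("%7d", A[i][j]);
        printf("\n");
    }
    return 0;
}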
Now we compare the value stored at location 4 with the value being
searched, i.e. 31. We find that the value at location 4 is 27, which is
not a match. As the target value 31 is greater than 27 and we have a
sorted array, we also know that the target value must be in the upper
portion of the array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7
with our target value 31.
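The complete procedure being traced above can be sketched as follows; the 10-element sorted array is consistent with the trace (the value at index 4 is 27, the search is for 31), but its contents are otherwise illustrative.

#include <stdio.h>

/* Iterative binary search on a sorted array; returns the index of key or -1. */
int binary_search(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the current portion */
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            low = mid + 1;                  /* key must be in the upper portion */
        else
            high = mid - 1;                 /* key must be in the lower portion */
    }
    return -1;                              /* not present */
}

int main(void)
{
    int a[] = {10, 14, 19, 26, 27, 31, 33, 35, 42, 44};
    printf("31 found at index %d\n", binary_search(a, 10, 31));   /* index 5 */
    return 0;
}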
The figure above shows a hash table of size n = 10. Each
position of the hash table is called a slot. In the above hash table
there are n slots, named {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}: slot
0, slot 1, slot 2 and so on. The hash table contains no items yet, so
every slot is empty.
As we know the mapping between an item and the slot where item
belongs in the hash table is called the hash function. The hash
function takes any item in the collection and returns an integer in the
range of slot names between 0 to n-1.
Suppose we have integer items {26, 70, 18, 31, 54, 93}. One
common method of determining a hash key is the division method of
hashing and the formula is:
Hash Key = Key Value % Number of Slots in the Table
The division method or remainder method takes an item, divides it by
the table size and returns the remainder as its hash value.
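Applying the division method to the items {26, 70, 18, 31, 54, 93} with a table of 10 slots gives the slot for each key; a minimal sketch:

#include <stdio.h>

#define TABLE_SIZE 10

int main(void)
{
    int items[] = {26, 70, 18, 31, 54, 93};
    int n = sizeof(items) / sizeof(items[0]);

    for (int i = 0; i < n; i++)
        /* Hash Key = Key Value % Number of Slots in the Table */
        printf("key %2d -> slot %d\n", items[i], items[i] % TABLE_SIZE);
    return 0;
}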
After computing the hash values, we can insert each item into the
hash table at the designated position as shown in the above figure. In
the hash table, 6 of the 10 slots are now occupied. The ratio of occupied
slots to the table size is referred to as the load factor and is denoted
by λ = number of items / table size. In this example, λ = 6/10 = 0.6.
It is easy to search for an item using the hash function: it
computes the slot name for the item and then checks the hash table to
see if the item is present.
Constant amount of time O(1) is required to compute the hash value
and index of the hash table at that location.
Linear Probing
Take the above example: if we insert the next item, 40, into our
collection, it would have a hash value of 0 (40 % 10 = 0). But 70 also has
a hash value of 0, so this becomes a problem. This problem is called
a Collision or Clash. Collisions create a problem for the hashing
technique.
Linear probing is used for resolving collisions in a hash table, a
data structure for maintaining a collection of key-value pairs.
Linear probing was invented by Gene Amdahl, Elaine M. McGraw
and Arthur Samuel in 1954 and analyzed by Donald Knuth in 1963.
It is a component of open addressing scheme for using a hash table
to solve the dictionary problem.
The simplest method is called Linear Probing. The formula to compute
the next position with linear probing is:
P = (P + 1) % Table_size
For example, using the keys from the collection above:
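The following is only an illustrative sketch (table size 10, empty slots marked with -1); 40 hashes to slot 0, finds 70 there and 31 in slot 1, and is finally placed in slot 2.

#include <stdio.h>

#define TABLE_SIZE 10
#define EMPTY      -1

int table[TABLE_SIZE];

/* Insert key using linear probing: on a collision, try (p + 1) % TABLE_SIZE
   repeatedly until an empty slot is found. */
void insert(int key)
{
    int p = key % TABLE_SIZE;
    int probes = 0;
    while (table[p] != EMPTY && probes < TABLE_SIZE) {
        p = (p + 1) % TABLE_SIZE;
        probes++;
    }
    if (probes == TABLE_SIZE) {
        printf("table full, cannot insert %d\n", key);
        return;
    }
    table[p] = key;
    printf("key %2d placed in slot %d\n", key, p);
}

int main(void)
{
    int keys[] = {26, 70, 18, 31, 54, 93, 40};
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (int i = 0; i < 7; i++)
        insert(keys[i]);   /* 40 collides at slots 0 and 1, lands in slot 2 */
    return 0;
}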
3.10 Collision
Since a hash function gets us a small number for a key which is a big
integer or string, there is a possibility that two keys result in the same
value. The situation where a newly inserted key maps to an already
occupied slot in the hash table is called collision and must be handled
using some collision handling technique.
A situation when the resultant hashes for two or more data elements
in the data set U map to the same location in the hash table is called
a hash collision. In such a situation, two or more data elements would
qualify to be stored/mapped to the same location in the hash table.
Open Hashing (Separate Chaining)
In this technique, all keys that hash to the same slot are kept outside
the hash table in a separate chain, typically a linked-list.
In this technique, when data needs to be searched, it might become
necessary (in the worst case) to traverse all the nodes in the linked list
to retrieve the data.
Note that the order in which the data is stored in each of these linked
lists (or other data structures) is completely based on implementation
requirements. Some of the popular criteria are insertion order,
frequency of access etc.
Closed Hashing (Open Addressing)
In this technique a hash table with a pre-identified size is considered.
All items are stored in the hash table itself. In addition to the data,
each hash bucket also maintains one of three states: EMPTY,
OCCUPIED or DELETED. While inserting, if a collision occurs,
alternative cells are tried until an empty bucket is found, for which
one of the following techniques is adopted.
1. Linear Probing
2. Quadratic probing
3. Double hashing
A COMPARATIVE ANALYSIS OF CLOSED HASHING
VS OPEN HASHING
Open Addressing (Closed Hashing):
1. All elements are stored in the hash table itself; no additional data
structure is needed.
2. In case of a collision, a unique alternative hash slot must be
obtained for the key.
3. Determining a hash table size adequate for storing all the data is
difficult.
4. State needs to be maintained for the data (additional work).
5. Uses space efficiently.
Closed Addressing (Open Hashing):
1. An additional data structure needs to be used to accommodate
collision data.
2. Simple and effective approach to collision resolution; keys may or
may not be unique.
3. Performance deterioration of closed addressing is much slower as
compared to open addressing.
4. No state data needs to be maintained (easier to maintain).
5. Expensive on space.