Data Structure Final PART II
A multi-way search tree of order 'n' is a general tree in which each node has 'n' or fewer
subtrees and contains one less key than its number of subtrees.
Just as there has to be a file to contain customer data, there has to be a file to contain a
B-tree. Naturally, a good convention (and one followed in the examples already presented) is to
create a file called B-TREE for keeping the B-tree data that the B-TREE-P subroutines create.
Initially, the B-TREE file is completely empty. Then, each time the BTPINS subroutine is called by
a program, another item identifier is inserted into the B-TREE file, and the file becomes a
specially sorted and constructed list of identifiers.
The order in which item identifiers are sorted in a B-tree is controlled by the BTPKEY
subroutine. Although the statements in BTPKEY may specify a very complicated sort for
controlling how items in a B-tree are ordered (for example, by ZIP code, by address, by company,
by name), the only data actually saved in a B-tree are item identifiers. Therefore, it doesn't matter
how complicated the sort is, since the size of the resulting B-tree file is always the same. As a
very rough rule of thumb, a B-tree for a file takes approximately the same amount of space as
about two SELECT lists of the file.
The actual structure of a B-tree consists of a number of nodes stored as items in the
B-tree's file. Each node contains a portion of the sorted list of identifiers in the B-tree, along with
pointers to other nodes in the B-tree. The number of identifiers and pointers stored in each node
is controlled by a special size parameter that is passed as the second argument to the BTPINS and
BTPDEL subroutines. The size parameter indicates the minimum number of identifiers in a node,
and also half the maximum. For example, in the examples already presented, the node size used
was 5, so each node contained from 5 to 10 item identifiers.
The B-tree node size can be any number from 1 up. Small sizes create B-trees that may
be faster to search, but take up more disk space because there are more nodes with pointers.
Extremely small nodes may cause very "deep" B-trees that end up being slow to search. Larger
node sizes slow down searches, but take less disk space because there are fewer pointers. The
disk space occupied by nodes also depends on the length of your data file's item identifiers. A
node size of 50 is often a good, all-around, starting value. Once the B-tree is built, it can be
examined using standard Pick file maintenance techniques to find the optimum node size that
keeps items in the B-tree file nicely packed within the boundaries of Pick's frame structure. If
desired, the B-tree can then easily be rebuilt with the optimum node size.
The first node created in a B-tree file is numbered 0, the next is 1 (even though the node
might be for a different B-tree in the same file), and the next node is 2, and so on. As more
identifiers are inserted into the file's B-trees, more nodes are created. The special item named
NEXT.ID, which is automatically created in every B-tree file, contains the number of the next
node that will be created.
A file can contain any number of B-trees, but each B-tree in the file must have a unique
name, which can be any string of characters. Each B-tree name is saved as an item in the B-tree
file, and contains the number of the root node in the B-tree. The root node for a given B-tree is
where all searches through that tree happen to start. In the B-TREE-P examples, the B-TREE file
contained three different B-trees, named ZIP, COMP, and LNAME, so the B-TREE file also contained
three items with those names.
46. Explain why a tree structure should be made as bushy as possible. Justify
your answer with an example.
A tree structure should be made as bushy as possible so that a search descends through
as few levels as possible. For example, a binary search tree is organized in such a way that all of
the items less than the item in a chosen node are contained in its left subtree, and all the items
greater than or equal to that node are contained in its right subtree. In this manner one does not
have to search the entire tree for a particular item, as one must in a linked-list traversal. If a
binary search tree is traversed in order (left, root, right) and the contents of each node are
printed as the node is visited, the numbers are printed in ascending order.
Binary trees can therefore be considered a self-sorting data structure. Consequently, search
times through the data structure are, on average, greatly reduced.
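As a minimal sketch of this idea in C (the names node, insert, and inorder are illustrative,
not from any particular text), the following program builds a binary search tree and then prints
its keys in ascending order with an in-order (left, root, right) traversal:

    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int key;
        struct node *left, *right;
    };

    struct node *insert(struct node *root, int key)
    {
        if (root == NULL) {
            struct node *n = malloc(sizeof *n);
            n->key = key;
            n->left = n->right = NULL;
            return n;
        }
        if (key < root->key)
            root->left = insert(root->left, key);   /* smaller keys go left */
        else
            root->right = insert(root->right, key); /* keys >= node go right */
        return root;
    }

    void inorder(struct node *root)
    {
        if (root == NULL)
            return;
        inorder(root->left);          /* left subtree: smaller keys */
        printf("%d ", root->key);     /* visit the node itself */
        inorder(root->right);         /* right subtree: larger keys */
    }

    int main(void)
    {
        int keys[] = { 25, 57, 48, 37, 12, 92, 86, 33 };
        struct node *root = NULL;
        for (int i = 0; i < 8; i++)
            root = insert(root, keys[i]);
        inorder(root);                /* prints 12 25 33 37 48 57 86 92 */
        printf("\n");
        return 0;
    }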
48. What are the different types of sorting in general? What are the
basic operations involved in sorting, explain with example?
Sorting means arranging unordered data into order. Sorting is done in order to
simplify the manual retrieval of information or to make machine access to data more
efficient. There are many different sorting algorithms. Generally, a file of n records is
said to be sorted on a key if i < j implies that k[i] precedes k[j] in the chosen
ordering of the keys.
The two general types of sorting are internal sorts and external sorts. If the
records being sorted are held in main memory, the sort is called an internal sort; if the
records being sorted reside in auxiliary storage, it is called an external sort.
Also the records of the file can be sorted in different ways depending upon the
conditions. Some types of sorting are:
i. Exchange Sorts
ii. Selection and Tree sorting
iii. Insertion Sorts
iv. Merge and Radix Sorts
The two basic operations that can be involved in sorting are either the sorting
of the records themselves or the sorting of auxiliary pointers that represent those records.
In the first case the sort takes place on the actual records. This type of
sorting is efficient when the file contains relatively few records.
In the second case, when the amount of data stored in the file is large
and moving whole records is prohibitively expensive, an auxiliary table of pointers may be used
so that the pointers are moved instead of the actual data. This method is also
called sorting by address. The table in the center is the file, and the table at the left is
the initial table of pointers. During the sorting process, the entries in the pointer table are
adjusted to produce the final table shown at the right. No original file entries are moved.
(Figure: initial pointer table, original file, and sorted pointer table.)
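The following C sketch illustrates sorting by address under an assumed fixed-size record
type (the names record and sort_by_address are illustrative): a simple exchange sort
rearranges only the table of indices, and the records themselves never move:

    #include <stdio.h>

    struct record { int key; char data[32]; };   /* hypothetical record */

    void sort_by_address(struct record file[], int ptr[], int n)
    {
        /* exchange sort applied to the pointer table only */
        for (int i = 0; i < n - 1; i++)
            for (int j = 0; j < n - 1 - i; j++)
                if (file[ptr[j]].key > file[ptr[j + 1]].key) {
                    int t = ptr[j];          /* swap pointers, not records */
                    ptr[j] = ptr[j + 1];
                    ptr[j + 1] = t;
                }
    }

    int main(void)
    {
        struct record file[4] = {
            { 57, "b" }, { 25, "a" }, { 92, "d" }, { 33, "c" }
        };
        int ptr[4] = { 0, 1, 2, 3 };         /* initial pointer table */
        sort_by_address(file, ptr, 4);
        for (int i = 0; i < 4; i++)          /* records read in sorted order */
            printf("%d\n", file[ptr[i]].key);
        return 0;
    }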
Because there are a great number of sorting techniques available, a programmer
must be careful in deciding which sorting technique to use. Three of the most
important considerations are: the length of time the programmer must spend coding a
particular sorting program, the amount of machine time necessary for running the
program, and the amount of space necessary for the program.
The efficiency of various sorting techniques can be compared using O-notation,
where the O represents the complexity of the algorithm and the value n represents the size of
the set the algorithm is run against.
For example, O(n) means that an algorithm has linear complexity. In other words, it
takes ten times longer to operate on a set of 100 items than it does on a set of 10 items (10
* 10 = 100). If the complexity were O(n²) (quadratic complexity), then it would take 100
times longer to operate on a set of 100 items than it does on a set of 10 items.
The two classes of sorting algorithms are O(n²), which includes the bubble, insertion,
selection, and shell sorts; and O(n log n), which includes the heap, merge, and quick sorts.
In addition to algorithmic complexity, the speed of the various sorts can be compared with
empirical data. Since the speed of a sort can vary greatly depending on what data set it
sorts, accurate empirical results require several runs of the sort be made and the results
averaged together. In these empirical efficiency graphs the lowest line is the "best".
From the graph, the bubble sort is the least efficient and the shell sort the best of
the O(n²) group. On the other hand, all of these algorithms are simple and useful for quick
test programs, rapid prototypes, or internal-use software.
(Graphs omitted: empirical timings for the O(n²) sorts and for the O(n log n) sorts.)
Notice that the time on the O(n log n) graph is measured in tenths of seconds, instead of
hundreds of seconds as in the O(n²) graph. But as with everything else in the real world, there
are trade-offs. These algorithms are blazingly fast, but that speed comes at the cost of
complexity: recursion, advanced data structures, multiple arrays - these algorithms make
extensive use of those nasty things.
Often we do not measure the time efficiency of a sort by the number of time units
required but rather by the number of critical operations performed. Examples of such critical
operations are:
● key comparisons (that is, comparisons of the keys of two records in the file to
determine which is greater),
● movement of records or pointers to records, and
● interchange of two records.
The critical operations chosen are those that take the most time.
Put another way, the Big O is an upper bound on a function. In the case of algorithm
analysis, we use it to bound the worst-case running time, the longest running time
possible for any input of size n. We can say that the maximum running time of the
algorithm is on the order of the Big O.
For further explanation, consider two functions f(n) and g(n). f(n) is said to be of the
order of g(n), or f(n) is O(g(n)), if there exist positive integers a and b such that
f(n) <= a*g(n) for all n >= b. For example, if f(n) = n² + 10n and g(n) = n², then f(n) is O(g(n)),
since n² + 10n is less than or equal to 2n² for all n greater than or equal to 10. In this
case a equals 2 and b equals 10. However, the same function f(n) is also O(n³), as
n² + 10n is less than or equal to 2n³ for all n greater than or equal to 8. So, for a given function
f(n), there may be more than one g(n) such that f(n) is O(g(n)). Thus n² + 10n is O(n²) and n²
is O(n³); order carries over in this way (a transitive property).
So if f(n) is O(g(n)), eventually f(n) becomes and stays smaller than or equal to some multiple of
g(n); that is, f(n) is asymptotically bounded by g(n).
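As a quick numerical spot-check of the constants derived above (a = 2, b = 10), the
following short C program verifies that f(n) <= 2*g(n) over a sample range:

    #include <stdio.h>

    int main(void)
    {
        /* check f(n) = n^2 + 10n <= 2 * n^2 for all n >= 10
           (a = 2, b = 10 from the argument above) */
        for (long n = 10; n <= 1000; n++) {
            long f = n * n + 10 * n;
            long g = n * n;
            if (f > 2 * g)
                printf("bound fails at n = %ld\n", n);
        }
        printf("f(n) <= 2*g(n) held for n = 10..1000\n");
        return 0;
    }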
51.i. "There is a trade-off between machine time and space required". Justify the
statement.
In any process there is generally a trade-off between the machine time and the space
required, i.e. an improvement in machine time affects the space and vice versa. Taking
sorting in particular, the amount of machine time necessary for running the sorting program
and the amount of space necessary for the program are the main efficiency considerations
of the sort.
It would be difficult for the programmer to spend days investigating the best methods of
obtaining the last ounce of efficiency under these considerations. However, the
programmer must be able to recognize that a particular sort is inefficient and must
be able to justify its use in a particular situation. Many times designers and planners
are surprised at the inadequacy of their creation. The programmer should master the
techniques and be cognizant of the advantages and disadvantages of each, so that when the
need for a sort arises the one most appropriate for the particular situation can be
supplied. This brings time and space into consideration when designing sorting
techniques.
In most computer applications, the programmer must often optimize either time or
space at the expense of the other. In considering the time required to sort a file of
size n, actual time units are of no concern, as these will vary from one machine to
another. Instead, the matter of interest is the change in the amount of time required to sort
a file induced by a change in the file size n. This shows the relationship
between time and space. Suppose y is proportional to x, so that multiplying
x by a constant multiplies y by the same constant: doubling x
will double y, and multiplying x by 10 multiplies y by 10.
There are different ways to determine the time requirements of a sort, none of which
yields results applicable to all cases. One is to go through a sometimes intricate and
involved mathematical analysis of the various cases, the result of which is often a formula
giving the average time required for a particular sort as a function of the file size, which
in turn indicates the space required.
Considering these issues, it can be seen that there is a trade-off between machine
time and space, both of which contribute to the efficiency of a process like
sorting.
The exchange sort is done by comparing every two numbers in the list and swapping them
if the second number is less than the first. This turns out to be a very inefficient O(n²)
sort. It is done by comparing each adjacent pair of items in a list in turn, swapping the items
if necessary, and repeating the pass through the list until no swaps are done. Complexity is
O(n²) for arbitrary data, but approaches O(n) if the list is nearly in order at the beginning.
Bi-directional bubble sort usually does better than bubble sort, since at least one item is
moved forward or backward to its place in the list with each pass. Bubble sort moves items
forward into place, but can only move items backward one location each pass. Since at least
one item is moved into its final place for each pass, an improvement is to decrement the
last position checked on each pass. One of the main characteristics of this sort is that its
program is easy to understand, but it is probably the least efficient. Suppose x is an array
of integers of which the first n are to be sorted so that x[i] <= x[j] for 0 <= i < j < n. It is
straightforward to extend this simple format to one that is used for sorting n records,
each with a subfield key k.
The basic idea underlying the bubble sort is to pass through the file sequentially several
times. Each pass consists of comparing each element in the file with its successor (x[i]
with x[i+1]) and interchanging the two elements if they are not in the proper order.
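A compact C sketch of the bubble sort just described, including the improvement of
decrementing the last position checked on each pass:

    void bubble(int x[], int n)
    {
        for (int pass = 0; pass < n - 1; pass++)
            /* each pass bubbles the largest remaining element to the
               end, so the last position checked shrinks by one */
            for (int i = 0; i < n - pass - 1; i++)
                if (x[i] > x[i + 1]) {       /* out of order: interchange */
                    int t = x[i];
                    x[i] = x[i + 1];
                    x[i + 1] = t;
                }
    }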
The next sort we consider is the partition exchange sort, or quick sort. Let x be an array and
n the number of elements in the array to be sorted. Choose an element a from a specific
position within the array (for example, a can be chosen as the first element, so that
a = x[0]). Suppose that the elements of x are partitioned so that a is placed into position j
and the following conditions hold:
1. each of the elements in positions 0 through j-1 is less than or equal to a;
2. each of the elements in positions j+1 through n-1 is greater than or equal to a.
If these two conditions hold for the particular a and j, then a is the jth smallest element of x,
so that a remains in position j when the array is completely sorted. If the foregoing process is
repeated with the subarrays x[0] through x[j-1] and x[j+1] through x[n-1], and with any
subarrays created by this process in successive iterations, we finally get a sorted file.
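One possible C realization of this partition step and the recursive subdivision (a sketch;
the intermediate arrangements it produces may differ from the hand trace below, but the
pivot a always lands in its final position j):

    void quicksort(int x[], int lb, int ub)
    {
        if (lb >= ub)
            return;
        int a = x[lb];                /* pivot: the first element */
        int down = lb, up = ub;
        while (down < up) {
            while (x[down] <= a && down < ub)
                down++;               /* scan right for an element > a */
            while (x[up] > a)
                up--;                 /* scan left for an element <= a */
            if (down < up) {          /* interchange the pair */
                int t = x[down]; x[down] = x[up]; x[up] = t;
            }
        }
        x[lb] = x[up];                /* place pivot in position j = up */
        x[up] = a;
        quicksort(x, lb, up - 1);     /* repeat on the two subarrays */
        quicksort(x, up + 1, ub);
    }

A call such as quicksort(x, 0, n - 1) sorts the whole file.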
Let us apply the partition exchange sort (or quick sort) to the initial array given below:
25 57 48 37 12 92 86 33
The first element (25) is placed in its proper position, and the resulting array is
12 25 57 48 37 92 86 33
At this point 25 is in its proper position in the array: each element in the
array below that position is less than or equal to 25, and each element above that
position is greater than or equal to 25. Since 25 is in its final position, the original problem
has been decomposed into the problem of sorting the two subarrays
(12) and (57 48 37 92 86 33)
Since an array containing one element is already sorted, nothing has to be done to
sort the first of the subarrays. To sort the second subarray the process is repeated and the
subarray is further subdivided. The entire array may now be viewed as
12 25 (48 37 33) 57 (92 86)
where parentheses enclose the subarrays yet to be sorted. Repeating the process
on the subarrays further yields
12 25 (37 33) 48 57 (92 86)
12 25 (33) 37 48 57 (92 86)
12 25 33 37 48 57 (92 86)
12 25 33 37 48 57 (86) 92
12 25 33 37 48 57 86 92
Hence the final array is sorted. This is how the quick sort works.
54. Compare the efficiency of the bubble sort and quick sort.
BUBBLE SORT:
Each of these algorithms requires n-1 passes: each pass places one item in its
correct place. (The nth is then in the correct place also.) The ith pass makes either i or n - i
comparisons and moves, so in either case the total is
(n-1) + (n-2) + … + 1 = n(n-1)/2
or O(n²). Thus these algorithms are only suitable for small problems, where their
simple code makes them faster than the more complex code of the O(n log n) algorithms. It
is likely that it is the number of interchanges rather than the number of comparisons that
takes up the most time in the programs' execution.
QUICK SORT:
Assume that the file size n is a power of 2, say n = 2^m, so that m = log2 n. Assume also
that the proper position for the pivot always turns out to be the exact middle of the subarray. In
that case, there will be approximately n comparisons on the first pass, after which the file is
split into two files each of size approximately n/2. For each of these files, there are
approximately n/2 comparisons, yielding four files each of size approximately n/4. After
halving the subfiles m times, there are n files of size 1. Thus the total number of
comparisons for the entire sort is approximately
n + 2*(n/2) + 4*(n/4) + … + n*(n/n)
or
n + n + n + … + n (m terms)
comparisons. There are m terms because the file is subdivided m times. Thus the
total number of comparisons is O(n*m), or O(n log n).
Hence the quick sort [O(n log n)] is more efficient than the bubble
sort [O(n²)] when sorting large files.
56. Explain insertion sort. Calculate the efficiency of insertion sort for both
worst case and best case.
An insertion sort is one that sorts a set of records by inserting records into an existing
sorted file. The simple insertion sort may be viewed as a general selection sort in which the
priority queue is implemented as an ordered array. Only the preprocessing phase of inserting
the elements into the priority queue is necessary; once the elements have been inserted, they
are already sorted, so no selection is necessary.
In the worst case (input in reverse order) the kth insertion requires k comparisons and
shifts, so the sort performs 1 + 2 + … + (n-1) = n(n-1)/2 operations, which is O(n²). In the
best case (input already sorted) each insertion requires a single comparison, for n-1
comparisons in all, which is O(n).
The speed of the sort can be improved somewhat by using a binary search to find
the proper position in the sorted file. Another improvement to the simple insertion sort can
be made by using list insertion. In this method there is an array of link pointers, one for
each of the original array elements.
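A simple C version of the basic insertion sort (without the binary-search or
list-insertion improvements mentioned above):

    void insertion(int x[], int n)
    {
        for (int k = 1; k < n; k++) {
            int y = x[k];             /* element to insert */
            int i = k - 1;
            while (i >= 0 && y < x[i]) {
                x[i + 1] = x[i];      /* shift larger elements right */
                i--;
            }
            x[i + 1] = y;             /* insert into its proper position */
        }
    }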
The figure (omitted here) illustrates the complete merge sort operating on a sample file,
with each subfile contained within brackets.
In the merge sort, the number of passes does not exceed log2(n) [e.g.,
log2(8) = log10(8)/log10(2) = 3], each involving n or fewer comparisons, so the number of
comparisons does not exceed n*log2(n). Merge sort requires O(n)
additional space for the auxiliary array, whereas quick sort requires only O(log n) additional
space for the stack.
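A minimal recursive merge sort sketch in C (the function name msort and the
caller-supplied auxiliary array aux are illustrative):

    #include <string.h>

    /* sort x[lo..hi]; aux must be an array at least as large as x */
    void msort(int x[], int aux[], int lo, int hi)
    {
        if (lo >= hi)
            return;
        int mid = (lo + hi) / 2;
        msort(x, aux, lo, mid);            /* sort the left half */
        msort(x, aux, mid + 1, hi);        /* sort the right half */
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi)        /* merge the sorted halves */
            aux[k++] = (x[i] <= x[j]) ? x[i++] : x[j++];
        while (i <= mid)
            aux[k++] = x[i++];
        while (j <= hi)
            aux[k++] = x[j++];
        memcpy(&x[lo], &aux[lo], (hi - lo + 1) * sizeof(int));
    }

A call such as msort(x, aux, 0, n - 1) sorts the whole file; each level of recursion does
at most n comparisons and there are about log2(n) levels, matching the n*log2(n) bound above.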
Q. 58 Define Radix Sort. Show the two different ways of implementing Radix
Sort.
Radix sort is defined as the process of sorting based on the values of the actual
digits in the positional representations of the numbers being sorted. For example, the number
317 in decimal (positional) notation is written with a 3 in the hundreds
position, a 1 in the tens position, and a 7 in the units position.
Radix sort can be implemented in two different ways:
1) The first method is based on the foregoing idea. The given numbers
are partitioned into ten groups (from 0 to 9) based on their most significant digit.
A necessary condition is that all the numbers be of the same length; if they are not, they can
be padded with leading zeros. Every element in the '0' group is then less than every element
in the '1' group, all of whose elements are less than every element in the '2' group, and
so on. We can therefore sort within the individual groups based on the next significant digit,
and the process is repeated until each subgroup has been subdivided on the least significant
digit. At this point the original file is sorted. This method of sorting is also called
radix-exchange sort.
For example:
Original file: 237 222 138 37 135 444 21
Each element made the same length: 237 222 138 037 135 444 021
2) The second method of radix sorting starts with the least significant digit and
ends with the most significant digit. Each number is taken in the order in which it
appears in the file and is placed into one of ten queues, depending upon the value of the
digit currently being processed. Then the numbers are restored to the file from the
queues, starting with the queue of numbers with a '0' digit and ending with the queue of
numbers with a '9' digit. The process is repeated for each digit, starting with the least
significant digit and ending with the most significant, after which the file is sorted. This
method is called radix sort.
For example:
Original file: 25 19 17 26 33 49 30
Queues based on the least significant digit (front to rear):
queue[0]: 30
queue[1]:
queue[2]:
queue[3]: 33
queue[4]:
queue[5]: 25
queue[6]: 26
queue[7]: 17
queue[8]:
queue[9]: 19 49
Radix sort is defined as the process of sorting based on the values of the actual
digits in the positional representations of the numbers being sorted. For example, the number
589 in decimal (positional) notation is written with a 5 in the hundreds
position, an 8 in the tens position, and a 9 in the units position.
Radix sort can be implemented in two different ways:
1) The first method partitions the given numbers into ten groups (from 0 to 9) using the
decimal base. Every element in the '0' group is less than every element in the '1' group,
all of whose elements are less than every element in the '2' group, and so on. We can
therefore sort within the individual groups based on the next significant digit, and the
process is repeated until each subgroup has been subdivided on the least significant digit.
At this point the original file is sorted. This method of sorting is also called
radix-exchange sort.
For example:
Original file: 732 123 128 30 511 555 10
Each element made the same length: 732 123 128 030 511 555 010
2) The second method of radix sorting starts with the least significant digit and
ends with the most significant digit. Each number is taken in the order in which it
appears in the file and is placed into one of ten queues, depending upon the value of the
digit currently being processed. Then each queue is restored to the original file, starting
with the queue of numbers with a '0' digit and ending with the queue of numbers with a '9'
digit. The process is repeated for each digit, starting with the least significant digit and
ending with the most significant, after which the file is sorted. This method is called radix sort.
For example:
Original file: 25 57 48 37 12 92 86 33
Queues based on the least significant digit (front to rear):
queue[0]:
queue[1]:
queue[2]: 12 92
queue[3]: 33
queue[4]:
queue[5]: 25
queue[6]: 86
queue[7]: 57 37
queue[8]: 48
queue[9]:
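A C sketch of this second (least-significant-digit) method; the ten queues are modeled
here as fixed-size arrays, and MAXN is an assumed bound on the number of elements:

    #include <stdio.h>

    #define MAXN 100

    void radix_sort(int x[], int n, int digits)
    {
        int queue[10][MAXN], count[10];
        int div = 1;
        for (int d = 0; d < digits; d++) {  /* least significant digit first */
            for (int q = 0; q < 10; q++)
                count[q] = 0;
            for (int i = 0; i < n; i++) {   /* distribute into the queues */
                int q = (x[i] / div) % 10;
                queue[q][count[q]++] = x[i];
            }
            int k = 0;                      /* restore: queue 0 .. queue 9 */
            for (int q = 0; q < 10; q++)
                for (int i = 0; i < count[q]; i++)
                    x[k++] = queue[q][i];
            div *= 10;
        }
    }

    int main(void)
    {
        int x[] = { 25, 57, 48, 37, 12, 92, 86, 33 };
        radix_sort(x, 8, 2);
        for (int i = 0; i < 8; i++)
            printf("%d ", x[i]);            /* 12 25 33 37 48 57 86 92 */
        printf("\n");
        return 0;
    }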
In this method (the shell sort, or diminishing increment sort) an increment i is chosen,
and the file is divided into i subfiles such that the jth element of the kth subfile is
x[(j-1)*i + k - 1]. After these first i subfiles are sorted, a new smaller value of the
increment is chosen and the file is again partitioned into a new set of subfiles. Each of
these larger subfiles is sorted and the process is repeated again with an even smaller
increment. The last value in the sequence must be 1, so that the final subfile, consisting
of the entire file, is sorted.
For example, consider the original file
17 81 19 84 12 28 24 92 33
x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[7] x[8]
and let the sequence of increments be 5, 3, 1. (The passes proceed exactly as in the trace
shown below for a second sample file.)
In addition to the standard parameters x and n (the original file and its size), the
routine requires an array incrmnts, containing the diminishing increments of the sort, and
numinc, the number of elements in the array incrmnts.
Let us consider incrmnts[] = {5, 3, 1} and numinc = 3, with the original file shown
below. The shell sort then takes place in the following steps.
Original file:            25 57 48 37 12 92 86 33
After pass 1 (span = 5):  25 57 33 37 12 92 86 48
After pass 2 (span = 3):  25 12 33 37 48 92 86 57
After pass 3 (span = 1):  12 25 33 37 48 57 86 92   (sorted file)
On the last pass, where the span equals 1, the sort reduces to a simple insertion sort.
One thing that should be noted in the shell sort is that the elements of incrmnts
should be relatively prime for the passes to combine well, and that the entire file is
indeed almost sorted by the time the span equals 1 on the last pass.
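A C sketch of the routine described above, using the incrmnts and numinc parameters;
each pass performs an insertion sort over the elements that lie span positions apart:

    void shellsort(int x[], int n, int incrmnts[], int numinc)
    {
        for (int s = 0; s < numinc; s++) {
            int span = incrmnts[s];          /* current increment */
            /* insertion sort on each subfile of spacing `span` */
            for (int k = span; k < n; k++) {
                int y = x[k];
                int i = k - span;
                while (i >= 0 && y < x[i]) {
                    x[i + span] = x[i];      /* shift within the subfile */
                    i -= span;
                }
                x[i + span] = y;
            }
        }
    }

With int inc[] = {5, 3, 1}; a call such as shellsort(x, 8, inc, 3) reproduces the trace
above.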
60.i. What is a heap? Write the conditions that should be satisfied to become a heap. Write
the steps involved in a heap sort.
A heap is an implicit data structure for the abstract data type priority queue; it allows
fast insertion of new elements and fast deletion of the minimum (or maximum) element. In
this respect, a heap is not an abstract data type, but an implementation of the priority queue.
We choose a representation of a priority queue as a complete binary tree in which each node
contains an element and its associated key or priority. In addition, we assume that these keys
satisfy the heap condition: at each node, the associated key is larger than the keys associated
with either child of that node. Depending on orientation, a heap satisfies one of the following:
1) The value of each node is greater than or equal to the value of its parent, with the
minimum-value element at the root (a min-heap); or
2) The value of each node is less than or equal to the value of its parent, with the
maximum-value element at the root (a max-heap).
60.ii. What is a heap? Write the conditions that should be satisfied to become a heap.
Write the steps involved in a heap sort.
Heap sort is a sorting algorithm that works by first organizing the data to be
sorted into a special type of binary tree called a heap. The heap itself has, by definition, the
largest value at the top of the tree, so the heap sort algorithm must also reverse the order. It
does this with the following steps:
1. Remove the topmost item (the largest) and replace it with the rightmost leaf. The
topmost item is stored in an array.
2. Re-establish the heap by sifting the replacement item down from the top to its proper
position.
3. Repeat steps 1 and 2 until there are no more items left in the heap.
A heap sort is especially efficient for data that is already stored in a binary tree. In most
cases, however, the quick sort algorithm is more efficient.
A heap is an implicit data structure for the abstract data type priority queue; it allows
fast insertion of new elements and fast deletion of the minimum (or maximum) element. In
this respect, a heap is not an abstract data type, but an implementation of the priority queue.
We choose a representation of a priority queue as a complete binary tree in which each node
contains an element and its associated key or priority. In addition, we assume that these keys
satisfy the heap condition: at each node, the associated key is larger than the keys associated
with either child of that node. Depending on orientation, a heap satisfies one of the following:
i. The value of each node is greater than or equal to the value of its parent, with the
minimum-value element at the root (a min-heap); or
ii. The value of each node is less than or equal to the value of its parent, with the
maximum-value element at the root (a max-heap).
● Heap sort begins by building a heap out of the data set, and then removing the largest
item and placing it at the end of the sorted array.
● After removing the largest item, it reconstructs the heap.
● It then removes the largest remaining item and places it in the next open position from
the end of the sorted array.
● This is repeated until there are no items left in the heap and the sorted array is full.
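A C sketch of these steps for a max-heap kept in an array, with the children of position
i at positions 2i+1 and 2i+2; siftdown re-establishes the heap condition after each
removal (the names siftdown and heapsort are illustrative):

    /* restore the heap condition for x[0..n-1] starting at position i */
    void siftdown(int x[], int n, int i)
    {
        int root = x[i];
        while (2 * i + 1 < n) {              /* while i has a child */
            int child = 2 * i + 1;
            if (child + 1 < n && x[child + 1] > x[child])
                child++;                     /* larger of the two children */
            if (root >= x[child])
                break;
            x[i] = x[child];                 /* move the child up */
            i = child;
        }
        x[i] = root;
    }

    void heapsort(int x[], int n)
    {
        for (int i = n / 2 - 1; i >= 0; i--)
            siftdown(x, n, i);               /* build the initial heap */
        for (int end = n - 1; end > 0; end--) {
            int t = x[0];                    /* remove the largest item */
            x[0] = x[end];                   /* replace it with the last leaf */
            x[end] = t;                      /* store it at the end, sorted */
            siftdown(x, end, 0);             /* re-establish the heap */
        }
    }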
61.i. Compare the different types of sorting for different types of data. Give
suitable reason in support for your answer.
Sorting is one of the most common ingredients of programming systems. Sorting means
putting unordered data into order. The typical situation is that we have items in an array
and would like to sort them by some key so that we can, for example, print them out nicely
or do fast binary searches on them. The different sorting techniques are as follows:
BUBBLE SORT
This is one of the simplest sorting techniques. The idea is to go through the array
comparing adjacent elements; if they are out of order, we swap them. We keep repeating
this process over and over until we can go through the array without doing any swaps. At
this point, no elements are out of order and the array is sorted. There are n-1 passes and
n-1 comparisons on each pass. Thus the total number of comparisons is
(n-1)*(n-1) = n² - 2n + 1, which is O(n²). Of course, the number of interchanges depends
on the original order of the file; however, the number of interchanges cannot be greater
than the number of comparisons. The chief redeeming feature of the bubble sort is that it
requires little additional space in comparison to other sorting techniques.
INSERTION SORT
An insertion sort is one that sorts a set of values by inserting values into an existing
sorted file. Suppose an array a with n elements a[1], a[2], …, a[n] is in memory. The
insertion sort algorithm scans a from a[1] to a[n], inserting each element a[k] into its
proper position in the previously sorted subarray a[1], a[2], …, a[k-1]. Insertion sort has
one major disadvantage with respect to other sorts: even after most items have been
sorted properly into the first part of the list, the insertion of a later item may require that
many of them be moved, and all the moves made by insertion sort move an item only one
position at a time. Thus to move an item 20 positions up the list requires 20 separate
moves. If the items are small, perhaps a key alone, or if the items are in linked storage,
then these moves may not require excessive time. But if the items are very large
structures containing hundreds of components, like personnel files or student transcripts,
and the structures must be kept in contiguous storage, then it would be far more efficient
if an item, when it has to be moved, could be moved immediately to its final position. This
goal is accomplished by the sorting method called selection sort.
SELECTION SORT
The objective of this sort is to minimize data movement; its primary advantage
regards data movement. If an item is in its correct final position, then it will
never be moved. Every time any pair of items is swapped, at least one of them moves
into its final position, and therefore at most n-1 swaps are done altogether in sorting a list
of n items. This is the very best that we can expect from any method that relies entirely on
swaps to move its items.
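A straightforward C version of selection sort; note that it performs at most one swap per
pass, which is exactly the data-movement property discussed above:

    void selection(int x[], int n)
    {
        for (int i = 0; i < n - 1; i++) {
            int small = i;
            for (int j = i + 1; j < n; j++)
                if (x[j] < x[small])
                    small = j;            /* index of smallest remaining item */
            if (small != i) {             /* at most one swap per pass */
                int t = x[i];
                x[i] = x[small];
                x[small] = t;
            }
        }
    }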
Let us pause for a moment to compare the operation counts for selection sort with those
for insertion sort. Approximately, on average, the results are:
selection sort: about 0.5n² comparisons and about 3n moves
insertion sort: about 0.25n² comparisons and about 0.25n² moves
Let us make relative comparisons between these two. When n becomes large, 0.25n²
becomes much larger than 3n, and if moving items is a slow process, then insertion sort will
take far longer than will selection sort. But the amount of time taken for comparisons is, on
average, only about half as much for insertion sort as for selection sort. Under other
conditions, then, insertion sort will be better.
SHELL SORT
As we have seen, insertion sort and selection sort behave in opposite ways.
Selection sort moves the items very efficiently but does many redundant comparisons. In its
best case, insertion sort does the minimum number of comparisons, but is inefficient in
moving items only one place at a time. The problems with both of these are avoided by
another method called shell sort. This method sorts separate subfiles of the original file.
These subfiles contain every kth element of the original file, where the value of k is called
an increment. For example, if k is 5, the subfile consisting of x[0], x[5], x[10], … is sorted
first. Five subfiles, each containing one fifth of the elements of the original file, are sorted
in this manner. Shell sort thus improves on the efficiency of insertion sort by quickly
shifting values toward their destinations.
61.ii. Compare the different types of sorting for different types of data. Give
suitable reason in support for your answer.
Sorting means putting unordered data into order. The typical situation is that we
have items in an array and would like to sort them by some key so that we can, for
example, print them out nicely or do fast binary searches on them. There are different
sorting techniques.
BUBBLE SORT
This is one of the simplest sorting techniques. The idea is to go through the array
comparing adjacent elements; if they are out of order, we swap them. We keep repeating
this process over and over until we can go through the array without doing any swaps. At
this point, no elements are out of order and the array is sorted. There are n-1 passes and
n-1 comparisons on each pass. Thus the total number of comparisons is
(n-1)*(n-1) = n² - 2n + 1, which is O(n²). Of course, the number of interchanges depends
on the original order of the file; however, the number of interchanges cannot be greater
than the number of comparisons. The chief redeeming feature of the bubble sort is that it
requires little additional space in comparison to other sorting techniques.
INSERTION SORT
An insertion sort is one that sorts a set of values by inserting values into an existing
sorted file. Suppose an array a with n elements a[1], a[2], …, a[n] is in memory. The
insertion sort algorithm scans a from a[1] to a[n], inserting each element a[k] into its
proper position in the previously sorted subarray a[1], a[2], …, a[k-1]. Insertion sort has
one major disadvantage with respect to other sorts: even after most items have been
sorted properly into the first part of the list, the insertion of a later item may require that
many of them be moved, and all the moves made by insertion sort move an item only one
position at a time. Thus to move an item 20 positions up the list requires 20 separate
moves. If the items are small, perhaps a key alone, or if the items are in linked storage,
then these moves may not require excessive time. But if the items are very large
structures containing hundreds of components, like personnel files or student transcripts,
and the structures must be kept in contiguous storage, then it would be far more efficient
if an item, when it has to be moved, could be moved immediately to its final position. This
goal is accomplished by the sorting method called selection sort.
SELECTION SORT
The objective of this sort is to minimize data movement; its primary advantage
regards data movement. If an item is in its correct final position, then it will
never be moved. Every time any pair of items is swapped, at least one of them moves
into its final position, and therefore at most n-1 swaps are done altogether in sorting a list
of n items. This is the very best that we can expect from any method that relies entirely on
swaps to move its items.
Let us pause for a moment to compare the operation counts for selection sort with those
for insertion sort. Approximately, on average, the results are:
selection sort: about 0.5n² comparisons and about 3n moves
insertion sort: about 0.25n² comparisons and about 0.25n² moves
Let us make relative comparisons between these two. When n becomes large, 0.25n²
becomes much larger than 3n, and if moving items is a slow process, then insertion sort will
take far longer than will selection sort. But the amount of time taken for comparisons is, on
average, only about half as much for insertion sort as for selection sort. Under other
conditions, then, insertion sort will be better.
SHELL SORT
As we have seen, insertion sort and selection sort behave in opposite ways.
Selection sort moves the items very efficiently but does many redundant comparisons. In its
best case, insertion sort does the minimum number of comparisons, but is inefficient in
moving items only one place at a time. The problems with both of these are avoided by
another method called shell sort. This method sorts separate subfiles of the original file.
These subfiles contain every kth element of the original file, where the value of k is called
an increment. For example, if k is 5, the subfile consisting of x[0], x[5], x[10], … is sorted
first. Five subfiles, each containing one fifth of the elements of the original file, are sorted
in this manner. Shell sort thus improves on the efficiency of insertion sort by quickly
shifting values toward their destinations.
QUICK SORT
Although the shell sort algorithm is significantly better than insertion sort, there is
still room for improvement. One of the most popular sorting algorithms is quick sort. Quick
sort executes in O(n log n) on average, and O(n²) in the worst case. However, with proper
precautions, worst-case behavior is very unlikely.
It possesses very good average-case behavior among all the sorting techniques. It works by
partitioning the array to be sorted, and each partition is in turn sorted recursively. In
partitioning, one of the array elements is chosen as a key value. This key value can be the
first element of the array; that is, if a is an array, then key = a[0]. The rest of the array
elements are grouped into two partitions such that:
⇨ one partition contains elements smaller than the key value;
⇨ the other partition contains elements larger than the key value.
Now let us compare the sorting algorithms covered: insertion sort, shell sort, and
quick sort. There are several factors that influence the choice of a sorting algorithm:
• Stable sort: a stable sort is a sort that will leave identical keys in the same
relative position in the sorted output. Insertion sort is the only algorithm covered that is
stable.
• Space: an in-place sort does not require any extra space to accomplish its task.
Both insertion sort and shell sort are in-place sorts. Quick sort requires stack space for
recursion, and therefore is not an in-place sort, although tinkering with the algorithm can
considerably reduce the amount of time required.
• Time: the table below (omitted) shows the relative timings for each method.
62.i. What is search? Discuss the binary search algorithm with suitable example.
Search is a technique used to find the location where the desired element is
available. Computer systems are often used to store large amounts of data from which
individual records must be retrieved according to some search criterion. Thus the efficient
storage of data to facilitate fast searching is an important issue. Generally, there are two
types of searching:
⇨ Linear or sequential searching
⇨ Binary searching
In linear search, we access each element of the array one by one, sequentially, and see
whether it is the desired element or not. A search is unsuccessful if all the elements are
accessed and the desired element is not found. In the worst case all n elements must be
examined, and in the average case we may have to scan about half of the array (n/2 elements).
However, if we place our items in an array and sort them in either ascending or
descending order on the key first, then we can obtain much better performance with an algorithm
called binary search. In binary search, we first compare the key with the item in the middle
position of the array. If there's a match, we can return immediately. If the key is less than the
middle key, then the item sought must lie in the lower half of the array; if it's greater then the
item sought must lie in the upper half of the array. So we repeat the procedure on the lower (or
upper) half of the array.
ALGORITHM
1) Set low = 0, hi = n-1 and mid = int((low + hi)/2).
2) Repeat steps 3 and 4 while low <= hi and a[mid] ≠ item.
3) If item < a[mid], then set hi = mid-1; else set low = mid+1.
4) Set mid = int((low + hi)/2).
5) If a[mid] = item, then set location = mid; else set location = NULL.
6) Exit.
Analysis
The given array is in ascending order; the item to be searched for is 45.

value:  9  12  24  30  36  45  70
index:  0   1   2   3   4   5   6

STEP 1: low = 0, hi = 6, mid = int((0+6)/2) = 3. a[3] = 30 < 45, so low = mid + 1 = 4.
STEP 2: low = 4, hi = 6, mid = int((4+6)/2) = 5; the working list is now
36 45 70 (indices 4, 5, 6). a[5] = 45 = 45, so the item is found at location 5.
Explanation:
1. A midpoint pointer is initialized to point to the middle of the list, i.e. the array is
divided into two subarrays, each about half as large.
2. While the start pointer is LESS THAN OR EQUAL TO the end pointer
AND the item is NOT found:
2(a). If key k is LESS THAN the value pointed to by the mid pointer, the
"right" pointer is set to mid index - 1 (i.e. ignore the top half of the current list).
2(b). Else the item is GREATER THAN the value pointed to by the mid pointer, and the
"left" pointer is set to mid index + 1 (i.e. ignore the bottom half of the current list).
2(c). The mid pointer is set to the mid index of the new working list; loop back
and do the check again.
3. If the value at the mid pointer is EQUAL to the searched key, the algorithm stops.
4. Else return -1 to indicate the item was not found.
Example: consider an array of 13 elements and search key K = 19; binary search
successively halves the portion of the array under consideration until K is located.
(Figure omitted.)
63. Compare and contrast the efficiency of three searching algorithms.
(Sequential, Binary and Search Tree).
Sequential Search: the algorithm begins at the first element in the array and
looks at each element in turn until the search argument is found.
Analysis of sequential search efficiency:
The worst-case efficiency of this search algorithm is its efficiency for the
worst-case input of size n. In the worst case, sequential search performs n comparisons
(if key K is the last element in the array), and so consumes the most time.
The best-case efficiency of an algorithm is its efficiency for the best-case input of
size n. In the best case, one comparison is done (if key K is the first element in the
array), so it requires the least time.
The average-case efficiency of an algorithm is its efficiency for a "typical" or
"random" input of size n. In the average case, n/2 comparisons are done.
The time complexity of sequential search is O(n) in the worst case.
Sequential search is easier to implement than binary search, and does not require the list to
be sorted.
Binary Search:
Binary search requires O(log2 n) time.
Binary search is much faster than sequential search, but it works only for sorted
arrays.
For every file there is at least one set of keys (possibly more) that is unique (that
is, no two records have the same key value). Such a key is called a primary key. For
example, if the file is stored as an array, the index within the array of an element is a
unique external key for that element.
A key used for a particular search may not be unique: if a state field is used, for
instance, there may be two records with the same state in the file. Such a key is called a
secondary key.
Some of the algorithms we present assume unique keys; others allow for duplicate
keys. When adapting an algorithm to a particular application, the programmer should know
whether the keys are unique and make sure that the algorithm selected is appropriate.
A file is also known as a table; it is a group of elements, each of which is called a record.
Each record is differentiated from the other records by an association known as a key.
The key may or may not be contained within a record. A key contained within a record is
called an internal key or embedded key; a separate table of keys that includes pointers to
the records constitutes external keys.
Such keys are used for the organization of the file, so when the file needs to be
searched, different techniques are used. For a file with internal keys whose entire table
is maintained within main memory, the searching technique is called internal searching.
This is not the case for external searching: when the file's table of keys is kept in
auxiliary storage, searching the table requires going to the auxiliary storage, and such a
searching technique is called external searching.
External searching is thus quite different from internal searching with respect to managing
the table of keys. Since the table of keys is in auxiliary storage, external searching is slower
and less efficient: it requires external storage, and the searching time is greater as
compared to internal searching.
66.i. Write down the sequential searching algorithm and also its efficiency.
The sequential searching algorithm is as follows.
for (i = 0; i < n; i++)
    if (key == k[i])
        return (i);
return (-1);
This algorithm examines each key in turn; upon finding one that matches the search
argument, its index is returned. If no match is found, -1 is returned.
Consider the number of comparisons performed in searching a table of constant size n. The
number of comparisons depends upon where the record with the argument key appears in the
table. If the record is the first one in the table, only one comparison is needed, but if the
record is the last one, n comparisons are needed. If it is equally likely for the
argument to be found at any table position, a successful search will take (on the average)
(n+1)/2 comparisons and an unsuccessful search will take n comparisons. In any case the
number of comparisons is O(n).
Let p(i) be the probability that record i is retrieved. (p(i) is a number between 0 and
1 such that if m retrievals are made from the file, m*p(i) of them will be of r(i).) Let us
assume that p(0) + p(1) + … + p(n-1) = 1, so that there is no possibility that an argument
key is missing from the table. Then the average number of comparisons in searching for a
record is
p(0) + 2*p(1) + 3*p(2) + … + n*p(n-1)
Thus, for a large, stable file, reordering the file in order of decreasing probability of
retrieval achieves a greater degree of efficiency each time that the file is searched.
66.ii. Write down the sequential searching algorithm and also its efficiency.
Sequential searching is the simplest type of searching method. This search is applicable to a
table organized either as an array or as a linked list. To write the algorithm we
generalize some terms: r represents a record and k represents a key, so that k[i] is the key
of r[i], and i is the index.
The algorithm:
for (i = 0; i < n; i++)
    if (key == k[i])
        return (i);
return (-1);
If the required key matches a record key, the index of that record is returned from
the function; otherwise -1 is returned. The algorithm examines each key in turn.
For comparison, an iterative binary search is as follows:
low = 0;
hi = n - 1;
while (low <= hi) {
    mid = (low + hi) / 2;
    if (key == k[mid])
        return (mid);
    if (key < k[mid])
        hi = mid - 1;
    else
        low = mid + 1;
}
return (-1);
Each comparison in the binary search reduces the number of possible candidates by
a factor of 2. Therefore the maximum number of comparisons is approximately log2 n.
Hence binary search is the better search for large amounts of data.
67.ii. Write down the two versions of Binary Search (i.e. Recursive and Iterative).
Binary search refers to a searching technique that divides the records into two parts
at each step. In this search the argument is compared with the key of the middle element
of the table. If they are equal, the search ends successfully; otherwise either the upper or
lower half of the table is searched in the same manner until the required element is
found.
The recursive version of the binary search is as follows:

int bsearch(int key, int a[], int i, int j)
{
    int mid;
    if (i > j)
        return (-1);
    mid = (i + j) / 2;
    if (key == a[mid])
        return (mid);
    if (key < a[mid])
        return (bsearch(key, a, i, mid - 1));
    else
        return (bsearch(key, a, mid + 1, j));
}
The iterative version of the binary search is as follows:
low = 0;
hi = n - 1;
while (low <= hi) {
    mid = (low + hi) / 2;
    if (key == k[mid])
        return (mid);
    if (key < k[mid])
        hi = mid - 1;
    else
        low = mid + 1;
}
return (-1);
Each comparison in the binary search reduces the number of possible candidates by a factor
of 2. Therefore the maximum number of comparisons is approximately log2n.
The function nodesearch is responsible for locating the smallest key in a node
greater than or equal to the search argument. The simplest technique for doing this is a
sequential search through the ordered set of keys in the node. If all keys are of equal
length, a binary search can also be used to locate the appropriate key. The decision
whether to use a sequential or binary search depends upon the order of the tree, which
determines how many keys must be searched. Another possibility is to organize the keys
within the node as a binary search tree.
A multi-way search tree of order n is a general tree in which each node has n or
fewer subtrees and contains one fewer key than it has subtrees. That is, if a node has four
subtrees, it contains three keys. The searching mechanism for the multi-way search tree,
whether it is top-down, balanced, or neither, is straightforward. Each node contains a single
integer field, a variable number of pointer fields, and a variable number of key fields. If
node(p) is a node, the integer field numtrees(p) equals the number of subtrees of
node(p); numtrees(p) is always less than or equal to the order of the tree, n. The pointer
fields son(p, 0) through son(p, numtrees(p)-1) point to the subtrees of node(p). The key
fields k(p, 0) through k(p, numtrees(p)-2) are the keys contained in node(p), in ascending
order. The subtree to which son(p, i) points (for i between 1 and numtrees(p)-2 inclusive)
contains all keys in the tree between k(p, i-1) and k(p, i); son(p, 0) points to a subtree
containing only keys less than k(p, 0), and son(p, numtrees(p)-1) points to a subtree
containing only keys greater than k(p, numtrees(p)-2).
We also assume a function nodesearch(p, key) that returns the smallest integer j
such that key <= k(p, j), or numtrees(p)-1 if key is greater than all the keys in
node(p). The following recursive algorithm is for a function search(tree) that returns a
pointer to the node containing key (or -1, representing null, if there is no such node in the
tree) and sets the global variable position to the position of key in that node:
p = tree;
if (p == null) {
    position = -1;
    return (-1);
}
i = nodesearch(p, key);
if (i < numtrees(p) - 1 && key == k(p, i)) {
    position = i;
    return (p);
}
return (search(son(p, i)));
Hashing is the process of mapping keys drawn from a large range of possible values
onto a table with far fewer elements. For example, suppose that a company has an
inventory file of more than 100 items and the key to each record is a seven-digit part
number. To use direct indexing on the entire seven-digit key, an array of 10 million
elements would be required. This clearly wastes an unacceptably large amount of space,
because it is extremely unlikely that a company stocks more than a few thousand parts.
So, hashing is used on the seven-digit number. Suppose the company has fewer than
1000 parts and there is only a single record for each part. Then an array of 1000
elements is sufficient to contain the entire file. The array is indexed by an integer between 0
and 999 inclusive, and the last three digits of the part number are used as the index for the
part's record in the array.
(Figure: records stored in a 1000-element array, indexed by the last three digits of the
part number.)
part number 4967000 → position 0
part number 8421002 → position 2
part number 0000990 → position 990
part number 8765991 → position 991
part number 0001999 → position 999
In the figure above, the seven-digit part numbers are stored at positions 0 to 999. The
function that transforms a key into a table index is called a hash function. Keys that are not
close to each other may be stored close together: for example, 0000990 and 8765991 are
extremely different, yet they are represented by the adjacent indices 990 and 991.
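The last-three-digits transformation described above can be written as a small C hash
function (the data below are the part numbers from the figure):

    #include <stdio.h>

    int hash(long part_number)
    {
        return (int)(part_number % 1000);   /* index between 0 and 999 */
    }

    int main(void)
    {
        long parts[] = { 4967000, 8421002, 990, 8765991, 1999 };
        for (int i = 0; i < 5; i++)
            printf("%07ld -> %03d\n", parts[i], hash(parts[i]));
        return 0;
    }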
73. Discuss topological depth-first traversal with a suitable example
(assume a directed graph of eight vertices).
Suppose that the traversal has just visited a vertex v, and let w1, w2, …, wk be the
vertices adjacent to v. Then we shall next visit w1 and keep w2, …, wk waiting. After visiting
w1, we traverse all the vertices to which it is adjacent before returning to traverse w2, …, wk.
Fig: depth-first traversal (the vertices, starting from the vertex marked Start, are numbered 1-9 in the order visited)
74. Write down the difference between depth-first and breadth-first traversal of a graph.
In many problems, we wish to investigate all the vertices in a graph in some
systematic order, just as with binary trees, where we developed several systematic
traversal methods. In tree traversal, we had a root vertex with which we generally started;
in graphs, we often do not have any one vertex singled out as special, and therefore the
traversal may start at an arbitrary vertex. Although there are many possible orders for
visiting the vertices of the graph, two methods are of particular importance: depth-first
traversal and breadth-first traversal.
Depth-first traversal of a graph is roughly analogous to preorder traversal of an
ordered tree. Depth-first traversal, as its name indicates, traverses a single path of the graph
as far as it can go, i.e. until it visits a node with no successors or a node all of whose
successors have already been visited. It then resumes at the last node on the path just
traversed that has an unvisited successor and begins traversing a new path emanating from
that node. Spanning trees created by a depth-first traversal tend to be very deep.
Depth-first traversal, like any other traversal method that creates a spanning forest, can be
used to determine if an undirected graph is connected and to identify the connected
components of an undirected graph. Depth-first traversal can also be used to determine if a
graph is acyclic.
Suppose that the traversal has just visited a vertex v, and let w1, w2, …, wk be the
vertices adjacent to v. Then we shall next visit w1 and keep w2, …, wk waiting. After visiting
w1, we traverse all the vertices to which it is adjacent before returning to traverse w2, …, wk.
Fig: depth-first traversal (the vertices, starting from the vertex marked Start, are numbered 1-9 in the order visited)
Breadth-first traversal of a graph is roughly analogous to level-by-level traversal of
an ordered tree. This alternative traversal method, breadth-first traversal (or breadth-first
search), visits all the successors of a visited node before visiting any successors of any of
those successors. This is in contrast to depth-first traversal, which visits the
successors of a visited node before visiting any of its 'brothers'. Whereas depth-first
traversal tends to create very long, narrow trees, breadth-first traversal tends to create very
wide, short trees.
If the traversal has just visited a vertex v, then it next visits all the vertices adjacent
to v, putting the vertices adjacent to these in a waiting list to be traversed after all the
vertices adjacent to v have been visited. Figure below shows breadth-first traversal.
Fig: breadth-first traversal (the vertices, starting from the vertex marked Start, are numbered 1-9 in the order visited)
In implementing depth-first traversal, each visited node is placed on a stack (either
implicitly via recursion or explicitly), reflecting the fact that the last node visited is the first
node whose successors will be visited. Breadth-first traversal is implemented using a queue,
reflecting the fact that the first node visited is the first node whose successors are visited.
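A compact C sketch of both traversals, assuming a graph on MAXV vertices stored as an adjacency matrix (the representation, the vertex count, and the array names are illustrative assumptions); DFS uses the run-time stack through recursion, while BFS uses an explicit queue:

#include <stdio.h>

#define MAXV 8                      /* assumed number of vertices */

int adj[MAXV][MAXV];                /* adjacency matrix of the graph       */
int visited[MAXV];                  /* reset to all zeros before traversal */

/* depth-first: the recursion stack remembers the path being explored */
void dfs(int v)
{
    int w;
    visited[v] = 1;
    printf("%d ", v);
    for (w = 0; w < MAXV; w++)
        if (adj[v][w] && !visited[w])
            dfs(w);
}

/* breadth-first: a queue holds the vertices waiting to be explored */
void bfs(int start)
{
    int queue[MAXV], front = 0, rear = 0, v, w;
    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        v = queue[front++];         /* first visited, first expanded */
        printf("%d ", v);
        for (w = 0; w < MAXV; w++)
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
}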
A forest may be defined as an acyclic graph in which every node has one or no
predecessors. A tree may be defined as a forest in which only a single node (called the root)
has no predecessors. Any forest consists of a collection of trees. An ordered forest is one
whose component trees are ordered.
Any spanning tree divides the edges (arcs) of a graph into four distinct groups:
tree edges, forward edges, cross edges and back edges.
Tree Edges: Tree edges are arcs of the graph that are included in the spanning
forest.
Forward Edges: Forward edges are arcs of the graph from a node to one of its
descendants in the spanning forest other than a son.
Cross Edges: Cross edges are arcs from one node to another node that is neither the
first node's descendant nor its ancestor in the spanning forest.
Back Edges: Back edges are arcs from a node to one of its ancestors in the spanning forest.
77.i. Explain directed and undirected graphs and also discuss connected and
not-connected graphs.
Undirected Graph:
An undirected graph is represented by a diagram in which vertices are represented
by points or small circles, and edges by arcs joining the associated pairs of vertices.
The figure below shows an undirected graph; the unordered pair (v1,v2) is
associated with edge e1, and the pair (v2,v2) is associated with edge e6 (a self loop).
Fig: undirected graph
Directed Graph:
A directed graph or digraph consists of:
(a) a non-empty set V called the set of vertices,
(b) a set E called the set of edges, and
(c) a map Ø which assigns to every edge a unique ordered pair of vertices.
In a directed graph, edges are represented by arcs with arrows indicating their direction.
The figure below gives a directed graph. The ordered pairs (v2,v3), (v3,v4), and (v1,v3)
are associated with the edges e3, e4, and e2, respectively.
If every node in a graph is reachable from every other node, then the graph is known
as a connected graph; otherwise it is a not-connected graph.
Figs 2-5: undirected and not-connected graphs
An undirected graph is termed connected if every node in it is reachable from every
other node. Pictorially, a connected graph has only one segment; for example, the graph of
fig 6 is a connected graph, while the graph of fig 2 is not connected, since node E is not
reachable from node C, for example. A connected component of an undirected graph is a
maximal connected sub graph: every node in the component is reachable from every other
node of that component. For example, the graph of fig 1 has three connected
components: nodes A, B, C, D, and F; nodes E and G; and node H. A connected graph has a
single connected component.
Fig 6
1) Topological sort is used to arrange items when some pairs of items have no
comparison, that is, according to a partial order.
3) The topological sort algorithm creates a linear ordering of the vertices such that if
edge (u,v) appears in the graph, then v comes before u in the ordering. The graph
must be a directed acyclic graph (DAG). The implementation consists mainly of a call
to depth_first_search() (a C sketch follows this list).
4) The topological sort command tsort reads input from a file, or from the standard
input if you do not specify a file, and produces a totally ordered list of items
consistent with a partial ordering of items provided by the input.
Input to tsort takes the form of pairs of items (non-empty strings) separated by
blanks. A pair of two different items indicates ordering. A pair of identical items
indicates presence, but not ordering.
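A sketch of the DFS-based algorithm from point 3 in C. Vertices are recorded as their depth-first visits finish, which yields exactly the ordering described there (v before u for every edge (u,v)); the adjacency-matrix representation and the vertex count MAXV are assumptions for illustration:

#define MAXV 8                      /* assumed number of vertices */

int adj[MAXV][MAXV];                /* must describe a DAG        */
int done[MAXV];
int order[MAXV], cnt = 0;

/* append v to the ordering once all its successors have finished */
void topo_visit(int v)
{
    int w;
    done[v] = 1;
    for (w = 0; w < MAXV; w++)
        if (adj[v][w] && !done[w])
            topo_visit(w);
    order[cnt++] = v;               /* all successors of v are already listed */
}

void topological_sort(void)
{
    int v;
    for (v = 0; v < MAXV; v++)      /* cover every component of the DAG */
        if (!done[v])
            topo_visit(v);
}

After topological_sort() returns, order[0..MAXV-1] lists the vertices so that for every edge (u,v), v appears earlier in the array than u.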
79.i. Explain Kruskal's method of Spanning Tree. Compare this with the Round
Robin Algorithm.
This algorithm operates by considering the edges of the graph in order of weight, from the
least weighted edge up to the most, while keeping track of which nodes of the graph have
been added to the spanning tree. If an edge being considered joins two nodes that are not
already connected in the partial spanning forest (that is, adding it creates no cycle), the edge
and its endpoints are added to the spanning tree. After considering one edge, the algorithm
continues with the next higher weighted edge. In the event that a graph contains
equally weighted edges, the order in which these edges are considered does not matter. The
algorithm stops when all nodes have been added to the spanning tree.
Note that, while the spanning tree produced will be connected at the end of the algorithm,
in intermediate steps Kruskal's algorithm can be working on many independent,
non-connected sections of the tree. These sections will be joined before the algorithm completes.
Often this algorithm is implemented using parent pointers and equivalence classes. At the
start of the processing, each vertex in the graph is an independent equivalence class.
Looping through the edges in order of weight, the algorithm groups the vertices together
into one or more equivalence classes to denote that these nodes have been added to the
solution minimal spanning tree.
It is a good idea to process the edges by putting them into a min-heap. This is usually much
faster than sorting the edges by weight since, in most cases, not all the edges will be added
to the minimal spanning tree.
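A C sketch of Kruskal's algorithm using parent pointers for the equivalence classes described above. As a simplification of the min-heap approach, the edges here are simply sorted once with qsort; the struct layout, MAXV, and the function names are illustrative assumptions:

#include <stdlib.h>

#define MAXV 100                   /* assumed maximum number of vertices */

struct edge { int u, v, weight; };

int parent[MAXV];                  /* parent pointer of each vertex's class */

int find_class(int x)              /* root of x's equivalence class */
{
    while (parent[x] != x)
        x = parent[x];
    return x;
}

static int by_weight(const void *a, const void *b)
{
    return ((const struct edge *)a)->weight - ((const struct edge *)b)->weight;
}

/* fills tree[] with the chosen edges; nv vertices, ne edges;
   returns the number of edges placed in the spanning tree */
int kruskal(struct edge *edges, int ne, int nv, struct edge *tree)
{
    int i, ru, rv, taken = 0;
    for (i = 0; i < nv; i++)
        parent[i] = i;             /* each vertex starts as its own class */
    qsort(edges, ne, sizeof(struct edge), by_weight);
    for (i = 0; i < ne && taken < nv - 1; i++) {
        ru = find_class(edges[i].u);
        rv = find_class(edges[i].v);
        if (ru != rv) {            /* different classes: no cycle formed */
            tree[taken++] = edges[i];
            parent[ru] = rv;       /* merge the two equivalence classes */
        }
    }
    return taken;
}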
The round robin algorithm, in contrast, provides even better performance than Kruskal's
algorithm when the number of edges is low. Kruskal's algorithm requires O(e log e)
operations, whereas the round robin algorithm requires only O(e log log n) operations.
Kruskal's algorithm starts with a forest which consists of n trees, each
consisting of only one node and nothing else. In every step of the algorithm,
two different trees of this forest are connected into a bigger tree. Therefore, we keep having
fewer and bigger trees in our forest until we end up with a single tree, which is the minimum
spanning tree. In every step we choose the edge with the least cost, which means that we are
still under a greedy policy. If the chosen edge connects nodes which belong to the same tree,
the edge is rejected and not examined again, because it would produce a cycle which would
destroy our tree. Either this edge or the next one in order of least cost will connect nodes of
different trees, and thus we insert it, connecting two small trees into a bigger one. The
round robin algorithm, by contrast, maintains all partial trees in a queue, Q. Associated with
each partial tree, T, is a priority queue, P(T), of all arcs with exactly one incident node in the
tree, ordered by the weights of the arcs. Initially, as in Kruskal's algorithm, each node is a
partial tree. A priority queue of all arcs incident to nd is created for each node nd, and the
single-node trees are inserted into Q in arbitrary order.
79.ii. Explain Kruskal's method of Spanning Tree. Compare this with the Round
Robin Algorithm.
Kruskal's method of spanning tree is one of the methods of creating a minimum spanning
tree. In this method the nodes of the graph are initially considered as n distinct partial trees
with one node each. Then two distinct partial trees are connected into a single partial tree
by an edge of the graph. While connecting two distinct trees, the arc of minimum weight
should be used; for that, the arcs are placed in a priority queue on the basis of weight. Then
the arc of lowest weight is examined to see if it connects two distinct trees. To determine if
an arc (x, y) connects distinct trees, we can implement the trees with a father field in each
node. Then we can traverse all ancestors of x and y to obtain the roots of the trees
containing them. If the roots of the two trees are the same node, x and y are already in the
same tree, so arc (x, y) is discarded, and the arc of next lowest weight is examined.
Combining two trees simply involves setting the father of the root of one to the root of the other.
The round robin algorithm is another method of spanning tree, which provides better
performance when the number of edges is low. This algorithm is similar to Kruskal's method
except that there is a priority queue of arcs associated with each partial tree, rather than
one global priority queue of all unexamined arcs. At first, all partial trees are
maintained in a queue, Q. Associated with each partial tree, T, is a priority queue, P(T), of all
arcs with exactly one incident node in the tree, ordered by the weights of the arcs. A
priority queue of all arcs incident to nd is created for each node nd, and the single-node
trees are inserted into Q in arbitrary order. The algorithm proceeds by removing a partial
tree, T1, from the front of Q; finding the minimum-weight arc a in P(T1); deleting from Q
the tree, T2, at the other end of arc a; combining T1 and T2 into a single new tree, T3 [and
at the same time combining P(T1) and P(T2), with a deleted, into P(T3)]; and adding T3 to
the rear of Q. This continues until Q contains a single tree: the minimum spanning tree. This
algorithm requires only O(e log log n) operations if an appropriate implementation of the
priority queues is used.
80.i. Discuss Greedy algorithms and Dijkstra's algorithm to find the shortest
path in a graph.
To find the shortest path between points, the weight or length of a path is calculated
as the sum of the weights of the edges in the path. A path from x to y is a shortest path if
there is no other path from x to y with lower weight. Dijkstra's algorithm finds the shortest
paths from x in order of increasing distance from x: at each step it fixes the distance of the
nearest vertex not yet settled, then extends the known paths through that vertex. It starts
out at one vertex and branches out by selecting certain edges that lead to new vertices.
Dijkstra's algorithm solves the problem of finding the shortest path from a point in a graph
(the source) to a destination. It turns out that one can find the shortest paths from a given
source to all points in a graph in the same time; hence this problem is sometimes called the
single-source shortest paths problem. This problem is related to the spanning tree problem:
the graph representing all the paths from one vertex to all the others must be a spanning
tree, since it must include all vertices. There will also be no cycles, as a cycle would define
more than one path from the selected vertex to at least one other vertex. For a graph,
● G = (V,E), where
● V is a set of vertices and
● E is a set of edges.
The relaxation process updates the costs of all the vertices, v, connected to a vertex, u, if
we could improve the best estimate of the shortest path to v by including (u,v) in the path
to v.
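In code, the relaxation step amounts to a single comparison; a minimal sketch, where distance[] holds the current best estimates and weight_uv is the weight of edge (u,v) (all names here are illustrative):

/* relax edge (u,v): can the path through u improve our estimate for v? */
void relax(int distance[], int weight_uv, int u, int v)
{
    if (distance[u] + weight_uv < distance[v])
        distance[v] = distance[u] + weight_uv;  /* path through u is better */
}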
80.ii. Discuss Greedy and Dijkstra's algorithms to find the shortest path in a
graph.
Instead of building the tree from the top down, as in the balancing method, the greedy method
builds the tree from the bottom up. The method uses a doubly linked linear list in which
each list element contains four pointers, one key value and three probability values. The
four pointers are left and right list pointers used to organize the doubly linked list, and left
and right sub tree pointers used to keep track of binary search sub trees containing keys
less than and greater than the key value in the node. The three probability values are the
sum of the probabilities in the left sub tree, called the left probability; the probability p(i)
of the node's key value k(i), called the key probability; and the sum of the probabilities in
the right sub tree, called the right probability. The total probability of a node is defined as
the sum of its left, key, and right probabilities. Initially there are n nodes in the list. The key
value in the ith node is k(i), its left probability is q(i-1), its right probability is q(i), its key
probability is p(i), and its left and right sub tree pointers are null.
Each iteration of the algorithm finds the first node on the list whose total probability is less
than or equal to its successor's (if no node qualifies, nd is set to the last node in the list).
The key in nd becomes the root of a binary search sub tree whose left and right sub trees
are the left and right sub trees of nd; nd is then removed from the list. The left sub tree
pointer of its successor (if any) and the right sub tree pointer of its predecessor (if any) are
reset to point to the new sub tree, and the left probability of its successor and the right
probability of its predecessor are reset to the total probability of nd. This process is
repeated until only one node remains on the list. When only one node remains, its key is
placed in the root of the final binary search tree, with the left and right sub tree
pointers of that node as the left and right sub tree pointers of the root.
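Because the list manipulation here is intricate, a C sketch may help. It assumes keys k[1..n] with key probabilities p[1..n] and gap probabilities q[0..n] as described above; the structure names lnode (list element) and tnode (tree node) are illustrative:

#include <stdlib.h>

struct tnode {                      /* node of the final binary search tree */
    int key;
    struct tnode *left, *right;
};

struct lnode {                      /* element of the doubly linked list */
    int key;
    double lprob, kprob, rprob;     /* left, key, and right probabilities */
    struct tnode *lsub, *rsub;      /* left and right sub tree pointers   */
    struct lnode *prev, *next;
};

static double total(struct lnode *n)
{
    return n->lprob + n->kprob + n->rprob;
}

static struct tnode *mktree(int key, struct tnode *l, struct tnode *r)
{
    struct tnode *t = malloc(sizeof *t);
    t->key = key; t->left = l; t->right = r;
    return t;
}

/* k[1..n], p[1..n], q[0..n] as in the text above */
struct tnode *greedy_bst(int k[], double p[], double q[], int n)
{
    struct lnode *head = NULL, *tail = NULL, *nd, *cur;
    struct tnode *t;
    int i;

    for (i = 1; i <= n; i++) {      /* build the initial list of n nodes */
        cur = malloc(sizeof *cur);
        cur->key = k[i];
        cur->lprob = q[i-1]; cur->kprob = p[i]; cur->rprob = q[i];
        cur->lsub = cur->rsub = NULL;
        cur->prev = tail; cur->next = NULL;
        if (tail) tail->next = cur; else head = cur;
        tail = cur;
    }
    while (head->next != NULL) {    /* more than one node on the list */
        /* first node whose total probability <= its successor's,
           or the last node if none qualifies */
        for (nd = head; nd->next != NULL; nd = nd->next)
            if (total(nd) <= total(nd->next))
                break;
        t = mktree(nd->key, nd->lsub, nd->rsub);
        if (nd->next) { nd->next->lsub = t; nd->next->lprob = total(nd); }
        if (nd->prev) { nd->prev->rsub = t; nd->prev->rprob = total(nd); }
        if (nd->prev) nd->prev->next = nd->next; else head = nd->next;
        if (nd->next) nd->next->prev = nd->prev; else tail = nd->prev;
        free(nd);
    }
    t = mktree(head->key, head->lsub, head->rsub);
    free(head);
    return t;                       /* root of the final tree */
}

Each pass removes exactly one list node, so the construction performs n-1 merges in all.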
Dijkstra's Algorithm:
In a weighted graph, let weight(i, j) be the weight of the arc from i to j; if there is no arc
from i to j, weight(i, j) is set to an arbitrarily large value to indicate the infinite cost (that
is, the impossibility) of going directly from i to j. If all weights are positive, Dijkstra's
algorithm can be used to find the shortest path between two nodes, s and t (i.e. from s to t).
Let the variable infinity hold the largest possible integer. distance[i] keeps the cost of the
shortest path known thus far from s to i. Initially, distance[s] = 0 and distance[i] = infinity
for all i != s. A set perm contains all nodes whose minimal distance from s is known, that is,
those nodes whose distance value is permanent and will not change. If a node i is a member
of perm, distance[i] is the minimal distance from s to i. Initially, the only member of perm
is s. Once t becomes a member of perm, distance[t] is known to be the shortest distance
from s to t, and the algorithm terminates. This implementation is O(n²), where n is the
number of nodes in the graph. In most cases the algorithm can be implemented more
efficiently using adjacency lists.
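A sketch of this O(n²) version in C, following the distance and perm description above; the adjacency-matrix weight[][], the INF stand-in for infinity, and MAXV are assumptions for illustration:

#define MAXV 100
#define INF  1000000000            /* stands in for "infinity" */

int weight[MAXV][MAXV];            /* weight[i][j] = INF if no arc i -> j */

/* shortest distance from s to t; n = number of nodes */
int dijkstra(int n, int s, int t)
{
    int distance[MAXV], perm[MAXV];
    int i, u, best;

    for (i = 0; i < n; i++) { distance[i] = INF; perm[i] = 0; }
    distance[s] = 0;
    while (!perm[t]) {
        /* pick the non-permanent node with the smallest distance */
        u = -1; best = INF;
        for (i = 0; i < n; i++)
            if (!perm[i] && distance[i] < best) { best = distance[i]; u = i; }
        if (u == -1) return INF;   /* t is unreachable from s */
        perm[u] = 1;               /* distance[u] is now final */
        for (i = 0; i < n; i++)    /* relax all arcs leaving u */
            if (!perm[i] && weight[u][i] < INF &&
                distance[u] + weight[u][i] < distance[i])
                distance[i] = distance[u] + weight[u][i];
    }
    return distance[t];
}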