Algorithms and Data Structures: Unit-I
1. Algorithm
Programming is a very complex task, and a number of aspects of programming make it so complex. The first is that most programming projects are very large, requiring the coordinated effort of many people. (This is the topic of a course like software engineering.) The next is that many programming projects involve storing and accessing large quantities of data efficiently. (This is the topic of courses on data structures and databases.) The last is that many programming projects involve solving complex computational problems, for which simplistic or naive solutions may not be efficient enough. The complex problems may involve numerical data (the subject of courses on numerical analysis), but often they involve discrete data.
Any real-world problem solving starts by planning step-by-step instructions to solve the problem. An algorithm is an explicit, precise, unambiguous, mechanically-executable sequence of elementary instructions.
An algorithm must satisfy the following criteria:
1. Input: Zero or more quantities are externally supplied.
2. Output: At least one quantity is produced.
3. Definiteness: Each instruction is clear and unambiguous (e.g. "add 1 teaspoon salt" is definite, whereas "add salt to suit your taste" is not).
4. Finiteness: if we trace out the instructions of an algorithm, then for all cases, the algorithm terminates
after a finite number of steps.
5. Effectiveness: Every instruction must be very basic so that it can be carried out, in principle, by a
person using only pencil and paper.
Note: Algorithms that are definite and effective are also called computational procedures.
An algorithm is an abstraction of the real computation that we want to perform in order to solve a problem. While planning to design an algorithm, one must learn two skills:
1. Intuition: How to think about abstract computation.
2. Language: How to talk about abstract computation.
It is important to express or write down precisely what we think about abstract computation. We need some language to express our intuitions.
The best way to write down an algorithm is using pseudocode. Pseudocode uses the structure of formal programming languages and mathematics to break the algorithm into primitive steps; the primitive steps themselves may be written using mathematics, pure English, or an appropriate mixture of the two. Well-written pseudocode reveals the internal structure of the algorithm but hides irrelevant implementation details, making the algorithm much easier to understand, analyse, debug, and implement.
The precise syntax of pseudocode is a personal choice, but the overriding goal should be clarity and precision.
Ideally, pseudocode should allow any computer programmer to implement the underlying algorithm, quickly
and correctly, in their favourite programming language, without understanding why the algorithm works.
A flow chart is a pictorial representation of an algorithm. A flow chart uses specified symbols to indicate processing tasks. For example, the algorithm to compute Sum = 1 + 2 + ... + n:
1. Start
2. Sum=0
3. i=1
4. if i>n goto 8
5. Sum=Sum+i
6. i=i+1
7. goto 4
8. Stop
[Flow chart: Start -> Sum=0 -> For i=1 to n -> Sum=Sum+i -> End]
This logic gets complicated as the value of N increases; it cannot be generalized by using a loop.
Logic-2: Let us start by finding the larger of two numbers (b and c), and assume c is the larger:
Large=c
If (b>Large)
By Dr. M. M. Raghuwanshi, YCCE,Nagpur
Notes On Algorithms and Data Structures
Large=b
Now extend this logic to 3 numbers (b, c, d):
Large=c
If (d>Large)
Large=d
If (b>Large)
Large=b
This logic stays simple as the value of N increases and can easily be generalized to N numbers:
Large=list[0]
For (i=1; i<N; i++)
If (list[i] > Large)
Large=list[i]
Exercise:
1. Find GCD of two numbers
2. Find LCM of two numbers
3. Find second largest number in the list of numbers
4. Find smallest number in the list of numbers
5. Generate first N Fibonacci numbers
2. Analysis of algorithms
Unlike programs, algorithms are not dependent on a particular programming language, machine, system, or compiler. They are mathematical entities, which can be thought of as running on some sort of idealized computer with an infinite memory of an unlimited word size.
Analysing an algorithm means predicting its requirement of resources. There are two important resources: the processor that runs instructions and the memory that stores data and instructions. The space complexity of an algorithm is the amount of memory it needs to run to completion. The time complexity of an algorithm is the amount of time needed to process or run all instructions.
It is important to select or define a model of computation, like the Random Access Machine (RAM), for running an algorithm. We assume that our machine model (usually RAM) and the real machine differ only by constant factors (unknown to us). The RAM model assumes
• all instructions operate serially (no concurrency, no parallel computation except when noted);
• all atomic operations like addition, multiplication, subtraction, read, compare, store, etc. take unit time; and
• all atomic data (chars, ints, doubles, pointers, etc.) take up unit space.
Once the model is fixed, an algorithm can be written in a pseudo language. There are two ways to analyse an algorithm:
• The empirical (or posterior) approach consists of programming the competing techniques and trying them on different instances with the help of a computer.
• The theoretical (or a priori) approach consists of determining mathematically the quantity of resources
needed by each algorithm as a function of the size of the instances considered. The resources of most
interest are computing time and storage space with the former usually being the most critical. The
advantage of theoretical approach is that it depends on neither the computer being used, nor the
programming language, nor even the skill of the programmer. Most significantly, it allows us to study
the efficiency of an algorithm when used on instances of any size.
Mathematically, an algorithm is a function that takes zero or more inputs and produces one or more outputs. In general, the time taken by an algorithm grows with the size of the instance (input), so it is traditional to describe the running time of an algorithm as a function of the size of its instance.
In general, the following are typical measures of input size:
• Size of an array
• Polynomial degree
• Number of elements in a matrix
• Number of bits in binary representation of the input
• Vertices and edges in a graph
The most natural measure of instance size is the number of items in the instance, but the best measure is the total number of bits needed to represent the instance (e.g. a program to check for primality needs only a single input, but its running time depends on the magnitude of that number).
The running time of an algorithm on a particular instance is the number of primitive operations or "steps" executed. It is convenient to define the notion of a step so that it is as machine-independent as possible. The number of steps any program statement is assigned depends on the kind of statement. For example,
• comments count as zero steps;
• an assignment statement which does not involve any calls to other algorithms is counted as one step;
• in an iterative statement such as a for, while or do-while statement, we consider step counts only for the control part (the condition) of the statement.
The execution time of a step depends on the particular implementation used: the machine, the programming language, and so on. Since the exact time required for a step is unimportant, we simplify by saying that a step can be executed at unit cost.
The method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement. The total number of steps is calculated by first determining the number of steps per execution (s/e) of the statement and the total number of times (i.e. the frequency) each statement is executed. The s/e of a statement is the amount by which the count changes as a result of the execution of that statement. By multiplying these two quantities, the total contribution of each statement is obtained. By adding the contributions of all statements, the step count for the entire algorithm is obtained.
Example:
1. Find largest number among N numbers

Statement                     s/e   frequency   step count
Algorithm Large(list, n)       0      --           0
  Large=list[0]                1      1            1
  For (i=1; i<N; i++)          1      n            n
    If (list[i] > Large)       1      n-1          n-1
      Large=list[i]            1      n-1          n-1
Total step count                                   3n-1
3. Sum = 1+2+4+8+... <= N

Statement                s/e   frequency   step count
Algorithm Sum(n)          0      --           0
  Total=0                 1      1            1
  I=1                     1      1            1
  While (Total <= n)      1      log n        log n
  {                       0      --           0
    I=I*2                 1      log n        log n
    Total=Total+I         1      log n        log n
  }                       0      --           0
Total step count                              3 log n + 2
The time taken by an algorithm can vary considerably between two different instances of the same size. One can escape the difficulties that arise when the chosen parameters are not adequate to determine the step count uniquely by defining three kinds of step counts: best case, worst case, and average case.
• The best-case step count is the minimum number of steps that can be executed for the given parameters.
(lower bound)
• The worst-case step count is the maximum number of steps that can be executed for the given
parameters. (upper bound)
• The average-case step count is the average number of steps executed on instances with the given parameters. Let us take the linear search algorithm as an example.
Example:
1. Linear search

Statement                          s/e   frequency        step count
1 Algorithm LinSrch(list, val, n)   0    --               0
2 {                                 0    --               0
3   for i=1 to n do                 1    n+1 (best: 1)    n+1 (best: 1)
4     If (list[i] == val)           1    n (best: 1)      n (best: 1)
5       return 1;                   1    0 (best: 1)      0 (best: 1)
6   return 0;                       1    1 (best: 0)      1 (best: 0)
7 }                                 0    --               0

Step count (best case, val at first position): 1 + 1 + 1 = 3
Step count (worst case, val absent): (n+1) + n + 1 = 2n+2
The worst-case running time is the longest running time for any input of size n. The worst-case running time of
an algorithm is an upper bound on the running time for any input. Knowing it gives us a guarantee that the
algorithm will never take any longer.
Worst-case analysis is appropriate for an algorithm whose response time is critical. On the other hand, if an algorithm is to be used many times on many different instances, it may be more important to know the average execution time on instances of size n. It is usually harder to analyse the average behaviour of an algorithm than to analyse its behaviour in the worst case.
3. Behaviour of Algorithm
If we have two algorithms with complexities c1*n^2 + c2*n and c3*n respectively, then we know that the one with complexity c3*n will be faster than the one with complexity c1*n^2 + c2*n for sufficiently large values of n. For small values of n, either algorithm could be faster (depending on c1, c2 and c3). No matter what the values of c1, c2 and c3, there will be an n beyond which the algorithm with complexity c3*n is always faster than the one with complexity c1*n^2 + c2*n. This value of n is called the break-even point. If the break-even point is zero, then the algorithm with complexity c3*n is always faster. The exact break-even point cannot be determined analytically; the algorithms have to be run on a computer to determine it.
n (input size)   log n   3n   n^2   n^3    n^4     2^n
0                0       0    0     0      0       1
1                0       3    1     1      1       2
2                0.30    6    4     8      16      4
3                0.48    9    9     27     81      8
4                0.60    12   16    64     256     16
5                0.70    15   25    125    625     32
6                0.78    18   36    216    1296    64
7                0.85    21   49    343    2401    128
8                0.90    24   64    512    4096    256
9                0.95    27   81    729    6561    512
10               1       30   100   1000   10000   1024
To analyse the efficiency of an algorithm we are interested in how the running time increases as the input size increases. When two algorithms are compared with respect to their behaviour (logarithmic, linear, quadratic, cubic, exponential, etc.) for large input sizes, a very useful measure is the order of growth.
4. Structured programming
It is a programming paradigm aimed at improving the clarity, quality, and development time of a computer
program by making extensive use of the structured control flow constructs of selection (if/then/else) and
repetition (while and for), block structures, and subroutines in contrast to using simple tests and jumps such as
the go to statement, which can lead to "spaghetti code" that is potentially difficult to follow and maintain.
Subroutines (callable units such as procedures, functions, methods, or subprograms) are used to allow a sequence of statements to be referred to by a single statement. Blocks are used to enable groups of statements to be treated as if they were one statement. Block-structured languages have a syntax for enclosing structures in some formal way.
A programming approach is nothing but a way of solving a computational problem. Sometimes it is easy to break a large problem into small pieces, solve those small pieces, and combine the solutions to get the overall solution.
• Top-down approach:
1. Take the whole problem and split it into two or more parts.
2. If these parts turn out to be too big to be solved as a whole, split them further and find solutions to
those sub-parts.
3. Merge solutions according to the sub-problem hierarchy thus created after all parts have been
successfully solved.
It emphasizes planning and a complete understanding of the system. It is inherent that no coding can
begin until a sufficient level of detail has been reached in the design of at least some part of the system.
This, however, delays testing of the ultimate functional units of a system until significant design is
complete
• Bottom-up approach:
1. Breaking the problem into smallest possible (and practical) parts.
2. Finding solutions to these small sub-problems.
3. Merging the solutions you get iteratively (again and again) till you have merged all of them into the final solution to the "big" problem.
It emphasizes coding and early testing, which can begin as soon as the first module has been specified. This approach, however, runs the risk that modules may be coded without a clear idea of how they link to other parts of the system, and that such linking may not be as easy as first thought. Re-usability of code is one of the main benefits of the bottom-up approach.
The main difference in approaches is splitting versus merging. You either start big and split "down" as required
or start with the smallest and merge your way "up" to the final solution.
1. Top-down approach:
Function factorial(n)
    if (n == 1)
        return 1;
    else
        return n * factorial(n-1);
2. Bottom-up approach:
1! = 1
2! = 2 * 1! = 2
...
(n-1)! = (n-1) * (n-2)!
n! = n * (n-1)!

Function fact_iter(n)
    fact = 1;
    for i = 1 to n do
        fact = fact * i;
    return fact;
Exercise:
1. Write down the differences between top-down and bottom-up approaches.
2. Check whether a string is a palindrome or not.
4. Arrays
The array data type is the simplest structured data type. It is a collection of elements of the same type (i.e. a homogeneous collection). Any element in an array can be accessed directly and randomly. Each element in an array has a fixed position in the collection, known as its index or subscript. In order to denote an individual element, the name of the array is augmented by the index. This index is a non-negative integer. Two fundamental operations, store and retrieve, are defined on an array.
Let A is an array then the elements of A are denoted by subscript notation
a0, a1, a2, ……., an-1
or, by the parenthesis notation
A(0), A(1), A(2), A(3),……., A(n-1)
or, by the bracket notation
A[0], A[1], A[2], A[3],…….., A[n-1]
Regardless of the notation, the number K in A[K] is called a subscript and A[K] is called a subscripted variable. Linear arrays are called one-dimensional arrays because each element in such an array is referenced by one subscript. A two-dimensional array is a collection of similar data elements where each element is referenced by two subscripts. Such arrays are called matrices in mathematics and tables in business applications.
4.1 Representation of Arrays in Memory
Memory may be regarded as a 1-dimensional array of words numbered from 1 to m. The main concern here is the representation of n-dimensional arrays in a 1-dimensional memory. It is also necessary to determine the amount of memory space to be reserved for a particular array. The elements of an array are stored in successive words (or cells) of a contiguous memory block allocated to the array by the operating system.
[Figure: a block of memory cells with addresses 1F00 to 1F09, each cell one word wide.]
Assuming that each array element requires only one word of memory, the number of words needed is the number of elements in the array. If an array is declared A[l1..u1, l2..u2, ........, ln..un] then the number of elements is
    (u1-l1+1) * (u2-l2+1) * ... * (un-ln+1)
where li and ui are the lower and upper bounds of the ith dimension. Some languages like Pascal permit mentioning the upper and lower bound of each dimension.
For example, in the array A[4..5, 2..4, 1..2, 3..4],
Number of elements = (5-4+1) * (4-2+1) * (2-1+1) * (4-3+1) = 2*3*2*2 = 24
But in languages like C, lower bound is always zero hence upper bound represents the number of elements in
that dimension.
Normally the size of a word of memory, expressed in bits (or bytes), depends on the hardware and operating system. Let us assume that w is the size of a word of memory. If each element of the array takes more than one word of memory then the total amount of memory required, in bits (or bytes), to store the array is
    w * (number of words per element) * (number of elements)
For example, one may assume that a word is one byte (i.e. w=1) for a typical C compiler.
A one-dimensional array is directly mapped onto one-dimensional memory as shown in the figure. The number of words assigned to each element of the array depends on the data type of the array.
Consider the following prototype of a declaration:
data_type array_name[SIZE];
Let α be the base address, or starting address, of the array and s be the size of data_type, i.e. the number of words of memory required to store an element of that data type. The address of the ith element is α + s*i.
In C programming, the name of an array always stores the base address, which is also the address of the 0th element of the array. Consider the declaration int a[5]; if the base address (α) is 1F00 then the array name a stores 1F00, which is the address of the 0th element, i.e. &a[0] (& is the address operator). Assuming s=2, the address of the 2nd element is α + s*i = 1F00 + 2*2 = 1F04. Such arithmetic (+, -, * and /) operations on addresses are known as address arithmetic operations and they are different from the arithmetic operations performed on values.
4.2 Representation of Two-Dimensional Arrays in Memory
In a 2-dimensional array, elements are arranged in rows and columns. Each element is identified by a unique pair of row and column numbers. There are two possible ways to map a 2-dimensional array into 1-dimensional memory:
a. Row-major order
b. Column-major order
In row-major order all elements of the 1st row are mapped into memory first, then all elements of the 2nd row, and so on. In column-major order all elements of the 1st column are mapped into memory first, then all elements of the 2nd column, and so on. In an implementation of row-major order, all the elements of the 0th row are followed by all elements of the 1st row and so on. Another synonym for row-major order is lexicographic order. In an implementation of column-major order, all the elements of the 0th column are followed by all elements of the 1st column and so on. Representation of a two-dimensional array in memory in row-major order and column-major order is shown in the figure.
It is important to derive a formula to translate the 2-dimensional location of an element (in the form row, col) to its 1-dimensional location (in the form of an index). A 2-dimensional array A[l1..u1][l2..u2] may be interpreted as (u1-l1+1) rows, each row consisting of (u2-l2+1) elements. In row-major order, the formula to find the 1-D location m of the 2-D element A[i][j] is
    m = (i-l1)*(u2-l2+1) + (j-l2)
If the lower bounds of both dimensions are 0, the formula to calculate the index in 1-D becomes
    m = i*(u2+1) + j
For example:
for a[2][3] the value of m = 2*(3+1)+3 = 11
for a[3][1] the value of m = 3*(3+1)+1 = 13
To calculate the memory address of the element we can use the formula α + s*m, where α is the base address and s is the element size.
Similarly, for a 3-D array the formula to find the location of element A[i][j][k] is
    m = (i-l1)*(u2-l2+1)*(u3-l3+1) + (j-l2)*(u3-l3+1) + (k-l3)
Generalizing the preceding discussion, the addressing formula for any element A[i1][i2]....[in] in an n-dimensional array declared as A[l1..u1][l2..u2]...[ln..un] is
    m = (i1-l1)*(u2-l2+1)*(u3-l3+1)*...*(un-ln+1)
      + (i2-l2)*(u3-l3+1)*(u4-l4+1)*...*(un-ln+1)
      + (i3-l3)*(u4-l4+1)*(u5-l5+1)*...*(un-ln+1)
      + ...
      + (in-1-ln-1)*(un-ln+1)
      + (in-ln)
Note: row-major order storage is used in C & Java and column-major order storage is used in FORTRAN.
Exercise:
1. Obtain an addressing formula for the element A[i1][i2]....[in] in an n-dimensional array declared as A[l1..u1][l2..u2]...[ln..un], assuming a column-major order representation of the array.
2. Represent a lower triangular matrix and an upper triangular matrix together in a single matrix.
3. A matrix A is said to have a saddle point if some entry a[i][j] is the smallest value in the ith row and the largest value in the jth column. A matrix may have more than one saddle point. Write a function to print all saddle points in a given matrix.
4. A sparse matrix is a matrix that contains more zero elements than non-zero elements. A sparse matrix can be represented by an array of triplets. Each triplet consists of a row, a column, and a value corresponding to one non-zero element of the sparse matrix. Thus an array of size N x 3 represents each sparse matrix, where N is the number of non-zero elements in the matrix. The first row of the representation contains the number of rows, columns, and non-zero elements of the matrix. Write a function to convert a sparse matrix into this 3-tuple form.
5. Lists
In an array implementation of a list, the element at position i of the list is stored at the array cell with index i. The head of the list is always at position 0. This makes random access to any element in the list quite easy. Given some position in the list, the value of the element in that position can be accessed directly in Θ(1) time.
[Figure: a list of n elements stored in an array in increasing order of index (a0 at cell 0, a1 at cell 1, ..., an-1 at cell n-1) and, alternatively, in decreasing order of index (a0 at cell SIZE-1, with further elements stored toward lower indices).]
[Figure: inserting 10 at the 2nd position of the list (5, 9, 15, 17, 7): the elements from the 2nd position onward are shifted one cell toward the end, then 10 is stored, giving (5, 9, 10, 15, 17, 7); appending 11 at the end of the list needs no shifting.]
Appending an element to the list does not need any shifting, hence the time complexity is O(1). Insertion at the beginning or somewhere in the middle of the list requires shifting of elements. The worst case (i.e. insertion at the 0th position) needs shifting of n elements, hence the time complexity is O(n).
Following is code for the insertion operation. Function Inslist inserts an element (data) at position p in the list.
#define SIZE 10
int List[SIZE], n = 0;

void Inslist(int p, int data)
{
int i;
if (n == SIZE)
printf("\n Error: List is full -- Overflow \n");
else
if (p >= n) // append element
{
List[n]=data;
n++;
}
else
{
for(i=n; i>p; i--)
List[i]=List[i-1];
List[p]=data;
n++;
}
}
[Figure: deleting the element at the 2nd position of the list (5, 9, 15, 17, 7, 11): the elements after the 2nd position are shifted one cell toward the front, giving (5, 9, 17, 7, 11); attempting to delete at the 7th position of this list is out of range and reports an error.]
Deletion of the last element from the list does not need any shifting, hence the time complexity is O(1). Deletion of an element at the beginning or somewhere in the middle of the list requires shifting of elements. The worst case (i.e. deletion of the element at the 0th position) needs shifting of n-1 elements, hence the time complexity is O(n).
Following is code for the deletion operation. Function Dellist deletes the element at position p in the list.
void Dellist(int p)
{
int i;
if(n == 0)
printf(" \n Error: List is empty -- Underflow\n ");
else
if( p >= n)
printf("\nError: Position is out of List\n");
else
        {
            for(i=p; i<n-1; i++)   /* shift the elements after position p */
                List[i]=List[i+1];
            n--;
        }
}

Following is code to traverse the list and print its elements.

void Display()
{
    int i;
    if(n == 0)
        printf(" \n List is empty\n ");
    else
    {
        printf("Traversing in List\n");
        for(i=0; i< n; i++)
            printf("%d\t", List[i]);
        printf("\n");
    }
}
Exercise:
1. Given two sorted lists, L1 and L2, write a function to compute L1 ∪ L2 (their union).
2. Given two sorted lists, L1 and L2, write a function to compute L1 ∩ L2 (their intersection).
3. Write a function to delete duplicate elements in a list.
4. Consider an array of size n. Write an algorithm to shift all items in the array k places cyclically counter-
clockwise. You are allowed to use only one extra location to implement swapping of items.
5. Let L be a linear list represented in the array. Write a function to make an in-place reversal of the order
of elements in L. The only additional space available is a simple variable. How much time does a
function take to accomplish the reversal?
6. Given an array of n integers and a value v, describe an algorithm to find whether there are two values x
and y in the array with sum v.
7. Find a duplicate element in a limited range array
8. Find largest sub-array formed by consecutive integers in an array
9. Find maximum length sub-array having given sum in an array
10. Find maximum product of two integers in an array
11. Rearrange the array with alternate high and low elements
12. Find minimum sum subarray of given size k
13. Find subarray having given sum in given array of integers
6. Sorting
A sorting algorithm is an algorithm that puts the elements of a list (or collection) in a certain order. The most-used orders are numerical order and lexicographical order. More formally, the output of sorting must have two properties:
1. The output is a permutation, or reordering, of the input.
2. The elements of the output are in a certain order based on a set of well-defined rules.
Selection sort divides the input list into two parts: a sorted sub-list and an unsorted sub-list. Initially, the sorted sub-list is empty and the unsorted sub-list is the entire input list:

    Sorted sub-list: ( )    Unsorted sub-list: (17, 29, 14, 26)

The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sub-list, exchanging it with the leftmost unsorted element (putting it in sorted order), and moving the sub-list boundary one element to the right.

Pass 1: the smallest element in the list is 14; swap the elements at the 1st and 3rd places:
    (17, 29, 14, 26) -> (14, 29, 17, 26)    Sorted: (14)  Unsorted: (29, 17, 26)
Pass 2: the smallest element in the unsorted sub-list is 17; swap the elements at the 2nd and 3rd places:
    (14, 29, 17, 26) -> (14, 17, 29, 26)    Sorted: (14, 17)  Unsorted: (29, 26)
Pass 3: the smallest element in the unsorted sub-list is 26; swap the elements at the 3rd and 4th places:
    (14, 17, 29, 26) -> (14, 17, 26, 29)    Sorted: (14, 17, 26)  Unsorted: (29)
On its first pass the algorithm finds the smallest element in the input list and swaps it with the element at the 1st position. This brings the 1st element into the sorted sub-list, and the unsorted sub-list begins from the 2nd element onward. In the 2nd pass the algorithm finds the smallest element in the unsorted sub-list and swaps it with the 2nd element of the list. Now the first two elements are in the sorted sub-list and the unsorted sub-list begins from the 3rd element onward. This process continues until the entire list is sorted. At the end of sorting, all elements of the input list belong to the sorted sub-list and the unsorted sub-list is empty.
    Sorted sub-list: (14, 17, 26, 29)    Unsorted sub-list: ( )

There is no need to do pass 4: after pass 3 the largest element of the list (of 4 elements) is already at the last position. Selection sort is an in-place comparison sort.
Finding the smallest element requires scanning all n elements (n − 1 comparisons) and then swapping it into the first position. Finding the next smallest element requires scanning the remaining n − 1 elements (n − 2 comparisons), and so on, for (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 ∈ Θ(n^2) comparisons (an arithmetic progression). Each of these scans requires one swap, for n − 1 swaps in total (the final element is already in place).
For the analysis, the best case is when the input list is already sorted and the worst case is when the input list is sorted in reverse order. Ideally, if the ith smallest element of the list is already at the ith position then there is no need to swap; this results in 0 swaps in the best case. In both cases there are n(n − 1)/2 comparisons.
Table 6.2. Analysis of the selection sort algorithm

Statement                                      Best case    Worst case
For i=1 to n-1 do                              n            n
  For j=i to n do (size of unsorted list)      n(n-1)/2     n(n-1)/2
    Find smallest element and its
    position in the unsorted sub-list          n(n-1)/2     n(n-1)/2
  Swap smallest element with the
  element at ith position in the list          0            n
In the best case the step count is n + n(n-1)/2 + n(n-1)/2 + 0 = n^2, which is O(n^2).
In the worst case it is n + n(n-1)/2 + n(n-1)/2 + n = n^2 + n, which is O(n^2).
It is observed that the algorithm does not take any benefit from the order of elements in the input list; hence in both the best and worst cases the time complexity is O(n^2).
Program:
#define SIZE 10
void SelectionSort(int list[], int n)
{
int small, pos, i,j;
for (i=0; i<n-1; i++)
{
/* find the smallest element in the unsorted list */
small=list[i];
pos=i;
for(j=i+1; j<n; j++)
{
if(list[j] < small)
{
small = list[j];
pos=j;
}
}
/* swap smallest element & element at ith position in the list*/
if (pos != i)
{
list[pos]=list[i];
list[i]=small;
}
}
}
main()
{
    int list[SIZE], n, i;
    printf("Enter number of elements: ");
    scanf("%d", &n);
    for (i=0; i<n; i++)
        scanf("%d", &list[i]);
    SelectionSort(list, n);
    for (i=0; i<n; i++)
        printf("%d\t", list[i]);
}
Selection sort is noted for its simplicity, and it has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited. The selection sort algorithm has time complexity O(n^2) in the best, average, and worst cases, making it inefficient on large lists.
Exercise:
1. Show the passes-wise snap shot in sorting the following lists using selection sort:
i. (3, 5, 1, 2, 8) ii. (8, 5, 3, 2, 1) iii. (1, 3, 5, 2, 8) iv. (1, 2, 3, 5, 8)
2. If all the elements in an input list are equal for example {1,1,1,1,1,1}, What would be the running time
of the selection sort algorithm?
3. A bidirectional variant of selection sort, called cocktail sort, is an algorithm which finds both the
minimum and maximum values in the list in every pass. This reduces the number of scans of the list by a
factor of 2, eliminating some loop overhead but not actually decreasing the number of comparisons or
swaps. Write a program for cocktail sort.
4. Selection sort can be implemented as a stable sort. If, rather than swapping, the minimum element is
inserted into the ith position in sorted sub-list (that is, all intervening items moved down), the algorithm
is stable. However, this modification needs list to be implemented using linked list. Write a program for
this implementation.
5. In the bingo sort variant, elements are ordered by repeatedly looking through the remaining values to
find the greatest value and moving all elements with that value to sorted sub-list. This is an efficient
variant if there are many duplicate values. Indeed, selection sort does one pass through the remaining
elements for each element moved. Bingo sort does one pass for each value (not element): after an initial
pass to find the biggest value, the next passes can move every element with that value to sorted sub-list
while finding the next value in the unsorted sub-list. Write a program for the bingo sort variant of selection sort.
6. Give an example to show that the Selection Sort is not stable
Pass 2:
    (14, 29, 17, 26): compare the bottom pair 26 and 17; 26 is not < 17, so no swap.
    Compare 17 and 29; 17 < 29, so swap them: (14, 17, 29, 26).
At the end of pass 2, the 2nd smallest element of the list (17) has bubbled up to the 2nd position from the top. It is also observed that the input list is divided into two sub-lists: a sorted sub-list and an unsorted sub-list.
Pass-3: (list: 14, 17, 29, 26; sorted sub-list: 14, 17)
Compare 29 and 26: 26 < 29, so swap them; the list becomes 14, 17, 26, 29.
No more processing is required, as the largest element is automatically shifted to the bottom.
At the end of each pass one element is bubbled out and is added to the sorted sub-list. At the end
of sorting, all elements of the input list are part of the sorted sub-list and the unsorted sub-list is
empty.
Stop
In the best case the time complexity is
T(n) = (n² − n)/2 comparisons + 0 swaps = (n² − n)/2 ≤ n², i.e. O(n²)
In the worst case every comparison is followed by a swap, so the time complexity is
T(n) = (n² − n)/2 + (n² − n)/2 = n² − n, which is again O(n²)
Program:
bubbleSort(int list[], int n)
{
int temp,i,j;
for (i=0; i<n-1; i++)
for(j=0; j<n-i-1; j++)
if(list[j] > list[j+1])
{ /* adjacent pair out of order: swap */
temp=list[j];
list[j] = list[j+1];
list[j+1] = temp;
}
}
Bubble sort offers no advantage over selection sort in terms of the number of comparisons. The
time complexity in the best, average, and worst cases for both sorting methods is O(n²). Both
methods have some mechanism to reduce the number of swaps. It is interesting to see that bubble
sort is able to utilize the benefit of the order of elements in the input list or unsorted sub-list.
Pass-1: (input list 14, 17, 26, 29, already sorted)
Compare 14 and 17: 17 is not < 14, so no swap.
Compare 17 and 26: 26 is not < 17, so no swap.
Compare 26 and 29: 29 is not < 26, so no swap.
No swap occurs anywhere in the pass; the sorted sub-list is (14) and the unsorted sub-list is
(17, 26, 29).
Because bubble sort uses comparisons to operate on elements, it is a comparison sort as well as an
exchange sort. When an input list or unsorted sub-list is already sorted, bubble sort will pass through it
once and find that it does not need to swap any elements. It can then terminate the sorting with the
declaration that the input list is completely sorted.
The information about the order of elements in the unsorted sub-list helps bubble sort to reduce the
number of comparisons. The modified program for bubble sort is shown below
Program:
bubbleSort(int list[], int n)
{
int temp,i,j,swapped=1;
for (i=0; i<n-1 && swapped; i++)
{
swapped=0;
for(j=0; j<n-i-1; j++)
if(list[j] > list[j+1])
{
temp=list[j];
list[j] = list[j+1];
list[j+1] = temp;
swapped=1;
}
}
}
In the above program, the variable swapped is used as a Boolean flag. It is set to 0 at the beginning of every
pass. If a swap occurs, then swapped is set to 1. Before commencement of a new pass it is checked whether a
swap occurred in the previous pass: if yes, a new pass is started, else the sorting is terminated. A minimum of
1 pass is ensured by initializing swapped=1.
In the best case, when a sorted list is given to bubble sort, in the 1st pass it will do n-1 comparisons without any
swap and sorting will terminate after the 1st pass. Hence in the best case the time complexity of bubble sort is
O(n). It will also use considerably less time than O(n²) if the elements in the unsorted list are not too far from
their sorted places. Bubble sort is also efficient when one random element needs to be sorted into a sorted list,
provided that the new element is placed at the beginning and not at the end. When placed at the beginning, it
will simply bubble up to its correct place, and the second iteration through the list will generate 0 swaps, ending
the sort. But if the random element is placed at the end, bubble sort loses its efficiency, because each element
greater than it must bubble all the way up. Comparison of bubble sort with selection sort is shown in Table.
Bubble sort is not a very efficient sorting algorithm when n is large. The only significant advantage that bubble
sort has over most other implementations is that the ability to detect that the list is already sorted is efficiently
built into the algorithm.
Exercise:
1. Show the pass-wise snapshots in sorting the following lists using bubble sort:
i. (3, 5, 1, 2, 8) ii. (8, 5, 3, 2, 1) iii. (1, 3, 5, 2, 8) iv. (1, 2, 3, 5, 8)
2. If all the elements in an input list are equal, for example {1,1,1,1,1,1}, what would be the running time
of the bubble sort algorithm?
3. Consider the following list of partially sorted numbers in decreasing order. The double bars represent the
sort marker. How many comparisons and swaps are needed to sort the next numbers using bubble sort?
[4, 8, 9 || 3, 2, 1]
4. Cocktail sort is a bi-directional bubble sort that goes from beginning to end, and then reverses itself,
going end to beginning. Write a program for cocktail sort.
5. Comb sort compares elements separated by large gaps, and can move out-of-place elements across the
list extremely quickly before proceeding to smaller and smaller gaps to smooth out the list. Write a
program for comb sort.
Initially, the sorted sub-list is empty or contains a single element and the unsorted sub-list is the
remaining input list.
Input list before processing (positions 1st, 2nd, 3rd, 4th): 14, 29, 17, 26
Sorted sub-list: (14); unsorted sub-list: (29, 17, 26)
The process of sorting starts with the 2nd element, as the 1st element is by default the (only) element
of the sorted sub-list. The 2nd element (also the 1st element of the unsorted sub-list) is deleted from the
list. If the deleted element is greater than the 1st element, then the 1st element is shifted to the 2nd
position and the deleted element is inserted at the 1st place. Now the 1st and 2nd elements form a
sorted sub-list.
Pass 1:
Delete 29 (the 1st element of the unsorted sub-list): 14, _, 17, 26.
Since 29 > 14, shift 14 to the 2nd place: _, 14, 17, 26.
The beginning of the input list is reached, so insert the deleted value 29 at the 1st place: 29, 14, 17, 26.
Now the sorted sub-list is (29, 14) and the unsorted sub-list is (17, 26).
Now delete the 3rd element (the 1st element of the unsorted sub-list) from the list and compare it with
the 2nd element (the last element of the sorted sub-list). If it is smaller, then insert it back (at the 3rd
position) as the 3rd element of the sorted sub-list. Otherwise shift the 2nd element to the 3rd place and
compare the deleted element with the 1st element: if it is greater, then shift the 1st element to the 2nd
place and insert the deleted element at the 1st place; otherwise (i.e. the deleted element is less than the
1st element) insert the deleted element at the 2nd place. This process places the 3rd element at the
appropriate position in the sorted sub-list. Now the 1st, 2nd and 3rd elements are part of the sorted
sub-list and the unsorted sub-list begins from the 4th element.
Pass 2:
Delete 17 (the 1st element of the unsorted sub-list): 29, 14, _, 26.
Since 17 > 14, shift 14 to the 3rd place: 29, _, 14, 26.
Since 17 < 29, insert the deleted value 17 at the 2nd place: 29, 17, 14, 26.
Now the sorted sub-list is (29, 17, 14) and the unsorted sub-list is (26).
Continue the process of placing elements of the unsorted sub-list (one by one) at the appropriate
place in the sorted sub-list till the unsorted sub-list is empty.
Pass 3:
Delete 26 (the 1st element of the unsorted sub-list): 29, 17, 14, _.
Since 26 > 14, shift 14 to the 4th place: 29, 17, _, 14.
Since 26 > 17, shift 17 to the 3rd place: 29, _, 17, 14.
Since 26 < 29, insert the deleted value 26 at the 2nd place: 29, 26, 17, 14.
The unsorted sub-list is now empty and the whole list is sorted.
In the worst case the total number of comparisons and shifts is proportional to n², hence the time
complexity of insertion sort is O(n²).
Program:
insertionSort(int list[], int n)
{
int dval,i,j;
for (i=1; i<n; i++)
{
dval = list[i]; /* delete 1st element of unsorted sub-list */
for (j=i-1; j>=0; j--)
if (list[j] < dval) /* smaller element: shift it one place down */
list[j+1] = list[j];
else
break;
if(j < 0)
list[0]=dval;
else
list[j+1]=dval;
}
}
In the code, the variable dval stores the value of the 1st element of the unsorted sub-list, which is deleted from
the unsorted sub-list to place it at the appropriate position in the sorted sub-list.
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists. It is
Adaptive (i.e., efficient) for data sets that are already substantially sorted. Comparison of insertion sort with
bubble sort and selection sort is shown in Table.
Table. Comparison of insertion sort with other sorting methods

Parameter                  | Insertion sort     | Selection sort     | Bubble sort
                           | Best     Worst     | Best     Worst     | Best     Worst
No. of comparisons         | O(n)     O(n²)     | O(n²)    O(n²)     | O(n)     O(n²)
No. of swaps and/or shifts | 0        O(n²)     | O(1)     O(n²)     | 0        O(n²)
Memory usage               | O(1)     O(1)      | O(1)     O(1)      | O(1)     O(1)
Recursion                  | No                 | No                 | No
Stability                  | Yes                | No                 | Yes
Unlike selection sort and bubble sort, which rely primarily on comparing and swapping, insertion sort builds
the sorted list by identifying an element that is out of order relative to the elements around it. It removes that
element from the list, shifts elements up one place and then places the removed element at its correct
location. Insertion sort is an in-place sort, as no additional memory is required. It is also a stable sort that
retains the original ordering of elements with equal keys.
Exercise:
1. Show the pass-wise snapshots in sorting the following lists using insertion sort:
i. (3, 5, 1, 2, 8) ii. (8, 5, 3, 2, 1) iii. (1, 3, 5, 2, 8) iv. (1, 2, 3, 5, 8)
2. Consider the following list of partially sorted numbers in increasing order. The double bars represent the
sort marker. How many comparisons and swaps are needed to sort the next number?
[1, 3, 4, 8, 9 || 5, 2]
3. Consider the following list of partially sorted numbers. The double bars represent the sort marker. How
many comparisons and swaps are needed to sort the next number?
[1, 3, 4, 5, 8, 9 || 2]
4. If all the elements in an input list are equal, for example {1, 1, 1, 1, 1, 1}, what would be the running
time of the insertion sort algorithm?
5. What are the correct intermediate steps of the data set (15, 20, 10, 18) when it is being sorted with the
Insertion sort?
A. 15,20,10,18 -- 10,15,20,18 -- 10,15,18,20 -- 10,15,18,20
B. 15,18,10,20 -- 10,18,15,20 -- 10,15,18,20 -- 10,15,18,20
C. 15,10,20,18 -- 15,10,18,20 -- 10,15,18,20
D. 10, 20, 15, 18 -- 10,15,20,18 -- 10,15,18,20
6. Explain why the Insertion Sort is stable.
7. Write a program to remove all duplicates from a list using the logic: sort the list and then, for each
element, look at its next neighbor to decide whether it is present more than once. Is this a faster
algorithm?
By Dr. M. M. Raghuwanshi, YCCE,Nagpur
Notes On Algorithms and Data Structures
8. Why does insertion sort perform significantly better than selection sort if an array is already sorted?
9. Generate a random list of integers. Show how this list is sorted by the following algorithms:
• bubble sort
• selection sort
• insertion sort
10. Consider the following list of integers: [1,2,3,4,5,6,7,8,9,10]. Show how this list is sorted by the
following algorithms:
• bubble sort
• selection sort
• insertion sort
11. Consider the following list of integers: [10,9,8,7,6,5,4,3,2,1]. Show how this list is sorted by the
following algorithms:
• bubble sort
• selection sort
• insertion sort
12.
7. Searching
A search algorithm is an algorithm that finds an element with specified properties among a collection of
elements. The fundamental problem in searching is to retrieve the element of the collection associated with a
given search value so that the information in the element is made available for processing. Search algorithms
work on collections whose elements may or may not be arranged in a particular order (i.e. the structure of the
search space is known or unknown). Searching in a sorted or unsorted collection is a common task. For example
• A dictionary is a sorted list of word definitions. Given a word, one can find its definition.
• A telephone book is a sorted list of people's names, addresses, and telephone numbers. Knowing
someone's name allows one to quickly find their telephone number and address.
Search algorithms are divided, based on the use of information about the search space, into
• uninformed – not using information about the search space, and
• informed – applying knowledge about the structure of the search space to reduce the searching time.
The linear search begins from the 1st element in a list and checks whether that element matches the value to be
searched. If a match is found then the search terminates successfully; otherwise the next element in the list is
taken. If none of the elements in the list matches the value and the end of the list is encountered, then the
search terminates unsuccessfully.
• The worst case is when the value is not in the list or is the last element in the list; this needs n comparisons,
hence the time complexity is O(n).
Program:
#define SIZE 10
linSearch( int list[], int n, int value)
{
int i;
for (i=0; i<n; i++)
if(list[i] == value)
return(i);
return(-1);
}
main()
{
int list[SIZE]= {23, 60, 5, 45, 70}, n=5, value, i;
value = 45; /* value to be searched */
i = linSearch(list, n, value);
if(i >= 0)
printf("value found at index %d\n", i);
else
printf("value not found\n");
}
The linSearch returns the index i if the value matches the ith element in the list; otherwise it returns -1 to
give a "not found" indication. A run of linSearch to check whether the value 45 is in the list is shown in Table.
The simplicity of the linear search makes it a very effective algorithm if just a few elements are to be searched.
It is less trouble than more complex methods that require preparation, such as sorting the list to be searched or
building more complex data structures, especially when entries may be subject to frequent revision.
There are various heuristic techniques which can be used to speed up linear search algorithms. For example, the
most frequently accessed elements can be placed towards the beginning of the list, or if information about the
relative frequency of access of an element is not available, this optimal arrangement can be approximated by
moving an element to the beginning of the list each time it is accessed. However none of these techniques will
improve the efficiency of the search procedure beyond O(n).
One of the advantages of linear search on an unordered list is that it works efficiently on lists that change
size while an application is running. The average performance of linear search can be improved by using it
on an ordered list. In the case of no matching element, the search can terminate at the first element which is
greater (or lesser) than the unmatched target element, rather than examining the entire list.
Exercise:
1. Show the snapshots of linear search for the following values to be searched in the list (2, 13, 75, 7, 28,
3, 14). Also comment on the number of comparisons made.
i. 2
ii. 14
iii. 7
iv. 15
2. Show the snapshots of linear search for the following values to be searched in the list (2, 3, 7, 13, 14,
28, 75). Also comment on the number of comparisons made.
i. 2
ii. 75
iii. 13
iv. 15
3. One of the heuristic techniques to speed up the linear search algorithm is to use a self-organizing
algorithm that places or moves more frequently accessed elements towards the front of a list. If
information about the relative frequency of access of elements is not available, then this optimal
arrangement can be approximated by bubbling a found element toward the front of the list. An element
can be moved up in the list just by swapping it with the preceding element. These swaps will slowly
bubble the most frequently accessed elements towards the front of the list, so future searches for those
elements will execute more quickly. (Write a program to find the best way of selecting the p most
popular persons in an unordered database containing entries for n different persons all over the world.)
4. Write a program to perform efficient linear search on sorted list of elements.
Example: Let (23, 36, 45, 51, 55, 57, 61, 70, 82) be a linear ordered list of elements sorted in ascending order,
and let the value to be searched be 65.
Pass-1: Mid = (1+9)/2 = 5, List[5] = 55 < 65, so search value 65 in the right sub-list
Pass-2: Mid = (6+9)/2 = 7, List[7] = 61 < 65, so search value 65 in the right sub-list
It is observed that in the 1st pass the entire sorted list is available for search. In subsequent passes the search
list is reduced to half of its size in the previous pass. The position of the middle element is decided using the
beginning (lower bound) and end (upper bound) of the search list.
• When the value is not in the sorted list, the algorithm must continue iterating until the search list is empty.
This needs at most (lg n + 1) iterations and lg n comparisons.
• The worst case is when the element is not present or is present at either end of the sorted list. For the worst
case the time complexity of binary search is O(lg n). Compared to linear search, whose worst-case time
complexity is O(n), binary search is substantially faster as n grows large.
• The best case for binary search is when the value to be searched is the middle element of the sorted list. It
needs only 1 comparison, hence the time complexity in the best case is O(1).
Program:
binSearch(int list[], int n, int value)
{
int low=0, upper=n-1, mid;
while (low <= upper)
{
mid = (low + upper)/2;
if(list[mid] == value)
return(mid); /* match found at middle position */
else if(list[mid] < value)
low = mid + 1; /* continue in right sub-list */
else
upper = mid - 1; /* continue in left sub-list */
}
return(-1); /* search list empty: not found */
}
Here low & upper are lower bound & upper bound for a search list. In each iteration either of these bounds is
redefined. The binSearch returns value of mid if element is in the list otherwise it returns -1 to give "not found"
indication.
The most straightforward realization that supports divide and conquer strategy is a recursive implementation of
binary search. It recursively searches value in the divided search sub-list defined after each comparison.
The algorithm for binary search is recursive, hence its recurrence relation is
T(n) = T(n/2) + O(1) for n ≥ 1, with T(0) = 0
If list contains 1 or more elements (n >=1) then calculate mid and check for match. If match not found then
proceed search either in left or right sub-list. This recurrence relation can be solved by using Substitution
method, Recursion-tree method, Iteration method or Masters Method. Recursion goes up to lg n times. Hence
recursive tree has depth lg n. In each recursive call calculation of mid and matching element at mid with value
take O(1) time.
T(n) = Σ (i = 0 to lg n) O(1) = O(1) + O(1) + ... + O(1) (lg n times) = O(1) · lg n
Hence T(n) = Θ(lg n).
• The worst case is when the element is not present or is present at either end of the sorted list. Since there are
lg n recursive calls, the time complexity of binary search is O(lg n).
• The best case for recursive binary search is when the value to be searched is the middle element of the sorted
list. In the 1st recursive call, after 1 comparison, the recursion ends with success. Hence the time complexity in
the best case is O(1).
The bounds low & upper are set in each recursive call. The RbinSearch returns the value of mid if the element
is in the list; otherwise it returns -1 to give a "not found" indication. In some languages tail recursion is not
eliminated, and the recursive version then requires more stack space.
Exercise:
1. Show the snapshots of binary search for the following values to be searched in the list (2, 3, 7, 13, 14,
28, 75). Also compare the number of comparisons made against the informed linear search.
i. 2
ii. 75
iii. 14
iv. 15
v. 90
2. Write a program to search value in a list using binary search where elements in the list are arranged in
descending order.
3. The recursive binary search algorithm is used to search value in the sorted list [2, 3, 7, 8, 11, 12, 14, 25,
37, 44]. Which option shows the correct sequence of comparisons used to find the value 8?
a) 11, 7, 8
b) 11, 3, 7, 8
c) 12, 7, 8
d) 11, 2, 7, 8
4. Programming implementations using fixed-width integers with modular arithmetic need to account for
the possibility of overflow. One frequently-used technique for this is to compute mid, so that two smaller
numbers are ultimately added: mid = low + ((high - low) / 2). Write a program to implement it.
5. The elements of the list are not necessarily all unique. If one searches for a value that occurs multiple
times in the list, the index returned will be that of the first-encountered equal element, and this will not
necessarily be the first, last, or middle element of the run of equal-value elements but will depend
on the positions of the values. To find all equal elements, an upward and a downward linear search can
be carried out from the initial result, stopping each search when the element is no longer equal. Write a
program to implement it.
6. Write a program to search a value in a list using binary search where the elements in the list are
negative values arranged in descending order (e.g. −1, −2, −4, −8, −12, −14, −16).