Data Structure Notes
Unit I
Unit II
Stack: Definition - Representation - Operations - Applications. Queues: Definition - Representation - Various queue structures - Applications. Trees: Definition - Representation - Operations - Types.
Unit III
Unit IV
Text books:
Classic Data Structures by D. Samanta
Fundamentals of Computer Algorithms by Ellis Horowitz, Sartaj Sahni, and Sanguthevar Rajasekaran
Data:
Data means a value or a set of values. Following are some examples of data:
(i) 34
(ii) 12/01/1965
(iii)ISBN 81-203-0000-0
(iv) Pascal
Values are represented in different ways. Each value, or a collection of such values, is termed data.
Entity:
A data entity is one of the components defined in a logical data model. A logical data model is a representation of all of an organization's data, organized in terms of data management technology.
A relationship captures how two or more entities are related to one another. Relationships
can be thought of as verbs, linking two or more nouns.
To help make this clearer, here is a specification for an Abstract Data Type
called STACK:
Make empty stack(): manufactures an empty stack.
Is empty stack(s): s is a stack. Returns TRUE if and only if s is empty.
Push(x, s): x is an integer, s is a stack. Returns a non-empty stack which can be used with top and pop. Is empty stack(push(x, s)) = FALSE.
Top(s): s is a non-empty stack; returns an integer. top(push(x, s)) = x.
Pop(s): s is a non-empty stack; returns a stack. pop(push(x, s)) = s.
ADT circle
Data
r >= 0 (radius);
Operations
Constructor:
Initial values: radius of the circle;
Process: assign the initial value of r;
Area:
Input: none;
1. Domain (D): this is the range of values that the data may have. This domain is also termed the data object.
2. Function (F): this is the set of operations which may legally be applied to elements of the data object. This implies that for a data structure we must specify the set of operations.
3. Axioms (A): this is the set of rules with which the different operations belonging to the function can actually be implemented.
In a linear structure all the elements form a sequence or maintain a linear ordering, but in a non-linear data structure there is no such sequence and the elements are distributed.
The heap sort algorithm works with a heap tree and a result list. They can be stored
efficiently in one array.
Heap tree and result list are stored together in one array. The array has as many elements
as there are keys to sort. No additional storage space for pointers or intermediate results is
needed.
Providing Input Data
A program that wants to sort data with the heap sort algorithm provides it as an array.
The array appears to the algorithm as a heap tree containing all keys. In the notation of
figure 1 the green heap tree occupies the whole array. That is why the heap tree need not be built; it exists instantly, complete and completely filled.
The function heap sort performs the complete algorithm. Within the algorithm the
function move max moves keys from the heap tree to the result list. This removes one
key from the heap tree which makes room in the array at the border between heap tree
and result list. The result list grows using this free node and stores the new element there.
After the algorithm run the result list occupies the complete array. There is no need to
copy this list, because the array can be used directly by the program.
For a node stored at index k in the array, its left child is stored at index 2*k and its right
child at index 2*k+1 (figure 2). It follows that the parent node is at index (k div 2).
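As an illustration of how these index relations drive the algorithm, here is a small C sketch of heap sort that keeps the keys in a[1..n] so that the 2*k, 2*k+1 and k/2 relations above hold directly (the function names are made up for the sketch):

#include <stdio.h>

/* restore the heap property for the subtree rooted at index k, heap size n */
static void siftDown(int a[], int k, int n) {
    while (2 * k <= n) {                       /* while node k has a left child */
        int child = 2 * k;                     /* left child                     */
        if (child + 1 <= n && a[child + 1] > a[child])
            child++;                           /* pick the larger child          */
        if (a[k] >= a[child])
            break;                             /* heap property already holds    */
        int tmp = a[k]; a[k] = a[child]; a[child] = tmp;
        k = child;
    }
}

static void heapSort(int a[], int n) {
    for (int k = n / 2; k >= 1; k--)           /* build the heap in place        */
        siftDown(a, k, n);
    for (int last = n; last > 1; last--) {     /* move the maximum to the border */
        int tmp = a[1]; a[1] = a[last]; a[last] = tmp;
        siftDown(a, 1, last - 1);              /* heap shrinks, result list grows */
    }
}

int main(void) {
    int a[] = {0, 9, 4, 6, 2, 7};              /* a[0] unused, keys in a[1..5]   */
    heapSort(a, 5);
    for (int i = 1; i <= 5; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}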
Arrays
Definition
i) An array is simply a number of memory locations, each of which can store an item of
data of the same data type and which are all referenced through the same variable
name. Ivor Horton.
ii) Array may be defined abstractly as finite order set of homogeneous elements. So we
can say that there are finite numbers of elements in an array and all the elements are of
same data type. Also array elements are ordered i.e. we can access a specific array
element by an index.
int Age[10];
Here array_type declares the base type of the array, which is the type of each element. In our example the array_type is int and its name is Age. The size of the array is defined by array_size, i.e. 10. We can access array elements by index, and the first item in the array is at index 0. The index of the first element is called the lower bound and is always 0; the index of the highest element is called the upper bound.
In the C programming language the upper and lower bounds cannot be changed during the execution of the program, so the array length can be set only when the program is written.
Age[0]  Age[1]  Age[2]  Age[3]  Age[4]  Age[5]  Age[6]  Age[7]  Age[8]  Age[9]
  30      32      54      32      26      29      23      43      34      5
Note: One good practice is to declare array length as a constant identifier. This will
minimize the required work to change the array size during program development.
#define NUM_EMPLOYEE 10
int Age[NUM_EMPLOYEE];
Initialization of array is very simple in c programming. There are two ways you can
initialise arrays.
Look at the following C code which demonstrates the declaration and initialisation of an
array.
An array can also be initialized in a way that the array size is omitted; in such a case the compiler automatically allocates memory to the array.
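The code referred to above is not reproduced in these notes; a small C sketch demonstrating both initialisation styles (the values are illustrative only):

#include <stdio.h>

int main(void) {
    /* size given explicitly; any unlisted elements would be set to 0 */
    int age[10] = {30, 32, 54, 32, 26, 29, 23, 43, 34, 5};

    /* size omitted: the compiler counts the initialisers (here 5) */
    int marks[] = {67, 82, 59, 91, 74};

    printf("first age = %d, last mark = %d\n", age[0], marks[4]);
    return 0;
}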
The array which is used to represent and store data in a linear form is called as 'single or
one dimensional array.'
Syntax:
<data-type> <array_nm> [subscript];
Example:
int a[3];
In the above example, a is an array of type integer which has a storage size of 3 elements. The total size would be 3 * 2 = 6 bytes (assuming 2-byte integers).
Memory allocation for an array:
#include <stdio.h>
#include <conio.h>
void main()
{
int a[3], i;
clrscr();
printf("\n\t Enter three numbers : ");
for(i=0; i<3; i++)
{
scanf("%d", &a[i]); // read array
}
printf("\n\n\t Numbers are : ");
for(i=0; i<3; i++)
{
printf("\t %d", a[i]); // print array
}
getch();
}
Output :
Enter three numbers : 9 4 6
Numbers are : 9 4 6_
Operations on array:
Various operations that can be performed on an array are traversing, sorting, searching,
insertion, deletion, merging.
Traversing
This operation is used to visit all the elements in an array. A simplified algorithm is given below:
Description: Here A is a linear array with lower bound LB and upper bound UB. This
algorithm traverses
Array A and applies the operation PROCESS to each element of the array.
1. Repeat for I = LB to UB:
2. Apply PROCESS to A[I]
[End of loop]
3. Exit
Explanation: Here A is a linear array stored in memory with lower bound LB and upper
bound UB.
Note: The operation PROCESS in the traversal algorithm may use certain variables
which must be initialized before PROCESS is applied to any of the elements in the array.
Therefore, the algorithm may need to be preceded by such an initialization step.
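A minimal C rendering of this traversal, where PROCESS is taken to be printing the element (an assumption made only for the sketch):

#include <stdio.h>

#define LB 0          /* lower bound */
#define UB 9          /* upper bound */

int main(void) {
    int A[UB + 1] = {30, 32, 54, 32, 26, 29, 23, 43, 34, 5};

    /* Repeat for I = LB to UB: apply PROCESS (here: print) to A[I] */
    for (int I = LB; I <= UB; I++)
        printf("%d ", A[I]);
    printf("\n");
    return 0;
}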
Sorting:
Theory
Starting near the top of the array in Figure 2-1(a), we extract the 3. Then the above
elements are shifted down until we find the correct place to insert the 3. This process
repeats in Figure 2-1(b) with the next number. Finally, in Figure 2-1(c), we complete the
sort by inserting 2 in the correct place.
Assuming there are n elements in the array, we must index through n - 1 entries. For each
entry, we may need to examine and shift up to n - 1 other entries, resulting in an O(n^2) algorithm. The insertion sort is an in-place sort; that is, we sort the array in place. No
extra memory is required. The insertion sort is also a stable sort. Stable sorts retain the
original ordering of keys when identical keys are present in the input data.
Algorithm:
for (j = 1; j < N; j++) {
    tmp = a[j];
    /* take the j-th element and find a place for it among the first j sorted elements */
    for (i = j; i > 0 && a[i-1] > tmp; i--)
        a[i] = a[i-1];
    a[i] = tmp;
}
Insertion:
Insertion sort is a simple sorting algorithm: a comparison sort in which the sorted array
(or list) is built one entry at a time. It is much less efficient on large lists than more
advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort
provides several advantages:
Simple implementation
Efficient for (quite) small data sets
Adaptive (i.e., efficient) for data sets that are already substantially sorted:
the time complexity is O(n + d), where d is the number of inversions
More efficient in practice than most other simple quadratic (i.e., O(n^2)) algorithms
such as selection sort or bubble sort; the best case (nearly sorted input) is O(n)
Stable; i.e., does not change the relative order of elements with equal keys
In-place; i.e., only requires a constant amount O(1) of additional memory space
Online; i.e., can sort a list as it receives it
insertionSort(array A)
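The body of the insertionSort listing above appears to have been lost in these notes; a minimal C sketch of the same idea (shift larger elements one step right, then drop the saved key into the gap) could be:

#include <stdio.h>

void insertionSort(int a[], int n) {
    for (int j = 1; j < n; j++) {
        int key = a[j];          /* element to place among a[0..j-1] */
        int i = j - 1;
        while (i >= 0 && a[i] > key) {
            a[i + 1] = a[i];     /* shift larger elements right */
            i--;
        }
        a[i + 1] = key;          /* insert the saved key in the gap */
    }
}

int main(void) {
    int a[] = {10, 15, 3, 27, 9, 20, 12};   /* the sample input used later in these notes */
    insertionSort(a, 7);
    for (int i = 0; i < 7; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}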
Arrays are used to implement mathematical vectors and matrices, as well as other
kinds of rectangular tables. Many databases, small and large, consist of (or
include) one-dimensional arrays whose elements are records.
Arrays are used to implement other data structures, such as heaps, hash
tables, deques, queues, stacks, strings, and VLists.
Syntax:
<data-type> <array_nm> [row_subscript][column-subscript];
Example:
int a[3][3];
In the above example, a is an array of type integer with storage for a 3 * 3 matrix. The total size would be 3 * 3 * 2 = 18 bytes (assuming 2-byte integers).
Such an array is also called a 'multidimensional array.'
MEMORY ALLOCATION:
#include <stdio.h>
#include <conio.h>
void main()
{
int a[3][3], i, j;
clrscr();
printf("\n\t Enter matrix of 3*3 : ");
for(i=0; i<3; i++)
{
for(j=0; j<3; j++)
{
scanf("%d",&a[i][j]); //read 3*3 array
}
}
printf("\n\t Matrix is : \n");
for(i=0; i<3; i++)
{
for(j=0; j<3; j++)
{
scanf("\t %d",a[i][j]); //print 3*3 array
}
Output :
Enter matrix of 3*3: 3 4 5 6 7 2 1 2 3
Matrix is :
3 4 5
6 7 2
1 2 3_
Sparse matrices:
When storing and manipulating sparse matrices on a computer, it is beneficial and often
necessary to use specialized algorithms and data structures that take advantage of the
sparse structure of the matrix. Operations using standard dense matrix structures and
algorithms are slow and consume large amounts of memory when applied to large sparse
matrices. Sparse data is by nature easily compressed, and this compression almost always
results in significantly less computer data storage usage. Indeed, some very large sparse
matrices are infeasible to manipulate with the standard dense algorithms.
Substantial memory requirement reductions can be realised by storing only the non-zero
entries. Depending on the number and distribution of the non-zero entries, different data
structures can be used and yield huge savings in memory when compared to a naïve
approach.
Formats can be divided into two groups: (1) those that support efficient modification, and
(2) those that support efficient matrix operations. The efficient modification group
includes DOK, LIL, and COO and is typically used to construct the matrix. Once the
matrix is constructed, it is typically converted to a format, such as CSR or CSC, which is
more efficient for matrix operations.
LIL stores one list per row, where each entry stores a column index and value. Typically,
these entries are kept sorted by column index for faster lookup. This is another format
which is good for incremental matrix construction. See scipy.sparse.lil_matrix.
COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted (by
row index, then column index) to improve random access times. This is the traditional
format for specifying a sparse matrix in Matlab (via the sparse function), except as three
separate arrays instead of a single array of triples. This is another format which is good
for incremental matrix construction. See scipy.sparse.coo_matrix and Matlab sparse.
Yale format
The Yale Sparse Matrix Format stores an initial sparse m×n matrix, M, in row form using
three one-dimensional arrays. Let NNZ denote the number of nonzero entries of M. The
first array is A, which is of length NNZ, and holds all nonzero entries of M in left-to-right
top-to-bottom (row-major) order. The second array is IA, which is of length m + 1 (i.e.,
one entry per row, plus one). IA(i) contains the index in A of the first nonzero element of
row i. Row i of the original matrix extends from A(IA(i)) to A(IA(i+1)-1), i.e. from
the start of one row to the last index before the start of the next. The third array, JA,
contains the column index of each element of A, so it also is of length NNZ.
For example, the 3 x 4 matrix
1 2 0 0
0 3 9 0
0 1 4 0
has NNZ = 6 and is stored as
A  = [1 2 3 9 1 4]
IA = [0 2 4 6]
JA = [0 1 1 2 1 2]
In this case the Yale representation contains 16 entries, compared to only 12 in the
original matrix. The Yale format saves on memory only when NNZ < (m(n − 1) − 1) / 2.
Compressed sparse row (CSR) is effectively identical to the Yale Sparse Matrix Format, except that the column array is normally stored ahead of the row index array. That is, CSR is (val, col_ind, row_ptr),
where val is an array of the (left-to-right, then top-to-bottom) non-zero values of the
matrix; col_ind is the column indices corresponding to the values; and, row_ptr is the
list of value indexes where each row starts. The name is based on the fact that row index
information is compressed relative to the COO format. One typically uses another format
(LIL, DOK, and COO) for construction. This format is efficient for arithmetic operations,
row slicing, and matrix-vector products.
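To make the layout concrete, here is a small C sketch of a matrix-vector product over the CSR triplet (val, col_ind, row_ptr); the array names follow the description above, indices are zero-based, and the sample data is the matrix from the Yale example:

#include <stdio.h>

/* y = M * x for an m-row sparse matrix stored in CSR form */
void csr_matvec(int m, const double val[], const int col_ind[],
                const int row_ptr[], const double x[], double y[]) {
    for (int i = 0; i < m; i++) {
        y[i] = 0.0;
        /* non-zeros of row i live in val[row_ptr[i] .. row_ptr[i+1]-1] */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += val[k] * x[col_ind[k]];
    }
}

int main(void) {
    double val[]    = {1, 2, 3, 9, 1, 4};
    int    col[]    = {0, 1, 1, 2, 1, 2};
    int    rowptr[] = {0, 2, 4, 6};
    double x[] = {1, 1, 1, 1}, y[3];
    csr_matvec(3, val, col, rowptr, x, y);
    printf("%g %g %g\n", y[0], y[1], y[2]);   /* row sums: 3 12 5 */
    return 0;
}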
Compressed sparse column (CSC) is similar to CSR except that values are read first by column, a row index is stored for each value, and column pointers are stored. That is, CSC is (val, row_ind, col_ptr), where val is an array of the (top-to-bottom, then left-to-right) non-zero values of
the matrix; row_ind is the row indices corresponding to the values; and, col_ptr is the
list of val indexes where each column starts. The name is based on the fact that column
index information is compressed relative to the COO format. One typically uses another
format (LIL, DOK, and COO) for construction. This format is efficient for arithmetic
operations, column slicing, and matrix-vector products. See scipy.sparse.csc_matrix.
Example
A bitmap image having only 2 colors, with one of them dominant (say a file that stores a
handwritten signature) can be encoded as a sparse matrix that contains only row and
column numbers for pixels with the non-dominant color.
Band matrix
Matrices with reasonably small upper and lower bandwidth are known as band matrices
and often lend themselves to simpler algorithms than general sparse matrices; or one can
sometimes apply dense matrix algorithms and gain efficiency simply by looping over a
reduced number of indices.
Diagonal matrix
A very efficient structure for an extreme case of band matrices, the diagonal matrix, is to
store just the entries in the main diagonal as a one-dimensional array, so a
diagonal n×n matrix requires only n entries.
"Fill-in" redirects here. For the puzzle, see Fill-In (puzzle).
The fill-in of a matrix are those entries which change from an initial zero to a non-zero
value during the execution of an algorithm. To reduce the memory requirements and the
number of arithmetic operations used during an algorithm it is useful to minimize the fill-
in by switching rows and columns in the matrix. The symbolic Cholesky
decomposition can be used to calculate the worst possible fill-in before doing the
actual Cholesky decomposition.
There are other methods than the Cholesky decomposition in use. Orthogonalization
methods (such as QR factorization) are common, for example, when solving problems by
least squares methods. While the theoretical fill-in is still the same, in practical terms the
You can access a two-dimensional matrix element with two subscripts: the first
representing the row index, and the second representing the column index.
As you add dimensions to an array, you also add subscripts. A four-dimensional array,
for example, has four subscripts. The first two reference a row-column pair; the second
two access the third and fourth dimensions of data.
Most of the operations that you can perform on matrices (i.e., two-dimensional arrays)
can also be done on multidimensional arrays.
To access a single color plane (for example, the red plane) of an RGB image, use
redPlane = RGB(:,:,1);
To access a subimage, use
subimage = RGB(20:40,50:85,:);
The RGB image is a good example of data that needs to be accessed in planes for
operations like display or filtering. In other instances, however, the data itself might be
multidimensional. For example, consider a set of temperature measurements taken at
equally spaced points in a room. Here the location of each value is an integral part of the
data set—the physical placement in three-space of each element is an aspect of the
information. Such data also lends itself to representation as a multidimensional array.
To find the average of all the temperature data, use
mean(mean(mean(TEMP)));
To obtain a vector of the "middle" values (element (2,2)) in the room on each page, use
TEMP(2,2,:)
Pointer arrays:
A pointer array is used to store the addresses of memory variables or of other arrays. The address of a memory variable or array is known as a pointer, and an array containing pointers as its elements is known as a pointer array.
When setting up data structures like lists, queues and trees, it is necessary to have
pointers to help manage how the structure is implemented and controlled. Typical
examples of pointers are start pointers, end pointers, and stack pointers. These pointers
can either be absolute (the actual physical address or a virtual address in virtual memory)
or relative (an offset from an absolute start address ("base") that typically uses fewer bits
than a full address, but will usually require one additional arithmetic operation to
resolve).
A two-byte offset, containing a 16-bit unsigned integer, can be used to provide relative addressing for up to 64 kilobytes of a data structure. This can easily be extended to 128K, 256K or 512K if the address pointed to is forced to be on a half-word, word or double-word boundary (but this requires an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8, before its addition to the base address).
A one byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29') can be
used to point to an alternative integer value (or index) in an array (e.g. X'01'). In this way,
characters can be very efficiently translated from 'raw data' to a usable
sequential index and then to an absolute address without a lookup table.
Fig: Pointer array
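A small illustrative C example of a pointer array (the variable names are invented for the sketch):

#include <stdio.h>

int main(void) {
    int a = 10, b = 20, c = 30;
    int *ptr[3];          /* pointer array: each element stores an address */

    ptr[0] = &a;          /* addresses of ordinary variables */
    ptr[1] = &b;
    ptr[2] = &c;

    for (int i = 0; i < 3; i++)
        printf("ptr[%d] points to value %d\n", i, *ptr[i]);
    return 0;
}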
Linked list
In computer science, a linked list is a data structure that consists of a sequence of data
records such that in each record there is a field that contains a reference (i.e., a link) to
the next record in the sequence.
Algorithm:
record Node {
data; // The data being stored in the node
Node next // A reference to the next node, null for last node
}
record List {
Node firstNode // points to first node of list; null for empty
list
}
Traversal of a singly linked list is simple, beginning at the first node and following each next link until we come to the end:
node := list.firstNode
while node ≠ null
    (do something with node.data)
    node := node.next
The following code inserts a node after an existing node in a singly linked list.
The diagram shows how it works. Inserting a node before an existing one cannot
be done directly; instead, you have to keep track of the previous node and insert a
node after it.
Inserting at the beginning of the list requires a separate function. This requires
updating firstNode.
Similarly, we have functions for removing the node after a given node, and for
removing a node from the beginning of the list. The diagram demonstrates the
former. To find and remove a particular node, one must again keep track of the
previous element.
Notice that removeBeginning() sets list.firstNode to null when removing the last node in the list. Since we cannot iterate backwards, efficient "insertBefore" or "removeBefore" operations are not possible. Appending one linked list to another can be inefficient unless a reference to the tail is kept as part of the List structure, because we must traverse the entire first list in order to find the tail, and then append the second list to it.
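The insertion and removal listings discussed above are not reproduced in these notes; a compact C sketch of a node record with insert-after and remove-after operations (struct and function names are illustrative) could be:

#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;            /* the data being stored in the node         */
    struct Node *next;   /* reference to the next node, NULL for last */
};

/* insert a new node holding value after an existing node */
void insertAfter(struct Node *node, int value) {
    struct Node *newNode = malloc(sizeof *newNode);
    newNode->data = value;
    newNode->next = node->next;
    node->next = newNode;
}

/* remove (and free) the node that follows the given node */
void removeAfter(struct Node *node) {
    struct Node *obsolete = node->next;
    if (obsolete != NULL) {
        node->next = obsolete->next;
        free(obsolete);
    }
}

int main(void) {
    struct Node head = {1, NULL};    /* first node of the list   */
    insertAfter(&head, 2);           /* list is now 1 -> 2       */
    insertAfter(&head, 3);           /* list is now 1 -> 3 -> 2  */
    removeAfter(&head);              /* list is now 1 -> 2       */
    for (struct Node *n = &head; n != NULL; n = n->next)
        printf("%d ", n->data);
    printf("\n");
    return 0;
}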
A linked list whose nodes contain two fields: an integer value and a link to the next node.
Linked lists are among the simplest and most common data structures; they
provide an easy implementation for several important abstract data structures,
including stacks, queues, associative arrays, and symbolic expressions.
The principal benefit of a linked list over a conventional array is that the order of
the linked items may be different from the order that the data items are stored in
memory or on disk. For that reason, linked lists allow insertion and removal of
nodes at any point in the list, with a constant number of operations.
On the other hand, linked lists by themselves do not allow random access to the
data, or any form of efficient indexing. Thus, many basic operations — such as
obtaining the last node of the list, or finding a node that contains a given datum,
or locating the place where a new node should be inserted — may require
scanning most of the list elements.
In the last node of a list, the link field often contains a null reference, a special value that is interpreted by programs as meaning "there is no such node". A less common convention is to make it point to the first node of the list; in that case the list is said to be circular or circularly linked.
function iterate(someNode)
if someNode ≠ null
node := someNode
do
do something with node.value
node := node.next
while node ≠ someNode
Notice that the test "while node ≠ someNode" must be at the end of the loop. If it were replaced by the test "node ≠ someNode" at the beginning of the loop, the procedure would fail whenever the list had only one node.
This function inserts a node "new Node" into a circular linked list after a given
node "node". If "node" is null, it assumes that the list is empty.
Suppose that "L" is a variable pointing to the last node of a circular linked list (or
null if the list is empty). To append "new Node" to the end of the list, one may do
insertAfter(L, newNode)
L := newNode
insertAfter(L, newNode)
if L = null
L := newNode
In computer science, a doubly linked list is a linked data structure that consists of
a set of sequentially linked records called nodes. Each node contains two fields,
called links that are references to the previous and to the next node in the
sequence of nodes. The beginning and ending nodes' previous and next links,
respectively, point to some kind of terminator, typically a sentinel node or null, to
facilitate traversal of the list. If there is only one sentinel node, then the list is
circularly linked via the sentinel node. It can be conceptualized as two singly
linked lists formed from the same data items, but in opposite sequential orders.
record DoublyLinkedNode {
prev // A reference to the previous node
next // A reference to the next node
data // Data or reference to data
}
record DoublyLinkedList {
Node firstNode
// points to first node of list
Node lastNode
// points to last node of list
}
These symmetric functions insert a node either after or before a given node, with the
diagram
demonstrating after:
We also need a function to insert a node at the beginning of a possibly empty list:
Removal of a node is easier than insertion, but requires special handling if the node to be
removed is the firstNode or lastNode:
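The removal listing itself is missing from these notes; a minimal C sketch consistent with the behaviour described in the next paragraph (the struct and function names are illustrative, mirroring the records above, and this is not the original code):

#include <stdlib.h>

struct DNode {
    struct DNode *prev;   /* reference to the previous node */
    struct DNode *next;   /* reference to the next node     */
    int data;
};

struct DList {
    struct DNode *firstNode;   /* points to first node of list */
    struct DNode *lastNode;    /* points to last node of list  */
};

/* unlink node from list, fixing firstNode/lastNode when needed */
void removeNode(struct DList *list, struct DNode *node) {
    if (node->prev == NULL)
        list->firstNode = node->next;   /* node was the first node */
    else
        node->prev->next = node->next;

    if (node->next == NULL)
        list->lastNode = node->prev;    /* node was the last node  */
    else
        node->next->prev = node->prev;

    free(node);
}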
One subtle consequence of the above procedure is that deleting the last node of a
list sets both firstNode and lastNode to null, and so it handles removing the last
node from a one-element list correctly. Notice that we also don't need separate
"removeBefore" or "removeAfter" methods, because in a doubly linked list we
can just use "remove(node.prev)" or "remove(node.next)" where these are valid.
A doubly-linked list whose nodes contain three fields: an integer value, the link forward to the next node, and the link backward to the previous node. The technique known as XOR-linking allows a doubly-linked list to be implemented using a single link field in each node. However, this technique requires the ability to do bit operations on addresses.
[Fig: a node N stored at memory address 3500, referenced by a pointer PTR]
Assuming that someNode is some node in a non-empty circular doubly linked list, the following code traverses that list starting with someNode (any node will do):
Forwards
node := someNode
do
do something with node.value
node := node.next
while node ≠ someNode
Backwards
node := someNode
do
do something with node.value
node := node.prev
while node ≠ someNode
Notice the postponing of the test to the end of the loop. This is important for the case
where the list contains only the single node someNode.
Inserting a node
This simple function inserts a node into a doubly linked circularly linked list after a given
element:
function insertAfter(Node node, Node newNode)
newNode.next := node.next
newNode.prev := node
node.next.prev := newNode
node.next := newNode
To do an "insertBefore", we can simply "insertAfter(node.prev, newNode)".
Inserting an element in a possibly empty list requires a special function:
function insertEnd(List list, Node node)
if list.lastNode == null
node.prev := node
node.next := node
else
insertAfter(list.lastNode, node)
list.lastNode := node
To insert at the beginning we simply "insertAfter(list.lastNode, node)". Finally,
removing a node must deal with the case where the list empties:
Memory representation:
In computer science, a stack is a last in, first out (LIFO) abstract data type and data
structure. A stack can have any abstract data type as an element, but is characterized by
only three fundamental operations: push, pop and stack top.
The push operation adds a new item to the top of the stack, or initializes the stack if it is empty. If the stack is full and has no more space to accept the given item, the stack is in an overflow state (meaning the stack is overloaded and there is no more room for a new item).
The pop operation removes an item from the top of the stack. A pop either reveals previously concealed items or results in an empty stack; if the stack is already empty it goes into an underflow state (meaning there are no items present in the stack to be removed).
The stack top operation returns the data at the topmost position to the user without deleting it; the same underflow state can also occur in the stack top operation if the stack is empty.
A stack is a restricted data structure, because only a small number of operations are
performed on it. The nature of the pop and push operations also mean that stack elements
have a natural order. Elements are removed from the stack in the reverse order to the
order of their addition: therefore, the lower elements are those that have been on the stack
the longest.
Stack representation
Array implementation.
Representing stacks with arrays is a natural idea. The first problem that you might
encounter is implementing the constructor ArrayStackOfStrings(). An instance
variable a[] with an array of strings to hold the stack items is clearly needed, but how big
should it be? For the moment, we will finesse this problem by having the client provide
an argument for the constructor that gives the maximum stack size. We keep the items
in reverse order of their arrival. This policy allows us to add and remove items at the end
without moving any of the other items in the stack.
We could hardly hope for a simpler implementation of ArrayStackOfStrings.java: all of the methods are one-liners! The instance variables are an array a[] that holds the items in the stack and an integer N that counts the number of items in the stack. To remove an item, we decrement N and then return a[N]; to insert a new item, we set a[N] equal to the new item and then increment N. These operations preserve the following properties: the items in the array are in their insertion order; the stack is empty when the value of N is 0; and the top of the stack (if it is nonempty) is at a[N-1].
Linked lists.
For classes such as stacks that implement collections of objects, an important objective is
to ensure that the amount of space used is always proportional to the number of items in
the collection. Now we consider the use of a fundamental data structure known as
a linked list that can provide implementations of collections (and, in particular, stacks)
that achieves this important objective.
A linked list is a recursive data structure defined as follows: a linked list is either empty
(null) or a reference to a node having a reference to a linked list. The node in this
definition is an abstract entity that might hold any kind of data in addition to the node
reference that characterizes its role in building linked lists. With object-oriented
programming, implementing linked lists is not difficult. We start with a simple example
of a class for the node abstraction:
class Node {
    String item;
    Node next;
}
A Node has two instance variables: a String and a Node. The String is a placeholder in
this example for any data that we might want to structure with a linked list (we can use
any set of instance variables); the instance variable of type Node characterizes the linked
nature of the data structure. Now, from the recursive definition, we can represent a linked
list by a variable of type Node just by ensuring that its value is either null or a reference
to a Node whose next field is a reference to a linked list.
We create an object of type Node by invoking its (no-argument) constructor. This creates a reference to a Node object whose instance variables are both initialized to the value null. For example, to build a linked list that contains the items "to", "be", and "or", we create a Node for each item:
Node first = new Node();
Node second = new Node();
Node third = new Node();
and set the item field in each of the nodes to the desired item value:
first.item = "to";
second.item = "be";
third.item = "or";
Insert. Suppose that you want to insert a new node into a linked list. The easiest
place to do so is at the beginning of the list. For example, to insert the
string "not" at the beginning of a given linked list whose first node is first, we
save first in oldfirst, assign to first a new Node, assign its item field to "not" and its next field to oldfirst.
These two operations take constant time (independent of the length of the list).
There are two operations which we use in a stack to add or delete an element. The operations are listed below.
1. PUSH ():- This function is used to add an element into the Stack.
2. POP ():- This function is used to delete an element from the Stack.
Push (6)
In the stack:

Array position    Element    Pointer
      2              6         Top
      1              5

Now the pointer TOP gets incremented (TOP + 1) and moves to the 2nd position of the array, which holds the element 6 (the newly added element).
Example:
Assume the stack is:

Array position    Element    Pointer
      2              6         Top
      1              5

Now the pointer points to the element 6, that is, it shows the 2nd position of the array where the element is available.
POP (6)

Array position    Element    Pointer
      2
      1              5         Top

Now the pointer TOP gets decremented (TOP-1, or TOP--) and moves to position 1.
Example
(Starting from a stack that contains A at the bottom, then B, then C on top:)
s.pop();                returns C
s.push('F'); s.pop();   returns F
s.pop();                returns B
s.pop();                returns A
Example program in c
#include<stdio.h>
int main()
{
int a[100], i;
printf("To pop enter -1\n");
for(i = 0;;)
{
printf("Push ");
scanf("%d", &a[i]);
if(a[i] == -1)
{
if(i == 0)
{
printf("Underflow\n");
}
else
{
printf("pop = %d\n", a[--i]);
}
}
else
{
i++;
}
}
}
Applications of stack
Reversing data with a stack:
1) read (data)
2) push (data)
3) read (data), repeating until the input is exhausted; then
1) pop (data)
2) print (data), repeating until the stack is empty
Converting a decimal number to binary with a stack (see the C sketch below):
1) push (number mod 2)
2) print (digit) when popping, once all digits are pushed
3) number = number / 2
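As a concrete illustration of the second outline, a small C program that uses an array stack of digits to print a number in binary, most significant bit first (the value 26 is just a sample):

#include <stdio.h>

int main(void) {
    int stack[32], top = -1;        /* simple array stack of binary digits */
    int number = 26;

    /* push the remainders; they are generated least significant digit first */
    while (number > 0) {
        stack[++top] = number % 2;  /* push (number mod 2) */
        number = number / 2;
    }
    /* popping reverses the order, giving the most significant bit first */
    while (top >= 0)
        printf("%d", stack[top--]); /* pop (digit); print (digit) */
    printf("\n");                   /* prints 11010 for 26 */
    return 0;
}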
Rules:
• Operands immediately go directly to the output. Operators are pushed onto the stack (including parentheses). Check whether the operator on top of the stack has lower priority than the current operator.
• If the top operator has lower priority, push the current operator onto the stack. If the top operator has higher priority than the current one, pop the top operator to the output and then push the current operator onto the stack.
• Priority 2: * /
• Priority 1: + -
• Priority 0: (
• If we encounter a right parenthesis, pop from the stack until we get the matching left parenthesis. Do not output the parentheses.
Example of infix to postfix conversion:
• A+B*C-D/E
b) remaining: +B*C-D/E   stack: (empty)   output: A
c) remaining: B*C-D/E    stack: +         output: A
d) remaining: *C-D/E     stack: +         output: AB
e) remaining: C-D/E      stack: + *       output: AB
f) remaining: -D/E       stack: + *       output: ABC
g) remaining: D/E        stack: + -       output: ABC*
h) remaining: /E         stack: + -       output: ABC*D
i) remaining: E          stack: + - /     output: ABC*D
j) remaining: (empty)    stack: + - /     output: ABC*DE
k) pop the remaining operators:           output: ABC*DE/-+
Evaluating a postfix expression:
Operand: push
Operator: pop 2 operands, do the math, push the result back onto the stack
Example: evaluating 123+*
Remaining input    Stack
123+*
23+*               1
3+*                1 2
+*                 1 2 3
*                  1 5      // 5 from 2 + 3
(empty)            5        // 5 from 1 * 5
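A minimal C sketch of this evaluation loop for single-digit operands and the four operators above (a simplification for illustration, not a full expression parser):

#include <stdio.h>
#include <ctype.h>

int main(void) {
    const char *postfix = "123+*";      /* the example traced above */
    double stack[64];
    int top = -1;

    for (const char *p = postfix; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';    /* operand: push */
        } else {
            double b = stack[top--];    /* operator: pop two operands */
            double a = stack[top--];
            switch (*p) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    printf("%g\n", stack[top]);         /* prints 5 for 123+* */
    return 0;
}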
Back tracking
Queues
Definition
A queue supports the insert and remove operations using a FIFO discipline. By convention, we name the queue insert operation enqueue and the remove operation dequeue.
Representation of queue
Iteration.
Sometimes the client needs to access all of the items of a collection, one at a time,
without deleting them. To maintain encapsulation, we do not want to reveal the internal
representation of the queue (array or linked list) to the client. "Decouple the thing that
needs to traverse the list from the details of getting each element from it." We solve this
design challenge by using Java's java.util.Iterator interface:
public interface Iterator<Item> {
boolean hasNext();
Item next();
void remove(); // optional
}
Enhanced for loop. Iteration is such a useful abstraction that Java provides
compact syntax (known as the enhanced for loop) to iterate over the elements of a
collection (or array).
Iterator<String> i = queue.iterator();
while (i.hasNext()) {
String s = i.next();
StdOut.println(s);
}
To take advantage of Java's enhanced foreach syntax, the data type must implement Java's Iterable interface.
public interface Iterable<Item> {
Iterator<Item> iterator();
}
That is, the data type must implement a method named iterator() that returns
an Iterator to the underlying collection. Since our Queue ADT now includes
such a method, we simply need to declare it as implementing
the Iterable interface and we are ready to use the foreach notation.
This implementation stores the queue in an array. The array indices at which the head and tail of the queue are currently stored must be maintained. The head of the queue is not necessarily at index 0. The array can be a "circular array": the queue "wraps round" the end of the array.
Continue the above example to show the state of the queue after the following
operations:
Add(E,Q)
Remove(Q)
Add(J,Q)
Add(K,Q)
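A C sketch of such a circular ("wrap round") array queue, sized and named for illustration only:

#include <stdio.h>

#define SIZE 5

char queue[SIZE];
int head = 0;      /* index of the front element             */
int count = 0;     /* number of items currently in the queue */

void add(char x) {                      /* enqueue at the tail */
    if (count == SIZE) { printf("Queue full\n"); return; }
    queue[(head + count) % SIZE] = x;   /* tail index wraps round */
    count++;
}

char removeItem(void) {                 /* dequeue from the head */
    char x;
    if (count == 0) { printf("Queue empty\n"); return '\0'; }
    x = queue[head];
    head = (head + 1) % SIZE;           /* head index wraps round */
    count--;
    return x;
}

int main(void) {
    add('E');
    printf("%c\n", removeItem());                     /* E   */
    add('J');
    add('K');
    printf("%c %c\n", removeItem(), removeItem());    /* J K */
    return 0;
}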
As in the case of the stack, each node in a dynamic data structure contains data AND a reference to the next node.
A queue also needs a reference to the head node AND a reference to the tail node.
The following diagram describes the storage of a queue called Queue. Each node contains a data item (DataItem) and a reference (NextNode) to the next node.
The new node is to be added at the tail of the queue. The reference Queue.Tail should point to the new node, and the NextNode reference of the node previously at the tail of the queue should also point to the new node.
[Fig: queue nodes with DataItem and NextNode fields; Queue.Head and Queue.Tail references; NewNode appended at the tail]
To remove a node from the head of the queue, a temporary reference Temp is created and set to point to the head node in the queue (Temp = Queue.Head). Queue.Head is then set to point to the second node instead of the head node. The only reference to the original head node is now Temp, and the memory used by this node can then be freed.
[Fig: removing the head node; Temp holds the old head while Queue.Head moves on to the second node]
234+56**+
Postscript and FORTH programming languages are stack based. Java bytecode is interpreted on a (virtual) stack-based processor, as is the Microsoft Intermediate Language (MSIL) that .NET applications are compiled to.
Little's law asserts that the average number of customers in a (stable) queueing
system equals the average arrival rate times their average time in the system. But
the variance of customer waiting times satisfies: Var(FIFO) < Var(SIRO) <
Var(LIFO).
The distribution of the number of customers in the system does not depend on the
queueing discipline (so long as it is independent of their service times). Same for
expected waiting time.
M/D/1 queue. Program MD1Queue.java is similar but the service occurs at a fixed
rate (rather than random).
Quicksort is a fast sorting algorithm, which is used not only for educational purposes,
but widely applied in practice. On the average, it has O(n log n) complexity, making
quicksort suitable for sorting big data volumes. The idea of the algorithm is quite
simple and once you realize it, you can write quicksort as fast as bubble sort.
Algorithm
1. Choose a pivot value. We take the value of the middle element as the pivot value, but it can be any value which is in the range of the sorted values, even if it is not present in the array.
2. Partition. Rearrange elements in such a way, that all elements which are lesser
than the pivot go to the left part of the array and all elements greater than the
pivot, go to the right part of the array. Values equal to the pivot can stay in any
part of the array. Notice, that array may be divided in non-equal parts.
3. Sort both parts. Apply quicksort algorithm recursively to the left and the right
parts.
Notice that we show here only the first recursion step, in order not to make the example too long. In fact, {1, 2, 5, 7, 3} and {14, 7, 26, 12} are then sorted recursively.
On the partition step the algorithm divides the array into two parts, and every element a from the left part is less than or equal to every element b from the right part.
Also a and b satisfy the inequality a ≤ pivot ≤ b. After completion of the recursive calls both parts become sorted and, taking into account the arguments stated above, the whole array is sorted.
Complexity analysis
On the average quicksort has O(n log n) complexity, but a rigorous proof of this fact is not trivial and is not presented here. Still, you can find the proof in [1]. In the worst case, quicksort runs in O(n^2) time, but on most "practical" data it works just fine and outperforms other O(n log n) sorting algorithms.
Code snippets
The partition algorithm is important per se, so it may be carried out as a separate function. The C++ code below contains a single function for quicksort, while the Java code contains two separate functions for partition and sort.
Java
while (i <= j) {
while (arr[i] < pivot)
i++;
while (arr[j] > pivot)
j--;
if (i <= j) {
tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
i++;
j--;
}
};
return i;
}
C++
void quickSort(int arr[], int left, int right) {
    int i = left, j = right;
    int tmp;
    int pivot = arr[(left + right) / 2];

    /* partition */
    while (i <= j) {
        while (arr[i] < pivot)
            i++;
        while (arr[j] > pivot)
            j--;
        if (i <= j) {
            tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
            i++;
            j--;
        }
    }

    /* recursion */
    if (left < j)
        quickSort(arr, left, j);
    if (i < right)
        quickSort(arr, i, right);
}
Trees:
Definition
• Nodes that do not have any children are called leaf nodes. They are also referred to as terminal nodes. A free tree is a tree that is not rooted. The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its root path). This is commonly needed in the manipulation of the various self-balancing trees, AVL trees in particular. Conventionally, the value -1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node.
• The topmost node in a tree is called the root node. Being the topmost node, the
root node will not have parents. It is the node at which operations on the tree
commonly begin (although some algorithms begin with the leaf nodes and work
up ending at the root). All other nodes can be reached from it by
following edges or links. (In the formal definition, each such path is also unique).
In diagrams, it is typically drawn at the top. In some trees, such as heaps, the root
node has special properties. Every node in a tree can be seen as the root node of
the subtree rooted at that node.
Representation of trees
A node is a structure which may contain a value, a condition, or represent a separate data
structure (which could be a tree of its own). Each node in a tree has zero or more child
nodes, which are below it in the tree (by convention, trees are drawn growing
downwards). A node that has a child is called the child's parent node (or ancestor node,
or superior). A node has at most one parent.
A sub tree of a tree T is a tree consisting of a node in T and all of its descendants in T.
(This is different from the formal definition of sub tree used in graph theory. The sub tree
corresponding to the root node is the entire tree; the sub tree corresponding to any other
node is called a proper sub tree (in analogy to the term proper subset).
A binary tree is a tree data structure in which each node has at most two child nodes,
usually distinguished as "left" and "right". Nodes with children are parent nodes, and
child nodes may contain references to their parents. Outside the tree, there is often a
reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data
structure can be reached by starting at root node and repeatedly following references to
either the left or right child. Binary trees are used to implement binary search
trees and binary heaps. Directed edge refers to the link from the parent to the child (the
arrows in the picture of the tree).
There are many different ways to represent trees; common representations represent the
nodes as records allocated on the heap (not to be confused with the heap data structure)
with pointers to their children, their parents, or both, or as items in an array, with
relationships between them determined by their positions in the array (e.g., binary heap)
Heap is a specialized tree-based data structure that satisfies the heap property: if B is
a child node of A, then key (A) ≥ key (B). This implies that an element with the greatest
key is always in the root node, and so such a heap is sometimes called a max-heap.
(Alternatively, if the comparison is reversed, the smallest element is always in the root
node, which results in a min-heap.) There is no restriction as to how many children each
node has in a heap. The heap is one maximally-efficient implementation of an abstract
data type called a priority queue. Heaps are crucial in several
efficient graph algorithms such as Dijkstra's algorithm.
A heap data structure should not be confused with the heap which is a common name for
dynamic allocated memory. The term was originally used only for the data structure.
Heaps are usually implemented in an array, and do not require pointers between
elements.
The root node of a tree is the node with no parents. There is at most one root node
in a rooted tree.
A leaf node has no children.
e.g., if the depth of a tree is 3, then the number of levels in the tree is depth + 1 = 4.
A rooted binary tree is a tree with a root node in which every node has at most
two children.
A full binary tree (sometimes proper binary tree or 2-tree or strictly binary tree) is
a tree in which every node other than the leaves has two children.
A perfect binary tree is a full binary tree in which all leaves are at the
same depth or same level.[1] (This is ambiguously also called a complete binary tree.)
A complete binary tree is a binary tree in which every level, except possibly the
last, is completely filled, and all nodes are as far left as possible.[2]
An infinite complete binary tree is a tree with ℵ0 levels, where for each level d the number of existing nodes at level d is equal to 2^d. The cardinal number of the set of all nodes is ℵ0. The cardinal number of the set of all paths is 2^ℵ0. The infinite complete binary tree essentially describes the structure of the Cantor set; the unit interval on the real line (of cardinality 2^ℵ0) is the continuous image of the Cantor set; this tree is sometimes called the Cantor space.
A balanced binary tree is commonly defined as a binary tree in which the heights of the two sub trees of every node never differ by more than 1,[3] although in general it is a binary tree where no leaf is much farther away from the root than any other leaf.
meaning "complete" and "full".
A strictly binary tree is one that is fully expanded, i.e., every internal node has exactly two children (degree-2 expansion).
The number of nodes n in a perfect binary tree can be found using the formula n = 2^(h+1) - 1, where h is the height of the tree.
The number of nodes n in a complete binary tree is at minimum n = 2^h and at maximum n = 2^(h+1) - 1, where h is the height of the tree.
The number of leaf nodes L in a perfect binary tree can be found using the formula L = 2^h, where h is the height of the tree.
The number of nodes n in a perfect binary tree can also be found using the formula n = 2L - 1, where L is the number of leaf nodes in the tree.
The number of NULL links in a complete binary tree of n nodes is (n + 1).
Operations of trees
Here are a variety of different operations that can be performed on trees. Some
are mutator operations, while others simply return useful information about the tree.
• Insertion
Nodes can be inserted into binary trees in between two other nodes or added after
an external node. In binary trees, a node that is inserted is specified as to which child it is.
• External nodes
Say that the external node being added on to is node A. To add a new node after node A,
A assigns the new node as one of its children and the new node assigns node A as its
parent.
Insertion on internal nodes is slightly more complex than on external nodes. Say that the
internal node is node A and that node B is the child of A. (If the insertion is to insert a
right child, then B is the right child of A, and similarly with a left child insertion.) A
assigns its child to the new node and the new node assigns its parent to A. Then the new
node assigns its child to B and B assigns its parent as the new node.
• Deletion
Deletion is the process whereby a node is removed from the tree. Only certain nodes in a
binary tree can be removed unambiguously.
Say that the node to delete is node A. If a node has no children (external node), deletion
is accomplished by setting the child of A's parent to null and A's parent to null. If it has
one child, set the parent of A's child to A's parent and set the child of A's parent to A's
child.
• Iteration
Often, one wishes to visit each of the nodes in a tree and examine the value there, a
process called iteration or enumeration. There are several common orders in which the
nodes can be visited, and each has useful properties that are exploited in algorithms based
on binary trees:
Pre-order, in-order, and post-order traversal visit each node in a tree by recursively visiting each node in the left and right subtrees of the root. If the root node is visited before its subtrees, this is pre-order; if after, post-order; if between, in-order. In-order traversal is useful in binary search trees, where this traversal visits the nodes in increasing order.
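A compact C sketch of the three recursive traversals on a simple node structure (the struct name and sample values are illustrative):

#include <stdio.h>

struct TreeNode {
    int value;
    struct TreeNode *left, *right;
};

void preorder(struct TreeNode *n) {
    if (n == NULL) return;
    printf("%d ", n->value);   /* visit before the subtrees  */
    preorder(n->left);
    preorder(n->right);
}

void inorder(struct TreeNode *n) {
    if (n == NULL) return;
    inorder(n->left);
    printf("%d ", n->value);   /* visit between the subtrees */
    inorder(n->right);
}

void postorder(struct TreeNode *n) {
    if (n == NULL) return;
    postorder(n->left);
    postorder(n->right);
    printf("%d ", n->value);   /* visit after the subtrees   */
}

int main(void) {
    struct TreeNode l = {20, NULL, NULL}, r = {40, NULL, NULL};
    struct TreeNode root = {30, &l, &r};
    preorder(&root);  printf("\n");   /* 30 20 40 */
    inorder(&root);   printf("\n");   /* 20 30 40 */
    postorder(&root); printf("\n");   /* 20 40 30 */
    return 0;
}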
Depth-first order
In depth-first order, we always attempt to visit the node farthest from the root that we
can, but with the caveat that it must be a child of a node we have already visited. Unlike a
depth-first search on graphs, there is no need to remember all the nodes we have visited,
because a tree cannot contain cycles. Pre-order is a special case of this. See depth-first
search for more information.
Breadth-first order
Contrasting with depth-first order is breadth-first order, which always attempts to visit
the node closest to the root that it has not already visited. See breadth-first search for
more information. Also called a level-order traversal.
Graph theorists use the following definition: A binary tree is a connected acyclic
graph such that the degree of each vertex is no more than three. It can be shown that in
any binary tree of two or more nodes, there are exactly two more nodes of degree one
than there are of degree three, but there can be any number of nodes of degree two.
A rooted binary tree is such a graph that has one of its vertices of degree no more than
two singled out as the root.
With the root thus chosen, each vertex will have a uniquely defined parent, and up to two
children; however, so far there is insufficient information to distinguish a left or right
child. If we drop the connectedness requirement, allowing multiple connected
components in the graph, we call such a structure a forest.
A binary tree may also be defined recursively as either:
• A single vertex, or
• A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.
This also does not establish the order of children, but does fix a specific root node.
• There is a unique binary tree of size 0 (consisting of a single leaf), and any other binary tree is characterized by the pair of its left and right children; if these have sizes i and j respectively, the full tree has size i + j + 1. Therefore the number Cn of binary trees of size n has the following recursive description: C0 = 1,
and Cn = sum over i + j = n - 1 of Ci * Cj for n > 0.
• The above parenthesized strings should not be confused with the set of words of
length 2n in the Dyck language, which consist only of parentheses in such a way
that they are properly balanced. The number of such strings satisfies the same
recursive description (each Dyck word of length 2n is determined by the Dyck
subword enclosed by the initial '(' and its matching ')' together with the Dyck
subword remaining after that closing parenthesis, whose lengths 2i and
2j satisfy i + j + 1 = n); this number is therefore also the Catalan number Cn. So
there are also five Dyck words of length 10:
• The ability to represent binary trees as strings of symbols and parentheses implies
that binary trees can represent the elements of afree magma on a singleton
set.Binary trees can be constructed from programming language primitives in
several ways.
Arrays
In a compact array arrangement, if a node has index i, its children are found at indices 2i + 1 (left) and 2i + 2 (right), while its parent (if any) is found at index floor((i - 1) / 2) (assuming the root has index zero). This method benefits from more compact storage and better locality of reference, particularly during a preorder traversal.
Succinct encodings
• One simple representation which meets this bound is to visit the nodes
of the tree in preorder, outputting "1" for an internal node and "0" for a
leaf. [1] If the tree contains data, we can simply simultaneously store it
in a consecutive array in preorder. This function accomplishes this:
• One way of thinking about this is that each node's children are in
a linked list, chained together with their right fields, and the node only
has a pointer to the beginning or head of this list, through its left field.
For example, in the tree on the left, A has the 6 children
{B,C,D,E,F,G}. It can be converted into the binary tree on the right
• The binary tree can be thought of as the original tree tilted sideways,
with the black left edges representing first child and the blue right edges
representing next sibling. The leaves of the tree on the left would be
written in Lisp as:(((N O) I J) C D ((P) (Q)) F (M))which would be
implemented in memory as the binary tree on the right, without any
letters on those nodes that have a left child.
Types of trees
Tree is a non-empty set, one element of which is designated the root of the tree while the
remaining elements are partitioned into non-empty sets each of which is a subtree of the
root.
Tree nodes have many useful properties. The depth of a node is the length of the path (or
the number of edges) from the root to that node. The height of a node is the longest path
from that node to its leaves. The height of a tree is the height of the root. A leaf node has no children; its only path is up to its parent.
See the axiomatic development of trees and its consequences for more information.
Types of trees:
Binary: Each node has zero, one, or two children. This assertion makes many tree
operations simple and efficient.
AVL: A balanced binary search tree according to the following specification: the heights
of the two child subtrees of any node differ by at most one.
Red-Black Tree: A balanced binary search tree using a balancing algorithm based on
colors assigned to a node, and the colors of nearby nodes.
Traversal
Many problems require we visit* the nodes of a tree in a systematic way: tasks such as
counting how many nodes exist or finding the maximum element. Three different
methods are possible for binary trees: preorder, postorder, and in-order, which all do the
same three things: recursively traverse both the left and right subtrees and visit the
current node. The difference is when the algorithm visits the current node:
levelorder: Level by level, from left to right, starting from the root node.
Visit means performing some operation involving the current node of a tree, like
incrementing a counter or checking if the value of the current node is greater than any
other recorded.
preorder(node)
visit(node)
if node.left ≠ null then preorder(node.left)
if node.right ≠ null then preorder(node.right)
For an algorithm that is less taxing on the stack, see Threaded Trees.
Balancing
When entries that are already sorted are stored in a tree, all new records will go the same
route, and the tree will look more like a list (such a tree is called a degenerate tree).
Therefore the tree needs balancing routines, which make sure that under all branches there is an approximately equal number of records. This will keep searching in the tree at optimal speed.
Specifically, if a tree with n nodes is a degenerate tree, the longest path through the tree
will be n nodes; if it is a balanced tree, the longest path will be log n nodes.
The balancing operation can move nodes up and down a tree without affecting the left-right ordering.
Node: any item that is stored in the tree.
Root: the top item in the tree (50 in the tree above).
Child: node(s) under the current node (20 and 40 are children of 30 in the tree above).
Parent: the node directly above the current node (90 is the parent of 100 in the tree above).
Leaf: a node which has no children (20 is a leaf in the tree above).
Searching a binary search tree, e.g. for the item 40:
1. The root node is 50, which is greater than 40, so you go to 50's left child.
2. 50's left child is 30, which is less than 40, so you next go to 30's right child.
3. 30's right child is 40, so you have found the item that you are looking for :)
Adding an item to a binary search tree
1. To add an item, you first must search through the tree to find the position that you
should put it in. You do this following the steps above.
2. When you reach a node which doesn't contain a child on the correct branch, add
the new node there.
1. The root node is 50, which is greater than 25, so you go to 50's left child.
2. 50's left child is 30, which is greater than 25, so you go to 30's left child.
3. 30's left child is 20, which is less than 25, so you go to 20's right child.
4. 20's right child doesn't exist, so you add 25 there :)
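Putting the search and insertion walks above together, here is a C sketch of a binary search tree using the same sample values (recursive versions; the struct and function names are chosen only for the illustration):

#include <stdio.h>
#include <stdlib.h>

struct BSTNode {
    int value;
    struct BSTNode *left, *right;
};

/* follow left/right children until the value is found or we run out of nodes */
struct BSTNode *search(struct BSTNode *node, int value) {
    if (node == NULL || node->value == value)
        return node;
    if (value < node->value)
        return search(node->left, value);
    return search(node->right, value);
}

/* walk down as in search; attach the new node where a child is missing */
struct BSTNode *insert(struct BSTNode *node, int value) {
    if (node == NULL) {
        struct BSTNode *n = malloc(sizeof *n);
        n->value = value;
        n->left = n->right = NULL;
        return n;
    }
    if (value < node->value)
        node->left = insert(node->left, value);
    else
        node->right = insert(node->right, value);
    return node;
}

int main(void) {
    struct BSTNode *root = NULL;
    int keys[] = {50, 30, 20, 40, 90, 100};
    for (int i = 0; i < 6; i++)
        root = insert(root, keys[i]);
    root = insert(root, 25);                 /* becomes 20's right child */
    printf("%s\n", search(root, 40) ? "found 40" : "40 not found");
    return 0;
}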
Deleting an item from a binary search tree
It is assumed that you have already found the node that you want to delete, using the
search technique described above.
Deleting a node with one child:
1. Directly connect the child of the node that you want to delete to the parent of the node that you want to delete.
Deleting a node with two children:
1. Find the left-most node in the right sub tree of the node being deleted. (After you have found the node you want to delete, go to its right child, then for every node under that, go to its left child until the node has no left child.) From now on, this node will be known as the successor.
1. Directly move the child to the right of the node being deleted into the position of
the node being deleted.
2. As the new node has no left children, you can connect the deleted node's left subtree's root as its left child.
To delete 30...
1. Move the successor into the place where the deleted node was and make it inherit
both of its children. So 35 moves to where 30 was and 20 and 40 become its
children.
2. Move the successor's (35s) right subtree to where the successor was. So 37
becomes a child of 40.
Node deletion
In general, remember that a node's left subtree's rightmost node is the closest node on the left, and the right subtree's leftmost node is the closest node on the right; either one of these can be chosen to replace the deleted node. The only complication arises when the replacing node itself has a child subtree on the same side as the side of the deleted node from which it was taken; the easiest thing to do is to always take the replacement from the same side.
B Trees
o A classical B-Tree can have N-node internal nodes, and empty 2-nodes as leaf
nodes, or more conveniently, the children can either be a value or a pointer to the
next N-node, so it is a union.
o The main idea with B-trees is that one starts with a root N-node, which is able to
hold N-1 entries, but on the Nth entry the number of keys for the node is
exhausted, and the node can be split into two half-sized nodes of N/2 entries each,
separated by a single key K which is equal to the right node's leftmost key, so
any entry with key K2 greater than or equal to K goes in the right node, and
anything less than K goes in the left. When the root node is split, a new root node
is created with one key, and a left child and a right child. Since there are N
children but only N-1 entries, the leftmost child is stored as a separate pointer. If
the leftmost pointer splits, then the left half becomes the new leftmost pointer, and
the right half and separating key are inserted into the front of the entries.
o Apparently, this fan-out is so important that compression can also be applied to the
blocks to increase the number of entries fitting within a given underlying layer's
block size (the underlying layer is often a file system block).
o Most database systems use the B+ tree algorithm, including PostgreSQL, MySQL,
Derby DB, Firebird, and many Xbase index types. Many file systems also use a B+
tree to manage their block layout (e.g. XFS, NTFS).
o Hence, the flat leaf block list of this B+ implementation can't contain blocks that
don't contain any data, because the ordering depends on the first key of the
entries, so a leaf block needs to be created with its first entry.
Divide and conquer:
Divide and conquer splits the n inputs into k distinct subsets, 1 < k <= n, yielding k
subproblems. The subproblems are solved and the sub-solutions are combined into a
solution of the whole. If a subproblem is still large, divide and conquer can be applied
to it again.
Control Abstraction:
A procedure whose flow of control is clear but whose primary operations are
specified by other procedures.
Example:
If problem P has size n and is split into k subproblems of sizes n1, n2, ..., nk, then the
computing time is described by the recurrence
T(n) = g(n) for small n
T(n) = T(n1) + T(n2) + ... + T(nk) + f(n) otherwise
where
g(n) – time to compute the answer directly for small inputs.
f(n) – time for dividing P and combining the solutions of the subproblems.
Complexity:
T(n) = T(1) for n = 1
T(n) = aT(n/b) + f(n) for n > 1
Binary search:
Example: searching for x = 4 in a sorted array a[1..5] with the recursive procedure
Binsrch(a, low, high, x).
First call, Binsrch(a, 1, 5, 4): low is not equal to high, so mid := [(1+5)/2] = 3;
x = 4 is not equal to a[mid] = 3, and 4 < 3 is false, so the right half is searched.
Second call, Binsrch(a, 4, 5, 4): x is found in the sub-array a[4..5], and the search
terminates successfully.
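A recursive version of the search traced above, written as a Python sketch (the parameter order mirrors Binsrch(a, low, high, x); the array contents used in the call are illustrative):

    def binsrch(a, low, high, x):
        # Search for x in the sorted slice a[low..high] (1-based indices as in the trace).
        if low > high:
            return 0                        # 0 signals "not found"
        mid = (low + high) // 2
        if x == a[mid]:
            return mid
        if x < a[mid]:
            return binsrch(a, low, mid - 1, x)
        return binsrch(a, mid + 1, high, x)

    a = [None, 1, 2, 3, 4, 5]               # a[1..5]; index 0 unused
    print(binsrch(a, 1, 5, 4))              # 4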
Space complexity: the recursive binary search uses O(log n) space for the recursion
stack; an iterative version needs only O(1) extra space.
Merge sort:
I. It is a simple sort procedure which sorts the two halves of the input and
produces the sorted list by merging the sorted sub-lists.
[Figure: divide the unsorted list into two sub-lists, sort each sub-list, then merge the
sorted sub-lists (divide and solve).]
Example: merging the sorted sub-lists (10, 25, 32, 35, ...) and (15, 21, 30, 42, ...):
Compare 10 and 15 – output 10
Compare 25 and 15 – output 15
Compare 25 and 21 – output 21
Compare 25 and 30 – output 25
Compare 32 and 30 – output 30
Compare 32 and 42 – output 32
Compare 35 and 42 – output 35
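A compact Python sketch of the procedure (the function names merge_sort and merge are illustrative):

    def merge(left, right):
        out = []
        i = j = 0
        while i < len(left) and j < len(right):
            # Compare the heads of the two sorted sub-lists, as in the trace above.
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]      # append whatever remains

    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

    print(merge_sort([25, 10, 32, 15, 21, 35, 30, 42]))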
Insertion sort:
It works exceedingly fast on arrays of small size.
Its computing time is O(n²).
Example:
Input : 10 15 3 27 9 20 12
10 15 3 27 9 20 12
10 15 3 27 9 20 12
3 10 15 27 9 20 12
3 10 15 27 9 20 12
3 9 10 15 27 20 12
3 9 10 15 20 27 12
3 9 10 12 15 20 27
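A Python sketch that reproduces the passes shown above:

    def insertion_sort(a):
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            # Shift larger elements one place right, then drop key into the gap.
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a

    print(insertion_sort([10, 15, 3, 27, 9, 20, 12]))
    # [3, 9, 10, 12, 15, 20, 27]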
Quick sort (partitioning about the first element k1):
1. Position a pointer low at the second element and a pointer high at the last
element.
2. Move low forward till it reaches an element greater than k1.
3. Move high backward till it reaches an element less than k1.
4. Interchange k[low] and k[high] if low < high.
5. Repeat steps 2 to 4 as long as low < high.
6. Swap the element at the first and high indices.
Example: 35 26 10 13 45 92 30 60
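A Python sketch of this partitioning scheme, using the first element as the pivot k1 (this is one common variant of quick sort, not the only one):

    def partition(a, first, last):
        pivot = a[first]
        low, high = first + 1, last
        while True:
            while low <= last and a[low] <= pivot:    # step 2: move low forward
                low += 1
            while a[high] > pivot:                    # step 3: move high backward
                high -= 1
            if low < high:                            # step 4: interchange
                a[low], a[high] = a[high], a[low]
            else:
                break
        a[first], a[high] = a[high], a[first]         # step 6: pivot into its place
        return high

    def quick_sort(a, first, last):
        if first < last:
            p = partition(a, first, last)
            quick_sort(a, first, p - 1)
            quick_sort(a, p + 1, last)

    data = [35, 26, 10, 13, 45, 92, 30, 60]
    quick_sort(data, 0, len(data) - 1)
    print(data)                                       # [10, 13, 26, 30, 35, 45, 60, 92]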
Selection sort:
1. Selection sort selects the largest or smallest element from the list to be sorted and
places it in its correct position.
2. If the largest element is taken, then it is placed in the last position.
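A Python sketch (this variant selects the smallest remaining element and places it at the front; the symmetric largest-to-the-back version works the same way):

    def selection_sort(a):
        n = len(a)
        for i in range(n - 1):
            # Find the index of the smallest element in a[i..n-1].
            smallest = i
            for j in range(i + 1, n):
                if a[j] < a[smallest]:
                    smallest = j
            a[i], a[smallest] = a[smallest], a[i]   # put it in its correct position
        return a

    print(selection_sort([35, 26, 10, 13, 45, 92, 30, 60]))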
Strassen's algorithm
To calculate the matrix product C = AB, Strassen's algorithm partitions the data to reduce
the number of multiplications performed. This algorithm requires M, N and P to be
powers of 2.
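For 2 x 2 matrices (or 2 x 2 blocks of larger matrices) the seven Strassen products and the entries of C can be written out directly. The Python sketch below handles only plain 2 x 2 matrices and is meant just to show where the saving of one multiplication comes from:

    def strassen_2x2(A, B):
        # A and B are 2x2 matrices given as [[a11, a12], [a21, a22]].
        m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1])
        m2 = (A[1][0] + A[1][1]) * B[0][0]
        m3 = A[0][0] * (B[0][1] - B[1][1])
        m4 = A[1][1] * (B[1][0] - B[0][0])
        m5 = (A[0][0] + A[0][1]) * B[1][1]
        m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1])
        m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1])
        return [[m1 + m4 - m5 + m7, m3 + m5],
                [m2 + m4,           m1 - m2 + m3 + m6]]

    print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]

Seven multiplications are used instead of the usual eight; applied recursively to blocks, this is what reduces the overall multiplication count.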
Greedy method:
Most of these problems have n inputs and require us to obtain a subset
that satisfies some constraints. Any subset that satisfies these constraints is
called a feasible solution.
We have to find a feasible solution that either maximizes or minimizes a
given objective function.
A feasible solution that does this is called an optimal solution. The greedy
method suggests considering one input at a time. At each stage a decision is
made as to whether a particular input belongs in an optimal solution.
If the inclusion of the next input into the partially constructed optimal
solution would result in an infeasible solution, then this input is not added to
the partial solution; otherwise it is added. The selection procedure is based
on some optimization measure. This measure may be the objective function.
This version of the greedy technique is called the subset paradigm. For
problems that do not call for the selection of an optimal subset, in the
greedy method we make decisions by considering the inputs in some
order. This version of the greedy method is called the ordering
paradigm. The control abstraction below is for the subset paradigm.
General Method
Procedure GREEDY(A, n)
// A(1:n) contains the n inputs //
solution ← Ø              // initialize the solution to empty //
for i ← 1 to n do
    x ← SELECT(A)
    if FEASIBLE(solution, x)
        then solution ← UNION(solution, x)
    endif
repeat
return (solution)
end GREEDY
Subset problem:
There are n positive numbers given in a set. The desire is to find all possible subsets of
this set whose contents add up to a predefined value M.
Let there be n elements in the main set. W = w[1..n] represents the elements of the set,
i.e., w = (w1, w2, w3, ..., wn). The vector x = x[1..n] assumes either 0 or 1 values; if
element w(i) is included in the subset then x(i) = 1.
Consider n=6 m=30 and w[1..6]={5,10,12,13,15,18}. The partial backtracking tree is
shown in fig 6.2. The label to the left of a node represents the item number chosen for
insertion and the label to the right represents the space occupied in M. S represents
a solution to the given problem and B represents a bounding criteria if no solution can
be reached. For the above problem the solution could be (1,1,0,0,1,0), (1,0,1,1,0,0) and
(0,0,1,0,0,1). Completion of the tree structure is left as an assignment for the reader.
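A backtracking sketch for the sum-of-subsets instance above (Python; the function name and output format are illustrative). It builds the x vector one position at a time and bounds a branch as soon as the partial sum exceeds M:

    def sum_of_subsets(w, M):
        n = len(w)
        x = [0] * n
        solutions = []

        def try_item(i, current_sum):
            if current_sum == M:
                solutions.append(tuple(x))        # a solution node S
                return
            if i == n or current_sum > M:         # bounding: no solution below here
                return
            x[i] = 1                              # include w[i]
            try_item(i + 1, current_sum + w[i])
            x[i] = 0                              # exclude w[i]
            try_item(i + 1, current_sum)

        try_item(0, 0)
        return solutions

    print(sum_of_subsets([5, 10, 12, 13, 15, 18], 30))
    # [(1, 1, 0, 0, 1, 0), (1, 0, 1, 1, 0, 0), (0, 0, 1, 0, 0, 1)]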
The 8-queens problem can be stated as follows. Consider a chessboard of order 8 x 8. The
problem is to place 8 queens on this board such that no two queens can attack
each other.
Illustration.
Consider the problem of 4 queens, backtracking solution for this is as shown in the fig
6.3. The figure shows a partial backtracking tree. Completion of the tree is left as an
assignment for the reader.
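A backtracking sketch for the n-queens problem (Python). x[i] holds the column of the queen placed in row i, and place(k, col) checks the column and diagonal constraints against the queens already placed:

    def n_queens(n):
        x = [0] * n
        solutions = []

        def place(k, col):
            # A queen may go in (row k, column col) only if no earlier queen
            # shares that column or lies on the same diagonal.
            for i in range(k):
                if x[i] == col or abs(x[i] - col) == k - i:
                    return False
            return True

        def solve(k):
            if k == n:
                solutions.append(list(x))
                return
            for col in range(n):
                if place(k, col):
                    x[k] = col
                    solve(k + 1)

        solve(0)
        return solutions

    print(len(n_queens(4)))   # 2 solutions for the 4-queens problem
    print(len(n_queens(8)))   # 92 solutions for the 8-queens problem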
KNAPSACK PROBLEM:
Consider n = 3 objects and a knapsack of capacity M = 20, with profits and weights
(P1, P2, P3) = (25, 24, 15)
(W1, W2, W3) = (18, 15, 10)
Some feasible solutions (x1, x2, x3) and their values are:
(1/2, 1/3, 1/4): ΣWiXi = 18 * 1/2 + 15 * 1/3 + 10 * 1/4 = 9 + 5 + 2.5 = 16.5
                 ΣPiXi = 25/2 + 8 + 15/4 = 12.5 + 8 + 3.75 = 24.25
(1, 2/15, 0):    ΣWiXi = 18 * 1 + 15 * 2/15 + 0 = 20
                 ΣPiXi = 25 * 1 + 24 * 2/15 = 25 + 3.2 = 28.2
(0, 2/3, 1):     ΣWiXi = 18 * 0 + 15 * 2/3 + 10 * 1 = 10 + 10 = 20
                 ΣPiXi = 0 + 16 + 15 = 31
At each step we select the object which has the maximum profit per unit of capacity
used; that is, the objects are considered in decreasing order of the ratio Pi / Wi.
(0, 1, 1/2):     ΣPiXi = 25 * 0 + 24 * 1 + 15 * 1/2 = 24 + 7.5 = 31.5
                 ΣWiXi = 20
This is the solution obtained by the greedy strategy, and it has the largest profit of the
feasible solutions listed.
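A greedy Python sketch for the fractional knapsack, ordering the objects by profit per unit weight as described above (the function name and return format are illustrative):

    def fractional_knapsack(profits, weights, capacity):
        # Consider the objects in decreasing order of profit / weight.
        order = sorted(range(len(profits)),
                       key=lambda i: profits[i] / weights[i], reverse=True)
        x = [0.0] * len(profits)
        remaining = capacity
        total_profit = 0.0
        for i in order:
            if weights[i] <= remaining:          # take the whole object
                x[i] = 1.0
                remaining -= weights[i]
                total_profit += profits[i]
            else:                                # take the fraction that still fits
                x[i] = remaining / weights[i]
                total_profit += profits[i] * x[i]
                break
        return x, total_profit

    print(fractional_knapsack([25, 24, 15], [18, 15, 10], 20))
    # ([0.0, 1.0, 0.5], 31.5)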
[Figure: a graph on the vertex set V = {1, 2, 3, 4, 5}.]
Spanning trees:
[Figure: a complete graph with V = {1, 2, 3, 4} and
E = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}, together with some of its spanning
trees, each on V = {1, 2, 3, 4}.]
A minimum spanning tree of a graph G is its spanning tree in which the sum of
the weight of the edges is minimum.
[Figure: a weighted graph with 7 vertices (1–7); edge weights include 10, 12, 14, 16,
18, 22, 24, 25 and 28.]
Prims Algorithm:
[Figure: stages 1–6 of Prim's algorithm on the weighted graph above; at each stage one
minimum-cost edge connecting the tree to a new vertex is added.]
COMPLEXITY:
The time required by Prim's algorithm is O(n²), where n is the number of vertices
in the graph G.
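A Python sketch of Prim's algorithm on an adjacency (cost) matrix. The O(n²) behaviour mentioned above comes from scanning all vertices once for each of the n-1 edges added; the matrix layout and the INF sentinel are assumptions of this sketch, and the graph is assumed to be connected:

    INF = float('inf')

    def prim(cost):
        # cost is an n x n symmetric matrix; cost[i][j] = INF if there is no edge (i, j).
        n = len(cost)
        in_tree = [False] * n
        near = [0] * n              # nearest tree vertex for each vertex outside the tree
        in_tree[0] = True           # start the tree at vertex 0
        edges, total = [], 0
        for _ in range(n - 1):
            # Pick the cheapest edge joining the tree to a vertex outside it.
            best, u = INF, -1
            for v in range(n):
                if not in_tree[v] and cost[near[v]][v] < best:
                    best, u = cost[near[v]][v], v
            edges.append((near[u], u, best))
            total += best
            in_tree[u] = True
            # Update the nearest tree vertex for the vertices still outside.
            for v in range(n):
                if not in_tree[v] and cost[u][v] < cost[near[v]][v]:
                    near[v] = u
        return edges, total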
KRUSKAL'S ALGORITHM:
Each vertex of the graph is taken as a one-node tree.
Then a minimum-weight edge is selected to connect two trees together.
The iteration continues till all trees of the graph form a single tree.
[Figure: stages 1–6 of Kruskal's algorithm on the weighted graph above; edges are
added in increasing order of weight (10, 12, 14, 16, 22, ...), skipping any edge that
would form a cycle.]
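A Python sketch of Kruskal's algorithm using a simple union-find (disjoint set) structure to detect whether an edge joins two different trees or would close a cycle (the edge-list format is an assumption of this sketch):

    def kruskal(n, edges):
        # edges is a list of (weight, u, v) triples over vertices 0..n-1.
        parent = list(range(n))

        def find(v):                          # find the root of v's tree
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path halving
                v = parent[v]
            return v

        mst, total = [], 0
        for w, u, v in sorted(edges):         # consider edges in increasing weight order
            ru, rv = find(u), find(v)
            if ru != rv:                      # the edge joins two different trees
                parent[ru] = rv
                mst.append((u, v, w))
                total += w
            if len(mst) == n - 1:
                break
        return mst, total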
In graph theory, the shortest path problem is the problem of finding a path between two
vertices (or nodes) such that the sum of the weights of its constituent edges is minimized.
An example is finding the quickest way to get from one location to another on a road
map; in this case, the vertices represent locations and the edges represent segments of
road and are weighted by the time needed to travel that segment.
Formally, given a weighted graph and two vertices v and v', the problem is to find a path
from v to v' whose total weight is minimal among all paths connecting v to v'. The
problem is also sometimes called the single-pair shortest path problem, to distinguish it
from the following generalizations:
The single-source shortest path problem, in which we have to find shortest paths
from a source vertex v to all other vertices in the graph (a sketch of Dijkstra's
algorithm for this case is given after this list).
The single-destination shortest path problem, in which we have to find shortest
paths from all vertices in the graph to a single destination vertex v. This can be
reduced to the single-source shortest path problem by reversing the edges in the
graph.
The all-pairs shortest path problem, in which we have to find shortest paths
between every pair of vertices v, v' in the graph.
The travelling salesman problem is the problem of finding the shortest path that
goes through every vertex exactly once, and returns to the start. Unlike the
shortest path problem, which can be solved in polynomial time in graphs
without negative cycles (edges with negative weights), the travelling salesman
problem is NP-complete and, as such, is believed not to be efficiently solvable
(see P = NP problem). The problem of finding the longest path in a graph is
also NP-complete.
The widest path problem seeks a path so that the minimum label of any edge is
as large as possible.
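For the single-source problem with non-negative edge weights, Dijkstra's algorithm is the standard method. A Python sketch using a binary heap; the adjacency-list format and vertex names are assumptions of this sketch:

    import heapq

    def dijkstra(adj, source):
        # adj[u] is a list of (v, weight) pairs; returns the distance to every vertex.
        dist = {u: float('inf') for u in adj}
        dist[source] = 0
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                      # stale heap entry, skip it
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist

    adj = {'s': [('a', 2), ('b', 5)], 'a': [('b', 1), ('t', 6)],
           'b': [('t', 2)], 't': []}
    print(dijkstra(adj, 's'))                 # {'s': 0, 'a': 2, 'b': 3, 't': 5}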
Given a directed graph (V, A) with source node s, target node t, and cost wij for
each arc (i, j) in A, consider the linear program with variables xij:
minimize the sum over (i, j) in A of wij xij
subject to xij >= 0 and, for all i,
Σj xij − Σj xji = 1 if i = s, −1 if i = t, and 0 otherwise.
This LP, which is common fodder for operations research courses, has the
special property that it is integral; more specifically, every basic optimal
solution (when one exists) has all variables equal to 0 or 1, and the set of
edges whose variables equal 1 forms an s-t dipath. See Ahuja et al.[6] for one
proof, although the origin of this approach dates back to the mid-20th century.
Dynamic programming
In mathematics and computer science, dynamic programming is a
method for solving complex problems by breaking them down into simpler
subproblems. It is applicable to problems exhibiting the properties of overlapping
subproblems which are only slightly smaller and optimal substructure (described
below). When applicable, the method takes far less time than naïve methods.
The key idea behind dynamic programming is quite simple. In general, to solve a
given problem, we need to solve different parts of the problem (subproblems),
then combine the solutions of the subproblems to reach an overall solution. Often,
many of these subproblems are really the same. The dynamic programming
approach seeks to solve each subproblem only once, thus reducing the number of
computations. This is especially useful when the number of repeating
subproblems is exponentially large.
Top-down dynamic programming simply means storing the results of certain
calculations, which are later used again since the completed calculation is a sub-
problem of a larger calculation. Bottom-up dynamic programming involves
formulating a complex calculation as a recursive series of simpler calculations.
The term dynamic programming was originally used in the 1940s by Richard Bellman to
describe the process of solving problems where one needs to find the best decisions one
after another. By 1953, he refined this to the modern meaning, referring specifically to
nesting smaller decision problems inside larger decisions,[2] and the field was thereafter
recognized by the IEEE as a systems analysis and engineering topic. Bellman's
contribution is remembered in the name of the Bellman equation, a central result of
dynamic programming which restates an optimization problem in recursive form.
The word dynamic was chosen by Bellman to capture the time-varying aspect of the
problems, and also because it sounded impressive.[3] The word programming referred to
the use of the method to find an optimal program, in the sense of a military schedule for
training or logistics. This usage is the same as that in the phrases linear
programming and mathematical programming, a synonym for mathematical optimization.
[Figure: finding the shortest path in a graph using optimal substructure; a straight line
indicates a single edge; a wavy line indicates a shortest path between the two
vertices it connects (other nodes on these paths are not shown); the bold line is
the overall shortest path from start to goal.]
Dynamic programming is both a mathematical optimization method and a
computer programming method. In both contexts it refers to simplifying a
complicated problem by breaking it down into simpler subproblems in
a recursive manner. While some decision problems cannot be taken apart this
way, decisions that span several points in time do often break apart recursively;
Bellman called this the "Principle of Optimality". Likewise, in computer science,
a problem which can be broken down recursively is said to have optimal
substructure.
If subproblems can be nested recursively inside larger problems, so that dynamic
programming methods are applicable, then there is a relation between the value
of the larger problem and the values of the subproblems.[5] In the optimization
literature this relationship is called the Bellman equation.
[Figure: the subproblem graph for the Fibonacci sequence. The fact that it is not
a tree indicates overlapping subproblems.]
Avoiding this recomputation can be achieved in either of two ways.
Top-down approach: This is the direct fall-out of the recursive formulation
of any problem. If the solution to any problem can be formulated recursively
using the solutions to its subproblems, and if its subproblems are overlapping, then
one can easily memoize or store the solutions to the subproblems in a table.
Whenever we attempt to solve a new subproblem, we first check the table to see if
it is already solved. If a solution has been recorded, we can use it directly;
otherwise we solve the subproblem and add its solution to the table.
Bottom-up approach: This is the more interesting case. Once we formulate the
solution to a problem recursively in terms of its subproblems, we can try
reformulating the problem in a bottom-up fashion: solve the subproblems
first and use their solutions to build on and arrive at solutions to bigger
subproblems. This is also usually done in tabular form, by iteratively generating
solutions to bigger and bigger subproblems using the solutions to smaller
subproblems.
The whole-lifetime problem can then be written as a sequence of smaller decision
problems:
V_t(k_t) = max over c_t of { ln(c_t) + b·V_{t+1}(k_{t+1}) },
subject to k_{t+1} = A·k_t^a − c_t.
This problem is much simpler than the one we wrote down before, because it involves
only two decision variables, c_t and k_{t+1}. Intuitively, instead of choosing his whole
lifetime plan at birth, the consumer can take things one step at a time. At time t, his
current capital k_t is given, and he only needs to choose current consumption c_t and
saving k_{t+1}.
To actually solve this problem, we work backwards. For simplicity, the current level of
capital is denoted as k. V_{T+1}(k) is already known, so using the Bellman equation once we
can calculate V_T(k), and so on until we get to V_0(k), which is the value of the initial
decision problem for the whole lifetime. In other words, once we know V_{T-j+1}(k), we
can calculate V_{T-j}(k), which is the maximum of ln(c_{T-j}) + b·V_{T-j+1}(A·k^a − c_{T-j}),
taken over c_{T-j}.
Fibonacci sequence
Here is a naïve implementation of a function finding the nth member of the Fibonacci
sequence, based directly on the mathematical definition:
function fib(n)
if n = 0 return 0
if n = 1 return 1
return fib(n − 1) + fib(n − 2)
Notice that if we call, say, fib(5), we produce a call tree that calls the function on the
same value many different times:
fib(5)
fib(4) + fib(3)
(fib(3) + fib(2)) + (fib(2) + fib(1))
((fib(2) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))
(((fib(1) + fib(0)) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0))
+ fib(1))
In particular, fib(2) was calculated three times from scratch. In larger examples, many
more values of fib, or subproblems, are recalculated, leading to an exponential time
algorithm.
Now, suppose we have a simple map object, m, which maps each value of fib that has
already been calculated to its result, and we modify our function to use it and update it.
The resulting function requires only O(n) time instead of exponential time:
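The modified function is not reproduced in the notes; a minimal Python sketch of the memoized version, in the same spirit as the pseudocode above, is:

    m = {0: 0, 1: 1}            # map of already-computed values of fib

    def fib(n):
        if n not in m:
            m[n] = fib(n - 1) + fib(n - 2)   # each value is computed only once
        return m[n]

    print(fib(5))               # 5, with each fib(k) for k <= 5 computed a single time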
Balanced 0-1 matrices: consider the problem of counting the n × n matrices of zeros and
ones in which every row and every column contains exactly n/2 zeros and n/2 ones.
There are at least three possible approaches: brute force, backtracking, and dynamic
programming.
Brute force consists of checking all assignments of zeros and ones and counting those
that have balanced rows and columns (n/2 zeros and n/2 ones each). As there are
2^(n^2) possible assignments, this strategy is not practical except maybe up to n = 6.
Backtracking for this problem consists of choosing some order of the matrix elements
and recursively placing ones or zeros, while checking that in every row and column the
number of elements that have not been assigned plus the number of ones or zeros are both
at least n / 2. While more sophisticated than brute force, this approach will visit every
solution once, making it impractical for n larger than six, since the number of solutions is
already 116963796250 for n = 8, as we shall see.
Dynamic programming makes it possible to count the number of solutions without
visiting them all. Imagine backtracking values for the first row - what information would
we require about the remaining rows, in order to be able to accurately count the solutions
obtained for each first row value? We consider k × n boards, where 1 ≤ k ≤ n,
whose k rows contain n / 2 zeros and n / 2 ones. The function f to which memoization is
applied maps vectors of n pairs of integers to the number of admissible boards
(solutions). There is one pair for each column and its two components indicate
respectively the number of ones and zeros that have yet to be placed in that column. We
then pick one of the possible assignments for the top row of the board and, going through
every column, subtract one from the appropriate element of the pair for that column,
depending on whether the assignment for the top row contained a zero or a one at that
position. If any of the results becomes negative, the assignment is invalid and does not
contribute to the count. For example (n = 4), two partial boards and their column vectors:
((1, 2) (2, 1) (1, 2) (2, 1)) ((1, 2) (1, 2) (2, 1) (2, 1)) k=3
1 0 1 0 0 0 1 1
((1, 1) (1, 1) (1, 1) (1, 1)) ((0, 2) (0, 2) (2, 0) (2, 0)) k=2
0 1 0 1 1 1 0 0
((0, 1) (1, 0) (0, 1) (1, 0)) ((0, 1) (0, 1) (1, 0) (1, 0)) k=1
1 0 1 0 1 1 0 0
((0, 0) (0, 0) (0, 0) (0, 0)) ((0, 0) (0, 0), (0, 0) (0, 0))
The number of solutions for each n is given by sequence A058527 in the OEIS.
Checkerboard
Consider a checkerboard with n × n squares and a cost-function c(i, j) which returns a
cost associated with square i, j (i being the row, j being the column). For instance (on a 5
× 5 checkerboard),
4 |  7   6   1   1   4
3 |  3   5   7   8   2
2 |  -   6   7   0   -
1 |  -   -  *5*  -   -
     1   2   3   4   5
Thus c(1, 3) = 5.
Let us say you had a checker that could start at any square on the first rank (i.e., row) and
you wanted to know the shortest path (sum of the costs of the visited squares are at a
minimum) to get to the last rank, assuming the checker could move only diagonally left
forward, diagonally right forward, or straight forward. That is, a checker on (1,3) can
move to (2,2), (2,3) or (2,4).
5 |
2 |      x   x   x
1 |          o
     1   2   3   4   5
(The checker at o on square (1, 3) can move to any of the squares marked x.)
This problem exhibits optimal substructure. That is, the solution to the entire problem
relies on solutions to subproblems. Let us define a function q(i, j) as
q(i, j) = the minimum cost to reach square (i, j)
If we can find the values of this function for all the squares at rank n, we pick the
minimum and follow that path backwards to get the shortest path.
Note that q(i, j) is equal to the minimum cost to get to any of the three squares below it
(since those are the only squares that can reach it) plus c(i, j). For instance:
5 |
4 |          A
3 |      B   C   D
     1   2   3   4   5
q(A) = min(q(B),q(C),q(D)) + c(A)
Now, let us define q(i, j) in somewhat more general terms:
q(i, j) = infinity                                                if j < 1 or j > n
        = c(i, j)                                                 if i = 1
        = min( q(i-1, j-1), q(i-1, j), q(i-1, j+1) ) + c(i, j)    otherwise
The first line of this equation is there to make the recursive property simpler (when
dealing with the edges, we need only one recursion). The second line says what
happens in the first rank, to provide a base case. The third line, the recursion, is the
important part. It is similar to the A,B,C,D example. From this definition we can make a
straightforward recursive code for q(i, j). In the following pseudocode, n is the size of the
board, c(i, j) is the cost-function, and min() returns the minimum of a number of
values:
function minCost(i, j)
    if j < 1 or j > n
        return infinity
    else if i = 1
        return c(i, j)
    else
        return min( minCost(i-1, j-1), minCost(i-1, j), minCost(i-1, j+1) ) + c(i, j)
It should be noted that this function only computes the path cost, not the actual path; we
will get to the path soon. This, like the Fibonacci-numbers example, is horribly slow
since it recomputes the same shortest paths over and over. However, we can compute it
much faster in a bottom-up fashion if we store path costs in a two-dimensional array
q[i, j] rather than using a function. This avoids recomputation: every value needed for
q[i, j] is computed ahead of time only once and then simply looked up.
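A bottom-up Python sketch of the checkerboard computation just described; it fills q rank by rank and also records, in p, which column each square was reached from, so the actual path can be recovered. The names q, p and the cost function c follow the text; the sentinel columns with infinity are an implementation detail of this sketch:

    INF = float('inf')

    def shortest_checker_path(c, n):
        # c(i, j) is the cost of square (i, j), 1 <= i, j <= n.
        q = [[INF] * (n + 2) for _ in range(n + 1)]   # columns 0 and n+1 are sentinels
        p = [[0] * (n + 2) for _ in range(n + 1)]     # p[i][j] = column reached from
        for j in range(1, n + 1):
            q[1][j] = c(1, j)                         # base case: the first rank
        for i in range(2, n + 1):
            for j in range(1, n + 1):
                best = min((q[i - 1][j - 1], j - 1),
                           (q[i - 1][j],     j),
                           (q[i - 1][j + 1], j + 1))
                q[i][j] = best[0] + c(i, j)
                p[i][j] = best[1]
        # Pick the cheapest square on the last rank and walk the path backwards.
        j = min(range(1, n + 1), key=lambda col: q[n][col])
        path = [(n, j)]
        for i in range(n, 1, -1):
            j = p[i][j]
            path.append((i - 1, j))
        return q[n][path[0][1]], list(reversed(path))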
Multistage graphs
A multistage graph is a directed graph
o G = (V, E) with V partitioned into K >= 2 disjoint subsets such that if (a, b) is in E,
then a is in Vi and b is in Vi+1 for some subsets in the partition;
o and |V1| = |VK| = 1.
The vertex s in V1 is called the source; the vertex t in VK is called the sink.
The cost of a path from node v to node w is the sum of the costs of the edges in the path.
The "multistage graph problem" is to find the minimum-cost path from s to t.
A problem with profits P(i, j) can be handled by weighting the edges with
C(i, j) = -P(i, j) (the negative of the profit) to make it a minimization problem.
Let path(i, j) be some specification of the minimal path from vertex j in stage i to the
sink t, and let COST(i, j) be the cost of this path; c(j, t) is the weight of the edge from j
to t. Working backwards from the sink, the costs satisfy
COST(i, j) = min over all edges (j, l) in E of { c(j, l) + COST(i+1, l) }.
To write a simple algorithm, assign numbers to the vertices so that those in stage Vi have
lower numbers than those in stage Vi+1.
int[] MStageForward(Graph G)
{
// returns vector of vertices to follow through the graph
// let c[i][j] be the cost matrix of G
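The routine above is only a stub in these notes; a minimal Python sketch of the same forward approach is given below. Vertices are assumed to be numbered 0..n-1 so that vertices in stage Vi have lower numbers than those in Vi+1, vertex 0 is the source s, vertex n-1 is the sink t, and c[j][l] is the edge cost (infinity if there is no edge); every vertex is assumed to lie on some s-to-t path:

    INF = float('inf')

    def multistage_forward(c):
        # c[j][l] is the cost of edge (j, l), INF if absent; vertex 0 is s, n-1 is t.
        n = len(c)
        cost = [INF] * n
        d = [0] * n             # d[j] = vertex that follows j on a cheapest j-to-t path
        cost[n - 1] = 0
        for j in range(n - 2, -1, -1):
            for l in range(j + 1, n):
                if c[j][l] + cost[l] < cost[j]:
                    cost[j] = c[j][l] + cost[l]
                    d[j] = l
        # Recover the minimum-cost s-to-t path from the recorded decisions d.
        path, v = [0], 0
        while v != n - 1:
            v = d[v]
            path.append(v)
        return cost[0], path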
Floyd's Algorithm
Initially A[i, j] = C[i, j] if i ≠ j, and A[i, j] = 0 if i = j.
The kth pass explores whether the vertex k lies on an optimal path from i to j, for all
i, j. We use
A_k[i, j] = min( A_(k-1)[i, j], A_(k-1)[i, k] + A_(k-1)[k, j] ), k >= 1.
The algorithm:
{   int i, j, k;
    for (i = 0; i <= n - 1; i++)
        for (j = 0; j <= n - 1; j++)
            A[i, j] = C[i, j];            // copy the cost matrix
    for (i = 0; i <= n - 1; i++)
        A[i, i] = 0;                      // no cost from a vertex to itself
    for (k = 0; k <= n - 1; k++)          // pass k: allow vertex k as an intermediate
        for (i = 0; i <= n - 1; i++)
            for (j = 0; j <= n - 1; j++)
                if (A[i, k] + A[k, j] < A[i, j])
                    A[i, j] = A[i, k] + A[k, j];
}
If we don't plan on modifying a search tree, and we know exactly how often each item
will be accessed, we can construct an optimal binary search tree, which is a search tree
where the average cost of looking up an item (the expected search cost) is minimized.
Even if we only have estimates of the search costs, such a system can considerably speed
up lookups on average. For example, if you have a BST of English words used in a spell
checker, you might balance the tree based on word frequency in text corpora, placing
words like "the" near the root and words like "agerasia" near the leaves. Such a tree might
be compared with Huffman trees, which similarly seek to place frequently used items near the root.
If we do not know the sequence in which the elements in the tree will be accessed in
advance, we can use splay trees which are asymptotically as good as any static search tree
we can construct for any particular sequence of lookup operations.
Alphabetic trees are Huffman trees with the additional constraint on order, or,
equivalently, search trees with the modification that all elements are stored in the leaves.
Faster algorithms exist for optimal alphabetic binary trees (OABTs).
Example (0-1 knapsack by dynamic programming):
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S' = S -
{i} is an optimal solution for W - wi pounds, and the value of the solution S is vi plus the
value of the subproblem.
We can express this fact in the following formula: define c[i, w] to be the value of the
solution for items 1, 2, . . . , i and maximum weight w. Then
c[i, w] = 0                                       if i = 0 or w = 0
        = c[i-1, w]                               if wi > w
        = max( vi + c[i-1, w-wi], c[i-1, w] )     if i > 0 and w >= wi
This says that the value of the solution for i items either includes item i, in which case it
is vi plus a subproblem solution for (i-1) items and the weight excluding wi, or does not
include item i, in which case it is a subproblem's solution for (i-1) items and the same
weight. That is, if the thief picks item i, the thief takes vi value and can choose from
items 1, 2, . . . , i-1 up to the weight limit w - wi, getting c[i-1, w-wi] additional value. On
the other hand, if the thief decides not to take item i, the thief can choose from items
1, 2, . . . , i-1 up to the weight limit w, and gets c[i-1, w] value. The better of these two
choices should be made. Although this is the 0-1 knapsack problem, the above formula
for c is similar to the LCS formula: boundary values are 0, and other values are computed
from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the
LCS-length algorithm given in the CLR book for finding a longest common subsequence
of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two
sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in
the table, that is, a two dimensional array, c[0 . . n, 0 . . w] whose entries are computed in
a row-major order. That is, the first row of c is filled in from left to right, then the second
row, and so on. At the end of the computation, c[n, w] contains the maximum value that
can be picked into the knapsack.
Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W
    do c[0, w] = 0
for i = 1 to n
    do c[i, 0] = 0
       for w = 1 to W
           do if wi ≤ w
                then if vi + c[i-1, w-wi] > c[i-1, w]
                        then c[i, w] = vi + c[i-1, w-wi]
                        else c[i, w] = c[i-1, w]
                else c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing
backwards where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of
the solution, and we continue tracing with c[i-1, w]. Otherwise item i is part of the
solution, and we continue tracing with c[i-1, w-wi].
Analysis
The algorithm takes Θ(nW) time to fill the c table, which has (n+1)·(W+1) entries, each
requiring Θ(1) time to compute, and O(n) time to trace the solution, because the tracing
process starts in row n of the table and moves up one row at each step.
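A Python sketch of the table-filling and trace-back just described (0-based lists; the function name is illustrative):

    def knapsack_01(v, w, W):
        n = len(v)
        # c[i][j] = best value using items 1..i with capacity j.
        c = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, W + 1):
                if w[i - 1] <= j and v[i - 1] + c[i - 1][j - w[i - 1]] > c[i - 1][j]:
                    c[i][j] = v[i - 1] + c[i - 1][j - w[i - 1]]
                else:
                    c[i][j] = c[i - 1][j]
        # Trace back which items were taken.
        take, j = [], W
        for i in range(n, 0, -1):
            if c[i][j] != c[i - 1][j]:        # item i is part of the solution
                take.append(i)
                j -= w[i - 1]
        return c[n][W], sorted(take)

    print(knapsack_01([60, 100, 120], [10, 20, 30], 50))   # (220, [2, 3])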
The problem was first formulated as a mathematical problem in 1930 and is one of the
most intensively studied problems in optimization. It is used as a benchmark for many
optimization methods. Even though the problem is computationally difficult, a large
number of heuristics and exact methods are known, so that some instances with tens of
thousands of cities can be solved.
The TSP has several applications even in its purest formulation, such
as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as
a sub-problem in many areas, such as DNA sequencing. In these applications, the
concept city represents, for example, customers, soldering points, or DNA fragments,
and the concept distance represents travelling times or cost.
In the theory of computational complexity, the decision version of the TSP (where, given
a length L, the task is to decide whether any tour is shorter than L) belongs to the class
of NP-complete problems. Thus, it is likely that the worst case running time for any
algorithm for the TSP increases exponentially with the number of cities.
The origins of the travelling salesman problem are unclear. A handbook for travelling
salesmen from 1832 mentions the problem and includes example tours through Germany
and Switzerland, but contains no mathematical treatment.
The travelling salesman problem was defined in the 1800s by the Irish mathematician W.
R. Hamilton and by the British mathematician Thomas Kirkman. Hamilton's Icosian
Game was a recreational puzzle based on finding a Hamiltonian cycle.[2] The general form
of the TSP appears to have been first studied by mathematicians during the 1930s in
Vienna and at Harvard, notably by Karl Menger, who defined the problem, considered the
obvious brute-force algorithm, and observed the non-optimality of the nearest neighbour
heuristic.
Richard M. Karp showed in 1972 that the Hamiltonian cycle problem was NP-complete,
which implies the NP-hardness of TSP. This supplied a mathematical explanation for the
apparent computational difficulty of finding optimal tours. Great progress was made in the
late 1970s and 1980, when Grötschel, Padberg, Rinaldi and others managed to exactly
solve instances with up to 2392 cities, using cutting planes and branch-and-bound.
In the 1990s, Applegate, Bixby, Chvátal, and Cook developed the program Concorde that
has been used in many recent record solutions. Gerhard Reinelt published the TSPLIB in
1991, a collection of benchmark instances of varying difficulty, which has been used by
many research groups for comparing results. In 2005, Cook and others computed an
optimal tour through a 33,810-city instance given by a microchip layout problem,
currently the largest solved TSPLIB instance. For many other instances with millions of
cities, solutions can be found that are guaranteed to be within 1% of an optimal tour.
TSP can be modeled as an undirected weighted graph, such that cities are the
graph'svertices, paths are the graph's edges, and a path's distance is the edge's length. A
TSP tour becomes a Hamiltonian cycle if and only if every edge has the same distance.
Often, the model is a complete graph (i.e., an edge connects each pair of vertices). If no
path exists between two cities, adding an arbitrarily long edge will complete the graph
without affecting the optimal tour.
In the symmetric TSP, the distance between two cities is the same in each opposite
direction, forming an undirected graph. This symmetry halves the number of possible
solutions. In the asymmetric TSP, paths may not exist in both directions or the distances
might be different, forming a directed graph. Traffic collisions, one-way streets, and
airfares for cities with different departure and arrival fees are examples of how this
symmetry could break down.
The generalized travelling salesman problem deals with "states" that have (one or
more) "cities" and the salesman has to visit exactly one "city" from each "state". Also
known as the "travelling politician problem". One application is encountered in
ordering a solution to the cutting stock problem in order to minimise knife changes.
Another is concerned with drilling in semiconductor manufacturing, see e.g. U.S.
Patent 7,054,798. Surprisingly, Behzad and Modarres[5] demonstrated that the
generalised travelling salesman problem can be transformed into a standard travelling
salesman problem with the same number of cities, but a modified distance matrix.
The sequential ordering problem deals with the problem of visiting a set of cities
where precedence relations between the cities exist.
The travelling purchaser problem deals with a purchaser who is charged with
purchasing a set of products. He can purchase these products in several cities, but at
different prices, and not all cities offer the same products.
The traditional lines of attack for the NP-hard problems are the following:
Devising algorithms for finding exact solutions (they will work reasonably fast
only for small problem sizes).
Devising "suboptimal" or heuristic algorithms, i.e., algorithms that deliver either
seemingly or probably good solutions, but which could not be proved to be optimal.
Finding special cases for the problem ("subproblems") for which either better or
exact heuristics are possible.
Computational complexity
The problem has been shown to be NP-hard (more precisely, it is complete for
the complexity class FPNP; see function problem), and the decision problem version
("given the costs and a number x, decide whether there is a round-trip route cheaper
than x") is NP-complete. The bottleneck travelling salesman problem is also NP-hard.
The problem remains NP-hard even for the case when the cities are in the plane
with Euclidean distances, as well as in a number of other restrictive cases. Removing the
condition of visiting each city "only once" does not remove the NP-hardness, since it is
easily seen that in the planar case there is an optimal tour that visits each city only once
(otherwise, by the triangle inequality, a shortcut that skips a repeated visit would not
increase the tour length).
Complexity of approximation
In the general case, finding a shortest travelling salesman tour is NPO-complete. If the
distance measure is a metric and symmetric, the problem becomes APX-
complete, and Christofides's algorithm approximates it within 1.5. If the distances are
restricted to 1 and 2 (but still are a metric) the approximation ratio becomes 7/6. In the
asymmetric, metric case, only logarithmic performance guarantees are known, the best
current algorithm achieves performance ratio 0.814 log n; it is an open question if a
constant factor approximation exists.
Exact algorithms
The most direct solution would be to try all permutations (ordered combinations) and see
which one is cheapest (using brute force search). The running time for this approach lies
within a polynomial factor of O(n!), the factorial of the number of cities, so this solution
becomes impractical even for only 20 cities. One of the earliest applications of dynamic
programming is the Held–Karp algorithm that solves the problem in time O(n^2 · 2^n).
Improving these time bounds seems to be difficult. For example, it is an open problem if
there exists an exact algorithm for TSP that runs in time O(1.9999^n).
An exact solution for 15,112 German towns from TSPLIB was found in 2001 using
the cutting-plane method proposed by George Dantzig, Ray Fulkerson, and Selmer
Johnson in 1954, based on linear programming. The computations were performed on a
network of 110 processors located at Rice University and Princeton University.
Various heuristics and approximation algorithms, which quickly yield good solutions,
have been devised. Modern methods can find solutions for extremely large problems
(millions of cities) within a reasonable time which are with a high probability just 2–3%
away from the optimal solution.
Constructive heuristics
The nearest neighbour (NN) algorithm (or so-called greedy algorithm) lets the salesman
choose the nearest unvisited city as his next move. This algorithm quickly yields an
effectively short route. For N cities randomly distributed on a plane, the algorithm on
average yields a path 25% longer than the shortest possible path.[17] However, there exist
many specially arranged city distributions which make the NN algorithm give the worst
route (Gutin, Yeo, and Zverovich, 2002). This is true for both asymmetric and symmetric
TSPs (Gutin and Yeo, 2007). Rosenkrantz et al. [1977] showed that the NN algorithm has
the approximation factor Θ(log | V | ) for instances satisfying the triangle inequality.
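A Python sketch of the nearest neighbour heuristic on a distance matrix (starting at city 0 is an arbitrary choice of this sketch):

    def nearest_neighbour_tour(dist):
        # dist[i][j] is the distance between city i and city j.
        n = len(dist)
        unvisited = set(range(1, n))
        tour, current = [0], 0
        while unvisited:
            nxt = min(unvisited, key=lambda city: dist[current][city])
            tour.append(nxt)
            unvisited.remove(nxt)
            current = nxt
        tour.append(0)                      # return to the start
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
        return tour, length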
The bitonic tour of a set of points is the minimum-perimeter monotone polygon that has
the points as its vertices; it can be computed efficiently by dynamic programming.
Iterative improvement
ACS sends out a large number of virtual ant agents to explore many
possible routes on the map. Each ant probabilistically chooses the next city
to visit based on a heuristic combining the distance to the city and the
amount of virtual pheromone deposited on the edge to the city. The ants
explore, depositing pheromone on each edge that they cross, until they
have all completed a tour. At this point the ant which completed the
shortest tour deposits virtual pheromone along its complete tour route
(global trail updating). The amount of pheromone deposited is inversely
proportional to the tour length: the shorter the tour, the more it deposits.
Special cases
Metric TSP
In the metric TSP the intercity distances satisfy the triangle inequality. The
edge lengths then form a metric on the set of vertices. When the
cities are viewed as points in the plane, many natural distance
functions are metrics, and so many natural instances of TSP satisfy
this constraint.
The following are some examples of metric TSPs for various metrics.
In the Euclidean TSP (see below) the distance between two cities
is the Euclidean distance between the corresponding points.
In the rectilinear TSP the distance between two cities is the sum of
the differences of their x- and y-coordinates. This metric is often
called the Manhattan distance or city-block metric.
In the maximum metric, the distance between two points is the
maximum of the absolute values of differences of their x- and y-
coordinates.
Flow shop scheduling problems
Flow Shop Scheduling Problems, or FSPs, are a class of scheduling problems with a
work shop or group shop in which the flow control shall enable an appropriate
sequencing for each job and for processing on a set of machines or with other resources
1,2,...,m in compliance with given processing orders. Especially the maintaining of a
continuous flow of processing tasks is desired with a minimum of idle time and a
minimum of waiting time. FSPs apply to production facilities as well as to computing
designs.
Good News: Approximable within m/2 if m is even and within m/2 + 1/6 if m is odd
[104].
Bad News: Not approximable within 5/4 for any [466].
Comment: Approximable within 5/3 if m = 3 [104]. The variation in which m = 2, but the
two processors are replaced by identical parallel processors, is approximable
within [103].
Garey and Johnson: SS15 (see MINIMUM OPEN-SHOP SCHEDULING).
MEASURE: The completion time of the schedule.