Data Structures and Algorithm Lecture
CONTENT
What is Data Structure?
Data Structures and Algorithms: Linear Search, Binary Search, Hash Table
Time Complexity − The running time (execution time) of data structure operations must be as small as possible.
Processor Speed − Processor speed, although very high, becomes a limitation if the data grows to billions of records.
To solve the above-mentioned problems, data structures come to the rescue. Data can be organized in a data structure in such a way that not all items need to be searched, and the required data can be found almost instantly.
Worst Case − This is the scenario where a particular data structure operation takes the maximum time it can take. If an operation's worst-case time is ƒ(n), then the operation will not take more than ƒ(n) time, where ƒ(n) is a function of n.
Average Case − This is the scenario depicting the average execution time of an operation of a data structure. If an operation takes ƒ(n) time on average, then m operations will take mƒ(n) time.
Best Case − This is the scenario depicting the least possible execution time of an operation of a data structure. If an operation's best-case time is ƒ(n), then the actual operation will take at least ƒ(n) time; ƒ(n) is a lower bound.
From the data structure point of view, the following are some important categories of algorithms − Search, Sort, Insert, Update, and Delete.
Characteristics of an Algorithm
Not all procedures can be called algorithms. An algorithm should have the following characteristics −
Unambiguous − An algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs, should be clear and must lead to only one meaning.
Output − An algorithm should have one or more well-defined outputs, and they should match the desired output.
As we know, all programming languages share basic code constructs like loops (do, for, while) and flow control (if-else). These common constructs can be used to write an algorithm.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the
algorithm can be written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
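As an illustration, the same algorithm can be written as a minimal C sketch (the sample values 10 and 20 for a and b are assumed here, since the pseudocode leaves them open):

#include <stdio.h>

int main(void) {
    int a = 10, b = 20, c;    /* Step 2: declare three integers; sample values assumed */
    c = a + b;                /* Steps 3 and 4: define values of a and b, then add them */
    printf("%d\n", c);        /* Steps 5 and 6: store the result in c and print it */
    return 0;                 /* Step 7: STOP */
}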
In the design and analysis of algorithms, the second method is usually used to describe an algorithm. It makes it easy for the analyst to analyze the algorithm while ignoring all unwanted definitions: one can observe which operations are being used and how the process flows.
Hence, many solution algorithms can be derived for a given problem. The next step is to analyze those proposed solutions and implement the most suitable one.
Algorithm Analysis
The efficiency of an algorithm can be analyzed at two different stages, before implementation and after implementation −
A Priori Analysis − a theoretical analysis of the algorithm, performed before implementation.
A Posterior Analysis − an empirical analysis, performed after implementing the algorithm on a target machine.
We shall focus on a priori analysis. Algorithm analysis deals with the execution or running time of the various operations involved. The running time of an operation can be defined as the number of computer instructions executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors that decide its efficiency.
Time Factor − Time is measured by counting the number of key operations, such as comparisons in a sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required
by the algorithm in its life cycle. The space required by an algorithm is equal to the
sum of the following two components −
A fixed part − the space required to store certain data and variables that are independent of the size of the problem, for example, simple variables, constants, and program code.
A variable part − the space required by variables whose size depends on the size of the problem, for example, dynamic memory allocation and recursion stack space.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and SP(I) is the variable part of the algorithm, which depends on instance characteristic I. Following is a simple example that illustrates the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables (A, B, and C) and one constant (10); hence S(P) = 1 + 3.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the
algorithm to run to completion. Time requirements can be defined as a numerical
function T(n), where T(n) can be measured as the number of steps, provided each
step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two
bits. Here, we observe that T(n) grows linearly as the input size increases.
Asymptotic Notations
Following are the commonly used asymptotic notations for calculating the running time complexity of an algorithm −
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's
running time. It measures the worst case time complexity or the longest amount of
time an algorithm can possibly take to complete.
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It measures the best-case time complexity or the minimum amount of time an algorithm can possibly take to complete.
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an algorithm's running time. It is represented as follows −
θ(ƒ(n)) = { g(n) : g(n) = Ο(ƒ(n)) and g(n) = Ω(ƒ(n)) for all n > n0 }
Following is a list of some common asymptotic notations −
Constant − Ο(1)
Logarithmic − Ο(log n)
Linear − Ο(n)
Quadratic − Ο(n²)
Cubic − Ο(n³)
Polynomial − n^Ο(1)
Exponential − 2^Ο(n)
Greedy algorithms try to find a localized optimum solution, which may eventually
lead to globally optimized solutions. However, generally greedy algorithms do not
provide globally optimized solutions.
For example, in the standard coin-counting problem with denominations of 1, 2, 5 and 10, the greedy approach counts 18 with only 4 coins (10 + 5 + 2 + 1), and it seems to be working fine. But if we slightly change the problem, the same approach may not be able to produce the same optimum result.
For a currency system with coins of values 1, 7 and 10, counting coins for the value 18 will be absolutely optimum (10 + 7 + 1), but for a count like 15, the greedy approach may use more coins than necessary: it will use 10 + 1 + 1 + 1 + 1 + 1, a total of 6 coins, whereas the same value could be made with only 3 coins (7 + 7 + 1).
Hence, we may conclude that the greedy approach picks an immediate optimized
solution and may fail where global optimization is a major concern.
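The behaviour described above can be sketched in C as follows; the denominations {10, 7, 1} and the target value 15 are taken from the example:

#include <stdio.h>

/* Greedy coin counting: always pick the largest coin that still fits.
   For the currency {10, 7, 1} and value 15 this yields 10+1+1+1+1+1
   (6 coins), although 7+7+1 (3 coins) would be optimal. */
int main(void) {
    int coins[] = {10, 7, 1};        /* denominations, largest first */
    int n = sizeof coins / sizeof coins[0];
    int value = 15, used = 0;

    for (int i = 0; i < n; i++) {
        while (value >= coins[i]) {  /* greedily take this coin */
            printf("%d ", coins[i]);
            value -= coins[i];
            used++;
        }
    }
    printf("\n%d coins used\n", used);
    return 0;
}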
Examples
Many networking and graph algorithms use the greedy approach, for example −
Prim's Minimal Spanning Tree
Kruskal's Minimal Spanning Tree
Knapsack Problem
There are many similar problems that use the greedy approach to find an optimum solution.
Divide/Break
This step involves breaking the problem into smaller sub-problems. Sub-problems
should represent a part of the original problem. This step generally takes a recursive
approach to divide the problem until no sub-problem is further divisible. At this
stage, sub-problems become atomic in nature but still represent some part of the
actual problem.
Conquer/Solve
This step receives a lot of smaller sub-problems to be solved. Generally, at this level,
the problems are considered 'solved' on their own.
Merge/Combine
When the smaller sub-problems are solved, this stage recursively combines them until they form the solution of the original problem. This algorithmic approach works recursively, and the conquer and merge steps work so closely together that they appear as one.
Examples
The following computer algorithms are based on the divide-and-conquer programming approach −
Merge Sort
Quick Sort
Binary Search
There are various ways to solve any computer problem, but the ones mentioned are good examples of the divide-and-conquer approach.
Dynamic programming is used for problems that can be divided into similar sub-problems whose results can be re-used. These algorithms are mostly used for optimization. Before solving the sub-problem at hand, a dynamic programming algorithm examines the results of previously solved sub-problems. The solutions of the sub-problems are combined to achieve the best overall solution.
Comparison
In contrast to greedy algorithms, which address local optimization, dynamic programming aims at an overall optimization of the problem.
Example
The following computer problems can be solved using the dynamic programming approach −
Knapsack problem
Tower of Hanoi
Project scheduling
Dynamic programming can be used in both a top-down and a bottom-up manner. And of course, referring to a previously computed solution is usually cheaper than recomputing it, in terms of CPU cycles.
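As an illustrative sketch in C, the following program computes Fibonacci numbers with memoization, re-using previously solved sub-problems exactly as described (the array bound 91 is an assumption so that results fit in a long long):

#include <stdio.h>

/* Dynamic programming sketch: Fibonacci with memoization.
   Each sub-problem fib(i) is solved once and its result re-used. */
long long memo[91];   /* 0 means "not yet computed"; fib(90) still fits */

long long fib(int n) {
    if (n <= 1) return n;
    if (memo[n] != 0) return memo[n];    /* re-use the previous solution */
    memo[n] = fib(n - 1) + fib(n - 2);   /* solve once and store */
    return memo[n];
}

int main(void) {
    printf("%lld\n", fib(50));   /* instant, versus exponential time without memoization */
    return 0;
}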
Data Definition
A data definition specifies a particular data item with well-defined characteristics.
Data Object
A Data Object represents an object that contains data.
Data Type
A data type is a way to classify various types of data, such as integer and string. It determines the values that can be used with the corresponding type of data and the operations that can be performed on it. There are two kinds of data types −
Built-in data types, such as integers and characters.
Derived data types, such as −
List
Array
Stack
Queue
Basic Operations
The data in a data structure is processed by certain operations. The particular data structure chosen largely depends on the frequency of the operations that need to be performed on it.
Traversing
Searching
Insertion
Deletion
Sorting
Merging
A Linked List is a sequence of links which contain items. Each link contains a connection to another link. The linked list is the second most-used data structure after the array. The following terms are important for understanding the concept of a linked list.
Link − Each link of a linked list can store a data called an element.
Next − Each link of a linked list contains a link to the next link called Next.
LinkedList − A Linked List contains the connection link to the first link called First.
A doubly linked list extends this with a backward link −
Link − Each link of a linked list can store a data item called an element.
Next − Each link of a linked list contains a link to the next link called Next.
Prev − Each link of a linked list contains a link to the previous link called Prev.
LinkedList − A doubly linked list contains the connection link to the first link called First and to the last link called Last.
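A minimal singly linked list sketch in C (the struct name and sample values are illustrative):

#include <stdio.h>
#include <stdlib.h>

/* Each link stores an element and a pointer to the next link;
   the list keeps a pointer to the first link. */
struct link {
    int data;
    struct link *next;
};

int main(void) {
    struct link *first = NULL;

    /* insert 3, 2, 1 at the front, producing the list 1 -> 2 -> 3 */
    for (int i = 3; i >= 1; i--) {
        struct link *node = malloc(sizeof *node);
        node->data = i;
        node->next = first;   /* the new node points at the old first link */
        first = node;
    }

    for (struct link *p = first; p != NULL; p = p->next)
        printf("%d ", p->data);   /* prints: 1 2 3 */
    printf("\n");

    while (first) {               /* free the list */
        struct link *tmp = first;
        first = first->next;
        free(tmp);
    }
    return 0;
}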
A stack allows data operations at one end only, which makes it a LIFO data structure. LIFO stands for Last-In-First-Out: the element which is placed (inserted or added) last is accessed first. In stack terminology, the insertion operation is called PUSH and the removal operation is called POP.
Stack Representation
A stack can be implemented by means of an array, structure, pointer, or linked list. A stack can either be of fixed size or have a sense of dynamic resizing. Here, we are going to implement a stack using an array, which makes it a fixed-size stack implementation.
Basic Operations
Stack operations may involve initializing the stack, using it, and then de-initializing it. Apart from these basics, a stack is used for the following two primary operations −
push() − pushing (storing) an element on the stack.
pop() − removing (accessing) an element from the stack.
To use a stack efficiently, we also need a supporting function −
peek() − get the top data element of the stack, without removing it.
At all times, we maintain a pointer to the last PUSHed data on the stack. As this pointer always represents the top of the stack, it is named top. The top pointer provides the top value of the stack without actually removing it.
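A fixed-size array stack sketch in C along these lines (the capacity 8 and sample values are assumptions):

#include <stdio.h>

/* top indexes the last PUSHed element; -1 means the stack is empty. */
#define MAX 8
int stack[MAX];
int top = -1;

int push(int x) {
    if (top >= MAX - 1) return 0;   /* stack full */
    stack[++top] = x;
    return 1;
}

int pop(void)  { return stack[top--]; }  /* caller must check for emptiness first */
int peek(void) { return stack[top]; }    /* top value, without removing it */

int main(void) {
    push(3); push(5); push(9);
    printf("peek: %d\n", peek());   /* 9 */
    printf("pop:  %d\n", pop());    /* 9 */
    printf("pop:  %d\n", pop());    /* 5 */
    return 0;
}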
An arithmetic expression can be written in three different but equivalent notations, named for how they position the operator relative to the operands. We shall learn them here.
Infix Notation
We write an expression in infix notation, e.g. a - b + c, where operators are used in between operands. It is easy for humans to read, write, and speak in infix notation, but the same does not go well with computing devices. An algorithm to process infix notation can be difficult and costly in terms of time and space consumption.
Prefix Notation
In this notation, the operator is prefixed to the operands, i.e. the operator is written ahead of the operands. For example, +ab. This is equivalent to its infix notation a + b. Prefix notation is also known as Polish Notation.
Postfix Notation
This notation style is known as Reverse Polish Notation. Here the operator is postfixed to the operands, i.e., the operator is written after the operands. For example, ab+. This is equivalent to its infix notation a + b.
The following table briefly tries to show the difference in all three notations −
Sr.No.  Infix Notation     Prefix Notation   Postfix Notation
1       (a + b) ∗ c        ∗+abc             ab+c∗
2       a ∗ (b + c)        ∗a+bc             abc+∗
3       (a + b) ∗ (c + d)  ∗+ab+cd           ab+cd+∗
Parsing Expressions
As we have discussed, it is not a very efficient way to design an algorithm or
program to parse infix notations. Instead, these infix notations are first converted
into either postfix or prefix notations and then computed.
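As a sketch of why postfix is easy to compute, the following C program evaluates a postfix expression with a stack; single-digit operands are assumed for brevity:

#include <stdio.h>
#include <ctype.h>

/* Postfix evaluation: push operands; for each operator, pop two
   operands, apply it, and push the result back. */
int main(void) {
    const char *expr = "23+4*";   /* infix (2 + 3) * 4 */
    int stack[64], top = -1;

    for (const char *p = expr; *p; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';        /* operand: push its value */
        } else {
            int b = stack[top--];           /* operator: pop two operands */
            int a = stack[top--];
            switch (*p) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    printf("%s = %d\n", expr, stack[top]);  /* 23+4* = 20 */
    return 0;
}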
Precedence
When an operand is between two different operators, which operator takes the operand first is decided by the precedence of one operator over the others. For example, multiplication has precedence over addition, so in a + b ∗ c, the operand b goes with ∗ first.
Associativity
Associativity describes the rule where operators with the same precedence appear in
an expression. For example, in expression a + b − c, both + and – have the same
precedence, then which part of the expression will be evaluated first, is determined
by associativity of those operators. Here, both + and − are left associative, so the
expression will be evaluated as (a + b) − c.
These are the default behaviors of the operators. At any point in expression evaluation, the order can be altered by using parentheses. For example, in a + b∗c, the expression part b∗c will be evaluated first, as multiplication has precedence over addition. If we want a + b to be evaluated first, we use parentheses: (a + b)∗c.
A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits first. More real-world examples are queues at ticket windows and bus stops.
Queue Representation
In a queue, we access both ends for different reasons: data is inserted at one end (the rear) and removed from the other (the front).
Basic Operations
Queue operations may involve initializing or defining the queue, utilizing it, and then
completely erasing it from the memory. Here we shall try to understand the basic
operations associated with queues −
A few more functions are required to make the above-mentioned queue operations efficient. These are −
peek() − Gets the element at the front of the queue without removing it.
In a queue, we always dequeue (access) the data pointed to by the front pointer, and while enqueuing (storing) data in the queue we take the help of the rear pointer.
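A circular array queue sketch in C using front and rear pointers as described (the capacity 8 and sample values are assumptions):

#include <stdio.h>

#define MAX 8
int queue[MAX];
int front = 0, rear = -1, count = 0;

int enqueue(int x) {
    if (count == MAX) return 0;          /* queue full */
    rear = (rear + 1) % MAX;             /* rear pointer is used while storing */
    queue[rear] = x;
    count++;
    return 1;
}

int dequeue(void) {                      /* caller must check that count > 0 */
    int x = queue[front];                /* front pointer is used while accessing */
    front = (front + 1) % MAX;
    count--;
    return x;
}

int peek(void) { return queue[front]; }

int main(void) {
    enqueue(3); enqueue(5); enqueue(9);
    printf("peek: %d\n", peek());        /* 3 */
    printf("dequeue: %d\n", dequeue());  /* 3 */
    printf("dequeue: %d\n", dequeue());  /* 5 */
    return 0;
}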
Linear search sequentially compares the search item x with each element of an array A of n items. Its algorithm is as follows −
Step 1: Set i to 1
Step 2: If i > n, then go to Step 7
Step 3: If A[i] = x, then go to Step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print "Element x found at index i" and go to Step 8
Step 7: Print "Element not found"
Step 8: Exit
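A C sketch of these steps (note that C arrays are 0-based while the pseudocode counts from 1; the sample array is assumed):

#include <stdio.h>

/* Scan A left to right and stop at the first element equal to x.
   Returns the index, or -1 if x is not found. */
int linear_search(const int A[], int n, int x) {
    for (int i = 0; i < n; i++)      /* Steps 1-5: advance i until A[i] = x */
        if (A[i] == x)
            return i;                /* Step 6: element found */
    return -1;                       /* Step 7: element not found */
}

int main(void) {
    int A[] = {10, 14, 19, 26, 27, 31, 33, 35};
    int i = linear_search(A, 8, 27);
    if (i >= 0) printf("Element found at index %d\n", i);
    else        printf("Element not found\n");
    return 0;
}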
Binary search looks for a particular item by comparing it with the middle-most item of the (sorted) collection. If a match occurs, the index of the item is returned. If the middle item is greater than the search item, the item is searched for in the sub-array to the left of the middle item; otherwise, it is searched for in the sub-array to the right. This process continues on the sub-array until the size of the sub-array reduces to zero.
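A minimal C sketch of this process (the sorted sample array is assumed):

#include <stdio.h>

/* Repeatedly compare x with the middle item and halve the sorted
   search range until it is empty. Returns the index, or -1. */
int binary_search(const int A[], int n, int x) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* middle-most item */
        if (A[mid] == x)  return mid;   /* match: return its index */
        if (A[mid] > x)   hi = mid - 1; /* search the left sub-array */
        else              lo = mid + 1; /* search the right sub-array */
    }
    return -1;                          /* sub-array size reduced to zero */
}

int main(void) {
    int A[] = {10, 14, 19, 26, 27, 31, 33, 35};
    printf("index of 31: %d\n", binary_search(A, 8, 31));   /* 5 */
    return 0;
}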
A hash table is a data structure in which insertion and search operations are very fast irrespective of the size of the data. It uses an array as a storage medium and uses a hashing technique to generate an index at which an element is to be inserted or from which it is to be located.
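A small C sketch of the idea, using a modulo hash code; linear probing is one common way to handle collisions and is an assumption here, since the text does not fix a collision strategy (the sketch also assumes the table never fills up and uses 0 to mark empty cells):

#include <stdio.h>

#define SIZE 10
int table[SIZE];                 /* 0 marks an empty cell, for brevity */

int hash(int key) { return key % SIZE; }

void insert(int key) {
    int i = hash(key);
    while (table[i] != 0)        /* collision: probe the next cell */
        i = (i + 1) % SIZE;
    table[i] = key;
}

int main(void) {
    insert(42); insert(27); insert(12);   /* 42 and 12 collide at index 2 */
    for (int i = 0; i < SIZE; i++)
        printf("%d: %d\n", i, table[i]);
    return 0;
}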
The importance of sorting lies in the fact that data searching can be optimized to a very high level if the data is stored in a sorted manner. Sorting is also used to represent data in more readable formats. The following are some examples of sorting in real-life scenarios −
Telephone Directory − The telephone directory stores the telephone numbers of people
sorted by their names, so that the names can be searched easily.
Dictionary − The dictionary stores words in an alphabetical order so that searching of any
word becomes easy.
In insertion sort, the array is searched sequentially and unsorted items are moved and inserted into the sorted sub-list (in the same array). This algorithm is not suitable for large data sets, as its average and worst-case complexity are Ο(n²), where n is the number of items.
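An insertion sort sketch in C (the sample array is assumed):

#include <stdio.h>

/* Grow a sorted sub-list at the front of the array, shifting larger
   sorted items right to insert each new item in place. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {  /* shift larger items right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;                 /* insert into the sorted sub-list */
    }
}

int main(void) {
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44};
    insertion_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}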
In selection sort, the smallest element is selected from the unsorted array and swapped with the leftmost element, and that element becomes part of the sorted array. This process continues, moving the unsorted array boundary one element to the right. This algorithm is not suitable for large data sets, as its average and worst-case complexities are Ο(n²), where n is the number of items.
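A selection sort sketch in C (the sample array is assumed):

#include <stdio.h>

/* Repeatedly select the smallest element of the unsorted part and
   swap it with the leftmost unsorted element. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;               /* find the smallest */
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;  /* swap to the boundary */
    }
}

int main(void) {
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44};
    selection_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}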
Merge sort first divides the array into equal halves and then combines them in a
sorted manner.
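A merge sort sketch in C, following the divide-conquer-merge steps described earlier (the sample array is assumed; the temporary buffer uses a C99 variable-length array for brevity):

#include <stdio.h>
#include <string.h>

/* Divide the array into halves, sort each half recursively,
   then merge the two sorted halves. */
void merge_sort(int a[], int n) {
    if (n < 2) return;                   /* atomic sub-problem */
    int mid = n / 2, tmp[n];
    merge_sort(a, mid);                  /* conquer the left half */
    merge_sort(a + mid, n - mid);        /* conquer the right half */

    int i = 0, j = mid, k = 0;
    while (i < mid && j < n)             /* merge the sorted halves */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < n)   tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof a[0]);
}

int main(void) {
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44};
    merge_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}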
Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm. It avoids large shifts, as happens in insertion sort when a small value is far to the right and has to be moved far to the left.
This algorithm first uses insertion sort on widely spaced elements, then sorts the less widely spaced elements. This spacing is termed the interval, and it can be calculated using Knuth's formula −
Knuth's Formula
h = h ∗ 3 + 1
where h is the interval, with initial value 1 (giving the sequence 1, 4, 13, 40, ...).
This algorithm is quite efficient for medium-sized data sets. Its average and worst-case complexity depend on the gap sequence; with Knuth's sequence the worst case is about Ο(n^(3/2)), where n is the number of items.
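A shell sort sketch in C using Knuth's interval sequence (the sample array is assumed):

#include <stdio.h>

/* Gapped insertion sorts with shrinking intervals 1, 4, 13, 40, ... */
void shell_sort(int a[], int n) {
    int h = 1;
    while (h < n / 3) h = h * 3 + 1;   /* largest Knuth interval below n */
    for (; h >= 1; h /= 3) {           /* shrink the interval: ..., 13, 4, 1 */
        for (int i = h; i < n; i++) {  /* insertion sort on h-spaced elements */
            int key = a[i], j = i;
            while (j >= h && a[j - h] > key) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = key;
        }
    }
}

int main(void) {
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44, 26, 31};
    shell_sort(a, 10);
    for (int i = 0; i < 10; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}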
Quick sort partitions an array and then calls itself recursively twice to sort the two resulting sub-arrays. This algorithm is quite efficient for large-sized data sets, as its average complexity is Ο(n log n), though its worst case is Ο(n²), where n is the number of items.
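A quick sort sketch in C; the Lomuto partition scheme with the last element as pivot is one common choice and is an assumption here:

#include <stdio.h>

void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Partition around a pivot, then recursively sort the two sub-arrays. */
void quick_sort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], p = lo;          /* last element as pivot */
    for (int i = lo; i < hi; i++)
        if (a[i] < pivot) swap(&a[i], &a[p++]);  /* smaller items go left */
    swap(&a[p], &a[hi]);                /* pivot moves to its final place */
    quick_sort(a, lo, p - 1);           /* sort the left sub-array */
    quick_sort(a, p + 1, hi);           /* sort the right sub-array */
}

int main(void) {
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44};
    quick_sort(a, 0, 7);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}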
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges connecting pairs of vertices. Take a look at the following graph −
V = {a, b, c, d, e}
Vertex − Each node of the graph is represented as a vertex. In the following example, the
labeled circle represents vertices. Thus, A to G are vertices. We can represent them using
an array as shown in the following image. Here A can be identified by index 0. B can be
identified using index 1 and so on.
Edge − An edge represents a path between two vertices, drawn as a line between them. In the following example, the lines from A to B, B to C, and so on represent edges. We can use a two-dimensional array (an adjacency matrix) to represent the edges, as shown in the following image. Here AB can be represented as 1 at row 0, column 1; BC as 1 at row 1, column 2; and so on, keeping the other combinations as 0.
Adjacency − Two nodes or vertices are adjacent if they are connected to each other through an edge. In the following example, B is adjacent to A, C is adjacent to B, and so on.
Path − Path represents a sequence of edges between the two vertices. In the following
example, ABCD represents a path from A to D.
Add Edge − Adds an edge between the two vertices of the graph.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a
stack.
Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all
the vertices from the stack, which do not have adjacent vertices.)
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in a
queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
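A C sketch of both traversals on a small assumed graph (vertices A-D stored in an adjacency matrix; DFS uses recursion, which is an implicit stack, while BFS uses an explicit queue, following the rules above):

#include <stdio.h>

#define N 4
int adj[N][N] = {            /* assumed edges: A-B, A-C, B-D, C-D */
    {0, 1, 1, 0},
    {1, 0, 0, 1},
    {1, 0, 0, 1},
    {0, 1, 1, 0}
};

void dfs(int v, int visited[]) {
    visited[v] = 1;
    printf("%c ", 'A' + v);                 /* visit, mark, display */
    for (int w = 0; w < N; w++)
        if (adj[v][w] && !visited[w]) dfs(w, visited);
}

void bfs(int start) {
    int visited[N] = {0}, queue[N], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        int v = queue[front++];             /* remove the first vertex */
        printf("%c ", 'A' + v);
        for (int w = 0; w < N; w++)
            if (adj[v][w] && !visited[w]) { /* insert unvisited neighbours */
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
}

int main(void) {
    int visited[N] = {0};
    printf("DFS: "); dfs(0, visited); printf("\n");   /* A B D C */
    printf("BFS: "); bfs(0);          printf("\n");   /* A B C D */
    return 0;
}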
A Binary Tree is a special data structure used for data storage purposes. A binary tree has the special condition that each node can have a maximum of two children. A binary tree has the benefits of both an ordered array and a linked list: search is as quick as in a sorted array, and insertion or deletion operations are as fast as in a linked list.
Path − Path refers to the sequence of nodes along the edges of a tree.
Root − The node at the top of the tree is called root. There is only one root per tree and
one path from the root node to any node.
Parent − Any node except the root node has one edge upward to a node called parent.
Child − The node below a given node connected by its edge downward is called its child
node.
Leaf − The node which does not have any child node is called the leaf node.
Visiting − Visiting refers to checking the value of a node when control is on the node.
Levels − Level of a node represents the generation of a node. If the root node is at level
0, then its next child node is at level 1, its grandchild is at level 2, and so on.
In-order Traversal
Pre-order Traversal
Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to
print all the values it contains.
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the
right sub-tree. We should always remember that every node may represent a
subtree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an
ascending order.
D→B→E→A→F→C→G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and
finally the right subtree.
A→B→D→E→C→F→G
Algorithm
Until all nodes are traversed −
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we
traverse the left subtree, then the right subtree and finally the root node.
We start from A, and following Post-order traversal, we first visit the left
subtree B. B is also traversed post-order. The process goes on until all the nodes are
visited. The output of post-order traversal of this tree will be −
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.
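A C sketch of all three traversals on the tree used in the outputs above (root A, children B and C, leaves D, E, F, G):

#include <stdio.h>
#include <stdlib.h>

struct node { char key; struct node *left, *right; };

struct node *make(char key, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    n->key = key; n->left = l; n->right = r;
    return n;
}

/* Each traversal recurses on the subtrees, visiting the root
   in the middle, first, or last respectively. */
void inorder(struct node *n)   { if (n) { inorder(n->left);  printf("%c ", n->key); inorder(n->right); } }
void preorder(struct node *n)  { if (n) { printf("%c ", n->key); preorder(n->left); preorder(n->right); } }
void postorder(struct node *n) { if (n) { postorder(n->left); postorder(n->right); printf("%c ", n->key); } }

int main(void) {
    struct node *root =
        make('A', make('B', make('D', 0, 0), make('E', 0, 0)),
                  make('C', make('F', 0, 0), make('G', 0, 0)));
    printf("in-order:   "); inorder(root);   printf("\n");  /* D B E A F C G */
    printf("pre-order:  "); preorder(root);  printf("\n");  /* A B D E C F G */
    printf("post-order: "); postorder(root); printf("\n");  /* D E B F G C A */
    return 0;
}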
The left sub-tree of a node has a key less than or equal to its parent node's key, and the right sub-tree of a node has a key greater than its parent node's key. Thus, a BST divides all its sub-trees into two segments, the left sub-tree and the right sub-tree, and can be defined as −
left_subtree (keys) ≤ node (key) ≤ right_subtree (keys)
Representation
BST is a collection of nodes arranged in a way where they maintain BST properties.
Each node has a key and an associated value. While searching, the desired key is
compared to the keys in BST and if found, the associated value is retrieved.
We observe that the root node key (27) has all less-valued keys on the left sub-tree
and the higher valued keys on the right sub-tree.
Basic Operations
The basic operations of a binary search tree include search, insert, and the traversals discussed above.
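A BST search sketch in C, using the root key 27 mentioned above (the remaining keys and the array-based node layout, where -1 marks a missing child, are assumptions for brevity):

#include <stdio.h>

/* Compare the desired key with the current node's key and descend
   left or right, using left (keys) <= node (key) <= right (keys). */
int key[]   = {27, 14, 35, 10, 19, 31, 42};
int left[]  = { 1,  3,  5, -1, -1, -1, -1};
int right[] = { 2,  4,  6, -1, -1, -1, -1};

int search(int root, int k) {
    while (root != -1 && key[root] != k)
        root = (k < key[root]) ? left[root] : right[root];
    return root;                 /* node index, or -1 if not found */
}

int main(void) {
    printf("19 found: %s\n", search(0, 19) != -1 ? "yes" : "no");   /* yes */
    printf("20 found: %s\n", search(0, 20) != -1 ? "yes" : "no");   /* no  */
    return 0;
}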
We found three spanning trees from one complete graph. A complete undirected graph can have a maximum of n^(n−2) spanning trees, where n is the number of nodes. In the above example, n is 3, hence 3^(3−2) = 3 spanning trees are possible.
All possible spanning trees of graph G, have the same number of edges and vertices.
Removing one edge from the spanning tree will make the graph disconnected, i.e. the
spanning tree is minimally connected.
Adding one edge to the spanning tree will create a circuit or loop, i.e. the spanning tree
is maximally acyclic.
Thus, we can conclude that a spanning tree is a subset of a connected graph G, and that disconnected graphs do not have spanning trees.
Spanning trees are used in network design and cluster analysis. Let us understand this through a small example: consider a city network as a huge graph, and suppose we plan to deploy telephone lines in such a way that the minimum number of lines connects all the city nodes. This is where the spanning tree comes into the picture.
Two standard minimum spanning tree algorithms are −
Kruskal's Algorithm
Prim's Algorithm
A heap maintains an ordering property between a parent node α and its child β, for example key(α) ≥ key(β). As the value of the parent is greater than that of the child, this property generates a Max Heap. Based on this criterion, a heap can be of two types −
For Input → 35 33 42 10 14 19 27 44 26 31
Min-Heap − where the value of the root node is less than or equal to either of its children.
Max-Heap − where the value of the root node is greater than or equal to either of its children.
Both trees are constructed using the same input and order of arrival.
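A max-heap sketch in C, built from the input above by inserting each value and sifting it up (the array capacity 32 is an assumption):

#include <stdio.h>

/* Insert each value at the end of the array and swap it upward while
   it is larger than its parent, so key(parent) >= key(child) holds. */
int heap[32], size = 0;

void insert(int x) {
    int i = size++;
    heap[i] = x;
    while (i > 0 && heap[(i - 1) / 2] < heap[i]) {  /* sift up */
        int p = (i - 1) / 2;
        int t = heap[p]; heap[p] = heap[i]; heap[i] = t;
        i = p;
    }
}

int main(void) {
    int input[] = {35, 33, 42, 10, 14, 19, 27, 44, 26, 31};
    for (int i = 0; i < 10; i++) insert(input[i]);
    for (int i = 0; i < 10; i++) printf("%d ", heap[i]);  /* root first: 44 ... */
    printf("\n");
    return 0;
}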
The Fibonacci series generates each subsequent number by adding the two previous numbers. Depending on whether the series is started from 0 or from 1, its first eight terms F8 can be written as −
F8 = 0 1 1 2 3 5 8 13
or, this −
F8 = 1 1 2 3 5 8 13 21