Unit 1
Algorithm:
An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independent of
underlying languages, i.e. an algorithm can be implemented in more than one programming
language.
From the data structure point of view, the following are some important categories of algorithms
−
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert an item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
How to Write an Algorithm?
There are no well-defined standards for writing algorithms. Rather, it is problem and
resource dependent. Algorithms are never written to support a particular programming
language.
As we know, all programming languages share basic code constructs like loops (do,
for, while), flow control (if-else), etc. These common constructs can be used to write
an algorithm.
Algorithms are usually written in a step-by-step manner, but this is not always the case.
Algorithm writing is a process that is carried out after the problem domain is well defined;
that is, we should know the problem domain for which we are designing a solution.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be
written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In the design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm while ignoring all unwanted
definitions and to observe which operations are being used and how the process flows.
Writing step numbers is optional.
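As a minimal sketch only, the second form of the algorithm can be translated into Python as follows; the function name and the sample values are assumptions chosen for illustration.

def add(a, b):
    c = a + b      # Step 3 - c <- a + b
    print(c)       # Step 4 - display c
    return c

add(10, 20)        # prints 30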
We design an algorithm to get a solution to a given problem. A problem can be solved in more
than one way.
Hence, many solution algorithms can be derived for a given problem. The next step is to
analyze those proposed solution algorithms and implement the best suitable solution.
Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an
algorithm is measured by assuming that all other factors, for example, processor speed,
are constant and have no effect on the implementation.
A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented using a programming language and executed on the target
computer machine. In this analysis, actual statistics like running time and space
required are collected.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution
or running time of various operations involved. The running time of an operation can be
defined as the number of computer instructions executed per operation.
Algorithms as a Technology
Efficiency − Different algorithms devised to solve the same problem often differ dramatically in
their efficiency. These differences can be much more significant than differences due to
hardware and software.
Algorithms and other technologies − We should consider algorithms, like computer hardware,
as a technology. Total system performance depends on choosing efficient algorithms as much
as on choosing fast hardware. Just as rapid advances are being made in other computer
technologies, they are being made in algorithms as well. You might wonder whether algorithms
are truly that important on contemporary computers in light of other advanced technologies,
such as
advanced computer architectures and fabrication technologies,
easy-to-use, intuitive, graphical user interfaces (GUIs),
object-oriented systems,
integrated Web technologies, and
fast networking, both wired and wireless.
Insertion sort
Our first algorithm, insertion sort, solves the sorting problem. Input: A sequence of n numbers
⟨a1, a2, ..., an⟩. Output: A permutation (reordering) ⟨a'1, a'2, ..., a'n⟩ of the input sequence
such that a'1 <= a'2 <= ... <= a'n. The numbers that we wish to sort are also known as the
keys. Although conceptually we are sorting a sequence, the input comes to us in the form of an
array with n elements.
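A minimal Python sketch of insertion sort, following the array-based description above (the example input is illustrative only):

def insertion_sort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # Shift elements of the sorted prefix A[0..j-1] that are
        # greater than key one position to the right.
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]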
Algorithm Complexity:
Suppose X is an algorithm and n is the size of the input data. The time and space used by the
algorithm X are the two main factors which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required
by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required
by the algorithm in terms of n, the size of the input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
A fixed part, which is the space required to store certain data and variables that are
independent of the size of the problem. For example, simple variables and constants
used, program size, etc.
A variable part, which is the space required by variables whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and
SP(I) is the variable part of the algorithm, which depends on instance characteristic I. Following
is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C, and one constant. Hence S(P) = 1 + 3. The actual
space further depends on the data types of the given variables and constants, and is multiplied
accordingly.
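The distinction between the fixed and the variable part can be illustrated with a small Python sketch (the function names are assumptions for illustration): an iterative sum needs only a constant number of variables, while a recursive sum also uses recursion stack space that grows with the input size.

def iterative_sum(values):
    total = 0                  # fixed part: a constant number of variables
    for v in values:
        total += v
    return total

def recursive_sum(values):
    if not values:
        return 0
    # variable part: each recursive call adds a stack frame, O(n) extra space
    return values[0] + recursive_sum(values[1:])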
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to
run to completion. Time requirements can be defined as a numerical function T(n), where T(n)
can be measured as the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here,
we observe that T(n) grows linearly as the input size increases.
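As a rough illustration (the helper below is hypothetical), the steps of such a linear-time operation can be counted directly:

def count_steps(values):
    steps = 0
    total = 0
    for v in values:
        total += v     # one addition per element
        steps += 1     # count it as one step
    return total, steps

print(count_steps(range(1000)))   # the step count equals n = 1000

The step count grows in direct proportion to n, matching T(n) = c ∗ n.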
Asymptotic Notation:
The main idea of asymptotic analysis is to have a measure of the efficiency of algorithms that
doesn't depend on machine-specific constants and doesn't require algorithms to be
implemented or the time taken by programs to be compared. Asymptotic notations are
mathematical tools to represent the time complexity of algorithms for asymptotic analysis. The
following 3 asymptotic notations are mostly used to represent the time complexity of algorithms.
1) Θ Notation: The theta notation bounds a function from above and below, so it defines
exact asymptotic behavior.
A simple way to get the Theta notation of an expression is to drop low-order terms and ignore
leading constants. For example, consider the following expression.
3n^3 + 6n^2 + 6000 = Θ(n^3)
Dropping lower-order terms is always fine because there will always be an n0 beyond which
n^3 has higher values than n^2, irrespective of the constants involved.
For a given function g(n), we denote by Θ(g(n)) the following set of functions.
Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such
that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0}
The above definition means, if f(n) is theta of g(n), then the value f(n) is always between
c1*g(n) and c2*g(n) for large values of n (n >= n0). The definition of theta also requires that
f(n) must be non-negative for values of n greater than n0.
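A quick numeric sanity check in Python (the constants c1, c2, and n0 below are illustrative assumptions, not derived values) shows the two-sided bound for the earlier example:

def f(n):
    return 3 * n**3 + 6 * n**2 + 6000

c1, c2, n0 = 3, 4, 25    # assumed constants for illustration
assert all(c1 * n**3 <= f(n) <= c2 * n**3 for n in range(n0, 10000))
print("c1*g(n) <= f(n) <= c2*g(n) holds for n in [25, 10000)")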
2) Big O Notation: The Big O notation defines an upper bound of an algorithm; it bounds a
function only from above. For example, consider the case of Insertion Sort. It takes linear time
in the best case and quadratic time in the worst case. We can safely say that the time complexity
of Insertion Sort is O(n^2). Note that O(n^2) also covers linear time.
If we use Θ notation to represent time complexity of Insertion sort, we have to use two
statements for best and worst cases:
1. The worst case time complexity of Insertion Sort is Θ(n^2).
2. The best case time complexity of Insertion Sort is Θ(n).
The Big O notation is useful when we only have upper bound on time complexity of an
algorithm. Many times we easily find an upper bound by simply looking at the algorithm.
O(g(n)) = { f(n): there exist positive constants c and n0 such
            that 0 <= f(n) <= c*g(n) for all n >= n0 }
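A small Python experiment (the comparison-counting helper is an assumption for illustration) makes the best-case and worst-case bounds of Insertion Sort concrete:

def insertion_sort_comparisons(A):
    A = list(A)
    comparisons = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0:
            comparisons += 1           # one key comparison
            if A[i] <= key:
                break
            A[i + 1] = A[i]            # shift larger element right
            i -= 1
        A[i + 1] = key
    return comparisons

n = 100
print(insertion_sort_comparisons(range(n)))          # sorted input: about n comparisons
print(insertion_sort_comparisons(range(n, 0, -1)))   # reversed input: about n^2/2 comparisons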
Efficient algorithms are of paramount importance when working with advanced data structures.
Advanced data structures provide more specialized ways of organizing and storing data to
improve various aspects of algorithmic performance, such as time complexity, space
complexity, and overall efficiency. However, without efficient algorithms to operate on these
data structures, their potential benefits might not be fully realized. Here's why efficient
algorithms are crucial when working with advanced data structures:
Optimized Resource Usage: Advanced data structures are designed to optimize certain
operations, like insertion, deletion, search, etc. However, if inefficient algorithms are used, the
benefits of these data structures might be offset by poor performance. Efficient algorithms help
ensure that the advanced data structures are utilized optimally, leading to minimized resource
usage (CPU time, memory, etc.).
Time Complexity: Advanced data structures often offer improved time complexity for specific
operations compared to simpler data structures. For example, self-balancing binary search trees
like AVL trees or red-black trees have logarithmic height, which ensures efficient searching.
However, to maintain this time complexity, you need algorithms that take advantage of the
balanced properties. Inefficient algorithms might lead to worst-case behavior, degrading
performance.
Space Complexity: While advanced data structures might have more complex internal
arrangements, they often optimize space usage. For instance, a hash table with open addressing
can have better space efficiency compared to a basic array. Efficient algorithms ensure that this
space optimization is maintained and not squandered due to poor algorithmic choices.
Consistency: Advanced data structures often come with certain properties and invariants that
need to be maintained to ensure their efficiency. Efficient algorithms help in performing
operations that respect these properties, keeping the data structure in a consistent and optimized
state.
Scalability: As datasets grow larger, the impact of algorithmic inefficiencies becomes more
pronounced. Advanced data structures are often chosen to handle large-scale applications
precisely because of their efficiency improvements. Efficient algorithms are crucial for scaling
these structures to handle massive datasets.
Real-world Applications: Many real-world applications, such as databases, networking
protocols, compilers, and simulations, rely on advanced data structures to optimize their
operations. Without efficient algorithms, these applications might suffer from poor performance
and sluggish behavior.
Algorithmic Innovation: The study of advanced data structures and the development of
efficient algorithms often go hand in hand. The design and analysis of algorithms for these
structures can lead to new insights and innovations in computer science, driving progress in
various fields.
Competitive Advantage: In industries where performance is crucial, such as finance, gaming,
and machine learning, having efficient algorithms operating on advanced data structures can
provide a competitive advantage. Faster processing and response times can lead to better user
experiences and more efficient business operations.
Comparison of running times − For each function f(n) and time t, determine the largest size n
of a problem that can be solved in time t, assuming that the algorithm to solve the problem
takes f(n) microseconds.
Recurrences:
Many algorithms are recursive in nature. When we analyze them, we get a recurrence relation
for time complexity. We get running time on an input of size n as a function of n and the
running time on inputs of smaller sizes. For example in merge sort, to sort a given array, we
divide it in two halves and recursively repeat the process for the two halves. Finally we merge
the results. Time complexity of Merge Sort can be written as T(n) = 2T(n/2) + cn.
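As a sketch, a Python merge sort that gives rise to this recurrence: the two recursive calls account for the 2T(n/2) term and the linear-time merge accounts for the cn term.

def merge_sort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left = merge_sort(A[:mid])      # T(n/2)
    right = merge_sort(A[mid:])     # T(n/2)
    merged = []                     # merge step: cn work
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]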
1) Substitution Method: We make a guess for the solution and then use mathematical
induction to prove whether the guess is correct or incorrect.
For example consider the recurrence T(n) = 2T(n/2) + n
We need to prove that T(n) <= cnLogn for some constant c. We can assume that it is true
for values smaller than n.
T(n) = 2T(n/2) + n
    <= 2(c(n/2)Log(n/2)) + n
     = cnLog(n/2) + n
     = cnLogn - cnLog2 + n
     = cnLogn - cn + n
    <= cnLogn          (for c >= 1, with Log taken to base 2)
2) Recurrence Tree Method: In this method, we draw a recurrence tree and calculate the
time taken by every level of the tree. Finally, we sum the work done at all levels. To draw the
recurrence tree, we start from the given recurrence and keep drawing till we find a pattern
among levels. The pattern is typically an arithmetic or geometric series.
For example, consider the recurrence relation
T(n) = T(n/4) + T(n/2) + cn^2

                 cn^2
                /    \
          T(n/4)      T(n/2)

Expanding one level gives

                 cn^2
                /    \
        c(n^2)/16    c(n^2)/4
          /   \        /   \
     T(n/16) T(n/8)  T(n/8) T(n/4)

Breaking down further gives us the following

                 cn^2
                /    \
        c(n^2)/16    c(n^2)/4
          /   \        /   \
 c(n^2)/256 c(n^2)/64 c(n^2)/64 c(n^2)/16
    / \       / \       / \       / \

Summing the work level by level gives cn^2(1 + 5/16 + (5/16)^2 + ...), a geometric series
bounded by cn^2 / (1 - 5/16) = (16/11)cn^2, so T(n) = O(n^2).
In the recurrence tree method, we calculate the total work done. If the work done at the leaves
is polynomially more, then the leaves are the dominant part, and our result becomes the work
done at the leaves (Case 1). If the work done at the leaves and at the root is asymptotically the
same, then our result becomes the height multiplied by the work done at any level (Case 2). If
the work done at the root is asymptotically more, then our result becomes the work done at the
root (Case 3).
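A quick numeric check in Python (with the illustrative assumptions c = 1 and T(1) = 1, not a proof) supports the O(n^2) conclusion for the recurrence drawn above: the ratio T(n)/n^2 stays below the bound 16/11 suggested by the geometric series.

from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    if n <= 1:
        return 1
    return T(n // 4) + T(n // 2) + n * n   # c = 1

for n in (16, 256, 4096, 65536):
    print(n, T(n) / (n * n))               # stays below 16/11 ≈ 1.4545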
Examples of some standard algorithms whose time complexity can be evaluated using
Master Method
Merge Sort: T(n) = 2T(n/2) + Θ(n). It falls in Case 2, as c is 1 and Log_b(a) is also 1. So the
solution is Θ(n Log n).
Binary Search: T(n) = T(n/2) + Θ(1). It also falls in Case 2, as c is 0 and Log_b(a) is also 0. So
the solution is Θ(Log n).
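For reference, a Python sketch of iterative binary search; each step halves the search range, which is what produces the T(n) = T(n/2) + Θ(1) recurrence above.

def binary_search(A, key):
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == key:
            return mid        # found: return the index of key
        elif A[mid] < key:
            lo = mid + 1      # discard the left half
        else:
            hi = mid - 1      # discard the right half
    return -1                 # key not present

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3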
Notes:
1) It is not necessary that a recurrence of the form T(n) = aT(n/b) + f(n) can be solved using
Master Theorem. The given three cases have some gaps between them. For example, the
recurrence T(n) = 2T(n/2) + n/Logn cannot be solved using master method.
2) Case 2 can be extended for f(n) = Θ(n^c Log^k n).
If f(n) = Θ(n^c Log^k n) for some constant k >= 0 and c = Log_b(a), then T(n) = Θ(n^c Log^(k+1) n).
Applications of Data Structure and Algorithms
The following computer problems can be solved using Data Structures −
Important Terms
Following are the important terms with respect to trees.
Path − Path refers to the sequence of nodes along the edges of a tree.
Root − The node at the top of the tree is called root. There is only one root per tree and
one path from the root node to any node.
Parent − Any node except the root node has one edge upward to a node called parent.
Child − The node below a given node connected by its edge downward is called its child
node.
Leaf − The node which does not have any child node is called the leaf node.
Subtree − Subtree represents the descendants of a node.
Visiting − Visiting refers to checking the value of a node when control is on the node.
Traversing − Traversing means passing through nodes in a specific order.
Levels − Level of a node represents the generation of a node. If the root node is at level
0, then its next child node is at level 1, its grandchild is at level 2, and so on.
Keys − Key represents a value of a node based on which a search operation is to be
carried out for a node.
Types of Trees
There are three types of trees −
General Trees
Binary Trees
Binary Search Trees
General Trees
General trees are unordered tree data structures where the root node has a minimum of 0 and a
maximum of 'n' subtrees.
General trees have no constraint placed on their hierarchy. The root node thus acts as the
superset of all the other subtrees.
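A minimal Python sketch of a general tree node (the class and field names are assumptions for illustration), where a node may hold any number of subtrees:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []        # any number of subtrees, 0 to n

    def add_child(self, child):
        self.children.append(child)

root = TreeNode("A")                          # root node (level 0)
root.add_child(TreeNode("B"))                 # children at level 1
root.add_child(TreeNode("C"))
root.children[0].add_child(TreeNode("D"))     # grandchild at level 2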