Algorithms
What is an Algorithm?
An algorithm is a well-defined sequential computational technique that accepts a
value or a collection of values as input and produces the output(s) needed to solve a
problem. An algorithm is said to be correct if and only if, for every input instance, it
halts with the proper output.
Types of Algorithms:
Sorting algorithms: Bubble sort, insertion sort, and many more. These
algorithms are used to arrange data in a particular order.
Searching algorithms: Linear search, binary search, etc. These algorithms are
used to find a value or record that the user demands (a small binary search
sketch follows this list).
Graph algorithms: These are used to find solutions to problems modeled as
graphs, such as finding the shortest path between cities, and real-life problems
like the traveling salesman problem.
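As a small illustration of a searching algorithm, here is a minimal binary search
sketch (assuming a sorted list; the function name is illustrative):

# Binary search: repeatedly halve the search range of a sorted list,
# returning the index of target or -1 if it is absent
def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search([10, 20, 30, 40, 50], 40))  # Output: 3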
Time Complexity:
The time taken by an algorithm to solve the problem. It is measured by
counting the iterations of loops, the number of comparisons, and similar basic
operations.
Time complexity is a function describing the amount of time an
algorithm takes in terms of the amount of input to the algorithm.
“Time” can mean the number of memory accesses performed, the
number of comparisons between integers, the number of times some
inner loop is executed, or some other natural unit related to the amount of
real time the algorithm will take.
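To make this concrete, here is a small illustrative sketch (the function and
names are hypothetical) that counts how many times an inner loop executes:

# "Time" measured as the number of inner-loop executions: for this
# pair-counting sketch the inner loop runs n*(n-1)/2 times
def count_equal_pairs(arr):
    inner_iterations = 0
    pairs = 0
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            inner_iterations += 1
            if arr[i] == arr[j]:
                pairs += 1
    return pairs, inner_iterations

print(count_equal_pairs([1, 2, 1, 2]))  # Output: (2, 6)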
Space Complexity
The space taken by the algorithm to solve the problem. It includes space used
by necessary input variables and any extra space (excluding the space
taken by inputs) that is used by the algorithm. For example, if we use a
hash table (a kind of data structure), we need an array to store values;
this is extra space occupied and hence counts toward the space
complexity of the algorithm. This extra space is known as auxiliary
space.
Space complexity is a function describing the amount of memory (space)
an algorithm takes in terms of the amount of input to the algorithm.
Space complexity is sometimes ignored because the space used is
minimal and/or obvious, but sometimes it becomes as much of an issue as time.
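For example, a small sketch (illustrative only) of counting occurrences with a
dictionary makes the auxiliary-space idea concrete: the input list is given, while
the dictionary is O(n) extra space:

# The input list is not counted as auxiliary space; the counts
# dictionary is extra storage, so auxiliary space here is O(n)
def count_occurrences(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_occurrences(["a", "b", "a"]))  # Output: {'a': 2, 'b': 1}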
Cases in Complexities
There are two commonly studied cases of complexity in algorithms:
1. Best case complexity: The best-case scenario for an algorithm is the scenario in
which the algorithm performs the minimum amount of work (e.g. takes the shortest
amount of time, uses the least amount of memory, etc.).
2. Worst case complexity: The worst-case scenario for an algorithm is the scenario
in which the algorithm performs the maximum amount of work (e.g. takes the
longest amount of time, uses the most amount of memory, etc.).
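For instance, linear search illustrates both cases (a minimal sketch; names are
illustrative):

# Linear search: scan the list from the front until target is found
def linear_search(arr, target):
    for i, value in enumerate(arr):
        if value == target:
            return i
    return -1

arr = [7, 2, 9, 4]
print(linear_search(arr, 7))  # best case: found at index 0, 1 comparison
print(linear_search(arr, 5))  # worst case: 4 comparisons, returns -1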
Asymptotic Notation and Analysis (Based on input size) in Complexity Analysis
of Algorithms
In asymptotic analysis, we evaluate the performance of an algorithm in terms of
input size (we don't measure the actual running time): we calculate how the time
(or space) taken by an algorithm grows with the input size.
Asymptotic notation is a way to describe the running time or space complexity of an
algorithm based on the input size. It is commonly used in complexity analysis to
describe how an algorithm performs as the size of the input grows. The three most
commonly used notations are Big O, Omega, and Theta.
1. Big O notation (O): This notation provides an upper bound on the growth rate
of an algorithm’s running time or space usage. It is commonly used to express
the worst-case scenario, i.e., the maximum amount of time or space an algorithm
may need to solve a problem. For example, if an algorithm’s running time is
O(n), the running time grows at most linearly with the input size n.
2. Omega notation (Ω): This notation provides a lower bound on the growth rate
of an algorithm’s running time or space usage. It is commonly used to express
the best-case scenario, i.e., the minimum amount of time or space an algorithm
may need to solve a problem. For example, if an algorithm’s running time is
Ω(n), the running time grows at least linearly with the input size n.
3. Theta notation (Θ): This notation provides both an upper and a lower bound on
the growth rate of an algorithm’s running time or space usage, i.e., a tight
bound. For example, if an algorithm’s running time is Θ(n), the running time
grows exactly linearly (up to constant factors) with the input size n.
In general, the choice of asymptotic notation depends on the problem and the
specific algorithm used to solve it. It is important to note that asymptotic notation
does not provide an exact running time or space usage for an algorithm, but rather a
description of how the algorithm scales with respect to input size. It is a useful tool
for comparing the efficiency of different algorithms and for predicting how they will
perform on large input sizes.
For example, the naive timing measurement discussed below can be coded as:

import time

start_time = time.time()  # start the timer
# run the algorithm to be measured; a simple loop stands in here
total = sum(range(1_000_000))
print("Time --- %s seconds ---" % (time.time() - start_time))
Given two algorithms for a task, how do we find out which one is better?
One naive way of doing this is to implement both algorithms and run the two
programs on your computer for different inputs to see which one takes less time.
There are many problems with this approach to the analysis of algorithms.
For some inputs, the first algorithm may perform better than the second, while
for other inputs the second performs better.
It might also be that the first algorithm performs better on one machine, while
the second works better on another machine for some other inputs.
Asymptotic Analysis is the big idea that handles the above issues in analyzing
algorithms. In Asymptotic Analysis, we evaluate the performance of an algorithm
in terms of input size (we don’t measure the actual running time). We calculate
how the time (or space) taken by an algorithm increases with the input size.
Also, in asymptotic analysis, we always talk about input sizes larger than a constant
value. It might be possible that those large inputs are never given to your software
and an asymptotically slower algorithm always performs better for your particular
situation. So, you may end up choosing an algorithm that is asymptotically slower
but faster for your software.
Advantages:
1. Asymptotic analysis provides a high-level understanding of how an algorithm
performs with respect to input size.
2. It is a useful tool for comparing the efficiency of different algorithms and
selecting the best one for a specific problem.
3. It helps in predicting how an algorithm will perform on larger input sizes, which
is essential for real-world applications.
4. Asymptotic analysis is relatively easy to perform and requires only basic
mathematical skills.
Disadvantages:
1. Asymptotic analysis does not provide an accurate running time or space usage of
an algorithm.
2. It assumes that the input size is the only factor that affects an algorithm’s
performance, which is not always the case in practice.
3. Asymptotic analysis can sometimes be misleading, as two algorithms with the
same asymptotic complexity may have different actual running times or space
usage.
4. It is not always straightforward to determine the best asymptotic complexity for
an algorithm, as there may be trade-offs between time and space complexity.
Algorithms Classification
5. Research: Classifying algorithms is essential for research and development in
computer science, as it helps to identify new algorithms and improve existing
ones.
Overall, the classification of algorithms plays a crucial role in computer science and
helps to improve the efficiency and effectiveness of solving problems.
1. Greedy Algorithm: The greedy approach builds a solution step by step, at each
step choosing the local optimum, without thinking about the future consequences
(a small sketch appears after this list).
Example: Fractional Knapsack, Activity Selection.
2. Divide and Conquer: The Divide and Conquer strategy involves dividing the
problem into sub-problems, recursively solving them, and then recombining their
solutions for the final answer (see the merge sort sketch after this list).
Example: Merge sort, Quicksort.
3. Backtracking: The backtracking technique incrementally explores every
possible course of action and finds the route that leads to the solution.
Example: N-queen problem, maze problem.
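As referenced in item 1, here is a minimal sketch of the greedy idea for activity
selection (sort by finish time and repeatedly take the first compatible activity;
names are illustrative):

# Greedy activity selection: sort by finish time, then repeatedly
# take the first activity compatible with the last one chosen
def select_activities(activities):
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

print(select_activities([(1, 3), (2, 5), (4, 7), (6, 8)]))
# Output: [(1, 3), (4, 7)]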
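And, for item 2, a minimal merge sort sketch (illustrative only):

# Merge sort: divide the list in half, sort each half recursively,
# then merge the two sorted halves
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # merge the two sorted halves into one sorted list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
# Output: [3, 9, 10, 27, 38, 43, 82]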
1. Top-Down approach
2. Bottom-up approach
Top-Down Approach:
Breaking down a complex problem into smaller, more manageable sub-problems
and solving each sub-problem individually.
Designing a system starting from the highest level of abstraction and moving
towards the lower levels.
Bottom-Up Approach:
Building a system by starting with the individual components and gradually
integrating them to form a larger system.
Solving sub-problems first and then using the solutions to build up to a solution of a
larger problem.
Note: Both approaches have their own advantages and disadvantages and the choice
between them often depends on the specific problem being solved.
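To make the two approaches concrete, here is a small sketch computing Fibonacci
numbers both ways (illustrative only):

from functools import lru_cache

# Top-down: break fib(n) into sub-problems and memoize their solutions
@lru_cache(maxsize=None)
def fib_top_down(n):
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up: solve the smallest sub-problems first and build upward
def fib_bottom_up(n):
    if n < 2:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

print(fib_top_down(10), fib_bottom_up(10))  # Output: 55 55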
Other Classifications
Apart from classifying algorithms into the above broad categories, they
can be classified into other broad categories like:
1. Randomized Algorithms: Algorithms that make random choices for faster
solutions are known as randomized algorithms.
Example: Randomized Quicksort Algorithm
2. The Sorting Problem
A Sorting Algorithm is used to rearrange a given array or list of elements according to a
comparison operator on the elements. The comparison operator is used to decide the
new order of elements in the respective data structure.
2.1 QuickSort
QuickSort is a Divide and Conquer algorithm. It picks an element as the pivot and
partitions the given array around the picked pivot. There are many different versions
of QuickSort that pick the pivot in different ways.
The key process in QuickSort is the partition step: given an array and an element x
chosen as the pivot, put x at its correct position in the sorted array,
put all smaller elements (smaller than x) before x, and put all greater elements
(greater than x) after x. All this should be done in linear time.
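A minimal sketch of this scheme, using a Lomuto-style partition with the last
element as the pivot (one of several common pivot choices; names are illustrative):

# Lomuto partition: places the pivot (last element) at its correct
# position and returns that index; runs in linear time
def partition(arr, low, high):
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

# QuickSort: partition, then recursively sort the two sides
def quick_sort(arr, low, high):
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)

data = [10, 7, 8, 9, 1, 5]
quick_sort(data, 0, len(data) - 1)
print(data)  # Output: [1, 5, 7, 8, 9, 10]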
Searching Algorithms
Binary Search Tree
Binary Search Tree is a node-based binary tree data structure that has the following
properties:
The left subtree of a node contains only nodes with keys less than the
node’s key.
The right subtree of a node contains only nodes with keys greater than the
node’s key.
The left and right subtree each must also be a binary search tree.
The above properties of the Binary Search Tree provide an ordering among keys so
that the operations like search, minimum and maximum can be done fast. If there is
no order, then we may have to compare every key to search for a given key.
Searching Element
Start from the root.
Compare the element being searched for with the root; if it is less than the root,
recurse into the left subtree, otherwise recurse into the right subtree.
If the element to search is found anywhere, return true, else return false.
# A utility function to search a given key in a BST: returns the node
# holding the key (truthy) or None (falsy) if the key is absent
def search(root, key):
    # Base case: root is None or the key is present at root
    if root is None or root.val == key:
        return root
    # Key is greater than root's key: recurse right
    if key > root.val:
        return search(root.right, key)
    # Key is smaller than root's key: recurse left
    return search(root.left, key)
Insertion of a key
Start from the root.
Compare the element being inserted with the root; if it is less than the root,
recurse into the left subtree, otherwise recurse into the right subtree.
After reaching a leaf, insert the new node as the left child if it is less than the
current node, otherwise as the right child.
# Python program to demonstrate
# insert operation in binary search tree
r = Node(50)
r = insert(r, 30)
r = insert(r, 20)
r = insert(r, 40)
r = insert(r, 70)
r = insert(r, 60)
r = insert(r, 80)
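The snippet above assumes a Node class and an insert helper that are not shown;
a minimal sketch of both, matching the names used in the snippet, might look like:

# A minimal BST node: a key plus left and right children
class Node:
    def __init__(self, key):
        self.val = key
        self.left = None
        self.right = None

# Inserts key into the subtree rooted at node and returns the
# (possibly new) root of that subtree
def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.val:
        node.left = insert(node.left, key)
    elif key > node.val:
        node.right = insert(node.right, key)
    return node

With these definitions, the insertion sequence above builds a BST rooted at 50,
and search(r, 40) from the earlier section returns the node holding 40.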
Pattern Searching
The Pattern Searching algorithms are sometimes also referred to as String Searching
Algorithms and are considered as a part of the String algorithms. These algorithms are
useful in the case of searching a string within another string.
String matching algorithms have greatly influenced computer science and play an
essential role in various real-world problems, helping to perform time-efficient
tasks in multiple domains. These algorithms are useful when searching for a
string within another string. String matching is also used in database schemas and
network systems.
String Matching Algorithms can broadly be classified into two types of algorithms
–
1. Exact String Matching Algorithms
2. Approximate String Matching Algorithms
1. Algorithms based on character comparison:
Naïve Algorithm: It slides the pattern over the text one position at a time and
checks for a match; whether or not a match is found, it slides by 1 again to
check subsequent positions (a minimal sketch appears after this list).
KMP (Knuth Morris Pratt) Algorithm: The idea is that whenever a mismatch is
detected, we already know some of the characters in the text of the next
window, so we take advantage of this information to avoid re-matching the
characters that we know will match anyway.
Boyer Moore Algorithm: This algorithm combines the best heuristics of the
Naive and KMP algorithms and starts matching from the last character of the
pattern.
Using the Trie data structure: A trie is an efficient information-retrieval data
structure that stores keys as paths in a tree of characters (a prefix tree).
2. Deterministic Finite Automaton (DFA) method:
Automaton Matcher Algorithm: It starts from the first state of the automaton
and the first character of the text. At every step, it considers the next character
of the text, looks up the next state in the built finite automaton, and moves to
that state.
3. Algorithms based on bit parallelism:
Aho-Corasick Algorithm: It finds all words in O(n + m + z) time, where n is
the length of the text, m is the total number of characters in all words, and z is
the total number of occurrences of the words in the text. This algorithm forms
the basis of the original Unix command fgrep.
4. Hashing-based string matching algorithms:
Rabin Karp Algorithm: It matches the hash value of the pattern with the hash
value of the current substring of the text, and only if the hash values match
does it start matching individual characters.
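As promised in the list above, here is a minimal sketch of the Naive approach
(illustrative only; the helper name is hypothetical):

# Naive pattern search: slide the pattern over the text one position
# at a time and compare character by character
def naive_search(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            matches.append(i)
    return matches

print(naive_search("AABAACAADAABAABA", "AABA"))  # Output: [0, 9, 12]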
These techniques are used when the quality of the text is low, when there are
spelling errors in the pattern or text, when finding DNA subsequences after
mutation, with heterogeneous databases, and so on.
Applications of String Matching Algorithms:
Digital Forensics: String matching algorithms are used to locate specific text
strings of interest in digital forensic data, which are useful for the
investigation.
Spelling Checker: Trie is built based on a predefined set of patterns. Then, this
trie is used for string matching. The text is taken as input, and if any such pattern
occurs, it is shown by reaching the acceptance state.
Spam filters: Spam filters use string matching to discard the spam. For example,
to categorize an email as spam or not, suspected spam keywords are searched in
the content of the email by string matching algorithms. Hence, the content is
classified as spam or not.
Intrusion Detection System: The data packets containing intrusion-related
keywords are found by applying string matching algorithms. All the malicious
code is stored in a database, and every incoming data packet is compared with
the stored data. If a match is found, an alarm is raised. This relies on exact
string matching algorithms, where each intruding packet must be detected.
Randomized Algorithms
What is a Randomized Algorithm?
An algorithm that uses random numbers to decide what to do next anywhere in its
logic is called a Randomized Algorithm. For example, in Randomized Quick Sort,
we use a random number to pick the next pivot (or we randomly shuffle the array).
And in Karger’s algorithm, we randomly pick an edge.
How to analyse Randomized Algorithms?
Some randomized algorithms have deterministic time complexity. For example, this
implementation of Karger’s algorithm has time complexity is O(E). Such algorithms
are called Monte Carlo Algorithms and are easier to analyse for worst case.
On the other hand, time complexity of other randomized algorithms (other than Las
Vegas) is dependent on value of random variable. Such Randomized algorithms are
called Las Vegas Algorithms. These algorithms are typically analysed for expected
worst case. To compute expected time taken in worst case, all possible values of the
used random variable needs to be considered in worst case and time taken by every
possible value needs to be evaluated. Average of all evaluated times is the expected
worst case time complexity. Below facts are generally helpful in analysis os such
algorithms.
Linearity of Expectation
Expected Number of Trials until Success
For example, consider the randomized version of QuickSort below.
A Central Pivot is a pivot that divides the array in such a way that each side has at
least one quarter of the elements.
// Sorts an array arr[low..high]
randQuickSort(arr[], low, high)
1. If low >= high, then EXIT.
2. While a Central Pivot x has not been found:
   (i)  Choose an index uniformly at random from [low..high] and let x be
        the element there.
   (ii) Count the elements of arr[low..high] smaller than x and the elements
        greater than x; x is a Central Pivot if both counts are at least n/4
        (where n = high - low + 1).
3. Partition arr[low..high] around the pivot x.
4. Recur for the parts on each side of the pivot.
The important thing in our analysis is that the time taken by step 2 is O(n).
How many times does the while loop run before finding a central pivot?
Since half of the elements of the array qualify as central pivots, the probability that
a randomly chosen element is a central pivot is 1/2, so the expected number of times
the while loop runs is 2 (for success probability p, the expected number of trials
until the first success is 1/p).
Thus, the expected time complexity of step 2 is O(n).
Note that the above randomized algorithm is not the best way to implement
randomized Quick Sort; it is presented here because it is simple to analyse.
Typically, randomized Quick Sort is implemented by randomly picking a pivot (no
loop) or by shuffling the array elements.
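A minimal sketch of that common approach (a simple non-in-place version; names
are illustrative):

import random

# Randomized QuickSort: choose the pivot uniformly at random,
# then partition the list around it
def randomized_quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return randomized_quick_sort(smaller) + equal + randomized_quick_sort(larger)

print(randomized_quick_sort([9, 3, 7, 1, 8]))  # Output: [1, 3, 7, 8, 9]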
import random
import time

def find_solution(n):
    # seed the random number generator with the current time
    random.seed(time.time())
    # randomly select a number between 1 and n and return it as the solution
    return random.randint(1, n)

def main():
    n = 10  # the range of possible solutions is 1 to n
    print("Solution:", find_solution(n))

if __name__ == '__main__':
    main()
Classification
Randomized algorithms are classified in two categories.
Las Vegas:
The term Las Vegas algorithm was introduced by Laszlo Babai in 1979.
A Las Vegas algorithm is an algorithm which uses randomness but guarantees
that the solution obtained for the given problem is correct; it takes a risk only with
the resources used. Randomized quick sort is a simple example of a Las Vegas
algorithm. To sort a given array of n numbers quickly we use the quick sort
algorithm; for that we pick a central element, also called the pivot element, and
each element is compared with this pivot element. Whether sorting takes less or
more time depends on how we select the pivot element, and picking the pivot
element randomly gives a Las Vegas algorithm.
Definition:
A randomized algorithm that always produces a correct result, with the only
variation from one run to another being its running time, is known as a Las Vegas
algorithm.
OR
A randomized algorithm which always produces a correct result or informs about
the failure is known as a Las Vegas algorithm.
OR
A Las Vegas algorithm takes a risk with the resources used for computation, but it
does not take a risk with the result, i.e. it gives the correct and expected output for
the given problem.
Let us consider the above example of the quick sort algorithm. In this algorithm we
choose the pivot element randomly, but the result is always a sorted array. A Las
Vegas algorithm has one restriction: the solution for the given problem must be
found in finite time. In such an algorithm the number of possible solutions is
limited; the actual solution may be complex or complicated to calculate, but it is
easy to verify the correctness of a candidate solution.
The time complexity of a Las Vegas algorithm is based on a random value, so it is
evaluated as an expected value. For example, Randomized Quick Sort always sorts
an input array, and its expected worst-case time complexity is O(n log n).
Relation with Monte Carlo algorithms:
A Las Vegas algorithm can be contrasted with a Monte Carlo algorithm, in
which the resources used to find the solution are bounded but there is no
guarantee that the solution obtained is accurate.
In some applications, a Las Vegas algorithm can be converted into a Monte
Carlo algorithm by early termination.
Complexity Analysis:
The complexity class of problems that can be solved by a Las Vegas algorithm with
zero error probability in expected polynomial time is called ZPP (zero-error
probabilistic polynomial time), which is obtained as follows,
ZPP = RP ∩ co-RP
where RP is randomized polynomial time.
A randomized polynomial time (RP) algorithm always provides the correct output
when the correct answer is no; when the answer is yes, it answers yes with
probability at least 1/2. Decision problems of this kind form the class RP.
That is how a given problem can be solved in expected polynomial time by a Las
Vegas algorithm. Generally there is no worst-case upper bound on the running time
of a Las Vegas algorithm.
Monte Carlo:
Computational algorithms which rely on repeated random sampling to compute
their results are called Monte Carlo algorithms.
OR
A randomized algorithm is a Monte Carlo algorithm if it can sometimes give a
wrong answer.
Monte Carlo methods are built on repeated computation with random numbers,
which is why these algorithms are used for solving physical simulation systems and
mathematical systems.
Monte Carlo algorithms are especially useful for disordered materials, fluids, and
cellular structures. In mathematics these methods are used to calculate definite
integrals, particularly multidimensional integrals with complicated boundary
conditions. Compared to other methods, this approach is a successful one when risk
analysis is taken into consideration.
There is no single Monte Carlo method; rather, the term describes a large and
widely used class of approaches that rely on a common pattern of repeated random
sampling.
Monte Carlo algorithms produce a correct or optimum result with some probability.
They have deterministic running time, and it is generally easier to find out their
worst-case time complexity. For example, this implementation of Karger’s
Algorithm produces a minimum cut with probability greater than or equal to 1/n²
(n is the number of vertices) and has worst-case time complexity O(E). Another
example is the Fermat method for primality testing.
Consider, for example, an array in which half of the elements are 1 and half are 0,
and the task is to find the index of any 1. A Las Vegas algorithm for this task is to
keep picking a random element until we find a 1. A Monte Carlo algorithm for the
same task is to keep picking a random element until we either find a 1 or we have
tried a maximum allowed number of times, say k. The Las Vegas algorithm always
finds an index of a 1, but its time complexity is determined as an expected value:
the expected number of trials before success is 2, therefore the expected time
complexity is O(1). The Monte Carlo algorithm finds a 1 with probability
1 − (1/2)^k; its time complexity is O(k), which is deterministic.
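A small sketch of both strategies for this task (assuming the half-zeros/half-ones
array described above; names are illustrative):

import random

# Las Vegas: always finds an index of a 1; the running time is random
def find_one_las_vegas(arr):
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 1:
            return i

# Monte Carlo: at most k trials; may fail (returns -1) with
# probability (1/2)^k
def find_one_monte_carlo(arr, k):
    for _ in range(k):
        i = random.randrange(len(arr))
        if arr[i] == 1:
            return i
    return -1

arr = [0, 1, 0, 1, 1, 0, 1, 0]
print(find_one_las_vegas(arr))       # always an index holding a 1
print(find_one_monte_carlo(arr, 5))  # an index holding a 1, or -1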
A deterministic algorithm provides a correct solution, but it may take a long time or
its runtime may be large. This runtime can be improved by using Monte Carlo
integration algorithms, of which there are various kinds.
Applications of Randomized Algorithms:
Consider a tool that basically does sorting, used by many users, a few of whom
always use the tool on already sorted arrays. If the tool uses simple (non-
randomized) QuickSort, then those few users always face the worst-case situation.
On the other hand, if the tool uses Randomized QuickSort, then no user always gets
the worst case; everybody gets expected O(n log n) time.
Randomized algorithms have huge applications in cryptography.
Load balancing.
Number-theoretic applications: primality testing.
Data structures: hashing, sorting, searching, order statistics, and
computational geometry.
Algebraic identities: polynomial and matrix identity verification; interactive
proof systems.
Mathematical programming: faster algorithms for linear programming;
rounding linear program solutions to integer program solutions.
Graph algorithms: minimum spanning trees, shortest paths, minimum cuts.
Counting and enumeration: matrix permanent; counting combinatorial
structures.
Parallel and distributed computing: deadlock avoidance, distributed consensus.
Probabilistic existence proofs: show that a combinatorial object arises with non-
zero probability among objects drawn from a suitable probability space.
Derandomization: first devise a randomized algorithm, then argue that it can be
derandomized to yield a deterministic algorithm.
2. Randomized search algorithms: These are algorithms that use randomness to
search for solutions to problems. Examples include genetic algorithms and
simulated annealing.
3. Randomized data structures: These are data structures that use randomness to
improve their performance. Examples include skip lists and hash tables.
4. Randomized load balancing: These are algorithms used to distribute load across
a network of computers, using randomness to avoid overloading any one
computer.
5. Randomized encryption: These are algorithms used to encrypt and decrypt data,
using randomness to make it difficult for an attacker to decrypt the data without
the correct key.
import random

# Generates a random permutation of the given array (in place)
def random_permutation(array):
    # Shuffle the array using the random number generator
    random.shuffle(array)
    return array

# Returns a median of the array; for an even number of elements one of
# the two middle values is chosen at random, and None is returned for
# an empty array (reconstructed to match the example outputs below)
def find_median(array):
    if not array:
        return None
    ordered = sorted(array)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    # randomly choose one of the two middle elements
    return ordered[n // 2 - random.randint(0, 1)]

# Example usage
print(random_permutation([1, 2, 3, 4, 5]))  # e.g. [3, 1, 5, 2, 4]
print(find_median([1, 2, 3, 4, 5]))     # Output: 3
print(find_median([1, 2, 3, 4, 5, 6]))  # Output: 3 or 4 (randomly chosen)
print(find_median([]))                  # Output: None
print(find_median([7]))                 # Output: 7