
UNIT- 1

Introduction to Algorithms-
"Algorithms" refers to a set of rules or steps followed in problem-solving operations, particularly
by a computer. They are fundamental to the field of computer science, as they define the solution
to a problem in terms of the steps needed to solve it. Here's a breakdown of key aspects of
algorithms:
Definition: An algorithm is a finite sequence of well-defined, computer-implementable
instructions, typically to solve a class of problems or to perform a computation.

Characteristics:
Precise: Each step is clearly defined.
Finite: It must terminate after a finite number of steps.
Input: It may have zero or more inputs.
Output: It should produce at least one output.
Effective: Each step must be basic enough to be carried out, in principle, by a person using only
pencil and paper.
Types of Algorithms:
Sorting Algorithms: Such as Quick Sort, Merge Sort, and Bubble Sort.
Search Algorithms: Such as Linear Search and Binary Search.
Graph Algorithms: Such as Dijkstra's algorithm for shortest paths, and Depth-First Search
(DFS).
Dynamic Programming Algorithms: Solving complex problems by breaking them into simpler
subproblems.
Applications:
Used in software development for problem-solving.
Essential in data analysis and processing.
Basis for machine learning and artificial intelligence models.
Analysis of Algorithms:
Focuses on time complexity (how the time to complete an algorithm increases with the size of
the input) and space complexity (the amount of memory space required).
Notation such as Big O (O(n)) is commonly used to describe the upper limit of time or space
requirements.
Examples: Let's explore a detailed example of an algorithm: the Binary Search Algorithm. This
algorithm is an efficient method for finding an item from a sorted list of items. It works by
repeatedly dividing in half the portion of the list that could contain the item, until you've
narrowed down the possible locations to just one.
Algorithm Description - Binary Search:
Purpose: To find the position of a target value within a sorted array.
Approach:
Compare the target value to the middle element of the array.
If they are not equal, the half in which the target cannot lie is eliminated, and the search
continues on the remaining half.
Repeat this until the target value is found or the remaining array is empty.
Steps:
Begin with two variables representing the start and end of the array (initially, the first and last
elements).
Find the middle element of the array.
If the middle element is equal to the target, return the index of the middle element.
If the target is less than the middle element, repeat the search with the end set to the middle
element - 1.
If the target is greater than the middle element, repeat the search with the start set to the middle
element + 1.
If the start index exceeds the end index, the target is not in the array.
Complexity:
Time Complexity: O(log n) - because the algorithm divides the search interval in half each time.
Space Complexity: O(1) - it uses a constant amount of space.
Example:
Consider an array [3, 4, 5, 6, 7, 8, 9] and the target value 6.
The binary search will start by comparing 6 with the middle element (6), and find a match
immediately.
Binary search is a classic example of a divide and conquer algorithm and illustrates how
algorithms can be much more efficient than a simple linear search, especially for large datasets.
In summary, algorithms are the backbone of computer programming and data processing. They
allow us to solve problems efficiently and are integral in the development of software and
technology.

Attributes:-
Algorithms are fundamental to computing and problem-solving. Understanding their attributes
helps in evaluating and choosing the right algorithm for a particular problem. Here are the key
attributes of algorithms:
Correctness: An algorithm should produce the correct output for all possible valid inputs. It
involves two aspects:
Partial Correctness: The algorithm gives the correct output when it terminates.
Termination: The algorithm eventually terminates or stops.
Efficiency: This refers to how well an algorithm performs in terms of time and space:
Time Complexity: The amount of time an algorithm takes to complete in relation to its input size.
It's usually expressed using Big O notation (e.g., O(n), O(log n)).
Space Complexity: The amount of memory space required by an algorithm as a function of its
input size.
Determinism: Each step of the algorithm must be precisely defined. Given the same input, an
algorithm should always produce the same output.
Finiteness: An algorithm must always terminate after a finite number of steps.
Input and Output: An algorithm should have 0 or more well-defined inputs and 1 or more well-
defined outputs. Inputs are the data to be transformed, and outputs are the data that has been
processed.
Generality: The ability of an algorithm to apply to a set of different problems rather than a single
specific instance. A good algorithm should be general enough to solve a category of problems.
Scalability: The ability of an algorithm to maintain its efficiency even when the size of the
problem scales up.
Optimality: An optimal algorithm is the most efficient algorithm in terms of time and space for a
particular problem. However, achieving optimality is not always possible.
Stability (in sorting algorithms): Stability in sorting algorithms means that two objects with equal
keys appear in the same order in sorted output as they appear in the input unsorted array. Some
sorting algorithms are stable by nature, like Bubble Sort, while others are not, like Quick Sort.
Simplicity and Understandability: The ease with which an algorithm can be understood and
implemented. Simpler algorithms are generally preferred as they are easier to analyze and debug.
These attributes are critical for algorithm analysis and selection, and they are taken into
consideration during algorithm design and implementation in various fields of computing and
data processing.

Design Techniques:
Design techniques in the context of algorithms refer to the methodologies and strategies used to
craft algorithms that efficiently solve computational problems. Understanding these techniques is
crucial for developing algorithms that are not only correct but also optimized for performance
and resource utilization. Here are some key algorithm design techniques:
Divide and Conquer:
Principle: Break a problem into smaller sub-problems, solve each sub-problem independently,
and then combine their solutions to solve the original problem.
Applications: Quicksort, Mergesort, Binary Search, Strassen’s Matrix Multiplication.
Dynamic Programming:
Principle: Break the problem into simpler sub-problems in a recursive manner and store the
results of sub-problems to avoid computing the same results again.
Applications: Fibonacci sequence computation, Knapsack problem, Shortest path problems like
Bellman-Ford.
Greedy Method:
Principle: Always choose the best immediate or local solution while finding an answer. It doesn't
reconsider choices.
Applications: Kruskal’s and Prim’s algorithms for Minimum Spanning Tree, Dijkstra’s algorithm
for shortest paths.
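As a minimal sketch of the greedy principle, consider making change with the coin denominations 25, 10, 5, 1 (an assumed, canonical system for which the greedy choice happens to be optimal; for arbitrary denominations dynamic programming would be needed):

// Greedy change-making: repeatedly take the largest coin that fits.
#include <stdio.h>

int main(void)
{
    int coins[] = { 25, 10, 5, 1 };  // assumed canonical denominations
    int n = sizeof(coins) / sizeof(coins[0]);
    int amount = 67;                 // amount to make change for

    for (int i = 0; i < n; i++) {
        int count = amount / coins[i];  // greedy: take as many as fit
        amount -= count * coins[i];
        if (count > 0)
            printf("%d x %d\n", count, coins[i]);
    }
    return 0;
}

For 67 this prints 2 x 25, 1 x 10, 1 x 5 and 2 x 1; note that the greedy choice is never reconsidered.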
Backtracking:
Principle: It involves recursive, systematic trial and error. If a solution doesn't work, it backtracks
to find another possible path.
Applications: The N-Queens problem, solving puzzles like Sudoku, and combinatorial
problems.
Branch and Bound:
Principle: It's similar to backtracking but with an optimization strategy (bounding function) to
speed up the search process.
Applications: Traveling Salesperson Problem (TSP), Knapsack problem.
Randomized Algorithms:
Principle: Use randomness as part of the logic to solve problems where deterministic strategies
are not viable.
Applications: Randomized Quicksort, Monte Carlo algorithms, randomized data structures like
hash tables.
Iterative Improvement:
Principle: Start with a feasible solution and iteratively improve it. Each step attempts to improve
the current solution.
Applications: Algorithms in optimization and network flows, like the Simplex method in linear
programming.
Recursive Algorithms:
Principle: The method where the solution to a problem depends on solutions to smaller instances
of the same problem.
Applications: Tree traversals, DFS in graphs, Merge Sort, Quick Sort.
Heuristic Algorithms:
Principle: Practical approach to find a satisfactory solution where finding an optimal solution is
impractical.
Applications: Used in complex problems like scheduling, network design.
Each of these techniques has its strengths and is suitable for specific types of problems. The
choice of technique largely depends on the problem's nature, the required efficiency, and
resource constraints.

Time Space Trade Off: -


The Time-Space Trade-Off in computer science is a concept where memory space and
computational time can often be traded for one another in algorithms. This means that by using
more memory, you can often make an algorithm run faster, and conversely, by using less
memory, the algorithm might take longer to execute. It's a fundamental concept in algorithm
design and optimization, balancing between these two resources depending on the constraints
and requirements of the given problem.
Examples of Time-Space Trade-Off:
Hash Tables vs. Linked Lists:
Using a hash table can significantly speed up search operations to O(1) on average, compared to
O(n) for a linked list. However, hash tables require more memory to maintain the underlying
array of buckets and to handle collisions effectively.
Caching and Memoization:
Caching and memoization techniques store the results of expensive function calls and reuse them
when the same inputs occur again, reducing the time complexity at the expense of additional
memory.
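As a small C sketch of memoization (the function choice and table size here are illustrative assumptions), a memoized Fibonacci spends O(n) extra memory on a results table in exchange for reducing the running time from exponential to linear:

// Memoized Fibonacci: extra memory (the memo table) buys speed.
#include <stdio.h>

#define MAXN 90
static long long memo[MAXN + 1];  // 0 means "not computed yet"

long long fib(int n)
{
    if (n <= 1)
        return n;
    if (memo[n] != 0)                    // reuse a stored result
        return memo[n];
    memo[n] = fib(n - 1) + fib(n - 2);   // compute once, store
    return memo[n];
}

int main(void)
{
    printf("fib(50) = %lld\n", fib(50));
    return 0;
}

Without the memo table, the same recursive call would recompute the smaller subproblems an exponential number of times.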
Precomputed Tables:
In certain algorithms, you can use tables that are precomputed with results for quick lookup. This
approach uses more memory to store the tables but reduces the computational time for each
operation.
Sorting Algorithms:
Some sorting algorithms, like Merge Sort, use additional space to sort the array but have better
time complexity (O(n log n)) compared to in-place sorting algorithms like Bubble Sort (O(n²)).
Data Structures:
Data structures like Trie, used for storing a dynamic set or associative array, offer fast retrieval
but consume more space compared to simpler data structures like arrays or linked lists.
Considerations:
Application Requirements: The decision often depends on what's more critical for the application
- speed or memory usage.
Hardware Constraints: Limited memory or processing power can dictate the choice.
Scalability: For large-scale applications, even small efficiencies can make a big difference, so the
trade-off becomes crucial.
Problem Specifics: Some problems naturally lend themselves to either time or space
optimization.
In summary, the time-space trade-off is about finding the right balance between speed (execution
time) and memory usage (space complexity) to optimize the performance of an algorithm or a
program based on specific needs and constraints.

Data Structures: -
Data structures are a way of organizing and storing data in a computer so that it can be accessed
and modified efficiently. They are crucial in the field of computer science and software
engineering, as they provide the foundation for efficient algorithm implementation. There are
various types of data structures, each suited to different kinds of applications. Here's an overview
of some common data structures:
Arrays:
Description: A collection of elements, each identified by an array index or key. It's the simplest
and most widely used data structure.
Usage: Used for storing data in contiguous memory locations. Efficient for accessing elements at
a known index.
Linked Lists:
Description: A sequential collection of elements, but unlike arrays, the elements are linked using
pointers.
Types: Singly linked lists, doubly linked lists, and circular linked lists.
Usage: Useful for dynamic size and ease of insertion/deletion without reallocation or
reorganization of the entire structure.
Stacks:
Description: An abstract data type that serves as a collection of elements, with two principal
operations: push (adds an element) and pop (removes the most recently added element).
Usage: Used in scenarios where data needs to be stored and retrieved in a Last In First Out
(LIFO) manner.
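A minimal array-based sketch of these two operations (the fixed capacity and function names are illustrative, not a standard library API):

// A fixed-capacity stack: push adds on top, pop removes the top (LIFO).
#include <stdio.h>

#define CAPACITY 100
int stack[CAPACITY];
int top = -1;  // index of the most recently pushed element

void push(int x) { if (top < CAPACITY - 1) stack[++top] = x; }
int pop(void)    { return (top >= 0) ? stack[top--] : -1; }

int main(void)
{
    push(1); push(2); push(3);
    printf("%d %d\n", pop(), pop());  // prints "3 2": last in, first out
    return 0;
}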
Queues:
Description: Similar to stacks, but the removal happens in a First In First Out (FIFO) order.
Types: Simple queue, circular queue, priority queue, double-ended queue (deque).
Usage: Common in scenarios where data needs to be processed in the order it arrives.
Trees:
Description: A hierarchical data structure consisting of nodes, with a single node designated as
the root, and all other nodes connected by edges.
Types: Binary tree, binary search tree, AVL tree, red-black tree, segment tree, B-tree, etc.
Usage: Used in databases for efficient storage and retrieval, in file systems, and for various
hierarchical relationships.
Graphs:
Description: A collection of nodes (or vertices) and edges connecting some or all of them.
Types: Directed and undirected graphs, weighted and unweighted graphs.
Usage: Widely used in network systems, for representing connections and pathways, and in
solving complex computational problems.
Hash Tables:
Description: Implements an associative array, a structure that can map keys to values. A hash
function is used to compute an index into an array of buckets or slots, from which the desired
value can be found.
Usage: Effective for implementing databases, caches, sets, and more, where quick lookup of data
is essential.
Heaps:
Description: A specialized tree-based data structure that satisfies the heap property: if P is a
parent node of C, then the key of P is either greater than or equal to (in a max heap) or less than
or equal to (in a min heap) the key of C.
Usage: Used in priority queues, for efficient sorting algorithms, and in graph algorithms like
Dijkstra's shortest path.
Each data structure has its own strengths and weaknesses and is chosen based on the
requirements of the algorithm or the application. Understanding these structures is essential for
efficient programming and algorithm design.

Classification and Operations of Data Structures: -


The classification of data structures primarily revolves around their organization and the
operations that can be performed on them. Broadly, data structures are classified into two
categories: Linear and Non-linear. Within these, they can further be categorized as static or
dynamic. Here's an overview:
1. Linear Data Structures:
Arrays:
Operations: Access, Insertion, Deletion (inefficient), Traversal, Searching, Sorting.
Characteristics: Fixed size, elements stored in contiguous memory locations.
Linked Lists (Singly, Doubly, Circular):
Operations: Access (inefficient), Insertion, Deletion, Traversal.
Characteristics: Dynamic size, elements linked using pointers.
Stacks:
Operations: Push, Pop, Peek, IsEmpty.
Characteristics: LIFO (Last In First Out) principle.
Queues:
Operations: Enqueue, Dequeue, Peek, IsEmpty.
Characteristics: FIFO (First In First Out) principle.
2. Non-linear Data Structures:
Trees (Binary, AVL, Red-Black, B-tree, etc.):
Operations: Insertion, Deletion, Traversal (in-order, pre-order, post-order), Searching.
Characteristics: Hierarchical structure, each node has a key and zero or more children.
Graphs (Directed, Undirected):
Operations: Add Vertex, Add Edge, Remove Vertex, Remove Edge, Traversal (BFS, DFS).
Characteristics: Consists of nodes (vertices) and edges, can represent complex relationships.
Static vs Dynamic Data Structures:
Static:
Characteristics: Size and structure are fixed at compile time, e.g., Arrays.
Usage: When the number of elements is known in advance.
Dynamic:
Characteristics: Size and structure can change during runtime, e.g., Linked Lists, Trees, Graphs.
Usage: When the number of elements is not known in advance and can change over time.
Common Operations on Data Structures:
Insertion: Adding a new element.
Deletion: Removing an existing element.
Traversal: Accessing each element to perform a certain operation.
Searching: Finding an element.
Sorting: Arranging elements in a certain order.
Access: Retrieving an element at a given position.
Understanding these classifications and operations is crucial for selecting the appropriate data
structure for a particular problem or application. The choice is often dictated by the specific
requirements for efficiency in terms of time and space, as well as the complexity of operations
that need to be supported.

Arrays: -
Arrays are one of the most fundamental and commonly used data structures in computer
programming. They are used to store a collection of elements (items), typically of the same data
type, in a contiguous block of memory.
We can directly access an array element by using its index value.
Basic terminologies of array
 Array Index: In an array, elements are identified by their indexes. Array index
starts from 0.
 Array element: Elements are items stored in an array and can be accessed by their
index.
 Array Length: The length of an array is determined by the number of elements it
can contain.
Representation of Array
The representation of an array can be defined by its declaration. A declaration means allocating
memory for an array of a given size.

Arrays can be declared in various ways in different languages. For better illustration, below are
some language-specific array declarations.
int arr[5]; // This array will store integer type elements
char arr[10]; // This array will store char type elements
float arr[20]; // This array will store float type elements

However, the above declaration is static, or compile-time, memory allocation: memory for the
array elements is allocated when the program is compiled, and only the fixed size mentioned in
the square brackets [] is reserved for storage. We do not always know the required size in
advance. If we declare a larger size and store fewer elements, memory is wasted; if we declare a
smaller size, there is not enough memory to store the remaining elements. In such cases, static
memory allocation is not preferred.
Why are Array Data Structures needed?
Assume there is a class of five students and we have to keep a record of their marks in an
examination. We can do this by declaring five individual variables and keeping track of the
records, but if the number of students becomes very large, it would be challenging to
manipulate and maintain the data.
What this means is that we can use normal variables (v1, v2, v3, ...) when we have a small
number of objects, but if we want to store a large number of instances, it becomes difficult to
manage them with normal variables. The idea of an array is to represent many instances in
one variable.


Types of arrays:
There are two major kinds of arrays: one-dimensional and multi-dimensional.
 One-dimensional array (1-D array): You can imagine a 1-D array as a row, where
elements are stored one after another.
 Two-dimensional array (2-D array): A 2-D array can be considered as an array of
arrays, or as a matrix consisting of rows and columns.
 Three-dimensional array (3-D array): A 3-D array contains three dimensions, so it
can be considered an array of two-dimensional arrays.

Types of Array operations:


 Traversal: Traverse through the elements of an array.
 Insertion: Inserting a new element in an array.
 Deletion: Deleting element from the array.
 Searching: Search for an element in the array.
 Sorting: Maintaining the order of elements in the array.
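As a brief sketch of the insertion operation listed above (the array contents and position are illustrative), inserting into an array requires shifting the later elements one place to the right:

// Insert a value at a given index by shifting later elements right.
#include <stdio.h>

int main(void)
{
    int arr[6] = { 10, 20, 30, 40, 50 };  // capacity 6, 5 elements used
    int n = 5, pos = 2, value = 25;

    for (int i = n; i > pos; i--)  // shift right, starting from the end
        arr[i] = arr[i - 1];
    arr[pos] = value;              // place the new element
    n++;

    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);     // prints: 10 20 25 30 40 50
    return 0;
}

Deletion works the same way in reverse: the elements after the removed position are shifted one place to the left.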
Advantages of using Arrays:
 Arrays allow random access to elements. This makes accessing elements by position
faster.
 Arrays have better cache locality which makes a pretty big difference in
performance.
 Arrays represent multiple data items of the same type using a single name.
 Array data structures are used to implement the other data structures like linked
lists, stacks, queues, trees, graphs, etc.
Disadvantages of Array:
 As arrays have a fixed size, once memory is allocated to them it cannot be
increased or decreased, making it impossible to store extra data if required. An
array of fixed size is referred to as a static array.
 Allocating less memory than required to an array leads to loss of data.
 An array is homogeneous in nature, so a single array cannot store values of different
data types.
 Arrays store data in contiguous memory locations, which makes insertion and
deletion very difficult to implement. This problem is overcome by linked lists,
which allow elements to be inserted and deleted without shifting the remaining
elements.
Application of Array:
 They are used in the implementation of other data structures such as array lists,
heaps, hash tables, vectors, and matrices.
 Database records are usually implemented as arrays.
 They are used for lookup tables in computers.
 They are used in sorting algorithms such as bubble sort, insertion sort, merge
sort, and quick sort.

Memory Representation: -
The memory representation of data structures, particularly arrays, is a critical aspect in
understanding how they work and how they are accessed in computer memory. Let's delve into
how arrays are represented in memory:

Memory Representation of Arrays:

1. Contiguous Memory Allocation:


 Arrays are stored in contiguous blocks of memory. Each element of the array is of
the same data type and therefore occupies the same amount of memory space.
 For example, if an array is of type int in a system where integers occupy 4 bytes,
each element of the array will take up 4 bytes of memory.
2. Indexing:
 The position of each element in the array is determined by indexing. The first
element is at index 0, the second at index 1, and so on.
 The memory address of any element in the array can be calculated using the base
address of the array (the address of the first element) and the size of each element.
3. Calculating Memory Address:
 The address of an element at index i can be calculated as:
Address = Base_address + (i * Size_of_element)
 This formula is why arrays offer constant time complexity (O(1)) for accessing
elements.
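This can be observed directly in C (the array here is an assumed example): printing the addresses of consecutive int elements shows that they differ by exactly sizeof(int) bytes, matching the formula above.

// Consecutive int elements differ by exactly sizeof(int) bytes,
// i.e. &arr[i] equals the base address plus i * sizeof(int).
#include <stdio.h>

int main(void)
{
    int arr[5] = { 10, 20, 30, 40, 50 };
    for (int i = 0; i < 5; i++)
        printf("arr[%d] is at %p\n", i, (void *)&arr[i]);
    return 0;
}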

Memory Representation of Multi-Dimensional Arrays:

1. Row-Major Order (common in languages like C and C++):
 Elements are stored row by row.
 For a 2D array array[row][column], the address of an element at position (i, j) is
calculated as:
Address = Base_address + ((i * Number_of_columns) + j) * Size_of_element
2. Column-Major Order (common in languages like Fortran):
 Elements are stored column by column.
 The address calculation is slightly different, taking the number of rows into
account instead.

Memory Representation of Other Data Structures:

 Linked Lists: Each element (node) contains the data and a pointer to the next node.
Nodes are not stored in contiguous memory locations.
 Stacks and Queues: Can be implemented using arrays (contiguous memory) or linked
lists (non-contiguous memory).
 Trees and Graphs: Usually represented in memory using pointers (like linked lists) or
arrays (especially for specific types like binary trees).

Importance of Understanding Memory Representation:

 Efficiency: Knowing how data structures are represented in memory helps in writing
efficient code, especially in terms of memory usage and access time.
 Memory Management: It's crucial for understanding how much memory a data structure
will occupy and how it will grow or shrink.
 Debugging: Helps in debugging issues related to memory, such as memory leaks or
buffer overflows.

Memory representation of data structures is a fundamental concept in computer science,
particularly when dealing with lower-level programming and optimization of code for
performance and memory usage.
Address Calculation: -
Address calculation in arrays is crucial for understanding how elements are accessed in memory.
The calculation varies slightly depending on whether you're dealing with a single-dimensional
array or a multi-dimensional array. Let's look at how these calculations are typically done:

Calculating the address of any element in a 1-D array:


A 1-dimensional array (or single-dimension array) is a type of linear array. Accessing its
elements involves a single subscript that can either represent a row or column index.

To find the address of an element in an array, the following formula is used:

Address of A[I] = B + W * (I – LB)
I = index of the element whose address is to be found,
B = base address,
W = storage size of one element stored in the array (in bytes),
LB = lower bound of the subscript (assume zero if not specified).
Example: Given the base address of an array A[1300 ………… 1900] as 1020 and the size of
each element as 2 bytes in memory, find the address of A[1700].
Solution:
Given:
Base address B = 1020
Lower bound of the subscript LB = 1300
Storage size of one element W = 2 bytes
Index of the element whose address is to be found I = 1700
Formula used:
Address of A[I] = B + W * (I – LB)
Address of A[1700] = 1020 + 2 * (1700 – 1300)
= 1020 + 2 * 400
= 1020 + 800
= 1820
Calculate the address of any element in the 2-D array:
The 2-dimensional array can be defined as an array of arrays. The 2-Dimensional arrays are
organized as matrices which can be represented as the collection of rows and columns as
array[M][N] where M is the number of rows and N is the number of columns.
To find the address of any element in a 2-Dimensional array there are the following two ways-
 Row Major Order
 Column Major Order
1. Row Major Order:
Row major ordering assigns successive elements, moving across the rows and then down the
next row, to successive memory locations. In simple language, the elements of an array are
stored in a Row-Wise fashion.
To find the address of an element using row-major order, use the following formula:
Address of A[I][J] = B + W * ((I – LR) * N + (J – LC))
I = row index of the element whose address is to be found,
J = column index of the element whose address is to be found,
B = base address,
W = storage size of one element stored in the array (in bytes),
LR = lower bound of the row index of the matrix (assume zero if not given),
LC = lower bound of the column index of the matrix (assume zero if not given),
N = number of columns in the matrix.
Example: Given an array arr[1………10][1………15] with base value 100 and the size of
each element as 1 byte in memory, find the address of arr[8][6] with the help of row-major
order.
Solution:
Given:
Base address B = 100
Storage size of one element W = 1 byte
Row index I = 8
Column index J = 6
Lower bound of the row index LR = 1
Lower bound of the column index LC = 1
Number of columns N = Upper Bound – Lower Bound + 1 = 15 – 1 + 1 = 15
Formula:
Address of A[I][J] = B + W * ((I – LR) * N + (J – LC))
Address of A[8][6] = 100 + 1 * ((8 – 1) * 15 + (6 – 1))
= 100 + 1 * (7 * 15 + 5)
= 100 + 1 * 110
= 210
2. Column Major Order:
If the elements of an array are stored column by column, moving down the first column and
then on to the next column, the array is in column-major order. To find the address of an
element using column-major order, use the following formula:
Address of A[I][J] = B + W * ((J – LC) * M + (I – LR))
I = row index of the element whose address is to be found,
J = column index of the element whose address is to be found,
B = base address,
W = storage size of one element stored in the array (in bytes),
LR = lower bound of the row index of the matrix (assume zero if not given),
LC = lower bound of the column index of the matrix (assume zero if not given),
M = number of rows in the matrix.
Example: Given an array arr[1………10][1………15] with a base value of 100 and the size
of each element as 1 byte in memory, find the address of arr[8][6] with the help of column-
major order.
Solution:
Given:
Base address B = 100
Storage size of one element W = 1 byte
Row index I = 8
Column index J = 6
Lower bound of the row index LR = 1
Lower bound of the column index LC = 1
Number of rows M = Upper Bound – Lower Bound + 1 = 10 – 1 + 1 = 10
Formula used:
Address of A[I][J] = B + W * ((J – LC) * M + (I – LR))
Address of A[8][6] = 100 + 1 * ((6 – 1) * 10 + (8 – 1))
= 100 + 1 * (5 * 10 + 7)
= 100 + 1 * 57
= 157
From the above examples, it can be observed that for the same element two different addresses
are obtained. That is because in row-major order the elements are laid out across a row and then
down to the next row, while in column-major order they are laid out down a column and then on
to the next column. Both answers are correct; the address simply depends on the storage order.
For some elements the two orders happen to give the same address, and for others they differ.
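The two formulas can be checked with a small C helper (a sketch written for this unit, not a standard function); it reproduces the addresses 210 and 157 computed above:

// Element addresses from the row-major and column-major formulas.
#include <stdio.h>

int rowMajor(int B, int W, int I, int J, int LR, int LC, int N)
{
    return B + W * ((I - LR) * N + (J - LC));
}

int colMajor(int B, int W, int I, int J, int LR, int LC, int M)
{
    return B + W * ((J - LC) * M + (I - LR));
}

int main(void)
{
    // arr[1..10][1..15], base 100, element size 1 byte, element arr[8][6]
    printf("Row major:    %d\n", rowMajor(100, 1, 8, 6, 1, 1, 15));  // 210
    printf("Column major: %d\n", colMajor(100, 1, 8, 6, 1, 1, 10));  // 157
    return 0;
}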
Calculate the address of any element in the 3-D Array:
A 3-Dimensional array is a collection of 2-Dimensional arrays. It is specified by using three
subscripts:
 Block size
 Row size
 Column size
More dimensions in an array mean more data can be stored in that array.

To find the address of any element in a 3-Dimensional array there are the following two ways-
 Row Major Order
 Column Major Order
1. Row Major Order:
To find the address of the element using row-major order, use the following formula:
Address of A[i][j][k] = B + W * (M * N * (i – x) + N * (j – y) + (k – z))
Here:
B = Base Address (start address)
W = storage size of one element stored in the array (in bytes)
M = total number of rows
N = total number of columns
i, j, k = block, row and column indices of the element whose address is to be found
x = lower bound of the block index (first subscript)
y = lower bound of the row index
z = lower bound of the column index
Example: Given an array arr[1:9, -4:1, 5:10] with a base value of 400 and the size of each
element as 2 bytes in memory, find the address of element arr[5][-1][8] with the help of row-
major order.
Solution:
Given:
Block index of the element whose address is to be found i = 5
Row index of the element whose address is to be found j = -1
Column index of the element whose address is to be found k = 8
Base address B = 400
Storage size of one element stored in the array (in bytes) W = 2
Lower bound of the block index x = 1
Lower bound of the row index y = -4
Lower bound of the column index z = 5
M (rows) = Upper Bound – Lower Bound + 1 = 1 – (-4) + 1 = 6
N (columns) = Upper Bound – Lower Bound + 1 = 10 – 5 + 1 = 6

Formula used:
Address of A[i][j][k] = B + W * (M * N * (i – x) + N * (j – y) + (k – z))
Solution:
Address of arr[5][-1][8] = 400 + 2 * (6 * 6 * (5 – 1) + 6 * (-1 – (-4)) + (8 – 5))
= 400 + 2 * (144 + 18 + 3)
= 400 + 2 * 165
= 730
2. Column Major Order:
To find the address of the element using column-major order, use the following formula:
Address of A[i][j][k] = B + W * (M * N * (i – x) + M * (k – z) + (j – y))
Here:
B = Base Address (start address)
W = storage size of one element stored in the array (in bytes)
M = total number of rows
N = total number of columns
i, j, k = block, row and column indices of the element whose address is to be found
x = lower bound of the block index (first subscript)
y = lower bound of the row index
z = lower bound of the column index
Example: Given an array arr[1:8, -5:5, -10:5] with a base value of 400 and the size of each
element as 4 bytes in memory, find the address of element arr[3][3][3] with the help of
column-major order.
Solution:
Given:
Block index of the element whose address is to be found i = 3
Row index of the element whose address is to be found j = 3
Column index of the element whose address is to be found k = 3
Base address B = 400
Storage size of one element stored in the array (in bytes) W = 4
Lower bound of the block index x = 1
Lower bound of the row index y = -5
Lower bound of the column index z = -10
M (rows) = Upper Bound – Lower Bound + 1 = 5 + 5 + 1 = 11
N (columns) = Upper Bound – Lower Bound + 1 = 5 + 10 + 1 = 16
Formula used:
Address of A[i][j][k] = B + W * (M * N * (i – x) + M * (k – z) + (j – y))
Solution:
Address of arr[3][3][3] = 400 + 4 * (11 * 16 * (3 – 1) + 11 * (3 – (-10)) + (3 – (-5)))
= 400 + 4 * (352 + 143 + 8)
= 400 + 4 * 503
= 400 + 2012
= 2412
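The same kind of check works for the 3-D formulas (the helper names are illustrative); it reproduces the addresses 730 and 2412 computed above:

// Element addresses from the 3-D row-major and column-major formulas.
#include <stdio.h>

int rowMajor3D(int B, int W, int M, int N,
               int i, int j, int k, int x, int y, int z)
{
    return B + W * (M * N * (i - x) + N * (j - y) + (k - z));
}

int colMajor3D(int B, int W, int M, int N,
               int i, int j, int k, int x, int y, int z)
{
    return B + W * (M * N * (i - x) + M * (k - z) + (j - y));
}

int main(void)
{
    // arr[1:9, -4:1, 5:10], base 400, W = 2, element arr[5][-1][8]
    printf("Row major:    %d\n",
           rowMajor3D(400, 2, 6, 6, 5, -1, 8, 1, -4, 5));    // 730
    // arr[1:8, -5:5, -10:5], base 400, W = 4, element arr[3][3][3]
    printf("Column major: %d\n",
           colMajor3D(400, 4, 11, 16, 3, 3, 3, 1, -5, -10)); // 2412
    return 0;
}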

Sparse Matrices-
A matrix is a two-dimensional data object made of m rows and n columns, therefore having
total m x n values. If most of the elements of the matrix have 0 value, then it is called a sparse
matrix.
Why use a sparse matrix instead of a simple matrix?
 Storage: There are fewer non-zero elements than zeros, so less memory is needed
when only those elements are stored.
 Computing time: Computing time can be saved by logically designing a data
structure that traverses only the non-zero elements.
Example:
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in the
matrix are of no use in most of the cases. So, instead of storing zeroes with non-zero elements,
we only store non-zero elements. This means storing non-zero elements with triples- (Row,
Column, value).
Sparse Matrix Representations can be done in many ways following are two common
representations:
 Array representation
 Linked list representation
Method 1: Using Arrays:
A 2D array is used to represent a sparse matrix, with three rows named as:
 Row: Index of the row where the non-zero element is located
 Column: Index of the column where the non-zero element is located
 Value: Value of the non-zero element located at index (row, column)

Implementation:
// C program for Sparse Matrix Representation
// using Arrays
#include <stdio.h>

int main()
{
    // Assume a 4x5 sparse matrix
    int sparseMatrix[4][5] =
    {
        {0, 0, 3, 0, 4},
        {0, 0, 5, 7, 0},
        {0, 0, 0, 0, 0},
        {0, 2, 6, 0, 0}
    };

    // Count the number of non-zero elements
    int size = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            if (sparseMatrix[i][j] != 0)
                size++;

    // Number of columns in compactMatrix (size) must be
    // equal to the number of non-zero elements in sparseMatrix
    int compactMatrix[3][size];

    // Build the compact matrix: one column per non-zero
    // element, storing (row, column, value)
    int k = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            if (sparseMatrix[i][j] != 0)
            {
                compactMatrix[0][k] = i;
                compactMatrix[1][k] = j;
                compactMatrix[2][k] = sparseMatrix[i][j];
                k++;
            }

    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < size; j++)
            printf("%d ", compactMatrix[i][j]);

        printf("\n");
    }
    return 0;
}
Output
0 0 1 1 3 3
2 4 2 3 1 2
3 4 5 7 2 6
Time Complexity: O(NM), where N is the number of rows in the sparse matrix, and M is the
number of columns in the sparse matrix.
Auxiliary Space: O(NM), where N is the number of rows in the sparse matrix, and M is the
number of columns in the sparse matrix.
Method 2: Using Linked Lists
In the linked list representation, each node has four fields. These four fields are defined as:
 Row: Index of the row where the non-zero element is located
 Column: Index of the column where the non-zero element is located
 Value: Value of the non-zero element located at index (row, column)
 Next node: Address of the next node

// C program for Sparse Matrix Representation
// using Linked Lists
#include <stdio.h>
#include <stdlib.h>

// Node to represent a sparse matrix element
struct Node
{
    int value;
    int row_position;
    int column_position;
    struct Node *next;
};

// Function to create a new node and append it to the list
void create_new_node(struct Node** start, int non_zero_element,
                     int row_index, int column_index)
{
    struct Node *temp, *r;
    temp = *start;
    if (temp == NULL)
    {
        // List is empty: create the first node dynamically
        temp = (struct Node *) malloc(sizeof(struct Node));
        temp->value = non_zero_element;
        temp->row_position = row_index;
        temp->column_position = column_index;
        temp->next = NULL;
        *start = temp;
    }
    else
    {
        // Walk to the end of the list
        while (temp->next != NULL)
            temp = temp->next;

        // Create the new node dynamically and link it
        r = (struct Node *) malloc(sizeof(struct Node));
        r->value = non_zero_element;
        r->row_position = row_index;
        r->column_position = column_index;
        r->next = NULL;
        temp->next = r;
    }
}

// This function prints the contents of the linked list
// starting from start
void PrintList(struct Node* start)
{
    struct Node *temp, *r, *s;
    temp = r = s = start;

    printf("row_position: ");
    while (temp != NULL)
    {
        printf("%d ", temp->row_position);
        temp = temp->next;
    }
    printf("\n");

    printf("column_position: ");
    while (r != NULL)
    {
        printf("%d ", r->column_position);
        r = r->next;
    }
    printf("\n");

    printf("Value: ");
    while (s != NULL)
    {
        printf("%d ", s->value);
        s = s->next;
    }
    printf("\n");
}

// Driver of the program
int main()
{
    // Assume a 4x5 sparse matrix
    int sparseMatrix[4][5] =
    {
        {0, 0, 3, 0, 4},
        {0, 0, 5, 7, 0},
        {0, 0, 0, 0, 0},
        {0, 2, 6, 0, 0}
    };

    /* Start with the empty list */
    struct Node* start = NULL;

    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            // Pass only those values which are non-zero
            if (sparseMatrix[i][j] != 0)
                create_new_node(&start, sparseMatrix[i][j], i, j);

    PrintList(start);
    return 0;
}

Output
row_position: 0 0 1 1 3 3
column_position: 2 4 2 3 1 2
Value: 3 4 5 7 2 6

Searching and Sorting: Linear and Binary Search


Linear Search is defined as a sequential search algorithm that starts at one end and goes
through each element of a list until the desired element is found, otherwise the search
continues till the end of the data set.


How Does Linear Search Algorithm Work?


In Linear Search Algorithm,
 Every element is considered as a potential match for the key and checked for the
same.
 If any element is found equal to the key, the search is successful and the index of
that element is returned.
 If no element is found equal to the key, the search yields “No match found”.
Implementation of Linear Search Algorithm:
Below is the implementation of the linear search algorithm:


// C code to linearly search x in arr[]
#include <stdio.h>

int search(int arr[], int N, int x)
{
    for (int i = 0; i < N; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int x = 10;
    int N = sizeof(arr) / sizeof(arr[0]);

    // Function call
    int result = search(arr, N, x);
    (result == -1)
        ? printf("Element is not present in array")
        : printf("Element is present at index %d", result);
    return 0;
}
Output
Element is present at index 3
Complexity Analysis of Linear Search:
Time Complexity:
 Best Case: In the best case, the key might be present at the first index. So the best
case complexity is O(1)
 Worst Case: In the worst case, the key might be present at the last index i.e.,
opposite to the end from which the search has started in the list. So the worst-case
complexity is O(N) where N is the size of the list.
 Average Case: O(N)
Auxiliary Space: O(1) as except for the variable to iterate through the list, no other variable is
used.
Advantages of Linear Search:
 Linear search can be used irrespective of whether the array is sorted or not. It can be
used on arrays of any data type.
 Does not require any additional memory.
 It is a well-suited algorithm for small datasets.
Drawbacks of Linear Search:
 Linear search has a time complexity of O(N), which in turn makes it slow for large
datasets.
 Not suitable for large arrays.
When to use Linear Search?
 When we are dealing with a small dataset.
 When you are searching for a dataset stored in contiguous memory.

Binary Search is defined as a searching algorithm used on a sorted array in which the search
interval is repeatedly divided in half. The idea of binary search is to use the information that
the array is sorted to reduce the time complexity to O(log N).

Conditions for when to apply Binary Search in a Data Structure:


To apply Binary Search algorithm:
 The data structure must be sorted.
 Access to any element of the data structure takes constant time.
Binary Search Algorithm:
In this algorithm,
 Divide the search space into two halves by finding the middle index “mid”.
 Compare the middle element of the search space with the key.
 If the key is found at the middle element, the process is terminated.
 If the key is not found at the middle element, choose which half will be used as the next
search space.
 If the key is smaller than the middle element, then the left side is used for the
next search.
 If the key is larger than the middle element, then the right side is used for the
next search.
 This process is continued until the key is found or the total search space is
exhausted.
How to Implement Binary Search?
The Binary Search Algorithm can be implemented in the following two ways
 Iterative Binary Search Algorithm
 Recursive Binary Search Algorithm
An implementation of the iterative approach is given below; a recursive sketch follows it.
1. Iterative Binary Search Algorithm:
Here we use a while loop to continue the process of comparing the key and splitting the search
space in two halves.
Implementation of Iterative Binary Search Algorithm:

// C program to implement iterative Binary Search
#include <stdio.h>

// An iterative binary search function
int binarySearch(int arr[], int l, int r, int x)
{
    while (l <= r) {
        int m = l + (r - l) / 2;

        // Check if x is present at mid
        if (arr[m] == x)
            return m;

        // If x is greater, ignore the left half
        if (arr[m] < x)
            l = m + 1;

        // If x is smaller, ignore the right half
        else
            r = m - 1;
    }

    // If we reach here, then the element was not present
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int n = sizeof(arr) / sizeof(arr[0]);
    int x = 10;
    int result = binarySearch(arr, 0, n - 1, x);
    (result == -1)
        ? printf("Element is not present in array")
        : printf("Element is present at index %d", result);
    return 0;
}
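For completeness, here is a minimal sketch of the recursive variant mentioned above (the same logic as the iterative version, with the halving expressed through recursive calls):

// A recursive binary search function. Returns the index of x
// in arr[l..r] if present, otherwise -1.
#include <stdio.h>

int binarySearchRec(int arr[], int l, int r, int x)
{
    if (l > r)
        return -1;                 // search space exhausted

    int m = l + (r - l) / 2;
    if (arr[m] == x)
        return m;                  // key found at mid
    if (arr[m] < x)                // x greater: search the right half
        return binarySearchRec(arr, m + 1, r, x);
    return binarySearchRec(arr, l, m - 1, x);  // search the left half
}

int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("Index of 10: %d\n", binarySearchRec(arr, 0, n - 1, 10));  // 3
    return 0;
}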

Advantages of Binary Search:


 Binary search is faster than linear search, especially for large arrays.
 More efficient than other searching algorithms with a similar time complexity, such
as interpolation search or exponential search.
 Binary search is well-suited for searching large datasets that are stored in external
memory, such as on a hard drive or in the cloud.
Drawbacks of Binary Search:
 The array should be sorted.
 Binary search requires that the data structure being searched be stored in contiguous
memory locations.
 Binary search requires that the elements of the array be comparable, meaning that
they must be able to be ordered.
Applications of Binary Search:
 Binary search can be used as a building block for more complex algorithms used in
machine learning, such as algorithms for training neural networks or finding the
optimal hyperparameters for a model.
 It can be used for searching in computer graphics such as algorithms for ray tracing
or texture mapping.
 It can be used for searching a database.

Selection Sort: -
Selection sort is a simple and efficient sorting algorithm that works by repeatedly selecting the
smallest (or largest) element from the unsorted portion of the list and moving it to the sorted
portion of the list.
The algorithm repeatedly selects the smallest (or largest) element from the unsorted portion of
the list and swaps it with the first element of the unsorted part. This process is repeated for the
remaining unsorted portion until the entire list is sorted.

// C program for implementation of selection sort
#include <stdio.h>

void swap(int *xp, int *yp)
{
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

void selectionSort(int arr[], int n)
{
    int i, j, min_idx;

    // One by one move the boundary of the unsorted subarray
    for (i = 0; i < n - 1; i++)
    {
        // Find the minimum element in the unsorted array
        min_idx = i;
        for (j = i + 1; j < n; j++)
            if (arr[j] < arr[min_idx])
                min_idx = j;

        // Swap the found minimum element with the first element
        if (min_idx != i)
            swap(&arr[min_idx], &arr[i]);
    }
}

/* Function to print an array */
void printArray(int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

// Driver program to test above functions
int main()
{
    int arr[] = {64, 25, 12, 22, 11};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

Output
Sorted array:
11 12 22 25 64
Complexity Analysis of Selection Sort
Time Complexity: The time complexity of Selection Sort is O(N^2), as there are two nested
loops:
 One loop to select an element of the array one by one = O(N)
 Another loop to compare that element with every other array element = O(N)
 Therefore the overall complexity = O(N) * O(N) = O(N^2)
Auxiliary Space: O(1), as the only extra memory used is for temporary variables while
swapping two values in the array. Selection sort never makes more than O(N) swaps and can
be useful when memory writes are costly.
Advantages of Selection Sort Algorithm
 Simple and easy to understand.
 Works well with small datasets.
Disadvantages of the Selection Sort Algorithm
 Selection sort has a time complexity of O(N^2) in the worst and average case.
 Does not work well on large datasets.
 Does not preserve the relative order of items with equal keys which means it is not
stable.

Bubble Sort: -
Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent
elements if they are in the wrong order. This algorithm is not suitable for large data sets as its
average and worst-case time complexity is quite high.
Bubble Sort Algorithm
In Bubble Sort algorithm,
 Traverse from the left and compare adjacent elements; the larger one is placed on the
right side.
 In this way, the largest element is moved to the rightmost end at first.
 This process is then continued to find the second largest element and place it, and so
on, until the data is sorted.
Implementation of Bubble Sort
Below is the implementation of the bubble sort. It can be optimized by stopping the algorithm
if the inner loop didn’t cause any swap.

// Optimized implementation of Bubble sort
#include <stdbool.h>
#include <stdio.h>

void swap(int* xp, int* yp)
{
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

// An optimized version of Bubble Sort
void bubbleSort(int arr[], int n)
{
    int i, j;
    bool swapped;
    for (i = 0; i < n - 1; i++) {
        swapped = false;
        for (j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(&arr[j], &arr[j + 1]);
                swapped = true;
            }
        }

        // If no two elements were swapped by the inner loop,
        // then the array is already sorted and we can break
        if (swapped == false)
            break;
    }
}

// Function to print an array
void printArray(int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", arr[i]);
}

// Driver program to test above functions
int main()
{
    int arr[] = { 64, 34, 25, 12, 22, 11, 90 };
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

Output
Sorted array:
11 12 22 25 34 64 90
Complexity Analysis of Bubble Sort:
Time Complexity: O(N^2)
Auxiliary Space: O(1)
Advantages of Bubble Sort:
 Bubble sort is easy to understand and implement.
 It does not require any additional memory space.
 It is a stable sorting algorithm, meaning that elements with the same key value
maintain their relative order in the sorted output.
Disadvantages of Bubble Sort:
 Bubble sort has a time complexity of O(N^2), which makes it very slow for large data
sets.
 Bubble sort is a comparison-based sorting algorithm, which means that it requires a
comparison operator to determine the relative order of elements in the input data
set. This can limit the efficiency of the algorithm in certain cases.

Insertion Sort: -
Insertion sort is a simple sorting algorithm that works similar to the way you sort playing
cards in your hands. The array is virtually split into a sorted and an unsorted part. Values from
the unsorted part are picked and placed at the correct position in the sorted part.
Insertion Sort Algorithm
To sort an array of size N in ascending order iterate over the array and compare the current
element (key) to its predecessor, if the key element is smaller than its predecessor, compare it to
the elements before. Move the greater elements one position up to make space for the swapped
element.
Working of Insertion Sort algorithm
Consider an example: arr[]: {12, 11, 13, 5, 6}
12 11 13 5 6

First Pass:
 Initially, the first two elements of the array are compared in insertion sort.

12 11 13 5 6

 Here, 12 is greater than 11 hence they are not in the ascending order and 12 is not
at its correct position. Thus, swap 11 and 12.
 So, for now 11 is stored in a sorted sub-array.

11 12 13 5 6

Second Pass:
 Now, move to the next two elements and compare them

11 12 13 5 6
 Here, 13 is greater than 12; both elements are already in ascending order, so no
swapping occurs. 12 is also stored in the sorted sub-array along with 11.
Third Pass:
 Now, two elements are present in the sorted sub-array which are 11 and 12
 Moving forward to the next two elements which are 13 and 5

11 12 13 5 6

 Both 5 and 13 are not present at their correct place so swap them

11 12 5 13 6

 After swapping, elements 12 and 5 are not sorted, thus swap again

11 5 12 13 6

 Here, again 11 and 5 are not sorted, hence swap again

5 11 12 13 6

 Here, 5 is at its correct position


Fourth Pass:
 Now, the elements which are present in the sorted sub-array are 5, 11 and 12
 Moving to the next two elements 13 and 6

5 11 12 13 6

 Clearly, they are not sorted, thus perform swap between both

5 11 12 6 13

 Now, 6 is smaller than 12, hence, swap again

5 11 6 12 13

 Here, also swapping makes 11 and 6 unsorted hence, swap again

5 6 11 12 13

 Finally, the array is completely sorted.


Implementation of Insertion Sort:
// C program for insertion sort
#include <stdio.h>

/* Function to sort an array using insertion sort */
void insertionSort(int arr[], int n)
{
    int i, key, j;
    for (i = 1; i < n; i++) {
        key = arr[i];
        j = i - 1;

        /* Move elements of arr[0..i-1] that are
           greater than key one position ahead
           of their current position */
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

// A utility function to print an array of size n
void printArray(int arr[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

/* Driver program to test insertion sort */
int main()
{
    int arr[] = { 12, 11, 13, 5, 6 };
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    printArray(arr, n);

    return 0;
}

Output
5 6 11 12 13
Complexity Analysis of Insertion Sort:
Time Complexity of Insertion Sort
 The worst-case time complexity of the Insertion sort is O(N^2)
 The average case time complexity of the Insertion sort is O(N^2)
 The time complexity of the best case is O(N).
Space Complexity of Insertion Sort
The auxiliary space complexity of Insertion Sort is O(1)
Characteristics of Insertion Sort
 This algorithm is one of the simplest algorithms with a simple implementation
 Basically, Insertion sort is efficient for small data values
 Insertion sort is adaptive in nature, i.e. it is appropriate for data sets that are already
partially sorted.

Merge Sort: -
Merge sort is defined as a sorting algorithm that works by dividing an array into smaller
subarrays, sorting each subarray, and then merging the sorted subarrays back together to form
the final sorted array.
In simple terms, we can say that the process of merge sort is to divide the array into two
halves, sort each half, and then merge the sorted halves back together. This process is repeated
until the entire array is sorted.

How does Merge Sort work?


Merge sort is a recursive algorithm that continuously splits the array in half until it cannot be
further divided i.e., the array has only one element left (an array with one element is always
sorted). Then the sorted subarrays are merged into one sorted array.

// C program for Merge Sort
#include <stdio.h>
#include <stdlib.h>

// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    // Create temp arrays
    int L[n1], R[n2];

    // Copy data to temp arrays L[] and R[]
    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    // Merge the temp arrays back into arr[l..r]
    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        }
        else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy the remaining elements of L[], if there are any
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy the remaining elements of R[], if there are any
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

// l is the left index and r is the right index of the
// sub-array of arr to be sorted
void mergeSort(int arr[], int l, int r)
{
    if (l < r) {
        int m = l + (r - l) / 2;

        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);

        merge(arr, l, m, r);
    }
}

// Function to print an array
void printArray(int A[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

// Driver code
int main()
{
    int arr[] = { 12, 11, 13, 5, 6, 7 };
    int arr_size = sizeof(arr) / sizeof(arr[0]);

    printf("Given array is \n");
    printArray(arr, arr_size);

    mergeSort(arr, 0, arr_size - 1);

    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}

Output
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13

Complexity Analysis of Merge Sort

Time Complexity: O(N log(N)). Merge Sort is a recursive algorithm, and its time complexity
can be expressed as the following recurrence relation:
T(n) = 2T(n/2) + θ(n)
The above recurrence can be solved either using the Recurrence Tree method or the Master
method. It falls into case II of the Master method, and the solution of the recurrence is
θ(N log(N)). The time complexity of Merge Sort is θ(N log(N)) in all 3 cases (worst, average,
and best), as merge sort always divides the array into two halves and takes linear time to merge
the two halves.
Auxiliary Space: O(N). In merge sort, all elements are copied into an auxiliary array, so N
auxiliary space is required.
Applications of Merge Sort:
 Sorting large datasets: Merge sort is particularly well-suited for sorting large
datasets due to its guaranteed worst-case time complexity of O(n log n).
 External sorting: Merge sort is commonly used in external sorting, where the data
to be sorted is too large to fit into memory.
 Custom sorting: Merge sort can be adapted to handle different input distributions,
such as partially sorted, nearly sorted, or completely unsorted data.
 Inversion Count Problem
Advantages of Merge Sort:
 Stability: Merge sort is a stable sorting algorithm, which means it maintains the
relative order of equal elements in the input array.
 Guaranteed worst-case performance: Merge sort has a worst-case time
complexity of O(N log N), which means it performs well even on large datasets.
 Parallelizable: Merge sort is a naturally parallelizable algorithm, which means it
can be easily parallelized to take advantage of multiple processors or threads.
Drawbacks of Merge Sort:
 Space complexity: Merge sort requires additional memory to store the merged sub-
arrays during the sorting process.
 Not in-place: Merge sort is not an in-place sorting algorithm, which means it
requires additional memory to store the sorted data. This can be a disadvantage in
applications where memory usage is a concern.
 Not always optimal for small datasets: For small datasets, merge sort has higher
overhead than some other sorting algorithms, such as insertion sort. This can
result in slower performance for very small datasets.

Elementary Comparison of Searching and Sorting Algorithms:-

Hashing: Hash Table, Hash Functions


What is a Hash Table?
A hash table is a data structure used to insert, look up, and remove key-value pairs quickly.
It operates on the hashing concept, where each key is translated by a hash function into a
distinct index in an array. The index serves as the storage location for the matching value.
In simple words, it maps keys to values.
What is Load Factor?
A hash table's load factor is the ratio of the number of elements stored in the table to the
table's size: load factor = n / m, where n is the number of stored elements and m is the
number of slots. If the load factor is high, the table becomes crowded, leading to more
collisions and longer search times. An ideal load factor can be maintained with the use of a
good hash function and proper table resizing.
What is a Hash Function?
A function that translates keys to array indices is known as a hash function. A good hash
function should distribute the keys evenly across the array to reduce collisions and ensure
quick lookup speeds.
•	Integer universe assumption: The keys are assumed to be integers within a certain
range. This enables the use of basic hashing operations like division or
multiplication hashing.
•	Hashing by division: This straightforward technique uses the remainder of the key
divided by the array's size as the index. It performs well when the array size is a
prime number and the keys are evenly distributed.
•	Hashing by multiplication: This technique multiplies the key by a constant between
0 and 1 and takes the fractional part of the result. The index is then obtained by
multiplying the fractional part by the array's size. It also works well when the keys
are evenly distributed.
•	Hashing is the process of generating a value from a text or a list of numbers using a
mathematical function known as a hash function.
•	A hash function converts a given numeric or alphanumeric key to a small practical
integer value. The mapped integer value is used as an index in the hash table. In
simple terms, a hash function maps a significant number or string to a small integer
that can be used as the index in the hash table.
•	The pair is of the form (key, value), where for a given key, one can find a value using
some kind of a "function" that maps keys to values. The key for a given object can be
calculated using a function called a hash function. For example, given an array A, if i is
the key, then we can find the value by simply looking up A[i].
Types of Hash Functions
There are many hash functions that use numeric or alphanumeric keys. This section focuses on
discussing the following hash functions:
1. Division Method.
2. Mid Square Method.
3. Folding Method.
4. Multiplication Method.
Let’s begin discussing these methods in detail.
1. Division Method:
This is the simplest method to generate a hash value. The hash function divides the key value
k by M and uses the remainder obtained.
Formula:
h(K) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
It is best if M is a prime number, as that helps the keys distribute more uniformly. The hash
value depends only on the remainder of a division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
Pros:
1. This method is quite good for any value of M.
2. The division method is very fast since it requires only a single division operation.
Cons:
1. This method can lead to clustering, since consecutive keys map to consecutive
hash values in the hash table.
2. Extra care must sometimes be taken in choosing the value of M.
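A minimal sketch of the division method in C (the keys and table sizes match the examples
above; the function name is an illustrative choice):

#include <stdio.h>

// Division method: h(k) = k mod M
unsigned int hash_division(unsigned int k, unsigned int M)
{
    return k % M;
}

int main(void)
{
    printf("%u\n", hash_division(12345, 95)); // prints 90
    printf("%u\n", hash_division(1276, 11));  // prints 0
    return 0;
}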
2. Mid Square Method:
The mid-square method is a very good hashing method. It involves two steps to compute the
hash value:
1. Square the value of the key k, i.e. compute k².
2. Extract the middle r digits of the square as the hash value.
Formula:
h(K) = middle r digits of k²
Here,
k is the key value.
The value of r can be decided based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. So r = 2, because two digits are required to
map the key to a memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60.
Pros:
1. The performance of this method is good, as most or all digits of the key value
contribute to the result. This is because all digits in the key contribute to generating
the middle digits of the squared result.
2. The result is not dominated by the distribution of the top or bottom digits of the
original key value.
Cons:
1. The size of the key is one of the limitations of this method: if the key is large, its
square will have roughly twice as many digits and may overflow fixed-size integers.
2. Another disadvantage is that collisions can still occur, though they can be reduced.
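A minimal sketch of the mid-square method in C, assuming a table of 100 locations (r = 2)
and a key whose square has four digits, as in the example above:

#include <stdio.h>

// Mid-square method: square the key and keep the middle r = 2 digits.
// The digit positions chosen here assume a 4-digit square (e.g. 60 * 60 = 3600).
unsigned int hash_midsquare(unsigned int k)
{
    unsigned int sq = k * k; // 60 * 60 = 3600
    return (sq / 10) % 100;  // middle two digits of 3600 -> 60
}

int main(void)
{
    printf("%u\n", hash_midsquare(60)); // prints 60
    return 0;
}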
3. Digit Folding Method:
This method involves two steps:
1. Divide the key value k into a number of parts, i.e. k1, k2, k3, ..., kn, where each part
has the same number of digits except for the last part, which can have fewer digits
than the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:
k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(K) = s
Here,
s is obtained by adding the parts of the key k.
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
Note:
The number of digits in each part varies depending upon the size of the hash table. For
example, if the size of the hash table is 100, then each part must have two digits except for
the last part, which can have fewer digits.
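A minimal sketch of the digit folding method in C for a table of size 100: the key's digits
are split into two-digit parts from the left, matching the k = 12345 example above:

#include <stdio.h>

// Digit folding: split the key's digits into two-digit groups from the
// left (the last group may be shorter) and add the groups.
unsigned int hash_folding(unsigned int k)
{
    char buf[16];
    int len = sprintf(buf, "%u", k); // "12345", len = 5
    unsigned int sum = 0;
    for (int i = 0; i < len; i += 2) {
        unsigned int part = buf[i] - '0';
        if (i + 1 < len)
            part = part * 10 + (buf[i + 1] - '0');
        sum += part;                 // 12 + 34 + 5 = 51
    }
    return sum;
}

int main(void)
{
    printf("%u\n", hash_folding(12345)); // prints 51
    return 0;
}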
4. Multiplication Method
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value by A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table, i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in
step 4.
Formula:
h(K) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345 * 0.357840 mod 1) ]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
Pros:
The advantage of the multiplication method is that it can work with any constant A between 0
and 1, although some values tend to give better results than others (Knuth suggests
A ≈ (√5 − 1)/2 ≈ 0.618).
Cons:
The multiplication method is generally most suitable when the table size is a power of two,
in which case the whole process of computing the index from the key can be made very fast.
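A minimal sketch of the multiplication method in C, using the A and M values from the
example above:

#include <stdio.h>
#include <math.h>

// Multiplication method: h(k) = floor(M * (k*A mod 1))
unsigned int hash_multiplication(unsigned int k, unsigned int M, double A)
{
    double prod = k * A;              // 12345 * 0.357840 = 4417.5348
    double frac = prod - floor(prod); // fractional part: 0.5348
    return (unsigned int)(M * frac);  // floor(100 * 0.5348) = 53
}

int main(void)
{
    printf("%u\n", hash_multiplication(12345, 100, 0.357840)); // prints 53
    return 0;
}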
Commonly used hash functions:
Hash functions are widely used in computer science and cryptography for a variety of
purposes, including data integrity, digital signatures, password storage, and more.
There are many types of hash functions, each with its own strengths and weaknesses. Here are
a few of the most common types:
1. SHA (Secure Hash Algorithm): SHA is a family of cryptographic hash functions designed
by the National Security Agency (NSA) in the United States. The most widely used SHA
algorithms are SHA-1, SHA-2, and SHA-3. Here’s a brief overview of each:
•	SHA-1: SHA-1 is a 160-bit hash function that was widely used for digital signatures
and other applications. However, it is no longer considered secure due to known
vulnerabilities.
•	SHA-2: SHA-2 is a family of hash functions that includes SHA-224, SHA-256,
SHA-384, and SHA-512. These functions produce hash values of 224, 256, 384,
and 512 bits, respectively. SHA-2 is widely used in security protocols such as
SSL/TLS and is considered secure.
•	SHA-3: SHA-3 is the latest member of the SHA family; its underlying design,
Keccak, was selected as the winner of the NIST hash function competition in 2012.
It has a different internal structure than SHA-2 and produces hash values of 224,
256, 384, and 512 bits.
2. CRC (Cyclic Redundancy Check): CRC is a non-cryptographic hash function used
primarily for error detection in data transmission. It is fast and efficient but is not suitable for
security purposes. The basic idea behind CRC is to append a fixed-length check value, or
checksum, to the end of a message. This checksum is calculated based on the contents of the
message using a mathematical algorithm, and is then transmitted along with the message.
When the message is received, the receiver can recalculate the checksum using the same
algorithm, and compare it with the checksum transmitted with the message. If the two
checksums match, the receiver can be reasonably certain that the message was not corrupted
during transmission.
The specific algorithm used for CRC depends on the application and the desired level of error
detection. Some common CRC algorithms include CRC-16, CRC-32, and CRC-CCITT.
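The append-recompute-compare flow can be sketched in C using the common bitwise CRC-32
variant (reflected form, polynomial 0xEDB88320); the message string is an arbitrary example:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

// Bit-by-bit CRC-32 (reflected form, polynomial 0xEDB88320), as commonly
// used for error detection; a sketch, not tuned for speed.
uint32_t crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (-(crc & 1u)));
    }
    return ~crc;
}

int main(void)
{
    const char *msg = "hello"; // example message
    uint32_t sent = crc32((const unsigned char *)msg, strlen(msg));

    // Receiver recomputes the checksum and compares it with the one sent
    uint32_t recv = crc32((const unsigned char *)msg, strlen(msg));
    printf(sent == recv ? "message intact\n" : "message corrupted\n");
    return 0;
}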
3. Murmur Hash: Murmur Hash is a fast and efficient non-cryptographic hash function
designed for use in hash tables and other data structures. It is not suitable for security purposes
as it is vulnerable to collision attacks.
4. BLAKE2: BLAKE2 is a cryptographic hash function designed to be fast and secure. It is an
improved version of BLAKE, a finalist in the SHA-3 competition, and is widely used in
applications that require high-speed hashing, such as cryptocurrency mining.
BLAKE2 is available in two versions: BLAKE2b and BLAKE2s. BLAKE2b is optimized for
64-bit platforms and produces hash values of up to 512 bits, while BLAKE2s is optimized for
8- to 32-bit platforms and produces hash values of up to 256 bits.
5. Argon2: Argon2 is a memory-hard password hashing function designed to be resistant to
brute-force attacks. It is widely used for password storage and won the Password Hashing
Competition in 2015. The main goal of Argon2 is to make it difficult for attackers to crack
passwords using techniques such as brute-force attacks or dictionary attacks. It achieves
this with a computationally intensive algorithm that makes it difficult for attackers to
perform large numbers of password guesses in a short amount of time.
Argon2 has several key features that make it a strong choice for password hashing and key
derivation:
•	Resistance to parallel attacks: Argon2 is designed to be resistant to parallel attacks,
meaning that it is difficult for attackers to use multiple processing units, such as
GPUs or ASICs, to speed up password cracking.
•	Memory-hardness: Argon2 is designed to be memory-hard, meaning that it requires
a large amount of memory to compute the hash function. This makes it more
difficult for attackers to use specialized hardware to crack passwords.
•	Customizable: Argon2 is highly customizable and allows users to adjust parameters
such as the memory usage, the number of iterations, and the output length to meet
their specific security requirements.
•	Resistance to side-channel attacks: Argon2 is designed to be resistant to side-channel
attacks, such as timing attacks or power analysis attacks, that could be used to
extract information about the password being hashed.
6. MD5 (Message Digest 5): MD5 is a widely-used cryptographic hash function that produces
a 128-bit hash value. It is fast and efficient but is no longer recommended for security purposes
due to known vulnerabilities. The basic idea behind MD5 is to take an input message of any
length, and produce a fixed-length output, known as the hash value or message digest. This
hash value is unique to the input message, and is generated using a mathematical algorithm
that involves a series of logical operations, such as bitwise operations, modular arithmetic, and
logical functions.
MD5 is widely used in a variety of applications, including digital signatures, password storage,
and data integrity checks. However, it has been shown to have weaknesses that make it
vulnerable to attacks. In particular, it is possible to generate two different messages with the
same MD5 hash value, a vulnerability known as a collision attack.
There are many other types of hash functions, each with its own unique features and
applications. The choice of hash function depends on the specific requirements of the
application, such as speed, security, and memory usage.
Searching Algorithms are designed to check for an element or retrieve an element from
any data structure where it is stored. Based on the type of search operation, these algorithms
are generally classified into two categories:

1. Sequential Search: Sequential Search is the basic and simple searching approach.
It starts at the beginning of the list or array, traverses it sequentially, and checks
every element. Linear Search is an example of Sequential Search.
A Linear Search checks each element of the array one by one, without jumping over
any item. It searches for the element in the array until a match is found. If a match
is found, it returns the index of the item; otherwise it returns -1. The worst-case
complexity of the Linear Search algorithm is O(N), where N is the total number of
elements in the list.
How Linear Search works:
Let's understand how linear search works with an example. Suppose the task is to
search for an element x in the array. Start from the leftmost element of the array and
compare x with each element one by one. If x matches an element, return that index;
otherwise return -1, as the sketch below illustrates:
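A minimal linear search in C (the array contents and target are illustrative):

#include <stdio.h>

// Linear search: returns the index of x in arr, or -1 if x is absent
int linearSearch(int arr[], int n, int x)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

int main(void)
{
    int arr[] = { 10, 50, 30, 70, 80, 60 };
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("%d\n", linearSearch(arr, n, 30)); // prints 2
    return 0;
}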

2. Interval Search: These algorithms are designed to search for a given element in
sorted data structures. These types of searching algorithms are much more efficient
than a Linear Search. Binary Search is an example of Interval Search.
A Binary Search looks for the given element in the array by repeatedly dividing the
array into two halves. First it finds the middle element of the array and compares the
given element with it; if the element to be searched is less than the item in the
middle of the array, then the given element can only lie in the left subarray,
otherwise it lies in the right subarray. It repeats this check until the element is
found or the interval is empty. The worst-case complexity of the Binary Search
algorithm is O(log N).
How Binary Search works:
Let's understand how binary search works with an example. Suppose we have to
search for an element x in the array. First we find the middle element of the array
and compare x with it. If x matches the middle element, we return the mid index;
if x is greater than the mid element, then x can only lie in the right half after the
mid element, so we recur for the right half; otherwise we recur for the left half.
This repeats until the element is found, as the sketch below illustrates:
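A minimal iterative binary search in C on a sorted array (the array and target are
illustrative):

#include <stdio.h>

// Binary search: returns the index of x in the sorted array arr,
// or -1 if x is absent
int binarySearch(int arr[], int n, int x)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // avoids overflow of (low + high)
        if (arr[mid] == x)
            return mid;
        else if (arr[mid] < x)
            low = mid + 1;   // x can only lie in the right half
        else
            high = mid - 1;  // x can only lie in the left half
    }
    return -1;
}

int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("%d\n", binarySearch(arr, n, 10)); // prints 3
    return 0;
}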

Sorting Algorithm: A Sorting Algorithm is used to arrange the data of a list or array into
some specific order, which can be numerical or lexicographical. For example, a list of
characters can be sorted in increasing order of their ASCII values: a character with a lower
ASCII value is placed before a character with a higher ASCII value. Bubble Sort, Insertion
Sort, Selection Sort, Merge Sort, Quick Sort, Heap Sort, Radix Sort, etc. are examples of
Sorting Algorithms.
There are two different categories in sorting:

•	Internal Sorting: When all the data fits in memory at once, the sorting is called
internal sorting.
•	External Sorting: When all the data that needs to be sorted cannot be placed in
memory at one time, the sorting is called external sorting. External sorting is used
for massive amounts of data. Merge Sort and its variations are typically used for
external sorting, with external storage such as a hard disk used to hold the data.
Difference between Searching and Sorting Algorithms:

1. Searching Algorithm: Searching algorithms are designed to retrieve an element from any
data structure where it is used.
   Sorting Algorithm: A sorting algorithm is used to arrange the data of a list or array into
some specific order.

2. Searching Algorithm: These algorithms are generally classified into two categories, i.e.
Sequential Search and Interval Search.
   Sorting Algorithm: There are two different categories in sorting: Internal and External
Sorting.

3. Searching Algorithm: The worst-case time complexity of searching algorithms is O(N).
   Sorting Algorithm: The worst-case time complexity of many sorting algorithms like Bubble
Sort, Insertion Sort, Selection Sort, and Quick Sort is O(N²).

4. Searching Algorithm: There are no stable and unstable searching algorithms.
   Sorting Algorithm: Bubble Sort, Insertion Sort, Merge Sort, etc. are stable sorting
algorithms, whereas Quick Sort, Heap Sort, etc. are unstable sorting algorithms.

5. Searching Algorithm: Linear Search and Binary Search are examples of searching
algorithms.
   Sorting Algorithm: Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, Quick Sort,
etc. are examples of sorting algorithms.

Choosing a hash function:

Selecting a good hash function depends on the properties of the keys and the intended
functionality of the hash table. Using a function that evenly distributes the keys and reduces
collisions is crucial.
Criteria on which a hash function is chosen:
•	To keep the number of collisions to a minimum, a good hash function should
distribute the keys uniformly throughout the hash table. This implies that, across all
pairs of keys, the likelihood of two keys hashing to the same position in the table
should be roughly constant.
•	To enable fast hashing and key retrieval, the hash function should be
computationally efficient.
•	It ought to be challenging to deduce the key from its hash value, so that attempts to
guess the key using the hash value are unlikely to succeed.
•	A hash function should be flexible enough to adjust as the data being hashed
changes. For instance, the hash function should continue to perform properly if the
keys being hashed change in size or format.
Collision resolution techniques:
Collisions happen when two or more keys map to the same array index. Separate chaining and
open addressing (including linear probing, quadratic probing, and double hashing) are common
techniques for resolving collisions.
•	Open addressing: Collisions are handled by looking for the next empty slot in the
table. If the first slot is already taken, the probe sequence visits subsequent slots
until an empty one is found. There are various ways to implement this approach,
including linear probing, quadratic probing, and double hashing.
•	Separate chaining: In separate chaining, each slot in the hash table holds a linked
list of the items that hash to it. If two keys hash to the same slot, both are included
in that slot's linked list. This method is rather simple to implement and can handle
many collisions.
•	Robin Hood hashing: To reduce the length of probe sequences, collisions in Robin
Hood hashing are resolved by swapping keys. When a new key hashes to an
already-occupied slot, the algorithm compares how far each of the two keys is from
its ideal slot. If the existing key is closer to its ideal slot than the new key is to its
own, the new key takes the slot and the existing key continues probing. This tends
to reduce collisions and the average probe length.
Dynamic resizing:
This feature enables the hash table to expand or contract in response to changes in the number
of elements contained in the table. This keeps the load factor near an ideal value and lookup
times fast.
Implementations of Hash Table
Python, Java, C++, and Ruby are just a few of the programming languages that support hash
tables. They are usually included in the standard library, and can also be implemented as a
custom data structure.
Example- Count characters in the String “geeksforgeeks”.
In this example, we use a hashing technique to store the count of each character in the string.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Initialize a string
    string s = "geeksforgeeks";

    // Use an array to store the count of each alphabet
    // by mapping the character to an index value
    int arr[26] = { 0 };

    // Store the counts
    for (int i = 0; i < s.size(); i++) {
        arr[s[i] - 'a']++;
    }

    // Search for the count of a character
    char ch = 'e';

    // Get the count
    cout << "The count of ch is " << arr[ch - 'a'] << endl;
    return 0;
}

Output
The count of ch is 4
Complexity Analysis of a Hash Table:
For lookup, insertion, and deletion operations, hash tables have an average-case time
complexity of O(1). Yet, these operations may, in the worst case, require O(n) time, where n is
the number of elements in the table.
Applications of Hash Table:
•	Hash tables are frequently used for indexing and searching massive volumes of
data. A search engine might use a hash table to store the web pages that it has
indexed.
•	Data is usually cached in memory via hash tables, enabling rapid access to
frequently used information.
•	Hash functions are frequently used in cryptography to create digital signatures,
validate data, and guarantee data integrity.
•	Hash tables can be used for implementing database indexes, enabling fast access to
data based on key values.
Collision Resolution: The hash function is used to find the index of the array; the hash value
is used to create an index for the key in the hash table. The hash function may return the same
hash value for two or more keys. When two or more keys have the same hash value, a collision
happens. To handle such collisions, we use collision resolution techniques.
Collision Resolution Techniques
There are two types of collision resolution techniques:
•	Separate chaining (open hashing)
•	Open addressing (closed hashing)
Separate chaining: This method turns the slot where the collision happened into a linked
list and appends the new key to that list. The name comes from the way this linked list of
colliding keys resembles a chain (a short sketch follows the lists below). It is most
frequently used when we are unsure of the number of keys that will be added or removed.
Time complexity
•	Its worst-case complexity for searching is O(n).
•	Its worst-case complexity for deletion is O(n).
Advantages of separate chaining
•	It is easy to implement.
•	The hash table never fills up, so more elements can always be added to a chain.
•	It is less sensitive to the choice of hash function.
Disadvantages of separate chaining
•	The cache performance of chaining is not good.
•	Memory wastage can be high in this method.
•	It requires extra space for the links between elements.
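A minimal sketch of separate chaining in C; the table size, hash function (key mod table
size), and keys are illustrative choices:

#include <stdio.h>
#include <stdlib.h>

#define SIZE 7

// Each slot holds a singly linked list of the keys that hash to it
struct Node {
    int key;
    struct Node *next;
};

struct Node *table[SIZE]; // all slots start out as NULL

void insert(int key)
{
    int idx = key % SIZE;
    struct Node *n = malloc(sizeof(struct Node));
    n->key = key;
    n->next = table[idx]; // prepend to the chain at this slot
    table[idx] = n;
}

int search(int key)
{
    for (struct Node *p = table[key % SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

int main(void)
{
    insert(10); // 10 % 7 = 3
    insert(17); // 17 % 7 = 3 -> chained in the same slot
    printf("%d %d\n", search(17), search(5)); // prints 1 0
    return 0;
}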
Open addressing: Open addressing is employed as a collision-resolution technique in which no
key is stored anywhere other than the hash table itself. As a result, the size of the hash
table must always be at least the number of keys. It is also known as closed hashing.
The following techniques are used in open addressing, and a short sketch of linear probing is
given below:
•	Linear probing
•	Quadratic probing
•	Double hashing
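A minimal sketch of open addressing with linear probing in C; the table size, hash function
(key mod table size), and keys are illustrative choices:

#include <stdio.h>

#define SIZE 7
#define EMPTY -1

// On a collision, scan forward (wrapping around) to the next empty slot
int table[SIZE];

void insert(int key)
{
    int idx = key % SIZE;
    while (table[idx] != EMPTY) // slot taken: probe the next one
        idx = (idx + 1) % SIZE;
    table[idx] = key;
}

int search(int key)
{
    int idx = key % SIZE;
    for (int probes = 0; probes < SIZE; probes++) {
        if (table[idx] == key)
            return idx;
        if (table[idx] == EMPTY)
            return -1;              // hit an empty slot: key is absent
        idx = (idx + 1) % SIZE;
    }
    return -1;
}

int main(void)
{
    for (int i = 0; i < SIZE; i++)
        table[i] = EMPTY;
    insert(10); // 10 % 7 = 3
    insert(17); // 17 % 7 = 3 -> collision, goes to slot 4
    insert(24); // 24 % 7 = 3 -> collision, goes to slot 5
    printf("17 found at slot %d\n", search(17)); // prints 4
    return 0;
}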
