
DATA STRUCTURES (23CSH-241)

Unit 1 NOTES

Chapter 1 : Introduction to Data Structures

1. Concept of Data and Information

● Data: Raw, unprocessed facts and figures without any context. Data can be numbers,
characters, symbols, or even sounds. Data does not convey any meaning on its own.
For example, "23", "Blue", "X" are pieces of data.
● Information: Data that has been processed, organized, or structured in a way that
provides meaning or context. Information is derived from data and is useful for
decision-making. For example, "John's age is 23", or "The sky is blue" are pieces of
information because they give context to the raw data.
Key Differences:
○ Data is the raw input that is processed to derive information.
○ Data is unorganized and meaningless on its own, while information is
processed and meaningful.

2. Introduction to Data Structures

● Data Structures: A data structure is a particular way of organizing data in a computer so
that it can be used efficiently. Different data structures are suited for different types of
applications, and some are highly specialized to specific tasks.
Why Use Data Structures?
○ Efficient Data Management: Data structures help manage large amounts of data
efficiently, both in terms of time and space.
○ Better Resource Utilization: Appropriate data structures enable better memory
and processor usage.
○ Simplified Problem Solving: They provide a clear, organized way of thinking
about and solving a problem.
○ Reusability and Abstraction: They enable code reuse and abstract complexity.

3. Types of Data Structures

Data structures can be broadly categorized into two types:

A. Linear Data Structures

● Definition: A data structure where data elements are arranged sequentially or in a linear
order, where each element is connected to its previous and next element.
● Examples:
1. Array: A collection of elements identified by index or key, all of which are stored
in contiguous memory locations. Arrays have a fixed size and are used for storing
multiple items of the same type.
2. Linked List: A collection of nodes where each node contains data and a
reference (link) to the next node in the sequence. Linked lists allow for dynamic
memory allocation.
3. Stack: A linear data structure that follows the Last In, First Out (LIFO) principle.
Elements are added and removed only from one end, called the "top" of the
stack.
4. Queue: A linear data structure that follows the First In, First Out (FIFO) principle.
Elements are added from the "rear" and removed from the "front."

B. Non-Linear Data Structures

● Definition: A data structure where data elements are not arranged sequentially; instead,
they are connected in a hierarchical or graph-like manner.
● Examples:
1. Tree: A hierarchical data structure consisting of nodes, with a root node and
sub-nodes (children) forming a parent-child relationship. Binary Trees, Binary
Search Trees, AVL Trees, and B-Trees are examples.
2. Graph: A collection of nodes (vertices) and edges that connect pairs of nodes.
Graphs can be directed or undirected and are used to represent networks.

4. Operations on Data Structures

Common operations that can be performed on data structures include:

● Insertion: Adding a new element to a data structure.


● Deletion: Removing an element from a data structure.
● Traversal: Accessing each element of a data structure exactly once to perform some
operation (e.g., printing).
● Searching: Finding the location of an element in a data structure.
● Sorting: Arranging elements in a specific order (e.g., ascending or descending).

5. Algorithm Complexity

● Definition: Algorithm complexity refers to the amount of computational resources (time
and space) an algorithm requires to execute. It is used to measure the efficiency of an
algorithm.
● Types of Complexity:
1. Time Complexity: Measures the amount of time an algorithm takes to run as a
function of the size of its input. It's commonly expressed using Big O notation.
2. Space Complexity: Measures the amount of memory an algorithm needs to run
to completion. It includes both the memory used by the variables in the algorithm
and the memory needed for the input data.
6. Time-Space Trade-Off

● Definition: The time-space trade-off is the idea that reducing an algorithm's running time
often requires more memory, and reducing memory usage often requires more running
time. The trade-off arises because optimizing an algorithm for faster execution may
require more memory, while reducing memory usage may require more computational
time.
Example:
○ Caching: To speed up data retrieval, frequently accessed data is stored in
memory (cache). This reduces time but increases memory usage.
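
A minimal C sketch of this trade-off, using memoization of Fibonacci numbers (an illustrative example; the names fib_cache and fib_memo are not part of the original notes):

#include <stdio.h>

#define MAX_N 90
long long fib_cache[MAX_N];   // extra O(n) space traded for speed

// Returns the n-th Fibonacci number; each value is computed only once,
// so time drops from O(2^n) to O(n) at the cost of O(n) extra memory.
long long fib_memo(int n) {
    if (n <= 1) return n;
    if (fib_cache[n] != 0) return fib_cache[n];   // reuse cached result
    fib_cache[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return fib_cache[n];
}

int main(void) {
    printf("%lld\n", fib_memo(50));   // prints 12586269025
    return 0;
}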

7. Asymptotic Notations

● Definition: Asymptotic notations are mathematical tools used to describe the limiting
behavior of an algorithm's complexity as the input size grows. They provide a way to
describe the performance or complexity of an algorithm in a general sense.
● Types of Asymptotic Notations:
1. Big O Notation (O): Describes the upper bound of an algorithm's running time. It
represents the worst-case scenario, giving the maximum time required by an
algorithm.
■ Example: O(n) indicates that the running time grows linearly with the
input size.
2. Omega Notation (Ω): Describes the lower bound of an algorithm's running time.
It represents the best-case scenario, showing the minimum time required by an
algorithm.
■ Example: Ω(n) indicates that the running time grows at least linearly with
the input size.
3. Theta Notation (Θ): Describes the tight bound of an algorithm's running time. It
represents both the upper and lower bounds, showing the exact growth rate of
the algorithm.
■ Example: Θ(n) indicates that the running time grows linearly with the
input size.
4. Little o Notation (o): Represents an upper bound that is not asymptotically tight.
It describes a function that grows strictly slower than the comparison function.
■ Example: o(n^2) indicates that the running time grows strictly slower than
n^2 (for instance, n log n is o(n^2)).
5. Little omega Notation (ω): Represents a lower bound that is not asymptotically
tight. It describes a function that grows strictly faster than the comparison
function.
■ Example: ω(n) indicates that the running time grows strictly faster than n.
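
As a small illustration (not from the original notes, function names are illustrative), the two C loops below have O(n) and O(n^2) running times respectively:

// O(n): the loop body runs once per element.
long sum_linear(const int a[], int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

// O(n^2): the inner loop runs up to n times for each of the n outer iterations.
int count_equal_pairs(const int a[], int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[i] == a[j]) count++;
    return count;
}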
Question Bank

1. Define data and information. How do they differ?

Answer: Data refers to raw, unprocessed facts such as numbers, characters, or symbols,
whereas information is processed data that has context and meaning.

2. What is a data structure, and why is it important in programming?

Answer: A data structure is a specialized format for organizing, processing, retrieving, and
storing data. It is important because:

● Efficient Data Management: Optimizes data handling and storage.


● Improves Algorithm Efficiency: Enhances performance of algorithms.
● Supports Reusability: Standardizes data operations across applications.

3. Explain the difference between linear and non-linear data structures. Provide examples.

Answer: Linear data structures arrange elements sequentially, with each element connected to
its previous and next element (e.g., arrays, linked lists, stacks, queues). Non-linear data
structures arrange elements hierarchically or as a network (e.g., trees, graphs).
4. Describe the main operations that can be performed on data structures.

Answer:

1. Insertion: Adding a new element to a data structure.


2. Deletion: Removing an element from a data structure.
3. Traversal: Accessing each element in a data structure exactly once.
4. Searching: Finding a particular element within a data structure.
5. Sorting: Arranging the elements in a certain order (ascending or descending).

5. What is algorithm complexity, and why is it important?

Answer: Algorithm complexity measures the resources an algorithm requires, such as time and
space, relative to the size of the input data. It is crucial because:

● Efficiency: Helps in selecting the most efficient algorithm.


● Performance Prediction: Provides insights into how an algorithm scales with input size.

6. Explain the time-space trade-off in algorithms with an example.

Answer:
● Time-space trade-off involves balancing the memory usage of an algorithm against its
running time.
● Example: Using a hash table for faster searches increases memory usage, while a
simple array requires less memory but takes longer to search (linear search O(n)).

7. What is Big O notation, and how is it used in algorithm analysis?

Answer: Big O notation describes an algorithm's upper bound in terms of time or space
complexity, representing the worst-case scenario.

● Usage: Predicts performance and helps in comparing the efficiency of different
algorithms by focusing on the most significant factor affecting growth.

8. Provide an example of a linear data structure and its typical use case.

Answer:

Queue:

● Definition: Follows First In, First Out (FIFO) order.


● Use Case: Managing tasks in order of arrival, like print queues or handling web server
requests.

9. Describe the concept of a stack and its common operations.

Answer:

● Stack: A linear data structure following Last In, First Out (LIFO) principle.
Operations:
○ Push: Add an element to the top.
○ Pop: Remove the top element.
○ Peek: Retrieve the top element without removing it.

10. What is a binary search, and when can it be used?

Answer:

● Binary Search: An efficient search algorithm used on sorted arrays.


● When Used: It divides the search space in half, significantly reducing the number of
comparisons needed. The time complexity is O(log n), making it ideal for large datasets.

11. Explain what asymptotic notation is and list three types.

Answer:

Asymptotic notations describe the behavior of an algorithm as the input size approaches infinity.

1. Big O (O): Upper bound, worst-case scenario.


2. Omega (Ω): Lower bound, best-case scenario.
3. Theta (Θ): Tight bound; the running time grows at exactly this rate (both an upper and a lower bound).

12. What are the advantages of using a linked list over an array?

Answer:

Linked List                                Array
Dynamic size, grows/shrinks as needed      Fixed size, cannot change
Efficient insertions/deletions             Insertions/deletions require shifts
Extra memory for pointers                  Less memory, no extra pointers

13. Describe the structure of a binary tree and its primary use cases.

Answer:
● Binary Tree: A hierarchical structure where each node has at most two children (left and
right).
Use Cases:
○ Search Operations: Implementing binary search trees.
○ Hierarchical Data: Representing organizational structures or file systems.

14. What is the difference between time complexity and space complexity?

Answer: Time complexity measures the amount of time an algorithm takes to run as a function
of the input size, while space complexity measures the amount of memory it needs, including
memory for the input data and any auxiliary variables used during execution.

15. Give an example of a non-linear data structure and its typical application.

Answer:

Graph:

● Definition: A set of nodes (vertices) connected by edges.


● Use Case: Network modeling (e.g., social networks, communication networks) where
entities are interconnected non-linearly.

16. What does the Big O notation O(n^2) signify in an algorithm's performance?

Answer:

● O(n^2): Indicates quadratic time complexity, where the time required grows
proportionally to the square of the input size.
● Implication: Algorithms with O(n^2) complexity, such as bubble sort in the worst case,
become inefficient for large datasets as performance degrades rapidly.

17. How do you calculate the space complexity of an algorithm?

Answer:

● Space Complexity: Calculated by determining the amount of memory required by the
algorithm in terms of:
○ Input Size: Memory required for input data.
○ Auxiliary Space: Extra space or temporary storage needed during execution.
● Formula: Total space = Input space + Auxiliary space.

18. Explain the significance of Omega (Ω) notation in analyzing algorithms.

Answer:

● Omega (Ω) Notation: Represents the best-case scenario or lower bound of an
algorithm’s running time.
● Significance: Helps in understanding the minimum time an algorithm takes, providing
insights into its optimal performance under ideal conditions.

19. Differentiate between time complexity classes O(n log n) and O(n^2).

Answer:

O(n log n)                              O(n^2)
Log-linear growth, more efficient       Quadratic growth, less efficient
Example: Merge Sort, Quick Sort (avg)   Example: Bubble Sort, Insertion Sort
Suitable for larger datasets            Suitable for smaller datasets


20. What is a time-space trade-off, and how does it affect algorithm design?

Answer:

● Time-Space Trade-off: The balance between the time required to execute an algorithm
and the memory required to run it.
● Effect on Design:
○ More Space, Less Time: Using more memory to speed up computation (e.g.,
hash tables).
○ Less Space, More Time: Reducing memory usage at the cost of longer
computation (e.g., searching unsorted data linearly).

Long answer questions

1. Discuss the various types of data structures, comparing linear and non-linear data
structures with examples. Explain how the choice of data structure can impact the
efficiency of algorithms.

Answer:

Data structures are categorized into two primary types: linear and non-linear.

1. Linear Data Structures:


○ Definition: Data elements are arranged sequentially or in a line, where each
element is connected to its previous and next element.
○ Examples:
■ Array: A collection of elements stored at contiguous memory locations.
Arrays allow random access and are used when the number of elements
is known and fixed.
■ Linked List: A series of connected nodes, where each node contains
data and a reference to the next node. It supports dynamic memory
allocation and efficient insertion and deletion operations.
■ Stack: A LIFO (Last In, First Out) structure used for operations like
function calls, parsing expressions, and backtracking algorithms.
■ Queue: A FIFO (First In, First Out) structure used for scheduling
processes in operating systems and managing tasks in a sequential
order.
2. Non-Linear Data Structures:
○ Definition: Data elements are not stored sequentially, forming a hierarchical or
networked structure.
○ Examples:
■ Tree: A hierarchical structure with a root node and child nodes, used in
databases and file systems.
■ Graph: Consists of vertices (nodes) connected by edges, used to model
networks like social media, computer networks, or roads.

Impact on Algorithm Efficiency:

● Choice of Data Structure: The choice directly affects an algorithm’s efficiency
concerning time and space complexity. For example:
○ Arrays are ideal for static data and offer O(1) access time but have a high cost
for insertions and deletions (O(n)).
○ Linked Lists are more flexible, with O(1) insertion and deletion at the head (given a
reference to the node), but have O(n) access time, which is less efficient for large datasets.
○ Trees and Graphs are optimal for hierarchical data and networks, with efficient
search and traversal algorithms like Depth-First Search (DFS) and Breadth-First
Search (BFS).

The choice of data structure should be based on the specific needs of the application, balancing
memory usage and processing speed.

2. Explain the concept of algorithm complexity and its types. Provide a detailed example
of how to analyze the time complexity of a given algorithm, including best, worst, and
average cases.

Answer:

Algorithm Complexity: Algorithm complexity measures the efficiency of an algorithm in terms
of time (time complexity) and space (space complexity) relative to the input size.

1. Time Complexity:
○ Represents the amount of time an algorithm takes to complete as a function of
the input size.
○ Types:
■ Best Case: The minimum time required for an algorithm to complete. This
is the most favorable input condition.
■ Worst Case: The maximum time required, representing the most
unfavorable input.
■ Average Case: The expected time for typical input conditions, averaging
over all possible inputs.
2. Space Complexity:
○ Represents the amount of memory space required by the algorithm, including
both the input space and auxiliary space used during execution.
Example of Time Complexity Analysis:

Consider the Bubble Sort algorithm:

Algorithm:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # no swaps in this pass: the array is already sorted
            break


● Best Case (Already sorted array):
○ Time Complexity: O(n)
○ Reason: Only one pass is needed to confirm that the array is sorted.
● Worst Case (Reversed array):
○ Time Complexity: O(n^2)
○ Reason: Each element must be compared with every other element, leading to
n*(n-1)/2 comparisons.
● Average Case (Random order):
○ Time Complexity: O(n^2)
○ Reason: The algorithm does not have prior knowledge of the order of elements,
averaging out to quadratic time complexity over all possible cases.

Understanding these complexities helps developers choose or optimize algorithms based on the
size and nature of the input data.

3. What is a time-space trade-off in algorithms? Provide examples where increasing space
reduces time complexity and vice versa. How does this trade-off influence the design
and optimization of algorithms?

Answer:

Time-Space Trade-off: The time-space trade-off in algorithms is a balancing act between the
memory space used by an algorithm and the time it takes to execute. Optimizing for one often
means compromising on the other.
1. Increasing Space to Reduce Time:
○ Example: Hashing:
■ By using a hash table, searching for an element can be done in O(1) time.
However, this requires additional space to store the hash table, potentially
up to O(n) where n is the number of elements.
○ Example: Dynamic Programming:
■ Solving problems like Fibonacci sequence using memoization stores
previously computed results, reducing time complexity from O(2^n) to
O(n) but requiring O(n) space.
2. Reducing Space to Increase Time:
○ Example: Recursive Algorithms:
■ A plain recursive Fibonacci calculation stores no table of results (its only
extra space is the O(n) call stack) but has O(2^n) time complexity due to
repeated calculations.
○ Example: In-place Sorting Algorithms (like Insertion Sort):
■ Operates with O(1) space by sorting in place but has O(n^2) time
complexity, making it less efficient for large datasets.

Influence on Algorithm Design:

● The trade-off is crucial when designing algorithms for environments with limited memory
or where speed is a priority. For example, embedded systems prioritize space efficiency,
whereas web applications may favor speed. Developers need to evaluate the specific
constraints and requirements of their applications to find an optimal balance between
time and space.

4. Describe asymptotic notations and their importance in analyzing algorithm efficiency.
Explain Big O, Omega, and Theta notations with examples, illustrating their role in
determining the scalability of algorithms.

Answer:

Asymptotic Notations: Asymptotic notations are mathematical tools used to describe the
running time or space requirement of an algorithm in terms of input size. They provide a
high-level understanding of an algorithm's efficiency by focusing on its behavior as the input size
grows.

1. Big O Notation (O):


○ Definition: Describes the upper bound of an algorithm’s running time,
representing the worst-case scenario.
○ Example: In a linear search algorithm, Big O is O(n) because, in the worst case,
every element must be checked once.
2. Omega Notation (Ω):
○ Definition: Describes the lower bound of an algorithm’s running time,
representing the best-case scenario.
○ Example: In a linear search, Omega is Ω(1) if the target element is the first
element in the list.
3. Theta Notation (Θ):
○ Definition: Describes a tight bound on the running time, representing both the
upper and lower bounds on its growth rate.
○ Example: In insertion sort, Θ(n^2) represents both its average and worst-case
time complexity.

Importance in Analyzing Algorithm Efficiency:

● Scalability: Asymptotic notations help predict how algorithms perform as the input size
scales, allowing for better resource allocation and optimization.
● Algorithm Comparison: They provide a common framework for comparing the
efficiency of different algorithms regardless of machine or platform differences.
● Optimization Guidance: Highlight areas where improvements can be made, especially
for large datasets.

Understanding and applying these notations ensure algorithms are chosen or optimized based
on the context in which they will run, improving overall performance and efficiency.
Chapter 2 : Arrays

1. Basic Terminology

● Array: A collection of elements (usually of the same data type) stored in contiguous
memory locations. Each element can be accessed using its index.
● Index: The position of an element in an array, typically starting from 0.
● Pointer: A variable that stores the memory address of another variable. Pointers are
powerful in C/C++ for dynamic memory management, manipulating arrays, and
referencing data structures.

2. Linear Arrays and Their Representation

● Linear Array (One-Dimensional Array): A linear array, or one-dimensional array, is a
list of elements stored sequentially in memory. Each element is identified by a unique
index, which is the position relative to the first element.
Representation:
○ Memory Layout: Elements are stored in contiguous memory locations.
○ Accessing Elements: The address of an element in a linear array can be
calculated using the formula:

Address(A[i]) = BaseAddress(A) + i × (Size of each element)

where BaseAddress(A) is the memory location of the first element of the array, i is the
index of the element, and Size of each element is the number of bytes required to
store each element of the array.
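
A quick C check of this address formula using byte-level pointer arithmetic (an illustrative sketch, not part of the original notes):

#include <stdio.h>

int main(void) {
    int A[5] = {10, 20, 30, 40, 50};
    int i = 3;
    // BaseAddress(A) + i * (size of each element) gives the address of A[i].
    unsigned char *base = (unsigned char *)A;
    int *computed = (int *)(base + i * sizeof(int));
    printf("%d %d\n", A[i], *computed);   // both print 40
    return 0;
}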

3. Traversing Linear Arrays


Traversing: Visiting each element of the array exactly once to perform some operation (e.g.,
printing elements, counting elements, etc.).
Example in C:

for (int i = 0; i < n; i++) {
    printf("%d ", array[i]);
}

4. Insertion & Deletion in Arrays

● Insertion: Adding a new element to an array at a specific position.


○ Procedure:
1. Shift all elements from the insertion point to the end of the array one
position to the right.
2. Insert the new element at the specified position.
○ Complexity: O(n) in the worst case (when inserting at the beginning).
● Deletion: Removing an element from an array at a specific position.
○ Procedure:
1. Shift all elements after the deletion point one position to the left.
2. Reduce the size of the array.
○ Complexity: O(n) in the worst case (when deleting the first element).

5. Searching Algorithms

● Linear Search:
○ Definition: Sequentially checks each element of the array until the desired
element is found or the end of the array is reached.
○ Complexity: O(n), where n is the number of elements in the array.
● Binary Search:
○ Definition: An efficient algorithm for finding an element in a sorted array. It
repeatedly divides the search interval in half.
○ Procedure:
1. Start with the middle element.
2. If the middle element is equal to the target value, the search is complete.
3. If the target value is less than the middle element, search the left half.
4. If the target value is greater than the middle element, search the right half.
○ Complexity: O(log n).

6. Sorting Algorithms

● Insertion Sort:
○ Definition: Builds the final sorted array one element at a time by repeatedly
picking the next element and inserting it into its correct position.
○ Complexity: O(n^2) in the worst case.
● Selection Sort:
○ Definition: Repeatedly selects the smallest (or largest) element from the
unsorted portion and moves it to the sorted portion.
○ Complexity: O(n^2) in all cases.
● Bubble Sort:
○ Definition: Repeatedly steps through the list, compares adjacent elements, and
swaps them if they are in the wrong order.
○ Complexity: O(n^2) in the worst case.
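
A short C sketch of selection sort as described above (an illustrative implementation):

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;                  // assume current position holds the minimum
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIndex])    // find the smallest element in the unsorted part
                minIndex = j;
        }
        int temp = arr[i];                 // swap it into the sorted portion
        arr[i] = arr[minIndex];
        arr[minIndex] = temp;
    }
}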

7. Quick Sort
● Definition: A divide-and-conquer algorithm that selects a 'pivot' element and partitions
the array into two sub-arrays, according to whether they are less than or greater than the
pivot.
● Procedure:
○ Choose a pivot: Typically, the first element, last element, or a random element.
○ Partitioning: Rearrange elements so that all elements less than the pivot are on
the left, and all elements greater than the pivot are on the right.
○ Recursively apply the above steps to the sub-arrays of elements with smaller
values and larger values.
● Complexity:
○ Best Case: O(n log n) when the pivot divides the array into two equal halves.
○ Worst Case: O(n^2) when the pivot is the smallest or largest element repeatedly
(like sorted or reverse sorted arrays).
○ Average Case: O(n log n).
● Space Complexity: O(log n) due to recursive stack space.
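
A compact C sketch of quick sort using the last element as the pivot (Lomuto partitioning, one common choice; an illustrative implementation, not the only possible one):

// Places the pivot (last element) in its final sorted position and returns its index.
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {              // move smaller elements to the left side
            i++;
            int temp = arr[i]; arr[i] = arr[j]; arr[j] = temp;
        }
    }
    int temp = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = temp;
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int p = partition(arr, low, high);
        quickSort(arr, low, p - 1);        // sort elements left of the pivot
        quickSort(arr, p + 1, high);       // sort elements right of the pivot
    }
}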

8. Merging Arrays & Merge Sort

Merging Arrays:

● Definition: Combining two sorted arrays into a single sorted array.


● Procedure:
1. Initialize two pointers, each pointing to the beginning of the two arrays.
2. Compare the elements pointed by the pointers, and place the smaller element in
the result array.
3. Move the pointer of the smaller element to the next position.
4. Repeat until all elements from both arrays have been placed in the result array.
● Complexity: O(n + m) where n and m are the sizes of the two arrays.

Merge Sort:

● Definition: A stable divide-and-conquer sorting algorithm that divides the input array into
two halves, calls itself for the two halves, and then merges the two sorted halves.
● Procedure:
○ Divide the array into two halves.
○ Conquer by recursively sorting the two halves.
○ Combine by merging the sorted halves to produce the sorted array.
● Complexity:
○ Time Complexity: O(n log n) for all cases (best, average, and worst).
○ Space Complexity: O(n) due to auxiliary storage required for merging.
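
A C sketch of merge sort following the divide/conquer/combine steps above (illustrative; mergeHalves is a helper name chosen here to avoid clashing with the merge function shown later in the question bank):

// Merge two already-sorted halves arr[l..m] and arr[m+1..r] using a temporary buffer.
void mergeHalves(int arr[], int l, int m, int r) {
    int temp[r - l + 1];
    int i = l, j = m + 1, k = 0;
    while (i <= m && j <= r)
        temp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= m) temp[k++] = arr[i++];
    while (j <= r) temp[k++] = arr[j++];
    for (k = 0; k < r - l + 1; k++)
        arr[l + k] = temp[k];              // copy the merged result back
}

void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;
        mergeSort(arr, l, m);              // sort the left half
        mergeSort(arr, m + 1, r);          // sort the right half
        mergeHalves(arr, l, m, r);         // combine the two sorted halves
    }
}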

9. Multi-Dimensional Arrays and Their Representation

● Definition: Arrays with more than one dimension (e.g., 2D arrays or matrices, 3D
arrays).
● Representation:
○ Row-Major Order: Stores all elements of a row contiguously.

Address(A[i][j]) = BaseAddress(A) + ((i × number of columns) + j) × (Size of each element)

○ Column-Major Order: Stores all elements of a column contiguously.

Address(A[i][j]) = BaseAddress(A) + ((j × number of rows) + i) × (Size of each element)
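
A small C illustration of the row-major formula (illustrative only, not part of the original notes):

#include <stdio.h>

int main(void) {
    int A[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
    int i = 2, j = 1;
    // Row-major: element (i, j) sits at offset (i * number of columns + j) from the base.
    int *base = &A[0][0];
    printf("%d %d\n", A[i][j], *(base + i * 4 + j));   // both print 10
    return 0;
}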

10. Pointers and Pointer Arrays

● Pointers:
○ Definition: A pointer is a variable that stores the memory address of another
variable.
○ Uses: Dynamic memory allocation, efficient array manipulation, accessing
hardware, and implementing data structures (e.g., linked lists, trees).
● Pointer Arrays:
○ Definition: An array of pointers, where each pointer can point to an individual
element or another array.
○ Usage: Useful in dynamic memory management and when dealing with
variable-length data or strings in C/C++.
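
A brief C example of a pointer array holding strings of different lengths (illustrative):

#include <stdio.h>

int main(void) {
    // Each element of names is a pointer to a separately stored string.
    const char *names[] = {"Alice", "Bob", "Charlie"};
    int n = sizeof(names) / sizeof(names[0]);
    for (int i = 0; i < n; i++)
        printf("%s\n", names[i]);   // follow each pointer to its string
    return 0;
}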

11. Records; Record Structure

● Definition: A record (or struct in C) is a collection of fields, possibly of different data
types, grouped together to represent a single data item.

Example in C:

struct Student {
    char name[50];
    int age;
    float GPA;
};


● Representation in Memory:
○ Contiguous Allocation: Fields are stored in contiguous memory locations.
○ Padding: There might be padding between fields for alignment purposes.

12. Parallel Arrays

● Definition: Multiple arrays that hold related data in corresponding elements (parallel to
each other).
Example:
○ One array to store student names, another to store their grades. Index 0 in both
arrays corresponds to the same student.
● Usage: Simplifies data management for large data sets: each individual array stays
homogeneous in data type, while corresponding indices across the arrays together form a record.

13. Sparse Matrices and Their Storage

● Definition: A sparse matrix is a matrix in which most of the elements are zero. Storing
only non-zero elements saves memory.
● Storage Methods:
○ Array of Tuples: Store each non-zero element as a tuple (row, column, value).
○ Compressed Sparse Row (CSR):
■ Three arrays:
■ Values to store non-zero elements.
■ Column Indices to store column indices of each element in the
Values array.
■ Row Pointers to store the starting index of each row in the
Values array.
○ Compressed Sparse Column (CSC): Similar to CSR but focuses on
column-wise storage.
● Complexity:
○ Space Complexity: O(nz) where nz is the number of non-zero elements, which
is much less than O(m*n) for dense matrices.
Question bank

1. Define a linear array and explain how it is represented in memory.

Answer:
A linear array is a data structure consisting of a collection of elements, each identified by an
index or a key. It is stored in contiguous memory locations, allowing efficient indexing and easy
access to any element using its index. The memory address of each element can be calculated
using the formula:

Address=Base Address+(i×size of each element)

where i is the index of the element.

2. What is traversing in the context of arrays, and why is it important?

Answer:
Traversing an array means accessing and processing each element of the array sequentially
from the first element to the last. It is important because it allows us to perform operations like
searching, sorting, and updating elements within the array.

3. Write a code snippet to insert an element at a specific position in an array.

Answer:

void insert(int arr[], int n, int element, int position) {
    for (int i = n; i > position; i--) {
        arr[i] = arr[i - 1];
    }
    arr[position] = element;
}

This code shifts elements to the right and inserts the new element at the specified position.
4. Explain the difference between linear search and binary search.

Answer:

Linear Search                       Binary Search
Scans each element sequentially     Repeatedly divides a sorted array's search interval in half
Time Complexity: O(n)               Time Complexity: O(log n)
Works on unsorted arrays            Requires the array to be sorted

5. What is the time complexity of inserting an element at the end of an array and why?

Answer:
The time complexity of inserting an element at the end of an array is O(1) because it involves
placing the new element directly into the next available memory location without shifting other
elements.

6. Describe the concept of a multi-dimensional array with an example.

Answer:
A multi-dimensional array is an array of arrays, where each element is itself an array. A common
example is a 2D array (matrix) which is used to represent a table of rows and columns.

Example in C:

int matrix[3][4]; // A 2D array with 3 rows and 4 columns

7. How do you delete an element from an array at a specific index?

Answer:
To delete an element from an array at a specific index, shift all elements after that index one
position to the left:

void delete(int arr[], int n, int index) {
    for (int i = index; i < n - 1; i++) {
        arr[i] = arr[i + 1];
    }
}

This code snippet shifts elements to the left to overwrite the deleted element.

8. Explain the working principle of Bubble Sort.

Answer:
Bubble Sort is a simple sorting algorithm that repeatedly steps through the list, compares
adjacent elements, and swaps them if they are in the wrong order. This process is repeated until
the list is sorted. The algorithm has a time complexity of O(n^2) in the worst case.

9. What is a pointer array, and how does it differ from a regular array?

Answer:
A pointer array is an array that stores the addresses of other variables, rather than storing
actual data values. Unlike a regular array, which contains data, a pointer array can reference
different memory locations, allowing dynamic access to elements or arrays.

10. Write a code snippet for performing a binary search on a sorted array.

Answer:

int binarySearch(int arr[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == key) return mid;
        if (arr[mid] < key) low = mid + 1;
        else high = mid - 1;
    }
    return -1; // Not found
}
This code divides the array into halves to efficiently find the target element.

11. What is the significance of sorting algorithms in arrays?

Answer:
Sorting algorithms organize the elements of an array into a specific order (ascending or
descending). This is significant because it:

● Enhances the performance of searching algorithms (like binary search).


● Makes data easier to analyze and visualize.
● Optimizes performance in other algorithms that require ordered data.

12. Describe how Quick Sort works with an example of its average-case
time complexity.

Answer:
Quick Sort is a divide-and-conquer algorithm that works by selecting a 'pivot' element from the
array and partitioning the other elements into two sub-arrays according to whether they are less
than or greater than the pivot. It recursively sorts the sub-arrays.

● Average-case time complexity: O(n log n), as it divides the array approximately in half
each time.

13. What is a sparse matrix, and why is specialized storage necessary?

Answer:
A sparse matrix is a matrix in which most of the elements are zero. Specialized storage (like
compressed row storage or list of lists) is necessary to efficiently store only the non-zero
elements, reducing memory usage and improving computational efficiency for operations like
addition and multiplication.

14. Explain the concept of merging two sorted arrays with an example.

Answer:
Merging two sorted arrays involves combining them into a single sorted array. The process
compares elements from both arrays and places the smaller element into the resulting array,
continuing until all elements are merged.
Example:

void merge(int arr1[], int arr2[], int n1, int n2, int merged[]) {
    int i = 0, j = 0, k = 0;
    while (i < n1 && j < n2) {
        if (arr1[i] < arr2[j]) merged[k++] = arr1[i++];
        else merged[k++] = arr2[j++];
    }
    while (i < n1) merged[k++] = arr1[i++];
    while (j < n2) merged[k++] = arr2[j++];
}

15. What is a record in the context of arrays, and how is it represented in memory?

Answer:
A record is a data structure that can store elements of different data types. In arrays, records
are stored as structures, where each structure contains fields that represent the record's
elements. Memory representation involves storing each field sequentially, similar to how an
array stores its elements.

16. How are multi-dimensional arrays represented in memory?

Answer:
Multi-dimensional arrays, such as 2D arrays, are stored in memory either in row-major order or
column-major order:

● Row-major order: Stores elements row by row, meaning the entire first row is stored
first, followed by the second row, and so on.
● Column-major order: Stores elements column by column, meaning the entire first
column is stored first, followed by the second column, and so on.

The choice depends on the programming language and the use case.

17. Differentiate between an array and a pointer in C.

Answer:
Array                                          Pointer
Represents a fixed-size sequence of elements   A variable that stores the address of another variable
Cannot be resized after creation               Can point to different locations dynamically
Example: int arr[5];                           Example: int *ptr;

18. Describe the process of Selection Sort and its time complexity.

Answer:
Selection Sort works by dividing the array into a sorted and unsorted section. It repeatedly
selects the smallest (or largest) element from the unsorted section and swaps it with the first
unsorted element. The process continues until the entire array is sorted.

● Time Complexity: O(n^2) in all cases (best, average, and worst).

19. What is an insertion sort, and when is it most efficient?

Answer:
Insertion Sort is a simple sorting algorithm that builds the final sorted array one item at a time.
It is most efficient for small datasets or when the array is already partially sorted because its
time complexity can approach O(n) in the best case (nearly sorted array).

20. Explain how parallel arrays are used and provide an example.

Answer:
Parallel arrays use multiple arrays to store related data such that corresponding elements
across arrays represent a single entity. This method maintains relationships between different
data types without using a complex data structure.

Example:

int ids[] = {101, 102, 103};
char names[][10] = {"Alice", "Bob", "Charlie"};
float salaries[] = {50000.0, 60000.0, 55000.0};
In this example, ids[0], names[0], and salaries[0] together represent the details of one
person.

Long answer questions

1. Explain the process of insertion and deletion in a linear array. Provide code snippets
for both operations and discuss their time complexities.

Answer:

Insertion: To insert an element into a linear array, shift elements to the right starting from the
end of the array to the position where the new element should be inserted.

Code Snippet:

void insert(int arr[], int *n, int element, int position) {
    for (int i = *n; i > position; i--) {
        arr[i] = arr[i - 1];
    }
    arr[position] = element;
    (*n)++;
}

Time Complexity: O(n), where n is the number of elements in the array. This is because, in the
worst case, all elements after the insertion point need to be shifted.

Deletion: To delete an element from a linear array, shift elements to the left starting from the
position of the element to be deleted.

Code Snippet:

void delete(int arr[], int *n, int position) {
    for (int i = position; i < *n - 1; i++) {
        arr[i] = arr[i + 1];
    }
    (*n)--;
}

Time Complexity: O(n), where n is the number of elements in the array. This is because, in the
worst case, all elements after the deletion point need to be shifted.
2. Describe and implement a function to merge two sorted arrays into a
single sorted array. Explain the algorithm and its time complexity.

Answer:

Merging Two Sorted Arrays: The merging process involves comparing elements from both
arrays and adding the smaller element to the new merged array. This continues until all
elements from both arrays are processed.

Algorithm:

1. Initialize pointers for both arrays and a merged array.


2. Compare elements from both arrays.
3. Insert the smaller element into the merged array and move the pointer.
4. Once one array is exhausted, append the remaining elements from the other array.

Code Snippet:

void merge(int arr1[], int n1, int arr2[], int n2, int merged[]) {
    int i = 0, j = 0, k = 0;
    while (i < n1 && j < n2) {
        if (arr1[i] < arr2[j]) merged[k++] = arr1[i++];
        else merged[k++] = arr2[j++];
    }
    while (i < n1) merged[k++] = arr1[i++];
    while (j < n2) merged[k++] = arr2[j++];
}

Time Complexity: O(n1 + n2), where n1 and n2 are the sizes of the two input arrays. Each
element from both arrays is processed exactly once.

3. Discuss how to perform a binary search on a sorted array. Implement the binary search
algorithm and explain its time complexity in different scenarios.

Answer:

Binary Search: Binary search is an efficient algorithm for finding an item from a sorted array by
repeatedly dividing the search interval in half.
Algorithm:

1. Initialize pointers for the beginning and end of the array.


2. Calculate the middle index.
3. Compare the target value with the middle element.
4. Adjust pointers based on comparison and repeat until the element is found or the
pointers converge.

Code Snippet:

int binarySearch(int arr[], int size, int key) {
    int low = 0, high = size - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == key) return mid;
        else if (arr[mid] < key) low = mid + 1;
        else high = mid - 1;
    }
    return -1; // Key not found
}

Time Complexity:

● Best Case: O(1) (when the middle element is the target).


● Worst Case: O(log n) (the search interval is halved each time).
● Average Case: O(log n) (on average, the target might be in the middle of the interval).

4. Explain the concept of a multi-dimensional array with a focus on a 2D array. Implement
a function to initialize and print a 2D array, and discuss its memory representation.

Answer:

Multi-Dimensional Array: A 2D array is an array of arrays. It represents a matrix with rows and
columns, where each element can be accessed using two indices.

Initialization and Printing Function:

Code Snippet:

void initializeAndPrint2DArray(int rows, int cols) {
    int array[rows][cols];

    // Initialize the array
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            array[i][j] = i * cols + j; // Example initialization
        }
    }

    // Print the array
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            printf("%d ", array[i][j]);
        }
        printf("\n");
    }
}

Memory Representation: In memory, a 2D array is stored in a contiguous block of memory in
row-major order (all elements of the first row, followed by the second row, and so on).

Memory Calculation:

● For a 2D array of size m x n, total memory used is m * n * size_of_element.

5. Describe sparse matrices and their storage techniques. Implement a function to convert
a sparse matrix to a compressed row storage (CRS) format.

Answer:

Sparse Matrix: A sparse matrix is a matrix predominantly filled with zero values. Special
storage techniques are used to save space and improve computational efficiency.

Storage Techniques:

● Compressed Row Storage (CRS): Stores non-zero elements in three separate arrays:
○ Values Array: Contains non-zero values.
○ Column Index Array: Contains column indices for each non-zero value.
○ Row Pointer Array: Contains pointers to the start of each row in the values
array.

Conversion to CRS Format:

Code Snippet:

void convertToCRS(int matrix[][4], int rows, int cols, int *values,
                  int *colIndex, int *rowPtr) {
    int k = 0;      // Index for values and colIndex
    rowPtr[0] = 0;  // Row pointer for the first row

    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            if (matrix[i][j] != 0) {
                values[k] = matrix[i][j];
                colIndex[k] = j;
                k++;
            }
        }
        rowPtr[i + 1] = k; // Point to the start of the next row
    }
}

Explanation:

● values array stores the non-zero elements.


● colIndex array stores the column indices of these elements.
● rowPtr array stores the index positions in the values array where each row starts.

This format reduces the space required to store sparse matrices and speeds up matrix
operations like addition and multiplication.
Data Structures (23CSH-241)
Unit 2 Notes & Sample Questions

CHAPTER 2.1 : LINKED LISTS

1. Linear Linked List

A linear linked list is a collection of elements, called nodes, where each node points to
the next node in the sequence. Unlike arrays, linked lists are not stored in contiguous
memory locations. Each node contains two parts:

● Data: The information stored in the node.


● Pointer (Link): A reference to the next node in the sequence.

Types of Linear Linked Lists:

● Singly Linked List: Each node points to the next node and the last node points to
NULL.
● Doubly Linked List: Each node points to both the next and the previous nodes.
● Circular Linked List: The last node points back to the first node, forming a
circle.

2. Representation of Linked Lists in Memory


In memory, a linked list is represented as a collection of nodes where each node consists
of two fields:

● Data Field: Stores the actual data.


● Pointer Field: Stores the address/reference of the next node.

Example representation of a node in a singly linked list in C:

struct Node {
    int data;           // Data field
    struct Node* next;  // Pointer to the next node
};

3. Traversing a Linked List

Traversing a linked list involves visiting each node in the list sequentially. This is
typically done using a loop, starting from the head node and moving to each subsequent
node using the pointers.

Example in C:

void traverse(struct Node* head) {
    struct Node* current = head;           // Start from the head
    while (current != NULL) {
        printf("%d -> ", current->data);   // Print data
        current = current->next;           // Move to the next node
    }
    printf("NULL\n");                      // Indicate the end of the list
}

4. Searching a Linked List

Searching for an element in a linked list involves traversing the list and comparing each
node's data with the target value. The search stops when the target is found or the end of
the list is reached.

Example in C:

struct Node* search(struct Node* head, int key) {
    struct Node* current = head;
    while (current != NULL) {
        if (current->data == key) {
            return current;        // Return the node if found
        }
        current = current->next;   // Move to the next node
    }
    return NULL;                   // Return NULL if not found
}

5. Insertion in & Deletion from a Linked List

Insertion and deletion operations can occur at the beginning, end, or at a specific
position in the linked list.

Insertion

1. At the Beginning:
○ Create a new node.
○ Point the new node's next pointer to the current head.
○ Update the head pointer to the new node.

Example in C:
void insertAtBeginning(struct Node** head, int newData) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = newData;
    newNode->next = *head;   // Link new node to head
    *head = newNode;         // Update head to new node
}

2. At the End:
○ Create a new node.
○ Traverse to the last node and link its next pointer to the new node.

Example in C:
void insertAtEnd(struct Node** head, int newData) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    struct Node* last = *head;      // Start at head
    newNode->data = newData;        // Set data
    newNode->next = NULL;           // New node will be the last node

    if (*head == NULL) {            // If the list is empty
        *head = newNode;            // Update head
        return;
    }
    while (last->next != NULL) {    // Traverse to the last node
        last = last->next;
    }
    last->next = newNode;           // Link last node to new node
}

3. At a Specific Position:
○ Traverse to the desired position and adjust pointers accordingly (see the sketch below).
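
A sketch of insertion at a given position (counted from 0), reusing struct Node and the insertAtBeginning function defined above (an illustrative implementation):

void insertAtPosition(struct Node** head, int newData, int position) {
    if (position == 0) {                   // inserting at the head
        insertAtBeginning(head, newData);
        return;
    }
    struct Node* prev = *head;
    for (int i = 0; prev != NULL && i < position - 1; i++)
        prev = prev->next;                 // walk to the node before the position
    if (prev == NULL) return;              // position is past the end of the list

    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = newData;
    newNode->next = prev->next;            // link new node to the rest of the list
    prev->next = newNode;                  // link previous node to the new node
}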

Deletion

1. From the Beginning:


○ Update head to point to the second node.

Example in C:
void deleteFromBeginning(struct Node** head) {
    if (*head == NULL) return;   // List is empty
    struct Node* temp = *head;   // Store head
    *head = (*head)->next;       // Move head to next node
    free(temp);                  // Free memory of old head
}

2. From the End:


○ Traverse to the second last node and set its next pointer to NULL.

Example in C:
void deleteFromEnd(struct Node** head) {
    if (*head == NULL) return;               // List is empty
    struct Node* temp = *head;
    if (temp->next == NULL) {                // Only one node
        free(temp);
        *head = NULL;                        // List becomes empty
        return;
    }
    while (temp->next->next != NULL) {       // Traverse to the second-to-last node
        temp = temp->next;
    }
    free(temp->next);                        // Free last node
    temp->next = NULL;                       // Set second-to-last node's next to NULL
}

3. From a Specific Position:


○ Adjust pointers to bypass the node to be deleted.

6. Header Linked List

A header linked list is a type of linked list that includes a header node, which does not
store data relevant to the list but serves as a starting point for traversals. This helps
simplify operations such as insertion and deletion at the beginning since the header node
always exists.
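
A minimal C sketch of a header linked list, assuming the same struct Node as above (the header node's data field is simply unused; illustrative only):

struct Node* createHeaderList(void) {
    // The header node holds no list data; it only anchors the list.
    struct Node* header = (struct Node*)malloc(sizeof(struct Node));
    header->data = 0;      // unused placeholder
    header->next = NULL;   // the real first element will follow the header
    return header;
}

void traverseHeaderList(struct Node* header) {
    // Actual data starts at header->next, so the header itself is skipped.
    for (struct Node* cur = header->next; cur != NULL; cur = cur->next)
        printf("%d -> ", cur->data);
    printf("NULL\n");
}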

7. Doubly Linked List

A doubly linked list consists of nodes that contain three fields:

● Previous Pointer: Points to the previous node.


● Data Field: Stores the actual data.
● Next Pointer: Points to the next node.

This structure allows for traversal in both directions.

Example representation of a node in a doubly linked list in C:

struct DNode {
    int data;            // Data field
    struct DNode* prev;  // Pointer to the previous node
    struct DNode* next;  // Pointer to the next node
};

8. Operations on Doubly Linked List

Common Operations:
● Insertion: At the beginning, end, or specific position.
● Deletion: From the beginning, end, or specific position.
● Traversal: Both forward and backward.

Example: Insertion at the Beginning


void insertAtBeginning(struct DNode** head, int newData) {
    struct DNode* newNode = (struct DNode*)malloc(sizeof(struct DNode));
    newNode->data = newData;        // Set data
    newNode->next = *head;          // Link new node to current head

    if (*head != NULL) {
        (*head)->prev = newNode;    // Link previous head to new node
    }
    newNode->prev = NULL;           // New node becomes the first node
    *head = newNode;                // Update head to new node
}
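
For comparison, a sketch of deletion at the beginning of a doubly linked list, reusing the DNode structure above (illustrative):

void deleteAtBeginningDLL(struct DNode** head) {
    if (*head == NULL) return;     // list is empty
    struct DNode* temp = *head;
    *head = (*head)->next;         // second node becomes the new head
    if (*head != NULL)
        (*head)->prev = NULL;      // new head has no previous node
    free(temp);                    // release the old head
}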

9. Complexity Analysis of Each Algorithm

Operation                               Time Complexity

Traversing a linked list                O(n)
Searching a linked list                 O(n)
Insertion at the beginning              O(1)
Insertion at the end                    O(n)
Insertion at a specific position        O(n)
Deletion from the beginning             O(1)
Deletion from the end                   O(n)
Deletion from a specific position       O(n)
Traversing a doubly linked list         O(n)
Insertion in a doubly linked list       O(1) or O(n)
Deletion in a doubly linked list        O(1) or O(n)

10. Applications of Linked Lists

● Dynamic Memory Allocation: Linked lists can grow and shrink as needed,
allowing for efficient memory use.
● Implementing Stacks and Queues: Linked lists are often used to implement
these data structures.
● Graph Representation: Adjacency lists in graph theory can be implemented
using linked lists.
● Undo Functionality in Applications: Linked lists can be used to keep track of
changes in applications that require undo operations.
● Polynomial Representation: Linked lists can represent polynomials where each
node holds a coefficient and exponent.

Conclusion

Linked lists are fundamental data structures with flexible memory allocation, allowing for
dynamic data management. Understanding their operations, complexity, and applications
is essential for efficient programming and algorithm design.

2-Mark Questions

1. Question: What is a linked list?


○ Answer: A linked list is a linear data structure consisting of nodes where
each node contains data and a pointer to the next node in the sequence.
Unlike arrays, linked lists do not require contiguous memory allocation.
2. Question: Describe the structure of a node in a singly linked list.
○ Answer: A node in a singly linked list consists of two fields: a data field to
store the actual data and a pointer (link) field that points to the next node in
the list.
3. Question: What is the time complexity of searching for an element in a linked
list?
○ Answer: The time complexity of searching for an element in a linked list is
O(n) in the worst case, where n is the number of nodes in the list.
4. Question: How do you insert a new node at the beginning of a linked list?
○ Answer: To insert a new node at the beginning, create a new node, set its
next pointer to the current head, and then update the head pointer to the new
node.
5. Question: What is the difference between a singly linked list and a doubly linked
list?
○ Answer: In a singly linked list, each node has one pointer pointing to the
next node, whereas in a doubly linked list, each node has two pointers: one
pointing to the next node and another pointing to the previous node.
6. Question: Define a header linked list.
○ Answer: A header linked list contains a special header node that does not
store any relevant data but serves as a starting point for traversals and
simplifies operations like insertion and deletion.
7. Question: What is the purpose of a doubly linked list?
○ Answer: A doubly linked list allows traversal in both forward and
backward directions, making operations like insertion and deletion more
flexible compared to singly linked lists.
8. Question: Describe the process of deleting a node from the end of a linked list.
○ Answer: To delete a node from the end, traverse the list to the
second-to-last node, set its next pointer to NULL, and free the memory of
the last node.
9. Question: What is the space complexity of a linked list?
○ Answer: The space complexity of a linked list is O(n), where n is the
number of nodes, since each node requires additional memory for the
pointer(s).
10. Question: Give one application of linked lists.
○ Answer: One application of linked lists is in implementing dynamic
memory allocation, as they allow efficient insertion and deletion of nodes
without reallocating memory.
5-Mark Questions

1. Question: Explain the process of traversing a linked list with a suitable code
example.
○ Answer: To traverse a linked list, start from the head node and follow the
next pointers until you reach a node whose next pointer is NULL. During
traversal, you can perform operations such as printing the data of each
node.

Example Code:
void traverse(struct Node* head) {
    struct Node* current = head;           // Start from the head
    while (current != NULL) {
        printf("%d -> ", current->data);   // Print data
        current = current->next;           // Move to the next node
    }
    printf("NULL\n");                      // Indicate the end of the list
}

2. Question: Describe the insertion operations in a doubly linked list, including at the
beginning, end, and a specific position.
○ Answer:
■ At the Beginning: Create a new node, set its next pointer to the
current head, and update the head. Also, set the previous pointer of
the current head to the new node.
■ At the End: Traverse to the last node, create a new node, set the last
node's next pointer to the new node, and set the new node's previous
pointer to the last node.
■ At a Specific Position: Traverse to the desired position, adjust the
pointers of the previous and next nodes to include the new node.
3. Question: Compare the advantages and disadvantages of linked lists and arrays.
○ Answer:
■ Advantages of Linked Lists:
■ Dynamic size; no need to preallocate memory.
■ Efficient insertions and deletions (O(1) at the head).
■ Disadvantages of Linked Lists:
■ Increased memory overhead due to pointers.
■ Random access is not possible; traversal is required.
■ Advantages of Arrays:
■ Contiguous storage; efficient in terms of memory access.
■ Direct access to elements via index.
■ Disadvantages of Arrays:
■ Fixed size; resizing requires allocation of new memory.
■ Inefficient insertions and deletions (O(n) in the worst case).
4. Question: Discuss the process of deleting a node from a specific position in a
linked list with an example.
○ Answer: To delete a node from a specific position:
■ Traverse the list to reach the node before the position.
■ Adjust the next pointer of the previous node to skip the node to be
deleted.
■ Free the memory of the node being deleted.

Example Code:
void deleteAtPosition(struct Node** head, int position) {
    if (*head == NULL) return;   // List is empty
    struct Node* temp = *head;

    // If head needs to be removed
    if (position == 0) {
        *head = temp->next;      // Change head
        free(temp);              // Free old head
        return;
    }

    // Find the node just before the position
    for (int i = 0; temp != NULL && i < position - 1; i++)
        temp = temp->next;

    // If position is more than the number of nodes
    if (temp == NULL || temp->next == NULL) return;

    // Node temp->next is the node to be deleted
    struct Node* next = temp->next->next;   // Store pointer to the node after the one being deleted
    free(temp->next);                       // Free memory
    temp->next = next;                      // Unlink the deleted node from the list
}

5. Question: Explain how a circular linked list works and its advantages over a
regular linked list.
○ Answer: In a circular linked list, the last node's next pointer points back
to the first node instead of NULL, creating a loop. This structure allows for
continuous traversal of the list without the need to reset to the head, which
is useful for applications that require repeated cycling through the list, such
as in round-robin scheduling.

Advantages:

○ Easier to traverse continuously without checking for NULL.


○ More efficient for certain applications that require cycling through
elements, as you can start from any node and keep traversing.
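
A short C sketch of traversing a circular singly linked list, reusing struct Node (illustrative only):

void traverseCircular(struct Node* head) {
    if (head == NULL) return;          // empty list
    struct Node* current = head;
    do {
        printf("%d -> ", current->data);
        current = current->next;       // the last node points back to head
    } while (current != head);         // stop after one full cycle
    printf("(back to head)\n");
}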

CHAPTER 2.2 : STACKS

Basic Terminology

● Stack: A linear data structure that follows the Last In First Out (LIFO) principle,
where the last element added is the first one to be removed.
● Top: The index or pointer that indicates the last element added to the stack.
● Push: The operation to add an element to the top of the stack.
● Pop: The operation to remove the top element from the stack.
● Peek/Top: The operation to retrieve the top element without removing it.
● Underflow: An error condition that occurs when attempting to pop from an empty
stack.
● Overflow: An error condition that occurs when attempting to push onto a full
stack (in the case of a fixed-size stack).

Sequential and Linked Representations

1. Sequential Representation:
○ A stack can be implemented using an array.
○ An array of fixed size is created, and a variable (top) keeps track of the
index of the last element.
○ Advantages: Simple to implement and provides fast access to elements.
○ Disadvantages: Fixed size leads to overflow when the stack exceeds its
limit.

Example Code:
#define MAX 100
struct Stack {
    int items[MAX];
    int top;
};

void initStack(struct Stack* s) {
    s->top = -1;   // Stack is empty
}

void push(struct Stack* s, int value) {
    if (s->top < MAX - 1) {
        s->items[++s->top] = value;   // Increment top and add item
    } else {
        printf("Stack Overflow\n");
    }
}

int pop(struct Stack* s) {
    if (s->top >= 0) {
        return s->items[s->top--];   // Return and decrement top
    } else {
        printf("Stack Underflow\n");
        return -1;                   // Error value
    }
}

2. Linked Representation:
○ A stack can also be implemented using a linked list where each node points
to the next node in the stack.
○ Advantages: Dynamic size, no overflow (except for memory limits).
○ Disadvantages: Overhead of extra memory for pointers.
Example Code:
struct Node {
int data;
struct Node* next;
};

struct Stack {
struct Node* top;
};

void initStack(struct Stack* s) {
    s->top = NULL;                     // Stack is empty
}

void push(struct Stack* s, int value) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = s->top;            // Link new node to previous top
    s->top = newNode;                  // Update top to new node
}

int pop(struct Stack* s) {
    if (s->top == NULL) {
        printf("Stack Underflow\n");
        return -1;                     // Error value
    }
    struct Node* temp = s->top;
    int poppedValue = temp->data;
    s->top = s->top->next;             // Update top
    free(temp);                        // Free old top
    return poppedValue;
}


Operations on Stacks

● Push Operation:
○ Adds an element to the top of the stack.
○ Increases the top index or pointer.
○ Checks for overflow condition (if using an array).
● Pop Operation:
○ Removes the element from the top of the stack.
○ Decreases the top index or pointer.
○ Checks for underflow condition.

Applications of Stacks

1. Parenthesis Matching:
○ Stacks can be used to check for balanced parentheses in expressions.
○ Traverse the expression and push opening brackets onto the stack; pop for
closing brackets.
○ If the stack is empty at the end and all brackets match, the expression is
balanced.

Example Code:
int isBalanced(char* expr) {
struct Stack s;
initStack(&s);
for (int i = 0; expr[i]; i++) {
if (expr[i] == '(') {
push(&s, expr[i]);
} else if (expr[i] == ')') {
if (pop(&s) == -1) {
return 0; // Unmatched closing parenthesis
}
}
}
return s.top == -1; // Balanced if stack is empty
}

2. Evaluation of Postfix Expressions:


○ A stack can evaluate postfix expressions (also known as Reverse Polish
Notation).
○ Operands are pushed onto the stack, and when an operator is encountered,
pop the required operands, apply the operator, and push the result back onto
the stack.
Example Code:
int evaluatePostfix(char* expr) {
struct Stack s;
initStack(&s);
for (int i = 0; expr[i]; i++) {
if (isdigit(expr[i])) {
push(&s, expr[i] - '0'); // Convert char to int
} else {
int operand2 = pop(&s);
int operand1 = pop(&s);
switch (expr[i]) {
case '+': push(&s, operand1 + operand2); break;
case '-': push(&s, operand1 - operand2); break;
case '*': push(&s, operand1 * operand2); break;
case '/': push(&s, operand1 / operand2); break;
}
}
}
return pop(&s); // Result is the last item
}

3. Conversion from Infix to Postfix Representation:


○ The infix expression is converted to postfix using the Shunting Yard
algorithm.
○ Operands are output immediately; operators are pushed to the stack based
on precedence and associativity.
4. Example Algorithm Steps:
○ Initialize an empty stack for operators and an output list.
○ Read tokens from the infix expression:
■ If an operand, add it to the output.
■ If an operator, pop operators from the stack to the output while the
top of the stack holds an operator of greater or equal precedence, then
push the new operator onto the stack.
○ At the end, pop all remaining operators to the output (a C sketch of
these steps follows).
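
Example Code (illustrative sketch): The function below is one possible C implementation of these steps for single-character operands and the operators + - * / with parentheses. The helper name precedence and the fixed stack size are assumptions for illustration, and isalnum requires <ctype.h>.

int precedence(char op) {
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;                                    // '(' and other characters
}

void infixToPostfix(char* infix, char* postfix) {
    char opStack[100];
    int top = -1, j = 0;
    for (int i = 0; infix[i]; i++) {
        char c = infix[i];
        if (isalnum(c)) {
            postfix[j++] = c;                    // Operands go straight to the output
        } else if (c == '(') {
            opStack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && opStack[top] != '(')
                postfix[j++] = opStack[top--];   // Pop until the matching '('
            if (top >= 0) top--;                 // Discard the '('
        } else {                                 // Operator
            while (top >= 0 && precedence(opStack[top]) >= precedence(c))
                postfix[j++] = opStack[top--];   // Pop operators of higher/equal precedence
            opStack[++top] = c;                  // Push the current operator
        }
    }
    while (top >= 0)
        postfix[j++] = opStack[top--];           // Flush remaining operators
    postfix[j] = '\0';
}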

Meaning and Importance of Recursion

● Recursion: A programming technique where a function calls itself to solve a


smaller instance of the same problem.
● Importance:
○ Simplifies code for problems that can be divided into subproblems (e.g.,
factorial, Fibonacci series).
○ Useful for traversing complex data structures like trees and graphs.

Principles of Recursion

1. Base Case: The condition under which the recursion stops. It prevents infinite
calls.
2. Recursive Case: The part of the function that includes the recursive call.

Implementation of Recursive Procedures

● Recursive functions typically have a base case and one or more recursive cases.

Example: Factorial Calculation

int factorial(int n) {
if (n == 0) { // Base case
return 1;
} else { // Recursive case
return n * factorial(n - 1);
}
}

Example: Fibonacci Series Calculation

int fibonacci(int n) {
if (n == 0) { // Base case
return 0;
} else if (n == 1) { // Base case
return 1;
} else { // Recursive case
return fibonacci(n - 1) + fibonacci(n - 2);
}
}
2-Mark Questions

1. What is a stack?
○ A stack is a linear data structure that follows the Last In First Out (LIFO)
principle, where the last element added is the first one to be removed.
2. Define the PUSH operation in a stack.
○ The PUSH operation adds an element to the top of the stack and increases
the index or pointer representing the top of the stack.
3. What does the term "underflow" mean in the context of stacks?
○ Underflow refers to the condition that occurs when trying to pop an element
from an empty stack.
4. Explain the difference between sequential and linked representations of a
stack.
○ Sequential representation uses an array with a fixed size, while linked
representation uses a linked list, allowing dynamic sizing.
5. What is the time complexity of the POP operation in a stack?
○ The time complexity of the POP operation is O(1) because it involves
accessing the top element and adjusting the top index or pointer.
6. Give one application of stacks in programming.
○ Stacks are used for parenthesis matching in expressions to ensure that
brackets are balanced.
7. What is the base case in recursion?
○ The base case is a condition that stops the recursion by providing a
straightforward solution for a specific input, preventing infinite calls.
8. Describe the principle of recursion with an example.
○ Recursion involves a function calling itself with a smaller instance of the
same problem. For example, the factorial function calls itself with (n-1).
9. What is the importance of recursion in programming?
○ Recursion simplifies the solution for problems that can be divided into
smaller subproblems, making the code cleaner and easier to understand.
10. What is the time complexity of evaluating a postfix expression using a stack?
○ The time complexity for evaluating a postfix expression using a stack is
O(n), where n is the length of the expression, since each token is pushed onto or
popped from the stack at most once.

5-Mark Questions

1. Explain how to implement a stack using an array. Include the basic


operations and their complexities.
○ To implement a stack using an array, create an array of fixed size and
maintain a top variable to track the last element's index. The PUSH
operation adds an element to the index indicated by top and increments it;
the POP operation decrements top and retrieves the element at that index.
Both operations have a time complexity of O(1).

#define MAX 100


struct Stack {
int items[MAX];
int top;
};

void initStack(struct Stack* s) {
    s->top = -1;                       // Stack is empty
}

void push(struct Stack* s, int value) {
    if (s->top < MAX - 1) {
        s->items[++s->top] = value;
    }
}

int pop(struct Stack* s) {
    if (s->top >= 0) {
        return s->items[s->top--];
    }
    return -1;                         // Underflow
}

2. Describe how to evaluate a postfix expression using a stack. Provide a code


example.
○ To evaluate a postfix expression, read tokens sequentially. Push operands
onto the stack. When an operator is encountered, pop the required number
of operands from the stack, apply the operator, and push the result back
onto the stack. At the end, the final result will be on the top of the stack.

int evaluatePostfix(char* expr) {
    struct Stack s;
    initStack(&s);
    for (int i = 0; expr[i]; i++) {
        if (isdigit(expr[i])) {
            push(&s, expr[i] - '0');          // Convert char digit to int
        } else {
            int operand2 = pop(&s);
            int operand1 = pop(&s);
            switch (expr[i]) {
                case '+': push(&s, operand1 + operand2); break;
                case '-': push(&s, operand1 - operand2); break;
                case '*': push(&s, operand1 * operand2); break;
                case '/': push(&s, operand1 / operand2); break;
            }
        }
    }
    return pop(&s);                           // Result is the last item on the stack
}

3. Explain the process of converting an infix expression to a postfix expression


using a stack.
○ To convert infix to postfix:
■ Initialize an empty stack for operators and an output list for the
postfix expression.
■ Read tokens from the infix expression:
■ If an operand, add it to the output.
■ If an operator, pop operators from the stack to the output while
the top of the stack holds an operator of greater or equal precedence.
■ Then push the operator onto the stack.
■ After reading all tokens, pop all operators from the stack to the
output.
○ This process ensures that operators maintain their precedence and
associativity in the final output.
4. Discuss the advantages and disadvantages of using recursion.
○ Advantages:
■ Simplifies the code for problems that can be broken down into
smaller subproblems.
■ Easier to understand and implement for certain algorithms (e.g., tree
traversal).
○ Disadvantages:
■ Can lead to high memory usage due to stack space for function calls,
which may cause stack overflow for deep recursions.
■ Recursive solutions can be less efficient than iterative solutions,
especially if not optimized (e.g., using memoization).
5. Illustrate the concept of recursion with a detailed example. Explain both the
base case and the recursive case.

Consider the function to compute the Fibonacci sequence:


int fibonacci(int n) {
if (n == 0) return 0; // Base case
if (n == 1) return 1; // Base case
return fibonacci(n - 1) + fibonacci(n - 2); // Recursive case
}

○ Base Cases: The function stops calling itself when n is 0 or 1, returning the
corresponding Fibonacci number.
○ Recursive Case: For n greater than 1, the function calls itself with (n-1)
and (n-2) to compute the nth Fibonacci number by summing the two
preceding values.

CHAPTER 2.3 : QUEUES

Queues

A queue is a linear data structure that follows the First In First Out (FIFO) principle. This
means that the first element added to the queue will be the first one to be removed.

1. Linear Queue

● Definition: A linear queue is a straightforward queue implementation where


elements are added at one end (rear) and removed from the other end (front).
● Structure:
○ A linear queue can be represented using an array or a linked list.
○ In an array representation, a fixed size is allocated for the queue, while in a
linked list representation, nodes are dynamically created.
● Operations:
○ Enqueue: Add an element to the rear of the queue.
○ Dequeue: Remove an element from the front of the queue.
○ Front: Retrieve the front element without removing it.
○ isEmpty: Check if the queue is empty.
○ isFull: Check if the queue is full (in the case of an array implementation).

2. Sequential Representation of Linear Queue


Array Implementation:
#define MAX 100

struct Queue {
int items[MAX];
int front;
int rear;
};

void initQueue(struct Queue* q) {
    q->front = -1;                     // Queue is empty
    q->rear = -1;
}

int isFull(struct Queue* q) {
    return q->rear == MAX - 1;
}

int isEmpty(struct Queue* q) {
    return q->front == -1;
}

void enqueue(struct Queue* q, int value) {
    if (isFull(q)) {
        printf("Queue is full!\n");
        return;
    }
    if (isEmpty(q)) {
        q->front = 0;                  // First element added
    }
    q->items[++q->rear] = value;
}

int dequeue(struct Queue* q) {
    if (isEmpty(q)) {
        printf("Queue is empty!\n");
        return -1;                     // Underflow
    }
    int dequeuedValue = q->items[q->front];
    if (q->front == q->rear) {
        q->front = q->rear = -1;       // Reset when queue becomes empty
    } else {
        q->front++;
    }
    return dequeuedValue;
}

3. Linked Representation of Linear Queue


Node Structure:
struct Node {
int data;
struct Node* next;
};

struct Queue {
struct Node* front;
struct Node* rear;
};

● Operations:
○ Enqueue: Create a new node and link it to the rear of the queue.
○ Dequeue: Remove the node at the front of the queue and adjust the front
pointer.
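
Example Code (illustrative sketch): The following enqueue and dequeue functions are one possible implementation using the Node and Queue structures shown above; both pointers are assumed to start as NULL for an empty queue.

void enqueue(struct Queue* q, int value) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = NULL;
    if (q->rear == NULL) {             // Empty queue: new node is both front and rear
        q->front = q->rear = newNode;
        return;
    }
    q->rear->next = newNode;           // Link behind the current rear
    q->rear = newNode;                 // New node becomes the rear
}

int dequeue(struct Queue* q) {
    if (q->front == NULL) {            // Underflow
        printf("Queue is empty!\n");
        return -1;
    }
    struct Node* temp = q->front;
    int value = temp->data;
    q->front = q->front->next;         // Advance the front pointer
    if (q->front == NULL) q->rear = NULL;  // Queue became empty
    free(temp);
    return value;
}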
4. Circular Queue

● Definition: A circular queue is a linear queue where the last position is connected
back to the first position, forming a circle. This allows for efficient use of space.
● Advantages: It eliminates the problem of wasted space in a linear queue when
elements are dequeued.
● Operations:
○ Enqueue: Similar to a linear queue, but the rear wraps around to the front if
it reaches the end of the array.
○ Dequeue: The front pointer wraps around when it reaches the end.

Implementation:
#define MAX 100

struct CircularQueue {
int items[MAX];
int front;
int rear;
};

void initCircularQueue(struct CircularQueue* q) {
    q->front = -1;
    q->rear = -1;
}

int isFull(struct CircularQueue* q) {
    return (q->rear + 1) % MAX == q->front;
}

int isEmpty(struct CircularQueue* q) {
    return q->front == -1;
}

void enqueue(struct CircularQueue* q, int value) {
    if (isFull(q)) {
        printf("Queue is full!\n");
        return;
    }
    if (isEmpty(q)) {
        q->front = 0;                  // First element added
    }
    q->rear = (q->rear + 1) % MAX;
    q->items[q->rear] = value;
}

int dequeue(struct CircularQueue* q) {
    if (isEmpty(q)) {
        printf("Queue is empty!\n");
        return -1;
    }
    int dequeuedValue = q->items[q->front];
    if (q->front == q->rear) {
        q->front = q->rear = -1;       // Reset when queue becomes empty
    } else {
        q->front = (q->front + 1) % MAX;
    }
    return dequeuedValue;
}

5. Deques (Double-Ended Queues)

● Definition: A deque allows insertion and deletion of elements from both ends
(front and rear).
● Operations:
○ Enqueue Front: Add an element to the front of the deque.
○ Enqueue Rear: Add an element to the rear of the deque.
○ Dequeue Front: Remove an element from the front of the deque.
○ Dequeue Rear: Remove an element from the rear of the deque.
● Implementation:
○ A deque can be implemented using either an array or a doubly linked list.

6. Priority Queue
● Definition: A priority queue is an abstract data type where each element has a
priority assigned. Elements are dequeued based on their priority rather than their
order in the queue.
● Types:
○ Min-Priority Queue: The element with the lowest priority is dequeued
first.
○ Max-Priority Queue: The element with the highest priority is dequeued
first.
● Implementation:
○ A priority queue can be implemented using an array, linked list, or a binary
heap.
● Operations:
○ Enqueue: Insert an element in the queue based on its priority.
○ Dequeue: Remove and return the element with the highest (or lowest)
priority.

Summary

● Queues are essential data structures that operate on a FIFO basis, and they can be
represented in different ways, such as linear, circular, and linked lists.
● Deques and priority queues extend the basic queue functionality, allowing for
more complex operations based on specific use cases.

Questions of 2 Marks

1. What is a queue?
○ A queue is a linear data structure that follows the First In First Out (FIFO)
principle, where the first element added is the first one to be removed.
2. Explain the difference between a linear queue and a circular queue.
○ In a linear queue, elements are added at the rear and removed from the
front, leading to wasted space when elements are dequeued. In a circular
queue, the last position is connected back to the first, allowing efficient use
of space.
3. What operations are typically supported by a queue?
○ Common operations include Enqueue (inserting an element), Dequeue
(removing an element), Front (retrieving the front element), isEmpty
(checking if the queue is empty), and isFull (checking if the queue is full).
4. Describe the concept of a priority queue.
○ A priority queue is an abstract data type where each element has a priority.
Elements are dequeued based on their priority rather than the order in
which they were added.
5. What is the primary advantage of using a circular queue over a linear queue?
○ A circular queue eliminates wasted space that can occur in a linear queue
by reusing positions of dequeued elements, allowing better memory
utilization.
6. How is a deque different from a regular queue?
○ A deque (double-ended queue) allows insertion and deletion of elements
from both ends (front and rear), while a regular queue allows operations
only at one end for insertion and the other end for deletion.
7. What is the time complexity of the enqueue and dequeue operations in a
queue implemented with a linked list?
○ The time complexity for both enqueue and dequeue operations in a linked
list implementation is O(1), as they involve inserting or removing elements
at the front or rear.
8. In which scenarios would you prefer to use a priority queue?
○ A priority queue is preferred in scenarios such as scheduling tasks based on
priority, managing bandwidth in networks, or handling events in
simulations where certain events need to be processed before others.
9. What is the effect of trying to dequeue an element from an empty queue?
○ Attempting to dequeue an element from an empty queue typically results in
an error or an underflow condition, as there are no elements to remove.
10. Explain how a circular queue is implemented using an array.
○ A circular queue uses an array with two pointers (front and rear). When
adding or removing elements, the pointers wrap around to the beginning of
the array if they reach the end, allowing the queue to reuse positions of
dequeued elements.

Questions of 5 Marks

1. Discuss the different methods of implementing a queue and their advantages


and disadvantages.
○ Array Implementation: Simple to implement but has a fixed size, leading
to overflow issues if full.
○ Linked List Implementation: Dynamic size, no overflow, but requires
extra memory for pointers.
○ Circular Queue: Efficient memory use, eliminates wasted space but can be
more complex to manage.
2. Explain how to implement a circular queue using an array, including the
operations for enqueue and dequeue. Provide sample code.
○ A circular queue is implemented using an array with two pointers (front and
rear). Enqueue and dequeue operations adjust the pointers to wrap around
the array's bounds.
#define MAX 100

struct CircularQueue {
int items[MAX];
int front;
int rear;
};

void enqueue(struct CircularQueue* q, int value) {
    if ((q->rear + 1) % MAX == q->front) {
        printf("Queue is full!\n");
        return;
    }
    if (q->front == -1) {
        q->front = 0;                  // First element
    }
    q->rear = (q->rear + 1) % MAX;
    q->items[q->rear] = value;
}

int dequeue(struct CircularQueue* q) {
    if (q->front == -1) {
        printf("Queue is empty!\n");
        return -1;
    }
    int dequeuedValue = q->items[q->front];
    if (q->front == q->rear) {
        q->front = q->rear = -1;       // Reset when queue becomes empty
    } else {
        q->front = (q->front + 1) % MAX;
    }
    return dequeuedValue;
}

3. Describe the applications of queues in real-world scenarios. Provide at least


three examples.
○ Job Scheduling: Operating systems use queues to manage processes
waiting for CPU time.
○ Print Spooling: Print jobs are queued to ensure they are printed in the order
received.
○ Call Centers: Incoming calls are queued to ensure they are answered in the
order they arrive.
4. Explain the concept of a priority queue and how it differs from a regular
queue. Illustrate with examples.
○ In a priority queue, each element has a priority level, and elements are
dequeued based on priority rather than order. For example, in a hospital,
patients with critical conditions are treated before those with minor issues,
regardless of their arrival time.
5. Discuss how a deque can be implemented using a doubly linked list, including
its operations. Provide sample code.
○ A deque can be implemented using a doubly linked list where each node
has pointers to both the next and previous nodes, allowing for efficient
insertion and deletion from both ends.

struct Node {
int data;
struct Node* next;
struct Node* prev;
};

struct Deque {
struct Node* front;
struct Node* rear;
};

void enqueueFront(struct Deque* dq, int value) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = dq->front;
    newNode->prev = NULL;
    if (dq->front != NULL) {
        dq->front->prev = newNode;
    }
    dq->front = newNode;
    if (dq->rear == NULL) {
        dq->rear = newNode;            // First element added
    }
}

void enqueueRear(struct Deque* dq, int value) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = NULL;
    newNode->prev = dq->rear;
    if (dq->rear != NULL) {
        dq->rear->next = newNode;
    }
    dq->rear = newNode;
    if (dq->front == NULL) {
        dq->front = newNode;           // First element added
    }
}
Sample Paper for Mid Semester Test - II

Program Name/Code : Bachelor of Engineering
Semester : 3rd
Subject : Data Structures
Time : 1 hour          Maximum marks : 20

Section A (5 x 2 = 10 marks)

Q1. What is a linear linked list, and how does it store data in memory? [CO2]
Q2. Given a circular queue with a size of 5 and the elements [1, 2, 3], what will be the state of
the queue after two more enqueues and one dequeue? [CO2]
Q3. Write a recursive function to evaluate a postfix expression like "2354+". [CO3]
Q4. Compare a singly linked list with a doubly linked list in terms of memory usage and
traversal operations. [CO1]
Q5. Differentiate between a linear queue and a circular queue in terms of structure and
efficiency. [CO1]

Section B (2 x 5 = 10 marks)

Q6. Write a C function to traverse and print all elements of a singly linked list. Explain the
logic behind how traversal works in a linked list. [CO3]
Q7. Implement a stack using an array in C, and write a function for checking balanced
parentheses in a given string using this stack. Explain how the stack helps in matching
parentheses. [CO5]
DATA STRUCTURES (23CSH - 241)
Unit 3 NOTES

3.1 : Graph Theory


1. Introduction to Graphs
Graphs are fundamental structures used in computer science and mathematics to model
networks, relationships, and paths. Graphs are represented by sets of vertices (nodes) and
edges (connections), providing a way to capture and analyze complex data relationships
effectively.

Applications of Graph Theory

Graph theory has a wide range of applications in various fields, such as:

● Computer Science: Used in networking, data structure analysis, search algorithms, and
AI.
● Social Networks: To represent friendships, connections, or interactions between people.
● Biology: Mapping biological networks, such as neural pathways or food chains.
● Transportation and Navigation: For routing, shortest path, and connectivity analysis,
especially in maps and logistics.
2. Key Terminology in Graph Theory
To work with graphs effectively, understanding the terminology is essential.

Basic Terms

1. Vertex (Node): The fundamental unit of a graph, representing entities or points.


○ Example: In a city map, vertices represent intersections.
2. Edge: A line connecting two vertices, representing a relationship between them.
○ Example: In a transportation network, edges represent roads between
intersections.

Types of Graphs

1. Undirected Graph:
○ An undirected graph has edges that do not have a specific direction. This means
if there is an edge between A and B, you can traverse it from A to B and vice
versa.
○ Notation: G=(V,E), where V is the set of vertices and E is the set of edges.
○ Example: In a social network where friendships are mutual, an undirected edge
represents a friendship between two people.
2. Directed Graph (Digraph):
○ In a directed graph, each edge has a direction, meaning it goes from one vertex
to another specific vertex.
○ Notation: Often represented as (u→v), indicating a one-way edge from u to v.
○ Example: A Twitter following network, where one user can follow another, but the
follow may not be reciprocal.
3. Weighted Graph:
○ In a weighted graph, each edge has a numerical value, or “weight,” associated
with it.
○ Example: A city map with distances between locations as edge weights.
4. Unweighted Graph:
○ An unweighted graph has edges with no weights, meaning all edges are
considered equal in terms of cost or distance.

Graph Characteristics

1. Path:
○ A path is a sequence of vertices connected by edges. A path between vertices A
and B might go through multiple intermediate vertices.
○ Example: In a subway system, the path from Station A to Station C might go
through Station B.
2. Cycle:

A cycle is a path that starts and ends at the same vertex without traversing any
vertex more than once (except for the start/end vertex).
○ Example: In a food chain, a cycle could represent an ecosystem loop where each
organism is consumed by another, eventually leading back to the initial organism.
3. Degree:
○ The degree of a vertex is the number of edges connected to it.
○ In-Degree (for directed graphs): The number of incoming edges to a vertex.
○ Out-Degree (for directed graphs): The number of outgoing edges from a vertex.

Connectivity

1. Connected Graph:
○ A graph is connected if there is a path between every pair of vertices.
2. Disconnected Graph:
○ A graph with at least one pair of vertices that do not have a connecting path.

3. Graph Representation
Graphs can be represented in multiple ways, which makes it easier to store and manipulate
them depending on the application.

Adjacency Matrix

● A 2D matrix where A[i][j] represents the edge between vertices i and j.


○ 1 (or weight) if there’s an edge.
○ 0 if there’s no edge.

Example: In a friend network of four people (A, B, C, D):

      A   B   C   D
A     0   1   0   1
B     1   0   1   0
C     0   1   0   1
D     1   0   1   0

Adjacency List

● Each vertex has a list of all its neighboring vertices.


● Efficient for sparse graphs as it uses less space than the adjacency matrix.
Example: For a graph with vertices A, B, C, and D:

● A: B, D
● B: A, C
● C: B, D
● D: A, C
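
Example (illustrative sketch): One simple way to store an undirected, unweighted graph in C is a global adjacency matrix; the vertex count V and the function names below are assumptions for illustration.

#define V 4

int adj[V][V];          // adj[i][j] == 1 if an edge exists between i and j, else 0

void addEdge(int u, int v) {
    adj[u][v] = 1;      // Undirected: mark the edge in both directions
    adj[v][u] = 1;
}

int hasEdge(int u, int v) {
    return adj[u][v];
}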

Path Matrix

● A matrix representation indicating the existence of paths between pairs of vertices.


● PM[i][j]=1 if a path exists from i to j, otherwise PM[i][j]=0.

4. Graph Traversal Techniques


Traversal refers to visiting nodes systematically to explore and gather information from a graph.
The two primary traversal algorithms are Depth-First Search (DFS) and Breadth-First Search
(BFS).

Depth-First Search (DFS)

Definition: DFS explores as deeply as possible along each branch before backtracking.

Steps:

1. Initialize: Start from a source vertex, mark it as visited.


2. Visit Neighbors: Visit an unvisited adjacent vertex, mark it as visited, and continue to
this vertex.
3. Backtrack: If there are no unvisited vertices, backtrack to the last visited vertex and
explore its unvisited neighbors.
4. Repeat: Continue until all vertices are visited.

Example: Imagine solving a maze where you go as far as possible in one direction before trying
other paths if you hit a dead end.

Algorithm:

DFS(Graph, source):
Mark source as visited
For each neighbor of source:
If neighbor is not visited:
Call DFS(neighbor)
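
Example Code (illustrative sketch): A direct C translation of this pseudocode, reusing the V and adj[][] arrays from the adjacency-matrix sketch earlier and an assumed visited[] array.

int visited[V];

void DFS(int vertex) {
    visited[vertex] = 1;                     // Mark the vertex as visited
    printf("%d ", vertex);                   // Process the vertex
    for (int neighbor = 0; neighbor < V; neighbor++) {
        if (adj[vertex][neighbor] && !visited[neighbor])
            DFS(neighbor);                   // Go deeper before backtracking
    }
}
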
Breadth-First Search (BFS)

Definition: BFS explores all vertices at the current depth level before moving to the next level.

Steps:

1. Initialize: Start from a source vertex, mark it as visited, and add it to a queue.
2. Dequeue: Remove the front vertex from the queue.
3. Visit Neighbors: For each unvisited neighbor of the dequeued vertex, mark it as visited
and add it to the queue.
4. Repeat: Continue until the queue is empty.

Example: BFS is like spreading out from the center of a ripple, exploring each level one by one,
making it ideal for finding shortest paths in unweighted graphs.

Algorithm:

BFS(Graph, source):
Mark source as visited
Enqueue source
While queue is not empty:
vertex = Dequeue()
For each neighbor of vertex:
If neighbor is not visited:
Mark neighbor as visited
Enqueue neighbor
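
Example Code (illustrative sketch): A C version of the BFS pseudocode, again reusing V, adj[][], and visited[] from the earlier sketches together with a simple array-based queue. Each vertex is enqueued at most once, so an array of size V suffices.

void BFS(int source) {
    int queue[V];
    int front = 0, rear = 0;
    visited[source] = 1;
    queue[rear++] = source;                  // Enqueue the source
    while (front < rear) {                   // While the queue is not empty
        int vertex = queue[front++];         // Dequeue
        printf("%d ", vertex);
        for (int neighbor = 0; neighbor < V; neighbor++) {
            if (adj[vertex][neighbor] && !visited[neighbor]) {
                visited[neighbor] = 1;       // Mark before enqueueing to avoid duplicates
                queue[rear++] = neighbor;
            }
        }
    }
}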

5. Operations on Graphs
Adding and Removing Nodes and Edges

1. Adding a Node: Add a vertex with no edges.


2. Adding an Edge: Connect two vertices with an edge.
3. Removing a Node: Remove the vertex and all connected edges.
4. Removing an Edge: Delete a specific edge between two vertices.

Checking for Connectivity

● A graph is connected if there is a path between every pair of vertices.


● Algorithm: Use DFS or BFS starting from any vertex. If all vertices are reachable, the
graph is connected.
Detecting Cycles

● Directed Graph: Use DFS with a recursion stack to detect cycles.


● Undirected Graph: DFS or BFS can detect cycles if they find a back edge that connects
a vertex to an already visited vertex that is not its parent.

Connected Components

● A connected component is a subgraph in which any two vertices are connected by


paths.
● Algorithm:
1. Use DFS or BFS on each unvisited node.
2. Mark all reachable vertices as part of the same component.
3. Repeat until all nodes are visited.

Summary
Graphs are versatile tools for representing complex data structures in real-world applications.
Understanding the core concepts of graph types, representations, and traversal algorithms
(DFS and BFS) is essential for analyzing networks, relationships, and paths. With the ability to
add/remove nodes and edges, detect cycles, and identify connected components, graph theory
provides the foundation for tackling real-world problems across domains like networking, social
media, biology, and transportation.

Short Questions (2 Marks Each)

1. Define a graph and give two real-world examples of where graphs are used.

● Answer: A graph is a collection of vertices (nodes) connected by edges, representing


relationships between entities. Examples include:
1. Social networks, where vertices represent people, and edges represent
friendships or connections.
2. Transportation networks, where vertices represent locations, and edges
represent routes between them.

2. What is the difference between a directed and an undirected graph?


● Answer: In a directed graph, edges have a direction, indicating a one-way relationship
(e.g., Twitter follower connections). In an undirected graph, edges have no direction,
representing mutual relationships (e.g., Facebook friendships).

3. Describe the adjacency matrix representation of a graph.

● Answer: An adjacency matrix is a 2D matrix where the cell A[i][j] is 1 if there is an edge
between vertices i and j (or a weight if it's weighted) and 0 if there is no edge. It’s
suitable for dense graphs but uses more memory for sparse graphs.

4. What is the purpose of graph traversal, and name the two primary traversal methods.

● Answer: Graph traversal is the process of visiting each vertex in a graph systematically
to explore and analyze relationships. The two primary traversal methods are Depth-First
Search (DFS) and Breadth-First Search (BFS).

Long Questions (5 Marks Each)

1. Explain the adjacency list representation of a graph with an example.

● Answer: An adjacency list represents a graph by storing each vertex as a list of its
neighbors. It is more memory-efficient for sparse graphs than an adjacency matrix.
Example: For a graph with vertices A, B, C, and D:
○ A: B, D
○ B: A, C
○ C: B, D
○ D: A, C
● In this example, vertex A is connected to B and D, vertex B to A and C, and so on. This
representation reduces storage by only listing connections that exist, making it
particularly effective for large, sparse graphs.

2. Differentiate between a path and a cycle in a graph. Provide an example of each.

● Answer: A path in a graph is a sequence of vertices connected by edges, with no


repeated vertices. Example: In a graph with vertices A→B→C, this is a path from A to C.
A cycle is a path that starts and ends at the same vertex without repeating any other
vertex. Example: In a graph with vertices A→B→C→A, this forms a cycle starting and
ending at A.
3. Outline the main differences between DFS and BFS traversal techniques.

● Answer: The main differences between Depth-First Search (DFS) and Breadth-First
Search (BFS) are:
1. DFS explores as deeply as possible along each branch before backtracking,
while BFS explores all neighbors at the current depth before moving to the next
level.
2. DFS uses a stack (or recursion), while BFS uses a queue.
3. DFS is suited for scenarios like maze solving where a deep path needs
exploring, while BFS is suited for shortest path problems in unweighted graphs.
4. BFS finds the shortest path in unweighted graphs, but DFS does not guarantee
the shortest path.

Very Long Questions (10 Marks Each)

1. Explain Depth-First Search (DFS) algorithm in detail with step-by-step explanation and
a practical example.

● Answer: Depth-First Search (DFS) is a graph traversal algorithm that explores as far
down each branch as possible before backtracking.
Steps:
○ Start at a source vertex, mark it as visited.
○ Visit each adjacent unvisited vertex, marking it as visited and proceeding
down its branches.
○ Backtrack when no unvisited adjacent vertices remain.
○ Repeat until all vertices reachable from the source are visited.
● Example: For a graph with vertices A,B,C,D where A→B, B→C, A→D:
○ Start at A, mark as visited.
○ Move to B, mark as visited.
○ Move to C, mark as visited.
○ Backtrack to B, no unvisited vertices, backtrack to A.
○ Move to D, mark as visited.
● DFS is useful in applications like maze solving or detecting cycles in graphs.

2. Describe the Breadth-First Search (BFS) algorithm with a detailed example and discuss
its applications.
● Answer: Breadth-First Search (BFS) is a graph traversal algorithm that explores all
neighbors at the current level before moving to the next level.
Steps:
○ Start at a source vertex, mark as visited, and enqueue it.
○ Dequeue the front vertex, visit its unvisited neighbors, mark them, and enqueue
them.
○ Repeat until the queue is empty.
● Example: For a graph with vertices A,B,C,D where A→B, B→C, A→D:
○ Start at A, mark as visited, enqueue A.
○ Dequeue A, visit B and D, mark and enqueue them.
○ Dequeue B, visit C, mark and enqueue it.
○ Dequeue D and C; no new vertices.
● Applications: BFS is used for finding shortest paths in unweighted graphs,
peer-to-peer networking (e.g., finding closest nodes), and social network analysis.

3. Discuss the concept of connectivity in graphs. Explain how to determine if a graph is


connected and how to find connected components.

● Answer: Connectivity in a graph refers to the presence of paths between all pairs of
vertices.
○ A connected graph has a path between every pair of vertices.
○ A disconnected graph has at least one pair of vertices with no path between
them.
● Determining Connectivity:
○ To check if a graph is connected, perform DFS or BFS from any vertex. If all
vertices are reachable, the graph is connected.
● Connected Components:
○ A connected component is a subgraph where any two vertices are connected
by paths, and no other vertices are connected to any vertex in the subgraph.
○ Finding Components:
1. Start DFS or BFS from each unvisited vertex.
2. Each traversal covers one connected component.
3. Repeat until all vertices are visited, with each traversal identifying one
component.
● Example: For a graph with two components, say G1={A,B} and G2={C,D}:
○ Running DFS/BFS from A will visit B, marking G1.
○ Running DFS/BFS from C will visit D, marking G2.
3.2 : Trees
Trees: Basic Terminology

1. Tree: A hierarchical data structure consisting of nodes connected by edges, with a


unique starting node called the root. Trees represent a parent-child relationship.
2. Node: Each element in a tree. A node can have multiple child nodes but only one parent
node (except for the root node, which has no parent).
3. Root: The topmost node in the tree, serving as the starting point of the structure.
4. Parent and Child: In a tree, a node’s parent is the node directly above it, and a child is
a node directly below it.
5. Leaf: A node with no children; leaves mark the ends of paths from the root.
6. Degree: The number of children a node has. The degree of a tree is the highest degree
among its nodes.
7. Depth and Height: The depth of a node is the number of edges from the root to that
node, while the height is the maximum depth in the tree.
8. Subtree: Any node, along with its descendants, forms a subtree of the main tree.

Binary Trees

A binary tree is a tree where each node has at most two children, commonly referred to as the
left and right children. Binary trees are foundational structures in computer science due to their
efficient data processing capabilities.

1. Full Binary Tree: Every node has either zero or two children.
2. Complete Binary Tree: All levels, except possibly the last, are completely filled, and the
nodes in the last level are positioned as far left as possible.
3. Perfect Binary Tree: A full binary tree with all levels completely filled.
4. Skewed Binary Tree: All nodes have only one child, creating either a left or right linear
structure.

Representation of Binary Trees in Memory

Binary trees can be represented in two main ways:

1. Array Representation: Nodes are stored in an array, with the root node at index 0. For
any node at index i (0-based):
○ Left child is at 2i + 1
○ Right child is at 2i + 2
○ Parent is at ⌊(i − 1)/2⌋

2. Linked Representation: Each node has pointers to its left and right children. A typical
node structure for a binary tree includes:

struct Node {
    int data;
    struct Node* left;
    struct Node* right;
};

Traversing Binary Trees

Tree traversal is the process of visiting all nodes in a specific order. The primary types of binary
tree traversal are:

1. Inorder (Left, Root, Right):


○ Traverse the left subtree, visit the root node, and then traverse the right subtree.
○ Example sequence: Left → Root → Right.
2. Preorder (Root, Left, Right):
○ Visit the root node, then traverse the left subtree, and finally traverse the right
subtree.
○ Example sequence: Root → Left → Right.
3. Postorder (Left, Right, Root):
○ Traverse the left subtree, then the right subtree, and visit the root node last.
○ Example sequence: Left → Right → Root.
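
Example Code (illustrative sketch): Recursive versions of the three traversals, using the binary-tree Node structure shown above.

void inorder(struct Node* root) {
    if (root == NULL) return;
    inorder(root->left);            // Left
    printf("%d ", root->data);      // Root
    inorder(root->right);           // Right
}

void preorder(struct Node* root) {
    if (root == NULL) return;
    printf("%d ", root->data);      // Root
    preorder(root->left);           // Left
    preorder(root->right);          // Right
}

void postorder(struct Node* root) {
    if (root == NULL) return;
    postorder(root->left);          // Left
    postorder(root->right);         // Right
    printf("%d ", root->data);      // Root
}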

Traversal Algorithms Using Stacks

For non-recursive traversal, stacks are commonly used:

1. Inorder Traversal:
○ Push nodes onto the stack until reaching the leftmost node.
○ Pop and visit nodes, then move to the right child if it exists.
2. Preorder Traversal:
○ Push the root node onto the stack, visit it, then push the right and left children (in
that order).
○ Continue popping, visiting, and pushing children until the stack is empty.
3. Postorder Traversal:
○ Push nodes and manage visited markers to ensure the right child is processed
after the left child.
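
Example Code (illustrative sketch): A non-recursive inorder traversal using an explicit stack of node pointers; the fixed stack size is an assumption for illustration.

void inorderIterative(struct Node* root) {
    struct Node* stack[100];
    int top = -1;
    struct Node* current = root;
    while (current != NULL || top >= 0) {
        while (current != NULL) {            // Push until the leftmost node is reached
            stack[++top] = current;
            current = current->left;
        }
        current = stack[top--];              // Pop and visit
        printf("%d ", current->data);
        current = current->right;            // Then move to the right child
    }
}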

Header Nodes and Threads

1. Header Nodes: Special nodes in linked data structures (such as linked lists or trees) that
act as placeholders or starting points.
2. Threaded Binary Trees: In a threaded binary tree, null pointers are replaced with
pointers to the inorder predecessor or successor, making it easier to traverse the tree
without recursion or stacks.

Binary Search Trees (BST)

A binary search tree is a binary tree with the following properties:

● The left subtree contains nodes with values less than the root.
● The right subtree contains nodes with values greater than the root.
● This property applies to every node, ensuring efficient searching, insertion, and deletion.

Operations on BST

1. Searching in BST:
○ Start at the root; move left if the target is smaller or right if it’s larger until the
target is found or a leaf is reached.
2. Inserting in BST:
○ Traverse to the appropriate leaf node based on value comparison, then insert the
new node as a left or right child.
3. Deleting in BST:
○ Locate the node, then:
■ If it's a leaf, simply remove it.
■ If it has one child, replace it with its child.
■ If it has two children, replace it with its inorder successor or predecessor,
then delete that node.

AVL Trees

An AVL tree is a balanced BST where the heights of the left and right subtrees of every node
differ by no more than one. If the tree becomes unbalanced after an insertion or deletion, it is
rebalanced using rotations:

1. Left Rotation: Applied when the right subtree is heavier.


2. Right Rotation: Applied when the left subtree is heavier.
3. Left-Right Rotation: Applied when the left subtree’s right child causes imbalance.
4. Right-Left Rotation: Applied when the right subtree’s left child causes imbalance.
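
Example Code (illustrative sketch): A single left rotation as used during AVL rebalancing; the AVLNode structure with a height field and the helper names are assumptions for illustration.

struct AVLNode {
    int data;
    int height;
    struct AVLNode* left;
    struct AVLNode* right;
};

int height(struct AVLNode* n) { return n ? n->height : 0; }
int max(int a, int b) { return a > b ? a : b; }

struct AVLNode* rotateLeft(struct AVLNode* x) {
    struct AVLNode* y = x->right;            // y becomes the new subtree root
    x->right = y->left;                      // y's left subtree moves under x
    y->left = x;
    x->height = 1 + max(height(x->left), height(x->right));  // Update heights bottom-up
    y->height = 1 + max(height(y->left), height(y->right));
    return y;                                // Caller re-links y in place of x
}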

B-Trees

B-Trees are self-balancing trees that maintain sorted data and allow efficient insertion, deletion,
and search operations. They are widely used in databases and file systems for managing large
blocks of data.

● Properties:
1. All leaves are at the same depth.
2. Each node can contain multiple keys and children.
3. Nodes have a minimum and maximum number of children to maintain balance.

Heaps

A heap is a specialized binary tree that satisfies the heap property:

1. Max-Heap: The key at each node is greater than or equal to the keys of its children.
2. Min-Heap: The key at each node is less than or equal to the keys of its children.

Heap Operations:

1. Insertion: Add the new element at the end, then "heapify" by swapping it up to maintain
the heap property.
2. Deletion (of the root): Replace the root with the last element, then heapify down.

Heap Sort

Heap Sort is an efficient sorting algorithm that uses the heap data structure.

Steps:

1. Build a max-heap from the array.


2. Extract the maximum element (root) and swap it with the last element in the heap.
3. Reduce the heap size by one and heapify the root to restore the max-heap property.
4. Repeat the extraction and heapifying steps until the heap size is 1.

Time Complexity: O(n log n)
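
Example Code (illustrative sketch): One possible C implementation of heap sort on an integer array, following the steps above with a max-heap.

void heapify(int arr[], int n, int i) {
    int largest = i;
    int left = 2 * i + 1, right = 2 * i + 2;
    if (left < n && arr[left] > arr[largest]) largest = left;
    if (right < n && arr[right] > arr[largest]) largest = right;
    if (largest != i) {                       // A child is larger: swap and continue down
        int tmp = arr[i]; arr[i] = arr[largest]; arr[largest] = tmp;
        heapify(arr, n, largest);
    }
}

void heapSort(int arr[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)      // Step 1: build the max-heap
        heapify(arr, n, i);
    for (int i = n - 1; i > 0; i--) {         // Steps 2-4: extract the max repeatedly
        int tmp = arr[0]; arr[0] = arr[i]; arr[i] = tmp;  // Move current max to the end
        heapify(arr, i, 0);                   // Restore the heap on the reduced array
    }
}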


Short Questions (2 Marks)

1. Define a binary tree and explain its basic properties.


Answer:
A binary tree is a hierarchical data structure where each node has at most two children,
typically referred to as the left and right children. Basic properties include:
○ Each node has a maximum degree of 2.
○ The tree has a unique root node.
○ The nodes form a parent-child relationship, with each node having only one
parent (except for the root, which has none).
2. What is a Binary Search Tree (BST)?
Answer:
A Binary Search Tree (BST) is a binary tree where each node follows the ordering rule:
the left subtree contains values less than the node’s value, and the right subtree
contains values greater than the node’s value. This property enables efficient searching,
insertion, and deletion operations.
3. List two key differences between AVL trees and Binary Search Trees (BST).
Answer:
○ Balancing: AVL trees are balanced binary search trees where the height
difference between left and right subtrees of every node is at most 1. BSTs do not
ensure balance.
○ Rotations: AVL trees may require rotations to maintain balance after insertions
and deletions, while BSTs do not require rotations.
4. What is a min-heap, and how is it structured?
Answer:
A min-heap is a complete binary tree where each parent node’s value is less than or
equal to the values of its children. The root node contains the minimum value, and this
heap property is maintained after each insertion or deletion.

Long Questions (5 Marks)

1. Explain the process of inorder, preorder, and postorder traversals in a binary tree
with an example.
Answer:
○ Inorder Traversal (Left, Root, Right): Visit the left subtree, then the root, then
the right subtree. Example: For a tree with root A and left child B, right child C,
the inorder traversal is B → A → C.
○ Preorder Traversal (Root, Left, Right): Visit the root, then left subtree, then
right subtree. Using the same tree, the preorder traversal is A → B → C.
○ Postorder Traversal (Left, Right, Root): Visit the left subtree, right subtree,
then the root. For the example tree, the postorder traversal is B → C → A.
2. Describe the insertion operation in a Binary Search Tree (BST).
Answer:
To insert a node in a BST:
○ Start at the root.
○ If the new value is less than the current node, move to the left child; if greater,
move to the right child.
○ Continue this process recursively until an empty spot (null) is found.
○ Insert the new node in this spot as either the left or right child based on its value.
This ordering ensures that the BST properties are maintained.
3. Explain the concept of a threaded binary tree and its advantages.
Answer:
In a threaded binary tree, null pointers in the nodes are replaced by pointers to the
inorder predecessor or successor, allowing for efficient inorder traversal without
recursion or stacks. The advantages include:
○ Improved Traversal: Enables faster traversal by providing direct links to the next
node in sequence.
○ Reduced Memory Use: Reduces the need for auxiliary data structures like
stacks for traversal.

Very Long Questions (10 Marks)

1. Explain AVL trees, including their balancing process and types of rotations used
to maintain balance.
Answer:
An AVL tree is a balanced binary search tree where the height difference between the
left and right subtrees of each node is at most 1. This balance is achieved by performing
rotations after insertions and deletions, ensuring efficient operations.
Types of Rotations:
○ Left Rotation: Applied when a node’s right subtree is heavier. The right child
becomes the new root, and the old root becomes the left child of the new root.
○ Right Rotation: Applied when a node’s left subtree is heavier. The left child
becomes the new root, and the old root becomes the right child of the new root.
○ Left-Right Rotation: Applied when the left subtree’s right child causes
imbalance. A left rotation is applied to the left subtree, followed by a right rotation.
○ Right-Left Rotation: Applied when the right subtree’s left child causes
imbalance. A right rotation is applied to the right subtree, followed by a left
rotation.

Advantages: AVL trees maintain balance, ensuring a time complexity of O(log n) for
search, insert, and delete operations, making them suitable for applications where fast
data retrieval is critical.
2. Describe heap data structures, focusing on max-heaps, min-heaps, and their
applications.
Answer:
A heap is a complete binary tree where each node follows a specific ordering property:
○ Max-Heap: The value at each node is greater than or equal to the values of its
children. The root node has the maximum value.
○ Min-Heap: The value at each node is less than or equal to the values of its
children. The root node has the minimum value.

Applications of Heaps:

○ Priority Queues: Heaps efficiently manage priority queues, allowing for quick
access to the highest (max-heap) or lowest (min-heap) priority element.
○ Heap Sort: A sorting algorithm that uses the heap structure to order data in
O(n log n) time.
○ Graph Algorithms: Heaps optimize graph traversal algorithms like Dijkstra’s
shortest path by providing efficient access to the minimum cost path.
3. Explain the operations of insertion, searching, and deletion in Binary Search Trees
(BSTs) with examples.
Answer:
Insertion:
○ Start from the root; compare the value to be inserted with the current node.
○ Move left if the value is smaller; right if larger.
○ Repeat until a null position is found, then insert the node.

Example: Insert 15 into a BST with root 10, right child 20, and 30 to the right of 20: since
15 > 10, move right to 20; since 15 < 20, insert 15 as the left child of 20.
Searching:

○ Start from the root and compare the target with each node.
○ Move left if smaller, right if larger.
○ Continue until the target is found or a leaf node is reached.

Example: To search for 15 in the above tree, start at root 10 → move right to 20 → left
to 15.
Deletion:

○ Locate the node to delete. Three cases arise:


1. Leaf Node: Simply remove it.
2. One Child: Replace the node with its child.
3. Two Children: Replace the node with its inorder successor (smallest
node in the right subtree) and then delete the successor node.

Example: Deleting 15 from the tree involves replacing it with the right child if any, or just
removing it if it’s a leaf.
3.3 : Hashing and File Organization
Hashing
Hashing is a technique used in computer science to store and retrieve data quickly. Hashing
maps data (keys) to fixed-size values (hash codes or hash values) using a hash function, which
allows for fast access to data by leveraging these hash values as indices in a hash table.

Basic Terminology

1. Hash Function: A function that converts an input (or "key") into a fixed-size string of
bytes. The output, called the hash value or hash code, is typically a number used as an
index in a hash table.
2. Hash Table: A data structure that stores key-value pairs based on the hash value of the
key. This table uses the hash function to quickly locate entries.
3. Bucket: A slot in the hash table where one or more values may be stored, depending on
the collision handling method.
4. Collision: Occurs when two or more keys produce the same hash value and are
mapped to the same index or bucket in the hash table.
5. Load Factor: The ratio of the number of entries in the hash table to the number of
buckets. A high load factor can increase collisions, leading to slower access times.

Hash Functions

A good hash function should:

● Be deterministic: Given a specific input, it should always return the same hash code.
● Distribute values uniformly: Hash codes should be evenly distributed to avoid clusters
of data.
● Be efficient to compute: The function should work quickly even for large datasets.

Common Hash Functions:

1. Division Method:
○ Formula: h(k)=k mod m
○ Here, k is the key, and m is the size of the hash table.
○ Example: If m=10 and k=1234, then h(1234)=1234 mod 10 = 4.
2. Multiplication Method:
○ Formula: h(k)=⌊m⋅(k⋅A mod 1)⌋
○ A is a constant in the range (0, 1), commonly (√5 − 1)/2 ≈ 0.618.
3. Folding Method: Divides the key into parts, adds them, and then takes the modulo.
○ Example: For a key 123456, split into 12, 34, and 56, sum to get 102, and take
mod m to get the hash value.
Collision Resolution Techniques

Since collisions can occur when two keys hash to the same index, we use collision resolution
methods:

1. Separate Chaining:
○ Each bucket is associated with a linked list of entries that hash to the same
index.
○ Pros: Simple and efficient when the load factor is controlled.
○ Cons: Can become inefficient if chains grow too long.
2. Open Addressing:
○ Stores all elements within the hash table itself, probing for an empty spot if a
collision occurs.
3. Types of Open Addressing:
○ Linear Probing: On a collision, check the next slot sequentially until an empty one is
found.
○ Quadratic Probing: Probe slots at quadratically increasing offsets (1, 4, 9, ...) from the
original index.
○ Double Hashing: Use a second hash function to determine the step size between probes.

4. Rehashing:
○ When the load factor reaches a threshold, a new larger table is created, and all
entries are rehashed into it. This reduces collisions but requires time for resizing.

Applications of Hashing

1. Databases: Hashing allows for fast lookups of records based on key values, enhancing
database efficiency.
2. Caches: Used to map resource requests to cached data, allowing for quicker retrieval.
3. Password Storage: Hashing is commonly used to securely store passwords.
4. Compiler Symbol Tables: Hashing helps manage identifiers in programming languages
efficiently.

Examples of Hashing Concepts


Example of a Division Method Hash Function

For example, with m = 10, keys such as 10, 20, and 30 all give h(k) = k mod 10 = 0, so each key
hashes to index 0. Using separate chaining, we store all these values in a linked list at index 0.

Performance and Load Factor in Hashing

● The Load Factor (α) is calculated as α = n / m, where n is the number of entries stored and
m is the number of buckets in the table.
● A high load factor increases the probability of collisions, degrading performance.
● For Open Addressing, performance typically deteriorates sharply as α approaches 1.
● For Separate Chaining, performance depends on the average chain length, which is
directly proportional to α.

Advantages and Disadvantages of Hashing


Advantages

● Fast Access: Provides O(1) average-case time complexity for search, insert, and delete
operations.
● Efficient Memory Use: Only the required entries are stored, making it memory-efficient.

Disadvantages

● Collisions: Hashing cannot avoid collisions completely, so collision resolution methods


are necessary.
● Worst-case Time Complexity: In cases of poor hash function design or high load factor,
search operations can degrade to O(n).

Hashing Algorithms
Hash Table Operations

1. Insertion:
○ Calculate the hash value using the hash function.
○ Check if the computed index is occupied.
○ If there is no collision, place the element.
○ If a collision occurs, use a collision resolution method (e.g., chaining or probing).
2. Searching:
○ Compute the hash code using the hash function.
○ Check the corresponding index in the table.
○ If the value is present, return it.
○ If a collision resolution technique is used (e.g., probing), follow the technique’s
rules to locate the key.
3. Deletion:
○ Compute the hash code of the key.
○ Locate the key in the hash table using the collision resolution technique, if any.
○ Remove the entry if found.
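
The three operations above can be seen together in a minimal C sketch of open addressing with linear probing. The table size, the EMPTY/DELETED sentinel values, and the integer-only keys are illustrative assumptions rather than anything prescribed by these notes; the DELETED marker is what keeps later searches from stopping too early after a deletion.

```c
/* Open-addressing (linear probing) hash table sketch.
   Keys are non-negative ints; M, EMPTY, and DELETED are assumptions. */
#include <stdio.h>

#define M 11            /* table size (a small prime)  */
#define EMPTY   -1      /* slot never used             */
#define DELETED -2      /* slot freed by a deletion    */

static int table[M];

static int h(int k) { return k % M; }              /* division method */

void init(void) { for (int i = 0; i < M; i++) table[i] = EMPTY; }

/* Insertion: probe from h(k) until an empty (or deleted) slot is found. */
int insert(int k) {
    for (int i = 0; i < M; i++) {
        int idx = (h(k) + i) % M;                  /* linear probe */
        if (table[idx] == EMPTY || table[idx] == DELETED) {
            table[idx] = k;
            return idx;
        }
    }
    return -1;                                     /* table full: rehash needed */
}

/* Searching: follow the same probe sequence until the key or EMPTY is met. */
int search(int k) {
    for (int i = 0; i < M; i++) {
        int idx = (h(k) + i) % M;
        if (table[idx] == EMPTY) return -1;        /* key cannot be further on */
        if (table[idx] == k) return idx;
    }
    return -1;
}

/* Deletion: mark the slot DELETED so later probe sequences are not broken. */
int delete_key(int k) {
    int idx = search(k);
    if (idx >= 0) table[idx] = DELETED;
    return idx;
}

int main(void) {
    init();
    insert(12); insert(23); insert(34);            /* 12 and 23 collide at index 1 */
    printf("23 found at index %d\n", search(23));
    delete_key(12);
    printf("23 still found at index %d\n", search(23));
    return 0;
}
```
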
File Organization and Record Management

In computer science, file organization refers to the way data is stored in a file system. Efficient
file organization enables fast access to records and optimal use of memory. Understanding how
records are organized and retrieved from files is crucial for managing large datasets, ensuring
efficient file systems, and improving performance.

Basic Concepts of Files

1. File: A collection of data or information that is stored on a storage device (e.g., hard disk,
SSD). Files can store various types of data, such as text, numbers, images, or even
executable code.
2. Record: A collection of related data items, often referred to as fields. For example, in a
student database, a record may contain fields like student ID, name, date of
birth, address, etc.
3. File Structure: The way data is arranged within a file. It includes how records are
stored, accessed, and organized.
4. Block: A unit of storage in a file. Typically, a file is divided into blocks, which are the
smallest unit of data that can be transferred from disk to memory.
5. File Access: How a program accesses records in a file. It can be sequential (records are
read in order), direct (records can be accessed randomly), or indexed (using an index to
quickly locate records).

Organization of Records into Blocks

When data is stored in files, it's typically organized into blocks. A block is a contiguous set of
records, and each block is stored sequentially in the storage device. The organization into
blocks helps in efficient storage and retrieval because:

● Files may contain many records, and managing them in smaller, fixed-size blocks helps
with better memory management and faster access.
● Blocks can be read in one operation from the disk, which reduces I/O operations,
improving system performance.

In file systems, records within a block are typically stored in sequential order, but different file
organizations can be used to manage how these blocks of records are accessed.
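
As a worked illustration (the block and record sizes here are assumed, not taken from these notes): if a disk block holds 4096 bytes and each fixed-length record occupies 128 bytes, then one block stores ⌊4096 / 128⌋ = 32 records (the blocking factor), and a file of 10,000 such records needs ⌈10000 / 32⌉ = 313 blocks. A single block read therefore brings 32 records into memory in one I/O operation.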

File Organization Techniques


There are several ways to organize records in a file, each with advantages and limitations. The
most common file organization techniques are:

1. Sequential File Organization

● Definition: In a sequential file, records are stored in order, one after the other. A
sequential file can only be read in a sequence, typically from the beginning to the end.
● Use Cases: This organization is useful for applications where the majority of access is
through the entire file in a linear fashion (e.g., generating reports, sequential
processing).
● Insertion/Deletion: Inserting or deleting records in sequential files can be slow because
new records must be inserted in the correct order, and deletions may require shifting
records.
● Advantages:
○ Simple to implement.
○ Efficient for reading entire files or sequential access.
● Disadvantages:
○ Searching for specific records can be slow unless the file is indexed or sorted.
○ Insertions and deletions are inefficient, especially as the file size grows.

Example: A log file where each entry is recorded in order of occurrence.
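
A minimal C sketch of sequential access is shown below: fixed-length records are written one after another and then read back from the beginning in the same order. The record layout and the file name log.dat are assumptions made for the example.

```c
/* Sequential file organization sketch: fixed-length records written
   and read back in order. Record layout and file name are assumptions. */
#include <stdio.h>

typedef struct {
    int  id;
    char name[20];
} Record;

int main(void) {
    Record out[2] = { {1, "Asha"}, {2, "Ravi"} };
    FILE *fp = fopen("log.dat", "wb");
    fwrite(out, sizeof(Record), 2, fp);              /* append in arrival order */
    fclose(fp);

    Record r;
    fp = fopen("log.dat", "rb");
    while (fread(&r, sizeof(Record), 1, fp) == 1)    /* read front to back */
        printf("%d %s\n", r.id, r.name);
    fclose(fp);
    return 0;
}
```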

2. Relative File Organization

● Definition: A relative file is one in which records are stored in a manner that allows
access via a relative address. The records are stored in blocks, and each block has an
address, which allows direct access to records.
● Access: The record location is calculated based on a relative position or address within
the file, using the key field of the record (e.g., record ID).
● Use Cases: This is suitable for applications where records need to be accessed directly
by a key value, like databases or key-value stores.
● Advantages:
○ Direct access to records, making retrieval fast.
○ Efficient for applications where records are accessed randomly.
● Disadvantages:
○ Deletion and insertion can be complex as the position of records might change.
○ Handling overflow (when a block is full) requires additional techniques like
chaining or linking.

Example: A phonebook where each entry can be accessed directly via a unique phone number.
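
The direct-access idea can be sketched in C as follows: the byte offset of record k is computed as (k − 1) × record size, and fseek jumps straight to it with no scanning. The record layout and the file name phonebook.dat are assumptions for the example.

```c
/* Relative file organization sketch: fetch record k directly by
   computing its byte offset. Record layout and file name are assumptions. */
#include <stdio.h>

typedef struct {
    int  id;
    char phone[15];
} Record;

/* Read the k-th record (1-based) without scanning the file. */
int read_record(FILE *fp, long k, Record *out) {
    if (fseek(fp, (k - 1) * (long)sizeof(Record), SEEK_SET) != 0) return 0;
    return fread(out, sizeof(Record), 1, fp) == 1;
}

int main(void) {
    Record recs[3] = { {1, "111-1111"}, {2, "222-2222"}, {3, "333-3333"} };
    FILE *fp = fopen("phonebook.dat", "wb");
    fwrite(recs, sizeof(Record), 3, fp);
    fclose(fp);

    Record r;
    fp = fopen("phonebook.dat", "rb");
    if (read_record(fp, 2, &r))                 /* jump straight to record 2 */
        printf("%d %s\n", r.id, r.phone);       /* prints: 2 222-2222        */
    fclose(fp);
    return 0;
}
```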

3. Indexed Sequential File Organization

● Definition: Indexed sequential files use an index to store pointers to records. The index
allows for both sequential and direct access to records.
● Working: The data file is organized sequentially, and an index file is created that stores
the addresses of the records. The index provides a fast way to locate a record using its
key field.
● Use Cases: Suitable for applications where both sequential and random access are
required, like a library catalog or inventory management system.
● Advantages:
○ Provides fast access to records via the index.
○ Allows both sequential and random access.
● Disadvantages:
○ The index adds extra storage and complexity.
○ Insertions and deletions may require updates to the index, which can be costly.

Example: A student database where records are stored in order of student ID, and an index is
created to locate records by name or age.
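
As a rough sketch of the idea (the index layout and values below are assumptions, not part of these notes), the code keeps one index entry per block of the sorted data file, binary-searches the index for the block that could contain the key, and reports the offset from which the data file would then be scanned sequentially:

```c
/* Indexed sequential sketch: an in-memory index of (key, block offset)
   pairs is searched, then the data file would be read from that offset. */
#include <stdio.h>

typedef struct { int key; long offset; } IndexEntry;

/* Binary search the sorted index for the largest key <= target. */
long find_offset(const IndexEntry *idx, int n, int target) {
    int lo = 0, hi = n - 1;
    long best = -1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (idx[mid].key <= target) { best = idx[mid].offset; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best;
}

int main(void) {
    /* One index entry per block of the (sorted) data file. */
    IndexEntry idx[] = { {100, 0L}, {200, 4096L}, {300, 8192L} };
    long off = find_offset(idx, 3, 245);   /* key 245 lives in the block at 4096 */
    printf("scan data file from offset %ld\n", off);
    return 0;
}
```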

4. Inverted File Organization

● Definition: An inverted file is a specialized file organization where each keyword (or key)
points to a list of records containing that keyword. This structure is often used in
information retrieval systems, like search engines.
● Working: Instead of storing a record with its key directly, an inverted index stores a
mapping from a keyword to the set of records containing it. It is particularly useful when
queries are based on keywords or attributes.
● Use Cases: Most commonly used in search engines, full-text databases, and data
warehousing systems.
● Advantages:
○ Efficient for searching based on attributes or keywords.
○ Great for applications that require frequent keyword-based queries.
● Disadvantages:
○ The inverted index takes up more space because it stores a list of record pointers
for each keyword.
○ Maintaining the inverted index can be complex during updates (insertions,
deletions).

Example: A search engine index where each word in a document corpus points to a list of
documents containing that word.
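
A toy C sketch of an inverted index is shown below; the vocabulary, document IDs, and fixed-size posting lists are hard-coded assumptions made purely for illustration:

```c
/* Inverted file sketch: each keyword maps to a posting list of the
   record/document IDs that contain it. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *keyword;
    int docs[4];            /* posting list (-1 marks the end) */
} Posting;

static Posting index_[] = {
    { "tree",  { 1, 3, 7, -1 } },
    { "graph", { 2, 3, -1, -1 } },
    { "hash",  { 5, -1, -1, -1 } },
};

/* Print every document ID in the posting list for the given word. */
void query(const char *word) {
    for (size_t i = 0; i < sizeof index_ / sizeof index_[0]; i++) {
        if (strcmp(index_[i].keyword, word) == 0) {
            for (int j = 0; j < 4 && index_[i].docs[j] != -1; j++)
                printf("doc %d\n", index_[i].docs[j]);
            return;
        }
    }
    printf("no documents contain \"%s\"\n", word);
}

int main(void) {
    query("graph");         /* prints doc 2 and doc 3 */
    query("queue");
    return 0;
}
```
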
Exam Questions for Hashing and File Organization

4 Short Questions (2 Marks Each)

1. What is the purpose of a hash function in hashing, and what characteristics make
a hash function effective?
○ Answer: A hash function converts input data into a fixed-size integer or index,
allowing for efficient data storage and retrieval. Effective hash functions distribute
keys uniformly to avoid clustering and minimize collisions.
2. Explain the term ‘collision’ in hashing and provide one method to resolve
collisions.
○ Answer: A collision occurs when two keys produce the same hash value and
map to the same index in a hash table. One method to resolve collisions is
separate chaining, where each index in the table points to a linked list of entries.
3. Define sequential file organization and give an example of its application.
○ Answer: Sequential file organization stores records in a fixed order. This method
is efficient for processing files in a linear manner and is commonly used in payroll
systems where data needs to be processed in sequence.
4. What is an inverted file, and where is it commonly used?
○ Answer: An inverted file is an indexing technique where each attribute value
points to a list of records containing that attribute. It is commonly used in search
engines to quickly retrieve records based on keywords.

3 Long Questions (5 Marks Each)

1. Describe open addressing in hashing, and compare linear probing and double
hashing as methods of resolving collisions.
○ Answer: Open addressing is a collision-resolution method where, rather than
using external chaining, collisions are resolved within the hash table itself. In
linear probing, the next available slot is found sequentially, which may cause
primary clustering. In contrast, double hashing uses a secondary hash function
to calculate probe intervals, which reduces clustering by distributing collisions
more evenly across the table.
2. Explain index sequential file organization and its advantages.
○ Answer: Index sequential file organization stores records in a sorted order with
an index allowing for both sequential and direct access. This approach allows
efficient retrieval for both sequential processing and quick access via the index.
Advantages include faster access times due to the index and efficient processing
for large, ordered datasets, though it requires extra storage for the index and may
be slower for frequent updates.
3. How does separate chaining differ from open addressing in handling hash
collisions? Provide examples.
○ Answer: Separate chaining resolves collisions by creating a linked list of entries
at each hash table index, allowing multiple entries to occupy the same index.
Open addressing, on the other hand, keeps all data within the table and
resolves collisions by probing alternative indices. For example, if the hash
function maps two keys to index 5, separate chaining would store both entries in
a linked list at index 5, whereas open addressing would place the second entry in
the next open slot, such as index 6 or 7.

3 Very Long Questions (10 Marks Each)

1. Compare and contrast various collision resolution techniques in hashing,


including separate chaining, linear probing, and double hashing.
○ Answer:
■ Separate Chaining: Each hash table index points to a linked list of
entries. This approach is simple and keeps entries organized, but linked
lists require extra memory and can be slower due to pointer operations.
■ Linear Probing: This is an open-addressing method where the next
available slot is searched sequentially. While it avoids extra memory
usage for linked lists, it can lead to clustering, slowing down search and
insertion times.
■ Double Hashing: This method calculates probe intervals using a
secondary hash function, reducing clustering effects. It requires careful
choice of secondary hash functions but provides better distribution,
making it effective in minimizing clustering.
■ Each method has trade-offs: separate chaining is easier to implement
with fewer restrictions on table size, linear probing is simpler but can
cluster, and double hashing offers improved distribution but is complex to
implement.
2. Describe the different file organization techniques (sequential, index sequential,
relative, and inverted file organization) and their typical use cases.
○ Answer:
■ Sequential File Organization: Stores records in a fixed order, best for
applications needing linear data processing, such as payroll or report
generation. It’s simple but inefficient for random access.
■ Index Sequential: Adds an index to sequential files, enabling both
sequential and direct access. This is suited for applications like student or
employee records where both sequential processing and direct lookup are
required.
■ Relative File Organization: Uses calculated addresses (relative
positioning), enabling fast direct access, useful in applications like
inventory management systems where random access is crucial.
■ Inverted File Organization: Common in databases and search engines,
this organization uses an attribute-based index for quick retrieval based
on keywords or specific fields, making it ideal for search-heavy
applications.
■ Each technique is chosen based on access needs, with sequential
favoring linear access, index sequential for mixed access, relative for fast
direct access, and inverted for attribute-based searching.
3. Explain rehashing in hashing, its purpose, and its impact on performance. When
should rehashing be applied, and how does it affect a system with a high load
factor?
○ Answer:
■ Rehashing: This process involves creating a new, larger hash table and
re-inserting all existing elements based on a new hash function or
updated table size. Rehashing reduces the load factor, distributing entries
more evenly to maintain O(1) access times.
■ Purpose: It is used to prevent performance degradation caused by high
load factors, where frequent collisions slow down operations.
■ Application: Rehashing should be applied when the load factor exceeds
a certain threshold (often 0.7 or 0.75). This ensures the table doesn’t
become overcrowded, as too many elements in close proximity increase
search and insertion times.
■ Impact: Rehashing can temporarily slow down the system due to the
need to copy and reinsert elements, but it’s essential for long-term
performance. It effectively reduces the likelihood of clustering and
collision frequency, allowing the hash table to maintain its performance
efficiency.
Unit 3 Summary: Graphs, Trees, and Hashing/File Organization

1. Graphs

Graphs consist of nodes (vertices) connected by edges, and are used to model relationships
between entities in complex networks, such as social media, transportation, and communication
systems.

● Graph Terminology: Includes terms like vertices, edges, paths, degree, and connected
components.
● Representation:
○ Adjacency Matrix: A 2D array where each cell indicates if an edge exists
between vertices.
○ Path Matrix: Stores information about reachability between vertices.
● Graph Traversal: Methods include:
○ Depth-First Search (DFS): Explores as far as possible along a branch before
backtracking.
○ Breadth-First Search (BFS): Explores neighbors layer by layer.
● Applications: Include shortest path finding, connectivity checking, and network analysis.

2. Trees

Trees are hierarchical structures with a root node and branches, used to represent hierarchical
relationships like file directories, organization charts, and decision processes.

● Basic Terminology: Includes root, parent, child, leaf, subtree, depth, and height.
● Binary Trees: A type of tree where each node has at most two children (left and right).
○ Representation in Memory: Nodes are linked, with pointers to child nodes or
stored in arrays.
○ Traversal Methods:
■ Inorder, Preorder, and Postorder: Different ways of visiting nodes based
on the order of left/right children and the root.
■ Traversal Using Stacks: Used for non-recursive traversals.
○ Binary Search Trees (BSTs): A binary tree structure that maintains elements in
sorted order, allowing efficient searching, insertion, and deletion.
○ AVL Trees: Self-balancing binary search trees that keep height balanced to
ensure efficient operations.
○ B-Trees: Multi-way search trees optimized for disk storage and retrieval,
commonly used in databases.
○ Heaps: A complete binary tree primarily used in priority queues and the Heap Sort algorithm.
3. Hashing and File Organization

Hashing is a method to organize and retrieve data based on a computed key, while file
organization methods define how data is physically stored in files.

● Hashing Concepts: Uses a hash function to map keys to table indices, with techniques
to handle collisions (e.g., separate chaining and open addressing).
● Rehashing: Involves expanding the table and redistributing entries to maintain
efficiency.
● File Organization:
○ Sequential Organization: Stores records in a fixed, sorted order.
○ Relative Organization: Uses calculated addresses for random access.
○ Index Sequential Organization: Combines sequential organization with an
index for both sequential and direct access.
○ Inverted Files: An indexing method allowing quick access to records based on
specific fields, used in search-heavy applications.

Notes and Sample Questions compiled by: Subhayu Mukherjee


Sample Question Paper (for EST) - Unit 3

Section A (2x3 = 6 Marks)

1. Define an adjacency matrix and explain its use in representing a graph.


2. What is a binary tree? Briefly describe its structure.
3. Explain the concept of a hash function in hashing.

Section B (5x2 = 10 Marks)

4. Explain the different traversal methods in a binary tree and the order in which nodes are
visited in each method.
5. Describe the difference between open addressing and separate chaining in collision
resolution for hashing.

Section C (10x2 = 20 Marks)

6. Describe Depth-First Search (DFS) and Breadth-First Search (BFS) traversal algorithms
in graphs. Discuss their applications with examples.
7. Explain the organization of files in an Index Sequential File Organization. Discuss how it
improves data access and efficiency.
