DataStructures Unit123 FullNotes
Unit 1 NOTES
● Data: Raw, unprocessed facts and figures without any context. Data can be numbers,
characters, symbols, or even sounds. Data does not convey any meaning on its own.
For example, "23", "Blue", "X" are pieces of data.
● Information: Data that has been processed, organized, or structured in a way that
provides meaning or context. Information is derived from data and is useful for
decision-making. For example, "John's age is 23", or "The sky is blue" are pieces of
information because they give context to the raw data.
Key Differences:
○ Data is the raw input that is processed to derive information.
○ Data is unorganized and meaningless on its own, while information is
processed and meaningful.
Linear Data Structures
● Definition: A data structure in which data elements are arranged sequentially (in a linear
order), where each element is connected to its previous and next element.
● Examples:
1. Array: A collection of elements identified by index or key, all of which are stored
in contiguous memory locations. Arrays have a fixed size and are used for storing
multiple items of the same type.
2. Linked List: A collection of nodes where each node contains data and a
reference (link) to the next node in the sequence. Linked lists allow for dynamic
memory allocation.
3. Stack: A linear data structure that follows the Last In, First Out (LIFO) principle.
Elements are added and removed only from one end, called the "top" of the
stack.
4. Queue: A linear data structure that follows the First In, First Out (FIFO) principle.
Elements are added from the "rear" and removed from the "front."
Non-Linear Data Structures
● Definition: A data structure in which data elements are not arranged sequentially; instead,
they are connected in a hierarchical or graph-like manner.
● Examples:
1. Tree: A hierarchical data structure consisting of nodes, with a root node and
sub-nodes (children) forming a parent-child relationship. Binary Trees, Binary
Search Trees, AVL Trees, and B-Trees are examples.
2. Graph: A collection of nodes (vertices) and edges that connect pairs of nodes.
Graphs can be directed or undirected and are used to represent networks.
5. Algorithm Complexity
7. Asymptotic Notations
● Definition: Asymptotic notations are mathematical tools used to describe the limiting
behavior of an algorithm's complexity as the input size grows. They provide a way to
describe the performance or complexity of an algorithm in a general sense.
● Types of Asymptotic Notations:
1. Big O Notation (O): Describes the upper bound of an algorithm's running time. It
represents the worst-case scenario, giving the maximum time required by an
algorithm.
■ Example: O(n) indicates that the running time grows linearly with the
input size.
2. Omega Notation (Ω): Describes the lower bound of an algorithm's running time.
It represents the best-case scenario, showing the minimum time required by an
algorithm.
■ Example: Ω(n) indicates that the running time grows at least linearly with
the input size.
3. Theta Notation (Θ): Describes the tight bound of an algorithm's running time. It
represents both the upper and lower bounds, showing the exact growth rate of
the algorithm.
■ Example: Θ(n) indicates that the running time grows linearly with the
input size.
4. Little o Notation (o): Represents an upper bound that is not asymptotically tight. It
describes a function that grows strictly slower than the bounding function.
■ Example: o(n^2) means the running time grows strictly slower than n^2: the
ratio of the running time to n^2 tends to zero as the input size grows.
5. Little omega Notation (ω): Represents a lower bound that is not asymptotically tight. It
describes a function that grows strictly faster than the bounding function.
■ Example: ω(n) indicates that the running time grows strictly faster than n.
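For reference, the standard formal definitions behind these notations (f(n) is the algorithm's cost and g(n) is the comparison function):
● f(n) = O(g(n)) if there exist constants c > 0 and n0 such that f(n) ≤ c·g(n) for all n ≥ n0.
● f(n) = Ω(g(n)) if there exist constants c > 0 and n0 such that f(n) ≥ c·g(n) for all n ≥ n0.
● f(n) = Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)).
● f(n) = o(g(n)) if f(n)/g(n) → 0 as n → ∞; f(n) = ω(g(n)) if f(n)/g(n) → ∞ as n → ∞.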
Question Bank
Answer: Data refers to raw, unprocessed facts such as numbers, characters, or symbols,
whereas information is processed data that has context and meaning.
Answer: A data structure is a specialized format for organizing, processing, retrieving, and
storing data. It is important because it enables efficient access and modification of data,
reduces time and memory requirements, and provides the foundation for designing efficient
algorithms.
Answer:
4. Describe the main operations that can be performed on data structures.
Answer:
Answer: Algorithm complexity measures the resources an algorithm requires, such as time and
space, relative to the size of the input data. It is crucial because it predicts how an algorithm
scales with larger inputs and allows different algorithms to be compared so the most efficient
one can be chosen.
Answer:
● Time-space trade-off involves balancing the memory usage of an algorithm against its
running time.
● Example: Using a hash table for faster searches increases memory usage, while a
simple array requires less memory but takes longer to search (linear search O(n)).
Answer: Big O notation describes an algorithm's upper bound in terms of time or space
complexity, representing the worst-case scenario.
8. Provide an example of a linear data structure and its typical use case.
Answer:
Queue: A queue is a typical linear data structure; because it is FIFO, it is used wherever
requests must be handled in arrival order, such as print-job spooling, task scheduling, or
buffering.
Answer:
● Stack: A linear data structure following Last In, First Out (LIFO) principle.
Operations:
○ Push: Add an element to the top.
○ Pop: Remove the top element.
○ Peek: Retrieve the top element without removing it.
Answer:
Answer:
Asymptotic notations describe the behavior of an algorithm as the input size approaches infinity.
12. What are the advantages of using a linked list over an array?
Answer:
13. Describe the structure of a binary tree and its primary use cases.
Answer:
● Binary Tree: A hierarchical structure where each node has at most two children (left and
right).
Use Cases:
○ Search Operations: Implementing binary search trees.
○ Hierarchical Data: Representing organizational structures or file systems.
14. What is the difference between time complexity and space complexity?
Answer:
Answer:
Graph:
Answer:
● O(n^2): Indicates quadratic time complexity, where the time required grows in proportion
to the square of the input size.
● Implication: Algorithms with O(n^2) complexity, such as bubble sort in the worst case,
become inefficient for large datasets as performance degrades rapidly.
Answer:
Answer:
19. Differentiate between time complexity classes O(n log n) and O(n^2).
Answer:
● O(n log n): running time grows proportionally to n × log n. Examples: Merge Sort, Quick Sort (average case).
● O(n^2): running time grows proportionally to the square of n. Examples: Bubble Sort, Insertion Sort.
Answer:
● Time-Space Trade-off: The balance between the time required to execute an algorithm
and the memory required to run it.
● Effect on Design:
○ More Space, Less Time: Using more memory to speed up computation (e.g.,
hash tables).
○ Less Space, More Time: Reducing memory usage at the cost of longer
computation (e.g., searching unsorted data linearly).
Answer:
Data structures are categorized into two primary types: linear and non-linear.
The choice of data structure should be based on the specific needs of the application, balancing
memory usage and processing speed.
Answer:
1. Time Complexity:
○ Represents the amount of time an algorithm takes to complete as a function of
the input size.
○ Types:
■ Best Case: The minimum time required for an algorithm to complete. This
is the most favorable input condition.
■ Worst Case: The maximum time required, representing the most
unfavorable input.
■ Average Case: The expected time for typical input conditions, averaging
over all possible inputs.
2. Space Complexity:
○ Represents the amount of memory space required by the algorithm, including
both the input space and auxiliary space used during execution.
Example of Time Complexity Analysis:
Algorithm:
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]  # swap the out-of-order pair
                swapped = True
        if not swapped:  # no swaps in this pass: already sorted (gives the O(n) best case)
            break
● Best Case (Already sorted array):
○ Time Complexity: O(n)
○ Reason: Only one pass is needed to confirm that the array is sorted.
● Worst Case (Reversed array):
○ Time Complexity: O(n^2)
○ Reason: Each element must be compared with every other element, leading to
n*(n-1)/2 comparisons.
● Average Case (Random order):
○ Time Complexity: O(n^2)
○ Reason: The algorithm does not have prior knowledge of the order of elements,
averaging out to quadratic time complexity over all possible cases.
Understanding these complexities helps developers choose or optimize algorithms based on the
size and nature of the input data.
Answer:
Time-Space Trade-off: The time-space trade-off in algorithms is a balancing act between the
memory space used by an algorithm and the time it takes to execute. Optimizing for one often
means compromising on the other.
1. Increasing Space to Reduce Time:
○ Example: Hashing:
■ By using a hash table, searching for an element can be done in O(1) time.
However, this requires additional space to store the hash table, potentially
up to O(n) where n is the number of elements.
○ Example: Dynamic Programming:
■ Solving problems like Fibonacci sequence using memoization stores
previously computed results, reducing time complexity from O(2^n) to
O(n) but requiring O(n) space.
2. Reducing Space to Increase Time:
○ Example: Recursive Algorithms:
■ A recursive Fibonacci calculation has O(1) space complexity but O(2^n)
time complexity due to repeated calculations.
○ Example: In-place Sorting Algorithms (like Insertion Sort):
■ Operates with O(1) space by sorting in place but has O(n^2) time
complexity, making it less efficient for large datasets.
● The trade-off is crucial when designing algorithms for environments with limited memory
or where speed is a priority. For example, embedded systems prioritize space efficiency,
whereas web applications may favor speed. Developers need to evaluate the specific
constraints and requirements of their applications to find an optimal balance between
time and space.
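As a concrete illustration of this trade-off, a minimal C sketch of the two Fibonacci approaches mentioned above (function and constant names are illustrative):
/* Naive recursion: O(2^n) time because subproblems are recomputed repeatedly. */
long long fib_naive(int n) {
    if (n <= 1) return n;
    return fib_naive(n - 1) + fib_naive(n - 2);
}

/* Memoization: O(n) time, at the cost of O(n) extra space for the cache. */
#define MAXN 93                 /* fib(92) is the largest Fibonacci number fitting in 64 bits */
long long memo[MAXN];           /* zero-initialized; 0 means "not yet computed" */

long long fib_memo(int n) {
    if (n <= 1) return n;
    if (memo[n] == 0)
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2);   /* compute once, then reuse */
    return memo[n];
}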
Answer:
Asymptotic Notations: Asymptotic notations are mathematical tools used to describe the
running time or space requirement of an algorithm in terms of input size. They provide a
high-level understanding of an algorithm's efficiency by focusing on its behavior as the input size
grows.
● Scalability: Asymptotic notations help predict how algorithms perform as the input size
scales, allowing for better resource allocation and optimization.
● Algorithm Comparison: They provide a common framework for comparing the
efficiency of different algorithms regardless of machine or platform differences.
● Optimization Guidance: Highlight areas where improvements can be made, especially
for large datasets.
Understanding and applying these notations ensure algorithms are chosen or optimized based
on the context in which they will run, improving overall performance and efficiency.
Chapter 2 : Arrays
1. Basic Terminology
● Array: A collection of elements (usually of the same data type) stored in contiguous
memory locations. Each element can be accessed using its index.
● Index: The position of an element in an array, typically starting from 0.
● Pointer: A variable that stores the memory address of another variable. Pointers are
powerful in C/C++ for dynamic memory management, manipulating arrays, and
referencing data structures.
5. Searching Algorithms
● Linear Search:
○ Definition: Sequentially checks each element of the array until the desired
element is found or the end of the array is reached.
○ Complexity: O(n), where n is the number of elements in the array.
● Binary Search:
○ Definition: An efficient algorithm for finding an element in a sorted array. It
repeatedly divides the search interval in half.
○ Procedure:
1. Start with the middle element.
2. If the middle element is equal to the target value, the search is complete.
3. If the target value is less than the middle element, search the left half.
4. If the target value is greater than the middle element, search the right half.
○ Complexity: O(log n).
6. Sorting Algorithms
● Insertion Sort:
○ Definition: Builds the final sorted array one element at a time by repeatedly
picking the next element and inserting it into its correct position.
○ Complexity: O(n^2) in the worst case.
● Selection Sort:
○ Definition: Repeatedly selects the smallest (or largest) element from the
unsorted portion and moves it to the sorted portion.
○ Complexity: O(n^2) in all cases.
● Bubble Sort:
○ Definition: Repeatedly steps through the list, compares adjacent elements, and
swaps them if they are in the wrong order.
○ Complexity: O(n^2) in the worst case.
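A short C sketch of the insertion sort described in the first bullet above (illustrative):
void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];                 /* element to place into the sorted prefix */
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {  /* shift larger elements one position right */
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;                 /* insert the element at its correct position */
    }
}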
7. Quick Sort
● Definition: A divide-and-conquer algorithm that selects a 'pivot' element and partitions
the array into two sub-arrays, according to whether they are less than or greater than the
pivot.
● Procedure:
○ Choose a pivot: Typically, the first element, last element, or a random element.
○ Partitioning: Rearrange elements so that all elements less than the pivot are on
the left, and all elements greater than the pivot are on the right.
○ Recursively apply the above steps to the sub-arrays of elements with smaller
values and larger values.
● Complexity:
○ Best Case: O(n log n) when the pivot divides the array into two equal halves.
○ Worst Case: O(n^2) when the pivot is the smallest or largest element repeatedly
(like sorted or reverse sorted arrays).
○ Average Case: O(n log n).
● Space Complexity: O(log n) due to recursive stack space.
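A compact C sketch of this procedure, using the last element as the pivot (the Lomuto partition scheme, one of several common choices):
int partition(int arr[], int low, int high) {
    int pivot = arr[high];                    /* choose the last element as the pivot */
    int i = low - 1;                          /* boundary of the "less than pivot" region */
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {                 /* move smaller elements to the left side */
            i++;
            int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
        }
    }
    int tmp = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = tmp;  /* place pivot */
    return i + 1;                             /* final index of the pivot */
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int p = partition(arr, low, high);    /* pivot is now in its sorted position */
        quickSort(arr, low, p - 1);           /* sort elements left of the pivot */
        quickSort(arr, p + 1, high);          /* sort elements right of the pivot */
    }
}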
Merging Arrays:
Merge Sort:
● Definition: A stable divide-and-conquer sorting algorithm that divides the input array into
two halves, calls itself for the two halves, and then merges the two sorted halves.
● Procedure:
○ Divide the array into two halves.
○ Conquer by recursively sorting the two halves.
○ Combine by merging the sorted halves to produce the sorted array.
● Complexity:
○ Time Complexity: O(n log n) for all cases (best, average, and worst).
○ Space Complexity: O(n) due to auxiliary storage required for merging.
Multi-Dimensional Arrays
● Definition: Arrays with more than one dimension (e.g., 2D arrays or matrices, and 3D arrays).
● Representation:
○ Row-Major Order: Stores all elements of a row contiguously.
Address(A[i][j]) = Base Address + ((i × number of columns) + j) × size of each element
○ Column-Major Order: Stores all elements of a column contiguously.
Address(A[i][j]) = Base Address + ((j × number of rows) + i) × size of each element
(Both formulas assume 0-based row and column indices.)
● Pointers:
○ Definition: A pointer is a variable that stores the memory address of another
variable.
○ Uses: Dynamic memory allocation, efficient array manipulation, accessing
hardware, and implementing data structures (e.g., linked lists, trees).
● Pointer Arrays:
○ Definition: An array of pointers, where each pointer can point to an individual
element or another array.
○ Usage: Useful in dynamic memory management and when dealing with
variable-length data or strings in C/C++.
Example in C:
struct Student {
char name[50];
int age;
float GPA;
};
● Representation in Memory:
○ Contiguous Allocation: Fields are stored in contiguous memory locations.
○ Padding: There might be padding between fields for alignment purposes.
Parallel Arrays
● Definition: Multiple arrays that hold related data in corresponding elements (parallel to
each other).
Example:
○ One array to store student names, another to store their grades. Index 0 in both
arrays corresponds to the same student.
● Usage: Simplifies data management when dealing with large data sets where each
record is homogeneous in terms of data types.
Sparse Matrices
● Definition: A sparse matrix is a matrix in which most of the elements are zero. Storing
only the non-zero elements saves memory.
● Storage Methods:
○ Array of Tuples: Store each non-zero element as a tuple (row, column, value).
○ Compressed Sparse Row (CSR):
■ Three arrays:
■ Values to store non-zero elements.
■ Column Indices to store column indices of each element in the
Values array.
■ Row Pointers to store the starting index of each row in the
Values array.
○ Compressed Sparse Column (CSC): Similar to CSR but focuses on
column-wise storage.
● Complexity:
○ Space Complexity: O(nz) where nz is the number of non-zero elements, which
is much less than O(m*n) for dense matrices.
Question bank
Answer:
A linear array is a data structure consisting of a collection of elements, each identified by an
index or a key. It is stored in contiguous memory locations, allowing efficient indexing and easy
access to any element using its index. The memory address of each element can be calculated
using the formula:
Address(A[i]) = Base Address + i × (size of each element), assuming 0-based indexing.
Answer:
Traversing an array means accessing and processing each element of the array sequentially
from the first element to the last. It is important because it allows us to perform operations like
searching, sorting, and updating elements within the array.
Answer:
void insert(int arr[], int n, int element, int position) {
for (int i = n; i > position; i--) {
arr[i] = arr[i - 1];
}
arr[position] = element;
}
This code shifts elements to the right and inserts the new element at the specified position.
4. Explain the difference between linear search and binary search.
Answer:
Linear Search: scans each element sequentially; works on unsorted arrays; O(n).
Binary Search: repeatedly divides the search interval in half; requires a sorted array; O(log n).
Answer:
The time complexity of inserting an element at the end of an array is O(1), provided the array
has unused capacity, because the new element is placed directly into the next available slot
without shifting any existing elements.
Answer:
A multi-dimensional array is an array of arrays, where each element is itself an array. A common
example is a 2D array (matrix) which is used to represent a table of rows and columns.
Example in C:
int matrix[3][4]; // A 2D array with 3 rows and 4 columns
Answer:
To delete an element from an array at a specific index, shift all elements after that index one
position to the left:
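A minimal C sketch (n is the current number of elements; after the call the array is treated as holding n − 1 elements):
void deleteElement(int arr[], int n, int position) {
    for (int i = position; i < n - 1; i++) {
        arr[i] = arr[i + 1];     /* shift each later element one position to the left */
    }
}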
This code snippet shifts elements to the left to overwrite the deleted element.
Answer:
Bubble Sort is a simple sorting algorithm that repeatedly steps through the list, compares
adjacent elements, and swaps them if they are in the wrong order. This process is repeated until
the list is sorted. The algorithm has a time complexity of O(n^2) in the worst case.
9. What is a pointer array, and how does it differ from a regular array?
Answer:
A pointer array is an array that stores the addresses of other variables, rather than storing
actual data values. Unlike a regular array, which contains data, a pointer array can reference
different memory locations, allowing dynamic access to elements or arrays.
10. Write a code snippet for performing a binary search on a sorted array.
Answer:
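A standard iterative version in C (a sketch assuming arr is sorted in ascending order; it returns the index of target, or -1 if the value is absent):
int binarySearch(int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;       /* avoids overflow of (low + high) */
        if (arr[mid] == target) return mid;     /* found */
        else if (arr[mid] < target) low = mid + 1;   /* search the right half */
        else high = mid - 1;                         /* search the left half */
    }
    return -1;                                  /* not present */
}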
Answer:
Sorting algorithms organize the elements of an array into a specific order (ascending or
descending). This is significant because it enables efficient searching (for example, binary
search requires sorted data), makes merging and duplicate detection easier, and presents data
in a more usable form.
12. Describe how Quick Sort works with an example of its average-case
time complexity.
Answer:
Quick Sort is a divide-and-conquer algorithm that works by selecting a 'pivot' element from the
array and partitioning the other elements into two sub-arrays according to whether they are less
than or greater than the pivot. It recursively sorts the sub-arrays.
● Average-case time complexity: O(n log n), as it divides the array approximately in half
each time.
Answer:
A sparse matrix is a matrix in which most of the elements are zero. Specialized storage (like
compressed row storage or list of lists) is necessary to efficiently store only the non-zero
elements, reducing memory usage and improving computational efficiency for operations like
addition and multiplication.
14. Explain the concept of merging two sorted arrays with an example.
Answer:
Merging two sorted arrays involves combining them into a single sorted array. The process
compares elements from both arrays and places the smaller element into the resulting array,
continuing until all elements are merged.
Example:
void merge(int arr1[], int arr2[], int n1, int n2, int merged[]) {
int i = 0, j = 0, k = 0;
while (i < n1 && j < n2) {
if (arr1[i] < arr2[j]) merged[k++] = arr1[i++];
else merged[k++] = arr2[j++];
}
while (i < n1) merged[k++] = arr1[i++];
while (j < n2) merged[k++] = arr2[j++];
}
Answer:
A record is a data structure that can store elements of different data types. In arrays, records
are stored as structures, where each structure contains fields that represent the record's
elements. Memory representation involves storing each field sequentially, similar to how an
array stores its elements.
Answer:
Multi-dimensional arrays, such as 2D arrays, are stored in memory either in row-major order or
column-major order:
● Row-major order: Stores elements row by row, meaning the entire first row is stored
first, followed by the second row, and so on.
● Column-major order: Stores elements column by column, meaning the entire first
column is stored first, followed by the second column, and so on.
The choice depends on the programming language and the use case.
Answer:
Array: a named collection of elements of the same type stored in contiguous memory; its size is fixed at declaration and its name refers to that fixed block.
Pointer: a variable that stores the memory address of another variable; it can be reassigned to point to different locations and supports pointer arithmetic.
18. Describe the process of Selection Sort and its time complexity.
Answer:
Selection Sort works by dividing the array into a sorted and an unsorted section. It repeatedly
selects the smallest (or largest) element from the unsorted section and swaps it with the first
unsorted element. The process continues until the entire array is sorted. Its time complexity is
O(n^2) in all cases (best, average, and worst), since roughly n*(n-1)/2 comparisons are always
performed.
Answer:
Insertion Sort is a simple sorting algorithm that builds the final sorted array one item at a time.
It is most efficient for small datasets or when the array is already partially sorted because its
time complexity can approach O(n) in the best case (nearly sorted array).
20. Explain how parallel arrays are used and provide an example.
Answer:
Parallel arrays use multiple arrays to store related data such that corresponding elements
across arrays represent a single entity. This method maintains relationships between different
data types without using a complex data structure.
Example:
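An illustrative C sketch with hypothetical data (index i in both arrays refers to the same student):
char names[3][20] = { "Asha", "Ravi", "Meera" };   /* student names          */
int  grades[3]    = { 85, 72, 91 };                /* corresponding grades   */
/* names[1] and grades[1] together describe the same student ("Ravi", 72). */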
Answer:
Insertion: To insert an element into a linear array, shift elements to the right starting from the
end of the array to the position where the new element should be inserted.
Code Snippet:
Time Complexity: O(n), where n is the number of elements in the array. This is because, in the
worst case, all elements after the insertion point need to be shifted.
Deletion: To delete an element from a linear array, shift elements to the left starting from the
position of the element to be deleted.
Code Snippet:
Time Complexity: O(n), where n is the number of elements in the array. This is because, in the
worst case, all elements after the deletion point need to be shifted.
2. Describe and implement a function to merge two sorted arrays into a
single sorted array. Explain the algorithm and its time complexity.
Answer:
Merging Two Sorted Arrays: The merging process involves comparing elements from both
arrays and adding the smaller element to the new merged array. This continues until all
elements from both arrays are processed.
Algorithm:
Code Snippet:
void merge(int arr1[], int n1, int arr2[], int n2, int merged[]) {
int i = 0, j = 0, k = 0;
while (i < n1 && j < n2) {
if (arr1[i] < arr2[j]) merged[k++] = arr1[i++];
else merged[k++] = arr2[j++];
}
while (i < n1) merged[k++] = arr1[i++];
while (j < n2) merged[k++] = arr2[j++];
}
Time Complexity: O(n1 + n2), where n1 and n2 are the sizes of the two input arrays. Each
element from both arrays is processed exactly once.
Answer:
Binary Search: Binary search is an efficient algorithm for finding an item from a sorted array by
repeatedly dividing the search interval in half.
Algorithm:
Code Snippet:
Time Complexity: O(log n), since the search interval is halved at each step (with O(1) extra space for the iterative version).
Answer:
Multi-Dimensional Array: A 2D array is an array of arrays. It represents a matrix with rows and
columns, where each element can be accessed using two indices.
Code Snippet:
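An illustrative sketch in C:
#include <stdio.h>

int main(void) {
    int matrix[3][4];            /* 3 rows x 4 columns, stored row by row (row-major) in C */
    matrix[1][2] = 42;           /* element in row 1, column 2 */
    printf("%d\n", matrix[1][2]);
    return 0;
}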
Memory Calculation: For an element matrix[i][j] stored in row-major order, Address = Base Address + ((i × number of columns) + j) × size of each element. With 4-byte ints, matrix[1][2] above lies at Base Address + ((1 × 4) + 2) × 4 = Base Address + 24 bytes.
Answer:
Sparse Matrix: A sparse matrix is a matrix predominantly filled with zero values. Special
storage techniques are used to save space and improve computational efficiency.
Storage Techniques:
● Compressed Row Storage (CRS): Stores non-zero elements in three separate arrays:
○ Values Array: Contains non-zero values.
○ Column Index Array: Contains column indices for each non-zero value.
○ Row Pointer Array: Contains pointers to the start of each row in the values
array.
Code Snippet:
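An illustrative CSR sketch in C for a small 3×3 matrix (the values are chosen only to show the format):
/* Sparse 3x3 matrix (only non-zero entries shown):
   row 0: 10 at column 0
   row 1: 20 at column 2
   row 2: 30 at column 1                                        */
int values[]   = { 10, 20, 30 };   /* non-zero values, stored row by row            */
int colIndex[] = { 0, 2, 1 };      /* column index of each value                    */
int rowPtr[]   = { 0, 1, 2, 3 };   /* start of each row in values[]; the last entry */
                                   /* equals the total number of non-zero elements  */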
Explanation:
This format reduces the space required to store sparse matrices and speeds up matrix
operations like addition and multiplication.
Data Structures (23CSH-241)
Unit 2 Notes & Sample Questions
A linear linked list is a collection of elements, called nodes, where each node points to
the next node in the sequence. Unlike arrays, linked lists are not stored in contiguous
memory locations. Each node contains two parts: a data field that stores the value, and a
pointer (link) field that stores the address of the next node.
● Singly Linked List: Each node points to the next node and the last node points to
NULL.
● Doubly Linked List: Each node points to both the next and the previous nodes.
● Circular Linked List: The last node points back to the first node, forming a
circle.
struct Node {
int data; // Data field
struct Node* next; // Pointer to the next node
};
Traversing a linked list involves visiting each node in the list sequentially. This is
typically done using a loop, starting from the head node and moving to each subsequent
node using the pointers.
Example in C:
Searching for an element in a linked list involves traversing the list and comparing each
node's data with the target value. The search stops when the target is found or the end of
the list is reached.
Example in C:
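(A minimal sketch, assuming the struct Node defined above; it returns a pointer to the first matching node, or NULL if the value is not found.)
struct Node* search(struct Node* head, int key) {
    struct Node* current = head;                   /* start from the head */
    while (current != NULL) {
        if (current->data == key) return current;  /* target found */
        current = current->next;                   /* move to the next node */
    }
    return NULL;                                   /* reached the end: not found */
}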
Insertion and deletion operations can occur at the beginning, end, or at a specific
position in the linked list.
Insertion
1. At the Beginning:
○ Create a new node.
○ Point the new node's next pointer to the current head.
○ Update the head pointer to the new node.
Example in C:
void insertAtBeginning(struct Node** head, int newData) {
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
newNode->data = newData;
newNode->next = *head; // Link new node to head
*head = newNode; // Update head to new node
}
2. At the End:
○ Create a new node.
○ Traverse to the last node and link its next pointer to the new node.
Example in C:
void insertAtEnd(struct Node** head, int newData) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = newData;              // Set data
    newNode->next = NULL;                 // New node will be the last node
    if (*head == NULL) { *head = newNode; return; }  // Empty list: new node becomes the head
    struct Node* last = *head;            // Start at head
    while (last->next != NULL) last = last->next;    // Traverse to the current last node
    last->next = newNode;                 // Link the old last node to the new node
}
3. At a Specific Position:
○ Traverse to the desired position and adjust pointers accordingly.
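Example in C (an illustrative sketch that reuses insertAtBeginning from above; position 0 inserts at the beginning, and positions beyond the end of the list are ignored):
void insertAtPosition(struct Node** head, int newData, int position) {
    if (position == 0) { insertAtBeginning(head, newData); return; }
    struct Node* prev = *head;
    for (int i = 0; prev != NULL && i < position - 1; i++)
        prev = prev->next;                     /* stop at the node before the position */
    if (prev == NULL) return;                  /* position is beyond the end of the list */
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->data = newData;
    newNode->next = prev->next;                /* link the new node to the rest of the list */
    prev->next = newNode;                      /* link the previous node to the new node */
}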
Deletion
Example in C:
void deleteFromBeginning(struct Node** head) {
if (*head == NULL) return; // List is empty
struct Node* temp = *head; // Store head
*head = (*head)->next; // Move head to next node
free(temp); // Free memory of old head
}
Example in C:
void deleteFromEnd(struct Node** head) {
if (*head == NULL) return; // List is empty
struct Node* temp = *head;
if (temp->next == NULL) { // Only one node
free(temp);
*head = NULL; // List becomes empty
return;
}
while (temp->next->next != NULL) { // Traverse to the second-last node
temp = temp->next;
}
free(temp->next); // Free last node
temp->next = NULL; // Set second last node's next to NULL
}
A header linked list is a type of linked list that includes a header node, which does not
store data relevant to the list but serves as a starting point for traversals. This helps
simplify operations such as insertion and deletion at the beginning since the header node
always exists.
struct DNode {
int data; // Data field
struct DNode* prev; // Pointer to the previous node
struct DNode* next; // Pointer to the next node
};
Common Operations:
● Insertion: At the beginning, end, or specific position.
● Deletion: From the beginning, end, or specific position.
● Traversal: Both forward and backward.
Example in C (insertion at the beginning of a doubly linked list):
void insertAtBeginningDLL(struct DNode** head, int newData) {
    struct DNode* newNode = (struct DNode*)malloc(sizeof(struct DNode));
    newNode->data = newData;
    newNode->next = *head;                  // New node points to the old head
    if (*head != NULL) {
        (*head)->prev = newNode;            // Link previous head back to new node
    }
    newNode->prev = NULL;                   // New node becomes the first node
    *head = newNode;                        // Update head to new node
}
Operation                       Time Complexity
Traversal / Search              O(n)
Insertion at the beginning      O(1)
Insertion at the end            O(n) (O(1) if a tail pointer is maintained)
Deletion at the beginning       O(1)
Deletion at the end             O(n)
● Dynamic Memory Allocation: Linked lists can grow and shrink as needed,
allowing for efficient memory use.
● Implementing Stacks and Queues: Linked lists are often used to implement
these data structures.
● Graph Representation: Adjacency lists in graph theory can be implemented
using linked lists.
● Undo Functionality in Applications: Linked lists can be used to keep track of
changes in applications that require undo operations.
● Polynomial Representation: Linked lists can represent polynomials where each
node holds a coefficient and exponent.
Conclusion
Linked lists are fundamental data structures with flexible memory allocation, allowing for
dynamic data management. Understanding their operations, complexity, and applications
is essential for efficient programming and algorithm design.
2-Mark Questions
1. Question: Explain the process of traversing a linked list with a suitable code
example.
○ Answer: To traverse a linked list, start from the head node and follow the
next pointers until you reach a node whose next pointer is NULL. During
traversal, you can perform operations such as printing the data of each
node.
Example Code:
void traverse(struct Node* head) {
struct Node* current = head; // Start from the head
while (current != NULL) {
printf("%d -> ", current->data); // Print data
current = current->next; // Move to the next node
}
printf("NULL\n"); // Indicate the end of the list
}
Example Code:
void deleteAtPosition(struct Node** head, int position) {
    if (*head == NULL) return;              // List is empty
    struct Node* temp = *head;
    if (position == 0) { *head = temp->next; free(temp); return; }  // Delete the head node
    for (int i = 0; temp->next != NULL && i < position - 1; i++)
        temp = temp->next;                  // Stop at the node just before the target position
    if (temp->next == NULL) return;         // Position is beyond the end of the list
    struct Node* toDelete = temp->next;
    temp->next = toDelete->next;            // Unlink the target node
    free(toDelete);                         // Free its memory
}
5. Question: Explain how a circular linked list works and its advantages over a
regular linked list.
○ Answer: In a circular linked list, the last node's next pointer points back
to the first node instead of NULL, creating a loop. This structure allows for
continuous traversal of the list without the need to reset to the head, which
is useful for applications that require repeated cycling through the list, such
as in round-robin scheduling.
Advantages:
○ Any node can serve as a starting point, and the whole list can be traversed from any node.
○ Well suited to applications that cycle through elements repeatedly, such as round-robin scheduling.
○ With a pointer to the last node, insertion at both the beginning and the end takes O(1) time.
Basic Terminology
● Stack: A linear data structure that follows the Last In First Out (LIFO) principle,
where the last element added is the first one to be removed.
● Top: The index or pointer that indicates the last element added to the stack.
● Push: The operation to add an element to the top of the stack.
● Pop: The operation to remove the top element from the stack.
● Peek/Top: The operation to retrieve the top element without removing it.
● Underflow: An error condition that occurs when attempting to pop from an empty
stack.
● Overflow: An error condition that occurs when attempting to push onto a full
stack (in the case of a fixed-size stack).
1. Sequential Representation:
○ A stack can be implemented using an array.
○ An array of fixed size is created, and a variable (top) keeps track of the
index of the last element.
○ Advantages: Simple to implement and provides fast access to elements.
○ Disadvantages: Fixed size leads to overflow when the stack exceeds its
limit.
Example Code:
#define MAX 100
struct Stack {
int items[MAX];
int top;
};
2. Linked Representation:
○ A stack can also be implemented using a linked list where each node points
to the next node in the stack.
○ Advantages: Dynamic size, no overflow (except for memory limits).
○ Disadvantages: Overhead of extra memory for pointers.
Example Code:
struct Node {
int data;
struct Node* next;
};
struct Stack {
struct Node* top;
};
Operations on Stacks
● Push Operation:
○ Adds an element to the top of the stack.
○ Increases the top index or pointer.
○ Checks for overflow condition (if using an array).
● Pop Operation:
○ Removes the element from the top of the stack.
○ Decreases the top index or pointer.
○ Checks for underflow condition.
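A minimal sketch of these operations for the array representation shown earlier (top starts at -1; pop returns -1 on underflow, which matches how pop is used in the parenthesis-matching example below — an illustrative convention):
void initStack(struct Stack* s) {
    s->top = -1;                               /* empty stack */
}

int push(struct Stack* s, int value) {
    if (s->top == MAX - 1) return -1;          /* overflow: stack is full */
    s->items[++s->top] = value;
    return 0;
}

int pop(struct Stack* s) {
    if (s->top == -1) return -1;               /* underflow: stack is empty */
    return s->items[s->top--];
}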
Applications of Stacks
1. Parenthesis Matching:
○ Stacks can be used to check for balanced parentheses in expressions.
○ Traverse the expression and push opening brackets onto the stack; pop for
closing brackets.
○ If the stack is empty at the end and all brackets match, the expression is
balanced.
Example Code:
int isBalanced(char* expr) {
struct Stack s;
initStack(&s);
for (int i = 0; expr[i]; i++) {
if (expr[i] == '(') {
push(&s, expr[i]);
} else if (expr[i] == ')') {
if (pop(&s) == -1) {
return 0; // Unmatched closing parenthesis
}
}
}
return s.top == -1; // Balanced if stack is empty
}
Principles of Recursion
1. Base Case: The condition under which the recursion stops. It prevents infinite
calls.
2. Recursive Case: The part of the function that includes the recursive call.
● Recursive functions typically have a base case and one or more recursive cases.
int factorial(int n) {
if (n == 0) { // Base case
return 1;
} else { // Recursive case
return n * factorial(n - 1);
}
}
int fibonacci(int n) {
if (n == 0) { // Base case
return 0;
} else if (n == 1) { // Base case
return 1;
} else { // Recursive case
return fibonacci(n - 1) + fibonacci(n - 2);
}
}
2-Mark Questions
1. What is a stack?
○ A stack is a linear data structure that follows the Last In First Out (LIFO)
principle, where the last element added is the first one to be removed.
2. Define the PUSH operation in a stack.
○ The PUSH operation adds an element to the top of the stack and increases
the index or pointer representing the top of the stack.
3. What does the term "underflow" mean in the context of stacks?
○ Underflow refers to the condition that occurs when trying to pop an element
from an empty stack.
4. Explain the difference between sequential and linked representations of a
stack.
○ Sequential representation uses an array with a fixed size, while linked
representation uses a linked list, allowing dynamic sizing.
5. What is the time complexity of the POP operation in a stack?
○ The time complexity of the POP operation is O(1) because it involves
accessing the top element and adjusting the top index or pointer.
6. Give one application of stacks in programming.
○ Stacks are used for parenthesis matching in expressions to ensure that
brackets are balanced.
7. What is the base case in recursion?
○ The base case is a condition that stops the recursion by providing a
straightforward solution for a specific input, preventing infinite calls.
8. Describe the principle of recursion with an example.
○ Recursion involves a function calling itself with a smaller instance of the
same problem. For example, the factorial function calls itself with (n-1).
9. What is the importance of recursion in programming?
○ Recursion simplifies the solution for problems that can be divided into
smaller subproblems, making the code cleaner and easier to understand.
10. What is the time complexity of evaluating a postfix expression using a stack?
○ The time complexity for evaluating a postfix expression using a stack is
O(n), where n is the number of operands in the expression.
5-Mark Questions
○ Base Cases: The function stops calling itself when n is 0 or 1, returning the
corresponding Fibonacci number.
○ Recursive Case: For n greater than 1, the function calls itself with (n-1)
and (n-2) to compute the nth Fibonacci number by summing the two
preceding values.
Queues
A queue is a linear data structure that follows the First In First Out (FIFO) principle. This
means that the first element added to the queue will be the first one to be removed.
1. Linear Queue
struct Queue {
int items[MAX];
int front;
int rear;
};
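A minimal sketch of enqueue and dequeue for this array representation (front starts at 0 and rear at -1; -1 is returned on overflow/underflow — an illustrative convention, and MAX is assumed to be defined as in the circular-queue example below):
void initQueue(struct Queue* q) {
    q->front = 0;
    q->rear = -1;                               /* empty queue */
}

int enqueue(struct Queue* q, int value) {
    if (q->rear == MAX - 1) return -1;          /* overflow: no room at the rear */
    q->items[++q->rear] = value;
    return 0;
}

int dequeue(struct Queue* q) {
    if (q->front > q->rear) return -1;          /* underflow: queue is empty */
    return q->items[q->front++];                /* note: freed slots are not reused */
}
The same queue can also be represented with a linked list: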
struct Queue {
struct Node* front;
struct Node* rear;
};
● Operations:
○ Enqueue: Create a new node and link it to the rear of the queue.
○ Dequeue: Remove the node at the front of the queue and adjust the front
pointer.
4. Circular Queue
● Definition: A circular queue is a linear queue where the last position is connected
back to the first position, forming a circle. This allows for efficient use of space.
● Advantages: It eliminates the problem of wasted space in a linear queue when
elements are dequeued.
● Operations:
○ Enqueue: Similar to a linear queue, but the rear wraps around to the front if
it reaches the end of the array.
○ Dequeue: The front pointer wraps around when it reaches the end.
Implementation:
#define MAX 100
struct CircularQueue {
int items[MAX];
int front;
int rear;
};
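A sketch of the wrap-around logic using the modulo operator (front and rear are assumed to start at -1, marking an empty queue):
int enqueueCircular(struct CircularQueue* q, int value) {
    if ((q->rear + 1) % MAX == q->front) return -1;   /* full: advancing rear would reach front */
    if (q->front == -1) q->front = 0;                 /* first element ever inserted */
    q->rear = (q->rear + 1) % MAX;                    /* rear wraps around to index 0 */
    q->items[q->rear] = value;
    return 0;
}

int dequeueCircular(struct CircularQueue* q) {
    if (q->front == -1) return -1;                    /* empty (-1 doubles as an error value here) */
    int value = q->items[q->front];
    if (q->front == q->rear) q->front = q->rear = -1; /* queue became empty */
    else q->front = (q->front + 1) % MAX;             /* front wraps around as well */
    return value;
}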
5. Deque (Double-Ended Queue)
● Definition: A deque allows insertion and deletion of elements from both ends
(front and rear).
● Operations:
○ Enqueue Front: Add an element to the front of the deque.
○ Enqueue Rear: Add an element to the rear of the deque.
○ Dequeue Front: Remove an element from the front of the deque.
○ Dequeue Rear: Remove an element from the rear of the deque.
● Implementation:
○ A deque can be implemented using either an array or a doubly linked list.
6. Priority Queue
● Definition: A priority queue is an abstract data type where each element has a
priority assigned. Elements are dequeued based on their priority rather than their
order in the queue.
● Types:
○ Min-Priority Queue: The element with the lowest priority is dequeued
first.
○ Max-Priority Queue: The element with the highest priority is dequeued
first.
● Implementation:
○ A priority queue can be implemented using an array, linked list, or a binary
heap.
● Operations:
○ Enqueue: Insert an element in the queue based on its priority.
○ Dequeue: Remove and return the element with the highest (or lowest)
priority.
Summary
● Queues are essential data structures that operate on a FIFO basis, and they can be
represented in different ways, such as linear, circular, and linked lists.
● Deques and priority queues extend the basic queue functionality, allowing for
more complex operations based on specific use cases.
Questions of 2 Marks
1. What is a queue?
○ A queue is a linear data structure that follows the First In First Out (FIFO)
principle, where the first element added is the first one to be removed.
2. Explain the difference between a linear queue and a circular queue.
○ In a linear queue, elements are added at the rear and removed from the
front, leading to wasted space when elements are dequeued. In a circular
queue, the last position is connected back to the first, allowing efficient use
of space.
3. What operations are typically supported by a queue?
○ Common operations include Enqueue (inserting an element), Dequeue
(removing an element), Front (retrieving the front element), isEmpty
(checking if the queue is empty), and isFull (checking if the queue is full).
4. Describe the concept of a priority queue.
○ A priority queue is an abstract data type where each element has a priority.
Elements are dequeued based on their priority rather than the order in
which they were added.
5. What is the primary advantage of using a circular queue over a linear queue?
○ A circular queue eliminates wasted space that can occur in a linear queue
by reusing positions of dequeued elements, allowing better memory
utilization.
6. How is a deque different from a regular queue?
○ A deque (double-ended queue) allows insertion and deletion of elements
from both ends (front and rear), while a regular queue allows operations
only at one end for insertion and the other end for deletion.
7. What is the time complexity of the enqueue and dequeue operations in a
queue implemented with a linked list?
○ The time complexity for both enqueue and dequeue operations in a linked
list implementation is O(1), as they involve inserting or removing elements
at the front or rear.
8. In which scenarios would you prefer to use a priority queue?
○ A priority queue is preferred in scenarios such as scheduling tasks based on
priority, managing bandwidth in networks, or handling events in
simulations where certain events need to be processed before others.
9. What is the effect of trying to dequeue an element from an empty queue?
○ Attempting to dequeue an element from an empty queue typically results in
an error or an underflow condition, as there are no elements to remove.
10. Explain how a circular queue is implemented using an array.
○ A circular queue uses an array with two pointers (front and rear). When
adding or removing elements, the pointers wrap around to the beginning of
the array if they reach the end, allowing the queue to reuse positions of
dequeued elements.
Questions of 5 Marks
struct CircularQueue {
int items[MAX];
int front;
int rear;
};
struct Node {
int data;
struct Node* next;
struct Node* prev;
};
struct Deque {
struct Node* front;
struct Node* rear;
};
Sample Paper for Mid Semester Test - II
Semester : 3rd
Graph theory has a wide range of applications in various fields, such as:
● Computer Science: Used in networking, data structure analysis, search algorithms, and
AI.
● Social Networks: To represent friendships, connections, or interactions between people.
● Biology: Mapping biological networks, such as neural pathways or food chains.
● Transportation and Navigation: For routing, shortest path, and connectivity analysis,
especially in maps and logistics.
2. Key Terminology in Graph Theory
To work with graphs effectively, understanding the terminology is essential.
Basic Terms
● Vertex (Node): A fundamental unit of a graph that represents an entity.
● Edge: A connection (link) between two vertices.
● Adjacent Vertices: Two vertices joined directly by an edge.
Types of Graphs
1. Undirected Graph:
○ An undirected graph has edges that do not have a specific direction. This means
if there is an edge between A and B, you can traverse it from A to B and vice
versa.
○ Notation: G=(V,E), where V is the set of vertices and E is the set of edges.
○ Example: In a social network where friendships are mutual, an undirected edge
represents a friendship between two people.
2. Directed Graph (Digraph):
○ In a directed graph, each edge has a direction, meaning it goes from one vertex
to another specific vertex.
○ Notation: Often represented as (u→v), indicating a one-way edge from u to v.
○ Example: A Twitter following network, where one user can follow another, but the
follow may not be reciprocal.
3. Weighted Graph:
○ In a weighted graph, each edge has a numerical value, or “weight,” associated
with it.
○ Example: A city map with distances between locations as edge weights.
4. Unweighted Graph:
○ An unweighted graph has edges with no weights, meaning all edges are
considered equal in terms of cost or distance.
Graph Characteristics
1. Path:
○ A path is a sequence of vertices connected by edges. A path between vertices A
and B might go through multiple intermediate vertices.
○ Example: In a subway system, the path from Station A to Station C might go
through Station B.
2. Cycle:
○ A cycle is a path that starts and ends at the same vertex without traversing any
vertex more than once (except for the start/end vertex).
○ Example: In a food chain, a cycle could represent an ecosystem loop where each
organism is consumed by another, eventually leading back to the initial organism.
3. Degree:
○ The degree of a vertex is the number of edges connected to it.
○ In-Degree (for directed graphs): The number of incoming edges to a vertex.
○ Out-Degree (for directed graphs): The number of outgoing edges from a vertex.
Connectivity
1. Connected Graph:
○ A graph is connected if there is a path between every pair of vertices.
2. Disconnected Graph:
○ A graph with at least one pair of vertices that do not have a connecting path.
3. Graph Representation
Graphs can be represented in multiple ways, which makes it easier to store and manipulate
them depending on the application.
Adjacency Matrix: a V × V matrix in which entry [i][j] is 1 if an edge connects vertex i and vertex j, and 0 otherwise. For the undirected graph with vertices A, B, C, D below:
A B C D
A 0 1 0 1
B 1 0 1 0
C 0 1 0 1
D 1 0 1 0
Adjacency List
● A: B, D
● B: A, C
● C: B, D
● D: A, C
Path Matrix: a matrix in which entry [i][j] is 1 if there is a path (of any length) from vertex i to vertex j, and 0 otherwise; it can be derived from the adjacency matrix, for example using Warshall's algorithm.
Depth-First Search (DFS)
Definition: DFS explores as deeply as possible along each branch before backtracking.
Steps:
1. Start at a source vertex and mark it as visited.
2. Visit an unvisited adjacent vertex, mark it, and continue down that branch.
3. Backtrack when the current vertex has no unvisited neighbors.
4. Repeat until all vertices reachable from the source have been visited.
Example: Imagine solving a maze where you go as far as possible in one direction before trying
other paths if you hit a dead end.
Algorithm:
DFS(Graph, source):
Mark source as visited
For each neighbor of source:
If neighbor is not visited:
Call DFS(neighbor)
Breadth-First Search (BFS)
Definition: BFS explores all vertices at the current depth level before moving to the next level.
Steps:
1. Initialize: Start from a source vertex, mark it as visited, and add it to a queue.
2. Dequeue: Remove the front vertex from the queue.
3. Visit Neighbors: For each unvisited neighbor of the dequeued vertex, mark it as visited
and add it to the queue.
4. Repeat: Continue until the queue is empty.
Example: BFS is like spreading out from the center of a ripple, exploring each level one by one,
making it ideal for finding shortest paths in unweighted graphs.
Algorithm:
BFS(Graph, source):
Mark source as visited
Enqueue source
While queue is not empty:
vertex = Dequeue()
For each neighbor of vertex:
If neighbor is not visited:
Mark neighbor as visited
Enqueue neighbor
5. Operations on Graphs
Adding and Removing Nodes and Edges
Connected Components
Summary
Graphs are versatile tools for representing complex data structures in real-world applications.
Understanding the core concepts of graph types, representations, and traversal algorithms
(DFS and BFS) is essential for analyzing networks, relationships, and paths. With the ability to
add/remove nodes and edges, detect cycles, and identify connected components, graph theory
provides the foundation for tackling real-world problems across domains like networking, social
media, biology, and transportation.
1. Define a graph and give two real-world examples of where graphs are used.
● Answer: An adjacency matrix is a 2D matrix where the cell A[i][j] is 1 if there is an edge
between vertices i and j (or a weight if it's weighted) and 0 if there is no edge. It’s
suitable for dense graphs but uses more memory for sparse graphs.
4. What is the purpose of graph traversal, and name the two primary traversal methods.
● Answer: Graph traversal is the process of visiting each vertex in a graph systematically
to explore and analyze relationships. The two primary traversal methods are Depth-First
Search (DFS) and Breadth-First Search (BFS).
● Answer: An adjacency list represents a graph by storing each vertex as a list of its
neighbors. It is more memory-efficient for sparse graphs than an adjacency matrix.
Example: For a graph with vertices A, B, C, and D:
○ A: B, D
○ B: A, C
○ C: B, D
○ D: A, C
● In this example, vertex A is connected to B and D, vertex B to A and C, and so on. This
representation reduces storage by only listing connections that exist, making it
particularly effective for large, sparse graphs.
● Answer: The main differences between Depth-First Search (DFS) and Breadth-First
Search (BFS) are:
1. DFS explores as deeply as possible along each branch before backtracking,
while BFS explores all neighbors at the current depth before moving to the next
level.
2. DFS uses a stack (or recursion), while BFS uses a queue.
3. DFS is suited for scenarios like maze solving where a deep path needs
exploring, while BFS is suited for shortest path problems in unweighted graphs.
4. BFS finds the shortest path in unweighted graphs, but DFS does not guarantee
the shortest path.
1. Explain Depth-First Search (DFS) algorithm in detail with step-by-step explanation and
a practical example.
● Answer: Depth-First Search (DFS) is a graph traversal algorithm that explores as far
down each branch as possible before backtracking.
Steps:
○ Start at a source vertex, mark it as visited.
○ Visit each adjacent unvisited vertex, marking it as visited and proceeding
down its branches.
○ Backtrack when no unvisited adjacent vertices remain.
○ Repeat until all vertices reachable from the source are visited.
● Example: For a graph with vertices A,B,C,D where A→B, B→C, A→D:
○ Start at A, mark as visited.
○ Move to B, mark as visited.
○ Move to C, mark as visited.
○ Backtrack to B, no unvisited vertices, backtrack to A.
○ Move to D, mark as visited.
● DFS is useful in applications like maze solving or detecting cycles in graphs.
2. Describe the Breadth-First Search (BFS) algorithm with a detailed example and discuss
its applications.
● Answer: Breadth-First Search (BFS) is a graph traversal algorithm that explores all
neighbors at the current level before moving to the next level.
Steps:
○ Start at a source vertex, mark as visited, and enqueue it.
○ Dequeue the front vertex, visit its unvisited neighbors, mark them, and enqueue
them.
○ Repeat until the queue is empty.
● Example: For a graph with vertices A,B,C,D where A→B, B→C, A→D:
○ Start at A, mark as visited, enqueue A.
○ Dequeue A, visit B and D, mark and enqueue them.
○ Dequeue B, visit C, mark and enqueue it.
○ Dequeue D and C; no new vertices.
● Applications: BFS is used for finding shortest paths in unweighted graphs,
peer-to-peer networking (e.g., finding closest nodes), and social network analysis.
● Answer: Connectivity in a graph refers to the presence of paths between all pairs of
vertices.
○ A connected graph has a path between every pair of vertices.
○ A disconnected graph has at least one pair of vertices with no path between
them.
● Determining Connectivity:
○ To check if a graph is connected, perform DFS or BFS from any vertex. If all
vertices are reachable, the graph is connected.
● Connected Components:
○ A connected component is a subgraph where any two vertices are connected
by paths, and no other vertices are connected to any vertex in the subgraph.
○ Finding Components:
1. Start DFS or BFS from each unvisited vertex.
2. Each traversal covers one connected component.
3. Repeat until all vertices are visited, with each traversal identifying one
component.
● Example: For a graph with two components, say G1={A,B} and G2={C,D}:
○ Running DFS/BFS from A will visit B, marking G1.
○ Running DFS/BFS from C will visit D, marking G2.
3.2 : Trees
Trees: Basic Terminology
● Node: A basic unit of a tree that holds data; Root: the topmost node; Leaf: a node with no children.
● Parent / Child: A node directly above / below another node; Subtree: a node together with all of its descendants.
● Height: The length of the longest path from a node down to a leaf; Depth: the length of the path from the root to a node.
Binary Trees
A binary tree is a tree where each node has at most two children, commonly referred to as the
left and right children. Binary trees are foundational structures in computer science due to their
efficient data processing capabilities.
1. Full Binary Tree: Every node has either zero or two children.
2. Complete Binary Tree: All levels, except possibly the last, are completely filled, and all
nodes are as left as possible.
3. Perfect Binary Tree: A full binary tree with all levels completely filled.
4. Skewed Binary Tree: All nodes have only one child, creating either a left or right linear
structure.
1. Array Representation: Nodes are stored in an array. With 0-based indexing (root at index 0), for any node at index i:
○ Left child is at 2i + 1
○ Right child is at 2i + 2
○ Parent is at floor((i − 1) / 2)
(If the root is stored at index 1 instead, the children are at 2i and 2i + 1 and the parent at floor(i / 2).)
Linked Representation: Each node has pointers to its left and right children. A typical node
structure for a binary tree includes:
struct Node {
    int data;
    struct Node* left;   // Pointer to the left child
    struct Node* right;  // Pointer to the right child
};
Tree traversal is the process of visiting all nodes in a specific order. The primary types of binary
tree traversal are:
1. Inorder Traversal:
○ Push nodes onto the stack until reaching the leftmost node.
○ Pop and visit nodes, then move to the right child if it exists.
2. Preorder Traversal:
○ Push the root node onto the stack, visit it, then push the right and left children (in
that order).
○ Continue popping, visiting, and pushing children until the stack is empty.
3. Postorder Traversal:
○ Push nodes and manage visited markers to ensure the right child is processed
after the left child.
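For comparison, the recursive forms of the three traversals are very short (a sketch assuming the Node struct shown above, with data, left, and right fields):
void inorder(struct Node* root) {              /* Left, Root, Right */
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->data);
    inorder(root->right);
}

void preorder(struct Node* root) {             /* Root, Left, Right */
    if (root == NULL) return;
    printf("%d ", root->data);
    preorder(root->left);
    preorder(root->right);
}

void postorder(struct Node* root) {            /* Left, Right, Root */
    if (root == NULL) return;
    postorder(root->left);
    postorder(root->right);
    printf("%d ", root->data);
}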
1. Header Nodes: Special nodes in linked data structures (such as linked lists or trees) that
act as placeholders or starting points.
2. Threaded Binary Trees: In a threaded binary tree, null pointers are replaced with
pointers to the inorder predecessor or successor, making it easier to traverse the tree
without recursion or stacks.
Binary Search Trees (BST)
A binary search tree is a binary tree with the following ordering property:
● The left subtree contains nodes with values less than the root.
● The right subtree contains nodes with values greater than the root.
● This property applies to every node, ensuring efficient searching, insertion, and deletion.
Operations on BST
1. Searching in BST:
○ Start at the root; move left if the target is smaller or right if it’s larger until the
target is found or a leaf is reached.
2. Inserting in BST:
○ Traverse to the appropriate leaf node based on value comparison, then insert the
new node as a left or right child.
3. Deleting in BST:
○ Locate the node, then:
■ If it's a leaf, simply remove it.
■ If it has one child, replace it with its child.
■ If it has two children, replace it with its inorder successor or predecessor,
then delete that node.
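A sketch of search and insertion in C (assuming the binary tree Node struct from earlier; sending duplicates to the right is one common convention):
struct Node* searchBST(struct Node* root, int key) {
    if (root == NULL || root->data == key) return root;    /* found, or reached a dead end */
    if (key < root->data) return searchBST(root->left, key);
    return searchBST(root->right, key);
}

struct Node* insertBST(struct Node* root, int key) {
    if (root == NULL) {                                     /* empty spot: create the node here */
        struct Node* node = (struct Node*)malloc(sizeof(struct Node));
        node->data = key;
        node->left = node->right = NULL;
        return node;
    }
    if (key < root->data) root->left = insertBST(root->left, key);
    else root->right = insertBST(root->right, key);         /* duplicates go to the right */
    return root;
}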
AVL Trees
An AVL tree is a balanced BST where the heights of the left and right subtrees of every node
differ by no more than one. If the tree becomes unbalanced after an insertion or deletion, it is
rebalanced using rotations:
B-Trees
B-Trees are self-balancing trees that maintain sorted data and allow efficient insertion, deletion,
and search operations. They are widely used in databases and file systems for managing large
blocks of data.
● Properties:
1. All leaves are at the same depth.
2. Each node can contain multiple keys and children.
3. Nodes have a minimum and maximum number of children to maintain balance.
Heaps
A heap is a complete binary tree that satisfies an ordering property between each node and its children:
1. Max-Heap: The key at each node is greater than or equal to the keys of its children.
2. Min-Heap: The key at each node is less than or equal to the keys of its children.
Heap Operations:
1. Insertion: Add the new element at the end, then "heapify" by swapping it up to maintain
the heap property.
2. Deletion (of the root): Replace the root with the last element, then heapify down.
Heap Sort
Heap Sort is an efficient sorting algorithm that uses the heap data structure.
Steps:
1. Build a max-heap from the input array.
2. Swap the root (the maximum) with the last element of the heap.
3. Reduce the heap size by one and heapify the new root downwards.
4. Repeat steps 2-3 until the heap size is one; the array is then sorted in ascending order.
Time Complexity: O(n log n).
1. Explain the process of inorder, preorder, and postorder traversals in a binary tree
with an example.
Answer:
○ Inorder Traversal (Left, Root, Right): Visit the left subtree, then the root, then
the right subtree. Example: For a tree with root A and left child B, right child C,
the inorder traversal is B → A → C.
○ Preorder Traversal (Root, Left, Right): Visit the root, then left subtree, then
right subtree. Using the same tree, the preorder traversal is A → B → C.
○ Postorder Traversal (Left, Right, Root): Visit the left subtree, right subtree,
then the root. For the example tree, the postorder traversal is B → C → A.
2. Describe the insertion operation in a Binary Search Tree (BST).
Answer:
To insert a node in a BST:
○ Start at the root.
○ If the new value is less than the current node, move to the left child; if greater,
move to the right child.
○ Continue this process recursively until an empty spot (null) is found.
○ Insert the new node in this spot as either the left or right child based on its value.
This ordering ensures that the BST properties are maintained.
3. Explain the concept of a threaded binary tree and its advantages.
Answer:
In a threaded binary tree, null pointers in the nodes are replaced by pointers to the
inorder predecessor or successor, allowing for efficient inorder traversal without
recursion or stacks. The advantages include:
○ Improved Traversal: Enables faster traversal by providing direct links to the next
node in sequence.
○ Reduced Memory Use: Reduces the need for auxiliary data structures like
stacks for traversal.
1. Explain AVL trees, including their balancing process and types of rotations used
to maintain balance.
Answer:
An AVL tree is a balanced binary search tree where the height difference between the
left and right subtrees of each node is at most 1. This balance is achieved by performing
rotations after insertions and deletions, ensuring efficient operations.
Types of Rotations:
○ Left Rotation: Applied when a node’s right subtree is heavier. The right child
becomes the new root, and the old root becomes the left child of the new root.
○ Right Rotation: Applied when a node’s left subtree is heavier. The left child
becomes the new root, and the old root becomes the right child of the new root.
○ Left-Right Rotation: Applied when the left subtree’s right child causes
imbalance. A left rotation is applied to the left subtree, followed by a right rotation.
○ Right-Left Rotation: Applied when the right subtree’s left child causes
imbalance. A right rotation is applied to the right subtree, followed by a left
rotation.
Advantages: AVL trees maintain balance, ensuring a time complexity of O(log n) for
search, insert, and delete operations, making them suitable for applications where fast
data retrieval is critical.
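A sketch of one of these rotations (the left rotation) in C; the node structure with a cached
height and the helper names are assumptions for illustration, and a complete AVL
implementation would also rebalance after every insertion and deletion.

/* Illustrative AVL node with a cached subtree height. */
struct AvlNode {
    int key, height;
    struct AvlNode *left, *right;
};

static int height(struct AvlNode *n) { return n ? n->height : 0; }
static int max2(int a, int b) { return a > b ? a : b; }

/* Left rotation: the right child y becomes the new subtree root,
   and the old root x becomes y's left child. */
struct AvlNode *rotate_left(struct AvlNode *x) {
    struct AvlNode *y = x->right;
    x->right = y->left;
    y->left = x;
    x->height = 1 + max2(height(x->left), height(x->right));
    y->height = 1 + max2(height(y->left), height(y->right));
    return y;                          /* the caller re-links y in place of x */
}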
2. Describe heap data structures, focusing on max-heaps, min-heaps, and their
applications.
Answer:
A heap is a complete binary tree where each node follows a specific ordering property:
○ Max-Heap: The value at each node is greater than or equal to the values of its
children. The root node has the maximum value.
○ Min-Heap: The value at each node is less than or equal to the values of its
children. The root node has the minimum value.
Applications of Heaps:
○ Priority Queues: Heaps efficiently manage priority queues, allowing for quick
access to the highest (max-heap) or lowest (min-heap) priority element.
○ Heap Sort: A sorting algorithm that uses the heap structure to order data in
O(n log n) time.
○ Graph Algorithms: Heaps optimize graph traversal algorithms like Dijkstra’s
shortest path by providing efficient access to the minimum cost path.
3. Explain the operations of insertion, searching, and deletion in Binary Search Trees
(BSTs) with examples.
Answer:
Insertion:
○ Start from the root; compare the value to be inserted with the current node.
○ Move left if the value is smaller; right if larger.
○ Repeat until a null position is found, then insert the node.
Example: Insert 15 into a BST containing 10, 20, and 30 (20 is the right child of 10, and 30 is
the right child of 20). Since 15 > 10, move right to 20; since 15 < 20, move left and insert 15
as the left child of 20.
Searching:
○ Start from the root and compare the target with each node.
○ Move left if smaller, right if larger.
○ Continue until the target is found or a leaf node is reached.
Example: To search for 15 in the above tree, start at root 10 → move right to 20 → left
to 15.
Deletion:
○ Locate the node using the same comparisons as in searching.
○ If it is a leaf, remove it; if it has one child, replace it with that child; if it has two
children, replace it with its inorder successor (or predecessor) and then delete that node.
Example: In the tree above, 15 is a leaf, so deleting it simply removes the node. A sketch of
deletion in C follows.
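The sketch below reuses the illustrative Node structure from the earlier BST example and
handles all three cases.

/* Delete key from the BST rooted at root; returns the (possibly new) subtree root. */
struct Node *bst_delete(struct Node *root, int key) {
    if (root == NULL) return NULL;
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else {
        /* Cases 1 and 2: zero or one child - splice the node out. */
        if (root->left == NULL || root->right == NULL) {
            struct Node *child = root->left ? root->left : root->right;
            free(root);
            return child;
        }
        /* Case 3: two children - copy the inorder successor's key, then delete that node. */
        struct Node *succ = root->right;
        while (succ->left != NULL) succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    }
    return root;
}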
3.3: Hashing and File Organization
Hashing
Hashing is a technique used in computer science to store and retrieve data quickly. It
maps data (keys) to fixed-size values (hash codes or hash values) using a hash function, which
allows for fast access to data by leveraging these hash values as indices in a hash table.
Basic Terminology
1. Hash Function: A function that converts an input (or "key") into a fixed-size string of
bytes. The output, called the hash value or hash code, is typically a number used as an
index in a hash table.
2. Hash Table: A data structure that stores key-value pairs based on the hash value of the
key. This table uses the hash function to quickly locate entries.
3. Bucket: A slot in the hash table where one or more values may be stored, depending on
the collision handling method.
4. Collision: Occurs when two or more keys produce the same hash value and are
mapped to the same index or bucket in the hash table.
5. Load Factor: The ratio of the number of entries in the hash table to the number of
buckets. A high load factor can increase collisions, leading to slower access times.
Hash Functions
A good hash function should:
● Be deterministic: Given a specific input, it should always return the same hash code.
● Distribute values uniformly: Hash codes should be evenly distributed to avoid clusters
of data.
● Be efficient to compute: The function should work quickly even for large datasets.
Common methods for constructing hash functions include:
1. Division Method:
○ Formula: h(k)=k mod m
○ Here, k is the key, and m is the size of the hash table.
○ Example: If m=10 and k=1234, then h(1234)=1234 mod 10 = 4.
2. Multiplication Method:
○ Formula: h(k)=⌊m⋅(k⋅A mod 1)⌋
○ A is a constant between 0 and 1, typically an irrational number such as (√5 − 1)/2 ≈ 0.618.
3. Folding Method: Divides the key into parts, adds them, and then takes the modulo.
○ Example: For a key 123456, split into 12, 34, and 56, sum to get 102, and take
mod m to get the hash value.
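These three methods can be written as small C functions; a sketch is given below (the two-digit
folding split mirrors the 123456 example above, and the code must be linked with the math
library for sqrt and fmod).

#include <math.h>

/* Division method: h(k) = k mod m. */
unsigned hash_division(unsigned k, unsigned m) {
    return k % m;
}

/* Multiplication method: h(k) = floor(m * (k*A mod 1)), with A = (sqrt(5)-1)/2 (about 0.618). */
unsigned hash_multiplication(unsigned k, unsigned m) {
    const double A = (sqrt(5.0) - 1.0) / 2.0;
    double frac = fmod(k * A, 1.0);        /* fractional part of k*A */
    return (unsigned)(m * frac);
}

/* Folding method: split the key into two-digit groups, add them, take mod m.
   For k = 123456 this computes (12 + 34 + 56) mod m = 102 mod m. */
unsigned hash_folding(unsigned k, unsigned m) {
    unsigned sum = 0;
    while (k > 0) {
        sum += k % 100;                    /* take the last two digits as one group */
        k /= 100;
    }
    return sum % m;
}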
Collision Resolution Techniques
Since collisions can occur when two keys hash to the same index, we use collision resolution
methods:
1. Separate Chaining:
○ Each bucket is associated with a linked list of entries that hash to the same
index.
○ Pros: Simple and efficient when the load factor is controlled.
○ Cons: Can become inefficient if chains grow too long.
2. Open Addressing:
○ Stores all elements within the hash table itself, probing for an empty spot if a
collision occurs.
3. Types of Open Addressing:
○ Linear Probing: On a collision, check the next slot (index + 1, + 2, ...) until an empty
one is found; simple, but can cause primary clustering.
○ Quadratic Probing: Probe at quadratically increasing offsets (index + 1², + 2², ...),
which reduces primary clustering.
○ Double Hashing: Use a second hash function to determine the probe step, giving each key
its own probe sequence and the least clustering.
4. Rehashing:
○ When the load factor reaches a threshold, a new larger table is created, and all
entries are rehashed into it. This reduces collisions but requires time for resizing.
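A minimal separate-chaining hash table in C as an illustration; the fixed table size, integer
keys, and division-method hash are assumptions, not part of the notes.

#include <stdlib.h>

#define TABLE_SIZE 10

/* Each bucket holds a singly linked list of entries that hash to the same index. */
struct Entry {
    int key;
    struct Entry *next;
};

static struct Entry *table[TABLE_SIZE];    /* all buckets start empty (NULL) */

static unsigned hash(int key) { return (unsigned)key % TABLE_SIZE; }

/* Insert: compute the index and push the entry onto that bucket's chain. */
void chain_insert(int key) {
    unsigned i = hash(key);
    struct Entry *e = malloc(sizeof *e);
    e->key = key;
    e->next = table[i];
    table[i] = e;
}

/* Search: walk the chain at the computed index. */
int chain_contains(int key) {
    for (struct Entry *e = table[hash(key)]; e != NULL; e = e->next)
        if (e->key == key) return 1;
    return 0;
}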
Applications of Hashing
1. Databases: Hashing allows for fast lookups of records based on key values, enhancing
database efficiency.
2. Caches: Used to map resource requests to cached data, allowing for quicker retrieval.
3. Password Storage: Hashing is commonly used to securely store passwords.
4. Compiler Symbol Tables: Hashing helps manage identifiers in programming languages
efficiently.
Example: Suppose several keys all hash to index 0 (a collision). Using separate chaining, we
store all these values in a linked list at index 0.
Advantages of Hashing
● Fast Access: Provides O(1) average-case time complexity for search, insert, and delete
operations.
● Efficient Memory Use: Only the required entries are stored, making it memory-efficient.
Disadvantages
● Collisions: Performance degrades toward O(n) when many keys collide or the load factor
grows too high.
● No Ordering: Keys are not kept in sorted order, so range queries and ordered traversal are
inefficient.
Hash Table Operations
1. Insertion:
○ Calculate the hash value using the hash function.
○ Check if the computed index is occupied.
○ If there is no collision, place the element.
○ If a collision occurs, use a collision resolution method (e.g., chaining or probing).
2. Searching:
○ Compute the hash code using the hash function.
○ Check the corresponding index in the table.
○ If the value is present, return it.
○ If a collision resolution technique is used (e.g., probing), follow the technique’s
rules to locate the key.
3. Deletion:
○ Compute the hash code of the key.
○ Locate the key in the hash table using the collision resolution technique, if any.
○ Remove the entry if found.
File Organization and Record Management
In computer science, file organization refers to the way data is stored in a file system. Efficient
file organization enables fast access to records and optimal use of memory. Understanding how
records are organized and retrieved from files is crucial for managing large datasets, ensuring
efficient file systems, and improving performance.
Basic Terminology
1. File: A collection of data or information that is stored on a storage device (e.g., hard disk,
SSD). Files can store various types of data, such as text, numbers, images, or even
executable code.
2. Record: A collection of related data items, often referred to as fields. For example, in a
student database, a record may contain fields like student ID, name, date of
birth, address, etc.
3. File Structure: The way data is arranged within a file. It includes how records are
stored, accessed, and organized.
4. Block: A unit of storage in a file. Typically, a file is divided into blocks, which are the
smallest unit of data that can be transferred from disk to memory.
5. File Access: How a program accesses records in a file. It can be sequential (records are
read in order), direct (records can be accessed randomly), or indexed (using an index to
quickly locate records).
When data is stored in files, it's typically organized into blocks. A block is a contiguous set of
records, and each block is stored sequentially in the storage device. The organization into
blocks helps in efficient storage and retrieval because:
● Files may contain many records, and managing them in smaller, fixed-size blocks helps
with better memory management and faster access.
● Blocks can be read in one operation from the disk, which reduces I/O operations,
improving system performance.
In file systems, records within a block are typically stored in sequential order, but different file
organizations can be used to manage how these blocks of records are accessed.
Sequential File Organization
● Definition: In a sequential file, records are stored in order, one after the other. A
sequential file can only be read in a sequence, typically from the beginning to the end.
● Use Cases: This organization is useful for applications where the majority of access is
through the entire file in a linear fashion (e.g., generating reports, sequential
processing).
● Insertion/Deletion: Inserting or deleting records in sequential files can be slow because
new records must be inserted in the correct order, and deletions may require shifting
records.
● Advantages:
○ Simple to implement.
○ Efficient for reading entire files or sequential access.
● Disadvantages:
○ Searching for specific records can be slow unless the file is indexed or sorted.
○ Insertions and deletions are inefficient, especially as the file size grows.
Relative File Organization
● Definition: A relative file is one in which records are stored in a manner that allows
access via a relative address. The records are stored in blocks, and each block has an
address, which allows direct access to records.
● Access: The record location is calculated based on a relative position or address within
the file, using the key field of the record (e.g., record ID).
● Use Cases: This is suitable for applications where records need to be accessed directly
by a key value, like databases or key-value stores.
● Advantages:
○ Direct access to records, making retrieval fast.
○ Efficient for applications where records are accessed randomly.
● Disadvantages:
○ Deletion and insertion can be complex as the position of records might change.
○ Handling overflow (when a block is full) requires additional techniques like
chaining or linking.
Example: A phonebook where each entry can be accessed directly via a unique phone number.
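One way to picture relative (direct) access in C: with fixed-size records, the byte offset of
record number n is simply n times the record size, so fseek can jump straight to it. The record
layout and function name below are purely illustrative.

#include <stdio.h>

/* Illustrative fixed-size record. */
struct PhoneRecord {
    long number;
    char name[32];
};

/* Read the record stored at relative position recno (0-based); returns 1 on success. */
int read_record(const char *path, long recno, struct PhoneRecord *out) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return 0;
    int ok = fseek(fp, recno * (long)sizeof *out, SEEK_SET) == 0
          && fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok;
}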
Indexed Sequential File Organization
● Definition: Indexed sequential files use an index to store pointers to records. The index
allows for both sequential and direct access to records.
● Working: The data file is organized sequentially, and an index file is created that stores
the addresses of the records. The index provides a fast way to locate a record using its
key field.
● Use Cases: Suitable for applications where both sequential and random access are
required, like a library catalog or inventory management system.
● Advantages:
○ Provides fast access to records via the index.
○ Allows both sequential and random access.
● Disadvantages:
○ The index adds extra storage and complexity.
○ Insertions and deletions may require updates to the index, which can be costly.
Example: A student database where records are stored in order of student ID, and an index is
created to locate records by name or age.
Inverted File Organization
● Definition: An inverted file is a specialized file organization where each keyword (or key)
points to a list of records containing that keyword. This structure is often used in
information retrieval systems, like search engines.
● Working: Instead of storing a record with its key directly, an inverted index stores a
mapping from a keyword to the set of records containing it. It is particularly useful when
queries are based on keywords or attributes.
● Use Cases: Most commonly used in search engines, full-text databases, and data
warehousing systems.
● Advantages:
○ Efficient for searching based on attributes or keywords.
○ Great for applications that require frequent keyword-based queries.
● Disadvantages:
○ The inverted index takes up more space because it stores a list of record pointers
for each keyword.
○ Maintaining the inverted index can be complex during updates (insertions,
deletions).
Example: A search engine index where each word in a document corpus points to a list of
documents containing that word.
Exam Questions for Hashing and File Organization
1. What is the purpose of a hash function in hashing, and what characteristics make
a hash function effective?
○ Answer: A hash function converts input data into a fixed-size integer or index,
allowing for efficient data storage and retrieval. Effective hash functions distribute
keys uniformly to avoid clustering and minimize collisions.
2. Explain the term ‘collision’ in hashing and provide one method to resolve
collisions.
○ Answer: A collision occurs when two keys produce the same hash value and
map to the same index in a hash table. One method to resolve collisions is
separate chaining, where each index in the table points to a linked list of entries.
3. Define sequential file organization and give an example of its application.
○ Answer: Sequential file organization stores records in a fixed order. This method
is efficient for processing files in a linear manner and is commonly used in payroll
systems where data needs to be processed in sequence.
4. What is an inverted file, and where is it commonly used?
○ Answer: An inverted file is an indexing technique where each attribute value
points to a list of records containing that attribute. It is commonly used in search
engines to quickly retrieve records based on keywords.
1. Describe open addressing in hashing, and compare linear probing and double
hashing as methods of resolving collisions.
○ Answer: Open addressing is a collision-resolution method where, rather than
using external chaining, collisions are resolved within the hash table itself. In
linear probing, the next available slot is found sequentially, which may cause
primary clustering. In contrast, double hashing uses a secondary hash function
to calculate probe intervals, which reduces clustering by distributing collisions
more evenly across the table.
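The two probe sequences can be written out directly; a sketch is given below (the secondary
hash function is illustrative, and m should be chosen, e.g. prime, so that every probe sequence
can reach the whole table).

/* Linear probing: try index h, h+1, h+2, ... (mod m). */
unsigned linear_probe(unsigned h, unsigned attempt, unsigned m) {
    return (h + attempt) % m;
}

/* Double hashing: the step size comes from a second hash function,
   so colliding keys follow different probe paths. */
unsigned second_hash(unsigned key, unsigned m) {
    return 1 + (key % (m - 1));            /* never zero, so probing always advances */
}

unsigned double_probe(unsigned key, unsigned h, unsigned attempt, unsigned m) {
    return (h + attempt * second_hash(key, m)) % m;
}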
2. Explain index sequential file organization and its advantages.
○ Answer: Index sequential file organization stores records in a sorted order with
an index allowing for both sequential and direct access. This approach allows
efficient retrieval for both sequential processing and quick access via the index.
Advantages include faster access times due to the index and efficient processing
for large, ordered datasets, though it requires extra storage for the index and may
be slower for frequent updates.
3. How does separate chaining differ from open addressing in handling hash
collisions? Provide examples.
○ Answer: Separate chaining resolves collisions by creating a linked list of entries
at each hash table index, allowing multiple entries to occupy the same index.
Open addressing, on the other hand, keeps all data within the table and
resolves collisions by probing alternative indices. For example, if the hash
function maps two keys to index 5, separate chaining would store both entries in
a linked list at index 5, whereas open addressing would place the second entry in
the next open slot, such as index 6 or 7.
1. Graphs
Graphs consist of nodes (vertices) connected by edges, and are used to model relationships
between entities in complex networks, such as social media, transportation, and communication
systems.
● Graph Terminology: Includes terms like vertices, edges, paths, degree, and connected
components.
● Representation:
○ Adjacency Matrix: A 2D array where each cell indicates if an edge exists
between vertices.
○ Path Matrix: Stores information about reachability between vertices.
● Graph Traversal: Methods include:
○ Depth-First Search (DFS): Explores as far as possible along a branch before
backtracking.
○ Breadth-First Search (BFS): Explores neighbors layer by layer.
● Applications: Include shortest path finding, connectivity checking, and network analysis.
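As an illustration of these two traversals, a compact C sketch on an adjacency-matrix graph;
the fixed MAX_V limit and the array-based queue are simplifications for illustration.

#include <stdio.h>

#define MAX_V 100

/* Depth-First Search: recurse along one branch before backtracking.
   visited[] must be zero-initialized by the caller. */
void dfs(int adj[MAX_V][MAX_V], int n, int v, int visited[]) {
    visited[v] = 1;
    printf("%d ", v);
    for (int u = 0; u < n; u++)
        if (adj[v][u] && !visited[u])
            dfs(adj, n, u, visited);
}

/* Breadth-First Search: visit all neighbours of a vertex before going deeper. */
void bfs(int adj[MAX_V][MAX_V], int n, int start) {
    int visited[MAX_V] = {0};
    int queue[MAX_V], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        int v = queue[front++];
        printf("%d ", v);
        for (int u = 0; u < n; u++)
            if (adj[v][u] && !visited[u]) {
                visited[u] = 1;
                queue[rear++] = u;
            }
    }
}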
2. Trees
Trees are hierarchical structures with a root node and branches, used to represent hierarchical
relationships like file directories, organization charts, and decision processes.
● Basic Terminology: Includes root, parent, child, leaf, subtree, depth, and height.
● Binary Trees: A type of tree where each node has at most two children (left and right).
○ Representation in Memory: Nodes are linked, with pointers to child nodes or
stored in arrays.
○ Traversal Methods:
■ Inorder, Preorder, and Postorder: Different ways of visiting nodes based
on the order of left/right children and the root.
■ Traversal Using Stacks: Used for non-recursive traversals.
○ Binary Search Trees (BSTs): A binary tree structure that maintains elements in
sorted order, allowing efficient searching, insertion, and deletion.
○ AVL Trees: Self-balancing binary search trees that keep height balanced to
ensure efficient operations.
○ B-Trees: Multi-way search trees optimized for disk storage and retrieval,
commonly used in databases.
○ Heaps: A complete binary tree primarily used in priority queues and Heap Sort
algorithm.
3. Hashing and File Organization
Hashing is a method to organize and retrieve data based on a computed key, while file
organization methods define how data is physically stored in files.
● Hashing Concepts: Uses a hash function to map keys to table indices, with techniques
to handle collisions (e.g., separate chaining and open addressing).
● Rehashing: Involves expanding the table and redistributing entries to maintain
efficiency.
● File Organization:
○ Sequential Organization: Stores records in a fixed, sorted order.
○ Relative Organization: Uses calculated addresses for random access.
○ Index Sequential Organization: Combines sequential organization with an
index for both sequential and direct access.
○ Inverted Files: An indexing method allowing quick access to records based on
specific fields, used in search-heavy applications.
Review Questions
1. Explain the different traversal methods in a binary tree and the order in which nodes are
visited in each method.
2. Describe the difference between open addressing and separate chaining in collision
resolution for hashing.
3. Describe Depth-First Search (DFS) and Breadth-First Search (BFS) traversal algorithms
in graphs. Discuss their applications with examples.
4. Explain the organization of files in an Index Sequential File Organization. Discuss how it
improves data access and efficiency.