Dsa Small

Internal and External Sorting

Internal and external sorting are two broad categories of sorting techniques, primarily distinguished by how data is managed during the sorting process and the size of data they are designed to handle.

Key Differences

| Aspect | Internal Sorting | External Sorting |
|---|---|---|
| Data Location | Entire dataset fits in main memory (RAM) | Dataset exceeds main memory capacity; stored in secondary storage (disk) |
| Memory Usage | Uses only RAM for sorting | Uses both RAM (for buffers/chunks) and disk storage |
| Suitable Dataset | Small to medium datasets | Large datasets that cannot fit into memory |
| I/O Operations | Minimal; mostly limited to initial read and final write | Frequent; involves reading/writing data to/from disk repeatedly |
| Speed | Generally faster due to direct access to data in memory | Slower due to the overhead of disk access |
| Algorithm Examples | Quick Sort, Merge Sort, Heap Sort, Bubble Sort, Insertion Sort | External Merge Sort, Polyphase Merge Sort, Replacement Selection, External Radix Sort |
| Complexity | Simpler implementation, less overhead | More complex due to chunk management and merging |
| Application | Sorting arrays, lists, or tables in memory (e.g., in-memory databases) | Sorting large files, databases, or datasets stored on disk |
| Efficiency | Highly efficient for small/medium datasets | Efficient for large datasets, but slower than internal sorting for small datasets |

Explanation

• Internal Sorting is used when the entire dataset to be sorted can fit into the main memory (RAM). All sorting operations are performed in-memory, making these algorithms fast and simple to implement. Examples include Quick Sort, Merge Sort, and Heap Sort.

• External Sorting is necessary when the dataset is too large to fit into RAM. Data is divided into manageable chunks, each sorted in memory, then merged using external storage (like a hard disk). This approach minimizes random access and disk I/O, which are much slower than RAM access. Examples include External Merge Sort and Polyphase Merge Sort.

Trade-offs and Use Cases

• Internal sorting is preferred for small to medium datasets due to its speed and simplicity.

• External sorting is essential for very large datasets (e.g., big data, large logs, database tables) where only a portion can be loaded into memory at a time. It is slower but necessary to handle data beyond RAM capacity.

C++ Example Code

Internal Sorting (Quick Sort Example)

```cpp
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            ++i;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 5, 6};
    quickSort(arr, 0, arr.size() - 1);
    for (int i : arr) cout << i << " ";
    return 0;
}
```

External Sorting (Simplified External Merge Sort Example)

External sorting is more complex and typically involves file I/O. The following is a simplified illustration of the chunk-sorting phase:

```cpp
// Pseudocode for clarity, not a full implementation
#include <fstream>
#include <vector>
#include <algorithm>
using namespace std;

void sortChunk(const string& inputFile, const string& outputFile, int chunkSize) {
    ifstream in(inputFile);
    ofstream out(outputFile);
    vector<int> buffer(chunkSize);
    while (in.read((char*)buffer.data(), chunkSize * sizeof(int))) {
        int readCount = in.gcount() / sizeof(int);
        sort(buffer.begin(), buffer.begin() + readCount);
        out.write((char*)buffer.data(), readCount * sizeof(int)); // write sorted chunk
    }
}
```

A full external sort would involve:

• Splitting data into chunks that fit in RAM
• Sorting each chunk in memory (using an internal sort)
• Writing sorted chunks to disk
• Merging all sorted chunks into a final sorted file

Summary Table

| Internal Sorting | External Sorting |
|---|---|
| Works entirely in RAM | Uses both RAM and disk |
| Fast, simple, minimal I/O | Slower, complex, frequent disk I/O |
| Suitable for small/medium datasets | Essential for large datasets |
| Algorithms: Quick Sort, Heap Sort, etc. | Algorithms: External Merge Sort, Polyphase Merge Sort, etc. |

In conclusion:
Internal sorting is ideal for small datasets that fit in memory, offering speed and simplicity. External sorting is necessary for very large datasets, trading off speed for the ability to handle massive data volumes using disk storage.

Applications of Sorting and Searching in Computer Science (with C++ Examples)

Sorting and searching are foundational operations in computer science, enabling efficient data management, retrieval, and analysis across a wide range of applications.

Applications of Sorting

• Data Organization: Sorting arranges data for easier access and management, such as alphabetizing names in a contact list or organizing files on a computer.

• Efficient Searching: Many search algorithms (like binary search) require sorted data to function efficiently, reducing search time from linear to logarithmic.

• Data Analysis: Sorting helps in identifying trends, patterns, and outliers, which is crucial in fields like statistics, finance, and scientific research.

• Database Management: Databases use sorting to optimize query performance, create indexes, and enable rapid data retrieval.

• User Experience: Sorting improves usability in applications such as e-commerce (product listings), social media feeds, music playlists, and email management.

• Canonicalization and Output: Sorted data is easier to read and compare, useful in reporting and data export.

• Other Applications: Sorting is also used in GPS navigation, weather forecasting, stock market analysis, medical diagnosis, and more.

Applications of Searching

• Information Retrieval: Search engines and information systems use searching algorithms to quickly locate relevant documents or data entries.

• Database Queries: Searching is fundamental for finding records in databases, especially when combined with indexing and sorting.

• Pattern Matching: Searching algorithms are used in text editors, DNA sequence analysis, and plagiarism detection.

• Security: Searching is applied in password verification and intrusion detection systems.

C++ Code Examples

Sorting Example (std::sort for integers)

```cpp
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> data = {5, 2, 9, 1, 5, 6};
    sort(data.begin(), data.end()); // Sorts in ascending order
    for (int num : data) {
        cout << num << " ";
    }
    return 0;
}
```

Binary Search Example (std::binary_search)

```cpp
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> data = {1, 2, 5, 5, 6, 9};
    int target = 5;
    // Data must be sorted for binary_search
    bool found = binary_search(data.begin(), data.end(), target);
    if (found) cout << "Element found!" << endl;
    else cout << "Element not found." << endl;
    return 0;
}
```

Summary Table

| Application Area | Sorting Role | Searching Role |
|---|---|---|
| Databases | Index creation, query optimization | Record lookup, key-based retrieval |
| Data Analysis | Trend/pattern/outlier identification | Finding specific values or records |
| User Interfaces | Organizing lists, feeds, and content | Quick item lookup (e.g., contacts, products) |
| Information Retrieval | Ranking and ordering results | Query matching and relevance determination |
| Scientific Research | Data preparation and analysis | Locating data points or experiments |

In summary:
Sorting and searching are essential in computer science for efficient data organization, retrieval, and analysis. They underpin many real-world applications, from databases and search engines to user-facing software, and are implemented using standard algorithms and libraries in C++.
Quick Sort: Properties and Complexity

• In-Place Sorting: It requires only a small, constant amount of additional storage space per partition (O(log n) overall due to the recursion stack).
o Space Complexity: In-place; needs only O(1) extra space for partitioning.
o Stability: Not stable.
o Use Case: Suitable for memory-constrained environments where stability is not necessary.

• Versatility: Widely used in commercial software, system libraries, and for large datasets where average-case performance is critical.

• Customization: Pivot selection strategies and partitioning schemes can be tailored for specific data characteristics.

• Limitations: Quick Sort is not stable (it does not preserve the order of equal elements) and its worst-case time complexity is O(n²), though this is rare with good pivot selection.

Quick Sort is often the fastest in practice due to low overhead and cache efficiency, but its worst-case complexity is O(n²), which can be problematic for certain input patterns. With good pivot selection (like randomized or median-of-three), the average case is O(n log n), making it a popular choice for general-purpose sorting in C++ (e.g., std::sort uses a variant of Quick Sort).

```cpp
#include <iostream>
using namespace std;

// Last-element pivot partition, as in the earlier vector version
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            ++i;
            int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
        }
    }
    int t = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = t;
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high); // Partitioning index
        quickSort(arr, low, pi - 1);        // Sort left subarray
        quickSort(arr, pi + 1, high);       // Sort right subarray
    }
}

// Utility function to print array
void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {9, 4, 8, 3, 7, 1, 6, 2, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    cout << "Sorted array: ";
    printArray(arr, n);
    return 0;
}
```

This code selects the last element as the pivot and sorts the array in-place.

Example Run

Input: 9, 4, 8, 3, 7, 1, 6, 2, 5
Output: 1 2 3 4 5 6 7 8 9

Complexity Analysis

| Case | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Best Case | O(n log n) | O(log n) | Balanced partitioning |
| Average Case | O(n log n) | O(log n) | Most practical scenarios |
| Worst Case | O(n²) | O(log n) | Unbalanced partitioning (rare with good pivots) |

Summary Table

| Aspect | Quick Sort |
|---|---|
| Worst | O(n²) |
| Space | O(log n) |
| Practical Use | Very fast, widely used |

Conclusion

Quick Sort is a powerful, efficient, and widely-used sorting algorithm in C++. Its divide-and-conquer approach, in-place sorting, and fast average-case performance make it a top choice for many real-world applications, despite its worst-case scenario. Using randomized or median-of-three pivot selection can help avoid the worst case and ensure robust performance.

Why Merge Sort is Often Considered Best (Theoretical Perspective)

• Consistent Performance: Merge Sort guarantees O(n log n) time in all cases (best, average, and worst), making it reliable for large datasets.

• Stability: It is stable, which is important for many real-world applications (e.g., sorting records by multiple keys).

• Divide and Conquer: Efficient for external sorting (sorting data that does not fit in memory).

For example:

```cpp
void mergeSort(vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right); // merge() combines two sorted halves
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 5, 6};
    mergeSort(arr, 0, arr.size() - 1);
    for (int num : arr) cout << num << " ";
    return 0;
}
```

Radix Sort Algorithm and Example

Radix Sort Algorithm

Radix Sort is a non-comparative sorting algorithm that sorts numbers by processing individual digits. It works from the least significant digit (LSD) to the most significant digit (MSD), using a stable sub-sorting algorithm (commonly Counting Sort) at each digit position. It is especially efficient for sorting integers and can outperform comparison-based algorithms when the number of digits is small relative to the number of elements.

Radix Sort is particularly useful for sorting large lists of numbers where comparison-based sorts (like Quick Sort or Merge Sort) are less efficient due to their O(n log n) complexity.

Algorithm Steps

1. Find the Maximum Number: Determine the maximum number in the array to know the number of digits to process.
2. Sort by Each Digit Place: For each digit position (units, tens, hundreds, ...), run a stable counting sort keyed on that digit: CountingSort(array, n, place).

Worked Example

Consider the array: {170, 45, 75, 90, 802, 24, 2, 66}

Step 1: Find the maximum number (802, which has 3 digits).
Step 2: Sort by each digit place using Counting Sort:

• Pass 1 (Unit place): Sorted array: {170, 90, 802, 2, 24, 45, 75, 66}
• Pass 2 (Tens place): Sorted array: {802, 2, 24, 45, 66, 170, 75, 90}
• Pass 3 (Hundreds place): Sorted array: {2, 24, 45, 66, 75, 90, 170, 802}

Space: O(n + k), due to the output array and counting array.

Summary

• Radix Sort is efficient for sorting integers, especially when the range of digits is not significantly larger than the number of elements.
• It processes digits from least significant to most significant, using a stable sort at each step.
• It is non-comparative, stable, and has linear time complexity relative to the number of digits and elements.

C++ Implementation

```cpp
#include <iostream>
using namespace std;

// Using counting sort to sort elements based on significant places
void countSort(int arr[], int n, int place) {
    const int max = 10;
    int output[n];
    int count[max] = {0};

    // Count occurrences of each digit at the current place
    for (int i = 0; i < n; i++)
        count[(arr[i] / place) % 10]++;

    // Prefix sums: count[i] now holds the last position of digit i
    for (int i = 1; i < max; i++)
        count[i] += count[i - 1];

    // Build the output array (backwards, to keep the sort stable)
    for (int i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / place) % 10] - 1] = arr[i];
        count[(arr[i] / place) % 10]--;
    }

    // Copy to original array
    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

// Function to get the largest element from an array
int getMax(int arr[], int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

// Main radix sort function
void radixSort(int arr[], int n) {
    int max = getMax(arr, n);
    // Apply counting sort to sort elements based on place value
    for (int place = 1; max / place > 0; place *= 10)
        countSort(arr, n, place);
}

// Utility function to print the array
void display(int arr[], int n) {
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << "Before sorting: ";
    display(arr, n);
    radixSort(arr, n);
    cout << "After sorting: ";
    display(arr, n);
    return 0;
}
```

Output:

```text
Before sorting: 170 45 75 90 802 24 2 66
After sorting: 2 24 45 66 75 90 170 802
```

Merge Sort Algorithm: Explanation, Example, and C++ Code

Algorithm Overview

Merge Sort is a classic divide-and-conquer sorting algorithm. It works by recursively dividing the array into two halves, sorting each half, and then merging the sorted halves back together. This process continues until the entire array is sorted.

Key Steps

1. Divide: Split the array into two halves at the midpoint.
2. Base Case: An array with a single element is already sorted.
3. Recursive Calls: Recursively apply merge sort to the left and right halves.
4. Merge: Merge the two sorted halves into a single sorted array.

Example Dry Run

Consider the array: [38, 27, 43, 3, 9, 82, 10]

• Step 1: Divide → [38, 27, 43, 3] and [9, 82, 10]
• Step 2: Further divide → [38, 27], [43, 3], [9, 82], [10], then into single elements [38], [27], [43], [3], [9], [82], [10]
• Step 3: Merge → [27, 38], [3, 43], [9, 82], [10]; then [3, 27, 38, 43] and [9, 10, 82]; finally [3, 9, 10, 27, 38, 43, 82]

C++ Implementation
```cpp
#include <iostream>
using namespace std;

// Merge two sorted subarrays arr[left..mid] and arr[mid+1..right]
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int L[n1], R[n2];
    for (int i = 0; i < n1; i++) L[i] = arr[left + i];
    for (int j = 0; j < n2; j++) R[j] = arr[mid + 1 + j];

    // Merge the temp arrays back into arr[left..right]
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy any remaining elements of L[]
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy any remaining elements of R[]
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

// Merge Sort function
void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        // Sort first and second halves
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        // Merge the sorted halves
        merge(arr, left, mid, right);
    }
}

// Utility function to print the array
void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << "Original array: ";
    printArray(arr, n);
    mergeSort(arr, 0, n - 1);
    cout << "Sorted array: ";
    printArray(arr, n);
    return 0;
}
```

Complexity

• Time Complexity:
o Best, Average, Worst: O(n log n) for all cases, as the array is always split into halves and merged.
• Space Complexity:
o O(n) due to the temporary arrays used for merging.

Advantages of Merge Sort

• Consistent O(n log n) performance regardless of input order.
• Stable sort (preserves the order of equal elements).
• Well-suited for sorting linked lists and large datasets, and for external sorting.

Summary

• Merge Sort divides the array into halves, sorts each half recursively, and merges them.
• It is efficient, stable, and guarantees O(n log n) time complexity.
• The merge step is the key operation, combining two sorted arrays into one sorted array.

In conclusion:
Merge Sort is a robust, efficient, and widely-used sorting algorithm in C++, ideal for large datasets and applications requiring stable sorting.

Bubble Sort, Insertion Sort, and Selection Sort: Explanation, Example, Complexity, and C++ Code

1. Bubble Sort

Explanation:
Bubble Sort compares adjacent elements in the array and swaps them if they are in the wrong order. This process is repeated for all elements until the array is sorted. After each pass, the largest unsorted element "bubbles up" to its correct position at the end of the array.

Example:
Given array: 5, 3, 8, 4, 2

• Pass 1: (5,3)→swap → 3,5,8,4,2; (5,8)→ok; (8,4)→swap → 3,5,4,8,2; (8,2)→swap → 3,5,4,2,8
• Pass 2: (3,5)→ok; (5,4)→swap → 3,4,5,2,8; (5,2)→swap → 3,4,2,5,8
• Pass 3: (3,4)→ok; (4,2)→swap → 3,2,4,5,8
• Pass 4: (3,2)→swap → 2,3,4,5,8

C++ Code:

```cpp
#include <iostream>
using namespace std;

void bubbleSort(int arr[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        bool swapped = false;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped) break; // already sorted: best case O(n)
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    cout << "Bubble Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}
```

Complexity:

• Best: O(n) (if already sorted)
• Average/Worst: O(n²)
• Space: O(1) (in-place)

2. Insertion Sort

Explanation:
Insertion Sort builds the sorted array one element at a time. It picks the next element and inserts it into its correct position among the previously sorted elements.

Example:
Given array: 5, 3, 8, 4, 2

• Step 1: 3 is inserted before 5 → 3,5,8,4,2
• Step 2: 8 is in correct place → 3,5,8,4,2
• Step 3: 4 is inserted between 3 and 5 → 3,4,5,8,2
• Step 4: 2 is inserted at the start → 2,3,4,5,8

C++ Code:

```cpp
#include <iostream>
using namespace std;

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    cout << "Insertion Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}
```

Complexity:

• Best: O(n) (if already sorted)
• Average/Worst: O(n²)
• Space: O(1) (in-place)

3. Selection Sort

Explanation:
Selection Sort repeatedly finds the minimum element from the unsorted part and swaps it with the first unsorted element. This process continues, moving the boundary of the sorted and unsorted parts.

Example:
Given array: 5, 3, 8, 4, 2

• Step 1: Find min (2), swap with 5 → 2,3,8,4,5
• Step 2: Min is 3 (already in place) → 2,3,8,4,5
• Step 3: Min is 4, swap with 8 → 2,3,4,8,5
• Step 4: Min is 5, swap with 8 → 2,3,4,5,8

C++ Code:

```cpp
#include <iostream>
using namespace std;

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIdx])
                minIdx = j;
        }
        int temp = arr[i];
        arr[i] = arr[minIdx];
        arr[minIdx] = temp;
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    cout << "Selection Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}
```

Complexity:

• Best/Average/Worst: O(n²)
• Space: O(1) (in-place)

Linear Search and Binary Search

Linear Search:

1. Start from the first element of the array.
2. Compare each element with the target value.
3. If a match is found, return its index.
4. If the end of the array is reached without a match, return -1.

C++ Code Example:

```cpp
int linearSearch(int arr[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == target)
            return i;
    }
    return -1;
}
```

Binary Search (on a sorted array):

1. Set low to the first index and high to the last index.
2. While low <= high, compute mid = low + (high - low) / 2.
3. Compare arr[mid] with the target:
o If arr[mid] == target, return mid.
o If arr[mid] < target, set low = mid + 1.
o If arr[mid] > target, set high = mid - 1.
4. If not found, return -1.

C++ Code Example:

```cpp
int binarySearch(int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == target)
            return mid;
        else if (arr[mid] < target)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}
```

Comparison Table

| Feature | Linear Search | Binary Search |
|---|---|---|
| Array Sorted? | Not required | Required |
| Time Complexity | O(n) | O(log n) |
| Data Structure | Any | Array (random access) |
| Simplicity | Very simple | More complex |
| Use Case | Small/unsorted data | Large/sorted data |

• Binary Search repeatedly divides the search interval in half, requiring sorted data and offering much better performance for large datasets.
• It is not suitable for linked lists or unsorted data.
Returning to Radix Sort: each pass is a stable counting sort on one digit place. The notes give a second, array-parameter variant of the implementation:

```cpp
// Using counting sort to sort elements based on place value
void countSort(int array[], int size, int place) {
    const int max = 10;
    int output[size];
    int count[max] = {0};

    // Calculate count of elements for each digit at this place
    for (int i = 0; i < size; i++)
        count[(array[i] / place) % 10]++;

    // Prefix sums give each digit's final position range
    for (int i = 1; i < max; i++)
        count[i] += count[i - 1];

    // Place elements into output (backwards, to keep the sort stable)
    for (int i = size - 1; i >= 0; i--) {
        output[count[(array[i] / place) % 10] - 1] = array[i];
        count[(array[i] / place) % 10]--;
    }

    // Copy the output array to the original array
    for (int i = 0; i < size; i++)
        array[i] = output[i];
}

// Main function to implement radix sort
void radixSort(int array[], int size) {
    int max = getMax(array, size);
    // Apply counting sort to sort elements based on place value
    for (int place = 1; max / place > 0; place *= 10)
        countSort(array, size, place);
}

// Utility function to print the array
void display(int array[], int size) {
    for (int i = 0; i < size; i++)
        cout << array[i] << " ";
    cout << endl;
}
```

How the Algorithm Works (Example)

Given array: {170, 45, 75, 90, 802, 24, 2, 66}

• Pass 1 (Units place): Sorted by unit digit: {170, 90, 802, 2, 24, 45, 75, 66}

Complexity

• Best, Average, Worst: O(d·(n + k)), where d is the number of digits, n is the number of elements, and k is the range of the digit (10 for decimal numbers).

Summary

• Radix Sort processes each digit of the numbers, using Counting Sort as a stable subroutine.
• It is efficient for sorting integers, especially when the number of digits is less than the number of elements.
• It avoids direct element comparisons, making it faster than comparison-based sorts in such cases.

Binary Search Tree (BST): Insertion and Deletion

A Binary Search Tree (BST) is a binary tree data structure in which each node has at most two children, and for every node:

• All values in the left subtree are less than the node's value.
• All values in the right subtree are greater than the node's value.
• Both left and right subtrees are themselves BSTs.

This arrangement enables efficient searching, insertion, and deletion operations, making BSTs fundamental in many applications such as dynamic sets, lookup tables, and priority queues.

Insertion in Binary Search Tree

Algorithm

1. Start at the root node.
2. Compare the value to insert with the current node's value.
3. If the value is less, move to the left child; if greater, move to the right child.
4. Repeat steps 2-3 until you reach a null pointer (empty spot).
5. Insert the new value as a leaf node at this position.

Example Insertion Steps

For example, inserting 50, 30, 70 into an empty BST places 50 at the root, then 30 as its left child (30 < 50), then 70 as its right child (70 > 50).

Deletion in Binary Search Tree

Algorithm

To delete a node with value key from a BST:

1. Start at the root.
2. Search for the node to delete:
o If key < node's value, go left.
o If key > node's value, go right.
o If key == node's value, the node is found.
3. Handle three cases:
o No children (leaf): Remove the node.
o One child: Replace the node with its child.
o Two children:
▪ Find the node's inorder successor (smallest value in the right subtree).
▪ Replace the node's value with the successor's value.
▪ Delete the successor node (which will have at most one child).

Detailed Steps and Explanations

```cpp
Node* findMin(Node* node) {
    while (node->left != NULL)
        node = node->left;
    return node;
}

Node* deleteNode(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data)
        root->left = deleteNode(root->left, key);
    else if (key > root->data)
        root->right = deleteNode(root->right, key);
    else {
        // Node with only one child or no child
        if (root->left == NULL) {
            Node* temp = root->right;
            delete root;
            return temp;
        }
        else if (root->right == NULL) {
            Node* temp = root->left;
            delete root;
            return temp;
        }
        // Node with two children: replace with inorder successor
        Node* temp = findMin(root->right);
        root->data = temp->data;
        root->right = deleteNode(root->right, temp->data);
    }
    return root;
}
```
Summary Table

| Operation | Steps | Time Complexity (avg/worst) |
|---|---|---|
| Insertion | Traverse, compare, insert | O(log n) / O(n) |
| Deletion | Search, handle cases, rebalance | O(log n) / O(n) |

Conclusion

• A Binary Search Tree efficiently supports dynamic set operations such as search, insert, and delete.
• Insertion places new nodes as leaves, maintaining the BST property.
• Deletion handles three cases: leaf, one child, two children, with special handling for the latter using the inorder successor.
• BSTs are widely used due to their efficient average-case performance and clear structure for ordered data.

Complete Example Program

```cpp
#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left, *right;
    Node(int val) : data(val), left(NULL), right(NULL) {}
};

void inorder(Node* root) {
    if (root != NULL) {
        inorder(root->left);
        cout << root->data << " ";
        inorder(root->right);
    }
}

Node* insert(Node* root, int key) {
    if (root == NULL) return new Node(key);
    if (key < root->data)
        root->left = insert(root->left, key);
    else if (key > root->data)
        root->right = insert(root->right, key);
    return root;
}

int main() {
    Node* root = NULL;
    root = insert(root, 50);
    root = insert(root, 30);
    root = insert(root, 20);
    root = insert(root, 40);
    root = insert(root, 70);
    root = insert(root, 60);
    root = insert(root, 80);
    cout << "Inorder traversal: ";
    inorder(root);
    cout << endl;
    return 0;
}
```

Threads and Threaded Binary Trees

A thread in computer science generally refers to a lightweight process or a sequence of executable instructions within a program that can run independently and concurrently with other threads. However, in the context of trees (specifically, binary trees), a thread has a different meaning: it refers to a special pointer used to make tree traversal more efficient, particularly for in-order traversal, by replacing some NULL pointers with pointers to in-order predecessor or successor nodes.

Advantages and Disadvantages of Threads

General Multithreading (Software Threads)

Advantages:

• Improved Performance and Concurrency: Threads allow multiple operations to run in parallel, making better use of CPU resources and improving program responsiveness.
• Resource Sharing: Threads within the same process share memory and resources, enabling efficient communication.
• Better Responsiveness: Useful for interactive applications, as one thread can handle user input while others perform background tasks.
• Simplified Modeling: A natural fit for tasks that can be performed concurrently, such as handling multiple clients in a server.

Threads in Trees: Limitations

• Limited Use Cases: Mainly beneficial for traversal; not as widely used as standard binary trees.
• Overhead: Additional logic is required to distinguish between child pointers and thread pointers.

How Threads are Implemented in Trees

A threaded binary tree modifies the standard binary tree structure:

• In a normal binary tree, many left or right child pointers are NULL (especially in leaves).
• In a threaded binary tree, these NULL pointers are replaced with threads pointing to the node's in-order predecessor or successor, facilitating traversal.

Types of Threaded Binary Trees

• Single Threaded: Only left or right NULL pointers are replaced with threads (usually right).
• Double Threaded: Both left and right NULL pointers are replaced with threads (to predecessor and successor, respectively).

Example: Threaded Binary Tree Insertion and Traversal

Node Structure in C++

```cpp
class Node {
public:
    int key;
    Node *left, *right;
    bool leftThread, rightThread;
    Node(int val) : key(val), left(nullptr), right(nullptr),
                    leftThread(true), rightThread(true) {}
};
```

Insertion (Right Threaded Example)

```cpp
Node* insert(Node* root, int key) {
    // (implementation truncated in the source notes)
}
```
Searching in a BST

```cpp
Node* search(Node* root, int key) {
    if (root == nullptr || root->key == key)
        return root;
    if (key < root->key)
        return search(root->left, key);
    return search(root->right, key);
}
```

AVL Tree: Deletion and Rebalancing

```cpp
int height(Node* n) {
    return n ? n->height : 0;
}

int getBalance(Node* n) {
    return n ? height(n->left) - height(n->right) : 0;
}

Node* rightRotate(Node* y) {
    Node* x = y->left;
    Node* T2 = x->right;
    x->right = y;
    y->left = T2;
    y->height = std::max(height(y->left), height(y->right)) + 1;
    x->height = std::max(height(x->left), height(x->right)) + 1;
    return x;
}

Node* leftRotate(Node* x) {
    Node* y = x->right;
    Node* T2 = y->left;
    y->left = x;
    x->right = T2;
    x->height = std::max(height(x->left), height(x->right)) + 1;
    y->height = std::max(height(y->left), height(y->right)) + 1;
    return y;
}

Node* minValueNode(Node* node) {
    Node* current = node;
    while (current->left != nullptr)
        current = current->left;
    return current;
}

Node* deleteNode(Node* root, int key) {
    if (root == nullptr) return root;

    // Standard BST delete
    if (key < root->key)
        root->left = deleteNode(root->left, key);
    else if (key > root->key)
        root->right = deleteNode(root->right, key);
    else {
        if (root->left == nullptr || root->right == nullptr) {
            Node* temp = root->left ? root->left : root->right;
            delete root;
            return temp;
        }
        Node* temp = minValueNode(root->right);
        root->key = temp->key;
        root->right = deleteNode(root->right, temp->key);
    }

    // Update height and rebalance
    root->height = 1 + std::max(height(root->left), height(root->right));
    int balance = getBalance(root);

    // Left Left
    if (balance > 1 && getBalance(root->left) >= 0)
        return rightRotate(root);
    // Left Right
    if (balance > 1 && getBalance(root->left) < 0) {
        root->left = leftRotate(root->left);
        return rightRotate(root);
    }
    // Right Right
    if (balance < -1 && getBalance(root->right) <= 0)
        return leftRotate(root);
    // Right Left
    if (balance < -1 && getBalance(root->right) > 0) {
        root->right = rightRotate(root->right);
        return leftRotate(root);
    }
    return root;
}
```

AVL insertion uses the same rotations with key-based checks, e.g. `if (balance > 1 && key < node->left->key) return rightRotate(node);` for the Left Left case.

Summary

• An AVL tree is a self-balancing BST where the height difference (balance factor) between left and right subtrees is at most 1 for every node.
• Deletion involves standard BST deletion followed by updating heights and rebalancing using rotations to maintain the AVL property.
• Both operations are efficient and guarantee logarithmic time due to the strict balancing of AVL trees.

6. Short Notes on B-Tree

A B-Tree is a self-balancing search tree in which each node can contain multiple keys and can have more than two children. It is widely used in databases and file systems to efficiently manage large blocks of data that cannot fit entirely in memory.

Key Properties of B-Tree:

• Order: If the order of the B-tree is n, each node can have at most n children and n−1 keys.
• Balanced: All leaves are at the same depth (height).
• Node Capacity: Each node (except the root) must have at least ⌈n/2⌉ children.
• Root: The root must have at least 2 children if it is not a leaf.
• Efficient Operations: Search, insertion, and deletion are all performed in O(log n) time.

Applications:

• Used in database indexing and file systems due to efficient disk access.
• Suitable for systems that read and write large blocks of data.

A typical node declaration (the field names beyond `leaf` and the listed methods follow the common textbook layout):

```cpp
class BTreeNode {
    int* keys;            // keys stored in this node
    int t;                // minimum degree
    BTreeNode** children; // child pointers
    int n;                // current number of keys
    bool leaf;

public:
    BTreeNode(int t1, bool leaf1);
    void traverse();
    BTreeNode* search(int k);
};
// Insert, search, and traversal methods would be implemented here
```

Conclusion:
B-Trees provide an efficient, balanced structure for organizing and accessing large datasets, especially when disk I/O is a concern. Their ability to maintain balance and allow multiple keys per node makes them ideal for database and filesystem implementations.

7. General Tree and Conversion to Binary Tree

General Tree

A General Tree is a hierarchical data structure where each node can have any number of children, making it highly flexible for representing complex relationships (e.g., file systems, organizational charts).

Characteristics:

• No restriction on the number of children per node.
• Nodes can have zero or more children.
• Used for representing hierarchical data with variable branching.
class GenTreeNode {
public:
    int data;
    vector<GenTreeNode*> children;
    GenTreeNode(int val) : data(val) {}
};

Conversion of General Tree to Binary Tree

Purpose:
To represent a general tree using a binary tree structure, which simplifies storage and traversal using standard binary tree algorithms.

Steps for Conversion [7]:
1. Left-Child:
   For each node, keep its first (leftmost) child as the left child in the binary tree.
2. Right-Sibling:
   For each node, link its immediate right sibling as the right child in the binary tree.
3. Remove Other Children:
   All other children (other than the first) are linked as a chain through the right child pointers.

Result:
Each node in the binary tree has at most two children:
• The left child points to its first child in the general tree.
• The right child points to its next sibling.

C++ Example:
cpp
// General Tree Node
class GenTreeNode {
public:
    int data;
    vector<GenTreeNode*> children;
};

// Conversion function
BinTreeNode* convertToBinary(GenTreeNode* root) {
    if (!root) return nullptr;
    BinTreeNode* bRoot = new BinTreeNode(root->data);
    if (!root->children.empty())
        bRoot->left = convertToBinary(root->children[0]);
    BinTreeNode* curr = bRoot->left;
    for (size_t i = 1; i < root->children.size(); ++i) {
        curr->right = convertToBinary(root->children[i]);
        curr = curr->right;
    }
    return bRoot;
}

Example Illustration:
Suppose a general tree node A has children B, C, D:
text
A
├── B
├── C
└── D

After conversion:
text
A
/
B
\
C
\
D

• B is the left child of A.
• C is the right child of B.
• D is the right child of C.
• General Trees allow any number of children per node; they can be systematically converted to binary trees using the left-child, right-sibling method for easier processing and storage [7].

Huffman Algorithm: Explanation and C++ Example

What is Huffman Coding?
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters, with shorter codes for more frequent characters and longer codes for less frequent ones. The main steps are:
1. Build a frequency table for all characters.
2. Construct a Huffman Tree using a priority queue (min-heap) based on frequencies.
3. Generate Huffman Codes by traversing the tree.
4. Encode the input using these codes.
This algorithm is widely used in compression formats like ZIP and JPEG.

Huffman Coding Algorithm Steps
1. Count the frequency of each character in the input.
2. Create a leaf node for each character and build a min-heap of all leaf nodes.
3. While there is more than one node in the heap:
   o Remove the two nodes with the lowest frequency.
   o Create a new internal node with these two nodes as children and frequency equal to the sum of their frequencies.
   o Insert the new node back into the min-heap.
4. The remaining node is the root of the Huffman Tree.
5. Traverse the tree to assign codes: left edge as '0', right edge as '1'.

Example
Suppose the input is:
A:5, B:9, C:12, D:13, E:16, F:45
• The most frequent character (F) gets the shortest code.
• The least frequent (A) gets the longest code.

C++ Implementation
cpp
#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
using namespace std;

// Comparator for the priority queue (min-heap)
struct Compare {
    bool operator()(Node* l, Node* r) {
        return l->frequency > r->frequency;
    }
};

// Build the Huffman Tree
Node* buildHuffmanTree(const unordered_map<char, int>& freq) {
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (const auto& pair : freq) {
        pq.push(new Node(pair.first, pair.second));
    }
    while (pq.size() > 1) {
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();
        Node *newNode = new Node('\0', left->frequency + right->frequency);
        newNode->left = left;
        newNode->right = right;
        pq.push(newNode);
    }
    return pq.top();
}

// Generate Huffman Codes
void generateCodes(Node* root, const string& str, unordered_map<char, string>& huffmanCode) {
    if (!root) return;
    if (!root->left && !root->right) {
        huffmanCode[root->character] = str;
    }
    generateCodes(root->left, str + "0", huffmanCode);
    generateCodes(root->right, str + "1", huffmanCode);
}

int main() {
    // ... (build the frequency table, build the tree, print the codes)
    return 0;
}

Sample Output:
text
Huffman Codes:
F: 0
C: 100
D: 101
A: 1100

Explanation of Output
• A (least frequent) has the longest code: 1100
• Each code is a unique prefix, so the encoding is unambiguous.

Key Points
• Time Complexity: O(n log n), where n is the number of unique characters [4].
• Space Complexity: O(n) for the tree and code table.
• Advantage: Produces optimal prefix codes for lossless compression.

In summary:
Huffman coding efficiently compresses data by assigning shorter codes to frequent characters. The algorithm builds a binary tree using a min-heap and generates codes by traversing the tree. The provided C++ code demonstrates the full process from frequency table to code generation [1][3][4][6].

Citations:
1. https://www.programiz.com/dsa/huffman-coding
2. https://gist.github.com/pwxcoo/72d7d3c5c3698371c21e486722f9b34b
3. https://www.w3schools.com/dsa/dsa_ref_huffman_coding.php
4. https://www.tutorialspoint.com/huffman-coding
5. https://github.com/cynricfu/huffman-coding
6. https://blog.heycoach.in/huffman-encoding-decoding-in-c/
7. https://www.studytonight.com/data-structures/huffman-coding
8. https://iamshnoo.github.io/huffman/index.html?amp=1

M-way Search Tree: Definition, Operations, and C++ Implementation

What is an M-way Search Tree?
An m-way search tree (or multi-way search tree) is a generalization of the binary search tree (BST) where each node can have up to m children and contains up to m-1 keys. The keys within each node are kept in sorted order, and the child pointers partition the key space so that:
• All keys in the first child are less than the first key,
• Keys in the ith child are between the (i-1)th and ith key,
• All keys in the last child are greater than the last key.
This structure reduces the height of the tree, making search, insertion, and deletion more efficient, especially for large datasets.

Structure of an M-way Search Tree Node (C++ Example)
cpp
const int M = 4; // Example: 4-way search tree

struct Node {
    int count;             // Number of keys in the node
    int keys[M];           // Array of keys (max M-1 keys)
    Node* children[M + 1]; // Array of child pointers (max M children)
    Node() : count(0) {
        for (int i = 0; i <= M; ++i) children[i] = nullptr;
    }
};

Searching in an M-way Search Tree
Algorithm:
1. At each node, compare the target value with the keys in the node.
2. If the value matches a key, return success.
3. Otherwise, determine the correct child pointer to follow (based on key intervals) and recurse.
4. If a null child is reached, the value is not in the tree.

C++ Code:
cpp
Node* search(Node* root, int key) {
    if (!root) return nullptr;
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i])
        return root; // Key found
    return search(root->children[i], key);
}

Insertion in an M-way Search Tree
Algorithm:
1. Search for the correct leaf node where the new key should be inserted.
2. If the node has fewer than m-1 keys, insert the key at the correct position.
3. If the node is full, split the node:
   o Promote the median key to the parent.
   o Split the node into two nodes, distributing keys and children.
   o If the parent is also full, split recursively up to the root, possibly creating a new root.

C++ Code (Simplified, without splitting for brevity):
cpp
void insert(Node* &root, int key) {
    if (!root) {
        root = new Node();
        root->keys[0] = key;
        root->count = 1;
        return;
    }
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i])
        return; // Key already present
    if (!root->children[i]) {        // Leaf node
        if (root->count < M - 1) {
            // Insert key in this node if space is available
            for (int j = root->count; j > i; --j)
                root->keys[j] = root->keys[j - 1];
            root->keys[i] = key;
            root->count++;
        } else {
            // Node splitting logic would go here (not shown for brevity)
        }
    } else {
        insert(root->children[i], key);
    }
}
Note: Full implementation would include node splitting when a node is full.

Deletion in an M-way Search Tree
Algorithm:
1. Search for the key to be deleted.
2. If the key is in a leaf node, remove it directly.
3. If the key is in an internal node:
   o Replace it with either its in-order predecessor (largest in left subtree) or successor (smallest in right subtree), and then delete that value from the child node.
4. If a node falls below the minimum number of keys, borrow a key from a sibling or merge nodes as needed to maintain tree properties.

C++ Code (Simplified):
cpp
void deleteKey(Node* &root, int key) {
    if (!root) return;
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i]) {
        // Key found
        if (!root->children[i]) {
            // Leaf node: remove key
            for (int j = i; j < root->count - 1; ++j)
                root->keys[j] = root->keys[j + 1];
            root->count--;
        } else {
            // Internal node: find successor and replace
            Node* succ = root->children[i + 1];
            while (succ->children[0])
                succ = succ->children[0];
            root->keys[i] = succ->keys[0];
            deleteKey(root->children[i + 1], succ->keys[0]);
        }
    } else {
        deleteKey(root->children[i], key);
    }
    // Rebalancing logic would go here (not shown for brevity)
}
Note: Full implementation would handle rebalancing after deletion.

Summary Table
Operation | Steps | Complexity
Searching | Compare keys in node, follow correct child, repeat | O(log_m n)
Insertion | Find leaf, insert key or split node, propagate split if needed | O(log_m n)
Deletion | Remove key, replace with successor/predecessor if needed, rebalance if node underflows | O(log_m n)

Conclusion
• An m-way search tree is a generalization of BSTs where each node can have up to m children and m-1 keys.
• Searching involves comparing keys and following the correct child pointer.
• Insertion adds keys to leaf nodes or splits nodes as needed.
• Deletion removes keys and may require rebalancing.
• These trees are foundational for efficient large-scale data storage, such as in database indices and file systems.

Huffman Coding: Description, Importance in Data Structures, and C++ Example

What is Huffman Coding?
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters, with shorter codes for more frequent characters and longer codes for less frequent ones. It is a greedy algorithm that builds an optimal prefix code (no code is a prefix of another), ensuring unambiguous decoding. Huffman coding is widely used in file compression (ZIP, GZIP), image and audio compression (JPEG, MP3), and network data transmission.

How Huffman Coding Works
1. Frequency Calculation:
   Count the frequency of each character in the input data.
2. Tree Construction:
   o Create a leaf node for each character, storing its frequency.
   o Insert all nodes into a priority queue (min-heap) based on frequency.
   o While more than one node remains:
     ▪ Remove the two nodes with the lowest frequencies.
     ▪ Create a new internal node with these two as children; its frequency is the sum of their frequencies.
     ▪ Insert the new node back into the queue.
   o The remaining node is the root of the Huffman tree.
3. Code Assignment:
   o Traverse the tree from root to leaves.
   o Assign '0' for a left edge and '1' for a right edge.
   o The code for each character is the sequence of 0s and 1s along the path from root to that character.
4. Encoding and Decoding:
   o Replace each character in the original data with its code to compress.
   o For decompression, traverse the Huffman tree according to the bit sequence until a leaf is reached, then output the corresponding character.

Importance in Data Structures
• Efficient Data Compression:
  Huffman coding minimizes the total number of bits needed to represent data, reducing storage and transmission costs.
• Optimal Prefix Codes:
  Ensures no code is a prefix of another, preventing ambiguity in decoding.
• Practical Applications:
  Used in file formats (ZIP, JPEG, MP3), network protocols, and storage systems for efficient, lossless compression.
• Algorithmic Concepts:
  Demonstrates the use of greedy algorithms, priority queues, and binary trees in real-world data structure problems.

C++ Example: Huffman Coding Implementation
cpp
// Generate Huffman codes (function header reconstructed from context)
void generateCodes(Node* root, string code, unordered_map<char, string>& huffmanCode) {
    if (!root) return;
    if (!root->left && !root->right) huffmanCode[root->ch] = code;
    generateCodes(root->left, code + "0", huffmanCode);
    generateCodes(root->right, code + "1", huffmanCode);
}

int main() {
    // Example frequencies
    unordered_map<char, int> freq = {{'A', 5}, {'B', 9}, {'C', 12}, {'D', 13}, {'E', 16}, {'F', 45}};
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (auto& pair : freq)
        pq.push(new Node(pair.first, pair.second));
    // Build Huffman Tree
    while (pq.size() > 1) {
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();
        Node *parent = new Node('\0', left->freq + right->freq);
        parent->left = left;
        // ... (set parent->right, push parent; then call generateCodes and print)
}

Summary Table
Feature | Description
Type | Lossless compression, greedy algorithm
Data Structure Used | Binary tree (Huffman tree), priority queue (min-heap)
Output | Variable-length, prefix-free codes
Efficiency | Reduces storage/transmission size; optimal for given frequencies
Applications | File compression (ZIP, JPEG), network data, storage, text/audio compression

Conclusion
Huffman coding is a foundational algorithm in data structures for efficient, lossless data compression. By using a binary tree and priority queue, it generates optimal, prefix-free codes, enabling significant savings in storage and bandwidth. Its practical importance is evident in many modern compression standards and systems.

Operations on Graphs: Concepts and C++ Code

Graphs are fundamental data structures in computer science, consisting of a set of vertices (nodes) and edges (connections). The most common operations on graphs include:
• Graph Representation

cpp
// Closing part of the adjacency-list Graph class
    void addEdge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u); // For undirected graph
    }
    const vector<int>& neighbors(int u) const { return adj[u]; }
    int size() const { return V; }
};

2. Insertion and Deletion

Insert Vertex:
Increase the vertex count and add a new adjacency list.

Insert Edge:
cpp
void addEdge(int u, int v) {
    adj[u].push_back(v);
    adj[v].push_back(u); // For undirected graph
}

Delete Edge:
cpp
void removeEdge(int u, int v) {
    adj[u].erase(remove(adj[u].begin(), adj[u].end(), v), adj[u].end());
    adj[v].erase(remove(adj[v].begin(), adj[v].end(), u), adj[v].end());
}
• Both algorithms are fundamental for graph analysis and network optimization.

Dijkstra's Algorithm for Shortest Path

Introduction
Dijkstra's algorithm is a classic greedy algorithm used to find the shortest path from a single source vertex to all other vertices in a weighted graph with non-negative edges. It is widely used in network routing, mapping, and many real-world shortest path problems.

Algorithm Steps
1. Initialization:
   o Set the distance to the source vertex as 0 and all other vertices as infinity.
   o Mark all vertices as unvisited.
   o Use a priority queue (min-heap) to efficiently select the next vertex with the smallest tentative distance.
2. Main Loop:
   o While there are unvisited vertices:
     ▪ Select the unvisited vertex with the smallest distance (let's call it u).
     ▪ For each neighbor v of u, calculate the distance from the source to v through u. If this distance is less than the current stored distance for v, update it.
     ▪ Mark u as visited.
3. Termination:
   o When all vertices are visited, the algorithm ends. The distance array now contains the shortest distances from the source to every vertex.

cpp
// Core of the algorithm (dist, adj, n, and the min-heap pq are assumed initialized)
dist[src] = 0;
pq.push({0, src});
while (!pq.empty()) {
    int u = pq.top().second;
    int d = pq.top().first;
    pq.pop();
    // If this distance is not up-to-date, skip
    if (d > dist[u]) continue;
    for (auto edge : adj[u]) {
        int v = edge.first;
        int weight = edge.second;
        if (dist[u] + weight < dist[v]) {
            dist[v] = dist[u] + weight;
            pq.push({dist[v], v});
        }
    }
}
cout << "Vertex\tDistance from Source\n";
for (int i = 0; i < n; ++i)
    cout << char('A' + i) << "\t" << dist[i] << endl;

Output:
text
Vertex | Distance from Source
A | 0
B | 3
C | 2
D | 8
E | 10
F | 12
(Distances may vary based on graph representation and edge direction.)

Explanation
• The algorithm starts at the source (A), visiting the nearest unvisited vertex at each step and updating the shortest known distances to its neighbors.
• It uses a priority queue to always process the vertex with the smallest tentative distance next.
• Once a vertex is marked visited, its shortest distance is finalized and never updated again.

Key Points
• Greedy Approach: Always selects the nearest unvisited vertex.
• Time Complexity: O((V + E) log V) with a min-heap priority queue.
• Limitation: Works only with non-negative edge weights.

Applications
• GPS navigation and mapping
• Network routing protocols

2. Kahn's Algorithm (BFS/Indegree Method)
• Compute the indegree (number of incoming edges) for each vertex.
• Enqueue all vertices with indegree 0.
• While the queue is not empty:
  o Remove a vertex from the queue and add it to the result.
  o For each neighbor, decrease its indegree by 1. If indegree becomes 0, enqueue it.
• If all vertices are processed, the ordering is valid. Otherwise, the graph has a cycle.

C++ Code Example: DFS Approach
cpp
#include <iostream>
#include <list>
#include <stack>
using namespace std;

class Graph {
    int V;
    list<int> *adj;
    void topologicalSortUtil(int v, bool visited[], stack<int> &Stack);
public:
    Graph(int V);
    void addEdge(int v, int w);
    void topologicalSort();
};
cpp
Graph::Graph(int V) {
    this->V = V;
    adj = new list<int>[V];
}

void Graph::addEdge(int v, int w) {
    adj[v].push_back(w);
}

void Graph::topologicalSortUtil(int v, bool visited[], stack<int> &Stack) {
    visited[v] = true;
    for (auto i = adj[v].begin(); i != adj[v].end(); ++i)
        if (!visited[*i])
            topologicalSortUtil(*i, visited, Stack);
    Stack.push(v);
}

void Graph::topologicalSort() {
    // ... (call topologicalSortUtil for every unvisited vertex, then pop the stack)

Output:
5 4 2 3 1 0 (One possible valid ordering)

C++ Code Example: Kahn's Algorithm (BFS/Indegree)
cpp
#include <iostream>
#include <vector>
#include <queue>
using namespace std;

void topologicalSort(int V, vector<vector<int>> &adj) {
    vector<int> indegree(V, 0);
    for (int i = 0; i < V; i++)
        for (int v : adj[i])
            indegree[v]++;
    queue<int> q;
    for (int i = 0; i < V; i++)
        if (indegree[i] == 0)
            q.push(i);
    // ... (dequeue vertices, print them, and decrement neighbor indegrees)
}

Output:
4 5 2 0 3 1 (One possible valid ordering)

Applications of Topological Sort
• Task scheduling (e.g., build systems, course prerequisites)
• Resolving symbol dependencies in compilers
• Determining the order of compilation
• Data serialization, circuit design

Complexity
• Both DFS and Kahn's Algorithm run in O(V + E) time, where V = number of vertices, E = number of edges.

Summary Table
Method | Approach | Data Structure | Output Order | Cycle Detection
DFS | Recursive | Stack | Reverse postorder | No (unless checked)
Kahn's (BFS) | Iterative | Queue | As processed | Yes (if not all nodes processed)

In summary:
Topological sorting provides a way to order tasks in a DAG so that all dependencies are respected. It is implemented efficiently in C++ using either DFS (with a stack) or Kahn's Algorithm (using indegrees and a queue).

cpp
// Set-based adjacency-list Graph used by the BFS example below
class Graph {
public:
    void addEdge(int u, int v) {
        adjList[u].insert(v);
        adjList[v].insert(u); // For undirected graph
    }
    const map<int, set<int>>& getAdjList() const { return adjList; }
private:
    map<int, set<int>> adjList;
};

// BFS function
vector<int> bfs(const Graph& graph, int start) {
    set<int> visited;
    queue<int> q;
    vector<int> result;
    q.push(start);
    while (!q.empty()) {
        int node = q.front();
        // ... (continued: pop, record, and enqueue unvisited neighbors)
Concept:
DFS explores as deep as possible along each branch before backtracking. It uses a stack (often implemented via recursion) to keep track of the path.

Applications:
• Detecting cycles
• Topological sorting
• Connected components

C++ Implementation:
cpp
#include <iostream>
#include <list>
#include <vector>
using namespace std;

class Graph {
    int V;
    list<int> *adj;

    void DFSUtil(int v, vector<bool>& visited) {
        visited[v] = true;
        cout << v << " ";
        for (int neighbor : adj[v]) {
            if (!visited[neighbor])
                DFSUtil(neighbor, visited);
        }
    }

public:
    Graph(int V) {
        this->V = V;
        adj = new list<int>[V];
    }
    void addEdge(int v, int w) {
        adj[v].push_back(w);
    }
    void DFS(int v) {
        vector<bool> visited(V, false);
        DFSUtil(v, visited);
    }
};

int main() {
    Graph g(6);
    // ... (add edges)
    cout << "DFS Traversal: ";
    g.DFS(0);
    cout << endl;
    return 0;
}

Output:
DFS Traversal: 0 1 3 4 2 5
This shows nodes visited by going as deep as possible before backtracking.

Summary Table
Traversal | Data Structure | Order Visited | Applications
BFS | Queue | Level by level | Shortest path, connectivity, search
DFS | Stack/Recursion | Deep before backtrack | Cycle detection, topological sort

Conclusion
• BFS and DFS are the two fundamental graph traversal algorithms.
• BFS uses a queue to visit nodes level by level, ideal for shortest path and connectivity.
• DFS uses a stack (or recursion) to explore as deep as possible, useful for cycle detection and topological sorting.
• Both can be implemented efficiently in C++ using standard data structures.

Difference Between DFS and BFS (with C++ Code)

Below is a comprehensive point-wise comparison between Depth-First Search (DFS) and Breadth-First Search (BFS), including their principles, implementation, applications, and C++ code examples.

1. Definition and Traversal Order
• BFS (Breadth-First Search):
  o Explores all nodes at the present depth level before moving on to nodes at the next depth level (layer by layer).
• DFS (Depth-First Search):
  o Explores as far as possible along each branch before backtracking (goes deep before wide).

2. Data Structure Used
• BFS: Uses a Queue (FIFO principle) to keep track of the next vertex to visit.
• DFS: Uses a Stack (LIFO principle) or recursion to keep track of the path.

3. Implementation Principle
• BFS: First-In-First-Out (FIFO).
• DFS: Last-In-First-Out (LIFO).

4. Time and Space Complexity
• Both run in O(V + E) time; they differ in space:
  o BFS: Higher, as it stores all nodes of a level at once.
  o DFS: Lower, as it only stores nodes along the current path in the stack or recursion call stack.

5. Path Finding and Optimality
• BFS: Guarantees the shortest path in unweighted graphs.
• DFS: Does not guarantee the shortest path.

6. Applications
• BFS:
  o Finding shortest path in unweighted graphs.
  o Network broadcasting, social network friend suggestions, bipartite graph checking.
• DFS:
  o Cycle detection, topological sorting, solving puzzles/mazes, connected components.

7. Suitability
• BFS: Suitable for searching vertices closer to the source (level-wise search).
• DFS: Suitable for solutions that may be far from the source or require exploring all possibilities (deep search).

8. Backtracking
• BFS: No backtracking.
• DFS: Uses backtracking to explore alternative paths.

9. Loop Trapping
• BFS: Less prone to getting trapped in infinite loops (with proper visited checks).
• DFS: Can get trapped in cycles if visited nodes are not tracked.

10. Order of Visiting Nodes
• BFS: Visits siblings before children.
• DFS: Visits children before siblings.

11. Tree Traversal
• BFS: Used for level-order traversal in trees.
• DFS: Used for pre-order, in-order, and post-order traversals in trees.

12. Implementation Complexity
• BFS: Straightforward with a queue.
• DFS: Can use recursion or explicit stack.

13. Cycle Detection
• BFS: Not commonly used for cycle detection.
• DFS: Commonly used for cycle detection in graphs.

14. Example C++ Code

BFS Implementation (C++):
cpp
#include <iostream>
#include <vector>
#include <queue>
using namespace std;

void bfs(vector<vector<int>>& adj, int start) {
    int n = adj.size();
    vector<bool> visited(n, false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int node = q.front(); q.pop();
        cout << node << " ";
        for (int neighbor : adj[node]) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                q.push(neighbor);
            }
        }
    }
}

DFS Implementation (C++):
cpp
#include <iostream>
#include <vector>
using namespace std;

void dfsUtil(vector<vector<int>>& adj, int node, vector<bool>& visited) {
    visited[node] = true;
    cout << node << " ";
    for (int neighbor : adj[node]) {
        if (!visited[neighbor])
            dfsUtil(adj, neighbor, visited);
    }
}

void dfs(vector<vector<int>>& adj, int start) {
    int n = adj.size();
    vector<bool> visited(n, false);
    dfsUtil(adj, start, visited);
}
15. Summary Table
Parameter | BFS | DFS
Traversal Order | Level by level | Deep along a branch, then backtrack
Data Structure | Queue (FIFO) | Stack (LIFO) or Recursion
Space Complexity | Higher (stores all nodes at a level) | Lower (stores only current path)
Cycle Detection | Not typical | Common

16. Conclusion
• BFS is optimal for shortest path and level-wise traversal, uses more memory, and is implemented with a queue.
• DFS is suited for deep exploration, uses less memory, enables backtracking, and is implemented with a stack or recursion.
• Both are fundamental for graph and tree algorithms, each with distinct strengths and applications.

Components of Hashing
• Input Key: The data to be hashed (e.g., a number, string, file).
• Hash Function: The mathematical function that converts the input into a hash value.
• Hash Table: The data structure that stores the hash values and associated data.

Use cases:
• Fast data lookup in hash tables
• Password storage

Let's compute the hash values:
Key | k mod 7 | Hash Index
32 | 32 % 7 = 4 | 4
49 | 49 % 7 = 0 | 0
97 | 97 % 7 = 6 | 6
101 | 101 % 7 = 3 | 3
102 | 102 % 7 = 4 | 4
155 | 155 % 7 = 1 | 1
183 | 183 % 7 = 1 | 1

Resulting Hash Table:
Index | Keys Stored (Chain)
0 | 49
1 | 155 -> 183
2 | (empty)
3 | 101
4 | 32 -> 102
5 | (empty)
6 | 97

cpp
// Remainder of the chaining example: display(), destructor, and main()
        for (int val : table[i])
            cout << val << " -> ";
        cout << "NULL\n";
    }
}
~HashTable() { delete[] table; }
};

int main() {
    HashTable ht(7);
    int keys[] = {32, 49, 97, 101, 102, 155, 183};
    int n = sizeof(keys) / sizeof(keys[0]);
    for (int i = 0; i < n; ++i)
        ht.insert(keys[i]);
    cout << "Hash Table using Division Method and Chaining:\n";
    ht.display();
    return 0;
}

Explanation
• Good hash functions ensure uniform distribution, minimize collisions, and are efficient to compute.
• Using the division method with table size 7, the given values are distributed as shown, with collisions handled by chaining.
• The provided C++ code demonstrates insertion and display of the hash table using chaining for collision resolution.
• When a collision occurs (multiple keys hash to the same index), the new key is added to the end of the list at that index.
• Flexibility: Should work well for a wide range of possible inputs.
• Avoid Patterns: The function should not produce patterns that could cause clustering.
• Use of Established Algorithms: Prefer well-tested hash functions over custom ones for critical applications.

Rule for Good Hash Function | Explanation
Uniform Distribution | Spread keys evenly across table
Minimize Collisions | Reduce number of keys mapping to same index
Efficiency | Fast computation
Deterministic | Same input gives same output
Flexibility | Works for various key types
Scalability | Performs well as data grows
Avoid Patterns | Prevent clustering
Use Established Algorithms | Prefer proven hash functions

Division Method of Hashing
The division method is a simple and widely used hash function:
h(k) = k mod m
where k is the key and m is the table size.
• Table size m should preferably be a prime number, not a power of 2, to help distribute keys more uniformly.

Hashing the Given Values with Table Size 7
Given values: 32, 49, 97, 101, 102, 155, 183
Table size (m): 7
Hash function: h(k) = k mod 7

cpp
// Hash table with chaining (class definition; the program continues below)
class HashTable {
    int size;
    list<int> *table; // Array of linked lists
public:
    HashTable(int s) : size(s) {
        table = new list<int>[size];
    }
    void insert(int key) {
        int index = key % size;
        table[index].push_back(key);
    }
    void display() {
        for (int i = 0; i < size; ++i) {
            cout << i << ": ";
            // ...

2. Multiplication Method
• Formula: h(k) = ⌊m · (kA mod 1)⌋, where 0 < A < 1
• Example: m = 10, A = 0.618, key k = 112:
  h(112) = ⌊10 × (112 × 0.618 mod 1)⌋
• C++ Example:
cpp
int hashFunc(int key, int tableSize) {
    double A = 0.6180339887;
    return int(tableSize * fmod(key * A, 1)); // fmod requires <cmath>
}
3. Folding Method
• Description: Split the key into parts, add them together, then take modulo table size.
• Example: Key = 123456, split into 123 and 456, sum = 579, then 579 mod m.
• C++ Example:
cpp
int hashFunc(int key, int tableSize) {
    int part1 = key / 1000;
    int part2 = key % 1000;
    return (part1 + part2) % tableSize;
}

4. Mid-Square Method
• Description: Square the key, extract the middle digits, then take modulo table size.
• Example: Key = 123, 123^2 = 15129, middle digits = 151, then 151 mod m.
• C++ Example:
cpp
int hashFunc(int key, int tableSize) {
    int squared = key * key;
    int mid = (squared / 10) % 100; // extract middle two digits
    return mid % tableSize;
}

5. Cryptographic Hash Functions
• Description: Used in security (e.g., SHA-256, MD5); produce fixed-length, unique, and irreversible digests.
• Example: Hashing a password before storing it.
• C++ Example: (using libraries, e.g., OpenSSL)

Summary Table
Hashing Function | Example Use
Division (Modulo) | Fast, simple, general purpose
Multiplication | More uniform distribution
Folding | Large numeric keys
Mid-Square | Uniform for certain key types
Cryptographic (SHA, MD) | Security, data integrity

14. How can collision be resolved?

Collision:
A collision occurs when two different keys hash to the same index in the hash table.

Collision Resolution Techniques

1. Chaining
• Each table index points to a linked list of entries.
• All keys that hash to the same index are stored in the list.
• C++ Example:
cpp
#include <iostream>
#include <list>
using namespace std;

class HashTable {
    int size;
    list<int>* table;
public:
    HashTable(int s) : size(s) { table = new list<int>[size]; }
    void insert(int key) {
        int index = key % size;
        table[index].push_back(key);
    }
    void display() {
        for (int i = 0; i < size; ++i) {
            cout << i << ": ";
            for (int val : table[i]) cout << val << " -> ";
            cout << "NULL\n";
        }
    }
    ~HashTable() { delete[] table; }
};

int main() {
    HashTable ht(7);
    int keys[] = {32, 49, 97, 101, 102, 155, 183};
    for (int k : keys) ht.insert(k);
    ht.display();
    return 0;
}

Output:
text
0: 49 -> NULL
1: 155 -> 183 -> NULL
2: NULL
3: 101 -> NULL
4: 32 -> 102 -> NULL
5: NULL
6: 97 -> NULL

2. Open Addressing
• If a collision occurs, probe for the next available slot using a defined sequence.
• Linear Probing: Check next slot (index + 1, index + 2, ...).
• Quadratic Probing: Check index + 1^2, index + 2^2, etc.
• Double Hashing: Use a second hash function to determine the step size.
• C++ Example (Linear Probing):
cpp
const int TABLE_SIZE = 7;
int table[TABLE_SIZE] = {0};

void insert(int key) {
    int hash = key % TABLE_SIZE;
    while (table[hash] != 0) {
        hash = (hash + 1) % TABLE_SIZE;
    }
    table[hash] = key;
}

Summary Table
Collision Resolution | Description
Chaining | Linked lists at each index
Linear Probing | Next available slot
Quadratic Probing | Probing with quadratic step
Double Hashing | Second hash for step size

In summary:
• Hashing is a method to map data to a fixed-size value using a hash function for efficient storage and retrieval.
• Hash functions include division, multiplication, folding, mid-square, and cryptographic hashes.
• Collisions can be resolved by chaining (linked lists) or open addressing (probing).
• C++ code examples illustrate both hash function usage and collision resolution.

Q1. Define File. Describe Constituents of a File. (15 Marks)

Definition of a File:
A file is a collection of logically related data stored in secondary memory, such as a hard drive or SSD, under a specific name (filename). It is the basic unit of storage used in every computer system for data permanence and accessibility. Files allow data to persist beyond program execution.
Files are managed by the operating system and can be of various types: text files, binary files, executable files, etc. Depending on the file organization method, data may be accessed sequentially, directly, or through indexing.

Constituents of a File:
1. Header:
   o This is the metadata section that stores important information about the file, such as file type, size, format, record length, and creation/modification dates.
   o For example, a file may have a header stating that it stores 100 records of 50 bytes each.
2. Records:
   o A record is a collection of fields that represent a single unit of meaningful information.
   o For instance, a student record may contain fields like name, roll number, and marks.
3. Fields:
   o The smallest unit of data within a record.
   o For example, in a record {101, "Raj", 85}, the field "Raj" represents the name.
4. Data Section:
   o This holds the actual content of the file, meaning the records are stored in this part.
5. End-of-File (EOF) Marker:
   o A special marker that indicates the end of the file to prevent reading beyond it.
   o In text files, it's often represented as a special character like EOF or -1.

Q2. Describe Various Kinds of Operations Required to Maintain Files. (15 Marks)

Introduction:
File operations are fundamental for managing data stored in external storage. These operations include creating, opening, reading, writing, updating, and deleting files. File handling ensures data permanence, structured access, and security.

Types of File Operations:
1. Create:
   o This operation creates a new file in the system.
   o A unique filename and location are assigned, and space is allocated.
2. Open:
   o Before any operation (read/write), a file must be opened using a specific mode (read, write, append, binary, etc.).
3. Read:
   o Used to retrieve data from a file.
   o The data can be read sequentially or randomly depending on the file organization.
4. Write:
   o Inserts new data into the file or overwrites existing data depending on the mode.
5. Append:
   o Adds new data at the end of the file without modifying existing content.
6. Update (Modify):
   o Changes or updates part of the data in the file, usually done by locating the specific record and overwriting it.
7. Delete:
   o Removes data or the file itself from the storage system.
   o Logical deletion marks a record as deleted; physical deletion removes it permanently.
8. Close:
   o Finalizes the operation, ensuring the data is saved and resources are released.

C++ Example: Writing and Reading a File
cpp
#include <iostream>
#include <fstream>
C++ Example: Writing and Reading a File

#include <iostream>
#include <fstream>
using namespace std;

int main() {
    // Writing to a file
    ofstream fout("example.txt");
    fout << "This is file handling in C++.";
    fout.close();

    // Reading the file back
    ifstream fin("example.txt");
    char ch;
    while (fin.get(ch)) cout << ch;
    fin.close();
    return 0;
}

o Periodically, the main file and overflow areas are merged and sorted again to reduce lookup time and fragmentation.

Diagram:

+-------------------+      +---------------+
|    Index File     | -->  | Key: Address  |
+-------------------+      +---------------+

Variable-Length Record:
• Record sizes may vary due to varying field lengths (e.g., comments, addresses).
• Efficient in space but more complex to access and maintain.

Example:
Email messages, social media posts, or chat logs where the length of content varies significantly.

Difference Table:

Aspect        Fixed-Length Record    Variable-Length Record
Complexity    Simple                 Requires delimiters/pointers
Q9. What is Indexed Sequential File? Explain Techniques for Handling Overflow. (15 Marks)

Indexed Sequential File:

An indexed sequential file combines the advantages of sequential and direct access. Records are stored sequentially based on a key field, and an index is maintained to allow fast access to blocks or records.

This type of file organization is suitable for applications like employee databases, bank systems, and library management, where both sequential and random access are frequently needed.

Comparison (Multilist vs. Inverted File):

Aspect        Multilist File                                  Inverted File
Structure     Multiple linked lists for each key/field        Central index for each field/attribute
Use Case      Many-to-many relationships                      Searching based on multiple
              (e.g., Students and Courses)                    non-primary keys
Access Type   Traversal through linked records                Quick access using inverted indexes
Complexity    Medium (depends on list connections)            High (requires maintaining multiple indexes)

Q12. Describe Primary and Secondary Key With Example. (15 Marks)

Primary Key:

A primary key is an attribute that uniquely identifies each record in a file. Examples:
• Employee ID
• Aadhaar Number

Secondary Key:

A secondary key is any non-unique attribute used for searching, sorting, or filtering. It does not uniquely identify records but enhances flexibility in access.