Dsa 7
Unit – VI
Indexing and Multiway Trees
1. Files
A file is a named collection of related data stored on a secondary storage device like a hard disk
or SSD. It allows programs to store data permanently and retrieve it whenever required.
Unlike variables (which store data temporarily in RAM), files preserve data even after the
program terminates.
101 Akshay 85
102 Priya 92
103 Rahul 78
104 Sham 88
1.1 Query
A query is a request to retrieve or manipulate data from a database, file, or data structure
based on specific conditions.
Example (File Query): A file of employee records has ‘employee no’ as the primary key and
‘department code’ and ‘designation code’ as the secondary keys. Write a procedure to
answer the following query: ‘Which employees from the systems department are above
designation level 4?’
Problem Statement:
• Query: Find employees from "Systems" department who have designation level greater
than 4.
Procedure (Step-by-Step):
• Department = Systems
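The query above can be sketched in C++ as a simple scan over the employee records. This is only a minimal sketch: the Employee struct, its field names, and the function findSeniorInDept are illustrative assumptions, and the records are held in a vector instead of being read from a file.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical record layout: employee no is the primary key,
// department and designation level act as the secondary keys.
struct Employee {
    int empNo;
    string department;
    int designationLevel;
};

// Returns the employee numbers of everyone in `dept`
// whose designation level is greater than `minLevel`.
vector<int> findSeniorInDept(const vector<Employee>& file,
                             const string& dept, int minLevel) {
    vector<int> result;
    for (const Employee& e : file) {
        if (e.department == dept && e.designationLevel > minLevel)
            result.push_back(e.empNo);
    }
    return result;
}
```

Calling findSeniorInDept(file, "Systems", 4) answers the query stated above.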
3|Page © Haris Chaus | ALL RIGHTS ARE RESERVED as per copyright act.
In a fixed-length record file, each record has the same size (number of bytes), even if some
fields are empty. Space is pre-allocated for every field.
Example:
101 Jayesh 85
102 Rohan 92
1 0 1 J a y e s h 8 5
Advantages:
• Fast access: You can directly jump to the desired record using simple calculations.
Disadvantages:
• Wastage of space: If data is smaller than the allocated size, unused space is wasted.
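The "simple calculation" behind fast access can be sketched as follows. The field widths in FixedRecord are assumptions chosen for illustration; the point is only that every record has the same size, so the byte offset is the record position multiplied by that size.

```cpp
#include <cstddef>
using namespace std;

// Assumed fixed-length layout: every record occupies the same number of bytes.
struct FixedRecord {
    char rollNo[4];   // e.g. "101" plus padding
    char name[10];    // shorter names leave unused (wasted) bytes
    char marks[3];    // e.g. "85" plus padding
};

// Byte offset of the record at zero-based position `index`.
std::size_t recordOffset(std::size_t index) {
    return index * sizeof(FixedRecord);
}
```

With this layout each record takes 17 bytes, so the third record (index 2) starts at byte 34.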
In a variable-length record file, each record can have a different size, depending on the actual
data stored. No padding is added. The # character marks the end of each field, and $ marks
the end of the file.
Example:
101 Akshita 85
102 Rakesh 92
103 Sanjay 88
(Here, the "Name" field size varies with the name length.)
1 0 1 # A k s h i t a # 8 5 $
Advantages:
Disadvantages:
• Slower access: Cannot directly jump to a specific record; may require sequential
reading.
• Harder to update: Modifying a record may shift the positions of other records.
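Reading a variable-length record means scanning for the delimiters instead of computing an offset. The sketch below splits a string such as "101#Akshita#85$" into its fields; the function name splitFields is an assumption, and the data is taken from a string here, not from a file.

```cpp
#include <string>
#include <vector>
using namespace std;

// Splits delimiter-separated data into fields.
// '#' ends a field; '$' ends the data being read.
vector<string> splitFields(const string& data) {
    vector<string> fields;
    string current;
    for (char c : data) {
        if (c == '$') break;              // end marker: stop reading
        if (c == '#') {                   // field separator
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) fields.push_back(current); // last field before '$'
    return fields;
}
```

Because every character has to be inspected, reaching the n-th record requires reading all the records before it, which is the slower access noted above.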
File handling means performing operations like creating, opening, reading, writing, and
closing files using programs.
In C++, file handling is done using the fstream library, which provides three important
classes:
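Class      Purpose
ofstream   Creates files and writes data to them
ifstream   Reads data from files
fstream    Reads from and writes to files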
Example: Writing to a File
#include <fstream>
using namespace std;
int main() {
ofstream file("example.txt"); // Create and open a file
file << "Hello, world!"; // Write to the file
file.close(); // Close the file
return 0;
}
Example: Reading from a File
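#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main() {
    ifstream file("example.txt"); // Open the file
    string text;
    while (getline(file, text)) // Read the file line by line
        cout << text << endl; // Print each line
    file.close(); // Close the file
    return 0;
}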
1. Opening a File
To perform any operation (read/write) on a file, you must open it first. You can open a file in
two ways:
ofstream file;
• mode → (optional) specifies how to open the file (ios::in, ios::out, etc.).
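Mode          Meaning
ios::in       Open for reading
ios::out      Open for writing (creates the file if it does not exist)
ios::app      Append: every write goes to the end of the file
ios::ate      Open and move to the end of the file immediately
ios::trunc    Erase the existing contents when opening
ios::binary   Open in binary mode instead of text mode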
#include <iostream>
#include <fstream>
using namespace std;
int main() {
// ios::out - Write Mode
ofstream fout("example.txt", ios::out);
fout << "Hello, World!";
fout.close();
return 0;
}
2. Closing a File
After all operations are done, you should close the file to:
Syntax: stream_object.close();
Example: file.close();
Example Code:
#include <fstream>
using namespace std;
int main() {
ofstream file;
file.open("sample.txt"); // Open file for writing
file << "Hello DSA Notes!";
file.close(); // Close the file
return 0;
}
Checking if a File Opened Successfully
When opening a file, it’s important to verify whether the file was opened correctly.
If the file doesn't exist or there is a permission error, the stream enters a failed state and
every later read or write on it does nothing, so check before using it.
if (file.is_open()) {
    // File opened successfully: safe to read or write
} else {
    // File could not be opened: report the error
}
When reading a file, we often need to detect when the file ends. C++ provides the eof()
function for this.
Syntax: file.eof()
• Returns true once a read operation has tried to go past the end of the file.
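A common way to read until the end of a file is to loop on the read operation itself and consult eof() afterwards. The sketch below counts the lines of a file this way; the function name countLines and the return value -1 for errors are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Counts the lines in a file; returns -1 if the file cannot be opened
// or if reading stopped for a reason other than reaching the end.
int countLines(const string& path) {
    ifstream file(path);
    if (!file.is_open()) return -1;
    int lines = 0;
    string text;
    // getline fails once there is nothing left to read, which ends the loop.
    while (getline(file, text)) ++lines;
    // After the failed read, eof() tells us the end of file was the cause.
    return file.eof() ? lines : -1;
}
```

Looping on getline avoids processing the last line twice, which can happen with a loop written as while (!file.eof()).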
When reading or writing a file, a pointer keeps track of the current position.
You can move this pointer anywhere inside the file using special functions.
Important Functions:
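Function   Purpose
seekg()    Moves the get (read) pointer to a given position
seekp()    Moves the put (write) pointer to a given position
tellg()    Returns the current position of the get (read) pointer
tellp()    Returns the current position of the put (write) pointer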
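As a small illustration of moving the file pointer, the sketch below uses seekg() and tellg() to find the file size and then jumps straight to a requested byte. The function name charAt and the use of '\0' as an error value are assumptions.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Returns the character stored at byte position `pos`,
// or '\0' if the file cannot be opened or the position is out of range.
char charAt(const string& path, long pos) {
    ifstream file(path, ios::binary);
    if (!file.is_open()) return '\0';
    file.seekg(0, ios::end);                        // move the get pointer to the end
    long size = static_cast<long>(file.tellg());    // position at the end = size in bytes
    if (pos < 0 || pos >= size) return '\0';
    file.seekg(pos, ios::beg);                      // jump `pos` bytes from the start
    char c = '\0';
    file.get(c);
    return c;
}
```

The second argument of seekg() can be ios::beg, ios::cur, or ios::end, selecting the point from which the offset is measured.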
Syntax Examples:
Key Points:
Q. Write a C++ program to create a file. Insert records into the file by opening the file in
append mode. Search for a specific record in the file.
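#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
    ofstream outFile("employees.txt", ios::app); // append mode
    int empNo;
    string name;
    cout << "Enter employee number and name: ";
    cin >> empNo >> name; // name is read as a single word
    outFile << empNo << " " << name << endl;
    outFile.close();
    ifstream inFile("employees.txt");
    int searchNo, no;
    string empName;
    bool found = false;
    cout << "Enter employee number to search: ";
    cin >> searchNo;
    while (inFile >> no >> empName) { // read one record at a time
        if (no == searchNo) {
            cout << "Record found: " << no << " " << empName << endl;
            found = true;
            break;
        }
    }
    if (!found)
        cout << "Record not found." << endl;
    inFile.close();
    return 0;
}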
File Organization
File organization refers to the way records are arranged (stored) in a file on storage devices
like hard disks.
The method of organizing affects how fast data can be stored, retrieved, and updated.
In Sequential File Organization, records are stored one after another in a specific order
(usually by a key, like employee number).
When accessing data, the system reads records sequentially from the beginning until it finds
the required one.
Characteristics:
• To find a record, the system may have to search from the start.
• Best for applications where most or all records are processed (like generating reports).
Example:
1002 Sakshi HR
Advantages: Disadvantages:
1. Create: Create a new file and insert records one after another in sorted order (like by
employee number or roll number).
Example Table:
1002 Sakshi HR
Pseudocode:
void CreateFile() {
// Open a new file in output mode (overwrite if exists)
open file "Employee.dat" in out mode
close file
}
2. Read/Display: Read all records sequentially from the beginning.
Example: Reading each employee's data one after another to print a complete employee list.
Pseudocode:
void ReadAllRecords() {
// Open file in input mode to read records
open file "Employee.dat" in in mode
while not end of file:
read employee
display employee details
close file
}
3. Write: Insert a new record and maintain the order (may require copying).
Pseudocode:
inserted = false
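The insert-with-copy idea can be sketched in C++ as follows. This is a minimal sketch under stated assumptions: records are text lines of the form "empNo name" with single-word names, the file is already sorted by empNo, and the temporary file name (path + ".tmp") is made up for illustration.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
using namespace std;

// Inserts (empNo, name) into a file sorted by empNo by copying every record
// to a temporary file and writing the new record at the right place.
void insertSorted(const string& path, int empNo, const string& name) {
    const string tmpPath = path + ".tmp";    // assumed temporary file name
    ifstream in(path);
    ofstream out(tmpPath);
    bool inserted = false;
    int no;
    string nm;
    while (in >> no >> nm) {
        if (!inserted && empNo < no) {        // first record with a larger key
            out << empNo << " " << name << "\n";
            inserted = true;
        }
        out << no << " " << nm << "\n";      // copy the existing record
    }
    if (!inserted)                            // new key is the largest: append
        out << empNo << " " << name << "\n";
    in.close();
    out.close();
    remove(path.c_str());                     // replace the old file with the copy
    rename(tmpPath.c_str(), path.c_str());
}
```

Every insertion rewrites the whole file, which is why sequential files suit batch processing better than frequent updates.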
Example:
Pseudocode:
Example:
Pseudocode:
Example:
Pseudocode:
Key Points:
• You can jump directly to any record using its position (like accessing array elements).
Advantages:
• Very fast access to any record.
Disadvantages:
• Works best when records are fixed-size.
104 Aarav HR
1. Create: Make a new file on disk to store employee records. File is created empty, ready to
store records.
Pseudocode:
void CreateFile() {
open file "Employee.dat" in out + binary mode
close file
}
2. Write (Insert): Add new records into the file. Each record is inserted at a position directly
calculated from the employee number.
Example: Insert record for Employee No. 106 ("Shruti") at position related to 106.
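Direct insertion and retrieval can be sketched in C++ with seekp() and seekg(). The sketch rests on several assumptions that are not in the notes: the EmpRecord layout, a base key of 101 (BASE_EMP_NO), one slot per employee number, and a file that has already been created empty.

```cpp
#include <cstring>
#include <fstream>
#include <string>
using namespace std;

// Assumed fixed-size record so that every slot has the same width.
struct EmpRecord {
    int empNo;
    char name[20];
    char dept[12];
};

const int BASE_EMP_NO = 101;  // assumption: keys start at 101, one slot per key

// Builds a zero-filled record so unused bytes have a defined value.
EmpRecord makeRecord(int empNo, const string& name, const string& dept) {
    EmpRecord rec = {};
    rec.empNo = empNo;
    strncpy(rec.name, name.c_str(), sizeof(rec.name) - 1);
    strncpy(rec.dept, dept.c_str(), sizeof(rec.dept) - 1);
    return rec;
}

// The slot position is computed directly from the key.
streamoff slotOffset(int empNo) {
    return static_cast<streamoff>(empNo - BASE_EMP_NO) *
           static_cast<streamoff>(sizeof(EmpRecord));
}

// Writes the record into its slot; the file must already exist.
void writeRecord(const string& path, const EmpRecord& rec) {
    fstream file(path, ios::in | ios::out | ios::binary); // keeps old contents
    file.seekp(slotOffset(rec.empNo), ios::beg);
    file.write(reinterpret_cast<const char*>(&rec), sizeof(EmpRecord));
}

// Reads the slot for empNo; returns false if the slot is empty or missing.
bool readRecord(const string& path, int empNo, EmpRecord& rec) {
    ifstream file(path, ios::binary);
    file.seekg(slotOffset(empNo), ios::beg);
    file.read(reinterpret_cast<char*>(&rec), sizeof(EmpRecord));
    return file.gcount() == static_cast<streamsize>(sizeof(EmpRecord)) &&
           rec.empNo == empNo;
}
```

No other record is read or moved during either operation; only the slot computed from the key is touched.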
Pseudocode:
3. Read (Retrieve): Fetch a record from the file using its key. Directly jump to the address
based on the primary key.
Example: Search for Employee No. 105 → Direct jump to fetch Simran’s record.
Pseudocode:
Example: Update Department of Anushka (Employee No. 102) from Marketing to Admin.
Pseudocode:
5. Delete: Remove a record from the file. Mark the record as deleted (or clear the data) at that
position.
Example: Delete Employee No. 104 (Aarav). Mark position 104 as deleted.
Pseudocode:
6. Search: Find a specific record using the primary key. Use the key (Employee No.) to directly
locate the record.
Example: Search for Employee No. 102 → directly access Anushka’s record.
Pseudocode:
Indexed sequential file organization combines the advantages of sequential organization
(easy range access) and indexing (fast search).
Key Points:
• Records are kept in sorted order based on a key field (e.g., roll number, employee number).
• An index is built separately, storing key values and the addresses of corresponding
records.
• To find a record, first the index is searched (quickly), and then the record is accessed
directly.
• If the index does not point to the exact record, a short sequential search is done within
the related block.
Advantages: Disadvantages:
In Indexed Sequential File Organization, different types of indices are used depending on
how fast we want to search and how big the data is. There are mainly three types of indices:
1. Primary Index
1007 Pranav HR
• Built on the primary key of the file (a unique field, like Employee No or Roll No).
• There is one index entry for each block (not every record).
• The index helps locate the block where the record exists.
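A block-level primary index can be sketched as below. The block size of 3 records and the function names are assumptions, and a sorted vector of keys stands in for the data file.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

const int BLOCK_SIZE = 3;  // assumed number of records per block

// One index entry per block: the first (smallest) key stored in that block.
vector<int> buildIndex(const vector<int>& sortedKeys) {
    vector<int> index;
    for (size_t i = 0; i < sortedKeys.size(); i += BLOCK_SIZE)
        index.push_back(sortedKeys[i]);
    return index;
}

// Returns the position of `key` in sortedKeys, or -1 if it is absent.
int findRecord(const vector<int>& sortedKeys, const vector<int>& index, int key) {
    // Step 1: search the index for the last block whose first key <= key.
    vector<int>::const_iterator it = upper_bound(index.begin(), index.end(), key);
    if (it == index.begin()) return -1;            // key is below every block
    size_t block = static_cast<size_t>(it - index.begin()) - 1;
    // Step 2: short sequential search inside that block only.
    size_t start = block * BLOCK_SIZE;
    size_t end = min(start + BLOCK_SIZE, sortedKeys.size());
    for (size_t i = start; i < end; ++i)
        if (sortedKeys[i] == key) return static_cast<int>(i);
    return -1;
}
```

The index holds one entry per block instead of one per record, so it stays small even when the file is large.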
2. Secondary Index
1007 Pranav HR
• Built on a non-primary key field (which may not be unique), like Department or
Designation.
• Multiple records can have the same value for this field.
• The index maintains pointers to all records matching the same field value.
Best when searching frequently on fields other than the primary key.
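A secondary index on Department can be sketched with a map from each field value to the positions of all matching records. The Emp struct and the function name buildDeptIndex are illustrative assumptions.

```cpp
#include <map>
#include <string>
#include <vector>
using namespace std;

struct Emp {            // hypothetical record layout
    int empNo;
    string name;
    string dept;        // non-unique field used for the secondary index
};

// Maps each department to the positions of all records that carry it.
map<string, vector<int>> buildDeptIndex(const vector<Emp>& file) {
    map<string, vector<int>> index;
    for (int pos = 0; pos < static_cast<int>(file.size()); ++pos)
        index[file[pos].dept].push_back(pos);
    return index;
}
```

Looking up a department in the index returns every matching position at once, without scanning the data file.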
3. Clustering Index
1007 Pranav HR
• Built when records are physically grouped together based on a non-primary key.
• All actual records are stored sequentially in sorted order based on a key field (like
Roll No, Employee No, etc.).
• These records are physically arranged on disk in increasing order of the key.
• The index contains selected key values along with the addresses (or locations) of their
corresponding records in the main file.
• Searching is done on the index first to find the location, and then a direct access or
small sequential search is done to reach the exact record.
5. Linked Organization
In Linked Organization, records are not stored in a specific sequence on the storage device.
Instead, each record contains a pointer (address) that links it to the next related record.
• The first record points to the next record, and so on, forming a linked list.
Characteristics:
• Searching can be slow because you may have to follow many pointers.
Example:
Advantages:
• Easy to insert or delete records without shifting other records.
Disadvantages:
• Slower searching (especially for large files).
Maintaining several linked lists over the same records allows fast searching and easy access
to records based on different attributes.
• Each linked list represents one way of organizing the records (example: by
Department, by Designation, etc.).
• So, a single record can participate in multiple lists at the same time.
Characteristics:
• Each logical list (e.g., Department list, Designation list) is independently maintained.
• Very efficient for complex queries (like "find all employees of a department" or "find
all employees of a certain designation").
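The idea of one record taking part in several lists can be sketched by giving each record one "next" link per list. The struct LinkedEmp, its field names, and the use of -1 as the end-of-list marker are assumptions for illustration.

```cpp
#include <string>
#include <vector>
using namespace std;

// Each record carries one "next" link per logical list; -1 ends a list.
struct LinkedEmp {
    int empNo;
    string dept;
    string desig;
    int nextSameDept;    // position of the next record in the same department
    int nextSameDesig;   // position of the next record with the same designation
};

// Follows the department chain starting at record position `head`.
vector<int> followDeptList(const vector<LinkedEmp>& file, int head) {
    vector<int> empNos;
    for (int pos = head; pos != -1; pos = file[pos].nextSameDept)
        empNos.push_back(file[pos].empNo);
    return empNos;
}

// Follows the designation chain starting at record position `head`.
vector<int> followDesigList(const vector<LinkedEmp>& file, int head) {
    vector<int> empNos;
    for (int pos = head; pos != -1; pos = file[pos].nextSameDesig)
        empNos.push_back(file[pos].empNo);
    return empNos;
}
```

Inserting or deleting a record has to repair both chains, which is the extra complexity this organization pays for its fast queries.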
Example: Imagine we are storing employee records. We want to organize the employees in
two ways: i) By Department and ii) By Designation
Records:
2. Department-wise links:
3. Designation-wise links:
Advantages:
Disadvantages:
• More complex insertion and deletion logic (you must update multiple lists).
Coral Rings are a special type of linked file organization where records are linked in a
circular manner instead of a simple linear way.
In coral rings:
• Last record points back to the first record → making a ring (circular link).
• Hence, you can traverse all the records starting from any node, moving one-by-one,
and eventually come back to the starting point.
• Very useful for cyclic processes (like scheduling, real-time systems, etc.).
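Traversal of a ring from an arbitrary starting node can be sketched as below. RingNode and traverseRing are hypothetical names, and in-memory pointers stand in for record addresses on disk.

```cpp
#include <vector>
using namespace std;

struct RingNode {
    int key;
    RingNode* next;   // the last node's next points back to the first
};

// Visits every node exactly once, starting from any node in the ring.
vector<int> traverseRing(const RingNode* start) {
    vector<int> keys;
    if (start == nullptr) return keys;
    const RingNode* cur = start;
    do {
        keys.push_back(cur->key);
        cur = cur->next;
    } while (cur != start);   // stop on returning to the starting node
    return keys;
}
```

The loop ends by comparing against the starting node, since a ring has no null pointer to mark its end.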
Disadvantages:
Structure:
Forward: α → A → B → C → D → α
Backward: α → D → C → B → A → α
Inverted File is a type of file organization where an index is maintained for every field (or
attribute) that we want to search.
Example:
102 Raj HR
Sakshi 104
Shreya 105
Advantages:
• Very fast searching on multiple fields.
Disadvantages:
• More space needed to store extra indices.
Cellular Partition is a file organization technique where the entire data (or records) is
divided into smaller groups called cells (or partitions).
• Each cell handles a subset of records based on some criteria (like a range of values,
category, etc.).
• Searching, insertion, deletion, and updating operations happen within the specific cell
instead of the whole file — which makes operations faster.
Key Points
• Partitioning is based on some field (e.g., ID, Name initial, Department, Age range, etc.).
• Each cell can be organized internally in any way — sequential, direct access, etc.
• It reduces search space because instead of searching the full file, we only search within a
relevant partition.
Example: Suppose you have 1000 Employee Records. You can divide them into Cells based
on Employee ID:
Cell    ID Range
1       1–200
2       201–400
3       401–600
4       601–800
5       801–1000
• If you want to search for Employee ID = 378, you know it will be in Cell 2 (201–400).
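Choosing the cell for a given ID is a one-line calculation when the cells cover equal ranges. The width of 200 IDs per cell is an assumption inferred from the range 201–400 quoted for Cell 2.

```cpp
const int CELL_WIDTH = 200;   // assumption: each cell covers 200 consecutive IDs

// Maps an employee ID (starting at 1) to its cell number (starting at 1).
int cellFor(int employeeId) {
    return (employeeId - 1) / CELL_WIDTH + 1;
}
```

For Employee ID 378 this gives (378 - 1) / 200 + 1 = 2, so only Cell 2 has to be searched.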
Advantages:
• Smaller memory scanned each time.
• Easier to manage and organize.
• Can grow easily by adding more cells.
Disadvantages:
• Some cells may become overloaded or empty.
• Managing many cells can be slightly complex.
6. External Sort
External Sort is a method used to sort very large files that do not fit into main memory
(RAM). Instead, sorting is done using the disk (external storage), by dividing the file into
smaller manageable parts.
1. Divide: Split the large file into small chunks (called runs) that can fit into RAM.
2. Sort: Sort each chunk individually in memory using any internal sorting algorithm
(like quicksort, mergesort).
3. Store: Write each sorted chunk back to disk as a temporary run.
4. Merge: Merge all sorted chunks together into one single sorted file using a technique
called k-way merging.
Advantages
Disadvantage: Slower than internal sort because it depends on disk read/write speed.
Example: Suppose you have a file of 10000 records but RAM can hold only 500 records at a
time:
• Step 4: Merge all sorted chunks into one big sorted file.
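The divide and sort steps can be sketched as below. To keep the example short, vectors stand in for the disk files that a real external sort would write, and the function name makeSortedRuns is an assumption. The parameter memoryCapacity must be greater than zero.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Splits `records` into runs of at most `memoryCapacity` items and sorts each.
// In a real external sort every run would be written to disk after sorting.
vector<vector<int>> makeSortedRuns(const vector<int>& records, size_t memoryCapacity) {
    vector<vector<int>> runs;
    for (size_t i = 0; i < records.size(); i += memoryCapacity) {
        size_t end = min(i + memoryCapacity, records.size());
        vector<int> run(records.begin() + i, records.begin() + end);
        sort(run.begin(), run.end());    // internal sort of one chunk
        runs.push_back(run);
    }
    return runs;
}
```

With 10000 records and room for 500 at a time, this produces 10000 / 500 = 20 sorted runs that then have to be merged.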
Multiway merge sort is a technique for merging 'm' sorted lists into a single sorted list. The
two-way merge is a special case of multiway merge sort.
The two-way merge sort makes use of two input tapes and two output tapes for sorting the
records.
Stage 1: Break the records into blocks. Sort each block individually with the help of two input tapes.
Stage 2: Merge the sorted blocks and create a single sorted list with the help of two output tapes.
Example: Sort the following list of elements using two-way merge sort with M = 3.
20, 47, 15, 8, 9, 4, 40, 30, 12, 17, 11, 56, 28, 35.
As M = 3, we will break the records into groups of 3 and sort them. Then we will store them
on tape. We will store data on alternate tapes.
1) Read the first three records, sort them and store them on tape Tb1.
Tb1: 15 20 47
2) Read next three records, sort them and store them on Tape Tb2.
Tb2: 4 8 9
3) Read next three records, sort them and store on tape Tb1.
Tb1: 15 20 47 12 30 40
4) Read next three records, sort them and store on tape Tb2.
Tb2: 4 8 9 11 17 56
5) Read next two remaining records, sort them and store on Tape Tb1.
Tb1: 15 20 47 12 30 40 28 35
Tb1: 15 20 47 12 30 40 28 35
Tb2: 4 8 9 11 17 56
The input tapes Tb1 and Tb2 will use two more output tapes Ta1 and Ta2, for sorting. Finally,
the sorted data will be on tape Ta1.
We will read the elements from both the tapes Tb1 and Tb2, compare them, and store on Ta1
in sorted order.
Ta1: 4 8 9 15 20 47
Now we will read the second blocks from Tb1 and Tb2, merge their elements, and store them on Ta2.
Ta2: 11 12 17 30 40 56
Finally, read the third block from Tb1 and store it in sorted order on Ta1. There is nothing to
compare it with, as Tb2 has no third block. Hence, we will get
Ta1: 4 8 9 15 20 47 28 35
Ta2: 11 12 17 30 40 56
Now merge the first blocks of Ta1 and Ta2 and store the sorted elements on Tb1. The
remaining block of Ta1 (28 35) is copied to Tb2.
Tb1: 4 8 9 11 12 15 17 20 30 40 47 56
Tb2: 28 35
Now Tb1 and Tb2 each contain a single block. Merge the elements of the two blocks
and store the result on Ta1.
Ta1: 4 8 9 11 12 15 17 20 28 30 35 40 47 56
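The merging stage of the worked example can be sketched as repeated passes that merge neighbouring pairs of sorted runs. This is a simplified sketch: vectors replace the four tapes, and the function name mergeRunsPairwise is an assumption. Because the runs were written to the two tapes alternately, merging neighbours pairs the same blocks as the tape version.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

// Merges neighbouring pairs of sorted runs, pass after pass,
// until a single sorted run remains.
vector<int> mergeRunsPairwise(vector<vector<int>> runs) {
    while (runs.size() > 1) {
        vector<vector<int>> next;
        for (size_t i = 0; i < runs.size(); i += 2) {
            if (i + 1 < runs.size()) {
                vector<int> merged;
                merge(runs[i].begin(), runs[i].end(),
                      runs[i + 1].begin(), runs[i + 1].end(),
                      back_inserter(merged));      // two-way merge of one pair
                next.push_back(merged);
            } else {
                next.push_back(runs[i]);           // unpaired run is carried forward
            }
        }
        runs = next;                               // output of this pass feeds the next
    }
    return runs.empty() ? vector<int>() : runs[0];
}
```

Starting from the five sorted blocks of the example, the passes leave 3 runs, then 2, then 1, which mirrors the tape contents shown above.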
Algorithm/Pseudocode:
return K
In this method instead of two tapes, we use k tapes. The basic two-way merge algorithm is
used. The representation of multiway merge technique is as shown below:
Example: Sort the following list of elements using multiway (three-way) merge sort with M = 3.
20, 47, 15, 8, 9, 4, 40, 30, 12, 17, 11, 56, 28, 35.
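We will read three records into memory, sort them and store them on tape Tb1, then read the
next three records, sort them and store them on tape Tb2, and similarly store the next three
sorted records on Tb3.
Records read    Sorted        Tape
20, 47, 15      15, 20, 47    Tb1: 15 20 47
8, 9, 4         4, 8, 9       Tb2: 4 8 9
40, 30, 12      12, 30, 40    Tb3: 12 30 40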
1) Now read next 3 records (i.e. 17, 11, 56), sort them and store on Tb1.
Tb1: 15 20 47 11 17 56
Tb2: 4 8 9
Tb3: 12 30 40
2) Read the remaining records (i.e. 28, 35), sort them and store them on tape Tb2.
Tb1: 15 20 47 11 17 56
Tb2: 4 8 9 28 35
Tb3: 12 30 40
Stage 2: Merging
In this stage, we will build a heap from the first elements of the first blocks of Tb1, Tb2 and
Tb3 (i.e. 15, 4, 12). Then we perform the deleteMin operation repeatedly and store the elements on Ta1.
Ta1: 4 8 9 12 15 20 30 40 47
Step 2: Similarly, by constructing a heap for the second blocks and performing deleteMin,
we get
Ta2: 11 17 28 35 56
Now we have two tapes Ta1 and Ta2. We will now build heap for 4 and 11 (i.e. first elements
of Ta1 and Ta2).
ii) As 4 is from Ta1, we will delete 4 from heap and insert next
element i.e. 8 in heap.
Tb1: 4 8 9 11 12 15 17 20 28 30 35 40 47 56
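The heap-based merging used in this example can be sketched with std::priority_queue acting as the min-heap. The function name kWayMerge is an assumption, and vectors stand in for the tapes. Each deleteMin outputs the smallest element, and the next element from the same run is then inserted into the heap.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// Merges k sorted runs using a min-heap of (value, run index) pairs.
vector<int> kWayMerge(const vector<vector<int>>& runs) {
    typedef pair<int, size_t> Entry;               // (value, run it came from)
    priority_queue<Entry, vector<Entry>, greater<Entry>> heap;
    vector<size_t> nextPos(runs.size(), 0);        // next unread position per run
    for (size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) {
            heap.push(Entry(runs[r][0], r));       // first element of every run
            nextPos[r] = 1;
        }
    }
    vector<int> out;
    while (!heap.empty()) {
        Entry smallest = heap.top();               // deleteMin
        heap.pop();
        out.push_back(smallest.first);
        size_t r = smallest.second;
        if (nextPos[r] < runs[r].size())           // refill from the same run
            heap.push(Entry(runs[r][nextPos[r]++], r));
    }
    return out;
}
```

Merging the first blocks 15 20 47, 4 8 9 and 12 30 40 this way yields 4 8 9 12 15 20 30 40 47, matching Ta1 in the example.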
Algorithm/Pseudocode:
Feature | Sequential File | Indexed Sequential File | Direct Access File
Organization | Records are stored in sequential order. | Records are stored sequentially with an index. | Records are placed randomly.
Access Type | Only sequential access is possible. | Both sequential and direct access possible. | Direct (random) access only.
Searching | Slow for large files (linear search). | Faster due to index. | Very fast (direct lookup).
Insertion/Deletion | Difficult; may require rewriting the file. | Easier than sequential, but index needs updating. | Easy; just add or remove record.
Use Case | When all records are processed in order. | When both sequential and quick direct access are needed. | When immediate access to any record is needed.