
Dsa 7

The document discusses file handling in C++, covering concepts such as fixed-length and variable-length record files, file organization, and basic operations like creating, reading, and updating files. It provides examples of queries to retrieve data from employee records and illustrates the use of the fstream library for file operations. Additionally, it explains the advantages and disadvantages of different file types and organization methods.

Uploaded by

sanchipawar10


1|Page © Haris Chaus | ALL RIGHTS ARE RESERVED as per copyright act.

Unit – VI
Indexing and Multiway Trees
This PDF is watermarked and traceable. Unauthorized sharing will result in
permanent access ban and legal action.

1. Files
A file is a named collection of related data stored on a secondary storage device like a hard disk
or SSD. It allows programs to store data permanently and retrieve it whenever required.
Unlike variables (which store data temporarily in RAM), files preserve data even after the
program terminates.

Example of a file: sample.txt

Roll No Name Marks

101 Akshay 85

102 Priya 92

103 Rahul 78

104 Sham 88

1.1 Query

A query is a request to retrieve or manipulate data from a database, file, or data structure
based on specific conditions.

Example (File Query): A file of employee records has ‘employee no’ as the primary key and
‘department code’ and ‘designation code’ as secondary keys. Write a procedure to
answer the following query: ‘Which employees from the Systems department are above
designation level 4?’

Problem Statement:

• Primary Key: Employee No

• Secondary Keys: Department Code, Designation Code


• Query: Find employees from "Systems" department who have designation level greater
than 4.

Procedure (Step-by-Step):

1. Open the employee records file.

2. Read each employee record one by one.

3. Check if the Department Code = "Systems".

4. Check if the Designation Code > 4.

5. If both conditions are true, display the employee's details.

6. Repeat until all records are checked.

7. Close the file.

Example: Employee Records Table

Employee No Name Department Code Designation Code Other Details

101 John Systems 5 Software Engineer

102 Chris HR 3 HR Executive

103 Ravi Systems 4 Junior Developer

104 Parth Systems 6 Senior Developer

105 Tejas Finance 5 Accountant

106 Vaibhav Systems 3 Intern

Answer (From Table):

Employee No Name Department Code Designation Code Other Details

101 John Systems 5 Software Engineer

104 Parth Systems 6 Senior Developer

Only John and Parth satisfy both conditions:

• Department = Systems

• Designation Code > 4
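The query procedure above can be sketched in C++ roughly as follows. This is only an illustration: the Employee struct, the function name queryDepartmentAbove, and the whitespace-separated record layout (employee no, name, department, designation) are assumptions made for the example, and the "Other Details" column is left out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record layout: one employee per line, fields separated by spaces.
struct Employee {
    int employeeNo;
    std::string name;
    std::string department;
    int designation;
};

// Read every record in turn and keep those that satisfy both conditions.
std::vector<Employee> queryDepartmentAbove(std::istream& in,
                                           const std::string& department,
                                           int level) {
    std::vector<Employee> matches;
    Employee e;
    while (in >> e.employeeNo >> e.name >> e.department >> e.designation) {
        if (e.department == department && e.designation > level) {
            matches.push_back(e);
        }
    }
    return matches;
}
```

Calling queryDepartmentAbove(stream, "Systems", 4) on the six records of the table should return the records for John and Parth. The same function works on an ifstream, since it only requires an input stream.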

1.2 Fixed Length Record File

In a fixed-length record file, each record has the same size (number of bytes), even if some
fields are empty. Space is pre-allocated for every field.

Example:

Roll No Name Marks

101 Jayesh 85

102 Rohan 92

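The field lengths can be

Roll No: 4, Name: 10, Marks: 3

The first record is stored as shown below (one character per byte; unused bytes in each field are left blank):

Roll No (4 bytes): 1 0 1

Name (10 bytes): J a y e s h

Marks (3 bytes): 8 5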

Advantages:

• Fast access: You can directly jump to the desired record using simple calculations.

• Easy to update: Overwriting a record does not affect other records.

Disadvantages:

• Wastage of space: If data is smaller than the allocated size, unused space is wasted.

• Less flexible: Cannot efficiently store records of very different sizes.
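The "simple calculation" behind fast access is offset = record number × record size. A minimal sketch, assuming text records with the field widths from the example (4 + 10 + 3 = 17 bytes) and left-justified, space-padded fields; the helper names padField, packRecord, and readRecordAt are made up for illustration:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Field widths taken from the example: Roll No 4, Name 10, Marks 3.
constexpr std::size_t ROLL_W = 4, NAME_W = 10, MARKS_W = 3;
constexpr std::size_t RECORD_SIZE = ROLL_W + NAME_W + MARKS_W;  // 17 bytes

// Pad (or truncate) a value to exactly `width` characters, left-justified.
std::string padField(const std::string& value, std::size_t width) {
    std::string field = value.substr(0, width);
    field.resize(width, ' ');
    return field;
}

// Build one fixed-length record.
std::string packRecord(int rollNo, const std::string& name, int marks) {
    return padField(std::to_string(rollNo), ROLL_W) +
           padField(name, NAME_W) +
           padField(std::to_string(marks), MARKS_W);
}

// Jump straight to record number `index` (0-based) and return its raw bytes.
std::string readRecordAt(std::istream& in, std::size_t index) {
    std::string record(RECORD_SIZE, ' ');
    in.seekg(static_cast<std::streamoff>(index * RECORD_SIZE), std::ios::beg);
    in.read(&record[0], static_cast<std::streamsize>(RECORD_SIZE));
    return record;
}
```

Because every record has the same size, readRecordAt never has to look at the records that come before the one requested.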

1.3 Variable – Length Record File

In a variable-length record file, each record can have a different size, depending on the actual
data stored. No padding is added. The # character marks the end of each field, and $ marks
the end of each record (as in the stored record shown below).

Example:

Roll No Name Marks

101 Akshita 85

102 Rakesh 92

103 Sanjay 88

(Here, the "Name" field size varies with the name length.)

The first record is stored as shown below:

1 0 1 # A k s h i t a # 8 5 $

Advantages:

• Saves storage: No unnecessary space is wasted.

• Flexible: Can easily store records of different sizes.

Disadvantages:

• Slower access: Cannot directly jump to a specific record; may require sequential
reading.

• Harder to update: Modifying a record may shift the positions of other records.
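Reading such a file means scanning for the delimiters, which is why access is sequential. A small sketch, assuming (as in the stored example) that # separates fields and $ closes each record; parseRecords is a hypothetical helper name:

```cpp
#include <string>
#include <vector>

// Split delimiter-encoded data into records, each a list of field strings.
std::vector<std::vector<std::string>> parseRecords(const std::string& data) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> fields;
    std::string current;
    for (char ch : data) {
        if (ch == '#') {            // end of a field
            fields.push_back(current);
            current.clear();
        } else if (ch == '$') {     // end of the last field and of the record
            fields.push_back(current);
            current.clear();
            records.push_back(fields);
            fields.clear();
        } else {
            current += ch;
        }
    }
    return records;
}
```

To reach the second record, the parser must first walk over every character of the first one; there is no way to compute its starting offset in advance.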

1.4 File Handling in C++

File handling means performing operations like creating, opening, reading, writing, and
closing files using programs.

In C++, file handling is done using the fstream library, which provides three important
classes:

Class Purpose

ifstream For reading from files (input)

ofstream For writing to files (output)

fstream For both reading and writing


Steps in File Handling:

1. Include the header file:

#include <fstream>

2. Create a file stream object: (like ifstream, ofstream, or fstream)

3. Open the file using open() function or constructor.

4. Perform operations like read, write.

5. Close the file using close() function.

Example: Writing to a File

#include <fstream>
using namespace std;

int main() {
ofstream file("example.txt"); // Create and open a file
file << "Hello, world!"; // Write to the file
file.close(); // Close the file
return 0;
}
Example: Reading from a File

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    ifstream file("example.txt"); // Open the file
    string text;

    while (getline(file, text)) { // Read line by line
        cout << text << endl;
    }

    file.close(); // Close the file
    return 0;
}

Opening a File

To perform any operation (read/write) on a file, you must open it first. You can open a file in
two ways:

• Method 1: Using Constructor

ofstream file("data.txt"); // Open file for writing

ifstream file("data.txt"); // Open file for reading

• Method 2: Using open() function

ofstream file;

file.open("data.txt"); // Open file for writing

Syntax: stream_object.open("filename", mode);

• stream_object → object of ifstream, ofstream, or fstream.

• filename → name of the file to be opened.

• mode → (optional) specifies how to open the file (ios::in, ios::out, etc.).

Common File Opening Modes:

Mode Meaning

ios::in Open file for reading

ios::out Open file for writing

ios::app Append to the end of file

ios::trunc Delete file contents if exists

ios::binary Open file in binary mode


Example: Using various file modes

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    // ios::out - Write Mode
    ofstream fout("example.txt", ios::out);
    fout << "Hello, World!";
    fout.close();

    // ios::in - Read Mode
    ifstream fin("example.txt", ios::in);
    string line;
    getline(fin, line); // Read the whole line (fin >> word would stop at the first space)
    cout << "Read from file: " << line << endl;
    fin.close();

    // ios::app - Append Mode
    fout.open("example.txt", ios::app);
    fout << "\nAppending text.";
    fout.close();

    // ios::binary - Binary Mode
    ofstream fbin("binaryfile.dat", ios::binary);
    int num = 12345;
    fbin.write((char*)&num, sizeof(num)); // Write the raw bytes of num
    fbin.close();

    return 0;
}

Closing a File

After all operations are done, you should close the file to:

• Save the changes properly.

• Free system resources.

• Avoid file corruption.

Syntax: stream_object.close();

Example: file.close();

Example Code:

#include <fstream>
using namespace std;

int main() {
ofstream file;
file.open("sample.txt"); // Open file for writing
file << "Hello DSA Notes!";
file.close(); // Close the file
return 0;
}
Checking if a File Opened Successfully

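When opening a file, it’s important to verify whether the file was opened correctly.
If the file doesn't exist or there is a permission error, the stream enters a failed state and later
reads or writes do nothing, so the program may silently produce wrong results if this is not checked.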

C++ provides the is_open() function to check this.

if (file.is_open()) {

// File opened successfully

} else {

// Error opening file

}

• is_open() returns true if the file is successfully opened.

• Returns false if opening the file failed.

Finding End of File (EOF)

When reading a file, we often need to detect when the file ends. C++ provides the eof()
function for this.

Syntax: file.eof()

• Returns true only after a read operation has tried to read past the end of the file.

• Returns false otherwise, so it should be checked after a read, not used to predict whether the next read will succeed.
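In practice, a read loop is usually driven by the result of the read itself, and eof() is consulted afterwards to tell a normal end of file from an error. A short sketch (countLines is a hypothetical helper, not part of the fstream library):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count lines by looping on the read itself rather than on eof().
std::size_t countLines(std::istream& in) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {  // false as soon as a read fails
        ++count;
    }
    return count;
}
```

After the loop, in.eof() is true if the loop stopped because the data ran out. Writing while (!in.eof()) instead tends to process the last record twice, because eof() only becomes true after a failed read.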

Positioning the Pointer in a File

When reading or writing a file, a pointer keeps track of the current position.
You can move this pointer anywhere inside the file using special functions.

Important Functions:

Function Purpose

seekg() Move get (read) pointer

seekp() Move put (write) pointer

tellg() Get current get (read) pointer position

tellp() Get current put (write) pointer position

Syntax Examples:

file.seekg(0, ios::beg); // Move to start of file

file.seekp(0, ios::end); // Move to end of file

streampos pos = file.tellg(); // Get current read position (tellg() returns a streampos)

Key Points:

• seekg() and seekp() are used to move.

• tellg() and tellp() are used to find current position.
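One common use of these functions together is measuring the size of a file. The sketch below is illustrative; streamSize is a made-up helper name, and it works on any seekable input stream:

```cpp
#include <istream>
#include <sstream>

// Return the number of bytes in a seekable input stream,
// leaving the read pointer where it was.
std::streamoff streamSize(std::istream& in) {
    std::streampos original = in.tellg();  // remember the current position
    in.seekg(0, std::ios::end);            // jump to the end
    std::streamoff size = in.tellg();      // the position at the end is the size
    in.seekg(original);                    // restore the read pointer
    return size;
}
```

For files opened in text mode on some platforms the reported position may not match the number of characters read, so binary mode is the safer choice when exact byte counts matter.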


Q. Write a C++ program to create a file. Insert records into the file by opening file in append
mode. Search for a specific record into file.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    ofstream outFile("employees.txt", ios::app); // append mode (creates the file if missing)
    int empNo;
    string name;

    cout << "Enter Employee No: ";
    cin >> empNo;
    cout << "Enter Name: ";
    cin >> name;

    outFile << empNo << " " << name << endl;
    outFile.close();

    ifstream inFile("employees.txt");
    int searchNo, no;
    string empName;

    cout << "Enter Employee No to search: ";
    cin >> searchNo;

    bool found = false;
    while (inFile >> no >> empName) { // Read one record at a time
        if (no == searchNo) {
            cout << "Record Found: " << no << " " << empName << endl;
            found = true;
            break;
        }
    }
    if (!found)
        cout << "Record not found." << endl;

    inFile.close();
    return 0;
}

File Organization
File organization refers to the way records are arranged (stored) in a file on storage devices
like hard disks.

The method of organizing affects how fast data can be stored, retrieved, and updated.

Types of File Organization

2. Sequential File Organization

In Sequential File Organization, records are stored one after another in a specific order
(usually by a key, like employee number).

When accessing data, the system reads records sequentially from the beginning until it finds
the required one.

Characteristics:

• Records are arranged in order (ascending or descending).

• To find a record, the system may have to search from the start.

• Best for applications where most or all records are processed (like generating reports).

Example:

Employee No Name Department

1001 Akash Sales

1002 Sakshi HR

1003 Rohan Systems

(Stored in order of Employee No)

Advantages:

• Simple to create and maintain.

• Efficient when processing all records.

Disadvantages:

• Slow to search for specific records.

• Difficult to insert new records in the correct order (requires rewriting file).

2.1 Primitive Operations

1. Create: Create a new file and insert records one after another in sorted order (like by
employee number or roll number).

Example Table:

Employee No Name Department

1001 Akash Sales

1002 Sakshi HR

1003 Rohan Systems


Pseudocode:

void CreateFile() {
// Open a new file in output mode (overwrite if exists)
open file "Employee.dat" in out mode
close file
}
2. Read/Display: Read all records sequentially from the beginning.

Example: Reading each employee's data one after another to print a complete employee list.

Pseudocode:

void ReadAllRecords() {
// Open file in input mode to read records
open file "Employee.dat" in in mode
while not end of file:
read employee
display employee details
close file
}
3. Write: Insert a new record and maintain the order (may require copying).

Example: Adding Employee

Employee No Name Department

1004 Ajay Systems

Add at correct position (after 1003).

Pseudocode:

void InsertRecordOrdered(Employee newEmp) {
    // Open original file for reading and temporary file for writing
    open file "Employee.dat" in in mode
    open file "Temp.dat" in out mode
    inserted = false

    while not end of file:
        read employee
        // Check if new record should be inserted before current employee
        if (!inserted && newEmp.EmployeeNo < employee.EmployeeNo):
            write newEmp into temp file       // Insert the new record
            inserted = true
        write employee into temp file         // Write existing record

    // If new record is largest and not inserted yet
    if (!inserted):
        write newEmp into temp file

    close both files
    delete "Employee.dat"                     // Delete old file
    rename "Temp.dat" to "Employee.dat"       // Rename temp to original
}
4. Search: Find a record by starting from the beginning and checking each record.

Example:

• To find employee no 1002:

o Check 1001 → not matching

o Check 1002 → found!

Pseudocode:

void SearchRecord(int employeeNo) {


// Open file to search for a record

open file "Employee.dat" in in mode
found = false
while not end of file:
read employee
if employee.EmployeeNo == employeeNo:
display employee details
found = true
break
if not found:
display "Record not found"
close file
}
5. Update: Search for the record, modify its contents, and rewrite the file if needed.

Example:

• Change Sakshi’s department from HR → Admin:

o Find record 1002.

o Update department field.

Pseudocode:

void UpdateRecord(int employeeNo, Employee updatedEmp) {

// Open original file for reading and temp file for writing
open file "Employee.dat" in in mode
open file "Temp.dat" in out mode
while not end of file:
read employee
if employee.EmployeeNo == employeeNo:
// Write updated employee to temp
write updatedEmp into temp file
else
write employee into temp file

close both files

delete "Employee.dat" // Delete old file
rename "Temp.dat" to "Employee.dat" // Rename temp to original
}
6. Delete: Search for the record to be deleted. Skip it while copying records to a new file.

Example:

• To delete employee 1001:

o Read 1001 → skip

o Read and copy 1002, 1003 to a new file.

Pseudocode:

void DeleteRecord(int employeeNo) {

// Open original file and temporary file
open file "Employee.dat" in in mode
open file "Temp.dat" in out mode
while not end of file:
read employee
if employee.EmployeeNo != employeeNo:
// Copy all records except the one to delete
write employee into temp file
close both files
delete "Employee.dat" // Delete old file
rename "Temp.dat" to "Employee.dat" // Rename temp to original
}
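
The copy-through-a-temporary-file pattern used by the Write, Update, and Delete operations above can be sketched in C++ as follows, taking the ordered insert as the example. The sketch works on generic streams so it stays short; the Employee struct, the text layout (employee no, name, department separated by spaces), and the function names are assumptions for illustration. With real files, `in` would be an ifstream on "Employee.dat" and `out` an ofstream on "Temp.dat", followed by std::remove and std::rename from <cstdio>.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Employee {
    int employeeNo;
    std::string name;
    std::string department;
};

// Write one record as a line of text.
void writeEmployee(std::ostream& out, const Employee& e) {
    out << e.employeeNo << ' ' << e.name << ' ' << e.department << '\n';
}

// Copy all records from `in` to `out`, placing newEmp at its sorted position.
void insertOrdered(std::istream& in, std::ostream& out, const Employee& newEmp) {
    bool inserted = false;
    Employee e;
    while (in >> e.employeeNo >> e.name >> e.department) {
        if (!inserted && newEmp.employeeNo < e.employeeNo) {
            writeEmployee(out, newEmp);   // new record goes before the current one
            inserted = true;
        }
        writeEmployee(out, e);            // existing record is copied unchanged
    }
    if (!inserted) {
        writeEmployee(out, newEmp);       // new key is the largest: append at the end
    }
}
```

Every insert rewrites the whole file, which is the cost the notes refer to when they say sequential files are difficult to insert into.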

3. Direct / Random Access File

In a Direct Access File, records can be read, written, or modified directly without reading all
previous records.

Each record has a specific position (address) inside the file.


Key Points:

• You can jump directly to any record using its position (like accessing array elements).

• Faster than sequential access when you want specific records.

• Commonly used in databases, bank systems, inventory systems, etc.

Advantages:

• Very fast access to any record.

• Efficient for large files.

Disadvantages:

• Works best when records are fixed-size.

• Managing deleted/updated records can be complicated.

3.1 Primitive Operations

Example Table: Employee Records

Employee No. (Primary Key) Name Department

101 Dhruv Systems

102 Anushka Marketing

103 Shruti Systems

104 Aarav HR

105 Simran Systems

1. Create: Make a new file on disk to store employee records. File is created empty, ready to
store records.

Example: Create a file called "Employee.dat" to store employee data.

Pseudocode:

void CreateFile() {
open file "Employee.dat" in out + binary mode
close file
}
2. Write (Insert): Add new records into the file. Each record is inserted at a position directly
calculated from the employee number.

Example: Insert record for Employee No. 106 ("Shruti") at position related to 106.

Pseudocode:

void WriteRecord(Employee emp) {

open file "Employee.dat" in in + out + binary mode
calculate position = (emp.EmployeeNo - 100) * sizeof(Employee)
move write pointer to position
write emp to file
close file
}

3. Read (Retrieve): Fetch a record from the file using its key. Directly jump to the address
based on the primary key.

Example: Search for Employee No. 105 → Direct jump to fetch Simran’s record.

Pseudocode:

Employee ReadRecord(int employeeNo) {

open file "Employee.dat" in in + binary mode
calculate position = (employeeNo - 100) * sizeof(Employee)
move read pointer to position
read employee from file
close file
return employee
}

4. Update: Modify the details of an existing record.

Example: Update Department of Anushka (Employee No. 102) from Marketing to Admin.

Pseudocode:

void UpdateRecord(Employee updatedEmp) {

open file "Employee.dat" in in + out + binary mode
calculate position = (updatedEmp.EmployeeNo - 100) * sizeof(Employee)
move write pointer to position
write updatedEmp to file
close file
}

5. Delete: Remove a record from the file. Mark the record as deleted (or clear the data) at that
position.

Example: Delete Employee No. 104 (Aarav). Mark position 104 as deleted.

Pseudocode:

void DeleteRecord(int employeeNo) {

open file "Employee.dat" in in + out + binary mode
create blank record
calculate position = (employeeNo - 100) * sizeof(Employee)
move write pointer to position
write blank record to file
close file
}

6. Search: Find a specific record using the primary key. Use the key (Employee No.) to directly
locate the record.

Example: Search for Employee No. 102 → directly access Anushka’s record.

Pseudocode:

void SearchRecord(int employeeNo) {

open file "Employee.dat" in in + binary mode

calculate position = (employeeNo - 100) * sizeof(Employee)

move read pointer to position
read employee from file
if (employee.EmployeeNo == employeeNo)
display employee details
else
display "Record not found"
close file
}
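
A possible C++ rendering of the Create, Write, and Read operations above is sketched below. It keeps the position formula from the pseudocode, (EmployeeNo - 100) * sizeof(Employee). Several details are assumptions made for the example: the fixed-size Employee layout with char arrays, the convention that employeeNo 0 marks an empty slot, and the choice to pre-fill the file with blank slots so that every position exists before it is written.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

struct Employee {
    int employeeNo;        // 0 marks an empty (or deleted) slot
    char name[20];
    char department[20];
};

// Byte offset of the slot for a key, following the formula in the notes.
std::streamoff slotOffset(int employeeNo) {
    assert(employeeNo >= 100);
    return static_cast<std::streamoff>(employeeNo - 100) *
           static_cast<std::streamoff>(sizeof(Employee));
}

Employee makeEmployee(int no, const std::string& name, const std::string& dept) {
    Employee e{};  // zero-initialise so unused bytes are deterministic
    e.employeeNo = no;
    std::strncpy(e.name, name.c_str(), sizeof(e.name) - 1);
    std::strncpy(e.department, dept.c_str(), sizeof(e.department) - 1);
    return e;
}

// Create the file and pre-fill it with `slots` blank records.
void createFile(const std::string& path, int slots) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    Employee blank{};
    for (int i = 0; i < slots; ++i) {
        out.write(reinterpret_cast<const char*>(&blank), sizeof(blank));
    }
}

// Write (or overwrite) the record in the slot computed from its key.
bool writeRecord(const std::string& path, const Employee& emp) {
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file.is_open()) return false;
    file.seekp(slotOffset(emp.employeeNo), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&emp), sizeof(emp));
    return static_cast<bool>(file);
}

// Jump to the slot for employeeNo; succeed only if the slot holds that key.
bool readRecord(const std::string& path, int employeeNo, Employee& out) {
    std::ifstream file(path, std::ios::binary);
    if (!file.is_open()) return false;
    file.seekg(slotOffset(employeeNo), std::ios::beg);
    file.read(reinterpret_cast<char*>(&out), sizeof(out));
    return file && out.employeeNo == employeeNo;
}
```

Under these assumptions, Update is the same call as writeRecord with the modified record, and Delete amounts to writing a blank record (employeeNo 0) into the slot.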

4. Indexed Sequential File Organization

Indexed Sequential File Organization is a method where:

• Records are stored sequentially (sorted by key, like Employee No),

• And an index is maintained for faster searching.

It combines the advantages of sequential organization (easy range access) and indexing (fast
search).

Key Points:

• Records are kept in sorted order based on a key field (e.g., roll number, employee number).

• An index is built separately, storing key values and the addresses of corresponding
records.

• To find a record, first the index is searched (quickly), and then the record is accessed
directly.

• If a record is not found exactly, a small sequential search is done in the related block.

Advantages:

• Fast searching using index.

• Easy sequential access for range queries.

• Efficient for both small and large files.

Disadvantages:

• Extra space required for maintaining index.

• Complex insertion/deletion (index must be updated too).

• Overflow handling needed if file grows.

4.1 Types of Indices

In Indexed Sequential File Organization, different types of indices are used depending on
how fast we want to search and how big the data is. There are mainly three types of indices:

1. Primary Index

Primary Index File:

EmpNo Address

1001 Addr1

1005 Addr3

1010 Addr5

Main File:

EmpNo Name Department

1001 Rahul Systems

1003 Swati Finance

1005 Akshay Systems

1007 Pranav HR

1010 Shivani Finance

• Built on the primary key of the file (a unique field, like Employee No or Roll No).

• There is one index entry for each block (not every record).

• Records are sorted based on the primary key.

• The index helps locate the block where the record exists.

Best when data is sorted.

2. Secondary Index

Secondary Index File:

Department List of Addresses

Systems Addr1, Addr3

Finance Addr2, Addr5

HR Addr4

Main File:

EmpNo Name Department

1001 Rahul Systems

1003 Swati Finance

1005 Akshay Systems

1007 Pranav HR

1010 Shivani Finance

• Built on a non-primary key field (which may not be unique), like Department or
Designation.

• Multiple records can have the same value for this field.

• The index maintains pointers to all records matching the same field value.

Best when searching frequently on fields other than the primary key.

3. Clustering Index

Clustering Index File:

Department Address of First Record

Systems Addr1

Finance Addr3

HR Addr5

Main File:

EmpNo Name Department

1001 Rahul Systems

1005 Akshay Systems

1003 Swati Finance

1010 Shivani Finance

1007 Pranav HR

• Built when records are physically grouped together based on a non-primary key.

• Only one entry per group is kept in the index.

• Used to cluster similar records together for faster access.

Best when you need to retrieve a group of records together.

4.2 Structure of Index Sequential File

1. Main File (Data Records)

• All actual records are stored sequentially in sorted order based on a key field (like
Roll No, Employee No, etc.).

• These records are physically arranged on disk in increasing order of the key.

2. Index Table (or Index File)

• The index contains selected key values along with the addresses (or locations) of their
corresponding records in the main file.

• Index entries point to blocks or specific records in the main file.

• Searching is done on the index first to find the location, and then a direct access or
small sequential search is done to reach the exact record.

Index File:

Key (Employee No.) Address

1001 Addr of Record1

1005 Addr of Record3

1009 Addr of Record5

Main File:

Key (Employee No.) Name

1001 Aditya

1003 Vihaan

1005 Ashwini

1007 Priya

1009 Rahul
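
The two-step lookup (search the index, then scan a short stretch of the main file) might look like this in C++. It is an in-memory sketch: the main file is modelled as a sorted vector, an "address" is simply a position in that vector, and the names IndexEntry, Record, and indexedSearch are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct IndexEntry {
    int key;               // first key of a block
    std::size_t position;  // where that record sits in the main file
};

struct Record {
    int employeeNo;
    std::string name;
};

// Return the position of `key` in mainFile, or -1 if it is absent.
int indexedSearch(const std::vector<IndexEntry>& index,
                  const std::vector<Record>& mainFile, int key) {
    // Step 1: binary-search the index for the last entry with entry.key <= key.
    auto it = std::upper_bound(
        index.begin(), index.end(), key,
        [](int k, const IndexEntry& e) { return k < e.key; });
    if (it == index.begin()) return -1;  // key is smaller than every indexed key

    // Step 2: short sequential scan starting at the indexed position.
    for (std::size_t pos = std::prev(it)->position;
         pos < mainFile.size() && mainFile[pos].employeeNo <= key; ++pos) {
        if (mainFile[pos].employeeNo == key) return static_cast<int>(pos);
    }
    return -1;
}
```

With the tables above, a search for 1007 picks the index entry 1005, starts at the third record, and finds Priya one step later, without touching the first two records.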

5. Linked Organization
In Linked Organization, records are not stored in a specific sequence on the storage device.
Instead, each record contains a pointer (address) that links it to the next related record.

• Each record has two parts:

1. Data (actual record information)


2. Pointer (address of the next record)

• The first record points to the next record, and so on, forming a linked list.

• The last record has a null pointer (indicating end of file).

Characteristics:

• Logical order is maintained via pointers, not physical storage.

• Suitable for dynamic data (insertion and deletion become easier).

• Searching can be slow because you may have to follow many pointers.

Example:

Advantages:

• Easy to insert or delete records without shifting other records.

• No need to store data sequentially on disk.

Disadvantages:

• Slower searching (especially for large files).

• Extra storage is needed for pointers.

• If a pointer gets corrupted, the whole link breaks (data loss risk).

5.1 Multi List Files

In Multi-List File Organization, a record can belong to multiple linked lists
simultaneously based on different fields or keys.

This allows fast searching and easy access to records based on different attributes.

• Each record contains:

1. Data (actual record information)

2. Multiple pointers — one for each linked list.

• Each linked list represents one way of organizing the records (example: by
Department, by Designation, etc.).

• So, a single record can participate in multiple lists at the same time.

Characteristics:

• Multiple logical structures exist at the same time.

• Each logical list (e.g., Department list, Designation list) is independently maintained.

• Very efficient for complex queries (like "find all employees of a department" or "find
all employees of a certain designation").

Example: Imagine we are storing employee records. We want to organize the employees in
two ways: i) By Department and ii) By Designation

Each record will have:

• Data (Employee details)

• Pointer 1 (next employee in same Department)

• Pointer 2 (next employee in same Designation)

Records:

Emp ID Name Department Designation

101 Akshay Systems Manager

102 Raj HR Executive

103 Kunal Systems Executive

104 Sakshi Marketing Manager

105 Shreya Marketing Executive


Logical Linked Lists:

1. Employee ID List (Primary key)

Header → [Akshay (101)] → [Raj (102)] → [Kunal (103)] → [Sakshi (104)] → [Shreya (105)] → NULL

2. Department-wise links:

Systems → [Akshay (101)] → [Kunal (103)] → NULL

HR → [Raj (102)] → NULL

Marketing → [Sakshi (104)] → [Shreya (105)] → NULL

3. Designation-wise links:

Manager → [Akshay (101)] → [Sakshi (104)] → NULL

Executive → [Raj (102)] → [Kunal (103)] → [Shreya (105)] → NULL

Advantages:

• Fast access through multiple keys.

• Flexible — records can be searched and organized in many ways.

• Efficient for multi-key search operations.

Disadvantages:

• Extra storage space is required for multiple pointers.

• More complex insertion and deletion logic (you must update multiple lists).
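
A compact in-memory sketch of a multi-list follows. Each record carries one "next" link per list, and a header table gives the first record of each list. The names (EmployeeNode, MultiList, NIL) and the use of vector positions as record addresses are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

constexpr int NIL = -1;  // stands in for a null pointer

struct EmployeeNode {
    int id;
    std::string name, department, designation;
    int nextInDept;   // next record in the same department, or NIL
    int nextInDesig;  // next record with the same designation, or NIL
};

struct MultiList {
    std::vector<EmployeeNode> records;
    std::map<std::string, int> deptHead, desigHead;  // list headers
    std::map<std::string, int> deptTail, desigTail;  // last node of each list

    // Store a record and append it to both of its lists.
    void add(int id, const std::string& name,
             const std::string& dept, const std::string& desig) {
        int idx = static_cast<int>(records.size());
        records.push_back({id, name, dept, desig, NIL, NIL});
        link(deptHead, deptTail, dept, idx, true);
        link(desigHead, desigTail, desig, idx, false);
    }

    // Follow the department links starting from the header.
    std::vector<int> idsInDepartment(const std::string& dept) const {
        std::vector<int> ids;
        auto it = deptHead.find(dept);
        for (int i = (it == deptHead.end() ? NIL : it->second); i != NIL;
             i = records[i].nextInDept) {
            ids.push_back(records[i].id);
        }
        return ids;
    }

    // Follow the designation links starting from the header.
    std::vector<int> idsWithDesignation(const std::string& desig) const {
        std::vector<int> ids;
        auto it = desigHead.find(desig);
        for (int i = (it == desigHead.end() ? NIL : it->second); i != NIL;
             i = records[i].nextInDesig) {
            ids.push_back(records[i].id);
        }
        return ids;
    }

private:
    void link(std::map<std::string, int>& head, std::map<std::string, int>& tail,
              const std::string& key, int idx, bool byDept) {
        auto t = tail.find(key);
        if (t == tail.end()) {
            head[key] = idx;                      // first record of this list
        } else if (byDept) {
            records[t->second].nextInDept = idx;  // extend the department list
        } else {
            records[t->second].nextInDesig = idx; // extend the designation list
        }
        tail[key] = idx;
    }
};
```

Adding one record touches two lists, which illustrates the extra bookkeeping listed under the disadvantages.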

5.2 Coral Rings

Coral Rings are a special type of linked file organization where records are linked in a
circular manner instead of a simple linear way.

Coral Ring = Circular Doubly Linked List used in file organization.

In coral rings:

• Each record contains a pointer (or link) to the next record.

• Last record points back to the first record → making a ring (circular link).

• Hence, you can traverse all the records starting from any node, moving one-by-one,
and eventually come back to the starting point.

Advantages of Coral Rings:

• No "end" — you can keep traversing in a loop.

• Very useful for cyclic processes (like scheduling, real-time systems, etc.).

• Easier to insert new nodes without worrying about end-of-list handling.

Disadvantages:

• Infinite loops possible if traversal is not properly controlled.

• Harder to implement if you don't properly manage the "visited" status.

Structure:

• Each record has two links:

o ALINK( ,i) → Forward Link (Next Record)

o BLINK( , ) → Backward Link (Previous Record)

• Node α acts as a control point (head node) to manage the ring.

Forward: α → A → B → C → D → α
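
The ring traversal, including the stopping rule that avoids an infinite loop, can be sketched as below. This is a simplified in-memory model rather than the exact ALINK/BLINK layout: RingNode, traverseRing, and the use of vector positions as links are assumptions for the example, with position 0 playing the role of the head node α.

```cpp
#include <string>
#include <vector>

struct RingNode {
    std::string label;
    int next;  // forward link
    int prev;  // backward link
};

// Visit every record once, starting after the head node and stopping
// as soon as the traversal returns to it.
std::vector<std::string> traverseRing(const std::vector<RingNode>& ring,
                                      int head, bool forward) {
    std::vector<std::string> visited;
    int i = forward ? ring[head].next : ring[head].prev;
    while (i != head) {  // coming back to the head ends the loop
        visited.push_back(ring[i].label);
        i = forward ? ring[i].next : ring[i].prev;
    }
    return visited;
}
```

Comparing against the head node is what replaces the NULL check of a linear list; without it the loop would circle forever.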

5.3 Inverted Files

Inverted File is a type of file organization where an index is maintained for every field (or
attribute) that we want to search.

• Instead of scanning the entire file for a query,

• We first search the index to find the record quickly.

It’s called inverted because instead of records pointing to attributes,

attributes point to records.

Example:

Suppose we have this data file:

EmpID Name Dept

101 Akshay Systems

102 Raj HR

103 Kunal Systems

104 Sakshi Marketing

105 Shreya Marketing

We can create inverted indices like this:

1. Index for Name

Name Pointer to Record

Akshay 101

Raj 102

Kunal 103

Sakshi 104

Shreya 105

2. Index for Department

Department Pointer to Records

Systems 101, 103

HR 102

Marketing 104, 105

Advantages of Inverted File:

• Very fast searching on multiple fields.

• Useful when we do a lot of searches on different fields (like name, dept, location).

Disadvantages:

• More space needed to store extra indices.

• Inserts/Deletes/Updates become slower because the indices must also be updated.
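
Building an inverted index on one field takes only a few lines. The sketch below indexes the department field of the example data; the Employee struct and the function name buildDeptIndex are assumptions, and the "pointer" stored for each record is simply its EmpID.

```cpp
#include <map>
#include <string>
#include <vector>

struct Employee {
    int empId;
    std::string name;
    std::string dept;
};

// Map each department value to the IDs of all records that carry it.
std::map<std::string, std::vector<int>> buildDeptIndex(
        const std::vector<Employee>& file) {
    std::map<std::string, std::vector<int>> index;
    for (const Employee& e : file) {
        index[e.dept].push_back(e.empId);  // the attribute value points to the record
    }
    return index;
}
```

A query such as "all employees in Marketing" then becomes a single lookup in the index instead of a scan of the data file. An index on the name field would be built the same way.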

5.4 Cellular Partitions

Cellular Partition is a file organization technique where the entire data (or records) is
divided into smaller groups called cells (or partitions).

• Each cell handles a subset of records based on some criteria (like a range of values,
category, etc.).

• Searching, insertion, deletion, and updating operations happen within the specific cell
instead of the whole file — which makes operations faster.

Key Points

• Partitioning is based on some field (e.g., ID, Name initial, Department, Age range, etc.).

• Each cell can be organized internally in any way — sequential, direct access, etc.

• It reduces search space because instead of searching the full file, we only search within a
relevant partition.

• It’s efficient for large data because it localizes operations.

Example: Suppose you have 1000 Employee Records. You can divide them into Cells based
on Employee ID:

Cell ID Range

Cell 1 IDs from 1 to 200

Cell 2 IDs from 201 to 400

Cell 3 IDs from 401 to 600

Cell 4 IDs from 601 to 800

Cell 5 IDs from 801 to 1000

• If you want to search for Employee ID = 378, you know it will be in Cell 2 (201–400).

• So, you only search in Cell 2, not all 1000 records.

// Suppose we have 5 cells
// Each cell holds 200 records based on ID ranges
int cellNumber = (id - 1) / 200 + 1; // Decide which cell to search (1 to 5); ID 378 gives Cell 2
searchInCell(cellNumber, id);

Advantages:

• Faster search and updates.

• Smaller memory scanned each time.

• Easier to manage and organize.

• Can grow easily by adding more cells.

Disadvantages:

• Choosing partition rules is tricky.

• Some cells may become overloaded or empty.

• Managing many cells can be slightly complex.

• Sometimes repartitioning is needed if data grows too much.

6. External Sort
External Sort is a method used to sort very large files that do not fit into main memory
(RAM). Instead, sorting is done using the disk (external storage), by dividing the file into
smaller manageable parts.

How External Sort Works

1. Divide: Split the large file into small chunks (called runs) that can fit into RAM.

2. Sort: Sort each chunk individually in memory using any internal sorting algorithm
(like quicksort, mergesort).

3. Store: Write each sorted chunk back to the disk.

4. Merge: Merge all sorted chunks together into one single sorted file using a technique
called k-way merging.

Advantages

• Can handle huge files.

• Efficient use of memory.

Disadvantage: Slower than internal sort because it depends on disk read/write speed.
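
The four steps can be simulated in memory to see how they fit together. In the sketch below, each "run" is a vector standing in for a sorted chunk on disk, and the merge step uses a min-heap, which is one common way to implement k-way merging. The function names makeSortedRuns and kWayMerge are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Steps 1-3: split the input into chunks of at most runSize items and sort each.
std::vector<std::vector<int>> makeSortedRuns(const std::vector<int>& data,
                                             std::size_t runSize) {
    assert(runSize > 0);
    std::vector<std::vector<int>> runs;
    for (std::size_t start = 0; start < data.size(); start += runSize) {
        std::size_t end = std::min(start + runSize, data.size());
        std::vector<int> run(data.begin() + start, data.begin() + end);
        std::sort(run.begin(), run.end());  // internal sort of one chunk
        runs.push_back(run);
    }
    return runs;
}

// Step 4: repeatedly take the smallest front element among all runs.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Item = std::tuple<int, std::size_t, std::size_t>;  // value, run, position
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, std::size_t{0});
    }
    std::vector<int> merged;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        std::size_t r = std::get<1>(top), pos = std::get<2>(top);
        merged.push_back(std::get<0>(top));
        if (pos + 1 < runs[r].size()) {
            heap.emplace(runs[r][pos + 1], r, pos + 1);  // refill from the same run
        }
    }
    return merged;
}
```

In a real external sort only one buffer per run is held in memory during the merge; the heap never needs more than one element from each run at a time, which is what keeps memory use bounded.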

6.1 Cosequential Processing and Merging Two Lists

Example: Suppose you have a file of 10000 records but RAM can hold only 500 records at a
time:

• Step 1: Divide into 20 chunks of 500 records each (10000 / 500 = 20).

• Step 2: Sort each chunk.

• Step 3: Save sorted chunks.

6.2 Multiway Merging

Multiway merge sort is a technique for merging 'm' sorted lists into a single sorted list. The
two-way merge is a special case of multiway merge sort.

The two-way merge sort makes use of two input tapes and two output tapes for sorting the
records.

It works in two stages:

Stage 1: Break the records into blocks. Sort each block individually and store the sorted blocks on the two input tapes.

Stage 2: Merge the sorted blocks with the help of the two output tapes to create a single sorted file.

Example: Sort the following list of elements using two-way merge sort with M = 3.
20, 47, 15, 8, 9, 4, 40, 30, 12, 17, 11, 56, 28, 35.

As M = 3, we will break the records in the group of 3 and sort them. Then we will store them
on tape. We will store data on alternate tapes.

Stage I: Sorting Phase

1) Read the first three records, sort them and store them on tape Tb1.

20, 47, 15 → 15, 20, 47

Tb1: 15 20 47

2) Read the next three records, sort them and store them on tape Tb2.

8, 9, 4 → 4, 8, 9

Tb2: 4 8 9

3) Read the next three records, sort them and store them on tape Tb1.

40, 30, 12 → 12, 30, 40

Tb1: 15 20 47 12 30 40

4) Read the next three records, sort them and store them on tape Tb2.

17, 11, 56 → 11, 17, 56

Tb2: 4 8 9 11 17 56

5) Read the remaining two records, sort them and store them on tape Tb1.

28, 35 → 28, 35

Tb1: 15 20 47 12 30 40 28 35

At the end of this process, we get:

Tb1: 15 20 47 12 30 40 28 35

Tb2: 4 8 9 11 17 56

Stage II: Merging of Runs

The blocks on the input tapes Tb1 and Tb2 are merged onto two more tapes, Ta1 and Ta2. The
two pairs of tapes then swap roles on each pass. Finally, the sorted data will be on tape Ta1.

Tb1: 15 20 47 12 30 40 28 35

Tb2: 4 8 9 11 17 56

We will read the first blocks from both tapes Tb1 and Tb2, compare their elements, and store
them on Ta1 in sorted order.

Ta1: 4 8 9 15 20 47

Now we will read the second blocks from Tb1 and Tb2, merge their elements and store the result on Ta2.

Ta2: 11 12 17 30 40 56

Finally, read the third block from Tb1 and store it, already sorted, on Ta1. There is nothing to
compare this block with, as Tb2 has no third block. Hence, we will get

Ta1: 4 8 9 15 20 47 28 35

Ta2: 11 12 17 30 40 56

Now merge the first blocks of Ta1 and Ta2 and store the sorted elements on Tb1.

Tb1: 4 8 9 11 12 15 17 20 30 40 47 56

The second block of Ta1 (28 35) has no matching block on Ta2, so it is copied unchanged to Tb2.

Tb2: 28 35

Now Tb1 and Tb2 each contain only a single block. Merge the elements of both blocks
and store the result on Ta1.

Ta1: 4 8 9 11 12 15 17 20 28 30 35 40 47 56

Thus, we get the sorted list.
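
The two stages of this example can be simulated with a short C++ sketch. It is an illustration only: tapes are modelled as in-memory lists of blocks, and the names Block, Tape, distribute, mergePass and twoWayTapeSort are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Block = std::vector<int>;   // one sorted block
using Tape  = std::vector<Block>; // a tape holds blocks one after another

// Stage I: sort groups of M records and store them on two tapes alternately.
std::pair<Tape, Tape> distribute(const std::vector<int>& records, std::size_t M) {
    assert(M > 0);
    Tape t1, t2;
    bool toFirst = true;
    for (std::size_t start = 0; start < records.size(); start += M) {
        const std::size_t end = std::min(start + M, records.size());
        Block block(records.begin() + static_cast<std::ptrdiff_t>(start),
                    records.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(block.begin(), block.end());
        (toFirst ? t1 : t2).push_back(std::move(block));
        toFirst = !toFirst;
    }
    return {std::move(t1), std::move(t2)};
}

// One pass of Stage II: merge the i-th blocks of both input tapes and write
// the results to two output tapes alternately. A block without a partner is
// merged with an empty block, i.e. copied unchanged.
std::pair<Tape, Tape> mergePass(const Tape& in1, const Tape& in2) {
    Tape out1, out2;
    const Block empty;
    const std::size_t blocks = std::max(in1.size(), in2.size());
    for (std::size_t i = 0; i < blocks; ++i) {
        const Block& a = i < in1.size() ? in1[i] : empty;
        const Block& b = i < in2.size() ? in2[i] : empty;
        Block merged;
        std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(merged));
        (i % 2 == 0 ? out1 : out2).push_back(std::move(merged));
    }
    return {std::move(out1), std::move(out2)};
}

// Repeat merge passes until only one block is left; it ends up on the first tape.
std::vector<int> twoWayTapeSort(const std::vector<int>& records, std::size_t M) {
    std::pair<Tape, Tape> tapes = distribute(records, M);
    while (tapes.first.size() + tapes.second.size() > 1) {
        tapes = mergePass(tapes.first, tapes.second);
    }
    return tapes.first.empty() ? Block{} : tapes.first.front();
}
```

With the 14 records of the example and M = 3, distribute puts three blocks on the first tape and two on the second, and three merge passes are needed before a single sorted block remains.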

Algorithm/Pseudocode:

Algorithm TwoWayMerge(X, Y):

    // INPUT: X, Y (sorted arrays)
    // OUTPUT: K (merged sorted array)

    K <- an empty array

    while X is not empty and Y is not empty:

        if X[0] <= Y[0]:
            append X[0] to K
            remove first element from X
        else:
            append Y[0] to K
            remove first element from Y

    // Append remaining elements (at most one of X, Y is still non-empty)

    append remaining elements of X to K
    append remaining elements of Y to K

    return K
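
The pseudocode above might be written in C++ as follows. This is a sketch, not a definitive implementation: the function name twoWayMerge is an assumption, and it advances two indices instead of removing the first element of each array, which avoids shifting the remaining elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Merges two sorted arrays x and y into one sorted array.
std::vector<int> twoWayMerge(const std::vector<int>& x, const std::vector<int>& y) {
    // Both inputs must already be sorted.
    assert(std::is_sorted(x.begin(), x.end()));
    assert(std::is_sorted(y.begin(), y.end()));

    std::vector<int> k;
    k.reserve(x.size() + y.size());

    std::size_t i = 0;  // position of the current "first element" of x
    std::size_t j = 0;  // position of the current "first element" of y
    while (i < x.size() && j < y.size()) {
        if (x[i] <= y[j]) {
            k.push_back(x[i++]);
        } else {
            k.push_back(y[j++]);
        }
    }

    // Append remaining elements; at most one of these loops does any work.
    while (i < x.size()) k.push_back(x[i++]);
    while (j < y.size()) k.push_back(y[j++]);
    return k;
}
```

For example, merging the blocks 15 20 47 and 4 8 9 yields 4 8 9 15 20 47, as in the first merge of Stage II.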

6.3 K Way Merge Algorithm

In this method, we use k tapes instead of two. The basic idea of the two-way merge algorithm is
retained, but at each step the smallest of the k current elements is selected. The representation of the multiway merge technique is as shown below:

Example: Sort the following list of elements using three-way merge sort (k = 3) with M = 3.
20, 47, 15, 8, 9, 4, 40, 30, 12, 17, 11, 56, 28, 35.

Stage 1: Sorting

We will read three records into memory, sort them and store them on tape Tb1, then read the next three
records, sort them and store them on tape Tb2, and similarly store the next three sorted records on Tb3.

20, 47, 15 → 15, 20, 47    Tb1: 15 20 47

8, 9, 4 → 4, 8, 9    Tb2: 4 8 9

40, 30, 12 → 12, 30, 40    Tb3: 12 30 40

1) Now read the next three records (i.e. 17, 11, 56), sort them and store them on Tb1.

Tb1: 15 20 47 11 17 56

Tb2: 4 8 9

Tb3: 12 30 40

2) Read the remaining records (i.e. 28, 35), sort them and store them on tape Tb2.

Tb1: 15 20 47 11 17 56

Tb2: 4 8 9 28 35

Tb3: 12 30 40

Nothing more will be stored on Tb3. Thus, we get

Tb1: 15 20 47 11 17 56

Tb2: 4 8 9 28 35

Tb3: 12 30 40

Stage 2: Merging

In this stage, we will build a min-heap from the first elements of the first blocks of Tb1, Tb2 and
Tb3 (i.e. 15, 4, 12). We then repeatedly perform the deleteMin operation and store the deleted element on Ta1.

Step 1:

1) Build heap for 15, 4, 12.

i) Delete 4, store it on Ta1.

ii) As 4 is from Tb2, insert the next element of Tb2, i.e. 8, in the heap.

2) Build heap for 15, 8, 12.

i) Delete 8, store it on Ta1.

ii) As 8 is from Tb2, insert the next element, i.e. 9, in the heap.

3) Build heap for 15, 9, 12.

i) Delete 9, store it on Ta1.

ii) There is no next record in the first block of Tb2.

iii) Select the next element from Tb1, i.e. 20, and insert it in the heap.

4) Build heap for 15, 12, 20.

i) Delete 12, store it on Ta1.

ii) As 12 is from Tb3, select the element after 12, i.e. 30, and insert it in the heap.

5) Proceeding this way, we get:

Ta1: 4 8 9 12 15 20 30 40 47

Step 2: Similarly, by constructing a heap for the second blocks (11 17 56 on Tb1 and 28 35 on Tb2)
and performing deleteMin, we get

Ta2: 11 17 28 35 56

Now we have two tapes, Ta1 and Ta2. We will now build a heap for 4 and 11 (i.e. the first elements
of Ta1 and Ta2).

1) Build heap for 4, 11.

i) Delete 4, store it on Tb1.

ii) As 4 is from Ta1, insert the next element of Ta1, i.e. 8, in the heap.

2) Build heap for 8, 11.

i) Delete 8, store it on Tb1.

ii) The element after 8 on Ta1 is 9, so insert it in the heap.

3) Build heap for 9, 11.

i) Delete 9, store it on Tb1.

ii) The element after 9 on Ta1 is 12, so insert it in the heap.

4) Proceeding in this manner, we get the sorted list on Tb1 as:

Tb1: 4 8 9 11 12 15 17 20 28 30 35 40 47 56
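
The heap-based merging used in this example can be sketched in C++ with std::priority_queue configured as a min-heap. The function name heapKWayMerge is hypothetical, and the blocks are held in memory rather than on tapes. Note one deliberate difference from the walkthrough: when a block is exhausted, this sketch simply lets the heap shrink instead of inserting an element from another tape. The merged output is the same.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges k sorted blocks into one sorted list using a min-heap.
std::vector<int> heapKWayMerge(const std::vector<std::vector<int>>& blocks) {
    // Each heap entry is (value, index of the block it came from).
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // next[b] is the position of the next unread element of block b.
    std::vector<std::size_t> next(blocks.size(), 0);

    // Build the heap from the first element of every non-empty block.
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        if (!blocks[b].empty()) {
            heap.push(Entry(blocks[b][0], b));
            next[b] = 1;
        }
    }

    std::vector<int> output;
    while (!heap.empty()) {
        // deleteMin: remove the smallest element and store it on the output.
        const Entry smallest = heap.top();
        heap.pop();
        output.push_back(smallest.first);

        // Insert the next element of the block the deleted element came from.
        const std::size_t b = smallest.second;
        if (next[b] < blocks[b].size()) {
            heap.push(Entry(blocks[b][next[b]], b));
            ++next[b];
        }
    }
    return output;
}
```

Each deleteMin and insert costs O(log k), so merging n elements from k blocks takes O(n log k) comparisons.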

Algorithm/Pseudocode:

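Algorithm KWayMerge(Item, k):

    // INPUT: Item (k sorted lists)
    // OUTPUT: all items processed in sorted order

    while not all lists in Item are empty:
        minItem <- MinIndex(Item, k)    // index of the list whose current item is smallest
        processItem(minItem)            // output the current item of that list
        if minItem is exhausted:
            continue to the next list
        else:
            MoreItems[minItem] <- NextItemInTheList(minItem)    // advance that list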
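
One possible C++ reading of this pseudocode is sketched below. The names minIndex and kWayMerge are assumptions, "processing" an item is taken to mean appending it to the output, and a position counter per list replaces NextItemInTheList. Unlike a heap, this version scans all k lists to find the minimum, which is simple but costs O(k) per item.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the list whose current item is smallest,
// or lists.size() if every list is exhausted.
std::size_t minIndex(const std::vector<std::vector<int>>& lists,
                     const std::vector<std::size_t>& pos) {
    std::size_t best = lists.size();
    for (std::size_t i = 0; i < lists.size(); ++i) {
        if (pos[i] == lists[i].size()) {
            continue;  // list i is exhausted, skip it
        }
        if (best == lists.size() || lists[i][pos[i]] < lists[best][pos[best]]) {
            best = i;
        }
    }
    return best;
}

// Merges k sorted lists by repeatedly taking the smallest current item.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& lists) {
    std::vector<std::size_t> pos(lists.size(), 0);  // current position in each list
    std::vector<int> output;
    for (std::size_t m = minIndex(lists, pos); m != lists.size();
         m = minIndex(lists, pos)) {
        output.push_back(lists[m][pos[m]]);  // process the smallest item
        ++pos[m];                            // move to the next item in that list
    }
    return output;
}
```

The linear scan is adequate for small k; for many lists, a min-heap reduces the cost of each selection to O(log k).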

Comparison between Sequential, Indexed Sequential, Direct Access File

Organization
• Sequential File: Records are stored in sequential order.
• Indexed Sequential File: Records are stored sequentially with an index.
• Direct Access File: Records are placed randomly.

Access Type
• Sequential File: Only sequential access is possible.
• Indexed Sequential File: Both sequential and direct access possible.
• Direct Access File: Direct (random) access only.

Searching
• Sequential File: Slow for large files (linear search).
• Indexed Sequential File: Faster due to index.
• Direct Access File: Very fast (direct lookup).

Insertion/Deletion
• Sequential File: Difficult; may require rewriting the file.
• Indexed Sequential File: Easier than sequential, but index needs updating.
• Direct Access File: Easy; just add or remove record.

Use Case
• Sequential File: When all records are processed in order.
• Indexed Sequential File: When both sequential and quick direct access are needed.
• Direct Access File: When immediate access to any record is needed.

Example
• Sequential File: Payroll processing (monthly salary list).
• Indexed Sequential File: Student records with roll number indexing.
• Direct Access File: Airline reservation systems.
