Custom File Allocation System
A MINI-PROJECT REPORT
Submitted By
Aditya Jayant[RA2311003020767]
Annie Rishona[RA2311003020768]
Harshavardhan[RA2311003020769] in
of
BACHELOR OF TECHNOLOGY
in
OCTOBER 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University U/S 3 of UGC Act, 1956)
BONAFIDE CERTIFICATE
Certified that this mini project report titled “Custom File Allocation
System” is the bonafide work of “Aditya Jayant (RA2311003020767),
Annie Rishona (RA2311003020768), Harshavardhan (RA2311003020769)”
of CSE ‘L’ submitted for the course 21CSC202J (Operating Systems)
for the Academic year 2024 - 2025 Odd Semester
SIGNATURE
Dr. Faritha Banu J, M.E., Ph.D.,
Associate Professor,
DECLARATION
We hereby declare that the entire work contained in this mini project
report titled “CUSTOM FILE ALLOCATION SYSTEM” has been
carried out by Aditya Jayant (RA2311003020767), Annie Rishona
(RA2311003020768), Harshavardhan
Aditya Jayant
Annie Rishona
Harshavardhan
Place: Chennai Date: 04/10/2024
ABSTRACT
last_login
2. Files Table
● Purpose: To store information about files in the system.
● Columns:
o file_id (Primary Key)
o file_name
o file_extension
o file_size (e.g., in bytes)
o file_path (location of the file)
o owner_id (Foreign Key from Users table)
o created_at
o modified_at
o accessed_at
3. Directories Table
● Purpose: To store directory structures.
● Columns:
o directory_id (Primary Key)
o directory_name
o parent_directory_id (Self-referencing Foreign Key to represent nested
directories)
o owner_id (Foreign Key from Users table)
o created_at
o modified_at
4. File Metadata Table
● Purpose: To store additional metadata for files.
● Columns:
o metadata_id (Primary Key)
o file_id (Foreign Key from Files table)
o key (e.g., "author", "last_modified_by")
o value (The associated value for the metadata key)
5. File Permissions Table
● Purpose: To manage file permissions (read, write, execute) for different
users or groups.
● Columns:
o permission_id (Primary Key)
o file_id (Foreign Key from Files table)
o user_id (Foreign Key from Users table)
o permission_type (e.g., read, write, execute)
o granted_at
6. Groups Table
● Purpose: To manage user groups for easier permission management.
● Columns:
o group_id (Primary Key)
o group_name
o created_at
7. Group Members Table
● Purpose: To track users belonging to different groups.
● Columns:
o group_member_id (Primary Key)
o group_id (Foreign Key from Groups table)
o user_id (Foreign Key from Users table)
8. File Versions Table
● Purpose: To manage different versions of a file.
● Columns:
o version_id (Primary Key)
o file_id (Foreign Key from Files table)
o version_number
o created_at
o version_description
9. Audit Logs Table
● Purpose: To track file system activities for security and auditing purposes.
● Columns:
o log_id (Primary Key)
o user_id (Foreign Key from Users table)
o file_id (Foreign Key from Files table, nullable if it's a directory action)
o action (e.g., create, delete, update, move)
o timestamp
10. Storage Devices Table
● Purpose: To manage the physical or virtual devices where files are stored.
● Columns:
o device_id (Primary Key)
o device_name
o total_capacity (e.g., in GB or TB)
o available_space
o mounted_at
11. File Allocation Table
● Purpose: To map file data blocks to their actual locations on the storage
device.
● Columns:
o allocation_id (Primary Key)
o file_id (Foreign Key from Files table)
o block_number (Location of the file chunk in storage)
o device_id (Foreign Key from Storage Devices table)
o allocated_at
12. File Tags Table
● Purpose: To allow tagging files for better organization or searching.
● Columns:
o tag_id (Primary Key)
o file_id (Foreign Key from Files table)
o tag_name
13. File Locks Table
● Purpose: To manage file locks, preventing concurrent edits.
● Columns:
o lock_id (Primary Key)
o file_id (Foreign Key from Files table)
o locked_by (Foreign Key from Users table)
o locked_at
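As an illustration of how part of this schema might be realized, the following minimal sketch creates the Files, Storage Devices, and File Allocation tables in an in-memory SQLite database using Python's standard sqlite3 module. The table and column names follow the lists above; the column types, the lower-case table names, the sample rows, and the uniqueness constraint on (device_id, block_number) are assumptions made for this example and are not part of the original design.

```python
import sqlite3

# Assumption: an in-memory database is enough for a demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Column types and constraints below are illustrative guesses.
conn.executescript("""
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    file_name  TEXT NOT NULL,
    file_size  INTEGER NOT NULL              -- size in bytes
);
CREATE TABLE storage_devices (
    device_id   INTEGER PRIMARY KEY,
    device_name TEXT NOT NULL
);
CREATE TABLE file_allocation (
    allocation_id INTEGER PRIMARY KEY,
    file_id       INTEGER NOT NULL REFERENCES files(file_id),
    block_number  INTEGER NOT NULL,
    device_id     INTEGER NOT NULL REFERENCES storage_devices(device_id),
    allocated_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (device_id, block_number)         -- one file chunk per block
);
""")

# Hypothetical sample data: one 8 KB file stored in blocks 0 and 1 of disk0.
conn.execute("INSERT INTO files VALUES (1, 'report.txt', 8192)")
conn.execute("INSERT INTO storage_devices VALUES (1, 'disk0')")
conn.executemany(
    "INSERT INTO file_allocation (file_id, block_number, device_id) VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 1, 1)],
)

# Look up the blocks that hold file 1, in block order.
blocks = [row[0] for row in conn.execute(
    "SELECT block_number FROM file_allocation WHERE file_id = ? ORDER BY block_number",
    (1,),
)]
print(blocks)  # [0, 1]
```

With the uniqueness constraint in place, an attempt to allocate block 0 of disk0 a second time is rejected by the database rather than silently overwriting the mapping.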
7. Directory Structure
● Description: This visualization depicts the hierarchical directory structure
used to organize files and directories in the system. It shows the parent-child
relationships between directories and files, helping users navigate the file system
added, moved, or deleted.
8. File Allocation Table (FAT) Representation
● Description: This figure shows how the File Allocation Table (FAT) works,
a popular method used in file systems like FAT32. It maps file blocks to their
physical locations on disk and helps in managing free and occupied spaces.
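The idea behind a FAT-style table can be sketched in a few lines of Python. In this simplified model each table entry holds the number of the next block of the same file, a directory maps a file name to its first block, and unused entries double as the free-space map. The marker values EOC and FREE, the table size, and the file name are assumptions chosen for the example; real FAT32 uses reserved 32-bit cluster values instead.

```python
EOC = -1     # end-of-chain marker (assumed value for this sketch)
FREE = None  # marks an unallocated block (assumed value for this sketch)


def follow_chain(fat, start_block):
    """Return the blocks of a file, in order, by walking its FAT chain."""
    chain, block, seen = [], start_block, set()
    while block != EOC:
        if block in seen or fat[block] is FREE:
            raise ValueError(f"corrupt chain at block {block}")
        seen.add(block)
        chain.append(block)
        block = fat[block]  # each entry points to the next block of the file
    return chain


# An 8-block disk holding one file stored in blocks 2 -> 5 -> 3.
fat = [FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, EOC
directory = {"notes.txt": 2}  # file name -> first block

print(follow_chain(fat, directory["notes.txt"]))  # [2, 5, 3]

# The same table also tells us which blocks are free.
free_blocks = [i for i, entry in enumerate(fat) if entry is FREE]
print(free_blocks)  # [0, 1, 4, 6, 7]
```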
9. Fragmentation: Internal vs. External
● Description: This figure contrasts internal and external fragmentation in
file allocation. Internal fragmentation occurs when allocated blocks are larger than
the needed file size, wasting space. External fragmentation happens when free
space is scattered across the disk, preventing large files from being stored
contiguously.
● Detailed Explanation: The figure would use examples to demonstrate both
types of fragmentation.
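Both kinds of fragmentation can be quantified with simple arithmetic. The sketch below is illustrative only: the function names, the 4 KB block size, and the sample free-space map are assumptions made for this example.

```python
import math


def internal_fragmentation(file_size, block_size):
    """Bytes wasted in the last block allocated to a file."""
    blocks = math.ceil(file_size / block_size)
    return blocks * block_size - file_size


def largest_free_run(free_map):
    """Length of the longest run of consecutive free blocks."""
    best = run = 0
    for is_free in free_map:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


# Internal fragmentation: a 6 KB file in 4 KB blocks occupies 2 blocks (8 KB).
wasted = internal_fragmentation(file_size=6 * 1024, block_size=4 * 1024)
print(wasted)  # 2048

# External fragmentation: 5 blocks are free, but never more than 2 in a row.
free_map = [True, False, True, True, False, True, False, True]
total_free = sum(free_map)
longest = largest_free_run(free_map)
print(total_free, longest)  # 5 2
```

In the second case a 3-block file cannot be stored contiguously even though 5 blocks are free, which is exactly the situation external fragmentation describes.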
LIST OF ABBREVIATIONS: -
1. CFAS – Custom File Allocation System
● Definition: A system designed to efficiently allocate and manage file storage on
a disk. CFAS uses a combination of file allocation methods such as contiguous,
linked, and indexed allocation to manage files, aiming to optimize space utilization
and speed of access.
2. FAT – File Allocation Table
● Definition: A legacy file system used in many operating systems (e.g., DOS,
Windows). FAT keeps track of where files are stored on a disk by maintaining a
table that maps each file to the physical location of its blocks.
3. SSD – Solid-State Drive
● Definition: A modern type of storage device that uses flash memory instead of
magnetic disks (used in HDDs) to store data. SSDs offer faster data access speeds
and better durability than traditional hard drives.
4. HDD – Hard Disk Drive
● Definition: A traditional storage device that stores data magnetically on spinning
disks (platters). Though slower than SSDs, HDDs typically offer higher storage
capacities at a lower cost.
5. OS – Operating System
● Definition: The software that manages computer hardware and provides services
for computer programs. Examples include Windows, Linux, and macOS. The OS
plays a critical role in managing file systems and disk allocation.
6. FAT32 – File Allocation Table 32-bit
● Definition: An improved version of the FAT file system that uses 32 bits for
addressing file locations. It supports larger file sizes and disk partitions compared
to the older FAT16.
7. NTFS – New Technology File System
● Definition: A modern file system developed by Microsoft for the
Windows operating system, offering better security, support for large files, and
advanced features like file compression and encryption compared to FAT-based
systems.
8. I/O – Input/Output
● Definition: Refers to the communication between a computer system and the
outside world (e.g., user input from a keyboard, or output displayed on a screen).
I/O operations in the context of file systems involve reading from or writing to
disk storage.
9. RAM – Random Access Memory
● Definition: A type of computer memory that can be accessed randomly. RAM is
used for storing data that is actively being processed by the CPU. Unlike SSD or
HDD, RAM is volatile, meaning data is lost when the system powers down.
10. CPU – Central Processing Unit
● Definition: The "brain" of a computer that performs instructions from programs.
In the context of file systems, the CPU coordinates file access, allocation, and data
transfer between memory (RAM) and disk storage (HDD/SSD).
11. FS – File System
● Definition: A method and data structure that an operating system uses to control
how data is stored and retrieved on storage devices (e.g., SSD, HDD).
Examples include FAT, NTFS, ext4 (Linux), and APFS (macOS).
12. VFS – Virtual File System
● Definition: An abstraction layer that provides a common interface for different
file systems, allowing an operating system to support multiple types of file systems
(e.g., FAT, NTFS) without requiring specific code for each.
13. FMS – File Management System
● Definition: Software that helps users and applications organize, store, retrieve,
and manage files. A file management system is responsible for creating, deleting,
and modifying files, as well as organizing files into directories.
14. POSIX – Portable Operating System Interface
● Definition: A set of standards specified by the IEEE for maintaining
compatibility between operating systems. POSIX defines APIs (application
programming interfaces) for functions like file operations, making it easier to
write cross-platform software.
15. API – Application Programming Interface
● Definition: A set of rules and tools that allows software applications to
communicate with each other. In the context of file systems, APIs allow programs
to interact with the OS to perform file operations like opening, reading, and writing
files.
16. UUID – Universally Unique Identifier
● Definition: A 128-bit identifier used to uniquely identify objects, such as files,
across systems and networks. UUIDs ensure that identifiers remain unique even
across different systems, making them useful for tracking files in distributed
systems.
17. ACL – Access Control List
● Definition: A list associated with a file or directory that specifies which users
or system processes have permissions (such as read, write, execute) to access it.
ACLs are used for enforcing security in file systems.
18. EOF – End of File
● Definition: A condition (or, in some older formats, a special marker character)
that indicates the end of a file. File reading operations report EOF to signal that
no more data can be read from the file.
19. MBR – Master Boot Record
● Definition: The first sector of a storage device (such as a hard drive) that contains
information about the partitions on the disk and the code needed to start the boot
process of an operating system.
20. GPT – GUID Partition Table
● Definition: A modern partitioning system for disks that supports larger disk sizes
and a more flexible partitioning scheme compared to MBR. GPT uses GUIDs
(Globally Unique Identifiers) to identify partitions and keeps a checksummed
backup copy of the partition table, which improves reliability.
1. Software Requirements
a. Operating System
● Windows, Linux, or macOS – The development and deployment of CFAS
should be compatible with a major operating system to simulate real-world usage.
Linux might be preferred for low-level file system manipulation.
b. Programming Language
● C/C++ – Ideal for low-level system programming and file system
operations, providing fine-grained control over memory and disk management.
● Python – Useful for developing simulation tools or prototypes for file
allocation methods, although not as efficient for low-level disk operations.
c. Development Tools and IDEs
● Visual Studio (for Windows), Code::Blocks, or Eclipse – Integrated
development environments (IDEs) that support C/C++ for system-level
programming.
● GCC (GNU Compiler Collection) – For compiling C/C++ programs,
particularly in Linux environments.
● GDB (GNU Debugger) – For debugging and troubleshooting CFAS during
development.
d. Libraries and APIs
● POSIX API – To provide compatibility for file operations like creating,
reading, writing, and manipulating files and directories.
● Filesystem Libraries (e.g., <filesystem> in C++) – For easier manipulation
of file paths and directories in simulations or higher-level operations.
● SQLite – Optional, for managing metadata and version control simulations
if a database structure is needed.
e. Virtual Machine or Emulator (Optional)
● VMware Workstation, Oracle VirtualBox, or QEMU – These tools can be
used to create isolated environments to test and simulate CFAS across different
operating systems and file systems without affecting the host machine.
f. Version Control
●
2. Hardware Requirements
a. Processor (CPU)
● Multi-core processor (Intel i5/i7, AMD Ryzen, or equivalent) – Required to
handle multiple tasks during the simulation and processing of file allocation
operations. A modern, mid-range processor should suffice.
b. Memory (RAM)
● 4 GB RAM minimum – Sufficient for basic CFAS development and
simulation tasks.
● 8 GB or higher – Recommended for testing the system with large files or
complex simulations that require higher memory throughput.
c. Storage (HDD/SSD)
● HDD (Hard Disk Drive) – Traditional disk drives can be used for testing
CFAS’s ability to manage large-scale data, block management, and fragmentation
in a real-world HDD environment.
● SSD (Solid-State Drive) – Useful for testing CFAS’s performance under
● file sets.
d. Disk Partitioning Tool
● Tools like GParted (for Linux) or built-in disk management utilities in
Windows/macOS can help in creating and managing different partitions for testing
CFAS under various conditions (e.g., with FAT, NTFS, ext4 file systems).
e. Graphics (Optional)
● Basic GPU (Graphics Processing Unit) – Not strictly necessary but could be
useful for visualizing file system structure (e.g., visual simulations of
fragmentation or file allocation) using graphical tools.
f. Network (Optional)
● Local Network Access – For testing distributed file systems or if CFAS needs
to be deployed and tested across multiple machines.
3. Optional/Additional Tools
a. Hardware Disk Profiling Tools
● Tools like SMART monitoring utilities (e.g., smartctl) or
CrystalDiskInfo can be used to track the health and performance of physical
storage devices (HDD/SSD) when running CFAS simulations.
b. Backup Tools
● External Drive or Cloud Storage – For regular backups of simulation data,
code, and system configurations during development.
c. Data Visualization Tools
● Matplotlib (Python) or Excel – To graphically visualize data on file allocation
efficiency, fragmentation levels, and performance metrics from simulation
outputs.
Conclusion:
The above requirements provide the necessary tools and environment to develop,
simulate, and test the Custom File Allocation System (CFAS). A focus on
efficient memory management, disk allocation, and simulation tools will help
optimize CFAS's functionality for various file system scenarios.
PROJECT DESCRIPTION: -
Project Significance:
This project is particularly relevant for students, researchers, and professionals
interested in file systems and disk management. CFAS offers a practical and
educational simulation of the core principles of file allocation systems,
demonstrating how files are stored and managed on disk in real-world operating
systems. By providing an environment to test different file allocation strategies,
CFAS allows users to analyze the trade-offs between speed, space efficiency, and
fragmentation.
Potential Use Cases:
● Educational Tool: CFAS serves as a learning tool for computer science
students who want to understand how file allocation works in file systems
like FAT, NTFS, or ext4.
● File System Research: The project can be extended and used for
PROJECT DESCRIPTION
1.INTRODUCTION
A custom file allocation system is a specialized solution designed to manage the
storage, access, and organization of files in ways that address specific operational
challenges not fully met by standard file systems like FAT, NTFS, or ext4. These
custom systems are often necessary in environments that require high
performance, scalability, or unique file handling, such as cloud computing, large-
scale data centers, or multimedia platforms. By incorporating advanced techniques
such as extent-based or indexed allocation, data deduplication, compression, and
caching, custom file systems optimize for speed and efficiency. They also provide
enhanced fault tolerance through methods like journaling and redundancy,
ensuring data integrity even in the event of failures. In distributed environments,
custom file allocation systems help manage data across multiple servers,
improving load balancing and access times while supporting scalability. These
systems can also be tailored with specific security features, such as encryption and
custom access control, ensuring that sensitive data is protected. Overall, custom
file allocation systems offer organizations the flexibility and performance needed
to handle specialized workloads that traditional systems cannot effectively
manage.
Additionally, custom file allocation systems frequently incorporate advanced data
management features. Deduplication eliminates redundant data across the system,
freeing up valuable storage space, while compression techniques reduce file size
without significantly impacting access speed, particularly useful for environments
that store large datasets, such as multimedia files or sensor data in IoT systems.
Caching mechanisms enhance performance by storing frequently accessed data in
faster, temporary storage, reducing access latency and improving user experience,
especially in scenarios with high read frequencies, like content delivery networks
(CDNs).
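To make the deduplication idea concrete, the following minimal sketch stores each distinct chunk of data only once and lets files refer to chunks by their SHA-256 digest. The class name DedupStore, the 4-byte chunk size, and the sample file contents are assumptions chosen to keep the example small; a real system would use much larger chunks and persistent storage.

```python
import hashlib


class DedupStore:
    """Hypothetical chunk store that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data, chunk_size=4):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # skip chunks already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
store.put("a.bin", b"AAAABBBBAAAA")
store.put("b.bin", b"BBBBCCCC")

references = sum(len(d) for d in store.files.values())
print(references, len(store.chunks))  # 5 3
print(store.get("a.bin"))             # b'AAAABBBBAAAA'
```

The two files reference five chunks in total, but only three distinct chunks are kept, so the repeated "AAAA" and "BBBB" data costs no extra space.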
Security is another critical advantage of custom file systems, allowing the
incorporation of advanced encryption methods to protect data both at rest and in
transit. This is often complemented by fine-grained access controls, which ensure
that only authorized users or systems can interact with sensitive data. In industries
like healthcare, finance, and government, where data privacy and security are
paramount, custom file systems can be designed to meet stringent regulatory
requirements (e.g., HIPAA, GDPR).
Custom file allocation systems are also resilient, often implementing journaling or
write-ahead logging to ensure that changes to the file system are recorded before
being committed, protecting the system from crashes and power failures.
Some systems also adopt RAID (Redundant Array of Independent Disks)
configurations or other forms of data replication to enhance reliability, ensuring
that even if one disk or node fails, data remains accessible from another disk,
node, or backup. This redundancy is crucial for mission-critical applications,
ensuring continuous operation and minimal downtime.
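The journaling principle described above can be sketched as follows: every change is appended to a log before the live block map is modified, so the state can be rebuilt by replaying the log after a crash. This is a simplified in-memory illustration; the class name JournaledBlockMap and the "alloc"/"free" operation names are assumptions, and a real journal would be written to stable storage and truncated at checkpoints.

```python
class JournaledBlockMap:
    """Hypothetical free-block map protected by a redo journal."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.free = [True] * total_blocks
        self.journal = []  # stands in for an append-only log on stable storage

    def _apply(self, op, block):
        self.free[block] = (op == "free")

    def update(self, op, block):
        self.journal.append((op, block))  # 1. record the intent first
        self._apply(op, block)            # 2. then change the live state

    def recover(self):
        """Rebuild the block map by replaying the journal from the start."""
        self.free = [True] * self.total_blocks
        for op, block in self.journal:
            self._apply(op, block)


fs = JournaledBlockMap(4)
fs.update("alloc", 0)
fs.update("alloc", 2)
fs.update("free", 0)

fs.free = [True] * 4  # simulate losing the in-memory state in a crash
fs.recover()
print(fs.free)  # [True, True, False, True]
```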
In summary, a custom file allocation system is a highly specialized tool designed
to handle unique challenges related to data management in high-performance and
large-scale environments. By optimizing data placement, improving access
speeds, enhancing fault tolerance, reducing storage waste, and providing robust
security measures, custom systems offer significant advantages for organizations
that need to manage massive amounts of data efficiently and reliably. Whether it’s
for optimizing distributed cloud storage, managing high-throughput databases, or
ensuring secure, scalable data access, custom file systems provide the flexibility
and control required to meet modern data demands.
Fig.2.1
1. Storage Layer: This is the foundation, where the actual file data is stored. It
employs various file allocation strategies such as:
o Contiguous Allocation: Files are stored in consecutive disk blocks,
improving sequential access but risking external fragmentation.
o Indexed Allocation: Files have an index of pointers to their data blocks,
offering flexible access.
o Extent-Based Allocation: Files are stored in contiguous extents (large
blocks), minimizing fragmentation while allowing for flexible expansion.
2. Data Management Layer: This layer focuses on improving efficiency and
performance through techniques like:
o Deduplication: Identifies and eliminates duplicate files or data blocks to save
storage space.
o Caching: Temporarily stores frequently accessed files in faster storage for
quicker retrieval.
o Compression: Reduces file sizes to optimize storage capacity and data
transmission.
3. File System Layer: This top layer manages how users and systems access the
files, ensuring secure and reliable operations:
o File Access: Manages file reads, writes, and modifications.
o Security: Includes encryption for data protection and access controls to
regulate permissions.
o Fault Tolerance: Mechanisms like RAID and journaling ensure data
reliability, preventing loss in case of system crashes or failures.
This layered architecture optimizes file storage, retrieval, and management,
ensuring that the system meets specific performance, scalability, and security
needs.
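The difference between indexed and extent-based allocation in the storage layer can be illustrated by converting a per-block list into extents. In this sketch an extent is assumed to be a (start_block, length) pair; the function name and the sample block numbers are chosen only for the example.

```python
def blocks_to_extents(blocks):
    """Collapse a sorted list of block numbers into (start, length) extents."""
    extents = []
    for block in blocks:
        if extents and block == extents[-1][0] + extents[-1][1]:
            # The block directly follows the last extent, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Otherwise start a new extent of length 1.
            extents.append((block, 1))
    return extents


# Seven block pointers (indexed allocation) become three extents.
extents = blocks_to_extents([4, 5, 6, 7, 12, 13, 20])
print(extents)  # [(4, 4), (12, 2), (20, 1)]
```

The fewer and longer the extents, the less metadata the file needs and the more of it can be read sequentially.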
Contiguous allocation is suitable for sequential access applications (e.g., video
streaming) because all of a file's blocks are stored together. This simplifies access
but requires knowing the file's size upfront, and it is prone to fragmentation as
files grow or shrink.
3.Algorithm used
1. File Allocation Algorithms
● First Fit Algorithm: This simple allocation strategy scans the free list and
takes the first free region that is large enough to accommodate the file being
stored. It is fast but can lead to fragmentation over time.
● Best Fit Algorithm: This algorithm searches the entire list of free regions
and allocates the smallest one that fits the requested size. While this reduces the
space wasted per allocation, it tends to leave many small, unusable leftover
fragments and is slower than first fit because it examines every free region.
● Worst Fit Algorithm: Contrary to best fit, the worst fit algorithm allocates
from the largest available free region. This leaves larger remaining free spaces
after each allocation, but it is less commonly used because it quickly breaks up
the large regions that big files need.
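The three strategies differ only in which free region they pick, as the following minimal sketch shows. Free space is modelled here as a list of (start, length) holes measured in blocks; this representation, the function names, and the sample values are assumptions made for the example.

```python
def first_fit(holes, size):
    """Index of the first hole that is large enough, or None."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is large enough, or None."""
    fits = [(length, i) for i, (start, length) in enumerate(holes) if length >= size]
    return min(fits)[1] if fits else None


def worst_fit(holes, size):
    """Index of the largest hole that is large enough, or None."""
    fits = [(length, i) for i, (start, length) in enumerate(holes) if length >= size]
    return max(fits)[1] if fits else None


holes = [(0, 5), (10, 3), (20, 8)]  # (start block, length in blocks)
request = 3

print(first_fit(holes, request))  # 0 -> hole of length 5
print(best_fit(holes, request))   # 1 -> hole of length 3 (exact fit)
print(worst_fit(holes, request))  # 2 -> hole of length 8
```

First fit stops at the first match, while best fit and worst fit must both examine every hole before deciding.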
2. Indexing Algorithms
● B-Trees/B+ Trees: These tree data structures are commonly used in
databases for indexing files, allowing for efficient insertion, deletion, and
searching operations. They maintain sorted data and enable quick retrieval of file
locations.
●
4. Compression Algorithms
6.Output Results
1. Performance Metrics
● File Retrieval Speed: Measure the time taken to retrieve files using different
indexing and caching algorithms. A reduction in retrieval time compared to
traditional systems indicates improved efficiency.
● Storage Utilization: Analyze the amount of storage space used versus the
total available space. Effective allocation algorithms should minimize
fragmentation and maximize usable storage.
● Data Access Latency: Evaluate the latency associated with file access,
including both read and write operations. Lower latency times reflect a more
responsive system.
● Throughput: Measure the number of files that can be read or written per unit
time. A higher throughput indicates the system's ability to handle large volumes of
data effectively.
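The metrics above could be collected along the following lines. This is a hedged sketch rather than the measurement code used in the project: the function names, the sample free-block map, and the dummy operation being timed are all assumptions, and actual timings depend on the machine.

```python
import time


def storage_utilization(free_map):
    """Fraction of blocks in use, given a map where True means free."""
    used = sum(1 for is_free in free_map if not is_free)
    return used / len(free_map)


def measure(operation, repetitions):
    """Return (mean latency in seconds, operations per second)."""
    start = time.perf_counter()
    for _ in range(repetitions):
        operation()
    elapsed = time.perf_counter() - start
    return elapsed / repetitions, repetitions / elapsed


# Storage utilization: 3 of 8 blocks are in use.
free_map = [False, False, True, False, True, True, True, True]
utilization = storage_utilization(free_map)
print(utilization)  # 0.375

# Latency and throughput of a placeholder operation standing in for a file read.
latency, throughput = measure(lambda: sum(range(1000)), repetitions=1000)
print(f"mean latency: {latency:.6f} s, throughput: {throughput:.0f} ops/s")
```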
7.Conclusion
In conclusion, the custom file allocation system project showcases a
comprehensive and innovative approach to file management that significantly
enhances efficiency, performance, and reliability in handling data. By carefully
selecting and implementing a variety of specialized algorithms—such as those for
file allocation, indexing, caching, compression, and fault tolerance—the system is
able to optimize storage utilization and minimize data retrieval times, resulting in
a more responsive and user-friendly experience.
The architecture of the system is designed to be modular and scalable, allowing it
to adapt to different workloads and accommodate future growth in data volume
and user demand. This scalability is crucial in today’s dynamic digital landscape,
where the volume of data continues to expand exponentially.
Additionally, the integration of redundancy and robust data recovery mechanisms
enhances data integrity, ensuring that critical information remains accessible even
in the face of hardware failures or data corruption. Techniques such as RAID
configurations and replication strategies are employed to provide a safety net.
8.Appendix
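import math

class File:
    def __init__(self, name, size, block_size, blocks, contiguous):
        self.name = name
        self.size = size
        self.block_size = block_size
        self.blocks = blocks  # List of blocks allocated to this file
        self.contiguous = contiguous  # Whether the file was allocated contiguously

class CustomFileSystem:
    def __init__(self, total_blocks, block_size):
        self.total_blocks = total_blocks
        self.block_size = block_size
        self.free_blocks = [True] * total_blocks  # True means the block is free
        self.files = {}

    # The three helpers below are called in the listing but their bodies are
    # not shown in the report; these are reconstructed, assumed implementations.
    def find_contiguous_blocks(self, needed_blocks):
        run = []
        for block, is_free in enumerate(self.free_blocks):
            run = run + [block] if is_free else []
            if len(run) == needed_blocks:
                return run  # First run of free blocks that is long enough
        return None

    def allocate_indexed_blocks(self, needed_blocks):
        # Any free blocks will do; they need not be adjacent
        free = [b for b, is_free in enumerate(self.free_blocks) if is_free]
        return free[:needed_blocks] if len(free) >= needed_blocks else None

    def allocate_blocks(self, file_name, file_size, blocks, contiguous):
        for block in blocks:
            self.free_blocks[block] = False  # Mark blocks as used
        self.files[file_name] = File(file_name, file_size, self.block_size, blocks, contiguous)
        method = "Contiguous" if contiguous else "Indexed"
        print(f"File '{file_name}' allocated using {method} Allocation at blocks: {blocks}")

    # Method name and the block-count formula are assumed; sizes are in KB
    def allocate_file(self, file_name, file_size):
        needed_blocks = math.ceil(file_size / self.block_size)
        contiguous_blocks = self.find_contiguous_blocks(needed_blocks)
        if contiguous_blocks:
            # Allocate contiguously if possible
            self.allocate_blocks(file_name, file_size, contiguous_blocks, contiguous=True)
        else:
            # Fallback to indexed allocation
            index_blocks = self.allocate_indexed_blocks(needed_blocks)
            if index_blocks:
                self.allocate_blocks(file_name, file_size, index_blocks, contiguous=False)
            else:
                print(f"Error: Could not allocate file '{file_name}'.")

    # Method name assumed; the report shows only the body
    def deallocate_file(self, file_name):
        file = self.files.pop(file_name)
        for block in file.blocks:
            self.free_blocks[block] = True  # Mark blocks as free
        print(f"File '{file_name}' deallocated.")

    def display_allocation(self):
        print("File Allocation Table:")
        for file_name, file in self.files.items():
            allocation_type = "Contiguous" if file.contiguous else "Indexed"
            print(f"File: {file_name}, Size: {file.size} KB, Blocks: {file.blocks}, {allocation_type}")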
Expected output:
File 'file1' allocated using Contiguous Allocation at blocks: [0, 1]
File 'file2' allocated using Contiguous Allocation at blocks: [2, 3]
File 'file3' allocated using Indexed Allocation at blocks: [4, 5, 6, 7, 8]
File Allocation Table:
File: file1, Size: 8 KB, Blocks: [0, 1], Contiguous
File: file2, Size: 6 KB, Blocks: [2, 3], Contiguous
File: file3, Size: 20 KB, Blocks: [4, 5, 6, 7, 8], Indexed
File 'file2' deallocated.
File Allocation Table:
File: file1, Size: 8 KB, Blocks: [0, 1], Contiguous
File: file3, Size: 20 KB, Blocks: [4, 5, 6, 7, 8], Indexed
OUTPUT