File Management
File
“A file is a named collection of related information that is recorded on secondary
storage such as magnetic disks, magnetic tapes and optical disks.”
“In general, a file is a sequence of bits, bytes, lines or records whose meaning is
defined by the file's creator and user.”
File Attributes
A file has certain other attributes, which vary from one operating system to another, but
typically consist of these:
Name:
The file name is the only information kept in human readable form.
Identifier:
This unique tag, usually a number, identifies the file within the file system.
Type:
This information is needed for those systems that support different types.
Location:
This information is a pointer to a device and to the location of the file on that
device.
Size:
The current size of the file (in bytes, words, or blocks), and possibly the
maximum allowed size are included in this attribute.
Protection:
Access-control information determines who can read, write, or execute the
file.
Time, date, and user identification:
This information may be kept for creation, last modification, and last use.
Operations on file
Six basic file operations: The OS can provide system calls to create, write, read,
reposition, delete, and truncate files.
Creating a file:
Two steps are necessary to create a file.
Space in the file system must be found for the file.
An entry for the new file must be made in the directory.
Writing a file:
To write a file, we make a system call specifying both the name of the file
and the information to be written to the file.
The system must keep a write pointer to the location in the file where the
next write is to take place.
The write pointer must be updated whenever a write occurs.
Reading a file:
To read from a file, we use a system call that specifies the name of the file
and where (in memory) the next block of the file should be put.
The system needs to keep a read pointer to the location in the file where the
next read is to take place.
Deleting a file:
To delete a file, we search the directory for the named file.
Having found the associated directory entry, we release all file space, so
that it can be reused by other files, and erase the directory entry.
Truncating a file:
The user may want to erase the contents of a file but keep its attributes.
Rather than forcing the user to delete the file and then recreate it, this
function allows all attributes to remain unchanged (except for file length)
but lets the file be reset to length zero and its file space released.
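The operations above can be tried out with Python's thin wrappers over the operating system's calls. This is a minimal sketch, not a definitive implementation: the file name `demo.txt` and the function name are hypothetical, and POSIX-style calls take a file descriptor returned by `open` rather than the file name on every call. In this interface the OS keeps a single current-position pointer per open file for both reads and writes.

```python
import os
import tempfile

def demo_file_operations(directory):
    """Create, write, reposition, read, truncate, and delete one file."""
    path = os.path.join(directory, "demo.txt")  # hypothetical file name

    # Create: the OS finds space and adds a directory entry.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

    # Write: the file pointer advances past the bytes just written.
    os.write(fd, b"hello world")
    pointer_after_write = os.lseek(fd, 0, os.SEEK_CUR)

    # Reposition: move the file pointer to byte offset 6.
    os.lseek(fd, 6, os.SEEK_SET)

    # Read: returns data from the current pointer and advances it.
    data = os.read(fd, 5)

    # Truncate: the length drops to zero, other attributes are kept.
    os.ftruncate(fd, 0)
    size_after_truncate = os.fstat(fd).st_size
    os.close(fd)

    # Delete: release the file space and erase the directory entry.
    os.remove(path)
    return pointer_after_write, data, size_after_truncate, os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    result = demo_file_operations(tmp)
    print(result)  # (11, b'world', 0, False)
```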
File Type
File type refers to the ability of the operating system to distinguish different types
of files, such as text files, source files and binary files. Many operating systems
support several types of files. Operating systems such as MS-DOS and UNIX have the
following types of files:
Ordinary files
These are the files that contain user information.
These may contain text, databases or executable programs.
The user can apply various operations on such files like add, modify, delete
or even remove the entire file.
Directory files
These files contain a list of file names and other information related to these
files.
Special files:
These files are also known as device files.
These files represent physical devices such as disks, terminals, printers,
networks, tape drives etc.
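There are two types: character special files, which handle data character by
character (as with terminals and printers), and block special files, which
handle data in blocks (as with disks and tapes).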
File Structure
A file structure must follow a required format that the operating
system can understand.
File Protection
File Naming
If the name of a file is known only to its owner, only the owner can
name the file and so access it.
Access control
o Access control restricts what each user may do with a file.
o Permissions: read, write, execute.
o Only the owner decides which permissions other users receive.
Passwords
Passwords are another means of protecting files.
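As an illustration of access control, the sketch below sets read, write and execute permissions on a POSIX system. It assumes a UNIX-like OS (permission bits behave differently on Windows), and the function name is hypothetical.

```python
import os
import stat
import tempfile

def restrict_to_owner_and_group_read(path):
    """Owner may read and write; group may read; others get no access."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
    mode = os.stat(path).st_mode
    # stat.filemode renders the mode the way `ls -l` does.
    return stat.filemode(mode)

with tempfile.NamedTemporaryFile() as f:
    listing = restrict_to_owner_and_group_read(f.name)
    print(listing)  # on a POSIX system: -rw-r-----
```

No execute bit is granted to anyone here, so the file cannot be run as a program even by its owner.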
Sequential access
Direct/Random access
Sequential access
Information in the file is processed in order, one record after the other.
A read operation reads the next portion of the file and automatically
advances the file pointer.
Similarly, a write appends to the end of the file and the file pointer advances
to the end of the newly written material (the new end of file).
Such a file can be reset to the beginning, and, on some systems, a program
may be able to skip forward or backward n records, for some integer n.
[Figure: sequential-access file, showing the rewind and read/write operations]
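Sequential access as described above can be sketched with an in-memory file. The class name, the fixed record length of 4 bytes, and the method names are assumptions made for illustration only.

```python
import io

RECORD_SIZE = 4  # assumed fixed record length in bytes

class SequentialFile:
    def __init__(self, data=b""):
        self._f = io.BytesIO(data)

    def read_next(self):
        """Read the next record; the file pointer advances automatically."""
        return self._f.read(RECORD_SIZE)

    def write_next(self, record):
        """Append at the end; the pointer ends at the new end of file."""
        self._f.seek(0, io.SEEK_END)
        self._f.write(record)

    def rewind(self):
        """Reset the file pointer to the beginning."""
        self._f.seek(0)

    def skip(self, n):
        """Skip forward (n > 0) or backward (n < 0) by n records."""
        target = max(0, self._f.tell() + n * RECORD_SIZE)
        self._f.seek(target)

f = SequentialFile(b"AAAABBBBCCCC")
first = f.read_next()   # b"AAAA"
f.skip(1)               # skip over b"BBBB"
third = f.read_next()   # b"CCCC"
f.rewind()
again = f.read_next()   # b"AAAA"
f.write_next(b"DDDD")
f.skip(-1)
last = f.read_next()    # b"DDDD"
```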
Direct/Random access
Space Allocation
Files are allocated disk space by the operating system. Operating systems use
the following three main ways to allocate disk space to files.
1. Contiguous Allocation
2. Linked Allocation
3. Indexed Allocation
Contiguous Allocation
In contiguous allocation, files are assigned to contiguous areas of secondary
storage.
The location of a file is defined by the disk address of the first block and its
length.
File owner has to specify the size of the file in advance.
Widely used on CD-ROMs and DVDs, where final file sizes are known in
advance and won't change.
This is shown in the following figure.
Advantages:
1) All records of a file are normally physically adjacent to each other. This
increases the accessing speed of records.
2) Contiguous allocation supports both sequential and direct accessing.
3) Ease of implementation.
Disadvantages:
1) To find space for a new file of N blocks, we have to search for a hole of
N contiguous free blocks.
2) External fragmentation.
3) Compaction is required.
4) The file must be reallocated if it grows in size.
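The following sketch illustrates contiguous allocation under stated assumptions: free space is modelled as a list of booleans, and the search uses a first-fit strategy, which is only one possible choice and is not prescribed by these notes. All function names are hypothetical.

```python
def first_fit(free_map, n_blocks):
    """Return the start of the first hole of n_blocks free blocks, or None."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n_blocks:
                return run_start
        else:
            run_len = 0
    return None

def allocate(free_map, n_blocks):
    """Mark a hole as used and return the entry (first block, length)."""
    start = first_fit(free_map, n_blocks)
    if start is None:
        return None
    for i in range(start, start + n_blocks):
        free_map[i] = False
    return (start, n_blocks)

def block_address(entry, k):
    """Direct access: block k of the file is simply start + k."""
    start, length = entry
    if not 0 <= k < length:
        raise IndexError("block number outside the file")
    return start + k

free_map = [True, True, False, True, True, True, False, True]
entry = allocate(free_map, 3)      # (3, 3)
addr = block_address(entry, 2)     # 5
second = allocate(free_map, 3)     # None
```

The second request fails even though three blocks (0, 1 and 7) are still free, because they are not adjacent. This is external fragmentation.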
Linked Allocation
The advantages:
1) Its simplicity.
2) No external disk fragmentation.
3) No disk compaction required.
4) The size of a file does not have to be declared when the file is
created.
The disadvantages:
1) Slow direct accessing of any disk block
2) Space requirement for pointers
3) Reliability - Since disk blocks are linked by pointers, a single damaged
pointer can make thousands of disk blocks inaccessible.
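The disadvantages listed above can be seen in a small model of linked allocation. This is a sketch under assumptions: the disk is a dictionary mapping each block number to the block that follows it, and the block numbers are illustrative only.

```python
def nth_block(next_pointer, first_block, n):
    """Follow n pointers from the first block (n = 0 is the first block)."""
    block = first_block
    hops = 0
    for _ in range(n):
        block = next_pointer[block]
        if block is None:
            raise IndexError("file has fewer blocks than requested")
        hops += 1
    return block, hops

def file_blocks(next_pointer, first_block):
    """List every block reachable from the first block."""
    blocks = []
    block = first_block
    while block is not None:
        blocks.append(block)
        block = next_pointer.get(block)
    return blocks

# A file occupying blocks 9 -> 16 -> 1 -> 10 -> 25 (illustrative numbers).
disk = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
before = file_blocks(disk, 9)     # [9, 16, 1, 10, 25]
target = nth_block(disk, 9, 3)    # (10, 3): three pointer reads needed

disk[16] = None                   # simulate one damaged pointer
after = file_blocks(disk, 9)      # [9, 16]
```

Reaching the fourth block costs three pointer reads, and one damaged pointer cuts off every block after it.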
Indexed Allocation
The principle of indexed allocation is illustrated in the figure.
In this scheme each file is provided with its own index block, which is an
array of disk block pointers (addresses).
The Nth entry in the index block points to the Nth disk block of the file. The
directory contains the address of the index block.
To read the Nth disk block, the pointer in the Nth index block entry is used to
find the desired block, which is then read.
The advantages
1) The absence of external fragmentation
2) Indexing of free space can be accomplished by means of the bit map.
The disadvantages
1) Extra disk accesses are necessary to retrieve the address of the target
block on disk.
2) Indexed allocation requires lots of space for keeping pointers.
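The lookup path in indexed allocation can be sketched as below. The dictionaries standing in for the directory and the disk, the file name `jeep`, and the block numbers are all assumptions for illustration. Entries are counted from 0 here.

```python
def read_block_address(directory, disk, name, n):
    """Return the disk address of block n of the named file."""
    index_block_addr = directory[name]    # directory holds the index block address
    index_block = disk[index_block_addr]  # one extra disk access to fetch it
    return index_block[n]                 # nth entry points to the nth file block

directory = {"jeep": 19}                  # file name -> index block address
disk = {19: [9, 16, 1, 10, 25]}           # contents of index block 19

address = read_block_address(directory, disk, "jeep", 3)  # 10
```

Any block is reached with one index lookup, regardless of its position in the file, at the cost of reading the index block first.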
General Directory Structure
There are several types of directory structure in an operating system. They are as
follows:
Single level directory is simple to implement but each file must have a
unique name.
In this structure, all files are stored in the same directory. A single-level directory has
significant limitations, however, when the number of files increases or when
there is more than one user.
Since all files are in the same directory, they must have unique names. If
there are two users who call their data file "test", then the unique-name rule
is violated.
Even with a single-user, as the number of files increases, it becomes difficult
to remember the names of all the files in order to create only files with
unique names.
Limitations of Single Level Directory
a) Since all files are in the same directory, they must have unique names.
b) If two users call their data file test, then the unique name rule is violated.
c) Files are limited in length.
d) Even a single user may find it difficult to remember the names of all files as the
number of files increases.
e) Keeping track of so many files is a difficult task.
Two Level Directory
Advantages:
i) Path name
ii) Can have the same file name for different users
iii) Efficient searching
Disadvantages:
No grouping capability
Tree Structured Directory
In a tree-structured directory system, any directory entry can be either a file
or a subdirectory.
Each user has their own directory and cannot enter another user's
directory. A user has permission to read the root directory's data but
cannot write or modify it. Only the administrator of the system has
complete access to the root directory.
An absolute path is the path of a file with respect to the root directory of the
system, while a relative path is the path with respect to the current working
directory. In tree-structured directory systems, the user is given
the privilege to create files as well as directories.
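The difference between absolute and relative paths can be shown with Python's `posixpath` module, which manipulates UNIX-style path strings without touching the disk. The function name and the example paths are hypothetical.

```python
import posixpath

def to_absolute(cwd, path):
    """Resolve path against cwd unless it already starts at the root."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(cwd, path))

a = to_absolute("/home/alice", "notes/os.txt")       # /home/alice/notes/os.txt
b = to_absolute("/home/alice", "/etc/passwd")        # /etc/passwd
c = to_absolute("/home/alice/notes", "../todo.txt")  # /home/alice/todo.txt
```

The same relative path names a different file whenever the current working directory changes, while an absolute path always names the same file.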
In modern computers, most of the secondary storage is in the form of magnetic
disks. Hence, knowing the structure of a magnetic disk is necessary to understand
how the data in the disk is accessed by the computer.
A magnetic disk contains several platters. Each platter is divided into circular
shaped tracks. The length of the tracks near the centre is less than the length of the
tracks farther from the centre. Each track is further divided into sectors, as shown
in the figure.
Tracks at the same distance from the centre on all platters form a cylinder. A
read-write head is used to read data from, or write data to, a sector of the
magnetic disk.
The speed of the disk is measured in two parts:
Transfer rate: This is the rate at which the data moves from disk to the
computer.
Random access time: It is the sum of the seek time and rotational latency.
Seek time is the time taken by the arm to move the head to the required track.
Rotational latency is the time taken for the required sector to rotate under
the head.
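These quantities can be combined in a short calculation. The sketch assumes that the average rotational latency is half of one full rotation, which is the usual textbook estimate. The drive figures (9 ms seek, 7200 rpm, 100 MB/s) are hypothetical.

```python
def average_rotational_latency_ms(rpm):
    """Assume the wanted sector is, on average, half a rotation away."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2

def random_access_time_ms(seek_ms, rpm):
    """Random access time = seek time + rotational latency."""
    return seek_ms + average_rotational_latency_ms(rpm)

def transfer_time_ms(n_bytes, bytes_per_second):
    """Time to move n_bytes at the given transfer rate."""
    return n_bytes / bytes_per_second * 1000

latency = average_rotational_latency_ms(7200)   # about 4.17 ms
access = random_access_time_ms(9, 7200)         # about 13.17 ms
transfer = transfer_time_ms(4096, 100_000_000)  # about 0.04 ms
```

For a small request, positioning the head dominates: the access time is hundreds of times larger than the time spent transferring the 4 KB of data.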
Logical structure
At the beginning of the hard drive is the master boot record (MBR). When your
computer starts using your hard drive, this is where it looks first.
The MBR itself has a specific organization. The size of the MBR is 512
bytes.
The boot loader is the first 446 bytes of the MBR. This section contains
executable code, where programs are housed.
The partition table has 4 slots of 16 bytes each, each containing the description of
a partition (primary or extended) on the disk.
The Magic Number is two bytes used to determine if the hard disk has a
bootloader or not. If it does, the magic number should be equal in value to
hexadecimal 55AA.
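The layout described above (446 + 4 × 16 + 2 = 512 bytes) can be checked with a small parser. This is a sketch that works on a synthetic sector built in memory, not on a real disk, and the function names are hypothetical.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446
ENTRY_SIZE = 16
N_ENTRIES = 4

def split_mbr(sector):
    """Split a 512-byte MBR into boot code, partition entries and magic number."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    table_end = BOOT_CODE_SIZE + N_ENTRIES * ENTRY_SIZE
    table = sector[BOOT_CODE_SIZE:table_end]
    entries = [table[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE] for i in range(N_ENTRIES)]
    magic = sector[table_end:]
    return boot_code, entries, magic

def has_boot_signature(sector):
    """True if the last two bytes are 0x55 followed by 0xAA."""
    return split_mbr(sector)[2] == b"\x55\xaa"

# Synthetic sector: zeroed boot code and partition table, valid magic number.
sector = bytes(510) + b"\x55\xaa"
boot_code, entries, magic = split_mbr(sector)
print(len(boot_code), len(entries), has_boot_signature(sector))  # 446 4 True
```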
How RAID works
RAID works by placing data on multiple disks and allowing input/output (I/O)
operations to overlap in a balanced way, improving performance. Because using
more disks raises the chance that some disk in the array fails, RAID stores data
redundantly to increase fault tolerance.
RAID arrays appear to the operating system (OS) as a single logical hard disk.
RAID employs the techniques of disk mirroring or disk striping. Mirroring copies
identical data onto more than one drive. Striping partitions each drive's storage
space into units ranging from a sector (512 bytes) up to several megabytes. The
stripes of all the disks are interleaved and addressed in order.
RAID 0: This configuration has striping, but no redundancy of data. It offers the
best performance, but no fault tolerance.
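Striping with interleaved stripes addressed in order can be sketched as a round-robin mapping from logical block numbers to disks. The function name is hypothetical, and the stripe unit is assumed to be one block.

```python
def locate_block(logical_block, n_disks):
    """Map a logical block to (disk index, stripe index on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

layout = [locate_block(b, 3) for b in range(6)]
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Consecutive logical blocks land on different disks, so a large sequential transfer keeps all three disks busy at once. No block is stored twice, so losing any one disk loses part of the data.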
RAID 1: Also known as disk mirroring, this configuration consists of at least two
drives that duplicate the storage of data. There is no striping. Read performance is
improved since either disk can be read at the same time. Write performance is the
same as for single disk storage.
RAID 2: This configuration uses striping across disks, with some disks storing
error checking and correcting (ECC) information. It has no advantage over RAID 3
and is no longer used.
RAID 4: This level uses large stripes, which means you can read records from any
single drive. This allows you to use overlapped I/O for read operations. Since all
write operations have to update the parity drive, no overlapping of write I/O is
possible. RAID 4 offers no advantage over RAID 5.
RAID 5: This level is based on block-level striping with parity. The parity
information is striped across each drive, allowing the array to function even if one
drive were to fail. The array's architecture allows read and write operations to span
multiple drives. This results in performance that is usually better than that of a
single drive, but not as high as that of a RAID 0 array. RAID 5 requires at least
three disks, but it is often recommended to use at least five disks for performance
reasons.
RAID 5 arrays are generally considered to be a poor choice for use on write-
intensive systems because of the performance impact associated with writing parity
information. When a disk does fail, it can take a long time to rebuild a RAID 5
array. Performance is usually degraded during the rebuild time, and the array is
vulnerable to an additional disk failure until the rebuild is complete.
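How parity lets an array survive one failed drive can be sketched with byte-wise XOR. That parity is computed as XOR is an assumption here (it is the common scheme, but these notes do not state it), and the function names and block contents are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """XOR of the surviving data blocks and the parity gives the lost block."""
    return xor_blocks(list(surviving_blocks) + [parity])

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one stripe across three drives
parity = xor_blocks(data)                       # b"\xee\x22"

# The drive holding data[1] fails; rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

Every write must also update the parity block, which is the source of the write penalty. Rebuilding reads every surviving drive in full, which is why a rebuild takes a long time.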
RAID 6: This technique is similar to RAID 5, but includes a second parity scheme
that is distributed across the drives in the array. The use of additional parity allows
the array to continue to function even if two disks fail simultaneously. However,
this extra protection comes at a cost. RAID 6 arrays have a higher cost per
gigabyte (GB) and often have slower write performance than RAID 5 arrays.