OS Unit 6
Unit 6
File Management
File Operations:
The operating system can provide system calls to create, write, read, reposition,
delete and truncate files. The following operations can be performed on a
file:
• Creating a file:
Its purpose is to create a blank file, which requires two steps. First,
space in the file system must be found for the file. Second, an entry for the
new file must be made in the directory.
• Writing a file:
Its purpose is to write some data into a file. To do this, a system call is made
specifying both the name of the file and the information to be written into the
file. Given the name of the file, the system searches the directory to find
the file's location. The system must keep a write pointer to the location
in the file where the next write is to take place. The write pointer must be
updated whenever a write occurs.
• Reading a file:
Its purpose is to read some data from a file. To do this, a system call is made
specifying the name of the file and where in memory the next block of the
file should be put. The directory is searched for the entry, and the system
keeps a read pointer to the location in the file where the next read
is to take place. The read pointer must be updated once the read has taken
place.
• Repositioning within the file:
The directory is searched for the appropriate entry and the current file
position pointer is repositioned to a given value. Repositioning within a file
need not involve any actual I/O. This file operation is also known as a file
seek.
• Deleting a file:
The purpose of this system call is to delete a file. To do this, the directory is
searched for the named file. If the entry is found, all the file's space is
released so that it can be reused by other files, and the directory entry is erased.
• Truncating a file:
Its purpose is to erase the contents of the file while keeping its attributes
unchanged. The file is reset to length zero and its file space is released.
• Appending a file:
Its purpose is to add some data to the end of an existing file.
• Renaming a file:
Its purpose is to give the file a new name in place of its current name.
Most of the file operations mentioned above require searching the directory for the
entry associated with the named file. To avoid this constant searching, many
systems require that an open() system call be made before a file is first used. The
operating system keeps a table called the open-file table, which contains
information about all open files. When a file operation is requested, the file is
specified via an index into this table, so no searching is required. When a file is
no longer being actively used, it is closed by the process and the OS removes its
entry from the open-file table.
File Access Methods:
When a file is used, the information it contains must be accessed and read into
computer memory. The information in the file can be accessed in several ways.
Some systems provide only one access method for files, while others may support
many access methods. The following are common file access methods:
• Sequential Access:
In this access method, information in the file is processed in order, one
record after the other. This is the most common mode of access;
for example, editors and compilers usually access files in this fashion. Data
records are retrieved in the same order in which they were stored on
disk.
Reads and writes make up the bulk of the operations on a file. A read
operation read_next() reads the next portion of the file and causes a pointer
to move ahead by one. Similarly, the write operation write_next() appends
to the end of the file and advances the pointer to the end of the newly written
material, i.e. the new end of file.
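The read_next()/write_next() behaviour described above can be sketched with an in-memory record list. This is a minimal illustration, not any real system's API; the class name and record format are made up:

```python
# A sketch of sequential access: a file pointer advances one record
# per read, and writes append at the (new) end of file.

class SequentialFile:
    def __init__(self):
        self.records = []   # file contents, one entry per record
        self.pointer = 0    # current position (record index)

    def write_next(self, record):
        # append at the end and advance to the new end of file
        self.records.append(record)
        self.pointer = len(self.records)

    def read_next(self):
        # read the next record and move the pointer ahead by one
        record = self.records[self.pointer]
        self.pointer += 1
        return record

    def reset(self):
        # rewind to the beginning of the file
        self.pointer = 0

f = SequentialFile()
f.write_next("rec1")
f.write_next("rec2")
f.reset()
print(f.read_next())  # rec1
print(f.read_next())  # rec2
```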
system that may not be part of his/her file. The technique used to
calculate or obtain the absolute address is hashing, in which every record is
associated with a key number used to compute the address.
With large files, the index file itself may become too large to be kept in
memory. The solution is to create an index for the index file. The primary
index file contains pointers to secondary index files, which point to the actual
data items.
Directory and Disk Structure:
There may be thousands, even millions, of files within a computer. Files are stored on
random-access storage devices including hard disks, optical disks and solid-state
disks. A whole storage device can be used for a file system, or it can be
divided for finer-grained control. For example, a disk can be partitioned into quarters, and
each quarter can hold a separate file system.
Partitioning is done to limit the size of individual file systems, to put multiple
file-system types on the same device, or to leave part of the device available for
other uses such as swap space or unformatted (raw) disk space. A file system can
be created in each of these partitions, and a partition containing a file system is
generally known as a volume. The volume may be a subset of a device, a whole
device, or multiple devices linked together.
Each volume that contains a file system must also contain information about the
files in the system. This information is kept in entries in a device directory
(directory) or volume table of contents. The directory records information such as
name, location, size and type for all files on that volume.
Directory:
A directory is a location for storing files on a computer, i.e. a container
used to hold folders and files. It organizes files and folders in a hierarchical
manner. The directory must allow entries to be inserted,
entries to be deleted, a named entry to be searched for, and all the entries in the
directory to be listed. The operations that can be performed on a directory are:
• Search for a file: the directory structure is searched to find the entry
for a particular file.
• Create a file: new files need to be created and added to the directory.
• Delete a file: when a file is no longer needed, it should be possible to
delete it from the directory.
• List a directory: it must be possible to list the files in a directory, along with the contents
of the directory entry for each file in the list.
• Rename a file: as the name of a file represents its contents to its users, it
should be possible to change the name when the contents or use of the
file changes. Renaming a file may also allow its position within the
directory structure to be changed.
• Traverse the file system: sometimes it is necessary to access every directory
and every file within a directory structure. It is a good idea to save the contents
and structure of the entire file system at regular intervals, for example by copying files
to magnetic tape, which provides a backup copy in case of
system failure.
Structure of Directory:
1. Single level directory:
It is the simplest directory structure, in which all files are contained in the
same directory, which makes it easy to support and understand. Since all the files
are in the same directory, they must have unique names. If two users give
their data files the same name, e.g. test.doc, then the unique-name rule is
violated. However, most file systems support file names of up to 255 characters, so it
is relatively easy to select unique file names.
Advantages:
• Its implementation is very easy as it contains a single directory.
• If the number of files is small, searching will be faster.
• Operations like creation, deletion and updating are very easy in such
a directory structure.
Disadvantages:
• There may be a chance of name collision.
• As the number of files increases, it becomes difficult to remember
their names.
• Searching becomes time consuming if the directory is large.
File Sharing:
Here, the general issues and solutions related to sharing a file are discussed:
1. Multiple Users:
When an operating system accommodates multiple users, the issues of
file sharing, file naming and file protection become important. Given a
directory structure that allows files to be shared by users, the system must
implement file sharing. The system can either allow a user to access the
files of other users by default, or require that a user specifically grant access
to the files.
To implement sharing and protection, most systems have evolved to use
the concepts of file (or directory) owner and group. The owner is the user who
can change attributes and grant access, and who has the most control over the
file. The group attribute defines a subset of users who can share access to the
file. Group members can execute only a subset of the operations, and
exactly which operations can be executed is defined by the file's
owner.
The owner and group IDs of a given file are stored with the file attributes.
When a user requests an operation on the file, the user ID or group ID
is checked against these attributes to determine whether the requesting user is the
owner of the file or a group member. The result indicates which permissions are
applicable. The system then applies those permissions to the requested
operation and allows or denies it.
o Failure Mode:
Remote file systems have more failure modes than local file systems. Because
of the complexity of the network and the required interaction
between remote machines, many more problems can interfere with the
proper operation of a remote file system. The network can be
• If the list of users in the system is not known in advance, then
constructing such a list may be tedious and time consuming.
• The directory structure, previously of fixed size, must be of variable
size, resulting in more complicated space management.
To solve these problems, many systems recognize three classifications of users
in connection with each file:
• Owner: the user who created the file.
• Group: a set of users who are sharing the file and need similar access.
• Universe: all other users in the system.
The common approach is to combine the access-control list with the more
general owner, group and universe scheme, allowing access based on the
control information.
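The owner/group/universe check described above can be sketched as follows. This is a minimal illustration, assuming UNIX-style permission classes; the user names, group names and permission sets are made-up examples:

```python
# A sketch of owner/group/universe protection: pick the permission
# class that applies to the requesting user, then check the operation.

def allowed(attrs, user, groups, op):
    """Return True if `user` (a member of `groups`) may perform `op`."""
    if user == attrs["owner"]:
        perms = attrs["perms"]["owner"]
    elif attrs["group"] in groups:
        perms = attrs["perms"]["group"]
    else:
        perms = attrs["perms"]["universe"]
    return op in perms

# illustrative file attributes: alice owns the file, group "staff" may read
file_attrs = {"owner": "alice", "group": "staff",
              "perms": {"owner": {"read", "write"},
                        "group": {"read"},
                        "universe": set()}}

print(allowed(file_attrs, "alice", [], "write"))      # True
print(allowed(file_attrs, "bob", ["staff"], "read"))  # True
print(allowed(file_attrs, "eve", [], "read"))         # False
```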
c) Other Protection Approaches:
Another approach to protection is to associate a password with
each file. If the passwords are chosen randomly and changed often, this
scheme may be effective in limiting access to a file. Using
passwords has several disadvantages: the number of passwords a user must
remember may become large, making the scheme
impractical, and if only one password is used for all files and it is discovered, all files
become accessible. To solve this problem, some systems allow a user to
associate a password with a subdirectory rather than with individual files.
structure via a file control block (FCB), which contains information about
the file including ownership, permissions, and the location of the file contents.
Advantages:
➢ Duplication of code is minimized.
➢ The I/O control and basic file-system code can be used by multiple file systems.
Each file system can then have its own logical file-system and file-organization
modules.
Disadvantages:
➢ Layering can introduce more operating system overhead.
➢ Deciding how many layers to use and what each layer should
do is a major challenge in designing new systems.
In-Memory Structure:
It is used for both file-system management and performance improvement via
caching. The data are loaded at mount time, updated during file-system operations
and discarded at dismount. It includes:
• An in-memory mount table contains information about each mounted
volume.
• An in-memory directory-structure cache holds the directory information
of recently accessed directories.
• The system-wide open-file table contains a copy of the FCB of each open file,
as well as other information.
• The per-process open-file table contains a pointer to the appropriate entry
in the system-wide open-file table, as well as other information.
• Buffers hold file-system blocks while they are being read from disk or
written to disk.
structure is searched for the given file name. Once the file is found, its FCB is
copied into the system-wide open-file table in memory. This table also keeps track
of the number of processes that have the file open.
Next, an entry is made in the per-process open-file table, with a pointer to the entry
in the system-wide open-file table and some other fields. These other fields may
include a pointer to the current location in the file for read() or write() operations,
and the access mode in which the file is open. The open() call returns a pointer to
the appropriate entry in the per-process file-system table, and all operations are
then performed via this pointer. When a process closes the file, the per-process
table entry is removed and the open count of the system-wide entry is decremented.
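The open()/close() bookkeeping described above can be sketched as follows. This is a minimal illustration using dictionary-based tables; the table layouts, field names and the example directory are all made up:

```python
# A sketch of the two open-file tables: a system-wide table holding one
# FCB copy (plus an open count) per file, and a per-process table whose
# entries name the system-wide entry and hold the current offset.

system_wide = {}               # file name -> {"fcb": ..., "count": ...}

class Process:
    def __init__(self):
        self.open_files = []   # per-process open-file table

    def open(self, name, directory):
        if name not in system_wide:
            # directory search: copy the FCB into the system-wide table
            system_wide[name] = {"fcb": directory[name], "count": 0}
        system_wide[name]["count"] += 1
        # per-process entry: which system-wide entry, plus read/write offset
        self.open_files.append({"name": name, "offset": 0})
        return len(self.open_files) - 1    # "file descriptor": table index

    def close(self, fd):
        entry = self.open_files[fd]
        self.open_files[fd] = None         # remove the per-process entry
        system_wide[entry["name"]]["count"] -= 1
        if system_wide[entry["name"]]["count"] == 0:
            del system_wide[entry["name"]]  # last closer drops the FCB copy

directory = {"notes.txt": {"size": 120, "first_block": 9}}
p = Process()
fd = p.open("notes.txt", directory)
print(system_wide["notes.txt"]["count"])   # 1
p.close(fd)
print("notes.txt" in system_wide)          # False
```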
Directory Implementation:
The selection of directory-allocation and directory-management algorithms
significantly affects the efficiency, performance and reliability of the file system.
Some ways to implement a directory are:
1. Linear List:
This is the simplest method of implementing a directory: a linear list of file
names, each with a pointer to the data blocks. To create a new file, the
directory is first searched to be sure that no existing file has the same name;
then a new entry is added at the end of the directory. To delete a file, the
directory is searched for the named file and the space
allocated to it is released. To reuse a directory entry, the entry can be marked as
unused by assigning it a special name (such as an all-blank name), by including
an unused bit in each entry, or by attaching it to a list of free directory entries.
A linked list can also be used to decrease the time required to delete a file.
2. Hash table:
Another data structure used for a file directory is a hash table, which
takes a value computed from the file name and returns a
pointer to the file name in the linear list. A key-value pair for each file in
the directory is generated and stored in the hash table. The key is
generated by applying a hash function to each file name and points to the
corresponding file stored in the directory. Insertion and deletion are also
straightforward, although some provision must be made for collisions
(situations in which two file names hash to the same location).
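The hash-table directory described above can be sketched as follows, with chaining as the provision for collisions. The table size, hash choice and entry format are arbitrary illustrative choices:

```python
# A sketch of a hash-table directory: the file name hashes to a bucket,
# and each bucket holds a chain of (name, location) entries so that
# colliding names can coexist.

TABLE_SIZE = 8

def bucket(name):
    return hash(name) % TABLE_SIZE

table = [[] for _ in range(TABLE_SIZE)]   # each slot is a collision chain

def add_entry(name, location):
    table[bucket(name)].append((name, location))

def lookup(name):
    for entry_name, location in table[bucket(name)]:
        if entry_name == name:
            return location
    return None                            # no such directory entry

add_entry("a.txt", 42)   # 42, 99: illustrative block locations
add_entry("b.txt", 99)
print(lookup("a.txt"))   # 42
print(lookup("c.txt"))   # None
```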
2. Linked Allocation:
In this scheme, each file is a linked list of disk blocks, and the disk blocks
may be scattered anywhere on the disk. The directory contains pointers
to the first and last blocks of the file. For example, a file of five blocks
might start at block 9 and continue at block 16, then block 1, then block 10,
and finally block 25. Each block contains a pointer to the next block, and these
pointers are not available to the user. If each block is 512 bytes in size
and a disk address (pointer) requires 4 bytes, then the user sees blocks of
508 bytes.
To create a new file, a new entry is created in the directory; each directory
entry has a pointer to the first disk block of the file. A write to the file
causes the free-space management system to find a free block; this new
block is written to and linked to the end of the file. To read a file, the blocks
are read by following the pointers from block to block.
Another variation of linked allocation is the File Allocation Table (FAT).
File Allocation Table (FAT) for file allocation:
Here, a section of the disk at the beginning of each volume is set aside to contain
the table. The table has one entry for each disk block and is indexed by block
number. The directory entry contains the block number of the first block of the file.
The table entry indexed by that block number contains the block number of the
next block in the file. The chain continues until it reaches the last block, which
has a special end-of-file value as its table entry.
An unused block is indicated by a table value of 0. Allocating a new block to a file
is a matter of finding the first 0-valued table entry and replacing the previous
end-of-file value with the address of the new block. The 0 is then replaced with
the end-of-file value.
The following figure shows the implementation of the FAT:
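The FAT chain can also be sketched in code, using the example chain from the linked-allocation section (block 9 → 16 → 1 → 10 → 25). The end-of-file marker (-1) and the table size are illustrative choices; real FAT variants use reserved bit patterns instead:

```python
# A sketch of following a file's block chain through a FAT.
# Table value 0 marks a free block; EOF marks the last block of a file.

EOF = -1
fat = [0] * 32                     # one table entry per disk block
fat[9], fat[16], fat[1], fat[10], fat[25] = 16, 1, 10, 25, EOF

def file_blocks(first_block):
    """Follow the FAT chain from the file's first block to EOF."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]         # next block number, or EOF
    return blocks

# the directory entry would store 9 as the file's first block
print(file_blocks(9))              # [9, 16, 1, 10, 25]
```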
For files that are very large, a single index block may not be able to hold all the
pointers. The following mechanisms solve this problem:
❖ Linked scheme: this scheme links two or more index blocks together to
hold the pointers. Every index block then contains a pointer to (the
address of) the next index block.
❖ Multilevel index: in this scheme, a first-level index block points to
second-level index blocks, which in turn point to the file blocks.
❖ Combined scheme: this scheme is used in the UNIX file system, where, say,
15 pointers of the index block are kept in the file's inode. The first 12 of these
pointers point to direct blocks, which contain the data of the file. The next three
pointers point to indirect blocks. The first points to a single indirect block, an
index block containing not data but the addresses of blocks that do contain data.
The second points to a double indirect block, which contains the addresses of
blocks that contain pointers to the actual data blocks. The last pointer contains
the address of a triple indirect block.
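The reach of the combined scheme can be shown with a short worked calculation. The 512-byte block and 4-byte pointer sizes are assumptions taken from the linked-allocation example above, giving 128 pointers per index block:

```python
# A worked example of the combined (UNIX inode) scheme's capacity,
# assuming 512-byte blocks and 4-byte pointers, with the 12 direct /
# 1 single / 1 double / 1 triple indirect split described in the text.

BLOCK = 512
PTRS = BLOCK // 4                    # pointers per index block: 128

direct = 12 * BLOCK                  # 12 direct blocks
single = PTRS * BLOCK                # one single indirect block
double = PTRS * PTRS * BLOCK         # one double indirect block
triple = PTRS * PTRS * PTRS * BLOCK  # one triple indirect block

print(direct)                            # 6144 bytes via direct pointers
print(direct + single + double + triple) # maximum addressable file size
```

With these (small, illustrative) parameters the maximum file size works out to just over 1 GB; real systems use larger blocks and so reach much further.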
When a disk is in use, a drive motor spins it at high speed, typically 60 to 250
times per second, measured in terms of rotations per minute (RPM).
Disk speed has two parts: the transfer rate is the rate at which data flow
between the drive and the computer. The positioning time, or random-access
time, consists of two parts: the time necessary to move the disk arm to the
desired cylinder, called the seek time, and the time necessary for the desired
sector to rotate to the disk head, called the rotational latency.
The disk platters are coated with a thin protective layer, but the head will
sometimes damage the magnetic surface. This accident is called a head
crash. A disk drive is attached to the computer by a set of wires called the I/O bus.
Data transfers on the bus are carried out by special processors called
controllers. The host controller is the controller at the computer end of the bus, in
which commands for I/O operations are placed. The host controller then sends
messages to the disk controller, which is built into each disk drive and operates
to carry out the received commands.
2) Magnetic Tapes:
Magnetic tape was used as an early secondary-storage medium. It can
hold large quantities of data, but its access time is slow compared to that of
main memory and magnetic disk. Tapes are used mainly for backup, for
storage of infrequently used information, and as a medium for transferring
information from one system to another. A tape is kept in a spool and is
wound or rewound past a read-write head. Moving to the correct spot on a
tape can take minutes, but once positioned, tape drives can write data at
speeds comparable to disk drives.
To map logical blocks onto disk addresses, some media use constant linear velocity
(CLV), in which the density of bits per track is uniform. The farther a track is from
the center of the disk, the greater its length, so the more sectors it can hold. From
the outer zone to the inner zone, the number of sectors per track decreases. The
drive increases its rotation speed as the head moves from the outer to the inner
tracks to keep the same rate of data moving under the head. Alternatively, the disk
rotation speed can stay constant; in this case, the density of bits decreases from
inner tracks to outer tracks to keep the data rate constant. This method is used in
hard drives and is known as constant angular velocity (CAV).
Disk Scheduling:
One of the responsibilities of the operating system is to use the hardware
efficiently. For the disk drives, meeting this responsibility means having fast
access time and large disk bandwidth. The access time has two major
components:
❖ Seek time: the time for the disk arm to move the heads to the cylinder
containing the desired sector.
❖ Rotational latency: the additional time for the disk to rotate the desired
sector to the disk head.
A separate measure, the disk bandwidth, is the total number of bytes transferred,
divided by the total time between the first request for service and the completion of
the last transfer.
Both the access time and the bandwidth can be improved by managing the order in
which disk I/O requests are serviced.
A number of algorithms exist for scheduling disk I/O requests:
1) First Come First Serve (FCFS) Scheduling:
This algorithm is intrinsically fair but generally does not provide the fastest
service. It services the I/O requests in the order in which they arrive. For example,
with the head starting at cylinder 53 and the request queue 98, 183, 37, 122, 14,
124, 65, 67, the head will first move from 53 to 98, then to 183, 37,
122, 14, 124, 65, and finally to 67.
2) Shortest Seek Time First (SSTF):
This algorithm selects the request with the least seek time from the current head
position, i.e. it chooses the pending request closest to the current head position.
SSTF is essentially a form of shortest-job-first (SJF) scheduling and so may cause
starvation of some requests.
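The total head movement for FCFS and SSTF on the example queue above can be computed as a sketch (head at cylinder 53; queue 98, 183, 37, 122, 14, 124, 65, 67):

```python
# A sketch of FCFS and SSTF disk scheduling, counting total head
# movement in cylinders for the example request queue.

def fcfs(head, queue):
    """Service requests strictly in arrival order."""
    total = 0
    for cyl in queue:
        total += abs(cyl - head)
        head = cyl
    return total

def sstf(head, queue):
    """Always service the pending request closest to the head."""
    pending, total = list(queue), 0
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue))   # 640 cylinders of head movement
print(sstf(53, queue))   # 236 cylinders of head movement
```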
4) C-SCAN:
Circular SCAN scheduling is a variant of SCAN in which the head moves from
one end of the disk to the other, servicing requests along the way. When the
head reaches the other end, however, it immediately returns to the beginning
of the disk without servicing any requests on the return trip. This algorithm
treats the cylinders as a circular list that wraps around from the final cylinder
to the first one.
5) LOOK:
In this algorithm, the disk arm starts at one end of the disk and goes only as far
as the final request in that direction, then reverses direction; i.e. it does not move
all the way to the end of the disk.
6) C-LOOK:
This is similar to C-SCAN, except that the disk arm, instead of going to the end
of the disk, goes only as far as the last request to be serviced in that direction,
and then jumps back to the furthest request at the other end.
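The service orders produced by LOOK and C-LOOK can be sketched using the same request queue (head at 53, initially moving toward higher cylinders); the direction assumption is illustrative:

```python
# A sketch of LOOK and C-LOOK service orders: partition the requests
# around the head, then either reverse direction (LOOK) or wrap around
# and continue in the same direction (C-LOOK).

def look(head, queue):
    up = sorted(c for c in queue if c >= head)
    down = sorted((c for c in queue if c < head), reverse=True)
    return up + down     # go up to the last request, then reverse

def c_look(head, queue):
    up = sorted(c for c in queue if c >= head)
    low = sorted(c for c in queue if c < head)
    return up + low      # jump back and keep servicing in one direction

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look(53, queue))    # [65, 67, 98, 122, 124, 183, 37, 14]
print(c_look(53, queue))  # [65, 67, 98, 122, 124, 183, 14, 37]
```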
Disk Management:
The operating system is responsible for several other aspects of disk management,
such as initializing disks, booting from disk, and bad-block recovery.
1. Disk Formatting:
A new magnetic disk is a blank slate: it is just a platter of magnetic
recording material. Before a disk can store data, it must be divided into
sectors that the disk controller can read and write. This process is called
low-level formatting, or physical formatting. Low-level formatting
fills the disk with a special data structure for each sector, which consists of a
header, a data area and a trailer. The header and trailer contain information
used by the disk controller, such as a sector number and an Error Correcting
Code (ECC).
When the controller writes a sector of data during normal I/O, the ECC is
updated with a value calculated from all the bytes in the data area. When
the sector is read, the ECC is recalculated and compared with the stored
value. If the calculated value does not match the stored value, this
indicates that the data area of the sector has been corrupted and the disk sector
may be bad.
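The write-then-verify idea can be sketched as follows. A simple checksum stands in for the real ECC here; an actual ECC can also correct single-bit errors, which this sketch cannot:

```python
# A sketch of sector ECC checking: a value computed from the data area
# is stored in the trailer on write and re-verified on read.

def checksum(data):
    # illustrative stand-in for a real error-correcting code
    return sum(data) % 256

def write_sector(data):
    return {"data": bytearray(data), "ecc": checksum(data)}

def read_sector(sector):
    if checksum(sector["data"]) != sector["ecc"]:
        raise IOError("sector may be bad: stored ECC does not match")
    return bytes(sector["data"])

s = write_sector(b"hello")
print(read_sector(s))     # b'hello'
s["data"][0] ^= 0xFF      # simulate corruption in the data area
# read_sector(s) would now raise IOError
```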
Before the operating system can use a disk to hold files, the OS needs to record its
own data structures on the disk. It does so in two steps:
• The first step is to partition the disk into one or more groups of
cylinders. One partition can hold a copy of the OS's executable code,
while another can hold user files.
• The second step is logical formatting, or creation of a file system, in
which the OS stores the initial file-system data structures onto the disk.
These data structures may include maps of free and allocated space
and an initial empty directory.
2. Boot Block:
When a computer is turned on, an initial program runs, which initializes
all aspects of the system, from CPU registers to device controllers
and the contents of main memory, and then starts the operating system. This
initial program is known as the bootstrap program. To do its job, the bootstrap
program loads the operating system kernel into memory and jumps to an
initial address to begin operating-system execution.
For most computers, the bootstrap program is stored in ROM, but a problem is
that changing the bootstrap code requires changing the ROM chip. So most
systems store a tiny bootstrap loader program in the boot ROM, whose only
job is to bring in a full bootstrap program from disk. The full bootstrap
program is stored in the boot blocks at a fixed location on the disk. A disk
that has a boot partition is called a boot disk or system disk. The code in
the boot ROM instructs the disk controller to read the boot blocks into
memory and then starts executing that code.
disk is being formatted. Any bad blocks that are discovered are
flagged as unusable so that the file system does not allocate them.
• On more sophisticated disks, the controller maintains a list of bad
blocks on the disk. The list is initialized during the low-level
formatting at the factory and is updated over the life of the disk. The
controller can be told to replace each bad block logically with one
of the spare sectors. This scheme is known as sector sparing or
forwarding.
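Sector sparing can be sketched as a simple remapping table kept by the controller; the sector numbers and map contents here are purely illustrative:

```python
# A sketch of sector sparing: the controller consults a bad-block map
# and transparently redirects each access to a spare sector.

bad_block_map = {87: 1001, 212: 1002}   # bad sector -> spare sector

def translate(sector):
    """Return the sector the controller actually accesses."""
    return bad_block_map.get(sector, sector)

print(translate(87))    # 1001 (remapped to a spare sector)
print(translate(88))    # 88   (healthy sector, unchanged)
```

One consequence, noted in most texts, is that sparing can defeat the OS's disk-scheduling optimizations, since a logically adjacent block may physically live in a distant spare region.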
Swap Space Management:
Swap space is space in secondary memory used as a substitute for physical
memory: it serves as virtual memory that holds process memory images.
Swap space helps the computer's operating system pretend that it has more
RAM than it actually has. It is also called a swap file. The main goal of the design
and implementation of swap space is to provide the best throughput for the
virtual-memory system.
Swap space is used in various ways by different operating systems, depending on
the memory-management algorithms in use. Systems that implement swapping may
use swap space to hold an entire process image, including the code and data
segments. Paging systems may simply store pages that have been pushed out of
memory. The amount of swap space needed on a system can therefore vary from
a few megabytes to gigabytes, depending on the amount of physical memory, the
amount of virtual memory it is backing, and the way in which the virtual memory
is used.
Operating systems such as Windows and Linux provide a certain amount
of swap space by default, which can be changed by the user according to their needs.
Users who do not want to use virtual memory can easily disable it, so it
depends entirely on the user whether to use swap space or not.
Note:
• Refer to your class notes for the numerical solutions of disk scheduling.
• Cover the topics of I/O hardware (system unit, processor, data
representation, memory, ports, connections etc.) by yourself.
• The Application I/O interface topic is provided in chapter 8.