File System
We have two problems related to main memory:
a. It cannot hold a large amount of information.
b. It is volatile.
- The solution to these two problems is to store information on external media in units
called files. Storage techniques vary from medium to medium. For example, on a hard disk
data is stored in blocks, while on magnetic tape data is organized and stored sequentially,
character by character.
- A file is an abstract data type that hides the details of its implementation. The part of
the operating system that deals with files is called the file system. There are two views of
a file:
1. The user's point of view
2. The implementation point of view
- The user's view is concerned with file naming, the operations allowed on files, what the
directory tree looks like, and so on.
- From the implementation point of view we are interested in:
How files and directories are stored
How disk space is managed
How to make everything work efficiently and reliably
All computer applications need to store and retrieve information. While a process is running,
it can store a limited amount of information within its own address space. However, the storage
capacity is restricted to the size of the virtual address space. A second problem with keeping
information within a process's address space is that when the process terminates, the
information is lost. A third problem is that it is frequently necessary for multiple processes to
access (parts of) the information at the same time. Thus we have three essential
requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information concurrently.
5.2 File Structure
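Files can be structured in any of several ways. Three common possibilities are:
(a) an unstructured sequence of bytes,
(b) a sequence of fixed-length records, and
(c) a tree of records, each containing a key field, kept sorted on the key.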
5.4 File Access
• Sequential access
– read all bytes/records from the beginning
– cannot jump around, could rewind or back up
– convenient when the medium is magnetic tape rather than disk
• Random access
– bytes/records read in any order
– essential for data base systems
– read can be …
• move file marker (seek), then read or …
• read and then move file marker
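Both access methods can be sketched with the POSIX-style calls exposed by Python's `os` module. This is a minimal illustration that assumes a file of fixed-size 8-byte records. The record size, file name and helper names are hypothetical.

- Sequential access reads from the beginning until end of file.
- Random access first moves the file marker with `os.lseek` and then reads.

```python
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed record size, in bytes

def read_sequential(fd):
    """Sequential access: read every record from the beginning, in order."""
    os.lseek(fd, 0, os.SEEK_SET)          # rewind to the start of the file
    records = []
    while True:
        chunk = os.read(fd, RECORD_SIZE)
        if not chunk:                      # empty read means end of file
            break
        records.append(chunk)
    return records

def read_random(fd, n):
    """Random access: move the file marker (seek) to record n, then read it."""
    os.lseek(fd, n * RECORD_SIZE, os.SEEK_SET)
    return os.read(fd, RECORD_SIZE)

# Usage: write four 8-byte records, then read them both ways.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "records.dat")  # hypothetical file name
    with open(path, "wb") as f:
        for i in range(4):
            f.write(f"record{i}".encode().ljust(RECORD_SIZE, b"."))
    fd = os.open(path, os.O_RDONLY)
    try:
        all_records = read_sequential(fd)
        third = read_random(fd, 2)         # jumps straight to record 2
    finally:
        os.close(fd)
```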
5.6 File Operations
1. Create
2. Delete
3. Open
4. Close
5. Read
6. Write
7. Append
8. Seek
9. Get attributes
10. Set Attributes
11. Rename
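As a rough illustration, most of these operations correspond to calls in Python's `os` module, which wraps the underlying system calls on POSIX systems. The file names are hypothetical. This is a sketch, not a complete list of the options each call accepts.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")                  # hypothetical file name

    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)  # Create (and open)
    os.write(fd, b"hello")                               # Write
    os.close(fd)                                         # Close

    fd = os.open(path, os.O_WRONLY | os.O_APPEND)        # Open for Append
    os.write(fd, b" world")                              # data goes to the end of the file
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)                      # Open
    os.lseek(fd, 6, os.SEEK_SET)                         # Seek to byte 6
    tail = os.read(fd, 5)                                # Read 5 bytes
    os.close(fd)

    size = os.stat(path).st_size                         # Get attributes (size, times, ...)
    os.chmod(path, 0o600)                                # Set attributes (permission bits)

    new_path = os.path.join(d, "renamed.txt")
    os.rename(path, new_path)                            # Rename
    os.remove(new_path)                                  # Delete
    still_exists = os.path.exists(new_path)
```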
5.7 Memory-Mapped Files
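A memory-mapped file is mapped into part of a process's virtual address space, so that reading
and writing memory in that region reads and writes the file itself.
Figure: (a) A segmented process before mapping files into its address space. (b) The process
after mapping the existing file abc into one segment and creating a new segment for xyz.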
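Python's standard `mmap` module gives a small illustration of the idea on a POSIX system. After mapping, the file's bytes are read and written by indexing memory instead of calling read and write. The file name `abc` and its contents are assumptions, chosen to echo the figure.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "abc")          # hypothetical file, echoing the figure
    with open(path, "wb") as f:
        f.write(b"hello, file")

    with open(path, "r+b") as f:
        # Length 0 maps the whole file into the process's address space.
        with mmap.mmap(f.fileno(), 0) as m:
            first = m[:5]                  # reading memory reads the file
            m[0:5] = b"HELLO"              # writing memory writes the file
            m.flush()                      # push modified pages back to the file

    with open(path, "rb") as f:
        after = f.read()
```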
Implementing Files
The key issue in implementing a file system is keeping track of which disk blocks belong to
which file. There are four different methods of implementation:
1. Contiguous allocation
2. Linked-list allocation
3. File Allocation Table (DOS)
4. I-nodes (UNIX)
1. Contiguous allocation
- In contiguous allocation, each file is stored as a contiguous run of blocks on the
disk.
Information about where each file's blocks are is stored in a special entry called the directory entry, which is
created when the file is created. If the block size is 1 KB, a 50 KB file will be allocated 50
consecutive blocks. With 2 KB blocks, it would be allocated 25 consecutive blocks.
Advantage:
- Simple to implement: keeping track of where a file's blocks are is reduced to
remembering two numbers, the disk address of the first block and the number of blocks in
the file. Given the number of the first block, the number of any other block can be
found by simple addition.
- Excellent performance, since the entire file can be read from the disk in a single
operation. Only one seek is needed (to the first block). After that, no more seeks or
rotational delays are needed, so data comes in at the full bandwidth of the disk.
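The bookkeeping for contiguous allocation can be sketched in a few lines. This assumes a hypothetical directory entry that holds only the two numbers mentioned above. The starting block 1000 is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class ContiguousEntry:
    """Hypothetical directory entry for contiguous allocation."""
    first_block: int   # disk address of the first block
    num_blocks: int    # number of blocks in the file

    def block_address(self, i: int) -> int:
        # Block i of the file is found by simple addition.
        if not 0 <= i < self.num_blocks:
            raise IndexError("block index outside the file")
        return self.first_block + i

def blocks_needed(file_size: int, block_size: int) -> int:
    # Round up: a partially used last block still occupies a whole block.
    return (file_size + block_size - 1) // block_size

KB = 1024
n_1k = blocks_needed(50 * KB, 1 * KB)    # 50 consecutive blocks
n_2k = blocks_needed(50 * KB, 2 * KB)    # 25 consecutive blocks
entry = ContiguousEntry(first_block=1000, num_blocks=n_1k)
```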
Disadvantage:
- In time, the disk becomes fragmented. When a file is removed, its blocks are freed,
leaving a run of free blocks on the disk. The disk is not compacted on the spot to
squeeze out the hole since that would involve copying all the blocks following the
hole, potentially millions of blocks. As a result, the disk ultimately consists of files and
holes.
- Eventually the disk will fill up, and it will become necessary either to compact the disk,
which is prohibitively expensive, or to reuse the free space in the holes. Reusing the
space requires maintaining a list of holes, which is doable. However, when a new file is
to be created, it is necessary to know its final size in order to choose a hole of the
correct size to place it in.
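The text does not say how a hole is chosen. The sketch below assumes first fit, one common strategy: scan the list of holes and take the first one that is large enough. It shows why the final size `n` must be known up front. The `(start, length)` representation of a hole is hypothetical.

```python
def first_fit(holes, n):
    """Return the start of the first hole with at least n free blocks, or None.

    holes: list of (start, length) pairs, updated in place when a hole is used.
    """
    for idx, (start, length) in enumerate(holes):
        if length >= n:
            if length == n:
                del holes[idx]                         # hole used up completely
            else:
                holes[idx] = (start + n, length - n)   # shrink the hole
            return start
    return None                                        # no hole is big enough

holes = [(10, 3), (40, 8), (90, 20)]
a = first_fit(holes, 5)    # skips the 3-block hole, takes the one at 40
b = first_fit(holes, 25)   # no hole is large enough
```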
2. Linked-list allocation
- In this method, a file is stored as a linked list of disk blocks.
The first word (2 bytes) of each block is used as a pointer to the next block; the rest of the block
holds data.
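Following the chain can be simulated with a dictionary that stands in for the disk. This is a sketch under assumed data: the block numbers and contents are made up, and each block visited is counted as one disk read. Blocks are numbered from 1, as in "block n".

```python
# Hypothetical disk: block number -> (next block number or None, data)
disk = {
    7: (2, b"AAAA"),
    2: (9, b"BBBB"),
    9: (4, b"CCCC"),
    4: (None, b"DDDD"),
}

def read_block_n(disk, first, n):
    """Return (data of the file's n-th block, 1-based; number of disk reads)."""
    if n < 1:
        raise ValueError("blocks are numbered from 1")
    block, reads = first, 0
    for _ in range(n - 1):             # read the n-1 blocks before it to get pointers
        next_block, _data = disk[block]
        reads += 1
        if next_block is None:
            raise IndexError("file has fewer than n blocks")
        block = next_block
    reads += 1                         # finally read block n itself
    return disk[block][1], reads

data, reads = read_block_n(disk, 7, 4)   # 4th block needs 4 reads in total
```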
Advantage:
- No external fragmentation
- A good performance for sequential access
- It is sufficient to store the address of the first block in the directory entry.
Disadvantage:
- Random access is very slow: to get to block n, the OS has to start at the beginning and
read the n-1 blocks prior to it, one at a time. Doing so many reads is painfully slow.
- If one block is corrupted, all the blocks after it become inaccessible. This doesn't happen in
contiguous allocation.
- Since the pointer takes 2 bytes, the amount of data in a block is no longer a power of two.
For example, with 512-byte blocks each block holds only 512 - 2 = 510 bytes of data, so 512
bytes of data need two blocks: 510 bytes in the first and the remaining 2 bytes in the second.
- Since many programs read and write in chunks whose size is a power of two, this is less
efficient.
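The arithmetic in the last two points can be checked directly. This assumes 512-byte blocks and the 2-byte pointer described above.

```python
BLOCK_SIZE = 512
POINTER_SIZE = 2                        # first word of each block holds the next pointer
PAYLOAD = BLOCK_SIZE - POINTER_SIZE     # data bytes left per block

def blocks_for(nbytes, payload=PAYLOAD):
    # Ceiling division without floating point
    return (nbytes + payload - 1) // payload

linked = blocks_for(512)                # with the pointer: 510 + 2 bytes -> 2 blocks
plain = blocks_for(512, BLOCK_SIZE)     # if the whole block held data -> 1 block
```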
Solution:
3. Linked-list allocation using an index
- It is an allocation technique using a File Allocation Table (FAT)
- Used in DOS and OS/2
- It takes the pointer word out of each disk block and puts it in a table in memory
- Such a table is called a File Allocation Table
- Using this method, the entire disk block is available for data.
- The directory entry for a file contains the starting block, and the FAT provides the rest of
the chain.
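The FAT can be sketched as an in-memory array in which entry i holds the number of the block that follows block i. The two files, their block numbers and the `FREE`/`EOF` markers are hypothetical. Walking a chain touches only the table, not the disk.

```python
FREE, EOF = None, -1    # hypothetical markers for "unused" and "end of chain"

# fat[i] = number of the block that follows block i in its file
fat = [FREE] * 16
fat[4], fat[7], fat[2], fat[10], fat[12] = 7, 2, 10, 12, EOF   # file A: 4,7,2,10,12
fat[6], fat[3], fat[11], fat[14] = 3, 11, 14, EOF             # file B: 6,3,11,14

def chain(fat, start):
    """All blocks of a file, given the starting block from its directory entry."""
    blocks, b = [], start
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

def nth_block(fat, start, n):
    """Random access: follow n links (0-based) in the in-memory table."""
    b = start
    for _ in range(n):
        b = fat[b]
        if b == EOF:
            raise IndexError("file is shorter than n+1 blocks")
    return b
```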
Advantage:
- No external fragmentation
- Good performance for sequential access
- Random access is fast as long as the FAT is kept in memory, which means it must not
grow too big.
- The entire block is available for data
- If one block is inaccessible, we lose only that block.
Disadvantage:
- If the FAT is corrupted, we lose everything. The solution to this problem is to maintain a
backup copy of the FAT on CD-ROM, floppy disk, etc.
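Exercise:
If we have a 20 GB disk, a 1 KB block size and 4-byte FAT entries, what is the size of the FAT?
The FAT has one entry per disk block, so
$$\text{FAT size} = \frac{20 \times 2^{30}}{1 \times 2^{10}} \times 4\ \text{bytes} = 20 \times 2^{20} \times 4\ \text{bytes} = 80\ \text{MB}$$
4. I-Node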
- Used by UNIX
- Each file has its own table, called an I-node (index node), which lists the attributes and
disk addresses of the file's blocks
- Each I-node has an identification number
- Given the I-node, it is then possible to find all the blocks of the file.
- The big advantage over FAT is that an I-node needs to be in memory only when the
corresponding file is open.
Example:
If each I-node occupies n bytes and we have K files open, the total memory used by I-nodes is K*n.
- The memory needed for I-nodes does not depend on the size of the disk (whether it is 10 GB
or 100 GB); it depends only on the number of files open at once.
- The first few disk addresses are stored in the I-node itself.
- If a file grows beyond this limit, the last disk address in the I-node is reserved for a pointer
to a disk block that holds more disk-block addresses.
- An I-node occupies memory only while its file is open.
- Assume a block size of 512 bytes; what is the size of file 1?
Advantage:
- No external fragmentation.
- Fast access for small files.
- If an I-node is inaccessible, only one file is inaccessible.
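The direct-plus-indirect addressing described above can be sketched as follows. The numbers are assumptions, not taken from any particular UNIX version:

- 10 direct addresses
- 512-byte blocks
- 4-byte disk addresses, so one indirect block holds 512 / 4 = 128 addresses

The indirect block is simulated as a list rather than read from disk.

```python
BLOCK_SIZE = 512                          # assumed block size
ADDR_SIZE = 4                             # assumed size of one disk address
N_DIRECT = 10                             # assumed number of direct addresses
PER_INDIRECT = BLOCK_SIZE // ADDR_SIZE    # addresses that fit in one block

class Inode:
    def __init__(self, direct, indirect_block=None):
        self.direct = direct                  # up to N_DIRECT disk addresses
        self.indirect_block = indirect_block  # contents of the indirect block, or None

    def block_for(self, i):
        """Map logical block i (0-based) of the file to a disk address."""
        if i < N_DIRECT:
            return self.direct[i]
        i -= N_DIRECT                         # index within the indirect block
        if self.indirect_block is None or i >= len(self.indirect_block):
            raise IndexError("block beyond end of file")
        return self.indirect_block[i]

# Largest file reachable with direct addresses plus one indirect block
MAX_FILE_SIZE = (N_DIRECT + PER_INDIRECT) * BLOCK_SIZE

node = Inode(direct=list(range(100, 110)), indirect_block=[500, 501, 502])
```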
Implementing Directories
- The file system accesses a file through its directory entry. Depending on the
system, the directory entry holds different information.
For example:
o Contiguous allocation: the disk address of the entire file
o FAT: the address of the first block
o I-nodes: the I-node number
Directories in DOS
- Each directory entry is 32 bytes long.
Directories in UNIX
Each directory entry is 16 bytes:
I-node number (2 bytes) | File name (14 bytes)
Information about the file's size, times, ownership and disk blocks is stored in the I-node.
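A 16-byte entry of this shape can be packed and unpacked with Python's `struct` module. This is a sketch: the little-endian byte order and the helper names are assumptions. Names shorter than 14 bytes are padded with NUL bytes.

```python
import struct

# 2-byte I-node number followed by a 14-byte name; '<' = little-endian, no padding
DIRENT = struct.Struct("<H14s")        # DIRENT.size is 16

def pack_dirent(inode, name):
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("name longer than 14 bytes")
    return DIRENT.pack(inode, raw)     # struct pads the name with NUL bytes

def unpack_dirent(entry):
    inode, raw = DIRENT.unpack(entry)
    return inode, raw.rstrip(b"\0").decode("ascii")
```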
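How a file is accessed using this method:
Example: cat /usr/ast/mbox
First, the file system locates the root directory, whose I-node is located at a fixed place on
disk. In the root directory it looks up the first component, usr, to find the I-node of /usr.
From that I-node it reads the /usr directory and looks up ast, giving the I-node of /usr/ast;
in that directory it looks up mbox. Finally, the I-node for mbox is read into memory and kept
there until the file is closed.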
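The component-by-component lookup can be sketched with dictionaries standing in for directories. The I-node numbers below, including 1 for the root, are hypothetical values chosen for illustration.

```python
ROOT_INODE = 1   # hypothetical: the root I-node sits at a fixed, well-known number

# Hypothetical directory contents: I-node number -> {name: I-node number}
directories = {
    1: {"bin": 4, "usr": 6},
    6: {"ast": 26, "jim": 45},
    26: {"mbox": 60, "src": 81},
}

def lookup(path):
    """Resolve an absolute path one component at a time, starting at the root."""
    inode = ROOT_INODE
    for component in path.strip("/").split("/"):
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(component)
        if component not in entries:
            raise FileNotFoundError(path)
        inode = entries[component]     # I-node of the next directory (or the file)
    return inode
```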
Keeping Track of free blocks
1. Linked list: link together all free blocks, and keep a pointer to the first free block in a special
disk location.
Disadvantage:
- Maintaining the list is time consuming
- Writing a cod print is time consuming
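The free list can be sketched as follows. Both of these are simplifications for illustration:

- The dictionary `next` stands in for the pointer stored inside each free block.
- `head` stands in for the pointer kept in the special disk location.

```python
class FreeList:
    """Free blocks linked together; each free block records the next free one."""
    def __init__(self, free_blocks):
        self.next = {}        # simulates the pointer stored in each free block
        self.head = None      # simulates the pointer in the special disk location
        for b in reversed(free_blocks):
            self.next[b] = self.head
            self.head = b

    def allocate(self):
        if self.head is None:
            raise RuntimeError("disk full")
        b = self.head
        self.head = self.next.pop(b)   # the next free block becomes the new head
        return b

    def free(self, b):
        self.next[b] = self.head       # freed block points at the old head
        self.head = b
```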
2. Bitmap: used most of the time. It is faster than the linked list, and maintenance is simple.
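A bitmap keeps one bit per disk block. The sketch below assumes the convention 1 = free, 0 = allocated; the opposite convention is also used. It finds the first free block with a simple linear scan.

```python
class Bitmap:
    """One bit per disk block: 1 = free, 0 = allocated (assumed convention)."""
    def __init__(self, n_blocks):
        self.n = n_blocks
        self.bits = bytearray(b"\xff" * ((n_blocks + 7) // 8))   # all blocks free

    def is_free(self, b):
        return (self.bits[b // 8] >> (b % 8)) & 1 == 1

    def allocate(self):
        for b in range(self.n):                  # scan for the first free block
            if self.is_free(b):
                self.bits[b // 8] &= ~(1 << (b % 8)) & 0xFF   # clear bit: allocated
                return b
        raise RuntimeError("disk full")

    def free(self, b):
        self.bits[b // 8] |= 1 << (b % 8)        # set bit: free again
```

With n blocks the bitmap needs only about n/8 bytes.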