File System

The document discusses file system functions and how they are implemented. It describes how a file system keeps track of files and storage using techniques like allocation algorithms, free space management using bitmaps or linked lists, and storing file metadata in inodes and directories.


FILE SYSTEM (SECONDARY STORAGE MANAGEMENT)

File System Functions:
1) keep track of free space
2) allocation/deallocation of blocks
3) keep track of files
4) convenient use of files
5) security of files
6) sharing of files
7) keep track of bad sectors or blocks
8) utilities related to file systems

What should be there inside the file system to provide these functions?
1) Routines (allocation, free space management)
2) Code corresponding to system calls
3) Information
(a) Some information is needed only while the system is running, e.g. whether some process has opened some file. This information is kept in tables in main memory and is part of the file system.
(b) Tables on each secondary storage device, containing entries like file name and the corresponding sector or block number.

Routines should take care of efficiency in terms of:
(a) time (in doing allocation and accessing the data)
(b) space (in allocation)

Ques: What is the difference between raw information stored on disk and a file stored on disk?
Ans: The difference appears at the file system level. For a file stored on disk, information is kept in the file system tables; no table entry corresponds to raw data stored on the disk.

Format Operation: To store raw information on a fresh disk, we have to create the sectors; this is called the format operation.
File System Creation: To store data in terms of files, we have to create tables on the secondary storage device; this is called file system creation.
In DOS, the format command does formatting + file system creation.
In Unix, format does the format operation and mkfs does file system creation.

In order to specify a unique location on disk, a three-dimensional address is needed: surface number, track number, sector number.

A three-dimensional address consumes more space, so we convert it to a one-dimensional address in terms of blocks. The three-dimensional address is used at the hardware level; the one-dimensional address is used at all other levels, i.e. the file system level.

The physical block or sector size is 512 bytes. We can decide the size of the logical block while creating the file system, e.g. one logical block = 2 physical blocks (so the logical block size is then 1024 bytes). The logical block, or simply block, is the main unit of allocation.

The disadvantage of a small block size in large-file applications (e.g. database applications) is that more information is required in the file system tables for each file. Reading will also be slow (though not in contiguous allocation).

Large block sizes are disadvantageous for small files, because even a small file must be allocated a whole large block; space is wasted due to internal fragmentation.
So, there is a trade off to decide the optimum size of block.

Free Space Management

The type of free space management used depends on the allocation algorithm. We deal with logical blocks in allocation, deallocation and keeping track of free blocks.

1. Bitmap
   Free block = 1, reserved block = 0.
   Compared to method 3, less space is required.
2. Counting
   3 3
   7 2
   From block no. 3, 3 blocks are free.
   From block no. 7, 2 blocks are free.
3. Keeping the addresses of all free blocks at one place.
   More space is required.
4. Keeping a linked list of free blocks. Minimum space is required.

ALLOCATION ALGORITHMS
1) Contiguous allocation method: To allocate a file, all its logical blocks must be contiguous.
Ex: Holes available are of sizes 3, 2, 1, 5 and the file is of size 2.
A hole of size n means n contiguous free blocks.
1. Choose the hole which is just enough to meet the requirement, i.e. the hole of size 2 (Best Fit method of contiguous allocation). After allocation, if the file grows and requires one more block, we will not be able to allocate it.
2. Choose the hole which is largest in size, i.e. the hole of size 5 (Worst Fit method of contiguous allocation).
Since we fragment the larger hole into two smaller holes, external fragmentation can become a problem.
External fragmentation: the total free space is large enough to meet our requirement, but the blocks are scattered so that we cannot allocate them to a file.
Internal fragmentation: wastage of space due to a partially occupied block. We cannot avoid internal fragmentation in any case.
Both best fit and worst fit take a long time, because they have to scan the whole list to find the smallest or largest hole respectively.
First Fit: the first hole that meets the requirement, i.e. the hole of size 3.
Factors in choosing a method:
1) Kind of environment, e.g. in a real-time system where time is important, we use first fit.
2) Nature of application, e.g. in a database application where file size grows continuously, we use worst fit.
3) Where we are not going to modify the files, e.g. OS utilities, we use best fit.
If we chose best fit or first fit and want to expand our file, what can we do?
1) Find some hole that meets the requirement and shift the file there. This takes time because:
a) read and write operations are required, and
b) new entries have to be made in the file system tables.
Solution for external fragmentation: bring all the free blocks to one place and all the occupied blocks to one place, resulting in one large hole. This is called defragmentation or compaction. It is also time consuming, for the same two reasons (a and b above).
ADVANTAGES OF CONTIGUOUS ALLOCATION:
1. Access time is very low because blocks are contiguous (seek time and rotational latency are small).
Seek time: time to move the R/W head to the proper cylinder.
Rotational delay: time for the proper sector to rotate under the head.
2. Random (direct) access is possible.

DISADVANTAGES:
1. It suffers from external fragmentation.
If the system is time critical, e.g. a real-time system, we use this method. If space is scarce, e.g. a very small hard disk, we don't go for this method.
LINKED ALLOCATION:
We keep a linked list of the blocks pertaining to a file.
In the directory table:

For contiguous allocation:
| File name | Starting address | Size of file |

For linked allocation:
| File name | Starting address |
- External fragmentation will not be there.
- Access time and allocation time are large, as blocks are not contiguous (seek time and rotational latency are higher than for contiguous allocation).
- Space is wasted keeping the address of the next block: if 4 bytes are required to address a block, 4 bytes are wasted in each block.
- The file can be expanded easily.
- Reliability suffers if a pointer is lost. Remedy: a doubly linked list may be used, but the space overhead for storing pointers is greater, and we also have to keep an entry for the last block in the directory table.
- Direct (random) access is not possible in linked allocation: to access the 2nd block of a file directly, we cannot avoid reading the 1st block, since the 1st block contains the address of the 2nd.
When file sizes are small and space is also important, we use linked allocation.
INDEXED ALLOCATION
An index block contains the entries of all the blocks pertaining to a file.
- More space is wasted in the index block for smaller files.
- Direct access is possible.
- Access time and allocation time are larger than for contiguous allocation and comparable with linked allocation.
- No external fragmentation.
Q. If the file is so large that its block addresses cannot be accommodated in one index block, what should we do?
Ans:
1. Keep a linked list of index blocks.
2. Use multiple levels of indexing. With single-level indexing, 256 blocks is the maximum file size (considering a logical block size of 1024 bytes and 4 bytes to address each block). With two-level indexing, 256*256 blocks is the maximum file size; with three-level indexing, 256*256*256 blocks. Practically, indexing is done only up to three levels.

DOS and Unix file systems use the indexed allocation method. It cannot be used where time criticality is there; where space is the prime factor, we can use this approach.
FREE SPACE MANAGEMENT TECHNIQUE:
Suitable for contiguous allocation: counting method.
Suitable for linked allocation: linked list of free blocks.
Suitable for indexed allocation: bitmap, or keeping the addresses of free blocks at one place.
The ext2 file system in Red Hat Linux uses the bitmap technique; the standard Unix file system keeps the addresses of free blocks at one place.

Unix File System


mkfs (for creating the file system)
- While creating the file system, the user may decide the logical block size, the number and size of subpartitions, etc.
- Tables will be created on secondary storage that keep information like:
a. info about files
b. free space etc.

Disk map for the UNIX file system:
| Boot Block | Super Block | Inode Table | Data Area |

The boot block consists of a routine related to the booting operation, which is not directly concerned with the file system.

The super block contains information about the state of the file system, like:
1. Total no. of logical blocks
2. No. of free blocks
3. A list of free blocks
4. Pointer to the next free block
5. Total no. of inodes
6. Total no. of free inodes
7. A list of free inodes
8. Pointer to the next free inode

i-node: information node or index node.
Corresponding to every file there is one inode in the inode table, which keeps all the information about that file except the file name. (Inode entries are given on page 209 in S. Das, III ed.)
The file name and i-node number are stored in the root directory table.

Directory table entry:
| File name | I-node number |

Somewhere in the data area, the root directory table is created. A predefined entry in the i-node table always points to the root directory table: i-node no. 2 always points to the root directory table.
User can decide while creating file system:
- No. of i-nodes (max. no. of files)
- Logical block size

Free space management technique in the standard Unix FS: keeping the addresses of all free blocks at one place. The ext2 file system in Red Hat Linux (given in Tanenbaum and Galvin in the case study part) uses a bitmap.
The data area also contains a linked list of free blocks; the pointer to the first free block is available in the super block.
Whenever a file is deallocated, the corresponding i-node entry is made in the super block. Similarly, free block entries are also added in the super block.
A copy of the super block is kept in RAM (main memory), so the information is available in RAM. This saves time, since we need not access the hard disk for each allocation or deallocation. All updates are done in the super block in main memory; when we shut down the system, this super block is copied to the hard disk. The super block on the hard disk is also updated at regular intervals by the sync() system call.
What method does Unix use for allocation of blocks?
The i-node contains the addresses of 10 direct blocks.
The 11th entry contains the address of an index block (which in turn contains the addresses of data blocks of the file), i.e. single indirect, the first level of indexing.
The 12th entry gives the second level of indexing (double indirect).
The 13th entry gives the third level of indexing (triple indirect).
Maximum file size supported by Unix = (10 + 256 + 256*256 + 256*256*256) blocks, assuming 256 data block addresses fit in one block.
For small files, both the space overhead and the access time are low; this is why the 10 direct blocks are used.

Whenever a file is deleted:
- its entry is deleted from the directory table,
- its disk blocks are freed,
- its inode is freed.
If entries for freed blocks were added to the free block list of the super block from the end, while allocation is done from the beginning, it would be possible to recover the file if we wanted to. But Unix, being a multiuser system, does not give any chance to recover a file even at the lowest level: the Unix file system adds freed blocks to the free block list of the super block at the beginning, while allocation is also done from the beginning, so that the file cannot be recovered by any means.
DOS File System: format operation + file system creation operation (the format command does both).

Disk map for the DOS file system:
| Boot block (512 bytes) | FAT #1 (14 physical blocks) | FAT #2 (14 physical blocks) | Root directory table | Data area |

A logical block is called a cluster; cluster numbering starts from the data area.

ROOT DIRECTORY
| Filename | Starting cluster of file |
| XYZ      | 5                        |

FILE ALLOCATION TABLE (FAT)
| entry | value             |
| 2nd   | free cluster      |
| 3rd   | free cluster      |
| 5th   | 7                 |
| 7th   | 230               |
| 230th | -1 (end of file)  |

So the clusters of the file are 5, 7 and 230.

The FAT is one common table containing information about the clusters of all the files. It is not required to read the previous block of a file to read the next block, i.e. it is not linked allocation. It may instead be regarded as indexed allocation, because the block addresses are kept at one place: a common index block corresponding to all files.
When a file is deleted:
(1) The entry is not removed from the directory table; the first character of the file name is replaced by some special character.
(2) The freed clusters are not given to any other file as far as possible.
So it may be possible to recover the file; the utility undelete is used to recover it.
Bad blocks should not be allocated to any file: a -2 entry in the FAT indicates a bad cluster. In Unix, inode 1 contains the list of bad blocks (as if they were occupied by some file).
Q. How do you identify a cluster as a bad cluster?

DIFFERENCES BETWEEN THE UNIX AND DOS FILE SYSTEMS:
- The number of files is limited in Unix by the number of inodes; in DOS the number of files can equal the number of clusters.
- The size of the root directory is fixed in DOS, so the maximum number of files in the root directory is limited.
- It is not possible to recover files at the lowest level in Unix, so security of files is there.

Directory Structures
A directory is a table in which we keep information about files.
Root directory table: by default, the root directory table is created whenever a file system is created. Sometimes it is also called the device directory. Directories themselves can be regarded as files.

1. Single level directory structure (the simplest): one root table whose entries F1, F2, ..., Fn are the names of ordinary files (not directories).


Need for directories (problems with the single level directory structure):
- to group similar kinds of files and place them at one place
- to allow multiple files with the same name
- for security reasons in a multi-user environment

2. So we go for the two level directory structure: corresponding to each user (U1, U2, ...), there is a directory containing that user's files (F1, F2, ...).
Problem: a user cannot group similar kinds of files and keep them at one place.

3. Tree structure:

(Diagram of tree structure)

We can go to any depth, and by pathname we can refer to any file.
Absolute pathname: starting from /
Relative pathname: relative to the current directory (pwd). The file system should remember the pwd at some place and append the relative pathname to it to form the absolute pathname.
In Unix, the shell remembers the current working directory; in DOS, the file system remembers it.
Home directory: by default, the directory in which we log in. In DOS, the home directory is by default the root directory.
. : current directory
.. : parent of the current directory (used in forming relative pathnames)

Searching for executable files: the OS identifies an executable file by a magic number in the header of the file. The shell searches for executable files in the directories declared in the PATH variable.
$ echo $PATH
PATH=/usr/bin:/bin:/
The shell does not perform this PATH search for other files specified as arguments.
In DOS, the shell searches the current directory by default. In Unix, it does not; if we want the shell to also search the current directory, we should add the current directory to the PATH variable as a dot:
PATH=$PATH:.
This is another significance of . as the current directory.

Sharing of files (between user1 and user2)
One approach is to copy the file from user2 to user1, but there are problems:
1. Copying is cumbersome; it takes time in read and write operations. If 10 users want to share a file, copying is even more cumbersome.
2. Consistency will not be there, and different users do not feel that the file is in their own directory.
To avoid inconsistency, the copy of the file should be physically one, while users should still feel that the file is in their own directory. A tree structure cannot realize this, so it cannot support sharing of files; we go for a graph structure.

The command for sharing in Unix is: $ ln u1/f1 u2/f2

We make an entry for the file, with the same inode number, in the directory table of each user who wants to share it. The file remains physically one, but different users feel that they have the file in their own directory table.
A shared file should not be physically deleted as long as at least one user still links to it.
DELETION ALGORITHM
1. Delete the directory entry of the file.
2. If the link count in the inode is 1, delete the file physically; if the link count is not 1, decrement it.

It is also possible to share directories (to look at the listing of files, to create files in the directory, to delete files in the directory).
The tree structure should not be converted into a cyclic graph structure. The ln -d command (for directory sharing using hard links) can be used only by the superuser, who must ensure that the graph does not become cyclic; on some systems even the superuser cannot execute it. That is why the graph structure in Unix is an acyclic graph structure.
Reasons:
1) There is no real usefulness of such sharing.
2) If there were such sharing, a command like ls -lR would show a recursive listing of files infinitely.
ln -s d1 d2 can be used for creating symbolic links between directories. Here you can create cycles, but ls -lR will traverse each one only once.
Q> Why is the link count of a directory at least 2?
Ans> A directory always has at least two links: its own "." entry and its entry in the parent directory. (Each subdirectory's ".." entry adds one more.)
Note: Read shared files (page 408) topic from Tanenbaum, Acyclic Graph Directories & General
Graph Directory from Galvin and links from S.Das. Comparison of hard links and symbolic links is
not there in notes.

Internal data structures of the Unix file system

(Diagram of the user file descriptor table, global file table and in-core inode table.)

A process is a program under execution in main memory, and any process uses files. Whenever a process is created, a file descriptor table is created in memory for that process.
Suppose the process executes the open system call to open file f1:
int fd;
fd = open(f1, mode);
The open system call returns the lowest unoccupied file descriptor from the file descriptor table. The descriptor points to some entry of the file table (consisting of the RDWR pointer and MODE info), and from there to the in-core inode of file f1.
Any subsequent operation on the file (read, write, close, lseek, dup, etc.) always uses the file descriptor, not the filename.
read(fd, buffer, 512)  ----- reads the first 512 bytes of f1
read(fd, buffer, 512)  ----- reads the next 512 bytes of f1 (the RDWR pointer for fd is at 512)
fd1 = open(f1, mode)   ----- opening a second instance of f1 makes the reference count 2 in the in-core inode entry of f1
read(fd1, buffer, 512) ----- reads the first 512 bytes of f1 (the RDWR pointer for fd1 is at 0)

Additional info in the in-core inode table:
1. Reference count: how many instances of the file are opened.
2. Inode no.
3. A flag that indicates whether changes have been made in the in-core inode or not.
For close(fd1):
1. Unoccupy fd1 in the fd table and the corresponding entry of the global file table, and decrement the reference count by 1.
2. If the reference count becomes 0, remove the corresponding in-core inode entry; save it in the inode table on secondary storage if some process has changed it, otherwise simply discard it.
A process can have 20 files open simultaneously, as there are only 20 entries in the FD table in the standard Unix file system.
Whenever a process is created, file descriptors 0, 1 and 2 are already occupied in the FD table:
FD 0 points to the standard input device, i.e. the keyboard.
FD 1 and FD 2 point to the standard output device, i.e. the monitor.

A process by default writes its output on file descriptor 1 (the channel corresponding to FD 1), and because FD 1 is by default connected to the monitor, the output goes to the monitor. A process by default reads on FD 0 and writes errors on FD 2.
$ ls
By default, the ls process writes on FD 1 and the output goes to the monitor.
$ ls > f1
The ls process still writes on FD 1, but the output goes to file f1 because the shell has re-routed the channel:
fd = open("f1", O_WRONLY);
close(1);
dup(fd);
execl("/bin/ls", "ls", (char *)0);
ls doesn't know that this redirection has taken place.

Read 16.5.1 to 16.5.4 from CROWLEY


Process Management
Difference between program and process
Program:
- Static entity, as it lies statically on the hard disk.
- Consists of instructions (code) and data.

Process:
- Program under execution in main memory.
- Dynamic entity, because the state of the process changes with time.
- Consists of code, data and system data.

(Diagram of the process state diagram)

From submission to termination, a process passes through various states.

New: a program is submitted to become a process; a request is made to the process management module of the OS.
The long term scheduler (LTS) decides whether the program becomes a process, based on the following factors:
- Whether the process is CPU bound or I/O bound. There should be a proper mix of CPU-bound and I/O-bound processes for efficient utilization of the CPU and I/O devices.
- Whether memory is available for the process or not.
- The degree of multiprogramming, which has to be restricted to some limit (after that, CPU utilization starts decreasing; we will study this in memory management).

Ready: a process is created in memory with code, data and system data segments. (There is a ready queue of processes in the ready state.)
Running: the CPU is executing the process. The short term scheduler (CPU scheduler) selects a process from the ready queue to execute on the CPU. The CPU scheduler is invoked every few msec, while the LTS is invoked at longer intervals (perhaps when some process leaves the system, to control the degree of multiprogramming).
Waiting: the process is waiting for I/O devices. The transition from the waiting state to the end state is not possible: the last instruction in a process is always a CPU instruction, i.e. exit().
Swapped out: if we realize later that a process should not have been accepted by the long term scheduler, the process is moved to the swapped-out state by the medium term scheduler (MTS). The factors the MTS considers are the same as for the LTS.
In interactive systems there is no LTS, as response time is the important factor.
Deadlock: the process is waiting infinitely in the waiting queue.
Read 3.1, 3.2, 3.3 topics from Galvin.
- The fork() system call creates a child process.
Syntax: id = fork()
It returns id = 0 to the child process, and id = pid of the child to the parent process. The variable id has the same address in both the child and the parent (each in its own copy of the address space).
- In Fig. 3.9 of Galvin, there will be no wait() in the else part if the shell executes the child process in the background.
- One of the reasons the shell creates a new process is that the shell can then redirect the I/O of the newly created child process.

Booting process of Unix:
- Swapper (pid = 0) is created. Swapper mounts the root file system on the root directory and creates the init process (pid = 1). Init is the ancestor of all the processes in the system; orphan processes are adopted by init.
- The /etc/rc script is executed and a getty process is created for each terminal.
- getty creates the login process, which asks the user for login name and password; after user authentication, login creates the shell process.

Read about pipes from the class notes and S. Das III ed., or Design and Implementation of the Unix OS by M. Bach, or Galvin.

Read 16.5.5 and 16.5.6 (maybe after MM) from CROWLEY.

Creation of the FS on a hard disk: the boot block does not go into the 1st physical sector of the hard disk; it may go into the 2nd sector and so on. A hard disk may host multiple operating systems, in which case we have to partition it into logical hard disks (e.g. one DOS and one Unix partition).

The 1st sector contains the partition table (which holds info about the available partitions on the disk) plus the partition table loader routine.

| Partition no. | Type of OS | Starting address | Size of partition | Active partition |
| 1             | DOS        | ...              | ...               |                  |
| 2             | Unix       | ...              | ...               |                  |

Booting process:
1. Control goes to a predefined location in ROM containing the ROM bootstrap routine, which executes.
2. Control goes to the very first sector of the disk and the partition table loader routine executes. It loads the partition table into memory and sees how many partitions are available and which partition is currently active.
3. Control is then transferred to the very first sector of the active partition, which contains the boot block (consisting of the RAM bootstrap routine).
The remaining steps differ between DOS and Unix.
A power-on self test (POST) routine is there in the BIOS, which checks all the devices before booting starts.
Certain utilities are available to partition a hard disk, e.g. the fdisk utility (in both DOS and Unix). The user may decide:
1. the number of partitions
2. the type of each partition
3. the active partition
4. the size of each partition

The partition table is also called:
- the Master Boot Record in DOS or Windows,
- the Volume Table of Contents (VTOC) in Unix.
LILO (Linux Loader) and GRUB are boot loaders for Linux; they also ask the user to make a particular partition active.

Formatting of a hard disk: when the disk is fresh, low level (physical) formatting creates the sectors out of the tracks and also creates an empty partition table; it erases all data on the disk.

| Partition table | I (DOS) | Unix-1 | Unix-2 |

Then we create a file system in the DOS and Unix partition areas. This creation of a FS in a particular partition is called high level formatting or logical formatting.
Logical formatting means recreation of the FS; it does not mean that data is erased physically. The utility to recover the data again is unformat C:, which restores the file system tables to their previous state (an option must be given at format time that the FS tables be stored at some place).
Ways physical formatting is done: third party software packages are available, e.g. Disk Manager, Norton.
mkfs is used for file system creation in Unix.

Subpartitions and Mounting

What is the requirement for subpartitions?
(i) To store all the user files/directories in one partition and system files in another.
(ii) The two partitions have separate file systems, so it is possible to decide different logical block sizes for them.
- It is required to store info about the subpartitions in extended partition tables.
In DOS, the first partition is the primary partition and the second is the extended partition; within the extended partition there are logical drives.
In Unix, you can see the subpartitions by running the command df.
The root partition (primary partition) is mounted on the root directory. Other subpartitions are to be mounted on some empty directory of the root file system.
All the partitions contain the files required for booting, but the boot block (with the RAM bootstrap routine) of the primary partition is used for booting. All the partitions have file systems; the root file system is the file system in the primary partition.
Mounting

Unix 1 holds the root file system (consisting of OS files); Unix 2 holds user files. Suppose d1 is a user file in the 2nd file system: what is the pathname for d1? We have to join the two trees, making Unix 2 a subtree of the root file system; this is called the mount operation.

In the Unix file system, when we boot the system, the root file system is accessible by default. Other file systems are not; to make them accessible, we have to mount them on some empty directory of the root FS.

$mount <file system which is to be mounted> <mount point of the root file system, i.e. where the file system is to be mounted>
e.g. $mount /dev/fd0 /media/floppy


Note: In Linux, the mount point may also be fixed; it is given in /etc/fstab.
The directory at which we mount should be an empty directory.
All devices are regarded as files; subpartitions are also regarded as files.
If there are multiple file systems, they must be mounted at different mount points of the root FS only. It is not possible to mount a FS on an already mounted FS, i.e. cascading of file systems is not possible in standard Unix.
To unmount a FS: $umount <device filename or mount point>
/etc/mnttab (the mount table) contains info about mounted file systems. Any access to a file of a mounted FS goes through /etc/mnttab, which gives the device file for that file system.
e.g. /dev/sda2 is mounted on the /usr directory. Now when you write $ vi /usr/student/f1, the OS first goes to the inode of the usr directory. There it sees that usr is a mount point (a flag in the in-core inode indicates this), then it reads the device file name corresponding to the usr mount point from the mount table /etc/mnttab (here /dev/sda2). The OS then opens /dev/sda2 to retrieve the contents of file f1; the device driver for /dev/sda2 is selected from its major number and minor number.
No mounting is required in DOS or Windows.
Note: Read 10.4: File system mounting, 11.2.2: Partitions and mounting, 12.5.1: Disk formatting and 12.5.2: Boot block from Galvin.
Read the Protection of files topic from Galvin and also read about rwx permissions from S. Das.

Problems: 16.12, 16.13, 16.14 from Crowley, problems from Chapter 17


17.7 Booting the OS from Crowley

You might also like