22 File Systems 2

The document discusses how virtual file systems (VFS) and logging in file systems help provide efficient and reliable file access. VFS provides an abstract file system interface that allows concrete file systems like ext2 and vfat to implement file operations like open and read in a standardized way. Logging file systems like ext3 and xv6's file system record file system metadata changes to a log to allow for faster recovery from crashes by replaying incomplete operations.

Uploaded by

vaishnavisp13

VFS

VFS

Consider this:
/dev/sda1 is mounted as “/”
/dev/sda2 is mounted on the “/a/b” directory
How does this call work in the kernel?
open("/a/b/c/d", O_RDONLY)

Consider xv6 code

sys_open -> namei -> namex -> (skipelem, dirlookup, ilock)

dirlookup() of “c” must search not the inode of “/a/b” on /dev/sda1 but the inode of “/” on /dev/sda2
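
The mount-point switch described above can be sketched as a small table lookup. This is only an illustration under stated assumptions: xv6 itself has no mount support, and `struct mount`, `cross_mount`, and the simplified two-field `struct inode` are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified inode: identified by (device, inode number). */
struct inode {
    int dev;   /* device the inode lives on */
    int inum;  /* inode number on that device */
};

/* Hypothetical mount table entry: "covered" is the directory that was
 * mounted on; "root" is the root inode of the mounted file system. */
struct mount {
    struct inode covered;
    struct inode root;
};

/* If ip is a mount point, return the root inode of the file system mounted
 * on it; otherwise return ip unchanged. A path walk would call this after
 * every successful directory lookup. */
struct inode cross_mount(const struct mount *tab, size_t n, struct inode ip)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].covered.dev == ip.dev && tab[i].covered.inum == ip.inum)
            return tab[i].root;
    }
    return ip;
}
```

With a table that maps, say, inode 42 on device 1 (standing in for “/a/b” on /dev/sda1) to inode 1 on device 2 (the root of /dev/sda2), the walk continues on device 2 as soon as it reaches “/a/b”, so “c” is searched for there.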
VFS

Object Oriented Programming in C (let’s see an example of this)

Clever use of function pointers

There is an “abstract” file system class (VFS), and there are concrete file system classes (ext2, vfat, ...)

sys_read → fileread → readi() becomes

sys_read → fileread → (i->i_ops->readi)()

Inode is a generic inode

Contains file system specific inode pointer

And file system specific inode operations

Fields set up during namei()

struct inode_operations {
    int (*readi) (int, char *, int);
    int (*writei) (int, char *, int);
    ....
};

struct inode {
    int mode;
    int uid;
    ....
    void *inode_specific;
    struct inode_operations *i_ops;
};
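
To make the function-pointer idea concrete, here is a minimal sketch of the dispatch. It is not kernel code: the two toy "file systems" only return a tag string, `vfs_read` and `copy_tag` are hypothetical names, and the `readi` signature is simplified to take the inode itself rather than the `int` arguments shown on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct inode;  /* forward declaration */

/* Table of operations that every concrete file system fills in. */
struct inode_operations {
    int (*readi)(struct inode *ip, char *dst, int n);
};

/* Generic inode: common fields plus file-system-specific parts. */
struct inode {
    int mode;
    void *inode_specific;                  /* concrete FS private data */
    const struct inode_operations *i_ops;  /* concrete FS operations */
};

/* Helper for the toy file systems: copy at most n bytes of tag into dst. */
static int copy_tag(const char *tag, char *dst, int n)
{
    int len = (int)strlen(tag);
    if (len > n)
        len = n;
    memcpy(dst, tag, (size_t)len);
    return len;
}

/* Two toy "concrete file systems" (stand-ins for ext2 and vfat). */
static int ext2_readi(struct inode *ip, char *dst, int n) { (void)ip; return copy_tag("ext2", dst, n); }
static int vfat_readi(struct inode *ip, char *dst, int n) { (void)ip; return copy_tag("vfat", dst, n); }

const struct inode_operations ext2_ops = { ext2_readi };
const struct inode_operations vfat_ops = { vfat_readi };

/* Generic layer: it never names a concrete file system, it only follows
 * the function pointer stored in the inode. */
int vfs_read(struct inode *ip, char *dst, int n)
{
    assert(ip != NULL && ip->i_ops != NULL && ip->i_ops->readi != NULL);
    return ip->i_ops->readi(ip, dst, n);
}
```

The caller of `vfs_read` behaves the same for every inode; which code runs depends only on the `i_ops` pointer that was stored when the inode was set up.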
Efficiency and Performance
(and the risks created
while trying to achieve it!)
Efficiency

Efficiency dependent on:

Disk allocation and directory algorithms

Types of data kept in file’s directory entry

Pre-allocation or as-needed allocation of
metadata structures

Fixed-size or varying-size data structures

Performance

Keeping data and metadata close together

Buffer cache – separate section of main memory for frequently used
blocks

Synchronous writes sometimes requested by apps or needed by OS

No buffering / caching – writes must hit disk before acknowledgement

Asynchronous writes more common, buffer-able, faster

Free-behind and read-ahead – techniques to optimize sequential access

Reads frequently slower than writes
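
The difference between a buffered write and a synchronous one can be illustrated from user space. The sketch below assumes a POSIX system; `write_durably` is a hypothetical helper name, while `open`, `write`, `fsync` and `close` are the standard POSIX calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write buf to path and return only after the kernel reports that the data
 * has been flushed to the storage device. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() alone normally just copies the data into the kernel's cache:
     * this is the asynchronous, buffered case. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* fsync() blocks until the cached data for fd has been written out:
     * this is what makes the write synchronous for the caller. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Opening the file with the `O_SYNC` flag is another way to request the same behaviour on every `write()`, at the cost of slower writes.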
Page cache

A page cache caches pages rather than disk
blocks using virtual memory techniques and
addresses

Memory-mapped I/O uses a page cache

Routine I/O through the file system uses the buffer
(disk) cache

This leads to the following figure
[Figure: I/O Without a Unified Buffer Cache]
Unified buffer cache

A unified buffer cache uses the same page
cache to cache both memory-mapped pages
and ordinary file system I/O to avoid double
caching

But which caches get priority, and what
replacement algorithms to use?
[Figure: I/O Using a Unified Buffer Cache]
Recovery

Problem. Consider creating a file on ext2 file system.

The following on-disk data structures will/may get modified

Directory data block, new directory data block, block bitmap, inode table, inode table bitmap, group descriptor, super block, data blocks for new file, more data block bitmaps, ...

All cached in memory by OS

Delayed write – OS writes changes in its in-memory data structures, and
schedules writes to disk when convenient

Possible that some of the above changes are written, but some are not

Inconsistent data structure! --> Example: inode table written, inode bitmap written,
but directory data block not written
Recovery

fsck: Consistency checking – compares data in directory
structure with data blocks on disk, and tries to fix inconsistencies

Can be slow and sometimes fails

Use system programs to back up data from disk to another
storage device (magnetic tape, other magnetic disk, optical)

Recover lost file or disk by restoring data from backup

Faster recovery? - “log structured file system” or “journaling file
system” can help
Log structured file systems

Log structured (or journaling) file systems record each metadata update to the
file system as a transaction

All transactions are written to a log

A transaction is considered committed once it is written to the log (sequentially)

Sometimes to a separate device or section of disk

However, the file system may not yet be updated

The transactions in the log are asynchronously written to the file system
structures

When the file system structures are modified, the transaction is removed from the log

If the file system crashes, all remaining transactions in the log must still be
performed

Faster recovery from crash, removes chance of inconsistency of metadata
Journaling file systems

Veritas FS

Ext3, Ext4

Xv6 file system!
log in xv6

a mechanism for crash recovery of on-disk data

Concept: multiple write operations are needed for system calls (e.g. the ‘open’ system call to create a file in a directory)

some writes succeed and some don’t

leading to inconsistencies on disk

In the log, all changes for a ‘transaction’ (an operation) are either written completely or not at all

During recovery, completed operations can be “rerun” and incomplete operations ignored
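log on disk

Disk layout: boot block | super block | log | inodes | bitmap | data ....

The super block holds the log's start, size, etc.

Log layout:
log.start: logheader, with n = 2 and block[30] = { 52, 68, ... } (indices 0 1 2 3 ..... 29)
log.start + 1: data of 52nd block
log.start + 2: data of 68th block
...
log.start + 30: last log block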
log in xv6

An xv6 system call does not directly write the on-disk file system data structures.

A system call calls begin_op() at the beginning and end_op() at the end

begin_op() increments log.outstanding

end_op() decrements log.outstanding, and if it’s 0, then calls commit()

During the code of a system call, whenever a buffer is modified (and done with), log_write() is called

This records the block's number in an array inside the log; the block is not written to its actual place in the FS as of now

when finally commit() is called, all modified blocks are copied to the on-disk log and then to their actual places in the file system
log
struct logheader { // ON DISK
int n; // number of entries in use in block[] below
int block[LOGSIZE]; // List of block numbers stored
};
struct log { // only in memory
struct spinlock lock;
int start; // first log block on disk (starts with logheader)
int size; // total number of log blocks (in use out of 30)
int outstanding; // how many FS sys calls are executing.
int committing; // in commit(), please wait.
int dev; // FS device
struct logheader lh; // copy of the on disk logheader
};
struct log log;
Typical use case of logging
/* In a system call code */

/* Prepare for logging. Wait if the logging system is not ready or 'committing'. ++outstanding. */
begin_op();
...
/* Read and get access to a data block as a buffer. */
bp = bread(...);
/* Modify the buffer. */
bp->data[...] = ...;
/* Note down this buffer for writing, in the log. Proxy for bwrite(). Mark B_DIRTY. Absorb multiple writes into one. */
log_write(bp);
...
/* Syscall done. Write log and all blocks. --outstanding. If outstanding == 0, commit(). */
end_op();


Example of calls to logging
//file_write() code
begin_op();
ilock(f->ip);
/*loop */ r = writei(f->ip, ...);
iunlock(f->ip);
end_op();

each writei() in turn calls bread(), log_write() and brelse()

it also calls iupdate(ip), which also calls bread(), log_write() and brelse()

Multiple writes are combined between begin_op() and end_op()
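
The bookkeeping behind this combining of writes can be sketched with a small single-threaded simulation. It is not the xv6 code: there is no locking, sleeping, or disk I/O, and `struct simlog` and the `sim_` functions are hypothetical names chosen for this sketch.

```c
#include <assert.h>

#define LOGSIZE 30

/* Simplified in-memory log state (no locking, no disk I/O). */
struct simlog {
    int outstanding;     /* system calls between begin_op and end_op */
    int n;               /* number of distinct blocks logged so far */
    int block[LOGSIZE];  /* block numbers logged so far */
    int commits;         /* how many times a commit ran (for illustration) */
};

void sim_begin_op(struct simlog *l)
{
    l->outstanding++;
}

/* Record block number b; a block that is already listed is absorbed,
 * so writing the same block twice uses only one log slot. */
void sim_log_write(struct simlog *l, int b)
{
    assert(l->outstanding > 0);  /* must be inside a transaction */
    for (int i = 0; i < l->n; i++) {
        if (l->block[i] == b)
            return;              /* absorption: already logged */
    }
    assert(l->n < LOGSIZE);
    l->block[l->n++] = b;
}

/* The last system call to finish triggers the commit. */
void sim_end_op(struct simlog *l)
{
    assert(l->outstanding > 0);
    if (--l->outstanding == 0) {
        l->commits++;  /* real code would write the log and install blocks here */
        l->n = 0;      /* the log is empty again after the commit */
    }
}
```

If two operations overlap, the first `sim_end_op` only decrements the counter; the commit runs when the second one brings `outstanding` back to 0.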
Logging functions

initlog()
  Set fields in global log.xyz variables, using FS superblock
  Recovery if needed
  Called from first forkret()

Following three called by FS code

begin_op(void)
  Increment log.outstanding

end_op(void)
  Decrement log.outstanding and call commit() if it’s zero

log_write(buf *)
  Remember the specified block number in log.lh.block[] array
  Set the block to be dirty

write_log(void)
  Called only from commit()
  Use block numbers specified in log.lh.block and copy those blocks from memory to log-blocks

commit(void)
  Called only from end_op()
  write_log()
  Write header to disk log-header
  Copy from log blocks to actual FS blocks
  Reset and write log header again