OSC___CH12___File_System_Implementation

This document discusses file system implementation concepts, including the structure and organization of typical file systems such as VSFS and NTFS. It covers topics like inode management, allocation methods, and the use of block pointers for file data storage. The document aims to provide insights into local and remote file system implementations, as well as block allocation algorithms and their trade-offs.


0.Prologue 1.VSFS 2.FFS 3.FSCK & Journaling

OPERATING SYSTEM CONCEPTS


Chapter 12. File-System Implementation

A/Prof. Kai Dong

[email protected]
School of Computer Science and Engineering,
Southeast University

December 16, 2024

A/Prof. Kai Dong Operating System Concepts Chapter 12. File-System Implementation 1 / 55

Contents

1 Warm-up

2 Typical File System

3 Fast File System

4 FSCK and Journaling


Warm-up
File System Measurement Summary

Observation                            Detail
Most files are small                   Roughly 2K is the most common size
Average file size is growing           Almost 200K is the average
Most bytes are stored in large files   A few big files use most of the space
File systems contain lots of files     Almost 100K on average
File systems are roughly half full     Even as disks grow, file systems remain ∼50% full
Directories are typically small        Many have few entries; most have 20 or fewer


Warm-up
Size of A File

• Why are the files of these sizes, and why do they use this much space?

Filename  Content                     Description           Size       Space
test1     -                           -                     0          0
test2     “This is a test file.\r\n”  ×1 (line)             22 bytes   0
test3     “This is a test file.\r\n”  ×40 (lines)           878 bytes  4 KB
test4     “This is a test file.\r\n”  ×(40 − 39) (lines)    22 bytes   4 KB


Objectives

• To describe the details of implementing local file systems and directory
structures.
• To describe the implementation of remote file systems.
• To discuss block allocation and free-block algorithms and trade-offs.


Typical File System


Very Simple File System

• We are now discussing a simple file system implementation, known as VSFS (the
Very Simple File System)
• A simplified version of a typical UNIX file system.
• You should understand
• Data structures: what types of on-disk structures are utilized by the file
system to organize its data and metadata?
• Access methods: How does it map the calls made by a process onto its
structures?


Typical File System


Overall Organization

• Divide the disk into blocks (with a commonly-used block size of 4 KB).
• Assume a really small disk, with just 64 blocks.

0 7 8 15 16 23 24 31

32 39 40 47 48 55 56 63

• Reserve a fixed portion of the disk for the data region


• Say the last 56 of 64 blocks on the disk (D = data block, - = not yet assigned):

blocks 0-7    blocks 8-15   blocks 16-23  blocks 24-31
--------      DDDDDDDD      DDDDDDDD      DDDDDDDD

blocks 32-39  blocks 40-47  blocks 48-55  blocks 56-63
DDDDDDDD      DDDDDDDD      DDDDDDDD      DDDDDDDD


Typical File System


Overall Organization (contd.)

• To track information about each file, the inodes are stored in the inode table.
• Assume 5 of 64 blocks for inodes.
(I = inode table block, D = data block, - = not yet assigned)

blocks 0-7    blocks 8-15   blocks 16-23  blocks 24-31
---IIIII      DDDDDDDD      DDDDDDDD      DDDDDDDD

blocks 32-39  blocks 40-47  blocks 48-55  blocks 56-63
DDDDDDDD      DDDDDDDD      DDDDDDDD      DDDDDDDD

• Assuming 256 bytes per inode, our file system contains ? total inodes.
• This number represents the maximum number of files we can have in our
file system.
• How does the file system know whether an inode or a data block is free?
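
The inode count asked about above follows directly from the stated assumptions (5 inode blocks, 4 KB per block, 256 bytes per inode). As a worked calculation:

```latex
\[
\frac{4096\ \text{bytes/block}}{256\ \text{bytes/inode}} = 16\ \text{inodes/block},
\qquad
5\ \text{blocks} \times 16\ \text{inodes/block} = 80\ \text{inodes}.
\]
```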


Typical File System


Overall Organization (contd.)

• To track whether inodes or data blocks are free or allocated, an inode bitmap
and a data bitmap are required.
(S = superblock, i = inode bitmap, d = data bitmap, I = inode table block, D = data block)

blocks 0-7    blocks 8-15   blocks 16-23  blocks 24-31
SidIIIII      DDDDDDDD      DDDDDDDD      DDDDDDDD

blocks 32-39  blocks 40-47  blocks 48-55  blocks 56-63
DDDDDDDD      DDDDDDDD      DDDDDDDD      DDDDDDDD

• Optional data structures include a free list.


• The remaining block is the Superblock.
• It contains information about this particular file system, including, for
example, how many inodes and data blocks are in the file system, where
the inode table begins, and so forth.
• When mounting a file system, the operating system will read the
superblock first.


Typical File System


File Organization: the Inode

• An inode in UNIX FS is a File Control Block (FCB).


• The name inode is short for index node.
• Each inode is implicitly referred to by a number, called the inumber.
The inode table (close-up):

Region     Byte range    Contents
Super      0KB-4KB       superblock
i-bmap     4KB-8KB       inode bitmap
d-bmap     8KB-12KB      data bitmap
iblock 0   12KB-16KB     inodes 0-15
iblock 1   16KB-20KB     inodes 16-31
iblock 2   20KB-24KB     inodes 32-47
iblock 3   24KB-28KB     inodes 48-63
iblock 4   28KB-32KB     inodes 64-79

• How to read a given inode number X?
• Calculate the offset into the inode region (X · sizeof(inode)) and add it to
the start address of the inode table on disk.
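
The calculation above can be sketched in a few lines of Python. The block size, inode size, and the 12 KB start of the inode table come from the slides; the 512-byte sector size and the name `locate_inode` are assumptions made for illustration.

```python
BLOCK_SIZE = 4096               # 4 KB blocks, as in the slides
INODE_SIZE = 256                # bytes per inode, as in the slides
INODE_START_ADDR = 12 * 1024    # inode table follows Super, i-bmap, d-bmap
SECTOR_SIZE = 512               # assumption: 512-byte disk sectors


def locate_inode(inumber):
    """Return (inode-table block index, byte address on disk, sector number)."""
    offset = inumber * INODE_SIZE        # offset into the inode region
    iblock = offset // BLOCK_SIZE        # which iblock holds the inode
    addr = INODE_START_ADDR + offset     # absolute byte address on disk
    sector = addr // SECTOR_SIZE         # sector the disk must actually read
    return iblock, addr, sector


print(locate_inode(32))  # (2, 20480, 40): iblock 2, at 20 KB, sector 40
```

Disks are addressed in sectors rather than bytes, which is why the last step converts the byte address into a sector number.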


Typical File System


File Organization: the Inode (contd.)

Size (bytes)  Name         What is this inode field for?
2             mode         can this file be read/written/executed?
2             uid          who owns this file?
4             size         how many bytes are in this file?
4             time         what time was this file last accessed?
4             ctime        what time was this file created?
4             mtime        what time was this file last modified?
4             dtime        what time was this inode deleted?
2             gid          which group does this file belong to?
2             links_count  how many hard links are there to this file?
4             blocks       how many blocks have been allocated to this file?
4             flags        how should ext2 use this inode?
4             osd1         an OS-dependent field
60            block        a set of disk pointers (15 total)
4             generation   file version (used by NFS)
4             file_acl     a new permissions model beyond mode bits,
4             dir_acl      called access control lists
Table: Simplified Ext2 Inode


Typical File System


File Organization: the Inode (contd.)

• How does an inode refer to where its data blocks are?
• Suppose a simple approach:
• To have one or more direct pointers (disk addresses) inside the inode; each
pointer refers to one disk block that belongs to the file.
• Such an approach is limited:
• For example, it cannot describe a file that is really big (i.e., bigger than
the size of a block multiplied by the number of direct pointers).
• Solution: make use of indirect pointers.
• Instead of pointing to a block that contains user data, an indirect pointer points to a
block that contains more pointers, each of which points to user data.
• What is the maximum file size with 12 direct pointers and 1 indirect
pointer?
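
As a worked answer to the question above, assume 4 KB blocks (the block size used earlier) and 4-byte disk pointers (the 15 pointers of the simplified ext2 inode occupy 60 bytes). One indirect block then holds 1024 pointers:

```latex
\[
\frac{4096}{4} = 1024\ \text{pointers per indirect block},
\qquad
(12 + 1024) \times 4\,\text{KB} = 4144\,\text{KB} \approx 4\,\text{MB}.
\]
```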


Typical File System


The Multi-Level Index

• To support even larger files, add a
• double indirect pointer
• What is the maximum file size with 12 direct pointers, 1 indirect
pointer, and 1 double indirect pointer?
• triple indirect pointer
• What is the maximum file size with 12 direct pointers, 1 indirect
pointer, 1 double indirect pointer, and 1 triple indirect pointer?
• Or use extents instead of pointers (ext4).
• An extent is simply a disk pointer plus a length (in blocks) to specify the
on-disk location of a file.
• Extent-based file systems often allow for more than one extent.
• Less flexible but more compact.
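
The maximum-size questions for a multi-level index can be answered with a small helper. This is a sketch under the assumptions of 4 KB blocks and 4-byte pointers; the function name `max_file_size` and its parameters are illustrative, not part of any real file system API.

```python
def max_file_size(block_size=4096, ptr_size=4, direct=12, indirect_levels=3):
    """Largest file (in bytes) addressable by a multi-level index.

    indirect_levels=1 means one single-indirect pointer, 2 adds one
    double-indirect pointer, 3 adds one triple-indirect pointer.
    """
    ptrs_per_block = block_size // ptr_size
    blocks = direct
    for level in range(1, indirect_levels + 1):
        # A level-k indirect pointer reaches ptrs_per_block**k data blocks.
        blocks += ptrs_per_block ** level
    return blocks * block_size


for levels in (1, 2, 3):
    print(levels, max_file_size(indirect_levels=levels))
```

With these assumptions the three results are roughly 4 MB, 4 GB, and 4 TB, so each extra level of indirection multiplies the reachable size by about 1024.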


Typical File System


FAT (File-Allocation Table)

Directory entry:

name   ···   start block
test   ···   217

FAT:

entry   next block
0
···
217     618
···
339     -1 (end of file)
···
618     339
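
The figure can be read as a linked list threaded through the table: the directory entry gives the first block, and each FAT entry names the next one. A minimal sketch using the values from the figure (the dictionary representation and the name `file_blocks` are assumptions for illustration):

```python
END_OF_FILE = -1   # end-of-chain marker, as shown in the figure

# FAT entries from the figure: index = block number, value = next block.
fat = {217: 618, 618: 339, 339: END_OF_FILE}

# Directory entry: file name -> start block.
directory = {"test": 217}


def file_blocks(name):
    """Follow the FAT chain and return the file's blocks in order."""
    blocks = []
    block = directory[name]
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]
    return blocks


print(file_blocks("test"))  # [217, 618, 339]
```

Reaching the n-th block requires n table lookups, but all of them are in the FAT rather than scattered across the data blocks themselves.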


Typical File System


NTFS

• NTFS (New Technology File System) with the MFT (Master File Table)
• A record in the MFT (each is 1 KB in size):
• Each attribute has a head and a body; the body contains the attribute data, or a pointer to an extent.

Layout of an MFT record:

Record head          ASCII "FILE" + ···
Attr. 1    Head      0x10
           Body      Standard Information
Attr. 2    Head      0x30
           Body      File Name
Attr. ···
Attr. 8    Head      0x80
           Body      Data
···
End marker           FF FF FF FF


Typical File System


Allocation Method

• Contiguous allocation
• Each file occupies a set of contiguous blocks on the disk.
• Examples: ext4, NTFS
• Linked allocation
• Each file is a linked list of disk blocks.
• Example: FAT
• Indexed allocation
• Brings all pointers together into the index block
• ext2, ext3


Typical File System


In Class Exercise

• Imagine a file system which uses inodes to manage files on disk. Each inode
consists of a file name (4 bytes), user id (2 bytes), three timestamps (4 bytes
each), protection bits (2 bytes), a reference count (2 bytes), a file type (2 bytes),
and the file size (4 bytes). Additionally, the inode contains 13 direct indices, 1
index to a single indirect block, 1 index to a double indirect block, and one index
to a triple indirect block. Each of these indices (block pointer) is 4 bytes. The
file system also stores the first 356 bytes of each file in the inode.
1 Three major methods of allocating disk space are introduced in our
textbook. What are these three allocation methods? Which one is used in
the previous file system?
2 Assume a disk sector is 512 bytes and that each indirect block fills a single
sector. What is the maximum file size for this file system? Show your work
clearly. You need not do the arithmetic to get full credit.
3 Is there any benefit to including the first 356 bytes of the file in the inode?
If so, what is the reason? If not, why not?


Typical File System


Key

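• Contiguous, linked, and indexed allocation; this file system uses indexed allocation.
• $(512/4)^3 \cdot 512 + (512/4)^2 \cdot 512 + (512/4)^1 \cdot 512 + 13 \cdot 512 + 356$ bytes
• Yes, it improves both space and time efficiency. Most files are small. For small files
(≤ 356 bytes), there is no need to access the disk twice (once for the inode and once
for the data), and storing the data in the inode saves disk space by avoiding internal
fragmentation within blocks.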


Typical File System


Directory Organization

• A directory basically contains a list of (entry name, inode number) pairs.


inum  reclen  strlen  name
5     4       2       .
2     4       3       ..
12    4       4       foo
13    4       4       bar
24    8       7       foobar

• Deleting a file (e.g., calling unlink()) can leave an empty space in the
middle of the directory, and hence there should be some way to mark that
as well (e.g., with a reserved inode number such as zero).
• Such a delete is one reason the record length (reclen) is used: a new entry
may reuse an old, bigger entry and thus have extra space within.
• A directory has an inode, somewhere in the inode table (with the type field
of the inode marked as “directory” instead of “regular file”).
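
The directory operations described above can be sketched as follows. This is an in-memory illustration only: the `DirEntry` class, the function names, and the rule of rounding `reclen` up to a multiple of 4 are assumptions chosen to match the table, not the layout of any particular file system.

```python
from dataclasses import dataclass

FREE = 0   # reserved inode number marking an unused (deleted) entry


@dataclass
class DirEntry:
    inum: int     # inode number, or FREE for a deleted entry
    reclen: int   # bytes reserved for the name
    name: str


def lookup(entries, name):
    """Return the inode number for name, or None if it is absent."""
    for entry in entries:
        if entry.inum != FREE and entry.name == name:
            return entry.inum
    return None


def unlink(entries, name):
    """Mark the entry as free; its space stays in the directory."""
    for entry in entries:
        if entry.inum != FREE and entry.name == name:
            entry.inum = FREE
            return True
    return False


def add(entries, name, inum):
    """Reuse a free entry that is big enough, else append a new one."""
    need = len(name) + 1   # strlen counts the terminating NUL, as in the table
    for entry in entries:
        if entry.inum == FREE and entry.reclen >= need:
            entry.inum, entry.name = inum, name   # reclen may exceed need
            return entry
    entry = DirEntry(inum, (need + 3) // 4 * 4, name)
    entries.append(entry)
    return entry


entries = [DirEntry(5, 4, "."), DirEntry(2, 4, ".."), DirEntry(12, 4, "foo"),
           DirEntry(13, 4, "bar"), DirEntry(24, 8, "foobar")]
unlink(entries, "foobar")
slot = add(entries, "baz", 30)
print(slot.reclen, len(entries))  # 8 5
```

The new entry `baz` needs only 4 bytes but lands in the old 8-byte slot of `foobar`, which is the situation `reclen` exists to describe.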


Typical File System


Access Paths: Reading and Writing

• Suppose we want to open a file, e.g., /foo/bar, read it, then close it.
• Opening a file from disk

open("/foo/bar", O_RDONLY)

• The file system must traverse the pathname and thus locate the desired inode.
1 Read the inode of the root directory which is simply called /.
2 Look inside the inode to find pointers to data blocks, which contain the
contents of the root directory.
3 Find the entry for foo, and the inode number.
4 ···
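
The traversal steps above can be sketched on a toy in-memory "disk". The inode numbers, the dictionary layout, and the name `namei` are hypothetical; the root inode number is assumed to be well known (2 here, as in many UNIX file systems).

```python
ROOT_INUM = 2   # assumption: the root inode number is well known

# Toy disk: inode number -> inode. A directory's data maps names to inumbers.
inodes = {
    2:  {"type": "dir",  "data": {"foo": 12}},
    12: {"type": "dir",  "data": {"bar": 13}},
    13: {"type": "file", "data": b"hello"},
}


def namei(path):
    """Resolve an absolute path; return (inode number, list of reads issued)."""
    inum = ROOT_INUM
    reads = [("inode", inum)]               # step 1: read the root inode
    for component in path.strip("/").split("/"):
        if component == "":
            continue                        # path was just "/"
        inode = inodes[inum]
        if inode["type"] != "dir":
            raise NotADirectoryError(path)
        reads.append(("data", inum))        # step 2: read the directory contents
        inum = inode["data"][component]     # step 3: find the entry (KeyError if absent)
        reads.append(("inode", inum))       # then read the inode it names
    return inum, reads


inum, reads = namei("/foo/bar")
print(inum, len(reads))  # 13 5
```

Opening `/foo/bar` costs five reads in this sketch: two per directory on the path, plus the inode of the file itself.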


Typical File System


Access Paths: Reading and Writing (contd.)

• Reading a file from disk


read()

1 The first read (at offset 0 unless lseek() has been called) will thus read in
the first block of the file, consulting the inode to find the location of such
a block.
2 ···


Typical File System


Access Paths: Reading and Writing (contd.)

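File read timeline for /foo/bar (time increases downward; the data bitmap and inode bitmap are not accessed):

Operation   I/O     Target
open(bar)   read    root inode
            read    root data
            read    foo inode
            read    foo data
            read    bar inode
read()      read    bar inode
            read    bar data[0]
            write   bar inode
read()      read    bar inode
            read    bar data[1]
            write   bar inode
read()      read    bar inode
            read    bar data[2]
            write   bar inode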


Typical File System


Access Paths: Reading and Writing (contd.)

• Writing to disk is a similar process

write()

• Unlike reading, writing to the file may also allocate a block.


• Each write to a file logically generates five I/Os:
• one to read the data bitmap (which is then updated to mark the
newly-allocated block as used),
• one to write the bitmap (to reflect its new state to disk),
• two more to read and then write the inode (which is updated with the new
block’s location), and
• finally one to write the actual block itself.
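
The five logical I/Os listed above can be traced with a toy model. Everything here (the dictionary inode, the list-based bitmap, the name `allocate_and_write`) is a simplification for illustration; a real file system would also cache and batch these operations.

```python
def allocate_and_write(inode, data_bitmap, disk, payload):
    """Append one newly allocated block to a file; return the logical I/Os."""
    trace = []

    trace.append("read data bitmap")
    block = data_bitmap.index(False)    # first free block (ValueError if none)
    data_bitmap[block] = True           # mark it allocated
    trace.append("write data bitmap")

    trace.append("read inode")
    inode["pointers"].append(block)     # record the new block's location
    inode["size"] += len(payload)
    trace.append("write inode")

    disk[block] = payload
    trace.append("write data block")
    return trace


inode = {"size": 0, "pointers": []}
bitmap = [True, True, False, False]     # blocks 0 and 1 are already in use
disk = {}
for step in allocate_and_write(inode, bitmap, disk, b"hello"):
    print(step)
```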


Typical File System


Access Paths: Reading and Writing (contd.)

• Writing to disk
write()
• Considering file creation, the total amount of I/O traffic to do so is quite high:
• one read to the inode bitmap (to find a free inode),
• one write to the inode bitmap (to mark it allocated),
• one write to the new inode itself (to initialize it),
• one write to the data of the directory (to link the high-level name of the
file to its inode number), and
• one read and write to the directory inode to update it.
• If the directory needs to grow to accommodate the new entry, additional
I/Os (i.e., to the data bitmap and the new directory block) will be needed
too.

Typical File System

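File creation timeline for /foo/bar (time increases downward):

Operation          I/O     Target
create(/foo/bar)   read    root inode
                   read    root data
                   read    foo inode
                   read    foo data
                   read    inode bitmap
                   write   inode bitmap
                   write   foo data
                   read    bar inode
                   write   bar inode
                   write   foo inode
write()            read    bar inode
                   read    data bitmap
                   write   data bitmap
                   write   bar data[0]
                   write   bar inode
write()            read    bar inode
                   read    data bitmap
                   write   data bitmap
                   write   bar data[1]
                   write   bar inode
write()            read    bar inode
                   read    data bitmap
                   write   data bitmap
                   write   bar data[2]
                   write   bar inode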

Fast File System

• Performance problems
• Expensive positioning costs, if data was spread all over the place
• The data blocks of a file were often very far away from its inode, thus
inducing an expensive seek whenever one first read the inode and
then the data blocks.
• Fragmented file system, if the free space was not carefully managed.
• The original block size was too small (512 bytes).
• Fast File System (FFS): disk awareness
• Design the file system structures and allocation policies to be “disk aware”
and thus improve performance.
• It keeps the same interface to the FS, but changes the internal
implementation.


Fast File System

• Modern drives do not export enough information for the file system to truly
understand whether a particular cylinder is in use.
• Modern file systems (such as Linux ext2, ext3, and ext4) instead organize the
drive into block groups.
• Whether you call them cylinder groups or block groups, these groups are the
central mechanism that FFS uses to improve performance.
• By placing two files within the same group, FFS can ensure that accessing
one after the other will not result in long seeks across the disk.


Fast File System

• FFS keeps within a single cylinder group all the structures you might expect a
file system to have.

S ib db Inodes Data

• FFS keeps a copy of the super block (S) in each group for reliability
reasons.
• A per-group inode bitmap (ib) and data bitmap (db) to track whether the
inodes and data blocks of the group are allocated.
• The inode and data block regions are just like those in the previous
very-simple file system.


Fast File System

• Policies: how to allocate files and directories


• The basic principle is: keep related stuff together (and its corollary, keep
unrelated stuff far apart).
• Placement heuristics: e.g.,
• For directories
• Place a directory in the cylinder group with a low number of
allocated directories and a high number of free inodes.
• For files
• First, it makes sure (in the general case) to allocate the data blocks
of a file in the same group as its inode.
• Second, it places all files that are in the same directory in the cylinder
group of the directory they are in.
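
The placement heuristics above can be sketched as two small functions. The `Group` class and function names are hypothetical, and the way the two directory criteria are combined (fewest directories first, most free inodes as tie-breaker) is an assumption; the slides only state that both criteria matter.

```python
from dataclasses import dataclass


@dataclass
class Group:
    number: int
    allocated_dirs: int
    free_inodes: int


def pick_group_for_directory(groups):
    """Prefer few allocated directories, then many free inodes."""
    candidates = [g for g in groups if g.free_inodes > 0]
    best = min(candidates, key=lambda g: (g.allocated_dirs, -g.free_inodes))
    return best.number


def pick_group_for_file(parent_dir_group):
    """Files go in the group of their directory; data blocks follow the inode."""
    return parent_dir_group


groups = [Group(0, 3, 50), Group(1, 1, 10), Group(2, 1, 40)]
new_dir_group = pick_group_for_directory(groups)
print(new_dir_group, pick_group_for_file(new_dir_group))  # 2 2
```

Spreading directories balances load across groups, while keeping each directory's files next to it preserves locality for the common case of accessing files in one directory together.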


Fast File System

• Assume three directories (/, /a, and /b), and four files (/a/c, /a/d, /a/e, and
/b/f).
• In general FS
group inodes data
0 /--------- /---------
1 a--------- a---------
2 b--------- b---------
3 c--------- c---------
4 d--------- d---------
5 e--------- e---------
6 f--------- f---------
7 ---------- ----------
···
• In FFS
group inodes data
0 /--------- /---------
1 acde------ accddee---
2 bf-------- bff-------
3 ---------- ----------
···


Fast File System

• The Large-File Exception


• Without a different rule, a large file would entirely fill the block group it is
first placed within (and maybe others).
• Filling a block group in this manner is undesirable, as it prevents
subsequent “related” files from being placed within this block group, and
thus may hurt file-access locality.
• After some number of blocks are allocated into the first block group, FFS
places the next “large” chunk of the file in another block group.
• A tradeoff here, since spreading blocks of a file across the disk will hurt
performance.
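
The large-file exception amounts to mapping each chunk of a file to a different group. A minimal sketch, assuming a fixed chunk size and simple round-robin placement that wraps around at the last group (both assumptions; the function names are illustrative):

```python
def group_for_block(block_index, first_group, chunk_blocks, num_groups):
    """Block group that holds the given file block."""
    chunk = block_index // chunk_blocks          # which chunk the block is in
    return (first_group + chunk) % num_groups    # one group per chunk


def layout(file_blocks, first_group, chunk_blocks, num_groups):
    """Map group number -> list of file block indexes placed there."""
    placement = {}
    for i in range(file_blocks):
        g = group_for_block(i, first_group, chunk_blocks, num_groups)
        placement.setdefault(g, []).append(i)
    return placement


# A 30-block file, 5 blocks per chunk, starting in group 0 of 10 groups.
for group, blocks in layout(30, 0, 5, 10).items():
    print(group, len(blocks))
```

With these numbers the file occupies five blocks in each of groups 0 to 5. Larger chunks amortize the seek between groups over more data transferred, which is how the performance cost of spreading is kept small.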


Fast File System

• If a user creates one big file, /a
• In a general FS, not enough room remains in the group of the root directory (/)
for new files.
group inodes data
0 /a-------- /aaaaaaaaa aaaaaaaaaa aaaaaaaaaa a---------
1 ---------- ---------- ---------- ---------- ----------
···
• In FFS
group inodes data
0 /a-------- /aaaaa---- ---------- ---------- ----------
1 ---------- aaaaa----- ---------- ---------- ----------
2 ---------- aaaaa----- ---------- ---------- ----------
3 ---------- aaaaa----- ---------- ---------- ----------
4 ---------- aaaaa----- ---------- ---------- ----------
5 ---------- aaaaa----- ---------- ---------- ----------
6 ---------- ---------- ---------- ---------- ----------
···


FSCK and Journaling

• Considering consistency
• How to update persistent data structures despite the presence of a power
loss or system crash?
• Crash-consistency problem
• Imagine you have to update two on-disk structures, A and B, in order to
complete a particular operation. Because the disk only services a single
request at a time, one of these requests will reach the disk first (either A or
B). If the system crashes or loses power after one write completes, the
on-disk structure will be left in an inconsistent state.
• How to update the disk despite crashes?
• Two approaches
• A file system checker (fsck)
• Journaling (also known as write-ahead logging)


FSCK and Journaling


A Detailed Example

From

Inode Bmap   Data Bmap   Inodes   Data Blocks
                         I[v1]    Da

To

Inode Bmap   Data Bmap   Inodes   Data Blocks
                         I[v2]    Da Db

Three separate writes to the disk are required: the inode (I[v2]), the data bitmap (B[v2]), and the data block (Db).


FSCK and Journaling


A Detailed Example (contd.)

• Crash Scenarios: Imagine only a single write succeeds.


• Just the data block (Db) is written to disk.
• This case is not a problem at all, from the perspective of file-system
crash consistency.
• Just the updated inode (I[v2]) is written to disk.
• If we trust the inode pointer, we will read garbage data from the disk.
• File-system inconsistency: The on-disk bitmap is telling us that data
block 5 has not been allocated, but the inode is saying that it has.
• Just the updated bitmap (B[v2]) is written to disk.
• File-system inconsistency: The bitmap indicates that block 5 is
allocated, but there is no inode that points to it.
• Space leak: block 5 would never be used by the file system.


FSCK and Journaling


A Detailed Example (contd.)

• Crash Scenarios (contd.): Two writes succeed.


• The inode (I[v2]) and bitmap (B[v2]) are written to disk, but not data
(Db).
• The file system metadata is completely consistent.
• But block 5 has garbage in it.
• The inode (I[v2]) and the data block (Db) are written, but not the bitmap
(B[v2]).
• File-system inconsistency.
• The bitmap (B[v2]) and data block (Db) are written, but not the inode
(I[v2]).
• File-system inconsistency.
• We have no idea which file block 5 belongs to.
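
The crash scenarios can be enumerated mechanically: metadata is consistent exactly when the inode and the bitmap agree. A small sketch that classifies every subset of the three writes (the labels returned by `classify` are informal summaries, not standard terminology):

```python
from itertools import combinations

WRITES = ("I[v2]", "B[v2]", "Db")


def classify(done):
    """Describe the on-disk state if only the writes in `done` reached disk."""
    inode = "I[v2]" in done
    bitmap = "B[v2]" in done
    data = "Db" in done
    if inode != bitmap:
        return "inconsistent metadata"          # inode and bitmap disagree
    if not inode:
        return "consistent, update lost"        # neither metadata write landed
    if not data:
        return "consistent metadata, garbage data"
    return "consistent, update complete"


for k in range(len(WRITES) + 1):
    for done in combinations(WRITES, k):
        print(sorted(done), "->", classify(done))
```

Four of the eight outcomes leave the metadata inconsistent, which is what fsck and journaling are designed to handle.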

FSCK and Journaling

Solution #1: FSCK

• Superblock: Fsck first checks whether the superblock looks reasonable. If it finds a
suspect (corrupt) superblock, the system (or administrator) may decide to use an
alternate copy of the superblock.
• Free blocks: Fsck scans the inodes, indirect blocks, double indirect blocks, etc.,
to build an understanding of which blocks are currently allocated within the file
system. It uses this knowledge to produce a correct version of the allocation
bitmaps; thus, if there is any inconsistency between bitmaps and inodes, it is
resolved by trusting the information within the inodes.
• Inode state: Each inode is checked for corruption or other problems. Suspect
inodes are cleared by fsck.
• Inode links: Fsck scans through the entire directory tree to verify the link count
of each allocated inode.
• Duplicates: Fsck checks for duplicate pointers, i.e., two inodes that refer to the
same block. If one inode is obviously bad, it may be cleared. Alternatively, the
pointed-to block could be copied, thus giving each inode its own copy as desired.
• Bad blocks: A pointer is considered “bad” if it obviously points to something
outside its valid range.
• Directory checks: Fsck performs additional integrity checks on the contents of
each directory, e.g., making sure that each inode referred to in a directory entry
is allocated and no directory is linked to more than once.
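
The free-block, duplicate, and bad-block checks can be combined in one pass over the inodes. This is a simplified sketch, not how any real fsck is implemented: it assumes each inode's pointer list has already been flattened (including blocks reached through indirect blocks), and the name `rebuild_data_bitmap` is hypothetical.

```python
def rebuild_data_bitmap(inodes, num_blocks):
    """Rebuild the data bitmap by trusting the inodes.

    inodes maps inode number -> list of block pointers (already flattened).
    Returns (bitmap, problems).
    """
    bitmap = [False] * num_blocks
    owner = {}        # block -> first inode seen pointing at it
    problems = []
    for inum, pointers in inodes.items():
        for block in pointers:
            if not 0 <= block < num_blocks:
                problems.append(("bad pointer", inum, block))   # out of range
                continue
            if block in owner:
                problems.append(("duplicate", inum, block))     # shared block
            owner.setdefault(block, inum)
            bitmap[block] = True
    return bitmap, problems


bitmap, problems = rebuild_data_bitmap({1: [4, 5], 2: [5, 99]}, num_blocks=8)
print(bitmap)
print(problems)  # [('duplicate', 2, 5), ('bad pointer', 2, 99)]
```

Because every inode and pointer must be visited, the running time grows with the size of the file system, which is the main weakness of the fsck approach.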

FSCK and Journaling

Solution #2: Journaling

• Fsck is too slow: it lets inconsistencies happen and then finds and fixes them
later, when rebooting.
• An alternative solution is journaling (a.k.a. write-ahead logging).
• When updating the disk, before overwriting the structures in place, first
write down a little note (somewhere else on the disk, in a well-known
location) describing what you are about to do.
• If a crash takes place, the note tells exactly what to fix (and how to fix it),
instead of having to scan the entire disk.
• Linux ext2 without journaling

Super Group 0 Group 1 ··· Group N

• Linux ext3 with journaling

Super Journal Group 0 Group 1 ··· Group N

FSCK and Journaling

Solution #2: Journaling (contd.)

• Recall our update example, where we wish to write the inode (I[v2]), bitmap
(B[v2]), and data block (Db) to disk.
• Before writing them to their final disk locations, we first write them to the
log (a.k.a. journal).
Journal

TxB I[v2] B[v2] Db TxE

• The transaction begin (TxB) tells us about this update, and contains a
transaction identifier (TID).
• The middle three blocks just contain the exact contents of the blocks
themselves. (physical logging vs. logical logging).
• The final block (TxE) is a marker of the end of this transaction, and also
contains the TID.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Once this transaction is safely on disk, we are ready to overwrite the old
structures in the file system; this process is called checkpointing.
• To checkpoint the file system (i.e., bring it up to date with the pending update
in the journal), we issue the writes I[v2], B[v2], and Db to their disk locations.
• If these writes complete successfully, we have successfully checkpointed the file
system and are basically done.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Protocol to update the file system


1 Journal write: Write the transaction, including a transaction-begin block,
all pending data and metadata updates, and a transaction-end block, to
the log; wait for these writes to complete.
2 Checkpoint: Write the pending metadata and data updates to their final
locations in the file system.

FSCK and Journaling

Solution #2: Journaling (contd.)

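• What if a crash occurs during the writes to the journal?
• Issue one write at a time, waiting for each to complete before issuing
the next. But this is slow.
• Issue all five block writes at once, as this would turn five writes into a
single sequential write and thus be faster. However, this is unsafe: the disk may
complete the writes in any order, so after a crash the log can hold a transaction
that looks complete but contains a garbage block (??).

Journal:  TxB (id=1) | I[v2] | B[v2] | ?? | TxE (id=1)

• Thus the file system issues the transactional write in two steps: first
everything except TxE, then TxE once those writes have completed.

Journal:  TxB (id=1) | I[v2] | B[v2] | Db | TxE (id=1)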

FSCK and Journaling

Solution #2: Journaling (contd.)

• Protocol to update the file system


1 Journal write: Write the contents of the transaction (including TxB,
metadata, and data) to the log; wait for these writes to complete.
2 Journal commit: Write the transaction commit block (containing TxE) to
the log; wait for write to complete; transaction is said to be committed.
3 Checkpoint: Write the pending metadata and data updates to their final
locations in the file system.
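
The ordering in steps 1 and 2 can be illustrated from user space with `os.fsync`, which asks the operating system to flush a file's data to the storage device before returning. This is only an analogy: an ordinary temporary file stands in for the journal, the textual record format is invented for the example, and whether the device itself honors the flush depends on the hardware.

```python
import os
import tempfile


def journal_transaction(fd, tid, blocks):
    """Write one transaction to the log file in two ordered steps."""
    # Step 1 (journal write): TxB plus the contents of the logged blocks.
    os.write(fd, f"TxB {tid}\n".encode())
    for contents in blocks:
        os.write(fd, contents + b"\n")
    os.fsync(fd)    # wait for step 1 before issuing the commit

    # Step 2 (journal commit): TxE is written only after step 1 completed.
    os.write(fd, f"TxE {tid}\n".encode())
    os.fsync(fd)    # once this returns, the transaction counts as committed
    # Step 3 (checkpoint) would now write the blocks to their final locations.


fd, path = tempfile.mkstemp()
try:
    journal_transaction(fd, 1, [b"I[v2]", b"B[v2]", b"Db"])
finally:
    os.close(fd)
with open(path, "rb") as f:
    log = f.read().splitlines()
os.remove(path)
print(log)
```

Because TxE is issued only after the first flush returns, a crash can never leave a TxE in the log without the blocks that precede it.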

FSCK and Journaling

Solution #2: Journaling (contd.)

• A crash may happen at any time during this sequence of updates.


• If the crash happens before the transaction is written safely to the log (i.e.,
before Step 2 above completes), then our job is easy: the pending update
is simply skipped.
• If the crash happens after the transaction has committed to the log, but
before the checkpoint is complete, the file system can recover the update.
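
Recovery can be sketched as a single scan of the log that replays only committed transactions. The record format (tuples tagged `"TxB"`, `"block"`, `"TxE"`) and the dictionary standing in for the disk are assumptions made for illustration.

```python
def recover(journal, disk):
    """Replay committed transactions in log order; skip incomplete ones."""
    pending = None      # (tid, [(final address, contents), ...])
    replayed = []
    for record in journal:
        kind = record[0]
        if kind == "TxB":
            pending = (record[1], [])
        elif kind == "block" and pending is not None:
            pending[1].append((record[1], record[2]))
        elif kind == "TxE" and pending is not None and record[1] == pending[0]:
            for addr, contents in pending[1]:
                disk[addr] = contents      # redo the write at its final location
            replayed.append(pending[0])
            pending = None
    return replayed                        # a transaction without TxE is dropped


journal = [
    ("TxB", 1), ("block", 10, "I[v2]"), ("block", 11, "B[v2]"),
    ("block", 5, "Db"), ("TxE", 1),
    ("TxB", 2), ("block", 10, "I[v3]"),    # crash before TxE 2 was written
]
disk = {}
print(recover(journal, disk), disk)
```

Transaction 1 is replayed even if some of its blocks had already been checkpointed, since rewriting the same contents is harmless; transaction 2 has no TxE and is skipped.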

FSCK and Journaling

Solution #2: Journaling (contd.)

• Batching Log Updates


• Now suppose we create multiple files in a row in the same directory.
• To create one file, one has to update a number of on-disk structures,
minimally including:
• the inode bitmap,
• the newly-created inode of the file,
• the data block of the parent directory containing the new directory
entry,
• the parent directory inode,
• etc.
• We are writing these same blocks over and over.
• Solution: buffer all updates into a global transaction.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Making The Log Finite


• The log is of a finite size. If we keep adding transactions to it (as in this
figure), it will soon fill.
Journal

Tx1 Tx2 Tx3 Tx4 Tx5

• Use a journal superblock to record enough information to know which
transactions have not yet been checkpointed; this reduces recovery
time and enables re-use of the log in a circular fashion.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Protocol to update the file system


1 Journal write: Write the contents of the transaction (including TxB,
metadata, and data) to the log; wait for these writes to complete.
2 Journal commit: Write the transaction commit block (containing TxE) to
the log; wait for write to complete; transaction is said to be committed.
3 Checkpoint: Write the pending metadata and data updates to their final
locations in the file system.
4 Free: Some time later, mark the transaction free in the journal by updating
the journal superblock.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Journaling Mode: Data / Ordered / Unordered.


• Data journaling (as in Linux ext3) requires writing data twice to the disk.
Journal

TxB I[v2] B[v2] Db TxE

• Metadata journaling does not write data to the journal.


Journal

TxB I[v2] B[v2] TxE

• When should we write Db?


• Ordered / Unordered

FSCK and Journaling

Solution #2: Journaling (contd.)

• Protocol to update the file system


1 Data write: Write data to final location; wait for completion (the wait is
optional).
2 Journal metadata write: Write the begin block and metadata to the log;
wait for writes to complete.
3 Journal commit: Write the transaction commit block (containing TxE) to
the log; wait for write to complete; transaction is said to be committed.
4 Checkpoint metadata: Write the pending metadata updates to their final
locations in the file system (the data was already written in step 1).
5 Free: Some time later, mark the transaction free in the journal by updating
the journal superblock.

FSCK and Journaling

Solution #2: Journaling (contd.)

• Block Reuse
• User adds an entry to the directory foo, assume the location of the foo
directory data is block 1000.
• User deletes everything in the directory as well as the directory itself,
freeing up block 1000 for reuse.
• User creates a new file foobar , which ends up reusing the same block 1000.
Journal:  TxB (id=1) | I[foo] (ptr:1000) | D[foo] [final addr:1000] | TxE (id=1) | TxB (id=2) | I[foobar] (ptr:1000) | TxE (id=2)

• Now assume a crash occurs and all of this information is still in the log.
What will happen?

FSCK and Journaling

Solution #2: Journaling (contd.)

• Solutions:
• Never reuse blocks until the delete of said blocks is checkpointed out of
the journal
• Linux ext3: add a new type of record (revoke) between two transactions.
Such revoked data is never replayed.

FSCK and Journaling

Solution #3: Copy-on-Write

• Other approaches include copy-on-write (COW), here in the storage sense, not the
memory-management sense.
• A COW file system never overwrites files or directories in place; rather, it places
new updates in previously unused locations on disk.
• Doing so makes keeping the file system consistent straightforward.
• We will discuss the log-structured file system (LFS), which is an early
example of a COW file system.
