OSC_CH12_File_System_Implementation

This document discusses file system implementation concepts, including the structure and organization of typical file systems such as VSFS and NTFS. It covers topics like inode management, allocation methods, and the use of block pointers for file data storage. The document aims to provide insights into local and remote file system implementations, as well as block allocation algorithms and their trade-offs.


0.Prologue 1.VSFS 2.FFS 3.FSCK & Journaling

OPERATING SYSTEM CONCEPTS

Chapter 12. File-System Implementation

A/Prof. Kai Dong

[email protected]
School of Computer Science and Engineering,
Southeast University

December 16, 2024

A/Prof. Kai Dong Operating System Concepts Chapter 12. File-System Implementation 1 / 55

Contents

1 Warm-up

2 Typical File System

3 Fast File System

4 FSCK and Journaling


Warm-up
File System Measurement Summary

• Most files are small: roughly 2K is the most common size.
• Average file size is growing: almost 200K is the average.
• Most bytes are stored in large files: a few big files use most of the space.
• File systems contain lots of files: almost 100K on average.
• File systems are roughly half full: even as disks grow, file systems remain ∼50% full.
• Directories are typically small: many have few entries; most have 20 or fewer.

Size of A File

• Why do these files have these sizes, and why do they occupy this much space on disk?

Filename  Content                     Description          Size       Space
test1     -                           -                    0          0
test2     “This is a test file.\r\n”  × 1 (line)           22 bytes   0
test3     “This is a test file.\r\n”  × 40 (lines)         878 bytes  4 KB
test4     “This is a test file.\r\n”  × (40 − 39) (lines)  22 bytes   4 KB


Objectives

• To describe the details of implementing local file systems and directory structures.
• To describe the implementation of remote file systems.
• To discuss block allocation and free-block algorithms and trade-offs.


Typical File System

Very Simple File System

• We now discuss a simple file system implementation, known as VSFS (the Very Simple File System).
  • It is a simplified version of a typical UNIX file system.
• You should understand:
  • Data structures: What types of on-disk structures does the file system use to organize its data and metadata?
  • Access methods: How does it map the calls made by a process onto its structures?


Overall Organization

• Divide the disk into blocks (with a commonly used block size of 4 KB).
• Assume a really small disk, with just 64 blocks, numbered 0 to 63.
• Reserve a fixed portion of the disk for the data region.
  • Say the last 56 of the 64 blocks on the disk (D = data block, - = not yet assigned):

           Data Region
  -------- DDDDDDDD DDDDDDDD DDDDDDDD
  0      7 8     15 16    23 24    31

  Data Region
  DDDDDDDD DDDDDDDD DDDDDDDD DDDDDDDD
  32    39 40    47 48    55 56    63


Overall Organization (contd.)

• To track information about each file, the file system stores inodes in an on-disk inode table.
  • Assume 5 of the 64 blocks are used for inodes (I = inode block, D = data block, - = not yet assigned):

  Inodes   Data Region
  ---IIIII DDDDDDDD DDDDDDDD DDDDDDDD
  0      7 8     15 16    23 24    31

  Data Region
  DDDDDDDD DDDDDDDD DDDDDDDD DDDDDDDD
  32    39 40    47 48    55 56    63

• Assuming 256 bytes per inode, our file system contains ? total inodes.
  • This number represents the maximum number of files we can have in our file system.
• How could the file system know which inode / data block is free?
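The inode-count question above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers from the slides (4 KB blocks, 5 inode blocks, 256-byte inodes); the variable names are illustrative.

```python
# Numbers taken from the slides; names are illustrative only.
BLOCK_SIZE = 4 * 1024   # bytes per block
INODE_SIZE = 256        # bytes per inode
INODE_BLOCKS = 5        # blocks reserved for the inode table

# How many inodes fit in one block, and in the whole inode table.
inodes_per_block = BLOCK_SIZE // INODE_SIZE
total_inodes = INODE_BLOCKS * inodes_per_block

print(f"{inodes_per_block} inodes per block, {total_inodes} inodes in total")
```

The total is also the upper bound on the number of files this small file system can hold.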


Overall Organization (contd.)

• To track whether inodes or data blocks are free or allocated, an inode bitmap (i) and a data bitmap (d) are required.

  S i d    Inodes   Data Region
  SidIIIII DDDDDDDD DDDDDDDD DDDDDDDD
  0      7 8     15 16    23 24    31

  Data Region
  DDDDDDDD DDDDDDDD DDDDDDDD DDDDDDDD
  32    39 40    47 48    55 56    63

  • Other allocation-tracking structures are possible, such as a free list.
• The remaining block is the superblock (S).
  • It contains information about this particular file system, including, for example, how many inodes and data blocks are in the file system, where the inode table begins, and so forth.
  • When mounting a file system, the operating system reads the superblock first.
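As an illustration of how a bitmap tracks allocation, here is a minimal in-memory sketch. The `Bitmap` class and its first-fit search are assumptions made for this example; a real file system packs the bits into an on-disk block and may use a smarter search policy.

```python
class Bitmap:
    """Hypothetical in-memory allocation bitmap: one flag per inode or data block."""

    def __init__(self, nbits):
        self.bits = [False] * nbits  # False = free, True = allocated

    def allocate(self):
        """Mark the first free entry as allocated and return its index (None if full)."""
        for index, used in enumerate(self.bits):
            if not used:
                self.bits[index] = True
                return index
        return None

    def free(self, index):
        """Mark an entry as free again."""
        self.bits[index] = False


# Sizes follow the slides: 80 inodes and 56 data blocks.
inode_bitmap = Bitmap(80)
data_bitmap = Bitmap(56)
```

Calling `data_bitmap.allocate()` twice on a fresh bitmap hands out entries 0 and 1; freeing entry 0 makes it the next one to be reused.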


File Organization: the Inode

• An inode in a UNIX FS is a File Control Block (FCB).
  • The name inode is short for index node.
  • Each inode is implicitly referred to by a number, called the inumber.

  Region     Disk offset     Inodes held
  Super      0KB - 4KB       -
  i-bmap     4KB - 8KB       -
  d-bmap     8KB - 12KB      -
  iblock 0   12KB - 16KB     0 - 15
  iblock 1   16KB - 20KB     16 - 31
  iblock 2   20KB - 24KB     32 - 47
  iblock 3   24KB - 28KB     48 - 63
  iblock 4   28KB - 32KB     64 - 79

• How to read a given inode number X?
  • Calculate the offset into the inode region (X · sizeof(inode)) and add it to the start address of the inode table on disk.
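The offset calculation can be sketched as follows. The block size, inode size, and 12 KB start address come from the slides; the 512-byte sector size and the function name `locate_inode` are assumptions for illustration.

```python
BLOCK_SIZE = 4096            # bytes per block (from the slides)
INODE_SIZE = 256             # bytes per inode (from the slides)
INODE_START_ADDR = 12 * 1024 # inode table starts at 12KB (from the layout above)
SECTOR_SIZE = 512            # assumption: the disk is addressed in 512-byte sectors


def locate_inode(inumber):
    """Return (byte address, inode-table block index, disk sector) for an inumber."""
    offset = inumber * INODE_SIZE       # offset into the inode region
    address = INODE_START_ADDR + offset # absolute byte address on disk
    iblock = offset // BLOCK_SIZE       # which inode block holds the inode
    sector = address // SECTOR_SIZE     # sector the disk must actually read
    return address, iblock, sector


address, iblock, sector = locate_inode(32)
print(f"inode 32: address {address // 1024}KB, iblock {iblock}, sector {sector}")
```

For inode 32 the offset is 32 · 256 = 8KB, so the inode sits at 20KB, the start of iblock 2.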


File Organization: the Inode (contd.)

Simplified Ext2 Inode

Size  Name         What is this inode field for?
2     mode         can this file be read/written/executed?
2     uid          who owns this file?
4     size         how many bytes are in this file?
4     time         what time was this file last accessed?
4     ctime        what time was this file created?
4     mtime        what time was this file last modified?
4     dtime        what time was this inode deleted?
2     gid          which group does this file belong to?
2     links_count  how many hard links are there to this file?
4     blocks       how many blocks have been allocated to this file?
4     flags        how should ext2 use this inode?
4     osd1         an OS-dependent field
60    block        a set of disk pointers (15 total)
4     generation   file version (used by NFS)
4     file_acl     a new permissions model beyond mode bits,
4     dir_acl      called access control lists


File Organization: the Inode (contd.)

• How does an inode refer to where the data blocks are?
• Suppose a simple approach:
  • Keep one or more direct pointers (disk addresses) inside the inode; each pointer refers to one disk block that belongs to the file.
• Such an approach is limited:
  • It cannot describe a file that is really big (i.e., bigger than the size of a block multiplied by the number of direct pointers).
• Solution: make use of indirect pointers.
  • Instead of pointing to a block that contains user data, an indirect pointer points to a block that contains more pointers, each of which points to user data.
  • What is the maximum file size with 12 direct pointers and 1 indirect pointer?


The Multi-Level Index

• To support even larger files, add a
  • double indirect pointer
    • What is the maximum file size with 12 direct pointers, 1 indirect pointer, and 1 double indirect pointer?
  • triple indirect pointer
    • What is the maximum file size with 12 direct pointers, 1 indirect pointer, 1 double indirect pointer, and 1 triple indirect pointer?
• Or use extents instead of pointers (ext4).
  • An extent is simply a disk pointer plus a length (in blocks) that specifies the on-disk location of a file.
  • Extent-based file systems often allow for more than one extent.
  • Less flexible but more compact.
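The maximum-size questions above can be worked through with a short sketch. It assumes 4 KB blocks (the block size used earlier in these slides) and 4-byte block pointers (consistent with 15 pointers in 60 bytes in the ext2 inode table); the function name and parameters are illustrative.

```python
def max_file_size(levels, block_size=4096, ptr_size=4, direct=12):
    """Maximum file size in bytes with `direct` direct pointers plus one pointer
    for each indirection level up to `levels` (1 = single, 2 = double, 3 = triple).

    Assumes 4 KB blocks and 4-byte pointers unless overridden.
    """
    ptrs_per_block = block_size // ptr_size
    # A level-k indirect pointer reaches ptrs_per_block ** k data blocks.
    data_blocks = direct + sum(ptrs_per_block ** k for k in range(1, levels + 1))
    return data_blocks * block_size


for levels, label in [(1, "single"), (2, "double"), (3, "triple")]:
    print(f"up to {label} indirect: {max_file_size(levels):,} bytes")
```

With these assumptions a block holds 1024 pointers, so the single-indirect case addresses (12 + 1024) blocks, just over 4 MB; each extra level multiplies the reachable blocks by roughly 1024.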


FAT (File-Allocation Table)

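Directory entry:

  name   ···   start block
  test   ···   217

FAT (indexed by block number; each entry holds the number of the file's next block):

  FAT index   Next block
  0           ···
  217         618
  339         -1 (end of file)
  618         339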
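Following the chain in the FAT can be sketched as below. The table contents mirror the figure (217 → 618 → 339 → end of file); representing the FAT as a Python dictionary and using -1 as the end marker are simplifications for illustration.

```python
END_OF_FILE = -1  # end-of-chain marker, as in the figure


def file_blocks(fat, start_block):
    """Follow the FAT chain from a file's start block and list its blocks in order."""
    blocks = []
    block = start_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]  # the FAT entry for a block names the next block
    return blocks


# FAT entries from the figure; the directory entry for "test" starts at block 217.
fat = {217: 618, 618: 339, 339: END_OF_FILE}
print(file_blocks(fat, 217))
```

Reaching the n-th block of a file requires walking n entries of the table, which is why linked allocation is poor at random access unless the FAT is cached in memory.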


NTFS

• NTFS (New Technology File System) with MFT (Master File Table)
• A record in the MFT (each is 1KB in size):

  Record head   ASCII "FILE" + ···
  Attr. 1       Head: 0x10   Body: Standard Information
  Attr. 2       Head: 0x30   Body: File Name
  ···
  Attr. 8       Head: 0x80   Body: Data
  ···
  End marker    FF FF FF FF

  • The body of an attribute contains the attribute data, or a pointer to an extent.


Allocation Method

• Contiguous allocation
  • Each file occupies a set of contiguous blocks on the disk.
  • Examples: ext4, NTFS
• Linked allocation
  • Each file is a linked list of disk blocks.
  • Example: FAT
• Indexed allocation
  • Brings all pointers together into the index block.
  • Examples: ext2, ext3


In Class Exercise

• Imagine a file system which uses inodes to manage files on disk. Each inode consists of a file name (4 bytes), user id (2 bytes), three timestamps (4 bytes each), protection bits (2 bytes), a reference count (2 bytes), a file type (2 bytes), and the file size (4 bytes). Additionally, the inode contains 13 direct indices, 1 index to a single indirect block, 1 index to a double indirect block, and 1 index to a triple indirect block. Each of these indices (block pointers) is 4 bytes. The file system also stores the first 356 bytes of each file in the inode.
  1. Three major methods of allocating disk space are introduced in our textbook. What are these three allocation methods? Which one is used in the file system described above?
  2. Assume a disk sector is 512 bytes and that each indirect block fills a single sector. What is the maximum file size for this file system? Show your work clearly. You need not do the arithmetic to get full credit.
  3. Is there any benefit to including the first 356 bytes of the file in the inode? If so, what is the reason? If not, why not?


Key

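  1. Contiguous, linked, and indexed allocation. This file system uses indexed allocation.
  2. (512/4)^3 × 512 + (512/4)^2 × 512 + (512/4)^1 × 512 + 13 × 512 + 356 bytes
  3. Yes. It improves efficiency in both time and space. Most files are small, and for small files (≤ 356 bytes) the data is read together with the inode, so the disk does not need to be accessed twice. It also saves disk space, because a tiny file does not need a mostly empty data block (internal fragmentation within blocks).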
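For readers who want the number, the expression in the key can be evaluated directly. This sketch assumes, as the key does, that data blocks are one 512-byte sector each; the variable names are illustrative.

```python
SECTOR = 512         # bytes per sector; each block fills one sector
POINTER = 4          # bytes per block pointer
INLINE_BYTES = 356   # file bytes stored directly in the inode
DIRECT = 13          # direct pointers in the inode

ptrs = SECTOR // POINTER  # pointers held by one indirect block

max_size = (
    ptrs ** 3 * SECTOR    # reachable through the triple indirect block
    + ptrs ** 2 * SECTOR  # reachable through the double indirect block
    + ptrs * SECTOR       # reachable through the single indirect block
    + DIRECT * SECTOR     # direct pointers
    + INLINE_BYTES        # data stored inside the inode
)
print(f"{ptrs} pointers per indirect block, maximum file size {max_size:,} bytes")
```

The triple indirect term dominates: it alone contributes exactly 1 GiB, so the maximum file size is slightly over 1 GiB.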


Directory Organization

• A directory basically contains a list of (entry name, inode number) pairs.

  inum  reclen  strlen  name
  5     4       2       .
  2     4       3       ..
  12    4       4       foo
  13    4       4       bar
  24    8       7       foobar

  • Deleting a file (e.g., calling unlink()) can leave an empty space in the middle of the directory, and hence there should be some way to mark that as well (e.g., with a reserved inode number such as zero).
  • Such a delete is one reason the record length (reclen) is used: a new entry may reuse an old, bigger entry and thus have extra space within.
  • A directory has an inode, somewhere in the inode table (with the type field of the inode marked as “directory” instead of “regular file”).


Access Paths: Reading and Writing

• Suppose we want to open a file, e.g., /foo/bar, read it, then close it.
• Opening a file from disk

  open("/foo/bar", O_RDONLY)

• The file system must traverse the pathname and thus locate the desired inode.
  1. Read the inode of the root directory, which is simply called /.
  2. Look inside the inode to find pointers to data blocks, which contain the contents of the root directory.
  3. Find the entry for foo, and its inode number.
  4. ···
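The traversal can be sketched with a toy in-memory inode table. Everything here is hypothetical scaffolding for illustration: the dictionary layout, the inumbers 12 and 13, and the choice of 2 as the root inumber (a common UNIX convention, assumed rather than stated in the slides). The function records which structures it would have to read from disk.

```python
ROOT_INUMBER = 2  # assumption: the root directory has a well-known inumber

# Hypothetical stand-in for the on-disk inode table and directory data blocks.
inodes = {
    2: {"type": "dir", "entries": {"foo": 12}},
    12: {"type": "dir", "entries": {"bar": 13}},
    13: {"type": "file", "data": b"hello"},
}


def lookup(path):
    """Resolve an absolute path to an inumber, logging each simulated disk read."""
    inumber = ROOT_INUMBER
    reads = [f"inode {inumber}"]  # start by reading the root inode
    for name in (part for part in path.split("/") if part):
        inode = inodes[inumber]
        if inode["type"] != "dir":
            raise NotADirectoryError(path)
        reads.append(f"data of inode {inumber}")  # directory contents
        if name not in inode["entries"]:
            raise FileNotFoundError(path)
        inumber = inode["entries"][name]
        reads.append(f"inode {inumber}")  # inode of the next path component
    return inumber, reads


inumber, reads = lookup("/foo/bar")
print(inumber, reads)
```

Opening /foo/bar costs five reads in this sketch (root inode, root data, foo inode, foo data, bar inode); the cost grows with the depth of the path.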


Access Paths: Reading and Writing (contd.)

• Reading a file from disk

read()

1 The first read (at offset 0 unless lseek() has been called) will thus read in
the first block of the file, consulting the inode to find the location of such
a block.
2 ···


Access Paths: Reading and Writing (contd.)

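File read timeline (time increases downward). The on-disk structures involved are the data bitmap, the inode bitmap, the root, foo, and bar inodes, the root and foo directory data, and bar data[0], data[1], data[2]. Neither bitmap is touched when reading.

open(bar)
  1. read root inode
  2. read root data
  3. read foo inode
  4. read foo data
  5. read bar inode
read()
  1. read bar inode
  2. read bar data[0]
  3. write bar inode (e.g., to update the last-accessed time)
read()
  1. read bar inode
  2. read bar data[1]
  3. write bar inode
read()
  1. read bar inode
  2. read bar data[2]
  3. write bar inode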


Access Paths: Reading and Writing (contd.)

• Writing to disk is a similar process

write()

• Unlike reading, writing to the file may also allocate a block.

• Each write to a file logically generates five I/Os:
• one to read the data bitmap (which is then updated to mark the
newly-allocated block as used),
• one to write the bitmap (to reflect its new state to disk),
• two more to read and then write the inode (which is updated with the new
block’s location), and
• finally one to write the actual block itself.


Access Paths: Reading and Writing (contd.)

• Writing to disk
write()
• Considering file creation, the total amount of I/O traffic to do so is quite high:
• one read to the inode bitmap (to find a free inode),
• one write to the inode bitmap (to mark it allocated),
• one write to the new inode itself (to initialize it),
• one write to the data of the directory (to link the high-level name of the
file to its inode number), and
• one read and one write to the directory inode (to update it).
• If the directory needs to grow to accommodate the new entry, additional I/Os (i.e., to the data bitmap and the new directory block) will be needed too.


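File creation timeline (time increases downward). The on-disk structures involved are the data bitmap, the inode bitmap, the root, foo, and bar inodes, the root and foo directory data, and bar data[0], data[1], data[2].

create(/foo/bar)
  1. read root inode
  2. read root data
  3. read foo inode
  4. read foo data
  5. read inode bitmap
  6. write inode bitmap
  7. write foo data
  8. read bar inode
  9. write bar inode
  10. write foo inode
write() (issued three times, once each for bar data[0], data[1], and data[2])
  1. read bar inode
  2. read data bitmap
  3. write data bitmap
  4. write bar data[k]
  5. write bar inode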

Fast File System

• Performance problems of the old UNIX file system
  • Expensive positioning costs, because data was spread all over the place.
    • The data blocks of a file were often very far away from its inode, thus inducing an expensive seek whenever one first read the inode and then the data blocks.
  • The file system became fragmented, because free space was not carefully managed.
  • The original block size was too small (512 bytes).
• Fast File System (FFS): disk awareness
  • Design the file system structures and allocation policies to be “disk aware” and thus improve performance.
  • It keeps the same interface to the FS, but changes the internal implementation.


• FFS divides the disk into a number of cylinder groups.
• Modern drives do not export enough information for the file system to truly understand whether a particular cylinder is in use.
• Modern file systems (such as Linux ext2, ext3, and ext4) instead organize the drive into block groups.
• Whether you call them cylinder groups or block groups, these groups are the
central mechanism that FFS uses to improve performance.
• By placing two files within the same group, FFS can ensure that accessing
one after the other will not result in long seeks across the disk.


• FFS keeps within a single cylinder group all the structures you might expect a
file system to have.

S ib db Inodes Data

• FFS keeps a copy of the super block (S) in each group for reliability
reasons.
• A per-group inode bitmap (ib) and data bitmap (db) to track whether the
inodes and data blocks of the group are allocated.
• The inode and data block regions are just like those in the previous
very-simple file system.


• Policies: how to allocate files and directories

• The basic principle is: keep related stuff together (and its corollary, keep
unrelated stuff far apart).
• Placement heuristics: e.g.,
• For directories
• Place a directory in the cylinder group with a low number of
allocated directories and a high number of free inodes.
• For files
• First, it makes sure (in the general case) to allocate the data blocks
of a file in the same group as its inode.
• Second, it places all files that are in the same directory in the cylinder
group of the directory they are in.
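The directory placement heuristic can be sketched as a simple selection over per-group counters. How the two criteria are combined is not specified in the slides; this sketch assumes a lexicographic rule (fewest allocated directories first, then most free inodes), and the field names are hypothetical.

```python
def pick_group_for_directory(groups):
    """Return the index of the group with the fewest allocated directories,
    breaking ties by the largest number of free inodes.

    The tie-breaking order is an assumption made for this sketch.
    """
    return min(
        range(len(groups)),
        key=lambda g: (groups[g]["dirs"], -groups[g]["free_inodes"]),
    )


# Hypothetical per-group statistics.
groups = [
    {"dirs": 3, "free_inodes": 10},
    {"dirs": 1, "free_inodes": 5},
    {"dirs": 1, "free_inodes": 9},
]
print(pick_group_for_directory(groups))
```

Files would then follow their parent: a new file's inode and data blocks go to the group that already holds its directory, as stated in the file heuristics above.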


• Assume three directories (/, /a, and /b), and four files (/a/c, /a/d, /a/e, and
/b/f).
• In a general FS (one that simply spreads inodes across groups)
group inodes data
0 /--------- /---------
1 a--------- a---------
2 b--------- b---------
3 c--------- c---------
4 d--------- d---------
5 e--------- e---------
6 f--------- f---------
7 ---------- ----------
···
• In FFS
group inodes data
0 /--------- /---------
1 acde------ accddee---
2 bf-------- bff-------
3 ---------- ----------
···


• The Large-File Exception

• Without a different rule, a large file would entirely fill the block group it is
first placed within (and maybe others).
• Filling a block group in this manner is undesirable, as it prevents
subsequent “related” files from being placed within this block group, and
thus may hurt file-access locality.
• After some number of blocks are allocated into the first block group, FFS
places the next “large” chunk of the file in another block group.
• A tradeoff here, since spreading blocks of a file across the disk will hurt
performance.


• Suppose a user creates one big file, /a.

• In a general FS: not enough room is left for new files in the group that holds the root directory (/).
group inodes data
0 /a-------- /aaaaaaaaa aaaaaaaaaa aaaaaaaaaa a---------
1 ---------- ---------- ---------- ---------- ----------
···
• In FFS
group inodes data
0 /a-------- /aaaaa---- ---------- ---------- ----------
1 ---------- aaaaa----- ---------- ---------- ----------
2 ---------- aaaaa----- ---------- ---------- ----------
3 ---------- aaaaa----- ---------- ---------- ----------
4 ---------- aaaaa----- ---------- ---------- ----------
5 ---------- aaaaa----- ---------- ---------- ----------
6 ---------- ---------- ---------- ---------- ----------
···
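The chunked placement shown in the FFS layout above can be sketched as a mapping from a file's block index to a group. The rule used here (move to the next group after every fixed-size chunk) is a simplification chosen to reproduce the figure; the function name and parameters are hypothetical.

```python
from collections import Counter


def group_for_block(block_index, first_group, chunk_blocks, num_groups):
    """Group that holds the given block of a large file, assuming the file moves
    to the next group after every `chunk_blocks` blocks (a simplified rule)."""
    return (first_group + block_index // chunk_blocks) % num_groups


# The figure's example: file /a has 30 blocks, placed 5 per group starting at group 0.
placement = Counter(
    group_for_block(i, first_group=0, chunk_blocks=5, num_groups=8) for i in range(30)
)
print(sorted(placement.items()))
```

Choosing the chunk size is the trade-off noted above: larger chunks amortize the seek between groups over more sequential transfer, while smaller chunks leave more room in each group for related files.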


FSCK and Journaling

• Considering consistency
• How to update persistent data structures despite the presence of a power
loss or system crash?
• Crash-consistency problem
• Imagine you have to update two on-disk structures, A and B, in order to
complete a particular operation. Because the disk only services a single
request at a time, one of these requests will reach the disk first (either A or
B). If the system crashes or loses power after one write completes, the
on-disk structure will be left in an inconsistent state.
• How to update the disk despite crashes?
• Two approaches
• A file system checker (fsck)
• Journaling (also known as write-ahead logging)


A Detailed Example

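From:

  Inode Bmap | Data Bmap | Inodes | Data Blocks
                           I[v1]    Da

To:

  Inode Bmap | Data Bmap | Inodes | Data Blocks
                           I[v2]    Da Db

Three separate writes to the disk are required: the updated inode (I[v2]), the updated bitmap (B[v2]), and the new data block (Db).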


A Detailed Example (contd.)

• Crash Scenarios: Imagine only a single write succeeds.

• Just the data block (Db) is written to disk.
• This case is not a problem at all, from the perspective of file-system
crash consistency.
• Just the updated inode (I[v2]) is written to disk.
• If we trust the inode pointer, we will read garbage data from the disk.
• File-system inconsistency: The on-disk bitmap is telling us that data
block 5 has not been allocated, but the inode is saying that it has.
• Just the updated bitmap (B[v2]) is written to disk.
• File-system inconsistency: The bitmap indicates that block 5 is
allocated, but there is no inode that points to it.
• Space leak: block 5 would never be used by the file system.


A Detailed Example (contd.)

• Crash Scenarios (contd.): Two writes succeed.

• The inode (I[v2]) and bitmap (B[v2]) are written to disk, but not data
(Db).
• The file system metadata is completely consistent.
• But block 5 has garbage in it.
• The inode (I[v2]) and the data block (Db) are written, but not the bitmap
(B[v2]).
• File-system inconsistency.
• The bitmap (B[v2]) and data block (Db) are written, but not the inode
(I[v2]).
• File-system inconsistency.
• We have no idea which file block 5 belongs to.

Solution #1: FSCK

• Superblock: If fsck finds a suspect (corrupt) superblock, the system (or administrator) may decide to use an alternate copy of the superblock.
• Free blocks: Fsck scans the inodes, indirect blocks, double indirect blocks, etc., to build an understanding of which blocks are currently allocated within the file system. It uses this knowledge to produce a correct version of the allocation bitmaps; thus, if there is any inconsistency between bitmaps and inodes, it is resolved by trusting the information within the inodes.
• Inode state: Each inode is checked for corruption or other problems. A suspect inode is cleared by fsck.
• Inode links: Fsck scans through the entire directory tree to verify the link count of each allocated inode.
• Duplicates: Fsck checks for duplicate pointers, i.e., two inodes that refer to the same block. If one inode is obviously bad, it may be cleared. Alternatively, the pointed-to block could be copied, thus giving each inode its own copy as desired.
• Bad blocks: A pointer is considered “bad” if it obviously points to something outside its valid range.
• Directory checks: Fsck performs additional integrity checks on the contents of each directory, e.g., making sure that each inode referred to in a directory entry is allocated and that no directory is linked to more than once.
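The free-blocks check can be sketched as rebuilding the data bitmap from the inodes and comparing it with the on-disk copy. The in-memory representation (a dict of inodes with a `blocks` list) is hypothetical, and indirect blocks are ignored to keep the sketch short.

```python
def rebuild_data_bitmap(inodes, num_blocks):
    """Recompute the allocation bitmap by trusting the pointers found in the inodes."""
    bitmap = [False] * num_blocks
    for inode in inodes.values():
        for pointer in inode["blocks"]:
            if 0 <= pointer < num_blocks:  # out-of-range ("bad") pointers are skipped
                bitmap[pointer] = True
    return bitmap


def compare_bitmaps(on_disk, rebuilt):
    """Return (leaked, unmarked): blocks marked used but unreferenced, and
    blocks referenced by an inode but marked free on disk."""
    leaked = [b for b, used in enumerate(on_disk) if used and not rebuilt[b]]
    unmarked = [b for b, used in enumerate(on_disk) if not used and rebuilt[b]]
    return leaked, unmarked


# Hypothetical state after a crash: one inode points to blocks 4 and 5, but the
# on-disk bitmap marks blocks 4 and 6 as allocated.
inodes = {2: {"blocks": [4, 5]}}
on_disk = [False, False, False, False, True, False, True, False]
rebuilt = rebuild_data_bitmap(inodes, len(on_disk))
print(compare_bitmaps(on_disk, rebuilt))
```

Writing `rebuilt` back as the new bitmap resolves both kinds of inconsistency in favor of the inodes, which is the policy described in the free-blocks item above.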

Solution #2: Journaling

• Fsck is too slow: it lets inconsistencies happen, and then finds and fixes them later, when rebooting.
• An alternate solution is journaling (a.k.a. write-ahead logging).
  • When updating the disk, before overwriting the structures in place, first write down a little note (somewhere else on the disk, in a well-known location) describing what you are about to do.
  • If a crash takes place, the note tells exactly what to fix (and how to fix it) after the crash, instead of having to scan the entire disk.
• Linux ext2 without journaling

Super Group 0 Group 1 ··· Group N

• Linux ext3 with journaling

Super Journal Group 0 Group 1 ··· Group N

Solution #2: Journaling (contd.)

• Recall our update example, where we wish to write the inode (I[v2]), bitmap
(B[v2]), and data block (Db) to disk.
• Before any writing, we are now first going to write them to the log (a.k.a.
journal).
Journal

TxB I[v2] B[v2] Db TxE

• The transaction begin (TxB) tells us about this update, and contains a
transaction identifier (TID).
• The middle three blocks just contain the exact contents of the blocks themselves. This is physical logging (as opposed to logical logging, which records a more compact description of the update).
• The final block (TxE) is a marker of the end of this transaction, and also
contains the TID.

Solution #2: Journaling (contd.)

• Once this transaction is safely on disk, we are ready to overwrite the old
structures in the file system; this process is called checkpointing.
• To checkpoint the file system (i.e., bring it up to date with the pending update
in the journal), we issue the writes I[v2], B[v2], and Db to their disk locations.
• If these writes complete successfully, we have successfully checkpointed the file
system and are basically done.

Solution #2: Journaling (contd.)

• Protocol to update the file system

1 Journal write: Write the transaction, including a transaction-begin block,
all pending data and metadata updates, and a transaction-end block, to
the log; wait for these writes to complete.
2 Checkpoint: Write the pending metadata and data updates to their final
locations in the file system.

Solution #2: Journaling (contd.)

• What if a crash occurs during the writes to the journal?

  • Issue one write at a time, waiting for each to complete before issuing the next. But this is slow.
  • Issue all five block writes at once, as this would turn five writes into a single sequential write and thus be faster. However, this is unsafe: the disk may complete the writes in any order, so TxB and TxE can reach the disk before Db does.
Journal

TxB I[v2] B[v2] ?? TxE

id=1 id=1

• Thus the file system issues the transactional write in two steps.
Journal

TxB I[v2] B[v2] Db TxE

id=1 id=1

Solution #2: Journaling (contd.)

• Protocol to update the file system

1 Journal write: Write the contents of the transaction (including TxB,
metadata, and data) to the log; wait for these writes to complete.
2 Journal commit: Write the transaction commit block (containing TxE) to
the log; wait for write to complete; transaction is said to be committed.
3 Checkpoint: Write the pending metadata and data updates to their final
locations in the file system.

Solution #2: Journaling (contd.)

• A crash may happen at any time during this sequence of updates.

• If the crash happens before the transaction is written safely to the log (i.e.,
before Step 2 above completes), then our job is easy: the pending update
is simply skipped.
• If the crash happens after the transaction has committed to the log, but
before the checkpoint is complete, the file system can recover the update.
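The recovery rule can be sketched as a scan over the log that replays only committed transactions. The record format (tuples tagged "TxB", "block", and "TxE") and the dictionary standing in for the disk are assumptions made for illustration, not the on-disk format of any real journal.

```python
def recover(journal, disk):
    """Replay committed transactions from the journal onto the disk (redo logging).

    journal: list of ("TxB", tid), ("block", address, contents), ("TxE", tid)
    disk:    dict mapping block address -> contents
    A transaction without a matching TxE is treated as uncommitted and skipped.
    """
    current_tid, pending = None, []
    for record in journal:
        kind = record[0]
        if kind == "TxB":
            current_tid, pending = record[1], []
        elif kind == "block" and current_tid is not None:
            pending.append((record[1], record[2]))
        elif kind == "TxE" and record[1] == current_tid:
            for address, contents in pending:  # committed: redo every logged write
                disk[address] = contents
            current_tid, pending = None, []
    return disk


# Transaction 1 committed before the crash; transaction 2 never wrote its TxE.
journal = [
    ("TxB", 1), ("block", 3, "I[v2]"), ("block", 1, "B[v2]"), ("block", 5, "Db"), ("TxE", 1),
    ("TxB", 2), ("block", 3, "I[v3]"),
]
disk = {1: "B[v1]", 3: "I[v1]", 5: "garbage"}
print(recover(journal, disk))
```

Replaying is safe even if part of the checkpoint already reached the disk before the crash, because rewriting a block with the same contents is idempotent.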

Solution #2: Journaling (contd.)

• Batching Log Updates

• Now suppose we create multiple files in a row in the same directory.
• To create one file, one has to update a number of on-disk structures,
minimally including:
• the inode bitmap,
• the newly-created inode of the file,
• the data block of the parent directory containing the new directory
entry,
• the parent directory inode,
• etc.
• We are writing these same blocks over and over.
• Solution: buffer all updates into a global transaction.

Solution #2: Journaling (contd.)

• Making The Log Finite

• The log is of a finite size. If we keep adding transactions to it (as in this figure), it will soon fill.
Journal

Tx1 Tx2 Tx3 Tx4 Tx5

• Use a journal superblock to record enough information to know which transactions have not yet been checkpointed. This reduces recovery time and enables re-use of the log in a circular fashion.

Solution #2: Journaling (contd.)

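• Protocol to update the file system

  1. Journal write: Write the contents of the transaction (including TxB, metadata, and data) to the log; wait for these writes to complete.
  2. Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction is said to be committed.
  3. Checkpoint: Write the pending metadata and data updates to their final locations in the file system.
  4. Free: Some time later, mark the transaction free in the journal by updating the journal superblock.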

Solution #2: Journaling (contd.)

• Journaling Mode: Data / Ordered / Unordered.

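• Data journaling (as in Linux ext3) requires writing each data block to the disk twice: once to the journal and once to its final location.
Journal: | TxB | I[v2] | B[v2] | Db | TxE |

• Metadata journaling does not write data blocks to the journal; only metadata is logged.
Journal: | TxB | I[v2] | B[v2] | TxE |

• When should we write Db?

• Ordered: Db is written to its final location before the metadata is committed to the journal.
• Unordered: no ordering is enforced between the write of Db and the journal writes.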


Solution #2: Journaling (contd.)

• Protocol to update the file system (metadata journaling)

1. Data write: Write data to final location; wait for completion (the wait is
optional).
2. Journal metadata write: Write the begin block and metadata to the log;
wait for writes to complete.
3. Journal commit: Write the transaction commit block (containing TxE) to
the log; wait for write to complete; transaction is said to be committed.
4. Checkpoint: Write the pending metadata updates to their final
locations in the file system (the data was already written in Step 1).
5. Free: Some time later, mark the transaction free in the journal by updating
the journal superblock.
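
The five steps above, together with the recovery rule (skip uncommitted transactions, replay committed ones), can be sketched as a small simulation. This is an illustrative model only: `disk` is a plain dict, `journal` is a list of dicts, and the `update` and `recover` functions and the `crash_after` parameter are hypothetical names introduced here.

```python
def update(disk, journal, tx_id, data, metadata, crash_after=None):
    """Run the five protocol steps in order; stop early to simulate a crash."""
    for step in ("data", "journal_metadata", "commit", "checkpoint", "free"):
        if step == "data":
            disk.update(data)                       # 1. data to final location
        elif step == "journal_metadata":
            journal.append({"id": tx_id, "meta": dict(metadata),
                            "committed": False})    # 2. TxB + metadata to the log
        elif step == "commit":
            journal[-1]["committed"] = True         # 3. TxE written: committed
        elif step == "checkpoint":
            disk.update(metadata)                   # 4. metadata to final location
        else:
            journal[:] = [tx for tx in journal
                          if tx["id"] != tx_id]     # 5. free the transaction
        if step == crash_after:
            return


def recover(disk, journal):
    """Replay committed transactions in log order; skip uncommitted ones."""
    for tx in journal:
        if tx["committed"]:
            disk.update(tx["meta"])
    journal.clear()


data, meta = {"Db": "new data"}, {"I": "v2", "B": "v2"}

# Crash before the commit block is written: the metadata update is skipped.
disk_a, journal_a = {}, []
update(disk_a, journal_a, 1, data, meta, crash_after="journal_metadata")
recover(disk_a, journal_a)

# Crash after commit but before checkpoint: recovery replays the metadata.
disk_b, journal_b = {}, []
update(disk_b, journal_b, 2, data, meta, crash_after="commit")
recover(disk_b, journal_b)
```

In the first case `disk_a` holds only `Db`, which no inode points to, so the file system is still consistent. In the second case `disk_b` ends up with `Db`, `I`, and `B`, the same state a crash-free run would reach.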


Solution #2: Journaling (contd.)

• Block Reuse
• A user adds an entry to the directory foo; assume the foo directory data
is located in block 1000.
• The user deletes everything in the directory as well as the directory itself,
freeing up block 1000 for reuse.
• The user creates a new file foobar, which ends up reusing the same block 1000.
Journal: | TxB (id=1) | I[foo] (ptr:1000) | D[foo] [final addr:1000] | TxE (id=1) | TxB (id=2) | I[foobar] (ptr:1000) | TxE (id=2) |

• Now assume a crash occurs and all of this information is still in the log.
What will happen?


Solution #2: Journaling (contd.)

• Solutions:
• Never reuse blocks until the delete of said blocks is checkpointed out of
the journal
• Linux ext3: add a new type of record (a revoke record) to the journal,
written between the two transactions when the block is deleted. During
replay, such revoked data is never replayed.
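
The hazard in the block-reuse scenario is that, under metadata journaling, the new file's data in block 1000 is not in the log, so a naive replay writes the old directory contents D[foo] back into block 1000 and overwrites the data of foobar. A minimal sketch of replay with revoke records follows, assuming a simplified record format (dicts with `type`, `addr`, `contents`) and hypothetical inode block addresses 50 and 51; none of this is the actual ext3 journal format.

```python
def replay(journal, disk):
    """Replay journaled writes in log order, skipping writes revoked later in the log."""
    last_revoke = {}
    for i, rec in enumerate(journal):            # first pass: find revoke records
        if rec["type"] == "revoke":
            last_revoke[rec["addr"]] = i
    for i, rec in enumerate(journal):            # second pass: replay surviving writes
        if rec["type"] == "write" and last_revoke.get(rec["addr"], -1) < i:
            disk[rec["addr"]] = rec["contents"]
    return disk


tx1 = [
    {"type": "write", "addr": 50, "contents": "I[foo] ptr:1000"},
    {"type": "write", "addr": 1000, "contents": "D[foo] directory entries"},
]
revoke = [{"type": "revoke", "addr": 1000}]
tx2 = [{"type": "write", "addr": 51, "contents": "I[foobar] ptr:1000"}]

# Block 1000 already holds foobar's data, written directly to its final location.
without_revoke = replay(tx1 + tx2, {1000: "foobar user data"})
with_revoke = replay(tx1 + revoke + tx2, {1000: "foobar user data"})
```

Without the revoke record, block 1000 ends up holding the stale directory contents. With it, the write of D[foo] is skipped and foobar's data survives recovery.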


Solution #3: Copy-on-Write

• Other approaches include copy-on-write (COW), here meaning the storage technique, not the memory-management one.

• A COW file system never overwrites files or directories in place; rather, it places new updates
in previously unused locations on disk.
• Doing so makes keeping the file system consistent straightforward.
• We will discuss the log-structured file system (LFS), which is an early
example of a COW file system.
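
A minimal sketch of the copy-on-write idea, assuming a toy `CowStore` with an append-only block list and a root mapping that is switched in one step. The class is a hypothetical illustration of the principle, not a model of LFS.

```python
class CowStore:
    """Toy copy-on-write store: existing blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = []   # append-only "disk"
        self.root = {}     # file name -> index of its current block

    def write(self, name, contents):
        self.blocks.append(contents)            # new data goes to an unused location
        new_root = dict(self.root)              # build the new mapping off to the side
        new_root[name] = len(self.blocks) - 1
        self.root = new_root                    # one switch makes the update visible

    def read(self, name):
        return self.blocks[self.root[name]]


store = CowStore()
store.write("a", "version 1")
old_root = store.root
store.write("a", "version 2")
```

Here `store.read("a")` returns the new contents, while `store.blocks[old_root["a"]]` still holds the old ones. A crash before the root switch would leave the previous, consistent version in place, which is why consistency is straightforward under this scheme.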

