L18 VSFS and FSFormat
Uploaded by Sam Cheng

CSC369H1F Operating Systems, Fall 2023

Lecture 18: VSFS and On-Disk Format

Instructors: Angela Demke Brown (Day, L0101) and Kuei (Jack) Sun (Evening, L5101)

Department of Computer Science, University of Toronto

Today
• Intro to file systems
• The Very Simple File System (VSFS)

CSC369 2 Lecture 18
Existing File Systems
• Many file system implementations
• Check out https://en.wikipedia.org/wiki/List_of_file_systems
• Every file system is built for a specific purpose
• Disk file systems
• E.g., Windows NTFS, Linux Ext4, Mac OSX HFS+
• Flash file systems
• E.g., Linux F2FS
• File systems with integrated volume management (like an LVM, logical volume manager)
• BSD/Solaris ZFS, Linux Btrfs
• Network file systems, distributed file systems, etc.
• We’ll explore common concerns with a very simple file system (VSFS)
The Main Idea
• We need to create a file system for an unformatted disk.
• We need to create some structure in it, so we can find and organize data
• Assumption:
• We want to implement a file system behind the VFS (virtual file system) interface
• Key questions:
• Where do we store data and metadata structures (inodes)?
• How do we keep track of block allocations?
• How do we locate file data and metadata?
• What are the limitations (max file size, etc.)?

How does a File System use a disk to store files?
• Disks are block devices
• Data is always transferred between disk and memory in whole blocks.
• The minimum disk block size is typically 512 bytes (older disks) or 4096 bytes (4KB)
• This minimum disk block size is called the sector size

[Figure: A raw, unformatted disk (256 KB)]
How does a File System use a disk to store files?
• File systems use blocks.
• The block size is defined by the file system, but must be a multiple of the sector size (e.g., 4KB file system block size = 8 * 512 byte disk sector size)
• Disk space is allocated in the granularity of blocks.
• Data is transferred between disk and memory in units of blocks.

[Figure: 256 KB Disk divided into 64 blocks of 4 KB (block size: 4KB, number of blocks: 64), numbered 0 to 63 in four rows of 16]
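The block arithmetic above can be sketched in C. This is a minimal illustration assuming the slide's example geometry (256 KB disk, 4 KB file-system blocks, 512-byte sectors); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Geometry assumed from the slides: 256 KB disk, 4 KB file-system blocks,
   512-byte disk sectors. */
#define DISK_SIZE   (256u * 1024u)
#define BLOCK_SIZE  4096u
#define SECTOR_SIZE 512u

#define NUM_BLOCKS        (DISK_SIZE / BLOCK_SIZE)    /* 64 blocks */
#define SECTORS_PER_BLOCK (BLOCK_SIZE / SECTOR_SIZE)  /* 8 sectors per block */

/* Byte offset of file-system block `block` from the start of the disk. */
static uint64_t block_to_byte(uint32_t block) {
    return (uint64_t)block * BLOCK_SIZE;
}

/* First disk sector holding file-system block `block`; the block spans
   SECTORS_PER_BLOCK consecutive sectors starting here. */
static uint64_t block_to_sector(uint32_t block) {
    return (uint64_t)block * SECTORS_PER_BLOCK;
}
```

For example, block 3 starts at byte 12288, which is sector 24, and occupies sectors 24 to 31.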
Superblock
• How do we identify a file system on disk?
• A “superblock” holds metadata about the overall file system
• Always at a well-known disk location
• Identifies type of file system (magic number)
• Includes location of other metadata, file system size, etc.
• Often replicated across disk for reliability
[Figure: 256 KB Disk, blocks 0 to 63]
Example: ext2 Superblock
• Simplified. For illustration only!
struct ext2_super_block {
    unsigned int s_inodes_count;     /* Inodes count */
    unsigned int s_blocks_count;     /* Blocks count */
    ...
    unsigned int s_first_data_block; /* First Data Block */
    unsigned int s_log_block_size;   /* Block size */
    ...
    unsigned short s_magic;          /* Magic signature */
    ...
    unsigned short s_inode_size;     /* Size of inode struct */
};

• The data structure in memory exactly matches the layout of the superblock on disk.
[Figure: superblock on disk, with values in field order: 80 (inodes count), 64 (blocks count), ..., 8 (first data block), 12 (block size field), ..., 0xEF53 (magic), ..., 128 (inode size), ...]
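A minimal sketch of identifying a file system by its superblock, assuming the real (unsimplified) ext2 layout: the superblock starts 1024 bytes into the volume, fields are little-endian, `s_magic` sits at byte 56, and the block size is `1024 << s_log_block_size`. The offsets 0, 4, 24, and 56 include fields elided by "..." in the simplified struct above. `ext2_probe` and `struct sb_summary` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPER_OFFSET 1024u   /* superblock starts 1024 bytes into the volume */
#define EXT2_MAGIC        0xEF53u

/* ext2 stores multi-byte fields little-endian; decode byte by byte so the
   code does not depend on the host's endianness. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

struct sb_summary {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t block_size;   /* in bytes */
};

/* Returns 1 and fills *out if the image carries the ext2 magic number, else 0. */
static int ext2_probe(const uint8_t *img, size_t len, struct sb_summary *out) {
    if (len < EXT2_SUPER_OFFSET + 1024u) return 0;    /* superblock is 1 KB */
    const uint8_t *sb = img + EXT2_SUPER_OFFSET;
    if (le16(sb + 56) != EXT2_MAGIC) return 0;        /* s_magic */
    out->inodes_count = le32(sb + 0);                 /* s_inodes_count */
    out->blocks_count = le32(sb + 4);                 /* s_blocks_count */
    out->block_size   = 1024u << le32(sb + 24);       /* s_log_block_size */
    return 1;
}
```

Checking the magic number first is what lets the OS recognize the file system type before it trusts any other field.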
Free Space Map
• A free map tracks which blocks are free
• Usually a bitmap, one bit per block on the disk
• Indexed by block number, i.e., Nth bit corresponds to block number N
• 1 to indicate free, 0 to indicate in-use (conventions vary: ext2, and the exercise at the end of this lecture, use 1 for in-use and 0 for free)
• Stored on disk, cached in memory for performance
[Figure: 256 KB Disk, blocks 0 to 63]
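A minimal sketch of free-map helpers. It assumes the ext2 convention (bit set means in use, the opposite of the convention on the slide) and ext2's bit order (bit N lives in byte N/8, least-significant bit first). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convention assumed here (as in ext2): bit N set => block N in use. */
static int bitmap_test(const uint8_t *map, size_t n) {
    return (map[n / 8] >> (n % 8)) & 1;
}
static void bitmap_set(uint8_t *map, size_t n) {
    map[n / 8] |= (uint8_t)(1u << (n % 8));
}
static void bitmap_clear(uint8_t *map, size_t n) {
    map[n / 8] &= (uint8_t)~(1u << (n % 8));
}

/* First-fit allocation: find the lowest free bit among nbits, mark it in
   use, and return its index; return -1 if every block is in use. */
static long bitmap_alloc(uint8_t *map, size_t nbits) {
    for (size_t n = 0; n < nbits; n++) {
        if (!bitmap_test(map, n)) {
            bitmap_set(map, n);
            return (long)n;
        }
    }
    return -1;
}
```

Freeing a block is just `bitmap_clear`, and the whole map for a 64-block disk fits in 8 bytes.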
Data Blocks
• Remaining blocks are used to store files and directories
• And other file system metadata
• There are many ways to do this
• We will look at the Very Simple File System (VSFS) from OSTEP (Operating Systems: Three Easy Pieces) as an example
[Figure: 256 KB Disk, blocks 0 to 63]
Metadata: Inodes
• File system needs to track information about each file
• In most file systems, directories are just a type of file.
• The content of a directory is a list of directory entries.
• Information about each file is stored in a data structure called an inode.
• The term is common, but inode contents vary widely between file systems.
• The ext2 inode structure is 128 bytes.

Example: Simplified ext2 inode structure
Size (bytes)  Name           What is this inode field for?
2             i_mode         What is the file type (regular, directory)? Can it be read/written/executed?
2             i_uid          Who owns this file?
4             i_size         How many bytes are in this file?
4             i_atime        What time was this file last accessed?
4             i_ctime        What time was this file created?
4             i_mtime        What time was this file last modified?
4             i_dtime        What time was this file deleted (i.e., the inode was freed)?
2             i_gid          What group does this file belong to?
2             i_links_count  How many hard links are there to this file?
4             i_blocks       How many disk sectors have been allocated to this file?
4             i_flags        How should ext2 use this file? (Compressed? Immutable? Append-only?)
4             i_osd1         An OS-dependent field.
60            i_block        A set of 15 disk pointers to locate the file content.
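The table can be written as a C struct. This sketch, with the hypothetical name `simple_ext2_inode`, includes only the 13 fields listed, in table order; the real ext2 inode has further fields that bring it to 128 bytes. Because every field is naturally aligned in this order, the compiler adds no padding, so the sizes sum to exactly 100 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified inode: only the fields from the table above. */
struct simple_ext2_inode {
    uint16_t i_mode;        /* file type + permission bits */
    uint16_t i_uid;         /* owner */
    uint32_t i_size;        /* size in bytes */
    uint32_t i_atime;       /* last access time */
    uint32_t i_ctime;       /* creation time, per the table */
    uint32_t i_mtime;       /* last modification time */
    uint32_t i_dtime;       /* deletion time */
    uint16_t i_gid;         /* group */
    uint16_t i_links_count; /* number of hard links */
    uint32_t i_blocks;      /* 512-byte sectors allocated */
    uint32_t i_flags;       /* e.g. immutable, append-only */
    uint32_t i_osd1;        /* OS-dependent */
    uint32_t i_block[15];   /* 12 direct + single/double/triple indirect */
};

/* 2+2+4+4+4+4+4+2+2+4+4+4+60 = 100 bytes, with no padding. */
_Static_assert(sizeof(struct simple_ext2_inode) == 100, "unexpected padding");
_Static_assert(offsetof(struct simple_ext2_inode, i_block) == 40, "i_block offset");
```

The 60-byte `i_block` field is exactly 15 four-byte block pointers.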
Metadata: Inode Table
• The inode table is an array of inodes, stored in consecutive blocks on
disk.
• Allocated when file system is created.
• So, there is a pre-defined maximum number of files in file system!
• Need to keep track of which inodes are allocated or free  inode bitmap

CSC369 13 Lecture 18
VSFS: Allocation Structures
• Keep track of which inodes and blocks are being used and which ones are
free.
• One bitmap for the inode region.
• One bitmap for the data region.
• Reserve one block for each bitmap.
• How many blocks can a 4KB data bitmap keep track of?
• A 4KB bitmap block has 4096 bytes * 8 bits per byte = 32K bits, so it can keep track of 32K blocks.
• Overkill for this tiny file system, but allocating a full block simplifies management.
[Figure: 256 KB Disk: block 0 superblock (SB), block 1 inode bitmap (IB), block 2 data bitmap (DB)]
VSFS: Inode Table
• Must reserve blocks for inode table when file system is initialized
• mkfs.vsfs -i 160 disk.img
• How many blocks do we need for the inode table?
• Assume 1 inode is 128 bytes. Each inode block holds 4KB / 128B = 32 inodes.
• Number of blocks = 160 inodes / 32 inodes per block = 5
• Why have 160 inodes with only 56 data blocks?
[Figure: 256 KB Disk: SB, IB, DB in blocks 0 to 2; inode table (I) in blocks 3 to 7]
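The sizing computation can be sketched as follows, assuming the VSFS layout on these slides (superblock, inode bitmap, and data bitmap in blocks 0 to 2, then the inode table, then data). `vsfs_plan` and `struct vsfs_layout` are hypothetical names. The number of inode-table blocks is rounded up, so 161 inodes would need 6 blocks.

```c
#include <stdint.h>

/* Hypothetical layout helper for the slide's VSFS:
   block 0 superblock, block 1 inode bitmap, block 2 data bitmap,
   then the inode table, then the data region. */
struct vsfs_layout {
    uint32_t itable_start;   /* first inode-table block */
    uint32_t itable_blocks;  /* blocks reserved for inodes */
    uint32_t data_start;     /* first data block */
    uint32_t data_blocks;    /* blocks left for the data region */
};

/* Caller must ensure the inode table fits (data_start <= total_blocks). */
static struct vsfs_layout vsfs_plan(uint32_t total_blocks, uint32_t block_size,
                                    uint32_t ninodes, uint32_t inode_size) {
    struct vsfs_layout l;
    uint32_t per_block = block_size / inode_size;             /* 4096/128 = 32 */
    l.itable_start  = 3;
    l.itable_blocks = (ninodes + per_block - 1) / per_block;  /* round up */
    l.data_start    = l.itable_start + l.itable_blocks;
    l.data_blocks   = total_blocks - l.data_start;
    return l;
}
```

With `mkfs.vsfs -i 160` on the 64-block disk, this gives 5 inode-table blocks, data starting at block 8, and 56 data blocks.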
VSFS: Data Region
• The rest of the disk will be used to store user data
• As well as dynamically allocated file system metadata.
• Directories
• Indirect blocks
• In our VSFS example, we have the last 56 blocks for the data region.
• First 8 blocks are pre-allocated for metadata.
[Figure: 256 KB Disk: block 0 SB, block 1 IB, block 2 DB, blocks 3 to 7 inode table (I), blocks 8 to 63 data region (D)]
Done formatting disk into VSFS!

[Figure: 256 KB Disk in VSFS layout: SB, IB, DB, 5 inode-table blocks (3 to 7), 56 data blocks (8 to 63)]
Ex: Read file with inode number 34
• When mounting the FS, the OS first reads the superblock into memory.
• From the superblock, we know:
• Inode table begins at block 3 (i.e., 3*4KB from start of disk)
• Inode size is 128B
• Disk offset of inode 34 (zero-indexed): 3*4KB + 34*128B = 3*4KB + 4352B = 4*4KB + 2*128B
• block number = 4, block offset = 256
[Figure: 256 KB Disk in VSFS layout, with inode 34 in block 4]
Inode 34 is here. But where is the data?
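The offset calculation above, as a small C sketch. It assumes zero-indexed inode numbers, as on the slide, and takes the inode-table start, inode size, and block size as parameters that would come from the superblock. `locate_inode` and `struct inode_loc` are hypothetical names.

```c
#include <stdint.h>

/* Where inode `inum` lives on disk. */
struct inode_loc {
    uint32_t block;    /* disk block holding the inode */
    uint32_t offset;   /* byte offset of the inode within that block */
};

static struct inode_loc locate_inode(uint32_t inum, uint32_t itable_start,
                                     uint32_t inode_size, uint32_t block_size) {
    uint64_t byte = (uint64_t)inum * inode_size;   /* offset within the table */
    struct inode_loc loc;
    loc.block  = itable_start + (uint32_t)(byte / block_size);
    loc.offset = (uint32_t)(byte % block_size);
    return loc;
}
```

`locate_inode(34, 3, 128, 4096)` yields block 4, offset 256, matching the slide. Real ext2 numbers inodes from 1, so an ext2 reader would pass `inum - 1`.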
From inode to data…
Inode 34
• Read second inode block into memory. i_mode IFREG
• Inode 34 is the third inode structure in this i_uid 10253
block. i_size

3016

• Its i_block array holds up to 15 pointers to disk i_block[0] 15
blocks …
i_block[11]

0
• 0-11 point to file data blocks i_block[12] 0
• 12 points to a single indirect block i_block[13] 0
• 13 points to a double-indirect block i_block[14] 0
• 14 points to a triple indirect block Inode 34 is here. Its data is here.
256 KB Disk

0 SB IB DB I I I I I D D D D D D D D 15

16 D D D D D D D D D D D D D D D D 31
32 D D D D D D D D D D D D D D D D 47
48 D D D D D D D D D D D D D D D D 63

CSC369 20 Lecture 18
Recap: Indirection
• The single indirect pointer (i_block[12]) points to a disk block that contains pointers to data blocks.
• If a pointer takes up 4 bytes and a block is 4KB in size, then we can fit 1024 pointers in a block.
• The single indirect block points to $1024 = 2^{10}$ file data blocks.
• The double indirect block points to single indirect blocks.
• The double indirect block points to 1024 single indirect blocks
• $1024 \times 1024 = 2^{10} \times 2^{10} = 2^{20}$ file data blocks.
• The triple indirect block points to double indirect blocks.
• It can point to $1024 \times 1024 \times 1024 = 2^{30}$ file data blocks.
• Maximum file size $= (12 + 2^{10} + 2^{20} + 2^{30}) \times 4\,\text{KB} \approx 4\,\text{TB}$
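The maximum-file-size arithmetic, checked in C under the slide's assumptions (4 KB blocks, 4-byte block pointers, 12 direct pointers). The function names are hypothetical.

```c
#include <stdint.h>

#define NDIRECT        12u
#define PTRS_PER_BLOCK (4096u / 4u)   /* 4 KB block / 4-byte pointer = 1024 */

/* Data blocks reachable through direct, single, double, and triple pointers. */
static uint64_t max_file_blocks(void) {
    uint64_t p = PTRS_PER_BLOCK;
    return NDIRECT + p + p * p + p * p * p;   /* 12 + 2^10 + 2^20 + 2^30 */
}

static uint64_t max_file_bytes(void) {
    return max_file_blocks() * 4096u;         /* about 4.4e12 bytes, i.e. ~4 TB */
}
```

The triple-indirect term dominates: 2^30 blocks of 2^12 bytes is 2^42 bytes (4 TiB), and the other levels add comparatively little.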
File data blocks are allocated incrementally
write(fd, buf, 4096);
...
[Figure: each 4 KB write allocates one more data block as the file size grows (4096, 8192, 12288, ..., 48K, 52K, ..., ~4M, ~4M+4K, ..., ~4G, ~4G+4K, ..., ~4TB). The first 12 blocks (up to 48K) use the direct pointers i_block[0] to i_block[11]. From 52K to ~4M, blocks are reached through the indirect block in i_block[12] (1024 data blocks). From ~4M+4K to ~4G, they go through the double indirect block in i_block[13] (1024 indirect blocks of 1024 data blocks each). Beyond ~4G+4K, up to ~4TB, they go through the triple indirect block in i_block[14].]
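The allocation pattern in the figure comes from how a logical block number maps onto the i_block array. The following sketch assumes 12 direct pointers and 1024 pointers per indirect block. `block_path` is a hypothetical helper that returns the indirection depth and the index to follow at each level.

```c
#include <stdint.h>

#define NDIRECT 12u
#define NPTRS   1024u   /* pointers per 4 KB indirect block (4-byte pointers) */

/* For logical block `lbn`, fill idx[0] with the i_block slot and idx[1..3]
   with the index to follow inside each indirect block. Returns the depth
   (0 direct, 1 single, 2 double, 3 triple) or -1 if past the max file size. */
static int block_path(uint64_t lbn, uint32_t idx[4]) {
    if (lbn < NDIRECT) { idx[0] = (uint32_t)lbn; return 0; }
    lbn -= NDIRECT;
    if (lbn < NPTRS) { idx[0] = 12; idx[1] = (uint32_t)lbn; return 1; }
    lbn -= NPTRS;
    if (lbn < (uint64_t)NPTRS * NPTRS) {
        idx[0] = 13;
        idx[1] = (uint32_t)(lbn / NPTRS);          /* which single indirect block */
        idx[2] = (uint32_t)(lbn % NPTRS);          /* which data block in it */
        return 2;
    }
    lbn -= (uint64_t)NPTRS * NPTRS;
    if (lbn < (uint64_t)NPTRS * NPTRS * NPTRS) {
        idx[0] = 14;
        idx[1] = (uint32_t)(lbn / ((uint64_t)NPTRS * NPTRS));
        idx[2] = (uint32_t)(lbn / NPTRS % NPTRS);
        idx[3] = (uint32_t)(lbn % NPTRS);
        return 3;
    }
    return -1;
}
```

The 13th block (logical block 12, file size 52K) is the first one that needs the single indirect block, and logical block 12 + 1024 is the first one that needs the double indirect block.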
Multi-level block pointers form a tree
• But it is imbalanced. Why?
• Most files are small (~2KiB).
• Files are usually accessed sequentially.
• Directories are typically small (20 or fewer entries).

• Design based on evidence.


• For example, see “A Five-Year Study of File System Metadata”
• Published in 2007
• Looks at Windows PC file system images

Alternative: Extent-based Allocation
• An extent == a disk pointer plus a length (in # of blocks)
• i.e., it allocates several consecutive blocks.
• Instead of requiring a pointer to every block of a file, we just require a list
of (start block, length) tuples
• E.g., (25, 3), (50, 1), (53, 5), (61, 10)
• Advantages: Uses less metadata per file, and file allocation is more
compact.
• Disadvantage: Less flexible than the pointer-based approach.
• Does this cause external fragmentation?
• Adopted by Ext4, HFS+, NTFS, XFS.
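A minimal sketch of extent-based lookup, using the slide's example extent list. It assumes the extents cover the file's logical blocks in order with no holes. Real extent formats such as ext4's also record each extent's starting logical block, which this simplified version omits. `struct extent` and `extent_lookup` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One extent: `len` consecutive disk blocks starting at `start`. */
struct extent {
    uint32_t start;
    uint32_t len;
};

/* Map logical block `lbn` to a disk block by walking the extents in order.
   Returns -1 if lbn is past the end of the file. */
static long extent_lookup(const struct extent *ext, size_t n, uint32_t lbn) {
    for (size_t i = 0; i < n; i++) {
        if (lbn < ext[i].len)
            return (long)ext[i].start + lbn;
        lbn -= ext[i].len;   /* skip the blocks covered by this extent */
    }
    return -1;
}
```

For the list (25, 3), (50, 1), (53, 5), (61, 10), four small records describe all 19 blocks of the file. Logical block 8 falls in the third extent, at disk block 57.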

Another Alternative: Link-Based
• From last lecture
• Instead of pointers to all blocks, the inode just has one pointer to the first data
block of the file, then the first block points to the second block, etc.
• Works poorly if we want to access the last block of a big file.
• Workaround: inode tracks a tail pointer, to support append efficiently
• FAT variant: uses an in-memory File Allocation Table, indexed by data block number
• The table tracks the “next” pointer of each allocated block
• Faster to find a block, since following the chain reads the in-memory table instead of each data block on disk
• FAT file system, used by Windows before NTFS.
• This course focuses on indexed file systems

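Finding the n-th block of a file in the FAT variant means following n "next" pointers through the table. This is a minimal sketch. The end-of-chain marker `FAT_EOF` is a hypothetical stand-in, since real FAT versions use their own reserved end-of-chain values.

```c
#include <stdint.h>

#define FAT_EOF (-1)   /* hypothetical end-of-chain marker */

/* fat[b] holds the block that follows block b in its file, or FAT_EOF.
   Returns the disk block holding block n (0-based) of the file that starts
   at `first`, or FAT_EOF if the file is shorter than n + 1 blocks. */
static int32_t fat_nth_block(const int32_t *fat, int32_t first, uint32_t n) {
    int32_t b = first;
    while (n > 0 && b != FAT_EOF) {
        b = fat[b];    /* follow the "next" pointer in the in-memory table */
        n--;
    }
    return b;
}
```

The loop is still O(n) in the block index, but each step is a memory access rather than a disk read. That is why the in-memory table makes finding a block faster.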
Implementing Directories
• A directory is implemented as a special kind of file.
• In ext2, the inode for a directory has IFDIR set in i_mode field.
• The data blocks of that inode contain directory entries.
• What is the format?
• Simplest: array of fixed-size entry structs (e.g., vsfs)
• Still simple: list of variable length entries (e.g., ext2)
• Complex but time efficient: hash table or b-tree

Implementing Directory Entries
Directory entries map file names to inode numbers.
• vsfs: fixed-length records
struct vsfs_dentry {
    // inode number
    unsigned int inode;
    // file name, max 252 chars
    char name[252];
};

• ext2: variable-length records
struct ext2_dir_entry {
    // inode number
    unsigned int inode;
    // directory entry length
    unsigned short rec_len;
    // name length
    unsigned char name_len;
    // file type
    unsigned char file_type;
    // file name, max 255 chars
    char name[];
};
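With fixed-length records, a directory block is just an array, so a lookup is a linear scan. This sketch mirrors the `vsfs_dentry` shape (4 + 252 = 256 bytes, so 16 entries per 4 KB block). Two conventions here are hypothetical assumptions: an empty name marks an unused slot, and shorter names are NUL-padded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VSFS_BLOCK_SIZE 4096
#define VSFS_NAME_MAX   252

struct vsfs_dentry {
    uint32_t inode;
    char name[VSFS_NAME_MAX];   /* NUL-padded; a 252-char name has no NUL */
};

#define VSFS_DENTRIES_PER_BLOCK (VSFS_BLOCK_SIZE / sizeof(struct vsfs_dentry))

/* Return the inode number for `name` in one directory block, or -1. */
static long vsfs_lookup(const struct vsfs_dentry *block, const char *name) {
    if (strlen(name) > VSFS_NAME_MAX) return -1;
    for (size_t i = 0; i < VSFS_DENTRIES_PER_BLOCK; i++) {
        const struct vsfs_dentry *d = &block[i];
        if (d->name[0] == '\0') continue;                 /* unused slot */
        if (strncmp(d->name, name, VSFS_NAME_MAX) == 0)   /* bounded compare */
            return (long)d->inode;
    }
    return -1;
}
```

Fixed-size slots make the i-th entry trivial to locate, at the cost of wasting space for short names.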
Ext2 Directory Entries Example
• Root directory of a freshly-made ext2 file system:
• i.e., content of first data block for root directory inode.

Entry 0: Inode 2, Rec len 12, Name len 1, Type DIRECTORY, Name .
• First entry points to self. Root dir in ext2 is inode #2.
• The record length is rounded up to the next multiple of 4 (8-byte header + 1-byte name = 9, rounded up to 12).
• The length of the name is 1. Not null-terminated.
• The name of the self entry is “.”

Entry 1: Inode 2, Rec len 12, Name len 2, Type DIRECTORY, Name ..
• Second entry points to parent. Root dir is its own parent.
• The name of the parent entry is “..”

Entry 2: Inode 11, Rec len 1000, Name len 10, Type DIRECTORY, Name lost+found
• The record length of the last entry fills the rest of the 1024-byte block (12 + 12 + 1000 = 1024).
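Walking a block of variable-length entries uses `rec_len` to jump to the next entry and `name_len` to compare the name, which is not null-terminated. This is a minimal sketch assuming the ext2 on-disk entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type, then the name, all little-endian) and ext2's use of inode 0 for unused entries. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Return the inode number for `name` in one directory block, or 0. */
static uint32_t ext2_dir_lookup(const uint8_t *blk, size_t blk_size, const char *name) {
    size_t want = strlen(name);
    size_t off = 0;
    while (off + 8 <= blk_size) {
        const uint8_t *e  = blk + off;
        uint32_t ino      = rd32(e);
        uint16_t rec_len  = rd16(e + 4);
        uint8_t  name_len = e[6];
        if (rec_len < 8 || off + rec_len > blk_size) break;   /* corrupt entry */
        /* Compare exactly name_len bytes: the name has no terminator. */
        if (ino != 0 && name_len == want && 8u + name_len <= rec_len &&
            memcmp(e + 8, name, want) == 0)
            return ino;
        off += rec_len;   /* rec_len, not name_len, locates the next entry */
    }
    return 0;
}

/* Minimal record length: 8-byte header + name, rounded up to a multiple of 4. */
static uint16_t ext2_rec_len(size_t name_len) {
    return (uint16_t)((8 + name_len + 3) & ~(size_t)3);
}
```

Using the root directory example above, looking up "lost+found" returns inode 11. That entry's minimal record length would be 20, but its `rec_len` is 1000 because it absorbs the rest of the block.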
How do we access a file?
• Example: read the first block of “/somefile”
1. The inode for the root directory “/” is known.
2. Read the inode for “/” to find the data block(s) that hold the directory entries for the root directory (i_block[0])
3. Read the first data block for the directory.
4. Search the entries for “somefile”
5. The entry for “somefile” identifies the inode.
6. Read the inode for “somefile”
7. The inode tells us the location of the first data block for the file.
8. Read that block into memory to access the data at the beginning of the file.

How do we follow a path?
• Let’s say you want to open the file “/one/two/three”
1. Read superblock for the location of inode for “/”
2. Read inode block containing the inode for “/”
3. Read directory block for “/”
• search for entry “one” to get inode number
4. Read inode block containing inode for “one”
5. Read directory block for “one”
• search for entry “two” to get inode number
6. Read inode block containing inode for “two”
7. Read directory block for “two”
• search for entry “three” to get inode number
8. Read inode block for “three”
• Now we can read the first block of “three”.
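The component-by-component walk can be sketched over a toy in-memory file system. Everything here is hypothetical: an inode table of `toy_inode` structs, where each directory holds its (name, inode) entries inline in place of a separate directory data block. Each loop iteration corresponds to "read inode block, then read directory block and search".

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

struct toy_dentry { const char *name; int inum; };
struct toy_inode  { int is_dir; size_t nentries; struct toy_dentry entries[MAX_ENTRIES]; };

/* Resolve an absolute path starting from the well-known root inode.
   Returns the inode number, or -1 if a component is missing or not a directory. */
static int lookup_path(const struct toy_inode *itable, int root, const char *path) {
    char buf[256];
    if (path[0] != '/' || strlen(path) >= sizeof buf) return -1;
    strcpy(buf, path);                       /* strtok modifies its input */

    int cur = root;
    for (char *comp = strtok(buf, "/"); comp != NULL; comp = strtok(NULL, "/")) {
        const struct toy_inode *dir = &itable[cur];   /* "read inode" */
        if (!dir->is_dir) return -1;
        int next = -1;
        for (size_t i = 0; i < dir->nentries; i++) {  /* "read dir block, search" */
            if (strcmp(dir->entries[i].name, comp) == 0) {
                next = dir->entries[i].inum;
                break;
            }
        }
        if (next < 0) return -1;
        cur = next;
    }
    return cur;
}
```

On a real disk, each of those steps is at least one block read, which is why resolving "/one/two/three" costs several reads before the first data block of "three" is reached.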
Access file “/one/two/three”
1. Read Superblock.
2. Read inode block for root inode.
3. Read root directory entry block. Search.
4. Read inode block containing inode for “one”.
5. Read directory entry block for “one”. Search.
6. Read inode block containing inode for “two”.
7. Read directory entry block for “two”. Search.
8. Read inode block containing inode for “three”.
9. Read first data block of “three”.
[Figure: 256 KB Disk in VSFS layout; each step reads one block from disk into memory]

[Figure: after steps 1 to 9, memory holds the superblock (SB), four inode blocks (I), and four data blocks (D)]
Lecture 18 Exercise Setup
• Lecture 18 exercise uses a simple, text-based representation of a file system to help you understand how some common file operations are implemented.
• Focus is on how the file operations use and modify the file system metadata.

Disk Operations Exercise
• Assumptions:
• Only files and directories are represented.
• A file or directory is represented by at most one data block.
inode bitmap 11110000
inodes [d a:0 r:3] [f a:1 r:1] [f a:-1 r:1] [d a:2 r:2] [] …

• An inode is represented by square brackets [ ] with 3 elements:


• type (d – directory, f – file)
• address of first block
• a:0 means that the first data block of the file is block 0
• a:-1 means that no data block is allocated for the file (empty file)
• number of references, or links.
• r:3 means there are 3 directory entries that reference this inode
Disk Operations Exercise
• Example: This FS has 4 inodes in use, and 3 data blocks in use.

inode bitmap 11110000

inodes [d a:0 r:3] [f a:1 r:1] [f a:-1 r:1] [d a:2 r:2] [] ...

data bitmap 11100000

data [(.,0) (..,0) (y,1) (z,2) (q,3)] [u] [(.,3) (..,0)] [] ...

• Each data block is represented by square brackets [ ].


• Directory blocks contain one tuple for each directory entry.
• [ (name, inode_number) ]
• File blocks contain one character representing the contents of the file.
• The first inode [d a:0 r:3] is the inode for the root directory
• a:0 – the address is data block 0.
• r:3 - there are 3 references to this directory
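One way to check the r: values is to count, across all directory blocks, the entries that name each inode, including "." and ".." entries. This sketch hand-transcribes the exercise's two directory blocks (block 0 for the root, block 2 for q). Block 1 holds file data, so it has no entries. The names are hypothetical.

```c
#include <stddef.h>

/* Directory entries (name, inode) transcribed from the exercise's data blocks. */
struct entry { const char *name; int inum; };

static const struct entry root_blk[] = { {".", 0}, {"..", 0}, {"y", 1}, {"z", 2}, {"q", 3} };
static const struct entry q_blk[]    = { {".", 3}, {"..", 0} };

/* Number of directory entries, across all directory blocks, naming `inum`. */
static int count_refs(int inum) {
    const struct { const struct entry *e; size_t n; } dirs[] = {
        { root_blk, sizeof root_blk / sizeof root_blk[0] },
        { q_blk,    sizeof q_blk / sizeof q_blk[0] },
    };
    int refs = 0;
    for (size_t d = 0; d < sizeof dirs / sizeof dirs[0]; d++)
        for (size_t i = 0; i < dirs[d].n; i++)
            if (dirs[d].e[i].inum == inum)
                refs++;
    return refs;
}
```

For the root (inode 0), the three references are its own "." and "..", plus the ".." entry inside q. This matches r:3. Inode 3 gets 2, from the root's entry for q and q's own ".", matching r:2.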

• File z is empty. The inode does not point to any data blocks.
• How do we know the inode [f a:-1 r:1] is for file z?

• File y is using one data block, at index 1.


• For this exercise, we don’t care what the content is, so we
represent it as a single character, “u”.
