4 4-fsFileIndex
CPSC 313 1
Admin
Quiz 4:
Starts in a bit under two weeks.
Don't forget to make your reservation.
Labs:
Lab 8 is due Sunday November 17th.
Lab 9 will be released tomorrow.
Next week:
No lectures or office hours from Monday to Wednesday.
No tutorials.
Where we are
Unit Map:
...
P20: File Systems implementation overview
4.3. How we represent files
P21: Why fixed-size block file systems?
4.4. Building a file index
P22: Getting File System metadata
4.5. Naming
...
Today
Learning Outcomes
Define:
Buffer cache
Compare/contrast different implementations of file indexing
Topics:
Review how data moves between persistent storage and main memory
Review the layers of abstraction in the file system
Index structures for fixed-size block allocation
Recall: From persistent media to memory
2. We will use main memory as a cache for persistent data.
[Figure: two execution cores, each with an L1 cache, share an L2 cache and memory, which holds the buffer cache; the persistent disk sits below. Speed labels in the figure: fast, reasonably fast, really slow.]
Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
Our Current Part of the Layers-of-Abstraction:
File system API (open/close/read/write/seek) -- application programmers see this
A stream of bytes, from byte 0 to byte size - 1, addressed by byte offset o
The disk’s view: PBN = something … based on file’s block map or index
A sequence of (maybe not consecutive) disk sectors, from PBN 0 to PBN max (physical block numbers or disk addresses)
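The two views can be connected with a small sketch. It assumes 4096-byte blocks (the size used in the later examples) and models the block map as a plain dictionary; the names `locate`, `lbn_to_pbn`, and `block_map` are hypothetical, chosen only for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size, matching the later examples


def locate(offset, block_size=BLOCK_SIZE):
    """Split a byte offset into (logical block number, offset within that block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, block_size)


def lbn_to_pbn(block_map, lbn):
    """Translate an LBN to a PBN; 0 stands for 'no block allocated'."""
    return block_map.get(lbn, 0)


# Hypothetical file whose two blocks are not consecutive on disk.
block_map = {0: 57, 1: 912}

lbn, within = locate(5000)
print(lbn, within, lbn_to_pbn(block_map, lbn))
```

The file system performs the first step with simple arithmetic; the second step is the job of the index structures discussed in this lecture.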
Recall: Constraints on Indexes
The index itself must be in persistent storage, or we couldn’t find
the blocks of our files on reboot.
Thus, indexes "live" in disk blocks, too. They take space!
As usual, we want to efficiently support:
Sequential access
Random access
Sparse files
Both small and large files
Recall: Impractical Strict Single-Extent-based
Data Structure:
Inode
The inode has the disk address of the first block in the extent and the length of the extent.
Pros:
…
File Representation: Flat Index (practical?)
Data Structure:
Inode
Index
The index is a fixed-size array (else we have the memory management problem we had with extents). Thus:
The index consumes some number of disk blocks
Growing the index is not possible
There is some maximum file size
That's true for most indexes; however, it's worse here.
This inode has a reference to the (first block of) the index.
Or the design could have the index “live” inside the inode.
Example (flat index)
File system parameters
4096-byte blocks
4-byte block numbers
inode: 16384 index entries
This means:
maximum file size is 4096*16384 = 64 MB
4096-byte blocks @ 4 bytes/entry = 1024 entries/block
16384 entries @ 1024 entries/block = 16 index blocks
Example of sparse file with 3 blocks
LBN 2 is at PBN 123
LBN 6 is at PBN 456
LBN 3074 is at PBN 789
Let’s draw this file’s metadata. (Use 0 to indicate a missing block.)
Example (Flat index)
Index referenced by the inode, one entry per LBN:
LBN      PBN
0        0
1        0
2        123
3        0
4        0
5        0
6        456
7        0
⁞
3073     0
3074     789
3075     0
⁞
16383    0
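The flat-index example can be reproduced in a few lines. This is only a sketch: the index is modelled as an in-memory Python list rather than as 16 disk blocks, and the function name `flat_lookup` is hypothetical.

```python
BLOCK_SIZE = 4096     # bytes per block (from the example)
ENTRY_SIZE = 4        # bytes per block number (from the example)
NUM_ENTRIES = 16384   # index entries (from the example)

# Build the sparse file's index: 0 marks a missing block.
index = [0] * NUM_ENTRIES
index[2] = 123
index[6] = 456
index[3074] = 789


def flat_lookup(index, lbn):
    """Return the PBN for an LBN, or 0 if the block is missing."""
    if not 0 <= lbn < len(index):
        raise ValueError("LBN is beyond the maximum file size")
    return index[lbn]


entries_per_block = BLOCK_SIZE // ENTRY_SIZE          # 1024
index_blocks = NUM_ENTRIES // entries_per_block       # 16
max_file_size = NUM_ENTRIES * BLOCK_SIZE              # 64 MB

print(flat_lookup(index, 3074), index_blocks, max_file_size // 2**20)
# The entry for LBN 3074 lives in index block 3, slot 2.
print(divmod(3074, entries_per_block))
```

Every lookup is a single array access, which is why both sequential and random access need at most one index block per data block.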
Pros:
Can represent sparse files (set block pointer to 0, which is an invalid disk address).
Sequential and random access are efficient in terms of metadata blocks that need to be fetched.
Cons:
4-KB block, 4-byte block #, 1-GB max file size.
How big is the index?
How many blocks required for smallest non-empty file?
File Representation: Flat Index (practical?)
Cons:
4-KB block, 4-byte block #, 1-GB max file size.
How big is the index? 256 blocks (1 GB / 4 KB = 262144 entries; at 1024 entries per block, that is 256 index blocks).
How many blocks required for smallest non-empty file? 257 blocks (256 index blocks plus 1 data block).
Either we have to allocate really big indices or we impose unreasonable constraints on file size.
Small and large files consume exactly the same amount of index space.
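The 256 and 257 figures follow from simple arithmetic, sketched below. The function name `flat_index_cost` is hypothetical, and the inode itself is not counted, as on the slide.

```python
def flat_index_cost(max_file_size, block_size=4096, entry_size=4):
    """Return (index entries, index blocks) needed by a flat index."""
    entries = -(-max_file_size // block_size)           # ceiling division
    entries_per_block = block_size // entry_size
    index_blocks = -(-entries // entries_per_block)     # ceiling division
    return entries, index_blocks


entries, index_blocks = flat_index_cost(1 << 30)   # 1-GB maximum file size
smallest_file_blocks = index_blocks + 1            # full index plus one data block

print(entries, index_blocks, smallest_file_blocks)
```

A one-block file therefore pays for the same 256 index blocks as a 1-GB file.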
File Representation: Multi-level Index (Tree)
Data Structure:
• The inode stores a disk address.
• It may refer to a data block (for a file with one block). In some versions, the inode always references a separate index root, even for 1 (or perhaps 0) block files.
• If there is more than one data block, it refers to a metadata block packed with the addresses of data blocks: an indirect block.
• When that fills up, switch to storing the address of a block packed with addresses of indirect blocks: a double-indirect block. Etc.
Example (Multi-level index)
File system parameters
4096-byte blocks
4-byte block numbers
inode: stores one disk address (root of the tree)
Indirect blocks are also 4096 bytes.
How many disk addresses can we store in a block? 4096/4 = 1024
Files with more than 1024 blocks require a double-indirect block.
Example of sparse file with 3 blocks
LBN 2 is at PBN 123, indirect block is at 124
LBN 6 is at PBN 456, indirect block is at 124
LBN 3074 is at PBN 789, indirect block is at 345
Draw this file’s metadata; use 0 to indicate that a block is missing.
Example (Multi-level index): the file's metadata
The inode stores the address of the double-indirect block, PBN 129.
Double-indirect block (PBN 129), one entry per range of 1024 LBNs:
LBNs                PBN of indirect block
0-1023              124
1024-2047           0
2048-3071           0
3072-4095           345
4096-5119           0
⁞
1047552-1048575     0
Indirect block (PBN 124), covering LBNs 0-1023:
LBN      PBN
2        123
6        456
all other entries are 0
Indirect block (PBN 345), covering LBNs 3072-4095:
LBN      PBN
3073     0
3074     789
3075     0
all other entries are 0
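A lookup in this two-level tree can be sketched as follows. The disk is modelled as a dictionary from PBN to a 1024-entry list, and the function `tree_lookup` and its `depth` parameter are hypothetical names; a real file system would read each block from the buffer cache instead.

```python
FANOUT = 1024  # 4096-byte blocks / 4-byte block numbers


def empty_block():
    return [0] * FANOUT


# The example file: root (double-indirect) block at 129, indirect blocks at 124 and 345.
disk = {129: empty_block(), 124: empty_block(), 345: empty_block()}
disk[129][0] = 124               # LBNs 0-1023
disk[129][3] = 345               # LBNs 3072-4095
disk[124][2] = 123               # LBN 2
disk[124][6] = 456               # LBN 6
disk[345][3074 % FANOUT] = 789   # LBN 3074 is slot 2 of its indirect block


def tree_lookup(disk, root_pbn, depth, lbn):
    """Walk `depth` index levels; return (PBN or 0, index blocks read)."""
    if not 0 <= lbn < FANOUT ** depth:
        raise ValueError("LBN is beyond what this tree depth can address")
    pbn, reads = root_pbn, 0
    for level in range(depth - 1, -1, -1):
        if pbn == 0:                 # hole: the whole subtree is missing
            return 0, reads
        slot = (lbn // FANOUT ** level) % FANOUT
        pbn = disk[pbn][slot]
        reads += 1
    return pbn, reads


for lbn in (2, 6, 3074, 1500):
    print(lbn, tree_lookup(disk, 129, 2, lbn))
```

Note that a hole high in the tree (LBN 1500 here) is detected after reading only the root block, so sparse regions cost little to represent or to query.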
File Representation: Multi-level Index (Tree)
Pros:
Can represent sparse files.
If block size >> disk address size, an access requires few intermediate blocks.
Can grow easily.
Cons:
Even for 2-block files, we perform two IOs (one to get the root of the index and one to get the data blocks, ignoring the inode!).
The file size determines how deep the index is, so it’s a bit more complicated to navigate the data structure.
[Figure: inode pointing to a tree of index blocks. The lowest level is labelled "indirect blocks"; the level above is annotated "We call these double indirect blocks: they point to indirect blocks".]
File Representation: Hybrid Index
Data Structure:
• The index is a fixed-size array, small enough to fit in the inode.
• Most entries are direct pointers (point to data blocks)
• A few entries are single-, double-, or triple-indirect (or more)
[Figure: an inode whose index holds direct block pointers followed by double-indirect block pointers, each fanning out to further blocks.]
Example (Hybrid index)
Example of sparse file with 3 blocks (the inode holds 4 direct pointers):
• LBN 2 is at PBN 123
• LBN 6 is at PBN 456, indirect at 125
• LBN 3074 is at PBN 789, double indirect at 129, indirect at 345
Inode:
Entry               PBN
direct 0            0
direct 1            0
direct 2            123
direct 3            0
indirect            125
double indirect     129
Indirect block (PBN 125), covering LBNs 4-1027:
LBN      PBN
4        0
5        0
6        456
7        0
8        0
9        0
⁞
1027     0
Computing the index into the indirect block:
lookup = 6 // Initialize lookup to the LBN
lookup -= 4 // Subtract the number of direct pointers; the result, 2, is the index
Double-indirect block (PBN 129):
LBNs           PBN of indirect block
1028-2051      0
2052-3075      345
3076-4099      0
⁞
Indirect block (PBN 345), covering LBNs 2052-3075:
LBN      PBN
⁞
3073     0
3074     789
3075     0
Why is the direct block located at index 1022?
lookup = 2046 (last slide: 3074 - 4 direct pointers - 1024 entries in the single-indirect block)
iIndex = lookup % 1024 = 1022
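The lookup arithmetic from the hybrid example can be sketched as a function. The inode layout (4 direct pointers, one indirect, one double-indirect) is taken from the example; the dictionary-based `inode` and `disk` models and the name `hybrid_lookup` are assumptions made for illustration.

```python
FANOUT = 1024      # block numbers per 4096-byte index block
NUM_DIRECT = 4     # direct pointers in the example inode

inode = {"direct": [0, 0, 123, 0], "indirect": 125, "double": 129}
disk = {125: [0] * FANOUT, 129: [0] * FANOUT, 345: [0] * FANOUT}
disk[125][6 - NUM_DIRECT] = 456   # LBN 6 is slot 2 of the indirect block
disk[129][1] = 345                # second entry covers LBNs 2052-3075
disk[345][1022] = 789             # LBN 3074


def hybrid_lookup(inode, disk, lbn):
    """Return the PBN for an LBN, or 0 if the block is missing."""
    lookup = lbn                        # initialize lookup to the LBN
    if lookup < NUM_DIRECT:
        return inode["direct"][lookup]
    lookup -= NUM_DIRECT                # skip the direct pointers
    if lookup < FANOUT:
        blk = inode["indirect"]
        return disk[blk][lookup] if blk else 0
    lookup -= FANOUT                    # skip the single-indirect range
    if lookup < FANOUT * FANOUT:
        dbl = inode["double"]
        if dbl == 0:
            return 0
        ind = disk[dbl][lookup // FANOUT]   # which indirect block
        if ind == 0:
            return 0
        return disk[ind][lookup % FANOUT]   # slot within that indirect block
    raise ValueError("LBN is beyond the maximum file size")


for lbn in (2, 6, 3074, 5000):
    print(lbn, hybrid_lookup(inode, disk, lbn))
```

For LBN 3074 the function follows the same steps as the slide: 3074 - 4 - 1024 = 2046, then 2046 // 1024 = 1 selects the indirect block and 2046 % 1024 = 1022 selects the slot.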
File Representation: Hybrid Index
Widely used!
Pros:
Small index
Efficient for small files and sparse files
Files can grow large
Good for random/sequential access if block size >> PBN size
Cons:
…
Impractical
Exercise
• Compute lots of different things about the different representations
to:
• Develop a good understanding of how each structure works
• Get a sense of how they compare to each other.
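As one possible starting point for the exercise, the sketch below compares the maximum file size of each representation using the parameters from the examples (4096-byte blocks, 4-byte block numbers). The function names and the default pointer counts for the hybrid inode are assumptions for illustration, not part of the course material.

```python
BLOCK_SIZE = 4096
FANOUT = BLOCK_SIZE // 4   # block numbers per index block


def max_blocks_flat(entries=16384):
    """Flat index: one data block per index entry."""
    return entries


def max_blocks_tree(depth):
    """Multi-level index with `depth` levels of index blocks."""
    return FANOUT ** depth


def max_blocks_hybrid(direct=4, single=1, double=1, triple=0):
    """Hybrid index: counts of each pointer kind stored in the inode."""
    return direct + single * FANOUT + double * FANOUT ** 2 + triple * FANOUT ** 3


def mib(blocks):
    """Convert a block count to a size in MiB."""
    return blocks * BLOCK_SIZE / 2 ** 20


print("flat  :", mib(max_blocks_flat()), "MiB")
print("tree  :", mib(max_blocks_tree(2)), "MiB with a double-indirect root")
print("hybrid:", mib(max_blocks_hybrid()), "MiB")
```

Varying `depth` or the hybrid pointer counts shows how quickly the reachable file size grows with each extra level of indirection.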
Wrapping Up
File index structures must be persistent
Index structures are (mostly) laid out in block-sized units (the exception is whatever lives in the inode)
Good design is:
Space efficient for small files
Performant for large files
Flexible: allows for both sequential and random access
In our case studies, we’ll see examples and variants of the
designs we discussed today.