
4 4-fsFileIndex


CPSC313: Computer Hardware and Operating Systems


Unit 4: File Systems
Representing files on disk

CPSC 313 1
Admin

Quiz 4:
 Starts in a bit under two weeks.
 Don't forget to make your reservation.

Labs:
 Lab 8 is due Sunday November 17th.
 Lab 9 will be released tomorrow.

Next week:
 No lectures or office hours from Monday to Wednesday.
 No tutorials.
Where we are

Unit Map:
 ...
 P20: File Systems implementation overview
 4.3. How we represent files
 P21: Why fixed-size block file systems?
 4.4. Building a file index
 P22: Getting File System metadata
 4.5. Naming
 ...
Today

Learning Outcomes
• Define: buffer cache
 Compare/contrast different implementations of file indexing

Topics:
 Review how data moves between persistent storage and main memory
 Review the layers of abstraction in the file system
 Index structures for fixed-size block allocation

Recall: From persistent media to memory
1. "Large" unit of transfer (called an IO, for Input/Output): >= sector (4KB)
2. We will use main memory as a cache for persistent data.

[Figure: execution cores with L1 caches and an L2 cache (fast); memory holding the buffer cache (reasonably fast); persistent disk (really slow); motherboard or system board.]

The buffer cache:
• An in-memory cache of persistent data pages
• Managed by the file system (part of the operating system)
• A software cache (i.e., managed by SW not HW)
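
A minimal sketch of a software-managed buffer cache, to make the three bullets above concrete. The LRU eviction policy is an assumption (the slides do not name a replacement policy), and `fake_disk_read` is a hypothetical stand-in for a real disk IO.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes per IO, matching the ">= sector (4KB)" unit of transfer


class BufferCache:
    """Toy in-memory cache of disk blocks, keyed by block number."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block   # hypothetical function: block number -> bytes
        self.capacity = capacity       # maximum number of cached blocks
        self.blocks = OrderedDict()    # block number -> data, least recently used first
        self.misses = 0

    def get(self, block_number):
        if block_number in self.blocks:
            self.blocks.move_to_end(block_number)  # mark as most recently used
            return self.blocks[block_number]
        self.misses += 1                           # slow path: one IO to the disk
        data = self.read_block(block_number)
        self.blocks[block_number] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)        # evict the least recently used block
        return data


def fake_disk_read(block_number):
    """Stand-in for a real disk read: a block filled with one byte value."""
    return bytes([block_number % 256]) * BLOCK_SIZE


cache = BufferCache(fake_disk_read, capacity=2)
for n in (7, 7, 8, 7, 9, 8):
    cache.get(n)
print(cache.misses)  # 4 of the 6 accesses had to go to the "disk"
```

Everything here runs in software: the file system, not the hardware, decides what stays in memory.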
Layers of abstraction, top to bottom:

POSIX API: hierarchical name space, byte-streams, open, close, read, write
• Map: name to file metadata
• Allocate and manage file descriptors
We are still here
• Find location of file metadata
• In-memory file object: VNODE
• Map: file offsets to disk blocks
Persistent storage: numbered disk blocks, checksums and ECC, bad block handling

Our Current Part of the Layers-of-Abstraction:
File system API (open/close/read/write/seek) -- application programmers see this
A stream of bytes, addressed by byte offset o:
byte 0 ... byte size - 1

The file system's view: LBN = floor(o / file_system_block_size)
A sequence of logical blocks; the size of a block is whatever the file system wants.
LBN 0 ... LBN max (logical block numbers)
LBN max = ceil(filesize / file_system_block_size) - 1

The disk's view: PBN = something ... based on the file's block map or index
A sequence of (maybe not consecutive) disk sectors.
PBN 0 ... PBN max (physical block numbers or disk addresses)
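
The two LBN formulas on this slide can be checked with a short sketch. The function names are illustrative, not part of any file system API, and the 10000-byte file is a made-up example.

```python
BLOCK_SIZE = 4096  # file system block size in bytes (example value)


def offset_to_lbn(offset, block_size=BLOCK_SIZE):
    """LBN = floor(o / file_system_block_size)."""
    return offset // block_size


def max_lbn(file_size, block_size=BLOCK_SIZE):
    """LBN max = ceil(filesize / file_system_block_size) - 1, using integer arithmetic."""
    return (file_size + block_size - 1) // block_size - 1


# A 10000-byte file occupies logical blocks 0, 1 and 2.
print(offset_to_lbn(4095), offset_to_lbn(4096), max_lbn(10000))
```

The position inside the block is the remainder, `offset % block_size`.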
Recall: Constraints on Indexes

The index itself must be in persistent storage, or we couldn’t find
the blocks of our files on reboot.

Thus, indexes "live" in disk blocks, too. They take space!

As usual, we want to efficiently support:
 Sequential access
 Random access
 Sparse files
 Both small and large files

Recall: Strict Single-Extent-based (Impractical)
Data Structure:
The inode has the disk address of the first block in the extent and the length of the extent (nblocks).

Pros:
• Very fast sequential access
• Minimal meta-data

Cons:
• Cannot (efficiently) support sparse files
• Leads to too much external fragmentation
• Does not match POSIX API

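
A minimal sketch of the single-extent mapping, assuming the inode holds just a start address and a length. The function name and the start address 500 are made up for illustration.

```python
def extent_lookup(start_pbn, nblocks, lbn):
    """Map a logical block number to a physical one inside a single extent."""
    if not 0 <= lbn < nblocks:
        raise IndexError("LBN is outside the extent")
    return start_pbn + lbn  # blocks are contiguous, so one addition is enough


# Hypothetical file: 10 contiguous blocks starting at disk address 500.
print(extent_lookup(500, 10, 3))
```

The mapping needs no index blocks at all, which is why the metadata is minimal. It also shows the sparse-file problem: every LBN below nblocks must have a real block behind it.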
File Representation: Flat Index (practical?)
Data Structure:
[Figure: an inode and its index.]

The index is a fixed sized array (else we have the memory management problem we had with extents). Thus:
• The index consumes some number of disk blocks
• Growing the index is not possible
• There is some maximum file size

That's true for most indexes; however it's worse here.

This inode has a reference to the (first block of) the index.
Or the design could have the index "live" inside the inode.
Example (flat index)
File system parameters
• 4096-byte blocks
• 4-byte block numbers
• inode: 16384 index entries
This means:
• 4096-byte blocks @ 4 bytes/entry = 1024 entries/block
• 16384 entries @ 1024 entries/block = 16 index blocks
• maximum file size is 4096*16384 = 64MB
Example of sparse file with 3 blocks
• LBN 2 is at PBN 123
• LBN 6 is at PBN 456
• LBN 3074 is at PBN 789
• Let's draw this file's metadata. (Use 0 to indicate a missing block.)

Example (Flat index)
Inode index:
  LBN     PBN
  0       0
  1       0
  2       123   -> data block 123
  3       0
  4       0
  5       0
  6       456   -> data block 456
  7       0
  ⁞
  3073    0
  3074    789   -> data block 789
  3075    0
  ⁞
  16383   0

Note: We draw these as pointers, but they are not pointers-in-memory; they are disk addresses!
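
The flat-index example can be sketched as a plain array lookup. This is an illustration only: the index is held in a Python list rather than in disk blocks, and `flat_lookup` is a made-up name.

```python
N_ENTRIES = 16384  # index entries in the inode, from the example parameters

# Build the sparse file from the example; 0 marks a missing block.
index = [0] * N_ENTRIES
index[2] = 123
index[6] = 456
index[3074] = 789


def flat_lookup(index, lbn):
    """Return the PBN for an LBN, or None if the block is missing (a hole)."""
    if not 0 <= lbn < len(index):
        raise ValueError("LBN is beyond the maximum file size")
    pbn = index[lbn]
    return pbn if pbn != 0 else None


print(flat_lookup(index, 6), flat_lookup(index, 7))
```

Every lookup is a single array access, whatever the LBN, which is why sequential and random access are both cheap in metadata fetches.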
File Representation: Flat Index (practical?)
Data Structure:
The index is a fixed sized array (else we have the memory management problem we had with extents). Thus:
• The index consumes some number of disk blocks
• Growing the index is not possible
• There is some maximum file size

Pros:
• Can represent sparse files (set block pointer to 0, which is an invalid disk address).
• Sequential and random access are efficient in terms of metadata blocks that need to be fetched.

Cons:
• Either we have to allocate really big indices or we impose unreasonable constraints on file size.
• Small and large files consume exactly the same amount of index space.

4-KB block, 4-byte block #, 1-GB max file size.
• How big is the index? 256 blocks
• How many blocks required for smallest non-empty file? 257 blocks

Conclusion: impractical.
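
The 256-block and 257-block answers follow from a short calculation, sketched here. `flat_index_blocks` is an illustrative helper name.

```python
def flat_index_blocks(max_file_size, block_size, addr_size):
    """Number of disk blocks a flat index needs to cover max_file_size bytes."""
    entries = -(-max_file_size // block_size)      # one entry per data block (ceiling)
    entries_per_block = block_size // addr_size    # disk addresses that fit in one block
    return -(-entries // entries_per_block)        # ceiling division


index_blocks = flat_index_blocks(2**30, 4096, 4)   # 1-GB max file size
print(index_blocks)        # 256: size of the index
print(index_blocks + 1)    # 257: index plus one data block for the smallest non-empty file
```

With the earlier 64MB parameters, the same function gives 16 index blocks.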
File Representation: Multi-level Index (Tree)
Data Structure:
• The inode stores a disk address.
• It may refer to a data block (for a file with one block).
• If there is more than one data block, it refers to a metadata block packed with the disk addresses of data blocks: an indirect block.
• When that fills up, switch to storing the address of a block packed with addresses of indirect blocks: a double-indirect block. Etc.

In some versions, the inode always references a separate index root, even for 1 (or perhaps 0) block files.

We call these indirect blocks: blocks that point to data blocks.
We call these double-indirect blocks: they point to indirect blocks.
Example (Multi-level index)
File system parameters
• 4096-byte blocks
• 4-byte block numbers
• inode: stores one disk address (root of the tree)
Indirect blocks are also 4096 bytes.
How many disk addresses can we store in a block? 4096/4 = 1024
Files with more than 1024 blocks require a double-indirect block.
Example of sparse file with 3 blocks
• LBN 2 is at PBN 123, indirect block is at 124
• LBN 6 is at PBN 456, indirect block is at 124
• LBN 3074 is at PBN 789, indirect block is at 345
• Draw this file's metadata (use 0 to indicate that a block is missing).
Example (Tree index)
• LBN 2 is at PBN 123, indirect block is at 124
• LBN 6 is at PBN 456, indirect block is at 124
• LBN 3074 is at PBN 789, indirect block is at 345

Inode: 129 (disk address of the double-indirect block)

Double-indirect block (at PBN 129):
  LBN range          Indirect block
  0-1023             124
  1024-2047          0
  2048-3071          0
  3072-4095          345
  4096-5119          0
  ⁞
  1047552-1048575    0

Indirect block (at PBN 124):
  LBN     PBN
  0       0
  1       0
  2       123   -> data block 123
  3       0
  4       0
  5       0
  6       456   -> data block 456
  7       0
  ⁞
  1023    0

Indirect block (at PBN 345):
  LBN     PBN
  3072    0
  3073    0
  3074    789   -> data block 789
  3075    0
  3076    0
  3077    0
  3078    0
  3079    0
  ⁞
  4095    0
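
A sketch of the two-level walk for this example. The "disk" is modelled as a Python dict from PBN to block contents, which is an assumption for illustration; `sparse_block` and `tree_lookup` are made-up names.

```python
ADDRS_PER_BLOCK = 4096 // 4  # 1024 disk addresses per 4096-byte block


def sparse_block(entries):
    """Build a block of 1024 addresses, all 0 (missing) except the given slots."""
    block = [0] * ADDRS_PER_BLOCK
    for slot, addr in entries.items():
        block[slot] = addr
    return block


# Metadata blocks of the example file, keyed by their disk address.
disk = {
    129: sparse_block({0: 124, 3: 345}),    # double-indirect block
    124: sparse_block({2: 123, 6: 456}),    # indirect block for LBNs 0-1023
    345: sparse_block({3074 - 3072: 789}),  # indirect block for LBNs 3072-4095
}
inode_root = 129


def tree_lookup(disk, root, lbn):
    """Walk double-indirect -> indirect -> data; return None for a missing block."""
    d_index, i_index = divmod(lbn, ADDRS_PER_BLOCK)
    if not 0 <= d_index < ADDRS_PER_BLOCK:
        raise ValueError("LBN is beyond what a two-level tree can address")
    indirect_pbn = disk[root][d_index]
    if indirect_pbn == 0:
        return None                         # the whole 1024-block range is a hole
    data_pbn = disk[indirect_pbn][i_index]
    return data_pbn if data_pbn != 0 else None


print(tree_lookup(disk, inode_root, 3074))
```

For LBN 3074, `divmod(3074, 1024)` gives slot 3 in the double-indirect block and slot 2 in the indirect block at 345.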
File Representation: Multi-level Index (Tree)

Pros:
• Can represent sparse files.
• If block size >> disk address size, an access requires few intermediate blocks.
• Can grow easily.

Cons:
• Even for 2-block files, we perform two IOs (one to get the root of the index and one to get the data blocks, ignoring the inode!).
• The file size determines how deep the index is, so it's a bit more complicated to navigate the data structure.
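
How deep the index gets can be sketched as below. This assumes the variant in which a one-block file's inode points straight at the data block, and 1024 addresses per block as in the example; `index_levels` is an illustrative name.

```python
def index_levels(nblocks, addrs_per_block=1024):
    """Number of index levels between the inode and the data blocks."""
    levels = 0
    reach = 1                      # data blocks addressable with `levels` levels
    while reach < nblocks:
        reach *= addrs_per_block   # each extra level multiplies the reach
        levels += 1
    return levels


for n in (1, 2, 1024, 1025, 1024**2 + 1):
    print(n, index_levels(n))
```

A 2-block file already needs one index level, which matches the two IOs in the first con: one for the index root and one for the data.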
File Representation: Hybrid Index
Widely used!
Data Structure:
• The index is a fixed-size array, small enough to fit in the inode.
• Most entries are direct pointers (point to data blocks)
• A few entries are single-, double-, or triple-indirect (or more)

[Figure: an inode holding direct block pointers, an indirect block pointer, and a double indirect block pointer.]

Example (Hybrid Index)
File system parameters
• 4096-byte blocks
• 4-byte block numbers
• inode: 4 direct, 1 indirect, and 1 double-indirect blocks
Example of sparse file with 3 blocks
• LBN 2 is at PBN 123
• LBN 6 is at PBN 456 and indirect block is at 125
• LBN 3074 is at PBN 789, double indirect block is at 129, indirect at 345
• Draw this file's metadata (use 0 to indicate that a block is missing).

Example (Hybrid index)
• LBN 2 is at PBN 123
• LBN 6 is at PBN 456, indirect at 125
• LBN 3074 is at PBN 789, double indirect at 129, indirect at 345

Inode:
  Entry              LBNs covered   PBN
  direct 0           0              0
  direct 1           1              0
  direct 2           2              123   -> data block 123
  direct 3           3              0
  indirect           4-1027         125
  double indirect    1028 and up    129

Indirect block (at PBN 125):
  LBN     PBN
  4       0
  5       0
  6       456   -> data block 456
  7       0
  8       0
  9       0
  ⁞
  1027    0

Computing the index into the indirect block:
• lookup = 6 // Initialize lookup to the LBN
• lookup -= 4 // Subtract the number of direct pointers
This is the index into our indirect block (2).

Double-indirect block (at PBN 129):
  LBN range          Indirect block
  1028-2051          0
  2052-3075          345
  3076-4099          0
  4100-5123          0
  5124-6147          0
  6148-7171          0
  ⁞
  1048580-1049603    0

Why is the indirect at index 1?
lookup = LBN - nDirect (4) - nIndirect (1024) = 3074 - 1028 = 2046
dIndex = lookup / daddrs_per_block = 2046 / 1024 = 1 (integer division)

Indirect block (at PBN 345):
  LBN     PBN
  2052    0
  2053    0
  2054    0
  2055    0
  ⁞
  3073    0
  3074    789   -> data block 789
  3075    0

Why is the data block located at index 1022?
lookup = 2046 (from above)
iIndex = lookup % 1024 = 1022
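
The whole hybrid walk can be sketched in one function that uses the same lookup, dIndex and iIndex arithmetic. The dict-based "disk", the inode layout as a Python dict, and the helper names are assumptions made for illustration.

```python
ADDRS_PER_BLOCK = 1024  # 4096-byte blocks / 4-byte block numbers
N_DIRECT = 4            # direct pointers in the inode


def sparse_block(entries):
    """Build a block of 1024 addresses, all 0 (missing) except the given slots."""
    block = [0] * ADDRS_PER_BLOCK
    for slot, addr in entries.items():
        block[slot] = addr
    return block


disk = {
    125: sparse_block({2: 456}),     # indirect block, covers LBNs 4-1027
    129: sparse_block({1: 345}),     # double-indirect block
    345: sparse_block({1022: 789}),  # indirect block, covers LBNs 2052-3075
}
inode = {"direct": [0, 0, 123, 0], "indirect": 125, "double_indirect": 129}


def hybrid_lookup(disk, inode, lbn):
    """Return the PBN for an LBN, or None if the block is missing."""
    lookup = lbn
    if lookup < N_DIRECT:
        return inode["direct"][lookup] or None
    lookup -= N_DIRECT                       # skip the direct pointers
    if lookup < ADDRS_PER_BLOCK:
        if inode["indirect"] == 0:
            return None
        return disk[inode["indirect"]][lookup] or None
    lookup -= ADDRS_PER_BLOCK                # skip the single-indirect range
    d_index, i_index = divmod(lookup, ADDRS_PER_BLOCK)
    if d_index >= ADDRS_PER_BLOCK:
        raise ValueError("LBN is beyond the maximum file size")
    if inode["double_indirect"] == 0:
        return None
    indirect_pbn = disk[inode["double_indirect"]][d_index]
    if indirect_pbn == 0:
        return None
    return disk[indirect_pbn][i_index] or None


print(hybrid_lookup(disk, inode, 2), hybrid_lookup(disk, inode, 6),
      hybrid_lookup(disk, inode, 3074))
```

LBN 2 is resolved from the inode alone, LBN 6 costs one extra metadata block, and LBN 3074 costs two.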
File Representation: Hybrid Index
Widely used!
Data Structure:
• The index is a fixed-size array, small enough to fit in the inode.
• Most entries are direct pointers (point to data blocks)
• A few entries are single-, double-, or triple-indirect (or more)

Pros:
• Small index
• Efficient for small files and sparse files
• Files can grow large
• Good for random/sequential if block size >> PBN size

Cons:
• Slightly more complicated to map from LBN to PBN
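
To put numbers on "small index" and "files can grow large", here is a rough calculation using the parameters from the hybrid example (4 direct, 1 indirect, 1 double-indirect pointer, 4096-byte blocks, 4-byte block numbers). `hybrid_max_blocks` is an illustrative name.

```python
def hybrid_max_blocks(n_direct, n_indirect, n_double, addrs_per_block):
    """Largest number of data blocks a hybrid index can address."""
    return (n_direct
            + n_indirect * addrs_per_block
            + n_double * addrs_per_block ** 2)


BLOCK_SIZE = 4096
max_blocks = hybrid_max_blocks(4, 1, 1, BLOCK_SIZE // 4)
print(max_blocks)               # 1049604 data blocks
print(max_blocks * BLOCK_SIZE)  # 4299177984 bytes, a little over 4 GB
```

The inode itself holds only 6 addresses, and a file of up to 4 blocks needs no index blocks at all.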
File Structure Summary
[Figure: the four representations side by side (extent-based, flat index, multi-level index, hybrid index), with an "Impractical" label.]
Exercise
• Compute lots of different things about the different representations
to:
• Develop a good understanding of how each structure works
• Get a sense of how they compare to each other.

Wrapping Up

File index structures must be persistent
 Index structure (mostly) in block-sized units (exception is what is in the
inode)

Good design is:
 Space efficient for small files
 Performant for large files
 Flexible: allows for both sequential and random access

In our case studies, we’ll see examples and variants of the
designs we discussed today.