FILE SYSTEMS, PART 2

CS124 – Operating Systems


Winter 2013-2014, Lecture 24

Last Time: Linked Allocation


• Last time, discussed linked allocation
• Blocks of the file are chained together into a linked list

[Figure: directory entry for file A, giving its start block (3) and end block (11); the file's blocks are chained together by pointers stored in the blocks themselves.]

• Great for sequential access, but terrible for direct access


• A refinement of this approach is to maintain a separate file
allocation table
• FAT is small enough to keep in memory; speeds up direct access
[Figure: the same directory entry for file A (start block 3, end block 11), now with a separate file-allocation table: FAT[3]=4, FAT[4]=10, FAT[10]=13, FAT[13]=14, FAT[14]=15, FAT[15]=11, FAT[11]=-1 (end of file).]
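As a minimal illustration (in C, with an invented in-memory copy of the table), following a FAT chain like the one above to find the disk block that holds a given logical block of a file costs only memory lookups:

```c
#include <stdint.h>

#define FAT_EOF (-1)            /* end-of-chain marker, as in the figure */

/* Hypothetical in-memory copy of the file-allocation table:
 * fat[b] holds the number of the block that follows block b. */
extern int32_t fat[];

/* Follow the chain starting at 'start_block' (from the directory entry)
 * to find the disk block holding logical block 'n' of the file.
 * Returns the block number, or FAT_EOF if the file is too short. */
int32_t fat_lookup(int32_t start_block, uint32_t n) {
    int32_t block = start_block;
    while (n-- > 0 && block != FAT_EOF)
        block = fat[block];     /* one table lookup per hop, no disk I/O */
    return block;
}
```

Direct access still walks the chain, but each hop is a lookup in the cached table rather than a disk read.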

File Layout: Indexed Allocation


• Indexed allocation achieves the benefits of linked
allocation while also being very fast for direct access
• Files include indexing information to allow for fast access
• Each file effectively has its own file-allocation table optimized for
both sequential and direct access
• This information is usually stored separately from the file’s contents,
so that programs can assume that data blocks are used entirely for file data
[Figure: the directory entry for file A points to its index block (block 2); the index block lists the file's data blocks in order (3, 4, 10, 13, 14, 15, 11), followed by end-of-file markers (-1).]

File Layout: Indexed Allocation (2)


• Both direct and sequential access are very fast
• Very easy to translate a logical file position into the
corresponding disk block
• Position in index = logical position / block size
• Use value in index to load the corresponding block into memory

[Figure: the same indexed-allocation example as the previous slide.]
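A minimal sketch of this translation, assuming the file's index block has already been loaded into memory as an array of block numbers and using the 512-byte blocks from this lecture's examples:

```c
#include <stdint.h>

#define BLOCK_SIZE 512          /* example block size from these slides */

/* Translate a byte offset in the file into (disk block, offset in block),
 * given the file's index block already loaded into memory. */
void index_lookup(const int32_t *index_block, uint64_t file_pos,
                  int32_t *disk_block, uint32_t *block_offset) {
    uint64_t logical_block = file_pos / BLOCK_SIZE;  /* position in index */
    *disk_block   = index_block[logical_block];      /* one array access  */
    *block_offset = (uint32_t)(file_pos % BLOCK_SIZE);
}
```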

File Layout: Indexed Allocation (3)


• Index block can also store file metadata
• Recall: many filesystems support hard linking of a file from multiple paths
• If metadata is stored in the directory instead of with the file, metadata must be duplicated, could get out of sync, etc.
• Indexed allocation can avoid this issue!
[Figure: a directory tree rooted at home with subdirectories user1 and user2; user1 contains files A, B, C, D, and user2 contains hard links to A and C.]
[Figure: the same indexed-allocation example as before.]

File Layout: Indexed Allocation (4)


• Obvious overhead from indexed allocation is the index
• Tends to be greater overhead than e.g. linked allocation
• Index space tends to be allocated in units of storage blocks
• Difficult to balance concerns for small and large files
• Don’t want small files to waste space with a mostly-empty index…
• Don’t want large files to incur a lot of work from navigating many
small index blocks…
[Figure: the same indexed-allocation example as before.]

Indexing Approaches
• Option 1: a linked sequence of index blocks
• Each index block has an array of file-block pointers
• Last pointer in index block is either “end of index” value,
or a pointer to the next index block
• Good for smaller files
• Example: storage blocks of 512B; 32-bit index entries
• 512 bytes / 4 bytes = maximum of 128 entries
• Index block might store only 100 or so entries (the remaining space
is used for storing file metadata)
• 100 entries per index block × 512 byte blocks = ~50KB file size for
a single index block
• Usually want to use virtual page size as block size instead
• Max of 1024 entries per 4KiB page
• If index entries refer to 4KiB blocks, a single index block can be
used for up to 4MB files before requiring a second index block
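Using the numbers above, a 512-byte index block for Option 1 might be laid out roughly like this (the field names and the split between metadata and entries are invented for illustration):

```c
#include <stdint.h>

#define ENTRIES_PER_INDEX 100   /* ~50KB of file data per 512B index block */

/* A 512-byte index block: some space reserved for file metadata, an
 * array of data-block pointers, and a link to the next index block. */
struct index_block {
    uint8_t  metadata[108];               /* file metadata (first block only) */
    int32_t  blocks[ENTRIES_PER_INDEX];   /* pointers to the file's data blocks */
    int32_t  next_index;                  /* next index block, or -1 at the end */
};
```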

Indexing Approaches (2)


• Option 2: a multilevel index structure
• An index page can reference other index pages, or it can
reference data blocks in the file itself (but not both)
• Depth of indexing structure can be adjusted based on the
file’s size
• Using same numbers as before, a single-level index can
index up to ~4MB file sizes
• Above that size, a two-level index can be used:
• Leaf pages in index will each index up to ~4MB regions of the file
• Each entry in the root of the index corresponds to ~4MB of the file
• A two-level index can be used for up to a ~4GB file
• A three-level index can be used for up to a ~4TB file
• etc.
• Index can be navigated very efficiently for direct access
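A sketch of how such a fixed-depth index could be walked for direct access, assuming 4KiB index pages with 1024 entries each (as above) and an invented read_block() helper that loads one index page from disk:

```c
#include <stdint.h>

#define ENTRIES_PER_PAGE 1024          /* 4KiB page / 4-byte entries */

/* Hypothetical helper: read index page 'block_no' into 'buf'. */
void read_block(int32_t block_no, int32_t buf[ENTRIES_PER_PAGE]);

/* Walk a 'depth'-level index rooted at 'root_block' to find the disk
 * block holding logical block 'n' of the file.  With depth 2 this
 * covers ~4GB files, with depth 3 ~4TB, as described above. */
int32_t multilevel_lookup(int32_t root_block, uint64_t n, int depth) {
    int32_t page[ENTRIES_PER_PAGE];
    int32_t block = root_block;

    for (int level = depth - 1; level >= 0; level--) {
        read_block(block, page);

        /* How many data blocks each entry at this level covers. */
        uint64_t span = 1;
        for (int i = 0; i < level; i++)
            span *= ENTRIES_PER_PAGE;

        block = page[n / span];    /* descend one level */
        n %= span;
    }
    return block;                  /* the data block itself */
}
```

Each level costs one page read, so direct access to any byte of a multi-GB file needs only two or three index-page reads.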

Indexing Approaches (3)


• Option 3: a hybrid approach that blends other approaches
• Example: UNIX Ext2 file system
• Root index node (i-node) holds file metadata
• Root index also holds pointers to the first 12 disk blocks
• Called direct blocks
• Small files (e.g. up to ~50KB) only require this single index block (the i-node itself)
• If this is insufficient, one of the index pointers is used for single indirect blocks
• One additional index block is introduced to the structure, like the linked organization
• Extends file size up to e.g. multiple-MB files
[Figure: an i-node holding file metadata, 12 direct block pointers, and an indirect pointer to a second index block.]

Indexing Approaches (4)


• For even larger files, the next index pointer is used for
double indirect blocks
• These blocks are accessed via a two-level index hierarchy
• Allows for very large files, up into multiple GB in size
• If this is insufficient, the last root-index pointer is used for
triple indirect blocks
• These blocks use a three-level index hierarchy
• Allows file sizes up into the TB range
• This approach imposes a size limit on files…
• More recent extensions to this filesystem format allow for larger files (e.g. extents)
[Figure: an i-node with file metadata, direct block pointers, and single/double/triple indirect pointers.]
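Putting the whole Ext2-style scheme together, here is a simplified sketch of translating a logical block number through the direct, single-, double-, and triple-indirect pointers; read_index() is an invented helper, and real Ext2 code additionally handles sparse files, caching, and errors:

```c
#include <stdint.h>

#define PTRS_PER_BLOCK 1024u      /* 4KiB block / 4-byte block pointers */
#define N_DIRECT       12u        /* direct pointers in the i-node      */

struct inode_sketch {
    uint32_t direct[N_DIRECT];    /* blocks 0..11                        */
    uint32_t single_indirect;     /* one extra level of indexing         */
    uint32_t double_indirect;     /* two levels                          */
    uint32_t triple_indirect;     /* three levels                        */
};

/* Hypothetical helper: return entry 'i' of the index block at 'blk'. */
uint32_t read_index(uint32_t blk, uint64_t i);

/* Translate logical block 'n' of the file into a disk block number. */
uint32_t ext2_style_bmap(const struct inode_sketch *ino, uint64_t n) {
    if (n < N_DIRECT)
        return ino->direct[n];                       /* no extra I/O */
    n -= N_DIRECT;

    if (n < PTRS_PER_BLOCK)                          /* single indirect */
        return read_index(ino->single_indirect, n);
    n -= PTRS_PER_BLOCK;

    if (n < (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK) {   /* double */
        uint32_t mid = read_index(ino->double_indirect, n / PTRS_PER_BLOCK);
        return read_index(mid, n % PTRS_PER_BLOCK);
    }
    n -= (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK;

    /* triple indirect: three index lookups */
    uint32_t hi  = read_index(ino->triple_indirect,
                              n / ((uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK));
    uint32_t mid = read_index(hi, (n / PTRS_PER_BLOCK) % PTRS_PER_BLOCK);
    return read_index(mid, n % PTRS_PER_BLOCK);
}
```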


Files and Processes


• The OS maintains a buffer of storage blocks in memory
• Storage devices are often much slower than the CPU; use caching
to improve performance of reads and writes
• Multiple processes can open a file at the same time…
[Figure: Process A and Process B each have their own files[] descriptor table; each entry points to a kernel open-file object (flags, offset, v_ptr), which points to a shared File Control Block in global kernel data (file_ops, filename, path, size, flags, i_node), backed by the storage cache.]
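A rough sketch, in C, of the relationships the figure shows; all the struct and field names here are invented for illustration and do not come from any particular kernel:

```c
struct file_ops;   /* table of read/write/seek function pointers */
struct inode;      /* on-disk index node */

/* System-wide, per-file state: one per file that is open anywhere. */
struct file_control_block {
    const struct file_ops *ops;     /* operations for this file type    */
    char   path[256];               /* filename / path                  */
    long   size;                    /* current file size                */
    struct inode *i_node;           /* on-disk index node               */
};

/* Per-open state: shared by all descriptors from one open() call. */
struct open_file {
    int    flags;                   /* O_RDONLY, O_APPEND, ...          */
    long   offset;                  /* current read/write position      */
    struct file_control_block *v_ptr;
};

/* Per-process descriptor table: files[fd] points into kernel data. */
struct process {
    struct open_file *files[16];
};
```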


Files and Processes (2)


• Very common to have different processes perform reads
and writes on the same open file
• OSes tend to vary in how they handle this circumstance,
but standard APIs can manage these interactions
[Figure: the same per-process file tables and global kernel data structures as the previous slide.]

Files and Processes (3)


• Multiple reads on the same file generally never block each
other, even for overlapping reads
• Generally, a read that occurs after a write should reflect the completion of that write operation

• Writes should sometimes block each other, but OSes vary widely in how they handle this
• e.g. Linux prevents multiple concurrent writes to the same file
• The most important situation to get correct is appending to a file
• Two operations must be performed: the file is extended, then the write is performed into the new space
• If this pair of operations isn’t atomic, the result will likely be a completely broken file
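On POSIX systems, the usual way applications get a safe append is to open the file with O_APPEND, which asks the kernel to treat "position at end of file" plus "write" as a single atomic step; a minimal sketch (the filename is just an example):

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    /* O_APPEND: every write() atomically repositions to end-of-file first,
     * so two processes appending concurrently can't overwrite each other. */
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;

    const char *msg = "one complete record\n";
    ssize_t n = write(fd, msg, strlen(msg));  /* extend + write as one step */
    (void)n;

    close(fd);
    return 0;
}
```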

Files and Processes (4)


• OSes have several ways to govern concurrent file access
• Often, entire files can be locked in shared or exclusive mode
• e.g. the Windows CreateFile() API call allows a file to be locked in one
of several sharing modes when it’s opened or created
• Other processes that attempt to perform conflicting operations are
prevented from doing so by the operating system
• Some OSes provide advisory file-locking operations
• Advisory locks aren’t enforced on actual file-IO operations
• They are only enforced when processes participate in acquiring and
releasing these locks
• Example: UNIX flock() acquires and releases advisory
locks on an entire file
• Processes calling flock() can be blocked if a conflicting lock is held
• If a process decides to just directly access the flock()’d file, the OS
won’t stop it!
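As a concrete illustration of the advisory behavior just described, here is a minimal flock() sketch (the filename is just an example); a second process running the same code would block at LOCK_EX, but a process that never calls flock() can still read and write the file freely:

```c
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    flock(fd, LOCK_EX);     /* blocks until an exclusive advisory lock is held */
    /* ... read and modify the file; cooperating processes are excluded ... */
    flock(fd, LOCK_UN);     /* release the advisory lock */

    close(fd);
    return 0;
}
```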

Files and Processes (5)


• Example: UNIX lockf() function can acquire and release
advisory locks on a region of a file
• i.e. lock a section of the file in a shared or exclusive mode
• Windows has a similar capability
• lockf() is typically implemented as a wrapper around fcntl() (and on some systems flock() is as well)
• fcntl() can perform many different operations on files:
• Duplicate a file descriptor
• Get and set control flags on open files
• Enable or disable various kinds of I/O signals for open files
• Acquire or release locks on files or ranges of files
• etc.
• Some OSes also provide mandatory file-locking support
• Processes are forced to abide by the current set of file locks
• e.g. Linux has mandatory file-locking support, but this is non-standard
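Tying the last two slides together, here is a minimal sketch of an advisory byte-range lock acquired through fcntl(), the mechanism lockf() wraps; the filename is made up, and the lock only constrains processes that also use these locking calls:

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.dat", O_RDWR);
    if (fd < 0)
        return 1;

    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive (write) lock            */
        .l_whence = SEEK_SET,
        .l_start  = 0,          /* region: first 100 bytes of file   */
        .l_len    = 100,
    };

    fcntl(fd, F_SETLKW, &fl);   /* F_SETLKW blocks while a conflicting lock is held */
    /* ... update the locked region ... */

    fl.l_type = F_UNLCK;        /* release the region */
    fcntl(fd, F_SETLK, &fl);

    close(fd);
    return 0;
}
```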

File Deletion
• File deletion is a generally straightforward operation
• Specific implementation details depend heavily on the file system format
• General procedure:
• Remove the directory entry referencing the file
• If the file system contains no other hard links to the file, record that all of the file’s blocks are now available for other files to use
[Figure: the hard-link example directory tree from earlier (home, user1, user2, files A–D).]
• The file system must record what blocks are available for use when files are created or extended
• Often called a free-space list, although there are many different ways to record this information
• Some file systems already have a way of doing this, e.g. FAT formats simply mark clusters as unused in the table
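A toy sketch of this procedure, using an invented in-memory structure with a per-file link count and a free-block bitmap like the one on the next slide; a real filesystem does this against on-disk structures with appropriate locking:

```c
#include <stdint.h>

struct toy_file {
    int      link_count;        /* number of directory entries (hard links) */
    uint32_t blocks[16];        /* block numbers owned by the file          */
    int      n_blocks;
};

extern uint8_t free_bitmap[];   /* 1 bit per block; 1 = free */

static void mark_block_free(uint32_t b) {
    free_bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Remove one directory entry for 'f'; free its blocks only when the
 * last hard link is gone.  Note: the block contents are never touched. */
void toy_delete(struct toy_file *f) {
    if (--f->link_count > 0)
        return;                          /* other hard links still exist */
    for (int i = 0; i < f->n_blocks; i++)
        mark_block_free(f->blocks[i]);   /* just flip bits in the free map */
    f->n_blocks = 0;
}
```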

Free Space Management


• A simple approach: a bitmap with one bit per block
• If a block is free, the corresponding bit is 1
• If a block is in use, the corresponding bit is 0
• Simple to find an available block, or a run of available blocks
• Can make more efficient by accessing the bitmap in units of words,
skipping over entire words that are 0
• This bitmap clearly occupies a certain amount of space
• e.g. a 4KiB block can record the state of 32768 blocks, or 128MiB of
storage space
• A 1TB disk would require 8192 blocks (32MiB) to record the disk’s free-space bitmap
• The file system can break this bitmap into multiple parts
• e.g. Ext2 manages a free-block bitmap for groups of blocks, with the
constraint that each group’s bitmap must always fit into one block
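A minimal sketch of the word-at-a-time scan described above, over an in-memory copy of the free-block bitmap (here 1 = free, 0 = in use, matching the slide):

```c
#include <stdint.h>

#define BITS_PER_WORD 64

/* Return the number of any free block, scanning a word at a time and
 * skipping words that are all zero (no free blocks).  Returns -1 if
 * the bitmap records no free blocks at all. */
int64_t find_free_block(const uint64_t *bitmap, uint64_t n_words) {
    for (uint64_t w = 0; w < n_words; w++) {
        if (bitmap[w] == 0)
            continue;                       /* skip 64 in-use blocks at once */
        for (int bit = 0; bit < BITS_PER_WORD; bit++)
            if (bitmap[w] & (1ULL << bit))
                return (int64_t)(w * BITS_PER_WORD + bit);
    }
    return -1;
}
```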

Free Space Management (2)


• Another simple approach: a linked list of free blocks
• The file system records the first block in the free list
• Each free block holds a pointer to the next block
• Also very simple to find an available block
• Much harder to find a run of contiguous blocks that are available
• Tends to be more I/O costly than the bitmap approach
• Requires additional disk accesses to scan and update the free-list
of blocks
• Also, wastes a lot of space in the free list…
• A better use of free blocks: store the addresses of many
free blocks in each block of the linked list
• Only a subset of the free blocks are required for this information
• Still generally requires more space than bitmap approach
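The improved free list could be laid out roughly like this: each block on the list is itself a free block whose contents are just an array of other free blocks' addresses plus a link to the next list block (a sketch with invented names, assuming 4KiB blocks and 32-bit block numbers):

```c
#include <stdint.h>

#define ADDRS_PER_FREE_BLOCK 1022   /* 4KiB block: 1022 addresses + 2 header words */

/* One block on the free list.  Only these "index" blocks need to be read
 * or written when allocating and freeing; the other free blocks listed in
 * free_addrs[] are never touched. */
struct free_list_block {
    uint32_t next;                          /* next free-list block, or 0  */
    uint32_t count;                         /* how many entries are valid  */
    uint32_t free_addrs[ADDRS_PER_FREE_BLOCK];
};
```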

Free Space Management (3)


• Many other ways of recording free storage space
• e.g. record runs of free contiguous blocks with (start, count) values
• e.g. maintain more sophisticated maps of free space
• A common theme: many of these approaches don’t
require actually touching newly deallocated blocks
• e.g. update a bitmap, store a block-pointer in another block, …
• Storage devices frequently still contain the old contents of
deleted or truncated files
• Called data remanence
• Sometimes this characteristic is useful for data recovery
• e.g. file-undelete utilities
• e.g. computer forensics when investigating crimes
• Also generally not difficult to securely erase devices

Free Space and SSDs


• Solid State Drives (SSDs) and other flash-based devices
often complicate management of free space
• SSDs are block devices; reads and writes are a fixed size
• Problem: can only write to a block that is currently empty
• Blocks can only be erased in groups, not individually!
• An erase block is a group of blocks that are erased together
• Erase blocks are much larger than read/write blocks
• A read/write block might be 4KiB or 8KiB…
• Erase blocks are often 128 or 256 of these blocks (e.g. 2MiB)!
• As long as some blocks on the SSD are empty, writes can
be performed immediately
• If the SSD has no more empty blocks, a group of blocks
must be erased to provide more empty blocks

Solid State Drives


• Solid State Drives include a flash translation layer that
maps logical block addresses to physical memory cells
• Recall: system uses Logical Block Addressing to access disks
• When files are written to the SSD, data must be stored in
empty cells (i.e. old contents can’t simply be overwritten)
• If a file is edited, the SSD sees a write issued against the same logical block
• e.g. block 2 in file F1 is written
• SSD can’t just replace the block’s contents…
• SSD marks the cell as “old,” then stores the new block data in another cell, and updates the mapping in the FTL
[Figure: Flash Translation Layer example; the cell holding F1.2 is marked old, and the new version F1.2' is written into an empty cell.]
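A highly simplified sketch of the out-of-place write just described, assuming an in-memory logical-to-physical map and invented helper functions; a real FTL also handles wear leveling, garbage collection, and power-loss recovery:

```c
#include <stdint.h>

#define NO_CELL UINT32_MAX

extern uint32_t ftl_map[];      /* logical block -> physical cell (or NO_CELL) */
extern uint8_t  cell_old[];     /* 1 if the cell holds superseded data          */

uint32_t allocate_empty_cell(void);             /* hypothetical helpers */
void     program_cell(uint32_t cell, const void *data);

/* Write 'data' to logical block 'lba': mark the old cell stale, program
 * the new data into an empty cell, and update the mapping. */
void ftl_write(uint32_t lba, const void *data) {
    uint32_t old_cell = ftl_map[lba];
    if (old_cell != NO_CELL)
        cell_old[old_cell] = 1;     /* old contents can't be overwritten in place */

    uint32_t new_cell = allocate_empty_cell();
    program_cell(new_cell, data);   /* store the new version elsewhere */
    ftl_map[lba] = new_cell;        /* future reads of 'lba' go to the new cell */
}
```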

Solid State Drives (2)


• Over time, SSD ends up with few or no available cells
• e.g. a series of writes to our SSD that results in all cells being used
• SSD must erase at least one block of cells to be reused
• Best case is when an entire erase-block can be reclaimed
• SSD erases the entire block, and then carries on as before

[Figure: before and after erasing the first erase block; every cell in it is marked old, so the SSD can erase the whole block and immediately reuse its cells.]

Solid State Drives (3)


• More complicated when an erase block still holds data
• e.g. SSD decides it must reclaim the third erase-block
• SSD must relocate the current contents before erasing
• Result: sometimes a write to the SSD incurs additional
writes within the SSD
• Phenomenon is called write amplification

[Figure: before and after reclaiming an erase block that still holds live cells; the SSD first copies the live data into cells of other blocks, then erases the block. The extra internal copies are the write amplification.]

Solid State Drives (4)


• SSDs must carefully manage this process to avoid
uneven wear of its memory cells
• Cells can only survive so many erase cycles, then become useless
• How does the SSD know when a cell’s contents are no
longer needed? (i.e. when to mark the cell “old”)
• The SSD only knows because it sees several writes to the same logical block
• The new version replaces the old version, so the old cell is no longer used for storage
[Figure: FTL state after repeated writes; the cells holding superseded versions are marked old.]

SSDs and File Deletion


• Problem: for most file system formats, file deletion
doesn’t actually touch the blocks in the file themselves!
• File systems try to avoid this anyway, because storage I/O is slow!
• Want to update the directory entry and the free-space map only,
and want this to be as efficient as possible
• Example: File F3 is deleted from the SSD
• SSD will only see the block with the directory-entry change, and the block(s) holding the free map
• The SSD has no idea that file F3’s data no longer needs to be preserved
• e.g. if the SSD decides to erase bank 2, it will still move F3.2 and F3.3 to other cells, even though the OS and the users don’t care!
[Figure: the FTL still treats F3's cells (F3.1–F3.4) as live data even though the file has been deleted.]

SSDs, File Deletion and TRIM


• To deal with this, SSDs introduced the TRIM command
• (TRIM is not an acronym)
• When the filesystem is finished with certain logical blocks,
it can issue a TRIM command to inform the SSD that the
data in those blocks can be discarded
• Previous example: file F3 is deleted
• The OS can issue a TRIM command to inform the SSD that all associated blocks are now unused
• TRIM allows the SSD to manage its cells much more efficiently
• Greatly reduces write-amplification issues
• Helps reduce wear on SSD memory cells
[Figure: after the TRIM, the cells that held F3's data are marked old and can be reclaimed without being copied elsewhere.]
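As one concrete example (Linux-specific, and assuming /mnt/data is a placeholder for a mounted filesystem on an SSD), the FITRIM ioctl asks the filesystem to issue TRIM/discard for all of its currently free space; this is essentially what the fstrim utility does:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */

int main(void) {
    /* Any descriptor referring to the mounted filesystem works; here the
     * mount point itself is opened.  Typically requires root privileges. */
    int fd = open("/mnt/data", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fstrim_range range;
    memset(&range, 0, sizeof(range));
    range.len = (unsigned long long)-1;    /* trim free space across the whole filesystem */

    if (ioctl(fd, FITRIM, &range) < 0)     /* fails if the device/fs doesn't support TRIM */
        perror("FITRIM");
    else
        printf("trimmed %llu bytes of free space\n",
               (unsigned long long)range.len);

    close(fd);
    return 0;
}
```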

SSDs, File Deletion and TRIM (2)


• Still a few issues to resolve with TRIM at this point
• The biggest one is that TRIM wasn’t initially a queued command
• Couldn’t include TRIM commands in a mix of other read/write commands being sent to the device
• TRIM must be performed separately, in isolation from other operations
• TRIM must be issued in a batch-mode way, when it won’t interrupt other work
• e.g. can’t issue TRIM commands immediately after each delete operation
• This was fixed in the SATA 3.1 specification
• A queued version of TRIM was introduced
• Another issue: not all OSes/filesystems support TRIM (or it isn’t enabled by default)
[Figure: the same FTL state as the previous slide.]

Remaining Schedule…
• No class on Friday, March 7!
• Last class on Monday, March 10
• Topic: Journaling in filesystems

• Last assignment (Pintos filesystem) goes out today


• Due in two weeks

• Office hours will be held as usual, all the way through finals week
