
FILE SYSTEMS, PART 2

CS124 – Operating Systems


Spring 2024, Lecture 22

Files and Processes


• The OS maintains a buffer of storage blocks in memory
• Storage devices are often much slower than the CPU; use caching to improve performance
of reads and writes
• Multiple processes can open a file at the same time…
[Diagram: per-process kernel file-descriptor tables (Process A, Process B) point into global kernel open-file entries (flags, offset, v_ptr, file_ops), which reference the File Control Block (filename, path, size, flags, i_node) and the storage cache of blocks]


Files and Processes (2)


• Very common to have different processes perform reads and writes on the
same open file
• OSes tend to vary in how they handle this circumstance, but standard APIs
can manage these interactions
[Diagram: the same per-process and global kernel open-file structures as on the previous slide]


Files and Processes (3)


• Multiple reads on the same file generally never block each other, even for
overlapping reads
• Generally, a read that occurs after a write should reflect the completion of that write operation

• Writes should sometimes block each other, but operating systems vary widely
in how they handle this
• e.g. Linux prevents multiple concurrent writes to the same file

• The most important situation to get right is appending to a file

• Two operations must be performed: the file's space is extended, then the write is performed into the newly allocated space
• If this sequence isn't atomic, the likely result is a corrupted file
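• One way POSIX systems handle the append case: opening with O_APPEND makes each write() atomically reposition to end-of-file before writing. A minimal sketch (the filename is illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* O_APPEND: before each write(), the kernel atomically repositions the
     * file offset to the current end of file, so two processes appending
     * concurrently cannot overwrite each other's records. */
    int fd = open("app.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;

    const char *msg = "record from this process\n";
    if (write(fd, msg, strlen(msg)) < 0) {
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
```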

Files and Processes (4)


• Operating systems have several ways to govern concurrent file access
• Often, entire files can be locked in shared or exclusive mode
• e.g. Windows CreateFile() allows files to be locked in one of several modes at creation
• Other processes trying to perform conflicting operations are prevented from doing so by the
operating system
• Some OSes provide advisory file-locking operations
• Advisory locks aren’t enforced on actual file-IO operations
• They are only enforced when processes participate in acquiring and releasing these locks
• Example: UNIX flock() acquires/releases advisory locks on an entire file
• Processes calling flock() will be blocked if a conflicting lock is held…
• If a process decides to just directly access the flock()’d file, the OS won’t stop it!
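• A minimal sketch of advisory whole-file locking with flock(), as described above (the filename is illustrative; processes that never call flock() are not restricted):

```c
#include <fcntl.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_UN */
#include <unistd.h>

int main(void) {
    int fd = open("data.db", O_RDWR);
    if (fd < 0)
        return 1;

    /* Blocks while another cooperating process holds a conflicting lock. */
    if (flock(fd, LOCK_EX) == 0) {
        /* ... read/modify/write the file ... */
        flock(fd, LOCK_UN);   /* release the advisory lock */
    }

    close(fd);
    return 0;
}
```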

Files and Processes (5)


• Example: UNIX lockf() function can acquire/release advisory locks on a region of
a file
• i.e. lock a section of the file in a shared or exclusive mode
• Windows has a similar capability
• lockf() is typically implemented as a wrapper around fcntl(); on some systems flock() is as well
• fcntl() can perform many different operations on files:
• Duplicate a file descriptor
• Get and set control flags on open files
• Enable or disable various kinds of I/O signals for open files
• Acquire or release locks on whole files or on byte ranges within files
• etc.
• Some OSes also provide mandatory file-locking support
• Processes are forced to abide by the current set of file locks
• e.g. Linux has mandatory file-locking support, but this is non-standard
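• A minimal sketch of locking a byte range with fcntl(), the mechanism described above (the filename and range are illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.db", O_RDWR);
    if (fd < 0)
        return 1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive lock; F_RDLCK would be shared */
    fl.l_whence = SEEK_SET;
    fl.l_start = 100;        /* first byte of the locked region */
    fl.l_len = 100;          /* lock bytes 100..199 */

    if (fcntl(fd, F_SETLKW, &fl) == 0) {   /* F_SETLKW waits for the lock */
        /* ... operate on the locked region ... */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);           /* release the lock */
    }

    close(fd);
    return 0;
}
```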

File Deletion
• File deletion is a generally straightforward operation
• Specific implementation details depend heavily on the file system format
• General procedure:
• Remove the directory entry referencing the file
• If the file system contains no other hard links to the file, record that all of the file's blocks are now available for other files to use
[Diagram: directory tree in which home contains user1 and user2; user1 holds entries A, B, C, D, and user2 holds entries A and C]
• The file system must record what blocks are available for use when files are
created or extended
• Often called a free-space list, although many different approaches are used
to record this information
• Some file systems already have a way of doing this, e.g. FAT formats simply
mark clusters as unused in the table

Free Space Management


• A simple approach: a bitmap with one bit per block
• If a block is free, the corresponding bit is 1
• If a block is in use, the corresponding bit is 0
• Simple to find an available block, or a run of available blocks
• Can make more efficient by accessing the bitmap in units of words, skipping over entire
words that are 0
• This bitmap clearly occupies a certain amount of space
• e.g. a 4KiB block can record the state of 32768 blocks, or 128MiB of storage space
• A 1TB disk would require 8192 blocks (32MiB) to record the disk’s free-space bitmap

• The file system can break this bitmap into multiple parts
• e.g. Ext2 manages a free-block bitmap for groups of blocks, with the constraint that each
group’s bitmap must always fit into one block
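• A minimal sketch of the bitmap search described above, using the same convention (1 = free, 0 = in use) and scanning a 64-bit word at a time; names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

#define BITS_PER_WORD 64

/* Returns the number of a free block, or -1 if none are free.
 * A word equal to 0 means all 64 blocks it covers are in use,
 * so the scan can skip it with a single comparison. */
long find_free_block(const uint64_t *bitmap, size_t nwords) {
    for (size_t w = 0; w < nwords; w++) {
        uint64_t word = bitmap[w];
        if (word == 0)
            continue;                       /* no free blocks in this word */
        for (int b = 0; b < BITS_PER_WORD; b++)
            if (word & ((uint64_t)1 << b))  /* set bit => free block */
                return (long)(w * BITS_PER_WORD + b);
    }
    return -1;
}
```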

Free Space Management (2)


• Another simple approach: a linked list of free blocks
• The file system records the first block in the free list
• Each free block holds a pointer to the next block
• Also very simple to find an available block
• Much harder to find a run of contiguous blocks that are available

• Tends to be more I/O costly than the bitmap approach


• Requires additional disk accesses to scan and update the free-list of blocks
• Also, wastes a lot of space in the free list…

• A better use of free blocks: store the addresses of many free blocks in each
block of the linked list
• Only a subset of the free blocks are required for this information
• Still generally requires more space than bitmap approach
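• A sketch of this "grouping" approach, assuming 4 KiB blocks and 64-bit block numbers (the layout and field names are illustrative, not from any real filesystem):

```c
#include <stdint.h>

/* One block of the free list: it stores the addresses of up to 510 other
 * free blocks, plus a link to the next such index block. Only these index
 * blocks need to be read or written when allocating or freeing space. */
struct free_index_block {
    uint64_t next;              /* block # of the next index block, 0 if none */
    uint64_t count;             /* how many entries of free_blocks[] are valid */
    uint64_t free_blocks[510];  /* (4096 - 16) / 8 block numbers */
};
```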

Free Space Management (3)


• Many other ways of recording free storage space
• e.g. record runs of free contiguous blocks with (start, count) values (see the sketch at the end of this slide)
• e.g. maintain more sophisticated maps of free space
• A common theme: when deleting a file, many of these approaches don’t
actually require touching the newly deallocated blocks
• e.g. update a bitmap, store a block-pointer in another block, …

• Storage devices usually still contain the old contents of truncated/deleted files
• Called data remanence

• Sometimes this is useful for data recovery


• e.g. file-undelete utilities, or computer forensics when investigating crimes
• (Also generally not difficult to securely erase devices)
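• A sketch of the (start, count) representation from the earlier bullet (names are illustrative):

```c
#include <stdint.h>

/* One run of contiguous free blocks: "count" blocks starting at "start".
 * A whole free-space map can be a sorted array or tree of these entries. */
struct free_extent {
    uint64_t start;   /* first free block of the run */
    uint64_t count;   /* number of contiguous free blocks in the run */
};
```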

Free Space and SSDs


• Solid State Drives (SSDs) and other flash-based devices often complicate
management of free space
• SSDs are block devices; reads and writes are a fixed size
• Problem: can only write to a block that is currently empty
• Blocks can only be erased in groups, not individually
• An erase block is a group of blocks that are erased together
• This is done primarily for performance reasons
• Erase blocks are much larger than read/write blocks
• A read/write block might be 4KiB or 8KiB…
• Erase blocks are often 128 or 256 of these blocks (e.g. 2MiB)
• As long as some blocks on SSD are empty, data can be written immediately
• If the SSD has no more empty blocks, a group of blocks must be erased to
provide more empty blocks

Solid State Drives


• Solid State Drives include a flash translation layer that maps logical block
addresses to physical memory cells
• Recall: system uses Logical Block Addressing to access disks
• When files are written to the SSD, data must be stored in empty cells
(i.e. old contents can’t simply be overwritten)
• If a file is edited, the SSD sees a write issued against the same logical block
• e.g. block 2 in file F1 is written

• The SSD can't just replace the block's contents…

• The SSD marks the cell as "old," then stores the new data in another cell, and updates the mapping in the FTL
[Diagram: Flash Translation Layer mapping file blocks F1.1-F1.3, F2.1-F2.2, F3.1-F3.4 to cells; the cell holding F1.2 is marked "old" and the new version F1.2' is written to a previously empty cell]
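• A toy model of the remapping behavior described above (all names and sizes are illustrative, not a real SSD's firmware):

```c
#include <stdbool.h>

#define NUM_CELLS 16   /* physical cells in this toy model */

static int  l2p[NUM_CELLS];      /* logical block -> physical cell */
static bool mapped[NUM_CELLS];   /* logical block currently has a mapping */
static bool live[NUM_CELLS];     /* physical cell holds current data */
static bool stale[NUM_CELLS];    /* physical cell holds superseded ("old") data */

/* Find a cell that is neither live nor stale; -1 means an erase is needed. */
static int alloc_empty_cell(void) {
    for (int p = 0; p < NUM_CELLS; p++)
        if (!live[p] && !stale[p])
            return p;
    return -1;
}

/* "Write" logical block lba: the old copy (if any) is only marked stale,
 * never overwritten in place; the new data goes to a fresh cell and the
 * logical-to-physical map is updated. */
int ftl_write(int lba) {
    int p = alloc_empty_cell();
    if (p < 0)
        return -1;                /* no empty cells: must erase a block first */
    if (mapped[lba]) {
        stale[l2p[lba]] = true;   /* previous copy becomes "old" */
        live[l2p[lba]] = false;
    }
    live[p] = true;
    l2p[lba] = p;
    mapped[lba] = true;
    return p;                     /* physical cell now holding the data */
}
```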

Solid State Drives (2)


• Over time, SSD ends up with few or no available cells
• e.g. a series of writes to our SSD that results in all cells being used
• SSD must erase at least one block of cells to be reused
• Best case is when an entire erase-block can be reclaimed
• SSD erases the entire block, and then carries on as before

[Diagram: before/after view of the FTL; the first erase block contains only stale ("old") cells, so the SSD erases it in place, and afterward those cells are empty and available for new writes such as F2.1'']

Solid State Drives (3)


• More complicated when an erase block still holds data
• e.g. SSD decides it must reclaim the third erase-block
• SSD must relocate the current contents before erasing
• Result: sometimes a write to the SSD incurs additional writes within the SSD
• Phenomenon is called write amplification

[Diagram: before/after view of the FTL; the erase block chosen for reclamation still holds live cells (e.g. F3.1', F3.4'), which the SSD must copy into another block before erasing, so one external write causes extra internal writes]

Solid State Drives (4)


• SSDs must carefully manage this process to avoid uneven wear of their memory cells
• Cells can only survive so many erase cycles, then they become useless
• Technique is called wear leveling
• How does the SSD know when a cell's contents are no longer needed? (i.e. when to mark the cell "old")
• The SSD only knows because it sees several writes to the same logical block
• The new version replaces the old version, so the old cell is no longer used for storage
[Diagram: FTL state in which superseded cells are marked "old" once newer versions of the same logical blocks have been written elsewhere]

SSDs and File Deletion


• Problem: for most file system formats, file deletion doesn’t actually touch the
blocks in the file themselves!
• File systems try to avoid this anyway, because storage I/O is slow!
• Want to update the directory entry and the free-space map only, and want this to be as
efficient as possible
• Example: file F3 is deleted from the SSD
• The SSD will only see the block with the directory-entry change, and the block(s) holding the free map
• The SSD has no idea that file F3's data no longer needs to be preserved
• e.g. if the SSD decides to erase bank 2, it will still move F3.2 and F3.3 to other cells, even though the OS and the users don't care!
[Diagram: FTL state after F3 is deleted; the cells holding F3.2 and F3.3 still look live to the SSD even though the file system no longer needs them]

SSDs, File Deletion and TRIM


• To deal with this, SSDs introduced the TRIM command
• (TRIM is not an acronym)
• When the filesystem is finished with certain logical blocks, it can issue a TRIM
command to inform the SSD that the data in those blocks can be discarded
• Previous example: file F3 is deleted
• The OS can issue a TRIM command to inform the SSD that all associated blocks are now unused
• TRIM allows the SSD to manage its cells much more efficiently
• Greatly reduces write-amplification issues
• Helps reduce wear on SSD memory cells
[Diagram: FTL state after TRIM; the cells that held F3's data are marked "old" and can be reclaimed without being copied elsewhere]
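• On Linux, for example, the fstrim(8) utility asks a mounted filesystem to issue TRIM/discard for its free space via the FITRIM ioctl. A minimal Linux-specific sketch (the mount point is illustrative):

```c
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>    /* FITRIM, struct fstrim_range */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    /* Open the mount point of the filesystem to be trimmed. */
    int fd = open("/mnt/ssd", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fstrim_range range;
    memset(&range, 0, sizeof range);
    range.start = 0;
    range.len = ULLONG_MAX;   /* consider the whole filesystem */
    range.minlen = 0;         /* trim free extents of any size */

    if (ioctl(fd, FITRIM, &range) < 0)
        perror("ioctl(FITRIM)");
    else
        printf("trimmed %llu bytes\n", (unsigned long long)range.len);

    close(fd);
    return 0;
}
```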

SSDs and Random Access


• A common claim about SSDs is that random access is the same performance as sequential access
• The Flash Translation Layer is solid-state logic
• No mechanical devices that must move over a distance
• Really only true for random vs. sequential reads
• Depending on the size of the writes being performed, random-write performance can be much slower than sequential writes
[Diagram: the same FTL state as on the previous slide]
• Reason:
• Small random writes are much more likely to be spread across many erase blocks…
• Random writes are likely to vary widely in when they can be discarded…
• Overhead of write amplification is increased in these scenarios
• Sequential writes tend to avoid these characteristics, so overhead due to write
amplification is reduced

SSDs and Random Access


• If the random-write request size grows to the point that it works well with the SSD's erase-block size and garbage-collection algorithm, then random writes will be as fast as sequential writes
• Below that size, sequential writes to an SSD are much faster than random writes
• Measurements of three flash devices (from Min et al.):
• SSD-H: Intel X25-E (32 GB, SATA, SLC): sequential read 216.9 MB/s, sequential write 170 MB/s, random read 13.8 MB/s, random write 5.3 MB/s, list price about $14/GB
• SSD-M: Samsung S470 (64 GB, SATA, MLC): sequential read 212.6 MB/s, sequential write 87 MB/s, random read 10.6 MB/s, random write 0.6 MB/s, about $2.3/GB
• SSD-L: Transcend JetFlash 700 (32 GB, USB 3.0, MLC): sequential read 69.1 MB/s, sequential write 38 MB/s, random read 5.3 MB/s, random write 0.002 MB/s, about $1.4/GB
• This affects the design of SSD-friendly filesystems and database file layouts
[Chart: write throughput (MB/s) vs. request size, sequential vs. random writes for SSD-H, SSD-M and SSD-L]
Source: SFS: Random Write Considered Harmful in Solid State Drives (Min et al.)

Next Time
• Next time: notes on the Pintos filesystem assignment
