Filesys
File systems
Files
Directories & naming
File system implementation
Example file systems
Long-term information storage
Must store large amounts of data
Gigabytes -> terabytes -> petabytes
Stored information must survive the termination of the process using it
Lifetime can be seconds to years
Must have some way of finding it!
Multiple processes must be able to access the information concurrently
Naming files
Important to be able to find files after they’re created
Every file has at least one name
Name can be
Human-accessible: “foo.c”, “my photo”, “Go Panthers!”, “Go Banana Slugs!”
Machine-usable: 4502, 33481
Case may or may not matter
Depends on the file system
Name may include information about the file’s contents
Certainly does for the user (the name should make it easy to figure out what’s in it!)
Computer may use part of the name to determine the file type
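The C code below sketches how a program might use part of the name to guess a file's type. It treats the text after the last dot as the extension and looks that up in a small table. The function names `file_extension` and `guess_type` are hypothetical, and so is the table. The comparison is case-insensitive (POSIX `strcasecmp`), which is only one of the possible choices. Real systems may also inspect the file's contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Return the extension (text after the last '.') of the final path
 * component, or NULL if there is none. A leading dot (".profile") is
 * treated as part of the name, not as an extension. */
const char *file_extension(const char *name)
{
    assert(name != NULL);
    const char *base = strrchr(name, '/');
    base = base ? base + 1 : name;
    const char *dot = strrchr(base, '.');
    if (dot == NULL || dot == base)
        return NULL;
    return dot + 1;
}

/* Hypothetical extension table. The lookup ignores case, as a
 * case-insensitive file system might. */
const char *guess_type(const char *name)
{
    static const struct { const char *ext; const char *type; } table[] = {
        { "c",   "C source file" },
        { "tex", "LaTeX source file" },
        { "ps",  "PostScript file" },
    };
    const char *ext = file_extension(name);
    if (ext == NULL)
        return "unknown";
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcasecmp(ext, table[i].ext) == 0)
            return table[i].type;
    return "unknown";
}
```

With this table, "foo.c" and "FOO.C" both map to "C source file", and "my photo" has no extension, so it maps to "unknown".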
Typical file extensions
[Figure: three file structures: a sequence of bytes (1 byte), a sequence of records (1 record), and a tree]
File types
[Figure: layouts of an executable file and an archive]
Accessing a file
Sequential access
Read all bytes/records from the beginning
Cannot jump around
May rewind or back up, however
Convenient when medium was magnetic tape
Often useful when whole file is needed
Random access
Bytes (or records) read in any order
Essential for database systems
A read can either:
Move the file marker (seek), then read, or
Read, and then move the file marker
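Both access styles can be sketched with the POSIX `read` and `lseek` calls. The helper names are hypothetical. The first function rewinds and reads the whole file from the beginning, and each `read` advances the file marker. The second function seeks to an arbitrary byte offset and then reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sequential access: rewind, then read from the beginning to the end,
 * counting occurrences of byte c. Returns -1 on error. */
long count_byte_sequential(int fd, char c)
{
    assert(fd >= 0);
    char buf[512];
    long count = 0;
    ssize_t n;

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)     /* rewind to the start */
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)  /* each read advances the marker */
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == c)
                count++;
    return n < 0 ? -1 : count;
}

/* Random access: move the file marker (seek), then read. */
ssize_t read_at(int fd, off_t pos, void *buf, size_t len)
{
    assert(fd >= 0);
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);  /* marker ends at pos + bytes read */
}
```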
File attributes
File operations
Create: make a new file
Delete: remove an existing file
Open: prepare a file to be accessed
Close: indicate that a file is no longer being accessed
Read: get data from a file
Write: put data to a file
Append: like write, but only at the end of the file
Seek: move the “current” pointer elsewhere in the file
Get attributes: retrieve attribute information
Set attributes: modify attribute information
Rename: change a file’s name
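Here is a minimal sketch, assuming a POSIX system, of how several of these operations map onto system calls:

- `open` with `O_CREAT` to create and open
- `write`
- `O_APPEND` to append
- `close`
- `rename`
- `stat` to get attributes
- `unlink` to delete

The function name and the file names passed to it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>      /* rename */
#include <sys/stat.h>
#include <unistd.h>

/* Walk one file through several operations. Returns the size reported
 * by "get attributes" just before the file is deleted, or -1 on error. */
long file_ops_demo(const char *name, const char *new_name)
{
    assert(name != NULL && new_name != NULL);
    struct stat st;

    /* Create + Open: O_EXCL fails if the file already exists. */
    int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) { close(fd); return -1; }   /* Write */
    close(fd);                                                  /* Close */

    /* Open for Append: every write goes to the end of the file. */
    fd = open(name, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    if (write(fd, ", world", 7) != 7) { close(fd); return -1; }
    close(fd);

    if (rename(name, new_name) != 0)   /* Rename */
        return -1;
    if (stat(new_name, &st) != 0)      /* Get attributes (size is one) */
        return -1;
    if (unlink(new_name) != 0)         /* Delete */
        return -1;
    return (long)st.st_size;
}
```

On success, the function returns 12: five bytes from the first write plus seven from the append.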
Using file system calls
Using file system calls, continued
Memory-mapped files
[Figure: a process's address space (program text and data) before and after files abc and xyz are mapped into it]
More on memory-mapped files
Memory-mapped files are a convenient abstraction
Example: a string search in a large file can be done exactly as it would be on an in-memory buffer!
Let the OS do the buffering (reads & writes) in the virtual memory system
Some issues come up…
How long is the file?
Easy if read-only
Difficult if writes allowed: what if a write is past the end of file?
What happens if the file is shared: when do changes appear to other processes?
When are writes flushed out to disk?
Clearly, it is easier to memory-map read-only files…
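Assuming a POSIX system, the string-search example can be sketched with `mmap`. The file is mapped read-only and searched with ordinary pointer arithmetic, while the virtual memory system pages the data in on demand. The function name `mmap_find` is hypothetical. Mapping read-only sidesteps the end-of-file and sharing questions listed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the offset of the first occurrence of pat in the file at path,
 * or -1 if it is absent or on error. */
long mmap_find(const char *path, const char *pat)
{
    size_t m = strlen(pat);
    assert(m > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  /* cannot map 0 bytes */
        close(fd);
        return -1;
    }
    size_t n = (size_t)st.st_size;
    const char *data = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long found = -1;
    /* Plain in-memory search; page faults fetch file data as needed. */
    for (size_t i = 0; m <= n && i <= n - m; i++) {
        if (memcmp(data + i, pat, m) == 0) {
            found = (long)i;
            break;
        }
    }
    munmap((void *)data, n);
    return found;
}
```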
Directories
Naming is nice, but limited
Humans like to group things together for convenience
File systems allow this to be done with directories (sometimes called folders)
Grouping makes it easier to
Find files in the first place: remember the enclosing directories for the file
Locate related files (or just determine which files are related)
Single-level directory systems
[Figure: a single root directory containing every file: foo and bar (owned by A), baz (owned by B), and blah (owned by C)]
Two-level directory system
[Figure: a root directory with one directory per user (A, B, C): A holds foo and bar, B holds foo and baz, C holds bar, foo, and blah]
Hierarchical directory system
[Figure: a hierarchical tree. The root directory holds user directories A, B, and C; A holds Papers (os.tex), foo, and Photos (sunset, plus Family with sunset, kids, and Mom); B holds foo and Papers (foo.tex, foo.ps); C holds bar, foo, and blah]
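To illustrate how a hierarchical system finds a file, the sketch below resolves a path such as `A/Photos/sunset` one component at a time, starting at the root. The in-memory `struct node` tree is a hypothetical stand-in for on-disk directories. Two files named foo can coexist because they are reached through different parent directories.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory directory tree node. */
struct node {
    const char *name;
    struct node *child;    /* first entry, if this is a directory */
    struct node *sibling;  /* next entry in the same directory */
};

/* Resolve a '/'-separated path starting at root. Returns NULL if any
 * component is missing. */
struct node *lookup(struct node *root, const char *path)
{
    assert(root != NULL && path != NULL);
    struct node *dir = root;
    while (*path) {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");       /* length of this component */
        struct node *e = dir->child;
        while (e && !(strlen(e->name) == len && strncmp(e->name, path, len) == 0))
            e = e->sibling;                    /* linear scan of one directory */
        if (e == NULL)
            return NULL;
        dir = e;
        path += len;
    }
    return dir;
}
```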
Unix directory tree
File system implementation issues
How are disks divided up into file systems?
How does the file system allocate blocks to files?
How does the file system manage free space?
How are directories handled?
How can the file system improve…
Performance?
Reliability?
Carving up the disk
[Figure: the entire disk, divided into a master boot record (containing the partition table) followed by Partition 1, Partition 2, Partition 3, and Partition 4]
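The slide does not say which partition-table format it shows. Assuming the classic PC master boot record, the four primary entries can be decoded as below. In that format:

- The 512-byte boot record holds 446 bytes of boot code.
- Four 16-byte partition entries follow.
- The record ends with the signature bytes 0x55 0xAA.
- Multi-byte fields are little-endian.

The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE        512
#define PART_TABLE_OFF  446   /* partition table follows the boot code */
#define PART_ENTRY_SIZE 16

struct partition {
    uint8_t  type;         /* partition type byte; 0 means unused */
    uint32_t first_lba;    /* first sector of the partition */
    uint32_t num_sectors;  /* length in sectors */
};

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the four primary entries. Returns 0 on success, -1 if the
 * boot-record signature is missing. */
int parse_mbr(const uint8_t mbr[MBR_SIZE], struct partition out[4])
{
    assert(mbr != NULL && out != NULL);
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = mbr + PART_TABLE_OFF + i * PART_ENTRY_SIZE;
        out[i].type        = e[4];          /* bytes 1-3 and 5-7 are CHS */
        out[i].first_lba   = le32(e + 8);
        out[i].num_sectors = le32(e + 12);
    }
    return 0;
}
```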
Contiguous allocation for file blocks
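[Figure: files A through F stored contiguously; after B and D are removed, their space becomes free holes between A, C, E, and F]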
Contiguous allocation
Data in each file is stored in consecutive blocks on disk
Simple & efficient indexing
Starting location (block #) on disk (start)
Length of the file in blocks (length)
Random access well-supported
Difficult to grow files
Must pre-allocate all needed space
Wasteful of storage if file isn’t using all of the space
Logical to physical mapping is easy
blocknum = (pos / 1024) + start;
offset_in_block = pos % 1024;
[Figure: disk blocks 0 to 11 holding a file with Start=5, Length=2902]
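The contiguous mapping above can be written as a small C function. This sketch assumes 1024-byte blocks, as in the formulas. It stores the file length in bytes, treating the figure's Length=2902 as bytes, so that it can reject offsets past end of file. `struct contig_file` and `contig_map` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 1024   /* block size used in the formulas above */

struct contig_file {
    long start;    /* first disk block of the file */
    long length;   /* file length in bytes (assumption) */
};

/* Map byte offset pos to a disk block and an offset within that block.
 * Returns 0 on success, -1 if pos is at or past end of file. */
int contig_map(const struct contig_file *f, long pos,
               long *blocknum, long *offset_in_block)
{
    assert(f != NULL && pos >= 0);
    if (pos >= f->length)
        return -1;
    *blocknum = pos / BLOCK_SIZE + f->start;   /* pure arithmetic, no lookups */
    *offset_in_block = pos % BLOCK_SIZE;
    return 0;
}
```

For the figure's file (Start=5, Length=2902), byte 2901 maps to block 7, offset 853.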
Linked allocation
File is a linked list of disk blocks
Blocks may be scattered around the disk drive
Block contains both pointer to next block and data
Files may be as long as needed
New blocks are allocated as needed
Linked into list of blocks in file
Removed from list (bitmap) of free blocks
[Figure: disk blocks 0 to 11 holding two linked files: one with Start=9, End=4, Length=2902 and one with Start=3, End=6, Length=1500]
Finding blocks with linked allocation
Directory structure is simple
Starting address looked up from directory
Directory only keeps track of first block (not others)
No wasted space - all blocks can be used
Random access is difficult: must always start at first block!
Logical to physical mapping is done by
block = start;
offset_in_block = pos % 1020;
for (j = 0; j < pos / 1020; j++) {
    block = block->next;
}
Assumes that the next pointer (4 bytes) is stored at the end of each 1024-byte block, leaving 1020 bytes of data per block
May require a long time for seek to random location in file
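Here is a runnable version of the linked mapping, using a simulated disk held in an array. It assumes the layout above: 1020 data bytes followed by a 4-byte next pointer, with -1 marking the last block. `struct disk_block` and `linked_map` are hypothetical names. Each loop iteration stands for one disk read, which is why random access is slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1024
#define DATA_PER_BLOCK (BLOCK_SIZE - (int)sizeof(int32_t))   /* 1020 */

/* Simulated disk block: data, then the next-block pointer at the end. */
struct disk_block {
    unsigned char data[DATA_PER_BLOCK];
    int32_t next;   /* -1 marks the last block of the file */
};

/* Follow the chain from block start to the block holding byte pos.
 * Returns that block number, or -1 if the chain ends first. */
int32_t linked_map(const struct disk_block *disk, int32_t start, long pos,
                   long *offset_in_block)
{
    assert(disk != NULL && pos >= 0);
    int32_t block = start;
    for (long j = 0; j < pos / DATA_PER_BLOCK; j++) {
        if (block < 0)
            return -1;
        block = disk[block].next;   /* one disk read per hop */
    }
    *offset_in_block = pos % DATA_PER_BLOCK;
    return block;
}
```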
Linked allocation using a RAM-based table
Links on disk are slow
Keep linked list in memory
Advantage: faster
Disadvantages
Have to copy it to disk at some point
Have to keep in-memory and on-disk copy consistent
In-memory table (-1 = free block, -2 = last block of a file):
Block  Next
0      4
1      -1
2      -1
3      -2
4      -2
5      -1
6      3     <- start of file B
7      -1
8      -1
9      0     <- start of file A
10     -1
11     -1
12     -1
13     -1
14     -1
15     -1
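Walking a file's chain in the in-memory table can be sketched as below. Every hop is an array lookup rather than a disk read. The sketch assumes the encoding read off the table above: -1 for a free block, -2 for the last block of a file. The names `fat_chain`, `FAT_FREE`, and `FAT_EOF` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FAT_FREE (-1)   /* block not in use (assumed encoding) */
#define FAT_EOF  (-2)   /* last block of a file (assumed encoding) */

/* Copy the chain of blocks starting at start into out[] and return how
 * many there are, or -1 if the chain is broken, cyclic, or too long. */
int fat_chain(const int *fat, int nblocks, int start, int *out, int max)
{
    assert(fat != NULL && out != NULL);
    int n = 0;
    for (int b = start; b != FAT_EOF; b = fat[b]) {
        if (b < 0 || b >= nblocks || n == max)
            return -1;   /* hit a free marker, bad index, or overflow */
        out[n++] = b;
    }
    return n;
}
```

With the table above, `fat_chain` starting at block 9 yields 9, 0, 4 (file A), and starting at block 6 yields 6, 3 (file B).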
Larger files with indexed allocation
How can indexed allocation allow files larger than a single index block?
Linked index blocks: similar to linked file blocks, but using index blocks instead
Logical to physical mapping is done by
index = start;
blocknum = pos / 1024;
/* each index block holds 255 block pointers; the remaining slot holds next */
for (j = 0; j < blocknum / 255; j++) {
    index = index->next;
}
block = index[blocknum % 255];
offset_in_block = pos % 1024;
File size is now unlimited
Random access slow, but only for very large files
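Below is a sketch of the linked-index-block mapping. It assumes 1024-byte blocks and 4-byte pointers, so each index block has 256 slots: 255 data-block pointers plus one pointer to the next index block. The in-memory `struct index_block` and the function name are hypothetical stand-ins for on-disk structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1024
#define PTRS_PER_INDEX 255   /* 256 four-byte slots, one reserved for next */

/* In-memory stand-in for an on-disk index block. */
struct index_block {
    int32_t block[PTRS_PER_INDEX];   /* data block numbers */
    struct index_block *next;        /* next index block, or NULL */
};

/* Returns the disk block holding byte pos, or -1 if the file has no
 * index block that far out. */
int32_t linked_index_map(const struct index_block *start, long pos,
                         long *offset_in_block)
{
    assert(pos >= 0 && offset_in_block != NULL);
    long blocknum = pos / BLOCK_SIZE;
    const struct index_block *index = start;
    for (long j = 0; j < blocknum / PTRS_PER_INDEX; j++) {
        if (index == NULL)
            return -1;
        index = index->next;   /* one extra read per 255 data blocks */
    }
    if (index == NULL)
        return -1;
    *offset_in_block = pos % BLOCK_SIZE;
    return index->block[blocknum % PTRS_PER_INDEX];
}
```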
CS 1550, cs.pitt.edu Chapter 6
(originally modified by Ethan
Two-level indexed allocation
Allow larger files by creating an index of index blocks
File size still limited, but much larger
Limit for 1 KB blocks = 1 KB * 256 * 256 = 2^26 bytes = 64 MB
Logical to physical mapping is done by
blocknum = pos / 1024;
index = start[blocknum / 256];
block = index[blocknum % 256];
offset_in_block = pos % 1024;
Start is the only pointer kept in the directory
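The two-level mapping can be sketched the same way. The sketch assumes 1024-byte blocks with 256 four-byte pointers each, and `top` plays the role of `start` above. The names are hypothetical. A NULL second-level pointer marks a part of the file that has no index block yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1024
#define PTRS_PER_BLOCK 256   /* 1024-byte block / 4-byte block pointers */
#define MAX_FILE_BYTES ((long)BLOCK_SIZE * PTRS_PER_BLOCK * PTRS_PER_BLOCK)  /* 2^26 */

/* top: the top-level index (256 pointers to second-level index blocks).
 * Returns the disk block holding byte pos, or -1 if pos is beyond the
 * two-level limit or falls under a missing index block. */
int32_t two_level_map(int32_t *const top[PTRS_PER_BLOCK], long pos,
                      long *offset_in_block)
{
    assert(top != NULL && pos >= 0 && offset_in_block != NULL);
    if (pos >= MAX_FILE_BYTES)
        return -1;
    long blocknum = pos / BLOCK_SIZE;
    const int32_t *index = top[blocknum / PTRS_PER_BLOCK];  /* first level */
    if (index == NULL)
        return -1;
    *offset_in_block = pos % BLOCK_SIZE;
    return index[blocknum % PTRS_PER_BLOCK];                /* second level */
}
```

Any byte can be reached in two lookups, but the file size is capped at 2^26 bytes.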
[Figure: two ways to store directory entries for games, mail, news, and research: storing all information (including attributes) in the directory, or using pointers to index nodes that hold the attributes]
Directory structure
Structure
Linear list of files (often itself stored in a file)
Simple to program
Slow to run
Increase speed by keeping it sorted (insertions are slower!)
Hash table: name hashed and looked up in file
Decreases search time: no linear searches!
May be difficult to expand
Can result in collisions (two files hash to same location)
Tree
Fast for searching
Easy to expand
Difficult to do in on-disk directory
Name length
Fixed: easy to program
Variable: more flexible, better for users
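A hashed directory can be sketched with a small bucket array and chaining. A name is hashed to one bucket, and only the entries in that bucket are compared. Chaining is one way to handle collisions; open addressing is another. The struct names, the inode numbers, and the djb2-style hash are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8   /* deliberately tiny so that collisions occur */

/* Hypothetical directory entry: a name and the file's index-node number. */
struct dirent_h {
    const char *name;
    int inode;
    struct dirent_h *next;   /* other entries that hash to this bucket */
};

struct hashed_dir {
    struct dirent_h *bucket[NBUCKETS];
};

/* djb2-style string hash, reduced to a bucket index. */
static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void dir_insert(struct hashed_dir *d, struct dirent_h *e)
{
    assert(d != NULL && e != NULL);
    unsigned b = hash_name(e->name);
    e->next = d->bucket[b];   /* push onto the bucket's chain */
    d->bucket[b] = e;
}

/* Returns the inode number for name, or -1 if it is not present. */
int dir_lookup(const struct hashed_dir *d, const char *name)
{
    for (const struct dirent_h *e = d->bucket[hash_name(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->inode;
    return -1;
}
```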
[Figure: a directory tree in which B's Photos directory (containing lake) needs an entry, marked ???, that refers to A's Family directory (sunset, kids)]
Solution: use links
A creates a file and inserts it into her directory
B shares the file by creating a link to it
A unlinks the file
B still links to the file
Owner is still A (unless B explicitly changes it)
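On a POSIX system this scenario maps onto hard links. `link` adds a second directory entry for the same file, and `unlink` removes only one entry. The names and paths in the sketch below are hypothetical. After A's unlink, B's entry still reaches the file, whose link count has dropped back to 1. In this model the owner is an attribute of the file itself, not of a directory entry, so the unlink does not change it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the link count seen through B's name after A unlinks, or -1
 * on error. Both paths must be on the same file system. */
long link_demo(const char *a_path, const char *b_path)
{
    assert(a_path != NULL && b_path != NULL);
    struct stat st;

    int fd = open(a_path, O_WRONLY | O_CREAT | O_EXCL, 0644);  /* A creates */
    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) != 4) { close(fd); unlink(a_path); return -1; }
    close(fd);

    if (link(a_path, b_path) != 0) { unlink(a_path); return -1; }  /* B links */
    if (unlink(a_path) != 0)                                        /* A unlinks */
        return -1;
    if (stat(b_path, &st) != 0)                                     /* B still sees it */
        return -1;
    return (long)st.st_nlink;
}
```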
[Figure: directories of users A and B, showing the entries a.tex and b.tex as the shared file is linked and unlinked]