File System: Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011
File System: Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011
Raju Pandey
Department of Computer Sciences
University of California, Davis
Spring 2011
Overall view
Application
Program
mount()
write() WriteFile()
CreateFile()
close() open() CloseHandle() ReadFile()
lseek() read()
SetFilePointer()
Memory Mgr
Memory Mgr
Process Mgr
Process Mgr
Device Mgr
Device Mgr
File Mgr
File Mgr
UNIX Windows
Hardware
ECS 150 (Operating Systems) File System (1), 2 Spring 2011 UC Davis
Files
• A file is a collection of data with some properties
contents, size, owner, last read/write time, protection …
• Files may also have types
understood by file system
o device, directory, symbolic link
understood by other parts of OS or by runtime libraries
o executable, dll, source code, object code, text file, …
ECS 150 (Operating Systems) File System (1), 3 Source: Gribble, Lazowska,
Levy, Zahorjan
File Management
Applications
• The file manager
Records
administers the
collection by:
Storing the information
on a device Structured Record Files
programmer?
ECS 150 (Operating Systems) File System (1), 4 Spring 2011 UC Davis
Interface Layers
Std.
App. Disk
Runtime OS
Code
Library
Procedure Device-type
Calls Dependent Commands
Whatever… Syscalls
ECS 150 (Operating Systems) File System (1), 5 Source: Gribble, Lazowska,
Levy, Zahorjan
Exported Abstractions
Std.
App. Disk
Runtime OS
Code
Library
/ /
ECS 150 (Operating Systems) File System (1), 6 Source: Gribble, Lazowska,
Levy, Zahorjan
Primary Roles of the OS (file system)
ECS 150 (Operating Systems) File System (1), 7 Source: Gribble, Lazowska,
Levy, Zahorjan
File access methods
• Some file systems provide different access methods that
specify ways the application will access data
sequential access
o read bytes one at a time, in order
direct access
o random access given a block/byte #
record access
o file is array of fixed- or variable-sized records
indexed access
o FS contains an index to a particular field of each record in a file
o apps can find a file based on value in that record (similar to DB)
• Why do we care about distinguishing sequential from direct
access?
what might the FS do differently in these cases?
ECS 150 (Operating Systems) File System (1), 8 Source: Gribble, Lazowska,
Levy, Zahorjan
Byte Stream File Interface
fileID = open(fileName)
close(fileID)
read(fileID, buffer, length)
write(fileID, buffer, length)
seek(fileID, filePosition)
ECS 150 (Operating Systems) File System (1), 9 Spring 2011 UC Davis
Low Level Files
fid = open(“fileName”,…);
…
read(fid, buf, buflen); b0 b1 b2 ... bi ...
…
close(fid);
ECS 150 (Operating Systems) File System (1), 10 Spring 2011 UC Davis
Structured Files
Records
Record-Block Translation
ECS 150 (Operating Systems) File System (1), 11 Spring 2011 UC Davis
Record-Oriented Sequential Files
Logical Record
fileID = open(fileName)
close(fileID)
getRecord(fileID, record)
putRecord(fileID, record)
seek(fileID, position)
ECS 150 (Operating Systems) File System (1), 12 Spring 2011 UC Davis
Record-Oriented Sequential Files
Logical Record
ECS 150 (Operating Systems) File System (1), 13 Spring 2011 UC Davis
Indexed Sequential File
fileID = open(fileName)
close(fileID)
getRecord(fileID, index)
index = putRecord(fileID, record)
deleteRecord(fileID, index)
ECS 150 (Operating Systems) File System (1), 14 Spring 2011 UC Davis
Indexed Sequential File (cont)
Application structure
Account # Index index = i
012345
123456 i
294376 k
index = k
...
529366 j
...
965987 index = j
ECS 150 (Operating Systems) File System (1), 15 Spring 2011 UC Davis
Directories
• Directories provide:
a way for users to organize their files
a convenient file name space for both users and FS’s
• Most file systems support multi-level directories
naming hierarchies (/, /usr, /usr/local, /usr/local/bin, …)
• Most file systems support the notion of current directory
absolute names: fully-qualified starting from root of FS
bash$ cd /usr/local
ECS 150 (Operating Systems) File System (1), 16 Source: Gribble, Lazowska,
Levy, Zahorjan
Single-Level Directory
Naming problem
Grouping problem
Path name
Can have the same file name for different user
Efficient searching
No grouping capability
• Grouping Capability
ECS 150 (Operating Systems) File System (1), 27 Source: Gribble, Lazowska,
Levy, Zahorjan
Path name translation
• Let’s say you want to open “/one/two/three”
fd = open(“/one/two/three”, O_RDWR);
• What goes on inside the file system?
open directory “/” (well known, can always find)
search the directory for “one”, get location of “one”
open directory “one”, search for “two”, get location of “two”
open directory “two”, search for “three”, get loc. of “three”
open file “three”
(of course, permissions are checked at each step)
• FS spends lots of time walking down directory paths
this is why open is separate from read/write (session state)
OS will cache prefix lookups to enhance performance
o /a/b, /a/bb, /a/bbb all share the “/a” prefix
ECS 150 (Operating Systems) File System (1), 28 Source: Gribble, Lazowska,
Levy, Zahorjan
File protection
• FS must implement some kind of protection system
to control who can access a file (user)
to control how they can access it (e.g., read, write, or exec)
• More generally:
generalize files to objects (the “what”)
generalize users to principals (the “who”, user or program)
generalize read/write to actions (the “how”, or operations)
• A protection system dictates whether a given action
performed by a given principal on a given object should be
allowed
e.g., you can read or write your files, but others cannot
e.g., your can read /etc/motd but you cannot write to it
ECS 150 (Operating Systems) File System (1), 29 Source: Gribble, Lazowska,
Levy, Zahorjan
Model for representing protection
objects
principals gribble r rw r
guest r capability
ACL
ECS 150 (Operating Systems) File System (1), 30 Source: Gribble, Lazowska,
Levy, Zahorjan
ACLs vs. Capabilities
• Capabilities are easy to transfer
they are like keys: can hand them off
they make sharing easy
• ACLs are easier to manage
object-centric, easy to grant and revoke
o to revoke capability, need to keep track of principals that have it
o hard to do, given that principals can hand off capabilities
• ACLs grow large when object is heavily shared
can simplify by using “groups”
o put users in groups, put groups in ACLs
o you are all in the “VMware powerusers” group on Win2K
additional benefit
o change group membership, affects ALL objects that have this group
in its ACL
ECS 150 (Operating Systems) File System (1), 31 Source: Gribble, Lazowska,
Levy, Zahorjan
Implementing Low Level Files
ECS 150 (Operating Systems) File System (1), 32 Spring 2011 UC Davis
Disk Organization
Block 0
b0 b1 b2 b3 … … bn-1
...
ECS 150 (Operating Systems) File System (1), 34 Spring 2011 UC Davis
File Descriptors
•External name
•Current state
•Sharable
•Owner
•User
•Locks
•Protection settings
•Length
•Time of creation
•Time of last modification
•Time of last access
•Reference count
•Storage device details
ECS 150 (Operating Systems) File System (1), 35 Spring 2011 UC Davis
An open() Operation
ECS 150 (Operating Systems) File System (1), 36 Spring 2011 UC Davis
File Manager Data Structures
ECS 150 (Operating Systems) File System (1), 37 Spring 2011 UC Davis
Opening a UNIX File
0 stdin
1 stdout
File structure
2 stderr
3 ...
inode
ECS 150 (Operating Systems) File System (1), 39 Spring 2011 UC Davis
Contiguous Allocation
File descriptor
Head position 237
…
First block 785
Number of blocks 25
ECS 150 (Operating Systems) File System (1), 40 Spring 2011 UC Davis
ECS 150 (Operating Systems) File System (1), 41 Source: Stallings
After Compaction
ECS 150 (Operating Systems) File System (1), 42 Source: Stallings
Linked Lists or Chained Allocation
First block
… Length Length Length
Byte 0 Byte 0 Byte 0
... ... ...
Head: 417 Byte 4095 Byte 4095 Byte 4095
...
Block 0 Block 1 Block N-1
ECS 150 (Operating Systems) File System (1), 43 Spring 2011 UC Davis
ECS 150 (Operating Systems) File System (1), 44 Source: Stallings
After Consolidation
ECS 150 (Operating Systems) File System (1), 45 Source: Stallings
Indexed Allocation
Byte 0
...
Index block
Byte 4095
…
Length Block 0
Length Byte 0
Head: 417 ...
... Byte 4095
Byte 0 Block 1
...
Length Byte 4095
Block N-1
ECS 150 (Operating Systems) File System (1), 46 Spring 2011 UC Davis
Indexed allocation with Block portions
ECS 150 (Operating Systems) File System (1), 47 Source: Stallings
Indexed allocation with variable length portions
ECS 150 (Operating Systems) File System (1), 49 Source: Gribble, Lazowska,
Levy, Zahorjan
(Old) Unix disks are divided into five parts …
• Boot block
can boot the system by loading from this block
• Superblock
specifies boundaries of next 3 areas, and contains head of
freelists of inodes and file blocks
• i-node area
contains descriptors (i-nodes) for each file on the disk; all i-
nodes are the same size; head of freelist is in the superblock
• File contents area
fixed-size blocks; head of freelist is in the superblock
• Swap area
holds processes that have been swapped out of memory
ECS 150 (Operating Systems) File System (1), 50 Source: Gribble, Lazowska,
Levy, Zahorjan
So …
• You can attach a disk to a dead system …
• Boot it up …
• Find, create, and modify files …
because the superblock is at a fixed place, and it tells you
where the i-node area and file contents area are
by convention, the second i-node is the root directory of the
volume
ECS 150 (Operating Systems) File System (1), 51 Source: Gribble, Lazowska,
Levy, Zahorjan
i-node format
• User number
• Group number
• Protection bits
• Times (file last read, file last written, inode last written)
• File code: specifies if the i-node represents a directory, an
ordinary user file, or a “special file” (typically an I/O
device)
• Size: length of file in bytes
• Block list: locates contents of file (in the file contents
area)
more on this soon!
• Link count: number of directories referencing this i-node
ECS 150 (Operating Systems) File System (1), 52 Source: Gribble, Lazowska,
Levy, Zahorjan
The flat (i-node) file system
• Each file is known by a number, which is the number of the
i-node
seriously – 1, 2, 3, etc.!
why is it called “flat”?
• Files are created empty, and grow when extended through
writes
ECS 150 (Operating Systems) File System (1), 53 Source: Gribble, Lazowska,
Levy, Zahorjan
The tree (directory, hierarchical) file system
…
1
…
…
10
…
11
12
…
…
…
ECS 150 (Operating Systems) File System (1), 55 Source: Gribble, Lazowska,
Levy, Zahorjan
So …
ECS 150 (Operating Systems) File System (1), 56 Source: Gribble, Lazowska,
Levy, Zahorjan
• Can get to 128 x 128 x 128 x 512B = an additional 1GB
with a triple indirect reference
(the 13th pointer in the i-node gets you to a 512B block in
the file contents area that contains 128 4B pointers to
512B blocks in the file contents area that contain 128 4B
pointers to 512B blocks in the file contents area that
contain 128 4B pointers to 512B blocks holding file data)
• Maximum file size is 1GB + a smidge
• Berkeley Unix went to 1KB block sizes
What’s the effect on the maximum file size?
o 256x256x256x1K = 17 GB + a smidge
What’s the price?
• Subsequently went to 4KB blocks
1Kx1Kx1Kx4K = 4TB + a smidge
ECS 150 (Operating Systems) File System (1), 57 Source: Gribble, Lazowska,
Levy, Zahorjan
Putting it all together
•••
directory ‘var/’
directory ‘usr/’
(table of entries)
inode for (table of entries)
inode for
‘usr/’ ‘var/’
data blocks
indirection
block
inode file block indirection block
free list free list
•••
data blocks
Indirection
block
ECS 150 (Operating Systems) File System (1), 58 Source: Gribble, Lazowska,
Levy, Zahorjan
The UNIX File System
ECS 150 (Operating Systems) File System (1), 60 Source: Gribble, Lazowska,
Levy, Zahorjan
File system consistency
• Both i-nodes and file blocks are cached in memory
• The “sync” command forces memory-resident disk
information to be written to disk
system does a sync every few seconds
• A crash or power failure between sync’s can leave an
inconsistent disk
• You could reduce the frequency of problems by reducing
caching, but performance would suffer big-time
ECS 150 (Operating Systems) File System (1), 61 Source: Gribble, Lazowska,
Levy, Zahorjan
What do you do after a crash?
• Run a program called “fsck” to try to fix any consistency
problems
• fsck has to scan the entire disk
as disks are getting bigger, fsck is taking longer and longer
modern disks: fsck can take a full day!
• Newer file systems try to help here
are more clever about the order in which writes happen, and
where writes are directed
o e.g., Journaling file system: collect recent writes in a log called a
journal. On crash, run through journal to replay against file
system.
ECS 150 (Operating Systems) File System (1), 62 Source: Gribble, Lazowska,
Levy, Zahorjan
VFS-based File Manager
ECS 150 (Operating Systems) File System (1), 63 Spring 2011 UC Davis
Linux VFS
• Multiple interfaces build
up VFS: File access
files
dentries
inodes
superblock
quota
• VFS can do all caching &
provides utility fctns to FS
• FS provides methods to
VFS; many are optional
ECS 150 (Operating Systems) File System (1), 64 Source: P.J.Braam, CMU
User level file access
ECS 150 (Operating Systems) File System (1), 65 Source: P.J.Braam, CMU
VFS
ECS 150 (Operating Systems) File System (1), 66 Source: P.J.Braam, CMU
File system level
ECS 150 (Operating Systems) File System (1), 67 Source: P.J.Braam, CMU
Anatomy of stat system call
sys_stat(path, buf) {
dentry = namei(path);
if ( dentry == NULL ) return -
ENOENT; Establish VFS data
inode = dentry->d_inode;
rc =inode->i_op-
Call into inode
>i_permission(inode);
layer of filesystem
if ( rc ) return -EPERM;
Call into inode
rc = inode->i_op- layer of filesystem
>i_getattr(inode, buf);
dput(dentry);
return rc;
}
ECS 150 (Operating Systems) File System (1), 68 Source: P.J.Braam, CMU
Anatomy of fstatfs system call
ECS 150 (Operating Systems) File System (1), 69 Source: P.J.Braam, CMU
DOS FAT Files
File Descriptor
43 254
… 107
Disk Disk Disk
Block Block Block
File Descriptor
43 43 254
… 107
107 Disk Disk Disk
Block Block Block
254