
This Lecture: Physical Reality (Disks) File System Abstraction

The document discusses file system abstraction and components including disk management, naming, security, reliability, and file attributes. It describes how file operations like create, write, read, delete and open are implemented. Data structures for open file tables, file descriptors and disk management are also covered.

Uploaded by

hoang.van.tuan

File Systems

Arvind Krishnamurthy
Spring 2001

This lecture

- Implementing file system abstraction

Physical Reality (Disks)                     File System Abstraction
block oriented                               byte oriented
physical sector #’s                          named files
no protection                                users protected from each other
data might be corrupted if machine crashes   robust to machine failures

File System Components

- Disk management
  - Arrange collection of disk blocks into files
- Naming
  - User gives file name, not track or sector number, to locate data
- Security / protection
  - Keep information secure
- Reliability/durability
  - When system crashes, lose stuff in memory, but want files to be durable

[Figure: layered diagram with the labels User, File naming, File access, Disk management, Disk drivers]

User vs. system view of a file

- User’s view
  - Durable data structures
- System’s view (system call interface)
  - Collection of bytes (Unix)
  - Collection of records (IBM)
- System’s view (inside OS):
  - Collection of blocks
  - A block is a logical transfer unit, while a sector is the physical transfer unit. Block size >= sector size.

File usage patterns

- How do users access files?
  - Sequential: bytes read in order
  - Random: read/write element out of middle of arrays
  - Content-based access: find me next byte starting with “CS422”
- How are files used?
  - Most files are small
  - Large files use up most of the disk space
  - Large files account for most of the bytes transferred
- Bad news
  - Need everything to be efficient

File attributes

- Name – only information kept in human-readable form.
- Type – needed for systems that support different types.
- Location – pointer to file location on device.
- Size – current file size.
- Protection – controls who can do reading, writing, executing.
- Time, date, and user identification – data for protection, security, and usage monitoring.
- Information about files is kept in the directory structure, which is maintained on the disk.

File operations

- create
- write
- read
- reposition within file – file seek
- delete
- truncate
- open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry to memory.
- close(Fi) – move the content of entry Fi in memory to directory structure on disk.

Data structures for a typical file system

[Figure: Process control block (open file pointer array) -> Open file table (systemwide) -> File descriptors (metadata) -> File system on disk: file system info, file descriptors, directories, file data]

Open a file

fd = open( FileName, access)

- File name lookup and authenticate
- Copy the file descriptors into the in-memory data structure, if it is not in yet
- Create an entry in the open file table (system wide) if there isn’t one
- Create an entry in PCB
- Link up the data structures
- Return a pointer to user

[Figure: PCB -> Open file table -> Metadata -> File system on disk, annotated “Allocate & link up data structures” and “File name lookup & authenticate”]

Translating from user to system view

- What happens if user wants to read 10 bytes from a file starting at byte 2?
  - seek byte 2
  - fetch the block
  - read 10 bytes
- What happens if user wants to write 10 bytes to a file starting at byte 2?
  - seek byte 2
  - fetch the block
  - write 10 bytes
  - write out the block
- Everything inside file system is in whole size blocks
  - Even getc and putc buffers 4096 bytes
- From now on, file is collection of blocks.
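The byte-to-block translation above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: `BlockDevice`, `read_bytes` and `write_bytes` are hypothetical names, and the 4096-byte block size is borrowed from the getc/putc remark.

```python
BLOCK_SIZE = 4096  # assumed block size, borrowed from the getc/putc remark


class BlockDevice:
    """Toy disk that can only transfer whole blocks (hypothetical helper)."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = bytes(data)


def read_bytes(dev, offset, length):
    """Byte-oriented read built from whole-block fetches."""
    out = bytearray()
    while length > 0:
        block_no, start = divmod(offset, BLOCK_SIZE)   # "seek"
        block = dev.read_block(block_no)               # fetch the block
        chunk = block[start:start + length]            # read the bytes
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return bytes(out)


def write_bytes(dev, offset, data):
    """Byte-oriented write: fetch the block, modify it, write it back out."""
    pos = 0
    while pos < len(data):
        block_no, start = divmod(offset + pos, BLOCK_SIZE)
        block = bytearray(dev.read_block(block_no))    # fetch the block
        n = min(len(data) - pos, BLOCK_SIZE - start)
        block[start:start + n] = data[pos:pos + n]     # write the bytes
        dev.write_block(block_no, block)               # write out the block
        pos += n


dev = BlockDevice(num_blocks=2)
write_bytes(dev, 2, b"0123456789")
print(read_bytes(dev, 2, 10))
```

Even a 10-byte write costs one full block read and one full block write, which is why real systems keep recently used blocks in a buffer cache.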

Read a block

read( fd, userBuf, size )

- PCB: find open file descriptor
- Open file table: read( fileDesc, userBuf, size )
- Metadata: logical → physical, then read( device, phyBlock, size )
- Buffer cache: get physical block to sysBuf, copy to userBuf
- Disk device driver

Data structures for Disks

- A file header for each file (part of the file meta-data)
  - Disk sectors associated with each file
- A data structure to represent free space on disk
  - Bit map
    - 1 bit per block (sector)
    - blocks numbered in cylinder-major order, why?
  - Linked list
  - Others?
- How much space does a bitmap need for a 4G disk?
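One way to approach the bitmap question is to compute it directly. The answer depends on the block size, which the slide does not fix, so the sketch below tries two assumed sizes (512-byte sectors and 4 KB blocks). `bitmap_bytes` and `FreeBitmap` are hypothetical names used only for illustration.

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a free-space bitmap with 1 bit per block."""
    num_blocks = disk_bytes // block_bytes
    return (num_blocks + 7) // 8   # round up to whole bytes


class FreeBitmap:
    """Free-space bitmap: bit i is 1 when block i is in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it used."""
        for i in range(self.num_blocks):
            if not self.is_used(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        raise RuntimeError("disk full")

    def free(self, i):
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF


GIB = 2 ** 30
print(bitmap_bytes(4 * GIB, 512))    # 512-byte sectors: 1 MiB of bitmap
print(bitmap_bytes(4 * GIB, 4096))   # 4 KB blocks: 128 KiB of bitmap
```

Under these assumptions the bitmap is small enough to keep in memory, which is part of its appeal compared with a linked free list.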

Contiguous allocation

- Request in advance for the size of the file
- Search bit map or linked list to locate a space
- File header
  - first sector in file
  - number of sectors
- Pros
  - Fast sequential access
  - Easy random access
- Cons
  - External fragmentation
  - Hard to grow files

Linked files

- File header points to 1st block on disk
- A block points to the next
- Pros
  - Can grow files dynamically
  - Free list is similar to a file
  - No waste of space
- Cons
  - Sequential access: slow
  - random access: horrible
  - unreliable: losing a block means losing the rest

[Figure: file header -> block -> ... -> block -> null]
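The external fragmentation problem of contiguous allocation can be seen in a small sketch. It assumes a first-fit search over a list of free flags; the slides do not name a search policy, and `first_fit` and `allocate` are hypothetical names.

```python
def first_fit(free_map, count):
    """Find `count` consecutive free blocks; return the first index or None.

    free_map is a list of booleans, True meaning the block is free.
    """
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return None


def allocate(free_map, count):
    """Allocate a contiguous file; the header is (first block, block count)."""
    start = first_fit(free_map, count)
    if start is None:
        return None
    for i in range(start, start + count):
        free_map[i] = False
    return (start, count)


free_map = [True] * 10
a = allocate(free_map, 3)      # blocks 0-2
b = allocate(free_map, 3)      # blocks 3-5
c = allocate(free_map, 3)      # blocks 6-8
for i in range(b[0], b[0] + b[1]):
    free_map[i] = True         # delete file b, leaving a 3-block hole
# Four blocks are free in total (3, 4, 5 and 9), but no run of four exists.
print(allocate(free_map, 4))   # external fragmentation: prints None
```

Random access stays cheap in this scheme because block k of a file is simply `start + k`.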

File Allocation Table (FAT)

[Figure: directory entry “foo” points to block 217; the FAT shows a 0 entry at the top, then 217 -> 619, 619 -> 399, 399 -> EOF]

Linked files (cont’d)

- Approach (used by MSDOS)
  - A section of disk for each partition is reserved
  - One entry for each block
  - A file is a linked list of blocks
  - A directory entry points to the 1st block of the file
- Pros
  - Simple
n Improved random access
- Cons
  - Always go to FAT
  - Wasting space
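The FAT chain from the figure (foo starts at block 217, then 619, then 399, then EOF) can be walked with a short sketch. The `EOF = -1` encoding and the function names are assumptions for illustration, not the MS-DOS on-disk format.

```python
EOF = -1   # end-of-chain marker (hypothetical encoding)

# FAT entries from the figure: entry i holds the number of the block after i.
fat = {217: 619, 619: 399, 399: EOF}
directory = {"foo": 217}   # directory entry points to the 1st block


def file_blocks(name):
    """Return the disk blocks of a file, in order, by walking the FAT."""
    blocks = []
    block = directory[name]
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(name, n):
    """Random access still follows n links, but only inside the FAT."""
    block = directory[name]
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("block index past end of file")
    return block


print(file_blocks("foo"))
print(nth_block("foo", 2))
```

Because the links live in one table rather than inside the data blocks, following them does not require reading the data blocks themselves.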

Single-level indexed files (Nachos)

- A user declares max size
- A file header holds an array of pointers to point to disk blocks
- Pros
  - Can grow up to a limit
  - Random access is fast
  - No external fragmentation
- Cons
  - Clumsy to grow beyond the limit
  - Still lots of seeks

Single-level indexed files (cont’d)

[Figure: file header holding an array of pointers to disk blocks]

Multi-level indexed files

[Figure: outer-index -> index table -> file]

Combined Scheme (Unix 4.1)

- 13 Pointers in a header
  - 10 direct pointers
  - 11: 1-level indirect
  - 12: 2-level indirect
  - 13: 3-level indirect
- Pros & Cons
  - In favor of small files
  - Can grow
  - Limit is 16G and lots of seek
- What happens to reach block 23, 5, 340?

[Figure: header entries 1, 2, ..., 11, 12, 13; entries 1-10 point to data blocks, entry 11 to one level of index blocks, entry 12 to two levels, entry 13 to three levels]
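The question about blocks 23, 5 and 340 can be worked through with a sketch of the mapping. It assumes 1 KB blocks and 4-byte block pointers (256 pointers per index block), the figures used in the read example later in these slides, and 0-based block numbers. `locate` and `max_file_bytes` are hypothetical names.

```python
BLOCK_SIZE = 1024                 # assumed: 1 KB blocks
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # assumed: 4-byte block pointers -> 256
NUM_DIRECT = 10                   # 10 direct pointers, from the slide


def locate(block_no):
    """Map a 0-based file block number to (level, index within each index block)."""
    n = block_no
    if n < NUM_DIRECT:
        return ("direct", [n])
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single indirect", [n])
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double indirect", list(divmod(n, PTRS_PER_BLOCK)))
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        outer, rest = divmod(n, PTRS_PER_BLOCK ** 2)
        return ("triple indirect", [outer, *divmod(rest, PTRS_PER_BLOCK)])
    raise ValueError("block number beyond maximum file size")


def max_file_bytes():
    blocks = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
    return blocks * BLOCK_SIZE


for b in (23, 5, 340):
    print(b, locate(b))
print(max_file_bytes() / 2 ** 30)   # a little over 16 (GiB)
```

Under these assumptions block 5 is reached through a direct pointer, block 23 needs one extra index-block read, and block 340 needs two. The computed maximum file size is consistent with the 16G limit on the slide.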

Unix file header (I-node)

File header storage

- Where is file header stored on disk?
  - In (early) Unix & DOS FAT file sys, special array in outermost cylinders
- Unix refers to file by index into array --- tells it where to find the file header
  - “i-node” --- file header; “i-number” --- index into the array
- Unix file header organization (seems strange):
  - header not anywhere near the data blocks. To read a small file, seek to get header, seek back to data.
  - fixed size, set when disk is formatted.

File header storage (cont’d)

- Why not put headers near data?
  - Reliability: whatever happens to the disk, you can find all of the files.
  - Unix BSD 4.2 puts portion of the file header array on each cylinder. For small directories, can fit all data, file headers, etc. in same cylinder => no seeks!
  - File headers are much smaller than a whole block (a few hundred bytes), so multiple file headers fetched from disk at same time.
- Q: do you ever look at a file header without reading the file?
  - Yes! Reading the header is 4 times more common than reading the file (e.g., ls, make).

Disk Layout

[Figure: Boot block | Super block | File descriptors (i-node in Unix) | File data blocks]

- Superblock defines a file system
  - size of the file system
  - size of the file descriptor area
  - free list pointer, or pointer to bitmap
  - location of the file descriptor of the root directory
  - other meta-data such as permission and various times

Naming and directories

- Options
  - Use index (ask users to specify inode number). Easier for system, not as easy for users.
  - Text name (need to map to index)
  - Icon (need to map to index; or map to name then to index)
- Directories
  - A directory maps a name to a file index (where to find the file header)
  - Directory is just a table of file name, file index pairs.
  - Each directory is stored as a file, containing (name, index) pairs.
  - Only OS permitted to modify directory

Directory structure

- Approach 1: have a single directory for entire system.
  - put directory at known location on disk
  - directory contains <name, index> pairs
  - if one user uses a name, no one else can
  - many older personal computers work this way.
- Approach 2: have a single directory for each user
  - still clumsy. And ls on 10,000 files is a real pain
  - many older mathematicians work this way.
- Approach 3: hierarchical name spaces
  - allow directory to map names to files or other dirs
  - file system forms a tree (or graph, if links allowed)
  - large name spaces tend to be hierarchical (ip addresses, domain names, scoping in programming languages, etc.)

Hierarchical Unix

- Used since CTSS (1960s)
  - Unix picked up and used really nicely.
- Directories stored on disk just like regular files
  - inode contains special flag bit set
  - users can read just like any other file
  - only special programs can write
  - file pointed to by the index may be another directory
  - makes FS into hierarchical tree (what needed to make a DAG?)
- Simple. Plus speeding up file ops = speeding up dir ops!

[Figure: root directory “/” containing afs, bin, cdrom, dev, sbin, tmp; bin contains awk, chmod, chown. The root directory file holds <name, inode#> pairs: <afs, 1021>, <tmp, 1020>, <bin, 1022>, <cdrom, 4123>, <dev, 1001>, <sbin, 1011>, ...]

Naming magic

- Bootstrapping: Where do you start looking?
  - Root directory
  - inode #2 on the system
  - 0 and 1 used for other purposes
- Special names:
  - Root directory: “/” (bootstrap name system for users)
  - Current directory: “.”
  - Parent directory: “..” (otherwise how to go up??)
  - user’s home directory: “~”
- Using the given names, only need two operations to navigate the entire name space:
  - cd ‘name’: move into (change context to) directory “name”
  - ls : enumerate all names in current directory (context)

Unix example: /a/b/c.c

[Figure: name space on the left (directory a containing b, which contains c.c, with “.” and “..” entries); physical organization on the right (disk with inode table entries 2, 3, 4, 5, ... and directory blocks holding <a,3>, <b,5>, <c.c, 14>)]

- What inode holds file for a? b? c.c?
- How many disk I/O’s to access first byte of c.c?

Default context: working directory

- Cumbersome to constantly specify full path names
  - in Unix, each process associated with a “current working directory”
  - file names that do not begin with “/” are assumed to be relative to the working directory, otherwise translation happens as before
- Shells track a default list of active contexts
  - a “search path”
  - given a search path { A, B, C } a shell will check in A, then check in B, then check in C
  - can escape using explicit paths: “./foo”

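The /a/b/c.c lookup, and the rule that names without a leading “/” are resolved from the working directory, can be sketched as follows. The inode numbers come from the figure (<a,3>, <b,5>, <c.c,14>, root at inode 2). The I/O count assumes nothing is cached and that every inode and every directory costs one disk read each, which is an assumption, not a figure from the slides. `namei` here is a simplified stand-in that ignores “.” and “..”.

```python
ROOT_INODE = 2   # root directory is inode #2

# Hypothetical in-memory stand-in for the disk: inode number -> contents.
# Directories are dicts of <name, inode#>; regular files are bytes.
inodes = {
    2: {"a": 3},          # "/"
    3: {"b": 5},          # "/a"
    5: {"c.c": 14},       # "/a/b"
    14: b"int main() { return 0; }\n",
}


def namei(path, cwd_inode=ROOT_INODE):
    """Translate a path to an inode number, counting simulated disk reads."""
    ino = ROOT_INODE if path.startswith("/") else cwd_inode
    reads = 1                                  # read the starting inode
    for name in path.split("/"):
        if not name:
            continue
        directory = inodes[ino]
        if not isinstance(directory, dict):
            raise NotADirectoryError(name)
        reads += 1                             # read the directory's data block
        if name not in directory:
            raise FileNotFoundError(path)
        ino = directory[name]
        reads += 1                             # read the next inode
    return ino, reads


ino, reads = namei("/a/b/c.c")
print(ino, reads + 1)              # +1 for the first data block of c.c
print(namei("c.c", cwd_inode=5))   # relative lookup from /a/b
```

Under these assumptions the absolute lookup costs 8 reads while the relative one needs 3 before the data block, which is one reason a working directory is more than a typing convenience.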
Creating synonyms: hard and soft links

- More than one dir entry can refer to a given file
  - Unix stores count of pointers (“hard links”) to inode
  - to make: “ln foo bar” creates a synonym (‘bar’) for ‘foo’
  - [Figure: foo and bar both point to one inode with ref = 2]
- Soft links:
  - also point to a file (or dir), but object can be deleted from underneath it (or never even exist).
  - Unix builds like directories: normal file holds pointer to name, with special “sym link” bit set
  - [Figure: “baz” -> /bar]
  - When the file system encounters a symbolic link it automatically translates it (if possible).

Example: basic system calls in Unix

What happens when you open and read a file?

- open
- read
- close
- lseek
- create
- write
...
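The difference between hard and soft links can be observed with Python's standard `os` module on a Unix-like system. This is a small experiment rather than part of the lecture; the file names foo, bar and baz mirror the slide, and creating symbolic links may need extra privileges on some non-Unix platforms.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    foo = os.path.join(d, "foo")
    bar = os.path.join(d, "bar")
    baz = os.path.join(d, "baz")

    with open(foo, "w") as f:
        f.write("hello\n")

    os.link(foo, bar)                 # like "ln foo bar"
    os.symlink(bar, baz)              # like "ln -s bar baz"

    same_inode = os.stat(foo).st_ino == os.stat(bar).st_ino
    link_count = os.stat(foo).st_nlink          # counts foo and bar

    os.remove(foo)                    # data survives through bar
    with open(bar) as f:
        via_hard_link = f.read()

    os.remove(bar)                    # now baz points at a missing name
    dangling = os.path.islink(baz) and not os.path.exists(baz)

print(same_inode, link_count, repr(via_hard_link), dangling)
```

The hard link keeps the inode alive until its reference count drops to zero, while the soft link only stores a name and is left dangling once that name is gone.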

Example: the open-read-close cycle

1. The process calls open (“DATA.test”, RD_ONLY)
2. The kernel:
   - Get the current working directory of the process: let’s say “/c/cs422/as/as3”
   - Call “namei”:
       Get the inode for the root directory “/”
       For (each component in the path) {
           can we open and read the directory file ?
           if no, open request failed, return error;
           if yes, read the blocks in the directory file;
           Based on the information from the I-node, read through the
           directory file to find the inode for the next component;
       }
       At the end of the loop, we have the inode for the file DATA.test

Example: open-read-close (cont’d)

1. The process calls open (“DATA.test”, RD_ONLY)
2. The kernel:
   - Get the current working directory of the process
   - Call “namei” and get the inode for DATA.test;
   - Find an empty slot “fd” in the file descriptor table for the process;
   - Put the pointer to the inode in the slot “fd”;
   - Set the initial file pointer value in the slot “fd” to 0;
   - Return “fd”.
3. The process calls read(fd, buffer, length);
4. The kernel:
   - From “fd” find the file pointer
   - Based on the file system block size (let’s say 1 KB), find the blocks where the bytes (file_pointer, file_pointer+length) lie;
   - Read the inode
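The descriptor-table steps in the open path above (find an empty slot, store the inode pointer, set the file pointer to 0, return the slot index) can be sketched as below. `OpenFile` and `Process` are hypothetical classes for illustration, and the inodes are represented by placeholder strings.

```python
class OpenFile:
    """One slot of a per-process file descriptor table (hypothetical layout)."""

    def __init__(self, inode):
        self.inode = inode      # pointer to the in-memory inode
        self.file_pointer = 0   # initial file pointer value


class Process:
    def __init__(self, max_files=4):
        self.fd_table = [None] * max_files

    def open(self, inode):
        """Put the inode in the first empty slot and return its index."""
        for fd, slot in enumerate(self.fd_table):
            if slot is None:
                self.fd_table[fd] = OpenFile(inode)
                return fd
        raise OSError("too many open files")

    def close(self, fd):
        """Deallocate the entry by marking the slot empty."""
        if self.fd_table[fd] is None:
            raise OSError("bad file descriptor")
        self.fd_table[fd] = None


p = Process()
fd0 = p.open(inode="inode-for-DATA.test")
fd1 = p.open(inode="inode-for-README")
p.close(fd0)
fd2 = p.open(inode="inode-for-other")   # reuses the freed slot
print(fd0, fd1, fd2)
```

Returning a small integer rather than a kernel pointer means the process never holds a reference it could forge or corrupt.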

Example: open-read-close (cont’d)

4. The kernel:
   - From “fd” find the file pointer
   - Based on the file system block size (let’s say 1 KB), find the blocks where the bytes (file_pointer, file_pointer+length) lie;
   - Read the inode
   - For (each block) {
       - If the block # < 11, find the disk address of the block in the entries in the inode
       - If the block # >= 11, but < 11 + (1024/4): read the “single indirect” block to find the address of the block
       - If the block # >= 11+(1024/4) but < 11 + 256 + 256 * 256: read the “double indirect” block and find the block’s address
       - Otherwise, read the “triple indirect” block and find the block’s address }
   - Read the block from the disk
   - Copy the bytes in the block to the appropriate location in the buffer
5. The process calls close(fd);
6. The kernel: deallocate the fd entry, mark it as empty.

Example: the create-write-close cycle

1. The process calls create (“README”);
2. The kernel:
   - Get the current working directory of the process: let’s say “/c/cs422/as/as3”
   - Call “namei” and see if a file name “README” already exists in that directory
   - If yes, return error “file already exists”;
   - If no:
       Allocate a new inode;
       Write the directory file “/c/cs422/as/as3” to add a new entry for the (“README”, disk address of inode) pair
   - Find an empty slot “fd” in the file descriptor table for the process;
   - Put the pointer to the inode in the slot “fd”;
   - Set the file pointer in the slot “fd” to 0;
   - Return “fd”;
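From user space, both cycles can be exercised through Python's `os` module, which wraps the underlying system calls on Unix-like systems. This is only a usage sketch: the file name follows the slides, the contents are made up, and `O_CREAT | O_EXCL` is used so that creation fails if the file already exists, matching the “file already exists” branch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "DATA.test")

    # create-write-close
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.write(fd, b"0123456789abcdef")
    os.close(fd)

    # open-read-close
    fd = os.open(path, os.O_RDONLY)
    first = os.read(fd, 4)                  # file pointer starts at 0
    os.lseek(fd, 10, os.SEEK_SET)           # reposition within the file
    rest = os.read(fd, 100)                 # short read at end of file
    os.close(fd)

print(first, rest)
```

The second read asks for 100 bytes but returns only what remains in the file, so callers need to check the length of the result.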

Example: create-write-close (cont’d)

3. The process calls write(fd, buffer, length);
4. The kernel:
   - From “fd” find the file pointer;
   - Based on the file system block size (let’s say 1 KB), find the blocks where the bytes (file_pointer, file_pointer+length) lie;
   - Read the inode
   - For (each block) {
       - If the block is new, allocate a new disk block;
       - Based on the block no, enter the block’s address to the appropriate places in the inode or the indirect blocks; (the indirect blocks are allocated as needed)
       - Copy the bytes in buffer to the appropriate location in the block }
   - Change the file size field in inode if necessary
5. The process calls close(fd);
6. The kernel deallocates the fd entry and marks it as empty.

Protection

- Goals:
  - Prevent accidental and maliciously destructive behavior
  - Ensure fair resource usage
- A key distinction to make: policy vs. mechanism
  - Mechanism: how something is to be done
  - Policy: what is to be done

Access control

- Domain structure
  - Access/usage rights associated with particular domain
  - Example: user/kernel mode => two domains
  - Unix: each user is a domain; super-user domain; groups of users (and groups)
- Type of access rights
  - For files: read/write/execute
  - For directories: list/modify/delete
  - For access rights themselves
    - Owner (I have the right to change the access rights for some resource)
    - Copy (I have the right to give someone else a copy of an access right I have)
    - Control (I have the right to revoke someone else’s access rights)

Access control matrix

- Conceptually, we can think of the system enforcing access controls based on a giant table that encodes all access rights held by each domain in the system

Example:

          File1  File2  File3  Dir1  Dir2  …
UserA     rw     r      rwx    lmd   l     …
GroupB    r  rw  lm  …   (column positions of this row were lost in extraction)
…         …      …      …      …     …     …

The access control matrix is the “policy” we want to enforce;
Mechanisms: (1) access control lists
            (2) capability lists
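The two mechanisms are two ways of slicing the same matrix: by column (one list per object) or by row (one list per domain). The sketch below uses the UserA row from the example. Group B's File3 entry follows the rw rights that the slides' ACL example gives it; the rest of Group B's row is omitted because its column positions are unclear. All function names are hypothetical.

```python
from collections import defaultdict

# (domain, object) -> rights
matrix = {
    ("UserA", "File1"): "rw",
    ("UserA", "File2"): "r",
    ("UserA", "File3"): "rwx",
    ("UserA", "Dir1"): "lmd",
    ("UserA", "Dir2"): "l",
    ("GroupB", "File3"): "rw",
}


def access_control_lists(m):
    """Slice the matrix by column: per object, the rights of each domain."""
    acls = defaultdict(dict)
    for (domain, obj), rights in m.items():
        acls[obj][domain] = rights
    return dict(acls)


def capability_lists(m):
    """Slice the matrix by row: per domain, the rights on each object."""
    caps = defaultdict(dict)
    for (domain, obj), rights in m.items():
        caps[domain][obj] = rights
    return dict(caps)


def allowed(m, domain, obj, right):
    """Policy check: does the matrix grant `right` on `obj` to `domain`?"""
    return right in m.get((domain, obj), "")


print(access_control_lists(matrix)["File3"])
print(capability_lists(matrix)["UserA"])
print(allowed(matrix, "GroupB", "File3", "x"))
```

Both derived structures answer the same `allowed` question; they differ in which updates are cheap, changing one object's permissions or changing one domain's.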

Access control lists vs. capability lists

- Access control lists (ACL): keep lists of access for each domain with each object:
    File3: User A: rwx
           Group B: rw
           …
- Capability lists (CAP): keep lists of access rights for each object with each domain
    User A: File1: rw
            File2: r
            …
- Which is better?
  - ACLs allow easy changing of an object’s permissions
  - Capability lists allow easy changing of a domain’s permissions

A combined approach

- Objects have ACLs
- Users have CAPs, called “groups” or “roles”
- ACLs can refer to users or groups
- Change permissions on an object by modifying its ACL
- Change broad user permissions via changes in group membership

Revocation

- How does one revoke someone’s access rights to a particular object?
  - Easy with ACLs: just remove entry from the list. Takes effect immediately since the ACL is checked on each object access.
  - Harder to do with CAPs since they are not stored with the object being controlled.
    - Not so bad in a single machine: could keep all capability lists in a well-known place (e.g., the OS capability table).
    - Very hard in distributed system, where remote hosts may have crashed or may not cooperate.
    - (Solutions: expiration dates, back pointers to all CAPs that have been handed out; maintain a revocation list that gets checked on every access attempt)
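The three listed solutions (expiration dates, back pointers, a revocation list checked on every access) can be combined in one small sketch. `CapabilityServer` is a hypothetical class, time is modelled as a plain integer supplied by the caller, and nothing here addresses forgery or the network failures that make the distributed case hard.

```python
class CapabilityServer:
    """Object owner that hands out capabilities and can revoke them later."""

    def __init__(self):
        self.next_id = 0
        self.issued = {}        # back pointers: cap id -> (domain, obj, rights, expires)
        self.revoked = set()    # revocation list, checked on every access

    def grant(self, domain, obj, rights, expires):
        cap_id = self.next_id
        self.next_id += 1
        self.issued[cap_id] = (domain, obj, rights, expires)
        return cap_id

    def revoke(self, domain, obj):
        """Use the back pointers to find every capability to revoke."""
        for cap_id, (d, o, _, _) in self.issued.items():
            if d == domain and o == obj:
                self.revoked.add(cap_id)

    def check(self, cap_id, obj, right, now):
        if cap_id in self.revoked or cap_id not in self.issued:
            return False
        _, o, rights, expires = self.issued[cap_id]
        return o == obj and right in rights and now < expires


server = CapabilityServer()
cap = server.grant("UserA", "File1", "rw", expires=100)
print(server.check(cap, "File1", "w", now=10))    # still valid
print(server.check(cap, "File1", "w", now=200))   # past its expiration date
server.revoke("UserA", "File1")
print(server.check(cap, "File1", "r", now=10))    # on the revocation list
```

Expiration bounds how long a capability that cannot be reached stays usable, while the revocation list takes effect immediately at the cost of a lookup on every access.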
