Operating Systems - File-System Interface
File-System Interface
References:
1. Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, "Operating System Concepts, Eighth Edition", Chapter 10
Windows ( and some other systems ) use special file extensions to indicate the type of each file.
https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/10_FileSystemInterface.html 1/13
2/20/2021 Operating Systems: File-System Interface
Macintosh stores a creator attribute for each file, according to the program that first created it with the create( )
system call.
UNIX stores magic numbers at the beginning of certain files. ( Experiment with the "file" command, especially in
directories such as /bin and /dev. )
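As an illustration of how magic-number detection works, here is a minimal Python sketch that compares the first bytes of a file against a few well-known signatures. The signature table and the function names are hypothetical and deliberately tiny; the real "file" command consults a much larger database and applies other tests as well.

```python
# A small, hand-picked subset of magic numbers, for illustration only.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable or object file"),
    (b"#!", "script with interpreter line"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify(header: bytes) -> str:
    """Describe the data by the first matching magic number, if any."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "data (no known magic number)"


def identify_file(path: str) -> str:
    # Only the first few bytes are needed; 16 covers every signature above.
    with open(path, "rb") as f:
        return identify(f.read(16))


if __name__ == "__main__":
    print(identify(b"\x7fELF\x02\x01\x01"))  # ELF executable or object file
    print(identify(b"hello, world"))         # data (no known magic number)
```

Note that the OS itself only relies on such numbers for a few formats ( e.g. executables and "#!" scripts ); for everything else the magic number is merely a convention that tools like "file" can exploit.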
Some files contain an internal structure, which may or may not be known to the OS.
Supporting particular file formats in the OS increases the size and complexity of the OS.
UNIX treats all files as sequences of bytes, with no further consideration of the internal structure. ( The
exception is executable binary programs, which the OS must know how to load, including where to find the first
instruction to execute. )
Macintosh files have two forks - a resource fork, and a data fork. The resource fork contains information relating
to the UI, such as icons and button images, and can be modified independently of the data fork, which contains the
code or data as appropriate.
Disk files are accessed in units of physical blocks, typically 512 bytes or some power-of-two multiple thereof.
( Larger physical disks use larger block sizes, to keep the range of block numbers within the range of a 32-bit
integer. )
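To make the 32-bit limit concrete: if block numbers are unsigned 32-bit integers ( an assumption; a signed type would halve these figures ), the largest addressable disk is 2^32 blocks times the block size. A quick Python check, with a hypothetical helper name:

```python
BLOCK_NUMBERS = 2 ** 32  # distinct values of an unsigned 32-bit block number


def max_capacity(block_size: int) -> int:
    """Largest disk, in bytes, addressable with 32-bit block numbers."""
    return BLOCK_NUMBERS * block_size


if __name__ == "__main__":
    for block_size in (512, 4096):
        tib = max_capacity(block_size) // 2 ** 40
        print(f"{block_size:>5}-byte blocks -> {tib} TiB")
    #   512-byte blocks -> 2 TiB
    #  4096-byte blocks -> 16 TiB
```

So going from 512-byte to 4096-byte blocks raises the addressable size eightfold without widening the block-number type.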
Internally, files are organized in logical units, which may be as small as a single byte, or may be a larger
size corresponding to some data record or structure size.
The number of logical units which fit into one physical block determines its packing, and has an impact on the
amount of internal fragmentation ( wasted space ) that occurs.
As a general rule, half a physical block is wasted per file on average, and the larger the block size, the more
space is lost to internal fragmentation.
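The following Python sketch computes packing, the space left over in each block, and the waste in a file's last block. The helper names are hypothetical, and it assumes records may not span blocks. Averaging the last-block waste over every file size from 1 byte up to 8 blocks gives ( B - 1 ) / 2 bytes for block size B, i.e. roughly half a block, which matches the rule of thumb above if file sizes are uniformly distributed.

```python
def packing(block_size: int, record_size: int) -> int:
    """Whole logical records per physical block (records may not span blocks)."""
    return block_size // record_size


def wasted_per_block(block_size: int, record_size: int) -> int:
    """Bytes left unused in each block after packing whole records into it."""
    return block_size - packing(block_size, record_size) * record_size


def internal_fragmentation(file_size: int, block_size: int) -> int:
    """Bytes wasted in the last, partially filled block of a file."""
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks * block_size - file_size


def average_waste(block_size: int, max_blocks: int = 8) -> float:
    """Mean last-block waste over all file sizes 1 .. max_blocks * block_size."""
    sizes = range(1, max_blocks * block_size + 1)
    return sum(internal_fragmentation(s, block_size) for s in sizes) / len(sizes)


if __name__ == "__main__":
    print(packing(4096, 100), wasted_per_block(4096, 100))  # 40 96
    print(internal_fragmentation(5000, 4096))               # 3192
    print(average_waste(512))                               # 255.5
```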
10.2.1 Sequential Access
A sequential access file emulates magnetic tape operation, and generally supports a few operations:
read next - read a record and advance the tape to the next position.
write next - write a record and advance the tape to the next position.
rewind
skip n records - May or may not be supported. n may be limited to positive numbers, or may be limited to
+/- 1.
Direct access allows a program to jump to any record and read that record. Operations supported include:
read n - read record number n. ( Note an argument is now required. )
write n - write record number n. ( Note an argument is now required. )
jump to record n - could be 0 or the end of file.
Query current record - used to return to this record later.
Sequential access can be easily emulated using direct access. The inverse is complicated and inefficient.
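The claim that sequential access is easy to emulate on direct access can be sketched in Python: keep a current-position pointer cp, so that "read next" becomes "read cp" followed by cp = cp + 1. The record size, the class names, and the use of an in-memory io.BytesIO stream are assumptions for illustration; a file opened in binary read/write mode should behave the same way for seek( ), read( ), and write( ).

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


class DirectAccessFile:
    """Fixed-size records stored in any seekable binary stream."""

    def __init__(self, stream):
        self.stream = stream

    def read(self, n: int) -> bytes:
        """'read n': seek straight to record n; returns b'' past end of file."""
        self.stream.seek(n * RECORD_SIZE)
        return self.stream.read(RECORD_SIZE)

    def write(self, n: int, record: bytes) -> None:
        """'write n': overwrite record n, padding it to RECORD_SIZE bytes."""
        if len(record) > RECORD_SIZE:
            raise ValueError("record larger than RECORD_SIZE")
        self.stream.seek(n * RECORD_SIZE)
        self.stream.write(record.ljust(RECORD_SIZE, b"\0"))


class SequentialView:
    """Sequential access emulated on top of direct access via a current pointer."""

    def __init__(self, direct: DirectAccessFile):
        self.direct = direct
        self.cp = 0  # current position, counted in records

    def rewind(self) -> None:
        self.cp = 0

    def read_next(self) -> bytes:
        record = self.direct.read(self.cp)
        if record:  # do not advance past the end of the file
            self.cp += 1
        return record

    def write_next(self, record: bytes) -> None:
        self.direct.write(self.cp, record)
        self.cp += 1


if __name__ == "__main__":
    direct = DirectAccessFile(io.BytesIO())
    seq = SequentialView(direct)
    for word in (b"alpha", b"beta", b"gamma"):
        seq.write_next(word)
    seq.rewind()
    print(seq.read_next().rstrip(b"\0"))  # b'alpha'
    print(direct.read(2).rstrip(b"\0"))   # b'gamma'
```

Going the other way, reading record n on a purely sequential device would mean rewinding and then reading n records, which is why the inverse emulation is inefficient.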
An indexed access scheme can be easily built on top of a direct access system. Very large files may require a multi-
tiered indexing scheme, i.e. indexes of indexes.
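A two-tier index ( an index of indexes ) on top of direct access might look like the Python sketch below. The index-block capacity and all names are hypothetical, and the record list stands in for a direct-access file: the small primary index is searched first to pick a secondary index block, that block is searched for the key, and the record is then fetched with a single "read n".

```python
import bisect

ENTRIES_PER_INDEX_BLOCK = 4  # hypothetical capacity of one index block


def build_index(records):
    """records[n] = (key, value) is record n of a direct-access file.

    Returns (primary, blocks): blocks are sorted runs of (key, n) pairs, and
    primary holds the first key of each block.
    """
    entries = sorted((key, n) for n, (key, _) in enumerate(records))
    blocks = [entries[i:i + ENTRIES_PER_INDEX_BLOCK]
              for i in range(0, len(entries), ENTRIES_PER_INDEX_BLOCK)]
    primary = [block[0][0] for block in blocks]
    return primary, blocks


def lookup(records, index, key):
    primary, blocks = index
    b = bisect.bisect_right(primary, key) - 1  # the only block that could hold key
    if b < 0:
        return None
    keys = [k for k, _ in blocks[b]]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        n = blocks[b][i][1]
        return records[n][1]  # a single direct "read n"
    return None


if __name__ == "__main__":
    records = [("k%02d" % i, i * 10) for i in (7, 2, 9, 0, 5, 1, 8, 3, 6, 4)]
    index = build_index(records)
    print(lookup(records, index, "k05"))  # 50
    print(lookup(records, index, "zz"))   # None
```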
10.3.3 Single-Level Directory
Figure 10.8
Figure 10.9
The tree-structured directory is an obvious extension to the two-tiered directory structure, and the one with which we are all most familiar.
Each user / process has the concept of a current directory from which all ( relative ) searches take place.
Files may be accessed using either absolute pathnames ( relative to the root of the tree ) or relative pathnames
( relative to the current directory. )
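A minimal Python sketch of absolute versus relative path resolution over a hypothetical in-memory tree ( directories are dicts, files are strings ): an absolute path starts at the root, a relative one starts at the current directory. The tree and the function name are assumptions, and "." and ".." are handled lexically, which is a simplification of real UNIX lookup once symbolic links are involved.

```python
# Hypothetical in-memory directory tree: directories are dicts, files are strings.
ROOT = {
    "home": {"alice": {"notes.txt": "file"}},
    "bin": {"ls": "file"},
}


def resolve(path: str, cwd: str = "/"):
    """Return the node named by path; relative paths start from cwd."""
    names = [] if path.startswith("/") else [p for p in cwd.split("/") if p]
    for name in path.split("/"):
        if name in ("", "."):
            continue
        if name == "..":
            if names:  # ".." at the root stays at the root
                names.pop()
        else:
            names.append(name)
    node = ROOT
    for name in names:
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    return node


if __name__ == "__main__":
    print(resolve("/home/alice/notes.txt"))               # file
    print(resolve("notes.txt", cwd="/home/alice"))        # file
    print(resolve("../../bin/ls", cwd="/home/alice"))     # file
```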
Directories are stored the same as any other file in the system, except there is a bit that identifies them as
directories, and they have some special structure that the OS understands.
One question for consideration is whether or not to allow the removal of directories that are not empty - Windows
requires that directories be emptied first, and UNIX provides an option for deleting entire sub-trees.
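In Python, the two policies map onto os.rmdir( ), which refuses to remove a non-empty directory, and shutil.rmtree( ), which deletes an entire sub-tree ( roughly what "rm -r" does on UNIX ). The wrapper below and its name are hypothetical; only the two library calls are real.

```python
import os
import shutil
import tempfile


def remove_directory(path: str, recursive: bool = False) -> bool:
    """Try to delete a directory; return True if it was removed."""
    if recursive:
        shutil.rmtree(path)  # deletes the directory and everything below it
        return True
    try:
        os.rmdir(path)  # succeeds only if the directory is empty
        return True
    except OSError:
        return False


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    project = os.path.join(base, "project")
    os.makedirs(os.path.join(project, "src"))
    with open(os.path.join(project, "src", "main.c"), "w") as f:
        f.write("int main(void) { return 0; }\n")
    print(remove_directory(project))                  # False: not empty
    print(remove_directory(project, recursive=True))  # True: whole sub-tree gone
    os.rmdir(base)
```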
Figure 10.10
When the same files need to be accessed in more than one place in the directory structure ( e.g. because they are
being shared by more than one user / process ), it can be useful to provide an acyclic-graph structure. ( Note the
directed arcs from parent to child. )
UNIX provides two types of links for implementing the acyclic-graph structure. ( See "man ln" for more details. )
A hard link ( usually just called a link ) involves multiple directory entries that all refer to the same file.
Hard links are only valid for ordinary files in the same filesystem.
A symbolic link involves a special file containing information about where to find the linked file.
Symbolic links may be used to link directories and/or files in other filesystems, as well as ordinary files in
the current filesystem.
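Windows traditionally supported only symbolic links, termed shortcuts. ( NTFS does also support hard links and
true symbolic links, e.g. via the mklink command. )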
Hard links require a reference count, or link count for each file, keeping track of how many directory entries are
currently referring to this file. Whenever one of the references is removed the link count is reduced, and when it
reaches zero, the disk space can be reclaimed.
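The link count can be observed directly on a POSIX system with Python's os.link( ) and the st_nlink field of os.stat( ). In the sketch below ( file and function names are hypothetical ), removing the original name only decrements the count, and the data remains reachable through the other directory entry.

```python
import os
import tempfile


def hard_link_demo():
    """Return the link counts observed at each step, plus the surviving data."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("data\n")
        counts.append(os.stat(original).st_nlink)  # one directory entry
        os.link(original, alias)                   # second entry, same file
        counts.append(os.stat(original).st_nlink)
        os.remove(original)                        # removes one entry only
        counts.append(os.stat(alias).st_nlink)
        with open(alias) as f:
            data = f.read()
    return counts, data


if __name__ == "__main__":
    print(hard_link_demo())  # ([1, 2, 1], 'data\n')
```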
For symbolic links there is some question as to what to do with the symbolic links when the original file is moved
or deleted:
One option is to find all the symbolic links and adjust them also.
Another is to leave the symbolic links dangling, and discover that they are no longer valid the next time they
are used.
What if the original file is removed, and replaced with another file having the same name before the
symbolic link is next used?
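Both the dangling-link case and the replaced-file question can be reproduced with Python's os.symlink( ) on a POSIX system. Because the link stores a path name rather than a reference to the file itself, it dangles once the target is removed, and it silently points at whatever file later takes that name. The function and file names are hypothetical.

```python
import os
import tempfile


def symlink_demo():
    """Return (resolves before delete, dangling after delete, text seen later)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("original\n")
        os.symlink(target, link)         # the link stores a path name
        resolves = os.path.exists(link)  # exists() follows the link
        os.remove(target)
        # lexists() checks the link itself; exists() now fails: a dangling link
        dangling = os.path.lexists(link) and not os.path.exists(link)
        with open(target, "w") as f:     # an unrelated new file, same name
            f.write("replacement\n")
        with open(link) as f:
            seen = f.read()
    return resolves, dangling, seen


if __name__ == "__main__":
    print(symlink_demo())  # (True, True, 'replacement\n')
```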
Figure 10.11
If cycles are allowed in the graphs, then several problems can arise:
Search algorithms can go into infinite loops. One solution is to not follow links in search algorithms. ( Or not
to follow symbolic links, and to only allow symbolic links to refer to directories. )
Sub-trees can become disconnected from the rest of the tree and still not have their reference counts reduced
to zero. Periodic garbage collection is required to detect and resolve this problem. ( chkdsk in DOS and fsck
in UNIX search for these problems, among others, even though cycles are not supposed to be allowed in
either system. Disconnected disk blocks that are not marked as free are added back to the file system with
made-up file names, and can usually be safely deleted. )
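The garbage-collection pass can be sketched as the mark phase of a mark-and-sweep collector: mark everything reachable from the root, then report nodes that were not marked but still have nonzero reference counts. The graph representation and names are hypothetical. The visited set is also a standard way to keep a search from looping forever on a cycle, an alternative to simply not following links.

```python
def reference_counts(graph):
    """Count incoming directory entries (links) for every node."""
    counts = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            counts[child] = counts.get(child, 0) + 1
    return counts


def find_garbage(graph, root):
    """Nodes unreachable from root whose reference counts never reached zero."""
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in marked:  # already visited: following it again could loop forever
            continue
        marked.add(node)
        stack.extend(graph.get(node, []))
    counts = reference_counts(graph)
    return {node for node in counts if node not in marked and counts[node] > 0}


if __name__ == "__main__":
    # "b" and "c" link to each other but are no longer reachable from "/".
    graph = {"/": ["a"], "a": [], "b": ["c"], "c": ["b"]}
    print(sorted(find_garbage(graph, "/")))  # ['b', 'c']
```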
Figure 10.12
Figure 10.13
Figure 10.14
The traditional Windows OS runs an extended two-tier directory structure, where the first tier of the structure separates volumes
by drive letters, and a tree structure is implemented below that level.
Macintosh runs a similar system, where each new volume that is found is automatically mounted and added to the
desktop.
More recent Windows systems allow filesystems to be mounted to any directory in the filesystem, much like UNIX.
The advent of the Internet introduces issues for accessing files stored on remote computers.
The original method was ftp, allowing individual files to be transported across systems as needed. Access via
ftp can be either controlled by account and password, or anonymous, not requiring any user name or password.
Various forms of distributed file systems allow remote file systems to be mounted onto a local directory
structure, and accessed using normal file access commands. ( The actual files are still transported across the
network as needed, possibly using ftp as the underlying transport mechanism. )
The WWW has made it easy once again to access files on remote systems without mounting their
filesystems, generally using HTTP ( or anonymous ftp ) as the underlying file transport mechanism.
When one computer system remotely mounts a filesystem that is physically located on another system,
the system which physically owns the files acts as a server, and the system which mounts them is the
client.
User IDs and group IDs must be consistent across both systems for the system to work properly. ( I.e.
this is most applicable across multiple computers managed by the same organization, shared by a
common group of users. )
The same computer can be both a client and a server. ( E.g. cross-linked file systems. )
There are a number of security concerns involved in this model:
Servers commonly restrict mount permission to certain trusted systems only. Spoofing ( a
computer pretending to be a different computer ) is a potential security risk.
Servers may restrict remote access to read-only.
Servers restrict which filesystems may be remotely mounted. Generally the information within
those filesystems is limited, relatively public, and protected by frequent backups.
The NFS ( Network File System ) is a classic example of such a system.
The Domain Name System, DNS, provides for a unique naming system across all of the Internet.
Within an organization, user names, host names, and similar information can be centralized by the
Network Information Service, NIS, which unfortunately has several security issues. NIS+ is a more
secure version, but has not yet gained the same widespread acceptance as NIS.
Microsoft's Common Internet File System, CIFS, establishes a network login for each user on a
networked system with shared file access. Older Windows systems used domains, and newer systems
( Windows 2000, XP ) use Active Directory. User names must match across the network for this system
to be valid.
A newer approach is the Lightweight Directory-Access Protocol, LDAP, which provides a secure
single sign-on for all users to access all resources on a network. This is a secure system which is
gaining in popularity, and which has the maintenance advantage of combining authorization
information in one central location.
When a local disk file is unavailable, the result is generally known immediately, and is generally non-
recoverable. The only reasonable response is for the operation to fail.
However when a remote file is unavailable, there are many possible reasons, and whether or not it is
unrecoverable is not readily apparent. Hence most remote access systems allow for blocking or
delayed response, in the hopes that the remote system ( or the network ) will come back up eventually.
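On the client side, "waiting in the hope that the server comes back" can be sketched as a retry loop with a growing delay. This is one possible policy under stated assumptions, not how any particular remote file system implements it: the function name, the retry limit, the backoff factor, and the choice of ConnectionError and TimeoutError as the possibly recoverable failures are all hypothetical.

```python
import time


def call_with_retry(operation, attempts=5, initial_delay=0.1, backoff=2.0,
                    sleep=time.sleep):
    """Retry an operation that may fail transiently, waiting longer each time.

    After the last attempt the error is passed on to the caller, which is
    the point where a remote failure is finally treated as unrecoverable.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise
            sleep(delay)  # block, hoping the server or network recovers
            delay *= backoff
```

The sleep parameter exists so that tests can record the delays instead of actually waiting.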
Consistency Semantics deals with the consistency between the views of shared files on a networked system. When
one user changes the file, when do other users see the changes?
At first glance this appears to have all of the synchronization issues discussed in Chapter 6. Unfortunately the long
delays involved in network operations prohibit the use of atomic operations as discussed in that chapter.
10.5.3.3 Immutable-Shared-Files Semantics
Under this system, when a file is declared as shared by its creator, it becomes immutable and the name
cannot be re-used for any other resource. Hence it becomes read-only, and shared access is simple.
10.6 Protection
Files must be kept safe for reliability ( against accidental damage ), and protection ( against deliberate malicious access. ) The
former is usually managed with backup copies. This section discusses the latter.
One simple protection scheme is to remove all access to a file. However this makes the file unusable, so some sort of controlled
access must be arranged.
One approach is to have complicated Access Control Lists, ACL, which specify exactly what access is allowed or
denied for specific users or groups.
AFS ( the Andrew File System ) uses this system for distributed access.
Control is very finely adjustable, but may be complicated, particularly when the specific users involved are
unknown. ( AFS allows some wild cards, so for example all users on a certain remote system may be trusted,
or a given username may be trusted when accessing from any remote system. )
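A toy access-control-list check with wildcards is sketched below in Python. The entry format ( a "user@host" pattern, a string of rights, and an allow/deny flag ), the first-match rule, and the default-deny policy are all hypothetical choices for illustration; this is not the actual AFS ACL format. Wildcard matching uses the standard fnmatch module.

```python
from fnmatch import fnmatchcase

# Hypothetical ACL: ordered entries of (principal pattern, rights, allow?).
# Principals are "user@host"; the first matching entry decides.
ACL = [
    ("mallory@*", "rw", False),            # deny mallory from any system
    ("alice@*", "rw", True),               # trust alice from any remote system
    ("*@trusted.example.edu", "r", True),  # trust all users on one system
]


def permitted(acl, principal: str, right: str) -> bool:
    """Return True if the first entry matching principal and right allows it."""
    for pattern, rights, allow in acl:
        if right in rights and fnmatchcase(principal, pattern):
            return allow
    return False  # default deny when no entry matches


if __name__ == "__main__":
    print(permitted(ACL, "alice@laptop.example.org", "w"))     # True
    print(permitted(ACL, "bob@trusted.example.edu", "r"))      # True
    print(permitted(ACL, "bob@trusted.example.edu", "w"))      # False
    print(permitted(ACL, "mallory@trusted.example.edu", "r"))  # False
```

Ordering matters with a first-match rule: placing the deny entry first is what keeps mallory out even from the trusted system.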
UNIX uses a set of 9 access control bits, in three groups of three. These correspond to R, W, and X permissions for
each of the Owner, Group, and Others. ( See "man chmod" for full details. ) The RWX bits control the following
privileges for ordinary files and directories:
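R - Files: read ( view ) file contents. Directories: read directory contents, i.e. get a listing of the files in the directory.
W - Files: write ( change ) file contents. Directories: change directory contents, i.e. create or delete files.
X - Files: execute file contents as a program. Directories: access any specific file in the directory. Note that if a user
has X but not R permissions on a directory, they can still access specific files, but only if they already know the name
of the file they are trying to access.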
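The nine bits can be decoded with a few shifts and masks, as in the Python sketch below ( function names are hypothetical; the result is cross-checked against the standard stat.filemode( ) in the tests ). The directory helper also encodes the standard UNIX rule that creating or deleting entries needs X as well as W on the directory, which goes slightly beyond the summary above.

```python
import stat


def rwx_string(mode: int) -> str:
    """Render the 9 permission bits as e.g. 'rwxr-x---' (owner, group, others)."""
    out = []
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


def directory_rights(mode: int, who: str) -> dict:
    """What the owner, group, or others class may do with a directory."""
    bits = (mode >> {"owner": 6, "group": 3, "others": 0}[who]) & 0o7
    return {
        "list names": bool(bits & 4),                          # R
        "access a known name": bool(bits & 1),                 # X
        "create or delete entries": bool(bits & 2 and bits & 1),  # W needs X too
    }


if __name__ == "__main__":
    print(rwx_string(0o750))                          # rwxr-x---
    print(stat.filemode(stat.S_IFREG | 0o750))        # -rwxr-x---
    print(directory_rights(0o711, "others"))
    # {'list names': False, 'access a known name': True, 'create or delete entries': False}
```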
In addition there are some special bits that can also be applied:
The set user ID ( SUID ) bit and/or the set group ID ( SGID ) bit applied to executable files temporarily
change the effective identity of whoever runs the program to match that of the owner / group of the executable
program. This allows users running specific programs to access files ( while running that program ) that they
would normally be unable to access. Setting of these two bits is usually restricted to root, and must be done
with caution, as it introduces a potential security hole.
The sticky bit on a directory modifies write permission, allowing users to delete only files that they own.
This allows everyone to create files in /tmp, for example, but to delete only files which they have created,
and not anyone else's.
The SUID, SGID, and sticky bits are indicated with an s, s, and t in the positions for execute permission for
the user, group, and others, respectively. If the letter is lower case ( s, s, t ), then the corresponding execute
permission IS also given. If it is upper case ( S, S, T ), then the corresponding execute permission is not
given.
These bits can be set with the numeric form of chmod, using a leading fourth octal digit ( 4 = SUID, 2 = SGID,
1 = sticky, e.g. chmod 4755 ), or with the symbolic forms u+s, g+s, and +t.
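The lower-case / upper-case rule can be made precise with the Python sketch below, which renders the permission part of an "ls -l" mode string including the special bits. The function name is hypothetical; the bit constants come from the standard stat module, and the tests check the output against stat.filemode( ).

```python
import stat


def mode_string(mode: int) -> str:
    """Render permission bits, including SUID/SGID/sticky, the way ls -l does."""
    specials = (stat.S_ISUID, stat.S_ISGID, stat.S_ISVTX)  # owner, group, others
    letters = ("s", "s", "t")
    out = []
    for i, shift in enumerate((6, 3, 0)):
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        executable = bool(bits & 1)
        if mode & specials[i]:
            # lower case: special bit AND execute; upper case: special bit only
            out.append(letters[i] if executable else letters[i].upper())
        else:
            out.append("x" if executable else "-")
    return "".join(out)


if __name__ == "__main__":
    print(mode_string(0o4755))  # rwsr-xr-x  (typical of /usr/bin/passwd)
    print(mode_string(0o1777))  # rwxrwxrwt  (typical of /tmp)
    print(mode_string(0o4644))  # rwSr--r--  (SUID set, but not executable)
```

The leading octal digit in these examples is the same one passed to the numeric form of chmod, e.g. os.chmod(path, 0o4755) in Python.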
Figure 10.16
Figure 10.15
Some systems can apply passwords, either to individual files, or to specific sub-directories, or to the entire system.
There is a trade-off between the number of passwords that must be maintained ( and remembered by the users ) and
the amount of information that is vulnerable to a lost or forgotten password.
Older systems which did not originally have multi-user file access permissions ( DOS and older versions of Mac )
must now be retrofitted if they are to share files on a network.
Access to a file requires access to all the directories along its path as well. In a directory structure with links
( an acyclic or general graph ), users may have different access to the same file when it is reached through different paths.
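The path rule can be sketched in Python over a hypothetical table mapping each path to ( owner, group, permission bits ): every directory from "/" down must grant X ( search ) to the caller's class before the file's own bits are consulted. As in UNIX, the owner class is chosen first, then group, then others. Names are hypothetical, and the superuser bypass is not modeled.

```python
def class_bits(meta, user, groups):
    """Pick the owner, group, or others triple for this user, in that order."""
    owner, group, mode = meta
    if user == owner:
        return (mode >> 6) & 0o7
    if group in groups:
        return (mode >> 3) & 0o7
    return mode & 0o7


def may_access(tree, user, groups, path, want):
    """True if user may perform want (4 = R, 2 = W, 1 = X) on path.

    Every directory along the path, starting at "/", must grant X (search).
    """
    parts = [p for p in path.split("/") if p]
    directories = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    for d in directories:
        if d not in tree or not (class_bits(tree[d], user, groups) & 1):
            return False
    target = "/" + "/".join(parts)
    if target not in tree:
        return False
    return (class_bits(tree[target], user, groups) & want) == want


if __name__ == "__main__":
    tree = {
        "/": ("root", "root", 0o755),
        "/home": ("root", "root", 0o755),
        "/home/alice": ("alice", "staff", 0o700),
        "/home/alice/notes.txt": ("alice", "staff", 0o644),
    }
    # The file itself is readable by group, but the directory blocks bob.
    print(may_access(tree, "bob", {"staff"}, "/home/alice/notes.txt", 4))  # False
```

Changing /home/alice to mode 711 would let bob open notes.txt by name while still being unable to list the directory, which is the R versus X distinction noted below.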
Sometimes just the knowledge of the existence of a file of a certain name is a security ( or privacy ) concern.
Hence the distinction between the R and X bits on UNIX directories.
10.7 Summary