Chapter 4 - File Systems

Luis Tarrataca
[email protected]

CEFET-RJ

L. Tarrataca Chapter 4 - File Systems 1 / 161


1 Motivation

2 Files
  File Naming
  File Structure
  File Types
  File Access
  File Attributes
  File Operations
  Example Program Using File-System Calls

3 Directories
  Hierarchical Directory Systems
  Path Names
  Directory Operations

4 File System Implementation
  File System Layout
  Implementing the files
    Contiguous Allocation
    Linked-List Allocation
    Linked-List Allocation Using a Table in Memory
    I-Nodes
  Implementing Directories

5 File-system Management and Optimization
  Disk-space management
    Block Size
    Keeping track of free blocks
    Disk Quotas
  File-system performance
    Caching
    Block read-ahead
    Reducing Disk-Arm motion
    Defragmenting Disks

6 References

Motivation

There are three essential requirements for long-term information storage:

1 It must be possible to store a very large amount of information.

2 The information must survive the termination of the process using it.

3 Multiple processes must be able to access the information at once.


How do you find information?

How do you keep one user from reading another user’s data?

How do you know which blocks are free?

In your opinion, what are some of the most important OS concepts?

• Process? Threads?

• Physical memory? Virtual Memory?

Today we will learn a new abstraction. Can you guess what it is?

• The file...

Files

First things first:

What is a file? Any ideas?

• Files are logical units of information created by processes:


• Processes and threads can read existing files and create new ones;

• Information stored in files must be persistent, i.e., not affected by process creation and termination.


Files are managed by the operating system. How they are

• structured...

• named...

• accessed...

• used...

• protected...

• implemented...

• and managed

are major topics in operating system design.


The part of the OS that deals with files is known as the file system:

• The subject of this chapter =)


File Naming

Exact rules for file naming vary somewhat among OS:

• Current OS allow strings of various lengths as legal file names;

• OS typically support two-part file names: (filename, extension);


File Structure
Files can be structured in any of several ways (1/3):

Files are merely byte sequences:

• Maximum flexibility;
• Unix, Linux, OS X and Windows use this model;

Figure: Byte sequence file structure (Source: [Tanenbaum and Bos, 2015])


Files can be structured in any of several ways (2/3):

File is a sequence of fixed-length records:

• Each record has a certain number of bytes;
• Read operation returns one record;
• Write operation overwrites or appends one record.

Figure: Record sequence file structure. (Source: [Tanenbaum and Bos, 2015])


Files can be structured in any of several ways (3/3):


File consists of a tree of records:

• Not necessarily all the same length;
• Each record contains a key field in a fixed position in the record;
• Tree is sorted on the key field:
• Allowing rapid key search;

Figure: Tree file structure (Source: [Tanenbaum and Bos, 2015])


File Types

OS support several types of files:

• Regular files: contain user information:


• ASCII files: consist of lines of text;

• Binary files: anything that is not an ASCII file:


• Have an internal structure known to the programs that use them;
• E.g.: executable programs;

• Directories: system files for maintaining the structure of the file system;


File Access

When magnetic disks appeared, it became possible to:

• Read the bytes or records of a file out of order;

• Or to access records by key rather than by position;

Files whose bytes or records can be read in any order are called
random-access files;


Two methods can be used for specifying where to start reading:

• 1st method: every read gives the position in the file to start reading at;

• 2nd method: a seek operation sets the current position:


• After a seek, the file can be read sequentially from the now-current position;

• Used in UNIX and Windows;
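
The two methods can be sketched in Python. This is only an illustration, not part of the original slides: the file name `records.bin` and the helper `read_at` are hypothetical, and `read_at` merely emulates the first method on top of seek.

```python
import os
import tempfile

def read_at(f, position, count):
    """1st method (emulated): every read states where to start reading."""
    f.seek(position)
    return f.read(count)

# Hypothetical demo file holding the bytes 0..9
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    f.write(bytes(range(10)))

with open(path, "rb") as f:
    first = read_at(f, 4, 2)   # bytes at positions 4 and 5
    # 2nd method: seek once, then read sequentially from the current position
    f.seek(7)
    second = f.read(1)         # byte at position 7
    third = f.read(2)          # continues with positions 8 and 9
```

After the seek, the second and third reads need no position argument: each one starts where the previous one stopped.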


File Attributes

OS keep track of a wide range of information regarding a file:

Can you think of a few attributes that OS maintain regarding a file? Any
ideas?


Figure: Some possible file attributes (Source: [Tanenbaum and Bos, 2015])

File Operations

What are the most common file operations made available by the OS?
Any ideas?


Most common system calls relating to files (1/5):

• Create: file is created with no data;

• Delete: when the file is no longer needed, it has to be deleted to free up disk space;

• Open: before using a file, a process must open it in order to:


• fetch the attributes and list of disk addresses into main memory for rapid access on later calls.


Most common system calls relating to files (2/5):

• Close: When all the accesses are finished:


• attributes and disk addresses are no longer needed;

• file should be closed to free up internal table space;

• Read: Data are read from file:


• Bytes come from the current position;

• Caller must specify how many bytes to read and a buffer in which to place the data;


Most common system calls relating to files (3/5):

• Write: Data are written to the file using current position:


• If the current position is the end of the file, the file’s size increases;

• If the current position is in the middle of the file, existing data are overwritten;


Most common system calls relating to files (4/5):

• Append: restricted form of write. It can add data only to the end of the file;

• Seek: repositions file pointer to a specific place in the file:


• After this call, data can be read from, or written to, that position


Most common system calls relating to files (5/5):

• Get attributes: read file attributes;

• Set attributes: set some of the attributes;

• Rename: changes the name of an existing file;
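
The calls listed above can be tried out with Python's `os` module, which wraps the corresponding system calls. A minimal sketch, assuming a scratch directory and hypothetical file names (`notes.txt`, `notes-renamed.txt`):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")      # hypothetical file name

# Create: the file exists but holds no data yet
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
size_after_create = os.fstat(fd).st_size

# Write at the end of the file: the file grows
os.write(fd, b"hello world")

# Seek, then write in the middle of the file: existing data are overwritten
os.lseek(fd, 6, os.SEEK_SET)
os.write(fd, b"W")

# Seek, then read: the caller states how many bytes to read
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 11)

# Close: frees the internal table entry
os.close(fd)

# Append: a restricted write that can only add data at the end
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"!")
os.close(fd)

# Get attributes, rename, delete
size = os.stat(path).st_size
new_path = os.path.join(workdir, "notes-renamed.txt")
os.rename(path, new_path)
os.remove(new_path)
still_exists = os.path.exists(new_path)
```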


Example Program Using File-System Calls

Can you tell what the following program is doing? Any ideas?


• It copies the contents of a source file to a destination file;
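
The program listing itself is not reproduced in this text. As a stand-in, here is a minimal Python sketch of such a copy program built only on open, read, write and close. It is not the original listing; the buffer size, the output file mode and the function name `copy_file` are assumptions.

```python
import os
import tempfile

BUF_SIZE = 4096          # buffer size in bytes (assumed value)

def copy_file(src, dst):
    """Copy src to dst using only open, read, write and close."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            while True:
                block = os.read(in_fd, BUF_SIZE)
                if not block:          # empty result means end of file
                    break
                while block:           # write may accept fewer bytes than given
                    written = os.write(out_fd, block)
                    block = block[written:]
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)

# Demo on a hypothetical 10000-byte file
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.bin")
dst = os.path.join(workdir, "copy.bin")
with open(src, "wb") as f:
    f.write(b"x" * 10000)
copy_file(src, dst)
with open(dst, "rb") as f:
    copied = f.read()
```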

Directories

Hierarchical Directory Systems

First things first:

What is a directory? Any ideas?


• File systems normally have directories or folders, which are themselves files:
• Allows the file system to have a hierarchy of files;

• Grouping related files together;

• Tree of directories;


Figure: A hierarchical directory system. (Source: [Tanenbaum and Bos, 2015])


Path Names

When the file system is organized as a directory tree:

• Some way is needed for specifying file names;

• Usually there are two solutions:


• Absolute path name: the path from the root directory to the file, e.g., "/usr/ast/mailbox";

• Relative path name: interpreted relative to the current (working) directory:


• E.g.: if the current directory is "/usr/ast", the file can be referenced simply as "mailbox".


Usually, OS also have two special directories:

• Directory . - represents the current directory;

• Directory .. - represents the parent directory;
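
How relative names and the two special entries resolve can be sketched with Python's `posixpath` module. The helper `to_absolute` and the path "../lib/dictionary" are illustrative assumptions. Note that `normpath` works purely on the text of the path and ignores symbolic links.

```python
import posixpath

cwd = "/usr/ast"     # current (working) directory

def to_absolute(path, cwd):
    """Resolve a path name against the current directory, collapsing . and .. ."""
    return posixpath.normpath(posixpath.join(cwd, path))

a = to_absolute("mailbox", cwd)            # relative path name
b = to_absolute("/usr/ast/mailbox", cwd)   # absolute path name: cwd is ignored
c = to_absolute("./mailbox", cwd)          # . is the current directory
d = to_absolute("../lib/dictionary", cwd)  # .. is the parent directory
```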


Directory Operations

What are the most common directory operations made available by the
OS? Any ideas?


Don’t forget that directories are files...

• Therefore the available system calls should be similar;


Most common system calls relating to directories (1/3):

• Create: creates an empty directory;

• Delete: removes an empty directory;

• Opendir: to open a directory;


Most common system calls relating to directories (2/3):

• Closedir: when a directory has been read, it should be closed to free up internal table space;

• Readdir: to list the contents of a directory;

• Rename: to rename an existing directory;


Most common system calls relating to directories (3/3):

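• Link: creates a new directory entry for an existing file, allowing the file to appear in more than one directory;

• Unlink: removes a directory entry; if the file is present in only one directory, it is removed from the file system;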
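
The directory calls can be exercised from Python. This is a minimal sketch assuming a POSIX-like system that supports hard links; all directory and file names are hypothetical.

```python
import os
import tempfile

root = tempfile.mkdtemp()
d = os.path.join(root, "docs")            # hypothetical names throughout

os.mkdir(d)                               # Create: an empty directory
original = os.path.join(d, "a.txt")
with open(original, "w") as f:
    f.write("data")

alias = os.path.join(d, "b.txt")
os.link(original, alias)                  # Link: second directory entry, same file

with os.scandir(d) as entries:            # Opendir / Readdir / Closedir
    names = sorted(e.name for e in entries)

same_file = os.stat(original).st_ino == os.stat(alias).st_ino

os.unlink(original)                       # Unlink: one entry goes, the data survive
with open(alias) as f:
    content = f.read()

renamed = os.path.join(root, "papers")
os.rename(d, renamed)                     # Rename
os.unlink(os.path.join(renamed, "b.txt"))
os.rmdir(renamed)                         # Delete: works only on an empty directory
gone = not os.path.exists(renamed)
```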


File System Implementation

Now that we know all the main file system concepts:

How are such concepts implemented in an OS? Any ideas?

• How are files and directories stored?

• How is disk space managed?

• How to make everything work efficiently and reliably?

File System Layout

File systems are stored on disks:

• Most disks can be divided up into one or more partitions:


• with independent file systems on each partition;

• Sector 0 of the disk is called the MBR (Master Boot Record):


• Used to boot the computer;

• End of MBR contains the partition table


Partition Table contains:

• Starting and ending addresses of each partition;

• One of the partitions in the table is marked as active;

• When computer is booted:


• BIOS reads in and executes the MBR program;

• Active partition is located;

• Active partition boot block is read and executed;

• Boot block program loads OS;
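
Locating the active partition can be sketched as follows, assuming the classic PC MBR layout: a 512-byte sector, four 16-byte table entries starting at offset 446, and the signature bytes 0x55 0xAA at the end. In that layout each entry stores the first sector and a sector count, from which the last sector follows. The demo sector below is fabricated for illustration.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the partition table sits at the end of the MBR
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80
ENTRY_FORMAT = "<B3xB3xII"  # status, (CHS skipped), type, (CHS skipped), first LBA, count

def parse_mbr(sector):
    """Return (active, type, first_sector, last_sector) for each used entry."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, ptype, first, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype == 0:                     # type 0 marks an unused entry
            continue
        partitions.append((status == ACTIVE_FLAG, ptype, first, first + count - 1))
    return partitions

def make_entry(active, ptype, first, count):
    """Build one table entry for the demo sector (illustrative helper)."""
    return struct.pack(ENTRY_FORMAT, ACTIVE_FLAG if active else 0, ptype, first, count)

# Fabricated demo sector with two partitions, the second one active
sector = bytearray(SECTOR_SIZE)
sector[446:462] = make_entry(False, 0x83, 2048, 1000)
sector[462:478] = make_entry(True, 0x83, 3048, 5000)
sector[510:512] = b"\x55\xaa"

partitions = parse_mbr(bytes(sector))
active = [p for p in partitions if p[0]]
```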


Layout of a disk partition varies a lot from file system to file system:

• Usually it goes something like this:

Figure: A possible file-system layout (Source: [Tanenbaum and Bos, 2015])


From the previous figure (1/2):

• Superblock: contains all the key parameters about the file system;
• File-system type identification;

• Number of blocks;

• Etc...

• Free space mgmt: information about the file system free blocks;
• E.g.: Bitmap or list of pointers


From the previous figure (2/2):

• I-nodes: array of data structures, one per file, describing the file;

• Root directory: contains the top of the file-system tree;

• Files and directories: contain all the actual data;


Implementing the files

How can we implement a file?

How is a file represented?

Using a magnetic disk:

• Tracks;

• Sectors;

• New concept: a block, which is a set of sectors;


Various methods are used in different operating systems:

• Contiguous Allocation

• Linked List Allocation

• Linked-List Allocation Using a Table in Memory

Guess what we will be seeing next? Any ideas? =P


Contiguous Allocation

Idea: Store each file as a contiguous run of disk blocks:

• E.g.: a 50-KB file would be allocated:


• 50 consecutive blocks on a disk with 1-KB blocks;

• 25 consecutive blocks on a disk with 2-KB blocks.

Figure: Contiguous allocation of disk space for seven files (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• First 40 disk blocks are shown;

• Initially, the disk was empty;

• Then a file A, of length four blocks, was written:


• If file A were only 3½ blocks long, some space would be wasted at the end of its last block;

• After that a three-block file, B, was written;

• In the figure, a total of seven files are shown:


• Each one starting at the block following the end of the previous one.

In your opinion, what are the advantages of contiguous allocation? Any ideas?

Advantage 1: Simple to implement:

• Keeping track of where a file’s blocks are is reduced to:


• remembering disk address of the first block and number of blocks in the file;
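
A small sketch of this bookkeeping follows. The function names are illustrative, and the example file is assumed to start at block 4 and be three blocks long.

```python
KB = 1024

def blocks_needed(file_size, block_size):
    """Number of whole blocks a file occupies (the last one may be partly empty)."""
    return -(-file_size // block_size)     # ceiling division on integers

def disk_block(first_block, n_blocks, i):
    """Disk address of the i-th block (0-based) of a contiguously stored file."""
    if not 0 <= i < n_blocks:
        raise IndexError("block index outside the file")
    return first_block + i

blocks_1k = blocks_needed(50 * KB, 1 * KB)    # 50-KB file, 1-KB blocks
blocks_2k = blocks_needed(50 * KB, 2 * KB)    # 50-KB file, 2-KB blocks
blocks_partial = blocks_needed(3584, 1 * KB)  # 3.5 blocks of data still take 4 blocks

# A file recorded as (first block = 4, length = 3 blocks)
last_block = disk_block(4, 3, 2)
```

Any block of the file is found with one addition, which is why the two numbers per file are all the bookkeeping that is needed.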


Can you see any other advantage of contiguous allocation? Any ideas?

Advantage 2: Read performance:

• Entire file can be read from the disk in a single operation;

• Only one seek is needed for the first block;

• After that, no more seeks or rotational delays are needed:


• data come in at the full bandwidth of the disk;

In your opinion, what are the disadvantages of contiguous allocation? Any ideas?

Major disadvantage: over time, disk becomes fragmented

Figure: The state of the disk after files D and F have been removed. (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• Files D and F were removed:


• Respective blocks were then freed;

• Leaving runs of free blocks (holes) on the disk;

• Disk is not compacted on the spot, since that would mean copying all the blocks that follow each hole:


• Potentially millions of blocks to copy...

• Disastrous performance;

• As a result: disk consists of files and holes;


Initially: fragmentation is not a problem:

• Each new file can be written at the end of disk:


• following the previous one;

• However, eventually the disk will fill up, then two solutions exist:
• Compact the disk: prohibitively expensive;

• Reuse free space:


• When a new file is created choose a hole big enough;
• Requires maintaining a list of holes;
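
Reusing holes can be sketched as below. The slides only say to choose a hole that is big enough; the first-fit policy and the hole positions used here are assumptions made for illustration.

```python
def allocate(holes, n_blocks):
    """First fit: take n_blocks from the first hole that is big enough.

    holes is a list of (start, length) pairs sorted by start; it is updated
    in place. Returns the first block of the new file, or None if no single
    hole is big enough.
    """
    for i, (start, length) in enumerate(holes):
        if length >= n_blocks:
            if length == n_blocks:
                del holes[i]                               # hole used up completely
            else:
                holes[i] = (start + n_blocks, length - n_blocks)
            return start
    return None

holes = [(10, 3), (20, 6)]          # hypothetical holes: 3 and 6 free blocks
first = allocate(holes, 5)          # fits only in the second hole
second = allocate(holes, 4)         # 4 blocks are free in total, but not contiguously
```

The second request fails even though enough blocks are free, which is exactly the fragmentation problem described above.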

Can you see any other disadvantages of contiguous allocation? Any ideas?

Another major disadvantage: the final file size needs to be known at the time of creation:

• This is not always possible to know in advance:


• File size may change with time...

Conclusion: contiguous allocation is problematic...

Can you think of any other type of method for implementing files?


Linked-List Allocation

Idea: keep each file as a linked list of disk blocks:

Figure: Storing a file as a linked list of disk blocks. (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• First word of each block is used as a pointer to the next one:


• Rest of the block is for data.

• Unlike contiguous allocation:


• Every disk block can be used in this method;

• No space is lost to external fragmentation:


• Internal fragmentation in the last block of each file still occurs!

• Directory entries merely need to store the disk address of the first block:
• Rest can be found starting there.

Can you see any disadvantages of linked-list allocation? Any ideas?

• Contiguous allocation allows sequential file reads and direct access to any block:


• Very efficient =)

• Linked-list allocation makes random access extremely slow:


• To get to block n, the OS has to:
• Start at the beginning and read the n - 1 blocks before it, one at a time;

• Painfully slow =’(
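
The cost of random access can be made concrete with a toy model in which the disk is a dictionary and every lookup stands for one disk read. The block numbers, the end marker -1 and the function name `nth_block` are assumptions for illustration.

```python
END = -1     # assumed end-of-chain marker

def nth_block(disk, first_block, n):
    """Return (block number, disk reads) for the n-th block (1-based) of a file.

    disk maps a block number to (next_block, data); each lookup stands for
    one disk read, since the pointer lives inside the block itself.
    """
    block, reads = first_block, 0
    for _ in range(n - 1):
        next_block, _data = disk[block]
        reads += 1
        if next_block == END:
            raise IndexError("file has fewer blocks than requested")
        block = next_block
    return block, reads

# Hypothetical file stored in blocks 4 -> 7 -> 2 -> 10 -> 12
disk = {
    4: (7, b"part 1"),
    7: (2, b"part 2"),
    2: (10, b"part 3"),
    10: (12, b"part 4"),
    12: (END, b"part 5"),
}

first = nth_block(disk, 4, 1)    # first block: nothing to read beforehand
fifth = nth_block(disk, 4, 5)    # fifth block: four blocks must be read first
```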


Linked-list file implementation is also problematic...

Can you think of any other methods for implementing a file? Any ideas?


Linked-List Allocation Using a Table in Memory

Disadvantages of the linked-list allocation can be eliminated by:

• Storing the pointer word from each disk block in a table in memory;


In the previous two figures we have two files:

• File A uses disk blocks 4, 7, 2, 10, and 12;

• File B uses disk blocks 6, 3, 11, and 14;

• Using the table:


• File A: start with block 4 and follow the chain until the end;
• Chain is terminated with a special marker (e.g., -1)

• File B: start with block 6 and follow the chain until the end;
• Chain is terminated with a special marker (e.g., -1)

Such a table in main memory is called a FAT (File Allocation Table);
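
The two chains above can be rebuilt as a small table. This is a minimal sketch assuming a 16-block disk and -1 as the end marker; entries not used by file A or B are left as `None`.

```python
END = -1                       # end-of-chain marker, as in the example above

fat = [None] * 16              # assumed 16-block disk; None = not part of A or B
for chain in ([4, 7, 2, 10, 12], [6, 3, 11, 14]):
    for current, following in zip(chain, chain[1:]):
        fat[current] = following       # entry i holds the block that follows block i
    fat[chain[-1]] = END

def file_blocks(fat, first_block):
    """Follow the chain in the table; no disk access is needed."""
    blocks = []
    block = first_block
    while block != END:
        blocks.append(block)
        block = fat[block]
    return blocks

blocks_a = file_blocks(fat, 4)
blocks_b = file_blocks(fat, 6)
```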

Can you see any advantages of linked-list allocation using a table in memory? Any ideas?

Random access is much easier:

• The chain must still be followed to find a given offset within the file;

• However, the chain is entirely in memory:

• It can be followed without making any disk references;

Can you see any disadvantages of linked-list allocation using a table in memory? Any ideas?

• Entire table must be in memory all the time to make it work;

• Example: 1-TB disk and a 1-KB block size:


• Table needs $2^{40} / 2^{10} = 2^{30}$ entries;
• one for each of the ≈ 1 billion disk blocks

• Each entry needs a minimum of 30 bits:


• In order to properly identify the block;

• Thus the table requires a total of $2^{30} \times 30$ bits ≈ 3.75 GiB (about 4 GB) of memory...

• Conclusion: FAT does not scale well to large disks
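
The arithmetic can be checked directly. This sketch assumes binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes) and also shows the size when each entry is rounded up to 4 bytes, a common choice for fast lookup.

```python
DISK_SIZE = 2**40                 # 1 TB, binary units assumed
BLOCK_SIZE = 2**10                # 1 KB

entries = DISK_SIZE // BLOCK_SIZE                 # one table entry per disk block
bits_per_entry = (entries - 1).bit_length()       # bits needed to number every block

packed_bytes = entries * bits_per_entry // 8      # entries packed at 30 bits each
padded_bytes = entries * 4                        # entries rounded up to 4 bytes

packed_gib = packed_bytes / 2**30
padded_gib = padded_bytes / 2**30
```

Either way the table is several gigabytes and has to stay in memory the whole time.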


• When computer is shut down:


• Table must be stored in non-volatile memory;

• This implies disk accesses (writes);

• Table is only used to get the block number:


• Reading or writing the block itself still requires a disk access;


I-Nodes

The previous methods had some problems...

How can we efficiently keep track of which blocks belong to which file?
Any ideas?


Our last method: I-nodes, short for index-node:

• Lists the attributes and disk addresses of the file’s blocks

• Each i-node has a fixed position on the disk;


• Given the i-node, it is then possible to find all the blocks of the file:

Figure: An example i-node. (Source: [Tanenbaum and Bos, 2015])


How does this scheme compare against linked files using an in-memory
table? Any ideas?

An i-node needs to be in memory only when the corresponding file is open:

• If each i-node occupies n bytes and at most k files may be open at once:


• Array holding the i-nodes for the open files is only kn bytes;

The i-node array is usually far smaller than the space occupied by the in-memory file table:

Why do you think this happens? Any ideas?

• Reason is simple:
• Table holding all disk blocks is proportional in size to the disk itself;
• If the disk has n blocks, the table needs n entries;
• As disks grow larger, this table grows linearly with them.

• In contrast, the i-node scheme requires an array whose size is:


• Proportional to the maximum number of files that may be open at once.

Can you see any problem with the i-nodes approach? Any ideas?

If each i-node has room for a fixed number of disk addresses:

What happens when a file grows beyond this limit? Any ideas?

One solution: reserve the last disk address not for a data block:

• But for the address of a block containing more disk-block addresses;


Figure: An example i-node. (Source: [Tanenbaum and Bos, 2015])



An even more advanced solution:

1 Two or more such blocks containing disk addresses;

2 Disk blocks pointing to other disk blocks full of addresses;

These are known as indirect blocks.
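
How a block index maps onto such a layout can be sketched as follows. The layout is an assumption made for illustration: 10 direct addresses, one single indirect block and one double indirect block, with 256 addresses per block (1-KB blocks, 4-byte addresses). Real file systems differ in these numbers.

```python
def locate(block_index, n_direct, addrs_per_block):
    """Say where the disk address of a file's block (0-based index) is stored."""
    if block_index < n_direct:
        return ("direct", block_index)
    block_index -= n_direct
    if block_index < addrs_per_block:
        return ("single indirect", block_index)
    block_index -= addrs_per_block
    if block_index < addrs_per_block ** 2:
        # first number: which address block; second number: slot inside it
        return ("double indirect",
                block_index // addrs_per_block,
                block_index % addrs_per_block)
    raise ValueError("file too large for this i-node layout")

def max_file_blocks(n_direct, addrs_per_block):
    """Largest file, in blocks, that the assumed layout can describe."""
    return n_direct + addrs_per_block + addrs_per_block ** 2

N_DIRECT, PER_BLOCK = 10, 256      # assumed values

in_inode = locate(3, N_DIRECT, PER_BLOCK)
first_indirect = locate(10, N_DIRECT, PER_BLOCK)
first_double = locate(10 + 256, N_DIRECT, PER_BLOCK)
later_double = locate(10 + 256 + 257, N_DIRECT, PER_BLOCK)
limit = max_file_blocks(N_DIRECT, PER_BLOCK)
```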


Figure: Indirection blocks example (Source: [Tanenbaum and Bos, 2015])


Implementing Directories

Before a file can be read:

• File must be opened;

• OS uses the path name to locate the directory entry on the disk:
• Directory entry provides information needed to find the disk blocks:

• Depending on the system may be:


• Disk address of the entire file (with contiguous allocation);
• Number of the first block (both linked-list schemes);
• I-Node number;

• Main function of directory system is to:


• Map file name onto the information needed to locate the data.


Every file system maintains various file attributes (1/3):

• E.g: file’s owner and creation time;

• These attributes need to be stored somewhere;

• One possibility: store them in the directory entry:

Figure: Simple directory containing fixed-size entries with the disk addresses and attributes in the directory
entry. (Source: [Tanenbaum and Bos, 2015])


Every file system maintains various file attributes (2/3):


Directory consists of a list of fixed-size entries, one per file, containing:

• A (fixed-length) file name;

• Attributes:
• Creator, Time, etc.

• Disk addresses of the file's blocks


Every file system maintains various file attributes (3/3):

• Another possibility: for systems that use i-nodes:


• Store attributes in the i-nodes;

• Each directory entry can be shorter: {File name, I-node number}

Figure: A directory in which each entry just refers to an i-node. (Source: [Tanenbaum and Bos, 2015])


Just out of curiosity:

How can we determine the i-node of files and directories in Linux? Any
ideas?


• ls -i =)
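
The same information is available to programs. A minimal Python sketch, assuming a POSIX-like system and a hypothetical scratch file `example.txt`:

```python
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, "example.txt")     # hypothetical file
with open(path, "w") as f:
    f.write("x")

file_inode = os.stat(path).st_ino                 # i-node number of the file
dir_inode = os.stat(directory).st_ino             # directories have i-nodes too

# Each directory entry pairs a name with an i-node number
with os.scandir(directory) as entries:
    listing = {entry.name: entry.inode() for entry in entries}
```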

Figure: "ls -lhi" output example

But what about the impact of file-name length on the directory structure?

• Simplest approach: limit file-name length to, typically, 255 characters:


• Approach is simple;

• However: wastes a great deal of directory space:


• Each directory entry needs to reserve 255 characters;
• Few files have such long names.

• For efficiency reasons: different structure is desirable.


All modern operating systems support long variable-length file names:

• Idea: Give up the idea that all directory entries are the same size;


Figure: In-line handling of long file names (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• Each directory entry contains a fixed portion:


• Starting with the length of the entry;

• Followed by the attributes (fixed-length):


• E.g.: Owner, creation time, protection information, etc...

• Followed by the actual file name:


• However long it may be...

• In this example we have three files:


• project-budget, personnel, and foo.

• Each file name is terminated by a special character:


• Usually the null character (value 0);
• Represented in the figure by a box with a cross in it ⊠
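
An in-line layout of this kind can be sketched with Python's `struct` module. The concrete format is an assumption for illustration: a 2-byte entry length, two 4-byte attributes (owner id and creation time), then the name, a terminating 0, and padding up to a multiple of 4 bytes.

```python
import struct

HEADER = struct.Struct("<HII")   # entry length, owner id, creation time (assumed fields)

def pack_entry(name, owner, ctime):
    """Build one variable-length directory entry."""
    raw = name.encode("ascii") + b"\x00"           # file name terminated by 0
    length = HEADER.size + len(raw)
    length += -length % 4                          # pad to a multiple of 4 bytes
    return HEADER.pack(length, owner, ctime) + raw.ljust(length - HEADER.size, b"\x00")

def unpack_directory(data):
    """Walk the entries, using each entry's length to find the next one."""
    entries, offset = [], 0
    while offset < len(data):
        length, owner, ctime = HEADER.unpack_from(data, offset)
        name_area = data[offset + HEADER.size:offset + length]
        name = name_area.split(b"\x00", 1)[0].decode("ascii")
        entries.append((name, length))
        offset += length
    return entries

directory = b"".join(
    pack_entry(name, owner=1000, ctime=0)          # attribute values are placeholders
    for name in ("project-budget", "personnel", "foo")
)
entries = unpack_directory(directory)
```

The entries come out with different lengths, which is what makes the gap left by a removed file awkward to reuse.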


Can you see any problems with the previous approach? Any ideas?

When a file is removed:

• a variable-sized gap is introduced into the directory:


• Next file to be entered may not fit

• Problem is essentially the same one we saw with contiguous disk files, although compacting a directory is feasible because it is held entirely in memory.


Can you see any other problem with the previous approach? Any ideas?

A single directory entry may span multiple pages:

• Page fault may occur while reading a file name;


Another way to handle variable-length names (1/2):

• Make directory entries themselves all fixed length;

• Keep the file names together in a heap at the end of the directory:

Figure: Heap handling of long file names (Source: [Tanenbaum and Bos, 2015])


Can you see any advantage of using the heap method? Any ideas?

When an entry is removed:

• next file entered will always fit there;

Important:

• Heap must be managed;

• Page faults can still occur while processing file names;
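
A minimal sketch of the heap method, assuming a deliberately simplified heap: names are only ever appended, and the bytes of removed names are not reclaimed (a real implementation would have to manage that space, as noted above). The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name_offset: int  # where the name starts inside the heap
    owner: int        # stand-in for the fixed-length attributes


class HeapDirectory:
    def __init__(self):
        self.slots = []          # fixed-size entries; None marks a free slot
        self.heap = bytearray()  # all names, each terminated by a 0 byte

    def add(self, name, owner):
        entry = Entry(len(self.heap), owner)
        self.heap += name.encode("ascii") + b"\x00"
        # Any free slot fits, because all slots have the same size.
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = entry
                return index
        self.slots.append(entry)
        return len(self.slots) - 1

    def remove(self, index):
        self.slots[index] = None  # the name's heap bytes are left behind

    def name_of(self, index):
        start = self.slots[index].name_offset
        end = self.heap.index(0, start)  # the name runs up to its 0 terminator
        return self.heap[start:end].decode("ascii")


d = HeapDirectory()
for file_name in ("project-budget", "personnel", "foo"):
    d.add(file_name, owner=1000)
d.remove(1)
reused = d.add("a-much-longer-file-name", owner=1000)
print(reused, d.name_of(reused))
```

The new, much longer name reuses the slot vacated by `personnel`, which would not have been possible with the in-line layout.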


There is one thing we still have not discussed:

How are directories to be searched? Any ideas?

Several possibilities:

• Search linearly from beginning to end for a filename;


• Bad for extremely long directories;

• Search using a hash table in each directory:


• Hash value is computed from the file name;

• Faster lookup, but more complex administration.
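
The hash-table option can be sketched as follows. The hash function (sum of character codes modulo the table size) and the table size are toy assumptions chosen so that collisions are easy to see; entries with the same hash value are chained in a list.

```python
def name_hash(name, n_buckets):
    """Toy hash: sum of the character codes, modulo the table size."""
    return sum(name.encode("ascii")) % n_buckets


class HashedDirectory:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, name, inode):
        # Entries that hash to the same value share one chain.
        self.buckets[name_hash(name, len(self.buckets))].append((name, inode))

    def lookup(self, name):
        # Only one chain is searched, not the whole directory.
        for entry_name, inode in self.buckets[name_hash(name, len(self.buckets))]:
            if entry_name == name:
                return inode
        return None


d = HashedDirectory()
d.insert("abc", 7)
d.insert("cab", 8)   # same letters, so same toy hash value: a collision
d.insert("mbox", 60)
print(d.lookup("cab"), d.lookup("missing"))
```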


Example
Consider the path: /usr/ast/mbox

How does the OS find the i-node information? Any ideas?

1 Locate the root directory:


• Its i-node is always at a fixed place on disk;

• Each directory entry contains: {filename, i-node number}

2 Look up usr in the root directory and read the i-node for /usr:


• It locates the directory's blocks, whose entries contain: {filename, i-node number}

3 Look up ast in /usr and read the i-node for /usr/ast;


• Again, each directory entry contains: {filename, i-node number}

4 Look up mbox in /usr/ast and read the i-node for /usr/ast/mbox.
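
The lookup steps can be sketched with a toy in-memory "disk". Everything here is an assumption for illustration: the i-node numbers are made up, the root is taken to be i-node 1, and a directory is modelled simply as a dictionary from file name to i-node number.

```python
ROOT_INODE = 1  # assumption: the root directory's i-node number is known in advance

# i-node number -> directory contents {file name: i-node number}, or file data
INODES = {
    1: {"usr": 6},
    6: {"ast": 26},
    26: {"mbox": 60},
    60: b"mail data",
}


def resolve(path):
    """Return the i-node number for an absolute path, one component at a time."""
    inode = ROOT_INODE
    for component in path.strip("/").split("/"):
        if not component:
            continue  # handles "/" and doubled slashes
        directory = INODES[inode]
        if not isinstance(directory, dict):
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(path)
        inode = directory[component]
    return inode


print(resolve("/usr/ast/mbox"))
```

Each loop iteration corresponds to one numbered step above: read a directory, find the next component, and follow its i-node number.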



File-system Management and Optimization

Making the file system work is one thing:

• Making it work efficiently and robustly in real life is something quite different;

• Guess what we will be seeing next ;)


• Some of the issues involved in managing disks:
• Disk-space management;
• File-system performance;
• Defragmenting disks;


Disk-space management

Files are normally stored on disk:

• Disk space management is a major concern to file-system designers.

• Let's have a look at some of the issues influencing file-system design:


• Block Size;

• Keeping Track of Free Blocks;

• Disk quotas


Block Size

Two general strategies are possible for storing an n-byte file:

• n consecutive bytes of disk space are allocated:


• However, if a file grows, it may have to be moved on the disk;

• Very slow operation...

• File is split up into a number of (not necessarily contiguous) blocks;


• Most file systems chop files up into fixed-size blocks

• Blocks need not be adjacent;

• File fragmentation may occur;


Blocks are an important part of the file system:

• They represent a fixed-length sequence of bytes;

But how can we choose an appropriate block size? Any ideas?

• Large block size means:


• small files waste large amounts of disk space;

• Small block size means:


• Most files will span multiple blocks;

• Thus needing multiple seeks and rotational delays to read:


• bad for performance;


In conclusion:

• If the allocation unit is too large we waste space;

• If the allocation unit is too small we waste time;

Again: But how can we choose an appropriate block size? Any ideas?

Studies have shown that making a good choice requires:

• having some information about the file-size distribution:


Figure: Percentage of files smaller than a given size (in bytes) (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• For each power-of-two file size:


• Each row lists the percentage of all files smaller than or equal to that size, for three data sets;

• E.g.: in 2005: ≈ 59% of the files in the 2nd data set were 4 KB or smaller

• E.g.: in 2005: ≈ 90% of the files in the 2nd data set were 64 KB or smaller

• Median file size was 2475 bytes;


What conclusion can we draw from these data? Any ideas?

• With a 1-KB block: only ≈ 30% to 50% of all files fit in a single block

• With a 4-KB block: ≈ 60% to 70% of all files fit in a single block


Example (1/2)

Consider a disk with:

• 1MB per track;

• Rotation time of 8.33 msec (7200 rpm);

• Average seek time of 5 msec;

• The time in milliseconds to read a block of k bytes is the sum of:


• Seek time;

• Rotational delay;

• Transfer times;

• I.e.: $5 + 4.165 + (k / 2^{20}) \times 8.33$
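
The formula can be evaluated directly for a few block sizes. This sketch uses only the disk parameters stated above; the function names are hypothetical, and the data rate is reported in units of 10^6 bytes per second, which is an assumption about how "MB/sec" should be read.

```python
SEEK_MS = 5.0          # average seek time
ROTATION_MS = 8.33     # one full rotation at 7200 rpm
TRACK_BYTES = 2 ** 20  # 1 MB per track


def read_time_ms(k):
    """Seek + average rotational delay (half a rotation) + transfer of k bytes."""
    return SEEK_MS + ROTATION_MS / 2 + (k / TRACK_BYTES) * ROTATION_MS


def data_rate_mb_per_s(k):
    """Bytes delivered per second when reading blocks of k bytes (1 MB = 10^6 bytes)."""
    return (k / 1e6) / (read_time_ms(k) / 1000)


for k in (1024, 4096, 65536, 2 ** 20):
    print(f"{k:>8} bytes: {read_time_ms(k):7.3f} msec, {data_rate_mb_per_s(k):7.3f} MB/sec")
```

For small blocks the fixed 9.165 msec dominates, so doubling the block size nearly doubles the data rate.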


Example (2/2)
Data rate for such a disk as function of block size:

Figure: The dashed curve (left-hand scale) gives the data rate of a disk. The solid curve (right-hand scale)
gives the disk-space efficiency. All files are 4 KB (Source: [Tanenbaum and Bos, 2015])


From the previous figure (1/3):

• For simplicity: assume that all files are 4KB

• Solid curve shows the space efficiency as a function of block size;

• Dashed curve shows the data rate as a function of block size;


From the previous figure (2/3):

• Dashed curve can be understood as follows:


• Block access time is dominated by the seek time and rotational delay;

• I.e. 5 + 4.165 = 9.165 msec are needed to access a block:

• Therefore, the more data are fetched the better;

• Hence, data rate goes up almost linearly with block size:


• Until the transfers take so long that the transfer time begins to matter;


From the previous figure (3/3):

• Solid curve can be understood as follows:


• With 4-KB files:
• Four 1-KB blocks are used;
• Two 2-KB blocks are used;
• One 4-KB block is used;
• Half of an 8-KB block is used (50% efficiency)
• A quarter of a 16-KB block is used (25% efficiency)

• In reality: some space is always wasted:


• Not all files are an exact multiple of the disk block size;
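
The space-efficiency curve can be reproduced with a one-line calculation. A minimal sketch, assuming efficiency is defined as the file size divided by the space actually allocated for it; the function name is hypothetical.

```python
import math


def space_efficiency(file_bytes, block_bytes):
    """Fraction of the allocated disk space that actually holds file data."""
    blocks_needed = math.ceil(file_bytes / block_bytes)
    return file_bytes / (blocks_needed * block_bytes)


KB = 1024
for block_kb in (1, 2, 4, 8, 16, 64):
    print(f"{block_kb:>2}-KB blocks: {space_efficiency(4 * KB, block_kb * KB):.2%}")

# A file that is not a multiple of the block size always wastes some space.
print(f"2475-byte file, 4-KB blocks: {space_efficiency(2475, 4 * KB):.2%}")
```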


What is the main conclusion you can draw from the previous figure? Any
ideas?

Performance and space utilization are inherently in conflict:

• Small blocks are bad for performance, but good for disk utilization;

• For these data: no reasonable compromise is available!

• Size closest to where the two curves cross is 64 KB but:


• Data rate is only 6.6 MB/sec;

• Space efficiency is about 7%;

• Neither of which is very good;


Historically:

• File systems have chosen sizes in the 1-KB to 4-KB range;

• But with disks now exceeding 1 TB:


• It may be better to increase block size to 64 KB and accept wasted disk space;

• Disk space is hardly in short supply any more;


Keeping track of free blocks

Once a block size has been chosen:

How does the OS keep track of free blocks? Any ideas?


Two methods are widely used:

• Linked-List of disk blocks;

• Bitmap of disk blocks;


Let's have a look at the linked-list approach:

Figure: Storing the free list on a linked list. (Source: [Tanenbaum and Bos, 2015])


From the previous figure (1/2):

• Linked list of disk blocks:


• Each block holding as many free disk block numbers as will fit;

• Storage of the list requires three blocks: 16, 17 and 18

• Example: With a 1-KB block and a 32-bit disk block number:


• Each list block holds $(2^{10} \times 8) / 2^{5} = 256$ block numbers.

• One of these slots is required for the pointer to the next block:
• As a result: each list block can describe only 255 free blocks;


From the previous figure (2/2):

• Consider a 1-TB disk:


• Then $2^{40} / 2^{10} = 2^{30}$ blocks exist;

• If each block in the list stores the addresses of 255 blocks


• $2^{30} / (2^{8} - 1) \approx 4$ million blocks will be required for the list;
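
The arithmetic can be checked with a short calculation. This sketch assumes the worst case in which every block on the disk is free and therefore has to be listed; the function name is hypothetical.

```python
def free_list_blocks(disk_bytes, block_bytes, pointer_bits=32):
    """Blocks needed to hold a linked free list that names every block on the disk."""
    total_blocks = disk_bytes // block_bytes
    slots_per_block = (block_bytes * 8) // pointer_bits
    usable_slots = slots_per_block - 1  # one slot points to the next list block
    return -(-total_blocks // usable_slots)  # ceiling division


needed = free_list_blocks(disk_bytes=2 ** 40, block_bytes=2 ** 10)
print(needed)
```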


Let's have a look at the bitmap approach:

Figure: Storing the free list on a bitmap. (Source: [Tanenbaum and Bos, 2015])


From the previous figure:

• A disk with n blocks requires a bitmap with n bits:


• Free blocks are represented by 1s in the map;

• Allocated blocks by 0s (or vice versa)

• Consider a 1-TB disk with 1-KB blocks:


• Then $2^{40} / 2^{10} = 2^{30}$ bits are required;

• These bits require around $2^{30} / (2^{10} \times 8) = 2^{17}$ ≈ 130,000 1-KB blocks to store;

Why does the value 8 appear in the previous calculation? Any ideas?

• Each block has size 1 KB, i.e. $2^{10}$ bytes of 8 bits each, so it holds $2^{10} \times 8$ bits of the bitmap

• Conclusion: bitmap requires less space than linked-list:


• Uses 1 bit per block vs. 32 bits...
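
A minimal bitmap allocator, following the convention above (1 = free, 0 = allocated). The class and method names are hypothetical, and the first-fit scan is just the simplest possible search, not a claim about how any particular file system does it.

```python
class BlockBitmap:
    """One bit per disk block; 1 = free, 0 = allocated."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray([0xFF]) * ((n_blocks + 7) // 8)  # all blocks start free

    def is_free(self, block):
        return bool((self.bits[block // 8] >> (block % 8)) & 1)

    def allocate(self):
        """Return the first free block and mark it allocated."""
        for block in range(self.n_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        raise OSError("no free blocks")

    def release(self, block):
        self.bits[block // 8] |= 1 << (block % 8)


def bitmap_blocks(disk_bytes, block_bytes):
    """Disk blocks needed to store the bitmap itself."""
    n_bits = disk_bytes // block_bytes
    return -(-n_bits // (block_bytes * 8))  # ceiling division


bitmap = BlockBitmap(16)
first, second = bitmap.allocate(), bitmap.allocate()
bitmap.release(first)
print(first, second, bitmap.allocate(), bitmap_blocks(2 ** 40, 2 ** 10))
```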


But when is one approach better than the other?

• Both approaches require linear search so that is not it....

• Bitmap approach always requires the same size:


• regardless of how full the disk is...

• Linked-list approach requires less space as the disk fills up, since only free blocks are listed;


Disk Quotas

Multiuser operating systems often provide a mechanism for enforcing disk quotas:

• Prevents people from monopolizing disk space;

• Idea: System administrator assigns each user a maximum space (quota);


• OS makes sure users do not exceed their quota;

How do you think an OS enforces user quotas? Any ideas?


Typical mechanism (1/2):

• When a user opens a file:


• Any increases / decreases in file size will be charged to the owner’s quota;

• OS table contains the quota record for every user with a currently open file:
• Even if the file was opened by someone else

• When a new entry is made in the open-file table:


• Pointer to the owner’s quota record is entered into it

• Making it easy to find various limits;


Typical mechanism (2/2):

• Every time a block is added to a file:


• Total number of blocks charged to the owner is incremented;

• Check is made against both the hard and soft limits:


• Soft limit may be exceeded, but the hard limit may not;

• Appending to a file when the hard block limit has been reached:
• Will result in an error.
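
The block-limit check can be sketched as follows. The record layout and the function `charge_block` are hypothetical, and only the block limits are modelled; any other limits a real quota record carries are left out.

```python
from dataclasses import dataclass


@dataclass
class QuotaRecord:
    soft_block_limit: int
    hard_block_limit: int
    current_blocks: int = 0


def charge_block(quota):
    """Charge one new block to the owner; return True if the soft limit is now exceeded."""
    if quota.current_blocks + 1 > quota.hard_block_limit:
        raise OSError("hard block limit reached")  # the append fails
    quota.current_blocks += 1
    return quota.current_blocks > quota.soft_block_limit


owner_quota = QuotaRecord(soft_block_limit=2, hard_block_limit=3)
warnings = [charge_block(owner_quota) for _ in range(3)]
print(warnings)
```

The third block exceeds the soft limit and only produces a warning flag; a fourth would hit the hard limit and raise an error.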


Figure: Quotas are kept track of on a per-user basis in a quota table. (Source: [Tanenbaum and Bos, 2015])


File-system performance

Access to disk is much slower than access to memory:

• For a single word: memory access is on the order of a million times as fast as disk access.

As a result many file systems have been designed with:

• Various optimizations to improve performance.

• Let's have a look at some:


• Caching;

• Block Read Ahead;

• Reducing disk-arm motion;


Caching

Technique used to reduce disk accesses: block cache;

• Collection of disk blocks kept in memory for performance reasons;

• Check all read requests to see if block is in cache:


• If block ∈ cache:
• Request can be satisfied without a disk access.

• If block ∉ cache:
• Block is read into cache;
• Then copied to wherever it is needed;


Typically there are many blocks in the cache:

• Need to quickly determine if a given block is present;

• Use a hash table: hash the device and disk address;

• All the blocks with the same hash value are chained together

Figure: The buffer cache data structures. (Source: [Tanenbaum and Bos, 2015])


When a block has to be loaded into a full cache:

• Some block has to be removed:


• And rewritten to the disk if it has been modified;

• Page-replacement algorithms can be used:


• FIFO, LRU, LFU...
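
A small block cache with LRU replacement might look like this. It is a sketch under assumptions: Python's `OrderedDict` stands in for both the hash table (keyed by device and block number) and the LRU ordering, and `read_block` / `write_block` are hypothetical callbacks that represent the real disk.

```python
from collections import OrderedDict


class BlockCache:
    def __init__(self, capacity, read_block, write_block):
        self.capacity = capacity
        self.read_block = read_block    # callable(device, block_no) -> data
        self.write_block = write_block  # callable(device, block_no, data)
        self.blocks = OrderedDict()     # (device, block_no) -> [data, dirty]

    def get(self, device, block_no):
        key = (device, block_no)
        if key in self.blocks:              # hit: no disk access needed
            self.blocks.move_to_end(key)    # mark as most recently used
            return self.blocks[key][0]
        if len(self.blocks) >= self.capacity:
            self._evict()
        data = self.read_block(device, block_no)
        self.blocks[key] = [data, False]
        return data

    def write(self, device, block_no, data):
        key = (device, block_no)
        if key not in self.blocks and len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[key] = [data, True]     # modified, must be written back later
        self.blocks.move_to_end(key)

    def _evict(self):
        (device, block_no), (data, dirty) = self.blocks.popitem(last=False)
        if dirty:                           # rewrite to disk only if modified
            self.write_block(device, block_no, data)


disk = {("sda", n): bytes([n]) for n in range(4)}
reads = []


def read_block(device, block_no):
    reads.append(block_no)
    return disk[(device, block_no)]


def write_block(device, block_no, data):
    disk[(device, block_no)] = data


cache = BlockCache(2, read_block, write_block)
cache.get("sda", 0)
cache.get("sda", 1)
cache.get("sda", 0)             # hit
cache.write("sda", 1, b"new")   # block 1 becomes dirty
cache.get("sda", 2)             # evicts block 0 (clean)
cache.get("sda", 3)             # evicts block 1 (dirty, written back)
print(reads, disk[("sda", 1)])
```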


Block read-ahead

Second technique for improving performance:

• Get blocks into the cache before they are needed to increase the hit rate;

• Many files are read sequentially;

• When the file system (FS) is asked to produce block k in a file:


• FS produces block k;

• FS also checks if block k + 1 ∈ cache:


• If block k + 1 ∉ cache: FS schedules a read for block k + 1;
• Hoping that when it is needed it will already be in cache;
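
Read-ahead can be sketched with a toy file whose blocks live in a list. All names are hypothetical, and the extra read is done immediately here, whereas a real file system would only schedule it.

```python
class ReadAheadFile:
    def __init__(self, blocks):
        self.blocks = blocks     # stands in for the file's blocks on disk
        self.cache = {}
        self.disk_reads = []     # log of block numbers fetched from "disk"
        self.demand_misses = 0   # requests that had to wait for the disk

    def _fetch(self, k):
        self.disk_reads.append(k)
        self.cache[k] = self.blocks[k]

    def read(self, k):
        if k not in self.cache:
            self.demand_misses += 1
            self._fetch(k)
        data = self.cache[k]
        following = k + 1
        if following < len(self.blocks) and following not in self.cache:
            self._fetch(following)  # read ahead: hope it is needed next
        return data


f = ReadAheadFile([b"a", b"b", b"c"])
contents = [f.read(k) for k in range(3)]  # sequential access
print(f.disk_reads, f.demand_misses)
```

With sequential access only the very first request waits for the disk; every later block is already in the cache when it is asked for.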


Read-ahead strategy works only for files that are read sequentially:

• If a file is being randomly accessed:


• Read ahead does not help!

• In fact: hurts performance:


• Blocks will be read unnecessarily;
• Potentially useful blocks will be removed from cache;
• Modified blocks evicted from cache will have to be written to disk;
• Disk bandwidth is tied up that could have been used to read useful blocks;


Reducing Disk-Arm motion

Idea: Reduce amount of disk-arm motion:

• By putting blocks that are likely to be accessed in sequence close to each other;


• Ideally contiguous, but close proximity already helps;

• When an output file is written:


• FS has to allocate the blocks one at a time, ideally near the previous block;

• Easy to do using a bitmap;

• Harder to do with a list of free blocks;


• List would need to be sorted;
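
With a bitmap, finding a free block near the previous one is a short outward scan. A minimal sketch, assuming the bitmap is held as a list of booleans (True = free); the function name and the tie-breaking rule (prefer the higher block number) are arbitrary choices.

```python
def allocate_near(free, previous):
    """Allocate the free block closest to `previous`, scanning outward in both directions."""
    for distance in range(len(free)):
        for candidate in (previous + distance, previous - distance):
            if 0 <= candidate < len(free) and free[candidate]:
                free[candidate] = False  # mark as allocated
                return candidate
    raise OSError("no free blocks")


free = [True, False, False, True, False, False, True, True]
file_blocks = [4]  # the file's most recently written block
for _ in range(3):
    file_blocks.append(allocate_near(free, file_blocks[-1]))
print(file_blocks)
```

Doing the same with an unsorted free list would mean examining every list entry to find the closest block number.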


Arm movement is relevant only for magnetic disks:

Do you know any other types of mass-storage device? Any ideas?

Solid-state disks (SSD):

• No moving parts =);

• These are based on flash technology:


• Random accesses are just as fast as sequential ones;

• Many of the problems of traditional disks go away;

• Unfortunately, new problems emerge =’(


• Each disk block can be written only a limited number of times;
• Great care is taken to spread the wear on the disk evenly.


Defragmenting Disks

When the operating system is initially installed:

• Programs and files are installed consecutively, starting at the beginning of the disk;

• All free disk space is in a single contiguous unit following the installed files;

Can you see any problem with this structure as time goes on? Any ideas?


• Files are created and removed:


• Disk becomes full of holes;
• Non-contiguous empty space spread throughout the disk;

• When a new file is created:


• Blocks used for it may be spread all over the disk;
• Giving poor performance.


Performance can be restored by:

1 Moving files around to make them contiguous;

2 Gathering the free space into one or more large contiguous regions;


Linux file systems like ext2 and ext3:

• Generally suffer less from fragmentation than Windows systems:


• Due to the way disk blocks are selected;

• Manual defragmentation is rarely required;

SSDs do not really suffer from fragmentation at all:

• Defragmenting an SSD is counterproductive;

• Not only is there no gain in performance:


• Writing to SSDs wears them out;

• Defragmenting them merely shortens their life.



References

Tanenbaum, A. and Bos, H. (2015). Modern Operating Systems. Pearson Education Limited.
