Chapter 4 - File Systems
Luis Tarrataca
CEFET-RJ
2 Files
File Naming
File Structure
File Types
File Access
File Attributes
File Operations
Path Names
Directory Operations
Linked-List Allocation
I-Nodes
Implementing Directories
Disk-space management
Block Size
Disk Quotas
File-system performance
Caching
Block read-ahead
Defragmenting Disks
Motivation
2. The information must survive the termination of the process using it.
How do you keep one user from reading another user’s data?
In your opinion, what are some of the most important concepts in an OS?
• Process? Threads?
Today we will learn a new abstraction. Can you guess what it is?
• The file...
Files
• structured...
• named...
• accessed...
• used...
• protected...
• implemented...
• and managed
File Naming
File Structure
Files can be structured in any of several ways (1/3):
File Types
• Directories: system files for maintaining the structure of the file system;
File Access
Files whose bytes or records can be read in any order are called
random-access files;
• 1st method: every read gives the position in the file to start reading at;
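The 1st method matches the POSIX `pread` call, where the offset is an argument of the read itself; the other common method uses a separate seek operation to set the current position before an ordinary read. A minimal sketch with Python's `os` wrappers, assuming a Unix-like system (`os.pread` is not available on Windows); the file contents are arbitrary:

```python
import os
import tempfile

def read_at(fd: int, offset: int, size: int) -> bytes:
    """1st method: the read call itself says where to start (POSIX pread)."""
    return os.pread(fd, size, offset)

def seek_then_read(fd: int, offset: int, size: int) -> bytes:
    """Alternative: set the current position with a seek, then read from there."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, size)

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789ABCDEF")
    f.flush()  # push the buffered bytes to the file before using the raw descriptor
    fd = f.fileno()
    first = read_at(fd, 10, 3)         # bytes 10..12
    second = seek_then_read(fd, 4, 2)  # bytes 4..5
```

Note that `pread` leaves the current file position unchanged, which is convenient when several threads share one descriptor.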
File Attributes
Can you think of a few attributes that an OS maintains regarding a file? Any
ideas?
The OS keeps track of a wide range of information regarding a file:
Figure: Some possible file attributes (Source: [Tanenbaum and Bos, 2015])
File Operations
What are the most common file operations made available by the OS?
Any ideas?
• The caller must specify how many bytes to read and the buffer in which to place them;
• If the current position is in the middle of the file, existing data are overwritten;
• Append: restricted form of write. It can add data only to the end of the file;
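Read, write, and append can be tried with the POSIX-style calls in Python's `os` module. A minimal sketch; the file name `demo.txt` is arbitrary, and append is expressed with the `O_APPEND` flag, which makes every write go to the end of the file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Create the file with some initial contents.
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.write(fd, b"hello world")
os.close(fd)

# Write in the middle: move the current position to 6, existing bytes are overwritten.
fd = os.open(path, os.O_WRONLY)
os.lseek(fd, 6, os.SEEK_SET)
os.write(fd, b"WORLD")
os.close(fd)

# Append: with O_APPEND every write lands at the end of the file.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"!")
os.close(fd)

# Read: the caller says how many bytes it wants (at most 100 here).
fd = os.open(path, os.O_RDONLY)
contents = os.read(fd, 100)
os.close(fd)
```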
Can you tell what the following program is doing? Any ideas?
Directories
• File systems normally have directories or folders, which are themselves files:
• Allows the file system to have a hierarchy of files;
• Tree of directories;
Path Names
Directory Operations
What are the most common directory operations made available by the
OS? Any ideas?
Layout of a disk partition varies a lot from file system to file system:
• Superblock: contains all the key parameters about the file system;
• File-system type identification;
• Number of blocks;
• Etc.
• Free-space management: information about the file system's free blocks;
• E.g., a bitmap or a list of pointers;
• I-nodes: an array of data structures, one per file, describing the file;
• Tracks;
• Sectors;
• Contiguous Allocation
Contiguous Allocation
Figure: Contiguous allocation of disk space for seven files (Source: [Tanenbaum and Bos, 2015])
Can you see any other advantage of contiguous allocation? Any ideas?
Figure: The state of the disk after files D and F have been removed. (Source: [Tanenbaum and Bos, 2015])
• Disastrous performance;
• However, eventually the disk will fill up, and then two solutions exist:
• Compact the disk: prohibitively expensive;
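With contiguous allocation a file can be described by just two numbers, its first disk address and its length in blocks, and finding any block is a single addition. The sketch below also reuses free holes with a first-fit policy; the hole list and the choice of first fit are illustrative assumptions, not taken from the figures. It shows how holes can add up to enough free space while none of them is large enough:

```python
from dataclasses import dataclass

@dataclass
class ContiguousFile:
    start: int   # disk address of the first block
    length: int  # number of blocks

    def disk_block(self, n: int) -> int:
        """Disk address of the file's n-th block (0-based): one addition."""
        if not 0 <= n < self.length:
            raise IndexError("block outside the file")
        return self.start + n

def first_fit(holes, size):
    """Take the first free hole (start, length) with room for `size` blocks."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            rest = (start + size, length - size)
            holes[i:i + 1] = [rest] if rest[1] > 0 else []
            return ContiguousFile(start, size)
    return None  # no single hole is big enough, even if the total free space is
```

For example, with holes `[(10, 3), (20, 8)]`, a 5-block file goes into the second hole, leaving `[(10, 3), (25, 3)]`: 6 free blocks in total, yet a 4-block file no longer fits.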
Can you think of any other type of method for implementing files?
Linked-List Allocation
Figure: Storing a file as a linked list of disk blocks. (Source: [Tanenbaum and Bos, 2015])
• Directory entries merely need to store the disk address of the first block:
• Rest can be found starting there.
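A sketch of why random access is slow with linked-list allocation: each block stores the pointer to the next one, so reaching block n of a file means reading the n blocks before it. The directory entry only needs the first block number; the block numbers and contents below are illustrative:

```python
END = -1  # hypothetical end-of-chain marker

# Hypothetical disk: block number -> (pointer to the next block, data stored in the block).
disk = {
    4: (7, b"He"),
    7: (2, b"ll"),
    2: (10, b"o "),
    10: (12, b"wo"),
    12: (END, b"rld"),
}

def read_file(first):
    """Read a whole file sequentially, starting from the block in its directory entry."""
    parts, block = [], first
    while block != END:
        next_block, data = disk[block]
        parts.append(data)
        block = next_block
    return b"".join(parts)

def read_nth_block(first, n):
    """Return (data of block n, disk reads needed): every earlier block must be
    read just to learn where the next one is."""
    block, reads = first, 0
    for _ in range(n):
        next_block, _ = disk[block]
        reads += 1
        if next_block == END:
            raise IndexError("the file has fewer blocks")
        block = next_block
    _, data = disk[block]
    return data, reads + 1
```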
Can you think of any other methods for implementing a file? Any ideas?
• Storing the pointer word from each disk block in a table in memory;
• File B: start with block 6 and follow the chain until the end;
• Chain is terminated with a special marker (e.g., -1)
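Following a chain in the in-memory table: the directory gives the first block, and every further step is a table lookup, so random access needs no extra disk reads. File B starts at block 6 as in the text; the other block numbers are illustrative, and a dict stands in for the table (a real table has one entry per disk block):

```python
END = -1  # special marker terminating a chain

# Entry i holds the number of the block that follows block i in its file.
table = {
    6: 3, 3: 11, 11: 14, 14: END,           # file B (illustrative chain after block 6)
    4: 7, 7: 2, 2: 10, 10: 12, 12: END,     # another file, also illustrative
}

def chain(first):
    """All blocks of a file, found without touching the disk."""
    blocks, block = [], first
    while block != END:
        blocks.append(block)
        block = table[block]
    return blocks

def nth_block(first, n):
    """Random access: follow n pointers in memory, then one disk read suffices."""
    block = first
    for _ in range(n):
        block = table[block]
        if block == END:
            raise IndexError("the file has fewer blocks")
    return block
```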
Can you see any advantages of linked-list allocation using a table
in memory? Any ideas?
Can you see any disadvantages of linked-list allocation using a
table in memory? Any ideas?
I-Nodes
How can we efficiently keep track of which blocks belong to which file?
Any ideas?
• Given the i-node, it is then possible to find all the blocks of the file:
How does this scheme compare against linked files using an in-memory
table? Any ideas?
I-node array:
• Usually far smaller than the space occupied by the file-table approach;
• The reason is simple:
• The table holding all disk blocks is proportional in size to the disk itself;
• If the disk has n blocks, the table needs n entries;
• As disks grow larger, this table grows linearly with them.
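A quick calculation makes the difference concrete. Assuming 1-KB blocks and 4-byte table entries (both assumptions for illustration), the table for a 1-TB disk alone needs 4 GiB of memory, while with i-nodes only the i-nodes of open files must be in memory; the 128-byte i-node size and the limit of 1,000 open files are hypothetical:

```python
def file_table_bytes(disk_bytes, block_bytes, entry_bytes=4):
    """Table approach: one entry per disk block, so the table grows with the disk."""
    return (disk_bytes // block_bytes) * entry_bytes

def open_inode_bytes(max_open_files, inode_bytes):
    """I-node approach: only the i-nodes of open files need to be in memory."""
    return max_open_files * inode_bytes

TB = 2**40
print(file_table_bytes(TB, 1024) / 2**30, "GiB")  # prints: 4.0 GiB
print(open_inode_bytes(1000, 128), "bytes")       # prints: 128000 bytes
```

Doubling the disk doubles the first number, while the second depends only on how many files may be open at once.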
Can you see any problem with the i-nodes approach? Any ideas?
• Each i-node has room for only a fixed number of disk addresses;
What happens when a file grows beyond this limit? Any ideas?
One solution: reserve the last disk address not for a data block, but for the address of a block containing more disk-block addresses.
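A sketch of the lookup an i-node enables, including the extra level of indirection once the file outgrows its direct addresses. Using only 4 direct addresses, the block numbers, and the dict standing in for the disk are hypothetical simplifications:

```python
from dataclasses import dataclass
from typing import List, Optional

N_DIRECT = 4  # hypothetical; real i-nodes hold more direct addresses

@dataclass
class INode:
    direct: List[int]               # disk addresses of the first N_DIRECT blocks
    indirect: Optional[int] = None  # address of a block holding more block addresses

# Hypothetical disk: indirect block 90 holds three further block addresses.
disk = {90: [51, 52, 53]}

def block_address(inode: INode, n: int) -> int:
    """Disk address of the file's n-th block (0-based)."""
    if n < N_DIRECT:
        return inode.direct[n]          # found in the i-node itself
    if inode.indirect is None:
        raise IndexError("block beyond the end of the file")
    addresses = disk[inode.indirect]    # costs one extra disk read
    return addresses[n - N_DIRECT]
```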
Implementing Directories
• The OS uses the path name to locate the directory entry on the disk:
• The directory entry provides the information needed to find the disk blocks:
Figure: Simple directory containing fixed-size entries with the disk addresses and attributes in the directory
entry. (Source: [Tanenbaum and Bos, 2015])
• Attributes:
• Creator, Time, etc.
Figure: A directory in which each entry just refers to an i-node. (Source: [Tanenbaum and Bos, 2015])
How can we determine the i-node of files and directories in Linux? Any
ideas?
• ls -i =)
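The same information is available programmatically: `os.stat` reports the i-node number as `st_ino`, and `os.scandir` returns it for each directory entry via `DirEntry.inode()`. A small sketch, assuming a Unix-like system and a file system that supports hard links; the two hard links are two directory entries pointing to the same i-node:

```python
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "a.txt")
with open(original, "w") as f:
    f.write("data")

alias = os.path.join(d, "b.txt")
os.link(original, alias)  # hard link: a second directory entry for the same i-node

# st_ino is the number `ls -i` prints.
same_inode = os.stat(original).st_ino == os.stat(alias).st_ino

# Reading the directory itself: each entry carries an i-node number.
entries = {e.name: e.inode() for e in os.scandir(d)}
```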
But what about the impact of file-name length on the directory structure?
Idea: give up the assumption that all directory entries are the same size:
Figure: In-line handling of long file names (Source: [Tanenbaum and Bos, 2015])
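A sketch of in-line variable-length entries: each entry begins with a fixed header that includes its own length, followed by the name padded to a 4-byte boundary, so a scan must use the length field to find the next entry. The header layout (2-byte length, 4-byte i-node number) and the padding rule are hypothetical, in the spirit of the figure:

```python
import struct

# Hypothetical fixed header: entry length (2 bytes), i-node number (4 bytes).
HEADER = struct.Struct("<HI")

def pack_entry(inode: int, name: str) -> bytes:
    raw = name.encode() + b"\0"                     # NUL-terminated name
    raw += b"\0" * (-(HEADER.size + len(raw)) % 4)  # pad the entry to a 4-byte boundary
    return HEADER.pack(HEADER.size + len(raw), inode) + raw

def parse_directory(data: bytes):
    """Walk the entries; the length field is the only way to find the next one."""
    entries, pos = [], 0
    while pos < len(data):
        length, inode = HEADER.unpack_from(data, pos)
        name = data[pos + HEADER.size:pos + length].split(b"\0", 1)[0].decode()
        entries.append((name, inode))
        pos += length
    return entries
```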
Can you see any problems with the previous approach? Any ideas?
• The problem is essentially the same one we saw with contiguous disk files: removing a file leaves a variable-sized gap that the next entry may not fit into;
Can you see any other problem with the previous approach? Any ideas?
• Keep the file names together in a heap at the end of the directory:
Figure: Heap handling of long file names (Source: [Tanenbaum and Bos, 2015])
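A sketch of the heap layout: every directory entry keeps the same fixed size and stores the offset of its name in a separate heap of NUL-terminated names. The 8-byte entry layout (name offset, i-node number) is a hypothetical choice:

```python
import struct

# Hypothetical fixed-size entry: (offset of the name in the heap, i-node number).
ENTRY = struct.Struct("<II")

def build_directory(files):
    """files: list of (name, inode). Returns (fixed-size entry table, name heap)."""
    table, heap = bytearray(), bytearray()
    for name, inode in files:
        table += ENTRY.pack(len(heap), inode)  # the entry records where its name starts
        heap += name.encode() + b"\0"          # names of any length go into the heap
    return bytes(table), bytes(heap)

def find_inode(table: bytes, heap: bytes, wanted: str):
    """Scan the fixed-size entries, following each one into the heap for its name."""
    for i in range(len(table) // ENTRY.size):
        offset, inode = ENTRY.unpack_from(table, i * ENTRY.size)
        end = heap.index(b"\0", offset)
        if heap[offset:end].decode() == wanted:
            return inode
    return None
```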
Can you see any advantage of using the heap method? Any ideas?
Important:
Several possibilities:
Example
Consider the path: /usr/ast/mbox
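Looking up /usr/ast/mbox proceeds one component at a time: start at the root directory, whose i-node is at a known place, find `usr` there, then `ast` in /usr, then `mbox` in /usr/ast. A sketch with a hypothetical in-memory file system; all i-node numbers are illustrative:

```python
ROOT_INODE = 1  # assumption: the root directory's i-node is at a fixed, known place

# Hypothetical file system: directory i-node -> {name: i-node number}.
# Any i-node that is not a key here is treated as a regular file.
directories = {
    1: {"usr": 6, "etc": 4},
    6: {"ast": 26, "lib": 19},
    26: {"mbox": 60, "notes": 81},
}

def resolve(path: str) -> int:
    """Resolve an absolute path one component at a time, starting at the root."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    inode = ROOT_INODE
    for component in filter(None, path.split("/")):  # skip empty parts from "/" or "//"
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(component)
        if component not in entries:
            raise FileNotFoundError(path)
        inode = entries[component]
    return inode
```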
• Making it work efficiently and robustly in real life is something quite different;
Disk-space management
• Disk quotas
Block Size
In conclusion:
Again: how can we choose an appropriate block size? Any ideas?
Figure: Percentage of files smaller than a given size (in bytes) (Source: [Tanenbaum and Bos, 2015])
• E.g., in 2005, ≈ 59% of the files in the 2nd data set were 4 KB or smaller;
• E.g., in 2005, ≈ 90% of the files in the 2nd data set were 64 KB or smaller;
Example (1/2)
• Rotational delay;
• Transfer times;
Example (2/2)
Data rate for such a disk as a function of block size:
Figure: The dashed curve (left-hand scale) gives the data rate of a disk. The solid curve (right-hand scale)
gives the disk-space efficiency. All files are 4 KB (Source: [Tanenbaum and Bos, 2015])
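The trade-off in the figure can be reproduced with a small model: the time to read a block is the seek time plus the average rotational delay (half a rotation) plus the transfer time. The disk parameters below (5-ms average seek, 8.33-ms rotation, 1,000,000 bytes per track) are assumptions chosen for illustration, and every file is 4 KB as in the figure:

```python
SEEK_MS = 5.0            # assumed average seek time
ROTATION_MS = 8.33       # assumed time of one full rotation
TRACK_BYTES = 1_000_000  # assumed bytes per track

def read_time_ms(block_bytes: int) -> float:
    """Seek + average rotational delay (half a rotation) + transfer time."""
    transfer_ms = block_bytes / TRACK_BYTES * ROTATION_MS
    return SEEK_MS + ROTATION_MS / 2 + transfer_ms

def data_rate_mb_s(block_bytes: int) -> float:
    return (block_bytes / 1e6) / (read_time_ms(block_bytes) / 1000)

def space_efficiency(file_bytes: int, block_bytes: int) -> float:
    """Fraction of the allocated space that actually holds file data."""
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return file_bytes / (blocks * block_bytes)

for kb in (1, 4, 16, 64, 256):
    size = kb * 1024
    print(f"{kb:>4} KB blocks: {data_rate_mb_s(size):6.2f} MB/s, "
          f"space efficiency {space_efficiency(4096, size):.0%}")
```

Larger blocks raise the data rate because the fixed seek and rotational costs are spread over more bytes, while the space efficiency for 4-KB files drops as soon as blocks exceed 4 KB.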
What is the main conclusion you can draw from the previous figure? Any
ideas?
• Small blocks are bad for performance, but good for disk utilization;
Historically:
Figure: Storing the free list on a linked list. (Source: [Tanenbaum and Bos, 2015])
• One of these slots is required for the pointer to the next block:
• As a result, each block can describe only 255 free blocks;
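The 255 figure follows from the block and address sizes. Assuming 1-KB blocks and 32-bit block numbers (256 slots per block), a sketch of the arithmetic and of how a list of free block numbers would be split across list blocks; linking the list blocks by list index rather than by disk address is a simplification:

```python
BLOCK_BYTES = 1024   # assumed 1-KB blocks
ADDRESS_BYTES = 4    # assumed 32-bit disk block numbers

SLOTS = BLOCK_BYTES // ADDRESS_BYTES  # 256 block numbers fit in one block
FREE_PER_BLOCK = SLOTS - 1            # one slot is the pointer to the next list block

def free_list_blocks(n_free: int) -> int:
    """Number of list blocks needed to describe n_free free blocks."""
    return -(-n_free // FREE_PER_BLOCK)  # ceiling division

def build_free_list(free):
    """Split free block numbers into list blocks of (entries, index of next list block)."""
    chunks = [free[i:i + FREE_PER_BLOCK] for i in range(0, len(free), FREE_PER_BLOCK)]
    return [(chunk, i + 1 if i + 1 < len(chunks) else None)
            for i, chunk in enumerate(chunks)]
```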
Figure: Storing the free list on a bitmap. (Source: [Tanenbaum and Bos, 2015])
• These bits require around $2^{30} / (2^{10} \cdot 8) \approx 130{,}000$ 1-KB blocks to store;
Why does the value 8 appear in the previous calculation? Any ideas?
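A bitmap uses one bit per disk block, and each byte holds 8 of those bits. The sketch below assumes 1 means free (the opposite convention works equally well) and uses a linear first-free search, which is an illustrative choice; `bitmap_blocks(2**30, 1024)` evaluates to 131,072, the ≈ 130,000 above:

```python
class Bitmap:
    """Free-space bitmap: bit i is 1 if block i is free (assumed convention)."""

    def __init__(self, n_blocks: int):
        self.n_blocks = n_blocks
        self.bits = bytearray(b"\xff" * ((n_blocks + 7) // 8))  # 8 blocks per byte

    def is_free(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the first free block and mark it used, or None if the disk is full."""
        for block in range(self.n_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        return None

    def release(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

def bitmap_blocks(n_blocks: int, block_bytes: int) -> int:
    """Disk blocks needed to store a bitmap covering n_blocks blocks."""
    bits_per_block = block_bytes * 8
    return -(-n_blocks // bits_per_block)  # ceiling division
```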
Disk Quotas
• An OS table contains the quota record for every user with a currently open file:
• Even if the file was opened by someone else
• Appending to a file when the hard block limit has been reached:
• Will result in an error.
Figure: Quotas are kept track of on a per-user basis in a quota table. (Source: [Tanenbaum and Bos, 2015])
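A sketch of how the hard and soft block limits might be enforced when blocks are added to a file. Only the block counters of the quota record are modelled (the other fields, such as the file-count limits, are left out), and the names `QuotaRecord`, `charge_blocks`, and `QuotaExceeded` are hypothetical:

```python
from dataclasses import dataclass

class QuotaExceeded(Exception):
    pass

@dataclass
class QuotaRecord:
    soft_block_limit: int
    hard_block_limit: int
    blocks_in_use: int = 0

# OS table: user id -> quota record, for every user with a currently open file.
quota_table = {}

def charge_blocks(owner: int, n_blocks: int) -> bool:
    """Charge new blocks to the file's owner, whoever opened the file.
    Returns True if the owner is now over the soft limit (a warning case)."""
    record = quota_table[owner]
    if record.blocks_in_use + n_blocks > record.hard_block_limit:
        raise QuotaExceeded(f"user {owner} would exceed the hard block limit")
    record.blocks_in_use += n_blocks
    return record.blocks_in_use > record.soft_block_limit
```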
File-system performance
Caching
• If block ∉ cache:
• Block is read into cache;
• Then copied to wherever it is needed;
• All the blocks with the same hash value are chained together
Figure: The buffer cache data structures. (Source: [Tanenbaum and Bos, 2015])
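A minimal buffer-cache sketch: blocks are found by hashing on (device, block number), and a least-recently-used (LRU) order decides which block to evict. Python's `OrderedDict` provides hashed lookup plus an order that can be kept as LRU order, standing in for the hash chains and the LRU list of the figure; writing modified blocks back to disk is not modelled, and `read_from_disk` is a hypothetical callback:

```python
from collections import OrderedDict

class BufferCache:
    """Block cache keyed by (device, block number) with LRU replacement."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # hypothetical: (device, block) -> bytes
        self.blocks = OrderedDict()           # least recently used first
        self.disk_reads = 0

    def read(self, device, block):
        key = (device, block)
        if key in self.blocks:               # hit: no disk access needed
            self.blocks.move_to_end(key)     # now the most recently used
            return self.blocks[key]
        data = self.read_from_disk(device, block)  # miss: read the block into the cache
        self.disk_reads += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data
```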
Block read-ahead
• Get blocks into the cache before they are needed to increase the hit rate;
The read-ahead strategy works only for files that are being read sequentially:
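A sketch of read-ahead that tracks, per open file, whether access currently looks sequential: the file starts in sequential mode, a read of any block other than the next one counts as a seek and switches read-ahead off, and consecutive reads switch it back on. The class name and the use of a plain set as the cache are hypothetical simplifications:

```python
class ReadAheadFile:
    """Per-open-file read-ahead: after reading block k, prefetch block k + 1
    while accesses look sequential."""

    def __init__(self, cache):
        self.cache = cache        # set of block numbers currently cached (stand-in)
        self.last_block = None
        self.sequential = True    # give the file the benefit of the doubt
        self.prefetched = []      # blocks scheduled for read-ahead, for inspection

    def read(self, k):
        if self.last_block is not None:
            # A jump elsewhere means random access; reading k + 1 means sequential again.
            self.sequential = (k == self.last_block + 1)
        self.last_block = k
        self.cache.add(k)         # the requested block itself
        if self.sequential and (k + 1) not in self.cache:
            self.cache.add(k + 1) # schedule a read of the next block
            self.prefetched.append(k + 1)
```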
Do you know any other type of mass storage devices? Any ideas?
Defragmenting Disks
• All free disk space is in a single contiguous unit following the installed files;
Can you see any problem with this structure as time goes on? Any ideas?
References I