L10 - File Management Introduction

The document discusses file systems and their abstraction from physical storage. It covers key concepts like files, directories, file metadata and operations. It compares memory and file management, and describes common file system criteria and implementations.

Uploaded by

jiatsd
File System Management

File System Introduction

Lecture 10

Overview
• File System
  • Definition
  • Vs Memory Management
  • Motivation
• File
  • Metadata
  • Operations
• Directory
  • Directory Structure
• I/O Scheduling
[ CS2106 L10 - AY2021 S1 ]
File System: Motivation
• Physical memory is volatile
  • Use external storage to store persistent information
• Direct access to the storage media is not portable:
  • Dependent on hardware specification and organization (see next slide for example)
• File System provides:
  • An abstraction on top of the physical media
  • A high-level resource management scheme
  • Protection between processes and users
  • Sharing between processes and users

File System: General Criteria
• Self-Contained:
  • Information stored on a medium is enough to describe the entire organization
  • Should be able to "plug-and-play" on another system
• Persistent:
  • Beyond the lifetime of OS and processes
• Efficient:
  • Provides good management of free and used space
  • Minimum overhead for bookkeeping information

Memory Management vs File Management

                     Memory Management             File System Management
Underlying Storage   RAM                           Disk
Access Speed         Constant                      Variable disk I/O time
Unit of Addressing   Physical memory address       Disk sector
Usage                Address space for process;    Non-volatile data;
                     implicit when process runs    explicit access
Organization         Paging/Segmentation:          Many different FS: ext* (Linux),
                     determined by HW & OS         FAT* (Windows), HFS* (Mac OS) etc.

Key Topics
File System Abstraction

• Discuss the logical entities present in file system


• E.g. Files / Directories

File System Implementation

• Common implementation schemes


• Discuss pros/cons
• Case studies

You mean files and folders are not real?

FILE SYSTEM ABSTRACTIONS

File System Abstraction
• File System:
  • Consists of a collection of files and directory structures
    • File: An abstract storage of data
    • Directory (Folder): Organization of files
  • Provides an abstraction of accessing and using the above
• Look at the two abstractions closely next:
  • File
  • Directory (Folder)

File: Overview
• Basic Definition
• File Metadata
• File Data
  • File structure
  • Access Methods
• File Operations

File: Basic Description
• Represents a logical unit of information created by a process
• An abstraction
  • Essentially an Abstract Data Type:
    • A set of common operations with various possible implementations
• Contains:
  • Data: Information structured in some way
  • Metadata: Additional information associated with the file
    • Also known as file attributes

File Metadata

Name:                              A human-readable reference to the file
Identifier:                        A unique id for the file, used internally by the FS
Type:                              Indicates the type of file,
                                   e.g. executable, text file, object file, directory etc.
Size:                              Current size of file (in bytes, words or blocks)
Protection:                        Access permissions, can be classified as reading,
                                   writing and execution rights
Time, date and owner information:  Creation time, last modification time, owner id etc.
Table of content:                  Information for the FS to determine how to access the file

File Name
• Different FS have different naming rules
  • To determine a valid file name
• Common naming rules cover:
  • Length of file name
  • Case sensitivity
  • Allowed special symbols
  • File extension
    • Usual form: Name.Extension
    • On some FS, the extension is used to indicate file type

File Type
• An OS commonly supports a number of file types
• Each file type has:
  • An associated set of operations
  • Possibly a specific program for processing
• Common file types:
  • Regular files: contain user information
  • Directories: system files for FS structure
  • Special files: character/block oriented

Two Major Types of Regular Files
• ASCII files:
  • Example: text files, program source code, etc.
  • Can be displayed or printed as is
• Binary files:
  • Example: executable, Java class file, pdf file, mp3/4, png/jpeg/bmp etc.
  • Have a predefined internal structure that can be processed by a specific program
    • JVM to execute Java class file
    • PDF reader for pdf file etc.

Distinguishing File Type
1. Use file extension as indication:
  • Used by Windows OS
  • e.g. XXX.docx → Word document
  • Change of extension implies a change in file type!
2. Use embedded information in the file:
  • Used by Unix
  • Usually stored at the beginning of the file
  • Commonly known as a magic number

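The magic-number approach can be sketched in a few lines of C. This is only an illustration: the function name guess_type and the returned labels are made up for this example, and only three well-known signatures (ELF, PDF, PNG) are checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess a file type from the first bytes of its content.
 * The name guess_type and the labels are illustrative only. */
const char *guess_type(const unsigned char *buf, size_t len)
{
    static const unsigned char elf[] = { 0x7f, 'E', 'L', 'F' };
    static const unsigned char pdf[] = { '%', 'P', 'D', 'F' };
    static const unsigned char png[] = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };

    assert(buf != NULL || len == 0);

    /* Compare the start of the buffer against each known signature. */
    if (len >= sizeof elf && memcmp(buf, elf, sizeof elf) == 0)
        return "ELF executable";
    if (len >= sizeof pdf && memcmp(buf, pdf, sizeof pdf) == 0)
        return "PDF document";
    if (len >= sizeof png && memcmp(buf, png, sizeof png) == 0)
        return "PNG image";
    return "unknown";
}
```

Renaming a file does not change what this function reports, because it looks only at the content, not at the extension.
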
File Protection
• Controlled access to the information stored in a file
• Types of access:
  • Read: Retrieve information from file
  • Write: Write/Rewrite the file
  • Execute: Load file into memory and execute it
  • Append: Add new information to the end of file
  • Delete: Remove the file from FS
  • List: Read metadata of a file

File Protection: How?
• Most common approach:
  • Restrict access based on the user identity
• Most general scheme:
  • Access Control List
    • A list of user identities and the allowed access types
    • Pros: Very customizable
    • Cons: Additional information associated with file
• A common condensed file protection scheme is discussed next

File Protection: Permission Bits
• Classify the users into three classes:
  1. Owner: The user who created the file
  2. Group: A set of users who need similar access to a file
  3. Universe: All other users in the system
• Example (Unix)
  • Define permissions of three access types (Read/Write/Execute) for the 3 classes of users
  • Use "ls -l" to see the permission bits for a file

    rwx    r--    r--       somefile.eg
    Owner  Group  Universe

File Protection: Access Control List
• In Unix, Access Control List (ACL) can be:
  • Minimal ACL (the same as the permission bits)
  • Extended ACL (added named users / groups)
• "getfacl" is the command to get ACL information:

    $ getfacl exampleDir
    # file: exampleDir
    # owner: ccris
    # group: compsc
    user::rwx
    user:sooyj:rwx        <- Permission for specific user
    group::r-x
    group:cohort20:rwx    <- Permission for specific group
    mask::rwx             <- Permission "upper bound"
    other::---

Operations on File Metadata
• Rename:
  • Change filename
• Change attributes:
  • File access permissions
  • Dates
  • Ownership
  • etc.
• Read attribute:
  • Get file creation time

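On Unix these metadata operations map onto a few standard calls: stat() reads attributes, chmod() changes the permission bits and rename() changes the name. The sketch below is only illustrative; the three function names are made up for this example. Note that the classic stat() fields report last access, last modification and last status-change times.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>      /* rename() */
#include <sys/types.h>
#include <sys/stat.h>   /* stat(), chmod() */

/* Read attribute: size of the file in bytes, or -1 on error. */
off_t file_size(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) == -1)
        return -1;
    return st.st_size;
}

/* Read attribute: permission bits (e.g. 0644), or -1 on error. */
int file_permissions(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) == -1)
        return -1;
    return (int)(st.st_mode & 0777);
}

/* Change attribute, then rename. Both touch only metadata, not file data.
 * Returns 0 on success, -1 on error. */
int make_private_and_rename(const char *old_path, const char *new_path)
{
    if (chmod(old_path, S_IRUSR | S_IWUSR) == -1)   /* rw------- */
        return -1;
    return rename(old_path, new_path);
}
```

After make_private_and_rename("a", "b") succeeds, the old name no longer resolves, while the size reported for the new name is unchanged.
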
File Data: Structure
• Array of bytes:
  • The traditional Unix view
  • No interpretation of data: just raw bytes
  • Each byte has a unique offset (distance) from the file start
• Fixed-length records:
  • Array of records, can grow/shrink
  • Can jump to any record easily:
    • Offset of the Nth record = size of record * (N-1)
• Variable-length records
  • Flexible but harder to locate a record

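The offset formula for fixed-length records is simple enough to write down directly. In this small sketch the function names are hypothetical, and records are numbered from 1 as on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Byte offset of the Nth fixed-length record (N counted from 1). */
off_t record_offset(size_t record_size, size_t n)
{
    assert(record_size > 0 && n >= 1);
    return (off_t)(record_size * (n - 1));
}

/* Inverse direction: which record (counted from 1) holds a given byte offset. */
size_t record_index(size_t record_size, off_t offset)
{
    assert(record_size > 0 && offset >= 0);
    return (size_t)offset / record_size + 1;
}
```

With 100-byte records, record 5 starts at offset 400, and byte 399 still belongs to record 4. No such closed formula exists for variable-length records, which is why locating a record is harder there.
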
File Data: Access Methods
• Sequential Access:
  • Data read in order, starting from the beginning
  • Cannot skip but can be rewound
• Random Access:
  • Data can be read in any order
  • Can be provided in two ways:
    1. Read( Offset ): Every read operation explicitly states the position to be accessed
    2. Seek( Offset ): A special operation is provided to move to a new location in the file
  • E.g. Unix and Windows use (2)

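Both styles of random access can be tried on a POSIX system: pread() takes the position as an argument (style 1), while lseek() followed by read() is style 2. A minimal sketch follows; the function names, including the helper open_temp_with, are hypothetical and exist only for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Style 1, Read(Offset): the position is an argument of the read itself.
 * pread() does not move the file's current offset. */
ssize_t read_at_explicit(int fd, void *buf, size_t n, off_t pos)
{
    assert(buf != NULL);
    return pread(fd, buf, n, pos);
}

/* Style 2, Seek(Offset): reposition first, then do an ordinary read().
 * The current offset ends up just past the bytes that were read. */
ssize_t read_at_seek(int fd, void *buf, size_t n, off_t pos)
{
    assert(buf != NULL);
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, n);
}

/* Hypothetical helper for experiments: creates an anonymous temporary file
 * holding text and returns a read/write descriptor, or -1 on error. */
int open_temp_with(const char *text)
{
    char path[] = "/tmp/fs_demo_XXXXXX";
    size_t len = strlen(text);
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                      /* the file lives on until fd is closed */
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The visible difference is the side effect on the current offset: style 1 leaves it untouched, style 2 changes it for every later read() or write() on the same descriptor.
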
File Data: Access Methods (cont.)
• Direct Access:
  • Used for files that contain fixed-length records
  • Allows random access to any record directly
  • Very useful where there is a large number of records
    • e.g. in a database
  • The basic random access method can be viewed as a special case:
    • Where each record == one byte

File Data: Generic Operations

Create:         New file is created with no data
Open:           Performed before further operations;
                prepares the necessary information for file operations later
Read:           Read data from file, usually starting from current position
Write:          Write data to file, usually starting from current position
Repositioning:  Also known as seek;
                moves the current position to a new location.
                No actual Read/Write is performed
Truncate:       Removes data from a specified position to the end of file

File Operations as System Calls
• OS provides file operations as system calls:
  • Provide protection, concurrent and efficient access
  • Maintain information
• Information kept for an opened file:
  • File Pointer: Current location in file
  • Disk Location: Actual file location on disk
  • Open Count: How many times has this file been opened?
    • Useful to determine when to remove the entry in table

File Information in the OS
• Consider:
  • Several processes can open the same file
  • Several different files can be opened at any time
  • What is a good way to organize the open-file information?
• Common approach: 2 tables
  • System-wide open-file table:
    • To keep track of the open files in the system
  • Per-process open-file table:
    • To keep track of the open files for a process
    • Each entry points to the system-wide table entries

File Operations: Unix Illustration
[Figure: A process makes file system calls, usually with a file descriptor fd. The fd indexes the per-process File Descriptor Table in the PCB (Proc A, Proc B). Each entry points to an entry of the system-wide Open File Table (Op.Type, e.g. Read or Write; file offset, e.g. 1234 or 5678; pointer to "File Data"), which in turn points to the "actual file" (i-node table), e.g. File1.abc, File2.def.]

Tables in the Kernel Space: Unix Illustration

Process Sharing File in Unix: Case 1
• A file is opened twice:
  • 2 file descriptors
  • 2 entries in the system-wide open file table
• I/O can occur at independent offsets
• When:
  • Two processes open the same file
  • The same process opens the file twice
[Figure: fd1 in Proc A's PCB and fd2 in Proc B's PCB point to two separate open file table entries (file offsets 5000 and 2000), both referring to the same shared file, File.abc.]

Process Sharing File in Unix: Case 2
• Two file descriptors pointing to the same entry in the system-wide open file table
• Only one offset → I/O changes the offset for the other process
• When:
  • fork() after file is opened
  • dup() within the same process
[Figure: fd1 in the parent's PCB and fd1 in the child's PCB point to the same open file table entry (file offset 3000), which refers to the shared file, File.abc.]

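The two sharing cases can be observed from within a single process. In this sketch, opening the file twice stands in for Case 1 and dup() for Case 2 (fork() behaves like dup() in this respect). All function names are hypothetical, and make_demo_file is just a helper that creates an 8-byte file for the experiment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Current offset of fd, without moving it. */
static off_t current_offset(int fd) { return lseek(fd, 0, SEEK_CUR); }

/* Case 1: open() twice, read 3 bytes via the first descriptor and report
 * the offset seen through the second one. Returns -1 on error. */
off_t offset_seen_by_second_open(const char *path)
{
    char buf[3];
    off_t seen = (off_t)-1;
    assert(path != NULL);
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);          /* separate open file table entry */
    if (fd1 != -1 && fd2 != -1 && read(fd1, buf, sizeof buf) == 3)
        seen = current_offset(fd2);
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return seen;
}

/* Case 2: same experiment, but the second descriptor comes from dup(). */
off_t offset_seen_by_dup(const char *path)
{
    char buf[3];
    off_t seen = (off_t)-1;
    assert(path != NULL);
    int fd1 = open(path, O_RDONLY);
    int fd2 = dup(fd1);                      /* shares fd1's table entry */
    if (fd1 != -1 && fd2 != -1 && read(fd1, buf, sizeof buf) == 3)
        seen = current_offset(fd2);
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return seen;
}

/* Hypothetical helper: create a file containing "abcdefgh"; path must hold
 * a mkstemp() template and receives the generated name. */
int make_demo_file(char *path)
{
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    ssize_t w = write(fd, "abcdefgh", 8);
    close(fd);
    return (w == 8) ? 0 : -1;
}
```

In Case 1 the second descriptor still reports offset 0 after the read; in Case 2 it reports 3, because both descriptors share one offset.
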
Just your regular folders

DIRECTORY

Directory: Basics
• Directory (folder) is used to:
  1. Provide a logical grouping of files
    • The user view of directory
  2. Keep track of files
    • The actual system usage of directory
• Several ways to structure directory:
  • Single-Level
  • Tree-Structure
  • Directed Acyclic Graph (DAG)
  • General Graph

Directory Structure: Single-Level
[Figure: a single directory, usually known as the root directory, containing file 1, file 2, file 3 and file 4.]

Directory Structure: Tree-Structured
[Figure: a directory tree containing the directories dir1, dir2 and dir3 and the files file 1, file 2 and file 3.]

Directory Structure: Tree-Structured
• General Idea:
  • Directories can be recursively embedded in other directories
  • Naturally forms a tree structure
• Two ways to refer to a file:
  • Absolute Pathname:
    • Directory names followed from root of tree + final file
    • i.e. the path from root directory to the file
  • Relative Pathname:
    • Directory names followed from the current working directory (CWD)
    • CWD can be set explicitly or implicitly changed by moving into a new directory under shell prompt

Directory Structure: DAG
[Figure: a directory tree rooted at / with dir2, dir3 and file3, plus an "alias" link to file3. If this link can be added, then Tree → DAG.]

Directory Structure: DAG
• If a file can be shared:
  • Only one copy of actual content
  • "Appears" in multiple directories
    • With different path names
  • Then tree structure → DAG
• Two implementations in Unix:
  • Hard Link
    • Limited to file only
  • Symbolic Link
    • Can be file or directory
    • This has an "interesting" effect...

DAG: Unix Hard Link
• Consider:
  • Directory A is the owner of file F
  • Directory B wants to share F
• Hard Link:
  • A and B have separate pointers that point to the actual file F on disk
• Pros:
  • Low overhead, only pointers are added in directory
• Cons:
  • Deletion problems:
    • e.g. If B deletes F? If A deletes F?
• Unix Command: "ln"

DAG: Unix Symbolic Link
• Symbolic Link:
  • B creates a special link file, G
  • G contains the path name of F
• When G is accessed:
  • Find out where F is, then access F
• Pros:
  • Simple deletion:
    • If B deletes: G deleted, not F
    • If A deletes: F is gone, G remains (but not working)
• Cons:
  • Larger overhead:
    • Special link file takes up actual disk space
• Unix Command: "ln -s"

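The deletion behaviour of both link types can be checked with the link(), symlink() and unlink() system calls, which are what "ln", "ln -s" and "rm" use underneath. The sketch below is only an experiment under stated assumptions: it works in a fresh temporary directory under /tmp, and the names link_count, link_demo, F, F_hard and G are chosen for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of names (hard links) of the file behind path, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    assert(path != NULL);
    return (stat(path, &st) == 0) ? (long)st.st_nlink : -1;
}

/* Creates F, a hard link F_hard and a symbolic link G, then deletes F.
 * Returns 0 if the outcome matches the slides, -1 otherwise. */
int link_demo(void)
{
    char dir[] = "/tmp/fs_links_XXXXXX";
    char f[64], hard[64], soft[64];
    struct stat st;
    int ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(f, sizeof f, "%s/F", dir);
    snprintf(hard, sizeof hard, "%s/F_hard", dir);
    snprintf(soft, sizeof soft, "%s/G", dir);

    int fd = open(f, O_WRONLY | O_CREAT, 0600);
    if (fd == -1) goto cleanup;
    close(fd);

    if (link(f, hard) == -1) goto cleanup;       /* like "ln F F_hard" */
    if (symlink(f, soft) == -1) goto cleanup;    /* like "ln -s F G"   */
    if (link_count(f) != 2) goto cleanup;        /* two names, one file */
    if (lstat(soft, &st) == -1 || !S_ISLNK(st.st_mode)) goto cleanup;

    if (unlink(f) == -1) goto cleanup;           /* the owner deletes F */
    if (link_count(hard) != 1) goto cleanup;     /* data still reachable */
    if (access(soft, F_OK) == 0) goto cleanup;   /* G is now dangling */
    ok = 0;

cleanup:
    unlink(f);
    unlink(hard);
    unlink(soft);
    rmdir(dir);
    return ok;
}
```

The link count is what lets the file system decide when the data of a hard-linked file can really be freed: only when the last name is removed.
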
Directory Structure: General Graph
[Figure: a directory tree with dir2 and dir3, plus a link between them that closes a cycle. If this link can be added, then Tree → General Graph.]

Directory Structure: General Graph
• General Graph Directory Structure is not desirable:
  • Hard to traverse
    • Need to prevent infinite looping
  • Hard to determine when to remove a file/directory
• In Unix:
  • Symbolic link is allowed to link to directory
    • General Graph can be created

I'm afraid you have to wait…..

I/O SCHEDULING

Magnetic Disk in One Glance
[Figure: a magnetic disk with its tracks, sectors and disk head. Rotation changes the sector under the head; a seek changes the track.]

Disk Scheduling: The Problem
• Due to the significant seek and rotational latency, OS should schedule the disk I/O requests
• I/O (disk) scheduling:
  • Intention of reducing overall waiting time
    • As rotational latency is hard to mitigate, we focus on reducing the seek time
  • Balance the need for high throughput while trying to fairly share I/O requests amongst processes

Disk Scheduling: Algorithms
• Consider the following disk I/O requests indicated by only the track number (magnetic disks):
  • 13, 14, 2, 18, 17, 21, 15
• A few obvious candidates:
  • FCFS
  • SSF (Shortest Seek First)
    • "SJF" modified for the disk context
  • The SCAN family (aka Elevator):
    • Bi-Direction [Innermost ↔ Outermost] (SCAN)
    • 1-Direction [Outermost → Innermost] (C-SCAN)
    • Very intuitive: Imagine the tracks are floors in a building, and the disk head is the elevator servicing the floors (figure out the algorithm before lecture)

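To compare the candidates on the request list above, the service orders can be computed with a few small functions. This is a sketch under assumptions that are not on the slides: the head is taken to start at track 10, and the elevator sweep is taken to move towards higher track numbers first. scan_order only yields the order of service; whether the head also travels on to the edge of the disk before reversing does not change that order.

```c
#include <assert.h>
#include <stdlib.h>

/* Total head movement (in tracks) when requests are served in this order. */
int total_seek(const int *order, int n, int head)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(order[i] - head);
        head = order[i];
    }
    return total;
}

/* Shortest Seek First: always serve the pending request closest to the head.
 * Writes the service order into out; ties go to the earlier request. */
void ssf_order(const int *req, int n, int head, int *out)
{
    assert(n > 0);
    int *done = calloc((size_t)n, sizeof *done);
    assert(done != NULL);
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            if (best == -1 || abs(req[i] - head) < abs(req[best] - head))
                best = i;
        }
        done[best] = 1;
        out[k] = req[best];
        head = req[best];
    }
    free(done);
}

static int ascending(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Elevator order: sweep towards higher tracks first, then reverse. */
void scan_order(const int *req, int n, int head, int *out)
{
    assert(n > 0);
    int *sorted = malloc((size_t)n * sizeof *sorted);
    assert(sorted != NULL);
    for (int i = 0; i < n; i++)
        sorted[i] = req[i];
    qsort(sorted, (size_t)n, sizeof *sorted, ascending);

    int k = 0;
    for (int i = 0; i < n; i++)            /* upward sweep */
        if (sorted[i] >= head)
            out[k++] = sorted[i];
    for (int i = n - 1; i >= 0; i--)       /* downward sweep */
        if (sorted[i] < head)
            out[k++] = sorted[i];
    free(sorted);
}
```

With the assumed start at track 10, serving the list in arrival order (FCFS) moves the head 43 tracks, while the SSF order 13, 14, 15, 17, 18, 21, 2 needs 30.
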
SCAN: Disk Head Movement
• Disk I/O requests indicated by only the track number:
  [13, 14, 2, 10, 17, 31, 21, 7]
[Figure: plot of the disk head position (y-axis: Track, 0 to 35) against Simulated Time (x-axis, 0 to 16) under SCAN.]

I/O Scheduling: Newer Algorithms
• Deadline - 3 queues for I/O requests:
  • Sorted
  • Read FIFO - read requests stored chronologically
  • Write FIFO - write requests stored chronologically
• noop (No-operation) - no sorting
• cfq (Completely Fair Queueing) - time slice and per-process sorted queues
• bfq (Budget Fair Queueing) (Multiqueue) - fair sharing based on the number of sectors requested

Summary
• Covered basics of file system from a user point of view
• Understand the basic requirements of a FS
• Understand the components of a FS:
  • File and Directory
• Discussed OS responsibility in I/O scheduling

For your reference only

UNIX FILE OPERATIONS

File Operations Example: Unix System Calls
• Header Files:
  • #include <sys/types.h>
  • #include <sys/stat.h>
  • #include <fcntl.h>
  • #include <unistd.h> (declares read(), write(), lseek(), close())
• File-related Unix System Calls
  • open(), read(), write(), lseek(), close()
• General Information:
  • Opened file has an identifier
    • File Descriptor: Integer
    • Used for other operations
  • File is accessed on a byte-by-byte basis
    • No interpretation of data

Opening Files: open()
• Function Call:
    int open( const char *path, int flags )
    (a third argument, mode_t mode, is needed when O_CREAT is used)
• Return:
  • -1: Failed to open file
  • >=0: file descriptor, a unique index for the opened file within the process
• Parameters:
  • path: File path
  • flags: Many options can be set using bit-wise OR
    • Read, Write or Read+Write mode
    • Truncation, Append mode
    • Create file if it does not exist
    • ... and many more

Opening Files: open() (cont.)
• Example:
    int fd; //file descriptor

    //Open an existing file for read only
    fd = open( "data.txt", O_RDONLY );

    //Create the file if not found, open for read + write
    //(O_CREAT needs a third argument: the permission bits of the new file)
    fd = open( "data.txt", O_RDWR | O_CREAT, 0644 );

• By convention:
  • Default file descriptors:
    • STDIN (0), STDOUT (1), STDERR (2)

Read Operation: read()
• Function Call:
    ssize_t read( int fd, void *buf, size_t n )
• Purpose:
  • Reads up to n bytes from current offset into buffer buf
• Return:
  • -1: Error
  • 0: End of file is reached
  • 1...n: Number of bytes read (can be fewer than n, e.g. close to the end of file)
• Parameters:
  • fd: file descriptor (must be opened for read)
  • buf: An array large enough to store n bytes
• read() is sequential read:
  • Starts at current offset and increments offset by bytes read

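Because read() may return fewer bytes than requested, callers that need exactly n bytes usually loop. A minimal sketch of such a loop is shown below; the name read_fully is hypothetical, and retrying on EINTR is a common convention rather than something stated on the slides.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until n bytes have arrived, end of file is reached,
 * or an error occurs. Returns the number of bytes stored in buf, or -1. */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t total = 0;

    assert(buf != NULL || n == 0);
    while (total < n) {
        ssize_t got = read(fd, p + total, n - total);
        if (got == 0)
            break;                  /* end of file */
        if (got == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)got;       /* short read: ask for the rest */
    }
    return (ssize_t)total;
}
```

A return value smaller than n then really does mean that the end of the file was reached, and a second call at end of file returns 0.
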
Write Operation: write()
• Function Call:
    ssize_t write( int fd, const void *buf, size_t n )
• Purpose:
  • Writes up to n bytes from buffer buf to the file, starting at the current offset
• Return:
  • -1: Error
  • >= 0: Number of bytes written
• Parameters:
  • fd: file descriptor (must be opened for write)
  • buf: An array of at least n bytes with values to be written
• Possible errors:
  • Exceeds file size limit, quota, disk space, etc.
• write() is sequential write:
  • Starts at current offset and increments offset by bytes written
  • Can increase file size beyond EOF → append new data

Repositioning: lseek()
• Function Call:
    off_t lseek( int fd, off_t offset, int whence )
• Purpose:
  • Move current position in file by offset
• Return:
  • -1: Error
  • >= 0: Current offset in file
• Parameters:
  • fd: file descriptor (must be opened)
  • offset: positive = move forward, negative = move backward
  • whence: Point of reference for interpreting the offset
    • SEEK_SET: absolute offset (count from the file start)
    • SEEK_CUR: relative offset from current position (+/-)
    • SEEK_END: relative offset from end of file (+/-)
• Can seek anywhere in file, even beyond end of existing data

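Two common uses of lseek() are finding the file size (seek to the end) and writing beyond the end of the existing data. The sketch below tries both; the function names are hypothetical, and open_scratch_file is only a helper that provides an empty temporary file to experiment on.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of an open file, found by seeking to its end. The original offset
 * is restored before returning. Returns -1 on error. */
off_t size_by_seek(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);
    if (here == (off_t)-1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return -1;
    return end;
}

/* Write one byte placed gap bytes beyond the current end of file.
 * The skipped range becomes a "hole" that reads back as zero bytes. */
int write_past_end(int fd, off_t gap, char byte)
{
    assert(gap >= 0);
    if (lseek(fd, gap, SEEK_END) == (off_t)-1)
        return -1;
    return (write(fd, &byte, 1) == 1) ? 0 : -1;
}

/* Hypothetical helper: an empty, anonymous temporary file opened read/write. */
int open_scratch_file(void)
{
    char path[] = "/tmp/fs_seek_XXXXXX";
    int fd = mkstemp(path);
    if (fd != -1)
        unlink(path);               /* the file lives on until fd is closed */
    return fd;
}
```

Seeking past the end does not by itself change the file size; the file only grows once something is written at the new position.
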
Closing Files: close()
• Function Call:
    int close( int fd )
• Return:
  • -1: Error
  • 0: Successful
• Parameters:
  • fd: file descriptor (must be opened)
• With close():
  • fd is no longer valid
  • Kernel can remove associated data structures
  • The identifier fd can be reused later
• By default:
  • Process termination automatically closes all open files
