Linux Virtual File System: Peter J. Braam

This document provides an overview of the Linux virtual file system (VFS). It describes the data structures used in Linux VFS like inodes, dentries, superblocks. It explains the flow of control between the VFS and individual file systems and the methods each provides. Examples of different Linux file systems are given to illustrate VFS concepts. Key operations on inodes, superblocks and dentries are outlined.

Copyright
© Attribution Non-Commercial (BY-NC)
Linux Virtual File System

Peter J. Braam

P.J.Braam/CMU -- 1
Aims
• Present the data structures in Linux VFS
• Provide information about flow of control
• Describe methods and invariants needed to
implement a new file system
• Illustrate with some examples

P.J.Braam/CMU -- 2
History

• BSD implemented VFS for NFS: aim dispatch to different filesystems
• VMS had elaborate filesystem
• NT/Win95 have VFS type interfaces
• Newer systems integrate VM with buffer cache.

P.J.Braam/CMU -- 3
Linux Filesystems
• Media based
– ext2 - Linux native
– ufs - BSD
– fat - DOS FS
– vfat - win 95
– hpfs - OS/2
– minix - well….
– isofs - CDROM
– sysv - Sysv Unix
– hfs - Macintosh
– affs - Amiga Fast FS
– NTFS - NT’s FS
– adfs - Acorn-strongarm
• Network
– nfs
– Coda
– AFS - Andrew FS
– smbfs - LanManager
– ncpfs - Novell
• Special ones
– procfs - /proc
– umsdos - Unix in DOS
– userfs - redirector to user

P.J.Braam/CMU -- 4
Linux Filesystems (ctd)
• Forthcoming:
– devfs - device file system
– DFS - DCE distributed FS
• Varia:
– cfs - crypt filesystem
– cfs - cache filesystem
– ftpfs - ftp filesystem
– mailfs - mail filesystem
– pgfs - Postgres versioning file system
• Linux serves (unrelated to the VFS!)
– NFS - user & kernel
– Coda
– AppleShare - netatalk/CAP
– SMB - samba
– NCP - Novell

P.J.Braam/CMU -- 5
Usefulness

Linux is Obsolete

Andrew Tanenbaum

P.J.Braam/CMU -- 6
Linux VFS
• Multiple interfaces build up VFS:
– files
– dentries
– inodes
– superblock
– quota
• VFS can do all caching & provides utility functions to FS
• FS provides methods to VFS; many are optional

P.J.Braam/CMU -- 7
User level file access
• Typical user level types and code:
– pathnames: "/myfile"
– file descriptors: fd = open("/myfile"…)
– attributes in struct stat: stat("/myfile", &mybuf), chmod, chown...
– offsets: write, read, lseek
– directory handles: DIR *dh = opendir("/mydir")
– directory entries: struct dirent *ent = readdir(dh)
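
The user level types listed above can be seen together in a short C sketch. The helper names `write_and_stat` and `count_entries` are made up for illustration and are not part of the slides; only standard POSIX calls are used.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` to `path`, reads it back through the
 * same descriptor, and returns the size reported by stat(), or -1 on error. */
long write_and_stat(const char *path, const char *text, char *out, size_t outlen)
{
    assert(path && text && out && outlen > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644); /* file descriptor */
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) { close(fd); return -1; }

    /* The offset now sits at the end of the file; move it back to the start. */
    if (lseek(fd, 0, SEEK_SET) < 0) { close(fd); return -1; }
    ssize_t n = read(fd, out, outlen - 1);
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';

    struct stat st;                          /* attributes */
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Hypothetical helper: counts entries in `dir` other than "." and "..". */
int count_entries(const char *dir)
{
    assert(dir);
    DIR *dh = opendir(dir);                  /* directory handle */
    if (dh == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;                      /* directory entry */
    while ((ent = readdir(dh)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            count++;
    }
    closedir(dh);
    return count;
}
```

Every call here ends up in the VFS, which is why the same program works unchanged on any mounted file system.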

P.J.Braam/CMU -- 8
VFS
• Manages kernel level file abstractions in one
format for all file systems
• Receives system call requests from user level (e.g.
write, open, stat, link)
• Interacts with a specific file system based on
mount point traversal
• Receives requests from other parts of the kernel,
mostly from memory management

P.J.Braam/CMU -- 9
File system level
• Individual File Systems
– responsible for managing file & directory data
– responsible for managing meta-data: timestamps, owners,
protection etc
– translates data between
• particular FS data: e.g. disk data, NFS data,
Coda/AFS data
• VFS data: attributes etc in standard format
– e.g. nfs_getattr(….) returns attributes in VFS format,
acquires attributes in NFS format to do so.
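
A minimal user-space sketch of this translation step follows. The two structs and the function `wire_to_vfs_attr` are hypothetical stand-ins, not the kernel's or NFS's real definitions; they only show attributes arriving in one layout and being handed on in a common one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FS-specific attributes, e.g. as received from a server. */
struct wire_attr {
    uint32_t mode;
    uint32_t uid, gid;
    uint32_t size_hi, size_lo;   /* 64-bit size split into two words */
    uint32_t mtime_sec;
};

/* Hypothetical common-format attributes understood by the generic layer. */
struct vfs_attr {
    unsigned int mode;
    unsigned int uid, gid;
    uint64_t size;
    long mtime;
};

/* Translate FS-specific attributes into the common format. */
void wire_to_vfs_attr(const struct wire_attr *w, struct vfs_attr *v)
{
    assert(w && v);
    v->mode  = w->mode;
    v->uid   = w->uid;
    v->gid   = w->gid;
    v->size  = ((uint64_t)w->size_hi << 32) | w->size_lo;
    v->mtime = (long)w->mtime_sec;
}
```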

P.J.Braam/CMU -- 10
Anatomy of stat system call
sys_stat(path, buf) {
    dentry = namei(path);                       /* establish VFS data */
    if ( dentry == NULL ) return -ENOENT;

    inode = dentry->d_inode;
    rc = inode->i_op->i_permission(inode);      /* call into inode layer of filesystem */
    if ( rc ) { dput(dentry); return -EPERM; }  /* release the dentry on every exit path */

    rc = inode->i_op->i_getattr(inode, buf);    /* call into inode layer of filesystem */
    dput(dentry);
    return rc;
}
P.J.Braam/CMU -- 11
Anatomy of fstatfs system call

sys_fstatfs(fd, buf) { /* for things like "df" */
    file = fget(fd);                            /* translate fd to VFS data structure */
    if ( file == NULL ) return -EBADF;

    superb = file->f_dentry->d_inode->i_sb;
    rc = superb->sb_op->sb_statfs(superb, buf); /* call into superblock layer of filesystem */
    fput(file);                                 /* drop the reference taken by fget */
    return rc;
}

P.J.Braam/CMU -- 12
Data structures
• VFS data structures for:
– VFS handle to the file: inode (BSD: vnode)
– User instantiated file handle: file (BSD: file)
– The whole filesystem: superblock (BSD: vfs)
– A name to inode translation: dentry

P.J.Braam/CMU -- 13
Shorthand method notation
• super block methods: sss_methodname
• inode methods: iii_methodname
• dentry methods: ddd_methodname
• file methods: fff_methodname

• instead of:
inode->i_op->lookup we write iii_lookup
P.J.Braam/CMU -- 14
namei
VFS side (calls into the FS are marked)
struct dentry *namei(parent, name) {
    if (dentry = d_lookup(parent, name))   /* FS: ddd_hash(parent, name) */
        ddd_revalidate(dentry)             /* FS */
    else
        iii_lookup(parent, name)           /* FS */
}

struct inode *iget(ino, dev) {
    /* try cache else .. */
    sss_read_inode(…)                      /* FS */
}
P.J.Braam/CMU -- 15
Superblocks
• Handle metadata only (attributes etc)
• Responsible for retrieving and storing
metadata from the FS media or peers
• The superblock struct holds things like:
– device, blocksize, dirty flags, list of dirty inodes
– super operations
– wait queue
– pointer to the root inode of this FS
P.J.Braam/CMU -- 16
Super Operations (sss_)
• Ops on Inodes:
– read_inode
– put_inode
– write_inode
– delete_inode
– clear_inode
– notify_change
• Superblock manips:
– read_super (mount)
– put_super (unmount)
– write_super (unmount)
– statfs (attributes)
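
A table of methods like this is usually a struct of function pointers, and a generic layer has to cope with slots the filesystem left empty. The sketch below shows that dispatch pattern in user space. All `demo_` and `ramdemo_` names are invented for illustration, and returning `-ENOSYS` for a missing method is an assumption, not something the slides specify.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_sb;                              /* hypothetical superblock stand-in */

struct demo_statfs { long blocks, bfree; };

/* Hypothetical method table; a filesystem leaves unused slots NULL. */
struct demo_super_ops {
    int  (*statfs)(struct demo_sb *sb, struct demo_statfs *buf);
    void (*put_super)(struct demo_sb *sb);
};

struct demo_sb {
    const struct demo_super_ops *s_op;
    long total_blocks, free_blocks;
};

/* Generic layer: dispatch through the table, tolerating missing methods. */
int demo_vfs_statfs(struct demo_sb *sb, struct demo_statfs *buf)
{
    assert(sb && buf);
    if (sb->s_op == NULL || sb->s_op->statfs == NULL)
        return -ENOSYS;                      /* assumed error for "not provided" */
    return sb->s_op->statfs(sb, buf);
}

/* One toy filesystem's implementation of the statfs method. */
static int ramdemo_statfs(struct demo_sb *sb, struct demo_statfs *buf)
{
    buf->blocks = sb->total_blocks;
    buf->bfree  = sb->free_blocks;
    return 0;
}

/* A table that provides statfs only, and one that provides nothing. */
const struct demo_super_ops ramdemo_ops = { ramdemo_statfs, NULL };
const struct demo_super_ops empty_ops   = { NULL, NULL };
```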

P.J.Braam/CMU -- 17
Inodes
• Inodes are the VFS abstraction for the file
• Inode has operations (iii_methods)
• VFS maintains an inode cache, NOT the
individual FS’s (compare NT, BSD etc)
• Inodes contain an FS specific area where:
– ext2 stores disk block numbers etc
– AFS would store the FID
• Extraordinary inode ops are good for dealing
with stale NFS file handles etc.
P.J.Braam/CMU -- 18
What’s inside an inode - 1
• caching:
– list_head i_hash
– list_head i_list
– list_head i_dentry
– int i_count
• Identifies file:
– long i_ino
– int i_dev
• Usual stuff:
– {m,a,c}time
– {u,g}id
– mode
– size
– n_link

P.J.Braam/CMU -- 19
What’s inside an inode -2
• Which FS:
– superblock i_sb
– inode_ops i_op
• For mmap, networking, waiting:
– wait objects, semaphore
– lock
– vm_area_struct
– pipe/socket info
– page information
• FS specific info (blockno’s, fids etc):
union {
    ext2fs_inode_info i_ext2
    nfs_inode_info i_nfs
    coda_inode_info i_coda
    ..} u
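
The idea of a generic inode that carries a per-filesystem area in a union can be sketched in plain C. Everything named `demo_` here is hypothetical, and the explicit `kind` tag is an assumption added so the example can tell the union members apart; in the slides, the filesystem is identified through i_sb and i_op.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_fs_kind { DEMO_FS_DISK, DEMO_FS_NET };

struct demo_disk_info { unsigned long first_block; };  /* block numbers etc */
struct demo_net_info  { char fid[16]; };               /* file identifier */

struct demo_inode {
    unsigned long i_ino;
    int i_count;
    enum demo_fs_kind kind;          /* assumed tag, see lead-in */
    union {                          /* FS specific area */
        struct demo_disk_info disk;
        struct demo_net_info  net;
    } u;
};

/* Mark the inode as belonging to the network FS and store its fid. */
void demo_set_fid(struct demo_inode *inode, const char *fid)
{
    assert(inode && fid);
    inode->kind = DEMO_FS_NET;
    strncpy(inode->u.net.fid, fid, sizeof inode->u.net.fid - 1);
    inode->u.net.fid[sizeof inode->u.net.fid - 1] = '\0';
}

/* Format a one-line description, reading only the active union member. */
int demo_describe(const struct demo_inode *inode, char *out, size_t outlen)
{
    assert(inode && out && outlen > 0);
    switch (inode->kind) {
    case DEMO_FS_DISK:
        return snprintf(out, outlen, "ino %lu: first block %lu",
                        inode->i_ino, inode->u.disk.first_block);
    case DEMO_FS_NET:
        return snprintf(out, outlen, "ino %lu: fid %s",
                        inode->i_ino, inode->u.net.fid);
    }
    return -1;
}
```

The union keeps every inode the same size regardless of filesystem, which is what lets one shared cache hold inodes for all of them.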
P.J.Braam/CMU -- 20
Inode state
• Inode can be on one or two lists:
– (hash & in_use) or (hash & dirty) or unused
– inode has a use count i_count
• Transitions
– unused -> hash: iget calls sss_read_inode
– dirty -> in_use: sss_write_inode
– hash -> unused: call on sss_clear_inode, but if i_nlink = 0: iput calls sss_delete_inode when i_count falls to 0
P.J.Braam/CMU -- 21
Inode Cache
Players:
1. iget: if i_count>0 ++
2. iput: if i_count>1 --
3. free_inodes
4. syncing inodes

[Diagram: Inode_hashtable with used inodes, dirty inodes and unused inodes, each connected to FS storage]
• FS storage to used inodes: sss_read_inode (iget)
• used inodes to dirty inodes: mark_inode_dirty
• dirty inodes to FS storage: sss_write_inode (sync one), media fs only
• unused inodes to FS storage: sss_clear_inode (freeing inos) or sss_delete_inode (iput)
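
The interplay of iget, iput and the use count can be imitated in a small user-space sketch. This is not the kernel's code: the fixed-size slot array, the `demo_` names, and the two counters are assumptions made so the behaviour is easy to observe. The read and delete hooks stand in for sss_read_inode and sss_delete_inode.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NINODES 8

struct demo_inode {
    unsigned long i_ino;
    int i_count;                 /* use count */
    int i_nlink;                 /* link count; 0 means the file was removed */
    int in_cache;                /* slot currently holds a valid inode */
};

struct demo_icache {
    struct demo_inode slots[DEMO_NINODES];
    int reads;                   /* how often the read hook ran */
    int deletes;                 /* how often the delete hook ran */
};

/* Stand-in for sss_read_inode: pretend storage reports one link. */
static void demo_read_inode(struct demo_icache *c, struct demo_inode *inode)
{
    inode->i_nlink = 1;
    c->reads++;
}

/* Stand-in for sss_delete_inode: drop the inode from the cache. */
static void demo_delete_inode(struct demo_icache *c, struct demo_inode *inode)
{
    inode->in_cache = 0;
    c->deletes++;
}

/* Return the cached inode with its use count raised, reading it on a miss. */
struct demo_inode *demo_iget(struct demo_icache *c, unsigned long ino)
{
    assert(c);
    struct demo_inode *free_slot = NULL;
    for (int i = 0; i < DEMO_NINODES; i++) {
        struct demo_inode *in = &c->slots[i];
        if (in->in_cache && in->i_ino == ino) {
            in->i_count++;
            return in;
        }
        if (!in->in_cache && free_slot == NULL)
            free_slot = in;
    }
    if (free_slot == NULL)
        return NULL;             /* full; a real cache would reclaim unused inodes */
    free_slot->i_ino = ino;
    free_slot->i_count = 1;
    free_slot->in_cache = 1;
    demo_read_inode(c, free_slot);
    return free_slot;
}

/* Drop one use; delete only when unused and unlinked, else keep it cached. */
void demo_iput(struct demo_icache *c, struct demo_inode *inode)
{
    assert(c && inode && inode->i_count > 0);
    if (--inode->i_count == 0 && inode->i_nlink == 0)
        demo_delete_inode(c, inode);
}
```

An inode whose use count drops to zero stays cached, so a later iget for the same number costs no read from storage.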
P.J.Braam/CMU -- 22
Sales
Red Hat Software sold 240,000 copies of Red Hat
Linux in 1997 and expects to reach 400,000 in
1998.

Estimates of installed servers (InfoWorld):


- Linux: 7 million
- OS/2: 5 million
- Macintosh: 1 million

P.J.Braam/CMU -- 23
Inode operations (iii_)
• lookup: return inode
– calls iget
• creation/removal
– create
– link
– unlink
– symlink
– mkdir
– rmdir
– mknod
– rename
• symbolic links
– readlink
– follow link
• pages
– readpage, writepage, updatepage - read or write page. Generic for mediafs.
– bmap - return disk block number of logical block
• special operations
– revalidate - see dentry sect
– truncate
– permission

P.J.Braam/CMU -- 24
Dentry world
• Dentry is a name to inode translation structure
• Cached aggressively by VFS
• Eliminates lookups by FS & private caches
– timing on Coda FS: ls -lR 1000 files after priming cache
• linux 2.0.32: 7.2secs
• linux 2.1.92: 0.6secs
– disk fs: less benefit, NFS even more
• Negative entries!
• Namei is dramatically simplified
P.J.Braam/CMU -- 25
Inside dentry’s
• name
• pointer to inode
• pointer to parent dentry
• list head of children
• chains for lots of lists
• use count

P.J.Braam/CMU -- 26
Dentry associated lists
[Diagram with two panels; legend: inode, dentry]
• dentry inode relationship
– inode i_dentry list head
– d_inode pointer
– d_alias chains
– place: d_instantiate
– remove: dentry_iput
• dentry tree relationship
– inode i_dentry list head
– d_parent pointer
– d_child chains
– place: d_alloc
– remove: d_prune, d_invalidate, d_put
P.J.Braam/CMU -- 27
Dcache
[Diagram: dentry_hashtable (d_hash chains), indexed by dhash(parent, name) list head, and unused dentries (d_lru chains); entries are added by namei, iii_lookup, d_add and removed by prune, d_invalidate, d_drop]
• namei tries cache: d_lookup
– ddd_compare
• Success: ddd_revalidate
– d_invalidate if fails
– proceed if success
• Failure: iii_lookup
– find inode
– iget
• sss_read_inode
– finish:
• d_add
– can give negative entry in dcache
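
The lookup path above, including negative entries, can be sketched as a small user-space cache. The sketch assumes a fixed pool, a toy hash function and a toy filesystem callback, all hypothetical (`demo_` names); it omits revalidation and the LRU list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 16
#define DEMO_MAXDENTRIES 32
#define DEMO_NAMELEN 32

struct demo_dentry {
    char name[DEMO_NAMELEN];
    unsigned long parent_ino;
    unsigned long ino;           /* 0 marks a negative entry: name does not exist */
    struct demo_dentry *next;    /* hash chain */
};

struct demo_dcache {
    struct demo_dentry *buckets[DEMO_BUCKETS];
    struct demo_dentry pool[DEMO_MAXDENTRIES];
    int used;
    int fs_lookups;              /* how often the FS had to be asked */
};

/* Hypothetical FS lookup callback: returns an inode number, or 0 if absent. */
typedef unsigned long (*demo_fs_lookup_t)(unsigned long parent_ino, const char *name);

/* Toy filesystem: only "myfile" exists, under parent inode 1. */
unsigned long demo_toy_fs_lookup(unsigned long parent_ino, const char *name)
{
    return (parent_ino == 1 && strcmp(name, "myfile") == 0) ? 42 : 0;
}

static unsigned demo_hash(unsigned long parent_ino, const char *name)
{
    unsigned h = (unsigned)parent_ino;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % DEMO_BUCKETS;
}

/* Try the cache first; on a miss ask the FS and cache the answer, even when
 * the answer is "no such name". Returns NULL if the pool is full or the
 * name is too long. */
struct demo_dentry *demo_namei(struct demo_dcache *dc, demo_fs_lookup_t fs_lookup,
                               unsigned long parent_ino, const char *name)
{
    assert(dc && fs_lookup && name);
    unsigned b = demo_hash(parent_ino, name);
    for (struct demo_dentry *d = dc->buckets[b]; d != NULL; d = d->next)
        if (d->parent_ino == parent_ino && strcmp(d->name, name) == 0)
            return d;                        /* cache hit, possibly negative */

    if (dc->used == DEMO_MAXDENTRIES || strlen(name) >= DEMO_NAMELEN)
        return NULL;
    dc->fs_lookups++;
    struct demo_dentry *d = &dc->pool[dc->used++];
    strcpy(d->name, name);
    d->parent_ino = parent_ino;
    d->ino = fs_lookup(parent_ino, name);    /* 0 yields a negative entry */
    d->next = dc->buckets[b];
    dc->buckets[b] = d;
    return d;
}
```

Caching the negative answer means repeated lookups of a missing name never reach the filesystem again, which matters most for network filesystems.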
P.J.Braam/CMU -- 28
Dentry methods
• ddd_revalidate: can force new lookup
• ddd_hash: compute hash value of name
• ddd_compare: are names equal?
• ddd_delete, ddd_put, ddd_iput: FS cleanup
opportunity

P.J.Braam/CMU -- 29
Dentry particulars:
• ddd_hash and ddd_compare have to deal with
extraordinary cases for msdos/vfat:
– case insensitive
– long and short filename pleasantries
• ddd_revalidate -- can force new lookup if
inode not in use:
– used for NFS/SMBfs aging
– used for Coda/AFS callbacks
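
For the case insensitive situation, the hash and compare methods must agree: names that compare equal have to hash to the same value. A minimal sketch, assuming ASCII-only folding and hypothetical function names (the real msdos/vfat rules, including long and short names, are more involved):

```c
#include <assert.h>
#include <ctype.h>

/* Hash a name after folding each character to lower case. */
unsigned long demo_ci_hash(const char *name)
{
    assert(name);
    unsigned long h = 0;
    for (; *name; name++)
        h = h * 31 + (unsigned long)tolower((unsigned char)*name);
    return h;
}

/* Return 0 when the names are equal ignoring case, nonzero otherwise. */
int demo_ci_compare(const char *a, const char *b)
{
    assert(a && b);
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 1;
        a++;
        b++;
    }
    return *a != *b;             /* equal only if both strings ended together */
}
```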

P.J.Braam/CMU -- 30
Style

Dijkstra probably hates me

Linus Torvalds

P.J.Braam/CMU -- 31
Memory mapping
• vm_area structure has
– vm_operations
– inode, addresses etc.
• vm_operations
– map, unmap
– swapin, swapout
– nopage -- read when page isn’t in VM
• mmap
– calls on iii_readpage
– keeps a use count on the inode until unmap
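
Seen from user level, the mapping outlives the file descriptor, which fits the use count held on the inode until unmap. The helper `first_byte_via_mmap` below is a hypothetical illustration using standard POSIX calls; it says nothing about how the kernel implements the page fault internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of `path` read through a
 * memory mapping, or -1 on error or for an empty file. */
int first_byte_via_mmap(const char *path)
{
    assert(path);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    int c = ((unsigned char *)p)[0];   /* first touch may fault the page in */
    munmap(p, (size_t)st.st_size);
    return c;
}
```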

P.J.Braam/CMU -- 32
