XFS - Extended Filesystem
Seminar by:
Bauskar Krunal Suresh
Pune Institute Of Computer Technology
Date: 12/08/2002
CERTIFICATE
Date :
Place:
ACKNOWLEDGEMENT
INDEX:
4.1 Basics
4.2 File Management
4.3 XFS as Journalled Filesystem
4.4 Volume Management
4.5 Guaranteed Rate I/O
4.6 Data Management Interface
4.7 Expanded Dump/Restore Capabilities
Chapter 5: ARCHITECTURE OF XFS
Bibliography
1 . INTRODUCTION TO FILESYSTEM
1.1 Overview
1.1.1 File
A file can be defined in two ways. Logically, a file is a collection of records, text, images, etc.;
physically, it is a collection of bytes (in memory) stored in a particular order and accessed in a
certain order to interpret the data as information.
1.1.2 Filesystem
A filesystem is the system used to control and manage files on a computer. Its input is a
control command from the user; its output is the result of executing that command.
demanded use of larger files to store data, with extremely fast crash recovery, support for
large filesystems, directories with large numbers of files, and fair performance with small
and large files. Many of these future applications would require still larger file structures,
which should be scalable and reliable.
New-generation file systems have been designed to overcome those problems, keeping
scalability in mind. Several new structures and techniques have been included in those
filesystems.
The growing demand for larger files made it essential to have scalable, reliable,
high-performance filesystems that can handle large files without degraded performance.
Before these filesystems evolved, some older filesystems supported somewhat larger files,
but at a high performance penalty.
expense of performance. The revolutionary growth of CPU and storage technology has
fueled a dramatic increase in the size and complexity of filesystems, outstripping the
capabilities of many filesystems.
Today, at least four major players exist in the Linux journaling filesystem arena. They are
in various stages of completion, with some of them becoming ready for use in production
systems. They are:
2 . JOURNALING FILESYSTEM
Journaling filesystems maintain a special file called a log (or journal), the contents of
which are not cached. Whenever the filesystem is updated, a record describing the
transaction is added to the log. An idle thread processes these transactions, writes data to
the filesystem, and flags each processed transaction as completed. If the machine crashes,
the background process is run on reboot and simply finishes copying updates from the
journal to the filesystem. Incomplete transactions in the journal file are discarded, so the
filesystem's internal consistency is guaranteed.
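The replay logic just described can be sketched in a few lines; the transaction format below is illustrative only, not any real journal's on-disk layout:

```python
# Sketch of journal-based crash recovery: replay committed transactions,
# discard incomplete ones. The record layout here is invented.

def recover(journal, disk):
    """Apply every committed transaction in the journal to `disk`;
    transactions without a commit marker are simply discarded."""
    for txn in journal:
        if txn.get("committed"):
            for block, data in txn["updates"]:
                disk[block] = data          # redo the logged update
        # uncommitted transactions are ignored, keeping the old contents
    return disk

disk = {0: "old-a", 1: "old-b", 2: "old-c"}
journal = [
    {"committed": True,  "updates": [(0, "new-a"), (1, "new-b")]},
    {"committed": False, "updates": [(2, "new-c")]},   # crash mid-transaction
]
recover(journal, disk)
print(disk)   # block 2 keeps its old, consistent contents
```

Because only complete transactions are replayed, the on-disk structures are either fully updated or untouched, never half-written.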
Waiting for a fsck to complete on a server system can tax your patience more than it
should. Fortunately, a new breed of filesystem is coming to your Linux machine soon.
This cuts the complexity of a filesystem check by a couple of orders of magnitude. A
full-blown consistency check is never necessary (in contrast to ext2fs and similar
filesystems) and restoring a filesystem after a reboot is a matter of seconds at most.
Buffer cache is a buffer allocated in the main memory intended to speed I/O operations.
This kind of buffer is commonly used in file systems — the disk cache — and databases
to increase overall performance. The problem appears if the system crashes before the
buffers have been written to disk: the system would then be left in an inconsistent state
after reboot. Think of a file deleted in the cache but still present on the hard disk.
That is why databases and file systems have the ability to recover the system back to a
consistent state. Although databases have recovered quickly for years, file systems, and
more precisely UFS-like ones, tend to see their recovery time grow as the file system
grows. The fsck recovery tool for ext2fs has to scan through the entire disk partition in
order to take the file system back to a consistent state. This time-consuming task often
creates a lack of availability for large servers with hundreds of gigabytes or sometimes
terabytes of storage. This is the main reason for file systems to inherit database
recovery technology, and thus the appearance of journal file systems.
These filesystems use techniques originally developed for databases to log information
about operations performed on the file system meta-data as atomic transactions. In the
event of a system failure, a file system is restored to a consistent state by replaying the
log and applying log records for the appropriate transactions. The recovery time
associated with this log-based approach is much faster since the replay utility need only
examine the log records produced by recent file system activity rather than examine all
file system meta-data. Several other aspects of log-based recovery are of interest. First,
JFS only logs operations on meta-data, so replaying the log only restores the consistency
of structural relationships and resource allocation states within the file system. It does not
log file data or recover this data to a consistent state. Consequently, some file data may be
lost or stale after recovery, and customers with a critical need for data consistency should
use synchronous I/O.
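What "synchronous I/O" means for an application can be shown with a minimal, portable sketch using `os.fsync`; the file path is made up for the example:

```python
# Forcing user data to stable storage explicitly, since a metadata-only
# journal does not guarantee that file *contents* survive a crash.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "critical.dat")
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(fd, b"must-not-be-lost")
os.fsync(fd)          # block until the data reaches the device
os.close(fd)
```

After `os.fsync` returns, the data is on disk even if the machine crashes immediately afterwards; ordinary buffered writes give no such promise.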
2.3.1 ReiserFS
ReiserFS is a radical departure from the traditional Unix filesystems, which are block-
structured. It will be available in the upcoming Red Hat 7.1 distribution and is already
available in SuSE Linux 7.0.
Hans Reiser writes about the filesystem he designed: "In my approach, I store both files
and filenames in a balanced tree, with small files, directory entries, inodes, and the tail
ends of large files all being more efficiently packed as a result of relaxing the
requirements of block alignment and eliminating the use of a fixed space allocation for
inodes." The effect is that a wide array of common operations, such a filename resolution
and file accesses, are optimized when compared to traditional filesystems such as ext2fs.
Furthermore, optimizations for small files are well developed, reducing storage overheads
due to fragmentation.
ReiserFS is not yet a true journaling filesystem (although full journaling support is
currently under development). Instead, buffering and preserve lists are used to track all
tree modifications, which achieves a very similar effect. This reduces the risk of
filesystem inconsistencies in the event of a crash and thus provides rapid recovery on
restart.
2.3.2 XFS
When SGI needed a high-performance, scalable filesystem to replace EFS in 1990, it
developed XFS. XFS represents a leap into the future of filesystem management,
providing a 64-bit journalled filesystem that can handle large files quickly and reliably.
Being a 64-bit filesystem, XFS theoretically allows the creation of files a few million
terabytes in size, which compares favorably to the limitations of 32-bit filesystems;
support for such large files made journalling a necessity. XFS keeps a separate
sub-volume, the log sub-volume, to store the log (journal) used during crash recovery.
XFS performs a binary search of the log for transactions to replay, which eliminates the
need to perform a slow total UNIX filesystem check (fsck) after a system crash. Old log
records are discarded once their data is no longer required.
2.3.3 JFS
IBM's JFS is a journaling filesystem used in its enterprise servers. It was designed for
"high-throughput server environments, key to running intranet and other high-
performance e-business file servers" according to IBM's Web site. Judging from the
documentation available and the source drops, it will still be a while before the Linux
port is completed and included in the standard kernel distribution.
JFS offers a sound design foundation and a proven track record on IBM servers. It uses
an interesting approach to organizing free blocks by structuring them in a tree and using a
special technique to collect and group contiguous runs of free logical blocks. Although
it uses extents for a file's block addressing, extents are therefore not used to maintain
the free space. Small directories are supported in an optimized fashion (i.e., stored
directly within an inode), although with different limitations than those of XFS.
However, small files cannot be stored directly within an inode.
2.3.4 Ext3 FS
Ext3 FS is an alternative for all those who do not want to switch their filesystem, but
require journaling capabilities. It is distributed in the form of a kernel patch and provides
full backward compatibility. It also allows the conversion of an ext2fs partition without
reformatting and a reverse conversion to ext2fs, if desired.
However, using such an add-on to ext2fs has the drawback that none of the advanced
optimization techniques employed in the other journaling filesystems is available: no
balanced trees, no extents for free space, etc.
With the increasing size of hard disks, journaling filesystems are becoming important to
an ever-increasing number of users. If you ever waited for a filesystem check on a
machine with an 80GB hard disk, you know what I'm talking about. Even if you do not
plan to reboot your system often, they can save you a lot of time and trouble if you
experience a power failure or a hardware glitch. With the large number of contenders
striving to become the de-facto standard in the journaling filesystem space on Linux, we
can look forward to interesting months as these filesystems' code bases mature, are
integrated into the standard kernel, and are supported in upcoming releases of the major
Linux distributions.
However, migrating to another filesystem is not a trivial task. It usually requires backing
up your data, reformatting, and restoring the data onto the newly created volume. You
should thoroughly evaluate your options before making the switch.
3 . GETTING STARTED WITH XFS
This chapter gives a brief overview of the XFS filesystem and lists its features. It also
discusses the need for and evolution of XFS, along with some of the goals it was designed to meet.
XFS is a true next generation filesystem, not simply a rewrite or port of existing
technology. Most filesystems on the market are based upon old filesystem architectures
such as System V or BSD. XFS was "built from scratch". This allowed SGI to integrate
into XFS key features such as journalling for high reliability and redesign important areas
such as the allocation algorithms for increased performance with large filesystems. Thus,
XFS is the first new filesystem built for the demanding large filesystem needs of the
1990's:
-----------------------------------------------------------
TIME | FILESYSTEM
-----------------------------------------------------------
Early 1970's | Version 7 filesystem
Early 1980's | Berkeley "fast" filesystem (FFS)
Mid 1980's | Early journalled filesystems
Mid 1990's | XFS
-----------------------------------------------------------
As indicated above, most filesystems on the market evolved in an earlier age of smaller
filesystems, lower processing power, and limited storage capacity. Many current
filesystems are based on architectures that emphasize conserving storage space at the
expense of performance. The revolutionary growth of CPU and storage technology has
fueled a dramatic increase in the size and complexity of filesystems, outstripping the
capabilities of many filesystems. SGI designed XFS to meet this increasing need to
manage large, complex filesystems with high performance and reliability.
and filesystems. NFS version 3 is an available option for IRIX 5.3 and higher. Systems
that support only NFS version 2 may access files on XFS filesystems subject to the
32-bit file size limit. XFS filesystems accessed over NFS version 2 may be greater than
2 gigabytes; all file accesses will operate correctly, but disk free space reports will be
incorrect. Additionally, the caching file system (CFS) uses NFS protocols, and allows
efficient remote access to XFS filesystems.
XFS is the first filesystem to integrate journal technology into a new filesystem rather
than add a section of journal code to an existing filesystem. This integration allows XFS
to be a more robust and faster filesystem. XFS uses a circular log, which usually employs
1000-2000 filesystem blocks. During normal system operation the log is written and
never read. Old information is dropped out of the log as its usefulness ends. These log
buffer writes are mostly performed asynchronously so as not to force user applications to
wait for them.
Approximately 1 transaction occurs per filesystem update operation. Batch transactions
enable XFS to make metadata updates faster than EFS. The atomic, multi-block updates
enabled by transactions allow XFS to cleanly update its complex metadata structures.
Figure 3.1 shows that the XFS directory structure is based on B+-trees, which allow XFS to
maintain good response times even as the number of files in a directory grows to tens or
hundreds of thousands of files. No other filesystem can match this metadata performance.
Moreover, the nodes within a B+-tree must contain a minimum number of pairs in order
to exist. Whenever a node's contents fall below that minimum, its pairs are shifted to
another existing node. This helps enable fast searches and rapid space allocation.
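The payoff of an indexed directory can be illustrated with a stand-in: a sorted list searched with `bisect` gives the same O(log n) lookup behaviour that the B+-tree provides, versus a linear scan of every entry. The file names and inode numbers below are invented:

```python
# A sorted index searched with bisect stands in here for the B+-tree:
# both give O(log n) name lookups, versus a linear scan of all entries.
import bisect

names = sorted(f"file{i:06d}" for i in range(100_000))
inodes = {n: 1000 + i for i, n in enumerate(names)}   # fake inode numbers

def lookup(name):
    """Return the inode number for `name`, or None if it is absent."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return inodes[name]
    return None

print(lookup("file054321"))   # found in ~17 comparisons, not 100,000
```

With 100,000 entries the binary search touches about 17 names; a linear directory scan would touch on average 50,000.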
3.2 Features Of XFS

Scalability to support disk growth for decades to come:
- Scalable file sizes: millions of terabytes
- Scalable filesystems: millions of terabytes
- Scalable algorithms for high performance even with huge files
- Large numbers of files
- Large files, including sparse files
- Large directories
- Advanced algorithms for fast performance on huge filesystems

High performance:
- Extremely fast transaction rates
- Extremely high bandwidth
- Extremely fast directory searches
- Extremely fast space allocation
- Exceptional performance: 500+ Mbytes/second

Advanced features:
- Video streaming support: Guaranteed Rate I/O (GRIO)
- Hierarchical storage management (HSM): offsite storage appears as online

Compatibility:
- Backup with popular commercial packages such as Legato NetWorker for IRIX
- Support for multiple HSMs, including SGI DMF and Veritas HSM, through the
  DMIG DMAPI interface
- NFS compatibility: with NFS version 3, 64-bit filesystems can be exported to
  other systems that support the NFS version 3 protocol; systems that use NFS
  version 2 may access XFS filesystems with the 32-bit limit imposed by the protocol
- Windows NT compatibility: SGI uses the open source Samba server to export XFS
  filesystems to Windows and Windows NT systems; Samba speaks the SMB (Server
  Message Block) and CIFS (Common Internet Filesystem) protocols
4 . STRUCTURE OF XFS
The last chapter gave a brief overview of what XFS is and its features; here
we discuss the structure of XFS in detail.
4.1 Basics
Technically, XFS is based on the use of B+ trees to replace the conventional linear file
system structure. B+ trees provide an efficient way to index directory entries and manage
file extents, free space, and filesystem metadata. This guarantees quick directory listing
and file accesses. The allocation of disk blocks to inodes is done dynamically, which
means that you no longer need to create a filesystem with smaller block sizes for your
mail server; your filesystem will handle this automatically for you. XFS is also a 64-bit
filesystem, which theoretically allows the creation of files that are a few million terabytes
in size, which compares favorably to the limitations of 32-bit filesystems. The ability to
attach free-form metadata tags to files on an XFS volume is yet another useful feature of
this filesystem.
XFS also contains good support for multiprocessor machines. This is visible in the
implementation of the page buffer subsystem, which uses an AVL tree which is kept
separate from the objects to avoid locking problems and cache thrashing on larger SMP
systems. Multithreaded operation has been a declared design goal of this filesystem and
has been well-tested in large multiprocessor IRIX systems worldwide.
namespace manager handles allocation of directory files, normally placing them close to
the files in the directory for increased seek performance.
The XFS space manager and namespace manager use sophisticated B-Tree indexing
technology to represent file location information contained inside directory files and to
represent the structure of the files themselves (location of information in a file). This
significantly increases the speed of accessing information in files, especially with large
files and filesystems. In the case of large filesystems, traditional filesystems linearly
search the file location information in directory files; this information often spans
multiple blocks and/or extents (a collection of blocks). XFS's B-Tree technology enables
it to go directly to the blocks and/or extents containing a file's location using
sophisticated indices. With large files, the B-Tree indices efficiently map the location of
the extents containing the file's data. This avoids the slower multi-level indirect schemes
(of other filesystems) which often require multiple block reads to find the desired
information.
4.2.2 Structure
The space manager divides each filesystem data sub-volume (or partition) into a number
of allocation groups at mkfs time. Each allocation group has a collection of inodes and
data blocks, and data structures to control their allocation. These allocation groups help to
divide the space management problem into easy-to-manage pieces, speeding up file
creation.
The administrator specifies the inode size at mkfs time (XFS defaults the size to 256
bytes). However, the location and number of inodes are allocated as needed by XFS,
unlike most filesystems, which require the static creation of the inodes at mkfs time. This
dynamic inode allocation permits greater performance (fewer seeks) since inodes can be
positioned closer to files and directories that use them.
Free blocks in an allocation group are kept track of using a pair of B-trees instead of a
bitmap. This provides better scalability to large systems. Block allocation and free block
operations are performed in parallel if done in different allocation groups.
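The idea of independent allocation groups can be sketched as follows; the sizes and the trivial free-set structure are illustrative only (real XFS keeps a pair of B-trees per group, not a Python set):

```python
# Sketch of allocation groups: the block space is split into independent
# regions, each with its own free-block structure, so allocations in
# different groups need no shared lock. All numbers are invented.
AG_SIZE = 1024
NUM_AGS = 4

# one free-block set per allocation group
free = [set(range(ag * AG_SIZE, (ag + 1) * AG_SIZE)) for ag in range(NUM_AGS)]

def alloc_block(near):
    """Allocate a free block from the allocation group containing `near`,
    keeping new blocks close to related data."""
    ag = near // AG_SIZE
    return free[ag].pop()

b = alloc_block(near=2100)          # lands in allocation group 2
print(2048 <= b < 3072)             # True
```

Because each group owns a disjoint block range, two allocations in different groups can proceed concurrently without coordinating.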
The real-time sub-volume is divided into a number of fixed-size extents (a collection of
blocks) to facilitate uniform read/writes during guaranteed-rate I/O operations. The size is
chosen at mkfs time and is expected to be large (on the order of 1MByte). The real-time
subvolume allocation uses a bitmap to assure predictable performance; the access time
for B-tree indices varies.
The filesystem block size is set at filesystem creation time using mkfs. XFS supports
multiple block sizes, ranging from 512 bytes (disk sector size) up to 64KB and up to 1GB
for real-time data. The filesystem block size is the minimum unit of allocation for user
data in the filesystem. As a file is written by an application, space is reserved but blocks
are not allocated. This ensures that space is available, but gives XFS more flexibility in
allocating blocks.
XFS delays allocation of user data blocks when possible, holding them in the buffer
cache to make the blocks more contiguous. This allows XFS to make extents large
without requiring the user to specify extent size, and without requiring a filesystem
reorganizer to fix the extent sizes after the fact. This also reduces the number of writes to
disk and extents used for a file.
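A toy model of delayed allocation: writes accumulate in a cache, and one contiguous run of blocks is chosen only at flush time. The allocator below is deliberately simplistic and invented for illustration:

```python
# Delayed allocation sketch: buffer writes, then pick a single contiguous
# extent for all of them at flush time. Numbers are illustrative only.
cache = []           # (file offset, data) pairs not yet on disk
next_free = 5000     # next free disk block in a trivial bump allocator

def buffered_write(offset, data):
    cache.append((offset, data))

def flush():
    """Allocate one contiguous run of blocks for everything in the cache."""
    global next_free
    start = next_free
    extent = [(off, start + i) for i, (off, _) in enumerate(sorted(cache))]
    next_free += len(cache)
    cache.clear()
    return extent    # [(file offset, disk block), ...] -- one extent

buffered_write(0, b"a"); buffered_write(1, b"b"); buffered_write(2, b"c")
print(flush())   # three blocks, contiguous: 5000, 5001, 5002
```

Had each write been allocated immediately, the three blocks could have been scattered; deferring the decision lets them land in one extent.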
4.2.4 Efficient Filesystem
As implied earlier, XFS uses sophisticated filesystem management techniques such as
extents and B-Tree indices to efficiently support:
Sparse files
Sparse files are files that contain arbitrary "holes," areas of the file which have never
been written and which read back as zeroes. XFS supports holes (avoids wasted space) by
using indexing (B-Trees) and extents.
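A sparse file can be created from any program simply by seeking past the end of the file before writing; on a filesystem with hole support such as XFS, the hole consumes no data blocks. The file name below is arbitrary:

```python
# Creating a sparse file: seek far past the end, then write. The "hole"
# in between reads back as zeroes without being stored on disk.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sparse.dat")
with open(path, "wb") as f:
    f.seek(1024 * 1024)        # leave a 1 MB hole
    f.write(b"tail")

with open(path, "rb") as f:
    hole = f.read(16)          # reads from inside the hole

# logical size is 1 MB + 4 bytes, and the hole reads back as zeroes
print(os.path.getsize(path), hole)
```

The file's logical size is just over a megabyte, yet a hole-aware filesystem stores only the four trailing bytes (plus metadata).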
Mapped files
Memory mapping of files is supported by XFS. It allows a program to "attach" a file such
that the file appears to be part of the program, and lets XFS worry about managing the
disk I/Os.
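A minimal memory-mapping example using Python's standard `mmap` module; the kernel, and ultimately the filesystem, services the page-ins and write-backs:

```python
# Memory-mapping a file: the file's bytes appear directly in the
# program's address space, and writes flow back to the file.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mapped.dat")
with open(path, "wb") as f:
    f.write(b"hello, mapped world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # map the whole file
        print(m[:5])                      # read like a byte array: b'hello'
        m[0:5] = b"HELLO"                 # in-place write goes to the file

with open(path, "rb") as f:
    print(f.read())                       # b'HELLO, mapped world'
```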
Large directories
Large directories are indexed in directory files using B-Trees to expedite searches,
insertions, and deletions. Operations on directories containing millions of files are almost
as fast as on directories containing only hundreds of files.
successfully tested an early "alpha" version of XFS on a large 377-gigabyte filesystem.
Using 24 SCSI channels and 24 Ciprico RAIDs, Tom measured direct I/O throughput of:
186 megabytes/second sustained read or write speed, 1 process
330 megabytes/second sustained read speed, 4 processes
Furthermore, XFS is designed to scale in performance to match the new CHALLENGE
MP architecture and beyond. In traditional filesystems, files, directories, and filesystems
have reduced performance as they grow in size; with XFS there is no performance
penalty.
the kernel at filesystem mount time. XFS performs a binary search of the log for
transactions to replay. This eliminates the need to perform a slow total UNIX filesystem
check (fsck) after a system crash. Also, when fsck finds inconsistent data structures it
must throw away anything suspicious. XFS knows what was happening at the time of
failure, so it never needs to throw anything away; it simply finishes what it started. Thus,
XFS's journalled recovery provides higher filesystem integrity than does standard UNIX.
To ensure data consistency, some log writes must be synchronous. This does not include
ordinary (buffered) writes. Direct I/O, synchronous I/O, or fsync() allow synchronous
user data writes.
and character devices in the /dev directory. Filesystems, databases, and other applications
access the volumes rather than the partitions. Each volume can be used as a single
filesystem or as a raw partition. A logical volume might include partitions from several
physical disks and, thus, be larger than any of the physical disks. Filesystems built on
these volumes can be created, mounted, and used in the normal way.
The volume manager stores all configuration data in the disk's labels. These labels are
stored on each disk and will be replicated so that a logical volume can be assembled even
if some pieces are missing. There is a negligible performance penalty for using XLV
when compared to accessing the disk directly; although plexing (mirroring data) will
mildly degrade write performance.
4.4.2 Sub-volumes
Within each logical volume, the volume manager implements sub-volumes, which are
separate linear address spaces of disk blocks in which the filesystem stores its data. For
EFS filesystems, a volume consists of just one sub-volume. For XFS filesystems, a
volume consists of a data sub-volume, an optional log sub-volume, and an optional real-
time sub-volume:
Data sub-volume
The data sub-volume contains user files and filesystem metadata (inodes, directories, and
free space blocks). It is required in all logical volumes containing XFS filesystems and is
the only sub-volume present in the EFS filesystems.
Log sub-volume
The log sub-volume contains XFS journalling information. It is a log of filesystem
transactions and is used to expedite system recovery after a crash.
Real-time sub-volume
Real-time sub-volumes are generally used for data applications such as video where
guaranteed response time is paramount.
Sub-volumes facilitate separation of different data types. For example, user data
could be prevented from overwriting filesystem log data. Sub-volumes also enable
filesystem data and user data to be configured to meet goals for performance and
reliability by putting sub-volumes on different disk drives - particularly useful for
separating out real-time data for guaranteed rate I/O operations. Each sub-volume can
also be optimally sized and organized independently. For example, the log sub-volume
can be plexed (mirrored) for fault tolerance and the real-time sub-volume can be striped
across a large number of disks to give maximum throughput for video playback.
Each sub-volume is made of partitions (real, physical regions of disk blocks) composed
by concatenation, plexing (mirroring), and striping. The volume manager is responsible
for translating logical addresses in the linear address spaces into real disk addresses from
the partitions. Where there are multiple copies of a logical block (plexing), the volume
manager writes simultaneously to all copies, and reads from any copy (since all copies
are identical). The volume manager maintains the equality of plexes across crashes and
both temporary and permanent disk failures. By performing retries and rewrites, the
volume manager masks single-block failures in the plexed volumes.
4.4.3 Plex and Volume Elements
A volume used for filesystem operations is usually composed of at least two sub-
volumes, one for the log and one for data. Each sub-volume can consist of a number of
plexes (mirrored data). Plexes are individually organized, but are mapped to the same
portion of the sub-volume's address space. Plexes may be added or detached while the
volume is active. The root filesystem may be plexed.
A plex consists of 1 to 128 volume elements, each of which maps a portion of the plex’s
address space (physical data location on disk), and are concatenated together. Each
volume element can be striped across a number of disk partitions. XFS allows online
growth of a filesystem using xfs_growfs.
Auto-assembly of logical volumes
The volume manager will assemble logical volumes by scanning the hardware on the
system and reading all the disk labels at system boot time.
XLV supports RAID devices. Thus, XLV is capable of handling large data transfers
generated from the filesystem code, through the volume manager, down to the RAID
driver.
configuration. They guarantee to deliver the requested performance, but with some
possibility of error in the data (due to the need to turn off disk drive self-diagnostics and
error-correction firmware). Hard guarantees are only possible if all the drives are on one
SCSI bus and XFS knows and "trusts" all the devices on that bus (such as using all disk
drives instead of unpredictable tape drives). Otherwise, XFS will allow a soft guarantee,
which allows the disk drive to retry operations in the event of an error, but this can
possibly result in missing the rate guarantee.
Data Management Application Programming Interface (DMAPI). Silicon Graphics is committed to the
group's goal of simplifying and standardizing an HSM interface to filesystems. The
DMIG also includes other companies such as IBM, Sun, Epoch, EMASS, Hitachi,
Veritas, etc.
The DMIG has produced a draft of the standard (Version 2.1 dated March 1995) whose
changes will be tracked as the DMAPI is implemented and used in the marketplace. XFS
has already implemented much of the DMAPI interface while most other major vendors
have lagged on their implementation. Two companies, EMASS and Hitachi, have
released HSM products that use the DMAPI. Many other HSM vendors have shown
interest in the DMAPI.
DMAPI to rapidly read the filesystem without understanding details of XFS's internal
structure, resulting in a far less complex program.
4.7.2 Features
Unlike traditional filesystems, which must be inactivated and then dismounted to
guarantee a consistent dump image, you can dump an XFS filesystem while it is being
used. Furthermore, XFS dumps/restores are resumable: dump/restore can resume where it
left off after a temporary termination, instead of going back to the beginning of the
backup process. XFS uses a log mechanism to see where the
dump/restore temporarily stopped and proceeds from there.
XFS supports two types of dumps: level and subtree. XFS can dump by levels, which
indicate whether to dump all of the filesystem or various incremental dumps of just
changes to the filesystem. XFS can also do "subtree" dumps by file name. With both
types of dumps the user does not have to perform a complete filesystem dump to backup
data. The online inventory of dump/restore actions contains much more information than
the old /etc/dumpdates file. The new online inventory directory /var/xfsdump provides
an extensive review of the dump history displayable on screen. This information will help
administrators to quickly restore filesystems.
If media errors occur during xfsdump, XFS dump allows the user to terminate the
current tape and get another one (resumable dump). If a media error occurs during
xfsrestore, XFS restore uses efficient resynchronization to restart or abandon the
restore.
XFS dump provides an "on-demand" progress report of all media handling operations.
Additionally, XFS restore can restore from tapes in any order, independent of how the
filesystem was dumped.
4.7.5 Administration
For backup and restore of files less than 2 GB in size, the standard IRIX utilities Backup,
bru, cpio, Restore, and tar may be used. To dump XFS filesystems, the new utility
xfsdump must be used instead of dump. Restoring from these dumps is done using
xfsrestore. xfsrestore also allows the backup media to be on a remote host.
5 . ARCHITECTURE OF XFS
The last chapter described the structure of XFS in detail; here we discuss the
architecture diagram of XFS and all its different components. We also discuss
compatibility with the system and its components.
The architecture block diagram of XFS (figure 5.1) shows the different components of
XFS; these components are discussed briefly here.
5.1 System Call and Vnode Interfaces
The filesystem related calls are implemented at the system call and vnode interface:
read, write, open, ioctl, etc., for all filesystem types. The operations are then vectored
out to different routines for each filesystem type through the vnode interfaces.
The vnode interfaces also allow interoperation with remote clients such as NFS. As
indicated earlier, NFS Version 3.0 provides 64-bit file sharing capabilities for XFS.
System call and vnode operations support Hierarchical Storage Management (HSM)
and backup applications. An industry-wide working group (DMIG, the Data Management
Interface Group) designed these interfaces.
5.2 Lock Manager
The XFS lock manager implements locking on user files, supporting standard UNIX
file locking calls such as fcntl and flock. The XFS lock manager is similar to the EFS
lock manager with comparable performance.
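Using the standard `flock` interface from an application looks like this (a Unix-only sketch; the file name is arbitrary):

```python
# Advisory file locking through the standard flock interface, which the
# filesystem's lock manager implements underneath.
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locked.dat")
with open(path, "w") as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # exclusive lock; other lockers block
    f.write("only one writer at a time")
    fcntl.flock(f, fcntl.LOCK_UN)      # release the lock
```

A second process attempting `fcntl.flock(..., fcntl.LOCK_EX)` on the same file would block (or fail with `LOCK_NB`) until the lock is released.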
Files are also identified internally by a numeric human-readable value unique to the
file, called the file unique id. Filesystems may be identified either by a "magic
cookie", typically a memory address of the root inode, or by a filesystem unique id.
Filesystem unique ids are assigned when the filesystem is created and are associated
uniquely with that filesystem until the filesystem is destroyed. In both cases, the
unique ids help administrators troubleshoot systems by clearly identifying different
files and filesystems.
The namespace manager manages the directory structures and the contents of the
inode that are unrelated to space management (such as file permissions and time
stamps). The namespace manager uses a cache to speed up naming operations. The
details of the name translation are hidden from the callers.
cache, since specific log writes must be sequenced before and after data operations
for correctness if there is a crash.
The space and name manager subsystems send logging requests to the log manager.
Each request may fill a partial log block or multiple blocks of the log. The log is
implemented as a circular sequential list, which wraps when writes reach the end.
Each log entry contains a log sequence number, so that the end of the log may be
found by looking for the highest sequence number.
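Locating the head of the circular log this way can be sketched as below; the record format is invented for illustration:

```python
# Finding the end (head) of a circular log after a crash: scan the
# buffer and pick the record with the highest sequence number (LSN).
log = [
    {"lsn": 7, "op": "free block 12"},   # newest records, wrapped around
    {"lsn": 8, "op": "alloc block 40"},
    {"lsn": 3, "op": "create inode 9"},  # oldest surviving records
    {"lsn": 4, "op": "link dir 2"},
    {"lsn": 5, "op": "alloc block 17"},
    {"lsn": 6, "op": "update inode 9"},
]

head = max(range(len(log)), key=lambda i: log[i]["lsn"])
print(log[head]["lsn"])   # 8 -- replay stops after this record
```

Because sequence numbers only increase, the highest LSN marks where writing stopped, even though the records physically wrap around the buffer.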
On plexed volumes, the buffer cache is also responsible for inserting log records for
non-metadata blocks, so that the volume manager's write-change log does not need to
be used by the filesystem. This allows the system to keep the plexes of a volume
synchronized with each other in the event of a crash between writes.
5.9 Volume Manager
As detailed earlier, XLV interposes a layer between the filesystem and the disk
drivers by building logical volumes (also known simply as volumes) on top of the
partition devices. A volume is a faster, more reliable "disk" made from many physical
disks, which allows concatenation, striping, and plexing of data.
IRIX 6.1 ships with XFS included on Power Challenge, Power Onyx, and Power
Indigo II (R8000-based machines); the root and /usr filesystems are factory
installed as XFS filesystems on Power Challenge.
XFS may be added to systems running IRIX 5.3 or 6.0.1 by upgrading to IRIX 5.3
with XFS or to IRIX 6.1 (XFS included). XFS is a standard, integrated part of
IRIX 6.1 and beyond.
With NFS version 3, 64-bit filesystems can be exported to other systems that support the
NFS version 3 protocol. Systems that use NFS version 2 may access XFS filesystems with
the 32-bit limit imposed by the protocol.
6 . XFS versus OTHER FILESYSTEMS
This chapter describes why XFS: its advantages over other filesystems, its
compatibility with them, and some of its unique features.
------------------------------------------------------------------------------
FILESYSTEM | BLOCK SIZES                 | MAX FILE SIZE         | MAX FILESYSTEM SIZE
------------------------------------------------------------------------------
XFS        | 512 bytes to 64KB           | 9 thousand petabytes  | 18 thousand petabytes
JFS        | 512, 1024, 2048, 4096 bytes | 512 TB with 512-byte  | 4 petabytes with 512-byte
           |                             | blocks / 4 petabytes  | blocks / 32 petabytes
           |                             | with 4KB blocks       | with 4KB blocks
------------------------------------------------------------------------------
-----------------------------------------------------------
           | Dynamic I-node | Dynamic I-node      | Support for
           | allocation     | tracking structures | sparse files
-----------------------------------------------------------
XFS        | YES            | B+tree              | YES
-----------------------------------------------------------
Figure 6.2 shows the support of XFS and other filesystems for dynamic inode allocation,
which does not limit the number of file entries in a filesystem.
7 . XFS support for Linux
This chapter describes how to build a Linux system that runs on top of the SGI XFS
journaling filesystem. It describes preparation for XFS installation, configuring the
kernel, and filesystem migration.
$ export CVSROOT=':pserver:[email protected]:/cvs'
If you are running csh or tcsh:
$ setenv CVSROOT :pserver:[email protected]:/cvs
If you plan on updating your kernel often (to keep up with the latest changes) you
might want to put this in your login script. Then log in to the cvs server.
$ cvs login (the password is "cvs")
This needs to be done only ONCE, not every time you access CVS.
Now grab linux-2.4-xfs. The first time you will want to do something like:
$ cvs -z3 co linux-2.4-xfs
After you have checked the code out, you can use:
$ cvs -z3 update linux-2.4-xfs
to update your copy to the latest version from the CVS server.
$ make dep
$ make bzImage
$ make modules
7.2.4 Add a new entry to your lilo configuration and re-install lilo
$ vi /etc/lilo.conf
Add a new image section to your lilo.conf file similar to the following:
image=/boot/vmlinuz-2.4.0-XFS
    label=xfs
    read-only
    root=/dev/hda2
The "root=" line should match the "root=" line from the existing image sections in your
lilo.conf file. Don't forget to run lilo when you're through editing lilo.conf to make the
changes effective.
Make partitions as needed. Probably the trickiest part of creating a fully XFS system is
migrating the / filesystem, since that is the filesystem that supports the entire rest of
the system and cannot actually be unmounted while the system is running.
Bibliography
Website :