
XFS – Extended Filesystem

Seminar by:
Bauskar Krunal Suresh
Pune Institute Of Computer Technology

Date: 12/08/2002

CERTIFICATE

This is to certify that Mr. Bauskar Krunal Suresh of B.E. Computer has
successfully completed the necessary seminar work and prepared a bonafide
report on the topic "XFS - Extended Filesystem" in a satisfactory manner
at Pune Institute of Computer Technology (P.I.C.T.), in partial fulfillment
of the degree course in Computer Engineering in the academic year 2002-2003,
as prescribed by the University of Pune.

Date :
Place:

------------------------ ----------------------- ----------------------

Guide Seminar Principal


Co-ordinator

ACKNOWLEDGEMENT

It gives me proud privilege to complete this seminar work on
"XFS - Extended Filesystem" under the valuable guidance of Prof. R. B. Ingle
(Guide) at Pune Institute of Computer Technology (P.I.C.T.), Pune. I am also
extremely grateful to CVK Rao (H.O.D. of the Computer Department) and
G. P. Potdar (Seminar Co-ordinator) for providing every facility and all
possible help for the smooth progress of the seminar work.
I would also like to thank all the staff members of the Computer
Department for their timely help and encouragement in fulfilling the seminar
work. Lastly, I would like to thank all my colleagues for their help in
my work.

INDEX:

Chapter 1: INTRODUCTION TO FILESYSTEM
 1.1 Overview
 1.2 New Generation Filesystems

Chapter 2: JOURNALING FILESYSTEM
 2.1 What is a Journal?
 2.2 Journaling Filesystems
 2.3 Different Types of Journalled Filesystems
 2.4 Advantages of Journalled Filesystems

Chapter 3: GETTING STARTED WITH XFS
 3.1 Getting Started with XFS
 3.2 Features of XFS

Chapter 4: STRUCTURE OF XFS
 4.1 Basics
 4.2 File Management
 4.3 XFS as Journalled Filesystem
 4.4 Volume Management
 4.5 Guaranteed Rate I/O
 4.6 Data Management Interface
 4.7 Expanded Dump/Restore Capabilities

Chapter 5: ARCHITECTURE OF XFS
 5.1 Architecture of XFS
 5.2 Compatibility with System and System Components
 5.3 Advantages of XFS

Chapter 6: XFS VERSUS OTHER FILESYSTEMS
 6.1 Why XFS?
 6.2 XFS vs Other Filesystems
 6.3 Future Advancements

Chapter 7: XFS SUPPORT FOR LINUX
 7.1 Preparation for XFS Installation
 7.2 Configuring Kernel and Installation
 7.3 Filesystem Migration

Bibliography

1. INTRODUCTION TO FILESYSTEM

This chapter gives general information about filesystems: files, filesystems, the need for filesystems, and the need for larger filesystems.

1.1 Overview
1.1.1 File
Logically, a file is a collection of records, text, images, etc. Physically, it is a collection of bytes stored in a particular order and accessed in a certain order so that the data can be interpreted as information.

1.1.2 Filesystem
A filesystem is the system used to control and manage files on a computer system. Its input is a control command from the user; its output is the result of executing that command.

1.1.3 Need for Filesystem


A filesystem allows users to create, delete, and otherwise manipulate files as required, managing disk space efficiently, reliably, and consistently. It provides fast as well as easy access to any file. All of these operations are hidden from the user, who need not worry about the details. Most filesystems also provide a facility to restrict access to files and folders (collections of files) to a particular group of users.

1.1.4 Need for Large Files

Most recent applications, such as video servers, database servers, and web servers, demand increased disk capacity and high bandwidth, with parallelism. These applications demand larger files to store data, extremely fast crash recovery, support for large filesystems, directories with large numbers of files, and fair performance with both small and large files. Many future applications will require still larger file structures, which should be scalable and reliable.

1.1.5 Need for Larger filesystem


There are two major problems with old structures:
They are unable to cope with new storage capacities; old FS were designed with
certain file, directory and partition sizes in mind. File system structures have a fixed
number of bits to store file size information, a fixed number of bits to store the logical
block number, etc. As a consequence of that fixed number of bits, file sizes, partition
sizes and the number of directory entries are limited. Old structures often lack the number
of bits required to manage certain object sizes.
They are inadequate for new storage capacities: although old
structures are sometimes able to cope with new object sizes, they are often
inadequate for performance reasons. The main reason is that certain
structures behave well with old sizes, but with the new ones lead to performance losses.

New-generation file systems have been designed to overcome those problems, keeping
scalability in mind. Several new structures and techniques have been included in those
filesystems.

The growing demand for larger files made it essential to have scalable, reliable, and high-performance filesystems that can handle larger files without degraded performance. Before these filesystems evolved, some older filesystems provided support for somewhat larger files, but at a high performance penalty.

1.2 New Generation Filesystems


Most filesystems on the market evolved in an earlier age of smaller filesystems, lower processing power, and limited storage capacity. Many current filesystems are based on architectures that emphasize conserving storage space at the expense of performance. The revolutionary growth of CPU and storage technology has fueled a dramatic increase in the size and complexity of filesystems, outstripping the capabilities of many filesystems.

Today, at least four major players exist in the Linux journaling filesystem arena. They are
in various stages of completion, with some of them becoming ready for use in production
systems. They are:

 Hans Reiser's ReiserFS


 SGI's XFS
 IBM's JFS
 Ext3

2. JOURNALING FILESYSTEM

This chapter describes journaling filesystems: the need for them, their advantages, and a brief overview of the different filesystems that support journaling.

2.1 What is a Journal?


Logs are maintained in each volume and used to record information about operations on
meta-data. The log has a format that also is set by the file system creation utility. A single
log may be used simultaneously by multiple mounted filesets.

Journaling filesystems maintain a special file called a log (or journal), the contents of
which are not cached. Whenever the filesystem is updated, a record describing the
transaction is added to the log. An idle thread processes these transactions, writes data to
the filesystem, and flags each processed transaction as completed. If the machine crashes,
the background process is run on reboot and simply finishes copying updates from the
journal to the filesystem. Incomplete transactions in the journal file are discarded, so the
filesystem's internal consistency is guaranteed.
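The recovery rule above can be sketched in a few lines of Python. This is a toy model with an in-memory "disk" and an invented record format, not any real filesystem's on-disk layout: on recovery, transactions that reached their commit record are replayed, and incomplete ones are discarded.

```python
# Toy write-ahead journal: metadata updates are recorded in the log
# before being applied to the "disk". On recovery, committed
# transactions are replayed and incomplete ones are discarded.

class Journal:
    def __init__(self):
        self.records = []          # append-only log records

    def begin(self, txid):
        self.records.append(("begin", txid, None))

    def write(self, txid, key, value):
        self.records.append(("write", txid, (key, value)))

    def commit(self, txid):
        self.records.append(("commit", txid, None))


def recover(journal, disk):
    """Replay the journal: apply only transactions that reached commit."""
    committed = {txid for kind, txid, _ in journal.records if kind == "commit"}
    for kind, txid, payload in journal.records:
        if kind == "write" and txid in committed:
            key, value = payload
            disk[key] = value
    return disk


# Simulate a crash: tx1 committed, tx2 interrupted before its commit record.
log = Journal()
log.begin(1); log.write(1, "inode:7", "size=4096"); log.commit(1)
log.begin(2); log.write(2, "inode:9", "size=0")      # crash here

disk = recover(log, {})
print(disk)   # {'inode:7': 'size=4096'} -- tx2's update is discarded
```

Note that recovery cost depends only on the length of the log, not on the size of the filesystem, which is why the journal replaces a full fsck.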

2.2 Journaling Filesystems


2.2.1 Need for Journaling filesystem
Non-Journalled file systems rely on restart-time utilities (that is, fsck), which examine all
of the file system's meta-data (such as directories and disk addressing structures) to detect
and repair structural integrity problems. This is a time-consuming and error-prone
process, which, in the worst case, can lose or misplace data.

Waiting for an fsck to complete on a server system can tax your patience more than it should. Fortunately, a new breed of filesystem is coming to your Linux machine soon. Journaling cuts the complexity of a filesystem check by a couple of orders of magnitude: a full-blown consistency check is never necessary (in contrast to ext2fs and similar filesystems), and restoring a filesystem after a reboot is a matter of seconds at most.

The buffer cache is a buffer allocated in main memory to speed up I/O operations. This kind of buffer is commonly used in filesystems (the disk cache) and databases to increase overall performance. The problem appears if there is a system crash before the buffers have been written to disk: the system would behave inconsistently after reboot. Think of a file deleted in the cache but still present on the hard disk. That is why databases and filesystems need the ability to recover back to a consistent state. Although databases have recovered quickly for years, filesystems, and UFS-like ones in particular, tend to see their recovery time increase as filesystem size grows. The fsck recovery tool for ext2fs has to scan through the entire disk partition in order to bring the filesystem back to a consistent state. This time-consuming task often creates a lack of availability for large servers with hundreds of gigabytes or sometimes terabytes. This is the main reason filesystems have inherited database recovery technology, and thus the appearance of journaling filesystems.

2.2.2 What is Journaling filesystem?


In a "journalled" filesystem, updates to filesystem metadata (inodes, directories, bitmaps, etc.) are written to a serial log area on disk before the original disk blocks are updated in place. In the event of a crash, such operations can be redone using the data present in the log, restoring the filesystem to a consistent state.

These filesystems use techniques originally developed for databases to log information about operations performed on the filesystem metadata as atomic transactions. In the event of a system failure, the filesystem is restored to a consistent state by replaying the log and applying the log records for the appropriate transactions. The recovery time of this log-based approach is much faster, since the replay utility need only examine the log records produced by recent filesystem activity rather than all the filesystem metadata. Several other aspects of log-based recovery are of interest. First, JFS only logs operations on metadata, so replaying the log only restores the consistency of structural relationships and resource allocation states within the filesystem. It does not log file data or recover this data to a consistent state. Consequently, some file data may be lost or stale after recovery, and customers with a critical need for data consistency should use synchronous I/O.

2.3 Different types of Journalled Filesystems


The "journalled" filesystem provides a log-based system that was developed for transaction-oriented, high-performance systems. Scalable and robust, its advantage over non-journalled filesystems is its quick restart capability: it can restore a filesystem to a consistent state in a matter of seconds or minutes.

2.3.1 ReiserFS
ReiserFS is a radical departure from the traditional Unix filesystems, which are block-
structured. It will be available in the upcoming Red Hat 7.1 distribution and is already
available in SuSE Linux 7.0.

Hans Reiser writes about the filesystem he designed: "In my approach, I store both files
and filenames in a balanced tree, with small files, directory entries, inodes, and the tail
ends of large files all being more efficiently packed as a result of relaxing the
requirements of block alignment and eliminating the use of a fixed space allocation for
inodes." The effect is that a wide array of common operations, such as filename resolution and file accesses, are optimized when compared to traditional filesystems such as ext2fs.
Furthermore, optimizations for small files are well developed, reducing storage overheads
due to fragmentation.

ReiserFS is not yet a true journaling filesystem (although full journaling support is
currently under development). Instead, buffering and preserve lists are used to track all
tree modifications, which achieves a very similar effect. This reduces the risk of
filesystem inconsistencies in the event of a crash and thus provides rapid recovery on
restart.

2.3.2 XFS
When SGI needed a high performance and scalable filesystem to replace EFS in 1990, it
developed XFS. XFS represents a leap into the future of filesystem management,
providing a 64-bit journalled filesystem that can handle large files quickly and reliably

XFS is a 64-bit filesystem, which theoretically allows the creation of files a few million terabytes in size, comparing favorably to the limitations of 32-bit filesystems. Being a 64-bit filesystem with support for such large files, journalling support was a must. XFS has a separate sub-volume, named the log sub-volume, used to store the log (journal) for use during crash recovery. XFS performs a binary search of the log for transactions to replay. This eliminates the need to perform a slow total UNIX filesystem check (fsck) after a system crash. Log records are discarded when the data they describe are no longer required.

2.3.3 JFS
IBM's JFS is a journaling filesystem used in its enterprise servers. It was designed for
"high-throughput server environments, key to running intranet and other high-
performance e-business file servers" according to IBM's Web site. Judging from the
documentation available and the source drops, it will still be a while before the Linux
port is completed and included in the standard kernel distribution.

JFS offers a sound design foundation and a proven track record on IBM servers. It uses an interesting approach to organizing free blocks, structuring them in a tree and using a special technique to collect and group contiguous runs of free logical blocks. Although it uses extents for a file's block addressing, extents are not used to maintain the free space. Small directories are supported in an optimized fashion (i.e., stored directly within an inode), although with different limitations than those of XFS. However, small files cannot be stored directly within an inode.

2.3.4 Ext3 FS
Ext3 FS is an alternative for all those who do not want to switch their filesystem but require journaling capabilities. It is distributed in the form of a kernel patch and provides full backward compatibility. It also allows the conversion of an ext2fs partition without reformatting, and a reverse conversion back to ext2fs, if desired.

However, using such an add-on to ext2fs has the drawback that none of the advanced
optimization techniques employed in the other journaling filesystems is available: no
balanced trees, no extents for free space, etc.

2.4 Advantages of journalled filesystems



With the increasing size of hard disks, journaling filesystems are becoming important to
an ever-increasing number of users. If you ever waited for a filesystem check on a
machine with an 80GB hard disk, you know what I'm talking about. Even if you do not
plan to reboot your system often, they can save you a lot of time and trouble if you
experience a power failure or a hardware glitch. With the large number of contenders
striving to become the de-facto standard in the journaling filesystem space on Linux, we
can look forward to interesting months as these filesystems' code bases mature, are
integrated into the standard kernel, and are supported in upcoming releases of the major
Linux distributions.

However, migrating to another filesystem is not a trivial task. It usually requires backing
up your data, reformatting, and restoring the data onto the newly created volume. You
should thoroughly evaluate your options before making the switch.

3. GETTING STARTED WITH XFS

This chapter gives a brief overview of the XFS filesystem and lists its features. It discusses the need for and evolution of XFS, along with some of the goals it was designed for.

3.1 Getting started with XFS


XFS is Silicon Graphics' (SGI) next-generation filesystem and volume manager for IRIX
systems. XFS represents a leap into the future of filesystem management, providing a 64-
bit journalled filesystem that can handle large files quickly and reliably. In addition, XFS
provides the unique feature of guaranteed rate I/O that leverages Silicon Graphics'
intimate knowledge of media serving environments. As a next generation product, XFS
was designed as a new fully integrated product, rather than a modification of an existing
product. Thus, important filesystem features such as journalling were seamlessly
integrated into XFS, instead of being awkwardly added to an existing filesystem.

3.1.1 Evolution of XFS


When SGI needed a high performance and scalable filesystem to replace EFS in 1990, it
developed XFS, to handle the demands of increased disk capacity and bandwidth, and
parallelism with new applications such as film, video, and large databases. These
demands included extremely fast crash recovery, support for large filesystems, directories
with large numbers of files, and fair performance with small and large files.

XFS is a true next generation filesystem, not simply a rewrite or port of existing
technology. Most filesystems on the market are based upon old filesystem architectures
such as System V or BSD. XFS was "built from scratch". This allowed SGI to integrate
into XFS key features such as journalling for high reliability and redesign important areas
such as the allocation algorithms for increased performance with large filesystems. Thus,
XFS is the first new filesystem built for the demanding large filesystem needs of the
1990's:

-----------------------------------------------------------
TIME | FILESYSTEM
-----------------------------------------------------------
Early 1970's | Version 7 filesystem
Early 1980's | Berkeley "fast" filesystem (FFS)
Mid 1980's | Early journalled filesystems
Mid 1990's | XFS
-----------------------------------------------------------

As indicated above, most filesystems on the market evolved in an earlier age of smaller
filesystems, lower processing power, and limited storage capacity. Many current
filesystems are based on architectures that emphasize conserving storage space at the
expense of performance. The revolutionary growth of CPU and storage technology has
fueled a dramatic increase in the size and complexity of filesystems, outstripping the
capabilities of many filesystems. SGI designed XFS to meet this increasing need to manage large, complex filesystems with high performance and reliability.

3.1.2 64-bit Filesystem


3.1.2.1 Room to Grow
The XFS filesystem supports files of up to 2^63 - 1 bytes = 9,223,372,036,854,775,807 bytes (about 9 million terabytes) and filesystems of up to 2^64 bytes. In IRIX 5.3, files and filesystems are restricted to 2^40 - 1 = 1,099,511,627,775 bytes (1 terabyte). IRIX 6.1 supports full 64-bit files, with filesystems subject to the 1-terabyte restriction. Thus with XFS, filesystems are limited more by the user's ability to attach disks to the hardware than by the filesystem's capacity. Disk drive capacities are growing approximately 1.7 times per year, so this added filesystem capacity will prove invaluable now and in the future. As nearly the only filesystem on the market specifically designed to scale to the full 64-bit address space, XFS is leading the way into the future of large filesystem management.

3.1.2.2 64-bit Compatibility


Both 32-bit and 64-bit interfaces are supported where the underlying OS and hardware support them. NFS version 3, which is a 64-bit protocol, allows the export of large files and filesystems. NFS version 3 is an available option for IRIX 5.3 and higher. Systems which support only NFS version 2 may access files on XFS filesystems subject to the 32-bit file size limit. XFS filesystems accessed over NFS version 2 may be greater than 2 gigabytes; all file accesses will operate correctly, but disk free space reports will be incorrect. Additionally, the caching filesystem (CFS) uses NFS protocols and allows efficient remote access to XFS filesystems.

32-bit applications may access 64-bit files through two methods.

First, applications may use the extended system calls lseek64(), stat64(), etc. These system calls enable the user to write 32-bit programs that can track 64-bit file positions and file sizes.
Second, the new compilers from SGI allow administrators to build N32 application interfaces without changing any source code. These compilers are available for IRIX 6.1 and above.
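To illustrate the 64-bit offset idea, the following Python snippet does what a 32-bit C program would do with lseek64(): it seeks past the 4 GiB mark and writes one byte, producing a sparse file whose size cannot be expressed in 32 bits. (On Linux, Python's os.lseek already takes 64-bit offsets; the temporary-file setup is just scaffolding for the demo, and the filesystem is assumed to support sparse files.)

```python
# Seeking past the 32-bit (4 GiB) boundary, the capability that
# lseek64() exposes to 32-bit C programs.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    offset = (1 << 32) + 100          # just past 4 GiB
    os.lseek(fd, offset, os.SEEK_SET)
    os.write(fd, b"x")                # creates a sparse file: one real block
    size = os.fstat(fd).st_size       # logical size needs 64-bit arithmetic
    print(size)                       # 4294967397
finally:
    os.close(fd)
    os.remove(path)
```

Only one disk block is actually allocated; the logical file size is what overflows a 32-bit counter, which is exactly the case NFS version 2 clients cannot represent.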

3.1.3 XFS- Journalled Filesystem


As mentioned earlier, XFS utilizes database journalling (logging) technology. Updates to
filesystem metadata are written to a separate serial log area before the original disk
blocks are updated. Journalling promotes high availability by allowing systems to quickly
recover from failures while maintaining the disk-based file structure in a consistent state
at all times.

XFS is the first filesystem to integrate journal technology into a new filesystem rather
than add a section of journal code to an existing filesystem. This integration allows XFS
to be a more robust and faster filesystem. XFS uses a circular log, which usually occupies 1000-2000 filesystem blocks. During normal system operation the log is written and never read. Old information drops out of the log as its usefulness ends. These log buffer writes are mostly performed asynchronously, so as not to force user applications to wait for them.
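A minimal sketch of the circular-log idea, with invented names and an in-memory slot array standing in for the on-disk log: slots are written in rotation, and a slot can be reused once the metadata its record describes has reached the filesystem.

```python
# Toy circular log: a fixed number of slots is reused in rotation.
# During normal operation the log is only written; a record's slot is
# reclaimed once its transaction has been flushed to the filesystem.

class CircularLog:
    def __init__(self, nslots):
        self.slots = [None] * nslots
        self.head = 0            # next slot to write
        self.tail = 0            # oldest record not yet flushed
        self.used = 0

    def append(self, record):
        if self.used == len(self.slots):
            raise RuntimeError("log full: flush metadata first")
        self.slots[self.head] = record
        self.head = (self.head + 1) % len(self.slots)
        self.used += 1

    def flush_oldest(self):
        """Metadata reached the disk; the oldest record is no longer needed."""
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)
        self.used -= 1

log = CircularLog(4)
for r in ("t1", "t2", "t3", "t4"):
    log.append(r)
log.flush_oldest()               # t1's usefulness has ended
log.append("t5")                 # its slot is reused
print(log.slots)                 # ['t5', 't2', 't3', 't4']
```

The fixed size is why a full log can stall writers until metadata is flushed, and why the real log is sized in the 1000-2000 block range rather than growing without bound.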

Approximately one transaction occurs per filesystem update operation. Batched transactions enable XFS to make metadata updates faster than EFS. The atomic, multi-block updates enabled by transactions allow XFS to cleanly update its complex metadata structures.

3.1.4 B+ -Tree Structure


XFS uses B+-trees (an advanced version of the balanced tree). This means the length of the path from the tree's root to any leaf node is always the same.

Figure 3.1: XFS Binary Tree Directory Speed

Figure 3.1 shows that the XFS directory structure is based on B+-trees, which allow XFS to maintain good response times even as the number of files in a directory grows to tens or hundreds of thousands. No other filesystem can match this metadata performance. Moreover, the nodes within a B+-tree must contain a minimum number of entries in order to exist; whenever a node's contents fall below that minimum, the entries are shifted to another existing node. This helps in fast searches and rapid space allocation.
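A rough way to see why this scales: the number of node reads for a lookup equals the tree height, which grows logarithmically with the number of directory entries, while a linear directory scan grows with the entry count itself. The fan-out of 128 below is an arbitrary illustrative figure, not XFS's actual node size.

```python
# Why B+-tree directories stay fast: lookup cost is the tree height,
# i.e. logarithmic in the number of entries, versus a linear scan
# whose cost grows with the entry count itself.
import math

def btree_height(entries, fanout):
    """Levels needed to index `entries` keys with `fanout` keys per node."""
    if entries <= 1:
        return 1
    return math.ceil(math.log(entries, fanout))

for n in (1_000, 100_000, 10_000_000):
    print(n, btree_height(n, fanout=128))
# 1000 entries -> 2 node reads, 100000 -> 3, 10000000 -> 4,
# versus n/2 comparisons on average for a linear directory scan.
```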

3.2 Features Of XFS
 Scalability which will support disk growth for decades to come
 Scalable file sizes – million terabytes
 Scalable filesystems – million terabytes
 Scalable algorithm for high performance even with huge files
 Large number of files
 Large files including sparse files
 Large directories
 Advanced algorithm for fast performance on huge filesystems

 Journaling and Reliability


 Guarantees filesystem consistency for high reliability
 Rapid restart [less than a second] after unexpected interrupts
 Proven technology: hundreds of thousands of systems in the field
have used XFS reliably for years
 Designed with log/database (journal) technology as a fundamental part,
not just an extension to an existing filesystem

 High performance
 Extremely fast transaction rates
 Extremely high bandwidth
 Extremely fast directory searches
 Extremely fast space allocation
 Exceptional performance : 500+ Mbytes/second

 Advanced Features
 Video Streaming Support
 Guaranteed Rate I/O (GRIO)

 Hierarchical storage management (HSM)
 Offsite storage appears as online

 Compatibility
 Backup with popular commercial packages such as Legato Networker
for IRIX
 Support for multiple HSM including SGI DMF and Veritas HSM
through DMIG – DMAPI interface
 NFS Compatibility: with NFS ver-3, a 64-bit filesystem can be exported
to other systems that support the NFS ver-3 protocol. Systems that use
NFS ver-2 may access XFS filesystems with the 32-bit limit imposed by
the protocol
 Windows NT Compatibility: SGI uses the open source Samba server to
export XFS filesystems to Windows and Windows NT systems. Samba
speaks the SMB (Server Message Block) and CIFS (Common Internet
Filesystem) protocols

 Integrated, full-function volume manager called XLV

 Extremely high I/O performance that scales well on multiprocessing


systems

4. STRUCTURE OF XFS

The last chapter gave a brief overview of what XFS is and its features; here we will discuss the structure of XFS in detail.

4.1 Basics
Technically, XFS is based on the use of B+ trees to replace the conventional linear file
system structure. B+ trees provide an efficient way to index directory entries and manage
file extents, free space, and filesystem metadata. This guarantees quick directory listing
and file accesses. The allocation of disk blocks to inodes is done dynamically, which
means that you no longer need to create a filesystem with smaller block sizes for your
mail server; your filesystem will handle this automatically for you. XFS is also a 64-bit
filesystem, which theoretically allows the creation of files that are a few million terabytes
in size, which compares favorably to the limitations of 32-bit filesystems. The ability to
attach free-form metadata tags to files on an XFS volume is yet another useful feature of
this filesystem.

XFS also contains good support for multiprocessor machines. This is visible in the
implementation of the page buffer subsystem, which uses an AVL tree which is kept
separate from the objects to avoid locking problems and cache thrashing on larger SMP
systems. Multithreaded operation has been a declared design goal of this filesystem and
has been well-tested in large multiprocessor IRIX systems worldwide.

4.2 File Management


4.2.1 Basics
The XFS space manager efficiently allocates disk space within a filesystem. It is
responsible for mapping a file (a sequence of bytes) onto sequences of disk blocks. The
internal structures of the filesystem -- allocation groups, inodes, and free space
management -- are the fundamental items controlled by the space manager. The namespace manager handles allocation of directory files, normally placing them close to the files in the directory for increased seek performance.

The XFS space manager and namespace manager use sophisticated B-Tree indexing technology to represent file location information contained inside directory files and to represent the structure of the files themselves (the location of information in a file).
significantly increases the speed of accessing information in files, especially with large
files and filesystems. In the case of large filesystems, traditional filesystems linearly
search the file location information in directory files; this information often spans
multiple blocks and/or extents (a collection of blocks). XFS's B-Tree technology enables
it to go directly to the blocks and/or extents containing a file's location using
sophisticated indices. With large files, the B-Tree indices efficiently map the location of
the extents containing the file's data. This avoids the slower multi-level indirect schemes
(of other filesystems) which often require multiple block reads to find the desired
information.

4.2.2 Structure
The space manager divides each filesystem data sub-volume (or partition) into a number of allocation groups at mkfs time. Each allocation group has a collection of inodes and data blocks, and data structures to control their allocation. These allocation groups help divide the space management problem into easy-to-manage pieces, speeding up file creation.
The administrator specifies the inode size at mkfs time (XFS defaults the size to 256 bytes). However, the location and number of inodes are allocated as needed by XFS, unlike most filesystems, which require the static creation of inodes at mkfs time. This dynamic inode allocation permits greater performance (fewer seeks), since inodes can be positioned closer to the files and directories that use them.
Free blocks in an allocation group are tracked using a pair of B-trees instead of a bitmap. This provides better scalability to large systems. Block allocation and free-block operations are performed in parallel if done in different allocation groups.
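The value of indexing free space two ways can be mimicked with a toy allocator: free extents are keyed by starting block (for coalescing on free) and consulted by size (for best-fit allocation). This is a sketch of the idea only; a plain dict and min() stand in for the two on-disk B-trees, and all names are invented.

```python
# Toy free-space tracker mirroring the dual-index idea: a by-start
# view for merging neighbours when blocks are freed, and a by-size
# view for best-fit allocation.

class FreeSpace:
    def __init__(self, total_blocks):
        self.by_start = {0: total_blocks}      # start block -> length

    def alloc(self, length):
        """Best fit: smallest free extent that is large enough."""
        fits = [(l, s) for s, l in self.by_start.items() if l >= length]
        if not fits:
            raise MemoryError("no extent large enough")
        l, s = min(fits)                       # the 'by size' view
        del self.by_start[s]
        if l > length:
            self.by_start[s + length] = l - length
        return s

    def free(self, start, length):
        """Return an extent, merging with adjacent free neighbours."""
        end = start + length
        if end in self.by_start:               # merge with right neighbour
            length += self.by_start.pop(end)
        for s, l in list(self.by_start.items()):
            if s + l == start:                 # merge with left neighbour
                start, length = s, l + length
                del self.by_start[s]
                break
        self.by_start[start] = length

fs = FreeSpace(100)
a = fs.alloc(10)       # blocks 0..9
b = fs.alloc(20)       # blocks 10..29
fs.free(a, 10)
fs.free(b, 20)         # coalesces back into one 100-block extent
print(fs.by_start)     # {0: 100}
```

A bitmap would need a linear scan to answer "where is a free run of 20 blocks?"; the by-size index answers it directly, which is the scalability argument in the text.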

The real-time sub-volume is divided into a number of fixed-size extents (a collection of
blocks) to facilitate uniform read/writes during guaranteed-rate I/O operations. The size is
chosen at mkfs time and is expected to be large (on the order of 1MByte). The real-time
subvolume allocation uses a bitmap to assure predictable performance; the access time
for B-tree indices varies.

4.2.3 Handles Contiguous Data


The space manager optimizes the layout of blocks in a file to avoid seeking during
sequential processing. It also keeps related files (those in the same directory) close to
each other on disk. More importantly, XFS uses large extents (groups of blocks) to form
contiguous regions in a single file. Current block-based filesystems map a file to blocks
of fixed length. Extent based filesystems map a file to an extent, not just a single block.
Extents are easy to represent, minimize fragmentation and reduce the number of pointers
required to represent a large file. Thus, data is kept in large contiguous extents that can be
extracted from a disk in a single I/O operation - significantly reducing I/O time.
Filesystem extents are variable in size and configured by XFS as needed. XFS supports
512 bytes to 1 gigabyte per extent and optimizes the size for higher read performance.
This is in contrast to other filesystems, which provide only fixed extent sizes.

The filesystem block size is set at filesystem creation time using mkfs. XFS supports
multiple block sizes, ranging from 512 bytes (disk sector size) up to 64KB and up to 1GB
for real-time data. The filesystem block size is the minimum unit of allocation for user
data in the filesystem. As a file is written by an application, space is reserved but blocks
are not allocated. This ensures that space is available, but gives XFS more flexibility in
allocating blocks.
XFS delays allocation of user data blocks when possible, holding the data in the buffer
cache, to make blocks more contiguous. This allows XFS to make extents large
without requiring the user to specify extent size, and without requiring a filesystem
reorganizer to fix the extent sizes after the fact. This also reduces the number of writes to
disk and extents used for a file.
4.2.4 Efficient Filesystem
As implied earlier, XFS uses sophisticated filesystem management techniques such as
extents and B-Tree indices to efficiently support:

Very large files (64-bit size)


There is little or no performance penalty to access blocks in different areas of a large file.
XFS creates large sizeable extents to locate file data close together for faster reading and
writing of data. Using B-Tree indexing technology, XFS assigns small areas on the disk
to write indices for the location of data in large files. Generally these indices point to the
extents containing the data. The use of B-Tree indices increases performance by avoiding
slower multi-level indirect searches of the filesystem data structures - especially when
accessing blocks at the end of large files.

Very small files


Most symbolic links and directory files are small files. XFS allows these files to be stored
in inodes for increased performance. XFS also uses delayed writes to wait to gather the
entire small file in the buffer cache before writing to disk. This reduces the number of
writes to disk and extents used for a file.

Files with few extents


These files are represented by extent lists to increase performance. Extent lists are simple
linear lists of the files' structure, which avoid the sophisticated B-Tree operations
necessary for large files.

Sparse files
Sparse files are files that contain arbitrary "holes," areas of the file which have never
been written and which read back as zeroes. XFS supports holes (avoids wasted space) by
using indexing (B-Trees) and extents.
Mapped files
Memory mapping of files is supported by XFS. It allows a program to "attach" a file such
that the file appears to be part of the program, and lets XFS worry about managing the
disk I/Os.

Large directories
Large directories are indexed in directory files using B-Trees to expedite searches,
insertions, and deletions. Operations on directories containing millions of files are almost
as fast as on directories containing only hundreds of files.

4.2.5 Attribute Management


XFS supports user-defined arbitrary attributes, which allow information about a file to be
stored outside of the file. For example, a graphic image could have an associated
description, or a text document could have an attribute showing the language in which the
document is written. An unlimited number of attributes can be associated with a file or
directory. They can have any name, type, or size of value. These attributes represent the
first substantial enhancement to the UNIX file interface in 10 years.

4.2.6 Online Filesystem Reconfiguration


An active filesystem may be extended (enlarged) by adding more space to the underlying
volume. This operation is supported on-line by the space manager, which receives a
request to expand the filesystem and updates on-disk and in-memory structures to
implement it.

4.2.7 Fast, Scalable Filesystem Performance


XFS is a very high-performance filesystem with support for contiguous data (reduced
I/Os) and both direct (non-buffered) and asynchronous I/O. SGI has demonstrated I/O
through the filesystem in excess of 500 MBytes/second. Customers have demonstrated
and documented high I/O and bandwidth as well. For example, Tom Ruwart, Director of
the Army High Performance Computing Lab at the University of Minnesota, has
successfully tested an early "alpha" version of XFS on a large 377-gigabyte filesystem.
Using 24 SCSI channels and 24 Ciprico RAIDs, Tom measured direct I/O throughput of:
 186 megabytes/second sustained read or write speed, 1 process
 330 megabytes/second sustained read speed, 4 processes
Furthermore, XFS is designed to scale in performance to match the new CHALLENGE
MP architecture and beyond. In traditional filesystems, files, directories, and filesystems
have reduced performance as they grow in size; with XFS there is no performance
penalty.

4.3 XFS as Journalled Filesystem


4.3.1 Journalling
As mentioned earlier, XFS utilizes database journalling (logging) technology. Updates to
filesystem metadata are written to a separate serial log area before the original disk
blocks are updated. Journalling promotes high availability by allowing systems to quickly
recover from failures while maintaining the disk-based file structure in a consistent state
at all times.
XFS is the first filesystem to integrate journal technology into a new filesystem rather
than add a section of journal code to an existing filesystem. This integration allows XFS
to be a more robust and faster filesystem.
XFS uses a circular log, which usually employs 1000-2000 filesystem blocks. During
normal system operation the log is written and never read. Old information drops out of
the log as its usefulness ends. These log buffer writes are mostly performed
asynchronously so as not to force user applications to wait for them.
Approximately one transaction occurs per filesystem update operation. Batched transactions
enable XFS to make metadata updates faster than EFS. The atomic, multi-block updates
enabled by transactions allow XFS to cleanly update its complex metadata structures.

4.3.2 Reliable Recovery


In the event of a crash, operations performed prior to the crash can be redone using data
present in the log to restore the filesystem structure to a consistent state. This is done in
the kernel at filesystem mount time. XFS performs a binary search of the log for
transactions to replay. This eliminates the need to perform a slow total UNIX filesystem
check (fsck) after a system crash. Also, when fsck finds inconsistent data structures it
must throw away anything suspicious. XFS knows what was happening at the time of
failure, so it never needs to throw anything away; it simply finishes what it started. Thus,
XFS's journalled recovery provides higher filesystem integrity than does standard UNIX.
To ensure data consistency, some log writes must be synchronous. This does not include
ordinary (buffered) writes. Direct I/O, synchronous I/O, or fsync() allow synchronous
user data writes.

4.3.3 Fast Recovery


The XFS log-based recovery mechanism will recover a filesystem within a few seconds.
The XFS recovery mechanism does not need to scan all inodes or directories to ensure
consistency. Also, journalling makes the XFS recovery time independent of the
filesystem size. Thus, the recovery time depends only upon the level of activity in the
filesystem at the time of the failure.

4.4 Volume Management


4.4.1 Basics
The xlv volume manager (XLV) is an integral part of the XFS filesystem. The volume
manager provides an operational interface to the system's disks and isolates the higher
layers of the filesystem and applications from the details of the hardware. Essentially,
higher-level software "sees" the logical volumes created by XLV exactly like disks. Yet,
a logical volume is a faster, more reliable "disk" made from many physical disks
providing important features such as the following (discussed in detail later):
 concatenating volumes for a larger disk
 striping volumes for a larger disk with more bandwidth
 plexing (mirroring) volumes for a more reliable disk
The use of volumes enables XFS to create filesystems or raw devices that span more than
one disk partition. These volumes behave like regular disk partitions and appear as block
and character devices in the /dev directory. Filesystems, databases, and other applications
access the volumes rather than the partitions. Each volume can be used as a single
filesystem or as a raw partition. A logical volume might include partitions from several
physical disks and, thus, be larger than any of the physical disks. Filesystems built on
these volumes can be created, mounted, and used in the normal way.
The volume manager stores all configuration data in the disk's labels. These labels are
stored on each disk and will be replicated so that a logical volume can be assembled even
if some pieces are missing. There is a negligible performance penalty for using XLV
when compared to accessing the disk directly; although plexing (mirroring data) will
mildly degrade write performance.

4.4.2 Sub-volumes
Within each logical volume, the volume manager implements sub-volumes, which are
separate linear address spaces of disk blocks in which the filesystem stores its data. For
EFS filesystems, a volume consists of just one sub-volume. For XFS filesystems, a
volume consists of a data sub-volume, an optional log sub-volume, and an optional real-
time sub-volume:
Data sub-volume
The data sub-volume contains user files and filesystem metadata (inodes, directories, and
free space blocks). It is required in all logical volumes containing XFS filesystems and is
the only sub-volume present in the EFS filesystems.
Log sub-volume
The log sub-volume contains XFS journalling information. It is a log of filesystem
transactions and is used to expedite system recovery after a crash.
Real-time sub-volume
Real-time sub-volumes are generally used for data applications such as video where
guaranteed response time is paramount.
Sub-volumes facilitate separation of different data types. For example, user data
could be prevented from overwriting filesystem log data. Sub-volumes also enable
filesystem data and user data to be configured to meet goals for performance and
reliability by putting sub-volumes on different disk drives - particularly useful for
separating out real-time data for guaranteed rate I/O operations. Each sub-volume can
also be optimally sized and organized independently. For example, the log sub-volume
can be plexed (mirrored) for fault tolerance and the real-time sub-volume can be striped
across a large number of disks to give maximum throughput for video playback.
Each sub-volume is made of partitions (real, physical regions of disk blocks) composed
by concatenation, plexing (mirroring), and striping. The volume manager is responsible
for translating logical addresses in the linear address spaces into real disk addresses from
the partitions. Where there are multiple copies of a logical block (plexing), the volume
manager writes simultaneously to all copies, and reads from any copy (since all copies
are identical). The volume manager maintains the equality of plexes across crashes and
both temporary and permanent disk failures. The volume manager masks single-block
failures in the plexed volumes by performing retries and rewrites.
4.4.3 Plex and Volume Elements
A volume used for filesystem operations is usually composed of at least two sub-
volumes, one for the log and one for data. Each sub-volume can consist of a number of
plexes (mirrored data). Plexes are individually organized, but are mapped to the same
portion of the sub-volume's address space. Plexes may be added or detached while the
volume is active. The root filesystem may be plexed.
A plex consists of 1 to 128 volume elements, each of which maps a portion of the plex’s
address space (physical data location on disk), and are concatenated together. Each
volume element can be striped across a number of disk partitions. XFS allows online
growth of a filesystem using xfs_growfs.

4.4.4 Volume Manipulation


XLV provides the following services transparent to the filesystems and applications that
access the volumes:

General volume manipulation


You can create volumes, delete them, and move them to another system.
Auto-assembly of logical volumes
The volume manager will assemble logical volumes by scanning the hardware on the
system and reading all the disk labels at system boot time.

Plexing for higher system and data reliability


A sub-volume can contain one to four plexes (mirrors) of the data. Each plex contains a
portion or all of the subvolume's data. By creating a volume with multiple plexes, system
reliability is increased.

Disk striping for higher I/O performance


Striped volume elements consist of two or more disk partitions, organized so that an
amount of data called the stripe unit is written to each disk partition before writing the
next stripe unit-worth of data to the next partition. This provides a performance
advantage on large systems by allowing parallel I/O activity from the disk drives for large
I/O operations.

Concatenation to build large filesystems


You can build arbitrarily large volumes, up to the 64-bit limit, by concatenating (joining)
volume elements together (a maximum of 128 volume elements). This is useful for
creating a filesystem that is larger than the size of a single disk. XLV volumes may store
XFS filesystems, EFS filesystems, or be used as raw partitions (databases).

4.4.5 I/O to a Volume


Each volume has a pair of device nodes (just like disks). I/O to XLV devices uses a
kernel driver, which dispatches the I/O to the right set of disks. Plexing is more
complicated:
 reads are automatically balanced across all plexes (in a round robin order)
 writes are performed simultaneously to all plexes
 single disk failures are masked via retries and/or rewrites in case of failures
XLV supports RAID devices. Thus, XLV is capable of handling large data transfers
generated from the filesystem code, through the volume manager, down to the RAID
driver.

4.4.6 Online Administration


XLV allows online volume reconfigurations, such as increasing the size of a volume or
adding/removing a piece of a volume. This reduces system downtime.

4.5 Guaranteed rate I/O


4.5.1 A Unique Feature
The XFS guaranteed rate I/O system (GRIO) is a unique feature of XFS not found in
other filesystems. GRIO allows applications to reserve specific bandwidth to or from the
filesystem. XFS will calculate the performance available and guarantee that the requested
level of performance is met for a specified time. This frees the programmer from having
to predict the performance, which can be complex and variable on flexible systems such
as the CHALLENGE systems. This functionality is critical for full rate, high-resolution
media delivery systems such as video-on-demand or satellite systems that need to process
information at a certain rate as the satellite passes overhead.
While it is possible to obtain proprietary guaranteed-rate devices, no other system
integrates this feature into the filesystem. This integration yields significant performance
benefits. For example, in XFS the real-time data can be separated from the regular data
by XLV. This allows the administrator to physically locate the real-time data separate
from the metadata and regular data for faster processing. Thus, dedicated storage devices
may be reconfigured for higher performance at the expense of reliability. Moreover, the
GRIO subsystem supports all storage devices (with some configuration), instead of
locking the user to proprietary storage devices.

4.5.2 Hard vs. Soft Guarantee


Guarantees can be hard or soft, depending upon the trade-off between reliability and
performance. Hard guarantees place greater restrictions on the system hardware
configuration. They guarantee to deliver the requested performance, but with some
possibility of error in the data (due to the need to turn off disk drive self-diagnostics and
error-correction firmware). Hard guarantees are only possible if all the drives are on one
SCSI bus and XFS knows and "trusts" all the devices on that bus (such as using all disk
drives instead of unpredictable tape drives). Otherwise, XFS will allow a soft guarantee,
which allows the disk drive to retry operations in the event of an error, but this can
possibly result in missing the rate guarantee.

4.5.3 Guarantee Mechanism


Applications request guarantees by providing a file descriptor, data rate, duration, and
start time. The filesystem calculates the performance available and, if the bandwidth is
available, guarantees that the requested level of performance can be met for the given
period of time. To make a bandwidth reservation, a user issues a grio_request call on a
file. All real-time data accesses are made with standard read and write system calls.
Guaranteed rate I/O does not impact the buffer cache, because programs which utilize
this mechanism are required to use direct I/O - avoiding the buffer cache. Real-time data
may also be accessed in a non-real-time way using only direct I/O calls without GRIO.
The knowledge of the available bandwidth for reservation is located in a user level
reservation scheduling daemon ggd. The daemon has knowledge of the characteristics
and configuration of the disks and volumes on the system (including backplane and SCSI
bus throughput), and it tracks both current and future bandwidth reservations.
By default, IRIX supports four GRIO streams (concurrent uses of GRIO). The number of
streams can be increased to 40 by purchasing the High Performance Guaranteed-Rate
I/O-5-40 option, or to more using the Unlimited Streams option.

4.6 Data Management Interface


4.6.1 New Standard
In 1993, a group of computer system and data storage system vendors established the
Data Management Interface Group (DMIG) to establish a standard filesystem interface
for Hierarchical Storage Management (HSM) systems. This interface is referred to as a
Data Management Application Programming Interface (DMAPI). Silicon Graphics is committed to the
group's goal of simplifying and standardizing an HSM interface to filesystems. The
DMIG also includes other companies such as IBM, Sun, Epoch, EMASS, Hitachi,
Veritas, etc.
The DMIG has produced a draft of the standard (Version 2.1 dated March 1995) whose
changes will be tracked as the DMAPI is implemented and used in the marketplace. XFS
has already implemented much of the DMAPI interface while most other major vendors
have lagged on their implementation. Two companies, EMASS and Hitachi, have
released HSM products that use the DMAPI. Many other HSM vendors have shown
interest in the DMAPI.

4.6.2 Modified XFS Dump


Silicon Graphics has modified the XFS backup interface "xfsdump" to work more
efficiently with the HSM interfaces. XFS's xfsdump uses DMAPI interfaces to
understand the location and structure of the files in the filesystem associated with any
HSM products. This provides increased efficiency and the ability to work with an HSM
instead of fighting it.

4.7 Expanded Dump/Restore Capabilities


4.7.1 Basics
XFS's dump and restore programs have been written to support the next-generation
features of XFS. Thus, the new dump and restore efficiently support large filesystems
with up to 4 billion files and all the other items discussed in this paper, such as sparse files.
SGI modelled XFS's dump/restore after the BSD UNIX dump/restore as indicated below:
 Similar command line options with new (incompatible) standard option syntax
 Inode-based dump order instead of tar or cpio which perform directory searches
 Directory tree reconstruction on restore
XFS's new dump/restore is bundled with XFS and works well with other advanced 3rd
party solutions such as Networker. Also, as mentioned above, XFS's dump uses the
DMAPI to rapidly read the filesystem without understanding details of XFS's internal
structure, resulting in a far less complex program.

4.7.2 Features
Unlike traditional filesystems, which must be inactivated and then dismounted to
guarantee a consistent dump image, you can dump an XFS filesystem while it is being
used. Furthermore, XFS dumps/restores are resumable. This means that dump/restore
can resume where it left off after a temporary termination, instead of going back to the
beginning of the backup process. XFS uses a log mechanism to see where the
dump/restore temporarily stopped and proceeds from there.
XFS supports two types of dumps: level and subtree. XFS can dump by levels, which
indicate whether to dump all of the filesystem or various incremental dumps of just
changes to the filesystem. XFS can also do "subtree" dumps by file name. With both
types of dumps the user does not have to perform a complete filesystem dump to backup
data. The online inventory of dump/restore actions contains much more information than
the old /etc/dumpdates file. The new online inventory directory /var/xfsdump provides
an extensive review of the dump history displayable on screen. This information will help
administrators to quickly restore filesystems.

4.7.3 Media Handling


XFS dump is designed to use the "end of tape" indication to determine when a new tape
needs to be started. Thus, administrators with large filesystems do not need to worry
about catastrophic "end-of-tape" conditions while dumping data to tapes. In XFS, a dump
may span multiple tapes with multiple dumps per tape. This frees the operator from
struggling to guess the amount of tape required.
In addition, media error handling has been improved:
 XFS dump now relies only on hardware error correction instead of slowing down
the process with extra error handling
 XFS dump possesses error detection and the ability to get past errors and
minimize the data lost during this operation
 If media errors occur during xfsdump, XFS dump allows the user to terminate the
current tape and get another one (resumable dump)
 If a media error occurs during xfsrestore, XFS restore uses efficient
resynchronization to restart or abandon the restore
XFS dump provides an "on-demand" progress report of all media handling operations.
Additionally, XFS restore can restore from tapes in any order, independent of how the
filesystem was dumped.

4.7.4 High Performance


XFS dump/restore provides high performance by:
 Using multi-threaded processing that streams the drives so that they never
"starve" for data and evenly distributes the dump across drives
 Employing very large record sizes (typically 2MB) to reduce I/O operations

4.7.5 Administration
For backup and restore of files less than 2 GB in size, the standard IRIX utilities Backup,
bru, cpio, Restore, and tar may be used. To dump XFS filesystems, the new utility
xfsdump must be used instead of dump. Restoring from these dumps is done using
xfsrestore. Xfsrestore also allows the backup media to be on a remote host.
5 . ARCHITECTURE OF XFS

The last chapter described the structure of XFS in detail; here we discuss the
architecture diagram of XFS and each of its different components. We will also discuss
compatibility with the system and its components.

5.1 Architecture of XFS


The structure of XFS is similar to that of a conventional file system with the addition of
the volume manager between the disk drivers and the file system code and the log
manager. The following sections describe the functionality of each component in the
architectural diagram.

Figure 5.1 XFS Architecture Diagram
The architecture block diagram of XFS (figure 5.1) shows the different components of
XFS; these components are discussed in brief here.
5.1.1 System Call and Vnode Interfaces
The filesystem related calls are implemented at the system call and vnode interface:
read, write, open, ioctl, etc., for all filesystem types. The operations are then vectored
out to different routines for each filesystem type through the vnode interfaces.
The vnode interfaces also allow interoperation with remote clients such as NFS. As
indicated earlier, NFS Version 3.0 provides 64-bit file sharing capabilities for XFS.
System call and vnode operations support Hierarchical Storage Management (HSM)
and backup applications. An industry-wide working group (DMIG, Data Management
Interface Group) designed these interfaces.
5.1.2 Lock Manager
The XFS lock manager implements locking on user files, supporting standard UNIX
file locking calls such as fcntl and flock. The XFS lock manager is similar to the EFS
lock manager with comparable performance.

5.1.3 Namespace Manager


The XFS namespace manager implements filesystem naming operations, translating
path names into file references (i.e., finding files). A file is identified internally
to the filesystem by its inode number. The inode is the on-disk structure that holds all
the information about a file; the inode number is the label (or index) of the inode within
the particular filesystem.

Files are also identified internally by a numeric value unique to the
file, called the file unique id. Filesystems may be identified either by a "magic
cookie", typically a memory address of the root inode, or by a filesystem unique id.
Filesystem unique ids are assigned when the filesystem is created and are associated
uniquely with that filesystem until the filesystem is destroyed. In both cases, the
unique ids help administrators troubleshoot systems by clearly identifying different
files and filesystems.
The namespace manager manages the directory structures and the contents of the
inode that are unrelated to space management (such as file permissions and time
stamps). The namespace manager uses a cache to speed up naming operations. The
details of the name translation are hidden from the callers.

5.1.4 Attribute Manager


The attribute manager implements filesystem attribute operations: storing and
retrieving arbitrary user-defined attributes associated with objects in the namespace.
An attribute is stored internally by attaching it to the inode of the referenced object.
No storage for arbitrary attributes is allocated when an object is created, and any
attributes that exist when an object is destroyed are destroyed as well.
The system backup utility will back up and restore the attributes of an object when
that object is backed up or restored. Standard NFS does not support attributes beyond
the traditional UNIX set, so these attributes are not visible in any way to a client that
is accessing an XFS filesystem via standard NFS. NFS mounted filesystems continue
to operate as if this feature did not exist.

5.1.5 Space Manager


As described earlier, the XFS space manager efficiently allocates the disk space
within a filesystem using extents and B-Tree indices. It is responsible for mapping
files, allocation groups, inodes, and free space.

5.1.6 Log Manager


Also as indicated earlier, all changes to filesystem metadata (inodes, directories,
bitmaps, etc.) are serially logged (journalled) to a separate area of disk space by the
log manager. The log allows fast reconstruction of a filesystem (recovery) to a
consistent state if a crash intervenes before the metadata blocks are written to disk.
There is a separate log space for each filesystem for safety; the underlying volume
manager manages this separation. The log manager utilizes information provided by
the space manager to control the sequencing of write operations from the buffer
cache, since specific log writes must be sequenced before and after data operations
for correctness if there is a crash.
The space and name manager subsystems send logging requests to the log manager.
Each request may fill a partial log block or multiple blocks of the log. The log is
implemented as a circular sequential list, which wraps when writes reach the end.
Each log entry contains a log sequence number, so that the end of the log may be
found by looking for the highest sequence number.
On plexed volumes, the buffer cache is also responsible for inserting log records for
non-metadata blocks, so that the volume manager's write-change log does not need to
be used by the filesystem. This allows the system to keep the plexes of a volume
synchronized with each other in the event of a crash between writes.

5.1.7 Buffer Cache


The buffer cache is a cache of disk blocks for the various filesystems local to a
machine. Reads and writes may be performed from the buffer cache. Cache entries
are flushed when new entries are needed, in an order which takes into account
frequency (or recency) of use and filesystem semantics. Filesystem metadata as well
as file data is stored in the buffer cache. User requests may bypass the cache by
setting flags (O_DIRECT); otherwise all filesystem I/O goes through the cache.
The current buffer cache interfaces are extended from the EFS filesystem in two
ways. First, 64-bit versions of the interfaces are added to support XFS's 64-bit file
sizes. Second, a transaction mechanism is provided. This allows buffer cache clients
to collect and modify buffers during an operation, send the changed buffers to the log
manager, and then release all the buffers after successful logging.

5.1.8 Guaranteed Rate I/O


As detailed earlier, XFS supports digital media applications by providing a
guaranteed rate I/O (GRIO) mechanism. This allows applications to specify "real-
time" guarantees for the rate at which they can read or write a file.
5.1.9 Volume Manager
Also detailed earlier, XLV interposes a layer between the filesystem and the disk
drivers by building logical volumes (also known simply as volumes) on top of the
partition devices. A volume is a faster, more reliable "disk" made from many physical
disks, which allows concatenation, striping, and plexing of data.

5.1.10 Disk Drivers


Disk drivers are the same as in traditional and current IRIX systems, except for 64-bit
compatibility and error management handling for guaranteed rate I/O.

5.2 Compatibility with the System and System Components


XFS runs on all supported SGI machines, which run IRIX 5.3 or higher, except IP4 and
IP6. The filesystem is implemented under the Virtual Filesystem Switch, which is
extended from prior releases. This allows XFS to support the mixing of EFS and XFS
filesystems on the same system. Hence, the administrator could easily move the more
frequently used local filesystems to XFS and leave others, as EFS for later conversion as
needed. XFS provides all the functions available in the current EFS filesystem at a
superior performance level. This includes performance features such as asynchronous I/O,
direct I/O, and synchronous I/O, in addition to normal (buffered) I/O.
The XFS volume manager, XLV, performs all the functions of lv and can run on the same
system as that volume manager, but not on the same volume. This allows a gradual
changeover to the new filesystem and volume manager. Converting from lv logical volumes
to XLV logical volumes is easy: the programs lv_to_xlv and xlv_make convert lv logical
volumes to XLV without having to dump and restore data. Additionally, EFS and
non-filesystem applications can run on top of the new volume manager; ordinary driver
interfaces are presented to these clients.
XFS is available now in the following configurations:
 IRIX 5.3 with XFS - supported on all SGI platforms except IP4, IP6, and
R8000-based machines
 IRIX 6.1 with XFS included - Power Challenge, Power Onyx, and Power Indigo II
only (R8000-based machines); the root and /usr filesystems are factory installed
as XFS filesystems on Power Challenge

XFS may be added to systems running IRIX 5.3 and 6.0.1 by upgrading to IRIX 5.3
with XFS or to IRIX 6.1 (XFS included). XFS is a standard, integrated part of
IRIX 6.1 and beyond.

With NFS version 3, 64-bit filesystems can be exported to other systems that support the
NFS version 3 protocol. Systems that use NFS version 2 may access XFS filesystems, but
with the 32-bit limits imposed by that protocol.

5.3 Advantages of XFS

 64-bit filesystem - support for larger files
 Support for a large number of files
 Journal for fast recovery after a crash
 Fast recovery and space allocation due to the use of B+trees
 Dynamic inode allocation, so there is no fixed limit on the number of file
entries in a filesystem
 Compatible with other filesystems

6. XFS versus OTHER FILESYSTEMS

This chapter describes why XFS is worth adopting, its advantages over other
filesystems, its compatibility with them, and some of its unique features.

6.1 Why XFS?


Why should one switch to XFS/Linux if ReiserFS will be readily available in Red Hat 7.1
and SuSE 7.0? The main factors are trust, robustness, and maturity. XFS has been
deployed on IRIX systems since 1994 and used in a wide array of mission-critical
applications. It is a proven technology, while ReiserFS and ext3fs are relatively new and
do not offer much additional functionality.

6.2 XFS Vs other filesystems


Table 6.1 below compares the size limits of XFS with those of other filesystems.

            Max. filesystem size              Block sizes                    Max. file size
XFS         18 thousand petabytes             512 bytes to 64 KB             9 thousand petabytes
JFS         4 petabytes (512-byte blocks),    512, 1024, 2048, 4096 bytes    512 TB (512-byte blocks),
            32 petabytes (4 KB blocks)                                       4 petabytes (4 KB blocks)
ReiserFS    4 GB of blocks (16 TB)            currently fixed at 4 KB,       4 GB; 2^10 petabytes in
                                              up to 64 KB planned            ReiserFS 3.6.xx
Ext3FS      4 TB                              1 KB - 4 KB                    2 GB

Table 6.1: Comparison between various filesystems on the basis of filesystem, block, and file sizes

            Dynamic inode allocation    Dynamic inode tracking structures    Support for sparse files
XFS         YES                         B+tree                               YES
JFS         YES                         B+tree with inode extents            YES
ReiserFS    YES                         its main B*tree                      YES
Ext3FS      NO                          NA                                   NO

Table 6.2: Comparison between various filesystems on the basis of inode allocation

Table 6.2 shows the support of XFS and other filesystems for dynamic inode allocation,
which removes the limit on the number of file entries in a filesystem.

6.3 Future advancements


SGI is now contributing this technology to the Open Source community and is in the
process of finalizing its port to Linux. Part of this work is already complete; only some
features that are limited to IRIX remain to be implemented for Linux. SGI also aims to
develop CXFS (Clustered XFS), allowing IRIX, Linux, and Windows hosts to share a
common set of disks.

7. XFS support for Linux

This chapter describes how to build a Linux system that runs on top of the SGI XFS
journaling filesystem. It covers preparation for XFS installation, configuring the kernel,
and filesystem migration.

7.1 Preparation for XFS installation


XFS can be used with Linux. Due to the popularity of Linux, many users who want XFS
need it on Linux. SGI is developing XFS for Linux; complete support for all the features
available on IRIX is not yet finished, but usable Linux kernel patches are available for
free download at oss.sgi.com.
The Linux port is still undergoing development and some features are still to be finalized.
For example, loop-mounting a file containing an XFS volume does not yet work without
problems. The X/Open data management API provided on IRIX is still incomplete in the
Linux port, and guaranteed-rate I/O is also an IRIX exclusive so far. Even now, XFS is
more than just an available alternative on Linux.
Currently the only place to get the source code for the XFS-enabled Linux kernel is
straight from SGI's Open Source development site via CVS:
 linux-2.5-xfs: development tree
 linux-2.4-xfs: stable, bug-fix-only tree
Here are the steps to download the kernel source tree:
 Normally the linux kernel source is installed in the /usr/src directory, so you should
start off by switching to that directory.
$ cd /usr/src
 Next, you should set the CVSROOT environment variable so that it points to the
proper CVS server.
 If you are running sh, bash, ksh, etc.:

- 43 -
$ export CVSROOT=':pserver:cvs@oss.sgi.com:/cvs'
 If you are running csh or tcsh:
$ setenv CVSROOT :pserver:cvs@oss.sgi.com:/cvs
 If you plan on updating your kernel often (to keep up with the latest changes), you
might want to put this in your login script. Then log in to the CVS server:
$ cvs login (the password is "cvs")
This needs to be done only ONCE, not every time you access CVS.
 Now grab linux-2.4-xfs. The first time you will want to do something like:
$ cvs -z3 co linux-2.4-xfs
After you have checked the code out, you can use:
$ cvs -z3 update linux-2.4-xfs
to update your copy to the latest version from the CVS server.

7.2 Configuring kernel and installation


7.2.1 Configuring your kernel for XFS support
After downloading the CVS source tree, the actual kernel source will be in
/usr/src/linux-2.4-xfs(-beta)/linux, so you should switch to that directory before running
the make config command of your choice. The main things that must be included in your
kernel to provide XFS support are "Page Buffer support" and "SGI XFS filesystem
support." Both options are available in the "File systems" section of the kernel
configuration. You will need to have "Prompt for development and/or incomplete
code/drivers" selected under "Code maturity level options" for those options to be
available to you. Optionally, you may also want to select "Enable XFS Debug mode" and
"Enable XFS Vnode Tracing" under "SGI XFS filesystem support." These options may
slow your XFS implementation somewhat, but may be useful in tracing the cause of a
crash if one occurs.
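After configuration, the resulting .config should contain entries along the following lines. The exact symbol names are an assumption (they varied between 2.4-xfs snapshots), so verify them against the Config.in files in your tree:

```
CONFIG_EXPERIMENTAL=y        # "Prompt for development and/or incomplete code/drivers"
CONFIG_PAGE_BUF=y            # Page Buffer support
CONFIG_XFS_FS=y              # SGI XFS filesystem support (or =m to build as a module)
# Optional debugging aids (may slow XFS down):
# CONFIG_XFS_DEBUG=y
# CONFIG_XFS_VNODE_TRACING=y
```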

7.2.2 Building the kernel and modules


As with any kernel build, the following commands must be run to actually build the new
kernel:

$ make dep
$ make bzImage
$ make modules

7.2.3 Installing the new kernel and modules


Again this is standard for any kernel installation:
$ make modules_install
$ cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.0-XFS

7.2.4 Add a new entry to your lilo configuration and re-install lilo
$ vi /etc/lilo.conf
Add a new image section to your lilo.conf file similar to the following:
image=/boot/vmlinuz-2.4.0-XFS
    label=xfs
    read-only
    root=/dev/hda2
The "root=" line should match the "root=" line from the existing image sections in your
lilo.conf file. Don't forget to run lilo when you're through editing lilo.conf to make the
changes effective.

7.2.5 Build and install the XFS utilities


There are a number of tools that come with the XFS filesystem for building and
managing XFS filesystems, and these must be built as well. They live in the
/usr/src/linux-2.4-xfs(-beta)/cmd/xfsprogs directory.
Change to that directory:
$ cd ../cmd/xfsprogs
Build and install the xfs utilities:
$ make install

7.3 Filesystem migration

Make partitions as needed. Probably the trickiest part of creating a fully XFS system is
migrating the / filesystem, since that is the filesystem that supports the entire rest of the
system and it cannot be unmounted while the system is running.
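As a rough sketch, migrating a filesystem onto a spare XFS partition might look like the following. The script only prints the commands rather than running them, since they would reformat the target; the device /dev/hda3 and mount point /mnt/xfsroot are hypothetical names, and copying with tar is one common option rather than the only one:

```shell
#!/bin/sh
# Sketch of migrating a filesystem to XFS. /dev/hda3 (the spare target
# partition) and /mnt/xfsroot (the staging mount point) are made-up
# names -- substitute your own. Commands are printed, not executed.
print_migration_steps() {
    cat <<'EOF'
mkfs.xfs -f /dev/hda3                       # create the new XFS filesystem
mkdir -p /mnt/xfsroot
mount -t xfs /dev/hda3 /mnt/xfsroot         # mount it for staging
tar -C / --one-file-system -cf - . | tar -C /mnt/xfsroot -xpf -
vi /etc/lilo.conf                           # point root= at /dev/hda3
lilo                                        # re-install the boot loader
EOF
}
print_migration_steps
```

For the / filesystem itself, the copy has to be done while / is quiescent, for example from a rescue system, with lilo switched over to the new partition before rebooting.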

Bibliography

Websites:

1. http://oss.sgi.com ----- Silicon Graphics
2. http://oss.software.ibm.com ----- IBM
3. http://www.linuxgazette.com ----- Linux Gazette
