XFS Reference
CONTRIBUTORS
Written by Susan Ellis and John Raithel
Illustrated by Gloria Ackley
Production by Gloria Ackley
Engineering contributions by Doug Doucette, Wei Hu, Tom Phelan, and Chuck Bullis
Cover design and illustration by Rob Aguilar, Rikk Carey, Dean Hodgkinson,
Erik Lindholm, and Kay Maitz
Copyright 1994, Silicon Graphics, Inc. All Rights Reserved
This document contains proprietary and confidential information of Silicon
Graphics, Inc. The contents of this document may not be disclosed to third parties,
copied, or duplicated in any form, in whole or in part, without the prior written
permission of Silicon Graphics, Inc.
RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure of the technical data contained in this document by
the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the
Rights in Technical Data and Computer Software clause at DFARS 52.227-7013
and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR
Supplement. Unpublished rights reserved under the Copyright Laws of the United
States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd.,
Mountain View, CA 94043-1389.
Silicon Graphics, the Silicon Graphics logo, and IRIS are registered trademarks, and
IRIX, XFS, Extent File System, Indy, CHALLENGE, IRIS InSight, and REACT are
trademarks of Silicon Graphics, Inc. UNIX is a registered trademark in the United
States and other countries, licensed exclusively through X/Open Company, Ltd.
Network License System and NetLS are trademarks of Apollo Computer, Inc., a
subsidiary of Hewlett-Packard Company. NFS is a registered trademark of Sun
Microsystems. Legato NetWorker is a registered trademark of Legato Systems, Inc.
EXABYTE is a trademark of EXABYTE Corporation.
Contents
List of Examples
List of Figures
List of Tables
About This Guide
Audience
How to Use This Guide
Hardware Requirements
Conventions
Product Support
For More Information
Using xfsrestore
xfsrestore Operations
Simple Restores
Restoring Individual Files
Network Restores
Interactive Restores
Cumulative Restores
Restoring Interrupted Dumps
Interrupted Restores
The housekeeping and orphanage Directories
Dump and Restore With STDIN/STDOUT
Other Backup Utilities and XFS
tar
cpio
bru
System Recovery
Guaranteed-Rate I/O
Guaranteed-Rate I/O Overview
GRIO Guarantee Types
Hard Guarantees
Soft Guarantees
VOD Guarantees
Example: Comparing VOD and Non-VOD
GRIO System Components
Hardware Configuration Requirements for GRIO
Disabling Disk Error Recovery
Configuring the ggd Daemon
Example: Setting Up an XLV Logical Volume for GRIO
GRIO File Formats
/etc/grio_config File Format
/etc/grio_disks File Format
/etc/config/ggd.options File Format
Index
Getting Started With XFS Filesystems describes the XFS filesystem and XLV
Volume Manager. Developed at Silicon Graphics, these IRIX features
provide high-performance alternatives to the Extent File System (EFS) and
logical volume managers previously available with IRIX. This guide was
prepared in conjunction with the initial release of XFS, called IRIX 5.3 with
XFS.
The features described in this guide are included in IRIX system software
releases beginning with the IRIX 5.3 with XFS release. However, to use
several features, you must obtain NetLS licenses by purchasing separate
software options. The features that require NetLS licenses are:
the plexing feature of XLV (Disk Plexing Option)
more than four concurrent guaranteed-rate I/O streams (High Performance
Guaranteed-Rate I/O software options)
This guide covers only system administration of XFS filesystems and XLV
logical volumes (including volumes used for GRIO). See the section For
More Information later in this chapter for information about the
programmatic interface to XFS, which is provided with the IRIS
Development Option (IDO) software option.
Audience
This guide is written for system administrators and other knowledgeable
IRIX users who want to use XFS filesystems and/or XLV logical volumes.
Because many of the procedures in this guide can result in loss of files on the
system if the procedures are not performed correctly, this guide and its
procedures should be used only by people who are
Hardware Requirements
At least 32 MB of memory is recommended for systems with XFS
filesystems.
XFS filesystems and XLV logical volumes are not supported on systems with
IP4 or IP6 CPUs.
Using XLV logical volumes is not recommended on systems with a single
disk.
Some uses of guaranteed-rate I/O, described in Chapter 5,
"Guaranteed-Rate I/O," have special disk configuration requirements.
These requirements are explained in the section "Hardware Configuration
Requirements for GRIO" in Chapter 5.
Conventions
This guide uses these font conventions:
italics
fixed-width type
<Enter>
Product Support
Silicon Graphics offers a comprehensive product support and maintenance
program for its products. For information about using support services for
this product, refer to the Release Notes that accompany it.
Online reference pages for system calls and library routines relevant to
XFS and GRIO are provided in the IRIS Development Option (IDO)
software product. Appendix B provides a complete list of these
reference pages.
For instructions on loading the miniroot, see the Software Installation
Administrator's Guide.
For information on acquiring and installing NetLS licenses that enable the
High Performance Guaranteed-Rate I/O software options, see the Network
License System Administration Guide.
For additional information on the software releases that include the new
features documented in this guide, see the Release Notes for these products:
IRIX
eoe
xfs
plexing
grio
nfs
dev
Chapter 1
This guide provides the information you need to get started using the new
Silicon Graphics filesystem technology:
This chapter highlights the major features of XFS, XLV, and GRIO.
XFS Features
XFS is designed for use on most Silicon Graphics systems, from desktop
systems to supercomputer systems. Its major features include
rapid and reliable recovery after system crashes because of the use of
journaling technology
For backup and restore, the standard IRIX utilities Backup(1), bru(1), cpio(1),
Restore(1), and tar(1) and the optional software product NetWorker for IRIX
can be used for files less than 2 GB in size. To dump XFS filesystems, the new
utility xfsdump(1M) must be used instead of dump(1M). Restoring from these
dumps is done using xfsrestore(1M). See Table 3-1 and Table 3-2 in Chapter 3,
"Dumping and Restoring XFS Filesystems," for more information about the
relationships between xfsdump, xfsrestore, dump, and restore on XFS and EFS
filesystems.
XLV Features
The new XLV Volume Manager provides these advantages when XLV logical
volumes are used as raw devices, when XFS filesystems are created on them,
and when EFS filesystems are created on them:
When XFS filesystems are used on XLV volumes, each logical volume can
contain up to three subvolumes: data (required), log, and real-time. The data
subvolume normally contains user files and filesystem metadata (inodes,
indirect blocks, directories, and free space blocks). The log subvolume is
used for filesystem journal records. If there is no log subvolume, journal
records are placed in the data subvolume. Data with special I/O bandwidth
requirements, such as video, can be placed on the real-time subvolume.
XLV increases system reliability and availability by enabling you to add or
remove a plex, increase the size of (grow) a volume, and replace failed
elements of a plexed volume without taking the volume out of service.
Converting from lv logical volumes to XLV logical volumes is easy. Using the
programs lv_to_xlv(1M) and xlv_make(1M), you can convert lv logical
volumes to XLV without having to dump and restore your data. Converting
from IRIS Volume Manager volumes to XLV is beyond the scope of this
guide.
Note: The plexing feature of XLV is available only when you purchase the
Disk Plexing Option software option. See the plexing Release Notes for
information on purchasing this software option and obtaining the required
NetLS license.
GRIO Features
The guaranteed-rate I/O system (GRIO) allows applications to reserve
specific I/O bandwidth to and from the filesystem. Applications request
guarantees by providing a file descriptor, data rate, duration, and start time.
The filesystem calculates the performance available and, if the request is
granted, guarantees that the requested level of performance can be met for a
given time. This frees programmers from having to predict the performance
and is critical for media delivery systems such as video-on-demand.
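To make the request parameters concrete, the following Python sketch models a reservation as a data rate, a start time, and a duration, and grants it only if the total reserved rate stays within a fixed capacity for the whole requested interval. It is a toy model under stated assumptions: the class names, the single capacity figure, and the admission rule are hypothetical illustrations, not the GRIO programming interface or the calculation the filesystem actually performs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reservation:
    """Hypothetical stand-in for a rate guarantee request."""
    rate: int       # requested bytes per second
    start: int      # start time, in seconds
    duration: int   # length of the guarantee, in seconds

    @property
    def end(self) -> int:
        return self.start + self.duration


@dataclass
class ToyRateScheduler:
    """Grants reservations while the summed rate fits within capacity."""
    capacity: int                                # assumed total bytes per second
    granted: list = field(default_factory=list)  # reservations already granted

    def peak_rate(self, start: int, end: int) -> int:
        """Highest total reserved rate at any moment in [start, end)."""
        # The total only rises at reservation start times, so checking those
        # points (plus the start of the window) is enough to find the peak.
        points = {start} | {r.start for r in self.granted if start <= r.start < end}
        return max(
            sum(r.rate for r in self.granted if r.start <= t < r.end)
            for t in points
        )

    def request(self, res: Reservation) -> bool:
        """Grant the reservation only if capacity is never exceeded."""
        if self.peak_rate(res.start, res.end) + res.rate > self.capacity:
            return False
        self.granted.append(res)
        return True
```

In this model a request that overlaps existing guarantees is refused as soon as the combined rate would exceed the capacity at any point in its interval, which is the behavior the paragraph above describes in general terms.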
Guarantees can be hard or soft, a way of expressing the trade-off between
reliability and performance. Hard guarantees deliver the requested
performance, but with some possibility of error in the data (due to the
requirements for turning off disk drive self-diagnostics and error-correction
firmware). Soft guarantees allow the disk drive to retry operations in the
event of an error, but this can possibly result in missing the rate guarantee.
Hard guarantees place greater restrictions on the system hardware
configuration.
Note: By default, IRIX supports four GRIO streams (concurrent uses of
GRIO). To increase the number of streams to 40, you can purchase the High
Performance Guaranteed-Rate I/O 5-40 Streams software option. For more
streams, you can purchase the High Performance Guaranteed-Rate I/O
Unlimited Streams software option. See the grio Release Notes for information
on purchasing these software options and obtaining the required NetLS
licenses.
Chapter 2
There is insufficient free disk space (see the section "Checking for
Adequate Free Disk Space" in this chapter).
The filesystem is the root filesystem and you intend to increase its size
later. In this case, delay the conversion until you are ready to increase
the size of the filesystem.
The filesystems are the root and, if present, usr filesystems and you
want to continue using the System Recovery procedure (item 4,
Recover System, on the System Maintenance Menu). System Recovery
doesn't work with XFS filesystems because of the limitations of
bru(1M), which is used by System Recovery. However, xfsdump(1M) can
be used to create backups that can be used to recover the system, if
necessary.
Prerequisite Software
Using XFS filesystems and XLV logical volumes requires at least IRIX 5.3
with XFS or a later system software release. The procedures in this chapter
assume that the proper software has been installed and the system rebooted
prior to beginning the procedure.
Some important subsystems in the IRIX 5.3 with XFS and later releases are:
eoe1.sw.unix
eoe2.sw.efs
eoe2.sw.lv
eoe2.sw.xfs
eoe2.sw.xlv
If you are converting the root and usr filesystems, you must have software
distribution CDs or access to a remote distribution directory for IRIX Release
5.3 with XFS or a later system software release. Instructions on loading the
miniroot from these CDs are provided in Chapter 3 of the Software Installation
Administrator's Guide.
For root filesystems on systems with separate root and usr filesystems,
the recommended block size is 512 bytes. (Root filesystems in this
configuration usually don't have much extra disk space, and large block
sizes compound the problem.)
Block sizes are specified in bytes in decimal (default), octal (prefixed by 0),
or hexadecimal (prefixed by 0x or 0X). If the number has the suffix k, it is
multiplied by 1024. If the number has the suffix m, it is multiplied by
1048576 (1024 * 1024).
For real-time subvolumes of XLV logical volumes, the block size is the same
as the block size of the data subvolume. The guidelines for the extent size
are:
The extent size must be a multiple of the block size of the data
subvolume.
The extent size should be matched to the application and the stripe unit
of the volume elements used in the real-time subvolume.
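The two extent size guidelines can be expressed as a small check. This is a minimal Python sketch, not part of any XFS tool: the function name is hypothetical, and treating "matched to the stripe unit" as "a whole multiple of the stripe unit" is an assumption made for illustration.

```python
from typing import List, Optional


def check_extent_size(extent_size: int, block_size: int,
                      stripe_unit: Optional[int] = None) -> List[str]:
    """Return a list of problems with a proposed real-time extent size (bytes)."""
    if extent_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    problems = []
    # Required: the extent size is a multiple of the data subvolume block size.
    if extent_size % block_size != 0:
        problems.append("extent size is not a multiple of the block size")
    # Assumed interpretation of "matched": a whole multiple of the stripe unit.
    if stripe_unit is not None and extent_size % stripe_unit != 0:
        problems.append("extent size is not a multiple of the stripe unit")
    return problems
```

For example, a 128K-byte extent size with the default 4096-byte block size produces an empty list, because 131072 is a multiple of 4096.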
internal
If you want the log and the data subvolume to be on different partitions
or to use different subvolume configurations for them, use an external
log.
If you are making the XFS filesystem on a disk partition (rather than on
an XLV logical volume), you must use an internal log.
If you are making the XFS filesystem on an XLV logical volume that has
no log subvolume, you must use an internal log.
If you are making the XFS filesystem on an XLV logical volume that has
a log subvolume, you must use an external log.
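The three "must" rules above reduce to a two-question decision. The following Python sketch restates them; the function and parameter names are hypothetical and exist only for illustration.

```python
def required_log_type(on_xlv_volume: bool, has_log_subvolume: bool = False) -> str:
    """Return the log type that the rules above require for a new XFS filesystem."""
    # A plain disk partition has nowhere else to put the log.
    if not on_xlv_volume:
        return "internal"
    # On an XLV logical volume, the presence of a log subvolume decides.
    return "external" if has_log_subvolume else "internal"
```

The first rule in the list is a matter of preference rather than a requirement: it explains when you would choose to create a log subvolume in the first place.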
For more information about XLV and log subvolumes, see Chapter 4, "XLV
Logical Volumes."
The amount of disk space needed for the log is a function of how the
filesystem is used. The amount of disk space required for log records is
proportional to the transaction rate and the size of transactions on the
filesystem, not the size of the filesystem. Larger block sizes result in larger
transactions. Transactions from directory updates (for example, mkdir(1),
rmdir(1), create(2), and unlink(2)) cause more log data to be generated. You
must choose the amount of disk space to dedicate to the log (called the log
size).
The minimum log size is 512 blocks. Some guidelines for log sizes are shown
in Table 2-1.
Table 2-1
Log Size    Blocks         Transaction Activity
Small       512 blocks
Medium      2000 blocks    Average
Large       4000 blocks    Very high
For external logs, the size of the log is the same as the size of the log
subvolume. The log subvolume is one or more disk partitions. You may find
that you need to re-partition a disk to create a properly sized log subvolume
(see the section Disk Partitioning in this chapter). For external logs, the
size of the log is set when you create the log subvolume with xlv_make(1M).
For internal logs, the size of the log is specified when you create the
filesystem with mkfs(1M).
The log size is specified in bytes or as a multiple of the filesystem block size.
Decimal numbers are the default, but they can be specified in octal (prefixed
by 0) or hexadecimal (prefixed by 0x or 0X). Numbers with no suffixes are
bytes. If the number has the suffix k, it is multiplied by 1024 bytes. If the
number has the suffix m, it is multiplied by 1048576 (1024 * 1024) bytes or
one megabyte. If the number has the suffix b, it is multiplied by the
filesystem block size.
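The size notation described above can be illustrated with a short Python sketch. It is not the parser used by mkfs: the function name is hypothetical, and treating a trailing b on a hexadecimal number as a digit rather than a suffix is an assumption. The same prefixes and the k and m suffixes apply to block sizes; the b suffix is described only for log sizes.

```python
def parse_size(text: str, block_size: int = 4096) -> int:
    """Convert a size such as '512', '0x200', '10m', or '1000b' to bytes."""
    multipliers = {"k": 1024, "m": 1024 * 1024, "b": block_size}
    text = text.strip()
    is_hex = text.lower().startswith("0x")
    multiplier = 1
    suffix = text[-1:]
    # Assumption: in a hexadecimal number a trailing 'b' is a digit, not a suffix.
    if suffix in multipliers and not (is_hex and suffix == "b"):
        multiplier = multipliers[suffix]
        text = text[:-1]
    if is_hex:
        value = int(text, 16)   # prefixed by 0x or 0X
    elif len(text) > 1 and text.startswith("0"):
        value = int(text, 8)    # prefixed by 0
    else:
        value = int(text, 10)   # decimal is the default
    return value * multiplier
```

With a 512-byte block size, `parse_size("1000b", block_size=512)` gives 512000 bytes, which is 500 KB.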
Get the size in kilobytes of the filesystem to be converted and round the
result to the next megabyte. For example:
Type  kbytes  use     avail   %use  Mounted on
efs   969857  648451  321406  67%   /
If you plan to use an internal log (see the section Choosing the Log
Type and Size in this chapter), give this command to get an estimate of
the disk space required for the files in the filesystem after conversion:
xfs_estimate -i logsize -b blocksize mountpoint
logsize is the size of the log. blocksize is the block size you chose for user
files in the section Choosing Block Sizes in this chapter. mountpoint is
the directory that is the mount point for the filesystem.
The output of this command tells you how much disk space the files in
the filesystem and an internal log of size logsize will take after
conversion to XFS.
3.
If you plan to use an external log, give this command to get an estimate
of the disk space required for the files in the filesystem after conversion:
xfs_estimate -e 0 -b blocksize mountpoint
blocksize is the block size you chose for user files in the section
Choosing Block Sizes in this chapter. mountpoint is the directory that
is the mount point for the filesystem.
The first line of output from xfs_estimate tells you how much disk space
the files in the filesystem will take after conversion to XFS. In addition
to this, you will need disk space on a different disk partition for the
external log. You should ignore the second line of output.
4.
Compare the size of the filesystem from step 1 with the size of the files
from step 2 or step 3. For example,
970 MB - 739 MB = 231 MB free disk space
739 MB / 970 MB = 76.2% full
Repartition the disk to increase the size of the disk partition for the
filesystem.
If there isn't sufficient disk space in the root filesystem and you have
separate root and usr filesystems, switch to combined root and usr
filesystems on a single disk partition.
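The comparison in step 4 is simple arithmetic. As a minimal Python sketch (the function name is hypothetical), using the figures from the example:

```python
def conversion_headroom(filesystem_mb: float, estimated_mb: float):
    """Return (free space in MB, percent full) after conversion to XFS."""
    free_mb = filesystem_mb - estimated_mb
    percent_full = 100.0 * estimated_mb / filesystem_mb
    return free_mb, round(percent_full, 1)


# Figures from step 4: a 970 MB filesystem and an estimate of 739 MB.
print(conversion_headroom(970, 739))
```

Here filesystem_mb is the size from step 1 and estimated_mb is the xfs_estimate result from step 2 or step 3.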
Disk Partitioning
Many system administrators may find that they want or need to repartition
disks when they switch to XFS filesystems and/or XLV logical volumes. The
next two subsections explain why you might want to repartition and give
some tips on partition sizes and types.
Why Should Disks Be Repartitioned?
If the system disk has separate partitions for root and usr, the root
partition may be running out of space. Repartitioning is a way to
increase the space in root (at the expense of the size of usr) or to solve
the problem by combining root and usr into a single partition.
If you plan to use XLV logical volumes, you may want to put the XFS
log into a small subvolume. This requires disk repartitioning to create a
small partition for the log subvolume.
If you plan to use XLV logical volumes, you may want to repartition to
create disk partitions of equal size that can be striped or plexed.
If you are repartitioning the system disk, you must use the standalone
version of fx. Otherwise, you can use the IRIX version of fx. Using the
expert mode of fx (the -x option) shouldn't be necessary.
If you repartition a system disk, remember that the swap space should
never be less than 40 MB. A smaller swap space impacts system
performance and makes it impossible to install software using the
miniroot.
New partition types have been added to fx. Table 2-2 lists and describes
all partition types.
Table 2-2
Disk Partition Type    Description
efs
lvol
raw
xfs
xfslog
xlv
volhdr                 Volume header
volume                 Entire volume
If you are converting filesystems on a system disk, the tape drive must
be local.
If you are converting filesystems on option disks, the tape drive can be
local or remote.
The filesystem that will contain the dump must have sufficient disk
space available to hold the filesystems to be converted.
The disk partition is part of an XLV logical volume that doesnt have a
log subvolume (the log is internal).
1.
2.
Identify the device name of the partition, partition, where you plan to
create the filesystem. For example, if you plan to use partition 7 (the
entire disk) of a SCSI option disk on controller 0, unit 2, partition is
/dev/dsk/dks0d2s7. For more information on determining partition
(also known as a special file), see dks(7M) for SCSI disks and ipi(7M) for
Xylogics IPI disks.
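The naming pattern in this step can be written out as a small helper. This Python sketch only formats the name for SCSI (dks) disks as shown in the example; the function name is hypothetical, and dks(7M) remains the authority on device names.

```python
def scsi_partition_device(controller: int, unit: int, partition: int,
                          raw: bool = False) -> str:
    """Build a SCSI disk partition device name from its three numbers."""
    # Block devices live in /dev/dsk; their raw counterparts live in /dev/rdsk.
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return f"{directory}/dks{controller}d{unit}s{partition}"


# Partition 7 of the SCSI disk on controller 0, unit 2.
print(scsi_partition_device(0, 2, 7))
```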
3.
Any data that is on the disk partition will be destroyed (to convert the
data rather than destroy it, use the procedure in the section
Converting a Filesystem on an Option Disk from EFS to XFS in this
chapter instead).
4.
blocksize is the filesystem block size (see the section Choosing Block
Sizes in this chapter) and logsize is the size of the area dedicated to log
records (see the section Choosing the Log Type and Size in this
chapter).
Example 2-1 shows the command line used to create an XFS filesystem
and the system output. The filesystem has a 10 MB internal log and a
block size of 1K bytes and is on the partition /dev/dsk/dks0d2s7.
Example 2-1
5.
6.
7.
2.
blocksize is the block size for the filesystem (see "Choosing Block Sizes" in
this chapter), and volume is the device name for the volume.
Example 2-2 shows the command line used to create an XFS filesystem
with a block size of 1K bytes on the logical volume /dev/dsk/xlv/a, followed
by the system output.
Example 2-2
mkfs -b size=1k /dev/dsk/xlv/a
meta-data=/dev/dsk/xlv/a
data     =
log      =volume log
realtime =none
Example 2-3 shows the command line used to create an XFS filesystem
on a logical volume /dev/dsk/xlv/xlv_data1 and the system output. The
default block size of 4096 bytes is used and the real-time extent size is
set to 128K bytes.
Example 2-3
3.
4.
5.
During this procedure, you can repartition the system disk if needed. For
example, you can convert from separate root and usr filesystems to a single,
combined filesystem, or you can resize partitions to make the root partition
larger and the usr partition smaller. See the section Disk Partitioning in
this chapter for more information.
The early steps of this procedure ask you to identify the values of various
variables, which are used later in the procedure. You may find it helpful to
make a list of the variables and values for later reference. Be sure to perform
only the steps that apply to your situation. Perform all steps as superuser.
Note: It is very important to follow this procedure as documented without
2.
3.
Use prtvtoc(1M) to get the device name of the root disk partition,
rootpartition. For example:
# prtvtoc
Printing label for root disk
* /dev/rdsk/dks0d1s0 (bootfile "/unix")
...
The bootfile line contains the raw device name of the root disk
partition, which is /dev/rdsk/dks0d1s0 in this example. rootpartition is
the non-raw device name, which is /dev/dsk/dks0d1s0 in this
example.
4.
If the system disk has separate root and usr filesystems, use the output
of prtvtoc in the previous step to figure out the device name of the usr
partition, usrpartition. Look for the line that shows a mount directory of
/usr:
Partition  Type  Fs   Start: sec  (cyl)   Size: sec  (cyl)   Mount Directory
...
6          efs   yes  116725      ( 203)  727950     (1266)  /usr
The usr partition number is shown in the first column of this line; it is 6
in this example. To determine the value of usrpartition, replace the final
digit in rootpartition with the usr partition number. For this example,
usrpartition is /dev/dsk/dks0d1s6.
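The two name manipulations in steps 3 and 4 (raw name to non-raw name, and root partition to usr partition) can be sketched in Python as follows. The function names are hypothetical; the sketch assumes device names of the form shown in the examples.

```python
import re


def block_device_from_raw(raw_device: str) -> str:
    """Turn a raw device name (/dev/rdsk/...) into the non-raw name (/dev/dsk/...)."""
    prefix = "/dev/rdsk/"
    if not raw_device.startswith(prefix):
        raise ValueError(f"not a raw disk device name: {raw_device}")
    return "/dev/dsk/" + raw_device[len(prefix):]


def partition_on_same_disk(device: str, partition_number: int) -> str:
    """Replace the trailing partition number of a device name."""
    match = re.fullmatch(r"(.*s)\d+", device)
    if match is None:
        raise ValueError(f"no partition number at the end of: {device}")
    return f"{match.group(1)}{partition_number}"


rootpartition = block_device_from_raw("/dev/rdsk/dks0d1s0")
usrpartition = partition_on_same_disk(rootpartition, 6)
print(rootpartition, usrpartition)
```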
5.
If you are using a tape drive as the backup device, use hinv(1M) to get
the controller and unit numbers (<tapecntlr> and <tapeunit>) of the tape
drive. For example:
# hinv -c tape
Tape drive: unit 2 on SCSI controller 0: DAT
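The controller and unit numbers feed directly into the tape device names used later in this procedure. The Python sketch below extracts them from a line of hinv output and builds a name of the form /dev/rmt/tps<tapecntlr>d<tapeunit> plus a suffix. The function name is hypothetical, and the pattern assumes the output looks like the sample line above.

```python
import re


def tape_device_from_hinv(line: str, suffix: str = "nsv") -> str:
    """Build a tape device name from a 'hinv -c tape' output line."""
    match = re.search(r"unit (\d+) on SCSI controller (\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized hinv line: {line}")
    tapeunit, tapecntlr = match.groups()
    return f"/dev/rmt/tps{tapecntlr}d{tapeunit}{suffix}"


print(tape_device_from_hinv("Tape drive: unit 2 on SCSI controller 0: DAT"))
```

Passing an empty suffix gives the plain name used with mt, for example /dev/rmt/tps0d2.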
6.
If you are using a disk drive as your backup device, use df(1) to get the
device name, backupdevice, and mount point, backupfs, of the partition
that contains the filesystem where you plan to put the backup. For
example:
# df
Filesystem          Type  blocks   use      avail    %use  Mounted on
/dev/root           efs   1992630  538378   1454252  27%   /
/dev/dsk/dks0d3s7   efs   3826812  1559740  2267072  41%   /d3
/dev/dsk/dks0d2s7   efs   2004550  23       2004527  0%    /d2
The filesystem mounted at /d2 has plenty of disk space for a backup of
the system disk (/ uses 538,378 blocks and /d2 has 2,004,527 blocks
available). The backupdevice for /d2 is /dev/dsk/dks0d2s7 and the
backupfs is /d2.
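The space comparison in this step can be sketched in Python. The sample text reproduces the df output from the example; the function names are hypothetical, and comparing blocks in use against blocks available is only the rough check the text describes, not an exact prediction of the dump size.

```python
DF_OUTPUT = """\
Filesystem          Type  blocks   use      avail    %use  Mounted on
/dev/root           efs   1992630  538378   1454252  27%   /
/dev/dsk/dks0d3s7   efs   3826812  1559740  2267072  41%   /d3
/dev/dsk/dks0d2s7   efs   2004550  23       2004527  0%    /d2
"""


def parse_df(output: str) -> dict:
    """Map each mount point to its device name and block counts."""
    rows = {}
    for line in output.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) != 7:
            continue                          # ignore anything unexpected
        device, _fstype, _blocks, used, avail, _percent, mount = fields
        rows[mount] = {"device": device, "used": int(used), "avail": int(avail)}
    return rows


def backup_fits(rows: dict, source_mount: str, backup_mount: str) -> bool:
    """True if the blocks in use on the source fit in the backup filesystem."""
    return rows[source_mount]["used"] <= rows[backup_mount]["avail"]


rows = parse_df(DF_OUTPUT)
print(backup_fits(rows, "/", "/d2"), rows["/d2"]["device"])
```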
7.
Replace efs with xfs in the line for the root filesystem, /, if there
is a line for root.
If root and usr are separate filesystems and will remain so, replace
efs with xfs in the line for the usr filesystem.
If root and usr have been separate filesystems, but the disk will be
repartitioned during the conversion procedure so that they are
combined, remove the line for the usr filesystem.
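The three cases above amount to editing the filesystem type field of selected /etc/fstab entries and, when root and usr are being combined, dropping the usr entry. The following Python sketch shows the idea; the function name and the sample entries are hypothetical, and rewritten lines are rejoined with single spaces, so the original spacing of those lines is not preserved.

```python
SAMPLE_FSTAB = (
    "/dev/dsk/dks0d1s0 / efs rw,raw=/dev/rdsk/dks0d1s0 0 0\n"
    "/dev/dsk/dks0d1s6 /usr efs rw,raw=/dev/rdsk/dks0d1s6 0 0\n"
)


def convert_fstab(text: str, to_xfs, remove=()) -> str:
    """Change efs to xfs for the given mount points and drop removed ones."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        # An entry has at least: device, mount directory, filesystem type.
        is_entry = len(fields) >= 3 and not fields[0].startswith("#")
        if is_entry and fields[1] in remove:
            continue
        if is_entry and fields[1] in to_xfs and fields[2] == "efs":
            fields[2] = "xfs"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Separate root and usr that stay separate: convert both entries.
print(convert_fstab(SAMPLE_FSTAB, to_xfs={"/", "/usr"}), end="")
# Root and usr combined during conversion: convert root, remove usr.
print(convert_fstab(SAMPLE_FSTAB, to_xfs={"/"}, remove={"/usr"}), end="")
```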
8.
9.
11. Create a full backup of the root filesystem by giving this command:
# dump 0uCf tapesize dumpdevice rootpartition
tapesize is the tape capacity (it's used for backup to disks, too) and
dumpdevice is the appropriate device name for the tape drive or the
name of the file that will contain the dump image. Table 2-3 gives the
values of tapesize and dumpdevice for different tape drives and disk. The
dump(1M) reference page is included in Appendix B.
Table 2-3
Backup Device                  tapesize   dumpdevice
Disk                           2m
DAT tape                       2m         /dev/rmt/tps<tapecntlr>d<tapeunit>nsv
DLT tape                       10m        /dev/rmt/tps<tapecntlr>d<tapeunit>nsv
EXABYTE 8mm model 8200 tape    2m         /dev/rmt/tps<tapecntlr>d<tapeunit>nsv
EXABYTE 8mm model 8500 tape    4m         /dev/rmt/tps<tapecntlr>d<tapeunit>nsv
                               150k       /dev/rmt/tps<tapecntlr>d<tapeunit>ns
12. If usr is a separate filesystem, insert a new tape (if you are using tape)
and create a full backup of the usr filesystem by giving this command:
# dump 0uCf tapesize dumpdevice usrpartition
14. To repartition the system disk, use the standalone version of fx(1M).
This version of fx is invoked from the Command Monitor, so you must
bring up the Command Monitor. To do this, quit out of inst, reboot the
system, shut down the system, then request the Command Monitor. An
example of this procedure is:
# exit
...
Inst> quit
...
Ready to restart the system. Restart? { (y)es, (n)o, (sh)ell, (h)elp }: yes
...
login: root
# halt
...
System Maintenance Menu
...
Option? 5
Command Monitor. Type "exit" to return to the menu.
>>
16. Load the miniroot again, using the same procedure you used in step 9.
17. Switch to the shell prompt in inst:
Inst> sh
blocksize is the filesystem block size (see the section Choosing Block
Sizes in this chapter) and logsize is the size of the area dedicated to log
records (see the section Choosing the Log Type and Size in this
chapter).
Example 2-4 shows an example of this command for a root filesystem
and the command output. The filesystem is made on /dev/dsk/dks0d1s0
with a block size of 512 bytes and a log size of 500 KB (1000 blocks * 512
bytes/block).
Example 2-4
# mkfs -d name=/dev/dsk/dks0d1s0
meta-data=/dev/dsk/dks0d1s0
data     =
log      =internal log
realtime =none
20. If you have a separate usr filesystem, give this command to make it an
XFS filesystem:
# mkfs -d name=usrpartition -b size=usrblocksize -l internal,size=usrlogsize
usrblocksize and usrlogsize are the block size and log size you've chosen
for the usr filesystem.
21. Mount the root filesystem with this command:
# mount rootpartition /root
22. If you have a separate usr filesystem, create the /usr mount point
directory and mount the filesystem with these commands:
# mkdir /root/usr
# mount usrpartition /root/usr
23. If you made the backup on disk, create a mount point for the filesystem
that contains the backup and mount it:
# mkdir /backupfs
# mount backupdevice /backupfs
24. If you made the backup on tape, restore all files on the root filesystem
from the backup you made in step 11 by putting the correct tape in the
tape drive and giving these commands:
# cd /root
# mt -t /dev/rmt/tps<tapecntlr>d<tapeunit> rewind
# restore rf dumpdevice
You may need to be patient while the restore is taking place; it normally
doesn't generate any output and it can take a while. The restore(1M)
reference page is included in Appendix B.
25. If you made the backup on disk, restore all files on the root filesystem
from the backup you made in step 11 by giving these commands:
# cd /root
# restore rf /backupfs/root.dump
26. If you made a backup of the usr filesystem in step 12 on tape, restore all
files in the backup by putting the correct tape in the tape drive and
giving these commands:
# cd /root/usr
# mt -t /dev/rmt/tps<tapecntlr>d<tapeunit> rewind
# restore rf dumpdevice
27. If you made a backup of the usr filesystem in step 12 on disk, restore all
files in the backup by giving these commands:
# cd /root/usr
# restore rf /backupfs/usr.dump
28. Move the new version of /etc/fstab that you created in step 7 into place:
# mv /root/etc/fstab.xfs /root/etc/fstab
29. Exit from the shell and inst and restart the system:
# exit
#
Calculating sizes .. 100% Done.
Inst> quit
...
Ready to restart the system. Restart? { (y)es, (n)o, (sh)ell, (h)elp }: yes
Preparing to restart system ...
The system is being restarted.
2.
3.
Identify the device name of the partition, partition, where you plan to
create the filesystem. For example, if you plan to use partition 7 (the
entire disk) of a SCSI option disk on controller 0, unit 2, partition is
/dev/dsk/dks0d2s7. For more information on determining partition
(also known as a special file), see dks(7M) for SCSI disks and ipi(7M) for
Xylogics IPI disks.
4.
Back up all files on the disk partition to tape or disk because they will
be destroyed by the conversion process. You can use any backup utility
(Backup, bru, cpio, tar, and so on) and back up to a local or remote tape
drive or a local or remote disk. For example, the command for dump for
local tape is:
# dump 0uCf tapesize dumpdevice partition
tapesize is the tape capacity (it's used for backup to disks, too) and
dumpdevice is the device name for the tape drive. Table 2-3 gives the
values of tapesize and dumpdevice for different local tape drives and disk.
You can get the values of <tapecntlr> and <tapeunit> used in the table
from the command hinv -c tape. The dump(1M) reference page is
included in Appendix B.
5.
6.
blocksize is the filesystem block size (see the section Choosing Block
Sizes in this chapter) and logsize is the size of the area dedicated to log
records (see the section Choosing the Log Type and Size in this
chapter). Example 2-1 shows an example of this command line and its
output.
7.
8.
In the file /etc/fstab, in the entry for partition, replace efs with xfs. For
example:
partition mountdir xfs rw,raw=rawpartition 0 0
Restore the files to the filesystem from the backup you made in step 4.
For example, if you gave the dump command in step 4, the commands
to restore the files from tape are:
# cd mountdir
# mt -t device rewind
# restore rf dumpdevice
xlvvolume is the device file for the logical volume, for example
/dev/dsk/xlv/xlv0.
lvvolume is the device file for the logical volume, for example
/dev/dsk/lv0.
Unlike fsck, xfs_check does not repair any reported filesystem consistency
problems; it only reports them. If xfs_check reports a filesystem consistency
problem:
2.
3.
4.
Chapter 3
This chapter describes how the xfsdump and xfsrestore utilities work and how
to use them to back up and recover data on XFS filesystems. (The
xfsdump(1M) and xfsrestore(1M) reference pages provide online information
on these utilities.) A short section at the end of this chapter, "Other Backup
Utilities and XFS," discusses XFS-related issues of other utilities that can be used to
perform backups.
This chapter contains the following sections:
Using xfsdump
Using xfsrestore
Table 3-1 and Table 3-2 summarize when to use xfsdump and xfsrestore and
when their EFS counterparts, dump(1M) and restore(1M), must be used.
Table 3-1
                Dump It Using
EFS             dump
XFS             xfsdump
Table 3-2
                Restore It Using    On a Filesystem of Type
dump            restore             EFS or XFS
xfsdump         xfsrestore          EFS or XFS
Note that you can restore data in either EFS or XFS filesystems, but must use
the restore utility that corresponds with the dump utility used to make the
backup. The xfsdump and xfsrestore utilities are only available with the XFS
filesystem.
With xfsdump and xfsrestore, you can back up and restore data using
local or remote drives. Multiple dumps can be placed on a single media
object.
xfsdump and xfsrestore support incremental dumps. Also, you can back
up filesystems, directories, and/or individual files, and then restore
filesystems, directories, and files independent of how they were backed
up.
With xfsrestore, you can restore xfsdump data onto EFS filesystems.
(xfsdump backs up mounted XFS filesystems only.)
Integration
xfsdump does not affect the state of the filesystem being dumped (for
example, access times are retained), and xfsrestore restores files as close
to the original as possible.
xfsrestore detects and bypasses media errors and recovers rapidly after
encountering them.
User Interface
xfsdump optionally prompts for additional media when the end of the
current media is reached. Operator estimates of media capacity are not
required. xfsdump also supports automated backups.
Media Layout
The following section introduces some terminology and then describes the
way xfsdump formats data on the storage media for use by xfsrestore.
Terminology
a stream terminator
The data segment(s) contains the actual data, the dump inventory contains
a list of the dump objects in the dump, and the stream terminator marks the
end of the dump stream. When a dump stream is composed of multiple
dump objects, each object is contained in a media file. Some output devices,
for example standard output, do not support the concept of media files; in
that case the dump stream is only the data.
The simplest dump, for example the dump of a small amount of data to a
single tape, produces a data segment and a stream terminator as the only
dump objects. If the optional inventory object is added, you have a dump
such as that illustrated in Figure 3-1. (In the data layout diagrams in this
section, the optional inventory object is always included.)
Figure 3-1  [figure: a single dump on one media object; the media files hold the data, the inventory, and the terminator]
You can also dump data streams that are larger than a single media object.
The data stream can be broken between any two media files including data
segment boundaries. (The inventory is never broken into segments.) The
xfsdump utility prompts for a new media object when the end of the current
media object is reached. Figure 3-2 illustrates the data layout of a single
dump session that requires two media objects.
Figure 3-2  [figure: a single dump session on two media objects; data segments are spread across media object 1 and media object 2, followed by the inventory and the terminator]
Figure 3-3  [figure: a first dump and a second dump on the same media object; each dump consists of data segments, an inventory, and a terminator]
For drives that do not permit termination to operate in this way, other means are used
to achieve the same effective result.
Figure 3-4 illustrates a case in which multiple dumps use multiple media
objects. If media files already exist on the additional media object(s), the
xfsdump utility finds the existing stream terminator, erases it, and begins
writing the new dump data stream.
Figure 3-4  [figure: multiple dumps on multiple media objects; the first dump (data segments, inventory, terminator) is on media object 1, and the second dump (data segments, inventory, terminator) begins on media object 1 and continues on media object 2]
Using xfsdump
This section discusses how to use the xfsdump command to back up data to
local and remote devices. You can get a summary of xfsdump syntax with the
-h option:
# xfsdump -h
xfsdump: version X.X
xfsdump: usage: xfsdump [ -f <destination> ]
[ -h (help) ]
[ -l <level> ]
[ -s <subtree> ... ]
[ -v <verbosity {silent, verbose, trace}> ]
[ -F (don't prompt) ]
[ -I (display dump inventory) ]
[ -J (inhibit inventory update) ]
[ -L <session label> ]
[ -M <media label> ]
[ -R (resume) ]
[ - (stdout) ]
<source (mntpnt|device)>
Specifying Media
You can use xfsdump to back up data to various media. For example, you can
dump data to a tape or hard disk. The drive containing the media object may
be connected to the local system or accessible over the network.
Backing Up to a Local Tape Drive
In this case, a session label (-L option) and a media label (-M option) are
supplied, and the entire filesystem is dumped. Since no verbosity option is
supplied, the default of verbose is used, resulting in the detailed screen
output. The dump inventory is updated with the record of this backup
because the -J option is not specified.
Following is an example of a backup of a subdirectory of a filesystem. In this
example, the verbosity is set to silent, and the dump inventory is not updated
(-J option):
# xfsdump -f /dev/tape -v silent -J -s people/fred /usr
To back up data to a remote tape drive, use the standard remote system
syntax, specifying the system (by hostname if supported by a nameserver or
IP address if not) followed by a colon (:), then the pathname of the special
file.
Note: For remote backups, use the variable block size tape device if the
device supports variable block size operation, otherwise use the fixed block
size device (see intro(7)).
In this case, /usr/people/fred is backed up to the variable block size tape device
on the remote system theduke.
Note: The superuser account on the local system must be able to rsh to the
You can back up data to a file instead of a device. In the following example,
a file (Makefile) and a directory (Source) are backed up to a dump file
(monday_backup) in /usr/tmp on the local system:
# xfsdump -f /usr/tmp/monday_backup -v silent -J -s \
people/fred/Makefile -s people/fred/Source /usr
You may also dump to a file on a remote system, but note that the file must
be in the remote system's /dev directory. For example, the following
command backs up the /usr/people/fred subdirectory on the local system to
the regular file /dev/fred_mon_12-2 on the remote system theduke:
# xfsdump -f theduke:/dev/fred_mon_12-2 -s people/fred /usr
Reusing Tapes
When you use a new tape as the media object of a dump session, xfsdump
begins writing dump data at the beginning of the tape without prompting.
If the tape already has dump data on it, xfsdump begins writing data after the
last dump stream, again without prompting.
If, however, the tape contains data that is not from a dump session, xfsdump
prompts you before continuing:
# xfsdump -f /dev/tape /test
xfsdump: version X.X - type ^C for status and control
xfsdump: dump date: Fri Dec 2 11:25:19 1994
xfsdump: level 0 dump
xfsdump: session id: d23cc072-b21d-1001-8f97-080069068eeb
xfsdump: preparing tape drive
xfsdump: this tape contains data that is not part of an XFS dump
xfsdump: do you want to overwrite this tape?
type y to overwrite, n to change tapes or abort (y/n):
You must answer y if you want to continue with the dump session, or n to
quit. If you answer y, the dump session resumes and the tape is overwritten.
If you do not respond to the prompt, the session eventually times out.
This means that an automatic backup, for example one initiated by
a crontab entry, will not succeed unless you specified the -F option with the
xfsdump command, which forces it to overwrite the tape rather than prompt
for approval.
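For unattended backups it can help to assemble the xfsdump command line in one place so that -F is never forgotten. The following Python sketch only builds an argument list; it uses the options shown in the usage summary above (-f, -l, -s, -F, -J, -L, -M), and the function name and its parameters are hypothetical.

```python
def xfsdump_argv(source, destination, level=None, session_label=None,
                 media_label=None, subtrees=(), no_prompt=False,
                 update_inventory=True):
    """Build an xfsdump argument list (hypothetical helper, nothing is run)."""
    argv = ["xfsdump", "-f", destination]
    if level is not None:
        argv += ["-l", str(level)]
    for subtree in subtrees:
        # One -s option per subtree, relative to the source mount point.
        argv += ["-s", subtree]
    if no_prompt:
        # -F: do not prompt, which a cron-driven backup needs.
        argv.append("-F")
    if not update_inventory:
        # -J: inhibit the dump inventory update.
        argv.append("-J")
    if session_label:
        argv += ["-L", session_label]
    if media_label:
        argv += ["-M", media_label]
    argv.append(source)
    return argv
```

A crontab-driven script could pass the resulting list to a process launcher of its choice; whether -F is appropriate depends on whether overwriting the tape without confirmation is acceptable at your site.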
Erasing Used Tapes
Erase pre-existing data on tapes with the mt erase command. Make sure the
tape is not write-protected.
For example, to prepare a used tape in the local default tape drive, enter:
# mt -f /dev/tape erase
Caution: This erases all data on the tape, including any dump sessions.
The tape can now be used by xfsdump without prompting for approval.
In the following example, a new tape is used and the level 0 dump is the first
dump written to it:
# xfsdump -f /dev/tape -l 0 -M Jun_94 -L week_1 -v silent /usr
A week later, a level 1 dump of the filesystem is performed on the same tape:
# xfsdump -f /dev/tape -l 1 -L week_2 /usr
The tape is forwarded past the existing dump data and the new data from
the level 1 dump is written after it. (Note that it is not necessary to specify
the media label for each successive dump on a media object.)
and so on, for the four weeks of a month in this example, the fourth week
being a level 3 dump (up to nine dump levels are supported). Refer to
Cumulative Restores on page 52 for information on the proper procedure
for restoring incremental dumps.
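The weekly scheme described here (level 0 in the first week, then one level higher each week) can be sketched as a small generator of command lines. This Python example is an illustration only: the function name is hypothetical, the session labels follow the week_N pattern used in the examples, and the media label is supplied only for the first dump, as noted above.

```python
def weekly_dump_commands(source, device, weeks, media_label):
    """Return one xfsdump command line per week (hypothetical helper)."""
    commands = []
    for week in range(1, weeks + 1):
        level = week - 1  # week 1 is the level 0 base dump
        command = f"xfsdump -f {device} -l {level}"
        if week == 1:
            # The media label is needed only for the first dump on the media.
            command += f" -M {media_label}"
        command += f" -L week_{week} {source}"
        commands.append(command)
    return commands


for line in weekly_dump_commands("/usr", "/dev/tape", 4, "Jun_94"):
    print(line)
```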
Resumed Dump Example
You can interrupt a dump session and resume it later. To interrupt a dump
session, type the interrupt character (typically CTRL-C). You receive a list of
options which allow you to interrupt the session, change verbosity level, or
resume the session.
In the following example, xfsdump is interrupted after dumping
approximately 20% of a filesystem:
# xfsdump -f /dev/tape -L 210994u -v silent /usr
xfsdump: this tape contains data that is not part of an XFS dump
xfsdump: do you want to overwrite this tape?
type y to overwrite, n to change tapes or abort (y/n): y
overwriting
^C
status: 91/168 files dumped, 20.48 percent complete, 70 seconds elapsed
0: interrupt this session
1: change verbosity
2: continue
-> 0
session interrupt initiated
xfsdump: dump interrupted prior to ino 11615 offset 0
You can later continue the dump by including the -R option and a different
session label:
# xfsdump -f /dev/tape -R -L 2nd210994u -v silent /usr
Any files that were not backed up before the interruption, and any file
changes that were made during the interruption, are backed up after the
dump is resumed.
Note: Use of the -R option requires that the dump was made with a dump
inventory taken, that is, the -J option was not used with xfsdump.
Notice that the dump inventory records are presented sequentially and are
indented to illustrate the hierarchical order of the dump information.
You can view a subset of the dump inventory by specifying the level of depth
(1, 2, or 3) that you want to view. For example, specifying depth=2 filters out
a lot of the specific dump information as you can see by comparing the
previous output with this:
# xfsdump -I depth=2
file system 0:
        fs id:          d23cb450-b21d-1001-8f97-080069068eeb
        session 0:
                mount point:    magnolia.wpd.xyz.com:/test
                device:         magnolia.wpd.xyz.com:/dev/rdsk/dks0d3s2
                time:           Mon Nov 28 11:44:04 1994
                session label:  ""
                session id:     d23cbf44-b21d-1001-8f97-080069068eeb
                level:          0
                resumed:        NO
                subtree:        NO
                streams:        1
        session 1:
                mount point:    magnolia.wpd.xyz.com:/test
                device:         magnolia.wpd.xyz.com:/dev/rdsk/dks0d3s2
                .
                .
                .
Note that you can also look at a list of contents on the dump media itself by
using the -t option with xfsrestore. (The xfsrestore utility is discussed in detail
in the following section.) For example, to list the contents of the dump tape
currently in the local tape drive:
# xfsrestore -f /dev/tape -t -v silent | more
xfsrestore: dump session found
xfsrestore: session label: "week_1"
xfsrestore: session id: d23cbcb4-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbcb5-b21d-1001-8f97-080069068eeb
do you want to select this dump? (y/n): y
selected
one
A/five
people/fred/TOC
people/fred/ch3.doc
people/fred/ch3TOC.doc
people/fred/questions
A/four
people/fred/script_0
people/fred/script_1
people/fred/script_2
people/fred/script_3
people/fred/sub1/TOC
people/fred/sub1/ch3.doc
people/fred/sub1/ch3TOC.doc
people/fred/sub1/questions
people/fred/sub1/script_0
people/fred/sub1/script_1
people/fred/sub1/script_2
people/fred/sub1/script_3
people/fred/sub1/xdump1.doc
people/fred/sub1/xdump1.doc.backup
people/fred/sub1/xfsdump.doc
people/fred/sub1/xfsdump.doc.auto
people/fred/sub1/sub2/TOC
---more---
Using xfsrestore
This section discusses the xfsrestore command, which you must use to view
and extract data from the dump data created by xfsdump. You can get a
summary of xfsrestore syntax with the -h option:
# xfsrestore -h
xfsrestore: version X.X
xfsrestore: usage: xfsrestore [ ... ]
xfsrestore Operations
Use xfsrestore to restore data backed up with xfsdump. You can restore files,
subdirectories, and filesystems, regardless of the way they were backed up.
For example, if you back up an entire filesystem in a single dump, you can
select individual files and subdirectories from within that filesystem to
restore.
Simple Restores
A simple restore is a non-cumulative restore (for information on restoring
incremental dumps, refer to Cumulative Restores on page 52). An
example of a simple, noninteractive use of xfsrestore is:
# xfsrestore -f /dev/tape /usr
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: no session label
xfsrestore: session id: d23cbbbe-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbbbf-b21d-1001-8f97-080069068eeb
do you want to restore this dump? (y/n): y
beginning restore
xfsrestore: restore of level 0 dump of magnolia.wpd.xyz.com:/usr created Tue
Nov 22 15:47:54 1994
xfsrestore: beginning media file
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore: restoring non-directory files
xfsrestore: ending media file
xfsrestore: restoring directory attributes
xfsrestore: restore complete: 115 seconds elapsed
In this case, xfsrestore went to the first dump on the tape and asked if this was
the dump to restore. If you had answered n, xfsrestore would have proceeded
to the next dump on the tape (if there was one) and asked if this was the
dump you wanted to restore.
You can request a specific dump if you used xfsdump with a session label. For
example:
# xfsrestore -f /dev/tape -L Wed_11_23 /usr
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: advancing tape to next media file
xfsrestore: dump session found
xfsrestore: restore of level 0 dump of magnolia.wpd.xyz.com:/usr created Wed
Nov 23 11:17:54 1994
xfsrestore: beginning media file
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore: restoring non-directory files
xfsrestore: ending media file
xfsrestore: restoring directory attributes
xfsrestore: restore complete: 200 seconds elapsed
In this way you recover a dump with a single command line and do not have
to answer y or n to the prompt(s) asking you if the dump session found is the
correct one. To be even more exact, use the -S option and specify the unique
session ID of the particular dump session:
# xfsrestore -f /dev/tape -S \
d23cbf47-b21d-1001-8f97-080069068eeb /usr2/tmp
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: advancing tape to next media file
xfsrestore: advancing tape to next media file
xfsrestore: dump session found
xfsrestore: restore of level 0 dump of magnolia.wpd.xyz.com:/test resumed Mon
Nov 28 11:50:41 1994
xfsrestore: beginning media file
xfsrestore: media file 0 (media 0, file 2)
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore: restoring non-directory files
xfsrestore: ending media file
xfsrestore: restoring directory attributes
xfsrestore: restore complete: 229 seconds elapsed
You can find the session ID by viewing the dump inventory (see Viewing
the Dump Inventory on page 44). Session labels might be duplicated, but
session IDs never are.
Restoring Individual Files
You can also restore a file in place, that is, restore it directly to where it
came from in the original backup. Note, however, that if you do not use a -e,
-E, or -n option, you overwrite any existing file(s) of the same name.
In the following example, the subdirectory people/fred is restored in the
destination /usr. This overwrites any files and subdirectories in
/usr/people/fred with the data on the dump tape:
# xfsrestore -f /dev/tape -L week_1 -s people/fred /usr
Network Restores
You can use standard network references to specify devices and files on the
network. For example, to use the tape drive on a network host named
magnolia as the source for a restore, you can use the command:
# xfsrestore -f magnolia:/dev/tape -L 120694u2 /usr2
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: advancing tape to next media file
xfsrestore: dump session found
xfsrestore: restore of level 0 dump of magnolia.wpd.xyz.com:/usr2 created Tue
Dec 6 10:55:17 1994
xfsrestore: beginning media file
xfsrestore: media file 0 (media 0, file 1)
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore:
xfsrestore:
xfsrestore:
xfsrestore:
In this case, the dump data is extracted from the tape on magnolia and the
destination is the directory /usr2 on the local system. Refer to the section
Dump and Restore With STDIN/STDOUT on page 56 for an example of
using the standard input option of xfsrestore.
Interactive Restores
-> ls
    4122 people/
    4130 two
    4126 A/
    4121 one
-> add two
-> cd people
-> ls
    4124 fred/
-> add fred
-> ls
  * 4124 fred/
-> extract
In the interactive restore session above, the subdirectory people/fred and the
file two were restored relative to the current working directory (.). Note
that an asterisk (*) in your ls output indicates your selections.
Cumulative Restores
Cumulative restores sequentially restore incremental dumps to recreate
filesystems and are also used to restore interrupted dumps. To perform a
cumulative restore of a filesystem, begin with the media object that contains
the base level dump and recover it first, then recover the incremental dump
with the next higher dump level number, then the next, and so on. Use the
-r option to inform xfsrestore that you are performing a cumulative recovery.
In the following example, the level 0 base dump and succeeding higher level
dumps are on /dev/tape. First the level 0 dump is restored, then each higher
level dump in succession:
# xfsrestore -f /dev/tape -r -v silent .
xfsrestore: dump session found
xfsrestore: session label: "week_1"
xfsrestore: session id: d23cbcb4-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbcb5-b21d-1001-8f97-080069068eeb
do you want to select this dump? (y/n): y
selected
Next, enter the same command again, but when prompted if you want to
restore the first dump (which you've already restored), type n. Then type y
in response to the continue searching question. When you come to the next
dump, restore it:
# xfsrestore -f /dev/tape -r -v silent .
xfsrestore: dump session found
xfsrestore: session label: "week_1"
xfsrestore: session id: d23cbcb4-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbcb5-b21d-1001-8f97-080069068eeb
do you want to select this dump? (y/n): n
not selected
do you want to continue searching? (y/n): y
continuing search
xfsrestore: dump session found
xfsrestore: session label: "week_2"
xfsrestore: session id: d23cbcb8-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbcb5-b21d-1001-8f97-080069068eeb
do you want to select this dump? (y/n): y
selected
.
.
.
You then repeat this process, only now skipping the first two dumps and
restoring the third, and so on, until you have recovered the entire sequence
of incremental dumps. The full and latest copy of the filesystem will then
have been restored. In this case, it is restored relative to "." (that is, in the
directory you are in when the sequence of xfsrestore commands is issued).
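The ordering rule for a cumulative restore (base level first, then each higher level in turn) can be sketched as follows. This Python example assumes that every dump was given a session label, so that -L can select it instead of answering the prompts. The function name and the (level, label) input format are hypothetical.

```python
def cumulative_restore_commands(sessions, device, destination="."):
    """Return xfsrestore command lines in the order they should be run.

    sessions: iterable of (level, session_label) pairs, in any order.
    Hypothetical helper; it only formats command lines.
    """
    ordered = sorted(sessions, key=lambda session: session[0])
    return [
        f"xfsrestore -f {device} -r -L {label} {destination}"
        for _level, label in ordered
    ]


for line in cumulative_restore_commands(
        [(1, "week_2"), (0, "week_1"), (2, "week_3")], "/dev/tape"):
    print(line)
```

The sketch does not check that the levels form an unbroken sequence starting at the base dump; a real script would want to verify that against the dump inventory before restoring.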
Restoring Interrupted Dumps
From this it can be determined that session 0 was interrupted and then
resumed and completed in session 1.
To restore the interrupted dump session in the example above, use the
following sequence of commands:
# xfsrestore -f /dev/tape -r -L 180894usr .
# xfsrestore -f /dev/tape -r -L Resumed180894usr .
This restores the entire /usr backup relative to the current directory. (You
should remove the housekeeping directory from the destination directory
when you are finished.)
Interrupted Restores
In a manner similar to xfsdump interruptions, you can interrupt an xfsrestore
session. This allows you to interrupt a restore session and then resume it
later. To interrupt a restore session, type the interrupt character (typically
CTRL-C). You receive a list of options which allow you to interrupt the
session, change the verbosity level, or resume the session.
# xfsrestore -f /dev/tape -v silent /usr
xfsrestore: dump session found
xfsrestore: no session label
xfsrestore: session id: d23cbf44-b21d-1001-8f97-080069068eeb
xfsrestore: no media label
xfsrestore: media id: d23cbf45-b21d-1001-8f97-080069068eeb
do you want to select this dump? (y/n): y
selected
status: 92/168 files restored, 41.03 percent complete, 135 seconds elapsed
0: interrupt this session
1: change verbosity
2: continue
-> 0
session interrupt initiated
tar
The -K option has been added to the tar(1) command for use with files larger
than 2GB. If the -K option is not used, tar skips any files larger than 2GB and
issues a warning. Note that use of this option can create tar archives that are
not usable on non-XFS systems. The -K option cannot be used in
combination with the -O option, which creates tar archives formatted in an
older, pre-POSIX format.
cpio
The -K option has been added to the cpio(1) command for use with files
larger than 2GB. If the -K option is not used, cpio skips any files larger than
2GB and issues a warning. Note that use of this option can create cpio
archives that are not usable on non-XFS systems. The -K option can only be
used in combination with the -o (output) option. The -K option cannot be
used in combination with the -c option (which creates cpio archives with
ASCII headers), nor with the -H option (used to specify various header
formats).
bru
The -K option has been added to the bru(1) command for use with files larger
than 2GB. If the -K option is not used, bru skips any files that it cannot
compress to less than 2GB and issues a warning. Note that use of this option
can create bru archives that are not usable on non-XFS systems. The -K
option can only be used in combination with the -Z (use 12-bit LZW file
compression) option.
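Before choosing whether to use -K with tar, cpio, or bru, it can be useful to know whether a directory tree contains any files over the 2GB threshold. The following Python sketch walks a tree and reports such files. It assumes that "2GB" means 2 * 1024**3 bytes, which the text above does not state, and the function name is hypothetical.

```python
import os

# Assumption: "2GB" is taken to mean 2 * 1024**3 bytes.
TWO_GB = 2 * 1024 ** 3


def files_over_limit(root, limit=TWO_GB):
    """Yield paths of files under root whose size is larger than limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > limit:
                yield path
```

If the generator yields nothing for the tree you plan to archive, the archive can be made without -K and stays usable on non-XFS systems.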
System Recovery
The PROM Monitor's System Recovery option does not work correctly for
XFS filesystems.
Chapter 4
Note: One feature of the XLV Volume Manager described in this chapter,
plexing (mirroring), is available only when you purchase the Disk Plexing
Option software option. See the plexing Release Notes for information on
purchasing this software option and obtaining the required NetLS license.
XLV Overview
Traditionally, UNIX systems represent disk partitions as block and character
devices. These devices are actually kernel-based interfaces that allow
applications to access the partitions on either a character or block basis. The
actual disk interactions are performed by disk device drivers.
Some applications, such as high-performance databases, access these
partition devices directly for maximum performance. However, most
applications simplify their disk access by interfacing with filesystems.
Filesystems isolate applications from the concerns of disk management by
providing the familiar file and directory model for disk access.
XLV interposes another layer into this model by building logical volumes (also
known as volumes) on top of the partition devices. Volumes appear as block
and character devices in the /dev directory. Filesystems, databases, and other
applications access the volumes rather than the partitions. Logical volumes
provide services such as disk plexing (also known as mirroring) and striping
transparently to the applications that access the volumes. A logical volume
might include partitions from several physical disk drives and, thus, be
larger than any of the physical disks. EFS or XFS filesystems can be made on
XLV logical volumes.
Figure 4-1  [figure: an example logical volume. Logical view: a logical volume with a log subvolume, a data subvolume, and a real-time subvolume, each made up of plexes that contain volume elements, one of which is a striped volume element. Physical view: the partitions on Disk 1 through Disk 6 that make up the volume elements.]
Figure 4-2  Volume Composition  [figure: a volume is made up of a data subvolume and, optionally, a log subvolume and a real-time subvolume]
Figure 4-3  Subvolume Composition  [figure: a subvolume is made up of one plex and, optionally, up to three additional plexes]
Each subvolume is a distinct address space and a distinct type. The types of
subvolumes are:
Data subvolume
        The data subvolume is required in all logical volumes. It is
        the only subvolume present in EFS filesystems.
Log subvolume
        The log subvolume contains XFS journaling information. It
        is a log of filesystem transactions and is used to expedite
        system recovery after a crash. Log information is sometimes
        put in the data subvolume rather than in a log subvolume
        (see the section "Choosing the Log Type and Size" in
        Chapter 2 and the mkfs_xfs(1M) reference page and its
        discussion of the -l option for more information).
Real-time subvolume
        Real-time subvolumes are generally used for data
        applications such as video, where guaranteed response time
        is more important than data integrity. The section "Using
        the Real-Time Subvolume" in this chapter and Chapter 5,
        "Guaranteed-Rate I/O," explain how applications access
        data on real-time subvolumes.
Subvolumes enforce separation among data types. For example, user data
cannot overwrite filesystem log data. Subvolumes also enable filesystem
data and user data to be configured to meet goals for performance and
reliability. For example, performance can be improved by putting
subvolumes on different disk drives.
Each subvolume can be organized independently. For example, the log
subvolume can be plexed for fault tolerance and the real-time subvolume
can be striped across a large number of disks to give maximum throughput
for video playback.
Volume elements that are part of a real-time subvolume should not be on the
same disk as volume elements used for data or log subvolumes. This is a
recommendation for all files on real-time subvolumes and required for files
used for guaranteed-rate I/O with hard guarantees. (See Hardware
Configuration Requirements for GRIO in Chapter 5 for more information.)
You can create subvolumes, but you cannot detach them from their volumes
or delete them. A subvolume is automatically deleted when the volume is
deleted.
Plexes
A subvolume can contain from one to four plexes (sometimes called mirrors).
Each plex contains a portion or all of the subvolume's data. Creating a
volume with multiple plexes increases system reliability.
If there is just one plex in a subvolume, that plex spans the entire address
space of the subvolume. However, when there are multiple plexes,
individual plexes can have holes in their address spaces as long as the union
of all plexes spans the entire address space.
Figure 4-4  Plex Composition  [figure: a plex is made up of up to 128 volume elements]
The simplest type of volume element is a single disk partition. The two other
types of volume elements, striped volume elements and multipartition
volume elements, are composed of several disk partitions. Figure 4-5 shows
a single partition volume element.
Figure 4-5  [figure: a single-partition volume element, consisting of one disk partition]
Figure 4-6 shows a striped volume element. Striped volume elements consist
of two or more disk partitions, organized so that an amount of data called
the stripe unit is written to each disk partition before writing the next stripe
unit-worth of data to the next partition.
Figure 4-6  [figure: a striped volume element; stripe units are distributed across two or more disk partitions]
Striping can be used to alternate sections of data among multiple disks. This
provides a performance advantage by allowing parallel I/O activity.
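The round-robin placement that striping implies can be made concrete with a little arithmetic. The following Python sketch maps a block address within a striped volume element to a partition and an offset on that partition. It is a model of the description above, not XLV's actual implementation; the function name is hypothetical, and the stripe unit is assumed to be expressed in the same block units as the address.

```python
def stripe_location(block, stripe_unit, n_partitions):
    """Map a block in a striped volume element to (partition, offset).

    Illustrative model: stripe unit 0 goes to partition 0, stripe unit 1
    to partition 1, and so on, wrapping around after the last partition.
    """
    unit_index, offset_in_unit = divmod(block, stripe_unit)
    partition = unit_index % n_partitions   # which partition gets this unit
    row = unit_index // n_partitions        # how many units precede it there
    return partition, row * stripe_unit + offset_in_unit
```

With a stripe unit of 256 blocks and two partitions, blocks 0 through 255 land on the first partition, blocks 256 through 511 on the second, and block 512 returns to the first partition at offset 256. Consecutive stripe units therefore sit on different disks, which is what allows the parallel I/O.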
Figure 4-7 shows a multipartition volume element in which the volume
element is composed of more than one disk partition. In this configuration,
the disk partitions are addressed sequentially.
Figure 4-7  [figure: a multipartition volume element, made up of several disk partitions]
Any mixture of the three types of volume elements (single partition, striped,
and multipartition) can be concatenated in a plex.
When a volume is created on one system and moved (by moving the disks)
to another system, the new volume name is the same as the original volume
name with the hostname of the original system prepended. For example, if a
volume called xlv0 is moved from a system called engrlab1 to a system
called engrlab2, the device name of the volume on the new system is
/dev/dsk/xlv/engrlab1.xlv0 (the old system name engrlab1 has been prepended
to the volume name xlv0).
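The renaming rule for a moved volume is simple enough to express in one line. This Python sketch reproduces the example above; the function name is hypothetical.

```python
def moved_volume_device(volume, original_host):
    """Return the block device name of a volume moved from original_host."""
    return f"/dev/dsk/xlv/{original_host}.{volume}"


print(moved_volume_device("xlv0", "engrlab1"))  # /dev/dsk/xlv/engrlab1.xlv0
```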
XLV Daemons
The XLV daemons are:
xlv_labd
xlvd
xlv_plexd
XLV does not require an explicit configuration file, nor is it turned on and off
with chkconfig(1M). XLV is able to assemble logical volumes based solely
upon information written in the disk labels. During initialization, the system
performs a hardware inventory, reads all the disk labels, and automatically
assembles the available disks into volumes.
If some disks are missing, XLV checks to see if there are enough volume
elements among the available plexes to map the entire address space. If the
whole address space is available, XLV brings the volume online even if some
of the plexes are incomplete.
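The availability check described here amounts to asking whether the union of the address ranges of all available volume elements, taken across every plex, covers the subvolume's whole address space. The following Python sketch models that test. It is an illustration of the rule, not XLV code; the function names and the (start, end) range representation are assumptions.

```python
def spans_address_space(ranges, size):
    """Return True if the inclusive (start, end) ranges cover 0..size-1."""
    next_needed = 0  # lowest block number not yet known to be covered
    for start, end in sorted(ranges):
        if start > next_needed:
            return False  # a hole that no available volume element fills
        next_needed = max(next_needed, end + 1)
    return next_needed >= size


def can_bring_online(plexes, size):
    """plexes: one list of (start, end) ranges per plex, listing only the
    volume elements whose disks are currently available."""
    available = [r for plex in plexes for r in plex]
    return spans_address_space(available, size)
```

For example, a plex that holds blocks 0-99 and 200-299 combined with a second plex that holds blocks 100-299 covers a 300-block subvolume, even though the first plex has a hole. If the second plex were reduced to blocks 250-299, blocks 100-199 would be missing and the check would fail.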
The basic guidelines for choosing which subvolumes to use with XFS
filesystems are:
Log subvolumes are optional. If they are not used, log information is
put into an internal log in the data subvolume (by giving the -l internal
option to mkfs).
When real-time subvolumes are used, make a small log subvolume and
a small data subvolume. Don't put much (if any) user data in the
filesystem, just real-time data.
Choosing the size of the log (and therefore the size of the log
subvolume) is discussed in the section Choosing the Log Type and
Size in Chapter 2. Note that if you do not intend to repartition a disk to
create an optimal-size log partition, your choice of an available disk
partition may determine the size of the log.
Plexing
The basic guidelines for plexing are:
Use plexing when high reliability and high availability of data are
required.
Plexes can have holes in them, portions of the address range not
contained by a volume element, as long as at least one of the plexes in
the subvolume has a volume element with the address range of the
hole.
Striping
The basic guidelines for striping are:
1.
Unmount the disks that will be used in the volume if they are mounted.
For example:
# df
Filesystem          Type  blocks   use     avail    %use  Mounted on
/dev/root           efs   1939714  430115  1509599  22%   /
/dev/dsk/dks0d2s7   efs   2004550  22      2004528  0%    /d2
/dev/dsk/dks0d3s7   efs   3826812  22      3826790  0%    /d3
# umount /d2
# umount /d3
2.
Start xlv_make:
# xlv_make
xlv_make>
3.
4.
6.
You can specify the last portion of the disk partition pathname (as
shown) or the full pathname. xlv_make accepts disk partitions that are of
types xlv, xfs, and efs. You can use other partition types, for
example lvol, by giving the -force option, for example, ve -force
dks0d2s7. xlv_make automatically changes the partition type to xlv.
7.
8.
9.
2.
Create a file, called xlv0.specs for example, that contains input for
xlv_make. For this example and a volume named xlv0, the file contains:
vol xlv0
data
plex
ve -stripe -stripe_unit 256 dks0d2s7 dks0d3s7
plex
ve -stripe -stripe_unit 256 dks0d4s7 dks0d5s7
end
show
exit
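When many volumes share the same layout, an input file like the one above can be generated rather than typed. The following Python sketch produces the same text for a plexed, striped data subvolume. The function name and its parameters are hypothetical; the keywords it emits (vol, data, plex, ve -stripe -stripe_unit, end, show, exit) are the ones shown in the example file.

```python
def striped_plexed_spec(volume, plexes, stripe_unit):
    """Return xlv_make input for a volume with one striped ve per plex.

    plexes: a list with one entry per plex; each entry is the list of
    partition names striped together in that plex's volume element.
    """
    lines = [f"vol {volume}", "data"]
    for partitions in plexes:
        lines.append("plex")
        lines.append(
            f"ve -stripe -stripe_unit {stripe_unit} " + " ".join(partitions)
        )
    lines += ["end", "show", "exit"]
    return "\n".join(lines) + "\n"


print(striped_plexed_spec(
    "xlv0",
    [["dks0d2s7", "dks0d3s7"], ["dks0d4s7", "dks0d5s7"]],
    256,
), end="")
```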
2.
3.
Choose new names for the logical volumes, if desired. XLV, unlike lv,
only requires names to be valid filenames, so you can choose more
meaningful names. For example, you can make the volume names the
same as the mount points you use. If you mount logical volumes at /a,
/b, and /c, you can name the XLV volumes a, b, and c.
2.
Unmount all lv logical volumes that you plan to convert to XLV logical
volumes. For example:
umount /a
3.
If you want to change the volume names, edit scriptfile and replace the
names on the lines that begin with vol with the new names. For
example, change:
vol lv0
to:
vol a
6.
7.
8.
9.
If you changed the name of the volume, for example from lv0 to a,
make the change in the first field.
If you changed the name of the volume, for example from lv0 to a,
make the change in the raw device.
/a efs rw,raw=/dev/rdsk/lv0 0 0
Note: The full menu is shown above; if you do not have a valid license for
the Disk Plexing Option software option, several of the plex-related menu
selections do not appear.
In this example, there are two high-level volume objects, a volume element
named spare_ve and a logical volume named xlv0. The volume element is a
high-level volume object because it is not part of any plex or subvolume.
To display the complete hierarchy of a high-level volume object, use
selection 42 of the xlv_admin menu, for example:
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> xlv0
============= Displaying Requested Object ==========
vol xlv0
ve xlv0.data.0.0 [active]
start=0, end=226799, (cat)grp_size=1
/dev/dsk/dks0d2s7 (226800 blks)
ve xlv0.data.0.1 [active]
start=226800, end=453599, (cat)grp_size=1
/dev/dsk/dks0d3s7 (226800 blks)
ve xlv0.data.0.2 [active]
start=453600, end=680399, (cat)grp_size=1
/dev/dsk/dks0d4s7 (226800 blks)
This output shows that xlv0 contains only a data subvolume. The data
subvolume has one plex that has three volume elements.
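The start and end values in this display follow directly from concatenation: each volume element begins at the block after the previous one ends. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def concat_ranges(sizes_in_blocks):
    """Return (start, end) block ranges for volume elements that are
    concatenated in order within one plex."""
    ranges = []
    start = 0
    for size in sizes_in_blocks:
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Three partitions of 226800 blocks each, as in the xlv0 display.
for start, end in concat_ranges([226800, 226800, 226800]):
    print("start=%d, end=%d" % (start, end))
```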
If any of the volume elements you plan to add to the volume don't exist
yet, create them with xlv_make. For example, follow this procedure to
create a volume element out of a new disk, /dev/dsk/dks0d4s7:
xlv_make
xlv_make> ve new_ve dks0d4s7
new_ve
xlv_make> end
Object specification completed
xlv_make> exit
Newly created objects will be written to disk.
Is this what you want?(yes) yes
Invoking xlv_assemble
3.
4.
If the plex that you want to add to the subvolume doesn't exist yet,
create it with xlv_make. For example, to create a plex called plex1 to add
to the data subvolume of a volume called root_vol, give these
commands:
# xlv_make
xlv_make> show
Completed Objects
(1) vol root_vol
ve root_vol.data.0.0 [active]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d1s0 (1992630 blks)
xlv_make> plex plex1
plex1
xlv_make> ve /dev/dsk/dks0d2s0
plex1.0
xlv_make> end
Object specification completed
xlv_make> exit
Newly created objects will be written to disk.
Is this what you want?(yes) yes
Invoking xlv_assemble
2.
Use the xlv_admin command menu to add the plex to the volume. For
example, to add the standalone plex plex1 to root_vol, use this
procedure:
# xlv_admin
**************** XLV Administration Menu **********
...
3.
Add a plex to an existing volume.
...
42.
Show information for an object.
...
99.
Exit
...
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> root_vol
============= Displaying Requested Object ==========
vol root_vol
ve root_vol.data.0.0 [active]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d1s0 (1992630 blks)
3.
You can confirm that root_vol now has two plexes by using selection 42
of the xlv_admin command menu:
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> root_vol
============= Displaying Requested Object ==========
vol root_vol
ve root_vol.data.0.0 [active]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d1s0 (1992630 blks)
ve root_vol.data.1.0 [empty]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d2s0 (1992630 blks)
Exit xlv_admin:
xlv_admin> 99
#
The plex revive completes and the new plex switches to [active] state
automatically, but if you want to check its progress and verify that the plex
has become active, follow this procedure:
1.
# ps -ef | grep xlv
root    27     1  10:49:27 ?      0:00 /sbin/xlv_plexd -m 4
root    35     1  10:49:28 ?      0:00 /sbin/xlv_labd
root    31     1  10:49:27 ?      0:00 xlvd
root   407    27  11:01:01 ?      0:00 xlv_plexd -v 2 -n root_vol.data -d 50331648 -b 128 -w 0 1992629
root   410   397  11:01:11 pts/0
0 10:49:27 ?      0:00 /sbin/xlv_plexd -m 4
0 10:49:28 ?      0:00 /sbin/xlv_labd
0 10:49:27 ?      0:03 xlvd
Exit xlv_admin:
xlv_admin> 99
#
Start xlv_admin and display the volume that has the plex that you plan
to detach, for example, root_vol:
# xlv_admin
...
1.
Add a ve to an existing plex.
...
12.
Detach a plex from an existing volume.
...
42.
Show information for an object.
................ Exit ................
99.
Exit
Please select choice...
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> root_vol
============= Displaying Requested Object ==========
vol root_vol
ve root_vol.data.0.0 [active]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d1s0 (1992630 blks)
ve root_vol.data.1.0 [active]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d2s0 (1992630 blks)
2.
Detach plex 1 and give it the name plex1 by giving these commands:
xlv_admin> 12
Please enter name of object to be operated on.
xlv_admin> root_vol
Please select plex number (0-3).
xlv_admin> 1
Please enter name of new object.
xlv_admin> plex1
Please select choice...
3.
To examine the volume and the detached plex, give these commands:
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> plex1
============= Displaying Requested Object ==========
plex plex1
ve plex1.0 [empty]
start=0, end=1992629, (cat)grp_size=1
/dev/dsk/dks0d2s0 (1992630 blks)
4.
Exit xlv_admin:
xlv_admin> 99
#
If you are deleting a volume, you must unmount it first. For example:
umount /vol1
2.
Start xlv_admin and list the root of each object hierarchy on the system:
# xlv_admin
...
31.
Delete an object.
...
41.
Show object by name and type, only.
...
99.
Exit
Please select choice...
xlv_admin> 41
==================== Listing Objects =============
Volume:
root_vol
Plex:
plex1
3.
4.
5.
Exit xlv_admin:
xlv_admin> 99
#
You cannot create real-time files using any standard utilities. Only
specially-written programs can create real-time files. The next section,
Creating Files on the Real-time Subvolume, explains how.
Real-time files are displayed by ls(1), just as any other file. However,
there is no way to tell from the ls output whether a particular file is on a
data subvolume or is a real-time file on a real-time subvolume. Only a
specially-written program can determine the type of a file. The
F_FSGETXATTR fcntl(2) system call is used to determine if a file is a
real-time or a standard data file. If the file is a real-time file, the
fsx_xflags field of the fsxattr structure has the XFS_XFLAG_REALTIME
bit set.
The df(1) utility displays the disk space in the data subvolume by
default. When the -r option is given, the real-time subvolume's disk
space and usage are added. df can report that there is free disk space in
the filesystem when the real-time subvolume is full, and df -r can report
that there is free disk space when the data subvolume is full.
Real-time files can only be read or written using direct I/O. Therefore,
read(2) and write(2) operations to a real-time file must meet the requirements
specified by the F_DIOINFO fcntl(2) call. See the open(2) reference page for a
discussion of the O_DIRECT option to the open() system call.
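The kind of check an application makes before issuing a direct I/O request can be sketched as follows. This Python sketch is not the system interface. It assumes that F_DIOINFO reports a memory alignment, a minimum transfer size, and a maximum transfer size (named d_mem, d_miniosz, and d_maxiosz here); confirm the actual structure and field names in the fcntl(2) reference page. The numeric limits in the example are hypothetical.

```python
def direct_io_request_ok(offset, length, buf_addr,
                         d_mem, d_miniosz, d_maxiosz):
    """Return True if a read or write request satisfies the assumed
    direct I/O constraints (all values in bytes)."""
    aligned_buffer = buf_addr % d_mem == 0        # buffer address alignment
    aligned_offset = offset % d_miniosz == 0      # file offset alignment
    whole_units = length > 0 and length % d_miniosz == 0
    within_max = length <= d_maxiosz              # transfer size ceiling
    return aligned_buffer and aligned_offset and whole_units and within_max


# Hypothetical limits, for illustration only.
limits = dict(d_mem=4096, d_miniosz=512, d_maxiosz=4 * 1024 * 1024)
print(direct_io_request_ok(0, 65536, 0x10000, **limits))
print(direct_io_request_ok(100, 65536, 0x10000, **limits))
```

A request that fails such a check should be adjusted by the application before it is issued, since real-time files cannot fall back to buffered I/O.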
Chapter 5
Guaranteed-Rate I/O
Hard Guarantees
A hard guarantee means the system will do everything possible to make sure
the application receives the amount of data that has been reserved during
each second of the reservation duration.
Hard guarantees are possible only when the disks that are used for the
real-time subvolume meet the requirements listed in the section "Hardware
Configuration Requirements for GRIO" in this chapter.
Because these requirements include disabling the disk drive retry and error
correction mechanisms, incorrect data can be returned to the application
without an error notification, but the I/O requests return within the
guaranteed time. If an application requests a hard guarantee and some part
of the system configuration makes the granting of a hard guarantee
impossible, the reservation is rejected. The application can then issue a
reservation request with a soft guarantee.
Soft Guarantees
A soft guarantee means the system tries to achieve the desired rate, but there
may be circumstances beyond its control that cause it to fail. For example, if
a non-real-time disk is on the same SCSI bus as real-time disks and there is a
disk data error on the non-real-time disk, the driver retries the request to
recover the data. This could cause the rate guarantee on the real-time disk to
be missed.
VOD Guarantees
VOD (Video On Demand) is a special type of rate guarantee applied to either
hard or soft guarantees. It allows more streams to be supported per disk
drive, but requires that the application provide careful control of when and
where I/O requests are issued.
VOD guarantees are supported only when using a striped volume. The
application must time-multiplex its I/O requests, issuing requests to
different drives at different times. A process stream can access only a
single disk during any one second. Therefore, the stripe unit must be set to
the number of kilobytes of data that the application needs to access per
second per stream of data. (The stripe unit is set using xlv_make(1M) when
volume elements are created.) If the process tries to access data on a
different disk during a time period, it is suspended until the appropriate
time period.
With VOD reservations, performance suffers if the application skips around
in the file instead of reading it sequentially. For example, if disks are
four-way striped, it could take as long as four seconds (the size of the
volume stripe) for the first I/O request after a seek to complete.
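The two VOD rules above reduce to simple arithmetic. The Python sketch below assumes that the stripe unit is expressed in 512-byte blocks (an assumption about the units of the -stripe_unit value in the xlv_make examples) and that one time period lasts one second; the function names are hypothetical.

```python
def vod_stripe_unit_blocks(rate_kb_per_sec, block_size=512):
    """Stripe unit, in blocks, that holds one second of data for one
    stream, given the stream rate in kilobytes per second."""
    return rate_kb_per_sec * 1024 // block_size


def vod_worst_case_wait_seconds(num_disks):
    """Longest wait for the first request after a seek: the stream may
    have to wait one time period per disk in the stripe."""
    return num_disks


# A 512 KB per second stream on a four-way stripe.
print(vod_stripe_unit_blocks(512))
print(vod_worst_case_wait_seconds(4))
```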
Put only real-time subvolume volume elements on a single disk (not log
or data subvolume volume elements). This configuration is
recommended for soft guarantees and required for hard guarantees.
Only SCSI disks can be used for real-time subvolumes. IPI, ESDI, and
other non-SCSI disks cannot be used.
For GRIO with hard guarantees, each disk used for hard guarantees
must be on a controller whose disks are used exclusively for real-time
subvolumes. These controllers cannot have any devices other than SCSI
disks on their buses. Any other devices could prevent the disk from
accessing the SCSI bus in a timely manner and cause the rate to be
missed.
The drive firmware in each disk used in the real-time subvolume must
have the predictive failure analysis and thermal recalibration features
disabled. All disk drives have been shipped from Silicon Graphics this
way since March 1994.
For hard guarantees, the disk drive retry and error correction
mechanisms must be disabled for all disks used in the real-time
subvolume. See the section Disabling Disk Error Recovery in this
chapter for more information.
Disks used in the data and log subvolumes of the XLV logical volume
must not have their retry mechanisms disabled. The data and log
subvolumes contain information critical to the filesystem and cannot
afford an occasional disk error.
Parameter
New Setting
Disabled
Disabled
Disabled
SGI
0664N1D
6s61
SGI
0664N1D
4I4I
The procedure for setting disk drive parameters is shown in the example
below. It uses the parameters shown in Table 5-1 for a disk drive on
controller 131, unit 1.
fx -x
fx version 5.3, Nov 18, 1994
fx: "device-name" = (dksc) <Enter>
fx: ctlr# = (0) 131
fx: drive# = (1) 1
fx: lun# = (0)
...opening dksc(131,1,0)
...controller test...OK
Scsi drive type == SGI 0664N1D 6s61
----- please choose one (? for help, .. to quit this menu)-----
[exi]t          [d]ebug/        [l]abel/
[b]adblock/     [exe]rcise/     [r]epartition/
fx > label
----- please choose one (? for help, .. to quit this menu)-----
[sh]ow/         [sy]nc          [se]t/          [c]reate/
fx/label> show
----- please choose one (? for help, .. to quit this menu)-----
[para]meters    [part]itions    [b]ootinfo      [a]ll
[g]eometry      [s]giinfo       [d]irectory
fx/label/show> parameters
----- current drive parameters-----
Error correction enabled
Enable data transfer on error
Don't report recovered errors
Do delay for error recovery
Don't transfer bad blocks
Error retry attempts               10
Do auto bad block reallocation (read)
Do auto bad block reallocation (write)
Drive readahead enabled
Drive buffered writes disabled
Drive disable prefetch          65535
Drive minimum prefetch              0
Drive maximum prefetch          65535
Drive prefetch ceiling          65535
Number of cache segments            4
Read buffer ratio               0/256
Write buffer ratio              0/256
Command Tag Queueing disabled
[sh]ow/         [sy]nc          [se]t/          [c]reate/
fx/label> set
Run ggd as a real-time process. If the system has more than one CPU
and you are willing to dedicate an entire CPU to performing GRIO
requests, add the -c cpunum option to the file /etc/config/ggd.options.
This causes the CPU to be marked isolated, restricted to running selected
processes, and nonpreemptive. After ggd has been restarted, you can confirm
that the CPU has been marked by giving this command (cpunum is 3 in this
example):
mpadmin -s
processors: 0 1 2 3 4 5 6 7
unrestricted: 0 1 2 5 6 7
isolated: 3
restricted: 3
preemptive: 0 1 2 4 5 6 7
clock: 0
fast clock: 0
To mark an additional CPU for real-time processes after ggd has been
restarted, give these commands:
mpadmin -rcpunum2
mpadmin -Icpunum2
mpadmin -Ccpunum2
rate
num_disks
stripe_unit
extent_size
opt_IO_size
Variable      Type of Guarantee   Comment                                Example Value
vol_name      any                 This name matches the last component   xlv_grio
                                  of the device name for the volume,
                                  /dev/dsk/xlv/vol_name
rate          any                                                        512
num_disks     any
stripe_unit   hard or soft                                               512*1K/(4*512) = 256
                                                                         512*1K/512 = 1024
extent_size   hard or soft                                               512 * 1K = 512k
                                                                         512 * 1K * 4 = 2048k
opt_IO_size   hard or soft                                               64
                                  Same as rate                           512
2.
Create an xlv_make(1M) script file that creates the XLV logical volume.
(See the section Using xlv_make to Create Volume Objects in
Chapter 4 for more information.) Example 5-1 shows an example script
file for a volume.
Example 5-1
3.
5.
mountdir is the full pathname of the directory that is the mount point
for the filesystem.
6.
7.
If the file /etc/grio_config exists, and you see OPTSZ=65536 for each
device, skip to step 9.
8.
9.
If you want soft rate guarantees, edit /etc/grio_config and remove this
string:
RT=1
from the lines for disks where software retry is required (see the section
/etc/grio_config File Format in this chapter for more information).
10. Restart the ggd daemon:
/etc/init.d/grio stop
/etc/init.d/grio start
Now the user application can be started. Files created on the real-time
subvolume can be accessed using guaranteed-rate I/O.
SYSTEM
CPUn
MEMn
IOBn
IOAnm
CTRn
DSKnUm
NUM
SLOT
VER
The CPU type of system (for example, IP22, IP19, and so on;
not used on all systems)
NUMCPUS
MHZ
CTLRNUM
UNIT
RT
The value is the integer or text string value assigned to the parameter. The
string enclosed in parentheses at the end of the line describes the
component.
Some examples of component records taken from /etc/grio_config on an
Indy system are shown below. Each record is a single line, even if it is
shown on multiple lines here.
This describes a 100 MHz CPU board in slot 0. It supports five thousand
64 KB operations per second.
SYSTEM: CPU
This describes the CPU board as being attached to the system bus.
CTR0: DSK0U1
The first field is always the keyword ADD. The next field is a 28-character
string that is the drive manufacturer's disk ID string. The next field is an
integer denoting the optimal I/O size of the device in bytes. The last field is
an integer denoting the number of optimal I/O size requests that the disk
can satisfy in one second.
Some examples of these records are:
ADD   SGI SEAGATE ST31200N9278   64K   23
ADD   SGI 0064N1D 4I4I           64K   23
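The last two fields of a record together give the data rate the disk is assumed to sustain. As a rough Python sketch of that arithmetic (the function name is hypothetical, and this is not a description of how ggd itself evaluates a reservation):

```python
def disk_rate_bytes_per_sec(opt_io_size_bytes, requests_per_sec):
    """Data rate implied by a record: the optimal I/O size multiplied by
    the number of such requests the disk can satisfy in one second."""
    return opt_io_size_bytes * requests_per_sec


# The example records: 64K optimal I/O size, 23 requests per second.
rate = disk_rate_bytes_per_sec(64 * 1024, 23)
print(rate, "bytes per second")
print(rate // 1024, "KB per second")
```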
c cpunum
If you change this file, you must restart ggd to have your changes take effect.
See the section Configuring the ggd Daemon in this chapter for more
information.
Appendix A
Error Messages
This appendix explains some of the error messages that can occur while
performing the procedures in this guide.
Appendix B
Reference Pages
This appendix lists reference pages (man pages) that provide information
about topics that relate to XFS and XLV. The printed form of this guide
includes copies of the reference pages for key utilities and file formats.
All reference pages can be viewed online using the man(1) command. On
systems with graphics, they can also be viewed using the xman(1) command
or the Man Pages item on the Help toolchest.
Category              Reference Pages                       Subsystem
General information                                         eoe2.man.xfs,
                                                            eoe2.man.xlv
XFS utilities         mkfs_xfs(1M), xfs_bmap(1M),           eoe2.man.xfs
                      xfs_check(1M), xfs_estimate(1M),
                      xfs_growfs(1M), xfs_logprint(1M),
                      xfsdump(1M), xfsrestore(1M)
XLV utilities         lv_to_xlv(1M), xlv_assemble(1M),      eoe2.man.xlv
                      xlv_make(1M),
                      xlv_set_primary(1M),
                      xlv_shutdown(1M)
XLV daemons           xlv_labd(1M), xlv_plexd(1M),          eoe2.man.xlv
                      xlvd(1M)
Category
Reference Pages
Subsystem
cfg(1M), ggd(1M)
eoe2.man.xfs
grio_get_rtgkey(3X),
grio_remove_request(3X),
grio_request(3X),
grio_use_rtgkey(3X)
dev.man.irix_lib
grio_config(4), grio_disks(4)
eoe2.man.xfs
eoe1.man.unix
dev.man.irix_lib
fcntl(2), fstat64(2), ftruncate64(2),
getrlimit64(2), lseek64(2), lstat64(2),
mmap64(2), mount(2), setrlimit64(2),
stat64(2), syssgi(2), truncate64(2)
dev.man.irix_lib
aio_cancel64(3), aio_error64(3),
aio_read64(3), aio_return64(3),
aio_sgi_init(3), aio_sgi_init64(3),
aio_suspend64(3), aio_write64(3),
lio_listio64(3), fd_to_handle(3X),
fgetpos64(3S), free_handle(3X),
fseek64(3S), fsetpos64(3S), ftell64(3S),
ftw64(3C), handle_to_fshandle(3X),
nftw64(3C), open_by_handle(3X),
path_to_fshandle(3X),
path_to_handle(3X),
readlink_by_handle(3X)
stat64(5)
dev.man.irix_lib
cfg(1M)
NAME
cfg scans the hardware available on the system and creates a file that describes the rates that can be
guaranteed on each I/O device. The output file is file if the -f option is specified; otherwise /etc/grio_config
is used. 64 kilobytes is used as the optimal I/O size unless the -d option is used.
The cfg utility appends a checksum to the end of the file so that ggd can determine if the file has been
edited.
NOTES
If the -d option is used to change the optimal I/O size, the /etc/grio_disks file must be edited to indicate
the number of requests supported per second for the given optimal I/O size.
FILES
/etc/grio_config
/etc/grio_disks
SEE ALSO
DUMP(1M)
NAME
dump backs up all files in filesystem, or files changed after a certain date to magnetic tape or files. The key
specifies the date and other options about the dump. Key consists of characters from the set
0123456789fusCcdbWwn. Any arguments supplied for specific options are given as subsequent words on
the command line, in the same order as that of the options listed.
If no key is given, the key is assumed to be 9u and the filesystem specified is dumped to the default tape
device /dev/tape.
0-9
This number is the dump level. All files modified since the last date stored in the file /etc/dumpdates
for the same filesystem at lesser levels will be dumped. If no date is determined by the level, the
beginning of time is assumed; thus the option 0 causes the entire filesystem to be dumped. For
instance, if you did a level 2 dump on Monday, followed by a level 4 dump on Tuesday, a
subsequent level 3 dump on Wednesday would contain all files modified or added to the
filesystem since the level 2 (Monday) backup. A level 0 dump copies the entire filesystem to the
dump volume.
f
Place the dump on the next argument file instead of the default tape device /dev/tape. If the name of
the file is -, dump writes to standard output. If the name of the file is of the format machine:device
the filesystem is dumped across the network to the remote machine. Since dump is normally run by
root, the name of the local machine must appear in the .rhosts file of the remote machine. If the file
name argument is of the form user@machine:device, dump will attempt to execute as the specified user
on the remote machine. The specified user must have a .rhosts file on the remote machine that
allows root from the local machine. dump creates a remote server, /etc/rmt, on the client machine
to access the tape device.
u
If the dump completes successfully, write the date of the beginning of the dump on file
/etc/dumpdates. This file records a separate date for each filesystem and each dump level. The format
of /etc/dumpdates is readable by people, consisting of one free format record per line: filesystem
name, increment level and ctime(3C) format dump date. /etc/dumpdates may be edited to change any
of the fields, if necessary.
s
The size of the dump tape is specified in feet. The number of feet is taken from the next argument.
When the specified size is reached, dump will prompt the operator and wait for the reel/volume to
be changed. The default tape size for the standard 9 track half inch reels is 2400 feet. The default for
cartridge tapes is an effective tape length of 5400 feet, and this assumes a 9-track QIC-24 tape whose
physical tape length is 600 feet. See note on cartridge tapes parameters below.
d
The density of the tape, expressed in BPI (bytes per inch), is taken from the next argument. This is
used in calculating the amount of tape used per reel. The default is 1600 BPI, except for the cartridge
tape which has a default density of 1000 BPI. Unless a higher density is specified explicitly, dump
uses its default density - even if the tape drive is capable of higher-density operation (for instance
6250 BPI). If the density specified does not correspond to the density of the tape device being used,
dump will not be able to handle end-of-tape properly.
b
The blocking factor (number of 1 Kbyte blocks written out together) is taken from the next argument.
The default is 10. The default blocking factor for tapes of density 6250 BPI and greater is 32. If
values larger than 32 are used, restore will not correctly determine the block size unless the b option
is also used. To maximize tape utilization, use a blocking factor which is a multiple of 8. For most
types of supported tape drives, the greatest capacity and tape throughput is obtained using a
blocking factor of 128 or even larger; note that restore(1M) will only automatically determine the
blocking factor if it is 32 or less.
C
This specifies the total tape capacity in 1K blocks, overriding the c, s, and d arguments if they are
also given. No adjustment is made for possible inter-record gaps, or lost capacity due to stop/start
repositioning, so it isn't necessary to guess how the dump algorithm for these factors will affect the
parameters. Since they aren't taken into account, and there may also be lost capacity due to retries
on media errors (by the drive), one should be conservative when specifying capacity.
The argument is parsed with strtoul(3), so it may be in any base (e.g., a 0x prefix specifies a hex
value, a 0 prefix specifies octal, no prefix is decimal). The argument may have a k, K, m, or M suffix.
The first two multiply the value by 1024, the third and fourth multiply it by 1048576, so a tape with a 2.2
Gbyte capacity might be specified as C 2m, allowing 10% loss to retries, etc.
c
Indicates that the tape is a cartridge tape instead of the standard default half-inch reel. This should
always be specified when using cartridge tapes. The values for blocking factor, size and density are
taken to be 10 (1 KByte blocks), 5400 feet and 1000 BPI respectively unless overridden with the b,
s or d option. Cartridge tapes with multiple tracks have a greater effective length which can be
specified with the s option.
W
dump tells the operator what file systems need to be dumped. This information is gleaned from the
files /etc/dumpdates and /etc/fstab. The W option causes dump to print out, for each file system in
/etc/dumpdates the most recent dump date and level, and highlights those file systems that should be
dumped. The mnt_freq field in the /etc/fstab entry of the file system must be non-zero for dump to
determine whether the file system should be dumped or not. If the W option is set, no other option
must be given, and dump exits immediately.
n
Whenever dump requires operator attention, notify all of the operators in the group operator by
means similar to wall(1).
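The dump level rule described under the 0-9 key can be made concrete with a short sketch. The Python function below is an illustration, not part of dump: it takes a chronological list of earlier dumps for one filesystem (the labels are hypothetical) and returns the one that a new dump at a given level is measured against, which is the most recent dump at a lesser level.

```python
def base_dump(history, level):
    """Return the label of the most recent earlier dump whose level is
    lower than `level`, or None if there is none (in which case the
    beginning of time is assumed and everything is dumped).

    history is a list of (label, level) pairs, oldest first.
    """
    for label, earlier_level in reversed(history):
        if earlier_level < level:
            return label
    return None


history = [("Monday", 2), ("Tuesday", 4)]
print(base_dump(history, 3))   # the Wednesday level 3 dump
print(base_dump(history, 0))   # a level 0 dump
```

For the Wednesday level 3 dump in the example, the Tuesday level 4 dump is skipped because its level is not lower than 3.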
dump reads the character device associated with the filesystem and dumps the contents onto the specified
tape device. It searches /etc/fstab to find the associated character device.
NOTES
Although the Tuesday-Friday incrementals contain extra copies of files from Monday, this scheme
assures that any file modified during the week can be recovered from the previous day's incremental
dump.
Dump Parameters
The following table gives a list of available tape formats, size and densities. It is important that the correct
parameters be given to dump, if they are different from the defaults.
Parameters for cartridge tapes
Cartridge    Number of    Physical Tape    Effective Tape
Interface    Tracks       Length (feet)    Length (feet)
QIC-24       9            600              5400
QIC-120      15           600              9000
QIC-150      18           600              10800
Cartridge tapes with multiple tracks have a greater effective length. The tape lengths given above assume
a physical tape length of 600 feet. In general, the effective tape length can be calculated by multiplying the
physical tape length by the number of tracks. Since some tape is usually lost due to tape errors, and
because dump does not handle end-of-tape gracefully, it pays to be conservative in estimating the effective tape length.
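The effective lengths in the cartridge table follow from this rule. A minimal Python sketch (the function name is hypothetical):

```python
def effective_length_feet(physical_feet, tracks):
    """Effective tape length: physical length times number of tracks."""
    return physical_feet * tracks


# Track counts from the cartridge tape table, all on 600-foot tapes.
for interface, tracks in [("QIC-24", 9), ("QIC-120", 15), ("QIC-150", 18)]:
    print(interface, effective_length_feet(600, tracks))
```

To be conservative, give dump a length somewhat below the computed value.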
Parameters for half-inch tapes
Reel Sizes (inches)    Tape Length (feet)    Thickness
6.0                    200                   1.9 mm
7.0                    600
8.5                    1200
10.5                   2400
                       3600                  1.3 mm
The density for these tapes can be any one of the following: 800, 1600, 3200 or 6250 BPI.
Parameters for 8mm tapes
Tape Type        Length (meters)    Capacity (Mbytes)
P5 (European)    112                2200
P6 (American)    112                2000
Earlier versions of dump had a bug that caused the number of tapes required to be miscalculated when
dump was given a large value for the density and a small value for the tape length. As a workaround, a
density of 54000 and a length of 6000 feet were recommended for 8mm tapes in place of the actual density
and length. The calculations are now done with floating point numbers, so overflow is no longer an issue.
With large capacity drives such as the 8mm and 4mm, it is normally easier to specify the capacity directly,
as in C 2000k, than to calculate a workable density and length.
If you do not wish to use the C option, then when using drives with no "inter-record gaps" (i.e., almost
every type except 9-track), use the c option, and the formula:
capacity in bytes = 7 * densityvalue * lengthvalue
and round down a bit to be conservative (allowing for block rewrites, etc.). The density should be kept
under 100000 to avoid overflows in the capacity calculations. Thus, for a DAT drive with a 90 meter tape
(2 * 10^9 capacity), one might use:
2000000000 = 7 * 47619 * 6000
or rounding down:
dump 0csd 6000 47000
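The density value in this example can be derived by solving the formula for density. The Python sketch below does that and rounds down to a multiple of 1000 to stay conservative; the function names and the rounding step are illustrative choices, not part of dump.

```python
def density_for_capacity(capacity_bytes, length_feet):
    """Solve capacity = 7 * density * length for density (rounded down)."""
    return capacity_bytes // (7 * length_feet)


def round_down(value, step=1000):
    """Round down to a multiple of step, to leave a safety margin."""
    return (value // step) * step


density = density_for_capacity(2 * 10**9, 6000)
print(density)                 # exact solution, rounded down to an integer
print(round_down(density))     # conservative value to give to dump
print(density < 100000)        # stays under the recommended ceiling
```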
EXAMPLES
dump 9ucdsf 54000 6000 /dev/mt/tps0d6nrnsv /os
dump 9uCf 2048 /dev/mt/tps0d6nrnsv /os
dump 9uCf 2m /dev/mt/tps0d6nrnsv /os
All do a level 9 dump of /os to the local tape device /dev/mt/tps0d6nrnsv using a tape density of 54000
BPI and tape length of 6000 feet where the tape device being used is an 8mm tape drive (there is a slight
difference in capacity between the first form and the others).
dump W
prints out, for each file system in /etc/dumpdates the most recent dump date and level, and highlights those
file systems that should be dumped.
FILES
/dev/tape
/etc/dumpdates
/etc/fstab
/etc/group
SEE ALSO
restore(1M), dump(5), fstab(5), group(4), rmt(1M), rhosts(1M), mtio(7), wall(1), shutdown(1M), ctime(3C)
DIAGNOSTICS
Fewer than 32 read errors on the filesystem are ignored. Each reel requires a new process, so parent
processes for reels already written just hang around until the entire tape is written.
dump with the W or w options does not report filesystems that have never been recorded in
/etc/dumpdates, even if listed in /etc/fstab.
It would be nice if dump knew about the dump sequence, kept track of the tapes scribbled on, told the
operator which tape to mount when, and provided more assistance for the operator running restore.
It is recommended that incremental dumps also be performed with the system running in single-user
mode.
dump needs accurate information regarding the length and density of the tapes used. It can dump the
filesystem on multiple volumes, but since there is no way of specifying different sizes for multiple tapes,
all tapes used should be at least as long as the specified/default length. If dump reaches the end of the
tape volume unexpectedly (as a result of a longer than actual length specification), it will abort the entire
dump.
ggd(1M)
NAME
ggd [ -d [ cd ] ] [ -c cpunum ]
DESCRIPTION
ggd manages the I/O-rate guarantees that have been granted to processes on the system. The daemon is
started from a script in the /etc/rc2.d directory. It reads the /etc/grio_config and /etc/grio_disks files to obtain
information about the available hardware devices. Processes can make requests for I/O-rate guarantees
by using the grio_request(3X) library call. After determining if the I/O rate can be guaranteed, the daemon returns a confirmation or rejection to the calling process.
The /etc/grio_config and /etc/grio_disks files are only read when the daemon is started. If these files are
edited, the daemon must be stopped and restarted in order to use the new information.
The -d option with the d modifier causes verbose debugging information to be displayed. The -d option
with the c modifier causes the checksum processing of the /etc/grio_config file to be disabled. This allows
the system administrator to customize a system by editing the configuration file. The -c option causes the
daemon to mark the given cpunum CPU as a real-time CPU. The CPU is isolated from the rest of the
processors on the system, and the ggd daemon is allowed to run only on this CPU. See the sysmp(2)
reference page for more information on real-time processing.
FILES
/etc/grio_config
/etc/grio_disks
SEE ALSO
If the ggd daemon is killed and restarted, all previous rate guarantees will become invalid. It creates a
lock file, /tmp/grio.lock, to prevent more than one copy of the daemon from running concurrently. If the
daemon is killed, this file must be removed before it can be successfully restarted.
lv_to_xlv(1M)
NAME
lv_to_xlv parses the file describing the logical volumes used by the local machine and generates the
required xlv_make(1M) commands to create an equivalent XLV volume. Normally, lv_to_xlv uses the logical volume file /etc/lvtab, but when the -f option is specified, the given argument lvtab_file is used. If the
-o option is specified, the xlv_make(1M) commands are sent to the file output_file instead of stdout.
FILES
/etc/lvtab
SEE ALSO
mkfs(1M)
NAME
mkfs constructs a filesystem by writing on the special file given as one of the command line arguments.
The filesystem constructed is either an EFS filesystem or an XFS filesystem depending on the arguments
given. mkfs constructs EFS filesystems by executing mkfs_efs; XFS filesystems are constructed by executing mkfs_xfs. The filesystem type chosen can be forced with the -t option (also spelled -F). If one of those
options is not given, mkfs determines which filesystem type to construct by examining its arguments.
SEE ALSO
mkfs_efs(1M), mkfs_xfs(1M)
mkfs_xfs(1M)
NAME
mkfs_xfs constructs an XFS filesystem by writing on a special file using the values found in the arguments
of the command line. It is invoked automatically by mkfs(1M) when mkfs is given the -t xfs option or
options that are specific to XFS.
XFS filesystems are composed of a data section, a log section, and optionally a real-time section. This
separation can be accomplished using the XLV volume manager to create a multi-subvolume volume, or
by embedding an internal log section in the data section. In the former case, the xlv-device name is supplied as the final argument. In the latter case a disk partition, lv(7M) logical volume, or XLV logical
volume without a log subvolume may contain the XFS filesystem, which must be named by the -d
name=special option.
Each of the subopt=value elements in the argument list above can be given as multiple comma-separated
subopt=value suboptions if multiple suboptions apply to the same option. Equivalently, each main option
may be given multiple times with different suboptions. For example, -l internal,size=1000b and -l internal -l size=1000b are equivalent.
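The equivalence of the two spellings can be illustrated with a short sketch. The Python below is not part of mkfs_xfs; collect_suboptions is a hypothetical helper, and it assumes for simplicity that every option is followed by exactly one suboption string (which does not hold for flag-style options such as the quiet option).

```python
def collect_suboptions(args):
    """Merge suboptions per main option (hypothetical helper, not mkfs_xfs code).

    Assumes args alternates between an option and its suboption string.
    """
    merged = {}
    for option, value in zip(args[0::2], args[1::2]):
        suboptions = merged.setdefault(option, {})
        for item in value.split(","):
            name, _, setting = item.partition("=")
            # A suboption given without "=value" (such as "internal") maps to None.
            suboptions[name] = setting if setting else None
    return merged


combined = collect_suboptions(["-l", "internal,size=1000b"])
repeated = collect_suboptions(["-l", "internal", "-l", "size=1000b"])
print(combined == repeated)
```

Both argument lists reduce to the same mapping for the log option, which is why the two forms can be used interchangeably.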
In the descriptions below, sizes are given in bytes, blocks, kilobytes, or megabytes. Sizes are treated as
hexadecimal if prefixed by 0x or 0X, octal if prefixed by 0, or decimal otherwise. If suffixed with b then
the size is converted by multiplying it by the filesystem's block size. If suffixed with k then the size is
converted by multiplying it by 1024. If suffixed with m then the size is converted by multiplying it by
1048576 (1024 * 1024).
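The conversion rules above can be sketched in Python as follows. This is an illustration only, not the mkfs_xfs implementation: parse_size is a hypothetical name, the block size must be passed in by the caller, and the sketch assumes digits are consumed greedily, so a trailing b on a hexadecimal value is read as a hexadecimal digit rather than as the block suffix.

```python
import string


def parse_size(text, block_size):
    """Convert a size argument to bytes (illustrative sketch, not mkfs_xfs code)."""
    if text[:2] in ("0x", "0X"):
        base, digits, body = 16, string.hexdigits, text[2:]
    elif text.startswith("0"):
        # The leading 0 is itself a valid octal digit, so it stays in the body.
        base, digits, body = 8, string.octdigits, text
    else:
        base, digits, body = 10, string.digits, text

    # Assumption: digits are consumed greedily, so "0x1b" is the number 0x1b,
    # not 0x1 filesystem blocks.
    end = 0
    while end < len(body) and body[end] in digits:
        end += 1
    number, suffix = body[:end], body[end:]

    multipliers = {"": 1, "b": block_size, "k": 1024, "m": 1024 * 1024}
    if not number or suffix not in multipliers:
        raise ValueError(f"invalid size: {text!r}")
    return int(number, base) * multipliers[suffix]
```

For instance, with a 4096-byte block size, parse_size("1000b", 4096) multiplies 1000 by 4096, while parse_size("010k", 4096) reads 010 as octal 8 and multiplies it by 1024.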
-b
The name suboption is used to specify the name of the special file containing the filesystem. In
this case, the log section must be specified as internal (with a size, see the -l option below) and
there can be no real-time section. Either the block or character special device can be supplied. An
XLV logical volume with a log subvolume cannot be supplied here.
The file suboption is used to specify that the file given by the name suboption is a regular file.
The suboption value is either 0 or 1, with 1 signifying that the file is regular. This suboption is
used only to make a filesystem image (for instance, a miniroot image).
The size suboption is used to specify the size of the data section. This suboption is required if -d
file=1 is given. Otherwise, it is only needed if the filesystem should occupy less space than the
size of the special file.
-i
Inode options.
This option specifies the inode size of the filesystem. The XFS inode contains a fixed-size part and
a variable-size part. The variable-size part, whose size is affected by this option, can contain:
directory data, for small directories; symbolic link data, for small symbolic links; the extent list for
the file, for files with a small number of extents; and the root of a tree describing the location of
extents for the file, for files with a large number of extents.
The valid suboptions are: log=value, perblock=value, and size=value; only one can be supplied.
The inode size is specified either as a base two logarithm value with log=, in bytes with size=, or
as the number fitting in a filesystem block with perblock=. The default value is 256 bytes. The
minimum value for inode size is 128, and the maximum value is 2048 (2 KB) subject to the restriction that the inode size cannot exceed one half of the filesystem block size.
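The three ways of giving the inode size, and the limits on the result, can be sketched as below. This is a hedged illustration rather than mkfs_xfs code: resolve_inode_size is a hypothetical name, the block size is supplied by the caller, and perblock is assumed to divide the block size evenly.

```python
def resolve_inode_size(block_size, log=None, perblock=None, size=None):
    """Return the inode size in bytes implied by one -i suboption (sketch)."""
    given = [value for value in (log, perblock, size) if value is not None]
    if len(given) > 1:
        raise ValueError("only one of log, perblock, and size can be supplied")

    if log is not None:
        inode_size = 1 << log                 # base two logarithm of the size
    elif perblock is not None:
        inode_size = block_size // perblock   # assumption: divides evenly
    elif size is not None:
        inode_size = size                     # size given directly in bytes
    else:
        inode_size = 256                      # documented default

    if not 128 <= inode_size <= 2048:
        raise ValueError("inode size must be between 128 and 2048 bytes")
    if inode_size > block_size // 2:
        raise ValueError("inode size cannot exceed half the block size")
    return inode_size
```

With a 4096-byte block, log=9 and perblock=8 both yield 512-byte inodes; with a 512-byte block, a 512-byte inode is rejected because it exceeds half the block size.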
-p protofile
If the optional -p protofile argument is given, mkfs_xfs uses protofile as a prototype file and takes its
directions from that file. The blocks and inodes specifiers in the protofile are provided for backwards compatibility, but are otherwise unused. The prototype file contains tokens separated by
spaces or newlines. A sample prototype specification follows (line numbers have been added to
aid in the explanation):
    1   /stand/diskboot
    2   4872 110
    3   d--777 3 1
    4   usr   d--777 3 1
    5   sh    ---755 3 1 /bin/sh
    6   ken   d--755 6 1
    7         $
    8   b0    b--644 3 1 0 0
    9   c0    c--644 3 1 0 0
   10   fifo  p--644 3 1
   11   slink l--644 3 1 /a/symbolic/link
   12   : This is a comment line
   13   $
   14   $
Line 1 is a dummy string. (It was formerly the bootfilename.) It is present for backward compatibility; boot blocks are not used on SGI machines.
Note that some string of characters must be present as the first line of the proto file to cause it to
be parsed correctly; the value of this string is immaterial since it is ignored.
Line 2 contains two numeric values (formerly the numbers of blocks and inodes). These are also
merely for backward compatibility: two numeric values must appear at this point for the proto
file to be correctly parsed, but their values are immaterial since they are ignored.
Lines 3-11 tell mkfs_xfs about files and directories to be included in this filesystem. Line 3
specifies the root directory. Lines 4-6 and 8-10 specify other directories and files. Note the special symbolic link syntax on line 11.
The $ on line 7 tells mkfs_xfs to end the branch of the filesystem it is on, and continue from the
next higher directory. It must be the last character on a line. The colon on line 12 introduces a
comment; all characters up until the following newline are ignored. Note that this means you
may not have a file in a prototype file whose name contains a colon. The $ on lines 13 and 14 end
the process, since no additional specifications follow.
File specifications give the mode, the user ID, the group ID, and the initial contents of the file.
Valid syntax for the contents field depends on the first character of the mode.
The mode for a file is specified by a 6-character string. The first character specifies the type of the
file. The character range is -bcdpl to specify regular, block special, character special, directory
files, named pipes (fifos), and symbolic links, respectively. The second character of the mode is
either u or - to specify setuserID mode or not. The third is g or - for the setgroupID mode. The
rest of the mode is a 3-digit octal number giving the owner, group, and other read, write, execute
permissions (see chmod(1)).
Two decimal number tokens come after the mode; they specify the user and group IDs of the
owner of the file.
If the file is a regular file, the next token of the specification may be a pathname from which the
contents and size are copied. If the file is a block or character special file, two decimal numbers
follow that give the major and minor device numbers. If the file is a symbolic link, the next token
of the specification is used as the contents of the link. If the file is a directory, mkfs_xfs makes the
entries . and .. and then reads a list of names and (recursively) file specifications for the entries in
the directory. As noted above, the scan is terminated with the token $.
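The layout of a single file specification can be made concrete with a small parser. The sketch below is not taken from mkfs_xfs; parse_file_spec and FILE_TYPES are hypothetical names. It takes the tokens that follow a file name (mode, user ID, group ID, and any contents tokens) and uses - for a regular file and for unset setuserID and setgroupID positions, as in the sample prototype specification.

```python
FILE_TYPES = {
    "-": "regular file",
    "b": "block special",
    "c": "character special",
    "d": "directory",
    "p": "named pipe",
    "l": "symbolic link",
}


def parse_file_spec(tokens):
    """Split one prototype file specification into its fields (sketch)."""
    mode, uid, gid, *contents = tokens
    if len(mode) != 6:
        raise ValueError(f"mode must be 6 characters: {mode!r}")
    file_type, setuid, setgid, perms = mode[0], mode[1], mode[2], mode[3:]
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    if setuid not in ("u", "-") or setgid not in ("g", "-"):
        raise ValueError(f"bad setuserID/setgroupID flags in {mode!r}")
    if any(digit not in "01234567" for digit in perms):
        raise ValueError(f"permissions must be octal: {perms!r}")
    return {
        "type": FILE_TYPES[file_type],
        "setuid": setuid == "u",
        "setgid": setgid == "g",
        "permissions": int(perms, 8),
        "uid": int(uid),
        "gid": int(gid),
        # Source pathname, device numbers, or link target, depending on type.
        "contents": contents,
    }


spec = parse_file_spec("---755 3 1 /bin/sh".split())
```

Here spec describes a regular file with permissions 0755, user ID 3, group ID 1, and contents copied from /bin/sh. How the contents tokens are interpreted depends on the file type, as described above.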
-q
Quiet option.
Normally mkfs_xfs prints the parameters of the filesystem to be constructed; the -q flag
suppresses this.
SEE ALSO
mkfs(1M), mkfs_efs(1M)
BUGS
RESTORE(1M)
NAME
restore reads tapes dumped with the dump(1M) command and restores them relative to the current directory. Its actions are controlled by the key argument. The key is a string of characters containing at most
one function letter and possibly one or more function modifiers. Any arguments supplied for specific
options are given as subsequent words on the command line, in the same order as that of the options
listed. Other arguments to the command are file or directory names specifying the files that are to be
restored. Unless the h key is specified (see below), the appearance of a directory name refers to the files
and (recursively) subdirectories of that directory.
The function portion of the key is specified by one of the following letters:
r
Restore the entire tape. The tape is read and its full contents loaded into the current directory. This
should not be done lightly; the r key should only be used to restore a complete level 0 dump tape
onto a clear file system or to restore an incremental dump tape after a full level zero restore. Thus
/etc/mkfs /dev/dsk/dks0d2s0
/etc/mount /dev/dsk/dks0d2s0 /mnt
cd /mnt
restore r
is a typical sequence to restore a complete dump. Another restore can be done to get an incremental
dump in on top of this. Note that restore leaves a file restoresymtable in the root directory to pass
information between incremental restore passes. This file should be removed when the last incremental tape has been restored. Also, see the note in the BUGS section below.
R
Resume restoring. restore requests a particular tape of a multi-volume set on which to restart a full
restore (see the r key above). This allows restore to be interrupted and then restarted.
x
The named files are extracted from the tape. If the named file matches a directory whose contents
had been written onto the tape, and the h key is not specified, the directory is recursively extracted.
The owner, modification time, and mode are restored (if possible). If no file argument is given, then
the root directory is extracted, which results in the entire content of the tape being extracted, unless
the h key has been specified.
t
The names of the specified files are listed if they occur on the tape. If no file argument is given, then
the root directory is listed, which results in the entire content of the tape being listed, unless the h
key has been specified. Note that the t key replaces the function of the old dumpdir program.
i
This mode allows interactive restoration of files from a dump tape. After reading in the directory
information from the tape, restore provides a shell-like interface that allows the user to move
around the directory tree selecting files to be extracted. The available commands are given below;
for those commands that require an argument, the default is the current directory.
ls [arg] List the current or specified directory. Entries that are directories are appended with a
/. Entries that have been marked for extraction are prepended with a *. If the verbose
key is set the inode number of each entry is also listed.
cd arg Change the current working directory to the specified argument.
pwd Print the full pathname of the current working directory.
add [arg] The current directory or specified argument is added to the list of files to be extracted. If
a directory is specified, then it and all its descendents are added to the extraction list (unless
the h key is specified on the command line). Files that are on the extraction list are prepended
with a * when they are listed by ls.
delete [arg] The current directory or specified argument is deleted from the list of files to be
extracted. If a directory is specified, then it and all its descendents are deleted from the extraction list (unless the h key is specified on the command line). The most expedient way to
extract most of the files from a directory is to add the directory to the extraction list and then
delete those files that are not needed.
extract All the files that are on the extraction list are extracted from the dump tape. restore will
ask which volume the user wishes to mount. The fastest way to extract a few files is to start
with the last volume, and work towards the first volume.
setmodes All the directories that have been added to the extraction list have their owner, modes,
and times set; nothing is extracted from the tape. This is useful for cleaning up after a restore
has been prematurely aborted.
verbose The sense of the v key is toggled. When set, the verbose key causes the ls command to list
the inode numbers of all entries. It also causes restore to print out information about each file
as it is extracted.
help List a summary of the available commands.
quit restore immediately exits, even if the extraction list is not empty.
The following characters may be used in addition to the letter that selects the function desired.
b
The next argument to restore is used as the block size of the tape (in kilobytes). If the b option is not
specified, restore tries to determine the tape block size dynamically, but will only be able to do so if
the block size is 32 or less. For larger sizes, the b option must be used with restore.
f
The next argument to restore is used as the name of the archive instead of /dev/tape. If the name of the
file is -, restore reads from standard input. Thus, dump(1M) and restore can be used in a pipeline
to dump and restore a file system with the command
dump 0f - /usr | (cd /mnt; restore xf -)
If the name of the file is of the format machine:device then the filesystem dump is restored from the
specified machine over the network. restore creates a remote server /etc/rmt, on the client machine
to access the tape device. Since restore is normally run by root, the name of the local machine must
appear in the .rhosts file of the remote machine. If the file name argument is of the form
user@machine:device, restore will attempt to execute as the specified user on the remote machine. The
specified user must have a .rhosts file on the remote machine that allows root from the local
machine.
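The forms of archive name accepted by the f key can be summarized with a small classifier. This is a sketch of the naming rules as described above, not code from restore; Archive and parse_archive_name are hypothetical names, and the sketch assumes that any name containing a colon is a remote name.

```python
from typing import NamedTuple, Optional


class Archive(NamedTuple):
    kind: str                 # "stdin", "local", or "remote"
    user: Optional[str]       # remote user, if one was given
    machine: Optional[str]    # remote machine, for remote archives
    device: Optional[str]     # device or file name, if any


def parse_archive_name(name):
    """Classify the argument of the f key (illustrative sketch)."""
    if name == "-":
        return Archive("stdin", None, None, None)
    if ":" not in name:
        return Archive("local", None, None, name)
    host, device = name.split(":", 1)
    # rpartition leaves the separator empty when no "@" is present.
    user, at, machine = host.rpartition("@")
    return Archive("remote", user if at else None, machine, device)
```

For example, parse_archive_name("guest@kestrel.sgi.com:/dev/tape") yields a remote archive with user guest, machine kestrel.sgi.com, and device /dev/tape.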
Normally restore does its work silently. The v (verbose) key causes it to type the name of each file it
treats preceded by its file type.
restore will not ask whether it should abort the restore if it gets a tape error. It will always try to skip
over the bad tape block(s) and continue as best it can.
restore will extract by inode numbers rather than by file name. This is useful if only a few files are
being extracted, and one wants to avoid regenerating the complete pathname to the file.
restore extracts the actual directory, rather than the files that it references. This prevents hierarchical restoration of complete subtrees from the tape.
The next argument to restore is a number which selects the dump file when there are multiple dump
files on the same tape. File numbering starts at 1.
Only those files which are newer than the file specified by the next argument are considered for restoration. restore looks at the modification time of the specified file using the stat(2) system call.
Restores only non-existent files or newer versions (as determined by the file status change time
stored in the dump file) of existing files. Note that the ls(1) command shows the modification time
and not the file status change time. See stat(2) for more details.
Normally restore does not use chown(2) to restore files to the original user and group id unless it is
being run by the super-user (or with the effective user id of zero). This is to provide Berkeley style
semantics. This can be overridden with the o option which will result in restore attempting to
restore the original ownership to the files.
Do not write anything to the disk. This option can be used to validate the tapes after a dump. If
invoked with the "r" option, restore goes through the motion of reading all the dump tapes without
actually writing anything to the disk.
DIAGNOSTICS
Error while writing to file /tmp/rstdir*
An error was encountered while writing to the temporary file containing information about the
owner, mode and timestamp information of directories. Use the TMPDIR environment variable to
relocate this file in a directory which has more space available.
EXAMPLES
restore r
will restore the entire tape into the current directory, reading from the default tape device /dev/tape.
restore rf guest@kestrel.sgi.com:/dev/tape
will restore the entire tape into the current directory, reading from the remote tape device /dev/tape on
host kestrel.sgi.com using the guest account.
restore x /etc/hosts /etc/fstab /etc/myfile
will restore the three specified files into the current directory, reading from the default tape device
/dev/tape.
restore x /dev/dsk
will restore the entire /dev/dsk directory and subdirectories recursively into the current directory, reading from the default tape device /dev/tape.
restore rN
will read the entire tape and go through all the motions of restoring the entire dump, without writing to
the disk. This can be used to validate the dump tape.
restore xe /usr/dir/foo
will restore (recursively) all files in the given directory /usr/dir/foo. However, no existing files are
overwritten.
restore xn /usr/dir/bar
will restore (recursively) all files which are newer than the given file /usr/dir/bar.
FILES
/dev/tape
This is the default tape device used unless the environment variable TAPE is set.
/tmp/rstdir*
This temporary file contains the directories on the tape. If the environment variable TMPDIR is set,
then the file will be created in that directory.
/tmp/rstmode*
This temporary file contains the owner, mode, and time stamps for directories. If the environment
variable TMPDIR is set, then the file will be created in that directory.
./restoresymtable
Information is passed between incremental restores in this file.
SEE ALSO
BUGS
restore can get confused when doing incremental restores from dump tapes that were made on active file
systems.
A level 0 dump must be done after a full restore. Because restore runs in user code, it has no control
over inode allocation. This results in the files being restored having an inode numbering different from
the filesystem that was originally dumped. Thus a full dump must be done to get a new set of directories
reflecting the new inode numbering, even though the contents of the files are unchanged, so that later incremental dumps will be correct.
Existing dangling symlinks are modified even if the e option is supplied, if the dump tape contains a hard
link by the same name.
xfs_check(1M)
NAME
xfs_check checks whether an XFS filesystem is consistent. It is normally run only when there is reason to
believe that the filesystem has a consistency problem. The filesystem to be checked is specified by the
xlvspecial or diskspecial argument, which should be the disk or volume device for the filesystem. Filesystems stored in files can also be checked, using the -f flag. The filesystem should normally be unmounted
or read-only during the execution of xfs_check; otherwise spurious problems are reported.
The options to xfs_check are:
-d
Specifies that the special device is a disk partition name or an lv(7M) volume name (as opposed
to an XLV logical volume).
-f
Specifies that the special device is actually a file (see the mkfs_xfs -d file option). This might happen if an image copy of a filesystem has been made into an ordinary file.
-s
Specifies that only serious errors should be reported. Serious errors are those that make it
impossible to find major data structures in the filesystem. This option can be used to cut down
the amount of output when there is a serious problem and the volume of output would otherwise
make it difficult to see what the real problem is.
-v
Specifies verbose output; it is impossibly long for a reasonably-sized filesystem. This option is
intended for internal use only.
-i ino
Specifies verbose behavior for a specific inode. For instance, it can be used to locate all the
blocks associated with a given inode.
Any output from xfs_check means that the filesystem has an inconsistency. The only repair mechanism
available is to dump the filesystem with xfsdump(1M), then use mkfs_xfs(1M) to make a new filesystem,
then use xfsrestore(1M) to restore the data.
DIAGNOSTICS
Under two circumstances, xfs_check unfortunately might dump core rather than produce useful output.
First, if the filesystem is completely corrupt, a core dump might be produced instead of the message xxx
is not a valid filesystem. Second, if the filesystem is very large (has many files) then xfs_check
might run out of memory.
The following is a description of the most likely problems and the associated messages. Most of the diagnostics produced are only meaningful with an understanding of the structure of the filesystem.
xxx is not an XLV volume device name
The -d option is needed for filesystems that reside in disk partitions instead of in XLV
volumes.
agf_freeblks n, counted m in ag a
The freeblocks count in the allocation group header for allocation group a doesn't match
the number of blocks counted free.
agf_longest n, counted m in ag a
The longest free extent in the allocation group header for allocation group a doesn't
match the longest free extent found in the allocation group.
agi_count n, counted m in ag a
The allocated inode count in the allocation group header for allocation group a doesn't
match the number of inodes counted in the allocation group.
agi_freecount n, counted m in ag a
The free inode count in the allocation group header for allocation group a doesn't match
the number of inodes counted free in the allocation group.
block a/b expected inum 0 got i
The block number is specified as a pair (allocation group number, block in the allocation
group). The block is used multiple times (shared), between multiple inodes. This message usually follows a message of the next type.
block a/b expected type unknown got y
The block is used multiple times (shared).
block a/b type unknown not expected
The block is unaccounted for (not in the freelist and not in use).
link count mismatch for inode nnn (name xxx), nlink m, counted n
The inode has a bad link count (number of references in directories).
rtblock b expected inum 0 got i
The block is used multiple times (shared), between multiple inodes. This message usually follows a message of the next type.
rtblock b expected type unknown got y
The real-time block is used multiple times (shared).
rtblock b type unknown not expected
The real-time block is unaccounted for (not in the freelist and not in use).
sb_fdblocks n, counted m
The number of free data blocks recorded in the superblock doesn't match the number
counted free in the filesystem.
sb_frextents n, counted m
The number of free real-time extents recorded in the superblock doesn't match the
number counted free in the filesystem.
sb_icount n, counted m
The number of allocated inodes recorded in the superblock doesn't match the number
allocated in the filesystem.
sb_ifree n, counted m
The number of free inodes recorded in the superblock doesn't match the number free in
the filesystem.
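Several of the messages above share the shape "field n, counted m", optionally followed by "in ag a". A script that post-processes xfs_check output could pick these apart along the following lines. This is a sketch only: parse_count_mismatch is a hypothetical helper, it assumes that n, m, and a are printed as decimal integers, and the sample values are made up.

```python
import re

# Matches "field n, counted m" with an optional " in ag a" tail.
COUNT_MISMATCH = re.compile(
    r"^(?P<field>[a-z_]+) (?P<recorded>\d+), counted (?P<counted>\d+)"
    r"(?: in ag (?P<ag>\d+))?$"
)


def parse_count_mismatch(line):
    """Return the fields of a counter-mismatch message, or None (sketch)."""
    match = COUNT_MISMATCH.match(line.strip())
    if match is None:
        return None
    ag = match.group("ag")
    return {
        "field": match.group("field"),
        "recorded": int(match.group("recorded")),
        "counted": int(match.group("counted")),
        # Superblock counters (sb_*) carry no allocation group number.
        "ag": int(ag) if ag is not None else None,
    }


# Made-up sample line in the documented format.
sample = parse_count_mismatch("agf_freeblks 100, counted 98 in ag 3")
```

Messages of the other kinds (for example the block and rtblock messages) do not match the pattern, and the helper returns None for them.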
SEE ALSO
xfs_estimate(1M)
NAME
For each directory argument, xfs_estimate estimates the space that directory would take if it were copied to
an XFS filesystem. Note that xfs_estimate does not cross mount points. Also, the following definitions are
used: KB = *1024, MB = *1024*1024, GB = *1024*1024*1024.
-b blocksize
Use blocksize instead of the default blocksize of 4096 bytes. The modifier k may be used after the
number to indicate multiplication by 1024. For example,
xfs_estimate -b 64k /
requests an estimate of the space required by the directory / on an XFS filesystem using a blocksize of 64k (65536) bytes.
-v
-i, -e logsize
Use logsize instead of the default log size of 10 MB. -i refers to an internal log, while -e refers to
an external log. The modifier k or m may be used after the number to indicate multiplication by
1024 or 1048576, respectively.
For example,
xfs_estimate -i 1m /
requests an estimate of the space required by the directory / on an XFS filesystem using an internal log of 1 megabyte.
EXAMPLES
% xfs_estimate /var/tmp
/var/tmp will take about 14 megabytes
xfs_growfs(1M)
NAME
xfs_growfs expands an existing XFS filesystem (see xfs(4)). The mount-point argument should be the pathname of the directory where the filesystem is mounted. The filesystem must be mounted to be grown (see
mount(1M)). The existing contents of the filesystem are undisturbed, and the added space becomes available for additional file storage.
The -d option specifies that the data section of the filesystem will be grown. If the -D size option is given,
the data section will be grown to that size; otherwise the data section will be grown to the largest size possible. The size is expressed in filesystem blocks.
The -r option specifies that the real-time section of the filesystem will be grown. If the -R size option is
given, the real-time section will be grown to that size; otherwise the real-time section will be grown to the
largest size possible. The size is expressed in filesystem blocks. The filesystem does not need to have contained a real-time section before the growfs operation.
The -l option specifies that the log section of the filesystem will be grown, shrunk, or moved. If the -L
size option is given, the log section will be changed to be that size, if possible. The size is expressed in
filesystem blocks. The size of an internal log must be smaller than the size of an allocation group (this
value is printed at mkfs(1M) time). If the -i option is given, the new log will be an internal log (inside the
data section). If the -x option is given, the new log will be an external log (in an XLV log subvolume). If
neither -i nor -x is given with -l, then the log will continue to be internal or external as it was before.
xfs_growfs is most often used in conjunction with logical volumes, see xlv(7M) or lv(7M). However, it can
also be used on a regular disk partition, for example if a partition has been enlarged while retaining the
same starting block.
PRACTICAL USE
Filesystems normally occupy all of the space on the device where they reside. In order to grow a filesystem, it is necessary to provide added space for it to occupy. Therefore there must be at least one spare
new disk partition available. Adding the space is done through the mechanism of logical volumes. If the
filesystem already resides on a logical volume, the volume is simply extended using mklv(1M) or
xlv_admin(1M). If the filesystem is currently on a regular partition, it is necessary to create a new logical
volume whose first member is the existing partition, with subsequent members being the new partition(s)
to be added. Again, mklv or xlv_admin is used for this. In either case xfs_growfs is run on the mounted
filesystem, and the expanded filesystem is then available for use.
SEE ALSO
xfsdump(1M)
NAME
xfsdump backs up files in a filesystem. The files are dumped to storage media, a regular file, or standard
output. Options allow the operator to have all files dumped, just files that have changed since a previous
dump, or just files contained in a list of pathnames.
The xfsrestore(1M) utility re-populates a filesystem with the contents of the dump.
Each invocation of xfsdump dumps just one filesystem. That invocation is termed a dump session. The
dump session sends a single dump stream to the destination. The dump stream can span several media
objects, and a single media object can contain several dump streams. The typical media object is a tape
cartridge. The media object records the dump stream as one or more media files. A media file is a self-contained partial dump. The portion of the dump stream contained on a media object can be split into
several media files to minimize the impact of media dropouts on the entire dump stream.
xfsdump maintains an online dump inventory in /var/xfsdump/inventory. The -I option displays the inventory contents hierarchically. The levels of the hierarchy are: filesystem, dump session, stream, and media
file.
-f destination
Specifies the dump destination. It can be the pathname of a device (such as a tape drive), a regular
file, or a remote tape drive (see rmt(1M)). This option must be omitted if the standard output option
(a lone - preceding the filesystem specification) is specified.
-l level
Specifies a dump level of 0 to 9. The dump level determines the base dump to which this dump is
relative. The base dump is the most recent dump at a lesser level. A level 0 dump is absolute: all
files are dumped. A dump level where 1 <= level <= 9 is referred to as an incremental dump. Only
files that have been changed since the base dump are dumped. Subtree dumps (see the -s option
below) cannot be used as the base for incremental dumps.
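The rule for choosing the base dump can be sketched as follows. This illustrates the rule as stated above and is not how xfsdump reads its inventory: Dump and find_base_dump are hypothetical names, timestamps are plain integers, and the sketch simply returns None when no eligible base exists.

```python
from typing import NamedTuple


class Dump(NamedTuple):
    time: int              # hypothetical timestamp; larger means more recent
    level: int             # dump level, 0 to 9
    subtree: bool = False  # True if the dump was restricted to subtrees


def find_base_dump(history, level):
    """Return the most recent eligible dump at a lesser level, or None (sketch)."""
    if not 0 <= level <= 9:
        raise ValueError("dump level must be between 0 and 9")
    # Subtree dumps cannot serve as the base for incremental dumps.
    candidates = [d for d in history if d.level < level and not d.subtree]
    return max(candidates, key=lambda d: d.time, default=None)


history = [Dump(1, 0), Dump(2, 1), Dump(3, 2), Dump(4, 1)]
base_for_level_2 = find_base_dump(history, 2)
```

In this made-up history, a new level 2 dump is relative to the level 1 dump taken at time 4, a new level 1 dump is relative to the level 0 dump, and a level 0 dump has no base.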
-s pathname ...
Restricts the dump to files contained in the specified pathnames (subtrees). Up to 100 pathnames
can be specified. A pathname must be relative to the mount point of the filesystem. For example, if a
filesystem is mounted at /d2, the pathname argument for the directory /d2/users is users. A pathname can be a file or a directory; if it is a directory, the entire hierarchy of files and subdirectories
rooted at that directory is dumped. Subtree dumps cannot be used as the base for incremental
dumps (see the -l option above).
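Deriving the subtree argument from an absolute pathname is a matter of making it relative to the mount point. A minimal sketch, assuming POSIX-style absolute paths and using the hypothetical helper name subtree_argument:

```python
import posixpath


def subtree_argument(mount_point, path):
    """Return path relative to mount_point, as a subtree argument (sketch)."""
    relative = posixpath.relpath(path, mount_point)
    # A result that climbs out of the mount point is not inside the filesystem.
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{path} is not under {mount_point}")
    return relative


argument = subtree_argument("/d2", "/d2/users")
```

With the filesystem mounted at /d2, the directory /d2/users maps to the argument users, matching the example above.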
-v verbosity_level
Specifies the level of detail of the messages displayed during the course of the dump. The argument
can be silent, verbose, or trace. The default is verbose.
-F
Don't prompt the operator. When xfsdump encounters a media object containing non-xfsdump data,
xfsdump normally asks the operator for permission to overwrite. With this option the overwrite is
performed, no questions asked. When xfsdump encounters end-of-media, xfsdump normally asks the
operator if another media object will be provided. With this option the dump is instead interrupted.
-I
Displays the xfsdump inventory (no dump is performed). xfsdump records each dump session in an
online inventory in /var/xfsdump/inventory. xfsdump uses this inventory to determine the base for
incremental dumps. It is also useful for manually identifying a dump session to be restored.
Suboptions to filter the inventory display are described later.
-J
Inhibits the normal update of the inventory. This is useful when the media being dumped to will be
discarded or overwritten.
-L session_label
Specifies a label for the dump session. It can be any arbitrary string up to 255 characters long.
-M media_label
Specifies a label for all media objects (for example, tape cartridges) written during the session. It
can be any arbitrary string up to 255 characters long.
-R
Resumes a previously interrupted dump session. If the most recent dump at this dump's level (-l
option) was interrupted, this dump contains only files not in the interrupted dump and consistent
with the incremental level. However, files contained in the interrupted dump that have been subsequently modified are re-dumped.
A lone - causes the dump stream to be sent to the standard output, where it can be piped to another
utility such as xfsrestore(1M) or redirected to a file. This option cannot be used with the -f option.
The - must follow all other options, and precede the filesystem specification.
The filesystem, filesystem, can be specified either as a mount point or as a special device file (for example,
/dev/dsk/dks0d1s0). The filesystem must be mounted to be dumped.
NOTES
Dump Interruption
A dump can be interrupted at any time and later resumed. To interrupt, type control-C (or the current
terminal interrupt character). The operator is prompted to select one of several operations, including
dump interruption. After the operator selects dump interruption, the dump continues until a convenient
break point is encountered (typically the end of the current file). Very large files are broken into smaller
subfiles, so the wait for the end of the current file is brief.
Dump Resumption
A previously interrupted dump can be resumed by specifying the -R option. If the most recent dump at
the specified level was interrupted, the new dump does not include files already dumped, unless they
have changed since the interrupted dump.
Media Management
A single media object can contain many dump streams. Conversely, a single dump stream can span multiple media objects. If a dump stream is sent to a media object already containing one or more dumps,
xfsdump appends the new dump stream after the last dump stream. Media files are never overwritten. If
end-of-media is encountered during the course of a dump, the operator is prompted to insert a new
media object into the drive. The dump continuation is appended after the last media file on the new
media object.
Inventory
Each dump session updates an inventory database in /var/xfsdump/inventory. xfsdump uses the inventory
to determine the base of incremental and resumed dumps.
This database can be displayed by invoking xfsdump with the -I option. The display uses tabbed indentation to present the inventory hierarchically. The first level is filesystem. The second level is session. The
third level is media stream (currently only one stream is supported). The fourth level lists the media files
sequentially composing the stream.
Several suboptions are available to filter the display. Specifying -I depth=n (where n is 1, 2, or 3) limits
the hierarchical depth of the display. Specifying -I mobjid=value (where value is a media id) or -I
mobjlabel=value (where value is a media label) limits the display to media files contained in the specified
media object. Similarly, the display can be restricted to a specific filesystem identified by mount point
using -I mnt=host-qualified_mount_point_pathname, by filesystem id using -I fsid=filesystem_id, or by device using -I dev=host-qualified_device_pathname. At most three suboptions may be specified at once: one
to constrain the depth, one to constrain the media object, and one to constrain the filesystem. For example, -I depth=1,mobjlabel="tape 1",mnt=host1:/test_mnt would display only the filesystem information
(depth=1) for those filesystems which were mounted on host1:/test_mnt at the time of the dump, and only
those filesystems dumped to the media object labeled "tape 1".
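The "at most one suboption per constraint" rule can be sketched in Python as follows. This is an illustrative validator, not part of xfsdump: the function name build_inventory_filter and the grouping table are assumptions, while the suboption names and the example values come from the paragraph above.

```python
# Suboption names are from the text; each maps to the constraint it sets.
GROUPS = {
    "depth": "depth",
    "mobjid": "media object",
    "mobjlabel": "media object",
    "mnt": "filesystem",
    "fsid": "filesystem",
    "dev": "filesystem",
}


def build_inventory_filter(**subopts):
    """Return the suboption string for the inventory display.

    Raises ValueError if a suboption is unknown, if depth is out of
    range, or if two suboptions constrain the same thing.
    """
    used = {}
    parts = []
    for name, value in subopts.items():
        if name not in GROUPS:
            raise ValueError(f"unknown suboption: {name}")
        group = GROUPS[name]
        if group in used:
            raise ValueError(f"{name} conflicts with {used[group]}")
        used[group] = name
        if name == "depth" and value not in (1, 2, 3):
            raise ValueError("depth must be 1, 2, or 3")
        text = str(value)
        if " " in text:
            # Quote values containing spaces, as in mobjlabel="tape 1".
            text = f'"{text}"'
        parts.append(f"{name}={text}")
    return ",".join(parts)


example = build_inventory_filter(
    depth=1, mobjlabel="tape 1", mnt="host1:/test_mnt")
print(example)
```

A call such as build_inventory_filter(mnt="host1:/a", fsid="x") is rejected, because both suboptions constrain the filesystem.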
There is currently no way to remove dumps from the inventory.
An additional media file is placed at the end of each dump stream. This media file contains the inventory
information for the current dump session. This is currently unused.
When operating in the miniroot environment, xfsdump does not create and does not reference the inventory database. Thus incremental and resumed dumps are not allowed.
Labels
The operator can specify a label to identify the dump session and a label to identify a media object. The
session label is placed in every media file produced in the course of the dump, and is recorded in the
inventory.
The media label is used to identify media objects, and is independent of the session label. Each media file
on the media object contains a copy of the media label. An error will be returned if the operator specifies
a media label which does not match the media label on a media object containing valid media files.
Media labels are recorded in the inventory.
UUIDs
UUIDs (Universally Unique Identifiers) are used in three places: to identify the filesystem being dumped,
to identify the dump session, and to identify each media object. The inventory display (-I) includes all of
these.
Dump Level Usage
The dump level mechanism provides a structured form of incremental dumps. A dump at a given level
includes only files that have changed since the most recent dump at a lower level. For example,
the operator can establish a dump schedule that involves a full dump every Friday and a daily incremental dump containing only files that have changed since the previous dump. In this case Friday's dump
would be at level 0, Saturday's at level 1, Sunday's at level 2, and so on, up to the Thursday dump at level
6.
The above schedule results in a very tedious restore procedure to fully reconstruct the Thursday version
of the filesystem; xfsrestore would need to be fed all 7 dumps in sequence. A compromise schedule is to
use level 1 on Saturday, Monday, and Wednesday, and level 2 on Sunday, Tuesday, and Thursday. The
Monday and Wednesday dumps would take longer, but the worst case restore requires the accumulation
of just three dumps, one each at level 0, level 1, and level 2.
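The two schedules can be compared with a short Python sketch. It applies the rule stated above (the base of a dump is the most recent earlier dump at a lower level) and walks back from the last dump to the level 0 dump. The function name restore_chain is hypothetical, and the sketch assumes the schedule starts with a level 0 dump.

```python
def restore_chain(levels):
    """Return indexes of the dumps needed to restore the last dump.

    levels[i] is the level of the i-th dump in chronological order.
    The chain is returned oldest first, which is the restore order.
    """
    chain = []
    i = len(levels) - 1
    while i >= 0:
        chain.append(i)
        if levels[i] == 0:
            break
        # Walk back to the base: the most recent earlier dump at a
        # strictly lower level.
        j = i - 1
        while j >= 0 and levels[j] >= levels[i]:
            j -= 1
        i = j
    return chain[::-1]


days = ["Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu"]
daily = [0, 1, 2, 3, 4, 5, 6]        # first schedule in the text
compromise = [0, 1, 2, 1, 2, 1, 2]   # compromise schedule in the text

daily_chain = [days[i] for i in restore_chain(daily)]
compromise_chain = [days[i] for i in restore_chain(compromise)]
print(len(daily_chain), daily_chain)
print(len(compromise_chain), compromise_chain)
```

For the compromise schedule the chain is the Friday, Wednesday, and Thursday dumps, which matches the three-dump worst case described above.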
Miniroot Restrictions
xfsdump is subject to the following restrictions when operated in the miniroot environment: dumps cannot be resumed, incremental dumps are not available, the online inventory is not used, and I/O is synchronous.
FILES
/var/xfsdump/inventory
SEE ALSO
rmt(1M), xfsrestore(1M)
DIAGNOSTICS
The exit code is 0 on normal completion, non-zero if an error occurs or the dump is terminated by the
operator.
BUGS
xfsdump always rewinds tape media, then seeks to the end of the last dump prior to appending the
current dump.
Some of the command line options are not checked until the media has been rewound. Thus, errors in
those options are not reported immediately.
xfsdump does not dump unmounted filesystems.
The dump frequency field of /etc/fstab is not supported.
xfsdump does not have the capability to send mail when operator intervention is required.
Only one -f option is allowed, because xfsdump does not have the capability to partition the dump into
multiple streams, each directed to a different media drive.
No means is provided to remove media objects from the inventory.
xfsdump requires root privilege (except for inventory display).
xfsdump can only dump XFS filesystems.
The media format used by xfsdump can only be understood by xfsrestore.
Dumps may not be written to fixed block size tape devices via the remote tape device interface.
xfsdump does not know how to manage CD-ROM or other removable disk drives.
xfsrestore(1M)
NAME
xfsrestore restores filesystems from dumps produced by xfsdump(1M). Two modes of operation are available: simple and cumulative.
The default is simple mode. xfsrestore populates the specified destination directory, destination, with the
files contained in the dump media.
The -r option specifies the cumulative mode. Successive invocations of xfsrestore are used to apply a
chronologically ordered sequence of delta dumps to a base (level 0) dump. The contents of the filesystem
at the time each dump was produced are reproduced. This can involve adding, deleting, renaming, linking, and unlinking files and directories.
A delta dump is defined as either an incremental dump (xfsdump -l option with level > 0) or a resumed
dump (xfsdump -R option). The deltas must be applied in the order they were produced. Each delta
applied must have been produced with the previously applied delta as its base.
-a housekeeping
Each invocation of xfsrestore creates a directory called xfsrestorehousekeeping. This directory is normally created directly under the destination directory. The -a option allows the operator to specify
an alternate directory, housekeeping, in which xfsrestore creates the xfsrestorehousekeeping directory.
When performing a cumulative (-r option) restore, each successive invocation of xfsrestore must
specify the same alternate directory.
-e
-f source
Specifies the source of the dump to be restored. This can be the pathname of a device (such as a
tape drive), a regular file, or a remote tape drive (see rmt(1M)). This option must be omitted if the
standard input option (a lone hyphen, -, preceding the destination specification) is specified.
-i
Selects interactive operation. Once the on-media directory hierarchy has been read, an interactive
dialogue is begun. The operator uses a small set of commands to peruse the directory hierarchy,
selecting files and subtrees for extraction. The available commands are given below. Initially nothing is selected, except for those subtrees specified with -s command line options.
ls [arg]
List the entries in the current directory or the specified directory, or the specified
non-directory file entry. Both the entry's original inode number and name are
displayed. Entries that are directories are appended with a /. Entries that have
been selected for extraction are prepended with a *.
cd [arg]
Change the current working directory to the specified argument, or to the filesystem root directory if no argument is specified.
pwd
Print the pathname of the current directory, relative to the filesystem root.
add [arg]
The current directory or specified file or directory within the current directory is
selected for extraction. If a directory is specified, then it and all its descendents are
selected. Entries that are selected for extraction are prepended with a * when
they are listed by ls.
delete [arg]
The current directory or specified file or directory within the current directory is
deselected for extraction. If a directory is specified, then it and all its descendents
are deselected. The most expedient way to extract most of the files from a directory is to select the directory and then deselect those files that are not needed.
extract
Ends the interactive dialogue, and causes all selected subtrees to be restored.
quit
xfsrestore ends the interactive dialogue and immediately exits, even if there are files
or subtrees selected for extraction.
help
Simultaneous cumulative (-r option) and interactive restores are not allowed.
-n file
Allows xfsrestore to restore only files newer than file. The modification time of file (that is, as displayed
with the ls -l command) is compared to the i-node modification time of each file on the source
media (that is, as displayed with the ls -lc command). A file is restored from media only if its i-node
modification time is greater than or equal to the modification time of file.
-r
-s subtree
Specifies a subtree to restore. Any number of -s options are allowed. The restore is constrained to
the union of all subtrees specified. Each subtree is specified as a pathname relative to the restore
destination. If a directory is specified, the directory and all files beneath that directory are restored.
Simultaneous cumulative (-r option) and subtree restores are not allowed.
-t
Displays the contents of the dump, but does not create or modify any files or directories. It may be
desirable to set the verbosity level to silent when using this option.
-v verbosity_level
Specifies the level of detail of the messages displayed during the course of the restore. The argument can be silent, verbose, or trace. The default is verbose.
-E
Prevents xfsrestore from overwriting newer versions of files. The i-node modification time of the
on-media file is compared to the i-node modification time of corresponding file in the destination
directory. The file is restored only if the on-media version is newer than the version in the destination directory. The i-node modification time of a file can be displayed with the ls -lc command.
-I
Causes the xfsdump inventory to be displayed (no restore is performed). Each time xfsdump is used,
an online inventory in /var/xfsdump/inventory is updated. This is used to determine the base for
incremental dumps. It is also useful for manually identifying a dump session to be restored (see the
-L and -S options). Suboptions to filter the inventory display are described later.
-L session_label
Specifies the label of the dump session to be restored. The source media is searched for this label. It
can be any string up to 255 characters long. The label of the desired dump session can be
copied from the inventory display produced by the -I option.
-S session_id
Specifies the session UUID of the dump session to be restored. The source media is searched for
this UUID. The UUID of the desired dump session can be copied from the inventory display produced by the -I option.
-
A lone hyphen (-) causes the standard input to be read as the source of the dump to be restored. Standard
input can be a pipe from another utility (such as xfsdump(1M)) or a redirected file. This option cannot be used with the -f option. The hyphen must follow all other options and precede the destination
specification.
The dumped filesystem is restored into the destination directory. There is no default; the destination must
be specified.
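The time comparisons made by the -n and -E options can be summarized in a small Python sketch. The function names are hypothetical and the times are plain numbers (for example, seconds as reported by a stat call). Treating a file that is absent from the destination as always restorable is an assumption, since the text does not cover that case.

```python
def passes_n_option(media_ctime, reference_mtime):
    """-n rule: restore if the on-media i-node modification time is
    greater than or equal to the modification time of the reference file."""
    return media_ctime >= reference_mtime


def passes_E_option(media_ctime, dest_ctime):
    """-E rule: restore only if the on-media version is newer than the
    copy already in the destination directory.

    dest_ctime is None when no such file exists in the destination;
    allowing the restore in that case is an assumption of this sketch.
    """
    return dest_ctime is None or media_ctime > dest_ctime


# Equal times: -n restores the file, -E does not overwrite it.
print(passes_n_option(1000, 1000))
print(passes_E_option(1000, 1000))
```

Note the asymmetry: -n uses "greater than or equal", while -E requires the on-media version to be strictly newer.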
NOTES
Cumulative Restoration
A base (level 0) dump and an ordered set of delta dumps can be sequentially restored, each on top of the
previous, to reproduce the contents of the original filesystem at the time the last delta was produced. The
operator invokes xfsrestore once for each dump. The -r option must be specified. The destination directory must be the same for all invocations. Each invocation leaves a directory named xfsrestorehousekeeping
in the destination directory (however, see the -a option above). This directory contains the state information that must be communicated between invocations. The operator must remove this directory after the
last delta has been applied.
xfsrestore also generates a directory named orphanage in the destination directory. xfsrestore removes this
directory after completing a simple restore. However, if orphanage is not empty, it will not be removed.
This can happen if files present on the dump media are not referenced by any of the restored directories.
The orphanage has an entry for each such file. The entry name is the file's original inode number.
xfsrestore does not remove the orphanage after cumulative restores. As with the xfsrestorehousekeeping directory, the operator must remove it after applying all delta dumps.
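The cumulative procedure can be sketched as a Python helper that builds one xfsrestore argument list per dump and lists the directories the operator removes afterwards. The helper names and the example paths are hypothetical, and nothing is executed. The sketch assumes each dump is read from a regular file or device given with -f.

```python
import os


def cumulative_restore_commands(sources, destination, housekeeping=None):
    """Build one xfsrestore argument list per dump.

    sources must list the base (level 0) dump first, followed by the
    deltas in the order they were produced.
    """
    commands = []
    for source in sources:
        argv = ["xfsrestore", "-r"]
        if housekeeping is not None:
            # The same alternate directory is given on every invocation.
            argv += ["-a", housekeeping]
        argv += ["-f", source, destination]
        commands.append(argv)
    return commands


def leftover_directories(destination, housekeeping=None):
    """Directories to remove after the last delta has been applied."""
    return [
        os.path.join(housekeeping or destination, "xfsrestorehousekeeping"),
        os.path.join(destination, "orphanage"),
    ]


# Example paths are assumptions, not taken from the manual.
commands = cumulative_restore_commands(
    ["/dumps/level0", "/dumps/level1", "/dumps/level2"], "/restore")
for argv in commands:
    print(" ".join(argv))
print("remove afterwards:", leftover_directories("/restore"))
```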
Media Management
A dump consists of one or more media files contained on one or more media objects. A media file contains all or a portion of the filesystem dump. Large filesystems are broken up into multiple media files to
minimize the impact of media dropouts, and to accommodate media object boundaries (end-of-media).
A media object is any storage medium: a tape cartridge, a remote tape device (see rmt(1M)), a regular file,
or the standard input (currently other removable media drives are not supported). Tape cartridges can
contain multiple media files, which are typically separated by (in tape parlance) file marks. If a dump
spans multiple media objects, the restore must begin with the media object containing the first media file
dumped. The operator is prompted when the next media object is needed.
Media objects can contain more than one dump. The operator can select the desired dump by specifying
the dump label (-L option), or by specifying the dump UUID (-S option). If neither is specified, xfsrestore
scans the entire media object, prompting the operator as each dump session is encountered.
The inventory display (-I option) is useful for identifying the media objects required. It is also useful for
identifying a dump session. The session UUID can be copied from the inventory display to the -S option
argument to unambiguously identify a dump session to be restored.
Dumps placed in regular files or the standard output do not span multiple media objects, nor do they
contain multiple dumps.
Inventory
Each dump session updates an inventory database in /var/xfsdump/inventory. This database can be
displayed by invoking xfsrestore with the -I option. The display uses tabbed indentation to present the
inventory hierarchically. The first level is filesystem. The second level is session. The third level is media
stream (currently only one stream is supported). The fourth level lists the media files sequentially composing the stream.
Several suboptions are available to filter the display. Specifying -I depth=n (where n is 1, 2, or 3) limits
the hierarchical depth of the display. Specifying -I mobjid=value (where value is a media id) or -I
mobjlabel=value (where value is a media label) limits the display to media files contained in the specified
media object. Similarly, the display can be restricted to a specific filesystem identified by mount point
using -I mnt=host-qualified_mount_point_pathname, by filesystem id using -I fsid=filesystem_id, or by device using -I dev=host-qualified_device_pathname. At most three suboptions may be specified at once: one
to constrain the depth, one to constrain the media object, and one to constrain the filesystem. For example, -I depth=1,mobjlabel="tape 1",mnt=host1:/test_mnt would display only the filesystem information
(depth=1) for those filesystems which were mounted on host1:/test_mnt at the time of the dump, and only
those filesystems dumped to the media object labeled "tape 1".
There is currently no way to remove dumps from the inventory.
An additional media file is placed at the end of each dump stream. This media file contains the inventory
information for the current dump session. This is currently unused.
Media Errors
xfsrestore is tolerant of media errors, but cannot do error correction. If a media error occurs in the body of
a media file, the filesystem file represented at that point is lost. The bad portion of the media is skipped,
and the restoration resumes at the next filesystem file after the bad portion of the media.
If a media error occurs in the beginning of the media file, the entire media file is lost. For this reason,
large dumps are broken into a number of reasonably sized media files. The restore resumes with the next
media file.
FILES
/var/xfsdump/inventory
SEE ALSO
rmt(1M), xfsdump(1M)
DIAGNOSTICS
The exit code is 0 on normal completion, and non-zero if an error occurred or the restore was terminated
by the operator.
BUGS
xlv_admin(1M)
NAME
xlv_admin - modifies XLV logical volume objects and their disk labels
SYNOPSIS
xlv_admin [ -r root ]
DESCRIPTION
xlv_admin is a menu-driven command that is used to modify existing XLV objects (volumes, plexes,
volume elements, and XLV disk labels). xlv_admin can operate on XLV volumes even while they are
mounted and in use.
xlv_admin supports a single command line option:
-r root
Use root as the root directory. This is used in the miniroot when / is mounted as /root.
add
The add operations allow you to add an XLV object to another XLV object. You can,
for example, add a plex to a volume. The plex or volume element to be added must first be
created via xlv_make(1M). Note that xlv_admin refers to the larger object (the volume, in this
case) as the object to be operated on.
detach
The detach operations allow you to separate a part of an XLV object and make it an independent XLV object. If you detach a plex from a plexed volume, for example, that plex would
be separated from the volume and made into a standalone plex. The original volume would
have one less plex.
remove
The remove operations allow you to destroy a part of an XLV object. Removing plex number
1 from a volume with two plexes results in an XLV volume that has a single plex. The disk
partitions that were part of the removed plex are no longer part of any XLV object.
delete
show
The show operations allow you to examine the list of XLV objects on the system and their
structure.
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> ve5
============= Displaying Requested Object ==========
ve ve5 [empty]
start=0, end=76199, (cat)grp_size=1
/dev/dsk/dks0d2s5 (76200 blks)
Assuming that we have a volume element, spareve, that contains a single disk partition
/dev/dsk/dks1d4s2, the following sequence of commands adds it to the end of plex 0 of the data
subvolume of volume db1:
Please select choice...
xlv_admin> 2
Please enter name of object to be operated on.
xlv_admin> db1.data.0
Please enter the object you wish to add to the target.
xlv_admin> spareve
Please select choice...
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> db1
vol db1
ve db1.data.0.0 [active]
start=0, end=1100799, (cat)grp_size=1
/dev/dsk/dks1d4s0 (1100800 blks)
ve db1.data.0.1 [active]
start=1100800, end=2201599, (cat)grp_size=1
/dev/dsk/dks1d4s1 (1100800 blks)
ve db1.data.0.2 [active]
start=2201600, end=3302399, (cat)grp_size=1
/dev/dsk/dks1d4s2 (1100800 blks)
3. Add a plex to an existing volume.
Allows you to add a plex to a volume. This allows you to create duplicate copies of the data
on the volume for greater reliability. This operation is sometimes called mirroring. When you
pick this selection, xlv_admin prompts you for the volume to add the plex to and the name of
the plex. After the plex has been added, xlv_admin automatically initiates a plex revive operation; this copies the data from the original XLV plexes to the newly added plex so that the plex
holds the same data as the original plexes in the volume. The following shows how to add a
plex named plex2 to the data subvolume of volume db1:
Please select choice...
xlv_admin> 3
Please enter name of object to be operated on.
xlv_admin> db1.data
Please enter the object you wish to add to the target.
xlv_admin> plex2
You can use selection 42 to display volume db1 and see that the disk partitions that were part
of plex2 are now a component of db1. Note that plex2 no longer exists as a standalone plex
since it has been merged into volume db1.
11. Detach a ve from an existing plex.
Allows you to separate a volume element from a plex. This volume element can later be reinserted into some other XLV object. The plex from which the volume element is detached may
be a standalone plex or part of a volume. The detached volume element remains an XLV
object. The user first specifies the object from which the volume element will be detached and
then the name to be given to the detached volume element.
Note that detach and remove operations differ in how they handle the volume element once it
has been separated from the plex. A detach operation leaves the volume element intact while
the remove operation destroys the volume element by freeing its associated disks for use by
other volumes. The detach operation may be thought of as an unlink. You should use either
the detach or remove operation depending on whether you want the volume element to be left
intact after it has been separated from its plex.
12. Detach a plex from an existing volume.
Allows you to separate a plex from a volume. The user first specifies the volume and subvolume from which the plex is to be detached and then the name to assign to the newly created
standalone plex. This plex can later be added back to a volume by choosing selection 3.
The following example shows how to detach the first plex from a volume:
xlv_admin> 12
Please enter name of object to be operated on.
xlv_admin> db1.data
Please select plex number (0-3).
xlv_admin> 0
Please enter name of new object.
xlv_admin> detplex0
Please select choice...
xlv_admin>
21. Remove a ve from an existing plex.
Allows you to separate a volume element from a plex and destroys the removed volume element. The following shows how you can remove the second volume element from a plex:
xlv_admin> 21
Please enter name of object to be operated on.
xlv_admin> db1.data
Please select plex number (0-3).
xlv_admin> 0
Please enter ve number.
xlv_admin> 1
Please select choice...
xlv_admin>
22. Remove a plex from an existing volume.
Allows you to separate a plex from a volume and destroys the removed plex.
31. Delete an object.
Allows you to delete a volume, a standalone plex, or a standalone volume element. This
operation removes the XLV configuration from the disk partitions that make up the XLV
object. Because the XLV configuration information is stored in the volume header (see
vh(7M)), this operation does not affect any user data that may have been written to the user
disk partitions.
32. Delete all XLV disk labels.
Allows you to delete the XLV configuration from all the disks on the system. You might want
to do this, for example, to initialize all the disks on a new system to ensure that there is no
leftover XLV configuration information on the disks. Note that this is a very dangerous operation. Deleting the disk labels destroys all of the XLV objects on the system.
41. Show object by name and type, only.
Allows you to view all the XLV objects on the system. This command lists only the names and
types of the XLV objects. The following shows what the output of this selection looks like:
Please select choice...
xlv_admin> 41
==================== Listing Objects =============
Volume:
root_vol
Volume:
db1
Volume Element:
ve12
Plex:
plex2
42. Show information for an object.
Allows you to see detailed information on an XLV object. It displays all the XLV parameters
as well as the disk partitions that make up the object.
In the example below, you can see that the volume named db1 has one subvolume of type data
that contains two plexes. The first plex has two volume elements, while the second plex only
has one volume element. The first volume element in each plex covers the same range of disk
blocks. For each volume element, xlv_admin displays the partitions that make up the volume
element, the size of the partition, and the range of this volume's disk blocks that map to the
volume element.
Please select choice...
xlv_admin> 42
Please enter name of object to be operated on.
xlv_admin> db1
vol db1
ve db1.data.0.0 [active]
start=0, end=1100799, (cat)grp_size=1
/dev/dsk/dks1d4s0 (1100800 blks)
ve db1.data.0.1 [active]
start=1100800, end=2201599, (cat)grp_size=1
/dev/dsk/dks1d4s1 (1100800 blks)
ve db1.data.1.0 [active]
start=0, end=1100799, (cat)grp_size=1
/dev/dsk/dks1d4s2 (1100800 blks)
Note that the xlv_admin operations are complete in that they modify the XLV disk labels and
kernel as appropriate. If an operation is not successful, an error message is printed to the
screen explaining the failure.
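The block arithmetic in the selection 42 output can be checked with a short Python sketch. It parses output of the form shown above and verifies that each volume element covers exactly as many blocks as its partition and that the volume elements in a plex are contiguous. The function names are hypothetical, and the sketch assumes one partition per volume element, as in the example.

```python
import re

SHOW_OUTPUT = """\
ve db1.data.0.0 [active]
start=0, end=1100799, (cat)grp_size=1
/dev/dsk/dks1d4s0 (1100800 blks)
ve db1.data.0.1 [active]
start=1100800, end=2201599, (cat)grp_size=1
/dev/dsk/dks1d4s1 (1100800 blks)
ve db1.data.1.0 [active]
start=0, end=1100799, (cat)grp_size=1
/dev/dsk/dks1d4s2 (1100800 blks)
"""


def parse_volume_elements(text):
    """Return a list of (name, start, end, blocks) tuples."""
    elements = []
    name = start = end = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"ve (\S+)", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"start=(\d+), end=(\d+)", line)
        if m:
            start, end = int(m.group(1)), int(m.group(2))
            continue
        m = re.search(r"\((\d+) blks\)", line)
        if m:
            elements.append((name, start, end, int(m.group(1))))
    return elements


def check_layout(elements):
    """Check sizes and contiguity; return the next free block per plex."""
    next_start = {}
    for name, start, end, blocks in elements:
        plex = name.rsplit(".", 1)[0]  # db1.data.0.1 -> db1.data.0
        if end - start + 1 != blocks:
            raise ValueError(f"{name}: range does not match partition size")
        if start != next_start.get(plex, 0):
            raise ValueError(f"{name}: not contiguous with previous ve")
        next_start[plex] = end + 1
    return next_start


layout = check_layout(parse_volume_elements(SHOW_OUTPUT))
print(layout)
```

The next free block reported for plex db1.data.0 is 2201600, which is where a volume element appended to that plex would start.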
SEE ALSO
NOTES
Note that the xlv_admin operations modify both the XLV disk labels and the kernel data structures as
appropriate. This means that you do not need to run xlv_assemble(1M) for your changes to take effect.
The only exception to this is selection 32, which affects only the disk labels.
xlv_admin automatically initiates plex revive operations (see xlv_plexd(1M)) as required when you add a
new plex or when you add a volume element to a plexed volume.
You must be root to run xlv_admin.
xlv_assemble(1M)
NAME
xlv_assemble scans all the disks attached to the local system for logical volume labels. It assembles all the
available logical volumes and generates a configuration data structure. xlv_assemble also creates the device nodes for all XLV volumes in /dev/dsk/xlv and /dev/rdsk/xlv. The kernel is then activated with the
newly created configuration data structure. If necessary, xlv_assemble also asks xlv_plexd(1M) to
perform plex revives.
xlv_assemble is automatically run on system startup from a script in the /etc/init.d/xlv directory. By default,
it is also automatically run after you run xlv_make(1M).
xlv_assemble supports the following options:
-h name
Use name as the local nodename. Every logical volume label contains a system nodename. See
the -l option below.
-l
Assemble only those logical volumes that were created on this local system. Local logical
volumes have the local nodename in their logical volume labels. The default is to assemble all
logical volumes.
-n
Scan all disks for logical volume labels, but don't save the logical volume configuration and
don't activate the kernel with this configuration.
-q
Proceed quietly and don't display status messages after putting together the logical volume
configuration.
-r root
Use root as the root directory. This is used in the miniroot when / is mounted as /root.
FILES
/dev/dsk/xlv/...
/dev/rdsk/xlv/...
/dev/dsk/xlv_root
/dev/rdsk/xlv_root
SEE ALSO
NOTE
xlv_labd(7M)
NAME
xlv_labd
xlv_plexd [ -m #_subprocs ] [ -b blocksize ] [ -w sleep-interval ]
[ -v verbosity ] [ -h ]
DESCRIPTION
xlv_labd, xlv_plexd, and xlvd are logical volume daemons. xlv_labd and xlv_plexd reside in user process
space and xlvd resides in kernel process space.
The XLV label daemon, xlv_labd, is a user process that writes logical volume disk labels. It is normally
started during system restart. Upon startup, xlv_labd immediately calls into the kernel to wait for an
action request from the kernel daemon, xlvd. When an action request comes, xlv_labd processes it and
updates the appropriate volume disk labels. After completing the update, xlv_labd calls back into the kernel to wait for another request.
The XLV plex copy daemon, xlv_plexd, is a user process responsible for making all plexes within a subvolume consistent. The master xlv_plexd process is started at system startup time, with the -m option, and
subsequently used when new plexes are added. It receives requests to revive plexes via the named pipe
/etc/.xlv_plexd_request_fifo and starts child processes to perform the actual plex copy.
-m #_subprocs
-b blocksize
blocksize is the granularity of a single plex copy operation in blocks. The default is
128 blocks, which means XLV initiates a plex copy of 128 blocks, sleeps as indicated by the -w option (see below), then moves on to the next set of 128 blocks.
-w sleep-interval
-v verbosity
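The copy-then-sleep behavior described for the blocksize option can be sketched in Python. This is an illustration of the pacing only, not the xlv_plexd implementation. The function names and the copy_step callback are hypothetical, and expressing the sleep interval in seconds is an assumption, since its unit is not given here.

```python
import time


def plex_copy_ranges(total_blocks, blocksize=128):
    """Yield (first_block, block_count) for each copy step."""
    first = 0
    while first < total_blocks:
        count = min(blocksize, total_blocks - first)
        yield first, count
        first += count


def revive(total_blocks, copy_step, blocksize=128, sleep_interval=0.0):
    """Copy a plex in blocksize pieces, pausing between pieces.

    copy_step(first, count) stands in for the real copy operation.
    """
    for first, count in plex_copy_ranges(total_blocks, blocksize):
        copy_step(first, count)
        time.sleep(sleep_interval)  # unit (seconds) is an assumption


steps = []
revive(300, lambda first, count: steps.append((first, count)))
print(steps)
```

A larger block size or a shorter sleep interval finishes the revive sooner, at the cost of competing more with regular I/O to the volume.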
The XLV daemon, xlvd, is a kernel process that handles I/O to plexes and performs plex error recovery.
When disk labels require updating, xlvd initiates an action request to xlv_labd to perform the disk label
update. If there aren't multiple plexes, xlvd does not do anything.
NOTE
All three daemons are automatically started and do not need to be explicitly invoked.
FILES
/etc/.xlv_plexd_request_fifo
SEE ALSO
xlv(7M)
xlv_make(1M)
NAME
xlv_make [ -f ] [ -v ] [ -A ] [ input_file ]
DESCRIPTION
xlv_make creates new logical volume objects by writing logical volume labels to the devices that are to
constitute the volume objects. A volume object can be an entire volume, a plex, or a volume element.
xlv_make allows you to create objects that are not full volumes so that you can maintain a set of spares.
xlv_make supports the following options:
-f
Force xlv_make to create a volume element even if the partition type for the partition specified does
not correspond with its intended usage. This is useful, for example, in converting lv(7M) volumes
to xlv(7M) volumes. It is also used to allow creation of objects involving currently mounted partitions.
-v
Verbose option. Causes xlv_make to generate more detailed output. Also, it causes
xlv_assemble(1M) to generate output upon exit from xlv_make.
-A
Do not invoke xlv_assemble(1M) upon exit from xlv_make. The default is to invoke xlv_assemble
with the -q option unless the -v option is specified, in which case xlv_assemble is invoked with no
options. To invoke other xlv_assemble options, specify the -A option and invoke xlv_assemble
manually.
xlv_make only allows you to create volume objects out of disk partitions that are not currently part of
other volume objects. Partitions must be of a type suitable for use by xlv_make. Suitable types are xfs, efs,
xlv, and xfslog. Partition types other than these are rejected unless the -f command line option or the
ve -force interactive command is specified. See fx(1M) for more information regarding partition types.
xlv_admin(1M) must be used to modify or destroy volume objects.
xlv_make can be run interactively, or it can take its commands from an input file, input_file. xlv_make
is written using Tcl. Therefore, Tcl features such as variables and control structures can be
used in xlv_make commands.
xlv_make creates volume objects by writing the disk labels. To make the newly created logical volumes
active, xlv_assemble(1M) must be run. xlv_assemble is, by default, automatically invoked upon successful
exit from xlv_make; xlv_assemble scans all the disks attached to the system and automatically assembles all
the available logical volumes.
Objects are specified top-down and depth-first. You start by specifying the top-level object, and continue
to specify the pieces that make it up. When you have completed specifying an object at one level, you can
back up and specify another object at the same level.
The commands are:
vol volume_name
Specifies a volume. The volume_name is required. It can be up to 14 characters in length.
log
data
rt
Specifies a real-time subvolume. Real-time subvolumes are used for guaranteed-rate I/O and
also for high performance applications that isolate user data on a separate subvolume.
plex [plex_name]
Specifies a plex. If this plex is specified outside of a volume, then plex_name must be given. A
plex that exists outside of a volume is known as a standalone plex.
ve [volume_element_name] [-stripe] [-concat] [-force] [-stripe_unit stripe_unit_size] [-start blkno] device_pathnames
Specifies a volume element. If this volume element is specified outside of a plex, then
volume_element_name must be given.
-stripe
Specifies that the data within this volume element will be striped across all the
disks named by device_pathnames.
-concat
Specifies that all the devices named by device_pathnames are to be joined linearly
into a single logical range of blocks. This is the default if no flags are specified.
-force
Forces the specification of the volume element when the partition type does not
agree with the volume element's intended usage. For example, a partition with
type xfslog could be assigned to a data subvolume. Also, -force allows the
specification of an object that includes a partition that is currently mounted.
-stripe_unit stripe_unit_size
Specifies the number of blocks to write to one disk before writing to the next disk
in a stripe set. stripe_unit_size is expressed in 512-byte blocks. -stripe_unit is
only meaningful when used in conjunction with -stripe. The default stripe unit
size, if this flag is not set, is one track. Note: lv called this parameter the granularity.
-start blkno
Specifies that this volume element should start at the given block number within
the plex.
end
clear
show
Prints out all the volume objects on the system. This includes existing volume objects (created
during an earlier xlv_make session) and new objects specified during this session that have not
been created (written out to the disk labels) yet.
exit
Create the objects specified during this session by writing the disk labels out to all the disks
affected, and exit xlv_make. In interactive mode, the user will be prompted to confirm this action
if any new objects have been created.
quit
Leave xlv_make without creating the specified objects (without writing the disk labels). All the
work done during this invocation of xlv_make will be lost. In interactive mode, the user is
prompted to confirm this action if any objects have been specified.
help
Display the help message that lists the commands.
?
Same as help.
sh
Fork a shell.
EXAMPLES
Example 1
To make a volume from a description in an input file called volume_config.txt, give this command:
# xlv_make volume_config.txt
Example 2
This example shows making some volume objects interactively.
# xlv_make
Now make a small volume. (Note that xlv_make automatically adds /dev/dsk to the disk partition name
if it is missing from the ve command.)
xlv_make> vol small
small
xlv_make> log
small.log
xlv_make> plex
small.log.0
xlv_make> ve dks0d2s3
small.log.0.0
xlv_make> data
small.data
xlv_make> plex
small.data.0
xlv_make> ve dks0d2s14 dks0d2s12
small.data.0.0
xlv_make> end
Object specification completed
xlv_make> show
vol small
ve small.log.0.0 d710aa7d-b21d-1001-868d-080069077725
start=0, end=1523, (cat)grp_size=1
/dev/dsk/dks0d2s3 (1524 blks) d710aa7e-b21d-1001-868d-080069077725
ve small.data.0.0 d710aa81-b21d-1001-868d-080069077725
start=0, end=4571, (cat)grp_size=2
/dev/dsk/dks0d2s14 (1524 blks) d710aa82-b21d-1001-868d-080069077725
/dev/dsk/dks0d2s12 (3048 blks) d710aa83-b21d-1001-868d-080069077725
plex spare_plex1
ve spare_plex1.0 d710aa77-b21d-1001-868d-080069077725
start=0, end=3047, (cat)grp_size=2
/dev/dsk/dks0d2s1 (1524 blks) d710aa78-b21d-1001-868d-080069077725
/dev/dsk/dks0d2s2 (1524 blks) d710aa79-b21d-1001-868d-080069077725
xlv_make> help
vol volume_name - Create a volume.
data | log | rt - Create subvolume of this type.
plex [plex_name] - Create a plex.
ve [-start] [-stripe] [-stripe_unit N] [-force] [volume_element_name] partition(s)
end - Finished composing current object.
clear- Delete partially created object.
show - Show all objects.
exit - Write labels and terminate session.
quit - Terminate session without writing labels.
help or ? - Display this help message.
sh - Fork a shell.
xlv_make> exit
#
Note that the strings like d710aa82-b21d-1001-868d-080069077725 shown above are the universally unique
identifiers (UUIDs) that identify each XLV object.
Example 3
This example shows a description file that makes the same volume objects as in Example 2.
# A spare plex
plex spare_plex1
ve dks0d2s1 dks0d2s2
# A small volume
vol small
log
plex
ve dks0d2s3
data
plex
ve dks0d2s14 dks0d2s12
end
# Write labels before terminating session.
exit
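The top-down, depth-first ordering used in such a description file can be sketched in Python. This is a hedged illustration only: the nested-dictionary layout and the function name emit_commands are assumptions made for the example, ve flags such as -stripe are not handled, and the script only builds the command text without running xlv_make.

```python
def emit_commands(volume):
    """Return xlv_make commands for one volume, top-down and depth-first.

    `volume` is a hypothetical structure:
      {"name": str, "subvolumes": {subvolume_type: [plex, ...]}}
    where each plex is a list of volume elements and each volume
    element is a list of partition names.
    """
    lines = ["vol " + volume["name"]]
    for subvolume_type, plexes in volume["subvolumes"].items():
        lines.append(subvolume_type)          # log, data, or rt
        for plex in plexes:
            lines.append("plex")
            for partitions in plex:
                lines.append("ve " + " ".join(partitions))
    lines.append("end")
    return lines


# The "small" volume from the example: one log plex and one data plex.
small = {
    "name": "small",
    "subvolumes": {
        "log": [[["dks0d2s3"]]],
        "data": [[["dks0d2s14", "dks0d2s12"]]],
    },
}
commands = emit_commands(small)
print("\n".join(commands))
```

The generated lines match the vol small portion of the description file; the final exit command still has to be appended before the file is handed to xlv_make.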
Example 4
This example shows making a complex volume interactively. It makes a volume for an XFS filesystem
that has a single-partition log and a plexed (mirrored) data subvolume that is striped.
# xlv_make
xlv_make> vol movies
movies
xlv_make> log
movies.log
xlv_make> plex
movies.log.0
xlv_make> ve /dev/dsk/dks0d2s1
movies.log.0.0
Let the data subvolume have two plexes, each of which consists of two sets of striped disks. The data
written to the data subvolume will be copied to both movies.data.0 and movies.data.1.
xlv_make> data
movies.data
xlv_make> plex
movies.data.0
xlv_make> ve -stripe
movies.data.0.0
xlv_make> ve -stripe
movies.data.0.1
xlv_make> plex
movies.data.1
xlv_make> ve -stripe
movies.data.1.0
xlv_make> ve -stripe
movies.data.1.1
Add a small real-time subvolume. Stripe the data across two disks, with the stripe unit set to 1024 512-byte sectors.
xlv_make> rt
movies.rt
xlv_make> plex
movies.rt.0
xlv_make> ve -stripe -stripe_unit 1024 dks4d1s6 dks4d2s6
movies.rt.0.0
xlv_make> end
Object specification completed
xlv_make> exit
#
DIAGNOSTICS
Too many volume elements have been specified for this plex
You have reached the maximum number of volume elements that can be in a single plex.
An error occurred in creating the specified objects
An error occurred while writing the volume configuration out to the disk labels.
Unrecognized flag: flag
flag is not recognized.
Unexpected symbol: symbol
symbol is an unknown command.
A volume name must be specified
You have given a vol command without giving the name of the volume as an argument.
Too many disk partitions
You have specified too many devices for the volume element.
Cannot determine size of partition; please verify that the device exists
xlv_make is unable to figure out the size of the specified disk partition. Make sure that
the device exists.
Unequal partition sizes, truncating the larger partition
The partitions specified for a striped volume element are not of the same size. This
leaves some disk space unusable in the larger partition because data is striped across all
the partitions in a volume element.
A disk partition must be specified
You have given the ve command without specifying the disk partitions that belong to the
volume element as arguments to the command.
Unknown device: %s
You have specified a disk partition that either has no device node in /dev/dsk or is missing altogether.
Illegal value
The value is out of range for the given flag.
The volume element's address range must be increasing
When you specify the starting offsets of volume elements within a plex by using the ve
-start command, you must specify them in increasing order.
Disk partition partition is already being used
The disk partition named in the ve command is already in use by some other volume
object.
Disk partition partition is mounted; use -force to override
The disk partition named in the ve command is currently mounted. Use of the -force
argument is required to perform the operation.
Address range doesn't match corresponding volume element in other plexes
A volume element within a plex must have the same address range in all plexes for the
subvolume that includes those plexes.
There are partially specified objects, use quit to exit without creating them
You have entered the quit command while there are specified but not yet
created objects. Enter quit again to discard the specified objects and
really quit.
Missing flag value for: %s
A command was given that requires an additional argument that was not given.
Malloc failed
There is insufficient memory available for xlv_make to operate successfully.
An error occurred in updating the volume header
An attempt to modify a disk's volume header was unsuccessful.
A striped volume element must have at least two partitions
The ve -stripe command was given and only one partition was specified.
Log ve should have partition type xfslog
Data ve should have partition type xlv
Rt ve should have partition type xlv
Standalone object should have partition type xlv or xfslog
Mixing partition type xfslog with data types not allowed
All the partitions that make up a volume element must have the same partition type,
either xlv or xfslog.
Partition type must be consistent with other ves in plex
Partition type does not correspond with intended usage.
Partition could already belong to lv. Check /etc/lvtab
A warning that this partition may already belong to an lv volume.
Illegal partition type
An attempt was made to specify a partition that cannot, under any circumstance, be
used in an xlv(7M) volume. An example of such a partition would be the volume
header.
Subvolume type does not match any known
The subvolume being operated on is of no known type.
Size mismatch
The partition size information in the volume header does not match that contained in the
xlv label.
Device number mismatch
A warning that the device number in the xlv label does not match that of the volume
header.
The same partition cannot be listed twice
The ve command was given with the same partition listed twice.
SEE ALSO
NOTES
The disk labels created by xlv_make are stored only in the volume header of the disks. They do not destroy user data. Therefore, you can make an lv(7M) volume into an XLV volume and still preserve all the
data on the logical volume.
xlv_make changes the partition type of partitions used in newly created objects to either xlv or xfslog
depending upon their usage.
You must pick a different name for each volume, standalone plex, and standalone volume element. You
cannot have, for example, both a volume and a plex named yy.
You must be root to run xlv_make.
xlv_set_primary(1M)
NAME
SYNOPSIS
xlv_set_primary device_name
DESCRIPTION
xlv_set_primary finds the XLV volume and plex to which device_name belongs and makes that plex the
active copy. All the other plexes that belong to this volume are marked stale. This causes all of the plexes
in this volume to be synchronized to the contents of the active plex when the volume is later assembled by
xlv_assemble(1M).
xlv_set_primary is designed for use during the miniroot when only a single plex of the volume is running.
Making that plex the primary plex of the volume ensures that whatever changes are made to this plex (for
example, installing software) are made to the other plexes when they come online.
This command has no effect if device_name is not part of an XLV volume.
SEE ALSO
xlv_shutdown(1M)
NAME
SYNOPSIS
xlv_shutdown [ -v ] [ -n volume-name ]
DESCRIPTION
xlv_shutdown is used to gracefully shut down (disassemble) logical volumes after their corresponding
filesystems have been unmounted. It is called by /etc/umountfs, which is called by /etc/inittab at system
shutdown time. xlv_shutdown typically does not need to be explicitly invoked.
xlv_shutdown gets the XLV volumes from the kernel and cleanly shuts them down. This ensures that all
the plexes in a volume are in sync so that they do not need to be revived when restarted. After a volume
has been shut down, xlv_assemble(1M) needs to be run before using the volume again. Note that
xlv_shutdown does not shut down a root volume or volumes with mounted filesystems.
xlv_shutdown supports the following options:
-n volume-name
Shut down only the given volume. The default behavior is to close down all possible
volumes.
SEE ALSO
grio_config(4)
NAME
DESCRIPTION
The /etc/grio_config file contains information describing the I/O rates for each device or controller in the
system. This information is read by ggd and is used to allocate I/O-rate guarantees to requesting
processes.
The grio_config file is composed of entries of two different types. The first describes a system element and
its bandwidth:
device_name= OPTSZ=# NUM=# CTLRNUM=# UNIT=# RT=1 (comment)
OPTSZ, NUM, CTLRNUM, and UNIT are keywords. OPTSZ refers to the optimal I/O size of the device in bytes, and NUM is the number of OPTSZ-sized I/O requests that can be guaranteed each second.
These fields are required for each device. CTLRNUM and UNIT refer to the device SCSI controller and
unit respectively. These are used only when necessary to identify a particular device. RT=1 is used to
indicate that the disk device is part of an XLV real-time subvolume and that the error retry mechanism
should be disabled. The comment field is optional, but usually contains a description of the device.
The second type of entry describes the relationship between elements in the system:
device_name: dev1 dev2 dev3
This means that dev1, dev2, and dev3 are attached to device_name. In order to get a rate guarantee on one of
these devices, a rate guarantee must also be obtained on device_name.
With these entries, ggd is able to construct a performance tree. This tree is used to determine if an I/O-rate-guarantee request can be satisfied.
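The two entry types can be told apart by the character that ends the first field. The following Python sketch parses one line of each kind; it is not how ggd reads the file. The device names dks0d1, dks0d2, and ctlr0, the numeric values, and the assumption that the comment is enclosed in parentheses are all hypothetical choices made for the example.

```python
def parse_entry(line):
    """Parse one grio_config-style line (illustrative sketch only).

    Returns ("element", name, fields, comment) for a bandwidth entry or
    ("attach", name, children) for a relationship entry.
    """
    first, _, rest = line.strip().partition(" ")
    if first.endswith(":"):
        return ("attach", first[:-1], rest.split())
    if first.endswith("="):
        comment = ""
        if "(" in rest:
            rest, _, comment = rest.partition("(")
            comment = comment.rstrip(") ").strip()
        fields = {}
        for token in rest.split():
            key, _, value = token.partition("=")
            fields[key] = int(value)
        return ("element", first[:-1], fields, comment)
    raise ValueError("unrecognized entry: " + line)


# Hypothetical sample lines; real device names and rates will differ.
element = parse_entry("dks0d1= OPTSZ=65536 NUM=40 RT=1 (hypothetical disk)")
attach = parse_entry("ctlr0: dks0d1 dks0d2")

# Bytes per second that could be guaranteed on the element: OPTSZ * NUM.
bandwidth = element[2]["OPTSZ"] * element[2]["NUM"]
print(element, attach, bandwidth)
```

Multiplying OPTSZ by NUM gives the bytes per second that can be guaranteed on a device, which is the quantity a performance tree would compare against a request.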
FILES
/etc/grio_config
SEE ALSO
NOTES
Currently, all devices have OPTSZ set to 64K bytes. If a device has OPTSZ and NUM values of 0, the
I/O characteristics of the device could not be determined, and the device is not considered when making
rate guarantees.
grio_disks(4)
NAME
DESCRIPTION
The /etc/grio_disks file contains information describing the I/O rates for individual types of disk drives.
The entries are of the form:
ADD "SGIxxxxxxxxxxxxxxxxxxxxxxxxx"
The first item is the keyword ADD. The next item is a 28-character string describing the type of disk
drive. This is the same as the disk drive ID string. Drives recommended by Silicon Graphics usually have
the "SGI" string as the first characters in this string. The next number describes the optimal I/O size in
bytes for the disk device. The final number is the number of optimal-sized I/O requests that can be performed by the disk drive each second.
The performance characteristics for most supported disk drives are already known by the ggd daemon.
This file is used to allow system administrators to add the characteristics of new types of drives so that
ggd can make reliable guarantees.
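As a rough illustration of the entry layout described above (keyword, quoted 28-character ID string, optimal I/O size, requests per second), the following Python sketch parses one such line. The drive ID text and both numbers are hypothetical, and the parser is not the one used by ggd.

```python
import shlex


def parse_add_entry(line):
    """Parse an ADD entry into (drive_id, optimal_io_bytes, requests_per_sec)."""
    keyword, drive_id, optsz, num = shlex.split(line)
    if keyword != "ADD":
        raise ValueError("expected the ADD keyword")
    if len(drive_id) != 28:
        raise ValueError("drive ID string must be 28 characters")
    return drive_id, int(optsz), int(num)


# Hypothetical drive ID, padded with spaces to the required 28 characters.
sample_id = "SGI HYPOTHETICAL".ljust(28)
sample_line = 'ADD "%s" 65536 40' % sample_id
parsed = parse_add_entry(sample_line)
print(parsed)
```

shlex.split keeps the quoted ID string, including its padding, as a single field, so the 28-character check applies to the string exactly as written.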
FILES
/etc/grio_disks
SEE ALSO
NOTES
The number of optimal-sized I/O requests that can be guaranteed each second may be significantly less
than the maximum performance of the drive. This is because each request is considered to be distinct and
may require a maximum length seek before the request is issued.
xfs(4)
NAME
DESCRIPTION
An XFS filesystem can reside on a regular disk partition or on a logical volume (see lv(7M) and xlv(7M)).
An XFS filesystem has up to three parts: a data section, a log section, and a real-time section. For disk
partition and lv logical volume filesystems, the real-time section is absent, and the log area is contained
within the data section. For XLV logical volume filesystems, the real-time section is optional, and the log
section can be separate from the data section or contained within it. The filesystem sections are divided
into a certain number of blocks, whose size is specified at mkfs(1M) time with the -b option.
The data section contains all the filesystem metadata (inodes, directories, indirect blocks) as well as the
user file data for ordinary (non-real-time) files and the log area if the log is internal to the data section.
The data section is divided into a number of allocation groups. The number and size of the allocation
groups are chosen by mkfs so that there is normally a small number of equal-sized groups. The number of
allocation groups controls the amount of parallelism available in file and block allocation. It should be
increased from the default if there is sufficient memory and a lot of allocation activity. More allocation
groups are added (of the original size) when xfs_growfs(1M) is run.
The log section (or area, if it is internal to the data section) is used to store changes to filesystem metadata
while the filesystem is running until those changes are made to the data section. It is written sequentially
during normal operation and read only during mount. When mounting a filesystem after a crash, the log
is read to complete operations that were in progress at the time of the crash.
The real-time section is used to store the data of real-time files. These files had an attribute bit set through
fcntl(2) after file creation, before any data was written to the file. The real-time section is divided into a
number of extents of fixed size (specified at mkfs time). Each file in the real-time section has an extent size
that is a multiple of the real-time section extent size.
Each allocation group contains several data structures. The first sector contains the superblock. For allocation groups after the first, the superblock is just a copy and is not updated after mkfs. The next three
sectors contain information for block and inode allocation within the allocation group. Also contained
within each allocation group are data structures to locate free blocks and inodes; these are located
through the header structures.
All these data structures are subject to change, and the headers that specify their layout on disk are not
provided.
SEE ALSO
grio(5)
NAME
DESCRIPTION
Guaranteed-rate I/O (GRIO) refers to a guarantee made by the system to a user process indicating that
the given process will receive data from a peripheral device at a predefined rate regardless of any other
activity on the system. The purpose of this mechanism is to manage the sharing of scarce I/O resources
amongst a number of competing processes, and to permit a given process to reserve a portion of the
system's resources for its exclusive use for a period of time.
Currently, the only I/O resources that can be reserved using the GRIO mechanism are files stored on the
real-time subvolume of an XFS filesystem.
A GRIO guarantee is defined as the number of bytes that can be read or written to a given file by a given
process, each second. If a process has a GRIO guarantee on a file and it issues I/O requests in sizes equal
to the guaranteed amount, then the read or write calls are guaranteed to complete in less than one second.
If the process issues I/O requests at a size or rate greater than the guarantee, the excess requests are
blocked until such time as they fall within the scope of the guarantee.
There are a number of components in the GRIO mechanism. The first is the guarantee-granting daemon,
ggd. This is a user level process that is started when the system is booted. It controls the granting of
guarantees, the initiation and expiration of existing guarantees, and the monitoring of the available
bandwidths of each I/O device on the system. User processes communicate with the daemon using the
grio_request(3X), grio_remove_request(3X), grio_get_rtgkey(3X), and grio_use_rtgkey(3X) library calls.
When ggd is started, it reads the files /etc/grio_config and /etc/grio_disks to determine the bandwidths of the
various devices on the system. These files are generated by the cfg utility but may be edited by the system
administrator to tune performance. If ggd is terminated, all existing rate guarantees are removed.
The next component of the GRIO mechanism is the XLV volume manager. Rate guarantees may only be
obtained from files on the real-time subvolume of an XFS filesystem. The disk driver command retry
mechanism is disabled on the disks that make up the real-time subvolume. This means that if a drive
error occurs, the data is lost. The intent of real-time files is to read/write data from the disk as rapidly as
possible. If the device driver is forced to retry one process's disk request, it causes the requests from
other processes to become delayed.
If one partition of a disk is used in a real-time subvolume, the entire disk is considered to be used for
real-time operation. If one disk on a SCSI controller is used for real-time operation, then all the other devices on that controller must be used for real-time operation as well.
In order to use the guaranteed-rate I/O mechanism effectively, the XLV volume and XFS filesystem must
be set up properly. The next section gives an example.
By default, the ggd daemon will allow two process streams to obtain rate guarantees. If support for more
streams is desired, it is necessary to obtain licenses for the additional streams. The license information is
stored in the /usr/var/netls/nodelock file and interpreted by the ggd daemon on startup.
EXAMPLE
The example in this section describes a method of laying out the disks, filesystem, and real-time file that
enables the greatest number of processes to obtain guarantees on a single file concurrently. It is not necessary to construct a file in this manner in order to use GRIO; however, fewer processes can then obtain rate
guarantees on the file. Assume that there are four disk partitions available for the real-time
subvolume of an XLV volume. Each one of the partitions is on a different physical disk.
Before setting up the XFS filesystem, the I/O request size used by the user process must be determined.
In order to get the greatest I/O rate, the file data should be striped across all the disks in the subvolume.
To avoid filesystem fragmentation and to force all I/O operations to be on stripe boundaries, the file
extent size should be an even multiple of the volume stripe width. Rate guarantees are always made
assuming I/O request sizes that are even multiples of an optimal I/O size. The optimal I/O size is
specified on a per device basis in the /etc/grio_config file but it is usually 64K bytes. Therefore, the I/O
request size should be a multiple of 64K bytes and equal to the volume stripe width. The file extent size
should be set to a multiple of the volume stripe width.
In this example, let the file extent size be equal to the stripe width. The application always issues I/O
operations of size equal to the extent size. Assuming there are four disks available, let the stripe step size
be equal to 64K bytes. The file extent size and volume stripe width are then both 256K bytes (4 disks x 64K bytes). All application
I/O operations will then be performed in 256K-byte blocks.
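The arithmetic in this example can be checked with a few lines of Python. This is a sketch under the assumptions stated in the text (four disks, a 64K-byte stripe step, a 64K-byte optimal I/O size, and an extent size equal to the stripe width); the function name plan_layout is hypothetical.

```python
KB = 1024
SECTOR = 512  # xlv_make expresses stripe_unit_size in 512-byte blocks


def plan_layout(num_disks, stripe_unit_bytes, optimal_io_bytes=64 * KB):
    """Derive stripe width, extent size, and I/O request size in bytes."""
    stripe_width = num_disks * stripe_unit_bytes
    if stripe_width % optimal_io_bytes != 0:
        raise ValueError("stripe width must be a multiple of the optimal I/O size")
    return {
        "stripe_unit_blocks": stripe_unit_bytes // SECTOR,
        "stripe_width_bytes": stripe_width,
        "extent_size_bytes": stripe_width,   # extent size equals stripe width here
        "io_request_bytes": stripe_width,    # application I/O size
    }


layout = plan_layout(num_disks=4, stripe_unit_bytes=64 * KB)
print(layout)
```

With these inputs the stripe width is 262144 bytes (256K), and the stripe unit corresponds to 128 blocks of 512 bytes, which is the unit that the xlv_make ve command expects for -stripe_unit.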
Once the XLV volume and XFS filesystem have been created, the application can create the real-time file.
Real-time files must be read or written using direct, synchronous I/O requests. The open(2) manual page
describes the use and buffer alignment restrictions when using direct I/O. When creating a real-time file,
the F_FSSETXATTR command must be issued to set the XFS_XFLAG_REALTIME flag. This can only be
issued on a newly created file. It is not possible to mark a file as real-time once non-real-time data blocks
have been allocated to it. Rate guarantees cannot be obtained when creating a file. In order for a rate
guarantee to be obtained, it is necessary to know the layout of the blocks of the file on the disks. This cannot be determined until after the file has been written.
After the real-time file has been created, the application can issue a grio_request(3X) to obtain the rate
guarantee. With the rate guarantee established, the application read or write requests to the file, using the
given file descriptor, will complete within the guaranteed time. This will continue until the file is closed,
the guarantee is removed by the application via grio_remove_request(3X), or the guarantee expires.
DIAGNOSTICS
If a rate cannot be guaranteed, ggd returns an error to the requesting process. It also returns the amount
of bandwidth currently available on the device. The process can then determine if this amount is
sufficient and if so issue another rate guarantee request.
FILES
/etc/grio_config
/etc/grio_disks
/usr/var/netls/nodelock
SEE ALSO
xlv(7M)
NAME
/dev/dsk/xlv/*
/dev/rdsk/xlv/*
DESCRIPTION
XLV devices provide access to disk storage as logical volumes. A logical volume is an object that behaves
like a disk partition, but its storage may span several physical disk devices.
Using XLV, you can concatenate disks together to create larger logical volumes, stripe data across disks to
create logical volumes with greater throughput, and plex (or mirror) disks for reliability. In addition,
XLV enables you to change the configuration of volumes while the volume is actively being used as a
filesystem.
The geometry of a logical volume (for example, the disks that belong to it and how they are put together) is
stored in the disk labels of the disks that belong to the logical volume. When the system starts up, the
utility xlv_assemble(1M) scans all the disks on the system and automatically assembles them into logical
volumes. xlv_assemble(1M) also creates any necessary device nodes.
XLV device names always begin with /dev/{r}dsk/xlv/device_name where the device_name is assigned by
the creator of the volume. See xlv_make(1M) for how volumes are created.
Device numbers range from 0 to one less than the maximum number of logical volume devices
configured in the system. This is 10 by default; this number may be changed by rebuilding a kernel with
lboot(1M).
There is a kernel driver, referred to as xlv, and some daemons for the logical volume devices. The driver
is a pseudo device not directly associated with any physical hardware; its function is to map requests on
logical volume devices into requests on the underlying disk devices. The daemons take care of error
recovery and dynamic reconfiguration of volumes.
Volume Objects
XLV allows you to work with whole volumes and pieces of volumes. Pieces of volumes are useful for
creating and reconfiguring volumes in units that are larger than individual disk partitions.
Each volume consists of up to three subvolumes. An xfs(4) filesystem usually has a large data subvolume in
which all the user files and metadata such as inodes are stored and a small log subvolume in which the
filesystem log is stored. For high-performance and real-time applications, a volume can also have a real-time subvolume that contains only user files aligned at configurable block boundaries. Guaranteed-rate
I/O can be done to real-time subvolumes. See grio(5).
Each subvolume can be independently organized as 1 to 4 plexes. Plexes are sometimes known as mirrors. XLV makes sure that the data in all the plexes of a subvolume are the same. Plexes are useful for
reliability since a subvolume remains available if any of its plexes are available. Since each subvolume is
independently organized, you can choose to plex any, all, or none of the subvolumes within a volume.
Each plex consists of up to 128 volume elements. Each volume element is a collection of disk partitions that
may be either striped or concatenated. By adding volume elements, you can extend the size of a subvolume, even one that is striped. Volume elements within a plex do not need to be of the same size. However, all the volume elements at the same offset in all the plexes of the subvolume must be the same size.
For example, the first and second volume elements in a plex can have different sizes. But the first volume
element in all the plexes of the subvolume must be the same size. This restriction is necessary because the
volume element is the unit of recovery. Note that if XLV gets an unrecoverable disk error on one disk
partition in a volume element, the entire volume element is taken offline.
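The size restriction across plexes can be expressed as a simple consistency check. The Python sketch below is illustrative only and is not taken from XLV; each plex is modeled, as an assumption, by a list of (start_block, size_in_blocks) pairs, and the block counts are hypothetical.

```python
def mismatched_offsets(plexes):
    """Return the start offsets whose volume elements differ in size.

    `plexes` is a list of plexes; each plex is a list of
    (start_block, size_in_blocks) tuples, one per volume element.
    """
    size_at_offset = {}
    mismatches = set()
    for plex in plexes:
        for start, size in plex:
            # Remember the first size seen at this offset and compare later ones.
            if size_at_offset.setdefault(start, size) != size:
                mismatches.add(start)
    return sorted(mismatches)


# Two plexes whose elements differ in size within a plex but agree across plexes.
consistent = [[(0, 1524), (1524, 3048)], [(0, 1524), (1524, 3048)]]
# The first volume element has a different size in each plex.
inconsistent = [[(0, 1524)], [(0, 3048)]]

print(mismatched_offsets(consistent), mismatched_offsets(inconsistent))
```

The first configuration is acceptable even though its two volume elements have different sizes, because the sizes agree at each offset in both plexes.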
Each volume element can consist of 1 to 100 disk partitions. The disks can be treated either as a concatenated
set (in which case XLV writes to the partitions sequentially) or as a striped set (in which case XLV
writes a stripe unit's worth of data to one disk and then rotates to the next disk in the stripe set). In
general, it is better to use volume elements that contain single disks when you want to concatenate disks
together and to use volume elements with multiple disks only when you want to use disk striping. This is
because the volume element is the unit of recovery.
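The difference between the two layouts can be illustrated by mapping a logical block number to a partition. This Python sketch shows generic concatenation and round-robin striping arithmetic; it is not taken from the XLV driver, and the function names and the use of blocks as the unit are assumptions made for the example.

```python
from typing import List, Tuple


def concat_map(block: int, partition_sizes: List[int]) -> Tuple[int, int]:
    """Concatenated set: fill each partition in turn.

    Returns (partition index, block within that partition).
    """
    if block < 0:
        raise ValueError("block number must not be negative")
    for index, size in enumerate(partition_sizes):
        if block < size:
            return index, block
        block -= size
    raise ValueError("block is beyond the end of the volume element")


def stripe_map(block: int, n_partitions: int, stripe_unit: int) -> Tuple[int, int]:
    """Striped set: one stripe unit per partition, then rotate to the next.

    Returns (partition index, block within that partition).
    """
    if block < 0:
        raise ValueError("block number must not be negative")
    unit, within = divmod(block, stripe_unit)  # which stripe unit, offset inside it
    row, index = divmod(unit, n_partitions)    # completed rotations, partition in this one
    return index, row * stripe_unit + within
```

With three partitions and a stripe unit of 4 blocks, logical blocks 0 to 3 land on partition 0, blocks 4 to 7 on partition 1, blocks 8 to 11 on partition 2, and block 12 returns to partition 0.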
XLV allows you to create and work with volumes, subvolumes, plexes, and volume elements. The
interesting operations associated with volumes are: creating them, assembling disk partitions into
volumes, mounting them, changing volume configurations, shutting them down, and destroying them.
Naming Volume Objects
Each XLV object is composed of a hierarchy of lower level objects. For example, a volume is composed of
subvolumes that are in turn composed of plexes, etc. To let you refer to a component of an XLV object,
XLV has adopted a hierarchical naming convention. For example:
movies.data.0.5.50
Refers to the volume named movies, the data subvolume, plex 0 of that subvolume,
volume element 5 within that plex, and disk partition 50 within that volume element. Note that the numbers are zero-based.
movies.log.2
Refers to plex number 2 in the log subvolume of the volume named movies.
movies.rt.1.5
Refers to volume element 5 within plex number 1 of the real-time subvolume of the
volume named movies.
If you create an object outside of a volume, then that object has a user-assigned name. For example,
spare_plex.2.1 refers to disk partition number 1 of volume element number 2 of a standalone plex named
spare_plex. spare_plex does not currently belong to any subvolumes.
These names are echoed by xlv_make(1M) as objects are created. They are also useful in specifying the
objects to change via xlv_admin(1M).
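The naming convention for objects inside a volume is regular enough to be parsed mechanically. The following Python sketch is a hypothetical helper, not an XLV utility: it handles only names rooted at a volume (volume, then data, log, or rt, then up to three zero-based numbers) and does not attempt standalone objects such as spare_plex.2.1, whose meaning depends on the type of the named object.

```python
from typing import Dict, Union

SUBVOLUME_TYPES = ("data", "log", "rt")
LEVELS = ("plex", "volume_element", "partition")


def parse_volume_name(name: str) -> Dict[str, Union[str, int]]:
    """Split a name such as movies.data.0.5.50 into its labeled components."""
    parts = name.split(".")
    if not parts[0]:
        raise ValueError("missing volume name")
    result: Dict[str, Union[str, int]] = {"volume": parts[0]}
    if len(parts) == 1:
        return result
    if parts[1] not in SUBVOLUME_TYPES:
        raise ValueError(f"unknown subvolume type: {parts[1]!r}")
    result["subvolume"] = parts[1]
    indexes = parts[2:]
    if len(indexes) > len(LEVELS):
        raise ValueError("too many components in name")
    for level, text in zip(LEVELS, indexes):
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"{level} index must be a number: {text!r}")
        result[level] = int(text)
    return result
```

For example, parse_volume_name("movies.log.2") yields the volume movies, the subvolume log, and plex 2.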
Creating Volumes
Volumes are created via xlv_make(1M). This utility writes the volume geometry to all the disks that
belong to the volume object. The geometry is written to the volume headers. See vh(7M).
Assembling Volumes
After a volume has been created, it must be made known to the kernel driver before I/O can be initiated
to the volume. The command xlv_assemble(1M) scans all the disks attached to the system and assembles
all the logical volumes that it finds. It then passes the configuration to the kernel. This is usually done
during system startup. Once a volume has been assembled, I/O can be performed.
Working with Filesystems
The normal filesystem utilities such as mkfs(1M) and mount(1M) work with logical volumes.
A logical volume consisting of a single disk partition (that may be plexed) can be used as root(7M). You
cannot boot directly off a logical volume; you must specify the underlying disk partition.
Modifying Volumes
The geometry of a volume object can be modified either offline or online. To modify a volume object
offline, first unmount the filesystem, then destroy the volume object by using xlv_admin(1M). Then, you
can run xlv_make(1M) to create new XLV objects. Note that xlv_make only allows you to use disk partitions that are not currently part of volume objects.
You can also modify volume objects while they are online by using xlv_admin(1M). You can grow a
volume, add a plex, and remove a plex while the volume is actively being used. Note that I/O is blocked
while the configuration is being changed. The blocked I/O is completed after the configuration has been
written out to the disk labels.
You can also use xlv_admin to remove a volume element from a plex while the volume is online if there is
at least one other plex that covers the range of disk blocks affected. Note that you can choose to plex only
a portion of the address space of a subvolume.
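The condition for removing a volume element online can be pictured as a coverage test over block ranges. The Python sketch below is a simplified model, not the check that xlv_admin performs: each plex is represented as a list of half-open block ranges that it covers, and the names covers and can_remove_online are invented for this example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open range of blocks: [start, end)


def covers(extents: List[Range], target: Range) -> bool:
    """True if the union of extents contains every block in target."""
    position, end = target
    for start, stop in sorted(extents):
        if start > position:
            break  # a hole before the next extent leaves part of target uncovered
        position = max(position, stop)
        if position >= end:
            return True
    return position >= end


def can_remove_online(plexes: List[List[Range]], plex_index: int,
                      target: Range) -> bool:
    """True if some plex other than plex_index covers the whole target range."""
    return any(covers(extents, target)
               for index, extents in enumerate(plexes)
               if index != plex_index)
```

In this model, a subvolume whose second plex covers only blocks 0 to 99 allows the first plex to give up its element for that range, but not an element covering blocks 100 to 199.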
Working with Plexes
When there are multiple plexes, XLV recovers from read errors. In addition, XLV attempts to rewrite the
data back to the failed plex. XLV masks write errors if it can write to at least one of the plexes.
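The error policy described above can be modeled in a few lines. This Python sketch uses an in-memory stand-in for a plex; FakePlex, mirrored_read, and mirrored_write are hypothetical names, and the code only mirrors the behavior stated in the text (read from another plex, attempt a rewrite to the plex that failed, and treat a write as successful if at least one plex accepts it). It does not describe the actual driver.

```python
from typing import Dict, List, Set


class FakePlex:
    """In-memory stand-in for one plex; not an XLV interface."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        self.unreadable: Set[int] = set()  # blocks that fail when read
        self.offline = False               # when True, every write fails

    def read(self, block: int) -> bytes:
        if block in self.unreadable:
            raise OSError(f"read error on block {block}")
        return self.blocks[block]

    def write(self, block: int, data: bytes) -> None:
        if self.offline:
            raise OSError(f"write error on block {block}")
        self.blocks[block] = data
        self.unreadable.discard(block)     # a successful rewrite repairs the block


def mirrored_read(plexes: List[FakePlex], block: int) -> bytes:
    """Read from the first plex that succeeds, then try to repair the ones that failed."""
    failed = []
    for plex in plexes:
        try:
            data = plex.read(block)
        except OSError:
            failed.append(plex)
            continue
        for bad in failed:
            try:
                bad.write(block, data)
            except OSError:
                pass                       # the rewrite is only an attempt
        return data
    raise OSError("read failed on every plex")


def mirrored_write(plexes: List[FakePlex], block: int, data: bytes) -> None:
    """Write to every plex; report an error only if no plex accepted the data."""
    successes = 0
    for plex in plexes:
        try:
            plex.write(block, data)
            successes += 1
        except OSError:
            pass
    if successes == 0:
        raise OSError("write failed on every plex")
```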
When a plexed volume starts up, XLV automatically makes sure that all the data among the plexes within
each subvolume is consistent. This may involve copying the data from one plex to the others. While this
is going on, the volume is available, but at degraded performance. You can eliminate the need for plex
recovery by shutting down the plex with xlv_shutdown(1M). xlv_shutdown synchronizes the plexes and
marks them as being the same so that when they restart, XLV knows that the plexes are consistent and can
therefore avoid the plex copies.
FILES
/dev/dsk/xlv/*
/dev/rdsk/xlv/*
/var/sysgen/master.d/xlv
SEE ALSO
XLV runs on both XFS and EFS filesystems. In addition, you can read and write to XLV devices using the
raw device interfaces.
XLV disk labels are stored on the disks themselves. Therefore, you can physically reposition the disk
drives and XLV still assembles them correctly.
You can upgrade from an existing lv(7M) volume to an XLV volume by using lv_to_xlv(1M).
When you are running in the miniroot, the XLV device nodes are created in /root/dev/dsk/xlv and
/root/dev/rdsk/xlv.
Index
B
backup and restore
amount of time it takes, 14
compatibility of dump and restore utilities, 13
during conversion to XFS, 13, 21, 26
using xfsdump and xfsrestore, 29-58
utilities, 3
block sizes
and mkfs, 15, 23, 26
guidelines, 7
range of sizes, 2, 7
syntax, 7
bru utility, 57
C
cfg utility
description, 96
reference page, 117
using, 105
compatibility
32-bit programs and XFS, 2
dump/restore and filesystem type, 3
EFS and XFS, 1
NFS, 2
of software releases, 6
component records, 106
concatenation
definition, 66
guidelines, 73
D
daemons
GRIO, 96, 101
XLV, 69
data segments, xfsdump utility, 32
/dev/dsk/xlv directory, 68
device name
disk for dump file, 20
identifying, 15
identifying with prtvtoc, 19
tape drive, 19
df utility and XLV, 89
direct I/O, 90
disk labels
and logical volume assembly, 69
daemon that writes them, 69
information used at system startup, 63
written by xlv_make, 73
disk partitions
and external log size, 71
and volume elements, 66
block and character devices, 60
device names, 15, 19
E
EFS filesystems
and XLV logical volumes, 60
XLV subvolumes, 71
error messages, 111-113
error recovery
and XLV, 70
disabling for GRIO, 98-100
from specific errors, 111-113
/etc/config/ggd.options file, 96, 109
/etc/fstab file
entries for system disk, 20
F
fcntl system call, 89
font conventions, xvii
fx utility
disk partition types, 13
new features, 12
standalone vs. IRIX, 12
using the standalone version, 22
G
ggd daemon
description, 96
reference page, 117
restarting, 101, 105
GRIO
component records, 106
configuring the ggd daemon, 101
creating an XLV logical volume for, 102
description, 4, 91
disabling disk error recovery, 98-100
disk errors, 93
features, 4
file descriptors, 92
file formats, 106-109
hard guarantees, 93, 97
hardware configuration requirements, 97
H
hard errors, 70
hard guarantees, 93, 97
hardware requirements, xvii, 97
housekeeping directory, 56
I
IDO software release, 6
incremental dumps, xfsdump utility, 42
interactive restore, xfsrestore utility, 51
interrupted restores, xfsrestore utility, 55
inventory, xfsdump utility, 32, 44
IRIS Volume Manager, 1, 4
J
journaling information, 2, 63
K
kernel panics, 112
L
library routines new with XFS, 116
logical volumes
adding plexes, 82
choosing which subvolumes, 71
coming up at system startup, 63, 69
creating, examples, 73-76
definition of volume, 62
deleting objects, 88
description, 60
detaching plexes, 86
device names, 68
disk labels, 59, 63, 69, 73
displaying objects, 80
example (figure), 60
growing, 81
hierarchy of objects, 60
increasing size, 81
lv. See lv logical volumes.
moving to a new system, 63, 69
naming, 68
preparing for use, 76
read and write errors, 70
See also subvolumes.
See also volume elements.
See also XLV.
sizes, 71
used as raw devices, 59, 63
volume composition, 62
logs
choosing size, 9
choosing type, 8
description, 8
external, definition, 8
external, specifying size, 9
internal, definition, 8
internal, specifying size, 9
internal log, when used, 71
size syntax, 10
lv_to_xlv utility
reference page, 117
using, 77
lv logical volumes, 4
converting to XLV, 77
M
manual pages, xviii, 115
media layout, xfsdump utility, 32
media object, xfsdump utility, 32
metadata, filesystem, 3, 112
mkfs_xfs utility
reference page, 117
See also mkfs utility, 117
mkfs utility
command line syntax, 15, 16, 23, 26
example output, 15, 16, 23
for GRIO, 105
reference page, 117
mpadmin utility, 101
N
NetLS licenses
Disk Plexing Option, xv, 4, 59
High Performance Guaranteed-Rate I/O, xv, 4, 91
NFS
compatibility, 2
software release, 6
O
online reference pages, 115
optimal I/O size, 103, 107, 108
orphanage directory, 56
P
plexes
adding to volumes, 82
definition, 64
deleting, 88
detaching, 86
Disk Plexing Option, xv, 4, 59
displaying, 80
example of creating, 75
holes in address space, 64, 72
monitoring plex revives, 85
plex composition, 65
plex revive definition, 65
read and write errors, 70
See also logical volumes.
volume element sizes, 72
when to use, 72
plex revives, 85
prerequisite hardware, xvii, 97
prerequisite software, 6
prtvtoc utility, 19
R
read continuous (RC) bit, 97
real-time files, 89
real-time process, 101
real-time subvolumes
and utilities, 89
creating files, 89
GRIO files, 92
hardware requirements, 97
only real-time on disk, 64
reference pages
for more information, xviii
included in this guide, 116
related to XFS and XLV, 115
viewing, 115
relationship records, 108
restore utility
and XFS filesystems, 3
commands used during conversion to XFS, 24, 27
reference page, 117
vs. xfsrestore, 29
restoring files, xfsrestore utility, 50
restoring interrupted dumps, xfsrestore utility, 53
retry mechanisms, 97
root filesystem
combining with usr, 11
converting to XFS, 18
dumping, 21
growing, 6
restoring all files, 24
restrictions, 72
root partition
and striping, 72
and XLV, 70
combining with usr partition, 22
converting to XFS, 18-25
device name, 19
S
soft guarantees, 94
software release, 6
stream terminator, xfsdump utility, 32
striped volume elements. See volume elements.
stripe unit, definition, 67
subvolumes
composition, 63
data subvolume definition, 63
displaying, 80
log subvolume definition, 63
real-time subvolume definition, 64
See also logical volumes.
subvolume types, 63
system calls, new and modified for XFS, 116
System Recovery, PROM Monitor, 58
T
tapes, reusing with xfsdump utility, 40
tar utility, 57
terminator, xfsdump utility, 32
U
usr filesystem
combining with root filesystem, 11
converting to XFS, 18
dumping, 21
restoring all files, 24
/usr/lib/libgrio.so, 96
usr partition
combining with root partition, 22
device name, 19
utilities changed for XFS and XLV, 116
V
volume elements
displaying, 80
multipartition volume elements, definition, 67
multipartition volume elements not recommended, 73
single partition volume elements, definition, 66
striped, definition, 67
striped, example of creating, 75
striping, when to use, 72, 94
volumes. See logical volumes.
X
XFS
and standard utilities, 2
block sizes, 2, 7
can't use when ..., 5
changed system calls, 116
changed utilities, 116
compatibility with EFS, 1
converting an option disk, 25
converting system disk, 18-25
features, 1
filesystem on a new disk partition, 14
journaling information, 63
logs. See logs.
making filesystems, 14-17
new library routines, 116
on system disk, 18
preparing to make filesystems, 5-14
restore compatibility, 3
subsystems, 6
utilities, 2, 115
xfs(4) reference page, 117
xfs_check utility
error messages, 113
how to use, 27
reference page, 117
reporting and repairing problems, 28
xfs_estimate utility
how to use, 10
reference page, 117
xfs_growfs utility, 82, 117
xfsdump utility
dump inventory, 44
features, 30
incremental dumps, 42
media layout, 32
network usage, 56
reference page, 117
resumed dumps, 42
reusing media, 40
specifying media, 37
STDOUT, 56
using, 37
xfsrestore utility
and EFS filesystems, 3
cumulative restores, 52
features, 30
interactive restore, 51
interrupted restores, 55
network usage, 50, 56
reference page, 117
restoring files, 50
restoring interrupted dumps, 53
session ID, 48
session label, 48
simple restores, 48
STDIN, 56
using, 47
XLV
compatibility with EFS, 1
compatibility with XFS, 1
converting lv logical volumes, 77
daemons, 69, 115
don't use XLV when ..., 70
error policy, 70
features, 3
logical volumes. See logical volumes.
no configuration file, 69
overview, 60-70
planning logical volumes, 70-73
relationship to IRIS Volume Manager, 1
relationship to lv, 1
See also logical volumes.
utilities, 115
with EFS, 3
xlv(7M) reference page, 117
xlv_admin utility
adding a plex, 82
and Disk Plexing Option, 79
deleting volume objects, 88
detaching a plex, 86
displaying objects, 80
growing a volume, 81
menu, 79
reference page, 117
xlv_assemble utility, 117
xlv_labd daemon
description, 69
reference page, 117
xlv_make utility
error messages, 113
GRIO example, 104
reference page, 117
using to create volume objects, 73-76
xlv_plexd daemon
description, 69
reference page, 117
xlv_set_primary utility, 117
xlv_shutdown utility, 117
xlvd daemon
description, 69
reference page, 117
XLV logical volumes. See logical volumes.
Technical errors
Please send the title and part number of the document with your comments. The part
number for this document is 007-2549-001.
Thank you!
To fax your comments (or annotated copies of manual pages), use this
fax number: 415-965-0964