General Parallel File System
Problem Determination Guide
Version 4 Release 1
GA76-0443-00
Note
Before using this information and the product it supports, read the information in “Notices” on page 259.
This edition applies to version 4 release 1 modification 0 of the following products, and to all subsequent releases
and modifications until otherwise indicated in new editions:
v IBM General Parallel File System ordered through Passport Advantage® (product number 5725-Q01)
v IBM General Parallel File System ordered through AAS/eConfig (product number 5641-GPF)
v IBM General Parallel File System ordered through HVEC/Xcel (product number 5641-GP6, 5641-GP7, or
5641-GP8)
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
IBM welcomes your comments; see the topic “How to send your comments” on page xii. When you send
information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes
appropriate without incurring any obligation to you.
© Copyright IBM Corporation 2014.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Tables  vii

About this information  ix
Prerequisite and related information  x
Conventions used in this information  xi
How to send your comments  xii

| Summary of changes  xiii

Chapter 1. Logs, dumps, and traces  1
The GPFS log  1
Creating a master GPFS log file  2
The operating system error log facility  2
MMFS_ABNORMAL_SHUTDOWN  3
MMFS_DISKFAIL  3
MMFS_ENVIRON  3
MMFS_FSSTRUCT  3
MMFS_GENERIC  4
MMFS_LONGDISKIO  4
MMFS_QUOTA  4
MMFS_SYSTEM_UNMOUNT  5
MMFS_SYSTEM_WARNING  5
Error log entry example  5
The gpfs.snap command  6
Using the gpfs.snap command  7
Data always gathered by gpfs.snap on all platforms  8
Data always gathered by gpfs.snap on AIX  9
Data always gathered by gpfs.snap on Linux  9
Data always gathered by gpfs.snap on Windows  10
Data always gathered by gpfs.snap for a master snapshot  10
The mmfsadm command  10
The GPFS trace facility  11
Generating GPFS trace reports  11

Chapter 2. GPFS cluster state information  17
The mmafmctl Device getstate command  17
The mmdiag command  17
The mmgetstate command  17
The mmlscluster command  18
The mmlsconfig command  19
The mmrefresh command  19
The mmsdrrestore command  20
The mmexpelnode command  20

Chapter 3. GPFS file system and disk information  23
Restricted mode mount  23
Read-only mode mount  23
The lsof command  24
The mmlsmount command  24
The mmapplypolicy -L command  25
mmapplypolicy -L 0  26
mmapplypolicy -L 1  26
mmapplypolicy -L 2  27
mmapplypolicy -L 3  28
mmapplypolicy -L 4  29
mmapplypolicy -L 5  29
mmapplypolicy -L 6  30
The mmcheckquota command  31
The mmlsnsd command  31
The mmwindisk command  32
The mmfileid command  33
The SHA digest  35

| Chapter 4. Deadlock amelioration  37
| Automated deadlock detection  37
| Automated deadlock data collection  38
| Automated deadlock breakup  38

Chapter 5. Other problem determination tools  39

Chapter 6. GPFS installation, configuration, and operation problems  41
Installation and configuration problems  41
What to do after a node of a GPFS cluster crashes and has been reinstalled  42
Problems with the /etc/hosts file  42
Linux configuration considerations  42
Problems with running commands on other nodes  43
GPFS cluster configuration data files are locked  44
Recovery from loss of GPFS cluster configuration data file  45
Automatic backup of the GPFS cluster data  45
Error numbers specific to GPFS application calls  46
GPFS modules cannot be loaded on Linux  46
GPFS daemon will not come up  47
Steps to follow if the GPFS daemon does not come up  47
Unable to start GPFS after the installation of a new release of GPFS  49
GPFS error messages for shared segment and network problems  49
Error numbers specific to GPFS application calls when the daemon is unable to come up  49
GPFS daemon went down  50
GPFS failures due to a network failure  51
Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O requests' message  52
Quorum loss  52
Delays and deadlocks  53
Node cannot be added to the GPFS cluster  54
Remote node expelled after remote file system successfully mounted  55
Tables
1. GPFS library information units  ix
2. Conventions  xi
| 3. Message severity tags ordered by priority  119
To find out which version of GPFS is running on a particular AIX node, enter:
lslpp -l gpfs\*
To find out which version of GPFS is running on a particular Linux node, enter:
rpm -qa | grep gpfs
To find out which version of GPFS is running on a particular Windows node, open the Programs and
Features control panel. The IBM® General Parallel File System installed program name includes the
version number.
To use these information units effectively, you must be familiar with the GPFS licensed product and the
AIX, Linux, or Windows operating system, or all of them, depending on which operating systems are in
use at your installation. Where necessary, these information units provide some background information
relating to AIX, Linux, or Windows; however, more commonly they refer to the appropriate operating
system documentation.
Table 1. GPFS library information units

GPFS: Administration and Programming Reference
This information unit explains how to do the following:
v Use the commands, programming interfaces, and user exits unique to GPFS
v Manage clusters, file systems, disks, and quotas
v Export a GPFS file system using the Network File System (NFS) protocol
Intended users: System administrators or programmers of GPFS systems

GPFS: Advanced Administration Guide
This information unit explains how to use the following advanced features of GPFS:
v Accessing GPFS file systems from other GPFS clusters
v Policy-based data management for GPFS
v Creating and maintaining snapshots of GPFS file systems
v Establishing disaster recovery for your GPFS cluster
v Monitoring GPFS I/O performance with the mmpmon command
v Miscellaneous advanced administration topics
Intended users: System administrators or programmers seeking to understand and use the advanced features of GPFS
Note: Users of GPFS for Windows must be aware that on Windows, UNIX-style file names need to be
converted appropriately. For example, the GPFS cluster configuration data is stored in the
| /var/mmfs/gen/mmsdrfs file. On Windows, the UNIX name space starts under the %SystemDrive%\cygwin
| directory, so the GPFS cluster configuration data is stored in the C:\cygwin\var\mmfs\gen\mmsdrfs file.
Table 2. Conventions

bold
Bold words or characters represent system elements that you must use literally, such as commands, flags, values, and selected menu options. Depending on the context, bold typeface sometimes represents path names, directories, or file names.

bold underlined
Bold underlined keywords are defaults. These take effect if you do not specify a different keyword.

constant width
Examples and information that the system displays appear in constant-width typeface.

italic
Italic words or characters represent variable values that you must supply. Italics are also used for information unit titles, for the first use of a glossary term, and for general emphasis in text.

<key>
Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.

\
In command examples, a backslash indicates that the command or coding example continues on the next line. For example:
mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
-E "PercentTotUsed < 85" -m p "FileSystem space used"

{item}
Braces enclose a list from which you must choose an item in format and syntax descriptions.

[item]
Brackets enclose optional items in format and syntax descriptions.

<Ctrl-x>
The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.

item...
Ellipses indicate that you can repeat the preceding item one or more times.

|
In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line means Or. In the left margin of the document, vertical lines indicate technical changes to the information.
Include the publication title and order number, and, if applicable, the specific location of the information
about which you have comments (for example, a page number or a table number).
To contact the GPFS development organization, send your comments to the following e-mail address:
| Summary of changes
| This topic summarizes changes to the GPFS licensed program and the GPFS library. Within each
| information unit in the library, a vertical line to the left of text and illustrations indicates technical
| changes or additions made to the previous edition of the information.
| Changes to the GPFS licensed program and the GPFS library for version 4, release 1 include the
| following:
| GPFS product structure
| GPFS now comes in three levels of function: GPFS Standard Edition, GPFS Express Edition, and
| GPFS Advanced Edition.
| Active file management (AFM)
| Enhancements to AFM include the following:
| v AFM environments can now support Parallel I/O. During reads, all mapped gateway nodes
| are used to fetch a single file from home. During writes, all mapped gateways are used to
| synchronize file changes to home.
| v In addition to the NFS protocol, AFM now supports the native GPFS protocol for the AFM
| communication channel providing improved integration of GPFS features and attributes.
| v GPFS 4.1 includes a number of features that optimize AFM operations and usability. These
| features include prefetch enhancements to handle gateway node failures during prefetch. AFM
| introduces a new version of hashing (afmHashVersion=2), which minimizes the impact of
| gateway nodes joining or leaving the active cluster. Also, AFM now reports additional cache
| states based on fileset and queue states.
| v GPFS 4.1 supports the migration of data from any legacy NFS storage device or GPFS cluster
| to an AFM fileset. Data migration eases data transfer when upgrading hardware or buying a
| new system. The data source is an NFS v3 export and can be either a GPFS or a non-GPFS
| source. AFM-based migration can minimize downtime for applications and consolidate data
| from multiple legacy systems into a more powerful cache.
| Autonomic tuning for mmbackup
| The mmbackup command can be tuned to control the numbers of threads used on each node to
| scan the file system, perform inactive object expiration, and carry out modified object backup. In
| addition, the sizes of lists of objects expired or backed up can be controlled or autonomically
| tuned to select these list sizes if they are not specified. List sizes are now independent for backup
| and expire tasks. For more information, see the GPFS: Administration and Programming Reference
| topic: “Tuning backups with the mmbackup command”.
| Backup 3.2 format discontinued
| Starting with GPFS 4.1, the mmbackup command will no longer support incremental backup
| using the /Device/.snapshots/.mmbuSnapshot path name that was used with GPFS 3.2 and
| earlier. For more information, see the GPFS: Administration and Programming Reference topic: “File
| systems backed up using GPFS 3.2 or earlier versions of mmbackup”.
| Cluster Configuration Repository (CCR)
| GPFS 4.1 introduces a new quorum-based repository for the configuration data. This replaces the
| current server-based repository, which required specific nodes to be designated as primary and
| backup configuration server nodes.
| Cluster NFS improvements
| Cluster NFS (CNFS) has been enhanced to support IPv6 and NFS V4.
| Changed commands:
| The following commands were changed:
| v mmbackup
| v mmchattr
| v mmchcluster
| v mmchconfig
| v mmchfileset
| v mmchfs
| v mmchpdisk
| v mmchrecoverygroup
| v mmcrcluster
| v mmcrfileset
| v mmcrfs
| v mmcrnsd
| v mmcrrecoverygroup
| v mmcrvdisk
| v mmdelvdisk
| v mmdiag
| v mmlscluster
| v mmlsfs
| v mmlsmount
| v mmlsrecoverygroup
| v mmmigratefs
| v mmmount
| v mmrestorefs
| v mmsnapdir
| v mmumount
| Changed structures:
| The following structures were changed:
| v gpfs_acl_t
| v gpfs_direntx_t
| v gpfs_direntx64_t
| v gpfs_iattr_t
| v gpfs_iattr64_t
| Changed subroutines:
| The following subroutines were changed:
| v gpfs_fgetattrs()
| v gpfs_fputattrs()
| v gpfs_fputattrswithpathname()
| v gpfs_fstat()
| v gpfs_stat()
| Deleted commands:
| The following commands were deleted:
| v mmafmhomeconfig
| Deleted structures:
| There are no deleted structures.
GPFS has its own error log, but the operating system error log is also useful because it contains
information about hardware failures and operating system or other software failures that can affect GPFS.
Note: GPFS error logs and messages contain the MMFS prefix. This is intentional, because GPFS shares
many components with the IBM Multi-Media LAN Server, a related licensed program.
GPFS also provides a system snapshot dump, trace, and other utilities that can be used to obtain detailed
information about specific problems.
The GPFS log can be found in the /var/adm/ras directory on each node. The GPFS log file is named
mmfs.log.date.nodeName, where date is the time stamp when the instance of GPFS started on the node and
nodeName is the name of the node. The latest GPFS log file can be found by using the symbolic file name
/var/adm/ras/mmfs.log.latest.
The GPFS log from the prior startup of GPFS can be found by using the symbolic file name
/var/adm/ras/mmfs.log.previous. All other files have a timestamp and node name appended to the file
name.
At GPFS startup, log files that have not been accessed during the last ten days are deleted. If you want to
save old log files, copy them elsewhere.
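For example, to review the most recent entries in the current GPFS log on a node, or to follow the log while reproducing a problem, standard commands such as the following can be used (the line count shown is arbitrary):
tail -n 50 /var/adm/ras/mmfs.log.latest
tail -f /var/adm/ras/mmfs.log.latest
The same approach applies to /var/adm/ras/mmfs.log.previous when the prior startup is of interest.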
| Starting with GPFS 4.1, many GPFS log messages can be sent to syslog on Linux. The systemLogLevel
| attribute of the mmchconfig command controls which GPFS log messages are sent to syslog. For more
| information, see the mmchconfig command in the GPFS: Administration and Programming Reference.
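For example, assuming that messages of severity notice and higher should be forwarded to syslog on a Linux node (the level name used here is illustrative; the valid values are listed in the mmchconfig documentation), a command of the following form could be used:
mmchconfig systemLogLevel=notice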
This example shows normal operational messages that appear in the GPFS log file:
| Removing old /var/adm/ras/mmfs.log.* files:
| Unloading modules from /lib/modules/3.0.13-0.27-default/extra
| Unloading module tracedev
| Loading modules from /lib/modules/3.0.13-0.27-default/extra
| Module Size Used by
| mmfs26 2155186 0
| mmfslinux 379348 1 mmfs26
| tracedev 48513 2 mmfs26,mmfslinux
Depending on the size and complexity of your system configuration, the amount of time needed to start
GPFS varies. If, after a reasonable amount of time for your configuration, you still cannot access a file
system that has been mounted (either automatically or with the mount or mmmount command),
examine the log file for error messages.
GPFS is a file system that runs on multiple nodes of a cluster. This means that problems originating on
one node of a cluster often have effects that are visible on other nodes. It is often valuable to merge the
GPFS logs in pursuit of a problem. Having accurate time stamps aids the analysis of the sequence of
events.
Before following any of the debug steps, IBM suggests that you:
1. Synchronize all clocks of all nodes in the GPFS cluster. If this is not done, and clocks on different
nodes are out of sync, there is no way to establish the real time line of events occurring on multiple
nodes. Therefore, a merged error log is less useful for determining the origin of a problem and
tracking its effects.
2. Merge and chronologically sort all of the GPFS log entries from each node in the cluster. The
--gather-logs option of “The gpfs.snap command” on page 6 can be used to achieve this:
gpfs.snap --gather-logs -d /tmp/logs -N all
The system displays information similar to:
gpfs.snap: Gathering mmfs logs ...
gpfs.snap: The sorted and unsorted mmfs.log files are in /tmp/logs
If the --gather-logs option is not available on your system, you can create your own script to achieve
the same task; use /usr/lpp/mmfs/samples/gatherlogs.sample.sh as an example.
Failures in the error log can be viewed by issuing this command on an AIX node:
errpt -a
On Windows, use the Event Viewer and look for events with a source label of GPFS in the Application
event category.
| On Linux, syslog may include GPFS log messages and the error logs described in this section. The
| systemLogLevel attribute of the mmchconfig command controls which GPFS log messages are sent to
| syslog. For more information, see the mmchconfig command in the GPFS: Administration and
| Programming Reference.
The error log contains information about several classes of events or errors. These classes are:
v “MMFS_ABNORMAL_SHUTDOWN”
v “MMFS_DISKFAIL”
v “MMFS_ENVIRON”
v “MMFS_FSSTRUCT”
v “MMFS_GENERIC” on page 4
v “MMFS_LONGDISKIO” on page 4
v “MMFS_QUOTA” on page 4
v “MMFS_SYSTEM_UNMOUNT” on page 5
v “MMFS_SYSTEM_WARNING” on page 5
MMFS_ABNORMAL_SHUTDOWN
The MMFS_ABNORMAL_SHUTDOWN error log entry means that GPFS has determined that it must
shutdown all operations on this node because of a problem. Insufficient memory on the node to handle
critical recovery situations can cause this error. In general there will be other error log entries from GPFS
or some other component associated with this error log entry.
MMFS_DISKFAIL
The MMFS_DISKFAIL error log entry indicates that GPFS has detected the failure of a disk and forced
the disk to the stopped state. This is ordinarily not a GPFS error but a failure in the disk subsystem or
the path to the disk subsystem.
MMFS_ENVIRON
MMFS_ENVIRON error log entry records are associated with other records of the MMFS_GENERIC or
MMFS_SYSTEM_UNMOUNT types. They indicate that the root cause of the error is external to GPFS
and usually in the network that supports GPFS. Check the network and its physical connections. The
data portion of this record supplies the return code provided by the communications code.
MMFS_FSSTRUCT
The MMFS_FSSTRUCT error log entry indicates that GPFS has detected a problem with the on-disk
structure of the file system. The severity of these errors depends on the exact nature of the inconsistent
data structure. If it is limited to a single file, EIO errors will be reported to the application and operation
will continue. If the inconsistency affects vital metadata structures, operation will cease on this file
system. These errors are often associated with an MMFS_SYSTEM_UNMOUNT error log entry.
If the file system is severely damaged, the best course of action is to follow the procedures in “Additional
information to collect for file system corruption or MMFS_FSSTRUCT errors” on page 116, and then
contact the IBM Support Center.
MMFS_GENERIC
The MMFS_GENERIC error log entry means that GPFS self diagnostics have detected an internal error,
or that additional information is being provided with an MMFS_SYSTEM_UNMOUNT report. If the
record is associated with an MMFS_SYSTEM_UNMOUNT report, the event code fields in the records
will be the same. The error code and return code fields might describe the error. See Chapter 13,
“Messages,” on page 121 for a listing of codes generated by GPFS.
If the error is generated by the self diagnostic routines, service personnel should interpret the return and
error code fields since the use of these fields varies by the specific error. Errors caused by the self
checking logic will result in the shutdown of GPFS on this node.
MMFS_GENERIC errors can result from an inability to reach a critical disk resource. These errors might
look different depending on the specific disk resource that has become unavailable, like logs and
allocation maps. This type of error will usually be associated with other error indications. Other errors
generated by disk subsystems, high availability components, and communications components at the
same time as, or immediately preceding, the GPFS error should be pursued first because they might be
the cause of these errors. MMFS_GENERIC error indications without an associated error of those types
represent a GPFS problem that requires the IBM Support Center. See “Information to collect before
contacting the IBM Support Center” on page 115.
MMFS_LONGDISKIO
The MMFS_LONGDISKIO error log entry indicates that GPFS is experiencing very long response time
for disk requests. This is a warning message and can indicate that your disk system is overloaded or that
a failing disk is requiring many I/O retries. Follow your operating system's instructions for monitoring
the performance of your I/O subsystem on this node and on any disk server nodes that might be
involved. The data portion of this error record specifies the disk involved. There might be related error
log entries from the disk subsystems that will pinpoint the actual cause of the problem. If the disk is
attached to an AIX node, refer to the AIX Information Center (http://publib16.boulder.ibm.com/pseries/
index.htm) and search for performance management. To enable or disable, use the mmchfs -w command.
For more details, contact your IBM service representative.
The mmpmon command can be used to analyze I/O performance on a per-node basis. See “Failures
using the mmpmon command” on page 85 and the Monitoring GPFS I/O performance with the mmpmon
command topic in the GPFS: Advanced Administration Guide.
MMFS_QUOTA
The MMFS_QUOTA error log entry is used when GPFS detects a problem in the handling of quota
information. This entry is created when the quota manager has a problem reading or writing the quota
file. If the quota manager cannot read all entries in the quota file when mounting a file system with
quotas enabled, the quota manager shuts down but file system manager initialization continues. Mounts
will not succeed and will return an appropriate error message (see “File system forced unmount” on page
71).
Quota accounting depends on a consistent mapping between user names and their numeric identifiers.
This means that a single user accessing a quota enabled file system from different nodes should map to
the same numeric user identifier from each node. Within a local cluster this is usually achieved by
ensuring that /etc/passwd and /etc/group are identical across the cluster.
It might be necessary to run an offline quota check (mmcheckquota) to repair or recreate the quota file. If
the quota file is corrupted, mmcheckquota will not restore it. The file must be restored from the backup
copy. If there is no backup copy, an empty file can be set as the new quota file. This is equivalent to
recreating the quota file. To set an empty file or use the backup file, issue the mmcheckquota command
with the appropriate operand:
v -u UserQuotaFilename for the user quota file
v -g GroupQuotaFilename for the group quota file
v -j FilesetQuotaFilename for the fileset quota file
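For example, assuming a file system named fs1 and a saved backup copy of the user quota file named user.quota.backup (both names are illustrative), the backup copy could be put in place with a command of this form; see the mmcheckquota documentation for the exact syntax:
mmcheckquota -u user.quota.backup fs1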
After replacing the appropriate quota file, reissue the mmcheckquota command to check the file system
inode and space usage.
For information about running the mmcheckquota command, see “The mmcheckquota command” on
page 31.
MMFS_SYSTEM_UNMOUNT
The MMFS_SYSTEM_UNMOUNT error log entry means that GPFS has discovered a condition that
might result in data corruption if operation with this file system continues from this node. GPFS has
marked the file system as disconnected and applications accessing files within the file system will receive
ESTALE errors. This can be the result of:
v The loss of a path to all disks containing a critical data structure.
If you are using SAN attachment of your storage, consult the problem determination guides provided
by your SAN switch vendor and your storage subsystem vendor.
v An internal processing error within the file system.
See “File system forced unmount” on page 71. Follow the problem determination and repair actions
specified.
MMFS_SYSTEM_WARNING
The MMFS_SYSTEM_WARNING error log entry means that GPFS has detected a system level value
approaching its maximum limit. This might occur as a result of the number of inodes (files) reaching its
limit. If so, issue the mmchfs command to increase the number of inodes for the file system so that at
least 5% of the inodes are free.
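For example, assuming a file system named fs1 (an illustrative name) and a desired maximum of 2,000,000 inodes, a command of the following form might be used; the exact option syntax is described in the mmchfs documentation:
mmchfs fs1 --inode-limit 2000000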
Probable Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM
Failure Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM
Recommended Actions
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
EVENT CODE
15558007
STATUS CODE
212
VOLUME
gpfsd
The information gathered with the gpfs.snap command can be used in conjunction with other
information (for example, GPFS internal dumps, traces, and kernel thread dumps) to solve a GPFS
problem.
Use the -z option to generate a non-master snapshot. This is useful if there are many nodes on which to
take a snapshot, and only one master snapshot is needed. For a GPFS problem within a large cluster
(hundreds or thousands of nodes), one strategy might call for a single master snapshot (one invocation of
gpfs.snap with no options), and multiple non-master snapshots (multiple invocations of gpfs.snap with
the -z option).
Use the -N option to obtain gpfs.snap data from multiple nodes in the cluster. When the -N option is
used, the gpfs.snap command takes non-master snapshots of all the nodes specified with this option and
a master snapshot of the node on which it was invoked.
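For example, if a problem is suspected to involve nodes node1, node2, and node3 (illustrative node names), the following invocation collects non-master snapshots from those nodes and a master snapshot from the node on which the command is issued:
gpfs.snap -N node1,node2,node3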
The difference between a master snapshot and a non-master snapshot is the data that is gathered. A
master snapshot gathers information from nodes in the cluster. A master snapshot contains all data that a
non-master snapshot has. There are two categories of data that is collected:
1. Data that is always gathered by gpfs.snap (for master snapshots and non-master snapshots):
v “Data always gathered by gpfs.snap on all platforms”
v “Data always gathered by gpfs.snap on AIX” on page 9
v “Data always gathered by gpfs.snap on Linux” on page 9
v “Data always gathered by gpfs.snap on Windows” on page 10
2. Data that is gathered by gpfs.snap only in the case of a master snapshot. See “Data always gathered
by gpfs.snap for a master snapshot” on page 10.
Note: The contents of mmfsadm output might vary from release to release, which could obsolete any
user programs that depend on that output. Therefore, we suggest that you do not create user programs
that invoke mmfsadm.
The mmfsadm command extracts data from GPFS without using locking, so that it can collect the data in
the event of locking errors. In certain rare cases, this can cause GPFS or the node to fail. Several options
of this command exist and might be required for use:
cleanup
Delete shared segments left by a previously failed GPFS daemon without actually restarting the
daemon.
dump what
Dumps the state of a large number of internal state values that might be useful in determining
the sequence of events. The what parameter can be set to all, indicating that all available data should be collected.
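For example, using the all value described above, the complete internal state can be captured to a file for later analysis (the output file name is illustrative):
mmfsadm dump all > /tmp/mmfsadm.dump.out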
Other options provide interactive GPFS debugging, but are not described here. Output from the
mmfsadm command will be required in almost all cases where a GPFS problem is being reported. The
mmfsadm command collects data only on the node where it is issued. Depending on the nature of the
problem, mmfsadm output might be required from several or all nodes. The mmfsadm output from the
file system manager is often required.
To determine where the file system manager is, issue the mmlsmgr command:
mmlsmgr
GPFS tracing is based on the kernel trace facility on AIX, embedded GPFS trace subsystem on Linux, and
the Windows ETL subsystem on Windows. The level of detail that is gathered by the trace facility is
controlled by setting the trace levels using the mmtracectl command.
The mmtracectl command sets up and enables tracing using default settings for various common problem
situations. Using this command improves the probability of gathering accurate and reliable problem
determination information. For more information about the mmtracectl command, see the GPFS:
Administration and Programming Reference.
If the problem requires more detailed tracing, the IBM Support Center personnel might ask you to
modify the GPFS trace levels. Use the mmtracectl command to establish the required trace classes and
levels of tracing. The syntax to modify trace classes and levels is as follows:
mmtracectl --set --trace={io | all | def | "Class Level [Class Level ...]"}
For example, to tailor the trace level for I/O, issue the following command:
mmtracectl --set --trace=io
Once the trace levels are established, start the tracing by issuing:
mmtracectl --start
After the trace data has been gathered, stop the tracing by issuing:
mmtracectl --stop
To clear the trace settings and make sure tracing is turned off, issue:
mmtracectl --off
Other possible values that can be specified for the trace Class include:
afm
active file management
alloc
disk space allocation
allocmgr
allocation manager
basic
'basic' classes
brl
byte range locks
cksum
checksum services
cleanup
cleanup routines
cmd
ts commands
defrag
defragmentation
The trace Level can be set to a value from 0 through 14, which represents an increasing level of detail. A
value of 0 turns tracing off. To display the trace level in use, issue the mmfsadm showtrace command.
On AIX, the --aix-trace-buffer-size option can be used to control the size of the trace buffer in memory.
On Linux nodes only, use the mmtracectl command to change the following:
v The trace buffer size in blocking mode.
For example, to set the trace buffer size in blocking mode to 8K, issue:
mmtracectl --set --tracedev-buffer-size=8K
v The raw data compression level.
For example, to set the trace raw data compression level to the best ratio, issue:
mmtracectl --set --tracedev-compression-level=9
v The trace buffer size in overwrite mode.
For example, to set the trace buffer size in overwrite mode to 32K, issue:
mmtracectl --set --tracedev-overwrite-buffer-size=32K
v When to overwrite the old data.
For example, to wait to overwrite the data until the trace data is written to the local disk and the
buffer is available again, issue:
mmtracectl --set --tracedev-write-mode=blocking
For more information about the mmtracectl command, see the GPFS: Administration and Programming
Reference.
When this command displays a NeedsResync target/fileset state, inconsistencies between home and cache
are being fixed automatically; however, unmount and mount operations are required to return the state to
Active.
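For example, assuming an AFM-enabled file system device named fs1 (an illustrative name), the fileset and queue states can be displayed by issuing:
mmafmctl fs1 getstate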
The mmafmctl Device getstate command is fully described in the Commands topic in the GPFS:
Administration and Programming Reference.
Use the mmdiag command to query various aspects of the GPFS internal state for troubleshooting and
tuning purposes. The mmdiag command displays information about the state of GPFS on the node where
it is executed. The command obtains the required information by querying the GPFS daemon process
(mmfsd), and thus will only function when the GPFS daemon is running.
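For example, two commonly used invocations (see the mmdiag documentation for the full option list) display the GPFS build level and the list of threads currently waiting on the local node:
mmdiag --version
mmdiag --waiters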
The mmdiag command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
The remaining flags have the same meaning as in the mmshutdown command. They can be used to
specify the nodes on which to get the state of the GPFS daemon.
For example, to display the quorum, the number of nodes up, and the total number of nodes, issue:
mmgetstate -L -a
The mmgetstate command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
The mmlscluster command is fully described in the Commands topic in the: GPFS: Administration and
Programming Reference.
Depending on your configuration, additional information not documented in either the mmcrcluster
command or the mmchconfig command may be displayed to assist in problem determination.
If a configuration parameter is not shown in the output of this command, the default value for that
parameter, as documented in the mmchconfig command, is in effect.
The mmlsconfig command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
Use the mmrefresh command only when you suspect that something is not working as expected and the
reason for the malfunction is a problem with the GPFS configuration data. For example, a mount
command fails with a device not found error, and you know that the file system exists. Another example
is if any of the files in the /var/mmfs/gen directory were accidentally erased. Under normal
circumstances, the GPFS command infrastructure maintains the cluster data files automatically and there
is no need for user intervention.
The -f flag can be used to force the GPFS cluster configuration data files to be rebuilt whether they
appear to be at the most current level or not. If no other option is specified, the command affects only the
node on which it is run. The remaining flags have the same meaning as in the mmshutdown command,
and are used to specify the nodes on which the refresh is to be performed.
For example, to place the GPFS cluster configuration data files at the latest level, on all nodes in the
cluster, issue:
mmrefresh -a
The mmsdrrestore command restores the latest GPFS system files on the specified nodes. If no nodes are
specified, the command restores the configuration information only on the node where it is invoked. If
the local GPFS configuration file is missing, the file specified with the -F option from the node specified
with the -p option is used instead. This command works best when used in conjunction with the
mmsdrbackup user exit, which is described in the GPFS user exits topic in the GPFS: Administration and
Programming Reference.
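For example, assuming that the local configuration file is missing and that node c5n97 has an up-to-date copy (the node name and path are illustrative), the configuration could be restored with a command of this form:
mmsdrrestore -p c5n97 -F /var/mmfs/gen/mmsdrfs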
The mmsdrrestore command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
The cluster manager keeps a list of the expelled nodes. Expelled nodes will not be allowed to rejoin the
cluster until they are removed from the list using the -r or --reset option on the mmexpelnode command.
The expelled nodes information will also be reset if the cluster manager node goes down or is changed
with mmchmgr -c.
The command can also be issued in the following forms:
mmexpelnode {-l | --list}
Or,
mmexpelnode {-r | --reset} -N {all | Node[,Node...]}
Restricted mode mount is not intended for normal operation, but may allow the recovery of some user
data. Only data which is referenced by intact directories and metadata structures would be available.
Attention:
1. Follow the procedures in “Information to collect before contacting the IBM Support Center” on page
115, and then contact the IBM Support Center before using this capability.
2. Attempt this only after you have tried to repair the file system with the mmfsck command. (See
“Why does the offline mmfsck command fail with "Error creating internal storage"?” on page 111.)
3. Use this procedure only if the failing disk is attached to an AIX or Linux node.
Some disk failures can result in the loss of enough metadata to render the entire file system unable to
mount. In that event it might be possible to preserve some user data through a restricted mode mount. This
facility should only be used if a normal mount does not succeed, and should be considered a last resort
to save some data after a fatal disk failure.
Restricted mode mount is invoked by using the mmmount command with the -o rs flags. After a
restricted mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.
Attention: Attempt this only after you have tried to repair the file system with the mmfsck command.
Read-only mode mount is invoked by using the mmmount command with the -o ro flags. After a
read-only mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.
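For example, assuming a damaged file system named fs1 (an illustrative name), the two mount modes described above would be invoked as follows:
mmmount fs1 -o rs
mmmount fs1 -o ro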
The lsof command is available in Linux distributions or by using anonymous ftp from
lsof.itap.purdue.edu (cd to /pub/tools/unix/lsof). The inventor of the lsof command is Victor A. Abell
([email protected]), Purdue University Computing Center.
Use the -L option to see the node name and IP address of each node that has the file system in use. This
command can be used for all file systems, all remotely mounted file systems, or file systems mounted on
nodes of certain clusters.
While not specifically intended as a service aid, the mmlsmount command is useful in these situations:
1. When writing and debugging new file system administrative procedures, to determine which nodes
have a file system mounted and which do not.
2. When mounting a file system on multiple nodes, to determine which nodes have successfully
completed the mount and which have not.
3. When a file system is mounted, but appears to be inaccessible to some nodes but accessible to others,
to determine the extent of the problem.
4. When a normal (not force) unmount has not completed, to determine the affected nodes.
5. When a file system has force unmounted on some nodes but not others, to determine the affected
nodes.
For example, to list the nodes having all file systems mounted:
mmlsmount all -L
The mmlsmount command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
The -L flag, used in conjunction with the -I test flag, allows you to display the actions that would be
performed by a policy file without actually applying it. This way, potential errors and misunderstandings
can be detected and corrected without actually making these mistakes.
The mmapplypolicy command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
mmapplypolicy -L 1
Use this option to display all of the information (if any) from the previous level, plus some information
as the command runs, but not for each file. This option also displays total numbers for file migration and
deletion.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 1
mmapplypolicy -L 2
Use this option to display all of the information from the previous levels, plus each chosen file and the
scheduled migration or deletion action.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 2
mmapplypolicy -L 3
Use this option to display all of the information from the previous levels, plus each candidate file and the
applicable rule.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 3
mmapplypolicy -L 4
Use this option to display all of the information from the previous levels, plus the name of each explicitly
excluded file, and the applicable rule.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 4
The output would indicate that there are two excluded files, /fs1/file1.save and /fs1/file2.save.
mmapplypolicy -L 5
Use this option to display all of the information from the previous levels, plus the attributes of candidate
and excluded files.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 5
mmapplypolicy -L 6
Use this option to display all of the information from the previous levels, plus files that are not candidate
files, and their attributes.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 6
The output contains information about the data1 file, which is not a candidate file.
Indications leading you to the conclusion that you should run the mmcheckquota command include:
v MMFS_QUOTA error log entries. This error log entry is created when the quota manager has a
problem reading or writing the quota file.
v Quota information is lost due to node failure. Node failure could leave users unable to open files or
deny them disk space that their quotas should allow.
v The in doubt value is approaching the quota limit. The sum of the in doubt value and the current usage
may not exceed the hard limit. Consequently, the actual block space and number of files available to
the user or the group may be constrained by the in doubt value. Should the in doubt value approach a
significant percentage of the quota, use the mmcheckquota command to account for the lost space and
files.
v User, group, or fileset quota files are corrupted.
During the normal operation of file systems with quotas enabled (not running mmcheckquota online),
the usage data reflects the actual usage of the blocks and inodes in the sense that if you delete files you
should see the usage amount decrease. The in doubt value does not reflect how much the user has used
already; it is just the amount of quota that the quota server has assigned to its clients. The quota server
does not know whether the assigned amount has been used or not. The only situation where the in doubt
value is important to the user is when the sum of the usage and the in doubt value is greater than the
user's quota hard limit. In this case, the user is not allowed to allocate more blocks or inodes unless he
brings the usage down.
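For example, assuming a file system named fs1 (an illustrative name), quota accounting for the file system can be checked and corrected by issuing:
mmcheckquota fs1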
The mmcheckquota command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
To find out the local device names for these disks, use the mmlsnsd command with the -m option. For
example, issuing mmlsnsd -m produces output similar to this:
Disk name NSD volume ID Device Node name Remarks
------------------------------------------------------------------------------------
hd2n97 0972846145C8E924 /dev/hdisk2 c5n97g.ppd.pok.ibm.com server node
hd2n97 0972846145C8E924 /dev/hdisk2 c5n98g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n97g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n98g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n97g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n98g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n97g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n98g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n97g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n98g.ppd.pok.ibm.com server node
hd7n97 0972846145C8E934 /dev/hd7n97 c5n97g.ppd.pok.ibm.com server node
To obtain extended information for NSDs, use the mmlsnsd command with the -X option. For example,
issuing mmlsnsd -X produces output similar to this:
Disk name NSD volume ID Device Devtype Node name Remarks
---------------------------------------------------------------------------------------------------
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
sdfnsd 0972845E45F02E81 /dev/sdf generic c5n94g.ppd.pok.ibm.com server node
sdfnsd 0972845E45F02E81 /dev/sdm generic c5n96g.ppd.pok.ibm.com server node
The mmlsnsd command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.
For example, if you issue mmwindisk list, your output is similar to this:
Disk Avail Type Status Size GPFS Partition ID
---- ----- ------- --------- -------- ------------------------------------
0 BASIC ONLINE 137 GiB
1 GPFS ONLINE 55 GiB 362DD84E-3D2E-4A59-B96B-BDE64E31ACCF
2 GPFS ONLINE 200 GiB BD5E64E4-32C8-44CE-8687-B14982848AD2
3 GPFS ONLINE 55 GiB B3EC846C-9C41-4EFD-940D-1AFA6E2D08FB
4 GPFS ONLINE 55 GiB 6023455C-353D-40D1-BCEB-FF8E73BF6C0F
5 GPFS ONLINE 55 GiB 2886391A-BB2D-4BDF-BE59-F33860441262
6 GPFS ONLINE 55 GiB 00845DCC-058B-4DEB-BD0A-17BAD5A54530
7 GPFS ONLINE 55 GiB 260BCAEB-6E8A-4504-874D-7E07E02E1817
8 GPFS ONLINE 55 GiB 863B6D80-2E15-457E-B2D5-FEA0BC41A5AC
9 YES UNALLOC OFFLINE 55 GiB
10 YES UNALLOC OFFLINE 200 GiB
Where:
Disk
is the Windows disk number as shown in the Disk Management console and the DISKPART
command-line utility.
Avail
shows the value YES when the disk is available and in a state suitable for creating an NSD.
The mmwindisk command does not provide the NSD volume ID. You can use mmlsnsd -m to find the
relationship between NSDs and devices, which are disk numbers on Windows.
Attention: Use this command only when directed by the IBM Support Center.
Before running mmfileid, you must run a disk analysis utility and obtain the disk sector numbers that
are suspect or known to be damaged. These sectors are input to the mmfileid command.
The disk to scan is specified with a descriptor that has one of the following formats:
NodeName:DiskName[:PhysAddr1[-PhysAddr2]]
Or,
:{NsdName|DiskNum|BROKEN}[:PhysAddr1[-PhysAddr2]]
NodeName
Specifies a node in the GPFS cluster that has access to the disk to scan. NodeName must be
specified if the disk is identified using its physical volume name. NodeName should be omitted if
the disk is identified with its NSD name, its GPFS disk ID number, or if the keyword BROKEN is
used.
DiskName
Specifies the physical volume name of the disk to scan as known on node NodeName.
NsdName
Specifies the GPFS NSD name of the disk to scan.
DiskNum
Specifies the GPFS disk ID number of the disk to scan as displayed by the mmlsdisk -L
command.
BROKEN
Specifies that all disks in the file system should be scanned to find files that have broken
addresses resulting in lost data.
PhysAddr1[-PhysAddr2]
Specifies the range of physical disk addresses to scan. The default value for PhysAddr1 is zero.
The default value for PhysAddr2 is the value for PhysAddr1.
If both PhysAddr1 and PhysAddr2 are zero, the entire disk is searched.
The output can be redirected to a file (using the -o flag) and sorted on the inode number, using the sort
command.
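For example, the -o flag can be combined with the sort command as described above (the device, input, and output file names are illustrative):
mmfileid /dev/gpfsB -F addr.in -o /tmp/badsectors.out
sort -n /tmp/badsectors.out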
The mmfileid command output contains one line for each inode found to be located on the corrupt disk
sector. Each line of the command output has this format:
InodeNumber LogicalDiskAddress SnapshotId Filename
InodeNumber
Indicates the inode number of the file identified by mmfileid.
LogicalDiskAddress
Indicates the disk block (disk sector) number of the file identified by mmfileid.
SnapshotId
Indicates the snapshot identifier for the file. A SnapshotId of 0 means that the file is not a snapshot
file.
Filename
Indicates the name of the file identified by mmfileid. File names are relative to the root of the file
system in which they reside.
Assume that a disk analysis tool reported that hdisk6, hdisk7, hdisk8, and hdisk9 contained bad sectors.
Then the command:
mmfileid /dev/gpfsB -F addr.in
The lines starting with the word Address represent GPFS system metadata files or reserved disk areas. If
your output contains any of these lines, do not attempt to replace or repair the indicated files. If you
suspect that any of the special files are damaged, call the IBM Support Center for assistance.
The line:
14336 1072256 0 /gpfsB/tesDir/testFile.out
indicates that inode number 14336, disk address 1072256 contains file /gpfsB/tesDir/testFile.out, which
does not belong to a snapshot (0 to the left of the name). This file is located on a potentially bad disk
sector area.
The line
14344 2922528 1 /gpfsB/x.img
indicates that inode number 14344, disk address 2922528 contains file /gpfsB/x.img, which belongs to
snapshot number 1 (1 to the left of the name). This file is located on a potentially bad disk sector area.
The SHA digest is a short and convenient way to identify a key registered with either the mmauth show
or mmremotecluster command. In theory, two keys may have the same SHA digest. In practice, this is
extremely unlikely. The SHA digest can be used by the administrators of two GPFS clusters to determine
if they each have received (and registered) the right key file from the other administrator.
An example is the situation of two administrators named Admin1 and Admin2 who have each registered
the other's key file, but find that mount attempts by Admin1 for file systems owned by Admin2
fail with the error message: Authorization failed. To determine which administrator has registered the
wrong key, they each run mmauth show and send the local cluster's SHA digest to the other
administrator. Admin1 then runs the mmremotecluster command and verifies that the SHA digest for
Admin2's cluster matches the SHA digest for the key that Admin1 has registered. Admin2 then runs the
mmauth show command and verifies that the SHA digest for Admin1's cluster matches the key that
Admin2 has authorized.
If Admin1 finds that the SHA digests do not match, Admin1 runs the mmremotecluster update
command, passing the correct key file as input.
If Admin2 finds that the SHA digests do not match, Admin2 runs the mmauth update command,
passing the correct key file as input.
This is an example of the output produced by the mmauth show all command:
Cluster name: fksdcm.pok.ibm.com
Cipher list: EXP1024-RC2-CBC-MD5
SHA digest: d5eb5241eda7d3ec345ece906bfcef0b6cd343bd
File system access: fs1 (rw, root allowed)
| Deadlocks can be more disruptive than other types of failure. A deadlock effectively represents a single
| point of failure that can render the entire cluster inoperable. When a deadlock is encountered on a
| production system, it can take a long time to debug. The typical approach to recovering from a deadlock
| involves rebooting all of the nodes in the cluster. Thus, deadlocks can lead to prolonged and complete
| outages of clusters.
| Troubleshooting a deadlock requires specific types of debug data that must be collected while the
| deadlock is in progress. The data collection commands must be run manually; if this is not done before
| the deadlock is broken up, determining the root cause of the deadlock afterward is difficult. Also,
| without automation, deadlock detection requires some form of external action, for example, a complaint
| from a user, which means that a deadlock in progress could go undetected for many hours.
| Starting with GPFS 4.1, automated deadlock detection, automated deadlock data collection, and
| automated deadlock breakup options are provided to make it easier to handle a deadlock situation.
| Automated deadlock detection monitors waiters. Deadlock detection relies on a configurable threshold to
| determine if a deadlock is in progress. When a deadlock is detected, an alert is issued in the mmfs.log
| file and the operating system log, and the deadlockDetected callback is triggered.
| Automated deadlock detection is enabled by default and controlled with the mmchconfig attribute
| deadlockDetectionThreshold. A potential deadlock is detected when a waiter waits longer than
| deadlockDetectionThreshold. To view the current threshold for deadlock detection, enter the following
| command:
| mmlsconfig deadlockDetectionThreshold
| Automated deadlock data collection can be used to help gather this crucial debug data on detection of a
| potential deadlock.
| Automated deadlock data collection is enabled by default and controlled with the mmchconfig attribute
| deadlockDataCollectionDailyLimit. The deadlockDataCollectionDailyLimit attribute specifies the
| maximum number of times debug data can be collected in a 24-hour period. To view the current daily
| limit for data collection, enter the following command:
| mmlsconfig deadlockDataCollectionDailyLimit
| If a system administrator prefers to control the deadlock breakup process, the deadlockDetected callback
| can be used to notify system administrators that a potential deadlock has been detected. The information
| from the mmdiag --deadlock section can then be used to help determine what steps to take to resolve the
| deadlock.
| Automated deadlock breakup is disabled by default and controlled with the mmchconfig attribute
| deadlockBreakupDelay. The deadlockBreakupDelay attribute specifies how long to wait after a
| deadlock is detected before attempting to break up the deadlock. Enough time must be provided to allow
| the debug data collection to complete. To view the current breakup delay, enter the following command:
| mmlsconfig deadlockBreakupDelay
| A value of 0 indicates that automated deadlock breakup is disabled. To enable automated deadlock
| breakup, specify a positive value for deadlockBreakupDelay. If automated deadlock breakup is to be
| enabled, a delay of 300 seconds or longer is recommended.
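For example, to enable automated deadlock breakup with the recommended minimum delay, a command of the following form can be used:
mmchconfig deadlockBreakupDelay=300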
If your problem occurs on the AIX operating system, see the appropriate kernel debugging
documentation in the AIX Information Center (http://publib16.boulder.ibm.com/pseries/index.htm) for
information about the AIX kdb command.
If your problem occurs on the Linux operating system, see the documentation for your distribution
vendor.
If your problem occurs on the Windows operating system, the following tools that are available from
Microsoft at: http://www.microsoft.com/en/us/default.aspx, might be useful in troubleshooting:
v Debugging Tools for Windows
v Process Monitor
v Process Explorer
v Microsoft Windows Driver Kit
v Microsoft Windows Software Development Kit
The mmpmon command is intended for system administrators to analyze their I/O on the node on
which it is run. It is not primarily a diagnostic tool, but may be used as one for certain problems. For
example, running mmpmon on several nodes may be used to detect nodes that are experiencing poor
performance or connectivity problems.
The syntax of the mmpmon command is fully described in the Commands topic in the GPFS:
Administration and Programming Reference. For details on the mmpmon command, see the Monitoring GPFS
I/O performance with the mmpmon command topic in the GPFS: Advanced Administration Guide.
A GPFS installation problem should be suspected when GPFS modules are not loaded successfully, GPFS
commands do not work, either on the node that you are working on or on other nodes, new command
operands added with a new release of GPFS are not recognized, or there are problems with the kernel
extension.
A GPFS configuration problem should be suspected when the GPFS daemon will not activate, it will not
remain active, or it fails on some nodes but not on others. Suspect a configuration problem also if
quorum is lost, certain nodes appear to hang or do not communicate properly with GPFS, nodes cannot
be added to the cluster or are expelled, or GPFS performance is very noticeably degraded once a new
release of GPFS is installed or configuration parameters have been changed.
These are some of the errors encountered with GPFS installation, configuration and operation:
v “Installation and configuration problems”
v “GPFS modules cannot be loaded on Linux” on page 46
v “GPFS daemon will not come up” on page 47
v “GPFS daemon went down” on page 50
v “GPFS failures due to a network failure” on page 51
v “Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O
requests' message” on page 52
v “Quorum loss” on page 52
v “Delays and deadlocks” on page 53
v “Node cannot be added to the GPFS cluster” on page 54
v “Remote node expelled after remote file system successfully mounted” on page 55
v “Disaster recovery problems” on page 55
v “GPFS commands are unsuccessful” on page 56
v “Application program errors” on page 58
v “Troubleshooting Windows problems” on page 59
v “OpenSSH connection delays” on page 60
| where primaryServer is the name of the primary GPFS cluster configuration server.
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but it
is present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file, and
remoteFile is the full path name of that file on that node.
| This restore procedure may not work if the repository type is CCR and the files in /var/mmfs/ccr are
| damaged or missing. If this is the case, locate a node on which the /var/mmfs/gen/mmsdrfs file is
| present. Ensure that GPFS is shut down on all of the nodes and then disable the CCR:
| mmchcluster --ccr-disable
| Using the procedure described in this section, restore the /var/mmfs/gen/mmsdrfs file on all nodes on
| which it is missing and then re-enable the CCR:
| mmchcluster --ccr-enable
One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.
If you have made modifications to any of the user exits in /var/mmfs/etc, you will have to restore them
before starting GPFS.
For additional information, see “Recovery from loss of GPFS cluster configuration data file” on page 45.
If you receive this message, correct the /etc/hosts file so that each node interface to be used by GPFS
appears only once in the file.
Authorization problems
The rsh and rcp commands are used by GPFS administration commands to perform operations on other
nodes. The rsh daemon (rshd) on the remote node must recognize the command being run and must
obtain authorization to invoke it.
| Note: The rsh and rcp commands that are shipped with Cygwin are not supported on Windows. Use the
ssh and scp commands that are shipped with the OpenSSH package supported by GPFS. Refer to the
GPFS FAQ in the IBM Cluster information center (http://publib.boulder.ibm.com/infocenter/clresctr/
vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html) or GPFS FAQ in the IBM
Knowledge Center (http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfsclustersfaq.html) for
the latest OpenSSH information.
For the rsh and rcp commands issued by GPFS administration commands to succeed, each node in the
cluster must have an .rhosts file in the home directory for the root user, with file permission set to 600.
This .rhosts file must list each of the nodes and the root user. If such an .rhosts file does not exist on each
node in the cluster, the rsh and rcp commands issued by GPFS commands will fail with permission
errors, causing the GPFS commands to fail in turn.
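For example, a minimal /root/.rhosts file for a two-node cluster might look like this (the node names
k145n01 and k145n02 are only illustrations):
k145n01 root
k145n02 root
After creating the file on each node, set the required permission:
chmod 600 /root/.rhosts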
If you elected to use installation-specific remote invocation shell and remote file copy commands, you
must ensure:
1. Proper authorization is granted to all nodes in the GPFS cluster.
2. The nodes in the GPFS cluster can communicate without the use of a password, and without any
extraneous messages.
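To verify the second requirement, issue the remote shell command from each node to every other node
and confirm that it completes without prompting for a password and without extra output (the node
name is only an illustration):
ssh k145n02 date
The command should return only the date and time from the remote node.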
Connectivity problems
Another reason why rsh may fail is that connectivity to a needed node has been lost. Error messages
from mmdsh may indicate that connectivity to such a node has been lost. Here is an example:
If error messages indicate that connectivity to a node has been lost, use the ping command to verify
whether the node can still be reached:
ping k145n04
PING k145n04: (119.114.68.69): 56 data bytes
<Ctrl-C>
----k145n04 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss
If connectivity has been lost, restore it, then reissue the GPFS command.
The mmcommon showLocks command displays information about the lock server, lock name, lock
holder, PID, and extended information. If a GPFS administration command is not responding,
stopping the command will free the lock. If another process has this PID, another error occurred to
the original GPFS command, causing it to die without freeing the lock, and this new process has the
same PID. If this is the case, do not kill the process.
2. If any locks are held and you want to release them manually, from any node in the GPFS cluster issue
the command:
mmcommon freeLocks
If the /var/mmfs/gen/mmsdrfs file is removed by accident from any of the nodes, and an up-to-date
version of the file is present on the primary GPFS cluster configuration server, restore the file by issuing
this command from the node on which it is missing:
| mmsdrrestore -p primaryServer
| where primaryServer is the name of the primary GPFS cluster configuration server.
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but is
present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file and
remoteFile is the full path name of that file on that node.
| This restore procedure may not work if the repository type is CCR and the files in /var/mmfs/ccr are
| damaged or missing. If this is the case, locate a node on which the /var/mmfs/gen/mmsdrfs file is
| present. Ensure that GPFS is shut down on all of the nodes and then disable the CCR:
| mmchcluster --ccr-disable
| Using the procedure described in this section, restore the /var/mmfs/gen/mmsdrfs file on all nodes on
| which it is missing and then re-enable the CCR:
| mmchcluster --ccr-enable
One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.
Some of the more common problems that you may encounter are:
1. If the portability layer is not built, you may see messages similar to:
Mon Mar 26 20:56:30 EDT 2012: runmmfs starting
Removing old /var/adm/ras/mmfs.log.* files:
Unloading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
runmmfs: Unable to verify kernel/module configuration.
For more information about the mmfslinux module, see the Building the GPFS portability layer topic in the
GPFS: Concepts, Planning, and Installation Guide.
2. Verify that the GPFS daemon is active (for example, by issuing ps -e | grep mmfsd). The output of
this command should list mmfsd as operational. For example:
12230 pts/8 00:00:00 mmfsd
If the output does not show this, the GPFS daemon needs to be started with the mmstartup
command.
3. If you did not specify the autoload option on the mmcrcluster or the mmchconfig command, you
need to manually start the daemon by issuing the mmstartup command.
Verify network connectivity by issuing the ping command to each node in the cluster. A properly
working network and node will correctly reply to the ping with no lost packets.
Query the network interface that GPFS is using with:
netstat -i
Determine the problem with accessing node nodeName and correct it.
6. Verify that the GPFS environment is properly initialized by issuing these commands and ensuring that
the output is as expected.
v Issue the mmlscluster command to list the cluster configuration. This will also update the GPFS
configuration data on the node. Correct any reported errors before continuing.
v List all file systems that were created in this cluster. For an AIX node, issue:
lsfs -v mmfs
For a Linux node, issue:
cat /etc/fstab | grep gpfs
If any of these commands produce unexpected results, this may be an indication of corrupted GPFS
cluster configuration data file information. Follow the procedures in “Information to collect before
contacting the IBM Support Center” on page 115, and then contact the IBM Support Center.
7. GPFS requires a quorum of nodes to be active before any file system operations can be honored. This
requirement guarantees that a valid single token management domain exists for each GPFS file
system. Prior to the existence of a quorum, most requests are rejected with a message indicating that
quorum does not exist.
To identify which nodes in the cluster have daemons up or down, issue:
mmgetstate -L -a
If insufficient nodes are active to achieve quorum, go to any nodes not listed as active and perform
problem determination steps on these nodes. A quorum node indicates that it is part of a quorum by
writing an mmfsd ready message to the GPFS log. Remember that your system may have quorum
nodes and non-quorum nodes, and only quorum nodes are counted to achieve the quorum.
8. This step applies only to AIX nodes. Verify that the GPFS kernel extension is not having problems with its
shared segment by invoking:
cat /var/adm/ras/mmfs.log.latest
is written to the GPFS log, incompatible versions of GPFS code exist on nodes within the same cluster.
v If messages stating that functions are not supported are written to the GPFS log, you may not have the
correct kernel extensions loaded.
1. Ensure that the latest GPFS install packages are loaded on your system.
2. If running on Linux, ensure that the latest kernel extensions have been installed and built. See the
Building the GPFS portability layer topic in the GPFS: Concepts, Planning, and Installation Guide.
3. Reboot the GPFS node after an installation to ensure that the latest kernel extension is loaded.
v The daemon will not start because the configuration data was not migrated. See “Installation and
configuration problems” on page 41.
For network problems, follow the problem determination and repair actions specified with the following
message:
| 6027-306 [E]
Could not initialize inter-node communication
These are all conditions where the GPFS internal checking has determined that continued operation
would be dangerous to the consistency of your data. Some of these conditions are errors within GPFS
processing but most represent a failure of the surrounding environment.
In most cases, the daemon will exit and restart after recovery. If it is not safe to simply force the
unmounted file systems to recover, the GPFS daemon will exit.
Indications leading you to the conclusion that the daemon went down:
v Applications running at the time of the failure will see either ENODEV or ESTALE errors. The ENODEV
errors are generated by the operating system until the daemon has restarted. The ESTALE error is
generated by GPFS as soon as it restarts.
When quorum is lost, applications with open files receive an ESTALE error return code until the files are
closed and reopened. New file open operations will fail until quorum is restored and the file system is
remounted. Applications accessing these files before GPFS returns may receive an ENODEV return code
from the operating system.
v The GPFS log contains the message:
| 6027-650 [X]
The mmfs daemon is shutting down abnormally.
Most GPFS daemon down error messages are in the mmfs.log.previous log for the instance that failed.
If the daemon restarted, it generates a new mmfs.log.latest. Begin problem determination for these
errors by examining the operating system error log.
If an existing quorum is lost, GPFS stops all processing within the cluster to protect the integrity of
your data. GPFS will attempt to rebuild a quorum of nodes and will remount the file system if
automatic mounts are specified.
v Open requests are rejected with no such file or no such directory errors.
When quorum has been lost, requests are rejected until the node has rejoined a valid quorum and
mounted its file systems. If messages indicate lack of quorum, follow the procedures in “GPFS daemon
will not come up” on page 47.
v Removing the setuid bit from the permissions of these commands may produce errors for non-root
users:
Note: The mode bits for all listed commands are 4555 or -r-sr-xr-x. To restore the default (shipped)
permission, enter:
chmod 4555 tscommand
This dependence is direct because various GPFS internal messages flow on the network, and it may be
indirect if the underlying disk technology depends on the network. Symptoms of an indirect failure
include the inability to complete I/O or GPFS moving disks to the down state.
The problem can also be first detected by the GPFS network communication layer. If network
connectivity is lost between nodes or GPFS heart beating services cannot sustain communication to a
node, GPFS will declare the node dead and perform recovery procedures. This problem will manifest
itself by messages appearing in the GPFS log such as:
Mon Jun 25 22:23:36.298 2007: Close connection to 192.168.10.109 c5n109. Attempting reconnect.
Mon Jun 25 22:23:37.300 2007: Connecting to 192.168.10.109 c5n109
Mon Jun 25 22:23:37.398 2007: Close connection to 192.168.10.109 c5n109
Mon Jun 25 22:23:38.338 2007: Recovering nodes: 9.114.132.109
Mon Jun 25 22:23:38.722 2007: Recovered 1 nodes.
If a sufficient number of nodes fail, GPFS will lose the quorum of nodes, which exhibits itself by
messages appearing in the GPFS log, similar to this:
Mon Jun 25 11:08:10 2007: Close connection to 179.32.65.4 gpfs2
Mon Jun 25 11:08:10 2007: Lost membership in cluster gpfsxx.kgn.ibm.com. Unmounting file system.
When either of these cases occur, perform problem determination on your network connectivity. Failing
components could be network hardware such as switches or host bus adapters.
Kernel panics with a 'GPFS dead man switch timer has expired, and
there's still outstanding I/O requests' message
This problem can be detected by an error log entry with a label of KERNEL_PANIC and the PANIC
MESSAGES or a PANIC STRING.
For example:
GPFS Deadman Switch timer has expired, and there’s still outstanding I/O requests
GPFS is designed to tolerate node failures through per-node metadata logging (journaling). The log file is
called the recovery log. In the event of a node failure, GPFS performs recovery by replaying the recovery
log for the failed node, thus restoring the file system to a consistent state and allowing other nodes to
continue working. Prior to replaying the recovery log, it is critical to ensure that the failed node has
indeed failed, as opposed to being active but unable to communicate with the rest of the cluster.
In the latter case, if the failed node has direct access (as opposed to accessing the disk with an NSD
server) to any disks that are a part of the GPFS file system, it is necessary to ensure that no I/O requests
submitted from this node complete once the recovery log replay has started. To accomplish this, GPFS
uses the disk lease mechanism. The disk leasing mechanism guarantees that a node does not submit any
more I/O requests once its disk lease has expired, and the surviving nodes use disk lease time out as a
guideline for starting recovery.
This situation is complicated by the possibility of 'hung I/O'. If an I/O request is submitted prior to the
disk lease expiration, but for some reason (for example, device driver malfunction) the I/O takes a long
time to complete, it is possible that it may complete after the start of the recovery log replay during
recovery. This situation would present a risk of file system corruption. In order to guard against such a
contingency, when I/O requests are being issued directly to the underlying disk device, GPFS initiates a
kernel timer, referred to as the dead man switch. The dead man switch timer goes off in the event of disk
lease expiration and checks whether there are any outstanding I/O requests. If any I/O is pending, a
kernel panic is initiated to prevent possible file system corruption.
Such a kernel panic is not an indication of a software defect in GPFS or the operating system kernel, but
rather it is a sign of
1. Network problems (the node is unable to renew its disk lease).
2. Problems accessing the disk device (I/O requests take an abnormally long time to complete). See
“MMFS_LONGDISKIO” on page 4.
Quorum loss
Each GPFS cluster has a set of quorum nodes explicitly set by the cluster administrator.
When quorum loss or loss of connectivity occurs, any nodes still running GPFS suspend the use of file
systems owned by the cluster experiencing the problem. This may result in GPFS access within the
suspended file system receiving ESTALE errnos. Nodes continuing to function after suspending file
system access will start contacting other nodes in the cluster in an attempt to rejoin or reform the
quorum. If they succeed in forming a quorum, access to the file system is restarted.
Normally, quorum loss or loss of connectivity occurs if a node goes down or becomes isolated from its
peers by a network failure. The expected response is to address the failing condition.
If file system processes appear to stop making progress, there may be a system resource problem or an
internal deadlock within GPFS.
Note: A deadlock can occur if user exit scripts that will be called by the mmaddcallback facility are
placed in a GPFS file system. The scripts should be placed in a local file system so they are accessible
even when the networks fail.
| This example shows that deadlock debug data was automatically collected in /tmp/mmfs. If deadlock
| debug data was not automatically collected, it would need to be manually collected.
| To determine which nodes have the longest waiting threads, issue this command on each node:
| /usr/lpp/mmfs/bin/mmdiag --waiters waitTimeInSeconds
| For all nodes that have threads waiting longer than waitTimeInSeconds seconds, issue:
| mmfsadm dump all
| Notes:
| a. Each node can potentially dump more than 200 MB of data.
| b. Run the mmfsadm dump all command only on nodes where you are sure that the threads are really
| hung. An mmfsadm dump all command can follow pointers that are changing and cause the node
| to crash.
| 3. If the deadlock situation cannot be corrected, follow the instructions in “Additional information to
| collect for delays and deadlocks” on page 116, then contact the IBM Support Center.
One cause of this condition is when the subnets attribute of the mmchconfig command has been used to
specify subnets to GPFS, and there is an incorrect netmask specification on one or more nodes of the
clusters involved in the remote mount. Check to be sure that all netmasks are correct for the network
interfaces used for GPFS communication.
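For example, the following commands can help confirm the subnets setting and the netmasks in use on a
node (the exact output depends on your configuration):
mmlsconfig | grep -i subnets
ifconfig -a
Compare the netmask reported for the interface that GPFS uses on each node with the subnet
specification that was given to mmchconfig.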
might appear in the GPFS log for active/active disaster recovery scenarios with GPFS replication. The
purpose of these messages is to record the fact that a quorum override decision has been made after the
loss of a majority of disks. A message similar to the these will appear in the log on the file system
manager node every time it reads the file system descriptor with an overridden quorum:
...
| 6027-435 [N] The file system descriptor quorum has been overridden.
| 6027-490 [N] The descriptor replica on disk gpfs23nsd has been excluded.
| 6027-490 [N] The descriptor replica on disk gpfs24nsd has been excluded.
...
For more information on quorum override, see the GPFS: Concepts, Planning, and Installation Guide and
search on quorum.
For PPRC and FlashCopy-based configurations, additional problem determination information may be
collected from the ESS log file. This information and the appropriate ESS documentation should be
consulted when dealing with various types of disk subsystem-related failures. For instance, if users are
These messages indicate that rsh is not working properly on nodes k145n01 and k145n02.
If you encounter this type of failure, determine why rsh is not working on the identified node. Then
fix the problem.
4. Most problems encountered during file system creation fall into three classes:
v You did not create network shared disks which are required to build the file system.
v The creation operation cannot access the disk.
Follow the procedures for checking access to the disk. This can result from a number of factors
including those described in “NSD and underlying disk subsystem failures” on page 91.
v Unsuccessful attempt to communicate with the file system manager.
The file system creation runs on the file system manager node. If that node goes down, the mmcrfs
command may not succeed.
5. If the mmdelnode command was unsuccessful and you plan to permanently de-install GPFS from a
node, you should first remove the node from the cluster. If this is not done and you run the
mmdelnode command after the mmfs code is removed, the command will fail and display a message
similar to this example:
Verifying GPFS is stopped on all affected nodes ...
k145n05: ksh: /usr/lpp/mmfs/bin/mmremote: not found.
If this happens, power off the node and run the mmdelnode command again.
6. If you have successfully installed and are operating with the latest level of GPFS, but cannot run the
new functions available, it is probable that you have not issued the mmchfs -V full or mmchfs -V
compat command to change the version of the file system. This command must be issued for each of
your file systems.
In addition to mmchfs -V, you may need to run the mmmigratefs command. See the File system
format changes between versions of GPFS topic in the GPFS: Administration and Programming Reference.
Note: Before issuing the -V option (with full or compat), see the Migration, coexistence and compatibility
topic in the GPFS: Concepts, Planning, and Installation Guide. You must ensure that all nodes in the
cluster have been migrated to the latest level of GPFS code and that you have successfully run the
mmchconfig release=LATEST command.
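For example, to enable the new functions for a file system named fs1 (the name is only an illustration),
after all nodes have been migrated and mmchconfig release=LATEST has completed successfully:
mmchfs fs1 -V full
mmlsfs fs1 -V
The mmlsfs output shows the file system format version and can be used to confirm that the change took
effect.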
If the daemon failed while running the command, you will see message 6027-663. Follow the procedures
in “GPFS daemon went down” on page 50.
6027-663
Lost connection to file system daemon.
If the daemon was not running when you issued the command, you will see message 6027-665. Follow
the procedures in “GPFS daemon will not come up” on page 47.
6027-665
Failed to connect to file system daemon: errorString.
When GPFS commands are unsuccessful, the system may display information similar to these error
messages:
6027-1627
The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS
on the above nodes until the problem is resolved.
Note: There is no way to force GPFS nodes to relinquish all their local shares in order to check for
lost quotas. This can only be determined by running the mmcheckquota command immediately after
mounting the file system, and before any allocations are made. In this case, the value in doubt is the
amount lost.
To display the latest quota usage information, use the -e option on either the mmlsquota or the
mmrepquota commands. Remember that the mmquotaon and mmquotaoff commands do not enable
and disable quota management. These commands merely control enforcement of quota limits. Usage
continues to be counted and recorded in the quota files regardless of enforcement.
Reduce quota usage by deleting or compressing files or moving them out of the file system. Consider
increasing the quota limit.
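For example, to reconcile and then display the latest quota information for a file system named fs1 (the
name is only an illustration):
mmcheckquota fs1
mmrepquota -e fs1
The -e option collects updated usage data from the nodes rather than reporting possibly stale values
from the quota files.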
The SMB2 protocol is negotiated between a client and the server during the establishment of the SMB
connection, and it becomes active only if both the client and the server are SMB2 capable. If either side is
not SMB2 capable, the default SMB (version 1) protocol is used.
The SMB2 protocol does active metadata caching on the client redirector side, and it relies on Directory
Change Notification on the server to invalidate and refresh the client cache. However, GPFS on Windows
currently does not support Directory Change Notification. As a result, if SMB2 is used for serving a
GPFS file system, the SMB2 redirector cache on the client will not see any cache-invalidate operations if
the actual metadata is changed, either directly on the server or via another CIFS client. In such a case, the
SMB2 client will continue to see its cached version of the directory contents until the redirector cache
expires. Therefore, the use of SMB2 protocol for CIFS sharing of GPFS file systems can result in the CIFS
clients seeing an inconsistent view of the actual GPFS namespace.
A workaround is to disable the SMB2 protocol on the CIFS server (that is, the GPFS compute node). This
ensures that SMB2 is never negotiated for file transfer, even if a CIFS client is SMB2 capable.
To disable SMB2 on the GPFS compute node, follow the instructions under the “MORE INFORMATION”
section at the following URL:
http://support.microsoft.com/kb/974103
If you are using OpenSSH and experiencing an SSH connection delay (and if IPv6 is not supported in
your environment), try disabling IPv6 on your Windows nodes and removing or commenting out any
IPv6 addresses from the /etc/resolv.conf file.
You can also suspect a file system problem if a file system unmounts unexpectedly, or you receive an
error message indicating that file system activity can no longer continue due to an error, and the file
system is being unmounted to preserve its integrity. Record all error messages and log entries that you
receive relative to the problem, making sure that you look on all affected nodes for this data.
These are some of the errors encountered with GPFS file systems:
v “File system will not mount”
v “File system will not unmount” on page 70
v “File system forced unmount” on page 71
v “Unable to determine whether a file system is mounted” on page 73
v “Multiple file system manager failures” on page 74
v “Discrepancy between GPFS configuration data and the on-disk data for a file system” on page 75
v “Errors associated with storage pools, filesets and policies” on page 75
v “Failures using the mmbackup command” on page 81
v “Snapshot problems” on page 82
v “Failures using the mmpmon command” on page 85
v “NFS problems” on page 87
v “Problems working with Samba” on page 87
v “Data integrity” on page 88
v “Messages requeuing in AFM” on page 88
To start the automount daemon, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.
Note: If automountdir is mounted (as in step 2) and the mmcommon startAutomounter command is
not able to bring up the automount daemon, manually unmount the automountdir before issuing the
mmcommon startAutomounter command again.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see
something like this:
Mon Jun 25 11:33:03 2004: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Examine /var/log/messages for autofs error messages.
This is an example of what you might see if the remote file system name does not exist.
To start the automount daemon, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see
something like this:
Mon Jun 25 11:33:03 2007: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Since the autofs daemon logs status using syslogd, examine the syslogd log file for status information
from automountd. Here is an example of a failed automount request:
Jun 25 15:55:25 gpfsa1 automountd [9820 ] :mount of /gpfs/gpfs55:status 13
6. After you have established that GPFS has received a mount request from autofs (Step 4) and that
mount request failed (Step 5), issue a mount command for the GPFS file system and follow the
directions in “File system will not mount” on page 61.
7. If automount fails for a non-GPFS file system and you are using file /etc/auto.master, use file
/etc/auto_master instead. Add the entries from /etc/auto.master to /etc/auto_master and restart the
automount daemon.
These are some of the errors encountered when mounting remote file systems:
Remote file system I/O fails with the “Function not implemented” error message
when UID mapping is enabled
When user ID (UID) mapping in a multi-cluster environment is enabled, certain kinds of mapping
infrastructure configuration problems might result in I/O requests on a remote file system failing:
ls -l /fs1/testfile
ls: /fs1/testfile: Function not implemented
| For more information about configuring UID mapping, see the IBM white paper entitled UID Mapping for
| GPFS in a Multi-cluster Environment in the IBM Cluster information center (http://
| publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_uid/uid_gpfs.htm)
| or the IBM Knowledge Center (http://www.ibm.com/support/knowledgecenter/SSFKCN/
| uid_gpfs.html).
Remote file system will not mount due to differing GPFS cluster security
configurations
A mount command fails with a message similar to this:
Cannot mount gpfsxx2.ibm.com:gpfs66: Host is down.
The GPFS log on the cluster issuing the mount command should have entries similar to these:
There is more information in the log file /var/adm/ras/mmfs.log.latest
Mon Jun 25 16:39:27 2007: Waiting to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command: mount gpfsxx2.ibm.com:gpfs66 30291
Mon Jun 25 16:39:27 2007: The administrator of 199.13.68.12 gpfslx2 requires
secure connections. Contact the administrator to obtain the target clusters
key and register the key using "mmremotecluster update".
Mon Jun 25 16:39:27 2007: A node join was rejected. This could be due to
incompatible daemon versions, failure to find the node
in the configuration database, or no configuration manager found.
Mon Jun 25 16:39:27 2007: Failed to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command err 693: mount gpfsxx2.ibm.com:gpfs66 30291
The GPFS log file on the cluster that owns and serves the file system will have an entry indicating the
problem as well, similar to this:
To resolve this problem, contact the administrator of the cluster that owns and serves the file system to
obtain the key, and register the key using the mmremotecluster command.
The SHA digest field of the mmauth show and mmremotecluster commands may be used to determine if
there is a key mismatch, and on which cluster the key should be updated. For more information on the
SHA digest, see “The SHA digest” on page 35.
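For example, the keys in effect on the two clusters can be compared as follows (the output will differ in
your environment):
mmauth show all
mmremotecluster show all
Run mmauth show all on the cluster that owns the file system and mmremotecluster show all on the
cluster attempting the mount, and then compare the SHA digest values.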
To resolve the problem, correct the contact list and try the mount again.
The remote cluster name does not match the cluster name supplied by the
mmremotecluster command
A mount command fails with a message similar to this:
Cannot mount gpfslx2:gpfs66: Network is unreachable
In this example, the correct cluster name is gpfslx2.ibm.com and not gpfslx2. To display the full cluster
name, issue:
mmlscluster
To resolve the problem, use the mmremotecluster show command and verify that the cluster name
matches the remote cluster and the contact nodes are valid nodes in the remote cluster. Verify that GPFS
is active on the contact nodes in the remote cluster. Another way to resolve this problem is to change the
contact nodes using the mmremotecluster update command.
The NSD disk does not have an NSD server specified and the mounting cluster
does not have direct access to the disks
A file system mount fails with a message similar to this:
Failed to open gpfs66.
No such device
mount: Stale NFS file handle
Some file system data are inaccessible at this time.
Check error log for additional information.
Cannot mount gpfslx2.ibm.com:gpfs66: Stale NFS file handle
To resolve the problem, the cluster that owns and serves the file system must define one or more NSD
servers.
The mmchconfig cipherlist=AUTHONLY command must be run on both the cluster that owns and
controls the file system, and the cluster that is attempting to mount the file system.
See the GPFS: Administration and Programming Reference for detailed information about the mmauth
command and the mmremotefs command.
Mount failure due to client nodes joining before NSD servers are
online
If a client node joins the GPFS cluster and attempts file system access prior to the file system's NSD
servers being active, the mount fails. This is especially true when automount is used. This situation can
occur during cluster startup, or any time that an NSD server is brought online with client nodes already
active and attempting to mount a file system served by the NSD server.
Two mmchconfig command options are used to specify the amount of time for GPFS mount requests to
wait for an NSD server to join the cluster:
nsdServerWaitTimeForMount
Specifies the number of seconds to wait for an NSD server to come up at GPFS cluster startup
time, after a quorum loss, or after an NSD server failure.
Valid values are between 0 and 1200 seconds. The default is 300. The interval for checking is 10
seconds. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no
effect.
nsdServerWaitTimeWindowOnMount
Specifies a time window to determine if quorum is to be considered recently formed.
Valid values are between 1 and 1200 seconds. The default is 600. If nsdServerWaitTimeForMount
is 0, nsdServerWaitTimeWindowOnMount has no effect.
The GPFS daemon need not be restarted in order to change these values. The scope of these two
operands is the GPFS cluster. The -N flag can be used to set different values on different nodes. In this
case, the settings on the file system manager node take precedence over the settings of nodes trying to
access the file system.
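For example, to lengthen both wait intervals for the whole cluster (the values shown are only
illustrations):
mmchconfig nsdServerWaitTimeForMount=600,nsdServerWaitTimeWindowOnMount=600
The -N flag can be added to apply different values to a subset of nodes.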
When a node rejoins the cluster (after it was expelled, experienced a communications problem, lost
quorum, or other reason for which it dropped connection and rejoined), that node resets all the failure
times that it knows about. Therefore, when a node rejoins it sees the NSD servers as never having failed.
From the node's point of view, it has rejoined the cluster and old failure information is no longer
relevant.
GPFS checks the cluster formation criteria first. If that check falls outside the window, GPFS then checks
for NSD server fail times being within the window.
the file system will not unmount until all processes are finished accessing it. If mmfsd is up, the
processes accessing the file system can be determined. See “The lsof command” on page 24. These
processes can be killed with the command:
lsof filesystem | grep -v COMMAND | awk '{print $2}' | xargs kill -9
If mmfsd is not operational, the lsof command will not be able to determine which processes are still
accessing the file system.
Note:
a. See “File system forced unmount” for the consequences of doing this.
b. Before forcing the unmount of the file system, issue the lsof command and close any files that are
open.
c. On Linux, you might encounter a situation where a GPFS file system cannot be unmounted, even
if you issue the mmumount -f command. In this case, you must reboot the node to clear the
condition. You can also try the system umount command before you reboot. For example:
| umount -f /fileSystem
4. If a file system that is mounted by a remote cluster needs to be unmounted, you can force the
unmount by issuing the command:
| mmumount fileSystem -f -C RemoteClusterName
If your file system has been forced to unmount, follow these steps:
1. With the failure of a single disk, if you have not specified multiple failure groups and replication of
metadata, GPFS will not be able to continue because it cannot write logs or other critical metadata. If
you have specified multiple failure groups and replication of metadata, the failure of multiple disks in
different failure groups will put you in the same position. In either of these situations, GPFS will
Once it is decided how many replicas to create, GPFS picks disks to hold the replicas, so that all replicas
will be in different failure groups, if possible, to reduce the risk of multiple failures. In picking replica
locations, the current state of the disks is taken into account. Stopped or suspended disks are avoided.
Similarly, when a failed disk is brought back online, GPFS may modify the subset to rebalance the file
system descriptors across the failure groups. The subset can be found by issuing the mmlsdisk -L
command.
GPFS requires a majority of the replicas on the subset of disks to remain available to sustain file system
operations:
v If there are at least five different failure groups, GPFS will be able to tolerate a loss of two of the five
groups. If disks out of three different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are at least three different failure groups, GPFS will be able to tolerate a loss of one of the three
groups. If disks out of two different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are fewer than three failure groups, a loss of one failure group may make the descriptor
inaccessible.
In certain failure situations, GPFS cannot determine whether the file system in question is mounted or
not, and so cannot perform the requested command. In such cases, message 6027-1996 (Command was
unable to determine whether file system fileSystem is mounted) is issued.
If you encounter this message, perform problem determination, resolve the problem, and reissue the
command. If you cannot determine or resolve the problem, you may be able to successfully run the
command by first shutting down the GPFS daemon on all nodes of the cluster (using mmshutdown -a),
thus ensuring that the file system is not mounted.
When the file system manager node fails, another file system manager is appointed in a manner that is
not visible to applications except for the time required to switch over.
There are situations where it may be impossible to appoint a file system manager. Such situations involve
the failure of paths to disk resources from many, if not all, nodes. In this event, the cluster manager
nominates several host names to successively try to become the file system manager. If none succeed, the
cluster manager unmounts the file system everywhere. See “NSD and underlying disk subsystem
failures” on page 91.
The required action here is to address the underlying condition that caused the forced unmounts and
then remount the file system. In most cases, this means correcting the path to the disks required by GPFS.
If NSD disk servers are being used, the most common failure is the loss of access through the
communications network. If SAN access is being used to all disks, the most common failure is the loss of
connectivity through the SAN.
You issue a disk command (for example, mmadddisk, mmdeldisk, or mmrpldisk) and receive the
message:
6027-1290
GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data
for the file system. Issue the command:
mmcommon recoverfs fileSystem
Before a disk is added to or removed from a file system, a check is made that the GPFS configuration
data for the file system is in agreement with the on-disk data for the file system. The above message is
issued if this check was not successful. This may occur if an earlier GPFS disk command was unable to
complete successfully for some reason. Issue the mmcommon recoverfs command to bring the GPFS
configuration data into agreement with the on-disk data for the file system.
If running mmcommon recoverfs does not resolve the problem, follow the procedures in “Information to
collect before contacting the IBM Support Center” on page 115, and then contact the IBM Support Center.
When you are sure that your setup is correct, see if your problem falls into one of these categories:
v “A NO_SPACE error occurs when a file system is known to have adequate free space”
v “Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'” on page 77
v “Policies - usage errors” on page 77
v “Errors encountered with policies” on page 78
v “Filesets - usage errors” on page 79
v “Errors encountered with filesets” on page 79
v “Storage pools - usage errors” on page 80
v “Errors encountered with storage pools” on page 81
This output indicates that the file system is only 51% full.
4. To query the storage usage for an individual storage pool, the user must issue the mmdf command.
mmdf fs1
Inode Information
------------------
In this case, the user sees that storage pool sp1 has 0% free space left and that is the reason for the
NO_SPACE error message.
5. To resolve the problem, the user must change the placement policy file to avoid putting data in a full
storage pool, delete some files in storage pool sp1, or add more space to the storage pool.
Suppose that 2 GB of data from a 5 GB file named abc, that is supposed to be in the system storage pool,
are actually located in another pool. This 2 GB of data is said to be 'ill-placed'. Also, suppose that 3 GB of
this file are in the system storage pool, and no other file is assigned to the system storage pool.
If you run the mmapplypolicy command to schedule file abc to be moved from the system storage pool
to a storage pool named YYY, the mmapplypolicy command does the following:
1. Starts with the 'Current pool utilization' for the system storage pool, which is 3 GB.
2. Subtracts 5 GB, the size of file abc.
3. Arrives at a 'Predicted Pool Utilization' of negative 2 GB.
The mmapplypolicy command does not know how much of an 'ill-placed' file is currently in the wrong
storage pool and how much is in the correct storage pool.
When there are ill-placed files in the system storage pool, the 'Predicted Pool Utilization' can be any
positive or negative value. The positive value can be capped by the LIMIT clause of the MIGRATE rule.
The 'Current Pool Utilizations' should always be between 0% and 100%.
Note: I/O errors while migrating files indicate failing storage devices and must be addressed like any
other I/O error. The same is true for any file system error or panic encountered while migrating files.
The mmlsfileset command identifies filesets in this state by displaying a status of 'Deleting'.
5. If you unlink a fileset that has other filesets linked below it, any filesets linked to it (that is, child
filesets) become inaccessible. The child filesets remain linked to the parent and will become accessible
again when the parent is re-linked.
6. By default, the mmdelfileset command will not delete a fileset that is not empty.
To empty a fileset, first unlink all its immediate child filesets to remove their junctions from the
fileset to be deleted. Then, while the fileset itself is still linked, use rm -rf or a similar command to
remove the rest of the contents of the fileset. Now the fileset may be unlinked and deleted (see the
example command sequence after this list).
Alternatively, the fileset to be deleted can be unlinked first and then mmdelfileset can be used with
the -f (force) option. This will unlink its child filesets, then destroy the files and directories contained
in the fileset.
7. When deleting a small dependent fileset, it may be faster to use the rm -rf command instead of the
mmdelfileset command with the -f option.
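The following command sequence illustrates the procedure in item 6 for a file system named fs1 with a
fileset fset1 linked at /gpfs/fs1/fset1 and one child fileset child1 (all of these names are only
illustrations):
mmunlinkfileset fs1 child1
rm -rf /gpfs/fs1/fset1/*
mmunlinkfileset fs1 fset1
mmdelfileset fs1 fset1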
When the mmafmctl Device getstate command displays a NeedsResync target/fileset state, inconsistencies
exist between the home and cache. To ensure that the cached data is synchronized with the home and the
fileset is returned to Active state, either the file system must be unmounted and remounted or the fileset
must be unlinked and relinked. Once this is done, the next update to fileset data will trigger an automatic
synchronization of data from the cache to the home.
Snapshot problems
Use the mmlssnapshot command as a general hint for snapshot-related problems, to find out what
snapshots exist, and what state they are in. Use the mmsnapdir command to find the snapshot directory
name used to permit access.
The mmlssnapshot command displays the list of all snapshots of a file system. This command lists the
snapshot name, some attributes of the snapshot, as well as the snapshot's status. The mmlssnapshot
command does not require the file system to be mounted.
An example of a snapshot restriction error is exceeding the maximum number of snapshots allowed at
one time. For simple errors of these types, you can determine the source of the error by reading the error
message or by reading the description of the command. You can also run the mmlssnapshot command to
see the complete list of existing snapshots.
Examples of incorrect snapshot name errors are trying to delete a snapshot that does not exist or trying to
create a snapshot using the same name as an existing snapshot. The rules for naming global and fileset
snapshots are designed to minimize conflicts between the file system administrator and the fileset
owners. These rules can result in errors when fileset snapshot names are duplicated across different
filesets or when the snapshot command -j option (specifying a qualifying fileset name) is provided or
omitted incorrectly. To resolve name problems review the mmlssnapshot output with careful attention to
the Fileset column. You can also specify the -s or -j options of the mmlssnapshot command to limit the
output. For snapshot deletion, the -j option must exactly match the Fileset column.
For more information about snapshot naming conventions, see the mmcrsnapshot command in the GPFS:
Administration and Programming Reference.
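For example, to review the existing snapshots for a file system named fs1, and to limit the output to
snapshots of a fileset named fset1 (the names are only illustrations):
mmlssnapshot fs1
mmlssnapshot fs1 -j fset1
The Fileset column in the output distinguishes global snapshots from fileset snapshots.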
the user should fix the underlying problem and reissue the mmrestorefs command. If the user cannot fix
the underlying problem, the following steps can be taken to complete the restore command and recover
the user data:
1. If there are other snapshots available, the user can restore a different snapshot.
2. If the error code in the message is ENOSPC, there are not enough free blocks in the file system to
restore the selected snapshot. The user may add space to the file system by adding a new disk. As an
alternative, the user may delete a different snapshot from the file system to free some existing space.
The user is not allowed to delete the snapshot that is being restored. Once there is additional free
space, reissue the mmrestorefs command.
3. The mmrestorefs command can be forced to continue, even if it encounters an error, by using the -c
option. The command will restore as many files as possible, but may leave the file system in an
inconsistent state. Some files may not have been restored or may no longer be accessible. The user
should run mmfsck after the restore completes to make the file system consistent again.
4. If the above steps fail, the file system may be mounted in restricted mode, allowing the user to copy
as many files as possible into a newly created file system, or one created from an offline backup of the
data. See “Restricted mode mount” on page 23.
Note: In both steps 3 and 4, user data is lost. These steps are provided to allow as much user data as
possible to be recovered.
| the file system that contains the snapshot to restore should be mounted, and then the fileset of the
| snapshot should be linked.
It is also possible to get a name conflict as a result of issuing the mmrestorefs command. Since
mmsnapdir allows changing the name of the dynamically-generated snapshot directory, it is possible that
The fix is similar to the one mentioned before. Perform one of these two steps:
1. After the mmrestorefs command completes, rename the conflicting file or directory that was restored
in the root directory.
2. Run the mmsnapdir command to select a different name for the dynamically-generated snapshot
directory.
Finally, the mmsnapdir -a option enables a dynamically-generated snapshot directory in every directory,
not just the file system root. This allows each user quick access to snapshots of their own files by going
into .snapshots in their home directory or any other of their directories.
Unlike .snapshots in the file system root, .snapshots in other directories is invisible, that is, an ls -a
command will not list .snapshots. This is intentional because recursive file system utilities such as find,
du or ls -R would otherwise either fail or produce incorrect or undesirable results. To access snapshots,
the user must explicitly specify the name of the snapshot directory, for example: ls ~/.snapshots. If there
is a name conflict (that is, a normal file or directory named .snapshots already exists in the user's home
directory), the user must rename the existing file or directory.
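For example, to enable per-directory snapshot directories for a file system named fs1 and then access the
snapshots of files in the current user's home directory (the name is only an illustration):
mmsnapdir fs1 -a
ls ~/.snapshots
Because .snapshots is invisible in directories other than the file system root, it must be named explicitly
as shown.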
The inode numbers that are used for and within these special .snapshots directories are constructed
dynamically and do not follow the standard rules. These inode numbers are visible to applications
through standard commands, such as stat, readdir, or ls. The inode numbers reported for these
directories can also be reported differently on different operating systems. Applications should not expect
consistent numbering for such inodes.
The mmpmon command is thoroughly documented in the Monitoring GPFS I/O performance with the
mmpmon command topic in the GPFS: Advanced Administration Guide, and the Commands topic in the
GPFS: Administration and Programming Reference. Before proceeding with mmpmon problem
determination, review all of this material to ensure that you are using mmpmon correctly.
Note: Do not use the perfmon trace class of the GPFS trace to diagnose mmpmon problems. This trace
event does not provide the necessary data.
For details on how GPFS and NFS interact, see the NFS and GPFS topic in the GPFS: Administration and
Programming Reference.
These are some of the problems encountered when GPFS interacts with NFS:
v “NFS client with stale inode data”
v “NFS V4 problems”
Turning off NFS caching will result in extra file system operations to GPFS and will negatively affect its
performance.
The clocks of all nodes in the GPFS cluster must be synchronized. If this is not done, NFS access to the
data, as well as other GPFS file system operations, may be disrupted. NFS relies on metadata timestamps
to validate the local operating system cache. If the same directory is either NFS-exported from more than
one node, or is accessed with both the NFS and GPFS mount point, it is critical that clocks on all nodes
that access the file system (GPFS nodes and NFS clients) are constantly synchronized using appropriate
software (for example, NTP). Failure to do so may result in stale information seen on the NFS clients.
NFS V4 problems
Before analyzing an NFS V4 problem, review this documentation to determine if you are using NFS V4
ACLs and GPFS correctly:
1. The NFS Version 4 Protocol paper and other information found at: www.nfsv4.org.
2. The Managing GPFS access control lists and NFS export topic in the GPFS: Administration and
Programming Reference.
3. The GPFS exceptions and limitations to NFS V4 ACLs topic in the GPFS: Administration and Programming
Reference.
The commands mmdelacl and mmputacl can be used to revert an NFS V4 ACL to a traditional ACL. Use
the mmdelacl command to remove the ACL, leaving access controlled entirely by the permission bits in
the mode. Then use the chmod command to modify the permissions, or the mmputacl and mmeditacl
commands to assign a new ACL.
For files, the mmputacl and mmeditacl commands can be used at any time (without first issuing the
mmdelacl command) to assign any type of ACL. The command mmeditacl -k posix provides a
translation of the current ACL into traditional POSIX form and can be used to more easily create an ACL
to edit, instead of having to create one from scratch.
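For example, to remove the NFS V4 ACL from a file and then work with the ACL in traditional POSIX
form (the path is only an illustration):
mmdelacl /gpfs/fs1/project/file1
mmeditacl -k posix /gpfs/fs1/project/file1
After mmdelacl, access is controlled entirely by the permission bits, which can be adjusted with chmod if
no further ACL editing is needed.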
Data integrity
GPFS takes extraordinary care to maintain the integrity of customer data. However, certain hardware
failures, or in extremely unusual circumstances, the occurrence of a programming error can cause the loss
of data in a file system.
GPFS performs extensive checking to validate metadata and ceases using the file system if metadata
becomes inconsistent. This can appear in two ways:
1. The file system will be unmounted and applications will begin seeing ESTALE return codes to file
operations.
2. Error log entries indicating an MMFS_SYSTEM_UNMOUNT and a corruption error are generated.
If actual disk data corruption occurs, this error will appear on each node in succession. Before proceeding
with the following steps, follow the procedures in “Information to collect before contacting the IBM
Support Center” on page 115, and then contact the IBM Support Center.
1. Examine the error logs on the NSD servers for any indication of a disk error that has been reported.
2. Take appropriate disk problem determination and repair actions prior to continuing.
3. After completing any required disk repair actions, run the offline version of the mmfsck command on
the file system.
4. If your error log or disk analysis tool indicates that specific disk blocks are in error, use the mmfileid
command to determine which files are located on damaged areas of the disk, and then restore these
files. See “The mmfileid command” on page 33 for more information.
5. If data corruption errors occur in only one node, it is probable that memory structures within the
node have been corrupted. In this case, the file system is probably good but a program error exists in
GPFS or another authorized program with access to GPFS data structures.
Follow the directions in “Data integrity” and then reboot the node. This should clear the problem. If
the problem repeats on one node without affecting other nodes, check the programming specifications
code levels to determine that they are current and compatible and that no hardware errors were
reported. Refer to the GPFS: Concepts, Planning, and Installation Guide for correct software levels.
Running the mmfsadm dump afm all command on the gateway node shows the queued messages.
Requeued messages show in the dumps similar to the following example:
c12c4apv13.gpfs.net: Normal Queue: (listed by execution order) (state: Active)
c12c4apv13.gpfs.net: Write [612457.552962] requeued file3 (43 @ 293) chunks 0 bytes 0 0
NSDs, for example, might be defined on top of Fibre Channel SAN connected disks. This information
provides detail on the creation, use, and failure of NSDs and their underlying disk technologies.
These are some of the errors encountered with GPFS disks and NSDs:
v “NSD and underlying disk subsystem failures”
v “GPFS has declared NSDs built on top of AIX logical volumes as down” on page 100
v “Disk accessing commands fail to complete due to problems with some non-IBM disks” on page 102
v “Persistent Reserve errors” on page 102
v “GPFS is not using the underlying multipath device” on page 105
Note: If you are reinstalling the operating system on one node and erasing all partitions from the system,
GPFS descriptors will be removed from any NSD this node can access locally. The results of this action
might require recreating the file system and restoring from backup. If you experience this problem, do
not unmount the file system on any node that is currently mounting the file system. Contact the IBM
Support Center immediately to see if the problem can be corrected.
For disks that are SAN-attached to all nodes in the cluster, device=DiskName should refer to the disk
device name in /dev on the node where the mmcrnsd command is issued. If a server list is specified,
device=DiskName must refer to the name of the disk on the first server node. The same disk can have
different local names on different nodes.
When the mmcrnsd command encounters an error condition, one of these messages is displayed:
6027-2108
Error found while processing stanza
or
6027-1636
Error found while checking disk descriptor descriptor
Usually, this message is preceded by one or more messages describing the error more specifically.
or
6027-1661
Failed while processing disk descriptor descriptor on node nodeName.
One of these errors can occur if an NSD server node does not have read and write access to the disk. The
NSD server node needs to write an NSD volume ID to the raw disk. If an additional NSD server node is
specified, that NSD server node will scan its disks to find this NSD volume ID string. If the disk is
SAN-attached to all nodes in the cluster, the NSD volume ID is written to the disk by the node on which
the mmcrnsd command is running.
If you need to find out the local device names for these disks, you could use the -m option on the
mmlsnsd command. For example, issuing:
mmlsnsd -m
To find the nodes to which disk t65nsd4b is attached and the corresponding local devices for that disk,
issue:
mmlsnsd -d t65nsd4b -M
To display extended information about a node's view of its NSDs, the mmlsnsd -X command can be
used:
mmlsnsd -X -d "hd3n97;sdfnsd;hd5n98"
Note: The -m, -M and -X options of the mmlsnsd command can be very time consuming, especially on
large clusters. Use these options judiciously.
If for some reason the second step fails, for example because the disk is damaged and cannot be written
to, the mmdelnsd command issues a message describing the error and then another message stating the
exact command to issue to complete the deletion of the NSD. If these instructions are not successfully
completed, a subsequent mmcrnsd command can fail with
6027-1662
Disk device deviceName refers to an existing NSD name.
This error message indicates that the disk is either an existing NSD, or that the disk was previously an
NSD that had been removed from the GPFS cluster using the mmdelnsd -p command, and had not been
marked as available.
If the GPFS data structures are not removed from the disk, it might be unusable for other purposes. For
example, if you are trying to create an AIX volume group on the disk, the mkvg command might fail
with messages similar to:
0516-1339 /usr/sbin/mkvg: Physical volume contains some 3rd party volume group.
0516-1397 /usr/sbin/mkvg: The physical volume hdisk5, will not be added to the volume group.
0516-862 /usr/sbin/mkvg: Unable to create volume group.
The easiest way to recover such a disk is to temporarily define it as an NSD again (using the -v no
option) and then delete the just-created NSD. For example:
mmcrnsd -F filename -v no
mmdelnsd -F filename
GPFS will stop using a disk that is determined to have failed. This event is marked as MMFS_DISKFAIL
in an error log entry (see “The operating system error log facility” on page 2). The state of a disk can be
checked by issuing the mmlsdisk command.
The consequences of stopping disk usage depend on what is stored on the disk:
v Certain data blocks may be unavailable because the data residing on a stopped disk is not replicated.
v Certain data blocks may be unavailable because the controlling metadata resides on a stopped disk.
v In conjunction with other disks that have failed, all copies of critical data structures may be unavailable
resulting in the unavailability of the entire file system.
The disk will remain unavailable until its status is explicitly changed through the mmchdisk command.
After that command is issued, any replicas that exist on the failed disk are updated before the disk is
used.
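For example, to check the disk state and then bring a stopped disk back up, a sequence like the following could be used (the file system and NSD names are examples only):
mmlsdisk fs1
mmchdisk fs1 start -d gpfs1nsd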
On AIX, consult “The operating system error log facility” on page 2 for hardware configuration error log
entries.
Accessible disk devices will generate error log entries similar to this example for an SSA device:
--------------------------------------------------------------------------
LABEL: SSA_DEVICE_ERROR
IDENTIFIER: FE9E9357
Description
DISK OPERATION ERROR
Probable Causes
DASD DEVICE
Failure Causes
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ERROR CODE
2310 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
Description
DISK FAILURE
Probable Causes
STORAGE SUBSYSTEM
DISK
Failure Causes
STORAGE SUBSYSTEM
DISK
Recommended Actions
CHECK POWER
RUN DIAGNOSTICS AGAINST THE FAILING DEVICE
Detail Data
EVENT CODE
1027755
VOLUME
fs3
RETURN CODE
19
PHYSICAL VOLUME
vp31n05
-----------------------------------------------------------------
| GPFS offers a method of protection called replication, which overcomes disk failure at the expense of
| additional disk space. GPFS allows replication of data and metadata. This means that three instances of
| data, metadata, or both can be automatically created and maintained for any file in a GPFS file system. If
| one instance becomes unavailable due to disk failure, another instance is used instead. You can set
| different replication specifications for each file, or apply default settings specified at file system creation.
| Refer to the File system replication parameters topic in the GPFS: Concepts, Planning, and Installation Guide.
Note: If there are any GPFS file systems with pending I/O to the down disk, the I/O will time out if
the system administrator does not stop it.
To see if there are any threads that have been waiting a long time for I/O to complete, on all nodes
issue:
mmfsadm dump waiters 10 | grep "I/O completion"
3. The next step is irreversible! Do not run this command unless data and metadata have been replicated.
This command scans file system metadata for disk addresses belonging to the disk in question, then
replaces them with a special “broken disk address” value, which may take a while.
CAUTION:
Be extremely careful with using the -p option of mmdeldisk, because by design it destroys
references to data blocks, making affected blocks unavailable. This is a last-resort tool, to be used
when data loss may have already occurred, to salvage the remaining data, which means it cannot
take any precautions. If you are not absolutely certain about the state of the file system and the
impact of running this command, do not attempt to run it without first contacting the IBM Support
Center.
mmdeldisk fs1 gpfs1n12 -p
4. Invoke the mmfileid command with the :BROKEN operand:
mmfileid :BROKEN
For more information, see “The mmfileid command” on page 33.
5. After the disk is properly repaired and available for use, you can add it back to the file system.
You can rebalance the file system at the same time by issuing:
mmadddisk fs1 gpfs12nsd -r
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
2. To re-replicate data that only has single copy, issue:
mmrestripefs fs1 -r
Optionally, use the -b flag instead of the -r flag to rebalance across all disks.
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
3. Optionally, check the file system for metadata inconsistencies by issuing the offline version of
mmfsck:
mmfsck fs1
Strict replication
If data or metadata replication is enabled, and the status of an existing disk changes so that the disk is no
longer available for block allocation (if strict replication is enforced), you may receive an errno of
ENOSPC when you create or append data to an existing file. A disk becomes unavailable for new block
allocation if it is being deleted, replaced, or it has been suspended. If you need to delete, replace, or
suspend a disk, and you need to write new data while the disk is offline, you can disable strict
replication by issuing the mmchfs -K no command before you perform the disk action. However, data
written while replication is disabled will not be replicated properly. Therefore, after you perform the disk
action, you must re-enable strict replication by issuing the mmchfs -K command with the original value
of the -K option (always or whenpossible) and then run the mmrestripefs -r command. To determine if a
disk has strict replication enforced, issue the mmlsfs -K command.
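As an illustration only, assuming a file system named fs1 whose original setting is whenpossible, the sequence might look like this:
mmlsfs fs1 -K
mmchfs fs1 -K no
   (delete, replace, or suspend the disk, and perform the writes that are needed)
mmchfs fs1 -K whenpossible
mmrestripefs fs1 -r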
Note: A disk in a down state that has not been explicitly suspended is still available for block allocation,
and thus a spontaneous disk failure will not result in application I/O requests failing with ENOSPC.
While new blocks will be allocated on such a disk, nothing will actually be written to the disk until its
availability changes to up following an mmchdisk start command. Missing replica updates that took
place while the disk was down will be performed when mmchdisk start runs.
No replication
When there is no replication, the system metadata has been lost and the file system is basically
irrecoverable. You may be able to salvage some of the user data, but it will take work and time. A forced
unmount of the file system will probably already have occurred. If not, it probably will very soon if you
try to do any recovery work. You can manually force the unmount yourself:
1. Mount the file system in read-only mode (see “Read-only mode mount” on page 23). This will bypass
recovery errors and let you read whatever you can find. Directories may be lost and give errors, and
parts of files will be missing. Get what you can now, for all will soon be gone. On a single node,
issue:
mount -o ro /dev/fs1
2. If you read a file in block-size chunks and get an EIO return code, that block of the file has been lost.
The rest of the file may have useful data to recover, or it can be erased. To save the file system
parameters for re-creation of the file system, issue:
mmlsfs fs1 > fs1.saveparms
Error numbers specific to GPFS application calls when disk failure occurs
When a disk failure has occurred, GPFS may report these error numbers in the operating system error
log, or return them to an application:
EOFFLINE = 208, Operation failed because a disk is offline
This error is most commonly returned when an attempt to open a disk fails. Since GPFS will
attempt to continue operation with failed disks, this will be returned when the disk is first
needed to complete a command or application request. If this return code occurs, check your disk
for stopped states, and check to determine if the network path exists.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This error usually occurs when a large number of disks are unavailable or when there has been a
major network failure. Run the mmlsdisk command to determine whether disks have failed. If
disks have failed, check the operating system error log on all nodes for indications of errors. Take
corrective action by issuing the mmchdisk command.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
This is the default behavior, and can be changed with the useNSDserver file system mount option. See
the NSD server considerations topic in the GPFS: Concepts, Planning, and Installation Guide.
Note: In general, after fixing the path to a disk, you must run the mmnsddiscover command on the
server that lost the path to the NSD. (Until the mmnsddiscover command is run, the reconnected node
will see its local disks and start using them by itself, but it will not act as the NSD server.)
After that, you must run the command on all client nodes that need to access the NSD on that server; or
you can achieve the same effect with a single mmnsddiscover invocation if you utilize the -N option to
specify a node list that contains all the NSD servers and clients that need to rediscover paths.
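For example, after repairing the path on the NSD server, a sketch of the sequence might be the following (the NSD and node names are examples only):
mmnsddiscover -d gpfs1nsd
mmnsddiscover -d gpfs1nsd -N nsdserver1,client1,client2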
If both your data and metadata have been replicated, implement these recovery actions:
1. Unmount the file system:
mmumount fs1 -a
2. Delete the disk from the file system:
mmdeldisk fs1 gpfs10nsd -c
3. If you are replacing the disk, add the new disk to the file system:
mmadddisk fs1 gpfs11nsd
4. Then restripe the file system:
mmrestripefs fs1 -b
Note: Ensure there is sufficient space elsewhere in your file system for the data to be stored by using
the mmdf command.
GPFS has declared NSDs built on top of AIX logical volumes as down
Earlier releases of GPFS allowed AIX logical volumes to be used in GPFS file systems. Using AIX logical
volumes in GPFS file systems is now discouraged as they are limited with regard to their clustering
ability and cross platform support.
Existing file systems using AIX logical volumes are however still supported, and this information might
be of use when working with those configurations.
which will display any underlying physical device present on this node that is backing the NSD. If the
underlying device is a logical volume, perform a mapping from the logical volume to the volume group.
For example, to verify the volume group gpfs1vg on the five nodes in the GPFS cluster, for each node in
the cluster issue:
lspv | grep gpfs1vg
Here the output shows that on each of the five nodes the volume group gpfs1vg is the same physical
disk (has the same pvid). The hdisk numbers vary, but the fact that they may be called different hdisk
names on different nodes has been accounted for in the GPFS product. This is an example of a properly
defined volume group.
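For illustration only (the pvid shown is a made-up example), the line of interest on two of the nodes might look like this, with different hdisk names but the same pvid:
nodeA:  hdisk7   00c0f2a3b1d45e02   gpfs1vg   active
nodeB:  hdisk12  00c0f2a3b1d45e02   gpfs1vg   active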
If any of the pvids were different for the same volume group, this would indicate that the same volume
group name has been used when creating volume groups on different physical volumes. This will not
work for GPFS. A volume group name can be used only for the same physical volume shared among
nodes in a cluster. For more information, refer to the IBM pSeries and AIX Information Center
(http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp) and search for operating system and
device management.
For some non-IBM disks, when many varyonvg -u commands are issued in parallel, some of the AIX
varyonvg -u invocations do not complete, causing the disk command to hang.
This situation is recognized by the GPFS disk command not completing after a long period of time, and
the persistence of the varyonvg processes as shown by the output of the ps -ef command on some of the
nodes of the cluster. In these cases, kill the varyonvg processes that were issued by the GPFS disk
command on the nodes of the cluster. This allows the GPFS disk command to complete. Before mounting
the affected file system on any node where a varyonvg process was killed, issue the varyonvg -u
command (varyonvg -u vgname) on the node to make the disk available to GPFS. Do this on each of the
nodes in question, one by one, until all of the GPFS volume groups are varied online.
GPFS allows file systems to have a mix of PR and non-PR disks. In this configuration, GPFS will fence PR
disks for node failures and recovery, and non-PR disks will use disk leasing. If all of the disks are PR
disks, disk leasing is not used, so recovery times improve.
GPFS uses the mmchconfig command to enable PR. Issuing this command with the appropriate
usePersistentReserve option configures disks automatically. If this command fails, the most likely cause
is either a hardware or device driver problem. Other PR-related errors will probably be seen as file
system unmounts that are related to disk reservation problems. This type of problem should be debugged
with existing trace tools.
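For example, to enable Persistent Reserve for the cluster and then confirm the setting, you might issue the following commands (GPFS typically must be stopped on all nodes when this attribute is changed):
mmchconfig usePersistentReserve=yes
mmlsconfig usePersistentReserve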
Persistent Reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and
command options. These PR commands and command options give SCSI initiators the ability to establish,
preempt, query, and reset a reservation policy with a specified target disk. The functions provided by PR
commands are a superset of current reserve and release mechanisms. These functions are not compatible
with legacy reserve and release mechanisms. Target disks can only support reservations from either the
legacy mechanisms or the current mechanisms.
Note: Attempting to mix Persistent Reserve commands with legacy reserve and release commands will
result in the target disk returning a reservation conflict error.
Persistent Reserve establishes an interface through a reserve_policy attribute for SCSI disks. You can
optionally use this attribute to specify the type of reservation that the device driver will establish before
accessing data on the disk. For devices that do not support the reserve_policy attribute, the drivers will use
the value of the reserve_lock attribute to determine the type of reservation to use for the disk. GPFS
supports four values for the reserve_policy attribute: no_reserve, single_path, PR_exclusive, and PR_shared.
Persistent Reserve support affects both the parallel (scdisk) and SCSI-3 (scsidisk) disk device drivers and
configuration methods. When a device is opened (for example, when the varyonvg command opens the
underlying hdisks), the device driver checks the ODM for reserve_policy and PR_key_value and then opens
the device appropriately. For PR, each host attached to the shared disk must use unique registration key
values for reserve_policy and PR_key_value. On AIX, you can display the values assigned to reserve_policy
and PR_key_value by issuing:
lsattr -El hdiskx -a reserve_policy,PR_key_value
If needed, use the AIX chdev command to set reserve_policy and PR_key_value.
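For example, to manually return a disk that is no longer under GPFS control to a no-reserve policy, a command of the following form could be used (hdisk5 is an example name; under normal circumstances, let GPFS manage these attributes as described in the note below):
chdev -l hdisk5 -a reserve_policy=no_reserve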
Note: GPFS manages reserve_policy and PR_key_value using reserve_policy=PR_shared when Persistent
Reserve support is enabled and reserve_policy=no_reserve when Persistent Reserve is disabled.
Notes:
1. To view the keys that are currently registered on a disk, issue the following command from a node
that has access to the disk:
/usr/lpp/mmfs/bin/tsprreadkeys hdiskx
2. To check the AIX ODM status of a single disk on a node, issue the following command from a node
that has access to the disk:
lsattr -El hdiskx -a reserve_policy,PR_key_value
Before trying to clear the PR reservation, use the following instructions to verify that the disk is really
intended for GPFS use. Note that in this example, the device name is specified without a prefix (/dev/sdp
is specified as sdp).
1. Display all the registration key values on the disk:
/usr/lpp/mmfs/bin/tsprreadkeys sdp
If the registered key values all start with 0x00006d, which indicates that the PR registration was issued
by GPFS, proceed to the next step to verify the SCSI-3 PR reservation type. Otherwise, contact your
system administrator for information about clearing the disk state.
2. Display the reservation type on the disk:
/usr/lpp/mmfs/bin/tsprreadres sdp
If the output does not indicate a PR reservation with this type, contact your system administrator for
information about clearing the disk state.
The mmlsdisk command output might show unexpected results for multipath I/O devices. For example
if you issue this command:
mmlsdisk dmfs2 -M
The mmlsdisk output shows that I/O for NSD m0001 is being performed on disk /dev/sdb, but it should
show that I/O is being performed on the device-mapper multipath (DMM) /dev/dm-30. Disk /dev/sdb is
one of eight paths of the DMM /dev/dm-30 as shown from the multipath command.
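If you want to confirm which underlying paths make up a device-mapper multipath device, you can list them with the multipath command (the output format varies by distribution and multipath version); for example:
multipath -ll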
To change the NSD device type to a known device type, create a file that contains the NSD name and
device type pair (one per line) and issue this command:
mmchconfig updateNsdType=/tmp/filename
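As a sketch, assuming the NSD from the example above is named m0001 and should be treated as a device-mapper multipath (dmm) device, the file and command might look like this (the NSD name, file name, and device type shown are examples; use the values that apply to your configuration):
echo "m0001 dmm" > /tmp/nsdtypes
mmchconfig updateNsdType=/tmp/nsdtypes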
| When mmapplypolicy is invoked to perform a key rewrap, the command may issue messages like the
| following:
Existing file systems using AIX logical volumes are, however, still supported. This information might be
of use when working with those configurations.
If an error report contains a reference to a logical volume pertaining to GPFS, you can use the lslv -l
command to list the physical volume name. For example, to find the physical disk associated with
logical volume gpfs44lv, issue:
lslv -l gpfs44lv
Output is similar to this, with the physical volume name in column one.
gpfs44lv:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk8 537:000:000 100% 108:107:107:107:108
In this example, k164n04 and k164n05 are quorum nodes and k164n06 is a nonquorum node.
To change the quorum status of a node, use the mmchnode command. To change one quorum node to
nonquorum, GPFS does not have to be stopped. If you are changing more than one node at the same
time, GPFS needs to be down on all the affected nodes. GPFS does not have to be stopped when
changing nonquorum nodes to quorum nodes, nor does it need to be stopped on nodes that are not
affected.
For example, to make k164n05 a nonquorum node, and k164n06 a quorum node, issue these commands:
mmchnode --nonquorum -N k164n05
mmchnode --quorum -N k164n06
To set a node's quorum designation at the time that it is added to the cluster, see the mmcrcluster or
mmaddnode commands.
The default dump directory for GPFS is /tmp/mmfs. This directory might disappear on Linux if cron is
set to run the /etc/cron.daily/tmpwatch script. The tmpwatch script removes files and directories in /tmp
that have not been accessed recently. Administrators who want to use a different directory for GPFS
dumps can change the directory by issuing this command:
mmchconfig dataStructureDump=/name_of_some_other_big_file_system
You can exclude all GPFS file systems by adding gpfs to the excludeFileSystemType list in this script, or
exclude specific GPFS file systems in the excludeFileSystem list.
/usr/bin/updatedb -f "excludeFileSystemType" -e "excludeFileSystem"
If indexing GPFS file systems is desired, only one node should run the updatedb command and build the
database in a GPFS file system. If the database is built within a GPFS file system it will be visible on all
nodes after one node finishes building it.
Once you start a new session (by logging out and logging back in), the use of the GPFS drive letter will
supersede any of your settings for the same drive letter. This is standard behavior for all local file
systems on Windows.
Why does the offline mmfsck command fail with "Error creating
internal storage"?
The mmfsck command requires some temporary space on the file system manager for storing internal
data during a file system scan. The internal data will be placed in the directory specified by the mmfsck
-t command line parameter (/tmp by default). The amount of temporary space that is needed is
proportional to the number of inodes (used and unused) in the file system that is being scanned. If GPFS
is unable to create a temporary file of the required size, the mmfsck command will fail with error
message:
Error creating internal storage
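For example, if /tmp is too small, you can point mmfsck at a directory in a larger local file system with the -t option (the file system name and directory shown are examples only; the directory must already exist and have enough free space):
mmfsck fs1 -n -t /bigscratch/mmfsck.tmp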
AFM resync is used by the administrator under special circumstances such as home corruption. The
administrator can choose to update home with the contents from the cache by using the mmafmctl
resync command. Resync works for single-writer cache only.
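For example, to push the cache contents of a single-writer fileset back to home, a command of the following form can be used (the file system and fileset names are examples only):
mmafmctl fs1 resync -j sw1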
The mode of an AFM client cache fileset cannot be changed from local-update mode to any other mode;
however, it can be changed from read-only to single-writer (and vice versa), and from either read-only or
single-writer to local-update.
Accessing the contents in an AFM cache for an uncached object while in disconnected mode results in an
input/output error.
Why are setuid/setgid bits in a single-writer cache reset at home after data is
appended?
The setuid/setgid bits in a single-writer cache are reset at home after data is appended to files on which
those bits were previously set and synced. This is because over NFS, a write operation to a setuid file
resets the setuid bit.
On a fileset whose metadata in all subdirectories is not cached, any application that optimizes by
assuming that directories contain two fewer subdirectories than their hard link count will not traverse the
last subdirectory. One such example is find; on Linux, a workaround for this is to use find -noleaf to
correctly traverse a directory that has not been cached.
On a gateway node whose Linux kernel version is below 2.6.32, the NFS maximum rsize is 32 KB, so
AFM does not support an extended attribute size of more than 32 KB on that gateway.
The .ptrash directory is present in both the cache and home. In some cases, where there is a conflict that
AFM cannot resolve automatically, the file is moved to .ptrash at cache or home. In the cache, .ptrash is
cleaned up when eviction is triggered; at home, it is not cleared automatically. When the administrator
needs to free space, the .ptrash directory should be cleaned up first.
Fix the problem and access the fileset to make it active again. If the fileset does not automatically become
active, and remains dropped for a long time (more than five minutes), then do one of the following:
1. Unmount the file system and then remount it.
2. Unlink the dropped fileset and then link it again.
3. Restart GPFS on the gateway node.
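For example, assuming file system fs1, a dropped fileset named cache1 linked at /gpfs/fs1/cache1, and a gateway node named gw1 (all names are examples only), the corresponding commands would be:
mmumount fs1 -a ; mmmount fs1 -a
mmunlinkfileset fs1 cache1 ; mmlinkfileset fs1 cache1 -J /gpfs/fs1/cache1
mmshutdown -N gw1 ; mmstartup -N gw1
You might need the -f option on mmunlinkfileset if the fileset is still in use.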
Why is my data not read from the network locally when I have an FPO pool
(write-affinity enabled storage pool) created?
When you create a storage pool that is to contain files that make use of FPO features, you must specify
allowWriteAffinity=yes in the storage pool stanza.
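For illustration, a pool stanza in the mmcrfs or mmadddisk stanza file might look like the following (the pool name and the values other than allowWriteAffinity=yes are examples only):
%pool: pool=fpodata blockSize=1M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128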
To enable the policy to read replicas from local disks, you must also issue the following command:
mmchconfig readReplicaPolicy=local
To change the failure group in a write-affinity–enabled storage pool, you must use the mmdeldisk and
mmadddisk commands; you cannot use mmchdisk to change it directly.
Why does Hadoop receive a fixed value for the block group factor instead of the
GPFS default value?
When a customer does not define the dfs.block.size property in the configuration file, the GPFS
connector will use a fixed block size to initialize Hadoop. The reason for this is that Hadoop has only one
block size per file system, whereas GPFS allows different chunk sizes (block-group-factor × data block
size) for different data pools because block size is a per-pool property. To avoid a mismatch when using
Hadoop with FPO, define dfs.block.size and dfs.replication in the configuration file.
How can I retain the original data placement when I restore data from a TSM
server?
When data in an FPO pool is backed up in a TSM server and then restored, the original placement map
will be broken unless you set the write affinity failure group for each file before backup.
For AFM home or cache, an FPO pool file written on the local side will be placed according to the write
affinity depth and write affinity failure group definitions of the local side. When a file is synced from
home to cache, it follows the same FPO placement rule as when written from the gateway node in the
cache cluster. When a file is synced from cache to home, it follows the same FPO data placement rule as
when written from the NFS server in the home cluster.
To retain the same file placement at both home and cache, ensure that each has the same cluster
configuration, and set the write affinity failure group for each file.
Obtain this information as quickly as you can after a problem is detected, so that error logs do not wrap
and system parameters, which are constantly changing, are captured as close to the point of failure as
possible. When a serious problem is detected, collect this information and then call IBM. For more
information, see:
information, see:
v “Information to collect before contacting the IBM Support Center”
v “How to contact the IBM Support Center” on page 117.
Regardless of the problem encountered with GPFS, the following data should be available when you
contact the IBM Support Center:
1. A description of the problem.
2. Output of the failing application, command, and so forth.
3. A tar file generated by the gpfs.snap command that contains data from the nodes in the cluster. In
large clusters, the gpfs.snap command can collect data from certain nodes (for example, the affected
nodes, NSD servers, or manager nodes) using the -N option.
For more information about gathering data with gpfs.snap, see “The gpfs.snap command” on page 6.
If the gpfs.snap command cannot be run, collect these items:
a. Any error log entries relating to the event:
v On an AIX node, issue this command:
errpt -a
v On a Linux node, create a tar file of all the entries in the /var/log/messages file from all nodes in
the cluster, or from the nodes that experienced the failure. For example, issue the following command
to gather the /var/log/messages entries from all nodes in the cluster into a single file:
mmdsh -v -N all "cat /var/log/messages" > all.messages
v On a Windows node, use the Export List... dialog in the Event Viewer to save the event log to a
file.
b. A master GPFS log file that is merged and chronologically sorted for the date of the failure (see
“Creating a master GPFS log file” on page 2).
c. If the cluster was configured to store dumps, collect any internal GPFS dumps written to that
directory relating to the time of the failure. The default directory is /tmp/mmfs.
d. On a failing Linux node, gather the installed software packages and the versions of each package
by issuing this command:
rpm -qa
e. On a failing AIX node, gather the name, most recent level, state, and description of all installed
software packages by issuing this command:
lslpp -l
f. File system attributes for all of the failing file systems, issue:
When a delay or deadlock situation is suspected, the IBM Support Center will need additional
information to assist with problem diagnosis. If you have not done so already, ensure you have the
following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
| 2. The deadlock debug data collected automatically.
3. If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000),
issue the following command:
gpfs.snap --deadlock
If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the
following command:
gpfs.snap --deadlock --quick
For more information about the --deadlock and --quick options, see “The gpfs.snap command” on
page 6.
When file system corruption or MMFS_FSSTRUCT errors are encountered, the IBM Support Center will
need additional information to assist with problem diagnosis. If you have not done so already, ensure
you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
2. Unmount the file system everywhere, then run mmfsck -n in offline mode and redirect it to an output
file.
The IBM Support Center will determine when and if you should run the mmfsck -y command.
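For example, assuming file system fs1 (redirect both standard output and standard error so that the complete report is captured):
mmumount fs1 -a
mmfsck fs1 -n > /tmp/fs1.mmfsck.out 2>&1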
When the GPFS daemon is repeatedly crashing, the IBM Support Center will need additional information
to assist with problem diagnosis. If you have not done so already, ensure you have the following
information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
2. Ensure the /tmp/mmfs directory exists on all nodes. If this directory does not exist, the GPFS daemon
will not generate internal dumps.
3. Set the traces on this cluster and all clusters that mount any file system from this cluster:
mmtracectl --set --trace=def --trace-recycle=global
4. Start the trace facility by issuing:
mmtracectl --start
For failures in non-IBM software, follow the problem-reporting procedures provided with that product.
A severity tag is a one-character alphabetic code (A through Z), optionally followed by a colon (:) and a
number, and surrounded by an opening and closing bracket ([ ]). For example:
[E] or [E:nnn]
If more than one substring within a message matches this pattern (for example, [A] or [A:nnn]), the
severity tag is the first such matching string.
When the severity tag includes a numeric code (nnn), this is an error code associated with the message. If
this were the only problem encountered by the command, the command return code would be nnn.
If a message does not have a severity tag, the message does not conform to this specification. You can
determine the message severity by examining the text or any supplemental information provided in the
message catalog, or by contacting the IBM Support Center.
| Each message severity tag has an assigned priority that can be used to filter the messages that are sent to
| the error log on Linux. Filtering is controlled with the mmchconfig attribute systemLogLevel. The default
| for systemLogLevel is error, which means GPFS will send all error [E], critical [X], and alert [A]
| messages to the error log. The values allowed for systemLogLevel are: alert, critical, error, warning,
| notice, configuration, informational, detail, or debug. Additionally, the value none can be specified so
| no messages are sent to the error log.
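For example, to also send warning [W] messages (in addition to error, critical, and alert messages) to the error log, you could issue:
mmchconfig systemLogLevel=warning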
| Alert [A] messages have the highest priority, and debug [B] messages have the lowest priority. If the
| systemLogLevel default of error is changed, only messages with the specified severity and all those with
| a higher priority are sent to the error log. The following table lists the message severity tags in order of
| priority:
| Table 3. Message severity tags ordered by priority
| Severity tag: A    Type of message (systemLogLevel attribute): alert
| Meaning: Indicates a problem where action must be taken immediately. Notify the appropriate person to correct the problem.
| Severity tag: X    Type of message (systemLogLevel attribute): critical
| Meaning: Indicates a critical condition that should be corrected immediately. The system discovered an internal inconsistency of some kind. Command execution might be halted or the system might attempt to continue despite the inconsistency. Report these errors to IBM.
| Severity tag: E    Type of message (systemLogLevel attribute): error
| Meaning: Indicates an error condition. Command execution might or might not continue, but this error was likely caused by a persistent condition and will remain until corrected by some other program or administrative action. For example, a command operating on a single file or other GPFS object might terminate upon encountering any condition of severity E. As another example, a command operating on a list of files, finding that one of the files has permission bits set that disallow the operation, might continue to operate on all other files within the specified list of files.
Explanation: A configuration error was found.
User response: Verify your GPFS daemon version.
| 6027-338 [N] Waiting for number user(s) of shared segment to release it.
Explanation: The mmfsd daemon is attempting to terminate, but cannot because some process is holding the shared segment while in a system call. The message will repeat every 30 seconds until the count drops to zero.
User response: Find the process that is not responding, and find a way to get it out of its system call.
| 6027-339 [E] Nonnumeric trace value 'value' after class 'class'.
Explanation: The specified trace value is not recognized.
User response: Specify a valid trace integer value.
| 6027-343 [E] Node nodeName is incompatible because its version (number) is less than the minimum compatible version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is less than the minimum compatible version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.
| 6027-344 [E] Node nodeName is incompatible because its version is greater than the maximum compatible version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is greater than the maximum compatible version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.
6027-345 Network error on ipAddress, check connectivity.
Explanation: A TCP error has caused GPFS to exit due to a bad return code from an error. Exiting allows recovery to proceed on another node and resources are not tied up on this node.
User response: Follow network problem determination procedures.
| 6027-346 [E] Incompatible daemon version. My version = number, repl.my_version = number
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon is not the same version and it sent a reply indicating its version number is incompatible.
User response: Verify your GPFS daemon version.
| 6027-347 [E] Remote host ipAddress refused connection because IP address ipAddress was not in the node list file
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon sent a reply indicating it did not recognize the IP address of the connector.
User response: Add the IP address of the local host to the node list file on the remote host.
| 6027-348 [E] Bad "subnets" configuration: invalid subnet "ipAddress".
Explanation: A subnet specified by the subnets configuration parameter could not be parsed.
User response: Run the mmlsconfig command and check the value of the subnets parameter. Each subnet must be specified as a dotted-decimal IP address. Run the mmchconfig subnets command to correct the value.
| 6027-349 [E] Bad "subnets" configuration: invalid cluster name pattern "clusterNamePattern".
Explanation: A cluster name pattern specified by the subnets configuration parameter could not be parsed.
User response: Run the mmlsconfig command and check the value of the subnets parameter. The optional cluster name pattern following subnet address must be a shell-style pattern allowing '*', '/' and '[...]' as wildcards. Run the mmchconfig subnets command to correct the value.
| 6027-350 [E] Bad "subnets" configuration: primary IP address ipAddress is on a private subnet. Use a public IP address instead.
Explanation: GPFS is configured to allow multiple IP addresses per node (subnets configuration parameter), but the primary IP address of the node (the one specified when the cluster was created or when the node was added to the cluster) was found to be on a private subnet. If multiple IP addresses are used, the primary address must be a public IP address.
User response: Remove the node from the cluster; then add it back using a public IP address.
6027-358 Communication with mmspsecserver through socket name failed, err value: errorString, msgType messageType.
Explanation: Communication failed between spsecClient (the daemon) and spsecServer.
User response: Verify both the communication socket and the mmspsecserver process.
6027-359 The mmspsecserver process is shutting down. Reason: explanation.
Explanation: The mmspsecserver process received a signal from the mmfsd daemon or encountered an error on execution.
User response: Verify the reason for shutdown.
6027-360 Disk name must be removed from the /etc/filesystems stanza before it can be deleted. Another disk in the file system can be added in its place if needed.
Explanation: A disk being deleted is found listed in the disks= list for a file system.
User response: Remove the disk from list.
| 6027-361 [E] Local access to disk failed with EIO, switching to access the disk remotely.
Explanation: Local access to the disk failed. To avoid unmounting of the file system, the disk will now be accessed remotely.
User response: Wait until work continuing on the local node completes. Then determine why local access to the disk failed, correct the problem and restart the daemon. This will cause GPFS to begin accessing the disk locally again.
User response: Determine the reason why the disks are inaccessible for reading, then reissue the mmadddisk command.
User response: Issue the mmlsdisk command to display disk status. Then either issue the mmchdisk command to change the status of the disk to replacement or specify a new disk that has a status of replacement.
| 6027-365 [I] Rediscovered local access to disk.
Explanation: Rediscovered local access to disk, which failed earlier with EIO. For good performance, the disk will now be accessed locally.
User response: Wait until work continuing on the local node completes. This will cause GPFS to begin accessing the disk locally again.
6027-369 I/O error writing file system descriptor for disk name.
Explanation: mmadddisk detected an I/O error while writing a file system descriptor on a disk.
User response: Determine the reason the disk is inaccessible for writing and reissue the mmadddisk command.
6027-374 Disk name may not be replaced.
Explanation: A disk being replaced with mmrpldisk does not have a status of ready or suspended.
User response: Use the mmlsdisk command to display disk status. Issue the mmchdisk command to change the status of the disk to be replaced to either ready or suspended.
6027-375 Disk name diskName already in file system.
Explanation: The replacement disk name specified in the mmrpldisk command already exists in the file system.
User response: Specify a different disk as the replacement disk.
6027-376 Previous replace command must be completed before starting a new one.
Explanation: The mmrpldisk command failed because the status of other disks shows that a replace command did not complete.
User response: Issue the mmlsdisk command to display disk status. Retry the failed mmrpldisk command or issue the mmchdisk command to change the status of the disks that have a status of replacing or replacement.
6027-377 Cannot replace a disk that is in use.
Explanation: Attempting to replace a disk in place, but the disk specified in the mmrpldisk command is still available for use.
User response: Use the mmchdisk command to stop GPFS's use of the disk.
6027-382 Value value for the 'sector size' option for disk disk is not a multiple of value.
Explanation: When parsing disk lists, the sector size given is not a multiple of the default sector size.
User response: Specify a correct sector size.
6027-383 Disk name name appears more than once.
Explanation: When parsing disk lists, a duplicate name is found.
User response: Remove the duplicate name.
6027-384 Disk name name already in file system.
Explanation: When parsing disk lists, a disk name already exists in the file system.
User response: Rename or remove the duplicate disk.
User response: Specify a correct 'has metadata' value.
User response: Start any disks that have been stopped by the mmchdisk command or by hardware failures. Verify that paths to all disks are correctly defined and operational.
6027-390 Value value for the 'has metadata' option for disk name is invalid.
Explanation: When parsing disk lists, the 'has metadata' value given is not valid.
User response: Specify a correct 'has metadata' value.
6027-394 Too many disks specified for file system. Maximum = number.
Explanation: Too many disk names were passed in the disk descriptor list.
User response: Check the disk descriptor list or the file containing the list.
6027-399 Not enough items in disk descriptor list entry, need fields.
Explanation: When parsing a disk descriptor, not enough fields were specified for one disk.
User response: Correct the disk descriptor to use the correct disk descriptor syntax.
6027-416 Incompatible file system descriptor version or not formatted.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid was encountered.
2. No file system descriptor can be found.
3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly re-configured after creating a file system.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operational.
6027-420 Inode size must be greater than zero.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-421 Inode size must be a multiple of logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-422 Inode size must be at least as large as the logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-423 Minimum fragment size must be a multiple of logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-424 Minimum fragment size must be greater than zero.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-425 File system block size of blockSize is larger than maxblocksize parameter.
Explanation: An attempt is being made to mount a file system whose block size is larger than the maxblocksize parameter as set by mmchconfig.
User response: Use the mmchconfig maxblocksize=xxx command to increase the maximum allowable block size.
6027-426 Warning: mount detected unavailable disks. Use mmlsdisk fileSystem to see details.
Explanation: The mount command detected that some disks needed for the file system are unavailable.
User response: Without file system replication enabled, the mount will fail. If it has replication, the mount may succeed depending on which disks are unavailable. Use mmlsdisk to see details of the disk status.
6027-427 Indirect block size must be at least as large as the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-428 Indirect block size must be a multiple of the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-429 Indirect block size must be less than full data block size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-430 Default metadata replicas must be less than or equal to default maximum number of metadata replicas.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-431 Default data replicas must be less than or equal to default maximum number of data replicas.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-432 Default maximum metadata replicas must be less than or equal to value.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-433 Default maximum data replicas must be less than or equal to value.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-434 Indirect blocks must be at least as big as inodes.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.
6027-468 Disk name listed in fileName or local mmsdrfs file, not found in device name. Run: mmcommon recoverfs name.
Explanation: Tried to access a file system but the disks listed in the operating system's file system database or the local mmsdrfs file for the device do not exist in the file system.
User response: Check the configuration and availability of disks. Run the mmcommon recoverfs device command. If this does not resolve the problem, configuration data in the SDR may be incorrect. If no user modifications have been made to the SDR, contact the IBM Support Center. If user modifications have been made, correct these modifications.
6027-469 File system name does not match descriptor.
Explanation: The file system name found in the descriptor on disk does not match the corresponding device name in /etc/filesystems.
User response: Check the operating system's file system database.
6027-470 Disk name may still belong to file system filesystem. Created on IPandTime.
Explanation: The disk being added by the mmcrfs, mmadddisk, or mmrpldisk command appears to still belong to some file system.
User response: Verify that the disks you are adding do not belong to an active file system, and use the -v no option to bypass this check. Use this option only if you are sure that no other file system has this disk configured because you may cause data corruption in both file systems if this is not the case.
6027-471 Disk diskName: Incompatible file system descriptor version or not formatted.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid was encountered.
2. No file system descriptor can be found.
3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly reconfigured after creating a file system.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.
| 6027-472 [E] File system format version versionString is not supported.
Explanation: The current file system format version is not supported.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.
| 6027-473 [X] File System fileSystem unmounted by the system with return code value reason code value
Explanation: Console log entry caused by a forced unmount due to disk or communication failure.
User response: Correct the underlying problem and remount the file system.
| 6027-474 [X] Recovery Log I/O Failed, Unmounting file system fileSystem
Explanation: I/O to the recovery log failed.
User response: Check the paths to all disks making up the file system. Run the mmlsdisk command to determine if GPFS has declared any disks unavailable. Repair any paths to disks that have failed. Remount the file system.
6027-475 The option '--inode-limit' is not enabled. Use option '-V' to enable most recent features.
Explanation: mmchfs --inode-limit is not enabled under the current file system format version.
User response: Run mmchfs -V, this will change the file system format to the latest format supported.
6027-476 Restricted mount using only available file system descriptor.
Explanation: Fewer than the necessary number of file system descriptors were successfully read. Using the best available descriptor to allow the restricted mount to continue.
User response: Informational message only.
6027-477 The option -z is not enabled. Use the -V option to enable most recent features.
Explanation: The file system format version does not support the -z option on the mmchfs command.
User response: Change the file system format version by issuing mmchfs -V.
6027-478 The option -z could not be changed. fileSystem is still in use.
Explanation: The file system is still mounted or another GPFS administration command (mm...) is running against the file system.
User response: Unmount the file system if it is mounted, and wait for any command that is running to complete before reissuing the mmchfs -z command.
| 6027-479 [N] Mount of fsName was blocked by fileName
Explanation: The internal or external mount of the file system was blocked by the existence of the specified file.
User response: If the file system needs to be mounted, remove the specified file.
6027-480 Cannot enable DMAPI in a file system with existing snapshots.
Explanation: The user is not allowed to enable DMAPI for a file system with existing snapshots.
User response: Delete all existing snapshots in the file system and repeat the mmchfs command.
| 6027-481 [E] Remount failed for mountid id: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.
| 6027-482 [E] Remount failed for device name: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.
| 6027-483 [N] Remounted name
Explanation: mmfsd restarted and remounted the specified file system because it was in the kernel's list of previously mounted file systems.
User response: Informational message only.
6027-484 Remount failed for device after daemon restart.
Explanation: A remount failed after daemon restart. This ordinarily occurs because one or more disks are unavailable. Other possibilities include loss of connectivity to one or more disks.
User response: Issue the mmlsdisk command and check for down disks. Issue the mmchdisk command to start any down disks, then remount the file system. If there is another problem with the disks or the connections to the disks, take necessary corrective actions and remount the file system.
6027-485 Perform mmchdisk for any disk failures and re-mount.
Explanation: Occurs in conjunction with 6027-484.
User response: Follow the User response for 6027-484.
6027-486 No local device specified for fileSystemName in clusterName.
Explanation: While attempting to mount a remote file system from another cluster, GPFS was unable to determine the local device name for this file system.
User response: There must be a /dev/sgname special device defined. Check the error code. This is probably a configuration error in the specification of a remote file system. Run mmremotefs show to check that the remote file system is properly configured.
6027-487 Failed to write the file system descriptor to disk diskName.
Explanation: An error occurred when mmfsctl include was writing a copy of the file system descriptor to one of the disks specified on the command line. This could have been caused by a failure of the corresponding disk device, or an error in the path to the disk.
User response: Verify that the disks are correctly defined on all nodes. Verify that paths to all disks are correctly defined and operational.
6027-488 Error opening the exclusion disk file fileName.
Explanation: Unable to retrieve the list of excluded disks from an internal configuration file.
User response: Ensure that GPFS executable files have been properly installed on all nodes. Perform required configuration steps prior to starting GPFS.
6027-489 Attention: The desired replication factor exceeds the number of available dataOrMetadata failure groups. This is allowed, but the files will not be replicated and will therefore be at risk.
Explanation: You specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor, or increase the number of failure groups.
| 6027-490 [N] The descriptor replica on disk diskName has been excluded.
Explanation: The file system descriptor quorum has been overridden and, as a result, the specified disk was excluded from all operations on the file system descriptor quorum.
User response: None. Informational message only.
6027-492 The file system is already at file system version number
Explanation: The user tried to upgrade the file system format using mmchfs -V --version=v, but the specified version is smaller than the current version of the file system.
User response: Specify a different value for the --version option.
6027-493 File system version number is not supported on nodeName nodes in the cluster.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but some nodes in the local cluster are still running an older GPFS release that does not support the new format version.
User response: Install a newer version of GPFS on those nodes.
6027-494 File system version number is not supported on nodeName remote nodes mounting the file system.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but the file system is still mounted on some nodes in remote clusters that do not support the new format version.
6027-495 You have requested that the file system be upgraded to version number. This will enable new functionality but will prevent you from using the file system with earlier releases of GPFS. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V full command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.
6027-496 You have requested that the file system version for local access be upgraded to version number. This will enable some new functionality but will prevent local nodes from using the file system with earlier releases of GPFS. Remote nodes are not affected by this change. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.
6027-497 The file system has already been upgraded to number using -V full. It is not possible to revert back.
Explanation: The user tried to upgrade the file system format using mmchfs -V compat, but the file system has already been fully upgraded.
User response: Informational message only.
6027-498 Incompatible file system format. Only file systems formatted with GPFS 3.2.1.5 or later can be mounted on this platform.
Explanation: A user running GPFS on Microsoft Windows tried to mount a file system that was formatted with a version of GPFS that did not have Windows support.
User response: Create a new file system using current GPFS code.
Explanation: mmfsmnthelp was called with an User response: None. Informational message only.
incorrect parameter.
User response: Contact the IBM Support Center. 6027-515 Cannot mount fileSystem on mountPoint
Explanation: There was an error mounting the named
6027-504 Not enough memory to allocate internal GPFS file system. Errors in the disk path usually cause
data structure. this problem.
Explanation: Self explanatory. User response: Take the action indicated by other
error messages and error log entries.
User response: Increase ulimit or paging space
6027-507 program: loadFile is not loaded. User response: Take the action indicated by other
error messages and error log entries.
Explanation: The program could not be loaded.
User response: None. Informational message only. 6027-518 Cannot mount fileSystem: Already
mounted.
6027-510 Cannot mount fileSystem on mountPoint: Explanation: An attempt has been made to mount a
errorString file system that is already mounted.
Explanation: There was an error mounting the GPFS User response: None. Informational message only.
file system.
User response: Determine action indicated by the
error messages and error log entries. Errors in the disk
path often cause this problem.
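The format-upgrade messages above (6027-492 through 6027-497) are all responses to mmchfs -V. As a hedged illustration only, using a hypothetical file system named fs1, the current format can be displayed and then upgraded either to the highest level the back-level nodes still understand or all the way to the current level:
  mmlsfs fs1 -V
  mmchfs fs1 -V compat
  mmchfs fs1 -V full
Note that -V full enables functions that older GPFS releases cannot use, which is exactly the condition that messages 6027-493 and 6027-494 warn about.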
6027-519 Cannot mount fileSystem on mountPoint: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.
6027-520 Cannot mount fileSystem: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.
6027-531 The following disks of name will be formatted on node nodeName: list.
Explanation: Output showing which disks will be formatted by the mmcrfs command.
User response: None. Informational message only.
| 6027-532 [E] The quota record recordNumber in file fileName is not valid.
Explanation: A quota entry contained a checksum that is not valid.
User response: Remount the file system with quotas disabled. Restore the quota file from back up, and run mmcheckquota.
| 6027-533 [W] Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
Explanation: The number of files created is approaching the file system limit.
User response: Use the mmchfileset command to increase the maximum number of files to avoid reaching the inode limit and possible performance degradation.
6027-534 Cannot create a snapshot in a DMAPI-enabled file system, rc=returnCode.
Explanation: You cannot create a snapshot in a DMAPI-enabled file system.
User response: Use the mmchfs command to disable DMAPI, and reissue the command.
6027-535 Disks up to size size can be added to storage pool pool.
Explanation: Based on the parameters given to mmcrfs and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this storage pool by the mmadddisk command.
User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and rerun mmcrfs with either larger disks or a larger value for the -n parameter.
Explanation: Insufficient memory for GPFS internal data structures with the current system and GPFS configuration.
User response: Reduce page pool usage with the mmchconfig command, or add additional RAM to the system.
6027-537 Disks up to size size can be added to this file system.
Explanation: Based on the parameters given to the mmcrfs command and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this file system by the mmadddisk command.
User response: None, informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and reissue the mmcrfs command with larger disks or a larger value for the -n parameter.
6027-538 Error accessing disks.
Explanation: The mmcrfs command encountered an error accessing one or more of the disks.
User response: Verify that the disk descriptors are coded correctly and that all named disks exist and are online.
6027-539 Unable to clear descriptor areas for fileSystem.
Explanation: The mmdelfs command encountered an error while invalidating the file system control structures on one or more disks in the file system being deleted.
User response: If the problem persists, specify the -p option on the mmdelfs command.
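Messages 6027-533 and 6027-534 both point at specific administrative commands. A hedged sketch, using hypothetical names (file system fs1, independent fileset fset1), of how those responses are typically carried out:
  mmchfileset fs1 fset1 --inode-limit 2000000
  mmchfs fs1 -z no
The first call raises the maximum number of inodes for the fileset's inode space; the second disables DMAPI so that snapshot creation is no longer blocked. The current settings can be checked first with mmlsfileset fs1 -L and mmlsfs fs1 -z.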
6027-544 Could not invalidate disk of fileSystem.
Explanation: A disk could not be written to invalidate its contents. Check the subsystems in the path to the disk. This is often an I/O error.
User response: Ensure the indicated logical volume is writable.
6027-545 Error processing fileset metadata file.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability states. If all disks are available and the problem persists, issue the mmfsck command to repair damaged metadata.
6027-546 Error processing allocation map for storage pool poolName.
Explanation: There is no I/O path to critical metadata, or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid, and that all disks are either in the 'recovering' or 'up' availability states.
6027-551 fileSystem is still in use.
Explanation: The mmdelfs or mmcrfs command found that the named file system is still mounted or that another GPFS command is running against the file system.
User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command.
6027-552 Scan completed successfully.
Explanation: The scan function has completed without error.
User response: None. Informational message only.
6027-553 Scan failed on number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
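Several of the metadata errors above direct the administrator to check disk availability and, if necessary, repair the file system offline. A minimal sketch, assuming a hypothetical file system fs1 that can be unmounted on all nodes:
  mmlsdisk fs1 -e
  mmumount fs1 -a
  mmfsck fs1
mmlsdisk -e lists only the disks that are not in the normal up and ready state; mmfsck must be run with the file system unmounted in order to make repairs.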
6027-554 Scan failed on number out of number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
6027-555 The desired replication factor exceeds the number of available failure groups.
Explanation: You have specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor or increase the number of failure groups.
6027-556 Not enough space for the desired number of replicas.
Explanation: In attempting to restore the correct replication, GPFS ran out of space in the file system. The operation can continue but some data is not fully replicated.
User response: Make additional space available and reissue the command.
6027-557 Not enough space or available disks to properly balance the file.
Explanation: In attempting to stripe data within the file system, data was placed on a disk other than the desired one. This is normally not a problem.
User response: Run mmrestripefs to rebalance all files.
6027-558 Some data are unavailable.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.
6027-559 Some data could not be read or written.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks and the path to all disks, and reissue the command.
6027-560 File system is already suspended.
Explanation: The tsfsctl command was asked to suspend a suspended file system.
User response: None. Informational message only.
6027-561 Error migrating log.
Explanation: There are insufficient available disks to continue operation.
User response: Restore the unavailable disks and reissue the command.
6027-562 Error processing inodes.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.
6027-563 File system is already running.
Explanation: The tsfsctl command was asked to resume a file system that is already running.
User response: None. Informational message only.
6027-564 Error processing inode allocation map.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.
6027-565 Scanning user file metadata ...
Explanation: Progress information.
User response: None. Informational message only.
6027-566 Error processing user file metadata.
Explanation: Error encountered while processing user file metadata.
User response: None. Informational message only.
6027-567 Waiting for pending file system scan to finish ...
Explanation: Progress information.
User response: None. Informational message only.
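For the availability and balance problems reported by 6027-557 through 6027-559, the usual sequence is to restart any stopped disks and then rebalance. A hedged example with a hypothetical file system fs1:
  mmlsdisk fs1
  mmchdisk fs1 start -a
  mmrestripefs fs1 -b
mmchdisk start -a attempts to start every stopped disk; mmrestripefs -b then rebalances all files across the disks that are available.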
6027-568 Waiting for number pending file system scans to finish ...
Explanation: Progress information.
User response: None. Informational message only.
6027-569 Incompatible parameters. Unable to allocate space for file system metadata. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.
6027-570 Incompatible parameters. Unable to create file system. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.
6027-571 Logical sector size value must be the same as disk sector size.
Explanation: This message is produced by the mmcrfs command if the sector size given by the -l option is not the same as the sector size given for disks in the -d option.
User response: Correct the options and reissue the command.
6027-572 Completed creation of file system fileSystem.
Explanation: The mmcrfs command has successfully completed.
User response: None. Informational message only.
| 6027-573 All data on the following disks of fileSystem will be destroyed:
Explanation: Produced by the mmdelfs command to list the disks in the file system that is about to be destroyed. Data stored on the disks will be lost.
User response: None. Informational message only.
| 6027-575 Unable to complete low level format for fileSystem. Failed with error errorCode
Explanation: The mmcrfs command was unable to create the low level file structures for the file system.
User response: Check other error messages and the error log. This is usually an error accessing disks.
6027-576 Storage pools have not been enabled for file system fileSystem.
Explanation: User invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.
6027-577 Attention: number user or system files are not properly replicated.
Explanation: GPFS has detected files that are not replicated correctly due to a previous failure.
User response: Issue the mmrestripefs command at the first opportunity.
6027-578 Attention: number out of number user or system files are not properly replicated:
Explanation: GPFS has detected files that are not replicated correctly.
6027-579 Some unreplicated file system metadata has been lost. File system usable only in restricted mode.
Explanation: A disk was deleted that contained vital file system metadata that was not replicated.
User response: Mount the file system in restricted mode (-o rs) and copy any user data that may be left on the file system. Then delete the file system.
6027-580 Unable to access vital system metadata. Too many disks are unavailable.
Explanation: Metadata is unavailable because the disks on which the data reside are stopped, or an attempt was made to delete them.
User response: Either start the stopped disks, try to delete the disks again, or recreate the file system.
User response: Determine why a disk is unavailable.
command must be run with the file system unmounted.
6027-582 Some data has been lost.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.
6027-584 Incompatible parameters. Unable to allocate space for root directory. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.
6027-585 Incompatible parameters. Unable to allocate space for ACL data. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. The parameters entered require more space than is available. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.
6027-586 Quota server initialization failed.
Explanation: Quota server initialization has failed. This message may appear as part of the detail data in the quota error log.
User response: Check status and availability of the disks. If quota files have been corrupted, restore them from the last available backup. Finally, reissue the command.
6027-588 No more than number nodes can mount a file system.
Explanation: The limit of the number of nodes that can mount a file system was exceeded.
User response: Observe the stated limit for how many nodes can mount a file system.
6027-589 Scanning file system metadata, phase number ...
Explanation: Progress information.
User response: None. Informational message only.
| 6027-590 [W] GPFS is experiencing a shortage of pagepool. This message will not be repeated for at least one hour.
Explanation: Pool starvation occurs, buffers have to be continually stolen at high aggressiveness levels.
User response: Issue the mmchconfig command to increase the size of pagepool.
6027-591 Unable to allocate sufficient inodes for file system metadata. Increase the value for option and try again.
Explanation: Too few inodes have been specified on the -N option of the mmcrfs command.
User response: Increase the size of the -N option and reissue the mmcrfs command.
6027-592 Mount of fileSystem is waiting for the mount disposition to be set by some data management application.
Explanation: Data management utilizing DMAPI is enabled for the file system, but no data management application has set a disposition for the mount event.
User response: Start the data management application and verify that the application sets the mount disposition.
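Message 6027-590 is resolved by enlarging the page pool, and 6027-591 by giving mmcrfs a larger -N value. A hedged illustration only (the 4G figure, the stanza file disks.txt, and the file system name fs1 are hypothetical, and whether pagepool can be changed without restarting the daemon depends on the code level):
  mmchconfig pagepool=4G -i
  mmcrfs fs1 -F disks.txt -N 5000000
The -i flag asks mmchconfig to apply the change immediately as well as permanently; without it, the new pagepool value takes effect the next time GPFS is restarted.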
6027-594 Disk diskName cannot be added to storage pool poolName. Allocation map cannot accommodate disks larger than size MB.
Explanation: The specified disk is too large compared to the disks that were initially used to create the storage pool.
User response: Specify a smaller disk or add the disk to a new storage pool.
| 6027-595 [E] While creating quota files, file fileName, with no valid quota information was found in the root directory. Remove files with reserved quota file names (e.g. user.quota) without valid quota information from the root directory by: - mounting the file system without quotas, - removing the files, and - remounting the file system with quotas to recreate new quota files. To use quota file names other than the reserved names, use the mmcheckquota command.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that do not contain quota information but have one of the reserved names: user.quota, group.quota, or fileset.quota exist in the root directory.
User response: To mount the file system so that new quota files will be created, perform these steps:
1. Mount the file system without quotas.
2. Verify that there are no files in the root directory with the reserved names: user.quota, group.quota, or fileset.quota.
3. Remount the file system with quotas.
To mount the file system with other files used as quota files, issue the mmcheckquota command.
| 6027-596 [I] While creating quota files, file fileName containing quota information was found in the root directory. This file will be used as quotaType quota file.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that have one of the reserved names user.quota, group.quota, or fileset.quota and contain quota information, exist in the root directory. The file with the reserved name will be used as the quota file.
User response: None. Informational message.
| 6027-597 [E] The quota command was requested to process quotas for a type (user, group, or fileset), which is not enabled.
Explanation: A quota command was requested to process quotas for a user, group, or fileset quota type, which is not enabled.
User response: Verify that the user, group, or fileset quota type is enabled and reissue the command.
| 6027-598 [E] The supplied file does not contain quota information.
Explanation: A file supplied as a quota file does not contain quota information.
User response: Change the file so it contains valid quota information and reissue the command.
To mount the file system so that new quota files are created:
1. Mount the file system without quotas.
2. Verify there are no files in the root directory with the reserved user.quota or group.quota name.
3. Remount the file system with quotas.
| 6027-599 [E] File supplied to the command does not exist in the root directory.
Explanation: The user-supplied name of a new quota file has not been found.
User response: Ensure that a file with the supplied name exists. Then reissue the command.
6027-600 On node nodeName an earlier error may have caused some file system data to be inaccessible at this time. Check error log for additional information. After correcting the problem, the file system can be mounted again to restore normal data access.
Explanation: An earlier error may have caused some file system data to be inaccessible at this time.
User response: Check the error log for additional information. After correcting the problem, the file system can be mounted again.
6027-601 Error changing pool size.
Explanation: The mmchconfig command failed to change the pool size to the requested value.
User response: Follow the suggested actions in the other messages that occur with this one.
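Messages 6027-595 and 6027-598 both describe recreating quota files by mounting without quotas, cleaning the root directory, and remounting with quotas. One possible sequence, shown only as a hedged sketch (the file system fs1 and mount point /gpfs/fs1 are hypothetical, and the exact way to mount without quotas can vary by release):
  mmumount fs1 -a
  mmchfs fs1 -Q no
  mmmount fs1
  rm /gpfs/fs1/user.quota /gpfs/fs1/group.quota /gpfs/fs1/fileset.quota
  mmumount fs1
  mmchfs fs1 -Q yes
  mmmount fs1
  mmcheckquota fs1
The final mmcheckquota pass rebuilds the usage information in the newly created quota files.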
User response: Check the return code. This is usually due to network or disk connectivity problems. Issue the mmlsdisk command to determine if the paths to the
User response: Check that the communications paths are available between the two nodes.
6027-614 Value value for option name is out of range. Valid values are number through number.
Explanation: The value for an option in the command line arguments is out of range.
User response: Correct the command line and reissue the command.
6027-615 mmcommon getContactNodes clusterName failed. Return code value.
Explanation: mmcommon getContactNodes failed while looking up contact nodes for a remote cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check the preceding messages, and consult the earlier chapters of this document. A frequent cause for such errors is lack of space in /var.
| 6027-616 [X] Duplicate address ipAddress in node list
Explanation: The IP address appears more than once in the node list file.
User response: Check the node list shown by the mmlscluster command.
| 6027-617 [I] Recovered number nodes for cluster clusterName.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.
| 6027-618 [X] Local host not found in node list (local ip interfaces: interfaceList)
Explanation: The local host specified in the node list file could not be found.
User response: Check the node list shown by the mmlscluster command.
6027-619 Negative grace times are not allowed.
Explanation: The mmedquota command received a negative value for the -t option.
User response: Reissue the mmedquota command with a nonnegative value for grace time.
6027-620 Hard quota limit must not be less than soft limit.
Explanation: The hard quota limit must be greater than or equal to the soft quota limit.
User response: Reissue the mmedquota command and enter valid values when editing the information.
6027-621 Negative quota limits are not allowed.
Explanation: The quota value must be positive.
User response: Reissue the mmedquota command and enter valid values when editing the information.
| 6027-622 [E] Failed to join remote cluster clusterName
Explanation: The node was not able to establish communication with another cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check other console messages for additional information. Verify that contact nodes for the remote cluster are set correctly. Run mmremotefs show and mmremotecluster show to display information about the remote cluster.
6027-623 All disks up and ready
Explanation: Self-explanatory.
User response: None. Informational message only.
6027-624 No disks
Explanation: Self-explanatory.
User response: None. Informational message only.
6027-625 Migrate already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.
6027-626 Migrate to node nodeName already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.
6027-627 Node nodeName is already manager for fileSystem.
Explanation: A request has been made to change the file system manager node to the node that is already the manager.
User response: None. Informational message only.
6027-628 Sending migrate request to current manager node nodeName.
Explanation: A request has been made to change the file system manager node.
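When 6027-622 is reported, the remote-cluster definitions on the local cluster are the first thing to check, as the user response says. A brief hedged example of the display commands it names:
  mmremotecluster show all
  mmremotefs show all
These list the contact nodes recorded for each remote cluster and the remote file systems defined for local mounting, so an out-of-date contact node list is easy to spot.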
| 6027-630 [N] Node nodeName appointed as manager for fileSystem.
Explanation: The mmchmgr command successfully changed the node designated as the file system manager.
User response: None. Informational message only.
6027-631 Failed to appoint node nodeName as manager for fileSystem.
Explanation: A request to change the file system manager node has failed.
User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node.
Explanation: A node number, IP address, or host name that is not valid has been entered in the configuration file or as input for a command.
User response: Validate your configuration information and the condition of your network. This message may result from an inability to translate a node name.
| 6027-636 [E] Disk marked as stopped or offline.
Explanation: A disk continues to be marked down due to a previous error and was not opened again.
User response: Check the disk status by issuing the mmlsdisk command, then issue the mmchdisk start command to restart the disk.
| 6027-637 [E] RVSD is not active.
Explanation: The RVSD subsystem needs to be activated.
User response: See the appropriate IBM Reliable Scalable Cluster Technology (RSCT) document at: publib.boulder.ibm.com/clresctr/windows/public/rsctbooks.html and search on diagnosing IBM Virtual Shared Disk problems.
User response: Decide which mount mode you want to use, and use that mount mode on both nodes.
| 6027-640 [E] File system is mounted
Explanation: A command has been issued that requires that the file system be unmounted.
User response: Unmount the file system and reissue the command.
| 6027-641 [E] Unable to access vital system metadata. Too many disks are unavailable or the file system is corrupted.
Explanation: An attempt has been made to access a file system, but the metadata is unavailable. This can be caused by:
1. The disks on which the metadata resides are either stopped or there was an unsuccessful attempt to delete them.
2. The file system is corrupted.
User response: To access the file system:
1. If the disks are the problem either start the stopped disks or try to delete them.
2. If the file system has been corrupted, you will have to recreate it from backup medium.
| 6027-646 [E] File system unmounted due to loss of cluster membership.
Explanation: Quorum was lost, causing file systems to be unmounted.
User response: Get enough nodes running the GPFS daemon to form a quorum.
| 6027-647 [E] File fileName could not be run with err errno.
Explanation: The specified shell script could not be run. This message is followed by the error string that is returned by the exec.
User response: Check file existence and access permissions.
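For 6027-646, the response is simply to bring enough GPFS daemons back up to restore quorum. A minimal hedged sketch:
  mmgetstate -a
  mmstartup -a
mmgetstate -a shows which nodes are active, arbitrating, or down; mmstartup -a starts the GPFS daemon on all nodes in the cluster.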
6027-662 mmfsd timed out waiting for primary node nodeName.
Explanation: The mmfsd server is about to terminate.
User response: Ensure that the mmfs.cfg configuration file contains the correct host name or IP address of the primary node. Check mmfsd on the primary node.
6027-663 Lost connection to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon has broken. The daemon has probably crashed.
User response: Ensure that the mmfsd daemon is running. Check the error log.
6027-664 Unexpected message from file system daemon.
Explanation: The version of the mmfsd daemon does not match the version of the GPFS command.
User response: Ensure that all GPFS software components are at the same version.
6027-665 Failed to connect to file system daemon: errorString
Explanation: An error occurred while trying to create a session with mmfsd.
User response: Ensure that the mmfsd daemon is running. Also, only root can run most GPFS commands. The mode bits of the commands must be set-user-id to root.
6027-668 Could not send message to file system daemon
Explanation: Attempt to send a message to the file system failed.
User response: Check if the file system daemon is up and running.
6027-669 Could not connect to file system daemon.
Explanation: The TCP connection between the command and the daemon could not be established.
User response: Check additional error messages.
6027-670 Value for 'option' is not valid. Valid values are list.
Explanation: The specified value for the given command option was not valid. The remainder of the line will list the valid keywords.
User response: Correct the command line.
6027-671 Keyword missing or incorrect.
Explanation: A missing or incorrect keyword was encountered while parsing command line arguments.
User response: Correct the command line.
6027-672 Too few arguments specified.
Explanation: Too few arguments were specified on the command line.
User response: Correct the command line.
6027-676 Option option specified more than once.
Explanation: The named option was specified more than once on the command line.
User response: Correct the command line.
6027-677 Option option is incorrect.
Explanation: An incorrect option was specified on the command line.
User response: Correct the command line.
6027-678 Misplaced or incorrect parameter name.
Explanation: A misplaced or incorrect parameter was specified on the command line.
User response: Correct the command line.
6027-679 Device name is not valid.
Explanation: An incorrect device name was specified on the command line.
User response: Correct the command line.
| 6027-680 [E] Disk failure. Volume name. rc = value. Physical volume name.
Explanation: An I/O request to a disk or a request to fence a disk has failed in such a manner that GPFS can no longer use the disk.
User response: Check the disk hardware and the software subsystems in the path to the disk.
6027-681 Required option name was not specified.
Explanation: A required option was not specified on the command line.
User response: Correct the command line.
6027-684 Value value for option is incorrect.
Explanation: An incorrect value was specified for the named option.
User response: Correct the command line.
6027-685 Value value for option option is out of range. Valid values are number through number.
Explanation: An out of range value was specified for the named option.
User response: Correct the command line.
6027-686 option (value) exceeds option (value).
Explanation: The value of the first option exceeds the value of the second option. This is not permitted.
User response: Correct the command line.
6027-687 Disk name is specified more than once.
Explanation: The named disk was specified more than once on the command line.
User response: Correct the command line.
6027-688 Failed to read file system descriptor.
Explanation: The disk block containing critical information about the file system could not be read from disk.
User response: This is usually an error in the path to the disks. If there are associated messages indicating an I/O error such as ENODEV or EIO, correct that error and retry the operation. If there are no associated I/O errors, then run the mmfsck command with the file system unmounted.
Explanation: Could not obtain enough memory (RAM) to perform an operation.
User response: Either retry the operation when the mmfsd daemon is less heavily loaded, or increase the size of one or more of the memory pool parameters by issuing the mmchconfig command.
6027-691 Failed to send message to node nodeName.
Explanation: A message to another file system node could not be sent.
User response: Check additional error message and the internode communication configuration.
6027-692 Value for option is not valid. Valid values are yes, no.
Explanation: An option that is required to be yes or no is neither.
User response: Correct the command line.
6027-693 Cannot open disk name.
Explanation: Could not access the given disk.
User response: Check the disk hardware and the path to the disk.
6027-694 Disk not started; disk name has a bad volume label.
Explanation: The volume label on the disk does not match that expected by GPFS.
User response: Check the disk hardware. For hot-pluggable drives, ensure that the proper drive has been plugged in.
| 6027-698 [E] Not enough memory to allocate internal data structure.
Explanation: A file system operation failed because no memory is available for allocating internal data structures.
User response: Stop other processes that may have main memory pinned for their use.
| 6027-699 [E] Inconsistency in file system metadata.
Explanation: File system metadata on disk has been corrupted.
User response: This is an extremely serious error that may cause loss of data. Issue the mmfsck command with the file system unmounted to make repairs. There will be a POSSIBLE FILE CORRUPTION entry in the system error log that should be forwarded to the IBM Support Center.
| 6027-700 [E] Log recovery failed.
Explanation: An error was encountered while restoring file system metadata from the log.
User response: Check additional error message. A likely reason for this error is that none of the replicas of the log could be accessed because too many disks are currently unavailable. If the problem persists, issue the mmfsck command with the file system unmounted.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.
Explanation: The mmfsd daemon is not accepting messages because it is restarting or stopping.
User response: None. Informational message only.
system database (the given file) for a valid device entry.
6027-707 Unable to open file fileName.
Explanation: The named file cannot be opened.
User response: Check that the file exists and has the correct permissions.
Explanation: An incorrect keyword was encountered.
User response: Correct the command line.
6027-709 Incorrect response. Valid responses are "yes", "no", or "noall"
Explanation: A question was asked that requires a yes or no answer. The answer entered was neither yes, no, nor noall.
User response: Enter a valid response.
6027-710 Attention:
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures.
User response: Check other error messages as well as the error log for additional information. Correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.
| 6027-719 [E] Device type not supported.
Explanation: A disk being added to a file system with the mmadddisk or mmcrfs command is not a character mode special file, or has characteristics not recognized by GPFS.
User response: Check the characteristics of the disk being added to the file system.
6027-725 The mmfsd daemon is not ready to handle commands yet. Waiting for quorum.
Explanation: The GPFS mmfsd daemon is not accepting messages because it is waiting for quorum.
User response: Determine why insufficient nodes have joined the group to achieve quorum and rectify the problem.
| 6027-726 [E] Quota initialization/start-up failed.
Explanation: Quota manager initialization was unsuccessful. The file system manager finished without quotas. Subsequent client mount requests will fail.
User response: Check the error log and correct I/O errors. It may be necessary to issue the mmcheckquota command with the file system unmounted.
6027-727 Specified driver type type does not match disk name driver type type.
Explanation: The driver type specified on the mmchdisk command does not match the current driver type of the disk.
User response: Verify the driver type and reissue the command.
6027-728 Specified sector size value does not match disk name sector size value.
Explanation: The sector size specified on the mmchdisk command does not match the current sector size of the disk.
User response: Verify the sector size and reissue the command.
6027-729 Attention: No changes for disk name were specified.
Explanation: The disk descriptor in the mmchdisk command does not specify that any changes are to be made to the disk.
User response: Check the disk descriptor to determine if changes are needed.
6027-730 command on fileSystem.
Explanation: Quota was activated or deactivated as stated as a result of the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands.
User response: None, informational only. This message is enabled with the -v option on the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands.
6027-731 Error number while performing command for name quota on fileSystem
Explanation: An error occurred when switching quotas of a certain type on or off. If errors were returned for multiple file systems, only the error code is shown.
User response: Check the error code shown by the message to determine the reason.
6027-732 Error while performing command on fileSystem.
Explanation: An error occurred while performing the stated command when listing or reporting quotas.
User response: None. Informational message only.
6027-733 Edit quota: Incorrect format!
Explanation: The format of one or more edited quota limit entries was not correct.
User response: Reissue the mmedquota command. Change only the values for the limits and follow the instructions given.
| 6027-734 [W] Quota check for 'fileSystem' ended prematurely.
Explanation: The user interrupted and terminated the command.
User response: If ending the command was not intended, reissue the mmcheckquota command.
6027-735 Error editing string from mmfsd.
Explanation: An internal error occurred in the mmfsd when editing a string.
User response: None. Informational message only.
6027-736 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. The file system must be unmounted and then mounted again to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures.
User response: Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.
6027-737 Attention: No metadata disks remain.
Explanation: The mmchdisk command has been issued, but no metadata disks remain.
User response: None. Informational message only.
6027-738 Attention: No data disks remain.
Explanation: The mmchdisk command has been issued, but no data disks remain.
User response: None. Informational message only.
6027-739 Attention: Due to an earlier configuration change the file system is no longer properly balanced.
Explanation: The mmlsdisk command found that the file system is not properly balanced.
User response: Issue the mmrestripefs -b command at your convenience.
6027-740 Attention: Due to an earlier configuration change the file system is no longer properly replicated.
Explanation: The mmlsdisk command found that the file system is not properly replicated.
User response: Issue the mmrestripefs -r command at your convenience.
6027-741 Attention: Due to an earlier configuration change the file system may contain data that is at risk of being lost.
Explanation: The mmlsdisk command found that critical data resides on disks that are suspended or being deleted.
User response: Issue the mmrestripefs -m command as soon as possible.
6027-742 Error occurred while executing a command for fileSystem.
Explanation: A quota command encountered a problem on a file system. Processing continues with the next file system.
User response: None. Informational message only.
6027-743 Initial disk state was updated successfully, but another error may have changed the state again.
Explanation: The mmchdisk command encountered an error after the disk status or availability change was already recorded in the file system configuration. The most likely reason for this problem is that too many disks have become unavailable or are still unavailable after the disk state change.
User response: Issue an mmchdisk start command when more disks are available.
6027-744 Unable to run command while the file system is mounted in restricted mode.
Explanation: A command that can alter the data in a file system was issued while the file system was mounted in restricted mode.
User response: Mount the file system in read-only or read-write mode or unmount the file system and then reissue the command.
6027-745 fileSystem: no quotaType quota management enabled.
Explanation: A quota command of the cited type was issued for the cited file system when no quota management was enabled.
User response: Enable quota management and reissue the command.
6027-746 Editing quota limits for this user or group not permitted.
Explanation: The root user or system group was specified for quota limit editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.
| 6027-747 [E] Too many nodes in cluster (max number) or file system (max number).
Explanation: The operation cannot succeed because too many nodes are involved.
User response: Reduce the number of nodes to the applicable stated limit.
6027-748 fileSystem: no quota management enabled
Explanation: A quota command was issued for the cited file system when no quota management was enabled.
User response: Enable quota management and reissue the command.
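Messages 6027-739 through 6027-741 each name the mmrestripefs option that repairs the condition they report. A hedged example with a hypothetical file system fs1:
  mmrestripefs fs1 -b
  mmrestripefs fs1 -r
  mmrestripefs fs1 -m
-b rebalances, -r restores replication, and -m migrates critical data off suspended or to-be-deleted disks; only the option matching the reported condition needs to be run.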
6027-749 Pool size changed to number K = number M.
Explanation: Pool size successfully changed.
User response: None. Informational message only.
| 6027-750 [E] The node address ipAddress is not defined in the node list
Explanation: An address does not exist in the GPFS configuration file.
User response: Perform required configuration steps prior to starting GPFS on the node.
| 6027-751 [E] Error code value
Explanation: Provides additional information about an error.
User response: See accompanying error messages.
| 6027-752 [E] Lost membership in cluster clusterName. Unmounting file systems.
Explanation: This node has lost membership in the cluster. Either GPFS is no longer available on enough nodes to maintain quorum, or this node could not communicate with other members of the quorum. This could be caused by a communications failure between nodes, or multiple GPFS failures.
User response: See associated error logs on the failed nodes for additional problem determination information.
| 6027-753 [E] Could not run command command
Explanation: The GPFS daemon failed to run the specified command.
User response: Verify correct installation.
6027-754 Error reading string for mmfsd.
Explanation: GPFS could not properly read an input string.
User response: Check that GPFS is properly installed.
| 6027-756 [E] Configuration invalid or inconsistent between different nodes.
Explanation: Self-explanatory.
User response: Check cluster and file system configuration.
6027-757 name is not an excluded disk.
Explanation: Some of the disks passed to the mmfsctl include command are not marked as excluded in the mmsdrfs file.
User response: Verify the list of disks supplied to this command.
6027-758 Disk(s) not started; disk name has a bad volume label.
Explanation: The volume label on the disk does not match that expected by GPFS.
User response: Check the disk hardware. For hot-pluggable drives, make sure the proper drive has been plugged in.
6027-759 fileSystem is still in use.
Explanation: The mmfsctl include command found that the named file system is still mounted, or another GPFS command is running against the file system.
User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command.
| 6027-760 [E] Unable to perform i/o to the disk. This node is either fenced from accessing the disk or this node's disk lease has expired.
Explanation: A read or write to the disk failed due to either being fenced from the disk or no longer having a disk lease.
User response: Verify disk hardware fencing setup is correct if being used. Ensure network connectivity between this node and other nodes is operational.
6027-762 No quota enabled file system found.
Explanation: There is no quota-enabled file system in this cluster.
User response: None. Informational message only.
6027-763 uidInvalidate: Incorrect option option.
Explanation: An incorrect option passed to the uidinvalidate command.
User response: Correct the command invocation.
6027-764 Error invalidating UID remapping cache for domain.
Explanation: An incorrect domain name passed to the uidinvalidate command.
User response: Correct the command invocation.
| 6027-765 [W] Tick value hasn't changed for nearly number seconds
Explanation: Clock ticks incremented by AIX have not been incremented.
User response: Check the error log for hardware or device driver problems that might cause timer interrupts to be lost.
| 6027-766 [N] This node will be expelled from cluster cluster due to expel msg from node
Explanation: This node is being expelled from the cluster.
User response: Check the network connection between this node and the node specified above.
| 6027-767 [N] Request sent to node to expel node from cluster cluster
Explanation: This node sent an expel request to the cluster manager node to expel another node.
User response: Check network connection between this node and the node specified above.
6027-768 Wrong number of operands for mmpmon command 'command'.
Explanation: The command read from the input file has the wrong number of operands.
User response: Correct the command invocation and reissue the command.
6027-769 Malformed mmpmon command 'command'.
Explanation: The command read from the input file is malformed, perhaps with an unknown keyword.
User response: Correct the command invocation and reissue the command.
6027-770 Error writing user.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.
6027-771 Error writing group.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.
6027-772 Error writing fileset.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.
6027-774 fileSystem: quota management is not enabled, or one or more quota clients are not available.
Explanation: An attempt was made to perform quotas commands without quota management enabled, or one or more quota clients failed during quota check.
User response: Correct the cause of the problem, and then reissue the quota command.
6027-775 During mmcheckquota processing, number node(s) failed. It is recommended that mmcheckquota be repeated.
Explanation: Nodes failed while an online quota check was running.
User response: Reissue the quota check command.
6027-776 fileSystem: There was not enough space for the report. Please repeat quota check!
Explanation: The vflag is set in the tscheckquota command, but either no space or not enough space could be allocated for the differences to be printed.
User response: Correct the space problem and reissue the quota check.
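Messages 6027-768 and 6027-769 refer to the command file that mmpmon reads. As a hedged illustration, a file containing one request per line can be fed to mmpmon with a repeat count and a delay (the file name cmdfile is hypothetical):
  echo fs_io_s > cmdfile
  mmpmon -i cmdfile -r 2 -d 1000
Malformed or unknown keywords in the input file produce 6027-769; a known keyword with the wrong number of operands produces 6027-768.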
| 6027-777 [I] Recovering nodes: nodeList
Explanation: Recovery for one or more nodes has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.
| 6027-778 [I] Recovering nodes in cluster cluster: nodeList
Explanation: Recovery for one or more nodes in the cited cluster has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries on the cited cluster specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.
6027-779 Incorrect fileset name filesetName.
Explanation: The fileset name provided on the command line is incorrect.
User response: Correct the fileset name and reissue the command.
6027-780 Incorrect path to fileset junction junctionName.
Explanation: The path to the fileset junction is incorrect.
User response: Correct the junction path and reissue the command.
6027-781 Storage pools have not been enabled for file system fileSystem.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.
| 6027-784 [E] Device not ready.
Explanation: A device is not ready for operation.
User response: Check previous messages for further information.
| 6027-785 [E] Cannot establish connection.
Explanation: This node cannot establish a connection to another node.
User response: Check previous messages for further information.
| 6027-786 [E] Message failed because the destination node refused the connection.
Explanation: This node sent a message to a node that refuses to establish a connection.
User response: Check previous messages for further information.
| 6027-787 [E] Security configuration data is inconsistent or unavailable.
Explanation: There was an error configuring security on this node.
User response: Check previous messages for further information.
| 6027-788 [E] Failed to load or initialize security library.
Explanation: There was an error loading or initializing the security library on this node.
User response: Check previous messages for further information.
6027-789 Unable to read offsets offset to offset for inode inode snap snap, from disk diskName, sector sector.
Explanation: The mmdeldisk -c command found that the cited addresses on the cited disk represent data that is no longer readable.
User response: Save this output for later use in cleaning up failing disks.
6027-790 Specified storage pool poolName does not match disk diskName storage pool poolName. Use mmdeldisk and mmadddisk to change a disk's storage pool.
Explanation: An attempt was made to change a disk's storage pool assignment using the mmchdisk command. This can only be done by deleting the disk from its current storage pool and then adding it to the new pool.
User response: Delete the disk from its current storage pool and then add it to the new pool.
6027-792 Policies have not been enabled for file system fileSystem.
Explanation: The cited file system must be upgraded to use policies.
User response: Upgrade the file system via the mmchfs -V command.
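Message 6027-792 and the policy-file messages that follow all involve installing or reading a policy. A hedged sketch of installing one, with hypothetical names fs1 and policy.rules:
  mmchpolicy fs1 policy.rules -I test
  mmchpolicy fs1 policy.rules
  mmlspolicy fs1 -L
Running with -I test validates the rules without installing them; the second invocation installs the policy, and mmlspolicy displays what is currently installed.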
6027-793 No policy file was installed for file system fileSystem.
Explanation: No policy file was installed for this file system.
User response: Install a policy file.
6027-794 Failed to read policy file for file system fileSystem.
Explanation: Failed to read the policy file for the requested file system.
User response: Reinstall the policy file.
6027-795 Failed to open fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.
6027-796 Failed to read fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.
6027-797 Failed to stat fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.
6027-798 Policy files are limited to number bytes.
Explanation: A user-specified policy file exceeded the maximum-allowed length.
User response: Install a smaller policy file.
6027-850 Unable to issue this command from a non-root user.
Explanation: tsiostat requires root privileges to run.
User response: Get the system administrator to change the executable to set the UID to 0.
6027-851 Unable to process interrupt received.
Explanation: An interrupt occurred that tsiostat cannot process.
User response: Contact the IBM Support Center.
6027-852 interval and count must be positive integers.
Explanation: Incorrect values were supplied for tsiostat parameters.
User response: Correct the command invocation and reissue the command.
6027-853 interval must be less than 1024.
Explanation: An incorrect value was supplied for the interval parameter.
User response: Correct the command invocation and reissue the command.
6027-854 count must be less than 1024.
Explanation: An incorrect value was supplied for the count parameter.
User response: Correct the command invocation and reissue the command.
6027-855 Unable to connect to server, mmfsd is not started.
Explanation: The tsiostat command was issued but the file system is not started.
User response: Contact your system administrator.
6027-856 No information to report.
Explanation: The tsiostat command was issued but no file systems are mounted.
User response: Contact your system administrator.
6027-858 File system not mounted.
Explanation: The requested file system is not mounted.
User response: Mount the file system and reattempt the failing operation.
privileges). Correct the command invocation and reissue the command.
| 6027-873 [W] Error on gpfs_stat_inode([pathName/fileName],inodeNumber.genNumber): errorString
Explanation: An error occurred during a gpfs_stat_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-874 [E] Error: incorrect Date@Time (YYYY-MM-DD@HH:MM:SS) specification: specification
Explanation: The Date@Time command invocation argument could not be parsed.
User response: Correct the command invocation and try again. The syntax should look similar to: 2005-12-25@07:30:00.
| 6027-875 [E] Error on gpfs_stat(pathName): errorString
Explanation: An error occurred while attempting to stat() the cited path name.
User response: Determine whether the cited path name exists and is accessible. Correct the command arguments as necessary and reissue the command.
| 6027-876 [E] Error starting directory scan(pathName): errorString
Explanation: The specified path name is not a directory.
User response: Determine whether the specified path name exists and is an accessible directory. Correct the command arguments as necessary and reissue the command.
| 6027-877 [E] Error opening pathName: errorString
Explanation: An error occurred while attempting to open the named file. Its pool and replication attributes remain unchanged.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.
| 6027-878 [E] Error on gpfs_fcntl(pathName): errorString (offset=offset)
Explanation: An error occurred while attempting fcntl on the named file. Its pool or replication attributes may not have been adjusted.
User response: Investigate the file and possibly reissue the command. Use the mmlsattr and mmchattr commands to examine and change the pool and replication attributes of the named file.
| 6027-879 [E] Error deleting pathName: errorString
Explanation: An error occurred while attempting to delete the named file.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.
6027-880 Error on gpfs_seek_inode(inodeNumber): errorString
Explanation: An error occurred during a gpfs_seek_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-881 [E] Error on gpfs_iopen([rootPath/pathName],inodeNumber): errorString
Explanation: An error occurred during a gpfs_iopen operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-882 [E] Error on gpfs_ireaddir(rootPath/pathName): errorString
Explanation: An error occurred during a gpfs_ireaddir() operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-883 [W] Error on gpfs_next_inode(maxInodeNumber): errorString
Explanation: An error occurred during a gpfs_next_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-884 [E:nnn] Error during directory scan
Explanation: A terminal error occurred during the directory scan phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.
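Messages 6027-877 and 6027-878 suggest examining and adjusting a file's pool and replication attributes with mmlsattr and mmchattr. A hedged example (the path and pool name are hypothetical):
  mmlsattr -L /gpfs/fs1/datafile
  mmchattr -P silver /gpfs/fs1/datafile
mmlsattr -L shows the storage pool and replication settings of the file; mmchattr -P reassigns the file's data to the named pool.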
| 6027-885 [E:nnn] Error during inode scan: errorString
Explanation: A terminal error occurred during the inode scan phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-886 [E:nnn] Error during policy decisions scan
Explanation: A terminal error occurred during the policy decisions phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.
| 6027-892 [E] Error on pthread_create: where #threadNumber_or_portNumber_or_socketNumber: errorString
Explanation: An error occurred while creating the thread during a pthread_create operation.
User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center.
| 6027-893 [X] Error on pthread_mutex_init: errorString
Explanation: An error occurred during a pthread_mutex_init operation.
User response: Contact the IBM Support Center.
| 6027-900 [E] Error opening work file fileName: errorString
Explanation: An error occurred while attempting to open the named work file.
User response: Investigate the file and possibly reissue the command. Check that the path name is defined and accessible.

| 6027-901 [E] Error writing to work file fileName: errorString
Explanation: An error occurred while attempting to write to the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

| 6027-902 [E] Error parsing work file fileName. Service index: number
Explanation: An error occurred while attempting to read the specified work file.
User response: Investigate the file and possibly reissue the command. Make sure that there is enough free space in the file system. If the error persists, contact the IBM Support Center.

| 6027-903 [E:nnn] Error while loading policy rules.
Explanation: An error occurred while attempting to read or parse the policy file, which may contain syntax errors. Subsequent messages include more information about the error.
User response: Read all of the related error messages and try to correct the problem.

| 6027-904 [E] Error returnCode from PD writer for inode=inodeNumber pathname=pathName
Explanation: An error occurred while writing the policy decision for the candidate file with the indicated inode number and path name to a work file. There probably will be related error messages.
User response: Read all the related error messages. Attempt to correct the problems.

| 6027-905 [E] Error: Out of memory. Service index: number
Explanation: The command has exhausted virtual memory.
User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center.

| 6027-906 [E:nnn] Error on system(command)
Explanation: An error occurred during the system call with the specified argument string.
User response: Read and investigate related error messages.

| 6027-907 [E:nnn] Error from sort_file(inodeListname,sortCommand,sortInodeOptions,tempDir)
Explanation: An error occurred while sorting the named work file using the named sort command with the given options and working directory.
User response: Check these:
v The sort command is installed on your system.
v The sort command supports the given options.
v The working directory is accessible.
v The file system has sufficient free space.

| 6027-908 [W] Attention: In RULE 'ruleName' (ruleNumber), the pool named by "poolName 'poolType'" is not defined in the file system.
Explanation: The cited pool is not defined in the file system. This is not an irrecoverable error; the command will continue to run. Of course it will not find any files in an incorrect FROM POOL and it will not be able to migrate any files to an incorrect TO POOL.
User response: Correct the rule and reissue the command.

| 6027-909 [E] Error on pthread_join: where #threadNumber: errorString
Explanation: An error occurred while reaping the thread during a pthread_join operation.
User response: Contact the IBM Support Center.

| 6027-910 [E:nnn] Error during policy execution
Explanation: A terminating error occurred during the policy execution phase of the command.
User response: Verify the command arguments and reissue the command. If the problem persists, contact the IBM Support Center.

| 6027-911 [E] Error on changeSpecification change for pathName. errorString
Explanation: This message provides more details about a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file, and then reissue the change command.
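Several of the user responses above (messages 6027-901, 6027-902, and 6027-907) ask you to confirm that the file system holding the work files has sufficient free space. The following minimal C sketch shows one generic way to check free space with the POSIX statvfs() call; the mount point shown is a hypothetical example and the code is not part of GPFS.

  /* freespace.c - report free space at a mount point with statvfs(). */
  #include <stdio.h>
  #include <sys/statvfs.h>

  int main(void)
  {
      struct statvfs vfs;
      const char *mountPoint = "/gpfs/fs1";   /* hypothetical mount point */

      if (statvfs(mountPoint, &vfs) != 0) {
          perror("statvfs");
          return 1;
      }
      /* Bytes available to unprivileged callers. */
      unsigned long long freeBytes =
          (unsigned long long)vfs.f_bavail * vfs.f_frsize;
      printf("%s has %llu bytes free\n", mountPoint, freeBytes);
      return 0;
  }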
| 6027-912 [E] Error on restriping of pathName. errorString
Explanation: This provides more details on a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file and then reissue the restriping command.

6027-913 Desired replication exceeds number of failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that the desired replication exceeded the number of failure groups.
User response: Reissue the command after adding or restarting file system disks.

6027-914 Insufficient space in one of the replica failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found there was insufficient space in one of the replica failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there was insufficient space to properly balance the file.
User response: Reissue the command after adding or restarting file system disks.

6027-916 Too many disks unavailable to properly balance file.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there were too many disks unavailable to properly balance the file.
User response: Reissue the command after adding or restarting file system disks.

6027-917 All replicas of a data block were previously deleted.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that all replicas of a data block were previously deleted.
User response: Reissue the command after adding or restarting file system disks.

6027-918 Cannot make this change to a nonzero length file.
Explanation: GPFS does not support the requested change to the replication attributes.
User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.

6027-919 Replication parameter range error (value, value).
Explanation: Similar to message 6027-918. The (a,b) numbers are the allowable range of the replication attributes.
User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.

User response: Contact the IBM Support Center.

| 6027-921 [E] Error on socket socketName(hostName): errorString
Explanation: An error occurred during a socket operation.
User response: Verify any command arguments related to interprocessor communication and then reissue the command. If the problem persists, contact the IBM Support Center.

| 6027-922 [X] Error in Mtconx - p_accepts should not be empty
Explanation: The program discovered an inconsistency or logic error within itself.
User response: Contact the IBM Support Center.

| 6027-923 [W] Error - command client is an incompatible version: hostName protocolVersion
Explanation: While operating in master/client mode, the command discovered that the client is running an incompatible version.
User response: Ensure the same version of the command software is installed on all nodes and reissue the command.
User response: Upgrade the command software on all nodes and reissue the command.

| 6027-937 [E] Error creating shared temporary sub-directory subDirName: subDirPath
Explanation: The mkdir command failed on the named subdirectory path.
User response: Specify an existing writable shared directory as the shared temporary directory argument to the policy command. The policy command will create a subdirectory within that.

| 6027-938 [E] Error closing work file fileName: errorString
Explanation: An error occurred while attempting to close the named work file or socket.
User response: Record the above information. Contact the IBM Support Center.

| 6027-939 [E] Error on gpfs_quotactl(pathName,commandCode,resourceId): errorString
Explanation: An error occurred while attempting gpfs_quotactl().
User response: Correct the policy rules and/or enable GPFS quota tracking. If the problem persists, contact the IBM Support Center.

User response: Correct command line and reissue the command.

6027-945 -r value exceeds number of failure groups for data.
Explanation: The mmchattr command received command line arguments that were not valid.
User response: Correct command line and reissue the command.

6027-946 Not a regular file or directory.
Explanation: An mmlsattr or mmchattr command error occurred.
User response: Correct the problem and reissue the command.

6027-947 Stat failed: A file or directory in the path name does not exist.
Explanation: A file or directory in the path name does not exist.
User response: Correct the problem and reissue the command.

| 6027-948 [E:nnn] fileName: get clone attributes failed: errorString
Explanation: The tsfattr call failed.
User response: Check for additional error messages.
| 6027-951 [E] Error on operationName to work file fileName: errorString
Explanation: An error occurred while attempting to do a (write-like) operation on the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

| 6027-953 Failed to get a handle for fileset filesetName, snapshot snapshotName in file system fileSystem. errorMessage.
Explanation: Failed to get a handle for a specific fileset snapshot in the file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

| 6027-954 Failed to get the maximum inode number in the active file system. errorMessage.
Explanation: Failed to get the maximum inode number in the current active file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

| 6027-955 Failed to set the maximum allowed memory for the specified fileSystem command.
Explanation: Failed to set the maximum allowed memory for the specified command.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

| 6027-959 'fileName' is not a regular file.
Explanation: Only regular files are allowed to be clone parents.
User response: This file is not a valid target for mmclone operations.

| 6027-960 cannot access 'fileName': errorString.
Explanation: This message provides more details about a stat() error.
User response: Correct the problem and reissue the command.

6027-961 Cannot execute command.
Explanation: The mmeditacl command cannot invoke the mmgetacl or mmputacl command.
User response: Contact your system administrator.

6027-963 EDITOR environment variable not set
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable and reissue the command.

6027-964 EDITOR environment variable must be an absolute path name
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable correctly and reissue the command.

6027-965 Cannot create temporary file
Explanation: Self-explanatory.
User response: Contact your system administrator.
Explanation: An unexpected error was encountered by mmgetacl or mmeditacl.
User response: Correct the problem and reissue the command.

6027-987 name is not a valid special name.
Explanation: Produced by the mmputacl command when the NFS V4 'special' identifier is followed by an unknown special id string. name is one of the following: 'owner@', 'group@', 'everyone@'.
User response: Specify a valid NFS V4 special name and reissue the command.

6027-988 type is not a valid NFS V4 type.
Explanation: Produced by the mmputacl command when the type field in an ACL entry is not one of the supported NFS Version 4 type values. type is one of the following: 'allow' or 'deny'.
User response: Specify a valid NFS V4 type and reissue the command.

6027-989 name is not a valid NFS V4 flag.
Explanation: A flag specified in an ACL entry is not one of the supported values, or is not valid for the type of object (inherit flags are valid for directories only). Valid values are FileInherit, DirInherit, and InheritOnly.
User response: Specify a valid NFS V4 option and reissue the command.

6027-990 Missing permissions (value found, value are required).
Explanation: The permissions listed are less than the number required.
User response: Add the missing permissions and reissue the command.

6027-991 Combining FileInherit and DirInherit makes the mask ambiguous.
Explanation: Produced by the mmputacl command when WRITE/CREATE is specified without MKDIR (or the other way around), and both the FILE_INHERIT and DIR_INHERIT flags are specified.
User response: Make separate FileInherit and DirInherit entries and reissue the command.

6027-992 Subdirectory name already exists. Unable to create snapshot.
Explanation: tsbackup was unable to create a snapshot because the snapshot subdirectory already exists. This condition sometimes is caused by issuing a Tivoli restore operation without specifying a different subdirectory as the target of the restore.
User response: Remove or rename the existing subdirectory and then retry the command.

6027-993 Keyword aclType is incorrect. Valid values are: 'posix', 'nfs4', 'native'.
Explanation: One of the mm*acl commands specified an incorrect value with the -k option.
User response: Correct the aclType value and reissue the command.

6027-994 ACL permissions cannot be denied to the file owner.
Explanation: The mmputacl command found that the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions are explicitly being denied to the file owner. This is not permitted, in order to prevent the file being left with an ACL that cannot be modified.
User response: Do not select the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions on deny ACL entries for the OWNER.

6027-995 This command will run on a remote node, nodeName.
Explanation: The mmputacl command was invoked for a file that resides on a file system in a remote cluster, and UID remapping is enabled. To parse the user and group names from the ACL file correctly, the command will be run transparently on a node in the remote cluster.
User response: None. Informational message only.

| 6027-996 [E:nnn] Error reading policy text from: fileName
Explanation: An error occurred while attempting to open or read the specified policy file. The policy file may be missing or inaccessible.
User response: Read all of the related error messages and try to correct the problem.

| 6027-997 [W] Attention: RULE 'ruleName' attempts to redefine EXTERNAL POOLorLISTliteral 'poolName', ignored.
Explanation: Execution continues as if the specified rule was not present.
User response: Correct or remove the policy rule.

| 6027-998 [E] Error in FLR/PDR serving for client clientHostNameAndPortNumber: FLRs=numberOfFileListRecords PDRs=numberOfPolicyDecisionResponses pdrs=numberOfPolicyDecisionResponseRecords
Explanation: A protocol error has been detected among cooperating mmapplypolicy processes.
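Messages 6027-987 through 6027-989 enumerate the values that the mmputacl command accepts for the special-name, type, and flag fields of an NFS V4 ACL entry. The value lists in the sketch below are taken straight from those messages; the checking code itself is only illustrative and is not the mmputacl implementation.

  /* acl_fields.c - membership checks for the NFS V4 ACL fields above. */
  #include <stdio.h>
  #include <string.h>

  static int in_list(const char *s, const char *const list[], int n)
  {
      for (int i = 0; i < n; i++)
          if (strcmp(s, list[i]) == 0)
              return 1;
      return 0;
  }

  int main(void)
  {
      const char *const specials[] = { "owner@", "group@", "everyone@" };
      const char *const types[]    = { "allow", "deny" };
      const char *const flags[]    = { "FileInherit", "DirInherit", "InheritOnly" };

      printf("special ok: %d\n", in_list("owner@", specials, 3));      /* 1 */
      printf("type ok:    %d\n", in_list("allow", types, 2));          /* 1 */
      printf("flag ok:    %d\n", in_list("NoPropagate", flags, 3));    /* 0 */
      return 0;
  }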
If user-specified [nodelist] lines are in error, correct these lines.

6027-1005 Common is not sole item on [] line number.
Explanation: A [nodelist] line in the input stream contains common plus any other names.
User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command.
If no user-specified [nodelist] lines are in error, contact the IBM Support Center.
If user-specified [nodelist] lines are in error, correct these lines.

6027-1006 Incorrect custom [ ] line number.
Explanation: A [nodelist] line in the input stream is not of the format: [nodelist]. This covers syntax errors not covered by messages 6027-1004 and 6027-1005.
User response: Fix the format of the list of nodes in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command.
If no user-specified lines are in error, contact the IBM Support Center.
If user-specified lines are in error, correct these lines.

Explanation: Some, but not enough, arguments were specified to the mmcrfsc command.
User response: Specify all arguments as per the usage statement that follows.

6027-1023 File system size must be an integer: value
Explanation: The first two arguments specified to the mmcrfsc command are not integers.
User response: File system size is an internal argument. The mmcrfs command should never call the mmcrfsc command without a valid file system size argument. Contact the IBM Support Center.

6027-1028 Incorrect value for -name flag.
Explanation: An incorrect argument was specified with an option that requires one of a limited number of allowable options (for example, -s or any of the yes | no options).
User response: Use one of the valid values for the specified option.

6027-1029 Incorrect characters in integer field for -name option.
Explanation: An incorrect character was specified with the indicated option.
6027-1034 Missing argument after optionName option.
Explanation: An option was not followed by an argument.
User response: All options need an argument. Specify one.

6027-1043 DefaultDataReplicas must be less than or equal MaxDataReplicas.
Explanation: The specified DefaultDataReplicas was greater than MaxDataReplicas.
User response: Specify a valid value for DefaultDataReplicas.
6027-1057 InodeSize must be less than or equal to Blocksize.
Explanation: The specified InodeSize was not less than or equal to Blocksize.
User response: Specify a valid value for InodeSize.

6027-1059 Mode must be M or S: mode
Explanation: The first argument provided in the mmcrfsc command was not M or S.
User response: The mmcrfsc command should not be called by a user. If any other command produces this error, contact the IBM Support Center.

6027-1084 The specified block size (valueK) exceeds the maximum allowed block size currently in effect (valueK). Either specify a smaller value for the -B parameter, or increase the maximum block size by issuing: mmchconfig maxblocksize=valueK and restart the GPFS daemon.
Explanation: The specified value for block size was greater than the value of the maxblocksize configuration parameter.
User response: Specify a valid value or increase the value of the allowed block size by specifying a larger value on the maxblocksize parameter of the mmchconfig command.

6027-1113 Incorrect option: option.
Explanation: The specified command option is not valid.
User response: Specify a valid option and reissue the command.

6027-1119 Obsolete option: option.
Explanation: A command received an option that is not valid any more.
User response: Correct the command line and reissue the command.

6027-1120 Interrupt received: No changes made.
Explanation: A GPFS administration command (mm...) received an interrupt before committing any changes.
User response: None. Informational message only.

6027-1123 Disk name must be specified in disk descriptor.
Explanation: The disk name positional parameter (the first field) in a disk descriptor was empty. The bad disk descriptor is displayed following this message.
User response: Correct the input and rerun the command.

6027-1124 Disk usage must be dataOnly, metadataOnly, descOnly, or dataAndMetadata.
Explanation: The disk usage parameter has a value that is not valid.
User response: Correct the input and reissue the command.

6027-1132 Interrupt received: changes not propagated.
Explanation: An interrupt was received after changes were committed but before the changes could be propagated to all the nodes.
User response: All changes will eventually propagate as nodes recycle or other GPFS administration commands are issued. Changes can be activated now by manually restarting the GPFS daemons.

6027-1133 Interrupt received. Only a subset of the parameters were changed.
Explanation: An interrupt was received in mmchfs before all of the requested changes could be completed.
User response: Use mmlsfs to see what the currently active settings are. Reissue the command if you want to change additional parameters.

6027-1135 Restriping may not have finished.
Explanation: An interrupt occurred during restriping.
User response: Restart the restripe. Verify that the file system was not damaged by running the mmfsck command.

6027-1136 option option specified twice.
Explanation: An option was specified multiple times on a command line.
User response: Correct the error on the command line and reissue the command.
6027-1143 Cannot open fileName.
Explanation: A file could not be opened.
User response: Verify that the specified file exists and that you have the proper authorizations.

6027-1144 Incompatible cluster types. You cannot move file systems that were created by GPFS cluster type sourceCluster into GPFS cluster type targetCluster.
Explanation: The source and target cluster types are incompatible.
User response: Contact the IBM Support Center for assistance.

6027-1150 Error encountered while importing disk diskName.
Explanation: The mmimportfs command encountered problems while processing the disk.
User response: Check the preceding messages for more information.

6027-1151 Disk diskName already exists in the cluster.
Explanation: You are trying to import a file system that has a disk with the same name as some disk from a file system that is already in the cluster.
User response: Remove or replace the disk with the conflicting name.

6027-1152 Block size must be 16K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M or 16M.
Explanation: The specified block size value is not valid.
User response: Specify a valid block size value.

6027-1153 At least one node in the cluster must be defined as a quorum node.
Explanation: All nodes were explicitly designated or allowed to default to be nonquorum.
User response: Specify which of the nodes should be considered quorum nodes and reissue the command.

6027-1154 Incorrect node node specified for command.
Explanation: The user specified a node that is not valid.
User response: Specify a valid node.

6027-1155 The NSD servers for the following disks from file system fileSystem were reset or not defined: diskList
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks.
User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1156 The NSD servers for the following free disks were reset or not defined: diskList
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks.
User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1157 Use the mmchnsd command to assign NSD servers as needed.
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks. Check the preceding messages for detailed information.
User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1159 The following file systems were not imported: fileSystemList
Explanation: The mmimportfs command was not able to import the specified file systems. Check the preceding messages for error information.
User response: Correct the problems and reissue the mmimportfs command.

6027-1160 The drive letters for the following file systems have been reset: fileSystemList.
Explanation: The drive letters associated with the specified file systems are already in use by existing file systems and have been reset.
User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed.

6027-1161 Use the dash character (-) to separate multiple node designations.
Explanation: A command detected an incorrect character used as a separator in a list of node designations.
User response: Correct the command line and reissue the command.

6027-1162 Use the semicolon character (;) to separate the disk names.
Explanation: A command detected an incorrect character used as a separator in a list of disk names.
User response: Correct the command line and reissue the command.

6027-1163 GPFS is still active on nodeName.
Explanation: The GPFS daemon was discovered to be active on the specified node during an operation that requires the daemon to be stopped.
User response: Stop the daemon on the specified node and rerun the command.

6027-1164 Use mmchfs -t to assign drive letters as needed.
Explanation: The mmimportfs command was forced to reset the drive letters associated with one or more file systems. Check the preceding messages for detailed information.
User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed.
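Message 6027-1152 lists the block sizes that the file system creation commands accept. The small C sketch below encodes that same list and checks a candidate size against it; it is an illustration of the rule stated by the message, not code taken from GPFS.

  /* blocksize_check.c - validate a block size against the list in 6027-1152. */
  #include <stdio.h>

  static int valid_block_size(unsigned long long bytes)
  {
      const unsigned long long k = 1024ULL, m = 1024ULL * 1024ULL;
      const unsigned long long valid[] = { 16*k, 64*k, 128*k, 256*k, 512*k,
                                           1*m, 2*m, 4*m, 8*m, 16*m };
      for (unsigned i = 0; i < sizeof(valid) / sizeof(valid[0]); i++)
          if (bytes == valid[i])
              return 1;
      return 0;
  }

  int main(void)
  {
      printf("256K valid: %d\n", valid_block_size(256 * 1024ULL));   /* 1 */
      printf("96K  valid: %d\n", valid_block_size(96 * 1024ULL));    /* 0 */
      return 0;
  }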
6027-1165 The PR attributes for the following disks from file system fileSystem were reset or not yet established: diskList
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.
User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1166 The PR attributes for the following free disks were reset or not yet established: diskList
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.
User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1167 Use mmchconfig to enable Persistent Reserve in the cluster as needed.
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.
User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1168 Inode size must be 512, 1K or 4K.
Explanation: The specified inode size is not valid.
User response: Specify a valid inode size.

6027-1169 attribute must be value.
Explanation: The specified value of the given attribute is not valid.
User response: Specify a valid value.

6027-1178 parameter must be from value to value: valueSpecified
Explanation: A parameter value specified was out of range.
User response: Keep the specified value within the range shown.

6027-1188 Duplicate disk specified: disk
Explanation: A disk was specified more than once on the command line.

6027-1189 You cannot delete all the disks.
Explanation: The number of disks to delete is greater than or equal to the number of disks in the file system.
User response: Delete only some of the disks. If you want to delete them all, use the mmdelfs command.

6027-1197 parameter must be greater than value: value.
Explanation: An incorrect value was specified for the named parameter.
User response: Correct the input and reissue the command.

6027-1200 tscrfs failed. Cannot create device
Explanation: The internal tscrfs command failed.
User response: Check the error message from the command that failed.

6027-1201 Disk diskName does not belong to file system fileSystem.
Explanation: The specified disk was not found to be part of the cited file system.
User response: If the disk and file system were specified as part of a GPFS command, reissue the command with a disk that belongs to the specified file system.

6027-1203 Attention: File system fileSystem may have some disks that are in a non-ready state. Issue the command: mmcommon recoverfs fileSystem
Explanation: The specified file system may have some disks that are in a non-ready state.
User response: Run mmcommon recoverfs fileSystem to ensure that the GPFS configuration data for the file system is current, and then display the states of the disks in the file system using the mmlsdisk command. If any disks are in a non-ready state, steps should be taken to bring these disks into the ready state, or to remove them from the file system. This can be done by mounting the file system, or by using the mmchdisk command for a mounted or unmounted file system. When maintenance is complete or the failure has been repaired, use the mmchdisk command with the start option. If the failure cannot be repaired without loss of data, you can use the mmdeldisk command to delete the disks.
User response: Choose an unused name or path.

User response: Examine the error code and other messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-1208 File system fileSystem not found in cluster clusterName.
Explanation: The specified file system does not belong to the cited remote cluster. The local information about the file system is not current. The file system may have been deleted, renamed, or moved to a different cluster.
User response: Contact the administrator of the remote cluster that owns the file system and verify the accuracy of the local information. Use the mmremotefs show command to display the local information about the file system. Use the mmremotefs update command to make the necessary changes.

6027-1214 Unable to enable Persistent Reserve on the following disks: diskList
Explanation: The command was unable to set up all of the disks to use Persistent Reserve.
User response: Examine the disks and the additional error information to determine if the disks should have supported Persistent Reserve. Correct the problem and reissue the command.

6027-1215 Unable to reset the Persistent Reserve attributes on one or more disks on the following nodes: nodeList
Explanation: The command could not reset Persistent Reserve on at least one disk on the specified nodes.
User response: Examine the additional error information.
User response: Determine why the node cannot access the specified NSDs. Correct the problem and reissue the command.

User response: Specify a mount point or allow the default settings for the file system to take effect.

6027-1220 Node nodeName cannot be used as an NSD server for Persistent Reserve disk diskName because it is not an AIX node.
Explanation: The node shown was specified as an NSD server for diskName, but the node does not support Persistent Reserve.
User response: Specify a node that supports Persistent Reserve as an NSD server.

6027-1221 The number of NSD servers exceeds the maximum (value) allowed.
Explanation: The number of NSD servers in the disk descriptor exceeds the maximum allowed.
User response: Change the disk descriptor to specify no more NSD servers than the maximum allowed.

6027-1226 Explicit mount points are not supported in a Windows environment. Specify a drive letter or allow the default settings to take effect.
Explanation: An explicit mount point was specified on the mmmount command but the target node runs the Windows operating system.
User response: Specify a drive letter or allow the default settings for the file system to take effect.

| 6027-1227 The main GPFS cluster configuration file is locked. Retrying ...
Explanation: Another GPFS administration command has locked the cluster configuration file. The current process will try to obtain the lock a few times before giving up.
User response: None. Informational message only.
6027-1228 Lock creation successful.
Explanation: The holder of the lock has released it and the current process was able to obtain it.
User response: None. Informational message only. The command will now continue.

6027-1229 Timed out waiting for lock. Try again later.
Explanation: Another GPFS administration command kept the main GPFS cluster configuration file locked for over a minute.
User response: Try again later. If no other GPFS administration command is presently running, see “GPFS cluster configuration data files are locked” on page 44.

6027-1230 diskName is a tiebreaker disk and cannot be deleted.
Explanation: A request was made to GPFS to delete a node quorum tiebreaker disk.
User response: Specify a different disk for deletion.

6027-1231 GPFS detected more than eight quorum nodes while node quorum with tiebreaker disks is in use.
Explanation: A GPFS command detected more than eight quorum nodes, but this is not allowed while node quorum with tiebreaker disks is in use.
User response: Reduce the number of quorum nodes to a maximum of eight, or use the normal node quorum algorithm.

6027-1232 GPFS failed to initialize the tiebreaker disks.
Explanation: A GPFS command unsuccessfully attempted to initialize the node quorum tiebreaker disks.
User response: Examine prior messages to determine why GPFS was unable to initialize the tiebreaker disks and correct the problem. After that, reissue the command.

6027-1233 Incorrect keyword: value.
Explanation: A command received a keyword that is not valid.
User response: Correct the command line and reissue the command.

6027-1234 Adding node node to the cluster will exceed the quorum node limit.
Explanation: An attempt to add the cited node to the cluster resulted in the quorum node limit being exceeded.
User response: Change the command invocation to not exceed the node quorum limit, and reissue the command.

6027-1235 The fileName kernel extension does not exist.
Explanation: The cited kernel extension does not exist.
User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment.

6027-1236 Unable to verify kernel/module configuration.
Explanation: The mmfslinux kernel extension does not exist.
User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment.

6027-1237 The GPFS daemon is still running; use the mmshutdown command.
Explanation: An attempt was made to unload the GPFS kernel extensions while the GPFS daemon was still running.
User response: Use the mmshutdown command to shut down the daemon.

6027-1238 Module fileName is still in use. Unmount all GPFS file systems and issue the command: mmfsadm cleanup
Explanation: An attempt was made to unload the cited module while it was still in use.
User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine.

6027-1239 Error unloading module moduleName.
Explanation: GPFS was unable to unload the cited module.
User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine.
6027-1253 Incorrect value for option option.
Explanation: The provided value for the specified option is not valid.
User response: Correct the error and reissue the command.

6027-1254 Warning: Not all nodes have proper GPFS license designations. Use the mmchlicense command to designate licenses as needed.
Explanation: Not all nodes in the cluster have valid license designations.
User response: Use mmlslicense to see the current license designations. Use mmchlicense to assign valid GPFS licenses to all nodes as needed.

6027-1259 command not found. Ensure the OpenSSL code is properly installed.
Explanation: The specified command was not found.
User response: Ensure the OpenSSL code is properly installed and reissue the command.

| 6027-1260 File fileName does not contain any typeOfStanza stanzas.
Explanation: The input file should contain at least one specified stanza.
User response: Correct the input file and reissue the command.

| 6027-1261 descriptorField must be specified in descriptorType descriptor.
6027-1270 The device name device contains a slash, but not as its first character.
Explanation: The specified device name contains a slash, but the first character is not a slash.
User response: The device name must be an unqualified device name or an absolute device path name, for example: fs0 or /dev/fs0.

6027-1271 Unexpected error from command. Return code: value
Explanation: A GPFS administration command (mm...) received an unexpected error code from an internally called command.
User response: Perform problem determination. See “GPFS commands are unsuccessful” on page 56.

6027-1272 Unknown user name userName.
Explanation: The specified value cannot be resolved to a valid user ID (UID).
User response: Reissue the command with a valid user name.

6027-1273 Unknown group name groupName.
Explanation: The specified value cannot be resolved to a valid group ID (GID).
User response: Reissue the command with a valid group name.

6027-1274 Unexpected error obtaining the lockName lock.
Explanation: GPFS cannot obtain the specified lock.
User response: Examine any previous error messages. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1275 Daemon node adapter Node was not found on admin node Node.
Explanation: An input node descriptor was found to be incorrect. The node adapter specified for GPFS daemon communications was not found to exist on the cited GPFS administrative node.
User response: Correct the input node descriptor and reissue the command.

6027-1276 Command failed for disks: diskList.
Explanation: A GPFS command was unable to complete successfully on the listed disks.
User response: Correct the problems and reissue the command.

6027-1277 No contact nodes were provided for cluster clusterName.
Explanation: A GPFS command found that no contact nodes have been specified for the cited cluster.
User response: Use the mmremotecluster command to specify some contact nodes for the cited cluster.

6027-1278 None of the contact nodes in cluster clusterName can be reached.
Explanation: A GPFS command was unable to reach any of the contact nodes for the cited cluster.
User response: Determine why the contact nodes for the cited cluster cannot be reached and correct the problem, or use the mmremotecluster command to specify some additional contact nodes that can be reached.

6027-1287 Node nodeName returned ENODEV for disk diskName.
Explanation: The specified node returned ENODEV for the specified disk.
User response: Determine the cause of the ENODEV error for the specified disk and rectify it. The ENODEV may be due to disk fencing or the removal of a device that previously was present.

6027-1288 Remote cluster clusterName was not found.
Explanation: A GPFS command found that the cited cluster has not yet been identified to GPFS as a remote cluster.
User response: Specify a remote cluster known to GPFS, or use the mmremotecluster command to make the cited cluster known to GPFS.

6027-1289 Name name is not allowed. It contains the following invalid special character: char
Explanation: The cited name is not allowed because it contains the cited invalid special character.
User response: Specify a name that does not contain an invalid special character, and reissue the command.

6027-1290 GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data for the file system. Issue the command: mmcommon recoverfs fileSystem
Explanation: GPFS detected that the GPFS configuration database data for the specified file system may not be in agreement with the on-disk data for the file system. This may be caused by a GPFS disk command that did not complete successfully.
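Messages 6027-1272 and 6027-1273 are issued when a name cannot be resolved to a UID or GID. The sketch below shows the standard getpwnam() and getgrnam() lookups that perform this kind of resolution; the user and group names are hypothetical examples, and the code is illustrative rather than the GPFS command implementation.

  /* name_lookup.c - resolve a user and a group name to numeric IDs. */
  #include <stdio.h>
  #include <pwd.h>
  #include <grp.h>

  int main(void)
  {
      struct passwd *pw = getpwnam("gpfsadmin");   /* hypothetical user  */
      struct group  *gr = getgrnam("gpfsgroup");   /* hypothetical group */

      if (pw == NULL)
          printf("Unknown user name gpfsadmin.\n");
      else
          printf("gpfsadmin resolves to UID %ld\n", (long)pw->pw_uid);

      if (gr == NULL)
          printf("Unknown group name gpfsgroup.\n");
      else
          printf("gpfsgroup resolves to GID %ld\n", (long)gr->gr_gid);
      return 0;
  }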
| 6027-1292 The -N option cannot be used with attribute name.
Explanation: The specified configuration attribute cannot be changed on only a subset of nodes. This attribute must be the same on all nodes in the cluster.

Explanation: All disk descriptors specify dataOnly for disk usage.
User response: Change at least one disk descriptor in the file system to indicate a usage of metadataOnly or dataAndMetadata.
6027-1340 File fileName not found. Recover the file or run mmauth genkey.
Explanation: The cited file was not found.
User response: Recover the file or run the mmauth genkey command to recreate it.

6027-1341 Starting force unmount of GPFS file systems
Explanation: Progress information for the mmshutdown command.
User response: None. Informational message only.

6027-1342 Unmount not finished after value seconds. Waiting value more seconds.
Explanation: Progress information for the mmshutdown command.
User response: None. Informational message only.

6027-1343 Unmount not finished after value seconds.
Explanation: Progress information for the mmshutdown command.
User response: None. Informational message only.

6027-1344 Shutting down GPFS daemons
Explanation: Progress information for the mmshutdown command.
User response: None. Informational message only.

User response: None. Informational message only.

6027-1347 Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id
Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1348 Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id -N nodeNameList
Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1352 fileSystem is not a remote file system known to GPFS.
Explanation: The cited file system is not the name of a remote file system known to GPFS.
User response: Use the mmremotefs command to identify the cited file system to GPFS as a remote file system, and then reissue the command that failed.

6027-1357 An internode connection between GPFS nodes was disrupted.
Explanation: An internode connection between GPFS nodes was disrupted, preventing its successful completion.
User response: Reissue the command. If the problem recurs, determine and resolve the cause of the disruption. If the problem persists, contact the IBM Support Center.

User response: This is an informational message.

6027-1359 Cluster clusterName is not authorized to access this cluster.
Explanation: Self-explanatory.
User response: This is an informational message.

6027-1361 Attention: There are no available valid VFS type values for mmfs in /etc/vfs.
Explanation: An out of range number was used as the vfs number for GPFS.
User response: The valid range is 8 through 32. Check /etc/vfs and remove unneeded entries.
6027-1393 Incorrect node designation specified: type.
Explanation: A node designation that is not valid was specified. Valid values are client or manager.
User response: Correct the command line and reissue the command.

6027-1394 Operation not allowed for the local cluster.
Explanation: The requested operation cannot be performed for the local cluster.
User response: Specify the name of a remote cluster.

6027-1450 Could not allocate storage.
Explanation: Sufficient memory cannot be allocated to run the mmsanrepairfs command.
User response: Increase the amount of memory available.

| 6027-1500 [E] Open devicetype device failed with error:
Explanation: The "open" of a device failed. Operation of the file system may continue unless this device is needed for operation. If this is a replicated disk device, it will often not be needed. If this is a block or character device for another subsystem (such as /dev/VSD0) then GPFS will discontinue operation.
User response: Problem diagnosis will depend on the subsystem that the device belongs to. For instance device "/dev/VSD0" belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation. If this is a normal disk device then take needed repair action on the specified disk.

| 6027-1501 [X] Volume label of disk name is name, should be uid.
Explanation: The UID in the disk descriptor does not match the expected value from the file system descriptor. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.
User response: Check the disk configuration.

| 6027-1502 [X] Volume label of disk diskName is corrupt.
Explanation: The disk descriptor has a bad magic number, version, or checksum. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.
User response: Check the disk configuration.

6027-1503 Completed adding disks to file system fileSystem.
Explanation: The mmadddisk command successfully completed.
User response: None. Informational message only.

6027-1504 File name could not be run with err error.
Explanation: A failure occurred while trying to run an external program.
User response: Make sure the file exists. If it does, check its access permissions.

6027-1505 Could not get minor number for name.
Explanation: Could not obtain a minor number for the specified block or character device.
User response: Problem diagnosis will depend on the subsystem that the device belongs to. For example, device /dev/VSD0 belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation.

6027-1507 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READ_KEYS ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

6027-1508 Registration failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A REGISTER ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.
6027-1509 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READRES ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

| 6027-1510 [E] Error mounting file system stripeGroup on mountPoint; errorQualifier (gpfsErrno)
Explanation: An error occurred while attempting to mount a GPFS file system on Windows.
User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

| 6027-1511 [E] Error unmounting file system stripeGroup; errorQualifier (gpfsErrno)
Explanation: An error occurred while attempting to unmount a GPFS file system on Windows.
User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

| 6027-1512 [E] WMI query for queryType failed; errorQualifier (gpfsErrno)
Explanation: An error occurred while running a WMI query on Windows.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1515 READ KEY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1516 REGISTER ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1517 READ RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
| 6027-1537 [E] Connect failed to ipAddress: reason
Explanation: An attempt to connect sockets between nodes failed.
User response: Check the reason listed and the connection to the indicated IP address.

| 6027-1539 [E] Connect progress select failed to ipAddress: reason
Explanation: An attempt to connect sockets between nodes failed.
User response: Check the reason listed and the connection to the indicated IP address.

| 6027-1540 [A] Try and buy license has expired!
Explanation: Self-explanatory.
User response: Purchase a GPFS license to continue using GPFS.

| 6027-1541 [N] Try and buy license expires in number days.
Explanation: Self-explanatory.
User response: When the Try and Buy license expires, you will need to purchase a GPFS license to continue using GPFS.

| 6027-1542 [A] Old shared memory exists but it is not valid nor cleanable.
Explanation: A new GPFS daemon started and found existing shared segments. The contents were not recognizable, so the GPFS daemon could not clean them up.
User response:
1. Stop the GPFS daemon from trying to start by issuing the mmshutdown command for the nodes having the problem.
2. Find the owner of the shared segments with keys from 0x9283a0ca through 0x9283a0d1. If a non-GPFS program owns these segments, GPFS cannot run on this node.
3. If these segments are left over from a previous GPFS daemon:
a. Remove them by issuing:
ipcrm -m shared_memory_id
b. Restart GPFS by issuing the mmstartup command on the affected nodes.

6027-1543 error propagating parameter.
Explanation: mmfsd could not propagate a configuration parameter value to one or more nodes in the cluster.
User response: Contact the IBM Support Center.

Explanation: The sum of prefetchthreads, worker1threads, and nsdMaxWorkerThreads exceeds the permitted value.
User response: Accept the calculated values or reduce the individual settings using mmchconfig prefetchthreads=newvalue, mmchconfig worker1threads=newvalue, or mmchconfig nsdMaxWorkerThreads=newvalue. After using mmchconfig, the new settings will not take effect until the GPFS daemon is restarted.

| 6027-1545 [A] The GPFS product that you are attempting to run is not a fully functioning version. This probably means that this is an update version and not the full product version. Install the GPFS full product version first, then apply any applicable update version before attempting to start GPFS.
Explanation: GPFS requires a fully licensed GPFS installation.
User response: Verify installation of licensed GPFS, or purchase and install a licensed version of GPFS.

| 6027-1546 [W] Attention: parameter size of value is too small. New value is value.
Explanation: A configuration parameter is temporarily assigned a new value.
User response: Check the mmfs.cfg file. Use the mmchconfig command to set a valid value for the parameter.

| 6027-1547 [A] Error initializing daemon: performing shutdown
Explanation: GPFS kernel extensions are not loaded, and the daemon cannot initialize. GPFS may have been started incorrectly.
User response: Check GPFS log for errors resulting from kernel extension loading. Ensure that GPFS is started with the mmstartup command.
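Step 2 of the user response for message 6027-1542 asks you to find the owner of shared memory segments whose keys fall in the range 0x9283a0ca through 0x9283a0d1. The sketch below probes that key range with the standard System V shmget() and shmctl() calls and reports the owning UID and creator PID of any segments found; it only inspects segments and removes nothing, and it is an illustrative aid rather than a GPFS utility.

  /* shm_owner.c - report owners of shared segments in the cited key range. */
  #include <stdio.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  int main(void)
  {
      for (key_t key = 0x9283a0ca; key <= 0x9283a0d1; key++) {
          int id = shmget(key, 0, 0);
          if (id < 0)
              continue;                      /* no segment with this key */

          struct shmid_ds ds;
          if (shmctl(id, IPC_STAT, &ds) == 0)
              printf("key 0x%lx id %d owner uid %ld creator pid %ld\n",
                     (unsigned long)key, id,
                     (long)ds.shm_perm.uid, (long)ds.shm_cpid);
      }
      return 0;
  }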
| 6027-1548 [A] Error: daemon and kernel extension do not match.
Explanation: The GPFS kernel extension loaded in memory and the daemon currently starting do not appear to have come from the same build.
User response: Ensure that the kernel extension was reloaded after upgrading GPFS. See “GPFS modules cannot be loaded on Linux” on page 46 for details.

| 6027-1549 [A] Attention: custom-built kernel extension; the daemon and kernel extension do not match.
Explanation: The GPFS kernel extension loaded in memory does not come from the same build as the starting daemon. The kernel extension appears to have been built from the kernel open source package.
User response: None.

| 6027-1550 [W] Error: Unable to establish a session with an Active Directory server. ID remapping via Microsoft Identity Management for Unix will be unavailable.
Explanation: GPFS tried to establish an LDAP session with an Active Directory server (normally the domain controller host), and has been unable to do so.
User response: Ensure the domain controller is available.

6027-1555 Mount point and device name cannot be equal: name
Explanation: The specified mount point is the same as the absolute device name.
User response: Enter a new device name or absolute mount point path name.

6027-1556 Interrupt received.
Explanation: A GPFS administration command received an interrupt.
User response: None. Informational message only.

6027-1559 The -i option failed. Changes will take effect after GPFS is restarted.
Explanation: The -i option on the mmchconfig command failed. The changes were processed successfully, but will take effect only after the GPFS daemons are restarted.
User response: Check for additional error messages. Correct the problem and reissue the command.

6027-1560 This GPFS cluster contains file systems. You cannot delete the last node.
Explanation: An attempt has been made to delete a GPFS cluster that still has one or more file systems associated with it.
User response: Before deleting the last node of a GPFS cluster, delete all file systems that are associated with it. This applies to both local and remote file systems.

6027-1561 Attention: Failed to remove node-specific changes.
Explanation: The internal mmfixcfg routine failed to remove node-specific configuration settings, if any, for one or more of the nodes being deleted. This is of consequence only if the mmchconfig command was indeed used to establish node specific settings and these nodes are later added back into the cluster.
User response: If you add the nodes back later, ensure that the configuration parameters for the nodes are set as desired.

6027-1562 command command cannot be executed. Either none of the nodes in the cluster are reachable, or GPFS is down on all of the nodes.
Explanation: The command that was issued needed to perform an operation on a remote node, but none of the nodes in the cluster were reachable, or GPFS was not accepting commands on any of the nodes.
User response: Ensure that the affected nodes are available and all authorization requirements are met. Correct any problems and reissue the command.
6027-1564 To change the authentication key for the 6027-1571 commandName does not exist or failed;
local cluster, run: mmauth genkey. automount mounting may not work.
Explanation: The authentication keys for the local Explanation: One or more of the GPFS file systems
cluster must be created only with the specified were defined with the automount attribute but the
command. requisite automount command is missing or failed.
User response: Run the specified command to User response: Correct the problem and restart GPFS.
establish a new authentication key for the nodes in the Or use the mount command to explicitly mount the file
cluster. system.
6027-1565 disk not found in file system fileSystem. 6027-1572 The command must run on a node that
is part of the cluster.
Explanation: A disk specified for deletion or
replacement does not exist. Explanation: The node running the mmcrcluster
command (this node) must be a member of the GPFS
User response: Specify existing disks for the indicated
cluster.
file system.
User response: Issue the command from a node that
will belong to the cluster.
6027-1566 Remote cluster clusterName is already
defined.
6027-1573 Command completed: No changes made.
Explanation: A request was made to add the cited
cluster, but the cluster is already known to GPFS. Explanation: Informational message.
User response: None. The cluster is already known to User response: Check the preceding messages, correct
GPFS. any problems, and reissue the command.
6027-1567 fileSystem from cluster clusterName is 6027-1574 Permission failure. The command
already defined. requires root authority to execute.
Explanation: A request was made to add the cited file Explanation: The command, or the specified
system from the cited cluster, but the file system is command option, requires root authority.
already known to GPFS.
User response: Log on as root and reissue the
User response: None. The file system is already command.
known to GPFS.
6027-1568 command command failed. Only parameterList changed.
Explanation: The mmchfs command failed while making the requested changes. Any changes to the attributes in the indicated parameter list were successfully completed. No other file system attributes were changed.
User response: Reissue the command if you want to change additional attributes of the file system. Changes can be undone by issuing the mmchfs command with the original value for the affected attribute.

6027-1578 File fileName does not contain node names.
Explanation: The specified file does not contain valid node names.
User response: Node names must be specified one per line. The name localhost and lines that start with the '#' character are ignored.

6027-1579 File fileName does not contain data.
Explanation: The specified file does not contain data.
User response: Verify that you are specifying the correct file name and reissue the command.
6027-1570 virtual shared disk support is not installed.
Explanation: The command detected that IBM Virtual Shared Disk support is not installed on the node on which it is running.
User response: Install IBM Virtual Shared Disk support.

6027-1587 Unable to determine the local device name for disk nsdName on node nodeName.
Explanation: GPFS was unable to determine the local device name for the specified GPFS disk.
User response: Determine why the specified disk on the specified node could not be accessed and correct the problem. Possible reasons include: connectivity
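When working on message 6027-1587, it can help to confirm how GPFS maps the NSD to a local device on the failing node. A minimal check, assuming a hypothetical NSD name gpfs1nsd:

   mmlsnsd -m -d gpfs1nsd

The -m option reports the local device name found for the NSD on each node that can access it; a missing or errored entry for the indicated node usually points to connectivity or device-discovery problems on that node.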
User response: Change the invocation of the mmdsh command to use the -F or -L options, or set the WCOLL environment variable before invoking the mmdsh command.

Explanation: This message contains progress information about the mmmount command.
User response: None. Informational message only.

6027-1625 option cannot be used with attribute name.
Explanation: An attempt was made to change a configuration attribute and requested the change to take effect immediately (-i or -I option). However, the specified attribute does not allow the operation.
User response: If the change must be made now, leave off the -i or -I option. Then recycle the nodes to pick up the new value.

6027-1626 Command is not supported in the type environment.
Explanation: A GPFS administration command (mm...) is not supported in the specified environment.
User response: Verify if the task is needed in this environment, and if it is, use a different command.

6027-1627 The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS on the above nodes until the problem is resolved.
Explanation: The mmchcluster command could not propagate the new cluster configuration servers to the specified nodes.
User response: Correct the problems and run the mmchcluster -p LATEST command before starting GPFS on the specified nodes.

6027-1628 Cannot determine basic environment information. Not enough nodes are available.
Explanation: The mmchcluster command was unable to retrieve the GPFS cluster data files. Usually, this is due to too few nodes being available.

6027-1630 The GPFS cluster data on nodeName is back level.
Explanation: A GPFS command attempted to commit changes to the GPFS cluster configuration data, but the data on the server is already at a higher level. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment, or if the mmchcluster command did not complete successfully.
User response: Correct any problems and reissue the command. If the problem persists, issue the mmrefresh -f -a command.

6027-1631 The commit process failed.
Explanation: A GPFS administration command (mm...) cannot commit its changes to the GPFS cluster configuration data.
User response: Examine the preceding messages, correct the problem, and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1632 The GPFS cluster configuration data on nodeName is different than the data on nodeName.
Explanation: The GPFS cluster configuration data on the primary cluster configuration server node is different than the data on the secondary cluster configuration server node. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment or if the mmchcluster command did not complete successfully.
User response: Correct any problems and issue the mmrefresh -f -a command. If the problem persists, perform problem determination and contact the IBM Support Center.
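As an illustration of the -i and -I behavior described in message 6027-1625, an attribute that accepts immediate activation can be changed either way (pagepool is used here only as an example attribute; whether a given attribute accepts -i or -I depends on the attribute):

   mmchconfig pagepool=4G -i    # takes effect immediately and persists across restarts
   mmchconfig pagepool=4G       # recorded only; takes effect when GPFS is recycled on the nodes

For an attribute that rejects -i or -I, omit the option and then recycle GPFS on the affected nodes to pick up the new value.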
6027-1637 command quitting. None of the specified nodes are valid.
Explanation: A GPFS command found that none of the specified nodes passed the required tests.
User response: Determine why the nodes were not accepted, fix the problems, and reissue the command.

6027-1638 Command: There are no unassigned nodes in the cluster.
Explanation: A GPFS command in a cluster environment needs unassigned nodes, but found there are none.
User response: Verify whether there are any unassigned nodes in the cluster. If there are none, either add more nodes to the cluster using the mmaddnode command, or delete some nodes from the cluster using the mmdelnode command, and then reissue the command.

6027-1639 Command failed. Examine previous error messages to determine cause.
Explanation: A GPFS command failed due to previously-reported errors.
User response: Check the previous error messages, fix the problems, and then reissue the command. If no other messages are shown, examine the GPFS log files in the /var/adm/ras directory on each node.

6027-1642 command: Starting GPFS ...
Explanation: Progress information for the mmstartup command.
User response: None. Informational message only.

6027-1643 The number of quorum nodes exceeds the maximum (number) allowed.
Explanation: An attempt was made to add more quorum nodes to a cluster than the maximum number allowed.
User response: Reduce the number of quorum nodes, and reissue the command.

6027-1645 Node nodeName is fenced out from disk diskName.
Explanation: A GPFS command attempted to access the specified disk, but found that the node attempting the operation was fenced out from the disk.
User response: Check whether there is a valid reason why the node should be fenced out from the disk. If there is no such reason, unfence the disk and reissue the command.

6027-1647 Unable to find disk with NSD volume id NSD volume id.
Explanation: A disk with the specified NSD volume id cannot be found.
User response: Specify a correct disk NSD volume id.

6027-1648 GPFS was unable to obtain a lock from node nodeName.
Explanation: GPFS failed in its attempt to get a lock from another node in the cluster.
User response: Verify that the reported node is reachable. Examine previous error messages, if any. Fix the problems and then reissue the command.

6027-1661 Failed while processing disk descriptor descriptor on node nodeName.
Explanation: A disk descriptor was found to be unsatisfactory in some way.
User response: Check the preceding messages, if any, and correct the condition that caused the disk descriptor to be rejected.

6027-1662 Disk device deviceName refers to an existing NSD name
Explanation: The specified disk device refers to an existing NSD.
User response: Specify another disk that is not an existing NSD.
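Message 6027-1638 offers two remedies, both shown here as a hedged sketch with hypothetical node names:

   mmaddnode -N newnode1       # add another node to the cluster
   mmdelnode -N oldnode3       # or free up a node by deleting it from the cluster

After either change, reissue the command that reported the message.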
6027-1663 Disk descriptor descriptor should refer to an existing NSD. Use mmcrnsd to create the NSD.
Explanation: An NSD disk given as input is not known to GPFS.
User response: Create the NSD. Then rerun the command.

6027-1664 command: Processing node nodeName
Explanation: Progress information.
User response: None. Informational message only.

6027-1665 Issue the command from a node that remains in the cluster.
Explanation: The nature of the requested change requires the command be issued from a node that will remain in the cluster.
User response: Run the command from a node that will remain in the cluster.

6027-1677 Disk diskName is of an unknown type.
Explanation: The specified disk is of an unknown type.
User response: Specify a disk whose type is recognized by GPFS.

6027-1680 Disk name diskName is already registered for use by GPFS.
Explanation: The cited disk name was specified for use by GPFS, but there is already a disk by that name registered for use by GPFS.
User response: Specify a different disk name for use by GPFS and reissue the command.

6027-1681 Node nodeName is being used as an NSD server.
Explanation: The specified node is defined as a server node for some disk.
User response: If you are trying to delete the node from the GPFS cluster, you must either delete the disk or define another node as its server.
| 6027-1666 [I] No disks were found.
Explanation: A command searched for disks but found none.
User response: If disks are desired, create some using the mmcrnsd command.

6027-1670 Incorrect or missing remote shell command: name
Explanation: The specified remote command does not exist or is not executable.
User response: Specify a valid command.

6027-1672 option value parameter must be an absolute path name.
Explanation: The mount point does not begin with '/'.
User response: Specify the full path for the mount point.

6027-1674 command: Unmounting file systems ...
Explanation: This message contains progress information about the mmumount command.
User response: None. Informational message only.

6027-1685 Processing continues without lock protection.
Explanation: The command will continue processing although it was not able to obtain the lock that prevents other GPFS commands from running simultaneously.
User response: Ensure that no other GPFS command is running. See the command documentation for additional details.

Explanation: A command requires the lock for the GPFS system data but was not able to obtain it.
User response: Check the preceding messages, if any. Follow the procedure in the GPFS: Problem Determination Guide for what to do when the GPFS system data is locked. Then reissue the command.

6027-1689 vpath disk diskName is not recognized as an IBM SDD device.
Explanation: The mmvsdhelper command found that the specified disk is a vpath disk, but it is not recognized as an IBM SDD device.
User response: Ensure the disk is configured as an
User response: None. Informational message only.

Explanation: The local mmfsd daemon has successfully connected to a remote daemon.
User response: None. Informational message only.

6027-1704 mmspsecserver (pid number) ready for service.
Explanation: The mmspsecserver process has created all the service threads necessary for mmfsd.
User response: None. Informational message only.

6027-1705 command: incorrect number of connections (number), exiting...

6027-1712 Unexpected zero bytes received from name. Continuing.
Explanation: This is an informational message. A socket read resulted in zero bytes being read.
User response: If this happens frequently, check IP connections.
User response: Install a version of OpenSSL that supports the required cipherList or contact the administrator of the target cluster and request that a supported cipherList be assigned to this remote cluster.

| 6027-1725 [E] The key used by the cluster named clusterName has changed. Contact the administrator to obtain the new key and register it using "mmauth update".

| 6027-1732 [X] Remote mounts are not enabled within this cluster.
Explanation: Remote mounts cannot be performed in this cluster.
User response: See the GPFS: Advanced Administration Guide for instructions about enabling remote mounts. In particular, make sure the keys have been generated and a cipherlist has been set.

6027-1733 OpenSSL dynamic lock support could not be loaded.
Explanation: One of the functions required for dynamic lock support was not included in the version of the OpenSSL library that GPFS is configured to use.
User response: If this functionality is required, shut down the daemon, install a version of OpenSSL with the desired functionality, and configure GPFS to use it. Then restart the daemon.

| 6027-1736 [N] Reconnected to ipAddress
Explanation: The local mmfsd daemon has successfully reconnected to a remote daemon following an unexpected connection break.
User response: None. Informational message only.

| 6027-1737 [N] Close connection to ipAddress (errorString).
Explanation: Connection socket closed.
User response: None. Informational message only.

| 6027-1738 [E] Close connection to ipAddress (errorString). Attempting reconnect.
Explanation: Connection socket closed.
User response: None. Informational message only.

| 6027-1739 [X] Accept socket connection failed: err value.
Explanation: The Accept socket connection received an unexpected error.
User response: None. Informational message only.

| 6027-1740 [E] Timed out waiting for a reply from node ipAddress.
Explanation: A message that was sent to the specified node did not receive a response within the expected time limit.
User response: None. Informational message only.

| 6027-1743 [W] Failed to load GSKit library path: (dlerror) errorMessage
| Explanation: The GPFS daemon could not load the library required to secure the node-to-node communications.
| User response: Verify that the gpfs.gskit package was properly installed.

| 6027-1744 [I] GSKit library loaded and initialized.
| Explanation: The GPFS daemon successfully loaded the library required to secure the node-to-node communications.
| User response: None. Informational message only.
| 6027-1745 [E] Unable to resolve symbol for routine: functionName (dlerror) errorMessage
| Explanation: An error occurred while resolving a symbol required for transport-level security.
| User response: Verify that the gpfs.gskit package was properly installed.

| 6027-1746 [E] Failed to load or initialize GSKit library: error value
| Explanation: An error occurred during the initialization of the transport-security code.
| User response: Verify that the gpfs.gskit package was properly installed.

| 6027-1747 [W] The TLS handshake with node ipAddress failed with error value (handshakeType).
| Explanation: An error occurred while trying to establish a secure connection with another GPFS node.
| User response: Examine the error messages to obtain information about the error. Under normal circumstances, the retry logic will ensure that the connection is re-established. If this error persists, record the error code and contact the IBM Support Center.

| 6027-1750 [N] The handshakeType TLS handshake with node ipAddress was cancelled: connection reset by peer (return code value).
| Explanation: A secure connection could not be established because the remote GPFS node closed the connection.
| User response: None. Informational message only.

| 6027-1751 [N] A secure send to node ipAddress was cancelled: connection reset by peer (return code value).
| Explanation: Securely sending a message failed because the remote GPFS node closed the connection.
| User response: None. Informational message only.

| 6027-1752 [N] A secure receive to node ipAddress was cancelled: connection reset by peer (return code value).
| Explanation: Securely receiving a message failed because the remote GPFS node closed the connection.
| User response: None. Informational message only.

| 6027-1803 [E] Global NSD disk, name, not found.
Explanation: A client tried to open a globally-attached NSD disk, but a scan of all disks failed to find that NSD.
User response: Ensure that the globally-attached disk is available on every node that references it.

User response: Perform disk diagnostics.
| 6027-1807 [E] NSD nsdName is using Persistent Reserve, this will require an NSD server on an osName node.
Explanation: A client tried to open a globally-attached NSD disk, but the disk is using Persistent Reserve. An osName NSD server is needed. GPFS only supports Persistent Reserve on certain operating systems.
User response: Use the mmchnsd command to add an osName NSD server for the NSD.

| 6027-1808 [A] Unable to reserve space for NSD buffers. Increase pagepool size to at least requiredPagePoolSize MB. Refer to the GPFS: Administration and Programming Reference for more information on selecting an appropriate pagepool size.
Explanation: The pagepool usage for an NSD buffer (4*maxblocksize) is limited by factor nsdBufSpace. The value of nsdBufSpace can be in the range of 10-70. The default value is 30.
User response: Use the mmchconfig command to decrease the value of maxblocksize or to increase the value of pagepool or nsdBufSpace.

| 6027-1809 [E] The defined server serverName for NSD NsdName couldn't be resolved.
Explanation: The host name of the NSD server could not be resolved by gethostbyName().
User response: Fix the host name resolution.

| 6027-1810 [I] Vdisk server recovery: delay number sec. for safe recovery.
Explanation: Wait for the existing disk lease to expire before performing vdisk server recovery.
User response: None.

| 6027-1813 [A] Error reading volume identifier (for objectName name) from configuration file.
Explanation: The volume identifier for the named recovery group or vdisk could not be read from the mmsdrfs file. This should never occur.
User response: Check for damage to the mmsdrfs file.

| 6027-1814 [E] Vdisk vdiskName cannot be associated with its recovery group recoveryGroupName. This vdisk will be ignored.
Explanation: The named vdisk cannot be associated with its recovery group.
User response: Check for damage to the mmsdrfs file.

| 6027-1815 [A] Error reading volume identifier (for NSD name) from configuration file.
Explanation: The volume identifier for the named NSD could not be read from the mmsdrfs file. This should never occur.
User response: Check for damage to the mmsdrfs file.

| 6027-1816 [E] The defined server serverName for recovery group recoveryGroupName could not be resolved.
Explanation: The hostname of the NSD server could not be resolved by gethostbyName().
User response: Fix hostname resolution.

| 6027-1817 [E] Vdisks are defined, but no recovery groups are defined.
Explanation: There are vdisks defined in the mmsdrfs file, but no recovery groups are defined. This should never occur.
User response: Check for damage to the mmsdrfs file.
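The sizing rule behind message 6027-1808 can be worked through numerically: an NSD buffer needs 4*maxblocksize bytes, and only nsdBufSpace percent of the pagepool (10-70, default 30) may be used for it. If maxblocksize were 16M, for instance, the buffer would need 64 MiB, so with the default nsdBufSpace of 30 the pagepool would have to be at least about 64/0.30, roughly 214 MiB. Illustrative adjustments (the values are examples only, not recommendations):

   mmchconfig pagepool=1G
   mmchconfig nsdBufSpace=50

Raising pagepool or nsdBufSpace, or lowering maxblocksize, satisfies the constraint.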
6027-1820 Disk descriptor for name refers to an existing NSD.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command found an existing NSD.
User response: Correct the input file, or use the -v option.

6027-1821 Error errno writing disk descriptor on name.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command got an error writing the disk descriptor.
User response: Perform disk diagnostics.

6027-1822 Error errno reading disk descriptor on name.
Explanation: The tspreparedpdisk command got an error reading the disk descriptor.
User response: Perform disk diagnostics.

6027-1823 Path error, name and name are the same disk.
Explanation: The tspreparedpdisk command got an error during path verification. The pdisk descriptor file is miscoded.
User response: Correct the pdisk descriptor file and reissue the command.

| 6027-1824 [X] An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. Server disk diskName will be put offline
Explanation: A new device mapper path is detected or a previously failed path is activated after the local device discovery has finished. This path lacks a Persistent Reserve, and cannot be used. All device paths must be active at mount time.
User response: Check the paths to all disks making up the file system. Repair any paths to disks which have failed. Rediscover the paths for the NSD.

| 6027-1825 [A] Unrecoverable NSD checksum error on I/O to NSD disk nsdName, using server serverName. Exceeds retry limit number.
Explanation: The allowed number of retries was exceeded when encountering an NSD checksum error on I/O to the indicated disk, using the indicated server.
User response: There may be network issues that require investigation.

6027-1850 [E] NSD-RAID services are not configured on node nodeName. Check the nsdRAIDTracks and nsdRAIDBufferPoolSizePct configuration attributes.
Explanation: A GPFS Native RAID command is being executed, but NSD-RAID services are not initialized either because the specified attributes have not been set or had invalid values.
User response: Correct the attributes and restart the GPFS daemon.

| 6027-1851 [A] Cannot configure NSD-RAID services. The nsdRAIDBufferPoolSizePct of the pagepool must result in at least 128MiB of space.
Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because of the memory consideration specified.
User response: Correct the nsdRAIDBufferPoolSizePct attribute and restart the GPFS daemon.

| 6027-1852 [A] Cannot configure NSD-RAID services. nsdRAIDTracks is too large, the maximum on this node is value.
Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because the nsdRAIDTracks attribute is too large.
User response: Correct the nsdRAIDTracks attribute and restart the GPFS daemon.

6027-1853 [E] Recovery group recoveryGroupName does not exist or is not active.
Explanation: A command was issued to a RAID recovery group that does not exist, or is not in the active state.
User response: Retry the command with a valid RAID recovery group name or wait for the recovery group to become active.

6027-1854 [E] Cannot find declustered array arrayName in recovery group recoveryGroupName.
Explanation: The specified declustered array name was not found in the RAID recovery group.
User response: Specify a valid declustered array name within the RAID recovery group.
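For messages 6027-1850 through 6027-1852, the working constraint is that pagepool multiplied by nsdRAIDBufferPoolSizePct/100 must provide at least 128 MiB on the recovery group server, and nsdRAIDTracks must be set to a value the node can support. For example, a 1 GiB pagepool with nsdRAIDBufferPoolSizePct at 20 yields about 205 MiB, which passes the check, while a 256 MiB pagepool at 10 percent yields only about 26 MiB, which does not. A sketch of setting the attributes on a hypothetical server node (the values are illustrative, not recommendations):

   mmchconfig nsdRAIDTracks=16384,nsdRAIDBufferPoolSizePct=20 -N server1

Restart the GPFS daemon on that node afterward, as the messages direct.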
6027-1869 [E] Error updating the recovery group descriptor.
Explanation: Error occurred updating the RAID recovery group descriptor.
User response: Retry the command.

6027-1870 [E] Recovery group name name is already in use.
Explanation: The recovery group name already exists.
User response: Choose a new recovery group name using the characters a-z, A-Z, 0-9, and underscore, at most 63 characters in length.

6027-1871 [E] There is only enough free space to allocate number spare(s) in declustered array arrayName.
Explanation: Too many spares were specified.
User response: Retry the command with a valid number of spares.

User response: Delete any vdisks remaining in this RAID recovery group using the tsdelvdisk command before retrying this command.

6027-1873 [E] Pdisk creation failed for pdisk pdiskName: err=errorNum.
Explanation: Pdisk creation failed because of the specified error.
User response: None.

6027-1874 [E] Error adding pdisk to a recovery group.
Explanation: tsaddpdisk failed to add new pdisks to a recovery group.
User response: Check the list of pdisks in the -d or -F parameter of tsaddpdisk.

6027-1875 [E] Cannot delete the only declustered array.
Explanation: Cannot delete the only remaining declustered array from a recovery group.
User response: Instead, delete the entire recovery group.

6027-1876 [E] Cannot remove declustered array arrayName because it is the only remaining declustered array with at least number pdisks.
Explanation: The command failed to remove a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
User response: Add pdisks to another declustered array in this recovery group before removing this one.

6027-1877 [E] Cannot remove declustered array arrayName because the array still contains vdisks.
Explanation: Declustered arrays that still contain vdisks cannot be deleted.
User response: Delete any vdisks remaining in this declustered array using the tsdelvdisk command before retrying this command.

Explanation: The tsdelpdisk command can be used either to delete individual pdisks from a declustered array, or to delete a full declustered array from a recovery group. You cannot, however, delete a declustered array by deleting all of its pdisks -- at least one must remain.
User response: Delete the declustered array instead of removing all of its pdisks.

6027-1879 [E] Cannot remove pdisk pdiskName because arrayName is the only remaining declustered array with at least number pdisks.
Explanation: The command failed to remove a pdisk from a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
User response: Add pdisks to another declustered array in this recovery group before removing pdisks from this one.

6027-1880 [E] Cannot remove pdisk pdiskName because the number of pdisks in declustered array arrayName would fall below the code width of one or more of its vdisks.
Explanation: The number of pdisks in a declustered array must be at least the maximum code width of any vdisk in the declustered array.
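Message 6027-1870 restricts recovery group names to the characters a-z, A-Z, 0-9, and underscore, with a maximum length of 63 characters, so a name such as rg001_left is acceptable while names containing other characters are not. A sketch of creating a recovery group under a valid name (the stanza file and server names here are hypothetical):

   mmcrrecoverygroup rg001_left -F rg001.stanza --servers server1,server2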
6027-1906 There is no file system with drive letter driveLetter.
Explanation: No file system in the GPFS cluster has the specified drive letter.
User response: Reissue the command with a valid file system.

6027-1908 The option option is not allowed for remote file systems.
Explanation: The specified option can be used only for locally-owned file systems.
User response: Correct the command line and reissue the command.

6027-1909 There are no available free disks. Disks must be prepared prior to invoking command. Define the disks using the command command.
Explanation: The currently executing command (mmcrfs, mmadddisk, mmrpldisk) requires disks to be defined for use by GPFS using one of the GPFS disk creation commands: mmcrnsd, mmcrvsd.
User response: Create disks and reissue the failing command.

6027-1910 Node nodeName is not a quorum node.
Explanation: The mmchmgr command was asked to move the cluster manager to a nonquorum node. Only one of the quorum nodes can be a cluster manager.
User response: Designate the node to be a quorum node, specify a different node on the command line, or allow GPFS to choose the new cluster manager node.

6027-1911 File system fileSystem belongs to cluster clusterName. The option option is not allowed for remote file systems.
Explanation: The specified option can be used only for locally-owned file systems.
User response: Correct the command line and reissue the command.

6027-1922 IP aliasing is not supported (node). Specify the main device.
Explanation: IP aliasing is not supported.
User response: Specify a node identifier that resolves to the IP address of a main device for the node.

6027-1927 The requested disks are not known to GPFS.
Explanation: GPFS could not find the requested NSDs in the cluster.
User response: Reissue the command, specifying known disks.

6027-1929 cipherlist is not a valid cipher list.
Explanation: The cipher list must be set to a value supported by GPFS. All nodes in the cluster must support a common cipher.
| User response: Use mmauth show ciphers to display a list of the supported ciphers.

6027-1930 Disk diskName belongs to file system fileSystem.
Explanation: A GPFS administration command (mm...) found that the requested disk to be deleted still belongs to a file system.
User response: Check that the correct disk was requested. If so, delete the disk from the file system before proceeding.

6027-1931 The following disks are not known to GPFS: diskNames.
Explanation: A GPFS administration command (mm...) found that the specified disks are not known to GPFS.
User response: Verify that the correct disks were requested.

6027-1932 No disks were specified that could be deleted.
Explanation: A GPFS administration command (mm...) determined that no disks were specified that could be deleted.
User response: Examine the preceding messages, correct the problems, and reissue the command.

6027-1933 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue mmdelnsd -p NSDvolumeid.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data, but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.
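As message 6027-1933 indicates, the leftover NSD volume id can be cleared directly from the disk once it is no longer part of the configuration. A hedged example, in which the volume id shown is a placeholder rather than a real value:

   mmdelnsd -p 0A0B0C0D51E2F345

Run the command exactly as printed in the message, substituting the NSD volume id that the message reports.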
6027-1934 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSDvolumeid -N nodeList.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1936 Node nodeName cannot support Persistent Reserve on disk diskName because it is not an AIX node. The disk will be used as a non-PR disk.
Explanation: A non-AIX node was specified as an NSD server for the disk. The disk will be used as a non-PR disk.
User response: None. Informational message only.

6027-1937 A node was specified more than once as an NSD server in disk descriptor descriptor.
Explanation: A node was specified more than once as an NSD server in the disk descriptor shown.
User response: Change the disk descriptor to eliminate any redundancies in the list of NSD servers.

6027-1938 configParameter is an incorrect parameter. Line in error: configLine. The line is ignored; processing continues.
Explanation: The specified parameter is not valid and will be ignored.
User response: None. Informational message only.

6027-1939 Line in error: line.
Explanation: The specified line from a user-provided input file contains errors.
User response: Check the preceding messages for more information. Correct the problems and reissue the command.

6027-1940 Unable to set reserve policy policy on disk diskName on node nodeName.
Explanation: The specified disk should be able to support Persistent Reserve, but an attempt to set up the registration key failed.
User response: Correct the problem and reissue the command.

6027-1941 Cannot handle multiple interfaces for host hostName.
Explanation: Multiple entries were found for the given hostname or IP address either in /etc/hosts or by the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1942 Unexpected output from the 'host -t a name' command:
Explanation: A GPFS administration command (mm...) received unexpected output from the host -t a command for the given host.
User response: Issue the host -t a command interactively and carefully review the output, as well as any error messages.

6027-1943 Host name not found.
Explanation: A GPFS administration command (mm...) could not resolve a host from /etc/hosts or by using the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1945 Disk name diskName is not allowed. Names beginning with gpfs are reserved for use by GPFS.
Explanation: The cited disk name is not allowed because it begins with gpfs.
User response: Specify a disk name that does not begin with gpfs and reissue the command.

6027-1947 Use mmauth genkey to recover the file fileName, or to generate and commit a new key.
Explanation: The specified file was not found.
User response: Recover the file, or generate a new key by running: mmauth genkey propagate or generate a new key by running mmauth genkey new, followed by the mmauth genkey commit command.

6027-1948 Disk diskName is too large.
Explanation: The specified disk is too large.
User response: Specify a smaller disk and reissue the command.
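The recovery options in message 6027-1947 map to a short command sequence. A minimal sketch of the generate-and-commit path, run on the local cluster:

   mmauth genkey new
   mmauth genkey commit

Alternatively, as the message suggests, mmauth genkey propagate can be used to recover the file from the key that is already committed.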
6027-1949 Propagating the cluster configuration 6027-1968 Failed while processing disk diskName.
data to all affected nodes.
Explanation: An error was detected while processing
Explanation: The cluster configuration data is being the specified disk.
sent to the rest of the nodes in the cluster.
User response: Examine prior messages to determine
User response: This is an informational message. the reason for the failure. Correct the problem and
reissue the command.
6027-1950 Local update lock is busy.
6027-1969 Device device already exists on node
Explanation: More than one process is attempting to
nodeName
update the GPFS environment at the same time.
Explanation: This device already exists on the
User response: Repeat the command. If the problem
specified node.
persists, verify that there are no blocked processes.
User response: None.
6027-1951 Failed to obtain the local environment
update lock. 6027-1970 Disk diskName has no space for the
quorum data structures. Specify a
Explanation: GPFS was unable to obtain the local
different disk as tiebreaker disk.
environment update lock for more than 30 seconds.
Explanation: There is not enough free space in the file
User response: Examine previous error messages, if
system descriptor for the tiebreaker disk data
any. Correct any problems and reissue the command. If
structures.
the problem persists, perform problem determination
and contact the IBM Support Center. User response: Specify a different disk as a tiebreaker
disk.
6027-1962 Permission denied for disk diskName
6027-1974 None of the quorum nodes can be
Explanation: The user does not have permission to
reached.
access disk diskName.
Explanation: Ensure that the quorum nodes in the
User response: Correct the permissions and reissue
cluster can be reached. At least one of these nodes is
the command.
required for the command to succeed.
User response: Ensure that the quorum nodes are
6027-1963 Disk diskName was not found.
available and reissue the command.
Explanation: The specified disk was not found.
User response: Specify an existing disk and reissue 6027-1975 The descriptor file contains more than
the command. one descriptor.
Explanation: The descriptor file must contain only one
6027-1964 I/O error on diskName descriptor.
Explanation: An I/O error occurred on the specified User response: Correct the descriptor file.
disk.
User response: Check for additional error messages. 6027-1976 The descriptor file contains no
Check the error log for disk hardware problems. descriptor.
Explanation: The descriptor file must contain only one
6027-1967 Disk diskName belongs to back-level file descriptor.
system fileSystem or the state of the disk
User response: Correct the descriptor file.
is not ready. Use mmchfs -V to convert
the file system to the latest format. Use
mmchdisk to change the state of a disk. 6027-1977 Failed validating disk diskName. Error
code errorCode.
Explanation: The specified disk cannot be initialized
for use as a tiebreaker disk. Possible reasons are Explanation: GPFS control structures are not as
suggested in the message text. expected.
User response: Use the mmlsfs and mmlsdisk User response: Contact the IBM Support Center.
commands to determine what action is needed to
correct the problem.
6027-1985 mmfskxload: The format of the GPFS 6027-1996 Command was unable to determine
kernel extension is not correct for this whether file system fileSystem is
version of AIX. mounted.
Explanation: This version of AIX is incompatible with Explanation: The command was unable to determine
the current format of the GPFS kernel extension. whether the cited file system is mounted.
User response: Contact your system administrator to User response: Examine any prior error messages to
check the AIX version and GPFS kernel extension. determine why the command could not determine
whether the file system was mounted, resolve the
problem if possible, and then reissue the command. If
6027-1986 junctionName does not resolve to a
you cannot resolve the problem, reissue the command
directory in deviceName. The junction
with the daemon down on all nodes of the cluster. This
must be within the specified file
will ensure that the file system is not mounted, which
system.
may allow the command to proceed.
Explanation: The cited junction path name does not
belong to the specified file system.
6027-1997 Backup control file fileName from a
User response: Correct the junction path name and previous backup does not exist.
reissue the command.
Explanation: The mmbackup command was asked to
do an incremental or a resume backup, but the control
6027-1987 Name name is not allowed. file from a previous backup could not be found.
Explanation: The cited name is not allowed because it User response: Restore the named file to the file
is a reserved word or a prohibited character. system being backed up and reissue the command, or
else do a full backup.
User response: Specify a different name and reissue
the command.
6027-1998 Line lineNumber of file fileName is
incorrect:
6027-1988 File system fileSystem is not mounted.
Explanation: A line in the specified file passed to the
Explanation: The cited file system is not currently
command had incorrect syntax. The line with the
mounted on this node.
incorrect syntax is displayed next, followed by a
User response: Ensure that the file system is mounted description of the correct syntax for the line.
and reissue the command.
User response: Correct the syntax of the line and
reissue the command.
6027-1993 File fileName either does not exist or has
an incorrect format.
6027-1999 Syntax error. The correct syntax is:
Explanation: The specified file does not exist or has string.
an incorrect format.
Explanation: The specified input passed to the
User response: Check whether the input file specified command has incorrect syntax.
actually exists.
User response: Correct the syntax and reissue the
command.
6027-1994 Did not find any match with the input
disk address.
Explanation: The mmfileid command returned
without finding any disk addresses that match the
given input.
6027-2000 Could not clear fencing for disk 6027-2010 vgName is not a valid volume group
physicalDiskName. name.
Explanation: The fencing information on the disk Explanation: vgName passed to the command is not
could not be cleared. found in the ODM, implying that vgName does not
exist.
User response: Make sure the disk is accessible by this
node and retry. User response: Run the command on a valid volume
group name.
6027-2002 Disk physicalDiskName of type diskType is
not supported for fencing. 6027-2011 For the hdisk specification -h
physicalDiskName to be valid
Explanation: This disk is not a type that supports
physicalDiskName must be the only disk
fencing.
in the volume group. However, volume
User response: None. group vgName contains disks.
Explanation: The hdisk specified belongs to a volume
6027-2004 None of the specified nodes belong to group that contains other disks.
this GPFS cluster.
User response: Pass an hdisk that belongs to a volume
Explanation: The nodes specified do not belong to the group that contains only this disk.
GPFS cluster.
User response: Choose nodes that belong to the 6027-2012 physicalDiskName is not a valid physical
cluster and try the command again. volume name.
Explanation: The specified name is not a valid
6027-2007 Unable to display fencing for disk physical disk name.
physicalDiskName.
User response: Choose a correct physical disk name
Explanation: Cannot retrieve fencing information for and retry the command.
this disk.
User response: Make sure that this node has access to 6027-2013 pvid is not a valid physical volume id.
the disk before retrying.
Explanation: The specified value is not a valid
physical volume ID.
6027-2008 For the logical volume specification -l
User response: Choose a correct physical volume ID
lvName to be valid lvName must be the
and retry the command.
only logical volume in the volume
group. However, volume group vgName
contains logical volumes. 6027-2014 Node node does not have access to disk
physicalDiskName.
Explanation: The command is being run on a logical
volume that belongs to a volume group that has more Explanation: The specified node is not able to access
than one logical volume. the specified disk.
User response: Run this command only on a logical User response: Choose a different node or disk (or
volume where it is the only logical volume in the both), and retry the command. If both the node and
corresponding volume group. disk name are correct, make sure that the node has
access to the disk.
6027-2009 logicalVolume is not a valid logical
volume. 6027-2015 Node node does not hold a reservation
for disk physicalDiskName.
Explanation: logicalVolume does not exist in the ODM,
implying that logical name does not exist. Explanation: The node on which this command is run
does not have access to the disk.
User response: Run the command on a valid logical
volume. User response: Run this command from another node
that has access to the disk.
6027-2028 could not open disk device 6027-2101 Insufficient free space in fileSystem
diskDeviceName (storage minimum required).
Explanation: A problem occurred on a disk open. Explanation: There is not enough free space in the
specified file system or directory for the command to
User response: Ensure the disk is accessible and not
successfully complete.
fenced out, and then reissue the command.
User response: Correct the problem and reissue the
command.
6027-2029 could not close disk device
diskDeviceName
6027-2102 Node nodeName is not mmremotefs to
Explanation: A problem occurred on a disk close.
run the command.
User response: None.
Explanation: The specified node is not available to run
a command. Depending on the command, a different
6027-2030 ioctl failed with DSB=value and node may be tried.
result=value reason: explanation
User response: Determine why the specified node is
Explanation: An ioctl call failed with stated return not available and correct the problem.
code, errno value, and related values.
User response: Check the reported errno and correct 6027-2103 Directory dirName does not exist
the problem, if possible. Otherwise, contact the IBM
Explanation: The specified directory does not exist.
Support Center.
User response: Reissue the command specifying an
existing directory.
6027-2031 ioctl failed with non-zero return code
Explanation: An ioctl failed with a non-zero return
6027-2104 The GPFS release level could not be
code.
determined on nodes: nodeList.
User response: Correct the problem, if possible.
Explanation: The command was not able to determine
Otherwise, contact the IBM Support Center.
the level of the installed GPFS code on the specified
nodes.
| 6027-2049 [X] Cannot pin a page pool of size value
User response: Reissue the command after correcting
bytes.
the problem.
Explanation: A GPFS page pool cannot be pinned into
memory on this machine.
6027-2105 The following nodes must be upgraded
User response: Increase the physical memory size of to GPFS release productVersion or higher:
the machine. nodeList
Explanation: The command requires that all nodes be
| 6027-2050 [E] Pagepool has size actualValue bytes at the specified GPFS release level.
instead of the requested requestedValue
User response: Correct the problem and reissue the
bytes.
command.
Explanation: The configured GPFS page pool is too
large to be allocated or pinned into memory on this
6027-2106 Ensure the nodes are available and run:
machine. GPFS will work properly, but with reduced
command.
capacity for caching user data.
Explanation: The command could not complete
User response: To prevent this message from being
normally.
generated when the GPFS daemon starts, reduce the
page pool size using the mmchconfig command. User response: Check the preceding messages, correct
the problems, and issue the specified command until it
completes successfully.
6027-2100 Incorrect range value-value specified.
Explanation: The range specified to the command is
6027-2107 Upgrade the lower release level nodes
incorrect. The first parameter value must be less than
and run: command.
or equal to the second parameter value.
Explanation: The command could not complete
User response: Correct the address range and reissue
normally.
the command.
User response: Check the preceding messages, correct
6027-2121 [I] Recovery group name deleted on node 6027-2128 [E] The attribute attribute must be
nodeName. configured to use hostname as a recovery
group server.
Explanation: The recovery group has been deleted.
Explanation: The specified GPFS configuration
User response: This is an informational message.
attributes must be configured to use the node as a
recovery group server.
6027-2122 [E] The number of spares (numberOfSpares)
User response: Use the mmchconfig command to set
must be less than the number of pdisks
the attributes, then reissue the command.
(numberOfpdisks) being created.
Explanation: The number of spares specified must be
6027-2129 [E] Vdisk block size (blockSize) must match
less than the number of pdisks that are being created.
the file system block size (blockSize).
User response: Correct the input and reissue the
Explanation: The specified NSD is a vdisk with a
command.
block size that does not match the block size of the file
system.
6027-2123 [E] The GPFS daemon is down on the
User response: Reissue the command using block
vdiskName servers.
sizes that match.
Explanation: The GPFS daemon was down on the
vdisk servers when mmdelvdisk was issued.
6027-2130 [E] Could not find an active server for
User response: Start the GPFS daemon on the recovery group name.
specified nodes and issue the specified mmdelvdisk
Explanation: A command was issued that acts on a
command.
recovery group, but no active server was found for the
specified recovery group.
6027-2124 [E] Vdisk vdiskName is still NSD nsdName.
User response: Perform problem determination.
Use the mmdelnsd command.
Explanation: The specified vdisk is still an NSD.
6027-2131 [E] Cannot create an NSD on a log vdisk.
User response: Use the mmdelnsd command.
Explanation: The specified disk is a log vdisk; it
cannot be used for an NSD.
6027-2125 [E] nsdName is a vdisk-based NSD and
User response: Specify another disk that is not a log
cannot be used as a tiebreaker disk.
vdisk.
Explanation: Vdisk-based NSDs cannot be specified as
tiebreaker disks.
6027-2132 [E] Log vdisk vdiskName cannot be deleted
User response: Correct the input and reissue the while there are other vdisks in recovery
command. group name.
Explanation: The specified disk is a log vdisk; it must
6027-2126 [I] No recovery groups were found. be the last vdisk deleted from the recovery group.
Explanation: A command searched for recovery User response: Delete the other vdisks first.
groups but found none.
User response: None. Informational message only. 6027-2133 [E] Unable to delete recovery group name;
vdisks are still defined.
6027-2127 [E] Disk descriptor descriptor refers to an Explanation: Cannot delete a recovery group while
existing pdisk. there are still vdisks defined.
Explanation: The specified disk descriptor refers to an User response: Delete all the vdisks first.
existing pdisk.
User response: Specify another disk that is not an 6027-2134 Node nodeName cannot be used as an
existing pdisk. NSD server for Persistent Reserve disk
diskName because it is not a Linux node.
Explanation: There was an attempt to enable
Persistent Reserve for a disk, but not all of the NSD
server nodes are running Linux.
| 6027-2150 The archive system client backupProgram 6027-2156 The image archive index ImagePath could
could not be found or is not executable. not be found.
Explanation: TSM dsmc or other specified backup or Explanation: The archive image index could be found
archive system client could not be found. in the specified path
User response: Verify that TSM is installed, dsmc can User response: Check command arguments for correct
be found in the installation location or that the archiver specification of image path, then try the command
client specified is executable. again.
6027-2151 The path directoryPath is not contained 6027-2157 The image archive index ImagePath is
in the snapshot snapshotName. corrupt or incomplete.
Explanation: The directory path supplied is not Explanation: The archive image index specified is
contained in the snapshot named with the -S damaged.
parameter.
User response: Check the archive image index file for
User response: Correct the directory path or snapshot corruption and remedy.
name supplied, or omit -S and the snapshot name in
the command.
6027-2158 Disk usage must be dataOnly,
metadataOnly, descOnly,
6027-2152 The path directoryPath containing image | dataAndMetadata, vdiskLog,
archives was not found. | vdiskLogTip, vdiskLogTipBackup, or
| vdiskLogReserved.
Explanation: The directory path supplied does not
contain the expected image files to archive into TSM. Explanation: The disk usage positional parameter in a
vdisk descriptor has a value that is not valid. The bad
User response: Correct the directory path name
disk descriptor is displayed following this message.
supplied.
User response: Correct the input and reissue the
command.
6027-2153 The archiving system backupProgram
exited with status return code. Image
backup files have been preserved in 6027-2159 [E] parameter is not valid or missing in the
globalWorkDir vdisk descriptor.
Explanation: Archiving system executed and returned Explanation: The vdisk descriptor is not valid. The
a non-zero exit status due to some error. bad descriptor is displayed following this message.
User response: Examine archiver log files to discern User response: Correct the input and reissue the
the cause of the archiver's failure. Archive the command.
preserved image files from the indicated path.
6027-2160 [E] Vdisk vdiskName is already mapped to
6027-2154 Unable to create a policy file for image NSD nsdName.
backup in policyFilePath.
Explanation: The command cannot create the specified
Explanation: A temporary file could not be created in NSD because the underlying vdisk is already mapped
the global shared directory path. to a different NSD.
User response: Check or correct the directory path User response: Correct the input and reissue the
name supplied. command.
6027-2155 File system fileSystem must be mounted 6027-2161 [E] NAS servers cannot be specified when
read only for restore. creating an NSD on a vdisk.
Explanation: The empty file system targeted for Explanation: The command cannot create the specified
restoration must be mounted in read only mode during NSD because servers were specified and the underlying
restoration. disk is a vdisk.
User response: Unmount the file system on all nodes User response: Correct the input and reissue the
and remount it read only, then try the command again. command.
6027-2174 Option option can be specified only in 6027-2183 [E] Peer snapshots using mmpsnap are
conjunction with option. allowed only for single-writer filesets.
Explanation: The cited option cannot be specified by Explanation: This operation is allowed only for
itself. single-writer filesets.
User response: Correct the input and reissue the User response: Check the previous error messages
command. and correct the problems.
6027-2175 [E] Exported path exportPath does not exist 6027-2184 [E] If the recovery group is damaged, issue
mmdelrecoverygroup name -p.
Explanation: The directory or one of the components
in the directory path to be exported does not exist. Explanation: No active servers were found for the
recovery group that is being deleted. If the recovery
User response: Correct the input and reissue the
group is damaged the -p option is needed.
command.
User response: Perform diagnosis and reissue the
command.
6027-2176 [E] mmchattr for fileName failed.
Explanation: The command to change the attributes of
6027-2185 [E] There are no pdisk stanzas in the input
the file failed.
file fileName.
User response: Check the previous error messages
Explanation: The mmcrrecoverygroup input stanza
and correct the problems.
file has no pdisk stanzas.
User response: Correct the input file and reissue the
6027-2177 [E] Cannot create file fileName.
command.
Explanation: The command to create the specified file
failed.
6027-2186 [E] There were no valid vdisk stanzas in the
User response: Check the previous error messages input file fileName.
and correct the problems.
Explanation: The mmcrvdisk input stanza file has no
valid vdisk stanzas.
6027-2178 File fileName does not contain any NSD
User response: Correct the input file and reissue the
descriptors or stanzas.
command.
Explanation: The input file should contain at least one
NSD descriptor or stanza.
6027-2187 [E] Could not get pdisk information for the
User response: Correct the input file and reissue the following recovery groups:
command. recoveryGroupList
Explanation: An mmlspdisk all command could not
6027-2181 [E] Failover is allowed only for query all of the recovery groups because some nodes
single-writer and independent-writer could not be reached.
filesets.
User response: None.
Explanation: This operation is allowed only for
single-writer filesets.
6027-2188 Unable to determine the local node
User response: Check the previous error messages identity.
and correct the problems.
Explanation: The command is not able to determine
the identity of the local node. This can be the result of
6027-2182 [E] Resync is allowed only for single-writer a disruption in the network over which the GPFS
filesets. daemons communicate.
Explanation: This operation is allowed only for User response: Ensure the GPFS daemon network (as
single-writer filesets. identified in the output of the mmlscluster command)
is fully operational and reissue the command.
User response: Check the previous error messages
and correct the problems.
6027-2189 [E] Action action is allowed only for 6027-2197 [E] Empty file encountered when running
read-only filesets. the mmafmctl flushPending command.
Explanation: The specified action is only allowed for Explanation: The mmafmctl flushPending command
read-only filesets. did not find any entries in the file specified with the
--list-file option.
User response: None.
User response: Correct the input file and reissue the
command.
6027-2190 [E] Cannot prefetch file fileName. The file
does not belong to fileset fileset.
6027-2198 [E] Cannot run the mmafmctl flushPending
Explanation: The requested file does not belong to the
command on directory dirName.
fileset.
Explanation: The mmafmctl flushPending command
User response: None.
cannot be issued on this directory.
User response: Correct the input and reissue the
6027-2191 [E] Vdisk vdiskName not found in recovery
command.
group recoveryGroupName.
Explanation: The mmdelvdisk command was invoked
6027-2199 [E] No enclosures were found.
with the --recovery-group option to delete one or more
vdisks from a specific recovery group. The specified Explanation: A command searched for disk enclosures
vdisk does not exist in this recovery group. but none were found.
User response: Correct the input and reissue the User response: None.
command.
6027-2200 [E] Cannot have multiple nodes updating
6027-2193 [E] Recovery group recoveryGroupName must firmware for the same enclosure.
be active on the primary server Enclosure serialNumber is already being
serverName. updated by node nodeName.
Explanation: The recovery group must be active on Explanation: The mmchenclosure command was
the specified node. called with multiple nodes updating the same
firmware.
User response: Use the mmchrecoverygroup
command to activate the group and reissue the User response: Correct the node list and reissue the
command. command.
6027-2194 [E] The state of fileset filesetName is Expired; 6027-2201 [E] The mmafmctl flushPending command
prefetch cannot be performed. completed with errors.
Explanation: The prefetch operation cannot be Explanation: An error occurred while flushing the
performed on filesets that are in the Expired state. queue.
User response: None. User response: Examine the GPFS log to identify the
cause.
6027-2195 [E] Error getting snapshot ID for
snapshotName. 6027-2202 [E] There is a SCSI-3 PR reservation on
disk diskname. mmcrnsd cannot format
Explanation: The command was unable to obtain the
the disk because the cluster is not
resync snapshot ID.
configured as PR enabled.
User response: Examine the preceding messages,
Explanation: The specified disk has a SCSI-3 PR
correct the problem, and reissue the command. If the
reservation, which prevents the mmcrnsd command
problem persists, perform problem determination and
from formatting it.
contact the IBM Support Center.
User response: Clear the PR reservation by following
the instructions in “Clearing a leftover Persistent
6027-2196 [E] Resync is allowed only when the fileset
Reserve reservation” on page 103.
queue is in active state.
Explanation: This operation is allowed only when the
fileset queue is in active state.
User response: None.
6027-2203 Node nodeName is not a gateway node.
Explanation: The specified node is not a gateway node.
User response: Designate the node as a gateway node or specify a different node on the command line.
| 6027-2204 AFM target map mapName is already defined.
| Explanation: A request was made to create an AFM target map with the cited name, but that map name is already defined.
| User response: Specify a different name for the new AFM target map or first delete the current map definition and then recreate it.
| 6027-2205 There are no AFM target map definitions.
| Explanation: A command searched for AFM target map definitions but found none.
| User response: None. Informational message only.
| 6027-2206 AFM target map mapName is not defined.
| Explanation: The cited AFM target map name is not known to GPFS.
| User response: Specify an AFM target map known to GPFS.
6027-2207 Node nodeName is being used as a gateway node for the AFM cluster clusterName.
Explanation: The specified node is defined as a gateway node for the specified AFM cluster.
User response: If you are trying to delete the node from the GPFS cluster or delete the gateway node role, you must remove it from the export server map.
6027-2208 [E] commandName is already running in the cluster.
Explanation: Only one instance of the specified command is allowed to run.
User response: None.
6027-2209 [E] Unable to list objectName on node nodeName.
Explanation: A command was unable to list the specific object that was requested.
User response: None.
6027-2210 [E] Unable to build a storage enclosure inventory file on node nodeName.
Explanation: A command was unable to build a storage enclosure inventory file. This is a temporary file that is required to complete the requested command.
User response: None.
6027-2211 [E] Error collecting firmware information on node nodeName.
Explanation: A command was unable to gather firmware information from the specified node.
User response: Ensure the node is active and retry the command.
6027-2212 [E] Firmware update file updateFile was not found.
Explanation: The mmchfirmware command could not find the specified firmware update file to load.
User response: Locate the firmware update file and retry the command.
6027-2213 [E] Pdisk path redundancy was lost while updating enclosure firmware.
Explanation: The mmchfirmware command lost paths after loading firmware and rebooting the Enclosure Services Module.
User response: Wait a few minutes and then retry the command. GPFS might need to be shut down to finish updating the enclosure firmware.
6027-2214 [E] Timeout waiting for firmware to load.
Explanation: A storage enclosure firmware update was in progress, but the update did not complete within the expected time frame.
User response: Wait a few minutes, and then use the mmlsfirmware command to ensure the operation completed.
6027-2215 [E] Storage enclosure serialNumber not found.
Explanation: The specified storage enclosure was not found.
User response: None.
6027-2216 Quota management is disabled for file system fileSystem.
Explanation: Quota management is disabled for the specified file system.
User response: Enable quota management for the file system.
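Message 6027-2216 usually means quota enforcement was never turned on for the file system. A minimal sketch of the usual remediation follows; the file system name gpfs0 is hypothetical, and the -Q option of mmchfs and mmlsfs is the standard quota enforcement switch on this release.

# Minimal sketch for message 6027-2216: enable quota management, then verify it.
# The file system name "gpfs0" is hypothetical.
import subprocess

fs = "gpfs0"
subprocess.run(["mmchfs", fs, "-Q", "yes"], check=True)   # enable quota enforcement
subprocess.run(["mmlsfs", fs, "-Q"], check=True)          # confirm the new setting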
6027-2217 [E] Error errno updating firmware for drives driveList.
Explanation: The firmware load failed for the specified drives. Some of the drives may have been updated.
User response: None.
6027-2218 [E] Storage enclosure serialNumber component componentType component ID componentId not found.
Explanation: The mmchenclosure command could not find the component specified for replacement.
User response: Use the mmlsenclosure command to determine valid input and then retry the command.
6027-2219 [E] Storage enclosure serialNumber component componentType component ID componentId did not fail. Service is not required.
Explanation: The component specified for the mmchenclosure command does not need service.
User response: Use the mmlsenclosure command to determine valid input and then retry the command.
6027-2220 [E] Recovery group name has pdisks with missing paths. Consider using the -v no option of the mmchrecoverygroup command.
Explanation: The mmchrecoverygroup command failed because all the servers could not see all the disks, and the primary server is missing paths to disks.
User response: If the disks are cabled correctly, use the -v no option of the mmchrecoverygroup command.
6027-2221 [E] Error determining redundancy of enclosure serialNumber ESM esmName.
Explanation: The mmchrecoverygroup command failed. Check the following error messages.
User response: Correct the problem and retry the command.
6027-2222 [E] Storage enclosure serialNumber already has a newer firmware version: firmwareLevel.
Explanation: The mmchfirmware command found a newer level of firmware on the specified storage enclosure.
User response: If the intent is to force on the older firmware version, use the -v no option.
6027-2223 [E] Storage enclosure serialNumber is not redundant. Shutdown GPFS in the cluster and retry the mmchfirmware command.
Explanation: The mmchfirmware command found a non-redundant storage enclosure. Proceeding could cause loss of data access.
User response: Shut down GPFS in the cluster and retry the mmchfirmware command.
6027-2224 [E] Peer snapshot creation failed. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.
6027-2225 [E] Peer snapshot successfully deleted at cache. The delete snapshot operation failed at home. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.
6027-2226 [E] Invalid firmware update file.
Explanation: An invalid firmware update file was specified for the mmchfirmware command.
User response: Reissue the command with a valid update file.
6027-2227 [E] Failback is allowed only for independent-writer filesets.
Explanation: Failback operation is allowed only for independent-writer filesets.
User response: Check the fileset mode.
6027-2228 [E] The daemon version (daemonVersion) on node nodeName is lower than the daemon version (daemonVersion) on node nodeName.
Explanation: A command was issued that requires nodes to be at specific levels, but the affected GPFS servers are not at compatible levels to support this operation.
User response: Update the GPFS code on the specified servers and retry the command.
| 6027-2229 [E] Cache Eviction/Prefetch is not allowed for DR filesets.
| Explanation: Cache Eviction/Prefetch is not allowed for DR filesets.
| User response: None.
| 6027-2230 [E] afmTarget=newTargetString is not allowed. To change the AFM target, use mmafmctl failover with the --target-only option.
| Explanation: The mmchfileset command cannot be used to change the NFS server or IP address of the home cluster.
| User response: To change the AFM target, use the mmafmctl failover command and specify the --target-only option.
| 6027-2231 [E] The specified block size blockSize is smaller than the system page size pageSize.
| Explanation: The file system block size cannot be smaller than the system memory page size.
| User response: Specify a block size greater than or equal to the system memory page size.
| 6027-2232 [E] Peer snapshots are allowed only for targets using the NFS protocol.
| Explanation: The mmpsnap command can be used to create snapshots only for filesets that are configured to use the NFS protocol.
| User response: Specify a valid fileset target.
| 6027-2233 [E] Fileset filesetName in file system filesystemName does not contain peer snapshot snapshotName. The delete snapshot operation failed at cache. Error code errorCode.
| Explanation: The specified snapshot name was not found. The command expects the name of an existing peer snapshot of the active fileset in the specified file system.
| User response: Reissue the command with a valid peer snapshot name.
User response: Wait for the currently running command to complete and reissue the command.
6027-2501 Could not allocate storage.
Explanation: Sufficient memory could not be allocated to run the mmsanrepairfs command.
User response: Increase the amount of memory available.
| 6027-2576 [E] Error: Daemon value kernel value PAGE_SIZE mismatch.
Explanation: The GPFS kernel extension loaded in memory does not have the same PAGE_SIZE value as the GPFS daemon PAGE_SIZE value that was returned from the POSIX sysconf API.
User response: Verify that the kernel header files used to build the GPFS portability layer are the same kernel header files used to build the running kernel.
6027-2600 Cannot create a new snapshot until an existing one is deleted. File system fileSystem has a limit of number online snapshots.
Explanation: The file system has reached its limit of online snapshots.
User response: Delete an existing snapshot, then issue the create snapshot command again.
6027-2601 Snapshot name dirName already exists.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete the existing file or directory and reissue the command.
6027-2602 Unable to delete snapshot snapshotName from file system fileSystem. rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete the snapshot using the tsdelsnapshot command.
6027-2603 Unable to get permission to create snapshot, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Reissue the command.
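Messages 6027-2231 and 6027-2576 both depend on the system memory page size returned by the POSIX sysconf API. The short sketch below shows how to read that value and compare a proposed file system block size against it; the 1 MiB block size is only an example.

# Relates to messages 6027-2231 and 6027-2576: the file system block size must be
# at least the system memory page size reported by sysconf.
import os

page_size = os.sysconf("SC_PAGE_SIZE")     # system memory page size, in bytes
proposed_block_size = 1024 * 1024          # example: 1 MiB

print("System page size:", page_size, "bytes")
if proposed_block_size < page_size:
    print("Block size too small; specify at least", page_size, "bytes (see 6027-2231).")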
6027-2604 Unable to quiesce all nodes, 6027-2611 Cannot delete snapshot snapshotName
rc=returnCode. which is in state snapshotState.
Explanation: This message is issued by the Explanation: The snapshot cannot be deleted while it
tscrsnapshot command. is in the cited transition state because of an in-progress
snapshot operation.
User response: Restart failing nodes or switches and
reissue the command. User response: Wait for the in-progress operation to
complete and then reissue the command.
6027-2605 Unable to resume all nodes,
rc=returnCode. 6027-2612 Snapshot named snapshotName does not
exist.
Explanation: This message is issued by the
tscrsnapshot command. Explanation: A snapshot to be listed does not exist.
User response: Restart failing nodes or switches. User response: Specify only existing snapshot names.
6027-2606 Unable to sync all nodes, rc=returnCode. 6027-2613 Cannot restore snapshot. fileSystem is
mounted on number node(s) and in use
Explanation: This message is issued by the
on number node(s).
tscrsnapshot command.
Explanation: This message is issued by the
User response: Restart failing nodes or switches and
tsressnapshot command.
reissue the command.
User response: Unmount the file system and reissue
the restore command.
6027-2607 Cannot create new snapshot until an
existing one is deleted. Fileset
filesetName has a limit of number 6027-2614 File system fileSystem does not contain
snapshots. snapshot snapshotName err = number.
Explanation: The fileset has reached its limit of Explanation: An incorrect snapshot name was
snapshots. specified.
User response: Delete an existing snapshot, then issue User response: Specify a valid snapshot and issue the
the create snapshot command again. command again.
6027-2608 Cannot create new snapshot: state of 6027-2615 Cannot restore snapshot snapshotName
fileset filesetName is inconsistent which is snapshotState, err = number.
(badState).
Explanation: The specified snapshot is not in a valid
Explanation: An operation on the cited fileset is state.
incomplete.
User response: Specify a snapshot that is in a valid
User response: Complete pending fileset actions, then state and issue the command again.
issue the create snapshot command again.
6027-2616 Restoring snapshot snapshotName
6027-2609 Fileset named filesetName does not exist. requires quotaTypes quotas to be enabled.
Explanation: One of the filesets listed does not exist. Explanation: The snapshot being restored requires
quotas to be enabled, since they were enabled when the
User response: Specify only existing fileset names.
snapshot was created.
User response: Issue the recommended mmchfs
6027-2610 File system fileSystem does not contain
command to enable quotas.
snapshot snapshotName err = number.
Explanation: An incorrect snapshot name was
6027-2617 You must run: mmchfs fileSystem -Q yes.
specified.
Explanation: The snapshot being restored requires
User response: Select a valid snapshot and issue the
quotas to be enabled, since they were enabled when the
command again.
snapshot was created.
User response: Issue the cited mmchfs command to
enable quotas.
| 6027-2618 [N] Restoring snapshot snapshotName in file 6027-2624 Previous snapshot snapshotName is not
system fileSystem requires quotaTypes valid and must be deleted before a new
quotas to be enabled. snapshot may be created.
Explanation: The snapshot being restored in the cited Explanation: The cited previous snapshot is not valid
file system requires quotas to be enabled, since they and must be deleted before a new snapshot may be
were enabled when the snapshot was created. created.
User response: Issue the mmchfs command to enable User response: Delete the previous snapshot using the
quotas. mmdelsnapshot command, and then reissue the
original snapshot command.
6027-2619 Restoring snapshot snapshotName
requires quotaTypes quotas to be 6027-2625 Previous snapshot snapshotName must be
disabled. restored before a new snapshot may be
created.
Explanation: The snapshot being restored requires
quotas to be disabled, since they were not enabled Explanation: The cited previous snapshot must be
when the snapshot was created. restored before a new snapshot may be created.
User response: Issue the cited mmchfs command to User response: Run mmrestorefs on the previous
disable quotas. snapshot, and then reissue the original snapshot
command.
6027-2620 You must run: mmchfs fileSystem -Q no.
6027-2626 Previous snapshot snapshotName is not
Explanation: The snapshot being restored requires
valid and must be deleted before
quotas to be disabled, since they were not enabled
another snapshot may be deleted.
when the snapshot was created.
Explanation: The cited previous snapshot is not valid
User response: Issue the cited mmchfs command to
and must be deleted before another snapshot may be
disable quotas.
deleted.
User response: Delete the previous snapshot using the
| 6027-2621 [N] Restoring snapshot snapshotName in file
mmdelsnapshot command, and then reissue the
system fileSystem requires quotaTypes
original snapshot command.
quotas to be disabled.
Explanation: The snapshot being restored in the cited
6027-2627 Previous snapshot snapshotName is not
file system requires quotas to be disabled, since they
valid and must be deleted before
were disabled when the snapshot was created.
another snapshot may be restored.
User response: Issue the mmchfs command to disable
Explanation: The cited previous snapshot is not valid
quotas.
and must be deleted before another snapshot may be
restored.
6027-2622 Error restoring inode inode, err number
User response: Delete the previous snapshot using the
Explanation: The online snapshot was corrupted. mmdelsnapshot command, and then reissue the
original snapshot command.
User response: Restore the file from an offline
snapshot.
6027-2628 More than one snapshot is marked for
restore.
| 6027-2623 [E] Error deleting snapshot snapshotName in
file system fileSystem err number Explanation: More than one snapshot is marked for
restore.
Explanation: The cited snapshot could not be deleted
during file system recovery. User response: Restore the previous snapshot and
then reissue the original snapshot command.
User response: Run the mmfsck command to recover
any lost data blocks.
6027-2629 Offline snapshot being restored.
Explanation: An offline snapshot is being restored.
User response: When the restore of the offline
snapshot completes, reissue the original snapshot
command.
6027-2633 Attention: Disk configuration for 6027-2640 Incorrect path to fileset junction
fileSystem has changed while tsdf was filesetJunction.
running.
Explanation: The path to the cited fileset junction is
Explanation: The disk configuration for the cited file incorrect.
system changed while the tsdf command was running.
User response: Correct the junction path and reissue
User response: Reissue the mmdf command. the command.
6027-2634 Attention: number of number regions in 6027-2641 Incorrect fileset junction name
fileSystem were unavailable for free filesetJunction.
space.
Explanation: The cited junction name is incorrect.
Explanation: Some regions could not be accessed
during the tsdf run. Typically, this is due to utilities User response: Correct the junction name and reissue
such as mmdefragfs or mmfsck running concurrently. the command.
6027-2645 Fileset filesetName already exists. Explanation: The user tried to unlink the root fileset,
or is not authorized to unlink the selected fileset.
Explanation: An attempt to create a fileset failed
because the specified fileset name already exists. User response: None. The fileset cannot be unlinked.
6027-2646 Unable to sync all nodes while Explanation: An attempt was made to unlink a fileset
quiesced, rc=returnCode that is linked to a parent fileset that is being deleted.
Explanation: This message is issued by the User response: Delete or unlink the children, and then
tscrsnapshot command. delete the parent fileset.
6027-2647 Fileset filesetName must be unlinked to Explanation: The fileset to be deleted has other filesets
be deleted. linked to it, and cannot be deleted without using the -f
flag, or unlinking the child filesets.
Explanation: The cited fileset must be unlinked before
it can be deleted. User response: Delete or unlink the children, and then
delete the parent fileset.
User response: Unlink the fileset, and then reissue the
delete command.
6027-2655 Fileset filesetName cannot be deleted.
6027-2648 Filesets have not been enabled for file Explanation: The user is not allowed to delete the root
system fileSystem. fileset.
Explanation: The current file system format version User response: None. The fileset cannot be deleted.
does not support filesets.
User response: Change the file system format version 6027-2656 Unable to quiesce fileset at all nodes.
by issuing mmchfs -V. Explanation: An attempt to quiesce the fileset at all
nodes failed.
6027-2649 Fileset filesetName contains user files and User response: Check communication hardware and
cannot be deleted unless the -f option is reissue the command.
specified.
Explanation: An attempt was made to delete a
non-empty fileset.
User response: Remove all files and directories from
6027-2657 Fileset filesetName has open files. Specify 6027-2664 Fileset at pathName cannot be changed.
-f to force unlink.
Explanation: The user specified a fileset to tschfileset
Explanation: An attempt was made to unlink a fileset that cannot be changed.
that has open files.
User response: None. You cannot change the
User response: Close the open files and then reissue attributes of the root fileset.
the command, or use the -f option on the unlink command
to force the open files to close.
6027-2665 mmfileid already in progress for name.
Explanation: An mmfileid command is already
6027-2658 Fileset filesetName cannot be linked into
running.
a snapshot at pathName.
User response: Wait for the currently running
Explanation: The user specified a directory within a
command to complete, and issue the new command
snapshot for the junction to a fileset, but snapshots
again.
cannot be modified.
User response: Select a directory within the active file
6027-2666 mmfileid can only handle a maximum
system, and reissue the command.
of diskAddresses disk addresses.
Explanation: Too many disk addresses specified.
6027-2659 Fileset filesetName is already linked.
User response: Provide fewer than 256 disk addresses to
Explanation: The user specified a fileset that was
the command.
already linked.
User response: Unlink the fileset and then reissue the
| 6027-2667 [I] Allowing block allocation for file
link command.
system fileSystem that makes a file
ill-replicated due to insufficient resource
6027-2660 Fileset filesetName cannot be linked. and puts data at risk.
Explanation: The fileset could not be linked. This Explanation: The partialReplicaAllocation file system
typically happens when the fileset is in the process of option allows allocation to succeed even when all
being deleted. replica blocks cannot be allocated. The file was marked
as not replicated correctly and the data may be at risk
User response: None.
if one of the remaining disks fails.
User response: None. Informational message only.
6027-2661 Fileset junction pathName already exists.
Explanation: A file or directory already exists at the
6027-2670 Fileset name filesetName not found.
specified junction.
Explanation: The fileset name that was specified with
User response: Select a new junction name or a new
the command invocation was not found.
directory for the link and reissue the link command.
User response: Correct the fileset name and reissue
the command.
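When message 6027-2670 is issued, the quickest check is to list the filesets that actually exist in the file system and compare them against the name that was typed. A trivial sketch, with a hypothetical file system name:

# Sketch for message 6027-2670: list the existing fileset names before retrying.
import subprocess

fs = "gpfs0"  # hypothetical file system name
subprocess.run(["mmlsfileset", fs], check=True)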
6027-2662 Directory pathName for junction has too
many links.
6027-2671 Fileset command on fileSystem failed;
Explanation: The directory specified for the junction
snapshot snapshotName must be restored
has too many links.
first.
User response: Select a new directory for the link and
Explanation: The file system is being restored either
reissue the command.
from an offline backup or a snapshot, and the restore
operation has not finished. Fileset commands cannot be
6027-2663 Fileset filesetName cannot be changed. run.
Explanation: The user specified a fileset to tschfileset User response: Run the mmrestorefs command to
that cannot be changed. complete the snapshot restore operation or to finish the
offline restore, then reissue the fileset command.
User response: None. You cannot change the
attributes of the root fileset.
6027-2672 Junction parent directory inode number 6027-2677 Fileset filesetName has pending changes
inodeNumber is not valid. that need to be synced.
Explanation: An inode number passed to tslinkfileset Explanation: A user is trying to change a caching
is not valid. option for a fileset while it has local changes that are
not yet synced with the home server.
User response: Check the mmlinkfileset command
arguments for correctness. If a valid junction path was User response: Perform AFM recovery before
provided, contact the IBM Support Center. reissuing the command.
| 6027-2673 [X] Duplicate owners of an allocation region 6027-2678 Filesystem fileSystem is mounted on
(index indexNumber, region regionNumber, nodes nodes or fileset filesetName is not
pool poolNumber) were detected for file unlinked.
system fileSystem: nodes nodeName and
Explanation: A user is trying to change a caching
nodeName.
feature for a fileset while the filesystem is still mounted
Explanation: The allocation region should not have or the fileset is still linked.
duplicate owners.
User response: Unmount the filesystem from all nodes
User response: Contact the IBM Support Center. or unlink the fileset before reissuing the command.
| 6027-2674 [X] The owner of an allocation region 6027-2679 Mount of fileSystem failed because
(index indexNumber, region regionNumber, mount event not handled by any data
pool poolNumber) that was detected for management application.
file system fileSystem: node nodeName is
Explanation: The mount failed because the file system
not valid.
is enabled for DMAPI events (-z yes), but there was no
Explanation: The file system had detected a problem data management application running to handle the
with the ownership of an allocation region. This may event.
result in a corrupted file system and loss of data. One
User response: Make sure the DM application (for
or more nodes may be terminated to prevent any
example HSM or HPSS) is running before the file
further damage to the file system.
system is mounted.
User response: Unmount the file system and run the
mmfsck command to repair the file system.
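Message 6027-2679 indicates the file system has DMAPI events enabled but no data management application registered to handle them. The sketch below is a hedged pre-mount check; it assumes mmlsfs -z reports the DMAPI setting and uses dsmrecalld only as an example daemon name for the DM application.

# Sketch for message 6027-2679: before mounting a DMAPI-enabled file system,
# verify that a data management application (for example HSM) is running.
# "gpfs0" and "dsmrecalld" are placeholders.
import subprocess

fs = "gpfs0"
dmapi = subprocess.run(["mmlsfs", fs, "-z"], capture_output=True, text=True).stdout
dm_running = subprocess.run(["pgrep", "-f", "dsmrecalld"],
                            capture_output=True).returncode == 0
if "yes" in dmapi.lower() and not dm_running:
    print("DMAPI is enabled for", fs, "but no DM application is running;",
          "the mount would fail with 6027-2679.")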
6027-2680 AFM filesets cannot be created for file
system fileSystem.
6027-2675 Only file systems with NFSv4 ACL
Explanation: The current file system format version
semantics enabled can be mounted on
does not support AFM-enabled filesets; the -p option
this platform.
cannot be used.
Explanation: A user is trying to mount a file system
User response: Change the file system format version
on Microsoft Windows, but the ACL semantics disallow
by issuing mmchfs -V.
NFSv4 ACLs.
User response: Enable NFSv4 ACL semantics using
6027-2681 Snapshot snapshotName has linked
the mmchfs command (-k option)
independent filesets
Explanation: The specified snapshot is not in a valid
6027-2676 Only file systems with NFSv4 locking
state.
semantics enabled can be mounted on
this platform. User response: Correct the problem and reissue the
command.
Explanation: A user is trying to mount a file system
on Microsoft Windows, but the POSIX locking
semantics are in effect. | 6027-2682 [E] Set quota file attribute error
(reasonCode)explanation
User response: Enable NFSv4 locking semantics using
the mmchfs command (-D option). Explanation: While mounting a file system a new
quota file failed to be created due to inconsistency with
the current degree of replication or the number of
failure groups.
User response: Disable quotas. Check and correct the
| 6027-2696 [E] The number of inodes to preallocate 6027-2702 Unexpected mmpmon response from file
cannot be lower than the number of inodes system daemon.
already allocated.
Explanation: An unexpected response was received to
Explanation: The specified number of inodes to an mmpmon request.
preallocate is not valid.
User response: Ensure that the mmfsd daemon is
User response: Correct the --inode-limit argument running. Check the error log. Ensure that all GPFS
then retry the command. software components are at the same version.
6027-2697 Fileset at junctionPath has pending 6027-2703 Unknown mmpmon command command.
changes that need to be synced.
Explanation: An unknown mmpmon command was
Explanation: A user is trying to change a caching read from the input file.
option for a fileset while it has local changes that are
User response: Correct the command and rerun.
not yet synced with the home server.
User response: Perform AFM recovery before
6027-2704 Permission failure. The command
reissuing the command.
requires root authority to execute.
Explanation: The mmpmon command was issued
6027-2698 File system fileSystem is mounted on
with a nonzero UID.
nodes nodes or fileset at junctionPath is
not unlinked. User response: Log on as root and reissue the
command.
Explanation: A user is trying to change a caching
feature for a fileset while the filesystem is still mounted
or the fileset is still linked. 6027-2705 Could not establish connection to file
system daemon.
User response: Unmount the filesystem from all nodes
or unlink the fileset before reissuing the command. Explanation: The connection between a GPFS
command and the mmfsd daemon could not be
established. The daemon may have crashed, or never
6027-2699 Cannot create a new independent fileset
been started, or (for mmpmon) the allowed number of
until an existing one is deleted. File
simultaneous connections has been exceeded.
system fileSystem has a limit of
maxNumber independent filesets. User response: Ensure that the mmfsd daemon is
running. Check the error log. For mmpmon, ensure
Explanation: An attempt to create an independent
that the allowed number of simultaneous connections
fileset for the cited file system failed because it would
has not been exceeded.
exceed the cited limit.
User response: Remove unneeded independent filesets
and reissue the command.
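Messages 6027-2704 and 6027-2705 cover the two most common reasons an mmpmon request fails: the command was not issued as root, or the mmfsd daemon is not reachable. The sketch below performs those prechecks before driving mmpmon from an input file; it assumes mmpmon -i takes a command file and that mmgetstate reports the local daemon state, and the input file path is hypothetical.

# Prechecks for messages 6027-2704 (root required) and 6027-2705 (daemon unreachable).
import os
import subprocess

if os.geteuid() != 0:
    raise SystemExit("mmpmon requires root authority (6027-2704).")

state = subprocess.run(["mmgetstate"], capture_output=True, text=True).stdout
if "active" not in state:
    raise SystemExit("mmfsd does not appear to be active (6027-2705).")

# Hypothetical input file with one mmpmon request per line, for example "io_s".
subprocess.run(["mmpmon", "-i", "/tmp/mmpmon.in"], check=True)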
| 6027-2706 [I] Recovered number nodes.
Explanation: The asynchronous part (phase 2) of node
failure recovery has completed.
| 6027-2700 [E] A node join was rejected. This could be
due to incompatible daemon versions, User response: None. Informational message only.
failure to find the node in the
configuration database, or no
configuration manager found. | 6027-2707 [I] Node join protocol waiting value
seconds for node recovery
Explanation: A request to join nodes was explicitly
rejected. Explanation: Node join protocol is delayed until phase
2 of previous node failure recovery protocol is
User response: Verify that compatible versions of complete.
GPFS are installed on all nodes. Also, verify that the
joining node is in the configuration database. User response: None. Informational message only.
6027-2701 The mmpmon command file is empty. | 6027-2708 [E] Rejected node join protocol. Phase two
of node failure recovery appears to still
Explanation: The mmpmon command file is empty. be in progress.
User response: Check file size, existence, and access Explanation: Node join protocol is rejected after a
permissions. number of internal delays and phase two node failure
protocol is still in progress.
Explanation: The mmfsd daemon cannot add any Explanation: The mmchmgr -c command generates
more file systems to the table because it is full. this message if the specified node is already the cluster
manager.
User response: None. Informational message only.
User response: None. Informational message only.
6027-2714 Could not appoint node nodeName as | 6027-2722 [E] Node limit of number has been reached.
cluster manager. Ignoring nodeName.
Explanation: The mmchmgr -c command generates Explanation: The number of nodes that have been
this message if the specified node cannot be appointed added to the cluster is greater than some cluster
as a new cluster manager. members can handle.
User response: Make sure that the specified node is a User response: Delete some nodes from the cluster
quorum node and that GPFS is running on that node. using the mmdelnode command, or shut down GPFS
on nodes that are running older versions of the code
with lower limits.
| 6027-2723 [N] This node (nodeName) is now Cluster 6027-2729 Value value for option optionName is out
Manager for clusterName. of range. Valid values are value through
value.
Explanation: This is an informational message when a
new cluster manager takes over. Explanation: An out of range value was specified for
the specified option.
User response: None. Informational message only.
User response: Correct the command line.
| 6027-2724 [I] reasonString. Probing cluster clusterName
| 6027-2730 [E] Node nodeName failed to take over as
Explanation: This is an informational message when a
cluster manager.
lease request has not been renewed.
Explanation: An attempt to takeover as cluster
User response: None. Informational message only.
manager failed.
User response: Make sure that GPFS is running on a
| 6027-2725 [N] Node nodeName lease renewal is
sufficient number of quorum nodes.
overdue. Pinging to check if it is alive
Explanation: This is an informational message on the
6027-2731 Failed to locate a working cluster
cluster manager when a lease request has not been
manager.
renewed.
Explanation: The cluster manager has failed or
User response: None. Informational message only.
changed. The new cluster manager has not been
appointed.
| 6027-2726 [I] Recovered number nodes for file system
User response: Check the internode communication
fileSystem.
configuration and ensure enough GPFS nodes are up to
Explanation: The asynchronous part (phase 2) of node make a quorum.
failure recovery has completed.
User response: None. Informational message only. 6027-2732 Attention: No data disks remain in the
system pool. Use mmapplypolicy to
migrate all data left in the system pool
6027-2727 fileSystem: quota manager is not to other storage pool.
available.
Explanation: The mmchdisk command has been issued
Explanation: An attempt was made to perform a but no data disks remain in the system pool. Warn user
quota command without a quota manager running. to use mmapplypolicy to move data to other storage
This could be caused by a conflicting offline mmfsck pool.
command.
User response: None. Informational message only.
User response: Reissue the command once the
conflicting program has ended.
6027-2733 The file system name (fsname) is longer
than the maximum allowable length
| 6027-2728 [N] Connection from node rejected because (maxLength).
| it does not support IPv6
Explanation: The file system name is invalid because
| Explanation: A connection request was received from it is longer than the maximum allowed length of 255
| a node that does not support Internet Protocol Version characters.
| 6 (IPv6), and at least one node in the cluster is
| configured with an IPv6 address (not an IPv4-mapped User response: Specify a file system name whose
| one) as its primary address. Since the connecting node length is 255 characters or less and reissue the
| will not be able to communicate with the IPv6 node, it command.
| is not permitted to join the cluster.
| User response: Upgrade the connecting node to a | 6027-2734 [E] Disk failure from node nodeName
| version of GPFS that supports IPv6, or delete all nodes Volume name. Physical volume name.
| with IPv6-only addresses from the cluster.
Explanation: An I/O request to a disk or a request to
fence a disk has failed in such a manner that GPFS can
no longer use the disk.
User response: Check the disk hardware and the
software subsystems in the path to the disk.
| 6027-2735 [E] Not a manager | 6027-2742 [I] CallExitScript: exit script exitScript on
event eventName returned code
Explanation: This node is not a manager or no longer
returnCode, quorumloss.
a manager of the type required to proceed with the
operation. This could be caused by the change of Explanation: This node invoked the user-specified
manager in the middle of the operation. callback handler for the tiebreakerCheck event and it
returned a non-zero value. The user-specified action
User response: Retry the operation.
with the error is quorumloss.
User response: None. Informational message only.
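Messages 6027-2741, 6027-2742, and 6027-2744 refer to a user-supplied callback that GPFS runs for the tiebreakerCheck event; a non-zero exit status tells GPFS that this node must not remain cluster manager. A skeletal example of such a script follows. It assumes the callback is registered with mmaddcallback for the tiebreakerCheck event; the health check itself is a placeholder for site-specific logic.

# Skeleton of a user callback for the tiebreakerCheck event (6027-2741/2742/2744).
# Exit 0 to keep the cluster manager role; exit non-zero to give it up.
import sys

def site_specific_health_check():
    # Placeholder: return True only if this node should remain cluster manager.
    return True

sys.exit(0 if site_specific_health_check() else 1)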
6027-2736 The value for --block-size must be the
keyword auto or the value must be of
the form nK, nM, nG or nT, where n is 6027-2743 Permission denied.
an optional integer in the range 1 to
Explanation: The command is invoked by an
1023.
unauthorized user.
Explanation: An invalid value was specified with the
User response: Retry the command with an
--block-size option.
authorized user.
User response: Reissue the command with a valid
option.
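Message 6027-2736 states the accepted forms for --block-size: the keyword auto, or an optional integer from 1 to 1023 followed by K, M, G, or T. The following sketch only mirrors that documented syntax locally; GPFS itself still decides which block sizes are actually supported.

# Local validation of the --block-size syntax described by message 6027-2736.
import re

def valid_block_size(value):
    if value == "auto":
        return True
    m = re.fullmatch(r"(\d+)?([KMGT])", value)
    if not m:
        return False
    n = int(m.group(1)) if m.group(1) else 1    # the integer part is optional
    return 1 <= n <= 1023

for candidate in ["auto", "256K", "4M", "1024K", "2Q"]:
    print(candidate, "valid" if valid_block_size(candidate) else "invalid (6027-2736)")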
| 6027-2744 [D] Invoking tiebreaker callback script
Explanation: The node is invoking the callback script
6027-2738 Editing quota limits for the root user is
due to change in quorum membership.
not permitted
User response: None. Informational message only.
Explanation: The root user was specified for quota
limits editing in the mmedquota command.
User response: Specify a valid user or group in the
| 6027-2745 [E] File system is not mounted.
mmedquota command. Editing quota limits for the root Explanation: A command was issued, which requires
user or system group is prohibited. that the file system be mounted.
User response: Mount the file system and reissue the
6027-2739 Editing quota limits for groupName command.
group not permitted.
Explanation: The system group was specified for | 6027-2746 [E] Too many disks unavailable for this
quota limits editing in the mmedquota command. server to continue serving a
RecoveryGroup.
User response: Specify a valid user or group in the
mmedquota command. Editing quota limits for the root Explanation: RecoveryGroup panic: Too many disks
user or system group is prohibited. unavailable to continue serving this RecoveryGroup.
This server will resign, and failover to an alternate
server will be attempted.
| 6027-2740 [I] Starting new election as previous clmgr
is expelled User response: Ensure the alternate server took over.
Determine what caused this event and address the
Explanation: This node is taking over as clmgr
situation. Prior messages may help determine the cause
without challenge as the old clmgr is being expelled.
of the event.
User response: None. Informational message only.
| 6027-2747 [E] Inconsistency detected between the local
| 6027-2741 [W] This node can not continue to be node number retrieved from 'mmsdrfs'
| cluster manager (nodeNumber) and the node number
retrieved from 'mmfs.cfg' (nodeNumber).
Explanation: This node invoked the user-specified
callback handler for event tiebreakerCheck and it Explanation: The node number retrieved by obtaining
returned a non-zero value. This node cannot continue the list of nodes in the mmsdrfs file did not match the
to be the cluster manager. node number contained in mmfs.cfg. There may have
been a recent change in the IP addresses being used by
User response: None. Informational message only.
network interfaces configured at the node.
User response: Stop and restart GPFS daemon.
6027-2748 Terminating because a conflicting | 6027-2754 [X] Challenge thread did not respond to
program on the same inode space challenge in time: took TimeIntervalSecs
inodeSpace is running. seconds.
Explanation: A program detected that it must Explanation: Challenge thread took too long to
terminate because a conflicting program is running. respond to a disk challenge. Challenge thread will exit,
which will result in the local node losing quorum.
User response: Reissue the command after the
conflicting program ends. User response: None. Informational message only.
6027-2749 Specified locality group 'number' does | 6027-2755 [N] Another node committed disk election
not match disk 'name' locality group with sequence CommittedSequenceNumber
'number'. To change locality groups in an (our sequence was OurSequenceNumber).
SNC environment, please use the
Explanation: Another node committed a disk election
mmdeldisk and mmadddisk commands.
with a sequence number higher than the one used
Explanation: The locality group specified on the when this node used to commit an election in the past.
mmchdisk command does not match the current This means that the other node has become, or is
locality group of the disk. becoming, a Cluster Manager. To avoid having two
Cluster Managers, this node will lose quorum.
User response: To change locality groups in an SNC
environment, use the mmdeldisk and mmadddisk User response: None. Informational message only.
commands.
| 6027-2756 Attention: In file system FileSystemName,
| 6027-2750 [I] Node NodeName is now the Group | FileSetName (Default)
Leader. | QuotaLimitType(QuotaLimit) for
| QuotaType UserName/GroupName/FilesetName
Explanation: A new cluster Group Leader has been
| is too small. Suggest setting it higher
assigned.
| than minQuotaLimit.
User response: None. Informational message only.
Explanation: The quota limit that was set is too low. It will
cause unexpected quota behavior. MinQuotaLimit is
| 6027-2751 [I] Starting new election: Last elected: computed through:
NodeNumber Sequence: SequenceNumber 1. for block: QUOTA_THRESHOLD *
Explanation: A new disk election will be started. The MIN_SHARE_BLOCKS * subblocksize
disk challenge will be skipped since the last elected 2. for inode: QUOTA_THRESHOLD *
node was either none or the local node. MIN_SHARE_INODES
User response: None. Informational message only. User response: Users should reset quota limits so that
they are more than MinQuotaLimit. It is just a warning.
Quota limits will be set anyway.
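Message 6027-2756 names the factors behind the smallest quota limit GPFS considers reasonable. The arithmetic is sketched below; all numeric constants are placeholders, since the message only defines the formula, not the values.

# Worked example of the MinQuotaLimit computation named in message 6027-2756.
# Block limit: QUOTA_THRESHOLD * MIN_SHARE_BLOCKS * subblocksize
# Inode limit: QUOTA_THRESHOLD * MIN_SHARE_INODES
# The constants below are placeholders only.
QUOTA_THRESHOLD = 2
MIN_SHARE_BLOCKS = 8
MIN_SHARE_INODES = 1024
subblocksize = 8 * 1024        # placeholder: 8 KiB subblock

min_block_quota = QUOTA_THRESHOLD * MIN_SHARE_BLOCKS * subblocksize
min_inode_quota = QUOTA_THRESHOLD * MIN_SHARE_INODES
print("Minimum sensible block quota:", min_block_quota, "bytes")
print("Minimum sensible inode quota:", min_inode_quota, "inodes")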
| 6027-2752 [I] This node got elected. Sequence:
SequenceNumber
| 6027-2757 [E] The peer snapshot is in progress. Queue
Explanation: Local node got elected in the disk cannot be flushed now.
election. This node will become the cluster manager.
Explanation: The Peer Snapshot is in progress. Queue
User response: None. Informational message only. cannot be flushed now.
User response: Reissue the command once the peer
| 6027-2753 [N] Responding to disk challenge: snapshot has ended.
response: ResponseValue. Error code:
ErrorCode.
| 6027-2758 [E] The AFM target is not configured for
Explanation: A disk challenge has been received, | peer snapshots. Run mmafmconfig on
indicating that another node is attempting to become a | the AFM target cluster.
Cluster Manager. Issuing a challenge response, to
confirm the local node is still alive and will remain the Explanation: The .afmctl file is probably not present
Cluster Manager. on AFM target cluster.
User response: None. Informational message only. | User response: Run mmafmconfig on the AFM target
cluster to configure the AFM target cluster.
| 6027-2759 [N] Disk lease period expired in cluster 6027-2765 command on 'fileSystem' is finished
ClusterName. Attempting to reacquire waiting. Processing continues ... name
lease.
Explanation: A program detected that it can now
Explanation: The disk lease period expired, which will continue the processing since a conflicting program has
prevent the local node from being able to perform disk ended.
I/O. This can be caused by a temporary
User response: None. Informational message only.
communication outage.
User response: If message is repeated then the
communication outage should be investigated.
| 6027-2766 [I] User script has chosen to expel node
nodeName instead of node nodeName.
Explanation: User has specified a callback script that
| 6027-2760 [N] Disk lease reacquired in cluster
is invoked whenever a decision is about to be taken on
ClusterName.
what node should be expelled from the active cluster.
Explanation: The disk lease has been reacquired, and As a result of the execution of the script, GPFS will
disk I/O will be resumed. reverse its decision on what node to expel.
User response: None. Informational message only. User response: None.
6027-2761 Unable to run command on 'fileSystem' | 6027-2767 [E] Error errorNumber while accessing
while the file system is mounted in | tiebreaker devices.
restricted mode.
| Explanation: An error was encountered while reading
Explanation: A command that can alter data in a file | from or writing to the tiebreaker devices. When such
system was issued while the file system was mounted | error happens while the cluster manager is checking for
in restricted mode. | challenges, it will cause the cluster manager to lose
| cluster membership.
User response: Mount the file system in read-only or
read-write mode or unmount the file system and then | User response: Verify the health of the tiebreaker
reissue the command. | devices.
User response: Ensure the -Q yes option is in effect User response: Check the network connection
for the file system, then enable default quota with the between this node and the node listed in the message.
mmdefquotaon command.
| 6027-2779 [E] Challenge thread stopped.
6027-2774 fileSystem: Per-fileset quotas are not
Explanation: A tiebreaker challenge thread stopped
enabled.
because of an error. Cluster membership will be lost.
Explanation: A command was issued to modify
User response: Check for additional error messages.
fileset-level quota, but per-fileset quota management is
File systems will be unmounted, then the node will
not enabled.
rejoin the cluster.
User response: Ensure that the --perfileset-quota
option is in effect for the file system and reissue the
| 6027-2780 [E] Not enough quorum nodes reachable:
command.
reachableNodes.
Explanation: The cluster manager cannot reach a
6027-2775 Storage pool named poolName does not
sufficient number of quorum nodes, and therefore must
exist.
resign to prevent cluster partitioning.
Explanation: The mmlspool command was issued, but
User response: Determine if there is a network outage
the specified storage pool does not exist.
or if too many nodes have failed.
User response: Correct the input and reissue the
command.
| 6027-2781 [E] Lease expired for numSecs seconds
(shutdownOnLeaseExpiry).
6027-2776 Attention: A disk being stopped reduces
Explanation: Disk lease expired for too long, which
the degree of system metadata
results in the node losing cluster membership.
replication (value) or data replication
(value) to lower than tolerable. User response: None. The node will attempt to rejoin
the cluster.
Explanation: The mmchdisk stop command was
issued, but the disk cannot be stopped because of the
current file system metadata and data replication | 6027-2782 [E] This node is being expelled from the
factors. cluster.
User response: Make more disks available, delete Explanation: This node received a message instructing
unavailable disks, or change the file system metadata it to leave the cluster, which might indicate
replication factor. Also check the current value of the communication problems between this node and some
unmountOnDiskFail configuration parameter. other node in the cluster.
User response: None. The node will attempt to rejoin
| 6027-2777 [E] Node nodeName is being expelled the cluster.
because of an expired lease. Pings sent:
pingsSent. Replies received:
pingRepliesReceived.
| 6027-2783 [E] New leader elected with a higher ballot
number.
Explanation: The node listed did not renew its lease
Explanation: A new group leader was elected with a
in a timely fashion and is being expelled from the
higher ballot number, and this node is no longer the
cluster.
leader. Therefore, this node must leave the cluster and
User response: Check the network connection rejoin.
between this node and the node listed in the message.
User response: None. The node will attempt to rejoin
the cluster.
| 6027-2789 [E] Tiebreaker script returned a non-zero | User response: Examine mmfs.log file on all quorum
value.
| nodes for indication of a corrupted ballot file. If
| 6027-2793 is found then follow instructions for that
Explanation: The tiebreaker script, invoked during | message. If problem cannot be resolved, shut down
group leader election, returned a non-zero value, which | GPFS across the cluster, undefine, and then redefine the
results in the node losing cluster membership and then | tiebreakerdisks configuration variable, and finally
attempting to rejoin the cluster. | restart the cluster.
| 6027-2798 [E] The node nodeName does not have a | 6027-2805 [I] Loaded policy 'policyFileName or
| valid Extended License to run the filesystemName': summaryOfPolicyRules
| requested command.
Explanation: The specified loaded policy has the
| Explanation: The file system manager node does not specified policy rules.
| have a valid extended license to run ILM, AFM, or
User response: None. Informational message only.
| CNFS command.
| User response: Make sure the gpfs.ext package is installed
| correctly on the file system manager node and try again.
| 6027-2806 [E] Error while validating policy
'policyFileName or filesystemName':
rc=errorCode: errorDetailsString
6027-2800 Available memory exceeded on request
Explanation: An error occurred while validating the
to allocate number bytes. Trace point
specified policy.
sourceFile-tracePoint.
User response: Correct the policy rules, heeding the
Explanation: The available memory was exceeded
error details in this message and other messages issued
during an allocation request made from the cited
immediately before or after this message. Use the
source file and trace point.
mmchpolicy command to install a corrected policy
User response: Try shutting down and then restarting rules file.
GPFS. If the problem recurs, contact the IBM Support
Center.
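Messages 6027-2801, 6027-2806, and 6027-2807 all point back to problems in the policy rules file. Before installing a corrected file with mmchpolicy, it is often worth validating it first; the sketch below assumes that mmchpolicy with -I test checks the rules without installing them, and the file system and rules file names are hypothetical.

# Validate a policy rules file before installing it (see 6027-2801 and 6027-2806).
import subprocess

fs = "gpfs0"                        # hypothetical file system name
policy_file = "/tmp/policy.rules"   # hypothetical policy rules file

test = subprocess.run(["mmchpolicy", fs, policy_file, "-I", "test"])
if test.returncode == 0:
    subprocess.run(["mmchpolicy", fs, policy_file], check=True)   # install for real
else:
    print("Correct the policy rules file; see the validation errors above (6027-2806).")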
| 6027-2807 [W] Error in evaluation of placement
policy for file fileName: errorDetailsString
6027-2801 Policy set syntax version versionString
Explanation: An error occurred while evaluating the
not supported.
installed placement policy for a particular new file.
Explanation: The policy rules do not comply with the Although the policy rules appeared to be syntactically
supported syntax. correct when the policy was installed, evidently there is
a problem when certain values of file attributes occur at
User response: Rewrite the policy rules, following the
runtime.
documented, supported syntax and keywords.
User response: Determine which file names and
attributes trigger this error. Correct the policy rules,
heeding the error details in this message and other
messages issued immediately before or after this
| 6027-2952 [E] Unknown assert class 'assertClass'. 6027-3000 [E] No disk enclosures were found on the
target node.
Explanation: The assert class is not recognized.
Explanation: GPFS is unable to communicate with any
User response: Specify a valid assert class.
disk enclosures on the node serving the specified
pdisks. This might be because there are no disk
| 6027-2953 [E] Non-numeric assert value 'value' after enclosures attached to the node, or it might indicate a
class 'class'. problem in communicating with the disk enclosures.
While the problem persists, disk maintenance with the
Explanation: The specified assert value is not mmchcarrier command is not available.
recognized.
User response: Check disk enclosure connections and
User response: Specify a valid assert integer value. run the command again. Use mmaddpdisk --replace as
an alternative method of replacing failed disks.
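For message 6027-3000, the suggested recovery is to re-check enclosure connectivity and, if mmchcarrier is still unavailable, replace the failed disk with mmaddpdisk --replace instead. The sketch below is only an outline of that decision; the recovery group name and stanza file are hypothetical, and the exact mmaddpdisk invocation should be checked against the command reference.

# Outline for message 6027-3000: verify enclosure visibility, then fall back to
# mmaddpdisk --replace if carrier-based replacement remains unavailable.
# Recovery group and stanza file names are hypothetical.
import subprocess

enclosures = subprocess.run(["mmlsenclosure", "all"],
                            capture_output=True, text=True).stdout
if not enclosures.strip():
    print("No enclosures visible; check cabling and power, then retry (6027-3000).")
else:
    subprocess.run(["mmaddpdisk", "RGname", "-F", "/tmp/pdisk.stanza", "--replace"],
                   check=True)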
| 6027-2954 [E] Assert value 'value' after class 'class' must
be from 0 to 127. 6027-3001 [E] Location of pdisk pdiskName of recovery
Explanation: The specified assert value is not group recoveryGroupName is not known.
recognized. Explanation: GPFS is unable to find the location of the
User response: Specify a valid assert integer value. given pdisk.
User response: Check the disk enclosure hardware.
| 6027-2955 [W] Time-of-day may have jumped back.
Late by delaySeconds seconds to wake 6027-3002 [E] Disk location code locationCode is not
certain threads. known.
Explanation: Time-of-day may have jumped back, Explanation: A disk location code specified on the
which has resulted in some threads being awakened command line was not found.
later than expected. It is also possible that some other
factor has caused a delay in waking up the threads. User response: Check the disk location code.
User response: Verify if there is any problem with
network time synchronization, or if time-of-day is being 6027-3003 [E] Disk location code locationCode was
incorrectly set. specified more than once.
Explanation: The same disk location code was
| 6027-2956 [E] Invalid crypto engine type specified more than once in the tschcarrier command.
| (encryptionCryptoEngineType):
User response: Check the command usage and run
| cryptoEngineType.
again.
| Explanation: The specified value for
| encryptionCryptoEngineType is incorrect. 6027-3004 [E] Disk location codes locationCode and
| User response: Specify a valid value for locationCode are not in the same disk
| encryptionCryptoEngineType. carrier.
Explanation: The tschcarrier command cannot be used
| 6027-2957 [E] Invalid cluster manager selection choice to operate on more than one disk carrier at a time.
| (clusterManagerSelection):
User response: Check the command usage and rerun.
| clusterManagerSelection.
| Explanation: The specified value for 6027-3005 [W] Pdisk in location locationCode is
| clusterManagerSelection is incorrect. controlled by recovery group
| User response: Specify a valid value for recoveryGroupName.
| clusterManagerSelection. Explanation: The tschcarrier command detected that a
pdisk in the indicated location is controlled by a
| 6027-2958 [E] Invalid NIST compliance type different recovery group than the one specified.
| (nistCompliance): nistComplianceValue.
User response: Check the disk location code and
| Explanation: The specified value for nistCompliance recovery group name.
| is incorrect.
| User response: Specify a valid value for
| nistCompliance.
6027-3008 [E] Incorrect recovery group given for 6027-3014 [E] Pdisk pdiskName of recovery group
location. recoveryGroupName was expected to be
replaced with a new disk; instead, it
Explanation: The mmchcarrier command detected that
was moved from location locationCode to
the specified recovery group name given does not
location locationCode.
match that of the pdisk in the specified location.
Explanation: The mmchcarrier command expected a
User response: Check the disk location code and
pdisk to be removed and replaced with a new disk. But
recovery group name. If you are sure that the disks in
instead of being replaced, the old pdisk was moved
the carrier are not being used by other recovery groups,
into a different location.
it is possible to override the check using the --force-RG
flag. Use this flag with caution as it can cause disk User response: Repeat the disk replacement
errors and potential data loss in other recovery groups. procedure.
6027-3009 [E] Pdisk pdiskName of recovery group 6027-3015 [E] Pdisk pdiskName of recovery group
recoveryGroupName is not currently recoveryGroupName in location
scheduled for replacement. locationCode cannot be used as a
replacement for pdisk pdiskName of
Explanation: A pdisk specified in a tschcarrier or
recovery group recoveryGroupName.
tsaddpdisk command is not currently scheduled for
replacement. Explanation: The tschcarrier command expected a
pdisk to be removed and replaced with a new disk. But
User response: Make sure the correct disk location
instead of finding a new disk, the mmchcarrier
code or pdisk name was given. For the mmchcarrier
command found that another pdisk was moved to the
command, the --force-release option can be used to
replacement location.
override the check.
User response: Repeat the disk replacement
procedure, making sure to replace the failed pdisk with
6027-3010 [E] Command interrupted.
a new disk.
Explanation: The mmchcarrier command was
interrupted by a conflicting operation, for example the
6027-3016 [E] Replacement disk in location
mmchpdisk --resume command on the same pdisk.
locationCode has an incorrect FRU
User response: Run the mmchcarrier command again. fruCode; expected FRU code is fruCode.
Explanation: The replacement disk has a different
6027-3011 [W] Disk location locationCode failed to field replaceable unit code than that of the original
power off. disk.
Explanation: The mmchcarrier command detected an User response: Replace the pdisk with a disk of the
error when trying to power off a disk. same part number. If you are certain the new disk is a
valid substitute, override this check by running the
6027-3030 [E] There must be at least number non-spare pdisks in declustered array declusteredArrayName for configuration data replicas.

Explanation: A delete pdisk or change of spares operation failed because the resulting number of non-spare pdisks would fall below the number required to hold configuration data for the declustered array.

User response: Add additional pdisks to the declustered array. If replacing a pdisk, use mmchcarrier or mmaddpdisk --replace.

6027-3031 [E] There is not enough available configuration data space in declustered array declusteredArrayName to complete this operation.

Explanation: Creating a vdisk, deleting a pdisk, or changing the number of spares failed because there is not enough available space in the declustered array for configuration data.

User response: Replace any failed pdisks in the declustered array and allow time for rebalance operations to more evenly distribute the available space. Add pdisks to the declustered array.

6027-3032 [E] Temporarily unable to create vdisk vdiskName because more time is required to rebalance the available space in declustered array declusteredArrayName.

Explanation: Cannot create the specified vdisk until rebuild and rebalance processes are able to more evenly distribute the available space.

User response: Replace any failed pdisks in the recovery group, allow time for rebuild and rebalance processes to more evenly distribute the spare space within the array, and retry the command.

6027-3034 [E] The input pdisk name (pdiskName) did not match the pdisk name found on disk (pdiskName).

Explanation: Cannot add the specified pdisk, because the input pdiskName did not match the pdiskName that was written on the disk.

6027-3036 [E] Partition size must be a power of 2.

Explanation: The partitionSize parameter of some declustered array was invalid.

User response: Correct the partitionSize parameter and reissue the command.

6027-3037 [E] Partition size must be between number and number.

Explanation: The partitionSize parameter of some declustered array was invalid.

User response: Correct the partitionSize parameter to a power of 2 within the specified range and reissue the command.

6027-3038 [E] AU log too small; must be at least number bytes.

Explanation: The auLogSize parameter of a new declustered array was invalid.

User response: Increase the auLogSize parameter and reissue the command.

6027-3039 [E] A vdisk with disk usage vdiskLogTip must be the first vdisk created in a recovery group.

Explanation: The --logTip disk usage was specified for a vdisk other than the first one created in a recovery group.

User response: Retry the command with a different disk usage.

6027-3040 [E] Declustered array configuration data does not fit.

Explanation: There is not enough space in the pdisks of a new declustered array to hold the AU log area using the current partition size.

User response: Increase the partitionSize parameter or decrease the auLogSize parameter and reissue the command.
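As a sketch of the replacement path that the user response for 6027-3030 refers to, a failed pdisk is typically replaced either through mmchcarrier or through mmaddpdisk with the --replace option and a stanza file describing the replacement disk. The recovery group name, pdisk name, and stanza file name below are hypothetical:

   mmchcarrier BB1RGL --replace --pdisk e2d3s04

   mmaddpdisk BB1RGL -F replace.stanza --replace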
6027-3041 [E] Declustered array attributes cannot be changed.

Explanation: The partitionSize and auLogSize attributes of a declustered array cannot be changed after the declustered array has been created. They may only be set by a command that creates the declustered array.

User response: Remove the partitionSize and auLogSize attributes from the input file of the mmaddpdisk command and reissue the command.

6027-3042 [E] The log tip vdisk cannot be destroyed if there are other vdisks.

Explanation: In recovery groups with versions prior to 3.5.0.11, the log tip vdisk cannot be destroyed if other vdisks still exist within the recovery group.

User response: Remove the user vdisks or upgrade the version of the recovery group with mmchrecoverygroup --version, then retry the command to remove the log tip vdisk.

6027-3043 [E] Log vdisks cannot have multiple use specifications.

Explanation: A vdisk can have usage vdiskLog, vdiskLogTip, or vdiskLogReserved, but not more than one.

User response: Retry the command with only one of the --log, --logTip, or --logReserved attributes.

6027-3044 [E] Unable to determine resource requirements for all the recovery groups served by node value: to override this check reissue the command with the -v no flag.

Explanation: A recovery group or vdisk is being created, but GPFS can not determine if there are enough non-stealable buffer resources to allow the node to successfully serve all the recovery groups at the same time once the new object is created.

User response: You can override this check by reissuing the command with the -v no flag.

6027-3045 [W] Buffer request exceeds the non-stealable buffer limit, increase the nsdRAIDNonStealableBufPct.

Explanation: The limit of non-stealable buffers has been exceeded.

User response: Use the mmchconfig command to increase the nsdRAIDNonStealableBufPct attribute.

6027-3046 [E] The nonStealable buffer limit may be too low on server serverName. Check the configuration attributes of the recovery group servers: pagepool, nsdRAIDBufferPoolSizePct, nsdRAIDNonStealableBufPct.

Explanation: The limit of non-stealable buffers is low on the specified recovery group server. This is probably because of an improperly configured system. The specified configuration variables are expected to be the same for all of the recovery group servers.

User response: Use the mmchconfig command to correct the configuration.

6027-3047 [E] Location of pdisk pdiskName is not known.

Explanation: GPFS is unable to find the location of the given pdisk.

User response: Check the disk enclosure hardware.

6027-3048 [E] Pdisk pdiskName is not currently scheduled for replacement.

Explanation: A pdisk specified in a tschcarrier or tsaddpdisk command is not currently scheduled for replacement.

User response: Make sure the correct disk location code or pdisk name was given. For the tschcarrier command, the --force-release option can be used to override the check.

6027-3049 [E] The minimum size for vdisk vdiskName is number.

Explanation: The vdisk size was too small.

User response: Increase the size of the vdisk and retry the command.

6027-3050 [E] There are already number suspended pdisks in declustered array arrayName. You must resume pdisks in the array before suspending more.

Explanation: The number of suspended pdisks in the declustered array has reached the maximum limit. Allowing more pdisks to be suspended in the array would put data availability at risk.

User response: Resume one or more suspended pdisks in the array by using the mmchcarrier or mmchpdisk commands, then retry the command.
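For 6027-3045 and 6027-3046, the attributes named in the message text are adjusted with mmchconfig. The value and node class below are only an illustration, not a recommended setting; the same command can adjust pagepool and nsdRAIDBufferPoolSizePct, which the message expects to be consistent across the recovery group servers:

   mmchconfig nsdRAIDNonStealableBufPct=30 -N gssServerNodes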
6027-3051 [E] Checksum granularity must be number or number.

Explanation: The only allowable values for the checksumGranularity attribute of a vdisk are 8K and 32K.

User response: Change the checksumGranularity attribute of the vdisk, then retry the command.

6027-3052 [E] Checksum granularity cannot be specified for log vdisks.

Explanation: The checksumGranularity attribute cannot be applied to a log vdisk.

User response: Remove the checksumGranularity attribute of the log vdisk, then retry the command.

6027-3053 [E] Vdisk block size must be between number and number for the specified code when checksum granularity number is used.

Explanation: An invalid vdisk block size was specified. The message lists the allowable range of block sizes.

User response: Use a vdisk virtual block size within the range shown, or use a different vdisk RAID code, or use a different checksum granularity.

6027-3054 [W] Disk in location locationCode failed to come online.

Explanation: The mmchcarrier command detected an error when trying to bring a disk back online.

User response: Make sure the disk is firmly seated and run the command again. Check the operating system error log.

6027-3055 [E] The fault tolerance of the code cannot be greater than the fault tolerance of the internal configuration data.

Explanation: The RAID code specified for a new vdisk is more fault-tolerant than the configuration data that will describe the vdisk.

User response: Use a code with a smaller fault tolerance.

6027-3056 [E] Long and short term event log size and fast write log percentage are only applicable to log home vdisk.

Explanation: The longTermEventLogSize, shortTermEventLogSize, and fastWriteLogPct options are only applicable to log home vdisk.

6027-3057 [E] Disk enclosure is no longer reporting information on location locationCode.

Explanation: The disk enclosure reported an error when GPFS tried to obtain updated status on the disk location.

User response: Try running the command again. Make sure that the disk enclosure firmware is current. Check for improperly seated connectors within the disk enclosure.

6027-3058 [A] GSS license failure - GPFS Native RAID services will not be configured on this node.

Explanation: The GPFS Storage Server has not been validly installed. Therefore, GPFS Native RAID services will not be configured.

User response: Install a legal copy of the base GPFS code and restart the GPFS daemon.

6027-3059 [E] The serviceDrain state is only permitted when all nodes in the cluster are running daemon version version or higher.

Explanation: The mmchpdisk command option --begin-service-drain was issued, but there are backlevel nodes in the cluster that do not support this action.

User response: Upgrade the nodes in the cluster to at least the specified version and run the command again.

6027-3060 [E] Block sizes of all log vdisks must be the same.

Explanation: The block sizes of the log tip vdisk, the log tip backup vdisk, and the log home vdisk must all be the same.

User response: Try running the command again after adjusting the block sizes of the log vdisks.

6027-3061 [E] Cannot delete path pathName because there would be no other working paths to pdisk pdiskName of RG recoveryGroupName.

Explanation: When the -v yes option is specified on the --delete-paths subcommand of the tschrecgroup command, it is not allowed to delete the last working path to a pdisk.

User response: Try running the command again after repairing other broken paths for the named pdisk, or reduce the list of paths being deleted, or run the command with -v no.
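The --begin-service-drain action referred to in 6027-3059 is an mmchpdisk option. Once all nodes run a supporting daemon version, draining a pdisk before service might look like the following sketch, where the recovery group and pdisk names are illustrative; a matching --end-service-drain option, where supported, returns the pdisk to service after maintenance:

   mmchpdisk BB1RGL --pdisk e2d3s05 --begin-service-drain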
6027-3062 [E] Recovery group version version is not compatible with the current recovery group version.

Explanation: The recovery group version specified with the --version option does not support all of the features currently supported by the recovery group.

User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3063 [E] Unknown recovery group version version.

Explanation: The recovery group version named by the argument of the --version option was not recognized.

User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3064 [I] Allowable recovery group versions are:

Explanation: Informational message listing allowable recovery group versions.

User response: Run the command with one of the recovery group versions listed.

6027-3065 [E] The maximum size of a log tip vdisk is size.

Explanation: Running mmcrvdisk for a log tip vdisk failed because the size is too large.

User response: Correct the size parameter and run the command again.

6027-3066 [E] A recovery group may only contain one log tip vdisk.

Explanation: A log tip vdisk already exists in the recovery group.

User response: None.

6027-3067 [E] Log tip backup vdisks not supported by this recovery group version.

Explanation: Vdisks with usage type vdiskLogTipBackup are not supported by all recovery group versions.

User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.

6027-3068 [E] The sizes of the log tip vdisk and the log tip backup vdisk must be the same.

Explanation: The log tip vdisk must be the same size as the log tip backup vdisk.

User response: Adjust the vdisk sizes and retry the mmcrvdisk command.

6027-3069 [E] Log vdisks cannot use code codeName.

Explanation: Log vdisks must use a RAID code that uses replication, or be unreplicated. They cannot use parity-based codes such as 8+2P.

User response: Retry the command with a valid RAID code.

6027-3070 [E] Log vdisk vdiskName cannot appear in the same declustered array as log vdisk vdiskName.

Explanation: No two log vdisks may appear in the same declustered array.

User response: Specify a different declustered array for the new log vdisk and retry the command.

6027-3071 [E] Device not found: deviceName.

Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command was not found.

User response: Check the device name.

6027-3072 [E] Invalid device name: deviceName.

Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command is invalid.

User response: Check the device name.

6027-3073 [E] Error formatting pdisk pdiskName on device diskName.

Explanation: An error occurred when trying to format a new pdisk.

User response: Check that the disk is working properly.

6027-3074 [E] Node nodeName not found in cluster configuration.

Explanation: A node name specified in a command does not exist in the cluster configuration.

User response: Check the command arguments.
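Several of these messages (6027-3062 through 6027-3067, and 6027-3076 through 6027-3079 below) are resolved by raising the recovery group version with mmchrecoverygroup. The recovery group name below is illustrative, and LATEST is shown only on the assumption that the newest allowable version is wanted; 6027-3064 lists the values that --version actually accepts:

   mmchrecoverygroup BB1RGL --version LATEST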
6027-3075 [E] The --servers list must contain the current node, nodeName.

Explanation: The --servers list of a tscrrecgroup command does not list the server on which the command is being run.

User response: Check the --servers list. Make sure the tscrrecgroup command is run on a server that will actually serve the recovery group.

6027-3076 [E] Remote pdisks are not supported by this recovery group version.

Explanation: Pdisks that are not directly attached are not supported by all recovery group versions.

User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.

6027-3077 [E] There must be at least number pdisks in recovery group recoveryGroupName for configuration data replicas.

Explanation: A change of pdisks failed because the resulting number of pdisks would fall below the needed replication factor for the recovery group descriptor.

User response: Do not attempt to delete more pdisks.

6027-3078 [E] Replacement threshold for declustered array declusteredArrayName of recovery group recoveryGroupName cannot exceed number.

Explanation: The replacement threshold cannot be larger than the maximum number of pdisks in a declustered array. The maximum number of pdisks in a declustered array depends on the version number of the recovery group. The current limit is given in this message.

User response: Use a smaller replacement threshold or upgrade the recovery group version.

6027-3079 [E] Number of spares for declustered array declusteredArrayName of recovery group recoveryGroupName cannot exceed number.

Explanation: The number of spares cannot be larger than the maximum number of pdisks in a declustered array. The maximum number of pdisks in a declustered array depends on the version number of the recovery group. The current limit is given in this message.

User response: Use a smaller number of spares or upgrade the recovery group version.

6027-3080 [E] Cannot remove pdisk pdiskName because declustered array declusteredArrayName would have fewer disks than its replacement threshold.

Explanation: The replacement threshold for a declustered array must not be larger than the number of pdisks in the declustered array.

User response: Reduce the replacement threshold for the declustered array, then retry the mmdelpdisk command.

6027-3200 AFM ERROR: command pCacheCmd fileset filesetName fileids [parentId,childId,tParentId,targetId,ReqCmd] name sourceName original error oerr application error aerr remote error remoteError

Explanation: AFM operations on a particular file failed.

User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster.

6027-3201 AFM ERROR DETAILS: type: remoteCmdType snapshot name snapshotName snapshot ID snapshotId

Explanation: Peer snapshot creation or deletion failed.

User response: Fix snapshot creation or deletion error.

6027-3204 AFM: Failed to set xattr on inode inodeNum error err, ignoring.

Explanation: Setting extended attributes on an inode failed.

User response: None.

6027-3205 AFM: Failed to get xattrs for inode inodeNum, ignoring.

Explanation: Getting extended attributes on an inode failed.

User response: None.

6027-3209 Home NFS mount of host:path failed with error err

Explanation: NFS mounting of path from the home cluster failed.

User response: Make sure the exported path can be mounted over NFSv3.
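For requeued asynchronous operations (6027-3200), the user response is an mmafmctl invocation of roughly this form, where the device name fs1 and the fileset name cacheFileset1 are placeholders:

   mmafmctl fs1 resumeRequeued -j cacheFileset1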
6027-3210 Cannot find AFM control file for fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.

Explanation: Either home path does not belong to GPFS, or the AFM control file is not present in the exported path.

User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3211 Change in home export detected. Caching will be disabled.

Explanation: A change in home export was detected or the home path is stale.

User response: Ensure the exported path is accessible.

6027-3212 AFM ERROR: Cannot enable AFM for fileset filesetName (error err)

Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.

User response: Ensure the remote export path is accessible for NFS mount.

6027-3213 Cannot find snapshot link directory name for exported file system at home for fileset filesetName. Snapshot directory at home will be cached.

Explanation: Unable to determine the snapshot directory at the home cluster.

User response: None.

6027-3214 [E] AFM: Unexpiration of fileset filesetName failed with error err. Use mmafmctl to manually unexpire the fileset.

Explanation: Unexpiration of fileset failed after a home reconnect.

User response: Run the mmafmctl command with the unexpire option on the fileset.

6027-3215 [W] AFM: Peer snapshot delayed due to long running execution of operation to remote cluster for fileset filesetName. Peer snapshot continuing to wait.

Explanation: Peer snapshot command timed out waiting to flush messages.

User response: None.

6027-3216 Fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.

Explanation: Cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize cache with the home.

User response: None.

6027-3217 AFM ERROR Unable to unmount NFS export for fileset filesetName

Explanation: NFS unmount of the path failed.

User response: None.

6027-3220 AFM: Home NFS mount of host:path failed with error err for file system fileSystem fileset id filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on next request to gateway

Explanation: NFS mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.

User response: Make sure the exported path can be mounted over NFSv3.

6027-3221 AFM: Home NFS mount of host:path succeeded for file system fileSystem fileset filesetName. Caching is enabled.

Explanation: NFS mounting of the path from the home cluster succeeded. Caching is enabled.

User response: None.

6027-3224 [I] AFM: Failed to set extended attributes on file system fileSystem inode inodeNum error err, ignoring.

Explanation: Setting extended attributes on an inode failed.

User response: None.

6027-3225 [I] AFM: Failed to get extended attributes for file system fileSystem inode inodeNum, ignoring.

Explanation: Getting extended attributes on an inode failed.

User response: None.
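The mmafmconfig step named in the user response for 6027-3210 (and 6027-3226 below) is run at the home cluster against the exported path; the path shown here is hypothetical:

   mmafmconfig enable /gpfs/homefs1/export1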
6027-3226 [I] AFM: Cannot find control file for file system fileSystem fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.

Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.

User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3227 [E] AFM: Cannot enable AFM for file system fileSystem fileset filesetName (error err)

Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.

User response: Ensure the remote export path is accessible for NFS mount.

6027-3228 [E] AFM: Unable to unmount NFS export for file system fileSystem fileset filesetName

Explanation: NFS unmount of the path failed.

User response: None.

6027-3229 [E] AFM: File system fileSystem fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.

Explanation: The cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize the cache with the home.

User response: None.

6027-3230 [I] AFM: Cannot find snapshot link directory name for exported file system at home for file system fileSystem fileset filesetName. Snapshot directory at home will be cached.

Explanation: Unable to determine the snapshot directory at the home cluster.

User response: None.

6027-3232 type AFM: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] name sourceName origin error err

Explanation: AFM operations on a particular file failed.

User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster.

6027-3233 [I] AFM: Previous error repeated repeatNum times.

Explanation: Multiple AFM operations have failed.

User response: None.

6027-3234 [E] AFM: Unable to start thread to unexpire filesets.

Explanation: Failed to start thread for unexpiration of fileset.

User response: None.

6027-3235 [I] AFM: Stopping recovery for the file system fileSystem fileset filesetName

Explanation: AFM recovery terminated because the current node is no longer MDS for the fileset.

User response: None.

6027-3236 [E] AFM: Recovery on file system fileSystem fileset filesetName failed with error err. Recovery will be retried on next access after recovery retry interval (timeout seconds) or manually resolve known problems and recover the fileset.

Explanation: AFM recovery failed to complete on the fileset; the next access will restart recovery. The fileset is temporarily put into a dropped state and will be recovered when it is accessed after the timeout mentioned in the error message. The fileset can also be recovered manually by running the mmafmctl command with the recover option after rectifying any known errors leading to the failure.

User response: None.

6027-3239 [E] AFM: Remote command remoteCmdType on file system fileSystem snapshot snapshotName snapshot ID snapshotId failed.

Explanation: A failure occurred when creating or deleting a peer snapshot.
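Manual recovery of a dropped fileset, as described under 6027-3236, uses the recover option of mmafmctl; assuming a file system named fs1 and a fileset named cacheFileset1 (both placeholders), the invocation would resemble:

   mmafmctl fs1 recover -j cacheFileset1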
6027-3244 [I] AFM: Home mount of afmTarget succeeded for file system fileSystem fileset filesetName. Caching is enabled.

Explanation: A mount of the path from the home cluster succeeded. Caching is enabled.

User response: None.

6027-3245 [E] AFM: Home mount of afmTarget failed with error error for file system fileSystem fileset ID filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on the next request to the gateway.

Explanation: A mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.

User response: Verify that the afmTarget can be mounted using the specified protocol.

6027-3250 [E] AFM: Refresh intervals cannot be set for fileset.

Explanation: Refresh intervals are not supported on DR mode filesets.

User response: None.

6027-3252 [I] AFM: Home has been restored for cache filesetName. Synchronization with home will be resumed.

Explanation: A change in home export was detected that caused the home to be restored. Synchronization with home will be resumed.

User response: None.
6027-3253 [E] AFM: Change in home is detected for cache filesetName. Synchronization with home is suspended until the problem is resolved.

Explanation: A change in home export was detected or the home path is stale.

User response: Ensure the exported path is accessible.

6027-3254 [W] AFM: Home is taking longer than expected to respond for cache filesetName. Synchronization with home is temporarily suspended.

Explanation: A pending message from the gateway node to home is taking longer than expected to respond. This could be the result of a network issue or a problem at the home site.

User response: Ensure the exported path is accessible.

6027-3300 Attribute afmShowHomeSnapshot cannot be changed for a single-writer fileset.

Explanation: Changing afmShowHomeSnapshot is not supported for single-writer filesets.

User response: None.

6027-3301 Unable to quiesce all nodes; some processes are busy or holding required resources.

Explanation: A timeout occurred on one or more nodes while trying to quiesce the file system during a snapshot command.

User response: Check the GPFS log on the file system manager node.

6027-3302 Attribute afmShowHomeSnapshot cannot be changed for a afmMode fileset.

Explanation: Changing afmShowHomeSnapshot is not supported for single-writer or independent-writer filesets.

User response: None.

6027-3303 Cannot restore snapshot; quota management is active for fileSystem.

Explanation: File system quota management is still active. The file system must be unmounted when restoring global snapshots.

User response: Unmount the file system and reissue the restore command.

6027-3304 Attention: Disk space reclaim on number of number regions in fileSystem returned errors.

Explanation: Free disk space reclaims on some regions failed during tsreclaim run. Typically this is due to the lack of space reclaim support by the disk controller or operating system. It may also be due to utilities such as defrag or fsck running concurrently.

User response: Reissue the mmdf command. Verify that the disk controllers and the operating systems in the cluster support thin-provisioning space reclaim. Or rerun the mmreclaim command after defrag or fsck completes.

6027-3305 AFM Fileset filesetName cannot be changed as it is in beingDeleted state

Explanation: The user specified a fileset to tschfileset that cannot be changed.

User response: None. You cannot change the attributes of the root fileset.

6027-3400 Attention: The file system is at risk. The specified replication factor does not tolerate unavailable metadata disks.

Explanation: The default metadata replication was reduced to one while there were unavailable, or stopped, metadata disks. This condition prevents future file system manager takeover.

User response: Change the default metadata replication, or delete unavailable disks if possible.

6027-3401 Failure group value for disk diskName is not valid.

Explanation: An explicit failure group must be specified for each disk that belongs to a write affinity enabled storage pool.

User response: Specify a valid failure group.

6027-3402 [X] An unexpected device mapper path dmDevice (nsdId) was detected. The new path does not have Persistent Reserve enabled. The local access to disk diskName will be marked as down.

Explanation: A new device mapper path was detected, or a previously failed path was activated after the local device discovery was finished. This path lacks a Persistent Reserve and cannot be used. All device paths must be active at mount time.

User response: Check the paths to all disks in the file system. Repair any failed paths to disks then rediscover the local disk access.
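For 6027-3400, the default metadata replication factor is changed with mmchfs. The file system name and the value of 2 below are illustrative only; choose a value your failure-group layout actually supports:

   mmchfs fs1 -m 2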
6027-3404 [E] The current file system version does not support write caching.

Explanation: The current file system version does not allow the write caching option.

User response: Use mmchfs -V to convert the file system to version 14.04 (4.1.0.0) or higher and reissue the command.

6027-3450 Error errorNumber when purging key (file system fileSystem). Key name format possibly incorrect.

User response: Examine the error message following this message for information on the specific failure.

6027-3458 [E] Invalid length for the Keyname string.

Explanation: The Keyname string has an incorrect length. The length of the specified string was either zero or it was larger than the maximum allowed length.

User response: Verify the Keyname string.

6027-3459 [E] Not enough memory.

Explanation: Unable to allocate memory for the Keyname string.

User response: Examine the error messages surrounding this message. Contact the IBM Support Center.
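The file system version upgrade suggested for 6027-3404 is performed with mmchfs -V; assuming the file system is named fs1:

   mmchfs fs1 -V full

The full option enables all features supported by the currently installed level; the compat alternative limits the change to features that earlier releases can still understand.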
6027-3465 [E] Cannot retrieve original key.

Explanation: Original key being used by the file cannot be retrieved from the key server.

User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3466 [E] Cannot retrieve new key.

Explanation: Unable to retrieve the new key specified in the rewrap from the key server.

User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3469 [E] Encryption is enabled but the crypto module could not be initialized. Error code: number. Ensure that the GPFS crypto package was installed.

Explanation: Encryption is enabled, but the cryptographic module required for encryption could not be loaded.

User response: Ensure that the packages required for encryption are installed on each node in the cluster.

6027-3470 [E] Cannot create file fileName: extended attribute is too large: numBytesRequired bytes (numBytesAvailable available) (fileset filesetNumber, file system fileSystem).

Explanation: Unable to create an encryption file because the extended attribute required for encryption is too large.

User response: Change the encryption policy so that the file key is wrapped fewer times, reduce the number of keys used to wrap a file key, or create a file system with a larger inode size.

6027-3471 [E] At least one key must be specified.

Explanation: No key name was specified.

User response: Specify at least one key name.

6027-3472 [E] Could not combine the keys.

Explanation: Unable to combine the keys used to wrap a file key.

User response: Examine the keys being used. Contact the IBM Support Center.

6027-3473 [E] Could not locate the RKM.conf file.

Explanation: Unable to locate the RKM.conf configuration file.

User response: Contact the IBM Support Center.

6027-3474 [E] Could not open fileType file ('fileName' was specified).

6027-3475 [E] Could not read file 'fileName'.

Explanation: Unable to read the specified file.

User response: Ensure that the specified file is accessible from the node.

6027-3476 [E] Could not seek through file 'fileName'.

Explanation: Unable to seek through the specified file. Possible inconsistency in the local file system where the file is stored.

User response: Ensure that the specified file can be read from the local node.

6027-3477 [E] Could not wrap the FEK.

Explanation: Unable to wrap the file encryption key.

User response: Examine other error messages. Verify that the encryption policies being used are correct.

6027-3478 [E] Insufficient memory.

Explanation: Internal error: unable to allocate memory.

User response: Restart GPFS. Contact the IBM Support Center.

6027-3479 [E] Missing combine parameter string.

Explanation: The combine parameter string was not specified in the encryption policy.

User response: Verify the syntax of the encryption policy.
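Messages 6027-3470 through 6027-3484 all point back at the encryption policy rules. As a rough, hypothetical sketch only (the rule names, the ALGO string, and the 'KeyID:RkmID' key name are placeholders and must match what your key server and release actually support), an encryption rule pairs a parameter specification with one or more keys and is then applied by a SET ENCRYPTION rule:

   RULE 'encSpec' ENCRYPTION 'E1' IS
        ALGO 'DEFAULTNISTSP800131A'
        KEYS('KEY-a1b2c3:RKM_1')
   RULE 'applyEnc' SET ENCRYPTION 'E1' WHERE NAME LIKE '%'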
6027-3480 [E] Missing encryption parameter string.

Explanation: The encryption parameter string was not specified in the encryption policy.

User response: Verify the syntax of the encryption policy.

6027-3481 [E] Missing wrapping parameter string.

Explanation: The wrapping parameter string was not specified in the encryption policy.

User response: Verify the syntax of the encryption policy.

6027-3482 [E] 'combineParameter' could not be parsed as a valid combine parameter string.

Explanation: Unable to parse the combine parameter string.

User response: Verify the syntax of the encryption policy.

6027-3483 [E] 'encryptionParameter' could not be parsed as a valid encryption parameter string.

Explanation: Unable to parse the encryption parameter string.

User response: Verify the syntax of the encryption policy.

6027-3484 [E] 'wrappingParameter' could not be parsed as a valid wrapping parameter string.

Explanation: Unable to parse the wrapping parameter string.

User response: Verify the syntax of the encryption policy.

6027-3485 [E] The Keyname string cannot be longer than number characters.

Explanation: The specified Keyname string has too many characters.

User response: Verify that the specified Keyname string is correct.

6027-3486 [E] The KMIP library could not be initialized.

Explanation: The KMIP library used to communicate with the key server could not be initialized.

User response: Restart GPFS. Contact the IBM Support Center.

6027-3487 [E] The RKM ID cannot be longer than number characters.

Explanation: The remote key manager ID cannot be longer than the specified length.

User response: Use a shorter remote key manager ID.

6027-3488 [E] The length of the key ID cannot be zero.

Explanation: The length of the specified key ID string cannot be zero.

User response: Specify a key ID string with a valid length.

6027-3489 [E] The length of the RKM ID cannot be zero.

Explanation: The length of the specified RKM ID string cannot be zero.

User response: Specify an RKM ID string with a valid length.

6027-3490 [E] The maximum size of the RKM.conf file currently supported is number bytes.

Explanation: The RKM.conf file is larger than the size that is currently supported.

User response: Use a smaller RKM.conf configuration file.

6027-3491 [E] The string 'Keyname' could not be parsed as a valid key name.

Explanation: The specified string could not be parsed as a valid key name.

User response: Specify a valid Keyname string.

6027-3493 [E] numKeys keys were specified but a maximum of numKeysMax is supported.

Explanation: The maximum number of specified key IDs was exceeded.

User response: Change the encryption policy to use fewer keys.

6027-3494 [E] Unrecognized cipher mode.

Explanation: Unable to recognize the specified cipher mode.

User response: Specify one of the valid cipher modes.
6027-3496 [E] Unrecognized combine mode.

Explanation: Unable to recognize the specified combine mode.

User response: Specify one of the valid combine modes.

6027-3497 [E] Unrecognized encryption mode.

Explanation: Unable to recognize the specified encryption mode.

Explanation: The specified cipher mode was not recognized.

User response: Specify a valid cipher mode.

6027-3504 [E] Unrecognized encryption mode ('mode').

Explanation: The specified encryption mode was not recognized.

User response: Specify a valid encryption mode.

6027-3505 [E] Invalid key length ('keyLength').

Explanation: The specified key length was incorrect.

User response: Specify a valid key length.

User response: Correct the syntax in RKM.conf.

6027-3512 [E] The specified type 'type' for backend 'backend' is invalid.

Explanation: An incorrect type was specified for a key server backend.

6027-3517 [E] Could not open library (libName).

Explanation: Unable to open the specified library.

User response: Verify that all required packages are installed for encryption. Contact the IBM Support Center.

6027-3518 [E] The length of the RKM ID string is invalid (must be between 0 and length characters).

Explanation: The length of the RKM backend ID is invalid.

User response: Specify an RKM backend ID with a valid length.

Explanation: The specified passphrase is incorrect for the backend.

User response: Ensure that the correct passphrase is used for the backend in RKM.conf.

6027-3527 [E] Backend 'backend' could not be initialized (error errorNumber).

Explanation: Key server backend could not be initialized.

User response: Examine the error messages. Verify connectivity to the server. Contact the IBM Support Center.

6027-3528 [E] Unrecognized wrapping mode ('wrapMode').

Explanation: The specified key wrapping mode was not recognized.

User response: Specify a valid key wrapping mode.

Explanation: The required license is missing for the GPFS encryption package.

User response: Ensure that the GPFS encryption package was installed properly.

6027-3537 [E] Setting default encryption parameters requires empty combine and wrapping parameter strings.

Explanation: A non-empty combine or wrapping parameter string was used in an encryption policy rule that also uses the default parameter string.

User response: Ensure that neither the combine nor the wrapping parameter is set when the default parameter string is used in the encryption rule.

6027-3546 [E] Key 'keyID:rkmID' could not be fetched. The specified RKM ID does not exist; check the RKM.conf settings.

Explanation: The specified RKM ID part of the key name does not exist, and therefore the key cannot be retrieved. The corresponding RKM might have been removed from RKM.conf.

User response: Check the set of RKMs specified in RKM.conf.
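The RKM.conf checks behind 6027-3512, 6027-3518, 6027-3527, and 6027-3546 concern stanzas of roughly the following shape. This is a hypothetical sketch only: the RKM ID (RKM_1), server address, paths, and several field names (kmipServerUri, keyStore, clientCertLabel, tenantName) are illustrative assumptions, and the exact set of fields depends on the backend type in use; only the stanza name (the RKM ID), the type, and the passphrase are directly referenced by the messages above:

   RKM_1 {
       type = ISKLM
       kmipServerUri = tls://keyserver.example.com:5696
       keyStore = /var/mmfs/etc/RKMcerts/keystore.p12
       passphrase = keystorePassword
       clientCertLabel = gpfsClientCert
       tenantName = GPFS_TENANT1
   }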
6027-3547 [E] Key 'keyID:rkmID' could not be fetched. The connection was reset by the peer while performing the TLS handshake.

Explanation: The specified key could not be retrieved from the server, because the connection with the server was reset while performing the TLS handshake.

User response: Check connectivity to the server. Check credentials to access the server. Contact the IBM Support Center.

6027-3548 [E] Key 'keyID:rkmID' could not be fetched. The IP address of the RKM could not be resolved.

Explanation: The specified key could not be retrieved from the server because the IP address of the server could not be resolved.

6027-3700 [E] Key 'keyID' was not found on RKM ID 'rkmID'.

Explanation: The specified key could not be retrieved from the key server.

User response: Verify that the key is present at the server. Verify that the name of the keys used in the encryption policy is correct.

6027-3701 [E] Key 'keyID:rkmID' could not be fetched. The authentication with the RKM was not successful.

Explanation: Unable to authenticate with the key server.

User response: Verify that the credentials used to authenticate with the key server are correct.
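Key-retrieval problems such as 6027-3546 through 6027-3701 often surface only when a policy is applied, so it can help to validate an encryption policy without installing it. The mmchpolicy command supports a test mode; the file system and policy file names below are illustrative:

   mmchpolicy fs1 encrypt.pol -I test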
Accessibility features
The following list includes the major accessibility features in GPFS:
v Keyboard-only operation
v Interfaces that are commonly used by screen readers
v Keys that are discernible by touch but do not activate just by touching them
v Industry-standard devices for ports and connectors
v The attachment of alternative input and output devices
The IBM Cluster Information Center, and its related publications, are accessibility-enabled. The accessibility features of the information center are described in the Accessibility topic at the following URL: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.addinfo.doc/access.html.
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
IBM Corporation
Dept. 30ZA/Building 707
Mail Station P300
2455 South Road,
Poughkeepsie, NY 12601-5400
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment or a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or
any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States,
other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Glossary
This glossary provides terms and definitions for the GPFS product.

The following cross-references are used in this glossary:
v See refers you from a nonpreferred term to the preferred term or from an abbreviation to the spelled-out form.
v See also refers you to a related or contrasting term.

For other terms and definitions, see the IBM Terminology website (http://www.ibm.com/software/globalization/terminology/) (opens in new window).

B

block utilization
    The measurement of the percentage of used subblocks per allocated blocks.

C

cluster
    A loosely-coupled collection of independent systems (nodes) organized into a network for the purpose of sharing resources and communicating with each other. See also GPFS cluster.

cluster configuration data
    The configuration data that is stored on the cluster configuration servers.

cluster manager
    The node that monitors node status using disk leases, detects failures, drives recovery, and selects file system managers. The cluster manager is the node with the lowest node number among the quorum nodes that are operating at a particular time.

control data structures
    Data structures needed to manage file data and metadata cached in memory. Control data structures include hash tables and link pointers for finding cached data; lock states and tokens to implement distributed locking; and various flags and sequence numbers to keep track of updates to the cached data.

D

Data Management Application Program Interface (DMAPI)
    The interface defined by the Open Group's XDSM standard as described in the publication System Management: Data Storage Management (XDSM) API Common Application Environment (CAE) Specification C429, The Open Group ISBN 1-85912-190-X.

deadman switch timer
    A kernel timer that works on a node that has lost its disk lease and has outstanding I/O requests. This timer ensures that the node cannot complete the outstanding I/O requests (which would risk causing file system corruption), by causing a panic in the kernel.

dependent fileset
    A fileset that shares the inode space of an existing independent fileset.

disk descriptor
    A definition of the type of data that the disk contains and the failure group to which this disk belongs. See also failure group.

disk leasing
    A method for controlling access to storage devices from multiple host systems. Any host that wants to access a storage device configured to use disk leasing registers for a lease; in the event of a perceived failure, a host system can deny access, preventing I/O operations with the storage device until the preempted system has reregistered.

disposition
    The session to which a data management event is delivered. An individual disposition is set for each type of event from each file system.

domain
    A logical grouping of resources in a network for the purpose of common management and administration.
P

policy
    A list of file-placement, service-class, and encryption rules that define characteristics and placement of files. Several policies can be defined within the configuration, but only one policy set is active at one time.

policy rule
    A programming statement within a policy that defines a specific action to be performed.

pool
    A group of resources with similar characteristics and attributes.

portability
    The ability of a programming language to compile successfully on different operating systems without requiring changes to the source code.

primary GPFS cluster configuration server
    In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data.

private IP address
    An IP address used to communicate on a private network.

public IP address
    An IP address used to communicate on a public network.

Q

quorum node
    A node in the cluster that is counted to determine whether a quorum exists.

quota
    The amount of disk space and number of inodes assigned as upper limits for a specified user, group of users, or fileset.

quota management
    The allocation of disk blocks to the other nodes writing to the file system, and comparison of the allocated space to quota limits at regular intervals.

R

Redundant Array of Independent Disks (RAID)
    A collection of two or more disk physical drives that present to the host an image of one or more logical disk drives. In the event of a single physical device failure, the data can be read or regenerated from the other disk drives in the array due to data redundancy.

recovery
    The process of restoring access to file system data when a failure has occurred. Recovery can involve reconstructing data or providing alternative routing through a different server.

replication
    The process of maintaining a defined set of data in more than one location. Replication involves copying designated changes for one location (a source) to another (a target), and synchronizing the data in both locations.

RKM server
    Remote key management server. An RKM server is used to store MEKs.

rule
    A list of conditions and actions that are triggered when certain conditions are met. Conditions include attributes about an object (file name, type or extension, dates, owner, and groups), the requesting client, and the container name associated with the object.

S

SAN-attached
    Disks that are physically attached to all nodes in the cluster using Serial Storage Architecture (SSA) connections or using Fibre Channel switches.

Scale Out Backup and Restore (SOBAR)
    A specialized mechanism for data protection against disaster only for GPFS file systems that are managed by Tivoli Storage Manager (TSM) Hierarchical Storage Management (HSM).

secondary GPFS cluster configuration server
    In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data in the event that the primary GPFS cluster configuration server fails or becomes unavailable.

Secure Hash Algorithm digest (SHA digest)
    A character string used to identify a GPFS security key.

session failure
    The loss of all resources of a data management session due to the failure of the daemon on the session node.
Index
Special characters cipherList 68
Clearing a leftover Persistent Reserve reservation 103
/etc/filesystems 62 client node 69
/etc/fstab 62 clock synchronization 2, 77
/etc/hosts 42 cluster
/etc/resolv.conf 60 deleting a node 57
/tmp/mmfs 110 cluster configuration information
/tmp/mmfs directory 115 displaying 18
/usr/lpp/mmfs/bin 46 cluster data
/usr/lpp/mmfs/bin/runmmfs 12 backup 45
/usr/lpp/mmfs/samples/gatherlogs.sample.sh file 2 cluster file systems
/var/adm/ras/mmfs.log.latest 1 displaying 19
/var/adm/ras/mmfs.log.previous 1, 57 cluster security configuration 66
/var/mmfs/etc/mmlock 44 cluster state information 17
/var/mmfs/gen/mmsdrfs 45 commands
.ptrash directory 112 conflicting invocation 61
.rhosts 43 errpt 115
.snapshots 82, 84, 85 gpfs.snap 6, 7, 8, 9, 10, 115
grep 3
lslpp 115
A lslv 109
access lsof 24, 70, 71
to disk 95 lspv 101
ACCESS_TIME attribute 29, 30 lsvg 100
accessibility features for the GPFS product 257 lxtrace 11
active file management in disconnected mode 112 mmadddisk 75, 80, 97, 100, 102
active file management, questions related to 111 mmaddnode 54, 55, 110
active file management, resync in 111 mmafmctl Device getstate 17
adding encryption policies 107 mmapplypolicy 25, 77, 78, 80
administration commands mmauth 35, 67
failure 44 mmbackup 81
AFM fileset, changing mode of 112 mmchcluster 43
AFM in disconnected mode 112 mmchconfig 19, 47, 55, 69, 110
AFM, extended attribute size supported by 112 mmchdisk 62, 72, 75, 80, 91, 94, 95, 97, 99
AFM, resync in 111 mmcheckquota 5, 31, 59, 72
AIX mmchfs 5, 46, 54, 57, 62, 63, 64, 72
kernel debugger 39 mmchnsd 91
AIX error logs mmcommon recoverfs 75
MMFS_DISKFAIL 95 mmcommon showLocks 44
MMFS_QUOTA 72 mmcrcluster 19, 43, 47, 54, 110
unavailable disks 72 mmcrfs 57, 58, 91, 102
application programs mmcrnsd 91, 94
errors 3, 5, 50, 58 mmcrsnapshot 83, 84
authorization error 43 mmdeldisk 75, 80, 97, 100
autofs 65 mmdelfileset 79
autofs mount 64 mmdelfs 98, 99
autoload option mmdelnode 55, 57
on mmchconfig command 47 mmdelnsd 94, 98
on mmcrcluster command 47 mmdelsnapshot 83
automount 63, 69 mmdf 53, 75, 100
automount daemon 64 mmdiag 17
automount failure 63, 65 mmexpelnode 20
mmfileid 33, 88, 97
mmfsadm 10, 15, 49, 54, 97
C mmfsck 23, 61, 62, 80, 84, 88, 97, 100, 111
mmgetstate 17, 48, 56
candidate file 25, 28 mmlsattr 78, 79
attributes 29 mmlscluster 18, 55, 67, 109
changing mode of AFM fileset 112 mmlsconfig 11, 19, 64
checking, Persistent Reserve 103 mmlsdisk 58, 61, 62, 72, 75, 91, 94, 96, 99, 116
chosen file 25, 27 mmlsfileset 79
CIFS serving, Windows SMB2 protocol 60
D
data E
replicated 97 enabling Persistent Reserve manually 104
data always gathered by gpfs.snap 8 encryption policies, adding 107
for a master snapshot 10 encryption problems 107
on AIX 9 ERRNO I/O error code 57
on all platforms 8 error codes
on Linux 9 EIO 3, 91, 98
failure groups (continued) filesets (continued)
use of 72 problems 75
failure of mmchpolicy 107 snapshots 79
failure, key rewrap 107 unlinking 79
failure, mount 107 usage errors 79
failures FSDesc structure 72
mmbackup 81 full file system or fileset 112
file creation failure 107
file creation, opening, reading, writing (failure) 107
file migration
problems 78
G
gathering data to solve GPFS problems 6
File Placement Optimizer (FPO), questions related to 113
generating GPFS trace reports
file placement policy 78
mmtracectl command 11
file system descriptor 72, 73
GPFS
failure groups 72
data integrity 88
inaccessible 72
nodes will not start 49
file system manager
replication 96
cannot appoint 71
unable to start 41
contact problems
GPFS cluster
communication paths unavailable 62
problems adding nodes 54
multiple failures 74
recovery from loss of GPFS cluster configuration data
file system mount failure 107
files 45
file system or fileset getting full 112
GPFS cluster data
file systems
backup 45
cannot be unmounted 24
locked 44
creation failure 57
GPFS cluster data files storage 45
determining if mounted 73
GPFS command
discrepancy between configuration data and on-disk
failed 56
data 75
return code 56
forced unmount 5, 71, 74
unsuccessful 56
free space shortage 84
GPFS configuration data 75
listing mounted 24
GPFS daemon 43, 47, 61, 70
loss of access 58
crash 50
not consistent 84
fails to start 47
remote 65
went down 4, 50
unable to determine if mounted 73
will not start 47
will not mount 23, 24, 61
GPFS daemon went down 50
will not unmount 70
GPFS is not using the underlying multipath device 105
FILE_SIZE attribute 29, 30
GPFS kernel extension 46
files
GPFS local node failure 68
/etc/filesystems 62
GPFS log 1, 2, 47, 48, 50, 61, 64, 65, 66, 67, 68, 69, 71, 115
/etc/fstab 62
GPFS messages 121
/etc/group 4
GPFS modules
/etc/hosts 42
cannot be loaded 46
/etc/passwd 4
GPFS problems 41, 61, 91
/etc/resolv.conf 60
GPFS startup time 2
/usr/lpp/mmfs/bin/runmmfs 12
GPFS trace facility 11
/usr/lpp/mmfs/samples/gatherlogs.sample.sh 2
GPFS Windows SMB2 protocol (CIFS serving) 60
/var/adm/ras/mmfs.log.latest 1
gpfs.snap command 6, 115
/var/adm/ras/mmfs.log.previous 1, 57
data always gathered for a master snapshot 10
/var/mmfs/etc/mmlock 44
data always gathered on AIX 9
/var/mmfs/gen/mmsdrfs 45
data always gathered on all platforms 8
.rhosts 43
data always gathered on Linux 9
detecting damage 33
data always gathered on Windows 10
mmfs.log 1, 2, 47, 48, 50, 61, 64, 65, 66, 67, 68, 69, 71, 115
using 7
mmsdrbackup 45
grep command 3
mmsdrfs 45
Group Services
FILESET_NAME attribute 29, 30
verifying quorum 48
filesets
GROUP_ID attribute 29, 30
child 79
deleting 79
dropped 112
emptying 79 H
errors 79 hard loop ID 42
lost+found 80 hints and tips for GPFS problems 109
moving contents 79 Home and .ssh directory ownership and permissions 59
performance 79
mmfsadm command 10, 15, 49, 54, 97 mount command 24, 61, 62, 63, 84, 98, 102
mmfsck 80 mount failure 107
mmfsck command 23, 61, 62, 84, 88, 97, 100 Multi-Media LAN Server 1
failure 111
mmfsd 47, 61, 70
will not start 47
mmfslinux
N
network failure 51
kernel module 46
network problems 3
mmgetstate command 17, 48, 56
NFS
mmlock directory 44
problems 87
mmlsattr 78, 79
NFS client
mmlscluster command 18, 55, 67, 109
with stale inode data 87
mmlsconfig command 11, 19, 64
NFS V4
mmlsdisk command 58, 61, 62, 72, 75, 91, 94, 96, 99, 116
problems 87
mmlsfileset 79
NO SUCH DIRECTORY error code 50
mmlsfs command 63, 97, 98, 115
NO SUCH FILE error code 50
mmlsmgr command 11, 62
NO_SPACE
mmlsmount command 24, 47, 58, 61, 70, 71, 91
error 75
mmlsnsd 31
node
mmlsnsd command 92, 93, 100
crash 117
mmlspolicy 78
hang 117
mmlsquota command 58, 59
rejoin 70
mmlssnapshot command 82, 83, 84
node crash 42
mmmount command 23, 61, 72, 102
node failure 52
mmpmon
node reinstall 42
abend 86
nodes
altering input file 85
cannot be added to GPFS cluster 54
concurrent usage 85
non-quorum node 109
counters wrap 86
notices 259
dump 86
NSD 100
hang 86
creating 94
incorrect input 85
deleting 94
incorrect output 86
displaying information of 92
restrictions 85
extended information 93
setup problems 85
failure 91
trace 86
NSD disks
unsupported features 86
creating 91
mmpmon command 39, 85
using 91
mmquotaoff command 59
NSD server 68, 69, 70
mmquotaon command 59
nsdServerWaitTimeForMount
mmrefresh command 19, 62, 64
changing 70
mmremotecluster 35
nsdServerWaitTimeWindowOnMount
mmremotecluster command 67, 68
changing 70
mmremotefs command 64, 67
mmrepquota command 59
mmrestorefs command 83, 84, 85
mmrestripefile 78, 81 O
mmrestripefs 81 opening a file, failure 107
mmrestripefs command 97, 100 OpenSSH connection delays
mmrpldisk 80 Windows 60
mmrpldisk command 75, 102 orphaned file 80
mmsdrbackup 45
mmsdrfs 45
mmsdrrestore command 20
mmshutdown command 18, 20, 48, 50, 64, 65
P
partitioning information, viewing 32
mmsnapdir command 82, 84, 85
patent information 259
mmstartup command 47, 64, 65
performance 43
mmtracectl command
permission denied
generating GPFS trace reports 11
remote mounts fail 69
mmumount command 70, 72
permission denied failure 107
mmunlinkfileset 79
permission denied failure (key rewrap) 107
mmwindisk 32
Persistent Reserve
mode of AFM fileset, changing 112
checking 103
MODIFICATION_TIME attribute 29, 30
clearing a leftover reservation 103
module is incompatible 47
errors 102
mount
manually enabling or disabling 104
problems 69
understanding 102
syslog facility trace (continued)
Linux 3 tasking system 14
syslogd 65 token manager 14
system load 110 ts commands 12
system snapshots 6, 7 vdisk 15
system storage pool 77, 80 vdisk debugger 14
vdisk hospital 15
vnode layer 15
T trace classes 12
trace facility 11, 12
the IBM Support Center 117
mmfsadm command 10
threads
trace level 15
tuning 43
trace reports, generating 11
waiting 54
trademarks 260
Tivoli Storage Manager server 81
traversing a directory that has not been cached 112
trace
troubleshooting errors 59
active file management 12
troubleshooting Windows errors 59
allocation manager 12
TSM client 81
basic classes 12
TSM server 81
behaviorals 14
MAXNUMMP 82
byte range locks 12
tuning 43
call to routines in SharkMsg.h 13
checksum services 12
cleanup routines 12
cluster security 14 U
concise vnop description 15 umount command 71, 72, 100
daemon routine entry/exit 13 underlying multipath device 105
daemon specific code 14 understanding, Persistent Reserve 102
data shipping 13 useNSDserver attribute 99
defragmentation 12 USER_ID attribute 29, 30
dentry operations 13 using the gpfs.snap command 7
disk lease 13 gathering data 6
disk space allocation 12
DMAPI 13
error logging 13
events exporter 13
V
value too large failure 107
file operations 13
varyon problems 101
file system 13
varyonvg command 102
generic kernel vfs information 13
viewing disks and partitioning information 32
inode allocation 13
volume group 101
interprocess locking 13
kernel operations 13
kernel routine entry/exit 13
low-level vfs locking 13 W
mailbox message handling 13 Windows 59
malloc/free in shared segment 13 file system mounted on the wrong drive letter 111
miscellaneous tracing and debugging 14 Home and .ssh directory ownership and permissions 59
mmpmon 13 mounted file systems, Windows 111
mnode operations 13 OpenSSH connection delays 60
mutexes and condition variables 14 problem seeing newly mounted file systems 111
network shared disk 14 problem seeing newly mounted Windows file systems 111
online multinode fsck 13 problems running as administrator 60
operations in Thread class 14 Windows 111
page allocator 14 Windows SMB2 protocol (CIFS serving) 60
parallel inode tracing 14 writing to a file, failure 107
performance monitors 14
physical disk I/O 13
physical I/O 13
pinning to real memory 14
quota management 14
rdma 14
recovery log 13
SANergy 14
scsi services 14
shared segments 14
SMB locks 14
SP message handling 14
super operations 14
Printed in USA
GA76-0443-00