
General Parallel File System

Version 4 Release 1

Problem Determination Guide



GA76-0443-00
Note
Before using this information and the product it supports, read the information in “Notices” on page 259.

This edition applies to version 4 release 1 modification 0 of the following products, and to all subsequent releases
and modifications until otherwise indicated in new editions:
v IBM General Parallel File System ordered through Passport Advantage® (product number 5725-Q01)
v IBM General Parallel File System ordered through AAS/eConfig (product number 5641-GPF)
v IBM General Parallel File System ordered through HVEC/Xcel (product number 5641-GP6, 5641-GP7, or
5641-GP8)
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
IBM welcomes your comments; see the topic “How to send your comments” on page xii. When you send
information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes
appropriate without incurring any obligation to you.
© Copyright IBM Corporation 2014.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Tables  vii

About this information  ix
   Prerequisite and related information  x
   Conventions used in this information  xi
   How to send your comments  xii

Summary of changes  xiii

Chapter 1. Logs, dumps, and traces  1
   The GPFS log  1
   Creating a master GPFS log file  2
   The operating system error log facility  2
   MMFS_ABNORMAL_SHUTDOWN  3
   MMFS_DISKFAIL  3
   MMFS_ENVIRON  3
   MMFS_FSSTRUCT  3
   MMFS_GENERIC  4
   MMFS_LONGDISKIO  4
   MMFS_QUOTA  4
   MMFS_SYSTEM_UNMOUNT  5
   MMFS_SYSTEM_WARNING  5
   Error log entry example  5
   The gpfs.snap command  6
   Using the gpfs.snap command  7
   Data always gathered by gpfs.snap on all platforms  8
   Data always gathered by gpfs.snap on AIX  9
   Data always gathered by gpfs.snap on Linux  9
   Data always gathered by gpfs.snap on Windows  10
   Data always gathered by gpfs.snap for a master snapshot  10
   The mmfsadm command  10
   The GPFS trace facility  11
   Generating GPFS trace reports  11

Chapter 2. GPFS cluster state information  17
   The mmafmctl Device getstate command  17
   The mmdiag command  17
   The mmgetstate command  17
   The mmlscluster command  18
   The mmlsconfig command  19
   The mmrefresh command  19
   The mmsdrrestore command  20
   The mmexpelnode command  20

Chapter 3. GPFS file system and disk information  23
   Restricted mode mount  23
   Read-only mode mount  23
   The lsof command  24
   The mmlsmount command  24
   The mmapplypolicy -L command  25
   mmapplypolicy -L 0  26
   mmapplypolicy -L 1  26
   mmapplypolicy -L 2  27
   mmapplypolicy -L 3  28
   mmapplypolicy -L 4  29
   mmapplypolicy -L 5  29
   mmapplypolicy -L 6  30
   The mmcheckquota command  31
   The mmlsnsd command  31
   The mmwindisk command  32
   The mmfileid command  33
   The SHA digest  35

Chapter 4. Deadlock amelioration  37
   Automated deadlock detection  37
   Automated deadlock data collection  38
   Automated deadlock breakup  38

Chapter 5. Other problem determination tools  39

Chapter 6. GPFS installation, configuration, and operation problems  41
   Installation and configuration problems  41
   What to do after a node of a GPFS cluster crashes and has been reinstalled  42
   Problems with the /etc/hosts file  42
   Linux configuration considerations  42
   Problems with running commands on other nodes  43
   GPFS cluster configuration data files are locked  44
   Recovery from loss of GPFS cluster configuration data file  45
   Automatic backup of the GPFS cluster data  45
   Error numbers specific to GPFS application calls  46
   GPFS modules cannot be loaded on Linux  46
   GPFS daemon will not come up  47
   Steps to follow if the GPFS daemon does not come up  47
   Unable to start GPFS after the installation of a new release of GPFS  49
   GPFS error messages for shared segment and network problems  49
   Error numbers specific to GPFS application calls when the daemon is unable to come up  49
   GPFS daemon went down  50
   GPFS failures due to a network failure  51
   Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O requests' message  52
   Quorum loss  52
   Delays and deadlocks  53
   Node cannot be added to the GPFS cluster  54
   Remote node expelled after remote file system successfully mounted  55
   Disaster recovery problems  55
   Disaster recovery setup problems  56
   Other problems with disaster recovery  56
   GPFS commands are unsuccessful  56
   GPFS error messages for unsuccessful GPFS commands  58
   Application program errors  58
   GPFS error messages for application program errors  59
   Troubleshooting Windows problems  59
   Home and .ssh directory ownership and permissions  59
   Problems running as Administrator  60
   GPFS Windows and SMB2 protocol (CIFS serving)  60
   OpenSSH connection delays  60

Chapter 7. GPFS file system problems  61
   File system will not mount  61
   GPFS error messages for file system mount problems  63
   Error numbers specific to GPFS application calls when a file system mount is not successful  63
   Automount file system will not mount  63
   Remote file system will not mount  65
   Mount failure due to client nodes joining before NSD servers are online  69
   File system will not unmount  70
   File system forced unmount  71
   Additional failure group considerations  72
   GPFS error messages for file system forced unmount problems  73
   Error numbers specific to GPFS application calls when a file system has been forced to unmount  73
   Unable to determine whether a file system is mounted  73
   GPFS error messages for file system mount status  74
   Multiple file system manager failures  74
   GPFS error messages for multiple file system manager failures  74
   Error numbers specific to GPFS application calls when file system manager appointment fails  75
   Discrepancy between GPFS configuration data and the on-disk data for a file system  75
   Errors associated with storage pools, filesets and policies  75
   A NO_SPACE error occurs when a file system is known to have adequate free space  75
   Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'  77
   Policies - usage errors  77
   Errors encountered with policies  78
   Filesets - usage errors  79
   Errors encountered with filesets  79
   Storage pools - usage errors  80
   Errors encountered with storage pools  81
   Failures using the mmbackup command  81
   GPFS error messages for mmbackup errors  82
   TSM error messages  82
   Snapshot problems  82
   Problems with locating a snapshot  82
   Problems not directly related to snapshots  82
   Snapshot usage errors  83
   Snapshot status errors  83
   Errors encountered when restoring a snapshot  84
   Snapshot directory name conflicts  84
   Failures using the mmpmon command  85
   Setup problems using mmpmon  85
   Incorrect output from mmpmon  86
   Abnormal termination or hang in mmpmon  86
   NFS problems  87
   NFS client with stale inode data  87
   NFS V4 problems  87
   Problems working with Samba  87
   Data integrity  88
   Error numbers specific to GPFS application calls when data integrity may be corrupted  88
   Messages requeuing in AFM  88

Chapter 8. GPFS disk problems  91
   NSD and underlying disk subsystem failures  91
   Error encountered while creating and using NSD disks  91
   Displaying NSD information  92
   NSD creation fails with a message referring to an existing NSD  94
   GPFS has declared NSDs as down  94
   Unable to access disks  95
   Guarding against disk failures  96
   Disk media failure  96
   Disk connectivity failure and recovery  99
   Partial disk failure  100
   GPFS has declared NSDs built on top of AIX logical volumes as down  100
   Verify logical volumes are properly defined for GPFS use  100
   Check the volume group on each node  101
   Volume group varyon problems  101
   Disk accessing commands fail to complete due to problems with some non-IBM disks  102
   Persistent Reserve errors  102
   Understanding Persistent Reserve  102
   Checking Persistent Reserve  103
   Clearing a leftover Persistent Reserve reservation  103
   Manually enabling or disabling Persistent Reserve  104
   GPFS is not using the underlying multipath device  105

Chapter 9. GPFS encryption problems  107
   Unable to add encryption policies (failure of mmchpolicy)  107
   "Permission denied" failure when creating, opening, reading, or writing to a file  107
   "Value too large" failure when creating a file  107
   Mount failure for a file system with encryption rules  107
   "Permission denied" failure of key rewrap  107

Chapter 10. Other problem determination hints and tips  109
   Which physical disk is associated with a logical volume?  109
   Which nodes in my cluster are quorum nodes?  109
   What is stored in the /tmp/mmfs directory and why does it sometimes disappear?  110
   Why does my system load increase significantly during the night?  110
   What do I do if I receive message 6027-648?  110
   Why can't I see my newly mounted Windows file system?  111
   Why is the file system mounted on the wrong drive letter?  111
   Why does the offline mmfsck command fail with "Error creating internal storage"?  111
   Questions related to active file management  111
   Questions related to File Placement Optimizer (FPO)  113

Chapter 11. Contacting IBM  115
   Information to collect before contacting the IBM Support Center  115
   How to contact the IBM Support Center  117

Chapter 12. Message severity tags  119

Chapter 13. Messages  121

Accessibility features for GPFS  257
   Accessibility features  257
   Keyboard navigation  257
   IBM and accessibility  257

Notices  259
   Trademarks  260

Glossary  263

Index  269
Tables

1. GPFS library information units  ix
2. Conventions  xi
3. Message severity tags ordered by priority  119
About this information
This edition applies to GPFS™ version 4.1 for AIX®, Linux, and Windows.

To find out which version of GPFS is running on a particular AIX node, enter:
lslpp -l gpfs\*

To find out which version of GPFS is running on a particular Linux node, enter:
rpm -qa | grep gpfs

To find out which version of GPFS is running on a particular Windows node, open the Programs and
Features control panel. The IBM® General Parallel File System installed program name includes the
version number.

Which GPFS information unit provides the information you need?

The GPFS library consists of the information units listed in Table 1.

To use these information units effectively, you must be familiar with the GPFS licensed product and the
AIX, Linux, or Windows operating system, or all of them, depending on which operating systems are in
use at your installation. Where necessary, these information units provide some background information
relating to AIX, Linux, or Windows; however, more commonly they refer to the appropriate operating
system documentation.
Table 1. GPFS library information units

GPFS: Administration and Programming Reference
   Type of information: This information unit explains how to do the following:
   v Use the commands, programming interfaces, and user exits unique to GPFS
   v Manage clusters, file systems, disks, and quotas
   v Export a GPFS file system using the Network File System (NFS) protocol
   Intended users: System administrators or programmers of GPFS systems

GPFS: Advanced Administration Guide
   Type of information: This information unit explains how to use the following advanced features of GPFS:
   v Accessing GPFS file systems from other GPFS clusters
   v Policy-based data management for GPFS
   v Creating and maintaining snapshots of GPFS file systems
   v Establishing disaster recovery for your GPFS cluster
   v Monitoring GPFS I/O performance with the mmpmon command
   v Miscellaneous advanced administration topics
   Intended users: System administrators or programmers seeking to understand and use the advanced features of GPFS

GPFS: Concepts, Planning, and Installation
   Type of information: This information unit provides information about the following topics:
   v Introducing GPFS
   v GPFS architecture
   v Planning concepts for GPFS
   v Installing GPFS
   v Migration, coexistence and compatibility
   v Applying maintenance
   v Configuration and tuning
   v Uninstalling GPFS
   Intended users: System administrators, analysts, installers, planners, and programmers of GPFS clusters who are very experienced with the operating systems on which each GPFS cluster is based

GPFS: Data Management API Guide
   Type of information: This information unit describes the Data Management Application Programming Interface (DMAPI) for GPFS.
   This implementation is based on The Open Group's System Management: Data Storage Management (XDSM) API Common Applications Environment (CAE) Specification C429, The Open Group, ISBN 1-85912-190-X specification. The implementation is compliant with the standard. Some optional features are not implemented.
   The XDSM DMAPI model is intended mainly for a single-node environment. Some of the key concepts, such as sessions, event delivery, and recovery, required enhancements for a multiple-node environment such as GPFS.
   Use this information if you intend to write application programs to do the following:
   v Monitor events associated with a GPFS file system or with an individual file
   v Manage and maintain GPFS file system data
   Intended users: Application programmers who are experienced with GPFS systems and familiar with the terminology and concepts in the XDSM standard

GPFS: Problem Determination Guide
   Type of information: This information unit contains explanations of GPFS error messages and explains how to handle problems you may encounter with GPFS.
   Intended users: System administrators of GPFS systems who are experienced with the subsystems used to manage disks and who are familiar with the concepts presented in GPFS: Concepts, Planning, and Installation

Prerequisite and related information


For updates to this information, see the IBM Cluster information center (http://publib.boulder.ibm.com/
infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfsbooks.html) or GPFS in the IBM Knowledge
Center (http://www.ibm.com/support/knowledgecenter/SSFKCN/welcome).



For the latest support information, see the GPFS FAQ in the IBM Cluster information center
(http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/
gpfsclustersfaq.html) or GPFS FAQ in the IBM Knowledge Center (http://www.ibm.com/support/
knowledgecenter/SSFKCN/gpfsclustersfaq.html).

Conventions used in this information


Table 2 describes the typographic conventions used in this information. UNIX file name conventions are
used throughout this information.

Note: Users of GPFS for Windows must be aware that on Windows, UNIX-style file names need to be
converted appropriately. For example, the GPFS cluster configuration data is stored in the
| /var/mmfs/gen/mmsdrfs file. On Windows, the UNIX name space starts under the %SystemDrive%\cygwin
| directory, so the GPFS cluster configuration data is stored in the C:\cygwin\var\mmfs\gen\mmsdrfs file.
Table 2. Conventions

bold
   Bold words or characters represent system elements that you must use literally, such as commands, flags, values, and selected menu options.
   Depending on the context, bold typeface sometimes represents path names, directories, or file names.

bold underlined
   Bold underlined keywords are defaults. These take effect if you do not specify a different keyword.

constant width
   Examples and information that the system displays appear in constant-width typeface.
   Depending on the context, constant-width typeface sometimes represents path names, directories, or file names.

italic
   Italic words or characters represent variable values that you must supply.
   Italics are also used for information unit titles, for the first use of a glossary term, and for general emphasis in text.

<key>
   Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.

\
   In command examples, a backslash indicates that the command or coding example continues on the next line. For example:
   mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
   -E "PercentTotUsed < 85" -m p "FileSystem space used"

{item}
   Braces enclose a list from which you must choose an item in format and syntax descriptions.

[item]
   Brackets enclose optional items in format and syntax descriptions.

<Ctrl-x>
   The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.

item...
   Ellipses indicate that you can repeat the preceding item one or more times.

|
   In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line means Or.
   In the left margin of the document, vertical lines indicate technical changes to the information.



How to send your comments
Your feedback is important in helping us to produce accurate, high-quality information. If you have any
comments about this information or any other GPFS documentation, send your comments to the
following e-mail address:

[email protected]

Include the publication title and order number, and, if applicable, the specific location of the information
about which you have comments (for example, a page number or a table number).

To contact the GPFS development organization, send your comments to the following e-mail address:

[email protected]




| Summary of changes
| This topic summarizes changes to the GPFS licensed program and the GPFS library. Within each
| information unit in the library, a vertical line to the left of text and illustrations indicates technical
| changes or additions made to the previous edition of the information.

| GPFS version 4 release 1

| Changes to the GPFS licensed program and the GPFS library for version 4, release 1 include the
| following:
| GPFS product structure
| GPFS now comes in three levels of function: GPFS Standard Edition, GPFS Express Edition, and
| GPFS Advanced Edition.
| Active file management (AFM)
| Enhancements to AFM include the following:
| v AFM environments can now support Parallel I/O. During reads, all mapped gateway nodes
| are used to fetch a single file from home. During writes, all mapped gateways are used to
| synchronize file changes to home.
| v In addition to the NFS protocol, AFM now supports the native GPFS protocol for the AFM
| communication channel providing improved integration of GPFS features and attributes.
| v GPFS 4.1 includes a number of features that optimize AFM operations and usability. These
| features include prefetch enhancements to handle gateway node failures during prefetch. AFM
| introduces a new version of hashing (afmHashVersion=2), which minimizes the impact of
| gateway nodes joining or leaving the active cluster. Also, AFM now reports distinct cache
| states based on fileset and queue states.
| v GPFS 4.1 supports the migration of data from any legacy NFS storage device or GPFS cluster
| to an AFM fileset. Data migration eases data transfer when upgrading hardware or buying a
| new system. The data source is an NFS v3 export and can be either a GPFS or a non-GPFS
| source. AFM-based migration can minimize downtime for applications and consolidate
| data from multiple legacy systems into a more powerful cache.
| Autonomic tuning for mmbackup
| The mmbackup command can be tuned to control the numbers of threads used on each node to
| scan the file system, perform inactive object expiration, and carry out modified object backup. In
| addition, the sizes of lists of objects expired or backed up can be controlled or autonomically
| tuned to select these list sizes if they are not specified. List sizes are now independent for backup
| and expire tasks. For more information, see the GPFS: Administration and Programming Reference
| topic: “Tuning backups with the mmbackup command”.
| Backup 3.2 format discontinued
| Starting with GPFS 4.1, the mmbackup command will no longer support incremental backup
| using the /Device/.snapshots/.mmbuSnapshot path name that was used with GPFS 3.2 and
| earlier. For more information, see the GPFS: Administration and Programming Reference topic: “File
| systems backed up using GPFS 3.2 or earlier versions of mmbackup”.
| Cluster Configuration Repository (CCR)
| GPFS 4.1 introduces a new quorum-based repository for the configuration data. This replaces the
| current server-based repository, which required specific nodes to be designated as primary and
| backup configuration server nodes.
| Cluster NFS improvements
| Cluster NFS (CNFS) has been enhanced to support IPv6 and NFS V4.



| Cygwin replaces SUA for Windows nodes
| SUA is no longer supported for Windows nodes. Cygwin is now a required prerequisite before
| installing GPFS on Windows nodes.
| Deadlock amelioration
| Automated deadlock detection, automated deadlock data collection, and automated deadlock
| breakup can be used to help simplify deadlock troubleshooting.
| Encryption
| Support is provided for file encryption that ensures both secure storage and secure deletion of
| data. Encryption is only available with the GPFS Advanced Edition. For more information, see
| the GPFS: Advanced Administration Guide topic: “Encryption”.
| File Placement Optimizer (FPO)
| Enhancements to FPO include the following:
| v To avoid performance impacts, data locality is now maintained when running the
| mmrestripefs -r command.
| v Asynchronous I/O performance was improved.
| v The performance of GPFS O_DIRECT vectored I/O was improved.
| v Data locality performance for AFM-FPO was improved.
| v The mmchpool command was provided to change GPFS-FPO relevant attributes
| (writeAffinityDepth and blockGroupFactor) for FPO storage pools.
| v Write affinity depth of 2 was improved to assign (write) all of the files in a fileset to the same
| second-replica node.
| Fileset snapshot restore
| Files can be restored from a fileset-level snapshot in a mounted file system.
| Local read-only cache
| Support is provided for large local read-only cache using solid-state disks. This makes data
| available with very low latency, and the cache serves to reduce the load on the shared network
| and on the backend disk storage, optimizing performance.
| Message logging
| Starting with GPFS 4.1, many GPFS log messages can be sent to syslog on Linux. Severity tags
| were added to numerous messages, and these tags can be used to filter the messages that are sent
| to syslog. The systemLogLevel attribute of the mmchconfig command controls which GPFS log
| messages are sent to syslog.
| mmsetquota command
| The new mmsetquota command enables you to set quota limits, default quota limits, or grace
| periods for users, groups, and filesets in the file system from which the command is issued.
| NFSv4 ACL formats
| The ACL entry MKDIR was replaced by APPEND/MKDIR to allow WRITE and APPEND to be
| specified independently. A new NoPropagateInherit ACL flag was introduced; this flag indicates
| that the ACL entry should be included in the initial ACL for subdirectories created in this
| directory but not further propagated to subdirectories created below that level.
| NIST SP800-131A compliance
| GPFS can be configured to operate in conformance with the NIST SP800-131A recommendations
| for communication across nodes when the cipherList configuration variable is specified. The
| nistCompliance configuration variable controls whether conformance with NIST SP800-131A is
| enforced. The algorithms and key lengths used for file encryption all conform with NIST
| SP800-131A.
| NSD formats
| A new NSD format was introduced. The new format is referred to as NSD v2, and the old format



| is referred to as NSD v1. The NSD v1 format is compatible with GPFS releases prior to 4.1. The
| latest GPFS release recognizes both NSD v1 and NSD v2 formatted disks.
| Online migration of extended attributes
| The mmmigratefs command can now be run with the file system mounted.
| Quota management
| Quota management improvements for file system format 4.1 and higher include:
| v Allowing quota management to be enabled and disabled without unmounting the file system.
| v The user.quota, group.quota, and fileset.quota files are no longer used. Quota files are now
| metadata files and do not appear in the file system name space.
| Rapid repair
| Performance of repairing large replicated files when restarting disks which were down has been
| improved. The repair will occur only on the blocks that changed while the disk was down, rather
| than on the entire file.
| snapshotCreated callback
| A new event called snapshotCreated was added to help correlate the timing of DMAPI events
| with the creation of a snapshot.
| TSM version verification
| The TSM Backup-Archive client must be installed and at the same version on all the nodes that
| will execute the mmbackup command or named in a node specification with -N. Starting with
| GPFS 4.1, the mmbackup command will verify that the TSM Backup-Archive client versions and
| configuration are correct before executing the backup.
| User-defined node classes
| Nodes can now be grouped into user-defined node classes that are created with the
| mmcrnodeclass command. After a node class is created, it can be specified as an argument on
| commands that accept the -N NodeClass option. User-defined node classes are managed with the
| mmchnodeclass, mmdelnodeclass, and mmlsnodeclass commands.
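| For example (the class and node names below are placeholders used only for illustration), a
| node class might be created and then passed to a command that accepts the -N option:
| mmcrnodeclass siteAnodes -N c25f1n11,c25f1n12
| mmgetstate -N siteAnodes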
| Documented commands, structures, and subroutines
| The following lists the modifications to the documented commands, structures, and subroutines:
| New commands:
| The following commands are new:
| v mmafmconfig
| v mmchnodeclass
| v mmchpool
| v mmcrnodeclass
| v mmdelnodeclass
| v mmlsnodeclass
| v mmsetquota
| New structures:
| There are no new structures.
| New subroutines:
| There are no new subroutines.
| Changed commands:
| The following commands were changed:
| v mmaddcallback
| v mmafmctl
| v mmafmlocal
| v mmauth

| v mmbackup
| v mmchattr
| v mmchcluster
| v mmchconfig
| v mmchfileset
| v mmchfs
| v mmchpdisk
| v mmchrecoverygroup
| v mmcrcluster
| v mmcrfileset
| v mmcrfs
| v mmcrnsd
| v mmcrrecoverygroup
| v mmcrvdisk
| v mmdelvdisk
| v mmdiag
| v mmlscluster
| v mmlsfs
| v mmlsmount
| v mmlsrecoverygroup
| v mmmigratefs
| v mmmount
| v mmrestorefs
| v mmsnapdir
| v mmumount
| Changed structures:
| The following structures were changed:
| v gpfs_acl_t
| v gpfs_direntx_t
| v gpfs_direntx64_t
| v gpfs_iattr_t
| v gpfs_iattr64_t
| Changed subroutines:
| The following subroutines were changed:
| v gpfs_fgetattrs()
| v gpfs_fputattrs()
| v gpfs_fputattrswithpathname()
| v gpfs_fstat()
| v gpfs_stat()
| Deleted commands:
| The following commands were deleted:
| v mmafmhomeconfig
| Deleted structures:
| There are no deleted structures.



| Deleted subroutines:
| There are no deleted subroutines.
| Messages
| The following lists the new, changed, and deleted messages:
| New messages
| 6027-680, 6027-760, 6027-873, 6027-939, 6027-953, 6027-954, 6027-955, 6027-956, 6027-957,
| 6027-959, 6027-960, 6027-1629, 6027-1743, 6027-1744, 6027-1745, 6027-1746, 6027-1747,
| 6027-1748, 6027-1749, 6027-1750, 6027-1751, 6027-1752, 6027-2229, 6027-2230, 6027-2231,
| 6027-2232, 6027-2233, 6027-2728, 6027-2767, 6027-2793, 6027-2794, 6027-2795, 6027-2796,
| 6027-2797, 6027-2798, 6027-2822, 6027-2823, 6027-2824, 6027-2825, 6027-2826, 6027-2827,
| 6027-2828, 6027-2956, 6027-2957, 6027-2958, 6027-3066, 6027-3067, 6027-3068, 6027-3069,
| 6027-3070, 6027-3071, 6027-3072, 6027-3073, 6027-3074, 6027-3075, 6027-3076, 6027-3077,
| 6027-3078, 6027-3079, 6027-3080, 6027-3241, 6027-3242, 6027-3243, 6027-3244, 6027-3245,
| 6027-3246, 6027-3247, 6027-3248, 6027-3249, 6027-3250, 6027-3252, 6027-3253, 6027-3254,
| 6027-3304, 6027-3305, 6027-3404, 6027-3450, 6027-3451, 6027-3452, 6027-3453, 6027-3457,
| 6027-3458, 6027-3459, 6027-3460, 6027-3461, 6027-3462, 6027-3463, 6027-3464, 6027-3465,
| 6027-3466, 6027-3468, 6027-3469, 6027-3470, 6027-3471, 6027-3472, 6027-3473, 6027-3474,
| 6027-3475, 6027-3476, 6027-3477, 6027-3478, 6027-3479, 6027-3480, 6027-3481, 6027-3482,
| 6027-3483, 6027-3484, 6027-3485, 6027-3486, 6027-3487, 6027-3488, 6027-3489, 6027-3490,
| 6027-3491, 6027-3493, 6027-3494, 6027-3495, 6027-3496, 6027-3497, 6027-3498, 6027-3499,
| 6027-3500, 6027-3501, 6027-3502, 6027-3503, 6027-3504, 6027-3505, 6027-3506, 6027-3509,
| 6027-3510, 6027-3511, 6027-3512, 6027-3513, 6027-3514, 6027-3515, 6027-3516, 6027-3517,
| 6027-3518, 6027-3519, 6027-3520, 6027-3521, 6027-3522, 6027-3524, 6027-3527, 6027-3528,
| 6027-3529, 6027-3530, 6027-3533, 6027-3534, 6027-3535, 6027-3536, 6027-3537, 6027-3540,
| 6027-3541, 6027-3543, 6027-3544, 6027-3545, 6027-3546, 6027-3547, 6027-3548, 6027-3549,
| 6027-3550, 6027-3555, 6027-3700, 6027-3701, 6027-3702, 6027-3703, 6027-3704
| Changed messages
| 6027-328, 6027-542, 6027-573, 6027-575, 6027-595, 6027-597, 6027-755, 6027-882, 6027-884,
| 6027-885, 6027-886, 6027-906, 6027-907, 6027-910, 6027-996, 6027-1227, 6027-1260,
| 6027-1261, 6027-1262, 6027-1263, 6027-1292, 6027-1303, 6027-1305, 6027-1308, 6027-1309,
| 6027-1524, 6027-1717, 6027-1718, 6027-1890, 6027-1891, 6027-1898, 6027-2150, 6027-2158,
| 6027-2204, 6027-2205, 6027-2206, 6027-2741, 6027-2756, 6027-2758, 6027-3026, 6027-3060,
| 6027-3215, 6027-3226, 6027-3227, 6027-3228, 6027-3232, 6027-3236, 6027-3239, 6027-3240
| Changed messages (only severity tags added)
| 6027-300, 6027-302, 6027-303, 6027-304, 6027-305, 6027-306, 6027-310, 6027-311, 6027-312,
| 6027-313, 6027-314, 6027-315, 6027-316, 6027-317, 6027-318, 6027-323, 6027-334, 6027-335,
| 6027-336, 6027-337, 6027-338, 6027-339, 6027-341, 6027-342, 6027-343, 6027-344, 6027-346,
| 6027-347, 6027-348, 6027-349, 6027-350, 6027-361, 6027-365, 6027-378, 6027-435, 6027-472,
| 6027-473, 6027-474, 6027-479, 6027-481, 6027-482, 6027-483, 6027-490, 6027-499, 6027-532,
| 6027-533, 6027-550, 6027-590, 6027-593, 6027-596, 6027-598, 6027-599, 6027-604, 6027-605,
| 6027-606, 6027-608, 6027-611, 6027-613, 6027-616, 6027-617, 6027-618, 6027-622, 6027-629,
| 6027-630, 6027-635, 6027-636, 6027-637, 6027-638, 6027-639, 6027-640, 6027-641, 6027-642,
| 6027-643, 6027-646, 6027-647, 6027-650, 6027-695, 6027-696, 6027-697, 6027-698, 6027-699,
| 6027-700, 6027-701, 6027-702, 6027-703, 6027-711, 6027-712, 6027-716, 6027-717, 6027-719,
| 6027-720, 6027-721, 6027-724, 6027-726, 6027-734, 6027-747, 6027-750, 6027-751, 6027-752,
| 6027-753, 6027-756, 6027-761, 6027-765, 6027-766, 6027-767, 6027-777, 6027-778, 6027-784,
| 6027-785, 6027-786, 6027-787, 6027-788, 6027-866, 6027-870, 6027-871, 6027-872, 6027-874,
| 6027-875, 6027-876, 6027-877, 6027-878, 6027-879, 6027-881, 6027-883, 6027-887, 6027-888,
| 6027-889, 6027-890, 6027-891, 6027-892, 6027-893, 6027-894, 6027-895, 6027-896, 6027-897,
| 6027-898, 6027-899, 6027-900, 6027-901, 6027-902, 6027-903, 6027-904, 6027-905, 6027-908,
| 6027-909, 6027-911, 6027-912, 6027-920, 6027-921, 6027-922, 6027-923, 6027-924, 6027-928,
| 6027-929, 6027-930, 6027-931, 6027-932, 6027-933, 6027-934, 6027-935, 6027-936, 6027-937,
| 6027-938, 6027-948, 6027-949, 6027-950, 6027-951, 6027-997, 6027-998, 6027-999, 6027-1500,
| 6027-1501, 6027-1502, 6027-1510, 6027-1511, 6027-1512, 6027-1537, 6027-1538, 6027-1539,



| 6027-1540, 6027-1541, 6027-1542, 6027-1544, 6027-1545, 6027-1546, 6027-1547, 6027-1548,
| 6027-1549, 6027-1550, 6027-1666, 6027-1709, 6027-1710, 6027-1711, 6027-1716, 6027-1724,
| 6027-1725, 6027-1726, 6027-1727, 6027-1728, 6027-1729, 6027-1730, 6027-1731, 6027-1732,
| 6027-1734, 6027-1735, 6027-1736, 6027-1737, 6027-1738, 6027-1739, 6027-1740, 6027-1741,
| 6027-1742, 6027-1803, 6027-1804, 6027-1805, 6027-1806, 6027-1807, 6027-1808, 6027-1809,
| 6027-1810, 6027-1811, 6027-1812, 6027-1813, 6027-1814, 6027-1815, 6027-1816, 6027-1817,
| 6027-1818, 6027-1824, 6027-1825, 6027-1851, 6027-1852, 6027-2049, 6027-2050, 6027-2576,
| 6027-2618, 6027-2621, 6027-2623, 6027-2667, 6027-2673, 6027-2674, 6027-2682, 6027-2694,
| 6027-2695, 6027-2696, 6027-2700, 6027-2706, 6027-2707, 6027-2708, 6027-2710, 6027-2711,
| 6027-2716, 6027-2722, 6027-2723, 6027-2724, 6027-2725, 6027-2726, 6027-2730, 6027-2734,
| 6027-2735, 6027-2740, 6027-2742, 6027-2744, 6027-2745, 6027-2746, 6027-2747, 6027-2750,
| 6027-2751, 6027-2752, 6027-2753, 6027-2754, 6027-2755, 6027-2757, 6027-2759, 6027-2760,
| 6027-2766, 6027-2777, 6027-2778, 6027-2779, 6027-2780, 6027-2781, 6027-2782, 6027-2783,
| 6027-2784, 6027-2785, 6027-2786, 6027-2787, 6027-2788, 6027-2789, 6027-2805, 6027-2806,
| 6027-2807, 6027-2810, 6027-2950, 6027-2952, 6027-2953, 6027-2954, 6027-2955, 6027-3035,
| 6027-3045, 6027-3058, 6027-3214, 6027-3224, 6027-3225, 6027-3229, 6027-3230, 6027-3233,
| 6027-3234, 6027-3235, 6027-3402
| Deleted messages
| 6027-1264, 6027-1265, 6027-1311, 6027-1312, 6027-1323, 6027-1324, 6027-1325, 6027-1327,
| 6027-1328, 6027-1330, 6027-1336, 6027-1337, 6027-1354, 6027-1355, 6027-1356, 6027-1369,
| 6027-1376, 6027-1397, 6027-1558, 6027-1569, 6027-1580, 6027-1585, 6027-1586, 6027-1629,
| 6027-1698, 6027-1868, 6027-1920, 6027-1944, 6027-1965, 6027-1971, 6027-1972, 6027-1973,
| 6027-1991, 6027-3218, 6027-3219, 6027-3237, 6027-3238




Chapter 1. Logs, dumps, and traces


The problem determination tools provided with General Parallel File System (GPFS) are intended for use
by experienced system administrators who know how to collect data and run debugging routines.

GPFS has its own error log, but the operating system error log is also useful because it contains
information about hardware failures and operating system or other software failures that can affect GPFS.

Note: GPFS error logs and messages contain the MMFS prefix. This is intentional, because GPFS shares
many components with the IBM Multi-Media LAN Server, a related licensed program.

GPFS also provides a system snapshot dump, trace, and other utilities that can be used to obtain detailed
information about specific problems.

The information is organized as follows:


v “The GPFS log”
v “The operating system error log facility” on page 2
v “The gpfs.snap command” on page 6
v “The mmfsadm command” on page 10
v “The GPFS trace facility” on page 11

The GPFS log


The GPFS log is a repository of error conditions that have been detected on each node, as well as
operational events such as file system mounts. The GPFS log is the first place to look when attempting to
debug abnormal events. Since GPFS is a cluster file system, events that occur on one node might affect
system behavior on other nodes, and all GPFS logs can have relevant data.

The GPFS log can be found in the /var/adm/ras directory on each node. The GPFS log file is named
mmfs.log.date.nodeName, where date is the time stamp when the instance of GPFS started on the node and
nodeName is the name of the node. The latest GPFS log file can be found by using the symbolic file name
/var/adm/ras/mmfs.log.latest.

The GPFS log from the prior startup of GPFS can be found by using the symbolic file name
/var/adm/ras/mmfs.log.previous. All other files have a timestamp and node name appended to the file
name.

At GPFS startup, log files that have not been accessed during the last ten days are deleted. If you want to
save old log files, copy them elsewhere.
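For example, the current log on a node can be followed and the log from the previous startup reviewed with commands similar to the following (a minimal sketch using the default symbolic names described above):
tail -f /var/adm/ras/mmfs.log.latest
less /var/adm/ras/mmfs.log.previous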

| Starting with GPFS 4.1, many GPFS log messages can be sent to syslog on Linux. The systemLogLevel
| attribute of the mmchconfig command controls which GPFS log messages are sent to syslog. For more
| information, see the mmchconfig command in the GPFS: Administration and Programming Reference.

This example shows normal operational messages that appear in the GPFS log file:
| Removing old /var/adm/ras/mmfs.log.* files:
| Unloading modules from /lib/modules/3.0.13-0.27-default/extra
| Unloading module tracedev
| Loading modules from /lib/modules/3.0.13-0.27-default/extra
| Module Size Used by
| mmfs26 2155186 0
| mmfslinux 379348 1 mmfs26
| tracedev 48513 2 mmfs26,mmfslinux



| Wed Mar 26 15:56:46.343 2014: [I] mmfsd initializing. {Version: 4.1.0.0 Built: Mar 24 2014 22:59:56} ...
| Wed Mar 26 15:56:46.344 2014: [I] Tracing in blocking mode
| Wed Mar 26 15:56:46.343 2014: [I] Cleaning old shared memory ...
| Wed Mar 26 15:56:46.344 2014: [I] First pass parsing mmfs.cfg ...
| Wed Mar 26 15:56:46.343 2014: [I] Enabled automated deadlock detection.
| Wed Mar 26 15:56:46.344 2014: [I] Enabled automated deadlock debug data collection.
| Wed Mar 26 15:56:46.345 2014: [I] Enabled automated deadlock breakup.
| Wed Mar 26 15:56:46.344 2014: [I] Initializing the main process ...
| Wed Mar 26 15:56:46.350 2014: [I] Second pass parsing mmfs.cfg ...
| Wed Mar 26 15:56:46.351 2014: [I] Initializing the page pool ...
| Wed Mar 26 15:56:46.544 2014: [I] Initializing the mailbox message system ...
| Wed Mar 26 15:56:46.545 2014: [I] Initializing encryption ...
| Wed Mar 26 15:56:46.546 2014: [I] Initializing the thread system ...
| Wed Mar 26 15:56:46.547 2014: [I] Creating threads ...
| Wed Mar 26 15:56:46.554 2014: [I] Initializing inter-node communication ...
| Wed Mar 26 15:56:46.555 2014: [I] Creating the main SDR server object ...
| Wed Mar 26 15:56:46.556 2014: [I] Initializing the sdrServ library ...
| Wed Mar 26 15:56:46.555 2014: [I] Initializing the ccrServ library ...
| Wed Mar 26 15:56:46.561 2014: [I] Initializing the cluster manager ...
| Wed Mar 26 15:56:46.805 2014: [I] Initializing the token manager ...
| Wed Mar 26 15:56:46.809 2014: [I] Initializing network shared disks ...
| Wed Mar 26 15:56:47.350 2014: [I] Start the ccrServ ...
| Wed Mar 26 15:56:47.869 2014: [N] Connecting to 192.168.116.97 hs22n37 <c0p1>
| Wed Mar 26 15:56:47.870 2014: [I] Connected to 192.168.116.97 hs22n37 <c0p1>
| Wed Mar 26 15:56:47.918 2014: [I] Node 192.168.116.97 (hs22n37) is now the Group Leader.
| Wed Mar 26 15:56:47.943 2014: [N] mmfsd ready
| Wed Mar 26 15:56:47 EDT 2014: mmcommon mmfsup invoked. Parameters: 192.168.116.98 192.168.116.97 all

The amount of time GPFS takes to start varies with the size and complexity of your system configuration.
If, after a reasonable amount of time for your configuration, you cannot access a file system that has
been mounted (either automatically or with a mount or mmmount command), examine the log file for
error messages.

Creating a master GPFS log file


The GPFS log frequently shows problems on one node that actually originated on another node.

GPFS is a file system that runs on multiple nodes of a cluster. This means that problems originating on
one node of a cluster often have effects that are visible on other nodes. It is often valuable to merge the
GPFS logs in pursuit of a problem. Having accurate time stamps aids the analysis of the sequence of
events.

Before following any of the debug steps, IBM suggests that you:
1. Synchronize all clocks of all nodes in the GPFS cluster. If this is not done, and clocks on different
nodes are out of sync, there is no way to establish the real time line of events occurring on multiple
nodes. Therefore, a merged error log is less useful for determining the origin of a problem and
tracking its effects.
2. Merge and chronologically sort all of the GPFS log entries from each node in the cluster. The
--gather-logs option of “The gpfs.snap command” on page 6 can be used to achieve this:
gpfs.snap --gather-logs -d /tmp/logs -N all
The system displays information similar to:
gpfs.snap: Gathering mmfs logs ...
gpfs.snap: The sorted and unsorted mmfs.log files are in /tmp/logs
If the --gather-logs option is not available on your system, you can create your own script to achieve
the same task; use /usr/lpp/mmfs/samples/gatherlogs.sample.sh as an example.
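Such a script might start by copying each node's current log into a single directory before merging and sorting the entries. The following fragment is only a rough sketch; the node names are placeholders:
mkdir -p /tmp/logs
for node in node1 node2 node3
do
    scp $node:/var/adm/ras/mmfs.log.latest /tmp/logs/mmfs.log.$node
done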

The operating system error log facility


GPFS records file system or disk failures using the error logging facility provided by the operating
system: syslog facility on Linux, errpt facility on AIX, and Event Viewer on Windows.



The error logging facility is referred to as the error log regardless of operating-system specific error log
facility naming conventions.

Failures in the error log can be viewed by issuing this command on an AIX node:
errpt -a

and this command on a Linux node:
grep "mmfs:" /var/log/messages

On Windows, use the Event Viewer and look for events with a source label of GPFS in the Application
event category.

| On Linux, syslog may include GPFS log messages and the error logs described in this section. The
| systemLogLevel attribute of the mmchconfig command controls which GPFS log messages are sent to
| syslog. For more information, see the mmchconfig command in the GPFS: Administration and
| Programming Reference.
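
For example, to restrict the GPFS messages forwarded to syslog to a particular severity and above, the systemLogLevel attribute can be set cluster-wide with a command similar to the following. The value shown here is only an illustration; check the mmchconfig description for the levels that are valid on your release:
mmchconfig systemLogLevel=error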

The error log contains information about several classes of events or errors. These classes are:
v “MMFS_ABNORMAL_SHUTDOWN”
v “MMFS_DISKFAIL”
v “MMFS_ENVIRON”
v “MMFS_FSSTRUCT”
v “MMFS_GENERIC” on page 4
v “MMFS_LONGDISKIO” on page 4
v “MMFS_QUOTA” on page 4
v “MMFS_SYSTEM_UNMOUNT” on page 5
v “MMFS_SYSTEM_WARNING” on page 5

MMFS_ABNORMAL_SHUTDOWN
The MMFS_ABNORMAL_SHUTDOWN error log entry means that GPFS has determined that it must
shut down all operations on this node because of a problem. Insufficient memory on the node to handle
critical recovery situations can cause this error. In general, there will be other error log entries from GPFS
or some other component associated with this error log entry.

MMFS_DISKFAIL
The MMFS_DISKFAIL error log entry indicates that GPFS has detected the failure of a disk and forced
the disk to the stopped state. This is ordinarily not a GPFS error but a failure in the disk subsystem or
the path to the disk subsystem.
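
As a hedged illustration (the file system and disk names are placeholders), the state of the disks can be displayed with the mmlsdisk command, and once the underlying disk or path problem is corrected, a stopped disk is typically brought back with the mmchdisk command:
mmlsdisk fs1
mmchdisk fs1 start -d "gpfs1nsd"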

MMFS_ENVIRON
MMFS_ENVIRON error log entry records are associated with other records of the MMFS_GENERIC or
MMFS_SYSTEM_UNMOUNT types. They indicate that the root cause of the error is external to GPFS
and usually in the network that supports GPFS. Check the network and its physical connections. The
data portion of this record supplies the return code provided by the communications code.

MMFS_FSSTRUCT
The MMFS_FSSTRUCT error log entry indicates that GPFS has detected a problem with the on-disk
structure of the file system. The severity of these errors depends on the exact nature of the inconsistent
data structure. If it is limited to a single file, EIO errors will be reported to the application and operation
will continue. If the inconsistency affects vital metadata structures, operation will cease on this file
system. These errors are often associated with an MMFS_SYSTEM_UNMOUNT error log entry and will



probably occur on all nodes. If the error occurs on all nodes, some critical piece of the file system is
inconsistent. This can occur as a result of a GPFS error or an error in the disk system.

If the file system is severely damaged, the best course of action is to follow the procedures in “Additional
information to collect for file system corruption or MMFS_FSSTRUCT errors” on page 116, and then
contact the IBM Support Center.

MMFS_GENERIC
The MMFS_GENERIC error log entry means that GPFS self diagnostics have detected an internal error,
or that additional information is being provided with an MMFS_SYSTEM_UNMOUNT report. If the
record is associated with an MMFS_SYSTEM_UNMOUNT report, the event code fields in the records
will be the same. The error code and return code fields might describe the error. See Chapter 13,
“Messages,” on page 121 for a listing of codes generated by GPFS.

If the error is generated by the self diagnostic routines, service personnel should interpret the return and
error code fields since the use of these fields varies by the specific error. Errors caused by the self
checking logic will result in the shutdown of GPFS on this node.

MMFS_GENERIC errors can result from an inability to reach a critical disk resource. These errors might
look different depending on the specific disk resource that has become unavailable, like logs and
allocation maps. This type of error will usually be associated with other error indications. Other errors
generated by disk subsystems, high availability components, and communications components at the
same time as, or immediately preceding, the GPFS error should be pursued first because they might be
the cause of these errors. MMFS_GENERIC error indications without an associated error of those types
represent a GPFS problem that requires the IBM Support Center. See “Information to collect before
contacting the IBM Support Center” on page 115.

MMFS_LONGDISKIO
The MMFS_LONGDISKIO error log entry indicates that GPFS is experiencing very long response time
for disk requests. This is a warning message and can indicate that your disk system is overloaded or that
a failing disk is requiring many I/O retries. Follow your operating system's instructions for monitoring
the performance of your I/O subsystem on this node and on any disk server nodes that might be
involved. The data portion of this error record specifies the disk involved. There might be related error
log entries from the disk subsystems that will pinpoint the actual cause of the problem. If the disk is
attached to an AIX node, refer to the AIX Information Center (http://publib16.boulder.ibm.com/pseries/
index.htm) and search for performance management. To enable or disable, use the mmchfs -w command.
For more details, contact your IBM service representative.

The mmpmon command can be used to analyze I/O performance on a per-node basis. See “Failures
using the mmpmon command” on page 85 and the Monitoring GPFS I/O performance with the mmpmon
command topic in the GPFS: Advanced Administration Guide.

MMFS_QUOTA
The MMFS_QUOTA error log entry is used when GPFS detects a problem in the handling of quota
information. This entry is created when the quota manager has a problem reading or writing the quota
file. If the quota manager cannot read all entries in the quota file when mounting a file system with
quotas enabled, the quota manager shuts down but file system manager initialization continues. Mounts
will not succeed and will return an appropriate error message (see “File system forced unmount” on page
71).

Quota accounting depends on a consistent mapping between user names and their numeric identifiers.
This means that a single user accessing a quota enabled file system from different nodes should map to
the same numeric user identifier from each node. Within a local cluster this is usually achieved by
ensuring that /etc/passwd and /etc/group are identical across the cluster.



When accessing quota enabled file systems from other clusters, you need to either ensure individual
accessing users have equivalent entries in /etc/passwd and /etc/group, or use the user identity mapping
| facility as outlined in the IBM white paper entitled UID Mapping for GPFS in a Multi-cluster Environment
| in the IBM Cluster information center (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/
| com.ibm.cluster.gpfs.doc/gpfs_uid/uid_gpfs.htm) or the IBM Knowledge Center (http://www.ibm.com/
| support/knowledgecenter/SSFKCN/uid_gpfs.html).

It might be necessary to run an offline quota check (mmcheckquota) to repair or recreate the quota file. If
the quota file is corrupted, mmcheckquota will not restore it. The file must be restored from the backup
copy. If there is no backup copy, an empty file can be set as the new quota file. This is equivalent to
recreating the quota file. To set an empty file or use the backup file, issue the mmcheckquota command
with the appropriate operand:
v -u UserQuotaFilename for the user quota file
v -g GroupQuotaFilename for the group quota file
v -j FilesetQuotaFilename for the fileset quota file

After replacing the appropriate quota file, reissue the mmcheckquota command to check the file system
inode and space usage.

For information about running the mmcheckquota command, see “The mmcheckquota command” on
page 31.
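
As an illustration only (the device name and the backup file path are hypothetical), restoring the user quota file from a backup copy and then rechecking inode and space usage might look like this:
mmcheckquota -u /u/admin/backup.quota.user fs1
mmcheckquota fs1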

MMFS_SYSTEM_UNMOUNT
The MMFS_SYSTEM_UNMOUNT error log entry means that GPFS has discovered a condition that
might result in data corruption if operation with this file system continues from this node. GPFS has
marked the file system as disconnected and applications accessing files within the file system will receive
ESTALE errors. This can be the result of:
v The loss of a path to all disks containing a critical data structure.
If you are using SAN attachment of your storage, consult the problem determination guides provided
by your SAN switch vendor and your storage subsystem vendor.
v An internal processing error within the file system.

See “File system forced unmount” on page 71. Follow the problem determination and repair actions
specified.

MMFS_SYSTEM_WARNING
The MMFS_SYSTEM_WARNING error log entry means that GPFS has detected a system-level value
approaching its maximum limit. This might occur as a result of the number of inodes (files) reaching its
limit. If so, issue the mmchfs command to increase the number of inodes for the file system so that at
least 5% remain free.
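
For example (the device name and inode limit below are hypothetical), the current inode usage can be displayed with the mmdf command and the maximum number of inodes raised with the mmchfs command; verify the exact options against the command descriptions in the GPFS: Administration and Programming Reference:
mmdf fs1 -F
mmchfs fs1 --inode-limit 3000000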

Error log entry example


This is an example of an error log entry that indicates a failure in either the storage subsystem or
communication subsystem:
LABEL: MMFS_SYSTEM_UNMOUNT
IDENTIFIER: C954F85D

Date/Time: Thu Jul 8 10:17:10 CDT


Sequence Number: 25426
Machine Id: 000024994C00
Node Id: nos6
Class: S
Type: PERM
Resource Name: mmfs



Description
STORAGE SUBSYSTEM FAILURE

Probable Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM

Failure Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM

Recommended Actions
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
EVENT CODE
15558007
STATUS CODE
212
VOLUME
gpfsd

The gpfs.snap command


The gpfs.snap command creates an informational system snapshot at a single point in time. This system
snapshot consists of cluster configuration, disk configuration, network configuration, network status,
GPFS logs, dumps, and traces. Use the gpfs.snap command as one of the main tools to gather
preliminary data when a GPFS problem is encountered, such as a hung file system, a hung GPFS
command, or an mmfsd daemon assert.

The information gathered with the gpfs.snap command can be used in conjunction with other
information (for example, GPFS internal dumps, traces, and kernel thread dumps) to solve a GPFS
problem.

The syntax of the gpfs.snap command is:


gpfs.snap [-c "CommandString"] [-d OutputDirectory] [-m | -z]
[-a | -N {Node[,Node...] | NodeFile | NodeClass}]
[--check-space | --no-check-space | --check-space-only]
[--deadlock [--quick]] [--exclude-aix-disk-attr]
[--exclude-aix-lvm] [--exclude-net] [--exclude-merge-logs]
[--gather-logs] [--mmdf] [--prefix]

These options are used with gpfs.snap:


-c "CommandString"
Specifies the command string to run on the specified nodes. When this option is specified, the data
collected is limited to the result of the specified command string; the standard data collected by
gpfs.snap is not collected. CommandString can consist of multiple commands, which are separated by
semicolons (;) and enclosed in double quotation marks (").
-d OutputDirectory
Specifies the output directory. The default is /tmp/gpfs.snapOut.
-m Specifying this option is equivalent to specifying --exclude-merge-logs with -N.
-z Collects gpfs.snap data only from the node on which the command is invoked. No master data is
collected.
-a Directs gpfs.snap to collect data from all nodes in the cluster. This is the default.
-N {Node[,Node ...] | NodeFile | NodeClass}
Specifies the nodes from which to collect gpfs.snap data. This option supports all defined node



classes. For general information on how to specify node names, see the Specifying nodes as input to
GPFS commands topic in the GPFS: Administration and Programming Reference.
--check-space
Specifies that space checking is performed before collecting data.
--no-check-space
Specifies that no space checking is performed. This is the default.
--check-space-only
Specifies that only space checking is performed. No data is collected.
--deadlock
Collects only the minimum amount of data necessary to debug a deadlock problem. Part of the data
collected is the output of the mmfsadm dump all command. This option ignores all other options
except for -a, -N, -d, and --prefix.
--quick
Collects less data when specified along with the --deadlock option. The output includes mmfsadm
dump most, mmfsadm dump kthreads, and 10 seconds of trace in addition to the usual gpfs.snap
output.
--exclude-aix-disk-attr
Specifies that data about AIX disk attributes will not be collected. Collecting data about AIX disk
attributes on an AIX node that has a large number of disks could be very time-consuming, so using
this option could help improve performance.
--exclude-aix-lvm
Specifies that data about the AIX Logical Volume Manager (LVM) will not be collected.
--exclude-net
Specifies that network-related information will not be collected.
--exclude-merge-logs
Specifies that merge logs and waiters will not be collected.
--gather-logs
Gathers, merges, and chronologically sorts all of the mmfs.log files. The results are stored in the
directory specified with -d option.
--mmdf
Specifies that mmdf output will be collected.
--prefix
Specifies that the prefix name gpfs.snap will be added to the tar file.

Use the -z option to generate a non-master snapshot. This is useful if there are many nodes on which to
take a snapshot, and only one master snapshot is needed. For a GPFS problem within a large cluster
(hundreds or thousands of nodes), one strategy might call for a single master snapshot (one invocation of
gpfs.snap with no options), and multiple non-master snapshots (multiple invocations of gpfs.snap with
the -z option).

Use the -N option to obtain gpfs.snap data from multiple nodes in the cluster. When the -N option is
used, the gpfs.snap command takes non-master snapshots of all the nodes specified with this option and
a master snapshot of the node on which it was invoked.
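
For example, to take a master snapshot on the node where the command is issued together with
non-master snapshots of two other nodes (the node names shown here are hypothetical), a command
similar to the following could be used:
gpfs.snap -N c5n97g,c5n98g

To limit the collection to the output of specific commands on those same nodes, the -c option might be
used as follows:
gpfs.snap -c "mmfsadm dump waiters; mmfsadm dump tscomm" -N c5n97g,c5n98g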

Using the gpfs.snap command


Running the gpfs.snap command with no options is similar to running gpfs.snap -a. It collects data from
all nodes in the cluster. This invocation creates a file that is made up of multiple gpfs.snap snapshots.
The file that is created includes a master snapshot of the node from which the gpfs.snap command was
invoked and non-master snapshots of each of the other nodes in the cluster.



If the node on which the gpfs.snap command is run is not a file system manager node, gpfs.snap creates
a non-master snapshot on the file system manager nodes.

The difference between a master snapshot and a non-master snapshot is the data that is gathered. A
master snapshot gathers information from all nodes in the cluster and contains all of the data that a
non-master snapshot has. There are two categories of collected data:
1. Data that is always gathered by gpfs.snap (for master snapshots and non-master snapshots):
v “Data always gathered by gpfs.snap on all platforms”
v “Data always gathered by gpfs.snap on AIX” on page 9
v “Data always gathered by gpfs.snap on Linux” on page 9
v “Data always gathered by gpfs.snap on Windows” on page 10
2. Data that is gathered by gpfs.snap only in the case of a master snapshot. See “Data always gathered
by gpfs.snap for a master snapshot” on page 10.

Data always gathered by gpfs.snap on all platforms


These items are always obtained by the gpfs.snap command when gathering data for an AIX, Linux, or
Windows node:
1. The output of these commands:
v ls -l /usr/lpp/mmfs/bin
v mmdevdiscover
v tspreparedisk -S
v mmfsadm dump malloc
v mmfsadm dump fs
v df -k
v ifconfig interface
v ipcs -a
v ls -l /dev
v mmfsadm dump alloc hist
v mmfsadm dump alloc stats
v mmfsadm dump allocmgr
v mmfsadm dump allocmgr hist
v mmfsadm dump allocmgr stats
v mmfsadm dump cfgmgr
v mmfsadm dump config
v mmfsadm dump dealloc stats
v mmfsadm dump disk
v mmfsadm dump mmap
v mmfsadm dump mutex
v mmfsadm dump nsd
| v mmfsadm dump rpc
v mmfsadm dump sgmgr
v mmfsadm dump stripe
v mmfsadm dump tscomm
v mmfsadm dump version
v mmfsadm dump waiters
v netstat with the -i, -r, -rn, -s, and -v options



v ps -edf
v vmstat
2. The contents of these files:
v /etc/syslog.conf or /etc/syslog-ng.conf
v /tmp/mmfs/internal*
v /tmp/mmfs/trcrpt*
v /var/adm/ras/mmfs.log.*
v /var/mmfs/gen/*
v /var/mmfs/etc/*
v /var/mmfs/tmp/*
v /var/mmfs/ssl/* except for complete.map and id_rsa files

Data always gathered by gpfs.snap on AIX


These items are always obtained by the gpfs.snap command when gathering data for an AIX node:
1. The output of these commands:
v errpt -a
v lssrc -a
v lslpp -hac
v no -a
2. The contents of these files:
v /etc/filesystems
v /etc/trcfmt

Data always gathered by gpfs.snap on Linux


These items are always obtained by the gpfs.snap command when gathering data for a Linux node:
1. The output of these commands:
v dmesg
v fdisk -l
v lsmod
v lspci
v rpm -qa
v rpm --verify gpfs.base
v rpm --verify gpfs.docs
v rpm --verify gpfs.gpl
v rpm --verify gpfs.msg.en_US
2. The contents of these files:
v /etc/filesystems
v /etc/fstab
v /etc/*release
v /proc/cpuinfo
v /proc/version
v /usr/lpp/mmfs/src/config/site.mcr
v /var/log/messages*



Data always gathered by gpfs.snap on Windows
These items are always obtained by the gpfs.snap command when gathering data for a Windows node:
1. The output from systeminfo.exe
2. Any raw trace files *.tmf and mmfs.trc*
3. The *.pdb symbols from /usr/lpp/mmfs/bin/symbols

Data always gathered by gpfs.snap for a master snapshot


When the gpfs.snap command is specified with no options, a master snapshot is taken on the node
where the command was issued. All of the information from “Data always gathered by gpfs.snap on all
platforms” on page 8, “Data always gathered by gpfs.snap on AIX” on page 9, “Data always gathered by
gpfs.snap on Linux” on page 9, and “Data always gathered by gpfs.snap on Windows” is obtained, as
well as this data:
1. The output of these commands:
v mmauth
v mmgetstate -a
v mmlscluster
v mmlsconfig
v mmlsdisk
v mmlsfileset
v mmlsfs
v mmlspolicy
v mmlsmgr
v mmlsnode -a
v mmlsnsd
v mmlssnapshot
v mmremotecluster
v mmremotefs
v tsstatus
2. The contents of the /var/adm/ras/mmfs.log.* file (on all nodes in the cluster)

The mmfsadm command


The mmfsadm command is intended for use by trained service personnel. IBM suggests you do not run
this command except under the direction of such personnel.

Note: The contents of mmfsadm output might vary from release to release, which could obsolete any
user programs that depend on that output. Therefore, we suggest that you do not create user programs
that invoke mmfsadm.

The mmfsadm command extracts data from GPFS without using locking, so that it can collect the data in
the event of locking errors. In certain rare cases, this can cause GPFS or the node to fail. Several options
of this command exist and might be required during problem determination:
cleanup
Delete shared segments left by a previously failed GPFS daemon without actually restarting the
daemon.
dump what
Dumps the state of a large number of internal state values that might be useful in determining
the sequence of events. The what parameter can be set to all, indicating that all available data



should be collected, or to another value, indicating more restricted collection of data. The output
is presented to STDOUT and should be collected by redirecting STDOUT.
showtrace
Shows the current level for each subclass of tracing available in GPFS. Trace level 14 provides the
highest level of tracing for the class and trace level 0 provides no tracing. Intermediate values
exist for most classes. More tracing requires more storage and results in a higher probability of
overlaying the required event.
trace class n
Sets the trace class to the value specified by n. Actual trace gathering only occurs when the
mmtracectl command has been issued.

Other options provide interactive GPFS debugging, but are not described here. Output from the
mmfsadm command will be required in almost all cases where a GPFS problem is being reported. The
mmfsadm command collects data only on the node where it is issued. Depending on the nature of the
problem, mmfsadm output might be required from several or all nodes. The mmfsadm output from the
file system manager is often required.
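
For example, to collect a full internal dump on the current node and save it under the default problem
determination directory (the file name shown here is only a suggestion), a command similar to the
following might be used:
mmfsadm dump all > /tmp/mmfs/dump.all.$(hostname)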

To determine where the file system manager is, issue the mmlsmgr command:
mmlsmgr

Output similar to this example is displayed:


file system manager node
---------------- ------------------
fs3 9.114.94.65 (c154n01)
fs2 9.114.94.73 (c154n09)
fs1 9.114.94.81 (c155n01)

Cluster manager node: 9.114.94.65 (c154n01)

The GPFS trace facility


GPFS includes many different trace points to facilitate rapid problem determination of failures.

GPFS tracing is based on the kernel trace facility on AIX, embedded GPFS trace subsystem on Linux, and
the Windows ETL subsystem on Windows. The level of detail that is gathered by the trace facility is
controlled by setting the trace levels using the mmtracectl command.

The mmtracectl command sets up and enables tracing using default settings for various common problem
situations. Using this command improves the probability of gathering accurate and reliable problem
determination information. For more information about the mmtracectl command, see the GPFS:
Administration and Programming Reference.

Generating GPFS trace reports


Use the mmtracectl command to configure trace-related configuration variables and to start and stop the
trace facility on any range of nodes in the GPFS cluster.

To configure and use the trace properly:


1. Issue the mmlsconfig dataStructureDump command to verify that a directory for dumps was created
when the cluster was configured. The default location for trace and problem determination data is
/tmp/mmfs. Use mmtracectl as instructed by service personnel to set trace configuration parameters
as required if the default parameters are insufficient. For example, if the problem results in GPFS
shutting down, set the traceRecycle variable with --trace-recycle as described in the mmtracectl
command in order to ensure that GPFS traces are performed at the time the error occurs.
If desired, specify another location for trace and problem determination data by issuing this
command:
mmchconfig dataStructureDump=path_for_storage_of_dumps
2. To start the tracing facility on all nodes, issue this command:
mmtracectl --start
3. Re-create the problem.
4. When the event to be captured occurs, stop the trace as soon as possible by issuing this command:
mmtracectl --stop
5. The output of the GPFS trace facility is stored in /tmp/mmfs, unless the location was changed using
the mmchconfig command in Step 1 on page 11. Save this output.
6. If the problem results in a shutdown and restart of the GPFS daemon, set the traceRecycle variable as
necessary to start tracing automatically on daemon startup and stop the trace automatically on
daemon shutdown.

If the problem requires more detailed tracing, the IBM Support Center personnel might ask you to
modify the GPFS trace levels. Use the mmtracectl command to establish the required trace classes and
levels of tracing. The syntax to modify trace classes and levels is as follows:
mmtracectl --set --trace={io | all | def | "Class Level [Class Level ...]"}

For example, to tailor the trace level for I/O, issue the following command:
mmtracectl --set --trace=io

Once the trace levels are established, start the tracing by issuing:
mmtracectl --start

After the trace data has been gathered, stop the tracing by issuing:
mmtracectl --stop

To clear the trace settings and make sure tracing is turned off, issue:
mmtracectl --off

Other possible values that can be specified for the trace Class include:
afm
active file management
alloc
disk space allocation
allocmgr
allocation manager
basic
'basic' classes
brl
byte range locks
cksum
checksum services
cleanup
cleanup routines
cmd
ts commands
defrag
defragmentation



dentry
dentry operations
dentryexit
daemon routine entry/exit
disk
physical disk I/O
disklease
disk lease
dmapi
Data Management API
ds data shipping
errlog
error logging
eventsExporter
events exporter
file
file operations
fs file system
fsck
online multinode fsck
ialloc
inode allocation
io physical I/O
kentryexit
kernel routine entry/exit
kernel
kernel operations
klockl
low-level vfs locking
ksvfs
generic kernel vfs information
lock
interprocess locking
log
recovery log
malloc
malloc and free in shared segment
mb mailbox message handling
mmpmon
mmpmon command
mnode
mnode operations
msg
call to routines in SharkMsg.h



mutex
mutexes and condition variables
nsd
network shared disk
perfmon
performance monitors
pgalloc
page allocator tracing
pin
pinning to real memory
pit
parallel inode tracing
quota
quota management
rdma
rdma
sanergy
SANergy®
scsi
scsi services
sec
cluster security
shared
shared segments
smb
SMB locks
sp SP message handling
super
super_operations
tasking
tasking system but not Thread operations
thread
operations in Thread class
tm token manager
ts daemon specific code
user1
miscellaneous tracing and debugging
user2
miscellaneous tracing and debugging
vbhvl
behaviorals
vdb
vdisk debugger



vdisk
vdisk
vhosp
vdisk hospital
vnode
vnode layer of VFS kernel support
vnop
one line per VNOP with all important information

The trace Level can be set to a value from 0 through 14, which represents an increasing level of detail. A
value of 0 turns tracing off. To display the trace level in use, issue the mmfsadm showtrace command.
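
For example, assuming that the IBM Support Center requested more detailed tracing of the io and nsd
classes, the trace classes and levels might be set and then verified with commands similar to the
following:
mmtracectl --set --trace="io 5 nsd 5"
mmfsadm showtrace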

On AIX, the --aix-trace-buffer-size option can be used to control the size of the trace buffer in memory.

On Linux nodes only, use the mmtracectl command to change the following:
v The trace buffer size in blocking mode.
For example, to set the trace buffer size in blocking mode to 8K, issue:
mmtracectl --set --tracedev-buffer-size=8K
v The raw data compression level.
For example, to set the trace raw data compression level to the best ratio, issue:
mmtracectl --set --tracedev-compression-level=9
v The trace buffer size in overwrite mode.
For example, to set the trace buffer size in overwrite mode to 32K, issue:
mmtracectl --set --tracedev-overwrite-buffer-size=32K
v When to overwrite the old data.
For example, to wait to overwrite the data until the trace data is written to the local disk and the
buffer is available again, issue:
mmtracectl --set --tracedev-write-mode=blocking

Note: Before switching between --tracedev-write-mode=overwrite and --tracedev-write-mode=blocking,
or vice versa, run the mmtracectl --stop command first. Next, run the mmtracectl --set
--tracedev-write-mode command to switch to the desired mode. Finally, restart tracing with the
mmtracectl --start command.
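
For example, to switch from blocking mode to overwrite mode following the sequence described in the
preceding note, issue:
mmtracectl --stop
mmtracectl --set --tracedev-write-mode=overwrite
mmtracectl --start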

For more information about the mmtracectl command, see the GPFS: Administration and Programming
Reference.



Chapter 2. GPFS cluster state information
There are a number of GPFS commands used to obtain cluster state information.

The information is organized as follows:


v “The mmafmctl Device getstate command”
v “The mmdiag command”
v “The mmgetstate command”
v “The mmlscluster command” on page 18
v “The mmlsconfig command” on page 19
v “The mmrefresh command” on page 19
v “The mmsdrrestore command” on page 20
v “The mmexpelnode command” on page 20

The mmafmctl Device getstate command


The mmafmctl Device getstate command displays the status of active file management cache filesets and
gateway nodes.

When this command displays a NeedsResync target/fileset state, inconsistencies between home and cache
are being fixed automatically; however, unmount and mount operations are required to return the state to
Active.
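
For example, to display the cache fileset and gateway node status for a file system named fs1 (a
hypothetical device name), issue:
mmafmctl fs1 getstate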

The mmafmctl Device getstate command is fully described in the Commands topic in the GPFS:
Administration and Programming Reference.

The mmdiag command


The mmdiag command displays diagnostic information about the internal GPFS state on the current
node.

Use the mmdiag command to query various aspects of the GPFS internal state for troubleshooting and
tuning purposes. The mmdiag command displays information about the state of GPFS on the node where
it is executed. The command obtains the required information by querying the GPFS daemon process
(mmfsd), and thus will only function when the GPFS daemon is running.
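
For example, the following invocations are often useful starting points; the --waiters and --network
parameters shown here are assumed to be available and display, respectively, the longest-waiting threads
and the status of node-to-node connections:
mmdiag --waiters
mmdiag --network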

The mmdiag command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmgetstate command


The mmgetstate command displays the state of the GPFS daemon on one or more nodes.

These flags are of interest for problem determination:


-a List all nodes in the GPFS cluster. The option does not display information for nodes that cannot be
reached. You may obtain more information if you specify the -v option.
-L Additionally display quorum, number of nodes up, and total number of nodes.



The total number of nodes may sometimes be larger than the actual number of nodes in the cluster.
This is the case when nodes from other clusters have established connections for the purposes of
mounting a file system that belongs to your cluster.
-s Display summary information: number of local and remote nodes that have joined in the cluster,
number of quorum nodes, and so forth.
-v Display intermediate error messages.

The remaining flags have the same meaning as in the mmshutdown command. They can be used to
specify the nodes on which to get the state of the GPFS daemon.

The GPFS states recognized and displayed by this command are:


active
GPFS is ready for operations.
arbitrating
A node is trying to form quorum with the other available nodes.
down
GPFS daemon is not running on the node or is recovering from an internal error.
unknown
Unknown value. Node cannot be reached or some other error occurred.

For example, to display the quorum, the number of nodes up, and the total number of nodes, issue:
mmgetstate -L -a

The system displays output similar to:


Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
--------------------------------------------------------------------
2 k154n06 1* 3 7 active quorum node
3 k155n05 1* 3 7 active quorum node
4 k155n06 1* 3 7 active quorum node
5 k155n07 1* 3 7 active
6 k155n08 1* 3 7 active
9 k156lnx02 1* 3 7 active
11 k155n09 1* 3 7 active

where *, if present, indicates that tiebreaker disks are being used.

The mmgetstate command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmlscluster command


The mmlscluster command displays GPFS cluster configuration information.

The syntax of the mmlscluster command is:


mmlscluster

The system displays output similar to:


GPFS cluster information
========================
GPFS cluster name: cluster1.kgn.ibm.com
GPFS cluster id: 680681562214606028
GPFS UID domain: cluster1.kgn.ibm.com
Remote shell command: /usr/bin/rsh
Remote file copy command: /usr/bin/rcp
| Repository type: server-based



GPFS cluster configuration servers:
-----------------------------------
Primary server: k164n06.kgn.ibm.com
Secondary server: k164n05.kgn.ibm.com

Node Daemon node name IP address Admin node name Designation
----------------------------------------------------------------------------------
1 k164n04.kgn.ibm.com 198.117.68.68 k164n04.kgn.ibm.com quorum
2 k164n05.kgn.ibm.com 198.117.68.71 k164n05.kgn.ibm.com quorum
3 k164n06.kgn.ibm.com 198.117.68.70 k164n06.kgn.ibm.com quorum-manager

The mmlscluster command is fully described in the Commands topic in the: GPFS: Administration and
Programming Reference.

The mmlsconfig command


The mmlsconfig command displays current configuration data for a GPFS cluster.

Depending on your configuration, additional information not documented in either the mmcrcluster
command or the mmchconfig command may be displayed to assist in problem determination.

If a configuration parameter is not shown in the output of this command, the default value for that
parameter, as documented in the mmchconfig command, is in effect.

The syntax of the mmlsconfig command is:


mmlsconfig

The system displays information similar to:


Configuration data for cluster cl1.cluster:
---------------------------------------------
clusterName cl1.cluster
clusterId 680752107138921233
autoload no
| minReleaseLevel 4.1.0.0
pagepool 1G
maxblocksize 4m
[c5n97g]
pagepool 3500m
[common]
cipherList EXP-RC4-MD5

File systems in cluster cl1.cluster:


--------------------------------------
/dev/fs2

The mmlsconfig command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmrefresh command


The mmrefresh command is intended for use by experienced system administrators who know how to
collect data and run debugging routines.

Use the mmrefresh command only when you suspect that something is not working as expected and the
reason for the malfunction is a problem with the GPFS configuration data. For example, a mount
command fails with a device not found error, and you know that the file system exists. Another example
is if any of the files in the /var/mmfs/gen directory were accidentally erased. Under normal
circumstances, the GPFS command infrastructure maintains the cluster data files automatically and there
is no need for user intervention.



The mmrefresh command places the most recent GPFS cluster configuration data files on the specified
nodes. The syntax of this command is:
mmrefresh [-f] [ -a | -N {Node[,Node...] | NodeFile | NodeClass}]

The -f flag can be used to force the GPFS cluster configuration data files to be rebuilt whether they
appear to be at the most current level or not. If no other option is specified, the command affects only the
node on which it is run. The remaining flags have the same meaning as in the mmshutdown command,
and are used to specify the nodes on which the refresh is to be performed.

For example, to place the GPFS cluster configuration data files at the latest level, on all nodes in the
cluster, issue:
mmrefresh -a

The mmsdrrestore command


The mmsdrrestore command is intended for use by experienced system administrators.

The mmsdrrestore command restores the latest GPFS system files on the specified nodes. If no nodes are
specified, the command restores the configuration information only on the node where it is invoked. If
the local GPFS configuration file is missing, the file specified with the -F option from the node specified
with the -p option is used instead. This command works best when used in conjunction with the
mmsdrbackup user exit, which is described in the GPFS user exits topic in the GPFS: Administration and
Programming Reference.
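
For example, to restore the GPFS system files on the local node using the configuration data held on
node k164n05 (the node name and file path shown here are illustrative), a command similar to the
following might be used:
mmsdrrestore -p k164n05 -F /var/mmfs/gen/mmsdrfs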

The mmsdrrestore command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmexpelnode command


The mmexpelnode command instructs the cluster manager to expel the target nodes and to run the
normal recovery protocol.

The cluster manager keeps a list of the expelled nodes. Expelled nodes will not be allowed to rejoin the
cluster until they are removed from the list using the -r or --reset option on the mmexpelnode command.
The expelled nodes information will also be reset if the cluster manager node goes down or is changed
with mmchmgr -c.

The syntax of the mmexpelnode command is:


mmexpelnode [-o | --once] [-f | --is-fenced] [-w | --wait] -N Node[,Node...]

Or,
mmexpelnode {-l | --list}

Or,
mmexpelnode {-r | --reset} -N {all | Node[,Node...]}

The flags used by this command are:


-o | --once
Specifies that the nodes should not be prevented from rejoining. After the recovery protocol
completes, expelled nodes will be allowed to rejoin the cluster immediately, without the need to first
invoke mmexpelnode --reset.
-f | --is-fenced
Specifies that the nodes are fenced out and precluded from accessing any GPFS disks without first



rejoining the cluster (for example, the nodes were forced to reboot by turning off power). Using this
flag allows GPFS to start log recovery immediately, skipping the normal 35-second wait.
-w | --wait
Instructs the mmexpelnode command to wait until GPFS recovery for the failed node has completed
before it runs.
-l | --list
Lists all currently expelled nodes.
-r | --reset
Allows the specified nodes to rejoin the cluster (that is, resets the status of the nodes). To unexpel all
of the expelled nodes, issue: mmexpelnode -r -N all.
-N {all | Node[,Node...]}
Specifies a list of host names or IP addresses that represent the nodes to be expelled or unexpelled.
Specify the daemon interface host names or IP addresses as shown by the mmlscluster command.
The mmexpelnode command does not support administration node names or node classes.

Note: -N all can only be used to unexpel nodes.

Examples of the mmexpelnode command


1. To expel node c100c1rp3, issue the command:
mmexpelnode -N c100c1rp3
2. To show a list of expelled nodes, issue the command:
mmexpelnode --list
The system displays information similar to:
Node List
---------------------
192.168.100.35 (c100c1rp3.ppd.pok.ibm.com)
3. To allow node c100c1rp3 to rejoin the cluster, issue the command:
mmexpelnode -r -N c100c1rp3



Chapter 3. GPFS file system and disk information
The problem determination tools provided with GPFS for file system, disk and NSD problem
determination are intended for use by experienced system administrators who know how to collect data
and run debugging routines.

The information is organized as follows:


v “Restricted mode mount”
v “Read-only mode mount”
v “The lsof command” on page 24
v “The mmlsmount command” on page 24
v “The mmapplypolicy -L command” on page 25
v “The mmcheckquota command” on page 31
v “The mmlsnsd command” on page 31
v “The mmwindisk command” on page 32
v “The mmfileid command” on page 33
v “The SHA digest” on page 35

Restricted mode mount


GPFS provides a capability to mount a file system in a restricted mode when significant data structures
have been destroyed by disk failures or other error conditions.

Restricted mode mount is not intended for normal operation, but may allow the recovery of some user
data. Only data which is referenced by intact directories and metadata structures would be available.

Attention:
1. Follow the procedures in “Information to collect before contacting the IBM Support Center” on page
115, and then contact the IBM Support Center before using this capability.
2. Attempt this only after you have tried to repair the file system with the mmfsck command. (See
“Why does the offline mmfsck command fail with "Error creating internal storage"?” on page 111.)
3. Use this procedure only if the failing disk is attached to an AIX or Linux node.

Some disk failures can result in the loss of enough metadata to render the entire file system unable to
mount. In that event it might be possible to preserve some user data through a restricted mode mount. This
facility should only be used if a normal mount does not succeed, and should be considered a last resort
to save some data after a fatal disk failure.

Restricted mode mount is invoked by using the mmmount command with the -o rs flags. After a
restricted mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.
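
For example, to attempt a restricted mode mount of a file system named fs1 (a hypothetical device
name), issue:
mmmount fs1 -o rs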

Read-only mode mount


Some disk failures can result in the loss of enough metadata to make the entire file system unable to
mount. In that event, it might be possible to preserve some user data through a read-only mode mount.

Attention: Attempt this only after you have tried to repair the file system with the mmfsck command.



This facility should be used only if a normal mount does not succeed, and should be considered a last
resort to save some data after a fatal disk failure.

Read-only mode mount is invoked by using the mmmount command with the -o ro flags. After a
read-only mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.

The lsof command


The lsof (list open files) command returns the user processes that are actively using a file system. It is
sometimes helpful in determining why a file system remains in use and cannot be unmounted.

The lsof command is available in Linux distributions or by using anonymous ftp from
lsof.itap.purdue.edu (cd to /pub/tools/unix/lsof). The inventor of the lsof command is Victor A. Abell
([email protected]), Purdue University Computing Center.
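
For example, if a file system mounted at /fs1 (a hypothetical mount point) cannot be unmounted, a
command similar to the following might identify the processes that still have files open in it:
lsof /fs1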

The mmlsmount command


The mmlsmount command lists the nodes that have a given GPFS file system mounted.

Use the -L option to see the node name and IP address of each node that has the file system in use. This
command can be used for all file systems, all remotely mounted file systems, or file systems mounted on
nodes of certain clusters.

While not specifically intended as a service aid, the mmlsmount command is useful in these situations:
1. When writing and debugging new file system administrative procedures, to determine which nodes
have a file system mounted and which do not.
2. When mounting a file system on multiple nodes, to determine which nodes have successfully
completed the mount and which have not.
3. When a file system is mounted, but appears to be inaccessible to some nodes but accessible to others,
to determine the extent of the problem.
4. When a normal (not force) unmount has not completed, to determine the affected nodes.
5. When a file system has force unmounted on some nodes but not others, to determine the affected
nodes.

For example, to list the nodes having all file systems mounted:
mmlsmount all -L

The system displays output similar to:


File system fs2 is mounted on 7 nodes:
192.168.3.53 c25m3n12 c34.cluster
192.168.110.73 c34f2n01 c34.cluster
192.168.110.74 c34f2n02 c34.cluster
192.168.148.77 c12c4apv7 c34.cluster
192.168.132.123 c20m2n03 c34.cluster (internal mount)
192.168.115.28 js21n92 c34.cluster (internal mount)
192.168.3.124 c3m3n14 c3.cluster

File system fs3 is not mounted.

File system fs3 (c3.cluster:fs3) is mounted on 7 nodes:


192.168.2.11 c2m3n01 c3.cluster
192.168.2.12 c2m3n02 c3.cluster
192.168.2.13 c2m3n03 c3.cluster
192.168.3.123 c3m3n13 c3.cluster



192.168.3.124 c3m3n14 c3.cluster
192.168.110.74 c34f2n02 c34.cluster
192.168.80.20 c21f1n10 c21.cluster

The mmlsmount command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmapplypolicy -L command


Use the -L flag of the mmapplypolicy command when you are using policy files to manage storage
resources and the data stored on those resources. This command has different levels of diagnostics to
help debug and interpret the actions of a policy file.

The -L flag, used in conjunction with the -I test flag, allows you to display the actions that would be
performed by a policy file without actually applying it. This way, potential errors and misunderstandings
can be detected and corrected without actually making these mistakes.

These are the trace levels for the mmapplypolicy -L flag:


Value Description
0 Displays only serious errors.
1 Displays some information as the command runs, but not for each file.
2 Displays each chosen file and the scheduled action.
3 All of the above, plus displays each candidate file and the applicable rule.
4 All of the above, plus displays each explicitly excluded file, and the applicable rule.
5 All of the above, plus displays the attributes of candidate and excluded files.
6 All of the above, plus displays files that are not candidate files, and their attributes.

These terms are used:


candidate file
A file that matches a policy rule.
chosen file
A candidate file that has been scheduled for an action.

This policy file is used in the examples that follow:


/* Exclusion rule */
RULE ’exclude *.save files’ EXCLUDE WHERE NAME LIKE ’%.save’
/* Deletion rule */
RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE NAME LIKE ’%tmp%’
/* Migration rule */
RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WHERE NAME LIKE ’%file%’
/* Typo in rule : removed later */
RULE ’exclude 2’ EXCULDE
/* List rule */
RULE EXTERNAL LIST ’tmpfiles’ EXEC ’/tmp/exec.list’
RULE ’all’ LIST ’tmpfiles’ where name like ’%tmp%’

These are some of the files in file system /fs1:


. .. data1 file.tmp0 file.tmp1 file0 file1 file1.save file2.save

The mmapplypolicy command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.



mmapplypolicy -L 0
Use this option to display only serious errors.

In this example, there is an error in the policy file. This command:


mmapplypolicy fs1 -P policyfile -I test -L 0

produces output similar to this:


[E:-1] Error while loading policy rules.
PCSQLERR: Unexpected SQL identifier token - ’EXCULDE’.
PCSQLCTX: at line 8 of 8: RULE ’exclude 2’ {{{EXCULDE}}}
mmapplypolicy: Command failed. Examine previous error messages to determine cause.

The error in the policy file is corrected by removing these lines:


/* Typo in rule */
RULE ’exclude 2’ EXCULDE

Now rerun the command:


mmapplypolicy fs1 -P policyfile -I test -L 0

No messages are produced because no serious errors were detected.

mmapplypolicy -L 1
Use this option to display all of the information (if any) from the previous level, plus some information
as the command runs, but not for each file. This option also displays total numbers for file migration and
deletion.

This command:
mmapplypolicy fs1 -P policyfile -I test -L 1

produces output similar to this:


[I] GPFS Current Data Pool Utilization in KB and %
sp1 5120 19531264 0.026214%
system 102400 19531264 0.524288%
[I] Loaded policy rules from policyfile.
Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2009-03-04@02:40:12 UTC
parsed 0 Placement Rules, 0 Restore Rules, 3 Migrate/Delete/Exclude Rules,
1 List Rules, 1 External Pool/List Rules
/* Exclusion rule */
RULE ’exclude *.save files’ EXCLUDE WHERE NAME LIKE ’%.save’
/* Deletion rule */
RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE NAME LIKE ’%tmp%’
/* Migration rule */
RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WHERE NAME LIKE ’%file%’
/* List rule */
RULE EXTERNAL LIST ’tmpfiles’ EXEC ’/tmp/exec.list’
RULE ’all’ LIST ’tmpfiles’ where name like ’%tmp%’
[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
[I] Inodes scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 2 32 0 0 0 RULE ’exclude *.save files’ EXCLUDE WHERE(.)
1 2 16 2 16 0 RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE(.)
2 2 32 2 32 0 RULE ’migration to system pool’ MIGRATE FROM POOL \
’sp1’ TO POOL ’system’ WHERE(.)
3 2 16 2 16 0 RULE ’all’ LIST ’tmpfiles’ WHERE(.)

[I] Files with no applicable rules: 5.

[I] GPFS Policy Decisions and File Choice Totals:



Chose to migrate 32KB: 2 of 2 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 0 candidates;
Chose to delete 16KB: 2 of 2 candidates;
Chose to list 16KB: 2 of 2 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
sp1 5072 19531264 0.025969%
system 102432 19531264 0.524451%

mmapplypolicy -L 2
Use this option to display all of the information from the previous levels, plus each chosen file and the
scheduled migration or deletion action.

This command:
mmapplypolicy fs1 -P policyfile -I test -L 2

produces output similar to this:


[I] GPFS Current Data Pool Utilization in KB and %
sp1 5120 19531264 0.026214%
system 102400 19531264 0.524288%
[I] Loaded policy rules from policyfile.
Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2009-03-04@02:43:10 UTC
parsed 0 Placement Rules, 0 Restore Rules, 3 Migrate/Delete/Exclude Rules,
1 List Rules, 1 External Pool/List Rules
/* Exclusion rule */
RULE ’exclude *.save files’ EXCLUDE WHERE NAME LIKE ’%.save’
/* Deletion rule */
RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE NAME LIKE ’%tmp%’
/* Migration rule */
RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WHERE NAME LIKE ’%file%’
/* List rule */
RULE EXTERNAL LIST ’tmpfiles’ EXEC ’/tmp/exec.list’
RULE ’all’ LIST ’tmpfiles’ where name like ’%tmp%’
[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
[I] Inodes scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp1 SHOW()
WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp0 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp1 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp0 SHOW()
WEIGHT(INF) MIGRATE /fs1/file1 TO POOL system SHOW()
WEIGHT(INF) MIGRATE /fs1/file0 TO POOL system SHOW()
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 2 32 0 0 0 RULE ’exclude *.save files’ EXCLUDE WHERE(.)
1 2 16 2 16 0 RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE(.)
2 2 32 2 32 0 RULE ’migration to system pool’ MIGRATE FROM POOL \
’sp1’ TO POOL ’system’ WHERE(.)
3 2 16 2 16 0 RULE ’all’ LIST ’tmpfiles’ WHERE(.)

[I] Files with no applicable rules: 5.

[I] GPFS Policy Decisions and File Choice Totals:


Chose to migrate 32KB: 2 of 2 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 0 candidates;
Chose to delete 16KB: 2 of 2 candidates;
Chose to list 16KB: 2 of 2 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
sp1 5072 19531264 0.025969%
system 102432 19531264 0.524451%

where the lines:



WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp1 SHOW()
WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp0 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp1 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp0 SHOW()
WEIGHT(INF) MIGRATE /fs1/file1 TO POOL system SHOW()
WEIGHT(INF) MIGRATE /fs1/file0 TO POOL system SHOW()

show the chosen files and the scheduled action.

mmapplypolicy -L 3
Use this option to display all of the information from the previous levels, plus each candidate file and the
applicable rule.

This command:
mmapplypolicy fs1 -P policyfile -I test -L 3

produces output similar to this:


[I] GPFS Current Data Pool Utilization in KB and %
sp1 5120 19531264 0.026214%
system 102400 19531264 0.524288%
[I] Loaded policy rules from policyfile.
Evaluating MIGRATE/DELETE/EXCLUDE rules with CURRENT_TIMESTAMP = 2009-03-04@02:32:16 UTC
parsed 0 Placement Rules, 0 Restore Rules, 3 Migrate/Delete/Exclude Rules,
1 List Rules, 1 External Pool/List Rules
/* Exclusion rule */
RULE ’exclude *.save files’ EXCLUDE WHERE NAME LIKE ’%.save’
/* Deletion rule */
RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE NAME LIKE ’%tmp%’
/* Migration rule */
RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WHERE NAME LIKE ’%file%’
/* List rule */
RULE EXTERNAL LIST ’tmpfiles’ EXEC ’/tmp/exec.list’
RULE ’all’ LIST ’tmpfiles’ where name like ’%tmp%’
[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
/fs1/file.tmp1 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp1 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file1 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
/fs1/file0 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
[I] Inodes scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp1 SHOW()
WEIGHT(INF) LIST ’tmpfiles’ /fs1/file.tmp0 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp1 SHOW()
WEIGHT(INF) DELETE /fs1/file.tmp0 SHOW()
WEIGHT(INF) MIGRATE /fs1/file1 TO POOL system SHOW()
WEIGHT(INF) MIGRATE /fs1/file0 TO POOL system SHOW()
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 2 32 0 0 0 RULE ’exclude *.save files’ EXCLUDE WHERE(.)
1 2 16 2 16 0 RULE ’delete’ DELETE FROM POOL ’sp1’ WHERE(.)
2 2 32 2 32 0 RULE ’migration to system pool’ MIGRATE FROM POOL \
’sp1’ TO POOL ’system’ WHERE(.)
3 2 16 2 16 0 RULE ’all’ LIST ’tmpfiles’ WHERE(.)

[I] Files with no applicable rules: 5.

[I] GPFS Policy Decisions and File Choice Totals:


Chose to migrate 32KB: 2 of 2 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 0 candidates;
Chose to delete 16KB: 2 of 2 candidates;
Chose to list 16KB: 2 of 2 candidates;



0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
sp1 5072 19531264 0.025969%
system 102432 19531264 0.524451%

where the lines:


/fs1/file.tmp1 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp1 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file1 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
/fs1/file0 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)

show the candidate files and the applicable rules.

mmapplypolicy -L 4
Use this option to display all of the information from the previous levels, plus the name of each explicitly
excluded file, and the applicable rule.

This command:
mmapplypolicy fs1 -P policyfile -I test -L 4

produces the following additional information:


[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
/fs1/file1.save RULE ’exclude *.save files’ EXCLUDE
/fs1/file2.save RULE ’exclude *.save files’ EXCLUDE
/fs1/file.tmp1 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp1 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’delete’ DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp0 RULE ’all’ LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file1 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
/fs1/file0 RULE ’migration to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)

where the lines:


/fs1/file1.save RULE ’exclude *.save files’ EXCLUDE
/fs1/file2.save RULE ’exclude *.save files’ EXCLUDE

indicate that there are two excluded files, /fs1/file1.save and /fs1/file2.save.

mmapplypolicy -L 5
Use this option to display all of the information from the previous levels, plus the attributes of candidate
and excluded files.

These attributes include:


v MODIFICATION_TIME
v USER_ID
v GROUP_ID
v FILE_SIZE
v POOL_NAME
v ACCESS_TIME
v KB_ALLOCATED
v FILESET_NAME

This command:
mmapplypolicy fs1 -P policyfile -I test -L 5



produces the following additional information:
[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
/fs1/file1.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’exclude \
*.save files’ EXCLUDE
/fs1/file2.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-03@21:19:57 16 root] RULE ’exclude \
*.save files’ EXCLUDE
/fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE ’delete’ DELETE \
FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE ’all’ LIST \
’tmpfiles’ WEIGHT(INF)
/fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’delete’ \
DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’all’ \
LIST ’tmpfiles’ WEIGHT(INF)
/fs1/file1 [2009-03-03@21:32:41 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE ’migration \
to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
/fs1/file0 [2009-03-03@21:21:11 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE ’migration \
to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)

where the lines:


/fs1/file1.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’exclude \
*.save files’ EXCLUDE
/fs1/file2.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-03@21:19:57 16 root] RULE ’exclude \
*.save files’ EXCLUDE

show the attributes of excluded files /fs1/file1.save and /fs1/file2.save.

mmapplypolicy -L 6
Use this option to display all of the information from the previous levels, plus files that are not candidate
files, and their attributes.

These attributes include:


v MODIFICATION_TIME
v USER_ID
v GROUP_ID
v FILE_SIZE
v POOL_NAME
v ACCESS_TIME
v KB_ALLOCATED
v FILESET_NAME

This command:
mmapplypolicy fs1 -P policyfile -I test -L 6

produces the following additional information:


[I] Directories scan: 10 files, 1 directories, 0 other objects, 0 ’skipped’ files and/or errors.
/fs1/. [2009-03-04@02:10:43 0 0 8192 system 2009-03-04@02:17:43 8 root] NO RULE APPLIES
/fs1/file1.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE \
’exclude *.save files’ EXCLUDE
/fs1/file2.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-03@21:19:57 16 root] RULE \
’exclude *.save files’ EXCLUDE
/fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE ’delete’ \
DELETE FROM POOL ’sp1’ WEIGHT(INF)
/fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE ’all’ LIST \
’tmpfiles’ WEIGHT(INF)
/fs1/data1 [2009-03-03@21:20:23 0 0 0 sp1 2009-03-04@02:09:31 0 root] NO RULE APPLIES
/fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’delete’ \
DELETE FROM POOL ’sp1’ WEIGHT(INF)



/fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE ’all’ LIST \
’tmpfiles’ WEIGHT(INF)
/fs1/file1 [2009-03-03@21:32:41 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE ’migration \
to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)
/fs1/file0 [2009-03-03@21:21:11 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE ’migration \
to system pool’ MIGRATE FROM POOL ’sp1’ TO POOL ’system’ WEIGHT(INF)

where the line:


/fs1/data1 [2009-03-03@21:20:23 0 0 0 sp1 2009-03-04@02:09:31 0 root] NO RULE APPLIES

contains information about the data1 file, which is not a candidate file.

The mmcheckquota command


The mmcheckquota command counts inode and space usage for a file system and writes the collected
data into quota files.

Indications leading you to the conclusion that you should run the mmcheckquota command include:
v MMFS_QUOTA error log entries. This error log entry is created when the quota manager has a
problem reading or writing the quota file.
v Quota information is lost due to node failure. Node failure could leave users unable to open files or
deny them disk space that their quotas should allow.
v The in doubt value is approaching the quota limit. The sum of the in doubt value and the current usage
may not exceed the hard limit. Consequently, the actual block space and number of files available to
the user or the group may be constrained by the in doubt value. Should the in doubt value approach a
significant percentage of the quota, use the mmcheckquota command to account for the lost space and
files.
v User, group, or fileset quota files are corrupted.

During the normal operation of file systems with quotas enabled (not running mmcheckquota online),
the usage data reflects the actual usage of the blocks and inodes in the sense that if you delete files you
should see the usage amount decrease. The in doubt value does not reflect how much the user has used
already; it is just the amount of quota that the quota server has assigned to its clients. The quota server
does not know whether the assigned amount has been used or not. The only situation where the in doubt
value is important to the user is when the sum of the usage and the in doubt value is greater than the
user's quota hard limit. In this case, the user is not allowed to allocate more blocks or inodes until the
usage is brought back under the limit.
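
For example, to recount the inode and space usage for a file system named fs1 (a hypothetical device
name) and update its quota files, issue:
mmcheckquota fs1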

The mmcheckquota command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmlsnsd command


The mmlsnsd command displays information about the currently defined disks in the cluster.

For example, if you issue mmlsnsd, your output is similar to this:


File system Disk name NSD servers
---------------------------------------------------------------------------
fs2 hd3n97 c5n97g.ppd.pok.ibm.com,c5n98g.ppd.pok.ibm.com,c5n99g.ppd.pok.ibm.com
fs2 hd4n97 c5n97g.ppd.pok.ibm.com,c5n98g.ppd.pok.ibm.com,c5n99g.ppd.pok.ibm.com
fs2 hd5n98 c5n98g.ppd.pok.ibm.com,c5n97g.ppd.pok.ibm.com,c5n99g.ppd.pok.ibm.com
fs2 hd6n98 c5n98g.ppd.pok.ibm.com,c5n97g.ppd.pok.ibm.com,c5n99g.ppd.pok.ibm.com
fs2 sdbnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
fs2 sdcnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
fs2 sddnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
fs2 sdensd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
fs2 sdgnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com



fs2 sdfnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
fs2 sdhnsd c5n94g.ppd.pok.ibm.com,c5n96g.ppd.pok.ibm.com
(free disk) hd2n97 c5n97g.ppd.pok.ibm.com,c5n98g.ppd.pok.ibm.com

To find out the local device names for these disks, use the mmlsnsd command with the -m option. For
example, issuing mmlsnsd -m produces output similar to this:
Disk name NSD volume ID Device Node name Remarks
------------------------------------------------------------------------------------
hd2n97 0972846145C8E924 /dev/hdisk2 c5n97g.ppd.pok.ibm.com server node
hd2n97 0972846145C8E924 /dev/hdisk2 c5n98g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n97g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n98g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n97g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n98g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n97g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n98g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n97g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n98g.ppd.pok.ibm.com server node
hd7n97 0972846145C8E934 /dev/hd7n97 c5n97g.ppd.pok.ibm.com server node

To obtain extended information for NSDs, use the mmlsnsd command with the -X option. For example,
issuing mmlsnsd -X produces output similar to this:
Disk name NSD volume ID Device Devtype Node name Remarks
---------------------------------------------------------------------------------------------------
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
sdfnsd 0972845E45F02E81 /dev/sdf generic c5n94g.ppd.pok.ibm.com server node
sdfnsd 0972845E45F02E81 /dev/sdm generic c5n96g.ppd.pok.ibm.com server node

The mmlsnsd command is fully described in the Commands topic in the GPFS: Administration and
Programming Reference.

The mmwindisk command


On Windows nodes, use the mmwindisk command to view all disks known to the operating system
along with partitioning information relevant to GPFS.

For example, if you issue mmwindisk list, your output is similar to this:
Disk Avail Type Status Size GPFS Partition ID
---- ----- ------- --------- -------- ------------------------------------
0 BASIC ONLINE 137 GiB
1 GPFS ONLINE 55 GiB 362DD84E-3D2E-4A59-B96B-BDE64E31ACCF
2 GPFS ONLINE 200 GiB BD5E64E4-32C8-44CE-8687-B14982848AD2
3 GPFS ONLINE 55 GiB B3EC846C-9C41-4EFD-940D-1AFA6E2D08FB
4 GPFS ONLINE 55 GiB 6023455C-353D-40D1-BCEB-FF8E73BF6C0F
5 GPFS ONLINE 55 GiB 2886391A-BB2D-4BDF-BE59-F33860441262
6 GPFS ONLINE 55 GiB 00845DCC-058B-4DEB-BD0A-17BAD5A54530
7 GPFS ONLINE 55 GiB 260BCAEB-6E8A-4504-874D-7E07E02E1817
8 GPFS ONLINE 55 GiB 863B6D80-2E15-457E-B2D5-FEA0BC41A5AC
9 YES UNALLOC OFFLINE 55 GiB
10 YES UNALLOC OFFLINE 200 GiB

Where:
Disk
is the Windows disk number as shown in the Disk Management console and the DISKPART
command-line utility.
Avail
shows the value YES when the disk is available and in a state suitable for creating an NSD.



GPFS Partition ID
is the unique ID for the GPFS partition on the disk.

The mmwindisk command does not provide the NSD volume ID. You can use mmlsnsd -m to find the
relationship between NSDs and devices, which are disk numbers on Windows.

The mmfileid command


The mmfileid command determines which files are located on areas of a disk that are damaged or
considered to be suspect.

Attention: Use this command only when directed by the IBM Support Center.

Before running mmfileid, you must run a disk analysis utility and obtain the disk sector numbers that
are suspect or known to be damaged. These sectors are input to the mmfileid command.

The syntax is:


mmfileid Device
{-d DiskDesc | -F DescFile}
[-o OutputFile] [-f NumThreads] [-t Directory]
[-N {Node[,Node...] | NodeFile | NodeClass}]

The input parameters are:


Device
The device name for the file system on which this utility is to be run. This must be the first
parameter and is required.
-d DiskDesc
A descriptor identifying the disk to be scanned. DiskDesc has this format:
NodeName:DiskName[:PhysAddr1[-PhysAddr2]]

Or,
:{NsdName|DiskNum|BROKEN}[:PhysAddr1[-PhysAddr2]]
NodeName
Specifies a node in the GPFS cluster that has access to the disk to scan. NodeName must be
specified if the disk is identified using its physical volume name. NodeName should be omitted if
the disk is identified with its NSD name, its GPFS disk ID number, or if the keyword BROKEN is
used.
DiskName
Specifies the physical volume name of the disk to scan as known on node NodeName.
NsdName
Specifies the GPFS NSD name of the disk to scan.
DiskNum
Specifies the GPFS disk ID number of the disk to scan as displayed by the mmlsdisk -L
command.
BROKEN
Specifies that all disks in the file system should be scanned to find files that have broken
addresses resulting in lost data.
PhysAddr1[-PhysAddr2]
Specifies the range of physical disk addresses to scan. The default value for PhysAddr1 is zero.
The default value for PhysAddr2 is the value for PhysAddr1.
If both PhysAddr1 and PhysAddr2 are zero, the entire disk is searched.



Examples of valid disk descriptors are:
k148n07:hdisk9:2206310-2206810
:gpfs1008nsd:
:10:27645856
:BROKEN
-F DescFile
Specifies a file containing a list of disk descriptors, one per line.
-f NumThreads
Specifies the number of worker threads that are to be created by the mmfileid command.
The default value is 16. The minimum value is 1. The maximum value can be as large as is allowed
by the operating system pthread_create function for a single process. A suggested value is twice the
number of disks in the file system.
-N {Node[,Node...] | NodeFile | NodeClass}
Specifies the list of nodes that will participate in determining the disk addresses. This command
supports all defined node classes. The default is all (all nodes in the GPFS cluster will participate).
For general information on how to specify node names, see the Specifying nodes as input to GPFS
commands topic in the GPFS: Administration and Programming Reference.
-o OutputFile
The path name of a file to which the result from the mmfileid command is to be written. If not
specified, the result is sent to standard output.
-t Directory
Specifies the directory to use for temporary storage during mmfileid command processing. The
default directory is /tmp.

The output can be redirected to a file (using the -o flag) and sorted on the inode number, using the sort
command.
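
For example, a minimal sketch of such a run, assuming the descriptor file addr.in shown later in this
topic and an illustrative output file name, is:
mmfileid /dev/gpfsB -F addr.in -o /tmp/mmfileid.out
sort -n /tmp/mmfileid.out > /tmp/mmfileid.sorted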

The mmfileid command output contains one line for each inode found to be located on the corrupt disk
sector. Each line of the command output has this format:
InodeNumber LogicalDiskAddress SnapshotId Filename
InodeNumber
Indicates the inode number of the file identified by mmfileid.
LogicalDiskAddress
Indicates the disk block (disk sector) number of the file identified by mmfileid.
SnapshotId
Indicates the snapshot identifier for the file. A SnapshotId of 0 means that the file is not a snapshot
file.
Filename
Indicates the name of the file identified by mmfileid. File names are relative to the root of the file
system in which they reside.

Assume that a disk analysis tool reported that hdisk6, hdisk7, hdisk8, and hdisk9 contained bad sectors.
Then the command:
mmfileid /dev/gpfsB -F addr.in

where addr.in contains this:


k148n07:hdisk9:2206310-2206810
k148n07:hdisk8:2211038-2211042
k148n07:hdisk8:2201800-2202800
k148n01:hdisk6:2921879-2926880
k148n09:hdisk7:1076208-1076610



may produce output similar to this:
Address 2201958 is contained in the Block allocation map (inode 1)
Address 2206688 is contained in the ACL Data file (inode 4, snapId 0)
Address 2211038 is contained in the Log File (inode 7, snapId 0)
14336 1076256 0 /gpfsB/tesDir/testFile.out
14344 2922528 1 /gpfsB/x.img

The lines starting with the word Address represent GPFS system metadata files or reserved disk areas. If
your output contains any of these lines, do not attempt to replace or repair the indicated files. If you
suspect that any of the special files are damaged, call the IBM Support Center for assistance.

The line:
14336 1076256 0 /gpfsB/tesDir/testFile.out

indicates that inode number 14336, disk address 1076256 contains file /gpfsB/tesDir/testFile.out, which
does not belong to a snapshot (0 to the left of the name). This file is located on a potentially bad disk
sector area.

The line
14344 2922528 1 /gpfsB/x.img

indicates that inode number 14344, disk address 2922528 contains file /gpfsB/x.img, which belongs to
snapshot number 1 (1 to the left of the name). This file is located on a potentially bad disk sector area.

The SHA digest


The Secure Hash Algorithm (SHA) digest is relevant only when using GPFS in a multi-cluster
environment.

The SHA digest is a short and convenient way to identify a key registered with either the mmauth show
or mmremotecluster command. In theory, two keys may have the same SHA digest. In practice, this is
extremely unlikely. The SHA digest can be used by the administrators of two GPFS clusters to determine
if they each have received (and registered) the right key file from the other administrator.

An example is the situation of two administrators named Admin1 and Admin2 who have each registered
the other's key file, but find that mount attempts by Admin1 for file systems owned by Admin2 fail
with the error message: Authorization failed. To determine which administrator has registered the
wrong key, they each run mmauth show and send the local cluster's SHA digest to the other
administrator. Admin1 then runs the mmremotecluster command and verifies that the SHA digest for
Admin2's cluster matches the SHA digest for the key that Admin1 has registered. Admin2 then runs the
mmauth show command and verifies that the SHA digest for Admin1's cluster matches the key that
Admin2 has authorized.
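
As a sketch of this verification, each administrator can display the relevant digests with commands
similar to the following (see the mmauth and mmremotecluster command documentation for the full
syntax):
mmauth show all
mmremotecluster show all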

If Admin1 finds that the SHA digests do not match, Admin1 runs the mmremotecluster update
command, passing the correct key file as input.

If Admin2 finds that the SHA digests do not match, Admin2 runs the mmauth update command,
passing the correct key file as input.

This is an example of the output produced by the mmauth show all command:
Cluster name: fksdcm.pok.ibm.com
Cipher list: EXP1024-RC2-CBC-MD5
SHA digest: d5eb5241eda7d3ec345ece906bfcef0b6cd343bd
File system access: fs1 (rw, root allowed)

Cluster name: kremote.cluster



Cipher list: EXP1024-RC4-SHA
SHA digest: eb71a3aaa89c3979841b363fd6d0a36a2a460a8b
File system access: fs1 (rw, root allowed)

Cluster name: dkq.cluster (this cluster)


Cipher list: AUTHONLY
SHA digest: 090cd57a2e3b18ac163e5e9bd5f26ffabaa6aa25
File system access: (all rw)



|

| Chapter 4. Deadlock amelioration


| The distributed nature of GPFS, the complexity of the locking infrastructure, the dependency on the
| proper operation of disks and networks, and the overall complexity of operating in a clustered
| environment all contribute to increasing the probability of a deadlock.

| Deadlocks can be disruptive in certain situations, more so than other types of failure. A deadlock
| effectively represents a single point of failure that can render the entire cluster inoperable. When a
| deadlock is encountered on a production system, it can take a long time to debug. The typical approach
| to recovering from a deadlock involves rebooting all of the nodes in the cluster. Thus, deadlocks can lead
| to prolonged and complete outages of clusters.

| Troubleshooting a deadlock requires specific types of debug data that are collected while the deadlock
| is still in progress. Data collection commands must be run manually, and if this is not done before the
| deadlock is broken up, determining the root cause afterward will be difficult. Also, deadlock detection
| requires some form of external action; for example, a complaint from a user. This means that detecting
| a deadlock in progress could take many hours.

| Starting with GPFS 4.1, automated deadlock detection, automated deadlock data collection, and
| automated deadlock breakup options are provided to make it easier to handle a deadlock situation.

| The information is organized as follows:


| v “Automated deadlock detection”
| v “Automated deadlock data collection” on page 38
| v “Automated deadlock breakup” on page 38
|
| Automated deadlock detection
| Many deadlocks involve long waiters; for example, mmfsd threads that have been waiting for some event
| for a considerable duration of time. With some exceptions, long waiters typically indicate that something
| in the system is not healthy. There may be a deadlock in progress, some disk may be failing, or the entire
| system may be overloaded.

| All waiters can be broadly divided into four categories:


| v Waiters that can occur under normal operating conditions and can be ignored by automated deadlock
| detection.
| v Waiters that correspond to complex operations and can legitimately grow to moderate lengths.
| v Waiters that should never be long. For example, most mutexes should only be held briefly.
| v Waiters that can be used as an indicator of cluster overload. For example, waiters waiting for I/O
| completions or network availability.

| Automated deadlock detection monitors waiters. Deadlock detection relies on a configurable threshold to
| determine if a deadlock is in progress. When a deadlock is detected, an alert is issued in the mmfs.log,
| the operating system log, and the deadlockDetected callback is triggered.

| Automated deadlock detection is enabled by default and controlled with the mmchconfig attribute
| deadlockDetectionThreshold. A potential deadlock is detected when a waiter waits longer than
| deadlockDetectionThreshold. To view the current threshold for deadlock detection, enter the following
| command:
| mmlsconfig deadlockDetectionThreshold



| The system displays output similar to the following:
| deadlockDetectionThreshold 300

| To disable automated deadlock detection, specify a value of 0 for the deadlockDetectionThreshold attribute.
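
For example, to disable detection, or to raise the threshold to an illustrative value of 600 seconds:
mmchconfig deadlockDetectionThreshold=0
mmchconfig deadlockDetectionThreshold=600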
|
| Automated deadlock data collection
| In order to effectively troubleshoot a typical deadlock, it is imperative that the following debug data is
| collected:
| v A full internal dump (mmfsadm dump all)
| v A dump of kthreads (mmfsadm dump kthreads)
| v Trace snapshot (10-30 seconds of trace data)

| Automated deadlock data collection can be used to help gather this crucial debug data on detection of a
| potential deadlock.
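
If automated collection is disabled or does not trigger, the same data can be gathered manually while
the deadlock is still in progress; a minimal sketch, with illustrative output file names, is:
mmfsadm dump all > /tmp/mmfs/dump.all
mmfsadm dump kthreads > /tmp/mmfs/dump.kthreads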

| Automated deadlock data collection is enabled by default and controlled with the mmchconfig attribute
| deadlockDataCollectionDailyLimit. The deadlockDataCollectionDailyLimit attribute specifies the
| maximum number of times debug data can be collected in a 24-hour period. To view the current data
| collection interval, enter the following command:
| mmlsconfig deadlockDataCollectionDailyLimit

| The system displays output similar to the following:


| deadlockDataCollectionDailyLimit 10

| To disable automated deadlock data collection, specify a value of 0 for deadlockDataCollectionDailyLimit.
|
| Automated deadlock breakup
| Automated deadlock breakup helps resolve a deadlock situation without human intervention. To break
| up a deadlock, less disruptive actions are tried first; for example, causing a file system panic. If necessary,
| more disruptive actions are then taken; for example, shutting down a GPFS mmfsd daemon.

| If a system administrator prefers to control the deadlock breakup process, the deadlockDetected callback
| can be used to notify system administrators that a potential deadlock has been detected. The information
| from the mmdiag --deadlock section can then be used to help determine what steps to take to resolve the
| deadlock.
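
A sketch of registering a notification script for this event, assuming a hypothetical script path, is
similar to the following (see the mmaddcallback command documentation for the exact parameters):
mmaddcallback deadlockNotify --command /usr/local/sbin/notify-admin.sh --event deadlockDetected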

| Automated deadlock breakup is disabled by default and controlled with the mmchconfig attribute
| deadlockBreakupDelay. The deadlockBreakupDelay attribute specifies how long to wait after a
| deadlock is detected before attempting to break up the deadlock. Enough time must be provided to allow
| the debug data collection to complete. To view the current breakup delay, enter the following command:
| mmlsconfig deadlockBreakupDelay

| The system displays output similar to the following:


| deadlockBreakupDelay 0

| The value of 0 shows that automated deadlock breakup is disabled. To enable automated deadlock
| breakup, specify a positive value for deadlockBreakupDelay. If automated deadlock breakup is to be
| enabled, a delay of 300 seconds or longer is recommended.
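
For example, to enable automated deadlock breakup with the recommended minimum delay:
mmchconfig deadlockBreakupDelay=300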



Chapter 5. Other problem determination tools
Other problem determination tools include the kernel debugging facilities and the mmpmon command.

If your problem occurs on the AIX operating system, see the appropriate kernel debugging
documentation in the AIX Information Center (http://publib16.boulder.ibm.com/pseries/index.htm) for
information about the AIX kdb command.

If your problem occurs on the Linux operating system, see the documentation for your distribution
vendor.

If your problem occurs on the Windows operating system, the following tools, which are available from
Microsoft (http://www.microsoft.com/en/us/default.aspx), might be useful in troubleshooting:
v Debugging Tools for Windows
v Process Monitor
v Process Explorer
v Microsoft Windows Driver Kit
v Microsoft Windows Software Development Kit

The mmpmon command is intended for system administrators to analyze their I/O on the node on
which it is run. It is not primarily a diagnostic tool, but may be used as one for certain problems. For
example, running mmpmon on several nodes can help detect nodes that are experiencing poor
performance or connectivity problems.

The syntax of the mmpmon command is fully described in the Commands topic in the GPFS:
Administration and Programming Reference. For details on the mmpmon command, see the Monitoring GPFS
I/O performance with the mmpmon command topic in the GPFS: Advanced Administration Guide.



Chapter 6. GPFS installation, configuration, and operation
problems
You might encounter errors with GPFS installation, configuration, and operation. Use the information in
this topic to help you identify and correct errors.

A GPFS installation problem should be suspected when GPFS modules are not loaded successfully, GPFS
commands do not work (either on the node that you are working on or on other nodes), new command
operands added with a new release of GPFS are not recognized, or there are problems with the kernel
extension.

A GPFS configuration problem should be suspected when the GPFS daemon will not activate, it will not
remain active, or it fails on some nodes but not on others. Suspect a configuration problem also if
quorum is lost, certain nodes appear to hang or do not communicate properly with GPFS, nodes cannot
be added to the cluster or are expelled, or GPFS performance is very noticeably degraded once a new
release of GPFS is installed or configuration parameters have been changed.

These are some of the errors encountered with GPFS installation, configuration and operation:
v “Installation and configuration problems”
v “GPFS modules cannot be loaded on Linux” on page 46
v “GPFS daemon will not come up” on page 47
v “GPFS daemon went down” on page 50
v “GPFS failures due to a network failure” on page 51
v “Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O
requests' message” on page 52
v “Quorum loss” on page 52
v “Delays and deadlocks” on page 53
v “Node cannot be added to the GPFS cluster” on page 54
v “Remote node expelled after remote file system successfully mounted” on page 55
v “Disaster recovery problems” on page 55
v “GPFS commands are unsuccessful” on page 56
v “Application program errors” on page 58
v “Troubleshooting Windows problems” on page 59
v “OpenSSH connection delays” on page 60

Installation and configuration problems


The GPFS: Concepts, Planning, and Installation Guide provides the step-by-step procedure for installing and
migrating GPFS; however, some problems might occur if the procedures were not properly followed.

Some of those problems might include:


v Not being able to start GPFS after installation of the latest level. Did you reboot your GPFS nodes
between the last invocation of GPFS at the old level and the first invocation at the new level? If you did,
see “GPFS daemon will not come up” on page 47. If not, reboot. For more information, see the
Initialization of the GPFS daemon topic in the GPFS: Concepts, Planning, and Installation Guide.
v Not being able to access a file system. See “File system will not mount” on page 61.
v New GPFS functions do not operate. See “GPFS commands are unsuccessful” on page 56.



What to do after a node of a GPFS cluster crashes and has been
reinstalled
After reinstalling GPFS code, check whether the /var/mmfs/gen/mmsdrfs file was lost. If it was lost, and
an up-to-date version of the file is present on the primary GPFS cluster configuration server, restore the
file by issuing this command from the node on which it is missing:
| mmsdrrestore -p primaryServer

| where primaryServer is the name of the primary GPFS cluster configuration server.

If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but it
is present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST

where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file, and
remoteFile is the full path name of that file on that node.

| This restore procedure may not work if the repository type is CCR and the files in /var/mmfs/ccr are
| damaged or missing. If this is the case, locate a node on which the /var/mmfs/gen/mmsdrfs file is
| present. Ensure that GPFS is shut down on all of the nodes and then disable the CCR:
| mmchcluster --ccr-disable

| Using the procedure described in this section, restore the /var/mmfs/gen/mmsdrfs file on all nodes on
| which it is missing and then re-enable the CCR:
| mmchcluster --ccr-enable

One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.

If you have made modifications to any of the user exits in /var/mmfs/etc, you will have to restore them
before starting GPFS.

For additional information, see “Recovery from loss of GPFS cluster configuration data file” on page 45.

Problems with the /etc/hosts file


The /etc/hosts file must have a unique node name for each node interface to be used by GPFS. Violation
of this requirement results in the message:
6027-1941
Cannot handle multiple interfaces for host hostName.

If you receive this message, correct the /etc/hosts file so that each node interface to be used by GPFS
appears only once in the file.

Linux configuration considerations


Note: This information applies only to Linux nodes.

Depending on your system configuration, you may need to consider:


1. Why can only one host successfully attach to the Fibre Channel loop and see the Fibre Channel
disks?
Your host bus adapter may be configured with an enabled Hard Loop ID that conflicts with other host
bus adapters on the same Fibre Channel loop.



To see if that is the case, reboot your machine and enter the adapter bios with <Alt-Q> when the
Fibre Channel adapter bios prompt appears. Under the Configuration Settings menu, select Host
Adapter Settings and either ensure that the Adapter Hard Loop ID option is disabled or assign a
unique Hard Loop ID per machine on the Fibre Channel loop.
2. Could the GPFS daemon be terminated due to a memory shortage?
The Linux virtual memory manager (VMM) exhibits undesirable behavior for low memory situations
on nodes, where the processes with the largest memory usage are killed by the kernel (using OOM
killer), yet no mechanism is available for prioritizing important processes that should not be initial
candidates for the OOM killer. The GPFS mmfsd daemon uses a large amount of pinned memory in
the pagepool for caching data and metadata, and so the mmfsd process is a likely candidate for
termination if memory must be freed up.
3. What are the performance tuning suggestions?
For an up-to-date list of tuning suggestions, see the GPFS FAQ in the IBM Cluster information center
(http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/
gpfsclustersfaq.html) or GPFS FAQ in the IBM Knowledge Center (http://www.ibm.com/support/
knowledgecenter/SSFKCN/gpfsclustersfaq.html).

Problems with running commands on other nodes


Many of the GPFS administration commands perform operations on nodes other than the node on which
the command was issued. This is achieved by utilizing a remote invocation shell and a remote file copy
command. By default these items are /usr/bin/rsh and /usr/bin/rcp. You also have the option of specifying
your own remote shell and remote file copy commands to be used instead of the default rsh and rcp. The
remote shell and copy commands must adhere to the same syntax forms as rsh and rcp but may
implement an alternate authentication mechanism. For details, see the mmcrcluster and mmchcluster
commands. These are problems you may encounter with the use of remote commands.

Authorization problems
The rsh and rcp commands are used by GPFS administration commands to perform operations on other
nodes. The rsh daemon (rshd) on the remote node must recognize the command being run and must
obtain authorization to invoke it.

| Note: The rsh and rcp commands that are shipped with Cygwin are not supported on Windows. Use the
ssh and scp commands that are shipped with the OpenSSH package supported by GPFS. Refer to the
GPFS FAQ in the IBM Cluster information center (http://publib.boulder.ibm.com/infocenter/clresctr/
vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html) or GPFS FAQ in the IBM
Knowledge Center (http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfsclustersfaq.html) for
the latest OpenSSH information.

For the rsh and rcp commands issued by GPFS administration commands to succeed, each node in the
cluster must have an .rhosts file in the home directory for the root user, with file permission set to 600.
This .rhosts file must list each of the nodes and the root user. If such an .rhosts file does not exist on each
node in the cluster, the rsh and rcp commands issued by GPFS commands will fail with permission
errors, causing the GPFS commands to fail in turn.
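
A minimal sketch of the required setup on each node, assuming illustrative node names and a root
home directory of /root, is an .rhosts file containing:
k145n01 root
k145n02 root

with its permissions set by:
chmod 600 /root/.rhosts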

If you elected to use installation specific remote invocation shell and remote file copy commands, you
must ensure:
1. Proper authorization is granted to all nodes in the GPFS cluster.
2. The nodes in the GPFS cluster can communicate without the use of a password, and without any
extraneous messages.

Connectivity problems
Another reason why rsh may fail is that connectivity to a needed node has been lost. Error messages
from mmdsh may indicate that connectivity to such a node has been lost. Here is an example:



mmdelnode -N k145n04
Verifying GPFS is stopped on all affected nodes ...
mmdsh: 6027-1617 There are no available nodes on which to run the command.
mmdelnode: 6027-1271 Unexpected error from verifyDaemonInactive: mmcommon onall.
Return code: 1

If error messages indicate that connectivity to a node has been lost, use the ping command to verify
whether the node can still be reached:
ping k145n04
PING k145n04: (119.114.68.69): 56 data bytes
<Ctrl- C>
----k145n04 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss

If connectivity has been lost, restore it, then reissue the GPFS command.

GPFS error messages for rsh problems


When rsh problems arise, the system may display information similar to these error messages:
6027-1615
nodeName remote shell process had return code value.
6027-1617
There are no available nodes on which to run the command.

GPFS cluster configuration data files are locked


GPFS uses a file to serialize access of administration commands to the GPFS cluster configuration data
files. This lock file is kept on the primary GPFS cluster configuration server in the /var/mmfs/gen/
mmLockDir directory. If a system failure occurs before the cleanup of this lock file, the file will remain
and subsequent administration commands may report that the GPFS cluster configuration data files are
locked. Besides a serialization lock, certain GPFS commands may obtain an additional lock. This lock is
designed to prevent GPFS from coming up, or file systems from being mounted, during critical sections
of the command processing. If this happens you will see a message that shows the name of the blocking
command, similar to message:
6027-1242
GPFS is waiting for requiredCondition.

To release the lock:


1. Determine the PID and the system that owns the lock by issuing:
mmcommon showLocks

The mmcommon showLocks command displays information about the lock server, lock name, lock
holder, PID, and extended information. If the PID belongs to a GPFS administration command that is
not responding, stopping that command will free the lock. If the PID now belongs to an unrelated
process, the original GPFS command probably failed without freeing the lock, and the PID has since
been reused. If this is the case, do not kill the process.
2. If any locks are held and you want to release them manually, from any node in the GPFS cluster issue
the command:
mmcommon freeLocks

GPFS error messages for cluster configuration data file problems


When GPFS commands are unable to retrieve or update the GPFS cluster configuration data files, the
system may display information similar to these error messages:
6027-1628
Cannot determine basic environment information. Not enough nodes are available.



6027-1630
The GPFS cluster data on nodeName is back level.
6027-1631
The commit process failed.
6027-1632
The GPFS cluster configuration data on nodeName is different than the data on nodeName.
6027-1633
Failed to create a backup copy of the GPFS cluster data on nodeName.

Recovery from loss of GPFS cluster configuration data file


A copy of the GPFS cluster configuration data files is stored in the /var/mmfs/gen/mmsdrfs file on each
node. For proper operation, this file must exist on each node in the GPFS cluster. The latest level of this
file is guaranteed to be on the primary, and secondary if specified, GPFS cluster configuration server
nodes that were defined when the GPFS cluster was first created with the mmcrcluster command.

If the /var/mmfs/gen/mmsdrfs file is removed by accident from any of the nodes, and an up-to-date
version of the file is present on the primary GPFS cluster configuration server, restore the file by issuing
this command from the node on which it is missing:
| mmsdrrestore -p primaryServer

| where primaryServer is the name of the primary GPFS cluster configuration server.

If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but is
present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST

where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file and
remoteFile is the full path name of that file on that node.

| This restore procedure may not work if the repository type is CCR and the files in /var/mmfs/ccr are
| damaged or missing. If this is the case, locate a node on which the /var/mmfs/gen/mmsdrfs file is
| present. Ensure that GPFS is shut down on all of the nodes and then disable the CCR:
| mmchcluster --ccr-disable

| Using the procedure described in this section, restore the /var/mmfs/gen/mmsdrfs file on all nodes on
| which it is missing and then re-enable the CCR:
| mmchcluster --ccr-enable

One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.

Automatic backup of the GPFS cluster data


GPFS provides an exit, mmsdrbackup, that can be used to automatically back up the GPFS configuration
data every time it changes. To activate this facility, follow these steps:
1. Modify the GPFS-provided version of mmsdrbackup as described in its prologue, to accomplish the
backup of the mmsdrfs file however the user desires. This file is /usr/lpp/mmfs/samples/
mmsdrbackup.sample.
2. Copy this modified mmsdrbackup.sample file to /var/mmfs/etc/mmsdrbackup on all of the nodes in
the GPFS cluster. Make sure that the permission bits for /var/mmfs/etc/mmsdrbackup are set to
permit execution by root.



GPFS will invoke the user-modified version of mmsdrbackup in /var/mmfs/etc every time a change is
made to the mmsdrfs file. This will perform the backup of the mmsdrfs file according to the user's
specifications. See the GPFS user exits topic in the GPFS: Administration and Programming Reference.
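
For example, a sketch of the copy and permission setup from the preceding steps, run on each node, is
(the mode shown is one way to permit execution by root):
cp /usr/lpp/mmfs/samples/mmsdrbackup.sample /var/mmfs/etc/mmsdrbackup
chmod 744 /var/mmfs/etc/mmsdrbackup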

Error numbers specific to GPFS applications calls


When experiencing installation and configuration problems, GPFS may report these error numbers in the
operating system error log facility, or return them to an application:
ECONFIG = 215, Configuration invalid or inconsistent between different nodes.
This error is returned when the levels of software on different nodes cannot coexist. For
information about which levels may coexist, see the GPFS FAQ in the IBM Cluster information
center (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/
gpfs_faqs/gpfsclustersfaq.html) or GPFS FAQ in the IBM Knowledge Center
(http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfsclustersfaq.html).
ENO_QUOTA_INST = 237, No Quota management enabled.
To enable quotas for the file system issue the mmchfs -Q yes command. To disable quotas for the
file system issue the mmchfs -Q no command.
EOFFLINE = 208, Operation failed because a disk is offline
This is most commonly returned when an open of a disk fails. Since GPFS will attempt to
continue operation with failed disks, this will be returned when the disk is first needed to
complete a command or application request. If this return code occurs, check your disk
subsystem for stopped states and check to determine if the network path exists. In rare situations,
this will be reported if disk definitions are incorrect.
EALL_UNAVAIL = 218, A replicated read or write failed because none of the replicas were available.
Multiple disks in multiple failure groups are unavailable. Follow the procedures in Chapter 8,
“GPFS disk problems,” on page 91 for unavailable disks.
| 6027-341 [D]
Node nodeName is incompatible because its maximum compatible version (number) is less than the
version of this node (number).
| 6027-342 [E]
Node nodeName is incompatible because its minimum compatible version is greater than the
version of this node (number).
| 6027-343 [E]
Node nodeName is incompatible because its version (number) is less than the minimum compatible
version of this node (number).
| 6027-344 [E]
Node nodeName is incompatible because its version is greater than the maximum compatible
version of this node (number).

GPFS modules cannot be loaded on Linux


You must build the GPFS portability layer binaries based on the kernel configuration of your system. For
more information, see the GPFS open source portability layer topic in the GPFS: Concepts, Planning, and
Installation Guide. During mmstartup processing, GPFS loads the mmfslinux kernel module.

Some of the more common problems that you may encounter are:
1. If the portability layer is not built, you may see messages similar to:
Mon Mar 26 20:56:30 EDT 2012: runmmfs starting
Removing old /var/adm/ras/mmfs.log.* files:
Unloading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
runmmfs: Unable to verify kernel/module configuration.



Loading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
runmmfs: Unable to verify kernel/module configuration.
Mon Mar 26 20:56:30 EDT 2012 runmmfs: error in loading or unloading the mmfs kernel extension
Mon Mar 26 20:56:30 EDT 2012 runmmfs: stopping GPFS
2. The GPFS kernel modules, mmfslinux and tracedev, are built with a kernel version that differs from
that of the currently running Linux kernel. This situation can occur if the modules are built on
another node with a different kernel version and copied to this node, or if the node is rebooted using
a kernel with a different version.
3. If the mmfslinux module is incompatible with your system, you may experience a kernel panic on
GPFS startup. Ensure that the site.mcr has been configured properly from the site.mcr.proto, and
GPFS has been built and installed properly.

For more information about the mmfslinux module, see the Building the GPFS portability layer topic in the
GPFS: Concepts, Planning, and Installation Guide.
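
As a sketch, assuming the default source location and that the matching kernel development packages
are installed, the rebuild sequence on Linux is similar to the following:
cd /usr/lpp/mmfs/src
make Autoconfig
make World
make InstallImages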

GPFS daemon will not come up


Several indications could lead you to the conclusion that the GPFS daemon (mmfsd) will not come up;
there are steps to follow to correct the problem.

Those indications include:


v The file system has been enabled to mount automatically, but the mount has not completed.
v You issue a GPFS command and receive the message:
6027-665
Failed to connect to file system daemon: Connection refused.
v The GPFS log does not contain the message:
| 6027-300 [N]
mmfsd ready
v The GPFS log file contains this error message: 'Error: daemon and kernel extension do not match.' This
error indicates that the kernel extension currently loaded in memory and the daemon currently starting
have mismatching versions. This situation may arise if a GPFS code update has been applied, and the
node has not been rebooted prior to starting GPFS.
While GPFS scripts attempt to unload the old kernel extension during update and install operations,
such attempts may fail if the operating system is still referencing GPFS code and data structures. To
recover from this error, ensure that all GPFS file systems are successfully unmounted, and reboot the
node. The mmlsmount command can be used to ensure that all file systems are unmounted.
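
For example, to verify that no GPFS file systems remain mounted anywhere in the cluster before
rebooting, issue:
mmlsmount all -L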

Steps to follow if the GPFS daemon does not come up


1. See “GPFS modules cannot be loaded on Linux” on page 46 if your node is running Linux, to verify
that you have built the portability layer.
2. Verify that the GPFS daemon is active by issuing:
ps -e | grep mmfsd

The output of this command should list mmfsd as operational. For example:
12230 pts/8 00:00:00 mmfsd

If the output does not show this, the GPFS daemon needs to be started with the mmstartup
command.
3. If you did not specify the autoload option on the mmcrcluster or the mmchconfig command, you
need to manually start the daemon by issuing the mmstartup command.



If you specified the autoload option, someone may have issued the mmshutdown command. In this
case, issue the mmstartup command. When using autoload for the first time, mmstartup must be run
manually. The autoload takes effect on the next reboot.
4. Verify that the network upon which your GPFS cluster depends is up by issuing:
ping nodename

to each node in the cluster. A properly working network and node will correctly reply to the ping
with no lost packets.
Query the network interface that GPFS is using with:
netstat -i

A properly working network will report no transmission errors.


5. Verify that the GPFS cluster configuration data is available by looking in the GPFS log. If you see the
message:
6027-1592
Unable to retrieve GPFS cluster files from node nodeName.

Determine the problem with accessing node nodeName and correct it.
6. Verify that the GPFS environment is properly initialized by issuing these commands and ensuring that
the output is as expected.
v Issue the mmlscluster command to list the cluster configuration. This will also update the GPFS
configuration data on the node. Correct any reported errors before continuing.
v List all file systems that were created in this cluster. For an AIX node, issue:
lsfs -v mmfs
For a Linux node, issue:
cat /etc/fstab | grep gpfs
If any of these commands produce unexpected results, this may be an indication of corrupted GPFS
cluster configuration data file information. Follow the procedures in “Information to collect before
contacting the IBM Support Center” on page 115, and then contact the IBM Support Center.
7. GPFS requires a quorum of nodes to be active before any file system operations can be honored. This
requirement guarantees that a valid single token management domain exists for each GPFS file
system. Prior to the existence of a quorum, most requests are rejected with a message indicating that
quorum does not exist.
To identify which nodes in the cluster have daemons up or down, issue:
mmgetstate -L -a
If insufficient nodes are active to achieve quorum, go to any nodes not listed as active and perform
problem determination steps on these nodes. A quorum node indicates that it is part of a quorum by
writing an mmfsd ready message to the GPFS log. Remember that your system may have quorum
nodes and non-quorum nodes, and only quorum nodes are counted to achieve the quorum.
8. This step applies only to AIX nodes. Verify that GPFS kernel extension is not having problems with its
shared segment by invoking:
cat /var/adm/ras/mmfs.log.latest

Messages such as:


6027-319
Could not create shared segment.
must be corrected by the following procedure:
a. Issue the mmshutdown command.
b. Remove the shared segment in an AIX environment:
1) Issue the mmshutdown command.



2) Issue the mmfsadm cleanup command.
c. If you are still unable to resolve the problem, reboot the node.
9. If the previous GPFS daemon was brought down and you are trying to start a new daemon but are
unable to, this is an indication that the original daemon did not completely go away. Go to that node
and check the state of GPFS. Stopping and restarting GPFS or rebooting this node will often return
GPFS to normal operation. If this fails, follow the procedures in “Additional information to collect for
GPFS daemon crashes” on page 116, and then contact the IBM Support Center.

Unable to start GPFS after the installation of a new release of GPFS


If one or more nodes in the cluster will not start GPFS, these are the possible causes:
v If message:
| 6027-2700 [E]
A node join was rejected. This could be due to incompatible daemon versions, failure to find
the node in the configuration database, or no configuration manager found.

is written to the GPFS log, incompatible versions of GPFS code exist on nodes within the same cluster.
v If messages stating that functions are not supported are written to the GPFS log, you may not have the
correct kernel extensions loaded.
1. Ensure that the latest GPFS install packages are loaded on your system.
2. If running on Linux, ensure that the latest kernel extensions have been installed and built. See the
Building the GPFS portability layer topic in the GPFS: Concepts, Planning, and Installation Guide.
3. Reboot the GPFS node after an installation to ensure that the latest kernel extension is loaded.
v The daemon will not start because the configuration data was not migrated. See “Installation and
configuration problems” on page 41.

GPFS error messages for shared segment and network problems


For shared segment problems, follow the problem determination and repair actions specified with the
following messages:
6027-319
Could not create shared segment.
6027-320
Could not map shared segment.
6027-321
Shared segment mapped at wrong address (is value, should be value).
6027-322
Could not map shared segment in kernel extension.

For network problems, follow the problem determination and repair actions specified with the following
message:
| 6027-306 [E]
Could not initialize inter-node communication

Error numbers specific to GPFS application calls when the daemon is unable to come up
When the daemon is unable to come up, GPFS may report these error numbers in the operating system
error log, or return them to an application:
ECONFIG = 215, Configuration invalid or inconsistent between different nodes.
This error is returned when the levels of software on different nodes cannot coexist. For



information about which levels may coexist, see the GPFS FAQ in the IBM Cluster information
center (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/
gpfs_faqs/gpfsclustersfaq.html) or GPFS FAQ in the IBM Knowledge Center
(http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfsclustersfaq.html).
| 6027-341 [D]
Node nodeName is incompatible because its maximum compatible version (number) is less than the
version of this node (number).
| 6027-342 [E]
Node nodeName is incompatible because its minimum compatible version is greater than the
version of this node (number).
| 6027-343 [E]
Node nodeName is incompatible because its version (number) is less than the minimum compatible
version of this node (number).
| 6027-344 [E]
Node nodeName is incompatible because its version is greater than the maximum compatible
version of this node (number).

GPFS daemon went down


There are a number of conditions that can cause the GPFS daemon to exit.

These are all conditions where the GPFS internal checking has determined that continued operation
would be dangerous to the consistency of your data. Some of these conditions are errors within GPFS
processing but most represent a failure of the surrounding environment.

In most cases, the daemon will exit and restart after recovery. If it is not safe to simply force the
unmounted file systems to recover, the GPFS daemon will exit.

Indications leading you to the conclusion that the daemon went down:
v Applications running at the time of the failure will see either ENODEV or ESTALE errors. The ENODEV
errors are generated by the operating system until the daemon has restarted. The ESTALE error is
generated by GPFS as soon as it restarts.
When quorum is lost, applications with open files receive an ESTALE error return code until the files are
closed and reopened. New file open operations will fail until quorum is restored and the file system is
remounted. Applications accessing these files before GPFS returns may receive an ENODEV return code
from the operating system.
v The GPFS log contains the message:
| 6027-650 [X]
The mmfs daemon is shutting down abnormally.
Most GPFS daemon down error messages are in the mmfs.log.previous log for the instance that failed.
If the daemon restarted, it generates a new mmfs.log.latest. Begin problem determination for these
errors by examining the operating system error log.
If an existing quorum is lost, GPFS stops all processing within the cluster to protect the integrity of
your data. GPFS will attempt to rebuild a quorum of nodes and will remount the file system if
automatic mounts are specified.
v Open requests are rejected with no such file or no such directory errors.
When quorum has been lost, requests are rejected until the node has rejoined a valid quorum and
mounted its file systems. If messages indicate lack of quorum, follow the procedures in “GPFS daemon
will not come up” on page 47.
v Removing the setuid bit from the permissions of these commands may produce errors for non-root
users:



mmdf
mmgetacl
mmlsdisk
mmlsfs
mmlsmgr
mmlspolicy
mmlsquota
mmlssnapshot
mmputacl
mmsnapdir
mmsnaplatest
The GPFS system-level versions of these commands (prefixed by ts) may need to be checked for how
permissions are set if non-root users see the following message:
6027-1209
GPFS is down on this node.
If the setuid bit is removed from the permissions on the system-level commands, the command cannot
be executed and the node is perceived as being down. The system-level versions of the commands are:
| tsdf
| tslsdisk
| tslsfs
| tslsmgr
| tslspolicy
| tslsquota
| tslssnapshot
| tssnapdir
| tssnaplatest
| These are found in the /usr/lpp/mmfs/bin directory.

Note: The mode bits for all listed commands are 4555 or -r-sr-xr-x. To restore the default (shipped)
permission, enter:
chmod 4555 tscommand

Attention: Only administration-level versions of GPFS commands (prefixed by mm) should be
executed. Executing system-level commands (prefixed by ts) directly will produce unexpected results.
v For all other errors, follow the procedures in “Additional information to collect for GPFS daemon
crashes” on page 116, and then contact the IBM Support Center.

GPFS failures due to a network failure


For proper functioning, GPFS depends both directly and indirectly on correct network operation.

This dependence is direct because various GPFS internal messages flow on the network, and may be
indirect if the underlying disk technology is dependent on the network. Symptoms of an indirect
failure include the inability to complete I/O or GPFS moving disks to the down state.

The problem can also be first detected by the GPFS network communication layer. If network
connectivity is lost between nodes or GPFS heartbeat services cannot sustain communication to a
node, GPFS will declare the node dead and perform recovery procedures. This problem will manifest
itself by messages appearing in the GPFS log such as:
Mon Jun 25 22:23:36.298 2007: Close connection to 192.168.10.109 c5n109. Attempting reconnect.
Mon Jun 25 22:23:37.300 2007: Connecting to 192.168.10.109 c5n109
Mon Jun 25 22:23:37.398 2007: Close connection to 192.168.10.109 c5n109
Mon Jun 25 22:23:38.338 2007: Recovering nodes: 9.114.132.109
Mon Jun 25 22:23:38.722 2007: Recovered 1 nodes.



Nodes mounting file systems owned and served by other clusters may receive error messages similar to
this:
Mon Jun 25 16:11:16 2007: Close connection to 89.116.94.81 k155n01
Mon Jun 25 16:11:21 2007: Lost membership in cluster remote.cluster. Unmounting file systems.

If a sufficient number of nodes fail, GPFS will lose the quorum of nodes, which is indicated by
messages appearing in the GPFS log similar to this:
Mon Jun 25 11:08:10 2007: Close connection to 179.32.65.4 gpfs2
Mon Jun 25 11:08:10 2007: Lost membership in cluster gpfsxx.kgn.ibm.com. Unmounting file system.

When either of these cases occurs, perform problem determination on your network connectivity. Failing
components could be network hardware such as switches or host bus adapters.

Kernel panics with a 'GPFS dead man switch timer has expired, and
there's still outstanding I/O requests' message
This problem can be detected by an error log with a label of KERNEL_PANIC, and the PANIC
MESSAGES or a PANIC STRING.

For example:
GPFS Deadman Switch timer has expired, and there’s still outstanding I/O requests

GPFS is designed to tolerate node failures through per-node metadata logging (journaling). The log file is
called the recovery log. In the event of a node failure, GPFS performs recovery by replaying the recovery
log for the failed node, thus restoring the file system to a consistent state and allowing other nodes to
continue working. Prior to replaying the recovery log, it is critical to ensure that the failed node has
indeed failed, as opposed to being active but unable to communicate with the rest of the cluster.

In the latter case, if the failed node has direct access (as opposed to accessing the disk with an NSD
server) to any disks that are a part of the GPFS file system, it is necessary to ensure that no I/O requests
submitted from this node complete once the recovery log replay has started. To accomplish this, GPFS
uses the disk lease mechanism. The disk leasing mechanism guarantees that a node does not submit any
more I/O requests once its disk lease has expired, and the surviving nodes use disk lease time out as a
guideline for starting recovery.

This situation is complicated by the possibility of 'hung I/O'. If an I/O request is submitted prior to the
disk lease expiration, but for some reason (for example, device driver malfunction) the I/O takes a long
time to complete, it is possible that it may complete after the start of the recovery log replay during
recovery. This situation would present a risk of file system corruption. In order to guard against such a
contingency, when I/O requests are being issued directly to the underlying disk device, GPFS initiates a
kernel timer, referred to as the dead man switch. The dead man switch timer goes off in the event of disk
lease expiration and checks whether there are any outstanding I/O requests. If any I/O is pending, a
kernel panic is initiated to prevent possible file system corruption.

Such a kernel panic is not an indication of a software defect in GPFS or the operating system kernel, but
rather it is a sign of
1. Network problems (the node is unable to renew its disk lease).
2. Problems accessing the disk device (I/O requests take an abnormally long time to complete). See
“MMFS_LONGDISKIO” on page 4.

Quorum loss
Each GPFS cluster has a set of quorum nodes explicitly set by the cluster administrator.



These quorum nodes and the selected quorum algorithm determine the availability of file systems owned
by the cluster. See the GPFS: Concepts, Planning, and Installation Guide and search for quorum.

When quorum loss or loss of connectivity occurs, any nodes still running GPFS suspend the use of file
systems owned by the cluster experiencing the problem. This may result in GPFS access within the
suspended file system receiving ESTALE errnos. Nodes continuing to function after suspending file
system access will start contacting other nodes in the cluster in an attempt to rejoin or reform the
quorum. If they succeed in forming a quorum, access to the file system is restarted.

Normally, quorum loss or loss of connectivity occurs if a node goes down or becomes isolated from its
peers by a network failure. The expected response is to address the failing condition.

Delays and deadlocks


The first item to check when a file system appears hung is the condition of the networks including the
network used to access the disks.

Look for increasing numbers of dropped packets on all nodes by issuing:


v The netstat -D command on an AIX node.
v The ifconfig interfacename command, where interfacename is the name of the interface being used by
GPFS for communication.
When using subnets (see the Using remote access with public and private IP addresses topic in the GPFS:
Advanced Administration Guide), different interfaces may be in use for intra-cluster and intercluster
communication. The presence of a hang or dropped packet condition indicates a network support issue
that should be pursued first. Contact your local network administrator for problem determination for
your specific network configuration.

If file system processes appear to stop making progress, there may be a system resource problem or an
internal deadlock within GPFS.

Note: A deadlock can occur if user exit scripts that will be called by the mmaddcallback facility are
placed in a GPFS file system. The scripts should be placed in a local file system so they are accessible
even when the networks fail.

To debug a deadlock, do the following:


1. Check how full your file system is by issuing the mmdf command. If the mmdf command does not
respond, contact the IBM Support Center. Otherwise, the system displays information similar to:
disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
dm2 140095488 1 yes yes 136434304 ( 97%) 278232 ( 0%)
dm4 140095488 1 yes yes 136318016 ( 97%) 287442 ( 0%)
dm5 140095488 4000 yes yes 133382400 ( 95%) 386018 ( 0%)
dm0nsd 140095488 4005 yes yes 134701696 ( 96%) 456188 ( 0%)
dm1nsd 140095488 4006 yes yes 133650560 ( 95%) 492698 ( 0%)
dm15 140095488 4006 yes yes 140093376 (100%) 62 ( 0%)
------------- -------------------- -------------------
(pool total) 840572928 814580352 ( 97%) 1900640 ( 0%)

============= ==================== ===================


(total) 840572928 814580352 ( 97%) 1900640 ( 0%)

Inode Information
-----------------



Number of used inodes: 4244
Number of free inodes: 157036
Number of allocated inodes: 161280
Maximum number of inodes: 512000
GPFS operations that involve allocation of data and metadata blocks (that is, file creation and writes)
will slow down significantly if the number of free blocks drops below 5% of the total number. Free up
some space by deleting some files or snapshots (keeping in mind that deleting a file will not
necessarily result in any disk space being freed up when snapshots are present). Another possible
cause of a performance loss is the lack of free inodes. Issue the mmchfs command to increase the
number of inodes for the file system so that at least 5% are free. If the file system is
approaching these limits, you may notice the following error messages:
| 6027-533 [W]
Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum
number of inodes.
operating system error log entry
Jul 19 12:51:49 node1 mmfs: Error=MMFS_SYSTEM_WARNING, ID=0x4DC797C6,
Tag=3690419: File system warning. Volume fs1. Reason: File system fs1 is approaching the
limit for the maximum number of inodes/files.
| 2. If automated deadlock detection and deadlock data collection are enabled, look in the latest GPFS log
| file to determine if the system detected the deadlock and collected the appropriate debug data. Look
| in /var/adm/ras/mmfs.log.latest for messages similar to the following:
| Thu Feb 13 14:58:09.524 2014: [A] Deadlock detected: 2014-02-13 14:52:59: waiting 309.888 seconds on node
| p7fbn12: SyncHandlerThread 65327: on LkObjCondvar, reason ’waiting for RO lock’
| Thu Feb 13 14:58:09.525 2014: [I] Forwarding debug data collection request to cluster manager p7fbn11 of
| cluster cluster1.gpfs.net
| Thu Feb 13 14:58:09.524 2014: [I] Calling User Exit Script gpfsDebugDataCollection: event deadlockDebugData,
| Async command /usr/lpp/mmfs/bin/mmcommon.
| Thu Feb 13 14:58:10.625 2014: [N] sdrServ: Received deadlock notification from 192.168.117.21
| Thu Feb 13 14:58:10.626 2014: [N] GPFS will attempt to collect debug data on this node.
| mmtrace: move /tmp/mmfs/lxtrace.trc.p7fbn12.recycle.cpu0
| /tmp/mmfs/trcfile.140213.14.58.10.deadlock.p7fbn12.recycle.cpu0
| mmtrace: formatting /tmp/mmfs/trcfile.140213.14.58.10.deadlock.p7fbn12.recycle to
| /tmp/mmfs/trcrpt.140213.14.58.10.deadlock.p7fbn12.gz

| This example shows that deadlock debug data was automatically collected in /tmp/mmfs. If deadlock
| debug data was not automatically collected, it would need to be manually collected.
| To determine which nodes have the longest waiting threads, issue this command on each node:
| /usr/lpp/mmfs/bin/mmdiag --waiters waitTimeInSeconds
| For all nodes that have threads waiting longer than waitTimeInSeconds seconds, issue:
| mmfsadm dump all

| Notes:
| a. Each node can potentially dump more than 200 MB of data.
| b. Run the mmfsadm dump all command only on nodes that you are sure the threads are really
| hung. An mmfsadm dump all command can follow pointers that are changing and cause the node
| to crash.
| 3. If the deadlock situation cannot be corrected, follow the instructions in “Additional information to
| collect for delays and deadlocks” on page 116, then contact the IBM Support Center.

Node cannot be added to the GPFS cluster


There is an indication that leads you to the conclusion that a node cannot be added to a cluster, and
there are steps to follow to correct the problem.

That indication is:


v You issue the mmcrcluster or mmaddnode command and receive the message:



6027-1598
Node nodeName was not added to the cluster. The node appears to already belong to a GPFS
cluster.

Steps to follow if a node cannot be added to a cluster:


1. Run the mmlscluster command to verify that the node is not in the cluster.
2. If the node is not in the cluster, issue this command on the node that could not be added:
mmdelnode -f
3. Reissue the mmaddnode command.

Remote node expelled after remote file system successfully mounted


This problem produces 'node expelled from cluster' messages.

One cause of this condition is when the subnets attribute of the mmchconfig command has been used to
specify subnets to GPFS, and there is an incorrect netmask specification on one or more nodes of the
clusters involved in the remote mount. Check to be sure that all netmasks are correct for the network
interfaces used for GPFS communication.
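
As a sketch of this check, with an illustrative interface name, display the subnets setting and the
interface configuration on each node:
mmlsconfig subnets
ifconfig eth0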

Disaster recovery problems


As with any type of problem or failure, obtain the GPFS log files (mmfs.log.*) from all nodes in the
cluster and, if available, the content of the internal dumps.

For more information see:


v The Establishing disaster recovery for your GPFS cluster topic in the GPFS: Advanced Administration Guide
for detailed information about GPFS disaster recovery
v “Creating a master GPFS log file” on page 2
v “Information to collect before contacting the IBM Support Center” on page 115

These two messages:


| 6027-435 [N]
The file system descriptor quorum has been overridden.
| 6027-490 [N]
The descriptor replica on disk diskName has been excluded.

might appear in the GPFS log for active/active disaster recovery scenarios with GPFS replication. The
purpose of these messages is to record the fact that a quorum override decision has been made after the
loss of a majority of disks. A message similar to the these will appear in the log on the file system
manager node every time it reads the file system descriptor with an overridden quorum:
...
| 6027-435 [N] The file system descriptor quorum has been overridden.
| 6027-490 [N] The descriptor replica on disk gpfs23nsd has been excluded.
| 6027-490 [N] The descriptor replica on disk gpfs24nsd has been excluded.
...

For more information on quorum override, see the GPFS: Concepts, Planning, and Installation Guide and
search on quorum.

For PPRC and FlashCopy-based configurations, additional problem determination information may be
collected from the ESS log file. This information and the appropriate ESS documentation should be
consulted when dealing with various types of disk subsystem-related failures. For instance, if users are



unable to perform a PPRC failover (or failback) task successfully or unable to generate a FlashCopy® of a
disk volume, they should consult the subsystem log and the appropriate ESS documentation. For
additional information, refer to:
v IBM Enterprise Storage Server® (http://www.redbooks.ibm.com/redbooks/pdfs/sg245465.pdf)
v IBM TotalStorage Enterprise Storage Server Web Interface User's Guide
(http://publibfp.boulder.ibm.com/epubs/pdf/f2bui05.pdf).

Disaster recovery setup problems


These setup problems may impact your ability to use disaster recovery successfully:
1. Considerations of data integrity require proper setup of PPRC consistency groups in PPRC
environments. Additionally, when using the FlashCopy facility, make sure to suspend all I/O activity
before generating the FlashCopy image. See “Data integrity” on page 88.
2. In certain cases, it may not be possible to restore access to the file system even after relaxing the node
and disk quorums. For example, in a three failure group configuration, GPFS will tolerate and recover
from a complete loss of a single failure group (and the tiebreaker with a quorum override). However,
all disks in the remaining failure group must remain active and usable in order for the file system to
continue its operation. A subsequent loss of at least one of the disks in the remaining failure group
would render the file system unusable and trigger a forced unmount. In such situations, users may
still be able to perform a restricted mount (as described in "Restricted mode mount" on page 23) and
attempt to recover parts of their data from the damaged file system.
3. When running mmfsctl syncFSconfig, you may get an error similar to this one:
mmfsctl: None of the nodes in the peer cluster can be reached
If this happens, check the network connectivity between the peer GPFS clusters and verify their
remote shell setup. This command requires full TCP/IP connectivity between the two sites, and all
nodes must be able to communicate using ssh or rsh without the use of a password.
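A minimal sketch of that connectivity check follows. It assumes the peer contact nodes are listed one per
line in a hypothetical file named peer.nodes and that ssh is the configured remote shell.

while read -r node; do
    # A working setup returns the remote date without prompting for a password
    ssh -o BatchMode=yes "$node" date || echo "Cannot reach $node without a password"
done < peer.nodes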

Other problems with disaster recovery


1. Currently, users are advised to always specify the all option when invoking the mmfsctl
syncFSconfig command, rather than the device name of one specific file system. This enables GPFS to
detect and correctly resolve the configuration discrepancies that may have occurred as result of a
manual administrative action in the target GPFS cluster (the one into which the configuration is being
imported).
2. The optional SpecFile parameter to the mmfsctl syncFSconfig (specified with the -S flag) must be a
fully-qualified path name defining the location of the spec data file on nodes in the target cluster. It is
not the local path name to the file on the node, from which the mmfsctl command is being issued. A
copy of this file must be available at the provided path name on all peer contact nodes (the ones
defined in the RemoteNodesFile).

GPFS commands are unsuccessful


GPFS commands can be unsuccessful for various reasons.

Unsuccessful command results will be indicated by:


v Return codes indicating the GPFS daemon is no longer running.
v Command specific problems indicating you are unable to access the disks.
v A nonzero return code from the GPFS command.

Some reasons that GPFS commands can be unsuccessful include:


1. If all commands are generically unsuccessful, this may be due to a daemon failure. Verify that the
GPFS daemon is active. Issue:
mmgetstate



If the daemon is not active, check /var/adm/ras/mmfs.log.latest and /var/adm/ras/mmfs.log.previous
on the local node and on the file system manager node. These files enumerate the failing sequence of
the GPFS daemon.
If there is a communication failure with the file system manager node, you will receive an error and
the errno global variable may be set to EIO (I/O error).
2. Verify the GPFS cluster configuration data files are not locked and are accessible. To determine if the
GPFS cluster configuration data files are locked, see “GPFS cluster configuration data files are locked”
on page 44.
3. The rsh command is not functioning correctly. See “Authorization problems” on page 43.
If rsh is not functioning properly on a node in the GPFS cluster, a GPFS administration command that
needs to run on that node will fail with a 'permission is denied' error. The system displays
information similar to:
mmlscluster
rshd: 0826-813 Permission is denied.
mmdsh: 6027-1615 k145n02 remote shell process had return code 1.
mmlscluster: 6027-1591 Attention: Unable to retrieve GPFS cluster files from node k145n02
rshd: 0826-813 Permission is denied.
mmdsh: 6027-1615 k145n01 remote shell process had return code 1.
mmlscluster: 6027-1592 Unable to retrieve GPFS cluster files from node k145n01

These messages indicate that rsh is not working properly on nodes k145n01 and k145n02.
If you encounter this type of failure, determine why rsh is not working on the identified node. Then
fix the problem.
4. Most problems encountered during file system creation fall into three classes:
v You did not create network shared disks which are required to build the file system.
v The creation operation cannot access the disk.
Follow the procedures for checking access to the disk. This can result from a number of factors
including those described in “NSD and underlying disk subsystem failures” on page 91.
v Unsuccessful attempt to communicate with the file system manager.
The file system creation runs on the file system manager node. If that node goes down, the mmcrfs
command may not succeed.
5. If the mmdelnode command was unsuccessful and you plan to permanently de-install GPFS from a
node, you should first remove the node from the cluster. If this is not done and you run the
mmdelnode command after the mmfs code is removed, the command will fail and display a message
similar to this example:
Verifying GPFS is stopped on all affected nodes ...
k145n05: ksh: /usr/lpp/mmfs/bin/mmremote: not found.

If this happens, power off the node and run the mmdelnode command again.
6. If you have successfully installed and are operating with the latest level of GPFS, but cannot run the
new functions available, it is probable that you have not issued the mmchfs -V full or mmchfs -V
compat command to change the version of the file system. This command must be issued for each of
your file systems.
In addition to mmchfs -V, you may need to run the mmmigratefs command. See the File system
format changes between versions of GPFS topic in the GPFS: Administration and Programming Reference.

Note: Before issuing the -V option (with full or compat), see the Migration, coexistence and compatibility
topic in the GPFS: Concepts, Planning, and Installation Guide. You must ensure that all nodes in the
cluster have been migrated to the latest level of GPFS code and that you have successfully run the
mmchconfig release=LATEST command.



Make sure you have operated with the new level of code for some time and are certain you want to
migrate to the latest level of GPFS. Issue the mmchfs -V full command only after you have definitely
decided to accept the latest level, as this will cause disk changes that are incompatible with previous
levels of GPFS.
For more information about the mmchfs command, see the GPFS: Administration and Programming
Reference.

GPFS error messages for unsuccessful GPFS commands


If message 6027-538 is returned from the mmcrfs command, verify that the disk descriptors are specified
correctly and that all named disks exist and are online. Issue the mmlsnsd command to check the disks.
6027-538
Error accessing disks.

If the daemon failed while running the command, you will see message 6027-663. Follow the procedures
in “GPFS daemon went down” on page 50.
6027-663
Lost connection to file system daemon.

If the daemon was not running when you issued the command, you will see message 6027-665. Follow
the procedures in “GPFS daemon will not come up” on page 47.
6027-665
Failed to connect to file system daemon: errorString.

When GPFS commands are unsuccessful, the system may display information similar to these error
messages:
6027-1627
The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS
on the above nodes until the problem is resolved.

Application program errors


When receiving application program errors, there are various courses of action to take.

Follow these steps to help resolve application program errors:


1. Loss of file system access usually appears first as an error received by an application. Such errors are
normally encountered when the application tries to access an unmounted file system.
The most common reason for losing access to a single file system is a failure somewhere in the path
to a large enough number of disks to jeopardize your data if operation continues. These errors may be
reported in the operating system error log on any node because they are logged on the first node to
detect the error. Check all error logs for errors.
The mmlsmount -L command can be used to determine the nodes that have successfully mounted a
file system.
2. There are several cases where the state of a given disk subsystem will prevent access by GPFS. This
will be seen by the application as I/O errors of various types and will be reported in the error logs as
MMFS_SYSTEM_UNMOUNT or MMFS_DISKFAIL records. This state can be found by issuing the
mmlsdisk command.
3. If allocation of data blocks or files (which quota limits should allow) fails, issue the mmlsquota
command for the user, group or fileset.
If filesets are involved, use these steps to determine which fileset was being accessed at the time of
the failure:
a. From the error messages generated, obtain the path name of the file being accessed.



| b. Go to the directory just obtained, and use the mmlsattr -L command to obtain the fileset name:
| mmlsattr -L . | grep "fileset name:"

| The system produces output similar to:


| fileset name: myFileset
| c. Use the mmlsquota -j command to check the quota limit of the fileset. For example, using the
| fileset name found in the previous step, issue this command:
| mmlsquota -j myFileset -e

| The system produces output similar to:


| Block Limits | File Limits
| Filesystem type KB quota limit in_doubt grace | files quota limit in_doubt grace Remarks
| fs1 FILESET 2152 0 0 0 none | 250 0 250 0 none
The mmlsquota output is similar when checking the user and group quota. If usage is equal to or
approaching the hard limit, or if the grace period has expired, make sure that no quotas are lost by
checking in doubt values.
If quotas are exceeded in the in doubt category, run the mmcheckquota command. For more
information, see “The mmcheckquota command” on page 31.

Note: There is no way to force GPFS nodes to relinquish all their local shares in order to check for
lost quotas. This can only be determined by running the mmcheckquota command immediately after
mounting the file system, and before any allocations are made. In this case, the value in doubt is the
amount lost.
To display the latest quota usage information, use the -e option on either the mmlsquota or the
mmrepquota commands. Remember that the mmquotaon and mmquotaoff commands do not enable
and disable quota management. These commands merely control enforcement of quota limits. Usage
continues to be counted and recorded in the quota files regardless of enforcement.
Reduce quota usage by deleting or compressing files or moving them out of the file system. Consider
increasing the quota limit.
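The fileset lookup and quota check described in this step can be combined into a short sketch. The
directory /fs1/somedir is an assumption; substitute the path reported in the error messages.

# Find the fileset that owns the directory being written to
fileset=$(mmlsattr -L /fs1/somedir | grep "fileset name:" | awk '{print $3}')
# Display current usage and limits for that fileset, forcing the most recent usage data
mmlsquota -j "$fileset" -e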

GPFS error messages for application program errors


Application program errors can be associated with these GPFS message numbers:
6027-506
program: loadFile is already loaded at address.
| 6027-695 [E]
File system is read-only.

Troubleshooting Windows problems


The topics that follow apply to Windows Server 2008.

Home and .ssh directory ownership and permissions


Make sure users own their home directories, which is not normally the case on Windows. They should
also own ~/.ssh and the files it contains. Here is an example of file attributes that work:
bash-3.00$ ls -l -d ~
drwx------ 1 demyn Domain Users 0 Dec 5 11:53 /dev/fs/D/Users/demyn
bash-3.00$ ls -l -d ~/.ssh
drwx------ 1 demyn Domain Users 0 Oct 26 13:37 /dev/fs/D/Users/demyn/.ssh
bash-3.00$ ls -l ~/.ssh
total 11
drwx------ 1 demyn Domain Users 0 Oct 26 13:37 .
drwx------ 1 demyn Domain Users 0 Dec 5 11:53 ..
-rw-r--r-- 1 demyn Domain Users 603 Oct 26 13:37 authorized_keys2



-rw------- 1 demyn Domain Users 672 Oct 26 13:33 id_dsa
-rw-r--r-- 1 demyn Domain Users 603 Oct 26 13:33 id_dsa.pub
-rw-r--r-- 1 demyn Domain Users 2230 Nov 11 07:57 known_hosts
bash-3.00$
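A minimal sketch of commands that establish attributes like those shown above, assuming the user name
demyn from the example:

# Make the user the owner of the home directory and everything under it
chown -R demyn ~demyn
# Restrict the home directory and the .ssh directory to the owner
chmod 700 ~demyn ~demyn/.ssh
# Private keys must not be readable by other users
chmod 600 ~demyn/.ssh/id_dsa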

Problems running as Administrator


You might have problems using SSH when running as the domain Administrator user. These issues do
not apply to other accounts, even if they are members of the Administrators group.

GPFS Windows and SMB2 protocol (CIFS serving)


SMB2 is the new version of the Server Message Block (SMB) protocol that was introduced with Windows
Vista and Windows Server 2008.

Various enhancements include the following (among others):


v reduced “chattiness” of the protocol
v larger buffer sizes
v faster file transfers
v caching of metadata such as directory content and file properties
v better scalability by increasing the support for the number of users, shares, and open files per server

The SMB2 protocol is negotiated between a client and the server during the establishment of the SMB
connection, and it becomes active only if both the client and the server are SMB2 capable. If either side is
not SMB2 capable, the default SMB (version 1) protocol gets used.

The SMB2 protocol does active metadata caching on the client redirector side, and it relies on Directory
Change Notification on the server to invalidate and refresh the client cache. However, GPFS on Windows
currently does not support Directory Change Notification. As a result, if SMB2 is used for serving out a
GPFS filesystem, the SMB2 redirector cache on the client will not see any cache-invalidate operations if
the actual metadata is changed, either directly on the server or via another CIFS client. In such a case, the
SMB2 client will continue to see its cached version of the directory contents until the redirector cache
expires. Therefore, the use of SMB2 protocol for CIFS sharing of GPFS file systems can result in the CIFS
clients seeing an inconsistent view of the actual GPFS namespace.

A workaround is to disable the SMB2 protocol on the CIFS server (that is, the GPFS compute node). This
will ensure that SMB2 never gets negotiated for file transfer even if any CIFS client is SMB2 capable.

To disable SMB2 on the GPFS compute node, follow the instructions under the “MORE INFORMATION”
section at the following URL:
http://support.microsoft.com/kb/974103
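As a sketch of what that procedure involves (the registry value below is taken from the Microsoft article,
not from GPFS; verify it against the article before applying it, and note that editing the registry affects
the whole server), SMB2 can be turned off on the serving node and the Server service restarted:

rem Disable SMB2 on the server side (value documented in Microsoft KB 974103; verify before use)
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters /v Smb2 /t REG_DWORD /d 0 /f
rem Restart the Server service so the change takes effect
net stop server
net start server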

OpenSSH connection delays


OpenSSH can be sensitive to network configuration issues that often do not affect other system
components. One common symptom is a substantial delay (20 seconds or more) to establish a connection.
When the environment is configured correctly, a command such as ssh gandalf date should only take one
or two seconds to complete.

If you are using OpenSSH and experiencing an SSH connection delay (and if IPv6 is not supported in
your environment), try disabling IPv6 on your Windows nodes and removing or commenting out any
IPv6 addresses from the /etc/resolv.conf file.
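A quick sketch for measuring the delay and spotting IPv6 name server entries follows; the host name
gandalf comes from the example above.

# A healthy connection completes in a second or two
time ssh gandalf date
# IPv6 name server addresses (those containing colons) are a common trigger for the delay
grep -i nameserver /etc/resolv.conf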



Chapter 7. GPFS file system problems
Suspect a GPFS file system problem when a file system will not mount or unmount.

You can also suspect a file system problem if a file system unmounts unexpectedly, or you receive an
error message indicating that file system activity can no longer continue due to an error, and the file
system is being unmounted to preserve its integrity. Record all error messages and log entries that you
receive relative to the problem, making sure that you look on all affected nodes for this data.

These are some of the errors encountered with GPFS file systems:
v “File system will not mount”
v “File system will not unmount” on page 70
v “File system forced unmount” on page 71
v “Unable to determine whether a file system is mounted” on page 73
v “Multiple file system manager failures” on page 74
v “Discrepancy between GPFS configuration data and the on-disk data for a file system” on page 75
v “Errors associated with storage pools, filesets and policies” on page 75
v “Failures using the mmbackup command” on page 81
v “Snapshot problems” on page 82
v “Failures using the mmpmon command” on page 85
v “NFS problems” on page 87
v “Problems working with Samba” on page 87
v “Data integrity” on page 88
v “Messages requeuing in AFM” on page 88

File system will not mount


There are indications leading you to the conclusion that your file system will not mount and courses of
action you can take to correct the problem.

Some of those indications include:


v On performing a manual mount of the file system, you get errors from either the operating system or
GPFS.
v If the file system was created with the option of an automatic mount, you will have failure return
codes in the GPFS log.
v Your application cannot access the data it needs. Check the GPFS log for messages.
v Return codes or error messages from the mmmount command.
v The mmlsmount command indicates that the file system is not mounted on certain nodes.

If your file system will not mount, follow these steps:


1. On a quorum node in the cluster that owns the file system, verify that quorum has been achieved.
Check the GPFS log to see if an mmfsd ready message has been logged, and that no errors were
reported on this or other nodes.
2. Verify that a conflicting command is not running. This applies only to the cluster that owns the file
system. However, other clusters would be prevented from mounting the file system if a conflicting
command is running in the cluster that owns the file system.



For example, a mount command may not be issued while the mmfsck command is running. The
mount command may not be issued until the conflicting command completes. Note that interrupting
the mmfsck command is not a solution because the file system will not be mountable until the
command completes. Try again after the conflicting command has completed.
3. Verify that sufficient disks are available to access the file system by issuing the mmlsdisk command.
GPFS requires a minimum number of disks to find a current copy of the core metadata. If sufficient
disks cannot be accessed, the mount will fail. The corrective action is to fix the path to the disk. See
“NSD and underlying disk subsystem failures” on page 91.
Missing disks can also cause GPFS to be unable to find critical metadata structures. The output of
the mmlsdisk command will show any unavailable disks. If you have not specified metadata
replication, the failure of one disk may result in your file system being unable to mount. If you have
specified metadata replication, it will require two disks in different failure groups to disable the
entire file system. If there are down disks, issue the mmchdisk start command to restart them and
retry the mount.
For a remote file system, mmlsdisk provides information about the disks of the file system.
However mmchdisk must be run from the cluster that owns the file system.
If there are no disks down, you can also look locally for error log reports, and follow the problem
determination and repair actions specified in your storage system vendor problem determination
guide. If the disk has failed, follow the procedures in “NSD and underlying disk subsystem failures”
on page 91.
4. Verify that communication paths to the other nodes are available. The lack of communication paths
between all nodes in the cluster may impede contact with the file system manager.
5. Verify that the file system is not already mounted. Issue the mount command.
6. Verify that the GPFS daemon on the file system manager is available. Run the mmlsmgr command
to determine which node is currently assigned as the file system manager. Run a trivial data access
command such as an ls on the mount point directory. If the command fails, see “GPFS daemon went
down” on page 50.
7. Check to see if the mount point directory exists and that there is an entry for the file system in the
/etc/fstab file (for Linux) or /etc/filesystems file (for AIX). The device name for a file system mount
point will be listed in column one of the /etc/fstab entry or as a dev= attribute in the /etc/filesystems
stanza entry. A corresponding device name must also appear in the /dev file system.
If any of these elements are missing, an update to the configuration information may not have been
propagated to this node. Issue the mmrefresh command to rebuild the configuration information on
the node and reissue the mmmount command.
Do not add GPFS file system information to /etc/filesystems (for AIX) or /etc/fstab (for Linux)
directly. If after running mmrefresh -f the file system information is still missing from
/etc/filesystems (for AIX) or /etc/fstab (for Linux), follow the procedures in “Information to collect
before contacting the IBM Support Center” on page 115, and then contact the IBM Support Center.
8. Check the number of file systems that are already mounted. There is a maximum number of 256
mounted file systems for a GPFS cluster. Remote file systems are included in this number.
9. If you issue mmchfs -V compat, it enables backward-compatible format changes only. Nodes in
remote clusters that were able to mount the file system before will still be able to do so.
If you issue mmchfs -V full, it enables all new functions that require different on-disk data
structures. Nodes in remote clusters running an older GPFS version will no longer be able to mount
the file system. If there are any nodes running an older GPFS version that have the file system
mounted at the time this command is issued, the mmchfs command will fail. For more information
about completing the migration to a new level of GPFS, see the GPFS: Concepts, Planning, and
Installation Guide.
All nodes that access the file system must be upgraded to the same level of GPFS. Check for the
possibility that one or more of the nodes was accidentally left out of an effort to upgrade a multi-node
system to a new GPFS release. If you need to return to the earlier level of GPFS, you must re-create
the file system from the backup medium and restore the content in order to access it.



10. If DMAPI is enabled for the file system, ensure that a data management application is started and
has set a disposition for the mount event. Refer to the GPFS: Data Management API Guide and the
user's guide from your data management vendor.
The data management application must be started in the cluster that owns the file system. If the
application is not started, other clusters will not be able to mount the file system. Remote mounts of
DMAPI managed file systems may take much longer to complete than those not managed by
DMAPI.
11. Issue the mmlsfs -A command to check whether the automatic mount option has been specified. If
automatic mount option is expected, check the GPFS log in the cluster that owns and serves the file
system, for progress reports indicating:
starting ...
mounting ...
mounted ....
12. If quotas are enabled, check if there was an error while reading quota files. See “MMFS_QUOTA” on
page 4.
13. Verify the maxblocksize configuration parameter on all clusters involved. If maxblocksize is less
than the block size of the local or remote file system you are attempting to mount, you will not be
able to mount it.
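A minimal sketch for that check follows; fs1 is an assumed device name. Run the first command in each
cluster involved in the mount and compare the result with the file system block size.

# Cluster-wide maximum block size accepted by this cluster
mmlsconfig | grep -i maxblocksize
# Block size of the file system you are trying to mount (run in the cluster where the device is known)
mmlsfs fs1 -B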

GPFS error messages for file system mount problems


6027-419
Failed to read a file system descriptor.
| 6027-482 [E]
| Remount failed for device name: errnoDescription
6027-549
Failed to open name.
6027-580
Unable to access vital system metadata. Too many disks are unavailable.
6027-645
Attention: mmcommon getEFOptions fileSystem failed. Checking fileName.

Error numbers specific to GPFS application calls when a file system mount is not successful
When a mount of a file system is not successful, GPFS may report these error numbers in the operating
system error log or return them to an application:
ENO_QUOTA_INST = 237, No Quota management enabled.
To enable quotas for the file system, issue the mmchfs -Q yes command. To disable quotas for
the file system issue the mmchfs -Q no command.

Automount file system will not mount


If an automount fails when you cd into the mount point directory, first check that the file system in
question is of automount type. Use the mmlsfs -A command for local file systems. Use the mmremotefs
show command for remote file systems.

Steps to follow if automount fails to mount on Linux


On Linux, perform these steps:
1. Verify that the GPFS file system mount point is actually a symbolic link to a directory in the
automountdir directory. If automountdir=/gpfs/automountdir, then the mount point /gpfs/gpfs66
would be a symbolic link to /gpfs/automountdir/gpfs66.
a. First, verify that GPFS is up and running.



b. Use the mmlsconfig command to verify the automountdir directory. The default automountdir is
named /gpfs/automountdir. If the GPFS file system mount point is not a symbolic link to the
GPFS automountdir directory, then accessing the mount point will not cause the automounter to
mount the file system.
c. If the command /bin/ls -ld of the mount point shows a directory, then run the command
mmrefresh -f. If the directory is empty, the command mmrefresh -f will remove the directory and
create a symbolic link.
If the directory is not empty, you need to move or remove the files contained in that directory, or
change the mount point of the file system. For a local file system, use the mmchfs command. For a
remote file system, use the mmremotefs command.
d. Once the mount point directory is empty, run the mmrefresh -f command.
2. Verify that the autofs mount has been established. Issue this command:
mount | grep automount

Output should be similar to this:


automount(pid20331) on /gpfs/automountdir type autofs (rw,fd=5,pgrp=20331,minproto=2,maxproto=3)
For RHEL5, verify the following line is in the default master map file (/etc/auto.master):
/gpfs/automountdir program:/usr/lpp/mmfs/bin/mmdynamicmap

For example, issue:


grep mmdynamicmap /etc/auto.master

Output should be similar to this:


/gpfs/automountdir program:/usr/lpp/mmfs/bin/mmdynamicmap
This is an autofs program map, and there will be a single mount entry for all GPFS automounted file
systems. The symbolic link points to this directory, and access through the symbolic link triggers the
mounting of the target GPFS file system. To create this GPFS autofs mount, issue the mmcommon
startAutomounter command, or stop and restart GPFS using the mmshutdown and mmstartup
commands.
3. Verify that the automount daemon is running. Issue this command:
ps -ef | grep automount

Output should be similar to this:


root 5116 1 0 Jun25 pts/0 00:00:00 /usr/sbin/automount /gpfs/automountdir program
/usr/lpp/mmfs/bin/mmdynamicmap
For RHEL5, verify that the autofs daemon is running. Issue this command:
ps -ef | grep automount

Output should be similar to this:


root 22646 1 0 01:21 ? 00:00:02 automount

To start the automount daemon, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.

Note: If automountdir is mounted (as in step 2) and the mmcommon startAutomounter command is
not able to bring up the automount daemon, manually umount the automountdir before issuing the
mmcommon startAutomounter again.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see
something like this:
Mon Jun 25 11:33:03 2004: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Examine /var/log/messages for autofs error messages.
This is an example of what you might see if the remote file system name does not exist.



Jun 25 11:33:03 linux automount[20331]: attempting to mount entry /gpfs/automountdir/gpfs55
Jun 25 11:33:04 linux automount[28911]: >> Failed to open gpfs55.
Jun 25 11:33:04 linux automount[28911]: >> No such device
Jun 25 11:33:04 linux automount[28911]: >> mount: fs type gpfs not supported by kernel
Jun 25 11:33:04 linux automount[28911]: mount(generic): failed to mount /dev/gpfs55 (type gpfs)
on /gpfs/automountdir/gpfs55
6. After you have established that GPFS has received a mount request from autofs (Step 4 on page 64)
and that mount request failed (Step 5 on page 64), issue a mount command for the GPFS file system
and follow the directions in “File system will not mount” on page 61.

Steps to follow if automount fails to mount on AIX


On AIX, perform these steps:
1. First, verify that GPFS is up and running.
2. Verify that GPFS has established autofs mounts for each automount file system. Issue the following
command:
mount | grep autofs

The output is similar to this:


/var/mmfs/gen/mmDirectMap /gpfs/gpfs55 autofs Jun 25 15:03 ignore
/var/mmfs/gen/mmDirectMap /gpfs/gpfs88 autofs Jun 25 15:03 ignore
These are direct mount autofs mount entries. Each GPFS automount file system will have an autofs
mount entry. These autofs direct mounts allow GPFS to mount on the GPFS mount point. To create
any missing GPFS autofs mounts, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.
3. Verify that the autofs daemon is running. Issue this command:
ps -ef | grep automount

Output is similar to this:


root 9820 4240 0 15:02:50 - 0:00 /usr/sbin/automountd

To start the automount daemon, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see
something like this:
Mon Jun 25 11:33:03 2007: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Since the autofs daemon logs status using syslogd, examine the syslogd log file for status information
from automountd. Here is an example of a failed automount request:
Jun 25 15:55:25 gpfsa1 automountd [9820 ] :mount of /gpfs/gpfs55:status 13
6. After you have established that GPFS has received a mount request from autofs (Step 4) and that
mount request failed (Step 5), issue a mount command for the GPFS file system and follow the
directions in “File system will not mount” on page 61.
7. If automount fails for a non-GPFS file system and you are using file /etc/auto.master, use file
/etc/auto_master instead. Add the entries from /etc/auto.master to /etc/auto_master and restart the
automount daemon.

Remote file system will not mount


When a remote file system does not mount, the problem might be with how the file system was defined
to both the local and remote nodes, or the communication paths between them. Review the Mounting a
file system owned and served by another GPFS cluster topic in the GPFS: Advanced Administration Guide to
ensure that your setup is correct.

These are some of the errors encountered when mounting remote file systems:



v “Remote file system I/O fails with the “Function not implemented” error message when UID mapping
is enabled”
v “Remote file system will not mount due to differing GPFS cluster security configurations”
v “Cannot resolve contact node address” on page 67
v “The remote cluster name does not match the cluster name supplied by the mmremotecluster
command” on page 67
v “Contact nodes down or GPFS down on contact nodes” on page 68
v “GPFS is not running on the local node” on page 68
v “The NSD disk does not have an NSD server specified and the mounting cluster does not have direct
access to the disks” on page 68
v “The cipherList option has not been set properly” on page 68
v “Remote mounts fail with the “permission denied” error message” on page 69

Remote file system I/O fails with the “Function not implemented” error message
when UID mapping is enabled
When user ID (UID) mapping in a multi-cluster environment is enabled, certain kinds of mapping
infrastructure configuration problems might result in I/O requests on a remote file system failing:
ls -l /fs1/testfile
ls: /fs1/testfile: Function not implemented

To troubleshoot this error, verify the following configuration details:


1. That /var/mmfs/etc/mmuid2name and /var/mmfs/etc/mmname2uid helper scripts are present and
executable on all nodes in the local cluster and on all quorum nodes in the file system home cluster,
along with any data files needed by the helper scripts.
2. That UID mapping is enabled in both local cluster and remote file system home cluster configuration
by issuing the mmlsconfig enableUIDremap command.
3. That UID mapping helper scripts are working correctly.

| For more information about configuring UID mapping, see the IBM white paper entitled UID Mapping for
| GPFS in a Multi-cluster Environment in the IBM Cluster information center
| (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_uid/uid_gpfs.htm)
| or the IBM Knowledge Center (http://www.ibm.com/support/knowledgecenter/SSFKCN/uid_gpfs.html).

Remote file system will not mount due to differing GPFS cluster security
configurations
A mount command fails with a message similar to this:
Cannot mount gpfsxx2.ibm.com:gpfs66: Host is down.

The GPFS log on the cluster issuing the mount command should have entries similar to these:
There is more information in the log file /var/adm/ras/mmfs.log.latest
Mon Jun 25 16:39:27 2007: Waiting to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command: mount gpfsxx2.ibm.com:gpfs66 30291
Mon Jun 25 16:39:27 2007: The administrator of 199.13.68.12 gpfslx2 requires
secure connections. Contact the administrator to obtain the target clusters
key and register the key using "mmremotecluster update".
Mon Jun 25 16:39:27 2007: A node join was rejected. This could be due to
incompatible daemon versions, failure to find the node
in the configuration database, or no configuration manager found.
Mon Jun 25 16:39:27 2007: Failed to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command err 693: mount gpfsxx2.ibm.com:gpfs66 30291

The GPFS log file on the cluster that owns and serves the file system will have an entry indicating the
problem as well, similar to this:



Mon Jun 25 16:32:21 2007: Kill accepted connection from 199.13.68.12 because security is required, err 74

To resolve this problem, contact the administrator of the cluster that owns and serves the file system to
obtain the key and register the key using mmremotecluster command.

The SHA digest field of the mmauth show and mmremotecluster commands may be used to determine if
there is a key mismatch, and on which cluster the key should be updated. For more information on the
SHA digest, see “The SHA digest” on page 35.

Cannot resolve contact node address


The following error may occur if the contact nodes for gpfsyy2.ibm.com could not be resolved. You
would expect to see this if your DNS server was down, or the contact address has been deleted.
Mon Jun 25 15:24:14 2007: Command: mount gpfsyy2.ibm.com:gpfs14 20124
Mon Jun 25 15:24:14 2007: Host 'gpfs123.ibm.com' in gpfsyy2.ibm.com is not valid.
Mon Jun 25 15:24:14 2007: Command err 2: mount gpfsyy2.ibm.com:gpfs14 20124

To resolve the problem, correct the contact list and try the mount again.

The remote cluster name does not match the cluster name supplied by the
mmremotecluster command
A mount command fails with a message similar to this:
Cannot mount gpfslx2:gpfs66: Network is unreachable

and the GPFS log contains message similar to this:


Mon Jun 25 12:47:18 2007: Waiting to join remote cluster gpfslx2
Mon Jun 25 12:47:18 2007: Command: mount gpfslx2:gpfs66 27226
Mon Jun 25 12:47:18 2007: Failed to join remote cluster gpfslx2
Mon Jun 25 12:47:18 2007: Command err 719: mount gpfslx2:gpfs66 27226

Perform these steps:


1. Verify that the remote cluster name reported by the mmremotefs show command is the same name as
reported by the mmlscluster command from one of the contact nodes.
2. Verify the list of contact nodes against the list of nodes as shown by the mmlscluster command from
the remote cluster.

In this example, the correct cluster name is gpfslx2.ibm.com and not gpfslx2
mmlscluster

Output is similar to this:


GPFS cluster information
========================
GPFS cluster name: gpfslx2.ibm.com
GPFS cluster id: 649437685184692490
GPFS UID domain: gpfslx2.ibm.com
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
| Repository type: server-based

GPFS cluster configuration servers:


-----------------------------------
Primary server: gpfslx2.ibm.com
Secondary server: (none)

Node Daemon node name IP address Admin node name Designation


---------------------------------------------------------------------------
1 gpfslx2 198.117.68.68 gpfslx2.ibm.com quorum



Contact nodes down or GPFS down on contact nodes
A mount command fails with a message similar to this:
GPFS: 6027-510 Cannot mount /dev/gpfs22 on /gpfs22: A remote host did not respond
within the timeout period.

The GPFS log will have entries similar to this:


Mon Jun 25 13:11:14 2007: Command: mount gpfslx22:gpfs22 19004
Mon Jun 25 13:11:14 2007: Waiting to join remote cluster gpfslx22
Mon Jun 25 13:11:15 2007: Connecting to 199.13.68.4 gpfslx22
Mon Jun 25 13:16:36 2007: Failed to join remote cluster gpfslx22
Mon Jun 25 13:16:36 2007: Command err 78: mount gpfslx22:gpfs22 19004

To resolve the problem, use the mmremotecluster show command and verify that the cluster name
matches the remote cluster and the contact nodes are valid nodes in the remote cluster. Verify that GPFS
is active on the contact nodes in the remote cluster. Another way to resolve this problem is to change the
contact nodes using the mmremotecluster update command.

GPFS is not running on the local node


A mount command fails with a message similar to this:
mount: fs type gpfs not supported by kernel

Follow your procedures for starting GPFS on the local node.

The NSD disk does not have an NSD server specified and the mounting cluster
does not have direct access to the disks
A file system mount fails with a message similar to this:
Failed to open gpfs66.
No such device
mount: Stale NFS file handle
Some file system data are inaccessible at this time.
Check error log for additional information.
Cannot mount gpfslx2.ibm.com:gpfs66: Stale NFS file handle

The GPFS log will contain information similar to this:


Mon Jun 25 14:10:46 2007: Command: mount gpfslx2.ibm.com:gpfs66 28147
Mon Jun 25 14:10:47 2007: Waiting to join remote cluster gpfslx2.ibm.com
Mon Jun 25 14:10:47 2007: Connecting to 199.13.68.4 gpfslx2
Mon Jun 25 14:10:47 2007: Connected to 199.13.68.4 gpfslx2
Mon Jun 25 14:10:47 2007: Joined remote cluster gpfslx2.ibm.com
Mon Jun 25 14:10:48 2007: Global NSD disk, gpfs1nsd, not found.
Mon Jun 25 14:10:48 2007: Disk failure. Volume gpfs66. rc = 19. Physical volume gpfs1nsd.
Mon Jun 25 14:10:48 2007: File System gpfs66 unmounted by the system with return code 19 reason code 0
Mon Jun 25 14:10:48 2007: No such device
Mon Jun 25 14:10:48 2007: Command err 666: mount gpfslx2.ibm.com:gpfs66 28147

To resolve the problem, the cluster that owns and serves the file system must define one or more NSD
servers.

The cipherList option has not been set properly


Another reason for remote mount to fail is if cipherList is not set to a valid value. A mount command
would fail with messages similar to this:
6027-510 Cannot mount /dev/dqfs1 on /dqfs1: A remote host is not available.

The GPFS log would contain messages similar to this:


Wed Jul 18 16:11:20.496 2007: Command: mount remote.cluster:fs3 655494
Wed Jul 18 16:11:20.497 2007: Waiting to join remote cluster remote.cluster
Wed Jul 18 16:11:20.997 2007: Remote mounts are not enabled within this cluster. \
See the Advanced Administration Guide for instructions. In particular ensure keys have been \
generated and a cipherlist has been set.



Wed Jul 18 16:11:20.998 2007: A node join was rejected. This could be due to
incompatible daemon versions, failure to find the node
in the configuration database, or no configuration manager found.
Wed Jul 18 16:11:20.999 2007: Failed to join remote cluster remote.cluster
Wed Jul 18 16:11:20.998 2007: Command: err 693: mount remote.cluster:fs3 655494
Wed Jul 18 16:11:20.999 2007: Message failed because the destination node refused the connection.

The mmchconfig cipherlist=AUTHONLY command must be run on both the cluster that owns and
controls the file system, and the cluster that is attempting to mount the file system.

Remote mounts fail with the “permission denied” error message


There are many reasons why remote mounts can fail with a “permission denied” error message.

Follow these steps to resolve permission denied problems:


1. Check with the remote cluster's administrator to make sure that the proper keys are in place. The
mmauth show command on both clusters will help with this.
2. Check that the grant access for the remote mounts has been given on the remote cluster with the
mmauth grant command. Use the mmauth show command from the remote cluster to verify this.
3. Check that the file system access permission is the same on both clusters using the mmauth show
command and the mmremotefs show command. If a remote cluster is only allowed to do a read-only
mount (see the mmauth show command), the remote nodes must specify -o ro on their mount
requests (see the mmremotefs show command). If you try to do remote mounts with read/write (rw)
access for remote mounts that have read-only (ro) access, you will get a “permission denied” error.
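A minimal sketch of that verification follows; run the first command on the cluster that owns the file
system and the second on the cluster attempting the mount.

# On the owning cluster: show the access level granted to each remote cluster and file system
mmauth show all
# On the mounting cluster: show how each remote file system is defined, including ro versus rw
mmremotefs show all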

See the GPFS: Administration and Programming Reference for detailed information about the mmauth
command and the mmremotefs command.

Mount failure due to client nodes joining before NSD servers are
online
If a client node joins the GPFS cluster and attempts file system access prior to the file system's NSD
servers being active, the mount fails. This is especially true when automount is used. This situation can
occur during cluster startup, or any time that an NSD server is brought online with client nodes already
active and attempting to mount a file system served by the NSD server.

The file system mount failure produces a message similar to this:


Mon Jun 25 11:23:34 EST 2007: mmmount: Mounting file systems ...
No such device
Some file system data are inaccessible at this time.
Check error log for additional information.
After correcting the problem, the file system must be unmounted and then
mounted again to restore normal data access.
Failed to open fs1.
No such device
Some file system data are inaccessible at this time.
Cannot mount /dev/fs1 on /fs1: Missing file or filesystem

The GPFS log contains information similar to this:


Mon Jun 25 11:23:54 2007: Command: mount fs1 32414
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sdcnsd.
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sddnsd.
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sdensd.
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sdgnsd.
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sdhnsd.
Mon Jun 25 11:23:58 2007: Disk failure. Volume fs1. rc = 19. Physical volume sdinsd.
Mon Jun 25 11:23:58 2007: File System fs1 unmounted by the system with return code 19
reason code 0
Mon Jun 25 11:23:58 2007: No such device



Mon Jun 25 11:23:58 2007: File system manager takeover failed.
Mon Jun 25 11:23:58 2007: No such device
Mon Jun 25 11:23:58 2007: Command: err 52: mount fs1 32414
Mon Jun 25 11:23:58 2007: Missing file or filesystem

Two mmchconfig command options are used to specify the amount of time for GPFS mount requests to
wait for an NSD server to join the cluster:
nsdServerWaitTimeForMount
Specifies the number of seconds to wait for an NSD server to come up at GPFS cluster startup
time, after a quorum loss, or after an NSD server failure.
Valid values are between 0 and 1200 seconds. The default is 300. The interval for checking is 10
seconds. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no
effect.
nsdServerWaitTimeWindowOnMount
Specifies a time window to determine if quorum is to be considered recently formed.
Valid values are between 1 and 1200 seconds. The default is 600. If nsdServerWaitTimeForMount
is 0, nsdServerWaitTimeWindowOnMount has no effect.

The GPFS daemon need not be restarted in order to change these values. The scope of these two
operands is the GPFS cluster. The -N flag can be used to set different values on different nodes. In this
case, the settings on the file system manager node take precedence over the settings of nodes trying to
access the file system.
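For example, a minimal sketch that raises both wait times cluster-wide (the values shown are arbitrary
examples, not recommendations):

mmchconfig nsdServerWaitTimeForMount=600,nsdServerWaitTimeWindowOnMount=900

Add the -N flag to restrict the change to particular nodes, as described above.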

When a node rejoins the cluster (after it was expelled, experienced a communications problem, lost
quorum, or other reason for which it dropped connection and rejoined), that node resets all the failure
times that it knows about. Therefore, when a node rejoins it sees the NSD servers as never having failed.
From the node's point of view, it has rejoined the cluster and old failure information is no longer
relevant.

GPFS checks the cluster formation criteria first. If that check falls outside the window, GPFS then checks
for NSD server fail times being within the window.

File system will not unmount


There are indications leading you to the conclusion that your file system will not unmount, and there is
a course of action to correct the problem.

Those indications include:


v Return codes or error messages indicate the file system will not unmount.
v The mmlsmount command indicates that the file system is still mounted on one or more nodes.
v Return codes or error messages from the mmumount command.

If your file system will not unmount, follow these steps:


1. If you get an error message similar to:
umount: /gpfs1: device is busy

the file system will not unmount until all processes are finished accessing it. If mmfsd is up, the
processes accessing the file system can be determined. See “The lsof command” on page 24. These
processes can be killed with the command:
lsof filesystem | grep -v COMMAND | awk '{print $2}' | xargs kill -9
If mmfsd is not operational, the lsof command will not be able to determine which processes are still
accessing the file system.



For Linux nodes it is possible to use the /proc pseudo file system to determine current file access. For
each process currently running on the system, there is a subdirectory /proc/pid/fd, where pid is the
numeric process ID number. This subdirectory is populated with symbolic links pointing to the files
that this process has open. You can examine the contents of the fd subdirectory for all running
processes, manually or with the help of a simple script (a sketch of one follows this list), to identify the
processes that have open files in GPFS file systems. Terminating all of these processes may allow the
file system to unmount
successfully.
2. Verify that there are no disk media failures.
Look on the NSD server node for error log entries. Identify any NSD server node that has generated
an error log entry. See “Disk media failure” on page 96 for problem determination and repair actions
to follow.
3. If the file system must be unmounted, you can force the unmount by issuing the mmumount -f
command:

Note:
a. See “File system forced unmount” for the consequences of doing this.
b. Before forcing the unmount of the file system, issue the lsof command and close any files that are
open.
c. On Linux, you might encounter a situation where a GPFS file system cannot be unmounted, even
if you issue the mmumount -f command. In this case, you must reboot the node to clear the
condition. You can also try the system umount command before you reboot. For example:
| umount -f /fileSystem
4. If a file system that is mounted by a remote cluster needs to be unmounted, you can force the
unmount by issuing the command:
| mmumount fileSystem -f -C RemoteClusterName
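The following is a sketch of the script mentioned in step 1 for Linux nodes. The mount point /gpfs1 is an
assumption; pass your own mount point as the first argument.

#!/bin/bash
# List processes that have files open under the given GPFS mount point (Linux only).
MOUNTPOINT=${1:-/gpfs1}
for fd_dir in /proc/[0-9]*/fd; do
    pid=${fd_dir#/proc/}; pid=${pid%/fd}
    for fd in "$fd_dir"/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$MOUNTPOINT"|"$MOUNTPOINT"/*)
                echo "PID $pid ($(cat /proc/$pid/comm 2>/dev/null)) has $target open"
                break ;;
        esac
    done
done

Review the list before terminating any of the processes it reports.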

File system forced unmount


There are indications that lead you to the conclusion that your file system has been forced to unmount
and various courses of action that you can take to correct the problem.

Those indications are:


v Forced unmount messages in the GPFS log.
v Your application no longer has access to data.
v Your application is getting ESTALE or ENOENT return codes.
v Multiple unsuccessful attempts to appoint a file system manager may cause the cluster manager to
unmount the file system everywhere.
Such situations involve the failure of paths to disk resources from many, if not all, nodes. The
underlying problem may be at the disk subsystem level, or lower. The error logs for each node that
unsuccessfully attempted to appoint a file system manager will contain records of a file system
unmount with an error that are either coded 212, or that occurred when attempting to assume
management of the file system. Note that these errors apply to a specific file system although it is
possible that shared disk communication paths will cause the unmount of multiple file systems.
v File system unmounts with an error indicating too many disks are unavailable.
The mmlsmount -L command can be used to determine which nodes currently have a given file
system mounted.

If your file system has been forced to unmount, follow these steps:
1. With the failure of a single disk, if you have not specified multiple failure groups and replication of
metadata, GPFS will not be able to continue because it cannot write logs or other critical metadata. If
you have specified multiple failure groups and replication of metadata, the failure of multiple disks in
different failure groups will put you in the same position. In either of these situations, GPFS will



forcibly unmount the file system. This will be indicated in the error log by records indicating exactly
which access failed, with an MMFS_SYSTEM_UNMOUNT record indicating the forced unmount.
The user response to this is to take the needed actions to restore the disk access and issue the
mmchdisk command to disks that are shown as down in the information displayed by the mmlsdisk
command.
2. Internal errors in processing data on a single file system may cause loss of file system access. These
errors may clear with the invocation of the umount command, followed by a remount of the file
system, but they should be reported as problems to IBM.
3. If an MMFS_QUOTA error log entry containing Error writing quota file... is generated, the quota
manager continues operation if the next write for the user, group, or fileset is successful. If not,
further allocations to the file system will fail. Check the error code in the log and make sure that the
disks containing the quota file are accessible. Run the mmcheckquota command. For more
information, see “The mmcheckquota command” on page 31.
If the file system must be repaired without quotas (a consolidated sketch of this sequence follows this
list):
a. Disable quota management by issuing the command:
mmchfs Device -Q no
b. Issue the mmmount command for the file system.
c. Make any necessary repairs and install the backup quota files.
d. Issue the mmumount -a command for the file system.
e. Restore quota management by issuing the mmchfs Device -Q yes command.
f. Run the mmcheckquota command with the -u, -g, and -j options. For more information, see “The
mmcheckquota command” on page 31.
g. Issue the mmmount command for the file system.
4. If errors indicate that too many disks are unavailable, see “Additional failure group considerations.”
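A consolidated sketch of the sequence in step 3 follows. The device name fs1 and the backup quota file
names are assumptions; substitute your own.

# Disable quota management so the file system can be mounted and repaired
mmchfs fs1 -Q no
mmmount fs1
# ... make the necessary repairs and copy the backup quota files into place ...
mmumount fs1 -a
# Re-enable quota management and rebuild usage from the backup quota files
mmchfs fs1 -Q yes
mmcheckquota -u userQuotaFile -g groupQuotaFile -j filesetQuotaFile fs1
mmmount fs1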

Additional failure group considerations


There is a structure in GPFS called the file system descriptor that is initially written to every disk in the file
system, but is replicated on a subset of the disks as changes to the file system occur, such as adding or
deleting disks. Based on the number of failure groups and disks, GPFS creates between one and five
replicas of the descriptor:
v If there are at least five different failure groups, five replicas are created.
v If there are at least three different disks, three replicas are created.
v If there are only one or two disks, a replica is created on each disk.

Once it is decided how many replicas to create, GPFS picks disks to hold the replicas, so that all replicas
will be in different failure groups, if possible, to reduce the risk of multiple failures. In picking replica
locations, the current state of the disks is taken into account. Stopped or suspended disks are avoided.
Similarly, when a failed disk is brought back online, GPFS may modify the subset to rebalance the file
system descriptors across the failure groups. The subset can be found by issuing the mmlsdisk -L
command.

GPFS requires a majority of the replicas on the subset of disks to remain available to sustain file system
operations:
v If there are at least five different failure groups, GPFS will be able to tolerate a loss of two of the five
groups. If disks out of three different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are at least three different failure groups, GPFS will be able to tolerate a loss of one of the three
groups. If disks out of two different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are fewer than three failure groups, a loss of one failure group may make the descriptor
inaccessible.



If the subset consists of three disks and there are only two failure groups, one failure group must have
two of the descriptor disks and the other has one. If an entire failure group disappears all at once and
the unavailable failure group is the one holding only the single descriptor disk, everything stays up: the
file system descriptor is moved to a new subset by updating the remaining two copies and writing the
update to a new disk added to the subset. But if the downed failure group contains a majority of the
subset, the file system descriptor cannot be updated and the file system has to be force unmounted.
Introducing a third failure group consisting of a single disk that is used solely for the purpose of
maintaining a copy of the file system descriptor can help prevent such a scenario. You can designate
this disk by using the descOnly designation for disk usage on the disk descriptor. With the descOnly
designation, the disk does not hold any of the other file system data or metadata and can be as small
as 4 MB. See the NSD creation considerations topic in the GPFS: Concepts, Planning, and Installation Guide
and the Establishing disaster recovery for your GPFS cluster topic in the GPFS: Advanced Administration
Guide.

GPFS error messages for file system forced unmount problems


Indications there are not enough disks available:
6027-418
Inconsistent file system quorum. readQuorum=value writeQuorum=value quorumSize=value.
6027-419
Failed to read a file system descriptor.

Indications the file system has been forced to unmount:


| 6027-473 [X]
| File System fileSystem unmounted by the system with return code value reason code value
| 6027-474 [X]
| Recovery Log I/O Failed, Unmounting file system fileSystem

Error numbers specific to GPFS application calls when a file system has been forced to unmount
When a file system has been forced to unmount, GPFS may report these error numbers in the operating
system error log or return them to an application:
EPANIC = 666, A file system has been forcibly unmounted because of an error. Most likely due to the
failure of one or more disks containing the last copy of metadata.
See “The operating system error log facility” on page 2 for details.
EALL_UNAVAIL = 218, A replicated read or write failed because none of the replicas were available.
Multiple disks in multiple failure groups are unavailable. Follow the procedures in Chapter 8,
“GPFS disk problems,” on page 91 for unavailable disks.

Unable to determine whether a file system is mounted


Certain GPFS file system commands cannot be performed when the file system in question is mounted.

In certain failure situations, GPFS cannot determine whether the file system in question is mounted or
not, and so cannot perform the requested command. In such cases, message 6027-1996 (Command was
unable to determine whether file system fileSystem is mounted) is issued.

If you encounter this message, perform problem determination, resolve the problem, and reissue the
command. If you cannot determine or resolve the problem, you may be able to successfully run the
command by first shutting down the GPFS daemon on all nodes of the cluster (using mmshutdown -a),
thus ensuring that the file system is not mounted.

GPFS error messages for file system mount status
6027-1996
Command was unable to determine whether file system fileSystem is mounted.

Multiple file system manager failures


The correct operation of GPFS requires that one node per file system function as the file system manager
at all times. This instance of GPFS has additional responsibilities for coordinating usage of the file system.

When the file system manager node fails, another file system manager is appointed in a manner that is
not visible to applications except for the time required to switch over.

There are situations where it may be impossible to appoint a file system manager. Such situations involve
the failure of paths to disk resources from many, if not all, nodes. In this event, the cluster manager
nominates several host names to successively try to become the file system manager. If none succeed, the
cluster manager unmounts the file system everywhere. See “NSD and underlying disk subsystem
failures” on page 91.

The required action here is to address the underlying condition that caused the forced unmounts and
then remount the file system. In most cases, this means correcting the path to the disks required by GPFS.
If NSD disk servers are being used, the most common failure is the loss of access through the
communications network. If SAN access is being used to all disks, the most common failure is the loss of
connectivity through the SAN.

GPFS error messages for multiple file system manager failures


The inability to successfully appoint a file system manager after multiple attempts can be associated with
both the error messages listed in “File system forced unmount” on page 71, as well as these additional
messages:
v When a forced unmount occurred on all nodes:
| 6027-635 [E]
The current file system manager failed and no new manager will be appointed.
v If message 6027-636 is displayed, it means that there may be a disk failure. See “NSD and underlying
disk subsystem failures” on page 91 for NSD problem determination and repair procedures.
| 6027-636 [E]
| Disk marked as stopped or offline.
v Message 6027-632 is the last message in this series of messages. See the accompanying messages:
6027-632
Failed to appoint new manager for fileSystem.
v Message 6027-631 occurs on each attempt to appoint a new manager (see the messages on the
referenced node for the specific reason as to why it failed):
6027-631
Failed to appoint node nodeName as manager for fileSystem.
v Message 6027-638 indicates which node had the original error (probably the original file system
manager node):
| 6027-638 [E]
File system fileSystem unmounted by node nodeName

Error numbers specific to GPFS application calls when file system manager appointment fails
When the appointment of a file system manager is unsuccessful after multiple attempts, GPFS may report
these error numbers in error logs, or return them to an application:
ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This usually occurs when a large number of disks are unavailable or when there has been a major
network failure. Run mmlsdisk to determine whether disks have failed and take corrective action
if they have by issuing the mmchdisk command.
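
For example (fs1 and gpfs2nsd are placeholder names), disk availability can be checked and a stopped
disk restarted with commands similar to:
mmlsdisk fs1
mmchdisk fs1 start -d gpfs2nsd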

Discrepancy between GPFS configuration data and the on-disk data for a file system
There is an indication leading you to the conclusion that there may be a discrepancy between the GPFS
configuration data and the on-disk data for a file system.

You issue a disk command (for example, mmadddisk, mmdeldisk, or mmrpldisk) and receive the
message:
6027-1290
GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data
for the file system. Issue the command:
mmcommon recoverfs fileSystem

Before a disk is added to or removed from a file system, a check is made that the GPFS configuration
data for the file system is in agreement with the on-disk data for the file system. The above message is
issued if this check was not successful. This may occur if an earlier GPFS disk command was unable to
complete successfully for some reason. Issue the mmcommon recoverfs command to bring the GPFS
configuration data into agreement with the on-disk data for the file system.

If running mmcommon recoverfs does not resolve the problem, follow the procedures in “Information to
collect before contacting the IBM Support Center” on page 115, and then contact the IBM Support Center.

Errors associated with storage pools, filesets and policies


When an error is suspected while working with storage pools, policies and filesets, check the relevant
section in the GPFS: Advanced Administration Guide to ensure that your setup is correct.

When you are sure that your setup is correct, see if your problem falls into one of these categories:
v “A NO_SPACE error occurs when a file system is known to have adequate free space”
v “Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'” on page 77
v “Policies - usage errors” on page 77
v “Errors encountered with policies” on page 78
v “Filesets - usage errors” on page 79
v “Errors encountered with filesets” on page 79
v “Storage pools - usage errors” on page 80
v “Errors encountered with storage pools” on page 81

A NO_SPACE error occurs when a file system is known to have adequate free space
An ENOSPC (NO_SPACE) error can be returned even if a file system has remaining space. The
NO_SPACE error might occur even if the df command shows that the file system is not full.

The user might have a policy that writes data into a specific storage pool. When the user tries to create a
file in that storage pool, it returns the ENOSPC error if the storage pool is full. The user next issues the
df command, which indicates that the file system is not full, because the problem is limited to the one
storage pool in the user's policy. In order to see if a particular storage pool is full, the user must issue the
mmdf command.

Here is a sample scenario:


1. The user has a policy rule that says files whose name contains the word 'tmp' should be put into
storage pool sp1 in the file system fs1. This command displays the rule:
mmlspolicy fs1 -L

The system produces output similar to this:


/* This is a policy for GPFS file system fs1 */

/* File Placement Rules */


RULE SET POOL 'sp1' WHERE name like '%tmp%'
RULE 'default' SET POOL 'system'
/* End of Policy */
2. The user moves a file from the /tmp directory to fs1 that has the word 'tmp' in the file name, meaning
data of tmpfile should be placed in storage pool sp1:
mv /tmp/tmpfile /fs1/

The system produces output similar to this:


mv: writing `/fs1/tmpfile': No space left on device

This is an out-of-space error.


3. This command shows storage information for the file system:
df |grep fs1

The system produces output similar to this:


/dev/fs1 280190976 140350976 139840000 51% /fs1

This output indicates that the file system is only 51% full.
4. To query the storage usage for an individual storage pool, the user must issue the mmdf command.
mmdf fs1

The system produces output similar to this:


disk disk size failure holds holds free KB free KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system
gpfs1nsd 140095488 4001 yes yes 139840000 (100%) 19936 ( 0%)
------------- -------------------- -------------------
(pool total) 140095488 139840000 (100%) 19936 ( 0%)

Disks in storage pool: sp1
gpfs2nsd 140095488 4001 no yes 0 ( 0%) 248 ( 0%)
------------- -------------------- -------------------
(pool total) 140095488 0 ( 0%) 248 ( 0%)

============= ==================== ===================
(data) 280190976 139840000 ( 50%) 20184 ( 0%)
(metadata) 140095488 139840000 (100%) 19936 ( 0%)
============= ==================== ===================
(total) 280190976 139840000 ( 50%) 20184 ( 0%)

Inode Information
------------------
Number of used inodes: 74
Number of free inodes: 137142
Number of allocated inodes: 137216
Maximum number of inodes: 150016

In this case, the user sees that storage pool sp1 has 0% free space left and that is the reason for the
NO_SPACE error message.
5. To resolve the problem, the user must change the placement policy file to avoid putting data in a full
storage pool, delete some files in storage pool sp1, or add more space to the storage pool.

Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'
This is a hypothetical situation where ill-placed files can cause GPFS to produce a 'Predicted Pool
Utilization' of a negative value.

Suppose that 2 GB of data from a 5 GB file named abc, that is supposed to be in the system storage pool,
are actually located in another pool. This 2 GB of data is said to be 'ill-placed'. Also, suppose that 3 GB of
this file are in the system storage pool, and no other file is assigned to the system storage pool.

If you run the mmapplypolicy command to schedule file abc to be moved from the system storage pool
to a storage pool named YYY, the mmapplypolicy command does the following:
1. Starts with the 'Current pool utilization' for the system storage pool, which is 3 GB.
2. Subtracts 5 GB, the size of file abc.
3. Arrives at a 'Predicted Pool Utilization' of negative 2 GB.

The mmapplypolicy command does not know how much of an 'ill-placed' file is currently in the wrong
storage pool and how much is in the correct storage pool.

When there are ill-placed files in the system storage pool, the 'Predicted Pool Utilization' can be any
positive or negative value. The positive value can be capped by the LIMIT clause of the MIGRATE rule.
The 'Current Pool Utilizations' should always be between 0% and 100%.

Policies - usage errors


These are common mistakes and misunderstandings encountered when dealing with policies:
1. Test your policy rules using the mmapplypolicy command with the -I test option, and consider
restricting the test to a subdirectory of your file system. Do not apply a policy to an entire file system
of vital files until you are confident that the rules correctly express your intentions. Even then, do a
sample run with the mmapplypolicy -I test command using the option -L 3 or higher, to better
understand which files are selected as candidates and which candidates are chosen (see the sample
invocation following this list).
The -L flag of the mmapplypolicy command can be used to check a policy before it is applied. For
examples and more information on this flag, see “The mmapplypolicy -L command” on page 25.
2. There is a 1 MB limit on the total size of the policy file installed in GPFS.
3. Ensure that all clocks on all nodes of the GPFS cluster are synchronized. Depending on the policies in
effect, variations in the clock times can cause unexpected behavior.
The mmapplypolicy command uses the time on the node on which it is run as the current time.
Policy rules may refer to a file's last access time or modification time, which is set by the node which
last accessed or modified the file. If the clocks are not synchronized, files may be treated as older or
younger than their actual age, and this could cause files to be migrated or deleted prematurely, or not
at all.
A suggested solution is to use NTP to keep the clocks synchronized on all nodes in the cluster.
4. The rules of a policy file are evaluated in order.

A new file is assigned to the storage pool of the first rule that it matches. If the file fails to match any
rule, the file creation fails with an EINVAL error code. A suggested solution is to put a DEFAULT
clause as the last entry of the policy file.
5. When a policy file is installed, GPFS verifies that the named storage pools exist.
However, GPFS allows an administrator to delete pools that are mentioned in the policy file. This
allows more freedom for recovery from hardware errors. Consequently, the administrator must be
careful when deleting storage pools referenced in the policy.
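
As an illustrative, non-destructive sample invocation (the file system name fs1 and the policy file name
policy.test are placeholders), a test run of a policy might look like this:
mmapplypolicy fs1 -P policy.test -I test -L 3

Because -I test is specified, no files are migrated or deleted; the -L 3 output shows which files become
candidates and which candidates are chosen.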

Errors encountered with policies


These are errors encountered with policies and how to analyze them:
1. Policy file never finishes, appears to be looping.
The mmapplypolicy command runs by making two passes over the file system - one over the inodes
and one over the directory structure. The policy rules are applied to each file to determine a list of
candidate files. The list is sorted by the weighting specified in the rules, then applied to the file
system. No file is ever moved more than once. However, due to the quantity of data involved, this
operation may take a long time and appear to be hung or looping.
The time required to run mmapplypolicy is a function of the number of files in the file system, the
current load on the file system, and the load on the node on which mmapplypolicy is run. If the
command appears not to finish, you may need to reduce the load on the file system or run
mmapplypolicy on a less heavily loaded node in the cluster.
2. Initial file placement is not correct.
The placement rules specify a single pool for initial placement. The first rule that matches the file's
attributes selects the initial pool. If that pool is incorrect, then the placement rules must be updated to
select a different pool. You may see current placement rules by running mmlspolicy -L. For existing
files, the file can be moved to its desired pool using the mmrestripefile or mmchattr commands.
For examples and more information on mmlspolicy -L, see “The mmapplypolicy -L command” on
page 25.
3. Data migration, deletion or exclusion not working properly.
The mmapplypolicy command selects a list of candidate files to be migrated or deleted. The list is
sorted by the weighting factor specified in the rules, then applied to a sufficient number of files on
the candidate list to achieve the utilization thresholds specified by the pools. The actual migration and
deletion are done in parallel.
These are some reasons for apparently incorrect operation:
v The file was not selected as a candidate for the expected rule. Each file is selected as a candidate for
only the first rule that matched its attributes. If the matched rule specifies an invalid storage pool,
the file is not moved. The -L 4 option on mmapplypolicy displays the details for candidate
selection and file exclusion.
v The file was a candidate, but was not operated on. Only the candidates necessary to achieve the
desired pool utilizations are migrated. Using the -L 3 option displays more information on
candidate selection and files chosen for migration.
For more information on mmlspolicy -L, see “The mmapplypolicy -L command” on page 25.
v The file was scheduled for migration but was not moved. In this case, the file will be shown as
'ill-placed' by the mmlsattr -L command, indicating that the migration did not succeed. This occurs
if the new storage pool assigned to the file did not have sufficient free space for the file when the
actual migration was attempted. Since migrations are done in parallel, it is possible that the target
pool had files which were also migrating, but had not yet been moved. If the target pool now has
sufficient free space, the files can be moved using the commands: mmrestripefs, mmrestripefile,
mmchattr.
4. Asserts or error messages indicating a problem.

The policy rule language can only check for some errors at runtime. For example, a rule that causes a
divide by zero cannot be checked when the policy file is installed. Errors of this type generate an
error message and stop the policy evaluation for that file.

Note: I/O errors while migrating files indicate failing storage devices and must be addressed like any
other I/O error. The same is true for any file system error or panic encountered while migrating files.

Filesets - usage errors


These are common mistakes and misunderstandings encountered when dealing with filesets:
1. Fileset junctions look very much like ordinary directories, but they cannot be deleted by the usual
commands such as rm -r or rmdir. Using these commands on a fileset junction could result in a Not
owner message on an AIX system, or an Operation not permitted message on a Linux system.
As a consequence these commands may fail when applied to a directory that is a fileset junction.
Similarly, when rm -r is applied to a directory that contains a fileset junction, it will fail as well.
On the other hand, rm -r will delete all the files contained in the filesets linked under the specified
directory. Use the mmunlinkfileset command to remove fileset junctions.
2. Files and directories may not be moved from one fileset to another, nor may a hard link cross fileset
boundaries.
If the user is unaware of the locations of fileset junctions, mv and ln commands may fail
unexpectedly. In most cases, the mv command will automatically compensate for this failure and use
a combination of cp and rm to accomplish the desired result. Use the mmlsfileset command to view
the locations of fileset junctions. Use the mmlsattr -L command to determine the fileset for any given
file (see the sample commands following this list).
3. Because a snapshot saves the contents of a fileset, deleting a fileset included in a snapshot cannot
completely remove the fileset.
The fileset is put into a 'deleted' state and continues to appear in mmlsfileset output. Once the last
snapshot containing the fileset is deleted, the fileset will be completely removed automatically. The
mmlsfileset --deleted command indicates deleted filesets and shows their names in parentheses.
4. Deleting a large fileset may take some time and may be interrupted by other failures, such as disk
errors or system crashes.
When this occurs, the recovery action leaves the fileset in a 'being deleted' state. Such a fileset may
not be linked into the namespace. The corrective action is to finish the deletion by reissuing the fileset
delete command:
mmdelfileset fs1 fsname1 -f

The mmlsfileset command identifies filesets in this state by displaying a status of 'Deleting'.
5. If you unlink a fileset that has other filesets linked below it, any filesets linked to it (that is, child
filesets) become inaccessible. The child filesets remain linked to the parent and will become accessible
again when the parent is re-linked.
6. By default, the mmdelfileset command will not delete a fileset that is not empty.
To empty a fileset, first unlink all its immediate child filesets, to remove their junctions from the
fileset to be deleted. Then, while the fileset itself is still linked, use rm -rf or a similar command, to
remove the rest of the contents of the fileset. Now the fileset may be unlinked and deleted.
Alternatively, the fileset to be deleted can be unlinked first and then mmdelfileset can be used with
the -f (force) option. This will unlink its child filesets, then destroy the files and directories contained
in the fileset.
7. When deleting a small dependent fileset, it may be faster to use the rm -rf command instead of the
mmdelfileset command with the -f option.
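
For example, to determine which fileset a given file belongs to and then remove a fileset junction (the
file system, path, and fileset names shown here are placeholders), commands similar to the following can
be used:
mmlsattr -L /fs1/somedir/somefile
mmlsfileset fs1
mmunlinkfileset fs1 fset1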

Errors encountered with filesets


These are errors encountered with filesets and how to analyze them:

1. Problems can arise when running backup and archive utilities against a file system with unlinked
filesets. See the Filesets and backup topic in the GPFS: Advanced Administration Guide for details.
2. In the rare case that the mmfsck command encounters a serious error checking the file system's fileset
metadata, it may not be possible to reconstruct the fileset name and comment. These cannot be
inferred from information elsewhere in the file system. If this happens, mmfsck will create a dummy
name for the fileset, such as 'Fileset911' and the comment will be set to the empty string.
3. Sometimes mmfsck encounters orphaned files or directories (those without a parent directory), and
traditionally these are reattached in a special directory called 'lost+found' in the file system root.
When a file system contains multiple filesets, however, orphaned files and directories are reattached
in the 'lost+found' directory in the root of the fileset to which they belong. For the root fileset, this
directory appears in the usual place, but other filesets may each have their own 'lost+found' directory.

Active file management fileset errors

When the mmafmctl Device getstate command displays a NeedsResync target/fileset state, inconsistencies
exist between the home and cache. To ensure that the cached data is synchronized with the home and the
fileset is returned to Active state, either the file system must be unmounted and mounted or the fileset
must be unlinked and linked. Once this is done, the next update to fileset data will trigger an automatic
synchronization of data from the cache to the home.
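
For example (fs1 is a placeholder device name), the current state of the AFM filesets can be displayed
with:
mmafmctl fs1 getstate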

Storage pools - usage errors


These are common mistakes and misunderstandings encountered when dealing with storage pools:
1. Only the system storage pool is allowed to store metadata. All other pools must have the dataOnly
attribute.
2. Take care to create your storage pools with sufficient numbers of failure groups to enable the desired
level of replication.
When the file system is created, GPFS requires all of the initial pools to have at least as many failure
groups as defined by the default replication (-m and -r flags on the mmcrfs command). However,
once the file system has been created, the user can create a storage pool with fewer failure groups
than the default replication.
The mmadddisk command issues a warning, but it allows the disks to be added and the storage pool
defined. To use the new pool, the user must define a policy rule to create or migrate files into the new
pool. This rule should be defined to set an appropriate replication level for each file assigned to the
pool. If the replication level exceeds the number of failure groups in the storage pool, all files
assigned to the pool incur added overhead on each write to the file, in order to mark the file as
ill-replicated.
To correct the problem, add additional disks to the storage pool, defining a different failure group, or
ensure that all policy rules that assign files to the pool also set the replication appropriately.
3. GPFS does not permit the mmchdisk or mmrpldisk command to change a disk's storage pool
assignment. Changing the pool assignment requires all data residing on the disk to be moved to
another disk before the disk can be reassigned. Moving the data is a costly and time-consuming
operation; therefore GPFS requires an explicit mmdeldisk command to move it, rather than moving it
as a side effect of another command.
4. Some storage pools allow larger disks to be added than do other storage pools.
When the file system is created, GPFS defines the maximum size disk that can be supported using the
on-disk data structures to represent it. Likewise, when defining a new storage pool, the newly created
on-disk structures establish a limit on the maximum size disk that can be added to that pool.
To add disks that exceed the maximum size allowed by a storage pool, simply create a new pool
using the larger disks.
The mmdf command can be used to find the maximum disk size allowed for a storage pool.
5. If you try to delete a storage pool when there are files still assigned to the pool, consider this:

A storage pool is deleted when all disks assigned to the pool are deleted. To delete the last disk, all
data residing in the pool must be moved to another pool. Likewise, any files assigned to the pool,
whether or not they contain data, must be reassigned to another pool. The easiest method for
reassigning all files and migrating all data is to use the mmapplypolicy command with a single rule to
move all data from one pool to another (a sketch appears after this list). You should also install a new
placement policy that does
not assign new files to the old pool. Once all files have been migrated, reissue the mmdeldisk
command to delete the disk and the storage pool.
If all else fails, and you have a disk that has failed and cannot be recovered, follow the procedures in
“Information to collect before contacting the IBM Support Center” on page 115, and then contact the
IBM Support Center for commands to allow the disk to be deleted without migrating all data from it.
Files with data left on the failed device will lose data. If the entire pool is deleted, any existing files
assigned to that pool are reassigned to a “broken” pool, which prevents writes to the file until the file
is reassigned to a valid pool.
6. Ill-placed files - understanding and correcting them.
The mmapplypolicy command migrates a file between pools by first assigning it to a new pool, then
moving the file's data. Until the existing data is moved, the file is marked as 'ill-placed' to indicate
that some of its data resides in its previous pool. In practice, mmapplypolicy assigns all files to be
migrated to their new pools, then it migrates all of the data in parallel. Ill-placed files indicate that the
mmapplypolicy or mmchattr command did not complete its last migration or that -I defer was used.
To correct the placement of the ill-placed files, the file data needs to be migrated to the assigned
pools. You can use the mmrestripefs, or mmrestripefile commands to move the data.
7. Using the -P PoolName option on the mmrestripefs, command:
This option restricts the restripe operation to a single storage pool. For example, after adding a disk to
a pool, only the data in that pool needs to be restriped. In practice, -P PoolName simply restricts the
operation to the files assigned to the specified pool. Files assigned to other pools are not included in
the operation, even if the file is ill-placed and has data in the specified pool.
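
As a sketch of the approach described in item 5 (the pool, rule, policy file, and file system names are all
placeholders), a one-rule policy and command such as the following could be used to move all data out of
a pool before deleting it:
/* drain.policy: move all data from pool sp1 to the system pool */
RULE 'drainsp1' MIGRATE FROM POOL 'sp1' TO POOL 'system'

mmapplypolicy fs1 -P drain.policy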

Errors encountered with storage pools


These are errors encountered with storage pools and how to analyze them:
1. Access time to one pool appears slower than the others.
A consequence of striping data across the disks is that the I/O throughput is limited by the slowest
device. A device encountering hardware errors or recovering from hardware errors may effectively
limit the throughput to all devices. However, with storage pools, striping is done only across the disks
assigned to each pool. Thus a slow disk impacts only its own pool; other pools are not impeded.
To correct the problem, check the connectivity and error logs for all disks in the slow pool.
2. Other storage pool problems might really be disk problems and should be pursued from the
standpoint of making sure that your disks are properly configured and operational. See Chapter 8,
“GPFS disk problems,” on page 91.

Failures using the mmbackup command


Use the mmbackup command to back up the files in a GPFS file system to storage on a Tivoli® Storage
Manager (TSM) server. A number of factors can cause mmbackup to fail.

The most common of these are:


v The file system is not mounted on the node issuing the mmbackup command.
v The file system is not mounted on the TSM client nodes.
v The mmbackup command was issued to back up a file system owned by a remote cluster.
v The TSM clients are not able to communicate with the TSM server due to authorization problems.
v The TSM server is down or out of storage space.

v When the target of the backup is tape, the TSM server may be unable to handle all of the backup client
processes because the value of the TSM server's MAXNUMMP parameter is set lower than the number
of client processes. This failure is indicated by message ANS1312E from TSM.

The errors from mmbackup normally indicate the underlying problem.

GPFS error messages for mmbackup errors


6027-1995
Device deviceName is not mounted on node nodeName.

TSM error messages


ANS1312E
Server media mount not possible.

Snapshot problems
Use the mmlssnapshot command as a general hint for snapshot-related problems, to find out what
snapshots exist, and what state they are in. Use the mmsnapdir command to find the snapshot directory
name used to permit access.

The mmlssnapshot command displays the list of all snapshots of a file system. This command lists the
snapshot name, some attributes of the snapshot, as well as the snapshot's status. The mmlssnapshot
command does not require the file system to be mounted.

Problems with locating a snapshot


The mmlssnapshot and mmsnapdir commands are provided to assist in locating the snapshots in the file
system directory structure. Only valid snapshots are visible in the file system directory structure. They
appear in a hidden subdirectory of the file system's root directory. By default the subdirectory is named
.snapshots. The valid snapshots appear as entries in the snapshot directory and may be traversed like
any other directory. The mmsnapdir command can be used to display the assigned snapshot directory
name.

Problems not directly related to snapshots


Many errors returned from the snapshot commands are not specifically related to the snapshot. For
example, disk failures or node failures could cause a snapshot command to fail. The response to these
types of errors is to fix the underlying problem and try the snapshot command again.

GPFS error messages for indirect snapshot errors


The error messages for this type of problem do not have message numbers, but can be recognized by
their message text:
v 'Unable to sync all nodes, rc=errorCode.'
v 'Unable to get permission to create snapshot, rc=errorCode.'
v 'Unable to quiesce all nodes, rc=errorCode.'
v 'Unable to resume all nodes, rc=errorCode.'
v 'Unable to delete snapshot filesystemName from file system snapshotName, rc=errorCode.'
v 'Error restoring inode number, error errorCode.'
v 'Error deleting snapshot snapshotName in file system filesystemName, error errorCode.'
v 'commandString failed, error errorCode.'
v 'None of the nodes in the cluster is reachable, or GPFS is down on all of the nodes.'
v 'File system filesystemName is not known to the GPFS cluster.'

Snapshot usage errors
Many errors returned from the snapshot commands are related to usage restrictions or incorrect snapshot
names.

An example of a snapshot restriction error is exceeding the maximum number of snapshots allowed at
one time. For simple errors of these types, you can determine the source of the error by reading the error
message or by reading the description of the command. You can also run the mmlssnapshot command to
see the complete list of existing snapshots.

Examples of incorrect snapshot name errors are trying to delete a snapshot that does not exist or trying to
create a snapshot using the same name as an existing snapshot. The rules for naming global and fileset
snapshots are designed to minimize conflicts between the file system administrator and the fileset
owners. These rules can result in errors when fileset snapshot names are duplicated across different
filesets or when the snapshot command -j option (specifying a qualifying fileset name) is provided or
omitted incorrectly. To resolve name problems review the mmlssnapshot output with careful attention to
the Fileset column. You can also specify the -s or -j options of the mmlssnapshot command to limit the
output. For snapshot deletion, the -j option must exactly match the Fileset column.

For more information about snapshot naming conventions, see the mmcrsnapshot command in the GPFS:
Administration and Programming Reference.

GPFS error messages for snapshot usage errors


The error messages for this type of problem do not have message numbers, but can be recognized by
their message text:
v 'File system filesystemName does not contain a snapshot snapshotName, rc=errorCode.'
v 'Cannot create a new snapshot until an existing one is deleted. File system filesystemName has a limit of
number online snapshots.'
v 'Cannot restore snapshot. snapshotName is mounted on number nodes and in use on number nodes.'
v 'Cannot create a snapshot in a DM enabled file system, rc=errorCode.'

Snapshot status errors


Some snapshot commands like mmdelsnapshot and mmrestorefs may require a substantial amount of
time to complete. If the command is interrupted, say by the user or due to a failure, the snapshot may be
left in an invalid state. In many cases, the command must be completed before other snapshot commands
are allowed to run. The source of the error may be determined from the error message, the command
description, or the snapshot status available from mmlssnapshot.

GPFS error messages for snapshot status errors


The error messages for this type of problem do not have message numbers, but can be recognized by
their message text:
v 'Cannot delete snapshot snapshotName which is snapshotState, error = errorCode.'
v 'Cannot restore snapshot snapshotName which is snapshotState, error = errorCode.'
v 'Previous snapshot snapshotName is invalid and must be deleted before a new snapshot may be created.'
v 'Previous snapshot snapshotName must be restored before a new snapshot may be created.'
v 'Previous snapshot snapshotName is invalid and must be deleted before another snapshot may be
deleted.'
v 'Previous snapshot snapshotName is invalid and must be deleted before another snapshot may be
restored.'
v 'More than one snapshot is marked for restore.'
v 'Offline snapshot being restored.'

Errors encountered when restoring a snapshot
If the mmrestorefs command is interrupted, the file system may not be consistent, and GPFS will not
allow it to be mounted until the restore command completes. The error message for this case is:
6027-2632
Mount of fileSystem failed: snapshot snapshotName must be restored before it can be mounted.

If the mmrestorefs command fails with the following error:


6027-2622
Error restoring inode inode, err number

the user should fix the underlying problem and reissue the mmrestorefs command. If the user cannot fix
the underlying problem, the following steps can be taken to complete the restore command and recover
the user data:
1. If there are other snapshots available, the user can restore a different snapshot.
2. If the error code in the message is ENOSPC, there are not enough free blocks in the file system to
restore the selected snapshot. The user may add space to the file system by adding a new disk. As an
alternative, the user may delete a different snapshot from the file system to free some existing space.
The user is not allowed to delete the snapshot that is being restored. Once there is additional free
space, reissue the mmrestorefs command.
3. The mmrestorefs command can be forced to continue, even if it encounters an error, by using the -c
option. The command will restore as many files as possible, but may leave the file system in an
inconsistent state. Some files may not have been restored or may no longer be accessible. The user
should run mmfsck after the restore completes to make the file system consistent again.
4. If the above steps fail, the file system may be mounted in restricted mode, allowing the user to copy
as many files as possible into a newly created file system, or one created from an offline backup of the
data. See “Restricted mode mount” on page 23.

Note: In both steps 3 and 4, user data is lost. These steps are provided to allow as much user data as
possible to be recovered.

| If the mmrestorefs -j command fails with the following error:


| 6027-953
| Failed to get a handle for fileset filesetName, snapshot snapshotName in file system fileSystem.
| errorMessage.

| the file system that contains the snapshot to restore should be mounted, and then the fileset of the
| snapshot should be linked.

Snapshot directory name conflicts


By default, all snapshots appear in a directory named .snapshots in the root directory of the file system.
This directory is dynamically generated when the first snapshot is created and continues to exist even
after the last snapshot is deleted. If the user tries to create the first snapshot, and a normal file or
| directory named .snapshots already exists, the mmcrsnapshot command will be successful but the
| snapshot may not be accessed.

There are two ways to fix this problem:


1. Delete or rename the existing file or directory
2. Tell GPFS to use a different name for the dynamically-generated directory of snapshots by running
the mmsnapdir command.

It is also possible to get a name conflict as a result of issuing the mmrestorefs command. Since
mmsnapdir allows changing the name of the dynamically-generated snapshot directory, it is possible that
an older snapshot contains a normal file or directory that conflicts with the current name of the snapshot
directory. When this older snapshot is restored, the mmrestorefs command will recreate the old, normal
file or directory in the file system root directory. The mmrestorefs command will not fail in this case, but
the restored file or directory will hide the existing snapshots. After invoking mmrestorefs it may
therefore appear as if the existing snapshots have disappeared. However, mmlssnapshot should still
show all existing snapshots.

The fix is similar to the one mentioned before. Perform one of these two steps:
1. After the mmrestorefs command completes, rename the conflicting file or directory that was restored
in the root directory.
2. Run the mmsnapdir command to select a different name for the dynamically-generated snapshot
directory.

Finally, the mmsnapdir -a option enables a dynamically-generated snapshot directory in every directory,
not just the file system root. This allows each user quick access to snapshots of their own files by going
into .snapshots in their home directory or any other of their directories.

Unlike .snapshots in the file system root, .snapshots in other directories is invisible, that is, an ls -a
command will not list .snapshots. This is intentional because recursive file system utilities such as find,
du or ls -R would otherwise either fail or produce incorrect or undesirable results. To access snapshots,
the user must explicitly specify the name of the snapshot directory, for example: ls ~/.snapshots. If there
is a name conflict (that is, a normal file or directory named .snapshots already exists in the user's home
directory), the user must rename the existing file or directory.

The inode numbers that are used for and within these special .snapshots directories are constructed
dynamically and do not follow the standard rules. These inode numbers are visible to applications
through standard commands, such as stat, readdir, or ls. The inode numbers reported for these
directories can also be reported differently on different operating systems. Applications should not expect
consistent numbering for such inodes.

Failures using the mmpmon command


The mmpmon command manages performance monitoring and displays performance information.

The mmpmon command is thoroughly documented in the Monitoring GPFS I/O performance with the
mmpmon command topic in the GPFS: Advanced Administration Guide, and the Commands topic in the
GPFS: Administration and Programming Reference. Before proceeding with mmpmon problem
determination, review all of this material to ensure that you are using mmpmon correctly.

Setup problems using mmpmon


Remember these points when using the mmpmon command:
v You must have root authority.
v The GPFS daemon must be active.
v The input file must contain valid input requests, one per line. When an incorrect request is detected by
mmpmon, it issues an error message and terminates.
Input requests that appear in the input file before the first incorrect request are processed by mmpmon.
v Do not alter the input file while mmpmon is running.
v Output from mmpmon is sent to standard output (STDOUT) and errors are sent to standard error
(STDERR).
v Up to five instances of mmpmon may run on a given node concurrently. See Monitoring GPFS I/O
performance with the mmpmon command in the GPFS: Advanced Administration Guide and search on
Running mmpmon concurrently from multiple users for the limitations regarding concurrent usage of
mmpmon.

v The mmpmon command does not support:
– Monitoring read requests without monitoring writes, or the other way around.
– Choosing which file systems to monitor.
– Monitoring on a per-disk basis.
– Specifying different size or latency ranges for reads and writes.
– Specifying different latency values for a given size range.

Incorrect output from mmpmon


If the output from mmpmon is incorrect, such as zero counters when you know that I/O activity is
taking place, consider these points:
1. Someone may have issued the reset or rhist reset requests.
2. Counters may have wrapped due to a large amount of I/O activity, or running mmpmon for an
extended period of time. For a discussion of counter sizes and counter wrapping, see Monitoring GPFS
I/O performance with the mmpmon command in the GPFS: Advanced Administration Guide and search for
Counter sizes and counter wrapping.
3. See Monitoring GPFS I/O performance with the mmpmon command in the GPFS: Advanced Administration
Guide and search for Other information about mmpmon output, which gives specific instances where
mmpmon output may be different than what was expected.

Abnormal termination or hang in mmpmon


If mmpmon hangs, perform these steps:
1. Ensure that sufficient time has elapsed to cover the mmpmon timeout value. It is controlled using the
-t flag on the mmpmon command.
2. Issue the ps command to find the PID for mmpmon.
3. Issue the kill command to terminate this PID.
4. Try the function again.
5. If the problem persists, issue this command:
mmfsadm dump eventsExporter
6. Copy the output of mmfsadm to a safe location.
7. Follow the procedures in “Information to collect before contacting the IBM Support Center” on page
115, and then contact the IBM Support Center.

If mmpmon terminates abnormally, perform these steps:


1. Determine if the GPFS daemon has failed, and if so restart it.
2. Review your invocation of mmpmon, and verify the input.
3. Try the function again.
4. If the problem persists, follow the procedures in “Information to collect before contacting the IBM
Support Center” on page 115, and then contact the IBM Support Center.

Tracing the mmpmon command


When mmpmon does not work properly, there are two trace classes used to determine the cause of the
problem. Use these only when requested by the IBM Support Center.
eventsExporter
Reports attempts to connect and whether or not they were successful.
mmpmon
Shows the command string that came in to mmpmon, and whether it was successful or not.

Note: Do not use the perfmon trace class of the GPFS trace to diagnose mmpmon problems. This trace
event does not provide the necessary data.

NFS problems
There are some problems that can be encountered when GPFS interacts with NFS.

For details on how GPFS and NFS interact, see the NFS and GPFS topic in the GPFS: Administration and
Programming Reference.

These are some of the problems encountered when GPFS interacts with NFS:
v “NFS client with stale inode data”
v “NFS V4 problems”

NFS client with stale inode data


For performance reasons, some NFS implementations cache file information on the client. Some of the
information (for example, file state information such as file size and timestamps) is not kept up-to-date in
this cache. The client may view stale inode data (on ls -l, for example) if exporting a GPFS file system
with NFS. If this is not acceptable for a given installation, caching can be turned off by mounting the file
system on the client using the appropriate operating system mount command option (for example, -o
noac on Linux NFS clients).
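
For example, on a Linux NFS client (the server name, exported path, and mount point shown are
placeholders), caching can be disabled at mount time with a command similar to:
mount -t nfs -o noac nfsservernode:/gpfs/fs1 /mnt/fs1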

Turning off NFS caching will result in extra file system operations to GPFS, and negatively affect its
performance.

The clocks of all nodes in the GPFS cluster must be synchronized. If this is not done, NFS access to the
data, as well as other GPFS file system operations, may be disrupted. NFS relies on metadata timestamps
to validate the local operating system cache. If the same directory is either NFS-exported from more than
one node, or is accessed with both the NFS and GPFS mount point, it is critical that clocks on all nodes
that access the file system (GPFS nodes and NFS clients) are constantly synchronized using appropriate
software (for example, NTP). Failure to do so may result in stale information seen on the NFS clients.

NFS V4 problems
Before analyzing an NFS V4 problem, review this documentation to determine if you are using NFS V4
ACLs and GPFS correctly:
1. The NFS Version 4 Protocol paper and other information found at: www.nfsv4.org.
2. The Managing GPFS access control lists and NFS export topic in the GPFS: Administration and
Programming Reference.
3. The GPFS exceptions and limitations to NFS V4 ACLs topic in the GPFS: Administration and Programming
Reference.

The commands mmdelacl and mmputacl can be used to revert an NFS V4 ACL to a traditional ACL. Use
the mmdelacl command to remove the ACL, leaving access controlled entirely by the permission bits in
the mode. Then use the chmod command to modify the permissions, or the mmputacl and mmeditacl
commands to assign a new ACL.

For files, the mmputacl and mmeditacl commands can be used at any time (without first issuing the
mmdelacl command) to assign any type of ACL. The command mmeditacl -k posix provides a
translation of the current ACL into traditional POSIX form and can be used to more easily create an ACL
to edit, instead of having to create one from scratch.
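
For example (the path shown is a placeholder), an NFS V4 ACL can be removed and a POSIX-style ACL
edited in its place with commands similar to:
mmdelacl /fs1/somefile
chmod 644 /fs1/somefile
mmeditacl -k posix /fs1/somefile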

Problems working with Samba


If Windows (Samba) clients fail to access files with messages indicating file sharing conflicts, and no such
conflicts exist, there may be a mismatch with file locking rules.

File systems being exported with Samba may (depending on which version of Samba you are using)
require the -D nfs4 flag on the mmchfs or mmcrfs commands. This setting enables NFS V4 and CIFS
(Samba) sharing rules. Some versions of Samba will fail share requests if the file system has not been
configured to support them.
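
For example (assuming a file system named fs1), NFS V4 and CIFS sharing rules can be enabled with:
mmchfs fs1 -D nfs4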

Data integrity
GPFS takes extraordinary care to maintain the integrity of customer data. However, certain hardware
failures, or in extremely unusual circumstances, the occurrence of a programming error can cause the loss
of data in a file system.

GPFS performs extensive checking to validate metadata and ceases using the file system if metadata
becomes inconsistent. This can appear in two ways:
1. The file system will be unmounted and applications will begin seeing ESTALE return codes to file
operations.
2. Error log entries indicating an MMFS_SYSTEM_UNMOUNT and a corruption error are generated.

If actual disk data corruption occurs, this error will appear on each node in succession. Before proceeding
with the following steps, follow the procedures in “Information to collect before contacting the IBM
Support Center” on page 115, and then contact the IBM Support Center.
1. Examine the error logs on the NSD servers for any indication of a disk error that has been reported.
2. Take appropriate disk problem determination and repair actions prior to continuing.
3. After completing any required disk repair actions, run the offline version of the mmfsck command on
the file system.
4. If your error log or disk analysis tool indicates that specific disk blocks are in error, use the mmfileid
command to determine which files are located on damaged areas of the disk, and then restore these
files. See “The mmfileid command” on page 33 for more information.
5. If data corruption errors occur in only one node, it is probable that memory structures within the
node have been corrupted. In this case, the file system is probably good but a program error exists in
GPFS or another authorized program with access to GPFS data structures.
Follow the directions in “Data integrity” and then reboot the node. This should clear the problem. If
the problem repeats on one node without affecting other nodes check the programming specifications
code levels to determine that they are current and compatible and that no hardware errors were
reported. Refer to the GPFS: Concepts, Planning, and Installation Guide for correct software levels.

Error numbers specific to GPFS application calls when data integrity may be corrupted
When there is the possibility of data corruption, GPFS may report these error numbers in the operating
system error log, or return them to an application:
EVALIDATE=214, Invalid checksum or other consistency check failure on disk data structure.
This indicates that internal checking has found an error in a metadata structure. The severity of
the error depends on which data structure is involved. The cause of this is usually GPFS
software, disk hardware or other software between GPFS and the disk. Running mmfsck should
repair the error. The urgency of this depends on whether the error prevents access to some file or
whether basic metadata structures are involved.

Messages requeuing in AFM


Sometimes requests in the AFM messages queue on the gateway node get requeued because of errors at
home. For example, if there is no space at home to perform a new write, a write message that is queued
is not successful and gets requeued. The administrator would see the failed message getting requeued in
the queue on the gateway node. The administrator has to resolve the issue by adding more space at

88 GPFS: Problem Determination Guide


home and running the mmafmctl resumeRequeued command, so that the requeued messages are
executed at home again. If mmafmctl resumeRequeued is not run by an administrator, AFM would still
execute the message in the regular order of message executions from cache to home.
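
For example (the device name fs1 and fileset name cache1 are placeholders), after freeing space at home
the requeued messages can be retried with a command similar to:
mmafmctl fs1 resumeRequeued -j cache1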

Running the mmfsadm dump afm all command on the gateway node shows the queued messages.
Requeued messages show in the dumps similar to the following example:
c12c4apv13.gpfs.net: Normal Queue: (listed by execution order) (state: Active)
c12c4apv13.gpfs.net: Write [612457.552962] requeued file3 (43 @ 293) chunks 0 bytes 0 0

Chapter 8. GPFS disk problems
GPFS uses only disk devices prepared as Network Shared Disks (NSDs). However NSDs might exist on
top of a number of underlying disk technologies.

NSDs, for example, might be defined on top of Fibre Channel SAN connected disks. This information
provides detail on the creation, use, and failure of NSDs and their underlying disk technologies.

These are some of the errors encountered with GPFS disks and NSDs:
v “NSD and underlying disk subsystem failures”
v “GPFS has declared NSDs built on top of AIX logical volumes as down” on page 100
v “Disk accessing commands fail to complete due to problems with some non-IBM disks” on page 102
v “Persistent Reserve errors” on page 102
v “GPFS is not using the underlying multipath device” on page 105

NSD and underlying disk subsystem failures


There are indications that will lead you to the conclusion that your file system has disk failures.

Some of those indications include:


v Your file system has been forced to unmount. See “File system forced unmount” on page 71.
v The mmlsmount command indicates that the file system is not mounted on certain nodes.
v Your application is getting EIO errors.
v Operating system error logs indicate you have stopped using a disk in a replicated system, but your
replication continues to operate.
v The mmlsdisk command shows that disks are down.

Note: If you are reinstalling the operating system on one node and erasing all partitions from the system,
GPFS descriptors will be removed from any NSD this node can access locally. The results of this action
might require recreating the file system and restoring from backup. If you experience this problem, do
not unmount the file system on any node that is currently mounting the file system. Contact the IBM
Support Center immediately to see if the problem can be corrected.

Error encountered while creating and using NSD disks


GPFS requires that disk devices be prepared as NSDs. This is done using the mmcrnsd command. The
input to the mmcrnsd command is given in the form of disk stanzas. For a complete explanation of disk
stanzas, see the following GPFS: Administration and Programming Reference topics:
v Stanza files
v mmchdisk command
v mmchnsd command
v mmcrfs command
v mmcrnsd command

For disks that are SAN-attached to all nodes in the cluster, device=DiskName should refer to the disk
device name in /dev on the node where the mmcrnsd command is issued. If a server list is specified,
device=DiskName must refer to the name of the disk on the first server node. The same disk can have
different local names on different nodes.
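
For example, a stanza file (here named nsd.stanza; the device, NSD, and server names are placeholders)
and the corresponding mmcrnsd invocation might look like this:
%nsd: device=/dev/sdb nsd=gpfs10nsd servers=nsdserver1,nsdserver2 usage=dataAndMetadata failureGroup=1

mmcrnsd -F nsd.stanza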

When you specify an NSD server node, that node performs all disk I/O operations on behalf of nodes in
the cluster that do not have connectivity to the disk. You can also specify up to eight additional NSD
server nodes. These additional NSD servers will become active if the first NSD server node fails or is
unavailable.

When the mmcrnsd command encounters an error condition, one of these messages is displayed:
6027-2108
Error found while processing stanza

or
6027-1636
Error found while checking disk descriptor descriptor

Usually, this message is preceded by one or more messages describing the error more specifically.

Another possible error from mmcrnsd is:


6027-2109
Failed while processing disk stanza on node nodeName.

or
6027-1661
Failed while processing disk descriptor descriptor on node nodeName.

One of these errors can occur if an NSD server node does not have read and write access to the disk. The
NSD server node needs to write an NSD volume ID to the raw disk. If an additional NSD server node is
specified, that NSD server node will scan its disks to find this NSD volume ID string. If the disk is
SAN-attached to all nodes in the cluster, the NSD volume ID is written to the disk by the node on which
the mmcrnsd command is running.

Displaying NSD information


Use the mmlsnsd command to display information about the currently defined NSDs in the cluster. For
example, if you issue mmlsnsd, your output may be similar to this:
File system Disk name NSD servers
---------------------------------------------------------------------------
fs1 t65nsd4b (directly attached)
fs5 t65nsd12b c26f4gp01.ppd.pok.ibm.com,c26f4gp02.ppd.pok.ibm.com
fs6 t65nsd13b c26f4gp01.ppd.pok.ibm.com,c26f4gp02.ppd.pok.ibm.com,c26f4gp03.ppd.pok.ibm.com

This output shows that:


v There are three NSDs in this cluster: t65nsd4b, t65nsd12b, and t65nsd13b.
v NSD disk t65nsd4b of filesystem fs1 is SAN-attached to all nodes in the cluster.
v NSD disk t65nsd12b of file system fs5 has 2 NSD server nodes.
v NSD disk t65nsd13b of file system fs6 has 3 NSD server nodes.

If you need to find out the local device names for these disks, you could use the -m option on the
mmlsnsd command. For example, issuing:
mmlsnsd -m

produces output similar to this example:


Disk name NSD volume ID Device Node name Remarks
-----------------------------------------------------------------------------------------
t65nsd12b 0972364D45EF7B78 /dev/hdisk34 c26f4gp01.ppd.pok.ibm.com server node
t65nsd12b 0972364D45EF7B78 /dev/hdisk34 c26f4gp02.ppd.pok.ibm.com server node
t65nsd12b 0972364D45EF7B78 /dev/hdisk34 c26f4gp04.ppd.pok.ibm.com
t65nsd13b 0972364D00000001 /dev/hdisk35 c26f4gp01.ppd.pok.ibm.com server node
t65nsd13b 0972364D00000001 /dev/hdisk35 c26f4gp02.ppd.pok.ibm.com server node
t65nsd13b 0972364D00000001 - c26f4gp03.ppd.pok.ibm.com (not found) server node
t65nsd4b 0972364D45EF7614 /dev/hdisk26 c26f4gp04.ppd.pok.ibm.com

From this output we can tell that:


v The local disk name for t65nsd12b on NSD server c26f4gp01 is hdisk34.
v NSD disk t65nsd13b is not attached to the node on which the mmlsnsd command was issued,
node c26f4gp04.
v The mmlsnsd command was not able to determine the local device for NSD disk t65nsd13b on
server c26f4gp03.

To find the nodes to which disk t65nsd4b is attached and the corresponding local devices for that disk,
issue:
mmlsnsd -d t65nsd4b -M

Output is similar to this example:


Disk name NSD volume ID Device Node name Remarks
-----------------------------------------------------------------------------------------
t65nsd4b 0972364D45EF7614 /dev/hdisk92 c26f4gp01.ppd.pok.ibm.com
t65nsd4b 0972364D45EF7614 /dev/hdisk92 c26f4gp02.ppd.pok.ibm.com
t65nsd4b 0972364D45EF7614 - c26f4gp03.ppd.pok.ibm.com (not found) directly attached
t65nsd4b 0972364D45EF7614 /dev/hdisk26 c26f4gp04.ppd.pok.ibm.com

From this output we can tell that NSD t65nsd4b is:


v Known as hdisk92 on nodes c26f4gp01 and c26f4gp02.
v Known as hdisk26 on node c26f4gp04.
v Not attached to node c26f4gp03.

To display extended information about a node's view of its NSDs, the mmlsnsd -X command can be
used:
mmlsnsd -X -d "hd3n97;sdfnsd;hd5n98"

The system displays information similar to:


Disk name NSD volume ID Device Devtype Node name Remarks
---------------------------------------------------------------------------------------------------
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
sdfnsd 0972845E45F02E81 /dev/sdf generic c5n94g.ppd.pok.ibm.com server node
sdfnsd 0972845E45F02E81 /dev/sdm generic c5n96g.ppd.pok.ibm.com server node

From this output we can tell that:


v Disk hd3n97 is an hdisk known as /dev/hdisk3 on NSD server nodes c5n97g and c5n98g.
v Disk sdfnsd is a generic disk known as /dev/sdf and /dev/sdm on NSD server nodes c5n94g and
c5n96g, respectively.
v In addition to the above information, the NSD volume ID is displayed for each disk.

Note: The -m, -M and -X options of the mmlsnsd command can be very time consuming, especially on
large clusters. Use these options judiciously.

NSD creation fails with a message referring to an existing NSD
NSDs are deleted with the mmdelnsd command. Internally, this is a two-step process:
1. Remove the NSD definitions from the GPFS control information.
2. Zero-out GPFS-specific data structures on the disk.

If for some reason the second step fails, for example because the disk is damaged and cannot be written
to, the mmdelnsd command issues a message describing the error and then another message stating the
exact command to issue to complete the deletion of the NSD. If these instructions are not successfully
completed, a subsequent mmcrnsd command can fail with
6027-1662
Disk device deviceName refers to an existing NSD name.

This error message indicates that the disk is either an existing NSD, or that the disk was previously an
NSD that had been removed from the GPFS cluster using the mmdelnsd -p command, and had not been
marked as available.

If the GPFS data structures are not removed from the disk, it might be unusable for other purposes. For
example, if you are trying to create an AIX volume group on the disk, the mkvg command might fail
with messages similar to:
0516-1339 /usr/sbin/mkvg: Physical volume contains some 3rd party volume group.
0516-1397 /usr/sbin/mkvg: The physical volume hdisk5, will not be added to the volume group.
0516-862 /usr/sbin/mkvg: Unable to create volume group.

The easiest way to recover such a disk is to temporarily define it as an NSD again (using the -v no
option) and then delete the just-created NSD. For example:
mmcrnsd -F filename -v no
mmdelnsd -F filename

GPFS has declared NSDs as down


There are several situations in which disks can appear to fail to GPFS. Almost all of these situations
involve a failure of the underlying disk subsystem. The following information describes how GPFS reacts
to these failures and how to find the cause.

GPFS will stop using a disk that is determined to have failed. This event is marked as MMFS_DISKFAIL
in an error log entry (see “The operating system error log facility” on page 2). The state of a disk can be
checked by issuing the mmlsdisk command.

The consequences of stopping disk usage depend on what is stored on the disk:
v Certain data blocks may be unavailable because the data residing on a stopped disk is not replicated.
v Certain data blocks may be unavailable because the controlling metadata resides on a stopped disk.
v In conjunction with other disks that have failed, all copies of critical data structures may be unavailable
resulting in the unavailability of the entire file system.
The disk will remain unavailable until its status is explicitly changed through the mmchdisk command.
After that command is issued, any replicas that exist on the failed disk are updated before the disk is
used.
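
For example, assuming a file system named fs1 and a repaired disk named gpfs1nsd (names used here
only for illustration), the disk can be returned to service with a command similar to:
mmchdisk fs1 start -d gpfs1nsd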

GPFS can declare disks down for a number of reasons:


v If the first NSD server goes down and additional NSD servers were not assigned, or all of the
additional NSD servers are also down and no local device access is available on the node, the disks are
marked as stopped.
v A failure of an underlying disk subsystem may result in a similar marking of disks as stopped.
1. Issue the mmlsdisk command to verify the status of the disks in the file system.
2. Issue the mmchdisk command with the -a option to start all stopped disks (see the example after this list).
v Disk failures should be accompanied by error log entries (see The operating system error log facility)
for the failing disk. GPFS error log entries labelled MMFS_DISKFAIL will occur on the node detecting
the error. This error log entry will contain the identifier of the failed disk. Follow the problem
determination and repair actions specified in your disk vendor problem determination guide. After
performing problem determination and repair issue the mmchdisk command to bring the disk back
up.
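
As a sketch of the two steps listed above (fs1 is a placeholder file system name), you might issue:
mmlsdisk fs1 -e
mmchdisk fs1 start -a

The first command lists only the disks that are not in a normal state; the second attempts to start all
stopped disks in the file system.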

Unable to access disks


If you cannot open a disk, the specification of the disk may be incorrect. It is also possible that a
configuration failure may have occurred during disk subsystem initialization. For example, on Linux you
should consult /var/log/messages to determine if disk device configuration errors have occurred.
Feb 16 13:11:18 host123 kernel: SCSI device sdu: 35466240 512-byte hdwr sectors (18159 MB)
Feb 16 13:11:18 host123 kernel: sdu: I/O error: dev 41:40, sector 0
Feb 16 13:11:18 host123 kernel: unable to read partition table

On AIX, consult “The operating system error log facility” on page 2 for hardware configuration error log
entries.

Accessible disk devices will generate error log entries similar to this example for a SSA device:
--------------------------------------------------------------------------
LABEL: SSA_DEVICE_ERROR
IDENTIFIER: FE9E9357

Date/Time: Wed Sep 8 10:28:13 edt


Sequence Number: 54638
Machine Id: 000203334C00
Node Id: c154n09
Class: H
Type: PERM
Resource Name: pdisk23
Resource Class: pdisk
Resource Type: scsd
Location: USSA4B33-D3
VPD:
Manufacturer................IBM
Machine Type and Model......DRVC18B
Part Number.................09L1813
ROS Level and ID............0022
Serial Number...............6800D2A6HK
EC Level....................E32032
Device Specific.(Z2)........CUSHA022
Device Specific.(Z3)........09L1813
Device Specific.(Z4)........99168

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ERROR CODE
2310 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------

or this one from GPFS:
---------------------------------------------------------------------------
LABEL: MMFS_DISKFAIL
IDENTIFIER: 9C6C05FA

Date/Time: Tue Aug 3 11:26:34 edt


Sequence Number: 55062
Machine Id: 000196364C00
Node Id: c154n01
Class: H
Type: PERM
Resource Name: mmfs
Resource Class: NONE
Resource Type: NONE
Location:

Description
DISK FAILURE

Probable Causes
STORAGE SUBSYSTEM
DISK

Failure Causes
STORAGE SUBSYSTEM
DISK

Recommended Actions
CHECK POWER
RUN DIAGNOSTICS AGAINST THE FAILING DEVICE

Detail Data
EVENT CODE
1027755
VOLUME
fs3
RETURN CODE
19
PHYSICAL VOLUME
vp31n05
-----------------------------------------------------------------

Guarding against disk failures


There are various ways to guard against the loss of data due to disk media failures. For example, the use
of a RAID controller, which masks disk failures with parity disks, or a twin-tailed disk, could prevent the
need for using these recovery steps.

| GPFS offers a method of protection called replication, which overcomes disk failure at the expense of
| additional disk space. GPFS allows replication of data and metadata. This means that three instances of
| data, metadata, or both can be automatically created and maintained for any file in a GPFS file system. If
| one instance becomes unavailable due to disk failure, another instance is used instead. You can set
| different replication specifications for each file, or apply default settings specified at file system creation.
| Refer to the File system replication parameters topic in the GPFS: Concepts, Planning, and Installation Guide.

Disk media failure


Regardless of whether you have chosen additional hardware or replication to protect your data against
media failures, you first need to determine that the disk has completely failed. If the disk has completely
failed and it is not the path to the disk which has failed, follow the procedures defined by your disk
vendor. Otherwise:
1. Check on the states of the disks for the file system:
mmlsdisk fs1 -e
GPFS will mark disks down if there have been problems accessing the disk.
2. To prevent any I/O from going to the down disk, issue these commands immediately:
mmchdisk fs1 suspend -d gpfs1nsd
mmchdisk fs1 stop -d gpfs1nsd

Note: If there are any GPFS file systems with pending I/O to the down disk, the I/O will timeout if
the system administrator does not stop it.

To see if there are any threads that have been waiting a long time for I/O to complete, on all nodes
issue:
mmfsadm dump waiters 10 | grep "I/O completion"
3. The next step is irreversible! Do not run this command unless data and metadata have been replicated.
This command scans file system metadata for disk addresses belonging to the disk in question, then
replaces them with a special “broken disk address” value, which may take a while.
CAUTION:
Be extremely careful with using the -p option of mmdeldisk, because by design it destroys
references to data blocks, making affected blocks unavailable. This is a last-resort tool, to be used
when data loss may have already occurred, to salvage the remaining data–which means it cannot
take any precautions. If you are not absolutely certain about the state of the file system and the
impact of running this command, do not attempt to run it without first contacting the IBM Support
Center.
mmdeldisk fs1 gpfs1n12 -p
4. Invoke the mmfileid command with the operand :BROKEN:
mmfileid :BROKEN
For more information, see “The mmfileid command” on page 33.
5. After the disk is properly repaired and available for use, you can add it back to the file system.

Replicated metadata and data


If you have replicated metadata and data and only disks in a single failure group have failed, everything
should still be running normally but with slightly degraded performance. You can determine the
replication values set for the file system by issuing the mmlsfs command. Proceed with the appropriate
course of action:
1. After the failed disk has been repaired, issue an mmadddisk command to add the disk to the file
system:
mmadddisk fs1 gpfs12nsd

You can rebalance the file system at the same time by issuing:
mmadddisk fs1 gpfs12nsd -r

Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
2. To re-replicate data that only has single copy, issue:
mmrestripefs fs1 -r

Optionally, use the -b flag instead of the -r flag to rebalance across all disks.

Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
3. Optionally, check the file system for metadata inconsistencies by issuing the offline version of
mmfsck:
mmfsck fs1

If mmfsck succeeds, you may still have errors that occurred. Check to verify no files were lost. If files
containing user data were lost, you will have to restore the files from the backup media.
If mmfsck fails, sufficient metadata was lost and you need to recreate your file system and restore the
data from backup media.

Replicated metadata only


If you have only replicated metadata, you should be able to recover some, but not all, of the user data.
Recover any data to be kept using normal file operations or erase the file. If you read a file in block-size
chunks and get a failure return code and an EIO errno, that block of the file has been lost. The rest of the
file may have useful data to recover, or it can be erased.

Strict replication
If data or metadata replication is enabled, and the status of an existing disk changes so that the disk is no
longer available for block allocation (if strict replication is enforced), you may receive an errno of
ENOSPC when you create or append data to an existing file. A disk becomes unavailable for new block
allocation if it is being deleted, replaced, or it has been suspended. If you need to delete, replace, or
suspend a disk, and you need to write new data while the disk is offline, you can disable strict
replication by issuing the mmchfs -K no command before you perform the disk action. However, data
written while replication is disabled will not be replicated properly. Therefore, after you perform the disk
action, you must re-enable strict replication by issuing the mmchfs -K command with the original value
of the -K option (always or whenpossible) and then run the mmrestripefs -r command. To determine
whether strict replication is enforced for a file system, issue the mmlsfs -K command.
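
A minimal sketch of this sequence, assuming a file system named fs1 that was created with -K
whenpossible, might look like this:
mmlsfs fs1 -K
mmchfs fs1 -K no
(delete, replace, or suspend the disk, and perform any required writes)
mmchfs fs1 -K whenpossible
mmrestripefs fs1 -r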

Note: A disk in a down state that has not been explicitly suspended is still available for block allocation,
and thus a spontaneous disk failure will not result in application I/O requests failing with ENOSPC.
While new blocks will be allocated on such a disk, nothing will actually be written to the disk until its
availability changes to up following an mmchdisk start command. Missing replica updates that took
place while the disk was down will be performed when mmchdisk start runs.

No replication
When there is no replication, the system metadata has been lost and the file system is basically
irrecoverable. You may be able to salvage some of the user data, but it will take work and time. A forced
unmount of the file system will probably already have occurred. If not, it probably will very soon if you
try to do any recovery work. You can manually force the unmount yourself:
1. Mount the file system in read-only mode (see “Read-only mode mount” on page 23). This will bypass
recovery errors and let you read whatever you can find. Directories may be lost and give errors, and
parts of files will be missing. Get what you can now, for all will soon be gone. On a single node,
issue:
mount -o ro /dev/fs1
2. If you read a file in block-size chunks and get an EIO return code, that block of the file has been lost.
The rest of the file may have useful data to recover or it can be erased. To save the file system
parameters for recreation of the file system, issue:
mmlsfs fs1 > fs1.saveparms

Note: This next step is irreversible!


To delete the file system, issue:
mmdelfs fs1
3. To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
4. Delete the affected NSDs. Issue:
mmdelnsd nsdname

The system displays output similar to this:

mmdelnsd: Processing disk nsdname
mmdelnsd: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
5. Create a disk descriptor file for the disks to be used. This will include recreating NSDs for the new
file system.
6. Recreate the file system with either different parameters or the same as you used before. Use the disk
descriptor file.
7. Restore lost data from backups.

GPFS error messages for disk media failures


Disk media failures can be associated with these GPFS message numbers:
6027-418
Inconsistent file system quorum. readQuorum=value writeQuorum=value quorumSize=value
| 6027-482 [E]
| Remount failed for device name: errnoDescription
6027-485
Perform mmchdisk for any disk failures and re-mount.
| 6027-636 [E]
| Disk marked as stopped or offline.

Error numbers specific to GPFS application calls when disk failure occurs
When a disk failure has occurred, GPFS may report these error numbers in the operating system error
log, or return them to an application:
EOFFLINE = 208, Operation failed because a disk is offline
This error is most commonly returned when an attempt to open a disk fails. Since GPFS will
attempt to continue operation with failed disks, this will be returned when the disk is first
needed to complete a command or application request. If this return code occurs, check your disk
for stopped states, and check to determine if the network path exists.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This error usually occurs when a large number of disks are unavailable or when there has been a
major network failure. Run the mmlsdisk command to determine whether disks have failed. If
disks have failed, check the operating system error log on all nodes for indications of errors. Take
corrective action by issuing the mmchdisk command.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.

Disk connectivity failure and recovery


If a disk is defined to have a local connection and to be connected to defined NSD servers, and the local
connection fails, GPFS bypasses the broken local connection and uses the NSD servers to maintain disk
access. The following error message appears in the GPFS log:
| 6027-361 [E]
Local access to disk failed with EIO, switching to access the disk remotely.

This is the default behavior, and can be changed with the useNSDserver file system mount option. See
the NSD server considerations topic in the GPFS: Concepts, Planning, and Installation Guide.

For a file system using the default mount option useNSDserver=asneeded, disk access fails over from
local access to remote NSD access. Once local access is restored, GPFS detects this fact and switches back
to local access. The detection and switch over are not instantaneous, but occur at approximately five
minute intervals.

Note: In general, after fixing the path to a disk, you must run the mmnsddiscover command on the
server that lost the path to the NSD. (Until the mmnsddiscover command is run, the reconnected node
will see its local disks and start using them by itself, but it will not act as the NSD server.)

After that, you must run the command on all client nodes that need to access the NSD on that server; or
you can achieve the same effect with a single mmnsddiscover invocation if you utilize the -N option to
specify a node list that contains all the NSD servers and clients that need to rediscover paths.
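
For example, to have every node in the cluster rediscover its paths to all NSDs (a broad but simple
approach), a command similar to the following might be used:
mmnsddiscover -a -N all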

Partial disk failure


If the disk has only partially failed and you have chosen not to implement hardware protection against
media failures, the steps to restore your data depends on whether you have used replication. If you have
replicated neither your data nor metadata, you will need to issue the offline version of the mmfsck
command, and then restore the lost information from the backup media. If it is just the data which was
not replicated, you will need to restore the data from the backup media. There is no need to run the
mmfsck command if the metadata is intact.

If both your data and metadata have been replicated, implement these recovery actions:
1. Unmount the file system:
mmumount fs1 -a
2. Delete the disk from the file system:
mmdeldisk fs1 gpfs10nsd -c
3. If you are replacing the disk, add the new disk to the file system:
mmadddisk fs1 gpfs11nsd
4. Then restripe the file system:
mmrestripefs fs1 -b

Note: Ensure there is sufficient space elsewhere in your file system for the data to be stored by using
the mmdf command.

GPFS has declared NSDs built on top of AIX logical volumes as down
Earlier releases of GPFS allowed AIX logical volumes to be used in GPFS file systems. Using AIX logical
volumes in GPFS file systems is now discouraged as they are limited with regard to their clustering
ability and cross platform support.

Existing file systems using AIX logical volumes are however still supported, and this information might
be of use when working with those configurations.

Verify logical volumes are properly defined for GPFS use


To verify your logical volume configuration, you must first determine the mapping between the GPFS
NSD and the underlying disk device. Issue the command:
mmlsnsd -m

which will display any underlying physical device present on this node which is backing the NSD. If the
underlying device is a logical volume, perform a mapping from the logical volume to the volume group.

Issue the commands:


lsvg -o | lsvg -i -l

The output will be a list of logical volumes and corresponding volume groups. Now issue the lsvg
command for the volume group containing the logical volume. For example:
lsvg gpfs1vg

The system displays information similar to:


VOLUME GROUP: gpfs1vg VG IDENTIFIER: 000195600004c00000000ee60c66352
VG STATE: active PP SIZE: 16 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 542 (8672 megabytes)
MAX LVs: 256 FREE PPs: 0 (0 megabytes)
LVs: 1 USED PPs: 542 (8672 megabytes)
OPEN LVs: 1 QUORUM: 2
TOTAL PVs: 1 VG DESCRIPTORS: 2
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 1 AUTO ON: no
MAX PPs per PV: 1016 MAX PVs: 32
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no

Check the volume group on each node


Make sure that all disks are properly defined to all nodes in the GPFS cluster:
1. Issue the AIX lspv command on all nodes in the GPFS cluster and save the output.
2. Compare the pvid and volume group fields for all GPFS volume groups.
Each volume group must have the same pvid and volume group name on each node. The hdisk
name for these disks may vary.

For example, to verify the volume group gpfs1vg on the five nodes in the GPFS cluster, for each node in
the cluster issue:
lspv | grep gpfs1vg

The system displays information similar to:


k145n01: hdisk3 00001351566acb07 gpfs1vg active
k145n02: hdisk3 00001351566acb07 gpfs1vg active
k145n03: hdisk5 00001351566acb07 gpfs1vg active
k145n04: hdisk5 00001351566acb07 gpfs1vg active
k145n05: hdisk7 00001351566acb07 gpfs1vg active

Here the output shows that on each of the five nodes the volume group gpfs1vg resides on the same
physical disk (the pvid is identical). The hdisk numbers vary, but GPFS accounts for the fact that the
same disk may have different hdisk names on different nodes. This is an example of a properly
defined volume group.

If any of the pvids were different for the same volume group, this would indicate that the same volume
group name has been used when creating volume groups on different physical volumes. This will not
work for GPFS. A volume group name can be used only for the same physical volume shared among
nodes in a cluster. For more information, refer to the IBM pSeries and AIX Information Center
(http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp) and search for operating system and
device management.

Volume group varyon problems


If an NSD backed by an underlying logical volume will not come online to a node, it may be due to
varyonvg problems at the volume group layer. Issue the varyoffvg command for the volume group at all
nodes and restart GPFS. On startup, GPFS will varyon any underlying volume groups in proper
sequence.
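
A minimal sketch, reusing the gpfs1vg volume group name from the earlier examples (if GPFS is still
running on the affected nodes, shut it down first with mmshutdown):
varyoffvg gpfs1vg
mmstartup -a

Issue the varyoffvg command on each node where the volume group is varied on; GPFS then varies the
volume groups back on in the proper sequence when it starts.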

Disk accessing commands fail to complete due to problems with some
non-IBM disks
Certain disk commands, such as mmcrfs, mmadddisk, mmrpldisk, mmmount, and the operating system's
mount command, might issue the varyonvg -u command if the NSD is backed by an AIX logical volume.

For some non-IBM disks, when many varyonvg -u commands are issued in parallel, some of the AIX
varyonvg -u invocations do not complete, causing the disk command to hang.

This situation is recognized by the GPFS disk command not completing after a long period of time, and
the persistence of the varyonvg processes as shown by the output of the ps -ef command on some of the
nodes of the cluster. In these cases, kill the varyonvg processes that were issued by the GPFS disk
command on the nodes of the cluster. This allows the GPFS disk command to complete. Before mounting
the affected file system on any node where a varyonvg process was killed, issue the varyonvg -u
command (varyonvg -u vgname) on the node to make the disk available to GPFS. Do this on each of the
nodes in question, one by one, until all of the GPFS volume groups are varied online.

Persistent Reserve errors


You can use Persistent Reserve (PR) to provide faster failover times between disks that support this
feature. PR allows the stripe group manager to "fence" disks during node failover by removing the
reservation keys for that node. In contrast, non-PR disk failovers cause the system to wait until the disk
lease expires.

GPFS allows file systems to have a mix of PR and non-PR disks. In this configuration, GPFS will fence PR
disks for node failures and recovery and non-PR disk will use disk leasing. If all of the disks are PR
disks, disk leasing is not used, so recovery times improve.

GPFS uses the mmchconfig command to enable PR. Issuing this command with the appropriate
usePersistentReserve option configures disks automatically. If this command fails, the most likely cause
is either a hardware or device driver problem. Other PR-related errors will probably be seen as file
system unmounts that are related to disk reservation problems. This type of problem should be debugged
with existing trace tools.

Understanding Persistent Reserve


Note: While Persistent Reserve (PR) is supported on both AIX and Linux, reserve_policy is applicable only
to AIX.

Persistent Reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and
command options. These PR commands and command options give SCSI initiators the ability to establish,
preempt, query, and reset a reservation policy with a specified target disk. The functions provided by PR
commands are a superset of current reserve and release mechanisms. These functions are not compatible
with legacy reserve and release mechanisms. Target disks can only support reservations from either the
legacy mechanisms or the current mechanisms.

Note: Attempting to mix Persistent Reserve commands with legacy reserve and release commands will
result in the target disk returning a reservation conflict error.

Persistent Reserve establishes an interface through a reserve_policy attribute for SCSI disks. You can
optionally use this attribute to specify the type of reservation that the device driver will establish before
accessing data on the disk. For devices that do not support the reserve_policy attribute, the drivers will use
the value of the reserve_lock attribute to determine the type of reservation to use for the disk. GPFS
supports four values for the reserve_policy attribute:

no_reserve:
Specifies that no reservations are used on the disk.
single_path:
Specifies that legacy reserve/release commands are used on the disk.
PR_exclusive:
Specifies that Persistent Reserve is used to establish exclusive host access to the disk.
PR_shared:
Specifies that Persistent Reserve is used to establish shared host access to the disk.

Persistent Reserve support affects both the parallel (scdisk) and SCSI-3 (scsidisk) disk device drivers and
configuration methods. When a device is opened (for example, when the varyonvg command opens the
underlying hdisks), the device driver checks the ODM for reserve_policy and PR_key_value and then opens
the device appropriately. For PR, each host attached to the shared disk must use unique registration key
values for reserve_policy and PR_key_value. On AIX, you can display the values assigned to reserve_policy
and PR_key_value by issuing:
lsattr -El hdiskx -a reserve_policy,PR_key_value

If needed, use the AIX chdev command to set reserve_policy and PR_key_value.
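
For example, to display and then change the attributes of a hypothetical disk hdisk5 outside of GPFS
control (normally GPFS sets these values itself when usePersistentReserve is changed):
lsattr -El hdisk5 -a reserve_policy,PR_key_value
chdev -l hdisk5 -a reserve_policy=no_reserve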

Note: GPFS manages reserve_policy and PR_key_value using reserve_policy=PR_shared when Persistent
Reserve support is enabled and reserve_policy=no_reserve when Persistent Reserve is disabled.

Checking Persistent Reserve


For Persistent Reserve to function properly, you must have PR enabled on all of the disks that are
PR-capable. To determine the PR status in the cluster:
1. Determine if PR is enabled on the cluster
a. Issue mmlsconfig
b. Check for usePersistentReserve=yes
2. Determine if PR is enabled for all disks on all nodes
a. Make sure that GPFS has been started and mounted on all of the nodes
b. Enable PR by issuing mmchconfig
c. Issue the command mmlsnsd -X and look for pr=yes on all the hdisk lines
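
For example, the checks above might be performed with commands similar to the following (output
will vary by cluster):
mmlsconfig usePersistentReserve
mmlsnsd -X | grep hdisk

Each hdisk line in the mmlsnsd -X output should show pr=yes in the Remarks column when Persistent
Reserve is active. PR is enabled with a command of the form mmchconfig usePersistentReserve=yes;
consult the GPFS: Administration and Programming Reference for the full procedure before changing this
setting.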

Notes:
1. To view the keys that are currently registered on a disk, issue the following command from a node
that has access to the disk:
/usr/lpp/mmfs/bin/tsprreadkeys hdiskx
2. To check the AIX ODM status of a single disk on a node, issue the following command from a node
that has access to the disk:
lsattr -El hdiskx -a reserve_policy,PR_key_value

Clearing a leftover Persistent Reserve reservation


Message number 6027-2202 indicates that a specified disk has a SCSI-3 PR reservation, which prevents
the mmcrnsd command from formatting it. The example below is specific to a Linux environment.
Output on AIX is similar but not identical.

Before trying to clear the PR reservation, use the following instructions to verify that the disk is really
intended for GPFS use. Note that in this example, the device name is specified without a prefix (/dev/sdp
is specified as sdp).
1. Display all the registration key values on the disk:
/usr/lpp/mmfs/bin/tsprreadkeys sdp

The system displays information similar to:


Registration keys for sdp
1. 00006d0000000001

If the registered key values all start with 0x00006d, which indicates that the PR registration was issued
by GPFS, proceed to the next step to verify the SCSI-3 PR reservation type. Otherwise, contact your
system administrator for information about clearing the disk state.
2. Display the reservation type on the disk:
/usr/lpp/mmfs/bin/tsprreadres sdp

The system displays information similar to:


yes:LU_SCOPE:WriteExclusive-AllRegistrants:0000000000000000

If the output indicates a PR reservation with type WriteExclusive-AllRegistrants, proceed to the
following instructions for clearing the SCSI-3 PR reservation on the disk.

If the output does not indicate a PR reservation with this type, contact your system administrator for
information about clearing the disk state.

To clear the SCSI-3 PR reservation on the disk, follow these steps:


1. Choose a hex value (HexValue); for example, 0x111abc that is not in the output of the tsprreadkeys
command run previously. Register the local node to the disk by entering the following command with
the chosen HexValue:
/usr/lpp/mmfs/bin/tsprregister sdp 0x111abc
2. Verify that the specified HexValue has been registered to the disk:
/usr/lpp/mmfs/bin/tsprreadkeys sdp

The system displays information similar to:


Registration keys for sdp
1. 00006d0000000001
2. 0000000000111abc
3. Clear the SCSI-3 PR reservation on the disk:
/usr/lpp/mmfs/bin/tsprclear sdp 0x111abc
4. Verify that the PR registration has been cleared:
/usr/lpp/mmfs/bin/tsprreadkeys sdp

The system displays information similar to:


Registration keys for sdp
5. Verify that the reservation has been cleared:
/usr/lpp/mmfs/bin/tsprreadres sdp

The system displays information similar to:


no:::
The disk is now ready to use for creating an NSD.

Manually enabling or disabling Persistent Reserve


Attention: Manually enabling or disabling Persistent Reserve should only be done under the
supervision of the IBM Support Center with GPFS stopped on the node.

The IBM Support Center will help you determine if the PR state is incorrect for a disk. If the PR state is
incorrect, you may be directed to correct the situation by manually enabling or disabling PR on that disk.

GPFS is not using the underlying multipath device


You can view the underlying disk device where I/O is performed on an NSD disk by using the
mmlsdisk command with the -M option.

The mmlsdisk command output might show unexpected results for multipath I/O devices. For example
if you issue this command:
mmlsdisk dmfs2 -M

The system displays information similar to:


Disk name IO performed on node Device Availability
------------ ----------------------- ----------------- ------------
m0001 localhost /dev/sdb up

The following command is available on Linux only.


# multipath -ll
mpathae (36005076304ffc0e50000000000000001) dm-30 IBM,2107900
[size=10G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=8][active]
\_ 1:0:5:1 sdhr 134:16 [active][ready]
\_ 1:0:4:1 sdgl 132:16 [active][ready]
\_ 1:0:1:1 sdff 130:16 [active][ready]
\_ 1:0:0:1 sddz 128:16 [active][ready]
\_ 0:0:7:1 sdct 70:16 [active][ready]
\_ 0:0:6:1 sdbn 68:16 [active][ready]
\_ 0:0:5:1 sdah 66:16 [active][ready]
\_ 0:0:4:1 sdb 8:16 [active][ready]

The mmlsdisk output shows that I/O for NSD m0001 is being performed on disk /dev/sdb, but it should
show that I/O is being performed on the device-mapper multipath (DMM) /dev/dm-30. Disk /dev/sdb is
one of eight paths of the DMM /dev/dm-30 as shown from the multipath command.

This problem could occur for the following reasons:


v The previously installed user exit /var/mmfs/etc/nsddevices is missing. To correct this, restore user exit
/var/mmfs/etc/nsddevices and restart GPFS.
v The multipath device type does not match the GPFS known device type. For a list of known device
types, see /usr/lpp/mmfs/bin/mmdevdiscover. After you have determined the device type for your
multipath device, use the mmchconfig command to change the NSD disk to a known device type and
then restart GPFS.

The output below shows that device type dm-30 is dmm:


/usr/lpp/mmfs/bin/mmdevdiscover | grep dm-30
dm-30 dmm

To change the NSD device type to a known device type, create a file that contains the NSD name and
device type pair (one per line) and issue this command:
mmchconfig updateNsdType=/tmp/filename

where the contents of /tmp/filename are:


m0001 dmm

The system displays information similar to:

mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

| Chapter 9. GPFS encryption problems


| The topics that follow provide solutions for problems that may be encountered while setting up or using
| encryption.
|
| Unable to add encryption policies (failure of mmchpolicy)
| If mmchpolicy fails when you are trying to add encryption policies, follow these diagnostic steps.
| 1. Confirm that the gpfs.crypto and gpfs.gskit packages are installed.
| 2. Confirm that the file system is at GPFS 4.1 level and that the fast external attributes (--fastea) option
| is enabled.
| 3. Examine error messages in /var/adm/ras/mmfs.log.latest.
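
For example, on a Linux node the first two checks might look similar to the following (fs1 is a
placeholder file system name, and the --fastea query assumes a level of mmlsfs that supports it):
rpm -qa | grep gpfs.crypto
rpm -qa | grep gpfs.gskit
mmlsfs fs1 -V --fastea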
|
| “Permission denied” failure when creating, opening, reading, or
| writing to a file
| If you experience a “Permission denied” failure when creating, opening, reading, or writing a file, follow
| these diagnostic steps.
| 1. Confirm that the key server is operational and correctly set up and can be accessed via the network.
| 2. Confirm that the RKM.conf file is present on all nodes from which the given file is to be accessed. The
| RKM.conf file must contain entries for all the RKMs needed to access the given file.
| 3. Verify that the master keys needed by the file and the keys specified in the encryption policies are
| present on the key server.
| 4. Examine the error messages in the /var/adm/ras/mmfs.log.latest file.
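
As one way to confirm step 2 across the cluster (using mmdsh, as elsewhere in this guide), a command
similar to the following might be used:
mmdsh -N all "ls -l /var/mmfs/etc/RKM.conf"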
|
| “Value too large” failure when creating a file
| If you experience a “Value too large to be stored in data type” failure when creating a file, follow these
| diagnostic steps.
| 1. Examine error messages in /var/adm/ras/mmfs.log.latest to confirm that the problem is related to
| the extended attributes being too large for the inode. The size of the encryption extended attribute is a
| function of the number of keys used to encrypt a file. If you encounter this issue, update the
| encryption policy to reduce the number of keys needed to access any given file.
| 2. If the previous step does not solve the problem, create a new file system with a larger inode size.
|
| Mount failure for a file system with encryption rules
| If you experience a mount failure for a file system with encryption rules, follow these diagnostic steps.
| 1. Confirm that the gpfs.crypto and gpfs.gskit packages are installed.
| 2. Confirm that the /var/mmfs/etc/RKM.conf file is present on the node and that the content in RKM.conf
| is correct.
| 3. Examine the error messages in /var/adm/ras/mmfs.log.latest.
|
| “Permission denied” failure of key rewrap
| If you experience a “Permission denied” failure of a key rewrap, follow these diagnostic steps.

| When mmapplypolicy is invoked to perform a key rewrap, the command may issue messages like the
| following:

| [E] Error on gpfs_enc_file_rewrap_key(/fs1m/sls/test4,KEY-d7bd45d8-9d8d-4b85-a803-e9b794ec0af2:hs21n56_new,KEY-40a0b68b-c86d-4519-9e48-3714d3b71e20:js21n92)
| Permission denied(13)

| If you receive a message similar to this, follow these steps:


| 1. Check for syntax errors in the migration policy syntax.
| 2. Ensure that the new key is not already being used for the file.
| 3. Ensure that both the original and the new keys are retrievable.
| 4. Examine the error messages in /var/adm/ras/mmfs.log.latest for additional details.

Chapter 10. Other problem determination hints and tips
These hints and tips might be helpful when investigating problems related to logical volumes, quorum
nodes, or system performance that can be encountered while using GPFS.

See these topics for more information:


v “Which physical disk is associated with a logical volume?”
v “Which nodes in my cluster are quorum nodes?”
v “What is stored in the /tmp/mmfs directory and why does it sometimes disappear?” on page 110
v “Why does my system load increase significantly during the night?” on page 110
v “What do I do if I receive message 6027-648?” on page 110
v “Why can't I see my newly mounted Windows file system?” on page 111
v “Why is the file system mounted on the wrong drive letter?” on page 111
v “Why does the offline mmfsck command fail with "Error creating internal storage"?” on page 111
v “Questions related to active file management” on page 111
v “Questions related to File Placement Optimizer (FPO)” on page 113

Which physical disk is associated with a logical volume?


Earlier releases of GPFS allowed AIX logical volumes to be used in GPFS file systems. Their use is now
discouraged because they are limited with regard to their clustering ability and cross platform support.

Existing file systems using AIX logical volumes are, however, still supported. This information might be
of use when working with those configurations.

If an error report contains a reference to a logical volume pertaining to GPFS, you can use the lslv -l
command to list the physical volume name. For example, if you want to find the physical disk associated
with logical volume gpfs44lv, issue:
lslv -l gpfs44lv

Output is similar to this, with the physical volume name in column one.
gpfs44lv:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk8 537:000:000 100% 108:107:107:107:108

Which nodes in my cluster are quorum nodes?


Use the mmlscluster command to determine which nodes in your cluster are quorum nodes.

Output is similar to this:


GPFS cluster information
========================
GPFS cluster name: cluster1.kgn.ibm.com
GPFS cluster id: 680681562214606028
GPFS UID domain: cluster1.kgn.ibm.com
Remote shell command: /usr/bin/rsh
Remote file copy command: /usr/bin/rcp
| Repository type: server-based

GPFS cluster configuration servers:


-----------------------------------
Primary server: k164n06.kgn.ibm.com
Secondary server: k164n05.kgn.ibm.com

Node Daemon node name IP address Admin node name Designation


--------------------------------------------------------------------------------
1 k164n04.kgn.ibm.com 198.117.68.68 k164n04.kgn.ibm.com quorum
2 k164n05.kgn.ibm.com 198.117.68.71 k164n05.kgn.ibm.com quorum
3 k164n06.kgn.ibm.com 198.117.68.70 k164n06.kgn.ibm.com

In this example, k164n04 and k164n05 are quorum nodes and k164n06 is a nonquorum node.

To change the quorum status of a node, use the mmchnode command. To change one quorum node to
nonquorum, GPFS does not have to be stopped. If you are changing more than one node at the same
time, GPFS needs to be down on all the affected nodes. GPFS does not have to be stopped when
changing nonquorum nodes to quorum nodes, nor does it need to be stopped on nodes that are not
affected.

For example, to make k164n05 a nonquorum node, and k164n06 a quorum node, issue these commands:
mmchnode --nonquorum -N k164n05
mmchnode --quorum -N k164n06

To set a node's quorum designation at the time that it is added to the cluster, see the mmcrcluster or
mmaddnode commands.

What is stored in the /tmp/mmfs directory and why does it sometimes disappear?

When GPFS encounters an internal problem, certain state information is saved in the GPFS dump
directory for later analysis by the IBM Support Center.

The default dump directory for GPFS is /tmp/mmfs. This directory might disappear on Linux if cron is
set to run the /etc/cron.daily/tmpwatch script. The tmpwatch script removes files and directories in /tmp
that have not been accessed recently. Administrators who want to use a different directory for GPFS
dumps can change the directory by issuing this command:
mmchconfig dataStructureDump=/name_of_some_other_big_file_system

Why does my system load increase significantly during the night?


On some Linux distributions, cron runs the /etc/cron.daily/slocate.cron job every night. This will try to
index all the files in GPFS. This will put a very large load on the GPFS token manager.

You can exclude all GPFS file systems by adding gpfs to the excludeFileSystemType list in this script, or
exclude specific GPFS file systems by adding them to the excludeFileSystem list.
/usr/bin/updatedb -f "excludeFileSystemType" -e "excludeFileSystem"

If indexing GPFS file systems is desired, only one node should run the updatedb command and build the
database in a GPFS file system. If the database is built within a GPFS file system it will be visible on all
nodes after one node finishes building it.

What do I do if I receive message 6027-648?


The mmedquota or mmdefedquota commands can fail with message 6027-648: EDITOR environment
variable must be full path name. This message occurs when the value of the EDITOR environment
variable is not an absolute path name.

To resolve this error, do the following:


1. Change the value of the EDITOR environment variable to an absolute path name.
2. Check to see if the EDITOR variable is set in the $HOME/.kshrc file. If it is set, check to see if it is an
absolute path name because the mmedquota or mmdefedquota command could retrieve the EDITOR
environment variable from that file.
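
For example, in a ksh or bash session the variable can be set to an absolute path (the editor shown is
only an example) before rerunning the command:
export EDITOR=/usr/bin/vi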

Why can't I see my newly mounted Windows file system?


On Windows, a newly mounted file system might not be visible to you if you are currently logged on to
a system. This can happen if you have mapped a network share to the same drive letter as GPFS.

Once you start a new session (by logging out and logging back in), the use of the GPFS drive letter will
supersede any of your settings for the same drive letter. This is standard behavior for all local file
systems on Windows.

Why is the file system mounted on the wrong drive letter?


Before mounting a GPFS file system, you must be certain that the drive letter required for GPFS is freely
available and is not being used by a local disk or a network-mounted file system on all computation
nodes where the GPFS file system will be mounted.

Why does the offline mmfsck command fail with "Error creating
internal storage"?
The mmfsck command requires some temporary space on the file system manager for storing internal
data during a file system scan. The internal data will be placed in the directory specified by the mmfsck
-t command line parameter (/tmp by default). The amount of temporary space that is needed is
proportional to the number of inodes (used and unused) in the file system that is being scanned. If GPFS
is unable to create a temporary file of the required size, the mmfsck command will fail with error
message:
Error creating internal storage

This failure could be caused by:


v The lack of sufficient disk space in the temporary directory on the file system manager
v The lack of sufficient pagepool on the file system manager as shown in mmlsconfig pagepool output
v Insufficiently high filesize limit set for the root user by the operating system
v The lack of support for large files in the file system that is being used for temporary storage. Some file
systems limit the maximum file size because of architectural constraints. For example, JFS on AIX does
not support files larger than 2 GB, unless the Large file support option has been specified when the
file system was created. Check local operating system documentation for maximum file size limitations.
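
If lack of space in /tmp is the cause, the temporary files can be redirected to a larger file system with
the -t option; for example (the directory name is only illustrative):
mmfsck fs1 -n -t /bigspace/mmfsck.tmp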

Questions related to active file management


The following questions are related to active file management.

How can resync be used in active file management?

AFM resync is used by the administrator under special circumstances such as home corruption. The
administrator can choose to update home with the contents from the cache by using the mmafmctl
resync command. Resync works for single-writer cache only.
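
For example (the device and fileset names are placeholders; verify the exact syntax with the mmafmctl
man page for your release):
mmafmctl fs1 resync -j sw_fileset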

How can I change the mode of a fileset?

The mode of an AFM client cache fileset cannot be changed from local-update mode to any other mode;
however, it can be changed from read-only to single-writer (and vice versa), and from either read-only or
single-writer to local-update.

To change the mode, do the following:


1. Ensure that fileset status is active and that the gateway is available.
2. Umount the file system.
3. Unlink the fileset.
4. Run the mmchfileset command to change the mode.
5. Link the fileset again.

What error can result from operating in disconnected mode?

Accessing the contents in an AFM cache for an uncached object while in disconnected mode results in an
input/output error.

Why are setuid/setgid bits in a single-writer cache reset at home after data is
appended?

The setuid/setgid bits in a single-writer cache are reset at home after data is appended to files on which
those bits were previously set and synced. This is because over NFS, a write operation to a setuid file
resets the setuid bit.

How can I traverse a directory that has not been cached?

On a fileset whose metadata in all subdirectories is not cached, any application that optimizes by
assuming that directories contain two fewer subdirectories than their hard link count will not traverse the
last subdirectory. One such example is find; on Linux, a workaround for this is to use find -noleaf to
correctly traverse a directory that has not been cached.

What extended attribute size is supported?

For an operating system in the gateway whose Linux kernel version is below 2.6.32, the NFS max rsize is
32K, so AFM would not support an extended attribute size of more than 32K on that gateway.

What should I do when my file system or fileset is getting full?

The .ptrash directory is present in cache and home. In some cases, where there is a conflict that AFM
cannot resolve automatically, the file is moved to .ptrash at cache or home. In cache the .ptrash gets
cleaned up when eviction is triggered. At home, it is not cleared automatically. When the administrator is
looking to clear some space, the .ptrash should be cleaned up first.

How can I bring a dropped fileset back to active?

Fix the problem and access the fileset to make it active again. If the fileset does not automatically become
active, and remains dropped for a long time (more than five minutes), then do one of the following:
1. Unmount the file system and then remount it.
2. Unlink the dropped fileset and then link it again.
3. Restart GPFS on the gateway node.

Questions related to File Placement Optimizer (FPO)
The following questions are related to File Placement Optimizer (FPO).

Why is my data not read from the network locally when I have an FPO pool
(write-affinity enabled storage pool) created?

When you create a storage pool that is to contain files that make use of FPO features, you must specify
allowWriteAffinity=yes in the storage pool stanza.

To enable the policy to read replicas from local disks, you must also issue the following command:
mmchconfig readReplicaPolicy=local
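
For illustration only (the pool name and values are placeholders), a storage pool stanza with write
affinity enabled might look similar to:
%pool:
  pool=datapool
  blockSize=1M
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=1
  blockGroupFactor=128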

How can I change a failure group for a disk in an FPO environment?

To change the failure group in a write-affinity–enabled storage pool, you must use the mmdeldisk and
mmadddisk commands; you cannot use mmchdisk to change it directly.

Why does Hadoop receive a fixed value for the block group factor instead of the
GPFS default value?

When a customer does not define the dfs.block.size property in the configuration file, the GPFS
connector will use a fixed block size to initialize Hadoop. The reason for this is that Hadoop has only one
block size per file system, whereas GPFS allows different chunk sizes (block-group-factor × data block
size) for different data pools because block size is a per-pool property. To avoid a mismatch when using
Hadoop with FPO, define dfs.block.size and dfs.replication in the configuration file.

How can I retain the original data placement when I restore data from a TSM
server?

When data in an FPO pool is backed up in a TSM server and then restored, the original placement map
will be broken unless you set the write affinity failure group for each file before backup.

How is an FPO pool file placed at AFM home and cache?

For AFM home or cache, an FPO pool file written on the local side will be placed according to the write
affinity depth and write affinity failure group definitions of the local side. When a file is synced from
home to cache, it follows the same FPO placement rule as when written from the gateway node in the
cache cluster. When a file is synced from cache to home, it follows the same FPO data placement rule as
when written from the NFS server in the home cluster.

To retain the same file placement at both home and cache, ensure that each has the same cluster
configuration, and set the write affinity failure group for each file.

Chapter 11. Contacting IBM
Specific information about a problem such as: symptoms, traces, error logs, GPFS logs, and file system
status is vital to IBM in order to resolve a GPFS problem.

Obtain this information as quickly as you can after a problem is detected, so that error logs do not wrap
and system parameters, which are always changing, are captured as close to the point of failure as
possible. When a serious problem is detected, collect this information and then call IBM. For more
information, see:
v “Information to collect before contacting the IBM Support Center”
v “How to contact the IBM Support Center” on page 117.

Information to collect before contacting the IBM Support Center


For effective communication with the IBM Support Center to help with problem diagnosis, you need to
collect certain information.

Information to collect for all problems related to GPFS

Regardless of the problem encountered with GPFS, the following data should be available when you
contact the IBM Support Center:
1. A description of the problem.
2. Output of the failing application, command, and so forth.
3. A tar file generated by the gpfs.snap command that contains data from the nodes in the cluster. In
large clusters, the gpfs.snap command can collect data from certain nodes (for example, the affected
nodes, NSD servers, or manager nodes) using the -N option.
For more information about gathering data with gpfs.snap, see “The gpfs.snap command” on page 6.
If the gpfs.snap command cannot be run, collect these items:
a. Any error log entries relating to the event:
v On an AIX node, issue this command:
errpt -a
v On a Linux node, create a tar file of all the entries in the /var/log/messages file from all nodes in
the cluster or the nodes that experienced the failure. For example, issue the following command
to create a tar file that includes all nodes in the cluster:
mmdsh -v -N all "cat /var/log/messages" > all.messages
v On a Windows node, use the Export List... dialog in the Event Viewer to save the event log to a
file.
b. A master GPFS log file that is merged and chronologically sorted for the date of the failure (see
“Creating a master GPFS log file” on page 2).
c. If the cluster was configured to store dumps, collect any internal GPFS dumps written to that
directory relating to the time of the failure. The default directory is /tmp/mmfs.
d. On a failing Linux node, gather the installed software packages and the versions of each package
by issuing this command:
rpm -qa
e. On a failing AIX node, gather the name, most recent level, state, and description of all installed
software packages by issuing this command:
lslpp -l
f. File system attributes for all of the failing file systems, issue:
mmlsfs Device
g. The current configuration and state of the disks for all of the failing file systems, issue:
mmlsdisk Device
h. A copy of file /var/mmfs/gen/mmsdrfs from the primary cluster configuration server.
4. If you are experiencing one of the following problems, see the appropriate section before contacting
the IBM Support Center:
v For delay and deadlock issues, see “Additional information to collect for delays and deadlocks.”
v For file system corruption or MMFS_FSSTRUCT errors, see “Additional information to collect for
file system corruption or MMFS_FSSTRUCT errors.”
v For GPFS daemon crashes, see “Additional information to collect for GPFS daemon crashes.”

Additional information to collect for delays and deadlocks

When a delay or deadlock situation is suspected, the IBM Support Center will need additional
information to assist with problem diagnosis. If you have not done so already, ensure you have the
following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
| 2. The deadlock debug data collected automatically.
3. If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000),
issue the following command:
gpfs.snap --deadlock
If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the
following command:
gpfs.snap --deadlock --quick
For more information about the --deadlock and --quick options, see “The gpfs.snap command” on
page 6.
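
A wrapper along the following lines can choose the form automatically. This is a sketch only: the parsing of
the mmlsconfig output is an assumption and may need adjusting for your environment.

# Illustrative helper: pick the gpfs.snap form based on the maxFilesToCache setting.
MFTC=$(mmlsconfig maxFilesToCache | awk '{print $2}')
if [ -n "$MFTC" ] && [ "$MFTC" -lt 10000 ]; then
    gpfs.snap --deadlock            # relatively small cluster, low maxFilesToCache
else
    gpfs.snap --deadlock --quick    # large cluster or high maxFilesToCache
fi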

Additional information to collect for file system corruption or MMFS_FSSTRUCT errors

When file system corruption or MMFS_FSSTRUCT errors are encountered, the IBM Support Center will
need additional information to assist with problem diagnosis. If you have not done so already, ensure
you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
2. Unmount the file system everywhere, then run mmfsck -n in offline mode and redirect it to an output
file.

The IBM Support Center will determine when and if you should run the mmfsck -y command.

Additional information to collect for GPFS daemon crashes

When the GPFS daemon is repeatedly crashing, the IBM Support Center will need additional information
to assist with problem diagnosis. If you have not done so already, ensure you have the following
information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to GPFS” on page 115.
2. Ensure the /tmp/mmfs directory exists on all nodes. If this directory does not exist, the GPFS daemon
will not generate internal dumps.
3. Set the traces on this cluster and all clusters that mount any file system from this cluster:
mmtracectl --set --trace=def --trace-recycle=global
4. Start the trace facility by issuing:

mmtracectl --start
5. Recreate the problem if possible or wait for the assert to be triggered again.
6. Once the assert is encountered on the node, turn off the trace facility by issuing:
mmtracectl --off
If traces were started on multiple clusters, mmtracectl --off should be issued immediately on all
clusters.
7. Collect gpfs.snap output:
gpfs.snap
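
Taken together, steps 3 through 7 amount to the following sequence. This is a sketch only; remember to run
the mmtracectl commands on this cluster and on every cluster that mounts a file system from this cluster.

# Configure and start tracing, with traces recycled globally when the daemon asserts.
mmtracectl --set --trace=def --trace-recycle=global
mmtracectl --start

# ... recreate the problem, or wait for the assert to be triggered again ...

# After the assert is encountered, stop tracing on all clusters and collect the snap data.
mmtracectl --off
gpfs.snap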

How to contact the IBM Support Center


IBM support is available for various types of IBM hardware and software problems that GPFS customers
may encounter.

These problems include the following:


v IBM hardware failure
v Node halt or crash not related to a hardware failure
v Node hang or response problems
v Failure in other software supplied by IBM
If you have an IBM Software Maintenance service contract
If you have an IBM Software Maintenance service contract, contact IBM Support as follows:

Your location                Method of contacting IBM Support

In the United States         Call 1-800-IBM-SERV for support.

Outside the United States    Contact your local IBM Support Center or see the directory of worldwide
                             contacts at www.ibm.com/planetwide

When you contact IBM Support, the following will occur:


1. You will be asked for the information you collected in “Information to collect before
contacting the IBM Support Center” on page 115.
2. You will be given a time period during which an IBM representative will return your call. Be
sure that the person you identified as your contact can be reached at the phone number you
provided in the PMR.
3. An online Problem Management Record (PMR) will be created to track the problem you are
reporting, and you will be advised to record the PMR number for future reference.
4. You may be requested to send data related to the problem you are reporting, using the PMR
number to identify it.
5. Should you need to make subsequent calls to discuss the problem, you will also use the PMR
number to identify the problem.
If you do not have an IBM Software Maintenance service contract
If you do not have an IBM Software Maintenance service contract, contact your IBM sales
representative to find out how to proceed. Be prepared to provide the information you collected
in “Information to collect before contacting the IBM Support Center” on page 115.

For failures in non-IBM software, follow the problem-reporting procedures provided with that product.

Chapter 12. Message severity tags
GPFS has adopted a message severity tagging convention. This convention applies to some newer
messages and to some messages that are being updated and adapted to be more usable by scripts or
semi-automated management programs.

A severity tag is a one-character alphabetic code (A through Z), optionally followed by a colon (:) and a
number, and surrounded by an opening and closing bracket ([ ]). For example:
[E] or [E:nnn]

If more than one substring within a message matches this pattern (for example, [A] or [A:nnn]), the
severity tag is the first such matching string.

When the severity tag includes a numeric code (nnn), this is an error code associated with the message. If
this were the only problem encountered by the command, the command return code would be nnn.
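
For example, a script that reacts to tagged messages could extract the first bracketed tag from each log line
with a pattern such as the following. This is an illustration only; the log file path shown is the usual GPFS
log location and may differ on your system.

# Illustrative only: print the first [X] or [X:nnn] severity tag found on each line of the GPFS log.
awk 'match($0, /\[[A-Z](:[0-9]+)?\]/) { print substr($0, RSTART, RLENGTH) }' /var/adm/ras/mmfs.log.latest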

If a message does not have a severity tag, the message does not conform to this specification. You can
determine the message severity by examining the text or any supplemental information provided in the
message catalog, or by contacting the IBM Support Center.

| Each message severity tag has an assigned priority that can be used to filter the messages that are sent to
| the error log on Linux. Filtering is controlled with the mmchconfig attribute systemLogLevel. The default
| for systemLogLevel is error, which means GPFS will send all error [E], critical [X], and alert [A]
| messages to the error log. The values allowed for systemLogLevel are: alert, critical, error, warning,
| notice, configuration, informational, detail, or debug. Additionally, the value none can be specified so
| no messages are sent to the error log.

| Alert [A] messages have the highest priority, and debug [B] messages have the lowest priority. If the
| systemLogLevel default of error is changed, only messages with the specified severity and all those with
| a higher priority are sent to the error log. The following table lists the message severity tags in order of
| priority:
| Table 3. Message severity tags ordered by priority
|
| Severity tag   Type of message (systemLogLevel attribute)   Meaning
|
| A              alert           Indicates a problem where action must be taken immediately. Notify the
|                                appropriate person to correct the problem.
| X              critical        Indicates a critical condition that should be corrected immediately. The
|                                system discovered an internal inconsistency of some kind. Command
|                                execution might be halted or the system might attempt to continue
|                                despite the inconsistency. Report these errors to IBM.
| E              error           Indicates an error condition. Command execution might or might not
|                                continue, but this error was likely caused by a persistent condition and
|                                will remain until corrected by some other program or administrative
|                                action. For example, a command operating on a single file or other GPFS
|                                object might terminate upon encountering any condition of severity E. As
|                                another example, a command operating on a list of files, finding that one
|                                of the files has permission bits set that disallow the operation, might
|                                continue to operate on all other files within the specified list of files.
| W              warning         Indicates a problem, but command execution continues. The problem can
|                                be a transient inconsistency. It can be that the command has skipped
|                                some operations on some objects, or is reporting an irregularity that
|                                could be of interest. For example, if a multipass command operating on
|                                many files discovers during its second pass that a file that was present
|                                during the first pass is no longer present, the file might have been
|                                removed by another command or program.
| N              notice          Indicates a normal but significant condition. These events are unusual
|                                but not error conditions, and might be summarized in an email to
|                                developers or administrators for spotting potential problems. No
|                                immediate action is required.
| C              configuration   Indicates a configuration change, such as creating a file system or
|                                removing a node from the cluster.
| I              informational   Indicates normal operation. This message by itself indicates that nothing
|                                is wrong; no action is required.
| D              detail          Indicates verbose operational messages; no action is required.
| B              debug           Indicates debug-level messages that are useful to application developers
|                                for debugging purposes. This information is not useful during operations.
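
For example, to send warning, error, critical, and alert messages to the error log while suppressing notices
and lower-priority messages, the attribute could be set as shown below (an illustrative sketch; choose the
level appropriate for your environment):

mmchconfig systemLogLevel=warning

To stop sending GPFS messages to the error log entirely:

mmchconfig systemLogLevel=none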


Chapter 13. Messages


This topic contains explanations for GPFS error messages.

6027-000  Attention: A disk being removed reduces the number of failure groups to nFailureGroups, which is below the number required for replication: nReplicas.
Explanation: Replication cannot protect data against disk failures when there are insufficient failure groups.
User response: Add more disks in new failure groups to the file system or accept the risk of data loss.

| 6027-300 [N]  mmfsd ready
Explanation: The mmfsd server is up and running.
User response: None. Informational message only.

6027-301  File fileName could not be run with err errno.
Explanation: The named shell script could not be executed. This message is followed by the error string that is returned by the exec.
User response: Check file existence and access permissions.

| 6027-302 [E]  Could not execute script
Explanation: The verifyGpfsReady=yes configuration attribute is set, but the /var/mmfs/etc/gpfsready script could not be executed.
User response: Make sure /var/mmfs/etc/gpfsready exists and is executable, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no.

| 6027-303 [N]  script killed by signal signal
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully.
User response: Make sure /var/mmfs/etc/gpfsready completes and returns a zero exit status, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no.

| 6027-304 [W]  script ended abnormally
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully.
User response: Make sure /var/mmfs/etc/gpfsready completes and returns a zero exit status, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no.

| 6027-305 [N]  script failed with exit code code
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully.
User response: Make sure /var/mmfs/etc/gpfsready completes and returns a zero exit status, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no.

| 6027-306 [E]  Could not initialize inter-node communication
Explanation: The GPFS daemon was unable to initialize the communications required to proceed.
User response: User action depends on the return code shown in the accompanying message (/usr/include/errno.h). The communications failure that caused the failure must be corrected. One possibility is an rc value of 67, indicating that the required port is unavailable. This may mean that a previous version of the mmfs daemon is still running. Killing that daemon may resolve the problem.

| 6027-310 [I]  command initializing. {Version versionName: Built date time}
Explanation: The mmfsd server has started execution.
User response: None. Informational message only.

| 6027-311 [N]  programName is shutting down.
Explanation: The stated program is about to terminate.
User response: None. Informational message only.

| 6027-312 [E]  Unknown trace class 'traceClass'.
Explanation: The trace class is not recognized.
User response: Specify a valid trace class.

| 6027-313 [X]  Cannot open configuration file fileName.
Explanation: The configuration file could not be opened.
User response: The configuration file is /var/mmfs/gen/mmfs.cfg. Verify that this file and /var/mmfs/gen/mmsdrfs exist in your system.


| 6027-314 [E]  command requires SuperuserName authority to execute.
Explanation: The mmfsd server was started by a user without superuser authority.
User response: Log on as a superuser and reissue the command.

| 6027-315 [E]  Bad config file entry in fileName, line number.
Explanation: The configuration file has an incorrect entry.
User response: Fix the syntax error in the configuration file. Verify that you are not using a configuration file that was created on a release of GPFS subsequent to the one that you are currently running.

| 6027-316 [E]  Unknown config parameter "parameter" in fileName, line number.
Explanation: There is an unknown parameter in the configuration file.
User response: Fix the syntax error in the configuration file. Verify that you are not using a configuration file that was created on a release of GPFS subsequent to the one you are currently running.

| 6027-317 [A]  Old server with PID pid still running.
Explanation: An old copy of mmfsd is still running.
User response: This message would occur only if the user bypasses the SRC. The normal message in this case would be an SRC message stating that multiple instances are not allowed. If it occurs, stop the previous instance and use the SRC commands to restart the daemon.

| 6027-318 [E]  Watchdog: Some process appears stuck; stopped the daemon process.
Explanation: A high priority process got into a loop.
User response: Stop the old instance of the mmfs server, then restart it.

6027-319  Could not create shared segment
Explanation: The shared segment could not be created.
User response: This is an error from the AIX operating system. Check the accompanying error indications from AIX.

6027-320  Could not map shared segment
Explanation: The shared segment could not be attached.
User response: This is an error from the AIX operating system. Check the accompanying error indications from AIX.

6027-321  Shared segment mapped at wrong address (is value, should be value).
Explanation: The shared segment did not get mapped to the expected address.
User response: Contact the IBM Support Center.

6027-322  Could not map shared segment in kernel extension
Explanation: The shared segment could not be mapped in the kernel.
User response: If an EINVAL error message is displayed, the kernel extension could not use the shared segment because it did not have the correct GPFS version number. Unload the kernel extension and restart the GPFS daemon.

| 6027-323 [A]  Error unmapping shared segment.
Explanation: The shared segment could not be detached.
User response: Check reason given by error message.

6027-324  Could not create message queue for main process
Explanation: The message queue for the main process could not be created. This is probably an operating system error.
User response: Contact the IBM Support Center.

| 6027-328 [W]  Value 'value' for 'parameter' is out of range in fileName. Valid values are value through value. value used.
Explanation: An error was found in the /var/mmfs/gen/mmfs.cfg file.
User response: Check the /var/mmfs/gen/mmfs.cfg file.

6027-329  Cannot pin the main shared segment: name
Explanation: Trying to pin the shared segment during initialization.
User response: Check the mmfs.cfg file. The pagepool size may be too large. It cannot be more than 80% of real memory. If a previous mmfsd crashed, check for processes that begin with the name mmfs that may be holding on to an old pinned shared segment. Issue mmchconfig command to change the pagepool size.


| 6027-334 [E]  Error initializing internal communications.
Explanation: The mailbox system used by the daemon for communication with the kernel cannot be initialized.
User response: Increase the size of available memory using the mmchconfig command.

| 6027-335 [E]  Configuration error: check fileName.
Explanation: A configuration error is found.
User response: Check the mmfs.cfg file and other error messages.

| 6027-336 [E]  Value 'value' for configuration parameter 'parameter' is not valid. Check fileName.
Explanation: A configuration error was found.
User response: Check the mmfs.cfg file.

| 6027-337 [N]  Waiting for resources to be reclaimed before exiting.
Explanation: The mmfsd daemon is attempting to terminate, but cannot because data structures in the daemon shared segment may still be referenced by kernel code. This message may be accompanied by other messages that show which disks still have I/O in progress.
User response: None. Informational message only.

| 6027-338 [N]  Waiting for number user(s) of shared segment to release it.
Explanation: The mmfsd daemon is attempting to terminate, but cannot because some process is holding the shared segment while in a system call. The message will repeat every 30 seconds until the count drops to zero.
User response: Find the process that is not responding, and find a way to get it out of its system call.

| 6027-339 [E]  Nonnumeric trace value 'value' after class 'class'.
Explanation: The specified trace value is not recognized.
User response: Specify a valid trace integer value.

6027-340  Child process file failed to start due to error rc: errStr.
Explanation: A failure occurred when GPFS attempted to start a program.
User response: If the program was a user exit script, verify the script file exists and has appropriate permissions assigned. If the program was not a user exit script, then this is an internal GPFS error or the GPFS installation was altered.

| 6027-341 [D]  Node nodeName is incompatible because its maximum compatible version (number) is less than the version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its maximum compatible version is less than the version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.

| 6027-342 [E]  Node nodeName is incompatible because its minimum compatible version is greater than the version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its minimum compatible version is greater than the version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.

| 6027-343 [E]  Node nodeName is incompatible because its version (number) is less than the minimum compatible version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is less than the minimum compatible version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.

| 6027-344 [E]  Node nodeName is incompatible because its version is greater than the maximum compatible version of this node (number). [value/value]
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is greater than the maximum compatible version of the daemon running on this node. The numbers in square brackets are for IBM Service use.
User response: Verify your GPFS daemon version.


6027-345  Network error on ipAddress, check connectivity.
Explanation: A TCP error has caused GPFS to exit due to a bad return code from an error. Exiting allows recovery to proceed on another node and resources are not tied up on this node.
User response: Follow network problem determination procedures.

| 6027-346 [E]  Incompatible daemon version. My version = number, repl.my_version = number
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon is not the same version and it sent a reply indicating its version number is incompatible.
User response: Verify your GPFS daemon version.

| 6027-347 [E]  Remote host ipAddress refused connection because IP address ipAddress was not in the node list file
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon sent a reply indicating it did not recognize the IP address of the connector.
User response: Add the IP address of the local host to the node list file on the remote host.

| 6027-348 [E]  Bad "subnets" configuration: invalid subnet "ipAddress".
Explanation: A subnet specified by the subnets configuration parameter could not be parsed.
User response: Run the mmlsconfig command and check the value of the subnets parameter. Each subnet must be specified as a dotted-decimal IP address. Run the mmchconfig subnets command to correct the value.

| 6027-349 [E]  Bad "subnets" configuration: invalid cluster name pattern "clusterNamePattern".
Explanation: A cluster name pattern specified by the subnets configuration parameter could not be parsed.
User response: Run the mmlsconfig command and check the value of the subnets parameter. The optional cluster name pattern following the subnet address must be a shell-style pattern allowing '*', '/' and '[...]' as wild cards. Run the mmchconfig subnets command to correct the value.

| 6027-350 [E]  Bad "subnets" configuration: primary IP address ipAddress is on a private subnet. Use a public IP address instead.
Explanation: GPFS is configured to allow multiple IP addresses per node (subnets configuration parameter), but the primary IP address of the node (the one specified when the cluster was created or when the node was added to the cluster) was found to be on a private subnet. If multiple IP addresses are used, the primary address must be a public IP address.
User response: Remove the node from the cluster; then add it back using a public IP address.

6027-358  Communication with mmspsecserver through socket name failed, err value: errorString, msgType messageType.
Explanation: Communication failed between spsecClient (the daemon) and spsecServer.
User response: Verify both the communication socket and the mmspsecserver process.

6027-359  The mmspsecserver process is shutting down. Reason: explanation.
Explanation: The mmspsecserver process received a signal from the mmfsd daemon or encountered an error on execution.
User response: Verify the reason for shutdown.

6027-360  Disk name must be removed from the /etc/filesystems stanza before it can be deleted. Another disk in the file system can be added in its place if needed.
Explanation: A disk being deleted is found listed in the disks= list for a file system.
User response: Remove the disk from the list.

| 6027-361 [E]  Local access to disk failed with EIO, switching to access the disk remotely.
Explanation: Local access to the disk failed. To avoid unmounting of the file system, the disk will now be accessed remotely.
User response: Wait until work continuing on the local node completes. Then determine why local access to the disk failed, correct the problem, and restart the daemon. This will cause GPFS to begin accessing the disk locally again.


6027-362  Attention: No disks were deleted, but some data was migrated. The file system may no longer be properly balanced.
Explanation: The mmdeldisk command did not complete migrating data off the disks being deleted. The disks were restored to normal ready status, but the migration has left the file system unbalanced. This may be caused by having too many disks unavailable or insufficient space to migrate all of the data to other disks.
User response: Check disk availability and space requirements. Determine the reason that caused the command to end before successfully completing the migration and disk deletion. Reissue the mmdeldisk command.

6027-363  I/O error writing disk descriptor for disk name.
Explanation: An I/O error occurred when the mmadddisk command was writing a disk descriptor on a disk. This could have been caused by either a configuration error or an error in the path to the disk.
User response: Determine the reason the disk is inaccessible for writing and reissue the mmadddisk command.

6027-364  Error processing disks.
Explanation: An error occurred when the mmadddisk command was reading disks in the file system.
User response: Determine the reason why the disks are inaccessible for reading, then reissue the mmadddisk command.

| 6027-365 [I]  Rediscovered local access to disk.
Explanation: Rediscovered local access to disk, which failed earlier with EIO. For good performance, the disk will now be accessed locally.
User response: Wait until work continuing on the local node completes. This will cause GPFS to begin accessing the disk locally again.

6027-369  I/O error writing file system descriptor for disk name.
Explanation: mmadddisk detected an I/O error while writing a file system descriptor on a disk.
User response: Determine the reason the disk is inaccessible for writing and reissue the mmadddisk command.

6027-370  mmdeldisk completed.
Explanation: The mmdeldisk command has completed.
User response: None. Informational message only.

6027-371  Cannot delete all disks in the file system
Explanation: An attempt was made to delete all the disks in a file system.
User response: Either reduce the number of disks to be deleted or use the mmdelfs command to delete the file system.

6027-372  Replacement disk must be in the same failure group as the disk being replaced.
Explanation: An improper failure group was specified for mmrpldisk.
User response: Specify a failure group in the disk descriptor for the replacement disk that is the same as the failure group of the disk being replaced.

6027-373  Disk diskName is being replaced, so status of disk diskName must be replacement.
Explanation: The mmrpldisk command failed when retrying a replace operation because the new disk does not have the correct status.
User response: Issue the mmlsdisk command to display disk status. Then either issue the mmchdisk command to change the status of the disk to replacement or specify a new disk that has a status of replacement.

6027-374  Disk name may not be replaced.
Explanation: A disk being replaced with mmrpldisk does not have a status of ready or suspended.
User response: Use the mmlsdisk command to display disk status. Issue the mmchdisk command to change the status of the disk to be replaced to either ready or suspended.

6027-375  Disk name diskName already in file system.
Explanation: The replacement disk name specified in the mmrpldisk command already exists in the file system.
User response: Specify a different disk as the replacement disk.


6027-376  Previous replace command must be completed before starting a new one.
Explanation: The mmrpldisk command failed because the status of other disks shows that a replace command did not complete.
User response: Issue the mmlsdisk command to display disk status. Retry the failed mmrpldisk command or issue the mmchdisk command to change the status of the disks that have a status of replacing or replacement.

6027-377  Cannot replace a disk that is in use.
Explanation: Attempting to replace a disk in place, but the disk specified in the mmrpldisk command is still available for use.
User response: Use the mmchdisk command to stop GPFS's use of the disk.

| 6027-378 [I]  I/O still in progress near sector number on disk diskName.
Explanation: The mmfsd daemon is attempting to terminate, but cannot because data structures in the daemon shared segment may still be referenced by kernel code. In particular, the daemon has started an I/O that has not yet completed. It is unsafe for the daemon to terminate until the I/O completes, because of asynchronous activity in the device driver that will access data structures belonging to the daemon.
User response: Either wait for the I/O operation to time out, or issue a device-dependent command to terminate the I/O.

6027-379  Could not invalidate disk(s).
Explanation: Trying to delete a disk and it could not be written to in order to invalidate its contents.
User response: No action needed if removing that disk permanently. However, if the disk is ever to be used again, the -v flag must be specified with a value of no when using either the mmcrfs or mmadddisk command.

6027-380  Disk name missing from disk descriptor list entry name.
Explanation: When parsing disk lists, no disks were named.
User response: Check the argument list of the command.

6027-382  Value value for the 'sector size' option for disk disk is not a multiple of value.
Explanation: When parsing disk lists, the sector size given is not a multiple of the default sector size.
User response: Specify a correct sector size.

6027-383  Disk name name appears more than once.
Explanation: When parsing disk lists, a duplicate name is found.
User response: Remove the duplicate name.

6027-384  Disk name name already in file system.
Explanation: When parsing disk lists, a disk name already exists in the file system.
User response: Rename or remove the duplicate disk.

6027-385  Value value for the 'sector size' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the sector size given is not valid.
User response: Specify a correct sector size.

6027-386  Value value for the 'sector size' option for disk name is invalid.
Explanation: When parsing disk lists, the sector size given is not valid.
User response: Specify a correct sector size.

6027-387  Value value for the 'failure group' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the failure group given is not valid.
User response: Specify a correct failure group.

6027-388  Value value for the 'failure group' option for disk name is invalid.
Explanation: When parsing disk lists, the failure group given is not valid.
User response: Specify a correct failure group.

6027-389  Value value for the 'has metadata' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the 'has metadata' value given is not valid.
User response: Specify a correct 'has metadata' value.


6027-390  Value value for the 'has metadata' option for disk name is invalid.
Explanation: When parsing disk lists, the 'has metadata' value given is not valid.
User response: Specify a correct 'has metadata' value.

6027-391  Value value for the 'has data' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the 'has data' value given is not valid.
User response: Specify a correct 'has data' value.

6027-392  Value value for the 'has data' option for disk name is invalid.
Explanation: When parsing disk lists, the 'has data' value given is not valid.
User response: Specify a correct 'has data' value.

6027-393  Either the 'has data' option or the 'has metadata' option must be '1' for disk diskName.
Explanation: When parsing disk lists, the 'has data' or 'has metadata' value given is not valid.
User response: Specify a correct 'has data' or 'has metadata' value.

6027-394  Too many disks specified for file system. Maximum = number.
Explanation: Too many disk names were passed in the disk descriptor list.
User response: Check the disk descriptor list or the file containing the list.

6027-399  Not enough items in disk descriptor list entry, need fields.
Explanation: When parsing a disk descriptor, not enough fields were specified for one disk.
User response: Correct the disk descriptor to use the correct disk descriptor syntax.

6027-416  Incompatible file system descriptor version or not formatted.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid was encountered.
2. No file system descriptor can be found.
3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly re-configured after creating a file system.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operational.

6027-417  Bad file system descriptor.
Explanation: A file system descriptor that is not valid was encountered.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operational.

6027-418  Inconsistent file system quorum. readQuorum=value writeQuorum=value quorumSize=value.
Explanation: A file system descriptor that is not valid was encountered.
User response: Start any disks that have been stopped by the mmchdisk command or by hardware failures. If the problem persists, run offline mmfsck.

6027-419  Failed to read a file system descriptor.
Explanation: Not enough valid replicas of the file system descriptor could be read from the file system.
User response: Start any disks that have been stopped by the mmchdisk command or by hardware failures. Verify that paths to all disks are correctly defined and operational.

6027-420  Inode size must be greater than zero.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-421  Inode size must be a multiple of logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.


6027-422  Inode size must be at least as large as the logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-423  Minimum fragment size must be a multiple of logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-424  Minimum fragment size must be greater than zero.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-425  File system block size of blockSize is larger than maxblocksize parameter.
Explanation: An attempt is being made to mount a file system whose block size is larger than the maxblocksize parameter as set by mmchconfig.
User response: Use the mmchconfig maxblocksize=xxx command to increase the maximum allowable block size.

6027-426  Warning: mount detected unavailable disks. Use mmlsdisk fileSystem to see details.
Explanation: The mount command detected that some disks needed for the file system are unavailable.
User response: Without file system replication enabled, the mount will fail. If it has replication, the mount may succeed depending on which disks are unavailable. Use mmlsdisk to see details of the disk status.

6027-427  Indirect block size must be at least as large as the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-428  Indirect block size must be a multiple of the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-429  Indirect block size must be less than full data block size.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-430  Default metadata replicas must be less than or equal to default maximum number of metadata replicas.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-431  Default data replicas must be less than or equal to default maximum number of data replicas.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-432  Default maximum metadata replicas must be less than or equal to value.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-433  Default maximum data replicas must be less than or equal to value.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-434  Indirect blocks must be at least as big as inodes.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.


| 6027-435 [N]  The file system descriptor quorum has been overridden.
Explanation: The mmfsctl exclude command was previously issued to override the file system descriptor quorum after a disaster.
User response: None. Informational message only.

6027-438  Duplicate disk name name.
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-439  Disk name sector size value does not match sector size value of other disk(s).
Explanation: An internal consistency check has found a problem with file system parameters.
User response: Record the above information. Contact the IBM Support Center.

6027-441  Unable to open disk 'name' on node nodeName.
Explanation: A disk name that is not valid was specified in a GPFS disk command.
User response: Correct the parameters of the executing GPFS disk command.

6027-445  Value for option '-m' cannot exceed the number of metadata failure groups.
Explanation: The current number of replicas of metadata cannot be larger than the number of failure groups that are enabled to hold metadata.
User response: Use a smaller value for -m on the mmchfs command, or increase the number of failure groups by adding disks to the file system.

6027-446  Value for option '-r' cannot exceed the number of data failure groups.
Explanation: The current number of replicas of data cannot be larger than the number of failure groups that are enabled to hold data.
User response: Use a smaller value for -r on the mmchfs command, or increase the number of failure groups by adding disks to the file system.

6027-451  No disks= list found in mount options.
Explanation: No 'disks=' clause found in the mount options list when opening a file system.
User response: Check the operating system's file system database and local mmsdrfs file for this file system.

6027-452  No disks found in disks= list.
Explanation: No disks listed when opening a file system.
User response: Check the operating system's file system database and local mmsdrfs file for this file system.

6027-453  No disk name found in a clause of the list.
Explanation: No disk name found in a clause of the disks= list.
User response: Check the operating system's file system database and local mmsdrfs file for this file system.

6027-461  Unable to find name device.
Explanation: Self explanatory.
User response: There must be a /dev/sgname special device defined. Check the error code. This could indicate a configuration error in the specification of disks, logical volumes, network shared disks, or virtual shared disks.

6027-462  name must be a char or block special device.
Explanation: Opening a file system.
User response: There must be a /dev/sgname special device defined. This could indicate a configuration error in the specification of disks, logical volumes, network shared disks, or virtual shared disks.

6027-463  SubblocksPerFullBlock was not 32.
Explanation: The value of the SubblocksPerFullBlock variable was not 32. This situation should never exist, and indicates an internal error.
User response: Record the above information and contact the IBM Support Center.

6027-465  The average file size must be at least as large as the minimum fragment size.
Explanation: When parsing the command line of tscrfs, it was discovered that the average file size is smaller than the minimum fragment size.
User response: Correct the indicated command parameters.


6027-468  Disk name listed in fileName or local mmsdrfs file, not found in device name. Run: mmcommon recoverfs name.
Explanation: Tried to access a file system but the disks listed in the operating system's file system database or the local mmsdrfs file for the device do not exist in the file system.
User response: Check the configuration and availability of disks. Run the mmcommon recoverfs device command. If this does not resolve the problem, configuration data in the SDR may be incorrect. If no user modifications have been made to the SDR, contact the IBM Support Center. If user modifications have been made, correct these modifications.

6027-469  File system name does not match descriptor.
Explanation: The file system name found in the descriptor on disk does not match the corresponding device name in /etc/filesystems.
User response: Check the operating system's file system database.

6027-470  Disk name may still belong to file system filesystem. Created on IPandTime.
Explanation: The disk being added by the mmcrfs, mmadddisk, or mmrpldisk command appears to still belong to some file system.
User response: Verify that the disks you are adding do not belong to an active file system, and use the -v no option to bypass this check. Use this option only if you are sure that no other file system has this disk configured, because you may cause data corruption in both file systems if this is not the case.

6027-471  Disk diskName: Incompatible file system descriptor version or not formatted.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid was encountered.
2. No file system descriptor can be found.
3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly reconfigured after creating a file system.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.

| 6027-472 [E]  File system format version versionString is not supported.
Explanation: The current file system format version is not supported.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.

| 6027-473 [X]  File System fileSystem unmounted by the system with return code value reason code value
Explanation: Console log entry caused by a forced unmount due to disk or communication failure.
User response: Correct the underlying problem and remount the file system.

| 6027-474 [X]  Recovery Log I/O Failed, Unmounting file system fileSystem
Explanation: I/O to the recovery log failed.
User response: Check the paths to all disks making up the file system. Run the mmlsdisk command to determine if GPFS has declared any disks unavailable. Repair any paths to disks that have failed. Remount the file system.

6027-475  The option '--inode-limit' is not enabled. Use option '-V' to enable most recent features.
Explanation: mmchfs --inode-limit is not enabled under the current file system format version.
User response: Run mmchfs -V; this will change the file system format to the latest format supported.

6027-476  Restricted mount using only available file system descriptor.
Explanation: Fewer than the necessary number of file system descriptors were successfully read. Using the best available descriptor to allow the restricted mount to continue.
User response: Informational message only.

6027-477  The option -z is not enabled. Use the -V option to enable most recent features.
Explanation: The file system format version does not support the -z option on the mmchfs command.
User response: Change the file system format version by issuing mmchfs -V.
by issuing mmchfs -V.


6027-478  The option -z could not be changed. fileSystem is still in use.
Explanation: The file system is still mounted or another GPFS administration command (mm...) is running against the file system.
User response: Unmount the file system if it is mounted, and wait for any command that is running to complete before reissuing the mmchfs -z command.

| 6027-479 [N]  Mount of fsName was blocked by fileName
Explanation: The internal or external mount of the file system was blocked by the existence of the specified file.
User response: If the file system needs to be mounted, remove the specified file.

6027-480  Cannot enable DMAPI in a file system with existing snapshots.
Explanation: The user is not allowed to enable DMAPI for a file system with existing snapshots.
User response: Delete all existing snapshots in the file system and repeat the mmchfs command.

| 6027-481 [E]  Remount failed for mountid id: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.

| 6027-482 [E]  Remount failed for device name: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.

| 6027-483 [N]  Remounted name
Explanation: mmfsd restarted and remounted the specified file system because it was in the kernel's list of previously mounted file systems.
User response: Informational message only.

6027-484  Remount failed for device after daemon restart.
Explanation: A remount failed after daemon restart. This ordinarily occurs because one or more disks are unavailable. Other possibilities include loss of connectivity to one or more disks.
User response: Issue the mmlsdisk command and check for down disks. Issue the mmchdisk command to start any down disks, then remount the file system. If there is another problem with the disks or the connections to the disks, take necessary corrective actions and remount the file system.

6027-485  Perform mmchdisk for any disk failures and re-mount.
Explanation: Occurs in conjunction with 6027-484.
User response: Follow the User response for 6027-484.

6027-486  No local device specified for fileSystemName in clusterName.
Explanation: While attempting to mount a remote file system from another cluster, GPFS was unable to determine the local device name for this file system.
User response: There must be a /dev/sgname special device defined. Check the error code. This is probably a configuration error in the specification of a remote file system. Run mmremotefs show to check that the remote file system is properly configured.

6027-487  Failed to write the file system descriptor to disk diskName.
Explanation: An error occurred when mmfsctl include was writing a copy of the file system descriptor to one of the disks specified on the command line. This could have been caused by a failure of the corresponding disk device, or an error in the path to the disk.
User response: Verify that the disks are correctly defined on all nodes. Verify that paths to all disks are correctly defined and operational.

6027-488  Error opening the exclusion disk file fileName.
Explanation: Unable to retrieve the list of excluded disks from an internal configuration file.
User response: Ensure that GPFS executable files have been properly installed on all nodes. Perform required configuration steps prior to starting GPFS.


6027-489  Attention: The desired replication factor exceeds the number of available dataOrMetadata failure groups. This is allowed, but the files will not be replicated and will therefore be at risk.
Explanation: You specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor, or increase the number of failure groups.

| 6027-490 [N]  The descriptor replica on disk diskName has been excluded.
Explanation: The file system descriptor quorum has been overridden and, as a result, the specified disk was excluded from all operations on the file system descriptor quorum.
User response: None. Informational message only.

6027-492  The file system is already at file system version number
Explanation: The user tried to upgrade the file system format using mmchfs -V --version=v, but the specified version is smaller than the current version of the file system.
User response: Specify a different value for the --version option.

6027-493  File system version number is not supported on nodeName nodes in the cluster.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but some nodes in the local cluster are still running an older GPFS release that does not support the new format version.
User response: Install a newer version of GPFS on those nodes.

6027-494  File system version number is not supported on nodeName remote nodes mounting the file system.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but the file system is still mounted on some nodes in remote clusters that do not support the new format version.
User response: Unmount the file system on the nodes that do not support the new format version.

6027-495  You have requested that the file system be upgraded to version number. This will enable new functionality but will prevent you from using the file system with earlier releases of GPFS. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V full command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.

6027-496  You have requested that the file system version for local access be upgraded to version number. This will enable some new functionality but will prevent local nodes from using the file system with earlier releases of GPFS. Remote nodes are not affected by this change. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.

6027-497  The file system has already been upgraded to number using -V full. It is not possible to revert back.
Explanation: The user tried to upgrade the file system format using mmchfs -V compat, but the file system has already been fully upgraded.
User response: Informational message only.

6027-498  Incompatible file system format. Only file systems formatted with GPFS 3.2.1.5 or later can be mounted on this platform.
Explanation: A user running GPFS on Microsoft Windows tried to mount a file system that was formatted with a version of GPFS that did not have Windows support.
User response: Create a new file system using current GPFS code.

| 6027-499 [X]  An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. File system fileSystem will be internally unmounted.
Explanation: A new device mapper path is detected or a previously failed path is activated after the local device discovery has finished. This path lacks a Persistent Reserve and cannot be used. All device paths must be active at mount time.
User response: Check the paths to all disks making up the file system. Repair any paths to disks which have failed. Remount the file system.


6027-500  name loaded and configured.
Explanation: The kernel extension was loaded and configured.
User response: None. Informational message only.

6027-501  name:module moduleName unloaded.
Explanation: The kernel extension was unloaded.
User response: None. Informational message only.

6027-502  Incorrect parameter: name.
Explanation: mmfsmnthelp was called with an incorrect parameter.
User response: Contact the IBM Support Center.

6027-504  Not enough memory to allocate internal data structure.
Explanation: Self explanatory.
User response: Increase ulimit or paging space.

6027-505  Internal error, aborting.
Explanation: Self explanatory.
User response: Contact the IBM Support Center.

6027-506  program: loadFile is already loaded at address.
Explanation: The program was already loaded at the address displayed.
User response: None. Informational message only.

6027-507  program: loadFile is not loaded.
Explanation: The program could not be loaded.
User response: None. Informational message only.

6027-510  Cannot mount fileSystem on mountPoint: errorString
Explanation: There was an error mounting the GPFS file system.
User response: Determine action indicated by the error messages and error log entries. Errors in the disk path often cause this problem.

6027-511  Cannot unmount fileSystem: errorDescription
Explanation: There was an error unmounting the GPFS file system.
User response: Take the action indicated by errno description.

6027-512  name not listed in /etc/vfs
Explanation: Error occurred while installing the GPFS kernel extension, or when trying to mount a file system.
User response: Check for the mmfs entry in /etc/vfs.

6027-514  Cannot mount fileSystem on mountPoint: Already mounted.
Explanation: An attempt has been made to mount a file system that is already mounted.
User response: None. Informational message only.

6027-515  Cannot mount fileSystem on mountPoint
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem.
User response: Take the action indicated by other error messages and error log entries.

6027-516  Cannot mount fileSystem
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem.
User response: Take the action indicated by other error messages and error log entries.

6027-517  Cannot mount fileSystem: errorString
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem.
User response: Take the action indicated by other error messages and error log entries.

6027-518  Cannot mount fileSystem: Already mounted.
Explanation: An attempt has been made to mount a file system that is already mounted.
User response: None. Informational message only.


6027-519 Cannot mount fileSystem on mountPoint: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.

6027-520 Cannot mount fileSystem: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.

6027-530 Mount of name failed: cannot mount restorable file system for read/write.
Explanation: A file system marked as enabled for restore cannot be mounted read/write.
User response: None. Informational message only.

6027-531 The following disks of name will be formatted on node nodeName: list.
Explanation: Output showing which disks will be formatted by the mmcrfs command.
User response: None. Informational message only.

| 6027-532 [E] The quota record recordNumber in file fileName is not valid.
Explanation: A quota entry contained a checksum that is not valid.
User response: Remount the file system with quotas disabled. Restore the quota file from backup, and run mmcheckquota.

| 6027-533 [W] Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
Explanation: The number of files created is approaching the file system limit.
User response: Use the mmchfileset command to increase the maximum number of files to avoid reaching the inode limit and possible performance degradation.

6027-534 Cannot create a snapshot in a DMAPI-enabled file system, rc=returnCode.
Explanation: You cannot create a snapshot in a DMAPI-enabled file system.
User response: Use the mmchfs command to disable DMAPI, and reissue the command.

6027-535 Disks up to size size can be added to storage pool pool.
Explanation: Based on the parameters given to mmcrfs and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this storage pool by the mmadddisk command.
User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and rerun mmcrfs with either larger disks or a larger value for the -n parameter.

6027-536 Insufficient system memory to run GPFS daemon. Reduce page pool memory size with the mmchconfig command or add additional RAM to system.
Explanation: Insufficient memory for GPFS internal data structures with the current system and GPFS configuration.
User response: Reduce page pool usage with the mmchconfig command, or add additional RAM to the system.

6027-537 Disks up to size size can be added to this file system.
Explanation: Based on the parameters given to the mmcrfs command and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this file system by the mmadddisk command.
User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and reissue the mmcrfs command with larger disks or a larger value for the -n parameter.

6027-538 Error accessing disks.
Explanation: The mmcrfs command encountered an error accessing one or more of the disks.
User response: Verify that the disk descriptors are coded correctly and that all named disks exist and are online.

6027-539 Unable to clear descriptor areas for fileSystem.
Explanation: The mmdelfs command encountered an error while invalidating the file system control structures on one or more disks in the file system being deleted.
User response: If the problem persists, specify the -p option on the mmdelfs command.
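
For message 6027-539, the -p option tells mmdelfs to proceed with the deletion even though the descriptor areas on some disks cannot be cleared. A sketch, assuming the file system is named fs1:

   mmdelfs fs1 -p
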
6027-540 Formatting file system.
Explanation: The mmcrfs command began to write file system data structures onto the new disks.
User response: None. Informational message only.

6027-541 Error formatting file system.
Explanation: mmcrfs command encountered an error while formatting a new file system. This is often an I/O error.
User response: Check the subsystems in the path to the disk. Follow the instructions from other messages that appear with this one.

| 6027-542 [N] Fileset in file system fileSystem:filesetName (id filesetId) has been incompletely deleted.
Explanation: A fileset delete operation was interrupted, leaving this fileset in an incomplete state.
User response: Reissue the fileset delete command.

6027-543 Error writing file system descriptor for fileSystem.
Explanation: The mmcrfs command could not successfully write the file system descriptor in a particular file system. Check the subsystems in the path to the disk. This is often an I/O error.
User response: Check the system error log, then rerun mmcrfs.

6027-544 Could not invalidate disk of fileSystem.
Explanation: A disk could not be written to invalidate its contents. Check the subsystems in the path to the disk. This is often an I/O error.
User response: Ensure the indicated logical volume is writable.

6027-545 Error processing fileset metadata file.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability states. If all disks are available and the problem persists, issue the mmfsck command to repair damaged metadata.

6027-546 Error processing allocation map for storage pool poolName.
Explanation: There is no I/O path to critical metadata, or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid, and that all disks are either in the 'recovering' or 'up' availability. Issue the mmlsdisk command.

6027-547 Fileset filesetName was unlinked.
Explanation: Fileset was already unlinked.
User response: None. Informational message only.

6027-548 Fileset filesetName unlinked from filesetName.
Explanation: A fileset being deleted contains junctions to other filesets. The cited filesets were unlinked.
User response: None. Informational message only.

6027-549 Failed to open name.
Explanation: The mount command was unable to access a file system. Check the subsystems in the path to the disk. This is often an I/O error.
User response: Follow the suggested actions for the other messages that occur with this one.

| 6027-550 [X] Allocation manager for fileSystem failed to revoke ownership from node nodeName.
Explanation: An irrecoverable error occurred trying to revoke ownership of an allocation region. The allocation manager has panicked the file system to prevent corruption of on-disk data.
User response: Remount the file system.

6027-551 fileSystem is still in use.
Explanation: The mmdelfs or mmcrfs command found that the named file system is still mounted or that another GPFS command is running against the file system.
User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command.

6027-552 Scan completed successfully.
Explanation: The scan function has completed without error.
User response: None. Informational message only.

6027-553 Scan failed on number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
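
For responses that ask you to check the availability of the disks in the file system (for example, message 6027-553), a quick sketch, assuming the file system is named fs1:

   mmlsdisk fs1
   mmlsdisk fs1 -e

The second form restricts the output to disks that are not up and ready, which is usually what you are looking for.
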
6027-554 Scan failed on number out of number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.

6027-555 The desired replication factor exceeds the number of available failure groups.
Explanation: You have specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor or increase the number of failure groups.

6027-556 Not enough space for the desired number of replicas.
Explanation: In attempting to restore the correct replication, GPFS ran out of space in the file system. The operation can continue but some data is not fully replicated.
User response: Make additional space available and reissue the command.

6027-557 Not enough space or available disks to properly balance the file.
Explanation: In attempting to stripe data within the file system, data was placed on a disk other than the desired one. This is normally not a problem.
User response: Run mmrestripefs to rebalance all files.

6027-558 Some data are unavailable.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.

6027-559 Some data could not be read or written.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks and the path to all disks, and reissue the command.

6027-560 File system is already suspended.
Explanation: The tsfsctl command was asked to suspend a suspended file system.
User response: None. Informational message only.

6027-561 Error migrating log.
Explanation: There are insufficient available disks to continue operation.
User response: Restore the unavailable disks and reissue the command.

6027-562 Error processing inodes.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.

6027-563 File system is already running.
Explanation: The tsfsctl command was asked to resume a file system that is already running.
User response: None. Informational message only.

6027-564 Error processing inode allocation map.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.

6027-565 Scanning user file metadata ...
Explanation: Progress information.
User response: None. Informational message only.

6027-566 Error processing user file metadata.
Explanation: Error encountered while processing user file metadata.
User response: None. Informational message only.

6027-567 Waiting for pending file system scan to finish ...
Explanation: Progress information.
User response: None. Informational message only.
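
When a response asks you to check disk availability and then retry (for example, messages 6027-558 and 6027-559), stopped disks usually need to be started again before the retry. A sketch, assuming the file system is named fs1:

   mmchdisk fs1 start -a

This attempts to start every disk in the file system that is not up; reissue the failed command afterward.
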
6027-568 Waiting for number pending file system scans to finish ...
Explanation: Progress information.
User response: None. Informational message only.

6027-569 Incompatible parameters. Unable to allocate space for file system metadata. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.

6027-570 Incompatible parameters. Unable to create file system. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.

6027-571 Logical sector size value must be the same as disk sector size.
Explanation: This message is produced by the mmcrfs command if the sector size given by the -l option is not the same as the sector size given for disks in the -d option.
User response: Correct the options and reissue the command.

6027-572 Completed creation of file system fileSystem.
Explanation: The mmcrfs command has successfully completed.
User response: None. Informational message only.

| 6027-573 All data on the following disks of fileSystem will be destroyed:
Explanation: Produced by the mmdelfs command to list the disks in the file system that is about to be destroyed. Data stored on the disks will be lost.
User response: None. Informational message only.

6027-574 Completed deletion of file system fileSystem.
Explanation: The mmdelfs command has successfully completed.
User response: None. Informational message only.

6027-575 Unable to complete low level format for fileSystem. Failed with error errorCode
Explanation: The mmcrfs command was unable to create the low level file structures for the file system.
User response: Check other error messages and the error log. This is usually an error accessing disks.

6027-576 Storage pools have not been enabled for file system fileSystem.
Explanation: User invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.

6027-577 Attention: number user or system files are not properly replicated.
Explanation: GPFS has detected files that are not replicated correctly due to a previous failure.
User response: Issue the mmrestripefs command at the first opportunity.
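
For replication warnings such as message 6027-577, the repair is done with mmrestripefs. A sketch, assuming the file system is named fs1:

   mmrestripefs fs1 -r

The -r option restores the requested replication for files that are not properly replicated.
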
6027-578 Attention: number out of number user or system files are not properly replicated:
Explanation: GPFS has detected files that are not replicated correctly.

6027-579 Some unreplicated file system metadata has been lost. File system usable only in restricted mode.
Explanation: A disk was deleted that contained vital file system metadata that was not replicated.
User response: Mount the file system in restricted mode (-o rs) and copy any user data that may be left on the file system. Then delete the file system.

6027-580 Unable to access vital system metadata. Too many disks are unavailable.
Explanation: Metadata is unavailable because the disks on which the data reside are stopped, or an attempt was made to delete them.
User response: Either start the stopped disks, try to delete the disks again, or recreate the file system.

6027-581 Unable to access vital system metadata, file system corrupted.
Explanation: When trying to access the file system, the metadata was unavailable due to a disk being deleted.
User response: Determine why a disk is unavailable.

6027-582 Some data has been lost.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.

6027-584 Incompatible parameters. Unable to allocate space for root directory. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.

6027-585 Incompatible parameters. Unable to allocate space for ACL data. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. The parameters entered require more space than is available. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.

6027-586 Quota server initialization failed.
Explanation: Quota server initialization has failed. This message may appear as part of the detail data in the quota error log.
User response: Check status and availability of the disks. If quota files have been corrupted, restore them from the last available backup. Finally, reissue the command.

6027-587 Unable to initialize quota client because there is no quota server. Please check error log on the file system manager node. The mmcheckquota command must be run with the file system unmounted before retrying the command.
Explanation: startQuotaClient failed.
User response: If the quota file could not be read (check the error log on the file system manager; issue the mmlsmgr command to determine which node is the file system manager), then the mmcheckquota command must be run with the file system unmounted.

6027-588 No more than number nodes can mount a file system.
Explanation: The limit of the number of nodes that can mount a file system was exceeded.
User response: Observe the stated limit for how many nodes can mount a file system.

6027-589 Scanning file system metadata, phase number ...
Explanation: Progress information.
User response: None. Informational message only.

| 6027-590 [W] GPFS is experiencing a shortage of pagepool. This message will not be repeated for at least one hour.
Explanation: Pool starvation occurs; buffers have to be continually stolen at high aggressiveness levels.
User response: Issue the mmchconfig command to increase the size of pagepool.

6027-591 Unable to allocate sufficient inodes for file system metadata. Increase the value for option and try again.
Explanation: Too few inodes have been specified on the -N option of the mmcrfs command.
User response: Increase the size of the -N option and reissue the mmcrfs command.

6027-592 Mount of fileSystem is waiting for the mount disposition to be set by some data management application.
Explanation: Data management utilizing DMAPI is enabled for the file system, but no data management application has set a disposition for the mount event.
User response: Start the data management application and verify that the application sets the mount disposition.

| 6027-593 [E] The root quota entry is not found in its assigned record
Explanation: On mount, the root entry is not found in the first record of the quota file.
User response: Issue the mmcheckquota command to verify that the use of root has not been lost.
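
For quota messages such as 6027-593, the consistency check is run with mmcheckquota. A sketch, assuming the file system is named fs1:

   mmcheckquota fs1

If the accompanying message asks for the file system to be unmounted first, unmount it before running the check.
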
6027-594 Disk diskName cannot be added to storage pool poolName. Allocation map cannot accommodate disks larger than size MB.
Explanation: The specified disk is too large compared to the disks that were initially used to create the storage pool.
User response: Specify a smaller disk or add the disk to a new storage pool.

6027-595 [E] While creating quota files, file fileName, with no valid quota information was found in the root directory. Remove files with reserved quota file names (e.g. user.quota) without valid quota information from the root directory by: - mounting the file system without quotas, - removing the files, and - remounting the file system with quotas to recreate new quota files. To use quota file names other than the reserved names, use the mmcheckquota command.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that do not contain quota information but have one of the reserved names: user.quota, group.quota, or fileset.quota exist in the root directory.
User response: To mount the file system so that new quota files will be created, perform these steps:
1. Mount the file system without quotas.
2. Verify that there are no files in the root directory with the reserved names: user.quota, group.quota, or fileset.quota.
3. Remount the file system with quotas. To mount the file system with other files used as quota files, issue the mmcheckquota command.

| 6027-596 [I] While creating quota files, file fileName containing quota information was found in the root directory. This file will be used as quotaType quota file.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that have one of the reserved names user.quota, group.quota, or fileset.quota and contain quota information, exist in the root directory. The file with the reserved name will be used as the quota file.
User response: None. Informational message.

| 6027-597 [E] The quota command was requested to process quotas for a type (user, group, or fileset), which is not enabled.
Explanation: A quota command was requested to process quotas for a user, group, or fileset quota type, which is not enabled.
User response: Verify that the user, group, or fileset quota type is enabled and reissue the command.

| 6027-598 [E] The supplied file does not contain quota information.
Explanation: A file supplied as a quota file does not contain quota information.
User response: Change the file so it contains valid quota information and reissue the command.
To mount the file system so that new quota files are created:
1. Mount the file system without quotas.
2. Verify there are no files in the root directory with the reserved user.quota or group.quota name.
3. Remount the file system with quotas.

| 6027-599 [E] File supplied to the command does not exist in the root directory.
Explanation: The user-supplied name of a new quota file has not been found.
User response: Ensure that a file with the supplied name exists. Then reissue the command.

6027-600 On node nodeName an earlier error may have caused some file system data to be inaccessible at this time. Check error log for additional information. After correcting the problem, the file system can be mounted again to restore normal data access.
Explanation: An earlier error may have caused some file system data to be inaccessible at this time.
User response: Check the error log for additional information. After correcting the problem, the file system can be mounted again.

6027-601 Error changing pool size.
Explanation: The mmchconfig command failed to change the pool size to the requested value.
User response: Follow the suggested actions in the other messages that occur with this one.
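
For pool-size problems such as message 6027-601 (and the pagepool shortage reported by 6027-590 earlier), the current value can be displayed and changed with the configuration commands. A sketch only; the 4G value is an arbitrary illustration, not a recommendation:

   mmlsconfig pagepool
   mmchconfig pagepool=4G

Depending on your level and the options used, the new value may not take effect until GPFS is restarted on the affected nodes.
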
6027-602 ERROR: file system not mounted. Mount file system fileSystem and retry command.
Explanation: A GPFS command that requires the file system be mounted was issued.
User response: Mount the file system and reissue the command.

6027-603 Current pool size: valueK = valueM, max block size: valueK = valueM.
Explanation: Displays the current pool size.
User response: None. Informational message only.

| 6027-604 [E] Parameter incompatibility. File system block size is larger than maxblocksize parameter.
Explanation: An attempt is being made to mount a file system whose block size is larger than the maxblocksize parameter as set by mmchconfig.
User response: Use the mmchconfig maxblocksize=xxx command to increase the maximum allowable block size.

| 6027-605 [N] File system has been renamed.
Explanation: Self-explanatory.
User response: None. Informational message only.

| 6027-606 [E] The node number nodeNumber is not defined in the node list
Explanation: A node matching nodeNumber was not found in the GPFS configuration file.
User response: Perform required configuration steps prior to starting GPFS on the node.

6027-607 mmcommon getEFOptions fileSystem failed. Return code value.
Explanation: The mmcommon getEFOptions command failed while looking up the names of the disks in a file system. This error usually occurs during mount processing.
User response: Check the preceding messages. A frequent cause for such errors is lack of space in /var.

| 6027-608 [E] File system manager takeover failed.
Explanation: An attempt to take over as file system manager failed. The file system is unmounted to allow another node to try.
User response: Check the return code. This is usually due to network or disk connectivity problems. Issue the mmlsdisk command to determine if the paths to the disk are unavailable, and issue the mmchdisk command if necessary.

6027-609 File system fileSystem unmounted because it does not have a manager.
Explanation: The file system had to be unmounted because a file system manager could not be assigned. An accompanying message tells which node was the last manager.
User response: Examine the error log on the last file system manager. Issue the mmlsdisk command to determine if a number of disks are down. Examine the other error logs for an indication of network, disk, or virtual shared disk problems. Repair the base problem and issue the mmchdisk command if required.

6027-610 Cannot mount file system fileSystem because it does not have a manager.
Explanation: The file system had to be unmounted because a file system manager could not be assigned. An accompanying message tells which node was the last manager.
User response: Examine the error log on the last file system manager node. Issue the mmlsdisk command to determine if a number of disks are down. Examine the other error logs for an indication of disk or network shared disk problems. Repair the base problem and issue the mmchdisk command if required.

| 6027-611 [I] Recovery: fileSystem, delay number sec. for safe recovery.
Explanation: Informational. When disk leasing is in use, wait for the existing lease to expire before performing log and token manager recovery.
User response: None.

6027-612 Unable to run command while the file system is suspended.
Explanation: A command that can alter data in a file system was issued while the file system was suspended.
User response: Resume the file system and reissue the command.

| 6027-613 [N] Expel node request from node. Expelling: node
Explanation: One node is asking to have another node expelled from the cluster, usually because they have communications problems between them. The cluster manager node will decide which one will be expelled.
User response: Check that the communications paths are available between the two nodes.
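
For expel requests such as message 6027-613, start by verifying the communication path between the two nodes involved. A sketch; nodeB is a placeholder host name:

   mmdiag --network
   ping -c 3 nodeB

Run mmdiag --network on each of the nodes involved and look for connections to the other node that are not healthy.
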
6027-614 Value value for option name is out of range. Valid values are number through number.
Explanation: The value for an option in the command line arguments is out of range.
User response: Correct the command line and reissue the command.

6027-615 mmcommon getContactNodes clusterName failed. Return code value.
Explanation: mmcommon getContactNodes failed while looking up contact nodes for a remote cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check the preceding messages, and consult the earlier chapters of this document. A frequent cause for such errors is lack of space in /var.

| 6027-616 [X] Duplicate address ipAddress in node list
Explanation: The IP address appears more than once in the node list file.
User response: Check the node list shown by the mmlscluster command.

| 6027-617 [I] Recovered number nodes for cluster clusterName.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

| 6027-618 [X] Local host not found in node list (local ip interfaces: interfaceList)
Explanation: The local host specified in the node list file could not be found.
User response: Check the node list shown by the mmlscluster command.

6027-619 Negative grace times are not allowed.
Explanation: The mmedquota command received a negative value for the -t option.
User response: Reissue the mmedquota command with a nonnegative value for grace time.

6027-620 Hard quota limit must not be less than soft limit.
Explanation: The hard quota limit must be greater than or equal to the soft quota limit.
User response: Reissue the mmedquota command and enter valid values when editing the information.

6027-621 Negative quota limits are not allowed.
Explanation: The quota value must be positive.
User response: Reissue the mmedquota command and enter valid values when editing the information.

| 6027-622 [E] Failed to join remote cluster clusterName
Explanation: The node was not able to establish communication with another cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check other console messages for additional information. Verify that contact nodes for the remote cluster are set correctly. Run mmremotefs show and mmremotecluster show to display information about the remote cluster.
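
For message 6027-622, the remote-cluster definitions can be displayed as the user response suggests. A sketch:

   mmremotecluster show all
   mmremotefs show all

Compare the contact nodes and file system names shown with what the administrator of the remote cluster expects.
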
6027-623 All disks up and ready
Explanation: Self-explanatory.
User response: None. Informational message only.

6027-624 No disks
Explanation: Self-explanatory.
User response: None. Informational message only.

6027-625 Migrate already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.

6027-626 Migrate to node nodeName already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.

6027-627 Node nodeName is already manager for fileSystem.
Explanation: A request has been made to change the file system manager node to the node that is already the manager.
User response: None. Informational message only.

6027-628 Sending migrate request to current manager node nodeName.
Explanation: A request has been made to change the file system manager node.
User response: None. Informational message only.

| 6027-629 [N] Node nodeName resigned as manager for fileSystem.
Explanation: Progress report produced by the mmchmgr command.
User response: None. Informational message only.

| 6027-630 [N] Node nodeName appointed as manager for fileSystem.
Explanation: The mmchmgr command successfully changed the node designated as the file system manager.
User response: None. Informational message only.

6027-631 Failed to appoint node nodeName as manager for fileSystem.
Explanation: A request to change the file system manager node has failed.
User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node.

6027-632 Failed to appoint new manager for fileSystem.
Explanation: An attempt to change the file system manager node has failed.
User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node.

6027-633 Best choice node nodeName already manager for fileSystem.
Explanation: Informational message about the progress and outcome of a migrate request.
User response: None. Informational message only.

6027-634 Node name or number node is not valid.
Explanation: A node number, IP address, or host name that is not valid has been entered in the configuration file or as input for a command.
User response: Validate your configuration information and the condition of your network. This message may result from an inability to translate a node name.

| 6027-635 [E] The current file system manager failed and no new manager will be appointed.
Explanation: The file system manager node could not be replaced. This is usually caused by other system errors, such as disk or communication errors.
User response: See accompanying messages for the base failure.

| 6027-636 [E] Disk marked as stopped or offline.
Explanation: A disk continues to be marked down due to a previous error and was not opened again.
User response: Check the disk status by issuing the mmlsdisk command, then issue the mmchdisk start command to restart the disk.

| 6027-637 [E] RVSD is not active.
Explanation: The RVSD subsystem needs to be activated.
User response: See the appropriate IBM Reliable Scalable Cluster Technology (RSCT) document at: publib.boulder.ibm.com/clresctr/windows/public/rsctbooks.html and search on diagnosing IBM Virtual Shared Disk problems.

| 6027-638 [E] File system fileSystem unmounted by node nodeName
Explanation: Produced in the console log on a forced unmount of the file system caused by disk or communication failures.
User response: Check the error log on the indicated node. Correct the underlying problem and remount the file system.

| 6027-639 [E] File system cannot be mounted in restricted mode and ro or rw concurrently
Explanation: There has been an attempt to concurrently mount a file system on separate nodes in both a normal mode and in 'restricted' mode.
User response: Decide which mount mode you want to use, and use that mount mode on both nodes.

| 6027-640 [E] File system is mounted
Explanation: A command has been issued that requires that the file system be unmounted.
User response: Unmount the file system and reissue the command.

| 6027-641 [E] Unable to access vital system metadata. Too many disks are unavailable or the file system is corrupted.
Explanation: An attempt has been made to access a file system, but the metadata is unavailable. This can be caused by:
1. The disks on which the metadata resides are either stopped or there was an unsuccessful attempt to delete them.
2. The file system is corrupted.
User response: To access the file system:
1. If the disks are the problem, either start the stopped disks or try to delete them.
2. If the file system has been corrupted, you will have to recreate it from backup medium.

| 6027-642 [N] File system has been deleted.
Explanation: Self-explanatory.
User response: None. Informational message only.

| 6027-643 [I] Node nodeName completed take over for fileSystem.
Explanation: The mmchmgr command completed successfully.
User response: None. Informational message only.

6027-644 The previous error was detected on node nodeName.
Explanation: An unacceptable error was detected. This usually occurs when attempting to retrieve file system information from the operating system's file system database or the cached GPFS system control data. The message identifies the node where the error was encountered.
User response: See accompanying messages for the base failure. A common cause for such errors is lack of space in /var.

6027-645 Attention: mmcommon getEFOptions fileSystem failed. Checking fileName.
Explanation: The names of the disks in a file system were not found in the cached GPFS system data, therefore an attempt will be made to get the information from the operating system's file system database.
User response: If the command fails, see "File system will not mount" on page 61. A common cause for such errors is lack of space in /var.

| 6027-646 [E] File system unmounted due to loss of cluster membership.
Explanation: Quorum was lost, causing file systems to be unmounted.
User response: Get enough nodes running the GPFS daemon to form a quorum.

| 6027-647 [E] File fileName could not be run with err errno.
Explanation: The specified shell script could not be run. This message is followed by the error string that is returned by the exec.
User response: Check file existence and access permissions.

6027-648 EDITOR environment variable must be full pathname.
Explanation: The value of the EDITOR environment variable is not an absolute path name.
User response: Change the value of the EDITOR environment variable to an absolute path name.

6027-649 Error reading the mmpmon command file.
Explanation: An error occurred when reading the mmpmon command file.
User response: Check file existence and access permissions.

| 6027-650 [X] The mmfs daemon is shutting down abnormally.
Explanation: The GPFS daemon is shutting down as a result of an irrecoverable condition, typically a resource shortage.
User response: Review error log entries, correct a resource shortage condition, and restart the GPFS daemon.

6027-660 Error displaying message from mmfsd.
Explanation: GPFS could not properly display an output string sent from the mmfsd daemon due to some error. A description of the error follows.
User response: Check that GPFS is properly installed.

6027-661 mmfsd waiting for primary node nodeName.
Explanation: The mmfsd server has to wait during startup because mmfsd on the primary node is not yet ready.
User response: None. Informational message only.
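
For quorum-loss messages such as 6027-646, check how many nodes are actually running the GPFS daemon. A sketch:

   mmgetstate -a

Bring nodes that are reported as down back into the cluster (for example, with mmstartup -N followed by a node list) until quorum is regained.
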
6027-662 mmfsd timed out waiting for primary node nodeName.
Explanation: The mmfsd server is about to terminate.
User response: Ensure that the mmfs.cfg configuration file contains the correct host name or IP address of the primary node. Check mmfsd on the primary node.

6027-663 Lost connection to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon has broken. The daemon has probably crashed.
User response: Ensure that the mmfsd daemon is running. Check the error log.

6027-664 Unexpected message from file system daemon.
Explanation: The version of the mmfsd daemon does not match the version of the GPFS command.
User response: Ensure that all GPFS software components are at the same version.

6027-665 Failed to connect to file system daemon: errorString
Explanation: An error occurred while trying to create a session with mmfsd.
User response: Ensure that the mmfsd daemon is running. Also, only root can run most GPFS commands. The mode bits of the commands must be set-user-id to root.

6027-666 Failed to determine file system manager.
Explanation: While running a GPFS command in a multiple node configuration, the local file system daemon is unable to determine which node is managing the file system affected by the command.
User response: Check internode communication configuration and ensure that enough GPFS nodes are up to form a quorum.

6027-667 Could not set up socket
Explanation: One of the calls to create or bind the socket used for sending parameters and messages between the command and the daemon failed.
User response: Check additional error messages.

6027-668 Could not send message to file system daemon
Explanation: Attempt to send a message to the file system failed.
User response: Check if the file system daemon is up and running.

6027-669 Could not connect to file system daemon.
Explanation: The TCP connection between the command and the daemon could not be established.
User response: Check additional error messages.

6027-670 Value for 'option' is not valid. Valid values are list.
Explanation: The specified value for the given command option was not valid. The remainder of the line will list the valid keywords.
User response: Correct the command line.

6027-671 Keyword missing or incorrect.
Explanation: A missing or incorrect keyword was encountered while parsing command line arguments.
User response: Correct the command line.

6027-672 Too few arguments specified.
Explanation: Too few arguments were specified on the command line.
User response: Correct the command line.

6027-673 Too many arguments specified.
Explanation: Too many arguments were specified on the command line.
User response: Correct the command line.

6027-674 Too many values specified for option name.
Explanation: Too many values were specified for the given option on the command line.
User response: Correct the command line.

6027-675 Required value for option is missing.
Explanation: A required value was not specified for the given option on the command line.
User response: Correct the command line.

6027-676 Option option specified more than once.
Explanation: The named option was specified more than once on the command line.
User response: Correct the command line.

6027-677 Option option is incorrect.
Explanation: An incorrect option was specified on the command line.
User response: Correct the command line.

6027-678 Misplaced or incorrect parameter name.
Explanation: A misplaced or incorrect parameter was specified on the command line.
User response: Correct the command line.

6027-679 Device name is not valid.
Explanation: An incorrect device name was specified on the command line.
User response: Correct the command line.

| 6027-680 [E] Disk failure. Volume name. rc = value. Physical volume name.
Explanation: An I/O request to a disk or a request to fence a disk has failed in such a manner that GPFS can no longer use the disk.
User response: Check the disk hardware and the software subsystems in the path to the disk.

6027-681 Required option name was not specified.
Explanation: A required option was not specified on the command line.
User response: Correct the command line.

6027-682 Device argument is missing.
Explanation: The device argument was not specified on the command line.
User response: Correct the command line.

6027-683 Disk name is invalid.
Explanation: An incorrect disk name was specified on the command line.
User response: Correct the command line.

6027-684 Value value for option is incorrect.
Explanation: An incorrect value was specified for the named option.
User response: Correct the command line.

6027-685 Value value for option option is out of range. Valid values are number through number.
Explanation: An out of range value was specified for the named option.
User response: Correct the command line.

6027-686 option (value) exceeds option (value).
Explanation: The value of the first option exceeds the value of the second option. This is not permitted.
User response: Correct the command line.

6027-687 Disk name is specified more than once.
Explanation: The named disk was specified more than once on the command line.
User response: Correct the command line.

6027-688 Failed to read file system descriptor.
Explanation: The disk block containing critical information about the file system could not be read from disk.
User response: This is usually an error in the path to the disks. If there are associated messages indicating an I/O error such as ENODEV or EIO, correct that error and retry the operation. If there are no associated I/O errors, then run the mmfsck command with the file system unmounted.

6027-689 Failed to update file system descriptor.
Explanation: The disk block containing critical information about the file system could not be written to disk.
User response: This is a serious error, which may leave the file system in an unusable state. Correct any I/O errors, then run the mmfsck command with the file system unmounted to make repairs.

6027-690 Failed to allocate I/O buffer.
Explanation: Could not obtain enough memory (RAM) to perform an operation.
User response: Either retry the operation when the mmfsd daemon is less heavily loaded, or increase the size of one or more of the memory pool parameters by issuing the mmchconfig command.

6027-691 Failed to send message to node nodeName.
Explanation: A message to another file system node could not be sent.
User response: Check additional error messages and the internode communication configuration.

6027-692 Value for option is not valid. Valid values are yes, no.
Explanation: An option that is required to be yes or no is neither.
User response: Correct the command line.

6027-693 Cannot open disk name.
Explanation: Could not access the given disk.
User response: Check the disk hardware and the path to the disk.

6027-694 Disk not started; disk name has a bad volume label.
Explanation: The volume label on the disk does not match that expected by GPFS.
User response: Check the disk hardware. For hot-pluggable drives, ensure that the proper drive has been plugged in.

| 6027-695 [E] File system is read-only.
Explanation: An operation was attempted that would require modifying the contents of a file system, but the file system is read-only.
User response: Make the file system R/W before retrying the operation.

| 6027-696 [E] Too many disks are unavailable.
Explanation: A file system operation failed because all replicas of a data or metadata block are currently unavailable.
User response: Issue the mmlsdisk command to check the availability of the disks in the file system; correct disk hardware problems, and then issue the mmchdisk command with the start option to inform the file system that the disk or disks are available again.

| 6027-697 [E] No log available.
Explanation: A file system operation failed because no space for logging metadata changes could be found.
User response: Check additional error messages. A likely reason for this error is that all disks with available log space are currently unavailable.

| 6027-698 [E] Not enough memory to allocate internal data structure.
Explanation: A file system operation failed because no memory is available for allocating internal data structures.
User response: Stop other processes that may have main memory pinned for their use.

| 6027-699 [E] Inconsistency in file system metadata.
Explanation: File system metadata on disk has been corrupted.
User response: This is an extremely serious error that may cause loss of data. Issue the mmfsck command with the file system unmounted to make repairs. There will be a POSSIBLE FILE CORRUPTION entry in the system error log that should be forwarded to the IBM Support Center.

| 6027-700 [E] Log recovery failed.
Explanation: An error was encountered while restoring file system metadata from the log.
User response: Check additional error messages. A likely reason for this error is that none of the replicas of the log could be accessed because too many disks are currently unavailable. If the problem persists, issue the mmfsck command with the file system unmounted.

| 6027-701 [X] Some file system data are inaccessible at this time.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.

| 6027-702 [X] Some file system data are inaccessible at this time. Check error log for additional information. After correcting the problem, the file system must be unmounted and then mounted to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many
unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.

| 6027-703 [X] Some file system data are inaccessible at this time. Check error log for additional information.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.

6027-704 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. After correcting the problem, the file system must be unmounted and then mounted again to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted.

6027-705 Error code value.
Explanation: Provides additional information about an error.
User response: See accompanying error messages.

6027-706 The device name has no corresponding entry in fileName or has an incomplete entry.
Explanation: The command requires a device that has a file system associated with it.
User response: Check the operating system's file system database (the given file) for a valid device entry.

6027-707 Unable to open file fileName.
Explanation: The named file cannot be opened.
User response: Check that the file exists and has the correct permissions.

6027-708 Keyword name is incorrect. Valid values are list.
Explanation: An incorrect keyword was encountered.
User response: Correct the command line.

6027-709 Incorrect response. Valid responses are "yes", "no", or "noall"
Explanation: A question was asked that requires a yes or no answer. The answer entered was neither yes, no, nor noall.
User response: Enter a valid response.

6027-710 Attention:
Explanation: Precedes an attention message.
User response: None. Informational message only.

| 6027-711 [E] Specified entity, such as a disk or file system, does not exist.
Explanation: A file system operation failed because the specified entity, such as a disk or file system, could not be found.
User response: Specify an existing disk, file system, or other entity.

| 6027-712 [E] Error in communications between mmfsd daemon and client program.
Explanation: A message sent between the mmfsd daemon and the client program had an incorrect format or content.
User response: Verify that the mmfsd daemon is running.

6027-713 Unable to start because conflicting program name is running. Waiting until it completes.
Explanation: A program detected that it cannot start because a conflicting program is running. The program will automatically start once the conflicting program has ended, as long as there are no other conflicting programs running at that time.
User response: None. Informational message only.
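
Several of the preceding messages (for example, 6027-701 through 6027-704) end with the same advice: unmount the file system everywhere, repair it, and remount it. A sketch of that sequence, assuming the file system is named fs1:

   mmumount fs1 -a
   mmfsck fs1
   mmmount fs1 -a

Run mmfsck interactively first and review what it reports; the -y option, which makes repairs without prompting, is best used only after you understand the reported damage.
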

6027-714 Terminating because conflicting program name is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command once the conflicting program has ended.

6027-715 command is finished waiting. Starting execution now.
Explanation: A program detected that it can now begin running because a conflicting program has ended.
User response: None. Informational message only.

| 6027-716 [E] Some file system data or metadata has been lost.
Explanation: Unable to access some piece of file system data that has been lost due to the deletion of disks beyond the replication factor.
User response: If the function did not complete, try to mount the file system in restricted mode.

| 6027-717 [E] Must execute mmfsck before mount.
Explanation: An attempt has been made to mount a file system on which an incomplete mmfsck command was run.
User response: Reissue the mmfsck command to repair the file system, then reissue the mount command.

6027-718 The mmfsd daemon is not ready to handle commands yet.
Explanation: The mmfsd daemon is not accepting messages because it is restarting or stopping.
User response: None. Informational message only.

| 6027-719 [E] Device type not supported.
Explanation: A disk being added to a file system with the mmadddisk or mmcrfs command is not a character mode special file, or has characteristics not recognized by GPFS.
User response: Check the characteristics of the disk being added to the file system.

| 6027-720 [E] Actual sector size does not match given sector size.
Explanation: A disk being added to a file system with the mmadddisk or mmcrfs command has a physical sector size that differs from that given in the disk description list.
User response: Check the physical sector size of the disk being added to the file system.

| 6027-721 [E] Host 'name' in fileName is not valid.
Explanation: A host name or IP address that is not valid was found in a configuration file.
User response: Check the configuration file specified in the error message.

6027-722 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. The file system must be mounted again to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures.
User response: Check other error messages as well as the error log for additional information. Correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.

6027-723 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. After correcting the problem, the file system must be mounted again to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures.
User response: Check other error messages as well as the error log for additional information. Correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.

| 6027-724 [E] Incompatible file system format.
Explanation: An attempt was made to access a file system that was formatted with an older version of the product that is no longer compatible with the version currently running.
User response: To change the file system format version to the current version, issue the -V option on the mmchfs command.
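
For message 6027-724, the file system format version is raised with the -V option of the mmchfs command. A sketch, assuming the file system is named fs1:

   mmchfs fs1 -V full

The full keyword enables all of the new format features; compat enables only those that remain usable by nodes running earlier releases. Check for back-level nodes in the cluster before using full.
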
6027-725 The mmfsd daemon is not ready to handle commands yet. Waiting for quorum.
Explanation: The GPFS mmfsd daemon is not accepting messages because it is waiting for quorum.
User response: Determine why insufficient nodes have joined the group to achieve quorum and rectify the problem.

6027-726 [E] Quota initialization/start-up failed.
Explanation: Quota manager initialization was unsuccessful. The file system manager finished without quotas. Subsequent client mount requests will fail.
User response: Check the error log and correct I/O errors. It may be necessary to issue the mmcheckquota command with the file system unmounted.

6027-727 Specified driver type type does not match disk name driver type type.
Explanation: The driver type specified on the mmchdisk command does not match the current driver type of the disk.
User response: Verify the driver type and reissue the command.

6027-728 Specified sector size value does not match disk name sector size value.
Explanation: The sector size specified on the mmchdisk command does not match the current sector size of the disk.
User response: Verify the sector size and reissue the command.

6027-729 Attention: No changes for disk name were specified.
Explanation: The disk descriptor in the mmchdisk command does not specify that any changes are to be made to the disk.
User response: Check the disk descriptor to determine if changes are needed.

6027-730 command on fileSystem.
Explanation: Quota was activated or deactivated as stated as a result of the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands.
User response: None, informational only. This message is enabled with the -v option on the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands.

6027-731 Error number while performing command for name quota on fileSystem
Explanation: An error occurred when switching quotas of a certain type on or off. If errors were returned for multiple file systems, only the error code is shown.
User response: Check the error code shown by the message to determine the reason.

6027-732 Error while performing command on fileSystem.
Explanation: An error occurred while performing the stated command when listing or reporting quotas.
User response: None. Informational message only.

6027-733 Edit quota: Incorrect format!
Explanation: The format of one or more edited quota limit entries was not correct.
User response: Reissue the mmedquota command. Change only the values for the limits and follow the instructions given.

6027-734 [W] Quota check for 'fileSystem' ended prematurely.
Explanation: The user interrupted and terminated the command.
User response: If ending the command was not intended, reissue the mmcheckquota command.

6027-735 Error editing string from mmfsd.
Explanation: An internal error occurred in the mmfsd when editing a string.
User response: None. Informational message only.

6027-736 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. The file system must be unmounted and then mounted again to restore normal data access.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures.
User response: Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs.
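For a failure such as 6027-736, the offline repair is run with the file system unmounted on all nodes. The following is a minimal sketch only; the device name fs1 is a placeholder for your file system:

   mmumount fs1 -a        # unmount on all nodes
   mmfsck fs1 -y          # check the file system and repair any inconsistencies found
   mmmount fs1 -a         # remount on all nodes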
6027-737 Attention: No metadata disks remain.
Explanation: The mmchdisk command has been issued, but no metadata disks remain.
User response: None. Informational message only.

6027-738 Attention: No data disks remain.
Explanation: The mmchdisk command has been issued, but no data disks remain.
User response: None. Informational message only.

6027-739 Attention: Due to an earlier configuration change the file system is no longer properly balanced.
Explanation: The mmlsdisk command found that the file system is not properly balanced.
User response: Issue the mmrestripefs -b command at your convenience.

6027-740 Attention: Due to an earlier configuration change the file system is no longer properly replicated.
Explanation: The mmlsdisk command found that the file system is not properly replicated.
User response: Issue the mmrestripefs -r command at your convenience.

6027-741 Attention: Due to an earlier configuration change the file system may contain data that is at risk of being lost.
Explanation: The mmlsdisk command found that critical data resides on disks that are suspended or being deleted.
User response: Issue the mmrestripefs -m command as soon as possible.

6027-742 Error occurred while executing a command for fileSystem.
Explanation: A quota command encountered a problem on a file system. Processing continues with the next file system.
User response: None. Informational message only.

6027-743 Initial disk state was updated successfully, but another error may have changed the state again.
Explanation: The mmchdisk command encountered an error after the disk status or availability change was already recorded in the file system configuration. The most likely reason for this problem is that too many disks have become unavailable or are still unavailable after the disk state change.
User response: Issue an mmchdisk start command when more disks are available.

6027-744 Unable to run command while the file system is mounted in restricted mode.
Explanation: A command that can alter the data in a file system was issued while the file system was mounted in restricted mode.
User response: Mount the file system in read-only or read-write mode or unmount the file system and then reissue the command.

6027-745 fileSystem: no quotaType quota management enabled.
Explanation: A quota command of the cited type was issued for the cited file system when no quota management was enabled.
User response: Enable quota management and reissue the command.

6027-746 Editing quota limits for this user or group not permitted.
Explanation: The root user or system group was specified for quota limit editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.

6027-747 [E] Too many nodes in cluster (max number) or file system (max number).
Explanation: The operation cannot succeed because too many nodes are involved.
User response: Reduce the number of nodes to the applicable stated limit.

6027-748 fileSystem: no quota management enabled
Explanation: A quota command was issued for the cited file system when no quota management was enabled.
User response: Enable quota management and reissue the command.
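Messages 6027-745 and 6027-748 indicate that quota management is not active on the file system. A minimal sketch of enabling and verifying it, assuming a hypothetical device named fs1:

   mmchfs fs1 -Q yes       # enable quota management for the file system
   mmcheckquota fs1        # establish the current usage counts
   mmrepquota fs1          # report the resulting quota information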
6027-749 Pool size changed to number K = number M.
Explanation: Pool size successfully changed.
User response: None. Informational message only.

6027-750 [E] The node address ipAddress is not defined in the node list
Explanation: An address does not exist in the GPFS configuration file.
User response: Perform required configuration steps prior to starting GPFS on the node.

6027-751 [E] Error code value
Explanation: Provides additional information about an error.
User response: See accompanying error messages.

6027-752 [E] Lost membership in cluster clusterName. Unmounting file systems.
Explanation: This node has lost membership in the cluster. Either GPFS is no longer available on enough nodes to maintain quorum, or this node could not communicate with other members of the quorum. This could be caused by a communications failure between nodes, or multiple GPFS failures.
User response: See associated error logs on the failed nodes for additional problem determination information.

6027-753 [E] Could not run command command
Explanation: The GPFS daemon failed to run the specified command.
User response: Verify correct installation.

6027-754 Error reading string for mmfsd.
Explanation: GPFS could not properly read an input string.
User response: Check that GPFS is properly installed.

6027-755 [I] Waiting for challenge challengeValue (node nodeNumber, sequence sequenceNumber) to be responded during disk election
Explanation: The node has challenged another node, which won the previous election and is waiting for the challenger to respond.
User response: None. Informational message only.

6027-756 [E] Configuration invalid or inconsistent between different nodes.
Explanation: Self-explanatory.
User response: Check cluster and file system configuration.

6027-757 name is not an excluded disk.
Explanation: Some of the disks passed to the mmfsctl include command are not marked as excluded in the mmsdrfs file.
User response: Verify the list of disks supplied to this command.

6027-758 Disk(s) not started; disk name has a bad volume label.
Explanation: The volume label on the disk does not match that expected by GPFS.
User response: Check the disk hardware. For hot-pluggable drives, make sure the proper drive has been plugged in.

6027-759 fileSystem is still in use.
Explanation: The mmfsctl include command found that the named file system is still mounted, or another GPFS command is running against the file system.
User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command.

6027-760 [E] Unable to perform i/o to the disk. This node is either fenced from accessing the disk or this node's disk lease has expired.
Explanation: A read or write to the disk failed due to either being fenced from the disk or no longer having a disk lease.
User response: Verify disk hardware fencing setup is correct if being used. Ensure network connectivity between this node and other nodes is operational.

6027-761 [W] Attention: excessive timer drift between node and node (number over number sec).
Explanation: GPFS has detected an unusually large difference in the rate of clock ticks (as returned by the times() system call) between two nodes. Another node's TOD clock and tick rate changed dramatically relative to this node's TOD clock and tick rate.
User response: Check error log for hardware or device driver problems that might cause timer interrupts to be lost or a recent large adjustment made to the TOD clock.
6027-762 No quota enabled file system found.
Explanation: There is no quota-enabled file system in this cluster.
User response: None. Informational message only.

6027-763 uidInvalidate: Incorrect option option.
Explanation: An incorrect option was passed to the uidinvalidate command.
User response: Correct the command invocation.

6027-764 Error invalidating UID remapping cache for domain.
Explanation: An incorrect domain name was passed to the uidinvalidate command.
User response: Correct the command invocation.

6027-765 [W] Tick value hasn't changed for nearly number seconds
Explanation: Clock ticks incremented by AIX have not been incremented.
User response: Check the error log for hardware or device driver problems that might cause timer interrupts to be lost.

6027-766 [N] This node will be expelled from cluster cluster due to expel msg from node
Explanation: This node is being expelled from the cluster.
User response: Check the network connection between this node and the node specified above.

6027-767 [N] Request sent to node to expel node from cluster cluster
Explanation: This node sent an expel request to the cluster manager node to expel another node.
User response: Check network connection between this node and the node specified above.

6027-768 Wrong number of operands for mmpmon command 'command'.
Explanation: The command read from the input file has the wrong number of operands.
User response: Correct the command invocation and reissue the command.

6027-769 Malformed mmpmon command 'command'.
Explanation: The command read from the input file is malformed, perhaps with an unknown keyword.
User response: Correct the command invocation and reissue the command.

6027-770 Error writing user.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-771 Error writing group.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-772 Error writing fileset.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-774 fileSystem: quota management is not enabled, or one or more quota clients are not available.
Explanation: An attempt was made to perform quota commands without quota management enabled, or one or more quota clients failed during quota check.
User response: Correct the cause of the problem, and then reissue the quota command.

6027-775 During mmcheckquota processing, number node(s) failed. It is recommended that mmcheckquota be repeated.
Explanation: Nodes failed while an online quota check was running.
User response: Reissue the quota check command.

6027-776 fileSystem: There was not enough space for the report. Please repeat quota check!
Explanation: The vflag is set in the tscheckquota command, but either no space or not enough space could be allocated for the differences to be printed.
User response: Correct the space problem and reissue the quota check.
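Messages 6027-768 and 6027-769 refer to requests read from an mmpmon input file, which takes one request keyword per line. A minimal sketch, assuming a hypothetical input file path:

   cat > /tmp/mmpmon.in <<EOF
   fs_io_s
   io_s
   EOF
   mmpmon -i /tmp/mmpmon.in      # run the requests listed in the input file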
6027-777 [I] Recovering nodes: nodeList
Explanation: Recovery for one or more nodes has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.

6027-778 [I] Recovering nodes in cluster cluster: nodeList
Explanation: Recovery for one or more nodes in the cited cluster has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries on the cited cluster specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.

6027-779 Incorrect fileset name filesetName.
Explanation: The fileset name provided on the command line is incorrect.
User response: Correct the fileset name and reissue the command.

6027-780 Incorrect path to fileset junction junctionName.
Explanation: The path to the fileset junction is incorrect.
User response: Correct the junction path and reissue the command.

6027-781 Storage pools have not been enabled for file system fileSystem.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.

6027-784 [E] Device not ready.
Explanation: A device is not ready for operation.
User response: Check previous messages for further information.

6027-785 [E] Cannot establish connection.
Explanation: This node cannot establish a connection to another node.
User response: Check previous messages for further information.

6027-786 [E] Message failed because the destination node refused the connection.
Explanation: This node sent a message to a node that refuses to establish a connection.
User response: Check previous messages for further information.

6027-787 [E] Security configuration data is inconsistent or unavailable.
Explanation: There was an error configuring security on this node.
User response: Check previous messages for further information.

6027-788 [E] Failed to load or initialize security library.
Explanation: There was an error loading or initializing the security library on this node.
User response: Check previous messages for further information.

6027-789 Unable to read offsets offset to offset for inode inode snap snap, from disk diskName, sector sector.
Explanation: The mmdeldisk -c command found that the cited addresses on the cited disk represent data that is no longer readable.
User response: Save this output for later use in cleaning up failing disks.

6027-790 Specified storage pool poolName does not match disk diskName storage pool poolName. Use mmdeldisk and mmadddisk to change a disk's storage pool.
Explanation: An attempt was made to change a disk's storage pool assignment using the mmchdisk command. This can only be done by deleting the disk from its current storage pool and then adding it to the new pool.
User response: Delete the disk from its current storage pool and then add it to the new pool.

6027-792 Policies have not been enabled for file system fileSystem.
Explanation: The cited file system must be upgraded to use policies.
User response: Upgrade the file system via the mmchfs -V command.
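Messages 6027-781 and 6027-792 are resolved by upgrading the file system format. A minimal sketch, assuming a hypothetical device named fs1; -V compat limits the upgrade to changes usable by the previous release, while -V full enables all new features:

   mmlsfs fs1 -V        # show the current file system format version
   mmchfs fs1 -V full   # upgrade the format (cannot be reverted)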
6027-793 No policy file was installed for file system fileSystem.
Explanation: No policy file was installed for this file system.
User response: Install a policy file.

6027-794 Failed to read policy file for file system fileSystem.
Explanation: Failed to read the policy file for the requested file system.
User response: Reinstall the policy file.

6027-795 Failed to open fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-796 Failed to read fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-797 Failed to stat fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-798 Policy files are limited to number bytes.
Explanation: A user-specified policy file exceeded the maximum-allowed length.
User response: Install a smaller policy file.

6027-799 Policy `policyName' installed and broadcast to all nodes.
Explanation: Self-explanatory.
User response: None. Informational message only.

6027-850 Unable to issue this command from a non-root user.
Explanation: tsiostat requires root privileges to run.
User response: Get the system administrator to change the executable to set the UID to 0.

6027-851 Unable to process interrupt received.
Explanation: An interrupt occurred that tsiostat cannot process.
User response: Contact the IBM Support Center.

6027-852 interval and count must be positive integers.
Explanation: Incorrect values were supplied for tsiostat parameters.
User response: Correct the command invocation and reissue the command.

6027-853 interval must be less than 1024.
Explanation: An incorrect value was supplied for the interval parameter.
User response: Correct the command invocation and reissue the command.

6027-854 count must be less than 1024.
Explanation: An incorrect value was supplied for the count parameter.
User response: Correct the command invocation and reissue the command.

6027-855 Unable to connect to server, mmfsd is not started.
Explanation: The tsiostat command was issued but the file system is not started.
User response: Contact your system administrator.

6027-856 No information to report.
Explanation: The tsiostat command was issued but no file systems are mounted.
User response: Contact your system administrator.

6027-857 Error retrieving values.
Explanation: The tsiostat command was issued and an internal error occurred.
User response: Contact the IBM Support Center.

6027-858 File system not mounted.
Explanation: The requested file system is not mounted.
User response: Mount the file system and reattempt the failing operation.
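Messages 6027-793 through 6027-799 concern the policy file installed for a file system. A minimal sketch of validating, installing, and displaying one, using hypothetical file and device names:

   mmchpolicy fs1 /tmp/policy.rules -I test   # validate the rules without installing them
   mmchpolicy fs1 /tmp/policy.rules -I yes    # install and broadcast the policy
   mmlspolicy fs1 -L                          # display the currently installed policy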
6027-859 Set DIRECTIO failed
Explanation: The tsfattr call failed.
User response: Check for additional error messages. Resolve the problems before reattempting the failing operation.

6027-860 -d is not appropriate for an NFSv4 ACL
Explanation: Produced by the mmgetacl or mmputacl commands when the -d option was specified, but the object has an NFS Version 4 ACL (does not have a default).
User response: None. Informational message only.

6027-861 Set afm ctl failed
Explanation: The tsfattr call failed.
User response: Check for additional error messages. Resolve the problems before reattempting the failing operation.

6027-862 Incorrect storage pool name poolName.
Explanation: An incorrect storage pool name was provided.
User response: Determine the correct storage pool name and reissue the command.

6027-863 File cannot be assigned to storage pool 'poolName'.
Explanation: The file cannot be assigned to the specified pool.
User response: Determine the correct storage pool name and reissue the command.

6027-864 Set storage pool failed.
Explanation: An incorrect storage pool name was provided.
User response: Determine the correct storage pool name and reissue the command.

6027-865 Restripe file data failed.
Explanation: An error occurred while restriping the file data.
User response: Check the error code and reissue the command.

6027-866 [E] Storage pools have not been enabled for this file system.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools via mmchfs -V, or correct the command invocation and reissue the command.

6027-867 Change storage pool is not permitted.
Explanation: The user tried to change a file's assigned storage pool but was not root or superuser.
User response: Reissue the command as root or superuser.

6027-868 mmchattr failed.
Explanation: An error occurred while changing a file's attributes.
User response: Check the error code and reissue the command.

6027-869 File replication exceeds number of failure groups in destination storage pool.
Explanation: The tschattr command received incorrect command line arguments.
User response: Correct the command invocation and reissue the command.

6027-870 [E] Error on getcwd(): errorString. Try an absolute path instead of just pathName
Explanation: The getcwd system call failed.
User response: Specify an absolute path starting with '/' on the command invocation, so that the command will not need to invoke getcwd.

6027-871 [E] Error on gpfs_get_pathname_from_fssnaphandle(pathName): errorString.
Explanation: An error occurred during a gpfs_get_pathname_from_fssnaphandle operation.
User response: Verify the invocation parameters and make sure the command is running under a user ID with sufficient authority (root or administrator privileges). Specify a GPFS file system device name or a GPFS directory path name as the first argument. Correct the command invocation and reissue the command.

6027-872 [E] Is pathName a GPFS file system name or path?
Explanation: An error occurred while attempting to access the named GPFS file system or path.
User response: Verify the invocation parameters and make sure the command is running under a user ID with sufficient authority (root or administrator privileges). Correct the command invocation and reissue the command.
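For the storage pool errors above (6027-862 through 6027-867), a file's current pool and replication attributes can be examined and its data reassigned to another defined pool. A sketch with hypothetical pool and path names:

   mmlsattr -L /gpfs/fs1/bigfile          # show the file's storage pool and replication settings
   mmchattr -P silver /gpfs/fs1/bigfile   # reassign the file's data to pool 'silver'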
6027-873 [W] Error on gpfs_stat_inode([pathName/fileName],inodeNumber.genNumber): errorString
Explanation: An error occurred during a gpfs_stat_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-874 [E] Error: incorrect Date@Time (YYYY-MM-DD@HH:MM:SS) specification: specification
Explanation: The Date@Time command invocation argument could not be parsed.
User response: Correct the command invocation and try again. The syntax should look similar to: 2005-12-25@07:30:00.

6027-875 [E] Error on gpfs_stat(pathName): errorString
Explanation: An error occurred while attempting to stat() the cited path name.
User response: Determine whether the cited path name exists and is accessible. Correct the command arguments as necessary and reissue the command.

6027-876 [E] Error starting directory scan(pathName): errorString
Explanation: The specified path name is not a directory.
User response: Determine whether the specified path name exists and is an accessible directory. Correct the command arguments as necessary and reissue the command.

6027-877 [E] Error opening pathName: errorString
Explanation: An error occurred while attempting to open the named file. Its pool and replication attributes remain unchanged.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.

6027-878 [E] Error on gpfs_fcntl(pathName): errorString (offset=offset)
Explanation: An error occurred while attempting fcntl on the named file. Its pool or replication attributes may not have been adjusted.
User response: Investigate the file and possibly reissue the command. Use the mmlsattr and mmchattr commands to examine and change the pool and replication attributes of the named file.

6027-879 [E] Error deleting pathName: errorString
Explanation: An error occurred while attempting to delete the named file.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.

6027-880 Error on gpfs_seek_inode(inodeNumber): errorString
Explanation: An error occurred during a gpfs_seek_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-881 [E] Error on gpfs_iopen([rootPath/pathName],inodeNumber): errorString
Explanation: An error occurred during a gpfs_iopen operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-882 [E] Error on gpfs_ireaddir(rootPath/pathName): errorString
Explanation: An error occurred during a gpfs_ireaddir() operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-883 [W] Error on gpfs_next_inode(maxInodeNumber): errorString
Explanation: An error occurred during a gpfs_next_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-884 [E:nnn] Error during directory scan
Explanation: A terminal error occurred during the directory scan phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.
6027-885 [E:nnn] Error during inode scan: errorString
Explanation: A terminal error occurred during the inode scan phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.

6027-886 [E:nnn] Error during policy decisions scan
Explanation: A terminal error occurred during the policy decisions phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.

6027-887 [E] Error on gpfs_igetstoragepool(datapoolId): errorString
Explanation: An error occurred during a gpfs_igetstoragepool operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-888 [E] Error on gpfs_igetfilesetname(filesetId): errorString
Explanation: An error occurred during a gpfs_igetfilesetname operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-889 [E] Error on gpfs_get_fssnaphandle(rootPath): errorString.
Explanation: An error occurred during a gpfs_get_fssnaphandle operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-890 [E] Error on gpfs_open_inodescan(rootPath): errorString
Explanation: An error occurred during a gpfs_open_inodescan() operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-891 [X] WEIGHT(thresholdValue) UNKNOWN pathName
Explanation: The named file was assigned the indicated weight, but the rule type is UNKNOWN.
User response: Contact the IBM Support Center.

6027-892 [E] Error on pthread_create: where #threadNumber_or_portNumber_or_socketNumber: errorString
Explanation: An error occurred while creating the thread during a pthread_create operation.
User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center.

6027-893 [X] Error on pthread_mutex_init: errorString
Explanation: An error occurred during a pthread_mutex_init operation.
User response: Contact the IBM Support Center.

6027-894 [X] Error on pthread_mutex_lock: errorString
Explanation: An error occurred during a pthread_mutex_lock operation.
User response: Contact the IBM Support Center.

6027-895 [X] Error on pthread_mutex_unlock: errorString
Explanation: An error occurred during a pthread_mutex_unlock operation.
User response: Contact the IBM Support Center.

6027-896 [X] Error on pthread_cond_init: errorString
Explanation: An error occurred during a pthread_cond_init operation.
User response: Contact the IBM Support Center.

6027-897 [X] Error on pthread_cond_signal: errorString
Explanation: An error occurred during a pthread_cond_signal operation.
User response: Contact the IBM Support Center.

6027-898 [X] Error on pthread_cond_broadcast: errorString
Explanation: An error occurred during a pthread_cond_broadcast operation.
User response: Contact the IBM Support Center.

6027-899 [X] Error on pthread_cond_wait: errorString
Explanation: An error occurred during a pthread_cond_wait operation.
User response: Contact the IBM Support Center.
6027-900 [E] Error opening work file fileName: errorString
Explanation: An error occurred while attempting to open the named work file.
User response: Investigate the file and possibly reissue the command. Check that the path name is defined and accessible.

6027-901 [E] Error writing to work file fileName: errorString
Explanation: An error occurred while attempting to write to the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

6027-902 [E] Error parsing work file fileName. Service index: number
Explanation: An error occurred while attempting to read the specified work file.
User response: Investigate the file and possibly reissue the command. Make sure that there is enough free space in the file system. If the error persists, contact the IBM Support Center.

6027-903 [E:nnn] Error while loading policy rules.
Explanation: An error occurred while attempting to read or parse the policy file, which may contain syntax errors. Subsequent messages include more information about the error.
User response: Read all of the related error messages and try to correct the problem.

6027-904 [E] Error returnCode from PD writer for inode=inodeNumber pathname=pathName
Explanation: An error occurred while writing the policy decision for the candidate file with the indicated inode number and path name to a work file. There probably will be related error messages.
User response: Read all the related error messages. Attempt to correct the problems.

6027-905 [E] Error: Out of memory. Service index: number
Explanation: The command has exhausted virtual memory.
User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center.

6027-906 [E:nnn] Error on system(command)
Explanation: An error occurred during the system call with the specified argument string.
User response: Read and investigate related error messages.

6027-907 [E:nnn] Error from sort_file(inodeListname,sortCommand,sortInodeOptions,tempDir)
Explanation: An error occurred while sorting the named work file using the named sort command with the given options and working directory.
User response: Check these:
v The sort command is installed on your system.
v The sort command supports the given options.
v The working directory is accessible.
v The file system has sufficient free space.

6027-908 [W] Attention: In RULE 'ruleName' (ruleNumber), the pool named by "poolName 'poolType'" is not defined in the file system.
Explanation: The cited pool is not defined in the file system.
User response: Correct the rule and reissue the command. This is not an irrecoverable error; the command will continue to run. Of course it will not find any files in an incorrect FROM POOL and it will not be able to migrate any files to an incorrect TO POOL.

6027-909 [E] Error on pthread_join: where #threadNumber: errorString
Explanation: An error occurred while reaping the thread during a pthread_join operation.
User response: Contact the IBM Support Center.

6027-910 [E:nnn] Error during policy execution
Explanation: A terminating error occurred during the policy execution phase of the command.
User response: Verify the command arguments and reissue the command. If the problem persists, contact the IBM Support Center.

6027-911 [E] Error on changeSpecification change for pathName. errorString
Explanation: This message provides more details about a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file, and then reissue the change command.
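Message 6027-908 is produced when a rule names a pool that is not defined in the file system. A minimal sketch of a migration rule that references only defined pools (the pool names and file names here are placeholders), validated in test mode before it is run:

   cat > /tmp/migrate.pol <<EOF
   RULE 'mig1' MIGRATE FROM POOL 'system' THRESHOLD(80,70) TO POOL 'data'
   RULE 'default' SET POOL 'system'
   EOF
   mmapplypolicy fs1 -P /tmp/migrate.pol -I test   # evaluate the rules without moving any data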
6027-912 [E] Error on restriping of pathName. errorString
Explanation: This provides more details on a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file and then reissue the restriping command.

6027-913 Desired replication exceeds number of failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that the desired replication exceeded the number of failure groups.
User response: Reissue the command after adding or restarting file system disks.

6027-914 Insufficient space in one of the replica failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found there was insufficient space in one of the replica failure groups.
User response: Reissue the command after adding or restarting file system disks.

6027-915 Insufficient space to properly balance file.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there was insufficient space to properly balance the file.
User response: Reissue the command after adding or restarting file system disks.

6027-916 Too many disks unavailable to properly balance file.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there were too many disks unavailable to properly balance the file.
User response: Reissue the command after adding or restarting file system disks.

6027-917 All replicas of a data block were previously deleted.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that all replicas of a data block were previously deleted.
User response: Reissue the command after adding or restarting file system disks.

6027-918 Cannot make this change to a nonzero length file.
Explanation: GPFS does not support the requested change to the replication attributes.
User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.

6027-919 Replication parameter range error (value, value).
Explanation: Similar to message 6027-918. The (a,b) numbers are the allowable range of the replication attributes.
User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.

6027-920 [E] Error on pthread_detach(self): where: errorString
Explanation: An error occurred during a pthread_detach operation.
User response: Contact the IBM Support Center.

6027-921 [E] Error on socket socketName(hostName): errorString
Explanation: An error occurred during a socket operation.
User response: Verify any command arguments related to interprocessor communication and then reissue the command. If the problem persists, contact the IBM Support Center.

6027-922 [X] Error in Mtconx - p_accepts should not be empty
Explanation: The program discovered an inconsistency or logic error within itself.
User response: Contact the IBM Support Center.

6027-923 [W] Error - command client is an incompatible version: hostName protocolVersion
Explanation: While operating in master/client mode, the command discovered that the client is running an incompatible version.
User response: Ensure the same version of the command software is installed on all nodes in the clusters and then reissue the command.
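The restriping errors above (6027-913 through 6027-917) usually trace back to the number and state of the disks in each failure group. A sketch of the commands used to inspect and recover them, assuming a hypothetical device named fs1:

   mmlsdisk fs1 -L        # show the failure group, status, and availability of each disk
   mmchdisk fs1 start -a  # start any disks that are currently down
   mmrestripefs fs1 -b    # rebalance the file system once the disks are available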
6027-924 [X] Error - unrecognized client response from hostName: clientResponse
Explanation: Similar to message 6027-923, except this may be an internal logic error.
User response: Ensure the latest, same version software is installed on all nodes in the clusters and then reissue the command. If the problem persists, contact the IBM Support Center.

6027-925 Directory cannot be assigned to storage pool 'poolName'.
Explanation: The file cannot be assigned to the specified pool.
User response: Determine the correct storage pool name and reissue the command.

6027-926 Symbolic link cannot be assigned to storage pool 'poolName'.
Explanation: The file cannot be assigned to the specified pool.
User response: Determine the correct storage pool name and reissue the command.

6027-927 System file cannot be assigned to storage pool 'poolName'.
Explanation: The file cannot be assigned to the specified pool.
User response: Determine the correct storage pool name and reissue the command.

6027-928 [E] Error: filesystem/device fileSystem has no snapshot with name snapshotName.
Explanation: The specified file system does not have a snapshot with the specified snapshot name.
User response: Use the mmlssnapshot command to list the snapshot names for the file system.

6027-929 [W] Attention: In RULE 'ruleName' (ruleNumber), both pools 'poolName' and 'poolName' are EXTERNAL. This is not a supported migration.
Explanation: The command does not support migration between two EXTERNAL pools.
User response: Correct the rule and reissue the command.
Note: This is not an unrecoverable error. The command will continue to run.

6027-930 [W] Attention: In RULE 'ruleName' LIST name 'listName' appears, but there is no corresponding EXTERNAL LIST 'listName' EXEC ... OPTS ... rule to specify a program to process the matching files.
Explanation: There should be an EXTERNAL LIST rule for every list named by your LIST rules.
User response: Add an "EXTERNAL LIST listName EXEC scriptName OPTS opts" rule.
Note: This is not an unrecoverable error. For execution with -I defer, file lists are generated and saved, so EXTERNAL LIST rules are not strictly necessary for correct execution.

6027-931 [E] Error - The policy evaluation phase did not complete.
Explanation: One or more errors prevented the policy evaluation phase from examining all of the files.
User response: Consider other messages emitted by the command. Take appropriate action and then reissue the command.

6027-932 [E] Error - The policy execution phase did not complete.
Explanation: One or more errors prevented the policy execution phase from operating on each chosen file.
User response: Consider other messages emitted by the command. Take appropriate action and then reissue the command.

6027-933 [W] EXEC 'wouldbeScriptPathname' of EXTERNAL POOL or LIST 'PoolOrListName' fails TEST with code scriptReturnCode on this node.
Explanation: Each EXEC defined in an EXTERNAL POOL or LIST rule is run in TEST mode on each node. Each invocation that fails with a nonzero return code is reported. Command execution is terminated on any node that fails any of these tests.
User response: Correct the EXTERNAL POOL or LIST rule, the EXEC script, or do nothing because this is not necessarily an error. The administrator may suppress execution of the mmapplypolicy command on some nodes by deliberately having one or more EXECs return nonzero codes.

6027-934 [W] Attention: Specified snapshot: 'SnapshotName' will be ignored because the path specified: 'PathName' is not within that snapshot.
Explanation: The command line specified both a path name to be scanned and a snapshot name, but the snapshot name was not consistent with the path name.
User response: If you wanted the entire snapshot, just specify the GPFS file system name or device name. If you wanted a directory within a snapshot, specify a path name within that snapshot (for example, /gpfs/FileSystemName/.snapshots/SnapShotName/Directory).
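Message 6027-930 warns that a LIST rule has no matching EXTERNAL LIST rule. A minimal sketch of the pair, with a hypothetical script path and device name:

   cat > /tmp/list.pol <<EOF
   EXTERNAL LIST 'bigfiles' EXEC '/usr/local/bin/process_list.sh' OPTS '-v'
   RULE 'findbig' LIST 'bigfiles' WHERE FILE_SIZE > 1073741824
   EOF
   mmapplypolicy fs1 -P /tmp/list.pol -I defer   # with -I defer the file lists are generated and saved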
6027-935 [W] Attention: In RULE 'ruleName' (ruleNumber) LIMIT or REPLICATE clauses are ignored; not supported for migration to EXTERNAL pool 'storagePoolName'.
Explanation: GPFS does not support the LIMIT or REPLICATE clauses during migration to external pools.
User response: Correct the policy rule to avoid this warning message.

6027-936 [W] Error - command master is an incompatible version.
Explanation: While operating in master/client mode, the command discovered that the master is running an incompatible version.
User response: Upgrade the command software on all nodes and reissue the command.

6027-937 [E] Error creating shared temporary sub-directory subDirName: subDirPath
Explanation: The mkdir command failed on the named subdirectory path.
User response: Specify an existing writable shared directory as the shared temporary directory argument to the policy command. The policy command will create a subdirectory within that.

6027-938 [E] Error closing work file fileName: errorString
Explanation: An error occurred while attempting to close the named work file or socket.
User response: Record the above information. Contact the IBM Support Center.

6027-939 [E] Error on gpfs_quotactl(pathName,commandCode,resourceId): errorString
Explanation: An error occurred while attempting gpfs_quotactl().
User response: Correct the policy rules and/or enable GPFS quota tracking. If problem persists contact the IBM Support Center.

6027-940 Open failed.
Explanation: The open() system call was not successful.
User response: Check additional error messages.

6027-941 Set replication failed.
Explanation: The open() system call was not successful.
User response: Check additional error messages.

6027-943 -M and -R are only valid for zero length files.
Explanation: The mmchattr command received command line arguments that were not valid.
User response: Correct command line and reissue the command.

6027-944 -m value exceeds number of failure groups for metadata.
Explanation: The mmchattr command received command line arguments that were not valid.
User response: Correct command line and reissue the command.

6027-945 -r value exceeds number of failure groups for data.
Explanation: The mmchattr command received command line arguments that were not valid.
User response: Correct command line and reissue the command.

6027-946 Not a regular file or directory.
Explanation: An mmlsattr or mmchattr command error occurred.
User response: Correct the problem and reissue the command.

6027-947 Stat failed: A file or directory in the path name does not exist.
Explanation: A file or directory in the path name does not exist.
User response: Correct the problem and reissue the command.

6027-948 [E:nnn] fileName: get clone attributes failed: errorString
Explanation: The tsfattr call failed.
User response: Check for additional error messages. Resolve the problems before reattempting the failing operation.
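Messages 6027-918 and 6027-943 note that the maximum replication attributes (-M and -R) can only be set on zero length files. A minimal sketch of the create-then-copy approach suggested in those entries, with hypothetical paths:

   touch /gpfs/fs1/newfile                    # create a zero length file
   mmchattr -M 2 -R 2 -m 2 -r 2 /gpfs/fs1/newfile
   cp /gpfs/fs1/oldfile /gpfs/fs1/newfile     # copy the data, then rename as needed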
6027-949 [E] fileName: invalid clone attributes.
Explanation: Self-explanatory.
User response: Check for additional error messages. Resolve the problems before reattempting the failing operation.

6027-950 [E:nnn] File cloning requires the 'fastea' feature to be enabled.
Explanation: The file system fastea feature is not enabled.
User response: Enable the fastea feature by issuing the mmchfs -V and mmmigratefs --fastea commands.

6027-951 [E] Error on operationName to work file fileName: errorString
Explanation: An error occurred while attempting to do a (write-like) operation on the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

6027-953 Failed to get a handle for fileset filesetName, snapshot snapshotName in file system fileSystem. errorMessage.
Explanation: Failed to get a handle for a specific fileset snapshot in the file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-954 Failed to get the maximum inode number in the active file system. errorMessage.
Explanation: Failed to get the maximum inode number in the current active file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-955 Failed to set the maximum allowed memory for the specified fileSystem command.
Explanation: Failed to set the maximum allowed memory for the specified command.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-956 Cannot allocate enough buffer to record different items.
Explanation: Cannot allocate enough buffer to record different items which are used in the next phase.
User response: Correct the command line and reissue the command. If the problem persists, contact the system administrator.

6027-957 Failed to get the root directory inode of fileset filesetName
Explanation: Failed to get the root directory inode of a fileset.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-959 'fileName' is not a regular file.
Explanation: Only regular files are allowed to be clone parents.
User response: This file is not a valid target for mmclone operations.

6027-960 cannot access 'fileName': errorString.
Explanation: This message provides more details about a stat() error.
User response: Correct the problem and reissue the command.

6027-961 Cannot execute command.
Explanation: The mmeditacl command cannot invoke the mmgetacl or mmputacl command.
User response: Contact your system administrator.

6027-963 EDITOR environment variable not set
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable and reissue the command.

6027-964 EDITOR environment variable must be an absolute path name
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable correctly and reissue the command.

6027-965 Cannot create temporary file
Explanation: Self-explanatory.
User response: Contact your system administrator.
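Message 6027-950 requires the fast extended attributes (fastea) feature. A minimal sketch of enabling it, following the user response above and assuming a hypothetical device named fs1 (a file system format upgrade cannot be reverted):

   mmchfs fs1 -V full
   mmmigratefs fs1 --fastea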
6027-966 Cannot access fileName
Explanation: Self-explanatory.
User response: Verify file permissions.

6027-967 Should the modified ACL be applied? (yes) or (no)
Explanation: Self-explanatory.
User response: Respond yes if you want to commit the changes, no otherwise.

6027-971 Cannot find fileName
Explanation: Self-explanatory.
User response: Verify the file name and permissions.

6027-972 name is not a directory (-d not valid).
Explanation: Self-explanatory.
User response: None, only directories are allowed to have default ACLs.

6027-973 Cannot allocate number byte buffer for ACL.
Explanation: There was not enough available memory to process the request.
User response: Contact your system administrator.

6027-974 Failure reading ACL (rc=number).
Explanation: An unexpected error was encountered by mmgetacl or mmeditacl.
User response: Examine the return code, and contact the IBM Support Center if necessary.

6027-976 Failure writing ACL (rc=number).
Explanation: An unexpected error was encountered by mmputacl or mmeditacl.
User response: Examine the return code, and contact the IBM Support Center if necessary.

6027-977 Authorization failure
Explanation: An attempt was made to create or modify the ACL for a file that you do not own.
User response: Only the owner of a file or the root user can create or change the access control list for a file.

6027-978 Incorrect, duplicate, or missing access control entry detected.
Explanation: An access control entry in the ACL that was created had incorrect syntax, one of the required access control entries is missing, or the ACL contains duplicate access control entries.
User response: Correct the problem and reissue the command.

6027-979 Incorrect ACL entry: entry.
Explanation: Self-explanatory.
User response: Correct the problem and reissue the command.

6027-980 name is not a valid user name.
Explanation: Self-explanatory.
User response: Specify a valid user name and reissue the command.

6027-981 name is not a valid group name.
Explanation: Self-explanatory.
User response: Specify a valid group name and reissue the command.

6027-982 name is not a valid ACL entry type.
Explanation: Specify a valid ACL entry type and reissue the command.
User response: Correct the problem and reissue the command.

6027-983 name is not a valid permission set.
Explanation: Specify a valid permission set and reissue the command.
User response: Correct the problem and reissue the command.

6027-985 An error was encountered while deleting the ACL (rc=value).
Explanation: An unexpected error was encountered by tsdelacl.
User response: Examine the return code and contact the IBM Support Center, if necessary.

6027-986 Cannot open fileName.
Explanation: Self-explanatory.
User response: Verify the file name and permissions.
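Several of the ACL messages above involve mmeditacl, which requires the EDITOR environment variable (see messages 6027-963 and 6027-964) to name an editor by absolute path. A minimal sketch, with a hypothetical target path:

   export EDITOR=/usr/bin/vi      # must be an absolute path name
   mmeditacl /gpfs/fs1/projectdir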
6027-987 name is not a valid special name.
Explanation: Produced by the mmputacl command when the NFS V4 'special' identifier is followed by an unknown special id string. name is one of the following: 'owner@', 'group@', 'everyone@'.
User response: Specify a valid NFS V4 special name and reissue the command.

6027-988 type is not a valid NFS V4 type.
Explanation: Produced by the mmputacl command when the type field in an ACL entry is not one of the supported NFS Version 4 type values. type is one of the following: 'allow' or 'deny'.
User response: Specify a valid NFS V4 type and reissue the command.

6027-989 name is not a valid NFS V4 flag.
Explanation: A flag specified in an ACL entry is not one of the supported values, or is not valid for the type of object (inherit flags are valid for directories only). Valid values are FileInherit, DirInherit, and InheritOnly.
User response: Specify a valid NFS V4 option and reissue the command.

6027-990 Missing permissions (value found, value are required).
Explanation: The permissions listed are less than the number required.
User response: Add the missing permissions and reissue the command.

6027-991 Combining FileInherit and DirInherit makes the mask ambiguous.
Explanation: Produced by the mmputacl command when WRITE/CREATE is specified without MKDIR (or the other way around), and both the FILE_INHERIT and DIR_INHERIT flags are specified.
User response: Make separate FileInherit and DirInherit entries and reissue the command.

6027-992 Subdirectory name already exists. Unable to create snapshot.
Explanation: tsbackup was unable to create a snapshot because the snapshot subdirectory already exists. This condition sometimes is caused by issuing a Tivoli restore operation without specifying a different subdirectory as the target of the restore.
User response: Remove or rename the existing subdirectory and then retry the command.

6027-993 Keyword aclType is incorrect. Valid values are: 'posix', 'nfs4', 'native'.
Explanation: One of the mm*acl commands specified an incorrect value with the -k option.
User response: Correct the aclType value and reissue the command.

6027-994 ACL permissions cannot be denied to the file owner.
Explanation: The mmputacl command found that the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions are explicitly being denied to the file owner. This is not permitted, in order to prevent the file being left with an ACL that cannot be modified.
User response: Do not select the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions on deny ACL entries for the OWNER.

6027-995 This command will run on a remote node, nodeName.
Explanation: The mmputacl command was invoked for a file that resides on a file system in a remote cluster, and UID remapping is enabled. To parse the user and group names from the ACL file correctly, the command will be run transparently on a node in the remote cluster.
User response: None. Informational message only.

6027-996 [E:nnn] Error reading policy text from: fileName
Explanation: An error occurred while attempting to open or read the specified policy file. The policy file may be missing or inaccessible.
User response: Read all of the related error messages and try to correct the problem.

6027-997 [W] Attention: RULE 'ruleName' attempts to redefine EXTERNAL POOLorLISTliteral 'poolName', ignored.
Explanation: Execution continues as if the specified rule was not present.
User response: Correct or remove the policy rule.

6027-998 [E] Error in FLR/PDR serving for client clientHostNameAndPortNumber: FLRs=numberOfFileListRecords PDRs=numberOfPolicyDecisionResponses pdrs=numberOfPolicyDecisionResponseRecords
Explanation: A protocol error has been detected among cooperating mmapplypolicy processes.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.
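Messages 6027-987 through 6027-991 describe the fields of an NFS V4 ACL entry. Rather than writing entries from scratch, an existing ACL can be dumped and used as a template; the entry forms shown in the comments below are illustrative assumptions only, and the authoritative format is whatever mmgetacl produces on your system:

   mmgetacl -k nfs4 -o /tmp/proj.acl /gpfs/fs1/projectdir
   # entries take the general form  type:name:permissions:allow|deny[:flags]
   # for example:  special:owner@:rwxc:allow
   #               user:jsmith:r-x-:allow:FileInherit
   mmputacl -i /tmp/proj.acl /gpfs/fs1/projectdir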
6027-999 [E] Authentication failed: myNumericNetworkAddress with partnersNumericNetworkAddress (code=codeIndicatingProtocolStepSequence rc=errnoStyleErrorCode)
Explanation: Two processes at the specified network addresses failed to authenticate. The cooperating processes should be on the same network; they should not be separated by a firewall.
User response: Correct the configuration and try the operation again. If the problem persists, contact the IBM Support Center.

6027-1004 Incorrect [nodelist] format in file: nodeListLine
Explanation: A [nodelist] line in the input stream is not a comma-separated list of nodes.
User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified [nodelist] lines are in error, contact the IBM Support Center. If user-specified [nodelist] lines are in error, correct these lines.

6027-1005 Common is not sole item on [] line number.
Explanation: A [nodelist] line in the input stream contains common plus any other names.
User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified [nodelist] lines are in error, contact the IBM Support Center. If user-specified [nodelist] lines are in error, correct these lines.

6027-1006 Incorrect custom [ ] line number.
Explanation: A [nodelist] line in the input stream is not of the format: [nodelist]. This covers syntax errors not covered by messages 6027-1004 and 6027-1005.
User response: Fix the format of the list of nodes in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified lines are in error, contact the IBM Support Center. If user-specified lines are in error, correct these lines.

6027-1007 attribute found in common multiple times: attribute.
Explanation: The attribute specified on the command line is in the main input stream multiple times. This is occasionally legal, such as with the trace attribute. These attributes, however, are not meant to be repaired by mmfixcfg.
User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in common sections of the configuration file.

6027-1008 Attribute found in custom multiple times: attribute.
Explanation: The attribute specified on the command line is in a custom section multiple times. This is occasionally legal. These attributes are not meant to be repaired by mmfixcfg.
User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in custom sections of the configuration file.

6027-1022 Missing mandatory arguments on command line.
Explanation: Some, but not enough, arguments were specified to the mmcrfsc command.
User response: Specify all arguments as per the usage statement that follows.

6027-1023 File system size must be an integer: value
Explanation: The first two arguments specified to the mmcrfsc command are not integers.
User response: File system size is an internal argument. The mmcrfs command should never call the mmcrfsc command without a valid file system size argument. Contact the IBM Support Center.

6027-1028 Incorrect value for -name flag.
Explanation: An incorrect argument was specified with an option that requires one of a limited number of allowable options (for example, -s or any of the yes | no options).
User response: Use one of the valid values for the specified option.

6027-1029 Incorrect characters in integer field for -name option.
Explanation: An incorrect character was specified with the indicated option.
User response: Use a valid integer for the indicated option.
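Messages 6027-1004 through 6027-1008 concern the common and node-specific sections of mmfs.cfg. The file is normally maintained through mmchconfig rather than edited by hand; a minimal sketch of setting an attribute for a subset of nodes (the attribute value and node names are placeholders):

   mmchconfig pagepool=4G -N nodeA,nodeB
   mmlsconfig                             # verify the common and per-node settings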
6027-1030 Value below minimum for -optionLetter option. Valid range is from value to value
Explanation: The value specified with an option was below the minimum.
User response: Use an integer in the valid range for the indicated option.

6027-1031 Value above maximum for option -optionLetter. Valid range is from value to value.
Explanation: The value specified with an option was above the maximum.
User response: Use an integer in the valid range for the indicated option.

6027-1032 Incorrect option optionName.
Explanation: An unknown option was specified.
User response: Use only the options shown in the syntax.

6027-1033 Option optionName specified twice.
Explanation: An option was specified more than once on the command line.
User response: Use options only once.

6027-1034 Missing argument after optionName option.
Explanation: An option was not followed by an argument.
User response: All options need an argument. Specify one.

6027-1035 Option -optionName is mandatory.
Explanation: A mandatory input option was not specified.
User response: Specify all mandatory options.

6027-1036 Option expected at string.
Explanation: Something other than an expected option was encountered on the latter portion of the command line.
User response: Follow the syntax shown. Options may not have multiple values. Extra arguments are not allowed.

6027-1038 IndirectSize must be <= BlockSize and must be a multiple of LogicalSectorSize (512).
Explanation: The IndirectSize specified was not a multiple of 512 or the IndirectSize specified was larger than BlockSize.
User response: Use valid values for IndirectSize and BlockSize.

6027-1039 InodeSize must be a multiple of LocalSectorSize (512).
Explanation: The specified InodeSize was not a multiple of 512.
User response: Use a valid value for InodeSize.

6027-1040 InodeSize must be less than or equal to Blocksize.
Explanation: The specified InodeSize was not less than or equal to Blocksize.
User response: Use a valid value for InodeSize.

6027-1042 DefaultMetadataReplicas must be less than or equal to MaxMetadataReplicas.
Explanation: The specified DefaultMetadataReplicas was greater than MaxMetadataReplicas.
User response: Specify a valid value for DefaultMetadataReplicas.

6027-1043 DefaultDataReplicas must be less than or equal to MaxDataReplicas.
Explanation: The specified DefaultDataReplicas was greater than MaxDataReplicas.
User response: Specify a valid value for DefaultDataReplicas.

6027-1055 LogicalSectorSize must be a multiple of 512
Explanation: The specified LogicalSectorSize was not a multiple of 512.
User response: Specify a valid LogicalSectorSize.

6027-1056 Blocksize must be a multiple of LogicalSectorSize × 32
Explanation: The specified Blocksize was not a multiple of LogicalSectorSize × 32.
User response: Specify a valid value for Blocksize.

6027-1057  InodeSize must be less than or equal to Blocksize.

Explanation: The specified InodeSize was not less than or equal to Blocksize.

User response: Specify a valid value for InodeSize.

6027-1059  Mode must be M or S: mode

Explanation: The first argument provided in the mmcrfsc command was not M or S.

User response: The mmcrfsc command should not be called by a user. If any other command produces this error, contact the IBM Support Center.

6027-1084  The specified block size (valueK) exceeds the maximum allowed block size currently in effect (valueK). Either specify a smaller value for the -B parameter, or increase the maximum block size by issuing: mmchconfig maxblocksize=valueK and restart the GPFS daemon.

Explanation: The specified value for block size was greater than the value of the maxblocksize configuration parameter.

User response: Specify a valid value or increase the value of the allowed block size by specifying a larger value on the maxblocksize parameter of the mmchconfig command.

6027-1113  Incorrect option: option.

Explanation: The specified command option is not valid.

User response: Specify a valid option and reissue the command.

6027-1119  Obsolete option: option.

Explanation: A command received an option that is not valid any more.

User response: Correct the command line and reissue the command.

6027-1120  Interrupt received: No changes made.

Explanation: A GPFS administration command (mm...) received an interrupt before committing any changes.

User response: None. Informational message only.

6027-1123  Disk name must be specified in disk descriptor.

Explanation: The disk name positional parameter (the first field) in a disk descriptor was empty. The bad disk descriptor is displayed following this message.

User response: Correct the input and rerun the command.

6027-1124  Disk usage must be dataOnly, metadataOnly, descOnly, or dataAndMetadata.

Explanation: The disk usage parameter has a value that is not valid.

User response: Correct the input and reissue the command.

6027-1132  Interrupt received: changes not propagated.

Explanation: An interrupt was received after changes were committed but before the changes could be propagated to all the nodes.

User response: All changes will eventually propagate as nodes recycle or other GPFS administration commands are issued. Changes can be activated now by manually restarting the GPFS daemons.

6027-1133  Interrupt received. Only a subset of the parameters were changed.

Explanation: An interrupt was received in mmchfs before all of the requested changes could be completed.

User response: Use mmlsfs to see what the currently active settings are. Reissue the command if you want to change additional parameters.

6027-1135  Restriping may not have finished.

Explanation: An interrupt occurred during restriping.

User response: Restart the restripe. Verify that the file system was not damaged by running the mmfsck command.

6027-1136  option option specified twice.

Explanation: An option was specified multiple times on a command line.

User response: Correct the error on the command line and reissue the command.
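For message 6027-1084, the maximum block size is a cluster-wide configuration value and takes effect only after the GPFS daemon is restarted, as the message text indicates. A minimal sketch, assuming a desired maximum of 4M (an example value, not a recommendation):

   mmchconfig maxblocksize=4M
   mmshutdown -a
   mmstartup -a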
6027-1137  option value must be yes or no.

Explanation: A yes or no option was used with something other than yes or no.

User response: Correct the error on the command line and reissue the command.

6027-1138  Incorrect extra argument: argument

Explanation: Non-option arguments followed the mandatory arguments.

User response: Unlike most POSIX commands, the main arguments come first, followed by the optional arguments. Correct the error and reissue the command.

6027-1140  Incorrect integer for option: number.

Explanation: An option requiring an integer argument was followed by something that cannot be parsed as an integer.

User response: Specify an integer with the indicated option.

6027-1141  No disk descriptor file specified.

Explanation: An -F flag was not followed by the path name of a disk descriptor file.

User response: Specify a valid disk descriptor file.

6027-1142  File fileName already exists.

Explanation: The specified file already exists.

User response: Rename the file or specify a different file name and reissue the command.

6027-1143  Cannot open fileName.

Explanation: A file could not be opened.

User response: Verify that the specified file exists and that you have the proper authorizations.

6027-1144  Incompatible cluster types. You cannot move file systems that were created by GPFS cluster type sourceCluster into GPFS cluster type targetCluster.

Explanation: The source and target cluster types are incompatible.

User response: Contact the IBM Support Center for assistance.

6027-1145  parameter must be greater than 0: value

Explanation: A negative value had been specified for the named parameter, which requires a positive value.

User response: Correct the input and reissue the command.

6027-1147  Error converting diskName into an NSD.

Explanation: Error encountered while converting a disk into an NSD.

User response: Check the preceding messages for more information.

6027-1148  File system fileSystem already exists in the cluster. Use mmchfs -W to assign a new device name for the existing file system.

Explanation: You are trying to import a file system into the cluster but there is already a file system with the same name in the cluster.

User response: Remove or rename the file system with the conflicting name.

6027-1149  fileSystem is defined to have mount point mountpoint. There is already such a mount point in the cluster. Use mmchfs -T to assign a new mount point to the existing file system.

Explanation: The cluster into which the file system is being imported already contains a file system with the same mount point as the mount point of the file system being imported.

User response: Use the -T option of the mmchfs command to change the mount point of the file system that is already in the cluster and then rerun the mmimportfs command.

6027-1150  Error encountered while importing disk diskName.

Explanation: The mmimportfs command encountered problems while processing the disk.

User response: Check the preceding messages for more information.

6027-1151  Disk diskName already exists in the cluster.

Explanation: You are trying to import a file system that has a disk with the same name as some disk from a file system that is already in the cluster.

User response: Remove or replace the disk with the conflicting name.
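For messages 6027-1148 and 6027-1149, the conflict is resolved on the file system that already exists in the cluster before the import is rerun. The following is only a sketch; gpfs1, gpfs1old, /gpfs/new1, and gpfs2.exp are hypothetical names, and the exact syntax should be confirmed in the mmchfs and mmimportfs command descriptions:

   mmchfs gpfs1 -W gpfs1old        # assign a new device name to the existing file system
   mmchfs gpfs1old -T /gpfs/new1   # or assign it a different mount point
   mmimportfs gpfs1 -i gpfs2.exp   # then rerun the import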
6027-1152  Block size must be 16K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M or 16M.

Explanation: The specified block size value is not valid.

User response: Specify a valid block size value.

6027-1153  At least one node in the cluster must be defined as a quorum node.

Explanation: All nodes were explicitly designated or allowed to default to be nonquorum.

User response: Specify which of the nodes should be considered quorum nodes and reissue the command.

6027-1154  Incorrect node node specified for command.

Explanation: The user specified a node that is not valid.

User response: Specify a valid node.

6027-1155  The NSD servers for the following disks from file system fileSystem were reset or not defined: diskList

Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks.

User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1156  The NSD servers for the following free disks were reset or not defined: diskList

Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks.

User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1157  Use the mmchnsd command to assign NSD servers as needed.

Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks. Check the preceding messages for detailed information.

User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.

6027-1159  The following file systems were not imported: fileSystemList

Explanation: The mmimportfs command was not able to import the specified file systems. Check the preceding messages for error information.

User response: Correct the problems and reissue the mmimportfs command.

6027-1160  The drive letters for the following file systems have been reset: fileSystemList.

Explanation: The drive letters associated with the specified file systems are already in use by existing file systems and have been reset.

User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed.

6027-1161  Use the dash character (-) to separate multiple node designations.

Explanation: A command detected an incorrect character used as a separator in a list of node designations.

User response: Correct the command line and reissue the command.

6027-1162  Use the semicolon character (;) to separate the disk names.

Explanation: A command detected an incorrect character used as a separator in a list of disk names.

User response: Correct the command line and reissue the command.

6027-1163  GPFS is still active on nodeName.

Explanation: The GPFS daemon was discovered to be active on the specified node during an operation that requires the daemon to be stopped.

User response: Stop the daemon on the specified node and rerun the command.

6027-1164  Use mmchfs -t to assign drive letters as needed.

Explanation: The mmimportfs command was forced to reset the drive letters associated with one or more file systems. Check the preceding messages for detailed information.

User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed.
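After an import that produces messages 6027-1155 through 6027-1157, or 6027-1160 and 6027-1164, the server and drive letter assignments are typically repaired afterwards with mmchnsd and mmchfs. This is only a sketch; gpfs1nsd, c21f1n13, c21f1n14, fs1, and the drive letter P are hypothetical values:

   mmchnsd "gpfs1nsd:c21f1n13,c21f1n14"   # reassign NSD server nodes for the disk
   mmchfs fs1 -t P                        # assign a free Windows drive letter

Checking mmlsnsd output first shows which disks currently have no NSD server list.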
6027-1165  The PR attributes for the following disks from file system fileSystem were reset or not yet established: diskList

Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.

User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1166  The PR attributes for the following free disks were reset or not yet established: diskList

Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.

User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1167  Use mmchconfig to enable Persistent Reserve in the cluster as needed.

Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks.

User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed.

6027-1168  Inode size must be 512, 1K or 4K.

Explanation: The specified inode size is not valid.

User response: Specify a valid inode size.

6027-1169  attribute must be value.

Explanation: The specified value of the given attribute is not valid.

User response: Specify a valid value.

6027-1178  parameter must be from value to value: valueSpecified

Explanation: A parameter value specified was out of range.

User response: Keep the specified value within the range shown.

6027-1188  Duplicate disk specified: disk

Explanation: A disk was specified more than once on the command line.

User response: Specify each disk only once.

6027-1189  You cannot delete all the disks.

Explanation: The number of disks to delete is greater than or equal to the number of disks in the file system.

User response: Delete only some of the disks. If you want to delete them all, use the mmdelfs command.

6027-1197  parameter must be greater than value: value.

Explanation: An incorrect value was specified for the named parameter.

User response: Correct the input and reissue the command.

6027-1200  tscrfs failed. Cannot create device

Explanation: The internal tscrfs command failed.

User response: Check the error message from the command that failed.

6027-1201  Disk diskName does not belong to file system fileSystem.

Explanation: The specified disk was not found to be part of the cited file system.

User response: If the disk and file system were specified as part of a GPFS command, reissue the command with a disk that belongs to the specified file system.

6027-1203  Attention: File system fileSystem may have some disks that are in a non-ready state. Issue the command: mmcommon recoverfs fileSystem

Explanation: The specified file system may have some disks that are in a non-ready state.

User response: Run mmcommon recoverfs fileSystem to ensure that the GPFS configuration data for the file system is current, and then display the states of the disks in the file system using the mmlsdisk command. If any disks are in a non-ready state, steps should be taken to bring these disks into the ready state, or to remove them from the file system. This can be done by mounting the file system, or by using the mmchdisk command for a mounted or unmounted file system. When maintenance is complete or the failure has been repaired, use the mmchdisk command with the start option. If the failure cannot be repaired without loss of data, you can use the mmdeldisk command to delete the disks.
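The recovery sequence described for message 6027-1203 can be summarized as follows, where fs1 and gpfs2nsd are hypothetical names used only for illustration:

   mmcommon recoverfs fs1           # bring the GPFS configuration data up to date
   mmlsdisk fs1                     # display the state of each disk
   mmchdisk fs1 start -d gpfs2nsd   # return a repaired disk to the ready state

If a disk cannot be repaired without loss of data, the mmdeldisk command removes it from the file system instead.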
6027-1204  command failed.

Explanation: An internal command failed. This is usually a call to the GPFS daemon.

User response: Check the error message from the command that failed.

6027-1205  Failed to connect to remote cluster clusterName.

Explanation: Attempt to establish a connection to the specified cluster was not successful. This can be caused by a number of reasons: GPFS is down on all of the contact nodes, the contact node list is obsolete, the owner of the remote cluster revoked authorization, and so forth.

User response: If the error persists, contact the administrator of the remote cluster and verify that the contact node information is current and that the authorization key files are current as well.

6027-1206  File system fileSystem belongs to cluster clusterName. Command is not allowed for remote file systems.

Explanation: The specified file system is not local to the cluster, but belongs to the cited remote cluster.

User response: Choose a local file system, or issue the command on a node in the remote cluster.

6027-1207  There is already an existing file system using value.

Explanation: The mount point or device name specified matches that of an existing file system. The device name and mount point must be unique within a GPFS cluster.

User response: Choose an unused name or path.

6027-1208  File system fileSystem not found in cluster clusterName.

Explanation: The specified file system does not belong to the cited remote cluster. The local information about the file system is not current. The file system may have been deleted, renamed, or moved to a different cluster.

User response: Contact the administrator of the remote cluster that owns the file system and verify the accuracy of the local information. Use the mmremotefs show command to display the local information about the file system. Use the mmremotefs update command to make the necessary changes.

6027-1209  GPFS is down on this node.

Explanation: GPFS is not running on this node.

User response: Ensure that GPFS is running and reissue the command.

6027-1210  GPFS is not ready to handle commands yet.

Explanation: GPFS is in the process of initializing or waiting for quorum to be reached.

User response: Reissue the command.

6027-1211  fileSystem refers to file system fileSystem in cluster clusterName.

Explanation: Informational message.

User response: None.

6027-1212  File system fileSystem does not belong to cluster clusterName.

Explanation: The specified file system refers to a file system that is remote to the cited cluster. Indirect remote file system access is not allowed.

User response: Contact the administrator of the remote cluster that owns the file system and verify the accuracy of the local information. Use the mmremotefs show command to display the local information about the file system. Use the mmremotefs update command to make the necessary changes.

6027-1213  command failed. Error code errorCode.

Explanation: An internal command failed. This is usually a call to the GPFS daemon.

User response: Examine the error code and other messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-1214  Unable to enable Persistent Reserve on the following disks: diskList

Explanation: The command was unable to set up all of the disks to use Persistent Reserve.

User response: Examine the disks and the additional error information to determine if the disks should have supported Persistent Reserve. Correct the problem and reissue the command.
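When messages such as 6027-1205, 6027-1208, or 6027-1212 indicate that the local definition of a remote file system is stale, the definition can be inspected and corrected with mmremotefs. This is a sketch under the assumption that remotefs1, fs1, and remoteCluster.example.com are placeholder names; verify the option letters against the mmremotefs command description:

   mmremotefs show remotefs1
   mmremotefs update remotefs1 -f fs1 -C remoteCluster.example.com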
6027-1215  Unable to reset the Persistent Reserve attributes on one or more disks on the following nodes: nodeList

Explanation: The command could not reset Persistent Reserve on at least one disk on the specified nodes.

User response: Examine the additional error information to determine whether nodes were down or if there was a disk error. Correct the problems and reissue the command.

6027-1216  File fileName contains additional error information.

Explanation: The command generated a file containing additional error information.

User response: Examine the additional error information.

6027-1217  A disk descriptor contains an incorrect separator character.

Explanation: A command detected an incorrect character used as a separator in a disk descriptor.

User response: Correct the disk descriptor and reissue the command.

6027-1218  Node nodeName does not have a GPFS server license designation.

Explanation: The function that you are assigning to the node requires the node to have a GPFS server license.

User response: Use the mmchlicense command to assign a valid GPFS license to the node or specify a different node.

6027-1219  NSD discovery on node nodeName failed with return code value.

Explanation: The NSD discovery process on the specified node failed with the specified return code.

User response: Determine why the node cannot access the specified NSDs. Correct the problem and reissue the command.

6027-1220  Node nodeName cannot be used as an NSD server for Persistent Reserve disk diskName because it is not an AIX node.

Explanation: The node shown was specified as an NSD server for diskName, but the node does not support Persistent Reserve.

User response: Specify a node that supports Persistent Reserve as an NSD server.

6027-1221  The number of NSD servers exceeds the maximum (value) allowed.

Explanation: The number of NSD servers in the disk descriptor exceeds the maximum allowed.

User response: Change the disk descriptor to specify no more NSD servers than the maximum allowed.

6027-1222  Cannot assign a minor number for file system fileSystem (major number deviceMajorNumber).

Explanation: The command was not able to allocate a minor number for the new file system.

User response: Delete unneeded /dev entries for the specified major number and reissue the command.

6027-1223  ipAddress cannot be used for NFS serving; it is used by the GPFS daemon.

Explanation: The IP address shown has been specified for use by the GPFS daemon. The same IP address cannot be used for NFS serving because it cannot be failed over.

User response: Specify a different IP address for NFS use and reissue the command.

6027-1224  There is no file system with drive letter driveLetter.

Explanation: No file system in the GPFS cluster has the specified drive letter.

User response: Reissue the command with a valid file system.

6027-1225  Explicit drive letters are supported only in a Windows environment. Specify a mount point or allow the default settings to take effect.

Explanation: An explicit drive letter was specified on the mmmount command but the target node does not run the Windows operating system.

User response: Specify a mount point or allow the default settings for the file system to take effect.

6027-1226  Explicit mount points are not supported in a Windows environment. Specify a drive letter or allow the default settings to take effect.

Explanation: An explicit mount point was specified on the mmmount command but the target node runs the Windows operating system.

User response: Specify a drive letter or allow the default settings for the file system to take effect.

6027-1227  The main GPFS cluster configuration file is locked. Retrying ...

Explanation: Another GPFS administration command has locked the cluster configuration file. The current process will try to obtain the lock a few times before giving up.

User response: None. Informational message only.
6027-1228  Lock creation successful.

Explanation: The holder of the lock has released it and the current process was able to obtain it.

User response: None. Informational message only. The command will now continue.

6027-1229  Timed out waiting for lock. Try again later.

Explanation: Another GPFS administration command kept the main GPFS cluster configuration file locked for over a minute.

User response: Try again later. If no other GPFS administration command is presently running, see “GPFS cluster configuration data files are locked” on page 44.

6027-1230  diskName is a tiebreaker disk and cannot be deleted.

Explanation: A request was made to GPFS to delete a node quorum tiebreaker disk.

User response: Specify a different disk for deletion.

6027-1231  GPFS detected more than eight quorum nodes while node quorum with tiebreaker disks is in use.

Explanation: A GPFS command detected more than eight quorum nodes, but this is not allowed while node quorum with tiebreaker disks is in use.

User response: Reduce the number of quorum nodes to a maximum of eight, or use the normal node quorum algorithm.

6027-1232  GPFS failed to initialize the tiebreaker disks.

Explanation: A GPFS command unsuccessfully attempted to initialize the node quorum tiebreaker disks.

User response: Examine prior messages to determine why GPFS was unable to initialize the tiebreaker disks and correct the problem. After that, reissue the command.

6027-1233  Incorrect keyword: value.

Explanation: A command received a keyword that is not valid.

User response: Correct the command line and reissue the command.

6027-1234  Adding node node to the cluster will exceed the quorum node limit.

Explanation: An attempt to add the cited node to the cluster resulted in the quorum node limit being exceeded.

User response: Change the command invocation to not exceed the node quorum limit, and reissue the command.

6027-1235  The fileName kernel extension does not exist.

Explanation: The cited kernel extension does not exist.

User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment.

6027-1236  Unable to verify kernel/module configuration.

Explanation: The mmfslinux kernel extension does not exist.

User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment.

6027-1237  The GPFS daemon is still running; use the mmshutdown command.

Explanation: An attempt was made to unload the GPFS kernel extensions while the GPFS daemon was still running.

User response: Use the mmshutdown command to shut down the daemon.

6027-1238  Module fileName is still in use. Unmount all GPFS file systems and issue the command: mmfsadm cleanup

Explanation: An attempt was made to unload the cited module while it was still in use.

User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine.

6027-1239  Error unloading module moduleName.

Explanation: GPFS was unable to unload the cited module.

User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine.
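For messages 6027-1237 through 6027-1239, the kernel modules can be released only when no GPFS file systems are mounted and the daemon is down on the node. A typical sequence, shown here only as a sketch to be run on the affected node:

   mmumount all      # unmount all GPFS file systems on this node
   mmshutdown        # stop the GPFS daemon on this node
   mmfsadm cleanup   # then attempt the module cleanup

If the module still cannot be unloaded, reboot the node as the message text suggests.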
6027-1240  Module fileName is already loaded.

Explanation: An attempt was made to load the cited module, but it was already loaded.

User response: None. Informational message only.

6027-1241  diskName was not found in /proc/partitions.

Explanation: The cited disk was not found in /proc/partitions.

User response: Take steps to cause the disk to appear in /proc/partitions, and then reissue the command.

6027-1242  GPFS is waiting for requiredCondition

Explanation: GPFS is unable to come up immediately due to the stated required condition not being satisfied yet.

User response: This is an informational message. As long as the required condition is not satisfied, this message will repeat every five minutes. You may want to stop the GPFS daemon after a while, if it will be a long time before the required condition will be met.

6027-1243  command: Processing user configuration file fileName

Explanation: Progress information for the mmcrcluster command.

User response: None. Informational message only.

6027-1244  configParameter is set by the mmcrcluster processing. Line in error: configLine. The line will be ignored; processing continues.

Explanation: The specified parameter is set by the mmcrcluster command and cannot be overridden by the user.

User response: None. Informational message only.

6027-1245  configParameter must be set with the command command. Line in error: configLine. The line is ignored; processing continues.

Explanation: The specified parameter has additional dependencies and cannot be specified prior to the completion of the mmcrcluster command.

User response: After the cluster is created, use the specified command to establish the desired configuration parameter.

6027-1246  configParameter is an obsolete parameter. Line in error: configLine. The line is ignored; processing continues.

Explanation: The specified parameter is not used by GPFS anymore.

User response: None. Informational message only.

6027-1247  configParameter cannot appear in a node-override section. Line in error: configLine. The line is ignored; processing continues.

Explanation: The specified parameter must have the same value across all nodes in the cluster.

User response: None. Informational message only.

6027-1248  Mount point can not be a relative path name: path

Explanation: The mount point does not begin with /.

User response: Specify the absolute path name for the mount point.

6027-1249  operand can not be a relative path name: path.

Explanation: The specified path name does not begin with '/'.

User response: Specify the absolute path name.

6027-1250  Key file is not valid.

Explanation: While attempting to establish a connection to another node, GPFS detected that the format of the public key file is not valid.

User response: Use the mmremotecluster command to specify the correct public key.

6027-1251  Key file mismatch.

Explanation: While attempting to establish a connection to another node, GPFS detected that the public key file does not match the public key file of the cluster to which the file system belongs.

User response: Use the mmremotecluster command to specify the correct public key.

6027-1252  Node nodeName already belongs to the GPFS cluster.

Explanation: A GPFS command found that a node to be added to a GPFS cluster already belongs to the cluster.

User response: Specify a node that does not already belong to the GPFS cluster.
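For messages 6027-1235 and 6027-1236 on Linux, the portability layer must be rebuilt for the running kernel. The authoritative steps are in /usr/lpp/mmfs/src/README; the sequence below is only a commonly used sketch and may differ for your level:

   cd /usr/lpp/mmfs/src
   make Autoconfig
   make World
   make InstallImages

Alternatively, copy the mmfslinux binaries from another node that runs the identical kernel and GPFS level.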
6027-1253  Incorrect value for option option.

Explanation: The provided value for the specified option is not valid.

User response: Correct the error and reissue the command.

6027-1254  Warning: Not all nodes have proper GPFS license designations. Use the mmchlicense command to designate licenses as needed.

Explanation: Not all nodes in the cluster have valid license designations.

User response: Use mmlslicense to see the current license designations. Use mmchlicense to assign valid GPFS licenses to all nodes as needed.

6027-1255  There is nothing to commit. You must first run: command.

Explanation: You are attempting to commit an SSL private key but such a key has not been generated yet.

User response: Run the specified command to generate the public/private key pair.

6027-1256  The current authentication files are already committed.

Explanation: You are attempting to commit public/private key files that were previously generated with the mmauth command. The files have already been committed.

User response: None. Informational message.

6027-1257  There are uncommitted authentication files. You must first run: command.

Explanation: You are attempting to generate new public/private key files but previously generated files have not been committed yet.

User response: Run the specified command to commit the current public/private key pair.

6027-1258  You must establish a cipher list first. Run: command.

Explanation: You are attempting to commit an SSL private key but a cipher list has not been established yet.

User response: Run the specified command to specify a cipher list.

6027-1259  command not found. Ensure the OpenSSL code is properly installed.

Explanation: The specified command was not found.

User response: Ensure the OpenSSL code is properly installed and reissue the command.

6027-1260  File fileName does not contain any typeOfStanza stanzas.

Explanation: The input file should contain at least one specified stanza.

User response: Correct the input file and reissue the command.

6027-1261  descriptorField must be specified in descriptorType descriptor.

Explanation: A required field of the descriptor was empty. The incorrect descriptor is displayed following this message.

User response: Correct the input and reissue the command.

6027-1262  Unable to obtain the GPFS configuration file lock. Retrying ...

Explanation: A command requires the lock for the GPFS system data but was not able to obtain it.

User response: None. Informational message only.

6027-1263  Unable to obtain the GPFS configuration file lock.

Explanation: A command requires the lock for the GPFS system data but was not able to obtain it.

User response: Check the preceding messages, if any. Follow the procedure in “GPFS cluster configuration data files are locked” on page 44, and then reissue the command.

6027-1268  Missing arguments.

Explanation: A GPFS administration command received an insufficient number of arguments.

User response: Correct the command line and reissue the command.

6027-1269  The device name device starts with a slash, but not /dev/.

Explanation: The device name does not start with /dev/.

User response: Correct the device name.
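For message 6027-1254, license designations can be reviewed and corrected for the whole cluster. A sketch, where c5n92 and c5n94 are hypothetical node names:

   mmlslicense -L
   mmchlicense server --accept -N c5n92,c5n94

Quorum, manager, and NSD server nodes require a server license; the remaining nodes can be designated as clients.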
6027-1270  The device name device contains a slash, but not as its first character.

Explanation: The specified device name contains a slash, but the first character is not a slash.

User response: The device name must be an unqualified device name or an absolute device path name, for example: fs0 or /dev/fs0.

6027-1271  Unexpected error from command. Return code: value

Explanation: A GPFS administration command (mm...) received an unexpected error code from an internally called command.

User response: Perform problem determination. See “GPFS commands are unsuccessful” on page 56.

6027-1272  Unknown user name userName.

Explanation: The specified value cannot be resolved to a valid user ID (UID).

User response: Reissue the command with a valid user name.

6027-1273  Unknown group name groupName.

Explanation: The specified value cannot be resolved to a valid group ID (GID).

User response: Reissue the command with a valid group name.

6027-1274  Unexpected error obtaining the lockName lock.

Explanation: GPFS cannot obtain the specified lock.

User response: Examine any previous error messages. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1275  Daemon node adapter Node was not found on admin node Node.

Explanation: An input node descriptor was found to be incorrect. The node adapter specified for GPFS daemon communications was not found to exist on the cited GPFS administrative node.

User response: Correct the input node descriptor and reissue the command.

6027-1276  Command failed for disks: diskList.

Explanation: A GPFS command was unable to complete successfully on the listed disks.

User response: Correct the problems and reissue the command.

6027-1277  No contact nodes were provided for cluster clusterName.

Explanation: A GPFS command found that no contact nodes have been specified for the cited cluster.

User response: Use the mmremotecluster command to specify some contact nodes for the cited cluster.

6027-1278  None of the contact nodes in cluster clusterName can be reached.

Explanation: A GPFS command was unable to reach any of the contact nodes for the cited cluster.

User response: Determine why the contact nodes for the cited cluster cannot be reached and correct the problem, or use the mmremotecluster command to specify some additional contact nodes that can be reached.

6027-1287  Node nodeName returned ENODEV for disk diskName.

Explanation: The specified node returned ENODEV for the specified disk.

User response: Determine the cause of the ENODEV error for the specified disk and rectify it. The ENODEV may be due to disk fencing or the removal of a device that previously was present.

6027-1288  Remote cluster clusterName was not found.

Explanation: A GPFS command found that the cited cluster has not yet been identified to GPFS as a remote cluster.

User response: Specify a remote cluster known to GPFS, or use the mmremotecluster command to make the cited cluster known to GPFS.

6027-1289  Name name is not allowed. It contains the following invalid special character: char

Explanation: The cited name is not allowed because it contains the cited invalid special character.

User response: Specify a name that does not contain an invalid special character, and reissue the command.

6027-1290  GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data for the file system. Issue the command: mmcommon recoverfs fileSystem

Explanation: GPFS detected that the GPFS configuration database data for the specified file system may not be in agreement with the on-disk data for the file system. This may be caused by a GPFS disk
command that did not complete normally.

User response: Issue the specified command to bring the GPFS configuration database into agreement with the on-disk data.

6027-1291  Options name and name cannot be specified at the same time.

Explanation: Incompatible options were specified on the command line.

User response: Select one of the options and reissue the command.

6027-1292  The -N option cannot be used with attribute name.

Explanation: The specified configuration attribute cannot be changed on only a subset of nodes. This attribute must be the same on all nodes in the cluster.

User response: Certain attributes, such as autoload, may not be customized from node to node. Change the attribute for the entire cluster.

6027-1293  There are no remote file systems.

Explanation: A value of all was specified for the remote file system operand of a GPFS command, but no remote file systems are defined.

User response: None. There are no remote file systems on which to operate.

6027-1294  Remote file system fileSystem is not defined.

Explanation: The specified file system was used for the remote file system operand of a GPFS command, but the file system is not known to GPFS.

User response: Specify a remote file system known to GPFS.

6027-1295  The GPFS configuration information is incorrect or not available.

Explanation: A problem has been encountered while verifying the configuration information and the execution environment.

User response: Check the preceding messages for more information. Correct the problem and restart GPFS.

6027-1296  Device name cannot be 'all'.

Explanation: A device name of all was specified on a GPFS command.

User response: Reissue the command with a valid device name.

6027-1297  Each device specifies metadataOnly for disk usage. This file system could not store data.

Explanation: All disk descriptors specify metadataOnly for disk usage.

User response: Change at least one disk descriptor in the file system to indicate the usage of dataOnly or dataAndMetadata.

6027-1298  Each device specifies dataOnly for disk usage. This file system could not store metadata.

Explanation: All disk descriptors specify dataOnly for disk usage.

User response: Change at least one disk descriptor in the file system to indicate a usage of metadataOnly or dataAndMetadata.

6027-1299  Incorrect value value specified for failure group.

Explanation: The specified failure group is not valid.

User response: Correct the problem and reissue the command.

6027-1300  No file systems were found.

Explanation: A GPFS command searched for file systems, but none were found.

User response: Create a GPFS file system before reissuing the command.

6027-1301  The NSD servers specified in the disk descriptor do not match the NSD servers currently in effect.

Explanation: The set of NSD servers specified in the disk descriptor does not match the set that is currently in effect.

User response: Specify the same set of NSD servers in the disk descriptor as is currently in effect or omit it from the disk descriptor and then reissue the command. Use the mmchnsd command to change the NSD servers as needed.

6027-1302  clusterName is the name of the local cluster.

Explanation: The cited cluster name was specified as the name of a remote cluster, but it is already being used as the name of the local cluster.

User response: Use the mmchcluster command to change the name of the local cluster, and then reissue the command that failed.
6027-1303  This function is not available in the GPFS Express Edition.

Explanation: The requested function is not part of the GPFS Express Edition.

User response: Install the GPFS Standard Edition on all nodes in the cluster, and then reissue the command.

6027-1304  Missing argument after option option.

Explanation: The specified command option requires a value.

User response: Specify a value and reissue the command.

6027-1305  Prerequisite libraries not found. Ensure productName is properly installed.

Explanation: The specified software product is missing or is not properly installed.

User response: Verify that the product is installed properly.

6027-1306  Command command failed with return code value.

Explanation: A command was not successfully processed.

User response: Correct the failure specified by the command and reissue the command.

6027-1307  Disk disk on node nodeName already has a volume group vgName that does not appear to have been created by this program in a prior invocation. Correct the descriptor file or remove the volume group and retry.

Explanation: The specified disk already belongs to a volume group.

User response: Either remove the volume group or remove the disk descriptor and retry.

6027-1308  feature is not available in the GPFS Express Edition.

Explanation: The specified function or feature is not part of the GPFS Express Edition.

User response: Install the GPFS Standard Edition on all nodes in the cluster, and then reissue the command.

6027-1309  Storage pools are not available in the GPFS Express Edition.

Explanation: Support for multiple storage pools is not part of the GPFS Express Edition.

User response: Install the GPFS Standard Edition on all nodes in the cluster, and then reissue the command.

6027-1332  Cannot find disk with command.

Explanation: The specified disk cannot be found.

User response: Specify a correct disk name.

6027-1333  The following nodes could not be restored: nodeList. Correct the problems and use the mmsdrrestore command to recover these nodes.

Explanation: The mmsdrrestore command was unable to restore the configuration information for the listed nodes.

User response: Correct the problems and reissue the mmsdrrestore command for these nodes.

6027-1334  Incorrect value for option option. Valid values are: validValues.

Explanation: An incorrect argument was specified with an option requiring one of a limited number of legal options.

User response: Use one of the legal values for the indicated option.

6027-1335  Command completed: Not all required changes were made.

Explanation: Some, but not all, of the required changes were made.

User response: Examine the preceding messages, correct the problems, and reissue the command.

6027-1338  Command is not allowed for remote file systems.

Explanation: A command for which a remote file system is not allowed was issued against a remote file system.

User response: Choose a local file system, or issue the command on a node in the cluster that owns the file system.

6027-1339  Disk usage value is incompatible with storage pool name.

Explanation: A disk descriptor specified a disk usage involving metadata and a storage pool other than system.

User response: Change the descriptor's disk usage field to dataOnly, or do not specify a storage pool name.
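For message 6027-1333, once the underlying problem (node down, remote shell failure, and so on) is corrected, the configuration of the affected nodes is restored from a node that holds current data. A sketch, with c8f2n03 and c8f2n04 as hypothetical node names; confirm the options against the mmsdrrestore command description:

   mmsdrrestore -p c8f2n03 -N c8f2n04

Here -p names the node to obtain the configuration data from and -N names the nodes to be restored.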
6027-1340  File fileName not found. Recover the file or run mmauth genkey.

Explanation: The cited file was not found.

User response: Recover the file or run the mmauth genkey command to recreate it.

6027-1341  Starting force unmount of GPFS file systems

Explanation: Progress information for the mmshutdown command.

User response: None. Informational message only.

6027-1342  Unmount not finished after value seconds. Waiting value more seconds.

Explanation: Progress information for the mmshutdown command.

User response: None. Informational message only.

6027-1343  Unmount not finished after value seconds.

Explanation: Progress information for the mmshutdown command.

User response: None. Informational message only.

6027-1344  Shutting down GPFS daemons

Explanation: Progress information for the mmshutdown command.

User response: None. Informational message only.

6027-1345  Finished

Explanation: Progress information for the mmshutdown command.

User response: None. Informational message only.

6027-1347  Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id

Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.

User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1348  Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id -N nodeNameList

Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.

User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1352  fileSystem is not a remote file system known to GPFS.

Explanation: The cited file system is not the name of a remote file system known to GPFS.

User response: Use the mmremotefs command to identify the cited file system to GPFS as a remote file system, and then reissue the command that failed.

6027-1357  An internode connection between GPFS nodes was disrupted.

Explanation: An internode connection between GPFS nodes was disrupted, preventing its successful completion.

User response: Reissue the command. If the problem recurs, determine and resolve the cause of the disruption. If the problem persists, contact the IBM Support Center.

6027-1358  No clusters are authorized to access this cluster.

Explanation: Self-explanatory.

User response: This is an informational message.

6027-1359  Cluster clusterName is not authorized to access this cluster.

Explanation: Self-explanatory.

User response: This is an informational message.

6027-1361  Attention: There are no available valid VFS type values for mmfs in /etc/vfs.

Explanation: An out of range number was used as the vfs number for GPFS.

User response: The valid range is 8 through 32. Check /etc/vfs and remove unneeded entries.
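The cleanup command cited in messages 6027-1347 and 6027-1348 is issued with the NSD volume id reported in the message. For example (the volume id and node name below are placeholders only):

   mmdelnsd -p 0A0A0B5C51731739
   mmdelnsd -p 0A0A0B5C51731739 -N c25m4n07

The -N form is used when the volume id must be cleared from a disk that is accessible only from particular nodes.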
6027-1362  There are no remote cluster definitions.

Explanation: A value of all was specified for the remote cluster operand of a GPFS command, but no remote clusters are defined.

User response: None. There are no remote clusters on which to operate.

6027-1363  Remote cluster clusterName is not defined.

Explanation: The specified cluster was specified for the remote cluster operand of a GPFS command, but the cluster is not known to GPFS.

User response: Specify a remote cluster known to GPFS.

6027-1364  No disks specified

Explanation: There were no disks in the descriptor list or file.

User response: Specify at least one disk.

6027-1365  Disk diskName already belongs to file system fileSystem.

Explanation: The specified disk name is already assigned to a GPFS file system. This may be because the disk was specified more than once as input to the command, or because the disk was assigned to a GPFS file system in the past.

User response: Specify the disk only once as input to the command, or specify a disk that does not belong to a file system.

6027-1366  File system fileSystem has some disks that are in a non-ready state.

Explanation: The specified file system has some disks that are in a non-ready state.

User response: Run mmcommon recoverfs fileSystem to ensure that the GPFS configuration data for the file system is current. If some disks are still in a non-ready state, display the states of the disks in the file system using the mmlsdisk command. Any disks in an undesired non-ready state should be brought into the ready state by using the mmchdisk command or by mounting the file system. If these steps do not bring the disks into the ready state, use the mmdeldisk command to delete the disks from the file system.

6027-1367  Attention: Not all disks were marked as available.

Explanation: The process of marking the disks as available could not be completed.

User response: Before adding these disks to a GPFS file system, you should either reformat them, or use the -v no option on the mmcrfs or mmadddisk command.

6027-1368  This GPFS cluster contains declarations for remote file systems and clusters. You cannot delete the last node.

Explanation: An attempt has been made to delete a GPFS cluster that still has declarations for remote file systems and clusters.

User response: Before deleting the last node of a GPFS cluster, delete all remote cluster and file system information. Use the delete option of the mmremotecluster and mmremotefs commands.

6027-1370  The following nodes could not be reached:

Explanation: A GPFS command was unable to communicate with one or more nodes in the cluster. A list of the nodes that could not be reached follows.

User response: Determine why the reported nodes could not be reached and resolve the problem.

6027-1371  Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.

Explanation: A process is initiated to distribute the cluster configuration data to other nodes in the cluster.

User response: This is an informational message. The command does not wait for the distribution to finish.

6027-1373  There is no file system information in input file fileName.

Explanation: The cited input file passed to the mmimportfs command contains no file system information. No file system can be imported.

User response: Reissue the mmimportfs command while specifying a valid input file.

6027-1374  File system fileSystem was not found in input file fileName.

Explanation: The specified file system was not found in the input file passed to the mmimportfs command. The file system cannot be imported.

User response: Reissue the mmimportfs command while specifying a file system that exists in the input file.

6027-1375  The following file systems were not imported: fileSystem.

Explanation: The mmimportfs command was unable to import one or more of the file systems in the input
file. A list of the file systems that could not be imported follows.

User response: Examine the preceding messages, rectify the problems that prevented the importation of the file systems, and reissue the mmimportfs command.

6027-1377  Attention: Unknown attribute specified: name. Press the ENTER key to continue.

Explanation: The mmchconfig command received an unknown attribute.

User response: Unless directed otherwise by the IBM Support Center, press any key to bypass this attribute.

6027-1378  Incorrect record found in the mmsdrfs file (code value):

Explanation: A line that is not valid was detected in the main GPFS cluster configuration file /var/mmfs/gen/mmsdrfs.

User response: The data in the cluster configuration file is incorrect. If no user modifications have been made to this file, contact the IBM Support Center. If user modifications have been made, correct these modifications.

6027-1379  There is no file system with mount point mountpoint.

Explanation: No file system in the GPFS cluster has the specified mount point.

User response: Reissue the command with a valid file system.

6027-1380  File system fileSystem is already mounted at mountpoint.

Explanation: The specified file system is mounted at a mount point different than the one requested on the mmmount command.

User response: Unmount the file system and reissue the command.

6027-1381  Mount point cannot be specified when mounting all file systems.

Explanation: A device name of all and a mount point were specified on the mmmount command.

User response: Reissue the command with a device name for a single file system or do not specify a mount point.

6027-1382  This node does not belong to a GPFS cluster.

Explanation: The specified node does not appear to belong to a GPFS cluster, or the GPFS configuration information on the node has been lost.

User response: Informational message. If you suspect that there is corruption of the GPFS configuration information, recover the data following the procedures outlined in “Recovery from loss of GPFS cluster configuration data file” on page 45.

6027-1383  There is no record for this node in file fileName. Either the node is not part of the cluster, the file is for a different cluster, or not all of the node's adapter interfaces have been activated yet.

Explanation: The mmsdrrestore command cannot find a record for this node in the specified cluster configuration file. The search of the file is based on the currently active IP addresses of the node as reported by the ifconfig command.

User response: Ensure that all adapter interfaces are properly functioning. Ensure that the correct GPFS configuration file is specified on the command line. If the node indeed is not a member of the cluster, use the mmaddnode command instead.

6027-1386  Unexpected value for Gpfs object: value.

Explanation: A function received a value that is not allowed for the Gpfs object.

User response: Perform problem determination.

6027-1388  File system fileSystem is not known to the GPFS cluster.

Explanation: The file system was not found in the GPFS cluster.

User response: If the file system was specified as part of a GPFS command, reissue the command with a valid file system.

6027-1390  Node node does not belong to the GPFS cluster, or was specified as input multiple times.

Explanation: Nodes that are not valid were specified.

User response: Verify the list of nodes. All specified nodes must belong to the GPFS cluster, and each node can be specified only once.
6027-1393 Incorrect node designation specified: type.

Explanation: A node designation that is not valid was specified. Valid values are client or manager.

User response: Correct the command line and reissue the command.

6027-1394 Operation not allowed for the local cluster.

Explanation: The requested operation cannot be performed for the local cluster.

User response: Specify the name of a remote cluster.

6027-1450 Could not allocate storage.

Explanation: Sufficient memory cannot be allocated to run the mmsanrepairfs command.

User response: Increase the amount of memory available.

| 6027-1500 [E] Open devicetype device failed with error:

Explanation: The "open" of a device failed. Operation of the file system may continue unless this device is needed for operation. If this is a replicated disk device, it will often not be needed. If this is a block or character device for another subsystem (such as /dev/VSD0) then GPFS will discontinue operation.

User response: Problem diagnosis will depend on the subsystem that the device belongs to. For instance device "/dev/VSD0" belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation. If this is a normal disk device then take needed repair action on the specified disk.

| 6027-1501 [X] Volume label of disk name is name, should be uid.

Explanation: The UID in the disk descriptor does not match the expected value from the file system descriptor. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.

User response: Check the disk configuration.

| 6027-1502 [X] Volume label of disk diskName is corrupt.

Explanation: The disk descriptor has a bad magic number, version, or checksum. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.

User response: Check the disk configuration.

6027-1503 Completed adding disks to file system fileSystem.

Explanation: The mmadddisk command successfully completed.

User response: None. Informational message only.

6027-1504 File name could not be run with err error.

Explanation: A failure occurred while trying to run an external program.

User response: Make sure the file exists. If it does, check its access permissions.

6027-1505 Could not get minor number for name.

Explanation: Could not obtain a minor number for the specified block or character device.

User response: Problem diagnosis will depend on the subsystem that the device belongs to. For example, device /dev/VSD0 belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation.

6027-1507 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.

Explanation: A READ_KEYS ioctl call failed with the errno= and related values shown.

User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

6027-1508 Registration failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.

Explanation: A REGISTER ioctl call failed with the errno= and related values shown.

User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.
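When a READ_KEYS or REGISTER ioctl failure such as 6027-1507 or 6027-1508 persists, it can help to inspect the SCSI-3 persistent reservation state of the disk directly before contacting support. The following sketch uses the generic sg_persist utility from the sg3_utils package, which is not part of GPFS; /dev/sdX is a placeholder for the affected NSD's local device:

   sg_persist --in --read-keys /dev/sdX          # list the registration keys currently on the disk
   sg_persist --in --read-reservation /dev/sdX   # show the current reservation holder and type, if any

If these commands themselves fail, the problem is more likely in the device path or SCSI transport than in GPFS.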
6027-1509 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.

Explanation: A READRES ioctl call failed with the errno= and related values shown.

User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

| 6027-1510 [E] Error mounting file system stripeGroup on mountPoint; errorQualifier (gpfsErrno)

Explanation: An error occurred while attempting to mount a GPFS file system on Windows.

User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

| 6027-1511 [E] Error unmounting file system stripeGroup; errorQualifier (gpfsErrno)

Explanation: An error occurred while attempting to unmount a GPFS file system on Windows.

User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

| 6027-1512 [E] WMI query for queryType failed; errorQualifier (gpfsErrno)

Explanation: An error occurred while running a WMI query on Windows.

User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

6027-1513 DiskName is not an sg device, or sg driver is older than sg3

Explanation: The disk is not a SCSI disk, or supports SCSI standard older than SCSI 3.

User response: Correct the command invocation and try again.

6027-1514 ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1515 READ KEY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1516 REGISTER ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1517 READ RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1518 RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1519 INQUIRY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
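For message 6027-1513, it can be useful to confirm that the disk actually has an sg device node and to check the version of the sg driver before correcting the command invocation. This sketch uses the generic lsscsi utility and the Linux sg driver's procfs entry, neither of which is a GPFS command:

   lsscsi -g                    # map SCSI disks to their /dev/sg* nodes
   cat /proc/scsi/sg/version    # report the loaded sg driver version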
6027-1520 PREEMPT ABORT ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1521 Can not find register key registerKeyValue at device diskName.

Explanation: Unable to find given register key at the disk.

User response: Correct the problem and reissue the command.

6027-1522 CLEAR ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.

Explanation: An ioctl call failed with stated return code, errno value, and related values.

User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1523 Disk name longer than value is not allowed.

Explanation: The specified disk name is too long.

User response: Reissue the command with a valid disk name.

| 6027-1524 The READ_KEYS ioctl data does not contain the key that was passed as input.

Explanation: A REGISTER ioctl call apparently succeeded, but when the device was queried for the key, the key was not found.

User response: Check the device subsystem and try to correct the problem. If the problem persists, contact the IBM Support Center.

6027-1530 Attention: parameter is set to value.

Explanation: A configuration parameter is temporarily assigned a new value.

User response: Check the mmfs.cfg file. Use the mmchconfig command to set a valid value for the parameter.

6027-1531 parameter value

Explanation: The configuration parameter was changed from its default value.

User response: Check the mmfs.cfg file.

6027-1532 Attention: parameter (value) is not valid in conjunction with parameter (value).

Explanation: A configuration parameter has a value that is not valid in relation to some other parameter. This can also happen when the default value for some parameter is not sufficiently large for the new, user set value of a related parameter.

User response: Check the mmfs.cfg file.

6027-1533 parameter cannot be set dynamically.

Explanation: The mmchconfig command encountered a configuration parameter that cannot be set dynamically.

User response: Check the mmchconfig command arguments. If the parameter must be changed, use the mmshutdown, mmchconfig, and mmstartup sequence of commands.

6027-1534 parameter must have a value.

Explanation: The tsctl command encountered a configuration parameter that did not have a specified value.

User response: Check the mmchconfig command arguments.

6027-1535 Unknown config name: parameter

Explanation: The tsctl command encountered an unknown configuration parameter.

User response: Check the mmchconfig command arguments.

6027-1536 parameter must be set using the tschpool command.

Explanation: The tsctl command encountered a configuration parameter that must be set using the tschpool command.

User response: Check the mmchconfig command arguments.
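Messages 6027-1530 through 6027-1536 all point back at the mmchconfig workflow. As a minimal sketch for a parameter that cannot be set dynamically (message 6027-1533), assuming the affected nodes are listed in a node specification; the attribute name, value, and node list shown here are placeholders only:

   mmlsconfig                            # review the current configuration values
   mmshutdown -N affectedNodes           # stop GPFS on the nodes that must pick up the change
   mmchconfig someParameter=newValue     # change the parameter while the daemon is down
   mmstartup -N affectedNodes            # restart GPFS; the new value is read at daemon start

For parameters that can be changed dynamically, mmchconfig with the -i or -I option applies the change without a restart; see messages 6027-1559 and 6027-1625 for the cases where those options cannot be used.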
| 6027-1537 [E] Connect failed to ipAddress: reason

Explanation: An attempt to connect sockets between nodes failed.

User response: Check the reason listed and the connection to the indicated IP address.

| 6027-1538 [I] Connect in progress to ipAddress

Explanation: Connecting sockets between nodes.

User response: None. Informational message only.

| 6027-1539 [E] Connect progress select failed to ipAddress: reason

Explanation: An attempt to connect sockets between nodes failed.

User response: Check the reason listed and the connection to the indicated IP address.

| 6027-1540 [A] Try and buy license has expired!

Explanation: Self-explanatory.

User response: Purchase a GPFS license to continue using GPFS.

| 6027-1541 [N] Try and buy license expires in number days.

Explanation: Self-explanatory.

User response: When the Try and Buy license expires, you will need to purchase a GPFS license to continue using GPFS.

| 6027-1542 [A] Old shared memory exists but it is not valid nor cleanable.

Explanation: A new GPFS daemon started and found existing shared segments. The contents were not recognizable, so the GPFS daemon could not clean them up.

User response:
1. Stop the GPFS daemon from trying to start by issuing the mmshutdown command for the nodes having the problem.
2. Find the owner of the shared segments with keys from 0x9283a0ca through 0x9283a0d1. If a non-GPFS program owns these segments, GPFS cannot run on this node.
3. If these segments are left over from a previous GPFS daemon:
   a. Remove them by issuing:
      ipcrm -m shared_memory_id
   b. Restart GPFS by issuing the mmstartup command on the affected nodes.

6027-1543 error propagating parameter.

Explanation: mmfsd could not propagate a configuration parameter value to one or more nodes in the cluster.

User response: Contact the IBM Support Center.

| 6027-1544 [W] Sum of prefetchthreads(value), worker1threads(value) and nsdMaxWorkerThreads (value) exceeds value. Reducing them to value, value and value.

Explanation: The sum of prefetchthreads, worker1threads, and nsdMaxWorkerThreads exceeds the permitted value.

User response: Accept the calculated values or reduce the individual settings using mmchconfig prefetchthreads=newvalue, mmchconfig worker1threads=newvalue, or mmchconfig nsdMaxWorkerThreads=newvalue. After using mmchconfig, the new settings will not take effect until the GPFS daemon is restarted.

| 6027-1545 [A] The GPFS product that you are attempting to run is not a fully functioning version. This probably means that this is an update version and not the full product version. Install the GPFS full product version first, then apply any applicable update version before attempting to start GPFS.

Explanation: GPFS requires a fully licensed GPFS installation.

User response: Verify installation of licensed GPFS, or purchase and install a licensed version of GPFS.

| 6027-1546 [W] Attention: parameter size of value is too small. New value is value.

Explanation: A configuration parameter is temporarily assigned a new value.

User response: Check the mmfs.cfg file. Use the mmchconfig command to set a valid value for the parameter.

| 6027-1547 [A] Error initializing daemon: performing shutdown

Explanation: GPFS kernel extensions are not loaded, and the daemon cannot initialize. GPFS may have been started incorrectly.

User response: Check the GPFS log for errors resulting from kernel extension loading. Ensure that GPFS is started with the mmstartup command.
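For the cleanup described under message 6027-1542, the sequence on a single affected node might look like the following sketch. The segment identifier is a placeholder for the id reported by ipcs, and ipcs and ipcrm are standard operating system utilities rather than GPFS commands:

   mmshutdown                     # stop GPFS on this node so it stops trying to start
   ipcs -m                        # list shared memory segments; look for keys 0x9283a0ca through 0x9283a0d1
   ipcrm -m shared_memory_id      # remove a leftover GPFS segment by the id reported by ipcs
   mmstartup                      # restart GPFS on this node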
| 6027-1548 [A] Error: daemon and kernel extension do not match.

Explanation: The GPFS kernel extension loaded in memory and the daemon currently starting do not appear to have come from the same build.

User response: Ensure that the kernel extension was reloaded after upgrading GPFS. See “GPFS modules cannot be loaded on Linux” on page 46 for details.

| 6027-1549 [A] Attention: custom-built kernel extension; the daemon and kernel extension do not match.

Explanation: The GPFS kernel extension loaded in memory does not come from the same build as the starting daemon. The kernel extension appears to have been built from the kernel open source package.

User response: None.

| 6027-1550 [W] Error: Unable to establish a session with an Active Directory server. ID remapping via Microsoft Identity Management for Unix will be unavailable.

Explanation: GPFS tried to establish an LDAP session with an Active Directory server (normally the domain controller host), and has been unable to do so.

User response: Ensure the domain controller is available.

6027-1555 Mount point and device name cannot be equal: name

Explanation: The specified mount point is the same as the absolute device name.

User response: Enter a new device name or absolute mount point path name.

6027-1556 Interrupt received.

Explanation: A GPFS administration command received an interrupt.

User response: None. Informational message only.

6027-1557 You must first generate an authentication key file. Run: mmauth genkey new.

Explanation: Before setting a cipher list, you must generate an authentication key file.

User response: Run the specified command to establish an authentication key for the nodes in the cluster.

6027-1559 The -i option failed. Changes will take effect after GPFS is restarted.

Explanation: The -i option on the mmchconfig command failed. The changes were processed successfully, but will take effect only after the GPFS daemons are restarted.

User response: Check for additional error messages. Correct the problem and reissue the command.

6027-1560 This GPFS cluster contains file systems. You cannot delete the last node.

Explanation: An attempt has been made to delete a GPFS cluster that still has one or more file systems associated with it.

User response: Before deleting the last node of a GPFS cluster, delete all file systems that are associated with it. This applies to both local and remote file systems.

6027-1561 Attention: Failed to remove node-specific changes.

Explanation: The internal mmfixcfg routine failed to remove node-specific configuration settings, if any, for one or more of the nodes being deleted. This is of consequence only if the mmchconfig command was indeed used to establish node specific settings and these nodes are later added back into the cluster.

User response: If you add the nodes back later, ensure that the configuration parameters for the nodes are set as desired.

6027-1562 command command cannot be executed. Either none of the nodes in the cluster are reachable, or GPFS is down on all of the nodes.

Explanation: The command that was issued needed to perform an operation on a remote node, but none of the nodes in the cluster were reachable, or GPFS was not accepting commands on any of the nodes.

User response: Ensure that the affected nodes are available and all authorization requirements are met. Correct any problems and reissue the command.

6027-1563 Attention: The file system may no longer be properly balanced.

Explanation: The restripe phase of the mmadddisk or mmdeldisk command failed.

User response: Determine the cause of the failure and run the mmrestripefs -b command.
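As a follow-up to message 6027-1563, rebalancing can be run once the underlying disk problem has been corrected. A minimal sketch, in which fs1 is a placeholder file system name:

   mmdf fs1               # check how data is currently distributed across the disks
   mmrestripefs fs1 -b    # rebalance the file system across all of its disks

A restripe generates significant I/O, so running it during a period of low activity is usually preferable.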
6027-1564 To change the authentication key for the 6027-1571 commandName does not exist or failed;
local cluster, run: mmauth genkey. automount mounting may not work.
Explanation: The authentication keys for the local Explanation: One or more of the GPFS file systems
cluster must be created only with the specified were defined with the automount attribute but the
command. requisite automount command is missing or failed.
User response: Run the specified command to User response: Correct the problem and restart GPFS.
establish a new authentication key for the nodes in the Or use the mount command to explicitly mount the file
cluster. system.
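Messages 6027-1557 and 6027-1564 both point at mmauth genkey. A minimal sketch of generating a key for the local cluster before setting a cipher list; the mmauth show step is included here only to confirm the result:

   mmauth genkey new     # generate a new authentication key file for the local cluster
   mmauth show all       # display the authentication settings now in effect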

6027-1565 disk not found in file system fileSystem. 6027-1572 The command must run on a node that
is part of the cluster.
Explanation: A disk specified for deletion or
replacement does not exist. Explanation: The node running the mmcrcluster
command (this node) must be a member of the GPFS
User response: Specify existing disks for the indicated
cluster.
file system.
User response: Issue the command from a node that
will belong to the cluster.
6027-1566 Remote cluster clusterName is already
defined.
6027-1573 Command completed: No changes made.
Explanation: A request was made to add the cited
cluster, but the cluster is already known to GPFS. Explanation: Informational message.
User response: None. The cluster is already known to User response: Check the preceding messages, correct
GPFS. any problems, and reissue the command.

6027-1567 fileSystem from cluster clusterName is 6027-1574 Permission failure. The command
already defined. requires root authority to execute.
Explanation: A request was made to add the cited file Explanation: The command, or the specified
system from the cited cluster, but the file system is command option, requires root authority.
already known to GPFS.
User response: Log on as root and reissue the
User response: None. The file system is already command.
known to GPFS.
6027-1578 File fileName does not contain node
6027-1568 command command failed. Only names.
parameterList changed.
Explanation: The specified file does not contain valid
Explanation: The mmchfs command failed while node names.
making the requested changes. Any changes to the
User response: Node names must be specified one per
attributes in the indicated parameter list were
line. The name localhost and lines that start with '#'
successfully completed. No other file system attributes
character are ignored.
were changed.
User response: Reissue the command if you want to
6027-1579 File fileName does not contain data.
change additional attributes of the file system. Changes
can be undone by issuing the mmchfs command with Explanation: The specified file does not contain data.
the original value for the affected attribute.
User response: Verify that you are specifying the
correct file name and reissue the command.
6027-1570 virtual shared disk support is not
installed.
6027-1587 Unable to determine the local device
Explanation: The command detected that IBM Virtual name for disk nsdName on node
Shared Disk support is not installed on the node on nodeName.
which it is running.
Explanation: GPFS was unable to determine the local
User response: Install IBM Virtual Shared Disk device name for the specified GPFS disk.
support.
User response: Determine why the specified disk on
the specified node could not be accessed and correct
the problem. Possible reasons include: connectivity

problems, authorization problems, fenced disk, and so


6027-1595 No nodes were found that matched the
forth.
input specification.
Explanation: No nodes were found in the GPFS
6027-1588 Unknown GPFS execution environment:
cluster that matched those specified as input to a GPFS
value
command.
Explanation: A GPFS administration command
User response: Determine why the specified nodes
(prefixed by mm) was asked to operate on an unknown
were not valid, correct the problem, and reissue the
GPFS cluster type. The only supported GPFS cluster
GPFS command.
type is lc. This message may also be generated if there
is corruption in the GPFS system files.
6027-1596 The same node was specified for both
User response: Verify that the correct level of GPFS is
the primary and the secondary server.
installed on the node. If this is a cluster environment,
make sure the node has been defined as a member of Explanation: A command would have caused the
the GPFS cluster with the help of the mmcrcluster or primary and secondary GPFS cluster configuration
the mmaddnode command. If the problem persists, server nodes to be the same.
contact the IBM Support Center.
User response: Specify a different primary or
secondary node.
6027-1590 nodeName cannot be reached.
Explanation: A command needs to issue a remote 6027-1597 Node node is specified more than once.
function on a particular node but the node is not
Explanation: The same node appears more than once
reachable.
on the command line or in the input file for the
User response: Determine why the node is command.
unreachable, correct the problem, and reissue the
User response: All specified nodes must be unique.
command.
Note that even though two node identifiers may appear
different on the command line or in the input file, they
6027-1591 Attention: Unable to retrieve GPFS may still refer to the same node.
cluster files from node nodeName
Explanation: A command could not retrieve the GPFS 6027-1598 Node nodeName was not added to the
cluster files from a particular node. An attempt will be cluster. The node appears to already
made to retrieve the GPFS cluster files from a backup belong to a GPFS cluster.
node.
Explanation: A GPFS cluster command found that a
User response: None. Informational message only. node to be added to a cluster already has GPFS cluster
files on it.
6027-1592 Unable to retrieve GPFS cluster files User response: Use the mmlscluster command to
from node nodeName verify that the node is in the correct cluster. If it is not,
follow the procedure in “Node cannot be added to the
Explanation: A command could not retrieve the GPFS
GPFS cluster” on page 54.
cluster files from a particular node.
User response: Correct the problem and reissue the
6027-1599 The level of GPFS on node nodeName
command.
does not support the requested action.
Explanation: A GPFS command found that the level of
6027-1594 Run the command command until
the GPFS code on the specified node is not sufficient
successful.
for the requested action.
Explanation: The command could not complete
User response: Install the correct level of GPFS.
normally. The GPFS cluster data may be left in a state
that precludes normal operation until the problem is
corrected. 6027-1600 Make sure that the following nodes are
available: nodeList
User response: Check the preceding messages, correct
the problems, and issue the specified command until it Explanation: A GPFS command was unable to
completes successfully. complete because nodes critical for the success of the
operation were not reachable or the command was
interrupted.
User response: This message will normally be

followed by a message telling you which command to


6027-1614 Cannot open file fileName. Error string
issue as soon as the problem is corrected and the
was: errorString.
specified nodes become available.
Explanation: The mmdsh command was unable to
successfully open a file.
6027-1602 nodeName is not a member of this
cluster. User response: Determine why the file could not be
opened and correct the problem.
Explanation: A command found that the specified
node is not a member of the GPFS cluster.
6027-1615 nodeName remote shell process had
User response: Correct the input or add the node to
return code value.
the GPFS cluster and reissue the command.
Explanation: A child remote shell process completed
with a nonzero return code.
6027-1603 The following nodes could not be added
to the GPFS cluster: nodeList. Correct the User response: Determine why the child remote shell
problems and use the mmaddnode process failed and correct the problem.
command to add these nodes to the
cluster.
6027-1616 Caught SIG signal - terminating the
Explanation: The mmcrcluster or the mmaddnode child processes.
command was unable to add the listed nodes to a
Explanation: The mmdsh command has received a
GPFS cluster.
signal causing it to terminate.
User response: Correct the problems and add the
User response: Determine what caused the signal and
nodes to the cluster using the mmaddnode command.
correct the problem.

6027-1604 Information cannot be displayed. Either


6027-1617 There are no available nodes on which
none of the nodes in the cluster are
to run the command.
reachable, or GPFS is down on all of the
nodes. Explanation: The mmdsh command found that there
are no available nodes on which to run the specified
Explanation: The command needed to perform an
command. Although nodes were specified, none of the
operation on a remote node, but none of the nodes in
nodes were reachable.
the cluster were reachable, or GPFS was not accepting
commands on any of the nodes. User response: Determine why the specified nodes
were not available and correct the problem.
User response: Ensure that the affected nodes are
available and all authorization requirements are met.
Correct any problems and reissue the command. 6027-1618 Unable to pipe. Error string was:
errorString.
6027-1610 Disk diskName is the only disk in file Explanation: The mmdsh command attempted to
system fileSystem. You cannot replace a open a pipe, but the pipe command failed.
disk when it is the only remaining disk
in the file system. User response: Determine why the call to pipe failed
and correct the problem.
Explanation: The mmrpldisk command was issued,
but there is only one disk in the file system.
User response: Add a second disk and reissue the command.

6027-1613 WCOLL (working collective) environment variable not set.

Explanation: The mmdsh command was invoked without explicitly specifying the nodes on which the command is to run by means of the -F or -L options, and the WCOLL environment variable has not been set.

User response: Change the invocation of the mmdsh command to use the -F or -L options, or set the WCOLL environment variable before invoking the mmdsh command.

6027-1619 Unable to redirect outputStream. Error string was: string.

Explanation: The mmdsh command attempted to redirect an output stream using open, but the open command failed.

User response: Determine why the call to open failed and correct the problem.

6027-1623 command: Mounting file systems ...

Explanation: This message contains progress information about the mmmount command.

User response: None. Informational message only.
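For message 6027-1613, either pass the node list to mmdsh explicitly or export WCOLL first. A minimal sketch, assuming a scratch node file /tmp/worknodes with one node name per line; the node names and the use of -F with a file name follow the options named in the message text and are otherwise placeholders:

   printf 'node1\nnode2\n' > /tmp/worknodes   # one node name per line
   export WCOLL=/tmp/worknodes                # mmdsh reads its working collective from WCOLL
   mmdsh date                                 # or specify the nodes directly: mmdsh -F /tmp/worknodes date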
6027-1625 option cannot be used with attribute name.

Explanation: An attempt was made to change a configuration attribute and requested the change to take effect immediately (-i or -I option). However, the specified attribute does not allow the operation.

User response: If the change must be made now, leave off the -i or -I option. Then recycle the nodes to pick up the new value.

6027-1630 The GPFS cluster data on nodeName is back level.

Explanation: A GPFS command attempted to commit changes to the GPFS cluster configuration data, but the data on the server is already at a higher level. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment, or if the mmchcluster command did not complete successfully.

User response: Correct any problems and reissue the command. If the problem persists, issue the mmrefresh -f -a command.
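When message 6027-1630 (or 6027-1632, which has the same remedy) persists after the underlying problem has been corrected, the GPFS configuration files can be rebuilt on all nodes with mmrefresh, as the user response states. A minimal sketch:

   mmrefresh -f -a    # force a rebuild of the GPFS configuration files on all nodes in the cluster
   mmlscluster        # verify that the cluster configuration is now consistent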
6027-1626 Command is not supported in the type
environment.
6027-1631 The commit process failed.
Explanation: A GPFS administration command (mm...)
is not supported in the specified environment. Explanation: A GPFS administration command (mm...)
cannot commit its changes to the GPFS cluster
User response: Verify if the task is needed in this
configuration data.
environment, and if it is, use a different command.
User response: Examine the preceding messages,
correct the problem, and reissue the command. If the
6027-1627 The following nodes are not aware of
problem persists, perform problem determination and
the configuration server change: nodeList.
contact the IBM Support Center.
Do not start GPFS on the above nodes
until the problem is resolved.
6027-1632 The GPFS cluster configuration data on
Explanation: The mmchcluster command could not
nodeName is different than the data on
propagate the new cluster configuration servers to the
nodeName.
specified nodes.
Explanation: The GPFS cluster configuration data on
User response: Correct the problems and run the
the primary cluster configuration server node is
mmchcluster -p LATEST command before starting
different than the data on the secondary cluster
GPFS on the specified nodes.
configuration server node. This can happen if the GPFS
cluster configuration files were altered outside the
6027-1628 Cannot determine basic environment GPFS environment or if the mmchcluster command did
information. Not enough nodes are not complete successfully.
available.
User response: Correct any problems and issue the
Explanation: The mmchcluster command was unable mmrefresh -f -a command. If the problem persists,
to retrieve the GPFS cluster data files. Usually, this is perform problem determination and contact the IBM
due to too few nodes being available. Support Center.

User response: Correct any problems and ensure that


as many of the nodes in the cluster are available as 6027-1633 Failed to create a backup copy of the
possible. Reissue the command. If the problem persists, GPFS cluster data on nodeName.
record the above information and contact the IBM
Explanation: Commit could not create a correct copy
Support Center.
of the GPFS cluster configuration data.
User response: Check the preceding messages, correct
| 6027-1629 Error found while checking node
any problems, and reissue the command. If the
| descriptor descriptor
problem persists, perform problem determination and
| Explanation: A node descriptor was found to be contact the IBM Support Center.
| unsatisfactory in some way.
| User response: Check the preceding messages, if any, 6027-1634 The GPFS cluster configuration server
| and correct the condition that caused the disk node nodeName cannot be removed.
| descriptor to be rejected. Explanation: An attempt was made to delete a GPFS
cluster configuration server node.
User response: You cannot remove a cluster
configuration server node unless all nodes in the GPFS
cluster are being deleted. Before deleting a cluster
configuration server node, you must use the

mmchcluster command to transfer its function to


6027-1644 Attention: The number of quorum
another node in the GPFS cluster.
nodes exceeds the suggested maximum
(number).
6027-1636 Error found while checking disk
Explanation: The number of quorum nodes in the
descriptor descriptor
cluster exceeds the maximum suggested number of
Explanation: A disk descriptor was found to be quorum nodes.
unsatisfactory in some way.
User response: Informational message. Consider
User response: Check the preceding messages, if any, reducing the number of quorum nodes to the
and correct the condition that caused the disk maximum suggested number of quorum nodes for
descriptor to be rejected. improved performance.

6027-1637 command quitting. None of the specified 6027-1645 Node nodeName is fenced out from disk
nodes are valid. diskName.

Explanation: A GPFS command found that none of Explanation: A GPFS command attempted to access
the specified nodes passed the required tests. the specified disk, but found that the node attempting
the operation was fenced out from the disk.
User response: Determine why the nodes were not
accepted, fix the problems, and reissue the command. User response: Check whether there is a valid reason
why the node should be fenced out from the disk. If
there is no such reason, unfence the disk and reissue
6027-1638 Command: There are no unassigned the command.
nodes in the cluster.
Explanation: A GPFS command in a cluster 6027-1647 Unable to find disk with NSD volume
environment needs unassigned nodes, but found there id NSD volume id.
are none.
Explanation: A disk with the specified NSD volume id
User response: Verify whether there are any cannot be found.
unassigned nodes in the cluster. If there are none,
either add more nodes to the cluster using the User response: Specify a correct disk NSD volume id.
mmaddnode command, or delete some nodes from the
cluster using the mmdelnode command, and then
6027-1648 GPFS was unable to obtain a lock from
reissue the command.
node nodeName.
Explanation: GPFS failed in its attempt to get a lock
6027-1639 Command failed. Examine previous
from another node in the cluster.
error messages to determine cause.
User response: Verify that the reported node is
Explanation: A GPFS command failed due to
reachable. Examine previous error messages, if any. Fix
previously-reported errors.
the problems and then reissue the command.
User response: Check the previous error messages, fix
the problems, and then reissue the command. If no
6027-1661 Failed while processing disk descriptor
other messages are shown, examine the GPFS log files
descriptor on node nodeName.
in the /var/adm/ras directory on each node.
Explanation: A disk descriptor was found to be
unsatisfactory in some way.
6027-1642 command: Starting GPFS ...
User response: Check the preceding messages, if any,
Explanation: Progress information for the mmstartup
and correct the condition that caused the disk
command.
descriptor to be rejected.
User response: None. Informational message only.
6027-1662 Disk device deviceName refers to an
6027-1643 The number of quorum nodes exceeds existing NSD name
the maximum (number) allowed.
Explanation: The specified disk device refers to an
Explanation: An attempt was made to add more existing NSD.
quorum nodes to a cluster than the maximum number
User response: Specify another disk that is not an
allowed.
existing NSD.
User response: Reduce the number of quorum nodes,
and reissue the command.

6027-1663 Disk descriptor descriptor should refer to 6027-1677 Disk diskName is of an unknown type.
an existing NSD. Use mmcrnsd to create
Explanation: The specified disk is of an unknown
the NSD.
type.
Explanation: An NSD disk given as input is not
User response: Specify a disk whose type is
known to GPFS.
recognized by GPFS.
User response: Create the NSD. Then rerun the
command.
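Message 6027-1663 asks for the NSD to be created first. A minimal sketch of an NSD stanza file and the mmcrnsd call; the device path, NSD name, server names, and file name are placeholders only:

   # contents of /tmp/nsd.stanza
   %nsd:
     device=/dev/sdb
     nsd=nsd1
     servers=nodeA,nodeB
     usage=dataAndMetadata

   mmcrnsd -F /tmp/nsd.stanza   # define the NSD, then rerun the failing command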
6027-1680 Disk name diskName is already
registered for use by GPFS.
6027-1664 command: Processing node nodeName
Explanation: The cited disk name was specified for
Explanation: Progress information. use by GPFS, but there is already a disk by that name
registered for use by GPFS.
User response: None. Informational message only.
User response: Specify a different disk name for use
by GPFS and reissue the command.
6027-1665 Issue the command from a node that
remains in the cluster.
6027-1681 Node nodeName is being used as an NSD
Explanation: The nature of the requested change
server.
requires the command be issued from a node that will
remain in the cluster. Explanation: The specified node is defined as a server
node for some disk.
User response: Run the command from a node that
will remain in the cluster. User response: If you are trying to delete the node
from the GPFS cluster, you must either delete the disk
or define another node as its server.
| 6027-1666 [I] No disks were found.
Explanation: A command searched for disks but
6027-1685 Processing continues without lock
found none.
protection.
User response: If disks are desired, create some using
Explanation: The command will continue processing
the mmcrnsd command.
although it was not able to obtain the lock that
prevents other GPFS commands from running
6027-1670 Incorrect or missing remote shell simultaneously.
command: name
User response: Ensure that no other GPFS command
Explanation: The specified remote command does not is running. See the command documentation for
exist or is not executable. additional details.

User response: Specify a valid command.


6027-1688 Command was unable to obtain the lock
for the GPFS system data. Unable to
6027-1671 Incorrect or missing remote file copy reach the holder of the lock nodeName.
command: name Check the preceding messages, if any.
Explanation: The specified remote command does not Follow the procedure outlined in the
exist or is not executable. GPFS: Problem Determination Guide.

User response: Specify a valid command. Explanation: A command requires the lock for the
GPFS system data but was not able to obtain it.

6027-1672 option value parameter must be an User response: Check the preceding messages, if any.
absolute path name. Follow the procedure in the GPFS: Problem
Determination Guide for what to do when the GPFS
Explanation: The mount point does not begin with '/'. system data is locked. Then reissue the command.
User response: Specify the full path for the mount
point. 6027-1689 vpath disk diskName is not recognized as
an IBM SDD device.
6027-1674 command: Unmounting file systems ... Explanation: The mmvsdhelper command found that
Explanation: This message contains progress the specified disk is a vpath disk, but it is not
information about the mmumount command. recognized as an IBM SDD device.

User response: None. Informational message only. User response: Ensure the disk is configured as an

IBM SDD device. Then reissue the command. as an independent program.


User response: Retry with a valid number of
6027-1699 Remount failed for file system connections.
fileSystem. Error code errorCode.
Explanation: The specified file system was internally 6027-1706 mmspsecserver: parent program is not
unmounted. An attempt to remount the file system "mmfsd", exiting...
failed with the specified error code.
Explanation: The mmspsecserver process was invoked
User response: Check the daemon log for additional from a program other than mmfsd.
error messages. Ensure that all file system disks are
User response: None. Informational message only.
available and reissue the mount command.

6027-1707 mmfsd connected to mmspsecserver


6027-1700 Failed to load LAPI library. functionName
not found. Changing communication Explanation: The mmfsd daemon has successfully
protocol to TCP. connected to the mmspsecserver process through the
communication socket.
Explanation: The GPFS daemon failed to load
liblapi_r.a dynamically. User response: None. Informational message only.
User response: Verify installation of liblapi_r.a.
6027-1708 The mmfsd daemon failed to fork
mmspsecserver. Failure reason
6027-1701 mmfsd waiting to connect to
explanation
mmspsecserver. Setting up to retry every
number seconds for number minutes. Explanation: The mmfsd daemon failed to fork a child
process.
Explanation: The GPFS daemon failed to establish a
connection with the mmspsecserver process. User response: Check the GPFS installation.
User response: None. Informational message only.
| 6027-1709 [I] Accepted and connected to ipAddress
6027-1702 Process pid failed at functionName call, Explanation: The local mmfsd daemon has
socket socketName, errno value successfully accepted and connected to a remote
daemon.
Explanation: Either The mmfsd daemon or the
mmspsecserver process failed to create or set up the User response: None. Informational message only.
communication socket between them.
User response: Determine the reason for the error. | 6027-1710 [N] Connecting to ipAddress
Explanation: The local mmfsd daemon has started a
6027-1703 The processName process encountered connection request to a remote daemon.
error: errorString.
User response: None. Informational message only.
Explanation: Either the mmfsd daemon or the
mmspsecserver process called the error log routine to
log an incident. | 6027-1711 [I] Connected to ipAddress

User response: None. Informational message only. Explanation: The local mmfsd daemon has
successfully connected to a remote daemon.

6027-1704 mmspsecserver (pid number) ready for User response: None. Informational message only.
service.
Explanation: The mmspsecserver process has created 6027-1712 Unexpected zero bytes received from
all the service threads necessary for mmfsd. name. Continuing.

User response: None. Informational message only. Explanation: This is an informational message. A
socket read resulted in zero bytes being read.

6027-1705 command: incorrect number of User response: If this happens frequently, check IP
connections (number), exiting... connections.

Explanation: The mmspsecserver process was called


with an incorrect number of connections. This will
happen only when the mmspsecserver process is run

cluster's key and register it using: mmremotecluster


6027-1715 EINVAL trap from connect call to
update.
ipAddress (socket name)
Explanation: The connect call back to the requesting
| 6027-1727 [E] The administrator of the cluster named
node failed.
clusterName does not require
User response: This is caused by a bug in AIX socket authentication. Unregister the clusters
support. Upgrade AIX kernel and TCP client support. key using "mmremotecluster update".
Explanation: The administrator of the cluster does not
| 6027-1716 [N] Close connection to ipAddress require authentication.
Explanation: Connection socket closed. User response: Unregister the clusters key using:
mmremotecluster update.
User response: None. Informational message only.

| 6027-1728 [E] Remote mounts are not enabled within


| 6027-1717 [E] Error initializing the configuration the cluster named clusterName. Contact
| server, err value
the administrator and request that they
| Explanation: The configuration server module could enable remote mounts.
| not be initialized due to lack of system resources. Explanation: The administrator of the cluster has not
| User response: Check system memory. enabled remote mounts.
User response: Contact the administrator and request
| 6027-1718 [E] Could not run command name, err value remote mount access.

| Explanation: The GPFS daemon failed to run the


| specified command. | 6027-1729 [E] The cluster named clusterName has not
authorized this cluster to mount file
| User response: Verify correct installation. systems. Contact the cluster
administrator and request access.
| 6027-1724 [E] The key used by the cluster named Explanation: The administrator of the cluster has not
clusterName has changed. Contact the authorized this cluster to mount file systems.
administrator to obtain the new key and
register it using "mmremotecluster User response: Contact the administrator and request
update". access.
Explanation: The administrator of the cluster has
changed the key used for authentication. | 6027-1730 [E] Unsupported cipherList cipherList
requested.
User response: Contact the administrator to obtain the
new key and register it using mmremotecluster update. Explanation: The target cluster requested a cipherList
not supported by the installed version of OpenSSL.

| 6027-1725 [E] The key used by the cluster named User response: Install a version of OpenSSL that
clusterName has changed. Contact the supports the required cipherList or contact the
administrator to obtain the new key and administrator of the target cluster and request that a
register it using "mmauth update". supported cipherList be assigned to this remote cluster.

Explanation: The administrator of the cluster has


changed the key used for authentication. | 6027-1731 [E] Unsupported cipherList cipherList
requested.
User response: Contact the administrator to obtain the
new key and register it using mmauth update. Explanation: The target cluster requested a cipherList
that is not supported by the installed version of
OpenSSL.
| 6027-1726 [E] The administrator of the cluster named
clusterName requires authentication. User response: Either install a version of OpenSSL
Contact the administrator to obtain the that supports the required cipherList or contact the
clusters key and register the key using administrator of the target cluster and request that a
"mmremotecluster update". supported cipherList be assigned to this remote cluster.
Explanation: The administrator of the cluster requires
authentication.
User response: Contact the administrator to obtain the

| 6027-1732 [X] Remote mounts are not enabled within | 6027-1738 [E] Close connection to ipAddress
this cluster. (errorString). Attempting reconnect.
Explanation: Remote mounts cannot be performed in Explanation: Connection socket closed.
this cluster.
User response: None. Informational message only.
User response: See the GPFS: Advanced Administration
Guide for instructions about enabling remote mounts. In
particular, make sure the keys have been generated and
| 6027-1739 [X] Accept socket connection failed: err
value.
a cipherlist has been set.
Explanation: The Accept socket connection received
an unexpected error.
6027-1733 OpenSSL dynamic lock support could
not be loaded. User response: None. Informational message only.
Explanation: One of the functions required for
dynamic lock support was not included in the version | 6027-1740 [E] Timed out waiting for a reply from node
of the OpenSSL library that GPFS is configured to use. ipAddress.
User response: If this functionality is required, shut Explanation: A message that was sent to the specified
down the daemon, install a version of OpenSSL with node did not receive a response within the expected
the desired functionality, and configure GPFS to use it. time limit.
Then restart the daemon.
User response: None. Informational message only.

| 6027-1734 [E] OpenSSL engine support could not be


loaded. | 6027-1741 [E] Error code value received from node
ipAddress.
Explanation: One of the functions required for engine
support was not included in the version of the Explanation: When a message was sent to the
OpenSSL library that GPFS is configured to use. specified node to check its status, an error occurred and
the node could not handle the message.
User response: If this functionality is required, shut
down the daemon, install a version of OpenSSL with User response: None. Informational message only.
the desired functionality, and configure GPFS to use it.
Then restart the daemon. | 6027-1742 [E] Message ID value was lost by node
ipAddress.
| 6027-1735 [E] Close connection to ipAddress. Explanation: During a periodic check of outstanding
Attempting reconnect. messages, a problem was detected where the
Explanation: Connection socket closed. The GPFS destination node no longer has any knowledge of a
daemon will attempt to reestablish the connection. particular message.

User response: None. Informational message only. User response: None. Informational message only.

| 6027-1736 [N] Reconnected to ipAddress | 6027-1743 [W] Failed to load GSKit library path:
| (dlerror) errorMessage
Explanation: The local mmfsd daemon has
successfully reconnected to a remote daemon following | Explanation: The GPFS daemon could not load the
an unexpected connection break. | library required to secure the node-to-node
| communications.
User response: None. Informational message only.
| User response: Verify that the gpfs.gskit package
| was properly installed.
| 6027-1737 [N] Close connection to ipAddress
(errorString).
| 6027-1744 [I] GSKit library loaded and initialized.
Explanation: Connection socket closed.
| Explanation: The GPFS daemon successfully loaded
User response: None. Informational message only. | the library required to secure the node-to-node
| communications.
| User response: None. Informational message only.

| 6027-1745 [E] Unable to resolve symbol for routine: | established because the remote GPFS node closed the
| functionName (dlerror) errorMessage
| connection.

| Explanation: An error occurred while resolving a | User response: None. Informational message only.
| symbol required for transport-level security.
| User response: Verify that the gpfs.gskit package | 6027-1751 [N] A secure send to node ipAddress was
| was properly installed. | cancelled: connection reset by peer
| (return code value).

| 6027-1746 [E] Failed to load or initialize GSKit | Explanation: Securely sending a message failed
| library: error value
| because the remote GPFS node closed the connection.

| Explanation: An error occurred during the | User response: None. Informational message only.
| initialization of the transport-security code.
| User response: Verify that the gpfs.gskit package | 6027-1752 [N] A secure receive to node ipAddress was
| was properly installed. | cancelled: connection reset by peer
| (return code value).

| 6027-1747 [W] The TLS handshake with node | Explanation: Securely receiving a message failed
| ipAddress failed with error value
| because the remote GPFS node closed the connection.
| (handshakeType). | User response: None. Informational message only.
| Explanation: An error occurred while trying to
| establish a secure connection with another GPFS node. | 6027-1803 [E] Global NSD disk, name, not found.
| User response: Examine the error messages to obtain Explanation: A client tried to open a globally-attached
| information about the error. Under normal NSD disk, but a scan of all disks failed to find that
| circumstances, the retry logic will ensure that the NSD.
| connection is re-established. If this error persists, record
| the error code and contact the IBM Support Center. User response: Ensure that the globally-attached disk
is available on every node that references it.

| 6027-1748 [W] A secure receive from node ipAddress


| failed with error value. | 6027-1804 [E] I/O to NSD disk, name, fails. No such
NSD locally found.
| Explanation: An error occurred while receiving an
| encrypted message from another GPFS node. Explanation: A server tried to perform I/O on an
NSD disk, but a scan of all disks failed to find that
| User response: Examine the error messages to obtain NSD.
| information about the error. Under normal
| circumstances, the retry logic will ensure that the User response: Make sure that the NSD disk is
| connection is re-established and the message is accessible to the client. If necessary, break a reservation.
| received. If this error persists, record the error code and
| contact the IBM Support Center. | 6027-1805 [N] Rediscovered nsd server access to
name.
| 6027-1749 [W] A secure send to node ipAddress failed Explanation: A server rediscovered access to the
| with error value. specified disk.
| Explanation: An error occurred while sending an User response: None.
| encrypted message to another GPFS node.
| User response: Examine the error messages to obtain | 6027-1806 [X] A Persistent Reserve could not be
| information about the error. Under normal established on device name (deviceName):
| circumstances, the retry logic will ensure that the errorLine.
| connection is re-established and the message is sent. If
| this error persists, record the error code and contact the Explanation: GPFS is using Persistent Reserve on this
| IBM Support Center. disk, but was unable to establish a reserve for this
node.

| 6027-1750 [N] The handshakeType TLS handshake with User response: Perform disk diagnostics.
| node ipAddress was cancelled: connection
| reset by peer (return code value).
| Explanation: A secure connection could not be

| 6027-1807 [E] NSD nsdName is using Persistent Reserve, this will require an NSD server on an osName node.

Explanation: A client tried to open a globally-attached NSD disk, but the disk is using Persistent Reserve. An osName NSD server is needed. GPFS only supports Persistent Reserve on certain operating systems.

User response: Use the mmchnsd command to add an osName NSD server for the NSD.

| 6027-1808 [A] Unable to reserve space for NSD buffers. Increase pagepool size to at least requiredPagePoolSize MB. Refer to the GPFS: Administration and Programming Reference for more information on selecting an appropriate pagepool size.

Explanation: The pagepool usage for an NSD buffer (4*maxblocksize) is limited by factor nsdBufSpace. The value of nsdBufSpace can be in the range of 10-70. The default value is 30.

User response: Use the mmchconfig command to decrease the value of maxblocksize or to increase the value of pagepool or nsdBufSpace.

| 6027-1813 [A] Error reading volume identifier (for objectName name) from configuration file.

Explanation: The volume identifier for the named recovery group or vdisk could not be read from the mmsdrfs file. This should never occur.

User response: Check for damage to the mmsdrfs file.

| 6027-1814 [E] Vdisk vdiskName cannot be associated with its recovery group recoveryGroupName. This vdisk will be ignored.

Explanation: The named vdisk cannot be associated with its recovery group.

User response: Check for damage to the mmsdrfs file.

| 6027-1815 [A] Error reading volume identifier (for NSD name) from configuration file.

Explanation: The volume identifier for the named NSD could not be read from the mmsdrfs file. This should never occur.

User response: Check for damage to the mmsdrfs file.
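For message 6027-1808, the response is a pagepool, nsdBufSpace, and maxblocksize trade-off made with mmchconfig. A minimal sketch, in which the 2G and 40 values and the node class name nsdNodes are placeholders only:

   mmlsconfig pagepool                                  # review the current pagepool setting
   mmchconfig pagepool=2G,nsdBufSpace=40 -N nsdNodes    # raise pagepool and the share reserved for NSD buffers
   mmshutdown -N nsdNodes && mmstartup -N nsdNodes      # restart the daemons so the new sizes are used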
| 6027-1816 [E] The defined server serverName for
| 6027-1809 [E] The defined server serverName for NSD recovery group recoveryGroupName could
NsdName couldn't be resolved. not be resolved.

Explanation: The host name of the NSD server could Explanation: The hostname of the NSD server could
not be resolved by gethostbyName(). not be resolved by gethostbyName().

User response: Fix the host name resolution. User response: Fix hostname resolution.

| 6027-1810 [I] Vdisk server recovery: delay number sec. | 6027-1817 [E] Vdisks are defined, but no recovery
for safe recovery. groups are defined.

Explanation: Wait for the existing disk lease to expire Explanation: There are vdisks defined in the mmsdrfs
before performing vdisk server recovery. file, but no recovery groups are defined. This should
never occur.
User response: None.
User response: Check for damage to the mmsdrfs file.

| 6027-1811 [I] Vdisk server recovery: delay complete.


| 6027-1818 [I] Relinquished recovery group
Explanation: Done waiting for existing disk lease to recoveryGroupName (err errorCode).
expire before performing vdisk server recovery.
Explanation: This node has relinquished serving the
User response: None. named recovery group.
User response: None.
| 6027-1812 [E] Rediscovery failed for name.
Explanation: A server failed to rediscover access to the 6027-1819 Disk descriptor for name refers to an
specified disk. existing pdisk.
User response: Check the disk access issues and run Explanation: The mmcrrecoverygroup command or
the command again. mmaddpdisk command found an existing pdisk.
User response: Correct the input file, or use the -v
option.

6027-1820 Disk descriptor for name refers to an existing NSD.

Explanation: The mmcrrecoverygroup command or mmaddpdisk command found an existing NSD.

User response: Correct the input file, or use the -v option.

6027-1821 Error errno writing disk descriptor on name.

Explanation: The mmcrrecoverygroup command or mmaddpdisk command got an error writing the disk descriptor.

User response: Perform disk diagnostics.

6027-1822 Error errno reading disk descriptor on name.

Explanation: The tspreparedpdisk command got an error reading the disk descriptor.

User response: Perform disk diagnostics.

6027-1823 Path error, name and name are the same disk.

Explanation: The tspreparedpdisk command got an error during path verification. The pdisk descriptor file is miscoded.

User response: Correct the pdisk descriptor file and reissue the command.

6027-1850 [E] NSD-RAID services are not configured on node nodeName. Check the nsdRAIDTracks and nsdRAIDBufferPoolSizePct configuration attributes.

Explanation: A GPFS Native RAID command is being executed, but NSD-RAID services are not initialized either because the specified attributes have not been set or had invalid values.

User response: Correct the attributes and restart the GPFS daemon.

| 6027-1851 [A] Cannot configure NSD-RAID services. The nsdRAIDBufferPoolSizePct of the pagepool must result in at least 128MiB of space.

Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because of the memory consideration specified.

User response: Correct the nsdRAIDBufferPoolSizePct attribute and restart the GPFS daemon.

| 6027-1852 [A] Cannot configure NSD-RAID services. nsdRAIDTracks is too large, the maximum on this node is value.

Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because the nsdRAIDTracks attribute is too large.

User response: Correct the nsdRAIDTracks attribute and restart the GPFS daemon.
dmDevice (nsdId) has been detected. The
new path does not have a Persistent 6027-1853 [E] Recovery group recoveryGroupName does
Reserve set up. Server disk diskName not exist or is not active.
will be put offline
Explanation: A command was issued to a RAID
Explanation: A new device mapper path is detected or recovery group that does not exist, or is not in the
a previously failed path is activated after the local active state.
device discovery has finished. This path lacks a
Persistent Reserve, and cannot be used. All device User response: Retry the command with a valid RAID
paths must be active at mount time. recovery group name or wait for the recovery group to
become active.
User response: Check the paths to all disks making up
the file system. Repair any paths to disks which have
failed. Rediscover the paths for the NSD. 6027-1854 [E] Cannot find declustered array arrayName
in recovery group recoveryGroupName.

| 6027-1825 [A] Unrecoverable NSD checksum error on Explanation: The specified declustered array name
I/O to NSD disk nsdName, using server was not found in the RAID recovery group.
serverName. Exceeds retry limit number.
User response: Specify a valid declustered array name
Explanation: The allowed number of retries was within the RAID recovery group.
exceeded when encountering an NSD checksum error
on I/O to the indicated disk, using the indicated server.
User response: There may be network issues that
require investigation.

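For message 6027-1850 above, the NSD-RAID attributes are set with mmchconfig and take effect when the daemon is restarted on the server node. A hedged example only; the attribute values and the node name gssio1 are placeholders, not recommendations:

   mmchconfig nsdRAIDTracks=131072,nsdRAIDBufferPoolSizePct=70 -N gssio1
   mmshutdown -N gssio1
   mmstartup -N gssio1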

6027-1855 [E] Cannot find pdisk pdiskName in recovery group recoveryGroupName.
Explanation: The specified pdisk was not found.
User response: Retry the command with a valid pdisk name.

6027-1856 [E] Vdisk vdiskName not found.
Explanation: The specified vdisk was not found.
User response: Retry the command with a valid vdisk name.

6027-1857 [E] A recovery group must contain between number and number pdisks.
Explanation: The number of pdisks specified is not valid.
User response: Correct the input and retry the command.

6027-1858 [E] Cannot create declustered array arrayName; there can be at most number declustered arrays in a recovery group.
Explanation: The number of declustered arrays allowed in a recovery group has been exceeded.
User response: Reduce the number of declustered arrays in the input file and retry the command.

6027-1859 [E] Sector size of pdisk pdiskName is invalid.
Explanation: All pdisks in a recovery group must have the same physical sector size.
User response: Correct the input file to use a different disk and retry the command.

6027-1860 [E] Pdisk pdiskName must have size at least number bytes.
Explanation: The pdisk must be at least as large as the indicated minimum size in order to be added to this declustered array.
User response: Correct the input file and retry the command.

6027-1861 [W] Size of pdisk pdiskName is too large for declustered array arrayName. Only number of number bytes of that capacity will be used.
Explanation: For optimal utilization of space, pdisks added to this declustered array should be no larger than the indicated maximum size. Only the indicated portion of the total capacity of the pdisk will be available for use.
User response: Consider creating a new declustered array consisting of all larger pdisks.

6027-1862 [E] Cannot add pdisk pdiskName to declustered array arrayName; there can be at most number pdisks in a declustered array.
Explanation: The maximum number of pdisks that can be added to a declustered array was exceeded.
User response: None.

6027-1863 [E] Pdisk sizes within a declustered array cannot vary by more than number.
Explanation: The disk sizes within each declustered array must be nearly the same.
User response: Create separate declustered arrays for each disk size.

6027-1864 [E] At least one declustered array must be large and contain number+spares or more pdisks.
Explanation: When creating a new RAID recovery group, at least one of the declustered arrays in the recovery group must contain at least 2T+1 pdisks, where T is the maximum number of disk failures that can be tolerated within a declustered array. This is necessary in order to store the on-disk recovery group descriptor safely.
User response: Supply at least the indicated number of pdisks in at least one declustered array of the recovery group.

6027-1866 [E] Disk descriptor for diskName refers to an existing NSD.
Explanation: A disk being added to a recovery group appears to already be in-use as an NSD disk.
User response: Carefully check the disks given to tscrrecgroup, tsaddpdisk or tschcarrier. If you are certain the disk is not actually in-use, override the check by specifying the -v no option.

6027-1867 [E] Disk descriptor for diskName refers to an existing pdisk.
Explanation: A disk being added to a recovery group appears to already be in-use as a pdisk.
User response: Carefully check the disks given to tscrrecgroup, tsaddpdisk or tschcarrier. If you are certain the disk is not actually in-use, override the check by specifying the -v no option.


6027-1869 [E] Error updating the recovery group descriptor.
Explanation: Error occurred updating the RAID recovery group descriptor.
User response: Retry the command.

6027-1870 [E] Recovery group name name is already in use.
Explanation: The recovery group name already exists.
User response: Choose a new recovery group name using the characters a-z, A-Z, 0-9, and underscore, at most 63 characters in length.

6027-1871 [E] There is only enough free space to allocate number spare(s) in declustered array arrayName.
Explanation: Too many spares were specified.
User response: Retry the command with a valid number of spares.

6027-1872 [E] Recovery group still contains vdisks.
Explanation: RAID recovery groups that still contain vdisks cannot be deleted.
User response: Delete any vdisks remaining in this RAID recovery group using the tsdelvdisk command before retrying this command.

6027-1873 [E] Pdisk creation failed for pdisk pdiskName: err=errorNum.
Explanation: Pdisk creation failed because of the specified error.
User response: None.

6027-1874 [E] Error adding pdisk to a recovery group.
Explanation: tsaddpdisk failed to add new pdisks to a recovery group.
User response: Check the list of pdisks in the -d or -F parameter of tsaddpdisk.

6027-1875 [E] Cannot delete the only declustered array.
Explanation: Cannot delete the only remaining declustered array from a recovery group.
User response: Instead, delete the entire recovery group.

6027-1876 [E] Cannot remove declustered array arrayName because it is the only remaining declustered array with at least number pdisks.
Explanation: The command failed to remove a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
User response: Add pdisks to another declustered array in this recovery group before removing this one.

6027-1877 [E] Cannot remove declustered array arrayName because the array still contains vdisks.
Explanation: Declustered arrays that still contain vdisks cannot be deleted.
User response: Delete any vdisks remaining in this declustered array using the tsdelvdisk command before retrying this command.

6027-1878 [E] Cannot remove pdisk pdiskName because it is the last remaining pdisk in declustered array arrayName. Remove the declustered array instead.
Explanation: The tsdelpdisk command can be used either to delete individual pdisks from a declustered array, or to delete a full declustered array from a recovery group. You cannot, however, delete a declustered array by deleting all of its pdisks -- at least one must remain.
User response: Delete the declustered array instead of removing all of its pdisks.

6027-1879 [E] Cannot remove pdisk pdiskName because arrayName is the only remaining declustered array with at least number pdisks.
Explanation: The command failed to remove a pdisk from a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
User response: Add pdisks to another declustered array in this recovery group before removing pdisks from this one.

6027-1880 [E] Cannot remove pdisk pdiskName because the number of pdisks in declustered array arrayName would fall below the code width of one or more of its vdisks.
Explanation: The number of pdisks in a declustered array must be at least the maximum code width of any vdisk in the declustered array.
User response: Either add pdisks or remove vdisks from the declustered array.

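For messages 6027-1872 and 6027-1877 above, the tsdelvdisk program named in the text is the low-level command invoked by the mm* administration commands; in practice the remaining vdisks would normally be listed and removed with those. A hedged sketch only; rg1 and vd1 are placeholder names:

   mmlsvdisk --recovery-group rg1
   mmdelvdisk vd1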

6027-1881 [E] Cannot remove pdisk pdiskName because of insufficient free space in declustered array arrayName.
Explanation: The tsdelpdisk command could not delete a pdisk because there was not enough free space in the declustered array.
User response: Either add pdisks or remove vdisks from the declustered array.

6027-1882 [E] Cannot remove pdisk pdiskName; unable to drain the data from the pdisk.
Explanation: Pdisk deletion failed because the system could not find enough free space on other pdisks to drain all of the data from the disk.
User response: Either add pdisks or remove vdisks from the declustered array.

6027-1883 [E] Pdisk pdiskName deletion failed: process interrupted.
Explanation: Pdisk deletion failed because the deletion process was interrupted. This is most likely because of the recovery group failing over to a different server.
User response: Retry the command.

6027-1884 [E] Missing or invalid vdisk name.
Explanation: No vdisk name was given on the tscrvdisk command.
User response: Specify a vdisk name using the characters a-z, A-Z, 0-9, and underscore of at most 63 characters in length.

6027-1885 [E] Vdisk block size must be a power of 2.
Explanation: The -B or --blockSize parameter of tscrvdisk must be a power of 2.
User response: Reissue the tscrvdisk command with a correct value for block size.

6027-1886 [E] Vdisk block size cannot exceed maxBlockSize (number).
Explanation: The virtual block size of a vdisk cannot be larger than the value of the GPFS configuration parameter maxBlockSize.
User response: Use a smaller vdisk virtual block size, or increase the value of maxBlockSize using mmchconfig maxBlockSize=newSize.

6027-1887 [E] Vdisk block size must be between number and number for the specified code.
Explanation: An invalid vdisk block size was specified. The message lists the allowable range of block sizes.
User response: Use a vdisk virtual block size within the range shown, or use a different vdisk RAID code.

6027-1888 [E] Recovery group already contains number vdisks.
Explanation: The RAID recovery group already contains the maximum number of vdisks.
User response: Create vdisks in another RAID recovery group, or delete one or more of the vdisks in the current RAID recovery group before retrying the tscrvdisk command.

6027-1889 [E] Vdisk name vdiskName is already in use.
Explanation: The vdisk name given on the tscrvdisk command already exists.
User response: Choose a new vdisk name less than 64 characters using the characters a-z, A-Z, 0-9, and underscore.

| 6027-1890 [E] A recovery group may only contain one log home vdisk.
Explanation: A log vdisk already exists in the recovery group.
User response: None.

| 6027-1891 [E] Cannot create vdisk before the log home vdisk is created.
Explanation: The log vdisk must be the first vdisk created in a recovery group.
User response: Retry the command after creating the log home vdisk.

6027-1892 [E] Log vdisks must use replication.
Explanation: The log vdisk must use a RAID code that uses replication.
User response: Retry the command with a valid RAID code.

6027-1893 [E] The declustered array must contain at least as many non-spare pdisks as the width of the code.
Explanation: The RAID code specified requires a minimum number of disks larger than the size of the declustered array that was given.
User response: Place the vdisk in a wider declustered array or use a narrower code.

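For message 6027-1886 above, the maxBlockSize parameter is raised with mmchconfig, as the user response states. A minimal sketch only; the 4M value is a placeholder, and depending on the release GPFS may need to be stopped before the value can be changed:

   mmshutdown -a
   mmchconfig maxBlockSize=4M
   mmstartup -a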

6027-1894 [E] There is not enough space in the declustered array to create additional vdisks.
Explanation: There is insufficient space in the declustered array to create even a minimum size vdisk with the given RAID code.
User response: Add additional pdisks to the declustered array, reduce the number of spares or use a different RAID code.

6027-1895 [E] Unable to create vdisk vdiskName because there are too many failed pdisks in declustered array declusteredArrayName.
Explanation: Cannot create the specified vdisk, because there are too many failed pdisks in the array.
User response: Replace failed pdisks in the declustered array and allow time for rebalance operations to more evenly distribute the space.

6027-1896 [E] Insufficient memory for vdisk metadata.
Explanation: There was not enough pinned memory for GPFS to hold all of the metadata necessary to describe a vdisk.
User response: Increase the size of the GPFS page pool.

6027-1897 [E] Error formatting vdisk.
Explanation: An error occurred formatting the vdisk.
User response: None.

| 6027-1898 [E] The log home vdisk cannot be destroyed if there are other vdisks.
Explanation: The log home vdisk of a recovery group cannot be destroyed if vdisks other than the log tip vdisk still exist within the recovery group.
User response: Remove the user vdisks and then retry the command.

6027-1899 [E] Vdisk vdiskName is still in use.
Explanation: The vdisk named on the tsdelvdisk command is being used as an NSD disk.
User response: Remove the vdisk with the mmdelnsd command before attempting to delete it.

6027-1900 Failed to stat pathName.
Explanation: A stat() call failed for the specified object.
User response: Correct the problem and reissue the command.

6027-1901 pathName is not a GPFS file system object.
Explanation: The specified path name does not resolve to an object within a mounted GPFS file system.
User response: Correct the problem and reissue the command.

6027-1902 The policy file cannot be determined.
Explanation: The command was not able to retrieve the policy rules associated with the file system.
User response: Examine the preceding messages and correct the reported problems. Establish a valid policy file with the mmchpolicy command or specify a valid policy file on the command line.

6027-1903 path must be an absolute path name.
Explanation: The path name did not begin with a /.
User response: Specify the absolute path name for the object.

6027-1904 Device with major/minor numbers number and number already exists.
Explanation: A device with the cited major and minor numbers already exists.
User response: Check the preceding messages for detailed information.

6027-1905 name was not created by GPFS or could not be refreshed.
Explanation: The attributes (device type, major/minor number) of the specified file system device name are not as expected.
User response: Check the preceding messages for detailed information on the current and expected values. These errors are most frequently caused by the presence of /dev entries that were created outside the GPFS environment. Resolve the conflict by renaming or deleting the offending entries. Reissue the command letting GPFS create the /dev entry with the appropriate parameters.

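For message 6027-1902 above, a valid policy can be installed with mmchpolicy. A hedged example; fs1 and policy.rules are placeholder names, and the -I test form checks the rules without installing them:

   mmchpolicy fs1 policy.rules -I test
   mmchpolicy fs1 policy.rules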

6027-1906 There is no file system with drive letter driveLetter.
Explanation: No file system in the GPFS cluster has the specified drive letter.
User response: Reissue the command with a valid file system.

6027-1908 The option option is not allowed for remote file systems.
Explanation: The specified option can be used only for locally-owned file systems.
User response: Correct the command line and reissue the command.

6027-1909 There are no available free disks. Disks must be prepared prior to invoking command. Define the disks using the command command.
Explanation: The currently executing command (mmcrfs, mmadddisk, mmrpldisk) requires disks to be defined for use by GPFS using one of the GPFS disk creation commands: mmcrnsd, mmcrvsd.
User response: Create disks and reissue the failing command.

6027-1910 Node nodeName is not a quorum node.
Explanation: The mmchmgr command was asked to move the cluster manager to a nonquorum node. Only one of the quorum nodes can be a cluster manager.
User response: Designate the node to be a quorum node, specify a different node on the command line, or allow GPFS to choose the new cluster manager node.

6027-1911 File system fileSystem belongs to cluster clusterName. The option option is not allowed for remote file systems.
Explanation: The specified option can be used only for locally-owned file systems.
User response: Correct the command line and reissue the command.

6027-1922 IP aliasing is not supported (node). Specify the main device.
Explanation: IP aliasing is not supported.
User response: Specify a node identifier that resolves to the IP address of a main device for the node.

6027-1927 The requested disks are not known to GPFS.
Explanation: GPFS could not find the requested NSDs in the cluster.
User response: Reissue the command, specifying known disks.

6027-1929 cipherlist is not a valid cipher list.
Explanation: The cipher list must be set to a value supported by GPFS. All nodes in the cluster must support a common cipher.
| User response: Use mmauth show ciphers to display a list of the supported ciphers.

6027-1930 Disk diskName belongs to file system fileSystem.
Explanation: A GPFS administration command (mm...) found that the requested disk to be deleted still belongs to a file system.
User response: Check that the correct disk was requested. If so, delete the disk from the file system before proceeding.

6027-1931 The following disks are not known to GPFS: diskNames.
Explanation: A GPFS administration command (mm...) found that the specified disks are not known to GPFS.
User response: Verify that the correct disks were requested.

6027-1932 No disks were specified that could be deleted.
Explanation: A GPFS administration command (mm...) determined that no disks were specified that could be deleted.
User response: Examine the preceding messages, correct the problems, and reissue the command.

6027-1933 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue mmdelnsd -p NSDvolumeid.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data, but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

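For messages 6027-1933 and 6027-1934, the leftover NSD volume id is removed with the command form given in the message text. An illustrative sketch only; the volume id and node name shown here are hypothetical placeholders:

   mmdelnsd -p 0A0A0A0A5C1D2E3F
   mmdelnsd -p 0A0A0A0A5C1D2E3F -N c5n94

The second form is used when only the named nodes have access to the disk.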

6027-1934 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSDvolumeid -N nodeList.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1936 Node nodeName cannot support Persistent Reserve on disk diskName because it is not an AIX node. The disk will be used as a non-PR disk.
Explanation: A non-AIX node was specified as an NSD server for the disk. The disk will be used as a non-PR disk.
User response: None. Informational message only.

6027-1937 A node was specified more than once as an NSD server in disk descriptor descriptor.
Explanation: A node was specified more than once as an NSD server in the disk descriptor shown.
User response: Change the disk descriptor to eliminate any redundancies in the list of NSD servers.

6027-1938 configParameter is an incorrect parameter. Line in error: configLine. The line is ignored; processing continues.
Explanation: The specified parameter is not valid and will be ignored.
User response: None. Informational message only.

6027-1939 Line in error: line.
Explanation: The specified line from a user-provided input file contains errors.
User response: Check the preceding messages for more information. Correct the problems and reissue the command.

6027-1940 Unable to set reserve policy policy on disk diskName on node nodeName.
Explanation: The specified disk should be able to support Persistent Reserve, but an attempt to set up the registration key failed.
User response: Correct the problem and reissue the command.

6027-1941 Cannot handle multiple interfaces for host hostName.
Explanation: Multiple entries were found for the given hostname or IP address either in /etc/hosts or by the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1942 Unexpected output from the 'host -t a name' command:
Explanation: A GPFS administration command (mm...) received unexpected output from the host -t a command for the given host.
User response: Issue the host -t a command interactively and carefully review the output, as well as any error messages.

6027-1943 Host name not found.
Explanation: A GPFS administration command (mm...) could not resolve a host from /etc/hosts or by using the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1945 Disk name diskName is not allowed. Names beginning with gpfs are reserved for use by GPFS.
Explanation: The cited disk name is not allowed because it begins with gpfs.
User response: Specify a disk name that does not begin with gpfs and reissue the command.

6027-1947 Use mmauth genkey to recover the file fileName, or to generate and commit a new key.
Explanation: The specified file was not found.
User response: Recover the file, or generate a new key by running: mmauth genkey propagate or generate a new key by running mmauth genkey new, followed by the mmauth genkey commit command.

6027-1948 Disk diskName is too large.
Explanation: The specified disk is too large.
User response: Specify a smaller disk and reissue the command.

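For message 6027-1947 above, the key file can be regenerated and committed as the user response describes. A minimal sketch, to be run on the cluster that owns the key:

   mmauth genkey new
   mmauth genkey commit

or, to redistribute an existing committed key:

   mmauth genkey propagate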

6027-1949 Propagating the cluster configuration data to all affected nodes.
Explanation: The cluster configuration data is being sent to the rest of the nodes in the cluster.
User response: This is an informational message.

6027-1950 Local update lock is busy.
Explanation: More than one process is attempting to update the GPFS environment at the same time.
User response: Repeat the command. If the problem persists, verify that there are no blocked processes.

6027-1951 Failed to obtain the local environment update lock.
Explanation: GPFS was unable to obtain the local environment update lock for more than 30 seconds.
User response: Examine previous error messages, if any. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1962 Permission denied for disk diskName
Explanation: The user does not have permission to access disk diskName.
User response: Correct the permissions and reissue the command.

6027-1963 Disk diskName was not found.
Explanation: The specified disk was not found.
User response: Specify an existing disk and reissue the command.

6027-1964 I/O error on diskName
Explanation: An I/O error occurred on the specified disk.
User response: Check for additional error messages. Check the error log for disk hardware problems.

6027-1967 Disk diskName belongs to back-level file system fileSystem or the state of the disk is not ready. Use mmchfs -V to convert the file system to the latest format. Use mmchdisk to change the state of a disk.
Explanation: The specified disk cannot be initialized for use as a tiebreaker disk. Possible reasons are suggested in the message text.
User response: Use the mmlsfs and mmlsdisk commands to determine what action is needed to correct the problem.

6027-1968 Failed while processing disk diskName.
Explanation: An error was detected while processing the specified disk.
User response: Examine prior messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-1969 Device device already exists on node nodeName
Explanation: This device already exists on the specified node.
User response: None.

6027-1970 Disk diskName has no space for the quorum data structures. Specify a different disk as tiebreaker disk.
Explanation: There is not enough free space in the file system descriptor for the tiebreaker disk data structures.
User response: Specify a different disk as a tiebreaker disk.

6027-1974 None of the quorum nodes can be reached.
Explanation: Ensure that the quorum nodes in the cluster can be reached. At least one of these nodes is required for the command to succeed.
User response: Ensure that the quorum nodes are available and reissue the command.

6027-1975 The descriptor file contains more than one descriptor.
Explanation: The descriptor file must contain only one descriptor.
User response: Correct the descriptor file.

6027-1976 The descriptor file contains no descriptor.
Explanation: The descriptor file must contain only one descriptor.
User response: Correct the descriptor file.

6027-1977 Failed validating disk diskName. Error code errorCode.
Explanation: GPFS control structures are not as expected.
User response: Contact the IBM Support Center.

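For message 6027-1967 above, the two corrective actions named in the message text could be taken as follows. A hedged example only; fs1 and gpfs1nsd are placeholder names, and mmlsfs/mmlsdisk should be used first to confirm which action is needed:

   mmchfs fs1 -V full
   mmchdisk fs1 start -d gpfs1nsd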

6027-1984 Name name is not allowed. It is longer than the maximum allowable length (length).
Explanation: The cited name is not allowed because it is longer than the cited maximum allowable length.
User response: Specify a name whose length does not exceed the maximum allowable length, and reissue the command.

6027-1985 mmfskxload: The format of the GPFS kernel extension is not correct for this version of AIX.
Explanation: This version of AIX is incompatible with the current format of the GPFS kernel extension.
User response: Contact your system administrator to check the AIX version and GPFS kernel extension.

6027-1986 junctionName does not resolve to a directory in deviceName. The junction must be within the specified file system.
Explanation: The cited junction path name does not belong to the specified file system.
User response: Correct the junction path name and reissue the command.

6027-1987 Name name is not allowed.
Explanation: The cited name is not allowed because it is a reserved word or a prohibited character.
User response: Specify a different name and reissue the command.

6027-1988 File system fileSystem is not mounted.
Explanation: The cited file system is not currently mounted on this node.
User response: Ensure that the file system is mounted and reissue the command.

6027-1993 File fileName either does not exist or has an incorrect format.
Explanation: The specified file does not exist or has an incorrect format.
User response: Check whether the input file specified actually exists.

6027-1994 Did not find any match with the input disk address.
Explanation: The mmfileid command returned without finding any disk addresses that match the given input.
User response: None. Informational message only.

6027-1995 Device deviceName is not mounted on node nodeName.
Explanation: The specified device is not mounted on the specified node.
User response: Mount the specified device on the specified node and reissue the command.

6027-1996 Command was unable to determine whether file system fileSystem is mounted.
Explanation: The command was unable to determine whether the cited file system is mounted.
User response: Examine any prior error messages to determine why the command could not determine whether the file system was mounted, resolve the problem if possible, and then reissue the command. If you cannot resolve the problem, reissue the command with the daemon down on all nodes of the cluster. This will ensure that the file system is not mounted, which may allow the command to proceed.

6027-1997 Backup control file fileName from a previous backup does not exist.
Explanation: The mmbackup command was asked to do an incremental or a resume backup, but the control file from a previous backup could not be found.
User response: Restore the named file to the file system being backed up and reissue the command, or else do a full backup.

6027-1998 Line lineNumber of file fileName is incorrect:
Explanation: A line in the specified file passed to the command had incorrect syntax. The line with the incorrect syntax is displayed next, followed by a description of the correct syntax for the line.
User response: Correct the syntax of the line and reissue the command.

6027-1999 Syntax error. The correct syntax is: string.
Explanation: The specified input passed to the command has incorrect syntax.
User response: Correct the syntax and reissue the command.

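For messages 6027-1988 and 6027-1995 above, the mount state can be checked and corrected before the failing command is reissued. A minimal sketch; fs1 is a placeholder device name:

   mmlsmount fs1 -L
   mmmount fs1 -a

mmlsmount -L lists the nodes that have the file system mounted; mmmount -a mounts it on all nodes in the cluster.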

6027-2000 Could not clear fencing for disk physicalDiskName.
Explanation: The fencing information on the disk could not be cleared.
User response: Make sure the disk is accessible by this node and retry.

6027-2002 Disk physicalDiskName of type diskType is not supported for fencing.
Explanation: This disk is not a type that supports fencing.
User response: None.

6027-2004 None of the specified nodes belong to this GPFS cluster.
Explanation: The nodes specified do not belong to the GPFS cluster.
User response: Choose nodes that belong to the cluster and try the command again.

6027-2007 Unable to display fencing for disk physicalDiskName.
Explanation: Cannot retrieve fencing information for this disk.
User response: Make sure that this node has access to the disk before retrying.

6027-2008 For the logical volume specification -l lvName to be valid lvName must be the only logical volume in the volume group. However, volume group vgName contains logical volumes.
Explanation: The command is being run on a logical volume that belongs to a volume group that has more than one logical volume.
User response: Run this command only on a logical volume where it is the only logical volume in the corresponding volume group.

6027-2009 logicalVolume is not a valid logical volume.
Explanation: logicalVolume does not exist in the ODM, implying that logical name does not exist.
User response: Run the command on a valid logical volume.

6027-2010 vgName is not a valid volume group name.
Explanation: vgName passed to the command is not found in the ODM, implying that vgName does not exist.
User response: Run the command on a valid volume group name.

6027-2011 For the hdisk specification -h physicalDiskName to be valid physicalDiskName must be the only disk in the volume group. However, volume group vgName contains disks.
Explanation: The hdisk specified belongs to a volume group that contains other disks.
User response: Pass an hdisk that belongs to a volume group that contains only this disk.

6027-2012 physicalDiskName is not a valid physical volume name.
Explanation: The specified name is not a valid physical disk name.
User response: Choose a correct physical disk name and retry the command.

6027-2013 pvid is not a valid physical volume id.
Explanation: The specified value is not a valid physical volume ID.
User response: Choose a correct physical volume ID and retry the command.

6027-2014 Node node does not have access to disk physicalDiskName.
Explanation: The specified node is not able to access the specified disk.
User response: Choose a different node or disk (or both), and retry the command. If both the node and disk name are correct, make sure that the node has access to the disk.

6027-2015 Node node does not hold a reservation for disk physicalDiskName.
Explanation: The node on which this command is run does not have access to the disk.
User response: Run this command from another node that has access to the disk.

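For messages 6027-2008 and 6027-2011 above, the standard AIX volume group commands can be used to confirm what a volume group contains before retrying. An illustrative sketch; gpfsvg is a hypothetical volume group name:

   lsvg -l gpfsvg
   lsvg -p gpfsvg

The first form lists the logical volumes in the volume group; the second lists its physical volumes.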

6027-2016 SSA fencing support is not present on this node.
Explanation: This node does not support SSA fencing.
User response: None.

6027-2017 Node ID nodeId is not a valid SSA node ID. SSA node IDs must be a number in the range of 1 to 128.
Explanation: You specified a node ID outside of the acceptable range.
User response: Choose a correct node ID and retry the command.

6027-2018 The SSA node id is not set.
Explanation: The SSA node ID has not been set.
User response: Set the SSA node ID.

6027-2019 Unable to retrieve the SSA node id.
Explanation: A failure occurred while trying to retrieve the SSA node ID.
User response: None.

6027-2020 Unable to set fencing for disk physicalDiskName.
Explanation: A failure occurred while trying to set fencing for the specified disk.
User response: None.

6027-2021 Unable to clear PR reservations for disk physicalDiskName.
Explanation: Failed to clear Persistent Reserve information on the disk.
User response: Make sure the disk is accessible by this node before retrying.

6027-2022 Could not open disk physicalDiskName, errno value.
Explanation: The specified disk cannot be opened.
User response: Examine the errno value and other messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-2023 retVal = value, errno = value for key value.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-2024 ioctl failed with rc=returnCode, errno=errnoValue. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-2025 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READ_KEYS ioctl call failed with stated errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-2026 REGISTER ioctl failed with errno=returnCode, tried timesTried times. Related values are: scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A REGISTER ioctl call failed with stated errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-2027 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are: scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READRES ioctl call failed with stated errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.


6027-2028 could not open disk device diskDeviceName
Explanation: A problem occurred on a disk open.
User response: Ensure the disk is accessible and not fenced out, and then reissue the command.

6027-2029 could not close disk device diskDeviceName
Explanation: A problem occurred on a disk close.
User response: None.

6027-2030 ioctl failed with DSB=value and result=value reason: explanation
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem, if possible. Otherwise, contact the IBM Support Center.

6027-2031 ioctl failed with non-zero return code
Explanation: An ioctl failed with a non-zero return code.
User response: Correct the problem, if possible. Otherwise, contact the IBM Support Center.

| 6027-2049 [X] Cannot pin a page pool of size value bytes.
Explanation: A GPFS page pool cannot be pinned into memory on this machine.
User response: Increase the physical memory size of the machine.

| 6027-2050 [E] Pagepool has size actualValue bytes instead of the requested requestedValue bytes.
Explanation: The configured GPFS page pool is too large to be allocated or pinned into memory on this machine. GPFS will work properly, but with reduced capacity for caching user data.
User response: To prevent this message from being generated when the GPFS daemon starts, reduce the page pool size using the mmchconfig command.

6027-2100 Incorrect range value-value specified.
Explanation: The range specified to the command is incorrect. The first parameter value must be less than or equal to the second parameter value.
User response: Correct the address range and reissue the command.

6027-2101 Insufficient free space in fileSystem (storage minimum required).
Explanation: There is not enough free space in the specified file system or directory for the command to successfully complete.
User response: Correct the problem and reissue the command.

6027-2102 Node nodeName is not available to run the command.
Explanation: The specified node is not available to run a command. Depending on the command, a different node may be tried.
User response: Determine why the specified node is not available and correct the problem.

6027-2103 Directory dirName does not exist
Explanation: The specified directory does not exist.
User response: Reissue the command specifying an existing directory.

6027-2104 The GPFS release level could not be determined on nodes: nodeList.
Explanation: The command was not able to determine the level of the installed GPFS code on the specified nodes.
User response: Reissue the command after correcting the problem.

6027-2105 The following nodes must be upgraded to GPFS release productVersion or higher: nodeList
Explanation: The command requires that all nodes be at the specified GPFS release level.
User response: Correct the problem and reissue the command.

6027-2106 Ensure the nodes are available and run: command.
Explanation: The command could not complete normally.
User response: Check the preceding messages, correct the problems, and issue the specified command until it completes successfully.

6027-2107 Upgrade the lower release level nodes and run: command.
Explanation: The command could not complete normally.
User response: Check the preceding messages, correct the problems, and issue the specified command until it completes successfully.

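For messages 6027-2049 and 6027-2050 above, the page pool size is lowered with mmchconfig and the daemon restarted on the affected node. A hedged example; the 4G value and the node name c5n92 are placeholders, not recommendations:

   mmchconfig pagepool=4G -N c5n92
   mmshutdown -N c5n92
   mmstartup -N c5n92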

6027-2108 Error found while processing stanza
Explanation: A stanza was found to be unsatisfactory in some way.
User response: Check the preceding messages, if any, and correct the condition that caused the stanza to be rejected.

6027-2109 Failed while processing disk stanza on node nodeName.
Explanation: A disk stanza was found to be unsatisfactory in some way.
User response: Check the preceding messages, if any, and correct the condition that caused the stanza to be rejected.

6027-2110 Missing required parameter parameter
Explanation: The specified parameter is required for this command.
User response: Specify the missing information and reissue the command.

6027-2111 The following disks were not deleted: diskList
Explanation: The command could not delete the specified disks. Check the preceding messages for error information.
User response: Correct the problems and reissue the command.

6027-2112 Permission failure. Option option requires root authority to run.
Explanation: The specified command option requires root authority.
User response: Log on as root and reissue the command.

6027-2113 Not able to associate diskName on node nodeName with any known GPFS disk.
Explanation: A command could not find a GPFS disk that matched the specified disk and node values passed as input.
User response: Correct the disk and node values passed as input and reissue the command.

6027-2114 The subsystem subsystem is already active.
Explanation: The user attempted to start a subsystem that was already active.
User response: None. Informational message only.

6027-2115 Unable to resolve address range for disk diskName on node nodeName.
Explanation: A command could not perform address range resolution for the specified disk and node values passed as input.
User response: Correct the disk and node values passed as input and reissue the command.

6027-2116 [E] The GPFS daemon must be active on the recovery group server nodes.
Explanation: The command requires that the GPFS daemon be active on the recovery group server nodes.
User response: Ensure GPFS is running on the recovery group server nodes and reissue the command.

6027-2117 [E] object name already exists.
Explanation: The user attempted to create an object with a name that already exists.
User response: Correct the name and reissue the command.

6027-2118 [E] The parameter is invalid or missing in the pdisk descriptor.
Explanation: The pdisk descriptor is not valid. The bad descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2119 [E] Recovery group name not found.
Explanation: The specified recovery group was not found.
User response: Correct the input and reissue the command.

6027-2120 [E] Unable to delete recovery group name on nodes nodeNames.
Explanation: The recovery group could not be deleted on the specified nodes.
User response: Perform problem determination.

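For message 6027-2116 above, the daemon state on the recovery group servers can be verified and corrected before the command is reissued. A minimal sketch; the node names are placeholders:

   mmgetstate -N gssio1,gssio2
   mmstartup -N gssio1,gssio2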

6027-2121 [I] Recovery group name deleted on node nodeName.
Explanation: The recovery group has been deleted.
User response: This is an informational message.

6027-2122 [E] The number of spares (numberOfSpares) must be less than the number of pdisks (numberOfpdisks) being created.
Explanation: The number of spares specified must be less than the number of pdisks that are being created.
User response: Correct the input and reissue the command.

6027-2123 [E] The GPFS daemon is down on the vdiskName servers.
Explanation: The GPFS daemon was down on the vdisk servers when mmdelvdisk was issued.
User response: Start the GPFS daemon on the specified nodes and issue the specified mmdelvdisk command.

6027-2124 [E] Vdisk vdiskName is still NSD nsdName. Use the mmdelnsd command.
Explanation: The specified vdisk is still an NSD.
User response: Use the mmdelnsd command.

6027-2125 [E] nsdName is a vdisk-based NSD and cannot be used as a tiebreaker disk.
Explanation: Vdisk-based NSDs cannot be specified as tiebreaker disks.
User response: Correct the input and reissue the command.

6027-2126 [I] No recovery groups were found.
Explanation: A command searched for recovery groups but found none.
User response: None. Informational message only.

6027-2127 [E] Disk descriptor descriptor refers to an existing pdisk.
Explanation: The specified disk descriptor refers to an existing pdisk.
User response: Specify another disk that is not an existing pdisk.

6027-2128 [E] The attribute attribute must be configured to use hostname as a recovery group server.
Explanation: The specified GPFS configuration attributes must be configured to use the node as a recovery group server.
User response: Use the mmchconfig command to set the attributes, then reissue the command.

6027-2129 [E] Vdisk block size (blockSize) must match the file system block size (blockSize).
Explanation: The specified NSD is a vdisk with a block size that does not match the block size of the file system.
User response: Reissue the command using block sizes that match.

6027-2130 [E] Could not find an active server for recovery group name.
Explanation: A command was issued that acts on a recovery group, but no active server was found for the specified recovery group.
User response: Perform problem determination.

6027-2131 [E] Cannot create an NSD on a log vdisk.
Explanation: The specified disk is a log vdisk; it cannot be used for an NSD.
User response: Specify another disk that is not a log vdisk.

6027-2132 [E] Log vdisk vdiskName cannot be deleted while there are other vdisks in recovery group name.
Explanation: The specified disk is a log vdisk; it must be the last vdisk deleted from the recovery group.
User response: Delete the other vdisks first.

6027-2133 [E] Unable to delete recovery group name; vdisks are still defined.
Explanation: Cannot delete a recovery group while there are still vdisks defined.
User response: Delete all the vdisks first.

6027-2134 Node nodeName cannot be used as an NSD server for Persistent Reserve disk diskName because it is not a Linux node.
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all of the NSD server nodes are running Linux.
User response: Correct the configuration and enter the command again.

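For message 6027-2124 above, the NSD built on top of the vdisk must be removed before the vdisk itself can be deleted. An illustrative sketch; vd1nsd and vd1 are placeholder names:

   mmdelnsd vd1nsd
   mmdelvdisk vd1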

6027-2135 All nodes in the cluster must be running AIX to enable Persistent Reserve for SAN attached disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for a SAN-attached disk, but not all nodes in the cluster are running AIX.
User response: Correct the configuration and run the command again.

6027-2136 All NSD server nodes must be running AIX to enable Persistent Reserve for disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for the specified disk, but not all NSD servers are running AIX.
User response: Correct the configuration and enter the command again.

6027-2137 An attempt to clear the Persistent Reserve reservations on disk diskName failed.
Explanation: You are importing a disk into a cluster in which Persistent Reserve is disabled. An attempt to clear the Persistent Reserve reservations on the disk failed.
User response: Correct the configuration and enter the command again.

6027-2138 The cluster must be running either all AIX or all Linux nodes to change Persistent Reserve disk diskName to a SAN-attached disk.
Explanation: There was an attempt to redefine a Persistent Reserve disk as a SAN attached disk, but not all nodes in the cluster were running either all AIX or all Linux nodes.
User response: Correct the configuration and enter the command again.

6027-2139 NSD server nodes must be running either all AIX or all Linux to enable Persistent Reserve for disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all NSD server nodes were running all AIX or all Linux nodes.
User response: Correct the configuration and enter the command again.

6027-2140 All NSD server nodes must be running AIX or all running Linux to enable Persistent Reserve for disk diskName.
Explanation: Attempt to enable Persistent Reserve for a disk while not all NSD server nodes are running AIX or all running Linux.
User response: Correct the configuration first.

6027-2141 Disk diskName is not configured as a regular hdisk.
Explanation: In an AIX only cluster, Persistent Reserve is supported for regular hdisks only.
User response: Correct the configuration and enter the command again.

6027-2142 Disk diskName is not configured as a regular generic disk.
Explanation: In a Linux only cluster, Persistent Reserve is supported for regular generic or device mapper virtual disks only.
User response: Correct the configuration and enter the command again.

6027-2147 [E] BlockSize must be specified in disk descriptor.
Explanation: The blockSize positional parameter in a vdisk descriptor was empty. The bad disk descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2148 [E] nodeName is not a valid recovery group server for recoveryGroupName.
Explanation: The server name specified is not one of the defined recovery group servers.
User response: Correct the input and reissue the command.

6027-2149 [E] Could not get recovery group information from an active server.
Explanation: A command that needed recovery group information failed; the GPFS daemons may have become inactive or the recovery group is temporarily unavailable.
User response: Reissue the command.


| 6027-2150 The archive system client backupProgram could not be found or is not executable.
Explanation: TSM dsmc or other specified backup or archive system client could not be found.
User response: Verify that TSM is installed, dsmc can be found in the installation location or that the archiver client specified is executable.

6027-2151 The path directoryPath is not contained in the snapshot snapshotName.
Explanation: The directory path supplied is not contained in the snapshot named with the -S parameter.
User response: Correct the directory path or snapshot name supplied, or omit -S and the snapshot name in the command.

6027-2152 The path directoryPath containing image archives was not found.
Explanation: The directory path supplied does not contain the expected image files to archive into TSM.
User response: Correct the directory path name supplied.

6027-2153 The archiving system backupProgram exited with status return code. Image backup files have been preserved in globalWorkDir
Explanation: Archiving system executed and returned a non-zero exit status due to some error.
User response: Examine archiver log files to discern the cause of the archiver's failure. Archive the preserved image files from the indicated path.

6027-2154 Unable to create a policy file for image backup in policyFilePath.
Explanation: A temporary file could not be created in the global shared directory path.
User response: Check or correct the directory path name supplied.

6027-2155 File system fileSystem must be mounted read only for restore.
Explanation: The empty file system targeted for restoration must be mounted in read only mode during restoration.
User response: Unmount the file system on all nodes and remount it read only, then try the command again.

6027-2156 The image archive index ImagePath could not be found.
Explanation: The archive image index could not be found in the specified path.
User response: Check command arguments for correct specification of image path, then try the command again.

6027-2157 The image archive index ImagePath is corrupt or incomplete.
Explanation: The archive image index specified is damaged.
User response: Check the archive image index file for corruption and remedy.

| 6027-2158 Disk usage must be dataOnly, metadataOnly, descOnly, dataAndMetadata, vdiskLog, vdiskLogTip, vdiskLogTipBackup, or vdiskLogReserved.
Explanation: The disk usage positional parameter in a vdisk descriptor has a value that is not valid. The bad disk descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2159 [E] parameter is not valid or missing in the vdisk descriptor.
Explanation: The vdisk descriptor is not valid. The bad descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2160 [E] Vdisk vdiskName is already mapped to NSD nsdName.
Explanation: The command cannot create the specified NSD because the underlying vdisk is already mapped to a different NSD.
User response: Correct the input and reissue the command.

6027-2161 [E] NAS servers cannot be specified when creating an NSD on a vdisk.
Explanation: The command cannot create the specified NSD because servers were specified and the underlying disk is a vdisk.
User response: Correct the input and reissue the command.

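For message 6027-2155 above, the target file system is remounted read only before the restore is retried. A hedged example; fs1 is a placeholder device name and -a acts on all nodes:

   mmumount fs1 -a
   mmmount fs1 -o ro -a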

6027-2162 [E] Cannot set nsdRAIDTracks to zero; nodeName is a recovery group server.
Explanation: nsdRAIDTracks cannot be set to zero while the node is still a recovery group server.
User response: Modify or delete the recovery group and reissue the command.

6027-2163 [E] Vdisk name not found in the daemon. Recovery may be occurring. The disk will not be deleted.
Explanation: GPFS cannot find the specified vdisk. This can happen if recovery is taking place and the recovery group is temporarily inactive.
User response: Reissue the command. If the recovery group is damaged, specify the -p option.

6027-2164 [E] Disk descriptor for name refers to an existing pdisk.
Explanation: The specified pdisk already exists.
User response: Correct the command invocation and try again.

6027-2165 [E] Node nodeName cannot be used as a server of both vdisks and non-vdisk NSDs.
Explanation: The command specified an action that would have caused vdisks and non-vdisk NSDs to be defined on the same server. This is not a supported configuration.
User response: Correct the command invocation and try again.

6027-2166 [E] GPFS Native RAID is not configured.
Explanation: GPFS Native RAID is not configured on this node.
User response: Reissue the command on the appropriate node.

6027-2167 [E] Device deviceName does not exist or is not active on this node.
Explanation: The specified device does not exist or is not active on the node.
User response: Reissue the command on the appropriate node.

6027-2168 [E] The GPFS cluster must be shut down before downloading firmware to port cards.
Explanation: The GPFS daemon must be down on all nodes in the cluster before attempting to download firmware to a port card.
User response: Stop GPFS on all nodes and reissue the command.

6027-2169 Unable to disable Persistent Reserve on the following disks: diskList
Explanation: The command was unable to disable Persistent Reserve on the specified disks.
User response: Examine the disks and additional error information to determine if the disks should support Persistent Reserve. Correct the problem and reissue the command.

6027-2170 [E] Recovery group recoveryGroupName does not exist or is not active.
Explanation: A command was issued to a recovery group that does not exist or is not in the active state.
User response: Reissue the command with a valid recovery group name or wait for the recovery group to become active.

6027-2171 [E] objectType objectName already exists in the cluster.
Explanation: The file system being imported contains an object with a name that conflicts with the name of an existing object in the cluster.
User response: If possible, remove the object with the conflicting name.

6027-2172 [E] Errors encountered while importing GPFS Native RAID objects.
Explanation: Errors were encountered while trying to import a GPFS Native RAID based file system. No file systems will be imported.
User response: Check the previous error messages and if possible, correct the problems.

6027-2173 [I] Use mmchrecoverygroup to assign and activate servers for the following recovery groups (automatically assigns NSD servers as well): recoveryGroupList
Explanation: The mmimportfs command imported the specified recovery groups. These must have servers assigned and activated.
User response: After the mmimportfs command finishes, use the mmchrecoverygroup command to assign NSD server nodes as needed.

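For message 6027-2173 above, servers are assigned and activated for each imported recovery group with mmchrecoverygroup. A minimal sketch; the recovery group and server names are placeholders:

   mmchrecoverygroup rgL --servers gssio1,gssio2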

6027-2174 Option option can be specified only in conjunction with option.
Explanation: The cited option cannot be specified by itself.
User response: Correct the input and reissue the command.

6027-2175 [E] Exported path exportPath does not exist
Explanation: The directory or one of the components in the directory path to be exported does not exist.
User response: Correct the input and reissue the command.

6027-2176 [E] mmchattr for fileName failed.
Explanation: The command to change the attributes of the file failed.
User response: Check the previous error messages and correct the problems.

6027-2177 [E] Cannot create file fileName.
Explanation: The command to create the specified file failed.
User response: Check the previous error messages and correct the problems.

6027-2178 File fileName does not contain any NSD descriptors or stanzas.
Explanation: The input file should contain at least one NSD descriptor or stanza.
User response: Correct the input file and reissue the command.

6027-2181 [E] Failover is allowed only for single-writer and independent-writer filesets.
Explanation: This operation is allowed only for single-writer filesets.
User response: Check the previous error messages and correct the problems.

6027-2182 [E] Resync is allowed only for single-writer filesets.
Explanation: This operation is allowed only for single-writer filesets.
User response: Check the previous error messages and correct the problems.

6027-2183 [E] Peer snapshots using mmpsnap are allowed only for single-writer filesets.
Explanation: This operation is allowed only for single-writer filesets.
User response: Check the previous error messages and correct the problems.

6027-2184 [E] If the recovery group is damaged, issue mmdelrecoverygroup name -p.
Explanation: No active servers were found for the recovery group that is being deleted. If the recovery group is damaged the -p option is needed.
User response: Perform diagnosis and reissue the command.

6027-2185 [E] There are no pdisk stanzas in the input file fileName.
Explanation: The mmcrrecoverygroup input stanza file has no pdisk stanzas.
User response: Correct the input file and reissue the command.

6027-2186 [E] There were no valid vdisk stanzas in the input file fileName.
Explanation: The mmcrvdisk input stanza file has no valid vdisk stanzas.
User response: Correct the input file and reissue the command.

6027-2187 [E] Could not get pdisk information for the following recovery groups: recoveryGroupList
Explanation: An mmlspdisk all command could not query all of the recovery groups because some nodes could not be reached.
User response: None.

6027-2188 Unable to determine the local node identity.
Explanation: The command is not able to determine the identity of the local node. This can be the result of a disruption in the network over which the GPFS daemons communicate.
User response: Ensure the GPFS daemon network (as identified in the output of the mmlscluster command) is fully operational and reissue the command.

Chapter 13. Messages 215


6027-2189 [E] • 6027-2202 [E]
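For message 6027-2188, one possible way to check the daemon network before reissuing the failing command (these are standard GPFS administration commands, shown here only as a sketch):

    # List the daemon node names and IP addresses the cluster expects to use
    mmlscluster
    # Confirm that the GPFS daemon is active on all nodes
    mmgetstate -a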

6027-2189 [E] Action action is allowed only for read-only filesets.
Explanation: The specified action is only allowed for read-only filesets.
User response: None.

6027-2190 [E] Cannot prefetch file fileName. The file does not belong to fileset fileset.
Explanation: The requested file does not belong to the fileset.
User response: None.

6027-2191 [E] Vdisk vdiskName not found in recovery group recoveryGroupName.
Explanation: The mmdelvdisk command was invoked with the --recovery-group option to delete one or more vdisks from a specific recovery group. The specified vdisk does not exist in this recovery group.
User response: Correct the input and reissue the command.

6027-2193 [E] Recovery group recoveryGroupName must be active on the primary server serverName.
Explanation: The recovery group must be active on the specified node.
User response: Use the mmchrecoverygroup command to activate the group and reissue the command.

6027-2194 [E] The state of fileset filesetName is Expired; prefetch cannot be performed.
Explanation: The prefetch operation cannot be performed on filesets that are in the Expired state.
User response: None.

6027-2195 [E] Error getting snapshot ID for snapshotName.
Explanation: The command was unable to obtain the resync snapshot ID.
User response: Examine the preceding messages, correct the problem, and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-2196 [E] Resync is allowed only when the fileset queue is in active state.
Explanation: This operation is allowed only when the fileset queue is in active state.
User response: None.

6027-2197 [E] Empty file encountered when running the mmafmctl flushPending command.
Explanation: The mmafmctl flushPending command did not find any entries in the file specified with the --list-file option.
User response: Correct the input file and reissue the command.

6027-2198 [E] Cannot run the mmafmctl flushPending command on directory dirName.
Explanation: The mmafmctl flushPending command cannot be issued on this directory.
User response: Correct the input and reissue the command.

6027-2199 [E] No enclosures were found.
Explanation: A command searched for disk enclosures but none were found.
User response: None.

6027-2200 [E] Cannot have multiple nodes updating firmware for the same enclosure. Enclosure serialNumber is already being updated by node nodeName.
Explanation: The mmchenclosure command was called with multiple nodes updating the same firmware.
User response: Correct the node list and reissue the command.

6027-2201 [E] The mmafmctl flushPending command completed with errors.
Explanation: An error occurred while flushing the queue.
User response: Examine the GPFS log to identify the cause.

6027-2202 [E] There is a SCSI-3 PR reservation on disk diskname. mmcrnsd cannot format the disk because the cluster is not configured as PR enabled.
Explanation: The specified disk has a SCSI-3 PR reservation, which prevents the mmcrnsd command from formatting it.
User response: Clear the PR reservation by following the instructions in “Clearing a leftover Persistent Reserve reservation” on page 103.

6027-2203 Node nodeName is not a gateway node.
Explanation: The specified node is not a gateway node.
User response: Designate the node as a gateway node or specify a different node on the command line.

6027-2204 AFM target map mapName is already defined.
Explanation: A request was made to create an AFM target map with the cited name, but that map name is already defined.
User response: Specify a different name for the new AFM target map or first delete the current map definition and then recreate it.

6027-2205 There are no AFM target map definitions.
Explanation: A command searched for AFM target map definitions but found none.
User response: None. Informational message only.

6027-2206 AFM target map mapName is not defined.
Explanation: The cited AFM target map name is not known to GPFS.
User response: Specify an AFM target map known to GPFS.

6027-2207 Node nodeName is being used as a gateway node for the AFM cluster clusterName.
Explanation: The specified node is defined as a gateway node for the specified AFM cluster.
User response: If you are trying to delete the node from the GPFS cluster or delete the gateway node role, you must remove it from the export server map.

6027-2208 [E] commandName is already running in the cluster.
Explanation: Only one instance of the specified command is allowed to run.
User response: None.

6027-2209 [E] Unable to list objectName on node nodeName.
Explanation: A command was unable to list the specific object that was requested.
User response: None.

6027-2210 [E] Unable to build a storage enclosure inventory file on node nodeName.
Explanation: A command was unable to build a storage enclosure inventory file. This is a temporary file that is required to complete the requested command.
User response: None.

6027-2211 [E] Error collecting firmware information on node nodeName.
Explanation: A command was unable to gather firmware information from the specified node.
User response: Ensure the node is active and retry the command.

6027-2212 [E] Firmware update file updateFile was not found.
Explanation: The mmchfirmware command could not find the specified firmware update file to load.
User response: Locate the firmware update file and retry the command.

6027-2213 [E] Pdisk path redundancy was lost while updating enclosure firmware.
Explanation: The mmchfirmware command lost paths after loading firmware and rebooting the Enclosure Services Module.
User response: Wait a few minutes and then retry the command. GPFS might need to be shut down to finish updating the enclosure firmware.

6027-2214 [E] Timeout waiting for firmware to load.
Explanation: A storage enclosure firmware update was in progress, but the update did not complete within the expected time frame.
User response: Wait a few minutes, and then use the mmlsfirmware command to ensure the operation completed.

6027-2215 [E] Storage enclosure serialNumber not found.
Explanation: The specified storage enclosure was not found.
User response: None.

6027-2216 Quota management is disabled for file system fileSystem.
Explanation: Quota management is disabled for the specified file system.
User response: Enable quota management for the file system.
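For message 6027-2216, quota management can be enabled with a sequence like the following (fs1 is a placeholder device name; the -Q yes option is the one cited by message 6027-2617):

    # Enable quota enforcement for the file system
    mmchfs fs1 -Q yes
    # Verify the quota-related file system attributes
    mmlsfs fs1 -Q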

6027-2217 [E] Error errno updating firmware for drives driveList.
Explanation: The firmware load failed for the specified drives. Some of the drives may have been updated.
User response: None.

6027-2218 [E] Storage enclosure serialNumber component componentType component ID componentId not found.
Explanation: The mmchenclosure command could not find the component specified for replacement.
User response: Use the mmlsenclosure command to determine valid input and then retry the command.

6027-2219 [E] Storage enclosure serialNumber component componentType component ID componentId did not fail. Service is not required.
Explanation: The component specified for the mmchenclosure command does not need service.
User response: Use the mmlsenclosure command to determine valid input and then retry the command.

6027-2220 [E] Recovery group name has pdisks with missing paths. Consider using the -v no option of the mmchrecoverygroup command.
Explanation: The mmchrecoverygroup command failed because all the servers could not see all the disks, and the primary server is missing paths to disks.
User response: If the disks are cabled correctly, use the -v no option of the mmchrecoverygroup command.

6027-2221 [E] Error determining redundancy of enclosure serialNumber ESM esmName.
Explanation: The mmchrecoverygroup command failed. Check the following error messages.
User response: Correct the problem and retry the command.

6027-2222 [E] Storage enclosure serialNumber already has a newer firmware version: firmwareLevel.
Explanation: The mmchfirmware command found a newer level of firmware on the specified storage enclosure.
User response: If the intent is to force on the older firmware version, use the -v no option.

6027-2223 [E] Storage enclosure serialNumber is not redundant. Shutdown GPFS in the cluster and retry the mmchfirmware command.
Explanation: The mmchfirmware command found a non-redundant storage enclosure. Proceeding could cause loss of data access.
User response: Shut down GPFS in the cluster and retry the mmchfirmware command.

6027-2224 [E] Peer snapshot creation failed. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.

6027-2225 [E] Peer snapshot successfully deleted at cache. The delete snapshot operation failed at home. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.

6027-2226 [E] Invalid firmware update file.
Explanation: An invalid firmware update file was specified for the mmchfirmware command.
User response: Reissue the command with a valid update file.

6027-2227 [E] Failback is allowed only for independent-writer filesets.
Explanation: Failback operation is allowed only for independent-writer filesets.
User response: Check the fileset mode.

6027-2228 [E] The daemon version (daemonVersion) on node nodeName is lower than the daemon version (daemonVersion) on node nodeName.
Explanation: A command was issued that requires nodes to be at specific levels, but the affected GPFS servers are not at compatible levels to support this operation.
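For message 6027-2228, one way to compare daemon levels before updating is to run the following on each of the two nodes named in the message (shown only as a sketch; the node names in the message are the ones to check):

    # Report the GPFS daemon build and version on the local node
    mmdiag --version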

User response: Update the GPFS code on the specified servers and retry the command.

6027-2229 [E] Cache Eviction/Prefetch is not allowed for DR filesets.
Explanation: Cache Eviction/Prefetch is not allowed for DR filesets.
User response: None.

6027-2230 [E] afmTarget=newTargetString is not allowed. To change the AFM target, use mmafmctl failover with the --target-only option.
Explanation: The mmchfileset command cannot be used to change the NFS server or IP address of the home cluster.
User response: To change the AFM target, use the mmafmctl failover command and specify the --target-only option.

6027-2231 [E] The specified block size blockSize is smaller than the system page size pageSize.
Explanation: The file system block size cannot be smaller than the system memory page size.
User response: Specify a block size greater than or equal to the system memory page size.

6027-2232 [E] Peer snapshots are allowed only for targets using the NFS protocol.
Explanation: The mmpsnap command can be used to create snapshots only for filesets that are configured to use the NFS protocol.
User response: Specify a valid fileset target.

6027-2233 [E] Fileset filesetName in file system filesystemName does not contain peer snapshot snapshotName. The delete snapshot operation failed at cache. Error code errorCode.
Explanation: The specified snapshot name was not found. The command expects the name of an existing peer snapshot of the active fileset in the specified file system.
User response: Reissue the command with a valid peer snapshot name.

6027-2500 mmsanrepairfs already in progress for "name"
Explanation: This is an output from mmsanrepairfs when another mmsanrepairfs command is already running.
User response: Wait for the currently running command to complete and reissue the command.

6027-2501 Could not allocate storage.
Explanation: Sufficient memory could not be allocated to run the mmsanrepairfs command.
User response: Increase the amount of memory available.

6027-2576 [E] Error: Daemon value kernel value PAGE_SIZE mismatch.
Explanation: The GPFS kernel extension loaded in memory does not have the same PAGE_SIZE value as the GPFS daemon PAGE_SIZE value that was returned from the POSIX sysconf API.
User response: Verify that the kernel header files used to build the GPFS portability layer are the same kernel header files used to build the running kernel.

6027-2600 Cannot create a new snapshot until an existing one is deleted. File system fileSystem has a limit of number online snapshots.
Explanation: The file system has reached its limit of online snapshots.
User response: Delete an existing snapshot, then issue the create snapshot command again.

6027-2601 Snapshot name dirName already exists.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete existing file/directory and reissue the command.

6027-2602 Unable to delete snapshot snapshotName from file system fileSystem. rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete the snapshot using the tsdelsnapshot command.

6027-2603 Unable to get permission to create snapshot, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Reissue the command.
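For message 6027-2600, a possible sequence for freeing a snapshot slot and then retrying the create operation (fs1, oldsnap, and newsnap are placeholder names):

    # List the existing snapshots of the file system
    mmlssnapshot fs1
    # Delete a snapshot that is no longer needed
    mmdelsnapshot fs1 oldsnap
    # Reissue the create snapshot command
    mmcrsnapshot fs1 newsnap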

6027-2604 Unable to quiesce all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.

6027-2605 Unable to resume all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches.

6027-2606 Unable to sync all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.

6027-2607 Cannot create new snapshot until an existing one is deleted. Fileset filesetName has a limit of number snapshots.
Explanation: The fileset has reached its limit of snapshots.
User response: Delete an existing snapshot, then issue the create snapshot command again.

6027-2608 Cannot create new snapshot: state of fileset filesetName is inconsistent (badState).
Explanation: An operation on the cited fileset is incomplete.
User response: Complete pending fileset actions, then issue the create snapshot command again.

6027-2609 Fileset named filesetName does not exist.
Explanation: One of the filesets listed does not exist.
User response: Specify only existing fileset names.

6027-2610 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: An incorrect snapshot name was specified.
User response: Select a valid snapshot and issue the command again.

6027-2611 Cannot delete snapshot snapshotName which is in state snapshotState.
Explanation: The snapshot cannot be deleted while it is in the cited transition state because of an in-progress snapshot operation.
User response: Wait for the in-progress operation to complete and then reissue the command.

6027-2612 Snapshot named snapshotName does not exist.
Explanation: A snapshot to be listed does not exist.
User response: Specify only existing snapshot names.

6027-2613 Cannot restore snapshot. fileSystem is mounted on number node(s) and in use on number node(s).
Explanation: This message is issued by the tsressnapshot command.
User response: Unmount the file system and reissue the restore command.

6027-2614 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: An incorrect snapshot name was specified.
User response: Specify a valid snapshot and issue the command again.

6027-2615 Cannot restore snapshot snapshotName which is snapshotState, err = number.
Explanation: The specified snapshot is not in a valid state.
User response: Specify a snapshot that is in a valid state and issue the command again.

6027-2616 Restoring snapshot snapshotName requires quotaTypes quotas to be enabled.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the recommended mmchfs command to enable quotas.

6027-2617 You must run: mmchfs fileSystem -Q yes.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the cited mmchfs command to enable quotas.
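For message 6027-2613, a possible sequence for restoring a snapshot after unmounting the file system everywhere (fs1 and snap1 are placeholder names):

    # Unmount the file system on all nodes
    mmumount fs1 -a
    # Restore the snapshot
    mmrestorefs fs1 snap1
    # Remount the file system when the restore completes
    mmmount fs1 -a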

6027-2618 [N] Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be enabled.
Explanation: The snapshot being restored in the cited file system requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the mmchfs command to enable quotas.

6027-2619 Restoring snapshot snapshotName requires quotaTypes quotas to be disabled.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created.
User response: Issue the cited mmchfs command to disable quotas.

6027-2620 You must run: mmchfs fileSystem -Q no.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created.
User response: Issue the cited mmchfs command to disable quotas.

6027-2621 [N] Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be disabled.
Explanation: The snapshot being restored in the cited file system requires quotas to be disabled, since they were disabled when the snapshot was created.
User response: Issue the mmchfs command to disable quotas.

6027-2622 Error restoring inode inode, err number
Explanation: The online snapshot was corrupted.
User response: Restore the file from an offline snapshot.

6027-2623 [E] Error deleting snapshot snapshotName in file system fileSystem err number
Explanation: The cited snapshot could not be deleted during file system recovery.
User response: Run the mmfsck command to recover any lost data blocks.

6027-2624 Previous snapshot snapshotName is not valid and must be deleted before a new snapshot may be created.
Explanation: The cited previous snapshot is not valid and must be deleted before a new snapshot may be created.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2625 Previous snapshot snapshotName must be restored before a new snapshot may be created.
Explanation: The cited previous snapshot must be restored before a new snapshot may be created.
User response: Run mmrestorefs on the previous snapshot, and then reissue the original snapshot command.

6027-2626 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be deleted.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be deleted.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2627 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be restored.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be restored.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2628 More than one snapshot is marked for restore.
Explanation: More than one snapshot is marked for restore.
User response: Restore the previous snapshot and then reissue the original snapshot command.

6027-2629 Offline snapshot being restored.
Explanation: An offline snapshot is being restored.
User response: When the restore of the offline snapshot completes, reissue the original snapshot command.
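For message 6027-2623, a minimal sketch of the suggested recovery step, assuming the file system is named fs1 and can be taken offline (the -y option, which applies repairs without prompting, is an assumption to verify against the mmfsck command reference):

    # Unmount the file system on all nodes so mmfsck can run offline
    mmumount fs1 -a
    # Check the file system and repair lost data blocks
    mmfsck fs1 -y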

6027-2630 Program failed, error number.
Explanation: The tssnaplatest command encountered an error and printErrnoMsg failed.
User response: Correct the problem shown and reissue the command.

6027-2631 Attention: Snapshot snapshotName was being restored to fileSystem.
Explanation: A file system in the process of a snapshot restore cannot be mounted except under a restricted mount.
User response: None. Informational message only.

6027-2632 Mount of fileSystem failed: snapshot snapshotName must be restored before it can be mounted.
Explanation: A file system in the process of a snapshot restore cannot be mounted for read only or read/write access.
User response: Run the mmrestorefs command to complete the restoration, then reissue the mount command.

6027-2633 Attention: Disk configuration for fileSystem has changed while tsdf was running.
Explanation: The disk configuration for the cited file system changed while the tsdf command was running.
User response: Reissue the mmdf command.

6027-2634 Attention: number of number regions in fileSystem were unavailable for free space.
Explanation: Some regions could not be accessed during the tsdf run. Typically, this is due to utilities such as mmdefragfs or mmfsck running concurrently.
User response: Reissue the mmdf command.

6027-2635 The free space data is not available. Reissue the command without the -q option to collect it.
Explanation: The existing free space information for the file system is currently unavailable.
User response: Reissue the mmdf command.

6027-2636 Disks in storage pool storagePool must have disk usage type dataOnly.
Explanation: A non-system storage pool cannot hold metadata or descriptors.
User response: Modify the command's disk descriptors and reissue the command.

6027-2637 The file system must contain at least one disk for metadata.
Explanation: The disk descriptors for this command must include one and only one storage pool that is allowed to contain metadata.
User response: Modify the command's disk descriptors and reissue the command.

6027-2638 Maximum of number storage pools allowed.
Explanation: The cited limit on the number of storage pools that may be defined has been exceeded.
User response: Modify the command's disk descriptors and reissue the command.

6027-2639 Incorrect fileset name filesetName.
Explanation: The fileset name provided in the command invocation is incorrect.
User response: Correct the fileset name and reissue the command.

6027-2640 Incorrect path to fileset junction filesetJunction.
Explanation: The path to the cited fileset junction is incorrect.
User response: Correct the junction path and reissue the command.

6027-2641 Incorrect fileset junction name filesetJunction.
Explanation: The cited junction name is incorrect.
User response: Correct the junction name and reissue the command.

6027-2642 Specify one and only one of FilesetName or -J JunctionPath.
Explanation: The change fileset and unlink fileset commands accept either a fileset name or the fileset's junction path to uniquely identify the fileset. The user failed to provide either of these, or has tried to provide both.
User response: Correct the command invocation and reissue the command.
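For message 6027-2642, the two accepted forms of identifying a fileset look like the following (fs1, fset1, and the junction path are placeholders; use one form or the other, not both):

    # Identify the fileset by name
    mmunlinkfileset fs1 fset1
    # Or identify it by its junction path
    mmunlinkfileset fs1 -J /gpfs/fs1/fset1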

6027-2643 Cannot create a new fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber filesets.
Explanation: An attempt to create a fileset for the cited file system failed because it would exceed the cited limit.
User response: Remove unneeded filesets and reissue the command.

6027-2644 Comment exceeds maximum length of maxNumber characters.
Explanation: The user-provided comment for the new fileset exceeds the maximum allowed length.
User response: Shorten the comment and reissue the command.

6027-2645 Fileset filesetName already exists.
Explanation: An attempt to create a fileset failed because the specified fileset name already exists.
User response: Select a unique name for the fileset and reissue the command.

6027-2646 Unable to sync all nodes while quiesced, rc=returnCode
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.

6027-2647 Fileset filesetName must be unlinked to be deleted.
Explanation: The cited fileset must be unlinked before it can be deleted.
User response: Unlink the fileset, and then reissue the delete command.

6027-2648 Filesets have not been enabled for file system fileSystem.
Explanation: The current file system format version does not support filesets.
User response: Change the file system format version by issuing mmchfs -V.

6027-2649 Fileset filesetName contains user files and cannot be deleted unless the -f option is specified.
Explanation: An attempt was made to delete a non-empty fileset.
User response: Remove all files and directories from the fileset, or specify the -f option to the mmdelfileset command.

6027-2650 Fileset information is not available.
Explanation: A fileset command failed to read a file system metadata file. The file system may be corrupted.
User response: Run the mmfsck command to recover the file system.

6027-2651 Fileset filesetName cannot be unlinked.
Explanation: The user tried to unlink the root fileset, or is not authorized to unlink the selected fileset.
User response: None. The fileset cannot be unlinked.

6027-2652 Fileset at junctionPath cannot be unlinked.
Explanation: The user tried to unlink the root fileset, or is not authorized to unlink the selected fileset.
User response: None. The fileset cannot be unlinked.

6027-2653 Failed to unlink fileset filesetName from filesetName.
Explanation: An attempt was made to unlink a fileset that is linked to a parent fileset that is being deleted.
User response: Delete or unlink the children, and then delete the parent fileset.

6027-2654 Fileset filesetName cannot be deleted while other filesets are linked to it.
Explanation: The fileset to be deleted has other filesets linked to it, and cannot be deleted without using the -f flag, or unlinking the child filesets.
User response: Delete or unlink the children, and then delete the parent fileset.

6027-2655 Fileset filesetName cannot be deleted.
Explanation: The user is not allowed to delete the root fileset.
User response: None. The fileset cannot be deleted.

6027-2656 Unable to quiesce fileset at all nodes.
Explanation: An attempt to quiesce the fileset at all nodes failed.
User response: Check communication hardware and reissue the command.
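For messages 6027-2647 and 6027-2649, a possible deletion sequence for a linked, non-empty fileset (fs1 and fset1 are placeholder names; use -f only when you intend to delete the user files in the fileset):

    # Unlink the fileset so it can be deleted
    mmunlinkfileset fs1 fset1
    # Delete the fileset; add -f if it still contains user files
    mmdelfileset fs1 fset1 -f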

6027-2657 Fileset filesetName has open files. Specify -f to force unlink.
Explanation: An attempt was made to unlink a fileset that has open files.
User response: Close the open files and then reissue the command, or use the -f option on the unlink command to force the open files to close.

6027-2658 Fileset filesetName cannot be linked into a snapshot at pathName.
Explanation: The user specified a directory within a snapshot for the junction to a fileset, but snapshots cannot be modified.
User response: Select a directory within the active file system, and reissue the command.

6027-2659 Fileset filesetName is already linked.
Explanation: The user specified a fileset that was already linked.
User response: Unlink the fileset and then reissue the link command.

6027-2660 Fileset filesetName cannot be linked.
Explanation: The fileset could not be linked. This typically happens when the fileset is in the process of being deleted.
User response: None.

6027-2661 Fileset junction pathName already exists.
Explanation: A file or directory already exists at the specified junction.
User response: Select a new junction name or a new directory for the link and reissue the link command.

6027-2662 Directory pathName for junction has too many links.
Explanation: The directory specified for the junction has too many links.
User response: Select a new directory for the link and reissue the command.

6027-2663 Fileset filesetName cannot be changed.
Explanation: The user specified a fileset to tschfileset that cannot be changed.
User response: None. You cannot change the attributes of the root fileset.

6027-2664 Fileset at pathName cannot be changed.
Explanation: The user specified a fileset to tschfileset that cannot be changed.
User response: None. You cannot change the attributes of the root fileset.

6027-2665 mmfileid already in progress for name.
Explanation: An mmfileid command is already running.
User response: Wait for the currently running command to complete, and issue the new command again.

6027-2666 mmfileid can only handle a maximum of diskAddresses disk addresses.
Explanation: Too many disk addresses specified.
User response: Provide fewer than 256 disk addresses to the command.

6027-2667 [I] Allowing block allocation for file system fileSystem that makes a file ill-replicated due to insufficient resource and puts data at risk.
Explanation: The partialReplicaAllocation file system option allows allocation to succeed even when all replica blocks cannot be allocated. The file was marked as not replicated correctly and the data may be at risk if one of the remaining disks fails.
User response: None. Informational message only.

6027-2670 Fileset name filesetName not found.
Explanation: The fileset name that was specified with the command invocation was not found.
User response: Correct the fileset name and reissue the command.

6027-2671 Fileset command on fileSystem failed; snapshot snapshotName must be restored first.
Explanation: The file system is being restored either from an offline backup or a snapshot, and the restore operation has not finished. Fileset commands cannot be run.
User response: Run the mmrestorefs command to complete the snapshot restore operation or to finish the offline restore, then reissue the fileset command.
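For messages 6027-2659 and 6027-2661, linking a fileset at an explicit junction path looks like the following (fs1, fset1, and the path are placeholders; the junction must not already exist):

    # Link the fileset at the requested junction path
    mmlinkfileset fs1 fset1 -J /gpfs/fs1/fset1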

6027-2672 Junction parent directory inode number inodeNumber is not valid.
Explanation: An inode number passed to tslinkfileset is not valid.
User response: Check the mmlinkfileset command arguments for correctness. If a valid junction path was provided, contact the IBM Support Center.

6027-2673 [X] Duplicate owners of an allocation region (index indexNumber, region regionNumber, pool poolNumber) were detected for file system fileSystem: nodes nodeName and nodeName.
Explanation: The allocation region should not have duplicate owners.
User response: Contact the IBM Support Center.

6027-2674 [X] The owner of an allocation region (index indexNumber, region regionNumber, pool poolNumber) that was detected for file system fileSystem: node nodeName is not valid.
Explanation: The file system had detected a problem with the ownership of an allocation region. This may result in a corrupted file system and loss of data. One or more nodes may be terminated to prevent any further damage to the file system.
User response: Unmount the file system and run the kwdmmfsck command to repair the file system.

6027-2675 Only file systems with NFSv4 ACL semantics enabled can be mounted on this platform.
Explanation: A user is trying to mount a file system on Microsoft Windows, but the ACL semantics disallow NFSv4 ACLs.
User response: Enable NFSv4 ACL semantics using the mmchfs command (-k option).

6027-2676 Only file systems with NFSv4 locking semantics enabled can be mounted on this platform.
Explanation: A user is trying to mount a file system on Microsoft Windows, but the POSIX locking semantics are in effect.
User response: Enable NFSv4 locking semantics using the mmchfs command (-D option).

6027-2677 Fileset filesetName has pending changes that need to be synced.
Explanation: A user is trying to change a caching option for a fileset while it has local changes that are not yet synced with the home server.
User response: Perform AFM recovery before reissuing the command.

6027-2678 Filesystem fileSystem is mounted on nodes nodes or fileset filesetName is not unlinked.
Explanation: A user is trying to change a caching feature for a fileset while the file system is still mounted or the fileset is still linked.
User response: Unmount the file system from all nodes or unlink the fileset before reissuing the command.

6027-2679 Mount of fileSystem failed because mount event not handled by any data management application.
Explanation: The mount failed because the file system is enabled for DMAPI events (-z yes), but there was no data management application running to handle the event.
User response: Make sure the DM application (for example, HSM or HPSS) is running before the file system is mounted.

6027-2680 AFM filesets cannot be created for file system fileSystem.
Explanation: The current file system format version does not support AFM-enabled filesets; the -p option cannot be used.
User response: Change the file system format version by issuing mmchfs -V.

6027-2681 Snapshot snapshotName has linked independent filesets
Explanation: The specified snapshot is not in a valid state.
User response: Correct the problem and reissue the command.

6027-2682 [E] Set quota file attribute error (reasonCode)explanation
Explanation: While mounting a file system a new quota file failed to be created due to inconsistency with the current degree of replication or the number of failure groups.
User response: Disable quotas. Check and correct the degree of replication and the number of failure groups. Re-enable quotas.
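For messages 6027-2675 and 6027-2676, the cited mmchfs options can be applied with commands like the following (fs1 is a placeholder device name; both changes affect how the file system can be mounted on Windows nodes):

    # Enable NFSv4 ACL semantics (-k option)
    mmchfs fs1 -k nfs4
    # Enable NFSv4 locking semantics (-D option)
    mmchfs fs1 -D nfs4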

6027-2683 Fileset filesetName in file system fileSystem does not contain snapshot snapshotName, err = number
Explanation: An incorrect snapshot name was specified.
User response: Select a valid snapshot and issue the command again.

6027-2684 File system fileSystem does not contain global snapshot snapshotName, err = number
Explanation: An incorrect snapshot name was specified.
User response: Select a valid snapshot and issue the command again.

6027-2685 Total file system capacity allows minMaxInodes inodes in fileSystem. Currently the total inode limits used by all the inode spaces in inodeSpace is inodeSpaceLimit. There must be at least number inodes available to create a new inode space. Use the mmlsfileset -L command to show the maximum inode limits of each fileset. Try reducing the maximum inode limits for some of the inode spaces in fileSystem.
Explanation: The number of inodes available is too small to create a new inode space.
User response: Reduce the maximum inode limits and issue the command again.

6027-2688 Only independent filesets can be configured as AFM filesets. The --inode-space=new option is required.
Explanation: Only independent filesets can be configured for caching.
User response: Specify the --inode-space=new option.

6027-2689 The value for --block-size must be the keyword auto or the value must be of the form [n]K, [n]M, [n]G or [n]T, where n is an optional integer in the range 1 to 1023.
Explanation: An invalid value was specified with the --block-size option.
User response: Reissue the command with a valid option.

6027-2690 Fileset filesetName can only be linked within its own inode space.
Explanation: A dependent fileset can only be linked within its own inode space.
User response: Correct the junction path and reissue the command.

6027-2691 The fastea feature needs to be enabled for file system fileSystem before creating AFM filesets.
Explanation: The current file system on-disk format does not support storing of extended attributes in the file's inode. This is required for AFM-enabled filesets.
User response: Use the mmmigratefs command to enable the fast extended-attributes feature.

6027-2692 Error encountered while processing the input file.
Explanation: The tscrsnapshot command encountered an error while processing the input file.
User response: Check and validate the fileset names listed in the input file.

6027-2693 Fileset junction name junctionName conflicts with the current setting of mmsnapdir.
Explanation: The fileset junction name conflicts with the current setting of mmsnapdir.
User response: Select a new junction name or a new directory for the link and reissue the mmlinkfileset command.

6027-2694 [I] The requested maximum number of inodes is already at number.
Explanation: The specified number of inodes is already in effect.
User response: This is an informational message.

6027-2695 [E] The number of inodes to preallocate cannot be higher than the maximum number of inodes.
Explanation: The specified number of inodes to preallocate is not valid.
User response: Correct the --inode-limit argument then retry the command.
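For message 6027-2691, one possible way to enable the fast extended-attributes feature is shown below (fs1 is a placeholder device name, and the --fastea option name is an assumption to verify against the mmmigratefs command reference; the file system format may also need to be at the required level):

    # Unmount the file system on all nodes before migrating the on-disk format
    mmumount fs1 -a
    # Enable fast extended attributes
    mmmigratefs fs1 --fastea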

6027-2696 [E] The number of inodes to preallocate cannot be lower than the number inodes already allocated.
Explanation: The specified number of inodes to preallocate is not valid.
User response: Correct the --inode-limit argument then retry the command.

6027-2697 Fileset at junctionPath has pending changes that need to be synced.
Explanation: A user is trying to change a caching option for a fileset while it has local changes that are not yet synced with the home server.
User response: Perform AFM recovery before reissuing the command.

6027-2698 File system fileSystem is mounted on nodes nodes or fileset at junctionPath is not unlinked.
Explanation: A user is trying to change a caching feature for a fileset while the file system is still mounted or the fileset is still linked.
User response: Unmount the file system from all nodes or unlink the fileset before reissuing the command.

6027-2699 Cannot create a new independent fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber independent filesets.
Explanation: An attempt to create an independent fileset for the cited file system failed because it would exceed the cited limit.
User response: Remove unneeded independent filesets and reissue the command.

6027-2700 [E] A node join was rejected. This could be due to incompatible daemon versions, failure to find the node in the configuration database, or no configuration manager found.
Explanation: A request to join nodes was explicitly rejected.
User response: Verify that compatible versions of GPFS are installed on all nodes. Also, verify that the joining node is in the configuration database.

6027-2701 The mmpmon command file is empty.
Explanation: The mmpmon command file is empty.
User response: Check file size, existence, and access permissions.

6027-2702 Unexpected mmpmon response from file system daemon.
Explanation: An unexpected response was received to an mmpmon request.
User response: Ensure that the mmfsd daemon is running. Check the error log. Ensure that all GPFS software components are at the same version.

6027-2703 Unknown mmpmon command command.
Explanation: An unknown mmpmon command was read from the input file.
User response: Correct the command and rerun.

6027-2704 Permission failure. The command requires root authority to execute.
Explanation: The mmpmon command was issued with a nonzero UID.
User response: Log on as root and reissue the command.

6027-2705 Could not establish connection to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon could not be established. The daemon may have crashed, or never been started, or (for mmpmon) the allowed number of simultaneous connections has been exceeded.
User response: Ensure that the mmfsd daemon is running. Check the error log. For mmpmon, ensure that the allowed number of simultaneous connections has not been exceeded.

6027-2706 [I] Recovered number nodes.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

6027-2707 [I] Node join protocol waiting value seconds for node recovery
Explanation: Node join protocol is delayed until phase 2 of previous node failure recovery protocol is complete.
User response: None. Informational message only.

6027-2708 [E] Rejected node join protocol. Phase two of node failure recovery appears to still be in progress.
Explanation: Node join protocol is rejected after a number of internal delays and phase two node failure protocol is still in progress.
User response: None. Informational message only.
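For messages 6027-2701, 6027-2703, and 6027-2704, a minimal mmpmon invocation looks like the following, run as root (the input file path is a placeholder and fs_io_s is one of the standard mmpmon requests; -i names the input file and -p requests parseable output):

    # Create a one-line mmpmon input file and run it
    echo "fs_io_s" > /tmp/mmpmon.in
    mmpmon -p -i /tmp/mmpmon.in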

6027-2709 Configuration manager node nodeName not found in the node list.
Explanation: The specified node was not found in the node list.
User response: Add the specified node to the node list and reissue the command.

6027-2710 [E] Node nodeName is being expelled due to expired lease.
Explanation: The nodes listed did not renew their lease in a timely fashion and will be expelled from the cluster.
User response: Check the network connection between this node and the node specified above.

6027-2711 [E] File system table full.
Explanation: The mmfsd daemon cannot add any more file systems to the table because it is full.
User response: None. Informational message only.

6027-2712 Option 'optionName' has been deprecated.
Explanation: The option that was specified with the command is no longer supported. A warning message is generated to indicate that the option has no effect.
User response: Correct the command line and then reissue the command.

6027-2713 Permission failure. The command requires SuperuserName authority to execute.
Explanation: The command, or the specified command option, requires administrative authority.
User response: Log on as a user with administrative privileges and reissue the command.

6027-2714 Could not appoint node nodeName as cluster manager.
Explanation: The mmchmgr -c command generates this message if the specified node cannot be appointed as a new cluster manager.
User response: Make sure that the specified node is a quorum node and that GPFS is running on that node.

6027-2715 Could not appoint a new cluster manager.
Explanation: The mmchmgr -c command generates this message when a node is not available as a cluster manager.
User response: Make sure that GPFS is running on a sufficient number of quorum nodes.

6027-2716 [I] Challenge response received; canceling disk election.
Explanation: The node has challenged another node, which won the previous election, and detected a response to the challenge.
User response: None. Informational message only.

6027-2717 Node nodeName is already a cluster manager or another node is taking over as the cluster manager.
Explanation: The mmchmgr -c command generates this message if the specified node is already the cluster manager.
User response: None. Informational message only.

6027-2718 Incorrect port range: GPFSCMDPORTRANGE='range'. Using default.
Explanation: The GPFS command port range format is lllll[-hhhhh], where lllll is the low port value and hhhhh is the high port value. The valid range is 1 to 65535.
User response: None. Informational message only.

6027-2719 The files provided do not contain valid quota entries.
Explanation: The quota file provided does not have valid quota entries.
User response: Check that the file being restored is a valid GPFS quota file.

6027-2722 [E] Node limit of number has been reached. Ignoring nodeName.
Explanation: The number of nodes that have been added to the cluster is greater than some cluster members can handle.
User response: Delete some nodes from the cluster using the mmdelnode command, or shut down GPFS on nodes that are running older versions of the code with lower limits.
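For message 6027-2718, the environment variable uses the documented lllll[-hhhhh] format; the specific port numbers below are only an example:

    # Restrict GPFS administration commands to ports 60000 through 61000
    export GPFSCMDPORTRANGE=60000-61000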

6027-2723 [N] This node (nodeName) is now Cluster Manager for clusterName.
Explanation: This is an informational message when a new cluster manager takes over.
User response: None. Informational message only.

6027-2724 [I] reasonString. Probing cluster clusterName
Explanation: This is an informational message when a lease request has not been renewed.
User response: None. Informational message only.

6027-2725 [N] Node nodeName lease renewal is overdue. Pinging to check if it is alive
Explanation: This is an informational message on the cluster manager when a lease request has not been renewed.
User response: None. Informational message only.

6027-2726 [I] Recovered number nodes for file system fileSystem.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

6027-2727 fileSystem: quota manager is not available.
Explanation: An attempt was made to perform a quota command without a quota manager running. This could be caused by a conflicting offline mmfsck command.
User response: Reissue the command once the conflicting program has ended.

6027-2728 [N] Connection from node rejected because it does not support IPv6
Explanation: A connection request was received from a node that does not support Internet Protocol Version 6 (IPv6), and at least one node in the cluster is configured with an IPv6 address (not an IPv4-mapped one) as its primary address. Since the connecting node will not be able to communicate with the IPv6 node, it is not permitted to join the cluster.
User response: Upgrade the connecting node to a version of GPFS that supports IPv6, or delete all nodes with IPv6-only addresses from the cluster.

6027-2729 Value value for option optionName is out of range. Valid values are value through value.
Explanation: An out-of-range value was specified for the specified option.
User response: Correct the command line.

6027-2730 [E] Node nodeName failed to take over as cluster manager.
Explanation: An attempt to take over as cluster manager failed.
User response: Make sure that GPFS is running on a sufficient number of quorum nodes.

6027-2731 Failed to locate a working cluster manager.
Explanation: The cluster manager has failed or changed. The new cluster manager has not been appointed.
User response: Check the internode communication configuration and ensure enough GPFS nodes are up to make a quorum.

6027-2732 Attention: No data disks remain in the system pool. Use mmapplypolicy to migrate all data left in the system pool to other storage pool.
Explanation: The mmchdisk command has been issued, but no data disks remain in the system pool. This warning advises the user to use mmapplypolicy to move data to another storage pool.
User response: None. Informational message only.

6027-2733 The file system name (fsname) is longer than the maximum allowable length (maxLength).
Explanation: The file system name is invalid because it is longer than the maximum allowed length of 255 characters.
User response: Specify a file system name whose length is 255 characters or less and reissue the command.

6027-2734 [E] Disk failure from node nodeName Volume name. Physical volume name.
Explanation: An I/O request to a disk or a request to fence a disk has failed in such a manner that GPFS can no longer use the disk.
User response: Check the disk hardware and the software subsystems in the path to the disk.
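For message 6027-2732, a minimal sketch of a migration policy run, assuming a target data pool named data1 exists (the rule name, pool name, file names, and the -I yes choice are placeholders to adapt to your configuration):

    # Write a one-rule policy that moves data out of the system pool
    echo "RULE 'm1' MIGRATE FROM POOL 'system' TO POOL 'data1'" > /tmp/migrate.pol
    # Apply the policy to the file system
    mmapplypolicy fs1 -P /tmp/migrate.pol -I yes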

6027-2735 [E] Not a manager
Explanation: This node is not a manager or no longer a manager of the type required to proceed with the operation. This could be caused by the change of manager in the middle of the operation.
User response: Retry the operation.

6027-2736 The value for --block-size must be the keyword auto or the value must be of the form nK, nM, nG or nT, where n is an optional integer in the range 1 to 1023.
Explanation: An invalid value was specified with the --block-size option.
User response: Reissue the command with a valid option.

6027-2738 Editing quota limits for the root user is not permitted
Explanation: The root user was specified for quota limits editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.

6027-2739 Editing quota limits for groupName group not permitted.
Explanation: The system group was specified for quota limits editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.

6027-2740 [I] Starting new election as previous clmgr is expelled
Explanation: This node is taking over as clmgr without challenge as the old clmgr is being expelled.
User response: None. Informational message only.

6027-2741 [W] This node can not continue to be cluster manager
Explanation: This node invoked the user-specified callback handler for event tiebreakerCheck and it returned a non-zero value. This node cannot continue to be the cluster manager.
User response: None. Informational message only.

6027-2742 [I] CallExitScript: exit script exitScript on event eventName returned code returnCode, quorumloss.
Explanation: This node invoked the user-specified callback handler for the tiebreakerCheck event and it returned a non-zero value. The user-specified action with the error is quorumloss.
User response: None. Informational message only.

6027-2743 Permission denied.
Explanation: The command is invoked by an unauthorized user.
User response: Retry the command with an authorized user.

6027-2744 [D] Invoking tiebreaker callback script
Explanation: The node is invoking the callback script due to change in quorum membership.
User response: None. Informational message only.

6027-2745 [E] File system is not mounted.
Explanation: A command was issued that requires the file system to be mounted.
User response: Mount the file system and reissue the command.

6027-2746 [E] Too many disks unavailable for this server to continue serving a RecoveryGroup.
Explanation: RecoveryGroup panic: Too many disks unavailable to continue serving this RecoveryGroup. This server will resign, and failover to an alternate server will be attempted.
User response: Ensure the alternate server took over. Determine what caused this event and address the situation. Prior messages may help determine the cause of the event.

6027-2747 [E] Inconsistency detected between the local node number retrieved from 'mmsdrfs' (nodeNumber) and the node number retrieved from 'mmfs.cfg' (nodeNumber).
Explanation: The node number retrieved by obtaining the list of nodes in the mmsdrfs file did not match the node number contained in mmfs.cfg. There may have been a recent change in the IP addresses being used by network interfaces configured at the node.
User response: Stop and restart the GPFS daemon.
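For message 6027-2747, restarting the daemon on the affected node can be done with a sequence like the following (run on that node; the commands shown are standard GPFS administration commands):

    # Stop the GPFS daemon on this node
    mmshutdown
    # Start it again and confirm it reaches the active state
    mmstartup
    mmgetstate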

6027-2748 Terminating because a conflicting program on the same inode space inodeSpace is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command after the conflicting program ends.

6027-2749 Specified locality group 'number' does not match disk 'name' locality group 'number'. To change locality groups in an SNC environment, please use the mmdeldisk and mmadddisk commands.
Explanation: The locality group specified on the mmchdisk command does not match the current locality group of the disk.
User response: To change locality groups in an SNC environment, use the mmdeldisk and mmadddisk commands.

6027-2750 [I] Node NodeName is now the Group Leader.
Explanation: A new cluster Group Leader has been assigned.
User response: None. Informational message only.

6027-2751 [I] Starting new election: Last elected: NodeNumber Sequence: SequenceNumber
Explanation: A new disk election will be started. The disk challenge will be skipped since the last elected node was either none or the local node.
User response: None. Informational message only.

6027-2752 [I] This node got elected. Sequence: SequenceNumber
Explanation: The local node got elected in the disk election. This node will become the cluster manager.
User response: None. Informational message only.

6027-2753 [N] Responding to disk challenge: response: ResponseValue. Error code: ErrorCode.
Explanation: A disk challenge has been received, indicating that another node is attempting to become a Cluster Manager. Issuing a challenge response, to confirm the local node is still alive and will remain the Cluster Manager.
User response: None. Informational message only.

6027-2754 [X] Challenge thread did not respond to challenge in time: took TimeIntervalSecs seconds.
Explanation: Challenge thread took too long to respond to a disk challenge. Challenge thread will exit, which will result in the local node losing quorum.
User response: None. Informational message only.

6027-2755 [N] Another node committed disk election with sequence CommittedSequenceNumber (our sequence was OurSequenceNumber).
Explanation: Another node committed a disk election with a sequence number higher than the one used when this node used to commit an election in the past. This means that the other node has become, or is becoming, a Cluster Manager. To avoid having two Cluster Managers, this node will lose quorum.
User response: None. Informational message only.

6027-2756 Attention: In file system FileSystemName, FileSetName (Default) QuotaLimitType(QuotaLimit) for QuotaType UserName/GroupName/FilesetName is too small. Suggest setting it higher than minQuotaLimit.
Explanation: The quota limits that were set are too low and will cause unexpected quota behavior. MinQuotaLimit is computed as follows:
1. For block quotas: QUOTA_THRESHOLD * MIN_SHARE_BLOCKS * subblocksize
2. For inode quotas: QUOTA_THRESHOLD * MIN_SHARE_INODES
User response: Reset the quota limits so that they are higher than MinQuotaLimit. This is only a warning; the quota limits are set anyway.

6027-2757 [E] The peer snapshot is in progress. Queue cannot be flushed now.
Explanation: The Peer Snapshot is in progress. Queue cannot be flushed now.
User response: Reissue the command once the peer snapshot has ended.

6027-2758 [E] The AFM target is not configured for peer snapshots. Run mmafmconfig on the AFM target cluster.
Explanation: The .afmctl file is probably not present on the AFM target cluster.
User response: Run mmafmconfig on the AFM target cluster to configure the AFM target cluster.
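For message 6027-2758, a possible invocation on the AFM target (home) cluster is shown below; the enable action and the export path are assumptions to verify against the mmafmconfig command reference for your release:

    # On the AFM target cluster, configure the exported path for AFM
    mmafmconfig enable /gpfs/homefs/export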

6027-2759 [N] Disk lease period expired in cluster ClusterName. Attempting to reacquire lease.
Explanation: The disk lease period expired, which will prevent the local node from being able to perform disk I/O. This can be caused by a temporary communication outage.
User response: If the message is repeated, investigate the communication outage.

6027-2760 [N] Disk lease reacquired in cluster ClusterName.
Explanation: The disk lease has been reacquired, and disk I/O will be resumed.
User response: None. Informational message only.

6027-2761 Unable to run command on 'fileSystem' while the file system is mounted in restricted mode.
Explanation: A command that can alter data in a file system was issued while the file system was mounted in restricted mode.
User response: Mount the file system in read-only or read-write mode, or unmount the file system, and then reissue the command.

6027-2762 Unable to run command on 'fileSystem' while the file system is suspended.
Explanation: A command that can alter data in a file system was issued while the file system was suspended.
User response: Resume the file system and reissue the command.

6027-2763 Unable to start command on 'fileSystem' because conflicting program name is running. Waiting until it completes.
Explanation: A program detected that it cannot start because a conflicting program is running. The program will automatically start once the conflicting program has ended, as long as there are no other conflicting programs running at that time.
User response: None. Informational message only.

6027-2764 Terminating command on fileSystem because a conflicting program name is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command after the conflicting program ends.

6027-2765 command on 'fileSystem' is finished waiting. Processing continues ... name
Explanation: A program detected that it can now continue the processing since a conflicting program has ended.
User response: None. Informational message only.

6027-2766 [I] User script has chosen to expel node nodeName instead of node nodeName.
Explanation: The user has specified a callback script that is invoked whenever a decision is about to be taken on what node should be expelled from the active cluster. As a result of the execution of the script, GPFS will reverse its decision on what node to expel.
User response: None.

6027-2767 [E] Error errorNumber while accessing tiebreaker devices.
Explanation: An error was encountered while reading from or writing to the tiebreaker devices. When such an error happens while the cluster manager is checking for challenges, it will cause the cluster manager to lose cluster membership.
User response: Verify the health of the tiebreaker devices.

6027-2770 Disk diskName belongs to a write-affinity enabled storage pool. Its failure group cannot be changed.
Explanation: The failure group specified on the mmchdisk command does not match the current failure group of the disk.
User response: Use the mmdeldisk and mmadddisk commands to change failure groups in a write-affinity enabled storage pool.

6027-2771 fileSystem: Default per-fileset quotas are disabled for quotaType.
Explanation: A command was issued to modify default fileset-level quota, but default quotas are not enabled.
User response: Ensure the --perfileset-quota option is in effect for the file system, then use the mmdefquotaon command to enable default fileset-level quotas. After default quotas are enabled, issue the failed command again.

6027-2772 Cannot close disk name.
Explanation: Could not access the specified disk.
User response: Check the disk hardware and the path to the disk. Refer to "Unable to access disks" on page 95.
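For message 6027-2771, a minimal sketch of enabling default per-fileset quotas might look like the following; fs1 is a hypothetical device name, and the exact option spellings should be confirmed in the command reference for your release:

    # Confirm quota enforcement and per-fileset quota support are in effect
    mmlsfs fs1 -Q --perfileset-quota
    # Enable per-fileset quotas if necessary, then enable default fileset-level quotas
    mmchfs fs1 --perfileset-quota
    mmdefquotaon -j fs1
    # Reissue the command that originally failed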


6027-2773 fileSystem:filesetName: default quota for quotaType is disabled.
Explanation: A command was issued to modify default quota, but default quota is not enabled.
User response: Ensure the -Q yes option is in effect for the file system, then enable default quota with the mmdefquotaon command.

6027-2774 fileSystem: Per-fileset quotas are not enabled.
Explanation: A command was issued to modify fileset-level quota, but per-fileset quota management is not enabled.
User response: Ensure that the --perfileset-quota option is in effect for the file system and reissue the command.

6027-2775 Storage pool named poolName does not exist.
Explanation: The mmlspool command was issued, but the specified storage pool does not exist.
User response: Correct the input and reissue the command.

6027-2776 Attention: A disk being stopped reduces the degree of system metadata replication (value) or data replication (value) to lower than tolerable.
Explanation: The mmchdisk stop command was issued, but the disk cannot be stopped because of the current file system metadata and data replication factors.
User response: Make more disks available, delete unavailable disks, or change the file system metadata replication factor. Also check the current value of the unmountOnDiskFail configuration parameter.

6027-2777 [E] Node nodeName is being expelled because of an expired lease. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: The node listed did not renew its lease in a timely fashion and is being expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2778 [I] Node nodeName: ping timed out. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: Ping timed out for the node listed, which should be the cluster manager. A new cluster manager will be chosen while the current cluster manager is expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2779 [E] Challenge thread stopped.
Explanation: A tiebreaker challenge thread stopped because of an error. Cluster membership will be lost.
User response: Check for additional error messages. File systems will be unmounted, then the node will rejoin the cluster.

6027-2780 [E] Not enough quorum nodes reachable: reachableNodes.
Explanation: The cluster manager cannot reach a sufficient number of quorum nodes, and therefore must resign to prevent cluster partitioning.
User response: Determine if there is a network outage or if too many nodes have failed.

6027-2781 [E] Lease expired for numSecs seconds (shutdownOnLeaseExpiry).
Explanation: The disk lease expired for too long, which results in the node losing cluster membership.
User response: None. The node will attempt to rejoin the cluster.

6027-2782 [E] This node is being expelled from the cluster.
Explanation: This node received a message instructing it to leave the cluster, which might indicate communication problems between this node and some other node in the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2783 [E] New leader elected with a higher ballot number.
Explanation: A new group leader was elected with a higher ballot number, and this node is no longer the leader. Therefore, this node must leave the cluster and rejoin.
User response: None. The node will attempt to rejoin the cluster.
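For message 6027-2776, a minimal sketch of checking the relevant settings before retrying mmchdisk stop might look like this; fs1 is a hypothetical device name:

    # Check the current default and maximum metadata/data replication factors
    mmlsfs fs1 -m -M -r -R
    # Check the current value of the unmountOnDiskFail configuration parameter
    mmlsconfig unmountOnDiskFail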


6027-2784 [E] No longer a cluster manager or lost quorum while running a group protocol.
Explanation: The cluster manager no longer maintains quorum after attempting to run a group protocol, which might indicate a network outage or node failures.
User response: None. The node will attempt to rejoin the cluster.

6027-2785 [X] A severe error was encountered during cluster probe.
Explanation: A severe error was encountered while running the cluster probe to determine the state of the nodes in the cluster.
User response: Examine additional error messages. The node will attempt to rejoin the cluster.

6027-2786 [E] Unable to contact any quorum nodes during cluster probe.
Explanation: This node has been unable to contact any quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2787 [E] Unable to contact enough other quorum nodes during cluster probe.
Explanation: This node, a quorum node, was unable to contact a sufficient number of quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2788 [E] Attempt to run leader election failed with error errorNumber.
Explanation: This node attempted to run a group leader election but failed to get elected. This failure might indicate that two or more quorum nodes attempted to run the election at the same time. As a result, this node will lose cluster membership and then attempt to rejoin the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2789 [E] Tiebreaker script returned a non-zero value.
Explanation: The tiebreaker script, invoked during group leader election, returned a non-zero value, which results in the node losing cluster membership and then attempting to rejoin the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2790 Attention: Disk parameters were changed. Use the mmrestripefs command with the -r option to relocate data and metadata.
Explanation: The mmchdisk command with the change option was issued.
User response: Issue the mmrestripefs -r command to relocate data and metadata.

6027-2791 Disk diskName does not belong to file system deviceName.
Explanation: The input disk name does not belong to the specified file system.
User response: Correct the command line.

6027-2792 The current file system version does not support default per-fileset quotas.
Explanation: The current version of the file system does not support default fileset-level quotas.
User response: Use the mmchfs -V command to activate the new function.

6027-2793 [E] Contents of local fileName file are invalid. Node may be unable to be elected group leader.
Explanation: In an environment where tie-breaker disks are used, the contents of the ballot file have become invalid, possibly because the file has been overwritten by another application. This node will be unable to be elected group leader.
User response: Run mmcommon resetTiebreaker, which will ensure the GPFS daemon is down on all quorum nodes and then remove the given file on this node. After that, restart the cluster on this and on the other nodes.

6027-2794 [E] Invalid content of disk paxos sector for disk diskName.
Explanation: In an environment where tie-breaker disks are used, the contents of either one of the tie-breaker disks or the ballot files became invalid, possibly because the file has been overwritten by another application.
User response: Examine the mmfs.log file on all quorum nodes for indications of a corrupted ballot file. If message 6027-2793 is found, follow the instructions for that message. If the problem cannot be resolved, shut down GPFS across the cluster, undefine and then redefine the tiebreakerDisks configuration variable, and finally restart the cluster.
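For message 6027-2794, a minimal sketch of clearing and redefining the tiebreaker disk configuration might look like the following; the NSD names are hypothetical placeholders, and the exact attribute values should be confirmed against the mmchconfig documentation for your release:

    # Stop GPFS on all nodes in the cluster
    mmshutdown -a
    # Remove the tiebreaker disk definition, then define it again
    mmchconfig tiebreakerDisks=no
    mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
    # Restart GPFS on all nodes
    mmstartup -a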


6027-2795 An error occurred while executing command for fileSystem.
Explanation: A quota command encountered a problem on a file system. Processing continues with the next file system.
User response: None. Informational message only.

6027-2796 [W] Callback event eventName is not supported on this node; processing continues ...
Explanation: Informational message only.
User response: None.

6027-2797 [I] Node nodeName: lease request received late. Pings sent: pingsSent. Maximum pings missed: maxPingsMissed.
Explanation: The cluster manager reports that the lease request from the given node was received late, possibly indicating a network outage.
User response: Check the network connection between this node and the node listed in the message.

6027-2798 [E] The node nodeName does not have a valid Extended License to run the requested command.
Explanation: The file system manager node does not have a valid extended license to run ILM, AFM, or CNFS commands.
User response: Make sure the gpfs.ext package is installed correctly on the file system manager node and try again.

6027-2800 Available memory exceeded on request to allocate number bytes. Trace point sourceFile-tracePoint.
Explanation: The available memory was exceeded during an allocation request made from the cited source file and trace point.
User response: Try shutting down and then restarting GPFS. If the problem recurs, contact the IBM Support Center.

6027-2801 Policy set syntax version versionString not supported.
Explanation: The policy rules do not comply with the supported syntax.
User response: Rewrite the policy rules, following the documented, supported syntax and keywords.

6027-2802 Object name 'poolName_or_filesetName' is not valid.
Explanation: The cited name is not a valid GPFS object, names an object that is not valid in this context, or names an object that no longer exists.
User response: Correct the input to identify a GPFS object that exists and is valid in this context.

6027-2803 Policy set must start with VERSION.
Explanation: The policy set does not begin with VERSION as required.
User response: Rewrite the policy rules, following the documented, supported syntax and keywords.

6027-2804 Unexpected SQL result code - sqlResultCode.
Explanation: This could be an IBM programming error.
User response: Check that your SQL expressions are correct and supported by the current release of GPFS. If the error recurs, contact the IBM Support Center.

6027-2805 [I] Loaded policy 'policyFileName or filesystemName': summaryOfPolicyRules
Explanation: The specified loaded policy has the specified policy rules.
User response: None. Informational message only.

6027-2806 [E] Error while validating policy 'policyFileName or filesystemName': rc=errorCode: errorDetailsString
Explanation: An error occurred while validating the specified policy.
User response: Correct the policy rules, heeding the error details in this message and other messages issued immediately before or after this message. Use the mmchpolicy command to install a corrected policy rules file.

6027-2807 [W] Error in evaluation of placement policy for file fileName: errorDetailsString
Explanation: An error occurred while evaluating the installed placement policy for a particular new file. Although the policy rules appeared to be syntactically correct when the policy was installed, evidently there is a problem when certain values of file attributes occur at runtime.
User response: Determine which file names and attributes trigger this error. Correct the policy rules, heeding the error details in this message and other messages issued immediately before or after this message. Use the mmchpolicy command to install a corrected policy rules file.
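For messages 6027-2806 and 6027-2807, a minimal sketch of validating and reinstalling a corrected policy file might look like this; fs1 and policy.rules are hypothetical names:

    # Validate the corrected rules without installing them
    mmchpolicy fs1 policy.rules -I test
    # Install the rules once validation passes
    mmchpolicy fs1 policy.rules
    # Display the currently installed policy
    mmlspolicy fs1 -L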


6027-2808 In rule 'ruleName' (ruleNumber), 'wouldBePoolName' is not a valid pool name.
Explanation: The cited name that appeared in the cited rule is not a valid pool name. This may be because the cited name was misspelled or removed from the file system.
User response: Correct or remove the rule.

6027-2809 Validated policy 'policyFileName or filesystemName': summaryOfPolicyRules
Explanation: The specified validated policy has the specified policy rules.
User response: None. Informational message only.

6027-2810 [W] There are numberOfPools storage pools but the policy file is missing or empty.
Explanation: The cited number of storage pools are defined, but the policy file is missing or empty.
User response: You should probably install a policy with placement rules using the mmchpolicy command, so that at least some of your data will be stored in your nonsystem storage pools.

6027-2811 Policy has no storage pool placement rules!
Explanation: The policy has no storage pool placement rules.
User response: You should probably install a policy with placement rules using the mmchpolicy command, so that at least some of your data will be stored in your nonsystem storage pools.

6027-2812 Keyword 'keywordValue' begins a second clauseName clause - only one is allowed.
Explanation: The policy rule should only have one clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2813 This 'ruleName' rule is missing a clauseType required clause.
Explanation: The policy rule must have a clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2814 This 'ruleName' rule is of unknown type or not supported.
Explanation: The policy rule set seems to have a rule of an unknown type or a rule that is unsupported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2815 The value 'value' is not supported in a 'clauseType' clause.
Explanation: The policy rule clause seems to specify an argument or value that is not supported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2816 Policy rules employ features that would require a file system upgrade.
Explanation: One or more policy rules have been written to use new features that cannot be installed on a back-level file system.
User response: Install the latest GPFS software on all nodes and upgrade the file system, or change your rules. (Note that LIMIT was introduced in GPFS Release 3.2.)

6027-2817 Error on popen/pclose (command_string): rc=return_code_from_popen_or_pclose
Explanation: The execution of the command_string by popen/pclose resulted in an error.
User response: To correct the error, do one or more of the following:
v Check that the standard m4 macro processing command is installed on your system as /usr/bin/m4.
v Set the MM_M4_CMD environment variable.
v Correct the macro definitions in your policy rules file.
If the problem persists, contact the IBM Support Center.

6027-2818 A problem occurred during m4 processing of policy rules. rc = return_code_from_popen_pclose_or_m4
Explanation: An attempt to expand the policy rules with an m4 subprocess yielded some warnings or errors, or the m4 macro wrote some output to standard error. Details or related messages may follow this message.
User response: To correct the error, do one or more of the following:
v Check that the standard m4 macro processing command is installed on your system as /usr/bin/m4.
v Set the MM_M4_CMD environment variable.
v Correct the macro definitions in your policy rules file.
If the problem persists, contact the IBM Support Center.
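For messages 6027-2817 and 6027-2818, a minimal sketch of verifying the m4 macro processor might look like this; the alternate m4 path is a hypothetical placeholder:

    # Confirm that m4 is installed at the expected location
    ls -l /usr/bin/m4
    # If m4 lives elsewhere, point GPFS policy processing at it
    export MM_M4_CMD=/opt/freeware/bin/m4
    # Then rerun the policy command (for example, mmchpolicy or mmapplypolicy)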


6027-2819 Error opening temp file temp_file_name: errorString
Explanation: An error occurred while attempting to open the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.

6027-2820 Error reading temp file temp_file_name: errorString
Explanation: An error occurred while attempting to read the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.

6027-2821 Rule 'ruleName' (ruleNumber) specifies a THRESHOLD for EXTERNAL POOL 'externalPoolName'. This is not supported.
Explanation: GPFS does not support the THRESHOLD clause within a migrate rule that names an external pool in the FROM POOL clause.
User response: Correct or remove the rule.

6027-2822 This file system does not support fast extended attributes, which is needed for encryption.
Explanation: Fast extended attributes need to be supported by the file system for encryption to be activated.
User response: Enable the fast extended attributes feature in this file system.

6027-2823 [E] Encryption activated in the file system, but node not enabled for encryption.
Explanation: The file system is enabled for encryption, but this node is not.
User response: Ensure the GPFS encryption packages are installed. Verify if encryption is supported on this node architecture.

6027-2824 This file system version does not support encryption rules.
Explanation: This file system version does not support encryption.
User response: Update the file system to a version which supports encryption.

6027-2825 Duplicate encryption set name 'setName'.
Explanation: The given set name is duplicated in the policy file.
User response: Ensure each set name appears only once in the policy file.

6027-2826 The encryption set 'setName' requested by rule 'rule' could not be found.
Explanation: The given set name used in the rule cannot be found.
User response: Verify if the set name is correct. Add the given set if it is missing from the policy.

6027-2827 [E] Error in evaluation of encryption policy for file fileName: %s
Explanation: An error occurred while evaluating the encryption rules in the given policy file.
User response: Examine the other error messages produced while evaluating the policy file.

6027-2828 [E] Encryption not supported on Windows. Encrypted file systems are not allowed when Windows nodes are present in the cluster.
Explanation: Self-explanatory.
User response: To activate encryption, ensure there are no Windows nodes in the cluster.

6027-2950 [E] Trace value 'value' after class 'class' must be from 0 to 14.
Explanation: The specified trace value is not recognized.
User response: Specify a valid trace integer value.

6027-2951 Value value for worker1Threads must be <= than the original setting value
Explanation: An attempt to dynamically set worker1Threads found the value out of range. The dynamic value must be 2 <= value <= the original setting when the GPFS daemon was started.
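For message 6027-2951, a minimal sketch of checking and adjusting worker1Threads within the allowed range might look like this; the value 48 is a hypothetical example:

    # Show the configured value of worker1Threads
    mmlsconfig worker1Threads
    # Change it immediately, staying at or below the value that was in effect at daemon startup
    mmchconfig worker1Threads=48 -i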


6027-2952 [E] Unknown assert class 'assertClass'.
Explanation: The assert class is not recognized.
User response: Specify a valid assert class.

6027-2953 [E] Non-numeric assert value 'value' after class 'class'.
Explanation: The specified assert value is not recognized.
User response: Specify a valid assert integer value.

6027-2954 [E] Assert value 'value' after class 'class' must be from 0 to 127.
Explanation: The specified assert value is not recognized.
User response: Specify a valid assert integer value.

6027-2955 [W] Time-of-day may have jumped back. Late by delaySeconds seconds to wake certain threads.
Explanation: Time-of-day may have jumped back, which has resulted in some threads being awakened later than expected. It is also possible that some other factor has caused a delay in waking up the threads.
User response: Verify if there is any problem with network time synchronization, or if time-of-day is being incorrectly set.

6027-2956 [E] Invalid crypto engine type (encryptionCryptoEngineType): cryptoEngineType.
Explanation: The specified value for encryptionCryptoEngineType is incorrect.
User response: Specify a valid value for encryptionCryptoEngineType.

6027-2957 [E] Invalid cluster manager selection choice (clusterManagerSelection): clusterManagerSelection.
Explanation: The specified value for clusterManagerSelection is incorrect.
User response: Specify a valid value for clusterManagerSelection.

6027-2958 [E] Invalid NIST compliance type (nistCompliance): nistComplianceValue.
Explanation: The specified value for nistCompliance is incorrect.
User response: Specify a valid value for nistCompliance.

6027-3000 [E] No disk enclosures were found on the target node.
Explanation: GPFS is unable to communicate with any disk enclosures on the node serving the specified pdisks. This might be because there are no disk enclosures attached to the node, or it might indicate a problem in communicating with the disk enclosures. While the problem persists, disk maintenance with the mmchcarrier command is not available.
User response: Check disk enclosure connections and run the command again. Use mmaddpdisk --replace as an alternative method of replacing failed disks.

6027-3001 [E] Location of pdisk pdiskName of recovery group recoveryGroupName is not known.
Explanation: GPFS is unable to find the location of the given pdisk.
User response: Check the disk enclosure hardware.

6027-3002 [E] Disk location code locationCode is not known.
Explanation: A disk location code specified on the command line was not found.
User response: Check the disk location code.

6027-3003 [E] Disk location code locationCode was specified more than once.
Explanation: The same disk location code was specified more than once in the tschcarrier command.
User response: Check the command usage and run again.

6027-3004 [E] Disk location codes locationCode and locationCode are not in the same disk carrier.
Explanation: The tschcarrier command cannot be used to operate on more than one disk carrier at a time.
User response: Check the command usage and rerun.

6027-3005 [W] Pdisk in location locationCode is controlled by recovery group recoveryGroupName.
Explanation: The tschcarrier command detected that a pdisk in the indicated location is controlled by a different recovery group than the one specified.
User response: Check the disk location code and recovery group name.


6027-3006 [W] Pdisk in location locationCode is controlled by recovery group id idNumber.
Explanation: The tschcarrier command detected that a pdisk in the indicated location is controlled by a different recovery group than the one specified.
User response: Check the disk location code and recovery group name.

6027-3007 [E] Carrier contains pdisks from more than one recovery group.
Explanation: The tschcarrier command detected that a disk carrier contains pdisks controlled by more than one recovery group.
User response: Use the tschpdisk command to bring the pdisks in each of the other recovery groups offline and then rerun the command using the --force-RG flag.

6027-3008 [E] Incorrect recovery group given for location.
Explanation: The mmchcarrier command detected that the specified recovery group name does not match that of the pdisk in the specified location.
User response: Check the disk location code and recovery group name. If you are sure that the disks in the carrier are not being used by other recovery groups, it is possible to override the check using the --force-RG flag. Use this flag with caution as it can cause disk errors and potential data loss in other recovery groups.

6027-3009 [E] Pdisk pdiskName of recovery group recoveryGroupName is not currently scheduled for replacement.
Explanation: A pdisk specified in a tschcarrier or tsaddpdisk command is not currently scheduled for replacement.
User response: Make sure the correct disk location code or pdisk name was given. For the mmchcarrier command, the --force-release option can be used to override the check.

6027-3010 [E] Command interrupted.
Explanation: The mmchcarrier command was interrupted by a conflicting operation, for example the mmchpdisk --resume command on the same pdisk.
User response: Run the mmchcarrier command again.

6027-3011 [W] Disk location locationCode failed to power off.
Explanation: The mmchcarrier command detected an error when trying to power off a disk.
User response: Check the disk enclosure hardware. If the disk carrier has a lock and does not unlock, try running the command again or use the manual carrier release.

6027-3012 [E] Cannot find a pdisk in location locationCode.
Explanation: The tschcarrier command cannot find a pdisk to replace in the given location.
User response: Check the disk location code.

6027-3013 [W] Disk location locationCode failed to power on.
Explanation: The mmchcarrier command detected an error when trying to power on a disk.
User response: Make sure the disk is firmly seated and run the command again.

6027-3014 [E] Pdisk pdiskName of recovery group recoveryGroupName was expected to be replaced with a new disk; instead, it was moved from location locationCode to location locationCode.
Explanation: The mmchcarrier command expected a pdisk to be removed and replaced with a new disk. But instead of being replaced, the old pdisk was moved into a different location.
User response: Repeat the disk replacement procedure.

6027-3015 [E] Pdisk pdiskName of recovery group recoveryGroupName in location locationCode cannot be used as a replacement for pdisk pdiskName of recovery group recoveryGroupName.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk. But instead of finding a new disk, the mmchcarrier command found that another pdisk was moved to the replacement location.
User response: Repeat the disk replacement procedure, making sure to replace the failed pdisk with a new disk.

6027-3016 [E] Replacement disk in location locationCode has an incorrect FRU fruCode; expected FRU code is fruCode.
Explanation: The replacement disk has a different field replaceable unit code than that of the original disk.
User response: Replace the pdisk with a disk of the same part number. If you are certain the new disk is a valid substitute, override this check by running the command again with the --force-fru option.
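For the disk replacement messages above (for example 6027-3014 and 6027-3015), a minimal sketch of the mmchcarrier replacement sequence might look like the following; the recovery group and pdisk names are hypothetical, and the exact option syntax should be confirmed in the mmchcarrier documentation:

    # Release the carrier that holds the failing pdisk
    mmchcarrier BB1RGL --release --pdisk 'e1d1s01'
    # Physically replace the disk, then tell GPFS the replacement is in place
    mmchcarrier BB1RGL --replace --pdisk 'e1d1s01'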


6027-3017 [E] Error formatting replacement disk diskName.
Explanation: An error occurred when trying to format a replacement pdisk.
User response: Check the replacement disk.

6027-3018 [E] A replacement for pdisk pdiskName of recovery group recoveryGroupName was not found in location locationCode.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk, but no replacement disk was found.
User response: Make sure a replacement disk was inserted into the correct slot.

6027-3019 [E] Pdisk pdiskName of recovery group recoveryGroupName in location locationCode was not replaced.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk, but the original pdisk was still found in the replacement location.
User response: Repeat the disk replacement, making sure to replace the pdisk with a new disk.

6027-3020 [E] Invalid state change, stateChangeName, for pdisk pdiskName.
Explanation: The tschpdisk command received a state change request that is not permitted.
User response: Correct the input and reissue the command.

6027-3021 [E] Unable to change identify state to identifyState for pdisk pdiskName: err=errorNum.
Explanation: The tschpdisk command failed on an identify request.
User response: Check the disk enclosure hardware.

6027-3022 [E] Unable to create vdisk metadata.
Explanation: The tscrvdisk command could not create the necessary metadata for the specified vdisk.
User response: Change the vdisk arguments and retry the command.

6027-3023 [E] Error initializing vdisk.
Explanation: The tscrvdisk command could not initialize the vdisk.
User response: Retry the command.

6027-3024 [E] Error retrieving recovery group recoveryGroupName event log.
Explanation: Because of an error, the tslsrecoverygroupevents command was unable to retrieve the full event log.
User response: None.

6027-3025 [E] Device deviceName does not exist or is not active on this node.
Explanation: The specified device was not found on this node.
User response: None.

6027-3026 [E] Recovery group recoveryGroupName does not have an active log home vdisk.
Explanation: The indicated recovery group does not have an active log vdisk. This may be because the log home vdisk has not yet been created, because a previously existing log home vdisk has been deleted, or because the server is in the process of recovery.
User response: Create a log home vdisk if none exists. Retry the command.

6027-3027 [E] Cannot configure NSD-RAID services on this node.
Explanation: NSD-RAID services are not supported on this operating system or node hardware.
User response: Configure a supported node type as the NSD RAID server and restart the GPFS daemon.

6027-3028 [E] There is not enough space in declustered array declusteredArrayName for the requested vdisk size. The maximum possible size for this vdisk is size.
Explanation: There is not enough space in the declustered array for the requested vdisk size.
User response: Create a smaller vdisk, remove existing vdisks, or add additional pdisks to the declustered array.


6027-3029 [E] There must be at least number non-spare pdisks in declustered array declusteredArrayName to avoid falling below the code width of vdisk vdiskName.
Explanation: A change of spares operation failed because the resulting number of non-spare pdisks would fall below the code width of the indicated vdisk.
User response: Add additional pdisks to the declustered array.

6027-3030 [E] There must be at least number non-spare pdisks in declustered array declusteredArrayName for configuration data replicas.
Explanation: A delete pdisk or change of spares operation failed because the resulting number of non-spare pdisks would fall below the number required to hold configuration data for the declustered array.
User response: Add additional pdisks to the declustered array. If replacing a pdisk, use mmchcarrier or mmaddpdisk --replace.

6027-3031 [E] There is not enough available configuration data space in declustered array declusteredArrayName to complete this operation.
Explanation: Creating a vdisk, deleting a pdisk, or changing the number of spares failed because there is not enough available space in the declustered array for configuration data.
User response: Replace any failed pdisks in the declustered array and allow time for rebalance operations to more evenly distribute the available space. Add pdisks to the declustered array.

6027-3032 [E] Temporarily unable to create vdisk vdiskName because more time is required to rebalance the available space in declustered array declusteredArrayName.
Explanation: Cannot create the specified vdisk until rebuild and rebalance processes are able to more evenly distribute the available space.
User response: Replace any failed pdisks in the recovery group, allow time for rebuild and rebalance processes to more evenly distribute the spare space within the array, and retry the command.

6027-3034 [E] The input pdisk name (pdiskName) did not match the pdisk name found on disk (pdiskName).
Explanation: Cannot add the specified pdisk, because the input pdiskName did not match the pdiskName that was written on the disk.
User response: Verify the input file and retry the command.

6027-3035 [A] Cannot configure NSD-RAID services. maxblocksize must be at least value.
Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because the maxblocksize attribute is too small.
User response: Correct the maxblocksize attribute and restart the GPFS daemon.

6027-3036 [E] Partition size must be a power of 2.
Explanation: The partitionSize parameter of some declustered array was invalid.
User response: Correct the partitionSize parameter and reissue the command.

6027-3037 [E] Partition size must be between number and number.
Explanation: The partitionSize parameter of some declustered array was invalid.
User response: Correct the partitionSize parameter to a power of 2 within the specified range and reissue the command.

6027-3038 [E] AU log too small; must be at least number bytes.
Explanation: The auLogSize parameter of a new declustered array was invalid.
User response: Increase the auLogSize parameter and reissue the command.

6027-3039 [E] A vdisk with disk usage vdiskLogTip must be the first vdisk created in a recovery group.
Explanation: The --logTip disk usage was specified for a vdisk other than the first one created in a recovery group.
User response: Retry the command with a different disk usage.

6027-3040 [E] Declustered array configuration data does not fit.
Explanation: There is not enough space in the pdisks of a new declustered array to hold the AU log area using the current partition size.
User response: Increase the partitionSize parameter or decrease the auLogSize parameter and reissue the command.
Chapter 13. Messages 241


6027-3041 [E] • 6027-3050 [E]

6027-3041 [E] Declustered array attributes cannot be changed.
Explanation: The partitionSize and auLogSize attributes of a declustered array cannot be changed after the declustered array has been created. They may only be set by a command that creates the declustered array.
User response: Remove the partitionSize and auLogSize attributes from the input file of the mmaddpdisk command and reissue the command.

6027-3042 [E] The log tip vdisk cannot be destroyed if there are other vdisks.
Explanation: In recovery groups with versions prior to 3.5.0.11, the log tip vdisk cannot be destroyed if other vdisks still exist within the recovery group.
User response: Remove the user vdisks or upgrade the version of the recovery group with mmchrecoverygroup --version, then retry the command to remove the log tip vdisk.

6027-3043 [E] Log vdisks cannot have multiple use specifications.
Explanation: A vdisk can have usage vdiskLog, vdiskLogTip, or vdiskLogReserved, but not more than one.
User response: Retry the command with only one of the --log, --logTip, or --logReserved attributes.

6027-3044 [E] Unable to determine resource requirements for all the recovery groups served by node value: to override this check reissue the command with the -v no flag.
Explanation: A recovery group or vdisk is being created, but GPFS cannot determine if there are enough non-stealable buffer resources to allow the node to successfully serve all the recovery groups at the same time once the new object is created.
User response: You can override this check by reissuing the command with the -v no flag.

6027-3045 [W] Buffer request exceeds the non-stealable buffer limit, increase the nsdRAIDNonStealableBufPct.
Explanation: The limit of non-stealable buffers has been exceeded.
User response: Use the mmchconfig command to increase the nsdRAIDNonStealableBufPct attribute.

6027-3046 [E] The nonStealable buffer limit may be too low on server serverName. Check the configuration attributes of the recovery group servers: pagepool, nsdRAIDBufferPoolSizePct, nsdRAIDNonStealableBufPct.
Explanation: The limit of non-stealable buffers is low on the specified recovery group server. This is probably because of an improperly configured system. The specified configuration attributes are expected to be the same on all recovery group servers.
User response: Use the mmchconfig command to correct the configuration.

6027-3047 [E] Location of pdisk pdiskName is not known.
Explanation: GPFS is unable to find the location of the given pdisk.
User response: Check the disk enclosure hardware.

6027-3048 [E] Pdisk pdiskName is not currently scheduled for replacement.
Explanation: A pdisk specified in a tschcarrier or tsaddpdisk command is not currently scheduled for replacement.
User response: Make sure the correct disk location code or pdisk name was given. For the tschcarrier command, the --force-release option can be used to override the check.

6027-3049 [E] The minimum size for vdisk vdiskName is number.
Explanation: The vdisk size was too small.
User response: Increase the size of the vdisk and retry the command.

6027-3050 [E] There are already number suspended pdisks in declustered array arrayName. You must resume pdisks in the array before suspending more.
Explanation: The number of suspended pdisks in the declustered array has reached the maximum limit. Allowing more pdisks to be suspended in the array would put data availability at risk.
User response: Resume one or more suspended pdisks in the array by using the mmchcarrier or mmchpdisk commands, then retry the command.
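For messages 6027-3045 and 6027-3046, a minimal sketch of inspecting and adjusting the relevant attributes on the recovery group servers might look like this; the node names and the value 30 are hypothetical examples:

    # Show the current settings
    mmlsconfig pagepool
    mmlsconfig nsdRAIDBufferPoolSizePct
    mmlsconfig nsdRAIDNonStealableBufPct
    # Raise the non-stealable buffer percentage on both recovery group servers
    mmchconfig nsdRAIDNonStealableBufPct=30 -N gss01,gss02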


6027-3051 [E] Checksum granularity must be number or number.
Explanation: The only allowable values for the checksumGranularity attribute of a vdisk are 8K and 32K.
User response: Change the checksumGranularity attribute of the vdisk, then retry the command.

6027-3052 [E] Checksum granularity cannot be specified for log vdisks.
Explanation: The checksumGranularity attribute cannot be applied to a log vdisk.
User response: Remove the checksumGranularity attribute of the log vdisk, then retry the command.

6027-3053 [E] Vdisk block size must be between number and number for the specified code when checksum granularity number is used.
Explanation: An invalid vdisk block size was specified. The message lists the allowable range of block sizes.
User response: Use a vdisk virtual block size within the range shown, or use a different vdisk RAID code, or use a different checksum granularity.

6027-3054 [W] Disk in location locationCode failed to come online.
Explanation: The mmchcarrier command detected an error when trying to bring a disk back online.
User response: Make sure the disk is firmly seated and run the command again. Check the operating system error log.

6027-3055 [E] The fault tolerance of the code cannot be greater than the fault tolerance of the internal configuration data.
Explanation: The RAID code specified for a new vdisk is more fault-tolerant than the configuration data that will describe the vdisk.
User response: Use a code with a smaller fault tolerance.

6027-3056 [E] Long and short term event log size and fast write log percentage are only applicable to log home vdisk.
Explanation: The longTermEventLogSize, shortTermEventLogSize, and fastWriteLogPct options are only applicable to the log home vdisk.
User response: Remove any of these options and retry vdisk creation.

6027-3057 [E] Disk enclosure is no longer reporting information on location locationCode.
Explanation: The disk enclosure reported an error when GPFS tried to obtain updated status on the disk location.
User response: Try running the command again. Make sure that the disk enclosure firmware is current. Check for improperly seated connectors within the disk enclosure.

6027-3058 [A] GSS license failure - GPFS Native RAID services will not be configured on this node.
Explanation: The GPFS Storage Server has not been validly installed. Therefore, GPFS Native RAID services will not be configured.
User response: Install a legal copy of the base GPFS code and restart the GPFS daemon.

6027-3059 [E] The serviceDrain state is only permitted when all nodes in the cluster are running daemon version version or higher.
Explanation: The mmchpdisk command option --begin-service-drain was issued, but there are backlevel nodes in the cluster that do not support this action.
User response: Upgrade the nodes in the cluster to at least the specified version and run the command again.

6027-3060 [E] Block sizes of all log vdisks must be the same.
Explanation: The block sizes of the log tip vdisk, the log tip backup vdisk, and the log home vdisk must all be the same.
User response: Try running the command again after adjusting the block sizes of the log vdisks.

6027-3061 [E] Cannot delete path pathName because there would be no other working paths to pdisk pdiskName of RG recoveryGroupName.
Explanation: When the -v yes option is specified on the --delete-paths subcommand of the tschrecgroup command, it is not allowed to delete the last working path to a pdisk.
User response: Try running the command again after repairing other broken paths for the named pdisk, or reduce the list of paths being deleted, or run the command with -v no.


6027-3062 [E] Recovery group version version is not compatible with the current recovery group version.
Explanation: The recovery group version specified with the --version option does not support all of the features currently supported by the recovery group.
User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3063 [E] Unknown recovery group version version.
Explanation: The recovery group version named by the argument of the --version option was not recognized.
User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3064 [I] Allowable recovery group versions are:
Explanation: Informational message listing allowable recovery group versions.
User response: Run the command with one of the recovery group versions listed.

6027-3065 [E] The maximum size of a log tip vdisk is size.
Explanation: Running mmcrvdisk for a log tip vdisk failed because the size is too large.
User response: Correct the size parameter and run the command again.

6027-3066 [E] A recovery group may only contain one log tip vdisk.
Explanation: A log tip vdisk already exists in the recovery group.
User response: None.

6027-3067 [E] Log tip backup vdisks not supported by this recovery group version.
Explanation: Vdisks with usage type vdiskLogTipBackup are not supported by all recovery group versions.
User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.

6027-3068 [E] The sizes of the log tip vdisk and the log tip backup vdisk must be the same.
Explanation: The log tip vdisk must be the same size as the log tip backup vdisk.
User response: Adjust the vdisk sizes and retry the mmcrvdisk command.

6027-3069 [E] Log vdisks cannot use code codeName.
Explanation: Log vdisks must use a RAID code that uses replication, or be unreplicated. They cannot use parity-based codes such as 8+2P.
User response: Retry the command with a valid RAID code.

6027-3070 [E] Log vdisk vdiskName cannot appear in the same declustered array as log vdisk vdiskName.
Explanation: No two log vdisks may appear in the same declustered array.
User response: Specify a different declustered array for the new log vdisk and retry the command.

6027-3071 [E] Device not found: deviceName.
Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command was not found.
User response: Check the device name.

6027-3072 [E] Invalid device name: deviceName.
Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command is invalid.
User response: Check the device name.

6027-3073 [E] Error formatting pdisk pdiskName on device diskName.
Explanation: An error occurred when trying to format a new pdisk.
User response: Check that the disk is working properly.

6027-3074 [E] Node nodeName not found in cluster configuration.
Explanation: A node name specified in a command does not exist in the cluster configuration.
User response: Check the command arguments.


6027-3075 [E] The --servers list must contain the current node, nodeName.
Explanation: The --servers list of a tscrrecgroup command does not list the server on which the command is being run.
User response: Check the --servers list. Make sure the tscrrecgroup command is run on a server that will actually serve the recovery group.

6027-3076 [E] Remote pdisks are not supported by this recovery group version.
Explanation: Pdisks that are not directly attached are not supported by all recovery group versions.
User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.

6027-3077 [E] There must be at least number pdisks in recovery group recoveryGroupName for configuration data replicas.
Explanation: A change of pdisks failed because the resulting number of pdisks would fall below the needed replication factor for the recovery group descriptor.
User response: Do not attempt to delete more pdisks.

6027-3078 [E] Replacement threshold for declustered array declusteredArrayName of recovery group recoveryGroupName cannot exceed number.
Explanation: The replacement threshold cannot be larger than the maximum number of pdisks in a declustered array. The maximum number of pdisks in a declustered array depends on the version number of the recovery group. The current limit is given in this message.
User response: Use a smaller replacement threshold or upgrade the recovery group version.

6027-3079 [E] Number of spares for declustered array declusteredArrayName of recovery group recoveryGroupName cannot exceed number.
Explanation: The number of spares cannot be larger than the maximum number of pdisks in a declustered array. The maximum number of pdisks in a declustered array depends on the version number of the recovery group. The current limit is given in this message.
User response: Use a smaller number of spares or upgrade the recovery group version.

6027-3080 [E] Cannot remove pdisk pdiskName because declustered array declusteredArrayName would have fewer disks than its replacement threshold.
Explanation: The replacement threshold for a declustered array must not be larger than the number of pdisks in the declustered array.
User response: Reduce the replacement threshold for the declustered array, then retry the mmdelpdisk command.

6027-3200 AFM ERROR: command pCacheCmd fileset filesetName fileids [parentId,childId,tParentId,targetId,ReqCmd] name sourceName original error oerr application error aerr remote error remoteError
Explanation: AFM operations on a particular file failed.
User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster.

6027-3201 AFM ERROR DETAILS: type: remoteCmdType snapshot name snapshotName snapshot ID snapshotId
Explanation: Peer snapshot creation or deletion failed.
User response: Fix the snapshot creation or deletion error.

6027-3204 AFM: Failed to set xattr on inode inodeNum error err, ignoring.
Explanation: Setting extended attributes on an inode failed.
User response: None.

6027-3205 AFM: Failed to get xattrs for inode inodeNum, ignoring.
Explanation: Getting extended attributes on an inode failed.
User response: None.

6027-3209 Home NFS mount of host:path failed with error err
Explanation: NFS mounting of the path from the home cluster failed.
User response: Make sure the exported path can be mounted over NFSv3.
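For message 6027-3200, a minimal sketch of resuming requeued AFM operations after the home-cluster problem has been fixed might look like this; fs1 and cacheFileset are hypothetical names, and the exact mmafmctl syntax should be confirmed for your release:

    # Check the state of the cache fileset and its queue
    mmafmctl fs1 getstate -j cacheFileset
    # Resume operations that were requeued because of the earlier error
    mmafmctl fs1 resumeRequeued -j cacheFileset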
mounted over NFSv3.


6027-3210 Cannot find AFM control file for fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3211 Change in home export detected. Caching will be disabled.
Explanation: A change in home export was detected or the home path is stale.
User response: Ensure the exported path is accessible.

6027-3212 AFM ERROR: Cannot enable AFM for fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3213 Cannot find snapshot link directory name for exported file system at home for fileset filesetName. Snapshot directory at home will be cached.
Explanation: Unable to determine the snapshot directory at the home cluster.
User response: None.

6027-3214 [E] AFM: Unexpiration of fileset filesetName failed with error err. Use mmafmctl to manually unexpire the fileset.
Explanation: Unexpiration of the fileset failed after a home reconnect.
User response: Run the mmafmctl command with the unexpire option on the fileset.

6027-3215 [W] AFM: Peer snapshot delayed due to long running execution of operation to remote cluster for fileset filesetName. Peer snapshot continuing to wait.
Explanation: The peer snapshot command timed out waiting to flush messages.
User response: None.

6027-3216 Fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.
Explanation: The cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize the cache with the home.
User response: None.

6027-3217 AFM ERROR Unable to unmount NFS export for fileset filesetName
Explanation: NFS unmount of the path failed.
User response: None.

6027-3220 AFM: Home NFS mount of host:path failed with error err for file system fileSystem fileset id filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on next request to gateway
Explanation: NFS mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.
User response: Make sure the exported path can be mounted over NFSv3.

6027-3221 AFM: Home NFS mount of host:path succeeded for file system fileSystem fileset filesetName. Caching is enabled.
Explanation: NFS mounting of the path from the home cluster succeeded. Caching is enabled.
User response: None.

6027-3224 [I] AFM: Failed to set extended attributes on file system fileSystem inode inodeNum error err, ignoring.
Explanation: Setting extended attributes on an inode failed.
User response: None.

6027-3225 [I] AFM: Failed to get extended attributes for file system fileSystem inode inodeNum, ignoring.
Explanation: Getting extended attributes on an inode failed.
User response: None.
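For message 6027-3214, a minimal sketch of manually unexpiring the fileset might look like this; fs1 and cacheFileset are hypothetical names, and the exact option syntax should be confirmed in the mmafmctl documentation:

    # Check whether the cache fileset is in the Expired state
    mmafmctl fs1 getstate -j cacheFileset
    # Manually unexpire the fileset once the home cluster is reachable again
    mmafmctl fs1 unexpire -j cacheFileset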


6027-3226 [I] AFM: Cannot find control file for file system fileSystem fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3227 [E] AFM: Cannot enable AFM for file system fileSystem fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3228 [E] AFM: Unable to unmount NFS export for file system fileSystem fileset filesetName
Explanation: NFS unmount of the path failed.
User response: None.

6027-3229 [E] AFM: File system fileSystem fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.
Explanation: The cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize the cache with the home.
User response: None.

6027-3230 [I] AFM: Cannot find snapshot link directory name for exported file system at home for file system fileSystem fileset filesetName. Snapshot directory at home will be cached.
Explanation: Unable to determine the snapshot directory at the home cluster.
User response: None.

6027-3232 type AFM: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] name sourceName origin error err
Explanation: AFM operations on a particular file failed.
User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster.

6027-3233 [I] AFM: Previous error repeated repeatNum times.
Explanation: Multiple AFM operations have failed.
User response: None.

6027-3234 [E] AFM: Unable to start thread to unexpire filesets.
Explanation: Failed to start the thread for unexpiration of filesets.
User response: None.

6027-3235 [I] AFM: Stopping recovery for the file system fileSystem fileset filesetName
Explanation: AFM recovery terminated because the current node is no longer MDS for the fileset.
User response: None.

6027-3236 [E] AFM: Recovery on file system fileSystem fileset filesetName failed with error err. Recovery will be retried on next access after recovery retry interval (timeout seconds) or manually resolve known problems and recover the fileset.
Explanation: Recovery failed to complete on the fileset. The fileset will temporarily be put into a dropped state and will be recovered when the fileset is next accessed, after the timeout mentioned in the error message. The fileset can also be recovered manually by running the mmafmctl command with the recover option after rectifying any known errors that led to the failure.
User response: None.

6027-3239 [E] AFM: Remote command remoteCmdType on file system fileSystem snapshot snapshotName snapshot ID snapshotId failed.
Explanation: A failure occurred when creating or deleting a peer snapshot.
User response: Examine the error details and retry the operation.


6027-3240 [E] AFM: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] error err
Explanation: The operation failed to execute on home in independent-writer mode.
User response: None.

6027-3241 [I] AFM: GW queue transfer started for file system fileSystem fileset filesetName. Transferring to nodeAddress.
Explanation: An old GW initiated the queue transfer because a new GW node joined the cluster, and the fileset now belongs to the new GW node.
User response: None.

6027-3242 [I] AFM: GW queue transfer started for file system fileSystem fileset filesetName. Receiving from nodeAddress.
Explanation: An old MDS initiated the queue transfer because this node joined the cluster as GW and the fileset now belongs to this node.
User response: None.

6027-3243 [I] AFM: GW queue transfer completed for file system fileSystem fileset filesetName. error error
Explanation: A GW queue transfer completed.
User response: None.

6027-3244 [I] AFM: Home mount of afmTarget succeeded for file system fileSystem fileset filesetName. Caching is enabled.
Explanation: A mount of the path from the home cluster succeeded. Caching is enabled.
User response: None.

6027-3245 [E] AFM: Home mount of afmTarget failed with error error for file system fileSystem fileset ID filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on the next request to the gateway.
Explanation: A mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.
User response: Verify that the afmTarget can be mounted using the specified protocol.

6027-3246 [I] AFM: Prefetch recovery started for the file system fileSystem fileset filesetName.
Explanation: Prefetch recovery started.
User response: None.

6027-3247 [I] AFM: Prefetch recovery completed for the file system fileSystem fileset filesetName. error error
Explanation: Prefetch recovery completed.
User response: None.

6027-3248 [E] AFM: Cannot find the control file for fileset filesetName in the exported file system at home. This file is required to operate in DR primary mode. The fileset will be disabled.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3249 [E] AFM: Target for fileset filesetName is not a DR secondary mode fileset or file system. This is required to operate in DR primary mode. The fileset will be disabled.
Explanation: The AFM target is not a DR secondary fileset or file system.
User response: The AFM target fileset or file system should be converted to DR secondary mode.

6027-3250 [E] AFM: Refresh intervals cannot be set for fileset.
Explanation: Refresh intervals are not supported on DR mode filesets.
User response: None.

6027-3252 [I] AFM: Home has been restored for cache filesetName. Synchronization with home will be resumed.
Explanation: A change in home export was detected that caused the home to be restored. Synchronization with home will be resumed.
User response: None.

248 GPFS: Problem Determination Guide


6027-3253 [E] • 6027-3402 [X]

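Message 6027-3248, and the similar response earlier in this section, direct you to run mmafmconfig with the enable option at the home cluster. The following is only a sketch, with /gpfs/homefs/export used as a placeholder for the exported path:

   mmafmconfig enable /gpfs/homefs/export

This prepares the exported path, including the AFM control file, so that the cache or DR primary fileset can operate against it.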
6027-3253 [E] AFM: Change in home is detected for cache filesetName. Synchronization with home is suspended until the problem is resolved.
Explanation: A change in home export was detected or the home path is stale.
User response: Ensure the exported path is accessible.

6027-3254 [W] AFM: Home is taking longer than expected to respond for cache filesetName. Synchronization with home is temporarily suspended.
Explanation: A pending message from the gateway node to home is taking longer than expected to receive a response. This could be the result of a network issue or a problem at the home site.
User response: Ensure the exported path is accessible.

6027-3300 Attribute afmShowHomeSnapshot cannot be changed for a single-writer fileset.
Explanation: Changing afmShowHomeSnapshot is not supported for single-writer filesets.
User response: None.

6027-3301 Unable to quiesce all nodes; some processes are busy or holding required resources.
Explanation: A timeout occurred on one or more nodes while trying to quiesce the file system during a snapshot command.
User response: Check the GPFS log on the file system manager node.

6027-3302 Attribute afmShowHomeSnapshot cannot be changed for an afmMode fileset.
Explanation: Changing afmShowHomeSnapshot is not supported for single-writer or independent-writer filesets.
User response: None.

6027-3303 Cannot restore snapshot; quota management is active for fileSystem.
Explanation: File system quota management is still active. The file system must be unmounted when restoring global snapshots.
User response: Unmount the file system and reissue the restore command.

6027-3304 Attention: Disk space reclaim on number of number regions in fileSystem returned errors.
Explanation: Free disk space reclaims on some regions failed during the tsreclaim run. Typically this is due to the lack of space reclaim support by the disk controller or operating system. It may also be due to utilities such as defrag or fsck running concurrently.
User response: Reissue the mmdf command. Verify that the disk controllers and the operating systems in the cluster support thin-provisioning space reclaim. Or rerun the mmreclaim command after defrag or fsck completes.

6027-3305 AFM Fileset filesetName cannot be changed as it is in beingDeleted state
Explanation: The user specified a fileset to tschfileset that cannot be changed.
User response: None. You cannot change the attributes of the root fileset.

6027-3400 Attention: The file system is at risk. The specified replication factor does not tolerate unavailable metadata disks.
Explanation: The default metadata replication was reduced to one while there were unavailable, or stopped, metadata disks. This condition prevents future file system manager takeover.
User response: Change the default metadata replication, or delete unavailable disks if possible.

6027-3401 Failure group value for disk diskName is not valid.
Explanation: An explicit failure group must be specified for each disk that belongs to a write affinity enabled storage pool.
User response: Specify a valid failure group.

6027-3402 [X] An unexpected device mapper path dmDevice (nsdId) was detected. The new path does not have Persistent Reserve enabled. The local access to disk diskName will be marked as down.
Explanation: A new device mapper path was detected, or a previously failed path was activated after the local device discovery was finished. This path lacks a Persistent Reserve and cannot be used. All device paths must be active at mount time.
User response: Check the paths to all disks in the file system. Repair any failed paths to disks then rediscover the local disk access.
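For message 6027-3402, one way to rediscover local disk access after the failed paths have been repaired is to rerun NSD discovery and then restart the affected disk. The commands below are only an illustration of that sequence; fs1 and gpfs1nsd are placeholder device and disk names:

   mmnsddiscover -a -N all
   mmchdisk fs1 start -d gpfs1nsd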
6027-3404 [E] The current file system version does not support write caching.
Explanation: The current file system version does not allow the write caching option.
User response: Use mmchfs -V to convert the file system to version 14.04 (4.1.0.0) or higher and reissue the command.

6027-3450 Error errorNumber when purging key (file system fileSystem). Key name format possibly incorrect.
Explanation: An error was encountered when purging a key from the key cache. The specified key name might have been incorrect, or an internal error was encountered.
User response: Ensure that the key name specified in the command is correct.

6027-3451 Error errorNumber when emptying cache (file system fileSystem).
Explanation: An error was encountered when purging all the keys from the key cache.
User response: Contact the IBM Support Center.

6027-3452 [E] Unable to create encrypted file fileName (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: Unable to create a new encrypted file. The key required to encrypt the file might not be available.
User response: Examine the error message following this message for information on the specific failure.

6027-3453 [E] Unable to open encrypted file: inode inodeNumber, fileset filesetNumber, file system fileSystem.
Explanation: Unable to open an existing encrypted file. The key used to encrypt the file might not be available.
User response: Examine the error message following this message for information on the specific failure.

6027-3457 [E] Unable to rewrap key with name Keyname (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: Unable to rewrap the key for a specified file because of an error with the key name.
User response: Examine the error message following this message for information on the specific failure.

6027-3458 [E] Invalid length for the Keyname string.
Explanation: The Keyname string has an incorrect length. The length of the specified string was either zero or it was larger than the maximum allowed length.
User response: Verify the Keyname string.

6027-3459 [E] Not enough memory.
Explanation: Unable to allocate memory for the Keyname string.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3460 [E] Incorrect format for the Keyname string.
Explanation: An incorrect format was used when specifying the Keyname string.
User response: Verify the format of the Keyname string.

6027-3461 [E] Error code: errorNumber.
Explanation: An error occurred when processing a key ID.
User response: Contact the IBM Support Center.

6027-3462 [E] Unable to rewrap key: original key name: originalKeyname, new key name: newKeyname (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: Unable to rewrap the key for a specified file, possibly because the existing key or the new key cannot be retrieved from the key server.
User response: Examine the error message following this message for information on the specific failure.

6027-3463 [E] Rewrap error.
Explanation: An internal error occurred during key rewrap.
User response: Examine the error messages surrounding this message. Contact the IBM Support Center.

6027-3464 [E] New key is already in use.
Explanation: The new key specified in a key rewrap is already being used.
User response: Ensure that the new key specified in the key rewrap is not being used by the file.
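For message 6027-3404, the file system format version can be displayed and then raised as sketched below, with fs1 as a placeholder device name:

   mmlsfs fs1 -V
   mmchfs fs1 -V full

Raising the format version with -V full is not reversible, so confirm first that no back-level nodes still need to mount the file system.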
6027-3465 [E] Cannot retrieve original key.
Explanation: The original key being used by the file cannot be retrieved from the key server.
User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3466 [E] Cannot retrieve new key.
Explanation: Unable to retrieve the new key specified in the rewrap from the key server.
User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3468 [E] Rewrap error code errorNumber.
Explanation: Key rewrap failed.
User response: Record the error code and contact the IBM Support Center.

6027-3469 [E] Encryption is enabled but the crypto module could not be initialized. Error code: number. Ensure that the GPFS crypto package was installed.
Explanation: Encryption is enabled, but the cryptographic module required for encryption could not be loaded.
User response: Ensure that the packages required for encryption are installed on each node in the cluster.

6027-3470 [E] Cannot create file fileName: extended attribute is too large: numBytesRequired bytes (numBytesAvailable available) (fileset filesetNumber, file system fileSystem).
Explanation: Unable to create an encryption file because the extended attribute required for encryption is too large.
User response: Change the encryption policy so that the file key is wrapped fewer times, reduce the number of keys used to wrap a file key, or create a file system with a larger inode size.

6027-3471 [E] At least one key must be specified.
Explanation: No key name was specified.
User response: Specify at least one key name.

6027-3472 [E] Could not combine the keys.
Explanation: Unable to combine the keys used to wrap a file key.
User response: Examine the keys being used. Contact the IBM Support Center.

6027-3473 [E] Could not locate the RKM.conf file.
Explanation: Unable to locate the RKM.conf configuration file.
User response: Contact the IBM Support Center.

6027-3474 [E] Could not open fileType file ('fileName' was specified).
Explanation: Unable to open the specified configuration file. Encryption files will not be accessible.
User response: Ensure that the specified configuration file is present on all nodes.

6027-3475 [E] Could not read file 'fileName'.
Explanation: Unable to read the specified file.
User response: Ensure that the specified file is accessible from the node.

6027-3476 [E] Could not seek through file 'fileName'.
Explanation: Unable to seek through the specified file. Possible inconsistency in the local file system where the file is stored.
User response: Ensure that the specified file can be read from the local node.

6027-3477 [E] Could not wrap the FEK.
Explanation: Unable to wrap the file encryption key.
User response: Examine other error messages. Verify that the encryption policies being used are correct.

6027-3478 [E] Insufficient memory.
Explanation: Internal error: unable to allocate memory.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3479 [E] Missing combine parameter string.
Explanation: The combine parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.
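For message 6027-3470, the inode size is fixed when the file system is created, so the larger-inode alternative means creating a new file system. A sketch only, with the device name and NSD stanza file name as placeholders:

   mmcrfs fs2 -F nsdstanza.txt -i 4096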
6027-3480 [E] Missing encryption parameter string.
Explanation: The encryption parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.

6027-3481 [E] Missing wrapping parameter string.
Explanation: The wrapping parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.

6027-3482 [E] 'combineParameter' could not be parsed as a valid combine parameter string.
Explanation: Unable to parse the combine parameter string.
User response: Verify the syntax of the encryption policy.

6027-3483 [E] 'encryptionParameter' could not be parsed as a valid encryption parameter string.
Explanation: Unable to parse the encryption parameter string.
User response: Verify the syntax of the encryption policy.

6027-3484 [E] 'wrappingParameter' could not be parsed as a valid wrapping parameter string.
Explanation: Unable to parse the wrapping parameter string.
User response: Verify the syntax of the encryption policy.

6027-3485 [E] The Keyname string cannot be longer than number characters.
Explanation: The specified Keyname string has too many characters.
User response: Verify that the specified Keyname string is correct.

6027-3486 [E] The KMIP library could not be initialized.
Explanation: The KMIP library used to communicate with the key server could not be initialized.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3487 [E] The RKM ID cannot be longer than number characters.
Explanation: The remote key manager ID cannot be longer than the specified length.
User response: Use a shorter remote key manager ID.

6027-3488 [E] The length of the key ID cannot be zero.
Explanation: The length of the specified key ID string cannot be zero.
User response: Specify a key ID string with a valid length.

6027-3489 [E] The length of the RKM ID cannot be zero.
Explanation: The length of the specified RKM ID string cannot be zero.
User response: Specify an RKM ID string with a valid length.

6027-3490 [E] The maximum size of the RKM.conf file currently supported is number bytes.
Explanation: The RKM.conf file is larger than the size that is currently supported.
User response: Use a smaller RKM.conf configuration file.

6027-3491 [E] The string 'Keyname' could not be parsed as a valid key name.
Explanation: The specified string could not be parsed as a valid key name.
User response: Specify a valid Keyname string.

6027-3493 [E] numKeys keys were specified but a maximum of numKeysMax is supported.
Explanation: The maximum number of specified key IDs was exceeded.
User response: Change the encryption policy to use fewer keys.

6027-3494 [E] Unrecognized cipher mode.
Explanation: Unable to recognize the specified cipher mode.
User response: Specify one of the valid cipher modes.
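Several of the preceding messages (6027-3485, 6027-3488, 6027-3489, and 6027-3491) concern the Keyname string. As messages 6027-3509 and 6027-3546 indicate, a key name combines a key ID and an RKM ID in the form keyID:rkmID, where the RKM ID must match a backend defined in RKM.conf. A hypothetical key name as it might appear in the KEYS clause of an encryption policy rule:

   KEYS('KEY-123a4b56-1:RKM_1')

Both values are illustrative only; the key ID comes from the key server, and the RKM ID comes from your RKM.conf.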
6027-3495 [E] Unrecognized cipher.
Explanation: Unable to recognize the specified cipher.
User response: Specify one of the valid ciphers.

6027-3496 [E] Unrecognized combine mode.
Explanation: Unable to recognize the specified combine mode.
User response: Specify one of the valid combine modes.

6027-3497 [E] Unrecognized encryption mode.
Explanation: Unable to recognize the specified encryption mode.
User response: Specify one of the valid encryption modes.

6027-3498 [E] Invalid key length.
Explanation: An invalid key length was specified.
User response: Specify a valid key length for the chosen cipher mode.

6027-3499 [E] Unrecognized wrapping mode.
Explanation: Unable to recognize the specified wrapping mode.
User response: Specify one of the valid wrapping modes.

6027-3500 [E] Duplicate Keyname string 'keyIdentifier'.
Explanation: A given Keyname string has been specified twice.
User response: Change the encryption policy to eliminate the duplicate.

6027-3501 [E] Unrecognized combine mode ('combineMode').
Explanation: The specified combine mode was not recognized.
User response: Specify a valid combine mode.

6027-3502 [E] Unrecognized cipher mode ('cipherMode').
Explanation: The specified cipher mode was not recognized.
User response: Specify a valid cipher mode.

6027-3503 [E] Unrecognized cipher ('cipher').
Explanation: The specified cipher was not recognized.
User response: Specify a valid cipher.

6027-3504 [E] Unrecognized encryption mode ('mode').
Explanation: The specified encryption mode was not recognized.
User response: Specify a valid encryption mode.

6027-3505 [E] Invalid key length ('keyLength').
Explanation: The specified key length was incorrect.
User response: Specify a valid key length.

6027-3506 [E] Mode 'mode1' is not compatible with mode 'mode2', aborting.
Explanation: The two specified encryption parameters are not compatible.
User response: Change the encryption policy and specify compatible encryption parameters.

6027-3509 [E] Key 'keyID:RKMID' could not be fetched (RKM reported error errorNumber).
Explanation: The key with the specified name cannot be fetched from the key server.
User response: Examine the error messages to obtain information about the failure. Verify connectivity to the key server and that the specified key is present at the server.

6027-3510 [E] Could not bind symbol symbolName (errorDescription).
Explanation: Unable to find the location of a symbol in the library.
User response: Contact the IBM Support Center.

6027-3511 [E] Error encountered when parsing 'line': expected a new RKM backend stanza.
Explanation: An error was encountered when parsing a line in RKM.conf. Parsing of the previous backend is complete, and the stanza for the next backend is expected.
User response: Correct the syntax in RKM.conf.

6027-3512 [E] The specified type 'type' for backend 'backend' is invalid.
Explanation: An incorrect type was specified for a key server backend.
User response: Specify a correct backend type in RKM.conf.

6027-3513 [E] Duplicate backend 'backend'.
Explanation: A duplicate backend name was specified in RKM.conf.
User response: Specify unique RKM backends in RKM.conf.

6027-3514 [E] Error encountered when parsing 'line': invalid key 'keyIdentifier'.
Explanation: An error was encountered when parsing a line in RKM.conf.
User response: Specify a well-formed stanza in RKM.conf.

6027-3515 [E] Error encountered when parsing 'line': invalid key-value pair.
Explanation: An error was encountered when parsing a line in RKM.conf: an invalid key-value pair was found.
User response: Correct the specification of the RKM backend in RKM.conf.

6027-3516 [E] Error encountered when parsing 'line': incomplete RKM backend stanza 'backend'.
Explanation: An error was encountered when parsing a line in RKM.conf. The specification of the backend stanza was incomplete.
User response: Correct the specification of the RKM backend in RKM.conf.

6027-3517 [E] Could not open library (libName).
Explanation: Unable to open the specified library.
User response: Verify that all required packages are installed for encryption. Contact the IBM Support Center.

6027-3518 [E] The length of the RKM ID string is invalid (must be between 0 and length characters).
Explanation: The length of the RKM backend ID is invalid.
User response: Specify an RKM backend ID with a valid length.

6027-3519 [E] 'numAttempts' is not a valid number of connection attempts.
Explanation: The value specified for the number of connection attempts is incorrect.
User response: Specify a valid number of connection attempts.

6027-3520 [E] 'sleepInterval' is not a valid sleep interval.
Explanation: The value specified for the sleep interval is incorrect.
User response: Specify a valid sleep interval value (in microseconds).

6027-3521 [E] 'timeout' is not a valid connection timeout.
Explanation: The value specified for the connection timeout is incorrect.
User response: Specify a valid connection timeout (in seconds).

6027-3522 [E] 'url' is not a valid URL.
Explanation: The specified string is not a valid URL for the key server.
User response: Specify a valid URL for the key server.

6027-3524 [E] 'tenantName' is not a valid tenantName.
Explanation: An incorrect value was specified for the tenant name.
User response: Specify a valid tenant name.

6027-3527 [E] Backend 'backend' could not be initialized (error errorNumber).
Explanation: Key server backend could not be initialized.
User response: Examine the error messages. Verify connectivity to the server. Contact the IBM Support Center.

6027-3528 [E] Unrecognized wrapping mode ('wrapMode').
Explanation: The specified key wrapping mode was not recognized.
User response: Specify a valid key wrapping mode.

6027-3529 [E] An error was encountered while processing file 'fileName':
Explanation: An error was encountered while processing the specified configuration file.
User response: Examine the error messages that follow and correct the corresponding conditions.

6027-3530 [E] Unable to open encrypted file: key retrieval not initialized (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: File is encrypted but the infrastructure required to retrieve encryption keys was not initialized, likely because processing of RKM.conf failed.
User response: Examine error messages at the time the file system was mounted.

6027-3533 [E] Invalid encryption key derivation function.
Explanation: An incorrect key derivation function was specified.
User response: Specify a valid key derivation function.

6027-3534 [E] Unrecognized encryption key derivation function ('keyDerivation').
Explanation: The specified key derivation function was not recognized.
User response: Specify a valid key derivation function.

6027-3535 [E] Incorrect client certificate label 'clientCertLabel' for backend 'backend'.
Explanation: The specified client keypair certificate label is incorrect for the backend.
User response: Ensure that the correct client certificate label is used in RKM.conf.

6027-3536 [E] Incorrect passphrase 'passphrase' for backend 'backend'.
Explanation: The specified passphrase is incorrect for the backend.
User response: Ensure that the correct passphrase is used for the backend in RKM.conf.

6027-3537 [E] Setting default encryption parameters requires empty combine and wrapping parameter strings.
Explanation: A non-empty combine or wrapping parameter string was used in an encryption policy rule that also uses the default parameter string.
User response: Ensure that neither the combine nor the wrapping parameter is set when the default parameter string is used in the encryption rule.

6027-3540 [E] The specified RKM backend type (rkmType) is invalid.
Explanation: The specified RKM type in RKM.conf is incorrect.
User response: Ensure that only supported RKM types are specified in RKM.conf.

6027-3541 [E] Encryption is not supported on Windows.
Explanation: Encryption cannot be activated if there are Windows nodes in the cluster.
User response: Ensure that encryption is not activated if there are Windows nodes in the cluster.

6027-3543 [E] The integrity of the file encrypting key could not be verified after unwrapping; the operation was cancelled.
Explanation: When opening an existing encrypted file, the integrity of the file encrypting key could not be verified. Either the cryptographic extended attributes were damaged, or the master key(s) used to unwrap the FEK have changed.
User response: Check for other symptoms of data corruption, and verify that the configuration of the key server has not changed.

6027-3544 [E] An error was encountered when parsing 'line': duplicate key 'key'.
Explanation: A duplicate keyword was found in RKM.conf.
User response: Eliminate duplicate entries in the backend specification.

6027-3545 [E] Encryption is enabled but there is no valid license. Ensure that the GPFS crypto package was installed properly.
Explanation: The required license is missing for the GPFS encryption package.
User response: Ensure that the GPFS encryption package was installed properly.

6027-3546 [E] Key 'keyID:rkmID' could not be fetched. The specified RKM ID does not exist; check the RKM.conf settings.
Explanation: The specified RKM ID part of the key name does not exist, and therefore the key cannot be retrieved. The corresponding RKM might have been removed from RKM.conf.
User response: Check the set of RKMs specified in RKM.conf.
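The RKM.conf parsing and backend messages above (6027-3511 through 6027-3544) all refer to backend stanzas in RKM.conf. The following is only a sketch of the general stanza shape, with every value hypothetical; the supported keywords and their exact spelling are defined in the GPFS encryption documentation:

   RKM_1 {
     type = ISKLM
     kmipServerUri = tls://keyserver.example.com:5696
     keyStore = /var/mmfs/etc/RKMcerts/keystore.p12
     passphrase = clientPass
     clientCertLabel = gpfsClientCert
     tenantName = devGroup1
   }

The stanza name (RKM_1 here) is the RKM ID referenced in key names, and the passphrase, clientCertLabel, and tenantName lines correspond to the values checked by messages 6027-3536, 6027-3535, and 6027-3524.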
6027-3547 [E] Key 'keyID:rkmID' could not be fetched. The connection was reset by the peer while performing the TLS handshake.
Explanation: The specified key could not be retrieved from the server, because the connection with the server was reset while performing the TLS handshake.
User response: Check connectivity to the server. Check credentials to access the server. Contact the IBM Support Center.

6027-3548 [E] Key 'keyID:rkmID' could not be fetched. The IP address of the RKM could not be resolved.
Explanation: The specified key could not be retrieved from the server because the IP address of the server could not be resolved.
User response: Ensure that the hostname of the key server is correct. Verify whether there are problems with name resolution.

6027-3549 [E] Key 'keyID:rkmID' could not be fetched. The TCP connection with the RKM could not be established.
Explanation: Unable to establish a TCP connection with the key server.
User response: Check the connectivity to the key server.

6027-3550 Error when retrieving encryption attribute: errorDescription.
Explanation: Unable to retrieve or decode the encryption attribute for a given file.
User response: The file could be damaged and may need to be removed if it cannot be read.

6027-3555 name must be combined with FileInherit, DirInherit or both.
Explanation: NoPropagateInherit must be accompanied by other inherit flags. Valid values are FileInherit and DirInherit.
User response: Specify a valid NFSv4 option and reissue the command.

6027-3700 [E] Key 'keyID' was not found on RKM ID 'rkmID'.
Explanation: The specified key could not be retrieved from the key server.
User response: Verify that the key is present at the server. Verify that the name of the keys used in the encryption policy is correct.

6027-3701 [E] Key 'keyID:rkmID' could not be fetched. The authentication with the RKM was not successful.
Explanation: Unable to authenticate with the key server.
User response: Verify that the credentials used to authenticate with the key server are correct.

6027-3702 [E] Key 'keyID:rkmID' could not be fetched. Permission denied.
Explanation: Unable to authenticate with the key server.
User response: Verify that the credentials used to authenticate with the key server are correct.

6027-3703 [E] I/O error while accessing the keystore file 'keystoreFileName'.
Explanation: An error occurred while accessing the keystore file.
User response: Verify that the name of the keystore file in RKM.conf is correct. Verify that the keystore file can be read on each node.

6027-3704 [E] The keystore file 'keystoreFileName' has an invalid format.
Explanation: The specified keystore file has an invalid format.
User response: Verify that the format of the keystore file is correct.
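For the fetch failures in messages 6027-3547 through 6027-3549, it can help to confirm basic reachability of the key server from the affected node before involving IBM support. These are ordinary operating system checks, not GPFS commands, and keyserver.example.com and port 5696 (the usual KMIP port) are placeholders:

   host keyserver.example.com
   ping -c 2 keyserver.example.com
   openssl s_client -connect keyserver.example.com:5696

A name-resolution failure points to message 6027-3548, a TCP connection failure to 6027-3549, and a handshake that is reset points back to the certificate and credential checks described for 6027-3547.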
Accessibility features for GPFS
Accessibility features help users who have a disability, such as restricted mobility or limited vision, to use
information technology products successfully.

Accessibility features
The following list includes the major accessibility features in GPFS:
v Keyboard-only operation
v Interfaces that are commonly used by screen readers
v Keys that are discernible by touch but do not activate just by touching them
v Industry-standard devices for ports and connectors
v The attachment of alternative input and output devices

The IBM Cluster Information Center, and its related publications, are accessibility-enabled. The
accessibility features of the information center are described in the Accessibility topic at the following
URL: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.addinfo.doc/access.html.

Keyboard navigation
This product uses standard Microsoft Windows navigation keys.

IBM and accessibility


See the IBM Human Ability and Accessibility Center for more information about the commitment that
IBM has to accessibility:
http://www.ibm.com/able/



Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.

IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:

IBM Director of Licensing


IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing


Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied
warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.



Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Dept. 30ZA/Building 707
Mail Station P300
2455 South Road,
Poughkeepsie, NY 12601-5400
U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment or a fee.

The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or
any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.

This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.



Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States,
other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Glossary
This glossary provides terms and definitions for the GPFS product.

The following cross-references are used in this glossary:
v See refers you from a nonpreferred term to the preferred term or from an abbreviation to the spelled-out form.
v See also refers you to a related or contrasting term.

For other terms and definitions, see the IBM Terminology website (http://www.ibm.com/software/globalization/terminology/) (opens in new window).

B
block utilization
The measurement of the percentage of used subblocks per allocated blocks.

C
cluster
A loosely-coupled collection of independent systems (nodes) organized into a network for the purpose of sharing resources and communicating with each other. See also GPFS cluster.
cluster configuration data
The configuration data that is stored on the cluster configuration servers.
cluster manager
The node that monitors node status using disk leases, detects failures, drives recovery, and selects file system managers. The cluster manager is the node with the lowest node number among the quorum nodes that are operating at a particular time.
control data structures
Data structures needed to manage file data and metadata cached in memory. Control data structures include hash tables and link pointers for finding cached data; lock states and tokens to implement distributed locking; and various flags and sequence numbers to keep track of updates to the cached data.

D
Data Management Application Program Interface (DMAPI)
The interface defined by the Open Group's XDSM standard as described in the publication System Management: Data Storage Management (XDSM) API Common Application Environment (CAE) Specification C429, The Open Group ISBN 1-85912-190-X.
deadman switch timer
A kernel timer that works on a node that has lost its disk lease and has outstanding I/O requests. This timer ensures that the node cannot complete the outstanding I/O requests (which would risk causing file system corruption), by causing a panic in the kernel.
dependent fileset
A fileset that shares the inode space of an existing independent fileset.
disk descriptor
A definition of the type of data that the disk contains and the failure group to which this disk belongs. See also failure group.
disposition
The session to which a data management event is delivered. An individual disposition is set for each type of event from each file system.
disk leasing
A method for controlling access to storage devices from multiple host systems. Any host that wants to access a storage device configured to use disk leasing registers for a lease; in the event of a perceived failure, a host system can deny access, preventing I/O operations with the storage device until the preempted system has reregistered.
domain
A logical grouping of resources in a network for the purpose of common management and administration.

F
failback
Cluster recovery from failover following repair. See also failover.
failover
(1) The assumption of file system duties by another node when a node fails. (2) The process of transferring all control of the ESS to a single cluster in the ESS when the other clusters in the ESS fail. See also cluster. (3) The routing of all transactions to a second controller when the first controller fails. See also cluster.
failure group
A collection of disks that share common access paths or adapter connection, and could all become unavailable through a single hardware failure.
FEK
File encryption key. An FEK is used to encrypt sectors of an individual file.
fileset
A hierarchical grouping of files managed as a unit for balancing workload across a cluster. See also dependent fileset and independent fileset.
fileset snapshot
A snapshot of an independent fileset plus all dependent filesets.
file clone
A writable snapshot of an individual file.
file-management policy
A set of rules defined in a policy file that GPFS uses to manage file migration and file deletion. See also policy.
file-placement policy
A set of rules defined in a policy file that GPFS uses to manage the initial placement of a newly created file. See also policy.
file system descriptor
A data structure containing key information about a file system. This information includes the disks assigned to the file system (stripe group), the current state of the file system, and pointers to key files such as quota files and log files.
file system descriptor quorum
The number of disks needed in order to write the file system descriptor correctly.
file system manager
The provider of services for all the nodes using a single file system. A file system manager processes changes to the state or description of the file system, controls the regions of disks that are allocated to each node, and controls token management and quota management.
fragment
The space allocated for an amount of data too small to require a full block. A fragment consists of one or more subblocks.

G
global snapshot
A snapshot of an entire GPFS file system.
GPFS cluster
A cluster of nodes defined as being available for use by GPFS file systems.
GPFS portability layer
The interface module that each installation must build for its specific hardware platform and Linux distribution.
GPFS recovery log
A file that contains a record of metadata activity, and exists for each node of a cluster. In the event of a node failure, the recovery log for the failed node is replayed, restoring the file system to a consistent state and allowing other nodes to continue working.

I
ill-placed file
A file assigned to one storage pool, but having some or all of its data in a different storage pool.
ill-replicated file
A file with contents that are not correctly replicated according to the desired setting for that file. This situation occurs in the interval between a change in the file's replication settings or suspending one of its disks, and the restripe of the file.
independent fileset
A fileset that has its own inode space.
indirect block
A block containing pointers to other blocks.
inode
The internal structure that describes the individual files in the file system. There is one inode for each file.
inode space
A collection of inode number ranges reserved for an independent fileset, which enables more efficient per-fileset functions.
ISKLM
IBM Security Key Lifecycle Manager. For GPFS encryption, the ISKLM is used as an RKM server to store MEKs.

J
journaled file system (JFS)
A technology designed for high-throughput server environments, which are important for running intranet and other high-performance e-business file servers.
junction
A special directory entry that connects a name in a directory of one fileset to the root directory of another fileset.

K
kernel
The part of an operating system that contains programs for such tasks as input/output, management and control of hardware, and the scheduling of user tasks.

M
MEK
Master encryption key. An MEK is used to encrypt other keys.
metadata
Data structures that contain access information about file data. These include: inodes, indirect blocks, and directories. These data structures are not accessible to user applications.
metanode
The one node per open file that is responsible for maintaining file metadata integrity. In most cases, the node that has had the file open for the longest period of continuous time is the metanode.
mirroring
The process of writing the same data to multiple disks at the same time. The mirroring of data protects it against data loss within the database or within the recovery log.
multi-tailed
A disk connected to multiple nodes.

N
namespace
Space reserved by a file system to contain the names of its objects.
Network File System (NFS)
A protocol, developed by Sun Microsystems, Incorporated, that allows any host in a network to gain access to another host or netgroup and their file directories.
Network Shared Disk (NSD)
A component for cluster-wide disk naming and access.
NSD volume ID
A unique 16 digit hex number that is used to identify and access all NSDs.
node
An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it may contain one or more nodes.
node descriptor
A definition that indicates how GPFS uses a node. Possible functions include: manager node, client node, quorum node, and nonquorum node.
node number
A number that is generated and maintained by GPFS as the cluster is created, and as nodes are added to or deleted from the cluster.
node quorum
The minimum number of nodes that must be running in order for the daemon to start.
node quorum with tiebreaker disks
A form of quorum that allows GPFS to run with as little as one quorum node available, as long as there is access to a majority of the quorum disks.
non-quorum node
A node in a cluster that is not counted for the purposes of quorum determination.

P
policy
A list of file-placement, service-class, and encryption rules that define characteristics and placement of files. Several policies can be defined within the configuration, but only one policy set is active at one time.
policy rule
A programming statement within a policy that defines a specific action to be performed.
pool
A group of resources with similar characteristics and attributes.
portability
The ability of a programming language to compile successfully on different operating systems without requiring changes to the source code.
primary GPFS cluster configuration server
In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data.
private IP address
An IP address used to communicate on a private network.
public IP address
An IP address used to communicate on a public network.

Q
quorum node
A node in the cluster that is counted to determine whether a quorum exists.
quota
The amount of disk space and number of inodes assigned as upper limits for a specified user, group of users, or fileset.
quota management
The allocation of disk blocks to the other nodes writing to the file system, and comparison of the allocated space to quota limits at regular intervals.

R
Redundant Array of Independent Disks (RAID)
A collection of two or more disk physical drives that present to the host an image of one or more logical disk drives. In the event of a single physical device failure, the data can be read or regenerated from the other disk drives in the array due to data redundancy.
recovery
The process of restoring access to file system data when a failure has occurred. Recovery can involve reconstructing data or providing alternative routing through a different server.
replication
The process of maintaining a defined set of data in more than one location. Replication involves copying designated changes for one location (a source) to another (a target), and synchronizing the data in both locations.
RKM server
Remote key management server. An RKM server is used to store MEKs.
rule
A list of conditions and actions that are triggered when certain conditions are met. Conditions include attributes about an object (file name, type or extension, dates, owner, and groups), the requesting client, and the container name associated with the object.

S
SAN-attached
Disks that are physically attached to all nodes in the cluster using Serial Storage Architecture (SSA) connections or using Fibre Channel switches.
Scale Out Backup and Restore (SOBAR)
A specialized mechanism for data protection against disaster only for GPFS file systems that are managed by Tivoli Storage Manager (TSM) Hierarchical Storage Management (HSM).
secondary GPFS cluster configuration server
In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data in the event that the primary GPFS cluster configuration server fails or becomes unavailable.
Secure Hash Algorithm digest (SHA digest)
A character string used to identify a GPFS security key.
session failure
The loss of all resources of a data management session due to the failure of the daemon on the session node.
session node
The node on which a data management session was created.
Small Computer System Interface (SCSI)
An ANSI-standard electronic interface that allows personal computers to communicate with peripheral hardware, such as disk drives, tape drives, CD-ROM drives, printers, and scanners faster and more flexibly than previous interfaces.
snapshot
An exact copy of changed data in the active files and directories of a file system or fileset at a single point in time. See also fileset snapshot and global snapshot.
source node
The node on which a data management event is generated.
stand-alone client
The node in a one-node cluster.
storage area network (SAN)
A dedicated storage network tailored to a specific environment, combining servers, storage products, networking products, software, and services.
storage pool
A grouping of storage space consisting of volumes, logical unit numbers (LUNs), or addresses that share a common set of administrative characteristics.
stripe group
The set of disks comprising the storage assigned to a file system.
striping
A storage process in which information is split into blocks (a fixed amount of data) and the blocks are written to (or read from) a series of disks in parallel.
subblock
The smallest unit of data accessible in an I/O operation, equal to one thirty-second of a data block.
system storage pool
A storage pool containing file system control structures, reserved files, directories, symbolic links, special devices, as well as the metadata associated with regular files, including indirect blocks and extended attributes. The system storage pool can also contain user data.

T
token management
A system for controlling file access in which each application performing a read or write operation is granted some form of access to a specific block of file data. Token management provides data consistency and controls conflicts. Token management has two components: the token management server, and the token management function.
token management function
A component of token management that requests tokens from the token management server. The token management function is located on each cluster node.
token management server
A component of token management that controls tokens relating to the operation of the file system. The token management server is located at the file system manager node.
twin-tailed
A disk connected to two nodes.

U
user storage pool
A storage pool containing the blocks of data that make up user files.

V
virtual file system (VFS)
A remote file system that has been mounted so that it is accessible to the local user.
virtual node (vnode)
The structure that contains information about a file system object in a virtual file system (VFS).
Index
Special characters cipherList 68
Clearing a leftover Persistent Reserve reservation 103
/etc/filesystems 62 client node 69
/etc/fstab 62 clock synchronization 2, 77
/etc/hosts 42 cluster
/etc/resolv.conf 60 deleting a node 57
/tmp/mmfs 110 cluster configuration information
/tmp/mmfs directory 115 displaying 18
/usr/lpp/mmfs/bin 46 cluster data
/usr/lpp/mmfs/bin/runmmfs 12 backup 45
/usr/lpp/mmfs/samples/gatherlogs.sample.sh file 2 cluster file systems
/var/adm/ras/mmfs.log.latest 1 displaying 19
/var/adm/ras/mmfs.log.previous 1, 57 cluster security configuration 66
/var/mmfs/etc/mmlock 44 cluster state information 17
/var/mmfs/gen/mmsdrfs 45 commands
.ptrash directory 112 conflicting invocation 61
.rhosts 43 errpt 115
.snapshots 82, 84, 85 gpfs.snap 6, 7, 8, 9, 10, 115
grep 3
lslpp 115
A lslv 109
access lsof 24, 70, 71
to disk 95 lspv 101
ACCESS_TIME attribute 29, 30 lsvg 100
accessibility features for the GPFS product 257 lxtrace 11
active file management in disconnected mode 112 mmadddisk 75, 80, 97, 100, 102
active file management, questions related to 111 mmaddnode 54, 55, 110
active file management, resync in 111 mmafmctl Device getstate 17
adding encryption policies 107 mmapplypolicy 25, 77, 78, 80
administration commands mmauth 35, 67
failure 44 mmbackup 81
AFM fileset, changing mode of 112 mmchcluster 43
AFM in disconnected mode 112 mmchconfig 19, 47, 55, 69, 110
AFM, extended attribute size supported by 112 mmchdisk 62, 72, 75, 80, 91, 94, 95, 97, 99
AFM, resync in 111 mmcheckquota 5, 31, 59, 72
AIX mmchfs 5, 46, 54, 57, 62, 63, 64, 72
kernel debugger 39 mmchnsd 91
AIX error logs mmcommon recoverfs 75
MMFS_DISKFAIL 95 mmcommon showLocks 44
MMFS_QUOTA 72 mmcrcluster 19, 43, 47, 54, 110
unavailable disks 72 mmcrfs 57, 58, 91, 102
application programs mmcrnsd 91, 94
errors 3, 5, 50, 58 mmcrsnapshot 83, 84
authorization error 43 mmdeldisk 75, 80, 97, 100
autofs 65 mmdelfileset 79
autofs mount 64 mmdelfs 98, 99
autoload option mmdelnode 55, 57
on mmchconfig command 47 mmdelnsd 94, 98
on mmcrcluster command 47 mmdelsnapshot 83
automount 63, 69 mmdf 53, 75, 100
automount daemon 64 mmdiag 17
automount failure 63, 65 mmexpelnode 20
mmfileid 33, 88, 97
mmfsadm 10, 15, 49, 54, 97
C mmfsck 23, 61, 62, 80, 84, 88, 97, 100, 111
mmgetstate 17, 48, 56
candidate file 25, 28 mmlsattr 78, 79
attributes 29 mmlscluster 18, 55, 67, 109
changing mode of AFM fileset 112 mmlsconfig 11, 19, 64
checking, Persistent Reserve 103 mmlsdisk 58, 61, 62, 72, 75, 91, 94, 96, 99, 116
chosen file 25, 27 mmlsfileset 79
CIFS serving, Windows SMB2 protocol 60



commands (continued) data always gathered by gpfs.snap (continued)
mmlsfs 63, 97, 98, 115 on Windows 10
mmlsmgr 11, 62 data integrity 5, 88
mmlsmount 24, 47, 58, 61, 70, 71, 91 Data Management API (DMAPI)
mmlsnsd 31, 92, 93, 100 file system will not mount 63
mmlspolicy 78 data replication 96
mmlsquota 58, 59 data structure 3
mmlssnapshot 82, 83, 84 dataOnly attribute 80
mmmount 23, 61, 72, 102 dataStructureDump 11
mmpmon 39, 85 dead man switch timer 52
mmquotaoff 59 deadlock
mmquotaon 59 automated breakup 38
mmrefresh 19, 62, 64 deadlocks 53, 54
mmremotecluster 35, 67, 68 automated data collection 38
mmremotefs 64, 67 automated detection 37
mmrepquota 59 information about 37
mmrestorefs 83, 84, 85 delays 53, 54
mmrestripefile 78, 81 DELETE rule 25, 28
mmrestripefs 81, 97, 100 deleting a node
mmrpldisk 75, 80, 102 from a cluster 57
mmsdrrestore 20 descOnly 73
mmshutdown 18, 20, 47, 48, 50, 64, 65, 73 directories
mmsnapdir 82, 84, 85 /tmp/mmfs 110, 115
mmstartup 47, 64, 65 .snapshots 82, 84, 85
mmumount 70, 72 directory that has not been cached, traversing 112
mmunlinkfileset 79 disabling IPv6
mmwindisk 32 for SSH connection delays 60
mount 24, 61, 62, 63, 84, 98, 102 disabling Persistent Reserve manually 104
ping 44 disaster recovery
rcp 43 other problems 56
rpm 115 problems 55
rsh 43, 57 setup problems 56
umount 71, 72, 100 disconnected mode, AFM 112
varyonvg 102 disk access 95
commands, administration disk commands
failure 44 hang 102
communication paths disk descriptor replica 72
unavailable 62 disk failover 99
compiling mmfslinux module 46 disk leasing 52
configuration disk subsystem
hard loop ID 42 failure 91
performance tuning 43 disks
configuration data 75 damaged files 33
configuration parameters declared down 94
kernel 46 define for GPFS use 100
configuration problems 41 displaying information of 31
configuration variable settings failure 3, 5, 91
displaying 19 media failure 96
connectivity problems 43 partial failure 100
console logs replacing 75
mmfs.log 1 usage 72
contact node address 67 disks down 100
contact node failure 68 disks, viewing 32
contacting IBM 117 displaying disk information 31
creating a file, failure 107 displaying NSD information 92
creating a master GPFS log file 2 DMAPI
cron 110 coexistence 82
DNS server failure 67

D
data E
replicated 97 enabling Persistent Reserve manually 104
data always gathered by gpfs.snap 8 encryption policies, adding 107
for a master snapshot 10 encryption problems 107
on AIX 9 ERRNO I/O error code 57
on all platforms 8 error codes
on Linux 9 EIO 3, 91, 98

error codes (continued) error messages (continued)
ENODEV 50 6027-631 74
ENOSPC 98 6027-632 74
ERRNO I/O 57 6027-635 74
ESTALE 5, 50 6027-636 74, 99
NO SUCH DIRECTORY 50 6027-638 74
NO SUCH FILE 50 6027-645 63
error logs 1 6027-650 50
example 5 6027-663 58
MMFS_ABNORMAL_SHUTDOWN 3 6027-665 47, 58
MMFS_DISKFAIL 3 6027-695 59
MMFS_ENVIRON 3 6027-953 84
MMFS_FSSTRUCT 3 ANS1312E 82
MMFS_GENERIC 4 descriptor replica 55
MMFS_LONGDISKIO 4 failed to connect 47, 99
MMFS_QUOTA 4, 31 GPFS cluster data recovery 44
MMFS_SYSTEM_UNMOUNT 5 incompatible version number 49
MMFS_SYSTEM_WARNING 5 mmbackup 82
error messages mmfsd ready 47
0516-1339 94 network problems 49
0516-1397 94 quorum 55
0516-862 94 rsh problems 44
6027-1209 51 shared segment problems 48, 49
6027-1242 44 snapshot 82, 83, 84
6027-1290 75 TSM 82
6027-1598 55 error number
6027-1615 44 configuration 46
6027-1617 44 EALL_UNAVAIL = 218 73
6027-1627 58 ECONFIG = 208 46
6027-1628 44 ECONFIG = 215 46, 49
6027-1630 45 ECONFIG = 218 46
6027-1631 45 ECONFIG = 237 46
6027-1632 45 EINVAL 77
6027-1633 45 ENO_MGR = 212 75, 99
6027-1636 92 ENO_QUOTA_INST = 237 63
6027-1661 92 ENOENT 71
6027-1662 94 ENOSPC 75
6027-1995 82 EOFFLINE = 208 99
6027-1996 74 EPANIC = 666 73
6027-2108 92 ESTALE 71
6027-2109 92 EVALIDATE = 214 88
6027-2622 84 file system forced unmount 73
6027-2632 84 GPFS application 99
6027-300 47 GPFS daemon will not come up 49
6027-306 49 installation 46
6027-319 48, 49 multiple file system manager failures 75
6027-320 49 errors, Persistent Reserve 102
6027-321 49 errpt command 115
6027-322 49 EXCLUDE rule 29
6027-341 46, 50 excluded file 29
6027-342 46, 50 attributes 29
6027-343 46, 50 extended attribute size supported by AFM 112
6027-344 46, 50
6027-361 99
6027-418 73, 99
6027-419 63, 73
F
facility
6027-435 55
Linux kernel crash dump (LKCD) 39
6027-473 73
failure
6027-474 73
disk 94
6027-482 63, 99
mmfsck command 111
6027-485 99
of disk media 96
6027-490 55
snapshot 82
6027-506 59
failure creating a file 107
6027-533 54
failure creating, opening, reading, writing to a file 107
6027-538 58
failure group 72
6027-549 63
failure groups
6027-580 63
loss of 72

failure groups (continued) filesets (continued)
use of 72 problems 75
failure of mmchpolicy 107 snapshots 79
failure, key rewrap 107 unlinking 79
failure, mount 107 usage errors 79
failures FSDesc structure 72
mmbackup 81 full file system or fileset 112
file creation failure 107
file creation, opening, reading, writing (failure) 107
file migration
problems 78
G
gathering data to solve GPFS problems 6
File Placement Optimizer (FPO), questions related to 113
generating GPFS trace reports
file placement policy 78
mmtracectl command 11
file system descriptor 72, 73
GPFS
failure groups 72
data integrity 88
inaccessible 72
nodes will not start 49
file system manager
replication 96
cannot appoint 71
unable to start 41
contact problems
GPFS cluster
communication paths unavailable 62
problems adding nodes 54
multiple failures 74
recovery from loss of GPFS cluster configuration data
file system mount failure 107
files 45
file system or fileset getting full 112
GPFS cluster data
file systems
backup 45
cannot be unmounted 24
locked 44
creation failure 57
GPFS cluster data files storage 45
determining if mounted 73
GPFS command
discrepancy between configuration data and on-disk
failed 56
data 75
return code 56
forced unmount 5, 71, 74
unsuccessful 56
free space shortage 84
GPFS configuration data 75
listing mounted 24
GPFS daemon 43, 47, 61, 70
loss of access 58
crash 50
not consistent 84
fails to start 47
remote 65
went down 4, 50
unable to determine if mounted 73
will not start 47
will not mount 23, 24, 61
GPFS daemon went down 50
will not unmount 70
GPFS is not using the underlying multipath device 105
FILE_SIZE attribute 29, 30
GPFS kernel extension 46
files
GPFS local node failure 68
/etc/filesystems 62
GPFS log 1, 2, 47, 48, 50, 61, 64, 65, 66, 67, 68, 69, 71, 115
/etc/fstab 62
GPFS messages 121
/etc/group 4
GPFS modules
/etc/hosts 42
cannot be loaded 46
/etc/passwd 4
GPFS problems 41, 61, 91
/etc/resolv.conf 60
GPFS startup time 2
/usr/lpp/mmfs/bin/runmmfs 12
GPFS trace facility 11
/usr/lpp/mmfs/samples/gatherlogs.sample.sh 2
GPFS Windows SMB2 protocol (CIFS serving) 60
/var/adm/ras/mmfs.log.latest 1
gpfs.snap command 6, 115
/var/adm/ras/mmfs.log.previous 1, 57
data always gathered for a master snapshot 10
/var/mmfs/etc/mmlock 44
data always gathered on AIX 9
/var/mmfs/gen/mmsdrfs 45
data always gathered on all platforms 8
.rhosts 43
data always gathered on Linux 9
detecting damage 33
data always gathered on Windows 10
mmfs.log 1, 2, 47, 48, 50, 61, 64, 65, 66, 67, 68, 69, 71, 115
using 7
mmsdrbackup 45
grep command 3
mmsdrfs 45
Group Services
FILESET_NAME attribute 29, 30
verifying quorum 48
filesets
GROUP_ID attribute 29, 30
child 79
deleting 79
dropped 112
emptying 79 H
errors 79 hard loop ID 42
lost+found 80 hints and tips for GPFS problems 109
moving contents 79 Home and .ssh directory ownership and permissions 59
performance 79

I migration (continued)
new commands will not run 57
I/O error while in disconnected mode 112 mmadddisk 80
I/O error, AFM 112 mmadddisk command 75, 97, 100, 102
I/O hang 52 mmaddnode command 54, 55, 110
I/O operations slow 4 mmafmctl Device getstate command 17
ill-placed files 77, 81 mmapplypolicy 77, 78, 80
ILM mmapplypolicy -L 0 26
problems 75 mmapplypolicy -L 1 26
inode data mmapplypolicy -L 2 27
stale 87 mmapplypolicy -L 3 28
inode limit 5 mmapplypolicy -L 4 29
installation problems 41 mmapplypolicy -L 5 29
mmapplypolicy -L 6 30
mmapplypolicy command 25
J mmauth 35
junctions mmauth command 67
deleting 79 mmbackup command 81
mmchcluster command 43
mmchconfig command 19, 47, 55, 69, 110
K mmchdisk 80
mmchdisk command 62, 72, 75, 91, 94, 95, 97, 99
KB_ALLOCATED attribute 29, 30 mmcheckquota command 5, 31, 59, 72
kdb 39 mmchfs command 5, 54, 57, 62, 63, 64, 72
KDB kernel debugger 39 mmchnsd command 91
kernel module mmchpolicy failure 107
mmfslinux 46 mmcommon 64, 65
kernel panic 52 mmcommon recoverfs command 75
kernel threads mmcommon showLocks command 44
at time of system hang or panic 39 mmcrcluster command 19, 43, 47, 54, 110
key rewrap failure 107 mmcrfs command 57, 58, 91, 102
mmcrnsd command 91, 94
mmcrsnapshot command 83, 84
L mmdefedquota command fails 110
license inquiries 259 mmdeldisk 80
Linux kernel mmdeldisk command 75, 97, 100
configuration considerations 42 mmdelfileset 79
crash dump facility 39 mmdelfs command 98, 99
logical volume 100 mmdelnode command 55, 57
location 109 mmdelnsd command 94, 98
Logical Volume Manager (LVM) 96 mmdelsnapshot command 83
long waiters mmdf 75
increasing the number of inodes 53 mmdf command 53, 100
lslpp command 115 mmdiag command 17
lslv command 109 mmdsh 43
lsof command 24, 70, 71 mmedquota command fails 110
lspv command 101 mmexpelnode command 20
lsvg command 100 mmfileid command 33, 88, 97
lxtrace command 11 MMFS_ABNORMAL_SHUTDOWN
lxtrace commands 11 error logs 3
MMFS_DISKFAIL
error logs 3
M MMFS_ENVIRON
error logs 3
manually enabling or disabling Persistent Reserve 104 MMFS_FSSTRUCT
master GPFS log file 2 error logs 3
maxblocksize parameter 63 MMFS_GENERIC
MAXNUMMP 82 error logs 4
memory shortage 3, 43 MMFS_LONGDISKIO
message 6027-648 110 error logs 4
message severity tags 119 MMFS_QUOTA
messages 121 error logs 4, 31
6027-1941 42 MMFS_SYSTEM_UNMOUNT
metadata error logs 5
replicated 97 MMFS_SYSTEM_WARNING
MIGRATE rule 25, 28 error logs 5
migration mmfs.log 1, 2, 47, 48, 50, 61, 64, 65, 66, 67, 68, 69, 71, 115
file system will not mount 62

mmfsadm command 10, 15, 49, 54, 97 mount command 24, 61, 62, 63, 84, 98, 102
mmfsck 80 mount failure 107
mmfsck command 23, 61, 62, 84, 88, 97, 100 Multi-Media LAN Server 1
failure 111
mmfsd 47, 61, 70
will not start 47
mmfslinux
N
network failure 51
kernel module 46
network problems 3
mmgetstate command 17, 48, 56
NFS
mmlock directory 44
problems 87
mmlsattr 78, 79
NFS client
mmlscluster command 18, 55, 67, 109
with stale inode data 87
mmlsconfig command 11, 19, 64
NFS V4
mmlsdisk command 58, 61, 62, 72, 75, 91, 94, 96, 99, 116
problems 87
mmlsfileset 79
NO SUCH DIRECTORY error code 50
mmlsfs command 63, 97, 98, 115
NO SUCH FILE error code 50
mmlsmgr command 11, 62
NO_SPACE
mmlsmount command 24, 47, 58, 61, 70, 71, 91
error 75
mmlsnsd 31
node
mmlsnsd command 92, 93, 100
crash 117
mmlspolicy 78
hang 117
mmlsquota command 58, 59
rejoin 70
mmlssnapshot command 82, 83, 84
node crash 42
mmmount command 23, 61, 72, 102
node failure 52
mmpmon
node reinstall 42
abend 86
nodes
altering input file 85
cannot be added to GPFS cluster 54
concurrent usage 85
non-quorum node 109
counters wrap 86
notices 259
dump 86
NSD 100
hang 86
creating 94
incorrect input 85
deleting 94
incorrect output 86
displaying information of 92
restrictions 85
extended information 93
setup problems 85
failure 91
trace 86
NSD disks
unsupported features 86
creating 91
mmpmon command 39, 85
using 91
mmquotaoff command 59
NSD server 68, 69, 70
mmquotaon command 59
nsdServerWaitTimeForMount
mmrefresh command 19, 62, 64
changing 70
mmremotecluster 35
nsdServerWaitTimeWindowOnMount
mmremotecluster command 67, 68
changing 70
mmremotefs command 64, 67
mmrepquota command 59
mmrestorefs command 83, 84, 85
mmrestripefile 78, 81 O
mmrestripefs 81 opening a file, failure 107
mmrestripefs command 97, 100 OpenSSH connection delays
mmrpldisk 80 Windows 60
mmrpldisk command 75, 102 orphaned file 80
mmsdrbackup 45
mmsdrfs 45
mmsdrrestore command 20
mmshutdown command 18, 20, 48, 50, 64, 65
P
partitioning information, viewing 32
mmsnapdir command 82, 84, 85
patent information 259
mmstartup command 47, 64, 65
performance 43
mmtracectl command
permission denied
generating GPFS trace reports 11
remote mounts fail 69
mmumount command 70, 72
permission denied failure 107
mmunlinkfileset 79
permission denied failure (key rewrap) 107
mmwindisk 32
Persistent Reserve
mode of AFM fileset, changing 112
checking 103
MODIFICATION_TIME attribute 29, 30
clearing a leftover reservation 103
module is incompatible 47
errors 102
mount
manually enabling or disabling 104
problems 69
understanding 102

ping command 44
PMR 117
R
policies RAID controller 96
DEFAULT clause 77 rcp command 43
deleting referenced objects 78 read-only mode mount 24
errors 78 reading a file, failure 107
file placement 77 recovery log 52
incorrect file placement 78 recreation of GPFS storage file
LIMIT clause 77 mmchcluster -p LATEST 45
long runtime 78 remote command problems 43
MIGRATE rule 77 remote file copy command
problems 75 default 43
rule evaluation 77 remote file system I/O fails with "Function not implemented"
usage errors 77 error 66
verifying 25 remote mounts fail with permission denied 69
policies (encryption), adding 107 remote node
policy file expelled 55
detecting errors 26 remote shell
size limit 77 default 43
totals 26 removing the setuid bit 50
policy rules replicated data 97
runtime problems 78 replicated metadata 97
POOL_NAME attribute 29, 30 replication 80
possible GPFS problems 41, 61, 91 of data 96
predicted pool utilization reporting a problem to IBM 11, 115
incorrect 77 resetting of setuid/setgits at AFM home 112
primary NSD server 69 restricted mode mount 23
problem resync in active file management 111
locating a snapshot 82 rpm command 115
not directly related to snapshot 82 rsh
snapshot 82 problems using 43
snapshot directory name 84 rsh command 43, 57
snapshot restore 84
snapshot status 83
snapshot usage 83 S
problem determination Samba
cluster state information 17 client failure 88
documentation 115 Secure Hash Algorithm digest 35
remote file system I/O fails with the "Function not service
implemented" error message when UID mapping is reporting a problem to IBM 115
enabled 66 serving (CIFS), Windows SMB2 protocol 60
reporting a problem to IBM 115 setuid bit, removing 50
tools 1, 23 setuid/setgid bits at AFM home, resetting of 112
tracing 11 severity tags
Problem Management Record 117 messages 119
problems SHA digest 35, 67
configuration 41 shared segments 48
installation 41 problems 49
mmbackup 81 SMB2 protocol (CIFS serving), Windows 60
problems running as administrator, Windows 60 snapshot
protocol (CIFS serving), Windows SMB2 60 directory name conflict 84
invalid state 83
restoring 84
Q status error 83
quorum 48, 109 usage error 83
disk 53 valid 82
quorum node 109 snapshot problems 82
quota storage pools
cannot write to quota file 72 deleting 78, 80
denied 58 errors 81
error number 46 failure groups 80
quota files 31 problems 75
quota problems 4 slow access time 81
usage errors 80
strict replication 98
subnets attribute 55

syslog facility trace (continued)
Linux 3 tasking system 14
syslogd 65 token manager 14
system load 110 ts commands 12
system snapshots 6, 7 vdisk 15
system storage pool 77, 80 vdisk debugger 14
vdisk hospital 15
vnode layer 15
T trace classes 12
trace facility 11, 12
the IBM Support Center 117
mmfsadm command 10
threads
trace level 15
tuning 43
trace reports, generating 11
waiting 54
trademarks 260
Tivoli Storage Manager server 81
traversing a directory that has not been cached 112
trace
troubleshooting errors 59
active file management 12
troubleshooting Windows errors 59
allocation manager 12
TSM client 81
basic classes 12
TSM server 81
behaviorals 14
MAXNUMMP 82
byte range locks 12
tuning 43
call to routines in SharkMsg.h 13
checksum services 12
cleanup routines 12
cluster security 14 U
concise vnop description 15 umount command 71, 72, 100
daemon routine entry/exit 13 underlying multipath device 105
daemon specific code 14 understanding, Persistent Reserve 102
data shipping 13 useNSDserver attribute 99
defragmentation 12 USER_ID attribute 29, 30
dentry operations 13 using the gpfs.snap command 7
disk lease 13 gathering data 6
disk space allocation 12
DMAPI 13
error logging 13
events exporter 13
V
value too large failure 107
file operations 13
varyon problems 101
file system 13
varyonvg command 102
generic kernel vfs information 13
viewing disks and partitioning information 32
inode allocation 13
volume group 101
interprocess locking 13
kernel operations 13
kernel routine entry/exit 13
low-level vfs locking 13 W
mailbox message handling 13 Windows 59
malloc/free in shared segment 13 file system mounted on the wrong drive letter 111
miscellaneous tracing and debugging 14 Home and .ssh directory ownership and permissions 59
mmpmon 13 mounted file systems, Windows 111
mnode operations 13 OpenSSH connection delays 60
mutexes and condition variables 14 problem seeing newly mounted file systems 111
network shared disk 14 problem seeing newly mounted Windows file systems 111
online multinode fsck 13 problems running as administrator 60
operations in Thread class 14 Windows 111
page allocator 14 Windows SMB2 protocol (CIFS serving) 60
parallel inode tracing 14 writing to a file, failure 107
performance monitors 14
physical disk I/O 13
physical I/O 13
pinning to real memory 14
quota management 14
rdma 14
recovery log 13
SANergy 14
scsi services 14
shared segments 14
SMB locks 14
SP message handling 14
super operations 14


Product Number: 5725-Q01


5641-GPF
5641-GP6
5641-GP7
5641-GP8

Printed in USA

GA76-0443-00
