General Parallel File System Problem Determination Guide
GA76-0415-08
Note Before using this information and the product it supports, read the information in Notices on page 249.
This edition applies to version 3 release 5 modification 0.11 of the following products, and to all subsequent releases and modifications until otherwise indicated in new editions:
v IBM General Parallel File System for AIX and Linux on POWER (program number 5765-G66)
v IBM General Parallel File System on x86 Architecture ordered through AAS/eConfig (program number 5765-XA3)
v IBM General Parallel File System on x86 Architecture ordered through HVECC/xCC (program number 5641-A07)
v IBM General Parallel File System on x86 Architecture ordered through Passport Advantage (program number 5724-N94)

Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.

IBM welcomes your comments; see the topic How to send your comments on page xii. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

Copyright IBM Corporation 1998, 2013.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

About this information
  Prerequisite and related information
  Conventions used in this information
  How to send your comments

Summary of changes

Chapter 1. Logs, dumps, and traces
  The GPFS log
  The operating system error log facility (MMFS_ABNORMAL_SHUTDOWN, MMFS_DISKFAIL, MMFS_ENVIRON, MMFS_FSSTRUCT, MMFS_GENERIC, MMFS_LONGDISKIO, MMFS_QUOTA, MMFS_SYSTEM_UNMOUNT, MMFS_SYSTEM_WARNING)
  The gpfs.snap command
  The mmfsadm command
  The GPFS trace facility

Chapter 3. GPFS file system and disk information
  The mmapplypolicy command (-L 0 through -L 6)
  The mmcheckquota command
  The mmlsnsd command
  The mmwindisk command
  The mmfileid command
  The SHA digest

Chapter 4. Other problem determination tools

Chapter 5. GPFS installation, configuration, and operation problems
  Installation and configuration problems
  GPFS modules cannot be loaded on Linux
  GPFS daemon will not come up
  GPFS daemon went down
  GPFS failures due to a network failure
  Quorum loss
  Delays and deadlocks
  Node cannot be added to the GPFS cluster
  Remote node expelled after remote file system successfully mounted
  Disaster recovery problems
  GPFS commands are unsuccessful
  Application program errors
  Troubleshooting Windows problems
  OpenSSH connection delays
  Snapshot directory name conflicts
  Failures using the mmpmon command
  NFS problems
  Problems working with Samba
  Data integrity
  Messages requeuing in AFM

Questions related to active file management
Questions related to File Placement Optimizer (FPO)

Tables
  1. GPFS library information units
  2. Conventions
  3. New, changed, and deleted messages for GPFS 3.5.0.11
  4. Message severity tags
To find out which version of GPFS is running on a particular Linux node, enter:
rpm -qa | grep gpfs
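The command lists the installed GPFS packages and their version level. The package names and versions below are illustrative only; your output will reflect the packages and level actually installed on the node:

gpfs.base-3.5.0-11
gpfs.gpl-3.5.0-11
gpfs.msg.en_US-3.5.0-11
gpfs.docs-3.5.0-11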
To find out which version of GPFS is running on a particular Windows node, open the Programs and Features control panel. The IBM General Parallel File System installed program name includes the version number.
Table 1. GPFS library information units (continued)

Information unit: GPFS: Concepts, Planning, and Installation
Type of information: This information unit provides information about the following topics:
v Introducing GPFS
v Planning concepts for GPFS
v SNMP support
v Installing GPFS
v Migration, coexistence and compatibility
v Applying maintenance
v Configuration and tuning
v Steps to uninstall GPFS
Intended users: System administrators, analysts, installers, planners, and programmers of GPFS clusters who are very experienced with the operating systems on which each GPFS cluster is based

Information unit: GPFS: Data Management API Guide
Type of information: This information unit describes the Data Management Application Programming Interface (DMAPI) for GPFS. This implementation is based on The Open Group's System Management: Data Storage Management (XDSM) API Common Applications Environment (CAE) Specification C429, The Open Group, ISBN 1-85912-190-X. The implementation is compliant with the standard; some optional features are not implemented. The XDSM DMAPI model is intended mainly for a single-node environment. Some of the key concepts, such as sessions, event delivery, and recovery, required enhancements for a multiple-node environment such as GPFS. Use this information if you intend to write application programs to do the following:
v monitor events associated with a GPFS file system or with an individual file
v manage and maintain GPFS file system data
Intended users: Application programmers who are experienced with GPFS systems and familiar with the terminology and concepts in the XDSM standard

Information unit: GPFS: Problem Determination Guide
Type of information: This information unit contains explanations of GPFS error messages and explains how to handle problems you may encounter with GPFS.
Intended users: System administrators of GPFS systems who are experienced with the subsystems used to manage disks and who are familiar with the concepts presented in GPFS: Concepts, Planning, and Installation
For the latest support information, see the GPFS FAQ (http://publib.boulder.ibm.com/infocenter/ clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html).
Summary of changes
This topic summarizes changes to the GPFS licensed program and the GPFS library. Within each information unit in the library, a vertical line to the left of text and illustrations indicates technical changes or additions made to the previous edition of the information.
Backup migration-state changes
    When an HSM managed file system has files that change migration state, the mmbackup command will nominate these files to have their metadata updated in the TSM database.

File placement optimization
    Block group factor, write affinity depth, and write affinity failure group can be set using the policy extended attributes setBGF, setWAD, and setWADFG.

Online replica fix
    The mmrestripefs command was updated with a -c option, which provides a method for scanning replicas of metadata and data for conflicts with the file system mounted. When conflicts are found, the -c option attempts to fix the replicas.

Problem determination improvements
    The GPFS: Problem Determination Guide was updated to improve the tasks related to problem determination. Specifically, the Contacting IBM topic was updated with improved first-failure data capture instructions.

RDMA
    Support is provided for multiple fabrics and Connection Manager.

Starting mmfsck on a busy system
    The mmdiag command has a new option called --commands. Use this option to help determine which commands are currently running on the local node, prior to running the mmfsck command.

Documented commands, structures, and subroutines:

New commands: There are no new commands.
New structures: There are no new structures.

New subroutines: There are no new subroutines.

Changed commands: The following commands were changed:
v gpfs.snap
v mmaddcallback
v mmaddcomp
v mmadddisk
v mmafmctl
v mmafmhomeconfig
v mmafmlocal
v mmapplypolicy
v mmbackup
v mmchattr
v mmchcomp
v mmchcomploc
v mmchconfig
v mmchenclosure
v mmchfileset
v mmchfs
v mmcrcluster
v mmcrfs
v mmcrnsd
v mmcrrecoverygroup
v mmcrvdisk
v mmdefedquota
v mmdelcomp
v mmdiag
v mmdiscovercomp
v mmedquota
v mmfsck
v mmimgbackup

Changed structures: There are no changed structures.

Changed subroutines: There are no changed subroutines.
Messages

The following table lists new, changed, and deleted messages:
Table 3. New, changed, and deleted messages for GPFS 3.5.0.11 New messages 6027-435, 6027-2178, 6027-2188, 6027-2224, 6027-2225, 6027-2226, 6027-2227, 6027-2228, 6027-2776, 6027-2777, 6027-2778, 6027-2779, 6027-2780, 6027-2781, 6027-2782, 6027-2783, 6027-2784, 6027-2785, 6027-2786, 6027-2787, 6027-2788, 6027-2789, 6027-2790, 6027-2791, 6027-2792, 6027-3059, 6027-3060, 6027-3061, 6027-3062, 6027-3063, 6027-3064, 6027-3065, 6027-3232, 6027-3233, 6027-3234, 6027-3235, 6027-3236, 6027-3237, 6027-3238, 6027-3239, 6027-3240, 6027-3400, 6027-3401, 6027-3402 Changed messages 6027-489, 6027-648, 6027-880, 6027-883, 6027-1299, 6027-1662, 6027-1850, 6027-1851, 6027-1852, 6027-1853, 6027-1854, 6027-1855, 6027-1856, 6027-1857, 6027-1858, 6027-1859, 6027-1860, 6027-1861, 6027-1862, 6027-1863, 6027-1864, 6027-1866, 6027-1867, 6027-1868, 6027-1869, 6027-1870, 6027-1871, 6027-1872, 6027-1873, 6027-1874, 6027-1875, 6027-1876, 6027-1877, 6027-1878, 6027-1879, 6027-1880, 6027-1881, 6027-1882, 6027-1883, 6027-1884, 6027-1885, 6027-1886, 6027-1887, 6027-1888, 6027-1889, 6027-1890, 6027-1891, 6027-1892, 6027-1893, 6027-1894, 6027-1895, 6027-1896, 6027-1897, 6027-1898, 6027-1899, 6027-2116, 6027-2117, 6027-2118, 6027-2119, 6027-2120, 6027-2121, 6027-2122, 6027-2123, 6027-2124, 6027-2125, 6027-2126, 6027-2127, 6027-2128, 6027-2129, 6027-2130, 6027-2131, 6027-2132, 6027-2133, 6027-2147, 6027-2148, 6027-2149, 6027-2159, 6027-2160, 6027-2161, 6027-2162, 6027-2163, 6027-2164, 6027-2165, 6027-2166, 6027-2167, 6027-2168, 6027-2170, 6027-2171, 6027-2172, 6027-2173, 6027-2175, 6027-2176, 6027-2177, 6027-2181, 6027-2182, 6027-2183, 6027-2184, 6027-2185, 6027-2186, 6027-2187, 6027-2189, 6027-2190, 6027-2191, 6027-2193, 6027-2194, 6027-2195, 6027-2196, 6027-2197, 6027-2198, 6027-2200, 6027-2201, 6027-2202, 6027-2212, 6027-2213, 6027-2216, 6027-2218, 6027-2219, 6027-2220, 6027-2742, 6027-2748, 6027-2764, 6027-2770, 6027-2771, 6027-2773, 6027-2774, 6027-3000, 6027-3001, 6027-3002, 6027-3003, 6027-3004, 6027-3005, 6027-3006, 6027-3007, 6027-3008, 6027-3009, 6027-3010, 6027-3011, 6027-3012, 6027-3013, 6027-3014, 6027-3015, 6027-3016, 6027-3017, 6027-3018, 6027-3019, 6027-3020, 6027-3021, 6027-3022, 6027-3023, 6027-3024, 6027-3025, 6027-3026, 6027-3027, 6027-3028, 6027-3029, 6027-3030, 6027-3031, 6027-3032, 6027-3034, 6027-3035, 6027-3036, 6027-3037, 6027-3038, 6027-3039, 6027-3040, 6027-3041, 6027-3042, 6027-3043, 6027-3044, 6027-3045, 6027-3046, 6027-3047, 6027-3048, 6027-3049, 6027-3050, 6027-3051, 6027-3052, 6027-3053, 6027-3054, 6027-3055, 6027-3056, 6027-3057, 6027-3058, 6027-3219, 6027-3220, 6027-3221, 6027-3224, 6027-3225, 6027-3226, 6027-3227, 6027-3228, 6027-3229, 6027-3230, 6027-3301, 6027-3303 Deleted messages 6027-2768, 6027-2769, 6027-3206, 6027-3222, 6027-3223, 6027-3231
Mon May 20 17:20:12 EDT 2013: mmcommon mmfsup invoked. Parameters: 192.168.14.132 192.168.14.132 all
Mon May 20 17:20:13.313 2013: Accepted and connected to 192.168.14.131 c14f1n01 <c0p0>
Mon May 20 17:20:17.757 2013: Accepted and connected to 192.168.116.158 hs22n89 <c0p2>
Mon May 20 17:20:17.912 2013: Node 192.168.14.132 (c14f1n02) appointed as manager for tmsfs.
Mon May 20 17:20:47.598 2013: Node 192.168.14.132 (c14f1n02) completed take over for tmsfs.
Mon May 20 17:23:27.433 2013: Recovering nodes: 192.168.14.131
Mon May 20 17:24:30.027 2013: Recovered 1 nodes for file system tmsfs.
Mon May 20 18:57:54.122 2013: Accepted and connected to 192.168.14.131 c14f1n01 <c0n0>
Mon May 20 18:59:03.870 2013: Command: mount tmsfs
Mon May 20 18:59:04.358 2013: Command: err 0: mount tmsfs
Depending on the size and complexity of your system configuration, the amount of time to start GPFS varies. Taking your system configuration into consideration, if you cannot access a file system that has been mounted (either automatically or with a mount or mmmount command) after a reasonable amount of time, examine the log file for error messages.
On Windows, use the Event Viewer and look for events with a source label of GPFS in the Application event category. The error log contains information about several classes of events or errors. These classes are:
v MMFS_ABNORMAL_SHUTDOWN
v MMFS_DISKFAIL
v MMFS_ENVIRON
v MMFS_FSSTRUCT
v MMFS_GENERIC
v MMFS_LONGDISKIO on page 4
v MMFS_QUOTA on page 4
v MMFS_SYSTEM_UNMOUNT on page 5
v MMFS_SYSTEM_WARNING on page 5
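On AIX and Linux, these entries are written to the operating system error log rather than the Windows Application event log. As a quick check (a sketch only; the exact log location depends on your operating system and syslog configuration), you can scan for GPFS-related entries with commands such as:

errpt -a | more                   (AIX error log)
grep mmfs /var/log/messages       (Linux distributions that log to /var/log/messages)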
MMFS_ABNORMAL_SHUTDOWN
The MMFS_ABNORMAL_SHUTDOWN error log entry means that GPFS has determined that it must shut down all operations on this node because of a problem. Insufficient memory on the node to handle critical recovery situations can cause this error. In general, there will be other error log entries from GPFS or some other component associated with this error log entry.
MMFS_DISKFAIL
The MMFS_DISKFAIL error log entry indicates that GPFS has detected the failure of a disk and forced the disk to the stopped state. This is ordinarily not a GPFS error but a failure in the disk subsystem or the path to the disk subsystem.
MMFS_ENVIRON
MMFS_ENVIRON error log entry records are associated with other records of the MMFS_GENERIC or MMFS_SYSTEM_UNMOUNT types. They indicate that the root cause of the error is external to GPFS and usually in the network that supports GPFS. Check the network and its physical connections. The data portion of this record supplies the return code provided by the communications code.
MMFS_FSSTRUCT
The MMFS_FSSTRUCT error log entry indicates that GPFS has detected a problem with the on-disk structure of the file system. The severity of these errors depends on the exact nature of the inconsistent data structure. If it is limited to a single file, EIO errors will be reported to the application and operation will continue. If the inconsistency affects vital metadata structures, operation will cease on this file system. These errors are often associated with an MMFS_SYSTEM_UNMOUNT error log entry and will probably occur on all nodes. If the error occurs on all nodes, some critical piece of the file system is inconsistent. This can occur as a result of a GPFS error or an error in the disk system.

If the file system is severely damaged, the best course of action is to follow the procedures in Additional information to collect for file system corruption or MMFS_FSSTRUCT errors on page 114, and then contact the IBM Support Center.
MMFS_GENERIC
The MMFS_GENERIC error log entry means that GPFS self diagnostics have detected an internal error, or that additional information is being provided with an MMFS_SYSTEM_UNMOUNT report. If the record is associated with an MMFS_SYSTEM_UNMOUNT report, the event code fields in the records will be the same. The error code and return code fields might describe the error. See Chapter 11, Messages, on page 119 for a listing of codes generated by GPFS. If the error is generated by the self diagnostic routines, service personnel should interpret the return and error code fields since the use of these fields varies by the specific error. Errors caused by the self checking logic will result in the shutdown of GPFS on this node.
MMFS_GENERIC errors can result from an inability to reach a critical disk resource. These errors might look different depending on the specific disk resource that has become unavailable, like logs and allocation maps. This type of error will usually be associated with other error indications. Other errors generated by disk subsystems, high availability components, and communications components at the same time as, or immediately preceding, the GPFS error should be pursued first because they might be the cause of these errors. MMFS_GENERIC error indications without an associated error of those types represent a GPFS problem that requires the IBM Support Center. See Information to collect before contacting the IBM Support Center on page 113.
MMFS_LONGDISKIO
The MMFS_LONGDISKIO error log entry indicates that GPFS is experiencing very long response time for disk requests. This is a warning message and can indicate that your disk system is overloaded or that a failing disk is requiring many I/O retries. Follow your operating system's instructions for monitoring the performance of your I/O subsystem on this node and on any disk server nodes that might be involved. The data portion of this error record specifies the disk involved. There might be related error log entries from the disk subsystems that will pinpoint the actual cause of the problem. If the disk is attached to an AIX node, refer to the AIX Information Center (http://publib16.boulder.ibm.com/pseries/index.htm) and search for performance management. To enable or disable, use the mmchfs -w command. For more details, contact your IBM service representative.

The mmpmon command can be used to analyze I/O performance on a per-node basis. See Failures using the mmpmon command on page 85 and the Monitoring GPFS I/O performance with the mmpmon command topic in the GPFS: Advanced Administration Guide.
MMFS_QUOTA
The MMFS_QUOTA error log entry is used when GPFS detects a problem in the handling of quota information. This entry is created when the quota manager has a problem reading or writing the quota file. If the quota manager cannot read all entries in the quota file when mounting a file system with quotas enabled, the quota manager shuts down but file system manager initialization continues. Mounts will not succeed and will return an appropriate error message (see File system forced unmount on page 71).

Quota accounting depends on a consistent mapping between user names and their numeric identifiers. This means that a single user accessing a quota enabled file system from different nodes should map to the same numeric user identifier from each node. Within a local cluster this is usually achieved by ensuring that /etc/passwd and /etc/group are identical across the cluster. When accessing quota enabled file systems from other clusters, you need to either ensure individual accessing users have equivalent entries in /etc/passwd and /etc/group, or use the user identity mapping facility as outlined in UID Mapping for GPFS in a Multi-Cluster Environment (http://www.ibm.com/systems/resources/systems_clusters_software_whitepapers_uid_gpfs.pdf).

It might be necessary to run an offline quota check (mmcheckquota) to repair or recreate the quota file. If the quota file is corrupted, mmcheckquota will not restore it. The file must be restored from the backup copy. If there is no backup copy, an empty file can be set as the new quota file. This is equivalent to recreating the quota file. To set an empty file or use the backup file, issue the mmcheckquota command with the appropriate operand:
v -u UserQuotaFilename for the user quota file
v -g GroupQuotaFilename for the group quota file
v -j FilesetQuotaFilename for the fileset quota file

After replacing the appropriate quota file, reissue the mmcheckquota command to check the file system inode and space usage.
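For example, to replace a corrupted user quota file of a file system named fs1 with a backup copy and then re-check usage (the file system name and backup path here are hypothetical), you might issue:

mmcheckquota -u /var/backups/fs1.user.quota fs1
mmcheckquota fs1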
For information about running the mmcheckquota command, see The mmcheckquota command on page 31.
MMFS_SYSTEM_UNMOUNT
The MMFS_SYSTEM_UNMOUNT error log entry means that GPFS has discovered a condition that might result in data corruption if operation with this file system continues from this node. GPFS has marked the file system as disconnected and applications accessing files within the file system will receive ESTALE errors. This can be the result of: v The loss of a path to all disks containing a critical data structure. If you are using SAN attachment of your storage, consult the problem determination guides provided by your SAN switch vendor and your storage subsystem vendor. v An internal processing error within the file system. See File system forced unmount on page 71. Follow the problem determination and repair actions specified.
MMFS_SYSTEM_WARNING
The MMFS_SYSTEM_WARNING error log entry means that GPFS has detected a system level value approaching its maximum limit. This might occur as a result of the number of inodes (files) reaching its limit. If so, issue the mmchfs command to increase the number of inodes for the file system so that at least 5% are free.
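As an illustration only (the file system name and inode count are hypothetical, and the exact option name should be verified against the mmchfs documentation for your GPFS level), the number of inodes might be raised with a command such as:

mmchfs fs1 --inode-limit 3000000

You can then confirm the new inode limit and current inode usage, for example with mmdf fs1 -F.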
The information gathered with the gpfs.snap command can be used in conjunction with other information (for example, GPFS internal dumps, traces, and kernel thread dumps) to solve a GPFS problem. The syntax of the gpfs.snap command is:
gpfs.snap [-c "CommandString"] [-d OutputDirectory] [-m | -z] [-a | -N {Node[,Node...] | NodeFile | NodeClass}] [--check-space | --no-check-space | --check-space-only] [--deadlock [--quick]] [--exclude-aix-disk-attr] [--exclude-aix-lvm] [--exclude-net] [--exclude-merge-logs] [--gather-logs] [--mmdf] [--prefix]
These options are used with gpfs.snap:

-c "CommandString"
    Specifies the command string to run on the specified nodes. When this option is specified, the data collected is limited to the result of the specified command string; the standard data collected by gpfs.snap is not collected. CommandString can consist of multiple commands, which are separated by semicolons (;) and enclosed in double quotation marks (").

-d OutputDirectory
    Specifies the output directory. The default is /tmp/gpfs.snapOut.

-m  Specifying this option is equivalent to specifying --exclude-merge-logs with -N.

-z  Collects gpfs.snap data only from the node on which the command is invoked. No master data is collected.

-a  Directs gpfs.snap to collect data from all nodes in the cluster. This is the default.

-N {Node[,Node ...] | NodeFile | NodeClass}
    Specifies the nodes from which to collect gpfs.snap data. This option supports all defined node classes. For general information on how to specify node names, see the Specifying nodes as input to GPFS commands topic in the GPFS: Administration and Programming Reference.

--check-space
    Specifies that space checking is performed before collecting data.

--no-check-space
    Specifies that no space checking is performed. This is the default.

--check-space-only
    Specifies that only space checking is performed. No data is collected.

--deadlock
    Collects only the minimum amount of data necessary to debug a deadlock problem. Part of the data collected is the output of the mmfsadm dump all command. This option ignores all other options except for -a, -N, -d, and --prefix.

--quick
    Collects less data when specified along with the --deadlock option. The output of the mmfsadm dump most command is collected instead of the output of the mmfsadm dump all command.
--exclude-aix-disk-attr
    Specifies that data about AIX disk attributes will not be collected. Collecting data about AIX disk attributes on an AIX node that has a large number of disks could be very time-consuming, so using this option could help improve performance.

--exclude-aix-lvm
    Specifies that data about the AIX Logical Volume Manager (LVM) will not be collected.

--exclude-net
    Specifies that network-related information will not be collected.

--exclude-merge-logs
    Specifies that merge logs and waiters will not be collected.

--gather-logs
    Gathers, merges, and chronologically sorts all of the mmfs.log files. The results are stored in the directory specified with the -d option.

--mmdf
    Specifies that mmdf output will be collected.

--prefix
    Specifies that the prefix name gpfs.snap will be added to the tar file.

Use the -z option to generate a non-master snapshot. This is useful if there are many nodes on which to take a snapshot, and only one master snapshot is needed. For a GPFS problem within a large cluster (hundreds or thousands of nodes), one strategy might call for a single master snapshot (one invocation of gpfs.snap with no options), and multiple non-master snapshots (multiple invocations of gpfs.snap with the -z option).

Use the -N option to obtain gpfs.snap data from multiple nodes in the cluster. When the -N option is used, the gpfs.snap command takes non-master snapshots of all the nodes specified with this option and a master snapshot of the node on which it was invoked.
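For example (the node names are hypothetical), a typical strategy on a large cluster is one master snapshot plus targeted non-master snapshots, and a lightweight collection when a deadlock is suspected:

gpfs.snap
gpfs.snap -z -N c5n97,c5n98
gpfs.snap --deadlock -d /var/tmp/snapout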
v mmfsadm dump mmap
v mmfsadm dump mutex
v mmfsadm dump nsd
v mmfsadm dump sgmgr
v mmfsadm dump stripe
v mmfsadm dump tscomm
v mmfsadm dump version
v mmfsadm dump waiters
v netstat with the -i, -r, -rn, -s, and -v options
v ps -edf
v vmstat

2. The contents of these files:
v /etc/syslog.conf or /etc/syslog-ng.conf
v /tmp/mmfs/internal*
v /tmp/mmfs/trcrpt*
v /var/adm/ras/mmfs.log.*
v /var/mmfs/gen/*
v /var/mmfs/etc/*
v /var/mmfs/tmp/*
v /var/mmfs/ssl/* except for complete.map and id_rsa files
2. The contents of these files:
v /etc/filesystems
v /etc/fstab
v /etc/*release
v /proc/cpuinfo
v /proc/version
v /usr/lpp/mmfs/src/config/site.mcr
v /var/log/messages*
v mmgetstate -a
v mmlscluster
v mmlsconfig
v mmlsdisk
v mmlsfileset
v mmlsfs
v mmlspolicy
v mmlsmgr
v mmlsnode -a
v mmlsnsd
v mmlssnapshot
v mmremotecluster
v mmremotefs
v tsstatus

2. The contents of the /var/adm/ras/mmfs.log.* file (on all nodes in the cluster)
To determine where the file system manager is, issue the mmlsmgr command:
mmlsmgr
3. Re-create the problem. 4. When the event to be captured occurs, stop the trace as soon as possible by issuing this command:
mmtracectl --stop
5. The output of the GPFS trace facility is stored in /tmp/mmfs, unless the location was changed using the mmchconfig command in Step 1. Save this output. 6. If the problem results in a shutdown and restart of the GPFS daemon, set the traceRecycle variable as necessary to start tracing automatically on daemon startup and stop the trace automatically on daemon shutdown. If the problem requires more detailed tracing, the IBM Support Center personnel might ask you to modify the GPFS trace levels. Use the mmtracectl command to establish the required trace classes and levels of tracing. The syntax to modify trace classes and levels is as follows:
mmtracectl --set --trace={io | all | def | "Class Level [Class Level ...]"}
For example, to tailor the trace level for I/O, issue the following command:
mmtracectl --set --trace=io
Once the trace levels are established, start the tracing by issuing:
mmtracectl --start
After the trace data has been gathered, stop the tracing by issuing:
mmtracectl --stop
To clear the trace settings and make sure tracing is turned off, issue:
mmtracectl --off
Other possible values that can be specified for the trace Class include:

afm
    active file management
alloc
    disk space allocation
allocmgr
    allocation manager
basic
    'basic' classes
brl
    byte range locks
cksum
    checksum services
cleanup
    cleanup routines
cmd
    ts commands
defrag
    defragmentation
dentry
    dentry operations
dentryexit
    daemon routine entry/exit
disk
    physical disk I/O
disklease
    disk lease
dmapi
    Data Management API
ds
    data shipping
errlog
    error logging
eventsExporter
    events exporter
file
    file operations
fs
    file system
fsck
    online multinode fsck
ialloc
    inode allocation
io
    physical I/O
kentryexit
    kernel routine entry/exit
kernel
    kernel operations
klockl
    low-level vfs locking
ksvfs
    generic kernel vfs information
lock
    interprocess locking
log
    recovery log
malloc
    malloc and free in shared segment
mb
    mailbox message handling
mmpmon
    mmpmon command
mnode
    mnode operations
msg
    call to routines in SharkMsg.h
mutex
    mutexes and condition variables
nsd
    network shared disk
perfmon
    performance monitors
pgalloc
    page allocator tracing
pin
    pinning to real memory
pit
    parallel inode tracing
quota
    quota management
rdma
    rdma
sanergy
    SANergy
scsi
    scsi services
sec
    cluster security
shared
    shared segments
smb
    SMB locks
sp
    SP message handling
super
    super_operations
tasking
    tasking system but not Thread operations
thread
    operations in Thread class
tm
    token manager
ts
    daemon specific code
user1
    miscellaneous tracing and debugging
user2
    miscellaneous tracing and debugging
vbhvl
    behaviorals
vdb
    vdisk debugger
vdisk
    vdisk
vhosp
    vdisk hospital
vnode
    vnode layer of VFS kernel support
vnop
    one line per VNOP with all important information

The trace Level can be set to a value from 0 through 14, which represents an increasing level of detail. A value of 0 turns tracing off. To display the trace level in use, issue the mmfsadm showtrace command.

On AIX, the aix-trace-buffer-size option can be used to control the size of the trace buffer in memory.

On Linux nodes only, use the mmtracectl command to change the following:
v The trace buffer size in blocking mode. For example, to set the trace buffer size in blocking mode to 8K, issue:

mmtracectl --set --tracedev-buffer-size=8K
v The raw data compression level. For example, to set the trace raw data compression level to the best ratio, issue:
mmtracectl --set --tracedev-compression-level=9
v The trace buffer size in overwrite mode. For example, to set the trace buffer size in overwrite mode to 32K, issue:
mmtracectl --set --tracedev-overwrite-buffer-size=32K
v When to overwrite the old data. For example, to wait to overwrite the data until the trace data is written to the local disk and the buffer is available again, issue:
mmtracectl --set --tracedev-write-mode=blocking
Note: Before switching between --tracedev-write-mode=overwrite and --tracedev-write-mode=blocking, or vice versa, run the mmtracectl --stop command first. Next, run the mmtracectl --set --tracedev-write-mode command to switch to the desired mode. Finally, restart tracing with the mmtracectl --start command.

For more information about the mmtracectl command, see the GPFS: Administration and Programming Reference.
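As an illustration (the trace classes and levels shown are arbitrary choices for this sketch, not a recommendation; use the classes and levels requested by the IBM Support Center), a complete capture sequence might look like this:

mmtracectl --set --trace="vnode 3 vnop 3 io 5"
mmtracectl --start
    (re-create the problem)
mmtracectl --stop
mmtracectl --off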
The total number of nodes may sometimes be larger than the actual number of nodes in the cluster. This is the case when nodes from other clusters have established connections for the purposes of mounting a file system that belongs to your cluster.

-s  Display summary information: number of local and remote nodes that have joined in the cluster, number of quorum nodes, and so forth.

-v  Display intermediate error messages.

The remaining flags have the same meaning as in the mmshutdown command. They can be used to specify the nodes on which to get the state of the GPFS daemon.

The GPFS states recognized and displayed by this command are:

active
    GPFS is ready for operations.

arbitrating
    A node is trying to form quorum with the other available nodes.

down
    GPFS daemon is not running on the node or is recovering from an internal error.

unknown
    Unknown value. Node cannot be reached or some other error occurred.

For example, to display the quorum, the number of nodes up, and the total number of nodes, issue:
mmgetstate -L -a
where *, if present, indicates that tiebreaker disks are being used. The mmgetstate command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    k164n06.kgn.ibm.com
  Secondary server:  k164n05.kgn.ibm.com

 Node  Daemon node name      IP address     Admin node name       Designation
----------------------------------------------------------------------------------
   1   k164n04.kgn.ibm.com   198.117.68.68  k164n04.kgn.ibm.com   quorum
   2   k164n05.kgn.ibm.com   198.117.68.71  k164n05.kgn.ibm.com   quorum
   3   k164n06.kgn.ibm.com   198.117.68.70  k164n06.kgn.ibm.com   quorum-manager
The mmlscluster command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
The mmlsconfig command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
The mmrefresh command places the most recent GPFS cluster configuration data files on the specified nodes. The syntax of this command is:
mmrefresh [-f] [ -a | -N {Node[,Node...] | NodeFile | NodeClass}]
The -f flag can be used to force the GPFS cluster configuration data files to be rebuilt whether they appear to be at the most current level or not. If no other option is specified, the command affects only the node on which it is run. The remaining flags have the same meaning as in the mmshutdown command, and are used to specify the nodes on which the refresh is to be performed. For example, to place the GPFS cluster configuration data files at the latest level, on all nodes in the cluster, issue:
mmrefresh -a
Or,
mmexpelnode {-l | --list}
Or,
mmexpelnode {-r | --reset} -N {all | Node[,Node...]}
The flags used by this command are: -o | --once Specifies that the nodes should not be prevented from rejoining. After the recovery protocol completes, expelled nodes will be allowed to rejoin the cluster immediately, without the need to first invoke mmexpelnode --reset. -f | --is-fenced Specifies that the nodes are fenced out and precluded from accessing any GPFS disks without first
rejoining the cluster (for example, the nodes were forced to reboot by turning off power). Using this flag allows GPFS to start log recovery immediately, skipping the normal 35-second wait. -w | --wait Instructs the mmexpelnode command to wait until GPFS recovery for the failed node has completed before it runs. -l | --list Lists all currently expelled nodes. -r | --reset Allows the specified nodes to rejoin the cluster (that is, resets the status of the nodes). To unexpel all of the expelled nodes, issue: mmexpelnode -r -N all. -N {all | Node[,Node...]} Specifies a list of host names or IP addresses that represent the nodes to be expelled or unexpelled. Specify the daemon interface host names or IP addresses as shown by the mmlscluster command. The mmexpelnode command does not support administration node names or node classes. Note: -N all can only be used to unexpel nodes.
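For example (the node name is hypothetical), to expel a node, verify that it is listed as expelled, and later allow all expelled nodes to rejoin the cluster:

mmexpelnode -N c5n97g
mmexpelnode --list
mmexpelnode -r -N all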
This facility should be used only if a normal mount does not succeed, and should be considered a last resort to save some data after a fatal disk failure. Read-only mode mount is invoked by using the mmmount command with the -o ro flags. After a read-only mode mount is done, some data may be sufficiently accessible to allow copying to another file system. The success of this technique depends on the actual disk structures damaged.
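A minimal sketch, assuming a damaged file system named fs1 and a healthy file system mounted at /backupfs (both names are hypothetical):

mmmount fs1 -o ro
cp -rp /fs1/critical_data /backupfs/saved_data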
The mmlsmount command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
These terms are used:

candidate file
    A file that matches a policy rule.

chosen file
    A candidate file that has been scheduled for an action.

This policy file is used in the examples that follow:
/* Exclusion rule */
RULE 'exclude *.save files' EXCLUDE WHERE NAME LIKE '%.save'
/* Deletion rule */
RULE 'delete' DELETE FROM POOL 'sp1' WHERE NAME LIKE '%tmp%'
/* Migration rule */
RULE 'migration to system pool' MIGRATE FROM POOL 'sp1' TO POOL 'system' WHERE NAME LIKE '%file%'
/* Typo in rule : removed later */
RULE 'exclude 2' EXCULDE
/* List rule */
RULE EXTERNAL LIST 'tmpfiles' EXEC '/tmp/exec.list'
RULE 'all' LIST 'tmpfiles' where name like '%tmp%'
The mmapplypolicy command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
mmapplypolicy -L 0
Use this option to display only serious errors. In this example, there is an error in the policy file. This command:
mmapplypolicy fs1 -P policyfile -I test -L 0
mmapplypolicy -L 1
Use this option to display all of the information (if any) from the previous level, plus some information as the command runs, but not for each file. This option also displays total numbers for file migration and deletion. This command:
mmapplypolicy fs1 -P policyfile -I test -L 1
Chose to migrate 32KB: 2 of 2 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 0 candidates;
Chose to delete 16KB: 2 of 2 candidates;
Chose to list 16KB: 2 of 2 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
sp1        5072      19531264   0.025969%
system     102432    19531264   0.524451%
mmapplypolicy -L 2
Use this option to display all of the information from the previous levels, plus each chosen file and the scheduled migration or deletion action. This command:
mmapplypolicy fs1 -P policyfile -I test -L 2
LIST tmpfiles /fs1/file.tmp1 SHOW()
LIST tmpfiles /fs1/file.tmp0 SHOW()
DELETE /fs1/file.tmp1 SHOW()
DELETE /fs1/file.tmp0 SHOW()
MIGRATE /fs1/file1 TO POOL system SHOW()
MIGRATE /fs1/file0 TO POOL system SHOW()
mmapplypolicy -L 3
Use this option to display all of the information from the previous levels, plus each candidate file and the applicable rule. This command:
mmapplypolicy fs1 -P policyfile -I test -L 3
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
sp1        5072      19531264   0.025969%
system     102432    19531264   0.524451%
mmapplypolicy -L 4
Use this option to display all of the information from the previous levels, plus the name of each explicitly excluded file, and the applicable rule. This command:
mmapplypolicy fs1 -P policyfile -I test -L 4
indicate that there are two excluded files, /fs1/file1.save and /fs1/file2.save.
mmapplypolicy -L 5
Use this option to display all of the information from the previous levels, plus the attributes of candidate and excluded files. These attributes include:
v MODIFICATION_TIME
v USER_ID
v GROUP_ID
v FILE_SIZE
v POOL_NAME
v ACCESS_TIME
v KB_ALLOCATED
v FILESET_NAME
This command:
mmapplypolicy fs1 -P policyfile -I test -L 5
produces the following additional information: [I] Directories scan: 10 files, 1 directories, 0 other objects, 0 skipped files and/or errors. /fs1/file1.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE exclude \ *.save files EXCLUDE /fs1/file2.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-03@21:19:57 16 root] RULE exclude \ *.save files EXCLUDE /fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE delete DELETE \ FROM POOL sp1 WEIGHT(INF) /fs1/file.tmp1 [2009-03-04@02:09:31 0 0 0 sp1 2009-03-04@02:09:31 0 root] RULE all LIST \ tmpfiles WEIGHT(INF) /fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE delete \ DELETE FROM POOL sp1 WEIGHT(INF) /fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE all \ LIST tmpfiles WEIGHT(INF) /fs1/file1 [2009-03-03@21:32:41 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE migration \ to system pool MIGRATE FROM POOL sp1 TO POOL system WEIGHT(INF) /fs1/file0 [2009-03-03@21:21:11 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE migration \ to system pool MIGRATE FROM POOL sp1 TO POOL system WEIGHT(INF) where the lines:
/fs1/file1.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE exclude \
    *.save files EXCLUDE
/fs1/file2.save [2009-03-03@21:19:57 0 0 16384 sp1 2009-03-03@21:19:57 16 root] RULE exclude \
    *.save files EXCLUDE

show the two excluded files, their attributes, and the exclusion rule that applies to them.
mmapplypolicy -L 6
Use this option to display all of the information from the previous levels, plus files that are not candidate files, and their attributes. These attributes include:
v MODIFICATION_TIME
v USER_ID
v GROUP_ID
v FILE_SIZE
v POOL_NAME
v ACCESS_TIME
/fs1/file.tmp0 [2009-03-04@02:09:38 0 0 16384 sp1 2009-03-04@02:09:38 16 root] RULE all LIST \ tmpfiles WEIGHT(INF) /fs1/file1 [2009-03-03@21:32:41 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE migration \ to system pool MIGRATE FROM POOL sp1 TO POOL system WEIGHT(INF) /fs1/file0 [2009-03-03@21:21:11 0 0 16384 sp1 2009-03-03@21:32:41 16 root] RULE migration \ to system pool MIGRATE FROM POOL sp1 TO POOL system WEIGHT(INF)
contains information about the data1 file, which is not a candidate file.
To find out the local device names for these disks, use the mmlsnsd command with the -m option. For example, issuing mmlsnsd -m produces output similar to this:
 Disk name   NSD volume ID      Device        Node name                Remarks
------------------------------------------------------------------------------------
 hd2n97      0972846145C8E924   /dev/hdisk2   c5n97g.ppd.pok.ibm.com   server node
 hd2n97      0972846145C8E924   /dev/hdisk2   c5n98g.ppd.pok.ibm.com   server node
 hd3n97      0972846145C8E927   /dev/hdisk3   c5n97g.ppd.pok.ibm.com   server node
 hd3n97      0972846145C8E927   /dev/hdisk3   c5n98g.ppd.pok.ibm.com   server node
 hd4n97      0972846145C8E92A   /dev/hdisk4   c5n97g.ppd.pok.ibm.com   server node
 hd4n97      0972846145C8E92A   /dev/hdisk4   c5n98g.ppd.pok.ibm.com   server node
 hd5n98      0972846245EB501C   /dev/hdisk5   c5n97g.ppd.pok.ibm.com   server node
 hd5n98      0972846245EB501C   /dev/hdisk5   c5n98g.ppd.pok.ibm.com   server node
 hd6n98      0972846245DB3AD8   /dev/hdisk6   c5n97g.ppd.pok.ibm.com   server node
 hd6n98      0972846245DB3AD8   /dev/hdisk6   c5n98g.ppd.pok.ibm.com   server node
 hd7n97      0972846145C8E934   /dev/hd7n97   c5n97g.ppd.pok.ibm.com   server node
To obtain extended information for NSDs, use the mmlsnsd command with the -X option. For example, issuing mmlsnsd -X produces output similar to this:
 Disk name   NSD volume ID      Device        Devtype   Node name                Remarks
---------------------------------------------------------------------------------------------------
 hd3n97      0972846145C8E927   /dev/hdisk3   hdisk     c5n97g.ppd.pok.ibm.com   server node,pr=no
 hd3n97      0972846145C8E927   /dev/hdisk3   hdisk     c5n98g.ppd.pok.ibm.com   server node,pr=no
 hd5n98      0972846245EB501C   /dev/hdisk5   hdisk     c5n97g.ppd.pok.ibm.com   server node,pr=no
 hd5n98      0972846245EB501C   /dev/hdisk5   hdisk     c5n98g.ppd.pok.ibm.com   server node,pr=no
 sdfnsd      0972845E45F02E81   /dev/sdf      generic   c5n94g.ppd.pok.ibm.com   server node
 sdfnsd      0972845E45F02E81   /dev/sdm      generic   c5n96g.ppd.pok.ibm.com   server node
The mmlsnsd command is fully described in the Commands topic in the GPFS: Administration and Programming Reference.
Where:

Disk
    is the Windows disk number as shown in the Disk Management console and the DISKPART command-line utility.

Avail
    shows the value YES when the disk is available and in a state suitable for creating an NSD.
GPFS Partition ID
    is the unique ID for the GPFS partition on the disk.

The mmwindisk command does not provide the NSD volume ID. You can use mmlsnsd -m to find the relationship between NSDs and devices, which are disk numbers on Windows.
Before running mmfileid, you must run a disk analysis utility and obtain the disk sector numbers that are suspect or known to be damaged. These sectors are input to the mmfileid command. The syntax is:
mmfileid Device {-d DiskDesc | -F DescFile} [-o OutputFile] [-f NumThreads] [-t Directory] [-N {Node[,Node...] | NodeFile | NodeClass}]
The input parameters are:

Device
    The device name for the file system on which this utility is to be run. This must be the first parameter and is required.

-d DiskDesc
    A descriptor identifying the disk to be scanned. DiskDesc has this format:
NodeName:DiskName[:PhysAddr1[-PhysAddr2]]
Or,
:{NsdName|DiskNum|BROKEN}[:PhysAddr1[-PhysAddr2]]
NodeName
    Specifies a node in the GPFS cluster that has access to the disk to scan. NodeName must be specified if the disk is identified using its physical volume name. NodeName should be omitted if the disk is identified with its NSD name, its GPFS disk ID number, or if the keyword BROKEN is used.

DiskName
    Specifies the physical volume name of the disk to scan as known on node NodeName.

NsdName
    Specifies the GPFS NSD name of the disk to scan.

DiskNum
    Specifies the GPFS disk ID number of the disk to scan as displayed by the mmlsdisk -L command.

BROKEN
    Specifies that all disks in the file system should be scanned to find files that have broken addresses resulting in lost data.

PhysAddr1[-PhysAddr2]
    Specifies the range of physical disk addresses to scan. The default value for PhysAddr1 is zero. The default value for PhysAddr2 is the value for PhysAddr1. If both PhysAddr1 and PhysAddr2 are zero, the entire disk is searched.
-F DescFile
    Specifies a file containing a list of disk descriptors, one per line.

-f NumThreads
    Specifies the number of worker threads that are to be created by the mmfileid command. The default value is 16. The minimum value is 1. The maximum value can be as large as is allowed by the operating system pthread_create function for a single process. A suggested value is twice the number of disks in the file system.

-N {Node[,Node...] | NodeFile | NodeClass}
    Specifies the list of nodes that will participate in determining the disk addresses. This command supports all defined node classes. The default is all (all nodes in the GPFS cluster will participate). For general information on how to specify node names, see the Specifying nodes as input to GPFS commands topic in the GPFS: Administration and Programming Reference.

-o OutputFile
    The path name of a file to which the result from the mmfileid command is to be written. If not specified, the result is sent to standard output.

-t Directory
    Specifies the directory to use for temporary storage during mmfileid command processing. The default directory is /tmp.

The output can be redirected to a file (using the -o flag) and sorted on the inode number, using the sort command.

The mmfileid command output contains one line for each inode found to be located on the corrupt disk sector. Each line of the command output has this format:
InodeNumber LogicalDiskAddress SnapshotId Filename
InodeNumber
    Indicates the inode number of the file identified by mmfileid.

LogicalDiskAddress
    Indicates the disk block (disk sector) number of the file identified by mmfileid.

SnapshotId
    Indicates the snapshot identifier for the file. A SnapshotId of 0 means that the file is not a snapshot file.

Filename
    Indicates the name of the file identified by mmfileid. File names are relative to the root of the file system in which they reside.

Assume that a disk analysis tool reported that hdisk6, hdisk7, hdisk8, and hdisk9 contained bad sectors. Then the command:
mmfileid /dev/gpfsB -F addr.in
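In this example, addr.in contains one disk descriptor per line in the format described above. A hypothetical file might look like the following (the node name, disk names, and sector ranges are illustrative only):

k148n07:hdisk6:2206310-2206810
k148n07:hdisk7:2211845
k148n07:hdisk8
k148n07:hdisk9:0-3000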
The lines starting with the word Address represent GPFS system metadata files or reserved disk areas. If your output contains any of these lines, do not attempt to replace or repair the indicated files. If you suspect that any of the special files are damaged, call the IBM Support Center for assistance. The line:
14336 1072256 0 /gpfsB/tesDir/testFile.out
indicates that inode number 14336, disk address 1072256 contains file /gpfsB/tesDir/testFile.out, which does not belong to a snapshot (0 to the left of the name). This file is located on a potentially bad disk sector area. The line
14344 2922528 1 /gpfsB/x.img
indicates that inode number 14344, disk address 2922528 contains file /gpfsB/x.img, which belongs to snapshot number 1 (1 to the left of the name). This file is located on a potentially bad disk sector area.
Cipher list:        EXP1024-RC4-SHA
SHA digest:         eb71a3aaa89c3979841b363fd6d0a36a2a460a8b
File system access: fs1 (rw, root allowed)

Cluster name:       dkq.cluster (this cluster)
Cipher list:        AUTHONLY
SHA digest:         090cd57a2e3b18ac163e5e9bd5f26ffabaa6aa25
File system access: (all rw)
What to do after a node of a GPFS cluster crashes and has been reinstalled
After reinstalling GPFS code, check whether the /var/mmfs/gen/mmsdrfs file was lost. If it was lost, and an up-to-date version of the file is present on the primary GPFS cluster configuration server, restore the file by issuing this command from the node on which it is missing.
mmsdrrestore
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but it is present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file, and remoteFile is the full path name of that file on that node. One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use the mmsdrbackup user exit. If you have made modifications to any of the user exits in /var/mmfs/etc, you will have to restore them before starting GPFS. For additional information, see Recovery from loss of GPFS cluster configuration data file on page 42.
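For example, if node k164n05 (a hypothetical node name) still has a current copy of the file in its default location, you might issue the following from the node that lost it:

mmsdrrestore -p k164n05 -F /var/mmfs/gen/mmsdrfs
mmchcluster -p LATEST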
3. What are the performance tuning suggestions? For an up-to-date list of tuning suggestions, see the GPFS FAQ (http://publib.boulder.ibm.com/ infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html).
Authorization problems
The rsh and rcp commands are used by GPFS administration commands to perform operations on other nodes. The rsh daemon (rshd) on the remote node must recognize the command being run and must obtain authorization to invoke it. Note: The rsh and rcp commands that are shipped with the SUA subsystem are not supported on Windows. Use the ssh and scp commands that are shipped with the OpenSSH package supported by GPFS. Refer to the GPFS FAQ (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/ com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html) for the latest OpenSSH information. For the rsh and rcp commands issued by GPFS administration commands to succeed, each node in the cluster must have an .rhosts file in the home directory for the root user, with file permission set to 600. This .rhosts file must list each of the nodes and the root user. If such an .rhosts file does not exist on each node in the cluster, the rsh and rcp commands issued by GPFS commands will fail with permission errors, causing the GPFS commands to fail in turn. If you elected to use installation specific remote invocation shell and remote file copy commands, you must ensure: 1. Proper authorization is granted to all nodes in the GPFS cluster. 2. The nodes in the GPFS cluster can communicate without the use of a password, and without any extraneous messages.
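For the .rhosts-based setup described above, on a three-node cluster (the host names are hypothetical) the .rhosts file in root's home directory on every node might contain:

k145n01.kgn.ibm.com root
k145n02.kgn.ibm.com root
k145n03.kgn.ibm.com root

with its permissions set by:

chmod 600 ~/.rhosts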
Connectivity problems
Another reason why rsh may fail is that connectivity to a needed node has been lost. Error messages from mmdsh may indicate that connectivity to such a node has been lost. Here is an example:
mmdelnode -N k145n04
Verifying GPFS is stopped on all affected nodes ...
mmdsh: 6027-1617 There are no available nodes on which to run the command.
mmdelnode: 6027-1271 Unexpected error from verifyDaemonInactive: mmcommon onall.
Return code: 1
If error messages indicate that connectivity to a node has been lost, use the ping command to verify whether the node can still be reached:
ping k145n04
PING k145n04: (119.114.68.69): 56 data bytes
<Ctrl-C>
----k145n04 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss
If connectivity has been lost, restore it, then reissue the GPFS command.
6027-1615
    nodeName remote shell process had return code value.

6027-1617
    There are no available nodes on which to run the command.
The mmcommon showLocks command displays information about the lock server, lock name, lock holder, PID, and extended information. If a GPFS administration command is not responding, stopping the command will free the lock. If another process now has this PID, the original GPFS command encountered an error and died without freeing the lock, and an unrelated process has since been assigned the same PID. If this is the case, do not kill that process.

2. If any locks are held and you want to release them manually, from any node in the GPFS cluster issue the command:
mmcommon freeLocks
If the /var/mmfs/gen/mmsdrfs file is removed by accident from any of the nodes, and an up-to-date version of the file is present on the primary GPFS cluster configuration server, restore the file by issuing this command from the node on which it is missing:
mmsdrrestore
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but is present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file and remoteFile is the full path name of that file on that node. One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use the mmsdrbackup user exit.
6027-341
    Node nodeName is incompatible because its maximum compatible version (number) is less than the version of this node (number).

6027-342
    Node nodeName is incompatible because its minimum compatible version is greater than the version of this node (number).

6027-343
    Node nodeName is incompatible because its version (number) is less than the minimum compatible version of this node (number).

6027-344
    Node nodeName is incompatible because its version is greater than the maximum compatible version of this node (number).
You must build the GPFS portability layer binaries based on the kernel configuration of your system. For more information, see the GPFS open source portability layer topic in the GPFS: Concepts, Planning, and Installation Guide. During mmstartup processing, GPFS loads the mmfslinux kernel module.

Some of the more common problems that you may encounter are:

1. If the portability layer is not built, you may see messages similar to:

   Mon Mar 26 20:56:30 EDT 2012: runmmfs starting
   Removing old /var/adm/ras/mmfs.log.* files:
   Unloading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
   runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
   runmmfs: Unable to verify kernel/module configuration.
   Loading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
   runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
   runmmfs: Unable to verify kernel/module configuration.
   Mon Mar 26 20:56:30 EDT 2012 runmmfs: error in loading or unloading the mmfs kernel extension
   Mon Mar 26 20:56:30 EDT 2012 runmmfs: stopping GPFS

2. The GPFS kernel modules, mmfslinux and tracedev, are built with a kernel version that differs from that of the currently running Linux kernel. This situation can occur if the modules are built on another node with a different kernel version and copied to this node, or if the node is rebooted using a kernel with a different version.

3. If the mmfslinux module is incompatible with your system, you may experience a kernel panic on GPFS startup. Ensure that the site.mcr has been configured properly from the site.mcr.proto, and GPFS has been built and installed properly.

For more information about the mmfslinux module, see the Building the GPFS portability layer topic in the GPFS: Concepts, Planning, and Installation Guide.
6027-300 mmfsd ready
v The GPFS log file contains this error message: 'Error: daemon and kernel extension do not match.' This error indicates that the kernel extension currently loaded in memory and the daemon currently starting have mismatched versions. This situation may arise if a GPFS code update has been applied and the node has not been rebooted before starting GPFS. While GPFS scripts attempt to unload the old kernel extension during update and install operations, such attempts may fail if the operating system is still referencing GPFS code and data structures. To recover from this error, ensure that all GPFS file systems are successfully unmounted, and reboot the node. The mmlsmount command can be used to ensure that all file systems are unmounted.
The output of this command should list mmfsd as operational. For example:
12230 pts/8 00:00:00 mmfsd
If the output does not show this, the GPFS daemon needs to be started with the mmstartup command.
3. If you did not specify the autoload option on the mmcrcluster or the mmchconfig command, you need to start the daemon manually by issuing the mmstartup command. If you specified the autoload option, someone may have issued the mmshutdown command; in this case, issue the mmstartup command. When using autoload for the first time, mmstartup must be run manually; autoload takes effect on the next reboot.
4. Verify that the network upon which your GPFS cluster depends is up by issuing:
ping nodename
to each node in the cluster. A properly working network and node will correctly reply to the ping with no lost packets. Query the network interface that GPFS is using with:
netstat -i
A properly working network will report no transmission errors.
5. Verify that the GPFS cluster configuration data is available by looking in the GPFS log. If you see the message:
6027-1592 Unable to retrieve GPFS cluster files from node nodeName.
Determine the problem with accessing node nodeName and correct it.
6. Verify that the GPFS environment is properly initialized by issuing these commands and ensuring that the output is as expected.
v Issue the mmlscluster command to list the cluster configuration. This will also update the GPFS configuration data on the node. Correct any reported errors before continuing.
v List all file systems that were created in this cluster. For an AIX node, issue:
lsfs -v mmfs
If any of these commands produce unexpected results, this may be an indication of corrupted GPFS cluster configuration data file information. Follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
7. GPFS requires a quorum of nodes to be active before any file system operations can be honored. This requirement guarantees that a valid single token management domain exists for each GPFS file system. Prior to the existence of a quorum, most requests are rejected with a message indicating that quorum does not exist. To identify which nodes in the cluster have daemons up or down, issue:
mmgetstate -L -a
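To see at a glance which nodes are not active, the output can be filtered; this one-liner is a hedged example that assumes the GPFS state column of healthy nodes literally reads active:
mmgetstate -a | grep -v active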
If insufficient nodes are active to achieve quorum, go to any nodes not listed as active and perform problem determination steps on these nodes. A quorum node indicates that it is part of a quorum by writing an mmfsd ready message to the GPFS log. Remember that your system may have quorum nodes and non-quorum nodes, and only quorum nodes are counted to achieve the quorum.
8. This step applies only to AIX nodes. Verify that the GPFS kernel extension is not having problems with its shared segment by invoking:
cat /var/adm/ras/mmfs.log.latest
Messages such as:
6027-319 Could not create shared segment.
must be corrected by the following procedure:
a. Issue the mmshutdown command.
b. Remove the shared segment in an AIX environment:
1) Issue the mmshutdown command.
2) Issue the mmfsadm cleanup command.
c. If you are still unable to resolve the problem, reboot the node.
9. If the previous GPFS daemon was brought down and you are trying to start a new daemon but are unable to, this is an indication that the original daemon did not completely go away. Go to that node and check the state of GPFS. Stopping and restarting GPFS or rebooting this node will often return GPFS to normal operation. If this fails, follow the procedures in Additional information to collect for GPFS daemon crashes on page 114, and then contact the IBM Support Center.
For network problems, follow the problem determination and repair actions specified with the following message:
6027-306 Could not initialize internode communication.
Error numbers specific to GPFS application calls when the daemon is unable to come up
When the daemon is unable to come up, GPFS may report these error numbers in the operating system error log, or return them to an application:
ECONFIG = 215, Configuration invalid or inconsistent between different nodes.
This error is returned when the levels of software on different nodes cannot coexist. For information about which levels may coexist, see the GPFS FAQ (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html).
6027-341 Node nodeName is incompatible because its maximum compatible version (number) is less than the version of this node (number).
6027-342 Node nodeName is incompatible because its minimum compatible version is greater than the version of this node (number).
6027-343 Node nodeName is incompatible because its version (number) is less than the minimum compatible version of this node (number).
6027-344 Node nodeName is incompatible because its version is greater than the maximum compatible version of this node (number).
Indications leading you to the conclusion that the daemon went down:
v Applications running at the time of the failure will see either ENODEV or ESTALE errors. The ENODEV errors are generated by the operating system until the daemon has restarted. The ESTALE error is generated by GPFS as soon as it restarts. When quorum is lost, applications with open files receive an ESTALE error return code until the files are closed and reopened. New file open operations will fail until quorum is restored and the file system is remounted. Applications accessing these files prior to GPFS return may receive an ENODEV return code from the operating system.
v The GPFS log contains the message:
6027-650 The mmfs daemon is shutting down abnormally.
Most GPFS daemon down error messages are in the mmfs.log.previous log for the instance that failed. If the daemon restarted, it generates a new mmfs.log.latest. Begin problem determination for these errors by examining the operating system error log. If an existing quorum is lost, GPFS stops all processing within the cluster to protect the integrity of your data. GPFS will attempt to rebuild a quorum of nodes and will remount the file system if automatic mounts are specified.
v Open requests are rejected with no such file or no such directory errors. When quorum has been lost, requests are rejected until the node has rejoined a valid quorum and mounted its file systems. If messages indicate lack of quorum, follow the procedures in GPFS daemon will not come up on page 44.
v Removing the setuid bit from the permissions of these commands may produce errors for non-root users:
mmdf
mmgetacl
mmlsdisk
mmlsfs
mmlsmgr
mmlspolicy
mmlsquota
mmlssnapshot
mmputacl
mmsnapdir
mmsnaplatest
The GPFS system-level versions of these commands (prefixed by ts) may need to be checked for how permissions are set if non-root users see the following message:
6027-1209 GPFS is down on this node.
If the setuid bit is removed from the permissions on the system-level commands, the command cannot be executed and the node is perceived as being down. The system-level versions of the commands are:
tsdf64
tslsdisk64
tslsfs64
tslsmgr64
tslspolicy64
tslsquota64
tslssnapshot64
tssnapdir64
tssnaplatest64
These are found in the following directories:
v On AIX machines, in the directory /usr/lpp/mmfs/bin/aix64
v On Linux and Windows machines, in the directory /usr/lpp/mmfs/bin
Note: The mode bits for all listed commands are 4555 or -r-sr-xr-x. To restore the default (shipped) permission, enter:
chmod 4555 tscommand
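For example, to check and restore the mode bits on one of the system-level commands (a hedged illustration; the AIX 64-bit path is shown, so adjust the path for Linux or Windows):
ls -l /usr/lpp/mmfs/bin/aix64/tslsdisk64     # expect -r-sr-xr-x
chmod 4555 /usr/lpp/mmfs/bin/aix64/tslsdisk64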
Attention: Only administration-level versions of GPFS commands (prefixed by mm) should be executed. Executing system-level commands (prefixed by ts) directly will produce unexpected results.
v For all other errors, follow the procedures in Additional information to collect for GPFS daemon crashes on page 114, and then contact the IBM Support Center.
Nodes mounting file systems owned and served by other clusters may receive error messages similar to this:
Mon Jun 25 16:11:16 2007: Close connection to 89.116.94.81 k155n01 Mon Jun 25 16:11:21 2007: Lost membership in cluster remote.cluster. Unmounting file systems.
If a sufficient number of nodes fail, GPFS will lose the quorum of nodes, which manifests itself as messages in the GPFS log similar to this:
Mon Jun 25 11:08:10 2007: Close connection to 179.32.65.4 gpfs2 Mon Jun 25 11:08:10 2007: Lost membership in cluster gpfsxx.kgn.ibm.com. Unmounting file system.
When either of these cases occurs, perform problem determination on your network connectivity. Failing components could be network hardware such as switches or host bus adapters.
Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O requests' message
This problem can be detected by an error log with a label of KERNEL_PANIC, and the PANIC MESSAGES or a PANIC STRING. For example:
GPFS Deadman Switch timer has expired, and there's still outstanding I/O requests
GPFS is designed to tolerate node failures through per-node metadata logging (journaling). The log file is called the recovery log. In the event of a node failure, GPFS performs recovery by replaying the recovery log for the failed node, thus restoring the file system to a consistent state and allowing other nodes to continue working. Prior to replaying the recovery log, it is critical to ensure that the failed node has indeed failed, as opposed to being active but unable to communicate with the rest of the cluster.
In the latter case, if the failed node has direct access to any disks that are part of the GPFS file system (as opposed to accessing the disks through an NSD server), it is necessary to ensure that no I/O requests submitted from this node complete once the recovery log replay has started. To accomplish this, GPFS uses the disk lease mechanism. The disk leasing mechanism guarantees that a node does not submit any more I/O requests once its disk lease has expired, and the surviving nodes use the disk lease time out as a guideline for starting recovery.
This situation is complicated by the possibility of 'hung I/O'. If an I/O request is submitted prior to the disk lease expiration, but for some reason (for example, a device driver malfunction) the I/O takes a long time to complete, it is possible that it may complete after the start of the recovery log replay during recovery. This situation would present a risk of file system corruption. To guard against such a contingency, when I/O requests are being issued directly to the underlying disk device, GPFS initiates a kernel timer, referred to as the dead man switch. The dead man switch timer goes off in the event of disk lease expiration and checks whether there are any outstanding I/O requests. If there is any I/O pending, a kernel panic is initiated to prevent possible file system corruption.
Such a kernel panic is not an indication of a software defect in GPFS or the operating system kernel, but rather it is a sign of:
1. Network problems (the node is unable to renew its disk lease).
2. Problems accessing the disk device (I/O requests take an abnormally long time to complete). See MMFS_LONGDISKIO on page 4.
Quorum loss
Each GPFS cluster has a set of quorum nodes explicitly set by the cluster administrator. These quorum nodes and the selected quorum algorithm determine the availability of file systems owned by the cluster. See the GPFS: Concepts, Planning, and Installation Guide and search for quorum. When quorum loss or loss of connectivity occurs, any nodes still running GPFS suspend the use of file systems owned by the cluster experiencing the problem. This may result in GPFS access within the suspended file system receiving ESTALE errnos. Nodes continuing to function after suspending file system access will start contacting other nodes in the cluster in an attempt to rejoin or reform the quorum. If they succeed in forming a quorum, access to the file system is restarted. Normally, quorum loss or loss of connectivity occurs if a node goes down or becomes isolated from its peers by a network failure. The expected response is to address the failing condition.
If file system processes appear to stop making progress, there may be a system resource problem or an internal deadlock within GPFS.
Note: A deadlock can occur if user exit scripts that will be called by the mmaddcallback facility are placed in a GPFS file system. The scripts should be placed in a local file system so they are accessible even when the networks fail.
1. First, check how full your file system is by issuing the mmdf command. If the mmdf command does not respond, contact the IBM Support Center. Otherwise, the system displays information similar to:
disk                 disk size  failure holds    holds              free KB             free KB
name                     in KB    group metadata data        in full blocks        in fragments
---------------  ------------- -------- -------- -----  -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
dm2                  140095488        1 yes      yes        136434304 ( 97%)       278232 ( 0%)
dm4                  140095488        1 yes      yes        136318016 ( 97%)       287442 ( 0%)
dm5                  140095488     4000 yes      yes        133382400 ( 95%)       386018 ( 0%)
dm0nsd               140095488     4005 yes      yes        134701696 ( 96%)       456188 ( 0%)
dm1nsd               140095488     4006 yes      yes        133650560 ( 95%)       492698 ( 0%)
dm15                 140095488     4006 yes      yes        140093376 (100%)           62 ( 0%)
                 -------------                          -------------------- -------------------
(pool total)         840572928                              814580352 ( 97%)      1900640 ( 0%)
                 =============                          ==================== ===================
(total)              840572928                              814580352 ( 97%)      1900640 ( 0%)
Inode Information
-----------------
Number of used inodes:
Number of free inodes:
Number of allocated inodes:
Maximum number of inodes:
GPFS operations that involve allocation of data and metadata blocks (that is, file creation and writes) will slow down significantly if the number of free blocks drops below 5% of the total number. Free up some space by deleting some files or snapshots (keeping in mind that deleting a file will not necessarily result in any disk space being freed up when snapshots are present). Another possible cause of a performance loss is a lack of free inodes. Issue the mmchfs command to increase the number of inodes for the file system so that at least 5% are free.
GPFS error messages for file system delays and deadlocks:
6027-533 Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
Operating system error log entry:
Jul 19 12:51:49 node1 mmfs: Error=MMFS_SYSTEM_WARNING, ID=0x4DC797C6, Tag=3690419: File system warning. Volume fs1. Reason: File system fs1 is approaching the limit for the maximum number of inodes/files.
2. If the file system is not nearing its maximum number of files, look at threads that are waiting. On all nodes of a live system, to determine which nodes have threads waiting longer than 10 seconds, issue this command on each node:
/usr/lpp/mmfs/bin/mmdiag --waiters 10
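If you prefer to collect this information from several nodes in one pass, a simple loop over a remote shell can help; this is a hedged sketch in which the node names are hypothetical and ssh can be replaced by whatever remote shell the cluster uses:
for node in node01 node02 node03; do
    echo "=== $node ==="                              # label the output per node
    ssh $node /usr/lpp/mmfs/bin/mmdiag --waiters 10   # list threads waiting longer than 10 seconds
done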
Determine which nodes have the long waiting threads. For all nodes that have threads waiting longer than 10 seconds, issue:
mmfsadm dump all
b. Run an mmfsadm dump all command only on nodes where you are sure the threads are really hung. An mmfsadm dump all command can follow pointers that are changing and cause the node to crash. To determine whether a node is hung, check the percentage of user time that is running (1 indicates quiet time) by issuing:
vmstat 5 2
Your next course of action depends on the information found in the dump of the waiters. Examine each long waiting thread and determine what it is waiting on:
kxAwaitIOCompletion
Indicates an abnormal wait time for I/O completion. Look into underlying disk subsystem problems. To determine which disk is being waited on, look in the dump disk section for:
In progress:
In progress: read operation ...
In progress: write operation ...
If the lines below In progress say none, there is no outstanding I/O for that device. The problem could be either a communication or a disk problem. This section will list the active server node for the hung I/O operation on that NSD. To determine whether the hung operation is a communication or disk device problem, go to the dump nsd section in the output for the active server node. Then go to the dump disk section. If there are long in progress times for the NSD in the dump disk section, perform disk problem determination again. If there are no in progress I/O operations, but there are long communication waiters, see the Remote Procedure Call section below.
Remote Procedure Call (RPC) waits
Determine why the RPC has not completed on some other node. In the dump, look in the tscomm section to find the same thread with pending messages to find the other node's IP address and the message name that was sent. If there are multiple nodes with pending replies, you have to check all of them to see the holdup. success means the reply has already returned. If the message is tmMsgRevoke, there might not be a thread on the other node to handle it. These can be queued on the lockable object for later handling. At the end of the tscomm section, there may also be messages that have not been given a handler thread. This happens quite often when there are hundreds of sgmMsgMount requests coming to a file system manager all at once. This is normal. In some rare cases where there are TCP problems, we have seen messages only partially returned. In this case there will be a few extra lines after a node in the connection table that show how many bytes have been received so far and how many are expected. In the dump from the other node, see why the message handler for that message name is waiting. In the tscomm section, if you see destination ip@ pending, contact the IBM Support Center.
Wait on a mutex
Follow the mutex pointer value in the dump mutexes section to see which thread is holding the mutex, and find that thread in the dump threads section to see why that thread is waiting. The desc pointer on a thread is what is recorded as a mutex holder. If you see a by kernel thread, issue an mmfsadm dump kthread.
Wait on condition variable
These are harder than mutexes to diagnose, because diagnosis requires knowledge of the semantics of state changes in internal structures. In these cases, the thread is waiting for another thread to make a state change and wake up the waiting thread (like a Linux event wait).
Wait on a lock
There are lockable objects such as OpenFiles, AllocSegments, and IndBlocks that have keys. The keys look like 4 hex words separated by colons. Finding the object that this thread is working on is usually a matter of:
a. Finding any object with the writer field holding the thread's desc value (if the thread has an exclusive type lock), or finding any object that already has a lock on it, for example [ lx ]. These locks are in the dump locks section, but also show up in the dump files and dump buffers sections. If the node is also the file system manager, there is a dump tokenmgr section that has token information for all the nodes. Also for the file system manager, look for a KX in the dump threads section. If there is a KX call, this indicates a call to the kernel was being made. Issue an mmfsadm dump kthread.
b. It is usually difficult to find out why the threads that have the lock are not releasing it. Usually there is another thread, or kthread, to follow in this case.
3. If the deadlock situation cannot be corrected, gather data using one of the following methods and contact the IBM Support Center:
v If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000), issue the following command:
gpfs.snap --deadlock
This command will gather only the minimum amount of data necessary to debug the deadlock problem. Part of the data collected is the output of the mmfsadm dump all command.
v If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the following command:
gpfs.snap --deadlock --quick
This command will gather less data, which includes mmfsadm dump most, mmfsadm dump kthreads, and 10 seconds of trace in addition to the usual gpfs.snap output.
One cause of this condition is when the subnets attribute of the mmchconfig command has been used to specify subnets to GPFS, and there is an incorrect netmask specification on one or more nodes of the clusters involved in the remote mount. Check to be sure that all netmasks are correct for the network interfaces used for GPFS communication.
For more information on quorum override, see the GPFS: Concepts, Planning, and Installation Guide and search on quorum.
For PPRC and FlashCopy-based configurations, additional problem determination information may be collected from the ESS log file. This information and the appropriate ESS documentation should be consulted when dealing with various types of disk subsystem-related failures. For instance, if users are unable to perform a PPRC failover (or failback) task successfully or unable to generate a FlashCopy of a disk volume, they should consult the subsystem log and the appropriate ESS documentation. For additional information, refer to:
v IBM Enterprise Storage Server (http://www.redbooks.ibm.com/redbooks/pdfs/sg245465.pdf)
v IBM TotalStorage Enterprise Storage Server Web Interface User's Guide (http://publibfp.boulder.ibm.com/epubs/pdf/f2bui05.pdf)
all disks in the remaining failure group must remain active and usable in order for the file system to continue its operation. A subsequent loss of at least one of the disks in the remaining failure group would render the file system unusable and trigger a forced unmount. In such situations, users may still be able to perform a restricted mount (as described in Restricted mode mount on page 23) and attempt to recover parts of their data from the damaged file system.
3. When running mmfsctl syncFSconfig, you may get an error similar to this one:
mmfsctl: None of the nodes in the peer cluster can be reached
If this happens, check the network connectivity between the peer GPFS clusters and verify their remote shell setup. This command requires full TCP/IP connectivity between the two sites, and all nodes must be able to communicate using ssh or rsh without the use of a password.
If the daemon is not active, check /var/adm/ras/mmfs.log.latest and /var/adm/ras/mmfs.log.previous on the local node and on the file system manager node. These files enumerate the failing sequence of the GPFS daemon. If there is a communication failure with the file system manager node, you will receive an error and the errno global variable may be set to EIO (I/O error).
2. Verify that the GPFS cluster configuration data files are not locked and are accessible. To determine if the GPFS cluster configuration data files are locked, see GPFS cluster configuration data files are locked on page 42.
3. The rsh command is not functioning correctly. See Authorization problems on page 41. If rsh is not functioning properly on a node in the GPFS cluster, a GPFS administration command that needs to run on that node will fail with a 'permission is denied' error. The system displays information similar to:
mmlscluster
rshd: 0826-813 Permission is denied.
mmdsh: 6027-1615 k145n02 remote shell process had return code 1.
mmlscluster: 6027-1591 Attention: Unable to retrieve GPFS cluster files from
node k145n02
rshd: 0826-813 Permission is denied.
mmdsh: 6027-1615 k145n01 remote shell process had return code 1.
mmlscluster: 6027-1592 Unable to retrieve GPFS cluster files from node k145n01
These messages indicate that rsh is not working properly on nodes k145n01 and k145n02. If you encounter this type of failure, determine why rsh is not working on the identified node, and then fix the problem.
4. Most problems encountered during file system creation fall into three classes:
v You did not create the network shared disks which are required to build the file system.
v The creation operation cannot access the disk. Follow the procedures for checking access to the disk. This can result from a number of factors including those described in NSD and underlying disk subsystem failures on page 91.
v Unsuccessful attempt to communicate with the file system manager. The file system creation runs on the file system manager node. If that node goes down, the mmcrfs command may not succeed.
5. If the mmdelnode command was unsuccessful and you plan to permanently de-install GPFS from a node, you should first remove the node from the cluster. If this is not done and you run the mmdelnode command after the mmfs code is removed, the command will fail and display a message similar to this example:
Verifying GPFS is stopped on all affected nodes ... k145n05: ksh: /usr/lpp/mmfs/bin/mmremote: not found.
If this happens, power off the node and run the mmdelnode command again.
6. If you have successfully installed and are operating with the latest level of GPFS, but cannot run the new functions available, it is probable that you have not issued the mmchfs -V full or mmchfs -V compat command to change the version of the file system. This command must be issued for each of your file systems. In addition to mmchfs -V, you may need to run the mmmigratefs command. See the File system format changes between versions of GPFS topic in the GPFS: Administration and Programming Reference.
Note: Before issuing the -V option (with full or compat), see the Migration, coexistence and compatibility topic in the GPFS: Concepts, Planning, and Installation Guide. You must ensure that all nodes in the cluster have been migrated to the latest level of GPFS code and that you have successfully run the mmchconfig release=LATEST command. Make sure you have operated with the new level of code for some time and are certain you want to migrate to the latest level of GPFS. Issue the mmchfs -V full command only after you have definitely decided to accept the latest level, as this will cause disk changes that are incompatible with previous levels of GPFS.
For more information about the mmchfs command, see the GPFS: Administration and Programming Reference.
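A hedged sketch of the sequence described above, where fs1 is a hypothetical file system name and mmmigratefs is only needed when the release requires it:
mmchconfig release=LATEST    # after all nodes are running the new code level
mmchfs fs1 -V full           # commit the new on-disk format for each file system
mmmigratefs fs1              # only if required for this release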
If the daemon was not running when you issued the command, you will see message 6027-665. Follow the procedures in GPFS daemon will not come up on page 44.
6027-665 Failed to connect to file system daemon: errorString.
When GPFS commands are unsuccessful, the system may display information similar to these error messages:
6027-1627 The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS on the above nodes until the problem is resolved.
The mmlsquota output when checking the user and group quota is similar. If usage is equal to or approaching the hard limit, or if the grace period has expired, make sure that no quotas are lost by checking in doubt values. If quotas are exceeded in the in doubt category, run the mmcheckquota command. For more information, see The mmcheckquota command on page 31. Note: There is no way to force GPFS nodes to relinquish all their local shares in order to check for lost quotas. This can only be determined by running the mmcheckquota command immediately after mounting the file system, and before any allocations are made. In this case, the value in doubt is the amount lost. To display the latest quota usage information, use the -e option on either the mmlsquota or the mmrepquota commands. Remember that the mmquotaon and mmquotaoff commands do not enable
and disable quota management. These commands merely control enforcement of quota limits. Usage continues to be counted and recorded in the quota files regardless of enforcement. Reduce quota usage by deleting or compressing files or moving them out of the file system. Consider increasing the quota limit.
0    Oct 26 13:37 /dev/fs/D/Users/demyn/.ssh
0    Oct 26 13:37 .
0    Dec  5 11:53 ..
603  Oct 26 13:37 authorized_keys2
672  Oct 26 13:33 id_dsa
603  Oct 26 13:33 id_dsa.pub
2230 Nov 11 07:57 known_hosts
v caching of metadata such as directory content and file properties
v better scalability by increasing the support for number of users, shares, and open files per server
The SMB2 protocol is negotiated between a client and the server during the establishment of the SMB connection, and it becomes active only if both the client and the server are SMB2 capable. If either side is not SMB2 capable, the default SMB (version 1) protocol is used.
The SMB2 protocol does active metadata caching on the client redirector side, and it relies on Directory Change Notification on the server to invalidate and refresh the client cache. However, GPFS on Windows currently does not support Directory Change Notification. As a result, if SMB2 is used for serving out a GPFS file system, the SMB2 redirector cache on the client will not see any cache-invalidate operations if the actual metadata is changed, either directly on the server or via another CIFS client. In such a case, the SMB2 client will continue to see its cached version of the directory contents until the redirector cache expires. Therefore, the use of the SMB2 protocol for CIFS sharing of GPFS file systems can result in the CIFS clients seeing an inconsistent view of the actual GPFS namespace.
A workaround is to disable the SMB2 protocol on the CIFS server (that is, the GPFS compute node). This will ensure that SMB2 never gets negotiated for file transfer even if any CIFS client is SMB2 capable.
To disable SMB2 on the GPFS compute node, follow the instructions under the MORE INFORMATION section at the following URL:
http://support.microsoft.com/kb/974103
v Problems working with Samba on page 87
v Data integrity on page 88
v Messages requeuing in AFM on page 88
For example, a mount command may not be issued while the mmfsck command is running. The mount command may not be issued until the conflicting command completes. Note that interrupting the mmfsck command is not a solution because the file system will not be mountable until the command completes. Try again after the conflicting command has completed.
3. Verify that sufficient disks are available to access the file system by issuing the mmlsdisk command. GPFS requires a minimum number of disks to find a current copy of the core metadata. If sufficient disks cannot be accessed, the mount will fail. The corrective action is to fix the path to the disk. See NSD and underlying disk subsystem failures on page 91.
Missing disks can also cause GPFS to be unable to find critical metadata structures. The output of the mmlsdisk command will show any unavailable disks. If you have not specified metadata replication, the failure of one disk may result in your file system being unable to mount. If you have specified metadata replication, it will require two disks in different failure groups to disable the entire file system. If there are down disks, issue the mmchdisk start command to restart them and retry the mount.
For a remote file system, mmlsdisk provides information about the disks of the file system. However, mmchdisk must be run from the cluster that owns the file system.
If there are no disks down, you can also look locally for error log reports, and follow the problem determination and repair actions specified in your storage system vendor problem determination guide. If the disk has failed, follow the procedures in NSD and underlying disk subsystem failures on page 91.
4. Verify that communication paths to the other nodes are available. The lack of communication paths between all nodes in the cluster may impede contact with the file system manager.
5. Verify that the file system is not already mounted. Issue the mount command.
6. Verify that the GPFS daemon on the file system manager is available. Run the mmlsmgr command to determine which node is currently assigned as the file system manager. Run a trivial data access command such as an ls on the mount point directory. If the command fails, see GPFS daemon went down on page 47.
7. Check to see if the mount point directory exists and that there is an entry for the file system in the /etc/fstab file (for Linux) or /etc/filesystems file (for AIX). The device name for a file system mount point will be listed in column one of the /etc/fstab entry or as a dev= attribute in the /etc/filesystems stanza entry. A corresponding device name must also appear in the /dev file system. If any of these elements are missing, an update to the configuration information may not have been propagated to this node. Issue the mmrefresh command to rebuild the configuration information on the node and reissue the mmmount command. Do not add GPFS file system information to /etc/filesystems (for AIX) or /etc/fstab (for Linux) directly. If after running mmrefresh -f the file system information is still missing from /etc/filesystems (for AIX) or /etc/fstab (for Linux), follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
8. Check the number of file systems that are already mounted. There is a maximum number of 256 mounted file systems for a GPFS cluster. Remote file systems are included in this number.
9.
If you issue mmchfs -V compat, it enables backward-compatible format changes only. Nodes in remote clusters that were able to mount the file system before will still be able to do so.
If you issue mmchfs -V full, it enables all new functions that require different on-disk data structures. Nodes in remote clusters running an older GPFS version will no longer be able to mount the file system. If there are any nodes running an older GPFS version that have the file system mounted at the time this command is issued, the mmchfs command will fail. For more information about completing the migration to a new level of GPFS, see the GPFS: Concepts, Planning, and Installation Guide.
All nodes that access the file system must be upgraded to the same level of GPFS. Check for the possibility that one or more of the nodes was accidentally left out of an effort to upgrade a multi-node system to a new GPFS release. If you need to return to the earlier level of GPFS, you must re-create the file system from the backup medium and restore the content in order to access it.
10. If DMAPI is enabled for the file system, ensure that a data management application is started and has set a disposition for the mount event. Refer to the GPFS: Data Management API Guide and the user's guide from your data management vendor. The data management application must be started in the cluster that owns the file system. If the application is not started, other clusters will not be able to mount the file system. Remote mounts of DMAPI managed file systems may take much longer to complete than those not managed by DMAPI.
11. Issue the mmlsfs -A command to check whether the automatic mount option has been specified. If the automatic mount option is expected, check the GPFS log in the cluster that owns and serves the file system for progress reports indicating:
starting ... mounting ... mounted ....
12. If quotas are enabled, check if there was an error while reading quota files. See MMFS_QUOTA on page 4.
13. Verify the maxblocksize configuration parameter on all clusters involved. If maxblocksize is less than the block size of the local or remote file system you are attempting to mount, you will not be able to mount it.
Error numbers specific to GPFS application calls when a file system mount is not successful
When a mount of a file system is not successful, GPFS may report these error numbers in the operating system error log or return them to an application:
ENO_QUOTA_INST = 237, No Quota management enabled.
To enable quotas for the file system, issue the mmchfs -Q yes command. To disable quotas for the file system, issue the mmchfs -Q no command.
b. Use the mmlsconfig command to verify the automountdir directory. The default automountdir is named /gpfs/automountdir. If the GPFS file system mount point is not a symbolic link to the GPFS automountdir directory, then accessing the mount point will not cause the automounter to mount the file system.
c. If the command /bin/ls -ld of the mount point shows a directory, then run the command mmrefresh -f. If the directory is empty, the command mmrefresh -f will remove the directory and create a symbolic link. If the directory is not empty, you need to move or remove the files contained in that directory, or change the mount point of the file system. For a local file system, use the mmchfs command. For a remote file system, use the mmremotefs command.
d. Once the mount point directory is empty, run the mmrefresh -f command.
2. Verify that the autofs mount has been established. Issue this command:
mount | grep automount
For RHEL5, verify the following line is in the default master map file (/etc/auto.master):
/gpfs/automountdir program:/usr/lpp/mmfs/bin/mmdynamicmap
This is an autofs program map, and there will be a single mount entry for all GPFS automounted file systems. The symbolic link points to this directory, and access through the symbolic link triggers the mounting of the target GPFS file system. To create this GPFS autofs mount, issue the mmcommon startAutomounter command, or stop and restart GPFS using the mmshutdown and mmstartup commands. 3. Verify that the automount daemon is running. Issue this command:
ps -ef | grep automount
For RHEL5, verify that the autofs daemon is running. Issue this command:
ps -ef | grep automount
To start the automount daemon, issue the mmcommon startAutomounter command, or stop and restart GPFS using the mmshutdown and mmstartup commands.
Note: If automountdir is mounted (as in step 2) and the mmcommon startAutomounter command is not able to bring up the automount daemon, manually umount the automountdir before issuing mmcommon startAutomounter again.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see something like this:
Mon Jun 25 11:33:03 2004: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Examine /var/log/messages for autofs error messages. This is an example of what you might see if the remote file system name does not exist.
Jun 25 11:33:03 linux automount[20331]: attempting to mount entry /gpfs/automountdir/gpfs55
Jun 25 11:33:04 linux automount[28911]: >> Failed to open gpfs55.
Jun 25 11:33:04 linux automount[28911]: >> No such device
Jun 25 11:33:04 linux automount[28911]: >> mount: fs type gpfs not supported by kernel
Jun 25 11:33:04 linux automount[28911]: mount(generic): failed to mount /dev/gpfs55 (type gpfs) on /gpfs/automountdir/gpfs55
6. After you have established that GPFS has received a mount request from autofs (Step 4 on page 64) and that mount request failed (Step 5 on page 64), issue a mount command for the GPFS file system and follow the directions in File system will not mount on page 61.
These are direct mount autofs mount entries. Each GPFS automount file system will have an autofs mount entry. These autofs direct mounts allow GPFS to mount on the GPFS mount point. To create any missing GPFS autofs mounts, issue the mmcommon startAutomounter command, or stop and restart GPFS using the mmshutdown and mmstartup commands. 3. Verify that the autofs daemon is running. Issue this command:
ps -ef | grep automount
To start the automount daemon, issue the mmcommon startAutomounter command, or stop and restart GPFS using the mmshutdown and mmstartup commands. 4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see something like this:
Mon Jun 25 11:33:03 2007: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Since the autofs daemon logs status using syslogd, examine the syslogd log file for status information from automountd. Here is an example of a failed automount request:
Jun 25 15:55:25 gpfsa1 automountd [9820 ] :mount of /gpfs/gpfs55:status 13
6. After you have established that GPFS has received a mount request from autofs (Step 4) and that mount request failed (Step 5), issue a mount command for the GPFS file system and follow the directions in File system will not mount on page 61. 7. If automount fails for a non-GPFS file system and you are using file /etc/auto.master, use file /etc/auto_master instead. Add the entries from /etc/auto.master to /etc/auto_master and restart the automount daemon.
v Remote file system I/O fails with the Function not implemented error message when UID mapping is enabled
v Remote file system will not mount due to differing GPFS cluster security configurations
v Cannot resolve contact node address on page 67
v The remote cluster name does not match the cluster name supplied by the mmremotecluster command on page 67
v Contact nodes down or GPFS down on contact nodes on page 67
v GPFS is not running on the local node on page 68
v The NSD disk does not have a NSD server specified and the mounting cluster does not have direct access to the disks on page 68
v The cipherList option has not been set properly on page 68
v Remote mounts fail with the permission denied error message on page 69
Remote file system I/O fails with the Function not implemented error message when UID mapping is enabled
When user ID (UID) mapping in a multi-cluster environment is enabled, certain kinds of mapping infrastructure configuration problems might result in I/O requests on a remote file system failing:
ls -l /fs1/testfile ls: /fs1/testfile: Function not implemented
To troubleshoot this error, verify the following configuration details:
1. That /var/mmfs/etc/mmuid2name and /var/mmfs/etc/mmname2uid helper scripts are present and executable on all nodes in the local cluster and on all quorum nodes in the file system home cluster, along with any data files needed by the helper scripts.
2. That UID mapping is enabled in both the local cluster and the remote file system home cluster configuration by issuing the mmlsconfig enableUIDremap command.
3. That the UID mapping helper scripts are working correctly.
For more information about configuring UID mapping, see the UID Mapping for GPFS in a Multi-cluster Environment white paper at: http://www-03.ibm.com/systems/clusters/software/whitepapers/uid_gpfs.html.
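As a quick combined check of items 1 and 2 above (a hedged sketch, assuming the standard helper script locations shown in the list):
ls -l /var/mmfs/etc/mmuid2name /var/mmfs/etc/mmname2uid   # both should exist and be executable
mmlsconfig enableUIDremap                                  # should report that UID remapping is enabled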
Remote file system will not mount due to differing GPFS cluster security configurations
A mount command fails with a message similar to this:
Cannot mount gpfsxx2.ibm.com:gpfs66: Host is down.
The GPFS log on the cluster issuing the mount command should have entries similar to these:
There is more information in the log file /var/adm/ras/mmfs.log.latest Mon Jun 25 16:39:27 2007: Waiting to join remote cluster gpfsxx2.ibm.com Mon Jun 25 16:39:27 2007: Command: mount gpfsxx2.ibm.com:gpfs66 30291 Mon Jun 25 16:39:27 2007: The administrator of 199.13.68.12 gpfslx2 requires secure connections. Contact the administrator to obtain the target clusters key and register the key using "mmremotecluster update". Mon Jun 25 16:39:27 2007: A node join was rejected. This could be due to incompatible daemon versions, failure to find the node in the configuration database, or no configuration manager found. Mon Jun 25 16:39:27 2007: Failed to join remote cluster gpfsxx2.ibm.com Mon Jun 25 16:39:27 2007: Command err 693: mount gpfsxx2.ibm.com:gpfs66 30291
The GPFS log file on the cluster that owns and serves the file system will have an entry indicating the problem as well, similar to this:
Mon Jun 25 16:32:21 2007: Kill accepted connection from 199.13.68.12 because security is required, err 74
To resolve this problem, contact the administrator of the cluster that owns and serves the file system to obtain the key, and register the key using the mmremotecluster command. The SHA digest field of the mmauth show and mmremotecluster commands may be used to determine if there is a key mismatch, and on which cluster the key should be updated. For more information on the SHA digest, see The SHA digest on page 35.
To resolve the problem, correct the contact list and try the mount again.
The remote cluster name does not match the cluster name supplied by the mmremotecluster command
A mount command fails with a message similar to this:
Cannot mount gpfslx2:gpfs66: Network is unreachable
Perform these steps:
1. Verify that the remote cluster name reported by the mmremotefs show command is the same name as reported by the mmlscluster command from one of the contact nodes.
2. Verify the list of contact nodes against the list of nodes as shown by the mmlscluster command from the remote cluster.
In this example, the correct cluster name is gpfslx2.ibm.com and not gpfslx2.
mmlscluster
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    gpfslx2.ibm.com
  Secondary server:  (none)

 Node  Daemon node name  IP address      Admin node name  Designation
 ---------------------------------------------------------------------
    1  gpfslx2           198.117.68.68   gpfslx2.ibm.com  quorum
To resolve the problem, use the mmremotecluster show command and verify that the cluster name matches the remote cluster and the contact nodes are valid nodes in the remote cluster. Verify that GPFS is active on the contact nodes in the remote cluster. Another way to resolve this problem is to change the contact nodes using the mmremotecluster update command.
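For example, the registered cluster name and contact nodes can be corrected in one step (a hedged illustration; the cluster and node names are hypothetical):
mmremotecluster update gpfslx2.ibm.com -n gpfslx2.ibm.com,gpfslx3.ibm.com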
The NSD disk does not have a NSD server specified and the mounting cluster does not have direct access to the disks
A file system mount fails with a message similar to this:
Failed to open gpfs66.
No such device
mount: Stale NFS file handle
Some file system data are inaccessible at this time.
Check error log for additional information.
Cannot mount gpfslx2.ibm.com:gpfs66: Stale NFS file handle
To resolve the problem, the cluster that owns and serves the file system must define one or more NSD servers.
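As a hedged illustration of how an NSD server list might be assigned to an existing NSD in the owning cluster (the NSD and server names are hypothetical; consult the mmchnsd documentation for the exact requirements):
mmchnsd "gpfs1nsd:nsdserver1.ibm.com,nsdserver2.ibm.com"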
The mmchconfig cipherList=AUTHONLY command must be run on both the cluster that owns and controls the file system, and the cluster that is attempting to mount the file system.
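A hedged sketch of the sequence on a cluster that has not yet set this option (mmauth genkey is only needed if the cluster does not already have a key, and GPFS may need to be stopped while the key or cipherList is changed):
mmauth genkey new               # generate a cluster key if one does not exist yet
mmchconfig cipherList=AUTHONLY  # require authenticated connections
mmauth show                     # verify the setting; repeat on the other cluster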
Mount failure due to client nodes joining before NSD servers are online
If a client node joins the GPFS cluster and attempts file system access prior to the file system's NSD servers being active, the mount fails. This is especially true when automount is used. This situation can occur during cluster startup, or any time that an NSD server is brought online with client nodes already active and attempting to mount a file system served by the NSD server. The file system mount failure produces a message similar to this:
Mon Jun 25 11:23:34 EST 2007: mmmount: Mounting file systems ...
No such device
Some file system data are inaccessible at this time.
Check error log for additional information.
After correcting the problem, the file system must be unmounted and then
mounted again to restore normal data access.
Failed to open fs1.
No such device
Some file system data are inaccessible at this time.
Cannot mount /dev/fs1 on /fs1: Missing file or filesystem
No such device
File system manager takeover failed.
No such device
Command: err 52: mount fs1 32414
Missing file or filesystem
Two mmchconfig command options are used to specify the amount of time for GPFS mount requests to wait for an NSD server to join the cluster:
nsdServerWaitTimeForMount
Specifies the number of seconds to wait for an NSD server to come up at GPFS cluster startup time, after a quorum loss, or after an NSD server failure. Valid values are between 0 and 1200 seconds. The default is 300. The interval for checking is 10 seconds. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no effect.
nsdServerWaitTimeWindowOnMount
Specifies a time window to determine if quorum is to be considered recently formed. Valid values are between 1 and 1200 seconds. The default is 600. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no effect.
The GPFS daemon need not be restarted in order to change these values. The scope of these two operands is the GPFS cluster. The -N flag can be used to set different values on different nodes. In this case, the settings on the file system manager node take precedence over the settings of nodes trying to access the file system.
When a node rejoins the cluster (after it was expelled, experienced a communications problem, lost quorum, or for any other reason dropped its connection and rejoined), that node resets all the failure times that it knows about. Therefore, when a node rejoins it sees the NSD servers as never having failed. From the node's point of view, it has rejoined the cluster and old failure information is no longer relevant.
GPFS checks the cluster formation criteria first. If that check falls outside the window, GPFS then checks for NSD server fail times being within the window.
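For example, both windows can be lengthened cluster-wide with mmchconfig (a hedged illustration; the values shown are arbitrary):
mmchconfig nsdServerWaitTimeForMount=600
mmchconfig nsdServerWaitTimeWindowOnMount=900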
the file system will not unmount until all processes are finished accessing it. If mmfsd is up, the processes accessing the file system can be determined. See The lsof command on page 24. These processes can be killed with the command:
lsof filesystem | grep -v COMMAND | awk '{print $2}' | xargs kill -9
If mmfsd is not operational, the lsof command will not be able to determine which processes are still accessing the file system. For Linux nodes it is possible to use the /proc pseudo file system to determine current file access. For each process currently running on the system, there is a subdirectory /proc/pid/fd, where pid is the numeric process ID number. This subdirectory is populated with symbolic links pointing to the files that this process has open. You can examine the contents of the fd subdirectory for all running processes, manually or with the help of a simple script, to identify the processes that have open files in GPFS file systems. Terminating all of these processes may allow the file system to unmount successfully.
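A minimal sketch of such a script, assuming the file system is mounted at /gpfs/fs1 (a hypothetical mount point):
for pid in /proc/[0-9]*; do
    # list the open file descriptors of each process and look for paths under the mount point
    if ls -l "$pid/fd" 2>/dev/null | grep -q '/gpfs/fs1'; then
        echo "process ${pid##*/} has files open under /gpfs/fs1"
    fi
done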
2. Verify that there are no disk media failures. Look on the NSD server node for error log entries. Identify any NSD server node that has generated an error log entry. See Disk media failure on page 96 for problem determination and repair actions to follow.
3. If the file system must be unmounted, you can force the unmount by issuing the mmumount -f command:
Note:
a. See File system forced unmount for the consequences of doing this.
b. Before forcing the unmount of the file system, issue the lsof command and close any files that are open.
c. On Linux, you might encounter a situation where a GPFS file system cannot be unmounted, even if you issue the mmumount -f command. In this case, you must reboot the node to clear the condition. You can also try the system umount command before you reboot. For example:
umount -f /filesystem
4. If a file system that is mounted by a remote cluster needs to be unmounted, you can force the unmount by issuing the command:
mmunmount filesystemname -f -C remoteclustername
v File system unmounts with an error indicating too many disks are unavailable.
The mmlsmount -L command can be used to determine which nodes currently have a given file system mounted.
If your file system has been forced to unmount, follow these steps:
1. With the failure of a single disk, if you have not specified multiple failure groups and replication of metadata, GPFS will not be able to continue because it cannot write logs or other critical metadata. If you have specified multiple failure groups and replication of metadata, the failure of multiple disks in different failure groups will put you in the same position. In either of these situations, GPFS will forcibly unmount the file system. This will be indicated in the error log by records indicating exactly which access failed, with an MMFS_SYSTEM_UNMOUNT record indicating the forced unmount. The user response to this is to take the needed actions to restore the disk access and issue the mmchdisk command to disks that are shown as down in the information displayed by the mmlsdisk command.
2. Internal errors in processing data on a single file system may cause loss of file system access. These errors may clear with the invocation of the umount command, followed by a remount of the file system, but they should be reported as problems to IBM.
3. If an MMFS_QUOTA error log entry containing Error writing quota file... is generated, the quota manager continues operation if the next write for the user, group, or fileset is successful. If not, further allocations to the file system will fail. Check the error code in the log and make sure that the disks containing the quota file are accessible. Run the mmcheckquota command. For more information, see The mmcheckquota command on page 31.
If the file system must be repaired without quotas:
a. Disable quota management by issuing the command:
mmchfs Device -Q no
b. Issue the mmmount command for the file system.
c. Make any necessary repairs and install the backup quota files.
d. Issue the mmumount -a command for the file system.
e. Restore quota management by issuing the mmchfs Device -Q yes command.
f. Run the mmcheckquota command with the -u, -g, and -j options. For more information, see The mmcheckquota command on page 31.
g. Issue the mmmount command for the file system. (A consolidated sketch of this repair sequence follows this list.)
4. If errors indicate that too many disks are unavailable, see Additional failure group considerations.
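A consolidated, hedged sketch of the repair sequence above, using a hypothetical device name fs1 (the -u, -g, and -j file operands are omitted here; supply them as described in the mmcheckquota documentation):
mmchfs fs1 -Q no        # a. disable quota management
mmmount fs1             # b. mount the file system
                        # c. make repairs and install the backup quota files
mmumount fs1 -a         # d. unmount the file system on all nodes
mmchfs fs1 -Q yes       # e. re-enable quota management
mmcheckquota fs1        # f. recheck quotas, adding -u, -g, -j as needed
mmmount fs1             # g. mount the file system again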
remaining two copies and writing the update to a new disk added to the subset. But if the downed failure group contains a majority of the subset, the file system descriptor cannot be updated and the file system has to be force unmounted. Introducing a third failure group consisting of a single disk that is used solely for the purpose of maintaining a copy of the file system descriptor can help prevent such a scenario. You can designate this disk by using the descOnly designation for disk usage on the disk descriptor. With the descOnly designation, the disk does not hold any of the other file system data or metadata and can be as small as 4 MB. See the NSD creation considerations topic in the GPFS: Concepts, Planning, and Installation Guide and the Establishing disaster recovery for your GPFS cluster topic in the GPFS: Advanced Administration Guide.
Error numbers specific to GPFS application calls when a file system has been forced to unmount
When a file system has been forced to unmount, GPFS may report these error numbers in the operating system error log or return them to an application:
EPANIC = 666, A file system has been forcibly unmounted because of an error.
Most likely due to the failure of one or more disks containing the last copy of metadata. See The operating system error log facility on page 2 for details.
EALL_UNAVAIL = 218, A replicated read or write failed because none of the replicas were available.
Multiple disks in multiple failure groups are unavailable. Follow the procedures in Chapter 7, GPFS disk problems, on page 91 for unavailable disks.
Error numbers specific to GPFS application calls when file system manager appointment fails
When the appointment of a file system manager is unsuccessful after multiple attempts, GPFS may report these error numbers in error logs, or return them to an application:
ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This usually occurs when a large number of disks are unavailable or when there has been a major network failure. Run mmlsdisk to determine whether disks have failed, and if they have, take corrective action by issuing the mmchdisk command.
Discrepancy between GPFS configuration data and the on-disk data for a file system
There is an indication leading you to the conclusion that there may be a discrepancy between the GPFS configuration data and the on-disk data for a file system. You issue a disk command (for example, mmadddisk, mmdeldisk, or mmrpldisk) and receive the message:
6027-1290 GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data for the file system. Issue the command:
mmcommon recoverfs fileSystem
Before a disk is added to or removed from a file system, a check is made that the GPFS configuration data for the file system is in agreement with the on-disk data for the file system. The above message is issued if this check was not successful. This may occur if an earlier GPFS disk command was unable to complete successfully for some reason. Issue the mmcommon recoverfs command to bring the GPFS configuration data into agreement with the on-disk data for the file system.
If running mmcommon recoverfs does not resolve the problem, follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
A NO_SPACE error occurs when a file system is known to have adequate free space
An ENOSPC (NO_SPACE) error can be returned even if a file system has remaining space. That is, the NO_SPACE error might occur even though the df command shows that the file system is not full.
The user might have a policy that writes data into a specific storage pool. When the user tries to create a file in that storage pool, it returns the ENOSPC error if the storage pool is full. The user next issues the df command, which indicates that the file system is not full, because the problem is limited to the one storage pool in the user's policy. In order to see if a particular storage pool is full, the user must issue the mmdf command. Here is a sample scenario:
1. The user has a policy rule that says files whose name contains the word 'tmp' should be put into storage pool sp1 in the file system fs1. This command displays the rule:
mmlspolicy fs1 -L
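The rule listed by mmlspolicy might look similar to the following sketch (the rule name is an assumption; the pool name sp1 matches the scenario above):

RULE 'place_tmp' SET POOL 'sp1' WHERE NAME LIKE '%tmp%'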
2. The user moves a file from the /tmp directory to fs1 that has the word 'tmp' in the file name, meaning data of tmpfile should be placed in storage pool sp1:
mv /tmp/tmpfile /fs1/
The mv command fails with an out-of-space error.
3. This command shows storage information for the file system:
df |grep fs1
This output indicates that the file system is only 51% full.
4. To query the storage usage for an individual storage pool, the user must issue the mmdf command.
mmdf fs1
Number of used inodes:            74
Number of free inodes:        137142
Number of allocated inodes:   137216
Maximum number of inodes:     150016
In this case, the user sees that storage pool sp1 has 0% free space left and that is the reason for the NO_SPACE error message.
5. To resolve the problem, the user must change the placement policy file to avoid putting data in a full storage pool, delete some files in storage pool sp1, or add more space to the storage pool.
Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'
This is a hypothetical situation where ill-placed files can cause GPFS to produce a 'Predicted Pool Utilization' of a negative value. Suppose that 2 GB of data from a 5 GB file named abc, that is supposed to be in the system storage pool, are actually located in another pool. This 2 GB of data is said to be 'ill-placed'. Also, suppose that 3 GB of this file are in the system storage pool, and no other file is assigned to the system storage pool. If you run the mmapplypolicy command to schedule file abc to be moved from the system storage pool to a storage pool named YYY, the mmapplypolicy command does the following:
1. Starts with the 'Current pool utilization' for the system storage pool, which is 3 GB.
2. Subtracts 5 GB, the size of file abc.
3. Arrives at a 'Predicted Pool Utilization' of negative 2 GB.
The mmapplypolicy command does not know how much of an 'ill-placed' file is currently in the wrong storage pool and how much is in the correct storage pool. When there are ill-placed files in the system storage pool, the 'Predicted Pool Utilization' can be any positive or negative value. The positive value can be capped by the LIMIT clause of the MIGRATE rule. The 'Current Pool Utilizations' should always be between 0% and 100%.
A new file is assigned to the storage pool of the first rule that it matches. If the file fails to match any rule, the file creation fails with an EINVAL error code. A suggested solution is to put a DEFAULT clause as the last entry of the policy file (see the example after this list).
5. When a policy file is installed, GPFS verifies that the named storage pools exist. However, GPFS allows an administrator to delete pools that are mentioned in the policy file. This allows more freedom for recovery from hardware errors. Consequently, the administrator must be careful when deleting storage pools referenced in the policy.
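For illustration only, a catch-all placement rule placed at the end of the policy file might look like the following (the rule name and target pool are assumptions; substitute a pool that exists in your file system):

RULE 'default' SET POOL 'system'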
The policy rule language can only check for some errors at runtime. For example, a rule that causes a divide by zero cannot be checked when the policy file is installed. Errors of this type generate an error message and stop the policy evaluation for that file. Note: I/O errors while migrating files indicate failing storage devices and must be addressed like any other I/O error. The same is true for any file system error or panic encountered while migrating files.
The mmlsfileset command identifies filesets in this state by displaying a status of 'Deleting'.
5. If you unlink a fileset that has other filesets linked below it, any filesets linked to it (that is, child filesets) become inaccessible. The child filesets remain linked to the parent and will become accessible again when the parent is re-linked.
6. By default, the mmdelfileset command will not delete a fileset that is not empty. To empty a fileset, first unlink all its immediate child filesets, to remove their junctions from the fileset to be deleted. Then, while the fileset itself is still linked, use rm -rf or a similar command, to remove the rest of the contents of the fileset. Now the fileset may be unlinked and deleted (see the example after this list). Alternatively, the fileset to be deleted can be unlinked first and then mmdelfileset can be used with the -f (force) option. This will unlink its child filesets, then destroy the files and directories contained in the fileset.
7. When deleting a small dependent fileset, it may be faster to use the rm -rf command instead of the mmdelfileset command with the -f option.
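As a hedged sketch of the sequence described in item 6 (the file system, fileset, and path names are assumptions), the commands might look like this:

mmunlinkfileset fs1 childfset
rm -rf /fs1/parentfset/*
mmunlinkfileset fs1 parentfset
mmdelfileset fs1 parentfset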
1. Problems can arise when running backup and archive utilities against a file system with unlinked filesets. See the Filesets and backup topic in the GPFS: Advanced Administration Guide for details.
2. In the rare case that the mmfsck command encounters a serious error checking the file system's fileset metadata, it may not be possible to reconstruct the fileset name and comment. These cannot be inferred from information elsewhere in the file system. If this happens, mmfsck will create a dummy name for the fileset, such as 'Fileset911' and the comment will be set to the empty string.
3. Sometimes mmfsck encounters orphaned files or directories (those without a parent directory), and traditionally these are reattached in a special directory called 'lost+found' in the file system root. When a file system contains multiple filesets, however, orphaned files and directories are reattached in the 'lost+found' directory in the root of the fileset to which they belong. For the root fileset, this directory appears in the usual place, but other filesets may each have their own 'lost+found' directory.
A storage pool is deleted when all disks assigned to the pool are deleted. To delete the last disk, all data residing in the pool must be moved to another pool. Likewise, any files assigned to the pool, whether or not they contain data, must be reassigned to another pool. The easiest method for reassigning all files and migrating all data is to use the mmapplypolicy command with a single rule to move all data from one pool to another (see the example after this list). You should also install a new placement policy that does not assign new files to the old pool. Once all files have been migrated, reissue the mmdeldisk command to delete the disk and the storage pool.
If all else fails, and you have a disk that has failed and cannot be recovered, follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center for commands to allow the disk to be deleted without migrating all data from it. Files with data left on the failed device will lose data. If the entire pool is deleted, any existing files assigned to that pool are reassigned to a broken pool, which prevents writes to the file until the file is reassigned to a valid pool.
6. Ill-placed files - understanding and correcting them. The mmapplypolicy command migrates a file between pools by first assigning it to a new pool, then moving the file's data. Until the existing data is moved, the file is marked as 'ill-placed' to indicate that some of its data resides in its previous pool. In practice, mmapplypolicy assigns all files to be migrated to their new pools, then it migrates all of the data in parallel. Ill-placed files indicate that the mmapplypolicy or mmchattr command did not complete its last migration or that -I defer was used. To correct the placement of the ill-placed files, the file data needs to be migrated to the assigned pools. You can use the mmrestripefs or mmrestripefile commands to move the data.
7. Using the -P PoolName option on the mmrestripefs command: This option restricts the restripe operation to a single storage pool. For example, after adding a disk to a pool, only the data in that pool needs to be restriped. In practice, -P PoolName simply restricts the operation to the files assigned to the specified pool. Files assigned to other pools are not included in the operation, even if the file is ill-placed and has data in the specified pool.
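As illustrative sketches only (the file system, pool, rule, and policy file names are assumptions): a one-rule policy that drains a pool, applied with mmapplypolicy, and a restripe limited to a single pool might look like the following:

RULE 'drain' MIGRATE FROM POOL 'oldpool' TO POOL 'system'

mmapplypolicy fs1 -P /tmp/drain.policy -I yes
mmrestripefs fs1 -b -P sp1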
v When the target of the backup is tape, the TSM server may be unable to handle all of the backup client processes because the value of the TSM server's MAXNUMMP parameter is set lower than the number of client processes. This failure is indicated by message ANS1312E from TSM. v The mmbackup command is unable to create a snapshot to perform the backup because a subdirectory with the name of the snapshot subdirectory already exists. This is usually caused by the user doing a TSM restore of the backup without specifying a different name for receiving the restored contents of the file system than the name they were stored under in TSM, namely the snapshot subdirectory name. The errors from mmbackup normally indicate the underlying problem.
Snapshot problems
Use the mmlssnapshot command as a general hint for snapshot-related problems, to find out what snapshots exist, and what state they are in. Use the mmsnapdir command to find the snapshot directory name used to permit access. The mmlssnapshot command displays the list of all snapshots of a file system. This command lists the snapshot name, some attributes of the snapshot, as well as the snapshot's status. The mmlssnapshot command does not require the file system to be mounted.
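For example, to list the snapshots of file system fs1 along with the storage they use (fs1 is an assumed file system name), you might issue:

mmlssnapshot fs1 -d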
v 'Unable to resume all nodes, rc=errorCode.'
v 'Unable to delete snapshot filesystemName from file system snapshotName, rc=errorCode.'
v 'Error restoring inode number, error errorCode.'
v 'Error deleting snapshot snapshotName in file system filesystemName, error errorCode.'
v 'commandString failed, error errorCode.'
v 'None of the nodes in the cluster is reachable, or GPFS is down on all of the nodes.'
v 'File system filesystemName is not known to the GPFS cluster.'
v 'Previous snapshot snapshotName is invalid and must be deleted before another snapshot may be deleted.'
v 'Previous snapshot snapshotName is invalid and must be deleted before another snapshot may be restored.'
v 'More than one snapshot is marked for restore.'
v 'Offline snapshot being restored.'
directory. When this older snapshot is restored, the mmrestorefs command will recreate the old, normal file or directory in the file system root directory. The mmrestorefs command will not fail in this case, but the restored file or directory will hide the existing snapshots. After invoking mmrestorefs it may therefore appear as if the existing snapshots have disappeared. However, mmlssnapshot should still show all existing snapshots.
The fix is similar to the one mentioned before. Perform one of these two steps:
1. After the mmrestorefs command completes, rename the conflicting file or directory that was restored in the root directory.
2. Run the mmsnapdir command to select a different name for the dynamically-generated snapshot directory.
Finally, the mmsnapdir -a option enables a dynamically-generated snapshot directory in every directory, not just the file system root. This allows each user quick access to snapshots of their own files by going into .snapshots in their home directory or any other of their directories. Unlike .snapshots in the file system root, .snapshots in other directories is invisible, that is, an ls -a command will not list .snapshots. This is intentional because recursive file system utilities such as find, du or ls -R would otherwise either fail or produce incorrect or undesirable results. To access snapshots, the user must explicitly specify the name of the snapshot directory, for example: ls ~/.snapshots. If there is a name conflict (that is, a normal file or directory named .snapshots already exists in the user's home directory), the user must rename the existing file or directory.
The inode numbers that are used for and within these special .snapshots directories are constructed dynamically and do not follow the standard rules. These inode numbers are visible to applications through standard commands, such as stat, readdir, or ls. The inode numbers reported for these directories can also be reported differently on different operating systems. Applications should not expect consistent numbering for such inodes.
The mmpmon command is thoroughly documented in the Monitoring GPFS I/O performance with the mmpmon command topic in the GPFS: Advanced Administration Guide, and the Commands topic in the GPFS: Administration and Programming Reference. Before proceeding with mmpmon problem determination, review all of this material to ensure that you are using mmpmon correctly.
v The mmpmon command does not support:
  - Monitoring read requests without monitoring writes, or the other way around.
  - Choosing which file systems to monitor.
  - Monitoring on a per-disk basis.
  - Specifying different size or latency ranges for reads and writes.
  - Specifying different latency values for a given size range.
6. Copy the output of mmfsadm to a safe location.
7. Follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
If mmpmon terminates abnormally, perform these steps:
1. Determine if the GPFS daemon has failed, and if so restart it.
2. Review your invocation of mmpmon, and verify the input.
3. Try the function again.
4. If the problem persists, follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
NFS problems
There are some problems that can be encountered when GPFS interacts with NFS. For details on how GPFS and NFS interact, see the NFS and GPFS topic in the GPFS: Administration and Programming Reference. These are some of the problems encountered when GPFS interacts with NFS:
v NFS client with stale inode data
v NFS V4 problems
NFS V4 problems
Before analyzing an NFS V4 problem, review this documentation to determine if you are using NFS V4 ACLs and GPFS correctly:
1. The NFS Version 4 Protocol paper and other information found at: www.nfsv4.org.
2. The Managing GPFS access control lists and NFS export topic in the GPFS: Administration and Programming Reference.
3. The GPFS exceptions and limitations to NFS V4 ACLs topic in the GPFS: Administration and Programming Reference.
The commands mmdelacl and mmputacl can be used to revert an NFS V4 ACL to a traditional ACL. Use the mmdelacl command to remove the ACL, leaving access controlled entirely by the permission bits in the mode. Then use the chmod command to modify the permissions, or the mmputacl and mmeditacl commands to assign a new ACL. For files, the mmputacl and mmeditacl commands can be used at any time (without first issuing the mmdelacl command) to assign any type of ACL. The command mmeditacl -k posix provides a translation of the current ACL into traditional POSIX form and can be used to more easily create an ACL to edit, instead of having to create one from scratch.
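For example, assuming a file named /fs1/data/report (a hypothetical path), you might remove its NFS V4 ACL and then edit a POSIX-style ACL with:

mmdelacl /fs1/data/report
mmeditacl -k posix /fs1/data/report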
File systems being exported with Samba may (depending on which version of Samba you are using) require the -D nfs4 flag on the mmchfs or mmcrfs commands. This setting enables NFS V4 and CIFS (Samba) sharing rules. Some versions of Samba will fail share requests if the file system has not been configured to support them.
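For example, to enable NFS V4 and CIFS (Samba) sharing rules on a file system named fs1 (an assumed name), you might issue:

mmchfs fs1 -D nfs4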
Data integrity
GPFS takes extraordinary care to maintain the integrity of customer data. However, certain hardware failures or, in extremely unusual circumstances, a programming error can cause the loss of data in a file system. GPFS performs extensive checking to validate metadata and ceases using the file system if metadata becomes inconsistent. This can appear in two ways:
1. The file system will be unmounted and applications will begin seeing ESTALE return codes to file operations.
2. Error log entries indicating a MMFS_SYSTEM_UNMOUNT and a corruption error are generated.
If actual disk data corruption occurs, this error will appear on each node in succession. Before proceeding with the following steps, follow the procedures in Information to collect before contacting the IBM Support Center on page 113, and then contact the IBM Support Center.
1. Examine the error logs on the NSD servers for any indication of a disk error that has been reported.
2. Take appropriate disk problem determination and repair actions prior to continuing.
3. After completing any required disk repair actions, run the offline version of the mmfsck command on the file system.
4. If your error log or disk analysis tool indicates that specific disk blocks are in error, use the mmfileid command to determine which files are located on damaged areas of the disk, and then restore these files. See The mmfileid command on page 33 for more information.
5. If data corruption errors occur in only one node, it is probable that memory structures within the node have been corrupted. In this case, the file system is probably good but a program error exists in GPFS or another authorized program with access to GPFS data structures. Follow the directions in Data integrity and then reboot the node. This should clear the problem. If the problem repeats on one node without affecting other nodes, check the programming specifications code levels to determine that they are current and compatible and that no hardware errors were reported. Refer to the GPFS: Concepts, Planning, and Installation Guide for correct software levels.
Error numbers specific to GPFS application calls when data integrity may be corrupted
When there is the possibility of data corruption, GPFS may report these error numbers in the operating system error log, or return them to an application: EVALIDATE=214, Invalid checksum or other consistency check failure on disk data structure. This indicates that internal checking has found an error in a metadata structure. The severity of the error depends on which data structure is involved. The cause of this is usually GPFS software, disk hardware or other software between GPFS and the disk. Running mmfsck should repair the error. The urgency of this depends on whether the error prevents access to some file or whether basic metadata structures are involved.
home and running the mmafmctl resumeRequeued command, so that the requeued messages are executed at home again. If mmafmctl resumeRequeued is not run by an administrator, AFM would still execute the message in the regular order of message executions from cache to home. Running the mmfsadm dump afm all command on the gateway node shows the queued messages. Requeued messages show in the dumps similar to the following example:
c12c4apv13.gpfs.net: c12c4apv13.gpfs.net: Normal Queue: (listed by execution order) (state: Active) Write [612457.552962] requeued file3 (43 @ 293) chunks 0 bytes 0 0
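To run the requeued operations at home again, the administrator might issue a command similar to the following (a sketch only; the file system and fileset names are assumptions, and the exact mmafmctl syntax should be verified in the GPFS: Administration and Programming Reference):

mmafmctl fs1 resumeRequeued -j cachefileset1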
v mmchnsd command
v mmcrfs command
v mmcrnsd command
For disks that are SAN-attached to all nodes in the cluster, device=DiskName should refer to the disk device name in /dev on the node where the mmcrnsd command is issued. If a server list is specified, device=DiskName must refer to the name of the disk on the first server node. The same disk can have different local names on different nodes.
When you specify an NSD server node, that node performs all disk I/O operations on behalf of nodes in the cluster that do not have connectivity to the disk. You can also specify up to eight additional NSD server nodes. These additional NSD servers will become active if the first NSD server node fails or is unavailable.
When the mmcrnsd command encounters an error condition, one of these messages is displayed:
6027-2108 Error found while processing stanza
or
6027-1636 Error found while checking disk descriptor: descriptor
Usually, this message is preceded by one or more messages describing the error more specifically.
Another possible error from mmcrnsd is:
6027-2109 Failed while processing disk stanza on node nodeName.
or
6027-1661 Failed while processing disk descriptor descriptor on node nodeName.
One of these errors can occur if an NSD server node does not have read and write access to the disk. The NSD server node needs to write an NSD volume ID to the raw disk. If an additional NSD server node is specified, that NSD server node will scan its disks to find this NSD volume ID string. If the disk is SAN-attached to all nodes in the cluster, the NSD volume ID is written to the disk by the node on which the mmcrnsd command is running.
This output shows that: v There are three NSDs in this cluster: t65nsd4b, t65nsd12b, and t65nsd13b. v NSD disk t65nsd4b of filesystem fs1 is SAN-attached to all nodes in the cluster. v NSD disk t65nsd12b of file system fs5 has 2 NSD server nodes. v NSD disk t65nsd13b of file system fs6 has 3 NSD server nodes. If you need to find out the local device names for these disks, you could use the -m option on the mmlsnsd command. For example, issuing:
mmlsnsd -m
From this output we can tell that:
v The local disk name for t65nsd12b on NSD server c26f4gp01 is hdisk34.
v NSD disk t65nsd13b is not attached to the node on which the mmlsnsd command was issued, node c26f4gp04.
v The mmlsnsd command was not able to determine the local device for NSD disk t65nsd13b on the c26f4gp03 server.
To find the nodes to which disk t65nsd4b is attached and the corresponding local device for that disk, issue:
mmlsnsd -d t65nsd4b -M
From this output we can tell that NSD t65nsd4b is:
v Known as hdisk92 on nodes c26f4gp01 and c26f4gp02.
v Known as hdisk26 on node c26f4gp04.
v Not attached to node c26f4gp03.
To display extended information about a node's view of its NSDs, the mmlsnsd -X command can be used:
mmlsnsd -X -d "hd3n97;sdfnsd;hd5n98"
From this output we can tell that:
v Disk hd3n97 is an hdisk known as /dev/hdisk3 on NSD server nodes c5n97 and c5n98.
v Disk sdfnsd is a generic disk known as /dev/sdf and /dev/sdm on NSD server nodes c5n94g and c5n96g, respectively.
v In addition to the above information, the NSD volume ID is displayed for each disk.
Note: The -m, -M and -X options of the mmlsnsd command can be very time consuming, especially on large clusters. Use these options judiciously.
The easiest way to recover such a disk is to temporarily define it as an NSD again (using the -v no option) and then delete the just-created NSD. For example:
mmcrnsd -F filename -v no
mmdelnsd -F filename
2. Issue the mmchdisk command with the -a option to start all stopped disks. v Disk failures should be accompanied by error log entries (see The operating system error log facility) for the failing disk. GPFS error log entries labelled MMFS_DISKFAIL will occur on the node detecting the error. This error log entry will contain the identifier of the failed disk. Follow the problem determination and repair actions specified in your disk vendor problem determination guide. After performing problem determination and repair issue the mmchdisk command to bring the disk back up.
On AIX, consult The operating system error log facility on page 2 for hardware configuration error log entries. Accessible disk devices will generate error log entries similar to this example for a SSA device:
---------------------------------------------------------------------------
LABEL:          SSA_DEVICE_ERROR
IDENTIFIER:     FE9E9357

Date/Time:       Wed Sep 8 10:28:13 edt
Sequence Number: 54638
Machine Id:      000203334C00
Node Id:         c154n09
Class:           H
Type:            PERM
Resource Name:   pdisk23
Resource Class:  pdisk
Resource Type:   scsd
Location:        USSA4B33-D3
VPD:
  Manufacturer................IBM
  Machine Type and Model......DRVC18B
  Part Number.................09L1813
  ROS Level and ID............0022
  Serial Number...............6800D2A6HK
  EC Level....................E32032
  Device Specific.(Z2)........CUSHA022
  Device Specific.(Z3)........09L1813
  Device Specific.(Z4)........99168

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ERROR CODE
2310 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
GPFS will mark disks down if there have been problems accessing the disk.
2. To prevent any I/O from going to the down disk, issue these commands immediately:
mmchdisk fs1 suspend -d gpfs1nsd
mmchdisk fs1 stop -d gpfs1nsd
Note: If there are any GPFS file systems with pending I/O to the down disk, the I/O will timeout if the system administrator does not stop it. To see if there are any threads that have been waiting a long time for I/O to complete, on all nodes issue:
mmfsadm dump waiters 10 | grep "I/O completion"
3. The next step is irreversible! Do not run this command unless data and metadata have been replicated. This command scans file system metadata for disk addresses belonging to the disk in question, then replaces them with a special broken disk address value, which may take a while.
CAUTION: Be extremely careful with using the -p option of mmdeldisk, because by design it destroys references to data blocks, making affected blocks unavailable. This is a last-resort tool, to be used when data loss may have already occurred, to salvage the remaining data, which means it cannot take any precautions. If you are not absolutely certain about the state of the file system and the impact of running this command, do not attempt to run it without first contacting the IBM Support Center.
mmdeldisk fs1 gpfs1n12 -p
For more information, see The mmfileid command on page 33. 5. After the disk is properly repaired and available for use, you can add it back to the file system.
You can rebalance the file system at the same time by issuing:
mmadddisk fs1 gpfs12nsd -r
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only for file systems with large files that are mostly invariant. In many cases, normal file update and creation will rebalance your file system over time, without the cost of the rebalancing. 2. To re-replicate data that only has single copy, issue:
mmrestripefs fs1 -r
Optionally, use the -b flag instead of the -r flag to rebalance across all disks. Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only for file systems with large files that are mostly invariant. In many cases, normal file update and creation will rebalance your file system over time, without the cost of the rebalancing. 3. Optionally, check the file system for metadata inconsistencies by issuing the offline version of mmfsck:
mmfsck fs1
Even if mmfsck succeeds, errors may still have occurred. Check to verify that no files were lost. If files containing user data were lost, you will have to restore the files from the backup media. If mmfsck fails, sufficient metadata was lost and you need to recreate your file system and restore the data from backup media.
Strict replication
If data or metadata replication is enabled, and the status of an existing disk changes so that the disk is no longer available for block allocation (if strict replication is enforced), you may receive an errno of ENOSPC when you create or append data to an existing file. A disk becomes unavailable for new block allocation if it is being deleted, replaced, or it has been suspended. If you need to delete, replace, or suspend a disk, and you need to write new data while the disk is offline, you can disable strict replication by issuing the mmchfs -K no command before you perform the disk action. However, data written while replication is disabled will not be replicated properly. Therefore, after you perform the disk action, you must re-enable strict replication by issuing the mmchfs -K command with the original value of the -K option (always or whenpossible) and then run the mmrestripefs -r command. To determine if a disk has strict replication enforced, issue the mmlsfs -K command.
Note: A disk in a down state that has not been explicitly suspended is still available for block allocation, and thus a spontaneous disk failure will not result in application I/O requests failing with ENOSPC. While new blocks will be allocated on such a disk, nothing will actually be written to the disk until its availability changes to up following an mmchdisk start command. Missing replica updates that took place while the disk was down will be performed when mmchdisk start runs.
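As an illustrative sketch only (the file system name fs1, disk name gpfs10nsd, and original -K setting whenpossible are assumptions), the sequence might look like this:

mmchfs fs1 -K no
mmdeldisk fs1 gpfs10nsd
mmchfs fs1 -K whenpossible
mmrestripefs fs1 -r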
No replication
When there is no replication, the system metadata has been lost and the file system is basically irrecoverable. You may be able to salvage some of the user data, but it will take work and time. A forced unmount of the file system will probably already have occurred. If not, it probably will very soon if you try to do any recovery work. You can manually force the unmount yourself: 1. Mount the file system in read-only mode (see Read-only mode mount on page 23). This will bypass recovery errors and let you read whatever you can find. Directories may be lost and give errors, and parts of files will be missing. Get what you can now, for all will soon be gone. On a single node, issue:
mount -o ro /dev/fs1
2. If you read a file in block-size chunks and get an EIO return code that block of the file has been lost. The rest of the file may have useful data to recover or it can be erased. To save the file system parameters for recreation of the file system, issue:
mmlsfs fs1 > fs1.saveparms
Note: This next step is irreversible! To delete the file system, issue:
mmdelfs fs1
3. To repair the disks, see your disk vendor problem determination guide. Follow the problem determination and repair actions specified. 4. Delete the affected NSDs. Issue:
mmdelnsd nsdname
mmdelnsd: Processing disk nsdname
mmdelnsd: 6027-1371 Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
5. Create a disk descriptor file for the disks to be used. This will include recreating NSDs for the new file system. 6. Recreate the file system with either different parameters or the same as you used before. Use the disk descriptor file. 7. Restore lost data from backups.
Error numbers specific to GPFS application calls when disk failure occurs
When a disk failure has occurred, GPFS may report these error numbers in the operating system error log, or return them to an application:

EOFFLINE = 208, Operation failed because a disk is offline
This error is most commonly returned when an attempt to open a disk fails. Since GPFS will attempt to continue operation with failed disks, this will be returned when the disk is first needed to complete a command or application request. If this return code occurs, check your disk for stopped states, and check to determine if the network path exists. To repair the disks, see your disk vendor problem determination guide. Follow the problem determination and repair actions specified.

ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This error usually occurs when a large number of disks are unavailable or when there has been a major network failure. Run the mmlsdisk command to determine whether disks have failed. If disks have failed, check the operating system error log on all nodes for indications of errors. Take corrective action by issuing the mmchdisk command. To repair the disks, see your disk vendor problem determination guide. Follow the problem determination and repair actions specified.
For a file system using the default mount option useNSDserver=asneeded, disk access fails over from local access to remote NSD access. Once local access is restored, GPFS detects this fact and switches back to local access. The detection and switch over are not instantaneous, but occur at approximately five minute intervals. Note: In general, after fixing the path to a disk, you must run the mmnsddiscover command on the server that lost the path to the NSD. (Until the mmnsddiscover command is run, the reconnected node will see its local disks and start using them by itself, but it will not act as the NSD server.) After that, you must run the command on all client nodes that need to access the NSD on that server; or you can achieve the same effect with a single mmnsddiscover invocation if you utilize the -N option to specify a node list that contains all the NSD servers and clients that need to rediscover paths.
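For example (a sketch; the NSD name and node names are assumptions), you might rediscover paths on the server that lost access and on the clients that use it with:

mmnsddiscover -d gpfs1nsd -N nsdserver1,client1,client2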
3. If you are replacing the disk, add the new disk to the file system:
mmadddisk fs1 gpfs11nsd
Note: Ensure there is sufficient space elsewhere in your file system for the data to be stored by using the mmdf command.
GPFS has declared NSDs built on top of AIX logical volumes as down
Earlier releases of GPFS allowed AIX logical volumes to be used in GPFS file systems. Using AIX logical volumes in GPFS file systems is now discouraged as they are limited with regard to their clustering ability and cross platform support. Existing file systems using AIX logical volumes are however still supported, and this information might be of use when working with those configurations.
which will display any underlying physical device present on this node which is backing the NSD. If the underlying device is a logical volume, perform a mapping from the logical volume to the volume group. Issue the commands:
lsvg -o | lsvg -i -l
The output will be a list of logical volumes and corresponding volume groups. Now issue the lsvg command for the volume group containing the logical volume. For example:
lsvg gpfs1vg
Here the output shows that on each of the five nodes the volume group gpfs1vg is the same physical disk (has the same pvid). The hdisk numbers vary, but the fact that they may be called different hdisk names on different nodes has been accounted for in the GPFS product. This is an example of a properly defined volume group. If any of the pvids were different for the same volume group, this would indicate that the same volume group name has been used when creating volume groups on different physical volumes. This will not work for GPFS. A volume group name can be used only for the same physical volume shared among nodes in a cluster. For more information, refer to the IBM pSeries and AIX Information Center (http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp) and search for operating system and device management.
Disk accessing commands fail to complete due to problems with some non-IBM disks
Certain disk commands, such as mmcrfs, mmadddisk, mmrpldisk, mmmount and the operating system's mount, might issue the varyonvg -u command if the NSD is backed by an AIX logical volume. For some non-IBM disks, when many varyonvg -u commands are issued in parallel, some of the AIX varyonvg -u invocations do not complete, causing the disk command to hang. This situation is recognized by the GPFS disk command not completing after a long period of time, and the persistence of the varyonvg processes as shown by the output of the ps -ef command on some of the nodes of the cluster. In these cases, kill the varyonvg processes that were issued by the GPFS disk command on the nodes of the cluster. This allows the GPFS disk command to complete. Before mounting the affected file system on any node where a varyonvg process was killed, issue the varyonvg -u command (varyonvg -u vgname) on the node to make the disk available to GPFS. Do this on each of the nodes in question, one by one, until all of the GPFS volume groups are varied online.
no_reserve
   Specifies that no reservations are used on the disk.
single_path
   Specifies that legacy reserve/release commands are used on the disk.
PR_exclusive
   Specifies that Persistent Reserve is used to establish exclusive host access to the disk.
PR_shared
   Specifies that Persistent Reserve is used to establish shared host access to the disk.
Persistent Reserve support affects both the parallel (scdisk) and SCSI-3 (scsidisk) disk device drivers and configuration methods. When a device is opened (for example, when the varyonvg command opens the underlying hdisks), the device driver checks the ODM for reserve_policy and PR_key_value and then opens the device appropriately. For PR, each host attached to the shared disk must use unique registration key values for reserve_policy and PR_key_value. On AIX, you can display the values assigned to reserve_policy and PR_key_value by issuing:
lsattr -El hdiskx -a reserve_policy,PR_key_value
If needed, use the AIX chdev command to set reserve_policy and PR_key_value. Note: GPFS manages reserve_policy and PR_key_value using reserve_policy=PR_shared when Persistent Reserve support is enabled and reserve_policy=no_reserve when Persistent Reserve is disabled.
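For example (a sketch; the hdisk name and key value are assumptions, and the device must not be in use when its attributes are changed), you might set the attributes with:

chdev -l hdisk5 -a reserve_policy=PR_shared -a PR_key_value=0x3039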
2. To check the AIX ODM status of a single disk on a node, issue the following command from a node that has access to the disk:
lsattr -El hdiskx -a reserve_policy,PR_key_value
/usr/lpp/mmfs/bin/tsprreadkeys sdp
If the registered key values all start with 0x00006d, which indicates that the PR registration was issued by GPFS, proceed to the next step to verify the SCSI-3 PR reservation type. Otherwise, contact your system administrator for information about clearing the disk state. 2. Display the reservation type on the disk:
/usr/lpp/mmfs/bin/tsprreadres sdp
If the output indicates a PR reservation with type WriteExclusive-AllRegistrants, proceed to the following instructions for clearing the SCSI-3 PR reservation on the disk. If the output does not indicate a PR reservation with this type, contact your system administrator for information about clearing the disk state. To clear the SCSI-3 PR reservation on the disk, follow these steps: 1. Choose a hex value (HexValue); for example, 0x111abc that is not in the output of the tsprreadkeys command run previously. Register the local node to the disk by entering the following command with the chosen HexValue:
/usr/lpp/mmfs/bin/tsprregister sdp 0x111abc
2. Verify that the specified HexValue has been registered to the disk:
/usr/lpp/mmfs/bin/tsprreadkeys sdp
The IBM Support Center will help you determine if the PR state is incorrect for a disk. If the PR state is incorrect, you may be directed to correct the situation by manually enabling or disabling PR on that disk.
The mmlsdisk output shows that I/O for NSD m0001 is being performed on disk /dev/sdb, but it should show that I/O is being performed on the device-mapper multipath (DMM) /dev/dm-30. Disk /dev/sdb is one of eight paths of the DMM /dev/dm-30 as shown from the multipath command. This problem could occur for the following reasons: v The previously installed user exit /var/mmfs/etc/nsddevices is missing. To correct this, restore user exit /var/mmfs/etc/nsddevices and restart GPFS. v The multipath device type does not match the GPFS known device type. For a list of known device types, see /usr/lpp/mmfs/bin/mmdevdiscover. After you have determined the device type for your multipath device, use the mmchconfig command to change the NSD disk to a known device type and then restart GPFS. The output below shows that device type dm-30 is dmm:
/usr/lpp/mmfs/bin/mmdevdiscover | grep dm-30 dm-30 dmm
To change the NSD device type to a known device type, create a file that contains the NSD name and device type pair (one per line) and issue this command:
mmchconfig updateNsdType=/tmp/filename
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
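In this example, the file /tmp/filename passed to updateNsdType might contain one NSD name and device type pair per line, for instance (using the NSD name from the example above):

m0001 dmm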
v Questions related to active file management on page 109
v Questions related to File Placement Optimizer (FPO) on page 111
Output is similar to this, with the physical volume name in column one.
gpfs44lv:N/A
PV        COPIES          IN BAND    DISTRIBUTION
hdisk8    537:000:000     100%       108:107:107:107:108
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    k164n06.kgn.ibm.com
  Secondary server:  k164n05.kgn.ibm.com
 Node  Daemon node name      IP address      Admin node name       Designation
--------------------------------------------------------------------------------
    1  k164n04.kgn.ibm.com   198.117.68.68   k164n04.kgn.ibm.com   quorum
    2  k164n05.kgn.ibm.com   198.117.68.71   k164n05.kgn.ibm.com   quorum
    3  k164n06.kgn.ibm.com   198.117.68.70   k164n06.kgn.ibm.com
In this example, k164n04 and k164n05 are quorum nodes and k164n06 is a nonquorum node. To change the quorum status of a node, use the mmchnode command. To change one quorum node to nonquorum, GPFS does not have to be stopped. If you are changing more than one node at the same time, GPFS needs to be down on all the affected nodes. GPFS does not have to be stopped when changing nonquorum nodes to quorum nodes, nor does it need to be stopped on nodes that are not affected. For example, to make k164n05 a nonquorum node, and k164n06 a quorum node, issue these commands:
mmchnode --nonquorum -N k164n05 mmchnode --quorum -N k164n06
To set a node's quorum designation at the time that it is added to the cluster, see the mmcrcluster or mmaddnode commands.
What is stored in the /tmp/mmfs directory and why does it sometimes disappear?
When GPFS encounters an internal problem, certain state information is saved in the GPFS dump directory for later analysis by the IBM Support Center. The default dump directory for GPFS is /tmp/mmfs. This directory might disappear on Linux if cron is set to run the /etc/cron.daily/tmpwatch script. The tmpwatch script removes files and directories in /tmp that have not been accessed recently. Administrators who want to use a different directory for GPFS dumps can change the directory by issuing this command:
mmchconfig dataStructureDump=/name_of_some_other_big_file_system
If indexing GPFS file systems is desired, only one node should run the updatedb command and build the database in a GPFS file system. If the database is built within a GPFS file system it will be visible on all nodes after one node finishes building it.
2. Check to see if the EDITOR variable is set in the $HOME/.kshrc file. If it is set, check to see if it is an absolute path name because the mmedquota or mmdefedquota command could retrieve the EDITOR environment variable from that file.
Why does the offline mmfsck command fail with "Error creating internal storage"?
The mmfsck command requires some temporary space on the file system manager for storing internal data during a file system scan. The internal data will be placed in the directory specified by the mmfsck -t command line parameter (/tmp by default). The amount of temporary space that is needed is proportional to the number of inodes (used and unused) in the file system that is being scanned. If GPFS is unable to create a temporary file of the required size, the mmfsck command will fail with error message:
Error creating internal storage
This failure could be caused by:
v The lack of sufficient disk space in the temporary directory on the file system manager
v The lack of sufficient pagepool on the file system manager as shown in mmlsconfig pagepool output
v Insufficiently high filesize limit set for the root user by the operating system
v The lack of support for large files in the file system that is being used for temporary storage. Some file systems limit the maximum file size because of architectural constraints. For example, JFS on AIX does not support files larger than 2 GB, unless the Large file support option has been specified when the file system was created. Check local operating system documentation for maximum file size limitations.
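As a sketch (the file system name and temporary directory are assumptions), you might rerun the offline check with its temporary files redirected to a larger file system:

mmfsck fs1 -n -t /bigfs/mmfsck.tmp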
Why are setuid/setgid bits in a single-writer cache reset at home after data is appended?
The setuid/setgid bits in a single-writer cache are reset at home after data is appended to files on which those bits were previously set and synced. This is because over NFS, a write operation to a setuid file resets the setuid bit.
Why is my data not read from the network locally when I have an FPO pool (write-affinity enabled storage pool) created?
When you create a storage pool that is to contain files that make use of FPO features, you must specify allowWriteAffinity=yes in the storage pool stanza. To enable the policy to read replicas from local disks, you must also issue the following command:
mmchconfig readReplicaPolicy=local
Why does Hadoop receive a fixed value for the block group factor instead of the GPFS default value?
When a customer does not define the dfs.block.size property in the configuration file, the GPFS connector will use a fixed block size to initialize Hadoop. The reason for this is that Hadoop has only one block size per file system, whereas GPFS allows different chunk sizes (block group factor multiplied by the data block size) for different data pools because block size is a per-pool property. To avoid a mismatch when using Hadoop with FPO, define dfs.block.size and dfs.replication in the configuration file.
How can I retain the original data placement when I restore data from a TSM server?
When data in an FPO pool is backed up in a TSM server and then restored, the original placement map will be broken unless you set the write affinity failure group for each file before backup.
v On a Linux node, create a tar file of all the entries in the /var/log/messages file from all nodes in the cluster or the nodes that experienced the failure. For example, issue the following command to create a tar file that includes all nodes in the cluster:
mmdsh -v -N all "cat /var/log/messages" > all.messages
v On a Windows node, use the Export List... dialog in the Event Viewer to save the event log to a file.
b. A master GPFS log file that is merged and chronologically sorted for the date of the failure (see Creating a master GPFS log file on page 2).
c. If the cluster was configured to store dumps, collect any internal GPFS dumps written to that directory relating to the time of the failure. The default directory is /tmp/mmfs.
d. On a failing Linux node, gather the installed software packages and the versions of each package by issuing this command:
rpm -qa
e. On a failing AIX node, gather the name, most recent level, state, and description of all installed software packages by issuing this command:
lslpp -l
f. File system attributes for all of the failing file systems, issue:
mmlsfs Device
g. The current configuration and state of the disks for all of the failing file systems, issue:
mmlsdisk Device
h. A copy of file /var/mmfs/gen/mmsdrfs from the primary cluster configuration server.
4. If you are experiencing one of the following problems, see the appropriate section before contacting the IBM Support Center:
v For delay and deadlock issues, see Additional information to collect for delays and deadlocks.
v For file system corruption or MMFS_FSSTRUCT errors, see Additional information to collect for file system corruption or MMFS_FSSTRUCT errors.
v For GPFS daemon crashes, see Additional information to collect for GPFS daemon crashes.
Additional information to collect for delays and deadlocks
When a delay or deadlock situation is suspected, the IBM Support Center will need additional information to assist with problem diagnosis. If you have not done so already, ensure you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in Information to collect for all problems related to GPFS on page 113.
2. If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000), issue the following command:
gpfs.snap --deadlock
If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the following command:
gpfs.snap --deadlock --quick
For more information about the --deadlock and --quick options, see The gpfs.snap command on page 6.

Additional information to collect for file system corruption or MMFS_FSSTRUCT errors
When file system corruption or MMFS_FSSTRUCT errors are encountered, the IBM Support Center will need additional information to assist with problem diagnosis. If you have not done so already, ensure you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in Information to collect for all problems related to GPFS on page 113.
2. Unmount the file system everywhere, then run mmfsck -n in offline mode and redirect it to an output file.
The IBM Support Center will determine when and if you should run the mmfsck -y command.

Additional information to collect for GPFS daemon crashes
When the GPFS daemon is repeatedly crashing, the IBM Support Center will need additional information to assist with problem diagnosis. If you have not done so already, ensure you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in Information to collect for all problems related to GPFS on page 113.
2. Ensure the /tmp/mmfs directory exists on all nodes. If this directory does not exist, the GPFS daemon will not generate internal dumps.
3. Set the traces on this cluster and all clusters that mount any file system from this cluster:
mmtracectl --set --trace=def --trace-recycle=global
5. Recreate the problem if possible or wait for the assert to be triggered again.
6. Once the assert is encountered on the node, turn off the trace facility by issuing:
mmtracectl --off
If traces were started on multiple clusters, mmtracectl --off should be issued immediately on all clusters.
7. Collect gpfs.snap output:
gpfs.snap
If more than one substring within a message matches this pattern (for example, [A] or [A:nnn]), the severity tag is the first such matching string. When the severity tag includes a numeric code (nnn), this is an error code associated with the message. If this were the only problem encountered by the command, the command return code would be nnn. If a message does not have a severity tag, the message does not conform to this specification. You can determine the message severity by examining the text or any supplemental information provided in the message catalog, or by contacting the IBM Support Center. Table 4 lists the severity tags:
Table 4. Message severity tags

I (Informational)
Indicates normal operation. This message by itself indicates that nothing is wrong.

W (Warning)
Indicates a problem, but command execution continues. The problem can be a transient inconsistency. It can be that the command has skipped some operations on some objects, or is reporting an irregularity that could be of interest. For example, if a multipass command operating on many files discovers during its second pass that a file that was present during the first pass is no longer present, the file might have been removed by another command or program.

E (Error)
Indicates an error. Command execution might or might not continue, but this error was likely caused by a persistent condition and will remain until corrected by some other program or administrative action. For example, a command operating on a single file or other GPFS object might terminate upon encountering any condition of severity E. As another example, a command operating on a list of files, finding that one of the files has permission bits set that disallow the operation, might continue to operate on all other files within the specified list of files.

Severe error
The error is probably related to the state of the file system. Command execution is usually halted due to this kind of error. For example, the file system is not mounted, so the command cannot execute.

Exception
The system has discovered an internal inconsistency of some kind. Command execution might be halted or the system might attempt to continue in spite of the inconsistency. Report these kinds of errors to IBM.
Explanation: Replication cannot protect data against disk failures when there are insufficient failure groups. User response: Add more disks in new failure groups to the file system or accept the risk of data loss. 6027-300 mmfsd ready
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully User response: Make sure /var/mmfs/etc/gpfsready completes and returns a zero exit status, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no. 6027-306 Could not initialize internode communication.
Explanation: The mmfsd server is up and running. User response: None. Informational message only. 6027-301 File fileName could not be run with err errno.
Explanation: The named shell script could not be executed. This message is followed by the error string that is returned by the exec. User response: Check file existence and access permissions. 6027-302 Could not execute script
Explanation: The GPFS daemon was unable to initialize the communications required to proceed. User response: User action depends on the return code shown in the accompanying message (/usr/include/errno.h). The communications failure that caused the failure must be corrected. One possibility is an rc value of 67, indicating that the required port is unavailable. This may mean that a previous version of the mmfs daemon is still running. Killing that daemon may resolve the problem. 6027-310 command initializing. {Version versionName: Built date time}
Explanation: The verifyGpfsReady=yes configuration attribute is set, but the /var/mmfs/etc/gpfsready script could not be executed. User response: Make sure /var/mmfs/etc/gpfsready exists and is executable, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no. 6027-303 script killed by signal signal
Explanation: The mmfsd server has started execution. User response: None. Informational message only. 6027-311 programName is shutting down.
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully. User response: Make sure /var/mmfs/etc/gpfsready completes and returns a zero exit status, or disable the verifyGpfsReady option via mmchconfig verifyGpfsReady=no. 6027-304 script ended abnormally
Explanation: The stated program is about to terminate. User response: None. Informational message only. 6027-312 Unknown trace class traceClass.
Explanation: The trace class is not recognized. User response: Specify a valid trace class. 6027-313 Cannot open configuration file fileName.
Explanation: The verifyGpfsReady=yes configuration attribute is set and /var/mmfs/etc/gpfsready script did not complete successfully.
Copyright IBM Corp. 1998, 2013
119
6027-314 6027-329
User response: The configuration file is /var/mmfs/gen/mmfs.cfg. Verify that this file and /var/mmfs/gen/mmsdrfs exist in your system. 6027-314 command requires SuperuserName authority to execute. 6027-320 Could not map shared segment.
Explanation: The shared segment could not be attached. User response: This is an error from the AIX operating system. Check the accompanying error indications from AIX. 6027-321 Shared segment mapped at wrong address (is value, should be value).
Explanation: The mmfsd server was started by a user without superuser authority. User response: Log on as a superuser and reissue the command. 6027-315 Bad config file entry in fileName, line number.
Explanation: The shared segment did not get mapped to the expected address. User response: Contact the IBM Support Center. 6027-322 Could not map shared segment in kernel extension.
Explanation: The configuration file has an incorrect entry. User response: Fix the syntax error in the configuration file. Verify that you are not using a configuration file that was created on a release of GPFS subsequent to the one that you are currently running. 6027-316 Unknown config parameter parameter in fileName, line number.
Explanation: The shared segment could not be mapped in the kernel. User response: If an EINVAL error message is displayed, the kernel extension could not use the shared segment because it did not have the correct GPFS version number. Unload the kernel extension and restart the GPFS daemon. 6027-323 Error unmapping shared segment
Explanation: There is an unknown parameter in the configuration file. User response: Fix the syntax error in the configuration file. Verify that you are not using a configuration file that was created on a release of GPFS subsequent to the one you are currently running. 6027-317 Old server with PID pid still running.
Explanation: The shared segment could not be detached. User response: Check the reason given by the error message. 6027-324 Could not create message queue for main process
Explanation: An old copy of mmfsd is still running. User response: This message would occur only if the user bypasses the SRC. The normal message in this case would be an SRC message stating that multiple instances are not allowed. If it occurs, stop the previous instance and use the SRC commands to restart the daemon. 6027-318 Watchdog: Some process appears stuck; stopped the daemon process.
Explanation: The message queue for the main process could not be created. This is probably an operating system error. User response: Contact the IBM Support Center. 6027-328 Value value for parameter is out of range in fileName. Valid values are value through value. value used.
Explanation: A high priority process got into a loop. User response: Stop the old instance of the mmfs server, then restart it. 6027-319 Could not create shared segment.
Explanation: An error was found in the /var/mmfs/etc/mmfs.cfg file. User response: Check the mmfs.cfg file. 6027-329 Cannot pin the main shared segment: name
Explanation: The shared segment could not be created. User response: This is an error from the AIX operating system. Check the accompanying error indications from AIX.
Explanation: An attempt to pin the shared segment during initialization failed. User response: Check the mmfs.cfg file. The pagepool size may be too large. It cannot be more than 80% of real memory. If a previous mmfsd crashed, check for processes that begin with the name mmfs that may be
holding on to an old pinned shared segment. Issue the mmchconfig command to change the pagepool size. 6027-334 Error initializing internal communications. 6027-340 Child process file failed to start due to error rc: errStr.
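As a hedged illustration of the pagepool adjustment described for message 6027-329 (the value is an example only; size the pagepool for your system and keep it under 80% of real memory, and note that this sketch assumes a daemon restart is acceptable):
  mmchconfig pagepool=1G
  mmshutdown
  mmstartup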
Explanation: A failure occurred when GPFS attempted to start a program. User response: If the program was a user exit script, verify the script file exists and has appropriate permissions assigned. If the program was not a user exit script, then this is an internal GPFS error or the GPFS installation was altered. 6027-341 Node nodeName is incompatible because its maximum compatible version (number) is less than the version of this node (number). [value/value]
Explanation: The mailbox system used by the daemon for communication with the kernel cannot be initialized. User response: Increase the size of available memory using the mmchconfig command. 6027-335 Configuration error: check fileName.
Explanation: A configuration error is found. User response: Check the mmfs.cfg file and other error messages. 6027-336 Value value for configuration parameter parameter is not valid. Check fileName.
Explanation: A configuration error is found. User response: Check the mmfs.cfg file. 6027-337 Waiting for resources to be reclaimed before exiting.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its maximum compatible version is less than the version of the daemon running on this node. The numbers in square brackets are for IBM Service use. User response: Verify your GPFS daemon version. 6027-342 Node nodeName is incompatible because its minimum compatible version is greater than the version of this node (number). [value/value]
Explanation: The mmfsd daemon is attempting to terminate, but cannot because data structures in the daemon shared segment may still be referenced by kernel code. This message may be accompanied by other messages that show which disks still have I/O in progress. User response: None. Informational message only. 6027-338 Waiting for number user(s) of shared segment to release it.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its minimum compatible version is greater than the version of the daemon running on this node. The numbers in square brackets are for IBM Service use. User response: Verify your GPFS daemon version. 6027-343 Node nodeName is incompatible because its version (number) is less than the minimum compatible version of this node (number). [value/value]
Explanation: The mmfsd daemon is attempting to terminate, but cannot because some process is holding the shared segment while in a system call. The message will repeat every 30 seconds until the count drops to zero. User response: Find the process that is not responding, and find a way to get it out of its system call. 6027-339 Nonnumeric trace value value after class classname.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is less than the minimum compatible version of the daemon running on this node. The numbers in square brackets are for IBM Service use. User response: Verify your GPFS daemon version. 6027-344 Node nodeName is incompatible because its version is greater than the maximum compatible version of this node (number). [value/value]
Explanation: The specified trace value is not recognized. User response: Specify a valid trace integer value.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other daemon is not compatible. Its version is greater than the maximum compatible version of the daemon
running on this node. The numbers in square brackets are for IBM Service use. User response: Verify your GPFS daemon version. 6027-345 Network error on ipAddress, check connectivity. 6027-350 Bad subnets configuration: primary IP address ipAddress is on a private subnet. Use a public IP address instead.
Explanation: A TCP error has caused GPFS to exit due to a bad return code from an error. Exiting allows recovery to proceed on another node and resources are not tied up on this node. User response: Follow network problem determination procedures. 6027-346 Incompatible daemon version. My version =number, repl.my_version = number.
Explanation: GPFS is configured to allow multiple IP addresses per node (subnets configuration parameter), but the primary IP address of the node (the one specified when the cluster was created or when the node was added to the cluster) was found to be on a private subnet. If multiple IP addresses are used, the primary address must be a public IP address. User response: Remove the node from the cluster; then add it back using a public IP address. 6027-358 Communication with mmspsecserver through socket name failed, err value: errorString, msgType messageType.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon is not the same version and it sent a reply indicating its version number is incompatible. User response: Verify your GPFS daemon version.
Explanation: Communication failed between spsecClient (the daemon) and spsecServer. User response: Verify both the communication socket and the mmspsecserver process. 6027-359 The mmspsecserver process is shutting down. Reason: explanation.
6027-347
Remote host ipAddress refused connection because IP address ipAddress was not in the node list file.
Explanation: The GPFS daemon tried to make a connection with another GPFS daemon. However, the other GPFS daemon sent a reply indicating it did not recognize the IP address of the connector. User response: Add the IP address of the local host to the node list file on the remote host. 6027-348 Bad "subnets" configuration: invalid subnet ipAddress.
Explanation: The mmspsecserver process received a signal from the mmfsd daemon or encountered an error on execution. User response: Verify the reason for shutdown. 6027-360 Disk name must be removed from the /etc/filesystems stanza before it can be deleted. Another disk in the file system can be added in its place if needed.
Explanation: A disk being deleted is found listed in the disks= list for a file system. User response: Remove the disk from the list. 6027-361 Local access to disk failed with EIO, switching to access the disk remotely.
Explanation: A subnet specified by the subnets configuration parameter could not be parsed. User response: Run the mmlsconfig command and check the value of the subnets parameter. Each subnet must be specified as a dotted-decimal IP address. Run the mmchconfig subnets command to correct the value. 6027-349 Bad subnets configuration: invalid cluster name pattern clusterNamePattern.
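As an illustrative sketch of the correction described for message 6027-348 (the subnet value shown is an example only):
  mmlsconfig
  mmchconfig subnets="192.168.2.0"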
Explanation: Local access to the disk failed. To avoid unmounting of the file system, the disk will now be accessed remotely. User response: Wait until work continuing on the local node completes. Then determine why local access to the disk failed, correct the problem and restart the daemon. This will cause GPFS to begin accessing the disk locally again. 6027-362 Attention: No disks were deleted, but some data was migrated. The file system may no longer be properly balanced.
Explanation: A cluster name pattern specified by the subnets configuration parameter could not be parsed. User response: Run the mmlsconfig command and check the value of the subnets parameter. The optional cluster name pattern following the subnet address must be a shell-style pattern allowing '*', '/' and '[...]' as wildcards. Run the mmchconfig subnets command to correct the value.
Explanation: The mmdeldisk command did not complete migrating data off the disks being deleted.
The disks were restored to normal 'ready' status, but the migration has left the file system unbalanced. This may be caused by having too many disks unavailable or insufficient space to migrate all of the data to other disks. User response: Check disk availability and space requirements. Determine the reason that caused the command to end before successfully completing the migration and disk deletion. Reissue the mmdeldisk command. 6027-363 I/O error writing disk descriptor for disk name. 6027-371 Cannot delete all disks in the file system
Explanation: An attempt was made to delete all the disks in a file system. User response: Either reduce the number of disks to be deleted or use the mmdelfs command to delete the file system. 6027-372 Replacement disk must be in the same failure group as the disk being replaced.
Explanation: An improper failure group was specified for mmrpldisk. User response: Specify a failure group in the disk descriptor for the replacement disk that is the same as the failure group of the disk being replaced. 6027-373 Disk diskName is being replaced, so status of disk diskName must be replacement.
Explanation: An I/O error occurred when the mmadddisk command was writing a disk descriptor on a disk. This could have been caused by either a configuration error or an error in the path to the disk. User response: Determine the reason the disk is inaccessible for writing and reissue the mmadddisk command. 6027-364 Error processing disks.
Explanation: An error occurred when the mmadddisk command was reading disks in the file system. User response: Determine the reason why the disks are inaccessible for reading, then reissue the mmadddisk command. 6027-365 Rediscovered local access to disk.
Explanation: The mmrpldisk command failed when retrying a replace operation because the new disk does not have the correct status. User response: Issue the mmlsdisk command to display disk status. Then either issue the mmchdisk command to change the status of the disk to replacement or specify a new disk that has a status of replacement. 6027-374 Disk name may not be replaced.
Explanation: Local access to the disk, which had failed earlier with EIO, has been rediscovered. For better performance, the disk will now be accessed locally. User response: Wait until work continuing on the local node completes. This will cause GPFS to begin accessing the disk locally again. 6027-369 I/O error writing file system descriptor for disk name.
Explanation: A disk being replaced with mmrpldisk does not have a status of ready or suspended. User response: Use the mmlsdisk command to display disk status. Issue the mmchdisk command to change the status of the disk to be replaced to either ready or suspended. 6027-375 Disk name diskName already in file system.
Explanation: mmadddisk detected an I/O error while writing a file system descriptor on a disk. User response: Determine the reason the disk is inaccessible for writing and reissue the mmadddisk command. 6027-370 mmdeldisk completed.
Explanation: The replacement disk name specified in the mmrpldisk command already exists in the file system. User response: Specify a different disk as the replacement disk. 6027-376 Previous replace command must be completed before starting a new one.
Explanation: The mmdeldisk command has completed. User response: None. Informational message only.
Explanation: The mmrpldisk command failed because the status of other disks shows that a replace command did not complete. User response: Issue the mmlsdisk command to display disk status. Retry the failed mmrpldisk
command or issue the mmchdisk command to change the status of the disks that have a status of replacing or replacement. 6027-377 Cannot replace a disk that is in use. 6027-382 Value value for the 'sector size' option for disk disk is not a multiple of value.
Explanation: When parsing disk lists, the sector size given is not a multiple of the default sector size. User response: Specify a correct sector size. 6027-383 Disk name name appears more than once.
Explanation: An attempt was made to replace a disk in place, but the disk specified in the mmrpldisk command is still available for use. User response: Use the mmchdisk command to stop GPFS's use of the disk. 6027-378 I/O still in progress near sector number on disk name.
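For message 6027-377, one hedged way to stop GPFS's use of the disk before replacing it is to suspend it (file system and disk names are illustrative):
  mmchdisk fs1 suspend -d gpfs3nsd
  mmlsdisk fs1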
Explanation: When parsing disk lists, a duplicate name is found. User response: Remove the duplicate name. 6027-384 Disk name name already in file system.
Explanation: The mmfsd daemon is attempting to terminate, but cannot because data structures in the daemon shared segment may still be referenced by kernel code. In particular, the daemon has started an I/O that has not yet completed. It is unsafe for the daemon to terminate until the I/O completes, because of asynchronous activity in the device driver that will access data structures belonging to the daemon. User response: Either wait for the I/O operation to time out, or issue a device-dependent command to terminate the I/O. 6027-379 Could not invalidate disk(s).
Explanation: When parsing disk lists, a disk name already exists in the file system. User response: Rename or remove the duplicate disk. 6027-385 Value value for the 'sector size' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the sector size given is not valid. User response: Specify a correct sector size. 6027-386 Value value for the 'sector size' option for disk name is invalid.
Explanation: While deleting a disk, it could not be written to in order to invalidate its contents. User response: No action is needed if you are removing that disk permanently. However, if the disk is ever to be used again, the -v flag must be specified with a value of no when using either the mmcrfs or mmadddisk command. 6027-380 Disk name missing from disk descriptor list entry name.
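A minimal sketch of the bypass described for message 6027-379, assuming the disk is to be reused and that no other file system still uses it (file system and descriptor file names are illustrative):
  mmadddisk fs1 -F /tmp/newdisk.desc -v no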
Explanation: When parsing disk lists, the sector size given is not valid. User response: Specify a correct sector size. 6027-387 Value value for the 'failure group' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, no disks were named. User response: Check the argument list of the command. 6027-381 Too many items in disk descriptor list entry name.
Explanation: When parsing disk lists, the failure group given is not valid. User response: Specify a correct failure group. 6027-388 Value value for the 'failure group' option for disk name is invalid.
Explanation: When parsing a disk descriptor, too many fields were specified for one disk. User response: Correct the disk descriptor to use the correct disk descriptor syntax.
Explanation: When parsing disk lists, the failure group given is not valid. User response: Specify a correct failure group. 6027-389 Value value for the 'has metadata' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the 'has metadata' value given is not valid.
User response: Specify a correct 'has metadata' value. 6027-390 Value value for the 'has metadata' option for disk name is invalid. 3. Disks are not correctly defined on all active nodes. 4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly reconfigured after creating a file system. User response: Verify: 1. The disks are correctly defined on all nodes. 2. The paths to the disks are correctly defined and operational. 6027-417 Bad file system descriptor.
Explanation: When parsing disk lists, the 'has metadata' value given is not valid. User response: Specify a correct 'has metadata' value. 6027-391 Value value for the 'has data' option for disk name is out of range. Valid values are number through number.
Explanation: When parsing disk lists, the 'has data' value given is not valid. User response: Specify a correct 'has data' value. 6027-392 Value value for the 'has data' option for disk name is invalid.
Explanation: A file system descriptor that is not valid was encountered. User response: Verify: 1. The disks are correctly defined on all nodes. 2. The paths to the disks are correctly defined and operational. 6027-418 Inconsistent file system quorum. readQuorum=value writeQuorum=value quorumSize=value.
Explanation: When parsing disk lists, the 'has data' value given is not valid. User response: Specify a correct 'has data' value. 6027-393 Either the 'has data' option or the 'has metadata' option must be '1' for disk diskName.
Explanation: A file system descriptor that is not valid was encountered. User response: Start any disks that have been stopped by the mmchdisk command or by hardware failures. If the problem persists, run offline mmfsck. 6027-419 Failed to read a file system descriptor.
Explanation: When parsing disk lists the 'has data' or 'has metadata' value given is not valid. User response: Specify a correct 'has data' or 'has metadata' value. 6027-394 Too many disks specified for file system. Maximum = number.
Explanation: Not enough valid replicas of the file system descriptor could be read from the file system. User response: Start any disks that have been stopped by the mmchdisk command or by hardware failures. Verify that paths to all disks are correctly defined and operational. 6027-420 Inode size must be greater than zero.
Explanation: Too many disk names were passed in the disk descriptor list. User response: Check the disk descriptor list or the file containing the list. 6027-399 Not enough items in disk descriptor list entry, need fields.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-421 Inode size must be a multiple of logical sector size.
Explanation: When parsing a disk descriptor, not enough fields were specified for one disk. User response: Correct the disk descriptor to use the correct disk descriptor syntax. 6027-416 Incompatible file system descriptor version or not formatted.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center.
Explanation: Possible reasons for the error are: 1. A file system descriptor version that is not valid was encountered. 2. No file system descriptor can be found.
6027-422 Inode size must be at least as large as the logical sector size. 6027-428 Indirect block size must be a multiple of the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-423 Minimum fragment size must be a multiple of logical sector size.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-429 Indirect block size must be less than full data block size.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-424 Minimum fragment size must be greater than zero.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-430 Default metadata replicas must be less than or equal to default maximum number of metadata replicas.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-425 File system block size of blockSize is larger than maxblocksize parameter.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-431 Default data replicas must be less than or equal to default maximum number of data replicas.
Explanation: An attempt is being made to mount a file system whose block size is larger than the maxblocksize parameter as set by mmchconfig. User response: Use the mmchconfig maxblocksize=xxx command to increase the maximum allowable block size. 6027-426 Warning: mount detected unavailable disks. Use mmlsdisk fileSystem to see details.
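An illustrative form of the correction for message 6027-425 (the value is an example and must be at least as large as the file system block size; depending on the GPFS level, the change may require the GPFS daemon to be down on the affected nodes):
  mmchconfig maxblocksize=4M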
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-432 Default maximum metadata replicas must be less than or equal to value.
Explanation: The mount command detected that some disks needed for the file system are unavailable. User response: Without file system replication enabled, the mount will fail. If it has replication, the mount may succeed depending on which disks are unavailable. Use mmlsdisk to see details of the disk status. 6027-427 Indirect block size must be at least as large as the minimum fragment size.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-433 Default maximum data replicas must be less than or equal to value.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-434 Indirect blocks must be at least as big as inodes.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center.
6027-435 The file system descriptor quorum has been overridden. system database and local mmsdrfs file for this file system. 6027-452 No disks found in disks= list.
Explanation: The mmfsctl exclude command was previously issued to override the file system descriptor quorum after a disaster. User response: None. Informational message only. 6027-438 Duplicate disk name name.
Explanation: No disks listed when opening a file system. User response: Check the operating system's file system database and local mmsdrfs file for this file system. 6027-453 No disk name found in a clause of the list.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-439 Disk name sector size value does not match sector size value of other disk(s).
Explanation: No disk name found in a clause of the disks= list. User response: Check the operating system's file system database and local mmsdrfs file for this file system. 6027-461 Unable to find name device.
Explanation: An internal consistency check has found a problem with file system parameters. User response: Record the above information. Contact the IBM Support Center. 6027-441 Unable to open disk 'name' on node nodeName.
Explanation: Self explanatory. User response: There must be a /dev/sgname special device defined. Check the error code. This could indicate a configuration error in the specification of disks, logical volumes, network shared disks, or virtual shared disks. 6027-462 name must be a char or block special device.
Explanation: A disk name that is not valid was specified in a GPFS disk command. User response: Correct the parameters of the executing GPFS disk command. 6027-445 Value for option '-m' cannot exceed the number of metadata failure groups.
Explanation: Opening a file system. User response: There must be a /dev/sgname special device defined. This could indicate a configuration error in the specification of disks, logical volumes, network shared disks, or virtual shared disks. 6027-463 SubblocksPerFullBlock was not 32.
Explanation: The current number of replicas of metadata cannot be larger than the number of failure groups that are enabled to hold metadata. User response: Use a smaller value for -m on the mmchfs command, or increase the number of failure groups by adding disks to the file system. 6027-446 Value for option '-r' cannot exceed the number of data failure groups.
Explanation: The value of the SubblocksPerFullBlock variable was not 32. This situation should never exist, and indicates an internal error. User response: Record the above information and contact the IBM Support Center. 6027-465 The average file size must be at least as large as the minimum fragment size.
Explanation: The current number of replicas of data cannot be larger than the number of failure groups that are enabled to hold data. User response: Use a smaller value for -r on the mmchfs command, or increase the number of failure groups by adding disks to the file system. 6027-451 No disks= list found in mount options.
Explanation: When parsing the command line of tscrfs, it was discovered that the average file size is smaller than the minimum fragment size. User response: Correct the indicated command parameters.
Explanation: No 'disks=' clause found in the mount options list when opening a file system. User response: Check the operating system's file
6027-468 Disk name listed in fileName or local mmsdrfs file, not found in device name. Run: mmcommon recoverfs name. 6027-472 File system format version versionString is not supported.
Explanation: Tried to access a file system but the disks listed in the operating system's file system database or the local mmsdrfs file for the device do not exist in the file system. User response: Check the configuration and availability of disks. Run the mmcommon recoverfs device command. If this does not resolve the problem, configuration data in the SDR may be incorrect. If no user modifications have been made to the SDR, contact the IBM Support Center. If user modifications have been made, correct these modifications. 6027-469 File system name does not match descriptor.
Explanation: The current file system format version is not supported. User response: Verify: 1. The disks are correctly defined on all nodes. 2. The paths to the disks are correctly defined and operative. 6027-473 File System fileSystem unmounted by the system with return code value and reason code value.
Explanation: Console log entry caused by a forced unmount due to disk or communication failure. User response: Correct the underlying problem and remount the file system. 6027-474 Recovery Log I/O Failed, unmounting fileSystem.
Explanation: The file system name found in the descriptor on disk does not match the corresponding device name in /etc/filesystems. User response: Check the operating system's file system database. 6027-470 Disk name may still belong to file system filesystem. Created on IPandTime.
Explanation: I/O to the recovery log failed. User response: Check the paths to all disks making up the file system. Run the mmlsdisk command to determine if GPFS has declared any disks unavailable. Repair any paths to disks that have failed. Remount the file system. 6027-475 The option '--inode-limit' is not enabled. Use option '-V' to enable most recent features.
Explanation: The disk being added by the mmcrfs, mmadddisk, or mmrpldisk command appears to still belong to some file system. User response: Verify that the disks you are adding do not belong to an active file system, and use the -v no option to bypass this check. Use this option only if you are sure that no other file system has this disk configured because you may cause data corruption in both file systems if this is not the case. 6027-471 Disk diskName: Incompatible file system descriptor version or not formatted.
Explanation: mmchfs --inode-limit is not enabled under the current file system format version. User response: Run mmchfs -V; this will change the file system format to the latest supported format. 6027-476 Restricted mount using only available file system descriptor.
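A hedged example of the upgrade described for message 6027-475 (the device name is illustrative):
  mmchfs fs1 -V full
Use -V compat instead to enable only those features that remain compatible with earlier GPFS releases, as described for messages 6027-495 through 6027-497.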
Explanation: Possible reasons for the error are: 1. A file system descriptor version that is not valid was encountered. 2. No file system descriptor can be found. 3. Disks are not correctly defined on all active nodes. 4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly reconfigured after creating a file system. User response: Verify: 1. The disks are correctly defined on all nodes. 2. The paths to the disks are correctly defined and operative.
Explanation: Fewer than the necessary number of file system descriptors were successfully read. Using the best available descriptor to allow the restricted mount to continue. User response: Informational message only. 6027-477 The option -z is not enabled. Use the -V option to enable most recent features.
Explanation: The file system format version does not support the -z option on the mmchfs command. User response: Change the file system format version by issuing mmchfs -V.
6027-478 The option -z could not be changed. fileSystem is still in use. 6027-484 Remount failed for device after daemon restart.
Explanation: The file system is still mounted or another GPFS administration command (mm...) is running against the file system. User response: Unmount the file system if it is mounted, and wait for any command that is running to complete before reissuing the mmchfs -z command. 6027-479 Mount of fsName was blocked by fileName.
Explanation: A remount failed after daemon restart. This ordinarily occurs because one or more disks are unavailable. Other possibilities include loss of connectivity to one or more disks. User response: Issue the mmlsdisk command and check for down disks. Issue the mmchdisk command to start any down disks, then remount the file system. If there is another problem with the disks or the connections to the disks, take necessary corrective actions and remount the file system. 6027-485 Perform mmchdisk for any disk failures and re-mount.
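A typical recovery sequence for message 6027-484, shown as an illustration only (the device name is an example):
  mmlsdisk fs1 -e
  mmchdisk fs1 start -a
  mmmount fs1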
Explanation: The internal or external mount of the file system was blocked by the existence of the specified file. User response: If the file system needs to be mounted, remove the specified file. 6027-480 Cannot enable DMAPI in a file system with existing snapshots.
Explanation: Occurs in conjunction with 6027-484. User response: Follow the User Response for 6027-484. 6027-486 No local device specified for fileSystemName in clusterName.
Explanation: The user is not allowed to enable DMAPI for a file system with existing snapshots. User response: Delete all existing snapshots in the file system and repeat the mmchfs command. 6027-481 Remount failed for mountid id: errnoDescription.
Explanation: While attempting to mount a remote file system from another cluster, GPFS was unable to determine the local device name for this file system. User response: There must be a /dev/sgname special device defined. Check the error code. This is probably a configuration error in the specification of a remote file system. Run mmremotefs show to check that the remote file system is properly configured. 6027-487 Failed to write the file system descriptor to disk diskName.
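To review the remote file system definition mentioned for message 6027-486, for example:
  mmremotefs show all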
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted. User response: Check the errors displayed and the errno description. 6027-482 Remount failed for device id: errnoDescription.
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted. User response: Check the errors displayed and the errno description. 6027-483 Remounted name.
Explanation: An error occurred when mmfsctl include was writing a copy of the file system descriptor to one of the disks specified on the command line. This could have been caused by a failure of the corresponding disk device, or an error in the path to the disk. User response: Verify that the disks are correctly defined on all nodes. Verify that paths to all disks are correctly defined and operational. 6027-488 Error opening the exclusion disk file fileName.
Explanation: mmfsd restarted and remounted the specified file system because it was in the kernel's list of previously mounted file systems. User response: Informational message only.
Explanation: Unable to retrieve the list of excluded disks from an internal configuration file. User response: Ensure that GPFS executable files have been properly installed on all nodes. Perform required configuration steps prior to starting GPFS.
6027-489
Attention: The desired replication factor exceeds the number of available dataOrMetadata failure groups. This is allowed, but the files will not be replicated and will therefore be at risk. 6027-495 You have requested that the file system be upgraded to version number. This will enable new functionality but will prevent you from using the file system with earlier releases of GPFS. Do you want to continue?
Explanation: You specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor, or increase the number of failure groups. 6027-490 The descriptor replica on disk name has been excluded.
Explanation: Verification request in response to the mmchfs -V full command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS. User response: Enter yes if you want the conversion to take place. 6027-496 You have requested that the file system version for local access be upgraded to version number. This will enable some new functionality but will prevent local nodes from using the file system with earlier releases of GPFS. Remote nodes are not affected by this change. Do you want to continue?
Explanation: The file system descriptor quorum has been overridden and, as a result, the specified disk was excluded from all operations on the file system descriptor quorum. User response: None. Informational message only. 6027-492 The file system is already at file system version number
Explanation: The user tried to upgrade the file system format using mmchfs -V --version=v, but the specified version is smaller than the current version of the file system. User response: Specify a different value for the --version option. 6027-493 File system version number is not supported on nodeName nodes in the cluster.
Explanation: Verification request in response to the mmchfs -V command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS. User response: Enter yes if you want the conversion to take place. 6027-497 The file system has already been upgraded to number using -V full. It is not possible to revert back.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but some nodes in the local cluster are still running an older GPFS release that does not support the new format version. User response: Install a newer version of GPFS on those nodes. 6027-494 File system version number is not supported on nodeName remote nodes mounting the file system.
Explanation: The user tried to upgrade the file system format using mmchfs -V compat, but the file system has already been fully upgraded. User response: Informational message only. 6027-498 Incompatible file system format. Only file systems formatted with GPFS 3.2.1.5 or later can be mounted on this platform.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but the file system is still mounted on some nodes in remote clusters that do not support the new format version. User response: Unmount the file system on the nodes that do not support the new format version.
Explanation: A user running GPFS on Microsoft Windows tried to mount a file system that was formatted with a version of GPFS that did not have Windows support. User response: Create a new file system using current GPFS code. 6027-499 An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. File system fileSystem will be internally unmounted.
Explanation: A new device mapper path is detected or a previously failed path is activated after the local
device discovery has finished. This path lacks a Persistent Reserve and cannot be used. All device paths must be active at mount time. User response: Check the paths to all disks making up the file system. Repair any paths to disks that have failed. Remount the file system. 6027-500 name loaded and configured. 6027-512 name not listed in /etc/vfs Explanation: An error occurred while installing the GPFS kernel extension, or when trying to mount a file system. User response: Check for the mmfs entry in /etc/vfs. 6027-514 Cannot mount fileSystem on mountPoint: Already mounted. 6027-511 Cannot unmount fileSystem: errorDescription
Explanation: There was an error unmounting the GPFS file system. User response: Take the action indicated by errno description.
Explanation: The kernel extension was loaded and configured. User response: None. Informational message only. 6027-501 name:module moduleName unloaded.
Explanation: The kernel extension was unloaded. User response: None. Informational message only. 6027-502 Incorrect parameter: name.
Explanation: An attempt has been made to mount a file system that is already mounted. User response: None. Informational message only. 6027-515 Cannot mount fileSystem on mountPoint
Explanation: mmfsmnthelp was called with an incorrect parameter. User response: Contact the IBM Support Center. 6027-504 Not enough memory to allocate internal data structure.
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem. User response: Take the action indicated by other error messages and error log entries. 6027-516 Cannot mount fileSystem
Explanation: Self explanatory. User response: Increase the ulimit value or paging space. 6027-505 Internal error, aborting.
Explanation: Self explanatory. User response: Contact the IBM Support Center. 6027-506 program: loadFile is already loaded at address.
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem. User response: Take the action indicated by other error messages and error log entries. 6027-517 Cannot mount fileSystem: errorString
Explanation: The program was already loaded at the address displayed. User response: None. Informational message only. 6027-507 program: loadFile is not loaded.
Explanation: There was an error mounting the named GPFS file system. Errors in the disk path usually cause this problem. User response: Take the action indicated by other error messages and error log entries. 6027-518 Cannot mount fileSystem: Already mounted.
Explanation: The program could not be loaded. User response: None. Informational message only. 6027-510 Cannot mount fileSystem on mountPoint: errorString
Explanation: An attempt has been made to mount a file system that is already mounted. User response: None. Informational message only.
Explanation: There was an error mounting the GPFS file system. User response: Determine action indicated by the error messages and error log entries. Errors in the disk path often cause this problem.
6027-519 Cannot mount fileSystem on mountPoint: File system table full. 6027-535 Disks up to size size can be added to storage pool pool.
Explanation: An attempt has been made to mount a file system when the file system table is full. User response: None. Informational message only. 6027-520 Cannot mount fileSystem: File system table full.
Explanation: Based on the parameters given to mmcrfs and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this storage pool by the mmadddisk command. User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and rerun mmcrfs with either larger disks or a larger value for the -n parameter. 6027-536 Insufficient system memory to run GPFS daemon. Reduce page pool memory size with the mmchconfig command or add additional RAM to system.
Explanation: An attempt has been made to mount a file system when the file system table is full. User response: None. Informational message only. 6027-530 Mount of name failed: cannot mount restorable file system for read/write.
Explanation: A file system marked as enabled for restore cannot be mounted read/write. User response: None. Informational message only. 6027-531 The following disks of name will be formatted on node nodeName: list.
Explanation: Insufficient memory for GPFS internal data structures with current system and GPFS configuration. User response: Reduce page pool usage with the mmchconfig command, or add additional RAM to system. 6027-537 Disks up to size size can be added to this file system.
Explanation: Output showing which disks will be formatted by the mmcrfs command. User response: None. Informational message only. 6027-532 The quota record number in file fileName is not valid.
Explanation: A quota entry contained a checksum that is not valid. User response: Remount the file system with quotas disabled. Restore the quota file from backup, and run mmcheckquota. 6027-533 Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
Explanation: Based on the parameters given to the mmcrfs command and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this file system by the mmadddisk command. User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and reissue the mmcrfs command with larger disks or a larger value for the -n parameter. 6027-538 Error accessing disks.
Explanation: The number of files created is approaching the file system limit. User response: Use the mmchfileset command to increase the maximum number of files to avoid reaching the inode limit and possible performance degradation. 6027-534 Cannot create a snapshot in a DMAPI-enabled file system, rc=returnCode.
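A sketch of the adjustment described for message 6027-533; the device name, fileset name, and inode value are illustrative, and the exact option syntax depends on your GPFS level:
  mmchfileset fs1 fset1 --inode-limit 1500000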
Explanation: The mmcrfs command encountered an error accessing one or more of the disks. User response: Verify that the disk descriptors are coded correctly and that all named disks exist and are online. 6027-539 Unable to clear descriptor areas for fileSystem.
Explanation: You cannot create a snapshot in a DMAPI-enabled file system. User response: Use the mmchfs command to disable DMAPI, and reissue the command.
Explanation: The mmdelfs command encountered an error while invalidating the file system control structures on one or more disks in the file system being deleted. User response: If the problem persists, specify the -p option on the mmdelfs command.
6027-540 Formatting file system. 6027-547 Fileset filesetName was unlinked.
Explanation: The mmcrfs command began to write file system data structures onto the new disks. User response: None. Informational message only.
Explanation: Fileset was already unlinked. User response: None. Informational message only. 6027-548 Fileset filesetName unlinked from filesetName.
6027-541
Explanation: The mmcrfs command encountered an error while formatting a new file system. This is often an I/O error. User response: Check the subsystems in the path to the disk. Follow the instructions from other messages that appear with this one. 6027-542 Fileset filesetName (id filesetId) has been incompletely deleted.
Explanation: A fileset being deleted contains junctions to other filesets. The cited filesets were unlinked. User response: None. Informational message only. 6027-549 Failed to open name.
Explanation: The mount command was unable to access a file system. Check the subsystems in the path to the disk. This is often an I/O error. User response: Follow the suggested actions for the other messages that occur with this one. 6027-550 Allocation manager for fileSystem failed to revoke ownership from node nodeName.
Explanation: A fileset delete operation was interrupted, leaving this fileset in an incomplete state. User response: Reissue the fileset delete command. 6027-543 Error writing file system descriptor for fileSystem.
Explanation: The mmcrfs command could not successfully write the file system descriptor in a particular file system. Check the subsystems in the path to the disk. This is often an I/O error. User response: Check system error log, rerun mmcrfs.
Explanation: An irrecoverable error occurred trying to revoke ownership of an allocation region. The allocation manager has panicked the file system to prevent corruption of on-disk data. User response: Remount the file system. 6027-551 fileSystem is still in use.
6027-544
Explanation: A disk could not be written to invalidate its contents. Check the subsystems in the path to the disk. This is often an I/O error. User response: Ensure the indicated logical volume is writable. 6027-545 Error processing fileset metadata file.
Explanation: The mmdelfs or mmcrfs command found that the named file system is still mounted or that another GPFS command is running against the file system. User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command. 6027-552 Scan completed successfully.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted. User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability states. If all disks are available and the problem persists, issue the mmfsck command to repair damaged metadata. 6027-546 Error processing allocation map for storage pool poolName.
Explanation: The scan function has completed without error. User response: None. Informational message only. 6027-553 Scan failed on number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks. User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
Explanation: There is no I/O path to critical metadata, or metadata has been corrupted. User response: Verify that the I/O paths to all disks are valid, and that all disks are either in the 'recovering' or 'up' availability state. Issue the mmlsdisk command.
6027-554 Scan failed on number out of number user or system files. 6027-560 File system is already suspended.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks. User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system. 6027-555 The desired replication factor exceeds the number of available failure groups.
Explanation: The tsfsctl command was asked to suspend a suspended file system. User response: None. Informational message only. 6027-561 Error migrating log.
Explanation: There are insufficient available disks to continue operation. User response: Restore the unavailable disks and reissue the command. 6027-562 Error processing inodes.
Explanation: You have specified a number of replicas that exceeds the number of failure groups available. User response: Reissue the command with a smaller replication factor or increase the number of failure groups. 6027-556 Not enough space for the desired number of replicas.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted. User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability state. Issue the mmlsdisk command. 6027-563 File system is already running.
Explanation: In attempting to restore the correct replication, GPFS ran out of space in the file system. The operation can continue but some data is not fully replicated. User response: Make additional space available and reissue the command. 6027-557 Not enough space or available disks to properly balance the file.
Explanation: The tsfsctl command was asked to resume a file system that is already running. User response: None. Informational message only. 6027-564 Error processing inode allocation map.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted. User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability state. Issue the mmlsdisk command. 6027-565 Scanning user file metadata ...
Explanation: In attempting to stripe data within the file system, data was placed on a disk other than the desired one. This is normally not a problem. User response: Run mmrestripefs to rebalance all files. 6027-558 Some data are unavailable.
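An illustrative rebalancing command for message 6027-557 (the device name is an example):
  mmrestripefs fs1 -b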
Explanation: Progress information. User response: None. Informational message only. 6027-566 Error processing user file metadata.
Explanation: An I/O error has occurred or some disks are in the stopped state. User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command. 6027-559 Some data could not be read or written.
Explanation: Error encountered while processing user file metadata. User response: None. Informational message only.
Explanation: An I/O error has occurred or some disks are in the stopped state. User response: Check the availability of all disks and the path to all disks, and reissue the command.
6027-567
6027-568 Waiting for number pending file system scans to finish ... 6027-575 Unable to complete low level format for fileSystem.
Explanation: Progress information. User response: None. Informational message only. 6027-569 Incompatible parameters. Unable to allocate space for file system metadata. Change one or more of the following as suggested and try again:
Explanation: The mmcrfs command was unable to create the low level file structures for the file system. User response: Check other error messages and the error log. This is usually an error accessing disks. 6027-576 Storage pools have not been enabled for file system fileSystem.
Explanation: Incompatible file system parameters were detected. User response: Refer to the details given and correct the file system parameters. 6027-570 Incompatible parameters. Unable to create file system. Change one or more of the following as suggested and try again:
Explanation: User invoked a command with a storage pool option (-p or -P) before storage pools were enabled. User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command. 6027-577 Attention: number user or system files are not properly replicated.
Explanation: Incompatible file system parameters were detected. User response: Refer to the details given and correct the file system parameters. 6027-571 Logical sector size value must be the same as disk sector size.
Explanation: GPFS has detected files that are not replicated correctly due to a previous failure. User response: Issue the mmrestripefs command at the first opportunity. 6027-578 Attention: number out of number user or system files are not properly replicated:
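For messages 6027-577 and 6027-578, replication can be repaired with, for example (the device name is illustrative):
  mmrestripefs fs1 -r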
Explanation: This message is produced by the mmcrfs command if the sector size given by the -l option is not the same as the sector size given for disks in the -d option. User response: Correct the options and reissue the command. 6027-572 Completed creation of file system fileSystem.
Explanation: GPFS has detected files that are not replicated correctly. 6027-579 Some unreplicated file system metadata has been lost. File system usable only in restricted mode.
Explanation: A disk was deleted that contained vital file system metadata that was not replicated. User response: Mount the file system in restricted mode (-o rs) and copy any user data that may be left on the file system. Then delete the file system. 6027-580 Unable to access vital system metadata, too many disks are unavailable.
Explanation: The mmcrfs command has successfully completed. User response: None. Informational message only. 6027-573 All data on following disks of fileSystem will be destroyed:
Explanation: Produced by the mmdelfs command to list the disks in the file system that is about to be destroyed. Data stored on the disks will be lost. User response: None. Informational message only. 6027-574 Completed deletion of file system fileSystem.
Explanation: Metadata is unavailable because the disks on which the data reside are stopped, or an attempt was made to delete them. User response: Either start the stopped disks, try to delete the disks again, or recreate the file system. 6027-581 Unable to access vital system metadata, file system corrupted.
Explanation: The mmdelfs command has successfully completed. User response: None. Informational message only.
Explanation: When trying to access the file system, the metadata was unavailable due to a disk being deleted.
User response: Determine why a disk is unavailable. 6027-582 Some data has been lost. command must be run with the file system unmounted. 6027-588 No more than number nodes can mount a file system.
Explanation: An I/O error has occurred or some disks are in the stopped state. User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command. 6027-584 Incompatible parameters. Unable to allocate space for root directory. Change one or more of the following as suggested and try again:
Explanation: The limit of the number of nodes that can mount a file system was exceeded. User response: Observe the stated limit for how many nodes can mount a file system. 6027-589 Scanning file system metadata, phase number ...
Explanation: Progress information. User response: None. Informational message only. 6027-590 GPFS is experiencing a shortage of pagepool. This message will not be repeated for at least one hour.
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. Suggested parameter changes are given. User response: Reissue the mmcrfs command with the suggested parameter changes. 6027-585 Incompatible parameters. Unable to allocate space for ACL data. Change one or more of the following as suggested and try again:
Explanation: Pool starvation has occurred; buffers have to be continually stolen at high aggressiveness levels. User response: Issue the mmchconfig command to increase the size of pagepool. 6027-591 Unable to allocate sufficient inodes for file system metadata. Increase the value for option and try again.
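A hedged example of the pagepool increase suggested for message 6027-590 (the value is illustrative and must fit within real memory):
  mmchconfig pagepool=2G
The new value takes effect after the GPFS daemon is restarted, or immediately on GPFS levels that accept mmchconfig -i for this attribute.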
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. The parameters entered require more space than is available. Suggested parameter changes are given. User response: Reissue the mmcrfs command with the suggested parameter changes. 6027-586 Quota server initialization failed.
Explanation: Too few inodes have been specified on the -N option of the mmcrfs command. User response: Increase the size of the -N option and reissue the mmcrfs command. 6027-592 Mount of fileSystem is waiting for the mount disposition to be set by some data management application.
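A hedged example of the user response to message 6027-591; the device name, disk descriptor file, and inode count are illustrative only:
  mmcrfs fs1 -F disks.desc -N 4M   # request a larger maximum number of inodes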
Explanation: Quota server initialization has failed. This message may appear as part of the detail data in the quota error log. User response: Check status and availability of the disks. If quota files have been corrupted, restore them from the last available backup. Finally, reissue the command. 6027-587 Unable to initialize quota client because there is no quota server. Please check error log on the file system manager node. The mmcheckquota command must be run with the file system unmounted before retrying the command.
Explanation: Data management utilizing DMAPI is enabled for the file system, but no data management application has set a disposition for the mount event. User response: Start the data management application and verify that the application sets the mount disposition. 6027-593 The root quota entry is not found in its assigned record.
Explanation: On mount, the root entry is not found in the first record of the quota file. User response: Issue the mmcheckquota command to verify that the use of root has not been lost.
Explanation: startQuotaClient failed. User response: If the quota file could not be read (check the error log on the file system manager; issue the mmlsmgr command to determine which node is the file system manager), then the mmcheckquota command must be run with the file system unmounted before retrying the command.
6027-594 Disk diskName cannot be added to storage pool poolName. Allocation map cannot accommodate disks larger than size MB. 6027-597 The quota command is requested to process quotas for a type (user, group, or fileset), which is not enabled
Explanation: The specified disk is too large compared to the disks that were initially used to create the storage pool. User response: Specify a smaller disk or add the disk to a new storage pool. 6027-595 While creating quota files, file file name, with no valid quota information, was found in the root directory. Please remove files with reserved quota file names (for example, user.quota) without valid quota information from the root directory by: 1. mounting the file system without quotas. 2. removing the files. 3. remounting the file system with quotas to recreate new quota files. To use quota file names other than the reserved names, use the mmcheckquota command.
Explanation: A quota command is requested to process quotas for a user, group, or fileset quota type, which is not enabled. User response: Verify that the user, group, or fileset quota type is enabled and reissue the command. 6027-598 The supplied file does not contain quota information.
Explanation: A file supplied as a quota file does not contain quota information. User response: Change the file so it contains valid quota information and reissue the command. To mount the file system so that new quota files are created: 1. Mount the file system without quotas. 2. Verify there are no files in the root directory with the reserved user.quota or group.quota name. 3. Remount the file system with quotas. 6027-599 File supplied to the command does not exist in the root directory.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that do not contain quota information but have one of the reserved names: user.quota, group.quota, or fileset.quota exist in the root directory. User response: To mount the file system so that new quota files will be created, perform these steps: 1. Mount the file system without quotas. 2. Verify that there are no files in the root directory with the reserved names: user.quota, group.quota, or fileset.quota. 3. Remount the file system with quotas. To mount the file system with other files used as quota files, issue the mmcheckquota command. 6027-596 While creating quota files, file file name containing quota information was found in the root directory. This file will be used as quota type quota file.
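The three steps in the preceding user response might look like the following sequence, assuming a hypothetical file system fs1 mounted at /gpfs/fs1; mounting "without quotas" is shown here by turning the quota option off with mmchfs:
  mmumount fs1 -a
  mmchfs fs1 -Q no         # step 1: mount the file system without quotas
  mmmount fs1
  rm /gpfs/fs1/user.quota  # step 2: remove the reserved-name file that has no valid quota data
  mmumount fs1
  mmchfs fs1 -Q yes        # step 3: remount with quotas so new quota files are created
  mmmount fs1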
Explanation: The user-supplied name of a new quota file has not been found. User response: Ensure that a file with the supplied name exists. Then reissue the command. 6027-600 On node nodeName an earlier error may have caused some file system data to be inaccessible at this time. Check error log for additional information. After correcting the problem, the file system can be mounted again to restore normal data access.
Explanation: An earlier error may have caused some file system data to be inaccessible at this time. User response: Check the error log for additional information. After correcting the problem, the file system can be mounted again. 6027-601 Error changing pool size.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that have one of the reserved names user.quota, group.quota, or fileset.quota and contain quota information, exist in the root directory. The file with the reserved name will be used as the quota file. User response: None. Informational message.
Explanation: The mmchconfig command failed to change the pool size to the requested value. User response: Follow the suggested actions in the other messages that occur with this one.
6027-602 ERROR: file system not mounted. Mount file system fileSystem and retry command. 6027-609 File system fileSystem unmounted because it does not have a manager.
Explanation: A GPFS command that requires the file system be mounted was issued. User response: Mount the file system and reissue the command. 6027-603 Current pool size: valueK = valueM, max block size: valueK = valueM.
Explanation: The file system had to be unmounted because a file system manager could not be assigned. An accompanying message tells which node was the last manager. User response: Examine the error log on the last file system manager node. Issue the mmlsdisk command to determine if a number of disks are down. Examine the other error logs for an indication of network, disk, or virtual shared disk problems. Repair the base problem and issue the mmchdisk command if required. 6027-610 Cannot mount file system fileSystem because it does not have a manager.
Explanation: Displays the current pool size. User response: None. Informational message only. 6027-604 Parameter incompatibility. File system block size is larger than maxblocksize parameter.
Explanation: An attempt is being made to mount a file system whose block size is larger than the maxblocksize parameter as set by mmchconfig. User response: Use the mmchconfig maxblocksize=xxx command to increase the maximum allowable block size. 6027-605 File system has been renamed.
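For the user response to message 6027-604, a minimal sketch; the 4M value is illustrative, must be at least as large as the file system block size, and changing maxblocksize may require GPFS to be stopped on the nodes, depending on the release:
  mmchconfig maxblocksize=4M
  mmlsconfig                 # confirm the new value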
Explanation: The file system had to be unmounted because a file system manager could not be assigned. An accompanying message tells which node was the last manager. User response: Examine the error log on the last file system manager node. Issue the mmlsdisk command to determine if a number of disks are down. Examine the other error logs for an indication of disk or network shared disk problems. Repair the base problem and issue the mmchdisk command if required. 6027-611 Recovery: file system, delay number sec. for safe recovery.
Explanation: Self-explanatory. User response: None. Informational message only. 6027-606 The node number nodeNumber is not defined in the node list.
Explanation: A node matching nodeNumber was not found in the GPFS configuration file. User response: Perform required configuration steps prior to starting GPFS on the node. 6027-607 mmcommon getEFOptions fileSystem failed. Return code value.
Explanation: Informational. When disk leasing is in use, wait for the existing lease to expire before performing log and token manager recovery. User response: None. 6027-612 Unable to run command while the file system is suspended.
Explanation: The mmcommon getEFOptions command failed while looking up the names of the disks in a file system. This error usually occurs during mount processing. User response: Check the preceding messages. A frequent cause for such errors is lack of space in /var. 6027-608 file system manager takeover failed.
Explanation: A command that can alter data in a file system was issued while the file system was suspended. User response: Resume the file system and reissue the command. 6027-613 Expel node request from node. Expelling: node.
Explanation: An attempt to take over as file system manager failed. The file system is unmounted to allow another node to try. User response: Check the return code. This is usually due to network or disk connectivity problems. Issue the mmlsdisk command to determine whether the paths to the disks are unavailable, and issue the mmchdisk command if necessary.
Explanation: One node is asking to have another node expelled from the cluster, usually because they have communications problems between them. The cluster manager node will decide which one will be expelled. User response: Check that the communications paths are available between the two nodes.
6027-614 Value value for option name is out of range. Valid values are number through number. 6027-621 Negative quota limits are not allowed.
Explanation: The quota value must be positive. User response: Reissue the mmedquota command and enter valid values when editing the information. 6027-622 Failed to join remote cluster name.
Explanation: The value for an option in the command line arguments is out of range. User response: Correct the command line and reissue the command. 6027-615 mmcommon getContactNodes clusterName failed. Return code value.
Explanation: The node was not able to establish communication with another cluster, usually while attempting to mount a file system from a remote cluster. User response: Check other console messages for additional information. Verify that contact nodes for the remote cluster are set correctly. Run mmremotefs show and mmremotecluster show to display information about the remote cluster. 6027-623 All disks up and ready.
Explanation: mmcommon getContactNodes failed while looking up contact nodes for a remote cluster, usually while attempting to mount a file system from a remote cluster. User response: Check the preceding messages, and consult the earlier chapters of this document. A frequent cause for such errors is lack of space in /var. 6027-616 Duplicate address ipAddress in node list.
Explanation: Self-explanatory. User response: None. Informational message only. 6027-624 No disks
Explanation: The IP address appears more than once in the node list file. User response: Check the node list shown by the mmlscluster command. 6027-617 Recovered number nodes for cluster clusterName.
Explanation: Self-explanatory. User response: None. Informational message only. 6027-625 Migrate already pending.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed. User response: None. Informational message only. 6027-618 Local host not found in node list (local ip interfaces: interfaceList).
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed. User response: None. Informational message only. 6027-626 Migrate to node nodeName already pending.
Explanation: The local host specified in the node list file could not be found. User response: Check the node list shown by the mmlscluster command. 6027-619 Negative grace times are not allowed.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed. User response: None. Informational message only.
Explanation: The mmedquota command received a negative value for the -t option. User response: Reissue the mmedquota command with a nonnegative value for grace time. 6027-620 Hard quota limit must not be less than soft limit.
6027-627
Explanation: A request has been made to change the file system manager node to the node that is already the manager. User response: None. Informational message only. 6027-628 Sending migrate request to current manager node nodeName.
Explanation: The hard quota limit must be greater than or equal to the soft quota limit. User response: Reissue the mmedquota command and enter valid values when editing the information.
Explanation: A request has been made to change the file system manager node.
User response: None. Informational message only. 6027-629 Node nodeName resigned as manager for fileSystem. 6027-635 The current file system manager failed and no new manager will be appointed.
Explanation: Progress report produced by the mmchmgr command. User response: None. Informational message only. 6027-630 Node nodeName appointed as manager for fileSystem.
Explanation: The file system manager node could not be replaced. This is usually caused by other system errors, such as disk or communication errors. User response: See accompanying messages for the base failure. 6027-636 Disks marked as stopped or offline.
Explanation: The mmchmgr command successfully changed the node designated as the file system manager. User response: None. Informational message only. 6027-631 Failed to appoint node nodeName as manager for fileSystem.
Explanation: A disk continues to be marked down due to a previous error and was not opened again. User response: Check the disk status by issuing the mmlsdisk command, then issue the mmchdisk start command to restart the disk. 6027-637 RVSD is not active.
Explanation: A request to change the file system manager node has failed. User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node. 6027-632 Failed to appoint new manager for fileSystem.
Explanation: The RVSD subsystem needs to be activated. User response: See the appropriate IBM Reliable Scalable Cluster Technology (RSCT) document at: publib.boulder.ibm.com/clresctr/windows/public/rsctbooks.html and search on diagnosing IBM Virtual Shared Disk problems. 6027-638 File system fileSystem unmounted by node nodeName.
Explanation: An attempt to change the file system manager node has failed. User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node. 6027-633 Best choice node nodeName already manager for fileSystem.
Explanation: Produced in the console log on a forced unmount of the file system caused by disk or communication failures. User response: Check the error log on the indicated node. Correct the underlying problem and remount the file system. 6027-639 File system cannot be mounted in restricted mode and ro or rw concurrently.
Explanation: Informational message about the progress and outcome of a migrate request. User response: None. Informational message only. 6027-634 Node name or number node is not valid.
Explanation: There has been an attempt to concurrently mount a file system on separate nodes in both normal mode and restricted mode. User response: Decide which mount mode you want to use, and use that mount mode on both nodes. 6027-640 File system is mounted.
Explanation: A node number, IP address, or host name that is not valid has been entered in the configuration file or as input for a command. User response: Validate your configuration information and the condition of your network. This message may result from an inability to translate a node name.
Explanation: A command has been issued that requires that the file system be unmounted. User response: Unmount the file system and reissue the command.
6027-641 Unable to access vital system metadata. Too many disks are unavailable or the file system is corrupted. 6027-646 File system unmounted due to loss of cluster membership.
Explanation: An attempt has been made to access a file system, but the metadata is unavailable. This can be caused by: 1. The disks on which the metadata resides are stopped, or an unsuccessful attempt was made to delete them. 2. The file system is corrupted. User response: To access the file system: 1. If the disks are the problem, either start the stopped disks or try to delete them. 2. If the file system has been corrupted, you will have to recreate it from backup media. 6027-642 File system has been deleted.
Explanation: Quorum was lost, causing file systems to be unmounted. User response: Get enough nodes running the GPFS daemon to form a quorum. 6027-647 File fileName could not be run with err errno.
Explanation: The specified shell script could not be run. This message is followed by the error string that is returned by the exec. User response: Check file existence and access permissions. 6027-648 EDITOR environment variable must be full pathname.
Explanation: Self-explanatory. User response: None. Informational message only. 6027-643 Node nodeName completed takeover for fileSystem.
Explanation: The value of the EDITOR environment variable is not an absolute path name. User response: Change the value of the EDITOR environment variable to an absolute path name. 6027-649 Error reading the mmpmon command file.
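For message 6027-648, a hedged example; the editor path and user name are placeholders:
  export EDITOR=/usr/bin/vi    # must be an absolute path name
  mmedquota -u someuser        # the editor is now invoked with the quota entry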
Explanation: The mmchmgr command completed successfully. User response: None. Informational message only. 6027-644 The previous error was detected on node nodeName.
Explanation: An error occurred when reading the mmpmon command file. User response: Check file existence and access permissions. 6027-650 The mmfs daemon is shutting down abnormally.
Explanation: An unacceptable error was detected. This usually occurs when attempting to retrieve file system information from the operating system's file system database or the cached GPFS system control data. The message identifies the node where the error was encountered. User response: See accompanying messages for the base failure. A common cause for such errors is lack of space in /var. 6027-645 Attention: mmcommon getEFOptions fileSystem failed. Checking fileName.
Explanation: The GPFS daemon is shutting down as a result of an irrecoverable condition, typically a resource shortage. User response: Review error log entries, correct a resource shortage condition, and restart the GPFS daemon. 6027-660 Error displaying message from mmfsd.
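After correcting the resource shortage described for message 6027-650, the daemon can be restarted; the node name is a placeholder:
  mmstartup -N nodeName                 # restart GPFS on the affected node
  tail /var/adm/ras/mmfs.log.latest     # confirm the daemon initializes cleanly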
Explanation: The names of the disks in a file system were not found in the cached GPFS system data, therefore an attempt will be made to get the information from the operating system's file system database. User response: If the command fails, see File system will not mount on page 61. A common cause for such errors is lack of space in /var.
Explanation: GPFS could not properly display an output string sent from the mmfsd daemon due to some error. A description of the error follows. User response: Check that GPFS is properly installed. 6027-661 mmfsd waiting for primary node nodeName.
Explanation: The mmfsd server has to wait during start up because mmfsd on the primary node is not yet ready. User response: None. Informational message only.
6027-662 mmfsd timed out waiting for primary node nodeName. 6027-668 Could not send message to file system daemon
Explanation: The mmfsd server is about to terminate. User response: Ensure that the mmfs.cfg configuration file contains the correct host name or IP address of the primary node. Check mmfsd on the primary node. 6027-663 Lost connection to file system daemon.
Explanation: Attempt to send a message to the file system failed. User response: Check if the file system daemon is up and running. 6027-669 Could not connect to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon has broken. The daemon has probably crashed. User response: Ensure that the mmfsd daemon is running. Check the error log. 6027-664 Unexpected message from file system daemon.
Explanation: The TCP connection between the command and the daemon could not be established. User response: Check additional error messages. 6027-670 Value for 'option' is not valid. Valid values are list.
Explanation: The version of the mmfsd daemon does not match the version of the GPFS command. User response: Ensure that all GPFS software components are at the same version. 6027-665 Failed to connect to file system daemon: errorString.
Explanation: The specified value for the given command option was not valid. The remainder of the line will list the valid keywords. User response: Correct the command line. 6027-671 Keyword missing or incorrect.
Explanation: A missing or incorrect keyword was encountered while parsing command line arguments. User response: Correct the command line. 6027-672 Too few arguments specified.
Explanation: An error occurred while trying to create a session with mmfsd. User response: Ensure that the mmfsd daemon is running. Also, only root can run most GPFS commands. The mode bits of the commands must be set-user-id to root. 6027-666 Failed to determine file system manager.
Explanation: Too few arguments were specified on the command line. User response: Correct the command line. 6027-673 Too many arguments specified.
Explanation: While running a GPFS command in a multiple node configuration, the local file system daemon is unable to determine which node is managing the file system affected by the command. User response: Check internode communication configuration and ensure that enough GPFS nodes are up to form a quorum. 6027-667 Could not set up socket
Explanation: Too many arguments were specified on the command line. User response: Correct the command line. 6027-674 Too many values specified for option name.
Explanation: Too many values were specified for the given option on the command line. User response: Correct the command line. 6027-675 Required value for option is missing.
Explanation: One of the calls to create or bind the socket used for sending parameters and messages between the command and the daemon failed. User response: Check additional error messages.
Explanation: A required value was not specified for the given option on the command line. User response: Correct the command line.
6027-676 Option option specified more than once. 6027-686 option (value) exceeds option (value).
Explanation: The named option was specified more than once on the command line. User response: Correct the command line. 6027-677 Option option is incorrect.
Explanation: The value of the first option exceeds the value of the second option. This is not permitted. User response: Correct the command line. 6027-687 Disk name is specified more than once.
Explanation: An incorrect option was specified on the command line. User response: Correct the command line. 6027-678 Misplaced or incorrect parameter name.
Explanation: The named disk was specified more than once on the command line. User response: Correct the command line. 6027-688 Failed to read file system descriptor.
Explanation: A misplaced or incorrect parameter was specified on the command line. User response: Correct the command line. 6027-679 Device name is not valid.
Explanation: The disk block containing critical information about the file system could not be read from disk. User response: This is usually an error in the path to the disks. If there are associated messages indicating an I/O error such as ENODEV or EIO, correct that error and retry the operation. If there are no associated I/O errors, then run the mmfsck command with the file system unmounted. 6027-689 Failed to update file system descriptor.
Explanation: An incorrect device name was specified on the command line. User response: Correct the command line. 6027-681 Required option name was not specified.
Explanation: A required option was not specified on the command line. User response: Correct the command line. 6027-682 Device argument is missing.
Explanation: The disk block containing critical information about the file system could not be written to disk. User response: This is a serious error, which may leave the file system in an unusable state. Correct any I/O errors, then run the mmfsck command with the file system unmounted to make repairs. 6027-690 Failed to allocate I/O buffer.
Explanation: The device argument was not specified on the command line. User response: Correct the command line. 6027-683 Disk name is invalid.
Explanation: Could not obtain enough memory (RAM) to perform an operation. User response: Either retry the operation when the mmfsd daemon is less heavily loaded, or increase the size of one or more of the memory pool parameters by issuing the mmchconfig command. 6027-691 Failed to send message to node nodeName.
Explanation: An incorrect disk name was specified on the command line. User response: Correct the command line. 6027-684 Value value for option is incorrect.
Explanation: An incorrect value was specified for the named option. User response: Correct the command line. 6027-685 Value value for option option is out of range. Valid values are number through number.
Explanation: A message to another file system node could not be sent. User response: Check additional error message and the internode communication configuration. 6027-692 Value for option is not valid. Valid values are yes, no.
Explanation: An out of range value was specified for the named option. User response: Correct the command line.
Explanation: An option that is required to be yes or no is neither. User response: Correct the command line.
6027-693 Cannot open disk name. 6027-694 Disk not started; disk name has a bad volume label. 6027-700 Log recovery failed.
Explanation: Could not access the given disk. User response: Check the disk hardware and the path to the disk.
Explanation: An error was encountered while restoring file system metadata from the log. User response: Check additional error message. A likely reason for this error is that none of the replicas of the log could be accessed because too many disks are currently unavailable. If the problem persists, issue the mmfsck command with the file system unmounted. 6027-701 Some file system data are inaccessible at this time.
Explanation: The volume label on the disk does not match that expected by GPFS. User response: Check the disk hardware. For hot-pluggable drives, ensure that the proper drive has been plugged in. 6027-695 File system is read-only.
Explanation: An operation was attempted that would require modifying the contents of a file system, but the file system is read-only. User response: Make the file system R/W before retrying the operation. 6027-696 Too many disks are unavailable.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-702 Some file system data are inaccessible at this time. Check error log for additional information. After correcting the problem, the file system must be unmounted and then mounted to restore normal data access.
Explanation: A file system operation failed because all replicas of a data or metadata block are currently unavailable. User response: Issue the mmlsdisk command to check the availability of the disks in the file system; correct disk hardware problems, and then issue the mmchdisk command with the start option to inform the file system that the disk or disks are available again. 6027-697 No log available.
Explanation: A file system operation failed because no space for logging metadata changes could be found. User response: Check additional error message. A likely reason for this error is that all disks with available log space are currently unavailable. 6027-698 Not enough memory to allocate internal data structure.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-703 Some file system data are inaccessible at this time. Check error log for additional information.
Explanation: A file system operation failed because no memory is available for allocating internal data structures. User response: Stop other processes that may have main memory pinned for their use. 6027-699 Inconsistency in file system metadata.
Explanation: File system metadata on disk has been corrupted. User response: This is an extremely serious error that may cause loss of data. Issue the mmfsck command with the file system unmounted to make repairs. There will be a POSSIBLE FILE CORRUPTION entry in the system error log that should be forwarded to the IBM Support Center.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state.
User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-704 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. After correcting the problem, the file system must be unmounted and then mounted again to restore normal data access. 6027-709 Incorrect response. Valid responses are yes, no, or noall.
Explanation: A question was asked that requires a yes or no answer. The answer entered was neither yes, no, nor noall. User response: Enter a valid response. 6027-710 Attention:
Explanation: Precedes an attention message. User response: None. Informational message only. 6027-711 Specified entity, such as a disk or file system, does not exist.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. User response: Possible reasons include too many unavailable disks or insufficient memory for file system control structures. Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-705 Error code value.
Explanation: A file system operation failed because the specified entity, such as a disk or file system, could not be found. User response: Specify an existing disk, file system, or other entity. 6027-712 Error in communications between mmfsd daemon and client program.
Explanation: A message sent between the mmfsd daemon and the client program had an incorrect format or content. User response: Verify that the mmfsd daemon is running. 6027-713 Unable to start because conflicting program name is running. Waiting until it completes.
Explanation: Provides additional information about an error. User response: See accompanying error messages. 6027-706 The device name has no corresponding entry in fileName or has an incomplete entry.
Explanation: The command requires a device that has a file system associated with it. User response: Check the operating system's file system database (the given file) for a valid device entry. 6027-707 Unable to open file fileName.
Explanation: A program detected that it cannot start because a conflicting program is running. The program will automatically start once the conflicting program has ended, as long as there are no other conflicting programs running at that time. User response: None. Informational message only. 6027-714 Terminating because conflicting program name is running.
Explanation: The named file cannot be opened. User response: Check that the file exists and has the correct permissions. 6027-708 Keyword name is incorrect. Valid values are list.
Explanation: A program detected that it must terminate because a conflicting program is running. User response: Reissue the command once the conflicting program has ended. 6027-715 command is finished waiting. Starting execution now.
Explanation: An incorrect keyword was encountered. User response: Correct the command line.
Explanation: A program detected that it can now begin running because a conflicting program has ended. User response: None. Informational message only.
6027-716 Some file system data or metadata has been lost. Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures. User response: Check other error messages as well as the error log for additional information. Correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-723 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. After correcting the problem, the file system must be mounted again to restore normal data access.
Explanation: Unable to access some piece of file system data that has been lost due to the deletion of disks beyond the replication factor. User response: If the function did not complete, try to mount the file system in restricted mode. 6027-717 Must execute mmfsck before mount.
Explanation: An attempt has been made to mount a file system on which an incomplete mmfsck command was run. User response: Reissue the mmfsck command to repair the file system, then reissue the mount command. 6027-718 The mmfsd daemon is not ready to handle commands yet.
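For message 6027-717, a minimal repair-then-mount sketch; fs1 is a placeholder device name, and the file system must be unmounted on all nodes before the offline mmfsck:
  mmfsck fs1 -y    # complete the interrupted repair, answering yes to repair prompts
  mmmount fs1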
Explanation: The mmfsd daemon is not accepting messages because it is restarting or stopping. User response: None. Informational message only. 6027-719 Device type not supported.
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures. User response: Check other error messages as well as the error log for additional information. Correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-724 Incompatible file system format.
Explanation: A disk being added to a file system with the mmadddisk or mmcrfs command is not a character mode special file, or has characteristics not recognized by GPFS. User response: Check the characteristics of the disk being added to the file system. 6027-720 Actual sector size does not match given sector size.
Explanation: A disk being added to a file system with the mmadddisk or mmcrfs command has a physical sector size that differs from that given in the disk description list. User response: Check the physical sector size of the disk being added to the file system. 6027-721 Host name in fileName is not valid.
Explanation: An attempt was made to access a file system that was formatted with an older version of the product that is no longer compatible with the version currently running. User response: To change the file system format version to the current version, issue the -V option on the mmchfs command. 6027-725 The mmfsd daemon is not ready to handle commands yet. Waiting for quorum.
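For message 6027-724, a hedged example of the user response (fs1 is a placeholder; use compat instead of full if back-level nodes must still be able to mount the file system):
  mmchfs fs1 -V full    # migrate the file system format to the current version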
Explanation: A host name or IP address that is not valid was found in a configuration file. User response: Check the configuration file specified in the error message. 6027-722 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. The file system must be mounted again to restore normal data access.
Explanation: The GPFS mmfsd daemon is not accepting messages because it is waiting for quorum. User response: Determine why insufficient nodes have joined the group to achieve quorum and rectify the problem.
6027-726 Quota initialization/startup failed. 6027-732 Error while performing command on fileSystem.
Explanation: Quota manager initialization was unsuccessful. The file system manager finished without quotas. Subsequent client mount requests will fail. User response: Check the error log and correct I/O errors. It may be necessary to issue the mmcheckquota command with the file system unmounted. 6027-727 Specified driver type type does not match disk name driver type type.
Explanation: An error occurred while performing the stated command when listing or reporting quotas. User response: None. Informational message only. 6027-733 Edit quota: Incorrect format!
Explanation: The format of one or more edited quota limit entries was not correct. User response: Reissue the mmedquota command. Change only the values for the limits and follow the instructions given. 6027-734 Quota check for fileSystem ended prematurely.
Explanation: The driver type specified on the mmchdisk command does not match the current driver type of the disk. User response: Verify the driver type and reissue the command. 6027-728 Specified sector size value does not match disk name sector size value.
Explanation: The user interrupted and terminated the command. User response: If ending the command was not intended, reissue the mmcheckquota command. 6027-735 Error editing string from mmfsd.
Explanation: The sector size specified on the mmchdisk command does not match the current sector size of the disk. User response: Verify the sector size and reissue the command. 6027-729 Attention: No changes for disk name were specified.
Explanation: An internal error occurred in the mmfsd when editing a string. User response: None. Informational message only. 6027-736 Attention: Due to an earlier error normal access to this file system has been disabled. Check error log for additional information. The file system must be unmounted and then mounted again to restore normal data access.
Explanation: The disk descriptor in the mmchdisk command does not specify that any changes are to be made to the disk. User response: Check the disk descriptor to determine if changes are needed. 6027-730 command on fileSystem.
Explanation: Quota was activated or deactivated as stated as a result of the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands. User response: None, informational only. This message is enabled with the -v option on the mmquotaon, mmquotaoff, mmdefquotaon, or mmdefquotaoff commands. 6027-731 Error number while performing command for name quota on fileSystem
Explanation: The file system has encountered an error that is serious enough to make some or all data inaccessible. This message indicates that an error occurred that left the file system in an unusable state. Possible reasons include too many unavailable disks or insufficient memory for file system control structures. User response: Check other error messages as well as the error log for additional information. Unmount the file system and correct any I/O errors. Then, remount the file system and try the operation again. If the problem persists, issue the mmfsck command with the file system unmounted to make repairs. 6027-737 Attention: No metadata disks remain.
Explanation: An error occurred when switching quotas of a certain type on or off. If errors were returned for multiple file systems, only the error code is shown. User response: Check the error code shown by the message to determine the reason.
Explanation: The mmchdisk command has been issued, but no metadata disks remain. User response: None. Informational message only.
6027-738 Attention: No data disks remain. 6027-744 Unable to run command while the file system is mounted in restricted mode.
Explanation: The mmchdisk command has been issued, but no data disks remain. User response: None. Informational message only. 6027-739 Attention: Due to an earlier configuration change the file system is no longer properly balanced.
Explanation: A command that can alter the data in a file system was issued while the file system was mounted in restricted mode. User response: Mount the file system in read-only or read-write mode or unmount the file system and then reissue the command. 6027-745 fileSystem: no quotaType quota management enabled.
Explanation: The mmlsdisk command found that the file system is not properly balanced. User response: Issue the mmrestripefs -b command at your convenience. 6027-740 Attention: Due to an earlier configuration change the file system is no longer properly replicated.
Explanation: A quota command of the cited type was issued for the cited file system when no quota management was enabled. User response: Enable quota management and reissue the command. 6027-746 Editing quota limits for this user or group not permitted.
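To enable quota management as suggested for message 6027-745 (and the similar 6027-748), a minimal sketch; fs1 is a placeholder, and the file system may need to be unmounted first, depending on the release:
  mmchfs fs1 -Q yes    # enable user, group, and fileset quotas
  mmmount fs1 -a       # quotas become active when the file system is mounted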
Explanation: The mmlsdisk command found that the file system is not properly replicated. User response: Issue the mmrestripefs -r command at your convenience. 6027-741 Attention: Due to an earlier configuration change the file system may contain data that is at risk of being lost.
Explanation: The root user or system group was specified for quota limit editing in the mmedquota command. User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited. 6027-747 Too many nodes in cluster (max number) or file system (max number).
Explanation: The mmlsdisk command found that critical data resides on disks that are suspended or being deleted. User response: Issue the mmrestripefs -m command as soon as possible. 6027-742 Error occurred while executing a command for fileSystem.
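Messages 6027-739, 6027-740, and 6027-741 each correspond to a different mmrestripefs option; with a hypothetical file system fs1:
  mmrestripefs fs1 -b    # rebalance data across all disks (6027-739)
  mmrestripefs fs1 -r    # restore the correct replication of all files (6027-740)
  mmrestripefs fs1 -m    # migrate critical data off suspended or to-be-deleted disks (6027-741)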
Explanation: The operation cannot succeed because too many nodes are involved. User response: Reduce the number of nodes to the applicable stated limit. 6027-748 fileSystem: no quota management enabled
Explanation: A quota command encountered a problem on a file system. Processing continues with the next file system. User response: None. Informational message only. 6027-743 Initial disk state was updated successfully, but another error may have changed the state again.
Explanation: A quota command was issued for the cited file system when no quota management was enabled. User response: Enable quota management and reissue the command. 6027-749 Pool size changed to number K = number M.
Explanation: The mmchdisk command encountered an error after the disk status or availability change was already recorded in the file system configuration. The most likely reason for this problem is that too many disks have become unavailable or are still unavailable after the disk state change. User response: Issue an mmchdisk start command when more disks are available.
Explanation: Pool size successfully changed. User response: None. Informational message only.
6027-750 The node address ipAddress is not defined in the node list. 6027-757 name is not an excluded disk.
Explanation: An address does not exist in the GPFS configuration file. User response: Perform required configuration steps prior to starting GPFS on the node. 6027-751 Error code value.
Explanation: Some of the disks passed to the mmfsctl include command are not marked as excluded in the mmsdrfs file. User response: Verify the list of disks supplied to this command. 6027-758 Disk(s) not started; disk name has a bad volume label.
Explanation: Provides additional information about an error. User response: See accompanying error messages. 6027-752 Lost membership in cluster clusterName. Unmounting file systems.
Explanation: The volume label on the disk does not match that expected by GPFS. User response: Check the disk hardware. For hot-pluggable drives, make sure the proper drive has been plugged in. 6027-759 fileSystem is still in use.
Explanation: This node has lost membership in the cluster. Either GPFS is no longer available on enough nodes to maintain quorum, or this node could not communicate with other members of the quorum. This could be caused by a communications failure between nodes, or multiple GPFS failures. User response: See associated error logs on the failed nodes for additional problem determination information. 6027-753 Could not run command command.
Explanation: The mmfsctl include command found that the named file system is still mounted, or another GPFS command is running against the file system. User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command. 6027-761 Attention: excessive timer drift between node and node (number over number).
Explanation: The GPFS daemon failed to run the specified command. User response: Verify correct installation. 6027-754 Error reading string for mmfsd.
Explanation: GPFS has detected an unusually large difference in the rate of clock ticks (as returned by the times() system call) between two nodes. Another node's TOD clock and tick rate changed dramatically relative to this node's TOD clock and tick rate. User response: Check the error log for hardware or device driver problems that might cause timer interrupts to be lost, or for a recent large adjustment made to the TOD clock. 6027-762 No quota enabled file system found.
Explanation: GPFS could not properly read an input string. User response: Check that GPFS is properly installed. 6027-755 Waiting for challenge to be responded during disk election.
Explanation: The node has challenged another node, which won the previous election and is waiting for the challenger to respond. User response: None. Informational message only. 6027-756 Configuration invalid or inconsistent between different nodes.
Explanation: There is no quota-enabled file system in this cluster. User response: None. Informational message only. 6027-763 uidInvalidate: Incorrect option option.
Explanation: An incorrect option was passed to the uidinvalidate command. User response: Correct the command invocation. 6027-764 Error invalidating UID remapping cache for domain.
Explanation: Self-explanatory. User response: Check cluster and file system configuration.
Explanation: An incorrect domain name was passed to the uidinvalidate command. User response: Correct the command invocation.
6027-765 Tick value hasn't changed for nearly number seconds. 6027-772 Error writing fileset.quota file.
Explanation: The clock tick count, which AIX is expected to increment, has not changed for nearly the stated number of seconds. User response: Check the error log for hardware or device driver problems that might cause timer interrupts to be lost. 6027-766 This node will be expelled from cluster cluster due to expel msg from node.
Explanation: An error occurred while writing the cited quota file. User response: Check the status and availability of the disks and reissue the command. 6027-774 fileSystem: quota management is not enabled, or one or more quota clients are not available.
Explanation: This node is being expelled from the cluster. User response: Check the network connection between this node and the node specified above. 6027-767 Request sent to node to expel node from cluster cluster.
Explanation: An attempt was made to perform quota commands without quota management enabled, or one or more quota clients failed during quota check. User response: Correct the cause of the problem, and then reissue the quota command. 6027-775 During mmcheckquota processing, number node(s) failed. It is recommended that mmcheckquota be repeated.
Explanation: This node sent an expel request to the cluster manager node to expel another node. User response: Check network connection between this node and the node specified above. 6027-768 Wrong number of operands for mmpmon command 'command'.
Explanation: Nodes failed while an online quota check was running. User response: Reissue the quota check command. 6027-776 fileSystem: There was not enough space for the report. Please repeat quota check!
Explanation: The command read from the input file has the wrong number of operands. User response: Correct the command invocation and reissue the command. 6027-769 Malformed mmpmon command 'command'.
Explanation: The -v flag is set in the tscheckquota command, but either no space or not enough space could be allocated for the differences to be printed. User response: Correct the space problem and reissue the quota check. 6027-777 Recovering nodes: nodeList.
Explanation: The command read from the input file is malformed, perhaps with an unknown keyword. User response: Correct the command invocation and reissue the command. 6027-770 Error writing user.quota file.
Explanation: Recovery for one or more nodes has begun. User response: No response is needed if this message is followed by 'recovered nodes' entries specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete. 6027-778 Recovering nodes in cluster cluster: nodeList.
Explanation: An error occurred while writing the cited quota file. User response: Check the status and availability of the disks and reissue the command. 6027-771 Error writing group.quota file.
Explanation: Recovery for one or more nodes in the cited cluster has begun. User response: No response is needed if this message is followed by 'recovered nodes' entries on the cited cluster specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.
Explanation: An error occurred while writing the cited quota file. User response: Check the status and availability of the disks and reissue the command.
6027-779 Incorrect fileset name filesetName. 6027-788 Failed to load or initialize security library.
Explanation: The fileset name provided on the command line is incorrect. User response: Correct the fileset name and reissue the command. 6027-780 Incorrect path to fileset junction junctionName.
Explanation: There was an error loading or initializing the security library on this node. User response: Check previous messages for further information. 6027-789 Unable to read offsets offset to offset for inode inode snap snap, from disk diskName, sector sector.
Explanation: The path to the fileset junction is incorrect. User response: Correct the junction path and reissue the command. 6027-781 Storage pools have not been enabled for file system fileSystem.
Explanation: The mmdeldisk -c command found that the cited addresses on the cited disk represent data that is no longer readable. User response: Save this output for later use in cleaning up failing disks. 6027-790 Specified storage pool poolName does not match disk diskName storage pool poolName. Use mmdeldisk and mmadddisk to change a disk's storage pool.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled. User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command. 6027-784 Device not ready.
Explanation: A device is not ready for operation. User response: Check previous messages for further information. 6027-785 Cannot establish connection.
Explanation: An attempt was made to change a disk's storage pool assignment using the mmchdisk command. This can only be done by deleting the disk from its current storage pool and then adding it to the new pool. User response: Delete the disk from its current storage pool and then add it to the new pool. 6027-792 Policies have not been enabled for file system fileSystem.
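For message 6027-790, a sketch of moving a disk to a different storage pool; the device, NSD name, and descriptor file are placeholders, and the descriptor file is assumed to assign nsd01 to the target pool:
  mmdeldisk fs1 nsd01              # remove the disk from its current storage pool
  mmadddisk fs1 -F newpool.desc    # add it back with a descriptor naming the new pool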
Explanation: This node cannot establish a connection to another node. User response: Check previous messages for further information. 6027-786 Message failed because the destination node refused the connection.
Explanation: The cited file system must be upgraded to use policies. User response: Upgrade the file system via the mmchfs -V command. 6027-793 No policy file was installed for file system fileSystem.
Explanation: This node sent a message to a node that refuses to establish a connection. User response: Check previous messages for further information. 6027-787 Security configuration data is inconsistent or unavailable.
Explanation: No policy file was installed for this file system. User response: Install a policy file. 6027-794 Failed to read policy file for file system fileSystem.
Explanation: There was an error configuring security on this node. User response: Check previous messages for further information.
Explanation: Failed to read the policy file for the requested file system. User response: Reinstall the policy file.
6027-795 Failed to open fileName: errorCode. 6027-853 interval must be less than 1024.
Explanation: An incorrect file name was specified to tschpolicy. User response: Correct the command invocation and reissue the command. 6027-796 Failed to read fileName: errorCode.
Explanation: An incorrect value was supplied for the interval parameter. User response: Correct the command invocation and reissue the command. 6027-854 count must be less than 1024.
Explanation: An incorrect file name was specified to tschpolicy. User response: Correct the command invocation and reissue the command. 6027-797 Failed to stat fileName: errorCode.
Explanation: An incorrect value was supplied for the count parameter. User response: Correct the command invocation and reissue the command. 6027-855 Unable to connect to server, mmfsd is not started.
Explanation: An incorrect file name was specified to tschpolicy. User response: Correct the command invocation and reissue the command. 6027-798 Policy files are limited to number bytes.
Explanation: The tsiostat command was issued but the mmfsd daemon is not started. User response: Contact your system administrator. 6027-856 No information to report.
Explanation: A user-specified policy file exceeded the maximum-allowed length. User response: Install a smaller policy file. 6027-799 Policy policyName installed and broadcast to all nodes.
Explanation: The tsiostat command was issued but no file systems are mounted. User response: Contact your system administrator. 6027-857 Error retrieving values.
Explanation: Self-explanatory. User response: None. Informational message only. 6027-850 Unable to issue this command from a non-root user.
Explanation: The tsiostat command was issued and an internal error occurred. User response: Contact the IBM Support Center. 6027-858 File system not mounted.
Explanation: tsiostat requires root privileges to run. User response: Get the system administrator to change the executable to set the UID to 0. 6027-851 Unable to process interrupt received.
Explanation: The requested file system is not mounted. User response: Mount the file system and reattempt the failing operation. 6027-859 Set DIRECTIO failed
Explanation: An interrupt occurred that tsiostat cannot process. User response: Contact the IBM Support Center. 6027-852 interval and count must be positive integers.
Explanation: The tsfattr call failed. User response: Check for additional error messages. Resolve the problems before reattempting the failing operation. 6027-860 -d is not appropriate for an NFSv4 ACL
Explanation: Incorrect values were supplied for tsiostat parameters. User response: Correct the command invocation and reissue the command.
Explanation: Produced by the mmgetacl or mmputacl commands when the -d option was specified, but the object has an NFS Version 4 ACL (does not have a default). User response: None. Informational message only.
6027-861 Set afm ctl failed 6027-868 mmchattr failed.
Explanation: The tsfattr call failed. User response: Check for additional error messages. Resolve the problems before reattempting the failing operation. 6027-862 Incorrect storage pool name poolName.
Explanation: An error occurred while changing a file's attributes. User response: Check the error code and reissue the command. 6027-869 File replication exceeds number of failure groups in destination storage pool.
Explanation: An incorrect storage pool name was provided. User response: Determine the correct storage pool name and reissue the command. 6027-863 File cannot be assigned to storage pool 'poolName'.
Explanation: The tschattr command received incorrect command line arguments. User response: Correct the command invocation and reissue the command. 6027-870 Error on getcwd(): errorString. Try an absolute path instead of just pathName
Explanation: The file cannot be assigned to the specified pool. User response: Determine the correct storage pool name and reissue the command. 6027-864 Set storage pool failed.
Explanation: The getcwd system call failed. User response: Specify an absolute path starting with '/' on the command invocation, so that the command will not need to invoke getcwd. 6027-871 Error on gpfs_get_pathname_from_fssnaphandle(pathName): errorString.
Explanation: An incorrect storage pool name was provided. User response: Determine the correct storage pool name and reissue the command. 6027-865 Restripe file data failed.
Explanation: An error occurred during a gpfs_get_pathname_from_fssnaphandle operation. User response: Verify the invocation parameters and make sure the command is running under a user ID with sufficient authority (root or administrator privileges). Specify a GPFS file system device name or a GPFS directory path name as the first argument. Correct the command invocation and reissue the command. 6027-872 Is pathName a GPFS file system name or path?
Explanation: An error occurred while restriping the file data. User response: Check the error code and reissue the command. 6027-866 Storage pools have not been enabled for this file system.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled. User response: Enable storage pools via mmchfs -V, or correct the command invocation and reissue the command. 6027-867 Change storage pool is not permitted.
Explanation: An error occurred while attempting to access the named GPFS file system or path. User response: Verify the invocation parameters and make sure the command is running under a user ID with sufficient authority (root or administrator privileges). Correct the command invocation and reissue the command. 6027-874 Error: incorrect Date@Time (YYYY-MM-DD@HH:MM:SS) specification: specification.
Explanation: The user tried to change a file's assigned storage pool but was not root or superuser. User response: Reissue the command as root or superuser.
Explanation: The Date@Time command invocation argument could not be parsed. User response: Correct the command invocation and try again. The syntax should look similar to: 2005-12-25@07:30:00.
6027-875 Error on gpfs_stat(pathName): errorString. 6027-881 Error on gpfs_iopen([rootPath/pathName], inodeNumber): errorString.
Explanation: An error occurred while attempting to stat() the cited path name. User response: Determine whether the cited path name exists and is accessible. Correct the command arguments as necessary and reissue the command. 6027-876 Error starting directory scan (pathName): errorString.
Explanation: An error occurred during a gpfs_iopen operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center. 6027-882 Error on gpfs_ireaddir([rootPath/pathName], inodeNumber): errorString.
Explanation: The specified path name is not a directory. User response: Determine whether the specified path name exists and is an accessible directory. Correct the command arguments as necessary and reissue the command. 6027-877 Error opening pathName errorString.
Explanation: An error occurred during a gpfs_ireaddir() operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center.
6027-883
Explanation: An error occurred while attempting to open the named file. Its pool and replication attributes remain unchanged. User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application. 6027-878 Error on gpfs_fcntl(pathName): errorString (offset=offset).
Explanation: An error occurred during a gpfs_next_inode operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center.
6027-884 Error during directory scan (returnCode).
Explanation: A terminal error occurred during the directory scan phase of the command. User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center. 6027-885 Error during inode scan: errorString(returnCode).
Explanation: An error occurred while attempting fcntl on the named file. Its pool or replication attributes may not have been adjusted. User response: Investigate the file and possibly reissue the command. Use the mmlsattr and mmchattr commands to examine and change the pool and replication attributes of the named file. 6027-879 Error deleting pathName: errorString.
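As a minimal illustration of the mmlsattr and mmchattr response for message 6027-878, assuming a hypothetical file /gpfs/gpfs1/bigfile and storage pool sp1:
  mmlsattr -L /gpfs/gpfs1/bigfile
  mmchattr -P sp1 -r 2 -m 2 /gpfs/gpfs1/bigfile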
Explanation: A terminal error occurred during the inode scan phase of the command. User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center. 6027-886 Error during policy decisions scan (returnCode).
Explanation: An error occurred while attempting to delete the named file. User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.
6027-880
Explanation: A terminal error occurred during the policy decisions phase of the command. User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center. 6027-887 Error on gpfs_igetstoragepool(datapoolId): errorString.
Explanation: An error occurred during a gpfs_seek_inode operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center.
Explanation: An error occurred during a gpfs_igetstoragepool operation. User response: Reissue the command. If the problem
persists, contact the IBM Support Center. 6027-888 Error on gpfs_igetfilesetname(filesetId): errorString. 6027-895 Error on pthread_mutex_unlock: errorString.
Explanation: An error occurred during a pthread_mutex_unlock operation. User response: Contact the IBM Support Center. 6027-896 Error on pthread_cond_init: errorString.
Explanation: An error occurred during a gpfs_igetfilesetname operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center. 6027-889 Error on gpfs_get_fssnaphandle(rootPath): errorString.
Explanation: An error occurred during a pthread_cond_init operation. User response: Contact the IBM Support Center. 6027-897 Error on pthread_cond_signal: errorString.
Explanation: An error occurred during a gpfs_get_fssnaphandle operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center. 6027-890 Error on gpfs_open_inodescan(rootPath): errorString.
Explanation: An error occurred during a pthread_cond_signal operation. User response: Contact the IBM Support Center. 6027-898 Error on pthread_cond_broadcast: errorString.
Explanation: An error occurred during a gpfs_open_inodescan() operation. User response: Reissue the command. If the problem persists, contact the IBM Support Center. 6027-891 WEIGHT(thresholdValue) UNKNOWN pathName.
Explanation: An error occurred during a pthread_cond_broadcast operation. User response: Contact the IBM Support Center. 6027-899 Error on pthread_cond_wait: errorString.
Explanation: The named file was assigned the indicated weight, but the rule type is UNKNOWN. User response: Contact the IBM Support Center. 6027-892 Error on pthread_create: where #threadNumber_or_portNumber_or_socketNumber: errorString.
Explanation: An error occurred during a pthread_cond_wait operation. User response: Contact the IBM Support Center. 6027-900 Error opening work file fileName: errorString.
Explanation: An error occurred while creating the thread during a pthread_create operation. User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center. 6027-893 Error on pthread_mutex_init: errorString.
Explanation: An error occurred while attempting to open the named work file. User response: Investigate the file and possibly reissue the command. Check that the path name is defined and accessible. 6027-901 Error writing to work file fileName: errorString.
Explanation: An error occurred during a pthread_mutex_init operation. User response: Contact the IBM Support Center. 6027-894 Error on pthread_mutex_lock: errorString.
Explanation: An error occurred while attempting to write to the named work file. User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system. 6027-902[E] Error parsing work file fileName. Service index: number.
Explanation: An error occurred during a pthread_mutex_lock operation. User response: Contact the IBM Support Center.
Explanation: An error occurred while attempting to read the specified work file.
Explanation: The cited pool is not defined in the file system. User response: Correct the rule and reissue the command. This is not an unrecoverable error; the command will continue to run, but it will not find any files in an incorrect FROM POOL and it will not be able to migrate any files to an incorrect TO POOL. 6027-909 Error on pthread_join: where #threadNumber: errorString.
Explanation: An error occurred while writing the policy decision for the candidate file with the indicated inode number and path name to a work file. There probably will be related error messages. User response: Read all the related error messages. Attempt to correct the problems. 6027-905[E] Error: Out of memory. Service index: number.
Explanation: An error occurred while reaping the thread during a pthread_join operation. User response: Contact the IBM Support Center. 6027-910 Error during policy execution (returnCode).
Explanation: A terminating error occurred during the policy execution phase of the command. User response: Verify the command arguments and reissue the command. If the problem persists, contact the IBM Support Center. 6027-911 Error on changeSpecification change for pathName. errorString.
Explanation: The command has exhausted virtual memory. User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center. 6027-906 Error on system (command): rc=number.
Explanation: An error occurred during the system call with the specified argument string. User response: Read and investigate related error messages. 6027-907 Error code number from sort_file (inodeListname, sortCommand, sortInodeOptions, tempDir).
Explanation: This provides more details on a gpfs_fcntl() error. User response: Use the mmlsattr and mmchattr commands to examine the file and then reissue the change command. 6027-912 Error on restriping of pathName. errorString.
Explanation: An error occurred while sorting the named work file using the named sort command with the given options and working directory. User response: Check these: v The sort command is installed on your system. v The sort command supports the given options. v The working directory is accessible. v The file system has sufficient free space.
Explanation: This provides more details on a gpfs_fcntl() error. User response: Use the mmlsattr and mmchattr commands to examine the file and then reissue the restriping command. 6027-913 Desired replication exceeds number of failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that the desired replication exceeded the number of failure groups. User response: Reissue the command after adding or restarting file system disks.
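Several of the restriping messages (6027-913 through 6027-917) direct you to add or restart file system disks before reissuing the command. A minimal sketch, assuming a hypothetical file system gpfs1 and disk descriptor file /tmp/newdisk.desc:
  mmadddisk gpfs1 -F /tmp/newdisk.desc
  mmchdisk gpfs1 start -a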
6027-914 Insufficient space in one of the replica failure groups. 6027-920 Error on pthread_detach(self): where: errorString.
Explanation: While restriping a file, the tschattr or tsrestripefile command found there was insufficient space in one of the replica failure groups. User response: Reissue the command after adding or restarting file system disks. 6027-915 Insufficient space to properly balance file.
Explanation: An error occurred during a pthread_detach operation. User response: Contact the IBM Support Center. 6027-921 Error on socket socketName (hostName): errorString.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there was insufficient space to properly balance the file. User response: Reissue the command after adding or restarting file system disks. 6027-916 Too many disks unavailable to properly balance file.
Explanation: An error occurred during a socket operation. User response: Verify any command arguments related to interprocessor communication and then reissue the command. If the problem persists, contact the IBM Support Center. 6027-922 Error in Mtconx - p_accepts should not be empty.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that there were too many disks unavailable to properly balance the file. User response: Reissue the command after adding or restarting file system disks. 6027-917 All replicas of a data block were previously deleted.
Explanation: The program discovered an inconsistency or logic error within itself. User response: Contact the IBM Support Center. 6027-923 Error - command client is an incompatible version: hostName protocolVersion
Explanation: While restriping a file, the tschattr or tsrestripefile command found that all replicas of a data block were previously deleted. User response: Reissue the command after adding or restarting file system disks. 6027-918 Cannot make this change to a nonzero length file.
Explanation: While operating in master/client mode, the command discovered that the client is running an incompatible version. User response: Ensure the same version of the command software is installed on all nodes in the clusters and then reissue the command. 6027-924 Error - unrecognized client response from hostName clientResponse.
Explanation: GPFS does not support the requested change to the replication attributes. User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes. 6027-919 Replication parameter range error (value, value).
Explanation: Similar to message 6027-923, except this may be an internal logic error. User response: Ensure the latest, same version software is installed on all nodes in the clusters and then reissue the command. If the problem persists, contact the IBM Support Center. 6027-925 Directory cannot be assigned to storage pool 'poolName'.
Explanation: Similar to message 6027-918. The (a,b) numbers are the allowable range of the replication attributes. User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.
Explanation: The file cannot be assigned to the specified pool. User response: Determine the correct storage pool name and reissue the command.
6027-926 Symbolic link cannot be assigned to storage pool 'poolName'. 6027-931 Error - The policy evaluation phase did not complete.
Explanation: The file cannot be assigned to the specified pool. User response: Determine the correct storage pool name and reissue the command. 6027-927 System file cannot be assigned to storage pool 'poolName'.
Explanation: One or more errors prevented the policy evaluation phase from examining all of the files. User response: Consider other messages emitted by the command. Take appropriate action and then reissue the command. 6027-932 Error - The policy execution phase did not complete.
Explanation: The file cannot be assigned to the specified pool. User response: Determine the correct storage pool name and reissue the command. 6027-928 Error: filesystem/device fileSystem has no snapshot with name snapshotName.
Explanation: One or more errors prevented the policy execution phase from operating on each chosen file. User response: Consider other messages emitted by the command. Take appropriate action and then reissue the command. 6027-933 EXEC 'wouldbeScriptPathname' of EXTERNAL POOL or LIST 'PoolOrListName' fails TEST with code scriptReturnCode on this node.
Explanation: The specified file system does not have a snapshot with the specified snapshot name. User response: Use the mmlssnapshot command to list the snapshot names for the file system. 6027-929[W] Attention: In RULE 'ruleName' (ruleNumber), both pools 'poolName' and 'poolName' are EXTERNAL. This is not a supported migration. Explanation: The command does not support migration between two EXTERNAL pools. User response: Correct the rule and reissue the command. Note: This is not an unrecoverable error. The command will continue to run. 6027-930 Attention: In RULE 'ruleName' LIST name 'listName' appears, but there is no corresponding EXTERNAL LIST 'listName' EXEC ... OPTS ... rule to specify a program to process the matching files.
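For message 6027-928, the snapshot names defined for the file system can be listed before retrying, for example (assuming a hypothetical file system gpfs1):
  mmlssnapshot gpfs1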
Explanation: Each EXEC defined in an EXTERNAL POOL or LIST rule is run in TEST mode on each node. Each invocation that fails with a nonzero return code is reported. Command execution is terminated on any node that fails any of these tests. User response: Correct the EXTERNAL POOL or LIST rule, the EXEC script, or do nothing because this is not necessarily an error. The administrator may suppress execution of the mmapplypolicy command on some nodes by deliberately having one or more EXECs return nonzero codes. 6027-934 Attention: Specified snapshot: 'SnapshotName' will be ignored because the path specified: 'PathName' is not within that snapshot.
Explanation: The command line specified both a path name to be scanned and a snapshot name, but the snapshot name was not consistent with the path name. User response: If you wanted the entire snapshot, just specify the GPFS file system name or device name. If you wanted a directory within a snapshot, specify a path name within that snapshot (for example, /gpfs/FileSystemName/.snapshots/SnapShotName/Directory). 6027-935 Attention: In RULE 'ruleName' (ruleNumber) LIMIT or REPLICATE clauses are ignored; not supported for migration to EXTERNAL pool 'storagePoolName'.
Explanation: There should be an EXTERNAL LIST rule for every list named by your LIST rules. User response: Add an "EXTERNAL LIST listName EXEC scriptName OPTS opts" rule. Note: This is not an unrecoverable error. For execution with -I defer, file lists are generated and saved, so EXTERNAL LIST rules are not strictly necessary for correct execution.
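For message 6027-930, a minimal sketch of the missing rule pair, assuming a hypothetical list name hotlist and script /usr/local/bin/process_list.sh:
  RULE EXTERNAL LIST 'hotlist' EXEC '/usr/local/bin/process_list.sh' OPTS '-v'
  RULE 'pick' LIST 'hotlist' WHERE FILE_SIZE > 1048576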
Explanation: GPFS does not support the LIMIT or REPLICATE clauses during migration to external pools.
User response: Correct the policy rule to avoid this warning message. 6027-936 Error - command master is an incompatible version. 6027-945 -r value exceeds number of failure groups for data.
Explanation: The mmchattr command received command line arguments that were not valid. User response: Correct the command line and reissue the command. 6027-946 Not a regular file or directory.
Explanation: While operating in master/client mode, the command discovered that the master is running an incompatible version. User response: Upgrade the command software on all nodes and reissue the command. 6027-937 Error creating shared temporary subdirectory subDirName: subDirPath
Explanation: A mmlsattr or mmchattr command error occurred. User response: Correct the problem and reissue the command. 6027-947 Stat failed: A file or directory in the path name does not exist.
Explanation: The mkdir command failed on the named subdirectory path. User response: Specify an existing writable shared directory as the shared temporary directory argument to the policy command. The policy command will create a subdirectory within that. 6027-938 Error closing work file fileName: errorString
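For message 6027-937, the shared temporary directory is the work directory passed to the policy command. A minimal sketch, assuming a hypothetical file system gpfs1, policy file /tmp/policy.rules, and shared directory /gpfs/gpfs1/tmpdir (the -g global work directory option is assumed here; verify it against the mmapplypolicy description for your release):
  mmapplypolicy gpfs1 -P /tmp/policy.rules -g /gpfs/gpfs1/tmpdir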
Explanation: A file or directory in the path name does not exist. User response: Correct the problem and reissue the command. 6027-948[E:nnn] fileName: get clone attributes failed: errorString Explanation: The tsfattr call failed. User response: Check for additional error messages. Resolve the problems before reattempting the failing operation. 6027-949[E] fileName: invalid clone attributes.
Explanation: An error occurred while attempting to close the named work file or socket. User response: Record the above information. Contact the IBM Support Center. 6027-940 Open failed.
Explanation: The open() system call was not successful. User response: Check additional error messages. 6027-941 Set replication failed.
Explanation: Self explanatory. User response: Check for additional error messages. Resolve the problems before reattempting the failing operation. 6027-950[E:nnn] File cloning requires the 'fastea' feature to be enabled. Explanation: The file system fastea feature is not enabled. User response: Enable the fastea feature by issuing the mmchfs -V and mmmigratefs --fastea commands. 6027-951[E] Error on operationName to work file fileName: errorString
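For message 6027-950[E:nnn], the fastea feature is enabled by upgrading the file system format and migrating the extended attributes, as the user response states. A minimal sketch, assuming a hypothetical file system gpfs1:
  mmchfs gpfs1 -V full
  mmmigratefs gpfs1 --fastea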
Explanation: The open() system call was not successful. User response: Check additional error messages. 6027-943 -M and -R are only valid for zero length files.
Explanation: The mmchattr command received command line arguments that were not valid. User response: Correct the command line and reissue the command. 6027-944 -m value exceeds number of failure groups for metadata.
Explanation: An error occurred while attempting to do a (write-like) operation on the named work file. User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.
Explanation: The mmchattr command received command line arguments that were not valid. User response: Correct the command line and reissue the command.
6027-961 Cannot execute command. 6027-974 Failure reading ACL (rc=number).
Explanation: The mmeditacl command cannot invoke the mmgetacl or mmputacl command. User response: Contact your system administrator. 6027-963 EDITOR environment variable not set
Explanation: An unexpected error was encountered by mmgetacl or mmeditacl. User response: Examine the return code and contact the IBM Support Center if necessary. 6027-976 Failure writing ACL (rc=number).
Explanation: Self-explanatory. User response: Set the EDITOR environment variable and reissue the command. 6027-964 EDITOR environment variable must be an absolute path name
Explanation: An unexpected error was encountered by mmputacl or mmeditacl. User response: Examine the return code and contact the IBM Support Center if necessary. 6027-977 Authorization failure
Explanation: Self-explanatory. User response: Set the EDITOR environment variable correctly and reissue the command. 6027-965 Cannot create temporary file
Explanation: An attempt was made to create or modify the ACL for a file that you do not own. User response: Only the owner of a file or the root user can create or change the access control list for a file. 6027-978 Incorrect, duplicate, or missing access control entry detected.
Explanation: Self-explanatory. User response: Contact your system administrator. 6027-966 Cannot access fileName
Explanation: Self-explanatory. User response: Verify file permissions. 6027-967 Should the modified ACL be applied? yes or no
Explanation: An access control entry in the ACL that was created had incorrect syntax, one of the required access control entries is missing, or the ACL contains duplicate access control entries. User response: Correct the problem and reissue the command. 6027-979 Incorrect ACL entry: entry.
Explanation: Self-explanatory. User response: Respond yes if you want to commit the changes, no otherwise. 6027-971 Cannot find fileName
Explanation: Self-explanatory. User response: Correct the problem and reissue the command. 6027-980 name is not a valid user name.
Explanation: Self-explanatory. User response: Verify the file name and permissions. 6027-972 name is not a directory (-d not valid).
Explanation: Self-explanatory. User response: Specify a valid user name and reissue the command. 6027-981 name is not a valid group name.
Explanation: Self-explanatory. User response: None, only directories are allowed to have default ACLs. 6027-973 Cannot allocate number byte buffer for ACL.
Explanation: Self-explanatory. User response: Specify a valid group name and reissue the command. 6027-982 name is not a valid ACL entry type.
Explanation: There was not enough available memory to process the request. User response: Contact your system administrator.
Explanation: The specified ACL entry type is not valid. User response: Specify a valid ACL entry type and reissue the command.
6027-983 name is not a valid permission set. 6027-991 Combining FileInherit and DirInherit makes the mask ambiguous.
Explanation: The specified permission set is not valid. User response: Specify a valid permission set and reissue the command. 6027-985 An error was encountered while deleting the ACL (rc=value).
Explanation: Produced by the mmputacl command when WRITE/CREATE is specified without MKDIR (or the other way around), and both the FILE_INHERIT and DIR_INHERIT flags are specified. User response: Make separate FileInherit and DirInherit entries and reissue the command. 6027-992 Subdirectory name already exists. Unable to create snapshot.
Explanation: An unexpected error was encountered by tsdelacl. User response: Examine the return code and contact the IBM Support Center, if necessary. 6027-986 Cannot open fileName.
Explanation: Self-explanatory. User response: Verify the file name and permissions. 6027-987 name is not a valid special name.
Explanation: tsbackup was unable to create a snapshot because the snapshot subdirectory already exists. This condition sometimes is caused by issuing a Tivoli restore operation without specifying a different subdirectory as the target of the restore. User response: Remove or rename the existing subdirectory and then retry the command. 6027-993 Keyword aclType is incorrect. Valid values are: 'posix', 'nfs4', 'native'.
Explanation: Produced by the mmputacl command when the NFS V4 'special' identifier is followed by an unknown special id string. name is one of the following: 'owner@', 'group@', 'everyone@'. User response: Specify a valid NFS V4 special name and reissue the command. 6027-988 type is not a valid NFS V4 type.
Explanation: One of the mm*acl commands specified an incorrect value with the -k option. User response: Correct the aclType value and reissue the command. 6027-994 ACL permissions cannot be denied to the file owner.
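For message 6027-993, the -k option selects the ACL type. A minimal example, assuming a hypothetical file /gpfs/gpfs1/somefile:
  mmgetacl -k nfs4 /gpfs/gpfs1/somefile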
Explanation: Produced by the mmputacl command when the type field in an ACL entry is not one of the supported NFS Version 4 type values. type is one of the following: 'allow' or 'deny'. User response: Specify a valid NFS V4 type and reissue the command. 6027-989 name is not a valid NFS V4 flag.
Explanation: The mmputacl command found that the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions are explicitly being denied to the file owner. This is not permitted, in order to prevent the file being left with an ACL that cannot be modified. User response: Do not select the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions on deny ACL entries for the OWNER. 6027-995 This command will run on a remote node, nodeName.
Explanation: A flag specified in an ACL entry is not one of the supported values, or is not valid for the type of object (inherit flags are valid for directories only). Valid values are FileInherit, DirInherit, and InheritOnly. User response: Specify a valid NFS V4 option and reissue the command. 6027-990 Missing permissions (value found, value are required).
Explanation: The mmputacl command was invoked for a file that resides on a file system in a remote cluster, and UID remapping is enabled. To parse the user and group names from the ACL file correctly, the command will be run transparently on a node in the remote cluster. User response: None. Informational message only.
Explanation: The permissions listed are less than the number required. User response: Add the missing permissions and reissue the command.
6027-996 Error (returnCode) reading policy text from: fileName. 6027-1005 Common is not sole item on [] line number.
Explanation: An error occurred while attempting to open or read the specified policy file. The policy file may be missing or inaccessible. User response: Read all of the related error messages and try to correct the problem. 6027-997 Attention: RULE 'ruleName' attempts to redefine EXTERNAL POOL 'poolName', ignored.
Explanation: A [nodelist] line in the input stream contains common plus any other names. User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified [nodelist] lines are in error, contact the IBM Support Center. If user-specified [nodelist] lines are in error, correct these lines. 6027-1006 Incorrect custom [ ] line number.
Explanation: Execution continues as if the specified rule was not present. User response: Correct or remove the policy rule. 6027-998 Error in FLR/PDR serving for client clientHostNameAndPortNumber: FLRs=numberOfFileListRecords PDRs=numberOfPolicyDecision Responses pdrs=numberOfPolicyDecisionResponse Records
Explanation: A [nodelist] line in the input stream is not of the format: [nodelist]. This covers syntax errors not covered by messages 6027-1004 and 6027-1005. User response: Fix the format of the list of nodes in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified lines are in error, contact the IBM Support Center. If user-specified lines are in error, correct these lines. 6027-1007 attribute found in common multiple times: attribute.
Explanation: A protocol error has been detected among cooperating mmapplypolicy processes. User response: Reissue the command. If the problem persists, contact the IBM Support Center. 6027-999 Authentication failed: myNumericNetworkAddress with partnersNumericNetworkAddress (code= codeIndicatingProtocolStepSequence rc=errnoStyleErrorCode)
Explanation: The attribute specified on the command line is in the main input stream multiple times. This is occasionally legal, such as with the trace attribute. These attributes, however, are not meant to be repaired by mmfixcfg. User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in common sections of the configuration file. 6027-1008 Attribute found in custom multiple times: attribute.
Explanation: Two processes at the specified network addresses failed to authenticate. The cooperating processes should be on the same network; they should not be separated by a firewall. User response: Correct the configuration and try the operation again. If the problem persists, contact the IBM Support Center. 6027-1004 Incorrect [nodelist] format in file: nodeListLine
Explanation: A [nodelist] line in the input stream is not a comma-separated list of nodes. User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command. If no user-specified [nodelist] lines are in error, contact the IBM Support Center. If user-specified [nodelist] lines are in error, correct these lines.
Explanation: The attribute specified on the command line is in a custom section multiple times. This is occasionally legal. These attributes are not meant to be repaired by mmfixcfg. User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in custom sections of the configuration file. 6027-1022 Missing mandatory arguments on command line.
Explanation: Some, but not enough, arguments were specified to the mmcrfsc command.
User response: Specify all arguments as per the usage statement that follows. 6027-1023 File system size must be an integer: value 6027-1033 Option optionName specified twice.
Explanation: An option was specified more than once on the command line. User response: Use options only once. 6027-1034 Missing argument after optionName option.
Explanation: The first two arguments specified to the mmcrfsc command are not integers. User response: File system size is an internal argument. The mmcrfs command should never call the mmcrfsc command without a valid file system size argument. Contact the IBM Support Center. 6027-1028 Incorrect value for -name flag.
Explanation: An option was not followed by an argument. User response: All options need an argument. Specify one. 6027-1035 Option -optionName is mandatory.
Explanation: An incorrect argument was specified with an option that requires one of a limited number of allowable options (for example, -s or any of the yes | no options). User response: Use one of the valid values for the specified option. 6027-1029 Incorrect characters in integer field for -name option.
Explanation: A mandatory input option was not specified. User response: Specify all mandatory options. 6027-1036 Option expected at string.
Explanation: An incorrect character was specified with the indicated option. User response: Use a valid integer for the indicated option. 6027-1030 Value below minimum for -optionLetter option. Valid range is from value to value
Explanation: Something other than an expected option was encountered on the latter portion of the command line. User response: Follow the syntax shown. Options may not have multiple values. Extra arguments are not allowed. 6027-1038 IndirectSize must be <= BlockSize and must be a multiple of LogicalSectorSize (512).
Explanation: The value specified with an option was below the minimum. User response: Use an integer in the valid range for the indicated option. 6027-1031 Value above maximum for option -optionLetter. Valid range is from value to value.
Explanation: The IndirectSize specified was not a multiple of 512 or the IndirectSize specified was larger than BlockSize. User response: Use valid values for IndirectSize and BlockSize. 6027-1039 InodeSize must be a multiple of LocalSectorSize (512).
Explanation: The value specified with an option was above the maximum. User response: Use an integer in the valid range for the indicated option. 6027-1032 Incorrect option optionName.
Explanation: The specified InodeSize was not a multiple of 512. User response: Use a valid value for InodeSize. 6027-1040 InodeSize must be less than or equal to Blocksize.
Explanation: An unknown option was specified. User response: Use only the options shown in the syntax.
Explanation: The specified InodeSize was not less than or equal to Blocksize. User response: Use a valid value for InodeSize.
6027-1042 DefaultMetadataReplicas must be less than or equal to MaxMetadataReplicas. User response: Specify a valid value or increase the value of the allowed block size by specifying a larger value on the maxblocksize parameter of the mmchconfig command. 6027-1113 Incorrect option: option.
Explanation: The specified DefaultMetadataReplicas was greater than MaxMetadataReplicas. User response: Specify a valid value for DefaultMetadataReplicas. 6027-1043 DefaultDataReplicas must be less than or equal to MaxDataReplicas.
Explanation: The specified command option is not valid. User response: Specify a valid option and reissue the command. 6027-1119 Obsolete option: option.
Explanation: The specified DefaultDataReplicas was greater than MaxDataReplicas. User response: Specify a valid value for DefaultDataReplicas. 6027-1055 LogicalSectorSize must be a multiple of 512
Explanation: A command received an option that is not valid any more. User response: Correct the command line and reissue the command. 6027-1120 Interrupt received: No changes made.
Explanation: The specified LogicalSectorSize was not a multiple of 512. User response: Specify a valid LogicalSectorSize. 6027-1056 Blocksize must be a multiple of LogicalSectorSize × 32.
Explanation: A GPFS administration command (mm...) received an interrupt before committing any changes. User response: None. Informational message only. 6027-1123 Disk name must be specified in disk descriptor.
Explanation: The specified Blocksize was not a multiple of LogicalSectorSize × 32. User response: Specify a valid value for Blocksize. 6027-1057 InodeSize must be less than or equal to Blocksize.
Explanation: The disk name positional parameter (the first field) in a disk descriptor was empty. The bad disk descriptor is displayed following this message. User response: Correct the input and rerun the command. 6027-1124 Disk usage must be dataOnly, metadataOnly, descOnly, or dataAndMetadata.
Explanation: The specified InodeSize was not less than or equal to Blocksize. User response: Specify a valid value for InodeSize. 6027-1059 Mode must be M or S: mode
Explanation: The first argument provided in the mmcrfsc command was not M or S. User response: The mmcrfsc command should not be called by a user. If any other command produces this error, contact the IBM Support Center. 6027-1084 The specified block size (valueK) exceeds the maximum allowed block size currently in effect (valueK). Either specify a smaller value for the -B parameter, or increase the maximum block size by issuing: mmchconfig maxblocksize=valueK and restart the GPFS daemon.
Explanation: The disk usage parameter has a value that is not valid. User response: Correct the input and reissue the command. 6027-1132 Interrupt received: changes not propagated.
Explanation: An interrupt was received after changes were committed but before the changes could be propagated to all the nodes. User response: All changes will eventually propagate as nodes recycle or other GPFS administration commands are issued. Changes can be activated now by manually restarting the GPFS daemons.
Explanation: The specified value for block size was greater than the value of the maxblocksize configuration parameter.
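For message 6027-1084, a minimal sketch of raising the maximum block size and restarting the daemon, assuming a hypothetical target of 4M:
  mmchconfig maxblocksize=4M
  mmshutdown -a
  mmstartup -a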
6027-1133 Interrupt received. Only a subset of the parameters were changed. 6027-1142 File fileName already exists.
Explanation: The specified file already exists. User response: Rename the file or specify a different file name and reissue the command. 6027-1143 Cannot open fileName.
Explanation: An interrupt was received in mmchfs before all of the requested changes could be completed. User response: Use mmlsfs to see what the currently active settings are. Reissue the command if you want to change additional parameters. 6027-1135 Restriping may not have finished.
Explanation: A file could not be opened. User response: Verify that the specified file exists and that you have the proper authorizations. 6027-1144 Incompatible cluster types. You cannot move file systems that were created by GPFS cluster type sourceCluster into GPFS cluster type targetCluster.
Explanation: An interrupt occurred during restriping. User response: Restart the restripe. Verify that the file system was not damaged by running the mmfsck command. 6027-1136 option option specified twice.
Explanation: An option was specified multiple times on a command line. User response: Correct the error on the command line and reissue the command. 6027-1137 option value must be yes or no.
Explanation: The source and target cluster types are incompatible. User response: Contact the IBM Support Center for assistance. 6027-1145 parameter must be greater than 0: value
Explanation: A yes or no option was used with something other than yes or no. User response: Correct the error on the command line and reissue the command. 6027-1138 Incorrect extra argument: argument
Explanation: A negative value had been specified for the named parameter, which requires a positive value. User response: Correct the input and reissue the command. 6027-1147 Error converting diskName into an NSD.
Explanation: Non-option arguments followed the mandatory arguments. User response: Unlike most POSIX commands, the main arguments come first, followed by the optional arguments. Correct the error and reissue the command. 6027-1140 Incorrect integer for option: number.
Explanation: Error encountered while converting a disk into an NSD. User response: Check the preceding messages for more information. 6027-1148 File system fileSystem already exists in the cluster. Use mmchfs -W to assign a new device name for the existing file system.
Explanation: An option requiring an integer argument was followed by something that cannot be parsed as an integer. User response: Specify an integer with the indicated option. 6027-1141 No disk descriptor file specified.
Explanation: You are trying to import a file system into the cluster but there is already a file system with the same name in the cluster. User response: Remove or rename the file system with the conflicting name. 6027-1149 fileSystem is defined to have mount point mountpoint. There is already such a mount point in the cluster. Use mmchfs -T to assign a new mount point to the existing file system.
Explanation: An -F flag was not followed by the path name of a disk descriptor file. User response: Specify a valid disk descriptor file.
Explanation: The cluster into which the file system is being imported already contains a file system with the same mount point as the mount point of the file system being imported.
User response: Use the -T option of the mmchfs command to change the mount point of the file system that is already in the cluster and then rerun the mmimportfs command. 6027-1150 Error encountered while importing disk diskName. 6027-1156 The NSD servers for the following free disks were reset or not defined: diskList
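For message 6027-1149, a minimal sketch of moving the existing mount point and rerunning the import, assuming a hypothetical existing file system gpfs1, a new mount point /gpfs/gpfs1_old, and an export file /tmp/exportfile created by mmexportfs:
  mmchfs gpfs1 -T /gpfs/gpfs1_old
  mmimportfs newfs -i /tmp/exportfile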
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks. User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed. 6027-1157 Use the mmchnsd command to assign NSD servers as needed.
Explanation: The mmimportfs command encountered problems while processing the disk. User response: Check the preceding messages for more information. 6027-1151 Disk diskName already exists in the cluster.
Explanation: You are trying to import a file system that has a disk with the same name as some disk from a file system that is already in the cluster. User response: Remove or replace the disk with the conflicting name. 6027-1152 Block size must be 16K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M or 16M.
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks. Check the preceding messages for detailed information. User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed. 6027-1159 The following file systems were not imported: fileSystemList
Explanation: The specified block size value is not valid. User response: Specify a valid block size value. 6027-1153 At least one node in the cluster must be defined as a quorum node.
Explanation: The mmimportfs command was not able to import the specified file systems. Check the preceding messages for error information. User response: Correct the problems and reissue the mmimportfs command. 6027-1160 The drive letters for the following file systems have been reset: fileSystemList.
Explanation: All nodes were explicitly designated or allowed to default to be nonquorum. User response: Specify which of the nodes should be considered quorum nodes and reissue the command. 6027-1154 Incorrect node node specified for command.
Explanation: The drive letters associated with the specified file systems are already in use by existing file systems and have been reset. User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed. 6027-1161 Use the dash character (-) to separate multiple node designations.
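For message 6027-1160, a minimal sketch of assigning a new drive letter after the import completes, assuming a hypothetical file system gpfs1 and free drive letter W:
  mmchfs gpfs1 -t W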
Explanation: The user specified a node that is not valid. User response: Specify a valid node. 6027-1155 The NSD servers for the following disks from file system fileSystem were reset or not defined: diskList
Explanation: A command detected an incorrect character used as a separator in a list of node designations. User response: Correct the command line and reissue the command. 6027-1162 Use the semicolon character (;) to separate the disk names.
Explanation: Either the mmimportfs command encountered disks with no NSD servers, or was forced to reset the NSD server information for one or more disks. User response: After the mmimportfs command finishes, use the mmchnsd command to assign NSD server nodes to the disks as needed.
Explanation: A command detected an incorrect character used as a separator in a list of disk names.
User response: Correct the command line and reissue the command. 6027-1163 GPFS is still active on nodeName. 6027-1169 attribute must be value.
Explanation: The specified value of the given attribute is not valid. User response: Specify a valid value. 6027-1178 parameter must be from value to value: valueSpecified
Explanation: The GPFS daemon was discovered to be active on the specified node during an operation that requires the daemon to be stopped. User response: Stop the daemon on the specified node and rerun the command. 6027-1164 Use mmchfs -t to assign drive letters as needed.
Explanation: A parameter value specified was out of range. User response: Keep the specified value within the range shown. 6027-1188 Duplicate disk specified: disk
Explanation: The mmimportfs command was forced to reset the drive letters associated with one or more file systems. Check the preceding messages for detailed information. User response: After the mmimportfs command finishes, use the -t option of the mmchfs command to assign new drive letters as needed. 6027-1165 The PR attributes for the following disks from file system fileSystem were reset or not yet established: diskList
Explanation: A disk was specified more than once on the command line. User response: Specify each disk only once. 6027-1189 You cannot delete all the disks.
Explanation: The number of disks to delete is greater than or equal to the number of disks in the file system. User response: Delete only some of the disks. If you want to delete them all, use the mmdelfs command. 6027-1197 parameter must be greater than value: value.
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks. User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed. 6027-1166 The PR attributes for the following free disks were reset or not yet established: diskList
Explanation: An incorrect value was specified for the named parameter. User response: Correct the input and reissue the command. 6027-1200 tscrfs failed. Cannot create device
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks. User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed. 6027-1167 Use mmchconfig to enable Persistent Reserve in the cluster as needed.
Explanation: The internal tscrfs command failed. User response: Check the error message from the command that failed. 6027-1201 Disk diskName does not belong to file system fileSystem.
Explanation: The mmimportfs command disabled the Persistent Reserve attribute for one or more disks. User response: After the mmimportfs command finishes, use the mmchconfig command to enable Persistent Reserve in the cluster as needed. 6027-1168 Inode size must be 512, 1K or 4K.
Explanation: The specified disk was not found to be part of the cited file system. User response: If the disk and file system were specified as part of a GPFS command, reissue the command with a disk that belongs to the specified file system. 6027-1203 Attention: File system fileSystem may have some disks that are in a non-ready state. Issue the command: mmcommon recoverfs fileSystem
Explanation: The specified inode size is not valid. User response: Specify a valid inode size.
Explanation: The specified file system may have some disks that are in a non-ready state. User response: Run mmcommon recoverfs fileSystem to ensure that the GPFS configuration data for the file system is current, and then display the states of the disks in the file system using the mmlsdisk command. If any disks are in a non-ready state, steps should be taken to bring these disks into the ready state, or to remove them from the file system. This can be done by mounting the file system, or by using the mmchdisk command for a mounted or unmounted file system. When maintenance is complete or the failure has been repaired, use the mmchdisk command with the start option. If the failure cannot be repaired without loss of data, you can use the mmdeldisk command to delete the disks. 6027-1204 command failed. 6027-1208 File system fileSystem not found in cluster clusterName.
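For message 6027-1203, the recovery sequence in the user response could look like the following, assuming a hypothetical file system gpfs1:
  mmcommon recoverfs gpfs1
  mmlsdisk gpfs1
  mmchdisk gpfs1 start -a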
Explanation: The specified file system does not belong to the cited remote cluster. The local information about the file system is not current. The file system may have been deleted, renamed, or moved to a different cluster. User response: Contact the administrator of the remote cluster that owns the file system and verify the accuracy of the local information. Use the mmremotefs show command to display the local information about the file system. Use the mmremotefs update command to make the necessary changes. 6027-1209 GPFS is down on this node.
Explanation: GPFS is not running on this node. Explanation: An internal command failed. This is usually a call to the GPFS daemon. User response: Check the error message from the command that failed. 6027-1205 Failed to connect to remote cluster clusterName. User response: Ensure that GPFS is running and reissue the command. 6027-1210 GPFS is not ready to handle commands yet.
Explanation: GPFS is in the process of initializing or waiting for quorum to be reached. User response: Reissue the command. 6027-1211 fileSystem refers to file system fileSystem in cluster clusterName.
Explanation: Attempt to establish a connection to the specified cluster was not successful. This can be caused by a number of reasons: GPFS is down on all of the contact nodes, the contact node list is obsolete, the owner of the remote cluster revoked authorization, and so forth. User response: If the error persists, contact the administrator of the remote cluster and verify that the contact node information is current and that the authorization key files are current as well. 6027-1206 File system fileSystem belongs to cluster clusterName. Command is not allowed for remote file systems.
Explanation: Informational message. User response: None. 6027-1212 File system fileSystem does not belong to cluster clusterName.
Explanation: The specified file system refers to a file system that is remote to the cited cluster. Indirect remote file system access is not allowed. User response: Contact the administrator of the remote cluster that owns the file system and verify the accuracy of the local information. Use the mmremotefs show command to display the local information about the file system. Use the mmremotefs update command to make the necessary changes. 6027-1213 command failed. Error code errorCode.
Explanation: The specified file system is not local to the cluster, but belongs to the cited remote cluster. User response: Choose a local file system, or issue the command on a node in the remote cluster. 6027-1207 There is already an existing file system using value.
Explanation: The mount point or device name specified matches that of an existing file system. The device name and mount point must be unique within a GPFS cluster. User response: Choose an unused name or path.
Explanation: An internal command failed. This is usually a call to the GPFS daemon. User response: Examine the error code and other messages to determine the reason for the failure. Correct the problem and reissue the command.
6027-1214 Unable to enable Persistent Reserve on the following disks: diskList 6027-1220 Node nodeName cannot be used as an NSD server for Persistent Reserve disk diskName because it is not an AIX node.
Explanation: The command was unable to set up all of the disks to use Persistent Reserve. User response: Examine the disks and the additional error information to determine if the disks should have supported Persistent Reserve. Correct the problem and reissue the command. 6027-1215 Unable to reset the Persistent Reserve attributes on one or more disks on the following nodes: nodeList
Explanation: The node shown was specified as an NSD server for diskName, but the node does not support Persistent Reserve. User response: Specify a node that supports Persistent Reserve as an NSD server. 6027-1221 The number of NSD servers exceeds the maximum (value) allowed.
Explanation: The command could not reset Persistent Reserve on at least one disk on the specified nodes. User response: Examine the additional error information to determine whether nodes were down or if there was a disk error. Correct the problems and reissue the command. 6027-1216 File fileName contains additional error information.
Explanation: The number of NSD servers in the disk descriptor exceeds the maximum allowed. User response: Change the disk descriptor to specify no more NSD servers than the maximum allowed. 6027-1222 Cannot assign a minor number for file system fileSystem (major number deviceMajorNumber).
Explanation: The command was not able to allocate a minor number for the new file system. User response: Delete unneeded /dev entries for the specified major number and reissue the command. 6027-1223 ipAddress cannot be used for NFS serving; it is used by the GPFS daemon.
Explanation: The command generated a file containing additional error information. User response: Examine the additional error information. 6027-1217 A disk descriptor contains an incorrect separator character.
Explanation: A command detected an incorrect character used as a separator in a disk descriptor. User response: Correct the disk descriptor and reissue the command. 6027-1218 Node nodeName does not have a GPFS server license designation.
Explanation: The IP address shown has been specified for use by the GPFS daemon. The same IP address cannot be used for NFS serving because it cannot be failed over. User response: Specify a different IP address for NFS use and reissue the command. 6027-1224 There is no file system with drive letter driveLetter.
Explanation: The function that you are assigning to the node requires the node to have a GPFS server license. User response: Use the mmchlicense command to assign a valid GPFS license to the node or specify a different node. 6027-1219 NSD discovery on node nodeName failed with return code value.
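For message 6027-1218, a minimal sketch of assigning a server license, assuming a hypothetical node node1:
  mmchlicense server --accept -N node1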
Explanation: No file system in the GPFS cluster has the specified drive letter. User response: Reissue the command with a valid file system. 6027-1225 Explicit drive letters are supported only in a Windows environment. Specify a mount point or allow the default settings to take effect.
Explanation: The NSD discovery process on the specified node failed with the specified return code. User response: Determine why the node cannot access the specified NSDs. Correct the problem and reissue the command.
Explanation: An explicit drive letter was specified on the mmmount command but the target node does not run the Windows operating system. User response: Specify a mount point or allow the default settings for the file system to take effect.
6027-1226 Explicit mount points are not supported in a Windows environment. Specify a drive letter or allow the default settings to take effect. 6027-1232 GPFS failed to initialize the tiebreaker disks.
Explanation: An explicit mount point was specified on the mmmount command but the target node runs the Windows operating system. User response: Specify a drive letter or allow the default settings for the file system to take effect. 6027-1227 The main GPFS cluster configuration file is locked. Retrying...
Explanation: A GPFS command unsuccessfully attempted to initialize the node quorum tiebreaker disks. User response: Examine prior messages to determine why GPFS was unable to initialize the tiebreaker disks and correct the problem. After that, reissue the command. 6027-1233 Incorrect keyword: value.
Explanation: Another GPFS administration command has locked the cluster configuration file. The current process will try to obtain the lock a few times before giving up. User response: None. Informational message only. 6027-1228 Lock creation successful.
Explanation: A command received a keyword that is not valid. User response: Correct the command line and reissue the command. 6027-1234 Adding node node to the cluster will exceed the quorum node limit.
Explanation: The holder of the lock has released it and the current process was able to obtain it. User response: None. Informational message only. The command will now continue. 6027-1229 Timed out waiting for lock. Try again later.
Explanation: An attempt to add the cited node to the cluster resulted in the quorum node limit being exceeded. User response: Change the command invocation to not exceed the node quorum limit, and reissue the command. 6027-1235 The fileName kernel extension does not exist.
Explanation: Another GPFS administration command kept the main GPFS cluster configuration file locked for over a minute. User response: Try again later. If no other GPFS administration command is presently running, see GPFS cluster configuration data files are locked on page 42. 6027-1230 diskName is a tiebreaker disk and cannot be deleted.
Explanation: The cited kernel extension does not exist. User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment. 6027-1236 Unable to verify kernel/module configuration.
Explanation: A request was made to GPFS to delete a node quorum tiebreaker disk. User response: Specify a different disk for deletion. 6027-1231 GPFS detected more than eight quorum nodes while node quorum with tiebreaker disks is in use.
Explanation: The mmfslinux kernel extension does not exist. User response: Create the needed kernel extension by compiling a custom mmfslinux module for your kernel (see steps in /usr/lpp/mmfs/src/README), or copy the binaries from another node with the identical environment. 6027-1237 The GPFS daemon is still running; use the mmshutdown command.
Explanation: A GPFS command detected more than eight quorum nodes, but this is not allowed while node quorum with tiebreaker disks is in use. User response: Reduce the number of quorum nodes to a maximum of eight, or use the normal node quorum algorithm.
Explanation: An attempt was made to unload the GPFS kernel extensions while the GPFS daemon was still running. User response: Use the mmshutdown command to shut down the daemon.
6027-1238 Module fileName is still in use. Unmount all GPFS file systems and issue the command: mmfsadm cleanup mmcrcluster command and cannot be overridden by the user. User response: None. Informational message only. 6027-1245 configParameter must be set with the command command. Line in error: configLine. The line is ignored; processing continues.
Explanation: An attempt was made to unload the cited module while it was still in use. User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine. 6027-1239 Error unloading module moduleName.
Explanation: The specified parameter has additional dependencies and cannot be specified prior to the completion of the mmcrcluster command. User response: After the cluster is created, use the specified command to establish the desired configuration parameter. 6027-1246 configParameter is an obsolete parameter. Line in error: configLine. The line is ignored; processing continues.
Explanation: GPFS was unable to unload the cited module. User response: Unmount all GPFS file systems and issue the command mmfsadm cleanup. If this does not solve the problem, reboot the machine. 6027-1240 Module fileName is already loaded.
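As a rough illustration of the recovery sequence suggested for messages 6027-1238 and 6027-1239 (a sketch only; verify the commands against your installation before using them):
  mmumount all        # unmount all GPFS file systems on this node
  mmfsadm cleanup     # release GPFS kernel resources so the module can be unloaded
If the module still reports as busy after these commands, reboot the node as described above.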
Explanation: An attempt was made to load the cited module, but it was already loaded. User response: None. Informational message only. 6027-1241 diskName was not found in /proc/partitions.
Explanation: The specified parameter is not used by GPFS anymore. User response: None. Informational message only. 6027-1247 configParameter cannot appear in a node-override section. Line in error: configLine. The line is ignored; processing continues.
Explanation: The cited disk was not found in /proc/partitions. User response: Take steps to cause the disk to appear in /proc/partitions, and then reissue the command. 6027-1242 GPFS is waiting for requiredCondition. 6027-1248
Explanation: The specified parameter must have the same value across all nodes in the cluster. User response: None. Informational message only. Mount point can not be a relative path name: path
Explanation: GPFS is unable to come up immediately due to the stated required condition not being satisfied yet. User response: This is an informational message. As long as the required condition is not satisfied, this message will repeat every five minutes. You may want to stop the GPFS daemon after a while, if it will be a long time before the required condition will be met. 6027-1243 command: Processing user configuration file: fileName
Explanation: The mount point does not begin with /. User response: Specify the absolute path name for the mount point. 6027-1249 operand can not be a relative path name: path.
Explanation: The specified path name does not begin with '/'. User response: Specify the absolute path name. 6027-1250 Key file is not valid.
Explanation: Progress information for the mmcrcluster command. User response: None. Informational message only. 6027-1244 configParameter is set by the mmcrcluster processing. Line in error: configLine. The line will be ignored; processing continues.
Explanation: While attempting to establish a connection to another node, GPFS detected that the format of the public key file is not valid. User response: Use the mmremotecluster command to specify the correct public key.
6027-1251 Key file mismatch. 6027-1257 There are uncommitted authentication files. You must first run: command.
Explanation: While attempting to establish a connection to another node, GPFS detected that the public key file does not match the public key file of the cluster to which the file system belongs. User response: Use the mmremotecluster command to specify the correct public key. 6027-1252 Node nodeName already belongs to the GPFS cluster.
Explanation: You are attempting to generate new public/private key files but previously generated files have not been committed yet. User response: Run the specified command to commit the current public/private key pair. 6027-1258 You must establish a cipher list first. Run: command.
Explanation: A GPFS command found that a node to be added to a GPFS cluster already belongs to the cluster. User response: Specify a node that does not already belong to the GPFS cluster. 6027-1253 Incorrect value for option option.
Explanation: You are attempting to commit an SSL private key but a cipher list has not been established yet. User response: Run the specified command to specify a cipher list. 6027-1259 command not found. Ensure the OpenSSL code is properly installed.
Explanation: The provided value for the specified option is not valid. User response: Correct the error and reissue the command. 6027-1254 Warning: Not all nodes have proper GPFS license designations. Use the mmchlicense command to designate licenses as needed.
Explanation: The specified command was not found. User response: Ensure the OpenSSL code is properly installed and reissue the command. 6027-1260 For the logical volume specification -l lvName to be valid lvName must be the only logical volume in the volume group. However, volume group vgName contains the following logical volumes:
Explanation: Not all nodes in the cluster have valid license designations. User response: Use mmlslicense to see the current license designations. Use mmchlicense to assign valid GPFS licenses to all nodes as needed. 6027-1255 There is nothing to commit. You must first run: command.
Explanation: The specified logical volume is not the only logical volume in the volume group. The others are listed. User response: Check your logical volume and volume group names. Reissue the command. 6027-1261 logicalVolume is not a valid logical volume.
Explanation: You are attempting to commit an SSL private key but such a key has not been generated yet. User response: Run the specified command to generate the public/private key pair. 6027-1256 The current authentication files are already committed.
Explanation: The specified logical volume is not a valid logical volume, with a corresponding volume group. User response: Reissue the command using a valid logical volume. 6027-1262 vgName is not a valid volume group name.
Explanation: You are attempting to commit public/private key files that were previously generated with the mmauth command. The files have already been committed. User response: None. Informational message.
Explanation: The specified volume group name is not correct. User response: Reissue the command using a valid volume group name.
6027-1263 For the hdisk specification -h physicalDiskName to be valid, physicalDiskName must be the only disk in the volume group. However, volume group vgName contains the following disks: 6027-1271 Unexpected error from command. Return code: value.
Explanation: A GPFS administration command (mm...) received an unexpected error code from an internally called command. User response: Perform problem determination. See GPFS commands are unsuccessful on page 55. 6027-1272 Unknown user name userName.
Explanation: The specified volume group was found to contain the disks listed. User response: Check your configuration. Ensure that you are referring to the correct volume group and hdisks. Remove extraneous disks from the volume group and reissue the command. 6027-1264 physicalDiskName is not a valid physical volume name.
Explanation: The specified value cannot be resolved to a valid user ID (UID). User response: Reissue the command with a valid user name. 6027-1273 Unknown group name groupName.
Explanation: The specified physical disk name is not correct. User response: Reissue the command using a valid physical disk name. 6027-1265 pvid is not a valid physical volume id.
Explanation: The specified value cannot be resolved to a valid group ID (GID). User response: Reissue the command with a valid group name. 6027-1274 Unexpected error obtaining the lockName lock.
Explanation: The specified physical volume ID is not correct. User response: Reissue the command using a valid physical volume ID. 6027-1268 Missing arguments.
Explanation: GPFS cannot obtain the specified lock. User response: Examine any previous error messages. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center. 6027-1275 Daemon node adapter Node was not found on admin node Node.
Explanation: A GPFS administration command received an insufficient number of arguments. User response: Correct the command line and reissue the command. 6027-1269 The device name device starts with a slash, but not /dev/.
Explanation: An input node descriptor was found to be incorrect. The node adapter specified for GPFS daemon communications was not found to exist on the cited GPFS administrative node. User response: Correct the input node descriptor and reissue the command. 6027-1276 Command failed for disks: diskList.
Explanation: The device name does not start with /dev/. User response: Correct the device name. 6027-1270 The device name device contains a slash, but not as its first character.
Explanation: A GPFS command was unable to complete successfully on the listed disks. User response: Correct the problems and reissue the command. 6027-1277 No contact nodes were provided for cluster clusterName.
Explanation: The specified device name contains a slash, but the first character is not a slash. User response: The device name must be an unqualified device name or an absolute device path name, for example: fs0 or /dev/fs0.
Explanation: A GPFS command found that no contact nodes have been specified for the cited cluster. User response: Use the mmremotecluster command to specify some contact nodes for the cited cluster.
6027-1278 None of the contact nodes in cluster clusterName can be reached. 6027-1291 Options name and name cannot be specified at the same time.
Explanation: A GPFS command was unable to reach any of the contact nodes for the cited cluster. User response: Determine why the contact nodes for the cited cluster cannot be reached and correct the problem, or use the mmremotecluster command to specify some additional contact nodes that can be reached. 6027-1287 Node nodeName returned ENODEV for disk diskName.
Explanation: Incompatible options were specified on the command line. User response: Select one of the options and reissue the command. 6027-1292 nodeList cannot be used with attribute name.
Explanation: The specified configuration attribute cannot be changed on only a subset of nodes. This attribute must be the same on all nodes in the cluster. User response: Certain attributes, such as autoload, may not be customized from node to node. Change the attribute for the entire cluster. 6027-1293 There are no remote file systems.
Explanation: The specified node returned ENODEV for the specified disk. User response: Determine the cause of the ENODEV error for the specified disk and rectify it. The ENODEV may be due to disk fencing or the removal of a device that previously was present. 6027-1288 Remote cluster clusterName was not found.
Explanation: A value of all was specified for the remote file system operand of a GPFS command, but no remote file systems are defined. User response: None. There are no remote file systems on which to operate. 6027-1294 Remote file system fileSystem is not defined.
Explanation: A GPFS command found that the cited cluster has not yet been identified to GPFS as a remote cluster. User response: Specify a remote cluster known to GPFS, or use the mmremotecluster command to make the cited cluster known to GPFS. 6027-1289 Name name is not allowed. It contains the following invalid special character: char
Explanation: The specified file system was used for the remote file system operand of a GPFS command, but the file system is not known to GPFS. User response: Specify a remote file system known to GPFS. 6027-1295 The GPFS configuration information is incorrect or not available.
Explanation: The cited name is not allowed because it contains the cited invalid special character. User response: Specify a name that does not contain an invalid special character, and reissue the command. 6027-1290 GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data for the file system. Issue the command: mmcommon recoverfs fileSystem
Explanation: A problem has been encountered while verifying the configuration information and the execution environment. User response: Check the preceding messages for more information. Correct the problem and restart GPFS. 6027-1296 Device name cannot be 'all'.
Explanation: GPFS detected that the GPFS configuration database data for the specified file system may not be in agreement with the on-disk data for the file system. This may be caused by a GPFS disk command that did not complete normally. User response: Issue the specified command to bring the GPFS configuration database into agreement with the on-disk data.
Explanation: A device name of all was specified on a GPFS command. User response: Reissue the command with a valid device name.
6027-1297 Each device specifies metadataOnly for disk usage. This file system could not store data. 6027-1303 The concurrent virtual shared disk flag cannot be applied to previously-created virtual shared disk diskName. The flag will be ignored.
Explanation: All disk descriptors specify metadataOnly for disk usage. User response: Change at least one disk descriptor in the file system to indicate the usage of dataOnly or dataAndMetadata. 6027-1298 Each device specifies dataOnly for disk usage. This file system could not store metadata.
Explanation: The concurrent virtual shared disk flag was specified, but it cannot be applied to the cited virtual shared disk since it has already been created. The flag will be ignored. User response: None. Informational message only. 6027-1304 Missing argument after option option.
Explanation: All disk descriptors specify dataOnly for disk usage. User response: Change at least one disk descriptor in the file system to indicate a usage of metadataOnly or dataAndMetadata.
Explanation: The specified command option requires a value. User response: Specify a value and reissue the command. 6027-1305 Cannot execute (command): return code value
6027-1299
Explanation: A command was not successfully invoked. User response: Determine why the command is not accessible. 6027-1306 Command command failed with return code value.
Explanation: The specified failure group is not valid. User response: Correct the problem and reissue the command. 6027-1300 No file systems were found.
Explanation: A GPFS command searched for file systems, but none were found. User response: Create a GPFS file system before reissuing the command. 6027-1301 The NSD servers specified in the disk descriptor do not match the NSD servers currently in effect.
Explanation: A command was not successfully processed. User response: Correct the failure specified by the command and reissue the command. 6027-1307 Disk disk on node nodeName already has a volume group vgName that does not appear to have been created by this program in a prior invocation. Correct the descriptor file or remove the volume group and retry.
Explanation: The set of NSD servers specified in the disk descriptor does not match the set that is currently in effect. User response: Specify the same set of NSD servers in the disk descriptor as is currently in effect or omit it from the disk descriptor and then reissue the command. Use the mmchnsd command to change the NSD servers as needed. 6027-1302 clusterName is the name of the local cluster.
Explanation: The specified disk already belongs to a volume group. User response: Either remove the volume group or remove the disk descriptor and retry. 6027-1308 Disk disk on node nodeName already has a logical volume vgName that doesn't appear to have been created by this program in a prior invocation. Correct the descriptor file or remove the logical volume and retry.
Explanation: The cited cluster name was specified as the name of a remote cluster, but it is already being used as the name of the local cluster. User response: Use the mmchcluster command to change the name of the local cluster, and then reissue the command that failed.
Explanation: The specified disk already has a logical volume. User response: Either remove the logical volume or remove the disk descriptor and retry.
6027-1309 Disk disk on node nodeName already has multiple logical volumes that don't appear to have been created by this program in a prior invocation. Correct the descriptor file or remove the logical volumes and retry. 6027-1325 A descriptor was encountered whose volume (volumeName) is not a previously-defined virtual shared disk and for which no primary server was specified.
Explanation: The specified disk already had multiple logical volumes. User response: Either remove the logical volumes or remove the disk descriptor and retry. 6027-1311 The global volume group vgName we're attempting to define for node nodeName and disk disk is already defined with different server nodes or volume group. Correct the descriptor file or remove the offending global volume group and retry.
Explanation: A disk descriptor specified a volume that was not a previously-defined virtual shared disk and for which no primary server was specified. User response: Either specify an existing virtual shared disk as the volume, or specify a primary server within the offending disk descriptor. 6027-1327 Cannot open (fileName): return code value.
Explanation: The file could not be opened. User response: Either correct the permission bits or specify the correct file name. 6027-1328 Cannot determine node number for host nodeName. Either the node is not known to the IBM Virtual Shared Disk subsystem, or it is not a GPFS administration node name.
Explanation: The global volume group was defined with different parameters. User response: Either remove the global volume group or remove the disk descriptor and retry. 6027-1312 The virtual shared disk diskName we're attempting to define for global volume group vgName, node nodeName, and logical volume lvName is already defined with different parameters. Correct the descriptor file or remove the offending virtual shared disk and retry.
Explanation: A lookup in the cluster configuration data could not determine the node number. This can be caused by specifying a node interface that is not known to the IBM Virtual Shared Disk subsystem, or by specifying a node interface other than a GPFS administration node name. User response: Correct the node or host definition in the SDR. 6027-1330 diskName is already in volume group vgName and cannot be added to vgName.
Explanation: The virtual shared disk was already defined. User response: Either remove the virtual shared disk or remove the disk descriptor and retry. 6027-1323 A prior invocation of this command has recorded a partial completion in the file name. Should we restart at prior failing step (number)? [y/n]
Explanation: The specified disk already belongs to a volume group. User response: Either remove the volume group or specify a different disk. 6027-1332 Cannot find disk with command.
Explanation: The mmcrvsd command was restarted on a prior failing descriptor file. User response: Either allow the command to continue at the failing step or remove the failing step specification from the descriptor file. 6027-1324 Unable to rewrite new descriptor to the file fileName. Ensure the file system has enough space and retry.
Explanation: The specified disk cannot be found. User response: Specify a correct disk name. 6027-1333 The following nodes could not be restored: nodeList. Correct the problems and use the mmsdrrestore command to recover these nodes.
Explanation: There was either not enough space or the file system was write protected. User response: Either expand the file system or remove the write protection.
Explanation: The mmsdrrestore command was unable to restore the configuration information for the listed nodes. User response: Correct the problems and reissue the
mmsdrrestore command for these nodes. 6027-1334 Incorrect value for option option. Valid values are: validValues. 6027-1339 Disk usage value is incompatible with storage pool name.
Explanation: An incorrect argument was specified with an option requiring one of a limited number of legal options. User response: Use one of the legal values for the indicated option. 6027-1335 Command completed: Not all required changes were made.
Explanation: A disk descriptor specified a disk usage involving metadata and a storage pool other than system. User response: Change the descriptor's disk usage field to dataOnly, or do not specify a storage pool name. 6027-1340 File fileName not found. Recover the file or run mmauth genkey.
Explanation: The cited file was not found. User response: Recover the file or run the mmauth genkey command to recreate it. 6027-1341 Starting force unmount of GPFS file systems.
Explanation: Some, but not all, of the required changes were made. User response: Examine the preceding messages, correct the problems, and reissue the command. 6027-1336 Volume group vgName cannot be imported on node nodeName because the disk with physical volume ID value cannot be found.
Explanation: Progress information for the mmshutdown command. User response: None. Informational message only. 6027-1342 Unmount not finished after value seconds. Waiting value more seconds.
Explanation: The specified volume group cannot be imported because a disk with the specified pvid cannot be found. This problem may be caused by another node having a reserve on the disk, thus preventing access by the node trying to import the disk. User response: Ensure the disk with the specified physical volume ID is known on the specified node. If a node in the cluster has a reserve on the disk, release the reserve. When the disk with the specified physical volume ID is known on the specified node, reissue the command. 6027-1337 Failed to obtain DCE credentials; dsrvtgt name command rc= value. Continuing.
Explanation: Progress information for the mmshutdown command. User response: None. Informational message only. 6027-1343 Unmount not finished after value seconds.
Explanation: Progress information for the mmshutdown command. User response: None. Informational message only. 6027-1344 Shutting down GPFS daemons.
Explanation: An attempt to obtain spbgroot DCE credentials has failed. Processing continues, but there may be an authentication failure later on. User response: Go to the Parallel System Support Programs for AIX: Diagnosis Guide and search on diagnosing per node key management (PNKM) problems. Follow the problem determination and repair actions specified. 6027-1338 Command is not allowed for remote file systems.
Explanation: Progress information for the mmshutdown command. User response: None. Informational message only. 6027-1345 Finished.
Explanation: Progress information for the mmshutdown command. User response: None. Informational message only. 6027-1347 Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id
Explanation: A command for which a remote file system is not allowed was issued against a remote file system. User response: Choose a local file system, or issue the command on a node in the cluster that owns the file system.
Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk. User response: Issue the specified command to remove the NSD volume id from the disk. 6027-1348 Disk with NSD volume id NSD volume id no longer exists in the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSD volume id -N nodeNameList 6027-1356 The same primary and backup server node (nodeName) cannot be specified for disk (disk).
Explanation: The primary and backup server nodes specified for a virtual shared disk are the same node. User response: Specify different nodes for the primary and backup servers. 6027-1357 An internode connection between GPFS nodes was disrupted.
Explanation: An internode connection between GPFS nodes was disrupted, preventing its successful completion. User response: Reissue the command. If the problem recurs, determine and resolve the cause of the disruption. If the problem persists, contact the IBM Support Center. 6027-1358 No clusters are authorized to access this cluster.
Explanation: A GPFS administration command (mm...) successfully removed the disk with the specified NSD volume id from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk. User response: Issue the specified command to remove the NSD volume id from the disk. 6027-1352 fileSystem is not a remote file system known to GPFS.
Explanation: Self-explanatory. User response: This is an informational message. 6027-1359 Cluster clusterName is not authorized to access this cluster.
Explanation: The cited file system is not the name of a remote file system known to GPFS. User response: Use the mmremotefs command to identify the cited file system to GPFS as a remote file system, and then reissue the command that failed. 6027-1354 A prior invocation of this command has recorded a partial completion in the file (fileName). We will restart at prior failing step (value).
Explanation: Self-explanatory. User response: This is an informational message. 6027-1361 Attention: There are no available valid VFS type values for mmfs in /etc/vfs.
Explanation: The mmcrvsd command was restarted on a prior failing descriptor file. User response: None. Informational message only. 6027-1355 The nodes (nodeName) and (nodeName) specified for disk (disk) must be defined within the same concurrent IBM Virtual Shared Disk cluster.
Explanation: An out of range number was used as the vfs number for GPFS. User response: The valid range is 8 through 32. Check /etc/vfs and remove unneeded entries. 6027-1362 There are no remote cluster definitions.
Explanation: An attempt had been made to create concurrent virtual shared disks and the specified nodes are either not defined in the concurrent IBM Virtual Shared Disk cluster, or the specified nodes are not in the same cluster. User response: Specify nodes in the same concurrent IBM Virtual Shared Disk cluster using the vsdnode command.
Explanation: A value of all was specified for the remote cluster operand of a GPFS command, but no remote clusters are defined. User response: None. There are no remote clusters on which to operate. 6027-1363 Remote cluster clusterName is not defined.
Explanation: The specified cluster was specified for the remote cluster operand of a GPFS command, but the cluster is not known to GPFS. User response: Specify a remote cluster known to GPFS.
6027-1364 No disks specified 6027-1369 Virtual shared disk diskName that was specified as input does not have a value defined for global volume group or logical volume name. The defined global volume group is vgName and the defined logical volume name is lvName.
Explanation: There were no disks in the descriptor list or file. User response: Specify at least one disk. 6027-1365 Disk diskName already belongs to file system fileSystem.
Explanation: The specified disk name is already assigned to a GPFS file system. This may be because the disk was specified more than once as input to the command, or because the disk was assigned to a GPFS file system in the past. User response: Specify the disk only once as input to the command, or specify a disk that does not belong to a file system. 6027-1366 File system fileSystem has some disks that are in a non-ready state.
Explanation: The specified input virtual shared disk does not have values defined for its global volume group or its logical volume name. User response: Recreate the offending virtual shared disk and try again. 6027-1370 The following nodes could not be reached:
Explanation: A GPFS command was unable to communicate with one or more nodes in the cluster. A list of the nodes that could not be reached follows. User response: Determine why the reported nodes could not be reached and resolve the problem. 6027-1371 Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
Explanation: The specified file system has some disks that are in a non-ready state. User response: Run mmcommon recoverfs fileSystem to ensure that the GPFS configuration data for the file system is current. If some disks are still in a non-ready state, display the states of the disks in the file system using the mmlsdisk command. Any disks in an undesired non-ready state should be brought into the ready state by using the mmchdisk command or by mounting the file system. If these steps do not bring the disks into the ready state, use the mmdeldisk command to delete the disks from the file system. 6027-1367 Attention: Not all disks were marked as available.
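A minimal sketch of that sequence, assuming a file system named gpfs1 and a disk named gpfs1nsd (both names are hypothetical placeholders):
  mmcommon recoverfs gpfs1           # bring the GPFS configuration data for the file system up to date
  mmlsdisk gpfs1                     # display the state of each disk in the file system
  mmchdisk gpfs1 start -d gpfs1nsd   # attempt to bring a non-ready disk back to the ready state
  mmdeldisk gpfs1 gpfs1nsd           # last resort: remove the disk from the file system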
Explanation: A process is initiated to distribute the cluster configuration data to other nodes in the cluster. User response: This is an informational message. The command does not wait for the distribution to finish. 6027-1373 There is no file system information in input file fileName.
Explanation: The cited input file passed to the mmimportfs command contains no file system information. No file system can be imported. User response: Reissue the mmimportfs command while specifying a valid input file. 6027-1374 File system fileSystem was not found in input file fileName.
Explanation: The process of marking the disks as available could not be completed. User response: Before adding these disks to a GPFS file system, you should either reformat them, or use the -v no option on the mmcrfs or mmadddisk command. 6027-1368 This GPFS cluster contains declarations for remote file systems and clusters. You cannot delete the last node.
Explanation: The specified file system was not found in the input file passed to the mmimportfs command. The file system cannot be imported. User response: Reissue the mmimportfs command while specifying a file system that exists in the input file. 6027-1375 The following file systems were not imported: fileSystem.
Explanation: An attempt has been made to delete a GPFS cluster that still has declarations for remote file systems and clusters. User response: Before deleting the last node of a GPFS cluster, delete all remote cluster and file system information. Use the delete option of the mmremotecluster and mmremotefs commands.
Explanation: The mmimportfs command was unable to import one or more of the file systems in the input file. A list of the file systems that could not be imported follows.
User response: Examine the preceding messages, rectify the problems that prevented the importation of the file systems, and reissue the mmimportfs command. 6027-1376 Disk nsdName with pvid pvid already exists in the cluster configuration file. If this NSD is built on a virtual shared disk that no longer exists, remove it from the cluster configuration file by issuing: mmdelnsd nsdName. 6027-1381 Mount point cannot be specified when mounting all file systems.
Explanation: A device name of all and a mount point were specified on the mmmount command. User response: Reissue the command with a device name for a single file system or do not specify a mount point. 6027-1382 This node does not belong to a GPFS cluster.
Explanation: The command could not create the desired disk because there is already a disk with the same pvid recorded in the cluster configuration file /var/mmfs/gen/mmsdrfs. User response: If nsdName is built on a virtual shared disk that no longer exists, issue the specified command to remove the NSD from the cluster configuration file. 6027-1377 Attention: Unknown attribute specified: name. Press the ENTER key to continue.
Explanation: The specified node does not appear to belong to a GPFS cluster, or the GPFS configuration information on the node has been lost. User response: Informational message. If you suspect that there is corruption of the GPFS configuration information, recover the data following the procedures outlined in Recovery from loss of GPFS cluster configuration data file on page 42. 6027-1383 There is no record for this node in file fileName. Either the node is not part of the cluster, the file is for a different cluster, or not all of the node's adapter interfaces have been activated yet.
Explanation: The mmchconfig command received an unknown attribute. User response: Unless directed otherwise by the IBM Support Center, press any key to bypass this attribute. 6027-1378 Incorrect record found in the mmsdrfs file (code value):
Explanation: A line that is not valid was detected in the main GPFS cluster configuration file /var/mmfs/gen/mmsdrfs. User response: The data in the cluster configuration file is incorrect. If no user modifications have been made to this file, contact the IBM Support Center. If user modifications have been made, correct these modifications. 6027-1379 There is no file system with mount point mountpoint.
Explanation: The mmsdrrestore command cannot find a record for this node in the specified cluster configuration file. The search of the file is based on the currently active IP addresses of the node as reported by the ifconfig command. User response: Ensure that all adapter interfaces are properly functioning. Ensure that the correct GPFS configuration file is specified on the command line. If the node indeed is not a member of the cluster, use the mmaddnode command instead. 6027-1386 Unexpected value for Gpfs object: value.
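For example, after confirming that the node's adapter interfaces are up, the configuration can typically be restored from another node with a sketch along these lines (the node name is a placeholder, and option availability may vary by release):
  ifconfig -a                      # verify that the expected adapter interfaces are active
  mmsdrrestore -p primaryNodeName  # restore this node's GPFS configuration from the named node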
Explanation: A function received a value that is not allowed for the Gpfs object. User response: Perform problem determination. 6027-1388 File system fileSystem is not known to the GPFS cluster.
Explanation: No file system in the GPFS cluster has the specified mount point. User response: Reissue the command with a valid file system. 6027-1380 File system fileSystem is already mounted at mountpoint.
Explanation: The file system was not found in the GPFS cluster. User response: If the file system was specified as part of a GPFS command, reissue the command with a valid file system.
Explanation: The specified file system is mounted at a mount point different than the one requested on the mmmount command. User response: Unmount the file system and reissue the command.
6027-1390 Node node does not belong to the GPFS cluster, or was specified as input multiple times. 6027-1501 Volume label of disk name is name, should be uid.
Explanation: Nodes that are not valid were specified. User response: Verify the list of nodes. All specified nodes must belong to the GPFS cluster, and each node can be specified only once. 6027-1393 Incorrect node designation specified: type.
Explanation: The UID in the disk descriptor does not match the expected value from the file system descriptor. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk. User response: Check the disk configuration. 6027-1502 Volume label of disk diskName is corrupt.
Explanation: A node designation that is not valid was specified. Valid values are client or manager. User response: Correct the command line and reissue the command. 6027-1394 Operation not allowed for the local cluster.
Explanation: The requested operation cannot be performed for the local cluster. User response: Specify the name of a remote cluster.
Explanation: The disk descriptor has a bad magic number, version, or checksum. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk. User response: Check the disk configuration. 6027-1503 Completed adding disks to file system name.
6027-1397
Explanation: A disk descriptor passed to the command specified an existing virtual shared disk and a desired disk name. An existing virtual shared disk cannot be renamed. User response: Change the disk descriptor to specify a disk that is not an existing virtual shared disk or to not specify a desired disk name, and then reissue the command. 6027-1450 Could not allocate storage.
Explanation: The mmadddisk command successfully completed. User response: None. Informational message only. 6027-1504 File name could not be run with err error.
Explanation: A failure occurred while trying to run an external program. User response: Make sure the file exists. If it does, check its access permissions. 6027-1505 Could not get minor number for name.
Explanation: Sufficient memory cannot be allocated to run the mmsanrepairfs command. User response: Increase the amount of memory available. 6027-1500 Open devicetype device failed with error:
Explanation: Could not obtain a minor number for the specified block or character device. User response: Problem diagnosis will depend on the subsystem that the device belongs to. For example, device /dev/VSD0 belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation. 6027-1507 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: The "open" of a device failed. Operation of the file system may continue unless this device is needed for operation. If this is a replicated disk device, it will often not be needed. If this is a block or character device for another subsystem (such as /dev/VSD0) then GPFS will discontinue operation. User response: Problem diagnosis will depend on the subsystem that the device belongs to. For instance device "/dev/VSD0" belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation. If this is a normal disk device then take needed repair action on the specified disk.
Explanation: A READ_KEYS ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center. 6027-1508 Registration failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue. 6027-1513 DiskName is not an sg device, or sg driver is older than sg3
Explanation: The disk is not a SCSI disk, or it supports a SCSI standard older than SCSI-3. User response: Correct the command invocation and try again. 6027-1514 ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: A REGISTER ioctl call failed with the errno= and related values shown. User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center. 6027-1509 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1515 READ KEY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: A READRES ioctl call failed with the errno= and related values shown. User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center. 6027-1510 Error mounting file system stripeGroup on mountPoint; errorQualifier (gpfsErrno).
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1516 REGISTER ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An error occurred while attempting to mount a GPFS file system on Windows. User response: Examine the error details, previous errors, and the GPFS message log to identify the cause. 6027-1511 Error unmounting file system stripeGroup; errorQualifier (gpfsErrno).
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1517 READ RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An error occurred while attempting to unmount a GPFS file system on Windows. User response: Examine the error details, previous errors, and the GPFS message log to identify the cause. 6027-1512 WMI query for queryType failed; errorQualifier (gpfsErrno).
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
Explanation: An error occurred while running a WMI query on Windows. User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.
6027-1518 RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue. 6027-1523 Disk name longer than value is not allowed.
Explanation: The specified disk name is too long. User response: Reissue the command with a valid disk name. 6027-1524 Registration failed. The READ_KEYS ioctl data does not contain the key that was passed as input.
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1519 INQUIRY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: A REGISTER ioctl call apparently succeeded, but when the device was queried for the key, the key was not found. User response: Check the device subsystem and try to correct the problem. If the problem persists, contact the IBM Support Center. 6027-1530 Attention: parameter is set to value.
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1520 PREEMPT ABORT ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: A configuration parameter is temporarily assigned a new value. User response: Check the mmfs.cfg file. Use the mmchconfig command to set a valid value for the parameter. 6027-1531 parameter value
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-1521 Can not find register key registerKeyValue at device diskName.
Explanation: The configuration parameter was changed from its default value. User response: Check the mmfs.cfg file. 6027-1532 Attention: parameter (value) is not valid in conjunction with parameter (value).
Explanation: GPFS was unable to find the given register key on the specified disk. User response: Correct the problem and reissue the command. 6027-1522 CLEAR ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: A configuration parameter has a value that is not valid in relation to some other parameter. This can also happen when the default value for some parameter is not sufficiently large for the new, user set value of a related parameter. User response: Check the mmfs.cfg file. 6027-1533 parameter cannot be set dynamically.
Explanation: The mmchconfig command encountered a configuration parameter that cannot be set dynamically. User response: Check the mmchconfig command arguments. If the parameter must be changed, use the mmshutdown, mmchconfig, and mmstartup sequence of commands.
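A sketch of that sequence for a cluster-wide change (the parameter name and value are placeholders):
  mmshutdown -a                      # stop GPFS on all nodes
  mmchconfig someParameter=newValue  # change the parameter while the daemon is down
  mmstartup -a                       # restart GPFS on all nodes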
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
6027-1534 parameter must have a value. 6027-1542 Old shared memory exists but it is not valid nor cleanable.
Explanation: The tsctl command encountered a configuration parameter that did not have a specified value. User response: Check the mmchconfig command arguments. 6027-1535 Unknown config name: parameter
Explanation: A new GPFS daemon started and found existing shared segments. The contents were not recognizable, so the GPFS daemon could not clean them up.
User response:
1. Stop the GPFS daemon from trying to start by issuing the mmshutdown command for the nodes having the problem.
2. Find the owner of the shared segments with keys from 0x9283a0ca through 0x9283a0d1. If a non-GPFS program owns these segments, GPFS cannot run on this node.
3. If these segments are left over from a previous GPFS daemon:
   a. Remove them by issuing: ipcrm -m shared_memory_id
   b. Restart GPFS by issuing the mmstartup command on the affected nodes.
6027-1543 error propagating parameter.
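For the leftover-segment case described under message 6027-1542 above, the inspection and cleanup might look like this (the segment ID is a placeholder taken from the ipcs output):
  mmshutdown          # stop GPFS on this node first
  ipcs -m             # list shared memory segments; look for keys 0x9283a0ca through 0x9283a0d1
  ipcrm -m 123456789  # remove a leftover GPFS segment by its shared memory ID
  mmstartup           # restart GPFS on this node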
Explanation: The tsctl command encountered an unknown configuration parameter. User response: Check the mmchconfig command arguments. 6027-1536 parameter must be set using the tschpool command.
Explanation: The tsctl command encountered a configuration parameter that must be set using the tschpool command. User response: Check the mmchconfig command arguments. 6027-1537 Connect failed to ipAddress: reason
Explanation: An attempt to connect sockets between nodes failed. User response: Check the reason listed and the connection to the indicated IP address. 6027-1538 Connect in progress to ipAddress
Explanation: mmfsd could not propagate a configuration parameter value to one or more nodes in the cluster. User response: Contact the IBM Support Center.
Explanation: Connecting sockets between nodes. User response: None. Informational message only. 6027-1539 Connect progress select failed to ipAddress: reason
6027-1544
Sum of prefetchthreads(value), worker1threads(value) and nsdMaxWorkerThreads (value) exceeds value. Reducing them to value, value and value.
Explanation: An attempt to connect sockets between nodes failed. User response: Check the reason listed and the connection to the indicated IP address. 6027-1540 Try and buy license has expired!
Explanation: The sum of prefetchthreads, worker1threads, and nsdMaxWorkerThreads exceeds the permitted value. User response: Accept the calculated values, or reduce the individual settings using mmchconfig prefetchthreads=newvalue, mmchconfig worker1threads=newvalue, or mmchconfig nsdMaxWorkerThreads=newvalue. After using mmchconfig, the new settings will not take effect until the GPFS daemon is restarted. 6027-1545 The GPFS product that you are attempting to run is not a fully functioning version. This probably means that this is an update version and not the full product version. Install the GPFS full product version first, then apply any applicable update version before attempting to start GPFS.
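For example, one of the thread settings could be reduced and the daemon restarted as follows (the value shown is purely illustrative):
  mmchconfig worker1threads=400  # reduce one of the thread settings
  mmshutdown -a                  # the change takes effect only after a restart
  mmstartup -a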
Explanation: Self-explanatory. User response: Purchase a GPFS license to continue using GPFS. 6027-1541 Try and buy license expires in number days.
Explanation: Self-explanatory. User response: When the Try and Buy license expires, you will need to purchase a GPFS license to continue using GPFS.
Explanation: GPFS requires a fully licensed GPFS installation. User response: Verify installation of licensed GPFS, or purchase and install a licensed version of GPFS. 6027-1546 Attention: parameter size of value is too small. New value is value. 6027-1555 Mount point and device name cannot be equal: name
Explanation: The specified mount point is the same as the absolute device name. User response: Enter a new device name or absolute mount point path name. 6027-1556 Interrupt received.
Explanation: A configuration parameter is temporarily assigned a new value. User response: Check the mmfs.cfg file. Use the mmchconfig command to set a valid value for the parameter. 6027-1547 Error initializing daemon: performing shutdown.
Explanation: A GPFS administration command received an interrupt. User response: None. Informational message only. 6027-1557 You must first generate an authentication key file. Run: mmauth genkey new.
Explanation: GPFS kernel extensions are not loaded, and the daemon cannot initialize. GPFS may have been started incorrectly. User response: Check GPFS log for errors resulting from kernel extension loading. Ensure that GPFS is started with the mmstartup command. 6027-1548 Error: daemon and kernel extension do not match.
Explanation: Before setting a cipher list, you must generate an authentication key file. User response: Run the specified command to establish an authentication key for the nodes in the cluster. 6027-1558 Disk name diskName already exists in the cluster configuration file as the name of an NSD or of a disk underlying an NSD. If the cited disk no longer exists, remove the associated NSD from the cluster configuration file by issuing: mmdelnsd nsdName.
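A sketch of that order of operations (AUTHONLY is only an example setting; the exact mmauth update syntax and supported cipher list values may differ by release):
  mmauth genkey new            # generate the authentication key files for the local cluster
  mmauth update . -l AUTHONLY  # then establish the cipher list for the local cluster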
Explanation: The GPFS kernel extension loaded in memory and the daemon currently starting do not appear to have come from the same build. User response: Ensure that the kernel extension was reloaded after upgrading GPFS. See GPFS modules cannot be loaded on Linux on page 44 for details. 6027-1549 Attention: custom-built kernel extension; the daemon and kernel extension do not match.
Explanation: The command cannot create the specified disk because a disk with the same name is already recorded in the cluster configuration file. User response: If the specified disk no longer exists, issue the mmdelnsd command to remove the associated NSD from the cluster configuration file. 6027-1559 The -i option failed. Changes will take effect after GPFS is restarted.
Explanation: The GPFS kernel extension loaded in memory does not come from the same build as the starting daemon. The kernel extension appears to have been built from the kernel open source package. User response: None. 6027-1550 Error: Unable to establish a session with an Active Directory server. ID remapping via Microsoft Identity Management for Unix will be unavailable.
Explanation: The -i option on the mmchconfig command failed. The changes were processed successfully, but will take effect only after the GPFS daemons are restarted. User response: Check for additional error messages. Correct the problem and reissue the command. 6027-1560 This GPFS cluster contains file systems. You cannot delete the last node.
Explanation: GPFS tried to establish an LDAP session with an Active Directory server (normally the domain controller host), and has been unable to do so. User response: Ensure the domain controller is available.
Explanation: An attempt has been made to delete a GPFS cluster that still has one or more file systems associated with it. User response: Before deleting the last node of a GPFS cluster, delete all file systems that are associated with it.
This applies to both local and remote file systems. 6027-1561 Attention: Failed to remove node-specific changes. 6027-1566 Remote cluster clusterName is already defined.
Explanation: A request was made to add the cited cluster, but the cluster is already known to GPFS. User response: None. The cluster is already known to GPFS. 6027-1567 fileSystem from cluster clusterName is already defined.
Explanation: The internal mmfixcfg routine failed to remove node-specific configuration settings, if any, for one or more of the nodes being deleted. This is of consequence only if the mmchconfig command was indeed used to establish node-specific settings and these nodes are later added back into the cluster. User response: If you add the nodes back later, ensure that the configuration parameters for the nodes are set as desired. 6027-1562 command command cannot be executed. Either none of the nodes in the cluster are reachable, or GPFS is down on all of the nodes.
Explanation: A request was made to add the cited file system from the cited cluster, but the file system is already known to GPFS. User response: None. The file system is already known to GPFS. 6027-1568 command command failed. Only parameterList changed.
Explanation: The command that was issued needed to perform an operation on a remote node, but none of the nodes in the cluster were reachable, or GPFS was not accepting commands on any of the nodes. User response: Ensure that the affected nodes are available and all authorization requirements are met. Correct any problems and reissue the command. 6027-1563 Attention: The file system may no longer be properly balanced.
Explanation: The mmchfs command failed while making the requested changes. Any changes to the attributes in the indicated parameter list were successfully completed. No other file system attributes were changed. User response: Reissue the command if you want to change additional attributes of the file system. Changes can be undone by issuing the mmchfs command with the original value for the affected attribute. 6027-1569 The volume group volumeName exists, but partition information cannot be determined. Perhaps it needs to be varied on?
Explanation: The restripe phase of the mmadddisk or mmdeldisk command failed. User response: Determine the cause of the failure and run the mmrestripefs -b command. 6027-1564 To change the authentication key for the local cluster, run: mmauth genkey.
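As a sketch of the rebalancing step suggested for message 6027-1563, where fs1 is a placeholder file system name:
   mmrestripefs fs1 -b   # rebalance data across all disks in the file system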
Explanation: The mmvsdhelper subroutine has successfully found the requested volume group, but attempts to get the free physical partitions with the lsvg command are failing. User response: Ensure that the volume group is varied on with the varyonvg command. 6027-1570 virtual shared disk support is not installed.
Explanation: The authentication keys for the local cluster must be created only with the specified command. User response: Run the specified command to establish a new authentication key for the nodes in the cluster. 6027-1565 disk not found in file system fileSystem.
Explanation: The command detected that IBM Virtual Shared Disk support is not installed on the node on which it is running. User response: Install IBM Virtual Shared Disk support. 6027-1571 commandName does not exist or failed; automount mounting may not work.
Explanation: A disk specified for deletion or replacement does not exist. User response: Specify existing disks for the indicated file system.
Explanation: One or more of the GPFS file systems were defined with the automount attribute but the requisite automount command is missing or failed.
User response: Correct the problem and restart GPFS. Or use the mount command to explicitly mount the file system. 6027-1572 The command must run on a node that is part of the cluster. 6027-1585 Primary server nodeName specified on the disk descriptor for previously-created virtual shared disk diskName does not match the primary server (nodeName) currently in effect for the virtual shared disk.
Explanation: The node running the mmcrcluster command (this node) must be a member of the GPFS cluster. User response: Issue the command from a node that will belong to the cluster. 6027-1573 Command completed: No changes made.
Explanation: The primary server that was specified on the disk descriptor for the cited previously-created virtual shared disk is different than the primary server currently in effect for that virtual shared disk. User response: Either correct or omit the primary server specified on the descriptor and try the command again. 6027-1586 Backup server nodeName specified on the disk descriptor for previously-created virtual shared disk diskName does not match the backup server (nodeName) currently in effect for the virtual shared disk.
Explanation: Informational message. User response: Check the preceding messages, correct any problems, and reissue the command. 6027-1574 Permission failure. The command requires root authority to execute.
Explanation: The command, or the specified command option, requires root authority. User response: Log on as root and reissue the command. 6027-1578 File fileName does not contain node names.
Explanation: The backup server that was specified on the disk descriptor for the cited previously-created virtual shared disk is different than the backup server currently in effect for that virtual shared disk. User response: Either correct or omit the backup server specified on the descriptor and try the command again. 6027-1587 Unable to determine the local device name for disk nsdName on node nodeName.
Explanation: The specified file does not contain valid node names. User response: Node names must be specified one per line. The name localhost and lines that start with the '#' character are ignored. 6027-1579 File fileName does not contain data.
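A hypothetical input file of the form expected here, with one node name per line; comment lines and the name localhost are ignored:
   # candidate nodes (lines starting with '#' are ignored)
   node001
   node002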
Explanation: GPFS was unable to determine the local device name for the specified GPFS disk. User response: Determine why the specified disk on the specified node could not be accessed and correct the problem. Possible reasons include: connectivity problems, authorization problems, fenced disk, and so forth. 6027-1588 Unknown GPFS execution environment: value
Explanation: The specified file does not contain data. User response: Verify that you are specifying the correct file name and reissue the command. 6027-1580 Failed to obtain Kerberos credentials; ksrvtgt name command rc=value. Continuing.
Explanation: An attempt to obtain SPbgAdm Kerberos credentials has failed. Processing continues, but there may be an authentication failure later on. User response: Check the preceding messages. Ensure that _PRINCIPLE root.SPbgAdm is defined in the file /etc/sysctl.mmcmd.acl. If the problem persists, go to the Parallel System Support Programs for AIX: Diagnosis Guide and search on diagnosing per node key management (PNKM) problems. Follow the problem determination and repair actions specified.
Explanation: A GPFS administration command (prefixed by mm) was asked to operate on an unknown GPFS cluster type. The only supported GPFS cluster type is lc. This message may also be generated if there is corruption in the GPFS system files. User response: Verify that the correct level of GPFS is installed on the node. If this is a cluster environment, make sure the node has been defined as a member of the GPFS cluster with the help of the mmcrcluster or the mmaddnode command. If the problem persists, contact the IBM Support Center.
6027-1590 nodeName cannot be reached. 6027-1597 Node node is specified more than once.
Explanation: A command needs to issue a remote function on a particular node but the node is not reachable. User response: Determine why the node is unreachable, correct the problem, and reissue the command. 6027-1591 Attention: Unable to retrieve GPFS cluster files from node nodeName.
Explanation: The same node appears more than once on the command line or in the input file for the command. User response: All specified nodes must be unique. Note that even though two node identifiers may appear different on the command line or in the input file, they may still refer to the same node. 6027-1598 Node nodeName was not added to the cluster. The node appears to already belong to a GPFS cluster.
Explanation: A command could not retrieve the GPFS cluster files from a particular node. An attempt will be made to retrieve the GPFS cluster files from a backup node. User response: None. Informational message only. 6027-1592 Unable to retrieve GPFS cluster files from node nodeName.
Explanation: A GPFS cluster command found that a node to be added to a cluster already has GPFS cluster files on it. User response: Use the mmlscluster command to verify that the node is in the correct cluster. If it is not, follow the procedure in Node cannot be added to the GPFS cluster on page 53. 6027-1599 The level of GPFS on node nodeName does not support the requested action.
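To verify cluster membership as suggested for message 6027-1598, the cluster definition can be listed on the node in question:
   mmlscluster   # shows the cluster name, configuration servers, and member nodes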
Explanation: A command could not retrieve the GPFS cluster files from a particular node. User response: Correct the problem and reissue the command. 6027-1594 Run the command command until successful.
Explanation: A GPFS command found that the level of the GPFS code on the specified node is not sufficient for the requested action. User response: Install the correct level of GPFS. 6027-1600 Make sure that the following nodes are available: nodeList
Explanation: The command could not complete normally. The GPFS cluster data may be left in a state that precludes normal operation until the problem is corrected. User response: Check the preceding messages, correct the problems, and issue the specified command until it completes successfully. 6027-1595 No nodes were found that matched the input specification.
Explanation: A GPFS command was unable to complete because nodes critical for the success of the operation were not reachable or the command was interrupted. User response: This message will normally be followed by a message telling you which command to issue as soon as the problem is corrected and the specified nodes become available. 6027-1602 nodeName is not a member of this cluster.
Explanation: No nodes were found in the GPFS cluster that matched those specified as input to a GPFS command. User response: Determine why the specified nodes were not valid, correct the problem, and reissue the GPFS command. 6027-1596 The same node was specified for both the primary and the secondary server.
Explanation: A command found that the specified node is not a member of the GPFS cluster. User response: Correct the input or add the node to the GPFS cluster and reissue the command. 6027-1603 The following nodes could not be added to the GPFS cluster: nodeList. Correct the problems and use the mmaddnode command to add these nodes to the cluster.
Explanation: A command would have caused the primary and secondary GPFS cluster configuration server nodes to be the same. User response: Specify a different primary or secondary node.
command was unable to add the listed nodes to a GPFS cluster. User response: Correct the problems and add the nodes to the cluster using the mmaddnode command. 6027-1604 Information cannot be displayed. Either none of the nodes in the cluster are reachable, or GPFS is down on all of the nodes. 6027-1616 Caught SIG signal - terminating the child processes.
Explanation: The mmdsh command has received a signal causing it to terminate. User response: Determine what caused the signal and correct the problem. 6027-1617 There are no available nodes on which to run the command.
Explanation: The command needed to perform an operation on a remote node, but none of the nodes in the cluster were reachable, or GPFS was not accepting commands on any of the nodes. User response: Ensure that the affected nodes are available and all authorization requirements are met. Correct any problems and reissue the command. 6027-1610 Disk diskName is the only disk in file system fileSystem. You cannot replace a disk when it is the only remaining disk in the file system.
Explanation: The mmdsh command found that there are no available nodes on which to run the specified command. Although nodes were specified, none of the nodes were reachable. User response: Determine why the specified nodes were not available and correct the problem. 6027-1618 Unable to pipe. Error string was: errorString.
Explanation: The mmdsh command attempted to open a pipe, but the pipe command failed. User response: Determine why the call to pipe failed and correct the problem. 6027-1619 Unable to redirect outputStream. Error string was: string.
Explanation: The mmrpldisk command was issued, but there is only one disk in the file system. User response: Add a second disk and reissue the command. 6027-1613 WCOLL (working collective) environment variable not set.
Explanation: The mmdsh command was invoked without explicitly specifying the nodes on which the command is to run by means of the -F or -L options, and the WCOLL environment variable has not been set. User response: Change the invocation of the mmdsh command to use the -F or -L options, or set the WCOLL environment variable before invoking the mmdsh command. 6027-1614 Cannot open file fileName. Error string was: errorString.
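A sketch of the two invocation styles named in message 6027-1613, assuming a node file /tmp/nodes that you create yourself:
   mmdsh -F /tmp/nodes date   # name the nodes explicitly with -F
   export WCOLL=/tmp/nodes    # or set the working collective
   mmdsh date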
Explanation: The mmdsh command attempted to redirect an output stream using open, but the open command failed. User response: Determine why the call to open failed and correct the problem. 6027-1623 command: Mounting file systems ...
Explanation: This message contains progress information about the mmmount command. User response: None. Informational message only. 6027-1625 option cannot be used with attribute name.
Explanation: The mmdsh command was unable to successfully open a file. User response: Determine why the file could not be opened and correct the problem. 6027-1615 nodeName remote shell process had return code value.
Explanation: An attempt was made to change a configuration attribute with a request that the change take effect immediately (-i or -I option). However, the specified attribute does not allow that operation. User response: If the change must be made now, leave off the -i or -I option. Then recycle the nodes to pick up the new value.
Explanation: A child remote shell process completed with a nonzero return code. User response: Determine why the child remote shell process failed and correct the problem.
6027-1626 Command is not supported in the type environment. 6027-1631 The commit process failed.
Explanation: A GPFS administration command (mm...) is not supported in the specified environment. User response: Verify if the task is needed in this environment, and if it is, use a different command. 6027-1627 The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS on the above nodes until the problem is resolved.
Explanation: A GPFS administration command (mm...) cannot commit its changes to the GPFS cluster configuration data. User response: Examine the preceding messages, correct the problem, and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center. 6027-1632 The GPFS cluster configuration data on nodeName is different than the data on nodeName.
Explanation: The mmchcluster command could not propagate the new cluster configuration servers to the specified nodes. User response: Correct the problems and run the mmchcluster -p LATEST command before starting GPFS on the specified nodes. 6027-1628 Cannot determine basic environment information. Not enough nodes are available.
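The recovery step named in message 6027-1627, as a sketch:
   mmchcluster -p LATEST   # propagate the current configuration server assignments to all nodes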
Explanation: The GPFS cluster configuration data on the primary cluster configuration server node is different than the data on the secondary cluster configuration server node. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment or if the mmchcluster command did not complete successfully. User response: Correct any problems and issue the mmrefresh -f -a command. If the problem persists, perform problem determination and contact the IBM Support Center. 6027-1633 Failed to create a backup copy of the GPFS cluster data on nodeName.
Explanation: The mmchcluster command was unable to retrieve the GPFS cluster data files. Usually, this is due to too few nodes being available. User response: Correct any problems and ensure that as many of the nodes in the cluster are available as possible. Reissue the command. If the problem persists, record the above information and contact the IBM Support Center. 6027-1629 Error retrieving data from nodeName to nodeName.
Explanation: Commit could not create a correct copy of the GPFS cluster configuration data. User response: Check the preceding messages, correct any problems, and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center. 6027-1634 The GPFS cluster configuration server node nodeName cannot be removed.
Explanation: A GPFS command is unable to correctly copy data (checksum error). User response: Correct any communication problems and reissue the command. 6027-1630 The GPFS cluster data on nodeName is back level.
Explanation: An attempt was made to delete a GPFS cluster configuration server node. User response: You cannot remove a cluster configuration server node unless all nodes in the GPFS cluster are being deleted. Before deleting a cluster configuration server node, you must use the mmchcluster command to transfer its function to another node in the GPFS cluster. 6027-1636 Error found while checking disk descriptor: descriptor.
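A hedged example of transferring the configuration server role before deleting the node (message 6027-1634), where newPrimary is a placeholder for another node in the cluster:
   mmchcluster -p newPrimary   # make a different node the primary cluster configuration server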
Explanation: A GPFS command attempted to commit changes to the GPFS cluster configuration data, but the data on the server is already at a higher level. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment, or if the mmchcluster command did not complete successfully. User response: Correct any problems and reissue the command. If the problem persists, issue the mmrefresh -f -a command.
Explanation: A disk descriptor was found to be unsatisfactory in some way. User response: Check the preceding messages, if any, and correct the condition that caused the disk descriptor to be rejected.
6027-1637 command quitting. None of the specified nodes are valid. 6027-1645 Node nodeName is fenced out from disk diskName.
Explanation: A GPFS command found that none of the specified nodes passed the required tests. User response: Determine why the nodes were not accepted, fix the problems, and reissue the command. 6027-1638 Command: There are no unassigned nodes in the cluster.
Explanation: A GPFS command attempted to access the specified disk, but found that the node attempting the operation was fenced out from the disk. User response: Check whether there is a valid reason why the node should be fenced out from the disk. If there is no such reason, unfence the disk and reissue the command. 6027-1647 Unable to find disk with NSD volume id NSD volume id.
Explanation: A GPFS command in a cluster environment needs unassigned nodes, but found there are none. User response: Verify whether there are any unassigned nodes in the cluster. If there are none, either add more nodes to the cluster using the mmaddnode command, or delete some nodes from the cluster using the mmdelnode command, and then reissue the command. 6027-1639 Command failed. Examine previous error messages to determine cause.
Explanation: A disk with the specified NSD volume id cannot be found. User response: Specify a correct disk NSD volume id. 6027-1648 GPFS was unable to obtain a lock from node nodeName.
Explanation: GPFS failed in its attempt to get a lock from another node in the cluster. User response: Verify that the reported node is reachable. Examine previous error messages, if any. Fix the problems and then reissue the command. 6027-1661 Failed while processing disk descriptor descriptor on node nodeName.
Explanation: A GPFS command failed due to previously-reported errors. User response: Check the previous error messages, fix the problems, and then reissue the command. If no other messages are shown, examine the GPFS log files in the /var/adm/ras directory on each node. 6027-1642 command: Starting GPFS ...
Explanation: A disk descriptor was found to be unsatisfactory in some way. User response: Check the preceding messages, if any, and correct the condition that caused the disk descriptor to be rejected.
Explanation: Progress information for the mmstartup command. User response: None. Informational message only.
| 6027-1662
6027-1643 The number of quorum nodes exceeds the maximum (number) allowed.
Explanation: An attempt was made to add more quorum nodes to a cluster than the maximum number allowed. User response: Reduce the number of quorum nodes, and reissue the command. 6027-1644 Attention: The number of quorum nodes exceeds the suggested maximum (number).
Explanation: The specified disk device refers to an existing NSD. User response: Specify another disk that is not an existing NSD. 6027-1663 Disk descriptor descriptor should refer to an existing NSD. Use mmcrnsd to create the NSD.
Explanation: An NSD disk given as input is not known to GPFS. User response: Create the NSD. Then rerun the command.
Explanation: The number of quorum nodes in the cluster exceeds the maximum suggested number of quorum nodes. User response: Informational message. Consider reducing the number of quorum nodes to the maximum suggested number of quorum nodes for improved performance.
6027-1664 command: Processing node nodeName 6027-1680 Disk name diskName is already registered for use by GPFS.
Explanation: Progress information. User response: None. Informational message only. 6027-1665 Issue the command from a node that remains in the cluster.
Explanation: The cited disk name was specified for use by GPFS, but there is already a disk by that name registered for use by GPFS. User response: Specify a different disk name for use by GPFS and reissue the command. 6027-1681 Node nodeName is being used as an NSD server.
Explanation: The nature of the requested change requires the command be issued from a node that will remain in the cluster. User response: Run the command from a node that will remain in the cluster. 6027-1666 No disks were found.
Explanation: The specified node is defined as a server node for some disk. User response: If you are trying to delete the node from the GPFS cluster, you must either delete the disk or define another node as its server. 6027-1685 Processing continues without lock protection.
Explanation: A command searched for disks but found none. User response: If disks are desired, create some using the mmcrnsd command. 6027-1670 Incorrect or missing remote shell command: name
Explanation: The specified remote command does not exist or is not executable. User response: Specify a valid command. 6027-1671 Incorrect or missing remote file copy command: name
Explanation: The command will continue processing although it was not able to obtain the lock that prevents other GPFS commands from running simultaneously. User response: Ensure that no other GPFS command is running. See the command documentation for additional details. 6027-1688 Command was unable to obtain the lock for the GPFS system data. Unable to reach the holder of the lock nodeName. Check the preceding messages, if any. Follow the procedure outlined in the GPFS: Problem Determination Guide.
Explanation: The specified remote file copy command does not exist or is not executable. User response: Specify a valid command. 6027-1672 option value parameter must be an absolute path name.
Explanation: A command requires the lock for the GPFS system data but was not able to obtain it. User response: Check the preceding messages, if any. Follow the procedure in the GPFS: Problem Determination Guide for what to do when the GPFS system data is locked. Then reissue the command. 6027-1689 vpath disk diskName is not recognized as an IBM SDD device.
Explanation: The mount point does not begin with '/'. User response: Specify the full path for the mount point. 6027-1674 command: Unmounting file systems ...
Explanation: This message contains progress information about the mmumount command. User response: None. Informational message only. 6027-1677 Disk diskName is of an unknown type.
Explanation: The mmvsdhelper command found that the specified disk is a vpath disk, but it is not recognized as an IBM SDD device. User response: Ensure the disk is configured as an IBM SDD device. Then reissue the command. 6027-1698 Disk diskName belongs to vpath vpathName.
Explanation: The specified disk is of an unknown type. User response: Specify a disk whose type is recognized by GPFS.
User response: Reissue the command specifying the vpath device, or specify another disk that does not belong to a vpath device. 6027-1699 Remount failed for file system fileSystem. Error code errorCode. 6027-1705 command: incorrect number of connections (number), exiting...
Explanation: The mmspsecserver process was called with an incorrect number of connections. This will happen only when the mmspsecserver process is run as an independent program. User response: Retry with a valid number of connections. 6027-1706 mmspsecserver: parent program is not "mmfsd", exiting...
Explanation: The specified file system was internally unmounted. An attempt to remount the file system failed with the specified error code. User response: Check the daemon log for additional error messages. Ensure that all file system disks are available and reissue the mount command. 6027-1700 Failed to load LAPI library. functionName not found. Changing communication protocol to TCP.
Explanation: The mmspsecserver process was invoked from a program other than mmfsd. User response: None. Informational message only. 6027-1707 mmfsd connected to mmspsecserver
Explanation: The GPFS daemon failed to load liblapi_r.a dynamically. User response: Verify installation of liblapi_r.a. 6027-1701 mmfsd waiting to connect to mmspsecserver. Setting up to retry every number seconds for number minutes.
Explanation: The mmfsd daemon has successfully connected to the mmspsecserver process through the communication socket. User response: None. Informational message only. 6027-1708 The mmfsd daemon failed to fork mmspsecserver. Failure reason explanation
Explanation: The GPFS daemon failed to establish a connection with the mmspsecserver process. User response: None. Informational message only. 6027-1702 Process pid failed at functionName call, socket socketName, errno value
Explanation: The mmfsd daemon failed to fork a child process. User response: Check the GPFS installation. 6027-1709 Accepted and connected to ipAddress.
Explanation: Either the mmfsd daemon or the mmspsecserver process failed to create or set up the communication socket between them. User response: Determine the reason for the error. 6027-1703 The processName process encountered error: errorString.
Explanation: The local mmfsd daemon has successfully accepted and connected to a remote daemon. User response: None. Informational message only. 6027-1710 Connecting to ipAddress.
Explanation: Either the mmfsd daemon or the mmspsecserver process called the error log routine to log an incident. User response: None. Informational message only. 6027-1704 mmspsecserver (pid number) ready for service.
Explanation: The local mmfsd daemon has started a connection request to a remote daemon. User response: None. Informational message only. 6027-1711 Connected to ipAddress.
Explanation: The mmspsecserver process has created all the service threads necessary for mmfsd. User response: None. Informational message only.
Explanation: The local mmfsd daemon has successfully connected to a remote daemon. User response: None. Informational message only. 6027-1712 Unexpected zero bytes received from name. Continuing.
Explanation: This is an informational message. A socket read resulted in zero bytes being read.
User response: If this happens frequently, check IP connections. 6027-1715 EINVAL trap from connect call to ipAddress (socket name) 6027-1726 The administrator of the cluster named clusterName requires authentication. Contact the administrator to obtain the cluster's key and register the key using: mmremotecluster update
Explanation: The connect call back to the requesting node failed. User response: This is caused by a bug in AIX socket support. Upgrade AIX kernel and TCP client support. 6027-1716 Close connection to ipAddress
Explanation: The administrator of the cluster requires authentication. User response: Contact the administrator to obtain the cluster's key and register it using: mmremotecluster update. 6027-1727 The administrator of the cluster named clusterName does not require authentication. Unregister the cluster's key using: mmremotecluster update
Explanation: Connection socket closed. User response: None. Informational message only. 6027-1717 The administrator of name does not allow remote mounts.
Explanation: The administrator of the cluster does not require authentication. User response: Unregister the cluster's key using: mmremotecluster update. 6027-1728 Remote mounts are not enabled within the cluster named clusterName. Contact the administrator and request that they enable remote mounts.
Explanation: The administrator of a file system has not configured remote mount. User response: Contact the administrator and request remote mount access. 6027-1718 The administrator of name does not require secure connections. Unregister the target cluster's key using: mmremotecluster update
Explanation: The administrator of the cluster has not enabled remote mounts. User response: Contact the administrator and request remote mount access. 6027-1729 The cluster named clusterName has not authorized this cluster to mount file systems. Contact the cluster administrator and request access.
Explanation: Connection socket closed. User response: None. Informational message only. 6027-1724 The key used by the cluster named clusterName has changed. Contact the administrator to obtain the new key and register it using: mmremotecluster update
Explanation: The administrator of the cluster has changed the key used for authentication. User response: Contact the administrator to obtain the new key and register it using mmremotecluster update. 6027-1725 The key used by the cluster named clusterName has changed. Contact the administrator to obtain the new key and register it using: mmauth update
Explanation: The administrator of the cluster has not authorized this cluster to mount file systems. User response: Contact the administrator and request access. 6027-1730 Unsupported cipherList cipherList requested.
Explanation: The target cluster requested a cipherList not supported by the installed version of OpenSSL. User response: Install a version of OpenSSL that supports the required cipherList or contact the administrator of the target cluster and request that a supported cipherList be assigned to this remote cluster. 6027-1731 Unsupported cipherList cipherList requested.
Explanation: The administrator of the cluster has changed the key used for authentication. User response: Contact the administrator to obtain the new key and register it using mmauth update.
Explanation: The target cluster requested a cipherList that is not supported by the installed version of OpenSSL.
User response: Either install a version of OpenSSL that supports the required cipherList or contact the administrator of the target cluster and request that a supported cipherList be assigned to this remote cluster. 6027-1732 Remote mounts are not enabled within this cluster. 6027-1738 Close connection to ipAddress (errorString). Attempting reconnect.
Explanation: Connection socket closed. User response: None. Informational message only. 6027-1739 Accept socket connection failed: err value.
Explanation: Remote mounts cannot be performed in this cluster. User response: See the GPFS: Advanced Administration Guide for instructions about enabling remote mounts. In particular, make sure the keys have been generated and a cipherlist has been set. 6027-1733 OpenSSL dynamic lock support could not be loaded.
Explanation: The Accept socket connection received an unexpected error. User response: None. Informational message only. 6027-1740 Timed out waiting for a reply from node ipAddress.
Explanation: One of the functions required for dynamic lock support was not included in the version of the OpenSSL library that GPFS is configured to use. User response: If this functionality is required, shut down the daemon, install a version of OpenSSL with the desired functionality, and configure GPFS to use it. Then restart the daemon. 6027-1734 OpenSSL engine support could not be loaded.
Explanation: A message that was sent to the specified node did not receive a response within the expected time limit. User response: None. Informational message only. 6027-1741 Error code value received from node ipAddress.
Explanation: When a message was sent to the specified node to check its status, an error occurred and the node could not handle the message. User response: None. Informational message only. 6027-1742 Message ID value was lost by node ipAddress.
Explanation: One of the functions required for engine support was not included in the version of the OpenSSL library that GPFS is configured to use. User response: If this functionality is required, shut down the daemon, install a version of OpenSSL with the desired functionality, and configure GPFS to use it. Then restart the daemon. 6027-1735 Close connection to ipAddress. Attempting reconnect.
Explanation: During a periodic check of outstanding messages, a problem was detected where the destination node no longer has any knowledge of a particular message. User response: None. Informational message only. 6027-1803 Global NSD disk, name, not found.
Explanation: Connection socket closed. The GPFS daemon will attempt to reestablish the connection. User response: None. Informational message only. 6027-1736 Reconnected to ipAddress
Explanation: A client tried to open a globally-attached NSD disk, but a scan of all disks failed to find that NSD. User response: Ensure that the globally-attached disk is available on every node that references it. 6027-1804 I/O to NSD disk, name, fails. No such NSD locally found.
Explanation: The local mmfsd daemon has successfully reconnected to a remote daemon following an unexpected connection break. User response: None. Informational message only. 6027-1737 Close connection to ipAddress (errorString).
Explanation: A server tried to perform I/O on an NSD disk, but a scan of all disks failed to find that NSD. User response: Make sure that the NSD disk is accessible to the client. If necessary, break a reservation.
Explanation: Connection socket closed. User response: None. Informational message only.
6027-1805 Rediscovered NSD server access to name. 6027-1811 Vdisk server recovery: delay complete.
Explanation: A server rediscovered access to the specified disk. User response: None. 6027-1806 A Persistent Reserve could not be established on device name (deviceName): errorLine.
Explanation: Done waiting for existing disk lease to expire before performing vdisk server recovery. User response: None. 6027-1812 Rediscovery failed for name.
Explanation: A server failed to rediscover access to the specified disk. User response: Check the disk access issues and run the command again. 6027-1813 Error reading volume identifier (for objectName name) from configuration file.
Explanation: GPFS is using Persistent Reserve on this disk, but was unable to establish a reserve for this node. User response: Perform disk diagnostics. 6027-1807 NSD nsdName is using Persistent Reserve, this will require an NSD server on an osName node.
Explanation: The volume identifier for the named recovery group or vdisk could not be read from the mmsdrfs file. This should never occur. User response: Check for damage to the mmsdrfs file. 6027-1814 Vdisk vdiskName cannot be associated with its recovery group recoveryGroupName. This vdisk will be ignored.
Explanation: A client tried to open a globally-attached NSD disk, but the disk is using Persistent Reserve. An osName NSD server is needed. GPFS only supports Persistent Reserve on certain operating systems. User response: Use the mmchnsd command to add an osName NSD server for the NSD. 6027-1808 Unable to reserve space for NSD buffers. Increase pagepool size to at least requiredPagePoolSize MB. Refer to the GPFS: Administration and Programming Reference for more information on selecting an appropriate pagepool size.
Explanation: The named vdisk cannot be associated with its recovery group. User response: Check for damage to the mmsdrfs file. 6027-1815 Error reading volume identifier (for NSD name) from configuration file.
Explanation: The pagepool usage for an NSD buffer (4*maxblocksize) is limited by factor nsdBufSpace. The value of nsdBufSpace can be in the range of 10-70. The default value is 30. User response: Use the mmchconfig command to decrease the value of maxblocksize or to increase the value of pagepool or nsdBufSpace. 6027-1809 The defined server serverName for NSD NsdName could not be resolved.
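For message 6027-1808, one possible adjustment; the values shown are examples only and should be sized to the node's memory:
   mmchconfig pagepool=4G      # increase the page pool (takes effect after GPFS is restarted unless changed with -i)
   mmchconfig nsdBufSpace=40   # or raise the share of the page pool reserved for NSD buffers (valid range 10-70)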
Explanation: The volume identifier for the named NSD could not be read from the mmsdrfs file. This should never occur. User response: Check for damage to the mmsdrfs file. 6027-1816 The defined server serverName for recovery group recoveryGroupName could not be resolved.
Explanation: The hostname of the NSD server could not be resolved by gethostbyname(). User response: Fix hostname resolution. 6027-1817 Vdisks are defined, but no recovery groups are defined.
Explanation: The host name of the NSD server could not be resolved by gethostbyname(). User response: Fix the host name resolution. 6027-1810 Vdisk server recovery: delay number sec. for safe recovery.
Explanation: Wait for the existing disk lease to expire before performing vdisk server recovery. User response: None.
Explanation: There are vdisks defined in the mmsdrfs file, but no recovery groups are defined. This should never occur. User response: Check for damage to the mmsdrfs file.
Explanation: This node has relinquished serving the named recovery group. User response: None. 6027-1819 Disk descriptor for name refers to an existing pdisk.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command found an existing pdisk. User response: Correct the input file, or use the -v option. 6027-1820 Disk descriptor for name refers to an existing NSD.
Explanation: The allowed number of retries was exceeded when encountering an NSD checksum error on I/O to the indicated disk, using the indicated server. User response: There may be network issues that require investigation.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command found an existing NSD. User response: Correct the input file, or use the -v option. 6027-1821 Error errno writing disk descriptor on name.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command got an error writing the disk descriptor. User response: Perform disk diagnostics. 6027-1822 Error errno reading disk descriptor on name.
Explanation: The tspreparedpdisk command got an error reading the disk descriptor. User response: Perform disk diagnostics. 6027-1823 Path error, name and name are the same disk.
Explanation: The tspreparedpdisk command got an error during path verification. The pdisk descriptor file is miscoded. User response: Correct the pdisk descriptor file and reissue the command. 6027-1824 An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. Server disk diskName will be put offline.
Explanation: A new device mapper path is detected or a previously failed path is activated after the local device discovery has finished. This path lacks a Persistent Reserve, and cannot be used. All device
6027-1868 [E] Error formatting pdisk on device diskName. Explanation: An error occurred when trying to format a new pdisk. User response: Check that the disk is working properly.
6027-1869 [E] Error updating the recovery group descriptor. Explanation: Error occurred updating the RAID recovery group descriptor. User response: Retry the command.
6027-1870 [E] Recovery group name name is already in use. Explanation: The recovery group name already exists. User response: Choose a new recovery group name using the characters a-z, A-Z, 0-9, and underscore, at most 63 characters in length.
6027-1871 [E] There is only enough free space to allocate number spare(s) in declustered array arrayName. Explanation: Too many spares were specified. User response: Retry the command with a valid number of spares.
6027-1872 [E] Recovery group still contains vdisks. Explanation: RAID recovery groups that still contain vdisks cannot be deleted. User response: Delete any vdisks remaining in this RAID recovery group using the tsdelvdisk command before retrying this command.
6027-1873 [E] Pdisk creation failed for pdisk pdiskName: err=errorNum. Explanation: Pdisk creation failed because of the specified error. User response: None.
| 6027-1880 [E] Cannot remove pdisk pdiskName because the number of pdisks in declustered array arrayName would fall below the code width of one or more of its vdisks. Explanation: The number of pdisks in a declustered array must be at least the maximum code width of any vdisk in the declustered array. User response: Either add pdisks or remove vdisks from the declustered array.
| 6027-1883 [E] Pdisk pdiskName deletion failed: process interrupted. Explanation: Pdisk deletion failed because the deletion process was interrupted. This is most likely because of the recovery group failing over to a different server. User response: Retry the command.
| 6027-1884 [E] Missing or invalid vdisk name. Explanation: No vdisk name was given on the tscrvdisk command. User response: Specify a vdisk name using the characters a-z, A-Z, 0-9, and underscore of at most 63 characters in length.
| 6027-1886 [E] Vdisk block size cannot exceed maxBlockSize (number). Explanation: The virtual block size of a vdisk cannot be larger than the value of the GPFS configuration parameter maxBlockSize. User response: Use a smaller vdisk virtual block size, or increase the value of maxBlockSize using mmchconfig maxBlockSize=newSize.
| 6027-1889 [E] Vdisk name vdiskName is already in use. Explanation: The vdisk name given on the tscrvdisk command already exists. User response: Choose a new vdisk name less than 64 characters using the characters a-z, A-Z, 0-9, and underscore.
| 6027-1890 [E] A recovery group may only contain one log vdisk. Explanation: A log vdisk already exists in the recovery group. User response: None.
6027-1892 [E] Log vdisks must use replication. Explanation: The log vdisk must use a RAID code that uses replication. User response: Retry the command with a valid RAID code.
Explanation: A stat() call failed for the specified object. User response: Correct the problem and reissue the command. 6027-1901 pathName is not a GPFS file system object.
6027-1894 [E] There is not enough space in the declustered array to create additional vdisks. Explanation: There is insufficient space in the declustered array to create even a minimum size vdisk with the given RAID code. User response: Add additional pdisks to the declustered array, reduce the number of spares or use a different RAID code.
Explanation: The specified path name does not resolve to an object within a mounted GPFS file system. User response: Correct the problem and reissue the command. 6027-1902 The policy file cannot be determined.
6027-1895 [E] Unable to create vdisk vdiskName because there are too many failed pdisks in declustered array declusteredArrayName. Explanation: Cannot create the specified vdisk, because there are too many failed pdisks in the array. User response: Replace failed pdisks in the declustered array and allow time for rebalance operations to more evenly distribute the space.
Explanation: The command was not able to retrieve the policy rules associated with the file system. User response: Examine the preceding messages and correct the reported problems. Establish a valid policy file with the mmchpolicy command or specify a valid policy file on the command line. 6027-1903 path must be an absolute path name.
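A sketch of installing policy rules as suggested for message 6027-1902; fs1 and /tmp/policy.rules are placeholders:
   mmchpolicy fs1 /tmp/policy.rules -I test   # validate the rules without installing them
   mmchpolicy fs1 /tmp/policy.rules           # install the rules for the file system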
6027-1896 [E] Insufficient memory for vdisk metadata. Explanation: There was not enough pinned memory for GPFS to hold all of the metadata necessary to describe a vdisk. User response: Increase the size of the GPFS page pool.
Explanation: The path name did not begin with a /. User response: Specify the absolute path name for the object. 6027-1904 Device with major/minor numbers number and number already exists.
Explanation: A device with the cited major and minor numbers already exists.
User response: Check the preceding messages for detailed information. 6027-1905 name was not created by GPFS or could not be refreshed. 6027-1911 File system fileSystem belongs to cluster clusterName. The option option is not allowed for remote file systems.
Explanation: The specified option can be used only for locally-owned file systems. User response: Correct the command line and reissue the command. 6027-1920 subsystem not active on nodes: nodeList.
Explanation: The attributes (device type, major/minor number) of the specified file system device name are not as expected. User response: Check the preceding messages for detailed information on the current and expected values. These errors are most frequently caused by the presence of /dev entries that were created outside the GPFS environment. Resolve the conflict by renaming or deleting the offending entries. Reissue the command letting GPFS create the /dev entry with the appropriate parameters. 6027-1906 There is no file system with drive letter driveLetter.
Explanation: A GPFS command requires the specified subsystem to be up and in active state. User response: Correct the problems and reissue the command. 6027-1922 IP aliasing is not supported (node). Specify the main device.
Explanation: IP aliasing is not supported. User response: Specify a node identifier that resolves to the IP address of a main device for the node. 6027-1927 The requested disks are not known to GPFS.
Explanation: No file system in the GPFS cluster has the specified drive letter. User response: Reissue the command with a valid file system. 6027-1908 The option option is not allowed for remote file systems.
Explanation: GPFS could not find the requested NSDs in the cluster. User response: Reissue the command, specifying known disks. 6027-1929 cipherlist is not a valid cipher list.
Explanation: The specified option can be used only for locally-owned file systems. User response: Correct the command line and reissue the command. 6027-1909 There are no available free disks. Disks must be prepared prior to invoking command. Define the disks using the command command.
Explanation: The cipher list must be set to a value supported by GPFS. All nodes in the cluster must support a common cipher. User response: See the GPFS FAQ (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html) for a list of the supported ciphers. 6027-1930 Disk diskName belongs to file system fileSystem.
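Assuming the commonly documented form of the command, a supported cipher list such as AUTHONLY can be assigned with mmauth; verify the exact syntax against your GPFS level:
   mmauth update . -l AUTHONLY   # set the cipher list for the local cluster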
Explanation: The currently executing command (mmcrfs, mmadddisk, mmrpldisk) requires disks to be defined for use by GPFS using one of the GPFS disk creation commands: mmcrnsd, mmcrvsd. User response: Create disks and reissue the failing command. 6027-1910 Node nodeName is not a quorum node.
Explanation: A GPFS administration command (mm...) found that the requested disk to be deleted still belongs to a file system. User response: Check that the correct disk was requested. If so, delete the disk from the file system before proceeding. 6027-1931 The following disks are not known to GPFS: diskNames.
Explanation: The mmchmgr command was asked to move the cluster manager to a nonquorum node. Only one of the quorum nodes can be a cluster manager. User response: Designate the node to be a quorum node, specify a different node on the command line, or allow GPFS to choose the new cluster manager node.
Explanation: A GPFS administration command (mm...) found that the specified disks are not known to GPFS.
User response: Verify that the correct disks were requested. 6027-1932 No disks were specified that could be deleted. 6027-1938 configParameter is an incorrect parameter. Line in error: configLine. The line is ignored; processing continues.
Explanation: The specified parameter is not valid and will be ignored. User response: None. Informational message only. 6027-1939 Line in error: line.
Explanation: A GPFS administration command (mm...) determined that no disks were specified that could be deleted. User response: Examine the preceding messages, correct the problems, and reissue the command. 6027-1933 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue mmdelnsd -p NSDvolumeid.
Explanation: The specified line from a user-provided input file contains errors. User response: Check the preceding messages for more information. Correct the problems and reissue the command. 6027-1940 Unable to set reserve policy policy on disk diskName on node nodeName.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data, but was unable to erase the NSD volume id from the disk. User response: Issue the specified command to remove the NSD volume id from the disk. 6027-1934 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSDvolumeid -N nodeList.
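The cleanup command named in messages 6027-1933 and 6027-1934 takes the NSD volume id reported for the disk; the id and node name below are hypothetical placeholders:
   mmdelnsd -p 0A0B0C0D51E4F7A8 -N node001   # erase the leftover NSD volume id from the disk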
Explanation: The specified disk should be able to support Persistent Reserve, but an attempt to set up the registration key failed. User response: Correct the problem and reissue the command. 6027-1941 Cannot handle multiple interfaces for host hostName.
Explanation: Multiple entries were found for the given hostname or IP address either in /etc/hosts or by the host command. User response: Make corrections to /etc/hosts and reissue the command. 6027-1942 Unexpected output from the 'host -t a name' command:
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk. User response: Issue the specified command to remove the NSD volume id from the disk. 6027-1936 Node nodeName cannot support Persistent Reserve on disk diskName because it is not an AIX node. The disk will be used as a non-PR disk.
Explanation: A GPFS administration command (mm...) received unexpected output from the host -t a command for the given host. User response: Issue the host -t a command interactively and carefully review the output, as well as any error messages. 6027-1943 Host name not found.
Explanation: A non-AIX node was specified as an NSD server for the disk. The disk will be used as a non-PR disk. User response: None. Informational message only. 6027-1937 A node was specified more than once as an NSD server in disk descriptor descriptor.
Explanation: A GPFS administration command (mm...) could not resolve a host from /etc/hosts or by using the host command. User response: Make corrections to /etc/hosts and reissue the command.
Explanation: A node was specified more than once as an NSD server in the disk descriptor shown. User response: Change the disk descriptor to eliminate any redundancies in the list of NSD servers.
6027-1944 Desired disk name diskName is longer than 13 characters. 6027-1962 Permission denied for disk diskName
Explanation: The cited disk name is not valid because it is longer than the maximum allowed length of 13 characters. User response: Specify a disk name whose length is 13 characters or less and reissue the command. 6027-1945 Disk name diskName is not allowed. Names beginning with gpfs are reserved for use by GPFS.
Explanation: The user does not have permission to access disk diskName. User response: Correct the permissions and reissue the command. 6027-1963 Disk diskName was not found.
Explanation: The specified disk was not found. User response: Specify an existing disk and reissue the command. 6027-1964 I/O error on diskName
Explanation: The cited disk name is not allowed because it begins with gpfs. User response: Specify a disk name that does not begin with gpfs and reissue the command. 6027-1947 Use mmauth genkey to recover the file fileName, or to generate and commit a new key.
Explanation: An I/O error occurred on the specified disk. User response: Check for additional error messages. Check the error log for disk hardware problems. 6027-1965 logicalVolume is not a valid logical volume.
Explanation: The specified file was not found. User response: Recover the file by running mmauth genkey propagate, or generate and commit a new key by running mmauth genkey new followed by mmauth genkey commit. 6027-1948 Disk diskName is too large.
Explanation: The specified logical volume is not a valid logical volume with a corresponding volume group. User response: Reissue the command using a valid logical volume. 6027-1967 Disk diskName belongs to back-level file system fileSystem or the state of the disk is not ready. Use mmchfs -V to convert the file system to the latest format. Use mmchdisk to change the state of a disk.
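The alternatives described for message 6027-1947, as a sketch:
   mmauth genkey propagate   # redistribute the committed key to recover the file
   mmauth genkey new         # or generate a new key ...
   mmauth genkey commit      # ... and commit it once it has been distributed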
Explanation: The specified disk is too large. User response: Specify a smaller disk and reissue the command. 6027-1949 Propagating the cluster configuration data to all affected nodes.
Explanation: The cluster configuration data is being sent to the rest of the nodes in the cluster. User response: This is an informational message. 6027-1950 Local update lock is busy.
Explanation: The specified disk cannot be initialized for use as a tiebreaker disk. Possible reasons are suggested in the message text. User response: Use the mmlsfs and mmlsdisk commands to determine what action is needed to correct the problem. 6027-1968 Failed while processing disk diskName.
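The checks suggested for message 6027-1967, with fs1 and nsd1 as placeholder names:
   mmlsfs fs1 -V          # show the file system format version
   mmlsdisk fs1 -d nsd1   # show the availability and status of the disk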
Explanation: More than one process is attempting to update the GPFS environment at the same time. User response: Repeat the command. If the problem persists, verify that there are no blocked processes. 6027-1951 Failed to obtain the local environment update lock.
Explanation: An error was detected while processing the specified disk. User response: Examine prior messages to determine the reason for the failure. Correct the problem and reissue the command. 6027-1969 Device device already exists on node nodeName
Explanation: GPFS was unable to obtain the local environment update lock for more than 30 seconds. User response: Examine previous error messages, if any. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.
Explanation: This device already exists on the specified node. User response: None.
6027-1970 Disk diskName has no space for the quorum data structures. Specify a different disk as tiebreaker disk. 6027-1977 Failed validating disk diskName. Error code errorCode.
Explanation: There is not enough free space in the file system descriptor for the tiebreaker disk data structures. User response: Specify a different disk as a tiebreaker disk. 6027-1971 Disk lvName (pvid pvid) is not known on node nodeName
Explanation: GPFS control structures are not as expected. User response: Contact the IBM Support Center. 6027-1984 Name name is not allowed. It is longer than the maximum allowable length (length).
Explanation: The cited name is not allowed because it is longer than the cited maximum allowable length. User response: Specify a name whose length does not exceed the maximum allowable length, and reissue the command. 6027-1985 mmfskxload: The format of the GPFS kernel extension is not correct for this version of AIX.
Explanation: The specified disk is not known on the above node. User response: Check the disk and node names and reissue the command. 6027-1972 Import of volume group vgName on node nodeName failed.
Explanation: Import of the specified volume group on the specified node failed. User response: Check for additional error messages. Check both names and reissue the command. 6027-1973 Volume group vgName is not known on node nodeName.
Explanation: This version of AIX is incompatible with the current format of the GPFS kernel extension. User response: Contact your system administrator to check the AIX version and GPFS kernel extension. 6027-1986 junctionName does not resolve to a directory in deviceName. The junction must be within the specified file system.
Explanation: The above volume group is not defined on the specified node. User response: Check both names and reissue the command. 6027-1974 None of the quorum nodes can be reached.
Explanation: The cited junction path name does not belong to the specified file system. User response: Correct the junction path name and reissue the command. 6027-1987 Name name is not allowed.
Explanation: Ensure that the quorum nodes in the cluster can be reached. At least one of these nodes is required for the command to succeed. User response: Ensure that the quorum nodes are available and reissue the command. 6027-1975 The descriptor file contains more than one descriptor.
Explanation: The cited name is not allowed because it is a reserved word or contains a prohibited character. User response: Specify a different name and reissue the command. 6027-1988 File system fileSystem is not mounted.
Explanation: The descriptor file must contain only one descriptor. User response: Correct the descriptor file. 6027-1976 The descriptor file contains no descriptor.
Explanation: The cited file system is not currently mounted on this node. User response: Ensure that the file system is mounted and reissue the command. 6027-1991 Vpath disk diskName has an underlying hdisk that already belongs to a volume group.
Explanation: The descriptor file must contain exactly one descriptor. User response: Correct the descriptor file.
Explanation: The specified vpath disk cannot be used because one or more of its underlying hdisks already belongs to a volume group.
User response: Remove the underlying hdisks from the volume group, or use a different vpath disk. 6027-1993 File fileName either does not exist or has an incorrect format. description of the correct syntax for the line. User response: Correct the syntax of the line and reissue the command. 6027-1999 Syntax error. The correct syntax is: string.
Explanation: The specified file does not exist or has an incorrect format. User response: Check whether the input file specified actually exists. 6027-1994 Did not find any match with the input disk address.
Explanation: The specified input passed to the command has incorrect syntax. User response: Correct the syntax and reissue the command. 6027-2000 Could not clear fencing for disk physicalDiskName.
Explanation: The mmfileid command returned without finding any disk addresses that match the given input. User response: None. Informational message only. 6027-1995 Device deviceName is not mounted on node nodeName.
Explanation: The fencing information on the disk could not be cleared. User response: Make sure the disk is accessible by this node and retry. 6027-2002 Disk physicalDiskName of type diskType is not supported for fencing.
Explanation: The specified device is not mounted on the specified node. User response: Mount the specified device on the specified node and reissue the command. 6027-1996 Command was unable to determine whether file system fileSystem is mounted.
Explanation: This disk is not a type that supports fencing. User response: None. 6027-2004 None of the specified nodes belong to this GPFS cluster.
Explanation: The command was unable to determine whether the cited file system is mounted. User response: Examine any prior error messages to determine why the command could not determine whether the file system was mounted, resolve the problem if possible, and then reissue the command. If you cannot resolve the problem, reissue the command with the daemon down on all nodes of the cluster. This will ensure that the file system is not mounted, which may allow the command to proceed. 6027-1997 Backup control file fileName from a previous backup does not exist.
Explanation: The nodes specified do not belong to the GPFS cluster. User response: Choose nodes that belong to the cluster and try the command again. 6027-2007 Unable to display fencing for disk physicalDiskName.
Explanation: Cannot retrieve fencing information for this disk. User response: Make sure that this node has access to the disk before retrying. 6027-2008 For the logical volume specification -l lvName to be valid lvName must be the only logical volume in the volume group. However, volume group vgName contains logical volumes.
Explanation: The mmbackup command was asked to do an incremental or a resume backup, but the control file from a previous backup could not be found. User response: Restore the named file to the file system being backed up and reissue the command, or else do a full backup. 6027-1998 Line lineNumber of file fileName is incorrect:
Explanation: The command is being run on a logical volume that belongs to a volume group that has more than one logical volume. User response: Run this command only on a logical volume where it is the only logical volume in the corresponding volume group.
Explanation: A line in the specified file passed to the command had incorrect syntax. The line with the incorrect syntax is displayed next, followed by a
6027-2009 logicalVolume is not a valid logical volume. 6027-2015 Node node does not hold a reservation for disk physicalDiskName.
Explanation: logicalVolume does not exist in the ODM, implying that logical name does not exist. User response: Run the command on a valid logical volume. 6027-2010 vgName is not a valid volume group name.
Explanation: The node on which this command is run does not have access to the disk. User response: Run this command from another node that has access to the disk. 6027-2016 SSA fencing support is not present on this node.
Explanation: vgName passed to the command is not found in the ODM, implying that vgName does not exist. User response: Run the command on a valid volume group name. 6027-2011 For the hdisk specification -h physicalDiskName to be valid physicalDiskName must be the only disk in the volume group. However, volume group vgName contains disks.
Explanation: This node does not support SSA fencing. User response: None. 6027-2017 Node ID nodeId is not a valid SSA node ID. SSA node IDs must be a number in the range of 1 to 128.
Explanation: You specified a node ID outside of the acceptable range. User response: Choose a correct node ID and retry the command. 6027-2018 The SSA node id is not set.
Explanation: The hdisk specified belongs to a volume group that contains other disks. User response: Pass an hdisk that belongs to a volume group that contains only this disk. 6027-2012 physicalDiskName is not a valid physical volume name.
Explanation: The SSA node ID has not been set. User response: Set the SSA node ID. 6027-2019 Unable to retrieve the SSA node id.
Explanation: The specified name is not a valid physical disk name. User response: Choose a correct physical disk name and retry the command. 6027-2013 pvid is not a valid physical volume id.
Explanation: A failure occurred while trying to retrieve the SSA node ID. User response: None. 6027-2020 Unable to set fencing for disk physicalDiskName.
Explanation: The specified value is not a valid physical volume ID. User response: Choose a correct physical volume ID and retry the command. 6027-2014 Node node does not have access to disk physicalDiskName.
Explanation: A failure occurred while trying to set fencing for the specified disk. User response: None. 6027-2021 Unable to clear PR reservations for disk physicalDiskName.
Explanation: The specified node is not able to access the specified disk. User response: Choose a different node or disk (or both), and retry the command. If both the node and disk name are correct, make sure that the node has access to the disk.
Explanation: Failed to clear Persistent Reserve information on the disk. User response: Make sure the disk is accessible by this node before retrying. 6027-2022 Could not open disk physicalDiskName, errno value.
Explanation: The specified disk cannot be opened. User response: Examine the errno value and other messages to determine the reason for the failure. Correct the problem and reissue the command.
6027-2023 retVal = value, errno = value for key value. errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-2028 could not open disk device diskDeviceName
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-2024 ioctl failed with rc=returnCode, errno=errnoValue. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A problem occurred on a disk open. User response: Ensure the disk is accessible and not fenced out, and then reissue the command. 6027-2029 could not close disk device diskDeviceName
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-2025 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A problem occurred on a disk close. User response: None. 6027-2030 ioctl failed with DSB=value and result=value reason: explanation
Explanation: An ioctl call failed with stated return code, errno value, and related values. User response: Check the reported errno and correct the problem, if possible. Otherwise, contact the IBM Support Center. 6027-2031 ioctl failed with non-zero return code
Explanation: A READ_KEYS ioctl call failed with stated errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-2026 REGISTER ioctl failed with errno=returnCode, tried timesTried times. Related values are: scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: An ioctl failed with a non-zero return code. User response: Correct the problem, if possible. Otherwise, contact the IBM Support Center. 6027-2049 Cannot pin a page pool of size value bytes.
Explanation: A GPFS page pool cannot be pinned into memory on this machine. User response: Increase the physical memory size of the machine. 6027-2050 Pagepool has size actualValue bytes instead of the requested requestedValue bytes.
Explanation: A REGISTER ioctl call failed with stated errno value, and related values. User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center. 6027-2027 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are: scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: The configured GPFS page pool is too large to be allocated or pinned into memory on this machine. GPFS will work properly, but with reduced capacity for caching user data. User response: To prevent this message from being generated when the GPFS daemon starts, reduce the page pool size using the mmchconfig command.
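For example, a smaller page pool can be set with the mmchconfig command. This is an illustrative sketch only; the node name c25n09 and the 2G value are placeholders, and an appropriate value depends on the physical memory of the affected node:
   mmchconfig pagepool=2G -N c25n09     # node name and size are examples only
By default the new value takes effect the next time the GPFS daemon is started on that node.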
6027-2100 Incorrect range value-value specified. the problems, and issue the specified command until it completes successfully. 6027-2107 Upgrade the lower release level nodes and run: command.
Explanation: The range specified to the command is incorrect. The first parameter value must be less than or equal to the second parameter value. User response: Correct the address range and reissue the command. 6027-2101 Insufficient free space in fileSystem (storage minimum required).
Explanation: The command could not complete normally. User response: Check the preceding messages, correct the problems, and issue the specified command until it completes successfully. 6027-2108 Error found while processing stanza
Explanation: There is not enough free space in the specified file system or directory for the command to successfully complete. User response: Correct the problem and reissue the command. 6027-2102 Node nodeName is not available to run the command.
Explanation: A stanza was found to be unsatisfactory in some way. User response: Check the preceding messages, if any, and correct the condition that caused the stanza to be rejected. 6027-2109 Failed while processing disk stanza on node nodeName.
Explanation: The specified node is not available to run a command. Depending on the command, a different node may be tried. User response: Determine why the specified node is not available and correct the problem. 6027-2103 Directory dirName does not exist
Explanation: A disk stanza was found to be unsatisfactory in some way. User response: Check the preceding messages, if any, and correct the condition that caused the stanza to be rejected. 6027-2110 Missing required parameter parameter
Explanation: The specified directory does not exist. User response: Reissue the command specifying an existing directory. 6027-2104 The GPFS release level could not be determined on nodes: nodeList.
Explanation: The specified parameter is required for this command. User response: Specify the missing information and reissue the command. 6027-2111 The following disks were not deleted: diskList
Explanation: The command was not able to determine the level of the installed GPFS code on the specified nodes. User response: Reissue the command after correcting the problem. 6027-2105 The following nodes must be upgraded to GPFS release productVersion or higher: nodeList
Explanation: The command could not delete the specified disks. Check the preceding messages for error information. User response: Correct the problems and reissue the command. 6027-2112 Permission failure. Option option requires root authority to run.
Explanation: The command requires that all nodes be at the specified GPFS release level. User response: Correct the problem and reissue the command. 6027-2106 Ensure the nodes are available and run: command.
Explanation: The specified command option requires root authority. User response: Log on as root and reissue the command.
Explanation: The command could not complete normally. User response: Check the preceding messages, correct
Explanation: A command could not find a GPFS disk that matched the specified disk and node values passed as input. User response: Correct the disk and node values passed as input and reissue the command. 6027-2114 The subsystem subsystem is already active.
Explanation: The user attempted to start a subsystem that was already active. User response: None. Informational message only. 6027-2115 Unable to resolve address range for disk diskName on node nodeName.
Explanation: A command could not perform address range resolution for the specified disk and node values passed as input. User response: Correct the disk and node values passed as input and reissue the command.
6027-2128 [E] The attribute attribute must be configured to use hostname as a recovery group server. Explanation: The specified GPFS configuration attributes must be configured to use the node as a recovery group server. User response: Use the mmchconfig command to set the attributes, then reissue the command.
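As an illustration only, a recovery group server attribute can be set for a specific node with mmchconfig. The attribute name nsdRAIDTracks, the value 4096, and the node name gssio1 below are placeholders; use the attribute and value named in the message for your environment:
   mmchconfig nsdRAIDTracks=4096 -N gssio1     # attribute, value, and node are examples only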
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all of the NSD server nodes are running Linux. User response: Correct the configuration and enter the command again. 6027-2135 All nodes in the cluster must be running AIX to enable Persistent Reserve for SAN attached disk diskName.
6027-2129 [E] Vdisk block size (blockSize) must match the file system block size (blockSize). Explanation: The specified NSD is a vdisk with a block size that does not match the block size of the file system. User response: Reissue the command using block sizes that match.
Explanation: There was an attempt to enable Persistent Reserve for a SAN-attached disk, but not all nodes in the cluster are running AIX. User response: Correct the configuration and run the command again. 6027-2136 All NSD server nodes must be running AIX to enable Persistent Reserve for disk diskName.
6027-2130 [E] Could not find an active server for recovery group name. Explanation: A command was issued that acts on a recovery group, but no active server was found for the specified recovery group. User response: Perform problem determination.
Explanation: There was an attempt to enable Persistent Reserve for the specified disk, but not all NSD servers are running AIX. User response: Correct the configuration and enter the command again. 6027-2137 An attempt to clear the Persistent Reserve reservations on disk diskName failed.
6027-2131 [E] Cannot create an NSD on a log vdisk. Explanation: The specified disk is a log vdisk; it cannot be used for an NSD. User response: Specify another disk that is not a log vdisk.
Explanation: You are importing a disk into a cluster in which Persistent Reserve is disabled. An attempt to clear the Persistent Reserve reservations on the disk failed. User response: Correct the configuration and enter the command again. 6027-2138 The cluster must be running either all AIX or all Linux nodes to change Persistent Reserve disk diskName to a SAN-attached disk.
6027-2132 [E] Log vdisk vdiskName cannot be deleted while there are other vdisks in recovery group name. Explanation: The specified disk is a log vdisk; it must be the last vdisk deleted from the recovery group. User response: Delete the other vdisks first.
6027-2133 [E] Unable to delete recovery group name; vdisks are still defined. Explanation: Cannot delete a recovery group while there are still vdisks defined. User response: Delete all the vdisks first.
Explanation: There was an attempt to redefine a Persistent Reserve disk as a SAN attached disk, but not all nodes in the cluster were running either all AIX or all Linux nodes. User response: Correct the configuration and enter the command again.
6027-2139 NSD server nodes must be running either all AIX or all Linux to enable Persistent Reserve for disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all NSD server nodes were running all AIX or all Linux nodes. User response: Correct the configuration and enter the command again. 6027-2140 All NSD server nodes must be running AIX or all running Linux to enable Persistent Reserve for disk diskName.
Explanation: The TSM dsmc client, or the other backup or archive system client that was specified, could not be found. User response: Verify that TSM is installed and that dsmc can be found in the installation location, or that the specified archiver client is executable. 6027-2151 The path directoryPath is not contained in the snapshot snapshotName.
Explanation: An attempt was made to enable Persistent Reserve for a disk while the NSD server nodes are not all running AIX or all running Linux. User response: Correct the configuration first. 6027-2141 Disk diskName is not configured as a regular hdisk.
Explanation: In an AIX only cluster, Persistent Reserve is supported for regular hdisks only. User response: Correct the configuration and enter the command again. 6027-2142 Disk diskName is not configured as a regular generic disk.
Explanation: The directory path supplied is not contained in the snapshot named with the -S parameter. User response: Correct the directory path or snapshot name supplied, or omit -S and the snapshot name in the command. 6027-2152 The path directoryPath containing image archives was not found.
Explanation: In a Linux only cluster, Persistent Reserve is supported for regular generic or device mapper virtual disks only. User response: Correct the configuration and enter the command again.
Explanation: The directory path supplied does not contain the expected image files to archive into TSM. User response: Correct the directory path name supplied. 6027-2153 The archiving system backupProgram exited with status return code. Image backup files have been preserved in globalWorkDir
Explanation: The archiving system ran and returned a nonzero exit status because of an error. User response: Examine the archiver log files to determine the cause of the failure, then archive the preserved image files from the indicated path. 6027-2154 Unable to create a policy file for image backup in policyFilePath.
Explanation: A temporary file could not be created in the global shared directory path. User response: Check or correct the directory path name supplied.
Explanation: The empty file system targeted for restoration must be mounted in read only mode during restoration. User response: Unmount the file system on all nodes and remount it read only, then try the command again. 6027-2156 The image archive index ImagePath could not be found.
Explanation: The archive image index could not be found in the specified path. User response: Check the command arguments for correct specification of the image path, then try the command again. 6027-2157 The image archive index ImagePath is corrupt or incomplete.
Explanation: The archive image index specified is damaged. User response: Check the archive image index file for corruption and remedy. 6027-2158 Disk usage must be dataOnly, metadataOnly, descOnly, dataAndMetadata, or vdiskLog
Explanation: The disk usage positional parameter in a vdisk descriptor has a value that is not valid. The bad disk descriptor is displayed following this message. User response: Correct the input and reissue the command.
6027-2159 [E] parameter is not valid or missing in the vdisk descriptor. Explanation: The vdisk descriptor is not valid. The bad descriptor is displayed following this message. User response: Correct the input and reissue the command.
6027-2160 [E] Vdisk vdiskName is already mapped to NSD nsdName. Explanation: The command cannot create the specified NSD because the underlying vdisk is already mapped to a different NSD. User response: Correct the input and reissue the command.
Explanation: The cited option cannot be specified by itself. User response: Correct the input and reissue the command.
Explanation: The command was unable to disable Persistent Reserve on the specified disks. User response: Examine the disks and additional error information to determine if the disks should support Persistent Reserve. Correct the problem and reissue the command.
| 6027-2170 [E] Recovery group recoveryGroupName does not exist or is not active.
Explanation: A command was issued to a recovery group that does not exist or is not in the active state. User response: Reissue the command with a valid recovery group name or wait for the recovery group to become active.
| 6027-2176 [E] mmchattr for fileName failed.
Explanation: The command to change the attributes of the file failed. User response: Check the previous error messages and correct the problems.
| 6027-2178 |
| Explanation: The input file should contain at least one | NSD descriptor or stanza. | User response: Correct the input file and reissue the | command. | 6027-2181 [E] Failover is allowed only for
single-writer and independent-writer filesets. Explanation: This operation is allowed only for single-writer filesets. User response: Check the previous error messages and correct the problems.
User response: Ensure the GPFS daemon network (as identified in the output of the mmlscluster command) is fully operational and reissue the command.
6027-2183 [E] Peer snapshots using mmpsnap are allowed only for single-writer filesets. Explanation: This operation is allowed only for single-writer filesets. User response: Check the previous error messages and correct the problems.
6027-2184 [E] If the recovery group is damaged, issue mmdelrecoverygroup name -p. Explanation: No active servers were found for the recovery group that is being deleted. If the recovery group is damaged the -p option is needed. User response: Perform diagnosis and reissue the command.
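For example, assuming the damaged recovery group is named BB1RGL (a placeholder name), the form of the delete command suggested by the message is:
   mmdelrecoverygroup BB1RGL -p     # recovery group name is an example only
As the message indicates, the -p option is intended for a recovery group whose servers are no longer available; confirm the recovery group cannot be recovered before using it.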
6027-2185 [E] There are no pdisk stanzas in the input file fileName. Explanation: The mmcrrecoverygroup input stanza file has no pdisk stanzas. User response: Correct the input file and reissue the command.
6027-2186 [E] There were no valid vdisk stanzas in the input file fileName. Explanation: The mmcrvdisk input stanza file has no valid vdisk stanzas. User response: Correct the input file and reissue the command.
6027-2187 [E] Could not get pdisk information for the following recovery groups: recoveryGroupList Explanation: An mmlspdisk all command could not query all of the recovery groups because some nodes could not be reached. User response: None.
6027-2188
Explanation: The command is not able to determine the identity of the local node. This can be the result of a disruption in the network over which the GPFS daemons communicate.
Explanation: The specified node is not a gateway node. User response: Designate the node as a gateway node or specify a different node on the command line. 6027-2204 AFM cluster clusterName is already defined.
| 6027-2197 [E] Empty file encountered when running the mmafmctl flushPending command. | | Explanation: The mmafmctl flushPending command | did not find any entries in the file specified with the | --list-file option. | |
User response: Correct the input file and reissue the command.
Explanation: A request was made to add a cluster, but the cluster is already defined. User response: None. 6027-2205 There are no AFM cluster definitions.
| 6027-2198 [E] Cannot run the mmafmctl flushPending command on directory dirName. | | Explanation: The mmafmctl flushPending command | cannot be issued on this directory. | |
User response: Correct the input and reissue the command. 6027-2199 [E] No enclosures were found.
| Explanation: A value of all was specified for the | cluster operand of a GPFS command, but there are no | AFM clusters defined.
User response: None. 6027-2206 AFM cluster clusterName is not defined.
| Explanation: A command searched for disk enclosures | but none were found.
User response: None.
Explanation: The cited cluster name was specified as the AFM cluster operand of a GPFS command, but the cluster is not known to GPFS. User response: Specify an AFM cluster known to GPFS. 6027-2207 Node nodeName is being used as a gateway node for the AFM cluster clusterName.
| User response: Correct the node list and reissue the | command. | 6027-2201 [E] The mmafmctl flushPending command completed with errors. |
Explanation: An error occurred while flushing the queue.
Explanation: The specified node is defined as a gateway node for the specified AFM cluster. User response: If you are trying to delete the node from the GPFS cluster or delete the gateway node role, you must remove it from the export server map. 6027-2208 [E] commandName is already running in the cluster. Explanation: Only one instance of the specified command is allowed to run. User response: None.
| User response: Examine the GPFS log to identify the | cause. | 6027-2202 [E] There is a SCSI-3 PR reservation on
disk diskname. mmcrnsd cannot format the disk because the cluster is not configured as PR enabled. Explanation: The specified disk has a SCSI-3 PR reservation, which prevents the mmcrnsd command from formatting it.
| Explanation: A command was unable to list the | specific object that was requested.
User response: None.
Explanation: A command was unable to build a storage enclosure inventory file. This is a temporary file that is required to complete the requested command. User response: None. 6027-2211 [E] Error collecting firmware information on node nodeName. Explanation: A command was unable to gather firmware information from the specified node.
User response: Ensure the node is active and retry the command. 6027-2212 [E] Firmware update file updateFile was not found.
| Explanation: The mmchfirmware command could not find the specified firmware update file to load.
User response: Locate the firmware update file and retry the command. 6027-2213 [E] Pdisk path redundancy was lost while updating enclosure firmware. Explanation: The mmchfirmware command lost paths after loading firmware and rebooting the Enclosure Services Module. User response: Wait a few minutes and then retry the command. GPFS might need to be shut down to finish updating the enclosure firmware. 6027-2214 [E] Timeout waiting for firmware to load.
| Explanation: The component specified for the | mmchenclosure command does not need service.
User response: Use the mmlsenclosure command to determine valid input and then retry the command. 6027-2220 [E] Recovery group name has pdisks with missing paths. Consider using the -v no option of the mmchrecoverygroup command. Explanation: The mmchrecoverygroup command failed because all the servers could not see all the disks, and the primary server is missing paths to disks. User response: If the disks are cabled correctly, use the -v no option of the mmchrecoverygroup command. 6027-2221 [E] Error determining redundancy of enclosure serialNumber ESM esmName. Explanation: The mmchrecoverygroup command failed. Check the following error messages. User response: Correct the problem and retry the command. 6027-2222 [E] Storage enclosure serialNumber already has a newer firmware version: firmwareLevel. Explanation: The mmchfirmware command found a newer level of firmware on the specified storage enclosure. User response: If the intent is to force on the older firmware version, use the -v no option.
Explanation: A storage enclosure firmware update was in progress, but the update did not complete within the expected time frame. User response: Wait a few minutes, and then use the mmlsfirmware command to ensure the operation completed. 6027-2215 [E] Storage enclosure serialNumber not found.
Explanation: The specified storage enclosure was not found. User response: None. 6027-2216 Quota management is disabled for file system fileSystem.
Explanation: Quota management is disabled for the specified file system. User response: Enable quota management for the file system.
| User response: Update the GPFS code on the specified | servers and retry the command.
6027-2500 mmsanrepairfs already in progress for name.
Explanation: This is an output from mmsanrepairfs when another mmsanrepairfs command is already running. User response: Wait for the currently running command to complete and reissue the command. 6027-2501 Could not allocate storage.
Explanation: Sufficient memory could not be allocated to run the mmsanrepairfs command. User response: Increase the amount of memory available. 6027-2576 Error: Daemon value kernel value PAGE_SIZE mismatch.
| User response: Correct the problems and reissue the command. | 6027-2225 [E] Peer snapshot successfully deleted at cache. The delete snapshot operation failed at home. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
Explanation: The GPFS kernel extension loaded in memory does not have the same PAGE_SIZE value as the GPFS daemon PAGE_SIZE value that was returned from the POSIX sysconf API. User response: Verify that the kernel header files used to build the GPFS portability layer are the same kernel header files used to build the running kernel. 6027-2600 Cannot create a new snapshot until an existing one is deleted. File system fileSystem has a limit of number online snapshots.
| User response: Correct the problems and reissue the command. | 6027-2226 [E] Invalid firmware update file. | Explanation: An invalid firmware update file was specified for the mmchfirmware command. | User response: Reissue the command with a valid update file. | 6027-2227 [E] Failback is allowed only for independent-writer filesets. | Explanation: Failback operation is allowed only for independent-writer filesets. | User response: Check the fileset mode. | 6027-2228 [E]
The daemon version (daemonVersion) on node nodeName is lower than the daemon version (daemonVersion) on node nodeName.
Explanation: The file system has reached its limit of online snapshots User response: Delete an existing snapshot, then issue the create snapshot command again. 6027-2601 Snapshot name dirName already exists.
Explanation: This message is issued by the tscrsnapshot command. User response: Delete the existing file or directory and reissue the command. 6027-2602 Unable to delete snapshot snapshotName from file system fileSystem. rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command. User response: Delete the snapshot using the tsdelsnapshot command.
Explanation: A command was issued that requires nodes to be at specific levels, but the affected GPFS servers are not at compatible levels to support this operation.
6027-2603 Unable to get permission to create snapshot, rc=returnCode. 6027-2610 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: This message is issued by the tscrsnapshot command. User response: Reissue the command. 6027-2604 Unable to quiesce all nodes, rc=returnCode.
Explanation: An incorrect snapshot name was specified. User response: Select a valid snapshot and issue the command again. 6027-2611 Cannot delete snapshot snapshotName which is in state snapshotState.
Explanation: This message is issued by the tscrsnapshot command. User response: Restart failing nodes or switches and reissue the command. 6027-2605 Unable to resume all nodes, rc=returnCode.
Explanation: The snapshot cannot be deleted while it is in the cited transition state because of an in-progress snapshot operation. User response: Wait for the in-progress operation to complete and then reissue the command. 6027-2612 Snapshot named snapshotName does not exist.
Explanation: This message is issued by the tscrsnapshot command. User response: Restart failing nodes or switches. 6027-2606 Unable to sync all nodes, rc=returnCode.
Explanation: A snapshot to be listed does not exist. User response: Specify only existing snapshot names. 6027-2613 Cannot restore snapshot. fileSystem is mounted on number node(s) and in use on number node(s).
Explanation: This message is issued by the tscrsnapshot command. User response: Restart failing nodes or switches and reissue the command. 6027-2607 Cannot create new snapshot until an existing one is deleted. Fileset filesetName has a limit of number snapshots.
Explanation: This message is issued by the tsressnapshot command. User response: Unmount the file system and reissue the restore command. 6027-2614 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: The fileset has reached its limit of snapshots. User response: Delete an existing snapshot, then issue the create snapshot command again. 6027-2608 Cannot create new snapshot: state of fileset filesetName is inconsistent (badState).
Explanation: An incorrect snapshot name was specified. User response: Specify a valid snapshot and issue the command again. 6027-2615 Cannot restore snapshot snapshotName which is snapshotState, err = number.
Explanation: An operation on the cited fileset is incomplete. User response: Complete pending fileset actions, then issue the create snapshot command again. 6027-2609 Fileset named filesetName does not exist.
Explanation: The specified snapshot is not in a valid state. User response: Specify a snapshot that is in a valid state and issue the command again. 6027-2616 Restoring snapshot snapshotName requires quotaTypes quotas to be enabled.
Explanation: One of the filesets listed does not exist. User response: Specify only existing fileset names.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created. User response: Issue the recommended mmchfs command to enable quotas.
6027-2617 You must run: mmchfs fileSystem -Q yes. 6027-2623 Error deleting snapshot fileSystem in file system fileSystem err number.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created. User response: Issue the cited mmchfs command to enable quotas. 6027-2618 Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be enabled.
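For the mmchfs -Q yes command cited in messages 6027-2616 and 6027-2617, an illustrative invocation, assuming the file system is named fs1 (a placeholder), is:
   mmchfs fs1 -Q yes     # fs1 is an example file system name
After quotas are enabled, the mmcheckquota command is commonly run to bring quota usage information up to date.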
Explanation: The cited snapshot could not be deleted during file system recovery. User response: Run the mmfsck command to recover any lost data blocks. 6027-2624 Previous snapshot snapshotName is not valid and must be deleted before a new snapshot may be created.
Explanation: The snapshot being restored in the cited file system requires quotas to be enabled, since they were enabled when the snapshot was created. User response: Issue the mmchfs command to enable quotas. 6027-2619 Restoring snapshot snapshotName requires quotaTypes quotas to be disabled.
Explanation: The cited previous snapshot is not valid and must be deleted before a new snapshot may be created. User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command. 6027-2625 Previous snapshot snapshotName must be restored before a new snapshot may be created.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created. User response: Issue the cited mmchfs command to disable quotas. 6027-2620 You must run: mmchfs fileSystem -Q no.
Explanation: The cited previous snapshot must be restored before a new snapshot may be created. User response: Run mmrestorefs on the previous snapshot, and then reissue the original snapshot command. 6027-2626 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be deleted.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created. User response: Issue the cited mmchfs command to disable quotas. 6027-2621 Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be disabled.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be deleted. User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command. 6027-2627 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be restored.
Explanation: The snapshot being restored in the cited file system requires quotas to be disabled, since they were disabled when the snapshot was created. User response: Issue the mmchfs command to disable quotas. 6027-2622 Error restoring inode inode, err number.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be restored. User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command. 6027-2628 More than one snapshot is marked for restore.
Explanation: The online snapshot was corrupted. User response: Restore the file from an offline snapshot.
Explanation: More than one snapshot is marked for restore. User response: Restore the previous snapshot and
then reissue the original snapshot command. 6027-2629 Offline snapshot being restored. 6027-2635 The free space data is not available. Reissue the command without the -q option to collect it.
Explanation: An offline snapshot is being restored. User response: When the restore of the offline snapshot completes, reissue the original snapshot command. 6027-2630 Program failed, error number.
Explanation: The existing free space information for the file system is currently unavailable. User response: Reissue the mmdf command. 6027-2636 Disks in storage pool storagePool must have disk usage type dataOnly.
Explanation: The tssnaplatest command encountered an error and printErrnoMsg failed. User response: Correct the problem shown and reissue the command. 6027-2631 Attention: Snapshot snapshotName was being restored to fileSystem.
Explanation: A non-system storage pool cannot hold metadata or descriptors. User response: Modify the command's disk descriptors and reissue the command. 6027-2637 The file system must contain at least one disk for metadata.
Explanation: A file system in the process of a snapshot restore cannot be mounted except under a restricted mount. User response: None. Informational message only. 6027-2632 Mount of fileSystem failed: snapshot snapshotName must be restored before it can be mounted.
Explanation: The disk descriptors for this command must include one and only one storage pool that is allowed to contain metadata. User response: Modify the command's disk descriptors and reissue the command. 6027-2638 Maximum of number storage pools allowed.
Explanation: A file system in the process of a snapshot restore cannot be mounted for read only or read/write access. User response: Run the mmrestorefs command to complete the restoration, then reissue the mount command. 6027-2633 Attention: Disk configuration for fileSystem has changed while tsdf was running.
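For the mmrestorefs command named in the response to message 6027-2632, an illustrative invocation, assuming a file system fs1 being restored from a snapshot snap1 (both placeholder names), is:
   mmrestorefs fs1 snap1     # file system and snapshot names are examples only
When the restore completes, the mount command can be reissued.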
Explanation: The cited limit on the number of storage pools that may be defined has been exceeded. User response: Modify the command's disk descriptors and reissue the command. 6027-2639 Incorrect fileset name filesetName.
Explanation: The fileset name provided in the command invocation is incorrect. User response: Correct the fileset name and reissue the command. 6027-2640 Incorrect path to fileset junction filesetJunction.
Explanation: The disk configuration for the cited file system changed while the tsdf command was running. User response: Reissue the mmdf command. 6027-2634 Attention: number of number regions in fileSystem were unavailable for free space.
Explanation: The path to the cited fileset junction is incorrect. User response: Correct the junction path and reissue the command. 6027-2641 Incorrect fileset junction name filesetJunction.
Explanation: Some regions could not be accessed during the tsdf run. Typically, this is due to utilities such as mmdefragfs or mmfsck running concurrently. User response: Reissue the mmdf command.
Explanation: The cited junction name is incorrect. User response: Correct the junction name and reissue the command.
6027-2642 Specify one and only one of FilesetName or -J JunctionPath. 6027-2648 Filesets have not been enabled for file system fileSystem.
Explanation: The change fileset and unlink fileset commands accept either a fileset name or the fileset's junction path to uniquely identify the fileset. The user failed to provide either of these, or has tried to provide both. User response: Correct the command invocation and reissue the command. 6027-2643 Cannot create a new fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber filesets.
Explanation: The current file system format version does not support filesets. User response: Change the file system format version by issuing mmchfs -V. 6027-2649 Fileset filesetName contains user files and cannot be deleted unless the -f option is specified.
Explanation: An attempt was made to delete a non-empty fileset. User response: Remove all files and directories from the fileset, or specify the -f option to the mmdelfileset command. 6027-2650 Fileset information is not available.
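For the mmchfs -V command named in the response to message 6027-2648, an illustrative sketch, assuming the file system is named fs1 (a placeholder), is one of the following; -V compat enables only backward-compatible features, while -V full enables all features supported by the installed code and cannot be undone:
   mmchfs fs1 -V compat     # fs1 is an example file system name
   mmchfs fs1 -V full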
Explanation: An attempt to create a fileset for the cited file system failed because it would exceed the cited limit. User response: Remove unneeded filesets and reissue the command. 6027-2644 Comment exceeds maximum length of maxNumber characters.
Explanation: A fileset command failed to read file system metadata file. The file system may be corrupted. User response: Run the mmfsck command to recover the file system. 6027-2651 Fileset filesetName cannot be unlinked.
Explanation: The user-provided comment for the new fileset exceeds the maximum allowed length. User response: Shorten the comment and reissue the command. 6027-2645 Fileset filesetName already exists.
Explanation: The user tried to unlink the root fileset, or is not authorized to unlink the selected fileset. User response: None. The fileset cannot be unlinked.
Explanation: An attempt to create a fileset failed because the specified fileset name already exists. User response: Select a unique name for the fileset and reissue the command. 6027-2646 Unable to sync all nodes while quiesced, rc=returnCode
6027-2652
Explanation: The user tried to unlink the root fileset, or is not authorized to unlink the selected fileset. User response: None. The fileset cannot be unlinked. 6027-2653 Failed to unlink fileset filesetName from filesetName.
Explanation: This message is issued by the tscrsnapshot command. User response: Restart failing nodes or switches and reissue the command. 6027-2647 Fileset filesetName must be unlinked to be deleted.
Explanation: An attempt was made to unlink a fileset that is linked to a parent fileset that is being deleted. User response: Delete or unlink the children, and then delete the parent fileset. 6027-2654 Fileset filesetName cannot be deleted while other filesets are linked to it.
Explanation: The cited fileset must be unlinked before it can be deleted. User response: Unlink the fileset, and then reissue the delete command.
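For example, assuming a file system fs1 and a fileset fset1 (placeholder names), the fileset is first unlinked and then deleted with:
   mmunlinkfileset fs1 fset1     # names are examples only
   mmdelfileset fs1 fset1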
Explanation: The fileset to be deleted has other filesets linked to it, and cannot be deleted without using the -f flag, or unlinking the child filesets. User response: Delete or unlink the children, and then delete the parent fileset.
6027-2655 Fileset filesetName cannot be deleted. 6027-2662 Directory pathName for junction has too many links.
Explanation: The user is not allowed to delete the root fileset. User response: None. The fileset cannot be deleted. 6027-2656 Unable to quiesce fileset at all nodes.
Explanation: The directory specified for the junction has too many links. User response: Select a new directory for the link and reissue the command. 6027-2663 Fileset filesetName cannot be changed.
Explanation: An attempt to quiesce the fileset at all nodes failed. User response: Check communication hardware and reissue the command. 6027-2657 Fileset filesetName has open files. Specify -f to force unlink.
Explanation: The user specified a fileset to tschfileset that cannot be changed. User response: None. You cannot change the attributes of the root fileset. 6027-2664 Fileset at pathName cannot be changed.
Explanation: An attempt was made to unlink a fileset that has open files. User response: Close the open files and then reissue command, or use the -f option on the unlink command to force the open files to close. 6027-2658 Fileset filesetName cannot be linked into a snapshot at pathName.
Explanation: The user specified a fileset to tschfileset that cannot be changed. User response: None. You cannot change the attributes of the root fileset. 6027-2665 mmfileid already in progress for name.
Explanation: The user specified a directory within a snapshot for the junction to a fileset, but snapshots cannot be modified. User response: Select a directory within the active file system, and reissue the command. 6027-2659 Fileset filesetName is already linked.
Explanation: An mmfileid command is already running. User response: Wait for the currently running command to complete, and issue the new command again. 6027-2666 mmfileid can only handle a maximum of diskAddresses disk addresses.
Explanation: The user specified a fileset that was already linked. User response: Unlink the fileset and then reissue the link command. 6027-2660 Fileset filesetName cannot be linked.
Explanation: Too many disk addresses were specified. User response: Provide fewer than 256 disk addresses to the command. 6027-2667 Allowing block allocation for file system fileSystem that makes a file ill-replicated due to insufficient resource and puts data at risk.
Explanation: The fileset could not be linked. This typically happens when the fileset is in the process of being deleted. User response: None. 6027-2661 Fileset junction pathName already exists.
Explanation: The partialReplicaAllocation file system option allows allocation to succeed even when all replica blocks cannot be allocated. The file was marked as not replicated correctly and the data may be at risk if one of the remaining disks fails. User response: None. Informational message only. 6027-2670 Fileset name filesetName not found.
Explanation: A file or directory already exists at the specified junction. User response: Select a new junction name or a new directory for the link and reissue the link command.
Explanation: The fileset name that was specified with the command invocation was not found. User response: Correct the fileset name and reissue the command.
6027-2671 Fileset command on fileSystem failed; snapshot snapshotName must be restored first. 6027-2676 Only file systems with NFSv4 locking semantics enabled can be mounted on this platform.
Explanation: The file system is being restored either from an offline backup or a snapshot, and the restore operation has not finished. Fileset commands cannot be run. User response: Run the mmrestorefs command to complete the snapshot restore operation or to finish the offline restore, then reissue the fileset command. 6027-2672 Junction parent directory inode number inodeNumber is not valid.
Explanation: A user is trying to mount a file system on Microsoft Windows, but the POSIX locking semantics are in effect. User response: Enable NFSv4 locking semantics using the mmchfs command (-D option). 6027-2677 Fileset filesetName has pending changes that need to be synced.
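For the -D option named in the response to message 6027-2676, an illustrative invocation, assuming the file system is named fs1 (a placeholder), is:
   mmchfs fs1 -D nfs4     # fs1 is an example file system name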
Explanation: An inode number passed to tslinkfileset is not valid. User response: Check the mmlinkfileset command arguments for correctness. If a valid junction path was provided, contact the IBM Support Center. 6027-2673 Duplicate owners of an allocation region (index indexNumber, region regionNumber, pool poolNumber) were detected for file system fileSystem: nodes nodeName and nodeName.
Explanation: A user is trying to change a caching option for a fileset while it has local changes that are not yet synced with the home server. User response: Perform AFM recovery before reissuing the command. 6027-2678 Filesystem fileSystem is mounted on nodes nodes or fileset filesetName is not unlinked.
Explanation: A user is trying to change a caching feature for a fileset while the filesystem is still mounted or the fileset is still linked. User response: Unmount the filesystem from all nodes or unlink the fileset before reissuing the command. 6027-2679 Mount of fileSystem failed because mount event not handled by any data management application.
Explanation: The allocation region should not have duplicate owners. User response: Contact the IBM Support Center. 6027-2674 The owner of an allocation region (index indexNumber, region regionNumber, pool poolNumber) that was detected for file system fileSystem: node nodeName is not valid.
Explanation: The mount failed because the file system is enabled for DMAPI events (-z yes), but there was no data management application running to handle the event. User response: Make sure the DM application (for example HSM or HPSS) is running before the file system is mounted. 6027-2680 AFM filesets cannot be created for file system fileSystem.
Explanation: The file system detected a problem with the ownership of an allocation region. This may result in a corrupted file system and loss of data. One or more nodes may be terminated to prevent any further damage to the file system. User response: Unmount the file system and run the mmfsck command to repair the file system. 6027-2675 Only file systems with NFSv4 ACL semantics enabled can be mounted on this platform.
Explanation: The current file system format version does not support AFM-enabled filesets; the -p option cannot be used. User response: Change the file system format version by issuing mmchfs -V. 6027-2681 Snapshot snapshotName has linked independent filesets
Explanation: A user is trying to mount a file system on Microsoft Windows, but the ACL semantics disallow NFSv4 ACLs. User response: Enable NFSv4 ACL semantics using the mmchfs command (-k option).
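For the -k option named in the response to message 6027-2675, an illustrative invocation, assuming the file system is named fs1 (a placeholder), is:
   mmchfs fs1 -k nfs4     # fs1 is an example file system name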
Explanation: The specified snapshot is not in a valid state. User response: Correct the problem and reissue the command.
6027-2682 Set quota file attribute error (reasonCode)explanation 6027-2689 The value for --block-size must be the keyword auto or the value must be of the form [n]K, [n]M, [n]G or [n]T, where n is an optional integer in the range 1 to 1023.
Explanation: While mounting a file system a new quota file failed to be created due to inconsistency with the current degree of replication or the number of failure groups. User response: Disable quotas. Check and correct the degree of replication and the number of failure groups. Re-enable quotas. 6027-2683 Fileset filesetName in file system fileSystem does not contain snapshot snapshotName, err = number
Explanation: An invalid value was specified with the --block-size option. User response: Reissue the command with a valid option. 6027-2690 Fileset filesetName can only be linked within its own inode space.
Explanation: An incorrect snapshot name was specified. User response: Select a valid snapshot and issue the command again. 6027-2684 File system fileSystem does not contain global snapshot snapshotName, err = number
Explanation: A dependent fileset can only be linked within its own inode space. User response: Correct the junction path and reissue the command. 6027-2691 The fastea feature needs to be enabled for file system fileSystem before creating AFM filesets.
Explanation: An incorrect snapshot name was specified. User response: Select a valid snapshot and issue the command again. 6027-2685 Total file system capacity allows minMaxInodes inodes in fileSystem. Currently the total inode limits used by all the inode spaces in inodeSpace is inodeSpaceLimit. There must be at least number inodes available to create a new inode space. Use the mmlsfileset -L command to show the maximum inode limits of each fileset. Try reducing the maximum inode limits for some of the inode spaces in fileSystem.
Explanation: The current file system on-disk format does not support storing of extended attributes in the file's inode. This is required for AFM-enabled filesets. User response: Use the mmmigratefs command to enable the fast extended-attributes feature. 6027-2692 Error encountered while processing the input file.
Explanation: The tscrsnapshot command encountered an error while processing the input file. User response: Check and validate the fileset names listed in the input file. 6027-2693 Fileset junction name junctionName conflicts with the current setting of mmsnapdir.
Explanation: The number of inodes available is too small to create a new inode space. User response: Reduce the maximum inode limits and issue the command again. 6027-2688 Only independent filesets can be configured as AFM filesets. The --inode-space=new option is required.
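For the inode-limit adjustment described for message 6027-2685, an illustrative sketch, assuming a file system fs1 and one of its independent filesets fset1 (placeholder names), is to display the current limits and then lower one of them:
   mmlsfileset fs1 -L     # names are examples only
   mmchfileset fs1 fset1 --inode-limit 100000
The 100000 value is illustrative; choose a limit that still satisfies the fileset's needs.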
Explanation: The fileset junction name conflicts with the current setting of mmsnapdir. User response: Select a new junction name or a new directory for the link and reissue the mmlinkfileset command. 6027-2694[I] The requested maximum number of inodes is already at number. Explanation: The specified number of inodes is already in effect. User response: This is an informational message.
Explanation: Only independent filesets can be configured for caching. User response: Specify the --inode-space=new option.
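For example, assuming a file system fs1 and a new fileset name afm1 (placeholder names), an independent fileset with its own inode space is created with:
   mmcrfileset fs1 afm1 --inode-space=new     # names are examples only
The AFM-specific attributes (target, mode, and so on) that an AFM fileset also requires are omitted from this sketch.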
6027-2695[E] The number of inodes to preallocate cannot be higher than the maximum number of inodes. Explanation: The specified number of inodes to preallocate is not valid. User response: Correct the --inode-limit argument, then retry the command. 6027-2696[E] The number of inodes to preallocate cannot be lower than the number of inodes already allocated. Explanation: The specified number of inodes to preallocate is not valid. User response: Correct the --inode-limit argument, then retry the command. 6027-2697 Fileset at junctionPath has pending changes that need to be synced. GPFS are installed on all nodes. Also, verify that the joining node is in the configuration database. 6027-2701 The mmpmon command file is empty.
Explanation: The mmpmon command file is empty. User response: Check file size, existence, and access permissions. 6027-2702 Unexpected mmpmon response from file system daemon.
Explanation: An unexpected response was received to an mmpmon request. User response: Ensure that the mmfsd daemon is running. Check the error log. Ensure that all GPFS software components are at the same version. 6027-2703 Unknown mmpmon command command.
Explanation: A user is trying to change a caching option for a fileset while it has local changes that are not yet synced with the home server. User response: Perform AFM recovery before reissuing the command. 6027-2698 File system fileSystem is mounted on nodes nodes or fileset at junctionPath is not unlinked.
Explanation: An unknown mmpmon command was read from the input file. User response: Correct the command and rerun. 6027-2704 Permission failure. The command requires root authority to execute.
Explanation: The mmpmon command was issued with a nonzero UID. User response: Log on as root and reissue the command. 6027-2705 Could not establish connection to file system daemon.
Explanation: A user is trying to change a caching feature for a fileset while the filesystem is still mounted or the fileset is still linked. User response: Unmount the filesystem from all nodes or unlink the fileset before reissuing the command. 6027-2699 Cannot create a new independent fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber independent filesets.
Explanation: The connection between a GPFS command and the mmfsd daemon could not be established. The daemon may have crashed, or never been started, or (for mmpmon) the allowed number of simultaneous connections has been exceeded. User response: Ensure that the mmfsd daemon is running. Check the error log. For mmpmon, ensure that the allowed number of simultaneous connections has not been exceeded. 6027-2706 Recovered number nodes.
Explanation: An attempt to create an independent fileset for the cited file system failed because it would exceed the cited limit. User response: Remove unneeded independent filesets and reissue the command. 6027-2700 A node join was rejected. This could be due to incompatible daemon versions, failure to find the node in the configuration database, or no configuration manager found.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed. User response: None. Informational message only. 6027-2707 Node join protocol waiting value seconds for node recovery.
Explanation: A request to join nodes was explicitly rejected. User response: Verify that compatible versions of
Explanation: Node join protocol is delayed until phase 2 of previous node failure recovery protocol is complete.
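As a minimal sketch of the sequence suggested for message 6027-2698, the file system could be unmounted everywhere or the fileset unlinked before the caching change is retried. The names fs1 and cacheFileset are placeholders, not values taken from the message:
  mmumount fs1 -a
  mmunlinkfileset fs1 cacheFileset
After the change, the file system can be remounted with mmmount and the fileset relinked with mmlinkfileset.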
6027-2708 Rejected node join protocol. Phase two of node failure recovery appears to still be in progress.
Explanation: Node join protocol is rejected after a number of internal delays and phase two node failure protocol is still in progress.
User response: None. Informational message only.

6027-2709 Configuration manager node nodeName not found in the node list.
Explanation: The specified node was not found in the node list.
User response: Add the specified node to the node list and reissue the command.

6027-2710 Node nodeName is being expelled due to expired lease.
Explanation: The nodes listed did not renew their lease in a timely fashion and will be expelled from the cluster.
User response: Check the network connection between this node and the node specified above.

6027-2711 File system table full.
Explanation: The mmfsd daemon cannot add any more file systems to the table because it is full.
User response: None. Informational message only.

6027-2712 Option 'optionName' has been deprecated.
Explanation: The option that was specified with the command is no longer supported. A warning message is generated to indicate that the option has no effect.
User response: Correct the command line and then reissue the command.

6027-2713 Permission failure. The command requires SuperuserName authority to execute.
Explanation: The command, or the specified command option, requires administrative authority.
User response: Log on as a user with administrative privileges and reissue the command.

6027-2714 Could not appoint node nodeName as cluster manager.
Explanation: The mmchmgr -c command generates this message if the specified node cannot be appointed as a new cluster manager.
User response: Make sure that the specified node is a quorum node and that GPFS is running on that node.

6027-2715 Could not appoint a new cluster manager.
Explanation: The mmchmgr -c command generates this message when a node is not available as a cluster manager.
User response: Make sure that GPFS is running on a sufficient number of quorum nodes.

6027-2716 Challenge response received; canceling disk election.
Explanation: The node has challenged another node, which won the previous election, and detected a response to the challenge.
User response: None. Informational message only.

6027-2717 Node nodeName is already a cluster manager or another node is taking over as the cluster manager.
Explanation: The mmchmgr -c command generates this message if the specified node is already the cluster manager.
User response: None. Informational message only.

6027-2718 Incorrect port range: GPFSCMDPORTRANGE='range'. Using default.
Explanation: The GPFS command port range format is lllll[-hhhhh], where lllll is the low port value and hhhhh is the high port value. The valid range is 1 to 65535.
User response: None. Informational message only. (An example of setting the port range follows 6027-2719.)

6027-2719 The files provided do not contain valid quota entries.
Explanation: The quota file provided does not have valid quota entries.
User response: Check that the file being restored is a valid GPFS quota file.
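As a hedged illustration of the port range format described in message 6027-2718, the variable might be set in the environment before GPFS administration commands are run; the range 60000-61000 is only an example, and the exact mechanism for setting it should be verified for your installation:
  export GPFSCMDPORTRANGE=60000-61000
  mmgetstate -a
Both the low and the high values must fall within 1 to 65535.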
6027-2722 Node limit of number has been reached. Ignoring nodeName.
Explanation: The number of nodes that have been added to the cluster is greater than some cluster members can handle.
User response: Delete some nodes from the cluster using the mmdelnode command, or shut down GPFS on nodes that are running older versions of the code with lower limits.

6027-2723 This node (nodeName) is now cluster manager for clusterName.
Explanation: This is an informational message when a new cluster manager takes over.
User response: None. Informational message only.

6027-2724 reasonString Probing cluster clusterName.
Explanation: This is an informational message when a lease request has not been renewed.
User response: None. Informational message only.

6027-2725 Node nodeName lease renewal is overdue. Pinging to check if it is alive.
Explanation: This is an informational message on the cluster manager when a lease request has not been renewed.
User response: None. Informational message only.

6027-2726 Recovered number nodes for file system fileSystem.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

6027-2727 fileSystem: quota manager is not available.
Explanation: An attempt was made to perform a quota command without a quota manager running. This could be caused by a conflicting offline mmfsck command.
User response: Reissue the command once the conflicting program has ended.

6027-2729 Value value for option optionName is out of range. Valid values are value through value.
Explanation: An out of range value was specified for the specified option.
User response: Correct the command line.

6027-2730 Node nodeName failed to take over as cluster manager.
Explanation: An attempt to take over as cluster manager failed.
User response: Make sure that GPFS is running on a sufficient number of quorum nodes.

6027-2731 Failed to locate a working cluster manager.
Explanation: The cluster manager has failed or changed. The new cluster manager has not been appointed.
User response: Check the internode communication configuration and ensure enough GPFS nodes are up to make a quorum.

6027-2732 Attention: No data disks remain in the system pool. Use mmapplypolicy to migrate all data left in the system pool to other storage pool.
Explanation: The mmchdisk command has been issued, but no data disks remain in the system pool. Use mmapplypolicy to move data to another storage pool.
User response: None. Informational message only. (An example migration policy follows 6027-2735.)

6027-2733 The file system name (fsname) is longer than the maximum allowable length (maxLength).
Explanation: The file system name is invalid because it is longer than the maximum allowed length of 255 characters.
User response: Specify a file system name whose length is 255 characters or less and reissue the command.

6027-2734 Disk failure from node nodeName Volume name. Physical volume name.
Explanation: An I/O request to a disk or a request to fence a disk has failed in such a manner that GPFS can no longer use the disk.
User response: Check the disk hardware and the software subsystems in the path to the disk.

6027-2735 Not a manager
Explanation: This node is not a manager or no longer a manager of the type required to proceed with the operation. This could be caused by the change of manager in the middle of the operation.
User response: Retry the operation.
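As a sketch of the migration suggested in message 6027-2732, a policy file containing a MIGRATE rule can be applied with mmapplypolicy. The file system name fs1, the target pool data1, and the policy file name policy.rules are placeholders:
  RULE 'moveoff' MIGRATE FROM POOL 'system' TO POOL 'data1'
  mmapplypolicy fs1 -P policy.rules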
6027-2736 The value for --block-size must be the keyword auto or the value must be of the form nK, nM, nG or nT, where n is an optional integer in the range 1 to 1023.
Explanation: An invalid value was specified with the --block-size option.
User response: Reissue the command with a valid option.

6027-2738 Editing quota limits for the root user is not permitted
Explanation: The root user was specified for quota limits editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.

6027-2739 Editing quota limits for groupName group not permitted.
Explanation: The system group was specified for quota limits editing in the mmedquota command.
User response: Specify a valid user or group in the mmedquota command. Editing quota limits for the root user or system group is prohibited.

6027-2740 Starting new election as previous clmgr is expelled.
Explanation: This node is taking over as clmgr without challenge as the old clmgr is being expelled.
User response: None. Informational message only.

6027-2741 This node cannot continue to be cluster manager.
Explanation: This node invoked the user-specified callback handler for the tiebreakerCheck event and it returned a non-zero value. This node cannot continue to be the cluster manager.
User response: None. Informational message only.

6027-2742 CallExitScript: exit script exitScript on event eventName returned code returnCode, quorumloss.
Explanation: This node invoked the user-specified callback handler for the tiebreakerCheck event and it returned a non-zero value. The user-specified action with the error is quorumloss.
User response: None. Informational message only.

6027-2743 Permission denied.
Explanation: The command is invoked by an unauthorized user.
User response: Retry the command with an authorized user.

6027-2744 Invoking tiebreaker callback script
Explanation: The node is invoking the callback script due to change in quorum membership.
User response: None. Informational message only.

6027-2745 File system is not mounted.
Explanation: A command was issued, which requires that the file system be mounted.
User response: Mount the file system and reissue the command.

6027-2746 Too many disks unavailable for this server to continue serving a RecoveryGroup.
Explanation: RecoveryGroup panic: Too many disks unavailable to continue serving this RecoveryGroup. This server will resign, and failover to an alternate server will be attempted.
User response: Ensure the alternate server took over. Determine what caused this event and address the situation. Prior messages may help determine the cause of the event.

6027-2747 Inconsistency detected between the local node number retrieved from 'mmsdrfs' (nodeNumber) and the node number retrieved from 'mmfs.cfg' (nodeNumber).
Explanation: The node number retrieved by obtaining the list of nodes in the mmsdrfs file did not match the node number contained in mmfs.cfg. There may have been a recent change in the IP addresses being used by network interfaces configured at the node.
User response: Stop and restart the GPFS daemon (see the example after 6027-2748).

6027-2748 Terminating because a conflicting program on the same inode space inodeSpace is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command after the conflicting program ends.
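Message 6027-2747 asks for the GPFS daemon to be stopped and restarted. A minimal sketch of that sequence on the affected node, assuming it is safe to stop GPFS there, is:
  mmshutdown
  mmstartup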
6027-2749 Specified locality group 'number' does not match disk 'name' locality group 'number'. To change locality groups in an SNC environment, please use the mmdeldisk and mmadddisk commands.
Explanation: The locality group specified on the mmchdisk command does not match the current locality group of the disk.
User response: To change locality groups in an SNC environment, use the mmdeldisk and mmadddisk commands.

6027-2750 Node NodeName is now the Group Leader.
Explanation: A new cluster Group Leader has been assigned.
User response: None. Informational message only.

6027-2751 Starting new election: Last elected: NodeNumber Sequence: SequenceNumber
Explanation: A new disk election will be started. The disk challenge will be skipped since the last elected node was either none or the local node.
User response: None. Informational message only.

6027-2752 This node got elected. Sequence: SequenceNumber
Explanation: Local node got elected in the disk election. This node will become the cluster manager.
User response: None. Informational message only.

6027-2753 Responding to disk challenge: response: ResponseValue. Error code: ErrorCode.
Explanation: A disk challenge has been received, indicating that another node is attempting to become a Cluster Manager. Issuing a challenge response, to confirm the local node is still alive and will remain the Cluster Manager.
User response: None. Informational message only.

6027-2754 Challenge thread did not respond to challenge in time: took TimeIntervalSecs seconds.
Explanation: Challenge thread took too long to respond to a disk challenge. Challenge thread will exit, which will result in the local node losing quorum.
User response: None. Informational message only.

6027-2755 Another node committed disk election with sequence CommittedSequenceNumber (our sequence was OurSequenceNumber).
Explanation: Another node committed a disk election with a sequence number higher than the one used when this node used to commit an election in the past. This means that the other node has become, or is becoming, a Cluster Manager. To avoid having two Cluster Managers, this node will lose quorum.
User response: None. Informational message only.

6027-2756 Attention: In file system (FileSystemName) FileSetName, default QuotaLimitType (QuotaLimit) for QuotaType UserName/GroupName/FilesetName is too small. Suggest setting it higher than minQuotaLimit.
Explanation: The quota limit that was set is too low and will cause unexpected quota behavior. MinQuotaLimit is computed as follows: 1. For block limits: QUOTA_THRESHOLD * MIN_SHARE_BLOCKS * subblocksize. 2. For inode limits: QUOTA_THRESHOLD * MIN_SHARE_INODES.
User response: Reset the quota limits so that they are higher than MinQuotaLimit. This is only a warning; the quota limits will be set anyway.

6027-2757 The peer snapshot is in progress. Queue cannot be flushed now.
Explanation: The peer snapshot is in progress. The queue cannot be flushed now.
User response: Reissue the command once the peer snapshot has ended.

6027-2758 The AFM target is not configured for peer snapshots. Run mmafmhomeconfig on the AFM target cluster.
Explanation: The .afmctl file is probably not present on the AFM target cluster.
User response: Run mmafmhomeconfig on the AFM target cluster to configure the AFM target cluster.

6027-2759 Disk lease period expired in cluster ClusterName. Attempting to reacquire lease.
Explanation: The disk lease period expired, which will prevent the local node from being able to perform disk I/O. This can be caused by a temporary communication outage.
User response: If the message is repeated, investigate the communication outage.
6027-2760 Disk lease reacquired in cluster ClusterName.
Explanation: The disk lease has been reacquired, and disk I/O will be resumed.
User response: None. Informational message only.

6027-2761 Unable to run command on 'fileSystem' while the file system is mounted in restricted mode.
Explanation: A command that can alter data in a file system was issued while the file system was mounted in restricted mode.
User response: Mount the file system in read-only or read-write mode or unmount the file system and then reissue the command.

6027-2762 Unable to run command on 'fileSystem' while the file system is suspended.
Explanation: A command that can alter data in a file system was issued while the file system was suspended.
User response: Resume the file system and reissue the command.

6027-2763 Unable to start command on 'fileSystem' because conflicting program name is running. Waiting until it completes.
Explanation: A program detected that it cannot start because a conflicting program is running. The program will automatically start once the conflicting program has ended as long as there are no other conflicting programs running at that time.
User response: None. Informational message only.

6027-2764 Terminating command on fileSystem because a conflicting program name is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command after the conflicting program ends.

6027-2765 command on 'fileSystem' is finished waiting. Processing continues ... name
Explanation: A program detected that it can now continue the processing since a conflicting program has ended.
User response: None. Informational message only.

6027-2766 User script has chosen to expel node nodeName instead of node nodeName.
Explanation: The user has specified a callback script that is invoked whenever a decision is about to be taken on what node should be expelled from the active cluster. As a result of the execution of the script, GPFS will reverse its decision on what node to expel.
User response: None.

6027-2770 Disk diskName belongs to a write-affinity enabled storage pool. Its failure group cannot be changed.
Explanation: The failure group specified on the mmchdisk command does not match the current failure group of the disk.
User response: Use the mmdeldisk and mmadddisk commands to change failure groups in a write-affinity enabled storage pool.

6027-2771 fileSystem: Default per-fileset quotas are disabled for quotaType.
Explanation: A command was issued to modify default fileset-level quota, but default quotas are not enabled.
User response: Ensure the --perfileset-quota option is in effect for the file system, then use the mmdefquotaon command to enable default fileset-level quotas. After default quotas are enabled, issue the failed command again.

6027-2772 Cannot close disk name.
Explanation: Could not access the specified disk.
User response: Check the disk hardware and the path to the disk. Refer to Unable to access disks on page 95.

6027-2773
Explanation: A command was issued to modify default quota, but default quota is not enabled.
User response: Ensure the -Q yes option is in effect for the file system, then enable default quota with the mmdefquotaon command.

6027-2774
Explanation: A command was issued to modify fileset-level quota, but per-fileset quota management is not enabled.
User response: Ensure the --perfileset-quota option is in effect for the file system and reissue the command (see the example that follows).
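The responses for messages 6027-2771, 6027-2773, and 6027-2774 can be illustrated with the following sketch; the file system name fs1 is a placeholder and the exact option spellings should be verified in the command reference:
  mmchfs fs1 -Q yes
  mmchfs fs1 --perfileset-quota
  mmdefquotaon fs1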
6027-2775 Storage pool named poolName does not exist.
Explanation: The mmlspool command was issued, but the specified storage pool does not exist.
User response: Correct the input and reissue the command.

6027-2776 Attention: A disk being stopped reduces the degree of system metadata replication (value) or data replication (value) to lower than tolerable.
Explanation: The mmchdisk stop command was issued, but the disk cannot be stopped because of the current file system metadata and data replication factors.
User response: Make more disks available, delete unavailable disks, or change the file system metadata replication factor. Also check the current value of the unmountOnDiskFail configuration parameter.

6027-2777 Node nodeName is being expelled because of an expired lease. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: The node listed did not renew its lease in a timely fashion and is being expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2778 Node nodeName: ping timed out. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: Ping timed out for the node listed, which should be the cluster manager. A new cluster manager will be chosen while the current cluster manager is expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2779 Challenge thread stopped.
Explanation: A tiebreaker challenge thread stopped because of an error. Cluster membership will be lost.
User response: Check for additional error messages. File systems will be unmounted, then the node will rejoin the cluster.

6027-2780
Explanation: The cluster manager cannot reach a sufficient number of quorum nodes, and therefore must resign to prevent cluster partitioning.
User response: Determine if there is a network outage or if too many nodes have failed.

6027-2781 Lease expired for numSecs seconds (shutdownOnLeaseExpiry).
Explanation: Disk lease expired for too long, which results in the node losing cluster membership.
User response: None. The node will attempt to rejoin the cluster.

6027-2782 This node is being expelled from the cluster.
Explanation: This node received a message instructing it to leave the cluster, which might indicate communication problems between this node and some other node in the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2783 New leader elected with a higher ballot number.
Explanation: A new group leader was elected with a higher ballot number, and this node is no longer the leader. Therefore, this node must leave the cluster and rejoin.
User response: None. The node will attempt to rejoin the cluster.

6027-2784 No longer a cluster manager or lost quorum while running a group protocol.
Explanation: Cluster manager no longer maintains quorum after attempting to run a group protocol, which might indicate a network outage or node failures.
User response: None. The node will attempt to rejoin the cluster.

6027-2785 A severe error was encountered during cluster probe.
Explanation: A severe error was encountered while running the cluster probe to determine the state of the nodes in the cluster.
User response: Examine additional error messages. The node will attempt to rejoin the cluster.
6027-2786 Unable to contact any quorum nodes during cluster probe.
Explanation: This node has been unable to contact any quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2787 Unable to contact enough other quorum nodes during cluster probe.
Explanation: This node, a quorum node, was unable to contact a sufficient number of quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2788 Attempt to run leader election failed with error errorNumber.
Explanation: This node attempted to run a group leader election but failed to get elected. This failure might indicate that two or more quorum nodes attempted to run the election at the same time. As a result, this node will lose cluster membership and then attempt to rejoin the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2789 Tiebreaker script returned a non-zero value.
Explanation: The tiebreaker script, invoked during group leader election, returned a non-zero value, which results in the node losing cluster membership and then attempting to rejoin the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2790 Attention: Disk parameters were changed. Use the mmrestripefs command with the -r option to relocate data and metadata.
Explanation: The mmchdisk command with the change option was issued.
User response: Issue the mmrestripefs -r command to relocate data and metadata.

6027-2791 Disk diskName does not belong to file system deviceName.
Explanation: The input disk name does not belong to the specified file system.

Explanation: The current version of the file system does not support default fileset-level quotas.
User response: Use the mmchfs -V command to activate the new function.

6027-2800 Available memory exceeded on request to allocate number bytes. Trace point sourceFile-tracePoint.
Explanation: The available memory was exceeded during an allocation request made from the cited source file and trace point.
User response: Try shutting down and then restarting GPFS. If the problem recurs, contact the IBM Support Center.

6027-2801 Policy set syntax version versionString not supported.
Explanation: The policy rules do not comply with the supported syntax.
User response: Rewrite the policy rules, following the documented, supported syntax and keywords.

6027-2802 Object name 'poolName_or_filesetName' is not valid.
Explanation: The cited name is not a valid GPFS object, names an object that is not valid in this context, or names an object that no longer exists.
User response: Correct the input to identify a GPFS object that exists and is valid in this context.

6027-2803 Policy set must start with VERSION.
Explanation: The policy set does not begin with VERSION as required.
User response: Rewrite the policy rules, following the documented, supported syntax and keywords.

6027-2804 Unexpected SQL result code sqlResultCode.
Explanation: This could be an IBM programming error.
User response: Check that your SQL expressions are correct and supported by the current release of GPFS. If the error recurs, contact the IBM Support Center.
6027-2805 Loaded policy 'policyFileName or filesystemName': summaryOfPolicyRules.
Explanation: The specified loaded policy has the specified policy rules.
User response: None. Informational message only.

6027-2806 Error while validating policy 'policyFileName or filesystemName': rc=errorCode: errorDetailsString.
Explanation: An error occurred while validating the specified policy.
User response: Correct the policy rules, heeding the error details in this message and other messages issued immediately before or after this message. Use the mmchpolicy command to install a corrected policy rules file.

6027-2807 Error in evaluation of placement policy for file fileName: errorDetailsString.
Explanation: An error occurred while evaluating the installed placement policy for a particular new file. Although the policy rules appeared to be syntactically correct when the policy was installed, evidently there is a problem when certain values of file attributes occur at runtime.
User response: Determine which file names and attributes trigger this error. Correct the policy rules, heeding the error details in this message and other messages issued immediately before or after this message. Use the mmchpolicy command to install a corrected policy rules file.

6027-2808 In rule 'ruleName' (ruleNumber), 'wouldBePoolName' is not a valid pool name.
Explanation: The cited name that appeared in the cited rule is not a valid pool name. This may be because the cited name was misspelled or removed from the file system.
User response: Correct or remove the rule.

6027-2809 Validated policy 'policyFileName or filesystemName': summaryOfPolicyRules.
Explanation: The specified validated policy has the specified policy rules.
User response: None. Informational message only.

6027-2810 There are numberOfPools storage pools but the policy file is missing or empty.
Explanation: The cited number of storage pools are defined, but the policy file is missing or empty.
User response: You should probably install a policy with placement rules using the mmchpolicy command, so that at least some of your data will be stored in your nonsystem storage pools (see the example after 6027-2816).

6027-2811 Policy has no storage pool placement rules!
Explanation: The policy has no storage pool placement rules.
User response: You should probably install a policy with placement rules using the mmchpolicy command, so that at least some of your data will be stored in your nonsystem storage pools.

6027-2812 Keyword 'keywordValue' begins a second clauseName clause - only one is allowed.
Explanation: The policy rule should only have one clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2813 This 'ruleName' rule is missing a clauseType required clause.
Explanation: The policy rule must have a clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2814 This 'ruleName' rule is of unknown type or not supported.
Explanation: The policy rule set seems to have a rule of an unknown type or a rule that is unsupported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2815 The value 'value' is not supported in a 'clauseType' clause.
Explanation: The policy rule clause seems to specify an unsupported argument or value that is not supported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2816 Policy rules employ features that would require a file system upgrade.
Explanation: One or more policy rules have been written to use new features that cannot be installed on a back-level file system.
User response: Install the latest GPFS software on all nodes and upgrade the file system or change your rules. (Note that LIMIT was introduced in GPFS Release 3.2.)
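As a sketch of the response for messages 6027-2810 and 6027-2811, a small placement policy can be installed with mmchpolicy. The rule shown is the common default-placement pattern; fs1 and policy.rules are placeholders, and a production policy would normally direct new files to a nonsystem pool:
  RULE 'default' SET POOL 'system'
  mmchpolicy fs1 policy.rules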
6027-2817 Error on popen/pclose (command_string): rc=return_code_from_popen_or_pclose
Explanation: The execution of the command_string by popen/pclose resulted in an error.
User response: To correct the error, do one or more of the following: Check that the standard m4 macro processing command is installed on your system as /usr/bin/m4. Or: Set the MM_M4_CMD environment variable. Or: Correct the macro definitions in your policy rules file. If the problem persists, contact the IBM Support Center. (An example of setting MM_M4_CMD follows 6027-2955.)

6027-2818
Explanation: An attempt to expand the policy rules with an m4 subprocess yielded some warnings or errors or the m4 macro wrote some output to standard error. Details or related messages may follow this message.
User response: To correct the error, do one or more of the following: Check that the standard m4 macro processing command is installed on your system as /usr/bin/m4. Or: Set the MM_M4_CMD environment variable. Or: Correct the macro definitions in your policy rules file. If the problem persists, contact the IBM Support Center.

6027-2819 Error opening temp file temp_file_name: errorString
Explanation: An error occurred while attempting to open the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.

6027-2820 Error reading temp file temp_file_name: errorString
Explanation: An error occurred while attempting to read the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.

6027-2821 Rule 'ruleName' (ruleNumber) specifies a THRESHOLD for EXTERNAL POOL 'externalPoolName'. This is not supported.
Explanation: GPFS does not support the THRESHOLD clause within a migrate rule that names an external pool in the FROM POOL clause.
User response: Correct or remove the rule.

6027-2950 Trace value 'value' after class 'class' must be from 0 to 14.
Explanation: The specified trace value is not recognized.
User response: Specify a valid trace integer value.

6027-2951 Value value for worker1Threads must be <= than the original setting value
Explanation: An attempt to dynamically set worker1Threads found the value out of range. The dynamic value must be 2 <= value <= the original setting when the GPFS daemon was started.

6027-2952 Unknown assert class 'assertClass'.
Explanation: The assert class is not recognized.
User response: Specify a valid assert class.

6027-2953 Non-numeric assert value 'value' after class 'class'.
Explanation: The specified assert value is not recognized.
User response: Specify a valid assert integer value.

6027-2954 Assert value 'value' after class 'class' must be from 0 to 127.
Explanation: The specified assert value is not recognized.
User response: Specify a valid assert integer value.

6027-2955 Time-of-day may have jumped back. Late by delaySeconds seconds to wake certain threads.
Explanation: Time-of-day may have jumped back, which has resulted in some threads being awakened later than expected. It is also possible that some other factor has caused a delay in waking up the threads.
User response: Verify if there is any problem with network time synchronization, or if time-of-day is being incorrectly set.
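The responses for messages 6027-2817 and 6027-2818 mention the MM_M4_CMD environment variable. A hedged sketch, assuming the m4 binary is installed in a nonstandard location such as /opt/freeware/bin/m4 (an example path only), is:
  export MM_M4_CMD=/opt/freeware/bin/m4
  mmchpolicy fs1 policy.rules
The default location expected by GPFS is /usr/bin/m4.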
6027-3002 [E] Disk location code locationCode is not known.
Explanation: A disk location code specified on the command line was not found.
User response: Check the disk location code.

6027-3008 [E] Incorrect recovery group given for location.
Explanation: The mmchcarrier command detected that the specified recovery group name given does not match that of the pdisk in the specified location.
User response: Check the disk location code and recovery group name. If you are sure that the disks in the carrier are not being used by other recovery groups, it is possible to override the check using the --force-RG flag. Use this flag with caution as it can cause disk errors and potential data loss in other recovery groups.
6027-3012 [E] Cannot find a pdisk in location locationCode. Explanation: The tschcarrier command cannot find a pdisk to replace in the given location. User response: Check the disk location code.
6027-3013 [W] Disk location locationCode failed to power on. Explanation: The mmchcarrier command detected an error when trying to power on a disk. User response: Make sure the disk is firmly seated and run the command again.
6027-3015 [E] Pdisk pdiskName of recovery group recoveryGroupName in location locationCode cannot be used as a replacement for pdisk pdiskName of recovery group recoveryGroupName. Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk. But instead of finding a new disk, the mmchcarrier command found that another pdisk was moved to the replacement location. User response: Repeat the disk replacement procedure, making sure to replace the failed pdisk with a new disk.
Explanation: The tscrvdisk command could not create the necessary metadata for the specified vdisk. User response: Change the vdisk arguments and retry the command.
6027-3035 [E] Cannot configure NSD-RAID services. maxblocksize must be at least value. Explanation: The GPFS daemon is starting and cannot initialize the NSD-RAID services because the maxblocksize attribute is too small. User response: Correct the maxblocksize attribute and restart the GPFS daemon.
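A minimal sketch of the response for message 6027-3035 follows; the value 16M is a placeholder and must be at least the value reported in the message. Changing maxblocksize typically requires the GPFS daemon to be down, so verify the procedure before running it on a production cluster:
  mmshutdown -a
  mmchconfig maxblocksize=16M
  mmstartup -a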
6027-3036 [E] Partition size must be a power of 2. Explanation: The partitionSize parameter of some declustered array was invalid. User response: Correct the partitionSize parameter and reissue the command.
6027-3037 [E] Partition size must be between number and number. Explanation: The partitionSize parameter of some declustered array was invalid. User response: Correct the partitionSize parameter to a power of 2 within the specified range and reissue the command.
6027-3038 [E] AU log too small; must be at least number bytes. Explanation: The auLogSize parameter of a new declustered array was invalid. User response: Increase the auLogSize parameter and reissue the command.
6027-3039 [E] A vdisk with disk usage vdiskLogTip must be the first vdisk created in a recovery group. Explanation: The --logTip disk usage was specified for a vdisk other than the first one created in a recovery group. User response: Retry the command with a different disk usage.
6027-3040 [E] Declustered array configuration data does not fit. Explanation: There is not enough space in the pdisks of a new declustered array to hold the AU log area using the current partition size. User response: Increase the partitionSize parameter or decrease the auLogSize parameter and reissue the command.
6027-3056 [E] Long and short term event log size and fast write log percentage are only applicable to log home vdisk.
Explanation: The longTermEventLogSize, shortTermEventLogSize, and fastWriteLogPct options are only applicable to log home vdisk.
User response: Remove any of these options and retry vdisk creation.
6027-3062 [E] Recovery group version version is not compatible with the current recovery group version.
Explanation: The recovery group version specified with the --version option does not support all of the features currently supported by the recovery group.
User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3063 [E] Unknown recovery group version version.
Explanation: The recovery group version named by the argument of the --version option was not recognized.
User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3064 [I] Allowable recovery group versions are:
Explanation: Informational message listing allowable recovery group versions.
User response: Run the command with one of the recovery group versions listed.

6027-3065 [E] The maximum size of a log tip vdisk is size.
Explanation: Running mmcrvdisk for a log tip vdisk failed because the size is too large.
User response: Correct the size parameter and run the command again.
6027-3058 [E] GSS license failure - GPFS Native RAID services will not be configured on this node.
Explanation: The GPFS Storage Server has not been validly installed. Therefore, GPFS Native RAID services will not be configured.
User response: Install a legal copy of the base GPFS code and restart the GPFS daemon.

6027-3059 [E] The serviceDrain state is only permitted when all nodes in the cluster are running daemon version version or higher.
Explanation: The mmchpdisk command option --begin-service-drain was issued, but there are backlevel nodes in the cluster that do not support this action.
User response: Upgrade the nodes in the cluster to at least the specified version and run the command again.

6027-3060 [E] Vdisk block size of log tip vdisk and log home vdisk must be the same.
Explanation: The block size of the log tip vdisk must be the same as the block size of the log home vdisk.
User response: Try running the command again. Make sure that the block size of the log home vdisk is the same as the block size of the log tip vdisk.

6027-3061 [E] Cannot delete path pathName because there would be no other working paths to pdisk pdiskName of RG recoveryGroupName.
Explanation: When the -v yes option is specified on the --delete-paths subcommand of the tschrecgroup command, it is not allowed to delete the last working path to a pdisk.
User response: Try running the command again after repairing other broken paths for the named pdisk, or reduce the list of paths being deleted, or run the command with -v no.

6027-3200 AFM ERROR: command pCacheCmd fileset filesetName fileids [parentId,childId,tParentId,targetId,ReqCmd] name sourceName original error oerr application error aerr remote error remoteError
Explanation: AFM operations on a particular file failed.
User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster (see the example after 6027-3201).

6027-3201 AFM ERROR DETAILS: type: remoteCmdType snapshot name snapshotName snapshot ID snapshotId
Explanation: Peer snapshot creation or deletion failed.
User response: Fix the snapshot creation or deletion error.
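The response for message 6027-3200 refers to the resumeRequeued option of mmafmctl. A hedged sketch, with fs1 and fileset1 as placeholder names, is:
  mmafmctl fs1 resumeRequeued -j fileset1
Verify the exact syntax against the mmafmctl command description before use.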
6027-3204 AFM: Failed to set xattr on inode inodeNum error err, ignoring.
Explanation: Setting extended attributes on an inode failed.
User response: None.

6027-3205 AFM: Failed to get xattrs for inode inodeNum, ignoring.
Explanation: Getting extended attributes on an inode failed.
User response: None.

6027-3209 Home NFS mount of host:path failed with error err
Explanation: NFS mounting of path from the home cluster failed.
User response: Make sure the exported path can be mounted over NFSv3.

6027-3210 Cannot find AFM control file for fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmhomeconfig command with the enable option on the export path at home (see the example after 6027-3219).

6027-3211 Change in home export detected. Caching will be disabled.
Explanation: A change in home export was detected or the home path is stale.
User response: Ensure the exported path is accessible.

6027-3212 AFM ERROR: Cannot enable AFM for fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3213 Cannot find snapshot link directory name for exported file system at home for fileset filesetName. Snapshot directory at home will be cached.
Explanation: Unable to determine the snapshot link directory at the home cluster.
User response: None.

6027-3214 Unexpiration of fileset filesetName failed with error err. Use mmafmctl to manually unexpire the fileset.
Explanation: Unexpiration of the fileset failed after a home reconnect.
User response: Run the mmafmctl command with the unexpire option on the fileset.

6027-3215 AFM Warning: Peer snapshot delayed due to long running execution of operation to remote cluster for fileset filesetName. Peer snapshot continuing to wait.
Explanation: Peer snapshot command timed out waiting to flush messages.
User response: None.

6027-3216 Fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.
Explanation: The cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize the cache with the home.
User response: None.

6027-3217 AFM ERROR Unable to unmount NFS export for fileset filesetName
Explanation: NFS unmount of the path failed.
User response: None.

6027-3218 Change in home export detected. Synchronization with home is suspended until the problem is resolved.
Explanation: A change in home export was detected or the home path is stale.
User response: Ensure the exported path is accessible.

6027-3219 AFM: Home has been restored. Synchronization with home will be resumed.
Explanation: A change in home export was detected that caused the home to be restored. Synchronization with home will be resumed.
User response: None.
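The response for message 6027-3210 calls for running mmafmhomeconfig at the home cluster. A hedged sketch, where /gpfs/fs1/export stands in for the exported path, is:
  mmafmhomeconfig enable /gpfs/fs1/export
Run this on the home (exporting) cluster, not on the cache cluster, and verify the exact syntax in the command reference.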
6027-3220 AFM: Home NFS mount of host:path failed with error err for file system fileSystem fileset id filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on next request to gateway
Explanation: NFS mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.
User response: Make sure the exported path can be mounted over NFSv3.

6027-3221 AFM: Home NFS mount of host:path succeeded for file system fileSystem fileset filesetName. Caching is enabled.
Explanation: NFS mounting of the path from the home cluster succeeded. Caching is enabled.
User response: None.

6027-3224 AFM: Failed to set extended attributes on file system fileSystem inode inodeNum error err, ignoring.

6027-3225 AFM: Failed to get extended attributes for file system fileSystem inode inodeNum, ignoring.

6027-3226 AFM: Cannot find AFM control file for file system fileSystem fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmhomeconfig command with the enable option on the export path at home.

6027-3227 AFM ERROR: Cannot enable AFM for file system fileSystem fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3228 AFM ERROR: Unable to unmount NFS export for file system fileSystem fileset filesetName

6027-3229 AFM: File system fileSystem fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.
Explanation: The cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize the cache with the home.
User response: None.

6027-3230 AFM: Cannot find snapshot link directory name for exported file system at home for file system fileSystem fileset filesetName. Snapshot directory at home will be cached.
Explanation: Unable to determine the snapshot directory at the home cluster.
User response: None.

6027-3232 AFM type: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] name sourceName origin origin error err

6027-3233
Explanation: Multiple AFM operations have failed.
User response: None.

6027-3234 AFM: Unable to start thread to unexpire filesets.
Explanation: Failed to start thread for unexpiration of fileset.
User response: None.
6027-3235 AFM: Stopping recovery for the file system fileSystem fileset filesetName
Explanation: AFM recovery terminated because the current node is no longer MDS for the fileset.
User response: None.

6027-3236 Recovery on file system fileSystem fileset filesetName failed with error err. Recovery will be retried after recovery retry interval (timeout seconds) or manually resolve known problems and recover the fileset.
Explanation: AFM recovery failed. The fileset will be temporarily put into dropped state and will be recovered on accessing the fileset after the timeout mentioned in the error message. The user can recover the fileset manually by running the mmafmctl command with the recover option after rectifying any known errors leading to the failure.

Explanation: Recovery failed to complete on the fileset. The next access will restart recovery.

Explanation: Retrying AFM recovery failed. The fileset will be moved to dropped state. The user should recover the fileset by running the mmafmctl command.
User response: Recover the fileset manually to continue.

6027-3238 AFM: Home is taking longer than expected to respond. Synchronization with home is temporarily suspended.
Explanation: A pending message from the gateway node to home is taking longer than expected to respond. This could be the result of a network issue or a problem at the home site.

Explanation: A failure occurred when creating or deleting a peer snapshot.
User response: Examine the error details and retry the operation.

6027-3240 AFM Error: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] error err
Explanation: Operation failed to execute on home in independent-writer mode.
User response: None.

6027-3300 Attribute afmShowHomeSnapshot cannot be changed for a single-writer fileset.
Explanation: Changing afmShowHomeSnapshot is not supported for single-writer filesets.
User response: None.

6027-3301 Unable to quiesce all nodes; some processes are busy or holding required resources.
Explanation: A timeout occurred on one or more nodes while trying to quiesce the file system during a snapshot command.
User response: Check the GPFS log on the file system manager node.

6027-3302 Attribute afmShowHomeSnapshot cannot be changed for a afmMode fileset.
Explanation: Changing afmShowHomeSnapshot is not supported for single-writer or independent-writer filesets.
User response: None.

6027-3303
Explanation: File system quota management is still active. The file system must be unmounted when restoring global snapshots.
User response: Unmount the file system and reissue the restore command (see the example after 6027-3400).

6027-3400 Attention: The file system is at risk. The specified replication factor does not tolerate unavailable metadata disks.
Explanation: The default metadata replication was reduced to one while there were unavailable, or stopped, metadata disks. This condition prevents future file system manager takeover.
User response: Change the default metadata replication, or delete unavailable disks if possible.
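The response for message 6027-3303 requires the file system to be unmounted before the restore is reissued. A minimal sketch, with fs1 and snap1 as placeholder names, is:
  mmumount fs1 -a
  mmrestorefs fs1 snap1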
6027-3401 Failure group value for disk diskName is not valid.
Explanation: An explicit failure group must be specified for each disk that belongs to a write affinity enabled storage pool.
User response: Specify a valid failure group.

6027-3402 An unexpected device mapper path dmDevice (nsdId) was detected. The new path does not have Persistent Reserve enabled. The local access to disk diskName will be marked as down.
Explanation: A new device mapper path was detected, or a previously failed path was activated after the local device discovery was finished. This path lacks a Persistent Reserve and cannot be used. All device paths must be active at mount time.
User response: Check the paths to all disks in the file system. Repair any failed paths to disks then rediscover the local disk access.
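The response for message 6027-3402 ends by asking that local disk access be rediscovered once the failed paths are repaired. A hedged sketch of one way to do this on the affected node is:
  mmnsddiscover -a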
Accessibility features
The following list includes the major accessibility features in GPFS:
v Keyboard-only operation
v Interfaces that are commonly used by screen readers
v Keys that are discernible by touch but do not activate just by touching them
v Industry-standard devices for ports and connectors
v The attachment of alternative input and output devices
The IBM Cluster Information Center, and its related publications, are accessibility-enabled. The accessibility features of the information center are described in the Accessibility topic at the following URL: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.addinfo.doc/access.html.
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21, Nihonbashi-Hakozakicho, Chuo-ku Tokyo 103-8510, Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Corporation Mail Station P300 2455 South Road, Poughkeepsie, NY 12601-5400 USA Such information may be available, subject to appropriate terms and conditions, including in some cases, payment or a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs. If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries.
Glossary
This glossary defines technical terms and abbreviations used in GPFS documentation. If you do not find the term you are looking for, refer to the index of the appropriate book or view the IBM Glossary of Computing Terms, located on the Internet at: http://www-306.ibm.com/software/globalization/terminology/index.jsp.
B
block utilization
The measurement of the percentage of used subblocks per allocated blocks.
C
cluster
A loosely-coupled collection of independent systems (nodes) organized into a network for the purpose of sharing resources and communicating with each other. See also GPFS cluster.

cluster configuration data
The configuration data that is stored on the cluster configuration servers.

cluster manager
The node that monitors node status using disk leases, detects failures, drives recovery, and selects file system managers. The cluster manager is the node with the lowest node number among the quorum nodes that are operating at a particular time.

control data structures
Data structures needed to manage file data and metadata cached in memory. Control data structures include hash tables and link pointers for finding cached data; lock states and tokens to implement distributed locking; and various flags and sequence numbers to keep track of updates to the cached data.
D

Data Management Application Program Interface (DMAPI)
The interface defined by the Open Group's XDSM standard as described in the publication System Management: Data Storage Management (XDSM) API Common Application Environment (CAE) Specification C429, The Open Group ISBN 1-85912-190-X.

deadman switch timer
A kernel timer that works on a node that has lost its disk lease and has outstanding I/O requests. This timer ensures that the node cannot complete the outstanding I/O requests (which would risk causing file system corruption), by causing a panic in the kernel.

dependent fileset
A fileset that shares the inode space of an existing independent fileset.

disk descriptor
A definition of the type of data that the disk contains and the failure group to which this disk belongs. See also failure group.

disk leasing
A method for controlling access to storage devices from multiple host systems. Any host that wants to access a storage device configured to use disk leasing registers for a lease; in the event of a perceived failure, a host system can deny access, preventing I/O operations with the storage device until the preempted system has reregistered.

disposition
The session to which a data management event is delivered. An individual disposition is set for each type of event from each file system.

domain
A logical grouping of resources in a network for the purpose of common management and administration.

F

failback
Cluster recovery from failover following repair. See also failover.

failover
(1) The assumption of file system duties by another node when a node fails. (2) The process of transferring all control of the ESS to a single cluster in the ESS when the other clusters in the ESS fail. See also cluster. (3) The routing of all transactions to a second controller when the first controller fails. See also cluster.

failure group
A collection of disks that share common access paths or adapter connection, and could all become unavailable through a single hardware failure.

fileset
A hierarchical grouping of files managed as a unit for balancing workload across a cluster. See also dependent fileset and independent fileset.

fileset snapshot
A snapshot of an independent fileset plus all dependent filesets.

file clone
A writable snapshot of an individual file.

file-management policy
A set of rules defined in a policy file that GPFS uses to manage file migration and file deletion. See also policy.

file-placement policy
A set of rules defined in a policy file that GPFS uses to manage the initial placement of a newly created file. See also policy.

file system descriptor
A data structure containing key information about a file system. This information includes the disks assigned to the file system (stripe group), the current state of the file system, and pointers to key files such as quota files and log files.

file system descriptor quorum
The number of disks needed in order to write the file system descriptor correctly.

file system manager
The provider of services for all the nodes using a single file system. A file system manager processes changes to the state or description of the file system, controls the regions of disks that are allocated to each node, and controls token management and quota management.

fragment
The space allocated for an amount of data too small to require a full block. A fragment consists of one or more subblocks.
G
global snapshot
A snapshot of an entire GPFS file system.

GPFS cluster
A cluster of nodes defined as being available for use by GPFS file systems.

GPFS portability layer
The interface module that each installation must build for its specific hardware platform and Linux distribution.

GPFS recovery log
A file that contains a record of metadata activity, and exists for each node of a cluster. In the event of a node failure, the recovery log for the failed node is replayed, restoring the file system to a consistent state and allowing other nodes to continue working.
I
ill-placed file
A file assigned to one storage pool, but having some or all of its data in a different storage pool.

ill-replicated file
A file with contents that are not correctly replicated according to the desired setting for that file. This situation occurs in the interval between a change in the file's replication settings or suspending one of its disks, and the restripe of the file.

independent fileset
A fileset that has its own inode space.

indirect block
A block containing pointers to other blocks.

inode
The internal structure that describes the individual files in the file system. There is one inode for each file.

inode space
A collection of inode number ranges reserved for an independent fileset, which enables more efficient per-fileset functions.
J
journaled file system (JFS)
A technology designed for high-throughput server environments, which are important for running intranet and other high-performance e-business file servers.

junction
A special directory entry that connects a name in a directory of one fileset to the root directory of another fileset.
K

kernel
The part of an operating system that contains programs for such tasks as input/output, management and control of hardware, and the scheduling of user tasks.

M

metadata
Data structures that contain access information about file data. These include: inodes, indirect blocks, and directories. These data structures are not accessible to user applications.

metanode
The one node per open file that is responsible for maintaining file metadata integrity. In most cases, the node that has had the file open for the longest period of continuous time is the metanode.

mirroring
The process of writing the same data to multiple disks at the same time. The mirroring of data protects it against data loss within the database or within the recovery log.

multi-tailed
A disk connected to multiple nodes.

N

namespace
Space reserved by a file system to contain the names of its objects.

Network File System (NFS)
A protocol, developed by Sun Microsystems, Incorporated, that allows any host in a network to gain access to another host or netgroup and their file directories.

Network Shared Disk (NSD)
A component for cluster-wide disk naming and access.

NSD volume ID
A unique 16-digit hex number that is used to identify and access all NSDs.

node
An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it may contain one or more nodes.

node descriptor
A definition that indicates how GPFS uses a node. Possible functions include: manager node, client node, quorum node, and nonquorum node.

node number
A number that is generated and maintained by GPFS as the cluster is created, and as nodes are added to or deleted from the cluster.

node quorum
The minimum number of nodes that must be running in order for the daemon to start.

node quorum with tiebreaker disks
A form of quorum that allows GPFS to run with as little as one quorum node available, as long as there is access to a majority of the quorum disks.

non-quorum node
A node in a cluster that is not counted for the purposes of quorum determination.

P

policy
A list of file-placement and service-class rules that define characteristics and placement of files. Several policies can be defined within the configuration, but only one policy set is active at one time.

policy rule
A programming statement within a policy that defines a specific action to be performed.

pool
A group of resources with similar characteristics and attributes.

portability
The ability of a programming language to compile successfully on different operating systems without requiring changes to the source code.

primary GPFS cluster configuration server
In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data.

private IP address
An IP address used to communicate on a private network.

public IP address
An IP address used to communicate on a public network.
Q

quorum node
A node in the cluster that is counted to determine whether a quorum exists.

quota
The amount of disk space and number of inodes assigned as upper limits for a specified user, group of users, or fileset.

quota management
The allocation of disk blocks to the other nodes writing to the file system, and comparison of the allocated space to quota limits at regular intervals.

R

Redundant Array of Independent Disks (RAID)
A collection of two or more disk physical drives that present to the host an image of one or more logical disk drives. In the event of a single physical device failure, the data can be read or regenerated from the other disk drives in the array due to data redundancy.

recovery
The process of restoring access to file system data when a failure has occurred. Recovery can involve reconstructing data or providing alternative routing through a different server.

replication
The process of maintaining a defined set of data in more than one location. Replication involves copying designated changes for one location (a source) to another (a target), and synchronizing the data in both locations.

rule
A list of conditions and actions that are triggered when certain conditions are met. Conditions include attributes about an object (file name, type or extension, dates, owner, and groups), the requesting client, and the container name associated with the object.

S

SAN-attached
Disks that are physically attached to all nodes in the cluster using Serial Storage Architecture (SSA) connections or using fibre channel switches.

Scale Out Backup and Restore (SOBAR)
A specialized mechanism for data protection against disaster only for GPFS file systems that are managed by Tivoli Storage Manager (TSM) Hierarchical Storage Management (HSM).

secondary GPFS cluster configuration server
In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data in the event that the primary GPFS cluster configuration server fails or becomes unavailable.

Secure Hash Algorithm digest (SHA digest)
A character string used to identify a GPFS security key.

session failure
The loss of all resources of a data management session due to the failure of the daemon on the session node.

session node
The node on which a data management session was created.

Small Computer System Interface (SCSI)
An ANSI-standard electronic interface that allows personal computers to communicate with peripheral hardware, such as disk drives, tape drives, CD-ROM drives, printers, and scanners, faster and more flexibly than previous interfaces.

snapshot
An exact copy of changed data in the active files and directories of a file system or fileset at a single point in time. See also fileset snapshot and global snapshot.

source node
The node on which a data management event is generated.

stand-alone client
The node in a one-node cluster.

storage area network (SAN)
A dedicated storage network tailored to a specific environment, combining servers, storage products, networking products, software, and services.

storage pool
A grouping of storage space consisting of volumes, logical unit numbers (LUNs), or addresses that share a common set of administrative characteristics.

stripe group
The set of disks comprising the storage assigned to a file system.

striping
A storage process in which information is split into blocks (a fixed amount of data) and the blocks are written to (or read from) a series of disks in parallel.

subblock
The smallest unit of data accessible in an I/O operation, equal to one thirty-second of a data block.

system storage pool
A storage pool containing file system control structures, reserved files, directories, symbolic links, and special devices, as well as the metadata associated with regular files, including indirect blocks and extended attributes. The system storage pool can also contain user data.

T

token management
A system for controlling file access in which each application performing a read or write operation is granted some form of access to a specific block of file data. Token management provides data consistency and controls conflicts. Token management has two components: the token management server, and the token management function.

token management function
A component of token management that requests tokens from the token management server. The token management function is located on each cluster node.

token management server
A component of token management that controls tokens relating to the operation of the file system. The token management server is located at the file system manager node.

twin-tailed
A disk connected to two nodes.

U

user storage pool
A storage pool containing the blocks of data that make up user files.

V

virtual file system (VFS)
A remote file system that has been mounted so that it is accessible to the local user.

virtual node (vnode)
The structure that contains information about a file system object in a virtual file system (VFS).
Index

A
access to disk 95 ACCESS_TIME attribute 29, 30 accessibility features for the GPFS product 247 active file management in disconnected mode 110 active file management, questions related to 109 active file management, resync in 109 administration commands failure 42 AFM fileset, changing mode of 110 AFM in disconnected mode 110 AFM, extended attribute size supported by 110 AFM, resync in 109 AIX kernel debugger 37 AIX error logs MMFS_DISKFAIL 95 MMFS_QUOTA 72 unavailable disks 72 application programs errors 3, 5, 48, 57 authorization error 41 autofs 65 autofs mount 64 autoload option on mmchconfig command 45 on mmcrcluster command 45 automount 63, 69 automount daemon 64 automount failure 63, 65
C
candidate file 25, 28 attributes 29 changing mode of AFM fileset 110 checking, Persistent Reserve 103 chosen file 25, 27 CIFS serving, Windows SMB2 protocol 58 cipherList 68
commands (continued) mmlsmgr 11, 62 mmlsmount 24, 45, 57, 61, 70, 71, 91 mmlsnsd 31, 92, 93, 100 mmlspolicy 78 mmlsquota 57 mmlssnapshot 82, 83, 84 mmmount 23, 61, 72, 102 mmpmon 37, 85 mmquotaoff 57 mmquotaon 57 mmrefresh 19, 62, 64 mmremotecluster 35, 67, 68 mmremotefs 64, 67 mmrepquota 57 mmrestorefs 83, 84, 85 mmrestripefile 78, 81 mmrestripefs 81, 97, 100 mmrpldisk 75, 80, 102 mmsdrrestore 20 mmshutdown 18, 20, 45, 46, 48, 64, 65, 73 mmsnapdir 82, 84, 85 mmstartup 45, 64, 65 mmumount 70, 72 mmunlinkfileset 79 mmwindisk 32 mount 24, 61, 62, 63, 84, 98, 102 ping 41 rcp 41 rpm 113 rsh 41, 55 umount 71, 72, 100 varyonvg 102 commands, administration failure 42 communication paths unavailable 62 compiling mmfslinux module 44 configuration hard loop ID 40 performance tuning 41 configuration data 75 configuration parameters kernel 44 configuration problems 39 configuration variable settings displaying 19 connectivity problems 41 console logs mmfs.log 1 contact node address 67 contact node failure 67 contacting IBM 115 creating a master GPFS log file 2 cron 108
Data Management API (DMAPI) file system will not mount 63 data replication 96 data structure 3 dataOnly attribute 80 dataStructureDump 11 dead man switch timer 49, 50 deadlocks 50, 51 delays 50, 51 DELETE rule 25, 28 deleting a node from a cluster 56 descOnly 73 directories /tmp/mmfs 108, 113 .snapshots 82, 84, 85 directory that has not been cached, traversing 110 disabling IPv6 for SSH connection delays 59 disabling Persistent Reserve manually 104 disaster recovery other problems 55 problems 54 setup problems 54 disconnected mode, AFM 110 disk access 95 disk commands hang 102 disk descriptor replica 72 disk failover 99 disk leasing 50 disk subsystem failure 91 disks damaged files 33 declared down 94 define for GPFS use 100 displaying information of 31 failure 3, 5, 91 media failure 96 partial failure 100 replacing 75 usage 72 disks down 100 disks, viewing 32 displaying disk information 31 displaying NSD information 92 DMAPI coexistence 82 DNS server failure 67
E
enabling Persistent Reserve manually 104 ERRNO I/O error code 55 error codes EIO 3, 91, 98 ENODEV 48 ENOSPC 98 ERRNO I/O 55 ESTALE 5, 48 NO SUCH DIRECTORY 48 NO SUCH FILE 48 error logs 1 example 5 MMFS_ABNORMAL_SHUTDOWN 3 MMFS_DISKFAIL 3
D
data replicated 97 data always gathered by gpfs.snap for a master snapshot 9 on AIX 9 on all platforms 8 on Linux 9 on Windows 9 data integrity 5, 88 8
error logs (continued) MMFS_ENVIRON 3 MMFS_FSSTRUCT 3 MMFS_GENERIC 3 MMFS_LONGDISKIO 4 MMFS_QUOTA 4, 31 MMFS_SYSTEM_UNMOUNT 5 MMFS_SYSTEM_WARNING 5 error messages 0516-1339 94 0516-1397 94 0516-862 94 6027-1209 48 6027-1242 42 6027-1290 75 6027-1324 58 6027-1598 53 6027-1615 42 6027-1617 42 6027-1627 57 6027-1628 42 6027-1630 42 6027-1631 42 6027-1632 42 6027-1633 42 6027-1636 92 6027-1661 92 6027-1662 94 6027-1995 82 6027-1996 74 6027-2101 82 6027-2108 92 6027-2109 92 6027-300 44 6027-306 47 6027-319 46, 47 6027-320 47 6027-321 47 6027-322 47 6027-341 44, 47 6027-342 44, 47 6027-343 44, 47 6027-344 44, 47 6027-361 99 6027-418 73, 99 6027-419 63, 73 6027-435 54 6027-473 73 6027-474 73 6027-482 63, 99 6027-485 99 6027-490 54 6027-506 58 6027-533 51 6027-538 56 6027-549 63 6027-580 63 6027-631 74 6027-632 74 6027-635 74 6027-636 74, 99 6027-638 74 6027-645 63 6027-650 48 6027-663 56 6027-665 44, 57 6027-695 58
error messages (continued) 6027-992 82 ANS1312E 82 descriptor replica 54 failed to connect 44, 99 GPFS cluster data recovery 42 incompatible version number 46 mmbackup 82 mmfsd ready 44 network problems 47 quorum 54 rsh problems 41 shared segment problems 46, 47 snapshot 82, 83, 84 TSM 82 error number configuration 43 EALL_UNAVAIL = 218 73 ECONFIG = 208 43 ECONFIG = 215 43, 47 ECONFIG = 218 43 ECONFIG = 237 43 EINVAL 77 ENO_MGR = 212 75, 99 ENO_QUOTA_INST = 237 63 ENOENT 71 ENOSPC 75 EOFFLINE = 208 99 EPANIC = 666 73 ESTALE 71 EVALIDATE = 214 88 file system forced unmount 73 GPFS application 99 GPFS daemon will not come up 47 installation 43 multiple file system manager failures 75 errors, Persistent Reserve 102 errpt command 113 EXCLUDE rule 29 excluded file 29 attributes 29 extended attribute size supported by AFM 110
F
facility Linux kernel crash dump (LKCD) 37 failure disk 94 hardware 115 mmfsck command 109 non-IBM hardware 115 of disk media 96 snapshot 82 failure group 72 failure groups loss of 72 use of 72 failures mmbackup 81 file migration problems 78 File Placement Optimizer (FPO), questions related to 111 file placement policy 78 file system descriptor 72, 73 failure groups 72 inaccessible 72
file system manager cannot appoint 71 contact problems communication paths unavailable 62 multiple failures 74 file system or fileset getting full 110 file systems cannot be unmounted 24 creation failure 56 determining if mounted 73 discrepancy between configuration data and on-disk data 75 forced unmount 5, 71, 74 free space shortage 84 listing mounted 24 loss of access 57 not consistent 84 remote 65 unable to determine if mounted 73 will not mount 23, 24, 61 will not unmount 70 FILE_SIZE attribute 29, 30 files /etc/filesystems 62 /etc/fstab 62 /etc/group 4 /etc/hosts 40 /etc/passwd 4 /etc/resolv.conf 59 /usr/lpp/mmfs/bin/runmmfs 11 /usr/lpp/mmfs/samples/gatherlogs.sample.sh 2 /var/adm/ras/mmfs.log.latest 1 /var/adm/ras/mmfs.log.previous 1, 55 /var/mmfs/etc/mmlock 42 /var/mmfs/gen/mmsdrfs 42, 43 .rhosts 41 detecting damage 33 mmfs.log 1, 2, 44, 46, 48, 61, 64, 65, 66, 67, 68, 69, 71, 113 mmsdrbackup 43 mmsdrfs 43 FILESET_NAME attribute 29, 30 filesets child 79 deleting 79 dropped 110 emptying 79 errors 79 lost+found 80 moving contents 79 performance 79 problems 75 snapshots 79 unlinking 79 usage errors 79 FSDesc structure 72 full file system or fileset 110
GPFS cluster problems adding nodes 53 recovery from loss of GPFS cluster configuration data files 42 GPFS cluster data backup 43 locked 42 GPFS cluster data files storage 42 GPFS command failed 55 return code 55 unsuccessful 55 GPFS configuration data 75 GPFS daemon 40, 44, 61, 70 crash 47 fails to start 45 went down 4, 47 will not start 44 GPFS daemon went down 48 GPFS is not using the underlying multipath device 105 GPFS kernel extension 44 GPFS local node failure 68 GPFS log 1, 2, 44, 46, 48, 61, 64, 65, 66, 67, 68, 69, 71, 113 GPFS messages 119 GPFS modules cannot be loaded 44 GPFS problems 39, 61, 91 GPFS startup time 2 GPFS trace facility 11 GPFS Windows SMB2 protocol (CIFS serving) 58 gpfs.snap command 6, 113 data always gathered for a master snapshot 9 data always gathered on AIX 9 data always gathered on all platforms 8 data always gathered on Linux 9 data always gathered on Windows 9 using 7 grep command 2 Group Services verifying quorum 46 GROUP_ID attribute 29, 30
H
hard loop ID 40 hints and tips for GPFS problems 107 Home and .ssh directory ownership and permissions 58
I
I/O error while in disconnected mode 110 I/O error, AFM 110 I/O hang 50 I/O operations slow 4 ill-placed files 77, 81 ILM problems 75 inode data stale 87 inode limit 5 installation problems 39
G
gathering data to solve GPFS problems 6 generating GPFS trace reports mmtracectl command 11 GPFS data integrity 88 nodes will not start 46 replication 96 unable to start 39
J
junctions deleting 79
K
KB_ALLOCATED attribute 29, 30 kdb 37 KDB kernel debugger 37 kernel module mmfslinux 44 kernel panic 49 kernel threads at time of system hang or panic 37
L
license inquiries 249 Linux kernel configuration considerations 40 crash dump facility 37 logical volume 100 location 107 Logical Volume Manager (LVM) 96 long waiters increasing the number of inodes 50 lslpp command 113 lslv command 107 lsof command 24, 70, 71 lspv command 101 lsvg command 100 lxtrace command 10 lxtrace commands 11
M
manually enabling or disabling Persistent Reserve 104 master GPFS log file 2 maxblocksize parameter 63 MAXNUMMP 82 memory shortage 3, 40 message 6027-648 108 message severity tags 117 messages 119 6027-1941 40 metadata replicated 97 MIGRATE rule 25, 28 migration file system will not mount 62 new commands will not run 56 mmadddisk 80 mmadddisk command 75, 97, 100, 102 mmaddnode command 53, 108 mmafmctl Device getstate command 17 mmapplypolicy 77, 78, 80 mmapplypolicy -L 0 26 mmapplypolicy -L 1 26 mmapplypolicy -L 2 27 mmapplypolicy -L 3 28 mmapplypolicy -L 4 29 mmapplypolicy -L 5 29 mmapplypolicy -L 6 30 mmapplypolicy command 25 mmauth 35 mmauth command 67 mmbackup command 81, 82 mmchcluster command 41 mmchconfig command 19, 45, 54, 69, 108 mmchdisk 80 mmchdisk command 62, 71, 75, 91, 94, 95, 97, 99
mmcheckquota command 4, 31, 57, 72 mmchfs command 5, 51, 56, 62, 63, 64, 72 mmchnsd command 91 mmcommon 64, 65 mmcommon recoverfs command 75 mmcommon showLocks command 42 mmcrcluster command 19, 41, 45, 53, 108 mmcrfs command 56, 91, 102 mmcrnsd command 91, 94 mmcrsnapshot command 83, 84 mmdefedquota command fails 108 mmdeldisk 80 mmdeldisk command 75, 97, 100 mmdelfileset 79 mmdelfs command 98, 99 mmdelnode command 53, 56 mmdelnsd command 94, 98 mmdelsnapshot command 83 mmdf 75 mmdf command 51, 100 mmdiag command 17 mmdsh 41 mmedquota command fails 108 mmexpelnode command 20 mmfileid command 33, 88, 97 MMFS_ABNORMAL_SHUTDOWN error logs 3 MMFS_DISKFAIL error logs 3 MMFS_ENVIRON error logs 3 MMFS_FSSTRUCT error logs 3 MMFS_GENERIC error logs 3 MMFS_LONGDISKIO error logs 4 MMFS_QUOTA error logs 4, 31 MMFS_SYSTEM_UNMOUNT error logs 5 MMFS_SYSTEM_WARNING error logs 5 mmfs.log 1, 2, 44, 46, 48, 61, 64, 65, 66, 67, 68, 69, 71, 113 mmfsadm command 10, 14, 46, 51, 52, 53, 97 mmfsck 80 mmfsck command 23, 61, 62, 84, 88, 97, 100 failure 109 mmfsd 44, 61, 70 will not start 44 mmfslinux kernel module 44 mmgetstate command 17, 46, 55 mmlock directory 42 mmlsattr 78, 79 mmlscluster command 18, 53, 67, 107 mmlsconfig command 11, 19, 64 mmlsdisk command 56, 61, 62, 71, 72, 75, 91, 94, 96, 99, 114 mmlsfileset 79 mmlsfs command 63, 97, 98, 113 mmlsmgr command 11, 62 mmlsmount command 24, 45, 57, 61, 70, 71, 91 mmlsnsd 31 mmlsnsd command 92, 93, 100 mmlspolicy 78 mmlsquota command 57 mmlssnapshot command 82, 83, 84 Index
mmmount command 23, 61, 72, 102 mmpmon abend 86 altering input file 85 concurrent usage 85 counters wrap 86 dump 86 hang 86 incorrect input 85 incorrect output 86 restrictions 85 setup problems 85 trace 86 unsupported features 86 mmpmon command 37, 85 mmquotaoff command 57 mmquotaon command 57 mmrefresh command 19, 62, 64 mmremotecluster 35 mmremotecluster command 67, 68 mmremotefs command 64, 67 mmrepquota command 57 mmrestorefs command 83, 84, 85 mmrestripefile 78, 81 mmrestripefs 81 mmrestripefs command 97, 100 mmrpldisk 80 mmrpldisk command 75, 102 mmsdrbackup 43 mmsdrfs 43 mmsdrrestore command 20 mmshutdown command 18, 20, 45, 46, 48, 64, 65 mmsnapdir command 82, 84, 85 mmstartup command 45, 64, 65 mmtracectl command generating GPFS trace reports 11 mmumount command 70, 72 mmunlinkfileset 79 mmwindisk 32 mode of AFM fileset, changing 110 MODIFICATION_TIME attribute 29, 30 module is incompatible 44 mount problems 69 mount command 24, 61, 62, 63, 84, 98, 102 Multi-Media LAN Server 1
node reinstall 40 nodes cannot be added to GPFS cluster 53 non-quorum node 107 notices 249 NSD 100 creating 94 deleting 94 displaying information of 92 extended information 93 failure 91 NSD disks creating 91 using 91 NSD server 68, 69, 70 nsdServerWaitTimeForMount changing 70 nsdServerWaitTimeWindowOnMount changing 70
O
OpenSSH connection delays Windows 59 orphaned file 80
P
partitioning information, viewing 32 patent information 249 performance 41 permission denied remote mounts fail 69 Persistent Reserve checking 103 clearing a leftover reservation 103 errors 102 manually enabling or disabling 104 understanding 102 ping command 41 PMR 115 policies DEFAULT clause 77 deleting referenced objects 78 errors 78 file placement 77 incorrect file placement 78 LIMIT clause 77 long runtime 78 MIGRATE rule 77 problems 75 rule evaluation 77 usage errors 77 verifying 25 policy file detecting errors 26 size limit 77 totals 26 policy rules runtime problems 78 POOL_NAME attribute 29, 30 possible GPFS problems 39, 61, 91 predicted pool utilization incorrect 77 primary NSD server 69
N
network failure 49 network problems 3 NFS problems 87 NFS client with stale inode data 87 NFS V4 problems 87 NO SUCH DIRECTORY error code 48 NO SUCH FILE error code 48 NO_SPACE error 75 node crash 115 hang 115 rejoin 70 node crash 40 node failure 49
problem locating a snapshot 82 not directly related to snapshot 82 snapshot 82 snapshot directory name 84 snapshot restore 84 snapshot status 83 snapshot usage 83 problem determination cluster state information 17 documentation 113 remote file system I/O fails with the "Function not implemented" error message when UID mapping is enabled 66 reporting a problem to IBM 113 tools 1, 23 tracing 11 Problem Management Record 115 problems configuration 39 installation 39 mmbackup 81 problems running as administrator, Windows 58 protocol (CIFS serving), Windows SMB2 58
S
Samba client failure 88 Secure Hash Algorithm digest 35 service reporting a problem to IBM 113 serving (CIFS), Windows SMB2 protocol 58 setuid bit, removing 48 setuid/setgid bits at AFM home, resetting of 110 severity tags messages 117 SHA digest 35, 67 shared segments 46 problems 47 SMB2 protocol (CIFS serving), Windows 58 snapshot 82 directory name conflict 84 invalid state 83 restoring 84 status error 83 usage error 83 valid 82 snapshot problems 82 storage pools deleting 78, 80 errors 81 failure groups 80 problems 75 slow access time 81 usage errors 80 strict replication 98 subnets attribute 54 syslog facility Linux 2 syslogd 65 system load 108 system snapshots 6, 7 system storage pool 77, 80
Q
quorum 46, 107 disk 50 quorum node 107 quota cannot write to quota file 72 denied 57 error number 43 quota files 31 quota problems 4
R
RAID controller 96 rcp command 41 read-only mode mount 24 recovery log 49 recreation of GPFS storage file mmchcluster -p LATEST 42 remote command problems 41 remote file copy command default 41 remote file system I/O fails with "Function not implemented" error 66 remote mounts fail with permission denied 69 remote node expelled 54 remote shell default 41 removing the setuid bit 48 replicated data 97 replicated metadata 97 replication 80 of data 96 reporting a problem to IBM 10, 113 resetting of setuid/setgits at AFM home 110 restricted mode mount 23 resync in active file management 109 rpm command 113
T
the IBM Support Center 115 threads tuning 41 waiting 51 Tivoli Storage Manager server 81 trace active file management 12 allocation manager 12 basic classes 12 behaviorals 14 byte range locks 12 call to routines in SharkMsg.h 13 checksum services 12 cleanup routines 12 cluster security 14 concise vnop description 14 daemon routine entry/exit 12 daemon specific code 14 data shipping 12 defragmentation 12
trace (continued) dentry operations 12 disk lease 12 disk space allocation 12 DMAPI 12 error logging 12 events exporter 12 file operations 13 file system 13 generic kernel vfs information 13 inode allocation 13 interprocess locking 13 kernel operations 13 kernel routine entry/exit 13 low-level vfs locking 13 mailbox message handling 13 malloc/free in shared segment 13 miscellaneous tracing and debugging 14 mmpmon 13 mnode operations 13 mutexes and condition variables 13 network shared disk 13 online multinode fsck 13 operations in Thread class 14 page allocator 13 parallel inode tracing 13 performance monitors 13 physical disk I/O 12 physical I/O 13 pinning to real memory 13 quota management 13 rdma 14 recovery log 13 SANergy 14 scsi services 14 shared segments 14 SMB locks 14 SP message handling 14 super operations 14 tasking system 14 token manager 14 ts commands 12 vdisk 14 vdisk debugger 14 vdisk hospital 14 vnode layer 14 trace classes 12 trace facility 11, 12 mmfsadm command 10 trace level 14 trace reports, generating 11 trademarks 250 traversing a directory that has not been cached troubleshooting errors 58 troubleshooting Windows errors 58 TSM client 81 TSM restore 82 TSM server 81 MAXNUMMP 82 tuning 41
V
varyon problems 101 varyonvg command 102 viewing disks and partitioning information 32 volume group 101
W
Windows 58 file system mounted on the wrong drive letter 109 Home and .ssh directory ownership and permissions 58 mounted file systems, Windows 109 OpenSSH connection delays 59 problem seeing newly mounted file systems 109 problem seeing newly mounted Windows file systems 109 problems running as administrator 58 Windows 109 Windows SMB2 protocol (CIFS serving) 58
U
umount command 71, 72, 100 underlying multipath device 105 understanding, Persistent Reserve 102 useNSDserver attribute 99
Printed in USA
GA76-0415-08