Veritas Cluster Server Release Notes
Solaris
6.0.1
August 2012
Legal Notice
Copyright 2012 Symantec Corporation. All rights reserved. Symantec, the Symantec logo, Veritas, Veritas Storage Foundation, CommandCentral, NetBackup, Enterprise Vault, and LiveUpdate are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. The product described in this document is distributed under licenses restricting its use, copying, distribution, and decompilation/reverse engineering. No part of this document may be reproduced in any form by any means without prior written authorization of Symantec Corporation and its licensors, if any. THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE. The Licensed Software and Documentation are deemed to be commercial computer software as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19 "Commercial Computer Software - Restricted Rights" and DFARS 227.7202, "Rights in Commercial Computer Software or Commercial Computer Software Documentation", as applicable, and any successor regulations. Any use, modification, reproduction, release, performance, display or disclosure of the Licensed Software and Documentation by the U.S. Government shall be solely in accordance with the terms of this Agreement.
Technical Support
Symantec Technical Support maintains support centers globally. Technical Support's primary role is to respond to specific queries about product features and functionality. The Technical Support group also creates content for our online Knowledge Base. The Technical Support group works collaboratively with the other functional areas within Symantec to answer your questions in a timely fashion. For example, the Technical Support group works with Product Engineering and Symantec Security Response to provide alerting services and virus definition updates. Symantec's support offerings include the following:
A range of support options that give you the flexibility to select the right amount of service for any size organization
Telephone and/or Web-based support that provides rapid response and up-to-the-minute information
Upgrade assurance that delivers software upgrades
Global support purchased on a regional business hours or 24 hours a day, 7 days a week basis
Premium service offerings that include Account Management Services
For information about Symantec's support offerings, you can visit our Web site at the following URL: www.symantec.com/business/support/index.jsp All support services will be delivered in accordance with your support agreement and the then-current enterprise technical support policy.
Hardware information
Available memory, disk space, and NIC information
Operating system
Version and patch level
Network topology
Router, gateway, and IP address information
Problem description:
Error messages and log files
Troubleshooting that was performed before contacting Symantec
Recent software configuration changes and network changes
Customer service
Customer service information is available at the following URL: www.symantec.com/business/support/ Customer Service is available to assist with non-technical questions, such as the following types of issues:
Questions regarding product licensing or serialization
Product registration updates, such as address or name changes
General product information (features, language availability, local dealers)
Latest information about product updates and upgrades
Information about upgrade assurance and support contracts
Information about the Symantec Buying Programs
Advice about Symantec's technical support options
Nontechnical presales questions
Issues that are related to CD-ROMs or manuals
Documentation
Product guides are available on the media in PDF format. Make sure that you are using the current version of the documentation. The document version appears on page 2 of each guide. The latest product documentation is available on the Symantec Web site. https://sort.symantec.com/documents Your feedback on product documentation is important to us. Send suggestions for improvements and reports on errors or omissions. Include the title and document version (located on the second page), and chapter and section titles of the text on which you are reporting. Send feedback to: [email protected] For information regarding the latest HOWTO articles, documentation updates, or to ask a question regarding product documentation, visit the Storage and Clustering Documentation forum on Symantec Connect. https://www-secure.symantec.com/connect/storage-management/forums/storage-and-clustering-documentation
About this document
Component product release notes
About Veritas Cluster Server
About Symantec Operations Readiness Tools
Important release information
Changes introduced in 6.0.1
VCS system requirements
No longer supported
Fixed issues
Known issues
Software limitations
Documentation
The information in the Release Notes supersedes the information provided in the product documents for VCS. This is "Document version: 6.0.1 Rev 0" of the Veritas Cluster Server Release Notes. Before you start, make sure that you are using the latest version of this guide. The latest product documentation is available on the Symantec Web site at: https://sort.symantec.com/documents
Symantec recommends copying the files to the /opt/VRTS/docs directory on your system. This release includes the following component product release notes:
Veritas Cluster Server Release Notes About Symantec Operations Readiness Tools
and agents that are available through Symantec consulting services, contact your Symantec sales representative. VCS provides a framework that allows for the creation of custom agents. Create agents in situations where the Veritas High Availability Agent Pack, the bundled agents, or the enterprise agents do not meet your needs. For more information about the creation of custom agents, refer to the Veritas Cluster Server Agent Developer's Guide. You can also request a custom agent through Symantec consulting services.
If you use custom agents compiled on older compilers, the agents may not work with VCS 6.0.1. If your custom agents use scripts, continue linking to ScriptAgent. Use Script50Agent for agents written for VCS 5.0 and above.
Manage risks
Get automatic email notifications about changes to patches, array-specific modules (ASLs/APMs/DDIs/DDLs), and high availability agents from a central repository. Identify and mitigate system and environmental risks.
Improve efficiency
Find and download patches based on product version and platform. List installed Symantec products and license keys.
Note: Certain features of SORT are not available for all products. Access to SORT is available at no extra cost. To access SORT, go to: https://sort.symantec.com
For important updates regarding this release, review the Late-Breaking News TechNote on the Symantec Technical Support website: http://www.symantec.com/docs/TECH164885 For the latest patches available for this release, go to: https://sort.symantec.com/ The hardware compatibility list contains information about supported hardware and is updated regularly. For the latest information on supported hardware visit the following URL: http://www.symantec.com/docs/TECH170013 Before installing or upgrading Storage Foundation and High Availability Solutions products, review the current compatibility list to confirm the compatibility of your hardware and software.
Locally-installed installation and uninstallation scripts now include the release version
When you run local scripts (/opt/VRTS/install) to configure Veritas products, the names of the installed scripts now include the release version. Note: If you install your Veritas product from the install media, continue to run the installvcs command without including the release version. To run the script from the installed binaries, run the installvcs<version> command, where <version> is the current release version with no periods or spaces. For example, to configure the 6.0.1 version of your product, run this command:
# /opt/VRTS/install/installvcs601 -configure
which can be downloaded from the Oracle Web site) to install the Oracle Solaris OS on a single SPARC or x86 platform. All cases require access to a package repository on the network to complete the installation.
General checks for all products. Checks for Volume Manager (VM). Checks for File System (FS). Checks for Cluster File System (CFS).
IPMPDevice: Stores the IPMP interface name. To configure MultiNICB resource in IPMP mode on Solaris 11, set the value of this attribute to the valid name of IPMP interface created for interfaces under MultiNICB control. At the same time, make sure that UseMpathd attribute of MultiNICB is set to 1. This attribute is applicable only to Oracle Solaris 11.
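For example, a MultiNICB resource configured in IPMP mode on Solaris 11 might look like the following sketch; the interface names net0 and net1 and the IPMP group name ipmp0 are placeholders and not values taken from this release:

MultiNICB mnic (
    Device = { net0 = 0, net1 = 1 }
    UseMpathd = 1
    IPMPDevice = ipmp0
)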
DeleteRouteOptions: String to delete a route when un-configuring an interface. When RouteOptions and DeleteRouteOptions attributes are configured, RouteOptions attribute is used to add route and DeleteRouteOptions attribute is used to delete route. When RouteOptions attribute is not configured, DeleteRouteOptions attribute is ignored.
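An illustrative snippet follows; the IP resource type, the interface name, and the route strings are assumptions made for this sketch only:

IP webip (
    Device = net0
    Address = "192.0.2.10"
    NetMask = "255.255.255.0"
    RouteOptions = "default 192.0.2.1"
    DeleteRouteOptions = "default 192.0.2.1"
)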
ResyncVMCfg: The ResyncVMCfg attribute is set by the havmconfigsync utility. If this attribute is set, the agent redefines the virtual machine configuration if it already exists, using the CfgFile attribute.
FipsMode: Indicates whether FIPS mode is enabled for the cluster. The value depends on the mode of the broker on the system.
For Oracle Solaris 11, VCS supports zone root creation only on a ZFS file system.
The utility saves the backup of the original configuration file before updating the new configuration. On the other nodes in the cluster, during failover or switch, the online operation redefines the LDom configuration by removing the existing configuration and redefining the VM using the new configuration saved on the shared storage. Note: The havmconfigsync utility is not supported on Solaris x86.
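For example, on the node where the logical domain is online, the utility is invoked with the LDom name; the exact invocation shown below is an assumption, refer to the Veritas Cluster Server Administrator's Guide for the supported syntax:

# havmconfigsync <ldom_name>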
pkg:/compatibility/ucb
The StartProgram attribute can now be used to provide ProPCV functionality. The StartProgram attribute is added to the IMFRegList of the Application agent. See the Bundled Agents Reference Guide and the Veritas Cluster Server Administrator's Guide for more information.
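For example, you can confirm that StartProgram is part of the Application agent's IMFRegList and then enable ProPCV on the relevant service group; the group name sg1 is a placeholder:

# hatype -display Application -attribute IMFRegList
# hagrp -modify sg1 ProPCV 1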
IMF can group events of different types under the same VCS resource and is the central notification provider for kernel space events and user space events. More agents can become IMF-aware by leveraging the notifications that are available only from user space. Agents can get notifications from IMF without having to interact with USNPs.
For more information, refer to the Veritas Cluster Server Administrator's Guide.
DiskGroup agent
online global soft
online global firm
online remote soft
online remote firm
FIPS compliance is a configurable option available with VCS 6.0.1. When existing VCS deployments are upgraded from VCS 6.0 or earlier versions to 6.0.1, FIPS compliance is not automatically enabled. FIPS mode can be enabled only on a newly configured cluster on which no security condition has been set. To configure FIPS mode on a cluster that is already secured, refer to the steps under Enabling and disabling secure mode for the cluster in the Veritas Cluster Server Administrator's Guide. VCS 6.0.1 does not support FIPS in GCO or CP server-based clusters.
Changes to LLT
This release includes the following change to LLT:
Operating systems    Chipsets
Solaris 10           SPARC
Solaris 10           x86
Solaris 11           SPARC
Solaris 11           x86
Oracle Solaris 10
Note: VCS supports the previous and the next versions of Storage Foundation to facilitate product upgrades.
Chipsets
SPARC x64 SPARC x64
Supported software for the VCS agents for enterprise applications Application
DB2 Enterprise Server Edition
Oracle
Agent
Oracle
Sybase
12.5.x, 15.x
See the Veritas Cluster Server Installation Guide for the agent for more details. For a list of the VCS application agents and the software that the agents support, see the Veritas Cluster Server Agents Support Matrix at Symantec website.
No longer supported
The following features are not supported in this release of VCS products:
The configure_cps.pl script used to configure CP server is now deprecated and is no longer supported.
AlternateIO is not qualified on the Solaris 11 platform.
VCS 6.0.1 does not support NFS mount with UFS file system hosted on the NFS server.
Deprecated attributes
Deprecated DiskGroup agent attribute:
DiskGroupType
Fixed issues
This section covers the incidents that are fixed in this release.
2554167
Setting peerinact value to 0 in the /etc/llttab file floods the system log file with a large number of log messages.
Vxfenswap fails when LANG is set to a value other than 'C'. The vxfenswap utility internally uses the tr command. If the LANG environment variable is set to something other than C, it may cause improper functioning of the vxfenswap utility.
On Solaris 11, vxfen startup scripts do not report the correct status of fencing. Fencing may start up with an error message "open failed for device: /dev/vxfen" in the log. It happens when the fencing startup script tries to access the driver that is still loading into memory. However, fencing comes up seamlessly in spite of the error message.
Logs report errors related to the mv command when the vxfen service is disabled.
Post-install script of the VRTSllt package reports an error while attempting to disable the SMF service system/llt.
Post-install script of the VRTSvxfen package reports an error while attempting to disable the SMF service system/vxfen.
2699308
2726341
2850926
2699291
2762660
2762660
2850904
2730451
2850923
2639181
2728802
2703707
2509227
2680428
2730979
2850905
2850916
2822920
2846389
2832754
2741299
2850906
2684818
2696056
Memory leak occurs in the engine when the haclus -status <cluster> command is run.
When a failover group is probed, the VCS engine clears the MigrateQ and TargetCount.
The syslog call used in the gab_heartbeat_alarm_handler and gabsim_heartbeat_alarm_handler functions is not async signal safe.
2746802
2746816
2850920
2660011
2680435
2662766
2653668 2644483
Known issues
This section covers the known issues in this release.
Find and delete the keyless licenses left over in the system. To do this, perform the following steps for every key stored in /etc/vx/licenses/lic:
Verify if the key has VXKEYLESS feature Enabled using the following command:
# vxlicrep -k <license_key> | grep VXKEYLESS
Delete the key if and only if VXKEYLESS feature is Enabled. Note: When performing the search, do not include the .vxlic extension as part of the search string.
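For example, if the report for a key shows the VXKEYLESS feature as Enabled, remove the corresponding key file from the licenses directory; the key name is a placeholder, and note that the file on disk carries the .vxlic extension:

# rm /etc/vx/licenses/lic/<license_key>.vxlic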
The issue may be observed on any one or all the nodes in the sub-cluster. Workaround: After the upgrade or uninstallation completes, follow the instructions provided by the installer to resolve the issue.
Workaround: In future releases, the resstatechange trigger will not be invoked when a resource is restarted. Instead, the resrestart trigger will be invoked if you set the TriggerResRestart attribute. The resrestart trigger is available in the current release. Refer to the VCS documentation for details.
Installing VRTSvlic package on Solaris system with local zones displays error messages [2555312]
If you try to install VRTSvlic package on a Solaris system with local zones in installed state, the system displays the following error messages:
cp: cannot create /a/sbin/vxlicinst: Read-only file system cp: cannot create /a/sbin/vxlicrep: Read-only file system cp: cannot create /a/sbin/vxlictest: Read-only file system
Workaround: On the Solaris system, make sure that all non-global zones are started and in the running state before you install the VRTSvlic package.
On Solaris 10, a flash archive installed through JumpStart may cause a new system to go into maintenance mode on reboot (2379123)
If a Flash archive is created on a golden host with encapsulated root disks, when this Flash archive is installed onto another host through JumpStart, the new system may go to maintenance mode when you initially reboot it. This problem is caused by the predefined root disk mirror in the Flash archive. When the archive is applied to a clone system, which may have different hard drives, the newly cloned system may get stuck at root disk mirroring during reboot.
Workaround: Create the Flash archive on a golden host with no encapsulated root disks. Run vxunroot to clean up the mirrored root disks before you create the Flash archive.
Web installer does not ask for authentication after the first session if the browser is still open (2509330)
If you install or configure VCS and then close the Web installer while other browser windows are open, the Web installer does not ask for authentication in the subsequent sessions. Since there is no option to log out of the Web installer, the session remains open as long as the browser is open on the system. Workaround: Make sure that all browser windows are closed to end the browser session and subsequently log in again.
VCS 5.1SP1RP1 or later VCS releases with DeleteVCSZoneUser attribute of Zone agent set to 1 VCS 5.1SP1 or earlier VCS releases
You may see the following issue. Zone agent offline/clean entry points delete VCS Zone users from configuration. After upgrade to VCS 6.0, VCS Zone users need to be added to the configuration. VCS Zone users can be added by running the hazonesetup utility with the new syntax after the upgrade. See the Veritas Storage Foundation and High Availability Solutions Virtualization Guide for Solaris for more information on the hazonesetup utility.
Stopping the Web installer causes Device Busy error messages (2633924)
If you start the Web installer, and then perform an operation (such as prechecking, configuring, or uninstalling), you may get an error message saying the device is busy. Workaround: Do one of the following:
Kill the start.pl process. Start the Web installer again. On the first Web page you see that the session is still active. Either take over this session and finish it or terminate it directly.
Cluster goes into STALE_ADMIN_WAIT state during upgrade from VCS 5.1 to 6.0.1 [2850921]
While performing a manual upgrade from VCS 5.1 to VCS 6.0.1, the cluster goes into the STALE_ADMIN_WAIT state if there is an entry of Db2udbTypes.cf in main.cf. Installation of the VRTSvcsea package in VCS 5.1 creates a symbolic link for the Db2udbTypes.cf file inside the /etc/VRTSvcs/conf/config directory which points to /etc/VRTSagents/ha/conf/Db2udb/Db2udbTypes.cf. During the manual upgrade, the VRTSvcsea package for VCS 5.1 gets removed, which in turn removes the symbolic link for the file Db2udbTypes.cf inside the /etc/VRTSvcs/conf/config directory. After the complete installation of VRTSvcsea for VCS 6.0.1, because of the absence of the file Db2udbTypes.cf inside /etc/VRTSvcs/conf/config, the cluster goes into the STALE_ADMIN_WAIT state. Workaround: Manually copy Db2udbTypes.cf from the /etc/VRTSagents/ha/conf/Db2udb directory to the /etc/VRTSvcs/conf/config directory after the manual upgrade, before starting HAD.
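The copy amounts to the following command, run after the manual upgrade and before starting HAD:

# cp /etc/VRTSagents/ha/conf/Db2udb/Db2udbTypes.cf /etc/VRTSvcs/conf/config/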
VCS installation with CPI fails when a non-global zone is in installed state and zone root is not mounted on the node (2731178)
On Solaris 10, CPI tries to boot a zone in installed state during installation or uninstallation. The boot fails if the underlying storage for the zone root is not imported and mounted onto the node, causing the installation or uninstallation to fail. Workaround: Make sure that the non-global zones are in the running or configured state when CPI is invoked for installation or uninstallation.
If you set up Disaster Recovery using the Global Cluster Option (GCO), the status of the remote cluster (cluster at the secondary site) shows as "initing". If you configure fencing to use CP server, fencing client fails to register with the CP server. Setting up trust relationships between servers fails.
Workaround:
Ensure that the required ports and services are not blocked by the firewall. Refer to the Veritas Cluster Server Installation Guide for the list of ports and services used by VCS. Configure the firewall policy such that the TCP ports required by VCS are not blocked. Refer to your respective firewall or OS vendor documents for the required configuration.
Stale legacy_run services seen when VCS is upgraded to support SMF [2431741]
If you have VCS 5.0MPx installed on a Solaris 10 system, VCS uses RC scripts to manage starting services. If you upgrade VCS to any version that supports SMF for VCS, you see stale legacy_run services for these RC scripts in addition to the SMF services. Workaround: There are two ways to remove these legacy services:
Open svccfg console using svccfg -s smf/legacy_run and delete the legacy services. For example:
svccfg -s smf/legacy_run
svc:/smf/legacy_run> listpg *
rc2_d_S70llt framework NONPERSISTENT
rc2_d_S92gab framework NONPERSISTENT
svc:/smf/legacy_run> delpg rc2_d_S70llt
svc:/smf/legacy_run> delpg rc2_d_S92gab
svc:/smf/legacy_run> exit
The hastop -all command on VCS cluster node with AlternateIO resource and StorageSG having service groups may leave the node in LEAVING state
On a VCS cluster node with an AlternateIO resource configured and the StorageSG attribute containing service groups with Zpool, VxVM, or CVMVolDG resources, the `hastop -local` or `hastop -all` commands may leave the node in the "LEAVING" state. This issue is caused by a lack of dependency between the service group containing the LDom resource and the service groups containing storage resources exported to the logical domain in alternate I/O domain scenarios. In this scenario, VCS may attempt to stop the storage service groups before stopping the logical domain which is using the resources.
Workaround: Stop the LDom service group before issuing hastop -local or hastop -all commands.
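For example, where ldom_sg is a placeholder for the service group that contains the LDom resource:

# hagrp -offline ldom_sg -sys <node_name>
# hastop -local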
Character corruption observed when executing the uuidconfig.pl -clus -display -use_llthost command [2350517]
If password-less ssh/rsh is not set, the use of uuidconfig.pl command in non-English locale may print garbled characters instead of a non-English string representing the Password prompt. Workaround: No workaround.
Trigger does not get executed when there is more than one leading or trailing slash in the triggerpath [2368061]
The path specified in the TriggerPath attribute must not contain more than one leading or trailing '/' character. Workaround: Remove the extra leading or trailing '/' characters from the path.
Service group is not auto started on the node having incorrect value of EngineRestarted [2653688]
When HAD is restarted by the hashadow process, the value of the EngineRestarted attribute is temporarily set to 1 until all service groups are probed. Once all service groups are probed, the value is reset. If HAD on another node is started at roughly the same time, then it is possible that it does not reset the value of the EngineRestarted attribute. Therefore, the service group is not auto started on the new node due to the mismatch in the value of the EngineRestarted attribute. Workaround: Restart VCS on the node where EngineRestarted is set to 1.
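A sketch of the workaround follows; the hasys -value check is illustrative and sys1 is a placeholder. The -force option keeps the applications running while HAD restarts:

# hasys -value sys1 EngineRestarted
# hastop -local -force
# hastart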
Workaround: Online the child resources of the topmost resource which is disabled.
NFS resource goes offline unexpectedly and reports errors when restarted [2490331]
If an agent process is restarted multiple times by HAD, only one of the agent processes is valid and the remaining processes get aborted, without exiting or being stopped externally. Even though the agent process is running, HAD does not recognize it and hence does not perform any resource operations. Workaround: Terminate the agent process.
Parent group does not come online on a node where child group is online [2489053]
This happens if the AutostartList of the parent group does not contain the node entry where the child group is online. Workaround: Bring the parent group online by specifying the name of the system, then use the hagrp -online [parent group] -any command to bring the parent group online.
If secure and non-secure WAC are connected the engine_A.log receives logs every 5 seconds [2653695]
Two WACs in GCO must always be started either in secure or non-secure mode. The secure and non-secure WAC connections cause log messages to be sent to engine_A.log file. Workaround: Make sure that WAC is running in either secure mode or non-secure mode on both the clusters in GCO.
Oracle group fails to come online if Fire Drill group is online on secondary cluster [2653695]
If a parallel global service group faults on the local cluster and does not find a failover target in the local cluster, it tries to fail over the service group to the remote cluster. However, if the fire drill for the service group is online on the remote cluster, the offline local dependency is violated and the global service group is not able to fail over to the remote cluster. Workaround: Offline the FireDrill service group and online the service group on the remote cluster.
Oracle service group faults on secondary site during failover in a disaster recovery scenario [2653704]
Oracle service group fails to go online in the DR site when disaster strikes the primary site. This happens if the AutoFailover attribute on the Service Group is
set to 1 and when the corresponding service group's FireDrill is online in the DR site. Firedrill Service group may remain ONLINE on the DR site. Workaround: If the service group containing the Oracle (or any database) resource faults after attempting automatic DR failover while FireDrill is online in the DR site, manually offline the FireDrill Service Group. Subsequently, attempt the online of the Oracle Service Group in the DR site.
Service group may fail to come online after a flush and a force flush operation [2616779]
A service group may fail to come online after flush and force flush operations are executed on a service group for which the offline operation was not successful. Workaround: If the offline operation is not successful, use the force flush commands instead of the normal flush operation. If a normal flush operation has already been executed, use the -any option to start the service group.
Elevated TargetCount prevents the online of a service group with hagrp -online -sys command [2871892]
When you initiate an offline of a service group and before the offline is complete, if you initiate a forced flush, the offline of the service group which was initiated earlier is treated as a fault. As start bits of the resources are already cleared, service group goes to OFFLINE|FAULTED state but TargetCount remains elevated. Workaround: No workaround.
Auto failover does not happen in case of two successive primary and secondary cluster failures [2858187]
In the case of three clusters (clus1, clus2, clus3) in a GCO with steward not configured, if clus1 loses connection with clus2, it sends an inquiry to clus3 to check the state of clus2, and one of the following conditions persists:
1. If it is able to confirm that clus2 is down, it marks clus2 as FAULTED.
2. If it is not able to send the inquiry to clus3, it assumes that a network disconnect might have happened and marks clus2 as UNKNOWN.
In the second case, automatic failover does not take place even if the ClusterFailoverPolicy is set to Auto. You need to manually fail over the global service groups. Workaround: Configure the steward at a geographically distinct location from the clusters to which the above stated condition is applicable.
Trust between the two clusters is not properly set if the clusters are secure. The firewall is not correctly configured to allow the WAC port (14155).
Workaround: Make sure that the above two conditions are rectified. Refer to the Veritas Cluster Server Administrator's Guide for information on setting up trust relationships between two clusters.
The ha commands may fail for non-root user if cluster is secure [2847998]
The ha commands fail to work if you first use a non-root user without a home directory and then create a home directory for the same user. Workaround
1. Delete /var/VRTSat/profile/<user_name>.
2. Delete /home/<user_name>/.VRTSat.
3. Delete the /var/VRTSat_lhc/<cred_file> file that the same non-root user owns.
4. Run the ha command with the same non-root user (this time it passes).
Older ClusterAddress remains plumbed on the node while modifying ClusterAddress [2858188]
If you execute gcoconfig to modify ClusterAddress when ClusterService group is online, the older ClusterAddress remains plumbed on the node. Workaround: Un-plumb the older ClusterAddress from the node manually or offline ClusterService group by executing the following command before running gcoconfig:
hagrp -offline -force ClusterService -any
or
hagrp -offline -force ClusterService -sys <sys_name>
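If you choose to un-plumb the older address manually instead, a sketch on Solaris might look like the following; the interface name and the address are placeholders:

# ifconfig net0 removeif <older_cluster_address>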
VRTSvcs package may give error messages for package verification on Solaris 11 [2858192]
VRTSvcs package may give error messages for package verification on Solaris 11. This is because some of the VCS configuration files are modified as part of product configuration. This error can be ignored.
Workaround: No workaround.
Disabling the VCS SMF service causes the service to go into maintenance state [2848005]
If the CmdServer process is stopped then disabling the VCS SMF service causes it to go into maintenance state. Workaround: To bring the service out of maintenance state, run:
# svcadm clear system/vcs
VCS service does not start when security is disabled on a cluster in security enabled mode (2724844)
When you change a VCS cluster state from security enabled to security disabled using the script-based installer, the SMF service for VCS goes into a maintenance state. Workaround: Perform the following steps:
of the zlogin process, and instead get a new group id. Thus, it is difficult for the agent framework to trace the children or grand-children of the shell process, which translates to the cancellation of only the zlogin process. Workaround: Oracle must provide an API or a mechanism to kill all the children of the zlogin process that was started to run the entry point script in the local-zone.
This is due to the system NFS default version mismatch between Solaris and Linux. The workaround for this is to configure the MountOpt attribute in the Mount resource and set vers=3 for it. Example:
root@north $ mount -F nfs south:/test /logo/
nfs mount: mount: /logo: Not owner
root@north $

Mount nfsmount (
    MountPoint = "/logo"
    BlockDevice = "south:/test"
    FSType = nfs
    MountOpt = "vers=3"
)
The zpool command runs into a loop if all storage paths from a node are disabled
The Solaris Zpool agent runs zpool commands to import and export zpools. If all paths to the storage are disabled, the zpool command does not respond. Instead, the zpool export command goes into a loop and attempts to export the zpool. This continues until the storage paths are restored and the zpool is cleared. As a result, the offline and clean procedures of the Zpool agent fail and the service group cannot fail over to the other node.
Workaround: You must restore the storage paths and run the zpool clear command for all the pending commands to succeed. This will cause the service group to fail over to another node.
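For example, after the storage paths are restored, clear the pending errors on the pool; the pool name is a placeholder:

# zpool clear <pool_name>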
Zone remains stuck in down state if tried to halt with file system mounted from global zone [2326105]
If zone halts without unmounting the file system, the zone goes to down state and does not halt with the zoneadm commands. Workaround: Unmount the file system manually from global zone and then halt the zone. For VxFS, use following commands to unmount the file system from global zone. To unmount when VxFSMountLock is 1
umount -o mntunlock=VCS <zone root path>/<Mount Point>
Process and ProcessOnOnly agent rejects attribute values with white spaces [2303513]
Process and ProcessOnOnly agent does not accept Arguments attribute values that are separated by multiple whitespaces. The Arguments attribute specifies the set of arguments for a process. If a script controls the process, the script is passed as an argument. You must separate multiple arguments by using a single whitespace. A string cannot accommodate more than one space between arguments, or allow leading or trailing whitespace characters. This attribute must not exceed 80 characters. Workaround: You should use only single whitespace to separate the argument attribute values. Make sure you avoid multiple whitespaces between the argument attribute values or trailing whitespace characters.
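For example, a Process resource with correctly separated arguments might be configured as follows; the paths and arguments shown are placeholders:

Process proc_app (
    PathName = "/bin/ksh"
    Arguments = "/opt/app/bin/start.sh -i 1 -v"
)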
The zpool commands hang and remain in memory till reboot if storage connectivity is lost [2368017]
If the FailMode attribute of zpool is set to continue or wait and the underlying storage is not available, the zpool commands hang and remain in memory until the next reboot. This happens when storage connectivity to the disk is lost; the zpool commands hang and they cannot be stopped or killed. The zpool commands run by the monitor entry point remain in the memory. Workaround: There is no recommended workaround for this issue.
Application agent cannot handle a case with user as root, envfile set and shell as csh [2490296]
Application agent does not handle a case when the user is root, envfile is set, and shell is csh. The application agent uses the system command to execute the Start/Stop/Monitor/Clean Programs for the root user. This executes Start/Stop/Monitor/Clean Programs in sh shell, due to which there is an error when root user has csh shell and EnvFile is written accordingly. Workaround: Do not set csh as shell for root user. Use sh as shell for root instead.
IMF registration fails for Mount resource if the configured MountPoint path contains spaces [2442598]
If the configured MountPoint of a Mount resource contains spaces in its path, then the Mount agent can online the resource correctly, but the IMF registration for ONLINE monitoring fails. This is due to the fact that the AMF driver does not support spaces in the path. Leading and trailing spaces are handled by the agent and IMF monitoring can be done for such resources. Workaround: Symantec recommends turning off the IMF monitoring for a resource having spaces in its path. For information on disabling the IMF monitoring for a resource, refer to the Veritas Cluster Server Administrator's Guide.
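One way to disable IMF monitoring for such a Mount resource is to override the type-level IMF attribute for that resource and set its Mode key to 0; mnt_res is a placeholder, and the authoritative steps are in the Veritas Cluster Server Administrator's Guide:

# hares -override mnt_res IMF
# hares -modify mnt_res IMF -update Mode 0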
Workaround: No workaround.
Password changed while using hazonesetup script does not apply to all zones [2332349]
If you use the same user name for multiple zones, updating the password for one zone does not update the password of the other zones. Workaround: While updating the password for a VCS user which is used for multiple zones, update the password for all the zones.
RemoteGroup agent does not failover in case of network cable pull [2588807]
A RemoteGroup resource with ControlMode set to OnOff may not fail over to another node in the cluster in case of network cable pull. The state of the RemoteGroup resource becomes UNKNOWN if it is unable to connect to a remote cluster. Workaround:
Connect to the remote cluster and try taking offline the RemoteGroup resource. If connection to the remote cluster is not possible and you want to bring down the local service group, change the ControlMode option of the RemoteGroup resource to MonitorOnly. Then try taking offline the RemoteGroup resource. Once the resource is offline, change the ControlMode option of the resource to OnOff.
Prevention of Concurrency Violation (PCV) is not supported for applications running in a container [2536037]
For an application running in a container, VCS uses a similar functionality as if that resource is not registered to IMF. Hence, there is no IMF control to take a resource offline. When the same resource goes online on multiple nodes, agent detects and reports to engine. Engine uses the offline monitor to take the resource offline. Hence, even though there is a time lag before the detection of the same resource coming online on multiple nodes at the same time, VCS takes the resource offline.
PCV does not function for an application running inside a local Zone on Solaris.
Workaround: No workaround.
Monitor program does not change an IPMultiNIC resource to UNKNOWN if the Netmask value is hexadecimal [2754172]
For an IPMultiNIC type resource, the monitor program does not change the status of the resource to UNKNOWN when the value of the Netmask attribute is specified in hexadecimal format. When the value of the NetMask attribute is specified in hexadecimal format, the monitor does not transition the status of the resource. Hence, code related errors may be logged. Workaround: No workaround.
Share resource goes offline unexpectedly causing service group failover [1939398]
Share resource goes offline unexpectedly and causes a failover when NFSRestart resource goes offline and UseSMF attribute is set to 1 (one). When NFSRestart resource goes offline, NFS daemons are stopped. When UseSMF attribute is set to 1, the exported file systems become unavailable, hence Share resource unexpectedly goes offline. Workaround: Set the value of ToleranceLimit of Share resource to a value more than 1.
Some agents may fail to come online after full upgrade to VCS 6.0 if they were online before the upgrade [2618482]
Resources of type NFSRestart, DNS, LDom and Project do not come online automatically after a full upgrade to VCS 6.0 if they were previously online. Workaround: Online the resources manually after the upgrade, if they were online previously.
Zone root configured on ZFS with ForceAttach attribute enabled causes zone boot failure (2695415)
On a Solaris 11 system, attaching a zone with the -F option may result in zone boot failure if the zone root is configured on ZFS. Workaround: Change the ForceAttach attribute of the Zone resource from 1 to 0. With this configuration, you are recommended to keep the default value of DetachZonePath as 1.
Error message is seen for Apache resource when zone is in transient state [2703707]
If the Apache resource is probed when the zone is getting started, the following error message is logged:
Argument "VCS ERROR V-16-1-10600 Cannot connect to VCS engine\n" isn't numeric in numeric ge (>=) at /opt/VRTSvcs/bin/Apache/Apache.pm line 452. VCS ERROR V-16-1-10600 Cannot connect to VCS engine LogInt(halog call failed):TAG:E:20314 <Apache::ArgsValid> SecondLevel MonitorTimeOut must be less than MonitorTimeOut.
Workaround: You can ignore this message. When the zone is started completely, the halog command does not fail and Apache agent monitor runs successfully.
Monitor falsely reports NIC resource as offline when zone is shutting down (2683680)
If a NIC resource is configured for an Exclusive IP zone, the NIC resource is monitored inside the zone when the zone is functional. If the NIC monitor program is invoked when the zone is shutting down, the monitor program may falsely
report the NIC resource as offline. This may happen if some of the networking services are offline but the zone is not completely shut down. Such reports can be avoided if you override and set the ToleranceLimit value to a non-zero value. Workaround: When a NIC resource is configured for an Exclusive IP zone, you are recommended to set the ToleranceLimit attribute to a non-zero value. Calculate the ToleranceLimit value as follows: Time taken by a zone to completely shut down must be less than or equal to the NIC resource's MonitorInterval value + (MonitorInterval value x ToleranceLimit value). For example, if a zone takes 90 seconds to shut down and the MonitorInterval for the NIC agent is set to 60 seconds (default value), set the ToleranceLimit value to 1.
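For example, to apply the value from the calculation above to a NIC resource (nic_res is a placeholder), override the type-level attribute and then set it:

# hares -override nic_res ToleranceLimit
# hares -modify nic_res ToleranceLimit 1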
Apache resource does not come online if the directory containing Apache pid file gets deleted when a node or zone restarts (2680661)
The directory in which Apache http server creates PidFile may get deleted when a node or zone restarts. Typically the PidFile is located at /var/run/apache2/httpd.pid. When the zone reboots, the /var/run/apache2 directory may get removed and hence the http server startup may fail. Workaround: Make sure that Apache http server writes the PidFile to an accessible location. You can update the PidFile location in the Apache http configuration file (For example: /etc/apache2/httpd.conf).
Online of LDom resource may fail due to incompatibility of LDom configuration file with host OVM version (2814991)
If you have a cluster running LDom with different OVM versions on the hosts, then the LDom configuration file generated on one host may display error messages when it is imported on the other host with a different OVM version. Thus, the online of the LDom resource may also fail. For example, if you have a cluster running LDom with OVM version 2.2 on one node and OVM 2.1 on the other node, then using the XML configuration generated on the host with OVM 2.2 may display errors when the configuration is imported on the host with OVM 2.1. Thus, the online of the LDom resource fails. The following error message is displayed:
ldm add-domain failed with error Failed to add device /ldom1/ldom1 as ld1_disk1@primary-vds0 because this device is already exported on LDom primary. Volume ld1_disk1 already exists in vds primary-vds0.
Workaround: If the CfgFile attribute is specified, ensure that the XML configuration generated is compatible with the OVM version installed on the nodes.
Online of IP or IPMultiNICB resource may fail if its IP address specified does not fit within the values specified in the allowed-address property (2729505)
While configuring an IP or IPMultiNICB resource to be run in a zone, if the IP address specified for the resource does not match the values specified in the allowed-address property of the zone configuration, then the online of IP resource may fail. This behavior is seen only on Solaris 11 platform. Workaround: Ensure that the IP address is added to allowed-address property of the zone configuration.
Application resource running in a container with PidFiles attribute reports offline on upgrade to VCS 6.0 or later [2850927]
Application resource configured to run in a container configured with PidFiles attribute reports state as offline after upgrade to VCS 6.0 or later versions. When you upgrade VCS from lower versions to 6.0 or later, if application resources are configured to run in a container with monitoring method set to PidFiles, then upgrade may cause the state of the resources to be reported as offline. This is due to changes introduced in the Application agent where if the resource is configured to run in a container and has PidFiles configured for monitoring the resource then the value expected for this attribute is the pathname of the PID file relative to the zone root. In releases prior to VCS 6.0, the value expected for the attribute was the pathname of the PID file including the zone root. For example, a configuration extract of an application resource configured in VCS 5.0MP3 to run in a container would appear as follows:
Application apptest (
    User = root
    StartProgram = "/ApplicationTest/app_test_start"
    StopProgram = "/ApplicationTest/app_test_stop"
    PidFiles = { "/zones/testzone/root/var/tmp/apptest.pid" }
    ContainerName = testzone
)
Whereas, the same resource if configured in VCS 6.0 and later releases would be configured as follows:
Application apptest (
    User = root
    StartProgram = "/ApplicationTest/app_test_start"
    StopProgram = "/ApplicationTest/app_test_stop"
    PidFiles = { "/var/tmp/apptest.pid" }
)
Note: The container information is set at the service group level. Workaround: Modify the PidFiles pathname to be relative to the zone root as shown in the latter part of the example.
# hares -modify apptest PidFiles /var/tmp/apptest.pid
SambaShare agent clean entry point fails when access to configuration file on shared storage is lost [2858183]
When the Samba server configuration file is on shared storage and access to the shared storage is lost, SambaShare agent clean entry point fails. Workaround: No workaround.
SambaShare agent fails to offline resource in case of cable pull or on unplumbing of IP [2848020]
When the IP is unplumbed or in case of a cable pull scenario, the agent fails to offline the SambaShare resource. Workaround: No workaround.
NIC resource may fault during group offline or failover on Solaris 11 [2754172]
When a NIC resource is configured with an exclusive IP zone, the NIC resource may fault during group offline or failover. This issue is observed because the zone takes a long time to shut down on Solaris 11. If the NIC monitor is invoked during this window, the NIC agent may treat this as a fault. Workaround: Increase the ToleranceLimit for the NIC resource when it is configured for an exclusive IP zone.
NFS client reports error when server is brought down using shutdown command [2872741]
On Solaris 11, when the VCS cluster node having the NFS share service group is brought down using the shutdown command, NFS clients may report the "Stale NFS file handle" error. During shutdown, the SMF service svc:/network/shares un-shares all the shared paths before taking down the virtual IP. Thus, the NFS clients accessing this path get the stale file handle error. Workaround: Before you shut down the VCS cluster node, disable the svc:/network/shares SMF service, so that only VCS controls the un-sharing of the shared paths during the shutdown operation.
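For example, before shutting down the node:

# svcadm disable svc:/network/shares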
Intentional Offline does not work for VCS agent for Oracle [1805719]
Due to issues with health check monitoring, Intentional Offline does not work for VCS agent for Oracle.
The ASMInstAgent does not support having pfile/spfile for the ASM Instance on the ASM diskgroups
The ASMInstAgent does not support having pfile/spfile for the ASM Instance on the ASM diskgroups. Workaround: Have a copy of the pfile/spfile in the default $GRID_HOME/dbs directory to make sure that this would be picked up during the ASM Instance startup.
VCS agent for ASM: Health check monitoring is not supported for ASMInst agent
The ASMInst agent does not support health check monitoring. Workaround: Set the MonitorOption attribute to 0.
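For example, for an ASMInst resource named asminst_res (a placeholder); if MonitorOption is defined as a static attribute in your configuration, override it for the resource first with hares -override:

# hares -modify asminst_res MonitorOption 0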
The NOFAILOVER action means that the agent sets the resource's state to OFFLINE and freezes the service group. You may stop the agent, edit the oraerror.dat file, and change the NOFAILOVER action to another action that is appropriate for your environment. The changes go into effect when you restart the agent.
ASMInstance resource monitoring offline resource configured with OHASD as application resource logs error messages in VCS logs [2846945]
When the Oracle High Availability Services Daemon (OHASD) is configured as an application resource to be monitored under VCS and if this resource is offline on the failover node then the ASMInstance resource in the offline monitor logs the following error messages in the VCS logs:
Workaround: Configure the application in a separate parallel service group and ensure that the resource is online.
The value of the AgentReplyTimeout attribute can be set to a high value.
The scheduling class and scheduling priority of the agent can be increased to avoid CPU starvation for the agent, using the AgentClass and AgentPriority attributes.
Agent framework cannot handle leading and trailing spaces for the dependent attribute (2027896)
Agent framework does not allow spaces in the target resource attribute name of the dependent resource. Workaround: Do not provide leading and trailing spaces in the target resource attribute name of the dependent resource.
The agent framework does not detect if service threads hang inside an entry point [1442255]
In rare cases, the agent framework does not detect if all service threads hang inside a C entry point. In this case it may not cancel them successfully. Workaround: If the service threads of the agent are hung, send a kill signal to restart the agent. Use the following command: kill -9 <hung agent's pid>. The haagent -stop command does not work in this situation.
IMF related error messages while bringing a resource online and offline [2553917]
For a resource registered with AMF, if you run hagrp -offline or hagrp -online explicitly or through a collective process to offline or online the resource respectively, the IMF displays error messages in either case. The errors displayed are expected behavior and do not affect the IMF functionality in any manner. Workaround: No workaround.
This is a known issue with the Solaris luupgrade command. Workaround: Check with Oracle for possible workarounds for this issue.
On Sparc, Live Upgrade from Solaris 9 to Solaris 10 Update 10 may fail (2424410)
On Sparc, Live Upgrade from Solaris 9 to Solaris 10 Update 10 may fail with the following error:
Generating file list.
Copying data from PBE <source.24429> to ABE <dest.24429>.
99% of filenames transferredERROR: Data duplication process terminated unexpectedly.
ERROR: The output is </tmp/lucreate.13165.29314/lucopy.errors.29314>.
29794 Killed
Fixing zonepaths in ABE.
Unmounting ABE <dest.24429>.
100% of filenames transferredReverting state of zones in PBE <source.24429>.
ERROR: Unable to copy file systems from boot environment <source.24429> to BE <dest.24429>.
ERROR: Unable to populate file systems on boot environment <dest.24429>.
Removing incomplete BE <dest.24429>.
ERROR: Cannot make file systems for boot environment <dest.24429>.
This is a known issue with the Solaris lucreate command. Workaround: Install Oracle patch 113280-10, 121430-72 or higher before running vxlustart.
Workaround: No workaround.
System messages having localized characters viewed using hamsg may not be displayed correctly
If you use hamsg to view system messages, the messages containing a mix of English and localized characters may not be displayed correctly. [2405416] Workaround: No workaround. However, you can view English messages in the VCS log file.
Workaround: No workaround.
Application group attempts to come online on primary site before fire drill service group goes offline on the secondary site (2107386)
The application service group comes online on the primary site while the fire drill service group attempts to go offline at the same time, causing the application group to fault. Workaround: Ensure that the fire drill service group is completely offline on the secondary site before the application service group comes online on the primary site.
Second secondary cluster cannot take over the primary role when primary and 1st-secondary clusters panic [2858187]
If there are three clusters (clus1, clus2, and clus3) in a GCO without a steward, when clus1 loses connection to clus2, it will send the inquiry to clus3 to check the state of clus2:
If it is able to confirm that clus2 is down, it will mark clus2 as FAULTED.
If it is not able to send the inquiry to clus3, it will assume that a network disconnect has happened and mark clus2 as UNKNOWN. In this case, automatic failover will not take place even if the ClusterFailoverPolicy is set to Auto. If this happens, users would need to manually fail over the global service groups.
Workaround: Configure the steward at a location geographically distinct from those of the three clusters above.
LLT port stats sometimes shows recvcnt larger than recvbytes (1907228)
With each received packet, LLT increments the following variables:
recvcnt (increment by one for every packet) recvbytes (increment by size of packet for every packet)
Both these variables are integers. With constant traffic, recvbytes hits and rolls over MAX_INT quickly. This can cause the value of recvbytes to be less than the value of recvcnt. This does not impact the LLT functionality.
Cannot configure LLT if full device path is not used in the llttab file (2858159)
(Oracle Solaris 11) On virtual machines ensure that you use the full path of the devices corresponding to the links in llttab. For example, use /dev/net/net1 instead of /dev/net/net:1 in the llttab file, otherwise you cannot configure LLT.
Cannot use CPI response files to add nodes to a cluster that is using LLT over UDP (2869763)
When you run the addnode -responsefile command, if the cluster is using LLT over UDP, then the /etc/llttab file generated on new nodes is not correct. So, the procedure fails and you cannot add nodes to a cluster using CPI response files. Workaround: None
While deinitializing GAB client, "gabdebug -R GabTestDriver" command logs refcount value 2 (2536373)
After you unregister the gtx port with -nodeinit option, the gabconfig -C command shows refcount as 1. But when forceful deinit option (gabdebug -R GabTestDriver) is run to deinitialize GAB client, then a message similar to the following is logged.
GAB INFO V-15-1-20239 Client GabTestDriver with refcount 2 forcibly deinited on user request
The refcount value is incremented by 1 internally. However, the refcount value is shown as 2 which conflicts with the gabconfig -C command output. Workaround: There is no workaround for this issue.
GAB may fail to stop during a phased upgrade on Oracle Solaris 11 (2858157)
While performing a phased upgrade on Oracle Solaris 11 systems, GAB may fail to stop. However, CPI gives a warning and continues with stopping the stack. Workaround: Reboot the node after the installer completes the upgrade.
(Oracle Solaris 11) On virtual machines, sometimes the common product installer (CPI) may report that GAB failed to start and may exit (2879262)
GAB startup script may take longer than expected to start up. The delay in start up can cause the CPI to report that GAB failed and exits. Workaround: Manually start GAB and all dependent services.
Delay in rebooting Solaris 10 nodes due to vxfen service timeout issues (1897449)
When you reboot the nodes using the shutdown -i6 -g0 -y command, the following error messages may appear:
svc:/system/vxfen:default:Method or service exit timed out. Killing contract 142
svc:/system/vxfen:default:Method "/lib/svc/method/vxfen stop" failed due to signal Kill.
This error occurs because the vxfen client is still active when VCS attempts to stop I/O fencing. As a result, the vxfen stop service times out and delays the system reboot. Workaround: Perform the following steps to avoid this vxfen stop service timeout error. To avoid the vxfen stop service timeout error
Stop VCS. On any node in the cluster, run the following command:
# hastop -all
Workaround: Remove the offending IP address from the listening IP addresses list using the rm_port action of the cpsadm command. See the Veritas Cluster Server Administrator's Guide for more details.
Fencing port b is visible for few seconds even if cluster nodes have not registered with CP server (2415619)
Even if the cluster nodes have no registration on the CP server and if you provide coordination point server (CP server) information in the vxfenmode file of the
cluster nodes, and then start fencing, the fencing port b is visible for a few seconds and then disappears. Workaround: Manually add the cluster information to the CP server to resolve this issue. Alternatively, you can use installer as the installer adds cluster information to the CP server during configuration.
The cpsadm command fails if LLT is not configured on the application cluster (2583685)
The cpsadm command fails to communicate with the coordination point server (CP server) if LLT is not configured on the application cluster node where you run the cpsadm command. You may see errors similar to the following:
# cpsadm -s 10.209.125.200 -a ping_cps CPS ERROR V-97-1400-729 Please ensure a valid nodeid using environment variable CPS_NODEID CPS ERROR V-97-1400-777 Client unable to communicate with CPS.
However, if you run the cpsadm command on the CP server, this issue does not arise even if LLT is not configured on the node that hosts CP server. The cpsadm command on the CP server node always assumes the LLT node ID as 0 if LLT is not configured. According to the protocol between the CP server and the application cluster, when you run the cpsadm on an application cluster node, cpsadm needs to send the LLT node ID of the local node to the CP server. But if LLT is unconfigured temporarily, or if the node is a single-node VCS configuration where LLT is not configured, then the cpsadm command cannot retrieve the LLT node ID. In such situations, the cpsadm command fails. Workaround: Set the value of the CPS_NODEID environment variable to 255. The cpsadm command reads the CPS_NODEID variable and proceeds if the command is unable to get LLT node ID from LLT.
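For example, in a shell on the application cluster node, reusing the CP server address shown above:

# export CPS_NODEID=255
# cpsadm -s 10.209.125.200 -a ping_cps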
When I/O fencing is not up, the svcs command shows VxFEN as online (2492874)
Solaris 10 SMF marks the service status based on the exit code of the start method for that service. The VxFEN start method executes the vxfen-startup script in the background and exits with code 0. Hence, if the vxfen-startup script subsequently exits with failure then this change is not propagated to SMF. This behavior causes the svcs command to show incorrect status for VxFEN. Workaround: Use the vxfenadm command to verify that I/O fencing is running.
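For example, the following command displays the fencing mode and cluster membership when I/O fencing is running, and reports an error otherwise:

# vxfenadm -d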
In absence of cluster details in CP server, VxFEN fails with pre-existing split-brain message (2433060)
When you start server-based I/O fencing, the node may not join the cluster and may print error messages similar to the following in the /var/VRTSvcs/log/vxfen/vxfen.log file:
VXFEN vxfenconfig ERROR V-11-2-1043 Detected a preexisting split brain. Unable to join cluster.
The vxfend daemon on the application cluster queries the coordination point server (CP server) to check if the cluster members as seen in the GAB membership are registered with the CP server. If the application cluster fails to contact the CP server for some reason, then fencing cannot determine the registrations on the CP server and conservatively assumes a pre-existing split-brain. Workaround: Before you attempt to start VxFEN on the application cluster, ensure that the cluster details such as cluster name, UUID, nodes, and privileges are added to the CP server.
The vxfenswap utility does not detect failure of coordination points validation due to an RSH limitation (2531561)
The vxfenswap utility runs the vxfenconfig -o modify command over RSH or SSH on each cluster node for validation of coordination points. If you run the vxfenswap command using RSH (with the -n option), then RSH does not detect the failure of validation of coordination points on a node. From this point, vxfenswap proceeds as if the validation was successful on all the nodes. But, it fails at a later stage when it tries to commit the new coordination points to the VxFEN driver. After the failure, it rolls back the entire operation, and exits cleanly with a non-zero error code. If you run vxfenswap using SSH (without the -n option), then SSH detects the failure of validation of coordination points correctly and rolls back the entire operation immediately. Workaround: Use the vxfenswap utility with SSH (without the -n option).
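For example, to refresh disk-based coordination points over SSH, run the utility without the -n option. The coordinator disk group name below is a placeholder and assumes disk-based coordination points:

# vxfenswap -g vxfencoorddg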
Fencing does not come up on one of the nodes after a reboot (2573599)
If VxFEN unconfiguration has not finished its processing in the kernel and in the meantime if you attempt to start VxFEN, you may see the following error in the /var/VRTSvcs/log/vxfen/vxfen.log file:
VXFEN vxfenconfig ERROR V-11-2-1007 Vxfen already configured
However, the output of the gabconfig -a command does not list port b. The vxfenadm -d command displays the following error:
VXFEN vxfenadm ERROR V-11-2-1115 Local node is not a member of cluster!
The cpsadm command fails after upgrading CP server to 6.0 or above in secure mode (2846727)
The cpsadm command may fail after you upgrade coordination point server (CP server) to 6.0 in secure mode. If the old VRTSat package is not removed from the system, the cpsadm command loads the old security libraries present on the system. As the installer runs the cpsadm command on the CP server to add or upgrade the VCS cluster (application cluster), the installer also fails. Workaround: Perform the following procedure on all of the nodes of the CP server. To resolve this issue
Server-based fencing may fail to start after reinstalling the stack (2802682)
Server-based fencing may fail to start if you use the existing configuration files after reinstalling the stack. Workaround: After reinstalling the stack, add the client cluster information on the coordination point server, because the client cluster information is removed when the stack is uninstalled. For more details, see the Setting up server-based I/O fencing manually section in the Veritas Cluster Server Installation Guide. Alternatively, you can manually modify the /etc/vxfenmode file and the main.cf file to start fencing in disabled mode and then configure fencing.
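As a sketch of the disabled-mode step, you can copy the disabled-mode template over the fencing configuration file and restart fencing before you reconfigure it. The template path below is an assumption; verify it on your installation, and ensure that the cluster-level UseFence attribute in main.cf is not set to SCSI3 while fencing is disabled:

# cp /etc/vxfen.d/vxfenmode_disabled /etc/vxfenmode
# svcadm restart svc:/system/vxfen:default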
Common product installer cannot set up trust between a client system on release version 5.1SP1 and a server on release version 6.0 or later (2824472)
The issue exists because the 5.1SP1 release version does not support separate directories for truststores, whereas release version 6.0 and later do. Because of this mismatch in truststore support, you cannot set up trust between client systems and servers. Workaround: Set up trust manually between the coordination point server and client systems using the cpsat or vcsat command. The servers and client systems can then communicate in secure mode.
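For example, on each client system you might establish trust with the CP server broker as follows. The command path, options, and host name are assumptions to verify against your installation:

# /opt/VRTScps/bin/cpsat setuptrust -b cps1.example.com -s high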
Workaround: Retain the "port=<port_value>" setting in the /etc/vxfenmode file, when using customized fencing with at least one CP server. The default port value is 14250.
Secure CP server does not connect from localhost using 127.0.0.1 as the IP address (2554981)
The cpsadm command does not connect to the secure CP server on the localhost using 127.0.0.1 as the IP address. Workaround: Connect to the secure CP server using any of the virtual IPs that are configured with the CP server and plumbed on the local node.
CoordPoint agent does not report the addition of new disks to a Coordinator disk group [2727672]
The LevelTwo monitoring of the CoordPoint agent does not report a fault even if the constituents of a coordinator disk group change due to the addition of new disks in the coordinator disk group. Workaround: There is no workaround for this issue.
Coordination point server-based fencing may fail if it is configured on 5.1SP1RP1 using 6.0.1 coordination point servers (2824472)
The 5.1SP1 installer (CPI) cannot set up trust between a 5.1SP1 client and a 6.0 or later server, because there are no separate directories for truststores in 5.1SP1. When trust cannot be set up, the 5.1SP1 installer cannot configure 5.1SP1 clients to work with 6.0 or later CPS in secure mode. Workaround: Set up trust manually between the CPS and clients using the cpsat or the vcsat command. After that, CPS and clients can communicate properly in secure mode.
Cannot run the vxfentsthdw utility directly from the install media if VRTSvxfen package is not installed on the system (2858190)
If the VRTSvxfen package is not installed on the system, then certain script files that are needed for the vxfentsthdw utility to function are not available. So, without the VRTSvxfen package installed on the system, you cannot run the utility from the install media. Workaround: Install the VRTSvxfen package, then run the utility from either the install media or from the /opt/VRTSvcs/vxfen/bin/ location.
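For example, on Oracle Solaris 11 you can install the package from the install media repository and then run the utility from its installed location. The repository path below is a placeholder for your media mount point:

# pkg install -g /mnt/dvd/pkgs/VRTSpkgs.p5p VRTSvxfen
# /opt/VRTSvcs/vxfen/bin/vxfentsthdw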
Fencing may show the RFSM state as replaying for some nodes in the cluster (2555191)
Fencing based on coordination point clients in Campus cluster environment may show the RFSM state as replaying for some nodes in the cluster. Workaround: Restart fencing on the node that shows RFSM state as replaying.
Veritas Cluster Server may not come up after rebooting the first node in phased upgrade on Oracle Solaris 11 (2852863)
If any of the kernel-level services that Veritas Cluster Server (VCS) depends upon do not come up, then VCS fails to come up. The LLT, GAB, and VxFEN modules may also fail to come up because the add_drv command fails to add its driver to the system. On Solaris 11, add_drv may fail if another add_drv command is being run on the system at the same time. Workaround: Check the status of the LLT, GAB, and VxFEN modules. Ensure that all three services are online in SMF. Then, retry starting VCS.
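For example, you can check and start the services as follows. The service names are assumed to match the standard VCS SMF manifests; verify them on your system:

# svcs -a | egrep "llt|gab|vxfen"
# svcadm enable svc:/system/llt:default
# svcadm enable svc:/system/gab:default
# svcadm enable svc:/system/vxfen:default
# hastart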
vxfentsthdw utility fails to launch before you install the VRTSvxfen package (2858190)
Before you install the VRTSvxfen package, the /etc/vxfen.d/script/vxfen_scriptlib.sh file that the vxfentsthdw utility depends on does not exist. In this case, the utility bails out. Workaround: Install the VRTSvxfen package, and then run the vxfentsthdw utility directly from the installation DVD.
During Firedrill operations, VCS may log error messages related to IMF registration failure in the engine log. This happens because in the firedrill service group, there is a second CFSMount resource monitoring the same MountPoint through IMF. Both the resources try to register for online/offline events on the same MountPoint and as a result, registration of one fails. Workaround: No workaround.
IMF does not fault zones if zones are in ready or down state [2290883]
IMF does not detect whether zones are in the ready or down state, and therefore does not fault such zones. In the ready state, no services are running inside the zones. Workaround: Take the zones offline and then restart them.
IMF does not detect the zone state when the zone goes into a maintenance state [2535733]
IMF does not detect the change in state. However, the change in state is detected by Zone monitor in the next cycle. Workaround: No workaround.
Engine log gets flooded with messages proportionate to the number of mount offline registrations with AMF [2619778]
In a certain error condition, all mount offline events registered with AMF are notified simultaneously. This causes the following message to get printed in the engine log for each registered mount offline event:
<Date> <Time> VCS INFO V-16-2-13717 (vcsnode001) Output of the completed operation (imf_getnotification)
==============================================
Cannot continue monitoring event
Got notification for group: cfsmount221
==============================================
This is an expected behavior for this error condition. Apart from the messages there will be no impact on the functionality of the VCS solution. Workaround: No workaround.
This error is due to the absolute path specified in main.cf for type-specific configuration files. Currently, haimfconfig does not support an absolute path for a type-specific configuration file in main.cf. Workaround: Replace the absolute path with the file name only and copy the file from its absolute location to the /etc/VRTSvcs/conf/config directory. For example, if OracleTypes.cf is included in main.cf as:
include "/etc/VRTSagents/ha/conf/Oracle/OracleTypes.cf"
IMF does not provide notification for a registered disk group if it is imported using a different name (2730774)
If a disk group resource is registered with the AMF and the disk group is then imported using a different name, AMF does not recognize the renamed disk group and hence does not provide notification to the DiskGroup agent. Therefore, the DiskGroup agent keeps reporting the disk group resource as offline. Workaround: Make sure that while importing a disk group, the disk group name matches the one registered with the AMF.
This does not have any effect on the functionality of IMF. Workaround: No workaround.
Error message displayed when ProPCV prevents a process from coming ONLINE to prevent concurrency violation does not have I18N support [2848011]
The following message is seen when ProPCV prevents a process from coming ONLINE to prevent concurrency violation. The message is displayed in English and does not have I18N support.
Concurrency Violation detected by VCS AMF. Process <process-details> will be prevented from startup.
Workaround: No workaround.
System panics when getnotification requests access of groups cleaned by AMF [2848009]
While AMF handles an agent that has faulted due to external or internal activity, it cleans up the groups monitored by the agent. If, at the same time, agent notification is in progress and the getnotification thread requests access to an already deleted group, the system panics. Workaround: No workaround.
The libvxamf library encounters an error condition while doing a process table scan [2848007]
Sometimes, while doing a process table scan, the libvxamf library encounters an error condition. As a result, the process offline registration with AMF fails. In most cases, this registration succeeds when the agent tries again during the next monitor cycle for this resource. This is not a catastrophic failure as traditional monitoring continues for this resource. Workaround: No workaround.
Internationalization support is not available for the proactive prevention of the concurrency violation output [2848011]
Internationalization support is not available for the following ProPCV message:
Concurrency Violation detected by VCS AMF. Process <process-details> will be prevented from startup.
Workaround: No workaround.
AMF displays StartProgram name multiple times on the console without a VCS error code or logs [2872064]
When VCS AMF prevents a process from starting, it displays a message on the console and in syslog. The message contains the signature of the process that was prevented from starting. In some cases, this signature might not match the signature visible in the PS output. For example, the name of the shell script that was prevented from executing will be printed twice. Workaround: No workaround.
# kill -9 27824
Stopping the daemon gracefully stops all the child processes that the daemon spawned. However, using kill -9 <pid> to terminate the daemon is not a recommended way to stop it; if you do, you must kill the remaining child processes of the daemon manually.
Workaround: You must open port 14150 on all the cluster nodes.
Unable to log on to secure VCS clusters on Solaris 11 using Java GUI (2718955)
Connecting to secure clusters deployed on Solaris 11 systems using VCS Java GUI is not supported in VCS 6.0PR1. The system displays the following error when you attempt to use the Java GUI:
Incorrect username/password
Workaround: No workaround.
The default locale for Solaris 11 is en_US.UTF-8 and that of Solaris 10 is C. With a solaris10 brand zone, en_US.UTF-8 is not installed inside the zone by default. Therefore, the error message is logged. Workaround: This message can be safely ignored as there is no functionality issue. To avoid this message, install the en_US.UTF-8 locale on the solaris10 brand zone.
Software limitations
This section covers the software limitations of this release. See the corresponding Release Notes for a complete list of software limitations related to that component or product. See Documentation on page 77.
system may assign the PIDs listed in the PID files to other processes running on the node. Thus, if the Application agent monitors the resource using the PidFiles attribute only, the agent may discover the processes running and report a false concurrency violation. This could result in some processes being stopped that are not under VCS control.
Volumes in a disk group start automatically irrespective of the value of the StartVolumes attribute in VCS
Volumes in a disk group are started automatically when the disk group is imported, irrespective of the value of the StartVolumes attribute in VCS. This behavior is observed if the value of the system-level attribute autostartvolumes in Veritas Volume Manager is set to On. Workaround: If you do not want the volumes in a disk group to start automatically after the import of a disk group, set the autostartvolumes attribute to Off at the system level.
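For example, you can change the Veritas Volume Manager system default with the vxdefault command; verify the tunable name with vxdefault list on your system:

# vxdefault list
# vxdefault set autostartvolumes off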
LDom resource calls clean entry point when primary domain is gracefully shut down
The LDom agent sets the failure policy of the guest domain to stop when the primary domain stops. Thus, when the primary domain is shut down, the guest domain is stopped. Moreover, when the primary domain is shut down, the ldmd daemon is stopped abruptly and the LDom configuration cannot be read. These operations are not under VCS control, and VCS may call the clean entry point. Workaround: No workaround.
Interface object name must match net<x>/v4static for VCS network reconfiguration script in Solaris 11 guest domain [2840193]
If the Solaris 11 guest domain is configured for DR and its interface object name does not match the net<x>/v4static pattern, then the VCS guest network reconfiguration script (VRTSvcsnr) running inside the guest domain adds a new interface object and the existing entry remains as is.
Agent directory base name must be type name for an agent using out-of-the-box imf_init IMF entry point to get IMF support [2858160]
To get IMF support for an agent which uses the out-of-the-box imf_init IMF entry point, the base name of agent directory must be the type name. When AgentFile is set to one of the out-of-the-box agents like Script51Agent, that agent will not get IMF support.
Workaround:
Create the following symlink in the agent directory (for example, in the /opt/VRTSagents/ha/bin/WebSphereMQ6 directory).
# cd /opt/VRTSagents/ha/bin/<ResourceType>
# ln -s /opt/VRTSvcs/bin/Script51Agent <ResourceType>Agent
Run the following command to update the AgentFile attribute based on the value of VCS_HOME.
If VCS_HOME is /opt/VRTSvcs:
# hatype -modify <ResourceType> AgentFile /opt/VRTSvcs/bin/<ResourceType>/<ResourceType>Agent
If VCS_HOME is /opt/VRTSagents/ha:
# hatype -modify <ResourceType> AgentFile /opt/VRTSagents/ha/bin/<ResourceType>/<ResourceType>Agent
Sybase agent does not perform qrmutil based checks if Quorum_dev is not set (2724848)
If you do not set the Quorum_dev attribute for Sybase Cluster Edition, the Sybase agent does not perform the qrmutil-based checks. This configuration error may lead to undesirable results. For example, if qrmutil returns a failure pending state, the agent does not panic the system. Therefore, setting the Quorum_dev attribute is mandatory for Sybase Cluster Edition.
Engine hangs when you perform a global cluster upgrade from 5.0MP3 in mixed-stack environments [1820327]
If you try to upgrade a mixed stack VCS environment (where IPv4 and IPv6 are in use) from 5.0MP3 to 5.1SP1, HAD may hang. Workaround: When you perform an upgrade from 5.0MP3, make sure no IPv6 addresses are plumbed on the system.
Use VCS installer to install or upgrade VCS when the zone root is on VxFS shared storage [1215671]
You must use the VCS installer program to install or upgrade VCS when the zone root is on Veritas File System (VxFS).
The DiskGroupSnap agent does not support layered volumes [1368385]
If you use the Bronze configuration for the DiskGroupSnap resource, you could end up with inconsistent data at the secondary site in the following cases [1391445]:
After the fire drill service group is brought online, a disaster occurs at the primary site during the fire drill.
After the fire drill service group is taken offline, a disaster occurs at the primary while the disks at the secondary are resynchronizing.
Symantec recommends that you use the Gold configuration for the DiskGroupSnap resource.
Cluster Manager (Java Console) version 5.1 and lower cannot manage VCS 6.0 secure clusters
Cluster Manager (Java Console) from versions lower than VCS 5.1 cannot be used to manage VCS 6.0 secure clusters. Symantec recommends using the latest version of Cluster Manager. See the Veritas Cluster Server Installation Guide for instructions on upgrading Cluster Manager.
Cluster Manager does not work if the hosts file contains IPv6 entries
VCS Cluster Manager fails to connect to the VCS engine if the /etc/hosts file contains IPv6 entries. Workaround: Remove IPv6 entries from the /etc/hosts file.
Uninstalling VRTSvxvm causes issues when VxFEN is configured in SCSI3 mode with dmp disk policy (2522069)
When VxFEN is configured in SCSI3 mode with dmp disk policy, the DMP nodes for the coordinator disks can be accessed during system shutdown or fencing arbitration. After uninstalling VRTSvxvm package, the DMP module will no longer
be loaded in memory. On a system where VRTSvxvm package is uninstalled, if VxFEN attempts to access DMP devices during shutdown or fencing arbitration, the system panics.
Cluster address for global cluster requires resolved virtual IP. The virtual IP address must have a DNS entry if virtual IP is used for heartbeat agents.
Total number of clusters in a global cluster configuration cannot exceed four.
Cluster may not be declared as faulted when the Symm heartbeat agent is configured even when all hosts are down. The Symm agent is used to monitor the link between two Symmetrix arrays. When all the hosts are down in a cluster but the Symm agent is able to see the replication link between the local and remote storage, it reports the heartbeat as ALIVE. Due to this, the DR site does not declare the primary site as faulted.
Configuring Veritas Volume Replicator for Zone Disaster Recovery is not supported for zone root replication. Oracle Solaris 11 supports zone root only on the ZFS file system.
Configuring a cluster of mixed nodes, such as a cluster between systems running Solaris 10 and Solaris 11, is not supported in VCS 6.0.1. The configuration is not supported through manual or CPI configuration.
Documentation
Product guides are available in the PDF format on the software media in the /docs/product_name directory. Additional documentation is available online. Make sure that you are using the current version of documentation. The document version appears on page 2 of each guide. The publication date appears on the title page of each document. The latest product documentation is available on the Symantec website. http://sort.symantec.com/documents
Documentation set
Table 1-12 lists the documents for Veritas Cluster Server.
Veritas Cluster Server Installation Guide
Veritas Cluster Server Release Notes
Veritas Cluster Server Administrator's Guide
Veritas Cluster Server Bundled Agents Reference Guide - vcs_bundled_agents_601_sol.pdf
Veritas Cluster Server Agent Developer's Guide (This document is available online only.) - vcs_agent_dev_601_unix.pdf
Veritas Cluster Server Application Note: Dynamic Reconfiguration for Oracle Servers - vcs_dynamic_reconfig_601_sol.pdf
Veritas Cluster Server Agent for DB2 Installation and Configuration Guide - vcs_db2_agent_601_sol.pdf
Veritas Cluster Server Agent for Oracle Installation and Configuration Guide - vcs_oracle_agent_601_sol.pdf
Veritas Cluster Server Agent for Sybase Installation and Configuration Guide - vcs_sybase_agent_601_sol.pdf
Table 1-13 lists the documentation for Veritas Storage Foundation and High Availability Solutions products.
Veritas Storage Foundation and High Availability Solutions Solutions Guide - sfhas_solutions_601_sol.pdf
Veritas Storage Foundation and High Availability Solutions Virtualization Guide - sfhas_virtualization_601_sol.pdf
If you use Veritas Operations Manager (VOM) to manage Veritas Storage Foundation and High Availability products, refer to the VOM product documentation at: http://sort.symantec.com/documents
Note: The GNOME PDF Viewer is unable to view Symantec documentation. You must use Adobe Acrobat to view the documentation.
Manual pages
The manual pages for Veritas Storage Foundation and High Availability Solutions products are installed in the /opt/VRTS/man directory. Set the MANPATH environment variable so the man(1) command can point to the Veritas Storage Foundation manual pages:
For the Bourne or Korn shell (sh or ksh), enter the following commands:
MANPATH=$MANPATH:/opt/VRTS/man
export MANPATH
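For the C shell (csh or tcsh), a likely equivalent is the following; adjust it to your shell startup files:

setenv MANPATH ${MANPATH}:/opt/VRTS/man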