Sun Cluster Cheatsheet
http://www.datadisk.co.uk/html_docs/sun/sun_cluster_cs.htm
Daemons and processes

Both versions (3.1 and 3.2)
rpc.pmfd    Process monitoring facility (PMF) daemon; restarts registered processes (such as scdpmd) if they die.
pnmd        Public network management (PNM) daemon; keeps track of the node's IPMP state.
scdpmd      Multi-threaded disk path monitoring (DPM) daemon that runs on each node. It is started automatically by an rc script when a node boots, and it monitors the availability of the logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.
Version 3.2 only
qd_userd                  Serves as a proxy whenever any quorum device activity requires the execution of a command in userland, e.g. a NAS quorum device.
cl_execd
ifconfig_proxy_serverd
rtreg_proxy_serverd
cl_pnmd                   Daemon for the public network management (PNM) module. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.
scprivipd                 Provisions IP addresses on the clprivnet0 interface on behalf of zones.
sc_zonesd                 Monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone boot failures.
cznetd                    Used for reconfiguring and plumbing the private IP address in a local zone after a virtual cluster is created; see also the cznetd.xml file.
rpc.fed                   The "fork and exec" daemon, which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the node if this daemon is killed and not restarted within 30 seconds.
scqdmd                    The quorum server daemon; it possibly used to be called "scqsd".
pnm mod serverd
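As a quick sanity check, the daemons above should appear in a normal process listing on every cluster node; a minimal sketch using standard Solaris tools (the exact daemon set varies by release):

  ## confirm the core cluster daemons are running on this node
  ps -ef | egrep 'rpc.pmfd|pnmd|scdpmd' | grep -v grep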
File locations
Both versions (3.1 and 3.2)
  man pages                                    /usr/cluster/man
  log files                                    /var/cluster/logs, /var/adm/messages
  Configuration files (CCR, eventlog, etc.)    /etc/cluster/
  Cluster and other commands                   /usr/cluster/lib/sc

3.1
  sccheck logs
  Cluster infrastructure file

3.2
  sccheck logs
  Cluster infrastructure file
  Command log
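When troubleshooting, these locations are usually the first stop; a short sketch (the file names under /var/cluster/logs differ by installation):

  ## follow cluster messages as they are logged
  tail -f /var/adm/messages
  ## see which per-subsystem log files exist on this node
  ls /var/cluster/logs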
SCSI Reservations
Display the reservation keys on a disk:

  scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
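To run the same check against another quorum disk, look up its DID instance first (the disk name below is only an example):

  ## map a physical disk to its DID device
  scdidadm -L | grep c1t3d0
  ## list the SCSI-2 PGRE keys held on the matching DID device
  /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2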
Command shortcuts
In version 3.2 there are a number of shortcut command names, detailed below. The full command names are used in the rest of this document so that it is obvious what is being performed. All the commands are located in /usr/cluster/bin.
Full command            Shortcut
cldevice                cldev
cldevicegroup           cldg
clinterconnect          clintr
clnasdevice             clnas
clquorum                clq
clresource              clrs
clresourcegroup         clrg
clreslogicalhostname    clrslh
clresourcetype          clrt
clressharedaddress      clrssa
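The shortcuts are simply alternate names for the same binaries, so the two forms below give identical results (the resource group name is illustrative):

  ## long form
  clresourcegroup status apache-rg
  ## shortcut form
  clrg status apache-rg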
Shutting down the entire cluster
  3.1: scshutdown -g0 -y
  3.2: cluster shutdown -g0 -y
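In practice the cluster is usually checked before it is stopped; a sketch of the 3.2 sequence:

  ## confirm overall health first
  cluster status
  ## then shut down every node with no grace period, answering yes to the prompt
  cluster shutdown -g0 -y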
Cluster information
Cluster
  3.1: scstat -pv
  3.2: cluster list -v / cluster show / cluster status
Nodes
  3.1: scstat -n
  3.2: clnode list -v / clnode show / clnode status
Devices
  3.1: scstat -D
  3.2: cldevice list / cldevice show / cldevice status
Quorum
  3.1: scstat -q
  3.2: clquorum list -v / clquorum show / clquorum status
Transport
  3.1: scstat -W
  3.2: clinterconnect show / clinterconnect status
Resources / Resource Groups
  3.1: scstat -g
  3.2: clresource list -v / clresource show / clresource status
       clresourcegroup list -v / clresourcegroup show / clresourcegroup status
Resource Types
  3.2: clresourcetype list -v / clresourcetype list-props -v / clresourcetype show
IP Networking Multipathing
  3.1: scstat -i
  3.2: clnode status -m
Installation info (prints packages and version)
  3.1: scinstall -pv
  3.2: clnode show-rev -v
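Strung together, the 3.2 status commands above make a convenient one-screen health check that can be run from any node:

  ## quick 3.2 health check
  cluster status
  clnode status
  clquorum status
  cldevice status
  clresourcegroup status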
Cluster Configuration
Release
  3.2: cat /etc/cluster/release
Integrity check
  3.1: sccheck
  3.2: cluster check
Configure the cluster (add nodes, add data services, etc.)
  3.1: scinstall
  3.2: scinstall
Cluster configuration utility (quorum, data services, resource groups, etc.)
  3.1: scsetup
  3.2: clsetup
Rename
  3.2: cluster rename -c <cluster_name>
Set a property
  3.2: cluster set -p <name>=<value>
List
  3.2: ## List cluster commands
       cluster list-cmds
       ## Display the name of the cluster
       cluster list
       ## List the checks
       cluster list-checks
       ## Detailed configuration
       cluster show -t global
Status
  3.2: cluster status
Reset the cluster private network settings
  3.2: cluster restore-netprops <cluster_name>
Place the cluster into install mode
  3.2: cluster set -p installmode=enabled
Add a node
  3.1: scconf -a -T node=<host>
  3.2: clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2 -e endpoint3,endpoint4
Remove a node
  3.1: scconf -r -T node=<host>
  3.2: clnode remove
Prevent new nodes from entering
  3.1: scconf -a -T node=.

Node Configuration

Put a node into maintenance state
  3.1: scconf -c -q node=<node>,maintstate
       Note: use the scstat -q command to verify that the node is in maintenance state; the vote count should be zero for that node.
Get a node out of maintenance state
  3.1: scconf -c -q node=<node>,reset
       Note: use the scstat -q command to verify that the node is no longer in maintenance state; the vote count should be one for that node.
Add a node to the cluster
  3.2: clnode add [-c <cluster>] [-n <sponsornode>] -e <endpoint> -e <endpoint> <node>
Remove a node from the cluster
  3.2: ## Make sure you are on the node you wish to remove
       clnode remove
Evacuate a node from the cluster
  3.1: scswitch -S -h <node>
  3.2: clnode evacuate <node>
Cleanup the cluster configuration (used after removing nodes)
  3.2: clnode clear <node>
List nodes
  3.2: ## Detailed list
       clnode show [+|<node>]
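As an illustration (the hostname is made up), a node is normally emptied of services before it is removed; in 3.2 that is:

  ## move all resource groups and device groups off node2
  clnode evacuate node2
  ## then, logged on to node2 itself, remove it from the cluster
  clnode remove
  ## finally clean up the remaining cluster configuration from another node
  clnode clear node2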
Admin Quorum Device
Quorum devices are nodes and disk devices, so the total quorum is all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) utility to add or remove quorum devices, or use the commands below.
Obtain the device number
  3.1: scdidadm -L
Adding a device to the quorum
  3.1: scconf -a -q globaldev=d11
       Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
Adding a NAS device to the quorum
  3.1: n/a
  3.2: clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnumber> <nasdevice>
Adding a Quorum Server
  3.1: n/a
  3.2: clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <quorumserver>
Removing a device from the quorum
  3.1: scconf -r -q globaldev=d11
  3.2: clquorum remove [-t <type>] [+|<devicename>]
Removing the last quorum device
  3.1: ## Evacuate all nodes
       ## Put cluster into maintenance mode
       scconf -c -q installmode
       ## Remove the quorum device
       scconf -r -q globaldev=d11
       ## Check the quorum devices
       scstat -q
  3.2: ## Place the cluster in install mode
       cluster set -p installmode=enabled
       ## Remove the quorum device
       clquorum remove <device>
       ## Verify the device has been removed
       clquorum list -v
List
  3.2: ## Detailed list
       clquorum show [-t <type>] [-n <node>] [+|<devicename>]
       ## Status
       clquorum status [-t <type>] [-n <node>] [+|<devicename>]
Resetting the quorum
  3.1: scconf -c -q reset
  3.2: clquorum reset
Bring a quorum device into maintenance mode (known as "disabled" in 3.2)
  3.1: scconf -c -q globaldev=<device>,maintstate
Bring a quorum device out of maintenance mode (known as "enabled" in 3.2)
  3.1: scconf -c -q globaldev=<device>,reset
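A worked 3.1 example following the rows above (device d11 is illustrative): find the DID device, add it as a quorum device and confirm the vote count.

  ## find the DID instance of the shared disk
  scdidadm -L
  ## add it as a quorum device
  scconf -a -q globaldev=d11
  ## check the quorum votes
  scstat -q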
Device Configuration
Check device
  3.2: cldevice check [-n <node>] [+]
Remove all devices from node
  3.2: cldevice clear [-n <node>]
Monitoring
  3.2: ## Turn on monitoring
       cldevice monitor [-n <node>] [+|<device>]
       ## Turn off monitoring
       cldevice unmonitor [-n <node>] [+|<device>]
Rename a device
  3.2: cldevice rename -d <destination_device_name>
Replicate devices
  3.2: cldevice replicate [-S <source-node>] -D <destination-node> [+]
Set device properties (fencing)
  3.2: cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device>
Status
  3.2: ## Standard display
       cldevice status [-s <state>] [-n <node>] [+|<device>]
       ## Display failed disk paths
       cldevice status -s fail
Lists all the configured devices, including paths, across all nodes
  3.1: scdidadm -L
  3.2: ## Standard list
       cldevice list [-n <node>] [+|<device>]
       ## Detailed list
       cldevice show [-n <node>] [+|<device>]
List all the configured devices, including paths, on the local node only
  3.1: scdidadm -l
  3.2: see above
Reconfigure the device database, creating new instance numbers if required
  3.1: scdidadm -r
  3.2: cldevice populate
       cldevice refresh [-n <node>] [+]
Perform the repair procedure for a particular path (use when a disk has been replaced)
  3.1: scdidadm -R c0t0d0s0   (by device)
       scdidadm -R 2          (by device id)
  3.2: cldevice repair [-n <node>] [+|<device>]
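For example, after swapping a failed disk you would typically re-run the repair procedure against its path (device names below are placeholders):

  ## list any failed disk paths (3.2)
  cldevice status -s fail
  ## repair the replaced device (3.2)
  cldevice repair d2
  ## 3.1 equivalent, by physical device or by device id
  scdidadm -R c0t0d0s0
  scdidadm -R 2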
Disk groups
Create a device group
  3.1: n/a
  3.2: cldevicegroup create -t vxvm -n <node-list> -p failback=<true|false> <devgrp>
Remove a device group
  3.1: n/a
  3.2: cldevicegroup delete <devgrp>
Adding
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
  3.2: cldevicegroup add-device -d <device> <devgrp>
Removing
  3.1: scconf -r -D name=<disk group>
  3.2: cldevicegroup remove-device -d <device> <devgrp>
Set a property
  3.2: cldevicegroup set [-p <name>=<value>] [+|<devgrp>]
List
  3.1: scstat
  3.2: ## Detailed configuration report
       cldevicegroup show [-n <node>] [-t <type>] [+|<devgrp>]
Status
  3.1: scstat
  3.2: cldevicegroup status [-n <node>] [-t <type>] [+|<devgrp>]
Adding a single node
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
  3.2: cldevicegroup add-node [-n <node>] [-t <type>] [+|<devgrp>]
Removing a single node
  3.1: scconf -r -D name=<disk group>,nodelist=<host>
  3.2: cldevicegroup remove-node [-n <node>] [-t <type>] [+|<devgrp>]
Switch
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup switch -n <nodename> <devgrp>
Put into maintenance mode
  3.1: scswitch -m -D <disk group>
  3.2: n/a
Take out of maintenance mode
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: n/a
Onlining a disk group
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup online <devgrp>
Offlining a disk group
  3.1: scswitch -F -D <disk group>
  3.2: cldevicegroup offline <devgrp>
Resync a disk group
  3.1: scconf -c -D name=appdg,sync
  3.2: cldevicegroup sync [-t <type>] [+|<devgrp>]
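For instance, moving a VxVM device group between nodes in 3.2 (appdg and node2 are illustrative names):

  ## switch the device group to node2
  cldevicegroup switch -n node2 appdg
  ## confirm where it is now online
  cldevicegroup status appdg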
Transport Cable
Add
  3.2: clinterconnect add <endpoint>,<endpoint>
Remove
  3.2: clinterconnect remove <endpoint>,<endpoint>
       Note: it gets deleted
Enable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
  3.2: clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]
Disable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
  3.2: clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]
List
  3.1: scstat
  3.2: ## Standard and detailed list
       clinterconnect show [-n <node>] [+|<endpoint>,<endpoint>]
Status
  3.1: scstat
  3.2: clinterconnect status [-n <node>] [+|<endpoint>,<endpoint>]
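A sketch of taking one private interconnect cable out of service and bringing it back in 3.2; the node:adapter endpoints below assume a back-to-back cable on qfe1, following the qfe1 example above:

  ## check the current state of all paths
  clinterconnect status
  ## disable one cable
  clinterconnect disable node1:qfe1,node2:qfe1
  ## re-enable it once the cable or switch work is done
  clinterconnect enable node1:qfe1,node2:qfe1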
Resource Groups
Adding (failover)
  3.1: scrgadm -a -g <res_group> -h <host>,<host>
  3.2: clresourcegroup create <res_group>
Adding (scalable)
  3.2: clresourcegroup create -S <res_group>
Adding a node to a resource group
  3.2: clresourcegroup add-node -n <node> <res_group>
Removing
  3.1: scrgadm -r -g <group>
  3.2: ## Remove a resource group
       clresourcegroup delete <res_group>
       ## Remove a resource group and all its resources
       clresourcegroup delete -F <res_group>
Removing a node from a resource group
  3.2: clresourcegroup remove-node -n <node> <res_group>
Changing properties
  3.1: scrgadm -c -g <resource group> -y <property=value>
  3.2: clresourcegroup set -p <name>=<value> +   (e.g. -p Failback=true)
Status
  3.1: scstat -g
  3.2: clresourcegroup status [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Listing
  3.1: scstat -g
  3.2: clresourcegroup list [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Detailed List
  3.1: scrgadm -pv -g <res_group>
  3.2: clresourcegroup show [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Display mode type (failover or scalable)
  3.1: scrgadm -pv -g <res_group> | grep 'Res Group mode'
Offlining
  3.1: scswitch -F -g <res_group>
  3.2: ## Individual group
       clresourcegroup offline [-n <node>] <res_group>
Onlining
  3.1: scswitch -z -g <res_group> -h <host>
  3.2: clresourcegroup online [-n <node>] <res_group>
Evacuate all resource groups from a node (used when shutting down a node)
  3.1: scswitch -S -h <node>
  3.2: clresourcegroup evacuate [+|-n <node>]
Unmanaging
  3.1: scswitch -u -g <res_group>
  3.2: clresourcegroup unmanage <res_group>
  Note: all resources in the group must be disabled first
Managing
  3.2: clresourcegroup manage <res_group>
Switching
  3.2: clresourcegroup switch -n <node> <res_group>
Suspend
  3.2: clresourcegroup suspend [+|<res_group>]
Resume
  3.2: clresourcegroup resume [+|<res_group>]
Remaster (move the resource group(s) to their preferred node)
  3.2: clresourcegroup remaster [+|<res_group>]
Restart a resource group (bring offline then online)
  3.2: clresourcegroup restart [-n <node>] [+|<res_group>]
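Putting the 3.2 commands together, a minimal failover group lifecycle might look like this (group and node names are invented for the example):

  ## create a failover resource group
  clresourcegroup create app-rg
  ## bring it under RGM control and online
  clresourcegroup manage app-rg
  clresourcegroup online app-rg
  ## later, move it to the other node and check the result
  clresourcegroup switch -n node2 app-rg
  clresourcegroup status app-rg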
Resources
Adding a failover network resource
  3.1: scrgadm -a -L -g <res_group> -l <logicalhost>
  3.2: clreslogicalhostname create -g <res_group> <lh-resource>
Adding a shared network resource
  3.1: scrgadm -a -S -g <res_group> -l <logicalhost>
  3.2: clressharedaddress create -g <res_group> <sa-resource>
Adding a failover apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=False -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin
Adding a shared apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=True -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin
Adding a HAStoragePlus failover filesystem resource
  3.1: scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
         -x FileSystemMountPoints=/oracle/data01 \
         -x Affinityon=true
  3.2: clresource create -t HAStoragePlus -g <res_group> \
         -p FilesystemMountPoints=<mount-point-list> \
         -p Affinityon=true <rs-hasp>
Removing
  3.1: scrgadm -r -j res-ip
  3.2: clresource delete <resource>
  Note: the resource must be disabled first
Changing properties
  3.1: scrgadm -c -j <resource> -y <property=value>
  3.2: ## Adding a value to a property
       clresource set -p <name>+=<value> <resource>
List
  3.1: scstat -g
  3.2: clresource list [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       ## List properties
       clresource list-props [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Detailed List
  3.1: scrgadm -pv -j res-ip
       scrgadm -pvv -j res-ip
  3.2: clresource show [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Status
  3.1: scstat -g
  3.2: clresource status [-s <state>] [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Disable resource monitor
  3.1: scswitch -n -M -j res-ip
  3.2: clresource unmonitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Enable resource monitor
  3.1: scswitch -e -M -j res-ip
  3.2: clresource monitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Disabling
  3.1: scswitch -n -j res-ip
  3.2: clresource disable <resource>
Enabling
  3.1: scswitch -e -j res-ip
  3.2: clresource enable <resource>
Clearing a failed resource
  3.1: scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
  3.2: clresource clear -f STOP_FAILED <resource>
Find the network of a resource
  3.1: scrgadm -pvv -j <resource> | grep -i network
Removing a resource and resource group
  3.1: ## offline the group
       scswitch -F -g rgroup-1
       ## remove the resource
       scrgadm -r -j res-ip
       ## remove the resource group
       scrgadm -r -g rgroup-1
  3.2: ## offline the group
       clresourcegroup offline <res_group>
       ## remove the resource
       clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       ## remove the resource group
       clresourcegroup delete <res_group>
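A common recovery sequence, shown here as a sketch with a made-up resource name, is clearing a resource stuck in STOP_FAILED using the 3.2 rows above:

  ## disable the faulted resource
  clresource disable app-res
  ## clear the STOP_FAILED flag
  clresource clear -f STOP_FAILED app-res
  ## enable it again and watch its state
  clresource enable app-res
  clresource status app-res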
Resource Types
Adding (register in 3.2)
  3.1: scrgadm -a -t <resource type>
  3.2: clresourcetype register <type>
Register a resource type to a node
  3.1: n/a
  3.2: clresourcetype add-node -n <node> <type>
Deleting (remove in 3.2)
  3.1: scrgadm -r -t <resource type>
  3.2: clresourcetype unregister <type>
Deregistering a resource type from a node
  3.1: n/a
  3.2: clresourcetype remove-node -n <node> <type>
Listing
  3.1: scrgadm -pv | grep 'Res Type name'   (e.g. SUNW.HAStoragePlus)
  3.2: clresourcetype list [<type>]
Listing resource type properties
  3.2: clresourcetype list-props [<type>]
Show resource types
  3.2: clresourcetype show [<type>]
Set properties of a resource type
  3.2: clresourcetype set [-p <name>=<value>] <type>
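For example, registering the HAStoragePlus type before using it in the resource examples earlier (3.2 syntax; the type name comes from the HAStoragePlus example above):

  ## register the resource type cluster-wide
  clresourcetype register SUNW.HAStoragePlus
  ## confirm it is now available
  clresourcetype list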