Sun Cluster Cheat Sheet



This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. There is some missing information (zones, NAS devices, etc.) which I hope to complete over time. Both versions of Sun Cluster also have a text-based menu tool, so don't be afraid to use it, especially if the task is a simple one: scsetup (3.1) and clsetup (3.2). All of the commands in version 3.1 are still available in version 3.2.

Daemons and Processes

At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what these processes do. I have managed to obtain information on most of them but am still looking for the others.
Versions 3.1 and 3.2

clexecd - Used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad - Provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.

cl_eventd - The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd - The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd - The failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

rgmd - The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed - The fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd - The process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.

pnmd - The public network management service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd - The multi-threaded disk path monitoring (DPM) daemon runs on each node and is started by an rc script when a node boots. It monitors the availability of logical paths that are visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.), so that disk path status can be reported in the output of the cldev status command. It is automatically restarted by rpc.pmfd if it dies.

Version 3.2 only

qd_userd - Serves as a proxy whenever quorum device activity requires execution of a command in userland (for example, for a NAS quorum device).

cl_execd - no description yet.

ifconfig_proxy_serverd - no description yet.

rtreg_proxy_serverd - no description yet.

cl_pnmd - The public network management (PNM) daemon. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.

scprivipd - Provisions IP addresses on the clprivnet0 interface on behalf of zones.

sc_zonesd - Monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone booting failures.

cznetd - Used for reconfiguring and plumbing the private IP address in a local zone after a virtual cluster is created (also see the cznetd.xml file).

rpc.fed - The "fork and exec" daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

scqdmd - The quorum server daemon (possibly previously called "scqsd").

pnm_mod_serverd - no description yet.
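If you just want a rough sanity check that the framework daemons came up after an install or reboot, something like the sketch below works (daemon names taken from the lists above; drop the 3.2-only ones on a 3.1 cluster):

## Quick check that the core cluster daemons are running
for d in clexecd cl_ccrad cl_eventd cl_eventlogd failfastd rgmd rpc.fed rpc.pmfd scdpmd
do
   pgrep -x $d > /dev/null && echo "$d is running" || echo "$d is NOT running"
done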

File locations
Both Versions (3.1 and 3.2)
  man pages                                 /usr/cluster/man
  log files                                 /var/cluster/logs, /var/adm/messages
  Configuration files (CCR, eventlog, etc)  /etc/cluster/
  Cluster and other commands                /usr/cluster/lib/sc

Version 3.1 Only
  sccheck logs                              /var/cluster/sccheck/report.<date>
  Cluster infrastructure file               /etc/cluster/ccr/infrastructure

Version 3.2 Only
  sccheck logs                              /var/cluster/logs/cluster_check/remote.<date>
  Cluster infrastructure file               /etc/cluster/ccr/global/infrastructure
  Command log                               /var/cluster/logs/commandlog
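As a small worked example, the 3.2 command log is a plain text file, so recent cluster commands and any cluster-related errors can be reviewed with standard tools (paths as listed above):

## Show the last few cluster commands that were run (3.2 only)
tail -20 /var/cluster/logs/commandlog

## Check the system messages for recent cluster entries
grep -i cluster /var/adm/messages | tail -20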

SCSI Reservations
Display reservation keys
  scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
  scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner
  scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
  scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
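A short worked example, assuming d4 is one of your shared/quorum disks (use scdidadm -L or cldevice list first to confirm which DID device you care about):

## Map the DID device to its physical paths
scdidadm -L | grep d4

## List the SCSI-3 registration keys, then the current reservation owner
/usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
/usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2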


Command shortcuts

In version 3.2 there are a number of shortcut command names, detailed below. I have left the full command names in the rest of the document so it is obvious what we are performing. All of the commands are located in /usr/cluster/bin.

  cldevice              cldev
  cldevicegroup         cldg
  clinterconnect        clintr
  clnasdevice           clnas
  clquorum              clq
  clresource            clrs
  clresourcegroup       clrg
  clreslogicalhostname  clrslh
  clresourcetype        clrt
  clressharedaddress    clrssa
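The shortcuts are simply alternative names for the same commands, so for example these two produce identical output:

clresourcegroup status
clrg status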

Shutting down and Booting a Cluster

shutdown entire cluster
  3.1: ## other nodes in cluster
       scswitch -S -h <host>
       shutdown -i5 -g0 -y
       ## Last remaining node
       scshutdown -g0 -y
  3.2: cluster shutdown -g0 -y

shutdown single node
  3.1: scswitch -S -h <host>
       shutdown -i5 -g0 -y
  3.2: clnode evacuate <node>
       shutdown -i5 -g0 -y

reboot a node into non-cluster mode
  3.1: ok> boot -x
  3.2: ok> boot -x
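As a usage sketch, taking a single 3.2 node out of the cluster for maintenance and bringing it back might look like this (the node name is a placeholder):

## Move all resource groups and device groups off the node
clnode evacuate <node>

## Shut the node down
shutdown -i5 -g0 -y

## From the OBP, boot into non-cluster mode for the maintenance work
ok> boot -x

## When finished, a normal boot rejoins the cluster
ok> boot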

Cluster information

Cluster
  3.1: scstat -pv
  3.2: cluster list -v
       cluster show
       cluster status

Nodes
  3.1: scstat -n
  3.2: clnode list -v
       clnode show
       clnode status

Devices
  3.1: scstat -D
  3.2: cldevice list
       cldevice show
       cldevice status

Quorum
  3.1: scstat -q
  3.2: clquorum list -v
       clquorum show
       clquorum status

Transport info
  3.1: scstat -W
  3.2: clinterconnect show
       clinterconnect status

Resources
  3.1: scstat -g
  3.2: clresource list -v
       clresource show
       clresource status

Resource Groups
  3.1: scstat -g
       scrgadm -pv
  3.2: clresourcegroup list -v
       clresourcegroup show
       clresourcegroup status

Resource Types
  3.2: clresourcetype list -v
       clresourcetype list-props -v
       clresourcetype show

IP Networking Multipathing
  3.1: scstat -i
  3.2: clnode status -m

Installation info (prints packages and version)
  3.1: scinstall -pv
  3.2: clnode show-rev -v
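Putting a few of these together, a quick 3.2 health check could look like the sketch below (all read-only commands, safe to run at any time):

## Overall cluster, node, quorum, device and resource health (3.2)
cluster status
clnode status
clquorum status
cldevice status
clresourcegroup status
clresource status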

Cluster Configuration

Release
  3.2: cat /etc/cluster/release

Integrity check
  3.1: sccheck
  3.2: cluster check

Configure the cluster (add nodes, add data services, etc)
  3.1: scinstall
  3.2: scinstall

Cluster configuration utility (quorum, data services, resource groups, etc)
  3.1: scsetup
  3.2: clsetup

Rename
  3.2: cluster rename -c <cluster_name>

Set a property
  3.2: cluster set -p <name>=<value>

List
  3.2: ## List cluster commands
       cluster list-cmds
       ## Display the name of the cluster
       cluster list
       ## List the checks
       cluster list-checks
       ## Detailed configuration
       cluster show -t global

Status
  3.2: cluster status

Reset the cluster private network settings
  3.2: cluster restore-netprops <cluster_name>

Place the cluster into install mode
  3.2: cluster set -p installmode=enabled

Add a node
  3.1: scconf -a -T node=<host>
  3.2: clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2 -e endpoint3,endpoint4

Remove a node
  3.1: scconf -r -T node=<host>
  3.2: clnode remove

Prevent new nodes from entering
  3.1: scconf -a -T node=.

Put a node into maintenance state
  3.1: scconf -c -q node=<node>,maintstate
       Note: use the scstat -q command to verify that the node is in maintenance mode, the vote count should be zero for that node.

Get a node out of maintenance state
  3.1: scconf -c -q node=<node>,reset
       Note: use the scstat -q command to verify that the node is no longer in maintenance mode, the vote count should be one for that node.
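As a short 3.1 worked example, placing a node in maintenance state before extended downtime and verifying the vote counts (node2 is just an illustration, and the node must already be shut down or out of the cluster):

## From an active cluster node, put node2 into maintenance state
scconf -c -q node=node2,maintstate

## Verify - the vote count for node2 should now be zero
scstat -q

## Later, bring node2 back out of maintenance state and verify the vote count is one
scconf -c -q node=node2,reset
scstat -q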

Node Configuration


Add a node to the cluster
  3.2: clnode add [-c <cluster>] [-n <sponsornode>] -e <endpoint> -e <endpoint> <node>

Remove a node from the cluster
  3.2: ## Make sure you are on the node you wish to remove
       clnode remove

Evacuate a node from the cluster
  3.1: scswitch -S -h <node>
  3.2: clnode evacuate <node>

Cleanup the cluster configuration (used after removing nodes)
  3.2: clnode clear <node>

List nodes
  3.2: ## Standard list
       clnode list [+|<node>]
       ## Detailed list
       clnode show [+|<node>]

Change a node's property
  3.2: clnode set -p <name>=<value> [+|<node>]

Status of nodes
  3.2: clnode status [+|<node>]
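A sketch of permanently removing a node from a 3.2 cluster, based on the commands above (node3 is a placeholder; check your quorum configuration before removing nodes):

## Move resource groups and device groups off the node being removed
clnode evacuate node3

## From node3 itself, remove it from the cluster configuration
clnode remove

## From one of the remaining nodes, clean up any leftover configuration
clnode clear node3

## Verify
clnode list -v
clnode status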

Admin Quorum Device

Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add or remove quorum devices, or use the commands below.

Adding a SCSI device to the quorum
  3.1: scconf -a -q globaldev=d11
       Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
  3.2: clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>]

Adding a NAS device to the quorum
  3.1: n/a
  3.2: clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnum> <devicename>

Adding a Quorum Server
  3.1: n/a
  3.2: clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <devicename>

Removing a device from the quorum
  3.1: scconf -r -q globaldev=d11
  3.2: clquorum remove [-t <type>] [+|<devicename>]

Remove the last quorum device
  3.1: ## Evacuate all nodes
       ## Put the cluster into maint (install) mode
       scconf -c -q installmode
       ## Remove the quorum device
       scconf -r -q globaldev=d11
       ## Check the quorum devices
       scstat -q
  3.2: ## Place the cluster in install mode
       cluster set -p installmode=enabled
       ## Remove the quorum device
       clquorum remove <device>
       ## Verify the device has been removed
       clquorum list -v

List
  3.2: ## Standard list
       clquorum list -v [-t <type>] [-n <node>] [+|<devicename>]
       ## Detailed list
       clquorum show [-t <type>] [-n <node>] [+|<devicename>]
       ## Status
       clquorum status [-t <type>] [-n <node>] [+|<devicename>]

Resetting quorum info
  3.1: scconf -c -q reset
  3.2: clquorum reset
  Note: this will bring all offline quorum devices online

Bring a quorum device into maintenance mode (known as disabled in 3.2)
  3.1: ## Obtain the device number
       scdidadm -L
       scconf -c -q globaldev=<device>,maintstate
  3.2: clquorum disable [-t <type>] [+|<devicename>]

Bring a quorum device out of maintenance mode (known as enabled in 3.2)
  3.1: scconf -c -q globaldev=<device>,reset
  3.2: clquorum enable [-t <type>] [+|<devicename>]
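A hedged end-to-end sketch of adding a shared disk as a quorum device on 3.2 and checking the vote counts (the DID device d11 is just an example taken from above):

## Find a suitable shared DID device
cldevice list -v

## Add it as a quorum device (shared disk)
clquorum add d11

## Confirm the device and the vote counts
clquorum list -v
clquorum status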

Device Configuration

Check device
  3.2: cldevice check [-n <node>] [+]

Remove all devices from node
  3.2: cldevice clear [-n <node>]

Monitoring
  3.2: ## Turn on monitoring
       cldevice monitor [-n <node>] [+|<device>]
       ## Turn off monitoring
       cldevice unmonitor [-n <node>] [+|<device>]

Rename
  3.2: cldevice rename -d <destination_device_name>

Replicate
  3.2: cldevice replicate [-S <source-node>] -D <destination-node> [+]

Set properties of a device
  3.2: cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device>

Status
  3.2: ## Standard display
       cldevice status [-s <state>] [-n <node>] [+|<device>]
       ## Display failed disk paths
       cldevice status -s fail

List
  3.2: ## Standard list
       cldevice list [-n <node>] [+|<device>]
       ## Detailed list
       cldevice show [-n <node>] [+|<device>]

Lists all the configured devices including paths across all nodes
  3.1: scdidadm -L
  3.2: see above

List all the configured devices including paths on the local node only
  3.1: scdidadm -l

Reconfigure the device database, creating new instance numbers if required
  3.1: scdidadm -r
  3.2: cldevice populate
       cldevice refresh [-n <node>] [+]

Perform the repair procedure for a particular path (use when a disk gets replaced)
  3.1: scdidadm -R <c0t0d0s0> - device
       scdidadm -R 2 - device id
  3.2: cldevice repair [-n <node>] [+|<device>]
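A sketch of the repair step after physically replacing a failed shared disk (the DID instance 2/d2 and path c0t0d0 are placeholders; see the repair commands above):

## Identify the DID instance of the replaced disk
scdidadm -L | grep c0t0d0

## 3.1 - update the DID instance with the new disk's identity
scdidadm -R 2

## 3.2 equivalent
cldevice repair d2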

Disk groups

Create a device group
  3.1: n/a
  3.2: cldevicegroup create -t vxvm -n <node-list> -p failback=<true|false> <devgrp>

Remove a device group
  3.1: n/a
  3.2: cldevicegroup delete <devgrp>

Adding
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
  3.2: cldevicegroup add-device -d <device> <devgrp>

Removing
  3.1: scconf -r -D name=<disk group>
  3.2: cldevicegroup remove-device -d <device> <devgrp>

Set a property
  3.2: cldevicegroup set [-p <name>=<value>] [+|<devgrp>]


List
  3.1: scstat -D
  3.2: ## Standard list
       cldevicegroup list [-n <node>] [-t <type>] [+|<devgrp>]
       ## Detailed configuration report
       cldevicegroup show [-n <node>] [-t <type>] [+|<devgrp>]

Status
  3.1: scstat -D
  3.2: cldevicegroup status [-n <node>] [-t <type>] [+|<devgrp>]

Adding a single node
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
  3.2: cldevicegroup add-node [-n <node>] [-t <type>] [+|<devgrp>]

Removing a single node
  3.1: scconf -r -D name=<disk group>,nodelist=<host>
  3.2: cldevicegroup remove-node [-n <node>] [-t <type>] [+|<devgrp>]

Switch
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup switch -n <nodename> <devgrp>

Put into maintenance mode
  3.1: scswitch -m -D <disk group>
  3.2: n/a

Take out of maintenance mode
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: n/a

Onlining a disk group
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup online <devgrp>

Offlining a disk group
  3.1: scswitch -F -D <disk group>
  3.2: cldevicegroup offline <devgrp>

Resync a disk group
  3.1: scconf -c -D name=appdg,sync
  3.2: cldevicegroup sync [-t <type>] [+|<devgrp>]
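For example, moving a VxVM device group to its other node and confirming where it is now primary (the group name appdg and node name node2 are taken from the examples above / placeholders):

## 3.1
scswitch -z -D appdg -h node2
scstat -D

## 3.2
cldevicegroup switch -n node2 appdg
cldevicegroup status appdg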

Transport Cable

Add
  3.2: clinterconnect add <endpoint>,<endpoint>

Remove
  3.1: Note: it gets deleted
  3.2: clinterconnect remove <endpoint>,<endpoint>

Enable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
  3.2: clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]

Disable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
  3.2: clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]

List
  3.1: scstat -W
  3.2: ## Standard and detailed list
       clinterconnect show [-n <node>] [+|<endpoint>,<endpoint>]

Status
  3.1: scstat -W
  3.2: clinterconnect status [-n <node>] [+|<endpoint>,<endpoint>]
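A quick sketch of checking the private interconnect and temporarily disabling one cable for cabling work (the endpoint names node1:qfe1 and switch1 are placeholders for your own adapters and switches):

## Check the current state of all transport paths
clinterconnect status

## Disable one cable (traffic fails over to the remaining path)
clinterconnect disable node1:qfe1,switch1

## Re-enable it afterwards and verify
clinterconnect enable node1:qfe1,switch1
clinterconnect status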

Resource Groups

Adding (failover)
  3.1: scrgadm -a -g <res_group> -h <host>,<host>
  3.2: clresourcegroup create <res_group>

Adding (scalable)
  3.2: clresourcegroup create -S <res_group>

Adding a node to a resource group
  3.2: clresourcegroup add-node -n <node> <res_group>

Removing
  3.1: scrgadm -r -g <group>
  3.2: ## Remove a resource group
       clresourcegroup delete <res_group>
       ## Remove a resource group and all its resources
       clresourcegroup delete -F <res_group>

Removing a node from a resource group
  3.2: clresourcegroup remove-node -n <node> <res_group>

Changing properties
  3.1: scrgadm -c -g <resource group> -y <property=value>
  3.2: clresourcegroup set -p <name>=<value> [+|<res_group>]
       (e.g. clresourcegroup set -p Failback=true +)

Status
  3.1: scstat -g
  3.2: clresourcegroup status [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Listing
  3.1: scstat -g
  3.2: clresourcegroup list [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Detailed List
  3.1: scrgadm -pv -g <res_group>
  3.2: clresourcegroup show [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Display mode type (failover or scalable)
  3.1: scrgadm -pv -g <res_group> | grep 'Res Group mode'

Offlining
  3.1: scswitch -F -g <res_group>
  3.2: ## All resource groups
       clresourcegroup offline +
       ## Individual group
       clresourcegroup offline [-n <node>] <res_group>

Onlining
  3.1: scswitch -Z -g <res_group>
  3.2: ## All resource groups
       clresourcegroup online +
       ## Individual groups
       clresourcegroup online [-n <node>] <res_group>

Evacuate all resource groups from a node (used when shutting down a node)
  3.2: clresourcegroup evacuate [+|-n <node>]

Unmanaging
  3.1: scswitch -u -g <res_group>
  3.2: clresourcegroup unmanage <res_group>
  Note: all resources in the group must be disabled

Managing
  3.1: scswitch -o -g <res_group>
  3.2: clresourcegroup manage <res_group>

Switching
  3.1: scswitch -z -g <res_group> -h <host>
  3.2: clresourcegroup switch -n <node> <res_group>

Suspend
  3.1: n/a
  3.2: clresourcegroup suspend [+|<res_group>]

Resume
  3.1: n/a
  3.2: clresourcegroup resume [+|<res_group>]

Remaster (move the resource group(s) to their preferred node)
  3.1: n/a
  3.2: clresourcegroup remaster [+|<res_group>]

Restart a resource group (bring offline then online)
  3.1: n/a
  3.2: clresourcegroup restart [-n <node>] [+|<res_group>]
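Putting the pieces together, a hedged 3.2 sketch of creating a simple failover resource group, bringing it online, then switching it to the other node (the names rg_app and node2 are placeholders; by default the group can typically run on any cluster node unless you restrict the node list):

## Create a failover resource group
clresourcegroup create rg_app

## Bring it under RGM control and online on its primary node
clresourcegroup manage rg_app
clresourcegroup online rg_app

## Check where it is running
clresourcegroup status rg_app

## Switch it to the other node and check again
clresourcegroup switch -n node2 rg_app
clresourcegroup status rg_app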

Resources

Adding failover network resource
  3.1: scrgadm -a -L -g <res_group> -l <logicalhost>
  3.2: clreslogicalhostname create -g <res_group> <lh-resource>

Adding shared network resource
  3.1: scrgadm -a -S -g <res_group> -l <logicalhost>
  3.2: clressharedaddress create -g <res_group> <sa-resource>

Adding a failover apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=False -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=True -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource
  3.1: scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
         -x FileSystemMountPoints=/oracle/data01 \
         -x AffinityOn=true
  3.2: clresource create -t SUNW.HAStoragePlus -g <res_group> \
         -p FileSystemMountPoints=<mount-point-list> \
         -p AffinityOn=true <rs-hasp>

Removing
  3.1: scrgadm -r -j res-ip
  3.2: clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]
  Note: must disable the resource first


Changing or adding properties
  3.1: scrgadm -c -j <resource> -y <property=value>
  3.2: ## Changing
       clresource set -t <type> -p <name>=<value> +
       ## Adding
       clresource set -p <name>+=<value> <resource>

List
  3.1: scstat -g
  3.2: clresource list [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Detailed List
  3.1: scrgadm -pv -j res-ip
       scrgadm -pvv -j res-ip
  3.2: ## List properties
       clresource list-props [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       clresource show [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Status
  3.1: scstat -g
  3.2: clresource status [-s <state>] [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Disable resource monitor
  3.1: scrgadm -n -M -j res-ip
  3.2: clresource unmonitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Enable resource monitor
  3.1: scrgadm -e -M -j res-ip
  3.2: clresource monitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Disabling
  3.1: scswitch -n -j res-ip
  3.2: clresource disable <resource>

Enabling
  3.1: scswitch -e -j res-ip
  3.2: clresource enable <resource>

Clearing a failed resource
  3.1: scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
  3.2: clresource clear -f STOP_FAILED <resource>

Find the network of a resource
  3.1: scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group
  3.1: ## offline the group
       scswitch -F -g rgroup-1
       ## remove the resource
       scrgadm -r -j res-ip
       ## remove the resource group
       scrgadm -r -g rgroup-1
  3.2: ## offline the group
       clresourcegroup offline <res_group>
       ## remove the resource
       clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       ## remove the resource group
       clresourcegroup delete <res_group>
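A hedged sketch tying the resource commands together on 3.2: clear a resource that is stuck in STOP_FAILED, then re-enable it and its fault monitor (the resource name apache_res is a placeholder):

## See which resources are unhealthy
clresource status

## Clear the STOP_FAILED flag on the stuck resource
clresource clear -f STOP_FAILED apache_res

## Re-enable the resource and its fault monitor, then check again
clresource enable apache_res
clresource monitor apache_res
clresource status apache_res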

Resource Types

Adding (register in 3.2)
  3.1: scrgadm -a -t <resource type>   e.g. SUNW.HAStoragePlus
  3.2: clresourcetype register <type>

Register a resource type to a node
  3.1: n/a
  3.2: clresourcetype add-node -n <node> <type>

Deleting (remove in 3.2)
  3.1: scrgadm -r -t <resource type>
  3.2: clresourcetype unregister <type>

Deregistering a resource type from a node
  3.1: n/a
  3.2: clresourcetype remove-node -n <node> <type>

Listing
  3.1: scrgadm -pv | grep "Res Type name"
  3.2: clresourcetype list [<type>]

Listing resource type properties
  3.2: clresourcetype list-props [<type>]

Show resource types
  3.2: clresourcetype show [<type>]

Set properties of a resource type
  3.2: clresourcetype set [-p <name>=<value>] <type>
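For example, registering the HAStoragePlus resource type before creating the HAStoragePlus resource shown in the Resources section (3.1 and 3.2 forms shown):

## 3.1
scrgadm -a -t SUNW.HAStoragePlus
scrgadm -pv | grep "Res Type name"

## 3.2
clresourcetype register SUNW.HAStoragePlus
clresourcetype list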
