Unix Administration Guide: A Quick Reference Guide for Clustering, Security, Virtualization and General Administration for Solaris and Linux Operating Systems; Private Version.
Robert Bailey
Version 1.4 - In Progress

Abstract: Obscure UNIX Procedures and Tasks

This document covers Solaris 10, RHEL 5.3, and some AIX, for advanced topics such as LDOMs, Live Upgrade with SVM mirror splitting, FLAR booting, security hardening, the VCS application agent for non-global zones, and IO fencing. Many procedures are my own, some are from scattered internet sites, and some are from vendor documentation. You are welcome to use this document; however, be advised that several sections are copied from vendor documentation and various web sites, and therefore there is a high possibility of plagiarism. In general, this document is a collection of notes gathered from a number of sources and experiences. In most cases it is accurate, but you should expect typos, along with some issues where command line and file output extends beyond the format of this document.
THE MATERIALS ARE PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. FURTHERMORE, YOU MAY NOT USE THIS DOCUMENT AS A MEANS OF PROFIT, OR FOR CORPORATE USAGE, WITHOUT THE EXPLICIT CONSENT OF THE AUTHOR.
Table of Contents

1. Security Overview
     Definitions and Concepts
2. Project Life Cycle
     General Project Overview
     Pre Test Data Collection
     Scripting Test Cases
3. RAID Overview
     Purpose and basics
     Principles
     Nested levels
     Non-standard levels
4. Solaris Security
     BSM C2 Auditing
     BSM Secure Device Control
     General Hardening
     Destructive DTrace Examples
     IPFilter Overview
     IPSec with Shared Keys
     IPSec with 509 Certs
     Apache2 SSL Configuration with Self-Signed Certs
     RBAC and Root As a ROLE
     Secure Non-Global Zone FTP Server
     Trusted Extensions
5. Solaris Virtualization
     Logical Domains
          Socket, Core and Thread Distribution
          Install Domain Manager Software
          Configure Primary Domain
          Create DOM1
          Adding RAW Disks and ISO Images to DOM1
          Bind DOM1 and set up for booting
          Install OS Image and Clean up DOM1
          Create LDOM #2
          Backup or Template LDOM Configurations
          Add one virtual disk to two LDOMs
          Grouping VCC Console
          LDOM Automation Script
          VCS and LDOM Failover, Features and Start and Stop
          VCS LDOM with ZPool Configuration
          Manual LDOM and Zpool Migration
     xVM (XEN) Usage on OpenSolaris 2009.06
          Quick Create for Solaris 10 HVM
     Solaris 10 Non-Global Zones
          Comments on Zones and Live Upgrade
          Comments on Zones and Veritas Control
          Basic Non-Global Zone Creation SPARSE
          Scripting Basic Non-Global Zone Creation SPARSE
          Using Dtrace to monitor non-global zones
          Setup a Non-Global Zone for running Dtrace
          Using Dtrace to trace an application in a non-global zone
          Using Dtrace to monitor non-global zones
          Non-Global Zone Commands
          Non-Global Zones and Stock VCS Zone Agent
          Non-Global Zones and Custom VCS Application Agent
6. Solaris WANBoot
     General Overview for Dynamic Wanboot POC
     POC Goals
     POC Out of Scope
     Current challenges with wanboot marked for resolution
     POC Wanboot Configuration Highlights
     Next Steps
     Configuration Steps
7. Solaris 10 Live Upgrade
     Solaris 8 to Solaris 10 U6 Workaround
     Review current root disk and mirror
     Create Alternate Boot Device - ZFS
     Create Alternate Boot Device - SVM
     Patching, Adding Packages, setting the boot environment and Installation examples
8. Solaris and Linux General Information
     Patch Database Information
     SSH Keys
     RHEL 5.2 NIS Client
     Redhat Proc FS Tricks
          Force a panic on RHEL
          Adjust swap of processes
     iSCSI Notes - RHEL 5.3 Target, SOL 10U6 Initiator
     Setup Linux NIC Bonding
     Linux TCP sysctl settings
     Linux Dynamic SAN HBA Scan
     Solaris 10 - Mapping a process to a port
     Network and Services Tasks for Linux
     Hardening Linux
9. Solaris 10 Notes
     Link Aggregation
     Link Aggregation
     IPMP Overview
     IPMP Probe Based Target System Configuration
     Using Service Management Facility (SMF) in the Solaris 10 OS
     MPXIO
     USB Wireless Setup WUSB54GC
     VCS MultiNICB without probe address - link only
     Network IO in/out per interface
     Register Solaris CLI
     NFS Performance
     iSCSI Software Target Initiator
     iSCSI Target using TPGT Restrictions
     iSCSI Software Initiator
     SVM Root Disk Mirror
     Replace Failed SVM Mirror Drive
     ZFS Root adding a Mirror
     Create Flar Images
     FLAR Boot Installation
     ZFS Notes
     ZFS ACL's
     ZFS and ARC Cache
10. VMWare ESX 3
     Enable iSCSI Software Initiators
     General esxcfg commands
     General vmware-cmd commands
     Common Tasks
     Shared Disks without RAW Access
     Using vmclone.pl clone script
     Clone VMWare Virtual Guests
     Clone VMWare Disks
     LUN Path Information
11. AIX Notes
     Etherchannel
12. Oracle 10g with RAC
     Oracle General SQL Quick Reference
     Oracle 10g RAC Solaris Quick Reference
     Oracle 10g R2 RAC ASM Reference
     Oracle 10g R2 RAC CRS Reference
     Oracle RAC SQL
13. EMC Storage
     PowerPath Commands
     PowerPath Command Examples
     Disable PowerPath
     INQ Syminq Notes
     Brocade Switches
14. Dtrace
     Track time on each I/O
     Track directories where writes are occurring
15. Disaster Recovery
     VVR 5.0
          VVR Configuration
          General VVR Tasks using 5.0MP3
          VVR and GCO v5.x Made Easy
     VVR 4.X
          How to resynchronize the old Primary once you bring it back up (4.x)
          Failing Over from a Primary 4.x
          Setting Up VVR 4.x - the hard way
          Growing/Shrinking a Volume or SRL 4.x
          Removing a VVR volume 4.x
16. VxVM and Storage Troubleshooting
     How to disable and re-enable VERITAS Volume Manager at boot time when the boot disk is encapsulated
     Replacing a failed drive
     Storage Volume Growth and Relayout
     UDID_MISMATCH
     VxVM Disk Group Recovery
     Resize VxFS Volume and Filesystem
     Incorrect DMP or Disk Identification
     Data Migration out of rootdg
     Recover vx Plex
     Shell code to get solaris disk size in GB
     Split Root Mirror vxvm
     If VxVM Split Mirror needs post split recovery
17. Advanced VCS for IO Fencing and Various Commands
     General Information
     SCSI3 PGR Registration vs Reservation
     SCSI3 PGR FAQ
     IO Fencing / CFS Information
     iSCSI Solaris software Target and Initiator Veritas Cluster Configuration with Zones
     Heart Beat Testing
          Software Testing Heart Beats - unsupported
          Heart Beat Validation
     Using Mirroring for Storage Migration
18. OpenSolaris 2009.06 COMSTAR
     Installation
     Simple Setup of an iSCSI LUN
     Walkthrough of Simple iSCSI LUN Example
     Setup iSCSI with ACL's
19. Sun Cluster 3.2
     Preparation
     Installation
     Basic Configuration
     General Commands
     Create a Failover Apache Resource Group
     Create a Failover NGZ Resource Group
     Create a Parallel NGZ Configuration
     Oracle 10g RAC for Containers
          Zone and QFS Creation and Configuration
          Sun Cluster RAC Framework
20. Hardware Notes
     SunFire X2200 eLOM Management
          SP General Commands
          Connection via Serial Port
          System console
          To Set Up Serial Over LAN With the Solaris OS
          Configure ELOM/SP
     5120 iLOM Management

List of Tables

1.1. Identifying Threats
1.2. Orange Book NIST Security Levels
1.3. EAL Security Levels
1.4. EAL Security Component Acronyms
4.1. Common IPFilter Commands
5.1. Coolthreads Systems
5.2. Incomplete IO Domain Distribution
5.3. VCS Command Line Access - Global vs. Non-Global Zones
6.1. Wanboot Server Client Details
10.1. esxcfg-commands
12.1. ASM View Table
13.1. PowerPath CLI Commands
13.2. PowerPath powermt commands
17.1. Summary of SCSI3-PGR Keys
19.1. Sun Cluster Filesystem Requirements
Security Overview
Table 1.1. Identifying Threats

Threat Agent: Fire
  Can Exploit This Vulnerability: Lack of fire extinguishers
  Resulting in This Threat: Facility and computer damage, and possible loss of life

Threat Agent: Employee
  Can Exploit This Vulnerability: Lack of training or standards enforcement; lack of auditing
  Resulting in This Threat: Sharing mission-critical information; altering data inputs and outputs from data processing applications

Threat Agent: Contractor
  Can Exploit This Vulnerability: Lax access control mechanisms
  Resulting in This Threat: Stealing trade secrets

Threat Agent: Attacker
  Can Exploit This Vulnerability: Poorly written application; lack of stringent firewall settings
  Resulting in This Threat: Conducting a buffer overflow; conducting a Denial-of-Service attack

Threat Agent: Intruder
  Can Exploit This Vulnerability: Lack of security guard
  Resulting in This Threat: Breaking windows and stealing computers and devices
7. Orange Book Security Levels

A standard from the US Government National Computer Security Council (an arm of the U.S. National Security Agency), "Trusted Computer System Evaluation Criteria, DOD standard 5200.28-STD, December 1985", which defines criteria for trusted computer products. There are four levels: A, B, C, and D. Each level adds more features and requirements. Levels B and A provide mandatory control; access is based on standard Department of Defense clearances.

Orange Book n. The U.S. Government's (now obsolete) standards document "Trusted Computer System Evaluation Criteria, DOD standard 5200.28-STD, December 1985", which characterizes secure computing architectures and defines levels A1 (most secure) through D (least). Modern Unixes are roughly C2.
Table 1.2. Orange Book NIST Security Levels: the mandatory-protection classes B1, B2, B3, and A1.
The Evaluation Assurance Level (EAL1 through EAL7) of an IT product or system is a numerical grade assigned following the completion of a Common Criteria security evaluation, an international standard in effect since 1999. The increasing assurance levels reflect added assurance requirements that must be met to achieve Common Criteria certification. The intent of the higher levels is to provide higher confidence that the system's principal security features are reliably implemented. The EAL level does not measure the security of the system itself; it simply states at what level the system was tested to see if it meets all the requirements of its Protection Profile. The National Information Assurance Partnership (NIAP) is a U.S. Government initiative of the National Institute of Standards and Technology (NIST) and the National Security Agency (NSA).

To achieve a particular EAL, the computer system must meet specific assurance requirements. Most of these requirements involve design documentation, design analysis, functional testing, or penetration testing. The higher EALs involve more detailed documentation, analysis, and testing than the lower ones. Achieving a higher EAL certification generally costs more money and takes more time than achieving a lower one. The EAL number assigned to a certified system indicates that the system completed all requirements for that level.

Although every product and system must fulfill the same assurance requirements to achieve a particular level, they do not have to fulfill the same functional requirements. The functional features for each certified product are established in the Security Target document tailored for that product's evaluation. Therefore, a product with a higher EAL is not necessarily "more secure" in a particular application than one with a lower EAL, since they may have very different lists of functional features in their Security Targets. A product's fitness for a particular security application depends on how well the features listed in the product's Security Target fulfill the application's security requirements. If the Security Targets for two products both contain the necessary security features, then the higher EAL should indicate the more trustworthy product for that application.
Table 1.3. EAL Security Levels (continued)

EAL2: Structurally Tested. EAL2 requires the co-operation of the developer in terms of the delivery of design information and test results, but should not demand more effort on the part of the developer than is consistent with good commercial practice. As such it should not require a substantially increased investment of cost or time. EAL2 is therefore applicable in those circumstances where developers or users require a low to moderate level of independently assured security in the absence of ready availability of the complete development record. Such a situation may arise when securing legacy systems.

EAL3: Methodically Tested and Checked. EAL3 permits a conscientious developer to gain maximum assurance from positive security engineering at the design stage without substantial alteration of existing sound development practices. EAL3 is applicable in those circumstances where developers or users require a moderate level of independently assured security, and require a thorough investigation of the TOE and its development without substantial re-engineering.

EAL4: Methodically Designed, Tested and Reviewed. EAL4 permits a developer to gain maximum assurance from positive security engineering based on good commercial development practices which, though rigorous, do not require substantial specialist knowledge, skills, and other resources. EAL4 is the highest level at which it is likely to be economically feasible to retrofit to an existing product line. EAL4 is therefore applicable in those circumstances where developers or users require a moderate to high level of independently assured security in conventional commodity TOEs and are prepared to incur additional security-specific engineering costs. Commercial operating systems that provide conventional, user-based security features are typically evaluated at EAL4. Examples of such operating systems are AIX[1], HP-UX[1], FreeBSD, Novell NetWare, Solaris[1], SUSE Linux Enterprise Server 9[1][2], SUSE Linux Enterprise Server 10[3], Red Hat Enterprise Linux 5[4], Windows 2000 Service Pack 3, Windows 2003[1][5], Windows XP[1][5], Windows 2008[1], and Windows Vista[1]. Operating systems that provide multilevel security are evaluated at a minimum of EAL4. Examples include Trusted Solaris, Solaris 10 Release 11/06 Trusted Extensions,[6] an early version of the XTS-400, and VMware ESX version 3.0.2[7].

EAL5: Semiformally Designed and Tested. EAL5 permits a developer to gain maximum assurance from security engineering based upon rigorous commercial development practices supported by moderate application of specialist security engineering techniques. Such a TOE will probably be designed and developed with the intent of achieving EAL5 assurance. It is likely that the additional costs attributable to the EAL5 requirements, relative to rigorous development without the application of specialized techniques, will not be large. EAL5 is therefore applicable in those circumstances where developers or users require a high level of independently assured security in a planned development and require a rigorous development approach without incurring unreasonable costs attributable to specialist security engineering techniques. Numerous smart card devices have been evaluated at EAL5, as have multilevel secure devices such as the Tenix Interactive Link. XTS-400 (STOP 6) is a general-purpose operating system which has been evaluated at EAL5 augmented. LPAR on IBM System z is EAL5 certified.[8]

EAL6: Semiformally Verified Design and Tested. EAL6 permits developers to gain high assurance from application of security engineering techniques to a rigorous development environment in order to produce a premium TOE for protecting high value assets against significant risks. EAL6 is therefore applicable to the development of security TOEs for application in high risk situations where the value of the protected assets justifies the additional costs. An example of an EAL6 certified system is the Green Hills Software INTEGRITY-178B operating system, the only operating system to achieve EAL6 thus far.[9]

EAL7: Formally Verified Design and Tested. EAL7 is applicable to the development of security TOEs for application in extremely high risk situations and/or where the high value of the assets justifies the higher costs. Practical application of EAL7 is currently limited to TOEs with tightly focused security functionality that is amenable to extensive formal analysis. The Tenix Interactive Link Data Diode Device has been evaluated at EAL7 augmented, the only product to do so.
Table 1.4. EAL Security Component Acronyms

CAPP     Controlled Access Protection Profile
RBACPP   Role Based Access Control Protection Profile
a. A security level is a (c, s) pair:
   - c = classification, e.g., unclassified, secret, top secret
   - s = category set, e.g., Nuclear, Crypto
b. (c1, s1) dominates (c2, s2) iff c1 >= c2 and s2 is a subset of s1
c. Subjects and objects are assigned security levels
   - level(S), level(O): security level of subject/object
   - current-level(S): a subject may operate at a lower level
   - f = (level, level, current-level)

10. DAC vs. MAC
Most people are familiar with discretionary access control (DAC):
   - Example: Unix user-group-other permission bits
   - Might set a file private so only group friends can read it (see the example following these notes)
Discretionary means anyone with access can propagate information:
   - Mail [email protected] < private
Mandatory access control (MAC):
   - The security administrator can restrict propagation
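A minimal shell illustration of the DAC example above (the file, owner, and group names are assumptions for this sketch, not taken from the original):

$ chgrp friends private.txt       # only members of group 'friends' should read the file
$ chmod 640 private.txt           # owner read/write, group read, no access for others
$ ls -l private.txt
-rw-r-----   1 alice    friends      812 Jun  1 10:22 private.txt

Any member of the friends group could still copy or mail the file elsewhere, which is exactly the propagation problem that mandatory access control addresses.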
5. How long is the project expected to last?
6. What metrics will be needed and collected for the pre/post project analysis?
7. How is success defined?

Kickoff meeting

1. Define scope - what options and solutions are needed, what the priorities are, and which items are must-have vs. nice-to-have. Also identify what is related but out of scope. If the project is to be broken down into phases, that should be identified, and the second phase and beyond needs to be "adapted for" but not part of the success criteria of the initial phase. It is good, when multiple groups are involved, to have each report back with their weighted options list (RFE/RFC).
2. Define ownership - including contact information.
3. Milestones and goals, including dependencies and serialized processes.
4. Set up timelines and recurring meetings.
5. Make sure there are next steps and meeting notes posted.

Handling RFE/RFC Metrics and Weighted Items

1. Should vendor solutions be needed, create a weighted requirements list. Should a vendor not be needed, the same items should be identified for cross-team participation, or with the impacted group.
2. Define which vendors will be sent the weighted list.
3. Develop the weighted list, usually 1-10 plus N/A. Information about a feature that is only included in the next release may be presented separately; however, it should have no weight.
4. Define the expected completion date of the RFC by the vendor.
5. Correlate answers based on weight and identify the optimal product for evaluation. Should more than one be close in score, there is a potential for a bake-off between products.

Post Project Review and Presentation

1. Comparison of pre/post project metrics
2. Credits to all involved
3. Examples of success - feedback from operations
Use logger to mark manual tasks and milestones.

If possible, run VXexplorer or SUNexplorer and save a copy remotely.

Write a script to copy off key files - it should be written based on the test type.

Define a rollback method - snapshot / LU alternate boot.

Example BART Data Collection: run bart against all necessary directories; in this example that would include /etc and /zone. If milestones are involved, then frequent collections of bart manifests may be necessary to track the overall changes within the different environment stages. Just name the manifest based on the stage.

# mkdir /bart-files
# bart create -R /etc > /bart-files/etc.control.manifest
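To see what changed between stages, a later manifest for the same tree can be compared against the control manifest with bart compare (the stage manifest name below is an assumed example):

# bart create -R /etc > /bart-files/etc.stage1.manifest
# bart compare /bart-files/etc.control.manifest /bart-files/etc.stage1.manifest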
</screen>
  </stepxmp>
 </step>
 <step><cmd>Create Directories on both &Node0; and &Node1;</cmd>
  <stepxmp>
   <screen>
# On &Node0; and &Node1;
mkdir -p /oracle/dbdata/vote
chown -R oracle:dba /oracle/dbdata
chmod 774 /oracle/dbdata
chmod 774 /oracle/dbdata/vote
   </screen>
  </stepxmp>
 </step>
</steps>
</taskbody>
</task>

This could be broken down even further with the right processing script:

<task id="T11001">
 <title>Volume Creation</title>
 <comments>Template creates a Veritas volume when passed an ENTITY value for the following:
   Disk Group: &DG;
   Volume Name: &VOL;
   Volume Size: &SIZE;
   User Owner: &USER;
   Volume Permission Mode: &MODE;
 </comments>
 <command>/usr/sbin/vxassist -g &DG; make &VOL; \
   &SIZE; user=&USER; mode=&MODE;</command>
 <return>1</return>
</task>

Tasks could be templated to execute as a sequence in a procedure - a DITA map is good for this, but the example below is just off-the-cuff XML:

<procedure id="P001">
 <title>Create Volume, Filesystem and add into VCS</title>
 <task id="T1001"/>
 <task id="T1002"/>
 <task id="T1003"/>
 <return>1</return>
</procedure>

Procedures could be grouped together as part of a certification:

<certification id="C001">
 <title>SFRAC 5.0 MP3 Certification</title>
 <procedure id="P001"/>
 <procedure id="P002"/>
 <procedure id="P003"/>
 <return>1</return>
</certification>

Execution code for tasks/procedures should be able to pass back a return code for each task; it is probably best to return the time to execute as well. These numeric return codes and times would be best placed into a database, with a table similar in concept to cert (id, procedure, task, results) cross-linked to a cert_info table (id, description, owner, participants, BU, justification). If all is done well, then the certification tasks are re-usable for many certifications and only need to be written once, the process is defined and can be reproduced, and every command executed is logged and could be used to generate operational procedures.
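To make the return-code and timing idea concrete, here is a minimal sketch of a task wrapper; the script name, CSV path, and column order are assumptions for illustration, not part of the original framework:

#!/bin/ksh
# run_task.ksh - execute one certification task and record its result
# Usage: run_task.ksh <cert_id> <procedure_id> <task_id> <command> [args...]
CERT=$1; PROC=$2; TASK=$3
shift 3
START=$SECONDS
"$@"                                   # run the task command itself
RC=$?
ELAPSED=$((SECONDS - START))
# append cert id, procedure, task, return code and elapsed seconds
# for later loading into the cert results table
echo "$CERT,$PROC,$TASK,$RC,$ELAPSED" >> /var/tmp/cert_results.csv
exit $RC

For example: run_task.ksh C001 P001 T1001 /usr/sbin/vxassist -g testdg make vol01 1g would record the result of that single volume-creation task.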
RAID Overview
RAID is not a good alternative to backing up data. Data may become damaged or destroyed without harm to the drive(s) on which they are stored. For example, part of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks; and of course the entire array is at risk of physical damage.
Principles
RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system is unaware of the technical workings. For example, if you configure a 1TB RAID 5 array using three 500GB hard drives in hardware RAID, the operating system would simply be presented with a "single" 1TB disk. Software solutions are typically implemented in the operating system and would present the RAID drive as a single drive to applications running upon the operating system.

There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant data is stored to allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use one or more of these techniques, depending on the system requirements.

RAID's main aim can be either to improve reliability and availability of data, ensuring that important data is available more often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g. for a system that delivers video on demand TV programs to many viewers). The configuration affects reliability and performance in different ways. The problem with using more disks is that it is more likely that one will go wrong, but by using error checking the total system can be made more reliable by being able to survive and repair the failure.

Basic mirroring can speed up reading data, as a system can read different data from both disks, but it may be slow for writing if the configuration requires that both disks must confirm that the data is correctly written. Striping is often used for performance, where it allows sequences of data to be read from multiple disks at the same time. Error checking typically will slow the system down, as data needs to be read from several places and compared. The design of RAID systems is therefore a compromise, and understanding the requirements of a system is important. Modern disk arrays typically provide the facility to select the appropriate RAID configuration.
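As a minimal illustration of mirroring and striping combined in software RAID (a sketch only; the pool name and disk device names are assumptions), ZFS on Solaris stripes writes across the two mirrored pairs below, so the pool behaves like a RAID 1+0 set:

# zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
# zpool status tank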
Nested levels
Many storage controllers allow RAID levels to be nested: the elements of a RAID may be either individual disks or RAIDs themselves. Nesting more than two deep is unusual. As there is no basic RAID level numbered larger than 10, nested RAIDs are usually unambiguously described by concatenating the numbers indicating the RAID levels, sometimes with a "+" in between. For example, RAID 10 (or RAID 1+0) consists of several level 1 arrays of physical drives, each of which is one of the "drives" of a level 0 array striped over the level 1 arrays; it is not called RAID 01, to avoid confusion with RAID 0+1. When the top array is a RAID 0 (such as in RAID 10 and RAID 50) most vendors omit the "+", though RAID 5+0 is clearer.

RAID 0+1: striped sets in a mirrored set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror, the data on the RAID system is lost.

RAID 1+0: mirrored sets in a striped set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives. In a failed disk situation, RAID 1+0 performs better because all the remaining disks continue to be used. The array can sustain multiple drive losses so long as no mirror loses all its drives.

RAID 5+0: stripe across distributed parity RAID systems.

RAID 5+1: mirror striped set with distributed parity (some manufacturers label this as RAID 53).
Non-standard levels
Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialised needs of a small niche group. Most of these non-standard RAID levels are proprietary. Some of the more prominent modifications are:

Storage Computer Corporation uses RAID 7, which adds caching to RAID 3 and RAID 4 to improve I/O performance.

EMC Corporation offered RAID S as an alternative to RAID 5 on their Symmetrix systems (it is no longer supported on the latest releases of Enginuity, the Symmetrix operating system).

The ZFS filesystem, available in Solaris, OpenSolaris, FreeBSD and Mac OS X, offers RAID-Z, which solves RAID 5's write hole problem.

NetApp's Data ONTAP uses RAID-DP (also referred to as "double", "dual" or "diagonal" parity), which is a form of RAID 6, but unlike many RAID 6 implementations does not use distributed parity as in RAID 5. Instead, two unique parity disks with separate parity calculations are used. This is a modification of RAID 4 with an extra parity disk.

Accusys Triple Parity (RAID TP) implements three independent parities by extending RAID 6 algorithms on its FC-SATA and SCSI-SATA RAID controllers to tolerate three-disk failure.

Linux MD RAID10 (RAID10) implements a general RAID driver that defaults to a standard RAID 1+0 with 4 drives, but can have any number of drives. MD RAID10 can run striped and mirrored with only 2 drives with the f2 layout (mirroring with striped reads; normal Linux software RAID 1 does not stripe reads, but can read in parallel).[4] See the sketch following this list.

Infrant (now part of Netgear) X-RAID offers dynamic expansion of a RAID 5 volume without having to backup/restore the existing content: just add larger drives one at a time, let it resync, then add the next drive until all drives are installed. The resulting volume capacity is increased without user downtime. (This is also possible in Linux when utilizing the mdadm utility, and has been possible in the EMC Clariion for several years.)

BeyondRAID, created by Data Robotics and used in the Drobo series of products, implements both mirroring and striping simultaneously or individually depending on disk and data context. BeyondRAID is more automated and easier to use than many standard RAID levels. It also offers instant expandability without reconfiguration, the ability to mix and match drive sizes, and the ability to reorder disks. It is a block-level system and thus file system agnostic, although today support is limited to NTFS, HFS+, FAT32, and EXT3. It also utilizes thin provisioning to allow for single volumes up to 16TB depending on host operating system support.
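A brief sketch of the two-drive MD RAID10 far layout mentioned above (device names and the md device number are assumptions; verify against your own system before use):

# mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1
# cat /proc/mdstat        # confirm the array is assembled and resyncing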
Solaris Security
ahlt            halt machine if it can not record an async event
all             all policies
arge            include exec environment args in audit recs
argv            include exec command line args in audit recs
cnt             when no more space, drop recs and keep a cnt
group           include supplementary groups in audit recs
none            no policies
path            allow multiple paths per event
perzone         use a separate queue and auditd per zone
public          audit public files
seq             include a sequence number in audit recs
trail           include trailer token in audit recs
windata_down    include downgraded window information in audit recs
windata_up      include upgraded window information in audit recs
zonename        generate zonename token
Class settings are located in /etc/security/audit_control and are in the following format:

dir:/fisc/bsm      # location of audit trail
flags:lo,ex,ad     # classes being audited for success and failure
minfree:20         # do not grow audit trails if less than 20% free
naflags:lo,ad      # events that cannot be attributed to a particular user
You can add the following as class attributes; be aware that more logging means more file system space used. In many cases the class list should be custom fit to the server function, such as database, application, or firewall.

Class   Description
no      invalid class
fr      file read
fw      file write
fa      file attribute access
fm      file attribute modify
fc      file create
fd      file delete
cl      file close
pc      process
nt      network
ip      ipc
na      non-attribute
ad      administrative
lo      login or logout
ap      application
io      ioctl
ex      exec
ot      other
all     all classes

In addition, each user can have their own audit trail custom fit. This is handled through the /etc/security/audit_user file, which has the following format:

# User Level Audit User File
#
# username:always:never
#
root:lo:no
myuser:lo:no

Individual users can have their audit trail adjusted to collect all possible data, but testing of each change is vital. Any typo in /etc/security/audit_user can, and will, result in that user's inability to log in.
Commands:
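As a hedged sketch (verify flags against the audit(1M), auditconfig(1M), and auditreduce(1M) man pages), some commonly used BSM administration commands are:

# bsmconv                      # enable BSM auditing (requires a reboot)
# audit -s                     # signal auditd to re-read audit_control
# auditconfig -getcond         # confirm auditing is enabled
# auditconfig -getpolicy       # show the active audit policy flags
# auditreduce -u root /fisc/bsm/* | praudit    # extract and print records for one user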
3. File descriptions and control features

/etc/security/device_allocate is used to associate specific devices, like st0, with RBAC authorizations and the cleanup scripts run at boot time:

audio;audio;reserved;reserved;solaris.device.allocate;\
        /etc/security/lib/audio_clean
fd0;fd;reserved;reserved;solaris.device.allocate;\
        /etc/security/lib/fd_clean
sr0;sr;reserved;reserved;solaris.device.allocate;\
        /etc/security/lib/sr_clean

/etc/security/device_maps is a listing of devices with alias names, such as:

audio:\
        audio:\
        /dev/audio /dev/audioctl /dev/sound/0 /dev/sound/0ctl
fd0:\
        fd:\
        /dev/diskette /dev/rdiskette /dev/fd0a /dev/rfd0a /dev/fd0b /dev/rfd0b \
        /dev/fd0c /dev/fd0 /dev/rfd0c /dev/rfd0
sr0:\
        sr:\
        /dev/sr0 /dev/rsr0 \
        /dev/dsk/c0t2d0s0 /dev/dsk/c0t2d0s1 /dev/dsk/c0t2d0s2 /dev/dsk/c0t2d0s3 \
        /dev/dsk/c0t2d0s4 /dev/dsk/c0t2d0s5 /dev/dsk/c0t2d0s6 /dev/dsk/c0t2d0s7 \
        /dev/rdsk/c0t2d0s0 /dev/rdsk/c0t2d0s1 /dev/rdsk/c0t2d0s2 /dev/rdsk/c0t2d0s3 \
        /dev/rdsk/c0t2d0s4 /dev/rdsk/c0t2d0s5 /dev/rdsk/c0t2d0s6 /dev/rdsk/c0t2d0s7
4. Converting root to a role and adding access to the root role to a user

Fundamentals - log in as a user and assume root; then modify the root account to type role and add the root role to a user. Test with a fresh login before logging out.

$ su
# usermod -K type=role root
# usermod -R root useraccount

remote> ssh useraccount@host_with_root_role_config
$ su - root
#

5. Command review and examples

Allocation is done by running specific commands, as is deallocating the same device. Here are a few examples:

# allocate -F device_special_filename
# allocate -F device_special_filename -U user_id
# deallocate -F device_special_filename
# deallocate -I
# list_devices -U username
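As a quick check before ending the session (using the placeholder account name from the example above), the roles command confirms that the root role was granted:

$ roles useraccount
root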
When combined, a user with the RBAC authorization solaris.device.allocate can allocate the fd0, sr0, and audio devices, in essence reserving the device for themselves. The scripts referenced in the device_allocate file are used to deallocate the device in the event of a reboot; this way no allocation is persistent. Since these files are customizable, it is possible to remove vold-related devices, such as the cdrom mounting, by just deleting that section. Remember that device allocation is not needed for auditing to work, and can be set to allocate nothing by stripping down the device_maps and device_allocate files; however, more testing should be done in this case.
General Hardening
1. IP Module Control

The IP module can be tuned to prevent forwarding and redirecting of packets, and to limit requests for information from the system. These parameters can be set using ndd with the given values to restrict these features.

# ndd -set /dev/ip ip_forward_directed_broadcasts 0
# ndd -set /dev/ip ip_forward_src_routed 0
# ndd -set /dev/ip ip_ignore_redirect 1
# ndd -set /dev/ip ip_ire_flush_interval 60000
# ndd -set /dev/ip ip_ire_arp_interval 60000
# ndd -set /dev/ip ip_respond_to_echo_broadcast 0
# ndd -set /dev/ip ip_respond_to_timestamp 0
# ndd -set /dev/ip ip_respond_to_timestamp_broadcast 0
# ndd -set /dev/ip ip_send_redirects 0
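These ndd settings do not survive a reboot. A minimal sketch of one common way to reapply them at boot (the script name and run-level link are assumptions; some sites use an SMF service instead):

#!/sbin/sh
# /etc/init.d/nddconfig - reapply non-persistent ndd network tunings at boot
case "$1" in
start)
        ndd -set /dev/ip ip_forward_src_routed 0
        ndd -set /dev/ip ip_ignore_redirect 1
        ndd -set /dev/ip ip_send_redirects 0
        # ...repeat for the remaining parameters listed above...
        ;;
esac
exit 0

# ln -s /etc/init.d/nddconfig /etc/rc2.d/S70nddconfig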
2. Prevent buffer overflows

Add the following lines to the /etc/system file to prevent a buffer overflow from being used, in a possible attack, to execute malicious code on your machine:

set noexec_user_stack=1
set noexec_user_stack_log=1
Destructive DTrace Examples

Example: changing what uname reports while the script runs.

syscall::uname:entry
{
    self->addr = arg0;
}

syscall::uname:return
{
    copyoutstr("SunOS", self->addr, 257);
    copyoutstr("PowerPC", self->addr+257, 257);
    copyoutstr("5.5.1", self->addr+(257*2), 257);
    copyoutstr("gate:1996-12-01", self->addr+(257*3), 257);
    copyoutstr("PPC", self->addr+(257*4), 257);
}

Before running the dtrace script:

# uname -a
SunOS homer 5.10 SunOS_Development sun4u sparc SUNW,Ultra-5_10

While running the dtrace script:

# uname -a
SunOS PowerPC 5.5.1 gate:1996-12-01 PPC sparc SUNW,Ultra-5_10

Example: killing a process when it tries to read a file.

# cat read.d
#!/usr/sbin/dtrace -ws

ufs_read:entry
/ stringof(args[0]->v_path) == $$1 /
{
    printf("File %s read by %d\n", $$1, curpsinfo->pr_uid);
    raise(SIGKILL);
}

Run the script in one terminal and attempt to read the file from another:

# ./read.d /etc/passwd
dtrace: script './read.d' matched 1 probe
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  0  15625                    ufs_read:entry File /etc/passwd read by 0

# more /etc/passwd
Killed
IPFilter Overview
1. Background

With the release of Solaris 10, ipfilter is now supported. Before Solaris 10, SunScreen EFS or SunScreen Lite was the default firewall. IPFilter is a mature product traditionally found in BSD-ish operating systems.

2. Configure an ippool if the list of firewalled hosts is large - use /etc/ipf/ippool.conf

# /etc/ipf/ippool.conf
# IP range for China
table role = ipf type = tree number = 5
        { 219.0.0.0/8; 220.0.0.0/8; 222.0.0.0/8; 200.0.0.0/8; 211.0.0.0/8; };

# IP range for problem hosts
table role = ipf type = tree number = 6
        { 66.96.240.229/32; 125.65.112.217/32; 77.79.103.219/32;
          61.139.105.163/32; 61.160.216.0/24; };

# IP range for the internal network
table role = ipf type = tree number = 7
        { 192.168.15.0/24; };

# IP range for known information stealers
table role = ipf type = tree number = 8
        { 209.67.38.99/32; 204.178.112.170/32; 205.138.3.62/32; 199.95.207.0/24;
          199.95.208.0/24; 216.52.13.39/32; 216.52.13.23/32; 207.79.74.222/32;
          209.204.128.0/18; 209.122.130.0/24; 195.225.177.27/32; 65.57.163.0/25;
          216.251.43.11/32; 24.211.168.40/32; 58.61.164.141/32; 72.94.249.34/32; };
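Once the pool file is in place, it can be loaded and verified with the ippool command (a short sketch; the pool numbers follow the file above):

# ippool -f /etc/ipf/ippool.conf      # load the pool definitions into the kernel
# ippool -l                           # list the pools that are now active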
3. Configuring IPF

First, you will need an ipf ruleset. The Solaris default location for this file is /etc/ipf/ipf.conf. Below is the ruleset I used for a Solaris 10 x86 workstation; the public NIC in this ruleset is bge0. Simply copy this ruleset to /etc/ipf/ipf.conf and edit it to your needs.

# /etc/ipf/ipf.conf
#
# IP Filter rules to be loaded during startup
#
# See ipf(4) manpage for more information on
# IP Filter rules syntax.
#
# Public Network. Block everything not explicitly allowed.
block in log on bge0 all
block out log on bge0 all
#
# Allow all traffic on loopback.
pass in quick on lo0 all
pass out quick on lo0 all
#
# Allow pings out.
pass out quick on bge0 proto icmp all keep state
#
pass in log quick on bge0 proto tcp from any to 192.168.15.78/24 \
    port = 8080
pass in log quick on bge0 proto tcp from any to 192.168.15.78/24 \
    port = 443
pass in log quick on bge0 proto tcp from any to 192.168.15.78/24 \
    port = 22
# Internal Hosts
pass in quick from pool/7 to 192.168.15.78
# Blocked due to showing up in IDS
block in log quick from pool/6 to any
# Block Asia APNIC Inbound
block in log quick on bge0 proto tcp/udp from pool/5 to any
# Block Asia APNIC Outbound
block out log quick on bge0 proto tcp/udp from any to pool/5
#
# Known information stealers
block in log quick from pool/8 to any
block out log quick from any to pool/8
# Allow outbound state related packets.
pass out quick on bge0 proto tcp/udp from any to any keep state
#
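With the ruleset saved, a brief sketch of enabling the filter and loading the rules on Solaris 10 (service name as shipped with Solaris 10; adjust if your release differs):

# svcadm enable network/ipfilter      # start the IPFilter service
# ipf -Fa -f /etc/ipf/ipf.conf        # flush any existing rules and load the new ruleset
# ipfstat -io                         # display the active inbound and outbound rules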
Table 4.1. Common IPFilter Commands - descriptions:

- Show summary
- Show input list
- Show output list
- Show hits against all rules
- Monitor the state table and refresh every 5 seconds; output is similar to 'top' monitoring the process table
- Watch the state table
- Write logged entries to syslog, and convert back to hostnames and service names
- Write logged entries to a file
- Run ipmon as a daemon, and log to the default location (/var/adm/messages for Solaris; /var/log/syslog for Tru64)
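As a hedged companion to the descriptions above (a sketch of typical invocations, not the original table's command column; confirm flags against ipfstat(1M) and ipmon(1M)):

# ipfstat                 # show summary statistics
# ipfstat -i              # show the loaded input rule list
# ipfstat -o              # show the loaded output rule list
# ipfstat -hio            # show hit counts against all rules
# ipfstat -t -T 5         # top-like state table view, refreshed every 5 seconds
# ipmon -s                # write logged entries to syslog
# ipmon -Ds               # run ipmon as a daemon, logging to the default location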
74686973206973206d79206c6f6e6720626c6f77666973682065737020706173 Configuring IPsec Policies IPsec policies are rules that the IP stack uses to determine what action should be taken. Actions include: bypass: Do nothing, skip the remaining rules if datagram matches. drop: Drop if datagram matches. permit: Allow if datagram matches, otherwise discard. (Only for inbound datagrams.) ipsec: Use IPsec if the datagram matches. As you can see, this sounds similar to a firewall rule, and to some extent can be used that way, but you ultimately find IPFilter much better suited to that task. When you plan your IPsec environment consider which rules are appropriate in which place. IPsec policies are defined in the /etc/inet/ipsecinit.conf file, which can be loaded/reloaded using the ipsecconf command. Lets look at a sample configuration: benr@ultra inet$ cat /etc/inet/ipsecinit.conf ## ## IPsec Policy File: ## # Ignore SSH { lport 22 dir both } bypass { } # IPsec Encrypt telnet Connections to 8.11.80.5 { raddr 8.11.80.5 rport 23 } ipsec \ { encr_algs blowfish encr_auth_algs sha1 sa shared Our first policy explicitly bypasses connections in and out ("dir both", as in direction) for the local port 22 (SSH). Do I need this here? No, but I include it as an example. You can see the format, the first curly block defines the filter, the second curly block defines parameters, the keyword in between is the action. The second policy is what we're interested in, its action is ipsec, so if the filter in the first curly block matches we'll use IPsec. "raddr" defines a remote address and "rport" defines a remote port, therefore this policy applies only to outbound connections where we're telnet'ing (port 23) to 8.11.80.5. The second curly block defines parameters for the action, in this case we define the encryption algorithm (Blowfish), encryption authentication algorithm (SHA1), and state that the Security Association is "shared". This is a full ESP connection, meaning we're encrypting and encapsulating the full packet, if we were doing AH (authentication only) we would only define "auth_algs". Now, on the remote side of the connection (8.11.80.5) we create a similar policy, but rather than "raddr" and "rport" we use "laddr" (local address) and "lport" (local port). We could even go so far as to specify the remote address such that only the specified host would use IPsec to the node. Here's that configuration: ## ## IPsec Policy File:
# Ignore SSH
{ lport 22 dir both } bypass { }
# IPsec Encrypt telnet Connections to 8.11.80.5
{ laddr 8.11.80.5 lport 23 } ipsec \
   { encr_algs blowfish encr_auth_algs sha1 sa shared }
To load the new policy file you can refresh the ipsec/policy SMF service like so: svcadm refresh ipsec/policy. I recommend avoiding the ipsecconf command except to (without arguments) display the active policy configuration.

So we've defined policies that will encrypt traffic from one node to another, but we're not done yet! We need to define a Security Association that will associate keys with our policy.

Creating Security Associations

Security Associations (SAs) can be manually created either by using the ipseckey command or by directly editing the /etc/inet/secret/ipseckeys file. I recommend the latter; I personally find the ipseckey shell very intimidating. Let's look at a sample file and then discuss it:

add esp spi 1000 src 8.15.11.17 dst 8.11.80.5 auth_alg sha1 \
    authkey 6d792073686f72742061682070617373776f7264 encr_alg \
    blowfish encrkey 6d792073686f72742061682070617373
add esp spi 1001 src 8.11.80.5 dst 8.15.11.17 auth_alg sha1 \
    authkey 6d792073686f72742061682070617373776f7264 encr_alg \
    blowfish encrkey 6d792073686f72742061682070617373

It looks more intimidating than it is. Each line is "add"ing a new static Security Association, and both are for ESP. The SPI, the "Security Parameters Index", is a simple numeric value that represents the SA, nothing more; pick any value you like. The src and dst define the addresses to which this SA applies; note that you have two SAs here, one for each direction. Finally, we define the encryption and authentication algorithms and full keys.

I hope that looking at this makes it more clear how policies and SAs fit together. If the IP stack matches a datagram against a policy whose action is "ipsec", it takes the packet, looks for an SA whose address pair matches, and then uses those keys for the actual encryption. Note that if someone obtains your keys, you're hosed. If you pre-share keys in this way, change the keys from time to time or consider using IKE, which can negotiate keys (and thus SAs) on your behalf.

To apply your new SAs, flush and then load using the ipseckey command:

$ ipseckey flush
$ ipseckey -f /etc/inet/secret/ipseckeys

Is it working? How to Test

All this is for nothing if you don't verify that the packets are actually encrypted. Using snoop, you should see packets like this:

$ snoop -d e1000g0
Using device e1000g0 (promiscuous mode)
ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 1 arrived at 9:52:4.58883
ETHER: Packet size = 90 bytes
ETHER: Destination = xxxxxxxxxxx,
ETHER: Source = xxxxxxxxxx,
ETHER: Ethertype = 0800 (IP)
ETHER:
IP:   ----- IP Header -----
IP:   Version = 4
IP:   Header length = 20 bytes
IP:   Type of service = 0x00
IP:         xxx. .... = 0 (precedence)
IP:         ...0 .... = normal delay
IP:         .... 0... = normal throughput
IP:         .... .0.. = normal reliability
IP:         .... ..0. = not ECN capable transport
IP:         .... ...0 = no ECN congestion experienced
IP:   Total length = 72 bytes
IP:   Identification = 36989
IP:   Flags = 0x4
IP:         .1.. .... = do not fragment
IP:         ..0. .... = last fragment
IP:   Fragment offset = 0 bytes
IP:   Time to live = 61 seconds/hops
IP:   Protocol = 50 (ESP)
IP:   Header checksum = ab9c
IP:   Source address = XXXXXXXXX
IP:   Destination address = XXXXXXXXXXXX
IP:   No options
ESP:  ----- Encapsulating Security Payload -----
ESP:  SPI = 0x3e8
ESP:  Replay = 55
ESP:  ....ENCRYPTED DATA....
And there you go. You can now encrypt communication transparently in the IP stack. It's a little effort to get going, but once it's running you're done... just remember to rotate those keys every so often!
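A hedged sketch of what a key-rotation pass can look like, using od against /dev/random to generate fresh hex key material (the key lengths match the SHA1/Blowfish choices above; the exact commands are an illustration, not part of the original procedure):

$ od -An -tx1 -N20 /dev/random | tr -d ' \n' ; echo    # candidate 160-bit authkey
$ od -An -tx1 -N16 /dev/random | tr -d ' \n' ; echo    # candidate 128-bit Blowfish encrkey
## paste the new values into /etc/inet/secret/ipseckeys on both hosts, then:
$ ipseckey flush
$ ipseckey -f /etc/inet/secret/ipseckeys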
MIICOzCCAaSgAwIBAgIFAJRpUUkwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
[ ... some lines omitted ... ]
oi4dO39J7cSnooqnekHjajn7ND7T187k+f+BVcFVbSenIzblq2P0u7FIgIjdlv0=
-----END X509 CERTIFICATE-----

3. Do the same on the other host.

$ ikecert certlocal -ks -m 1024 -t rsa-md5 -D \
  "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=theoden" \
  -A IP=10.211.55.200
Creating private key.
Certificate added to database.
-----BEGIN X509 CERTIFICATE-----
MIICOzCCAaSgAwIBAgIFAIRuR5QwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
[ ... some lines omitted ... ]
UHJ4P6Z0dtjnToQb37HNq9YWFRguSsPQvc/Lm+S9cJCLwINVg7NOXXgnSfY3k+Q=
-----END X509 CERTIFICATE-----

4. Okay, now we have to tell both hosts to use IPsec when they talk to each other:

$ echo "{laddr gandalf raddr theoden} ipsec \
  {auth_algs any encr_algs any sa shared}" \
  >> /etc/inet/ipsecinit.conf

5. This translates to: when I'm speaking to theoden, I have to encrypt the data and can use any negotiated and available encryption algorithm and any negotiated and available authentication algorithm. Such a rule is only valid in one direction. Thus we have to define the opposite direction on the other host to enable bidirectional traffic:

$ echo "{laddr theoden raddr gandalf} ipsec \
  {auth_algs any encr_algs any sa shared}" \
  >> /etc/inet/ipsecinit.conf

6. Okay, the next configuration file is a little bit more complex. Go into the directory /etc/inet/ike and create a file config with the following content:

cert_trust "10.211.55.200"
cert_trust "10.211.55.201"

p1_xform { auth_method preshared oakley_group 5 auth_alg sha encr_alg des }
p2_pfs 5

{
  label "DE-theoden to DE-gandalf"
  local_id_type dn
  local_id "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=theoden"
  remote_id "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=gandalf"
  local_addr 10.211.55.200
  remote_addr 10.211.55.201
  p1_xform
  {auth_method rsa_sig oakley_group 2 auth_alg md5 encr_alg 3des}
}

7. Okay, we are almost done. But there is still a missing, yet very essential, step when you want to use certificates: we have to distribute the certificates of the systems.

$ ikecert certdb -l
Certificate Slot Name: 0   Key Type: rsa
        (Private key in certlocal slot 0)
        Subject Name:
        Key Size: 1024
        Public key hash: 28B08FB404268D144BE70DDD652CB874

At the beginning there is only the local key in the system. We have to import the key of the remote system. Do you remember the output beginning with -----BEGIN X509 CERTIFICATE----- and ending with -----END X509 CERTIFICATE-----? You need this output now.

8. The next command won't come back after you hit return. You have to paste in the key. On gandalf you paste the output of the key generation on theoden; on theoden you paste the output of the key generation on gandalf. Let's import the key on gandalf:

$ ikecert certdb -a
-----BEGIN X509 CERTIFICATE-----
MIICOzCCAaSgAwIBAgIFAIRuR5QwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
UHJ4P6Z0dtjnToQb37HNq9YWFRguSsPQvc/Lm+S9cJCLwINVg7NOXXgnSfY3k+Q=
-----END X509 CERTIFICATE-----
[root@gandalf:/etc/inet/ike]$

9. After pasting, you have to hit Enter once and after this you press Ctrl-D once. Now we check for the successful import. You will see two certificates now.

$ ikecert certdb -l
Certificate Slot Name: 0   Key Type: rsa
        (Private key in certlocal slot 0)
        Subject Name:
        Key Size: 1024
        Public key hash: 28B08FB404268D144BE70DDD652CB874
Certificate Slot Name: 1   Key Type: rsa
        Subject Name:
        Key Size: 1024
        Public key hash: 76BE0809A6CBA5E06219BC4230CBB8B8

10. Okay, switch to theoden and import the key from gandalf on this system.

$ ikecert certdb -l
Certificate Slot Name: 0   Key Type: rsa
        (Private key in certlocal slot 0)
        Subject Name:
        Key Size: 1024
        Public key hash: 76BE0809A6CBA5E06219BC4230CBB8B8

$ ikecert certdb -a
-----BEGIN X509 CERTIFICATE-----
MIICOzCCAaSgAwIBAgIFAJRpUUkwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
oi4dO39J7cSnooqnekHjajn7ND7T187k+f+BVcFVbSenIzblq2P0u7FIgIjdlv0= -----END X509 CERTIFICATE----$ ikecert certdb -l Certificate Slot Name: 0 Key Type: rsa (Private key in certlocal slot 0) Subject Name: Key Size: 1024 Public key hash: 76BE0809A6CBA5E06219BC4230CBB8B8 Certificate Slot Name: 1 Key Type: rsa Subject Name: Key Size: 1024 Public key hash: 28B08FB404268D144BE70DDD652CB874 11.Okay, now we have to activate this configuration on both systems: $ svcadm enable ike $ ipsecconf -a /etc/inet/ipsecinit.conf
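To confirm the tunnel is actually negotiating and encrypting, a minimal hedged check is to generate traffic between the two hosts and watch it with snoop (the interface name below is a placeholder; substitute the interface that carries the 10.211.55.x traffic):

gandalf# ping theoden
gandalf# snoop -d e1000g0 host theoden

You should see ESP datagrams between the two hosts rather than cleartext payloads.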
3. Edit /etc/apache2/httpd.conf Set ServerName if necessary (default is 127.0.0.1) Set ServerAdmin to a valid email address 4. Enable Apache2 # svcadm enable apache2 5. Enable SSL Service Property if necessary. Log in as root and issue the following command: # svcprop -p httpd/ssl svc:network/http:apache2 If the response is false, issue these three commands: a. # svccfg -s http:apache2 setprop httpd/ssl=true b. # svcadm refresh http:apache2 c. # svcprop -p httpd/ssl svc:network/http:apache2 If the response is true, continue to the next step.
6. Create a Certificate Directory and a Key Directory. # mkdir /etc/apache2/ssl.crt # mkdir /etc/apache2/ssl.key 7. Generate a RSA Key. # /usr/local/ssl/bin/openssl genrsa -des3 1024 > \ /etc/apache2/ssl.key/server.key Generating RSA private key, 1024 bit long modulus ..++++++ ++++++ e is 65537 (010001) Enter pass phrase: ******** Verifying - Enter pass phrase: ******** 8. Generate a Certificate Request. # /usr/local/ssl/bin/openssl req -new -key /etc/apache2/ssl.key/server.key \ > /etc/apache2/ssl.crt/server.csr Enter pass phrase for /etc/apache2/ssl.key/server.key: ******** You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter ., the field will be left blank. Country Name (2 letter code) [US]::US State or Province Name (full name) [Some-State]:OR Locality Name (eg, city) []:Blodgett Organization Name (eg, company) [Unconfigd OpenSSL Installation]:DIS Organizational Unit Name (eg, section) []:IT Common Name (eg, YOUR name) []:Big Cheese Email Address []:[email protected] Please enter the following extra attributes to be sent with your certificate request A challenge password []: ******** An optional company name []: Live Free or Die 9. Install a Self-Signed Certificate. If you are going to install a certificate from an authoritative source, follow their instructions and skip this step. # /usr/local/ssl/bin/openssl req -x509 -days 3650 -key \ > /etc/apache2/ssl.key/server.key \ > -in /etc/apache2/ssl.crt/server.csr > \
> /etc/apache2/ssl.crt/server.crt Enter pass phrase for /etc/apache2/ssl.key/server.key: ******** 10.Edit the ssl.conf and change the line that begins with ServerAdmin to reflect an email address or alias for the Servers Administrator. 11.Test the SSL Certificate with Apache2 If Apache2 is enabled, disable it during testing: # svcadm disable apache2 12.Enable Apache2 with SSL to be started automatically as a service. # cd /etc/apache2/ssl.key # cp server.key server.key.org # /usr/local/ssl/bin/openssl rsa -in server.key.org -out server.key Enter pass phrase for server.key.org: ******** writing RSA key # chmod 400 server.key # svcadm enable apache2 # svcs | grep -i apache2 online 4:29:01 svc:/network/http:apache2
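One hedged way to test the certificate once Apache2 is back online is a direct TLS handshake against the listener, using the same OpenSSL toolkit used above (the path matches the earlier commands; inspect the subject and issuer fields it prints):

# /usr/local/ssl/bin/openssl s_client -connect localhost:443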
# usermod -A solaris.admin.logsvc.read user_account 4. Converting root to a role and adding access to root role to a user Fundamentals - login as a user and assume root; then modify the root account as type role and add the root role to a user; test with fresh login before logging out $ su - # usermod -K type=role root # usermod -R root useraccount remote> ssh useraccount@host_with_root_role_config $ su - root #
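A quick, hedged way to double-check the change before logging out of the existing session (roles(1) and /etc/user_attr are standard Solaris facilities; useraccount is the placeholder name used above):

$ roles useraccount          # should list root among the user's roles
$ grep '^root:' /etc/user_attr   # root entry should now show type=role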
# rm /etc/rc3.d/S76snmpdx # rm /etc/rc3.d/S90samba # Review /etc/rc2.d/S90* for deletion 2. Set Up Zone and Audit ZFS Pools Unused Disk List 36GB Disk c0t2d1 36GB Disk c1t2d1 # zpool create zones c0t2d1 # zfs create zones/secftp # zfs create zones/ftp-root [Must run ftpconfig before setting mountpoint legacy] # ftpconfig -d /zones/ftp-root # mkdir /zones/ftp-root/incoming # chown go-r /zones/ftp-root/incoming # zfs set mountpoint=legacy zones/ftp-root
# chmod 700 zones/secftp
# zpool create bsm c1t2d1
# zfs create bsm/audit

3. Configure Role for Primary Maintenance

# mkdir /export/home
# groupadd -g 2000 secadm
# useradd -d /export/home/secuser -m secuser
# passwd secuser
# roleadd -u 2000 -g 2000 -d /export/home/secadm -m secadm
# passwd secadm
# rolemod -P "Primary Administrator","Basic Solaris User" secadm
# usermod -R secadm secuser
# svcadm restart system/name-service-cache
## logout of root, login as secuser
# su - secadm

4. Change Root User to Root Role

Fundamentals - login as a user and assume root; then modify the root account as type role and add the root role to a user; test with a fresh login before logging out.

$ su
# usermod -K type=role root
# useradd -d /home/padmin -m -g 2000 padmin
# passwd padmin
# usermod -R root padmin
5. Install BSM on Global Server

# cd /etc/security
## edit audit_control and change the dir:/var/audit to /bsm/audit
## Run the following command; you will need to reboot.
# ./bsmconv

6. Create Zone secftp

# zonecfg -z secftp
secftp: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:secftp> create
zonecfg:secftp> set zonepath=/zones/secftp
zonecfg:secftp> set autoboot=false
zonecfg:secftp> add fs
zonecfg:secftp:fs> set type=zfs
zonecfg:secftp:fs> set special=zones/ftp-root
zonecfg:secftp:fs> set dir=/ftp-root
zonecfg:secftp:fs> end
zonecfg:secftp> add net
zonecfg:secftp:net> set address=192.168.15.97
zonecfg:secftp:net> set physical=pcn0
zonecfg:secftp:net> end
zonecfg:secftp> add attr
zonecfg:secftp:attr> set name=comment
zonecfg:secftp:attr> set type=string
zonecfg:secftp:attr> set value="Secure FTP Zone"
zonecfg:secftp:attr> end
zonecfg:secftp> verify
zonecfg:secftp> commit
zonecfg:secftp> exit

# zoneadm -z secftp verify
# zoneadm -z secftp install
# zoneadm -z secftp boot
# zlogin -C secftp
[Connected to zone 'secftp']
Enter Requested Setup Information
[Notice Zone Rebooting]
secftp console login: root
# passwd root

7. Disable Unwanted Network Services in Local Zone

# svcadm disable sendmail
# svcadm disable rusers
# svcadm disable telnet
# svcadm disable rlogin
# svcadm disable rstat
# svcadm disable finger
# svcadm disable kshell
# svcadm disable network/shell:default
# svcadm disable snmpdx
# rm /etc/rc3.d/S76snmpdx # rm /etc/rc3.d/S90samba ## Review /etc/rc2.d/S90* for deletion 8. Add a user for secure ftp access [create same accounts and role changes as in global - you can set these to different names if you like] /etc/passwd: secxfr:x:2002:1::/ftp-root/./incoming:/bin/true # pwconv # passwd secxfr # set ot secxfr # Add /bin/true to /etc/shells # configure /etc/ftpd/ftpaccess
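Step 8 ends by pointing at /etc/ftpd/ftpaccess without showing it. A minimal sketch of the sort of directives involved, assuming the wu-ftpd style ftpaccess syntax shipped with Solaris 10 — treat the exact directives and modes as assumptions and verify against ftpaccess(4):

# /etc/ftpd/ftpaccess (sketch; names match the secxfr account above)
class     guests   guest   *
guestuser secxfr
# allow uploads into the incoming area only, and block downloads from it
upload    /ftp-root  /incoming  yes  secxfr  other  0640  nodirs
noretrieve /ftp-root/incoming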
Trusted Extensions
1. Fundamentals TX places classification, and compartment wrappers around Non-Global Zones and defines what systems can communicate with those zones a. Classification vs Compartment Classification is hierarchal level of security - TS , Confidential / Clearance / Sensitivity Label Compartment is sub groups - Devel, Management, b. Key Files for Trusted Extensions Site labels: defined in /etc/security/tsol/label_encodings Matching zones to labels: in /etc/security/tsol/tnzonecfg Network to label matching: in /etc/security/tsol/tnrhtp Defining network labels: in /etc/security/tsol/tnrhdb 2. Basic TX Configuration Make sure no non-global zones are configured or installed; Non-Global zones need to be mapped to a clearance and category before installation; these example content files will configure a host for three non-global zones; one for public "web like" features, one for internal host-to-host from non-labeled systems and one for secure tx to tx systems - labels are public, confidential and restricted. a. Check /etc/user_attr to make sure your root and root role account has the following access levels min_label=admin_low;clearance=admin_high b. Example label_encodings file Very primitive /etc/security/tsol/label_encodings file requiring only three non-global zones: VERSION= Sun Microsystems, Inc. Example Version - 6.0. 2/15/05 CLASSIFICATIONS: name= PUBLIC; sname= PUB; value= 2; initial compartments= 4; name= CONFIDENTIAL; sname= CNF; value= 4; initial compartments= 4; name= RESTRICTED; sname= RES; value= 10; initial compartments= 4; INFORMATION LABELS: WORDS: REQUIRED COMBINATIONS: COMBINATION CONSTRAINTS: SENSITIVITY LABELS:
WORDS:
REQUIRED COMBINATIONS:
COMBINATION CONSTRAINTS:

CLEARANCES:
WORDS:
REQUIRED COMBINATIONS:
COMBINATION CONSTRAINTS:

CHANNELS:
WORDS:

PRINTER BANNERS:
WORDS:

ACCREDITATION RANGE:
classification= PUB; all compartment combinations valid;
classification= RES; all compartment combinations valid;
classification= CNF; all compartment combinations valid except: CNF
minimum clearance= PUB;
minimum sensitivity label= PUB;
minimum protect as classification= PUB;

*
* Local site definitions and locally configurable options.
*
LOCAL DEFINITIONS:
Default User Sensitivity Label= PUB;
Default User Clearance= PUB;
Default Label View is Internal;

COLOR NAMES:
label= Admin_Low;   color= #bdbdbd;
label= PUB;         color= blue violet;
label= RES;         color= red;
label= CNF;         color= yellow;
label= Admin_High;  color= #636363;
* End of local site definitions *

c. Set netservices to limited

# netservices limited

d. Update /etc/security/tsol/tnrhdb to include local interfaces as type cipso

# CIPSO - who is a TX System
127.0.0.1:cipso
192.168.15.78:cipso
192.168.15.94:cipso
#
# ADMIN_LOW - what servers that are not TX can talk to my global
192.168.15.1:admin_low     # DNS Server
192.168.15.100:admin_low   # Management Server
#
# SSH Allowed Remote
192.168.15.79:extranet
192.223.207.0:extranet
#
# All others can view my web site zone, but that is all.
0.0.0.0:world

e. Update /etc/security/tsol/tnrhtp to define CIPSO connections and force a label for non-labeled host connections

Note that this file uses "\" to shorten the lines for pdf output; remove them before using.

# Default for locally plumbed interfaces
cipso:host_type=cipso;doi=1;min_sl=ADMIN_LOW;max_sl=ADMIN_HIGH;
#
admin_low:host_type=unlabeled;doi=1;\
   min_sl=ADMIN_LOW;max_sl=ADMIN_HIGH;def_label=ADMIN_LOW;
extranet:host_type=unlabeled;doi=1;\
   min_sl=RESTRICTED;max_sl=ADMIN_HIGH;def_label=RESTRICTED;
world:host_type=unlabeled;doi=1;\
   min_sl=PUBLIC;max_sl=ADMIN_HIGH;def_label=PUBLIC;

f. Mapping the non-global zones to a LABEL is done in /etc/security/tsol/tnzonecfg

#
global:ADMIN_LOW:1:111/tcp;111/udp;515/tcp;\
   631/tcp;2049/tcp;6000-6003/tcp:6000-6003/tcp
pub-tx01:0x0002-08-08:0::
restricted-tx01:0x000a-08-08:0::

g. Enable TX Services

# svcadm enable labeld
# svcadm enable tnd
# svcadm enable tsol-zones
# svcadm enable tname
# txzonemgr 3. Permission and Access Control within TX and Non TX Zones TX places classification, and compartment wrappers around Non-Global Zones and defines what systems can communicate with those zones a. Allowing user upgrade information - should the labeled zone allow it. Information stored in /etc/ user_attr auths=solaris.label.file.upgrade defaultprivs=sys_trans_label,file_upgrade_sl b. Allowing user downgrade information - should the labeled zone allow it. Information stored in / etc/user_attr auths=solaris.label.file.downgrade defaultprivs=sys_trans_label,file_downgrade_sl c. Preventing user from seeing processes beyond the users ownership. Information stored in /etc/ user_attr defaultprivs=basic,!proc_info d. Combination of restrictions. Information stored in /etc/user_attr user::::auths=solaris.label.file.upgrade,\ solaris.label.file.downgrade;type=normal;\ defaultpriv=basic,!proc_info,sys_trans_label,\ file_upgrade_sl,file_downgrade_sl;\ clearance=admin_high;min_label=admin_low e. Paring priv limitations and expansion of features with non-global zone configuration zonecfg -z zone-name set limitpriv=default,file_downgrade_sl,\ file_upgrade_sl,sys_trans_label exit
Sockets   Processor             Threads
2         UltraSPARC T2 Plus    128
2         UltraSPARC T2 Plus    128
4         UltraSPARC T2 Plus    256
1         UltraSPARC T2         64
1         UltraSPARC T2         64
1         UltraSPARC T2         64
1         UltraSPARC T1         32
1         UltraSPARC T1         32
1         UltraSPARC T1         32
1         UltraSPARC T1         32
1         UltraSPARC T1         32
Copyright 2008 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Installation of <SUNWldm> was successful. pkgadd -n -d "/export/home/rlb/LDoms_Manager-1_1/Product" -a pkg_admin SUNWjass Copyright 2005 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Installation of <SUNWjass> was successful. Verifying that all packages are fully installed. OK. Enabling services: svc:/ ldoms/ldmd:default Solaris Security Toolkit was not applied. Bypassing the use of the Solaris Security Toolkit is _not_ recommended and should only be performed when alternative hardening steps are to be taken. You have new mail in /var/mail/root
Create DOM1
# svcadm enable vntsd
# ldm add-domain dom1
# ldm add-vcpu 8 dom1
# ldm add-memory 2048m dom1
# ldm add-vnet pub0 primary-vsw0 dom1
# ldm add-vnet isan0 primary-vsw1 dom1
Create LDOM #2
# ldm add-domain dom2
# ldm add-vcpu 8 dom2
# ldm add-memory 2048m dom2
# ldm add-vnet pub0 primary-vsw0 dom2
# ldm add-vdiskserverdevice /dev/rdsk/c1t66d0s2 vol2@primary-vds0
# ldm add-vdisk vdisk0 vol2@primary-vds0 dom2
# ldm add-vdisk iso iso@primary-vds0 dom2
# ldm set-variable auto-boot\?=false dom2
# ldm bind dom2
# ldm start dom2
LDom dom2 started
# telnet localhost 5001 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. Connecting to console "dom2" in group "dom2" .... {0} ok boot iso // Continue as with LDOM#1
different guest domains. When a virtual disk backend is exported multiple times, it should not be exported with the exclusive (excl) option. Specifying the excl option will only allow exporting the backend once. Caution - When a virtual disk backend is exported multiple times, applications running on guest domains and using that virtual disk are responsible for coordinating and synchronizing concurrent write access to ensure data coherency. Export the virtual disk backend two times from a service domain by using the following commands. Note the "-f" that forces the second device to be defined. Without the "-f" the second command will fail reporting that the share must be "read only".
# ldm add-vdsdev [options={ro,slice}] backend volume1@service_name # ldm add-vdsdev -f [options={ro,slice}] backend volume2@service_name
Assign the exported backend to each guest domain by using the following commands.
# ldm add-vdisk [timeout=seconds] disk_name volume1@service_name ldom1 # ldm add-vdisk [timeout=seconds] disk_name volume2@service_name ldom2 Example: note that SVM was tested, but LDOM's would not recognize the disks
# zfs create -V 1g shared/fence0 # zfs create -V 1g shared/fence1 # zfs create -V 1g shared/fence2 # ldm add-vdsdev /dev/zvol/rdsk/shared/fence0 \ vsrv1_fence0@primary-vds0 # ldm add-vdsdev -f /dev/zvol/rdsk/shared/fence0 \ vsrv2_fence0@primary-vds0 # ldm add-vdsdev /dev/zvol/rdsk/shared/fence1 \ vsrv1_fence1@primary-vds0 # ldm add-vdsdev -f /dev/zvol/rdsk/shared/fence1 \ vsrv2_fence1@primary-vds0 # ldm add-vdsdev /dev/zvol/rdsk/shared/fence2 \ vsrv1_fence2@primary-vds0 # ldm add-vdsdev -f /dev/zvol/rdsk/shared/fence2 \ vsrv2_fence2@primary-vds0
# ldm add-vdisk fence0 vsrv1_fence0@primary-vds0 vsrv1 # ldm add-vdisk fence1 vsrv1_fence1@primary-vds0 vsrv1 # ldm add-vdisk fence2 vsrv1_fence2@primary-vds0 vsrv1
# ldm add-vdisk fence1 vsrv2_fence1@primary-vds0 vsrv2
# ldm add-vdisk fence2 vsrv2_fence2@primary-vds0 vsrv2
# ldm bind vsrv1
# ldm bind vsrv2
# ldm list
NAME      STATE    FLAGS    VCPU   MEMORY   UTIL   UPTIME
primary   active   -n-cv-   8      3968M    0.2%   47m
vsrv1     bound    ------   4      2G
vsrv2     bound    ------   4      2G
primary-vnts-group1: h, l, c{id}, n{name}, q: l
DOMAIN ID   DOMAIN NAME   DOMAIN STATE
0           ldg1          online
1           ldg2          online
2           ldg3          online
#!/bin/sh
DOM=$1
date
echo "Starting AutoDom"

## LDOM/dom3@primary is a clean OS snapshot used as a baseline
## create clone of snapshot
zfs clone LDOM/dom3@primary LDOM/${DOM}

## mount disk image for updating
lofiadm -a /LDOM/$DOM/vdisk0.img
mount /dev/lofi/1 /mnt

## update /etc/hosts, /etc/inet/ipnodes,
## /etc/hostname.vnet0 and /etc/nodename
echo "# AutoDom Generated hosts file" >/mnt/etc/hosts
echo '::1 localhost' >>/mnt/etc/hosts
echo '127.0.0.1 localhost' >>/mnt/etc/hosts
grep $DOM /etc/inet/ipnodes | awk '{print $1, $2, "loghost"}' \
    >>/mnt/etc/hosts

# updating ipnodes should be redundant, but just in case
echo "# AutoDom Generated inet/ipnodes file" \
    >/mnt/etc/inet/ipnodes
echo '::1 localhost' >>/mnt/etc/inet/ipnodes
echo '127.0.0.1 localhost' >>/mnt/etc/inet/ipnodes
grep $DOM /etc/hosts | awk '{print $1, $2, "loghost"}' \
    >>/mnt/etc/inet/ipnodes
echo "$DOM" >/mnt/etc/nodename
echo "$DOM" >/mnt/etc/hostname.vnet0

sync
umount /mnt
lofiadm -d /dev/lofi/1
# Create the LDOM
ldm add-domain $DOM
ldm add-vcpu 4 $DOM
ldm add-mau 0 $DOM
ldm add-memory 1G $DOM
ldm add-vdiskserverdevice /LDOM/$DOM/vdisk0.img \
    ${DOM}vdisk0@primary-vds0
ldm add-vdisk ${DOM}vdisk0 ${DOM}vdisk0@primary-vds0 $DOM
ldm add-vnet vnet0 primary-vsw0 $DOM
ldm set-variable auto-boot\?=false $DOM
ldm set-variable local-mac-address\?=true $DOM
ldm set-variable \
    boot-device=/virtual-devices@100/channel-devices@200/disk@0 \
    $DOM
ldm bind-domain $DOM
# All ready to boot as new image
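A hedged usage example for the script above (the script name autodom.sh and the console port number are assumptions; the actual console port is whatever ldm list reports after the bind):

# ./autodom.sh dom4
# ldm start dom4
# ldm list | grep dom4        # note the CONS port, e.g. 5003
# telnet localhost 5003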
/etc/VRTSvcs/conf/config/main.cf: group dom2 ( SystemList = { primary-dom1 = 0 } ) LDom ldom_dom2 ( LDomName = dom2 CfgFile = /etc/ldoms/dom2.xml )
View of ldm list when VCS LDOM Agent has been started

bash-3.00# ldm list
NAME      STATE    FLAGS    VCPU
primary   active   -n-cv-   8
dom1      active   -t----   8
dom2      active   -t----   8
View of ldm list when VCS LDOM Agent has been stopped
# ldm list
NAME      STATE      FLAGS    UTIL   UPTIME
primary   active     -n-cv-   0.4%   18m
dom1      inactive   ------
dom2      inactive   ------
Adjusting Number of CPU's in LDOM via LDom Agent

# ldm list
NAME      STATE      FLAGS    UTIL   UPTIME
primary   active     -n-cv-   0.4%   18m
dom1      inactive   ------
dom2      inactive   ------
# haconf -makerw
# hares -modify ldom_dom1 NumCPU 4
# haconf -dump -makero

# ldm list
NAME      STATE      FLAGS    UTIL   UPTIME
primary   active     -n-cv-   0.4%   18m
dom1      inactive   ------
dom2      inactive   ------
# hagrp -online dom1 -sys dom0

# ldm list
NAME      STATE      FLAGS    UPTIME
primary   active     -n-cv-   18m
dom1      active     -t----   1s
dom2      inactive   ------
Interaction between setting vCPU number in LDom Agent and CLI

# ldm set-vcpu 8 dom1
# ldm list
NAME      STATE      FLAGS    CONS   VCPU   UPTIME
primary   active     -n-cv-   SP     8      26m
dom1      active     -n----   5000   8      4m
dom2      inactive   ------          8
# hares -display ldom_dom1 -attribute NumCPU
#Resource    Attribute   System   Value
ldom_dom1    NumCPU      global   4

# hagrp -offline dom1 -sys dom0
### Note lack of VCPU definition on dom1 ###
# ldm list
NAME      STATE      FLAGS    UTIL   UPTIME
primary   active     -n-cv-   0.4%   31m
dom1      inactive   ------
dom2      inactive   ------
# hagrp -online dom1 -sys dom0
### System reverts back to NumCPU set in VCS ###

# ldm list
NAME      STATE      FLAGS
primary   active     -n-cv-
dom1      active     -t----
dom2      inactive   ------

### Additional Comments - dom1.xml never gets updated, ###
### so it is still set to 8 CPUs                       ###
Warning
When an LDOM uses a ZFS raw volume instead of a mkfile image on a ZFS file system, the Zpool Agent for VCS will attempt to mount and check the volume. Being a raw volume, this will cause the agent to fail. To avoid this, use the ChkZFSMounts 0 option.
Note
The LDOM XML File is generated by the # ldm ls-constraints -x dom1 >/etc/ldoms/dom1.xml command; make the /etc/ldoms directory on both servers first; create the xml file, then copy to both servers.
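A minimal sketch of that note as commands, run from the node that currently owns the domain (remote-node is a placeholder for the other cluster member):

# mkdir -p /etc/ldoms                              # repeat on both servers
# ldm ls-constraints -x dom1 > /etc/ldoms/dom1.xml
# scp /etc/ldoms/dom1.xml remote-node:/etc/ldoms/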
# hagrp -add LDOM
# hagrp -modify LDOM SystemList sys1 0 sys2 1
# hagrp -modify LDOM AutoStartList sys1

# hares -add ldom_zp Zpool LDOM
# hares -modify ldom_zp PoolName rapid_d
# hares -modify ldom_zp AltRootPath /
# hares -modify ldom_zp ChkZFSMounts 0
# hares -modify ldom_zp Enabled 1
# hares -add dom1_ldm LDom LDOM
# hares -modify dom1_ldm CfgFile /etc/ldoms/dom1.xml
# hares -modify dom1_ldm NumCPU 4
# hares -modify dom1_ldm LDomName dom1
# hares -link dom1_ldm ldom_zp
# ldm list
NAME      STATE    FLAGS
primary   active   -n-cv-
wanboot   active   -n----
b. Shutdown LDOM # ldm stop wanboot c. Generate LDOM XML Constraints File and copy to remote server
# ldm ls-constraints -x wanboot >/root/wanboot.xml # scp /root/wanboot.xml root@remote:/root/ d. Unbind Source LDOM Domain
c. Bind LDOM
Warning
Documentation on the OpenSolaris web site uses different options to the virt-install command. Options displayed on the web site will not work, and are not available, on 2009.06.

1. Create a back-end zvol for installation
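The zvol creation itself is not shown here; a minimal sketch, assuming the vstorage pool and dataset layout referenced by the virt-install command below (18 GB to match --file-size=18):

# zfs create -p vstorage/guests/svsrv2
# zfs create -V 18g vstorage/guests/svsrv2/rootdisk0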
# virt-install --vnc -v --ram 2048 --file-size=18 \ --name svsrv2 -f /dev/zvol/dsk/vstorage/guests/svsrv2/rootdisk0 \ --bridge=nge0 --vcpus=4 -c /vstorage/iso/sol-10-u7-ga-x86-dvd.iso root@x2200:~# virsh vncdisplay svsrv2 :0 root@x2200:~# vncviewer localhost:0
Live Upgrade is the recommended program to upgrade and to add patches. Other upgrade programs might require extensive upgrade time, because the time required to complete the upgrade increases linearly with the number of installed non-global zones. If you are patching a system with Solaris Live Upgrade, you do not have to take the system to single-user mode and you can maximize your system's uptime. The following list summarizes changes to accommodate systems that have non-global zones installed. A new package, SUNWlucfg, is required to be installed with the other Solaris Live Upgrade packages, SUNWlur and SUNWluu. This package is required for any system, not just a system with non-global zones installed. Creating a new boot environment from the currently running boot environment remains the same as in previous releases with one exception. You can specify a destination disk slice for a shared file system within a non-global zone. For more information, see Creating and Upgrading a Boot Environment When Non-Global Zones Are Installed (Tasks). The lumount command now provides non-global zones with access to their corresponding file systems that exist on inactive boot environments. When the global zone administrator uses the lumount command to mount an inactive boot environment, the boot environment is mounted for non-global zones as well. See Using the lumount Command on a System That Contains Non-Global Zones. Comparing boot environments is enhanced. The lucompare command now generates a comparison of boot environments that includes the contents of any non-global zone. See To Compare Boot Environments for a System With Non-Global Zones Installed. Listing file systems with the lufslist command is enhanced to list file systems for both the global zone and the non-global zones. See To View the Configuration of a Boot Environment's Non-Global Zone File Systems. Upgrading and Patching Containers with Live Upgrade Solaris 10 8/07 adds the ability to use Live Upgrade tools on a system with Containers. This makes it possible to apply an update to a zoned system, e.g. updating from Solaris 10 11/06 to Solaris 10 8/07. It also drastically reduces the downtime necessary to apply some patches. The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant. Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system. An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated. Understanding Solaris Zones and Solaris Live Upgrade The Solaris Zones partitioning technology is used to virtualize operating system services and provide an isolated and secure environment for running applications. A non-global zone is a virtualized operating system environment created within a single instance of the Solaris OS, the global zone. When you create a non-global zone, you produce an application execution environment in which processes are isolated from the rest of the system.
Solaris Live Upgrade is a mechanism to copy the currently running system onto new slices. When non-global zones are installed, they can be copied to the inactive boot environment along with the global zone's file systems.

In this example of a system with a single disk, the root (/) file system is copied to c0t0d0s4. All non-global zones that are associated with the file system are also copied to s4. The /export and /swap file systems are shared between the current boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate command is the following:

# lucreate -c bootenv1 -m /:/dev/dsk/c0t0d0s4:ufs -n bootenv2

In this example of a system with two disks, the root (/) file system is copied to c0t1d0s0. All non-global zones that are associated with the file system are also copied to s0. The /export and /swap file systems are shared between the current boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate command is the following:

# lucreate -c bootenv1 -m /:/dev/dsk/c0t1d0s0:ufs -n bootenv2

In this example of a system with a single disk, the root (/) file system is copied to c0t0d0s4. All non-global zones that are associated with the file system are also copied to s4. The non-global zone, zone1, has a separate file system that was created by the zonecfg add fs command. The zone path is /zone1/root/export. To prevent this file system from being shared by the inactive boot environment, the file system is placed on a separate slice, c0t0d0s6. The /export and /swap file systems are shared between the current boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate command is the following:

# lucreate -c bootenv1 -m /:/dev/dsk/c0t0d0s4:ufs \
  -m /export:/dev/dsk/c0t0d0s6:ufs:zone1 -n bootenv2

In this example of a system with two disks, the root (/) file system is copied to c0t1d0s0. All non-global zones that are associated with the file system are also copied to s0. The non-global zone, zone1, has a separate file system that was created by the zonecfg add fs command. The zone path is /zone1/root/export. To prevent this file system from being shared by the inactive boot environment, the file system is placed on a separate slice, c0t1d0s4. The /export and /swap file systems are shared between the current boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate command is the following:

# lucreate -c bootenv1 -m /:/dev/dsk/c0t1d0s0:ufs \
  -m /export:/dev/dsk/c0t1d0s4:ufs:zone1 -n bootenv2
pool: yourpool inherit-pkg-dir: dir: /lib inherit-pkg-dir: dir: /platform inherit-pkg-dir: dir: /sbin inherit-pkg-dir: dir: /usr Veritas Upgrading when the zone root is on Veritas File System shared storage The following procedures are to make one active non-global zone upgradeable with the zone root on shared storage. The corresponding non-global zones on the other nodes in the cluster are then detached from shared storage. They are detached to prevent them from being upgraded one at a time. 1. Stopping the cluster and upgrading nodeA # hastop -all 2. On nodeA, bring up the volumes and the file systems that are related to the zone root.
Note
For a faster upgrade, you can boot the zones to bring them into the running state.

# hastop -all

3. Use the patchadd command to upgrade nodeA.

# patchadd nnnnnn-nn
# patchadd xxxxxx-xx
.
.

4. Detaching the zones on nodeB - nodeN

Use a mount point as a temporary zone root directory. You then detach the non-global zones in the cluster that are in the installed state. Detach them to prevent the operating system from trying to upgrade these zones and failing. - This is from the Veritas docs; not sure about the process. I recommend detaching on the alternate global zones, but I don't think the fake filesystem is needed as long as the non-global zone is patched on the original host. More work is needed should zone failover be a requirement for rolling upgrades; this could be a possible "upgrade on attach" condition - not supported by the VCS Zone Agent yet.
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/platform
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/sbin
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/usr
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/opt/swf
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> verify
zonecfg:myzone> export
ZONE_IS_DEAD
} zone_status_t;

Dtrace code - can be run via cron with output to a monitored file

#!/usr/sbin/dtrace -qs

BEGIN
{
    state[0] = "Uninitialized";
    state[1] = "Ready";
    state[2] = "Booting";
    state[3] = "Running";
    state[4] = "Shutting down";
    state[5] = "Empty";
    state[6] = "Down";
    state[7] = "Dying";
    state[8] = "Dead";
}

zone_status_set:entry
{
    printf("Zone %s status %s\n", stringof(args[0]->zone_name),
        state[args[1]]);
}

Example output of dtrace code above

# ./zonestatus.d
Zone aap status Ready
Zone aap status Booting
Zone aap status Running
Zone aap status Shutting down
Zone aap status Down
Zone aap status Empty
Zone aap status Dying
Zone aap status Ready
Zone aap status Dead
Zone aap status Booting
Zone aap status Running
Zone aap status Shutting down
Zone aap status Empty
Zone aap status Down
Zone aap status Dead
Used when a zone will not attach due to manifest incompatibilities such as missing patches. Buyer beware.

# zoneadm -z inactive_local_zonename attach -F

b. Detach non-global zone

# zoneadm -z inactive_local_zonename detach

c. Dry Run for attach and detach

# zoneadm -z my-zone detach -n
# zoneadm -z my-zone attach -n

d. Dry Run to see if a non-global zone can be moved from one system to another

# zoneadm -z myzone detach -n | ssh remote zoneadm attach -n

e. Update on Attach

Can be used during round-robin upgrades or moving from one architecture to another.

# zoneadm -z my-zone attach -u

f. Verbose Non-Global Zone boot

# zoneadm boot -- -m verbose

g. Importing a Non-Global Zone on a host without the zone.xml/index definition

Host1# zoneadm -z myzone halt
Host1# zoneadm -z myzone detach
[move storage to host2]
Host2# zonecfg -z myzone "create -F -a /zone/myzone"
Host2# zoneadm -z myzone attach -u

2. Creating the ZFS Storage Pool for local zone installation

# zpool create zones c6t0d0
# zfs create zones/webzone
# chmod go-rwx /zones/webzone

3. Create Zone webzone

# zonecfg -z webzone
webzone: No such zone configured
Use 'create' to begin configuring a new zone
zonecfg:webzone> create
zonecfg:webzone> set zonepath=/zones/webzone
zonecfg:webzone> exit
# zoneadm -z webzone install
# zoneadm -z webzone boot
# zlogin -e @. -C webzone
## Finish the sysid questions
4. Defining default Non-Global Zone Boot Mode

global# zonecfg -z myzone
zonecfg:myzone> set bootargs="-m verbose"
zonecfg:myzone> exit

5. Exclusive IP Mode

global# zonecfg -z myzone
zonecfg:myzone> set ip-type=exclusive
zonecfg:myzone> add net
zonecfg:myzone:net> set physical=bge1
zonecfg:myzone:net> end
zonecfg:myzone> exit

6. Cap Memory for a Non-Global Zone

global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit

7. Cap Swap for a Non-Global Zone

global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set swap=1g
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit

8. Swap Cap for running Non-Global Zone

global# prctl -n zone.max-swap -v 2g -t privileged \
    -r -e deny -i zone myzone

9. Shared Memory Cap for Non-Global Zone

global# zonecfg -z myzone
zonecfg:myzone> set max-shm-memory=100m
zonecfg:myzone> set max-shm-ids=100
zonecfg:myzone> set max-msg-ids=100
zonecfg:myzone> set max-sem-ids=100
zonecfg:myzone> exit
10.Dedicated CPUs Non-Global Zone After using that command, when that Container boots, Solaris: removes a CPU from the default pool assigns that CPU to a newly created temporary pool associates that Container with that pool, i.e. only schedules that Container's processes on that CPU Further, if the load on that CPU exceeds a default threshold and another CPU can be moved from another pool, Solaris will do that, up to the maximum configured amount of three CPUs. Finally, when the Container is stopped, the temporary pool is destroyed and its CPU(s) are placed back in the default pool. global# zonecfg -z myzone zonecfg:myzone> add dedicated-cpu zonecfg:myzone:dedicated-cpu> set ncpus=1-3
zonecfg:myzone:dedicated-cpu> end zonecfg:myzone> exit 11.Migration is done in the following stages: a. Primary system i. Halt the non-global zone # zlogin webzone init 0 ii. Detach the non-global zone # zoneadm -z webzone detach iii. Export the zfs pool used for the non-global zone # zpool export zones b. Failover System i. Import the zfs pool for the non-global zone # zpool import -d /dev/dsk zones ii. Create the zone XML configuration file # zonecfg -z webzone create -a /zones/webzone
iii. Attach the non-global zone # zoneadm -z webzone attach iv. Boot the non-global zone # zoneadm -z webzone boot
Table 5.3. VCS Command Line Access - Global vs. Non-Global Zones
Common Commands   Global Zone   Non-Global Zone
hastatus -sum     yes           yes
hares -state      yes           yes
Common Commands hagrp -state halogin hagrp -online/-offline hares -online/-offline hares -clear
ZONE=$1
SYS=`cat /var/VRTSvcs/conf/sysname`
INDEX=/etc/zones/index
ZONE_XML=/etc/zones/${ZONE}.xml

if [ ! -f $ZONE_XML ] ; then
   VCSAG_LOG_MSG "N" "ZONE: $ZONE Configuration file: \
      $ZONE_XML not found on $SYS. \
      Must run failover test before being considered \
      production ready" 1 "$ResName"
fi

STATE=`grep ^$ZONE':' $INDEX | awk '{print $2}'`

if [ -z "$STATE" ] ; then
   VCSAG_LOG_MSG "N" "ZONE: $ZONE is not in $INDEX, and \
      was never imported on $SYS. \
      Must run failover test before being considered production \
      ready" 1 "$ResName"
   # Exit offline
   exit 100
fi

case "$STATE" in
   running)
      # Zone is running
      exit 110
      ;;
   configured)
      # Zone imported but not running
      exit 100
      ;;
   installed)
      # Zone had been configured on this system, but is not
      # imported or running
      exit 100
      ;;
   *)
      ;;
esac

b. Zone StartProgram Script

#########################
## StartProgram
#########################
VCSHOME="${VCS_HOME:-/opt/VRTSvcs}"
. $VCSHOME/bin/ag_i18n_inc.sh

ZONE=$1
ZONE_HOME=$2

# This start program forces an attach on the zone, just
# in case the xml file is not updated
SYS=`cat /var/VRTSvcs/conf/sysname`

zonecfg -z $ZONE "create -F -a $ZONE_HOME"
S=$?
if [ $S -eq 0 ] ; then
   # Creation was a success, starting zone boot
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Success in attaching to system $SYS" 1 "$ResName"
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Starting Boot sequence on $SYS" 1 "$ResName"
   zoneadm -z $ZONE boot
   ZB=$?
   if [ $ZB -eq 0 ] ; then
      VCSAG_LOG_MSG "N" \
         "ZONE: $ZONE Boot command successful $SYS" 1 "$ResName"
   else
      VCSAG_LOG_MSG "N" \
         "ZONE: $ZONE Boot command failed on $SYS" 1 "$ResName"
   fi
else
   # Creation Failed
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Attach Command failed on $SYS" 1 "$ResName"
fi
c. Zone StopProgram Script

##########################
## StopProgram
##########################
VCSHOME="${VCS_HOME:-/opt/VRTSvcs}"
. $VCSHOME/bin/ag_i18n_inc.sh

ZONE=$1
ZONE_HOME=$2
SYS=`cat /var/VRTSvcs/conf/sysname`
INDEX=/etc/zones/index

VCSAG_LOG_MSG "N" "ZONE: $ZONE Shutting down $SYS" 1 "$ResName"

zlogin $ZONE init 0
ZSD=$?
if [ $ZSD -eq 0 ] ; then
   # Shutdown command sent successfully
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Success in zlogin shutdown $SYS" 1 "$ResName"
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Going through init 0 on $SYS, expect \
      normal shutdown delay" 1 "$ResName"
else
   # zlogin shutdown failed
   VCSAG_LOG_MSG "N" \
      "ZONE: $ZONE Failed zlogin shutdown command on $SYS" 1 "$ResName"
fi

STATE=`grep ^$ZONE':' $INDEX | awk '{print $2}'`
while [ "$STATE" == "running" ] ; do
   sleep 4
   STATE=`grep ^$ZONE':' $INDEX | awk '{print $2}'`
done

VCSAG_LOG_MSG "N" \
   "ZONE: $ZONE Detach In Progress on $SYS" 1 "$ResName"
zoneadm -z $ZONE detach
sleep 2

while [ "$STATE" == "configured" ] ; do
   sleep 4
   STATE=`grep ^$ZONE':' $INDEX | awk '{print $2}'`
done

VCSAG_LOG_MSG "N" \
   "ZONE: $ZONE Detach Is Complete $SYS" 1 "$ResName"
exit
POC Goals
Simple, extendable, flexible

One-time definition of system id information - sysidcfg

Admin's ability to pre-select the OS install disk (secondary mirror) and/or ability to set it based on script conditions

Configuration and deployment conducive to a management interface

Adaptable to allow for additional install scripts and products, including configuration tasks for those products

Minimize any existing specialized code modifications

Minimize any rules.ok generation and updates

Ability to define and pass variables set during the wanboot client definition process throughout different stages of the install

Methodology that allows for 'collection' of configuration information from an existing server (can be used to upgrade to a new OS version while preserving existing scripts and configurations)

Methodology that allows for additional products to be installed and configured - selection prior to install time

Can be integrated with existing wanboot methods and scripts
Next Steps
1. Develop a Client Management Interface for Product Selection and Configuration 2. Create script collections for various products selected through Client Management Interface 3. Implement a 'upgrade existing host' script process for integration
Configuration Steps
Table 6.1. Wanboot Server Client Details
Server                       Value
Wanboot Server               192.168.15.89
Target Client Hostname       dom2
Target Client Host ID        84F8799D
Target Client Install Disk   c0d0

Server Side Configuration Process
# cd /etc/apache2
# cp httpd.conf-example httpd.conf
# svcadm enable apache2

### Create the /etc/netboot directory structure ###
# mkdir /etc/netboot
# mkdir /etc/netboot/192.168.15.0

# cd /var/apache2/htdocs
# mkdir config
# mkdir flar
# mkdir wanboot10
### Create directory for each node to ## be booted that contains the sysidcfg ### # mkdir /var/apache2/htdocs/config/client-sysidcfg/dom2 ### Install WANBOOT ### # cd /mnt/Solaris_10/Tools # ./setup_install_server -w /var/apache2/htdocs/wanboot10/wpath \ /var/apache2/htdocs/wanboot10/ipath
### Copy stock jumpstart rules ### # cd /mnt/Solaris_10/Misc/jumpstart_sample/ # mkdir /var/apache2/htdocs/config/js-rules # cp -r * /var/apache2/htdocs/config/js-rules
### Install wanboot cgi to apache2 cgi-bin directory ### # cd /usr/lib/inet/wanboot/ # cp bootlog-cgi wanboot-cgi /var/apache2/cgi-bin/ # cd /var/apache2/cgi-bin # cp wanboot-cgi wanboot.cgi
### Upload wanboot and miniroot ###
# cd /mnt/Solaris_10/Tools/Boot/platform/sun4v/
# cp wanboot /var/apache2/htdocs/wanboot10/sun4v.wanboot
# cd /var/apache2/htdocs/wanboot10/wpath
# cp miniroot ..
### Add wget to /usr/sfw/bin in the miniroot # lofiadm -a /var/apache2/htdocs/wanboot10/miniroot /dev/lofi/1 # mount /dev/lofi/1 /mnt # mkdir /mnt/usr/sfw/bin # cp /usr/sfw/bin/wget /mnt/usr/sfw/bin/
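The copy into the miniroot needs to be flushed and the lofi device detached before the image is served; the original cuts off here, so a minimal sketch of the remaining cleanup (assuming the same /mnt and /dev/lofi/1 used above):

# umount /mnt
# lofiadm -d /dev/lofi/1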
File Contents

/etc/netboot/192.168.15.0/84F8799D/system.conf

SsysidCF=http://192.168.15.89/config/js-rules/dom2
SjumpsCF=http://192.168.15.89/config/js-rules

/etc/netboot/192.168.15.0/84F8799D/wanboot.conf

boot_file=/wanboot10/sun4v.wanboot
root_server=http://192.168.15.89/cgi-bin/wanboot-cgi
root_file=/wanboot10/miniroot
server_authentication=no
client_authentication=no
system_conf=system.conf
boot_logger=http://192.168.15.89/cgi-bin/bootlog-cgi

/var/apache2/htdocs/config/js-rules/rules

karch sun4v dynamic_pre.sh =

/var/apache2/htdocs/config/js-rules/dynamic_pre.sh

#!/bin/sh
HOST_NAME=`hostname`
/usr/sfw/bin/wget -P/tmp/install_config/ \
    http://192.168.15.89/config/js-rules/${HOST_NAME}/boot.env
sleep 2
. /tmp/install_config/boot.env
echo "Installing into: ${DY_ROOTDISK}"
echo "dy install_type set to: ${dy_install_type}"
echo "dy archive_location set to: ${dy_archive_location}"
sleep 5
echo "install_type ${dy_install_type}" > ${SI_PROFILE}
echo "archive_location ${dy_archive_location}" >> ${SI_PROFILE}
echo "partitioning explicit" >> ${SI_PROFILE}
echo "filesys ${DY_ROOTDISK}.s1 1024 swap" >> ${SI_PROFILE}
echo "filesys ${DY_ROOTDISK}.s0 free / logging" >> ${SI_PROFILE}
network_interface=vnet0 { primary hostname=dom2 ip_address=192.168.15.88 netmask=255.255.255.0 protocol_ipv6=no default_route=192.168.15.1 } timezone=US/Eastern system_locale=C terminal=dtterm root_password=pm/sEGrVL9KT6 timeserver=localhost name_service=none nfs4_domain=dynamic security_policy=none Client OBP Boot String Example ok> setenv network-boot-arguments host-ip=192.168.15.88,\ subnet-mask=255.255.255.0,hostname=dom2,\ file=http://192.168.15.89/cgi-bin/wanboot-cgi,\ client-id=84F8799D ok> boot net - install
Discovering physical storage devices Discovering logical storage devices Cross referencing storage devices with boot environment configurations Determining types of file systems supported Validating file system requests Preparing logical storage devices Preparing physical storage devices Configuring physical storage devices Configuring logical storage devices INFORMATION: Removing invalid lock file. Analyzing system configuration. No name for current boot environment. Current boot environment is named <sol8>. Creating initial configuration for primary boot environment <sol8>. WARNING: The device </dev/md/dsk/d0> for the root file system mount point </> is not a physical device. WARNING: The system boot prom identifies the physical device </dev/dsk/c1t0d0s0> as the system boot device. Is the physical device </dev/dsk/c1t0d0s0> the boot device for the logical device </dev/md/dsk/d0>? (yes or no) yes INFORMATION: Assuming the boot device </dev/dsk/c1t0d0s0> obtained from the system boot prom is the physical boot device for logical device </dev/md/dsk/d0>. The device </dev/dsk/c1t0d0s0> is not a root device for any boot environment; cannot get BE ID. PBE configuration successful: PBE name <sol8> PBE Boot Device </dev/dsk/c1t0d0s0>. Comparing source boot environment <sol8> file systems with the file system(s) you specified for the new boot environment. Determining which file systems should be in the new boot environment. Updating boot environment description database on all BEs. Searching /dev for possible boot environment filesystem devices Template Template Template Template Template entry entry entry entry entry /:/dev/dsk/c1t1d0s0:ufs skipped. /var:/dev/dsk/c1t1d0s5:ufs skipped. /opt:/dev/dsk/c1t1d0s6:ufs skipped. /opt/patrol:/dev/dsk/c1t1d0s4:ufs skipped. -:/dev/dsk/c1t1d0s1:swap skipped.
luconfig: ERROR: Template filesystem definition failed for /, all devices are not applicable.. ERROR: Configuration of boot environment failed.
d101   m
d11    s
d21    s   2.0GB   c0d1s1
d104   m   10GB    d1 d24
d1     s   10GB    c0d0s4
d24    s   10GB    c0d1s4
d105   m   9.7GB   d15 d25
d15    s   9.7GB   c0d0s5
d25    s   9.7GB   c0d1s5
d103   m   4.0GB   d0 d23
d0     s   4.0GB   c0d0s3
d23    s   4.0GB   c0d1s3
d100   m   10GB    d10 d20
d10    s   10GB    c0d0s0
d20    s   10GB    c0d1s0
2. Check mounted filesystems and swap

# df -h | grep md
/          (/dev/md/dsk/d100 )
/var       (/dev/md/dsk/d103 )
/export    (/dev/md/dsk/d104 )
/zones     (/dev/md/dsk/d105 )

# grep swap /etc/vfstab
/dev/md/dsk/d101   -   -   swap   -   no   -
# lustatus Boot Environment Name -------------------------svn110 os200906 2. Install into new ABE #
2. Create OS Image with same FS Layout ; Have lucreate split mirror for you. # lucreate -n abe -m /:/dev/md/dsk/d200:ufs,mirror -m /:/dev/dsk/c0d1s0:detach,attach,preserve -m /var:/dev/md/dsk/d210:ufs,mirror -m /var:/dev/dsk/c0d1s3:detach,attach,preserve -m /zones:/dev/md/dsk/d220:ufs,mirror -m /zones:/dev/dsk/c0d1s5:detach,attach,preserve -m /export:/dev/md/dsk/d230:ufs,mirror -m /export:/dev/dsk/c0d1s4:detach,attach,preserve
Warning
When adding patches to ABE bad patch script permissions could prevent the patch from being added; look for errors around permissions such as: /var/sadm/spool/lu/120273-25/postpatch simple chmod will fix and allow for patch installation ; recommend scripting check before adding patches 1. PATCHING - For Solaris 10 '*' works out patch order - otherwise patch_order file can be passed to it. # luupgrade -t -n abe -s /var/tmp/patches '*' 2. PATCHING - For pre-solaris 10 needing patch order file # luupgrade -t -n abe -s /path/to/patches \ -O "-M /path/to/patch patch_order_list" 3. Adding Additional Packages to alternate boot environment # luupgrade -p -n abe -s /export/packages 4. Removing packages from ABE # luupgrade -P -n abe MYpkg MYpkg
5. Mounting Alternate Boot Environment for modifications # lumount abe /mnt 6. Unmount Alternate Boot Environment # luumount abe 7. Enable ABE
# luactivate abe 8. Show Boot Environment Status # lustatus Boot Environment Name ----------------disk_a_S7 disk_b_S7db disk_b_S8 S9testbed 9. Filesystem merger example Instead of using the preceding command to create the alternate boot environment so it matches the current boot environment, the following command joins / and /usr, assuming that c0t3d0s0 is partitioned with sufficient space: # lucreate -c "Solaris_8" -m /:/dev/dsk/c0t3d0s0:ufs \ -m /usr:merged:ufs -m /var:/dev/dsk/c0t3d0s4:ufs \ -n "Solaris_9" 10.example patch order # luupgrade -t -n "Solaris_9" \ -s /install/data/patches/SunOS-5.9-sparc/recommended -O \ "-M /install/data/patches/SunOS-5.9-sparc/recommended patch_order" 11.Example with splitoff This next example would instead split /opt off of /, assuming that c0t3d0s5 is partitioned with sufficient space: # lucreate -c "Solaris_8" -m /:/dev/dsk/c0t3d0s0:ufs \ -m /usr:/dev/dsk/c0t3d0s3:ufs -m /var:/dev/dsk/c0t3d0s4:ufs \ -m /opt:/dev/dsk/c0t3d0s5:ufs -n "Solaris_9" 12.Using luupgrade to Upgrade from a JumpStart Server This next example shows how to upgrade from the existing Solaris 8 alternate boot environment to Solaris 9 by means of an NFS-mounted JumpStart installation. First create a JumpStart installation from CD-ROM, DVD, or an ISO image as covered in the Solaris 9 Installation Guide. The JumpStart installation in this example resides in /install on the server js-server. The OS image itself resides in / install/cdrom/SunOS-5.9-sparc. The profiles for this JumpStart installation dwell in /install/jumpstart/ profiles/ in a subdirectory called liveupgrade. Within this directory, the file js-upgrade contains the JumpStart profile to upgrade the OS and additionally install the package SUNWxwice: install_type upgrade package SUNWxwice add On the target machine, mount the /install partition from js-server and run luupgrade, specifying the Solaris_9 alternate boot environment as the target, the OS image location, and the JumpStart profile: Is Complete -------yes yes no yes Active Now -----yes no no no Active On Reboot --------yes no no no Can Delete -----no no no yes Copy Status --------UPGRADING -
# pkgchk -l SUNWaudd | grep Pathname Pathname: /kernel Pathname: /kernel/drv Pathname: /kernel/drv/audio1575.conf Pathname: /kernel/drv/audiocs.conf Pathname: /kernel/drv/audioens.conf Pathname: /kernel/drv/audiots.conf Pathname: /kernel/drv/sparcv9 Pathname: /kernel/drv/sparcv9/audio1575 Pathname: /kernel/drv/sparcv9/audiocs Pathname: /kernel/drv/sparcv9/audioens Pathname: /kernel/drv/sparcv9/audiots Pathname: /kernel/drv/sparcv9/dbri Pathname: /kernel/misc Pathname: /kernel/misc/sparcv9 Pathname: /kernel/misc/sparcv9/amsrc1 Pathname: /kernel/misc/sparcv9/amsrc2 Pathname: /kernel/misc/sparcv9/audiosup
SSH Keys
Common issues:
1. Permissions on .ssh
2. Hostnames for multiple interfaces

ssh-keygen -t dsa
scp ~/.ssh/id_dsa.pub burly:.ssh/authorized_keys2
ssh-agent sh -c 'ssh-add < /dev/null && bash'
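For the permissions issue called out above, a minimal sketch of the usual fix on both ends (file names match the DSA example here; adjust for your key type):

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/id_dsa ~/.ssh/authorized_keys2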
# /etc/init.d/iscsi-target start
# chkconfig --level 345 iscsi-target on

2. Solaris 10 U6 Initiator Configuration Commands:

# svcadm enable iscsi_initiator
# iscsiadm add static-config \
  iqn.2008-02.com.domain:storage.disk2.host.domain,IP_OF_TARGET_HOST:3260
# devfsadm -c iscsi
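To verify the initiator actually sees the LUN after the devfsadm run, a hedged check with standard Solaris 10 tools:

# iscsiadm list target -S     # should show the target and its LUN(s)
# format                      # the iSCSI disk should appear in the device list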
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
MII_NOT_SUPPORTED=yes

The DEVICE= section should reflect the interface the file relates to (ifcfg-eth1 should have DEVICE=eth1). The MASTER= section should indicate the bonded interface to be used. Assign both e1000 devices to bond0. The bond0 file contains the actual IP address information:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MII_NOT_SUPPORTED=yes

4. Restart network services

# service network restart
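The bonding driver itself also has to be aliased and given a mode, which this section does not show; a minimal sketch for RHEL 5, with the miimon/mode values as assumptions to adjust to your failover policy:

# /etc/modprobe.conf additions (values are illustrative)
alias bond0 bonding
options bond0 miimon=100 mode=1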
3. /proc/sys/net/ipv4/ip_local_port_range - net.ipv4.ip_local_port_range
Defines the local port range that TCP and UDP use to choose the local port. The first number is the lowest and the second the highest local port number. The default value depends on the amount of memory available on the system: above 128MB it is 32768 - 61000, below 128MB it is 1024 - 4999 or even less. This range limits the number of active connections the system can issue simultaneously to systems not supporting TCP extensions (timestamps). With tcp_tw_recycle enabled, the range 1024 - 4999 is enough to issue up to 2000 connections per second to systems supporting timestamps.
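As a sketch, the range can be widened persistently through /etc/sysctl.conf or changed on the fly through /proc; the values shown are simply the common large-memory defaults:
# echo "net.ipv4.ip_local_port_range = 32768 61000" >> /etc/sysctl.conf
# sysctl -p
# echo "32768 61000" > /proc/sys/net/ipv4/ip_local_port_range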
echo 1 > /sys/class/fc_host/host0/issue_lip
echo '- - -' > /sys/class/scsi_host/host0/scan
echo 1 > /sys/class/fc_host/host1/issue_lip
echo '- - -' > /sys/class/scsi_host/host1/scan
partprobe
cat /proc/scsi/scsi

Check HBA Link state and Port state
cat /sys/class/fc_host/host*/port_name

View WWN of FA to verify you are connected to redundant FAs
cat /sys/class/fc_remote_ports/rport*/node_name
cat /sys/class/fc_remote_ports/rport*/port_id

Manually add and remove SCSI disks by echoing the /proc or /sys filesystem
You can use the following commands to manually add and remove SCSI disks.
Note
In the following command examples, H, B, T, and L are the host, bus, target, and LUN IDs for the device. You can unconfigure and remove an unused SCSI disk with the following command:
echo "scsi remove-single-device H B T L" > /proc/scsi/scsi
If the driver cannot be unloaded and loaded again, and you know the host, bus, target and LUN IDs for the new devices, you can add them through the /proc/scsi/scsi file using the following command:
echo "scsi add-single-device H B T L" > /proc/scsi/scsi For Linux 2.6 kernels, devices can also be added and removed through the /sys filesystem. Use the following command to remove a disk from the kernels recognition:
echo 1 > /sys/class/scsi_host/hostH/device/H:B:T:L/delete
Or, as a possible variant on other 2.6 kernels, you can use the command:
echo 1 > /sys/class/scsi_host/hostH/device/targetH:B:T/H:B:T:L/delete
To re-register the disk with the kernel, rescan through the host adapter's scan interface.
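A minimal sketch of that rescan on 2.6 kernels, using the same H, B, T, and L placeholders as above; the wildcard form rescans everything behind the host:
echo "B T L" > /sys/class/scsi_host/hostH/scan
echo "- - -" > /sys/class/scsi_host/hostH/scan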
Note
The Linux kernel does not assign permanent names for the fabric devices in the /dev directory. Device file names are assigned in the order in which devices are discovered during the bus scanning. For example, a LUN might be /dev/sda. After a driver reload, the same LUN might become /dev/sdce. A fabric reconfiguration might also result in a shift in the host, bus, target and LUN IDs, which makes it unreliable to add specific devices through the /proc/scsi/scsi file.
"
# Check all pids for this port, then list that process
for f in $pids
do
    /usr/proc/bin/pfiles $f 2>/dev/null \
        | /usr/xpg4/bin/grep -q "port: $ans"
    if [ $? -eq 0 ] ; then
        echo "$line\nPort: $ans is being used by PID: \c"
Supports Wake-on: pumbg
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes

4. Change Duplex with ethtool and/or mii-tool
# mii-tool -F 100baseTx-HD
# mii-tool -F 10baseT-HD
# ethtool -s eth0 speed 100 duplex full # ethtool -s eth0 speed 10 duplex half
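To make speed and duplex settings like these survive a network restart on RHEL, the usual place is the interface config file; a sketch, assuming eth0 and the forced full-duplex example above:
# grep ETHTOOL /etc/sysconfig/network-scripts/ifcfg-eth0
ETHTOOL_OPTS="speed 100 duplex full autoneg off"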
Hardening Linux
1. Restrict SU access to accounts through PAM and Group Access
# groupadd rootmembers
# groupadd oraclemembers
# groupadd postgresmembers
# usermod -G rootmembers adminuser1
# usermod -G oraclemembers oracleuser1
# usermod -G postgresmembers postgresuser1

/etc/pam.d/su:
auth sufficient /lib/security/$ISA/pam_stack.so\
    service=su-root-members
auth sufficient /lib/security/$ISA/pam_stack.so\
    service=su-other-members
auth required   /lib/security/$ISA/pam_deny.so
The file /etc/pam.d/su-root-members referenced in /etc/pam.d/su should read like:
auth required /lib/security/pam_wheel.so\
    use_uid group=rootmembers
auth required /lib/security/pam_listfile.so\
    item=user sense=allow onerr=fail\
    file=/etc/security/su-rootmembers-access
The file /etc/security/su-rootmembers-access referenced in /etc/pam.d/su-root-members should read like:
root
oracle
postgres
The file /etc/pam.d/su-other-members should be created and read like:
auth sufficient /lib/security/pam_stack.so\
    service=su-oracle-members
auth sufficient /lib/security/pam_stack.so\
    service=su-postgres-members
auth required   /lib/security/pam_deny.so
If one of the two PAM services returns Success, it will return Success to the "su" PAM service configured in /etc/pam.d/su. Otherwise the last module is invoked, which denies all further requests, and the authentication fails. Next the PAM services "su-oracle-members" and "su-postgres-members" have to be created. The file /etc/pam.d/su-oracle-members referenced in /etc/pam.d/su-other-members should read like:
auth required /lib/security/pam_wheel.so\
    use_uid group=oraclemembers
auth required /lib/security/pam_listfile.so\
    item=user sense=allow onerr=fail\
    file=/etc/security/su-oraclemembers-access
The file /etc/security/su-oraclemembers-access referenced in /etc/pam.d/su-oracle-members should read like:
oracle

The file /etc/pam.d/su-postgres-members referenced in /etc/pam.d/su-other-members should read like:
auth required /lib/security/pam_wheel.so\
    use_uid group=postgresmembers
auth required /lib/security/pam_listfile.so\
    item=user sense=allow onerr=fail\
    file=/etc/security/su-postgresmembers-access
The file /etc/security/su-postgresmembers-access referenced in /etc/pam.d/su-postgres-members should read like:
postgres

2. Detecting Listening Network Ports
# netstat -tulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address  State
tcp   0      0      *:auth             *:*              LISTEN
tcp   0      0      host.domain:smtp   *:*              LISTEN
tcp   0      0      *:ssh              *:*              LISTEN
From the output you can see that xinetd, sendmail, and sshd are listening. On all newer Red Hat Linux distributions sendmail is configured to listen for local connections only. Sendmail should not listen for incoming network connections unless the server is a mail or relay server. Running a port scan from another server will confirm that (make sure that you have permission to probe the machine):
# nmap -sTU <remote_host>

Starting nmap 3.70 ( http://www.insecure.org/nmap/ ) at 2004-12-10 22:51 CST
Interesting ports on jupitor (172.16.0.1):
(The 3131 ports scanned but not shown below are in state: closed)
PORT     STATE  SERVICE
22/tcp   open   ssh
113/tcp  open   auth

Nmap run completed -- 1 IP address (1 host up) scanned in 221.669 seconds
#
Another method to list all of the TCP and UDP sockets to which programs are listening is lsof:
# lsof -i -n | egrep 'COMMAND|LISTEN|UDP'
COMMAND    PID USER  FD  TYPE DEVICE SIZE NODE NAME
sshd      2317 root  3u  IPv6   6579      TCP  *:ssh (LISTEN)
xinetd    2328 root  5u  IPv4   6698      TCP  *:auth (LISTEN)
sendmail  2360 root  3u  IPv4   6729      TCP  127.0.0.1:smtp (LISTEN)
#

3. Inittab and Boot Scripts
The inittab file /etc/inittab also describes which processes are started at bootup and during normal operation. For example, Oracle uses it to start cluster services at bootup. Therefore, it is recommended to ensure that all entries in /etc/inittab are legitimate in your environment. I would at least remove the CTRL-ALT-DELETE trap entry to prevent accidental reboots; on RHEL that is the line that reads:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
The default runlevel should be set to 3 since in my opinion X11 (X Window System) should not be running on a production server. In fact, it shouldn't even be installed.
# grep ':initdefault' /etc/inittab
id:3:initdefault:

4. TCP Wrappers
To deny everything by default, add the following line to /etc/hosts.deny:
ALL: ALL
To accept incoming SSH connections from e.g. nodes rac1cluster, rac2cluster and rac3cluster, add the following line to /etc/hosts.allow:
sshd: rac1cluster rac2cluster rac3cluster
To accept incoming SSH connections from all servers from a specific network, add the name of the subnet to /etc/hosts.allow. For example:
sshd: rac1cluster rac2cluster rac3cluster .subnet.example.com
To accept incoming portmap connections from IP address 192.168.0.1 and subnet 192.168.5, add the following line to /etc/hosts.allow:
portmap: 192.168.0.1 192.168.5.
To accept connections from all servers on subnet .subnet.example.com but not from server cracker.subnet.example.com, you could add the following line to /etc/hosts.allow:
ALL: .subnet.example.com EXCEPT cracker.subnet.example.com
Here are other examples that show some features of TCP wrapper. If you just want to restrict ssh connections without configuring or using /etc/hosts.deny, you can add the following entries to /etc/hosts.allow:
sshd: rac1cluster rac2cluster rac3cluster
sshd: ALL: DENY
The version of TCP wrapper that comes with Red Hat also supports the extended options documented in the hosts_options(5) man page. Here is an example of how an additional program can be spawned in e.g. the /etc/hosts.allow file:
sshd: ALL : spawn echo "Login from %c to %s" \
    | mail -s "Login Info for %s" log@loghost
For information on the % expansions, see "man 5 hosts_access". The TCP wrapper is quite flexible, and xinetd provides its own set of host-based and time-based access control functions. You can even tell xinetd to limit the rate of incoming connections. I recommend reading the various documentation about the xinetd super daemon on the Internet.

5. Enable TCP SYN Cookie Protection
A "SYN Attack" is a denial of service attack that consumes all the resources on a machine. Any server that is connected to a network is potentially subject to this attack. To enable TCP SYN Cookie Protection, edit the /etc/sysctl.conf file and add the following line:
net.ipv4.tcp_syncookies = 1

6. Disable ICMP Redirect Acceptance
ICMP redirects are used by routers to tell the server that there is a better path to other networks than the one chosen by the server. However, an intruder could potentially use ICMP redirect packets to alter the host's routing table by causing traffic to use a path you didn't intend. To disable ICMP Redirect Acceptance, edit the /etc/sysctl.conf file and add the following line:
net.ipv4.conf.all.accept_redirects = 0
7. Enable IP Spoofing Protection
IP spoofing is a technique where an intruder sends out packets which claim to be from another host by manipulating the source address. IP spoofing is very often used for denial of service attacks. For more information on IP spoofing, I recommend the article IP Spoofing: Understanding the basics. To enable IP spoofing protection, turn on Source Address Verification: edit the /etc/sysctl.conf file and add the following line:
net.ipv4.conf.all.rp_filter = 1

8. Enable Ignoring of ICMP Echo Requests
If you want or need Linux to ignore ping requests, edit the /etc/sysctl.conf file and add the following line (note that this cannot be done in many environments):
net.ipv4.icmp_echo_ignore_all = 1
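None of the /etc/sysctl.conf entries above take effect until they are loaded; a quick sketch of applying and spot-checking them without a reboot:
# sysctl -p
# sysctl net.ipv4.tcp_syncookies net.ipv4.conf.all.rp_filter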
Link Aggregation
1. Show all the data-links
# dladm show-link
vsw0            type: non-vlan    mtu: 1500    device: vsw0
e1000g0         type: non-vlan    mtu: 1500    device: e1000g0
e1000g1         type: non-vlan    mtu: 1500    device: e1000g1
e1000g2         type: non-vlan    mtu: 1500    device: e1000g2
2. Show link properties
# dladm show-linkprop
LINK         PROPERTY     VALUE        DEFAULT      POSSIBLE
vsw0         zone         --           --           --
e1000g0      zone         --           --           --
e1000g1      zone         --           --           --
e1000g2      zone         --           --           --
3. Create a Link Aggregation

Note
Link aggregation, or IEEE 802.3ad, is a term which describes using multiple Ethernet network cables/ports in parallel to increase the link speed beyond the limits of any one single cable or port, and to increase the redundancy for higher availability.

Here is the syntax to create an aggregation using dladm. You can use any number of data-link interfaces to create an aggregation. The command below creates an aggregation called "aggr1". You can plumb this using "ifconfig plumb" and assign an IP address to it; see the sketch after step 5. The link aggregation must also be configured on the network switch: the policy and the aggregated interfaces must be configured identically on the other end of the Ethernet cables. The example creates Link Aggregation Control Protocol (LACP) in passive mode to control simultaneous transmission on multiple interfaces. Any single stream is transmitted completely on an individual interface, but multiple simultaneous streams can be active across all interfaces.
# ifconfig e1000g0 unplumb
# ifconfig e1000g1 unplumb
# dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 1
4. Check properties of an aggregation
# dladm show-aggr
key: 1 (0x0001) policy: L4      address: XX:XX:XX:XX:XX (auto)
device       address           speed      duplex  link     state
e1000g0      XX:XX:XX:XX:XX    0 Mbps     half    unknown  standby
e1000g1      <unknown>         0 Mbps     half    unknown  standby
e1000g2      <unknown>         0 Mbps     half    unknown  standby
5. Check statistics of an aggregation or data-link interface
# dladm show-aggr -s
key: 1        ipackets  rbytes  opackets  obytes  %ipkts  %opkts
    Total     0         0       0         0
    e1000g0   0         0       0         0       -       -
    e1000g1   0         0       0         0       -       -
    e1000g2   0         0       0         0       -       -
# dladm show-link -s
              ipackets  rbytes  opackets  obytes  oerrors
vsw0          225644    94949   0         29996   0
e1000g0       0         0       0         0       0
e1000g1       0         0       0         0       0
e1000g2       0         0       0         0       0
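As a sketch of the plumbing step mentioned in the note above (the address and netmask are placeholders, and aggr1 assumes the aggregation was created with key 1):
# ifconfig aggr1 plumb 192.168.10.5 netmask 255.255.255.0 up
# dladm show-aggr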
IPMP Overview
1. Preventing Applications From Using Test Addresses After you have configured a test address, you need to ensure that this address is not used by applications. Otherwise, if the interface fails, the application is no longer reachable because test addresses do not fail over during the failover operation. To ensure that IP does not choose the test address for normal applications, mark the test address as deprecated. IPv4 does not use a deprecated address as a source address for any communication, unless an application explicitly binds to the address. The in.mpathd daemon explicitly binds to such an address in order to send and receive probe traffic. Because IPv6 link-local addresses are usually not present in a name service, DNS and NIS applications do not use link-local addresses for communication. Consequently, you must not mark IPv6 link-local addresses as deprecated. IPv4 test addresses should not be placed in the DNS and NIS name service tables. In IPv6, link-local addresses are not normally placed in the name service tables. 2. Standby Interfaces in an IPMP Group The standby interface in an IPMP group is not used for data traffic unless some other interface in the group fails. When a failure occurs, the data addresses on the failed interface migrate to the standby interface. Then, the standby interface is treated the same as other active interfaces until the failed interface is repaired. Some failovers might not choose a standby interface. Instead, these failovers might choose an active interface with fewer data addresses that are configured as UP than the standby interface. You should configure only test addresses on a standby interface. IPMP does not permit you to add a data address to an interface that is configured through the ifconfig command as standby. Any attempt
to create this type of configuration will fail. Similarly, if you configure as standby an interface that already has data addresses, these addresses automatically fail over to another interface in the IPMP group. Due to these restrictions, you must use the ifconfig command to mark any test addresses as deprecated and -failover prior to setting the interface as standby. To configure standby interfaces, refer to How to Configure a Standby Interface for an IPMP Group. 3. Probe-Based Failure Detection The in.mpathd daemon performs probe-based failure detection on each interface in the IPMP group that has a test address. Probe-based failure detection involves the sending and receiving of ICMP probe messages that use test addresses. These messages go out over the interface to one or more target systems on the same IP link. For an introduction to test addresses, refer to Test Addresses. For information on configuring test addresses, refer to How to Configure an IPMP Group With Multiple Interfaces. The in.mpathd daemon determines which target systems to probe dynamically. Routers that are connected to the IP link are automatically selected as targets for probing. If no routers exist on the link, in.mpathd sends probes to neighbor hosts on the link. A multicast packet that is sent to the all hosts multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, determines which hosts to use as target systems. The first few hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd cannot find routers or hosts that responded to the ICMP echo packets, in.mpathd cannot detect probebased failures. You can use host routes to explicitly configure a list of target systems to be used by in.mpathd. For instructions, refer to Configuring Target Systems. To ensure that each interface in the IPMP group functions properly, in.mpathd probes all the targets separately through all the interfaces in the IPMP group. If no replies are made in response to five consecutive probes, in.mpathd considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default value for failure detection time is 10 seconds. However, you can tune the failure detection time in the /etc/default/mpathd file. For instructions, go to How to Configure the /etc/default/mpathd File. For a repair detection time of 10 seconds, the probing rate is approximately one probe every two seconds. The minimum repair detection time is twice the failure detection time, 20 seconds by default, because replies to 10 consecutive probes must be received. The failure and repair detection times apply only to probe-based failure detection.
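The failure detection time mentioned above is tuned in /etc/default/mpathd; a minimal sketch of the relevant entries, with the shipped defaults shown:
# grep -v '^#' /etc/default/mpathd
FAILURE_DETECTION_TIME=10000
FAILBACK=yes
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes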
Note
In an IPMP group that is composed of VLANs, link-based failure detection is implemented per physical-link and thus affects all VLANs on that link. Probe-based failure detection is performed per VLAN-link. For example, bge0/bge1 and bge1000/bge1001 are configured together in a group. If the cable for bge0 is unplugged, then link-based failure detection will report both bge0 and bge1000 as having instantly failed. However, if all of the probe targets on bge0 become unreachable, only bge0 will be reported as failed because bge1000 has its own probe targets on its own VLAN.
You can accomplish probe-based failure detection by setting up host routes in the routing table as probe targets. Any host routes that are configured in the routing table are listed before the default router. Therefore, IPMP uses the explicitly defined host routes for target selection. You can use either of two methods for directly specifying targets: manually setting host routes, or creating a shell script that can become a startup script.

Consider the following criteria when evaluating which hosts on your network might make good targets. Make sure that the prospective targets are available and running, and make a list of their IP addresses. Ensure that the target interfaces are on the same network as the IPMP group that you are configuring; the netmask and broadcast address of the target systems must be the same as the addresses in the IPMP group. The target host must be able to answer ICMP requests from the interface that is using probe-based failure detection.

How to Manually Specify Target Systems for Probe-Based Failure Detection
1. Log in with your user account to the system where you are configuring probe-based failure detection.
2. Add a route to a particular host to be used as a target in probe-based failure detection. Replace the values of destination-IP and gateway-IP with the IPv4 address of the host to be used as a target. For example, you would type the following to specify the target system 192.168.85.137, which is on the same subnet as the interfaces in IPMP group testgroup1.
$ route add -host destination-IP gateway-IP -static
$ route add -host 192.168.85.137 192.168.85.137 -static
3. Add routes to additional hosts on the network to be used as target systems.
4. Example Shell Script
TARGETS="192.168.85.117 192.168.85.127 192.168.85.137"

case "$1" in
'start')
    /usr/bin/echo "Adding static routes for use as IPMP targets"
    for target in $TARGETS; do
        /usr/sbin/route add -host $target $target
    done
    ;;
'stop')
    /usr/bin/echo "Removing static routes for use as IPMP targets"
    for target in $TARGETS; do
        /usr/sbin/route delete -host $target $target
    done
    ;;
esac
After a typical software installation, there can be a half dozen or more processes that need to be started and stopped during system startup and shutdown. In addition, these processes may depend on each other and may need to be monitored and restarted if they fail. For each process, these are the logical steps that need to be done to incorporate these as services in SMF: a. Create a service manifest file. b. Create a methods script file to define the start, stop, and restart methods for the service. c. Validate and import the service manifest using svccfg(1M). d. Enable or start the service using svcadm(1M). e. Verify the service is running using svcs(1). 2. Create SMF Entry for an OMR Service a. Create Manifest for OMR Service (example). Create the manifest file according to the description in the smf_method(5) man page. For clarity, this file should be placed in a directory dedicated to files related to the application. In fact, the service will be organized into a logical folder inside SMF, so having a dedicated folder for the files related to the application makes sense. However, there is no specific directory name or location requirement enforced inside SMF. In the example, the OMR service will be organized in SMF as part of the SAS application folder. This is a logical grouping; there is no physical folder named sas associated with SMF. However, when managing the service, the service will be referred to by application/sas/metadata. Other SASrelated processes can later be added and identified under application/sas as well. For the example, the file /var/svc/manifest/application/sas/metadata.xml should be created containing the following: <?xml version="1.0"?> <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"> <service_bundle type='manifest' name='SAS:Metadata'> <service name='application/sas/metadata' type='service' version='1'> <create_default_instance enabled='false' /> <single_instance /> <dependency name='multi-user-server' grouping='optional_all' type='service' restart_on='none'> <service_fmri value='svc:/milestone/multi-user-server'/> </dependency> <exec_method type='method' name='start' exec='/lib/svc/method/sas/metadata %m' timeout_seconds='60'>
<method_context> <method_credential user='sas' /> </method_context> </exec_method> <exec_method type='method' name='restart' exec='/lib/svc/method/sas/metadata %m' timeout_seconds='60'> <method_context> <method_credential user='sas' /> </method_context> </exec_method> <exec_method type='method' name='stop' exec='/lib/svc/method/sas/metadata %m' timeout_seconds='60' > <method_context> <method_credential user='sas' /> </method_context> </exec_method> <property_group name='startd' type='framework'> <propval name='duration' type='astring' value='contract'/> </property_group> <template> <common_name> <loctext xml:lang='C'> SAS Metadata Service </loctext> </common_name> <documentation> <doc_link name='sas_metadata_overview' iri= 'http://www.sas.com/technologies/bi/appdev/base/metadatasrv.html' /> <doc_link name='sas_metadata_install' uri= 'http://support.sas.com/rnd/eai/openmeta/v9/setup'/> </documentation> </template> </service> </service_bundle> The manifest file basically consists of two tagged stanzas that have properties that define how the process should be started, stopped, and restarted and also define any dependencies. The first tag, <service_bundle> defines the name of the service bundle that will be used to group services and as part of the parameters in svcs commands (svcs, svcmgr, and so on). The interior tag, <service>, defines a specific process, its dependencies, and how to manipulate the process. Please see the man page for service_bundle(4) for more information on the format of manifest files. b. Create Methods scripts
Create the methods scripts. This file is analogous to the traditional rc scripts used in previous versions of the Solaris OS. This file should be a script that successfully starts, stops, and restarts the process. This script must be executable for all the users who might manage the service, and it must be placed in the directory and file name referenced in the exec properties of the manifest file. For the example in this procedure, the correct file is /lib/svc/method/sas/metadata, based on the manifest file built in Step 1. See the man page for smf_method(5) for more information on method scripts. #!/sbin/sh # Start/stop client SAS MetaData service # .. /lib/svc/share/smf_include.sh SASDIR=/d0/sas9-1205 SRVR=MSrvr CFG=$SASDIR/SASMain/"$SRVR".sh case "$1" in 'start') $CFG start sleep 2 ;; 'restart') $CFG restart sleep 2 ;; 'stop') $CFG stop ;; *) echo "Usage: $0 { start | stop }" exit 1 ;; esac exit $SMF_EXIT_OK c. Import and Validate manifest file Validate and import the manifest file into the Solaris service repository to create the service in SMF and make the service available for manipulation. The following commands shows the correct file name to use for the manifest in this example. # svccfg svc:> validate /var/svc/manifest/application/sas/metadata.xml svc:> import /var/svc/manifest/application/sas/metadata.xml svc:> quit d. Enable Service Enable the service using the svcadm command. The -t switch allows you to test the service definition without making the definition persistent. You would exclude the -t switch if you wanted the definition to be a permanent change that persists between reboots. # svcadm enable -t svc:/application/sas/metadata e. Verify Service
Verify that the service is online and verify that the processes really are running by using the svcs command. # svcs -a | grep sas online 8:44:37 svc:/application/sas/metadata:default # ps -ef | grep sas ..... sas 26791 1 0 08:44:36 ? 3. Configuring the Object Spawner Service Now, in the example, both the OMR process (above) and the Object Spawner process were to be configured. The Object Spawner is dependent on the OMR. The remainder of this document describes configuring the dependent Object Spawner process. a. Create the Manifest file The manifest file for the Object Spawner service is similar to the manifest file used for the OMR service. There are a few small changes and a different dependency. The differences are highlighted in bold in the following: <?xml version="1.0"> <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"> <service_bundle type='manifest' name='SAS:ObjectSpawner'> <service name='application/sas/objectspawner' type='service' version='1'> <create_default_instance enabled='false' /> <single_instance /> <dependency name='sas-metadata-server' grouping='optional_all' type='service' restart_on='none'> <service_fmri value='svc:/application/sas/metadata'/> </dependency> <exec_method type='method' name='start' exec='/lib/svc/method/sas/objectspawner %m' timeout_seconds='60'> <method_context> <method_credential user='sas' /> </method_context> </exec_method> <exec_method type='method' name='restart' exec='/lib/svc/method/sas/objectspawner %m'
timeout_seconds='60'> <method_context> <method_credential user='sas' /> </method_context> </exec_method> <exec_method type='method' name='stop' exec='/lib/svc/method/sas/ objectspawner %m' timeout_seconds='60' > <method_context> <method_credential user='sas' /> <method_context> <exec_method> <property_group name='startd' type='framework'> <propval name='duration' type='astring' value='contract'/> </property_group> <template> <common_name> <loctext xml:lang='C'> SAS Object Spawner Service </loctext> </common_name> <documentation> <doc_link name='sas_metadata_overview' iri= 'http://www.sas.com/technologies/bi/appdev/base/metadatasrv.html' /> <doc_link name='sas_metadata_install' uri= 'http://support.sas.com/rnd/eai/openmeta/v9/setup'/> </documentation> </template> </service> </service_bundle> b. Create the Methods script After creating the manifest file, create the script /lib/svc/method/sas/objectspawner: #!/sbin/sh # Start/stop client SAS Object Spawner service # .. /lib/svc/share/smf_include.sh SASDIR=/d0/sas9-1205 SRVR=ObjSpa CFG=$SASDIR/SASMain/"$SRVR".sh case "$1" in 'start') $CFG start sleep 2 ;;
'restart') $CFG restart sleep 2 ;; 'stop') $CFG stop ;; *) echo "Usage: $0 { start | stop }" exit 1 ;; esac exit $SMF_EXIT_OK c. Import and Validate the Manifest file Validate and import the manifest file in the same manner as was used for the OMR service: Note that application shortened to appl for documentation reasons. # svccfg svc:> validate /var/svc/manifest/appl/sas/objectspawner.xml svc:> import /var/svc/manifest/appl/sas/objectspawner.xml svc:> quit d. Enable Service Enable the new service in the same manner as was used for the OMR service: # svcadm enable -t svc:/application/sas/objectspawner e. Verify Service is running Finally, verify that the service is up and running in the same manner as was used for the OMR service: # svcs -a | grep sas online 10:28:39 svc:/application/sas/metadata:default online 10:38:20 svc:/application/sas/objectspawner:default # ps -ef | grep sas ..... sas 26791 1 0 18:44:36 ? 0:00 /bin/sh /d0/SASMain/MSrvr.sh sas 26914 1 0 18:18:49 ? 0:00 /bin/sh /d0/SASMain/ObjSpa.sh
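The walkthrough above enables both services with -t, which does not persist across reboots. A short sketch of making them permanent and checking for faults afterwards (same FMRIs as above):
# svcadm enable svc:/application/sas/metadata
# svcadm enable svc:/application/sas/objectspawner
# svcs -x svc:/application/sas/metadata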
MPXIO
1. Solaris 10 Configuration - CLI
# stmsboot -e
2. Solaris 10 Configuration - File
/kernel/drv/fp.conf:
mpxio-disable="no";
3. Display Paths to LUN
# stmsboot -L non-STMS device name STMS device name -----------------------------------------------------/dev/rdsk/c1t50060E801049CF50d0 \ /dev/rdsk/c2t4849544143484920373330343031383130303030d0 /dev/rdsk/c1t50060E801049CF52d0 \ /dev/rdsk/c2t4849544143484920373330343031383130303030d0 4. /var/adm/messages example output Dec 18 11:42:24 vampire mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g600c0ff000000000086ab238b2af0600 (ssd11) multipath status: optimal, path /pci@9,600000/SUNW,qlc@1/fp@0,0 (fp1) to target address: 216000c0ff886ab2,0 is online. Load balancing: round-robin 5. Disable MPXIO on a 880 kernel/drv/qlc.conf: name="qlc" parent="/pci@8,600000" unit-address="2"\ mpxio-disable="yes"; 6. Raw Mount Disk Name Example Filesystem bytes used avail capacity Mounted on /dev/dsk/c6t600C0FF000000000086AB238B2AF0600d0s5 697942398 20825341 670137634 4% /test 7. Display Properties # luxadm display \ /dev/rdsk/c6t600C0FF000000000086AB238B2AF0600d0s2 DEVICE PROPERTIES for disk: \ /dev/rdsk/c6t600C0FF000000000086AB238B2AF0600d0s2 Vendor: SUN Product ID: StorEdge 3510 Revision: 413C Serial Num: 086AB238B2AF Unformatted capacity: 1397535.000 MBytes Write Cache: Enabled Read Cache: Enabled Minimum prefetch: 0x0 Maximum prefetch: 0xffff Device Type: Disk device Path(s): /dev/rdsk/c6t600C0FF000000000086AB238B2AF0600d0s2 /devices/scsi_vhci/ssd@g600c0ff000000000086ab238b2af0600:c,raw Controller /devices/pci@9,600000/SUNW,qlc@1/fp@0,0
Device Address 216000c0ff886ab2,0 Host controller port WWN 210000e08b14cc40 Class primary State ONLINE Controller /devices/pci@9,600000/SUNW,qlc@2/fp@0,0 Device Address 266000c0fff86ab2,0 Host controller port WWN 210000e08b144540 Class primary State ONLINE
6. Start an IP on your device, or replace dhcp with an appropriate IP address and configuration
# ifconfig rum0 dhcp
7. Note that you might want to disable the network/physical SMF services:
# svcadm disable physical:default
# svcadm disable physical:nwam
    elapse = (elapse==0)?1:elapse;
    printf "Outbound %f MB/s; Inbound %f MB/s\n", \
        obytes_curr/elapse, rbytes_curr/elapse;
    prev_obytes = obytes;
    prev_rbytes = rbytes;
    prev_time = time;
}
'
NFS Performance
nfsstat -s reports server-side statistics. In particular, the following are important:
calls: Total RPC calls received.
badcalls: Total number of calls rejected by the RPC layer.
nullrecv: Number of times an RPC call was not available even though it was believed to have been received.
badlen: Number of RPC calls with a length shorter than that allowed for RPC calls.
xdrcall: Number of RPC calls whose header could not be decoded by XDR (External Data Representation).
readlink: Number of times a symbolic link was read.
getattr: Number of attribute requests.
null: Null calls are made by the automounter when looking for a server for a filesystem.
writes: Data written to an exported filesystem.
Sun recommends the following tuning actions for some common conditions:
writes > 10%: Write caching (either array-based or host-based, such as a Prestoserv card) would speed up operation.
badcalls >> 0: The network may be overloaded and should be checked out. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround.
readlink > 10%: Replace symbolic links with directories on the server.
getattr > 40%: The client attribute cache can be increased by setting the actimeo mount option. Note that this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases, mount the filesystems with the noac option.

nfsstat -c reports client-side statistics. The following statistics are of particular interest:
calls: Total number of calls made.
badcalls: Total number of calls rejected by RPC.
retrans: Total number of retransmissions. If this number is larger than 5%, the requests are not reaching the server consistently. This may indicate a network or routing problem.
badxid: Number of times a duplicate acknowledgement was received for a single request. If this number is roughly the same as badcalls, the network is congested. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround. If, on the other hand, badxid=0, this can be an indication of a slow network connection.
timeout: Number of calls that timed out. If this is roughly equal to badxid, the requests are reaching the server, but the server is slow.
wait: Number of times a call had to wait because a client handle was not available.
newcred: Number of times the authentication was refreshed.
null: A large number of null calls indicates that the automounter is retrying the mount frequently. The timeo parameter should be changed in the automounter configuration.

nfsstat -m (from the client) provides server-based performance data:
srtt: Smoothed round-trip time. If this number is larger than 50 ms, the mount point is slow.
dev: Estimated deviation.
cur: Current backed-off timeout value.
Lookups: If cur > 80 ms, the requests are taking too long.
Reads: If cur > 150 ms, the requests are taking too long.
Writes: If cur > 250 ms, the requests are taking too long.
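As a sketch of the client-side workarounds mentioned above (rsize/wsize and actimeo), using an assumed server and export name:
# mount -o rsize=32768,wsize=32768,actimeo=120 nfs-server:/export/data /mnt/data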
To make this change on SXCE, update the /lib/svc/method/svc-iscsitgt file and replace the /usr/sbin/iscsitgtd execution with the following:
/usr/bin/optisa amd64 > /dev/null 2>&1
if [ $? -eq 0 ]
then
    /usr/sbin/amd64/iscsitgtd
else
    /usr/sbin/iscsitgtd
fi
Then restart the iscsitgtd process via svcadm restart iscsitgt. Note that OpenSolaris, Solaris 10 U6, and SXCE b110 all handle the start of this process differently.

Performance
iSCSI performance can be quite good, especially if you follow a few basic rules:
Use enterprise-class NICs (they make a HUGE difference).
Enable jumbo frames on storage ports.
Use layer-2 link aggregation and IPMP to boost throughput.
Ensure that you are using the performance guidance listed in bug #6457694 on opensolaris.org.
Increase send and receive buffers, disable the Nagle algorithm, and make sure TCP window scaling is working correctly.
Ttcp and netperf are awesome tools for benchmarking network throughput and measuring the impact of a given network tunable.
As with security, performance is a complete presentation in and of itself. Please see the references if you're interested in learning more about tuning iSCSI communications for maximum throughput.

Setting up an iSCSI target on a Solaris server with and without ZFS
1. Create iSCSI base directory (config store)
The base directory is used to store the iSCSI target configuration data, and needs to be defined prior to using the iSCSI target for the first time. You can create a base directory with the iscsitadm utility:
# iscsitadm modify admin -d /etc/iscsitgt
2. Configure a backing store
The backing store contains the physical storage that is exported as a target. The Solaris target supports several types of backing stores: flat files, physical devices, and ZFS volumes (zvols for short). To create a backing store from a ZFS volume, the zfs utility can be run with the create subcommand, the create zvol option (-V), the size of the zvol to create, and the name to associate with the zvol:
# zfs create -V 9g stripedpool/iscsivol000
3. Once a backing store has been created, it can be exported as an iSCSI target with the iscsitadm "create" command, the "target" subcommand, and by specifying the backing store type to use:
# iscsitadm create target -b /fslocation -z 10g test-volume
Or
# iscsitadm create target -b /dev/zvol/dsk/stripedpool/iscsivol000 test-volume
4. Add an ACL to a target
Access control lists (ACLs) can be used to limit the node names that are allowed to access a target. To ease administration of ACLs, the target allows you to associate an alias with a node name (you can retrieve the node name of a Solaris initiator by running the iscsiadm utility with the list command and initiator-node subcommand):
# iscsitadm create initiator \
  -n iqn.1986-03.com.sun:01:0003ba0e0795.4455571f host1
After an alias is created, it can be added to a target's ACL by passing the alias to the target subcommand's -l option:
# iscsitadm modify target -l host1 host1-tgt0
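Not part of the original notes, but a quick sketch of verifying the target and its ACL afterwards, using the target name from the example above:
# iscsitadm list target -v test-volume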
6. Add Client Initiator to the TPGT Access List
# iscsitadm modify target -l suitable-alias target-label
# prtvtoc /dev/rdsk/c1t0d0s2 3. Now we have to create state database replicas on slice 7. We will be adding two replicas to each slice: # metadb -a -f -c3 /dev/dsk/c1t0d0s7 # metadb -a -f -c3 /dev/dsk/c1t1d0s7 4. Since the database replicas are in place we can start creating metadevices. The following commands will create metadevice d31 from slice c1t0d0s3, and metadevice d32 from slice c1t1d0s3. Then we create mirror d30 with d31 attached as a submirror. Finally we will attach submirror d32 to mirror d30. Once d32 is attached, the mirror d30 will automatically start syncing. # metainit -f d31 1 1 c1t0d0s3 d31: Concat/Stripe is setup # metainit -f d32 1 1 c1t1d0s3 d32: Concat/Stripe is setup # metainit d30 -m d31 d30: Mirror is setup # metattach d30 d32 d30: submirror d32 is attached 5. The procedure is the same for all other mirrors you might want to create. Root filesystem is slightly different. First you will have to create your submirrors. Then you will have to attach submirror with existing root filesystem, in this case d11, to the new mirror metadevice d10. Then you will have to run metaroot command. It will alter / entry in /etc/vfstab. Finally, you flush the filesystem using lockfs command and reboot. # metainit -f d11 1 1 c1t0d0s1 d31: Concat/Stripe is setup # metainit -f d12 1 1 c1t1d0s1 d32: Concat/Stripe is setup # metainit d10 -m d11 d30: Mirror is setup # metaroot d10 # lockfs -fa # init 6 6. When the system reboots, you can attach the second submirror to d10 as follows: # metattach d10 d12 7. You can check the sync progress using metastat command. Once all mirrors are synced up the next step is to configure the new swap metadevice, in my case d0, to be crash dump device. This is done using dumpadm command: # dumpadm Dump content: kernel pages Dump device: /dev/dsk/c1t0d0s0 (dedicated) Savecore directory: /var/crash/ultra
Savecore enabled: yes # dumpadm -d /dev/md/dsk/d0 8. Next is to make sure you can boot from the mirror - SPARC ONLY a. The final step is to modify PROM. First we need to find out which two physical devices c1t0d0 and c1t1d0 refer to # ls -l /dev/dsk/c1t0d0s1 lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t0d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:b # ls -l /dev/dsk/c1t1d0s1 lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t1d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:b b. The physical device path is everything starting from /pci. Please make a note of sd towards the end of the device string. When creating device aliases below, sd will have to be changed to disk. Now we create two device aliases called root and backup_root. Then we set boot-device to be root and backup_root. The :b refers to slice 1(root) on that particular disk. # eeprom use-nvramrc?=true # eeprom nvramrc=devalias root /pci@1c,600000/scsi@2/disk@0,0 \ devalias backup_root /pci@1c,600000/scsi@2/disk@1,0# # eeprom boot-device=root:b backup_root:b net c. Enable the mirror disk to be bootable # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \ /dev/rdsk/c0t1d0s0 9. Next is to make sure you can boot from the mirror - Intel/AMD ONLY a. Enable the mirror disk to be bootable # /sbin/installgrub /boot/grub/stage1 \ /boot/grub/stage2 /dev/rdsk/c0d0s0 10.If you are mirroring just the two internal drives, you will want to add the following line to /etc/ system to allow it to boot from a single drive. This will bypass the SVM Quorum rule set md:mirrored_root_flag = 1 Example full run on amd system; disks are named after d[1,2-n Drive][partition number] And Metadevices for the mirrors are named d[Boot Number]0[partition number] - example disk: d10 is drive 1 partition 0, metadevice d100 is the 1st boot environment (live upgrade BE) partition 0. If applying the split mirror alternate boot environment I would have the split off ABE as d200. // Use format fdisk to label and // partition the drive # format c1t1d0
Total disk cylinders available: 2346 + 2 (reserved cylinders)

Part  Tag         Flag  Cylinders    Size      Blocks
0     root        wm    1 - 1275     9.77GB    (1275/0/0) 20482875
1     swap        wu    1276 - 1406  1.00GB    (131/0/0)   2104515
2     backup      wm    0 - 2345     17.97GB   (2346/0/0) 37688490
3     unassigned  wm    1407 - 2312  6.94GB    (906/0/0)  14554890
4     unassigned  wm    0            0         (0/0/0)           0
5     unassigned  wm    0            0         (0/0/0)           0
6     unassigned  wm    0            0         (0/0/0)           0
7     unassigned  wm    2313 - 2345  258.86MB  (33/0/0)     530145
8     boot        wu    0 - 0        7.84MB    (1/0/0)       16065
9     unassigned  wm    0            0         (0/0/0)           0
# prtvtoc /dev/rdsk/c1t0d0s2 \
    | fmthard -s - /dev/rdsk/c1t1d0s2
# format
# metadb -a -f -c3 /dev/dsk/c1t0d0s7
# metadb -a -f -c3 /dev/dsk/c1t1d0s7
# metainit -f d10 1 1 c1t0d0s0
# metainit -f d20 1 1 c1t1d0s0
# metainit -f d11 1 1 c1t0d0s1
# metainit -f d21 1 1 c1t1d0s1
# metainit -f d13 1 1 c1t0d0s3
# metainit -f d23 1 1 c1t1d0s3
# metainit d100 -m d10
# metainit d101 -m d11
# metainit d103 -m d13
# metaroot d100
# echo 'set md:mirrored_root_flag = 1' \
    >> /etc/system
# installgrub /boot/grub/stage1 \
    /boot/grub/stage2 /dev/rdsk/c1t1d0s0
# lockfs -fa
# init 6
// login post reboot
# metattach d100 d20
d100: submirror d20 is attached
# metattach d101 d21
d101: submirror d21 is attached
# metattach d103 d23
d103: submirror d23 is attached
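Before moving on to the vfstab edits below, the resyncs the attaches kick off should be allowed to finish; a small sketch of watching the progress on Solaris 10 (not part of the original run):
# metastat d100 | grep -i resync
# metastat -c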
// Replace non-md entries in /etc/vfstab where applicable.
// Example as follows.
# grep dsk /etc/vfstab | awk '{print $1, $2, $3, $4}'
/dev/dsk/c1t0d0s1 - - swap
/dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs
/dev/dsk/c1t0d0s3 /dev/rdsk/c1t0d0s3 /zone ufs

// Becomes the following
# grep dsk /etc/vfstab | awk '{print $1, $2, $3, $4}'
/dev/md/dsk/d101 - - swap
/dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs
/dev/md/dsk/d103 /dev/md/rdsk/d103 /zone ufs

// Wait for sync complete before reboot
# lockfs -fa
# init 6

// Setup Dump Device
# dumpadm -d /dev/md/dsk/d101
        block count
        8192            /dev/dsk/c1t0d0s3
        8192            /dev/dsk/c1t0d0s3
        8192            /dev/dsk/c1t1d0s3
        8192            /dev/dsk/c1t1d0s3
     r - replica does not have device relocation information
     o - replica active prior to last mddb configuration change
     u - replica is up to date
     l - locator for this replica was read successfully
     c - replica's location was in /etc/lvm/mddb.cf
     p - replica's location was patched in kernel
     m - replica is master, this is replica selected as input
     W - replica has device write errors
     a - replica is active, commits are occurring to this replica
     M - replica had problem with master blocks
     D - replica had problem with data blocks
     F - replica had format problems
     S - replica is too small to hold current data base
     R - replica had device read errors
The replicas on c1t0d0s3 are dead to us, so let's wipe them out!
# metadb -d c1t0d0s3
# metadb -i
        flags           first blk       block count
     a    p  luo        16              8192            /dev/dsk/c1t1d0s3
     a    p  luo        8208            8192            /dev/dsk/c1t1d0s3
The only replicas we have left are on c1t1d0s3, so I'm all clear to unconfigure the device. I run cfgadm to get the c1 path:
# cfgadm -al
Ap_Id             Type       Receptacle  Occupant    Condition
c1                scsi-bus   connected   configured  unknown
c1::dsk/c1t0d0    disk       connected   configured  unknown
c1::dsk/c1t1d0    disk       connected   configured  unknown
c1::dsk/c1t2d0    disk       connected   configured  unknown
c1::dsk/c1t3d0    disk       connected   configured  unknown
c1::dsk/c1t4d0    disk       connected   configured  unknown
c1::dsk/c1t5d0    disk       connected   configured  unknown
I run the following command to unconfigure the failed drive:
# cfgadm -c unconfigure c1::dsk/c1t0d0
The drive light turns blue. Pull the failed drive out, insert the new drive, and configure the new drive:
# cfgadm -c configure c1::dsk/c1t0d0
Now that the drive is configured and visible from within the format command, we can copy the partition table from the remaining mirror member:
# prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2
Next, I install the bootblocks onto the new drive:
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk\ /dev/rdsk/c1t0d0s0 And finally, Im ready to replace the metadevices, syncing up the mirror and making things as good as new. repeat for each mirrored partition # metareplace -e d10 c0t0d0s1 1. The first step is to recreate the same slice arrangement on the second disk: # prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2 2. You can check both disks have the same VTOC using prtvtoc command # prtvtoc /dev/rdsk/c1t0d0s2 3. Now we have to create state database replicas on slice 7. We will be adding two replicas to each slice: # metadb -a -f -c3 /dev/dsk/c1t0d0s7 # metadb -a -f -c3 /dev/dsk/c1t1d0s7 4. Since the database replicas are in place we can start creating metadevices. The following commands will create metadevice d31 from slice c1t0d0s3, and metadevice d32 from slice c1t1d0s3. Then we create mirror d30 with d31 attached as a submirror. Finally we will attach submirror d32 to mirror d30. Once d32 is attached, the mirror d30 will automatically start syncing. # metainit -f d31 1 1 c1t0d0s3 d31: Concat/Stripe is setup # metainit -f d32 1 1 c1t1d0s3 d32: Concat/Stripe is setup # metainit d30 -m d31 d30: Mirror is setup # metattach d30 d32 d30: submirror d32 is attached 5. The procedure is the same for all other mirrors you might want to create. Root filesystem is slightly different. First you will have to create your submirrors. Then you will have to attach submirror with existing root filesystem, in this case d11, to the new mirror metadevice d10. Then you will have to run metaroot command. It will alter / entry in /etc/vfstab. Finally, you flush the filesystem using lockfs command and reboot. # metainit -f d11 1 1 c1t0d0s1 d31: Concat/Stripe is setup # metainit -f d12 1 1 c1t1d0s1 d32: Concat/Stripe is setup # metainit d10 -m d11 d30: Mirror is setup # metaroot d10 # lockfs -fa # init 6
6. When the system reboots, you can attach the second submirror to d10 as follows: # metattach d10 d12 7. You can check the sync progress using metastat command. Once all mirrors are synced up the next step is to configure the new swap metadevice, in my case d0, to be crash dump device. This is done using dumpadm command: # dumpadm Dump content: kernel pages Dump device: /dev/dsk/c1t0d0s0 (dedicated) Savecore directory: /var/crash/ultra Savecore enabled: yes # dumpadm -d /dev/md/dsk/d0 8. Next is to make sure you can boot from the mirror - SPARC ONLY a. The final step is to modify PROM. First we need to find out which two physical devices c1t0d0 and c1t1d0 refer to # ls -l /dev/dsk/c1t0d0s1 lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t0d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:b # ls -l /dev/dsk/c1t1d0s1 lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t1d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:b b. The physical device path is everything starting from /pci. Please make a note of sd towards the end of the device string. When creating device aliases below, sd will have to be changed to disk. Now we create two device aliases called root and backup_root. Then we set boot-device to be root and backup_root. The :b refers to slice 1(root) on that particular disk. # eeprom use-nvramrc?=true # eeprom nvramrc=devalias root /pci@1c,600000/scsi@2/disk@0,0 \ devalias backup_root /pci@1c,600000/scsi@2/disk@1,0# # eeprom boot-device=root:b backup_root:b net 9. If you are mirroring just the two internal drives, you will want to add the following line to /etc/ system to allow it to boot from a single drive. This will bypass the SVM Quorum rule set md:mirrored_root_flag = 1 10.Enable the mirror disk to be bootable - used by both sparc and x64 systems; on x64 will update grub # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \ /dev/rdsk/c0t1d0s0
2. Format Disk B (c3d1s0) properly:
host:# format
(choose fdisk)
(create a 100% Standard Solaris Partition over the full disk)
3. Overwrite the disk format properly:
host:# prtvtoc /dev/rdsk/c3d0s2 | fmthard -s - /dev/rdsk/c3d1s2
(NOTE: s2! on BOTH disks)
4. Attach Disk B to the ZFS Root Pool:
host:# zpool attach -f rpool c3d0s0 c3d1s0
5. Install the GRUB stuff to Disk B:
host:# installgrub -m /boot/grub/stage1 /boot/grub/stage2 \
  /dev/rdsk/c3d1s0
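Not part of the original steps, but a quick sketch of watching the resilver that zpool attach starts before relying on the second disk:
host:# zpool status rpool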
-x /export/home/flar /export/home/flar/Snapshot.flar 2. Add FLAR Image to Jumpstart - /etc/bootparams - add_client.sh ./add_install_client -e 0:14:4f:23:ab:8f \ -s host:/flash/boot/sol10sparc \ -c host:/flash/boot/Profiles/Solaris10 \ -p host:/flash/boot/Sysidcfg/smro204 \ smro204.fmr.com sun4u 3. Recover Script - recover.pl #!/usr/bin/perl use Getopt::Long ; $arch_location='/flasharchives/flar'; $boot_base='/flasharchives/boot'; GetOptions( "list" => \$list, "archive=s" => \$archive, "configured" => \$configured, "add" => \$addboot, "remove=s" => \$rmboot ); # Call out the subs from options list if ($list) { &_list ; } if ($addboot) { &_build; } if ($configured) {&_list_existing;} if ($rmboot) { &_rm_existing;}
sub _list { if ($archive) { &_details ; } else { system("/flasharchives/bin/list_archives.pl"); exit ; } } sub _details { &_info_collection; &_print_details; } sub _info_collection { $addto = (); @archinfo = (); $ih = (); chomp $archive; next if $archive =~ /lost/; next if $archive =~ /list/; next if $archive =~ /boot/;
@archinfo = `flar -i $arch_location/$archive` ; chomp @archinfo; foreach $x (@archinfo) { ($item, $value ) = split(/=/,$x); chomp $value; if ($item =~ /creation_node/) { $inventory{$archive}{creation_node} = $value; } if ($item =~ /creation_date/) { $inventory{$archive}{creation_date} = $value; } if ($item =~ /creation_release/) { $inventory{$archive}{creation_release} = $value;} if ($item =~ /content_name/) { $inventory{$archive}{content_name} = $value;} } } # End of info collection sub _build { &_info_collection ; # Get target host ip $target_ip_string = \ `getent hosts $inventory{$archive}{creation_node}`; ($inventory{$archive}{creation_node_ip}, $target_host) \ = split(/\s+/,$target_ip_string); chomp $inventory{$archive}{creation_node_ip} ; # Set location of boot image if ($inventory{$archive}{creation_release} =~ /5.8/) { $image_base = '/flasharchives/boot/sol8sparc'; $image_tools = "$image_base/Solaris_8/Tools"; $rules_string = \ "hostname $inventory{$archive}{creation_node}\ .fmr.com - autogen_script \ uts_flash_finish.sh\n"; } if ($inventory{$archive}{creation_release} =~ /5.9/) { $image_base = '/flasharchives/boot/sol9sparc'; $image_tools = "$image_base/Solaris_9/Tools"; $rules_string = "hostname $inventory{$archive}{creation_node}\ .fmr.com - autogen_script \ uts_flash_finish.sh\n"; } if ($inventory{$archive}{creation_release} =~ /5.10/) { $image_base = '/flasharchives/boot/sol10sparc_bootonly'; $image_tools = "$image_base/Solaris_10/Tools"; $rules_string = "hostname $inventory{$archive}{creation_node}\ .fmr.com move_c3_to_c1.sh \ autogen_script uts_flash_finish.sh\n"; }
# Create the rules file $rules_base = \ "$boot_base/Profiles/$inventory{$archive}{creation_node}"; $rules_location = "$rules_base/rules"; open(RULESOUT, ">$rules_location"); print RULESOUT $rules_string; close RULESOUT; # Define Profile configuration $profile $profile $profile $profile $profile $profile $profile $profile $profile = "install_type flash_install\n"; .= "archive_location http://host:80/flar/$archive\n"; .= "partitioning explicit\n"; .= "filesys c1t0d0s0 10000 /\n"; .= "filesys c1t0d0s1 10000 swap\n"; .= "filesys c1t0d0s4 72000 /export/home logging\n"; .= "filesys c1t0d0s5 free /var\n"; .= "filesys c1t0d0s6 34000 /fisc logging\n"; .= "filesys c1t0d0s7 5\n";
# Define Profile location $profile_base = \ "$boot_base/Profiles/$inventory{$archive}{creation_node}"; $profile_location = "$profile_base/autogen_script"; # # Create new profile open(PDUMP, ">$profile_location"); print PDUMP $profile; close PDUMP;
# Set the stock and new sysid cfg information $sysid_base = "$boot_base/Sysidcfg"; $sysid_stock = \ "$sysid_base/stock/$inventory{$archive}{creation_release}/sysidcfg"; $sysidcfg = \ "$sysid_base/$inventory{$archive}{creation_node}/sysidcfg"; $dump_sysidcfg .= "network_interface=ce4 \ {hostname=$inventory{$archive}{creation_node}.fmr.com \ default_route=172.26.21.1 \ ip_address=$inventory{$archive}{creation_node_ip}\ protocol_ipv6=no netmask=255.255.255.0}\n"; $dump_sysidcfg .= `cat $sysid_stock`; open(SYSIDOUT, ">$sysidcfg"); print SYSIDOUT $dump_sysidcfg; close SYSIDOUT; # Add flar statment into custom rules file # run check script
$ret=system("cd $rules_base ; ./check"); if ($ret == 0 ) { print "Rules Check was successful\n"; } else { print "Rules Check Failed - please check\n"; print "Exiting Failed\n"; exit 1; } # Run the add_install_client script print "Test add_client statement \n"; $add_install_string = "./add_install_client \ -p host:$sysid_base/$inventory{$archive}{creation_node} \ -s host:$image_base \ -c host:$profile_base $inventory{$archive}{creation_node}\ .fmr.com sun4u"; print "$add_install_string\n"; # print "\n\nBring $inventory{$archive}{creation_node}\ down to ok prompt \ and run the following command:\n"; print "ok> boot net:speed=100,duplex=full - install\n"; }
sub _print_details { print "Details on $archive_location/$details\n"; print "=======================================================\n"; print "Server: $inventory{$archive}{creation_node} \n"; print "Creation Date: $inventory{$archive}{creation_date} \n"; print "Solaris Version: $inventory{$archive}{creation_release} \n"; print "Comments: $inventory{$archive}{content_name} \n"; } # End of sub sub _list_existing { open(BOOTP, "/etc/bootparams") || die "Bootparams does not exist,\ no systems set \ up for boot from flar\n";; print "\nThe following list of hosts are setup to jumpstart from\ this server\n"; print "Systems without a flar image listed were setup without this\ toolkit\n"; print "Validation of systems not configured with this toolkit must\ be done\n"; print "independently\n\n"; print "Host\t\tFlar Archive\n"; print "======================================================\n"; while (<BOOTP>) {
      ($node, @narg) = split(/\s+/,$_);
      ($n1,@rest)    = split(/\W+/,$node);
      foreach $i (@narg) {
         if ($i =~ /install_config/) {
            ($j1, $path) = split(/:/, $i);
            if ( -e "$path/autogen_script" ) {
               $loaded_flar = `grep archive_location $path/autogen_script` ;
               chomp $loaded_flar ;
               ($lc,$lf) = split(/\/flar\//,$loaded_flar);
               print "$n1\t\t$lf\n";
            } else {
               print "$n1\t\tNot setup to use flar\n";
            }
         }
      }
   }
   print "\n\n";
   close BOOTP;
   exit;
}

sub _rm_existing {
   open(BOOTP, "/etc/bootparams") || die "Bootparams does not exist, no systems set up for boot from flar\n";
   while (<BOOTP>) {
      ($node, @narg) = split(/\s+/,$_);
      ($n1,@rest)    = split(/\W+/,$node);
      chomp $rmboot;
      chomp $n1;
      if ($rmboot =~ /$n1/) {
         foreach $i (@narg) {
            if ($i =~ /root=/) {
               ($j1, $path) = split(/:/, $i);
               # Filter out Boot
               ($ipath,$Boot) = split(/Boot/, $path);
               chomp $ipath;
               print "cd $ipath \; ./rm_install_client $n1\n";
            }
         }
      }
   }
   print "\n\n";
   close BOOTP;
   exit;
}
print "\n\n";

4. List Archived FLAR Images
#!/usr/bin/perl
$arch_location='/flasharchives/flar';
@archive_list=`ls $arch_location`;
print "\n\n";
foreach $archive (@archive_list) {
   $addto = ();
   @archinfo = ();
   $ih = ();
   chomp $archive;
   next if $archive =~ /lost/;
   next if $archive =~ /list/;
   next if $archive =~ /boot/;
   @archinfo = `flar -i $arch_location/$archive` ;
   chomp @archinfo;
   foreach $x (@archinfo) {
      ($item, $value ) = split(/=/,$x);
      chomp $value;
      if ($item =~ /creation_node/)    { $inventory{$archive}{creation_node} = $value; }
      if ($item =~ /creation_date/)    { $inventory{$archive}{creation_date} = $value; }
      if ($item =~ /creation_release/) { $inventory{$archive}{creation_release} = $value; }
      if ($item =~ /content_name/)     { $inventory{$archive}{content_name} = $value; }
   }
}
$h1="Archive File Name";
$h2="Hostname";
$h3="OS";
$h4="Comments";
$h5="FID";
chomp $h1; chomp $h2 ; chomp $h3 ; chomp $h4; chomp $h5;
# Format modified for documentation
format BOO=
@<<<<<<<<<<<<< @<<<<<< @<<<< @<<<<<<<<<< @<<<<<<<<<<<<<<<
$h1, $h2, $h3, $h5, $h4
============================================================
.
write BOO;
format STDOUT=
@<<<<<<<<<<<<< @<<<<<< @<<<< @<<<<<<<<<< @<<<<<<<<<<<<<<<<
$key, $creation_node, $creation_release, $fid, $content_name
.
while (($key, $content) = each(%inventory)) {
   $creation_node    = $inventory{$key}{creation_node};
   $creation_date    = $inventory{$key}{creation_date};
   $creation_release = $inventory{$key}{creation_release};
   $content_name     = $inventory{$key}{content_name};
   $fid              = $inventory{$key}{fid};
   write;
}
print "\n\n";

5. Code to swap Controller Numbers from Solaris 8-9 to Solaris 10

# mount -o remount,rw /
# cfgadm -c unconfigure c1
# cfgadm -c unconfigure c2
# devfsadm
#
for dir in rdsk dsk
do
   cd /dev/${dir}
   disks=`ls c3t*`
   for disk in $disks
   do
      newname="c1`echo $disk | awk '{print substr($1,3,6)}'`"
      mv $disk $newname
   done
done
ZFS Notes
Quick notes for ZFS commands

1. Take a snapshot
   # zfs snapshot pool/filesystem@mybackup_comment
2. Scan and Import a ZFS Pool
   # zpool import -f npool
3. Rollback a snapshot
   # zfs rollback pool/filesystem@mybackup_comment
4. Use the snapshot directory to view files
   # cat ~user/.zfs/snapshot/mybackup_comment/ems.c
5. Create a clone
   # zfs clone pool/filesystem@mybackup_comment pool/clonefs
6. Generate a full backup
   # zfs send pool/filesystem@mybackup_comment > /backup/A
7. Generate an incremental backup
   # zfs send -i pool/filesystem@mybackup_comment1 pool/filesystem@mybackup_comment2 > /backup/A1-2
8. Generate an incremental backup and send it to a remote host
   # zfs send -i tank/fs@11:31 tank/fs@11:32 | ssh host zfs receive -d /tank/fs
9. Comments on Clones
   A clone is a writable volume or file system whose initial contents are the same as the dataset from which it was created. As with snapshots, creating a clone is nearly instantaneous and initially consumes no additional disk space.
   Clones can only be created from a snapshot. When a snapshot is cloned, an implicit dependency is created between the clone and the snapshot. Even though the clone is created somewhere else in the dataset hierarchy, the original snapshot cannot be destroyed as long as the clone exists (see the zfs promote sketch after this list). The origin property exposes this dependency, and the zfs destroy command lists any such dependencies, if they exist.
   Clones do not inherit the properties of the dataset from which they were created. Rather, clones inherit their properties based on where the clones are created in the pool hierarchy. Use the zfs get and zfs set commands to view and change the properties of a cloned dataset. For more information about setting ZFS dataset properties, see Setting ZFS Properties.
   Because a clone initially shares all its disk space with the original snapshot, its used property is initially zero. As changes are made to the clone, it uses more space. The used property of the original snapshot does not consider the disk space consumed by the clone.
10. Creating a clone
   To create a clone, use the zfs clone command, specifying the snapshot from which to create the clone and the name of the new file system or volume. The new file system or volume can be located anywhere in the ZFS hierarchy. The type of the new dataset (for example, file system or volume) is the same type as the snapshot from which the clone was created. You cannot create a clone of a file system in a pool that is different from where the original file system snapshot resides.
   In the following example, a new clone named tank/home/ahrens/bug123 with the same initial contents as the snapshot tank/ws/gate@yesterday is created.
   # zfs snapshot tank/ws/gate@yesterday
   # zfs clone tank/ws/gate@yesterday tank/home/ahrens/bug123
   In the following example, a cloned workspace is created from the projects/newproject@today snapshot for a temporary user as projects/teamA/tempuser. Then, properties are set on the cloned workspace.
   # zfs snapshot projects/newproject@today
   # zfs clone projects/newproject@today projects/teamA/tempuser
   # zfs set sharenfs=on projects/teamA/tempuser
   # zfs set quota=5G projects/teamA/tempuser
11. Destroying a clone
   ZFS clones are destroyed by using the zfs destroy command. Clones must be destroyed before the parent snapshot can be destroyed. For example:
   # zfs destroy tank/home/ahrens/bug123
12. Listing ZFS Filesystems
   # zfs snapshot zfzones/zone1@presysid
   # zfs list
   NAME                     USED  AVAIL  REFER  MOUNTPOINT
   zfzones                 33.4M  7.78G  33.3M  /zfzones
   zfzones/zone1           24.5K  7.78G  24.5K  /zfzones/zone1
   zfzones/zone1@presysid      0      -  24.5K  -
   # zfs clone zfzones/zone1@preid zfzones/zone2
   # zfs list
   NAME                  USED  AVAIL  REFER  MOUNTPOINT
   zfzones              33.4M  7.78G  33.3M  /zfzones
   zfzones/zone1        24.5K  7.78G  24.5K  /zfzones/zone1
   zfzones/zone1@preid      0      -  24.5K  -
   zfzones/zone2            0  7.78G  24.5K  /zfzones/zone2
   # zpool list zfzones
   NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
   zfzones  7.94G  33.4M  7.90G   0%  ONLINE  -
   # zfs clone
   # zfs clone
   # zfs clone
   # zfs clone
   # zfs clone
   # zfs clone
   # zfs list
   NAME                  USED  AVAIL  REFER  MOUNTPOINT
   zfzones              33.5M  7.78G  33.3M  /zfzones
   zfzones/zone1        24.5K  7.78G  24.5K  /zfzones/zone1
   zfzones/zone1@preid      0      -  24.5K  -
   zfzones/zone2            0  7.78G  24.5K  /zfzones/zone2
   zfzones/zone3            0  7.78G  24.5K  /zfzones/zone3
   zfzones/zone4            0  7.78G  24.5K  /zfzones/zone4
   zfzones/zone5            0  7.78G  24.5K  /zfzones/zone5
   zfzones/zone6            0  7.78G  24.5K  /zfzones/zone6
   zfzones/zone7            0  7.78G  24.5K  /zfzones/zone7
   zfzones/zone8            0  7.78G  24.5K  /zfzones/zone8
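As noted in item 9, a snapshot cannot be destroyed while a clone depends on it. A minimal sketch of working around that with zfs promote, using the pool/filesystem and pool/clonefs names from the quick notes above (whether you then remove the original file system depends on your situation):

   # zfs promote pool/clonefs
   # zfs destroy pool/filesystem                      (the old file system is now the dependent dataset)
   # zfs destroy pool/clonefs@mybackup_comment        (the snapshot migrated to the clone during promote)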
ZFS ACL's
Quick notes for ZFS ACL commands
List ACL's on a ZFS Filesystem

$ ls -v file.1
-r--r--r--   1 root     root      206663 May  4 11:52 file.1
     0:owner@:write_data/append_data/execute:deny
     1:owner@:read_data/write_xattr/write_attributes/write_acl/write_owner
         :allow
     2:group@:write_data/append_data/execute:deny
     3:group@:read_data:allow
     4:everyone@:write_data/append_data/write_xattr/execute/write_attributes
         /write_acl/write_owner:deny
     5:everyone@:read_data/read_xattr/read_attributes/read_acl/synchronize
         :allow

Setting a non-trivial ACL on a file

# chmod A+user:gozer:read_data/execute:allow test.dir
# ls -dv test.dir
drwxr-xr-x+  2 root     root           2 Feb 16 11:12 test.dir
     0:user:gozer:list_directory/read_data/execute:allow
     1:owner@::deny
     2:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/write_xattr/execute/write_attributes/write_acl
         /write_owner:allow
     3:group@:add_file/write_data/add_subdirectory/append_data:deny
     4:group@:list_directory/read_data/execute:allow
     5:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr
         /write_attributes/write_acl/write_owner:deny
     6:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
         /read_acl/synchronize:allow

Remove Permissions

# chmod A0- test.dir
# ls -dv test.dir
drwxr-xr-x   2 root     root           2 Feb 16 11:12 test.dir
     0:owner@::deny
     1:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/write_xattr/execute/write_attributes/write_acl
         /write_owner:allow
     2:group@:add_file/write_data/add_subdirectory/append_data:deny
     3:group@:list_directory/read_data/execute:allow
     4:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr
         /write_attributes/write_acl/write_owner:deny
If a future memory requirement is significantly large and well defined, it can be advantageous to prevent ZFS from growing the ARC into it. For example, if we know that a future application requires 20% of memory, it makes sense to cap the ARC so that it does not consume more than the remaining 80% of memory. If the application is a known consumer of large memory pages, then again limiting the ARC prevents ZFS from breaking up the pages and fragmenting the memory; limiting the ARC preserves the availability of large pages. If dynamic reconfiguration of a memory board is needed (supported on certain platforms), then it is a requirement to prevent the ARC (and thus the kernel cage) from growing onto all boards. For these cases, it can be desirable to limit the ARC. This will, of course, also limit the amount of cached data, which can have adverse effects on performance. No easy way exists to foretell whether limiting the ARC degrades performance. If you tune this parameter, please reference this URL in the shell script or in an /etc/system comment: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ARCSIZE

You can also use the arcstat script available at http://blogs.sun.com/realneel/entry/zfs_arc_statistics to check the ARC size as well as other ARC statistics.

4. Set the ARC maximum in /etc/system

This syntax is provided starting in the Solaris 10 8/07 release and Nevada (build 51) release. For example, if an application needs 5 GBytes of memory on a system with 36 GBytes of memory, you could set the ARC maximum to 30 GBytes (0x780000000, or 32212254720 bytes). Set the zfs:zfs_arc_max parameter in the /etc/system file:
/etc/system:

set zfs:zfs_arc_max = 0x780000000
* or
set zfs:zfs_arc_max = 32212254720

5. Perl code to configure ARC cache at boot time - init script

#!/bin/perl
use strict;
my $arc_max = shift @ARGV;
if ( !defined($arc_max) ) {
   print STDERR "usage: arc_tune <arc max>\n";
   exit -1;
}
$| = 1;
use IPC::Open2;
my %syms;
my $mdb = "/usr/bin/mdb";
open2(*READ, *WRITE, "$mdb -kw") || die "cannot execute mdb";
print WRITE "arc::print -a\n";
while(<READ>) {
   my $line = $_;
   if ( $line =~ /^ +([a-f0-9]+) (.*) =/ ) {
      $syms{$2} = $1;
   } elsif ( $line =~ /^\}/ ) {
      last;
   }
}
# set c & c_max to our max; set p to max/2
printf WRITE "%s/Z 0x%x\n", $syms{p}, ( $arc_max / 2 );
print scalar <READ>;
printf WRITE "%s/Z 0x%x\n", $syms{c}, $arc_max;
print scalar <READ>;
printf WRITE "%s/Z 0x%x\n", $syms{c_max}, $arc_max;
print scalar <READ>;
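After the /etc/system tunable (or the init script above) is in place, a quick way to confirm the cap took effect is to read the ARC kstats. A minimal check on Solaris 10, assuming the standard zfs:0:arcstats kstat names:

   # kstat -p zfs:0:arcstats:c_max      # current ARC ceiling in bytes
   # kstat -p zfs:0:arcstats:size       # current ARC size in bytes
   # kstat -p zfs:0:arcstats:p          # target size of the MRU portion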
VMWare ESX 3

ESX 3 Command           Description

esxcfg-firewall
   ... are having problems with your ESX server after an in-place upgrade, this tool is invaluable in resolving the problems with service console networking.

esxcfg-auth
   Configures the service console authentication options including NIS, LDAP, Kerberos and Active Directory.

esxcfg-info
   Produces an enormous amount of information about the ESX host. You really need to pipe this to a file for closer examination!

esxcfg-mpath
   Manages multi-pathing just as the vmkmultipath utility did in previous versions of ESX Server.

esxcfg-resgrp
   Used to manage the new ESX feature called resource groups. This command can add, remove or modify existing resource groups.

esxcfg-vmhbadevs
   The esxcfg-vmhbadevs command is used to list the equivalent Linux device names for the visible disk devices that the VMkernel references using vmhba notation. If we use this command with the -m switch, then we only list the LUNs which contain VMFS partitions. Alongside the Linux device name, a long unique hexadecimal value is listed. This is the VMFS volume signature assigned by the new logical volume manager (LVM).

esxcfg-boot
   Used to configure the GRUB options presented at boot time. One thing to note is that the new esxcfg commands will not run if you boot just into Linux. If you just want to query the boot settings, you can use the -q switch, but this must be qualified with the keyword boot or vmkmod.

esxcfg-nas
   Used to configure access to Network Attached Storage (NAS).

esxcfg-route
   If we add an IP address to the VMkernel by adding a VMkernel port, then we can fully configure that IP stack by also assigning a default gateway. We can view (no parameters) and set (1st parameter) the VMkernel IP default gateway with the esxcfg-route command.

esxcfg-vmknic
   Used to view and configure the VMkernel ports on virtual Ethernet switches. A VMkernel port is a special type of port group on a virtual Ethernet switch which is used to assign an IP address to the VMkernel. The VMkernel only needs an IP address for VMotion, software-initiated iSCSI or NFS access. If you need to create a VMkernel port at the command line, then you need to create a port group first and then enable it as a VMkernel port. There doesn't appear to be a way of enabling a VMkernel port for VMotion from the command line.

esxcfg-dumppart
   Used to configure the VMkernel crash dump partition. The old ESX 2.x utility for this function (vmkdump) is still present on an ESX 3 server, but appears just to be for extracting dump files.

esxcfg-linuxnet
   esxcfg-linuxnet --setup

esxcfg-nics
   This tool can be used to view and configure the speed and duplex settings of the physical network cards in the ESX Server. So this tool can replace the MUI Network Connections/Physical Adapters, the mii-tool and modules.conf for network card management.

esxcfg-swiscsi
   ESX version 3.0 supports both hardware and software iSCSI. For hardware iSCSI, we can use host bus adapters which perform the TCP offload, and so the vmkernel can just pass SCSI commands to them as normal. The iSCSI hba can then wrap the SCSI command in TCP/IP and forward it to the iSCSI target. However, in software iSCSI (swiscsi), the wrapping of SCSI commands in TCP/IP is performed by the VMkernel and a regular physical network card can be used to communicate with the iSCSI target. This is exposed in the VI Client as a host bus adapter called vmhba40. This will place a significant load on the VMkernel and wouldn't be that great an idea, but the feature is in ESX 3.0! So we use this tool esxcfg-swiscsi to configure it. The software iSCSI initiator in the VMkernel has a dependency upon the service console, therefore both the service console and VMkernel must have an IP route to the iSCSI target. I have found that you need this command to scan for a new iSCSI target, as the VI Client rescan of the vmhba40 adapter doesn't appear to successfully discover targets. My suggestion for getting the software iSCSI to work is as follows:
   1. Add a VMkernel port to a vSwitch that has an uplink and route to the iSCSI target
   2. Ensure the service console IP interface has a route to the same iSCSI target
   3. Using either the VI Client security profile or esxcfg-firewall, open a service console port for iSCSI (TCP:3260)
   4. In the VI Client, enable the vmhba40 software iSCSI adapter and wait for the reconfiguration task to change from "In Progress" to "Completed"
   5. Reboot the ESX host. This step will result in the VMkernel module for iSCSI being loaded at next boot.
   6. In the VI Client, configure the vmhba40 adapter with an iSCSI target IP address
   7. At the service console command line, run esxcfg-swiscsi -e
   8. At the service console command line, run esxcfg-swiscsi -d
   9. At the service console command line, run esxcfg-swiscsi -e
   10. At the service console command line, run esxcfg-swiscsi -s
   11. In the VI Client, perform a rescan of the vmhba adapters and your iSCSI target should become visible.
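As a quick illustration of the esxcfg-vmknic and esxcfg-route entries above, this is a minimal sketch of creating a VMkernel port from the service console; the port group name, IP address, netmask and gateway are placeholders for your environment:

   # esxcfg-vswitch -A VMkernel vSwitch0               # add a port group named "VMkernel" to vSwitch0
   # esxcfg-vmknic -a -i 10.0.0.50 -n 255.255.255.0 VMkernel
   # esxcfg-route 10.0.0.1                             # set the VMkernel default gateway
   # esxcfg-vmknic -l                                  # list VMkernel NICs to confirm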
# /usr/bin/vmware-cmd <cfg> setguestinfo <variable> <value>
# /usr/bin/vmware-cmd <cfg> getguestinfo <variable>
# /usr/bin/vmware-cmd <cfg> getproductinfo <prodinfo>
# /usr/bin/vmware-cmd <cfg> connectdevice <device_name>
# /usr/bin/vmware-cmd <cfg> disconnectdevice <device_name>
# /usr/bin/vmware-cmd <cfg> getconfigfile
# /usr/bin/vmware-cmd <cfg> getheartbeat
# /usr/bin/vmware-cmd <cfg> getuptime
# /usr/bin/vmware-cmd <cfg> gettoolslastactive
# /usr/bin/vmware-cmd <cfg> getresource <variable>
# /usr/bin/vmware-cmd <cfg> setresource <variable> <value>
# /usr/bin/vmware-cmd <cfg> hassnapshot
# /usr/bin/vmware-cmd <cfg> createsnapshot <name> <description> <quiesce> <memory>
# /usr/bin/vmware-cmd <cfg> revertsnapshot
# /usr/bin/vmware-cmd <cfg> answer
Common Tasks
Expand a VM Disk to 20GB

# vmkfstools -X 20GB /vmfs/volumes/<datastore>/virtualguest.vmdk

Register/Un-Register a VMW

# /usr/bin/vmware-cmd -s register /vmfs/volumes/<datastore>/virtualguest.vmx
# /usr/bin/vmware-cmd -s unregister /vmfs/volumes/<datastore>/virtualguest.vmx

Start/Stop/Restart/Suspend a VMW

# /usr/bin/vmware-cmd /vmfs/volumes/<datastore>/virtualguest.vmx start
# /usr/bin/vmware-cmd /vmfs/volumes/<datastore>/virtualguest.vmx stop
# /usr/bin/vmware-cmd /vmfs/volumes/<datastore>/virtualguest.vmx reset
# /usr/bin/vmware-cmd /vmfs/volumes/<datastore>/virtualguest.vmx suspend
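A small sketch that ties these together: list every registered guest and report its power state. vmware-cmd -l and getstate are standard ESX 3 service console calls; the datastore path shown is illustrative:

   # vmware-cmd -l
   /vmfs/volumes/<datastore>/virtualguest/virtualguest.vmx
   # for cfg in `vmware-cmd -l`
   > do
   >    echo "$cfg: `vmware-cmd "$cfg" getstate`"
   > done
   /vmfs/volumes/<datastore>/virtualguest/virtualguest.vmx: getstate() = on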
Disk vmhba0:0:0 /dev/cciss/c0d0 (69459MB) has 1 paths and policy of Fixed
 Local 2:1.0 vmhba0:0:0 On active preferred

Disk vmhba1:0:0 (0MB) has 1 paths and policy of Most Recently Used
 FC 10:1.0 210000e08b846a72<->5006016930221397 vmhba1:0:0 On active preferred

Disk vmhba1:0:6 /dev/sda (9216MB) has 1 paths and policy of Most Recently Used
 FC 10:1.0 210000e08b846a72<->5006016930221397 vmhba1:0:6 On active preferred

Disk vmhba1:0:21 /dev/sdb (10240MB) has 1 paths and policy of Most Recently Used
 FC 10:1.0 210000e08b846a72<->5006016930221397 vmhba1:0:21 On active preferred

Map Disks to HBA's

# esxcfg-vmhbadevs
vmhba0:0:0:1 /dev/sda1 45407607-fbc43ced-94cb-00145e231ce3
vmhba0:0:2:1 /dev/sdc1 455b08a8-8af7fee3-daa9-00145e231e35
vmhba2:0:0:3 /dev/sde3 4559c75f-831d8f3e-bc81-00145e231e35

Get and Set the Default Router

# esxcfg-route
Create a new virtual disk (20GB or as required; do not allocate the disk now). Define your destination path as created previously and name your disk DATA-SHARED. Select the advanced options: set the virtual device node to "SCSI 1:0" and the mode to "Independent" and "Persistent".

2. Adding Lines to the VMWare Configuration File

Go to the bottom of the vmx file. There you will see the following lines:

scsi1.present = "TRUE"
scsi1.sharedBus = "none"
scsi1.virtualDev = "lsilogic"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "D:\Virtual Machines\Shared Disk\SHARED-DISK.vmdk"

Change them to the lines below:

disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
#scsi1 data storage
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedbus = "none"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "D:\Virtual Machines\Shared Disk\SHARED-DISK.vmdk"
scsi1:0.mode = "independent-persistent"
scsi1:0.shared = "TRUE"
scsi1:0.redo = ""
unshift @ARGV, "s/$source/$dest/";   # default to replace in text files

if ( ! -d "$source" ) {
   print "Error: Source directory '$source' does not exist.\n Please specify a relative path to CWD or the full path\n";
   exit 2;
}
if ( -d "$dest" ) {
   print "Error: Destination directory '$dest' already exists.\n You cannot overwrite an existing VM image with this tool.\n";
   exit 3;
}
my $regexwarn = 0;
foreach (@ARGV) {
   if ( ! /^s\/[^\/]+\/[^\/]+\/$/ ) {
      $regexwarn = 1;
      warn "Error: Invalid regex pattern in: $_\n";
   }
}
exit 4 if $regexwarn == 1;

# If we get here then $source and $dest are good
if ( ! mkdir "$dest" ) {
   print "Error: Failed to create destination dir '$dest': $!\n";
   exit 4;
}

# Now get a list of all the files in each
# directory and copy them to dest
@files = listdir($source);
#print @files;
foreach $srcfile (@files) {
   # we want to copy $srcfile from $src to $dest
   # but first check if we need to rename the file
   $destfile = $srcfile;
   if ($destfile =~ /$source/ ) {
      # source filename contains the source dir name, rename it
      $destfile =~ s/$source/$dest/gi;
   }
   $istext = is_vmtextfile($srcfile);
   printf("Copying %s: %s/%s -> %s/%s\n",
          ($istext ? "text" : "binary"),
          $source, $srcfile, $dest, $destfile);
   if ($istext == 0) {
      # do binary copy - no need to check regx args
      copy_file_bin("$source/$srcfile", "$dest/$destfile");
   } else {
      # text copy - need to string replace on each line.
      copy_file_regex("$source/$srcfile", "$dest/$destfile", @ARGV);
      # file needs to be mode 0755
      chmod 0755, "$dest/$destfile" if ($destfile =~ /\.vmx$/);
   }
}

exit 0;

sub copy_file_regex {
   my $src    = shift;
   my $dst    = shift;
   my @regexs = @_;
   my $buf    = '';
   my $regex  = '';
   open(COPYIN,  "<$src") || warn "Can't read $src: $!\n";
   open(COPYOUT, ">$dst") || warn "Can't write $dst: $!\n";
   binmode COPYIN;
   binmode COPYOUT;
   while ( read(COPYIN, $buf, 65536) ) {
   #while ($buf = <COPYIN>) {
      foreach $regex (@regexs) {
         (undef, $search, $replace) = split("/", $regex);
         $buf =~ s/$search/$replace/g;
      }
      print COPYOUT $buf;
   }
   close COPYOUT || warn "Can't close $dst: $!\n";
   close COPYIN  || warn "Can't close $src: $!\n";
}

sub copy_file_bin {
   my ($src, $dst) = @_;
   my $buf;
   open(COPYIN,  "<$src") || warn "Can't read $src: $!\n";
   open(COPYOUT, ">$dst") || warn "Can't write $dst: $!\n";
   binmode COPYIN;
   binmode COPYOUT;
   while ( read(COPYIN, $buf, 65536) and print COPYOUT $buf ) {};
   warn "Could not complete copy: $!\n" if $!;
   close COPYOUT || warn "Can't close $dst: $!\n";
   close COPYIN  || warn "Can't close $src: $!\n";
}
sub is_vmtextfile {
   my $file  = shift;
   my $istxt = 0;
   $istxt = 1 if ( $file =~ /\.(vmdk|vmx|vmxf|vmsd|vmsn)$/ );
   $istxt = 0 if ( $file =~ /-flat\.vmdk$/ );
   $istxt = 0 if ( $file =~ /-delta\.vmdk$/ );
   return $istxt;
}

sub listdir {
   my $dir    = shift;
   my @nfiles = ();
   opendir(FH, $dir) || warn "Can't open $dir: $!\n";
   @nfiles = grep { (-f "$dir/$_" && !-l "$dir/$_") } readdir(FH);
   closedir(FH);
   return @nfiles;
}

sub usage {
   print <<EOUSAGE;
$0: Tool to "quickly" clone a VMware ESX guest OS
Usage: $0 sourcedir destdir
       $0 "source dir" "dest dir"
       $0 sourcedir destdir [regexreplace [...]]
       e.g. # vmclone "winxp" "uscuv-clone" \
            's/memsize = "512"/memsize = "256"/'
Clones a vmware image located in sourcedir to the destdir directory.
The source machine must be powered off for this to correctly clone it.
By default, if any filenames have "sourcedir" as part of their filename,
then it is renamed to "destdir".
The optional regexreplace argument will cause that regular expression to be
performed on all the text files being copied. A default regexreplace of
s/sourcedir/destdir/ is done by default.
You may use multiple regexs.
Author: Paul Gregg <pgregg\@pgregg.com>
Jan 7, 2007
EOUSAGE
   exit 1;
}
# cp -ax vsol01 vsol02

2. In the new guest location rename the disk image

[/vsol02]# /vmware/bin/vmware-vdiskmanager -n vsol01.vmdk vsol02.vmdk

3. Rename the virtual machine config file

[/vsol02]# mv vsol01.vmx vsol02.vmx

4. Change the disk image name in the config file

[/vsol02]# sed -i 's/vsol01.vmdk/vsol02.vmdk/' vsol02.vmx

5. Register VMWare Image

# /usr/bin/vmware-cmd -s register /vmfs/volumes/<datastore>/virtualguest.vmx
Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
 FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
 FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

Disk vmhba2:1:1 /dev/sde (61440MB) has 2 paths and policy of Most Recently Used
 FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:1 On active preferred
 FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:1 Standby

The following is an analysis of the first LUN:

Canonical name

Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
 FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
 FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

This is the canonical device name the ESX Server host used to refer to the LUN.

Note

When there are multiple paths to a LUN, the canonical name is the first path that was detected for this LUN. In vmhba2:1:4:
   vmhba2 is one of the Host Bus Adapters (HBA).
   1 is the second storage target (numbering starts at 0) that was detected by this HBA.
   4 is the number of the LUN on this storage target.
For multipathing to work properly, each LUN must present the same LUN number to all ESX Server hosts.
If the vmhba number for the HBA is a single digit number, it is a physical adapter. If the address is vmhba40 or vmhba32, it is a software iSCSI device for ESX Server 3.0 and ESX Server 3.5 respectively.

Linux device name, Storage Capacity, LUN Type, WWPN, WWNN (in the order of the highlighted fields)

Disk vmhba2:1:4 /dev/sdh (30720MB) has 2 paths and policy of Most Recently Used
 FC 10:3.0 210000e08b89a99b<->5006016130221fdd vmhba2:1:4 On active preferred
 FC 10:3.0 210000e08b89a99b<->5006016930221fdd vmhba2:3:4 Standby

This is the associated Linux device handle for the LUN. You must use this reference when using utilities like fdisk.

There are three possible values for the LUN Disk type:
   FC: This LUN is presented through a fibre channel device.
   iScsi: This LUN is presented through an iSCSI device.
   Local: This LUN is a local disk.
AIX Notes
Add An Etherchannel

Select only the first adapter to be added into the channel:

   Etherchannel Adapters                     ent1
   Enable ALTERNATE ETHERCHANNEL address     no
   ALTERNATE ETHERCHANNEL address
   Enable GIGABIT ETHERNET JUMBO frames      no
   Mode                                      standard
   Hash Mode                                 default
   Backup Adapter                            ent2
   Internet Address to Ping                  <Default Gateway int>
   Number of Retries                         10
   Retry Timeout (sec)                       1

2. Backup Adapter

The default gateway should be supplied by data networks. The key entry here is the declaration of a backup adapter. This will create the next available ethernet card definition, i.e. ent3. This is a logical device but is also the device on which the IP address will be bound.

   smitty chinet -> en3

   Network Interface Name                    en3
   INTERNET ADDRESS (dotted decimal)         <IP address>
   Network MASK (hexadecimal or dotted decimal)    <subnet mask>
   Current STATE                                   up
   Use Address Resolution Protocol (ARP)?          yes
   BROADCAST ADDRESS (dotted decimal)
3. Edit /etc/hosts

Edit /etc/hosts and set up an entry for the newly configured IP address. The format is <hostname>en*, in this case: nac001en3. Check that the IP label is being resolved locally via:

   netstat -i

The interface card en3 will now be available as shown via:

   ifconfig -a

The active card, by default, is the first card listed in the etherchannel configuration:

   lsattr -El ent3
   adapter_names    ent1               EtherChannel Adapters
   alt_addr         0x000000000000     Alternate EtherChannel Address
   backup_adapter   ent2               Adapter used when whole channel
   hash_mode        default            Determines how outgoing adapter
   mode             standard           EtherChannel mode of operation
   netaddr          <gateway address>  Address to ping
   num_retries      10                 Times to retry ping before failing
   retry_time       1                  Wait time (seconds) between pings
   use_alt_addr     no                 Enable Alternate EtherChannel
   use_jumbo_frame  no                 Enable Gigabit Ethernet Jumbo
Use the etherchannel interface en3 as the Device for the NIC resource. An IP resource will depend on this NIC resource.
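The smitty screens above can also be driven from the command line. This is a rough sketch using the ent1/ent2/gateway values shown earlier; treat the device type and attribute names as assumptions to verify on your AIX level before relying on it:

   # mkdev -c adapter -s pseudo -t ibm_ech \
       -a adapter_names=ent1 -a backup_adapter=ent2 \
       -a netaddr=<default gateway> -a num_retries=10 -a retry_time=1
   # lsdev -Cc adapter | grep ent      # the new EtherChannel appears as the next entX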
set udp:xmit_hiwat=65536
set udp:udp_recv_hiwat=65536

Project Setup for Oracle User

# projadd -U oracle -K \
  "project.max-shm-memory=(privileged,21474836480,deny);\
  project.max-shm-ids=(privileged,1024,deny);\
  process.max-sem-ops=(privileged,4000,deny);\
  process.max-sem-nsems=(privileged,7500,deny);\
  project.max-sem-ids=(privileged,4198,deny);\
  process.max-msg-qbytes=(privileged,1048576,deny);\
  process.max-msg-messages=(privileged,65535,deny);\
  project.max-msg-ids=(privileged,5120,deny)" oracle

IPMP Public

All four public IP addresses need to reside on the same network subnet. The following is the list of IP addresses that will be used in the example below.

   Physical IP     : 146.56.77.30
   Test IP for ce0 : 146.56.77.31
   Test IP for ce1 : 146.56.77.32
   Oracle VIP      : 146.56.78.1

IPMP NIC Configuration at boot time

/etc/hostname.ce0
   146.56.77.30 netmask + broadcast + group orapub up addif 146.56.77.31 deprecated -failover netmask + broadcast + up

/etc/hostname.ce1
   146.56.77.32 netmask + broadcast + deprecated group orapub -failover standby up

The VIP should now be configured to use all NIC's assigned to the same public IPMP group. By doing this Oracle will automatically choose the primary NIC within the group to configure the VIP, and IPMP will be able to fail over the VIP within the IPMP group upon a single NIC failure. When running VIPCA: at the second screen in VIPCA (VIP Configuration Assistant, 1 of 2), select all NIC's within the same IPMP group where the VIP should run.

If already running, execute the following:

# srvctl stop nodeapps -n node
# srvctl modify nodeapps -n node \
   -o /u01/app/oracle/product/10gdb \
   -A 146.56.78.1/255.255.252.0/ce0\|ce1
# srvctl start nodeapps -n node

IPMP Private Connections

Make sure IPMP is configured prior to install, with the Private IP up on both nodes. The recommended solution is not to configure any private interface in Oracle. The following steps need to be done to use IPMP for the cluster interconnect:
1. If the private interface has already been configured, delete the interface with 'oifcfg delif':

   oifcfg getif
   oifcfg delif -global <if_name>

2. Set the CLUSTER_INTERCONNECTS parameter in the spfile/init.ora to the physical IP which is swapped by IPMP. DO NOT ADD LINE BREAKS '\':

   ALTER SYSTEM SET CLUSTER_INTERCONNECTS = '10.0.0.25' scope=spfile sid='nick01';
   ALTER SYSTEM SET CLUSTER_INTERCONNECTS = '10.0.0.26' scope=spfile sid='nick02';

3. Set the CLUSTER_INTERCONNECTS also for your ASM instances.

4. Verify the correct settings are in use:

   SQL> select * from gv$cluster_interconnects;
   SQL> show parameter cluster_interconnects;

   $CRS_HOME/bin/oifcfg getif
   bge0 170.13.76.0 global public
   e1000g0 170.13.76.0 global public
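Before handing the public IPMP group over to Oracle, it is worth confirming that the orapub group configured above actually fails over. A simple test with the standard Solaris if_mpadm utility, using the ce0/ce1 names from the example:

   # if_mpadm -d ce0      # detach ce0; its addresses should fail over to ce1
   # ifconfig -a          # verify the data address is now hosted on ce1
   # if_mpadm -r ce0      # reattach ce0 and let the addresses fail back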
Permissions for ASM Raw Disks

# chown oracle:dba /dev/rdsk/cxtydzs6
# chmod 660 /dev/rdsk/cxtydzs6

Oratab set to use ASM

# more /var/opt/oracle/oratab
+ASM2:oracle_home_path

Check ASM Space

$ $ORACLE_HOME/bin/sqlplus "SYS/SYS_password as SYSDBA"
SQL> SELECT NAME,TYPE,TOTAL_MB,FREE_MB FROM V$ASM_DISKGROUP;
SQL> alter system set "_asm_allow_only_raw_disks"=false scope=spfile;
SQL> alter system set asm_diskstring='/asmdisks/_file*' scope=both;
SQL> shutdown
SQL> startup

$ mkdir /asmdisks
$ cd /asmdisks
$ ln -s /dev/rdsk/dev_needed _file_disk_description

set oracle_sid=+ASM
sqlplus "/ as sysdba"

SQL> SELECT disk_number, mount_status, header_status, state, path
  2  FROM v$asm_disk

DISK_NUMBER MOUNT_S HEADER_STATU STATE   PATH
----------- ------- ------------ ------- ---------------------
          0 CLOSED  CANDIDATE    NORMAL  /ASMDISKS/_FILE_DISK1
          1 CLOSED  CANDIDATE    NORMAL  /ASMDISKS/_FILE_DISK2
          2 CLOSED  CANDIDATE    NORMAL  /ASMDISKS/_FILE_DISK3
          3 CLOSED  CANDIDATE    NORMAL  /ASMDISKS/_FILE_DISK4

Tables and Views
Listener is running on node: vm01
ONS daemon is running on node: vm01

$ srvctl status nodeapps -n vm02
VIP is running on node: vm02
GSD is running on node: vm02
Listener is running on node: vm02
ONS daemon is running on node: vm02

Check status of ASM

$ srvctl status asm -n vm01
ASM instance +ASM1 is running on node vm01.
$ srvctl status asm -n vm02
ASM instance +ASM2 is running on node vm02.

Check status of DB

$ srvctl status database -d esxrac
Instance esxrac1 is running on node vm01
Instance esxrac2 is running on node vm02

Check status of CRS (run on each node)

$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
SQL> select file_name, bytes/1024/1024
  2  from dba_data_files
  3  /

FILE_NAME                                        BYTES/1024/1024
------------------------------------------------ ---------------
+ORADATA/esxrac/datafile/system.259.620732719                500
+ORADATA/esxrac/datafile/undotbs1.260.620732753              200
+ORADATA/esxrac/datafile/sysaux.261.620732767                670
+ORADATA/esxrac/datafile/example.263.620732791               150
+ORADATA/esxrac/datafile/undotbs2.264.620732801              200
+ORADATA/esxrac/datafile/users.265.620732817                   5

6 rows selected.

Querying RAC - the status of all the groups, type, membership

SQL> select group#, type, member, is_recovery_dest_file
  2  from v$logfile
  3  order by group#
  4  /

GROUP# TYPE   MEMBER                                                   IS_
------ ------ -------------------------------------------------------- ---
     1 ONLINE +ORADATA/esxrac/onlinelog/group_1.257.620732695          NO
     1 ONLINE +FLASH_RECO_AREA/esxrac/onlinelog/group_1.257.620732699  YES
     2 ONLINE +ORADATA/esxrac/onlinelog/group_2.258.620732703          NO
     2 ONLINE +FLASH_RECO_AREA/esxrac/onlinelog/group_2.258.620732707  YES
     3 ONLINE +ORADATA/esxrac/onlinelog/group_3.266.620737527          NO
     3 ONLINE +FLASH_RECO_AREA/esxrac/onlinelog/group_3.259.620737533  YES
     4 ONLINE +ORADATA/esxrac/onlinelog/group_4.267.620737535          NO
     4 ONLINE +FLASH_RECO_AREA/esxrac/onlinelog/group_4.260.620737539  YES

Querying RAC for datafiles

SQL> select file_name, bytes/1024/1024
  2  from dba_data_files
  3  /

FILE_NAME
-----------------------------------------------
+ORADATA/esxrac/datafile/system.259.620732719
+ORADATA/esxrac/datafile/undotbs1.260.620732753
+ORADATA/esxrac/datafile/sysaux.261.620732767
+ORADATA/esxrac/datafile/example.263.620732791
+ORADATA/esxrac/datafile/undotbs2.264.620732801
+ORADATA/esxrac/datafile/users.265.620732817

6 rows selected.

Querying RAC v$asm_diskgroup view

select group_number, name, allocation_unit_size alloc_unit_size,
       state, type, total_mb, usable_file_mb
from v$asm_diskgroup;

GROUP_NUMBER NAME            ALLOC_UNIT_SIZE STATE    TYPE  TOTAL_MB USABLE_FILE_MB
------------ --------------- --------------- -------- ----- -------- --------------
           1 FLASH_RECO_AREA                                    10236           2781
           2 ORADATA                                            20472           8132
Querying RAC v$asm_diskgroup for our volumes

select name, path, header_status, total_mb free_mb,
       trunc(bytes_read/1024/1024) read_mb,
       trunc(bytes_written/1024/1024) write_mb
from v$asm_disk;

NAME  PATH        HEADER_STATU  FREE_MB  READ_MB  WRITE_MB
----- ----------- ------------  -------  -------  --------
VOL1  ORCL:VOL1   MEMBER          10236    39617     15816
VOL2  ORCL:VOL2   MEMBER          10236     7424     15816
VOL3  ORCL:VOL3   MEMBER          10236     1123     13059
Querying RAC - All datafiles in one go

SQL> select name from v$datafile
  2  union
  3  select name from v$controlfile
  4  union
  5  select name from v$tempfile
  6  union
  7  select member from v$logfile
  8  /

NAME
---------------------------------------------------------
+FLASH_RECO_AREA/esxrac/controlfile/current.256.620732691
+FLASH_RECO_AREA/esxrac/onlinelog/group_1.257.620732699
+FLASH_RECO_AREA/esxrac/onlinelog/group_2.258.620732707
+FLASH_RECO_AREA/esxrac/onlinelog/group_3.259.620737533
+FLASH_RECO_AREA/esxrac/onlinelog/group_4.260.620737539
+ORADATA/esxrac/controlfile/current.256.620732689
+ORADATA/esxrac/datafile/example.263.620732791
+ORADATA/esxrac/datafile/sysaux.261.620732767
+ORADATA/esxrac/datafile/system.259.620732719
+ORADATA/esxrac/datafile/undotbs1.260.620732753
+ORADATA/esxrac/datafile/undotbs2.264.620732801
+ORADATA/esxrac/datafile/users.265.620732817
+ORADATA/esxrac/onlinelog/group_1.257.620732695
+ORADATA/esxrac/onlinelog/group_2.258.620732703
+ORADATA/esxrac/onlinelog/group_3.266.620737527
+ORADATA/esxrac/onlinelog/group_4.267.620737535
+ORADATA/esxrac/tempfile/temp.262.620732779

17 rows selected.

Querying RAC - Listing all the tablespaces

SQL> select tablespace_name, file_name
  2  from dba_data_files
  3  union
  4  select tablespace_name, file_name
  5  from dba_temp_files
  6  /

TABLESPACE_NAME  FILE_NAME
---------------  -----------------------------------------------
EXAMPLE          +ORADATA/esxrac/datafile/example.263.620732791
SYSAUX           +ORADATA/esxrac/datafile/sysaux.261.620732767
SYSTEM           +ORADATA/esxrac/datafile/system.259.620732719
TEMP             +ORADATA/esxrac/tempfile/temp.262.620732779
UNDOTBS1         +ORADATA/esxrac/datafile/undotbs1.260.620732753
UNDOTBS2         +ORADATA/esxrac/datafile/undotbs2.264.620732801
USERS            +ORADATA/esxrac/datafile/users.265.620732817

7 rows selected.

Querying ASM to list disks in use

SQL> select name, header_status, path from v$asm_disk;

NAME    HEADER_STATUS  PATH
------  -------------  -------------------------
        CANDIDATE      /dev/rdsk/disk07
DISK06  MEMBER         /dev/rdsk/disk06
DISK05  MEMBER         /dev/rdsk/disk05
DISK04  MEMBER         /dev/rdsk/disk04
DISK03  MEMBER         /dev/rdsk/disk03
DISK02  MEMBER         /dev/rdsk/disk02
DISK01  MEMBER         /dev/rdsk/disk01

This script will give you information on the +ASM1 instance files:

SQL> select group_number, file_number, bytes/1024/1024/1024 GB, type, striped, modification_date
  2  from v$asm_file
  3  where TYPE != 'ARCHIVELOG'
  4  /

GRP_NUM FILE_NUM    GB TYPE           STRIPE MODIFICAT
------- -------- ----- -------------- ------ ---------
      1      256   .01 CONTROLFILE    FINE   04-MAY-07
      1      257   .05 ONLINELOG      FINE   25-MAY-07
      1      258   .05 ONLINELOG      FINE   24-MAY-07
      1      259   .05 ONLINELOG      FINE   24-MAY-07
      1      260   .05 ONLINELOG      FINE   25-MAY-07
      1      261   .00 PARAMETERFILE  COARSE 24-MAY-07
      2      256   .01 CONTROLFILE    FINE   04-MAY-07
      2      257   .05 ONLINELOG      FINE   25-MAY-07
      2      258   .05 ONLINELOG      FINE   24-MAY-07
      2      259   .49 DATAFILE       COARSE 04-MAY-07
      2      260   .20 DATAFILE       COARSE 04-MAY-07
      2      261   .65 DATAFILE       COARSE 23-MAY-07
      2      262   .03 TEMPFILE       COARSE 04-MAY-07
      2      263   .15 DATAFILE       COARSE 04-MAY-07
      2      264   .20 DATAFILE       COARSE 04-MAY-07
      2      265   .00 DATAFILE       COARSE 04-MAY-07
      2      266   .05 ONLINELOG      FINE   24-MAY-07
      2      267   .05 ONLINELOG      FINE   25-MAY-07
18 rows selected.

This script will give you more detailed information on the +ASM1 instance files:

SQL> select group_number, file_number, incarnation, block_size, bytes/1024/1024/1024 GB, type, striped,
  2  creation_date
  3  from v$asm_file
  4  where TYPE != 'ARCHIVELOG'
  5  /

GRP_NUM FILE_NUM INCARNATION BLOCK_SIZE    GB TYPE           STRIPE CREATION_
------- -------- ----------- ---------- ----- -------------- ------ ---------
      1      256   620732691      16384   .01 CONTROLFILE    FINE   24-APR-07
      1      257   620732699        512   .05 ONLINELOG      FINE   24-APR-07
      1      258   620732707        512   .05 ONLINELOG      FINE   24-APR-07
      1      259   620737533        512   .05 ONLINELOG      FINE   24-APR-07
      1      260   620737539        512   .05 ONLINELOG      FINE   24-APR-07
      1      261   620737547        512   .00 PARAMETERFILE  COARSE 24-APR-07
      2      256   620732689      16384   .01 CONTROLFILE    FINE   24-APR-07
      2      257   620732695        512   .05 ONLINELOG      FINE   24-APR-07
      2      258   620732703        512   .05 ONLINELOG      FINE   24-APR-07
      2      259   620732719       8192   .49 DATAFILE       COARSE 24-APR-07
      2      260   620732753       8192   .20 DATAFILE       COARSE 24-APR-07
      2      261   620732767       8192   .65 DATAFILE       COARSE 24-APR-07
      2      262   620732779       8192   .03 TEMPFILE       COARSE 24-APR-07
      2      263   620732791       8192   .15 DATAFILE       COARSE 24-APR-07
      2      264   620732801       8192   .20 DATAFILE       COARSE 24-APR-07
      2      265   620732817       8192   .00 DATAFILE       COARSE 24-APR-07
      2      266   620737527        512   .05 ONLINELOG      FINE   24-APR-07
      2      267   620737535        512   .05 ONLINELOG      FINE   24-APR-07

18 rows selected.
EMC Storage
Pseudo name=emcpower6a
Symmetrix ID=000184503070
Logical device ID=0021
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
---------------- Host ----------------  - Stor -  -- I/O Path --  --- Stats ---
###  HW Path              I/O Paths     Interf.   Mode    State   Q-IOs  Errors
   0 sbus@2,0/fcaw@2,0                            active  dead        0       0
   1 sbus@6,0/fcaw@1,0                            active  alive       1       0

# powermt display paths
Symmetrix logical device count=20
- Host Bus Adapters -         --- Storage System ---       - I/O Paths -
###  HW Path                  ID             Interface     Total   Dead
   0 sbus@2,0/fcaw@2,0        000184503070   FA 13bA          20     20
   1 sbus@6,0/fcaw@1,0        000184503070   FA  4bA          20      0

CLARiiON logical device count=0
- Host Bus Adapters -         --- Storage System ---       - I/O Paths -
###  HW Path                  ID             Interface     Total   Dead

# powermt display ports
Storage class = Symmetrix
------ Storage System ------       -- I/O Paths --    --- Stats ---
ID             Interface   Wt_Q    Total    Dead      Q-IOs  Errors
000184503070   FA 13bA      256       20      20          0       0
000184503070   FA  4bA      256       20       0         20       0

Storage class = CLARiiON
------ Storage System ------       -- I/O Paths --    --- Stats ---
ID             Interface   Wt_Q    Total    Dead      Q-IOs  Errors
Disable PowerPath
1. Please ensure that LUNS are available to the host from multiple paths

   # powermt display

2. Stop the application so that there is no I/O issued to PowerPath devices. If the application is under VCS control, please offline the service group on that node:

   # hagrp -offline <servicename> -sys <nodename>
3. Unmount filesystems and stop the volumes so that there are no volumes under I/O
   # umount /<mount_point>
   # vxvol -g <dgname> stopall

4. Stop CVM and VERITAS fencing on the node (if part of a VCS cluster)
   NOTE: All nodes in the VCS cluster need to be brought down if CVM / fencing are enabled.

   # vxclustadm stopnode
   # /etc/init.d/vxfen stop

5. Disable Volume Manager startup

   # touch /etc/vx/reconfig.d/state.d/install-db

6. Reboot the host

   # shutdown -y -i6

7. Unmanage/remove PowerPath devices

   # powermt remove dev=all

8. Verify that PowerPath devices have been removed

   # powermt display dev=all

9. Uninstall PowerPath binaries (package)

   # pkgrm EMCpower

10. Run the EMC PowerPath cleanup script

   # /etc/emcp_cleanup

11. Reboot the host only if the PowerPath uninstall requests a reboot.

12. Start VERITAS Volume Manager daemons

   # vxconfigd -m enable

13. Enable Volume Manager startup (disabled in step 5)

   # rm /etc/vx/reconfig.d/state.d/install-db

14. Update the boot alias of the host if required in OBP
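Once PowerPath is removed and Volume Manager is running again, a quick sanity check is to confirm VxVM now sees the native device paths rather than emcpower devices; these are stock commands, and the exact device names will depend on your array:

   # vxdisk list            # devices should now show as cXtYdZ paths, no emcpowerN entries
   # vxdg list              # confirm the disk groups imported cleanly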
/dev/dsk/c2t6d0   GK   EMC   SYMMETRIX   5265   7301A281   2880
Using the first and last serial numbers as examples, the serial number is broken out as follows:

   73    Last two digits of the Symmetrix serial number
   009   Symmetrix device number
   15    Symmetrix director number. If <= 16, using the A processor
   0     Port number on the director

   --------------------------------------------------------

   73    Last two digits of the Symmetrix serial number
   01A   Symmetrix device number
   28    Symmetrix director number. If > 16, using the B processor on board: (${brd}-16)
         Port number on the director

So, in the first example, device 009 is mapped to director 15, processor A, port 0, while the second example has device 01A mapped to director 12, processor B, port 0. Even if you don't buy any of the EMC software, you can get the inq command from their web site. Understanding the serial numbers will help you get a better understanding of which ports are going to which hosts. Understanding this and documenting it will circumvent hours of rapturous cable tracings.
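A quick way to split a serial such as 73009150 into those fields at the shell, using the field widths from the breakdown above (cut is standard on Solaris):

   # SER=73009150
   # echo "frame: `echo $SER | cut -c1-2`  device: `echo $SER | cut -c3-5`  director: `echo $SER | cut -c6-7`  port: `echo $SER | cut -c8`"
   frame: 73  device: 009  director: 15  port: 0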
Brocade Switches
1. Brocade Configuration Information

Basic Brocade Notes

DS8B_ID3:admin> switchshow
switchName:     DS8B_ID3
switchType:     3.4
switchState:    Online
switchRole:     Principal
switchDomain:   3
switchId:       fffc03
switchWwn:      10:00:00:60:69:20:50:a9
switchBeacon:   OFF
port 0: id Online F-Port 50:06:01:60:20:02:f5:a1
port 1: id Online F-Port 50:06:01:68:20:02:f5:a1
port 2: id Online F-Port 10:00:00:00:c9:28:3a:fc
port 3: id Online F-Port 10:00:00:00:c9:28:3d:21
port 4: id Online F-Port 10:00:00:00:c9:28:3d:0a
port 5: id Online F-Port 10:00:00:00:c9:26:ac:16
port 6: id No_Light
port 7: id No_Light
DS8B_ID3:admin>

DS8B_ID3:admin> cfgshow
Defined configuration:
 cfg:   CFG     CSA_A_PATH; CSA_B_PATH
        CSA_A_PATH      CSA_SPA; DB1_LPFC0; MN1_LPFC0
        CSA_B_PATH      CSA_SPB; DB1_LPFC1; MN1_LPFC1
        CSA_SPA         50:06:01:60:20:02:f5:a1
        CSA_SPB         50:06:01:68:20:02:f5:a1
        DB1_LPFC0       10:00:00:00:c9:28:3a:fc
        DB1_LPFC1       10:00:00:00:c9:28:3d:21
        MN1_LPFC0       10:00:00:00:c9:28:3d:0a
        MN1_LPFC1       10:00:00:00:c9:26:ac:16

Effective configuration:
 cfg:   CFG
 zone:  CSA_A_PATH      50:06:01:60:20:02:f5:a1
                        10:00:00:00:c9:28:3a:fc
                        10:00:00:00:c9:28:3d:0a
 zone:  CSA_B_PATH      50:06:01:68:20:02:f5:a1
                        10:00:00:00:c9:28:3d:21
                        10:00:00:00:c9:26:ac:16
DS8B_ID3:admin>

2. Brocade Configuration Walkthrough

a. Basic SwitchShow

DS8B_ID3:admin> switchshow
switchName:     DS8B_ID3
switchType:     3.4
switchState:    Online
switchRole:     Principal
switchDomain:   3
switchId:       fffc03
switchWwn:      10:00:00:60:69:20:50:a9
switchBeacon:   OFF
port 0: id Online F-Port 50:06:01:60:20:02:f5:a1
port 1: id Online F-Port 50:06:01:68:20:02:f5:a1
port 2: id Online F-Port 10:00:00:00:c9:28:3a:fc
port 3: id Online F-Port 10:00:00:00:c9:28:3d:21
port 4: id No_Light
port 5: id No_Light
port 6: id No_Light
port 7: id No_Light

b. Create Aliases

DS8B_ID3:admin> alicreate "CSA_SPA", "50:06:01:60:20:02:f5:a1"
DS8B_ID3:admin> alicreate "CSA_SPB", "50:06:01:68:20:02:f5:a1"
DS8B_ID3:admin> alicreate "DB1_LPFC0", "10:00:00:00:c9:28:3a:fc"
DS8B_ID3:admin> alicreate "DB1_LPFC1", "10:00:00:00:c9:28:3d:21"

c. Create Zones

DS8B_ID3:admin> zoneCreate "CSA_A_PATH", "CSA_SPA; DB1_LPFC0"
DS8B_ID3:admin> zoneCreate "CSA_B_PATH", "CSA_SPB; DB1_LPFC1"
DS8B_ID3:admin> cfgCreate "CFG", "CSA_A_PATH; CSA_B_PATH"

d. Save and Enable New Configuration

DS8B_ID3:admin> cfgCreate "CFG", "CSA_A_PATH; CSA_B_PATH"
DS8B_ID3:admin> cfgSave
Updating flash ...
DS8B_ID3:admin> cfgEnable "CFG"
zone config "CFG" is in effect
Updating flash ...
0x10e6e440 (tThad): Jun 21 04:26:09
    Error FW-CHANGED, 4, fabricZC000 (Fabric Zoning change) value has changed.
    current value : 7 Zone Change(s). (info)

e. Show Zone Configuration

DS8B_ID3:admin> zoneshow
Defined configuration:
 cfg:   CFG     CSA_A_PATH; CSA_B_PATH
 zone:  CSA_A_PATH      CSA_SPA; DB1_LPFC0
 zone:  CSA_B_PATH      CSA_SPB; DB1_LPFC1
 alias: CSA_SPA         50:06:01:60:20:02:f5:a1
 alias: CSA_SPB         50:06:01:68:20:02:f5:a1
 alias: DB1_LPFC0       10:00:00:00:c9:28:3a:fc
 alias: DB1_LPFC1       10:00:00:00:c9:28:3d:21
Effective configuration:
 cfg:   CFG
 zone:  CSA_A_PATH      50:06:01:60:20:02:f5:a1
                        10:00:00:00:c9:28:3a:fc
 zone:  CSA_B_PATH      50:06:01:68:20:02:f5:a1
                        10:00:00:00:c9:28:3d:21
Dtrace
Disaster Recovery
The command to create the primary RVG takes the form:

   vradmin -g disk_group createpri rvg_name data_volume srl_volume

where:
   disk_group   is the name of the disk group containing the database
   rvg_name     is the name for the RVG
   data_volume  is the volume that VVR replicates
   srl_volume   is the volume for the SRL

The command creates the RVG on the primary site and adds a Data Change Map (DCM) for each data volume. In this case, a DCM exists for rac1_vol.

Configuring replication for the secondary site

To create objects for replication on the secondary site, use the vradmin command with the addsec option. To set up replication on the secondary site:

   Create a disk group on the storage with the same name as the equivalent disk group on the primary site, if you have not already done so.
   Create volumes for the database and SRL on the secondary site.
   Edit the /etc/vx/vras/.rdg file on the secondary site.
   Set up resolvable virtual IP addresses that provide the network RLINK connections as host names of the primary and secondary sites.
   Create the replication objects on the secondary site.

Creating the data and SRL volumes on the secondary site

To create the data and SRL volumes on the secondary site:

1. In the disk group created for the Oracle database, create a volume for data; in this case, the rac_vol1 volume on the primary site is 6.6 GB:

   # vxassist -g oradatadg make rac_vol1 6600M nmirror=2 disk1 disk2

2. Create the volume for the SRL, using the same name and size as the equivalent volume on the primary site. Create the volume on a different disk from the disks used for the database volume:

   # vxassist -g oradatadg make rac1_srl 1500M nmirror=2 disk4 disk6

Editing the /etc/vx/vras/.rdg files

Editing the /etc/vx/vras/.rdg file on the secondary site enables VVR to replicate the disk group from the primary site to the secondary site. On each node, VVR uses the /etc/vx/vras/.rdg file to check the authorization to replicate the RVG on the primary site to the secondary site. The file on each node in the secondary site must contain the primary disk group ID, and likewise, the file on each primary system must contain the secondary disk group ID.

1. On a node in the primary site, display the primary disk group ID:

   # vxprint -l diskgroup
2. On each node in the secondary site, edit the /etc/vx/vras/.rdg file and enter the primary disk group ID on a single line.

3. On each cluster node of the primary cluster, edit the file and enter the secondary disk group ID on a single line.

Setting up IP addresses for RLINKs on each cluster

Creating objects with the vradmin command requires resolvable virtual IP addresses that set up network RLINK connections as host names of the primary and secondary sites.

To set up IP addresses for RLINKs on each cluster:

1. For each RVG running on each cluster, set up a virtual IP address on one of the nodes of the cluster. These IP addresses are part of the RLINK. The example assumes that the public network interface is eth0:1, the virtual IP address is 10.10.9.101, and the net mask is 255.255.240.0 for the cluster on the primary site:

   # ifconfig eth0:1 inet 10.10.9.101 netmask 255.255.240.0 up

2. Use the same commands with appropriate values for the interface, IP address, and net mask on the secondary site. The example assumes the interface is eth0:1, the virtual IP address is 10.11.9.102, and the net mask is 255.255.240.0 on the secondary site.

3. Define the virtual IP addresses to correspond to a virtual cluster host name on the primary site and a virtual cluster host name on the secondary site. For example, update the /etc/hosts file on all nodes in each cluster. The examples assume rac_clus101_priv has IP address 10.10.9.101 and rac_clus102_priv has IP address 10.11.9.102.

4. Use the ping command to verify the links are functional.

Setting up the disk group on the secondary site for replication

Create the replication objects on the secondary site from the master node on the primary site, using the vradmin command.

To set up the disk group on the secondary site for replication:

1. Issue the command in the following format from the cluster on the primary site:

   vradmin -g dg_pri addsec rvg_pri pri_host sec_host

   where:
   dg_pri    is the disk group on the primary site that VVR will replicate. For example: oradatadg
   rvg_pri   is the RVG on the primary site. For example: rac1_rvg
   pri_host  is the virtual IP address or resolvable virtual host name of the cluster on the primary site. For example: 10.10.9.101 or rac_clus101_priv
   sec_host  is the virtual IP address or resolvable virtual host name of the cluster on the secondary site. For example: 10.11.9.102 or rac_clus102_priv

2. On the secondary site, the command:

   Creates an RVG within the specified disk group using the same name as the one for the primary site
   Associates the data and SRL volumes that have the same names as the ones on the primary site with the specified RVG
   Adds a data change map (DCM) for the data volume
   Creates cluster RLINKs for the primary and secondary sites with the default names; for example, the primary RLINK created for this example is rlk_rac_clus102_priv_rac1_rvg and the secondary RLINK created is rlk_rac_clus101_priv_rac1_rvg.

3. Verify the list of RVGs in the RDS by executing the following command:

   # vradmin -g oradg -l printrvg

   Replicated Data Set: rac1_rvg
   Primary:
        HostName: 10.180.88.187 <localhost>
        RvgName: rac1_rvg
        DgName: oradatadg
        datavol_cnt: 1
        vset_cnt: 0
        srl: rac1_srl
        RLinks:
           name=rlk_10.11.9.102_rac1_rvg, detached=on, synchronous=off
   Secondary:
        HostName: 10.190.99.197
        RvgName: rac1_rvg
        DgName: oradatadg
        datavol_cnt: 1
        vset_cnt: 0
        srl: rac1_srl
        RLinks:
           name=rlk_10.10.9.101_rac1_rvg, detached=on, synchronous=off

Starting replication using automatic synchronization

From the primary site, automatically synchronize the RVG on the secondary site:

   vradmin -g disk_group -a startrep pri_rvg sec_host

Starting replication using full synchronization with Checkpoint

   vradmin -g disk_group -full -c ckpt_name syncrvg pri_rvg sec_host
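Putting the generic forms above together with this section's example names (oradatadg, rac1_rvg, rac_vol1, rac1_srl, rac_clus101_priv, rac_clus102_priv), the end-to-end sequence would look roughly like this; the names are illustrative, not a prescription:

   # vradmin -g oradatadg createpri rac1_rvg rac_vol1 rac1_srl
   # vradmin -g oradatadg addsec rac1_rvg rac_clus101_priv rac_clus102_priv
   # vradmin -g oradatadg -a startrep rac1_rvg rac_clus102_priv
   # vradmin -g oradatadg -l printrvg      # watch Data status move toward "consistent, up-to-date"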
   RVG state:
   Data volumes:
   VSets:
   SRL name:
   SRL size:
   Total secondaries:

Primary (acting secondary):
   Host name:               162.111.101.196
   RVG name:                hubrvg
   DG name:                 hubdg
   Data status:             consistent, behind
   Replication status:      logging to DCM (needs failback synchronization)
   Current mode:            asynchronous
   Logging to:              DCM (contains 3708448 Kbytes) (failback logging)
   Timestamp Information:   N/A
Config Errors:
   162.111.101.196:         Primary-Primary configuration

Secondary:
   Host name:               162.111.101.196
   RVG name:                hubrvg
   DG name:                 hubdg
   Data status:             consistent, up-to-date
   Replication status:      replicating (connected)
   Current mode:            asynchronous
   Logging to:              SRL
   Timestamp Information:   behind by 0h 0m 0s
Here's how to resynchronize the old Primary once you bring it back up (5.0):
1. Use the migrate option with vradmin:

   # vradmin -g diskgroup migrate vgname hostRemoteIP

2. If the command reports back that the primary is out of sync, use the fbsync option:

   # vradmin -g diskgroup fbsync vgname
b. Indicate whether the NIC you entered is for all cluster nodes. If you enter n, enter the names of NICs on each node.
c. Enter or confirm the virtual IP address for the local cluster.
d. When the wizard discovers the net mask associated with the virtual IP address, accept the discovered value or enter another value. With NIC and IP address values configured, the wizard creates a ClusterService group or updates an existing one. After modifying the VCS configuration file, the wizard brings the group online.
e. Perform step 1 through step 5 on the secondary cluster.

3. Modifying the global clustering configuration using the main.cf on the primary cluster

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "OracleTypes.cf"
include "VVRTypes.cf"
cluster rac_cluster101 (
        UserNames = { admin = "cDRpdxPmHpzS." }
        ClusterAddress = "10.10.10.101"
        Administrators = { admin }
        CounterInterval = 5
        UseFence = SCSI3
        )

group ClusterService (
        SystemList = { galaxy = 0, nebula = 0 }
        AutoStartList = { galaxy, nebula }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        Application wac (
                StartProgram = "/opt/VRTSvcs/bin/wacstart"
                StopProgram = "/opt/VRTSvcs/bin/wacstop"
                MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
                RestartLimit = 3
                )

        IP gcoip (
                Device = eth1
                Address = "10.10.10.101"
                NetMask = "255.255.240.0"
                )

        NIC csgnic (
                Device = eth1
                )

        gcoip requires csgnic
        wac requires gcoip
4. Define the remotecluster and its virtual IP address. In this example, the remote cluster is rac_cluster102 and its IP address is 10.11.10.102: # haclus -add rac_cluster102 10.11.10.102 5. Complete step 3 and step 4 on the secondary site using the name and IP address of the primary cluster (rac_cluster101 and 10.10.10.101). 6. On the primary site, add the heartbeat object for the cluster. In this example, the heartbeat method is ICMP ping.
# hahb -add Icmp
# hahb -modify Icmp ClusterList rac_cluster102
# hahb -modify Icmp Arguments 10.11.10.102 -clus rac_cluster102
# haclus -list
rac_cluster101
rac_cluster102

7. Example additions to the main.cf file on the primary site:
remotecluster rac_cluster102 (
        ClusterAddress = "10.11.10.102"
        )

heartbeat Icmp (
        ClusterList = { rac_cluster102 }
        Arguments @rac_cluster102 = { "10.11.10.102" }
        )

system galaxy (
        )

8. Example additions to the main.cf file on the secondary site:
remotecluster rac_cluster101 (
        ClusterAddress = "10.190.88.188"
        )

heartbeat Icmp (
        ClusterList = { rac_cluster101 }
        Arguments @rac_cluster101 = { "10.190.88.188" }
        )

system galaxy
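Once both sides are configured, the standard VCS commands can confirm the global cluster link. A minimal check (these are stock ha commands; output will differ per site):

   # haclus -list              # both the local and remote clusters should be listed
   # haclus -state             # the remote cluster should show RUNNING once the wac connection is up
   # hahb -display Icmp        # shows the heartbeat state and arguments per remote cluster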
2x IP for GCO - one per cluster, 2x IP for VVR RLINK - one per cluster

Primary CFS Cluster with VVR - example main.cf

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "VVRTypes.cf"
cluster primary003 (
        UserNames = { haadmin = xxx }
        ClusterAddress = "162.111.101.195"
        Administrators = { haadmin }
        UseFence = SCSI3
        HacliUserLevel = COMMANDROOT
        )

remotecluster remote003 (
        ClusterAddress = "167.138.164.121"
        )

heartbeat Icmp (
        ClusterList = { remote003 }
        Arguments @remote003 = { "167.138.164.121" }
        )

system primary003a1 (
        )

system primary003b1 (
        )

group ClusterService (
        SystemList = { primary003a1 = 0, primary003b1 = 1 }
        AutoStartList = { primary003a1, primary003b1 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

        Application wac (
                StartProgram = "/opt/VRTSvcs/bin/wacstart"
                StopProgram = "/opt/VRTSvcs/bin/wacstop"
                MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
                RestartLimit = 3
                )

        IP gcoip (
                Device @primary003a1 = bond0
                Device @primary003b1 = bond0
                Address = "162.111.101.195"
                NetMask = "255.255.254.0"
                )

        NIC csgnic (
Device = bond0 ) NotifierMngr ntfr ( SmtpServer = "smtp.me.com" SmtpRecipients = { "[email protected]" = Warning } ) gcoip requires csgnic ntfr requires csgnic wac requires gcoip
group HUBDG_RVG ( SystemList = { primary003a1 = 0, primary003b1 = 1 } Parallel = 1 AutoStartList = { primary003a1, primary003b1 } ) CVMVolDg HUB_DG ( CVMDiskGroup = hubdg CVMActivation = sw ) RVGShared HUBDG_CFS_RVG ( RVG = hubrvg DiskGroup = hubdg ) requires group cvm online local firm HUBDG_CFS_RVG requires HUB_DG
group Myappsg ( SystemList = { primary003a1 = 0, primary003b1 = 1 } Parallel = 1 ClusterList = { remote003 = 1, primary003 = 0 } Authority = 1 AutoStartList = { primary003a1, primary003b1 } ClusterFailOverPolicy = Auto Administrators = { tibcoems } ) Application foo ( StartProgram = "/opt/tibco/vcs_scripts/foo start &" StopProgram = "/opt/tibco/vcs_scripts/foo stop &" MonitorProgram = "/opt/tibco/vcs_scripts/monitor_foo" )
CFSMount foomnt ( MountPoint = "/opt/foo" BlockDevice = "/dev/vx/dsk/hubdg/foo" ) RVGSharedPri hubrvg_pri ( RvgResourceName = HUBDG_CFS_RVG OnlineRetryLimit = 0 ) requires group HUBDG_RVG online local firm foo requires foomnt foomnt requires hubrvg_pri
group cvm ( SystemList = { primary003a1 = 0, primary003b1 = 1 } AutoFailOver = 0 Parallel = 1 AutoStartList = { primary003a1, primary003b1 } ) CFSfsckd vxfsckd ( ActivationMode @primary003a1 = { hubdg = sw } ActivationMode @primary003b1 = { hubdg = sw } ) CVMCluster cvm_clus ( CVMClustName = primary003 CVMNodeId = { primary003a1 = 0, primary003b1 = 1 } CVMTransport = gab CVMTimeout = 200 ) CVMVxconfigd cvm_vxconfigd ( Critical = 0 CVMVxconfigdArgs = { syslog } ) cvm_clus requires cvm_vxconfigd vxfsckd requires cvm_clus
IP vvr_ip ( Device @primary003a1 = bond1 Device @primary003b1 = bond1 Address = "162.111.101.196" NetMask = "255.255.254.0" ) NIC vvr_nic ( Device @primary003a1 = bond1 Device @primary003b1 = bond1 ) RVGLogowner logowner ( RVG = hubrvg DiskGroup = hubdg ) requires group HUBDG_RVG online local firm logowner requires vvr_ip vvr_ip requires vvr_nic Secondary CFS Cluster with VVR - example main.cf include include include include "types.cf" "CFSTypes.cf" "CVMTypes.cf" "VVRTypes.cf"
cluster remote003 ( UserNames = { haadmin = xxx } ClusterAddress = "167.138.164.121" Administrators = { haadmin } UseFence = SCSI3 HacliUserLevel = COMMANDROOT ) remotecluster primary003 ( ClusterAddress = "162.111.101.195" ) heartbeat Icmp ( ClusterList = { primary003 } Arguments @primary003 = { "162.111.101.195" } ) system remote003a1 ( ) system remote003b1 ( ) group ClusterService ( SystemList = { remote003a1 = 0, remote003b1 = 1 } AutoStartList = { remote003a1, remote003b1 }
OnlineRetryLimit = 3 OnlineRetryInterval = 120 ) Application wac ( StartProgram = "/opt/VRTSvcs/bin/wacstart" StopProgram = "/opt/VRTSvcs/bin/wacstop" MonitorProcesses = { "/opt/VRTSvcs/bin/wac" } RestartLimit = 3 ) IP gcoip ( Device @remote003a1 = bond0 Device @remote003b1 = bond0 Address = "167.138.164.121" NetMask = "255.255.254.0" ) NIC csgnic ( Device = bond0 ) NotifierMngr ntfr ( SmtpServer = "smtp.me.com" SmtpRecipients = { "[email protected]" = Warning } ) gcoip requires csgnic ntfr requires csgnic wac requires gcoip
group HUBDG_RVG ( SystemList = { remote003a1 = 0, remote003b1 = 1 } Parallel = 1 AutoStartList = { remote003a1, remote003b1 } ) CVMVolDg HUB_DG ( CVMDiskGroup = hubdg CVMActivation = sw ) RVGShared HUBDG_CFS_RVG ( RVG = hubrvg DiskGroup = hubdg ) requires group cvm online local firm HUBDG_CFS_RVG requires HUB_DG
group Tibcoapps ( SystemList = { remote003a1 = 0, remote003b1 = 1 } Parallel = 1 ClusterList = { remote003 = 1, primary003 = 0 } AutoStartList = { remote003a1, remote003b1 } ClusterFailOverPolicy = Auto Administrators = { tibcoems } ) Application FOO ( StartProgram = "/opt/tibco/vcs_scripts/foo start &" StopProgram = "/opt/tibco/vcs_scripts/foo stop &" MonitorProgram = "/opt/tibco/vcs_scripts/monitor_foo" )
RVGSharedPri hubrvg_pri ( RvgResourceName = HUBDG_CFS_RVG OnlineRetryLimit = 0 ) requires group HUBDG_RVG online local firm foo requires foomnt foomnt requires hubrvg_pri
group cvm ( SystemList = { remote003a1 = 0, remote003b1 = 1 } AutoFailOver = 0 Parallel = 1 AutoStartList = { remote003a1, remote003b1 } ) CFSfsckd vxfsckd ( ActivationMode @remote003a1 = { hubdg = sw } ActivationMode @remote003b1 = { hubdg = sw } ) CVMCluster cvm_clus ( CVMClustName = remote003 CVMNodeId = { remote003a1 = 0, remote003b1 = 1 } CVMTransport = gab CVMTimeout = 200
) CVMVxconfigd cvm_vxconfigd ( CVMVxconfigdArgs = { syslog } ) cvm_clus requires cvm_vxconfigd vxfsckd requires cvm_clus
group rlogowner ( SystemList = { remote003a1 = 0, remote003b1 = 1 } AutoStartList = { remote003a1, remote003b1 } OnlineRetryLimit = 2 ) IP vvr_ip ( Device @remote003a1 = bond1 Device @remote003b1 = bond1 Address = "167.138.164.117" NetMask = "255.255.254.0" ) NIC vvr_nic ( Device @remote003a1 = bond1 Device @remote003b1 = bond1 ) RVGLogowner logowner ( RVG = hubrvg DiskGroup = hubdg ) requires group HUBDG_RVG online local firm logowner requires vvr_ip vvr_ip requires vvr_nic Secondary CFS Cluster with VVR - example main.cf
VVR 4.X
Pre-5.0 VVR does not use vradmin as much; this section is kept here to show the underlying commands. Note that with 4.0 and earlier you need to detach the SRL before growing it, and in 5.x that is no longer needed.
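For comparison, in 5.x the same SRL growth is a single online operation through vradmin; a minimal sketch, assuming the hubdg/hubrvg names from the main.cf examples above:

# vradmin -g hubdg resizesrl hubrvg 2g     ## VVR 5.x: grows the SRL without detaching it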
Here's how to resynchronize the old Primary once you bring it back up (4.x):
1. The RVG and RLINK should be stopped and detached. If not, stop and detach them:
   # vxrvg stop rvgA
   # vxrlink det rlinkA
2. Disassociate the SRL and make the system a secondary:
   # vxvol dis srlA
   # vxedit set primary=false rvgA
3. Reassociate the SRL, change the primary_datavol attribute:
   # vxvol aslog rvgA srlA
   # vxedit set primary_datavol=sampleB sampleA
4. Attach the RLINK and then start the RVG:
   # vxrlink -f att rlinkA
   # vxrvg start rvgA
   This won't do much, as the RLINK on hostB (the Primary) should still be detached, preventing the Secondary from connecting.
5. Now go back to the Primary to turn the RLINK on:
   # vxedit set remote_host=hostA local_host=hostB \
       remote_dg=diskgroupA remote_rlink=rlinkA rlinkB
   # vxrlink -a att rlinkB
   Giving the -a flag to vxrlink tells it to run in autosync mode. This will automatically resync the secondary data volumes from the Primary. If the Primary is being updated faster than the Secondary can be synced, the Secondary will never become synced, so this method is only appropriate for certain implementations.
   Once synchronization is complete, follow the instructions above (the beginning of section 6) to transfer the Primary role back to the original Primary system.
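While autosync is running, the catch-up can be watched from the Primary; a hedged sketch using the rlink and diskgroup names from this example:

# vxrlink -g diskgroupB -i 5 status rlinkB   ## reports how far the Secondary is behind, refreshed every 5 seconds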
# vxedit set primary=false rvgA
# vxvol aslog rvgA srlA
# vxrvg start rvgA
# vxrlink -f att rlinkA
c. Now go to work on the Old Secondary to bring it up as the new Primary.
   i. First you need to stop the RVG, detach the rlink, disassociate the SRL, and turn the PRIMARY attribute on:
      # vxrvg stop rvgB
      # vxrlink det rlinkB
      # vxvol dis srlB
      # vxedit set primary=true rvgB
   ii. Veritas recommends that you use vxedit to reinitialize some values on the RLINK to make sure you're still cool:
      # vxedit set remote_host=hostA \
          local_host=hostB remote_dg=diskgroupA \
          remote_rlink=rlinkA rlinkB
   iii. Before you can attach the rlink, you need to change the PRIMARY_DATAVOL attribute on both hosts to point to the Veritas volume name of the NEW Primary:
      A. On the new primary (e.g. hostB):
         # vxedit set primary_datavol=sampleB sampleB
      B. On the new secondary (e.g. hostA):
         # vxedit set primary_datavol=sampleB sampleA
   iv. Now that you have that, go back to the new Primary, attach the RLINK, and start the RVG:
      # vxrlink -f att rlinkB
      # vxrvg start rvgB
2. If the Primary is down:
   a. First you'll need to bring up the secondary as a primary. If your secondary datavolume is inconsistent (this is only likely if an SRL overflow occurred and the secondary was not resynchronized before the Primary went down) you will need to disassociate the volumes from the RVG, fsck them if they contain filesystems, and reassociate them with VVR. If your volumes are consistent, the task is much easier. On the secondary, first stop the RVG, detach the RLINK, and disassociate the SRL:
      # vxrvg stop rvgB
      # vxrlink det rlinkB
      # vxvol dis srlB
   b. Make the Secondary the new Primary:
      # vxedit -g diskgroupB set primary=true rvgB
   c. Now reassociate the SRL and change the primary_datavol:
      # vxvol aslog rvgB srlB
      # vxedit set primary_datavol=sampleB sampleB
   d. If the old Primary is still down, all you need to do is start the RVG to be able to use the datavolumes:
      # vxrvg start rvgB
      This will allow you to keep the volumes in VVR so that once you manage to resurrect the former Primary, you can make the necessary VVR commands to set it up as a secondary so it can resynchronize from the backup system. Once it has resynchronized, you can use the process listed at the beginning of section 6 (above) to fail from the Old Secondary/New Primary back to the original configuration.
   e. Now make the RVG, where you put together the datavolume, the SRL, and the rlink:
      # vxmake -g diskgroupB rvg rvgB rlink=rlinkB \
          datavol=sampleB srl=srlB primary=false
   f. Attach the rlink to the rvg:
      # vxrlink -g diskgroupB att rlinkB
   g. Start the RVG on the Secondary:
      # vxrvg -g diskgroupB start rvgB
2. Configure Primary VVR Node
   a. As with the Secondary, make data volumes, an SRL, and an rlink:
      # vxassist -g diskgroupA make sampleA 4g layout=log logtype=dcm
      # vxassist -g diskgroupA make srlA 500m
      # vxmake -g diskgroupA rlink rlinkA remote_host=hostB \
          remote_dg=diskgroupB remote_rlink=rlinkB local_host=hostA \
          synchronous=[off|override|fail] srlprot=dcm
   b. Make the RVG for the primary. Only the last option is different:
      # vxmake -g diskgroupA rvg rvgA rlink=rlinkA \
          datavol=sampleA srl=srlA primary=true
3. Now go back to the secondary. When we created the secondary, brain-dead Veritas figured the volume on the Secondary and the Primary would have the same name, but when we set this up, we wanted to have the Primary datavolume named sampleA and the Secondary datavolume be sampleB. So we need to tell the Secondary that the Primary is sampleA:
      # vxedit -g diskgroupB set primary_datavol=sampleA sampleB
4. Now you can attach the rlink to the RVG and start the RVG. On the Primary:
      # vxrlink -g diskgroupA att rlinkA
   You should see output like this:
      vxvm:vxrlink: INFO: Secondary data volumes detected \
          with rvg rvgB as parent:
      vxvm:vxrlink: INFO: sampleB: len=8388608 primary_datavol=sampleA
5. Finally, start I/O on the Primary:
      # vxrvg -g diskgroupA start rvgA
and then the secondary. You always need to make sure the Secondary is larger than or as large as the Primary, or you will get a configuration error from VVR.

You may need to grow an SRL if your pipe shrinks (more likely if your pipe gets busier) or the amount of data you are sending increases. See pages 18-25 of the SRVM Configuration Notes for detailed (excruciatingly) notes on selecting your SRL size.
1. To grow an SRL, you must first stop the RVG and disassociate the SRL from the RVG:
   # vxrvg stop rvgA
   # vxrlink det rlinkA
   # vxvol dis srlA
2. From this point, you can grow your SRL (which is now just an ordinary volume):
   # vxassist growto srlA 2gb
3. Once your SRL has been successfully grown, reassociate it with the RVG, reattach the RLINK, and start the RVG:
   # vxvol aslog rvgA srlA
   # vxrlink -f att rlinkA
   # vxrvg start rvgA
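To confirm the growth took effect before resuming replication, remember the SRL is just a volume; a quick check (names as used above):

# vxprint -g diskgroupA srlA          ## LENGTH column should reflect the new SRL size
# vxrlink -g diskgroupA status rlinkA ## confirm the RLINK catches up again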
1. First detach the RLINK on the primary and then the secondary:
   primary# vxrlink -g diskgroupA det rlinkA
   secondary# vxrlink -g diskgroupB det rlinkB
2. Then stop the RVG on the primary and then the secondary:
   primary# vxrvg -g diskgroupA stop rvgA
   secondary# vxrvg -g diskgroupB stop rvgB
3. On the primary, stop the datavolumes:
   # vxvol -g diskgroupA stop sampleA
4. If you want to keep the datavolumes, you need to disassociate them from the RVG:
   primary# vxvol -g diskgroupA dis sampleA
   secondary# vxvol -g diskgroupB dis sampleB
5. Finally, on both the Primary and the Secondary, remove everything:
# touch /a/etc/vx/reconfig.d/state.d/install-db 6. Reboot from the disk that was just modified. 7. Once the system is booted in at least single-user mode, VxVM can be started manually with the following steps. a. Start the VxVM worker threads: # vxiod set 10 b. Start vxconfigd in disabled mode: # vxconfigd -d c. Enable vxconfigd: # vxdctl enable d. IMPORTANT: If the boot disk contains mirrored volumes, one must take all the mirrors offline for those volumes except for the one on the boot disk. Offlining a mirror prevents VxVM from ever performing a recovery on that plex. This step is critical in preventing data corruption. # vxprint -htg rootdg ... v rootvol root DISABLED ACTIVE 1026000 PREFER pl rootvol-01 rootvol DISABLED ACTIVE 1026000 CONCAT sd rootdisk-B0 rootvol-01 rootdisk 8378639 1 0 c0t0d0 sd rootdisk-02 rootvol-01 rootdisk 0 1025999 1 c0t0d0 pl rootvol-02 rootvol DISABLED ACTIVE 1027026 CONCAT sd rootmir-06 rootvol-02 rootmir 0 1027026 0 c0t1d0 ... In this case the rootvol-02 plex should be offlined as it resides on c0t1d0: # vxmend -g rootdg off rootvol-02 e. Start all volumes: # vxrecover -ns f. Start any recovery operations on volumes if needed: # vxrecover -bs Once any debugging actions and/or any other operations are completed, VxVM can be re-enabled again with the following steps. a. Undo the steps in the previous section that were taken to disable VxVM (steps 2-4): # cp /etc/vfstab.disable /etc/vfstab # cp /etc/system.disable /etc/system # rm /etc/vx/reconfig.d/state.d/install-db b. Reboot the system. c. Once the system is back up and it is verified to be running correctly, online all mirrors that were offlined in step 6 in the previous section. For example,
# vxmend -g rootdg on rootvol-02 d. Start recovery operations on the mirrors that were just onlined. # vxrecover -bs
3. Use the vxreattach command with "-c" option and accessname # /etc/vx/bin/vxreattach -c c2t21d220 # /etc/vx/bin/vxreattach -c c2t21d41
required by the relayout operation for column addition:
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)

Once the relayout begins, the vxrelayout(1m) and vxtask(1m) utilities can be used to monitor the progress of the relayout operations:

$ vxrelayout -g oof status oravol01
RAID5, columns=4, stwidth=32 --> STRIPED-MIRROR, columns=2, stwidth=128
Relayout running, 10.02% completed.
$ vxtask list
TASKID  PTID  TYPE/STATE  PCT     PROGRESS
2125          RELAYOUT/R  14.45%  0/41943168/6061184 RELAYOUT oravol01 oof

Veritas Resize

When shrinking a volume/filesystem, note that you cannot give the new size with a leading "-"; instead specify the -s (shrink) option with the non-negative amount you want to reduce by:

# vxresize -s -g diskgroup volume 10g

vxvm:vxassist: ERROR: Cannot allocate space for 1675008 block volume

The most common example is in a two disk stripe as below. Here the volume is striped across disk 01 and 02. An attempt may be made to use another disk in the disk group (DG) to grow the volume and this will fail since it is necessary to grow the stripe equally. Two disks are needed to grow the stripe.

dg stripedg default default 125000 1006935392.1115.sptsunvm5
dm striped01 c1t1d0s2 sliced 2159 8378640
dm striped02 c1t3d0s2 sliced 2159 8378640
dm striped03 c1t4d0s2 sliced 3590 17678493
v  oil - ENABLED ACTIVE 16756736 SELECT oil-01 fsgen
pl oil-01 oil ENABLED ACTIVE 16757392 STRIPE 2/128 RW
sd striped01-01 oil-01 striped01 0 8378640 0/0 c1t1d0 ENA
sd striped02-01 oil-01 striped02 0 8378640 1/0 c1t3d0 ENA

# vxassist -g stripedg maxgrow oil
vxvm:vxassist: ERROR: Volume oil cannot be extended within \
 the given constraints

Another disk is then added into the configuration so there are now two spare disks. Rerun the maxgrow command, which will succeed. The resize will also succeed.

dg stripedg default default 125000 1006935392.1115.sptsunvm5
dm striped01 c1t1d0s2 sliced 2159 8378640
dm striped02 c1t3d0s2 sliced 2159 8378640
dm striped03 c1t4d0s2 sliced 3590 17678493
dm striped04 c1t5d0s2 sliced 2159 8378640
v  oil - ENABLED ACTIVE 16756736 SELECT oil-01 fsgen
pl oil-01 oil ENABLED ACTIVE 16757392 STRIPE 2/128 RW
sd striped01-01 oil-01 striped01 0 8378640 0/0 c1t1d0 ENA sd striped02-01 oil-01 striped02 0 8378640 1/0 c1t3d0 ENA # vxassist -g stripedg maxgrow oil Volume oil can be extended from 16756736 to 33513472 (16364Mb) Under normal circumstances, it is possible to issue the resize command and add (grow) the volume across disks 3 and 4. If only one spare disk exists, it is possible to use it. Grow the volume to use the extra space. The only option is a relayout. In the example below, the volume is on disk01/02 and the intention is to incorporate disk 03 and convert the volume into a 3 column stripe. However, the relayout is doomed to fail: dm striped01 c1t1d0s2 sliced 2159 8378640 dm striped02 c1t3d0s2 sliced 2159 8378640 dm striped03 c1t4d0s2 sliced 3590 17678493 v oil - ENABLED ACTIVE 16756736 SELECT oil-01 fsgen pl oil-01 oil ENABLED ACTIVE 16757392 STRIPE 2/128 RW sd striped01-01 oil-01 striped01 0 8378640 0/0 c1t1d0 ENA sd striped02-01 oil-01 striped02 0 8378640 1/0 c1t3d0 ENA # vxassist -g stripedg relayout oil ncol=3 str01 str02 str03 vxvm:vxassist: WARNING: dm:striped01: No disk space matches spec vxvm:vxassist: WARNING: dm:striped02: No disk space matches spec vxvm:vxassist: ERROR: Cannot allocate space for 1675008 block volume vxvm:vxassist: ERROR: Relayout operation aborted. (7) This has failed because the size of the subdisks is exactly the same as that of the disks (8378640 blocks). For this procedure to work, resize (shrink) the volume by about 10% (10% of 8 gigabytes = 800 megabytes) to give VERITAS Volume Manager (VxVM) some temporary space to do the relayout: # vxresize -g stripedg oil 7382m v oil - ENABLED ACTIVE 15118336 SELECT oil-01 fsgen pl oil-01 oil ENABLED ACTIVE 15118464 STRIPE 3/128 RW sd striped01-04 oil-01 striped01 0 7559168 0/0 c1t1d0 ENA sd striped02-04 oil-01 striped02 0 7559168 1/0 c1t3d0 ENA The only other way to avoid having to shrink the volume (in the case of a UNIX File System (UFS) file system) is to add a fourth disk to the configuration just for the duration of the relayout, so VxVM would use the fourth disk as temporary space. Once the relayout is complete, the disk will be empty again.
UDID_MISMATCH
Volume Manager 5.0 introduced unique identifiers for disks (UDIDs), which allow source and cloned (copied) disks to be differentiated. If a disk and its clone are presented to Volume Manager, the devices will be flagged as udid_mismatch in vxdisk list. This typically indicates that the storage was cloned on the storage array, is a reassigned LUN, or is a BCV.

If you want to remove the clone attribute from the device itself and use it as a regular disk group with the newly imported disk group name:

# vxdisk set c5t2d0s2 clone=off

If you want to import a BCV disk group:
1. Verify that the cloned disk, EMC0_27, is in the "error udid_mismatch" state:
# vxdisk -o alldgs list DEVICE TYPE DISK GROUP STATUS EMC0_1 auto:cdsdisk EMC0_1 mydg online EMC0_27 auto - - error udid_mismatch In this example, the device EMC0_27 is a clone of EMC0_1. 2. Split the BCV device that corresponds to EMC0_27 from the disk group mydg: # /usr/symcli/bin/symmir -g mydg split DEV001 3. Update the information that VxVM holds about the device: # vxdisk scandisks 4. Check that the cloned disk is now in the "online udid_mismatch" state: # vxdisk -o alldgs list DEVICE TYPE DISK GROUP STATUS EMC0_1 auto:cdsdisk EMC0_1 mydg online EMC0_27 auto:cdsdisk - - online udid_mismatch 5. Import the cloned disk into the new disk group newdg, and update the disk's UDID: # vxdg -n newdg -o useclonedev=on -o updateid import mydg 6. Check that the state of the cloned disk is now shown as "online clone_disk": # vxdisk -o alldgs list DEVICE TYPE DISK GROUP STATUS EMC0_1 auto:cdsdisk EMC0_1 mydg online EMC0_27 auto:cdsdisk EMC0_1 newdg online clone_disk
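If the goal is only to clear the udid_mismatch flag on a single cloned device, rather than import it as a separate BCV disk group, newer VxVM releases also provide an updateudid operation; a sketch, using the device name from the example above:

# vxdisk updateudid EMC0_27               ## rewrite the stored UDID to match the hardware (disk must not be in an imported dg)
# vxdisk -o alldgs list | grep EMC0_27    ## status should no longer show udid_mismatch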
# cat /var/tmp/config.out
Note
This will not delete existing data on the disks. All commands in this procedure interact with the private region header information and do not re-write data. 5. Continue through the list of disks by adding them into the disk group
# vxdg -g DiskGroupName adddisk DISKNAME=cAtBdZs2
6. After all disks are added into the disk group, generate the original layout by running vxmake against the /var/tmp/maker file:
   # vxmake -g DiskGroupName -d /var/tmp/maker
7. At this point all volumes will be in a DISABLED ACTIVE state. Once you enable all volumes you will have full access to the original disk group:
   # vxvol -g DiskGroupName startall
sdb state=enabled
# vxdisk list sdal | grep "state=enabled"
sdax state=enabled
sds state=enabled
# vxdmpadm getsubpaths dmpnodename=sdal
NAME STATE[A] PATH-TYPE[M] CTLR-NAME ENCLR-TYPE ENCLR-NAME ATTRS
========================================================================
sdax ENABLED(A) c1 EMC EMC2
sds  ENABLED(A) c0 EMC EMC2
# vxdmpadm getsubpaths dmpnodename=sds
NAME STATE[A] PATH-TYPE[M] CTLR-NAME ENCLR-TYPE ENCLR-NAME ATTRS
========================================================================
sdan ENABLED(A) c1 EMC EMC2
sdb  ENABLED(A) c0 EMC EMC2

Solution:
# rm /etc/vx/disk.info ; rm /etc/vx/array.info
# vxconfigd -k
Note
In newer versions of VxVM there is a disk group split operation (vxdg split) that can be used for this process.

## (for each vol) get the names/disks from vxdisk list
# vxprint -hmQq -g <current disk group> <volname> > /<volname>
## Next
# vxedit -g <dg> -rf rm <volname>             (for each vol)
# vxdg -g <dg> rmdisk <name>
# vxdg init <newdg> <diskname>=<disk>
# vxdg -g newdg adddisk <diskname>=<disk>     (for each disk)
# vxmake -g newdg -d /tmp/<volname>           (for each volume)
# vxvol -g newdg start <volname>
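As a sketch of that newer approach (assuming the volumes to be moved live entirely on the disks being split away), the whole move collapses to a disk group split, deport, and import:

# vxdg split <dg> <newdg> <volname>   ## moves the volume and the disks it lives on into <newdg>
# vxdg deport <newdg>                 ## optional: hand the new group to another host
# vxdg import <newdg>
# vxvol -g <newdg> startall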
Recover vx Plex
# vxprint|grep DETA pl vol01-02 vol01 DETACHED 204800 - IOFAIL - # vxplex -g ptpd att vol01 vol01-02 &
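The reattach runs in the background (&), so the resync can be watched with vxtask; illustrative only:

# vxtask list          ## shows the plex attach/resync task and percent complete
# vxtask monitor       ## keeps printing progress until the task finishes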
disks=( `ls /dev/rdsk/c*s2` )
total=0
# ---- how many disks? ----
sz=${#disks[*]}
# ---- get disk size for each ----
n=0
echo "Disks:"
while [ $n -lt $sz ]
do
    geom=( `prtvtoc ${disks[$n]} 2>/dev/null | \
        egrep "sector|track|cylinder" | tr -d "*" | awk '{print $1}'` )
    # ---- get disk parms and calculate size ----
    BperS=${geom[0]}
    SperT=${geom[1]}
    TperC=${geom[2]}
    SperC=${geom[3]}
    Cyls=${geom[4]}
    AccCyls=${geom[5]}
    if [ "$BperS" != "" ]; then
        size=`expr $BperS \* $SperC \* $Cyls`
        GB=`expr $size \/ 1024 \/ 1024 \/ 1024`
        echo -n "  ${disks[$n]}: "
        echo $GB "Gbytes"
        total=`expr $total + $GB`
    fi
    n=`expr $n + 1`
done
3. Disassociate the mirror plexes:
   # vxplex -g rootdg dis rootvol-02
   # vxplex -g rootdg dis swapvol-02
   # vxplex -g rootdg dis usr-02
   # vxplex -g rootdg dis var-02
   # vxplex -g rootdg dis opt-02      (if any)
   # vxplex -g rootdg dis home-02     (if any)
4. Edit the following files to make the root mirror disk bootable without VERITAS Volume Manager:
   # mount /dev/dsk/c1t1d0s0 /mnt
   # cd /mnt/etc
   # cp -p system system.orig
   # cp -p vfstab vfstab.orig
   # cp -p vfstab.prevm vfstab
5. Change the c#t#d# numbers in the above file to ensure the correct partitions will be referenced in the vfstab file, and prevent VxVM from starting on the mirror:
   # touch /mnt/etc/vx/reconfig.d/state.d/install-db
   Edit /mnt/etc/system and comment out the following lines using the "*" character:
   Before changes:
       rootdev ..
       set vxio ..
   After changes:
       * rootdev ..
       * set vxio ..
6. Unmount the root mirror's / partition
   # umount /mnt
7. If the upgrade or patching was successful, attach the mirror plexes back to the root disk:
   # vxplex -g rootdg att rootvol rootvol-02
   # vxplex -g rootdg att swapvol swapvol-02
   # vxplex -g rootdg att var var-02
   # vxplex -g rootdg att usr usr-02
- Boot system
2. Remove the partitions having tag 14 and 15 from the mirror disk using format. Do not just change the tag type; zero out these partitions and labels before exiting format.
3. Manually start up vxconfigd to allow for the encapsulation of the root mirror:
   # vxiod set 10
   # vxconfigd -m disable
   # vxdctl init
   # vxdisk -f init c1t0d0
   # vxdctl enable
   # rm /etc/vx/reconfig.d/state.d/install-db
   # vxdiskadm
       => option 2 "Encapsulate one or more disks"
       => choose c1t1d0 (old rootmirror)
       => put under rootdg
   # shutdown -i6 -g0 -y
4. Mirror root mirror disk with original root disk:
   # /etc/vx/bin/vxrootmir -g rootdg rootdisk
   # /etc/vx/bin/vxmirror -g rootdg rootmirror rootdisk
Advanced VCS for IO Fencing and Various Commands

plan and developed quicklog. This allows you to have the filesystem log on a different disk. This helps in speeding things up, because most disk operations can happen in parallel. OK, so now you know what quicklog is. You can have quicklog on cluster filesystems as well. Port "q" is used to coordinate access to quicklog (wow, that was a loooong one).

Port U - Not a port you would normally see, but just to be complete, let's mention it here. When a Cluster Volume Manager is started, it will need to do a couple of things. The access to changing the configuration of volumes, plexes, subdisks and diskgroups needs to be coordinated. This means that a "master" will always need to be selected in the cluster (can be checked with the "vxdctl -c mode" command). Normally the master is the first one to open port "u". Port "u" is an exclusive port for registering with the cluster volume manager "master". If no master has been established yet, the first node to open port "u" will assume the role of master. The master controls all access to changes of the cluster volume manager configuration. Each node that tries to join the cluster (CVM) will need to open (exclusively) port "u", search for the master, and make sure that the node and the master see all the same disks for the shared diskgroups.

Port V - OK, now that we've established that there is a master, we need to mention the fact that each instance of volume manager running (thus on each node) keeps the configuration in memory (regardless of whether it is part of a cluster or not). This "memory" is managed by the configuration daemon (vxconfigd). We will get to the vxconfigd in a minute, but first port "v". So, port "v" is actually used to register membership for the cluster volume manager (once the node got port "u" membership, the "permanent" membership is done via port "v"). Only members of the same cluster (cluster volume manager cluster, that is) are allowed to import and access the (shared) disks.

Port W - The last port in cluster volume manager. This is the port used for the vxconfigd on each node to communicate with the vxconfigd on all the other nodes. The biggest issue is that a configuration change needs to be the same across the whole cluster (it does not help that 1 node thinks we still have a mirrored volume and the others don't know a thing about the mirror).
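A quick way to see which of these ports are actually registered on a node is gabconfig; the output below is illustrative only, generation and membership values will differ:

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 4a1c01 membership 01
Port b gen 4a1c05 membership 01
Port h gen 4a1c09 membership 01
Port u gen 4a1c0d membership 01
Port v gen 4a1c0b membership 01
Port w gen 4a1c0f membership 01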
If a registration fails down a particular path, dmp *should* prevent that path from going to an online state -- but I know that we've seen a few problems with this in the past (path goes online but the registration failed, leaving the particular subpath keyless).

5. If so, does scsi3_disk_policy=dmp result in the key being written on the bad path when it comes back online? If the dmp policy does not interact with the vxfen module and allow for placement of the keys on the previously bad path then what is the benefit of the dmp node?
Using dmp policy instructs vxfen to use the dmpnode instead of a raw path. When the registration is made on the dmpnode, dmp keeps track of that registration request, and will gratuitously make the same registration for any subsequent added/restored path that arrives after the original registration to the dmpnode was made -- at least that's what is supposed to happen (see above about corner case bugs that have been identified and addressed over times past).

6. Can this setting be adjusted on the fly with the cluster up?
The /etc/vxfentab file is (re)created each time the vxfen start script runs. Once the file is built, "vxfenconfig -c" reads the file upon initialization only. With 5.0MP3 and later, there is a way to go through a "replace" procedure to replace one device with another. With a bit of careful testing, that method could be used to replace the /dev/rdsk/c_t_d with the corresponding dmpnode if desired.

7. Last, why does the registration key on a data drive only have one key when there are multiple paths? Reservations have a key per path. Is the registration written to the LUN instead of the Symm?
It's the other way actually: there are multiple registrations (one per path), and only one reservation. The reservation is not really a key itself (it's a mode setting) but is made through a registration key. If you unregister the hosting key, the reservation mode is lost. But if you preempt that key using some other registration, the spec says that the preempting key will inherit the reservation. Our dmp code is paranoid here, and we try the reservation again anyway. As a result, it is expected to see failed reservations coming from CVM slave nodes given it is the CVM master that makes the initial reservation through one of its paths to the LUN, and the slave's attempt to re-reserve is expected to fail if one of the paths from the CVM master still holds the reservation. If for some reason the master lost its reservation (should never happen) our extra try for reservation from all joining slaves is something like an extra insurance policy.
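A non-destructive way to confirm the fencing mode and disk policy being discussed here is vxfenadm -d on a running node; the output layout below is approximate and varies by release (node names are the example names used earlier):

# vxfenadm -d
I/O Fencing Cluster Information:
================================
  Fencing Protocol Version: 201
  Fencing Mode: SCSI3
  Fencing SCSI3 Disk Policy: dmp
  Cluster Members:
    * 0 (galaxy)
      1 (nebula)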
When the vxfen start script runs, it:
- reads /etc/vxfendg to determine the name of the diskgroup (DG) that contains the coordinator disks
- parses "vxdisk -o alldgs list" output for the list of disks in that DG
- performs a "vxdisk list diskname" for each to determine all available paths to each coordinator disk
- uses all paths to each disk in the DG to build a current /etc/vxfentab

3. Summary of keys including uncommon ones
In summary, /opt/VRTSvcs/rac/bin/vxfentsthdw is a readable shell script which performs all of these steps (it uses dd instead of format's analyze function). Note that you must REGISTER a key before you can PREEMPT other keys. The easiest way of clearing keys is the /opt/VRTSvcs/rac/bin/vxfenclearpre script, but this requires all IO to stop to ALL diskgroups, and a reboot to immediately follow running the script (to safely re-apply needed keys). Failure to reboot results in VxVM performing shared IO without keys. If an event arises that mandates fencing, winning nodes will attempt to eject the keys from losing nodes, but won't find any. VxVM will silently continue. Worse yet, because the RESERVATION isn't present, the losing nodes still have the ability to write to the data disks, thereby bypassing IO fencing altogether. If a node wants to perform IO on a device which has a RESERVATION, the node must first REGISTER a key. If the RESERVATION is inadvertently cleared, there is no requirement to maintain a REGISTRATION. For this reason, keys should never be manipulated on disks actively imported in shared mode.

Manually stepping through this document 3-4 times using a spare disk on your cluster is the only way to become familiar with fencing and quickly resume normal production operation after a fence operation occurs. Otherwise, you must use vxfenclearpre or call VERITAS Support at 800 342 0652, being prepared to provide your VSN contract ID. Reading over the logic of the vxfentsthdw and vxfenclearpre shell scripts is also a valuable training aid.

In the table below ** the SCSI3_RESV_WRITEEXCLUSIVEREGISTRANTSONLY reservation mode is also required
vxfsckd is not running:
# mount -F vxfs -o cluster,largefiles,qio /dev/vx/dsk/orvol_dg/orbvol /shared
UX:vxfs mount: ERROR: Cluster mount is not supported on a non-CVM volume on a file system layout version less than 4, or GAB/GLM modules are not loaded, or vxfsckd daemon is not running.
# which vxfsckd
/opt/VRTSvxfs/sbin/vxfsckd
# /opt/VRTSvxfs/sbin/vxfsckd
# ps -ef | grep vxfsckd
root 5547 1 0 23:04:43 ? 0:00 /opt/VRTSvxfs/sbin/vxfsckd

largefiles has not yet been set:
# mount -F vxfs -o cluster,largefiles,qio \
    /dev/vx/dsk/orvol_dg/orbvol /shared
UX:vxfs mount: ERROR: mount option(s) incompatible with file system /dev/vx/dsk/orvol_dg/orbvol

b. Reboot command issued instead of init 6
This results in the keys from the rebooted node remaining on the disks and prevents vxfen from starting. The easy way to fix it is a reboot with init 6.

5. Adjust CFS Primary node - not master node
node 0# fsclustadm showprimary /orashared
0
node 1# fsclustadm setprimary /orashared
# fsclustadm showprimary /orashared
1

6. Coordinator Disk example with keys - note the lack of reservations; coordinator disks do not set them.
# head -1 /etc/vxfentab > /tmp/coordinator_disk
# vxfenadm -g all -f /tmp/coordinator_disk
Device Name: /dev/rdsk/c2t0d7s2
Total Number Of Keys: 2
key[0]:
    Key Value [Numeric Format]: 66,45,45,45,45,45,45,45
    Key Value [Character Format]: B-------
key[1]:
    Key Value [Numeric Format]: 65,45,45,45,45,45,45,45
    Key Value [Character Format]: A-------
# head -1 /etc/vxfentab > /tmp/coordinator_disk
# vxfenadm -r all -f /tmp/coordinator_disk
Device Name: /dev/rdsk/c2t0d7s2
Total Number Of Keys: 0
No keys...

7. Data Disk example with keys - should have both Reservation and Registration set.
# vxdisk -o alldgs list | awk '/shared$/ {print "/dev/rdsk/" $1 }' \
    | head -1 > /tmp/data_disk
# vxfenadm -g all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 2
key[0]:
    Key Value [Numeric Format]: 65,80,71,82,48,48,48,49
    Key Value [Character Format]: APGR0001
key[1]:
    Key Value [Numeric Format]: 66,80,71,82,48,48,48,49
    Key Value [Character Format]: BPGR0001
# vxfenadm -r all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 1
Key[0]:
    Reservation Type: SCSI3_RESV_WRITEEXCLUSIVEREGISTRANTSONLY
    Key Value [Numeric Format]: 65,80,71,82,48,48,48,49
    Key Value [Character Format]: APGR0001

8. Determine the appropriate letter representing the local nodeID: node0=A, node1=B, node2=C, ...
#!/bin/ksh
/usr/bin/echo "\0$(expr $(lltstat -N) + 101)"
B

9. Veritas SAN Serial Number
# vxfenadm -i /dev/rdsk/c2t13d0s2
Vendor id     : EMC
Product id    : SYMMETRIX
Revision      : 5567
Serial Number : 42031000a

10. SCSI3-PGR Register Test Keys for new storage
    One system; repeat with key B1 on the second system
# vxfenadm -m -kA1 -f /tmp/disklist
Registration completed for disk path: /dev/rdsk/c2t0d1s2

11. SCSI3-PGR Remove Test Keys for new storage
    One system; repeat with key B1 on the second system
# vxfenadm -x -kA1 -f /tmp/disklist
## list reservations
Deleted the key : [A1------] from device /dev/rdsk/c2t0d1s2

12. Check SCSI3-PGR Keys on a list of disks
    Use a disk list to show keys - example only showing one disk
# vxfenadm -g all -f /tmp/disklist
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 1
key[0]:
    Key Value [Numeric Format]: 65,49,45,45,45,45,45,45
    Key Value [Character Format]: A1------

13. Check if IO Fencing License is enabled
# vxlicrep -e | grep PGR
PGR             VERITAS Volume Manager = Enabled
PGR_TRAINING    VERITAS Volume Manager = Enabled
PGR = Enabled
PGR_TRAINING = Enabled

14. Disk Detach Policy
In VERITAS Volume Manager 3.2 and later versions, there are two detach policies for a shared disk group, global and local. The default policy, and the way VERITAS Cluster Volume Manager (CVM) has always worked, is global. The policy can be selected for each disk group with the vxedit set command. The global policy will cause the disk to be detached throughout the cluster if a single node experiences an I/O failure to that disk. The local policy may be preferred for unmirrored volumes or in cases where availability is preferred over redundancy of the data. It allows a disk that experiences an I/O failure to remain available if other nodes in the cluster are still able to access it. After an I/O failure occurs, a message will be passed around the cluster to determine if the failure is disk related or path related. If the other nodes can still write to the disk, the mirrors are kept in sync by other nodes. The original node will fail writes. Something similar is done for reads, but the read will succeed. The state is not persistent. If a node has a local I/O failure, it does not remember. Any following read or write that fails will go through the same process of passing messages around the cluster to check for path or disk failure and repair the mirrored volume. Disk Detach Policy has no effect on the Master node, as any IO failure will result in the plex detaching regardless of policy. In any case, slaves that can't see the disk will still be unable to join the cluster.

vxedit man page: Attribute Values for Disk Group Records

diskdetpolicy
    Sets a disk group <detach policy>. These policies determine the way VxVM detaches unusable disks in a shared disk group. The diskdetpolicy attribute is ignored for private disk groups.
    - global
      For a shared disk group, if any node in the cluster reports a disk failure, the detach occurs in the entire cluster. This is the default policy.
    - local
      If a disk fails, the failure is confined to the node that detected the failure. An attempt is made to communicate with all nodes in the cluster to ascertain the failed disk's usability. If all nodes report a problem with the failed disk, the disk is detached throughout the cluster.

Note: The name of the shared disk group must be specified twice; once as the argument to the -g option, and again as the name argument that specifies the record to be edited, as shown in this example:

# vxedit -g shareddg set diskdetpolicy=local shareddg

NOTE !! For cluster filesystems, if the CFS primary resides on a slave node, an IO error on that node will result in the filesystem being disabled cluster-wide. This option is primarily intended for raw volumes. See the following technote where the local detach policy is strongly discouraged for DBE/AC: http://support.veritas.com/docs/258677

15. Example walk-through of adding SCSI3-PGR Keys Manually
a. First deport the diskgroup and confirm no keys
# vxdg deport orabinvol_dg
# vxfenadm -g all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 0
No keys...
# vxfenadm -r all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 0
No keys...
b. Now, register with the device
# vxfenadm -m -kA1 -f /tmp/data_disk
Registration completed for disk path: /dev/rdsk/c2t0d1s2
# vxfenadm -g all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 1
key[0]:
    Key Value [Numeric Format]: 65,49,45,45,45,45,45,45
    Key Value [Character Format]: A1------
# vxfenadm -r all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2
Total Number Of Keys: 0
No keys...
c. Set the reservation mode
Note
Even though the reservation is not a key, you must use the registration key to RESERVE (see note above). # vxfenadm -n -f /tmp/data_disk VXFEN:libvxfen:1118: Reservation FAILED for: /dev/rdsk/c2t0d1s2 VXFEN:libvxfen:1133: Error returned: Error 0 # vxfenadm -n -kA1 -f /tmp/data_disk Reservation completed for disk path: /dev/rdsk/c2t0d1s2 # vxfenadm -g all -f /tmp/data_disk Device Name: /dev/rdsk/c2t0d1s2 Total Number Of Keys: 1 key[0]: Key Value [Numeric Format]: 65,49,45,45,45,45,45,45 Key Value [Character Format]: A1-----# vxfenadm -r all -f /tmp/data_disk Device Name: /dev/rdsk/c2t0d1s2 Total Number Of Keys: 1 Key[0]: Reservation Type: SCSI3_RESV_WRITEEXCLUSIVEREGISTRANTSONLY Key Value [Numeric Format]: 65,49,45,45,45,45,45,45 Key Value [Character Format]: A1-----d. Remove the REGISTRATION # vxfenadm -x -kA1 -f /tmp/data_disk Deleted the key : [A1------] from device /dev/rdsk/c2t0d1s2 # vxfenadm -g all -f /tmp/data_disk Device Name: /dev/rdsk/c2t0d1s2 Total Number Of Keys: 0 No keys... # vxfenadm -r all -f /tmp/data_disk
Device Name: /dev/rdsk/c2t0d1s2 Total Number Of Keys: 0 No keys... e. Unregistering removed the RESERVATION too # vxfenadm -m -kA1 -f /tmp/data_disk Registration completed for disk path: /dev/rdsk/c2t0d1s2 # vxfenadm -n -kA1 -f /tmp/data_disk Reservation completed for disk path: /dev/rdsk/c2t0d1s2
# vxfenadm -m -kB1 -f /tmp/data_disk Registration completed for disk path: /dev/rdsk/c3t0d1s2 # vxfenadm -g all -f /tmp/data_disk Device Name: /dev/rdsk/c3t0d1s2 Total Number Of Keys: 2 key[0]: Key Value [Numeric Format]: 65,49,45,45,45,45,45,45 Key Value [Character Format]: A1-----key[1]: Key Value [Numeric Format]: 66,49,45,45,45,45,45,45 Key Value [Character Format]: B1-----# vxfenadm -r all -f /tmp/data_disk Device Name: /dev/rdsk/c3t0d1s2 Total Number Of Keys: 1 Key[0]: Reservation Type: SCSI3_RESV_WRITEEXCLUSIVEREGISTRANTSONLY Key Value [Numeric Format]: 65,49,45,45,45,45,45,45 Key Value [Character Format]: A1-----f. A1 Key Removal # vxfenadm -x -kA1 -f /tmp/data_disk Deleted the key : [A1------] from device /dev/rdsk/c2t0d1s2 # vxfenadm -g all -f /tmp/data_disk Device Name: /dev/rdsk/c3t0d1s2 Total Number Of Keys: 1 key[0]: Key Value [Numeric Format]: 66,49,45,45,45,45,45,45 Key Value [Character Format]: B1-----# vxfenadm -r all -f /tmp/data_disk Device Name: /dev/rdsk/c3t0d1s2 Total Number Of Keys: 0
ISCSI Solaris software Target and Initiator Veritas Cluster Configuration with Zones
Walkthrough configuring an iSCSI Target and Initiator for Non-Global Zone migration, using VCS 5.0MP3 for failover between two test LDOMs. Example commands for the Target System are on a U40; Initiator configuration is between two LDOMs. My use of LDOMs here is for testing. Veritas Cluster Server can be used to fail over LDOMs; however, it is not recommended to run VCS within an LDOM as though it were a non-virtualized system.

TARGET SERVER

Simple configuration, no CHAP, no real security. Buyer beware.

$ zfs create -V 16g jbod/iscsi/zlun1
$ zfs set shareiscsi=on jbod/iscsi/zlun1
$ iscsitadm list target
Target: jbod/iscsi/lun0
    iSCSI Name: iqn.1986-03.com.sun:02:b3d446a9-683b-615d-b5db-ff6846dbf758
    Connections: 0
Target: jbod/iscsi/zlun1
    iSCSI Name: iqn.1986-03.com.sun:02:633bdd37-1dfa-e1df-ee5e-91b8d29f410d
    Connections: 0

INITIATOR SERVER

Manual configuration, static entry (no auto-discover). Execute the following on LDOM#0 and LDOM#1:

$ iscsiadm add static-config iqn.1986-03.com.sun:\
02:633bdd37-1dfa-e1df-ee5e-91b8d29f410d,192.168.15.30
$ iscsiadm modify discovery --static enable
Feb 2 18:29:50 dom1 iscsi: NOTICE: iscsi session(4) iqn.1986-03.com.sun:02:633bdd37-1dfa-e1df-ee5e-91b8d29f410d online
Feb 2 18:29:52 dom1 scsi: WARNING: /scsi_vhci/ssd@g010000144f3b8d6000002a004987cb2c (ssd0):
Feb 2 18:29:52 dom1     Corrupt label; wrong magic number
Feb 2 18:29:53 dom1 scsi: WARNING: /scsi_vhci/ssd@g010000144f3b8d6000002a004987cb2c (ssd0):
Feb 2 18:29:53 dom1     Corrupt label; wrong magic number
$ devfsadm -c iscsi
$ format
Searching for disks...
Feb 2 18:30:54 dom1 scsi: WARNING: /scsi_vhci/ssd@g010000144f3b8d6000002a004987cb2c (ssd0):
Feb 2 18:30:54 dom1     Corrupt label; wrong magic number
Feb 2 18:30:55 dom1 scsi: WARNING: /scsi_vhci/ssd@g010000144f3b8d6000002a004987cb2c (ssd0):
Feb 2 18:30:55 dom1     Corrupt label; wrong magic number
done
c1t010000144F3B8D6000002A004987CB2Cd0: configured with capacity of 16.00GB
AVAILABLE DISK SELECTIONS:
    0. c0d0 <SUN-DiskImage-16GB cyl 55922 alt 2 hd 1 sec 600>
       /virtual-devices@100/channel-devices@200/disk@0
    1. c1t010000144F3B8D6000002A004987CB2Cd0 <SUN-SOLARIS-1 cyl 32766 alt 2 hd 4 sec 256>
       /scsi_vhci/ssd@g010000144f3b8d6000002a004987cb2c
Specify disk (enter its number): 1

LABEL Drive #1

Creation of ZPool for NGZ, and NGZ on iSCSI Storage. Creation of the zpool and non-global zone, followed by deport/import and detach/attach for testing migration prior to failover configuration.

LDOM#0 Only

$ zpool create zones c1t010000144F3B8D6000002A004987CB2Cd0
$ zfs create zones/p1
$ chmod 700 /zones/p1
$ zonecfg -z p1
zonecfg:p1> create
zonecfg:p1> set zonepath=/zones/p1
zonecfg:p1> add net
zonecfg:p1:net> set physical=vnet0
zonecfg:p1:net> set address=192.168.15.77/24
zonecfg:p1:net> end
zonecfg:p1> exit
$ zoneadm -z p1 install
$ zoneadm -z p1 boot
$ zlogin -C p1        // Configure the system's sysidcfg
$ zoneadm -z p1 halt
$ zoneadm -z p1 detach
$ zpool export zones

LDOM#1 Only
$ zpool import zones
$ zonecfg -z p1 create -a /zones/p1
$ zoneadm -z p1 attach [-u]
$ zoneadm -z p1 boot
REVERSE Migration of Non-Global Zone

Migration back to original host: LDOM#1 commands
$ zoneadm -z p1 halt
$ zoneadm -z p1 detach
$ zpool export zones

Migration back to original host: LDOM#0 commands
Note the lack of running zonecfg -z p1 create -a /zones. This is not necessary once the zone.xml and index.xml are updated with the p1 zone information. Should this script be automated, you may want to consider adding the force configuration into the script just in case.
$ zpool import zones
$ zoneadm -z p1 attach [-u]
$ zoneadm -z p1 boot

Moving Configuration of Zone and ZFS Pool on iSCSI Storage into Veritas Cluster Server 5.0MP3
Note
The Zpool Agent is only included with VCS starting in 5.0MP3 for Solaris. There are a number of configuration variations that could be used here, including legacy mounts with the Mount Agent. Below is a simple layout that uses ZFS automounting when the zpool is imported through VCS.

Example VCS 5.0MP3 main.cf configuration for Zpool and Zone Failover

$ haconf -makerw
$ hagrp -add ztest
$ hagrp -modify ztest SystemList dom2 0 dom1 1
$ hagrp -modify ztest AutoStartList dom2 dom1
$ hares -add zpool_zones Zpool ztest
$ hares -modify zpool_zones PoolName zones
$ hares -modify zpool_zones AltRootPath "/"
$ hares -modify zpool_zones ChkZFSMounts 1
$ hares -modify zpool_zones Enabled 1
$ /opt/VRTSvcs/bin/hazonesetup ztest zone_p1 p1 \
    ems dom1 dom2
$ haconf -makerw
$ hares -link zone_p1 zpool_zones
$ haconf -dump -makero

Example main.cf: /etc/VRTSvcs/conf/config/main.cf:

include "types.cf"
cluster LDOM_LAB ( UserNames = { admin = eLMeLGlIMhMMkUMgLJ, z_zone_p1_dom2 = bkiFksJnkHkjHpiMji, z_zone_p1_dom1 = dqrRrkQopKnsOooMqx } Administrators = { admin } ) system dom1 ( ) system dom2 ( ) group ztest ( SystemList = { dom1 = 0, dom2 = 1 } AutoStartList = { dom2, dom1 } Administrators = { z_zone_p1_dom2, z_zone_p1_dom1 } ) Zone zone_p1 ( ZoneName = p1 ) Zpool zpool_zones ( PoolName = zones AltRootPath = "/" ) zone_p1 requires zpool_zones
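With the group defined, a manual switch is a quick end-to-end test of the zpool export/import plus zone detach/attach path; the commands assume the group and system names above:

$ hagrp -state ztest             ## confirm where the group is currently ONLINE
$ hagrp -switch ztest -to dom1   ## fail the zpool and zone over to the other LDOM
$ hagrp -state ztest
$ zoneadm list -cv               ## on the new node, zone p1 should be running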
## You enable the LLT link as follows: # lltconfig -t <tag> -L 3 (3 to enable the link)
Note
This process has only been used on CONCAT volumes. You will need to convert the layout to CONCAT for each volume if striped.

Migration Workflow
1. Have new SAN storage allocated to the target host, and the same new storage LUN masked/zoned to the source host
2. Mirror storage on the source host to the new LUNs
3. Collect a dump of the vxvm database
4. Break the mirror and remove the new LUNs from the source host vxvm configuration
5. Re-create the new disk group on the target host using the modified vxvm database dump
6. Online the new storage group on the target system

Migration Walkthrough
1. Identify source and target LUNs, and the difference in device names on source and target. Also record mount points and disk sizes.
   target_lun0 = c2t600144F04A2E74170000144F3B8D6000d0
   source_lun0 = c2t600144F04A2E74150000144F3B8D6000d0
   # df -h
   Filesystem                 size  used  avail  capacity  Mounted on
   /dev/vx/dsk/demo_orig/v01  4.0G  18M   3.7G   1%        /v01
   /dev/vx/dsk/demo_orig/v02  4.0G  18M   3.7G   1%        /v02
   /dev/vx/dsk/demo_orig/v03  2.0G  18M   1.9G   1%        /v03
/dev/vx/dsk/demo_org/v02  /dev/vx/rdsk/demo_org/v02  /v02  vxfs  2  yes  -
/dev/vx/dsk/demo_org/v03  /dev/vx/rdsk/demo_org/v03  /v03  vxfs  2  yes  -
# vxprint
Disk group: demo_orig

TY NAME          ASSOC        KSTATE   LENGTH    PLOFFS  STATE   TUTIL0 PUTIL0
dg demo_orig     demo_orig    -        -         -       -       -      -
dm target_lun0   target_lun0  -        25098496  -       -       -      -
dm orig_disk     source_lun0  -        25098496  -       -       -      -

v  v01           fsgen        ENABLED  8388608   -       ACTIVE  -      -
pl v01-01        v01          ENABLED  8388608   -       ACTIVE  -      -
sd orig_disk-01  v01-01       ENABLED  8388608   0       -       -      -
v  v02           fsgen        ENABLED  8388608   -       ACTIVE  -      -
pl v02-01        v02          ENABLED  8388608   -       ACTIVE  -      -
sd orig_disk-02  v02-01       ENABLED  8388608   0       -       -      -
v  v03           fsgen        ENABLED  4194304   -       ACTIVE  -      -
pl v03-01        v03          ENABLED  4194304   -       ACTIVE  -      -
sd orig_disk-03  v03-01       ENABLED  4194304   0       -       -      -
2. Add disks from destination to source server and mirror to new disks # vxdg -g demo_orig adddisk target_lun0=target_lun0
# vxassist -b -g demo_orig mirror v01 target_lun0 # vxassist -b -g demo_orig mirror v02 target_lun0 # vxassist -b -g demo_orig mirror v03 target_lun0 3. Collect Data needed for vxmake
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/source_lun0s2 \ >/priv_dump.out # cat /priv_dump.out|vxprint -D - -hvpsm >/maker.out # cat /priv_dump.out|vxprint -D - -d -F "%name=%last_da_name" > list 4. Copy priv_dump.out, maker.out , list and vxdisk-o-alldgs.out to target system: # scp priv_dump.out maker.out list vxdisk-o-alldgs.out \ a123456@target:
5. Remove target mirror for each volume on source server # vxplex -o rm dis target_lun-plex
6. Remove target disks from vx disk group on source server # vxdg -g demo_orig rmdisk target_lun0
Storage Group Creation on Target Host

1. Update maker.out, removing references to the source drives. Back up the files before editing. Specifically, remove the subdisk and plex information pointing toward the source disk. Since plex v01-01 and subdisk orig_disk-01 were the original mirrors, delete references for those items in the maker.out file. Only the v01 volume is shown; continue for all volumes.

vol v01 use_type=fsgen fstype=" comment=" putil0=" putil1=" putil2=" state="ACTIVE writeback=on writecopy=off specify_writecopy=off pl_num=2 start_opts=" read_pol=SELECT minor=54000 user=root group=root mode=0600 log_type=REGION len=8388608 log_len=0 update_tid=0.1081 rid=0.1028 detach_tid=0.0 active=off forceminor=off badlog=off recover_checkpoint=16 sd_num=0 sdnum=0 kdetach=off storage=off readonly=off layered=off apprecover=off recover_seqno=0 recov_id=0 primary_datavol= vvr_tag=0 iscachevol=off morph=off guid={7251b03a-1dd2-11b2-ad16-00144f6ece3b} inst_invalid=off incomplete=off
Advanced VCS for IO Fencing and Various Commands instant=off restore=off snap_after_restore=off oldlog=off nostart=off norecov=off logmap_align=0 logmap_len=0 inst_src_guid={00000000-0000-0000-0000-000000000000} cascaded=off plex=v01-01,v01-02 export= plex v01-01 compact=on len=8388608 contig_len=8388608 comment=" putil0=" putil1=" putil2=" v_name=v01 layout=CONCAT sd_num=1 state="ACTIVE log_sd= update_tid=0.1066 rid=0.1031 vol_rid=0.1028 detach_tid=0.0 log=off noerror=off kdetach=off stale=off ncolumn=0 raidlog=off guid={7251f842-1dd2-11b2-ad16-00144f6ece3b} mapguid={00000000-0000-0000-0000-000000000000} sd=orig_disk-01:0 sd orig_disk-01 dm_name=orig_disk pl_name=v01-01 comment=" putil0=" putil1=" putil2=" dm_offset=0 pl_offset=0 len=8388608 update_tid=0.1034 rid=0.1033 guid={72523956-1dd2-11b2-ad16-00144f6ece3b} plex_rid=0.1031 dm_rid=0.1026 minor=0
Advanced VCS for IO Fencing and Various Commands detach_tid=0.0 column=0 mkdevice=off subvolume=off subcache=off stale=off kdetach=off relocate=off sd_name= uber_name= tentmv_src=off tentmv_tgt=off tentmv_pnd=off plex v01-02 compact=on len=8388608 contig_len=8388608 comment=" putil0=" putil1=" putil2=" v_name=v01 layout=CONCAT sd_num=1 state="ACTIVE log_sd= update_tid=0.1081 rid=0.1063 vol_rid=0.1028 detach_tid=0.0 log=off noerror=off kdetach=off stale=off ncolumn=0 raidlog=off guid={3d6ce0f2-1dd2-11b2-ad18-00144f6ece3b} mapguid={00000000-0000-0000-0000-000000000000} sd=new_disk-01:0 sd new_disk-01 dm_name=new_disk pl_name=v01-02 comment=" putil0=" putil1=" putil2=" dm_offset=0 pl_offset=0 len=8388608 update_tid=0.1066 rid=0.1065 guid={3d6d2076-1dd2-11b2-ad18-00144f6ece3b} plex_rid=0.1063 dm_rid=0.1052
minor=0 detach_tid=0.0 column=0 mkdevice=off subvolume=off subcache=off stale=off kdetach=off relocate=off sd_name= uber_name= tentmv_src=off tentmv_tgt=off tentmv_pnd=off

2. Create the disk group on the target from the disks that were a mirror on the source. Get the value of X from the first drive listed in "list":
   # vxdg init newdg $X=target_lun0

3. Rebuild the volumes from the maker.out file:
   # vxmake -g newdg -d /maker.out

4. Start the volumes:
   # vxvol -g newdg startall
# zfs create -V 10g npool/iscsitgt/vdisk_dom1 # sbdadm create-lu /dev/zvol/rdsk/npool/iscsitgt/vdisk_dom1 Created the following LU: GUID DATA SIZE SOURCE ----------------- ------------------- ---------600144f0c312030000004a366cee0001 19327287296 /dev/zvol/rdsk/npool/iscsitgt/vdisk_dom1 # stmfadm add-view 600144f0c312030000004a366cee0001 # itadm create-target Target iqn.1986-03.com.sun:02:\ 278f5072-6662-e976-cc95-8116fd42c2c2 successfully created
# stmfadm create-hg t1000_primary
# stmfadm add-hg-member -g t1000_primary \
    iqn.1986-03.com.sun:01:00144f6ece3a.498cfeb2

3. Create an access list for each target interface

# svcadm disable stmf
# stmfadm list-target
# itadm list-target -v
TARGET NAME                                                  STATE
iqn.1986-03.com.sun:02:2be6d243-0ff9-6981-f157-eea00338d1d4  online
        alias:
        auth:             none (defaults)
        targetchapuser:
        targetchapsecret: unset
        tpg-tags:         nge0 = 2
iqn.1986-03.com.sun:02:1a6416d2-a260-ebe4-bbf7-d28643276f65  online
        alias:
        auth:             none (defaults)
        targetchapuser:
        targetchapsecret: unset
        tpg-tags:         nge1 = 2
# stmfadm create-tg iFA1
# stmfadm create-tg iFA0
# stmfadm add-tg-member -g iFA1 \
    iqn.1986-03.com.sun:02:1a6416d2-a260-ebe4-bbf7-d28643276f65
# stmfadm add-tg-member -g iFA0 \
    iqn.1986-03.com.sun:02:2be6d243-0ff9-6981-f157-eea00338d1d4

4. Mapping each LUN to both the Target TG access list, and the remote host HG access list

# sbdadm list-lu | awk '{print $1, $3}'
Found LU(s)

GUID                              SIZE
--------------------------------  ----
600144f0c312030000004a3b8068001c
600144f0c312030000004a3b8068001b
600144f0c312030000004a3b8068001a
600144f0c312030000004a3b80680019
600144f0c312030000004a3b80680018
600144f0c312030000004a3b80680017
## Repeat below for each LUN to be shared over iFA1 (nge1) to the remote
## iSCSI addresses defined in HG t1000_primary
# stmfadm add-view -h t1000_primary -t iFA1 -n 0 \
    600144f0c312030000004a3b80680017
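To double-check a mapping, the views on a single LU can be listed; the GUID below is the one from the example and the output layout is approximate:

# stmfadm list-view -l 600144f0c312030000004a3b80680017
View Entry: 0
    Host group   : t1000_primary
    Target group : iFA1
    LUN          : 0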
Warning
ZFS is not supported for the /globaldevices filesystem; therefore, unless you are being creative, avoid installing Solaris 10 with the ZFS root option. If you do not allocate a UFS filesystem and partition for /globaldevices, then a LOFI device will be used, which will reduce boot performance.

Partition Layout - set identical between both servers where possible
Part  Tag         Flag  Size      Mount Point
0     root        wm    8.00GB    /
1     swap        wu    8.00GB    [swap]
2     backup      wm    74.50GB   [backup]
3     unassigned  wm    8.00GB    /opt
4     var         wm    8.00GB    /var
5     unassigned  wm    1.00GB    /globaldevices
6     unassigned  wm    512.19MB  [reserved for SVM MDB]
7     unassigned  wm    40.99GB   /free [remaining]
Interface  Function  Planned Options
----------------------------------------------------
bge0       Public    IPMP Link Only Detection
bge1       Private   Used for HB
bge2       Private   Used for HB
bge3       Public    IPMP Link Only Detection
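As a sketch of the link-only-detection IPMP setup implied above (the hostname and group name are assumptions; substitute your own), the Solaris 10 config files would look roughly like this:

# /etc/hostname.bge0 - active interface, carries the data address
node1 group ipmp0 up

# /etc/hostname.bge3 - standby member of the same group, no data address
group ipmp0 standby up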
Installation
This section covers a walkthrough configuration for Sun Cluster. General installation includes the following:
1. Product Installation Location
Warning
Either untar the software on both servers under /tmp or run installer from a shared directory such as NFS. Sun Cluster must be installed on both systems 2. Run Installer Script /swdepot/sparc/suncluster/Solaris_sparc $ ./installer Unable to access a usable display on the remote system. Continue in command-line mode?(Y/N) Y <Press ENTER to Continue> <Press ENTER to display the Software License Agreement> <--[40%]--[ENTER To Continue]--[n To Finish]-->n License Agreement [No] {"<" goes back, "!" exits}? Yes
Installation Type ----------------Do you want to install the full set of Sun Java(TM) Availability Suite Products and Services? (Yes/No) [Yes] {"<" goes back, "!" exits} Yes Install multilingual package(s) for all selected components [Yes] {"<" goes back, "!" exits}: No Do you want to add multilanguage support now? 1. Yes 2. No Enter your choice [1] {"<" goes back, "!" exits} 2 Enter 1 to upgrade these shared components and 2 to cancel [1] {"<" goes back, "!" exits}: 1
Checking System Status

Available disk space...        : Checking .... OK
Memory installed...            : Checking .... OK
Swap space installed...        : Checking .... OK
Operating system patches...
Operating system resources...

System ready for installation
Screen for selecting Type of Configuration

  1. Configure Now - Selectively override defaults or express through
  2. Configure Later - Manually configure following installation

Select Type of Configuration [1] {"<" goes back, "!" exits} 2

Ready to Install
----------------
The following components will be installed.

Product: Java Availability Suite
Uninstall Location: /var/sadm/prod/SUNWentsyssc32u2
Space Required: 326.34 MB
---------------------------------------------------
    Java DB
        Java DB Server
        Java DB Client
    Sun Cluster 3.2 1/09
        Sun Cluster Core
        Sun Cluster Manager
    Sun Cluster Agents 3.2 1/09
        Sun Cluster HA for Sun Java(TM) System Application Server
        Sun Cluster HA for Sun Java(TM) System Message Queue
        Sun Cluster HA for Sun Java(TM) System Messaging Server
        Sun Cluster HA for Sun Java(TM) System Calendar Server
        Sun Cluster HA for Sun Java(TM) System Directory Server
        Sun Cluster HA for Sun Java(TM) System Application Server EE (HADB)
        Sun Cluster HA for Instant Messaging
        Sun Cluster HA/Scalable for Sun Java(TM) System Web Server
        Sun Cluster HA for Apache Tomcat
        Sun Cluster HA for Apache
        Sun Cluster HA for DHCP
        Sun Cluster HA for DNS
        Sun Cluster HA for MySQL
        Sun Cluster HA for Sun N1 Service Provisioning System
        Sun Cluster HA for NFS
        Sun Cluster HA for Oracle
        Sun Cluster HA for Samba
        Sun Cluster HA for Sun N1 Grid Engine
        Sun Cluster HA for Solaris Containers
        Sun Cluster Support for Oracle RAC
        Sun Cluster HA for Oracle E-Business Suite
        Sun Cluster HA for SAP liveCache
        Sun Cluster HA for WebSphere Message Broker
        Sun Cluster HA for WebSphere MQ
        Sun Cluster HA for Oracle 9iAS
        Sun Cluster HA for SAPDB
        Sun Cluster HA for SAP Web Application Server
        Sun Cluster HA for SAP
        Sun Cluster HA for PostgreSQL
        Sun Cluster HA for Sybase ASE
        Sun Cluster HA for BEA WebLogic Server
        Sun Cluster HA for Siebel
        Sun Cluster HA for Kerberos
        Sun Cluster HA for Swift Alliance Access
        Sun Cluster HA for Swift Alliance Gateway
        Sun Cluster HA for Informix
    Sun Cluster Geographic Edition 3.2 1/09
        Sun Cluster Geographic Edition Core Components
        Sun Cluster Geographic Edition Manager
        Sun StorEdge Availability Suite Data Replication Support
        Hitachi Truecopy Data Replication Support
        SRDF Data Replication Support
        Oracle Data Guard Data Replication Support
    Quorum Server
    Sun Java(TM) System High Availability Session Store 4.4.3
    All Shared Components
    Sun Java(TM) System Monitoring Console 1.0 Update 1

  1. Install
  2. Start Over
  3. Exit Installation

What would you like to do [1] {"<" goes back, "!" exits}? 1
Enter 1 to view installation summary and Enter 2 to view installation logs [1] {"!" exits} !

In order to notify you of potential updates, we need to confirm an internet connection. Do you want to proceed [Y/N] : N
Basic Configuration
This section covers a walkthrough configuration for Sun Cluster. The general configuration includes the following steps:
Warning
Interfaces configured for heartbeats must be unplumbed and must have no /etc/hostname.<interface> file.
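For example, assuming bge2 is one of the heartbeat interfaces, it could be cleared on both nodes with something like:

# ifconfig bge2 unplumb
# rm -f /etc/hostname.bge2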
Warning
During the scinstall configuration process the nodes will be rebooted.

1. Product Configuration

# /usr/cluster/bin/scinstall
 *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node
        2) Configure a cluster to be JumpStarted from this install server
        3) Manage a dual-partition upgrade
        4) Upgrade this cluster node
        5) Print release information for this cluster node
 *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  1
This option creates and configures a new cluster. You must use the Java Enterprise System (JES) installer to install the Sun Cluster framework software on each machine in the new cluster before you select this option. If the "remote configuration" option is unselected from the JES installer when you install the Sun Cluster framework on any of the new nodes, then you must configure either the remote shell (see rsh(1)) or the secure shell (see ssh(1)) before you select this option. If rsh or ssh is used, you must enable root access to all of the new member nodes from this node. Press Control-d at any time to return to the Main Menu.
    Do you want to continue (yes/no) [yes]?

  >>> Typical or Custom Mode <<<
    This tool supports two modes of operation, Typical mode and Custom mode. For most clusters, you can use Typical mode. However, you might need to select the Custom mode option if not all of the Typical defaults can be applied to your cluster.

    For more information about the differences between Typical and Custom modes, select the Help option from the menu.

    Please select from one of the following options:

        1) Typical
        2) Custom

        ?) Help
        q) Return to the Main Menu

    Option [1]:  1
  >>> Cluster Name <<<

    Each cluster has a name assigned to it. The name can be made up of any characters other than whitespace. Each cluster name should be unique within the namespace of your enterprise.

    What is the name of the cluster you want to establish?  SC001

  >>> Cluster Nodes <<<

    This Sun Cluster release supports a total of up to 16 nodes.

    Please list the names of the other nodes planned for the initial cluster configuration. List one node name per line. When finished, type Control-D:

    Node name (Control-D to finish):  sysdom1
    Node name (Control-D to finish):  ^D
    This is the complete list of nodes:

        sysdom0
        sysdom1

    Is it correct (yes/no) [yes]?  yes
  >>> Cluster Transport Adapters and Cables <<<

    You must identify the cluster transport adapters which attach this node to the private cluster interconnect.

    For node "sysdom0",
    What is the name of the first cluster transport adapter?  bge1

  >>> Cluster Transport Adapters and Cables <<<
    You must identify the cluster transport adapters which attach this node to the private cluster interconnect.

    Select the first cluster transport adapter for "sysdom0":

        1) bge2
        2) bge3
        3) Other

    Option:  1

    Will this be a dedicated cluster transport adapter (yes/no) [yes]?  no

    What is the cluster transport VLAN ID for this adapter?  1
    Searching for any unexpected network traffic on "bge1002" ... done
    Verification completed. No traffic was detected over a 10 second sample period.

    Select the second cluster transport adapter for "sysdom0":

        1) bge2
        2) bge3
        3) Other

    Option:

  >>> Quorum Configuration <<<

    Every two-node cluster requires at least one quorum device. By default, scinstall selects and configures a shared disk quorum device for you.

    This screen allows you to disable the automatic selection and configuration of a quorum device.

    You have chosen to turn on the global fencing. If your shared storage devices do not support SCSI, such as Serial Advanced Technology Attachment (SATA) disks, or if your shared disks do not support SCSI-2, you must disable this feature.

    If you disable automatic quorum device selection now, or if you intend to use a quorum device that is not a shared disk, you must instead use clsetup(1M) to manually configure quorum once both nodes have joined the cluster for the first time.

    Do you want to disable automatic quorum device selection (yes/no) [no]?

  Cluster Creation

    Log file - /var/cluster/logs/install/scinstall.log.28876

    Testing for "/globaldevices" on "sysdom0" ... done
    Testing for "/globaldevices" on "sysdom1" ... done

    Starting discovery of the cluster transport configuration.

    The following connections were discovered:

        sysdom0:bge2  switch1  sysdom1:bge2 [VLAN ID 1]
        sysdom0:bge3  switch2  sysdom1:bge3 [VLAN ID 1]
    Completed discovery of the cluster transport configuration.

    Started cluster check on "sysdom0".
    Started cluster check on "sysdom1".

    cluster check completed with no errors or warnings for "sysdom0".
    cluster check completed with no errors or warnings for "sysdom1".
    Configuring "sysdom1" ... done
    Rebooting "sysdom1" ... done

    Configuring "sysdom0" ... done
    Rebooting "sysdom0" ...

    Log file - /var/cluster/logs/install/scinstall.log.28876
Rebooting ...
General Commands
This section covers general Sun Cluster commands and resource configuration.

List DID disks for use with failover storage devices:
Note
The DID IDs are under /dev/did/dsk and /dev/did/rdsk on each node in the cluster. These paths are to be used for creating failover filesystems, zpools and storage access.

# cldevice list -v
DID Device          Full Device Path
----------          ----------------
d1                  sysdom1:/dev/rdsk/c0t0d0
d2                  sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680017d0
d2                  sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680017d0
d3                  sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680018d0
d3                  sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680018d0
d4                  sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680019d0
d4                  sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680019d0
d5                  sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B8068001Ad0
d5                  sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B8068001Ad0
d6                  sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B8068001Bd0
d6                  sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B8068001Bd0
d7
d7
d8
d8
d9
d9
d10
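As a sketch only, a failover zpool for the Apache resource group used later in this section could be created on one of these shared DID devices; the pool name apache and the choice of d5 are assumptions.

# zpool create apache /dev/did/dsk/d5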
Add a Quorum Disk

vsrv2# clquorum list
vsrv2
vsrv1
vsrv2# cldevice list -v
DID Device          Full Device Path
----------          ----------------
d1                  vsrv2:/dev/rdsk/c0d0
d2                  vsrv2:/dev/rdsk/c1t600144F04A4D00400000144F3B8D6000d0
d2                  vsrv1:/dev/rdsk/c1t600144F04A4D00400000144F3B8D6000d0
d3                  vsrv2:/dev/rdsk/c1t600144F04A53950C0000144F3B8D6000d0
d3                  vsrv1:/dev/rdsk/c1t600144F04A53950C0000144F3B8D6000d0
d4                  vsrv1:/dev/rdsk/c0d0
vsrv2# clquorum add -v /dev/did/rdsk/d2
Quorum device "/dev/did/rdsk/d2" is added.
vsrv2# clquorum list -v
Quorum              Type
------              ----
d2                  shared_disk
vsrv2               node
vsrv1               node
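After adding the device, the quorum configuration and vote counts can be checked; this is a suggested verification step rather than part of the original procedure.

vsrv2# clquorum status
vsrv2# clquorum show d2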
# clrt register HAStoragePlus
# clrs create -g apache-rg -t HAStoragePlus -p Zpools=apache apache-zpool-rs

4. Bring the Apache Resource Group online and check its status

# clrg online -M apache-rg
# clrg status

=== Cluster Resource Groups ===

Group Name      Node Name      Suspended      Status
----------      ---------      ---------      ------
apache-rg       sysdom1        No             Online
                sysdom0        No             Offline
5. Switch Apache Resource Group to alternate server

# clrg switch -n sysdom0 apache-rg
# clrg status

=== Cluster Resource Groups ===

Group Name      Node Name      Suspended      Status
----------      ---------      ---------      ------
apache-rg       sysdom1        No             Offline
                sysdom0        No             Online
6. Configure Apache to use Failover Storage

Update the httpd.conf file to point to storage under /apache on both servers.

# zfs create apache/htdocs
# vi /etc/apache2/httpd.conf

Update <Directory>, among others.
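A minimal sketch of the relevant httpd.conf changes, assuming the apache zpool is mounted at /apache:

DocumentRoot "/apache/htdocs"
<Directory "/apache/htdocs">
    Order allow,deny
    Allow from all
</Directory>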
7. Add floating IP address

Make sure the IP/hostname pair is in both servers' /etc/hosts files. In this case the host vsrvmon has an IP of 192.168.15.95.

# clreslogicalhostname create -g apache-rg -h vsrvmon host-vsrvmon-rs
# ifconfig -a
bge0:1: flags=1001040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,FIXEDMTU> mtu 1500 index 2
        inet 192.168.15.95 netmask ffffff00 broadcast 192.168.15.255
# scstat -i

-- IPMP Groups --
                Node Name
                ---------
  IPMP Group:   sysdom1
  IPMP Group:   sysdom1
  IPMP Group:   sysdom0
  IPMP Group:   sysdom0

-- IPMP Groups in Zones --

                Zone Name      Group      Status      Adapter      Status
                ---------      -----      ------      -------      ------
8. Update the httpd.conf on both systems to use the floating IP as the ServerName

9. Register the Apache Agent and configure the Apache Resource

# clrt register apache
# clrs create -g apache-rg -t apache -p Bin_dir=/usr/apache2/bin \
  -p Port_list=80/tcp -p Resource_dependencies=apache-zpool-rs,\
  host-vsrvmon-rs apache-rs

10. Check the status of the Apache Resource Group and switch the resource group through all systems
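Step 10 can be carried out with the same commands used earlier in this section; a short sketch:

# clrg status apache-rg
# clrg switch -n sysdom1 apache-rg
# clrg switch -n sysdom0 apache-rg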
vsrv1# clzonecluster configure sczone
sczone: No such zone cluster configured
Use 'create' to begin configuring a new zone cluster.
clzc:sczone> create
clzc:sczone> set zonepath=/localzone/sczone

2. Add sysid Information
clzc:sczone> add sysid
clzc:sczone:sysid> set root_password=fubar
clzc:sczone:sysid> end

3. Add the physical host information and network information for the zone on each host

clzc:sczone> add node
clzc:sczone:node> set
clzc:sczone:node> set
clzc:sczone:node> add net
clzc:sczone:node:net>
clzc:sczone:node:net>
clzc:sczone:node:net> end
clzc:sczone:node> end
clzc:sczone> add node
clzc:sczone:node> set
clzc:sczone:node> set
clzc:sczone:node> add net
clzc:sczone:node:net>
clzc:sczone:node:net>
clzc:sczone:node:net> end
clzc:sczone:node> end
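The property values were not captured above. A minimal sketch of one node entry follows; the physical host vsrv1, zone hostname sczone1, address 192.168.15.87 and interface bge0 are all assumptions and must be replaced with real values for each host.

clzc:sczone> add node
clzc:sczone:node> set physical-host=vsrv1
clzc:sczone:node> set hostname=sczone1
clzc:sczone:node> add net
clzc:sczone:node:net> set address=192.168.15.87
clzc:sczone:node:net> set physical=bge0
clzc:sczone:node:net> end
clzc:sczone:node> end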
4. From the documentation - still working out exactly what this means - in this case, the IPs are those of vsrv3 and vsrv4, in that order

clzc:sczone> add net
clzc:sczone:net> set address=192.168.15.86
clzc:sczone:net> end
clzc:sczone> add net
clzc:sczone:net> set address=192.168.15.85
clzc:sczone:net> end
5. Commit zone configuration - saves info on both servers

clzc:sczone> verify
clzc:sczone> commit
clzc:sczone> exit

6. Build the Non-Global Zones

vsrv1# clzonecluster install sczone
Waiting for zone install commands to complete on all the nodes of the zone cluster "sczone"...
vsrv1# clzonecluster install sczone
Waiting for zone install commands to complete on all the nodes of the zone cluster "sczone"...
vsrv1# clzonecluster boot sczone
Waiting for zone boot commands to complete on all the nodes of the zone cluster "sczone"...

7. Use zlogin on both global zones to finish configuring sczone
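For step 7, the zone console can be reached from each global zone to answer any remaining sysid prompts; a brief sketch:

vsrv1# zlogin -C sczone
vsrv2# zlogin -C sczone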
/etc/system:
set shmsys:shminfo_shmmax=SGA_size_in_bytes

2. Download and install SC 3.2 or greater

3. Download and install the Sun QFS packages on all nodes in the cluster
# pkgadd -d . SUNWqfsr SUNWqfsu

4. Create Meta Devices for QFS Oracle Home / CRS Home
Warning
Make sure that /var/run/nodelist exists on both servers. I've noticed that it might not. If it does not, the metaset -M command will fail. The file contains one line per node in the form: Node# NodeName PrivIP
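A sketch of what the file can look like; the node numbers and private-address assignment shown here are assumptions based on the private IPs used later in this procedure.

1 vsrv1 172.16.4.2
2 vsrv2 172.16.4.1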
# metadb -a -f -c3 /dev/did/dsk/d3s7
# metaset -s zora -M -a -h vsrv2 vsrv1
# metaset

Multi-owner Set name = zora, Set number = 1, Master =

Host                Owner          Member
  vsrv2                            Yes
  vsrv1                            Yes
# metaset -s zora -a /dev/did/dsk/d3
# metainit -s zora d30 1 1 /dev/did/dsk/d3s0
# metainit -s zora d300 -m d30

5. Add QFS Information for Oracle Home on both systems
/etc/opt/SUNWsamfs/mcf:

RAC                     5   ms   RAC   on   shared
/dev/md/zora/dsk/d300  50   md   RAC   on

/etc/opt/SUNWsamfs/samfs.cmd:

fs=RAC
sync_meta=1

/etc/opt/SUNWsamfs/hosts.RAC:

vsrv1   172.16.4.2   1   0   server
vsrv2   172.16.4.1   1   0
6. Create QFS Directory on both nodes and make filesystem just from one node

# mkdir -p /localzone/sczone/root/db_qfs/oracle
# /opt/SUNWsamfs/sbin/sammkfs -S RAC
sammkfs: Configuring file system
sammkfs: Enabling the sam-fsd service.
sammkfs: Adding service tags.
Warning: Creating a new file system prevents use with 4.6 or earlier releases.
         Use the -P option on sammkfs to create a 4.6 compatible file system.
Building 'RAC' will destroy the contents of devices:
        /dev/md/zora/dsk/d300
Do you wish to continue? [y/N]y
total data kilobytes        = 10228928
total data kilobytes free   = 10225216
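For the mount RAC test in the next step to work, a vfstab entry for the shared QFS filesystem is needed on both nodes; a sketch, where the mount point matches the directory created above:

/etc/vfstab:
RAC   -   /localzone/sczone/root/db_qfs/oracle   samfs   -   no   shared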
# mount RAC
# umount RAC
# rm -rf /localzone/sczone

8. Create the Zones using clzonecluster
# clzonecluster configure sczone
clzc:sczone> create
clzc:sczone> set zonepath=/localzone/sczone
clzc:sczone> set autoboot=true

9. Add sysid Information - there are more options than listed here
clzc:sczone> add sysid
clzc:sczone:sysid> set root_password=ENC_PW
clzc:sczone:sysid> set nfs4_domain=whatever
clzc:sczone:sysid> set terminal=vt100
clzc:sczone:sysid> set security_policy=NONE
clzc:sczone:sysid> set system_locale=C
clzc:sczone:sysid> end
10. Add the physical host information and network information for the zone on each host

clzc:sczone> add node
clzc:sczone:node> set
clzc:sczone:node> set
clzc:sczone:node> add net
clzc:sczone:node:net>
clzc:sczone:node:net>
clzc:sczone:node:net> end
clzc:sczone:node> end
clzc:sczone> add node
clzc:sczone:node> set
clzc:sczone:node> set
clzc:sczone:node> add net
clzc:sczone:node:net>
clzc:sczone:node:net>
clzc:sczone:node:net> end
clzc:sczone:node> end
11. Add floating IP addresses for RAC VIP

clzc:sczone> add net
clzc:sczone:net> set address=rac01
clzc:sczone:net> end
clzc:sczone> add net
clzc:sczone:net> set address=rac02
clzc:sczone:net> end

12. Add QFS Oracle Mount
clzc:sczone> add fs
clzc:sczone:fs> set dir=/db_qfs/oracle
clzc:sczone:fs> set special=RAC
clzc:sczone:fs> set type=samfs
clzc:sczone:fs> end

13. Add Disks for use with ASM
Initially add the storage to the shared metaset with metaset -s zora, then add it into the zone configuration. A short example is provided below; repeat for each device.
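A sketch of the metaset side, following the d30/d300 pattern used earlier in this procedure; the DID device d5 is an assumption.

# metaset -s zora -a /dev/did/dsk/d5
# metainit -s zora d50 1 1 /dev/did/dsk/d5s0
# metainit -s zora d500 -m d50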
clzc:sczone> add device
clzc:sczone:device> set match="/dev/md/zora/rdsk/d50"
clzc:sczone:device> end
clzc:sczone> add device
clzc:sczone:device> set match="/dev/md/zora/rdsk/d500"
clzc:sczone:device> end
clzc:sczone> add device
clzc:sczone:device> set match="/dev/md/shared/1/rdsk/d50"
clzc:sczone:device> end
clzc:sczone> add device
clzc:sczone:device> set match="/dev/md/shared/1/rdsk/d500"
clzc:sczone:device> end
clzc:sczone>

14. Add Resource Settings to Zone
clzc:sczone> set limitpriv="default,proc_priocntl,proc_clock_highres"

15. Commit zone configuration - saves info on both servers

clzc:sczone> verify
clzc:sczone> commit
clzc:sczone> exit

16. Build the Non-Global Zones

vsrv1# clzonecluster install sczone
Waiting for zone install commands to complete on all the nodes of the zone cluster "sczone"...
vsrv1# clzonecluster install sczone
Waiting for zone install commands to complete on all the nodes of the zone cluster "sczone"...

### On both servers:
# mkdir -p /localzone/sczone/root/db_qfs/oracle
###############################################

vsrv1# clzonecluster boot sczone
Waiting for zone boot commands to complete on all the nodes of the zone cluster "sczone"...

17. Use zlogin on both global zones to finish configuring sczone
# clresourcegroup create -Z zcname -n nodelist \
  -p maximum_primaries=num-in-list \
  -p desired_primaries=num-in-list \
  [-p rg_description="description"] \
  -p rg_mode=Scalable rac-fmwk-rg

2. Register the SUNW.rac_framework resource type

# clresourcetype register -Z zcname SUNW.rac_framework

3. Add an instance of the SUNW.rac_framework resource type to the resource group that you created earlier.
# clresource create -Z zcname -g rac-fmwk-rg \
  -t SUNW.rac_framework rac-fmwk-rs

4. Register the SUNW.rac_udlm resource type.

# clresourcetype register -Z zcname SUNW.rac_udlm

5. Add an instance of the SUNW.rac_udlm resource type to the same resource group.
# clresource create -Z zcname -g resource-group \
  -t SUNW.rac_udlm \
  -p resource_dependencies=rac-fmwk-rs rac-udlm-rs

6. Bring online and in a managed state the RAC framework resource group and its resources.

# clresourcegroup online -Z zcname -emM rac-fmwk-rg
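As a suggested verification only (not part of the original procedure), the state of the framework resources can then be checked from the global zone:

# clresourcegroup status -Z zcname rac-fmwk-rg
# clresource status -Z zcname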
System console
Use the Esc-Shift-9 key sequence to toggle back to the local console flow. Enter Ctrl-b to terminate the connection to the serial console.

Connect to system console
Hardware Notes
Configure ELOM/SP
Change IP Address from DHCP to Static
SP> set /SP/AgentInfo DhcpConfigured=disable
SP> set /SP/AgentInfo IpAddress=ipaddress
SP> set /SP/AgentInfo NetMask=netmask
SP> set /SP/AgentInfo Gateway=gateway
SP> show /SP/AgentInfo

Properties:
    HWVersion = 0
    FWVersion = 3.20
    MacAddress = 00:16:36:5B:97:E4
    IpAddress = 10.13.60.63