Site Recovery Manager Administration Guide
This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.
EN-000357-00
You can find the most up-to-date technical documentation on the VMware Web site at:
http://www.vmware.com/support/
The VMware Web site also provides the latest product updates.
If you have comments about this documentation, submit your feedback to:
[email protected]
Copyright 2008-2010 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
VMware, Inc.
Contents
Create a Site Pair
Disconnect From a Protected or Recovery Site
Install the SRM License Key
Configure Array Managers
Configure Recovery Site Array Managers When the Protected Site Is Inaccessible
Rescan Arrays to Detect Configuration Changes
Configure Inventory Mappings
Apply Inventory Mappings to All Members of a Protection Group
Configure Resource Mappings for a Virtual Machine
Create Protection Groups
Edit a Protection Group
Adding and Removing Members of a Protection Group
Limitations on Recovery of Snapshots and Linked Clones
Create a Recovery Plan
Edit a Recovery Plan
Remove a Recovery Plan
Test a Recovery Plan
Pause, Resume, or Cancel a Test
Run a Recovery Plan
Configuring and Executing Failback
Review and Execute Post-Failover Cleanup Tasks
Reconfigure Replication
Reconfigure SRM to Enable Failback to the Protected Site
Restore the Original Configuration
Assign Roles and Permissions
Customizing a Recovery Plan
Recovery Plan Steps
Customize Recovery Plan Steps
Customize the Recovery of an Individual Virtual Machine
Report IP Address Mappings for a Protection Group
Customize IP Properties for a Group of Virtual Machines
Configure Protection for a Virtual Machine or Template
Repair Placeholder Virtual Machines After a Failed Test Recovery
Configure SRM Alarms
Working with Advanced Settings
Guest Customization Settings
Change Recovery Site Settings
Change SAN Provider Settings
Change Local Site Settings
Change Remote Site Settings
Avoiding Replication of Paging Files and Other Transient Data
Specify a Nonreplicated Datastore for Swapfiles
Create a Nonreplicated Virtual Disk for Paging File Storage
6 Troubleshooting SRM
No Replicated Datastores Listed
Inconsistent Mount Points Warning When Configuring NFS Arrays
Array Script Files Not Found
Expected Virtual Machine File Path Cannot Be Found
Recovery Plan Time-Out During the Change Network Settings Step
Collecting SRM Log Files
Collect SRM Server Log Files
Collect an SRM Client Log Bundle
Index
VMware vCenter Site Recovery Manager (SRM) is an extension to VMware vCenter that enables integration with array-based replication, discovery and management of replicated datastores, and automated migration of inventory from one vCenter to another. SRM servers coordinate the operations of the replicated storage arrays and vCenter servers at the two sites so that, as virtual machines at one site (the protected site) are shut down, virtual machines at the other site (the recovery site) start up and, using the data replicated from the protected site, assume responsibility for providing the same services. Migration of protected inventory and services from one site to the other is controlled by a recovery plan that specifies the order in which virtual machines are shut down and started up, the compute resources they are allocated, and the networks they can access. SRM enables you to test a recovery plan, using a temporary copy of the replicated data, in a way that does not disrupt ongoing operations at either site.
Intended Audience
This book is intended for Site Recovery Manager administrators who are familiar with vSphere and its replicated datastores, and who want to configure protection for vSphere inventory. It may also be appropriate for other users who need to add virtual machines to protected inventory or verify that existing inventory is properly configured for use with SRM.
Document Feedback
VMware welcomes your suggestions for improving our documentation. If you have comments, send your feedback to [email protected].
VMware vCenter Site Recovery Manager is a business continuity and disaster recovery solution that helps you plan, test, and execute a scheduled migration or emergency failover of vCenter inventory from one site to another.
This chapter includes the following topics:
- Protected Sites and Recovery Sites, on page 7
- SRM and vCenter, on page 13
- Each site must have at least one datacenter.
- The recovery site must support array-based replication with the protected site.
- The recovery site must have hardware, network, and storage resources that can support the same virtual machines and workloads as the protected site.
- At least one virtual machine must be located on a replicated datastore at the protected site. This datastore must be supported by a storage array that is compatible with SRM.
- The sites should be connected by a reliable IP network (storage arrays might have additional network requirements).
- The recovery site should have access to the same networks (public and private) as the protected site, although not necessarily the same range of network addresses.
Site Pairing
The protected and recovery sites must be paired before you can use SRM. Site pairing includes three main steps:
1 Exchange of authentication information between the two sites.
2 Discovery of the replicated storage arrays that support the protected site, and discovery of peer arrays at the recovery site.
3 Discovery of the replicated devices supported by the arrays, and mapping of these devices to datastores that support virtual machines.
SRM includes a wizard that guides you through the site-pairing process. Site pairing requires vSphere administrative privileges at both sites. To initiate the site-pairing process, you must know the username and password of a vSphere administrator at each site.
Array-Based Replication
In array-based replication, one or more storage arrays at the protected site replicate data to peer arrays at the recovery site. Storage replication adapters (SRAs) enable integration of SRM with a wide variety of replicated arrays. You can configure array-based replication between ESX hosts whether or not you use SRM. In fact, it is a good idea to set up this replication before you install and configure SRM, so that you can be certain it is working correctly before you install SRM and the necessary SRAs.
You cannot designate a third site as a recovery site for one that is already paired with another site. If you want to use SRM to provide business continuity and disaster recovery services at a recovery site, you must configure that site as a protected site that uses its own array managers to replicate data to the other member of the site pair. After site pairing is complete, configuring bidirectional operation requires you to follow the same site configuration procedures that are required for unidirectional operation, but you must do so for each site in each capacity. At a recovery site that has not been configured for bidirectional operation, items that must be configured at a protected site remain unconfigured:
- Array Managers and Inventory Mappings are always listed as Not Configured.
- Protection Groups are listed as No Groups Created.
If the new device is created on a replicated datastore that is not protected (not part of any protection group), the datastore is added to the virtual machine's protected datastore group and the virtual machine's protection is unaffected. If the new device is created on a replicated datastore that is protected by a different protection group, the virtual machine's protection is invalidated. If the new device is created on a nonreplicated datastore, the virtual machine's protection is invalidated.
If you use Storage VMotion to move a virtual machine to a nonreplicated datastore, or to a replicated datastore on an array that SRM has not been configured to manage (through an SRA), the virtual machine's protection is invalidated.
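The rules above for new devices and Storage VMotion can be summarized as a small lookup table. The following Python sketch is only an illustration of those rules: the operation and datastore labels are hypothetical names invented for this example and are not part of any SRM interface. Combinations that the text does not describe are reported as "unspecified" rather than guessed.

```python
# Illustrative decision table. Every name below is a hypothetical label,
# not an SRM API or setting.
NEW_DEVICE = "new_device"
STORAGE_VMOTION = "storage_vmotion"

OUTCOMES = {
    # New device on a replicated datastore outside any protection group:
    # the datastore joins the VM's protected datastore group.
    (NEW_DEVICE, "replicated_unprotected"): "protected",
    # New device on a replicated datastore owned by another protection group.
    (NEW_DEVICE, "replicated_other_group"): "invalidated",
    # New device on a nonreplicated datastore.
    (NEW_DEVICE, "nonreplicated"): "invalidated",
    # Storage VMotion to a nonreplicated datastore.
    (STORAGE_VMOTION, "nonreplicated"): "invalidated",
    # Storage VMotion to a replicated datastore on an array SRM does not manage.
    (STORAGE_VMOTION, "replicated_unmanaged_array"): "invalidated",
}


def protection_outcome(operation, target_datastore):
    """Return 'protected', 'invalidated', or 'unspecified' for a change."""
    return OUTCOMES.get((operation, target_datastore), "unspecified")
```

For example, `protection_outcome(NEW_DEVICE, "nonreplicated")` returns `"invalidated"`, which matches the last rule for new devices.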
- A virtual machine has files on two different datastores.
- Two virtual machines share an RDM device on a SAN array.
- Two datastores span extents corresponding to different partitions of the same device.
- A single datastore spans two extents corresponding to partitions of two different devices.
- Multiple devices belong to a consistency group defined on the storage array.
SRM computes datastore groups when you first configure your array managers. After that, the computation executes every time that a virtual machine is added to or removed from a datastore that is part of a group.
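One way to picture the datastore group computation is as a connected-components problem. The sketch below is not SRM's actual algorithm, which this guide does not describe. It assumes that each condition listed above (for example, a virtual machine with files on two datastores) can be reduced to a link between two datastores, and it merges linked datastores with a simple union-find. The function name and input shapes are hypothetical.

```python
from collections import defaultdict


def compute_datastore_groups(datastores, links):
    """Group datastores that are linked directly or transitively.

    datastores: iterable of datastore names.
    links: iterable of (name_a, name_b) pairs, one per condition that ties
           two datastores together (an assumption made for this sketch).
    Returns a sorted list of sorted name lists, one list per group.
    """
    parent = {name: name for name in datastores}

    def find(name):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)
    return sorted(sorted(members) for members in groups.values())
```

With datastores `ds1` through `ds4`, a virtual machine spanning `ds1` and `ds2`, and an RDM device shared between machines on `ds2` and `ds3`, the call `compute_datastore_groups(["ds1", "ds2", "ds3", "ds4"], [("ds1", "ds2"), ("ds2", "ds3")])` places `ds1`, `ds2`, and `ds3` in one group and leaves `ds4` in a group of its own.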
machines. If you need to override inventory mappings for a few members of a protection group, use the vSphere Client to connect to the recovery site and edit the network settings of the placeholders or move them to a different folder or resource pool. If a member of a protection group loses its protection, its placeholder is removed from the recovery site until the protection has been restored. Placeholders can be treated like any other members of the recovery site vCenter inventory, although they cannot be powered on. When a placeholder is created, its folder, network, and compute resource assignments are derived from inventory mappings established at the protected site. Its permissions are inherited from the protected virtual machine that it represents. A recovery site vCenter administrator can modify these assignments and permissions as necessary. Changes made to the placeholder override settings established by inventory mapping, and are preserved in the recovery site SRM database. When a protected virtual machine is recovered by testing or running a recovery plan, its placeholder is unregistered, and the recovered virtual machine is registered in its place and powered on as directed by the recovery plan. After a recovery plan test completes, the placeholders are restored as part of the cleanup process.
Effect on replication: All replicated datastores are synchronized, then replication is stopped, and the target devices at the recovery site are made writable.
Network: Recovered virtual machines are connected to a datacenter network.
Interruption: Must run to completion.
About Failback
A failback restores the original configuration of the protected and recovery sites after a failover. You can configure and execute a failback procedure when you are ready to restore services to the protected site. Failback is a catch-all term for a collection of procedures that you can use to restore the original configuration of the protected and recovery sites after a failover. The specific procedures required depend on the nature of the preceding failover: a planned failover that leaves the protected site intact requires a different set of failback steps than an unplanned failover initiated before (or after) an event that compromises the protected site temporarily or permanently.
A typical failback has two phases. In the first phase, the protected and recovery sites switch roles, and the virtual machines are migrated from the recovery site to the protected site under the control of a recovery plan. In the second phase, the relationship of the protected and recovery sites is restored, so that future failovers migrate the protected virtual machines from the protected site to the recovery site. Alternatively, the recovery site can be promoted to a protected site, and the protected site becomes the recovery site. Configuring and executing a failback is a time-consuming task that requires downtime at the recovery site and changes to storage replication. After the failback is complete, restoring the protected site to its original role and enabling failover to the recovery site requires additional downtime and changes to storage replication.
- Modifying a protected virtual machine configuration, including adding, modifying, or removing devices. Changing a virtual machine's memory size on the protected site is not reflected on the recovery site if the virtual machine is already in a protection group.
- Relocating virtual machines.
- Deleting protected virtual machines.
- Deleting an object for which an inventory mapping exists.
SRM can tolerate the following changes at the recovery site without disruption:
- Deleting placeholder virtual machines.
- Moving placeholder virtual machines to a different folder, resource pool, or network.
- Deleting an object for which an inventory mapping exists.
Max Connections
SRM Licensing
The SRM server requires a license key to operate. Each SRM server installs with an evaluation license that is valid for 60 days and supports an unlimited number of protected virtual machines. SRM uses the vSphere licensing infrastructure to ensure that all protected virtual machines have appropriate licensing. Valid vSphere licensing includes Standard, Advanced, Enterprise, or Enterprise Plus licenses. After the evaluation license expires, existing protection groups remain protected and can be recovered, but you cannot create new protection groups or modify existing ones until you obtain and assign a valid SRM license key. VMware recommends that you obtain and assign SRM license keys as soon as possible after installing SRM. You can obtain a license key from your VMware sales representative.
Install one or more of these keys at the protected site to enable failover. Install keys at both sites to enable bidirectional operation.
If your SRM Servers are connected with linked vCenter Servers, the SRM servers can share the same license key. To obtain your license keys, go to the VMware Product Licensing Center (http://www.vmware.com/support/licensing/index.html). SRM licensing checks for a valid license whenever you add a virtual machine to or remove a virtual machine from a protection group. If licenses are not in compliance, vSphere triggers a licensing alarm. VMware recommends that you configure alerts for triggered licensing events so that licensing administrators are notified by email.
SRM Authentication
All communications between SRM and vCenter servers take place over an SSL connection and are authenticated by public key certificates or stored credentials. When you install an SRM server, you must choose either credential-based authentication or certificate-based authentication. You cannot mix authentication methods. The authentication method you choose when installing the SRM server is used to authenticate connections between the SRM servers at the protected and recovery sites, and between SRM and vCenter.
Certificate-Based Authentication
If you have or can acquire a PKCS#12 certificate signed by a trusted authority, use certificate-based authentication. Public key certificates signed by a trusted authority streamline many SRM operations and provide the highest level of security. Certificates used by SRM have special requirements. See Requirements When Using Public Key Certificates, on page 16.
Credential-Based Authentication
If you are using credential-based authentication, SRM stores a user name and password that you specify during installation, and then uses those credentials when connecting to vCenter or another SRM server. SRM also creates a special-purpose certificate for its own use. This certificate includes additional information that you supply during installation. That information, an Organization name and Organization Unit name, must be identical for both members of an SRM server pair. NOTE Even though SRM creates and uses this special-purpose certificate when you choose credential-based authentication, credential-based authentication is not equivalent to certificate-based authentication in either security or operational simplicity.
Certificate Warnings
If you are using credential-based authentication, attempts by the SRM server to connect to vCenter produce a certificate warning because the trust relationship asserted by the special-purpose certificates created by SRM and vCenter cannot be verified by SSL. The warning dialog allows you to specify a disposition for the current instance of the problem, for all instances of the problem when making a connection to a specific host, or for all instances of the problem for all hosts. To avoid these warnings, use certificate-based authentication and obtain your certificate from a trusted certificate authority.
- A Common Name (CN) attribute, whose value must be the same for both members of the pair. A string such as "SRM" is appropriate here.
- An Organization (O) attribute, whose value must be the same as the value of this attribute in the supporting vCenter server's certificate.
- An Organizational Unit (OU) attribute, whose value must be the same as the value of this attribute in the supporting vCenter server's certificate.
The certificate used by each member of an SRM server pair must include a Subject Alternative Name attribute whose value is the fully-qualified domain name of the SRM server host. (This value will be different for each member of the SRM server pair.) Because this name is subject to a case-sensitive comparison, it is a good idea to always use lower-case letters when specifying the name during SRM installation.
- If you are using an openssl CA, modify the openssl configuration file to include a line like the following if the SRM server host's fully-qualified domain name is srm1.example.com:
  subjectAltName = DNS:srm1.example.com
- If you are using a Microsoft CA, refer to http://support.microsoft.com/kb/931351 for information on how to set the Subject Alternative Name.
The certificate used by each member of an SRM server pair must include an "extendedKeyUsage" or "enhancedKeyUsage" attribute whose value is "serverAuth, clientAuth". If you are using an openssl CA, modify the openssl configuration file to include a line like the following:
extendedKeyUsage = serverAuth, clientAuth
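The certificate requirements above lend themselves to a simple pre-installation check. The Python sketch below assumes that the relevant attributes have already been extracted from each certificate into plain dictionaries. It does not parse certificate files, and the dictionary keys and the function name are hypothetical choices made for this example. Comparisons are case-sensitive, as the text above notes for the Subject Alternative Name.

```python
def check_srm_certificate_pair(site_a, site_b):
    """Return a list of problems found in an SRM server certificate pair.

    Each site is a dict with these hypothetical keys:
      'srm_cert':     dict with 'CN', 'O', 'OU', 'subjectAltName' (str),
                      and 'extendedKeyUsage' (list of str)
      'vcenter_cert': dict with 'O' and 'OU'
      'srm_fqdn':     fully-qualified domain name of the SRM server host
    """
    problems = []

    # CN must be the same for both members of the pair.
    if site_a["srm_cert"]["CN"] != site_b["srm_cert"]["CN"]:
        problems.append("CN differs between the two SRM certificates")

    for label, site in (("site A", site_a), ("site B", site_b)):
        srm, vcenter = site["srm_cert"], site["vcenter_cert"]

        # O and OU must match the supporting vCenter server's certificate.
        for attr in ("O", "OU"):
            if srm[attr] != vcenter[attr]:
                problems.append(f"{label}: {attr} differs from the vCenter certificate")

        # Subject Alternative Name must equal the SRM host FQDN (case-sensitive).
        if srm["subjectAltName"] != site["srm_fqdn"]:
            problems.append(f"{label}: subjectAltName does not match the SRM host name")

        # Both serverAuth and clientAuth must be present.
        if not {"serverAuth", "clientAuth"} <= set(srm["extendedKeyUsage"]):
            problems.append(f"{label}: extendedKeyUsage lacks serverAuth or clientAuth")

    return problems
```

An empty list means that the attributes supplied satisfy the requirements listed in this section. It says nothing about whether the certificate is signed by a trusted authority.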
- Protection Groups Administrator: Set up and modify protection groups.
- Protection SRM Administrator: Pair the protected and recovery sites, and configure inventory mappings.
- Protection Virtual Machine Administrator: Set up and modify the protection characteristics of a protected virtual machine.
- Recovery Datacenter Administrator: View available datastores and customize recovered virtual machines.
- Recovery Host Administrator: Configure virtual machine components during recovery. If the recovery host is a cluster, this permission must be assigned for the cluster object itself and for every host in the cluster.
- Recovery Inventory Administrator: View customization specifications for the recovery site.
- Recovery Plans Administrator: Reconfigure protected and recovered virtual machines. Also grants the ability to set up and run recovery.
- Recovery SRM Administrator: Configure arrays and create protection profiles.
- Recovery Virtual Machine Administrator: Create virtual machines at the recovery site and add them to the resource pool. Also grants the ability to reconfigure and customize the recovery virtual machines when a recovery plan is run.
- Read-only at the vCenter root.
- Read-only at the datacenter inventory object.
- Protection Virtual Machine Administrator role at the virtual machine level (propagate).
- Protection SRM Administrator role at the SRM site recovery root level (propagate).
- Protection Groups Administrator role at the SRM protection groups level (propagate).

- Recovery Inventory Administrator role at the vCenter root.
- Recovery Datacenter Administrator role at the datacenter level (propagate).
- Recovery Host Administrator role at the host level. If the recovery host is a cluster, this permission must be assigned for the cluster object itself and for every host in the cluster.
- Recovery Virtual Machine Administrator role at the resource pool and folder levels (propagate).
- Recovery SRM Administrator role at the SRM root level (propagate).
- Recovery Plans Administrator role at the SRM recovery plans level (propagate).
You can grant selected roles or individuals the minimum set of privileges required to perform specific operations. You can grant these privileges in addition to the default permissions for a role rather than giving many users broad administrative powers. Table 1-4 summarizes the permissions required for common SRM administrative tasks and the sites at which those permissions must be granted.
Table 1-4. Site Recovery Manager Administrative Tasks and Minimum Required Privileges
Task                                Site       Minimum Required Privilege
Add new user and role               Both       Permissions > Modify Role
Assign access permission            Both       Permissions > Modify Permission
Change access permission            Both       Permissions > Modify Permission
Remove access permission            Both       Administrator
Connect (pair) sites                Both       Site Recovery Manager > Protection SRM Administrator
Modify advanced settings            Protected  Site Recovery Manager > Protection SRM Administrator
Modify advanced settings            Recovery   Site Recovery Manager > Recovery SRM Administrator
Configure or repair array managers  Both       Site Recovery Manager > Array Manager > Configure
Configure inventory preferences     Both       Site Recovery Manager > Inventory Preferences > Create Mapping
Create protection groups            Protected  Site Recovery Manager > Protection Group > Create
Create or modify a recovery plan    Recovery   Site Recovery Manager > Recovery Plan > Create
Edit a recovery plan                Recovery   Site Recovery Manager > Recovery Plan > Modify
Test a recovery plan                Recovery   Site Recovery Manager > Recovery Plan > Test
Run or remove a recovery plan       Recovery   Site Recovery Manager > Recovery Plan > Run
You must install an SRM server at the protected site and also at the recovery site. After the SRM servers are installed, you can download the client plug-in from either server to any vSphere Client. You use the SRM client plug-in to configure and manage SRM at each site.
Prerequisites
SRM requires the support of a vCenter server at each site. The SRM installer must be able to connect with this server during installation. If you cannot install SRM on a dedicated server host, you can install it on the same host where vCenter Server is installed. The SRM server host must meet the following hardware requirements:
- Processor: 2.0GHz or higher Intel or AMD x86 processor
- Memory: 2GB minimum
- Disk Storage: 2GB minimum
- Networking: Gigabit recommended
For up-to-date information about supported platforms and databases, see the Site Recovery Manager Compatibility Matrixes, at http://www.vmware.com/support/pubs/srm_pubs.html.
This chapter includes the following topics:
- Configuring the SRM Database, on page 19
- Install the SRM Server, on page 21
- Install the Storage Replication Adapters, on page 23
- Update the SRM Server, on page 24
- Install the SRM Client Plug-In, on page 25
- Revert to a Previous Release, on page 25
- Repair a Site Recovery Manager Server Installation, on page 26
The SRM database at each site holds information about virtual machine configurations, protection groups, and recovery plans. SRM cannot use the vCenter database because it has different database schema requirements, though you can use the vCenter database server to create and support the SRM database. Each SRM site requires its own instance of the SRM database. The database must exist before SRM can be installed. If the SRM database at either site becomes corrupted, the SRM servers at both sites shut down. NOTE If you reinitialize the database after you install SRM, you must run the SRM installer in repair mode and specify a new database connection.
It must be owned by the SRM database user (the database user name you supply when configuring the SRM database connection). It must have the same name as the SRM database user. It must be the default schema for the SRM database user.
The SRM database user must have database administrator privileges. The SRM database user must be granted the following permissions:
If you are using Windows authentication, the SRM server and database server must run on the same host. If the SRM server and database server run on different hosts, you must use mixed mode authentication. If SQL Server is installed locally, you might need to disable the Shared Memory network setting on the database server.
The SRM database user (the database user name you supply when configuring the SRM database connection) must be granted the following permissions:
When creating the database instance, specify utf-8 encoding. Because DB2 uses Windows authentication, you must specify the database owner as a domain account.
- The hostname or IP address of the site's vCenter Server. The server must be running and accessible during SRM installation, and it must be in the same Windows domain as the SRM server host.
- The user name and password of the vCenter administrator.
- A user name and password for the SRM database. See Configuring the SRM Database, on page 19.
- If you are using certificate-based authentication, the pathname to an appropriate certificate file. See SRM Authentication, on page 15.
Procedure
1 Log in to the server host on which you are installing SRM.
  Log in as a local administrator.
2 Download the SRM installation file to a folder on the host, or open a folder on the network that contains this file.
3 Double-click the SRM installer icon to begin installation.
  If the installer detects an existing installation, verify that you want to update the existing installation, and then follow the procedure at Update the SRM Server, on page 24.
4 Click Next on the Welcome to the installation wizard screen.
5 On the License Agreement page, select I accept the terms in the license agreement and then click Next.
6 On the Destination Folder page, select the folder in which you want to install SRM and click Next.
  The default installation folder for a new installation of SRM is C:\Program Files\VMware\VMware vCenter Site Recovery Manager. If you use a different folder, the pathname cannot be longer than 240 characters and cannot include non-ASCII characters.
7 On the VMware vCenter Server page, enter information about the vCenter server at the site where you are installing SRM and then click Next.
  - vCenter Server Address: Enter the hostname or IP address of the vCenter Server. If you use the hostname, enter it in lowercase. After installation is complete and you are configuring the connection between the protected and recovery sites, you must supply this hostname or IP address exactly as you enter it here, because it is subject to case-sensitive comparisons. See Requirements When Using Public Key Certificates, on page 16.
  - vCenter Server Port: Accept the default or enter a different port.
  - vCenter Server Username: Enter the user name of an administrator of the specified vCenter server.
  - vCenter Server Password: Enter the password for the specified user name. The password must not be empty.
  When you click Next, the installer contacts the specified vCenter server and validates the information you supplied.
8 On the Certificate Type Selection page, select an authentication method.
  - To use credential-based authentication, select Automatically generate certificate and click Next. Enter text values for your organization and organization unit, typically your company name and the name of your group within the company.
  - To use certificate-based authentication, select Use a PKCS #12 certificate file and click Next. Enter the path to the certificate file. The certificate file must contain exactly one certificate with exactly one private key matching the certificate. Enter the certificate password if necessary.
  See SRM Authentication, on page 15.
9 Enter the following additional information:
  - Local Site Name: A name for this installation of SRM. A suggested name is generated for you, but you can specify any name you want, so long as it is not the same name that you use for another SRM installation with which this one will be paired.
  - Administrator E-mail: The email address to which SRM administrative alerts and notifications are sent.
  - Additional E-mail: An optional additional email address to which SRM administrative alerts and notifications are sent.
  - Local Host: The name or IP address of the local host. This value is obtained by the SRM installer and need only be changed if it is incorrect (for example, if the local host has more than one network interface and the one detected by the SRM installer is not the one you want to use).
  - Listener Ports: The SOAP and HTTP port numbers to use.
  - API Listener Port: The SOAP port number for API clients to use.
  The SRM installer supplies default values for these ports. Do not change them unless the defaults would cause port conflicts. See How SRM Uses Network Ports, on page 16.
10 Enter the following database configuration information:
  - Database Client: Select a database client type from the pulldown control.
  - Data Source Name: Select an existing DSN from the pulldown, or click ODBC DSN Setup to view existing DSNs or create a new one.
  - Username: A user ID valid for the specified database.
  - Password: The password for the specified user ID.
  - Connection Count: The initial connection pool size.
  - Max Connections: The maximum number of database connections that can be open simultaneously.
  For more information about any of these values, see About the Site Recovery Manager Database, on page 14.
  NOTE If a database exists at the DSN that you provide, you are prompted to either use it or overwrite it.
11 Click Install.
12 When the wizard completes, click Finish.
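Several of the values requested during installation carry constraints that are easy to check before you start the installer: the installation folder pathname (at most 240 characters, ASCII only), the vCenter hostname (lowercase), the vCenter password (not empty), and the local site name (different from the name of the paired installation). The Python sketch below is a hypothetical pre-flight helper and is not part of SRM. Its function and parameter names are assumptions made for illustration.

```python
def validate_install_inputs(install_folder, vcenter_host, vcenter_password,
                            local_site_name, peer_site_name=None):
    """Return a list of problems with values intended for the SRM installer.

    All parameter names are hypothetical; they mirror the fields described
    in the installation procedure.
    """
    problems = []

    # Destination folder: no longer than 240 characters, ASCII only.
    if len(install_folder) > 240:
        problems.append("installation folder pathname is longer than 240 characters")
    if not install_folder.isascii():
        problems.append("installation folder pathname contains non-ASCII characters")

    # Hostnames are compared case-sensitively later, so use lowercase.
    # An IP address passes this check unchanged.
    if vcenter_host != vcenter_host.lower():
        problems.append("vCenter hostname should be entered in lowercase")

    # The vCenter password must not be empty.
    if vcenter_password == "":
        problems.append("vCenter password must not be empty")

    # The local site name must differ from the paired installation's name.
    if peer_site_name is not None and local_site_name == peer_site_name:
        problems.append("local site name must differ from the paired site name")

    return problems
```

An empty result means that none of these specific constraints is violated. The installer performs its own validation, including contacting the vCenter server.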
What to do next
You can now install SRAs at each site. See Install the Storage Replication Adapters, on page 23.
SRM server installation creates a directory in which you can install the SRAs. Install the SRM server before you install the SRAs. Your SRA might require the installation of other vendor-provided components. Some of these components might need to be installed on the SRM server host; others might require only network access by the SRM server. SRM might occasionally need to rescan storage arrays. You can improve array rescan times by changing the default value of the Scsi.RescanAllHbas option on ESX hosts. If rescan times on ESX hosts are longer than 10 minutes, you might want to set the value of this option to 1. Masking and zoning must be configured for replicated devices to remote ESX hosts for failover. VMware recommends that you configure storage to create clones or snapshots of the replicated devices. Snapshots or clones must be masked to the recovery site ESX hosts.
Procedure 1 Download the SRA. You can download storage replication adapters and their documentation from http://www.vmware.com/download/srm/. Storage replication adapters downloaded from other sites are not supported by VMware. 2 Install the SRA on each SRM server host. Storage replication adapters come with their own installation instructions. The adapter you are using must be installed on the SRM server host at the protected and recovery sites. You cannot use different adapters, or different versions of the same adapter, at these sites. Both members of an SRM site pair must use identical adapters. 3 Re-start the SRM service. The SRM service looks for SRAs when it starts up. If you add or change SRAs on a host, you must re-start the SRM server process on that host.
What to do next You can now install the updated client plug-in. See Install the SRM Client Plug-In, on page 25.
The hostname or IP address of the site's vCenter Server. The username and password of the vCenter administrator. The username, password, and DSN for the SRM database. The type of authentication (certificate-based or credential-based), the authentication details, or both.
The installer's repair mode presents modified versions of most of the pages that are part of the SRM server installation. For more information about any of the repair options, see Install the SRM Server, on page 21.
Procedure
1 Log in to the SRM server host. Log in as a local administrator.
2 Open the Windows Add or Remove Software tool.
3 Navigate to the entry for VMware vCenter Site Recovery Manager and click Change to start the installer in repair mode. Click Next on the Welcome to the installation wizard screen.
4 Click Repair on the Program Maintenance Options page.
5 On the VMware vCenter Server page, enter the following information:
■ vCenter Server Username: Enter the user name of an administrator of the specified vCenter server.
■ vCenter Server Password: Enter the password for the specified user name.
You cannot use the installer's repair mode to change the vCenter server address or port. When you click Next, the installer contacts the specified vCenter server and validates the information you supplied. 6 On the Certificate Type Selection page, choose an authentication method and click Next.
■ To leave the current authentication method unchanged, select Use existing certificate. If the installed certificate is not valid, this option is unavailable.
■ To choose credential-based authentication, select Automatically generate certificate.
■ To choose certificate-based authentication, select Use a PKCS #12 certificate file.
Unless you select Use existing certificate, you will be prompted to supply additional authentication details such as certificate location or strings to use for Organization and Organizational Unit. For more information, see SRM Authentication, on page 15.
7 On the Database Configuration page, enter the following database configuration information and click Next:
■ Data Source Name: Select an existing DSN from the pulldown, or click ODBC DSN Setup to view existing DSNs or create a new one.
■ Username: A user ID valid for the specified database.
■ Password: The password for the specified user ID.
■ Connection Count: The initial connection pool size.
■ Max Connections: The maximum number of database connections that can be open simultaneously.
For more information about any of these values, see About the Site Recovery Manager Database, on page 14. You cannot use the installer's repair mode to change the database type. If the installer detects an existing database at the DSN you provide, it prompts you to either use it (preserving its contents) or overwrite it (destroying its contents). 8 On the Ready to Repair the Program page, click Install to repair the installation. The installer makes the requested repairs and restarts the SRM server.
After you have installed SRM at the protected and recovery sites, you must connect the two sites to create a site pair, configure the array managers at each site, and configure SRM at each site. You use the SRM client plug-in to administer SRM. Site pairing requires vSphere administrative privileges at both sites.
Prerequisites
Before you can connect the protected and recovery sites, you must:
1 Install an SRM server at each site.
2 Install the appropriate storage replication adapters on the SRM server hosts at both sites. The recovery site must be the replication target of arrays managed by the SRA at the protected site.
3 Download the SRM plug-in from an SRM server into the vSphere client that you want to use to administer SRM.
■ Create a Site Pair, on page 29
■ Install the SRM License Key, on page 30
■ Configure Array Managers, on page 31
■ Configure Inventory Mappings, on page 33
■ Create Protection Groups, on page 35
■ Create a Recovery Plan, on page 37
Procedure
1 Open a vSphere client and connect to the vCenter server at the site that you want to designate as the protected site. Log in as a vSphere administrator.
NOTE The recovery site must be the replication target of arrays managed by the SRA at the protected site.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Protection Setup area of the Summary window, navigate to the Connection line and click Configure.
4 On the Remote Site Information page, type the IP address or host name of the vCenter server at the recovery site and click Next.
NOTE If you are using credential-based authentication, you must enter exactly the same information here that you entered when installing the SRM server. If you entered an IP address in that step, enter it again here. If you entered a hostname in that step, enter it here in exactly the same way.
Port 80 is used for the initial connection to the remote site. After the initial HTTP connection is made, the two sites establish an SSL connection over port 443 for subsequent connections.
5 On the vCenter Server Authentication page, provide the vCenter administrator user name and password for the remote site and click Next. If you are using credential-based authentication, you must enter exactly the same information here that you entered when installing the SRM server.
6 On the Complete Connections page, click Finish after all of the site pairing steps have completed successfully.
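Because site pairing uses port 80 for the initial connection and port 443 for subsequent SSL connections, it can be useful to confirm that both ports are reachable from the protected site before you start the wizard. The following Python sketch is an illustration only and is not part of SRM. The function name check_ports is hypothetical, and the check only confirms that a TCP connection can be opened.

```python
import socket

def check_ports(host, ports=(80, 443), timeout=3.0):
    """Return a dict mapping each port to True if a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution errors.
            results[port] = False
    return results
```

You would call check_ports with the same IP address or host name that you plan to type on the Remote Site Information page. A False entry for either port suggests a firewall or service problem to resolve before pairing.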
The SRM and vCenter servers at the protected and recovery sites are connected. Connection information is saved in the SRM databases, and persists across logins and host restarts. What to do next After the sites are connected, you can configure the array managers.
Procedure
1 Open a vSphere client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click Licensing.
3 For the report view, select Asset.
4 Right-click an SRM asset and select Change license key.
5 Select Assign a new license key and click Enter Key.
6 Enter the license key, enter an optional label for the key, and click OK.
7 Click OK.
What to do next Repeat the process if you need to assign a license key at the recovery site (to support bidirectional operation).
You provide SRM with connection information and credentials (if needed) for array management systems at the protected and recovery sites. SRM verifies that it can connect to arrays at both sites. SRM verifies that it can discover replicated storage devices on these arrays and identify the VMFS datastores that they support. SRM computes and verifies datastore groups based on virtual machine storage layout and any consistency groups defined by the storage array.
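The idea of a datastore group can be illustrated with a short Python sketch. It rests on an assumption that is not stated in this guide: datastores that share a virtual machine, or that belong to the same array consistency group, end up in one group. The function name and data shapes are hypothetical, and the algorithm that SRM actually uses may differ.

```python
def compute_datastore_groups(vm_layout, consistency_groups=()):
    """Group datastores that are linked by a VM or a consistency group.

    vm_layout maps a VM name to the datastores that hold its files.
    consistency_groups is an iterable of datastore collections from the array.
    ASSUMPTION: linked datastores belong to one group; this is a sketch,
    not SRM's documented behavior.
    """
    parent = {}

    def find(ds):
        parent.setdefault(ds, ds)
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]  # path halving
            ds = parent[ds]
        return ds

    def union(a, b):
        parent[find(a)] = find(b)

    linked_sets = list(vm_layout.values()) + list(consistency_groups)
    for linked in linked_sets:
        linked = list(linked)
        for ds in linked:
            find(ds)  # register datastores that stand alone
        for other in linked[1:]:
            union(linked[0], other)

    groups = {}
    for ds in parent:
        groups.setdefault(find(ds), set()).add(ds)
    return sorted(sorted(group) for group in groups.values())
```

For example, if one virtual machine spans ds1 and ds2 and another spans ds2 and ds3, the sketch places all three datastores in a single group, because recovering any one of them without the others would leave a virtual machine incomplete.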
When the configuration process is complete, the wizard presents a list of replicated datastore groups. You typically configure array managers only once, after you have connected the protected and recovery sites. You do not need to reconfigure them unless array manager connection information or credentials have changed, or you want to use a different set of arrays.
Prerequisites
Before you configure the array managers at the protected and recovery sites, be sure that at least one virtual machine at the protected site is stored on a replicated device supported by an array for which you have installed an SRA. The array manager configuration wizard does not detect replicated devices unless they are part of a datastore that is home to at least one virtual machine. You must also connect the protected and recovery sites (see Create a Site Pair, on page 29).
Procedure
1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Protection Setup area of the Summary window, navigate to the Array Managers line and click Configure.
4 On the Protected Site Array Managers page of the Configure Array Managers wizard, click Add.
5 Make sure that the array manager type that you want SRM to use appears in the Manager Type field. If more than one SRA has been installed on the SRM server host, click the drop-down arrow and select the manager type you want to use. If no manager type is displayed, no SRA has been installed on the SRM server host. For more information, see Install the Storage Replication Adapters, on page 23.
6 Type a name for the array in the Display Name field of the Add Array Manager window. Use any descriptive name that makes it easy for you to identify the storage associated with this array manager.
7 Fill in the remaining fields of the Add Array Manager window. These fields are created by the SRA. For more information about how to fill them in, see the documentation provided by your SRA vendor.
8 Click Connect to validate the information you supplied and return the list of arrays that the selected array manager has discovered. All discovered arrays are selected. Clear the selection of any array that you do not want SRM to use.
9 Click OK. The array manager queries the selected arrays to discover which of their devices are replicated. Detailed information about the selected arrays and the number of replicated devices they support appears in the Replicated Array Pairs area of the Configure Array Managers window.
10 Click Next to configure array managers at the recovery site.
11 On the Recovery Site Array Managers page of the Configure Array Managers wizard, click Add. The procedure for configuring these arrays is identical to the procedure for configuring the arrays at the protected site, described in Step 5 through Step 8.
12 Click OK. The array manager at the recovery site queries the selected arrays to discover which of their devices are replicated, and displays detailed information about the selected arrays and the number of replicated devices they support in the Replicated Array Pairs area of the Configure Array Managers window. A green checkmark icon distinguishes arrays that have peers at the protected site.
13 Click Next to display the list of replicated datastore groups. On the Review Replicated Datastores page, you can expand each datastore group to see the datastores it contains and the devices that they use. If the list of datastore groups is not what you expected, correct it before continuing.
NOTE Only those datastores used by at least one virtual machine are displayed. If no datastores are displayed, verify that the inventory of this vCenter includes at least one virtual machine that uses a datastore supported by the paired arrays.
14
Configure Recovery Site Array Managers When the Protected Site Is Inaccessible
If you need to edit array manager details when the protected site is not accessible, use the Repair Array Managers function. Normally, configuration of array managers requires access to both the protected and recovery sites. SRM provides a Repair Array Managers function that allows you to modify the recovery site array manager configuration even though the protected site is inaccessible. If the protected site is accessible, you can accomplish the same thing by following the procedures in Configure Array Managers, on page 31.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the recovery site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Recovery Setup area of the Summary window, navigate to the Recovery Plans line and click Repair Array Managers.
4 On the Recovery Site Array Managers page, click the Add, Remove, or Edit button to change the array manager information for the recovery site.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Protection Setup area of the Summary window, navigate to the Inventory Mappings line and click Configure. The Inventory Mappings page displays a tree of resources at the protected site and a corresponding tree of resources at the recovery site. For any protected site resource that does not have an inventory mapping, the corresponding item in the recovery site tree is listed as None Selected.
4 To configure mapping for a resource, right-click it in the Protected Site Resources column and click Configure.
5 Expand the top-level folder in the Configure Inventory Mapping window and navigate to the recovery site resource (network, folder, or resource pool) to which you want to map the protected site resource.
6 Select the resource and click OK. The selected resource is displayed in the Recovery Site Resources column, and its path relative to the root of the recovery site vCenter is displayed in the Recovery Site Path column.
7 To undo an inventory mapping, right-click it and click Remove.
What to do next Create one or more protection groups. Inventory mappings are applied whenever a new protection group is created. New or changed mappings must be manually applied to existing protection groups.
This procedure applies existing inventory mappings to all virtual machines that have a status of Not Configured. What to do next After this process completes, virtual machines that could not be configured have a status of Mapping Missing or Mapping Invalid. You must configure protection for these machines individually.
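The effect of applying inventory mappings can be modeled with a small Python sketch. The data shapes, the function name, and the Configured status label are hypothetical. The sketch also assumes that Mapping Invalid means a mapping points to a recovery site resource that no longer exists, which this guide does not state explicitly.

```python
def apply_inventory_mappings(vms, mappings, recovery_inventory):
    """Return the resulting status for each virtual machine.

    vms: dict of VM name -> {"status": str, "resources": [protected-site names]}
    mappings: dict of protected-site resource -> recovery-site resource
    recovery_inventory: set of resources that exist at the recovery site
    """
    statuses = {}
    for name, vm in vms.items():
        if vm["status"] != "Not Configured":
            statuses[name] = vm["status"]  # only Not Configured VMs are touched
            continue
        targets = [mappings.get(resource) for resource in vm["resources"]]
        if any(target is None for target in targets):
            statuses[name] = "Mapping Missing"
        elif any(target not in recovery_inventory for target in targets):
            # ASSUMPTION: an invalid mapping points at a missing resource.
            statuses[name] = "Mapping Invalid"
        else:
            statuses[name] = "Configured"  # hypothetical label for success
    return statuses
```

Virtual machines that come back as Mapping Missing or Mapping Invalid in this model correspond to the machines that you must configure individually.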
4 On the Name and Description page of the Create Protection Group wizard, type a name and optional description for the protection group, and click Next.
5 On the Select a Datastore Group page, select a datastore group from the list, and click Next. The datastores listed were discovered when you configured the array managers. Each datastore in the list is replicated to the recovery site and supports at least one virtual machine at the protected site. When you select a datastore, the virtual machines that it supports are listed in the VMs on the selected datastore group field, and are automatically included in the protection group.
6 On the Datastore for Placeholder VMs page, select a datastore group from the list. The datastores listed on this page exist only at the recovery site. None of them are replicated from the protected site. The datastore that you select is used to hold the files that constitute the placeholder virtual machines. These files are not large, so any datastore that is accessible to the recovery site host and cluster can be an appropriate choice.
SRM creates a protection group that includes all of the virtual machines on the datastore you selected in Step 5. Placeholders are created and inventory mappings applied for each member of the group. If a group member cannot be mapped to a folder, network, and resource pool on the recovery site, it is listed with a status of Mapping Missing, and a placeholder cannot be created for it.
To add a new virtual machine or template to a protection group, create it on the protected datastore and then configure protection for it. To add an existing virtual machine to a protection group, use Storage VMotion to move it to the protected datastore and then configure protection for it. To remove a virtual machine or template from a protection group, remove it from the protected datastore.
NOTE When you add a virtual machine or template to a protected datastore, it has an initial status of Not Configured in the protection group. You must configure protection for the new group member by applying inventory mappings if they exist, or by configuring resource mappings for it individually.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the recovery site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Recovery Setup area of the Summary window, navigate to the Recovery Plans line and click Create.
4 On the Recovery Plan Information page of the Create Recovery Plan wizard, type a name for the plan in the Name text box and add an optional description, and then click Next.
5 On the Protection Groups page, select one or more protection groups for the plan to recover, and click Next.
6 On the Response Times page, specify how long you want the recovery plan to wait for a response from a virtual machine after various recovery plan events, and then click Next.
Change Network Settings    If the virtual machine does not acquire the expected IP address within the specified interval after a recovery step that changes network settings, an error is reported and the recovery plan proceeds to the next virtual machine.
Wait for OS Heartbeat      If the virtual machine does not report an OS heartbeat within the specified interval after being powered on, an error is reported and the recovery plan proceeds to the next virtual machine.
NOTE Responses cannot be detected on virtual machines that do not have VMware Tools installed. 7 On the Configure Test Networks page, select a recovery site network to which recovered virtual machines connect during recovery plan tests, and then click Next. By default, the test network is specified as Auto, which creates an isolated test network. If you would prefer to specify an existing recovery site network as the test network, click Auto and select the network from the drop-down menu. 8 On the Suspend Local Virtual Machines page, select the virtual machines at the recovery site that the recovery plan should suspend. Suspending local virtual machines frees resources for use by recovered virtual machines. The virtual machines are suspended during a test recovery as well as during an actual recovery. After a test recovery, they are powered on again. 9 Click Finish to create the recovery plan.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the recovery site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Recovery Setup area of the Summary window, navigate to the Recovery Plans line, right-click the plan that you want to edit, and select Edit Recovery Plan.
What to do next After you have opened the plan for editing, you can change any of its properties. For more information, see Create a Recovery Plan, on page 37.
After you have configured SRM at the protected and recovery sites, you can test your recovery plan without affecting services at either site. You can also run a recovery plan and, if necessary, configure the two sites for failback so that you can restore services at the protected site. SRM makes it easy to test a recovery plan. The test does not disrupt replication or any ongoing activities at the protected site. Recovery plans that suspend local virtual machines do so for tests as well as for actual recoveries. With this exception, recovery plan tests do not disrupt activities at either site. NOTE Permission to test a recovery plan does not include permission to run a recovery plan. Permission to run a recovery plan does not include permission to test a recovery plan. Each permission must be assigned separately. This chapter includes the following topics:
■ Test a Recovery Plan, on page 41
■ Run a Recovery Plan, on page 42
■ Configuring and Executing Failback, on page 43
Click the Recovery Steps tab to monitor the progress of the test and respond to messages. The Recovery Steps tab displays the progress of individual steps. The Recent Tasks area reports the progress of the overall plan. NOTE If the SRM server loses contact with the recovery site vCenter while a recovery plan is being tested or run, the recovery plan fails and displays the message Error: The session is not authenticated. If this happens during a test, cancel the test. If this happens during a recovery, manual cleanup will probably be required after the plan completes.
SRM powers down and unregisters the protected virtual machines and then registers the placeholders again.
Steps that cannot be stopped, such as powering on or waiting for a heartbeat, run to completion before the pause or cancellation completes. Steps that add or remove storage devices are undone by cleanup operations if you cancel or by subsequent steps if you pause and resume.
The time it takes to pause or cancel a test depends on the type and number of steps that are currently in progress. The time it takes to resume a test depends on the type and number of steps that were in progress when the pause was requested. To pause, resume, or cancel a test, click the Pause, Resume, or Stop button on the recovery plan toolbar.
5 Review the information in the confirmation prompt, and when you are ready to proceed, select I understand that this process cannot be undone and click Run Recovery Plan.
6 To monitor the progress of the recovery and respond to messages, click the Recovery Steps tab. The Recovery Steps tab displays the progress of individual steps. The Recent Tasks area reports the progress of the overall plan.
NOTE If the SRM server loses contact with the recovery site vCenter while a recovery plan is being tested or run, the recovery plan fails and displays the message Error: The session is not authenticated. If this happens during a test, cancel the test. If this happens during a recovery, manual cleanup will probably be required after the plan completes.
Array replication from the protected site to the recovery site has stopped. Devices at the recovery site are not configured as replication sources or targets. If the protected site is still operational, all protected virtual machines affected by the failover have been powered down. At the recovery site, all placeholder virtual machines have been replaced by powered-on virtual machines in the recovery site's vCenter inventory.
The virtual machines and the services that they provide are now accessible at the recovery site, but the recovery site itself is no longer protected. To protect the site, you must reconfigure SRM to designate a new recovery site and create the protection groups and recovery plans that are needed to facilitate recovery.
If you intend to restore virtual machines and services to the original protected site, you must first configure it to be a recovery site. You then run a failback recovery plan that migrates the protected inventory from the original recovery site back to the original protected site. You can then reconfigure the two sites to resume their original roles.
If you cannot, or do not want to, restore the original protected site to its former status, you can establish a new recovery site. To do so, create the protection groups and recovery plans needed to protect the original recovery site, and then promote the old recovery site to a protected site.
Procedure
1 Review and Execute Post-Failover Cleanup Tasks on page 44
Before you can execute a failback, you must remove artifacts such as invalid protection groups and unneeded placeholders that are left over from the previous configuration.
2 Reconfigure Replication on page 44
Failover stops replication. Failback requires you to configure replication in reverse, from the recovery site to the protected site. Restoring the protected and recovery sites to their original roles requires you to configure replication from the protected site to the recovery site, as it was before the original failover was executed.
3 Reconfigure SRM to Enable Failback to the Protected Site on page 45
Before you can run a failback, you must create the protection groups and recovery plans required to migrate protected inventory from the recovery site back to the protected site.
4 Restore the Original Configuration on page 45
After a failback is complete, you can restore the original configuration so that the protected and recovery sites resume the roles they had before the failover.
Clean up the recovery site.
a Open a vSphere Client and connect to the vCenter server at the recovery site. Log in as a vCenter administrator.
b Remove the placeholder virtual machines from vCenter inventory.
Reconfigure Replication
Failover stops replication. Failback requires you to configure replication in reverse, from the recovery site to the protected site. Restoring the protected and recovery sites to their original roles requires you to configure replication from the protected site to the recovery site, as it was before the original failover was executed. Reconfiguring replication is likely to require help from the team that manages vSphere storage for the two sites. The operations required are specific to the arrays that you are using. Generally, you must take the following steps:
■ To prepare for a failback, configure the arrays so that the source devices are the ones located at the recovery site and the target devices are the ones located at the protected site.
■ After the failback is complete and you are ready to have the protected site and recovery site resume their original roles, configure the arrays so that the source devices are the ones located at the protected site and the target devices are the ones located at the recovery site.
After you have configured replication as needed, force an immediate, one-time replication from the source to the target. This step is always required during a failback, but might not be needed when you are reconfiguring the protected and recovery sites to resume their original roles.
Remove the recovered virtual machines from vCenter inventory and delete them from storage at the recovery site. Remove the protection group and recovery plan that you created in Reconfigure SRM to Enable Failback to the Protected Site, on page 45. Remove the placeholder virtual machines created at the protected site by the failback.
Reconfigure array replication to use the protected site devices as the source and the recovery site devices as the targets. See Reconfigure Replication, on page 44.
3 Configure the array managers (see Configure Array Managers, on page 31).
4 Configure the inventory mappings (see Configure Inventory Mappings, on page 33).
5 Create the protection groups (see Create Protection Groups, on page 35).
6 Create the recovery plans (see Create a Recovery Plan, on page 37).
7 Test the recovery plan (see Test a Recovery Plan, on page 41).
In its default configuration, SRM enables a number of simple recovery scenarios. Advanced users can customize SRM to support a broader range of site recovery requirements. The default protection and recovery capabilities of SRM can be appropriate for sites that have simple configurations or recovery objectives. Sites that have more complex requirements, such as many virtual machines, a variety of guest operating systems, and application-specific networking requirements, typically need to customize the recovery plans and modify the settings. This chapter includes the following topics:
■ Assign Roles and Permissions, on page 47
■ Customizing a Recovery Plan, on page 48
■ Configure Protection for a Virtual Machine or Template, on page 57
■ Configure SRM Alarms, on page 59
■ Working with Advanced Settings, on page 59
■ Avoiding Replication of Paging Files and Other Transient Data, on page 62
5 To apply the selected role to all child objects of the selected inventory object, select Propagate to Child Objects.
6 To select the user or group for the role, click the Add button.
7 Identify the user or group.
a From the Domain drop-down menu, select the domain where the user or group is located.
b Either enter a name in the Search text box or select a name from the Name list.
c Click Add and then click OK when finished.
The list of permissions references all users and groups who have roles assigned to the object and where in the hierarchy those roles are assigned. What to do next Repeat the procedure to assign roles and permissions to users at the recovery site.
Recovery Order
When a recovery plan runs, virtual machines in the high-priority group are recovered first, followed by the normal-priority group, the low-priority group, and the no-power-on group. Before a priority group is started, all machines in the next-higher priority group must have recovered or failed to recover. Within a group, virtual machines are always recovered in the order specified by the recovery plan. High-priority virtual machines are recovered serially. Recovery of a machine in this group does not begin until its predecessor in the list has either been recovered (powered on and connected to the network) or has failed to recover within a specified period.
Virtual machines in all other priority groups are recovered serially per ESX host to enable a group of machines that spans several hosts to recover in parallel. During this type of recovery, machines on a specific ESX host are recovered in the order specified by the list, but the recovery order of the entire list is subject to the assignment of virtual machines to hosts. For example, if the first three normal-priority virtual machines are hosted on one ESX host and the fourth is hosted on a different ESX host, the fourth machine in the list might be recovered before the second or third. Because vCenter limits the number of virtual machines that can be powered on in a single request, recovery plans cannot power on more than 20 virtual machines at a time even if more than 20 ESX hosts are available.
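The per-host ordering described above can be sketched in Python. This is a simplified model, not SRM code: the function names are hypothetical, and the model assumes that every virtual machine in one batch finishes before the next batch starts, which a real recovery does not require.

```python
MAX_CONCURRENT_POWER_ON = 20  # limit stated in the text above

def per_host_queues(ordered_vms):
    """ordered_vms: list of (vm_name, esx_host) in recovery-plan order."""
    queues = {}
    for vm, host in ordered_vms:
        queues.setdefault(host, []).append(vm)  # keeps plan order per host
    return queues

def recovery_waves(ordered_vms, limit=MAX_CONCURRENT_POWER_ON):
    """Return batches of VMs that could be recovering at the same time."""
    queues = per_host_queues(ordered_vms)
    waves = []
    while any(queues.values()):
        wave = []
        for queue in queues.values():
            # One VM per host at a time, and never more than the limit.
            if queue and len(wave) < limit:
                wave.append(queue.pop(0))
        waves.append(wave)
    return waves
```

With three machines on one host and a fourth on another, the first batch contains the first and fourth machines, which matches the example in the text: the fourth machine can be recovered before the second or third.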
3 4
At this point, the recovery is complete. If the recovery was run as a test, the plan pauses and prompts you to verify that the test was successful.
3 Power on the virtual machine and verify that VMware Tools reports an OS heartbeat within the specified period.
4 Run any post-power-on command or message steps.
NOTE Post-power-on command steps provide an application-specific way to verify that a recovered virtual machine has all the capabilities that you expect. For example, after powering on a recovered database server, you could execute a simple database query from a script and declare the recovery complete (by having the script exit with status of 0) only if the script receives the expected response. In the absence of such additional steps, the virtual machine is considered to have been recovered if it powers on and connects to the network.
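A verification script of the kind described in the note might look like the following Python sketch. It is illustrative only: the function name verify_database is hypothetical, and an in-memory SQLite database stands in for the recovered database server so that the example is self-contained. A real script would connect to your own database with its own driver.

```python
import sqlite3
import sys

def verify_database(connect, query, expected):
    """Return 0 if the first row of the query equals expected, otherwise 1."""
    try:
        conn = connect()
        try:
            row = conn.execute(query).fetchone()
        finally:
            conn.close()
    except Exception as exc:
        # Any failure to connect or query means the service is not ready.
        print(f"verification failed: {exc}", file=sys.stderr)
        return 1
    return 0 if row == expected else 1

# A script run as a command step would end with a call such as:
#   sys.exit(verify_database(lambda: sqlite3.connect(":memory:"), "SELECT 1", (1,)))
```

The script reports success only through its exit status, so the command step succeeds when the query returns the expected row and fails in every other case.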
You must start the Windows command shell using its full path on the local host. For example, to run a script located in c:\alarmscript.bat, use the following command line:
c:\windows\system32\cmd.exe /c c:\alarmscript.bat
■ Batch files and commands must be installed locally on the SRM server host at the recovery site.
■ Batch files and commands must complete within 300 seconds. Otherwise, the recovery plan terminates with an error. To change this limit, see Change Recovery Site Settings, on page 60.
■ Batch files or commands that produce output that contains characters with ASCII values greater than 127 must use UTF-8 encoding.
■ Only the final 4KB of script output is captured in log files and recovery history. Scripts that produce more output can redirect the output to a file rather than sending it to the standard output to be logged.
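Before you add a command to a recovery plan, you might want to check it against these limits on a test machine. The following Python sketch is a hypothetical pre-flight helper, not an SRM tool. It runs a command with a timeout, keeps only the last 4KB of combined output, and reports whether the full output is valid UTF-8. The names run_with_limits and is_utf8 are assumptions made for this example.

```python
import subprocess
import sys

TIME_LIMIT_SECONDS = 300   # default limit stated in the guidelines above
CAPTURED_BYTES = 4 * 1024  # only the final 4KB of output is kept

def is_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def run_with_limits(argv, timeout=TIME_LIMIT_SECONDS, keep=CAPTURED_BYTES):
    """Run argv; return (exit_code, output_tail, utf8_ok).

    exit_code is None if the command did not finish within the timeout.
    """
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, b"", True
    return done.returncode, done.stdout[-keep:], is_utf8(done.stdout)
```

For a batch file you would pass the same argument list that the command step uses, for example the full path to cmd.exe followed by /c and the script path.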
Execution Environment for Command Steps Command steps run with the identity of the LocalSystem account on the SRM server host at the recovery site. When a command step runs, a number of environment variables are available for it to use. Table 5-1 lists the environment variables that are available to all command steps.
The environment variables listed in Table 5-2 are also set if the command step is executing on a recovered virtual machine.
Table 5-2. Environment Variables Available to Command Steps Running on Recovered Virtual Machines
Name                 Value                                                           Example
VMware_VM_Uuid       UUID used by vCenter to uniquely identify this virtual machine  "4212145a-eeae-a02c-e525-ebba70b0d4f3"
VMware_VM_Name       Name of this virtual machine, as set at the protected site      "My New Virtual Machine"
VMware_VM_Ref        Managed object ID of the virtual machine                        "vm-1199"
VMware_VM_GuestName  Name of the guest OS as defined by the VIM API                  "otherGuest"
VMware_VM_GuestIp    IP address of the virtual machine, if known                     "192.168.0.103"
Vmware_VM_Path       Path to this virtual machine in recovery site inventory         "[datastore-123] jquser-vm2/jquservm2.vmdk"
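A command step can read these variables from its process environment. The following Python sketch shows one way to collect them. It assumes that a Python interpreter is available on the SRM server host, which this guide does not require, and the function name read_vm_context is hypothetical. The variable names are copied exactly as they appear in Table 5-2.

```python
import os

# Names as listed in Table 5-2, including the spelling of the last entry.
VM_VARIABLES = (
    "VMware_VM_Uuid",
    "VMware_VM_Name",
    "VMware_VM_Ref",
    "VMware_VM_GuestName",
    "VMware_VM_GuestIp",
    "Vmware_VM_Path",
)

def read_vm_context(environ=None):
    """Collect the Table 5-2 variables that are set; missing ones are skipped."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in VM_VARIABLES if name in environ}
```

An empty result indicates that the step is not running in the context of a recovered virtual machine, so a script can use it to decide whether per-machine checks apply.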
The message is added to the recovery plan as a new step, and subsequent steps are renumbered. When you test or run the recovery plan, the plan pauses, displays the message, and waits for acknowledgment when it reaches this step.
The command is added to the recovery plan as a new step, and subsequent steps are renumbered. When you test or run the recovery plan, the plan executes the command line on the SRM server host at the recovery site when it reaches this step.
Message and command steps added to the recovery steps for a virtual machine operate like message and command steps added to a recovery plan. For more information, see Guidelines for Writing Command Steps, on page 50.
The customizations you specify are saved as properties of the placeholder virtual machine and then applied to the recovered virtual machine when a recovery plan is run or tested. NOTE If you remove the protection of a virtual machine, all recovery customizations are lost.
To restrict the list of networks to just the ones required by a specific recovery plan, include the -plan option on the command line, as shown in this example:
dr-ip-reporter.exe -cfg ..\config\vmware-dr.xml -out c:\tmp\report.xml -plan Plan-B
NOTE The command normally asks you to verify the thumbprints presented by the certificates at each site. You can suppress the verification request by including the -I option.
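When the report is generated from a script, the command line can be assembled as an argument list. The following Python sketch uses only the options shown above (-cfg, -out, -plan, and -I); the helper name reporter_argv is hypothetical, and the sketch assumes that the resulting list is passed to a process launcher such as subprocess.run from the directory that contains dr-ip-reporter.exe.

```python
def reporter_argv(cfg, out, plan=None, skip_thumbprint_check=False,
                  exe="dr-ip-reporter.exe"):
    """Build the argument list for a dr-ip-reporter.exe invocation."""
    argv = [exe, "-cfg", cfg, "-out", out]
    if plan is not None:
        # Restrict the report to the networks required by one recovery plan.
        argv += ["-plan", plan]
    if skip_thumbprint_check:
        # Suppress the certificate thumbprint verification request.
        argv.append("-I")
    return argv
```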
In an SRM recovery plan that defines three placeholder virtual machines, the generated file might look like this:
VM ID,VM Name,Adapter ID,MAC Address,DNS Domain,Net BIOS,Primary WINS,Secondary WINS,IP Address,Subnet Mask,Gateway(s),DNS Server(s),DNS Suffix(es)
shdw1,srm1,0,,,,,,,,,,
shdw2,srm2,0,,,,,,,,,,
shdw3,srm3,0,,,,,,,,,,
The file consists of a header row that defines the meaning of each column, and a single row for each placeholder virtual machine found in the recovery plan. The only columns populated with values are:
■ VM ID (the ID for the placeholder virtual machine)
■ VM Name (the name of the placeholder virtual machine)
■ Adapter ID (always 0, which designates global IP settings, not specific to any adapter)
All the other columns are empty.
4 Edit the generated file to customize IP properties for the virtual machines in the recovery plan.
This example shows the result of opening the output of dr-ip-customizer with a spreadsheet program and creating additional rows that define network settings for placeholder virtual machines in the recovery plan.
Table 5-3. IP Customization Spreadsheet
VM ID  VM Name  Adapter ID  Settings
shdw1  srm1     0           DNS Server(s): 10.10.10.1; DNS Suffix(es): example.com
shdw1           1           MAC Address: 00:1f:3a:38:29:9c; DNS Domain: example.com; IP Address: dhcp
shdw2  srm2     0
shdw2           1           MAC Address: 00:1c:23:3d:b9:e3; DNS Domain: example.com; Primary WINS: 10.10.10.10; IP Address: 10.13.99.4; Subnet Mask: 255.255.0.0; Gateway(s): 10.10.10.100; DNS Server(s): 10.10.10.1
shdw2           1           DNS Server(s): 10.10.10.2
shdw3           0
shdw3
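Rows like these can also be produced by a script instead of a spreadsheet program. The following Python sketch builds additional adapter rows in the column layout of the generated file. The column names are copied from the header row of the generated file; the helper names adapter_row and to_csv are hypothetical, and the sketch is not part of the dr-ip-customizer utility.

```python
import csv
import io

# Column names copied from the header row of the generated file.
HEADER = ["VM ID", "VM Name", "Adapter ID", "MAC Address", "DNS Domain",
          "Net BIOS", "Primary WINS", "Secondary WINS", "IP Address",
          "Subnet Mask", "Gateway(s)", "DNS Server(s)", "DNS Suffix(es)"]


def adapter_row(vm_id, adapter_id, values):
    """Return one CSV row (as a dict) for the given VM ID and adapter ID."""
    row = dict.fromkeys(HEADER, "")
    row.update({"VM ID": vm_id, "Adapter ID": str(adapter_id)})
    for column, value in values.items():
        if column not in row:
            raise KeyError(column)
        if "," in value:
            raise ValueError("commas are not allowed in any field")
        row[column] = value
    return row


def to_csv(rows):
    """Serialize rows, header first, in the layout of the generated file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADER, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, adapter_row("shdw1", 1, {"IP Address": "dhcp"}) produces a row that configures adapter 1 of shdw1 as a DHCP client.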
The following rules apply when you modify a CSV file created by the dr-ip-customizer utility.
■ Commas are not allowed in any field.
■ The VM Name field is intended as a reference for the user customizing the file. It is populated when the CSV file is created but ignored when the modifications are applied to the recovery plan. It cannot be used to rename a virtual machine.
■ The only fields that you can modify for a row where Adapter ID is 0 are DNS Server(s) and DNS Suffix(es). These values, if specified, are inherited by all other adapters for that VM ID.
■ To define properties for a specific adapter on a placeholder virtual machine, create a new row that contains that virtual machine's ID in the VM ID column and the adapter ID (the virtual PCI slot in which the adapter is installed on the placeholder virtual machine) in the Adapter ID column, then specify values for the other columns.
■ To specify more than one value for a column, create an additional row for that adapter and include the value in the column in that row. In Table 5-3, additional rows define a secondary DNS server for the placeholder virtual machines shdw2 and shdw3.
■ To customize a placeholder virtual machine as a DHCP client, enter dhcp in the IP Address field, as shown in the second row of Table 5-3.
■ For any non-zero adapter ID that is not a DHCP client, you must specify values for IP Address, Subnet Mask, Gateway(s), and DNS Server(s) unless global values for these properties exist (in the row for Adapter ID zero for that VM ID). Global values, if specified, are overridden by values you specify for each non-zero adapter ID.
■ The NetBIOS column, if not left empty, must contain one of the following strings: disableNetBIOS, enableNetBIOS, or enableNetBIOSViaDhcp.
■ If you are customizing multiple adapters for a virtual machine and want to be sure that the customizations in a specific row apply to a specific adapter, specify the adapter's MAC address as pairs of hexadecimal digits separated by the colon character. Character case is not considered.
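The rules above can be checked before the file is applied. The following Python sketch validates one row at a time. It is an illustration only and is not part of the dr-ip-customizer utility: the helper name check_row is hypothetical, the NetBIOS column is looked up under the header name Net BIOS used in the generated file, and the sketch assumes that a MAC address has six pairs of digits and that dhcp and the NetBIOS strings are matched exactly as written.

```python
import re

NETBIOS_VALUES = {"disableNetBIOS", "enableNetBIOS", "enableNetBIOSViaDhcp"}
ZERO_ROW_COLUMNS = {"VM ID", "VM Name", "Adapter ID",
                    "DNS Server(s)", "DNS Suffix(es)"}
REQUIRED_COLUMNS = ("IP Address", "Subnet Mask", "Gateway(s)", "DNS Server(s)")
# Assumption: six pairs of hexadecimal digits, in any letter case.
MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def check_row(row, global_row=None):
    """Return a list of rule violations for one CSV row given as a dict.

    global_row is the row for Adapter ID 0 of the same VM ID, if any.
    """
    global_row = global_row or {}
    problems = [f"{column}: commas are not allowed"
                for column, value in row.items() if "," in value]
    if row.get("Adapter ID") == "0":
        problems += [f"{column}: not editable when Adapter ID is 0"
                     for column, value in row.items()
                     if value and column not in ZERO_ROW_COLUMNS]
        return problems
    netbios = row.get("Net BIOS", "")
    if netbios and netbios not in NETBIOS_VALUES:
        problems.append("Net BIOS: unrecognized value")
    mac = row.get("MAC Address", "")
    if mac and not MAC_PATTERN.fullmatch(mac):
        problems.append("MAC Address: expected colon-separated hexadecimal pairs")
    if row.get("IP Address") != "dhcp":
        # Static configuration: each value must be set here or globally.
        problems += [f"{column}: value required"
                     for column in REQUIRED_COLUMNS
                     if not row.get(column) and not global_row.get(column)]
    return problems
```

An empty list means that the row satisfies the rules that the sketch checks. It does not guarantee that the utility accepts the file.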
Run dr-ip-customizer.exe to apply the customized IP properties. Change directory to C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin and run the following command.
dr-ip-customizer.exe -cfg ..\config\vmware-dr.xml -csv c:\tmp\example.csv -cmd command
You can include a -verbose option on any dr-ip-customizer.exe command line to log additional diagnostic messages.
The specified customizations are applied to all of the virtual machines named in the CSV file during a recovery. (You do not need to select a customization specification for these machines when you edit their properties in a recovery plan.)
In the Edit Virtual Machine Properties window, review and configure properties as needed.
a In the resource list, click Folder to review the recovery site folder to which this virtual machine is assigned. If inventory mappings have not been established for this site, you can edit this property.
b Click Next to review the recovery site host to which the virtual machine is assigned. If inventory mappings have not been established for this site, you can edit this property.
c Click Next to review the recovery site resource pool to which this virtual machine is assigned. If inventory mappings have not been established for this site, you can edit this property.
d Click Next to review the recovery site networks to which this virtual machine is assigned. If inventory mappings have not been established for this site, you can edit this property.
e Click Next to review the list of storage devices attached to the virtual machine and verify that they are all in the same datastore group or have appropriate storage on a nonreplicated datastore at the recovery site. If any device has a Recovery Location that has a status of Not Configured, click Browse to find an appropriate datastore at the recovery site, or click Detach to detach the device during recovery.
f Click Next to review the datastore that you originally selected for the placeholder virtual machines in the protection group. If inventory mappings have not been established for this site, you can edit this property.
g Select a customization specification. Click Browse to see a list of customization specifications available from the vCenter at the recovery site. You can also enter a description of the specification you apply. Only the IP properties from the selected specification are applied. All other properties in the specification are ignored. If you have used the dr-ip-customizer.exe command to customize virtual machines in the recovery plan, you do not need to specify that customization here.
h Click Next to select a recovery priority group for the virtual machine. See Specify Virtual Machine Recovery Priority, on page 52.
i Click Next to add a message or command step that executes before the machine is powered on.
j Click Next to add a message or command step that executes after the machine is powered on.
Click Finish to apply the new configuration to the selected virtual machine.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Site Recovery tree view, expand the Protection Groups item. Protection groups that include virtual machines that need repair are highlighted with a warning icon.
4 Open the protection group and click the Virtual Machines tab. Each virtual machine that needs repair is listed with a status of Needs Repair.
5 Click Repair All to repair the virtual machines that have a status of Needs Repair.
The SRM server at the recovery site contacts the vCenter Server at the recovery site, retrieves protection configurations for the affected virtual machines, and applies those configurations, restoring the status of the machines to OK.
Procedure
1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 Right-click Site Recovery in the vSphere Client navigation pane and click Advanced Settings.
4 In the navigation pane of the Advanced Settings window, click a setting category.
5 In the category window, make your changes.
6 Click OK to save your changes and close the Advanced Settings window.
7 Repeat the procedure as needed at the recovery site.
Command line timeout
By default, SRM allows 300 seconds for a command step to complete. If a command step takes longer than 300 seconds, the step terminates and the recovery plan fails with an error.
Power state change timeout
By default, SRM allows 120 seconds for a virtual machine at the protected site to respond to a power-down request when testing or running a recovery plan. If the request does not complete in this interval, the plan skips to the next virtual machine in the list (or to the next step) and reports a recovery plan error.
Procedure
1 Right-click Site Recovery in the vSphere Client navigation pane and click Advanced Settings.
2 In the navigation pane of the Advanced Settings window, click Recovery.
3 Modify recovery site settings as needed.
■ To change the command-line timeout, enter a new value in the Recovery.calloutCommandLineTimeout field. The new value applies to all command steps.
■ To change the power state change timeout, enter a new value in the Recovery.powerStateChangeTimeout field. The new time-out value applies to all power state changes to virtual machines at the protected site.
4 Click OK to save your changes and close the Advanced Settings window.
Procedure
1 Right-click Site Recovery in the vSphere Client navigation pane and click Advanced Settings.
2 In the navigation pane of the Advanced Settings window, click SanProvider.
3 Modify the SAN provider settings as needed.
■ To change the length of time that SRM waits for a command issued by the SRA to complete, enter a new value in the SanProvider.calloutCommandTimeout text box.
■ To force removal, upon successful completion of a test recovery, of the snap-xx prefix applied to recovered datastore names, select the SanProvider.fixRecoveredDatastores checkbox.
■ To change the interval that SRM waits for a host to reconnect during a host bus adapter (HBA) rescan, enter a new value in the SanProvider.hostReconnectTimeoutSec text box.
■ To change the number of HBA rescans that SRM executes when you test or run a recovery plan, enter a new value in the SanProvider.hostRescanRepeatCount text box.
■ To change the interval that SRM waits for each HBA rescan to complete, enter a new value in the SanProvider.hostRescanTimeoutSec text box.
■ To change the interval between datastore group computations, enter a new value in the SanProvider.minLunGroupComputationInterval text box.
4 Click OK to save your changes and close the Advanced Settings window.
■ To change the interval at which SRM checks the CPU usage, disk space, and free memory at the local site, enter a new value in the localSiteStatus.checkInterval field.
■ To change the interval that SRM waits between raising alarms about CPU usage, disk space, and free memory at the local site, enter a new value in the localSiteStatus.eventFrequency field.
■ To change the percentage of CPU usage that causes SRM to raise a high CPU usage event, enter a new value in the localSiteStatus.maxCpuUsage field.
■ To change the percentage of free disk space that causes SRM to raise a low disk space event, enter a new value in the localSiteStatus.minDiskSpace field.
■ To change the amount of free memory that causes SRM to raise a low memory event, enter a new value in the localSiteStatus.minMemory field.
Click OK to save your changes and close the Advanced Settings window.
■ To change the interval at which SRM checks to see whether the SRM server at the remote site is available, enter a new value in the remoteSiteStatus.checkInterval field.
■ To change the number of failed remote site status checks required to trigger a Remote Site Down alarm, enter a new value in the remoteSiteStatus.panicDelay field.
■ To change the interval between Remote Site Down alarms, enter a new value in the remoteSiteStatus.panicRepeatDelay field.
■ To change the number of remote site status checks to try before declaring the check a failure, enter a new value in the remoteSiteStatus.warningDelay field.
Click OK to save your changes and close the Advanced Settings window.
Procedure
1 In the vSphere Client, right-click an ESX cluster and click Edit Settings.
2 In the Settings window for the cluster, click Swapfile Location and select Store the swapfile in the datastore specified by the host, then click OK.
3 For each host in the cluster, select a nonreplicated datastore.
a Click the Configuration tab.
b On the Swapfile Location line, click Edit.
c In the Virtual Machine Swapfile Location window, select a nonreplicated datastore and click OK.
At the recovery site, use the vmkfstools command to create a clone of the copied disk. Create one clone for every placeholder virtual machine, but do not attach any clones to a virtual machine. The clones are assigned as part of the protection configuration process and are attached during recovery.
At the protected site, configure each protected virtual machine.
a Use the vmkfstools command to clone the disk. Create the clone on a nonreplicated datastore at the protected site, and then copy it to a nonreplicated datastore at the recovery site with the original .vmdk file.
b Connect the cloned disk to the virtual machine, and then power on the virtual machine and assign a drive to the cloned disk.
c Configure the virtual machine to create its paging file on the cloned disk.
d Power off and then power on the virtual machine so that it writes its paging file to the new location on the cloned disk. At this point, the protected virtual machine is writing its paging file to a disk on a nonreplicated datastore at the protected site. Until you specify a recovery site location for this disk, the virtual machine does not have a valid protection configuration.
e Assign recovery site storage for the paging file disk to one of the clones that you copied from the protected site. See Configure Protection for a Virtual Machine or Template, on page 57. Initially, the paging file disk has a Recovery Location that is Not Configured. Click Browse, and then browse to the cloned vmdk file at the recovery site.
After you have configured the virtual machine to use the nonreplicated disk at the recovery site, SRM considers the virtual machine's storage properly configured and returns it to the protection group.
What to do next
After the changes at the protected site are replicated to the recovery site, you can test the recovery plan to verify that the recovered virtual machines are using the nonreplicated paging file. After you reconfigure a virtual machine to use a paging file disk, you can delete the old, unused paging file from its system disk.
Troubleshooting SRM
If you have problems with storage replication, site pairing, or guest customization, you can try to troubleshoot the problem. To help identify the cause, you might need to collect SRM server or client log files to review or send to VMware Support.
Errors encountered during SRM operations are displayed in error dialogs or shown in the Recent Tasks window. Most errors also generate an entry in an SRM log file. It is important to check the recent tasks and log files for both the recovery site and the protected site. When searching for the cause of a problem, also check the VMware knowledge base at http://kb.vmware.com.
This chapter includes the following topics:
■ No Replicated Datastores Listed, on page 65
■ Inconsistent Mount Points Warning When Configuring NFS Arrays, on page 66
■ Array Script Files Not Found, on page 66
■ Expected Virtual Machine File Path Cannot Be Found, on page 66
■ Recovery Plan Time-Out During the Change Network Settings Step, on page 67
■ Collecting SRM Log Files, on page 68
3 In the Protection Setup area of the SRM Summary window, navigate to the Array Managers line and click Configure.
4 In the Configure Array Managers wizard, click Next on the Protected Site Array Managers page and then click Next on the Recovery Site Array Managers page. The Review Replicated Datastores page should now display each replicated datastore that contains at least one virtual machine.
Cause
This error usually occurs when a virtual machine has been recently created but its files have not yet been replicated to the recovery site. For instance, you have created a virtual machine at the protected site, added it to a protection group, and then tested or run a recovery plan that includes the new virtual machine. If the virtual machine files have not yet been replicated to the recovery site, the recovery plan cannot recover the virtual machine. This problem can also occur if the virtual machine files have been replicated but then moved by a recovery site administrator using Storage vMotion or a similar tool.
Solution
Make sure that all virtual machines in a protection group have been replicated to the recovery site before you test or run a recovery plan for the protection group. The virtual machine files can be missing even if the corresponding placeholder virtual machine exists. Placeholders are created by SRM, not by array replication, and do not include the files necessary to recover a virtual machine.
What to do next
The customization process creates log files on each virtual machine. On Windows, these log files are written in the C:\windows\temp\vmware-imc\ directory. On Linux, these log files are written in the /var/log/vmware-imc/ directory. Review the log files for more information about errors that prevented the step from completing in time.
To initiate the collection of SRM server log files from the Start menu:
a Log in to the SRM server host.
b Select Start > Programs > VMware > VMware Site Recovery Manager > Generate vCenter Site Recovery Manager log bundle.
To initiate the collection of SRM server log files from the Windows command line:
a Start a Windows command shell on the SRM server host.
b Change directory to C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin.
c Run the following command.
cscript srm-support.wsf
The individual log files are collected in a file named srm-plugin-support-MM-DD-YYYY-HH-MM.zip, where MM-DD-YYYY-HH-MM indicates the month, day, year, hour, and minute when the log files were created.
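The naming pattern can be illustrated with a short Python sketch, which might be useful when a script needs to locate the bundle for a given time. The helper name bundle_name is hypothetical, and the sketch assumes zero-padded fields and a 24-hour clock, neither of which is stated in this guide.

```python
from datetime import datetime


def bundle_name(created, prefix="srm-plugin-support"):
    """Return the bundle file name for a creation time.

    Assumes zero-padded MM-DD-YYYY-HH-MM fields and a 24-hour clock.
    """
    return f"{prefix}-{created:%m-%d-%Y-%H-%M}.zip"
```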
Index
A
administration, overview of 7 advanced settings dialog boxes guest customization 60 local site 61 recovery site 60 remote site 62 SAN provider 60 alarms, SRM-specific 59 array managers and storage replication adapters 31 replicated device discovery 31 to configure 31 to configure when protected site is down 32 to rescan arrays 33 authentication certificate warnings and 15 methods used by Site Recovery Manager 15
E
environment variables 50 error message datastore mounted on multiple hosts 66 expected virtual machine file path not found 66 unable to find any array script files 66
F
failback about 12 and replication 44 not supported by all arrays 43 to enable 45 to prepare protected site for 44 to restore original configuration after 45 failover, effects of 42, 43 feedback 5
C
certificate public key 15 requirements for 16 to change type 26 to update 26 certificate warning 15 customizing SRM 47
I
installation of storage replication adapter 23 reverting to a previous release 25 Site Recovery Manager Client plug-in 25 Site Recovery Manager server 21 to repair 26 updating to a new release 24 inventory mappings about 10 and placeholders 10 to apply 34 to create 33 to override 35, 57 IP address mappings to customize 54 to report 54
D
database backup requirements 24, 25 configuration details 19 Connection Count value 14 DB2 21 Max Connections value 14 Microsoft SQL Server 20 Oracle 20 Site Recovery Manager 14 to change connection details 19, 26 vCenter 13 datastore no replicated datastores found 65 protected 8 replicated 10
L
license key, to install 30 licensing about 14 license key 30 linked clones, limitations on recovery of 37
N
network, test 11
P
permissions Site Recovery Manager 17 to assign 47 placeholders in vCenter inventory 10 to repair 58 plug-in Site Recovery Manager Client 25 to install 25 ports, used by SRM 16 protected site configure array managers for 31 configuring 29 host compatibility requirements 7 to designate 29 to disconnect from 30 protection group maximum number supported 12 relationship to datastore group 10 relationship to recovery plan 10 to add or remove members 37 to create 35 to edit 36
to remove 39 to report IP address mappings used by 54 virtual machine recovery priority 48 recovery priority, virtual machine 48, 52 recovery site configure array managers for 31 configuring 29 host compatibility requirements 7 to designate 29 to disconnect from 30 recovery test, to pause or resume 42 replication and failback 44 and recovery 11 and transient data 62 array-based 8 of swapfiles 62 of Windows paging files 63 roles Site Recovery Manager 17 to assign 47
S
site protected 7 recovery 7 site pairing 29 snapshots, limitations on recovery of 37 SRA, See storage replication adapter storage replication adapter and array managers 31 to download 23 to install 23 support 5
R
recovery customize for a virtual machine 53 test 41 recovery plan command steps 50 customizing 48 running 11, 42 steps 48 testing 11, 41 time-out during 67 time-outs 48 to add commands 53 to add messages 52 to change properties of 38 to create 37 to customize steps 51
T
troubleshooting 65
V
vCenter and Site Recovery Manager 13 to change connection information 26 to change credentials used by Site Recovery Manager 26 virtual machine customize IP properties for 54 customize recovery of 53 recovery priority 48, 52