Remote Replication Technologies Architecture Overview
A White Paper
By Roselinda R. Schulman
May 2007
Executive Summary
Well-planned business continuity and disaster recovery solutions are critical to organizations operating in 24/7 environments. Minimal to zero disruption is the goal of today's enterprises when faced with planned or unplanned outages. In addition to these business needs, recent government recommendations are driving businesses to look at out-of-region replication options, to enable them to recover from larger-scale events.

This paper considers technological approaches to meeting disaster recovery needs, and emerging configuration alternatives and technologies that provide cost-effective solutions for disaster recovery and business resilience. This paper does not address traditional tape backup technology or disk-based backup options, but focuses on remote replication approaches. It familiarizes readers with the vocabulary of software copy alternatives and defines currently available remote copy techniques, including network options for replication. It reviews a variety of remote copy technologies that are widely used for in-region and out-of-region replication, offered by Hitachi Data Systems and other major storage suppliers. And it outlines cost-effective replication approaches, including two data center and three data center configurations, that meet a wide range of business needs. In each area of replication technology, this paper shows that Hitachi Data Systems provides customers with a wide range of flexibility and choice in technologies and solutions for both open systems and mainframe environments.
Contents
Introduction
Recovery Objectives: All Data Is Not Created Equal
Rolling Disasters
Network Options for Replication
    IBM ESCON-based Options
    Fibre Channel-based Options
Currently Available Remote Copy Techniques
Synchronous Remote Copy
Asynchronous Remote Copy
    Hitachi Compatible Replication Software for IBM XRC
    Hitachi TrueCopy Asynchronous Software
    Hitachi Universal Replicator for the Universal Storage Platform
    EMC SRDF Asynchronous
PiT-mediated Remote Copy
    Hitachi 3- and 4-Copy Models
    IBM PPRC Global Mirror
    Cascade Copy Techniques
Three Data Center Copy
    Multi-hop and Traditional Three Data Center Copy
    Advanced Three Data Center Copy with Universal Replicator Software
    Cascade
    Multi-target
Bandwidth Considerations
Summary
Introduction
The world has changed significantly in the past few years. Devastating terrorist acts and threats, the seemingly increased frequency of widespread power-grid disruptions, and the emergence of regulatory requirements for infrastructure protection are all placing stringent, yet necessary, data protection requirements on many organizations. Regardless of the industry, as more and more businesses operate in a 24/7 environment, especially large enterprises where global operations are the norm, they need an increasingly competitive edge to maintain profitability and stay in business. In this complex and challenging global environment, well-planned business continuity or proven disaster recovery practices for nonstop data availability have become critical to organizations if they are to survive any type of outage.

Most information technology-related disasters are actually logical disasters, such as data corruption, viruses, and human error, as opposed to physical disasters like fire, earthquakes, and hurricanes. Logical disasters occur all the time and pose a bigger threat to businesses. However, because they are less visible to the general public, logical disasters tend to be taken less seriously.

The real challenge lies in your organization's ability to think proactively and deploy best practices and technologies that can be leveraged to maximize business operations instead of adopting a reactive fix-it posture. The true test lies in the ability to prevent outages from occurring in the first place, and to minimize the effects of those incidents when they do occur. Companies today must follow the continuous business paradigm, which combines high-availability solutions with advanced disaster recovery techniques. The ultimate goal is to be able to manage both planned and unplanned situations with minimal or zero disruption. When an unplanned event does occur, the ideal scenario is:
- Recovery will happen almost automatically with no loss of data
- Costs of the solution and resources are minimal
- Impact to the production environment is zero
While technology is moving forward at a rapid pace to reach this ideal scenario, many other business and technology concerns exist, including some significant trade-offs dictated by technology, budgets, and personnel resources.
Recovery Objectives: All Data Is Not Created Equal

Two metrics drive any disaster recovery design: the recovery time objective (RTO, the time required to resume operations after an outage) and the recovery point objective (RPO, often thought of as the time between the last backup and when the outage occurred). Many solutions are available depending on your organization's recovery objectives. For example, when looking at your RPO, you may be concerned with the cost of some data loss (typically less than five minutes, depending on the replication methodology). You may prefer having the ability to quickly perform a database restart instead of a no-data-loss option. Or, you may prioritize limiting possible impacts to the production environment and ensuring easy recovery (minimizing rolling disaster and database corruption) at the secondary site.

Consider the different tiers of recovery available to you (see Figure 1). While Hitachi Data Systems focuses on the higher recovery tiers in terms of its software and hardware replication offerings, we understand that not all data is created equal when it comes to disaster recovery protection (see Figure 2). Remember that these trade-offs are business- and application-driven. A thorough business impact analysis is a good starting point to determine the best course of action for data protection and business continuity.

Figure 1. Available Tiers of Recovery
Technology Tier                     RPO Range          RTO Range        Distance
Tier 1: Tape Backup                 24-168 hours       48-168 hours     Any
Tier 2a: Virtual Tape (4)           12-48 hours        1-24 hours       Any
Tier 2b: Disk Point-in-Time Copies  4-36 hours         4-24 hours       Any
Tier 3a: Sync                       0-2 minutes        1-8 hours        Limited
Tier 3b: Sync w/Failover            0-2 minutes        5-60 minutes     Limited
Tier 4a: Async                      0-5 minutes (4)    1-8 hours        Any
Tier 4b: Async w/Failover           0-5 minutes (4)    -                Any
Tier 5: Three Data Center           0-2 minutes        -                Any
Note 1: The RPO and RTO ranges in this table are Hitachi Data Systems estimates based on customer experience in a real-world enterprise application environment, and on using the technology after a system-wide disaster.
Note 2: If the tapes or disk backups are relocated to an out-of-region recovery site.
Note 3: Best practice is one additional copy for doing disaster recovery testing without impacting the ongoing replication session.
Note 4: Network problems will extend the RPO.
Note 5: Depends on vendor and method deployed.
Many organizations still fall into Tier 1, although some of them will also use techniques such as remote logging of data. Typically, the higher the tier, the greater the cost. You can, however, achieve significant improvements in data currency and recovery time with the higher tiers.

Figure 2. Data Types and Disaster Recovery
[Figure 2 chart: the value of data and the amount of data are plotted against recovery time, from immediate to delayed. Protection options range from electronic vaulting, single disk copy, disk consolidation, and shared disk up to disk mirroring and a completely duplicated, interconnected hot site.]
Different types of data require different levels of protection. A data classification is required to assess business criticality and cost to recover.
First, we should consider your environment. As mentioned earlier, all data is not created equal. It is likely that only a portion of a corporation's data is critical to its basic operation and that a variety of techniques could be used to secure that data, depending on the criticality of that particular business function. Depending on how intertwined the data and applications are and the degree to which they are segregated, this can become a big undertaking. Many organizations choose not to take this approach, but rather choose to copy everything. This is a trade-off in terms of the cost of potentially having to re-engineer your environment compared to the cost of using a higher tier for all data.

For companies that have a fairly local site for replication and significant bandwidth available, the option to copy everything is very attractive. However, this can change as protection from both local and regional disasters becomes necessary. While evolving network technologies help lower the cost of replicating data over significant distances, the price can still be extremely high. Therefore, it may be necessary to prioritize data or use a less current copy of data (for example, a copy that is four hours old) for the second copy. However, newer approaches such as using Hitachi Universal Replicator software with the Hitachi Universal Storage Platform can help by enabling enterprises to support replication needs without over-provisioning bandwidth based on peak-load traffic.
Before we start focusing on the technology, it is important to understand some of the basic terms we use when discussing remote copy alternatives. Advances in technology have brought new words and phrases, such as real time, point in time (PiT), and snapshot, into the language of enterprise-class storage. Copy products are designed to allow an enterprise to replicate, protect, and share data in dynamic new ways. Some of the terms used in copy technology are:
- Remote copy: The mirroring of data, typically in real time, to provide an I/O-consistent remote copy of that data. The purpose of remote copy is to protect the data in the event of a business interruption at a production location.
- Point-in-time (PiT) copy: A copy of data that is taken at a specific point in time. Ideally, this copy should be I/O-consistent. PiT copies are used in many ways, including backups and checkpoints. More recently, PiT copies have been used in architected disaster recovery solutions.
- Data duplication: Software that duplicates data, as in remote copy or PiT snapshots. Data duplication differs from data migration in that with data duplication there are two copies of data at the end of the process, while with data migration there is only one.
- Data migration: Software that migrates data from one storage device to another. Data migration differs from data duplication in that at the end of the process there is only one copy of data. One purpose of data migration is to reduce operational complexity and costs for storage system upgrades or equipment refurbishment.
- Synchronous replication: Requires the application to wait for the remote site to confirm each write operation before sending the next write operation. As the replication distance increases, the time lag between synchronous write operations gets longer, and these delays become intolerable for high-volume, write-intensive transaction processing applications.
- Asynchronous replication: Allows the production application to continue, and keeps track of the pending writes until they are complete. A well-designed asynchronous replication solution maintains an I/O-consistent copy at the remote site. It may deliver a somewhat lengthened RPO and RTO, but it does protect the enterprise from major data loss in case of a regional disaster.

For further discussion, see the resources in Appendix A.

Over the last few years, many significant new technologies in both the software and hardware arenas have come to market. These technologies can reduce time-to-business resumption from days to hours and shorten the downtime required for backups to near zero. When evaluating alternative copy technologies, there are some important points to consider. One is the consistency or integrity of the copy. While replicating data may sound simple, in practice the ability to recover from that copy of the data can be extremely complex. This depends not only on the technology but also on the processes that were employed. We will discuss this in further detail as we look at available technologies with a focus on Hitachi Data Systems solutions. But first, we look at the impact of rolling disasters on disaster recovery strategies and technologies.
Rolling Disasters
Real-time copy products are designed to maintain a duplicate image of data at a remote location so that if the primary location is lost due to a disaster, processing can continue at the second site. Although the concept of replicating updates sounds simple, surviving a disaster is actually extraordinarily difficult and challenging. To address this, three basic disaster recovery requirements should be satisfied by any disaster recovery solution:
- Surviving a rolling disaster
- Preserving write sequencing
- Emergency restart capability following a disaster
Two discrete points in time define any catastrophic disaster: when the disaster first strikes (the beginning), and when the disaster finally completes (the end). Many seconds or even minutes may elapse between these two events. This period of time is the rolling disaster.

Figure 3. Rolling Disaster Window
[Figure 3 timeline: before the disaster begins, data is I/O-consistent; during the rolling disaster window, data can be unusable.]
Surviving a rolling disaster is the true test of any disaster recovery solution, because data corruption might occur within this rolling disaster window.
The real objective of any disaster recovery solution is to provide the capability to produce an image or I/O-consistent copy of data at the secondary location, as it existed at a point in time prior to the beginning of the disaster. This can be likened to the state in which data exists after a server or system crash. If update activity during the rolling disaster is also shadowed to the backup site, the backup copy may also be corrupted, as the write order cannot always be preserved during this time. We know that the image of the shadowed data is usable at any point prior to the disaster occurring, but the image may not be immediately usable if the potentially corrupted updates that occur during the rolling disaster are copied. In a rolling disaster, the data image may be corrupted due to write sequencing and write dependency. Write sequencing is the notion that the order or sequence of updates to the primary data structure must be maintained to ensure the integrity of the data (see Figure 4).
[Figure 4 flowchart: each update (to the log, then to the database) is checked for success, with error recovery invoked on any failure before the next update proceeds.]
The sequence in which a database and a log are updated allows the database management system (DBMS) to instantly recover the database, with data integrity, following any sudden outage.
That means a remote copy solution must be able to replicate the original sequence of updates; failure to do so will result in corrupted data at the backup site. Write dependency implies that there is a logical relationship among a series of updates, and if there is a particular update failure, the sequence of subsequent updates might change. The application controls this write sequence/dependency, but the application has no knowledge of the remote copy.

There are different ways to preserve write dependency, and vendors have chosen various approaches in their remote copy products. Hitachi Data Systems believes that customers should look to proven technologies. These include true synchronous remote copy products with appropriate controls, such as freeze functions. A number of asynchronous replication approaches also satisfactorily solve the write-sequencing problem, including Hitachi Universal Replicator software, Hitachi TrueCopy Asynchronous software, Hitachi Compatible Replication software for IBM XRC (formerly known as Hitachi Extended Remote Copy, or HXRC), IBM XRC, IBM GDPS, IBM PPRC Global Mirror, EMC SRDF Asynchronous, and PiT technologies that provide the ability to snap a consistent image.

For applications that span multiple volumes, including many production databases, the remote copy technology must also maintain consistency across all related volumes. This can be a challenge in a rolling disaster, in which different storage systems or communication links can cease transmitting remote copy updates at different points in time. Remote copy vendors employ a number of different approaches to define consistency groups and to maintain consistent remote copy status across all the volumes in a consistency group.
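To make the write-sequencing requirement concrete, here is a minimal sketch (our own illustration with hypothetical names, not any vendor's implementation) of why a log-then-data update pattern breaks when a replica applies writes out of order:

    # Illustrative sketch: why remote copy must preserve write order.
    # A DBMS writes the log record BEFORE the data pages it describes; if a
    # disaster freezes the replica holding a data page without its covering
    # log record, the DBMS cannot restart cleanly from that copy.

    primary_writes = [
        ("log",  1, "intend: move $100 from acct A to acct B"),
        ("data", 2, "acct A -= $100"),   # depends on write 1
        ("data", 3, "acct B += $100"),   # depends on write 1
        ("log",  4, "commit"),           # depends on writes 2 and 3
    ]

    def restartable(applied):
        # The copy is I/O-consistent only if it holds an unbroken prefix of
        # the original sequence (no later write without every earlier one).
        seqs = [seq for _, seq, _ in applied]
        return seqs == list(range(1, len(seqs) + 1))

    print(restartable(primary_writes[:2]))     # True: writes 1-2, clean rollback
    print(restartable([primary_writes[1]]))    # False: data page, no log record

Any prefix of the original sequence leaves the database restartable; any gap does not, which is exactly what a replicator that ships changed tracks without ordering can produce.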
Currently Available Remote Copy Techniques

The remote copy techniques currently available include:

- Synchronous remote copy
- IBM XRC and Hitachi Compatible Replication software for IBM XRC (z/OS environments only)
- Hitachi TrueCopy Asynchronous software
- Hitachi Universal Replicator software for the Universal Storage Platform
- EMC SRDF Asynchronous
- PiT-mediated copy (Hitachi 3- and 4-copy models, using TrueCopy and Universal Replicator software)
- IBM PPRC Global Mirror
- Cascade copy, typically EMC (SRDF/AR) and IBM, using SRDF Adaptive Copy or PPRC-XD, respectively
All of these offerings, except IBM XRC and Hitachi Compatible Replication software for IBM XRC, are available for UNIX, Microsoft Windows, and z/OS and OS/390 operating systems. Implementations differ widely based on customer requirements, and no single alternative will satisfy every customer's objectives. In many circumstances there is little choice; for example, if a customer's backup location is hundreds of kilometers away, then only asynchronous or PiT copy technologies may be practical. It is also worth noting that not all options deserve equal merit. The concepts of write dependency and write sequencing are not addressed in all remote copy techniques, notably PPRC-XD (Peer-to-Peer Remote Copy Extended Distance) and adaptive copy. Other techniques, such as multi-hop, cascade, IBM PPRC Global Mirror, and the Hitachi 3- and 4-copy PiT models, address the issue by creating consistent PiT copies on a fairly regular basis, although they all have different characteristics and cost structures.

Hitachi Data Systems has the widest range of offerings of any storage vendor, and, our competitors' protests to the contrary, this does not make things confusing for the customer. Hitachi has taken the initiative of providing both IBM-compatible and proprietary technologies to give customers the most flexibility in choosing offerings. We use the concept of building blocks for all our software to architect the right solution, based on the customer's goals and objectives, rather than trying to fit a square-peg solution into a round hole. For example, our flexibility is evident in long-distance replication. Depending on the requirements, we can offer a solution with which a customer could replicate directly to a site halfway around the world, or, for the ultimate protection, we can offer advanced three data center solutions. Other vendors can only offer cascade-style solutions, as they cannot support direct copy for write-dependent applications over distance.
Synchronous Remote Copy

[Figure 5 diagram: a primary logical volume is linked controller-to-controller to a secondary logical volume; the numbered steps below trace a synchronous write.]

1. Write to primary logical volume
2. Disconnect from path (path free)
3. Write to secondary logical volume
4. Write complete on secondary logical volume
5. I/O completion; application posted
The remote link is storage controller to controller. Remote copy activity is serverless and remote copy is at the LUN/volume level. Issues with this solution include performance, distance, and multiple controller coordination.
Performance is another consideration with synchronous copy, and we often get asked what the distance limitation is. The answer is that it depends on the performance sensitivity of your applications. While newer technologies, such as Fibre Channel and DWDM, provide improvements, you still cannot exceed the speed of light. It takes about 1 ms for light to travel 199.64 km (124 miles) in optical fiber, and both ESCON and Fibre Channel protocols require multiple round trips when used for data replication. This means that even at short distances there will be overhead when using synchronous remote copy. Having said this, many satisfied customers are using our synchronous technology very successfully.

One other benefit of using Hitachi TrueCopy software is full support of IBM GDPS (an IBM service offering) for system failover, workload balancing, and data mirroring on systems spread across two or more sites up to 40 km (25 miles) apart. While this may not be of interest to you today, it is an evolving technology that could be an option at a future time. Other vendors also support GDPS, but nearly all of the production GDPS sites around the world are currently using Hitachi storage systems. Additionally, due to the virtualization capabilities of the Universal Storage Platform, synchronous replication is possible from any externalized storage attached to the Universal Storage Platform.
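As a rough illustration of the distance penalty, the following back-of-the-envelope sketch (our own arithmetic, not a vendor formula; the round-trip count is an assumption that varies by protocol) estimates the propagation delay added to each synchronous write:

    # Propagation delay added per synchronous write (rough estimate).
    # Light in optical fiber covers roughly 200 km per millisecond.
    FIBER_KM_PER_MS = 200.0

    def added_latency_ms(distance_km, round_trips=2):
        # Each round trip covers 2x the distance; protocols such as ESCON
        # and Fibre Channel may need several round trips per replicated write.
        return (2 * distance_km / FIBER_KM_PER_MS) * round_trips

    for km in (10, 50, 100, 300):
        print(f"{km:4d} km -> +{added_latency_ms(km):.1f} ms per write")
    # 300 km adds ~6 ms to every write before any controller or queuing
    # overhead -- often intolerable for write-intensive OLTP workloads.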
Asynchronous Remote Copy

Hitachi Compatible Replication Software for IBM XRC

[Figure 6 diagram: host writes are time-stamped and moved from the primary storage system to a secondary logical volume at the recovery site by data mover software running on a host at the secondary site.]
This option provides great data integrity, long-distance capability, and time stamps on all updates. Issues with this option include the requirement of host MIPS and a server at the secondary site, and cost (software license at the secondary site).
This solution does require a secondary site server and software, but in order to recover quickly from any disaster, that is a requirement anyway. Recent enhancements have improved performance, reliability, and scalability, as well as adding unplanned outage support. EMC has recently added support for XRC, although it is believed the initial version is a very early one and may not incorporate the recent enhancements.
Hitachi TrueCopy Asynchronous Software
Figure 7. Hitachi TrueCopy Asynchronous Software: Write Sequencing and Consistency Groups
[Figure 7 diagram: sequenced write data (1 through 5) from primary volumes P-VOL 0 through P-VOL 2 in consistency group C/TG_0, and P-VOL 3 through P-VOL 5 in C/TG_1, is sorted by sequence number and applied in order to the corresponding secondary volumes S-VOL 0 through S-VOL 5.]
TrueCopy Asynchronous software uses write sequencing and consistency groups to ensure data integrity and allow users to perform operations on single applications.
This capability allows you to execute operations on single applications, for example, during disaster recovery testing. In areas where customers require long-distance replication to ensure business continuity or for highly performance-sensitive environments, Hitachi Data Systems has demonstrated clear leadership with TrueCopy Asynchronous software:
- Data integrity is guaranteed for dependent-write applications
- Excellent performance is achieved for both long- and short-distance requirements, due to its asynchronous nature
- Future enhancements will provide additional configuration flexibility
Application of changed tracks by other implementations (rather than individually time-stamped I/Os) cannot preserve the original sequence of writes, and therefore should not be used for real-time disaster recovery unless used as part of a properly architected PiT solution.
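A minimal sketch of the idea behind Figure 7 (our illustration of sequence-number ordering, not actual TrueCopy internals): the secondary applies records in sequence order within each consistency group, and holds back anything that arrives after a gap, so the S-VOLs always reflect an unbroken prefix of the primary's writes.

    # Illustrative sequence-number consistency (not actual TrueCopy code).
    from collections import defaultdict

    pending  = defaultdict(dict)        # group -> {seq: data} awaiting apply
    next_seq = defaultdict(lambda: 1)   # next sequence number per group

    def receive(group, seq, data, svol):
        pending[group][seq] = data
        # Apply the longest unbroken run; a gap means a dependent write is
        # still in flight, so later records must wait.
        while next_seq[group] in pending[group]:
            svol.append(pending[group].pop(next_seq[group]))
            next_seq[group] += 1

    svol = []
    receive("C/TG_0", 2, "data 2", svol)    # arrives early: held back
    receive("C/TG_0", 1, "data 1", svol)    # gap filled: 1 then 2 applied
    print(svol)                             # ['data 1', 'data 2']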
Hitachi Universal Replicator for the Universal Storage Platform
Universal Replicator uses disk-based journal volumes and a pull-style replication engine to move data from the primary site to the remote site.
When collecting the data to be replicated, the primary Universal Storage Platform writes the designated records to cache, and subsequently to a special set of journal volumes if cache exceeds safe limits for performance. The remote system then reads the records from either cache or the journal volumes, pulling them across the communication link.

For large IBM z/OS customers, Universal Replicator software extends the capabilities provided by TrueCopy Asynchronous software and provides the capability to have one consistency group spread across multiple controllers in z/OS environments at both the primary and secondary sites. This gives large organizations the ability to have up to 64K volumes consistent to a single point in time, while maintaining a near-synchronous RPO. This is achieved using advanced time-stamping and sequencing capabilities and a master/slave relationship at the remote site to maintain consistency (see Figure 9). Other alternatives freeze I/O at the primary site to maintain consistency, thus impacting application performance. This capability also provides a compelling alternative for current XRC customers by lowering their TCO, as this solution does not require any host software at the remote site. In addition, managing many thousands of replicated volumes is accomplished easily using Hitachi Business Continuity Manager software, compared to issuing thousands of z/OS commands.
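The pull model described above can be pictured schematically as follows (a simplified sketch with hypothetical names, not Universal Replicator's actual design):

    # Schematic of pull-style, journal-backed replication (illustrative only).
    from collections import deque

    class PrimaryArray:
        def __init__(self, cache_limit=3):
            self.cache, self.journal = deque(), deque()
            self.cache_limit = cache_limit

        def host_write(self, record):
            # Host writes land in cache; under cache pressure, records are
            # destaged in order to journal volumes, not throttled at the host.
            self.cache.append(record)
            while len(self.cache) > self.cache_limit:
                self.journal.append(self.cache.popleft())

        def read_next(self):
            # The REMOTE system issues this read (the "pull"), draining the
            # journal first so the original write order is preserved.
            if self.journal:
                return self.journal.popleft()
            return self.cache.popleft() if self.cache else None

    primary, remote = PrimaryArray(), []
    for i in range(5):
        primary.host_write(f"write {i}")
    while (rec := primary.read_next()) is not None:
        remote.append(rec)
    print(remote)    # all five writes arrive in their original order

Because the remote side does the reading, the primary spends its cycles on transactions rather than on pushing replication traffic.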
For a discussion of the Universal Storage Platform and business continuity applications of its capabilities, including Universal Replicator, see Business Continuity and the Hitachi Universal Storage Platform, Hitachi Data Systems white paper WHP-163. Customers should check with Hitachi Data Systems on supported platforms and configurations.
[Figure 9 diagram: multiple primary-site controllers (MCUs) replicate to remote-site controllers (RCUs), where one Universal Replicator instance acts as master and the others as slaves to keep a single consistency group consistent across controllers.]
Universal Replicator's disk-based journaling and pull-style replication engine help reduce resource consumption and costs, while increasing performance and operational resilience. In effect, Universal Replicator restores primary site storage to its intended role as a transaction processing resource, not a replication engine. This can also translate to a lower cost of ownership over the life of the storage, as, depending on a customer's RPO, bandwidth acquisitions may be delayed. This technology also uses significantly less storage than a PiT-mediated copy, while delivering better recovery points.
Additionally, by using a disk-based journaling technique, Universal Replicator prevents network issues or spikes in workload from causing the replication process to suspend: the overflow is buffered in the journal disks until the replication process can catch up. This is a significant benefit, as recovery from a suspended state (in cache-based replication methods) requires a destructive resynchronization process that causes RPOs to be significantly elongated. By contrast, Universal Replicator allows the RPO at the remote site to improve steadily after a network outage as it catches up, rather than falling farther behind until the total catch-up is complete.
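To see why journal buffering beats a destructive resynchronization, consider this rough catch-up arithmetic (our own illustration; all figures are assumed):

    # Rough catch-up arithmetic after a network outage (assumed figures).
    write_rate_mb_s = 20      # sustained host write rate
    outage_minutes  = 30      # link down; journal absorbs the backlog
    link_mb_s       = 40      # replication bandwidth once the link returns

    backlog_mb = write_rate_mb_s * outage_minutes * 60
    # While draining, new writes keep arriving, so the effective drain rate
    # is the link bandwidth minus the ongoing write rate.
    drain_min = backlog_mb / (link_mb_s - write_rate_mb_s) / 60
    print(f"backlog ~{backlog_mb/1024:.1f} GB, drained in ~{drain_min:.0f} min")

    # The RPO shrinks steadily throughout the drain; a cache-based method
    # that suspended would first need a full resync before any consistent
    # remote copy exists again.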
For a more detailed discussion of Universal Replicator and its application to advanced disaster-recovery strategies, see Universal Replicator Advanced Technology, Hitachi Data Systems white paper WHP-165.
EMC SRDF Asynchronous
SRDF Asynchronous uses the concept of timed cycles known as delta sets, typically every 30 seconds, and captures all host I/O during that period in cache. If the same record is updated more than once, only the most recent update is kept. Any dependent I/O will, in theory, be in that delta set or in a subsequent one. Once the time is up, SRDF Asynchronous starts another delta set cycle at the primary site and begins to transmit the previous delta set to the remote-side cache. Once all the data is received at the secondary site, it is promoted to an active cycle and can be de-staged to the back-end disk at the secondary site. If there is a problem during transmission or a disaster occurs, then the copy of consistent data at the remote site should be the previous completely transmitted delta set. However, with multiple links and network retries, that may be of concern. This technology also appears to use large amounts of cache, as the data is held there much longer than with other asynchronous techniques. EMC claims significant improvement in required bandwidth, but this will be highly data and network dependent. Additionally, since the RPO of the data will be at least the 30-second cycle plus the time required to transmit it, using too little bandwidth will result in elongated RPOs and the possible dropping of the SRDF Asynchronous environment due to cache overload. A new feature called Delta Set Extension is designed to alleviate temporary spikes or network failures by directing I/Os to a disk pool if cache fills up. However, this is not the same as Universal Replicator journals, which are designed to be part of the technology, rather than a temporary buffer.
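Schematically, the delta-set mechanism works as follows (a simplified sketch based on our reading of EMC's published description, not SRDF code):

    # Simplified delta-set cycling (illustrative only).
    CYCLE_SECONDS = 30          # typical capture interval

    capture, transmit, apply = {}, {}, {}

    def host_write(track, data):
        capture[track] = data   # re-writes of the same track are coalesced

    def cycle_switch():
        # Every ~30 s: the set being transmitted, once fully received at the
        # secondary, becomes the apply set; the capture set starts crossing
        # the link; and a fresh capture cycle begins.
        global capture, transmit, apply
        apply, transmit, capture = transmit, capture, {}

    host_write("t1", "v1"); host_write("t1", "v2")   # only "v2" survives
    cycle_switch()
    print(transmit)   # {'t1': 'v2'} -- the recovery point always trails the
                      # primary by at least one full cycle plus transmit time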
PiT-mediated Remote Copy

PiT-mediated techniques, such as the Hitachi 3- and 4-copy models, replicate consistent point-in-time copies at regular intervals rather than shadowing every update in real time. At most, these approaches are designed using four copies of data (i.e., two local copies and two remote copies) instead of just two or three. This technology may seem appropriate for an environment where a customer wants to create a PiT copy every four hours; there is some expectation of lower bandwidth requirements in such an environment, since not every update is replicated to the remote site. We will revisit the bandwidth trade-offs in a later section of this paper.
Locality of reference refers to the observation that references to data tend to cluster. A particular piece of data, once referenced, is often referenced again in the near future. Similarly, once a particular piece of data is referenced, nearby data is often referenced in the near future.
Three Data Center Copy

Multi-hop and Traditional Three Data Center Copy
Multi-hop and traditional three data center copy operations are only really applicable in very specific circumstances and for a limited number of customers, and rely heavily on complex scripts and many copies of data. EMC offered multi-hop prior to the availability of SRDF Asynchronous, and may still offer it to customers who have not moved to the DMX product line.
Advanced Three Data Center Copy with Universal Replicator Software

[Figure 10 diagram: in the cascade configuration, TrueCopy Synchronous links the primary site to an intermediate site, and Universal Replicator carries the data on to the secondary site; in the multi-target configuration, the primary site replicates synchronously to the intermediate site and, over a separate network, asynchronously via Universal Replicator to the remote site.]
With two configuration options available, three data center copy is now an affordable, realistic alternative with TrueCopy Synchronous and Universal Replicator software.
Cascade
This solution uses TrueCopy Synchronous software to maintain a current copy of the production data at an in-region data center. The Universal Storage Platform at the in-region site also cascades the data to an out-of-region recovery site, using Universal Replicator software's asynchronous replication capabilities. In comparison with other asynchronous replication technologies, Universal Replicator software does not require an additional point-in-time copy of the data volume at the intermediate site, or at the remote site (except for the best practice of keeping a disaster recovery test copy).
Multi-target
TrueCopy Synchronous software maintains a current copy of the production data at an in-region recovery data center. At the same time, the Universal Storage Platform at the primary site replicates the data to an out-of-region recovery site, using Universal Replicator software's asynchronous replication across a separate replication network. This configuration also features a Delta Resync capability, as depicted by the dashed line in Figure 10. Should the primary site fail and operations resume at the intermediate site, Delta Resync allows customers to restore disaster recovery capability to the out-of-region (remote) site within minutes, by sending only the differential data between the intermediate and remote sites.

Figure 11 summarizes the RTO and RPO impact of several different disaster recovery approaches based on two data center (2DC) and 3DC replication.

Figure 11. Benefits Comparison for Selected Disaster Recovery Configurations
[Figure 11 table: compares data center strategies (1DC onsite; 2DC synchronous near; 2DC asynchronous far; 3DC traditional; and 3DC advanced, in both cascade and multi-target forms) across replication configuration, primary site failure/failover, speed of recovery (RTO), data currency (RPO), regional disaster recovery (RTO), and protection after a failure outside the primary site.]
The chart shows the relative benefits of these configurations in terms of recovery speed and data currency after a primary site failure, and in terms of recovery speed after a regional disaster. The chart also illustrates the impact of a failure outside the primary site, which could affect the ongoing level of protection and recovery capability in case of an additional site failure. For a more detailed discussion of these 2DC and 3DC configurations and their pros and cons, see Hitachi Universal Replicator Advanced Technology, Hitachi Data Systems white paper WHP-165.
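The two advanced 3DC layouts can be summarized schematically (hypothetical configuration notation, not an actual Hitachi interface):

    # Hypothetical declarative view of the two advanced 3DC layouts.
    cascade = {
        "primary -> intermediate": "TrueCopy Synchronous (in-region)",
        "intermediate -> remote":  "Universal Replicator (out-of-region)",
    }
    multi_target = {
        "primary -> intermediate": "TrueCopy Synchronous (in-region)",
        "primary -> remote":       "Universal Replicator (out-of-region)",
        # Delta Resync: if the primary fails, only the differential between
        # the intermediate and remote journals is sent to restore coverage.
        "intermediate -> remote":  "Delta Resync (standby, differential only)",
    }
    for name, links in (("cascade", cascade), ("multi-target", multi_target)):
        print(name)
        for pair, tech in links.items():
            print(f"  {pair}: {tech}")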
Bandwidth Considerations
One of the biggest considerations for your organization is the cost of the solution, and network bandwidth is one of the biggest contributors to that cost. In real-time remote copy, every update is sent to the remote location. If your application executes 100 writes of 10K blocks per second, you need bandwidth that accommodates the writes, plus any control information that is also sent. In PiT copy solutions for disaster recovery, data is only replicated at preset intervals, such as every 15 minutes, every hour, etc. During the period of time that data is not being replicated, tracks that are updated at the primary site are marked as changed by the storage system. If the same record is updated 100 times, then only the last change to the track will be shipped when the data is sent to the secondary location. In theory this means that the bandwidth requirement may be lower when using PiT technologies; however, there are many factors to consider before coming to that conclusion. If the PiT copy is frequent (under six hours, for example), as is often required, then chances are that there will not be a significant reduction in bandwidth. In certain circumstances, bandwidth requirements may be the same or not significantly different. The bandwidth requirement will be very dependent on the locality of reference of the data and how current you want the PiT to be at the remote location.

Careful consideration should be used when architecting remote copy solutions with a view to saving network bandwidth. This includes technologies such as cascade, PiT copies, multi-hop, three data center copy, and even SRDF Asynchronous. It is important to understand your data patterns, to try to determine if there are any benefits, and to consider your RPO and RTO. Using too little bandwidth can elongate your RPO and cause other problems in the environment. Where catch-up is required, the available bandwidth will affect the time at which you can begin your recovery (RTO). Universal Replicator is unique in that customers can improve bandwidth utilization and lower their communication costs by sizing bandwidth for average needs, not for peak usage. This simplifies bandwidth planning and empowers users to better control their RPO in relation to infrastructure and communication costs. In environments in which the RPO can be measured in hours rather than minutes, a PiT solution may be a viable alternative to a real-time solution; however, Universal Replicator, with its journaling capabilities, may be equally viable there as well.
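The arithmetic behind these trade-offs can be sketched as follows (our own illustration; the locality factor is an assumption that must be measured for each workload):

    # Sketch of bandwidth sizing: real-time vs. PiT replication.
    writes_per_sec = 100
    block_kb       = 10

    # Real-time remote copy ships every write (plus control traffic).
    realtime_mbit = writes_per_sec * block_kb * 8 / 1000
    print(f"real-time: ~{realtime_mbit:.0f} Mbit/s sustained")    # ~8 Mbit/s

    # PiT replication ships each changed track once per interval, so any
    # saving depends entirely on locality of reference (assumed here).
    locality_factor = 0.4   # ASSUMPTION: 60% of writes hit already-changed tracks
    pit_mbit = realtime_mbit * locality_factor
    print(f"PiT (hourly): ~{pit_mbit:.1f} Mbit/s average")

    # With frequent PiT intervals or poor locality the factor approaches 1.0
    # and the expected bandwidth saving largely disappears.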
Summary
Enterprises today must meet business applications' service levels, including increased resilience and protection from local and regional disruptions, while dealing with complex infrastructures and tight budgets. Services Oriented Storage Solutions from Hitachi Data Systems enable organizations to precisely match business application requirements to Hitachi storage system attributes (performance, availability, data value, and cost), leveraging the application-centric management capabilities of the Hitachi Storage Management Suite of software. To meet these requirements, Hitachi Data Systems provides customers with a wide range of flexibility and choice in technologies and solutions for both open systems and mainframe environments.
- The focus of Hitachi Data Systems product offerings is on standards and interoperability. Universal Storage Platform and Hitachi Lightning 9900 V Series systems provide fully compatible, high-performance S/390 (PPRC, GDPS, and XRC) solutions; this positions customers to take advantage of future software enhancements.
- Hitachi TrueCopy Asynchronous software, Hitachi Universal Replicator software, and advanced 3DC configurations provide unique, enterprise-wide, simple, elegant disaster recovery solutions.

Many copy technologies today can be considered when implementing business continuity solutions, especially when you include traditional backup methods of copying data sets. This is why it is important to choose only from best-of-breed solutions when addressing disaster recovery objectives. Hitachi Data Systems provides not only a superior range of offerings in this area, but also the expertise necessary to help you achieve your disaster recovery goals.