Final Project Report
A Project Report on
Data Replication Infrastructure
Nikhil Verma (061/IT-Vth SEM/EVE)
Vivek Pasi (020/IT-Vth SEM/EVE)
Under the guidance of:
Mr. Sanjeev Rana (c/o MTNL)
(Asst. Manager, I.T. Dept., Minto Rd. Telephone Exchange)
Guru Gobind Singh Indraprastha University
Dwarka, New Delhi
Year 2011-2012
DECLARATION
We hereby declare that all the work presented in this project, entitled "Data Replication Infrastructure", in partial fulfillment of the curriculum for the Bachelor of Technology in Information Technology from Guru Tegh Bahadur Institute of Technology, Guru Gobind Singh Indraprastha University, is an authentic record of our own work carried out under the guidance of Mr. Sanjeev Rana.
ABSTRACT
This project is based on an ongoing process at MTNL in which the data stored at a main site (known as the Main Server or Site) is backed up to a local site (known as the Remote Server or Site). This remote site also routes a backup to the main site of another city. That city's main site, in turn, backs up its own data to its remote site, which routes its backup to the main site of the first city (just like the remote site of the first city), thereby forming an interconnection between all these sites. This interconnected network of sites supports the recovery process whenever a main site is down for any reason: the remote site acts as the alternate main site with the full resources of the main site. When the main site is ready to function properly again, the remote site also updates it with any changes that were made during the period of its inactivity. The whole process is carried out with the help of IBM Corporation's DS8000 Copy Services equipment, which uses IBM's AIX server operating system for its operation. The system can be operated in two ways: through a Command Line Interface (remote connection) or a Graphical User Interface (on-site maintenance).
CONTENTS
Title Page
Declaration
Certificate
Acknowledgement
Abstract
List of Figures
List of Tables

1. CHAPTER 1: Introduction
   1.1. Introduction to data replication
        1.1.1. Database replication
        1.1.2. Disk replication
        1.1.3. File-based replication
   1.2. Data replication as an infrastructure (project introduction)
   1.3. Role of data replication in the project

2. CHAPTER 2: DS 8000 Copy Services Environment
   2.1. Introduction to DS 8000
        2.1.1. Remote mirror & copy functions
        2.1.2. Point-in-time copy function
               2.1.2.1. FlashCopy
               2.1.2.2. Remote mirror & copy
   2.2. Functions supported by DS 8000
        2.2.1. Metro Mirror
        2.2.2. Global Mirror
   2.3. Various components of DS 8000
        2.3.1. Metro Mirror
               2.3.1.1. Types of Metro Mirror paths and links
               2.3.1.2. Bandwidth requirement for setting up a Metro Mirror
               2.3.1.3. Volumes and Metro Mirror
               2.3.1.4. Basic commands used for Metro Mirror operation
        2.3.2. Global Mirror
               2.3.2.1. Setting up a Global Mirror session
               2.3.2.2. List of basic commands for a Global Mirror

3. CHAPTER 3: Application of Data Replication Infrastructure: Disaster Management
   3.1. Disaster management of data replication services
        3.1.1. Disaster identification
        3.1.2. Measures taken to minimize disaster & recovery process

4. CHAPTER 4: Brief description of the hardware & software used in the project
   4.1. Software
        4.1.1. AIX 7
   4.2. Hardware

5. CHAPTER 5: Conclusion & Future Scope of the Project
   5.1. Conclusions
   5.2. Future scope of this project

6. References
List of Figures
Fig. 1.1: Simple diagram of data replication infrastructure
Fig. 2.1: Basic diagram of Metro Mirror
Fig. 2.2: Basic diagram of Global Mirror

List of Tables
Table 2.1: Basic commands used in the Metro Mirror
Table 2.2: Basic commands used in the Global Mirror
CHAPTER 1 INTRODUCTION
1.1. INTRODUCTION TO DATA REPLICATION
Data replication may be defined as the effective copying of data from one or many source(s) to one or many destination(s). It can be done between two or more databases, disk volumes, or file systems. There are three main types of data replication, explained as follows (a minimal sketch of the master/slave update flow of 1.1.1 is given at the end of this section):

1.1.1. Database Replication: Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, which allows the sending (and potentially re-sending, until successfully applied) of subsequent updates.

1.1.2. Disk Storage Replication: Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works at a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver).

1.1.3. File-Based Replication: File-based replication replicates files at a logical level rather than at the storage block level. There are many different ways of performing this and, unlike storage-level replication, they are almost exclusively software solutions.
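To make the master/slave update flow described in 1.1.1 concrete, the following is a minimal illustrative sketch in Python. It is not part of the MTNL setup, and all class and variable names are invented for illustration; it only mirrors the described control flow, in which the master logs an update and re-sends it to each slave until an acknowledgement is received.

# Illustrative sketch of master/slave database replication (hypothetical, simplified).
# The master logs updates and ships them to slaves; a slave acknowledges each update,
# and the master re-sends until the acknowledgement arrives.

import random


class Slave:
    def __init__(self, name):
        self.name = name
        self.data = {}          # replicated key/value store
        self.applied = set()    # ids of updates already applied

    def receive(self, update_id, key, value):
        """Apply an update and return an acknowledgement (may 'fail' to simulate loss)."""
        if random.random() < 0.3:        # simulate a lost message
            return False
        if update_id not in self.applied:
            self.data[key] = value
            self.applied.add(update_id)
        return True                      # acknowledgement back to the master


class Master:
    def __init__(self, slaves):
        self.log = []                    # ordered log of updates
        self.slaves = slaves

    def write(self, key, value):
        update_id = len(self.log)
        self.log.append((update_id, key, value))
        # Ship the update to every slave, re-sending until it is acknowledged.
        for slave in self.slaves:
            while not slave.receive(update_id, key, value):
                pass                     # re-send on missing acknowledgement


if __name__ == "__main__":
    slaves = [Slave("remote-1"), Slave("remote-2")]
    master = Master(slaves)
    master.write("subscriber/1001", "active")
    master.write("subscriber/1002", "suspended")
    for s in slaves:
        print(s.name, s.data)

In a real database management system the log shipping, ordering and acknowledgements are handled by the replication engine itself; the sketch above only demonstrates the acknowledgement-driven re-send behaviour.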
1.2. DATA REPLICATION AS AN INFRASTRUCTURE (PROJECT INTRODUCTION)
In our project (DATA REPLICATION INFRASTRUCTURE), data replication is put to use as an infrastructure development medium in which various data sites, known as remote and main sites, are interconnected as a network so as to provide backups from the remote site to the subsequent main site. This backup can also be used for providing updates to the main site when it is ready to use again after a disaster or a maintenance layoff. The basic diagram of the data replication infrastructure is shown as follows:
Fig. no. 1.1: Basic diagram of the data replication infrastructure, showing two main sites and two remote sites interconnected with each other so that the data of any site can be accessed, with proper privileges, anywhere within this framework.
1.3. ROLE OF DATA REPLICATION IN THE PROJECT
Data replication plays a pivotal role in the project, as the whole framework of the data replication infrastructure is built around it.
CHAPTER 2 DS 8000 COPY SERVICES ENVIRONMENT (THEORY & WORKING OF THE PROJECT)
2.1. INTRODUCTION TO DS 8000
2.2. FUNCTIONS SUPPORTED BY DS 8000
2.2.1. Metro Mirror: Metro Mirror provides real-time mirroring of logical volumes between two DS8000s that can be located up to 300 km apart. It is a synchronous copy solution in which a write operation is completed on both copies (at the local and remote sites) before it is considered complete.

2.2.2. Global Mirror: Global Mirror provides a long-distance remote copy feature across two sites using asynchronous technology. With Global Mirror, the data that the host writes to the storage unit at the local site is asynchronously shadowed to the storage unit at the remote site, and a consistent copy of this data is automatically maintained there. The range of Global Mirror is very high, limited only by the capabilities of the network and the channel extension technology. It can also provide a consistent and restorable copy of the data at the remote site, created with minimal impact on applications at the local site. This ability to maintain efficient synchronization between the local and remote sites, with support for failover and failback functions, helps to reduce the time required for switching back to the local site after a planned or unplanned outage. (A small sketch contrasting the synchronous and asynchronous write paths is given below.)
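As a purely illustrative aid (not the DS8000 microcode, and with all names invented for illustration), the following Python sketch contrasts the two write paths described above: the synchronous path acknowledges a host write only after both copies have been updated, while the asynchronous path acknowledges immediately and shadows the write to the remote copy later.

# Illustrative contrast of synchronous (Metro Mirror-like) and asynchronous
# (Global Mirror-like) replication. Hypothetical, simplified; not DS8000 code.

from collections import deque


class Volume:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class SynchronousPair:
    """Write is acknowledged only after BOTH volumes have the data."""
    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def host_write(self, block, data):
        self.local.write(block, data)
        self.remote.write(block, data)   # completes before the acknowledgement
        return "write complete"


class AsynchronousPair:
    """Write is acknowledged immediately; the remote copy is updated later."""
    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.pending = deque()           # out-of-sync writes still to be shadowed

    def host_write(self, block, data):
        self.local.write(block, data)
        self.pending.append((block, data))
        return "write complete"          # acknowledged before the remote is updated

    def drain(self):
        """Background shadowing of pending writes to the remote site."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote.write(block, data)


if __name__ == "__main__":
    metro = SynchronousPair(Volume("A"), Volume("B"))
    metro.host_write(1, "billing record")
    glob = AsynchronousPair(Volume("A'"), Volume("B'"))
    glob.host_write(1, "billing record")
    print("remote copy before drain:", glob.remote.blocks)   # still empty
    glob.drain()
    print("remote copy after drain:", glob.remote.blocks)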
2.3. VARIOUS COMPONENTS OF DS 8000
2.3.1. METRO MIRROR: Metro Mirror (also known as the synchronous Peer-to-Peer Remote Copy function, or PPRC) provides real-time mirroring of logical volumes between two DS8000s. It is a synchronous copy solution in which write operations are completed on both copies (at the local and remote sites) before they are considered complete. The basic diagram of Metro Mirror is given as follows:
Fig. no. 2.1: Basic diagram of the Metro Mirror function, where the data is replicated between the main site and the remote site synchronously, at an almost instantaneous rate. The distance between the remote and main sites should not be more than 300 km.

2.3.1.1. TYPES OF METRO MIRROR PATHS AND LINKS:
The following are the different types of paths and links used in establishing a Metro Mirror link between two sites or two volumes. Basically, there are two kinds of paths, as given below:

2.3.1.1.1. Physical Paths: A physical path includes the host adapter in the source DS8000, the cabling, switches or directors, and any wideband or long-distance transport devices. Examples include fibre links, cable links, routers, switches, etc.

2.3.1.1.2. Logical Paths: A logical path may be defined as the virtual path between two volumes or DS8000s. It is ultimately carried by the network of physical paths, as a logical path only specifies the direction of travel for the data; the transfer itself is accomplished over physical links.
2.3.1.2. BANDWIDTH REQUIREMENT FOR SETTING UP A METRO MIRROR
The bandwidth requirement for setting up a Metro Mirror link between two DS8000s has to be specified prior to setting up the connection. The points for selecting the right bandwidth are as follows (a rough sizing sketch is given after the list):
1. Specify the peak bandwidth that is required; all the processes should be covered under this figure.
2. Determine the maximum number of Metro Mirror links that can be installed using this bandwidth.
3. The peak data write rate should also be specified, and enough bandwidth must be allocated for it.
4. The maximum distance supported for Metro Mirror is 300 km, so bandwidth must be allocated keeping this distance in mind.
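As a rough illustrative aid to point 3 above, the following Python sketch estimates the link bandwidth needed to keep up with a measured peak write rate. The peak write rate and the headroom factor used here are assumed example values, not figures from MTNL or from IBM documentation.

# Rough bandwidth sizing for a synchronous mirror link (illustrative only).
# The peak write rate and headroom factor below are hypothetical example values.

def required_link_bandwidth(peak_write_mib_per_s, headroom=1.5):
    """Return the link bandwidth (in Mbit/s) needed to sustain the peak write rate.

    headroom leaves spare capacity for bursts, retransmissions and protocol
    overhead; 1.5 (a 50% margin) is just an assumed example figure.
    """
    peak_mbit_per_s = peak_write_mib_per_s * 8 * 1.048576  # MiB/s -> Mbit/s
    return peak_mbit_per_s * headroom


if __name__ == "__main__":
    peak = 120.0  # hypothetical measured peak write rate, MiB/s
    print(f"Estimated link bandwidth: {required_link_bandwidth(peak):.0f} Mbit/s")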
2.3.1.3. VOLUMES AND METRO MIRROR
When copying data through the Metro Mirror function, only the volumes that are important should generally be copied. For each of these volumes a bitmap (also known as metadata) is generated, which contains the structural information about that particular volume and is then copied to the remote site. The only thing to keep in mind is that the Metro Mirror function only works on drives or volumes that are of a similar type. (An illustrative sketch of how such a bitmap can track changed regions is given below.)
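One common role of such a volume bitmap in mirroring solutions is to record which tracks still have to be copied (or re-copied) to the remote site, so that only the changed tracks are transferred. The following is a minimal, hypothetical Python sketch of that idea; it is a simplification and not the actual DS8000 metadata format.

# Illustrative out-of-sync bitmap for a mirrored volume (hypothetical, simplified).
# Each bit corresponds to one track; a set bit means the track still has to be
# copied (or re-copied) to the remote site.

class TrackBitmap:
    def __init__(self, tracks):
        self.bits = bytearray((tracks + 7) // 8)

    def mark_dirty(self, track):
        self.bits[track // 8] |= 1 << (track % 8)

    def clear(self, track):
        self.bits[track // 8] &= ~(1 << (track % 8))

    def dirty_tracks(self):
        return [t for t in range(len(self.bits) * 8)
                if self.bits[t // 8] >> (t % 8) & 1]


if __name__ == "__main__":
    bitmap = TrackBitmap(tracks=64)
    for written in (3, 17, 42):          # host writes land on these tracks
        bitmap.mark_dirty(written)
    print("tracks to copy to the remote site:", bitmap.dirty_tracks())
    bitmap.clear(17)                      # track 17 has been replicated
    print("still pending:", bitmap.dirty_tracks())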
2.3.1.4. BASIC COMMANDS USED FOR METRO MIRROR OPERATION
The following are the basic commands used during the operation of Metro Mirror:

TASK | CLI COMMAND | GUI NAVIGATION
List available I/O ports that can be used to establish Metro Mirror | lsavailpprcport | Copy Services > Paths
Create path | mkpprcpath | Copy Services > Paths > Create
Delete path | rmpprcpath | Copy Services > Paths > Delete
Failback | failbackpprc | Copy Services > Metro Mirror / Global Copy > Recovery Failback
Failover | failoverpprc | Copy Services > Metro Mirror / Global Copy > Recovery Failover
Table No. 2.1: Basic commands used in the Metro Mirror operation, with both the Command Line Interface and the Graphical User Interface.
2.3.2. GLOBAL MIRROR: Global Mirror is an asynchronous data replication process (Metro Mirror being the synchronous version) which copies the data generated at the main data center to a remote data site after a specified time interval from the point of data generation. To understand this, it is important to note that Global Mirror works like a distributed application. A distributed application is usually built on a server/client relationship: the server functions as a supervisor and instructs the clients, while each client can do some work autonomously but relies on coordination from the server. The server distributes the work to its clients, collects their individual feedback and, based on this feedback, decides on further action. The basic diagram of Global Mirror is given as follows:
Fig. no. 2.2: A basic diagram depicting the Global Mirror technique, where the data is replicated to the remote site from the main site after a set time interval. As seen from the figure, the distance between the source and the destination is determined by network parameters such as the quality of the channel through which the replication takes place.
2.3.2.1. SETTING UP A GLOBAL MIRROR SESSION
Global Mirror, as a long-distance remote copy solution, is based on an efficient combination of the Global Copy and FlashCopy functions. It is the microcode that provides, from the user's perspective, a transparent and autonomic mechanism which intelligently utilizes Global Copy in conjunction with certain FlashCopy operations to attain consistent data at the remote site. Setting up the Global Mirror function involves the following steps (a simplified sketch of the resulting cycle is given after the list):
1. The I/O link is established between the two volumes (referred to as the source and target volumes).
2. A virtual (logical) path is established between the two volumes.
3. The physical paths between the two volumes are established by means of fibre links, etc.
4. The structural address (metadata) of the source volume is provided to the target volume using FlashCopy.
5. The replication process starts.
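To illustrate how Global Copy and FlashCopy cooperate within such a session, the following is a simplified, hypothetical Python sketch of the repeating cycle: pending writes are drained to the remote copy asynchronously, and a point-in-time copy of the remote volume is then taken to preserve a consistent, restorable image. It is not the DS8000 microcode, and the class and method names are invented for illustration.

# Simplified sketch of a Global Mirror-style cycle (hypothetical, not DS8000 code):
# pending writes are drained to the remote copy asynchronously, and a point-in-time
# copy of the remote volume is then taken to preserve a consistent, restorable image.

import copy


class GlobalMirrorSession:
    def __init__(self):
        self.local = {}        # production volume at the main site
        self.remote = {}       # Global Copy target at the remote site
        self.consistent = {}   # FlashCopy of the remote volume (restorable image)
        self.pending = []      # writes not yet shadowed to the remote site

    def host_write(self, block, data):
        """Host I/O is acknowledged immediately; shadowing happens later."""
        self.local[block] = data
        self.pending.append((block, data))

    def form_consistency_group(self):
        """One cycle: drain pending writes, then flash the remote volume."""
        while self.pending:                      # drain the out-of-sync writes
            block, data = self.pending.pop(0)
            self.remote[block] = data
        self.consistent = copy.deepcopy(self.remote)   # point-in-time copy


if __name__ == "__main__":
    session = GlobalMirrorSession()
    session.host_write(1, "call record A")
    session.host_write(2, "call record B")
    session.form_consistency_group()
    session.host_write(2, "call record B, updated")   # arrives after the cycle
    print("restorable copy at remote site:", session.consistent)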
2.3.2.2. LIST OF BASIC COMMANDS FOR A GLOBAL MIRROR
The following commands may be used for the operation of the Global Mirror function from the Command Line Interface; equivalent actions are also available in the Graphical User Interface:

TASK | CLI COMMAND
Start Global Mirror session | mkgmir
Stop Global Mirror session | rmgmir (end processing); rmsession (close the session)
Pause Global Mirror session | pausegmir
Resume Global Mirror session | resumegmir
Show Global Mirror status | showgmir
Create a Global Mirror session | mksession
Change a Global Mirror session | chsession
Display a Global Mirror session | lssession
Create a complete Global Mirror environment | combination of the above commands (mksession, then mkgmir)
Table No. 2.2: Basic commands for the operation of a Global Mirror connection through the Command Line Interface; the equivalent Graphical User Interface actions are reached through the Copy Services > Global Mirror panels.
CHAPTER 3 APPLICATION OF DATA REPLICATION INFRASTRUCTURE: DISASTER MANAGEMENT
3.1. DISASTER MANAGEMENT OF DATA REPLICATION SERVICES
Disaster management of the data replication services may be broadly categorized into the following steps:
1. Disaster identification
2. Measures taken to minimize it
3. Recovery
All of the above steps are described as follows:

3.1.1. DISASTER IDENTIFICATION
The first step in disaster management is to identify the crisis at hand and its main causes. The causes may be of two types, listed as under:
a. Planned Causes: These are user-defined case scenarios that are similar to an actual disaster scenario but are user-initiated so as to study or upgrade the services.
b. Unplanned Causes: These are the unavoidable causes due to which the functioning of the copy services is adversely affected. They can arise due to natural reasons (acts of nature, such as earthquakes) or due to human error (such as system failure).

3.1.2. MEASURES TAKEN TO MINIMIZE DISASTER AND RECOVERY PROCESS
The measures taken to minimize the losses due to any unwanted causes of disaster, and then to recover from it, include the concepts of the failover and failback functions. They are explained in more detail as follows:

3.1.2.1. FAILOVER FUNCTION:
The protective measures taken to tackle such disaster situations include setting up a remote data replication site which replicates the data generated by the main site. In case of failure of the main site, the work is transferred to the remote site, which serves as the replication center until the main center is ready for work again. When the main center is ready, it is first updated to the present state, and normal work is resumed afterwards. This process is commonly referred to as the failover process.
3.1.2.2. FAILBACK FUNCTION:
Recovering from the disaster involves switching back from the remote site to the main site when the main site is deemed ready to start operations again. This process is commonly referred to as the failback process. In this process, the main data center is first updated with the missing data from the remote site, and then it resumes functioning as it normally would. (A small sketch of this failover/failback cycle is given below.)
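As an illustrative aid (hypothetical and simplified; in the actual environment this behaviour is provided by the failover and failback functions listed in Table 2.1), the following Python sketch models the cycle described above: on failover the remote site takes over and records the changes it makes, and on failback those changes are copied to the main site before normal operation resumes.

# Illustrative failover/failback cycle for a mirrored pair of sites.
# Hypothetical and simplified; not the DS8000 implementation.

class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicationPair:
    def __init__(self, main, remote):
        self.main, self.remote = main, remote
        self.active = main          # site currently serving production I/O
        self.changes = {}           # updates made while the main site is down

    def write(self, key, value):
        self.active.data[key] = value
        if self.active is self.remote:
            self.changes[key] = value      # remember what the main site missed
        else:
            self.remote.data[key] = value  # normal mirroring to the remote site

    def failover(self):
        """Main site is down: the remote site becomes the active site."""
        self.active = self.remote

    def failback(self):
        """Main site is back: update it with the missed changes, then switch back."""
        self.main.data.update(self.changes)
        self.changes.clear()
        self.active = self.main


if __name__ == "__main__":
    pair = ReplicationPair(Site("main"), Site("remote"))
    pair.write("line/204", "connected")
    pair.failover()                      # main site goes down
    pair.write("line/205", "connected")  # work continues at the remote site
    pair.failback()                      # main site recovered and updated
    print(pair.main.data)                # contains both records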
CHAPTER 4 BRIEF DESCRIPTION OF THE HARDWARE & SOFTWARE USED IN THE PROJECT
4.1. SOFTWARE
The software that this project uses is IBM's AIX (Advanced Interactive eXecutive) server operating system (current available version 7.1). More information on AIX follows.

AIX (Advanced Interactive eXecutive) is a series of proprietary UNIX operating systems developed and sold by IBM for several of its computer platforms. Originally released for the IBM 6150 RISC workstation, AIX now supports or has supported a wide variety of hardware platforms, including the IBM RS/6000 series and later IBM POWER and PowerPC-based systems, IBM System i, System/370 mainframes, PS/2 personal computers, and the Apple Network Server. AIX is based on UNIX System V with 4.3BSD-compatible extensions. It is one of four commercial operating systems presently certified to The Open Group's UNIX 03 standard (the others being Mac OS X, Solaris and HP-UX). The AIX family of operating systems debuted in 1986, became the standard operating system for the RS/6000 series on its launch in 1990, and is still actively developed by IBM. It is currently supported on IBM Power Systems alongside IBM i and Linux. AIX was the first operating system to utilize journaling file systems, and IBM has continuously enhanced the software with features such as processor, disk and network virtualization, dynamic hardware resource allocation (including fractional processor units), and reliability engineering ported from its mainframe designs.

4.1.1. AIX 7: The main highlights of the latest version of the AIX series are as follows:
1. Latest generation of IBM's market-leading, scalable, open-standards-based UNIX operating system.
2. Binary compatibility with previous releases of AIX to preserve clients' software investment.
3. Tremendous vertical scalability to provide capacity for the IT infrastructure to grow with the business.
4. Built-in clustering capabilities to simplify high availability and to provide infrastructure for future innovation.
5. Enhancements to virtualization capabilities to provide even more flexibility to support changing workloads.
6. Built on IBM POWER technology and virtualization to help deliver superior performance, increase system utilization and efficiency, provide easy administration and reduce total costs.
7. Available in three editions for even more capability and flexibility.
4.2. HARDWARE: The various kinds of hardware used in this project are described as follows:

4.2.1. ROUTER: A router is a device that forwards data packets between telecommunication networks, creating an overlay internetwork. When data comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. Then, using information in its routing table or routing policy, it directs the packet to the next network on its journey, or drops the packet. A data packet is typically forwarded from one router to another through the networks that constitute the internetwork until it reaches its destination node. (An illustrative sketch of such a routing-table lookup is given at the end of this section.)

4.2.2. NETWORK SWITCH: A network switch or switching hub is a computer networking device that connects network segments. It plays an integral part in most modern Ethernet local area networks (LANs). Switches may operate at one or more layers, including the data link, network, or transport layers. In this project, the switch is used to create a mirror image of the data that can be sent to an external device.

4.2.3. SERVER: A server computer is a computer, or series of computers, that links other computers or electronic devices together. Servers often provide essential services across a network, either to private users inside a large organization or to public users via the Internet. Enterprise servers are servers that are used in a business context.
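As an illustrative aid to the routing-table lookup described in 4.2.1, the following Python sketch performs a longest-prefix-match lookup over a small, made-up routing table. It is a simplification; real routers implement this lookup in specialised hardware, and all addresses shown are hypothetical.

# Illustrative routing-table lookup (longest-prefix match), as described in 4.2.1.
# Hypothetical and simplified; not how a production router is implemented.

import ipaddress

# Example routing table: destination prefix -> next hop (addresses are made up).
ROUTING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "192.168.1.1",
    ipaddress.ip_network("10.1.0.0/16"): "192.168.1.2",
    ipaddress.ip_network("0.0.0.0/0"): "192.168.1.254",   # default route
}

def next_hop(destination):
    """Return the next hop for a packet, choosing the most specific matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in ROUTING_TABLE if dest in net]
    if not matches:
        return None                      # no route: the packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTING_TABLE[best]

if __name__ == "__main__":
    print(next_hop("10.1.2.3"))   # 192.168.1.2 (most specific /16 route)
    print(next_hop("8.8.8.8"))    # 192.168.1.254 (default route)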
CHAPTER 5 CONCLUSION & FUTURE SCOPE OF THE PROJECT
5.1. CONCLUSIONS: From our study of this project and its applications, we may conclude that:
1. Data replication infrastructure is an efficient way to replicate critical data from a main site to a remote site.
2. It can also be used to back up this data to another location's disk so that it can be used later in crisis situations (such as disasters).
3. Data replication can be used effectively to contain the losses incurred on the data in case of any disaster, and helps to keep those losses to a minimum.
4. Disk mirroring is the most effective type of data replication for replicating data from the source to the destination.
5. In Metro Mirror (synchronous mirroring), the data copying takes place instantaneously.
6. In Global Mirror (asynchronous mirroring), the data copying takes place after a set time interval.
5.2. FUTURE SCOPE OF THIS PROJECT: The future scope of this project depends upon the technological advances due to be made in the various equipment and software currently present in this field. Based upon those advances, the present working conditions, such as the data transfer rate between two subsequent sites and the factors affecting it, may improve, providing a better solution to the problem of disaster management and subsequently saving a large amount of data from loss.
REFERENCES
The following books were consulted during the making of this project:
1. DS8000 Copy Services Environment (IBM Redbooks, ISBN 0738431141)
2. DS8000 Architecture (IBM Redbooks, ISBN 0738435066)
3. DS8000 Performance & Monitoring (IBM Redbooks, ISBN 0738432695)
4. Disaster Recovery Strategies (IBM Redbooks, ISBN 0738426504)
5. IBM AIX Enterprise Edition System Administration Guide (IBM Redbooks, ISBN 0738432903)
6. Other sources: the Internet
APPENDIX
SNAPSHOTS
Fig. 7(d): Hardware Management Console for maintenance of a server column (the OS being used here is IBM's AIX server OS).
Fig. 7(g): Data storage area (covered). This is the area where the data is stored for this particular site (Minto Rd. Exchange in this case); the approximate capacity is around 40 terabytes.
Fig. 7(h): Data storage area in more detail. Each white slip shown here is an individual hard disk with a capacity of 300 gigabytes; together, these HDDs make up the total capacity of 40 terabytes.