Backup and Disaster Recovery
Michael Allen
March 4, 2016
BACKUP AND DISASTER RECOVERY 2
Abstract
Many businesses rely on Disaster Recovery (DR) services to prevent either man-made or natural
disasters from causing expensive service disruptions. Businesses use information technology to
quickly and effectively process information. Employees use electronic mail and Voice over
Internet Protocol (VoIP) telephone systems to communicate. Electronic data interchange (EDI)
is used to transmit data, including orders and payments from one company to another. Servers
process information and store large amounts of data. Desktop computers, laptops and wireless
devices are used by employees to create, process, manage and communicate information.
Unfortunately, current DR services come either at a very high cost or with only weak guarantees
about the amount of data lost or the time required to restart operations after a failure. Recovering
from a failure requires a deliberate backup and restore strategy. A well-designed backup and restore
strategy maximizes data availability and minimizes data loss while considering particular
business requirements. Our society’s growing reliance on crucial computer systems means that
even short periods of downtime can result in significant financial loss, or in some cases even put
Disasters can be classified into two broad categories. The first is natural disasters such as
risk management measures such as avoiding disaster-prone situations and good planning can
help. The second category is man-made disasters, such as hazardous material spills,
these instances, surveillance, testing and mitigation planning are invaluable. Disaster recovery is
primarily a form of long distance state replication combined with the ability to start up
applications at the backup site after a failure is detected. Incomplete recovery time objectives
(RTOs) and recovery point objectives (RPOs) can quickly derail a disaster recovery plan. Every item
in the DR plan requires a defined recovery point objective and recovery time objective, because
failing to define them may lead to significant problems that
extend the disaster’s impact. Once the RTO and RPO metrics have been mapped to IT
infrastructure, the DR planner can determine the most suitable recovery strategy for each system.
The organization ultimately sets the IT budget and therefore the RTO and RPO metrics need to
fit with the available budget. While most business unit heads would like zero data loss and zero
time loss, the cost associated with that level of protection may make the desired high availability
solutions impractical. A cost-benefit analysis often dictates which disaster recovery measures are
implemented.
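The requirement that every item in the DR plan carry both objectives can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the `PlanItem` structure, its field names, and the sample systems are hypothetical and are not taken from any particular DR tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanItem:
    """One system covered by the DR plan (hypothetical structure)."""
    name: str
    rto_minutes: Optional[float] = None  # recovery time objective
    rpo_minutes: Optional[float] = None  # recovery point objective


def incomplete_items(plan: List[PlanItem]) -> List[str]:
    """Return the names of plan items missing an RTO, an RPO, or both."""
    return [
        item.name
        for item in plan
        if item.rto_minutes is None or item.rpo_minutes is None
    ]


# Illustrative data only; the systems and numbers are made up.
plan = [
    PlanItem("order database", rto_minutes=60, rpo_minutes=0),
    PlanItem("email server", rto_minutes=240),  # RPO never defined
    PlanItem("file share"),                     # neither objective defined
]

print(incomplete_items(plan))
```

The check compares against `None` rather than testing truthiness, so an objective of zero (no tolerated data loss) still counts as defined.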
The amount and type of state that is sent to the backup site can vary depending on the
application's needs. State replication can be done at one of three layers: (1) within an application,
(2) per disk or within a file system, or (3) for the full system context. Replication at the
application layer can be the most optimized, only transferring the crucial state of a specific
application. Backup mechanisms operating at the file system or disk layer replicate all or a
portion of the file system tree to the remote site without requiring specific application
knowledge. The use of virtualization makes it possible to transparently replicate not only the
complete disk but also the memory context of a virtual machine, allowing it to resume
operation seamlessly after a failure. However, such a technique is typically designed only for LAN
In general, DR services fall under one of the following categories. A hot backup site
provides a set of mirrored stand-by servers that are always available to run the application once
a disaster occurs, providing minimal Recovery Time Objective (RTO) and Recovery Point
Objective (RPO). Hot backup sites typically use synchronous replication to prevent any data
loss due to a disaster. This form of backup is the most expensive, since fully powered servers must
be available at all times to run the application, and extra licensing fees may apply for some
applications. Warm backup sites may keep state up to date with either synchronous or
asynchronous replication schemes, depending on the necessary RPO. Stand-by servers to run the
application after a failure are available, but are kept in a warm state where it may take minutes to
bring them online. This slows recovery but also reduces cost, although the server resources to run the
application must still be available at all times. At a cold backup site, data is often replicated only on a
periodic basis, leading to an RPO of hours or days. In addition, servers to run the application
after a failure are not readily available, and there may be a delay of hours or days because hardware is
kept in storage until needed. In addition to managing state replication, a DR solution must be
able to detect when a disaster has occurred, perform a failover procedure to activate the backup
site, and run the failback steps necessary to revert control back to the primary data center
The Recovery Point Objective (RPO) of a DR system represents the point in time of the most recent
backup prior to any failure. For some applications absolutely no data can be lost (RPO=0),
requiring continuous synchronous replication, while for other applications the
acceptable data loss could range from a few seconds to hours or even days.
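To make the RPO concrete: the work lost in a failure is whatever was written after the most recent backup that completed before the failure. A minimal Python sketch follows, assuming backups are identified only by their completion times; the function name, the sample timestamps, and the four-hour objective are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def achieved_recovery_point(backups: List[datetime],
                            failure: datetime) -> Optional[timedelta]:
    """Return how much recent work is lost if `failure` strikes.

    The loss window is the gap between the failure and the latest backup
    completed at or before it. Returns None if no usable backup exists.
    """
    usable = [b for b in backups if b <= failure]
    if not usable:
        return None
    return failure - max(usable)


# Illustrative nightly backups at 01:00 (made-up dates).
backups = [datetime(2016, 3, 1, 1, 0), datetime(2016, 3, 2, 1, 0)]
failure = datetime(2016, 3, 2, 15, 30)

loss = achieved_recovery_point(backups, failure)
rpo = timedelta(hours=4)  # assumed objective for this example
print(loss, "meets RPO:", loss is not None and loss <= rpo)
```

With nightly backups the loss window can approach a full day, which is why an RPO of seconds or zero pushes a design toward continuous replication instead of periodic backups.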
limit on how long it can take for an application to come back online after a failure occurs. This
includes the time to detect the failure, prepare any required servers in the backup site (virtual or
physical), initialize the failed application, and perform the network reconfiguration required to
reroute requests from the original site to the backup site so that the application can be used.
Depending on the application type and backup technique, this may involve additional manual
steps such as verifying the integrity of state or performing application-specific data restore
operations, and can require careful scheduling of recovery tasks to be done efficiently.
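The phases listed above can be added up to estimate whether a recovery would fit within the RTO. This is a rough Python sketch that assumes the phases run strictly one after another; the phase names mirror the text, but every duration and the 60-minute objective are invented for illustration.

```python
from typing import Dict


def total_recovery_minutes(phases: Dict[str, float]) -> float:
    """Sum the sequential recovery phases (all durations in minutes)."""
    return sum(phases.values())


# Hypothetical estimates for one application; every number is an assumption.
phases = {
    "detect failure": 5,
    "prepare backup servers": 20,
    "initialize application": 10,
    "reconfigure network": 15,
    "manual integrity checks": 30,
}

rto_minutes = 60  # assumed objective
total = total_recovery_minutes(phases)
print(f"estimated recovery: {total} min, within RTO: {total <= rto_minutes}")
```

Because the sketch treats the phases as sequential, it gives an upper estimate; scheduling independent tasks in parallel is one way the careful scheduling mentioned above can bring the total back under the objective.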
Performance. For a DR service to be useful, it must have a minimal impact on the performance
of each application being protected under failure-free operation. DR can impact performance
either directly, as in synchronous replication, where an application write will not
return until it is committed remotely, or indirectly, by simply consuming disk and network
Consistency. The DR service must ensure that after a failure occurs the application can be
restored to a consistent state. This may require the DR mechanism to be application specific to
ensure that all relevant state is properly replicated to the backup site. In other cases the DR
system may assume that the application will keep a consistent copy of its important state on disk
and use a disk replication scheme to create consistent copies at the backup site. It is important
that the primary and backup sites are geographically separated in order to ensure that a single
disaster will not impact both sites. This geographic separation adds its own challenges since
increased distance leads to higher WAN bandwidth costs and will incur greater network latency.
Increased roundtrip latency directly impacts application response time when using synchronous
replication. Asynchronous techniques can improve performance over longer distances, but can
planning and includes planning for resumption of applications, data, hardware, electronic
(BCP) includes planning for non-IT related aspects such as key personnel, facilities, crisis
communication and reputation protection, and should refer to the disaster recovery plan (DRP)
classified into the following three types: (1) Preventive measures - Controls aimed at preventing
an event from occurring. (2) Detective measures - Controls aimed at detecting or discovering
unwanted events. (3) Corrective measures - Controls aimed at correcting or restoring the system
A business continuity action plan is a document that contains and controls critical
information a business needs to stay running in spite of adverse events. A business continuity
plan is also called an emergency plan. A good business continuity plan should clearly state the
business’s essential functions in writing. An information technology disaster recovery plan (IT
DRP) should be developed in conjunction with the business continuity plan. Priorities and
recovery time objectives for information technology should be developed during the business
applications and data in time to meet the needs of the business recovery. The document should
identify and prioritize which systems and processes must be sustained and provide the necessary
information for maintaining them. A business continuity action plan should include the
following information:
3. Key Contacts.
5. Recovery Locations.
Data replication is the process of copying data from one location to another. Replication
helps an organization possess up-to-date copies of its data in the event of a disaster. Replication can
take place at the host, in the array, or over the network. Replication can take place over a storage
area network, local area network, or wide area network, as well as in a cloud. Cloud
computing platforms are well suited for offering DR as a service due to their use of automated
virtual platforms that can minimize the recovery time after a failure. For disaster recovery (DR)
purposes, replication typically occurs between a primary storage location and a secondary offsite
location. Host-based replication uses servers to copy data from one site to another and is
designed to allow a virtual machine to continue to function in times of disaster. With array-based
replication, compatible storage arrays use built-in software to automatically copy data between
arrays. Network-based data replication requires a switch or appliance between storage arrays and
servers. There are two types of data replication. Synchronous replication takes place in real
time, and asynchronous replication is time delayed. Synchronous replication is preferred for
applications with low recovery time objectives that cannot lose data, but it is more expensive and
creates latency that slows down the primary application. Asynchronous replication is designed
to work over long distances and requires less bandwidth. Because there is a delay in the copy time,
the two copies of data may not always be identical with asynchronous replication. Replication is
often combined with snapshot technology, which allows users to replicate data periodically while
still being able to roll back to a specific point in time for recovery. Deduplication, which
eliminates redundant data, is also frequently combined with replication for DR and backup.
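The trade-off between the two replication types can be put in rough numbers. The Python sketch below is a back-of-the-envelope model only: it assumes light travels through optical fiber at roughly 200 km per millisecond, ignores switching and queuing delay, and uses hypothetical function names and inputs.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in fiber


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


def sync_write_ms(local_write_ms: float, distance_km: float) -> float:
    """A synchronous write returns only after the remote site acknowledges."""
    return local_write_ms + round_trip_ms(distance_km)


def async_data_at_risk_mb(write_rate_mb_s: float, lag_s: float) -> float:
    """Data not yet copied to the secondary site if the primary fails now."""
    return write_rate_mb_s * lag_s


# Illustrative inputs: a 1 ms local write replicated to a site 1000 km away,
# and an asynchronous link that runs 30 s behind a 5 MB/s write stream.
print(sync_write_ms(1.0, 1000))
print(async_data_at_risk_mb(5, 30))
```

Under these assumptions the synchronous write pays the full round trip on every operation, while the asynchronous stream keeps local write speed at the price of a window of unreplicated data.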
Server virtualization is a driver for disaster recovery because virtualization reduces the
number of servers required for a disaster recovery site. Virtual servers are stored as files or
virtual machine (VM) images on the host, and can be moved by copying the VM image file and
booting it on another host while physical servers require the same hardware at the DR site. Tools
for replicating virtual machines include PHD Virtual esXpress, Vizioncore vReplicator or
VMware Site Recovery Manager if your array supports it, or tools built into applications such as
Oracle that replicate data between servers. The cloud also fits with replication, because it can
remove cost and complexity from disaster recovery. It alleviates the need to acquire and manage
an off-site location. Host-based replication is generally the best fit for disaster recovery through
the cloud because array-based and network-based replication require devices at the source and
target locations. Host-based replication lets you move data from standard servers in your
testing. Disaster recovery testing is the only reliable way for an organization to gauge the
effectiveness of its disaster preparedness and data recovery planning. Simply verifying that a
backup can be restored is not enough. There are significant differences between data restoration
and business continuity. Disaster recovery testing covers a range of services. It must
demonstrate the ability to recover data, as well as to quickly return applications, infrastructure
components, and mission-critical systems to an operational state following a disaster. IT professionals
must work to develop an effective DR test plan while establishing criteria for evaluating the
metrics that are gathered during recovery testing. DR testing also allows you to conduct planned
maintenance, offers a training opportunity for staff, and creates awareness within an organization
Prior to selecting a disaster recovery strategy, a disaster recovery planner first refers to
the organization's business continuity plan, which should indicate the key metrics of recovery
point objective (RPO) and recovery time objective (RTO) for various business processes (such as
the process to run payroll, generate an order, etc.). The metrics specified for the business
processes are then mapped to the underlying IT systems and infrastructure that support those
processes.
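One way to carry out this mapping is to give each IT system the tightest objectives of any business process that depends on it. The following Python sketch assumes that rule; the process names echo the examples in the text, while the systems, the dependency map, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical business processes: name -> (RTO hours, RPO hours).
process_objectives: Dict[str, Tuple[float, float]] = {
    "run payroll": (24, 4),
    "generate an order": (2, 0.25),
}

# Hypothetical dependency map: process -> IT systems that support it.
supporting_systems: Dict[str, List[str]] = {
    "run payroll": ["hr database", "file server"],
    "generate an order": ["order database", "file server"],
}


def system_objectives(
    objectives: Dict[str, Tuple[float, float]],
    dependencies: Dict[str, List[str]],
) -> Dict[str, Tuple[float, float]]:
    """Give each system the tightest RTO and RPO of any process using it."""
    result: Dict[str, Tuple[float, float]] = {}
    for process, systems in dependencies.items():
        rto, rpo = objectives[process]
        for system in systems:
            cur_rto, cur_rpo = result.get(system, (rto, rpo))
            result[system] = (min(cur_rto, rto), min(cur_rpo, rpo))
    return result


for name, (rto, rpo) in system_objectives(
        process_objectives, supporting_systems).items():
    print(f"{name}: RTO {rto} h, RPO {rpo} h")
```

In this example the shared file server inherits the order process's stricter objectives even though payroll alone would tolerate a full day of downtime.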
applications and data. This includes networks, servers, desktops, laptops, wireless devices, data
and connectivity. Priorities for IT recovery should be consistent with the priorities for recovery
of business functions and processes that were developed during the business impact analysis. IT
resources required to support time sensitive business functions and processes should also be
identified. The recovery time for an IT resource should match the recovery time objective for the
business function or process that depends on the IT resource. Information technology systems
require hardware, software, data and connectivity. Without one component of the “system,” the
system may not run. Therefore, recovery strategies should be developed to anticipate the loss of
one or more of the following system components: Computer room environment (secure
computer room with climate control, conditioned and backup power supply, etc.). Hardware
(networks, servers, desktop and laptop computers, wireless devices and peripherals).
Some business applications cannot tolerate any downtime. They utilize dual data centers
capable of handling all data processing needs, which run in parallel with data mirrored or
synchronized between the two centers. This is a very expensive solution that only larger
companies can afford. However, there are other solutions available for small to medium sized
Many businesses have access to more than one facility. Hardware at an alternate facility can be
configured to run similar hardware and software applications when needed. Assuming data is
backed up off-site or data is mirrored between the two sites, data can be restored at the alternate
There are vendors that can provide “hot sites” for IT disaster recovery. These sites are fully
configured data centers with commonly used hardware and software products. Subscribers may
provide unique equipment or software either at the time of disaster or store it at the hot site ready
for use. Data streams, data security services and applications can be hosted and managed by
vendors. This information can be accessed at the primary business site or any alternate site using
a web browser. If an outage is detected at the client site by the vendor, the vendor automatically
holds data until the client’s system is restored. These vendors can also provide data filtering and
detection of malware threats, which enhance cyber security.
A backup and restore strategy contains a backup portion and a restore portion. The backup
part of the strategy defines the types and frequencies of backups, the nature and speed of the
hardware that is required for them, how backups are to be tested, and where and how backup
media are to be stored (including security considerations). The restore part of the
strategy defines who is responsible for performing restores and how restores should be
performed to meet the goals for database availability and for minimizing data loss. Backup and
restore operations occur within the context of a recovery model. A recovery model is a database
property that controls how the transaction log is managed. The recovery model also determines
what types of backups and what restore scenarios are supported for the database. Typically, a
database uses either the simple recovery model or the full recovery model. The full recovery
model can be supplemented by switching to the bulk-logged recovery model before a bulk
operation.
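Under the full recovery model, restoring to a point in time generally means applying a full backup, then the latest differential based on it, then the transaction log backups that follow. The Python sketch below illustrates that ordering in simplified form; it does not call any database API, and the `Backup` type, its fields, and the integer timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "diff", or "log"
    finished: int  # completion time, e.g. minutes since some reference point


def restore_sequence(backups: List[Backup], target: int) -> List[Backup]:
    """Pick the backups to restore, in order, to reach `target`.

    Simplified: latest full backup at or before the target, then the latest
    differential after that full, then every later log backup up to and
    including the first one that covers the target time.
    """
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup precedes the target time")
    base = max(fulls, key=lambda b: b.finished)
    sequence = [base]

    diffs = [b for b in backups
             if b.kind == "diff" and base.finished < b.finished <= target]
    if diffs:
        base = max(diffs, key=lambda b: b.finished)
        sequence.append(base)

    logs = sorted((b for b in backups
                   if b.kind == "log" and b.finished > base.finished),
                  key=lambda b: b.finished)
    for log in logs:
        sequence.append(log)
        if log.finished >= target:  # this log contains the target point
            break
    return sequence


# Illustrative backup history (made-up times).
history = [Backup("full", 0), Backup("log", 60), Backup("diff", 120),
           Backup("log", 180), Backup("log", 240), Backup("full", 1440)]
for b in restore_sequence(history, target=200):
    print(b.kind, b.finished)
```

Under the simple recovery model there are no log backups to apply, so the same function would return only the full and differential backups, and recovery would stop at the time of the last one.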
Test the backup and recovery procedures thoroughly before a real failure occurs. Testing
helps ensure that you have the required backups to recover from various failures, that the
procedures are clearly defined and documented, and that they can be executed smoothly and quickly by
any qualified operator. Perform regular database and transaction log backups to minimize the
amount of lost data. Back up both system and user databases. Maintain system logs in a secure
manner. Keep records of all service packs installed for Microsoft Windows and SQL Server. Keep records
of network libraries used and the security mode. A documented copy of the backup and restore
In addition to preparing for the need to recover systems, organizations also implement
precautionary measures with the objective of preventing a disaster in the first place. These may
include: (1) Local mirrors of systems and/or data and use of disk protection technology such as
RAID. (2) Surge protectors to minimize the effect of power surges on delicate electronic
equipment. (3) Use of an uninterruptible power supply (UPS) and/or backup generator to keep
systems going in the event of a power failure. (4) Fire prevention/mitigation systems such as
alarms and fire extinguishers. (5) Anti-virus software and other security measures. Recent
research supports the idea that implementing a more holistic pre-disaster planning approach is
more cost-effective in the long run. Every $1 spent on hazard mitigation (such as a disaster
recovery plan) saves society $4 in response and recovery costs. As IT systems have become
increasingly critical to the smooth operation of a company, and arguably the economy as a
whole, the importance of ensuring the continued operation of those systems, and their rapid
recovery, has increased. For example, of companies that had a major loss of business data, 43%
never reopen and 29% close within two years. As a result, preparation for continuation or
recovery of systems needs to be taken very seriously. This involves a significant investment of
time and money with the aim of ensuring minimal losses in the event of a disruptive event.
If a disaster recovery plan does not already exist, it will be necessary to initiate the
preparation of the first version of such a plan. In order to initiate a planning project for the first
time, the board and/or top level management would normally receive a proposal. Projects as
important as DRP development should be approved at the highest level to ensure that the
required level of commitment, resources and management attention are applied to the process.
The proposal should present the reasons for undertaking the project, and could include some or
Increased dependency by the business over recent years on computerized production and
sales delivery mechanisms, thereby creating increased risk of loss of normal services.
systems.
Increased recognition of the impact that a serious incident could have on the business.
Need to develop effective backup and recovery strategies to mitigate the impact of
disruptive events.
It should not be forgotten that data backup and recovery are not the same. For one thing,
the backup software can fail, or the person responsible for backing up the data can fail. Backing
up data without recovery in mind is tantamount to not backing up the data at all. Other
steps have to be taken in order to restore the data successfully when it is needed. These
include assembling the right recovery environment (the right operating system, servers, and
storage) and the right people, procedures, and tools to bring back the backed-up data. Data
has to be backed up as if it will absolutely be needed one day. From a
backup perspective, the main concern is not restoration but backing up data as quickly as
possible. Getting a secure copy of the data backed up at an offsite location is only the first step of
disaster recovery. The second step requires having the right recovery system connected to the data,
which means having the right servers, storage, hypervisors, and operating system in the
recovery environment. In short, the recovery environment needs to reflect the production
environment. This is not an easy step, as there are many changes that occur daily in the
production environment that IT staff are frequently too busy to capture. The last step is having
the right people, processes, and tools available to recover when they are needed. All of
this is to say that data backup and disaster recovery are not the same, but both are necessary for
long-term business technology resiliency. Having a recovery mindset is a necessity, which
means backing up data according to the recovery strategy, connecting the right recovery systems to
the properly backed-up data, and creating a programmatic approach to recovery by putting in place
the right people, processes, and tools, and making sure that they are all available at
In response to the changing threat landscape, Network Intrusion Prevention Systems (IPS) were
developed to provide advanced protection beyond that offered by firewalls and Intrusion
Detection Systems (IDS). Firewalls and IDS provide security but do not provide the kind of
protection that an IPS provides. IPS is a technology that provides security for computer systems
with features that are effective against threats in their advanced stages. An IPS can
detect both known and unknown attacks. It is also a network security device that
monitors network and/or system activities for unwanted behavior and can react to prevent
these activities. An IPS is considered an important component in any IT system defense. It
protects against denial of service (DoS) attacks and prevents intrusions that target software
typically claim an IP address, but can respond directly to any traffic in a variety of ways.
because both monitor network traffic and/or system activities for malicious activity. The main
difference between the two systems is that, unlike an IDS, an IPS is placed in-line and is able to actively
prevent and block intrusions that are detected. More specifically, an IPS can take such actions as
sending an alarm, dropping malicious packets, resetting the connection, and/or blocking traffic
from the offending IP address. An IPS can also correct Cyclic Redundancy Check (CRC) errors,
defragment packet streams, prevent TCP sequencing issues, and clean up unwanted transport and
network layer options. IPS can be classified into four different types. Network-based intrusion
prevention systems (NIPS) monitor the entire network for suspicious traffic by analyzing
protocol activity. Wireless intrusion prevention systems (WIPS) monitor a wireless network
for suspicious traffic by analyzing wireless network protocols. Network behavior analysis
(NBA) examines network traffic to identify threats that generate unusual traffic flows, such as
distributed denial of service (DDoS) attacks, certain forms of malware, and policy violations.
A host-based intrusion prevention system (HIPS) is an installed software package that monitors
a single host for suspicious activity by analyzing events occurring within that host. The
majority of intrusion prevention systems utilize one of three detection methods. Signature-based
detection monitors packets in the network and compares them with pre-configured
and pre-determined attack patterns known as signatures. Statistical anomaly-based detection
determines the normal network activity, such as what sort of bandwidth is generally used, what
protocols are used, and what ports and devices generally connect to each other, and alerts the
administrator or user when anomalous traffic is detected. Stateful protocol
analysis detection identifies deviations of protocol states by comparing
observed events with predetermined profiles of generally accepted definitions of benign activity.
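The first two detection methods can be illustrated with a toy example. The Python sketch below is not how any production IPS is implemented: the signature patterns are hypothetical stand-ins, the anomaly test is a plain z-score against a bandwidth baseline, and stateful protocol analysis is left out because it needs a model of each protocol.

```python
from statistics import mean, stdev
from typing import Iterable, List

# Hypothetical byte patterns standing in for real attack signatures.
SIGNATURES = {
    "path traversal": b"../../",
    "sql injection probe": b"' OR '1'='1",
}


def match_signatures(payload: bytes) -> List[str]:
    """Signature-based detection: names of known patterns found in payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(baseline: Iterable[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Statistical anomaly detection using a simple z-score test.

    Flags `observed` when it lies more than `threshold` standard deviations
    from the mean of the baseline samples (e.g. bandwidth in Mbit/s).
    Needs at least two baseline samples.
    """
    samples = list(baseline)
    spread = stdev(samples)
    if spread == 0:
        return observed != samples[0]
    return abs(observed - mean(samples)) / spread > threshold


# Illustrative traffic; all values are made up.
normal_bandwidth = [10, 12, 11, 9, 10, 11, 12, 10]
print(match_signatures(b"GET /../../etc/passwd HTTP/1.1"))
print(is_anomalous(normal_bandwidth, 50))
```

The contrast matches the text: signature matching only catches patterns it already knows, whereas the anomaly test can flag unfamiliar traffic but depends on a representative baseline and a sensible threshold.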
IPS is a very effective technique to protect databases and networks from unauthorized users.
Like other developments it has its limitations, but the limitations are heavily outweighed by the
advantages. Combining network and host IPS techniques to protect databases and networks
creates a robust defense. Combining IPS, IDS, and firewall technologies will
provide a strong line of defense that can protect systems against a wide range of attacks. It is an
For the medical record company disaster recovery plan, I would have a generator in place
in case of power outages. I would use external hard drives and written logs as on-site backup,
with written procedures to follow, and an off-site cloud service as the major backup database in
case of a severe disaster. Initial backup and recovery information would include an employee
contact list, key supplier/vendor information, key contacts, a prioritized list of critical business
functions, recovery locations, copies of essential records, critical telephone numbers, critical
company’s computer equipment and software, list of communication venues, and the technical
aspects of the recovery procedure. This would be done at all four locations, with the major backup
system located at an off-site cloud storage service. All four offices would have an in-house IT technician
who would be responsible for backing up the database system regularly and who would also be
comfortable with the disaster recovery procedure. There would also be two other company
employees with knowledge of the procedures that need to be followed to obtain a
successful database recovery in case the technician is absent. This disaster recovery plan
I chose external hard drives as the backup medium because, in the
event of a small natural disaster, the hardware, software, and personnel will all be readily
available. This is all the office really needs. A cloud platform will be in place in case of a major
disaster. Just as the office does not need a generator the same size as a hospital's, an elaborate backup
system would not be cost-effective, and perhaps not time-effective either, since going through complicated
recovery procedures would only make the recovery time objective harder to achieve. Each office
would have a database with updated backup information and recovery procedures for all four of
the offices. This would not be a hard task to accomplish when there is an employee hired
specifically for the task. I would use a firewall, an IDS, and an IPS
combined for protection from intrusion threats brought on by hackers, other cyber threats, and
employee mistakes. This is an area that has to be given priority because the database holds
personally identifiable information on employees and clients, which carries legal liabilities that could
be just as damaging to the company. By the time recovery is under consideration, personal and
financial information might already be in the wrong hands. The damage to everyone involved
would be tremendous and very time-consuming to repair. A rule of thumb
when creating a disaster recovery plan is to implement a plan compatible with the business's
needs.