How Does A Disaster Recovery (DR) Plan Work?
Generally, components one through three do not touch upon IT infrastructure at all. The
incident management plan typically establishes procedures and a structure to address cyber
attacks against IT systems during normal times, so it does not deal with the IT infrastructure
during disaster recovery. For this reason, the disaster recovery plan is the only component of the
BCP of interest to IT.
Among the first steps in developing such a plan is a business impact analysis, during
which the team should establish IT priorities and recovery time objectives. The team should then
develop technology recovery strategies that restore applications, hardware, and data in time to
meet business recovery needs.
Every situation is unique and there is no single correct way to develop a disaster recovery plan.
However, most DRPs are built around the same core components:
Goals
A statement of goals will outline what the organization wants to achieve during or after a
disaster, including the recovery time objective (RTO) and the recovery point objective (RPO).
The recovery point objective refers to how much data (in terms of the most recent changes) the
company is willing to lose after a disaster occurs. For example, an RPO might be to lose no more
than one hour of data, which means data backups must occur at least every hour to meet this
objective.
Recovery time objective or RTO refers to the acceptable downtime after an outage before
business processes and systems must be restored to operation. For example, the business must be
able to return to operations within 4 hours in order to avoid unacceptable impacts to business
continuity.
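The two objectives above can be checked with simple arithmetic. The following is a minimal sketch, assuming the one-hour RPO and four-hour RTO from the examples; the function names are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def backup_interval_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """In the worst case, everything since the last backup is lost,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo


def downtime_meets_rto(downtime: timedelta, rto: timedelta) -> bool:
    """An outage is acceptable only if service is restored within the RTO."""
    return downtime <= rto


# Example objectives taken from the text: RPO of 1 hour, RTO of 4 hours.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

print(backup_interval_meets_rpo(timedelta(minutes=30), rpo))  # True
print(backup_interval_meets_rpo(timedelta(hours=2), rpo))     # False
print(downtime_meets_rto(timedelta(hours=3), rto))            # True
```

Backups every 30 minutes satisfy a one-hour RPO, while backups every two hours do not.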
Personnel
Every disaster recovery plan must detail the personnel who are responsible for the execution of
the DR plan, and make provisions for individual people becoming unavailable.
IT inventory
An updated IT inventory must list the details about all hardware and software assets, as well as
any cloud services necessary for the company’s operation, including whether or not they are
business critical, and whether they are owned, leased, or used as a service.
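One way to keep such an inventory consistent is to record every asset in the same structure. The sketch below is only an illustration: the field names and the sample assets are assumptions, not part of any standard inventory format.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    OWNED = "owned"
    LEASED = "leased"
    SERVICE = "used as a service"


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str                # "hardware", "software", or "cloud service"
    business_critical: bool
    ownership: Ownership


def critical_assets(inventory):
    """Return only the assets the business cannot operate without."""
    return [asset for asset in inventory if asset.business_critical]


# Hypothetical sample entries, for illustration only.
inventory = [
    Asset("db-server-01", "hardware", True, Ownership.OWNED),
    Asset("office-printer", "hardware", False, Ownership.LEASED),
    Asset("payroll-saas", "cloud service", True, Ownership.SERVICE),
]

for asset in critical_assets(inventory):
    print(asset.name, asset.ownership.value)
```

Filtering on the business-critical flag gives the recovery team a short list of what to restore first.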
Backup procedures
The DRP must set forth how each data resource is backed up – exactly where, on which devices
and in which folders, and how the team should recover each resource from backup.
Restoration procedures
Finally, follow best practices to ensure a disaster recovery plan includes detailed restoration
procedures for recovering from a loss of full systems operations. In other words, every detail to
get each aspect of the business back online should be in the plan, even if you start with a disaster
recovery plan template. Here are some procedures to consider at each step.
Include not just objectives such as the results of risk analysis and RPOs, RTOs, and SLAs, but
also a structured approach for meeting these goals. The DRP must address each type of
downtime and disaster with a step-by-step plan, including data loss, flooding, natural disasters,
power outages, ransomware, server failure, site-wide outages, and other issues. Be sure to enrich
any IT disaster recovery plan template with these critical details.
Create a list of IT staff including contact information, roles, and responsibilities. Ensure each
team member is familiar with the company disaster recovery plan before it is needed, and that
each has the access levels and passwords necessary to meet their responsibilities. Always
designate alternates for any emergency, even if you think your team can’t be affected.
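The staff list and its alternates can be sketched as a small data structure. This is a hedged illustration only: the class names, the availability flag, and the sample contacts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    phone: str
    available: bool = True


@dataclass
class RoleAssignment:
    role: str
    primary: Contact
    alternates: List[Contact] = field(default_factory=list)

    def responder(self) -> Optional[Contact]:
        """Return the first available person, trying the primary first."""
        for person in [self.primary, *self.alternates]:
            if person.available:
                return person
        return None


def roles_without_alternates(roster: List[RoleAssignment]) -> List[str]:
    """List roles that would be left uncovered if the primary is unavailable."""
    return [assignment.role for assignment in roster if not assignment.alternates]


# Hypothetical roster entries with placeholder phone numbers.
roster = [
    RoleAssignment(
        "Backup restoration",
        Contact("A. Primary", "555-0100", available=False),
        [Contact("B. Alternate", "555-0101")],
    ),
    RoleAssignment("Network recovery", Contact("C. Primary", "555-0102")),
]

print(roster[0].responder().name)        # B. Alternate
print(roles_without_alternates(roster))  # ['Network recovery']
```

Running a check like `roles_without_alternates` whenever the roster changes helps catch roles that depend on a single person.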
Address business continuity planning and disaster recovery by providing details about
mission-critical applications in your DRP. Include accountable parties for both troubleshooting
any issues and ensuring operations are running smoothly. If your organization will use cloud
backup services or disaster recovery services, the plan should include the vendor name and
contact information, along with a list of authorized employees who can request support during a
disaster. Ideally, the vendor and organizational contacts should know of each other.
Media communication best practices are also part of a robust disaster recovery and business
continuity plan. A designated public relations contact and media plan are particularly useful to
high profile organizations, enterprises, and users who need 24/7 availability, such as government
agencies or healthcare providers. Look for disaster recovery plan examples in your industry or
vertical for specific best practices and language.
Beyond the clear benefit of improved business continuity under any circumstances, having a
company disaster recovery plan can help an organization in several other important ways.
Cost-efficiency
Disaster recovery plans include various components that improve cost-efficiency. The most
important elements include prevention, detection, and correction, as discussed above.
Preventative measures reduce the risks from man-made disasters. Detection measures are
designed to quickly identify problems when they do happen, and corrective measures restore lost
data and enable a rapid resumption of operations.
Increased productivity
A disaster recovery plan demands that specific roles, responsibilities, and accountability be
designated, which increases the effectiveness and productivity of your team. It also ensures
redundancy in personnel for key tasks, improving productivity on sick days and reducing the costs
of turnover.
Compliance
Enterprise business users, financial markets, healthcare patients, and government entities all rely
on availability, uptime, and the disaster recovery plans of important organizations. These
organizations in turn rely on their DRPs to stay compliant with industry regulations such as
HIPAA and FINRA.
Scalability
Planning disaster recovery allows businesses to identify innovative solutions to reduce the costs
of archive maintenance, backups, and recovery. Cloud-based data storage and related
technologies enhance and simplify the process and add flexibility and scalability.
The disaster recovery planning process can reduce the risk of human error, eliminate superfluous
hardware, and streamline the entire IT process. In this way, the planning process itself becomes
one of the advantages of disaster recovery planning, streamlining the business, and rendering it
more profitable and resilient before anything ever goes wrong.
Risk assessment
First, perform a risk assessment and business impact analysis (BIA) that addresses many
potential disasters. Analyze each functional area of the organization to determine possible
consequences, from middle-of-the-road scenarios to “worst-case” situations such as total loss of
the main building. Robust disaster recovery plans set goals by evaluating risks up front, as part of
the larger business continuity plan, to allow critical business operations to continue for customers
and users as IT addresses the event and its fallout.
Consider infrastructure and geographical risk factors in your risk analysis. For example, the
ability of employees to access the data center in case of a natural disaster, whether or not you use
cloud backup, and whether you have a single site or multiple sites are all relevant here. Be sure
to include this information, even if you’re working from a sample disaster recovery plan.
Determine the recovery point objective (RPO), or the point in time back to which you must
recover the application. This is essentially the amount of data the organization can afford to lose.
Assess any service level agreements (SLAs) that your organization has promised to users,
executives, or other stakeholders.
Next, collect the data and materials the plan depends on:
lists (critical contact information list, backup employee position listing, master vendor
list, master call list, notification checklist)
inventories (communications equipment, data center computer hardware,
documentation, forms, insurance policies, microcomputer hardware and software,
office equipment, off-site storage location equipment, workgroup hardware, etc.)
schedules for software and data files backup/retention
procedures for system restore/recovery
temporary disaster recovery locations
other documentation, inventories, lists, and materials
Organize and use the collected data in your written, documented plan.
Finally, test the plan based on the criteria and procedures. Conduct an initial dry run or structured
walk-through test, ideally outside normal operational hours, and correct any problems it reveals.
Types of business disaster recovery plan tests include disaster recovery plan checklist tests, full
interruption tests, parallel tests, and simulation tests.
RPO vs RTO
The recovery point objective, or RPO, refers to how much data (in terms of the most
recent changes) the company is willing to lose after a disaster occurs. For example, an RPO
might be to lose no more than one hour of data, which means data backups must occur at least
every hour to meet this objective.
The RPO answers this question: “How much data could be lost without significantly impacting
the business?”
Example: If the RPO for a business is 20 hours and the last available good copy of data after an
outage is 18 hours old, we are still within the RPO’s parameters.
Recovery time objective or RTO refers to the acceptable downtime after an outage before
business processes and systems must be restored to operation. For example, the business must be
able to return to operations within 4 hours in order to avoid unacceptable impacts to business
continuity.
In other words, the RTO answers the question: “How much time after notification of business
process disruption should it take to recover?”
To compare RPO and RTO, consider that RPO means a variable amount of data that
would need to be re-entered after a loss or would be lost altogether during network downtime. In
contrast, RTO refers to how much real time can elapse before the disruption unacceptably
impedes normal business operations.
It is important to expose the gap between the actuals and the objectives set forth in the disaster
recovery plan. Only business disruption and disaster rehearsals can expose the actuals,
specifically the Recovery Point Actual (RPA) and Recovery Time Actual (RTA). Closing the gaps
between these actuals and the objectives keeps the plan realistic.
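The comparison of actuals against objectives can be sketched as follows. This is an illustrative example under stated assumptions: the RPO and RPA reuse the 20-hour and 18-hour figures from the example above, the RTO is the four-hour figure from the text, and the six-hour RTA is a made-up rehearsal result. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RehearsalResult:
    rpo: timedelta  # recovery point objective from the plan
    rpa: timedelta  # recovery point actual measured in the rehearsal
    rto: timedelta  # recovery time objective from the plan
    rta: timedelta  # recovery time actual measured in the rehearsal

    def rpo_gap(self) -> timedelta:
        """Positive when more data was lost than the plan allows."""
        return self.rpa - self.rpo

    def rto_gap(self) -> timedelta:
        """Positive when recovery took longer than the plan allows."""
        return self.rta - self.rto

    def meets_objectives(self) -> bool:
        return self.rpa <= self.rpo and self.rta <= self.rto


result = RehearsalResult(
    rpo=timedelta(hours=20),
    rpa=timedelta(hours=18),
    rto=timedelta(hours=4),
    rta=timedelta(hours=6),  # assumed value for illustration
)

print(result.rpo_gap().total_seconds() / 3600)  # -2.0
print(result.rto_gap().total_seconds() / 3600)  # 2.0
print(result.meets_objectives())                # False
```

In this sketch the recovery point is within its objective, but recovery took two hours longer than the RTO allows, so the rehearsal exposes a gap that the plan needs to address.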
Information technology systems require connectivity, data, hardware, and software. The entire
system may fail due to a single component, so recovery strategies should anticipate the loss of
one or more of these system components.
Vendors can host and manage applications, data security services, and data streams, enabling
access to information via web browser at the primary business site or other sites. These vendors
can typically enhance cybersecurity because their ongoing monitoring for outages offers data
filtering and detection of malware threats. If the vendor detects an outage at the client site, they
hold all client data automatically until the system is restored. In this sense, the cloud is essential
to security planning and disaster recovery.