
15.5 – IT Disaster Recovery Policy


Responsible Executive: Chief Information Officer, WCM
Original Issued: July 1, 2010
Last Updated: November 14, 2019

Policy Statement
This policy defines acceptable methods for disaster recovery planning, preparedness, management, and mitigation for IT
systems and services, including any information system operated on behalf of Weill Cornell Medicine.

Reason for Policy


The disaster recovery standards in this policy provide a systematic approach for safeguarding the vital technology and
data managed by the Information Technologies and Services (ITS) Department. This policy provides a framework for the
management, development, implementation, and maintenance of a disaster recovery (DR) program for ITS-managed systems
and services that use WCM data, regardless of which entity uses them.

Entities Affected by this Policy


The Weill Cornell Medical College and Graduate School of Medical Sciences

Who Should Read this Policy


All individuals responsible for configuring, maintaining, and monitoring information systems on behalf of Weill Cornell
Medicine. Individuals may include Weill Cornell Medicine faculty, staff, vendors, contractors, or managed service providers.

Web Address of this Policy


https://its.weill.cornell.edu/policies/

Contacts
Direct any questions about this policy, 15.5 – IT Disaster Recovery Policy, to Brian J. Tschinkel, Chief Information Security
Officer, using one of the methods below:

• Office: (646) 962-2768


• Email: [email protected]

15.5 – IT Disaster Recovery Policy

Contents
1. Definitions
2. Overview
2.01 Governance
2.02 Program Development
2.03 Emergency Management
2.04 Budgeting
3. Implementation
3.01 Plan Objective
3.02 Storage
3.03 Plan Attributes
4. Maintenance
5. Additional Resources
5.01 Services Tier Mapping
5.02 Hosting Model
5.03 Shared Responsibility Representative Model
5.04 Capability Review and Risk Assessment


1. Definitions
These definitions apply to terms as they are used in this policy.

• WCM: Weill Cornell Medicine

• ITS: Information Technologies & Services Department

• Business Continuity: the complementary process to DR, which focuses on the business-process and people aspects of
recovery. These plans are stored in the Ready tool.

• Business Impact Analysis (BIA): the process that identifies critical business functions, sets priorities, and determines
the impact on the organization if those functions are not performed for a specified period

• Capability Assessment (CA): an ITS assessment of the estimated recovery time of critical services

• Disaster Recovery (DR): a set of policies, tools, and procedures that enable the recovery or continuation of vital
technology systems following a disaster

• Emergency Management Team (EMT): a WCM cross-functional response team that manages potential and/or actual
large-scale outages; a published Incident Management Procedure governs the activities of this team

• Information System: any system or service that transports, processes, and/or stores WCM data

• Recovery Time Objective (RTO): the maximum amount of time an institution can tolerate the loss of an application or,
conversely, how quickly an application must be restored to working order in the event of a disaster

• Recovery Point Objective (RPO): the maximum amount of data loss an institution can tolerate for a given application in
the event of a disaster

• Recovery Tier Chart: a ranking of IT services by the recovery requirements the business defines during the business
impact analysis process

• Risk Assessment (RA): the initial step of risk management, which analyzes the value of IT assets to the business,
identifies threats to those assets, and evaluates how vulnerable each asset is to those threats

• Service Manager: the owner of a service as defined by one of the users
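To make the RTO and RPO definitions concrete, here is a minimal Python sketch. It assumes that the data-loss window is measured from the last usable backup or replica to the moment of the disaster, and that downtime runs from the disaster until the application is back in working order. The function name `meets_objectives` and the example timestamps are hypothetical and are not part of any WCM tool.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup: datetime, disaster: datetime, restored: datetime,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Hypothetical check of one recovery event against an application's RPO and RTO.

    Assumption: data loss = time between the last usable backup and the disaster;
    downtime = time between the disaster and restoration to working order.
    """
    data_loss = disaster - last_backup
    downtime = restored - disaster
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,  # lost data stays within the tolerated window
        "rto_met": downtime <= rto,   # application restored within the tolerated time
    }


# Hypothetical example: nightly backup at 02:00, outage at 09:30, service restored at 20:00.
result = meets_objectives(
    last_backup=datetime(2019, 11, 14, 2, 0),
    disaster=datetime(2019, 11, 14, 9, 30),
    restored=datetime(2019, 11, 14, 20, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=24),
)
print(result["rpo_met"], result["rto_met"])  # True True
```

Here 7.5 hours of data are lost and the outage lasts 10.5 hours, so both 24-hour objectives are met; a 4-hour RTO for the same event would not be.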


2. Overview
The IT Disaster Recovery Program (“Program”) is a continuous lifecycle consisting of governance, implementation, and
maintenance of the disaster recovery program and plan.

2.01 Governance
All ITS-managed systems must comply with WCM disaster recovery policies and requirements. The Program is
responsible for coordination and project management, including, but not limited to, reporting the status of planning, testing,
and auditing activity to the IT Disaster Recovery Governance Committee at least twice per year.

The IT Disaster Recovery Governance Committee is responsible for ensuring adequate financial, personnel, and other
resources are available as deemed appropriate. The Program will review, update, and coordinate testing of the Policy at
least every other year. All modifications must be approved by the IT Disaster Recovery Governance Committee and the
Information Security and Privacy Advisory Committee (ISPAC).

2.02 Program Development


The Program addresses the protection and recovery of WCM IT services so that critical operations and services are
recovered in a timeframe that ensures the survivability of WCM and is commensurate with customer obligations, business
necessities, industry practices, and regulatory requirements.

The Plan must be developed, tested, and maintained to support the objectives of the Program, and the Plan should include
relevant IT infrastructure, computer systems, network elements, and applications.

At minimum, the Program and Plan must be updated after a significant organizational change or after the plans are used in
response to a disruptive event, and must otherwise be reviewed annually.

The Program includes business impact analyses to identify the critical business processes, determine standard recovery
timeframes, and establish the criticality ratings for each. The results and metrics must be agreed upon by the IT Disaster
Recovery Governance Committee. These analyses are required to be updated at least every other year.

The Program also includes capability assessments (CA), performed at least every other year, to determine the department’s
capacity to recover the critical IT services that support defined critical business processes and recovery objectives.

The Program maintains the Recovery Tier Chart, which defines the Recovery Time Objectives (RTO) and Recovery Point
Objectives (RPO) of all ITS-managed systems. The Service Managers are required to prioritize their IT processes and
associated assets based upon the potential detrimental impacts to the defined critical business processes.

Lastly, the Program creates disaster recovery plans for the IT portion of critical business processes, including the
supporting services, systems, and assets. These IT services, systems, and assets must be prioritized based upon results of
the business impact analysis and ranked according to their Recovery Time Objectives and Recovery Point Objectives. The
Program must include risk assessments, performed at least every other year, to determine threats to disaster recovery and
their likelihood of impacting the IT infrastructure. For each risk or vulnerability identified in the risk assessment, a
mitigation or preventive solution must be identified. The Program must include a change management and quality assurance
process.

2.03 Emergency Management


The Program will oversee IT disaster recovery-related activities in the event of an emergency (i.e., an unplanned outage
where RTO is in jeopardy). The Program should provide input to the institution’s emergency management team.

Each institution’s IT division must develop and maintain a documented emergency plan, including notification procedures.
The emergency plan shall account for its associates when a building evacuation is ordered. Supervisory personnel are
responsible for accounting for the associates they supervise.


The Program requires that a post-mortem report documenting outages and recovery responses be completed within 45
days after the occurrence of an event.

2.04 Budgeting
Budgeting for disaster recovery efforts must be informed annually by requirements gathered in the business impact
analysis and capability assessment as well as the ITS budgeting process.

The Program will track and report on planned and unplanned outage spending related to the recovery and restoration
effort. During an outage, the Program may incur special recovery and restoration costs that are unbudgeted. For a short
outage, these costs would be immaterial; for a longer outage, they could be significant.

3. Implementation
3.01 Plan Objective
The Plans must address the following areas: business impact analysis; data backup and recovery; business resumption;
administration and organization responsibilities; emergency response and operations; training and awareness; testing;
recovery point objectives (RPO); and, recovery time objectives (RTO).

Technological solutions for data availability, data protection, and application recovery must be evaluated using data
gathered in the business impact analysis and capability assessment.

3.02 Storage
The Plans must be stored in a single, central, comprehensive application that is accessible by plan owners and key
stakeholders in the event of an emergency.

All backup data must be labeled, logged, and available for use during an emergency within stated recovery time
objectives. A documented decision-making process will be used to determine what subset of backup data will be
additionally encrypted and stored off-site in a secured location outside the geographical area of the systems it backs up.

3.03 Plan Attributes


The Plans must address an outage that could last for up to six (6) weeks. Each Plan must identify risk exposure and
either accept the risk or propose mitigation solution(s).

Backup strategies must comply with predefined business continuity requirements, including defined recovery time and
recovery point objectives. Backup strategies must be reviewed at least every other year. Recovery strategies must meet
recovery objectives defined in accordance with disaster recovery tiers.

Approved recovery strategies must be tested to ensure they meet required recovery time and recovery point objectives.
Recovery strategies must be implemented within a previously agreed upon period, generally not more than 180 days after
management approval.

The Program will provide training and awareness activities on the Plan at least twice per year.

4. Maintenance
Several activities are required to maintain the Plans. Plan owners must ensure that plans contain current and accurate
information. Revisions must be completed within 60 days after a test is completed. To ensure effectiveness, the Plans
must be integrated into all phases of the IT system life cycle.

Tests that demonstrate recoverability commensurate with the documented Plans must be conducted regularly and when
warranted by changes in the business and/or information systems environment.


Backup media supporting critical business processes must be tested semi-annually. Reviews are required within 60 days
after a test to correct exposed deficiencies.
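The fixed turnaround periods in this policy (a post-mortem report within 45 days of an event, and plan revisions and deficiency reviews within 60 days of a test) reduce to simple date arithmetic. The Python sketch below assumes these periods are counted in calendar days, which the policy does not state explicitly; the helper names are hypothetical.

```python
from datetime import date, timedelta

# Day counts taken from this policy; calendar-day counting is an assumption.
POST_MORTEM_DAYS = 45      # post-mortem report after an outage event
POST_TEST_DAYS = 60        # plan revisions / deficiency reviews after a test


def post_mortem_due(event_date: date) -> date:
    """Hypothetical helper: latest date for the post-mortem report."""
    return event_date + timedelta(days=POST_MORTEM_DAYS)


def post_test_deadline(test_date: date) -> date:
    """Hypothetical helper: latest date for plan revisions and deficiency reviews."""
    return test_date + timedelta(days=POST_TEST_DAYS)


print(post_mortem_due(date(2019, 11, 14)))    # 2019-12-29
print(post_test_deadline(date(2019, 3, 1)))   # 2019-04-30
```

If WCM counts business days instead, the `timedelta` arithmetic would need to skip weekends and holidays.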

The following maintenance activities must be conducted annually:

• Updating the documented Plan

• Reviewing the Plan objectives and strategy

• Updating the internal and external contacts lists

• Conducting a simulation/desktop exercise

• Conducting an application recovery test

• Verifying the alternate site technology

• Verifying the hardware platform requirements

• Submitting a DR Status and Recoverability Report

IT managers are responsible for briefing staff on their roles and responsibilities related to DR planning, including
developing, updating, and testing plans.

5. Additional Resources
5.01 Services Tier Mapping
Each tier below is listed with its recovery time period, acceptable data loss, and typical technical solution.

• Tier 0: Immediate (Active/Active)
  Data Loss: Generally synchronous (or semi-synchronous) data replication with no or minimal data loss (PoF).
  Technical Solution: Redundant remote clustering or load balancing and synchronous replication; transparent or
  near-transparent recovery.

• Tier 1: < 24 hours (Active/Passive)
  Data Loss: Generally asynchronous data replication/snapshot or some other periodic copy function; therefore, some data
  loss is acceptable.
  Technical Solution: Recovery within minutes or hours on a hot/warm standby server; requires some manual intervention to
  invoke, coupled with some form of data replication.

• Tier 2: < 72 hours (Disk/Tape Restore)
  Data Loss: Generally asynchronous data replication/snapshot or some other periodic copy function; therefore, some data
  loss is acceptable.
  Technical Solution: May be coupled with a hot site or mobile-type solution to provide recovery within days. The volume of
  data required for recovery may push this into a mirroring requirement to meet the RTO.

• Tier 3: < 1 week (Disk/Tape Restore)
  Data Loss: Generally asynchronous data replication/snapshot or some other periodic copy function; therefore, some data
  loss is acceptable.
  Technical Solution: May be coupled with a hot site or mobile-type solution to provide recovery within 1 week.

• Tier 4: Deferrable
  Data Loss: Generally, recovery from the last captured backup; data loss is acceptable.
  Technical Solution: Deferrable applications can be recovered on an as-needed basis, i.e., a complete build solution.
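One way to read the Services Tier Mapping is as a lookup from a service's RTO to a tier. The Python sketch below treats each tier's time period as an upper bound on recovery time (Tier 0 "Immediate" as 0 hours, Tier 4 "Deferrable" as unbounded) and picks the least demanding tier that still meets the RTO. That selection rule and the name `tier_for_rto` are assumptions for illustration; under this policy, tiers are actually assigned through the business impact analysis.

```python
from typing import Optional

# Assumed upper bound on recovery time, in hours, for each tier in the mapping above.
# Tier 0 ("Immediate") is treated as 0 hours; Tier 4 ("Deferrable") has no bound.
TIER_MAX_RECOVERY_HOURS = {0: 0, 1: 24, 2: 72, 3: 24 * 7}


def tier_for_rto(rto_hours: Optional[float]) -> int:
    """Hypothetical rule: least demanding tier whose recovery window fits within the RTO."""
    if rto_hours is None:
        return 4  # no business-defined recovery deadline: deferrable
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    # A tier qualifies if its worst-case recovery time does not exceed the RTO.
    eligible = [tier for tier, bound in TIER_MAX_RECOVERY_HOURS.items() if bound <= rto_hours]
    return max(eligible)  # never empty: Tier 0 qualifies for any non-negative RTO


print(tier_for_rto(48))  # 1
```

A 48-hour RTO maps to Tier 1 rather than Tier 2, because Tier 2 only promises recovery within 72 hours.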

5.02 Hosting Model

Note: ITS manages the security and privacy segments that transport, process, and/or store WCM data.


5.03 Shared Responsibility Representative Model

5.04 Capability Review and Risk Assessment


It is important to periodically vet all ITS and non-ITS service providers on their continuity practices so that WCM data is not
at risk. This assessment questionnaire, derived from this policy, is designed to identify any areas of vulnerability.
The completed assessment must be shared with the Program for final resolution.

Each category below lists its assessment questions, followed by the policy sections it references.

• Recovery Strategy (Reference Sections: 1.02, 2.01, 2.03, 3)
  What is your recovery strategy?
  Are these strategies documented in your contingency plan?

• Program Policy (Reference Sections: 1.01)
  What drives your continuity program, i.e., a DR policy or procedure?

• Testing and Audit (Reference Sections: 1.02, 2.01, 3)
  What is your commitment to DR testing?
  Is the DR program audited?

• Data Recovery (Reference Sections: 2.01, 3)
  What are your data backup procedures and storage practices?

• Notification and Escalation (Reference Sections: 1.03, 2.01)
  What is your documented notification and escalation protocol?
  What is your declaration process?
  How frequently are updates provided during a disaster?

• Critical Functions Recovery Plans (Reference Sections: 1.02)
  Do functional plans exist?

• Teams and Roles & Responsibilities (Reference Sections: 1.03)
  What team structure supports your DR program?

• Contact Information (Reference Sections: 3)
  Does your DR plan identify contact information for key personnel, etc.?

• Risk Assessment (Reference Sections: 1.02)
  Do you perform a risk assessment?
  How frequently is it performed?
  What are your top risk concerns?

• Supplier DR Program Information (Reference Sections: 1.02)
  Which suppliers do you heavily depend on?
  Are you checking their DR programs?
