Disaster Recovery Audit


Audit of

IMS Disaster
Recovery Plan
Internal Audit
378-1-615

April 29, 2009


FINAL REPORT

TABLE OF CONTENTS

EXECUTIVE SUMMARY

1.0 INTRODUCTION
2.0 AUDIT OBJECTIVES AND SCOPE
3.0 AUDIT APPROACH AND METHODOLOGY
4.0 AUDIT FINDINGS AND RECOMMENDATIONS
4.1 MANAGEMENT CONTROL FRAMEWORK FOR DISASTER RECOVERY PLANNING
4.1.1 DRP Framework (COBIT DS4.1)
4.1.2 Disaster Recovery Plans, Critical Resources and Recovery & Resumption (COBIT DS4.2, 4.3 & 4.8)
4.1.3 Maintenance, Testing and Training of DRPs (COBIT DS4.4, 4.5 & 4.6)
4.1.4 Distribution of DRPs (COBIT DS4.7)
4.1.5 Offsite Backup Storage (COBIT DS4.9)
4.1.6 Post-Resumption Review (COBIT DS4.10)
4.2 PROGRESS MADE ON MITS IMPLEMENTATION AND TABLETOP-INITIATED IMPROVEMENTS
4.2.1 Implementation of DRP-Related MITS Requirements
4.2.2 Implementation of Improvements

Annex A – Audit Criteria
Annex B – Action Items from Confident Recovery I
Annex C – Management Action Plan

EXECUTIVE SUMMARY

Background, Scope and Approach

In order to continue to meet its objectives and the requirements of the Government
Security Policy (GSP), CSC developed, through Information Management Services
(IMS), Disaster Recovery Plans (DRPs) for its applications identified as critical.

The objective of this audit is to provide reasonable assurance that the management
control framework in place to support disaster recovery preparedness for information
technology systems is adequate and effective. The audit also reviewed the progress
made on the implementation of the DRP-related requirements of the TBS Operational
Security Standard on Management of Information Technology Security (MITS) and
improvements initiated as a result of a DRP tabletop exercise conducted by IMS in
2005.

The scope of the audit included the IMS DRPs for critical applications, the controls in
place at National Headquarters (NHQ) and the Laval facility to support the timely
implementation of the DRPs, and linkages of DRPs and Business Continuity Plans
(BCPs), although BCPs themselves are outside the scope of the audit. While regions
have recently started to develop DRPs, the regional offices have only recently been
actively engaged in DRP activities in the context of the DRP tabletop exercises. The
scope of the audit therefore included a review of regional DRPs, but on-site visits in
each region were not deemed necessary considering the limited DRP-related control
activities performed in the regions, and the fact that all critical applications identified are
centrally managed at NHQ.

The critical applications included within the scope of this audit, which were identified
as critical during the Year 2000 project, are outlined below:

| System/Application | System Overview | Number of Users |
|---|---|---|
| Network | The CSC network infrastructure enables the interaction of CSC staff and partners with the various forms of electronic information stores and applications. | |
| CEDV2 | Common Enterprise Desktop v2 (CEDV2) is the common operating system on all CSC user workstations delivering access to CSC applications. | 13,000 |
| Email | Electronic mail (email) is a key messaging system used within CSC. | 14,000 |
| OMS | The Offender Management System (OMS) is a computer based application developed for Correctional Service of Canada (CSC) and National Parole Board (NPB) to manage offender-related information. Through the OMS system, CSC is connected to the National Parole Board, RCMP, and CCRA (Immigration) to share relevant offender information. | 10,000 |
| IAS | Inmate Accounting System (IAS) is an application used by CSC institution clerks to manage inmates pay and savings accounts (funds). | 150 |
| HRMS | The Human Resource Management System (HRMS) is an element of the PeopleSoft application. PeopleSoft is a Commercially Off-The-Shelf (COTS) application frequently used in private and public sectors. It offers a range of products such as Human Resource Management, Financial Management, Management of Materiel and scheduling of Time and Labour. | 3,200 |
| IFMMS | The Integrated Financial and Material Management System (IFMMS) is CSC's corporate financial system. | 600 |
| Online Pay | The Online Pay application (OLPS) is used by CSC to process payroll data. | |
| RADAR | RADAR (Reports of Automated Data Applied to Reintegration) is a suite of reports that allows CSC staff and managers to access OMS offender information in a user-friendly manner. | 10,000 |

The approach and methodology used is consistent with the Internal Audit standards as
outlined by the Institute of Internal Auditors, and is aligned with the Internal Audit Policy
for the Government of Canada. Audit criteria were developed from COBIT 4.1
(www.isaca.org) DS4 requirements, and also include specific DRP-related requirements
from the TBS Government Security Policy and supporting Management of IT Security
Standard. The audit criteria are included in Annex A.

Conclusion

A number of key controls for the DRP program have been implemented. Namely, CSC
has developed a DRP program for critical business applications that includes a
dedicated resource and is based on an established framework; plans for the
resumption of critical application services are in place; the program makes use of
off-site storage and recovery; and it has been subject to tabletop testing exercises.

Several areas for improvement to the current DRP program were identified. Formal
Service Level Agreements detailing requirements for the availability of systems should
be implemented between IMS and its clients, a formal Business Impact Analysis should
be completed, a complete fail-over test for all critical applications should be performed,
and current efforts to further implement and test DRPs in regions should continue.

Progress has been made on implementing improvements from MITS and the 2005 DRP
tabletop exercise. However, further efforts are required to fully meet MITS requirements
related to DRP revisions and testing.

Recommendations have been made in this report to address these areas for improvement.
Management has reviewed and agrees with the findings contained in this report and a
Management Action Plan has been developed to address the recommendations (see Annex
C).


1.0 INTRODUCTION

As a federal government agency, Correctional Service Canada (CSC) is responsible for
managing institutions of various security levels and supervising offenders under
conditional release in the community. CSC is one component of the larger criminal
justice system, and works closely with other partners in the Public Safety Canada
portfolio, including the Royal Canadian Mounted Police and the National Parole Board,
and with all police agencies.

In order to continue to meet its objectives and the requirements of the Government
Security Policy (GSP), CSC developed, through Information Management Services
(IMS), Disaster Recovery Plans (DRPs) for its applications identified as critical. Due to
the importance of the DRP for CSC and the results of the 2006 preliminary risk
assessment of the IT function by Internal Audit, the Audit Committee has approved an
Audit of the Information Management Services DRP as part of the Internal Audit Branch
audit plan for 2008-2009.

IMS has identified within its Security and Project Management Directorate (ITSEC) a
Manager responsible for DRP, assigned a Senior Project Officer to DRPs on a full time
basis, and has started to review all DRPs available and to conduct tabletop and failover
tests to improve upon the current DRPs. While ITSEC is responsible for coordinating,
monitoring, testing and standardizing disaster recovery activities, the Infrastructure
Services and Operations (ISO) and Systems Development Directorates are responsible
for the development and maintenance of DRPs related to IT operations and critical
applications. All staff involved in Disaster Recovery (DR) activities centrally report up to
IMS, but may be located at NHQ, the alternate processing site in Laval (Quebec), or
within the regions or institutions that they support.

Business Continuity Planning is the responsibility of the Departmental Security Officer,
and he has requested business areas within CSC to develop Business Continuity Plans
(BCPs). The BCPs should detail, among other things, the business area’s requirements
in terms of IT resources required to ensure the continuity of their business area. IMS is
responsible for developing DRPs that address these requirements for IT resources in
the event of a disaster.


The critical applications included within the scope of this audit, which were identified
as critical during the Year 2000 project, are outlined below:

| System/Application | System Overview | Number of Users |
|---|---|---|
| Network | The CSC network infrastructure enables the interaction of CSC staff and partners with the various forms of electronic information stores and applications. | |
| CEDV2 | Common Enterprise Desktop v2 (CEDV2) is the common operating system on all CSC user workstations delivering access to CSC applications. | 13,000 |
| Email | Electronic mail (email) is a key messaging system used within CSC. | 14,000 |
| OMS | The Offender Management System (OMS) is a computer based application developed for Correctional Service of Canada (CSC) and National Parole Board (NPB) to manage offender-related information. Through the OMS system, CSC is connected to the National Parole Board, RCMP, and CCRA (Immigration) to share relevant offender information. | 10,000 |
| IAS | Inmate Accounting System (IAS) is an application used by CSC institution clerks to manage inmates pay and savings accounts (funds). | 150 |
| HRMS | The Human Resource Management System (HRMS) is an element of the PeopleSoft application. PeopleSoft is a Commercially Off-The-Shelf (COTS) application frequently used in private and public sectors. It offers a range of products such as Human Resource Management, Financial Management, Management of Materiel and scheduling of Time and Labour. | 3,200 |
| IFMMS | The Integrated Financial and Material Management System (IFMMS) is CSC's corporate financial system. | 600 |
| Online Pay | The Online Pay application (OLPS) is used by CSC to process payroll data. | |
| RADAR | RADAR (Reports of Automated Data Applied to Reintegration) is a suite of reports that allows CSC staff and managers to access OMS offender information in a user-friendly manner. | 10,000 |


2.0 AUDIT OBJECTIVES AND SCOPE

2.1 Audit Objectives


The objective of this audit is to provide reasonable assurance that the management
control framework in place to support disaster recovery preparedness for information
technology systems is adequate and effective. The audit also reviewed the progress
made on the implementation of the DRP-related requirements of the TBS Operational
Security Standard on Management of Information Technology Security (MITS) and
improvements initiated as a result of a DRP tabletop exercise conducted by IMS in
2005.

2.2 Audit Scope


The scope of the audit was based on an initial risk assessment. As a result, it included
the IMS DRPs for critical applications, and the controls in place at National
Headquarters (NHQ) and the Laval facility to support the timely implementation of the
DRPs. Considering risks identified, the audit also includes the linkages of DRPs and
Business Continuity Plans (BCPs), although BCPs themselves are outside the scope of
the audit. DRP controls were only tested for design and implementation (i.e. at a point in
time) and were not tested for their operating effectiveness (i.e. over a period of time).

While regions have recently started to develop DRPs, the regional offices have only
recently been actively engaged in DRP activities in the context of the DRP tabletop
exercises. The scope of the audit therefore included a review of regional DRPs, but
on-site visits in each region were not deemed necessary considering the limited
DRP-related control activities performed in the regions, and the fact that all critical
applications are centrally managed at NHQ. The only on-site visit outside of the National
Capital Region (NCR) was at the alternate processing facility in Laval.

3.0 AUDIT APPROACH AND METHODOLOGY


The approach and methodology used is consistent with the Internal Audit standards as
outlined by the Institute of Internal Auditors, and is aligned with the Internal Audit Policy
for the Government of Canada.

Following an analysis of potential control frameworks to use for the audit, a risk-based
audit program was developed from COBIT 4.1 (www.isaca.org) DS4 requirements. It
also includes specific DRP-related requirements from the TBS Government Security
Policy and supporting Management of IT Security Standard
(http://www.tbs-sct.gc.ca/pubs_pol/gospubs/TBM_12A/23RECON-eng.asp). The audit
criteria are included in Annex A.


Work was conducted in the NCR between September 2008 and December 2008, and
included an on-site visit to the Laval alternate processing facility. Inquiries were held
with numerous CSC representatives involved in DRP activities. Testing included a
review of directives and guidelines, organizational structure, roles and responsibilities,
and observation of the tabletop testing exercise conducted for NHQ in November 2008. For
NHQ, testing was conducted for the full lifecycle of the DRP, from initial development to
testing, training, maintenance and updating. For regions, testing was limited to a review
of the region-specific DRPs and a limited number of interviews.

Upon completing fieldwork, the team held a debriefing meeting at National
Headquarters with the Chief Information Officer, Information Management Services and
the Director, IT Security and Project Management.

4.0 AUDIT FINDINGS AND RECOMMENDATIONS

4.1 Management Control Framework for Disaster Recovery Planning

We assessed the extent to which the management control framework for DRP is in
place.

4.1.1 DRP Framework (COBIT DS4.1)

We expected to find a framework that supports enterprise wide DR planning using a
consistent process. The objective of the framework should be to assist in determining
the required resilience of the infrastructure and to drive the development of the DRPs.
The framework should address the organisational structure, covering the roles, tasks
and responsibilities of internal and external service providers, their management and
their customers, and the planning processes that create the rules and structures to
document, test and execute the disaster recovery and IT contingency plans. The DRPs
should also address items such as the identification of critical resources, noting key
dependencies, the monitoring and reporting of the availability of critical resources,
alternative processing, and the principles of backup and recovery.

The DRP Framework is appropriately designed with regard to critical
applications maintained at NHQ, but not fully implemented as a number of areas
for improvement still exist. The framework is not yet sufficiently implemented
within regions.

More specifically, during our testing we made the following observations:

• CSC has not formally assessed the adequacy of having only one resource
dedicated to DRP activities, which increases the risk that the DRP program is
understaffed.
• A standard template for regional DRPs and other related documents has not
been developed and distributed to CSC Regions to ensure regional DRPs are
comprehensive and consistent. For example, most regional DRPs do not identify
the individuals assigned to various regional DRP responsibilities, and most of the
regional DRPs are missing key information such as a listing of critical
applications and related recovery time objectives. This lack of consistency
increases the risk that recovery efforts will be more difficult to coordinate in the
event of a disaster requiring multiple DRPs to be activated.
• Internal Service Level Agreements detailing requirements for the availability of
systems have not been implemented between IMS and its clients, which
increases the risk that IMS may not be aware of, and able to respond to, the
availability requirements of the business areas.
• There is no evidence that any of the DRPs have been approved by senior
management, which increases the risk that DRPs may not meet the needs of
senior management.

Recommendation #1:

The Chief Information Officer, Information Management Services should formally
assess the adequacy of the level of resources currently assigned to the DRP program.

Recommendation #2:

The Chief Information Officer, Information Management Services should finalize a
standard template for documenting, testing and distributing DRPs at a regional level
within CSC.

Recommendation #3:

The Chief Information Officer, Information Management Services should ensure that
formal Service Level Agreements detailing requirements for the availability of systems
are implemented between IMS and its clients across CSC.

Recommendation #4:

The Chief Information Officer, Information Management Services should ensure that all
current DRPs are appropriately reviewed and formally approved by the same parties
that sign the internal Service Level Agreements within which availability requirements
will be specified. All significant changes to DRPs should also be subject to review and
formal approval by management/application owners.


4.1.2 Disaster Recovery Plans, Critical Resources and Recovery & Resumption (COBIT DS4.2, 4.3 & 4.8)

We expected to find DRPs based on the framework and designed to reduce the impact
of a major disruption on key business functions and processes. The plans should be
based on risk understanding of potential business impacts and address requirements
for resilience, alternative processing and recovery capability of all critical IT services.
They should also cover usage guidelines, roles and responsibilities, procedures,
communication processes, and the testing approach. The DRPs should focus attention
on items specified as most critical and establish priorities in recovery situations. The
DRPs should ensure response and recovery in line with prioritised business needs,
while ensuring that costs are kept at an acceptable level and complying with regulatory
and contractual requirements. Lastly, we expected to find plans for the actions to be
taken for the period when IT is recovering and resuming services.

DRPs have been appropriately designed with regard to critical applications
maintained at NHQ, but not fully implemented as a number of areas for
improvement still exist. DRPs have not yet been sufficiently implemented within
regions.

More specifically, during our testing we made the following observations:

• As CSC relied on the list of critical applications identified for Y2K disaster
recovery efforts, no Business Impact Analysis has been conducted, which
increases the risk that all critical applications may not have been appropriately
identified, and that defined recovery time objectives (RTOs) may not be
appropriate. A Business Impact Analysis is typically conducted as part of the
BCP process, which falls under the responsibility of the DSO at CSC.
• While the DRPs of NHQ critical applications are based on defined recovery time
objectives (RTOs), it is not clear if all RTOs can be met, especially in a full
disaster situation where all critical applications need to be recovered, which
increases the risk that IT resources will not be recovered in time to meet the
requirements of the business areas.
• There is no guidance available on the timeframe within which a disaster should
be declared (RTOs only kick in once a disaster has been declared), which
increases the risk that a disaster may not be declared in a timely manner in order
to meet the requirements of the business areas.
• The DR role and training of staff located at the alternate processing facility (in
Laval) has been minimized, and DRPs rely mostly on staff located at NHQ, which
increases the risk of further delaying recovery efforts should NHQ staff be
delayed in relocating to the alternate site.
• While the DRPs could leverage the Staff College located next to the alternate
processing facility, the DRPs rely on the availability of rooms in nearby hotels to
relocate DR resources from NHQ, which increases the risk of further delaying
recovery efforts should NHQ staff be delayed in relocating to the alternate site.


Recommendation #5:

The Departmental Security Officer (DSO) should ensure that a formal Business Impact
Analysis is completed by the business/application owners to confirm the identification
of critical applications and to further confirm that the identified Recovery Time
Objectives remain appropriate and relevant.

Recommendation #6:

The Chief Information Officer, Information Management Services should develop a DRP
training program specifically aimed at increasing the DRP knowledge of the resources in
the alternate processing facility in Laval as a means of expanding the availability of
qualified DR resources in the event of a disaster.

4.1.3 Maintenance, Testing and Training of DRPs (COBIT DS4.4, 4.5 & 4.6)

We expected to find implemented change control procedures to ensure that the DRPs
are kept up to date and continually reflect actual business requirements, and that
changes in procedures and responsibilities are communicated clearly and in a timely
manner. We also expected to find regular tests of the DRPs to ensure that IT systems
can be effectively recovered, shortcomings are addressed and the plans remain
relevant. This requires careful preparation, documentation, reporting of test results and,
according to the results, implementation of an action plan. Lastly, we expected to find
that all concerned parties are provided with regular training sessions regarding the
procedures and their roles and responsibilities in case of an incident or disaster.

A testing plan has been designed and partly implemented for critical applications
and regions, consisting mostly of tabletop tests and limited failover tests. While
the regular maintenance of DRPs has been implemented, it is not clear if these
updates are addressing all lessons learned from testing performed.

More specifically, during our testing we made the following observations:

• While fail-over tests for three critical applications have been conducted, complete
fail-over testing for all critical applications has not occurred, which increases the
risk that the defined RTOs may not meet the needs of the business areas, and
that current DRPs are missing important steps to permit the full recovery of
critical applications.
• The capacity of the alternate processing facility to take over all critical
applications has not been formally assessed, which increases the risk that critical
applications may not be responsive when running at the alternate processing
facility.
• DRPs are not always being updated on at least a yearly basis as required by IMS
guidelines, which increases the risk that DRPs will be outdated and miss critical
steps in the recovery of critical applications.
• Lessons learned and action plans from testing sessions have not been
consistently documented, which increases the risk that problems raised during
testing may not have been formally addressed or updated within the DRP. Of the
twenty potential improvements identified in the 2005 DRP tabletop exercise,
eleven have been implemented (55%), two have been partially implemented
(10%), and seven have not yet been implemented (35%).
• A training plan does not formally exist. Training of DR resources is essentially
accomplished through participation in tabletop testing exercises, but attendance
is not mandatory. This increases the risk that staff may not always attend
required DRP training.
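
The yearly-update requirement noted in the observations above lends itself to a simple mechanical check. The following is a minimal sketch only, not a description of any IMS tool: the function name `overdue_plans`, the 365-day reading of "at least yearly", and the plan names and revision dates are all hypothetical and are not audit data.

```python
from datetime import date

# Assumption: "at least a yearly basis" is read as no more than 365 days
# between revisions. The IMS guideline may define this differently.
MAX_AGE_DAYS = 365


def overdue_plans(last_revised: dict[str, date], as_of: date) -> list[str]:
    """Return the names of plans last revised more than MAX_AGE_DAYS before as_of."""
    return sorted(
        name
        for name, revised in last_revised.items()
        if (as_of - revised).days > MAX_AGE_DAYS
    )


# Hypothetical plan names and revision dates, for illustration only.
sample = {
    "Plan A": date(2007, 6, 1),
    "Plan B": date(2008, 9, 15),
}

print(overdue_plans(sample, as_of=date(2008, 12, 1)))
```

With these illustrative dates, only "Plan A" is flagged, since its last revision is more than a year before the reference date.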

Recommendation #7:

The Chief Information Officer, Information Management Services should expand the
current testing program and include annual testing of the processing capacity of the
alternate processing facility.

Recommendation #8:

The Chief Information Officer, Information Management Services should ensure that all
DRP documents are updated at least annually, or following a significant change. As
part of the update, the Chief Information Officer should also ensure that the DRPs are
formally reviewed and approved by application owners.

Recommendation #9:

The Chief Information Officer, Information Management Services should implement a
process to ensure that lessons learned from DRP testing are consistently documented
and proactively addressed.

4.1.4 Distribution of DRPs (COBIT DS4.7)

We expected to find that a defined and managed distribution strategy exists to ensure
that plans are properly and securely distributed and available to appropriately
authorized interested parties when and where needed. Attention should be paid to
making the plans accessible under all disaster scenarios.


A limited DRP distribution plan has been designed and implemented at NHQ, but
not in the regions. DRPs are available in a central repository at NHQ, and
replicated to the alternate processing facility on a daily basis, but areas for
improvement exist for DRP distribution processes.

More specifically, during our testing we made the following observations:

• The DRPs do not currently include a comprehensive distribution plan listing all
individuals that should have a copy of the most recent DRP and the method of
distribution, which increases the risk that DRPs will not be readily available in the
event of a disaster.
• When we performed our testing, staff located at the alternate processing facility
with DR responsibilities could not readily access DR documents, which increases
the risk that DRPs will not be readily available in the event of a disaster.

Recommendation #10:

The Chief Information Officer, Information Management Services should ensure that
DRPs include a comprehensive distribution plan listing all individuals that should have a
copy of the most recent DRP and the method of distribution, and that staff located at the
alternate processing facility have ready access to DR documents.

4.1.5 Offsite Backup Storage (COBIT DS4.9)

We expected to find offsite storage of all critical backup media, documentation and
other IT resources necessary for IT recovery and business continuity plans.
Management of the offsite storage facility should respond to the data classification
policy and the enterprise’s media storage practices. IT management should ensure that
offsite arrangements are assessed for content, environmental protection and security.
Compatibility of hardware and software to restore archived data should also be
ensured, and archived data should be periodically tested and refreshed.

An offsite backup storage process has been designed and implemented. While
some critical applications also rely on data replication to reduce risks of data
loss, areas for improvement exist with backup storage processes.

More specifically, during our testing we made the following observations:

• Backup tapes for critical applications and regions are not encrypted, which
increases the risk of unauthorized access to the data on the backup tapes,
especially while the tapes are in transit from CSC to National Archives or the
alternate processing facility.


Recommendation #11:

The Chief Information Officer, Information Management Services should implement a
solution that enables CSC to encrypt backup tapes for application data assessed as
sensitive either from a security or from an access to information perspective.

4.1.6 Post-Resumption Review (COBIT DS4.10)

We expected to find that IT management has established procedures for assessing the
adequacy of the plan in regard to the successful resumption of the IT function after a
disaster, and for updating the plan accordingly.

A post-resumption review process has not yet been formally designed and
implemented.

More specifically, during our testing we made the following observations:

• Evidence could not be found that lessons learned from DRP testing and actual
incidents and disasters are formally leveraged to make improvements to the
DRPs, which increases the risk that DRPs are missing important steps to permit
the full recovery of critical applications.
• DRPs do not currently include steps for the resumption of activities back to the
primary processing facility, which increases the risk of further delaying the
resumption to normal IT operations.

Recommendation #12:

The Chief Information Officer, Information Management Services should ensure that, as
part of the DRP, there is either a plan for the resumption of activities back at NHQ
or a new DRP which would guide DR staff in the event of a disaster at the alternate
processing facility (prior to resumption at NHQ).

CONCLUSION:
Overall, CSC has designed and implemented some key elements of the Management
Framework for DRP. However, it will be important that CSC establish formal Service
Level Agreements detailing requirements for the availability of systems between IMS
and its clients. To enable the development of such agreements a formal Business
Impact Analysis should be completed. In addition, a complete fail-over test for all critical
applications should be performed, and current efforts to further implement and test
DRPs in regions should continue.


4.2 Progress Made on MITS Implementation and Tabletop-Initiated Improvements

The second objective of the audit was to assess the progress made on the
implementation of the DRP-related requirements of the TBS Operational Security
Standard on Management of Information Technology Security (MITS) and
improvements initiated as a result of a DRP tabletop exercise conducted by IMS in
2005.

4.2.1 Implementation of DRP-Related MITS Requirements

We expected to find evidence that CSC has formally implemented the DRP-related
requirements of MITS, specifically:
• As part of their business continuity planning, departments must produce and
routinely test and revise an IM continuity plan and an IT continuity plan (MITS
12.8);
• Departments must restore essential capabilities within the time constraints and
the availability requirements specified in the departmental Business Continuity
Plan (MITS 18.5);
• Backup and recovery procedures exist and are documented (MITS 18.5); and,
• Backup data is created regularly and copies are maintained at an off-site location
(MITS 18.5).

CSC has designed and implemented DRP-related requirements of MITS, with the
exception of requirements related to DRP revisions and testing.

More specifically, during our testing we made the following observations:

• DRPs have been produced for NHQ critical applications and for regions, but have
not been consistently revised and tested, which increases the risk that the DRPs
are missing critical steps in the recovery of critical regional applications.
• CSC has not comprehensively tested its capacity to restore all critical
applications within the time constraints and the availability requirements specified
in BCPs, which increases the risk that critical applications may not be recovered
in a timely manner, or be operating at expected processing levels when running
at the alternate processing facility.
• Backup and recovery procedures exist and are documented. Backup data is
created regularly and copies are maintained at an off-site location.

These observations were previously noted in Section 4.1 and related recommendations
were included as part of that section.


4.2.2 Implementation of Improvements

We expected to find evidence that CSC has formally implemented improvements
initiated as a result of a DRP tabletop exercise conducted by IMS in 2005.

Of the twenty potential improvements identified in the 2005 DRP tabletop
exercise, eleven have been implemented (55%), two have been partially
implemented (10%), and seven have not yet been implemented (35%). The
complete list of improvements and their status is included in Annex B.
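
The counts and percentages above can be cross-checked against the status column of
Annex B. The following is a minimal sketch of that tally. The item statuses are
transcribed from Annex B, but the "A-", "D-" and "R-" prefixes (Alert, Deployment and
Recovery phase) and the function name are illustrative assumptions that do not appear
in the audit working papers.

```python
from collections import Counter

# Status of each action item, transcribed from Annex B.
# The "A-", "D-", "R-" prefixes are illustrative labels (Alert, Deployment,
# Recovery phase) added here only to keep the references unique.
ANNEX_B_STATUS = {
    "A-i": "not implemented",
    "A-ii": "implemented",
    "A-iii": "implemented",
    "A-iv": "implemented",
    "D-i.1": "implemented",
    "D-i.2": "implemented",
    "D-i.3": "not implemented",
    "D-i.4": "not implemented",
    "D-i.5": "partially implemented",
    "D-i.6": "implemented",
    "D-i.7": "not implemented",
    "D-ii.1": "not implemented",
    "D-ii.2": "implemented",
    "D-iii.1": "partially implemented",
    "D-iv.1": "implemented",
    "D-iv.2": "not implemented",
    "D-v.1": "not implemented",
    "R-i": "implemented",
    "R-ii": "implemented",
    "R-iii": "implemented",
}


def status_summary(statuses):
    """Return {status: (count, rounded percentage of all items)}."""
    counts = Counter(statuses.values())
    total = len(statuses)
    return {s: (n, round(100 * n / total)) for s, n in counts.items()}


summary = status_summary(ANNEX_B_STATUS)
for status, (count, pct) in summary.items():
    print(f"{status}: {count} of {len(ANNEX_B_STATUS)} ({pct}%)")
```

Tallying from the item-level list rather than from the reported totals means that any
change to an item's status in Annex B is reflected in the summary figures.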

More specifically, during our testing we made the following observations:

• DRPs do not specifically document who will be relocated to the alternate
processing facility, how quickly they should relocate, where they will lodge and
for how long. This increases the risk of further delaying recovery efforts in
relocating NHQ staff to the alternate site.
• A process for the secure transportation of the backup tapes to the alternate
processing facility has not been documented, which increases the risk of
unauthorized access, loss or theft of backup tapes.

These observations were previously noted in Section 4.1 and related recommendations
were included as part of that section.

CONCLUSION:

Progress has been made on implementing improvements from MITS and the 2005 DRP
tabletop exercise. However, further efforts are required to fully meet MITS requirements
related to DRP revisions and testing.


Annex A
Audit Criteria

The audit criteria for the audit were developed from COBIT 4.1 (www.isaca.org) DS4
requirements, and also include specific DRP-related requirements from the TBS
Government Security Policy and supporting Management of IT Security Standard
(http://www.tbs-sct.gc.ca/pubs_pol/gospubs/TBM_12A/23RECON-eng.asp).

1. IT Continuity Framework:
Develop a framework for IT continuity to support enterprise wide business continuity
management using a consistent process. The objective of the framework should be to
assist in determining the required resilience of the infrastructure and to drive the
development of disaster recovery and IT contingency plans. The framework should
address the organizational structure for continuity management, covering the roles,
tasks and responsibilities of internal and external service providers, their management
and their customers, and the planning processes that create the rules and structures to
document, test and execute the disaster recovery and IT contingency plans. The plan
should also address items such as the identification of critical resources, noting key
dependencies, the monitoring and reporting of the availability of critical resources,
alternative processing, and the principles of backup and recovery.

2. IT Continuity Plans:
Develop IT continuity plans based on the framework and designed to reduce the impact
of a major disruption on key business functions and processes. The plans should be
based on risk understanding of potential business impacts and address requirements
for resilience, alternative processing and recovery capability of all critical IT services.
They should also cover usage guidelines, roles and responsibilities, procedures,
communication processes, and the testing approach.

3. Critical IT Resources:
Focus attention on items specified as most critical in the IT continuity plan to build in
resilience and establish priorities in recovery situations. Avoid the distraction of
recovering less-critical items and ensure response and recovery in line with prioritised
business needs, while ensuring that costs are kept at an acceptable level and
complying with regulatory and contractual requirements. Consider resilience, response
and recovery requirements for different tiers, e.g., one to four hours, four to 24 hours,
more than 24 hours and critical business operational periods.
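
As an illustration of the example tiers named in this criterion, the sketch below
assigns an application to a tier from its Recovery Time Objective (RTO) and orders
applications for recovery. The tier boundaries follow the example in the criterion;
the treatment of boundary values (exactly 4 or 24 hours), the grouping of RTOs under
one hour into the first tier, the function names and the application names are
assumptions made for illustration only.

```python
def recovery_tier(rto_hours: float) -> str:
    """Map an RTO in hours to one of the example tiers.

    Assumption: boundary values (4 and 24 hours) fall into the faster tier,
    and RTOs shorter than one hour are grouped with the first tier.
    """
    if rto_hours < 0:
        raise ValueError("RTO must be non-negative")
    if rto_hours <= 4:
        return "tier 1 (up to 4 hours)"
    if rto_hours <= 24:
        return "tier 2 (4 to 24 hours)"
    return "tier 3 (more than 24 hours)"


def recovery_order(rtos: dict) -> list:
    """Return application names sorted by RTO, shortest first."""
    return sorted(rtos, key=rtos.get)


# Hypothetical applications and RTOs, not taken from any CSC plan.
example_rtos = {"app-a": 48, "app-b": 2, "app-c": 12}
for name in recovery_order(example_rtos):
    print(name, recovery_tier(example_rtos[name]))
```

In practice the RTO values would come from the Business Impact Analysis and the
Service Level Agreements rather than from a hard-coded table.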

4. Maintenance of the IT Continuity Plan:
Encourage IT management to define and execute change control procedures to ensure
that the IT continuity plan is kept up to date and continually reflects actual business
requirements. Communicate changes in procedures and responsibilities clearly and in a
timely manner.

5. Testing of the IT Continuity Plan:
Test the IT continuity plan on a regular basis to ensure that IT systems can be
effectively recovered, shortcomings are addressed and the plan remains relevant. This
requires careful preparation, documentation, reporting of test results and, according to
the results, implementation of an action plan. Consider the extent of testing recovery of
single applications to integrated testing scenarios to end-to-end testing and integrated
vendor testing.

6. IT Continuity Plan Training:
Provide all concerned parties with regular training sessions regarding the procedures
and their roles and responsibilities in case of an incident or disaster. Verify and enhance
training according to the results of the contingency tests.

7. Distribution of the IT Continuity Plan:
Determine that a defined and managed distribution strategy exists to ensure that plans
are properly and securely distributed and available to appropriately authorised
interested parties when and where needed. Attention should be paid to making the
plans accessible under all disaster scenarios.

8. IT Services Recovery and Resumption:
Plan the actions to be taken for the period when IT is recovering and resuming services.
This may include activation of backup sites, initiation of alternative processing, customer
and stakeholder communication, and resumption procedures. Ensure that the business
understands IT recovery times and the necessary technology investments to support
business recovery and resumption needs.

9. Offsite Backup Storage:
Store offsite all critical backup media, documentation and other IT resources necessary
for IT recovery and business continuity plans. Determine the content of backup storage
in collaboration between business process owners and IT personnel. Management of
the offsite storage facility should respond to the data classification policy and the
enterprise’s media storage practices. IT management should ensure that offsite
arrangements are periodically assessed, at least annually, for content, environmental
protection and security. Ensure compatibility of hardware and software to restore
archived data, and periodically test and refresh archived data.

10. Post-resumption Review:
Determine whether IT management has established procedures for assessing the
adequacy of the plan in regard to the successful resumption of the IT function after a
disaster, and update the plan accordingly.

Annex B
Action Items from Confident Recovery I

Alert Phase

Ref: i.
Lesson Learned / Finding: Time required for recovery teams to rendezvous for a meeting
prior to deployment to recovery site needs to be taken into account.
Action Item / Comment: Time required is not indicated in any plans. In a conversation on
December 23, 2008, the DR Coordinator stated that this cannot be documented. The onus
is on recovery groups to meet the RTO.
Status: Not implemented

Ref: ii.
Lesson Learned / Finding: The contact list (call tree) was problematic. Lesson learned:
team members should have all their team contact numbers and a fan out.
Action Item / Comment: In the 2005 meeting, Bruno had the responsibility to call
everyone. Changed for 2007 tabletop. Call plan is documented in the BCP.
Status: Implemented

Ref: iii.
Lesson Learned / Finding: Lesson learned: provision should be made for senior management
to hold a meeting to discuss actions prior to a disaster declaration being made.
Action Item / Comment: This was not included in the plan in 2005. The BCP in 2008 has
the Emergency Operations Centre at 100 Metcalfe Street.
Status: Implemented

Ref: iv.
Lesson Learned / Finding: Provision for notification process to inform the designated
alternate contact that they are the primary when the original primary is on vacation.
Action Item / Comment: In 2005 alternates were not identified. The 2008 BCP and DRPs
have primes and alternates documented.
Status: Implemented

Deployment Phase

i Recovery Team

Ref: i.1
Lesson Learned / Finding: Rendezvous point needs to be clearly identified.
Action Item / Comment: In 2005, the rendezvous point was not documented. The rendezvous
point of Carlingwood Shopping Centre is documented in the BCP in 2008.
Status: Implemented

Ref: i.2
Lesson Learned / Finding: Staff live all across Ottawa. These staff could be picked up
en route to Laval by bus.
Action Item / Comment: There is a single central rendezvous point.
Status: Implemented

Ref: i.3
Lesson Learned / Finding: Initial deployment of all staff for 30 days is too long. After
recovery, fewer NHQ staff are needed at the Laval site.
Action Item / Comment: This is a comment. In 2005 the duration was not implemented. In
2008, the understanding is that people are needed until the site is functional. People
are not needed on site for more than 3 days. This is not documented. Most staff are
under the understanding that they do not need to be on site and that they will be
working via VPN.
Status: Not implemented

Ref: i.4
Lesson Learned / Finding: Aide-memoire should be created to remind team members what
they should bring to recovery site.
Action Item / Comment: This has not been created.
Status: Not implemented

Ref: i.5
Lesson Learned / Finding: No provision for contract staff to be paid past the 37.5 hours
per week.
Action Item / Comment: During the December 23 meeting with Terry, it was noted that as
contractor contracts come up for renewal, DRP availability clauses are being inserted
into the contracts.
Status: Partially implemented

Ref: i.6
Lesson Learned / Finding: Designated manager responsible for arranging bus services
should be made aware of this responsibility.
Action Item / Comment: Mentioned in the BCP. During the meeting with Terry on December
23, 2008, it was noted that the recovery manager for the BCP (Murray) has this
responsibility. This was confirmed with Murray during his interview.
Status: Implemented

Ref: i.7
Lesson Learned / Finding: Other means of transportation should be examined.
Action Item / Comment: During a meeting, the DR Coordinator indicated that this is the
responsibility of the BCP recovery manager.
Status: Not implemented

Right of Refusal

Ref: ii.1
Lesson Learned / Finding: Recovery Team composition must be reviewed to include members
with issues of being deployed to another site with a few hours' notice.
Action Item / Comment: The DR Coordinator states that although CSC cannot mandate that
personnel must respond to a disaster, most primes understand they will be there.
However, during interviews, most indicated they do not think they need to go to Laval.
Status: Not implemented

Ref: ii.2
Lesson Learned / Finding: Mechanisms to handle team members who have health or family
issues with relocation to recovery site.
Action Item / Comment: Mechanism is the alternate contact. In 2005, alternates were not
defined. Every prime and key member has an alternate.
Status: Implemented

Physical Space

Ref: iii.1
Lesson Learned / Finding: Recovery site physical space limited to 15 people. Many more
than this are designated to go to recovery site.
Action Item / Comment: Informal understanding that the Staff College in Laval can be
used.
Status: Partially implemented

Backup Tapes

Ref: iv.1
Lesson Learned / Finding: Discrepancy in plans that tapes are actually recovered from
King Edward, not National Archives.
Action Item / Comment: During interviews and process review, it is understood that tapes
are stored at National Archives and King Edward (2 copies). Tapes will be recovered
from National Archives.
Status: Implemented

Ref: iv.2
Lesson Learned / Finding: Tape transportation – Treasury Board (TBS) Standard for
physical security requires secure transport of tapes. Process must be in place that, in
the event of an accident, would identify to police the fact that there is cargo in the
vehicle and that the container should be safeguarded and not released to just anybody.
Action Item / Comment: Plans do not indicate secure transport.
Status: Not implemented

Lodging of staff in Laval

Ref: v.1
Lesson Learned / Finding: Documents do not specify where staff would lodge while in
Laval.
Action Item / Comment: This is still outstanding. Informal arrangement that Normand
Vermette in Laval makes arrangements for lodging in the Laval area, or at the staff
college. Formal agreements are not in place.
Status: Not implemented

Recovery Phase

Ref: i
Lesson Learned / Finding: Given the occasion and setting to sit down as a team, recovery
teams relish the opportunity to review and revise their expected plans.
Action Item / Comment: Tabletops occur yearly, and plans are updated based on the
tabletop.
Status: Implemented

Ref: ii
Lesson Learned / Finding: Service Desk did not have detailed recovery plan.
Action Item / Comment: Service Desk has a detailed DRP. We observed Service Desk DR
procedures during the tabletop.
Status: Implemented

Ref: iii
Lesson Learned / Finding: The DRPs do not call for the establishment of a "command
centre (CC) or command post (CP)".
Action Item / Comment: In 2005 they did not have a command post. In 2007 they had 441
MacLean defined as command centre. In 2008 they have 100 Metcalfe.
Status: Implemented

Annex C
Management Action Plan

Recommendation #1:
The Chief Information Officer, Information Management Services should formally assess
the adequacy of the level of resources currently assigned to the DRP program.
Action Summary:
1.) Business case will be developed jointly between ISO and ITSEC and submitted to CIO
for review and approval.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) June 2009

Recommendation #2:
The Chief Information Officer, Information Management Services should finalize a
standard template for documenting, testing and distributing DRPs at a regional level
within CSC.
Action Summary:
1.) Template was created in 2008.
2.) Template will be presented to Regional Administrators IMS for comments and
implementation.
3.) Plans were created for Regional DRPs in FY07/08. Tabletop exercises were conducted
in all regions including NHQ in FY 2008/2009 based on the templates.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) Completed; 2.) April 2009; 3.) Completed

Recommendation #3:
The Chief Information Officer, Information Management Services should ensure that
formal Service Level Agreements detailing requirements for the availability of systems
be implemented between IMS and its clients across CSC.
Action Summary:
1.) SLA template to be developed by ISO. Template will include RPO/RTO from C&A
evidence.
2.) ISO to create SLAs for all mission critical applications.
3.) Sign-off on SLAs.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) Completed; 2.) April 2010; 3.) April 2010
NOTE: SLAs will not be created for CED2 Engineering group and the Service (these will
be outlined within individual SLAs for Desktop Support etc and within the IT Service
Catalogue.)

Recommendation #4:
The Chief Information Officer, Information Management Services should ensure that all
current DRPs are appropriately reviewed and formally approved by the same parties that
sign the internal Service Level Agreements within which availability requirements will
be specified. All significant changes to DRPs should also be subject to review and
formal approval by management/application owners.
Action Summary:
1.) DRPs will be reviewed by March 31, 2009.
2.) DR Coordinator will design a formal DRP review/approval process with annual updates.
3.) Process will be approved and implemented by CIO.
4.) Change Process Trigger: Class 1 Change and CAB Process document amended by ISO.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) Completed; 2.) June 2009; 3.) July 2009; 4.) Ongoing

Recommendation #5:
The Departmental Security Officer (DSO) should ensure that a formal Business Impact
Analysis is completed by the business/applications owners to confirm the identification
of critical applications and to further confirm that the identified Recovery Time
Objectives remain appropriate and relevant.
Action Summary:
Departmental Security will undertake a Business Impact Analysis (BIA) for NHQ in the
coming months. Included in this analysis will be a dialogue between Departmental
Security, Information Technology Security as well as the business/application owners
with a view to determining mission critical applications and their recovery time
objectives. The committed involvement of all stakeholders will be crucial to ensure that
the BIA is accurate, well-developed and relevant to the mission of CSC.
OPI: The Departmental Security Officer
Planned Completion Date: October 2009

Recommendation #6:
The Chief Information Officer, Information Management Services should develop a DRP
training program specifically aimed at increasing the DRP knowledge of the resources in
the alternate processing facility in Laval as a means of expanding the availability of
qualified DR resources in the event of a disaster.
Action Summary:
1.) ISO to develop and implement a cross training plan for DR activities/responsibilities
in Laval including implementation dates.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) June 2009

Recommendation #7:
The Chief Information Officer, Information Management Services should expand the current
testing program and include annual testing of the processing capacity of the alternate
processing facility.
Action Summary:
1.) Capacity/failover tests were conducted in DR site in 2008 on the IFMMS and HRMS
applications.
2.) Capacity/failover tests will be run annually on mission critical applications.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) Completed; 2.) March 2010

Recommendation #8:
The Chief Information Officer, Information Management Services should ensure that all
DRP documents are updated at least annually, or following a significant change. As part
of the update, the Chief Information Officer should also ensure that the DRPs are
formally reviewed and approved by application owners.
Action Summary:
1.) DR Coordinator will design a formal DRP review/approval process. Process will be
approved and implemented by IMS CIO.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) June 2009

Recommendation #9:
The Chief Information Officer, Information Management Services should implement a
process to ensure that lessons learned from DRP testing are consistently documented and
proactively addressed.
Action Summary:
1.) Action plan was created to address issues identified in the 2007/2008 Lesson Learned
document.
2.) Process will be developed, approved and implemented by IMS CIO. Part of the process
will include an action plan to deal with lesson learned recommendations.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) Completed; 2.) April 2010

Recommendation #10:
The Chief Information Officer, Information Management Services should ensure that DRPs
include a comprehensive distribution plan listing all individuals that should have a
copy of the most recent DRP and the method of distribution, and that staff located at
the alternate processing facility have ready access to DR documents.
Action Summary:
1.) Management/application owners to update DR plans to include distribution list, and
maintain list.
2.) Laval staff already has access to all DR Plans.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) June 2009; 2.) Completed

Recommendation #11:
The Chief Information Officer, Information Management Services should implement a
solution that enables CSC to encrypt backup tapes for application data assessed as
sensitive either from a security or from an access to information perspective.
Action Summary:
1.) ISO to develop and implement encryption solution for backup tapes or propose
alternative solution.
2.) Deployment of solution.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) March 2010; 2.) June 2010

Recommendation #12:
The Chief Information Officer, Information Management Services should ensure that, as
part of the DRP, there is either a plan to resume activities back at NHQ or a new DRP
which would guide DR staff in the event of a disaster at the alternate processing
facility (prior to resumption at NHQ).
Action Summary:
1.) IT Security will lead the development of the Resumption Plan in close consultation
with ISO.
2.) Once accepted, the NHQ resumption plan will be reviewed and approved by the CIO.
OPI: The Chief Information Officer, Information Management Services
Planned Completion Date: 1.) March 2010; 2.) May 2010
