Oregon: State Data Center
Disaster Recovery Overview
[Figure: outage impact scale. Axis labels: Outage Impact; Impact to service delivery. Scale points, left to right: Normal Operations, Severity 4, Severity 3, Severity 2, Severity 1, Disaster.]
• Low end of the scale: bug or minor issue where the application is still functioning
• High end of the scale: major issue with high impact – equipment not usable
• Grey area: the expected time to return equipment to normal service will determine whether DR is invoked
Successful DR Requires Cooperation
Participating Agency
• Plans DR based on business needs and priorities
• Acquires DR services for out-of-scope IT
• Funds DR planning, backup & recovery
• Prioritizes recovery sequence within agency
• Tests agency DR plans
• Determines scope and declares disaster for out-of-scope IT
• Keeps vendor informed of changes
SDC
• Plans internal continuity of service delivery if infrastructure must relocate
• Contracts with DR vendor to provide infrastructure environment in case of SDC disaster
• Determines scope and declares SDC disaster to DR vendor
• Coordinates cross-agency priority sequencing
• Arranges for backups of data and applications
• Tests SDC DR plans
• Coordinates movement of people, backup resources, and communications connectivity
• Keeps vendor informed of changes
What BCP Coordinators need to know
• Who is your DR coordinator?
• Has your agency done DR planning?
• What applications are needed to support critical business functions?
• Where are those applications hosted?
• What are the disaster recovery time objective (RTO) and recovery point
objective (RPO) for each of those applications?
• Are the applications and their data backed up frequently enough to
meet RPO?
• Are the recovery option and grouping of backups for each application
reasonable for the RTO?
• Will the agency’s budget planning support the cost associated with
meeting the desired RTO and RPO level?
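The question of whether backups run often enough to meet the RPO can be checked mechanically. The following is a minimal sketch, assuming the worst case is losing one full backup interval of data; the application names, intervals, and the function name `backup_meets_rpo` are hypothetical and not taken from any SDC tool.

```python
from datetime import timedelta


def backup_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True when the worst-case data loss stays within the RPO.

    Assumption: the disaster strikes just before the next backup runs,
    so up to one full backup interval of data is lost.
    """
    return backup_interval <= rpo


# Hypothetical applications: name -> (backup interval, RPO)
apps = {
    "payroll": (timedelta(hours=24), timedelta(days=1)),
    "licensing": (timedelta(days=7), timedelta(days=1)),
}

# Applications whose backup schedule cannot satisfy their RPO
gaps = [name for name, (interval, rpo) in apps.items()
        if not backup_meets_rpo(interval, rpo)]
print(gaps)  # ['licensing']
```

An application that appears in `gaps` needs either more frequent backups or a business decision to accept a longer RPO.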
What DR Coordinators need to know
• Answers to all the questions in "What BCP Coordinators need to know",
plus:
– Who is your BCP coordinator?
– What agency infrastructure will need to be recovered before recovered
applications and data will be accessible to users? (e.g., DNS, LDAP,
Active Directory, networks)
– What communications vehicles are expected to be available during a
disaster? (e.g., email, BlackBerry, IM)
– What are the recovery procedures for agency infrastructure,
communications, applications, and data?
– What are the DR testing plans?
– What are the procedures for keeping all of this up to date?
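The infrastructure question above implies a recovery order: services such as networks, DNS, LDAP, and Active Directory must come back before the applications that rely on them. The following is a minimal sketch of deriving such an order with a topological sort from Python's standard library. The dependency map, including the application name, is a hypothetical example; a real map would come from the agency's own planning.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: each item lists what must be recovered first.
depends_on = {
    "network": [],
    "DNS": ["network"],
    "Active Directory": ["network", "DNS"],
    "LDAP": ["network", "DNS"],
    "case-management app": ["Active Directory", "DNS"],
}

# static_order() yields every item after all of its prerequisites.
recovery_order = list(TopologicalSorter(depends_on).static_order())
print(recovery_order)
```

Items with no dependency between them (here, LDAP and Active Directory) may appear in either order, so the agency's own priorities decide those ties. A circular dependency raises `graphlib.CycleError`, which is itself a useful planning finding.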
SDC DR Project Actions
• Develop DR planning framework and templates with
SunGard
• Identify, scope and develop backup and recovery for
SDC core infrastructure and infrastructure needed to
support agency recovery requirements
• Assist agencies with identifying and scoping DR
requirements for their infrastructure, applications and
data
• Develop and implement tiered DR strategies
• Develop DR test plans and execute initial tests
• Develop and implement DR maintenance process
Working with the SDC on DR Planning
• Submit request for DR planning and
preparation through normal agency
procedures
• Provide initial information on DR
requirements
• Once potential solutions are scoped and
priced, get agency approval to proceed
• Provide detailed planning information
• Plan agency testing
Key data for DR Planning
• Needed for getting to more detail
– Agency: Your agency acronym
– Application Name or ACRONYM: How is the application most commonly known?
– Technical Contact: Who could answer questions about the infrastructure needs of the application?
– Technical Contact phone/email: What is the best way to reach the primary technician?
• Needed for planning the best recovery strategy
– Recovery Time Objectives - Days (RTO): If a disaster occurred at this moment, how long could the business work around not having this application available?
– Recovery Time Objectives (RTO) Special Notes: Special conditions for restoration within this RTO. Would the RTO differ at different times of the month or year?
• Needed for planning the best backup strategy
– Recovery Point Objectives - Days (RPO): How many days' worth of new data can be lost or recreated by other means? Are yesterday's backups good enough to recover from? Last week's?
• Needed for estimating cost and aggregating need
– Software Component / Database Name: A single application can consist of one or many components. Please list all primary components, e.g., database name, ColdFusion, Crystal Reports, etc.
– Software Component Vendor: Who is the component manufacturer? e.g., Oracle, IBM, Microsoft, etc.
– Software Component Type(s): What is the component's primary function? e.g., DBMS, reporting, data conversion, connectivity, etc.
– Server or LPAR Name: List every computer, server, or appliance that makes up the entire application environment.
– Server or LPAR Operating System: Generally, what type of server or LPAR is it? e.g., Unix, Intel Linux, Windows, Mainframe, zLinux, iSeries, etc.
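The key data fields above map naturally onto a structured record per application. The following is a minimal sketch of one such record. The class names, field names, and the `missing_fields` helper are hypothetical illustrations, not an SDC template.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str          # Software Component / Database Name
    vendor: str        # Software Component Vendor
    types: list[str]   # Software Component Type(s)


@dataclass
class Server:
    name: str              # Server or LPAR Name
    operating_system: str  # Server or LPAR Operating System


@dataclass
class ApplicationDRRecord:
    agency: str
    application: str
    technical_contact: str
    contact_phone_email: str
    rto_days: float
    rpo_days: float
    rto_special_notes: str = ""
    components: list[Component] = field(default_factory=list)
    servers: list[Server] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List the planning inputs that are still blank."""
        required = ("agency", "application",
                    "technical_contact", "contact_phone_email")
        missing = [name for name in required
                   if not getattr(self, name).strip()]
        if not self.components:
            missing.append("components")
        if not self.servers:
            missing.append("servers")
        return missing


# Hypothetical, partially completed record
record = ApplicationDRRecord(
    agency="XYZ", application="Example App",
    technical_contact="J. Doe", contact_phone_email="",
    rto_days=3, rpo_days=1,
    components=[Component("exampledb", "Oracle", ["dbms"])],
)
print(record.missing_fields())  # ['contact_phone_email', 'servers']
```

Collecting records in a uniform shape like this makes it easier to aggregate need across agencies, for example by counting servers per operating system.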
Recovery Options
Relative Cost for DR | Recovery Category | Recovery Option | Recovery Time Target | Comments
$$$$$ | A++ | Immediate recovery | 0 hrs | ‘Mirroring’, ‘load balancing’, or ‘split site’ – automatic fail-over to site away from home site
$$$$ | A+ | Fast recovery | < 48 hrs | Hot standby – ‘mirroring’ at site away from home site – fail-over requires some actions
$$$ | B | Intermediate recovery | > 72 hrs, < 1 week | Warm standby – subsequent waves
$$ | C | Gradual recovery | 1 - 4 weeks | Cold standby or lower priority recovery in warm standby site
$$ | D | Gradual recovery | > 4 weeks | Cold standby or lower priority recovery in warm standby site
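The recovery options above can be read as a lookup from an application's RTO to a recovery category. The following is a minimal sketch of that lookup. The function name is hypothetical, and one boundary choice is an assumption: the table leaves RTOs between 48 and 72 hours unassigned, so this sketch conservatively places them in the faster category A+.

```python
def recovery_category(rto_hours: float) -> str:
    """Map an RTO in hours to a recovery category from the table.

    Assumption: RTOs from 48 to 72 hours, which the table does not
    cover, are assigned to A+ (the faster and costlier option).
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours == 0:
        return "A++"          # immediate recovery
    if rto_hours <= 72:
        return "A+"           # fast recovery (plus the 48-72 hr gap)
    if rto_hours < 7 * 24:
        return "B"            # intermediate recovery, under 1 week
    if rto_hours <= 4 * 7 * 24:
        return "C"            # gradual recovery, 1 to 4 weeks
    return "D"                # gradual recovery, over 4 weeks


for hours in (0, 24, 96, 336, 1000):
    print(hours, recovery_category(hours))
```

Because relative cost rises with each faster category, running a lookup like this across all applications gives an early view of where budget pressure will come from.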
Recovery Timeline
[Figure: recovery timeline. The only surviving label is MAD.]
* Source: Building a Business Impact Analysis: The Keystone to Effective Business Continuity Planning by Richard Jones, v110
7/30/2008, Burton Group
Definitions for Recovery Timeline
• MAD – Maximum Allowable Downtime; the maximum amount of time the business
can suffer an inoperable business process before significant negative consequences
are felt. Also called Maximum Acceptable Outage (MAO), Maximum Allowable
Outage (MAO), Maximum Acceptable Downtime (MAD), Maximum Tolerable
Downtime (MTD), Maximum Tolerable Outage (MTO), and Maximum Tolerable
Period of Disruption (MTPD).
• RPO – Recovery Point Objective; the amount of IT systems data or transaction loss
that can be tolerated by the business process
• RTO – Recovery Time Objective; the time IT organizations have to recover their
systems to an agreed-upon operational state so that workers may then recover the
lost time of the outage and bring the business process back to acceptable service
levels.
• Work Recovery – The work time required to recover the lost transactions of the RPO
time plus the backlog of work created during the system outage. Lost transactions
must be recovered manually and procedures should be in place to accomplish this
work.
• Restoration time – Time to bring the business process back to a state of full
business continuity protection. In practice, this means backing up the recovered
system and restoring redundancy capabilities.
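One way to read the definitions above is that system recovery (RTO) followed by work recovery should both finish before the MAD is reached. The following is a minimal sketch of that check. The function name and the sample figures are hypothetical, and treating MAD as a budget for RTO plus work recovery is an interpretation of the definitions rather than a formula stated in the source.

```python
def fits_within_mad(rto_hours: float,
                    work_recovery_hours: float,
                    mad_hours: float) -> bool:
    """Return True when system recovery plus work recovery fits in the MAD.

    Assumption: work recovery starts only after the systems are back,
    so the two durations add up.
    """
    return rto_hours + work_recovery_hours <= mad_hours


# Hypothetical figures: 48 hrs to restore systems, 24 hrs to re-enter work
print(fits_within_mad(48, 24, mad_hours=96))  # True
print(fits_within_mad(48, 24, mad_hours=60))  # False
```

When the check fails, the options are a faster recovery option (shorter RTO), more frequent backups (less work to recreate), or a business decision to accept a longer MAD.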