DBA-325-A: SQLCAT: SQL Server HA and DR Design Patterns, Architectures, and Best Practices Using SQL Server 2012

The document discusses different patterns for achieving high availability and disaster recovery with SQL Server, including using a multi-site failover cluster instance, availability groups, or a combination. It provides architecture diagrams and considerations for each approach.


AlwaysOn High Availability and Disaster Recovery
Design Patterns and Architectures
Sanjay Mishra, Mike Weiner, Justin Erickson
Program Managers
Microsoft Corporation

October 11-14, Seattle, WA


What is your Disaster Recovery Plan?

Source: www.dilbert.com
SQLCAT (Customer Advisory Team)
The SQL Server Customer Advisory Team (SQL CAT) represents the
customer-facing resources from the SQL Server Product Group. SQLCAT is
comprised of product and solution experts that regularly engage in the
largest, most complex, and most unique customer deployments worldwide.

Achieving Customer Success


• Very large, complex projects
• Mission Critical projects
Azure Projects
• Are you pushing the limits with Azure? Stop by the SQL Server Clinic and tell us
about it. Free prizes.
Making a Better Product
• Drives feedback and product requirements back into SQL Server development
teams from deep and strategic customer and ISV engagements
Sharing with the Community
• http://sqlcat.com
Prerequisites

Basic knowledge of:


• AlwaysOn Failover Cluster Instances (FCI)
• AlwaysOn Availability Groups (AG)

Definition: For the purpose of this presentation


• High Availability (Local HA): Availability within a data center
• Disaster Recovery (DR): Availability across data centers
Denali HA+DR Solutions

1. Multi-site Failover Cluster Instance (FCI) for HA and DR
• Shared Storage solution *
• Instance Level HA
• Instance Level DR
• Does NOT require database to be in FULL recovery model

2. Availability Group for HA and DR
• Non-Shared Storage solution
• (Group of) Database Level HA
• (Group of) Database Level DR
• DR replica can be Active Secondary
• Requires database to be in FULL recovery model

3. Failover Cluster Instance for local HA + Availability Group for DR
• Combined Shared Storage and Non-Shared Storage
• Instance Level HA
• (Group of) Database Level DR
• DR replica can be Active Secondary
• Requires database to be in FULL recovery model
Multi-site Failover Cluster
Instance (FCI) for HA and DR

Multi-site FCI Architecture
A single Windows Server Failover Cluster (WSFC) and a single SQL
Server failover cluster instance (FCI) span two sites, providing
local High Availability as well as a Disaster Recovery solution
Multi-site Failover Cluster Instance Architecture
(Storage Replication)

[Diagram: storage replication between the storage arrays at the two sites. Failover sequence shown:
2) Storage software detects storage replication state
3) Storage software sets storage to read/write enabled
4) WSFC mounts Physical Disk resources
5) Replication reversed]
Architecture Enhancements – Multi-site Cluster

Version: SQL Server 2008 R2
Native Support: NO
Implementation:
• Create stretch Virtual-LAN (VLAN) to act as a single subnet

Version: Microsoft SQL Server code named “Denali”
Native Support: YES
Implementation:
• IP address OR dependency set within SQL Server setup
• SQL Engine skips binding to any IPs which are not online at start-up
AlwaysOn Multi-site Failover Cluster
Deployment Considerations
Storage Validation

• A multi-site FCI solution does not need to pass the storage validation tests to be
supported. http://support.microsoft.com/kb/943984

Node and Disk Majority


AlwaysOn Multi-site Failover Cluster
Deployment Considerations

• Add “MultiSubnetFailover=True” to the connection string
• With an older client driver, or if the connection string cannot be changed:
• Configure the connection timeout value appropriately
• After a client machine reboot or a subnet failover of the cluster, the
client will try to connect. If it cannot resolve an online IP
address within the timeout value, it receives a Connection
Timeout.
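
The two client cases above can be sketched as connection strings. This is a minimal illustration only: the keyword spellings follow the .NET SqlClient style, and the server name, database name, and timeout values are hypothetical placeholders, not recommendations from the presentation.

```python
def build_connection_string(server, database,
                            multi_subnet_failover=True,
                            connect_timeout=30):
    """Assemble a SqlClient-style connection string (illustrative sketch)."""
    parts = [
        f"Server=tcp:{server}",
        f"Database={database}",
        "Integrated Security=SSPI",
        f"Connect Timeout={connect_timeout}",
    ]
    if multi_subnet_failover:
        # Newer client drivers: opt in to multi-subnet failover handling.
        parts.append("MultiSubnetFailover=True")
    return ";".join(parts)


# Hypothetical FCI network name and database.
new_client = build_connection_string("sqlfci1.example.com", "SalesDb")

# Older driver, or a string that cannot carry the new keyword:
# rely on a longer timeout instead (60 is an arbitrary example value).
old_client = build_connection_string("sqlfci1.example.com", "SalesDb",
                                     multi_subnet_failover=False,
                                     connect_timeout=60)

print(new_client)
print(old_client)
```

The right timeout for the older-driver case depends on how many IP addresses are registered for the network name and on the environment, so the value should be tested rather than copied.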
Availability Group for HA and DR



SQL Server 2005, 2008, 2008 R2
• Using Database Mirroring for local high availability and combining it
with Log Shipping for a disaster recovery solution is a popular
deployment architecture today.
• Customer Evidence: MSIT SAP, Bwin, and many more …

[Diagram: Principal and Mirror with Synchronous Database Mirroring and a Witness in the Primary Data Center; Log Shipping to a Log Shipping Secondary in the Disaster Recovery Data Center.]
SQL Server Codename “Denali”
Replace Database Mirroring and Log Shipping with Availability Group

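[Diagram: one Windows Server Failover Cluster spans the Primary Data Center and the Disaster Recovery Data Center, with a Fileshare Witness. The Availability Group has a Primary and a Secondary (synchronous) in the Primary Data Center and a further Secondary (synchronous / asynchronous) in the Disaster Recovery Data Center.]

Note: More secondaries (up to 4 in total) can be added for additional resiliency or read scale-out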
Considerations

All SQL servers (including the secondary in the DR site) in the same Windows
domain
• One Windows Server Failover Cluster spreads over the primary and DR sites
All the databases must be in FULL recovery model
The unit of failover (for local HA, as well as DR) is at the AG level, i.e., group of
databases – not the instance
• Consider using Contained Database for containing logins for failover
• For jobs and other objects outside the database, simple customization needed
No delayed apply on the secondary
Removing log shipping means the regular log backup job is removed
• Need to re-establish periodic log backup (essential for truncating the log)
• New tools for monitoring and alerting
• AlwaysOn Dashboard
• System Center Operations Manager
Client Connectivity

Read / Write Workload


• Connection using AG Listener
• Connection using FAILOVER_PARTNER (if connection string of
existing applications can’t be changed)
Read Only Workload
• Connection using AG Listener and ApplicationIntent=ReadOnly
• Connection to the secondary instance directly
• ReadOnly Routing
Multi subnet failover scenario
• New client libraries => MultiSubnetFailover=True
• Old client libraries => configure appropriate client connection
timeout
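
As a rough sketch of the connectivity options above, the snippet below builds three connection strings: read/write through the AG Listener, read-only through the listener, and the legacy failover-partner form. The keyword spellings are SqlClient-style (the failover partner keyword is spelled differently in other drivers), and the listener, server, and database names are hypothetical.

```python
def listener_connection_string(listener, database, read_only=False):
    """Connect through the AG Listener; optionally declare read-only intent."""
    parts = [
        f"Server=tcp:{listener}",
        # A database name is given explicitly; read-only routing works
        # per database, so the string should name one.
        f"Database={database}",
        "Integrated Security=SSPI",
        "MultiSubnetFailover=True",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


def failover_partner_connection_string(primary, partner, database):
    """Legacy mirroring-style string for applications that cannot be changed."""
    return (f"Server={primary};Failover Partner={partner};"
            f"Database={database};Integrated Security=SSPI")


# Hypothetical names throughout.
read_write = listener_connection_string("ag-listener.example.com", "SalesDb")
read_only = listener_connection_string("ag-listener.example.com", "SalesDb",
                                       read_only=True)
legacy = failover_partner_connection_string("sqlnode1", "sqlnode2", "SalesDb")

for label, value in [("read/write", read_write),
                     ("read-only", read_only),
                     ("legacy", legacy)]:
    print(f"{label}: {value}")
```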
Quorum Configuration
Step 1: Node Votes
• While using AG for HA and DR, assign
• 1 vote to the nodes in the primary data center (nodes
participating in automatic failover)
• 0 votes to all other nodes
• Use PowerShell or cluster.exe to assign / change votes
to nodes
Step 2: Quorum Model
• Use a fileshare witness and choose the “Node and Fileshare
Majority” quorum model
• Alternatively, add an additional “witness” node, and
then use “Node Majority”
• This node doesn’t need SQL Server installed on it
Quorum Model and Node Votes

[Diagram: Windows Server Failover Cluster spanning both sites, with a Fileshare Witness. Primary Data Center: Node 1 (Primary, 1 vote) and Node 2 (Secondary, synchronous, 1 vote). Disaster Recovery Data Center: Node 3 (Secondary, synchronous / asynchronous, 0 votes).]

Note: The Fileshare Witness always has 1 vote. So, the above WSFC has 3 votes.
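
The vote arithmetic behind this note can be checked with a few lines of Python. This is a simplified model, assuming a plain majority rule (more than half of all configured votes must be reachable); the function names are made up for the illustration and it does not model any other cluster behavior.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(votes, online):
    """True if the reachable voters hold a majority of all configured votes."""
    total = sum(votes.values())
    live = sum(weight for name, weight in votes.items() if name in online)
    return live >= votes_needed(total)


# Vote assignment from the diagram: two primary-site nodes and the
# fileshare witness vote; the DR node does not.
votes = {"Node 1": 1, "Node 2": 1, "Node 3": 0, "Fileshare Witness": 1}

total = sum(votes.values())
print("total votes:", total, "needed:", votes_needed(total))

# Losing the DR node does not affect quorum.
print(has_quorum(votes, {"Node 1", "Node 2", "Fileshare Witness"}))

# One primary-site node plus the witness still holds a majority.
print(has_quorum(votes, {"Node 1", "Fileshare Witness"}))

# If only the DR node is reachable, there is no quorum, which is why
# the DR procedure involves forcing quorum.
print(has_quorum(votes, {"Node 3"}))
```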
DR Scenario

Disaster = Primary site is down


Manual Process involved to bring database service online
on the DR site
• Use Failover Cluster Manager (or cluster.exe) to Force Quorum on
the secondary in the DR site
• Force Cluster Service on the secondary in the DR site
• Use T-SQL on the secondary in the DR site to execute FORCE
SERVICE ALLOW DATA LOSS on the Availability Group
• Use PowerShell or cluster.exe to adjust node votes on all the nodes
• Assign 1 vote to the node in the DR site (now running as primary)
• Assign 0 votes to each node in the primary data center
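
The manual steps above can be written down as an ordered checklist. The sketch below only assembles command text; it runs nothing. The command forms are assumptions to verify against the installed versions: forcing quorum via `net start clussvc /forcequorum`, the T-SQL option spelled `FORCE_FAILOVER_ALLOW_DATA_LOSS` as in released SQL Server 2012 (the slide uses the wording FORCE SERVICE ALLOW DATA LOSS), and node votes set through the `NodeWeight` property in PowerShell. The AG and node names are hypothetical.

```python
def dr_failover_runbook(ag_name, dr_node, primary_site_nodes):
    """Return (where, command) pairs for a forced failover to the DR site."""
    steps = [
        # 1. Force quorum by starting the cluster service on the DR node.
        (f"cmd on {dr_node}", "net start clussvc /forcequorum"),
        # 2. Force the AG online on the DR replica, accepting data loss.
        (f"T-SQL on {dr_node}",
         f"ALTER AVAILABILITY GROUP [{ag_name}] FORCE_FAILOVER_ALLOW_DATA_LOSS;"),
        # 3. Give the DR node (now primary) a vote.
        #    PowerShell needs the FailoverClusters module loaded first.
        ("PowerShell", f'(Get-ClusterNode "{dr_node}").NodeWeight = 1'),
    ]
    # 4. Remove votes from the nodes in the failed primary data center.
    steps += [("PowerShell", f'(Get-ClusterNode "{node}").NodeWeight = 0')
              for node in primary_site_nodes]
    return steps


# Hypothetical names for illustration.
runbook = dr_failover_runbook("SalesAG", "NODE3", ["NODE1", "NODE2"])
for number, (where, command) in enumerate(runbook, start=1):
    print(f"{number}. [{where}] {command}")
```

Keeping the checklist as data makes it easy to review the order of operations before an actual disaster.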
Failover Cluster Instance for HA +
Availability Group for DR



SQL Server 2005, 2008, 2008 R2

[Diagram: SQL-FCI-1 (Principal) in the Primary Site and SQL-FCI-2 (Mirror) in the DR Site, connected by Database Mirroring.]
SQL Server Codename “Denali”
Replace Database Mirroring with Availability Group

[Diagram: one Windows Server Failover Cluster spans the Primary Site and the DR Site. Node 1 and Node 2 host SQL-FCI-1 (Primary); Node 3 and Node 4 host SQL-FCI-2 (Secondary). An Availability Group connects the two FCIs.]
Considerations
One Windows Server Failover Cluster spreads over the primary and DR sites,
encompassing the two FCIs
• New ways to look at setup, quorum models, DR operations, etc.
The DR failover unit is at the AG level, i.e., group of databases – not the
instance
• Consider using Contained Database for containing logins for failover
• For jobs and other objects outside the database, simple customization
needed
• New tools for monitoring and alerting
• AlwaysOn Dashboard
• System Center Operations Manager
• Pre-requisite Windows Service packs / QFEs:
• Asymmetric Storage
• Windows Server 2008 with http://support.microsoft.com/kb/976097
• OR, Windows Server 2008 R2 SP1
• Node Votes: http://support.microsoft.com/kb/2494036
• Validate disk test QFE: http://support.microsoft.com/kb/2531907
Asymmetric Storage

• Key concept behind this architecture


• Windows Server Failover Clustering
capability introduced in:
• Windows Server 2008 R2 SP1
• Windows Server 2008 with QFE
• Symmetric storage = a cluster disk that is
shared between all the WSFC nodes
• Asymmetric storage = a cluster disk that is
shared between a subset of nodes
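
The two definitions above reduce to a set comparison. A minimal sketch, assuming each cluster disk is described by the set of nodes that can see it; the function name and the disk-to-node mapping are hypothetical.

```python
def classify_cluster_disk(visible_to, cluster_nodes):
    """Classify a cluster disk by which WSFC nodes share it."""
    nodes = set(cluster_nodes)
    visible = set(visible_to) & nodes
    if not visible:
        raise ValueError("disk is not visible from any cluster node")
    # Shared by every node: symmetric. Shared by a subset: asymmetric.
    return "symmetric" if visible == nodes else "asymmetric"


cluster_nodes = ["Node 1", "Node 2", "Node 3", "Node 4"]

# In the FCI + AG design, each FCI's shared disks are seen only by
# that FCI's own nodes.
fci1_disk = classify_cluster_disk(["Node 1", "Node 2"], cluster_nodes)
fci2_disk = classify_cluster_disk(["Node 3", "Node 4"], cluster_nodes)
all_nodes_disk = classify_cluster_disk(cluster_nodes, cluster_nodes)

print(fci1_disk, fci2_disk, all_nodes_disk)
```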
Setup Progress …

[Diagram: Windows Server Failover Cluster with Node 1 and Node 2 in the Primary Site hosting SQL-FCI-1 (Primary).]
Add Nodes:
Validation Tests pass with warnings

Most interesting of the warnings:
List potential cluster disks
Disk with id 6b4f692e is visible or cluster-able
only from a subset of nodes.
This warning is expected and is fine
Asymmetric storage is the key
Setup Progress …

[Diagram: the Windows Server Failover Cluster now spans the Primary Site (Node 1 and Node 2, hosting SQL-FCI-1 as Primary) and the DR Site (Node 3 and Node 4).]
Add Disks
Install the second FCI (SQL-FCI-2)

The instance name of the second FCI (SQL-FCI-2)
has to be different from that of the first FCI (SQL-FCI-1)
Setup Progress …

[Diagram: SQL-FCI-1 (Primary) on Node 1 and Node 2 in the Primary Site; SQL-FCI-2 (Secondary) on Node 3 and Node 4 in the DR Site, all in one Windows Server Failover Cluster.]
Create AG between the two FCIs

Before creating the AG, verify that the possible owner lists of
all resources for SQL-FCI-1 and SQL-FCI-2 have
been set appropriately
Create the AG, with SQL-FCI-1 as primary and SQL-FCI-2 as
secondary
Make sure you pick “Manual Failover” (i.e., “High
Safety” or “High Performance”) while creating the AG
• When FCI is combined with AG, “automatic failover”
happens within the FCI. The AG failover is manual.
Setup Progress …

[Diagram: the Availability Group now connects SQL-FCI-1 (Primary; Node 1 and Node 2, Primary Site) and SQL-FCI-2 (Secondary; Node 3 and Node 4, DR Site) within the Windows Server Failover Cluster.]
Client Connectivity

Read / Write Workload


• Connection using AG Listener
• Connection using FAILOVER_PARTNER (if connection string of
existing applications can’t be changed)
Read Only Workload
• Connection using AG Listener and ApplicationIntent=ReadOnly
• Connection to the secondary instance directly
• ReadOnly Routing
Multi subnet failover scenario:
• New client libraries => MultiSubnetFailover=True
• Old client libraries => configure appropriate client connection
timeout
Quorum Model

Quorum is managed at the Windows Cluster (WSFC)
node level, irrespective of how many SQL Server
instances (FCIs or stand-alone), AGs or replicas you
have in your topology.
Two steps are involved in the quorum decision:
• Step 1: Selecting which nodes get to vote
• Step 2: Selecting the appropriate quorum model
Quorum Configuration: Node Votes
• Why node votes are important
• By default all nodes get 1 vote
• The cluster (WSFC) needs a majority of votes to operate and to
avoid split brain
• Node votes are needed to control and ensure that:
• Unavailability of secondary replica nodes doesn’t cause loss of
quorum for the cluster and hence reduce the availability of the
primary
• Disconnect between sites doesn’t cause loss of quorum for the
cluster and hence reduce the availability of the primary
• Windows (2008 and 2008 R2) QFE
(http://support.microsoft.com/kb/2494036) allows you to configure 1
vote for some nodes and 0 votes for some other nodes
• Assign 1 vote to all possible owner nodes for the primary FCI
(SQL-FCI-1 in the diagram), and 0 votes to all other nodes
• Use PowerShell or the cluster.exe command to assign votes to nodes
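
The assignment rule in the last bullets can be modeled in a few lines. This is an illustrative sketch only: the function name is invented, and the node names follow the four-node diagram used in this section.

```python
def assign_node_votes(cluster_nodes, primary_fci_owners):
    """1 vote for each possible owner of the primary FCI, 0 for all others."""
    owners = set(primary_fci_owners)
    unknown = owners - set(cluster_nodes)
    if unknown:
        raise ValueError(f"not cluster nodes: {sorted(unknown)}")
    return {node: (1 if node in owners else 0) for node in cluster_nodes}


cluster_nodes = ["Node 1", "Node 2", "Node 3", "Node 4"]
node_votes = assign_node_votes(cluster_nodes, ["Node 1", "Node 2"])
print(node_votes)

# The DR-site nodes carry no votes, so losing them (or the link to
# their site) leaves the number of reachable votes unchanged.
dr_site_votes = node_votes["Node 3"] + node_votes["Node 4"]
print("votes held by DR-site nodes:", dr_site_votes)
```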
Node Votes for FCI + AG Configuration

[Diagram: Windows Server Failover Cluster spanning both sites. Primary Site: Node 1 (1 vote) and Node 2 (1 vote) host SQL-FCI-1 (Primary). DR Site: Node 3 (0 votes) and Node 4 (0 votes) host SQL-FCI-2 (Secondary). An Availability Group connects the two FCIs.]
Quorum Configuration: Quorum Model

• Once the node votes are decided, pick the appropriate
quorum model
• If the primary FCI has an odd number of nodes, use “Node
Majority”
• If the primary FCI has an even number of nodes
• Use a fileshare witness and choose the “Node and Fileshare Majority”
quorum model, OR
• Alternatively, add an additional “witness” node and then use
“Node Majority”
• This node doesn’t need to be part of the FCI, or have SQL Server
installed on it
• The preferred location for the fileshare witness or the witness node, in
this configuration, is the primary site
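
The odd/even rule above exists to keep the total number of votes odd. A small sketch of that decision follows; the function names are hypothetical, and only the fileshare-witness option is modeled for the even case (adding a witness node is the alternative named above).

```python
def choose_quorum_model(voting_nodes):
    """Pick a quorum model from the number of nodes that hold a vote."""
    if voting_nodes < 1:
        raise ValueError("at least one voting node is required")
    if voting_nodes % 2 == 1:
        return "Node Majority"
    # Even number of voting nodes: a fileshare witness adds the tie-breaker.
    return "Node and Fileshare Majority"


def total_votes(voting_nodes):
    """Total votes once the chosen model is in place."""
    witness = 1 if choose_quorum_model(voting_nodes) == "Node and Fileshare Majority" else 0
    return voting_nodes + witness


for count in (2, 3, 4):
    print(count, "voting nodes ->", choose_quorum_model(count),
          "| total votes:", total_votes(count))
```

In the four-node diagram of this section the primary FCI has two nodes, so the rule selects the fileshare witness model.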
Quorum Model for FCI + AG Configuration

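[Diagram: Windows Server Failover Cluster spanning both sites, with a Fileshare witness. Primary Site: Node 1 (1 vote) and Node 2 (1 vote) host SQL-FCI-1 (Primary). DR Site: Node 3 (0 votes) and Node 4 (0 votes) host SQL-FCI-2 (Secondary). An Availability Group connects the two FCIs.]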
Asymmetric Disk as Quorum Resource
• Prior to Asymmetric Disk capability, for a disk to be a
cluster resource (and a quorum resource) it was
required to be visible from all the nodes.
• With Asymmetric Disk capability, a cluster disk can be
visible to a subset of nodes.
• Asymmetric Disk can be used as a quorum resource:
• Not through Failover Cluster Manager GUI, or PowerShell
• But through cluster.exe command line
• Asymmetric Disk as quorum resource enables quorum
models:
• Node + Asymmetric Disk Majority
• Asymmetric Disk Only
FCI + AG Configuration with Asymmetric
Disk-Only Quorum Model

[Diagram: SQL-FCI-1 (Primary; Node 1 and Node 2, Primary Site) and SQL-FCI-2 (Secondary; Node 3 and Node 4, DR Site) joined by the Availability Group in one Windows Server Failover Cluster, using an Asymmetric Disk-Only Quorum.]
Quorum Configuration

• Whenever you fail over outside the primary FCI, re-evaluate
the quorum configuration
• The primary FCI is the automatic failover target
• When you fail over to the secondary FCI (SQL-FCI-2):
• Assign 1 vote to each node in SQL-FCI-2
• Assign 0 votes to each node in SQL-FCI-1
• Configure a fileshare witness in the DR site
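
To see which nodes need their vote changed after such a failover, the before and after assignments can be compared. A minimal sketch, assuming the two-FCI topology from the diagrams; the helper names are hypothetical.

```python
def votes_for_primary_fci(fci_members, primary_fci):
    """1 vote for every node of the FCI hosting the primary, 0 for the rest."""
    if primary_fci not in fci_members:
        raise ValueError(f"unknown FCI: {primary_fci}")
    return {node: (1 if fci == primary_fci else 0)
            for fci, nodes in fci_members.items()
            for node in nodes}


def vote_changes(before, after):
    """Nodes whose vote differs between two assignments, with the new value."""
    return {node: after[node] for node in after if before.get(node) != after[node]}


fci_members = {
    "SQL-FCI-1": ["Node 1", "Node 2"],
    "SQL-FCI-2": ["Node 3", "Node 4"],
}

before = votes_for_primary_fci(fci_members, "SQL-FCI-1")
after = votes_for_primary_fci(fci_members, "SQL-FCI-2")
changes = vote_changes(before, after)

for node, vote in changes.items():
    print(f"set {node} to {vote} vote(s)")
```

In this topology every node changes its vote, and the witness has to move to the DR site as well, which is why the quorum configuration deserves a full review after failover.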
DR Scenario
Disaster = Primary FCI (SQL-FCI-1) has failed
Manual Process involved to bring database service online
• Use Failover Cluster Manager (or cluster.exe) to Force
Quorum on SQL-FCI-2
• Force Cluster Service on one of the nodes on SQL-FCI-2
• Start Cluster Service on rest of the nodes on SQL-FCI-2
• SQL-FCI-2 will now be available
• Use T-SQL on SQL-FCI-2 to execute FORCE SERVICE
ALLOW DATA LOSS on the Availability Group
• Adjust node votes and quorum configuration
appropriately
Quorum Model for FCI + AG Configuration:
After Failover

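[Diagram: Windows Server Failover Cluster after failover, with the Fileshare witness in the DR Site. Primary Site: Node 1 (0 votes) and Node 2 (0 votes) host SQL-FCI-1 (Secondary). DR Site: Node 3 (1 vote) and Node 4 (1 vote) host SQL-FCI-2 (Primary). An Availability Group connects the two FCIs.]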
Recap: Denali HA+DR Solutions
1. Multi-site Failover Cluster Instance (FCI) for HA and DR
Corresponding Existing Solution: Multi-site FCI using stretch VLAN
Solution Characteristics:
• Shared Storage solution *
• Instance Level HA
• Instance Level DR
• Doesn’t require database to be in FULL recovery model

2. Availability Group for HA and DR
Corresponding Existing Solution: Database Mirroring for Local HA and Log Shipping for DR
Solution Characteristics:
• Non-Shared Storage solution
• (Group of) Database Level HA
• (Group of) Database Level DR
• DR replica can be Active Secondary
• Requires database to be in FULL recovery model

3. Failover Cluster Instance for local HA + Availability Group for DR
Corresponding Existing Solution: Failover Cluster Instance for Local HA and Database Mirroring for DR
Solution Characteristics:
• Combined Shared Storage and Non-Shared Storage
• Instance Level HA
• (Group of) Database Level DR
• DR replica can be Active Secondary
• Requires database to be in FULL recovery model
Thank you
for attending this session and the
2011 PASS Summit in Seattle



Appendix:
FCI+AG setup screen shots



Create a 2 node WSFC
Install SQL-FCI-1: Install FCI on Node 1, then perform Add Node on Node 2.
Add the other two nodes (Nodes for SQL-FCI-2 in the DR site) to WSFC
Run all validation tests, but leave SQL-FCI-1 online
Move “Available Storage” to one of the nodes for SQL-FCI-2

Start a Command Prompt with "Run as administrator" and issue the command:
cluster group "Available Storage" /move:<SQL-FCI-2 node>
Available Storage (and new disks) is now online. Set appropriate drive letters using Disk Management.
Setup Progress …
Set Possible Owners appropriately for all resources for SQL-FCI-1 and SQL-FCI-2
