IBM GLOBAL SERVICES

O31

SQL Server High Availability

Mike Shelton

IBM xSeries Technical Conference
Aug. 9 - 13, 2004
Chicago, IL

© IBM Corporation 2004

Overview
• Defining High Availability
• Setting High Availability Goals and Identifying Barriers
• SQL Server 2000 HA Technology
– Failover Clustering
– Log Backup Shipping
– Replication
• High Availability Operations and Support
Defining Five Nines
• What Is 99.999%?
– Target:
• Online and available to users 24 hours a day, 365 days a year
• Total outages of less than 5.26 minutes per year
• Is Five Nines the Only Option?
– Four Nines: 52.6 minutes of downtime a year
– Perhaps a reduced service window: 12 hours a day, 5 days a week
• Allows for planned downtime: maintenance, etc.
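The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year; the function name and the 52-week figure for the reduced service window are illustrative assumptions, not part of the original slides:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_budget_minutes(availability_pct, service_minutes=MINUTES_PER_YEAR):
    """Return the minutes of outage per year allowed at a given availability.

    service_minutes is the time the system is expected to be up; it defaults
    to round-the-clock operation.
    """
    return service_minutes * (1 - availability_pct / 100)


five_nines = downtime_budget_minutes(99.999)
four_nines = downtime_budget_minutes(99.99)

# Hypothetical reduced service window: 12 hours a day, 5 days a week, 52 weeks.
window_minutes = 12 * 60 * 5 * 52
four_nines_12x5 = downtime_budget_minutes(99.99, window_minutes)

print(f"five nines, 24x365: {five_nines:.2f} min/year")
print(f"four nines, 24x365: {four_nines:.2f} min/year")
print(f"four nines, 12x5:   {four_nines_12x5:.2f} min/year")
```

With a shorter service window the same percentage allows fewer minutes of unplanned outage, but everything outside the window is available for planned maintenance.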
• Un-managed
• Well-managed Nodes
– Masks Some Hardware Failures
• Well-managed Packs and Clones
– Masks Hardware Failures
– Masks Operations Tasks (e.g. Software Upgrades)
– Masks Some Software Failures
• Well-managed Geoplex
– Masks Site Failures (Power, Network, Fire, Move…)
– Masks Some Operations Failures
Business Makes High Availability Necessary
• Reliance on technology
– Hospital has high cost for downtime (lives!)
• Product or service availability
• Continuous improvement of products, services, and processes
– Example: It would be a failure if “the person to call” left the company two years ago and nobody can currently offer expertise.
• DBA interest in continuous improvement of products, services, and processes
What High Availability Is Not
• A Technology Solution From a Vendor
• A Scalability Solution
• An IT Decision Without Business Knowledge
• A Business Decision Isolated From the Cost of Downtime
• It is:
– A solution involving people and process, and, very likely, technology
High Availability Framework
[Diagram elements]
• Business Availability Goals
• Barriers to Availability
• Other relevant factors: Database Size, Throughput Requirements, Cost Requirements, etc.
• Downtime Budget
• Solution Component Costs
• Solutions to Specific Barriers
• HA Solution Component (1), HA Solution Component (2), … HA Solution Component (n)
Setting High Availability Goals
• Identify Stakeholders
• Value to the Business
• How the System Is Used
• Cost Limitations
• Never Lead With Technology
Identifying Barriers
• A Barrier Impacts a System’s Availability
• Use Monitoring to Identify Barriers
• Any High Availability Element Can Become a
Barrier
– Hardware
– Communication/connectivity
– Environmental
– Services
– Software
– Process
– Application or User Error
– Staffing
Calculating the Time and Cost of High Availability
• Cost of Downtime
• Probability of Occurrence
• Cost to Remove Barriers
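The three factors above can be combined into a simple expected-loss comparison. The sketch below is one possible way to do that, not a method from the slides; the `Barrier` fields, the hourly downtime cost, and every figure in the example are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Barrier:
    name: str
    outages_per_year: float  # probability of occurrence, as an expected count
    hours_per_outage: float  # expected downtime each time it happens
    cost_to_remove: float    # annualized cost of the fix


def expected_annual_loss(barrier, downtime_cost_per_hour):
    """Expected yearly cost of leaving the barrier in place."""
    return barrier.outages_per_year * barrier.hours_per_outage * downtime_cost_per_hour


def worth_removing(barriers, downtime_cost_per_hour):
    """Names of barriers whose expected loss exceeds the cost of removing them."""
    return [
        b.name
        for b in barriers
        if expected_annual_loss(b, downtime_cost_per_hour) > b.cost_to_remove
    ]


# Hypothetical figures, for illustration only.
barriers = [
    Barrier("single power supply", outages_per_year=0.5, hours_per_outage=4, cost_to_remove=3_000),
    Barrier("site outage", outages_per_year=0.01, hours_per_outage=48, cost_to_remove=250_000),
]
selected = worth_removing(barriers, downtime_cost_per_hour=10_000)
print(selected)
```

A comparison like this only ranks barriers by money; costs such as the hospital example (lives) need a business judgment that no formula supplies.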
Total Cost of Ownership Factors
1: Hardware and Software
2: Management / Procedural Costs
3: Support
4: Development
5: Telecommunication Fees
6: Downtime
7: End-User Costs
Microsoft SQL Server 2000 High Availability Technology
• SQL Server 2000 Editions Suitable for High Availability
• Comparison of Standby Options
• High Availability Features
• Failover Clustering
• Log Shipping
• Transactional Replication
• Using Combinations of Technologies
SQL Server 2000 Editions Suitable for High Availability
• Enterprise Edition
– Most Scalable and Highly Available
– Includes Failover Clustering
– Includes Log Shipping Features
– Suitable for Production
• Developer Edition
– Full Featured (Same As Enterprise Edition)
– Suitable for Development and Testing
Comparison of Standby Options
• Hot Standby
• Warm Standby
• Cold Standby
High Availability Features
• Standby Type
• Failure Detection
• Automatic Failover
• Masks Disk Failure
• Masks SQL Process Failure
• Masks Other Process Failure
• Meta Data Support
• Transactionally Consistent
• Transactionally Current
• Perceived Downtime
• Transparent to Client
• Special Hardware Needed
• Distance Limit
• Complexity
• Standby Accessible
• Impact on Performance
• Impact on Backup Strategy
Failover Clustering
• Types of Clusters
• Windows Clustering
• SQL Server 2000 Failover Clustering
• How Failover Clustering Works
• Enhancements to Failover Clustering
• High Availability Features in Failover Clustering
Types of Clusters
• Windows Cluster
• Failover Cluster
• Federated Cluster
• Network Load Balancing Cluster
Failover Clustering
Windows Clustering
• Hardware Components
– Cluster node
– Heartbeat
– External network
– Shared cluster disk array
– Quorum drive
• Software Components
– Cluster name
– Cluster IP address
– Cluster administrator account
– Cluster resource
– Cluster group
• Virtual Server
SQL Server 2000 Failover Clustering
[Diagram: Node A and Node B, each running MSCS with a local disk (binaries, tools), linked by a heartbeat and attached to a shared disk array (data). The SQL Server 2000 virtual server is reached over the public network.]
How Failover Clustering Works
• Operating System Checks
– Heartbeat Checks Availability of Nodes and Virtual Servers
• SQL Server Checks
– Looks-alive Check Every 5 Seconds
– IsAlive Check Runs a SELECT @@SERVERNAME Query
• Failover to Another Node
– Windows Clustering Attempts Restart on Same Node or Fails Over to Another Node
– SQL Server Service Starts
– Brings master Online
– Database Recovery Proceeds
– End Users and Applications Must Reconnect
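Because existing connections are lost during a failover, client code needs some form of reconnect logic. A minimal sketch of a retry loop, assuming the caller supplies a `connect` callable that raises `ConnectionError` while the virtual server is still coming online; the function name, the defaults, and the stand-in `fake_connect` are hypothetical and not tied to any particular database driver:

```python
import time


def connect_with_retry(connect, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect is any zero-argument callable that returns a session object or
    raises ConnectionError. sleep is injectable so the loop can be tested
    without real waiting.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Stand-in for a real driver call: fails twice, as if a failover were in progress.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("virtual server not yet online")
    return "session"


waits = []
session = connect_with_retry(fake_connect, sleep=waits.append)
print(session, calls["count"], waits)
```

Since the virtual server keeps the same name and IP address after failover, the retry can target the same endpoint. Any transaction that was in flight still has to be resubmitted by the application.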
Enhancements to Failover Clustering
• SQL Server Setup Installs/Uninstalls a Cluster
• Service Packs Can Be Applied Directly to Virtual Servers
• SQL Server Supports Multiple Instances and Multiple Network Addresses
• Failover and Failback to or From Any Node in a Cluster
• SQL Server 2000 on Windows 2000 Datacenter Server Supports 4 Server Nodes in a Cluster
• All Nodes Have Local Copies of SQL Server Tools and Executables
• Rerunning the Setup Program Updates Failover Cluster Configurations
• SQL Server Service Manager or SQL Server Enterprise Manager Now Start/Stop SQL Server Services
Failover Clustering Terminology

• “Multiple Active Instances” or “Single Active Instance”
• No longer “Active/Active” or “Active/Passive”
[Screenshot: Cluster Administrator, 4 nodes]
High Availability Features in Failover Clustering (1 of 2)
Availability Feature | Failover Clustering
Standby Type | Hot
Failure Detection | Yes
Automatic Failover | Yes
Masks Disk Failure | No; Shared Disk
Masks SQL Process Failure | Yes
Masks Other Process Failure | Yes
Meta Data Support | All System and Database
Transactionally Consistent | Yes
Transactionally Current | Yes, Always Up to Date
High Availability Features in Failover Clustering (2 of 2)
Availability Feature | Failover Clustering
Perceived Downtime | 30 Seconds + DB Recovery
Transparent to Client | Yes, Reconnect to Same IP
Special Hardware Needed | Specialized Hardware from Cluster HCL
Distance Limit | 100 Miles
Complexity | More
Standby Accessible | Standby never accessible
Impact on Performance | No Impact
Impact on Backup Strategy | Must be able to back up from any node
SQL Server Failover Clustering
• Hot Standby Solution
• Best High Availability Configuration
– Redundant System
– Shared Access to the Database Files
– Recovery in Seconds
– Automatic Failure Detection
– Automatic Failover
– Minimal Client Application Awareness
• Built on Microsoft Cluster Server
Log Shipping
• Warm Standby Solution
• Applies Transaction Log From Primary Server (Primary) to Warm Standby (Secondary)
• Attributes of Log Shipping
– Warm Standby Available for Limited Read-Only Use
– All Logged Schema and Data Changes Applied
– Cannot Filter Changes for Partitioning or Subsets
• Manual Failure Detection; Manual Failover

• For Yukon (the code name for the next SQL Server release) we call this “Log Backup Shipping”
– This differentiates it from Real-time Log Shipping
Log Shipping Architecture
[Diagram: Primary Server, Secondary Server (1..n), and a Monitoring Server running the Log Shipping Monitor. “SQL Agent” scheduled jobs perform three steps:]
1. BACKUP the transaction log on the primary server (transaction-log dump)
2. COPY the log backup to the secondary server (“pulled”)
3. RESTORE the transaction log on the secondary server WITH STANDBY
Log Shipping HA Features (1 of 2)
Availability Feature | Failover Clustering | Log Backup Shipping
Standby Type | Hot | Warm
Failure Detection | Yes | No, NLB helps
Automatic Failover | Yes | No, NLB helps
Masks Disk Failure | No; Shared Disk | Yes
Masks SQL Process Failure | Yes | Yes
Masks Other Process Failure | Yes | No
Meta Data Support | All System and Database | Database Only
Transactionally Consistent | Yes | Yes
Transactionally Current | Yes, Always Up to Date | No, Since Last Log Backup
Perceived Downtime | 30 Seconds + DB Recovery | Seconds + DB Recovery Time


Log Shipping HA Features (2 of 2)
Availability Feature | Failover Clustering | Log Backup Shipping
Transparent to Client | Yes, Reconnect to Same IP | No, App must know standby
Special Hardware Needed | Specialized Hardware from Cluster HCL | No; Duplicate system needed
Distance Limit | 100 Miles | Dispersed
Complexity | More | Some
Standby Accessible | Standby never accessible | Yes, Multiple Copies, Read-only; % depends on update frequency
Impact on Performance | No Impact | Minimal – File Copy on Primary
Impact on Backup Strategy | Must be able to back up from any node | Minimal – many small backups
High Availability Uses of Log Shipping
• Shorter failover time
• If there is a high incidence of user error and a need to recover data frequently without recovering the whole database
– Allows time-delayed standbys
• 5 hours behind
• 8 hours behind
• Increase data redundancy
• Less complex hardware – no cluster HCL requirement
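A time-delayed standby works by holding back log backups until they are older than the chosen delay, which leaves a window in which a user error can be caught before it is applied. A minimal sketch of that selection step, assuming log backups are identified by their completion time; the function name, the hourly schedule, and the dates are hypothetical:

```python
from datetime import datetime, timedelta


def logs_ready_to_restore(backup_times, now, delay):
    """Return the log backups old enough to apply on a time-delayed standby.

    Backups newer than (now - delay) are held back, so the standby stays at
    least `delay` behind the primary.
    """
    cutoff = now - delay
    return [t for t in sorted(backup_times) if t <= cutoff]


# Hypothetical schedule: hourly log backups from 05:00 to 11:00, checked at noon.
now = datetime(2004, 8, 9, 12, 0)
hourly_backups = [datetime(2004, 8, 9, hour, 0) for hour in range(5, 12)]

ready = logs_ready_to_restore(hourly_backups, now, timedelta(hours=5))
for backup_time in ready:
    print("restore log backup taken at", backup_time)
```

The backups must still be applied in sequence. Only the decision of when to apply them changes.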
When to Consider Using Replication for HA
• After Considering Failover Clustering
• After Considering Log Shipping
• System and Some User Metadata Is Not Replicated
• Failure Detection and Failover Are Not Automatic
– Standby Server Is Not Identical to the Primary
• Not Guaranteed to Be Transactionally Current
– Merge Replication Is Not Transactionally Consistent
• Replication Uniquely Allows:
– Partitioning of Data on the Standby Server (however, the standby server is then not identical to the primary server)
– Offline Access to the Data Without Periodic Termination
Transactional Replication
• Warm Standby Solution
• Propagates Transactions From Primary Server (Publisher) to Warm Spare (Subscriber)
• Use Replication to Create
– A Read-Only Spare
– A Scale Out Solution
– A Partitioned Solution
• Manual Failure Detection; Manual Failover
Comparing Clustering, Log Shipping, and Transactional Replication (1 of 2)
Availability Feature | Failover Clustering | Log Backup Shipping | Transactional Replication
Standby Type | Hot | Warm | Warm
Failure Detection | Yes | No, NLB helps | No
Automatic Failover | Yes | No, NLB helps | No, NLB helps
Masks Disk Failure | No; Shared Disk | Yes | Yes
Masks SQL Process Failure | Yes | Yes | Yes
Masks Other Process Failure | Yes | No | No
Meta Data Support | All System and Database | Database Only | Object(s)
Transactionally Consistent | Yes | Yes | Transactional: Yes; Merge: No
Transactionally Current | Yes, Always Up to Date | No, Since Last Log Backup | No, Since Last Distributed Op
Perceived Downtime | 30 Seconds + DB Recovery | Seconds + DB Recovery Time | Detect, then manual fail over
Comparing Clustering, Log Shipping, and Transactional Replication (2 of 2)
Availability Feature | Failover Clustering | Log Backup Shipping | Transactional Replication
Transparent to Client | Yes, Reconnect to Same IP | No, App must know standby | No, App must know standby
Special Hardware Needed | Specialized Hardware from Cluster HCL | No; Duplicate system needed | No; Duplicate system needed
Distance Limit | 100 Miles | Dispersed | Dispersed
Complexity | More | Some | More
Standby Accessible | Standby never accessible | Yes, Multiple Copies, Read-only; % depends on update frequency | Yes, Multiple Copies, Read-only; perhaps updateable
Impact on Performance | No Impact | Minimal – File Copy on Primary | Minimal Impact
Impact on Backup Strategy | Must be able to back up from any node | Minimal – many small backups | No Impact; but must back up distribution database
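A comparison like this can be treated as data and filtered by requirement. The sketch below encodes four rows of the table as yes/no values and lists the technologies that meet every stated requirement. The dictionary keys and the `candidates` helper are hypothetical names, and the booleans flatten the table's qualifiers (for example, "No, NLB helps" becomes `False`):

```python
# Four rows of the comparison table, simplified to yes/no.
FEATURES = {
    "Failover Clustering": {
        "automatic_failover": True,
        "masks_disk_failure": False,  # shared disk
        "standby_accessible": False,
        "transparent_to_client": True,
    },
    "Log Backup Shipping": {
        "automatic_failover": False,
        "masks_disk_failure": True,
        "standby_accessible": True,  # read-only
        "transparent_to_client": False,
    },
    "Transactional Replication": {
        "automatic_failover": False,
        "masks_disk_failure": True,
        "standby_accessible": True,
        "transparent_to_client": False,
    },
}


def candidates(required):
    """Return the technologies that provide every feature in `required`."""
    return [
        name
        for name, features in FEATURES.items()
        if all(features[feature] for feature in required)
    ]


print(candidates(["automatic_failover"]))
print(candidates(["masks_disk_failure", "standby_accessible"]))
print(candidates(["automatic_failover", "masks_disk_failure"]))
```

The last query returns no single technology, which is one reason to consider combining technologies rather than choosing just one.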
Choosing Replication to Implement a Warm Standby
• Data Partitioning
• Offline Access to the Data
• Geographically Separated Warm Standby
• Costs
– Complexity
– Process
– Resources
Using Combinations of Technologies
• Log Shipping With Replication
• SQL Server Failover Clustering With Log Shipping
• Log Shipping With Network Load Balancing
Establish Operational Excellence

• Data center principles


• Change Control
• Staffing
• Disaster Recovery Plan
• Run Book
Monitoring for HA
• Two theories:
– Every counter 100% of the time
– Just what you need
• Don’t forget Profiler
• Coordinate with Event Logs, SQL Logs, IIS Logs, etc.
– Account for time differences between servers
– HA is a total solution … not just SQL
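When logs from several servers are lined up, clock differences can put events in the wrong order. A minimal sketch of merging logs onto one timeline, assuming each server's clock offset from a reference clock has already been measured; the data layout, the server names, and the sample events are hypothetical:

```python
from datetime import datetime, timedelta


def merge_logs(sources):
    """Merge per-server event lists into one timeline on a common clock.

    sources maps a server name to (clock_offset, events), where clock_offset
    is how far that server's clock runs ahead of the reference clock and
    events is a list of (timestamp, message) pairs in that server's own time.
    """
    merged = []
    for server, (clock_offset, events) in sources.items():
        for timestamp, message in events:
            merged.append((timestamp - clock_offset, server, message))
    merged.sort(key=lambda event: event[0])
    return merged


# Hypothetical example: the SQL server's clock runs 90 seconds fast.
sources = {
    "web": (timedelta(0), [(datetime(2004, 8, 9, 10, 0, 5), "timeout connecting to database")]),
    "sql": (timedelta(seconds=90), [(datetime(2004, 8, 9, 10, 1, 30), "service restarted")]),
}

timeline = merge_logs(sources)
for corrected_time, server, message in timeline:
    print(corrected_time, server, message)
```

On raw timestamps the restart appears to come 85 seconds after the timeout. After correction it comes 5 seconds before, which changes how the incident reads.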
Backup And Restore
• Last resort for HA!
• Develop a backup strategy
• Full database backups
• File/filegroup backups
• Transaction log backups
• Image of operating system disk
• Test your backups on another server
• Rotate your tapes off-site
Backup And Restore (cont’d)

• Test your recovery plans


• Pick a point in time to recover to
– Locate the tapes
– Test using the graphical interface
– Test using script only
– Test with different people on all teams
• Time the drill: how long will it take?
Designing A Disaster Recovery Plan
• One of the keys to HA
– Without this, you might as well not do HA
• Different plans:
– Site down
– Server down
– Data gone
• Document plan (keep updated) – test, test, TEST! Store results/learnings
• Off-site backup storage, including the operational manual (run book)
SQL Server 2000 High Availability Resources
• MSPress title: SQL Server 2000 High Availability
Authors: Allan Hirt with Cathan Cook, Kimberly L. Tripp, Frank McBath
ISBN: 0-7356-1920-4
• Microsoft SQL Server 2000 High Availability Series
http://www.microsoft.com/technet/prodtechnol/sql/2000/deploy/sqlhalp.mspx
High Availability In SQL Server

Questions??
