Implementing Failover Clustering


Module 11

Implementing failover clustering


Module Overview

• Overview of failover clustering
• Implementing a failover cluster
• Configuring highly available applications and services on a failover cluster
• Maintaining a failover cluster
• Implementing a stretch cluster
Lesson 1: Overview of failover clustering

• What is availability?
• Failover clustering improvements in Windows Server 2012 R2
• Failover clustering improvements in Windows Server 2016
• Failover cluster components
• What are failover and failback?
• Failover cluster networks
• Failover cluster storage
• What is quorum?
• Quorum modes in Windows Server 2016 failover clustering
• What are CSVs?
What is availability?

• Availability is a level of service expressed as a percentage of time
• Highly available services or systems are
available more than 99 percent of the time
• High-availability requirements differ based on
how availability is measured
• Planned outages typically are not included when
calculating availability
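To make the effect of the measurement choice concrete, here is a minimal Python sketch that computes availability and the yearly downtime budget for a target percentage. The function names are hypothetical, and the sketch assumes a 365-day year and that excluded planned outages are subtracted from the measured period, which is one common convention rather than a fixed definition.

```python
# Minimal sketch: availability as a percentage of measured time.
# Assumptions: a 365-day year, and excluded planned outages are removed
# from the measured period (one convention among several).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_percent(period_min, unplanned_down_min,
                         planned_down_min=0, count_planned=False):
    """Return the percentage of the measured period the service was up."""
    if count_planned:
        measured = period_min
        down = unplanned_down_min + planned_down_min
    else:
        measured = period_min - planned_down_min  # planned windows excluded
        down = unplanned_down_min
    return 100.0 * (measured - down) / measured


def downtime_budget_min(target_percent, period_min=MINUTES_PER_YEAR):
    """Return the maximum downtime, in minutes, allowed for a target."""
    return period_min * (1 - target_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}%: {downtime_budget_min(target):.1f} minutes per year")
```

At 99 percent the budget is 5,256 minutes (about 87.6 hours) per year, and each additional nine divides it by ten, which is why the way availability is measured matters so much.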
Failover clustering improvements in Windows Server 2012 R2

Significant new features of failover clustering in Windows Server 2012 R2:
• Quorum changes and dynamic witness
• Force quorum resiliency
• Tie breaker for 50% node split
• Global Update Manager mode
• Cluster node health detection
• AD DS–detached clusters
Failover clustering improvements in Windows Server 2016

Failover clustering improvements in Windows Server 2016 include:
• Cluster operating system rolling upgrades
• Storage Replica
• Azure Cloud Witness
• VM resiliency
• Site-aware failover clusters
• Workgroup and multi-domain clusters
Failover cluster components
Diagram: Node 1 and Node 2 host a service or application and connect to cluster storage through a shared bus or iSCSI connection. A dedicated network connects the failover cluster nodes, and another network connects the failover cluster and clients.
What are failover and failback?

• During failover, the clustered instance and all associated resources are moved from one node to another
• A failover occurs when:
• The node that hosts the instance becomes inactive for
any reason
• One of the resources within the instance fails
• An administrator initiates a switchover

• The Cluster service can fail the instance back to the original node after that node becomes active again
• Both planned and unplanned failovers can occur
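The failover and failback sequence above can be modeled as a small state machine. This is a conceptual Python sketch, not the Cluster service's behavior or API: the class, the node names, and the failback_allowed flag are assumptions, and the role simply moves to the first online node in an ordered preference list.

```python
# Conceptual model of failover and failback for one clustered role.
# Node names and the failback_allowed flag are illustrative assumptions;
# this is not the Windows Cluster service API.
from dataclasses import dataclass


@dataclass
class ClusteredRole:
    preferred: list               # node names, most preferred first
    owner: str = ""               # node currently hosting the role
    failback_allowed: bool = False

    def place(self, online):
        """Move the role and its resources to the first online preferred node."""
        for node in self.preferred:
            if node in online:
                self.owner = node
                return node
        self.owner = ""           # no node available: the role is offline
        return None

    def on_node_down(self, node, online):
        """Failover: only a failure of the hosting node forces a move."""
        if node == self.owner:
            self.place(online)

    def on_node_up(self, node, online):
        """Failback: return to a more preferred node only if allowed."""
        if not self.owner:
            self.place(online)    # role was offline; bring it online if possible
        elif (self.failback_allowed and node in self.preferred
              and self.preferred.index(node) < self.preferred.index(self.owner)):
            self.owner = node


online = {"Node1", "Node2"}
role = ClusteredRole(preferred=["Node1", "Node2"], failback_allowed=True)
role.place(online)                    # role starts on Node1
online.discard("Node1")
role.on_node_down("Node1", online)    # failover: role moves to Node2
online.add("Node1")
role.on_node_up("Node1", online)      # failback: role returns to Node1
print(role.owner)                     # Node1
```

With failback_allowed left at False, the role stays on Node2 after Node1 returns, which corresponds to preventing failback.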
Failover cluster networks

Network | Description
Public network | Clients use this network to connect to the clustered service
Private network | Nodes use this network to communicate with each other
Public-and-private network | Required to communicate with external storage systems

• One network can support both client and node communications
• Multiple network cards are recommended to
provide enhanced performance and redundancy
• iSCSI storage should have a dedicated network
Failover cluster storage

• Failover clusters require shared storage to provide consistent data to a virtual server after a failover
• Shared storage options include:
• SAS
• iSCSI
• Fibre Channel
• Shared .vhdx
• Scale-Out File Server

• You can also implement clustered storage spaces to achieve high availability at the storage level
What is quorum?

• In failover clusters, quorum defines the consensus that enough cluster members are available to provide services
• Quorum:
• Is based on votes in Windows Server
• Enables nodes, file shares, or a shared disk to have a vote,
depending on the quorum mode
• Enables the failover cluster to remain online when sufficient
votes are available
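The vote-based rule can be illustrated with a static majority check. This Python sketch assumes one vote per node plus one optional witness vote and ignores dynamic vote adjustment; the function name is hypothetical.

```python
# Static vote-majority model of quorum (no dynamic vote adjustment).
# One vote per node and one optional witness vote; names are hypothetical.
def has_quorum(online_node_votes, total_node_votes,
               witness_configured=False, witness_online=False):
    """Return True when more than half of all configured votes are available."""
    total = total_node_votes + (1 if witness_configured else 0)
    online = online_node_votes + (1 if witness_configured and witness_online else 0)
    return online * 2 > total  # exactly half is not a majority


print(has_quorum(2, 4))  # False: a 2-2 split has no majority
print(has_quorum(2, 4, witness_configured=True, witness_online=True))  # True: 3 of 5 votes
```

The second call shows why a witness is useful with an even number of nodes: it supplies the deciding vote when the nodes split evenly.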
Quorum modes in Windows Server 2016 failover clustering

• Use dynamic quorum mode with:
• A disk witness
• A file share witness
• The Azure Cloud Witness
• Use all other quorum modes only in specific use cases; the default and recommended best practice is to always use dynamic quorum
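The benefit of dynamic quorum shows up when nodes fail one at a time. The sketch below compares static and dynamic vote counting in that situation. It is deliberately simplified under stated assumptions: one vote per node, no witness, and no 50% tie-breaker, so it reports loss of quorum whenever a failure would leave a single node. The function name is hypothetical.

```python
# Simplified comparison of static and dynamic quorum for sequential failures.
# Assumptions: one vote per node, no witness, no 50% tie-breaker.
def survives_sequential_failures(total_nodes, failures, dynamic=True):
    """Return True if the cluster keeps quorum while nodes fail one at a time."""
    votes = total_nodes   # votes that currently count toward quorum
    up = total_nodes      # nodes still running
    for _ in range(failures):
        up -= 1
        if up * 2 <= votes:   # survivors no longer hold a majority
            return False
        if dynamic:
            votes = up        # dynamic quorum removes the departed node's vote
    return True


print(survives_sequential_failures(5, 3, dynamic=False))  # False: 2 of 5 votes
print(survives_sequential_failures(5, 3, dynamic=True))   # True: 2 of 3 votes
```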
What are CSVs?

• The benefits of cluster shared volumes (CSVs) include:
• Fewer required LUNs
• Better use of disk space
• Resources in a single logical location
• No special required hardware
• Increased resiliency

• To implement a CSV:
1. Create and format volumes on shared storage
2. Add the disks to failover cluster storage
3. Add the storage to the CSV
CSV improvements

Enhancements and new functionality for CSVs in Windows Server 2012 R2 include:
• Optimized CSV placement policies
• Increased CSV resiliency
• CSV cache allocation
• The ability to diagnose CSVs
• CSV interoperability
Lesson 2: Implementing a failover cluster

• Preparing for implementing failover clustering
• Hardware requirements for failover cluster implementation
• Network requirements for failover cluster implementation
• AD DS and infrastructure requirements for failover clusters
• Software requirements for a failover cluster implementation
• Demonstration: Validating and configuring a failover cluster
Preparing for implementing failover clustering

Use failover clustering when:


• High availability is required
• Scalability is not required
• The application is stateful
• The client automatically reconnects to the application
• The application uses IP-based protocols
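The criteria above work as a checklist. The following Python sketch encodes them so that a workload's gaps are listed explicitly; the class, field, and function names are hypothetical, and the two sample workloads are illustrations only. In the usual contrast, a stateless web tier that must scale out is typically a candidate for Network Load Balancing (NLB) instead.

```python
# Checklist sketch of when failover clustering fits a workload.
# Class, field, and function names are hypothetical.
from dataclasses import dataclass


@dataclass
class Workload:
    needs_high_availability: bool
    needs_scale_out: bool
    is_stateful: bool
    clients_reconnect: bool
    uses_ip_protocols: bool


def clustering_fit_gaps(w):
    """Return the criteria the workload does NOT meet (empty list = good fit)."""
    gaps = []
    if not w.needs_high_availability:
        gaps.append("high availability is not required")
    if w.needs_scale_out:
        gaps.append("scalability is required")
    if not w.is_stateful:
        gaps.append("application is stateless")
    if not w.clients_reconnect:
        gaps.append("clients do not reconnect automatically")
    if not w.uses_ip_protocols:
        gaps.append("application does not use IP-based protocols")
    return gaps


file_server = Workload(True, False, True, True, True)
web_farm = Workload(True, True, False, True, True)
print(clustering_fit_gaps(file_server))  # []
print(clustering_fit_gaps(web_farm))     # ['scalability is required', 'application is stateless']
```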
Hardware requirements for failover cluster implementation

The hardware requirements for a failover cluster implementation include:
• The server hardware must be certified for Windows Server
• The server nodes should all have the same configuration and contain the same or similar components
• All the tests in the Validate a Configuration Wizard must pass
Network requirements for failover cluster implementation

The network requirements for a failover cluster implementation include:
• Each server should be connected to multiple networks for communication redundancy, or to a single network with redundant hardware, to remove single points of failure
• The network adapters should be identical and have the same IP protocol versions, speed, duplex, and flow control capabilities
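Before running the Validate a Configuration Wizard, it can help to compare adapter settings across nodes yourself. The sketch below assumes you have already collected an inventory into plain Python dictionaries; the data layout, values, and function names are hypothetical. It reports settings that differ between nodes and flags nodes attached to only one network for review, since a single network with redundant hardware is also acceptable.

```python
# Compare network adapter settings across cluster nodes.
# The inventory layout and the values below are hypothetical sample data.
MUST_MATCH = ("ip_versions", "speed_gbps", "duplex", "flow_control")


def adapter_mismatches(inventory):
    """Return {setting: {node: values}} for settings that differ between nodes."""
    problems = {}
    for key in MUST_MATCH:
        per_node = {node: frozenset(a[key] for a in adapters)
                    for node, adapters in inventory.items()}
        if len(set(per_node.values())) > 1:
            problems[key] = {node: sorted(v) for node, v in per_node.items()}
    return problems


def single_network_nodes(inventory):
    """Return nodes connected to fewer than two distinct networks."""
    return sorted(node for node, adapters in inventory.items()
                  if len({a["network"] for a in adapters}) < 2)


inventory = {
    "LON-SVR2": [
        {"network": "Public", "ip_versions": ("IPv4",), "speed_gbps": 10,
         "duplex": "full", "flow_control": True},
        {"network": "Private", "ip_versions": ("IPv4",), "speed_gbps": 10,
         "duplex": "full", "flow_control": True},
    ],
    "LON-SVR3": [
        {"network": "Public", "ip_versions": ("IPv4",), "speed_gbps": 1,
         "duplex": "full", "flow_control": True},
    ],
}
print(adapter_mismatches(inventory))    # {'speed_gbps': {'LON-SVR2': [10], 'LON-SVR3': [1]}}
print(single_network_nodes(inventory))  # ['LON-SVR3']
```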
AD DS and infrastructure requirements for failover clusters
• The infrastructure requirements for a failover cluster implementation include the following:
• The nodes in the cluster typically use DNS for name resolution
• All servers in the cluster should be in the same Active Directory domain for Windows Server 2012; however, this is not required for Windows Server 2016, which supports workgroup and multi-domain clusters
• The user account that creates the cluster must have
administrator rights and permissions on all servers, and
the Create Computer Objects permission in the domain
• Failover cluster infrastructure recommendations include:
• Do not install the AD DS role on any of the cluster
nodes
Software requirements for a failover cluster implementation

The software best practices for a failover cluster implementation include:
• All nodes should have the same edition of
Windows Server 2016, which can be any of the
following:
• Windows Server 2016 Standard, Desktop Experience,
Server Core, or Nano Server installation
• Windows Server 2016 Datacenter, Desktop Experience,
Server Core, or Nano Server installation
• All nodes should have the same service pack and
updates
Demonstration: Validating and configuring a failover cluster

In this demonstration, you will see how to validate and configure a failover cluster
Lesson 3: Configuring highly available applications and services on a failover cluster

• Identifying cluster resources and services
• Clustering server roles process
• Demonstration: Clustering a file server role
• Failover cluster management tasks
• Managing cluster nodes
• Configuring application failover settings
Identifying cluster resources and services

• Clustered services:
• Are services or applications that are made highly
available by installing them on a failover cluster
• Are active on one node but can be moved to another
node
• Resources:
• Are the components that make up a clustered service
• Are moved to another node when one node fails
• Can run on only one node at a time
• Include components such as shared disks, names, and
IP addresses
Clustering server roles process

1. Install the Failover Clustering feature


2. Verify the configuration, and create a cluster
3. Install the role on all cluster nodes by using
Server Manager
4. Create a clustered application by using the Failover Cluster Manager console
5. Configure the application
6. Test the failover
Demonstration: Clustering a file server role

In this demonstration, you will see how to cluster a file server role
Failover cluster management tasks

Common management tasks include:


• Managing cluster nodes
• Managing cluster networks
• Managing permissions
• Configuring cluster quorum settings
• Migrating services and applications to a cluster
• Configuring new services and applications
• Removing clusters
• Upgrading cluster nodes to new operating system
versions
Managing cluster nodes

• To manage cluster nodes, you can:
• Add nodes after you create a cluster
• Pause nodes, which prevents resources from running on that node
• Evict nodes from a cluster, which removes the node from the cluster configuration
• All of these actions are available in the Actions pane of the Failover Cluster Manager console
Configuring application failover settings

• Considerations for using preferred owners:
• You set preferred owners on the clustered role
• You can set multiple preferred owners in an ordered list
• Setting preferred owners gives control over:
• The order in which a role selects a node to run on
• The roles that can run on the same nodes

• Options to modify failover and failback settings:


• Setting the number of times the Cluster service restarts a clustered
role in a set period
• Setting or preventing failback of the clustered role to the
preferred node when it becomes available
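The two failover and failback options can be modeled as follows. This Python sketch assumes a sliding time window for counting failures and an hour-based failback window that may wrap past midnight, with None standing for "prevent failback". The class, method, and parameter names are hypothetical and are not taken from Failover Cluster Manager.

```python
# Sketch of two per-role settings: a maximum number of failures within a
# period, and a time window during which failback is allowed.
# Class, method, and parameter names are hypothetical.
from collections import deque


class FailoverPolicy:
    def __init__(self, max_failures, period_hours, failback_window=None):
        self.max_failures = max_failures
        self.period_s = period_hours * 3600
        self.failback_window = failback_window  # (start_hour, end_hour) or None
        self._failures = deque()                # timestamps (seconds) of failures

    def record_failure(self, now_s):
        """Record a failure; return True if the role should be restarted or moved."""
        self._failures.append(now_s)
        # Forget failures that fall outside the sliding period.
        while now_s - self._failures[0] > self.period_s:
            self._failures.popleft()
        return len(self._failures) <= self.max_failures  # else leave it failed

    def failback_allowed(self, hour):
        """Return True if failback to the preferred node may happen at this hour."""
        if self.failback_window is None:
            return False                        # failback prevented
        start, end = self.failback_window
        if start <= end:
            return start <= hour < end
        return hour >= start or hour < end      # window wraps past midnight


policy = FailoverPolicy(max_failures=2, period_hours=6, failback_window=(22, 2))
print([policy.record_failure(t) for t in (0, 60, 120)])  # [True, True, False]
print(policy.failback_allowed(23), policy.failback_allowed(12))  # True False
```

Restricting failback to an off-hours window avoids a second client interruption during business hours when the preferred node returns.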
Lesson 4: Maintaining a failover cluster

• Monitoring failover clusters
• Backing up and restoring a failover cluster configuration
• Troubleshooting failover clusters
• What is CAU?
• How CAU works
• Demonstration: Configuring CAU
Monitoring failover clusters

Tools you can use to monitor clusters include:


• The Event Viewer
• Tracerpt.exe
• MHTML-formatted cluster configuration reports
• The Performance and Reliability Monitor snap-in
Backing up and restoring a failover cluster configuration
• When backing up failover clusters, keep in mind that:
• Windows Server Backup is a Windows Server 2016
feature
• Non-Microsoft tools are available to perform backup
and restore operations
• You must perform system-state backups
• A nonauthoritative restore operation completely restores
a single node in the cluster
• An authoritative restore operation restores the entire
cluster configuration to a certain point
Troubleshooting failover clusters

Failover cluster troubleshooting techniques include:
• Using the Validate a Configuration Wizard
• Reviewing the events in logs (cluster, hardware, storage)
• Defining a process for troubleshooting failover clusters
• Reviewing the storage configuration
• Checking for group and resource failures
What is CAU?

Cluster-Aware Updating (CAU) is an automated feature in Windows Server 2016 that:
• Updates nodes in a cluster
• Has these benefits:
• Updating is automatic
• Updating can be scheduled
• Updating causes minimal or no downtime
How CAU works

CAU can work in two modes:


• Remote-updating mode:
• You configure a separate computer as an orchestrator
• You must install the failover clustering administrative
tools
• The CAU orchestrator must not be a cluster member
• Self-updating mode:
• You configure the CAU clustered role as a workload
• No dedicated orchestrator exists
• The cluster updates itself
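In either mode, the updating run follows the same pattern: one node at a time is drained of its roles, updated, and returned to service, so the clustered roles stay available. The following Python sketch models that loop; the function, its arguments, and the node names are hypothetical, it always drains to the first other node rather than choosing a best target, and the install_updates callable merely stands in for patching and restarting.

```python
# Simplified model of one CAU updating run. The function and its arguments
# are hypothetical; this is not the CAU PowerShell or COM interface.
def cau_updating_run(nodes, roles, install_updates):
    """Update nodes one at a time, draining each node's roles first.

    nodes: ordered list of node names (at least two)
    roles: dict mapping role name -> owning node; modified in place
    install_updates: callable(node) standing in for patching and restart
    """
    log = []
    for node in nodes:
        target = next(n for n in nodes if n != node)
        # 1. Drain: move every role off the node about to be updated.
        for role, owner in list(roles.items()):
            if owner == node:
                roles[role] = target
                log.append(f"moved {role}: {node} -> {target}")
        # Invariant: no role is hosted on a node while it is being updated.
        assert node not in roles.values()
        # 2. Update (and, in practice, restart) the drained node.
        install_updates(node)
        log.append(f"updated {node}")
        # 3. Resume: the node rejoins and can host roles again.
        log.append(f"resumed {node}")
    return log


roles = {"FileServer": "LON-SVR2"}
patched = []
for line in cau_updating_run(["LON-SVR2", "LON-SVR3"], roles, patched.append):
    print(line)
```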
Demonstration: Configuring CAU

In this demonstration, you will see how to configure CAU
Lesson 5: Implementing a stretch cluster

• What is a stretch cluster?
• Synchronous and asynchronous replication
• Site-aware failover clusters
• Choosing quorum witness
• Considerations for deploying a stretch cluster
• Considerations for stretch cluster failover and failback
What is a stretch cluster?

A stretch cluster is a cluster that has been extended so that different nodes in the same cluster reside in separate physical locations

Diagram: cluster nodes in Site A and Site B, with a SAN in each site
Synchronous and asynchronous replication
• In synchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully to both storage locations
• In asynchronous replication, the host receives a write complete
response from the primary storage after the data is written
successfully on the primary storage
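Diagram: the host sends a write request to the primary storage in Site A, the data is replicated to the secondary storage in Site B, and the primary storage returns a write complete response to the host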
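The difference between the two modes is when "write complete" is returned, which determines both write latency and what can be lost if the primary site fails. This Python sketch models that; the class, method, and function names are hypothetical and are not the Storage Replica API, and the latency function assumes the primary and secondary writes are issued in parallel.

```python
# Conceptual model of when "write complete" is returned and what is at risk.
# Class and method names are hypothetical, not the Storage Replica API.
class ReplicatedVolume:
    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []      # blocks on primary storage (Site A)
        self.secondary = []    # blocks on secondary storage (Site B)
        self.pending = []      # acknowledged but not yet replicated (async only)

    def write(self, block):
        self.primary.append(block)
        if self.synchronous:
            self.secondary.append(block)   # both copies exist before the ack
        else:
            self.pending.append(block)     # ack after the primary copy only
        return "write complete"

    def replicate(self):
        """Ship queued blocks to the secondary site (asynchronous mode)."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lose_primary_site(self):
        """Return acknowledged writes that would be lost if Site A failed now."""
        return list(self.pending)


def ack_latency_ms(primary_ms, secondary_ms, link_rtt_ms, synchronous):
    """Latency seen by the host, assuming both writes are issued in parallel."""
    if synchronous:
        return max(primary_ms, link_rtt_ms + secondary_ms)
    return primary_ms


sync_vol, async_vol = ReplicatedVolume(True), ReplicatedVolume(False)
for vol in (sync_vol, async_vol):
    vol.write("A")
    vol.replicate()
    vol.write("B")
print(sync_vol.lose_primary_site())   # []
print(async_vol.lose_primary_site())  # ['B']
```

Synchronous replication adds the inter-site round trip to every write but loses no acknowledged data; asynchronous replication keeps write latency local at the cost of possibly losing writes still queued for the secondary site.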
Site-aware failover clusters

Site-aware failover clusters provide:

• Failover affinity
• Cross-site heartbeating
• Preferred site configuration
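Failover affinity means that a failed-over role is placed on another node in the same site when one is available, and moves across sites only otherwise. The selection rule can be sketched in Python as follows; the function name, the node and site names, and the alphabetical tie-break among candidates are assumptions for illustration, not the Cluster service's actual placement logic.

```python
# Sketch of failover affinity: prefer a surviving node in the same site,
# then the preferred site (if configured), then any surviving node.
# Names and the alphabetical tie-break are hypothetical.
def choose_failover_node(failed_node, node_sites, online, preferred_site=None):
    candidates = sorted(n for n in online if n != failed_node)
    if not candidates:
        return None
    home_site = node_sites[failed_node]
    for site in (home_site, preferred_site):
        if site is None:
            continue
        in_site = [n for n in candidates if node_sites[n] == site]
        if in_site:
            return in_site[0]
    return candidates[0]


node_sites = {"A1": "SiteA", "A2": "SiteA", "B1": "SiteB", "B2": "SiteB"}
print(choose_failover_node("A1", node_sites, {"A2", "B1", "B2"}))  # A2 (same site)
print(choose_failover_node("A1", node_sites, {"B1", "B2"}))        # B1 (cross-site)
```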
Choosing quorum witness

• File share witness:


• Requires three or more datacenter locations
• Is available in Windows Server 2012 R2 and
Windows Server 2016
• Azure Cloud Witness:
• Requires two datacenter locations
• Requires an Internet connection for all nodes
• Is available only in Windows Server 2016

• No witness:
• Is not recommended
• Is used for manual failover (disaster recovery site)
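The witness guidance above reduces to a few rules of thumb. This Python sketch encodes only those rules as stated on the slide; the function name and the os_year encoding (2012 for Windows Server 2012 R2, 2016 for Windows Server 2016) are hypothetical.

```python
# Rules of thumb from the slide, encoded as a hypothetical helper.
# os_year: 2012 for Windows Server 2012 R2, 2016 for Windows Server 2016.
def choose_quorum_witness(datacenter_locations, all_nodes_have_internet, os_year):
    if datacenter_locations >= 3:
        return "File share witness (hosted in the third location)"
    if datacenter_locations == 2 and all_nodes_have_internet and os_year >= 2016:
        return "Azure Cloud Witness"
    # Remaining case is not recommended: failover to the DR site is manual.
    return "No witness"


print(choose_quorum_witness(2, True, 2016))  # Azure Cloud Witness
```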
Considerations for deploying a stretch cluster

When deploying stretch clusters:


• Ensure that the business requirements are met
• Use storage replication among sites:
• Use a hardware vendor's replication solution (Windows Server 2012 R2 or earlier)
• Use Storage Replica (Windows Server 2016)
• Choose the correct quorum witness to properly
maintain functionality in the event of failures
• Choose the correct storage replication solution to meet
these needs
Considerations for stretch cluster failover and failback

When implementing stretch clusters in disaster recovery scenarios, consider the following:
• Failover time
• The services for failover
• Quorum maintenance
• The storage connection
• Published services and name resolution
• Client connectivity
• The failback procedure
Lab: Implementing failover clustering

• Exercise 1: Configuring iSCSI storage
• Exercise 2: Configuring a failover cluster
• Exercise 3: Deploying and configuring a highly available file server
• Exercise 4: Validating the deployment of the highly available file server
• Exercise 5: Configuring CAU on the failover cluster
Logon Information
Virtual machines: 20743B-LON-DC1
20743B-LON-SVR1
20743B-LON-SVR2
20743B-LON-SVR3
MT17B-WS2016-NAT
User name: Adatum\Administrator
Password: Pa55w.rd

Estimated Time: 60 minutes


Lab Scenario

As the business of A. Datum Corporation grows, it is becoming increasingly important that many of the applications and services on the network are always available. A. Datum Corporation has many services and applications that must be available to internal and external users who work in different time zones around the world. Many of these applications cannot be made highly available by using Network Load Balancing (NLB). Therefore, you should use a different technology to make these applications highly available.
As one of the senior network administrators at A. Datum
Corporation, you are responsible for implementing failover
clustering on the servers running Windows Server 2016 to
provide high availability for network services and applications.
You are also responsible for planning the failover cluster
configuration and deploying applications and services on the
failover cluster.
Lab Review

• What information do you need for planning a failover cluster implementation?
• After running the Validate a Configuration Wizard, how can you resolve the network communication single point of failure?
• In which situations might it be important to enable failback for a clustered application during a specific time?
Module Review and Takeaways

• Review Questions
• Real-world Issues and Scenarios
• Tools
• Best Practice
• Common Issues and Troubleshooting Tips
