Oracle Linux 8: Setting Up High Availability Clustering
F23591-13
March 2024
Preface
Oracle Linux 8: Setting Up High Availability Clustering describes how to install and configure
high availability clustering in Oracle Linux using Corosync and Pacemaker, which are tools
that enable you to achieve high availability for applications and services that are running on
Oracle Linux.
Documentation License
The content in this document is licensed under the Creative Commons Attribution–Share
Alike 4.0 (CC-BY-SA) license. In accordance with CC-BY-SA, if you distribute this content or
an adaptation of it, you must provide attribution to Oracle and retain the original copyright
notices.
Conventions
The following text conventions are used in this document:
boldface: Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic: Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace: Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.
Documentation Accessibility
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility
Program website at https://www.oracle.com/corporate/accessibility/.
1
About High Availability Clustering
This chapter describes how to set up and configure the Pacemaker and Corosync
technologies to create a high availability (HA) cluster that delivers continuous access to
services that are running across multiple nodes.
High availability services in Oracle Linux consist of several open source packages, including
the Corosync and Pacemaker features. These tools enable you to achieve high availability for
applications and services that are running on Oracle Linux. You can download Corosync,
Pacemaker, and any dependencies and related packages from the Unbreakable Linux
Network (ULN) at https://linux.oracle.com or the Oracle Linux yum server at https://yum.oracle.com.
Corosync is an open source cluster engine that includes an API to implement several high
availability features, including an availability manager that can restart a process when it fails,
a configuration and statistics database, and a quorum system that can notify applications
when quorum is achieved or lost.
Corosync is installed with Pacemaker, which is an open source high availability cluster
resource manager that's responsible for managing the life cycle of software that's deployed
on a cluster. Pacemaker also provides high availability services, which are achieved by
detecting and recovering from node and resource-level failures by using the API that's
provided by the cluster engine.
Pacemaker also ships with the Pacemaker Command Shell (pcs). You can use the pcs
command to access and configure the cluster and its resources. The pcs daemon runs as a
service on each node in the cluster, making it possible to synchronize configuration changes
across all of the nodes in the cluster.
Oracle provides support for Corosync and Pacemaker that's used for an active-passive 2-
node (1:1) cluster configuration on Oracle Linux 8. Note that support for clustering services
doesn't imply support for Oracle products that are clustered by using these services.
Oracle also provides Oracle Clusterware for high availability clustering with Oracle Database.
You can find more information at https://www.oracle.com/database/technologies/rac/clusterware.html.
2
Installing and Configuring Pacemaker and
Corosync
This chapter describes how to set up and configure the Pacemaker and Corosync features to
create a high availability (HA) cluster that delivers continuous access to services running
across multiple nodes.
Installing and Enabling the Pacemaker and Corosync Service
Use the dnf config-manager tool to enable the following yum repositories:
• ol8_baseos_latest
• ol8_addons
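For example, you can typically enable both repositories with the following command:
sudo dnf config-manager --enable ol8_baseos_latest ol8_addons
You can then install the required packages on each node; a typical package set, assuming the standard Oracle Linux package names, is:
sudo dnf install -y pcs pacemaker resource-agents fence-agents-all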
If you are running firewalld, add the high-availability service on each of the nodes so that the service components can communicate across the network. As shown in the following example, this step typically enables the following ports: TCP port 2224 (used by the pcs daemon), port 3121 (for Pacemaker Remote nodes), port 21064 (for DLM resources), and UDP ports 5405 (for Corosync clustering) and 5404 (for Corosync multicast, if configured):
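For example, you might run the following commands on each node to add the service to both the permanent and runtime firewall configuration:
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --add-service=high-availability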
To use the pcs command to configure and manage your cluster, you must set a
password on each node for the hacluster user.
Tip:
It is helpful if you set the same password for this user as the password you
set for the user on each node.
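For example, run the following command on each node and enter the chosen password when prompted:
sudo passwd hacluster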
Note that to use the pcs command, the pcsd service must be running on each of the
nodes in the cluster. You can set this service to run and to start at boot by running the
following command:
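For example, the following command typically starts the service and enables it at boot in one step:
sudo systemctl enable --now pcsd.service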
Note:
When running High Availability Clustering in the cloud, please refer to the following
documents:
• Create a High Availability Cluster on Oracle Cloud Infrastructure (OCI)
• Create a High Availability Cluster For Oracle Linux on Azure
3
Configuring an Initial Cluster and Service
This chapter provides an example, along with step-by-step instructions on configuring an
initial cluster across two nodes that are hosted on systems with the resolvable host names
node1 and node2. Each system is installed and configured by using the instructions that are
provided in Installing and Configuring Pacemaker and Corosync.
The cluster is configured to run a service, Dummy, that is included in the resource-agents
package. You should have installed this package along with the pacemaker packages. This
tool simply keeps track of whether the service is or is not running. Pacemaker is configured
with an interval parameter that determines how long it should wait between checks to
determine whether the Dummy process has failed.
The Dummy process is manually stopped outside of the Pacemaker tool to simulate a failure,
which is used to demonstrate how the process is restarted automatically on an alternate
node.
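1. Authenticate the pcs cluster configuration tool for the hacluster user on each node. For example, a typical way to do this is to run the following command on one of the nodes:
sudo pcs host auth node1 node2 -u hacluster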
Replace node1 and node2 with the resolvable hostnames of the nodes that will form part
of the cluster.
Alternately, if the node names are not resolvable, specify the IP addresses where the
nodes can be accessed, as shown in the following example:
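A command of the following form might be used:
sudo pcs host auth node1 addr=192.0.2.1 node2 addr=192.0.2.2 -u hacluster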
Replace 192.0.2.1 and 192.0.2.2 with the IP addresses of each of the respective hosts in
the cluster.
The tool prompts you to provide a password for the hacluster user. Provide the
password that you set for this user when you installed and configured the Pacemaker
software on each node.
2. Create the cluster by using the pcs cluster setup command. You must specify a
name for the cluster and the node names and IP addresses for each node in the cluster.
For example, run the following command:
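A typical invocation looks similar to the following:
sudo pcs cluster setup pacemaker1 node1 addr=192.0.2.1 node2 addr=192.0.2.2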
Replace pacemaker1 with an appropriate name for the cluster. Replace node1 and
node2 with the resolvable hostnames of the nodes in the cluster. Replace
192.0.2.1 and 192.0.2.2 with the IP addresses of each of the respective hosts in
the cluster.
Note that if you used the addr option to specify the IP addresses when you authenticated the nodes, you do not need to specify them again when running the pcs cluster setup command.
The cluster setup process destroys any existing cluster configuration on the
specified nodes and creates a configuration file for the Corosync service that is
copied to each of the nodes within the cluster.
You can, optionally, use the --start option when running the pcs cluster
setup command to automatically start the cluster once it is created.
3. If you have not already started the cluster as part of the cluster setup command,
start the cluster on all of the nodes. To start the cluster manually, use the pcs
command:
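A typical command is:
sudo pcs cluster start --all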
Starting the pacemaker service from systemd is another way to start the cluster on
all nodes, for example:
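For example, you might run the following on each node (corosync is typically started as a dependency of the pacemaker service):
sudo systemctl start pacemaker.service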
4. Optionally, you can enable these services to start at boot time so that if a node
reboots, it automatically rejoins the cluster, for example:
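A typical command is:
sudo pcs cluster enable --all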
Alternately you can enable the pacemaker service from systemd on all nodes, for
example:
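For example, on each node you might run the following command (you might also need to enable corosync.service if it is not pulled in as a dependency):
sudo systemctl enable pacemaker.service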
Note:
Some users prefer not to enable these services so that a node failure
resulting in a full system reboot can be properly debugged before it
rejoins the cluster.
1. Disable the fencing feature.
Fencing is an advanced feature that helps protect your data from being corrupted by nodes that might be failing or are unavailable. Pacemaker uses the term stonith (shoot the other node in the head) to describe fencing options. This configuration depends on particular hardware and a deeper understanding of the fencing process. For this reason, it is recommended that you disable the fencing feature for this initial cluster configuration.
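For example, fencing can typically be disabled by setting the stonith-enabled cluster property to false on any one of the nodes:
sudo pcs property set stonith-enabled=false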
2. Optionally, configure the cluster to ignore the quorum state by running the following
command:
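A typical command is:
sudo pcs property set no-quorum-policy=ignore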
Because this example uses a two-node cluster, disabling the no-quorum policy makes the
most sense, as quorum technically requires a minimum of three nodes to be a viable
configuration. Quorum is only achieved when more than half of the nodes agree on the
status of the cluster.
In the current release of Corosync, this issue is treated specially for two-node clusters,
where the quorum value is artificially set to 1 so that the primary node is always
considered in quorum. In the case where a network outage results in both nodes going
offline for a period, the nodes race to fence each other and the first to succeed wins
quorum. The fencing agent can usually be configured to give one node priority so that it is
more likely to win quorum if this is preferred.
3. Configure a migration policy by running the following command:
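Depending on the pcs release, a typical command is:
sudo pcs resource defaults update migration-threshold=1
On older pcs releases, the equivalent form is sudo pcs resource defaults migration-threshold=1.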
Running this command configures the cluster to move the service to a new node after a
single failure.
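1. Configure the service as a cluster resource. For example, a command similar to the following creates a resource named dummy_service for the Dummy agent with a two-minute monitoring interval:
sudo pcs resource create dummy_service ocf:pacemaker:Dummy op monitor interval=120s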
In the previous example, dummy_service is the name that is provided for the service for this resource.
To invoke the Dummy resource agent, a notation (ocf:pacemaker:Dummy) is used
to specify that it conforms to the OCF standard, that it runs in the pacemaker
namespace, and that the Dummy script is used. If you were configuring a
heartbeat monitor service for an Oracle Database, you might use the
ocf:heartbeat:oracle resource agent.
The resource is configured to use the monitor operation in the agent and an
interval is set to check the health of the service. In this example, the interval is set
to 120s to give the service sufficient time to fail while you're demonstrating failover.
By default, this interval is typically set to 20 seconds, but it can be modified
depending on the type of service and the particular environment.
When you create a service, the cluster starts the resource on a node by using the
resource agent's start command.
2. View the resource start and run status, for example:
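For example, you might run the pcs status command; the output includes a summary similar to the following fragment:
sudo pcs status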
2 nodes configured
1 resource configured
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
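3. Simulate a failure by stopping the Dummy service directly, outside of cluster control. For example, a typical way to do this is to run the crm_resource command on the node where the service is running:
sudo crm_resource --resource dummy_service --force-stop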
Running the crm_resource command ensures that the cluster is unaware that
the service has been manually stopped.
4. Run the crm_mon command in interactive mode so that you can wait until a node fails, to
view the Failed Actions message, for example:
sudo crm_mon
Stack: corosync
Current DC: node1 (version 2.1.2-4.0.1.el8_6.2-ada5c3b36e2) - partition
with quorum
Last updated: Wed Jul 13 05:00:27 2022
Last change: Wed Jul 13 04:56:11 2022 by root via cibadmin on node1
2 nodes configured
1 resource configured
Active resources:
You can see the service restart on the alternate node. Note that the monitor interval is set to 120 seconds, so you might need to wait up to the full interval before you see notification that a node has gone offline.
Tip:
You can use the Ctrl-C key combination to exit out of crm_mon at any point.
5. Reboot the node where the service is running to determine whether failover also occurs
in the case of node failure.
Note that if you didn't enable the corosync and pacemaker services to start on boot, you
might need to manually start the services on the node that you rebooted by running the
following command:
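For example, on the rebooted node you might run:
sudo pcs cluster start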
4
Configuring Fencing (stonith)
This chapter describes how to configure fencing (stonith).
After configuring stonith, run the following commands to check your configuration and
ensure that it is set up correctly:
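For example, the following commands display the configured fencing resources and validate the live cluster configuration:
sudo pcs stonith config
sudo crm_verify --live-check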
To check the status of your stonith configuration, run the following command:
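A typical command is:
sudo pcs stonith status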
Fencing Configuration Examples
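IPMI LAN Fencing
For example, IPMI LAN fencing for a two-node cluster might be configured with commands similar to the following, where the stonith resource names ipmilan_n1_fencing and ipmilan_n2_fencing are arbitrary and the parameter names follow the fence_ipmilan agent:
# resource names below are illustrative examples
sudo pcs stonith create ipmilan_n1_fencing fence_ipmilan pcmk_host_list=node1 \
  ipaddr=203.0.113.1 login=root passwd=password lanplus=1 delay=5 op monitor interval=60s
sudo pcs stonith create ipmilan_n2_fencing fence_ipmilan pcmk_host_list=node2 \
  ipaddr=203.0.113.2 login=root passwd=password lanplus=1 op monitor interval=60s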
In the example, node1 is a host that has an IPMI LAN interface configured on the IP address 203.0.113.1. The host named node2 has an IPMI LAN interface that is configured on the IP address 203.0.113.2. The root user password for the IPMI login on both systems is specified in this example as password. In each instance, replace these configuration variables with the appropriate values for your particular environment.
Note that the delay option should be set on only one node. This setting ensures that, in the rare case of a fence race, only one node is killed and the other continues to run. Without this option set, it is possible that both nodes assume that they are the only surviving node and then simultaneously reset each other.
Caution:
The IPMI LAN agent exposes the login credentials of the IPMI subsystem in
plain text. Your security policy should ensure that it is acceptable for users
with access to the Pacemaker configuration and tools to also have access to
these credentials and the underlying subsystems that are involved.
SCSI Fencing
The SCSI Fencing agent is used to provide storage-level fencing. This configuration
protects storage resources from being written to by two nodes simultaneously by using
SCSI-3 PR (Persistent Reservation). Used in conjunction with a watchdog service, a
node can be reset automatically by using stonith when it attempts to access the SCSI
resource without a reservation.
To configure an environment in this way:
1. Install the watchdog service on both nodes and then copy the provided
fence_scsi_check script to the watchdog configuration before enabling the service, as
shown in the following example:
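For example, on each node you might run commands similar to the following (the fence_scsi_check script is typically provided by the fence agent packages under /usr/share/cluster):
sudo dnf install -y watchdog fence-agents-scsi
sudo cp /usr/share/cluster/fence_scsi_check /etc/watchdog.d/
sudo systemctl enable --now watchdog
2. Enable the iscsid service on both nodes so that the shared iSCSI storage can be accessed, for example:
sudo dnf install -y iscsi-initiator-utils
sudo systemctl enable --now iscsid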
3. After both nodes are configured with the watchdog service and the iscsid service, you
can configure the fence_scsi fencing agent on one of the cluster nodes to monitor a
shared storage device, such as an iSCSI target, for example:
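A command similar to the following might be used, where scsi_fencing is an arbitrary name for the stonith resource:
sudo pcs stonith create scsi_fencing fence_scsi pcmk_host_list="node1 node2" \
  devices=/dev/sdb meta provides=unfencing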
In the example, node1 and node2 represent the hostnames of the nodes in the cluster
and /dev/sdb is the shared storage device. Replace these variables with the appropriate
values for your particular environment.
SBD Fencing
The Storage Based Death (SBD) daemon can run on a system and monitor shared storage.
The SBD daemon can use a messaging system to track cluster health. SBD can also trigger
a reset if the appropriate fencing agent determines that stonith should be implemented.
Note:
SBD Fencing is the method used with Oracle Linux HA clusters running on Oracle
Cloud Infrastructure, as documented in Create a High Availability Cluster on Oracle
Cloud Infrastructure (OCI).
Note that the sbd systemd service is automatically started and stopped as a dependency of the pacemaker service, so you do not need to run this service independently. Attempting to start or stop the sbd systemd service fails and returns an error indicating that it is controlled as a dependency service.
4. Edit the /etc/sysconfig/sbd file and set the SBD_DEVICE parameter to identify
the shared storage device. For example, if your shared storage device is available
on /dev/sdc, make sure the file contains the following line:
SBD_DEVICE="/dev/sdc"
5. On one of the nodes, create the SBD messaging layout on the shared storage
device and confirm that it is in place. For example, to set up and verify messaging
on the shared storage device at /dev/sdc, run the following commands:
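For example, you might run:
sudo sbd -d /dev/sdc create
sudo sbd -d /dev/sdc list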
6. Finally, start the cluster and configure the fence_sbd fencing agent for the shared
storage device. For example, to configure the shared storage device, /dev/sdc, run
the following commands on one of the nodes:
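Commands similar to the following might be used, where sbd_fencing is an arbitrary name for the stonith resource:
sudo pcs cluster start --all
sudo pcs stonith create sbd_fencing fence_sbd devices=/dev/sdc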
IF-MIB Fencing
IF-MIB fencing uses SNMP to access the IF-MIB on an Ethernet network switch and to shut down the port on the switch, which effectively takes a host offline. This configuration leaves the host running, while disconnecting it from the
network. Bear in mind that any FibreChannel or InfiniBand connections could remain
intact, even after the Ethernet connection has been stopped, which means that any
data made available on these connections could still be at risk. Thus, consider
configuring this fencing method as a fallback fencing mechanism. See Configuring
Fencing Levels for more information about how to use multiple fencing agents in
combination to maximize stonith success.
2. On one of the nodes in the cluster, configure the fence_ifmib fencing agent for each
node in the environment, as shown in the following example:
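Commands similar to the following might be used, where the resource names are arbitrary and community is the SNMP community string that is configured on the switch:
# resource names and community string below are illustrative examples
sudo pcs stonith create ifmib_n1_fencing fence_ifmib pcmk_host_list=node1 \
  ipaddr=203.0.113.10 community=private port=1 delay=5 op monitor interval=60s
sudo pcs stonith create ifmib_n2_fencing fence_ifmib pcmk_host_list=node2 \
  ipaddr=203.0.113.10 community=private port=2 op monitor interval=60s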
In the example, the SNMP IF-MIB switch is accessible at the IP address 203.0.113.10;
the node1 host is connected to port 1 on the switch, and the node2 host is connected to
port 2 on the switch. Replace these variables with the appropriate values for the
particular environment.
Note:
For systems hosted on Azure, clustering with Pacemaker and Corosync is only
available for Azure x86 VMs.
2. On the node where you have set up the pcs cluster, run the following command once for each node in your cluster:
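For example, assuming that the Azure VM uses a managed identity (msi) for authentication, a command of the following form might be used for each node, where node1_azure_fencing is an arbitrary resource name and the resource group, subscription ID, and VM name are placeholders:
# all values below are illustrative placeholders
sudo pcs stonith create node1_azure_fencing fence_azure_arm msi=true \
  resourceGroup=my-resource-group subscriptionId=my-subscription-id \
  pcmk_host_map="node1:node1-azure-vm-name" op monitor interval=60s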
Note:
The option pcmk_host_map is only required if the hostnames and the
Azure VM names are not identical.
5
Working With Quorum Devices
A quorum device acts as a third-party arbitrator in the event where standard quorum rules
might not adequately cater for node failure. A quorum device is typically used where there
may be an even number of nodes in a cluster. For example, in a cluster that contains two nodes, failure of the nodes to communicate can result in a split-brain issue where both nodes
function as primary at the same time, which results in possible data corruption. By using a
quorum device, quorum arbitration can be achieved and a selected node survives.
A quorum device is a service that ideally runs on a separate physical network to the cluster
itself. It should run on a system that's not a node in the cluster. Although the quorum device
can service multiple clusters at the same time, it should be the only quorum device for each
cluster that it serves. Each node in the cluster is configured for the quorum device. The
quorum device is installed and run as a network bound service on a system outside of the
cluster network.
3. If you're running a firewall on the quorum device service host, you must open the firewall
ports to allow the host to communicate with the cluster. For example, run:
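For example, the high-availability firewalld service includes the port that the quorum device service uses, so you might run:
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --add-service=high-availability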
4. On the quorum device service host, enable and start the quorum device service by
setting the Pacemaker configuration for the node to use the net model. Run:
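A typical command is:
sudo pcs qdevice setup model net --enable --start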
This command creates the quorum device configuration on the host, which is named qdev in this example. It sets the model to net and enables and starts the service. The command triggers the corosync-qnetd daemon to load and run at boot.
5. On each of the nodes within the existing cluster, install the corosync-qdevice
package by running:
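For example:
sudo dnf install -y corosync-qdevice
You can then check the existing quorum status on one of the nodes before adding the quorum device; for example, running the pcs quorum status command produces membership information similar to the following fragment:
sudo pcs quorum status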
...
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 NR node1 (local)
2 1 NR node2
Under the Qdevice column, the value NR is displayed. The NR value indicates that
no quorum devices are registered with any of the nodes within the cluster. If any
other value is displayed, don't proceed with adding another quorum device to the
cluster without removing the existing device first.
3. Add the quorum device to the cluster. On one of the nodes within the existing
cluster, run:
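A typical command is:
sudo pcs quorum device add model net host=qdev algorithm=ffsplit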
Note that you specify the host to match the host where you're running the quorum
device service, in this case named qdev; and the algorithm that you want to use to
determine quorum, in this case ffsplit.
Algorithm options are:
• ffsplit: is a fifty-fifty split algorithm that favors the partition with the highest
number of active nodes in the cluster.
• lms: is a last-man-standing algorithm that returns a vote for the nodes that are still
able to connect to the quorum device service node. If a single node is still active and
it can connect to the quorum device service, the cluster remains quorate. If none of
the nodes can connect to the quorum device service and any one node loses
connection with the rest of the cluster, the cluster becomes inquorate.
See the corosync-qdevice(8) manual page for more information.
4. Verify that the quorum device is configured within the cluster. On any node in the existing
cluster, run:
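For example, you might run:
sudo pcs quorum config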
The output displays that a quorum device is configured and indicates the algorithm that is
in use:
Options:
Device:
Model: net
algorithm: ffsplit
host: qdev
You can also query the quorum status for the cluster by running:
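A typical command is:
sudo pcs quorum status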
Quorum information
------------------
Date: Fri Jul 15 14:19:07 2022
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 1
Ring ID: 1/8272
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW node1 (local)
2 1 A,V,NMW node2
0 1 Qdevice
Note that the membership information now displays values A,V,NMW for the
Qdevice field. Values for this field can be equal to any of the following:
• A/NA: indicates that the quorum device is alive or not alive to each node in the
cluster.
• V/NV: indicates whether the quorum device has provided a vote to a node. In
the case where the cluster is split, one node would be set to V and the other to
NV.
• MW/NMW: indicates whether the quorum device master_wins flag is set. Any
node with an active quorum device that also has the master_wins flag set
becomes quorate regardless of the node votes of the cluster. By default the
option is unset.
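To display the full status of the quorum device service, you can run the pcs qdevice status command on the quorum device host; the output includes details similar to the following fragment:
sudo pcs qdevice status net --full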
HB interval: 8000ms
Configured node list: 1, 2
Ring ID: 1.16
Membership node list: 1, 2
TLS active: Yes (client certificate verified)
Vote: ACK (ACK)
• To force the service to stop if the normal stop process is not working, run:
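A typical command is:
sudo pcs qdevice kill net
To change the algorithm that an existing quorum device uses, you can update the quorum device configuration from one of the cluster nodes, for example:
sudo pcs quorum device update model algorithm=lms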
The example changes the algorithm to use the lms or last-man-standing algorithm.
Note:
You can't update the host for a quorum device. You must remove the device and
add it back into the cluster if you need to change the host.
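To remove a quorum device from the cluster, you can typically run the following command on one of the cluster nodes:
sudo pcs quorum device remove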
Removing the quorum device updates the cluster configuration to remove any
configuration entries for the quorum device, reloads the cluster configuration into the
cluster and then disables and stops the quorum device on each node.
Because you might use the same quorum device service across multiple clusters,
removing the quorum device from the cluster doesn't affect the quorum device service
in any way. The service continues to run on the service host, but no longer serves the
cluster where it has been removed.
Note:
Remove the quorum device from any clusters that it services before
destroying the quorum device service.
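To destroy the quorum device service itself, you might then run the following command on the quorum device host:
sudo pcs qdevice destroy net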
6
Using the Pacemaker/Corosync Web User
Interface
This chapter describes how to create and manage clusters by using the web UI tool instead
of the pcs command line.
This chapter assumes that you have completed the tasks that are described in Installing and
Enabling the Pacemaker and Corosync Service and that the nodes have been authenticated
for the hacluster user.
For information about authentication and configuring hacluster credentials, see Step 1 of
Creating the Cluster.
To access the web UI, log in as user hacluster at https://node:2224, where node refers to
a node authenticated for hacluster. Specify the node either by its node name or IP address.
Note:
The rest of this chapter assumes that you have configured resolvable names for all
the nodes.
After you log in, the home page's Manage Clusters page is displayed. This page lists clusters
that are under the web UI's management.
Note:
The web UI provides another method of adding nodes to a cluster. See
Configuring Nodes.
Configuring Nodes
The Nodes page contains options to add nodes to the cluster or remove nodes.
• To add nodes:
1. Click + Add.
2. Specify the nodes to add.
3. Click Add Node.
• To remove nodes:
1. Select one or more nodes from the list.
2. Click x Remove.
3. Click Remove Node(s) to confirm.
For every node that you select from the list, information about that node is displayed,
including the status of the cluster daemons running on the node, resource information, node
attributes, and so on. You can manipulate the node further by clicking the options that
correspond to the following actions:
• Stop, start, or restart the node.
• Put the node in standby mode.
• Put the node in maintenance mode.
To obtain information about a specific property, hover the mouse pointer over the
information icon (i). The icon displays a short description of the property and its default
value.
For example, the Batch Limit property is described as follows:
Default value: 0
The properties you customize depend on circumstances and needs. Suppose that you
have a two-node cluster. For this cluster, you want to disable the fencing feature.
Because the cluster consists only of two nodes, you do not need any quorum policy.
Finally you want to set the migration threshold such that the cluster moves services to
a new node after a single failure on a current node. In this case, you would do the
following:
1. From the drop down list of Stonith Enabled, select false to disable the fencing
feature.
2. From the drop down list of No Quorum Policy, select ignore to disregard the quorum policy.
3. Click Show advanced settings to display migration parameters.
4. In the migration limit field, type 1 to set the threshold to a single failure event before services are moved.
5. Click Apply Changes to accept the revisions.
6. Click Refresh so that the page reflects the changed parameters with their new
values.
Typically, you can configure multiple properties in any order. However, as a final step, you must click Apply Changes for the new configuration to take effect.
Creating ACLs for the cluster assumes that you have already created users and optionally
have added them to defined groups on all the cluster nodes, for example:
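For example, on each node you might create a user and set a password (the clusteruser name here is only illustrative):
# clusteruser is an example name
sudo useradd clusteruser
sudo passwd clusteruser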
Configuring Fencing
To configure fencing for the cluster, click the appropriate menu item to open the Fence
Devices page.
For a brief description of fencing and its purpose, see About Fencing Configuration
(stonith).
• lanplus, for example, 1
• priority, for example, 1, as the level of priority over other fencing types for actions that take effect in case of failure
• pcmk_monitor_timeout, for example, 60 seconds
As with all other properties, information about each argument can be obtained through the
information icon.
On the resource's information detail, you can manage the resource further through the
following options:
• Enforce actions on the resource such as enabling, disabling, refreshing, or
removing the resource; performing resource cleanups; and putting the resource in managed or unmanaged mode.
• Create a clone or a promotable clone.
• Update the resource's group information, such as assigning it to another group.
• Configure optional and advanced arguments.
7
More Information
For more information and documentation on Pacemaker and Corosync, see https://clusterlabs.org/pacemaker/doc/.