Single-Node and Two-Node Clusters FAQ
Description:
A traditional Nutanix cluster requires a minimum of three nodes, but Nutanix also offers the option of a single-node or two-node cluster, mainly for ROBO
(Remote Office/Branch Office) implementations. Both require the CVM (Controller VM) to have 6 vCPUs and 20 GB of memory. You
cannot expand a single-node or two-node cluster later on. Some platforms support a single-node cluster as a replication target (Single-Node
Replication Target, SNRT).
Note: Metadata is always kept at RF4 in the SSD tier (two copies on a single node, and a total of 4 copies across the two nodes), which minimizes the
time it takes to reach an RF2 state on the remaining node.
Supported Hardware
The table below lists, for each platform vendor, the models supported for single-node (1-node) clusters, two-node clusters, and as a Single-Node Replication Target (SNRT, 1-node).

Nutanix NX
1-node: NX-1175S-G5, NX-1175S-G6, NX-1175S-G7, NX-1120S-G7, NX-1175S-G8, NX-8150-G8, NX-8150(N)-G8, NX-1175S-G9, NX-3060-G9, NX-8170-G9, NX-3035-G9, NX-8150-G9, NX-8155-G9, NX-8155A-G9, NX-3155-G9
2-node: NX-1175S-G6, NX-1175S-G7, NX-1175S-G8, NX-1175S-G9
SNRT (1-node): NX-1155-G5, NX-5155-G6, NX-8155-G6, NX-8155-G7, NX-8155(N)-G8, NX-8155-G8, NX-8155-G9

HPE DX Gen10/Gen10 plus
1-node: DX325 8SFF, DX360 4LFF, DX360 8SFF, DX380 12LFF, DX385 12LFF
2-node: DX325 8SFF, DX360 4LFF, DX360 8SFF, DX380 12LFF, DX385 12LFF

HPE DX Gen11
1-node: DX320 4LFF, DX360 8SFF, DX360 10NVMe, DX365 10NVMe, DX380 12LFF, DX380 24SFF, DX385 12LFF
2-node: DX320 4LFF, DX360 8SFF, DX360 10NVMe, DX365 10NVMe, DX380 12LFF, DX385 12LFF

Dell XC
1-node: XC640-4i, XCXR2-8, XC740-12R, XC7525-12, XC660xs-4, XC660xs-4s, all 15G platforms, all 16G platforms (XC4510c, XC4520c)
2-node: XC640-4i, XC450-4, XC450-4s, XC750-14, XC4000r

Fujitsu
1-node: XF1070 M2, XF1070 M3, XF8055 M2, XF8055 M3, XF3070 M3, XF8050 M3, XF1070 M7, XF3070 M7, XF8050 M7, XF8055 M7
2-node: XF1070 M2, XF1070 M3, XF1070 M7
SNRT (1-node): XF1070 M2, XF8055 M3, XF1070 M7, XF8055 M7

Inspur
1-node: InMerge600M5S, InMerge600M5S-core, InMerge 1000M5L, InMerge 1000M6L, InMerge 1000M6G, InMerge 1000M6S
2-node: InMerge 1000M5L, InMerge 1000M5L-core
Cisco
1-node: HCIAF220C-M7S**, HCIAF240C-M7SX**, HCIAF240C-M7SN**, HCIAF220C-M6S**, HCIAF220C-M6SN**, HCIAF240C-M6SX**, UCS 220M5SX
2-node: -

Intel
1-node: -
2-node: -
SNRT (1-node): -

Klas
1-node: Voyager VM3.0, Voyager VM4.0
2-node: -
SNRT (1-node): -

KTNF
1-node: KR580S1-224N-NVMe, KR580S1-308N, KR580S1-312N
2-node: -
** All Cisco M6 and M7 models are enabled for ISM (Intersight Standalone Mode) only.
The latest NX-1175S single-node specifications can be found here: https://www.nutanix.com/products/hardware-platforms/specsheet. Note that a two-
node solution will leverage 2x units of NX-1175S, so both solutions are based on the same underlying platform. NX-1175S can form a single-node or a
two-node cluster, whereas NX-5155-G6 can only form a single-node cluster (SNRT).
Note on NX-1155-G5 EOS/EOL status: The NX-1155-G5 EOS Announcement Bulletin specifies that this model is no longer for sale since October 30,
2018. However, it is still under support until October 30, 2023. For more details, see the NX-1155-G5 EOS Announcement Bulletin in Portal.
Note that the NX-1065S has only 3x drives per node. You need at least 4x drives per node (2x SSD and 2x HDD) to have data resiliency at each tier.
Hence, the NX-1065S is not a valid node for a single-node or two-node cluster.
"Storage Summary" on Prism shows "Resilient Capacity 0 GB" regardless of having enough free space.
In Single Node clusters with an all-flash 2 SSD configuration, Prism reports a Data resiliency warning for Free space in AOS version 5.18 and above.
This is expected due to the 2 SSDs configurations, which do not provide resilience capacity to rebuild. Move to a 3 SSD or 2 SSD + 2 HDD
configuration, depending on the hardware, to resolve this warning. Refer to the corresponding hardware documentation to confirm if it is possible to
add more storage since these configurations are hardware-dependent.
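As a quick cross-check from the command line, the cluster's current fault-tolerance status can be queried from any CVM with nCLI. This is a minimal sketch using the standard get-domain-fault-tolerance-status command; the exact output fields can vary between AOS versions.
nutanix@cvm$ ncli cluster get-domain-fault-tolerance-status type=disk
nutanix@cvm$ ncli cluster get-domain-fault-tolerance-status type=node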
Solution:
1. From the AOS 5.5 release, Nutanix supports running VMs on the NX-1155 (see the Note on NX-1155-G5 EOS/EOL status above).
2. There are software-only options for single-node and two-node clusters: Cisco 220M5SX, Dell R640-4LFF, and HPE DL360 Gen10.
3. The NX-1175S, in a three-node or larger cluster, follows the standard intermixing rules that apply to the NX-1000 series.
4. A direct connection between the nodes for internal CVM traffic is not supported on two-node clusters.
Prism will now be able to show the 2nd PSU and detect its up/down state.
B. Three-node+ cluster with 1x PSU, while adding new node shipped from the factory with 2x PSUs
Assume the three-node+ cluster has Foundation bits older than 4.0.3 (e.g., 4.0.1). If the new 2x PSU nodes (pre-installed with Foundation 4.0.3 from
the factory) are added to the cluster via the Prism UI’s Add-node workflow, the cluster will still use its Foundation 4.0.1 bits to image the new nodes,
which will essentially change the hardware layout file of the new nodes from 2x PSU to 1x PSU and disable the Prism UI capability to detect the
2nd PSU on the new nodes.
a. Follow the steps in Scenario A above to first upgrade the three-node+ cluster from 1x PSU to 2x PSU
b. Then, add the new nodes with 2x PSU to the cluster
Prism will now be able to show the 2nd PSU and detect its up/down state.
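Before adding the new nodes, the Foundation version currently present on the cluster can be listed from any CVM. This is a minimal sketch; it assumes the standard allssh alias and the usual Foundation version file location, which may differ between releases.
nutanix@cvm$ allssh cat /home/nutanix/foundation/foundation_version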
Witness VM
Q. Does a two-node solution require a separate third entity to arbitrate quorum?
Yes, a Witness VM is required (only for two-node clusters, not for single-node clusters).
Q. Is a two-node cluster prevented from functioning if the Witness VM is unreachable (for example, the WAN link is down when the Witness VM is
hosted remotely)?
No. A two-node cluster will function normally even if the Witness VM is down or unreachable, as the Witness VM is only used during failure
scenarios between the two nodes.
Q. What if the Witness VM is down (or the WAN link is down) as well as the network between the two nodes?
This is a double-failure scenario; note that even a regular three-node RF2 cluster could suffer data loss if it encounters the failure of 2 nodes or
2 drives at the same time. In this case, neither node will make forward progress. The VMs will go into a read-only state as I/Os fail until the
Witness VM is reachable again or the network between the nodes is fixed.
Q. Is there any latency or bandwidth restriction on the network between the Witness VM and the two-node cluster?
Network latency between a two-node cluster and the Witness VM should not exceed 500 ms (RPC timeouts are triggered if the network latency is
higher). During a failure scenario, the nodes keep trying to reach (ping) the Witness VM until they succeed. Nodes ping the Witness VM every 60
seconds, and each Witness request has a two-second timeout, so the link can tolerate up to one second of latency.
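As a simple illustration, the link latency to the Witness VM can be spot-checked from a CVM with a plain ping. The address below is a placeholder; the average RTT should stay well under the 500 ms limit described above.
nutanix@cvm$ ping -c 10 <witness-vm-ip>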
Caveats
A single-node cluster is a single point of failure and offers lower resiliency than two-node and larger clusters, so there is an additional risk of data
loss or corruption.
Users should have their own backups and/or resiliency within their applications for data protection.
Use caution when removing disks from single-node clusters, as removing too much disk space can exhaust the usable space in the cluster.
The following features are not supported on single-node and two-node configurations in the initial release. Future support TBD.
Features
Scale-out
Nutanix Volumes
Metro Availability
EC-X
Deduplication*
Hypervisor
Hyper-V
All Prism Pro features will be applicable to the single-node/two-node configuration as well.
A three-node or larger cluster, even with NX-1175S, does not have the aforementioned restrictions, so a three-node cluster may still be a valid
and economical option for ROBO.
To change the IP addresses (re-IP) of single-node and two-node clusters, please contact Nutanix Support.
Cluster encryption on single-node and two-node clusters is only supported with external (non-local) key management.
Known Issues
Q. Will Prism prompt for User VM (UVM) shutdown during an upgrade of a two-node cluster?
An upgrade should not require UVMs to be shut down. This is a bug in Prism 5.6, which is fixed in 5.6.1. For 5.6, you can do an upgrade from the
command line.
1. Ensure from Prism that the cluster's data resiliency is OK and that it can tolerate one node down.
2. Stop UVMs (graceful shutdown).
3. Run the following command on one CVM:
nutanix@cvm$ cvm_shutdown
4. Wait for 5-10 minutes, and then run the following command on the second CVM:
When you start the cluster, services on one node will start in standalone mode, and the cluster should be available to launch UVMs. Services on the
second node will come up after 15 minutes.
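For reference, the resiliency check in step 1 and the cluster start mentioned above can also be performed from a CVM with standard commands. This is a minimal sketch; verify the exact syntax against your AOS version before using it.
nutanix@cvm$ ncli cluster get-domain-fault-tolerance-status type=node
nutanix@cvm$ cluster start
nutanix@cvm$ cluster status | grep -v UP
The first command confirms the cluster can tolerate one node down, and the last command lists any services that have not yet come up after the cluster is started.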
Q. Do two-node clusters take longer to perform upgrades and maintenance activities than 3+ node clusters?
Yes, two-node clusters take longer to restore data resiliency than 3+ node clusters. Depending on the workload, this recovery can take substantially
longer.
Why: Two-node clusters maintain data resiliency by placing both RF2 replicas on disks of the same node while the other node is undergoing a
planned outage. Depending on the type of planned outage (for example, firmware or hypervisor upgrades), all UVMs may also be migrated to the
remaining node. After the node returns from the planned outage, data resiliency shifts from two disks on the remaining node back to the two nodes,
replicating the data that was written to those two disks to the node that has come back up. This ensures that data is protected at node granularity
whenever possible. This "transfer" of the LAFD (Lowest Available Fault Domain) from disk to node takes time and depends on the amount of data
written during the planned outage. Only after the data is fully replicated does the cluster allow the second node to undergo a planned outage.