Nutanix Hybrid Cloud Fundamentals
Lesson 1: Overview
Understanding Hypervisors
The primary technological problem that hypervisors solved was that most physical hardware
could run only one operating system at a time. This constraint often led to wasted resources, as a single
OS seldom fully utilized the hardware’s capacity. Hypervisors address this constraint by aggregating the
resources of virtualized physical servers (such as memory, network bandwidth and CPU cycles) and then
allocating those resources to virtual environments, called virtual machines.
Hypervisors are also known as virtual machine monitors (VMM). A VM is essentially a software-
based computer, with access to the same resources as a physical computer. A hypervisor lets you run
multiple VMs as guests, thereby using the physical resources of the underlying host machine much more
efficiently. Each VM is itself a self-contained computer, with its own operating system, applications, and
services. VMs with different operating systems and applications can all reside on the same physical
system, sharing the same physical resources. The hypervisor is responsible for VM separation, which
protects one VM from another in the use of resources and provides crash resiliency and security isolation.
As software, hypervisors decouple the OS and apps from the physical host. This decoupling
provides an array of benefits, including the ability to easily and quickly migrate the VM from one host to
another without disruption. This capacity, called live migration, is used for many purposes including load
balancing and preparing for maintenance tasks. Virtual machines are automatically recovered in the case
of node failure, providing high availability and increased uptime.
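The allocation and migration ideas above can be illustrated with a toy model. This is a minimal sketch, not how any real hypervisor is implemented: the `Host`, `VM`, and `migrate` names are hypothetical, and it assumes no CPU or memory overcommit, which real hypervisors often allow.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """A guest with its own resource requirements (hypothetical model)."""
    name: str
    vcpus: int
    memory_gb: int


@dataclass
class Host:
    """A physical server whose resources are shared among guest VMs."""
    name: str
    cpu_cores: int
    memory_gb: int
    vms: list = field(default_factory=list)

    def free_cpu(self) -> int:
        return self.cpu_cores - sum(vm.vcpus for vm in self.vms)

    def free_memory(self) -> int:
        return self.memory_gb - sum(vm.memory_gb for vm in self.vms)

    def can_fit(self, vm: VM) -> bool:
        # Assumption: no overcommit, so every VM needs dedicated capacity.
        return vm.vcpus <= self.free_cpu() and vm.memory_gb <= self.free_memory()

    def place(self, vm: VM) -> None:
        if not self.can_fit(vm):
            raise ValueError(f"{self.name} lacks capacity for {vm.name}")
        self.vms.append(vm)


def migrate(vm: VM, source: Host, target: Host) -> None:
    """Move a VM between hosts; the VM definition itself is unchanged."""
    target.place(vm)       # reserve capacity on the target first
    source.vms.remove(vm)  # then release it on the source


host_a = Host("host-a", cpu_cores=16, memory_gb=64)
host_b = Host("host-b", cpu_cores=16, memory_gb=64)
web = VM("web", vcpus=4, memory_gb=8)
db = VM("db", vcpus=8, memory_gb=32)
host_a.place(web)
host_a.place(db)
migrate(db, host_a, host_b)  # for example, before maintenance on host-a
```

Because the VM is decoupled from the host, moving it only changes which host accounts for its resources.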
Virtualization enables cost savings through reducing physical footprint, which in turn reduces
costs for electricity, cooling, and maintenance. Virtualization also greatly improves agility and speed in
delivering IT services. For example, it is far easier to spin up a VM than to provision new physical
(server/application) environments to satisfy customer requests.
Complexity:
Is it easy to deploy and manage? Is it a separate product, with a separate console, that requires
full-time specialists to maintain, operate, and troubleshoot? Is it something that an IT generalist could
master relatively quickly?
Performance:
Does it deliver enough performance to support your mission-critical applications? Check out the
benchmarks for performance in production (as close to real-world conditions as possible).
Cost:
Does it come with licensing fees, or is it built-in to the larger solution?
Ecosystem:
Does it support a rich ecosystem? For example, does it support the most widely used guest
operating systems, such as Microsoft, SUSE, Red Hat, Ubuntu, and CentOS? Does it support leading enterprise
applications and technologies, such as Microsoft SQL Server, Exchange, SharePoint, SAP, Oracle, Citrix,
Hadoop, Splunk, MongoDB, SAS, OpenStack, Avaya, and Docker?
Lesson 3: Traditional Multi-Tier Architecture
HCI is the infrastructure of choice for companies that want to stay competitive and ensure that
their datacenters are cloud-ready. Not all HCI solutions are equal. But, in general, a true HCI solution needs
to be capable of performing specific functions.
HCI is a 100% software-driven solution that converges the entire datacenter stack, including
compute, storage, storage networking, and virtualization. Complex and expensive legacy infrastructure is
replaced by turnkey, industry-standard servers that enable enterprises to start small and scale one node
at a time. Software running on each server node distributes all operating functions across a cluster for
superior manageability, performance, and resilience.
With the web explosion of the 1990s, infrastructure with server-SAN and storage networks was
introduced, featuring independent modules that could be updated or changed without affecting other
layers. This infrastructure revolutionized IT departments and has been used ever since.
But now, three-tier infrastructure can no longer keep pace with IT needs. It is complex, unwieldy,
does not provide a firm foundation for DevOps, and cannot scale with the magnitude that is needed now.
Today, HCI is the infrastructure of choice for companies that want to stay competitive and ensure
that their datacenters are cloud-ready. Not all HCI solutions are equal. But, in general, a true HCI solution
needs to do some very specific things.
HCI converges the entire datacenter stack, including compute, storage, storage networking, and
virtualization. Complex and expensive legacy infrastructure gives way to a platform running on turnkey,
industry-standard servers that enable enterprises to start small and scale one node at a time. Software
running on each server node distributes all operating functions across the cluster for superior
performance and resilience.
DISTRIBUTED PLANE:
The distributed plane runs across a cluster of nodes delivering storage, virtualization, and
networking services for guest applications, whether they are VMs or container-based apps.
MANAGEMENT PLANE:
The management plane lets you easily administer HCI resources from one place and one view and
eliminates the need for separate management solutions for servers, storage networks, storage, and
virtualization.
Enterprise IT teams today are looking for ways to deliver services via on-premises private cloud
with the speed and operational efficiency of public cloud services offered by Amazon Web Services (AWS),
Microsoft Azure, and Google Cloud. They are also looking for ways to leverage public cloud offerings to
enhance their portfolios, scale rapidly, and provide lower-cost alternatives for traditional IT services.
The simplicity and flexibility of hyperconverged infrastructure make it the ideal platform for
private cloud. HCI’s unique ability to run legacy workloads while reducing capital and operational expense,
combined with its software-defined architecture, the ability to leverage APIs for automation, and support
for rapid and incremental scalability, also makes it an ideal candidate for DevOps and cloud-ready,
next-generation applications. This, in turn, makes HCI the ideal platform for hybrid cloud and multicloud
infrastructure.
PRIVATE CLOUDS:
Private clouds are typically built using infrastructure that is entirely owned and managed by an
enterprise.
PUBLIC CLOUDS:
Public clouds are publicly accessed and consumed. This means that networking, storage, and compute
resources (and often applications) are owned and managed by a third-party provider like Amazon Web
Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). Though workloads are partitioned for
security, these resources are shared by the customers of a particular public cloud provider.
HYBRID CLOUD:
Hybrid cloud describes an architecture where an enterprise is combining private and public cloud
resources to deliver IT services to the business. A distinguishing characteristic of hybrid cloud is the use of
a single management plane across clouds and the option of deploying workloads to either public or private
infrastructure based on business needs.
MULTICLOUD:
Multicloud describes an architecture where an enterprise is using private and public cloud
resources to deliver IT services but a single management plane across clouds is not being used. The cloud
environments are discrete and managed separately. Many businesses using SaaS offerings are leveraging
this architecture.
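The distinction drawn by the four definitions above can be written as a small decision function. This is only a sketch of the definitions as stated in this lesson; the function name and its boolean parameters are illustrative assumptions.

```python
def classify_architecture(uses_private: bool, uses_public: bool,
                          single_management_plane: bool) -> str:
    """Classify a deployment using the definitions given in this lesson."""
    if uses_private and uses_public:
        # Both kinds of resources: the management plane is the deciding factor.
        return "hybrid cloud" if single_management_plane else "multicloud"
    if uses_private:
        return "private cloud"
    if uses_public:
        return "public cloud"
    return "no cloud resources"
```

For example, an enterprise that runs workloads both on-premises and in a public cloud, but manages each environment with separate tools, is classified as multicloud.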
Migration:
Companies that are migrating from a complete on-prem solution to a configuration that
incorporates some usage of public cloud capacity may use a hybrid cloud architecture to maintain control
of the workloads placed in the public cloud.
Reverse Migration:
Organizations that are moving workloads back to a private, on-prem datacenter from being
primarily cloud-based (a process called repatriation) would want to manage workloads from the single
management plane provided by a hybrid cloud solution.
Security:
Maintaining the level of security required to protect applications and data requires strict control
over who has access. Regulations can determine where data is stored, and how long it needs to be
retained. Hybrid cloud architecture can provide the necessary control and audit capability for an
infrastructure that spans private and public domains.
Identify what you are already doing in the cloud:
Hybrid cloud infrastructure provides almost unlimited flexibility for organizations. Offerings range
from colocation, where everything from the power plug out is provisioned and managed by the enterprise,
to software as a service (SaaS), where everything but the data is controlled by the provider, and all points
in between. An enterprise can enjoy the enhanced security of on-prem resources while also having the
rapid scale and elasticity of the public cloud. And, with a properly constructed hybrid cloud, encrypted
data sharing enables industries that manage hypersensitive information such as public sector entities, law
offices, financial service institutions, and healthcare providers to consume cloud services where
regulations allow.
Organizations from these industries can store and share data as needed with external partners while still
adhering to regulatory compliance guidelines such as HIPAA, ISO, PCI DSS, CIS, NIST, and SOC 2.
Enabling hybrid and multicloud, or connecting your private cloud with a public cloud, can take as
little as 60 minutes. You can pick the right resources for your workloads and strategy and adjust
seamlessly as your business grows. You can burst capacity from datacenters to hosted clouds and public
clouds like Amazon Web Services (AWS). Nutanix hybrid and multicloud solutions feature out-of-the-box
networking integration with public clouds. This allows application builders to focus on code and
application design, not the infrastructure or cloud connectivity.
Nutanix private, hybrid, and multicloud solutions ensure on-premises and public environments
are operated as a single cloud.
Module 2: UNDERSTANDING THE NUTANIX CLOUD PLATFORM
Lesson 1: Overview
After completing this module, you will be able to:
Describe the Nutanix Multicloud Platform
Define Acropolis
Describe AOS
Describe AHV
Describe Prism and its licenses
By running the Nutanix software on industry-standard servers, a business can gain all the benefits
of the Nutanix solution while also starting with a relatively small deployment and scaling one node (i.e.,
server) at a time, as needed. Each node includes Intel-powered x86 or IBM Power hardware with flash
SSDs and HDDs. Nutanix software running on each node distributes all operating functions across the
cluster for performance and resilience.
A single Nutanix cluster can scale as large as the hypervisor cluster it is on. Different hardware
platforms are available to address varying workload needs for compute and storage. Nutanix software is
hardware agnostic, running on hardware from vendors such as Dell, Lenovo, Cisco UCS, HPE ProLiant, and
more.
The image above encapsulates a broad idea that lies at the heart of Nutanix.
At the top are workloads. Workloads are the reason the underlying infrastructure exists and is necessary.
They are how a business runs, how it grows, and how it shapes its present and future.
At the bottom are the choices that you have when you consider deploying and running your workloads
on Nutanix. Choosing Nutanix is meant to be liberating, rather than restrictive. Nutanix supports several
leading hardware platforms, so you can run Nutanix software on your choice of hardware.
And the freedom that comes with choosing hardware platforms extends to the public cloud as well. If you
have workloads on AWS, Azure, or GCP, Nutanix integrates neatly and tightly with them, so you can
continue to benefit from the public cloud when necessary while leveraging the strengths of your private
cloud – in a true, hybrid model.
And in the middle, between the underlying infrastructure and the workloads is the Nutanix Cloud Platform
– the products that power this tremendous freedom. When you choose to modernize your infrastructure
with Nutanix HCI, the only way forward is up – to better security, to simplified storage, to automated
operations, and to fully integrated enterprise-grade backup and DR.
Each product represents a key component of the hybrid cloud. AOS, AHV, and Prism are the foundation.
Every other product can be layered on top and integrates with this foundation to give you a fully featured
enterprise-class hybrid cloud solution.
Cloud Management
Prism:
Is the management plane that provides a unified management interface. It generates actionable
insights for optimizing virtualization and supports infrastructure management and everyday
operations.
Calm:
Allows you to seamlessly select, provision, and manage your business applications across your
infrastructure for both the private and public clouds. Calm provides application automation, lifecycle
management, monitoring, and remediation to manage your infrastructure.
Beam:
Is a Cost Governance SaaS product offering by Nutanix that helps cloud-focused organizations
gain visibility into cloud spend across multiple cloud environments. It provides you with deep visibility
and rich analytics detailing cloud consumption patterns as well as one-click cost optimization across your
cloud environments.
Flow:
Flow Security Central (FSC) is a SaaS product offering by Nutanix. FSC helps cloud-focused
organizations to gain visibility into security compliance status across multiple cloud environments. FSC
helps you to detect and remediate security vulnerabilities in your cloud infrastructure in near real-time.
With its event-driven architecture, you can perform detection, analysis, and reporting of over 800 security
audit checks that are provided out-of-the-box in FSC.
Database Service:
Era: Automates and simplifies database administration, bringing one-click simplicity and invisible
operations to database provisioning and life-cycle management. Era enables database administrators to
perform operations such as database registration, provisioning, cloning, patching, and restore. It allows
administrators to define provisioning standards with end state driven functionality that includes network
segmentation, High Availability (HA) database deployments, and much more.
Frame: Is an industry-leading hybrid and multicloud Desktop-as-a-Service (DaaS) solution built for
the cloud age. Frame is cloud and multitenant-native, and allows customers to deploy their workloads
natively into their private or public clouds.
AHV–
The native Nutanix hypervisor, AHV, represents a unique approach to virtualization that offers the
powerful virtualization capabilities needed to deploy and manage enterprise applications. AHV
complements the HCI value by integrating native virtualization along with networking, infrastructure, and
operations management with a single intuitive interface - Nutanix Prism.
Karbon–
Is a curated turnkey offering that provides simplified provisioning and operations of Kubernetes
clusters. Kubernetes is an open source container orchestration system for deploying and managing
container-based applications. Karbon streamlines the deployment and management of Kubernetes
clusters with a simple GUI integrated into Prism Central.
Leap–
Leap offers an entity-centric automated approach to protect and recover workloads. It uses
categories to group the guest VMs and automate the protection of the guest VMs as the application scales.
Disaster recovery is easy and more flexible with network mappings, an enforceable VM start sequence,
and inter-stage delays.
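The idea of a VM start sequence with inter-stage delays can be sketched as a simple schedule calculation. This is a hypothetical illustration, not the Leap API: the function name, the stage layout, and the assumption that every stage starts a fixed delay after the previous one (ignoring boot time) are all simplifications made for this example.

```python
def start_schedule(stages, inter_stage_delay_seconds):
    """Return the start offset in seconds for each VM in a staged power-on.

    stages is an ordered list of lists of VM names; all VMs in a stage start
    together, and each stage starts one delay after the previous stage.
    """
    schedule = {}
    for index, stage in enumerate(stages):
        offset = index * inter_stage_delay_seconds
        for vm_name in stage:
            schedule[vm_name] = offset
    return schedule


# Hypothetical three-tier application: database first, then app, then web.
plan = start_schedule([["db01"], ["app01", "app02"], ["web01"]], 60)
```

Ordering the stages this way ensures that dependencies such as the database are started before the tiers that rely on them.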
Flow Microsegmentation–
Is built into AHV virtualization and is enabled with just a few clicks in Prism Central. Flow works at
the hypervisor level, which means it works with your network with no new equipment or configuration changes
required. Flow includes a policy-driven security framework that inspects traffic within the data center. It
helps eliminate the need for additional firewalls within the data center. It uses a workload-centric
approach so it can scrutinize traffic to and from VMs no matter how their network configurations change
and where they reside in the data center.
Nutanix Clusters–
Provides a single platform that can span private and public clouds but operates as a single cloud using
Prism Central, enabling true hybrid cloud architecture. Nutanix Clusters resources are deployed in your
cloud provider account, thereby enabling you to use your existing cloud provider relationship, credits,
commits, and discounts.
Unified Storage
FILES:
Is a software-defined, scale-out file storage solution that lets you share files in a centralized and
protected location to eliminate the requirement of a third-party fileserver. Files uses a scale-out
architecture that provides file services to clients through the Server Message Block (SMB) and Network
File System (NFS) protocols.
OBJECTS:
Is an object storage service designed to solve the problem of unanticipated data growth,
specifically unstructured data that could start small and then grow at a rapid pace, potentially reaching the
petabyte scale. Storing unstructured data using traditional block or file data management protocols could
result in unmanageably large, complex, and expensive solutions.
VOLUMES:
Is designed as a scale-out storage solution in a cluster that can present storage volumes via iSCSI.
This solution allows an individual application to access resources across an entire cluster, if needed, to
scale out performance, as well as to support any workloads external to the cluster, including bare metal or
VMs. Nutanix Volumes automatically manages High Availability (HA) to ensure upgrades or failures are
non-disruptive.
MINE:
Nutanix Mine™ is the product name for joint solutions between Nutanix and select data
protection software vendors. Nutanix Mine™ is a dedicated backup solution, where only backup
component VMs run on the Mine™ cluster and the cluster storage is used to store backup workloads.
The Nutanix Cloud Platform is a unified IT operating environment that melds private, public and
distributed clouds, providing a single point of control for managing infrastructure and applications in any
cloud. It delivers a consistent, high-performance and seamless experience for both cloud operators and
consumers of cloud-delivered services and applications.
Elements of Nutanix Cloud Platform
The Nutanix Cloud Platform builds on the foundational ideas of hyperconvergence, adding cloud
capabilities that competing HCI solutions lack. The result is a flexible platform uniquely capable of solving
your IT challenges.
With a compact footprint and simplified remote management, the Nutanix Cloud Platform is ideal
for secondary datacenters, disaster recovery sites, and edge locations including production facilities,
distribution centers, and remote and branch offices.
Exceptional Availability:
A self-healing architecture that restores full resiliency without operator intervention. One-click,
non-disruptive upgrades eliminate planned downtime.
Built-in Data Protection:
The Nutanix Cloud Platform OS incorporates data protection and DR as part of the infrastructure
stack.
Security by Design:
It is a security-first design, which results in a smaller attack surface and less chance of sensitive
customer data being compromised. Nutanix software automatically tracks and reverses any changes from
a secure baseline.
Full Application Orchestration:
Nutanix Calm orchestrates the provisioning, scaling, and management of applications across
multiple on-premises and cloud deployments, making IT infrastructure more agile and application-centric.
Easy Cloud Integration:
Supports applications running in your data centers, on private clouds run by Nutanix X-Powered
Service Providers, and on public cloud services, such as Amazon Web Services (AWS).
This means, for example, that you can deploy a new application quickly in multiple locations and
be certain that everything is configured correctly. And, you can use the same blueprints to deploy
instances of your applications in public clouds such as Azure, Google Cloud Platform, or AWS.
Enhance developer productivity. Simplify access to high-performance test environments and up-
to-date production data copies.
Accelerate time to market. Cut QA cycles by as much as 50%. Improve quality through better test
coverage using private cloud-based orchestration and automation.
Lower costs and simplify administration. VM-centric operations, intuitive tools, and open APIs
streamline workflows and reduce complexity.
The Nutanix native hypervisor (AHV) eliminates the need to maintain a separate virtualization
stack and associated licensing costs while delivering all the functionality you expect.
Only Nutanix enables you to run large enterprise applications alongside VDI and server
virtualization apps on the same infrastructure, simply and efficiently, without compromising performance
or manageability.
And only Nutanix allows you to deploy those workloads in your enterprise datacenters, at
secondary locations, in the public cloud or at cloud service providers. Nutanix has put considerable
thought into determining the best way for enterprises to transform primary datacenters as well as
distributed and edge operations with the cloud platform.
You may need to ramp up resources to address seasonal business demands. With the Nutanix
Cloud Platform, you have multiple options for obtaining additional resources:
Spin up the application instances you need in a public cloud using Nutanix validated
stacks from VMware, Microsoft, and Nutanix.
Contract with an X-Powered Service Provider.
Leasing equipment gives you flexibility without sacrificing control.
By contracting for the resources you need only when you need them, your IT operations achieve
much higher efficiency and lower cost.
Infrastructure Transformation–
The process of transformation begins with a full understanding of your current environment in every
operating location including:
Application-specific metrics: Gather steady-state statistics and trends for each application as well
as working set size, execution times for any batch processes, and average and peak transactions per
second.
Infrastructure-specific metrics: Gather appropriate specifications, utilization, and capacity for
server CPUs and memory, networks, and storage. Also gather performance metrics such as latency and
throughput.
Mapping to service owners: Clearly communicate requirements for service owner and user
involvement.
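As a small illustration of the application-specific metrics listed above, the following sketch computes average and peak transactions per second from a set of samples. The function name and the sample values are hypothetical; real sizing would use measurements collected from your own environment.

```python
from statistics import mean


def summarize_tps(samples):
    """Summarize transactions-per-second samples as average and peak."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {"average_tps": mean(samples), "peak_tps": max(samples)}


# Hypothetical samples taken at regular intervals for one application.
summary = summarize_tps([120, 180, 150, 450, 100])
```

The gap between the average and the peak shows how much headroom the application needs beyond its steady state.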
Next, think about what aspects of your operations you intend to change, expand, or shrink. What
new applications or services are coming online in the next twelve to twenty-four months? What cloud
services are you currently using, and will you be expanding those services?
Do you plan to repatriate any applications currently running in the cloud? With the above
information in hand, you can accurately size infrastructure for each location. Nutanix Sizer makes this task
straightforward.
Nutanix provides application-specific best practices and guidelines for most popular enterprise
applications.
Process And Organizational Transformation–
For many enterprises, transforming processes and making necessary organizational adjustments
is even more challenging than transforming infrastructure.
IT staffing. As your infrastructure is transformed, you’ll have fewer silos and easier management.
IT generalists can then get comfortable with a wider variety of tasks and with a higher level of cloud
sophistication.
IT project flow. A less rigid infrastructure environment creates the opportunity to streamline how
applications are developed. Look for ways to re-architect the processes you use as applications move from
development to test to QA to staging to production. Incorporate cloning and other efficiency technologies
to eliminate points of friction and accelerate time to market.
Automation. Take advantage of the cloud platform to automate infrastructure tasks for both
operations and development. The more processes you can simplify and automate up front, the smoother
the path to DevOps will be.
Term                  Definition
Acropolis             The Nutanix converged software fabric for virtualization and storage
                      management. It consists of the Acropolis base software (AOS), Nutanix
                      storage, AHV, Prism, and Acropolis APIs.
AHV                   The Nutanix hypervisor solution.
AOS                   The Acropolis operating system or base software.
Controller VM (CVM)   A Nutanix VM that manages storage and other cluster functions on a node.
Deduplication         The sharing of identical guest VM data on premium tiers (RAM and Flash)
                      for improved performance or on capacity tiers (HDD) for storage space
                      savings. Enabled by properties of a container or vDisk.
Redundancy factor     The number of nodes plus 1 that the cluster can tolerate being down at
                      one time. By default, Nutanix clusters have a redundancy factor of 2,
                      which means that they can tolerate 1 node being down. They are
                      configurable to redundancy factor 3 to enable tolerating 2 nodes being
                      down.
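The redundancy factor definition reduces to simple arithmetic: the number of node failures tolerated is the redundancy factor minus 1. The sketch below encodes only the two values named in the definition; the function name is illustrative.

```python
def tolerated_node_failures(redundancy_factor: int) -> int:
    """Return how many nodes can be down at once for a redundancy factor."""
    if redundancy_factor not in (2, 3):
        # Only the two values named in the glossary are covered here.
        raise ValueError("this sketch only covers redundancy factors 2 and 3")
    return redundancy_factor - 1
```

With the default redundancy factor of 2 the result is 1 node, and with redundancy factor 3 it is 2 nodes.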
Lesson 6: Acropolis
Acropolis is the foundation for a platform that starts with hyperconverged infrastructure then
adds built-in virtualization, storage services, virtual networking, and cross-hypervisor application mobility.
For the complete list of features, see the Software Options page on the Nutanix website.
Nutanix delivers a hybrid cloud solution purpose-built for virtualization and cloud environments.
This solution brings the performance and economic benefits of web-scale architecture to the enterprise
through the Enterprise Cloud Platform, which includes two product families—Nutanix Acropolis and
Nutanix Prism.
AHV is the hypervisor while DSF and App Mobility Fabric are functional layers in the Controller VM (CVM).
Acropolis also refers to the base software running on each node in the cluster.
AOS
Lesson 7: AHV
Nutanix AHV is a comprehensive enterprise virtualization solution tightly integrated into Acropolis
and is provided with no additional license cost.
AHV delivers the features required to run enterprise applications, for example:
VM features
Intelligent placement
Live migration
Converged Backup/DR
Image management
VM operations
Analytics
Data path optimization
You manage AHV through the Prism web console (GUI), command line interface (nCLI/aCLI), and REST
APIs.
Lesson 8: Prism
Prism is the management plane that provides a unified management interface. It generates
actionable insights for optimizing virtualization and supports infrastructure management and everyday
operations.
Prism gives Nutanix administrators an easy way to manage and operate their end-to-end
virtualized environments. Prism includes two software components: Prism Element (also called the Prism
web console) and Prism Central.
The Home dashboard provides a dynamic summary of the status of your cluster. The information
available to you on the home page includes:
Term                                  Description
Hypervisor Summary                    Displays the name and version number of the hypervisor.
Storage Summary                       Displays the total, used, and unused storage space in the cluster.
VM Summary                            Displays the total number of VMs in the cluster broken down by on,
                                      off, suspended, paused, and unknown states.
Hardware Summary                      Displays the number of hosts and blocks in the cluster, as well as the
                                      model numbers of the hardware.
Cluster-wide Controller IOPS          Displays I/O operations per second (IOPS) in the cluster over a 3-hour
                                      time period. Placing the cursor anywhere on the horizontal axis
                                      displays the value at that time. (These display features also apply to
                                      the I/O bandwidth and I/O latency monitors.) For more in-depth
                                      analysis, you can add this chart (and any other charts on the page) to
                                      the analysis page by clicking the blue link in the upper left of the
                                      chart.
Cluster-wide Controller IO Bandwidth  Displays I/O bandwidth used per second in the cluster. The value is
                                      displayed in an appropriate metric (MBps, KBps, and so on)
                                      depending on traffic volume.
Cluster-wide Controller Latency       Displays the average I/O latency (in milliseconds) in the cluster.
Cluster CPU Usage                     Displays the current CPU utilization percentage along with the total
                                      available capacity (in GHz).
Cluster Memory Usage                  Displays the current memory utilization percentage along with the
                                      total available capacity (in GB).
Data Resiliency Status                Displays information indicating whether the cluster is currently
                                      protected from potential data loss due to a component failure. Click
                                      anywhere in this field to display a dialog box with more information.
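The bandwidth field is described as choosing an appropriate unit (MBps, KBps, and so on) based on traffic volume. A minimal sketch of that kind of unit selection follows. It is not Prism's actual logic: the function name, the decimal steps of 1000, and the one-decimal formatting are assumptions made for illustration.

```python
def format_bandwidth(bytes_per_second: float) -> str:
    """Format a bandwidth value using the largest unit that keeps it readable."""
    units = ["Bps", "KBps", "MBps", "GBps"]
    value = float(bytes_per_second)
    for unit in units:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit this sketch knows about.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000  # assumption: decimal (1000-based) unit steps
```

Light traffic is then shown in KBps while heavier traffic is shown in MBps, so the displayed number stays in a readable range.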
Nutanix offers three licensed editions of AOS and Prism, and a licensing or subscription model for
Add-ons. Subscription models are for one to five-year terms.
Prism Element and Central are collectively referred to as Prism Starter. They are included with
every edition of Acropolis to facilitate both single and multicluster management. Two additional licensing
tiers are available - Pro and Ultimate. For more information, see the Nutanix website.
Starter licenses are installed by default on each Nutanix node and block. They never expire and
they do not require registration on your assigned Nutanix customer portal account.
If it is a software-only purchase, Starter is not built in. The license term is 1 to 7 years.
Pro and Ultimate licenses are downloaded as a license file from the Nutanix Support Portal and
applied to your cluster using Prism.
With Pro or Ultimate, or after upgrading to the Pro or Ultimate license, adding nodes or clusters
to your environment requires you to generate a new license file for download and installation.
For more information about the different features that are available with Acropolis Starter, Pro,
and Ultimate, please see: https://www.nutanix.com/products/software-options
Prism licenses are available on a per-node basis through a yearly subscription. There are three
licensing tiers in Prism: Starter (core HCI management), Prism Pro, and Prism Ultimate.
PRISM STARTER:
Prism Starter provides intuitive management of the entire stack, from storage and compute
infrastructure to physical and virtual networking and VMs. Prism RBAC enables granular control over which
users can perform what actions on entities such as VMs, applications, reports, and clusters. The Prism
Starter license comes included with every Nutanix AOS license.
PRISM PRO:
The Prism Pro tier provides advanced analytics and intelligent insights into managing a Nutanix
environment. These features include performance anomaly detection, capacity planning, custom
dashboards, reporting, and advanced automation capabilities. You can license the Prism Pro feature set
to unlock it in Prism Central.
PRISM ULTIMATE:
Prism Ultimate provides advanced capabilities that build on all the features of Prism and Prism
Pro. These features include application discovery, cost metering, budgeting, and chargeback.
AOS and hypervisor installation, as well as cluster creation and Starter license installation, are
performed through the Nutanix Foundation tool. This process is performed after the physical hardware installation
and configuration.
Foundation software can run as a service within AOS (Controller VM based) or as a standalone virtual
machine. The standalone Foundation VM is for bare-metal installs and is used by Certified Nutanix Service
Partners as part of the initial cluster installation.
The CVM based Foundation service cannot be used for bare-metal installation and requires either
factory-prepared nodes or previously installed nodes selected for reinstallation. A CVM must exist on at
least one node and it cannot be participating in an existing cluster. With Prism Central 5.17, Foundation
can now be used to remotely install Nutanix clusters anywhere in the world. This process uses the CVM
based Foundation service.
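The constraints on the two Foundation options can be summarized as a decision function. This sketch only restates the rules given in the paragraphs above; the function name, parameters, and returned strings are illustrative and do not correspond to any Nutanix tool or API.

```python
def choose_foundation(bare_metal: bool, node_has_cvm: bool,
                      cvm_in_existing_cluster: bool) -> str:
    """Pick a Foundation option based on the constraints described above."""
    if bare_metal:
        # The CVM-based service cannot be used for bare-metal installation.
        return "standalone Foundation VM"
    if node_has_cvm and not cvm_in_existing_cluster:
        # Factory-prepared or previously installed nodes with a free CVM.
        return "CVM-based Foundation service"
    return "CVM-based Foundation prerequisites not met"
```

For example, factory-prepared nodes whose CVM is not yet part of a cluster qualify for the CVM-based service, while bare-metal nodes always need the standalone VM.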
Once the nodes have been imaged with the chosen base software (AOS) and hypervisor, the
out-of-the-box cluster configuration can immediately be used to create VMs, set up networks, and run
production workloads.
Through Foundation, the cluster, a logical entity that consists of multiple physical nodes, has a
number of preconfigured guest VMs called Controller Virtual Machines (CVMs), one on each node.
Module 3: UNDERSTANDING CLUSTER MANAGEMENT CONCEPTS
Lesson 1: Overview
A node is an x86 server with compute and storage resources. A single cluster can have a maximum
of 32 nodes for an AHV cluster and 48 nodes for an ESXi cluster. Different hardware platforms are available
to address varying workload needs for compute and storage.
In a typical Nutanix cluster, a block is a chassis that holds one to four nodes, and contains power,
cooling, and the backplane for the nodes. The number of nodes and drives depends on the hardware
chosen for the solution.
The Nutanix Cluster:
Lesson 3: Prism
Prism includes two software components: Prism Element and Prism Central.
Lesson 4: Prism Element
Prism Element provides a graphical user interface to manage and monitor most activities in a
single Nutanix cluster.
Some of the major tasks you can perform using Prism Element include:
1. Infrastructure Management:
Streamline common hypervisor and VM tasks.
Configure and manage clusters for storage and virtualization.
Deploy, configure, migrate, and manage virtual machines.
Create datastores, manage storage policies, and administer DR.
2. VM Management
The virtual machine (VM) dashboard displays dynamically updated information about virtual machines in
the cluster. There are two available views, the Overview view and the Table view:
The VM Overview view displays VM-specific performance and usage statistics on the left plus the
most recent VM-specific alert and event messages on the right.
The VM Table view displays information about each VM in a tabular form. The displayed
information is dynamically updated to remain current.
You can create and manage VMs directly from Prism Element when the hypervisor is either ESXi or AHV.
Prism Element manages only the cluster it is part of; each Nutanix cluster in a deployment has its own
Prism Element instance for management. Prism Central allows you to monitor and manage all Nutanix
clusters from a single GUI:
Prism Central is an application you can deploy in a VM (Prism Central VM) or in a scale-out cluster of
VMs (Prism Central instance), either manually, by importing a VM template, or via one click from Prism
Element. You can run a Prism Central VM in a VM of any size; the only difference is the amount of CPU
and memory available to the Prism Central VM for VM management. You can deploy a Prism Central
instance initially as a scale-out cluster or, if you are running it as a single VM, easily scale it out with one
click using Prism Element. The design decisions involved in using this architecture are dramatically simpler
than legacy solutions. You only need to answer two questions before deploying:
This extensible architecture allows you to enable value-added features and products, such as Prism
Pro, Calm, and Flow networking within Prism Central. These additional features operate within a single
Prism Central VM or clustered Prism Central instance and do not require you to design or deploy separate
products.
The main elements of the Prism Central home page are as follows:
1. The Entities menu, accessed by clicking the collapse button at the top left. The Entities menu, displayed
as a pane on the left, provides a number of different management features grouped into ten different
categories. The first two options in the left pane provide quick access to the home dashboard of Prism
Central and a list of all VMs in the cluster. The seven categories after that provide access to various
dashboards, focused on virtual infrastructure, policies, hardware, activity, operations, administration, and
services. The tenth and final option in the left pane allows you to access Prism Central global settings.
2. The Search field allows you to search Prism Central for different types of entities, such as VMs, clusters,
nodes, security policies, projects, reports, events, alerts, and so on. The Search field is context-sensitive
and changes to match your location in Prism Central. As seen in the figure above, the Search field displays
"Dashboard" on the home page of Prism Central. In addition to searching for entities, you can also
bookmark specific search filters if you intend to return to them often, and use both simple and complex
query options to quickly find information. For details, including syntax rules, keywords, and query
operators, see the Searching for Information section of the Prism Central Guide on the Nutanix Support
Portal.
3. The Prism icon at the top-middle of the page, which is clickable and allows you to return to the home
dashboard from anywhere in Prism Central.
4. The Tasks, Help, Settings, and User Menu buttons at the top right.
5. The Main Dashboard and Manage Dashboard options near the top left allow you to switch between the
default Prism Central dashboards and any custom-built dashboards that you may have created and saved.
6. The Reset Dashboard, Add Widgets, and Data Density options near the top right allow you to modify
the dashboard you are viewing.
7. The main space of the home page contains a number of default widgets that display information about
the clusters you have registered with Prism Central, such as storage, latency, memory and CPU usage,
reports, runway, and so on.
Prism Self Service Administration–
The Prism Self Service feature allows you to create projects where consumers of IT infrastructure
within an enterprise—individual users or teams such as development, test, and DevOps—can provision
and manage VMs in a self-service manner, without having to engage IT in day-to-day operations. Prism
Self Service uses the resources provided by a single AHV cluster. (Other hypervisors are not supported
platforms for Prism Self Service.)
Role-Based Access Control (RBAC) restricts access to authorized users. Each role you are assigned
grants you privileges to modify certain configurations. Prism Central includes a set of predefined
roles and also allows you to define additional custom roles.
There are three roles to consider when configuring Prism Self Service:
1. Prism Central administrator: The Prism Central administrator adds an Active Directory that
includes the pool of self-service users and (optionally) creates one or more self-service
administrators. Prism Central administrators also create VMs, images, and network configurations
that may be consumed by self-service users.
2. Self-service administrator: The self-service administrator performs the following tasks:
Creates a project for each team that needs self-service and adds users and groups to the projects.
Configures roles for project members. A project member can access only the entities or perform
only the actions defined in the role assigned to that project member.
Monitors resource usage by various projects and their VMs and members, and then adjusts resource
quotas as necessary.
3. Project user: These are the users assigned to a project by a self-service administrator. They can
perform any action that the self-service administrator grants them. The permissions are
determined by the roles assigned to the users and groups in the project. When project users log
in, they see a custom self-service GUI that shows only what the role permissions allow.
Project users create and manage only what they need.
Services Enablement–
You can enable selected services through Prism Central, such as:
Nutanix Calm: You can select, provision, and manage your business applications across all your
infrastructure for both private and public clouds through the Nutanix Calm feature. Nutanix Calm
provides automated application life cycle management, custom blueprints for the setup and
management of enterprise applications, a marketplace to publish the blueprints to end users, and
automated hybrid cloud management to provision your hybrid cloud architecture.
Nutanix Files: A software-defined, scale-out file storage solution that lets you share files in a
centralized and protected location to eliminate the requirement of a third-party file server. Files
offerings also include File Analytics, for statistics and monitoring of file servers, and the Files
Manager, for a unified control plane of all file servers.
Foundation Central: You can manage several Foundation instances from a single pane of glass,
allowing you to create clusters of remote nodes without needing to configure each of them
individually.
Nutanix Karbon: A curated turnkey offering that provides simplified provisioning and operations
of Kubernetes clusters. Kubernetes is an open-source container orchestration system for
deploying and managing container-based applications.
Nutanix Objects: A software-defined object store service. Objects addresses storage-related
use cases for backup, long-term retention, and data storage for your cloud-native applications
by using standard S3 APIs.
Task Automation (Prism Pro)–
The X-Play feature allows you to automate routine administrative tasks, and auto-remediate
issues that may occur in your system. This automation is achieved by creating playbooks.
Playbooks allow you to define a trigger that results in the execution of an action or a series of actions. A
trigger may be an event that occurs in the system, such as an alert or a request made by you. The resultant
actions that you configure can be VM actions, communication actions, and alert or report actions.
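The relationship between a trigger and its actions can be pictured with a small sketch. This is not the X-Play API: the Playbook class, the trigger name, and the two action functions are hypothetical and only illustrate one trigger fanning out to an ordered series of actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model, not the X-Play API: an action takes the triggering
# event and returns a short description of what it did.
Action = Callable[[Dict[str, str]], str]


@dataclass
class Playbook:
    name: str
    trigger: str                                  # illustrative event type
    actions: List[Action] = field(default_factory=list)

    def handle(self, event: Dict[str, str]) -> List[str]:
        """Run every action in order if the event matches the trigger."""
        if event.get("type") != self.trigger:
            return []
        return [action(event) for action in self.actions]


def snapshot_vm(event: Dict[str, str]) -> str:    # stands in for a VM action
    return f"snapshot taken of {event['vm']}"


def send_email(event: Dict[str, str]) -> str:     # stands in for a communication action
    return f"email sent about {event['vm']}"


playbook = Playbook("memory-remediation", "vm_memory_alert", [snapshot_vm, send_email])
results = playbook.handle({"type": "vm_memory_alert", "vm": "web01"})
```

Events that do not match the trigger produce no actions, which is why handle returns an empty list in that case.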
Prism Central includes machine-learning capabilities that analyze resource usage over time (XFit) and
provide tools to monitor resource consumption, identify abnormal behavior and guide resource planning.
These tools include:
VM "right sizing" where VMs are analyzed and those that exhibit inefficient profiles are identified.
Anomaly detection to record when performance or resource usage is outside an expected range
based on learned VM baseline behavior.
Smart alerts that are triggered when specified anomalies are recorded, and reports that summarize
cluster efficiency.
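As a rough illustration of baseline-driven anomaly detection, the sketch below flags a sample that falls outside a band learned from past samples. The mean plus or minus three standard deviations rule is an assumption chosen for simplicity; the text does not describe the model Prism Central actually uses, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def baseline_band(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return the (low, high) expected range learned from past samples.

    Assumption: the band is mean +/- k standard deviations.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomaly(value: float, history: List[float], k: float = 3.0) -> bool:
    """True when the new sample falls outside the learned band."""
    low, high = baseline_band(history, k)
    return value < low or value > high


# Illustrative CPU usage samples (percent) for one VM.
cpu_history = [40.0, 42.0, 41.0, 39.0, 43.0, 40.0, 41.0]
```

A smart alert would then be raised only for samples where is_anomaly returns True, rather than for every crossing of a fixed threshold.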
Prism Central requires 21 days of data from a cluster to calculate the baseline runway estimates. No
estimates appear when insufficient data is available. In addition, it takes a day after registering a cluster
for the data to appear in Prism Central. Prism Central checks 90 days of past data for monthly seasonality
and adjusts the baseline accordingly if it finds a seasonality matching pattern.
The Prism Web Console allows you to import ISOs and disk image files for OS and application
installation, and enables you to reuse disk images. This image service allows you to assemble a repository
of image files in different formats (raw, vhd, vhdx, vmdk, vdi, iso, qcow2, and ova) that you can later use
when creating virtual machines. How image creation, updates, and deletions work depends on whether
or not Prism Element is registered with Prism Central.
Images that are imported to Prism Element reside in and can be managed from Prism Element. If
connected to Prism Central, you can migrate your images over to Prism Central for centralized
management. This will not remove your images from Prism Element, but will allow management only in
Prism Central. So, for example, if you want to update a migrated image, it can only be done from Prism
Central, not from Prism Element.
Registration with Prism Central is also useful if you have multiple Prism Element clusters managed by
a single instance of Prism Central. In this scenario, if you upload an image to a local Prism Element
instance, for example, this is what happens:
The image is available locally on that Prism Element instance. (Assuming it has not been migrated
to Prism Central.)
When you create a VM using that image, the image is copied to other Prism Element clusters, is
made active, and is then available for use on all Prism Element clusters managed by that instance
of Prism Central.
Understanding the Image Service
Post-Import Actions,
For more information on how to create a VM from an imported image, see the Prism Web Console Guide
on the Support Portal.
Nutanix provides a mechanism to perform nonintrusive rolling upgrades through Prism. This simplifies the
job of the administrator and results in zero loss of services.
AOS:
Each node in a cluster has a CVM which runs AOS. When upgrading a cluster, all nodes must be
upgraded to the same AOS version.
Nutanix provides a live upgrade mechanism that allows the cluster to run continuously while a rolling
upgrade of the nodes is started in the background. There is no downgrade option.
HYPERVISOR SOFTWARE:
Hypervisor upgrades are provided by vendors such as VMware and qualified by Nutanix. The upgrade
process updates one node in a cluster at a time.
NUTANIX CLUSTER CHECK:
To help maintain cluster health and take advantage of the latest NCC technology, Nutanix
recommends that you keep your NCC version current.
FOUNDATION:
Nutanix Foundation installation software allows you to configure a pre-imaged node, or image a
node with a hypervisor and an AOS of your choice. Ensure that you use the minimum version of
Foundation required by your hardware platform.
Lesson 8: LCM Overview
Introduction to LCM
The Life Cycle Manager (LCM) tracks software and firmware versions of all entities in the cluster.
It performs two functions: taking inventory of the cluster and performing updates on the cluster.
LCM consists of a framework and a set of modules for inventory and update. LCM supports
firmware updates only for the following platforms:
Nutanix (NX)
Dell XC / XC Core
Lenovo HX / HX Ready
HPE DX
Fujitsu XF
Intel DCB
HPE DL (G10)
Inspur InMerge
The LCM framework is accessible through the Prism interface. It acts as a download manager for LCM
modules, validating and downloading module content. All communication between the cluster and LCM
modules goes through the LCM framework.
LCM modules are independent of AOS. They contain libraries and images, as well as metadata and
checksums for security. Currently, Nutanix supplies all modules.
The LCM framework targets a configurable URL to download content from the LCM modules.
You can use LCM to display software and firmware versions of entities in a cluster. Inventory
information for a node is persistent for as long as the node remains in the chassis. When you remove a
node from a chassis, LCM will not retain inventory information for that node. When you return the node
to the chassis, you must perform inventory again to restore the inventory information.
To perform an inventory:
1. Open LCM
2. Navigate to Inventory on the left pane and click Perform Inventory. If you do not have auto-update
enabled, and a new version of the LCM framework is available, LCM will display the following warning:
Enable LCM Auto Inventory. To enable this feature, click Settings and select the Enable LCM Auto
Inventory check box in the dialog box that appears.
Understanding LCM:
Nutanix uses LCM to perform single-click, non-disruptive upgrades. Prism Central provides
software and firmware upgrades from a centralized, easy-to-manage, intuitive control plane. LCM does
the work of managing all upgrade dependencies for software and firmware components.
LCM normalizes firmware upgrades across hardware platforms, and provides a single, unified process
regardless of hardware vendor.
In addition, in Prism Central, an infrastructure inventory shows the software (e.g., AOS, hypervisor,
NCC, Objects, MSP) and firmware (e.g., BMC, BIOS, HBA, disk) versions running across the environment,
including any new versions available for deployment. LCM enables the deployment of multiple
infrastructure upgrade packages, such as AOS and Prism Central, across a distributed cluster.
Nutanix customers can choose from a range of hardware platform vendors. LCM simplifies
firmware upgrades across these different platform types and provides organizations with
a single, unified upgrade solution. The LCM software upgrade experience is the same whether
infrastructure is deployed on-premises or in a public cloud.
Module 4: UNDERSTANDING STORAGE CONCEPTS
Lesson 1: Overview
After completing this module, you will be able to:
Identify components of the AOS Distributed Storage
Identify the storage dashboard
Define storage components
Describe snapshots
Describe redundancy factor and replication factor
Identify space-saving techniques such as deduplication, compression, and erasure coding
AOS Distributed Storage logically divides user VM data into extents, which are 1 MB in size. These extents
may be compressed, erasure coded, deduplicated, snapshotted, or left untransformed. Extents can also
move around: new or recently accessed extents stay on faster storage (the performance tier), while colder
extents move to the capacity tier. AOS uses a “least recently used” algorithm to determine what data can
be declared “cold” and migrated to the capacity tier. Additionally, AOS attempts to maintain data locality for
VM data, so that one copy of each vDisk’s data is available locally from the CVM on the host where the
VM is running.
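The least-recently-used idea behind tiering can be sketched with a toy two-tier store. This is a simplified illustration, not the AOS implementation: the TieredStore class, its slot-count limit, and the promote-on-read behavior are all assumptions made to keep the example small.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded performance tier plus a capacity tier.

    Assumption: the performance tier holds a fixed number of extents and the
    least recently used extent is demoted when that limit is exceeded.
    """

    def __init__(self, performance_slots: int) -> None:
        self.performance_slots = performance_slots
        self.performance = OrderedDict()   # extent id -> data, oldest first
        self.capacity = {}                 # extent id -> data ("cold" extents)

    def _demote_cold(self) -> None:
        # Move least recently used extents down until the tier fits again.
        while len(self.performance) > self.performance_slots:
            extent_id, data = self.performance.popitem(last=False)
            self.capacity[extent_id] = data

    def write(self, extent_id: str, data: bytes) -> None:
        self.capacity.pop(extent_id, None)       # a rewritten extent becomes hot
        self.performance[extent_id] = data
        self.performance.move_to_end(extent_id)  # mark as most recently used
        self._demote_cold()

    def read(self, extent_id: str) -> bytes:
        if extent_id in self.capacity:           # promote a cold extent on access
            self.performance[extent_id] = self.capacity.pop(extent_id)
        self.performance.move_to_end(extent_id)
        data = self.performance[extent_id]
        self._demote_cold()
        return data
```

With two performance slots, writing e1 and e2, reading e1, and then writing e3 leaves e1 and e3 hot and demotes e2, because e2 is the extent that was used least recently.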
AOS Distributed Storage presents all the storage devices in the cluster to the hypervisor as a pool of
storage and provides cluster-wide storage services:
Snapshots
Clones
HA/DR
Deduplication
Compression
Erasure coding
The Controller VMs (CVMs) running on each node combine to form an interconnected network within
the cluster, where every node in the cluster has access to data from shared SSD, HDD, and cloud resources.
The CVMs allow for cluster-wide operations on VM-centric software-defined services: snapshots, clones,
high availability, disaster recovery, deduplication, compression, erasure coding, and so on.
Hypervisors (AHV, ESXi, Hyper-V) and AOS communicate using the industry-standard protocols iSCSI, NFS,
and SMB3.
Nutanix Information Lifecycle Management (ILM) will determine tier placement dynamically
based upon I/O patterns and will move data between tiers.
Note: As of 5.11.1, for AES to be enabled, the node must have a minimum of 8 flash devices, or any number
of flash devices if at least one device is NVMe.
The Oplog–
The oplog is similar to a filesystem journal and is used to service bursts of random write
operations, coalesce them, and then sequentially drain that data to the extent store. For each write
operation, the data is written to disk locally and then synchronously replicated to one or more remote CVM
oplogs before the write is acknowledged, for data availability purposes. The number of replicas depends on the
RF of the container. If the RF is 2, data will be written to one disk locally and replicated to one other oplog.
If the RF is 3, data will be written to one disk locally and replicated to two other oplogs.
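The copy counts described above follow a simple rule: one local copy plus RF minus one remote oplog replicas. A minimal sketch of that arithmetic, with a hypothetical function name and limited to the two RF values the text covers:

```python
def oplog_write_targets(rf: int) -> dict:
    """Return how many copies a single oplog write produces for a given RF.

    Only RF 2 and RF 3 are handled because those are the cases described
    in the text; any other value is rejected.
    """
    if rf not in (2, 3):
        raise ValueError("this sketch only covers RF 2 and RF 3")
    return {"local_copies": 1, "remote_oplog_replicas": rf - 1}
```

The write is acknowledged only after all of these copies exist, which is what makes the replication synchronous.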
All CVMs participate in oplog replication. Individual replica location is dynamically chosen based
upon load. The oplog is stored on the performance tier on the CVM to provide extremely fast write I/O
performance. Oplog storage is distributed across the SSD devices attached to each CVM.
For sequential workloads, the oplog is bypassed and the writes go directly to the extent store.
If data is currently sitting in the oplog and has not been drained, all read requests for that data will be
fulfilled directly from the oplog until it has been drained, after which they are served by the extent
store/unified cache.
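The write and read paths described above can be summarized in a toy model. This is only a sketch: the IoPath class is hypothetical, and the sequential flag is passed in by the caller because the text does not say how AOS classifies a workload as sequential.

```python
class IoPath:
    """Toy model of the oplog and extent store write/read paths."""

    def __init__(self) -> None:
        self.oplog = {}          # undrained random writes
        self.extent_store = {}   # persistent data

    def write(self, key: str, data: bytes, sequential: bool = False) -> None:
        if sequential:
            # Sequential writes bypass the oplog and go straight to the extent store.
            self.extent_store[key] = data
            self.oplog.pop(key, None)
        else:
            # Random writes land in the oplog first.
            self.oplog[key] = data

    def drain(self) -> None:
        # Move everything buffered in the oplog to the extent store.
        self.extent_store.update(self.oplog)
        self.oplog.clear()

    def read(self, key: str):
        # Undrained data is served from the oplog, everything else from the extent store.
        if key in self.oplog:
            return "oplog", self.oplog[key]
        return "extent store", self.extent_store[key]
```

Reading a key before and after drain shows the source of the data switching from the oplog to the extent store while the data itself stays the same.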
For containers where fingerprinting has been enabled, all write I/Os will be fingerprinted using a
hashing scheme allowing them to be deduplicated based on the fingerprint stored as part of the metadata
in Cassandra.
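Fingerprint-based deduplication can be sketched as a content-addressed store. The example below is an illustration only: the text says "a hashing scheme" without naming it, so the use of SHA-1 here is an assumption, and a plain dictionary stands in for the metadata that the text says is kept in Cassandra.

```python
import hashlib


class FingerprintStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks = {}     # fingerprint -> data, stored once
        self.refcount = {}   # fingerprint -> number of writes referencing it

    def write(self, data: bytes) -> str:
        # Assumption: SHA-1 as the fingerprint function.
        fingerprint = hashlib.sha1(data).hexdigest()
        if fingerprint not in self.chunks:
            self.chunks[fingerprint] = data      # first copy is stored
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1
        return fingerprint

    def read(self, fingerprint: str) -> bytes:
        return self.chunks[fingerprint]
```

Writing the same chunk twice returns the same fingerprint and stores the data once, which is where the capacity saving comes from.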
The file system automatically tiers data across different types of storage devices using intelligent
data placement algorithms. These algorithms make sure that the most frequently used data is available
in cache for the fastest possible performance.
In addition to the main menu (see Main Menu), the Storage screen includes a menu bar with a number of
options.
VIEW SELECTOR
ACTION BUTTONS
Used to add a volume group or storage container.
PAGE SELECTOR
In the Table view, hosts and disks are listed 10 per page.
In the Table view, you can export the table information to a file in either CSV or JSON format.
The Storage Overview view displays storage-specific performance and usage statistics on the left
and the most recent storage-specific alert and event messages on the right.
The Storage Overview view has a number of fields.
Storage Diagram View
The Storage Diagram view displays information about storage pools and storage containers. The
displayed information is dynamically updated to remain current.
Selecting a storage container in the diagram displays information about that storage container in
the lower section of the screen. It has four tabs that display information about the selected storage
container: Storage Container Usage, Storage Container Performance, Storage Container Alerts,
and Storage Container Events.
Selecting a storage pool in the diagram displays information about that storage pool in the lower
section of the screen. It has four tabs that display information about the selected storage pool:
Storage Pool Usage, Storage Pool Performance, Storage Pool Alerts, and Storage Pool Events.
Storage Table View:
The Storage Table view displays information about volume groups, storage pools, and storage
containers in a tabular form.
Lesson 4: Storage Components
Storage Pool–
A storage pool is a group of physical storage devices for the cluster including PCIe SSD, SSD, NVMe,
and HDD devices. The storage pool spans multiple nodes and scales as the cluster expands. A storage
device can only be a member of a single storage pool. Nutanix recommends creating a single storage pool
containing all disks within the cluster.
The command ncli sp ls displays existing storage pools.
Storage Container–
A storage container is a logical overlay of the pool of drives available in the cluster. Storage
containers enable an administrator to apply rules or transformations such as compression to a data set.
They hold the virtual disks (vDisks) used by virtual machines. Selecting a storage pool for a new storage
container defines the physical disks where the vDisks are stored.
The NutanixManagementShare storage container is a built-in storage container for Nutanix
clusters for use with the Nutanix Files and Self-Service Portal (SSP) features. This storage container is used
by Nutanix Files and SSP for file storage, feature upgrades, and other feature operations. Nutanix
recommends that you do not delete this storage container even if you are not using these features. The
NutanixManagementShare storage container is not intended to be used as storage for vDisks, including
Nutanix Volumes.
A SelfServiceContainer storage container is created on the target cluster and used by Prism Self
Service for storage and other feature operations. To ensure proper operation of these features, do not
delete this storage container.
Volume Group–
A volume group is a collection of logically related virtual disks or volumes. It is attached to one or
more execution contexts (VMs or other iSCSI initiators) that share the disks in the volume group. You can
manage volume groups as a single unit.
Each volume group contains a UUID, a name, and an iSCSI target name. Each disk in the volume group also
has a UUID and a LUN number that specifies ordering within the volume group. You can include volume
groups in protection domains configured for asynchronous data replication (Async DR) either exclusively
or with VMs.
Volume groups cannot be included in a protection domain configured for Metro Availability, in a
protected VStore, or in a consistency group for which application consistent snapshotting is enabled.
vDisk–
A vDisk is a subset of available storage within a storage container that provides storage to virtual
machines. A vDisk is any file over 512 KB on DSF, including VMDKs and VM disks. vDisks are broken up
into extents, which are grouped and stored on physical disk as an extent group.
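To make the vDisk-to-extent relationship concrete, the sketch below maps a byte offset within a vDisk to an extent, using the 1 MB extent size stated earlier in this module. The function names are hypothetical, 1 MB is taken as 1024 × 1024 bytes (an assumption), and the grouping of extents into extent groups is left out because the text gives no group size.

```python
EXTENT_SIZE = 1024 * 1024  # 1 MB extents; binary megabyte is an assumption


def extent_for_offset(offset_bytes: int) -> tuple:
    """Return (extent index, offset within that extent) for a vDisk byte offset."""
    return divmod(offset_bytes, EXTENT_SIZE)


def extents_needed(vdisk_bytes: int) -> int:
    """Number of extents needed to cover a vDisk of the given size (rounded up)."""
    return -(-vdisk_bytes // EXTENT_SIZE)
```

For example, byte offset 2.5 MB of a vDisk falls in the third extent (index 2), half a megabyte in.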
Note: The Nutanix platform now allows you to migrate a vDisk from one storage container to another
while it is attached to a guest VM, without needing to shut down or delete that VM.
Datastore–
A datastore is a hypervisor construct that provides a logical container for files necessary for VM
operations. In the context of the DSF, each container on a cluster is a datastore.
A snapshot is a copy of a system's state (files and data) at a specific point in time. Nutanix DR
solutions create snapshots of virtual machines, containers (for Metro Availability), or the hypervisor
itself. By using a redirect-on-write (ROW) implementation for its snapshots, Nutanix reduces the
performance impact associated with other implementations.
Snapshots are created locally and stored locally or remotely. Snapshots are created based on time
schedules that you construct. Each Nutanix DR solution permits time schedules with certain frequencies for
the creation of snapshots. The frequency of creating snapshots also determines the resources, such as
memory, that are required to run the DR solution successfully.
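Redirect-on-write can be illustrated with a toy disk whose block map points at stored blocks that are never overwritten. This is a generic sketch of the technique rather than the Nutanix implementation, and the RowDisk class and its methods are hypothetical.

```python
class RowDisk:
    """Toy redirect-on-write disk: a block map points at immutable stored blocks."""

    def __init__(self) -> None:
        self.storage = []     # append-only list of block contents
        self.block_map = {}   # logical block number -> index into storage

    def write(self, lbn: int, data: bytes) -> None:
        # New data always goes to a new location; existing blocks are untouched.
        self.storage.append(data)
        self.block_map[lbn] = len(self.storage) - 1   # redirect the live map

    def snapshot(self) -> dict:
        # A snapshot copies only metadata (the block map), never the data.
        return dict(self.block_map)

    def read(self, lbn: int, block_map=None) -> bytes:
        # Read through the live map, or through a snapshot's frozen map.
        active_map = self.block_map if block_map is None else block_map
        return self.storage[active_map[lbn]]
```

Because a write after the snapshot only redirects the live map, the snapshot keeps reading the old block without any data having been copied at snapshot time.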
Clones:
Clones (essentially writable snapshots) are closely related to snapshots. Distributed storage uses
the same underlying mechanism for cloning that it does for snapshots, so it benefits from the same
metadata optimizations.
Lesson 6: Redundancy Factor (Fault Tolerance)
Before we can talk about replication, we need to understand redundancy factor. So, what is
redundancy factor?
Redundancy factor is the number of components that can be down in your cluster at any time, plus one.
By default, Nutanix clusters have redundancy factor 2, which means they can tolerate the failure of a
single node or drive. The larger the cluster, the more likely it is to experience multiple failures. Redundancy
factor 3 is a configurable option that allows a Nutanix cluster to withstand the failure of two nodes or
drives in different blocks. Without redundancy factor 3, multiple failures cause cluster unavailability until
the failures are repaired.
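The definition above reduces to simple arithmetic: the number of simultaneous failures a cluster tolerates is its redundancy factor minus one. A minimal sketch, with a hypothetical function name:

```python
def failures_tolerated(redundancy_factor: int) -> int:
    """Number of simultaneous node or drive failures a cluster can withstand."""
    if redundancy_factor < 1:
        raise ValueError("redundancy factor must be at least 1")
    return redundancy_factor - 1
```

This gives one tolerated failure for redundancy factor 2 and two for redundancy factor 3, matching the descriptions above.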
The replication factor is directly tied to the redundancy factor configured on the cluster. So, what
is replication factor?
It is the number of copies of any piece of data that will be maintained in a cluster. Replication
factor is set at the container level. So, how do you define the replication factor for a container?
PRISM UI
In Prism, open container properties and update, or set when creating a container.
NCLI
Use the 'RF' option
Replication Factor 2 (RF2)
With RF2, two copies of the data are maintained in the cluster.
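Keeping multiple copies has a direct capacity cost. As a rough estimate, usable capacity is raw capacity divided by the replication factor. The sketch below is an approximation under stated assumptions: it ignores CVM and metadata overhead as well as any savings from deduplication, compression, or erasure coding, and the function name is hypothetical.

```python
def approx_usable_capacity(raw_tib: float, replication_factor: int) -> float:
    """Rough usable capacity when every piece of data is stored RF times.

    Assumption: no overheads and no space-saving features are considered.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tib / replication_factor
```

Under these assumptions, 100 TiB of raw capacity yields about 50 TiB usable with RF2.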
The Nutanix platform uses replication factor (RF) and checksums to ensure data redundancy and
availability in the case of a node or disk failure or corruption.
Nutanix Availability Domain is a unique feature of the Nutanix hyperconverged platform that
maintains the redundancy of data and components using Fault Tolerance (FT) and Replication Factor (RF).
Availability Domains determine the optimal placement for replicas based upon infrastructure layout. With
Availability Domains, while unlikely, you can lose a full Nutanix block (one or multiple nodes) and still have
copies of the data available. There is no admin interaction required to enable this. There are also silent
data-integrity checks to ensure data corruption issues are identified and fixed even before the VMs/Apps
notice it. A “domain” is a group of possible replacements for a component variable, like a disk, node, block,
or rack. It is calculated as a set of possible component values which will keep the cluster functional should
one of these components become unavailable.
Nutanix Availability Domains provide data resiliency by maintaining multiple copies of data, which
prevents loss of user data when components fail unexpectedly. The critical components of a Nutanix cluster are:
Data–
Data is the most critical component of awareness. Acropolis Distributed Storage ensures that
the VM’s data has copies written to other disks/nodes/blocks/racks within the Nutanix cluster. That way,
in case of a disk/node/block/rack failure, the data remains available without any damage or corruption,
and the cluster stays up.
Metadata–
Metadata describes where and how data is stored in a file system, letting the system know on which node,
disk, and in what form the data resides. The Acropolis Distributed Storage metadata store, internally called
Cassandra, is a NoSQL key-value store built on top of a heavily modified Apache Cassandra database.
Medusa is the AOS service which acts as an interface to Cassandra and runs in the CVM.
Lesson 9: Deduplication
Deduplication is a process that eliminates redundant data and reduces storage overhead. Deduplication
can be combined with compression and erasure coding on the same storage container to optimize
capacity efficiency.
Nutanix recommends using inline compression (compression delay = 0), because it compresses
only large/sequential writes and does not affect random write performance. This also increases the usable
size of the performance tier, increasing effective performance and enabling more data to sit in the
performance tier.
For sequential data that is written and compressed inline, the RF copy of the data is compressed
before transmission, further increasing performance since it is sending less data across the network.
Inline compression also pairs well with erasure coding. In general, a compression algorithm may represent a
string of bits with a smaller string of 0s and 1s by using a dictionary for the conversion between them, or
it may insert a reference or pointer to a string of 0s and 1s that the program has already seen.
Text compression can be as simple as removing all unneeded characters, inserting a single repeat
character to indicate a string of repeated characters, and substituting a smaller bit string for a frequently
occurring bit string. Compression can reduce a text file to 50% or a significantly higher percentage of its
original size.
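The "single repeat character" idea mentioned above is known as run-length encoding. The sketch below is a generic illustration of that technique, not the compression algorithm AOS uses, and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(text: str) -> List[Tuple[str, int]]:
    """Collapse each run of repeated characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original text."""
    return "".join(char * count for char, count in pairs)
```

Run-length encoding pays off only when the input contains long runs; text with few repeats can end up larger after encoding, which is one reason real compressors combine several techniques.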
To balance availability against the amount of storage required, DSF
provides the ability to encode data using erasure coding (EC-X). Similar to the concept of RAID-5 and
RAID-6, where parity is calculated, EC-X encodes a strip of data blocks on different nodes and calculates parity.
In the event of a host or disk failure, the parity can be used to calculate any missing data blocks
(decoding). In the case of DSF, the data block is an extent group, and each data block must be on a different
node and belong to a different vDisk.
Before EC-X
If you have configured redundancy factor 2, two copies of the data are maintained. For example,
consider a 6-node cluster with 4 data blocks (a b c d) configured with redundancy factor 2.
After EC-X
When the data becomes cold, the erasure code engine computes parity “P” for the data by
performing an exclusive OR operation. Once parity is computed, the data block copies are removed and
replaced with the parity information. Redundancy through parity results in data reduction because the
total data on the system is now a+b+c+d+P instead of 2 × (a+b+c+d).
Note: Each block in the stripe is placed on a separate node to protect from a single node failure.
If the node containing a data block fails, the block is rebuilt using the rest of the erasure coded stripe. The
block is then placed on a node that does not have any other members of this erasure coded stripe.
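The exclusive OR parity described above can be demonstrated in a few lines. This is a minimal sketch of single-parity XOR encoding and rebuild using the four blocks a, b, c, d from the example. The block contents are made up, the helper name is hypothetical, and block placement across nodes is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four illustrative data blocks, as in the a b c d example.
a, b, c, d = b"AAAA", b"BBBB", b"CCCC", b"DDDD"

# Encoding: compute the parity block P for the stripe.
parity = xor_blocks([a, b, c, d])

# Decoding: if the node holding c fails, rebuild c from the rest of the stripe.
rebuilt_c = xor_blocks([a, b, d, parity])

# Space used, counted in blocks: 2 x (a+b+c+d) before EC-X, a+b+c+d+P after.
blocks_before_ecx = 2 * 4
blocks_after_ecx = 4 + 1
```

The rebuild works because XOR is its own inverse: combining the parity with the surviving blocks cancels them out and leaves the missing block. In this example the stripe shrinks from 8 stored blocks to 5.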
Module 5: MANAGING VMs WITH PRISM CENTRAL
Lesson 1: Overview
You can use Prism Central to create virtual machines (VMs) in Acropolis-managed clusters. You
can create a VM with Prism by clicking the Create VM button. The Create VM wizard appears. You need
to update information in the different tabs in the wizard by specifying a name, description, cluster,
number of VMs, storage, and network information.
Prism also has self-service capabilities that enable administrators or project members with the
required permissions to create VMs. In this scenario, users will select from a list of pre-defined templates
for VMs and disk images to create their VM.
Finally, VMs can be updated after creation, cloned, or deleted as required. When updating a VM,
you can change compute details (vCPUs, cores per vCPU, memory), storage details (disk types and
capacity), as well as other parameters that were specified during the VM creation process.
For additional information, refer to the Creating a VM section of the Prism Central Guide on the
Nutanix Support Portal.
This process is slightly different from creating a VM with administrative permissions because
self-service VMs are based on a source file stored in the Prism Central catalog. To create a VM
using Prism Self-Service:
1. In Prism Central, navigate to VM dashboard, click the List tab, and click Create VM.
2. Select the source for the VM. Here, you can choose to create a VM from a template in the catalog
or from a mounted disk image.
3. In the Browse Catalog tab, select either the target VM template or one or more disk images.
4. In the Deploy VM tab, enter a name, select a target project, select a network, and
(optionally) assign categories to the VM.
If you are using a VM template, in the Deploy VM tab, you can also choose the device to boot
from, view information about the VM template itself, and view guest customization
information.
If you are using disk images, in the Deploy VM tab, you can add a new disk or a CD-ROM drive,
and choose the device to boot from.
5. After all the fields have been updated and verified, click Save to create the VM.
Nutanix VirtIO
Nutanix VirtIO is a collection of drivers for paravirtual devices that enhance the stability and performance
of virtual machines on AHV.
Nutanix VirtIO drivers:
The VMs dashboard summary view displays information about VMs across the registered clusters
and allows you to access detailed information about each VM. The dashboard includes five tabs: Summary,
List, Alerts, Events, and Metrics.
The Summary Tab
Clicking the List tab, which appears by default when you first open the page, displays a list of the
VMs across the registered clusters. You can filter the VMs list based on a variety of parameter values. To
apply a filter, select a parameter and check the box of the desired value (or multiple values) you want to
use as a filter.
The Alerts Tab
The Alerts tab displays a table of alerts. This tab provides the same features and options as the
Alerts dashboard, except it is filtered to display just VM-related alerts across the registered clusters.
The Events Tab
The Events tab displays a table of events. This tab provides the same features and options as the
Events dashboard, except it is filtered to display just VM-related events across the registered clusters.
The Metrics Tab
The Metrics tab allows you to view performance metrics across the VMs. Clicking the Metrics tab
displays a list of available metrics; click the metric name to display the relevant performance information
to the right.
Lesson 4: Managing a VM
You can manage VMs directly from Prism Central when the hypervisor is either ESXi or AHV.
Multiple options are available when managing VMs, such as updating the VM configuration, deleting
the VM, cloning the VM, launching a console window, updating the network configuration, and more.
You can perform these tasks by using any of the following methods:
Select the target VM in the List tab of the VMs dashboard and choose the required action from
the Actions menu.
Right-click on the target VM in the List tab of the VMs dashboard and select the required action
from the drop-down list.
Go to the details page of a selected VM and select the desired action.
Modifying a VM's Configuration:
1. Select the VM and click Update.
2. The Update VM dialog box includes the same fields as the Create VM dialog box. Make the
required changes and click Save.
Deleting a VM:
1. Select the VM and click Delete.
2. A confirmation prompt will appear; click OK to delete the VM.
Cloning a VM:
1. Select the VM and click Clone.
2. The Clone VM dialog box includes the same fields as the Create VM dialog box. However, all fields
will be populated with information based on the VM that you are cloning.
3. Either enter a name for the cloned VM and click Save, or change the information in some of the
fields as desired, and then click Save.
Exporting a VM as an OVA:
1. Select the VM and click Export as OVA.
2. In the Export as OVA window enter a name for the OVA, select the disk format, and then click
Export.
3. The default format is QCOW2. If you want to use the OVA to deploy a VM with disks formatted as
VMDK, then select VMDK as the disk format.
Categories allow you to implement a variety of policies across entity groups, and Prism Central
allows you to quickly view any established relationships.
A category is a grouping of entities into a key-value pair. Typically, new entities are assigned to a
category based on some criteria. Policies can then be tied to those entities that are assigned (grouped by)
a specific category value.
For example, you might have a Department category that includes values such as engineering,
finance, and HR. In this case, you could create one backup policy that applies to engineering and HR and
a separate (more stringent) backup policy that applies to just finance.
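The Department example can be modeled as a short Python sketch. The VM names, policy names, and the policies_for function are hypothetical and exist only to show how a key-value category can select which policy applies to an entity; this is not the Prism Central API.

```python
# Hypothetical VM inventory: each VM carries category key-value pairs.
vms = {
    "vm-eng-01": {"Department": "engineering"},
    "vm-fin-01": {"Department": "finance"},
    "vm-hr-01": {"Department": "HR"},
}

# Hypothetical policies, each tied to a category key and a set of values.
policies = {
    "standard-backup": ("Department", {"engineering", "HR"}),
    "strict-backup": ("Department", {"finance"}),
}

def policies_for(vm_categories):
    """Return the names of all policies whose category matches the VM."""
    return sorted(
        name
        for name, (key, values) in policies.items()
        if vm_categories.get(key) in values
    )

assignments = {vm: policies_for(cats) for vm, cats in vms.items()}
```

In this sketch, assigning a new VM the Department value finance is enough for the stricter policy to apply to it; no per-VM policy configuration is needed.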
Affinity and Anti-affinity
Nutanix Guest Tools (NGT) is an in-guest agent framework that enables advanced VM
management functionality through the Nutanix Platform. NGT includes the following components:
The NGT Installer, which allows you to install NGT in a guest VM.
The Nutanix Guest Agent (NGA) Service, which maintains a communication channel between the
Nutanix CVM and guest VMs.
The Nutanix VirtIO Package, which includes the Nutanix VM mobility drivers that enable VM
migration, in-place hypervisor conversion, and CHDR.
Lesson 7: NGT Requirements and Limitations
You must configure the cluster virtual IP address on the Nutanix cluster. If the virtual IP address
of the cluster changes, it will impact all the NGT instances that are running in your cluster. For
more information, see the Impact of Changing Virtual IP Address of the Cluster section of the
Prism Web Console Guide on the Support Portal.
VMs must have at least one empty IDE CD-ROM or SATA slot to attach the ISO.
The following ports must be open between the Controller VMs and guest VMs on which NGT is
installed:
Controller VM: TCP port 2074 must be open in the Controller VM so that user VMs can
communicate with the Controller VM.
User VM: TCP port 23578 if you want to use the VSS service.
The hypervisor should be ESXi 5.1 or later, or AHV 20160215 or later.
Guest VMs must be connected to a network that can be accessed by using the virtual IP address
of the cluster.
In AOS 5.18 and later versions, NGT backwards compatibility between CVMs and guest VMs is
supported. So, for example, if your CVMs are running one version of NGT while your guest VMs are running
a higher/later version, you can still continue to use NGT.
For Windows Server Edition VMs, ensure that Microsoft VSS service is enabled before installing
NGT.
When you connect a VM to a volume group (VG), NGT captures the IQN of the VM and stores the
information. If you change the VM IQN before the NGT refresh cycle occurs and you take a
snapshot of the VM, NGT cannot provide auto-restore capability because the snapshot operation
cannot capture the VM-VG connection. As a workaround, manually restart the Nutanix guest
agent service to update NGT: on a Linux VM, run the sudo service ngt_guest_agent restart
command; on a Windows VM, restart the service from the Services tab.
To verify whether an operating system is supported for a specific NGT feature, see the supported
operating system information for that feature.
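The port requirements above can be checked from a guest VM with a generic TCP connection test. The following Python sketch uses only the standard library. The function name tcp_port_open is hypothetical, and you would supply your own cluster virtual IP address; nothing here is part of NGT itself.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

NGT_CVM_PORT = 2074   # Controller VM port, from the requirement above
NGT_VSS_PORT = 23578  # user VM port, needed only for the VSS service
```

For example, calling tcp_port_open with the cluster virtual IP address and NGT_CVM_PORT from a guest VM indicates whether the guest can reach the Controller VM on that port. A False result can mean a closed port, a firewall rule, or a routing problem, so treat it as a starting point for troubleshooting.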
Requirements by Operating System:
By default, the NGT feature is disabled for a VM running in a Nutanix cluster. To install and use
the NGT feature in a VM, you must first enable the NGT feature (allow the installation and usage of NGT)
in a VM and then mount the NGT installer (ISO disk file) in that VM by using the Prism Element web
console.
When you are enabling the NGT feature and mounting the NGT installer in a VM, you must also
select the NGT applications (self-service restore, volume snapshot service, and application consistent
snapshots) that you want to use in that VM.
After you select the Enable Nutanix Guest Tools check box, the VSS and application-consistent
snapshot applications are automatically selected. With these applications enabled, the Nutanix native in-
guest VmQuiesced Snapshot Service (VSS) agent is used to take application consistent snapshots for all
the VMs that support VSS. This mechanism takes application-consistent snapshots without any VM stuns
(temporarily unresponsive VMs) and also enables third-party backup providers like Commvault and Rubrik
to take application-consistent snapshots on the Nutanix platform regardless of which hypervisor is used
in the cluster.
Prism Element now enables the NGT feature, mounts the NGT installer, and attaches a CD (ISO
disk file) with the volume label NUTANIX_TOOLS to the selected VM.
Upgrading NGT–
If you have upgraded your AOS, you can upgrade NGT to the latest version. NGT is not
automatically upgraded. You can easily upgrade NGT on your VMs by reinstalling the NGT package on the
VM.
Note: If the Nutanix VM Mobility driver version in AHV is later than the one installed in your VM,
the driver is upgraded during the installation of NGT, and a prompt asks you to restart the VM.
Restart the VM to complete the mobility driver upgrade.
Reconfiguring NGT–
If you reconfigure the cluster IP address, NGT loses connection with the CVM. You must then
reconfigure NGT to establish the connection again. This can be accomplished by mounting NGT on the
VM, then either restarting the VM or restarting the Nutanix Guest Agent service.
This will force NGA to fetch the latest configuration (new cluster IP address) mounted in the guest
VM. The guest VM can now use the new IP address to communicate with the cluster.
The Nutanix Enterprise Cloud Computing Platform fully supports live migration of VMs, whether
initiated manually or through an automatic process. All hosts within the cluster have visibility into shared
Nutanix datastores through the Controller VMs. Guest VM data is written locally and is also replicated on
other nodes for high availability.
Live migration can be performed on VMs both with and without virtual GPUs enabled. If a VM
with a vGPU is migrated, the GPU will continue to run while the VM itself is migrated in the background.
If you migrate a VM to another host, read requests are sent to a local copy of the data (if it exists).
Otherwise, the request is sent across the network to a host that does contain the requested data. As
remote data is accessed, the remote data is copied to storage devices on the current host, so that future
read requests can be local.
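The read path after a migration can be sketched as a toy model in Python. The Host class, the read_extent function, and the extent identifiers are hypothetical; the sketch shows only the local-first read and the copy-on-remote-read behavior described above, not the actual CVM implementation.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.extents = {}  # extent id -> data held on this host's local storage

def read_extent(current_host, extent_id, cluster):
    """Serve a read for a VM running on current_host.

    Returns (data, source), where source is "local" or "remote".
    """
    if extent_id in current_host.extents:
        return current_host.extents[extent_id], "local"
    for host in cluster:
        if host is not current_host and extent_id in host.extents:
            data = host.extents[extent_id]
            # Copy the remote data locally so later reads stay on this host.
            current_host.extents[extent_id] = data
            return data, "remote"
    raise KeyError(extent_id)

host_a, host_b = Host("A"), Host("B")
host_a.extents["e1"] = b"payload"

# The VM has just migrated from host A to host B.
first = read_extent(host_b, "e1", [host_a, host_b])
second = read_extent(host_b, "e1", [host_a, host_b])
```

In this model, the first read after the migration crosses the network, and the second read of the same data is served locally.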
In VM clusters, VMs migrate from host to host within the cluster throughout the day and over time
in order to optimize CPU and memory resources. Because distributed storage serves data locally to guest
VMs, the VM’s data must follow when it moves between hosts.
Due to the distributed and scalable nature of the Nutanix architecture, distributed storage keeps
data as close to the VM as possible to provide the fastest performance and minimize both cross talk and
network utilization. When a VM moves from one hypervisor node to another (or during a HA event), the
newly migrated VM’s data is served by the now local CVM.
Lesson 12: VM High Availability
VM High Availability (HA) is a feature built to ensure VM availability in the event of a host or block
outage. In the event of a host failure, the VMs previously running on that host will be restarted on other
healthy nodes throughout the cluster. The Acropolis leader is responsible for restarting the VM(s) on the
healthy host(s).
In segment-based reservation, the cluster is divided into segments to ensure enough space is
reserved for any host failure. Each segment corresponds to the largest VM that is guaranteed to be
restarted in case the failure occurs. The other factor is the number of host failures that can be tolerated.
Using these inputs, the scheduler implements admission control to always have enough resources
reserved so that the VMs can be restarted upon failure of any host in the cluster.
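The idea behind admission control can be illustrated with a deliberately simplified Python sketch. It is not the Acropolis scheduler's algorithm. It considers memory only, compares totals rather than packing individual VMs into segments, and uses hypothetical host names, capacities, and function names.

```python
from itertools import combinations

def survives_failures(hosts, failures_to_tolerate=1):
    """Check that VMs from any failed hosts fit in the free memory of the rest.

    hosts maps a host name to {"capacity_gb": int, "vms_gb": [int, ...]}.
    This compares totals only and ignores how individual VMs pack onto hosts.
    """
    for failed in combinations(hosts, failures_to_tolerate):
        displaced = sum(sum(hosts[h]["vms_gb"]) for h in failed)
        free = sum(
            spec["capacity_gb"] - sum(spec["vms_gb"])
            for name, spec in hosts.items()
            if name not in failed
        )
        if displaced > free:
            return False
    return True

def admit(hosts, target, vm_gb, failures_to_tolerate=1):
    """Admit a new VM on target only if the failure guarantee still holds."""
    if sum(hosts[target]["vms_gb"]) + vm_gb > hosts[target]["capacity_gb"]:
        return False
    hosts[target]["vms_gb"].append(vm_gb)
    if survives_failures(hosts, failures_to_tolerate):
        return True
    hosts[target]["vms_gb"].pop()  # roll back: reservation would be violated
    return False

cluster = {
    "host-1": {"capacity_gb": 64, "vms_gb": [32]},
    "host-2": {"capacity_gb": 64, "vms_gb": [32]},
    "host-3": {"capacity_gb": 64, "vms_gb": [32]},
}
small_ok = admit(cluster, "host-1", 8)    # still restartable after any one failure
large_ok = admit(cluster, "host-2", 32)   # would leave too little spare memory
```

In this example the 8 GB VM is admitted, while the 32 GB VM is refused even though host-2 has room for it, because the VMs on host-2 could no longer be restarted elsewhere if that host failed.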
Module 6: MONITORING VMs & CLUSTER HEALTH
Lesson 1: Overview
Wouldn’t it make sense to take this same philosophy and apply it to your infrastructure? Distributed
systems are extremely complex. It’s not a matter of if a failure will occur, but a matter of when.
Nutanix software includes a variety of features to proactively identify and fix issues related to data
consistency and integrity, bit rot failures, and hard disk corruption.
CORRECTIVE ACTION
Nutanix Failed Disk Drive: Recovery starts immediately.
Nutanix Node Failure: Recovery is started in 60 seconds.
LOW IMPACT
Nutanix prioritizes internal replication traffic. Each node has a queue called Admission Control.
VM I/O (front-end adapter) and maintenance tasks have a 75/25 split on each node in the cluster.
When a drive or node goes down, the metadata is quickly scanned to see what workloads have
been affected. This work is evenly distributed throughout the cluster.
The replication tasks are queued into a cluster-wide background task scheduler and trickle fed to
the nodes as their resources permit.
Nutanix provides a range of status checks to monitor the health of a cluster. Information about various
entities, such as VMs, hosts, disks, storage pools, storage containers, protection domains, remote sites,
and services, is continuously gathered and displayed in Prism.
Summary health status information for the entities listed above is displayed on the Home
dashboard.
In depth health status information is available through the Health dashboard.
Health Dashboard
The Health dashboard in Prism Element displays dynamically updated health information about
VMs, hosts, disks, storage pools, storage containers, protection domains, remote sites, and cluster
services in the cluster. To view the Health dashboard, select Health from the drop down menu on the left
of the main menu.
SUMMARY TAB
The Summary tab provides a summarized view of all the health checks according to check status
(Passed, Failed, Warning, Error, and Off) and check type (Scheduled, Not Scheduled, and Event Triggered).
CHECKS TAB
The Checks tab provides information about individual checks. Hovering the cursor over an entry
displays more information about that health check. You can filter checks by clicking the appropriate field
type and clicking Apply.
ACTIONS TAB
The Actions tab provides you with options to manage checks, set NCC frequency, run checks, and
collect logs.
A set of automated health checks are run regularly. They provide a range of cluster health
indicators. You can specify which checks to run and configure the schedules for the checks and other
parameters.
Cluster health checks cover a range of entities including AOS, hypervisor, and hardware
components. A set of checks are enabled by default, but you can run, disable, or reconfigure any of the
checks at any time to suit your specific needs.
To configure health checks, from the Actions menu on the Health dashboard, click Manage
Checks.
The displayed screen lists all checks that can be run on the cluster, divided into categories
including CVM, Cluster, Data Protection, File Server, Host, and so on. Sub-categories include CPU, disk,
and hardware for CVMs; Network, Protection Domains, and Remote Sites for Clusters; CPU and disk for
hosts; and so on.
When you select a check from the left pane, you will see a number of options. The options presented
depend on the check itself, and can include:
Viewing a history of all entities evaluated by this check, displayed in the middle of the screen.
Running the check.
Turning the check off.
Updating the alert policy associated with the check, if any.
Setting a schedule for the check.
Viewing causes and resolutions, as well as supporting reference articles on the Nutanix Knowledge
Base.
Logs for your Nutanix cluster and its various components can be collected directly from the
Prism web console. Logs can be collected for a variety of tags that are available in the system. Some
examples are in the image below. The most common scenarios in which you will need to collect logs are
when troubleshooting an issue, or when you need to provide information for a Nutanix Support case.
1. On the Health dashboard, click Actions on the right pane and select Log Collector.
2. On the Node Selection page, choose the nodes for which you want to collect logs and click Next.
3. On the Log Settings page, you can choose to collect all logs or select specific tags for which you
want to collect logs. Click Next.
4. Select the period for which you want to collect logs, using the Duration, Cluster Date, and Cluster
Time fields.
5. Select a destination for the downloaded logs. Logs can be downloaded locally, made available via
Nutanix Support FTP or SFTP, or sent to a custom server. You can also choose to anonymize the
output, if needed.
6. Click Collect.
The Analysis dashboard in Prism Central allows you to create sessions with charts that can
dynamically monitor a variety of performance measures.
Analysis Session–
When you open the Analysis dashboard, it displays the session that you were last working in as
the default session. A session helps you correlate the metrics with the alerts and events for
troubleshooting. You can create new sessions to troubleshoot new situations and preserve the old
sessions created for old issues.
Switch Session–
A dropdown menu of all available sessions created and saved. Select the session you want to view
and modify the details for.
Alerts and Events Monitor–
Displays the alerts and events that occurred. The alerts and events occurring at any specific point of
time are displayed as a stacked bar with three colored segments. The hover image of the stacked bar
displays the following:
Time range for the stacked bar.
Red segment depicting Critical Alert with the number of alerts.
Yellow segment depicting Warning Alert with the number of alerts.
Grey segment depicting Events with the number of events.
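The counts behind each stacked bar can be pictured with a small Python sketch. The record format, the kind labels, and the bucket_counts function are hypothetical; the sketch shows only how alerts and events that fall in the same time range are tallied into three segments.

```python
from collections import Counter

def bucket_counts(records, bucket_seconds):
    """Group records into time buckets and count them per segment.

    records is an iterable of (timestamp_seconds, kind) where kind is
    "critical", "warning" or "event".
    """
    buckets = {}
    for timestamp, kind in records:
        start = timestamp - (timestamp % bucket_seconds)
        buckets.setdefault(start, Counter())[kind] += 1
    return buckets

sample = [(10, "critical"), (20, "event"), (50, "event"), (70, "warning")]
bars = bucket_counts(sample, bucket_seconds=60)
```

Each key in bars is the start of a time range, and each value holds the counts that would be shown for the red, yellow, and grey segments of that bar.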
Actions–
Provides the following three options for the current session:
Edit Session Details
Delete (session)
Close (session)
The Alerts dashboard in Prism Central allows you to view summary information about alert
messages across the registered clusters, access detailed information about each alert, and view alert
policies from any source (user defined, system defined, or external defined). The top pane includes two
tabs, the List and Alert Policies tabs.
List View:
The List tab, which appears by default when you first open the page, displays a list of active alerts
across the registered clusters.
This view can be used to order the alerts, enable alert emails, and specify email addresses to which
alerts should be sent. You can also create a custom view; group the alerts by cluster, severity, or impact
type; and download the table of alerts in CSV format. The maximum number of alerts you can export
is 1000.
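The export behavior can be pictured with a Python sketch that writes alert records to CSV and applies the 1000-row limit. The field names (title, cluster, severity) and the alerts_to_csv function are assumptions made for illustration. The actual columns in the Prism Central export may differ.

```python
import csv
import io

MAX_EXPORT = 1000  # export limit stated above

def alerts_to_csv(alerts, group_key="severity"):
    """Return CSV text for at most MAX_EXPORT alerts, ordered by group_key."""
    rows = sorted(alerts, key=lambda alert: alert[group_key])[:MAX_EXPORT]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "cluster", "severity"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

alerts = [
    {"title": f"alert-{i}", "cluster": "cluster-a", "severity": "warning"}
    for i in range(1200)
]
csv_text = alerts_to_csv(alerts)
```

With 1200 alerts as input, the output contains a header row followed by 1000 alert rows. If you need more than that, narrow the list with filters and export in several passes.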
Clicking on an alert message in the dashboard or anywhere else the alert title appears, such as in
a search list, displays detailed information about that alert. The alert details appear in the left column.
Possible causes for the alert appear to the right. The most likely cause appears first with other possible
causes (if any) appearing below in the order of likelihood. Each cause includes a recommended corrective
action and in many cases a details section that provides additional context and instructions. At the top
right are buttons that can be used to acknowledge the alert or mark it as resolved.
notification for those alerts on the individual clusters through Prism Element (but keep email notification
for Nutanix customer support enabled).
To configure alert settings, reporting rules, and message templates, do the following:
1. Either click Email Configuration on the Alerts dashboard, or click the gear icon and select Alert
Email Configuration from the Settings menu.
Alert Policies
The Alert Policies view displays user-defined, system-defined, or externally defined policies,
depending on the selection. You can create, update, delete, enable, and disable user-defined alert
policies. System-defined policies can be viewed and modified. Externally defined policies can be
viewed but not modified.
Events Dashboard:
The Events dashboard summary view in Prism Central displays a list of event messages across the
registered clusters. Event messages describe cluster actions such as adding a storage pool or taking a
snapshot. Unlike alerts, event messages are simply informational without the need to acknowledge or
resolve. To filter the list, click the Filters button (upper right). This displays a pane for selecting filter values.
Check the box for each value to include in the filter. You can include multiple values.
You can filter the search on the following event parameters and values:
Event Type: Behavioral Anomaly, System Action, User Action
Cluster: Enter name in search field.
Create Time: Last 1 hour, Last 24 hours, Last week, From XXX to XXX
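The way multiple filter values combine can be sketched in Python. The event records and the filter_events function are hypothetical; the sketch shows only that an event must satisfy every filter that is set, and that several values can be selected for one parameter.

```python
from datetime import datetime, timedelta

def filter_events(events, event_types=None, cluster=None, since=None):
    """Return events matching every filter that is set (None means no filter)."""
    return [
        e for e in events
        if (event_types is None or e["type"] in event_types)
        and (cluster is None or e["cluster"] == cluster)
        and (since is None or e["created"] >= since)
    ]

now = datetime(2024, 1, 2, 12, 0)
events = [
    {"type": "User Action", "cluster": "cluster-a", "created": now - timedelta(minutes=30)},
    {"type": "System Action", "cluster": "cluster-a", "created": now - timedelta(hours=5)},
    {"type": "User Action", "cluster": "cluster-b", "created": now - timedelta(days=2)},
]

last_hour = filter_events(events, since=now - timedelta(hours=1))
user_actions_a = filter_events(events, event_types={"User Action"}, cluster="cluster-a")
```

Here last_hour corresponds to the Last 1 hour option, and user_actions_a combines an Event Type value with a cluster name.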
Event Type    Displays the category in which the event is classified.           System Action, User Action, Behavioral Anomaly, DR
Cluster       Displays the name of the cluster in which the event was issued.   (cluster name)
Create Time   Displays the date and time when the event occurred.               (date and time)
The Nutanix Support Portal is available for support assistance, software downloads, and
documentation. To access the Nutanix Support Portal, type https://portal.nutanix.com in a browser and
press Enter.
Creating a Case
Nutanix's worldwide support is available 24x7x365, and our product support offerings include
industry-leading response times to address your mission critical deployments.
Be sure to have your serial number, hypervisor version, and AOS version handy.
Lesson 8: Pulse
Pulse provides diagnostic system data to the Nutanix Support team to deliver proactive, context-
aware support for Nutanix solutions.
The Nutanix cluster automatically and unobtrusively collects this information with no effect on
system performance.
Pulse shares only basic system-level information necessary for monitoring the health and status
of a Nutanix cluster. Information includes:
System alerts
Current Nutanix software version
Nutanix processes and Controller VM information
Hypervisor details such as type and version
When Pulse is enabled, it sends a message once every 24 hours to a Nutanix Support server by default.
Pulse also collects the most important system-level statistics and configuration information more
frequently to automatically detect issues and help improve resolution times. With this information,
Nutanix Support can apply advanced analytics to optimize your implementation and to address potential
problems.
Pulse is enabled by default. You can enable or disable Pulse at any time.
Pulse sends messages through ports 80/8443/443. If this is not allowed, Pulse sends messages
through your mail server. The Zeus leader IP address must also be open in the firewall.
Lesson 9: Using Insights
Nutanix includes a set of features on the Support Portal, known collectively as "Insights", that
provide a predictive health and support automation platform. Insights dynamically analyzes the extent to
which you are following best practices in configuring your clusters for long-term reliability, availability,
and performance. Insights provides:
Predictive trends and analytics
Guidance for optimal configurations
Application best practices
Seamless automation support
Insights is available only for clusters with Pulse enabled. See the Pulse Health Monitoring section
in the Prism Web Console Guide.
Module 7: UNDERSTANDING DATA PROTECTION CONCEPTS
Lesson 1: Overview
NearSynchronous Replication for Disaster Recovery–
Nutanix NearSync builds on the asynchronous replication capabilities just described to create a
solution that can achieve an RPO lower than traditional asynchronous replication and very fast RTO. RPO
can be as low as 20 seconds without the distance limitations. When you configure a snapshot frequency
of 15 minutes or less, NearSync is automatically enabled.
Metro Availability synchronously replicates data to another site, ensuring that a real-time copy of
the data exists at a different location. Data is written synchronously to both sites, so it is always available
to applications in the event a site fails or needs maintenance. You can nondisruptively migrate VMs
between sites for planned maintenance events or other needs.
The data protection features for a Nutanix cluster employ a number of components and capabilities.
Term                            Definition

Disaster Recovery               Disaster Recovery (DR) is an area of failover planning that aims to
                                protect an organization from the effects of significant negative events.
                                DR allows an organization to maintain or quickly resume mission-critical
                                functions following a disaster.

Recovery Point Objective (RPO)  RPO designates the variable amount of data that will be lost or will have
                                to be re-entered during network downtime.

Recovery Time Objective (RTO)   How much time does it take to recover after notification of business
                                process disruption?
                                RTO is therefore the duration of time and a service level within which a
                                business process must be restored after a disaster in order to avoid
                                unacceptable consequences associated with a break in continuity.
                                RTO designates the amount of “real time” that can pass before the
                                disruption begins to seriously and unacceptably impede the flow of
                                normal business operations.

Native (on-site) and Remote     - Data replication can be local or remote physical clusters
Data Replication Capabilities   - Choose from backup or disaster recovery

Local Replication               - This is also known as Time Stream, a set of snapshots
                                - Snapshots are placed locally on the same cluster as the source VM

Remote Replication              - Snapshots are replicated to one or more other clusters
                                - Remote cluster is a physical cluster or cloud
                                - Synchronous [Metro]
                                - Asynchronous

Protection Domain               Protection Domain (PD) is a defined group of entities (VMs and Volume
                                Groups) that are always backed up locally and optionally replicated to
                                one or more remote sites.

Metro Availability Protection   Active local storage container linked to a standby container at a remote
Domain                          site. Local and remote containers will have the same name. Containers
                                defined in a Metro Availability Protection Domain are synchronously
                                replicated to a remote container of the same name.

Snapshot                        A snapshot is a read-only copy of the data and state of a VM, file, or
                                Volume Group at a specific point in time.

Retention Policy                With a retention policy, you can configure snapshots to be retained for a
                                specific amount of time.
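The RPO definition above can be made concrete with a short Python sketch that measures how much data, expressed as time, would be lost if a failure occurred between snapshots. The function names, snapshot times, and failure time are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def data_loss_window(snapshot_times, failure_time):
    """Return the time between the last usable snapshot and the failure."""
    usable = [t for t in snapshot_times if t <= failure_time]
    if not usable:
        raise ValueError("no snapshot precedes the failure")
    return failure_time - max(usable)

def meets_rpo(snapshot_times, failure_time, rpo):
    """Return True if the data loss window is within the RPO."""
    return data_loss_window(snapshot_times, failure_time) <= rpo

start = datetime(2024, 1, 1, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(6)]  # 00:00 through 05:00
failure = datetime(2024, 1, 1, 5, 45)
loss = data_loss_window(hourly, failure)
```

In this example, hourly snapshots and a failure at 05:45 give a loss window of 45 minutes. That satisfies a one-hour RPO but not a 15-minute RPO, which would call for more frequent snapshots or replication.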
Business Continuity:
That mechanism keeps your data safe on a local cluster. If your business requires data to be
protected outside the local cluster, Nutanix provides synchronous and asynchronous replication
solutions to replicate data between clusters in real time.
The Data Protection dashboard displays dynamically updated information about the data
protection configuration in a cluster. To view the Data Protection dashboard, select Data Protection from
the pull-down list on the far left of the main menu.
The Data Protection dashboard allows you to select from two viewing modes:
The Overview view displays data protection and recovery information in a summary view.
The Table view displays data protection information in a tabular form. The table screen is further
divided into protection domain and remote site views.
Data Protection Table View:
The Data Protection table view displays information about remote sites and protection domains in a
tabular form. The displayed information is dynamically updated to remain current.
The Data Protection table view is divided into two sections:
The top section is a table. Each row represents a single protection domain (configured for
asynchronous data replication or metro availability) or remote site and includes basic information
about that protection domain or remote site.
The bottom Summary section provides additional information. It includes a details column on the
left and a set of tabs on the right. The details content and set of tabs vary depending on what
has been selected.
The two tabs on the Table view are the Async DR tab and the Remote Site tab.
ASYNC DR TAB
The Async DR tab displays information about protection domains configured for asynchronous
data replication in the cluster. This type of protection domain consists of a defined group of virtual
machines to be backed up (snapshots) locally on a cluster and optionally replicated to one or more remote
sites.
In this module, we will cover Instant Recovery with Snapshots and Backup and Recovery using
Protection Domains on AHV.
Note: If you are using another hypervisor with AOS, please consult your vendor's documentation for
proper procedures.
The instant recovery option provides a way to return rapidly to an exact time and state for both VMs
and Nutanix volume groups (VGs). There are two built-in methods available in AHV for quickly bringing
back VM data:
Acropolis manages the instant recovery protection option by VM, VG, multiple VMs, multiple VGs, or
a mix of both VMs and VGs. These instant recovery methods can be managed through: Prism, REST API,
and command line interface (CLI).
Both crash-consistent and application-consistent snapshots are available. Crash consistency is
available for both instant recovery options, and application-level consistency is available only for the
protection domain VM instant recovery option.
Note: The system stores the data used to provide the instant recovery option in the same physical
infrastructure that hosts the VMs and VGs themselves. Do not treat this option as a valid fully functional
backup and recovery solution.
The on-demand option allows users to take VM snapshots, restore snapshots, and clone a new
VM from an existing snapshot. It is helpful to be able to take a snapshot as needed before starting
potentially sensitive administrative tasks or cloning an existing VM.
1. To create an On-Demand VM Recovery Point (take a snapshot), in Prism select the VM dashboard,
then the table view, then highlight the VM for which you want to set a recovery point.
2. Click Take Snapshot.
3. Give the snapshot a name, and click Submit.
To use an on-demand VM recovery point created earlier, select the VM dashboard in Prism, then the
table view, then highlight the VM you want to recover and click the VM Snapshot tab.
DETAILS:
CLONE:
Create new VMs based on the snapshot. The following options are available:
Number of clones (default is one). If you are creating more than one clone, you can select a
“Starting Index Number.”
Name: Default is [VM name]-1, or “iperf1-1” if you were cloning the first VM in the Take the
Snapshot screenshot above.
vCPUs.
Number of cores per vCPU.
Memory.
Network adapters (NICs).
You cannot configure disks and volume groups during the clone operation.
RESTORE:
Restore the VM to the snapshot state.
DELETE:
Delete the snapshot and merge all changes into one or more original VM disks.
The protection domain instant recovery option provides a way to schedule VM, VG, or both VM
and VG snapshots. Nutanix has continued to improve on its snapshots by incorporating lightweight
snapshots (LWS) to provide near-sync replication. The LWS feature can achieve an RPO of between 15
minutes and 1 minute by using markers instead of creating full snapshots. If the system cannot fulfill the
low RPO, Nutanix automatically switches to the vDisk snapshot approach, then returns to LWS when
possible.
Additional Information:
You can use consistency groups (CGs) to organize VMs within a protection domain. The default
option is to have one VM per CG, but if you include multiple VMs in one CG, you can snapshot all of them
at the same time in a crash-consistent manner and capture them all in one snapshot. If you're using the
application-consistent snapshot option, a CG can only contain one VM.
1. Use Entity Name: This is the default option; it creates a CG based on the VM name.
2. Use an existing CG: Available only when the given protection domain already contains a CG with
at least one VM.
3. Create a new CG: Use this option when you don’t want the name of the CG to be the same as the
VM name.
4. Use application-consistent snapshots: You can use this selection to enable the protection domain
instant recovery option. This choice provides instant recovery capability by capturing the VM data
on disk, data in memory, and transactions in progress.
5. Select the preferred protection domain schedule, including: When to take the snapshot, how
often it should be repeated (minute, hour, day, week, month, and so on), what day of the week
or month it should repeat, the start date and time, and if needed, its end date and time.
6. Retention policy.
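The consistency group rule described above (several VMs may share a CG for crash-consistent snapshots, but an application-consistent CG can contain only one VM) can be sketched as a small validation check. The `ConsistencyGroup` class and both functions are hypothetical names used for illustration; they are not part of any Nutanix API.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyGroup:
    """Hypothetical model of a consistency group (not a Nutanix API object)."""
    name: str
    vms: list[str] = field(default_factory=list)
    app_consistent: bool = False


def default_cg(vm_name: str) -> ConsistencyGroup:
    """Mimic the 'Use Entity Name' option: one CG per VM, named after the VM."""
    return ConsistencyGroup(name=vm_name, vms=[vm_name])


def validate_cg(cg: ConsistencyGroup) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Rule from the text: application-consistent snapshots allow one VM per CG.
    if cg.app_consistent and len(cg.vms) > 1:
        problems.append(
            f"{cg.name}: application-consistent CG can contain only one VM"
        )
    return problems


if __name__ == "__main__":
    crash_group = ConsistencyGroup("web-tier", ["web01", "web02"])
    app_group = ConsistencyGroup("db-tier", ["db01", "db02"], app_consistent=True)
    for group in (default_cg("iperf1"), crash_group, app_group):
        print(group.name, validate_cg(group) or "ok")
```

Returning a list of problems rather than raising an exception makes it easy to report every misconfigured group in one pass.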
Protection Domain VM Data Recovery Options
In-Place Restore–
In-place restoration creates entities (VMs and their attached VGs) on the recovery site and
overwrites the existing ones. The newly created entities replace the older entities and their attachments.
For example, if you select in-place restore for a VM and its attached VG in a Protection Domain, a
new VM with the VG attached is created on the recovery site. The attachment of the VM and the VG is
retained on the recovery site only if you have installed NGT on the protected VM.
Out-of-Place Restore–
Out-of-place restoration creates entities (VMs and their attached VGs) on the recovery site but
does not overwrite the existing ones. The newly created entities remain separate from the older entities
and their attachments. For example, if you select out-of-place restore for a VM and its attached VG
in a Protection Domain, a new VM with the VG attached is created on the recovery site. The attachment of
the VM and the VG is retained on the recovery site only if you have installed NGT on the protected VM.
Scenario: You overwrite one or more of the VMs in the Protection Domain.
Result: Volume groups are not detached from the restored VMs, but follow-up steps are required. Other
VMs attached to the volume groups are not affected.
Follow-up: Log on to the restored VMs and configure the in-guest iSCSI attachments.
Scenario: You overwrite a volume group.
Result: The volume group is detached from all VMs.
Follow-up: Log on to all the VMs to which the volume group was attached and configure in-guest iSCSI
attachments.
AHV
Scenario: You overwrite one or more of the VMs in the Protection Domain.
Result: Volume groups are detached from the restored VMs. Other VMs attached to the volume groups are
not affected.
Follow-up: Log on to the web console and reattach the volume groups to the VMs. Alternatively, log on
to the VMs and configure in-guest iSCSI attachments.
Scenario: You overwrite a volume group.
Result: The volume group is detached from all VMs.
Follow-up: Log on to the web console and reattach the volume groups to the VMs. Alternatively, log on
to the VMs and configure in-guest iSCSI attachments.
Lesson 6: Retention Policy
Depending on the scheduling configuration (1 to 15 minutes), snapshots are retained for a specific
amount of time.
In NearSync, you can configure the retention policy for days, weeks, or months on both the
primary and remote sites instead of defining the number of snapshots you want to retain.
EXAMPLE 1: DAYS
If your desired RPO is 1 minute and you want to retain the snapshots for 5 days, the following
retention policy is applied:
For every 1 minute, a snapshot is created and retained for a maximum of 15 minutes.
Note: Only the 15 most recent snapshots are visible in the web console and available for restore operations.
For every hour, a snapshot is created and retained for 6 hours.
One daily snapshot is created and retained for 5 days.
EXAMPLE 2: WEEKS/MONTHS
You can also define snapshot retention in weeks or months. If you configure a 3-month schedule, the
following retention policy is applied:
For every 1 minute, a snapshot is created and retained for 15 minutes.
For every hour, a snapshot is created and retained for 6 hours.
One daily snapshot is created and retained for 7 days.
One weekly snapshot is created and retained for 4 weeks.
One monthly snapshot is created and retained for 3 months.
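The two examples above follow a tiered pattern that can be sketched as a small function. This is a minimal illustration for a 1-minute RPO only; the function name is hypothetical, and the "weeks" branch is an assumption modeled on the months example, because the text gives no worked example for weeks.

```python
def nearsync_retention(value: int, unit: str) -> list[tuple[str, str]]:
    """Return (snapshot frequency, retention period) tiers for a 1-minute RPO."""
    if value < 1:
        raise ValueError("value must be at least 1")
    # These two tiers appear in both examples in the text.
    tiers = [("every minute", "15 minutes"), ("hourly", "6 hours")]
    if unit == "days":
        tiers.append(("daily", f"{value} days"))
    elif unit == "weeks":
        # Assumption: mirrors the months case; not stated in the text.
        tiers += [("daily", "7 days"), ("weekly", f"{value} weeks")]
    elif unit == "months":
        tiers += [
            ("daily", "7 days"),
            ("weekly", "4 weeks"),
            ("monthly", f"{value} months"),
        ]
    else:
        raise ValueError("unit must be 'days', 'weeks', or 'months'")
    return tiers


if __name__ == "__main__":
    for frequency, retention in nearsync_retention(3, "months"):
        print(f"{frequency:>12}: retained for {retention}")
```

Calling `nearsync_retention(5, "days")` reproduces the three tiers of Example 1, and `nearsync_retention(3, "months")` reproduces the five tiers of Example 2.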
Note: If you change the Protection Domain configuration from Async DR to NearSync, the first snapshot
will not be created according to the new schedule.
The snapshots will be created according to the start time of the old schedule that you configured in Async
DR. If you want the maximum retention for the first snapshot after modifying the schedule, update the
start time accordingly for NearSync.
The self-service restore (also known as file-level restore) feature allows virtual machine
administrators to perform a self-service recovery from the Nutanix data protection snapshots with
minimal administrator intervention.
Traditional File Restoration
Recovering from accidentally deleted files or overwritten configuration files can be very
cumbersome if you are an application administrator in a large company. In most cases you will require
the assistance of the backup or virtualization administrator, and a response from them might take hours
or even days.
Usually the application admin requests a recovery from the virtualization admin, who then
restores an older snapshot and brings up the entire snapshot as a new VM.
Then the virtualization admin grants VM-level access to the app admin, who can now recover the
files. Once the app admin is done, they notify the virtualization admin that the recovery is complete and
that the restored virtual machine can be recycled.
Once self-service restore is enabled, the app admin can manage snapshots from within the VM: list
available snapshots and mount or unmount a particular snapshot. Once a snapshot is mounted, it shows up
as a new drive in the Windows guest OS. The admin can copy the required files and then unmount the
snapshot.
If the admin forgets to unmount a snapshot, it is automatically cleaned up after 24 hours so that
mounted snapshots do not accumulate. The admin can connect multiple snapshots to their VM at the same time, to search
back in the snapshot history.
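The 24-hour cleanup rule can be illustrated with a short sketch that picks out mounted snapshots older than the limit. The platform performs this cleanup itself; the function name, the snapshot IDs, and the dictionary of mount times below are hypothetical and exist only to show the rule.

```python
from datetime import datetime, timedelta

# Cleanup window stated in the text.
MOUNT_LIFETIME = timedelta(hours=24)


def stale_mounts(mounts: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of snapshots mounted for 24 hours or longer, oldest first."""
    expired = [
        (mounted_at, snapshot_id)
        for snapshot_id, mounted_at in mounts.items()
        if now - mounted_at >= MOUNT_LIFETIME
    ]
    return [snapshot_id for _, snapshot_id in sorted(expired)]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    mounts = {
        "snap-a": datetime(2024, 1, 1, 11, 0),  # mounted 25 hours ago
        "snap-b": datetime(2024, 1, 2, 9, 0),   # mounted 3 hours ago
    }
    print(stale_mounts(mounts, now))
```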
GENERAL REQUIREMENTS
The guest VM must be configured for Nutanix snapshots by adding the VM to a protection domain.
Self-service restore is not supported for snapshots that you take from the VM table view.
vStore protection domains are not supported.
Volume groups are not supported.
Only snapshots created in AOS 4.5 or later releases are supported.
Only IDE and SCSI disks are supported. SATA, PCI, and delta disks are not supported.
A sufficient number of logical drive letters should be available to bring the disk online.
File Systems: Dynamic disks consisting of NTFS on simple volumes, spanned volumes, striped
volumes, mirrored volumes, and RAID-5 volumes are not supported.
Only 64-bit operating systems are supported.
The following operating systems are supported:
Windows Server 2008 R2 or later versions
Windows 7 through Windows 10
Disks created as Microsoft Storage Space devices by using Microsoft Windows Server 2016 or later
are not supported.
LINUX VMS REQUIREMENTS
File Systems: Only extended file system (ext2, ext3, and ext4) and XFS file systems are supported.
Logical Volume Manager (LVM) disks for which the volume group corresponds to only a single
physical disk are mounted.
Whenever the snapshot disk has an inconsistent file system (as indicated by the fsck check), the disk
is only attached and not mounted.
The following operating systems are supported:
CentOS 6.5 through 6.9 and 7.0 through 7.3
Red Hat Enterprise Linux (RHEL) 6.5 through 6.9 and 7.0 through 7.3
Oracle Linux 6.5 and 7.0
SUSE Linux Enterprise Server (SLES) 11 SP1 through 11 SP4 and 12 SP1 through 12 SP3
Ubuntu 14.04 for both AHV and ESXi and 16.10 (AHV only)
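The supported Linux versions listed above can be encoded as a small lookup table, which is handy for checking a guest before relying on self-service restore. This is a sketch only: the table layout, the short distribution keys, and the `is_supported` function are hypothetical, and a hypervisor value of `None` simply means the list states no hypervisor restriction for that entry.

```python
# Version ranges copied from the list above, as (major, minor) tuples.
# SLES service packs are encoded as the minor number (11 SP1 -> (11, 1)),
# and Ubuntu 14.04 is written as (14, 4).
SUPPORTED_LINUX = [
    ("centos", (6, 5), (6, 9), None),
    ("centos", (7, 0), (7, 3), None),
    ("rhel", (6, 5), (6, 9), None),
    ("rhel", (7, 0), (7, 3), None),
    ("oracle", (6, 5), (6, 5), None),
    ("oracle", (7, 0), (7, 0), None),
    ("sles", (11, 1), (11, 4), None),
    ("sles", (12, 1), (12, 3), None),
    ("ubuntu", (14, 4), (14, 4), {"AHV", "ESXi"}),
    ("ubuntu", (16, 10), (16, 10), {"AHV"}),
]


def is_supported(distro: str, version: tuple[int, int], hypervisor: str) -> bool:
    """Return True if the distro/version/hypervisor combination is listed."""
    for name, low, high, hypervisors in SUPPORTED_LINUX:
        if name != distro.lower():
            continue
        # Tuples compare element by element, so (7, 0) <= (7, 2) <= (7, 3).
        if low <= version <= high:
            if hypervisors is None or hypervisor in hypervisors:
                return True
    return False


if __name__ == "__main__":
    print(is_supported("centos", (7, 2), "ESXi"))
    print(is_supported("ubuntu", (16, 10), "ESXi"))
```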
Leap uses an entity-centric approach and runbook-like automation to recover applications. It uses
categories to group the entities to be protected and applies policies to automate the protection of new
entities as the application scales. Application recovery is more flexible with network mappings, an
enforceable VM power-on sequence, and inter-stage delays. Application recovery can also be validated
and tested without affecting production workloads. Asynchronous, NearSync, and Synchronous
replication schedules ensure that an application and its configuration details synchronize to the recovery
location for a smoother recovery.
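The idea of an enforced power-on sequence with inter-stage delays can be sketched as follows. This is a conceptual model, not Leap's actual recovery plan format: the `Stage` class, its fields, and the VM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical recovery stage: VMs powered on together, then a delay."""
    vms: list[str]
    delay_after_s: int = 0


def power_on_schedule(stages: list[Stage]) -> list[tuple[int, str]]:
    """Return (seconds from start, VM name) pairs in power-on order."""
    schedule = []
    offset = 0
    for stage in stages:
        # Every VM in a stage starts at the same offset.
        for vm in stage.vms:
            schedule.append((offset, vm))
        # The next stage waits for this stage's inter-stage delay.
        offset += stage.delay_after_s
    return schedule


if __name__ == "__main__":
    plan = [
        Stage(["db01"], delay_after_s=60),
        Stage(["app01", "app02"], delay_after_s=30),
        Stage(["web01"]),
    ]
    for start, vm in power_on_schedule(plan):
        print(f"t+{start:>3}s  power on {vm}")
```

Ordering the stages so that databases start before application servers, and application servers before web front ends, is a typical reason for enforcing a sequence like this.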
Leap works with pairs of physically isolated locations called availability zones. An instance of Prism
Central represents an availability zone. One availability zone serves as the primary site for an application
while a paired availability zone serves as the recovery site. You can configure disaster recovery between
AHV or ESXi clusters running AOS in the same or different availability zones. Leap can be used between:
Two clusters in a single on-prem site that are managed by different instances of Prism Central
An on-prem site and a site in Xi Cloud Services
Configuration tasks and disaster recovery workflows are largely the same regardless of whether you
choose Xi Cloud Services or an on-premises deployment for recovery.
Availability Zone:
An availability zone is a location to which you can replicate the data that you want to protect. It
is represented by a Prism Central instance to which a Nutanix cluster is registered. To ensure availability,
availability zones should be physically isolated from each other.
Xi Cloud Services–
If you choose to replicate data to Xi Cloud Services, the on-premises Prism Central instance is
paired with a Xi Cloud Services account, and data is replicated to Xi Cloud Services.
Physical Datacenter–
If you choose to back up data to a physical datacenter, you must provide the details of a Prism
Central instance running in a datacenter that you have administrative access to, which is also logically
isolated from the primary availability zone.
Availability zones in Xi Cloud Services are physically isolated from each other to ensure that a
disaster at one location does not affect another location. If you choose to pair with a physical datacenter,
the responsibility of ensuring that the paired locations are physically isolated lies with you.
Nutanix Mine does something similar. It eliminates the complexity of traditional data protection
environments by converging backup software, target storage, and long-term archival into one solution.
This eliminates the need to license, manage, and support multiple point solutions.
Nutanix Mine uses AOS to provide the same benefits for secondary storage that Nutanix already
provides to applications: simplicity, performance, and resiliency. Mine extends both the management
plane and the data fabric of Nutanix AOS to run data protection functionality and backup target
storage in a turnkey solution. And if that wasn’t cool enough, Mine can be deployed as a standalone
solution to back up both virtualized and legacy applications.