OpenShift Virtualization - Technical Overview
Technical presentation
What is OpenShift Virtualization?
Containers are not virtual machines
● Containers provide process isolation
● Kernel namespaces provide isolation and
cgroups provide resource controls
[Diagram: Virtualization vs. Containerization stacks, each running App 1, App 2, App 3]
Virtual machines can be put into containers
● A KVM virtual machine is a process
● Containers encapsulate processes
● Both have the same underlying
resource needs:
○ Compute
○ Network
○ (sometimes) Storage
OpenShift Virtualization
● Virtual machines
○ Running in containers, managed as Pods
○ Using the KVM hypervisor
● Scheduled, deployed, and managed by Kubernetes
● Integrated with container orchestrator resources and
services
○ Traditional Pod-like SDN connectivity and/or
connectivity to external VLAN and other networks
via Multus
○ Persistent storage paradigm (PVC, PV,
StorageClass)
VM containers use KVM
● OpenShift Virtualization uses KVM, the Linux kernel
hypervisor
● KVM is a core component of the Red Hat Enterprise
Linux kernel
○ KVM has 10+ years of production use: Red Hat
Virtualization, Red Hat OpenStack Platform, and
RHEL all leverage KVM, QEMU, and libvirt
● QEMU uses KVM to execute virtual machines
● libvirt provides a management abstraction layer
[Diagram: hardware and drivers, KVM in the RHCOS kernel, QEMU, and libvirt, alongside other apps]
Built with Kubernetes
Virtual machines in a container world
● Provides a way to transition application components
which can’t be directly containerized into a Kubernetes
system
○ Integrates directly into existing k8s clusters
○ Follows Kubernetes paradigms:
■ Container Networking Interface (CNI)
■ Container Storage Interface (CSI)
■ Custom Resource Definitions (CRD, CR)
● Schedule, connect, and consume VM resources as
container-native
[Diagram: a VM pod and an App pod side by side on OpenShift / RHEL CoreOS / Physical Machine]
Virtualization native to Kubernetes
● Operators are a Kubernetes-native way to introduce
new capabilities
● New CustomResourceDefinitions (CRDs) for native
VM integration, for example:
○ VirtualMachine
○ VirtualMachineInstance
○ VirtualMachineInstanceMigration
○ VirtualMachineSnapshot
○ DataVolume
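As a sketch of how these CRDs fit together, a minimal VirtualMachine manifest; the name, sizes, and referenced PVC are hypothetical, and exact fields vary by version:

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    name: rhel8-demo               # hypothetical example name
  spec:
    running: false                 # created stopped; start it explicitly
    template:
      spec:
        domain:
          cpu:
            cores: 2
          resources:
            requests:
              memory: 4Gi
          devices:
            disks:
            - name: rootdisk
              disk:
                bus: virtio
            interfaces:
            - name: default
              masquerade: {}       # default pod network connectivity
        networks:
        - name: default
          pod: {}
        volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: rhel8-rootdisk   # assumes an existing PVC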
Containerized virtual machines
● Every VM runs in a launcher pod. The launcher process
supervises the VM, using libvirt, and provides pod integration.
[Diagram: the launcher pod connecting the VM to storage, network, and other Kubernetes resources]
Managed with OpenShift
Virtual Machine Management
● Create, modify, and destroy virtual
machines, and their resources, using
the OpenShift web interface or CLI
● Use the virtctl command to
simplify virtual machine interaction
from the CLI
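For example, some common virtctl subcommands (the VM name is hypothetical):

  virtctl start rhel8-demo      # power on a stopped VM
  virtctl stop rhel8-demo       # gracefully shut down a running VM
  virtctl restart rhel8-demo    # reboot the VM
  virtctl console rhel8-demo    # attach to the serial console
  virtctl vnc rhel8-demo        # open a graphical VNC session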
Create VMs
Virtual Machine creation
● Streamlined and simplified creation via the GUI or
create VMs programmatically using YAML
● Full configuration options for compute, network, and
storage resources
○ Clone VMs from templates or import disks using
DataVolumes (see the example below)
○ Pre-defined and customizable presets for
CPU/RAM allocations
○ Workload profile to tune KVM for expected
behavior
● Import VMs from VMware vSphere or Red Hat
Virtualization
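A minimal DataVolume sketch for the import path above; the image URL, size, and access mode are illustrative:

  apiVersion: cdi.kubevirt.io/v1beta1
  kind: DataVolume
  metadata:
    name: rhel8-rootdisk
  spec:
    source:
      http:
        url: https://example.com/images/rhel8.qcow2   # hypothetical image URL
    pvc:
      accessModes:
      - ReadWriteMany            # RWX enables live migration
      resources:
        requests:
          storage: 20Gi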
Using templates for virtual machines
● Simplified and streamlined virtual machine
creation experience for VM consumers
● Administrators configure templates with an OS
disk, consumers select from the options
Creating a virtual machine
● Flavor represents the preconfigured CPU and
RAM assignments
● Storage, including PVC size and storage class, is
determined by the template administrator
● A default network configuration is defined in the
template
● Workload profile defines the category of workload
expected and is used to set KVM performance
flags
● The VM can be further customized by selecting
the option, or immediately created and deployed
○ Additional customization includes
CPU/memory, storage, network, cloud-init,
and more
VM Templates
Templates
● Templates are a core concept for
virtual machine creation with
OpenShift Virtualization
● Red Hat provides default templates,
administrators can create and customize
additional templates as needed
● Boot sources provide disk images for
the template
● Creating VMs can be done from the
template page in just a few clicks
Create a template - General
● In addition to unique names, each template is
associated with a provider
○ Providers represent who created the
template, with optional support
information
● The guest operating system and source boot
disk are provided. A boot disk can be imported
during the process, or an ISO can be used to
boot and install the OS
● A default flavor, representing CPU and memory
allotments, is assigned
● Workload type determines optimizations to
balance between performance and efficiency
Create a template - Networks
● Add or edit network adapters
● One or more network connections
○ Pod network for the default SDN
○ Additional multus-based interfaces
for specific connectivity
● Multiple NIC models for guest OS
compatibility or paravirtualized
performance with VirtIO
● Masquerade, bridge, or SR-IOV
connection types
● MAC address customization if desired
Create a template - Storage
● Add or edit persistent storage
● Disks can be sourced from
○ Imported QCOW2 or raw images
○ New or existing PVCs
○ Clone existing PVCs
● Use SATA/SCSI interface for compatibility
or VirtIO for paravirtual performance
● For new or cloned disks, select from
available storage classes
○ Customize volume and access mode as
needed
○ RWX PVCs are required for live
migration
Create a template - Advanced
● Customize the operating system
deployment using cloud-init
scripts
○ Guest OS must have
cloud-init installed
○ RHEL, Fedora, etc. cloud images
○ Default templates will
auto-generate a simple
cloud-init to set the password
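A minimal sketch of how a cloud-init script is attached in a VM definition; the password and packages are illustrative:

  # fragment of spec.template.spec in a VirtualMachine
  # (a matching "cloudinitdisk" entry is also needed under domain.devices.disks)
  volumes:
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |
        #cloud-config
        password: changeme          # illustrative only
        chpasswd: { expire: False }
        packages:
          - httpd
        runcmd:
          - systemctl enable --now httpd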
Import VMs
Virtual Machine Import
● Wizard supports importing from VMware or
Red Hat Virtualization
○ Single-VM workflow
● VMware import uses VDDK to expedite the
disk import process
○ User is responsible for downloading the
VDDK from VMware and adding it to a
container image
● Credentials stored as Secrets
● ResourceMapping CRD configures default
source -> destination storage and network
associations
View / manage VMs
Virtual Machine - Overview
Virtual Machine - Actions
Virtual Machine - Details
Virtual Machine - Console
Virtual Machine - Disks and NICs
Destroy VMs
Destroying a Virtual Machine
● Deleting a VM removes the VM definition
○ Optionally delete PVC-backed disks
associated with the VM
● Running VMs are terminated first
● Other associated resources, e.g. Services, are
not affected
Metrics
Overview Virtual Machine metrics
● Summary metrics for 1, 6, and 24 hour periods are
quickly viewable from the VM overview page
● Clicking a graph will display it enlarged in the
metrics UI
Detailed Virtual Machine metrics
● Virtual machine, and VM pod, metrics are collected
by the OpenShift metrics service
○ Available under the kubevirt namespace in
Prometheus
● Available per-VM metrics include
○ Active memory
○ Active CPU time
○ Network in/out errors, packets, and bytes
○ Storage R/W IOPS, latency, and throughput
● VM metrics are for VMs, not for VM pods
○ Management overhead not included in output
○ Look at virt-launcher pod metrics for
management overhead
● No preexisting Grafana dashboards
Deeper into the technology
Containerizing KVM
Trusted, mature KVM wrapped in modern management and automation
[Diagram: the same trusted KVM stack hosting a VM on a RHV host (Red Hat Virtualization), in a KubeVirt container on a RHEL CoreOS host (OpenShift Virtualization), and on an OSP compute node (Red Hat OpenStack Platform)]
Architectural Overview
[Diagram: virt-controller coordinating with the kubelet on each node; each VM runs in its own launcher container alongside libvirtd]
Virtual machines
Containerized virtual machines
● Inherit many features and functions from Kubernetes
○ Scheduling, high availability, attach/detach resources
● Containerized virtual machines have the same characteristics as
non-containerized
○ CPU, RAM, etc. limitations dictated by libvirt and QEMU
○ Linux and Windows guest operating systems
● Storage
○ Use Persistent Volumes Claims (PVCs) for VM disks
○ Containerized Data Importer (CDI) imports VM images
● Network
○ Inherit pod network by default
○ Multus enables direct connection to external networks
Network
Virtual Machine Networking
● Virtual machines optionally connect to the
standard pod network
○ OpenShift SDN, OVN-Kubernetes
○ Partners, such as Calico, are also supported
● Additional network interfaces accessible via
Multus:
○ Bridge, SR-IOV
○ VLAN and other networks can be created at
the host level using nmstate
● When using at least one interface on the default
SDN, Service, Route, and Ingress configuration
applies to VM pods the same as others
Example host network configuration
The following slides show an example of how this works.
● The machine network interface is configured at install -
bond0 in the example to the right
● Use kubernetes-nmstate, via the nmstate Operator, to
configure additional host network interfaces
○ bond1 and br1 in the example to the right
● VM pods connect to one or more networks
simultaneously
[Diagram: Service Net, Pod Net (SDN), and Multus networks reaching the Machine Net through bridges br0 and br1 over bonds bond0 and bond1]
● NodeNetworkConfigurationPolicy (NNCP)
○ nmstate operator CRD
○ Configure host network using declarative
language
● Applies to all nodes specified in the nodeSelector,
including newly added nodes automatically
● Update or add new NNCPs for additional host configs
[Diagram: same host network layout as above]
● Use the NodeNetworkConfigurationEnactment
(NNCE) object to view status of NNCP
application
● Further details of the node network state can
be seen using the NodeNetworkState CRD
○ oc get nns/node-name -o yaml
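A sketch of an NNCP that would create br1 on top of bond1 from the example; the apiVersion and options vary across nmstate versions, so treat this as illustrative:

  apiVersion: nmstate.io/v1
  kind: NodeNetworkConfigurationPolicy
  metadata:
    name: br1-over-bond1
  spec:
    nodeSelector:
      node-role.kubernetes.io/worker: ""   # applies to all worker nodes
    desiredState:
      interfaces:
      - name: br1
        type: linux-bridge
        state: up
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: bond1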
Connecting Pods to networks
● Multus uses CNI network definitions in the
NetworkAttachmentDefinition to allow access
○ net-attach-def are namespaced
○ Pods cannot connect to a net-attach-def
in a different namespace
● cnv-bridge and cnv-tuning types are used to
enable VM specific functions
○ MAC address customization
○ MTU and promiscuous mode
○ sysctls, if needed
● Pod connections are defined using an annotation
○ Pods can have many connections to many
networks
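An illustrative cnv-bridge NetworkAttachmentDefinition, with the annotation a pod would use to attach; the names and VLAN ID are hypothetical:

  apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    name: vlan100-net
  spec:
    config: |
      {
        "cniVersion": "0.3.1",
        "name": "vlan100-net",
        "type": "cnv-bridge",
        "bridge": "br1",
        "vlan": 100
      }

  # a pod (or VM pod) in the same namespace attaches via annotation:
  #   k8s.v1.cni.cncf.io/networks: vlan100-net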
Connecting VMs to networks
Virtual Machine Storage
● OpenShift Virtualization uses the Kubernetes
PersistentVolume (PV) paradigm
● PVs can be backed by
○ In-tree iSCSI, NFS, etc.
○ CSI drivers
○ Local storage using host path provisioner
○ OpenShift Container Storage
● Use dynamically or statically provisioned PVs
● RWX is required for live migration
● Disks are attached using VirtIO or SCSI controllers
○ Connection order specified in the VM definition
● Boot order customized via VM definition
VM disks in PVCs
● VM disks on FileSystem mode PVCs are created as thin
provisioned raw images
○ As of OCP 4.7 / OpenShift Virtualization 2.6,
DataVolumes can create thin or thick volumes
● Block mode PVCs are attached directly to the VM
● CSI operations, e.g. snapshot and clone, should be used
with care
○ Use DataVolumes to clone VM disks
○ Use VM details interface for (powered off) VM snaps
● PVC resize does not modify the size of the VM disk
○ Not currently supported
● Hot add is not supported (for any virtual hardware)
[Diagram: vendor CSI deployment - node driver registrar with the kubelet on OpenShift node(s); CSI controller and sidecars connecting to the storage array]
DataVolumes
[Diagram: a DataVolume importing from a data source into a PV]
Ephemeral Virtual Machine Disks
● VMs booted via PXE or using a container image can be
“diskless”
○ PVCs may be attached and mounted as secondary
devices for application data persistence
● VMs based on container images use the standard
copy-on-write graph storage for OS disk R/W
○ Consider and account for capacity and IOPS
during RHCOS disk sizing if using this type
● An emptyDisk may be used to add additional
ephemeral capacity for the VM
Helper disks
● OpenShift Virtualization attaches disks to VMs for
injecting data
○ Cloud-Init
○ ConfigMap
○ Secrets
○ ServiceAccount
● These disks are read-only and can be mounted by the OS
to access the data within
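A sketch of attaching a ConfigMap as a read-only helper disk; all names are hypothetical:

  # fragment of spec.template.spec in a VirtualMachine
  domain:
    devices:
      disks:
      - name: app-config
        disk: {}
        serial: CONFIG           # guest can locate the disk by serial number
  volumes:
  - name: app-config
    configMap:
      name: app-settings         # hypothetical ConfigMap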
Comparing with traditional virtualization platforms
Live Migration
● Live migration moves a virtual machine from one node to another in the OpenShift cluster
● Can be triggered via GUI, CLI, API, or automatically
● RWX storage is required, cannot use bridge connection to pod network
● Live migration is cancellable by deleting the API object
● Default maximum of five (5) simultaneous live migrations
○ Maximum of two (2) outbound migrations per node, 64MiB/s throughput each
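Triggering a migration through the API is a small object, and deleting it cancels the migration; the VM name is hypothetical:

  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstanceMigration
  metadata:
    name: migrate-rhel8-demo
  spec:
    vmiName: rhel8-demo          # the running VMI to move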
Automated live migration
● OpenShift / Kubernetes triggers Pod rebalance actions based on multiple factors
○ Soft / hard eviction policies
○ Pod descheduler
○ Pod disruption policy
○ Node resource contention resulting in evictions
■ Pods are Burstable QoS class by default
■ All VM memory is requested in the Pod definition; only a small CPU overhead is requested
● Pod rebalance applies to VM pods equally
● VMs will behave according to the eviction strategy
○ LiveMigrate - use live migration to move the VM to a different node
○ No definition - terminate the VM if the node is drained or Pod evicted
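The strategy is declared in the VM definition; a minimal fragment:

  # fragment of a VirtualMachine definition
  spec:
    template:
      spec:
        evictionStrategy: LiveMigrate   # live migrate instead of terminating on drain/eviction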
VM scheduling
● VM scheduling follows pod scheduling rules
○ Node selectors
○ Taints / tolerations
○ Pod and node affinity / anti-affinity
● Kubernetes scheduler takes into account many additional factors
○ Resource load balancing - requests and reservations
○ Large / Huge page support for VM memory
○ Use scheduler profiles to provide additional hints (for all Pods)
● Resources are managed by Kubernetes
○ CPU and RAM requests less than limit - Burstable QoS by default
○ K8s QoS policy determines scheduling priority: BestEffort class is evicted before
Burstable class, which is evicted before Guaranteed class
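For example, setting requests equal to limits in the VM definition yields the Guaranteed QoS class (values illustrative):

  # fragment of spec.template.spec.domain in a VirtualMachine
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "2"
      memory: 8Gi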
Node Resource Management
● VM density is determined by multiple factors controlled at the cluster, OpenShift Virtualization,
Pod, and VM levels
● Pod QoS policy
○ Burstable (limit > request) allows more overcommit, but may lead to more frequent migrations
○ Guaranteed (limit = request) allows less overcommitment, but may have less physical resource
utilization on the hosts
● Cluster Resource Override Operator provides global overcommit policy, can be customized per
project for additional control
● Pods request full amount of VM memory and approx. 10% of VM CPU
○ VM pods request a small amount of additional memory, used for libvirt/QEMU overhead
■ Administrator can set this to be overcommitted
High availability
● Node failure is detected by Kubernetes and results in the Pods from the lost node being
rescheduled to the surviving nodes
● VMs are not scheduled to nodes which have not had a heartbeat from virt-handler, regardless of
Kubernetes node state
● Additional monitoring may trigger automated action to force stop the VM pods, resulting in
rescheduling
○ May take up to 5 minutes for virt-handler and/or Kubernetes to detect failure
○ Liveness and Readiness probes may be configured for VM-hosted applications
○ Machine health checks can decrease failure detection time
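As a sketch, a MachineHealthCheck that remediates nodes whose Ready condition is lost; the labels, timeout, and threshold are illustrative:

  apiVersion: machine.openshift.io/v1beta1
  kind: MachineHealthCheck
  metadata:
    name: workers-mhc
    namespace: openshift-machine-api
  spec:
    selector:
      matchLabels:
        machine.openshift.io/cluster-api-machine-role: worker
    unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 300s              # remediate after 5 minutes unhealthy
    maxUnhealthy: 40%            # stop remediating if too many nodes fail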
Terminology comparison
Feature | RHV | OpenShift Virtualization | vSphere
Active resource balancing | Cluster scheduling policy | Pod eviction policy, descheduler | Dynamic Resource Scheduling (DRS)
Physical network configuration | Host network config (via nmstate w/4.4) | nmstate Operator, Multus | vSwitch / DvSwitch
Host / VM metrics | Data warehouse + Grafana (RHV 4.4) | OpenShift Metrics, health checks | vCenter, vROps
Runtime awareness
Deploy and configure
Compute configuration
● VM nodes should be physical with CPU virtualization technology enabled in the BIOS
○ Nested virtualization works, but is not supported
○ Emulation works, but is not supported (and is extremely slow)
● Node labeler detects CPU type and labels nodes for compatibility and scheduling
● Configure overcommitment using native OpenShift functionality - Cluster Resource Override
Operator
○ Optionally, customize the project template so that non-VM pods are not overcommitted
○ Customize projects hosting VMs for overcommit policy
● Apply Quota and LimitRange controls to projects with VMs to manage resource consumption
● VM definitions default to all memory “reserved” via a request, but only a small amount of CPU
○ CPU and memory request/limit values are modified in the VM definition
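A sketch of a global overcommit policy via the Cluster Resource Override Operator; the percentages are illustrative, and the operator must be installed first:

  apiVersion: operator.autoscaling.openshift.io/v1
  kind: ClusterResourceOverride
  metadata:
    name: cluster                      # the CR must be named "cluster"
  spec:
    podResourceOverride:
      spec:
        memoryRequestToLimitPercent: 50   # request half of the memory limit
        cpuRequestToLimitPercent: 25      # request a quarter of the CPU limit
        limitCPUToMemoryPercent: 200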
Network configuration
● Apply traditional network architecture decision framework to OpenShift Virtualization
○ Resiliency, isolation, throughput, etc. determined by combination of application, management,
storage, migration, and console traffic
○ Most clusters are not VM only, include non-VM traffic when planning
● Node interface on the MachineNetwork.cidr is used for “primary” communication, including SDN
○ This interface should be both resilient and high throughput
○ Used for migration and console traffic
○ Configure this interface at install time using kernel parameters, reinstall node if configuration
changes
● Additional interfaces, whether single or bonded, may be used for traffic isolation, e.g. storage and VM
traffic
○ Configure using nmstate Operator, apply configuration to nodes using selectors on NNCP
Storage configuration
● Shared storage is not required, but very highly encouraged
○ Live migration depends on RWX PVCs
● Create shared storage from local resources using OpenShift Container Storage
○ RWX file and block devices for live migration
● No preference for storage protocol, use what works best for the application(s)
● Storage backing PVs should provide adequate performance for VM workload
○ Monitor latency from within VM, monitor throughput from OpenShift
● For IP storage (NFS, iSCSI), consider using dedicated network interfaces
○ Will be used for all PVs, not just VM PVs
● Certified CSI drivers are recommended
○ Many non-certified CSI provisioners work, but do not have same level of OpenShift testing
● Local storage may be utilized via the Host Path Provisioner
Deploying a VM operating system
Creating virtual machines can be accomplished in multiple ways, each offering different options and capabilities
● Start by answering the question “Do I want to manage my VM like a container or a traditional VM?”
● Deploying the OS persistently, i.e. “I want to manage like a traditional VM”
○ Methods:
■ Import a disk with the OS already installed (e.g. cloud image) from a URL or S3 endpoint using a
DataVolume, or via CLI using virtctl
■ Clone from an existing PVC or VM template
■ Install to a PVC using an ISO
○ VM state will remain through reboots and, when using RWX PVCs, can be live migrated
● Deploying the OS non-persistently, i.e. “I want to manage like a container”
○ Methods:
■ Diskless, via PXE
■ Container image, from a registry
○ VM has no state, power off will result in disk reset. No live migration.
● Import disks deployed from a container image using CDI to make them persistent
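As an illustration, a local image can be uploaded into a DataVolume with virtctl; the name, size, and path are hypothetical, and flags vary by version:

  virtctl image-upload dv rhel8-rootdisk \
    --size=20Gi \
    --image-path=./rhel8.qcow2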
Deploying an application
Once the operating system is installed, the application can be deployed and configured several ways
● The application is pre-installed with the OS
○ This is helpful when deploying from container image or PXE as all components can be managed and
treated like other container images
● The application is installed to a container image
○ Allows the application to be mounted to the VM using a secondary disk. Decouples OS and app lifecycle.
When used with a VM that has a persistently deployed OS this breaks live migration
● The application is installed after OS is installed to a persistent disk
○ cloud-init - perform configuration operations on first boot, including OS customization and app
deployment
○ SSH/Console - connect and administer the OS just like any other VM
○ Ansible or other automation - An extension of the SSH/console method, just automated
Additional resources
More information
● Documentation:
○ OpenShift Virtualization: https://docs.openshift.com
○ KubeVirt: https://kubevirt.io
● Demos and video resources: http://demo.openshift.com
● Labs and workshops: coming soon to RHPDS
Thank you
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos