OpenShift Virtualization - Technical Overview


OpenShift Virtualization

Technical presentation

1
What is OpenShift
Virtualization?

3
Containers are not virtual machines
● Containers are process isolation
● Kernel namespaces provide isolation and
cgroups provide resource controls
● No hypervisor needed for containers
● Contain only binaries, libraries, and tools
which are needed by the application
● Ephemeral

[Diagram: Virtualization stack (Infrastructure, Hypervisor, one Guest OS per App 1-3) beside Containerization stack (Infrastructure, Operating System, App 1-3)]

4
Virtual machines can be put into containers
● A KVM virtual machine is a process
● Containers encapsulate processes
● Both have the same underlying
resource needs:
○ Compute
○ Network
○ (sometimes) Storage

5
OpenShift Virtualization
● Virtual machines
○ Running in containers, managed as Pods
○ Using the KVM hypervisor
● Scheduled, deployed, and managed by Kubernetes
● Integrated with container orchestrator resources and
services
○ Traditional Pod-like SDN connectivity and/or
connectivity to external VLAN and other networks
via multus
○ Persistent storage paradigm (PVC, PV,
StorageClass)

6
VM containers use KVM
● OpenShift Virtualization uses KVM, the Linux kernel
hypervisor
● KVM is a core component of the Red Hat Enterprise
Linux kernel
○ KVM has 10+ years of production use: Red Hat
Virtualization, Red Hat OpenStack Platform, and
RHEL all leverage KVM, QEMU, and libvirt
● QEMU uses KVM to execute virtual machines
● libvirt provides a management abstraction layer

[Diagram: HARDWARE (CPU/RAM, STORAGE, NETWORK) under RHCOS with KVM and DRIVERs, running QEMU, libvirt, and OTHER APPS]

7
Built with
Kubernetes

8
Virtual machines in a container world
● Provides a way to transition application components
which can’t be directly containerized into a Kubernetes
system
○ Integrates directly into existing k8s clusters
○ Follows Kubernetes paradigms:
■ Container Network Interface (CNI)
■ Container Storage Interface (CSI)
■ Custom Resource Definitions (CRD, CR)
● Schedule, connect, and consume VM resources as
container-native

[Diagram: VM pod and App pod on OpenShift, on RHEL CoreOS, on a Physical Machine]

9
Virtualization native to Kubernetes
● Operators are a Kubernetes-native way to introduce
new capabilities
● New CustomResourceDefinitions (CRDs) for native
VM integration, for example:
○ VirtualMachine
○ VirtualMachineInstance
○ VirtualMachineInstanceMigration
○ VirtualMachineSnapshot
○ DataVolume
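
As a rough illustration of how these custom resources are addressed through the Kubernetes API, the Python sketch below maps each kind listed above to an API group/version and builds an empty object envelope. The group/version strings are assumptions that vary by release (verify them on the cluster, for example with `oc api-resources`), and the helper name `skeleton` is hypothetical.

```python
import json

# Assumed API group/versions; these differ between releases, so verify
# them against the target cluster before relying on them.
API_VERSIONS = {
    "VirtualMachine": "kubevirt.io/v1",
    "VirtualMachineInstance": "kubevirt.io/v1",
    "VirtualMachineInstanceMigration": "kubevirt.io/v1",
    "VirtualMachineSnapshot": "snapshot.kubevirt.io/v1alpha1",
    "DataVolume": "cdi.kubevirt.io/v1beta1",
}


def skeleton(kind, name, namespace="default"):
    """Return the envelope shared by every custom resource (hypothetical helper)."""
    return {
        "apiVersion": API_VERSIONS[kind],
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {},
    }


# JSON is also accepted wherever a YAML manifest is expected.
print(json.dumps(skeleton("VirtualMachine", "example-vm"), indent=2))
```

Each kind is then managed like any other Kubernetes object: created, listed, patched, and deleted through the API server.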

10
Containerized virtual machines
Kubernetes resources
● Every VM runs in a launcher pod. The launcher process will
supervise, using libvirt, and provide pod integration.
Red Hat Enterprise Linux
● libvirt and qemu from RHEL are mature, have high
performance, provide stable abstractions, and have a
minimal overhead.
Security - Defense in depth
● Immutable RHCOS by default, SELinux MCS, plus KVM
isolation - inherited from the Red Hat Portfolio stack

[Diagram: VM resources labeled Storage, Network, CPU, Memory, Device]
11
Using VMs and containers together
● Virtual machines connected to pod networks
are accessible using standard Kubernetes
methods:
○ Service
○ Route
○ Ingress
● Network policies apply to VM pods the same
as application pods
● VM-to-pod, and vice-versa, communication
happens over SDN or ingress depending on
network connectivity
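
To make the Service case concrete, here is a minimal Python sketch that builds a Service selecting a VM pod by label. It assumes the VM template carries a label `app: <value>` that the virt-launcher pod inherits; the label key, the names, and the helper `vm_service` are all hypothetical.

```python
import json


def vm_service(name, vm_label_value, port, target_port):
    """Build a Service that selects a VM pod by label (hypothetical helper).

    Assumes the VM template sets the label 'app: <vm_label_value>', so the
    virt-launcher pod that runs the VM matches the selector.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": vm_label_value},
            "ports": [
                {"protocol": "TCP", "port": port, "targetPort": target_port}
            ],
        },
    }


print(json.dumps(vm_service("web-vm", "web-vm", 80, 8080), indent=2))
```

Because the selector only matches labels, the same Service can front a mix of VM pods and ordinary application pods.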

12
Managed with
OpenShift

13
Virtual Machine Management
● Create, modify, and destroy virtual
machines, and their resources, using
the OpenShift web interface or CLI
● Use the virtctl command to
simplify virtual machine interaction
from the CLI

14
Create VMs

15
Virtual Machine creation
● Streamlined and simplified creation via the GUI or
create VMs programmatically using YAML
● Full configuration options for compute, network, and
storage resources
○ Clone VMs from templates or import disks using
DataVolumes
○ Pre-defined and customizable presets for
CPU/RAM allocations
○ Workload profile to tune KVM for expected
behavior
● Import VMs from VMware vSphere or Red Hat
Virtualization
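
For the programmatic path, a VM definition is an ordinary Kubernetes manifest. The Python sketch below assembles a minimal one and prints it as JSON. It is an illustration under assumptions: the `kubevirt.io/v1` API version may differ by release, the container disk image URL is a placeholder, and the helper `virtual_machine` is hypothetical.

```python
import json


def virtual_machine(name, cores, memory, disk_image):
    """Build a minimal VirtualMachine manifest (hypothetical helper)."""
    return {
        "apiVersion": "kubevirt.io/v1",  # assumed; older releases differ
        "kind": "VirtualMachine",
        "metadata": {"name": name},
        "spec": {
            "running": False,  # define the VM without starting it
            "template": {
                "metadata": {"labels": {"kubevirt.io/domain": name}},
                "spec": {
                    "domain": {
                        "cpu": {"cores": cores},
                        "resources": {"requests": {"memory": memory}},
                        "devices": {
                            "disks": [
                                {"name": "rootdisk", "disk": {"bus": "virtio"}}
                            ],
                            "interfaces": [
                                {"name": "default", "masquerade": {}}
                            ],
                        },
                    },
                    "networks": [{"name": "default", "pod": {}}],
                    "volumes": [
                        {
                            "name": "rootdisk",
                            "containerDisk": {"image": disk_image},
                        }
                    ],
                },
            },
        },
    }


# Placeholder image reference; substitute a real container disk image.
vm = virtual_machine("example-vm", 2, "4Gi", "registry.example.com/disks/os:latest")
print(json.dumps(vm, indent=2))
```

The printed document could be saved to a file and submitted with `oc apply -f`, the same way as any other resource.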
16
Using templates for virtual machines
● Simplified and streamlined virtual machine
creation experience for VM consumers
● Administrators configure templates with an OS
disk, consumers select from the options

17
Creating a virtual machine
● Flavor represents the preconfigured CPU and
RAM assignments
● Storage, including PVC size and storage class, is
determined by the template administrator
● A default network configuration is defined in the
template
● Workload profile defines the category of workload
expected and is used to set KVM performance
flags
● The VM can be further customized by selecting
the option, or immediately created and deployed
○ Additional customization includes
CPU/memory, storage, network, cloud-init,
and more
18
VM Templates

19
Templates
● Templates are a core concept for
virtual machine creation with
OpenShift Virtualization
● Red Hat provides default templates;
administrators can create and
customize additional templates as needed
● Boot sources provide disk images for
the template
● Creating VMs can be done from the
template page in just a few clicks

20
Create a template - General
● In addition to unique names, each template is
associated with a provider
○ Providers represent who created the
template, with optional support
information
● The guest operating system and source boot
disk are provided. A boot disk can be imported
during the process, or an ISO can be used to
boot and install the OS
● A default flavor, representing CPU and memory
allotments, is assigned
● Workload type determines optimizations to
balance between performance and efficiency

21
Create a template - Networks
● Add or edit network adapters
● One or more network connections
○ Pod network for the default SDN
○ Additional multus-based interfaces
for specific connectivity
● Multiple NIC models for guest OS
compatibility or paravirtualized
performance with VirtIO
● Masquerade, bridge, or SR-IOV
connection types
● MAC address customization if desired

22
Create a template - Storage
● Add or edit persistent storage
● Disks can be sourced from
○ Imported QCOW2 or raw images
○ New or existing PVCs
○ Clone existing PVCs
● Use SATA/SCSI interface for compatibility
or VirtIO for paravirtual performance
● For new or cloned disks, select from
available storage classes
○ Customize volume and access mode as
needed
○ RWX PVCs are required for live
migration

23
Create a template - Advanced
● Customize the operating system
deployment using cloud-init
scripts
○ Guest OS must have
cloud-init installed
○ RHEL, Fedora, etc. cloud images
○ Default templates will
auto-generate a simple
cloud-init to set the password
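
As a sketch of what such a generated cloud-init might look like, the Python below builds a NoCloud user-data document that sets a password and wraps it in the volume and disk entries a VM definition would reference. The exact content the default templates generate is not shown in this deck, so treat the user data, the volume name, and the helper `cloud_init_volume` as assumptions.

```python
def cloud_init_volume(password, volume_name="cloudinitdisk"):
    """Return (volume, disk) entries for a cloud-init NoCloud disk.

    Hypothetical helper: the user data below is a minimal example, not the
    exact document the default templates produce.
    """
    user_data = "\n".join(
        [
            "#cloud-config",
            f"password: {password}",
            "chpasswd: {expire: false}",  # do not force a change at first login
        ]
    ) + "\n"
    volume = {"name": volume_name, "cloudInitNoCloud": {"userData": user_data}}
    disk = {"name": volume_name, "disk": {"bus": "virtio"}}
    return volume, disk


volume, disk = cloud_init_volume("change-me")
print(volume["cloudInitNoCloud"]["userData"])
```

The volume goes under `spec.template.spec.volumes` and the disk under `spec.template.spec.domain.devices.disks`; the two are linked by name.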

24
Import VMs

25
Virtual Machine Import
● Wizard supports importing from VMware or
Red Hat Virtualization
○ Single-VM workflow
● VMware import uses VDDK to expedite the
disk import process
○ User is responsible for downloading the
VDDK from VMware and adding it to a
container image
● Credentials stored as Secrets
● ResourceMapping CRD configures default
source -> destination storage and network
associations

26
View / manage
VMs

27
Virtual Machine - Overview

● General overview about the virtual machine


● Information populated from guest when
integrations are available
○ IP address, etc.
● Inventory quickly shows configured hardware
with access to view/manage
● Utilization reporting for CPU, RAM, disk, and
network
● Events related to the Pod, scheduling, and
resources are displayed

28
Virtual Machine - Actions

● Actions menu allows quick access to
common VM tasks
○ Start/stop/restart
○ Live migration
○ Clone
○ Edit application group, labels, and
annotations
○ Delete
● Accessible from all tabs of VM details
screen and the VM list

29
Virtual Machine - Details

● Details about the virtual machine


○ Labels, annotations
○ Configured OS
○ Template used, if any
○ Configured boot order
○ Associated workload profile
○ Flavor
● Additional details about scheduling
○ Node selector, tolerations, (anti)affinity
rules
● Services configured for the VM

30
Virtual Machine - Console

● Browser-based access to the serial and
graphical console of the virtual machine
● Access the console using native OS tools,
e.g. virt-viewer, using the virtctl CLI
command
○ virtctl console vmname
○ virtctl vnc vmname

31
Virtual Machine - Disks and NICs

● Add, edit, and remove NICs
and disks for non-running
virtual machines

32
Destroy VMs

33
Destroying a Virtual Machine
● Deleting a VM removes the VM definition
○ Optionally delete PVC-backed disks
associated with the VM
● Running VMs are terminated first
● Other associated resources, e.g. Services, are
not affected

34
Metrics

35
Overview Virtual Machine metrics
● Summary metrics for 1, 6, and 24 hour periods are
quickly viewable from the VM overview page
● Clicking a graph will display it enlarged in the
metrics UI

36
Detailed Virtual Machine metrics
● Virtual machine, and VM pod, metrics are collected
by the OpenShift metrics service
○ Available under the kubevirt namespace in
Prometheus
● Available per-VM metrics include
○ Active memory
○ Active CPU time
○ Network in/out errors, packets, and bytes
○ Storage R/W IOPS, latency, and throughput
● VM metrics are for VMs, not for VM pods
○ Management overhead not included in output
○ Look at virt-launcher pod metrics for
● No preexisting Grafana dashboards

37
Deeper into the
technology

38
Containerizing KVM
Trusted, mature KVM wrapped in modern management and automation

Red Hat Virtualization | OpenShift Virtualization | Red Hat OpenStack Platform
RHV-M Console / CLI | OpenShift Console / CLI | OpenStack Horizon / CLI
vdsm | kubelet | nova-compute
libvirt | libvirt | libvirt
QEMU / KVM | QEMU / KVM | QEMU / KVM
VM | VM (in a KubeVirt Container) | VM
RHV Host | RHEL CoreOS Host | OSP Compute
39
Architectural Overview

[Diagram: Cluster Services (API Server, virt-controller) and Nodes (kubelet; a virt-handler Pod run as a DaemonSet; a VM Pod containing virt-launcher, libvirtd, and the VM; Other Pod(s) containing container 1, container 2, ... container n)]

40
Virtual machines

42
Containerized virtual machines
● Inherit many features and functions from Kubernetes
○ Scheduling, high availability, attach/detach resources
● Containerized virtual machines have the same characteristics as
non-containerized
○ CPU, RAM, etc. limitations dictated by libvirt and QEMU
○ Linux and Windows guest operating systems
● Storage
○ Use Persistent Volume Claims (PVCs) for VM disks
○ Containerized Data Importer (CDI) imports VM images
● Network
○ Inherit pod network by default
○ Multus enables direct connection to external network
43
Network

45
Virtual Machine Networking
● Virtual machines optionally connect to the
standard pod network
○ OpenShift SDN, OVNKubernetes
○ Partners, such as Calico, are also supported
● Additional network interfaces accessible via
Multus:
○ Bridge, SR-IOV
○ VLAN and other networks can be created at
the host level using nmstate
● When using at least one interface on the default
SDN, Service, Route, and Ingress configuration
applies to VM pods the same as others
46
Example host network configuration

● Pod, service, and machine network are configured
by OpenShift automatically
○ Use kernel parameters (dracut) for
configuration at install - bond0 in the example
diagram
● Use kubernetes-nmstate, via the NMState
Operator, to configure additional host network
interfaces
○ bond1 and br1 in the example diagram
● VM pods connect to one or more networks
simultaneously
The following slides show an example of how this
setup is configured

[Diagram: Node with four NICs, bond0 and bond1, br0 and br1, the Machine Net, the SDN (Pod Net, Service Net), and Multus]


47
Host bond configuration

● NodeNetworkConfigurationPolicy (NNCP)
○ Nmstate operator CRD
○ Configure host network using declarative
language
● Applies to all nodes specified
in the nodeSelector,
including newly added nodes
automatically
● Update or add new NNCPs
for additional host configs

[Diagram: Node with four NICs, bond0 and bond1, br0 and br1, the Machine Net, the SDN (Pod Net, Service Net), and Multus]


48
Host bridge configuration

[Diagram: Node with four NICs, bond0 and bond1, br0 and br1, the Machine Net, the SDN (Pod Net, Service Net), and Multus]
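
The bond and bridge from the host network slides could be declared in a single NodeNetworkConfigurationPolicy along these lines. This Python sketch only builds the manifest; the `nmstate.io/v1` API version, the `port` key under `link-aggregation` (older nmstate releases used a different key), the bond mode, the NIC names, and the helper `bridge_on_bond_policy` are all assumptions to check against the installed operator.

```python
import json


def bridge_on_bond_policy(policy_name, bond, bridge, nics, node_selector):
    """Build an NNCP that bonds `nics` and puts a Linux bridge on top.

    Hypothetical helper; field names follow the nmstate schema as assumed
    here and should be checked against the installed operator version.
    """
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": policy_name},
        "spec": {
            "nodeSelector": node_selector,
            "desiredState": {
                "interfaces": [
                    {
                        "name": bond,
                        "type": "bond",
                        "state": "up",
                        "link-aggregation": {
                            "mode": "active-backup",
                            "port": list(nics),
                        },
                    },
                    {
                        "name": bridge,
                        "type": "linux-bridge",
                        "state": "up",
                        "bridge": {
                            "options": {"stp": {"enabled": False}},
                            "port": [{"name": bond}],
                        },
                    },
                ]
            },
        },
    }


policy = bridge_on_bond_policy(
    "br1-on-bond1",
    "bond1",
    "br1",
    ["ens3", "ens4"],  # hypothetical NIC names
    {"node-role.kubernetes.io/worker": ""},
)
print(json.dumps(policy, indent=2))
```

Because the policy is selected by node label, nodes added later with the same label would receive the same configuration.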


49
Host network status

● Use the
NodeNetworkConfigurationEnactment
(NNCE) object to view status of NNCP
application
● Further details of the node network state can
be seen using the NodeNetworkState CRD
○ oc get nns/node-name -o yaml

50
Connecting Pods to networks
● Multus uses CNI network definitions in the
NetworkAttachmentDefinition to allow access
○ net-attach-def are namespaced
○ Pods cannot connect to a net-attach-def
in a different namespace
● cnv-bridge and cnv-tuning types are used to
enable VM specific functions
○ MAC address customization
○ MTU and promiscuous mode
○ sysctls, if needed
● Pod connections are defined using an annotation
○ Pods can have many connections to many
networks
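
A minimal sketch of the two pieces described above: a NetworkAttachmentDefinition that chains the cnv-bridge and cnv-tuning plugins, and the pod annotation that requests it. The CNI version, the resourceName annotation value, the object names, and both helper functions are assumptions for illustration.

```python
import json


def bridge_attachment(name, namespace, bridge):
    """Build a net-attach-def for a host bridge (hypothetical helper)."""
    cni_config = {
        "cniVersion": "0.3.1",  # assumed CNI spec version
        "name": name,
        "plugins": [
            {"type": "cnv-bridge", "bridge": bridge},
            {"type": "cnv-tuning"},
        ],
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Assumed annotation tying the attachment to the host bridge.
                "k8s.v1.cni.cncf.io/resourceName": f"bridge.network.kubevirt.io/{bridge}"
            },
        },
        # Multus expects the CNI configuration as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


def pod_network_annotation(*attachment_names):
    """Return the pod annotation requesting one or more attachments."""
    return {"k8s.v1.cni.cncf.io/networks": ",".join(attachment_names)}


nad = bridge_attachment("br1-net", "vm-project", "br1")
print(json.dumps(nad, indent=2))
print(pod_network_annotation("br1-net", "storage-net"))
```

Since the definition is namespaced, it has to be created in every project whose pods or VMs need the network.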

51
Connecting VMs to networks

● Virtual machine interfaces describe NICs
attached to the VM
○ spec.domain.devices.interfaces
○ Model: virtio, e1000, pcnet, rtl8139, etc.
○ Type: masquerade, bridge
○ MAC address: customize the MAC
● The networks definition describes the connection
type
○ spec.networks
○ Pod = default SDN
○ Multus = secondary network using Multus
● Using the GUI makes this easier and removes
the need to edit / manage connections in YAML
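
For anyone editing the YAML directly, the two lists are linked by name: every entry in spec.domain.devices.interfaces needs a matching entry in spec.networks. The Python sketch below builds one pod-network NIC and one Multus NIC and checks that pairing. The interface names, the MAC address, and both helpers are hypothetical.

```python
def vm_network_spec(secondary_nad, mac=None):
    """Return (interfaces, networks) for a VM with two NICs.

    The first NIC uses masquerade on the default pod network; the second is
    bridged to a Multus network named by `secondary_nad`.
    """
    secondary = {"name": "secondary", "bridge": {}, "model": "virtio"}
    if mac:
        secondary["macAddress"] = mac
    interfaces = [{"name": "default", "masquerade": {}}, secondary]
    networks = [
        {"name": "default", "pod": {}},
        {"name": "secondary", "multus": {"networkName": secondary_nad}},
    ]
    return interfaces, networks


def unmatched_interfaces(interfaces, networks):
    """List interface names that have no network with the same name."""
    network_names = {n["name"] for n in networks}
    return [i["name"] for i in interfaces if i["name"] not in network_names]


interfaces, networks = vm_network_spec("br1-net", mac="02:00:00:00:00:01")
print(unmatched_interfaces(interfaces, networks))
```

An empty result means each NIC has a network to connect to; a non-empty list points at the entries to fix before applying the VM definition.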
52
Storage

53
Virtual Machine Storage
● OpenShift Virtualization uses the Kubernetes
PersistentVolume (PV) paradigm
● PVs can be backed by
○ In-tree iSCSI, NFS, etc.
○ CSI drivers
○ Local storage using host path provisioner
○ OpenShift Container Storage
● Use dynamically or statically provisioned PVs
● RWX is required for live migration
● Disks are attached using VirtIO or SCSI controllers
○ Connection order specified in the VM definition
● Boot order customized via VM definition
54
VM disks in PVCs
● VM disks on Filesystem mode PVCs are created as thin
provisioned raw images
○ As of OCP 4.7 / OpenShift Virtualization 2.6,
DataVolumes can create thin or thick volumes
● Block mode PVCs are attached directly to the VM
● CSI operations, e.g. snapshot and clone, should be used
with care
○ Use DataVolumes to clone VM disks
○ Use VM details interface for (powered off) VM snaps
● PVC resize does not modify the size of the VM disk
○ Not currently supported
● Hot add is not supported (for any virtual hardware)

[Diagram: Vendor CSI Deployment, showing OpenShift Node(s) running kubelet, the Node CSI Driver, and the Driver registrar; the CSI Controller with Sidecars; and the Storage Array]
55
DataVolumes

● VM disks can be imported from multiple sources using
DataVolumes, e.g. an HTTP(S) or S3 URL for a QCOW2 or
raw disk image, optionally compressed
● VM disks can be cloned / copied from existing PVCs
● DataVolumes are created as distinct objects or as a part of
the VM definition as a dataVolumeTemplate
● DataVolumes use the ContainerizedDataImporter to
connect, download, and prepare the disk image
● DataVolumes create PVCs based on defaults defined in
the kubevirt-storage-class-defaults ConfigMap or
according to the profile (as of version 4.8)
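
As an illustration of the import case, this Python sketch builds a standalone DataVolume that pulls a disk image over HTTP(S) into a new PVC. The `cdi.kubevirt.io/v1beta1` API version, the example URL, and the helper `http_data_volume` are assumptions; the same body could sit under a VM's dataVolumeTemplates instead.

```python
import json


def http_data_volume(name, url, size, storage_class=None,
                     access_mode="ReadWriteOnce"):
    """Build a DataVolume importing a disk image from a URL (hypothetical helper)."""
    pvc = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class:
        pvc["storageClassName"] = storage_class
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",  # assumed; varies by release
        "kind": "DataVolume",
        "metadata": {"name": name},
        "spec": {"source": {"http": {"url": url}}, "pvc": pvc},
    }


dv = http_data_volume(
    "os-disk",
    "https://images.example.com/os.qcow2",  # placeholder URL
    "30Gi",
    access_mode="ReadWriteMany",  # RWX, as required for live migration
)
print(json.dumps(dv, indent=2))
```

Leaving out the storage class lets the cluster default apply.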
56
Storage Profiles

● Introduced with OpenShift Virtualization 4.8


● Provide default settings and properties for
StorageClasses used by DataVolumes
● Created automatically for every StorageClass
● Preconfigured values for some storage providers,
administrator can modify and customize
● DataVolume definitions only need to specify
StorageClass, without knowledge of underlying
details
○ spec.storage doesn’t require fields other than
the size and StorageClass
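
A sketch of what that reduced definition might look like: the DataVolume below clones an existing PVC and gives only a size and StorageClass under spec.storage, leaving access mode and volume mode to the storage profile. The API version, object names, and the helper `profile_based_clone` are assumptions.

```python
import json


def profile_based_clone(name, source_namespace, source_pvc, size, storage_class):
    """Build a DataVolume that relies on the storage profile for PVC details.

    Hypothetical helper: only size and StorageClass are given; no accessModes
    or volumeMode appear in the definition.
    """
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",  # assumed; varies by release
        "kind": "DataVolume",
        "metadata": {"name": name},
        "spec": {
            "source": {"pvc": {"namespace": source_namespace, "name": source_pvc}},
            "storage": {
                "resources": {"requests": {"storage": size}},
                "storageClassName": storage_class,
            },
        },
    }


dv = profile_based_clone("vm2-disk", "golden-images", "os-base", "30Gi", "example-sc")
print(json.dumps(dv, indent=2))
```

Compared with a definition that uses spec.pvc, the author of this one does not need to know which access and volume modes the storage backend supports.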
57
Containerized Data Importer

1. The user creates a virtual machine with a DataVolume
2. The StorageClass is used to satisfy the PVC request
3. The CDI controller creates an importer pod, which mounts
the PVC and retrieves the disk image. The image could
be sourced from S3, HTTP, or other accessible locations
4. After completing the import, the import pod is destroyed
and the PVC is available for the VM

[Diagram: VM, PVC, PV, Import Pod, CDI Controller, and Data source, connected by arrows labeled Requests, Creates, and Writes and numbered to match the steps above]
58
Ephemeral Virtual Machine Disks
● VMs booted via PXE or using a container image can be
“diskless”
○ PVCs may be attached and mounted as secondary
devices for application data persistence
● VMs based on container images use the standard
copy-on-write graph storage for OS disk R/W
○ Consider and account for capacity and IOPS
during RHCOS disk sizing if using this type
● An emptyDisk may be used to add additional
ephemeral capacity for the VM

59
Helper disks
● OpenShift Virtualization attaches disks to VMs for
injecting data
○ Cloud-Init
○ ConfigMap
○ Secrets
○ ServiceAccount
● These disks are read-only and can be mounted by the OS
to access the data within

60
Comparing with
traditional
virtualization
platforms

61
Live Migration
● Live migration moves a virtual machine from one node to another in the OpenShift cluster
● Can be triggered via GUI, CLI, API, or automatically
● RWX storage is required, cannot use bridge connection to pod network
● Live migration is cancellable by deleting the API object
● Default maximum of five (5) simultaneous live migrations
○ Maximum of two (2) outbound migrations per node, 64MiB/s throughput each

Migration Reason | vSphere | RHV | OpenShift Virtualization
Resource contention | DRS | Cluster policy | Pod eviction policy, pod descheduler
Node maintenance | Maintenance mode | Maintenance mode | Maintenance mode, node drain
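
The default limits listed above allow a back-of-the-envelope estimate of how long a node drain might take. The Python sketch below is a simplification: it assumes each migration copies guest memory once at the full 64MiB/s (ignoring pages dirtied during the copy) and that migrations leave a node in equal-sized batches. Both function names are hypothetical.

```python
import math

# Defaults quoted on this slide.
MIGRATION_BANDWIDTH_MIB_S = 64
MAX_OUTBOUND_PER_NODE = 2


def min_transfer_seconds(vm_memory_gib):
    """Lower bound for one migration: a single pass over guest memory."""
    return vm_memory_gib * 1024 / MIGRATION_BANDWIDTH_MIB_S


def drain_batches(vm_count):
    """Sequential batches needed to move vm_count VMs off one node."""
    return math.ceil(vm_count / MAX_OUTBOUND_PER_NODE)


# An 8 GiB VM needs at least 8 * 1024 / 64 = 128 seconds.
print(min_transfer_seconds(8))
# Five VMs on one node leave in three batches (2 + 2 + 1).
print(drain_batches(5))
```

Real migrations usually take longer than this bound, because busy guests keep modifying memory while it is being copied.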

62
Automated live migration
● OpenShift / Kubernetes triggers Pod rebalance actions based on multiple factors
○ Soft / hard eviction policies
○ Pod descheduler
○ Pod disruption budgets
○ Node resource contention resulting in evictions
■ Pods are Burstable QoS class by default
■ All memory is requested in Pod definition, only CPU overhead is requested
● Pod rebalance applies to VM pods equally
● VMs will behave according to the eviction strategy
○ LiveMigrate - use live migration to move the VM to a different node
○ No definition - terminate the VM if the node is drained or Pod evicted
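
As a sketch of the two objects involved, the Python below sets the LiveMigrate eviction strategy on an existing VM manifest and builds a VirtualMachineInstanceMigration to request a migration by hand. The `kubevirt.io/v1` API version, the generated object name, and both helpers are assumptions.

```python
import copy
import json


def with_live_migrate(vm_manifest):
    """Return a copy of a VM manifest with evictionStrategy set to LiveMigrate."""
    vm = copy.deepcopy(vm_manifest)
    pod_spec = vm.setdefault("spec", {}).setdefault("template", {}).setdefault("spec", {})
    pod_spec["evictionStrategy"] = "LiveMigrate"
    return vm


def migration_request(vmi_name, name=None):
    """Build a VirtualMachineInstanceMigration for a running VMI."""
    return {
        "apiVersion": "kubevirt.io/v1",  # assumed; varies by release
        "kind": "VirtualMachineInstanceMigration",
        "metadata": {"name": name or f"{vmi_name}-migration"},
        "spec": {"vmiName": vmi_name},
    }


vm = with_live_migrate({"kind": "VirtualMachine", "metadata": {"name": "example-vm"}})
print(json.dumps(vm, indent=2))
print(json.dumps(migration_request("example-vm"), indent=2))
```

Deleting the migration object while it is in progress is how a live migration is cancelled.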

63
VM scheduling
● VM scheduling follows pod scheduling rules
○ Node selectors
○ Taints / tolerations
○ Pod and node affinity / anti-affinity
● Kubernetes scheduler takes into account many additional factors
○ Resource load balancing - requests and reservations
○ Large / Huge page support for VM memory
○ Use scheduler profiles to provide additional hints (for all Pods)
● Resources are managed by Kubernetes
○ CPU and RAM requests less than limit - Burstable QoS by default
○ K8s QoS policy determines scheduling priority: BestEffort class is evicted before
Burstable class, which is evicted before Guaranteed class
64
Node Resource Management
● VM density is determined by multiple factors controlled at the cluster, OpenShift Virtualization,
Pod, and VM levels
● Pod QoS policy
○ Burstable (limit > request) allows more overcommit, but may lead to more frequent migrations
○ Guaranteed (limit = request) allows less overcommitment, but may have less physical resource
utilization on the hosts
● Cluster Resource Override Operator provides global overcommit policy, can be customized per
project for additional control
● Pods request full amount of VM memory and approx. 10% of VM CPU
○ VM pods request a small amount of additional memory, used for libvirt/QEMU overhead
■ Administrator can set this to be overcommitted
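
To show why VM pods land in the Burstable class by default, this Python sketch applies the Kubernetes QoS rules to a pod's requests and limits, and derives default VM pod requests from the "full memory, approx. 10% of CPU" rule above. It is a simplification: quantities are compared as literal strings (so "1" and "1000m" count as different), the libvirt/QEMU memory overhead is ignored, and both function names are hypothetical.

```python
def qos_class(containers):
    """Classify a pod from its containers' 'requests' and 'limits' maps."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def default_vm_requests(vcpus, memory_mib, cpu_fraction=0.10):
    """Requests for a VM pod: all memory, roughly 10% of CPU, no limits."""
    return {
        "cpu": f"{round(vcpus * cpu_fraction * 1000)}m",
        "memory": f"{memory_mib}Mi",
    }


requests = default_vm_requests(4, 8192)
print(requests)
print(qos_class([{"requests": requests}]))
```

Setting limits equal to requests for both CPU and memory moves the pod to Guaranteed, at the cost of less room for overcommitment.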

65
High availability
● Node failure is detected by Kubernetes and results in the Pods from the lost node being
rescheduled to the surviving nodes
● VMs are not scheduled to nodes which have not had a heartbeat from virt-handler, regardless of
Kubernetes node state
● Additional monitoring may trigger automated action to force stop the VM pods, resulting in
rescheduling
○ May take up to 5 minutes for virt-handler and/or Kubernetes to detect failure
○ Liveness and Readiness probes may be configured for VM-hosted applications
○ Machine health checks can decrease failure detection time

66
Terminology comparison
Feature | RHV | OpenShift Virtualization | vSphere
Where VM disks are stored | Storage Domain | PVC | datastore
Policy based storage | None | StorageClass | SPBM
Non-disruptive VM migration | Live migration | Live migration | vMotion
Non-disruptive VM storage migration | Storage live migration | N/A | Storage vMotion
Active resource balancing | Cluster scheduling policy | Pod eviction policy, descheduler | Distributed Resource Scheduler (DRS)
Physical network configuration | Host network config (via nmstate w/4.4) | nmstate Operator, Multus | vSwitch / DvSwitch
Overlay network configuration | OVN | OCP SDN (OpenShiftSDN, OVNKubernetes, and partners), Multus | NSX-T
Host / VM metrics | Data warehouse + Grafana (RHV 4.4) | OpenShift Metrics, health checks | vCenter, vROps
67
Runtime
awareness

68
Deploy and configure

● OpenShift Virtualization is deployed as an
Operator utilizing multiple CRDs, ConfigMaps, etc.
for primary configuration
● Many aspects are controlled by native Kubernetes
functionality
○ Scheduling
○ Overcommitment
○ High availability
● Utilize standard Kubernetes / OpenShift practices
for applying and managing configuration

69
Compute configuration
● VM nodes should be physical with CPU virtualization technology enabled in the BIOS
○ Nested virtualization works, but is not supported
○ Emulation works, but is not supported (and is extremely slow)
● Node labeler detects CPU type and labels nodes for compatibility and scheduling
● Configure overcommitment using native OpenShift functionality - Cluster Resource Override
Operator
○ Optionally, customize the project template so that non-VM pods are not overcommitted
○ Customize projects hosting VMs for overcommit policy
● Apply Quota and LimitRange controls to projects with VMs to manage resource consumption
● VM definitions default to all memory “reserved” via a request, but only a small amount of CPU
○ CPU and memory request/limit values are modified in the VM definition

70
Network configuration
● Apply traditional network architecture decision framework to OpenShift Virtualization
○ Resiliency, isolation, throughput, etc. determined by combination of application, management,
storage, migration, and console traffic
○ Most clusters are not VM only, include non-VM traffic when planning
● Node interface on the MachineNetwork.cidr is used for “primary” communication, including SDN
○ This interface should be both resilient and high throughput
○ Used for migration and console traffic
○ Configure this interface at install time using kernel parameters, reinstall node if configuration
changes
● Additional interfaces, whether single or bonded, may be used for traffic isolation, e.g. storage and VM
traffic
○ Configure using nmstate Operator, apply configuration to nodes using selectors on NNCP
71
Storage configuration
● Shared storage is not required, but very highly encouraged
○ Live migration depends on RWX PVCs
● Create shared storage from local resources using OpenShift Container Storage
○ RWX file and block devices for live migration
● No preference for storage protocol, use what works best for the application(s)
● Storage backing PVs should provide adequate performance for VM workload
○ Monitor latency from within VM, monitor throughput from OpenShift
● For IP storage (NFS, iSCSI), consider using dedicated network interfaces
○ Will be used for all PVs, not just VM PVs
● Certified CSI drivers are recommended
○ Many non-certified CSI provisioners work, but do not have same level of OpenShift testing
● Local storage may be utilized via the Host Path Provisioner
72
Deploying a VM operating system
Creating virtual machines can be accomplished in multiple ways, each offering different options and capabilities
● Start by answering the question “Do I want to manage my VM like a container or a traditional VM?”
● Deploying the OS persistently, i.e. “I want to manage like a traditional VM”
○ Methods:
■ Import a disk with the OS already installed (e.g. cloud image) from a URL or S3 endpoint using a
DataVolume, or via CLI using virtctl
■ Clone from an existing PVC or VM template
■ Install to a PVC using an ISO
○ VM state will remain through reboots and, when using RWX PVCs, can be live migrated
● Deploying the OS non-persistently, i.e. “I want to manage like a container”
○ Methods:
■ Diskless, via PXE
■ Container image, from a registry
○ VM has no state, power off will result in disk reset. No live migration.
● Import disks deployed from a container image using CDI to make them persistent
73
Deploying an application
Once the operating system is installed, the application can be deployed and configured several ways
● The application is pre-installed with the OS
○ This is helpful when deploying from container image or PXE as all components can be managed and
treated like other container images
● The application is installed to a container image
○ Allows the application to be mounted to the VM using a secondary disk. Decouples OS and app lifecycle.
When used with a VM that has a persistently deployed OS this breaks live migration
● The application is installed after OS is installed to a persistent disk
○ cloud-init - perform configuration operations on first boot, including OS customization and app
deployment
○ SSH/Console - connect and administer the OS just like any other VM
○ Ansible or other automation - An extension of the SSH/console method, just automated

74
Additional
resources

75
More information
● Documentation:
○ OpenShift Virtualization: https://docs.openshift.com
○ KubeVirt: https://kubevirt.io
● Demos and video resources: http://demo.openshift.com
● Labs and workshops: coming soon to RHPDS

76
Thank you

Red Hat is the world’s leading provider of enterprise
open source software solutions. Award-winning
support, training, and consulting services make
Red Hat a trusted adviser to the Fortune 500.

linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat

77
