Corporate Technology

Architecture of the Kernel-based Virtual Machine (KVM)

Jan Kiszka, Siemens AG, CT T DE IT 1


Corporate Competence Center Embedded Linux

Copyright © Siemens AG 2010. All rights reserved.


Agenda

- Introduction
- Basic KVM model
- Memory
- API
- Optimizations
- Paravirtual devices
- Outlook

Slide 2 2010-09-23 Jan Kiszka, CT T DE IT 1 © Siemens AG, Corporate Technology


Virtualization of Commodity Computers

[Figure: building blocks to virtualize: instruction set, CPU, on-chip resources, MMU, memory, busses & I/O devices, interrupt controllers, clocks & timers.]



Virtualizing the x86 Instruction Set Architecture

x86 originally virtualization-“unfriendly”
  - No hardware provisions
  - Instructions behave differently depending on privilege context
  - Performance suffered on trap-and-emulate
  - CISC nature complicates instruction replacements
Early approaches to x86 virtualization
  - Binary translation (e.g. VMware)
    - Execute substitution code for privileged guest code
    - May require substantial replacements to preserve illusion
  - CPU paravirtualization (e.g. Xen)
    - Guest is aware of instruction restrictions
    - Hypervisor provides replacement services (hypercalls)
    - Raised abstraction levels for better performance
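
Trap-and-emulate, the baseline these bullets refer to, can be illustrated with a toy model: unprivileged instructions execute directly, privileged ones trap to the hypervisor, which emulates their effect on virtual state only. The instruction names and classes below are hypothetical and are not real x86; each trap stands for a costly guest/hypervisor round trip.

```python
from dataclasses import dataclass, field

# Toy instruction set: names and semantics are hypothetical, not real x86.
PRIVILEGED = {"cli", "sti", "out"}


class Trap(Exception):
    """Raised when the 'guest' executes a privileged instruction."""

    def __init__(self, insn):
        super().__init__(insn[0])
        self.insn = insn


@dataclass
class VirtualCPU:
    regs: dict = field(default_factory=lambda: {"a": 0})
    interrupts_enabled: bool = True  # virtual flag; the host's is never touched
    io_log: list = field(default_factory=list)
    traps: int = 0


def execute_native(vcpu, insn):
    """Stand-in for direct execution: unprivileged ops run, privileged ones trap."""
    op = insn[0]
    if op in PRIVILEGED:
        raise Trap(insn)
    if op == "mov":
        vcpu.regs[insn[1]] = insn[2]
    elif op == "add":
        vcpu.regs[insn[1]] += insn[2]
    else:
        raise ValueError(f"unknown instruction {op!r}")


def emulate(vcpu, insn):
    """Hypervisor side: apply the privileged effect to virtual state only."""
    vcpu.traps += 1
    op = insn[0]
    if op == "cli":
        vcpu.interrupts_enabled = False
    elif op == "sti":
        vcpu.interrupts_enabled = True
    elif op == "out":
        port, reg = insn[1], insn[2]
        vcpu.io_log.append((port, vcpu.regs[reg]))


def run_guest(vcpu, program):
    for insn in program:
        try:
            execute_native(vcpu, insn)
        except Trap as trap:  # in real life: an expensive world switch
            emulate(vcpu, trap.insn)
    return vcpu


vcpu = run_guest(VirtualCPU(), [("mov", "a", 0x41), ("cli",), ("add", "a", 1),
                                ("out", 0x3F8, "a"), ("sti",)])
print(vcpu.io_log, vcpu.traps)  # [(1016, 66)] 3
```

Binary translation and paravirtualization both aim at the same thing: reducing how often the `emulate` path is taken.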



Hardware-assisted x86 CPU Virtualization

Two variants
  - Intel's Virtualization Technology, VT-x
  - AMD-V (aka Secure Virtual Machine)
Identical core concept

[Figure: host CPU and guest VCPU, each with privilege rings 0-3; the CPU keeps separate host state and guest state.]
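
On a Linux x86 host, the two variants show up as CPU flags in /proc/cpuinfo: `vmx` for Intel VT-x and `svm` for AMD-V. The sketch below (helper names hypothetical) parses that text; a missing flag only says the feature is not visible to this kernel, not why.

```python
def hw_virt_flavor(cpuinfo_text):
    """Return 'VT-x', 'AMD-V' or None based on the 'flags' lines of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() != "flags":
            continue
        flags = value.split()
        if "vmx" in flags:
            return "VT-x"
        if "svm" in flags:
            return "AMD-V"
    return None


def host_hw_virt_flavor(path="/proc/cpuinfo"):
    """Check the running host; returns None if the file cannot be read."""
    try:
        with open(path) as f:
            return hw_virt_flavor(f.read())
    except OSError:
        return None


print(host_hw_virt_flavor())
```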



Advent and Evolution of KVM

Introduced to make VT-x/AMD-V available to user space
  - Exposes virtualization features securely
  - Interface: /dev/kvm
Merged quickly
  - Available since 2.6.20 (2006)
  - From first LKML posting to merge: 3 months
  - One reason: originally 100% orthogonal to core kernel
Evolved significantly since then
  - Ported to further architectures (s390, PowerPC, IA64)
  - Always with latest x86 virtualization features
  - Became recognized & driving part of Linux
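
A minimal probe of the /dev/kvm interface, sketched in Python and assuming a Linux host: KVM_GET_API_VERSION is `_IO(0xAE, 0x00)` in <linux/kvm.h>, and the stable API reports version 12. The helper name `probe_kvm` is hypothetical.

```python
import fcntl
import os

KVMIO = 0xAE
KVM_GET_API_VERSION = (KVMIO << 8) | 0x00  # _IO(KVMIO, 0x00), no argument
KVM_STABLE_API_VERSION = 12


def probe_kvm(path="/dev/kvm"):
    """Return the KVM API version, or None if the device is missing or unusable."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None  # no KVM module loaded, or no permission
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


version = probe_kvm()
print("KVM usable" if version == KVM_STABLE_API_VERSION else "KVM not available")
```

A user space hypervisor would refuse to continue on any other version, since the IOCTL interface is only guaranteed for the stable API.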



The KVM Model

Processes can create virtual machines
VMs can contain
  - Memory
  - Virtual CPUs
  - In-kernel device models
Guest physical memory part of creating process' address space
VCPUs run in process execution contexts
  - Process usually maps VCPUs on threads

[Figure: a hypervisor process holding guest memory, two VCPU threads and a further thread, running on the Linux kernel with KVM across several host CPUs.]
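
The process model can be mimicked without /dev/kvm: guest RAM is ordinary anonymous memory of the hypervisor process, and each VCPU is a thread of that process. In this sketch, `fake_vcpu_run` is a hypothetical placeholder for a VCPU's KVM_RUN loop; a real hypervisor would register the mapping with KVM_SET_USER_MEMORY_REGION and create one VCPU file descriptor per thread.

```python
import mmap
import threading

GUEST_RAM_SIZE = 64 * 1024  # hypothetical, tiny guest
NUM_VCPUS = 2

# Guest physical memory is just anonymous memory in this process' address space.
guest_ram = mmap.mmap(-1, GUEST_RAM_SIZE)


def fake_vcpu_run(vcpu_id, ram):
    """Hypothetical stand-in for a KVM_RUN loop: write a marker into guest RAM."""
    offset = vcpu_id * 4096
    ram[offset:offset + 4] = b"CPU" + bytes([ord("0") + vcpu_id])


threads = [threading.Thread(target=fake_vcpu_run, args=(i, guest_ram), name=f"vcpu{i}")
           for i in range(NUM_VCPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(guest_ram[0:4], guest_ram[4096:4100])  # b'CPU0' b'CPU1'
```

Because VCPUs are plain threads and guest RAM is plain process memory, the host scheduler and memory management handle them without KVM-specific code.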



Architectural Advantages of the KVM Model

Proximity of guest and user space hypervisor
  - Only one address space switch: guest ↔ host
  - Less rescheduling
Massive Linux kernel reuse
  - Scheduler
  - Memory management with swapping (though you don't want this)
  - I/O stacks
  - Power management
  - Host CPU hot-plugging
  - …
Massive Linux user land reuse
  - Network configuration
  - Handling VM images
  - Logging, tracing, debugging
  - ...

VCPU Execution Flow (KVM View)

User space: update context, raise IRQs, then Run (KVM_RUN)
Kernel: update guest state; save host state, load guest state; VM entry
CPU: execute native guest code until VM exit (with reason)
Kernel: save guest state, load host state, handle the exit
  - In-kernel I/O, [vMMU], ..., host IRQs: handled without leaving the kernel
  - Otherwise: return to user space
User space: handle I/O, invalid states, ...; handle signals
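
When KVM_RUN returns, user space finds the exit reason and its details in the mmap'ed VCPU segment, which starts with struct kvm_run. The sketch below decodes a few exit types from such a buffer, assuming the x86 layout of <linux/kvm.h> (exit_reason at offset 8, the exit union at offset 32) and a little-endian host; `handle_exit` and its tuple results are hypothetical.

```python
import struct

# Exit reasons from <linux/kvm.h> (subset)
KVM_EXIT_IO = 2
KVM_EXIT_HLT = 5
KVM_EXIT_MMIO = 6
KVM_EXIT_INTR = 10
KVM_EXIT_IO_OUT = 1

EXIT_REASON_OFFSET = 8   # after request_interrupt_window, immediate_exit, padding
EXIT_UNION_OFFSET = 32   # x86: exit data union follows cr8 and apic_base


def handle_exit(run):
    """Decode one exit from a kvm_run buffer (bytes, bytearray or mmap)."""
    (reason,) = struct.unpack_from("<I", run, EXIT_REASON_OFFSET)
    if reason == KVM_EXIT_IO:
        # direction, size (bytes), port, count, data_offset (from start of kvm_run)
        direction, size, port, count, data_offset = struct.unpack_from(
            "<BBHIQ", run, EXIT_UNION_OFFSET)
        data = bytes(run[data_offset:data_offset + size * count])
        return ("io", "out" if direction == KVM_EXIT_IO_OUT else "in", port, data)
    if reason == KVM_EXIT_MMIO:
        phys_addr, data, length, is_write = struct.unpack_from(
            "<Q8sIB", run, EXIT_UNION_OFFSET)
        return ("mmio", "write" if is_write else "read", phys_addr, data[:length])
    if reason == KVM_EXIT_HLT:
        return ("hlt",)
    if reason == KVM_EXIT_INTR:
        return ("signal",)  # KVM_RUN interrupted by a signal: handle it, then rerun
    return ("unhandled", reason)
```

In a run loop, the buffer would come from `mmap.mmap(vcpu_fd, size)` with the size reported by KVM_GET_VCPU_MMAP_SIZE; for an MMIO read, user space fills in the data before issuing the next KVM_RUN.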



KVM Memory Model

Slot-based guest memory
  - Maps guest physical to host virtual memory
  - Reconfigurable
  - Supports dirty tracking
In-Kernel Virtual MMU
Coalesced MMIO
  - Optimizes guest access to RAM-like virtual MMIO regions
Out of scope
  - Memory ballooning (guest ↔ user space hypervisor)
  - Kernel Same-page Merging (not KVM-specific)

[Figure: guest address space regions (RAM, coalesced MMIO, unassigned) mapped onto RAM in the hypervisor address space.]
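
Slot-based memory means a guest physical address is translated by finding the slot that covers it and adding the offset to that slot's host virtual base (userspace_addr in struct kvm_userspace_memory_region). Dirty tracking reports, per slot, a bitmap with one bit per guest page (KVM_GET_DIRTY_LOG). The Python sketch below models both; slot addresses are made up, and 4 KiB pages plus least-significant-bit-first ordering on a little-endian host are assumptions.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: x86 4 KiB pages


@dataclass(frozen=True)
class MemorySlot:
    slot: int
    guest_phys_addr: int
    memory_size: int
    userspace_addr: int  # host virtual address of the backing memory


def gpa_to_hva(slots, gpa):
    """Translate a guest physical address to a host virtual address, or None."""
    for s in slots:
        if s.guest_phys_addr <= gpa < s.guest_phys_addr + s.memory_size:
            return s.userspace_addr + (gpa - s.guest_phys_addr)
    return None  # unassigned: e.g. an MMIO hole served by device models


def dirty_pages(slot, bitmap):
    """Guest physical addresses of the pages flagged in a per-slot dirty bitmap."""
    pages = []
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (1 << bit):
                page = byte_index * 8 + bit
                if page * PAGE_SIZE < slot.memory_size:
                    pages.append(slot.guest_phys_addr + page * PAGE_SIZE)
    return pages


# Hypothetical layout: low RAM below 640 KiB, a hole, then 1 MiB of RAM at 1 MiB.
slots = [MemorySlot(0, 0x0, 0xA0000, 0x7F0000000000),
         MemorySlot(1, 0x100000, 0x100000, 0x7F0000200000)]
print(hex(gpa_to_hva(slots, 0x100010)))  # 0x7f0000200010
print(gpa_to_hva(slots, 0xB8000))        # None
```

Reconfiguring memory then amounts to replacing entries in the slot table, while live migration repeatedly fetches and clears dirty bitmaps.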



KVM API Overview

Step #1: open /dev/kvm
Three groups of IOCTLs
  - System-level requests
  - VM-level requests
  - VCPU-level requests
Per-group file descriptors
  - /dev/kvm fd for system level
  - Creating a VM or VCPU returns new fd
mmap on file descriptors
  - VCPU: fast kernel-user communication segment
    - Frequently read/modified part of VCPU state
    - Includes coalesced MMIO backlog
  - VM: map guest physical memory (deprecated)
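
The coalesced MMIO backlog is a ring the kernel appends to (advancing `last`) and user space drains (advancing `first`) the next time it gets control, so writes to RAM-like MMIO regions need no immediate exit. Below is a simplified Python model, assuming the struct kvm_coalesced_mmio_ring layout from <linux/kvm.h> on a 4 KiB page; a real consumer also needs memory barriers, and `simulate_kernel_write` is a hypothetical stand-in for the kernel side.

```python
import struct

PAGE_SIZE = 4096
RING_HEADER = struct.Struct("<II")  # __u32 first, last
ENTRY = struct.Struct("<QII8s")     # phys_addr, len, pad, data[8]
RING_MAX = (PAGE_SIZE - RING_HEADER.size) // ENTRY.size  # 170 entries


def simulate_kernel_write(page, phys_addr, data):
    """Hypothetical stand-in for the kernel appending one coalesced write."""
    first, last = RING_HEADER.unpack_from(page, 0)
    if (last + 1) % RING_MAX == first:
        raise RuntimeError("ring full: a real kernel falls back to a normal MMIO exit")
    ENTRY.pack_into(page, RING_HEADER.size + last * ENTRY.size,
                    phys_addr, len(data), 0, data)
    struct.pack_into("<I", page, 4, (last + 1) % RING_MAX)


def drain_coalesced_mmio(page, apply_write):
    """User space side: replay buffered writes in order, then publish 'first'."""
    first, last = RING_HEADER.unpack_from(page, 0)
    replayed = 0
    while first != last:
        phys_addr, length, _pad, data = ENTRY.unpack_from(
            page, RING_HEADER.size + first * ENTRY.size)
        apply_write(phys_addr, data[:length])
        first = (first + 1) % RING_MAX
        replayed += 1
    struct.pack_into("<I", page, 0, first)
    return replayed


page = bytearray(PAGE_SIZE)
simulate_kernel_write(page, 0xA0000, b"\x12")
simulate_kernel_write(page, 0xA0001, b"\x34\x56")
writes = []
print(drain_coalesced_mmio(page, lambda addr, data: writes.append((addr, data))))  # 2
```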



Basic KVM IOCTLs

KVM_CREATE_VM

KVM_SET_USER_MEMORY_REGION
KVM_CREATE_IRQCHIP / ...PIT (x86)
KVM_CREATE_VCPU

KVM_SET_REGS / ...SREGS / ...FPU / ...


KVM_SET_CPUID / ...MSRS / ...VCPU_EVENTS / ... (x86)
KVM_SET_LAPIC (x86)
KVM_RUN
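
The IOCTL numbers above are plain integers built with the kernel's `_IO`/`_IOW` macros and the magic 0xAE (KVMIO in <linux/kvm.h>). The Python sketch below recomputes a few of them, assuming the asm-generic encoding used on x86 (architectures such as PowerPC encode direction and size differently), and packs the argument of KVM_SET_USER_MEMORY_REGION, struct kvm_userspace_memory_region. The helper `memory_region` is hypothetical.

```python
import struct

# asm-generic ioctl encoding: dir (2 bits) | size (14) | type (8) | nr (8)
_IOC_NONE, _IOC_WRITE = 0, 1


def _IOC(direction, type_, nr, size):
    return (direction << 30) | (size << 16) | (type_ << 8) | nr


def _IO(type_, nr):
    return _IOC(_IOC_NONE, type_, nr, 0)


def _IOW(type_, nr, size):
    return _IOC(_IOC_WRITE, type_, nr, size)


KVMIO = 0xAE
# struct kvm_userspace_memory_region:
#   __u32 slot, flags; __u64 guest_phys_addr, memory_size, userspace_addr
MEM_REGION = struct.Struct("=IIQQQ")

KVM_CREATE_VM = _IO(KVMIO, 0x01)
KVM_CREATE_VCPU = _IO(KVMIO, 0x41)
KVM_SET_USER_MEMORY_REGION = _IOW(KVMIO, 0x46, MEM_REGION.size)
KVM_RUN = _IO(KVMIO, 0x80)


def memory_region(slot, guest_phys_addr, memory_size, userspace_addr, flags=0):
    """Pack the argument for KVM_SET_USER_MEMORY_REGION."""
    return MEM_REGION.pack(slot, flags, guest_phys_addr, memory_size, userspace_addr)


print(hex(KVM_SET_USER_MEMORY_REGION))  # 0x4020ae46
```

On a real host the order follows the list above: open /dev/kvm, issue `fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)`, register memory with `fcntl.ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, memory_region(...))`, create VCPUs, set their registers, then loop on KVM_RUN. The host address of a Python mmap for `userspace_addr` can be obtained with `ctypes.addressof(ctypes.c_char.from_buffer(buf))`.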



Optimizations of KVM

Hardware evolves quickly
  - Near-native performance in guest mode
  - Decreasing costs of mode switches
  - Additional features avoid software solutions, thus exits
    - Nested page tables
    - TLB tagging
    - APIC virtualization
    - ...
What will continue to consume cycles?
  - Code path between VM-exit and VM-entry
  - Mode switches, i.e. the need to exit at all
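
Why TLB tagging pairs well with nested page tables can be seen from a quick count: with an n-level guest page table on top of an m-level nested (host) table, every guest page-table entry sits at a guest physical address that first needs a full host walk, so a single TLB miss can cost up to n(m+1)+m memory references. A small Python helper (name hypothetical) evaluates this:

```python
def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for one TLB miss under nested paging."""
    # Each guest page-table entry lives at a guest physical address: translate
    # it through all host levels, then read the entry itself (+1) ...
    per_guest_level = host_levels + 1
    # ... and finally translate the resulting guest physical address once more.
    return guest_levels * per_guest_level + host_levels


print(nested_walk_refs(4, 4))  # 24, versus 4 for a native 4-level walk
```

Keeping TLB entries alive across VM entries and exits (tagging) avoids paying this walk again after every mode switch.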



Lightweight vs. Heavy-weight VM-Exits

Exits cost time!
  - Basic state switch in hardware
  - Additional state switches in software
  - Analyze exit reason
  - Return to user space
    - Analyze exit reason
    - Obtain KVM state (VCPU, devices)
    - Handle exit cause
    - Write back states
    - Invoke KVM_RUN
  - Software-managed state switch
  - Hardware state switch
Cost: lightweight exit (handled in kernel) >7,000 cycles, heavy-weight exit (via user space) >10,000 cycles
In-kernel handling keeps exits lightweight:
  - APIC
  - PIC + IO-APIC
  - Coalescing MMIO
  - Instruction interpreter (detect MMIO access)
  - Network stub (vhost-net)
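
To put per-exit costs into perspective, the fraction of a core lost to exits is the exit rate times the cycles per exit divided by the clock rate. The sketch assumes a hypothetical workload of 50,000 exits per second on a 2.5 GHz core and reads the >7,000 and >10,000 cycle figures as the lightweight and heavy-weight costs; since both are lower bounds, real overhead would be higher.

```python
def exit_overhead(exits_per_second, cycles_per_exit, cpu_hz):
    """Fraction of one CPU spent on VM exit/entry round trips."""
    return exits_per_second * cycles_per_exit / cpu_hz


# Hypothetical workload: 50,000 exits/s on a 2.5 GHz core
light = exit_overhead(50_000, 7_000, 2.5e9)    # lightweight: handled in the kernel
heavy = exit_overhead(50_000, 10_000, 2.5e9)   # heavy-weight: via user space
print(f"{light:.0%} vs {heavy:.0%}")  # 14% vs 20%
```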



Optimizing Lightweight Exits

Let's get lazy!
  - Perform only partial state switches
  - Complete at latest possible point
  - Late restoring for guest and host state
Candidates (x86)
  - FPU
  - Debug registers
  - Model-specific registers (MSRs)
Requirements
  - Usage detection when in guest mode
    - Depends on hardware support
  - Demand detection while in host mode
    - Preemption notifiers
    - User-return notifier
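
The effect of lazy switching on the FPU candidate can be illustrated with a toy model that counts state swaps over an event trace. Event names and the `LazyFpu` class are hypothetical; real KVM detects guest FPU use through hardware traps rather than explicit events.

```python
class LazyFpu:
    """Toy model: whose state currently sits in the FPU registers."""

    def __init__(self):
        self.owner = "host"
        self.switches = 0  # save-old-state + load-new-state operations performed

    def use(self, context):
        # First FPU use by `context`: detected by a trap in guest mode,
        # or by an explicit demand check in host mode.
        if self.owner != context:
            self.switches += 1
            self.owner = context


def eager_switches(events):
    """Eager scheme: every VM entry and every VM exit swaps the FPU state."""
    return sum(1 for e in events if e in ("entry", "exit"))


def lazy_switches(events):
    """Lazy scheme: swap only when the other side actually touches the FPU."""
    fpu = LazyFpu()
    for e in events:
        if e == "guest_fpu":
            fpu.use("guest")
        elif e == "host_fpu":
            fpu.use("host")
    return fpu.switches


# Guest uses the FPU once, takes three lightweight exits that the kernel handles
# without touching the FPU, and finally host code needs the FPU.
trace = ["entry", "guest_fpu", "exit", "entry", "exit", "entry", "exit", "host_fpu"]
print(eager_switches(trace), lazy_switches(trace))  # 6 2
```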



Lazy MSR Switching

Why is this possible?
  - Some MSRs unused by Linux
  - Some MSRs only relevant when in user space
  - Some are identical for host & guest
Approach
  - Keep guest values of certain MSRs until...
    - sched-out fires
    - KVM_RUN IOCTL returns
  - Keep others until user-return fires (Intel only)
Optimizations are vendor-specific
Exemplary saving:
  - 2000 cycles for guest → idle thread → guest
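
The guest → idle thread → guest saving follows from the idle thread being a kernel thread: switching to it fires sched-out, but it never returns to user space, so MSRs handled by the user-return notifier keep their guest values. A toy model that counts MSR writes (MSR names A, B, C and their grouping are hypothetical):

```python
class MsrSwitcher:
    """Toy model of lazy MSR switching; MSR names and groups are hypothetical."""

    def __init__(self, sched_out_msrs, user_return_msrs):
        self.sched_out_msrs = list(sched_out_msrs)      # host value back on sched-out
        self.user_return_msrs = list(user_return_msrs)  # host value back on user return
        self.guest_loaded = set()  # MSRs currently holding guest values
        self.writes = 0            # each write stands for an expensive WRMSR

    def vm_entry(self):
        for msr in self.sched_out_msrs + self.user_return_msrs:
            if msr not in self.guest_loaded:
                self.writes += 1
                self.guest_loaded.add(msr)

    def _restore_host(self, msrs):
        for msr in msrs:
            if msr in self.guest_loaded:
                self.writes += 1
                self.guest_loaded.discard(msr)

    def sched_out(self):
        """Preemption notifier: another task (here the idle thread) runs."""
        self._restore_host(self.sched_out_msrs)

    def return_to_user(self):
        """KVM_RUN returned and the task is about to enter user space."""
        self._restore_host(self.sched_out_msrs + self.user_return_msrs)


def guest_idle_guest(switcher):
    switcher.vm_entry()   # guest runs
    switcher.sched_out()  # guest idles, the idle kernel thread is scheduled
    switcher.vm_entry()   # guest resumes
    return switcher.writes


lazy = MsrSwitcher(["A"], ["B", "C"])
eager = MsrSwitcher(["A", "B", "C"], [])
print(guest_idle_guest(lazy), guest_idle_guest(eager))  # 5 9
```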



Paravirtual Devices

Advantages
  - Reduce VM exits or make them lightweight
  - Improve I/O throughput & latency (less emulation)
  - Compensate virtualization effects
  - Enable direct host-guest interaction
Available interfaces & implementations
  - virtio (PCI or alternative transports), primarily user space business
    - Network
    - Block
    - Serial I/O (console, host-guest channel, …)
    - Memory balloon
    - File system (9P)
  - Clock (x86 only), KVM business
    - Via shared page + MSRs
    - Enables safe™ TSC guest usage
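
For the paravirtual clock, the guest registers a shared page via an MSR and the host fills it with a struct pvclock_vcpu_time_info per VCPU; the guest then derives nanoseconds from its own TSC reading without exiting. A Python sketch of the guest-side read, assuming the x86 layout and the kernel's pvclock scaling (shift, then multiply by a 32-bit fixed-point fraction); `read_tsc` stands in for RDTSC, and a real guest also needs read barriers around the version checks.

```python
import struct

# struct pvclock_vcpu_time_info (32 bytes, little-endian):
# version, pad0, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift, flags, pad[2]
PVCLOCK = struct.Struct("<IIQQIbB2x")


def pvclock_scale_delta(delta, mul_frac, shift):
    """Convert TSC ticks to nanoseconds: shift, then multiply by mul_frac / 2**32."""
    if shift < 0:
        delta >>= -shift
    else:
        delta <<= shift
    return (delta * mul_frac) >> 32


def pvclock_read_ns(shared_page, read_tsc):
    """Guest-side read of the paravirtual clock from the shared page."""
    while True:
        version, _, tsc_ts, system_time, mul, shift, _flags = PVCLOCK.unpack_from(
            shared_page, 0)
        tsc = read_tsc()
        (version_after,) = struct.unpack_from("<I", shared_page, 0)
        # Odd version: host update in progress; changed version: read again.
        if version % 2 == 0 and version == version_after:
            return system_time + pvclock_scale_delta(tsc - tsc_ts, mul, shift)


# Hypothetical values: a 2 GHz TSC, i.e. 0.5 ns per tick (mul = 0.5 * 2**32).
page = bytearray(PVCLOCK.size)
PVCLOCK.pack_into(page, 0, 2, 0, 1_000_000, 5_000_000_000, 0x80000000, 0, 0)
print(pvclock_read_ns(page, lambda: 3_000_000))  # 5001000000
```

Because the host republishes the scaling factors, the guest can keep using the TSC even when the host's TSC characteristics change.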



An Almost-In-Kernel Device – vhost-net

Goal: high throughput / low latency guest networking
  - Avoid heavy exits
  - Reduce packet copying
  - No in-kernel QEMU, please!
The vhost-net model
  - Host user space opens and configures kernel helper
  - virtio as guest-host interface
  - KVM interface: eventfd
    - TX trigger → ioeventfd
    - RX signal → irqfd
  - Linux interface via tap or macvtap
Enables multi-gigabit throughput

[Figure: the guest's VCPU and the vhost-net worker kthread both read/write the virtio ring & buffers in guest memory; KVM links them via ioeventfd and irqfd; vhost-net reads the hypervisor process' memory slot table and talks to the Linux network stack.]
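
vhost-net and the guest driver exchange packets through a virtio ring. The Python model below is a simplified split virtqueue (descriptor table, available ring written by the guest, used ring written by the device); it omits descriptor chaining, event suppression and memory barriers, and the class and method names are hypothetical. In the vhost-net setup, the guest's kick after `add_buffer` would reach the worker through an ioeventfd, and the notification after `process` would be injected via an irqfd.

```python
from dataclasses import dataclass

VRING_DESC_F_WRITE = 2  # buffer is device-writable (e.g. an RX buffer)


@dataclass
class Desc:
    addr: int    # guest physical address of the buffer
    length: int
    flags: int = 0


class SplitVirtqueue:
    """Simplified split ring: Python lists instead of shared guest memory."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("virtqueue size must be a power of two")
        self.size = size
        self.desc = [Desc(0, 0) for _ in range(size)]
        self.avail_ring = [0] * size
        self.avail_idx = 0     # written by the driver (guest)
        self.used_ring = [(0, 0)] * size
        self.used_idx = 0      # written by the device (host)
        self.last_avail = 0    # device-private cursor

    def add_buffer(self, head, addr, length, device_writable=False):
        """Driver side: publish a buffer (the guest would then kick the device)."""
        flags = VRING_DESC_F_WRITE if device_writable else 0
        self.desc[head] = Desc(addr, length, flags)
        self.avail_ring[self.avail_idx % self.size] = head
        self.avail_idx = (self.avail_idx + 1) & 0xFFFF  # free-running 16-bit index

    def process(self, handler):
        """Device side: consume buffers (the host would then notify the guest)."""
        while self.last_avail != self.avail_idx:
            head = self.avail_ring[self.last_avail % self.size]
            written = handler(self.desc[head])  # bytes written into the buffer
            self.used_ring[self.used_idx % self.size] = (head, written)
            self.used_idx = (self.used_idx + 1) & 0xFFFF
            self.last_avail = (self.last_avail + 1) & 0xFFFF


transmitted = []


def transmit(desc):
    transmitted.append((desc.addr, desc.length))
    return 0  # TX buffers are only read by the device


vq = SplitVirtqueue(4)
vq.add_buffer(0, 0x1000, 1514)
vq.add_buffer(1, 0x2000, 60)
vq.process(transmit)
print(transmitted, vq.used_idx)  # [(4096, 1514), (8192, 60)] 2
```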



What's next?

Generic Linux improvements
  - Transparent huge pages (mm topic)
  - NUMA optimizations (scheduler topic)
Improve spin-lock-holder preemption effects
Zero-copy & multi-queue vhost-net
Further optimize exits
  - Instruction interpretation (hardware may help)
  - Faster in-kernel device dispatching
Nested virtualization as standard feature
  - AMD-V bits already merged and working
  - VT-x more complex but likely solvable
Hardware-assisted virtualization on non-x86
  - PowerPC ISA 2.06
  - ARMv7-A “Eagle” extensions

Thank you for listening!

Questions?

