Sharing IOMMU page tables with TDP in KVM

Lu Baolu
Zhao Yan
Tian Kevin
Sep. 2021
Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. No product or component can be absolutely secure.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation
Agenda
• Goal
• Sharing Advantages
• Sharing Prerequisites
• Sharing Interfaces
• Page & Page table Pinning
• Shared Page Table Root Update
• Bootup Performance
• TODOs
Goal

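[Figure: two panels. Left: the IOMMU walks the IOPT and the CPU walks the TDP, both translating GPA to HPA ("Duplicated!"). Right: the IOMMU and the CPU both walk a single TDP that translates GPA to HPA ("ONE COPY!").]
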
Sharing Advantages

• Reduced memory footprint

• Unified page table management
– Dirty page tracking, page fault handling, etc.

• Probably higher performance by reducing unnecessary EPT/NPT zaps
Sharing Prerequisites

• The same address space

• Compatible page table format

• Non-conflicting page table content


The Same Address Space

• Address space is GPA (L1) → HPA

• QEMU
– KVM side
• check TDP is enabled
• vCPU model does not include EPT/NPT feature

– IOMMU side
• no vIOMMU, or
• vIOMMU is not in shadow mode (nested mode on GPA is ok)
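
The checks on this slide can be read as one predicate over the VM configuration. The following is a minimal sketch under that reading; `VmConfig`, `VIommuMode` and `same_address_space` are hypothetical names for illustration, not QEMU or KVM interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class VIommuMode(Enum):
    NONE = "none"      # no vIOMMU exposed to the guest
    SHADOW = "shadow"  # vIOMMU emulated with shadow page tables
    NESTED = "nested"  # nested mode; the second stage still works on GPA


@dataclass
class VmConfig:
    tdp_enabled: bool       # host uses EPT/NPT for this VM
    vcpu_exposes_tdp: bool  # vCPU model includes the EPT/NPT feature
    viommu: VIommuMode


def same_address_space(cfg: VmConfig) -> bool:
    """True when the IOMMU and the CPU both translate GPA (L1) to HPA."""
    # KVM side: TDP must be on, and the guest must not get virtual EPT/NPT.
    kvm_ok = cfg.tdp_enabled and not cfg.vcpu_exposes_tdp
    # IOMMU side: no vIOMMU, or a vIOMMU that is not in shadow mode.
    iommu_ok = cfg.viommu in (VIommuMode.NONE, VIommuMode.NESTED)
    return kvm_ok and iommu_ok
```

For example, `same_address_space(VmConfig(True, False, VIommuMode.SHADOW))` is false, because a shadowing vIOMMU fails the IOMMU-side check.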
The Same Address Space (Cont.)

• Nested VM
– TBD currently

• SMM in x86
– A different address space. Cannot be shared with the IOMMU.
– The non-SMM mode EPT must be kept for sharing while the vCPU is in SMM mode.
Compatible Page Table Formats

• Unified compatible page table format definition across KVM and IOMMU

• Compatible page table formats
– FORMAT_EPT_LEVEL_4
– FORMAT_EPT_LEVEL_5
– FORMAT_NPT_LEVEL_4
– FORMAT_NPT_LEVEL_5
– …
Sharing Handshake Sequence
Participants: QEMU, VFIO, IOMMU, KVM

1. Get the current sharable page table format (returns EPT_LEVEL_4)
2. Check the sharable format (EPT_LEVEL_4) (returns SUCCESS)
3. Allocate IOASID with format EPT_LEVEL_4
4. Attach Device to IOASID; Request sharing

Note:
1. Device pass-through is based on the /dev/iommu proposal, which is IOASID oriented.
2. KVM shares the TDP used by vCPU 0.
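
The four handshake steps can be modeled as a short toy program. This is only a sketch: `ToyKvm`, `ToyIommu`, `handshake` and the returned root token are hypothetical stand-ins, and the real uAPI of the /dev/iommu proposal is not reproduced here.

```python
from enum import Enum


class Format(Enum):
    EPT_LEVEL_4 = "FORMAT_EPT_LEVEL_4"
    EPT_LEVEL_5 = "FORMAT_EPT_LEVEL_5"
    NPT_LEVEL_4 = "FORMAT_NPT_LEVEL_4"
    NPT_LEVEL_5 = "FORMAT_NPT_LEVEL_5"


class ToyKvm:
    def __init__(self, tdp_format):
        self.tdp_format = tdp_format
        self.shared_ioasid = None

    def get_sharable_format(self):
        return self.tdp_format

    def request_sharing(self, ioasid):
        # KVM shares the TDP used by vCPU 0; a string stands in for the root.
        self.shared_ioasid = ioasid
        return "tdp-root-of-vcpu0"


class ToyIommu:
    def __init__(self, supported_formats):
        self.supported_formats = set(supported_formats)
        self.ioasids = {}
        self._next_id = 1

    def check_format(self, fmt):
        return fmt in self.supported_formats

    def alloc_ioasid(self, fmt):
        ioasid = self._next_id
        self._next_id += 1
        self.ioasids[ioasid] = {"format": fmt, "devices": [], "root": None}
        return ioasid

    def attach_device(self, ioasid, device, kvm):
        entry = self.ioasids[ioasid]
        entry["devices"].append(device)
        if entry["root"] is None:
            # The first attach triggers the sharing request towards KVM.
            entry["root"] = kvm.request_sharing(ioasid)


def handshake(kvm, iommu, device):
    fmt = kvm.get_sharable_format()           # step 1
    if not iommu.check_format(fmt):           # step 2
        return None
    ioasid = iommu.alloc_ioasid(fmt)          # step 3
    iommu.attach_device(ioasid, device, kvm)  # step 4
    return ioasid
```

If the IOMMU does not support the format reported by KVM, the sketch stops after step 2 and no IOASID is allocated.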


Non-conflicting Page Table Content

• Presence of page table entry

– For KVM user memslots
• Must be present and pinned (staying present) for DMA pages when IO page fault is not supported.
• Can be present or zapped for non-DMA pages, or when IO page fault is supported.

– For KVM private memslots
• Not present in IOPT before sharing
• Safe to be present in IOPT after sharing

• Local APIC
– DMA write to 0xfeexxxxx doesn’t go through DMA remapping.

• TSS and IDENTITY_PAGETABLE
– for !enable_unrestricted_guest, E820 Reserved
Non-conflicting Page Table Content (Cont.)

• Read/Write/Execute bit
– RO for RO memslots
– RW for other memslots
– Execute bit
• currently ignored in IOMMU and no device uses it.
– Write protection for live migration
• Allowed when IO page fault is supported
• Must be disabled otherwise; dirty pages are then found by
– treating all pinned ranges as dirty, or
– traversing the page table for the Dirty bit
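
The permission rules above can be condensed into two small functions. This is a sketch under assumptions: the function names, the dictionary layout and the `hw_dirty_bits` parameter are invented for illustration, and the execute bit is left out because the IOMMU ignores it.

```python
def shared_entry_perms(readonly_memslot, dirty_logging, io_page_fault):
    """Read/write permission of a shared entry, following the slide's rules."""
    # Write protection for live migration is only allowed when the device
    # can take an IO page fault; otherwise it must stay disabled.
    write_protect = dirty_logging and io_page_fault and not readonly_memslot
    writable = not readonly_memslot and not write_protect
    return {"read": True, "write": writable}


def dirty_gfns(pinned_gfns, hw_dirty_bits=None):
    """Dirty set when write protection is disabled (no IO page fault)."""
    if hw_dirty_bits is None:
        # Option 1: treat every pinned page as dirty.
        return set(pinned_gfns)
    # Option 2: traverse the entries and collect those with the Dirty bit set.
    return {gfn for gfn in pinned_gfns if hw_dirty_bits.get(gfn, False)}
```

With `dirty_logging=True` and `io_page_fault=False`, a RW memslot stays writable and dirty pages have to come from one of the two fallbacks in `dirty_gfns`.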
Sharing Interfaces
• Request/stop sharing
• Page/page table pinning for DMAs without IO page fault
• Page fault for IO page fault support
• Notification
– Page table content update notification
– Page table root update notification

[Figure: QEMU attaches/detaches a device to an IOASID shared with TDP, and issues pin/unpin. The IOMMU requests sharing from KVM with pin mode and notification callbacks, and an IO page fault on the IOMMU side becomes a page fault on the KVM side. KVM sends page table content update and page table root update notifications back to the IOMMU.]
Page & Page Table Pinning
For sharing without IO page fault,
• Pinning of VM pages
– pin_user_pages_* (FOLL_LONGTERM)

• Pinning of TDP entries
– Pre-population of pinned ranges
– No zap/pfn update
– No reclaiming of mmu pages with parent linked
– Atomic update of TDP entries when permission or page size change

[Figure: pin_user_page_* pins the VM pages; "Pin" applies to the KVM TDP; zap/pfn_update requests arrive from the mm notifier.]
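
The pinning rules for TDP entries can be illustrated with a toy table in which pinned ranges are pre-populated and zap requests for them are refused. `ToyTdp` and its methods are hypothetical; this is not the KVM MMU code.

```python
class ToyTdp:
    """Toy one-level table: gfn -> pfn, with a set of pinned gfns."""

    def __init__(self, gfn_to_pfn):
        self.gfn_to_pfn = gfn_to_pfn  # stands in for pinning the VM page
        self.entries = {}
        self.pinned = set()

    def pin_range(self, start_gfn, npages):
        for gfn in range(start_gfn, start_gfn + npages):
            # Pre-populate so the device never hits a non-present entry.
            self.entries[gfn] = self.gfn_to_pfn(gfn)
            self.pinned.add(gfn)

    def unpin_range(self, start_gfn, npages):
        for gfn in range(start_gfn, start_gfn + npages):
            self.pinned.discard(gfn)

    def zap(self, gfn):
        """Zap request, for example from an mm notifier."""
        if gfn in self.pinned:
            return False  # pinned entries must stay present
        self.entries.pop(gfn, None)
        return True
```

After `unpin_range`, the same `zap` call succeeds again, which mirrors the idea that the restrictions only apply while a range is pinned.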
Atomic Update for TDP Entries
Atomic update is required for TDP entries for pinned ranges, when
– Splitting huge pages
– Updating of PTE permission

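[Figure: splitting a 2M page in three numbered steps. Recoverable labels: the 2M entry and the 2M page, a new page table page with entries 0 to 511, "rmap add" for each new entry, and "rmap remove" for the old 2M entry.]
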
TDP entry being atomically updated from non-zero value to another non-zero value.
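
A minimal model of such an atomic split: the new lower-level table is built completely first, and only then does a single store replace the huge entry, so a concurrent walker sees either the old or the new non-zero value. The dictionary layout and the function names are assumptions for illustration, and rmap bookkeeping is omitted.

```python
ENTRIES_PER_TABLE = 512


def leaf(pfn, writable):
    return {"kind": "leaf", "pfn": pfn, "writable": writable}


def split_huge_entry(parent, index):
    """Replace a 2M leaf with a table of 512 4K leaves in one store."""
    old = parent[index]
    assert old["kind"] == "leaf"
    # Build the whole child table before it becomes reachable.
    children = [leaf(old["pfn"] + i, old["writable"])
                for i in range(ENTRIES_PER_TABLE)]
    # Single store: the entry is never observed as zero in between.
    parent[index] = {"kind": "table", "entries": children}


def translate(parent, index, offset):
    """Walk the toy table for the 4K page at `offset` inside the 2M range."""
    entry = parent[index]
    if entry["kind"] == "leaf":
        return entry["pfn"] + offset
    return entry["entries"][offset]["pfn"]
```

Translating every offset before and after `split_huge_entry` gives identical results, which is the property a device doing DMA on a pinned range relies on.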
Page & Page Table Pinning Interfaces
• For sharing without IO page fault,
– Pinning of all ranges in user memslots: memslot add
– Pinning a specific range: extra interface

Option 1: Pin/Unpin from QEMU
Pros: QEMU doesn’t need to MAP/UNMAP an IOASID which is using a 3rd party TDP.
[Figure: QEMU sends Pin/Unpin to KVM; the IOMMU sends "Request sharing (pinning mode)" to KVM; the IOASID uses the TDP.]

Option 2: Pin/Unpin from IOMMU
Pros: Straightforward to hold more IOMMU side info, e.g. snoop bit.
[Figure: QEMU sends DMA_MAP/DMA_UNMAP to the IOMMU; the IOMMU sends "Request sharing (pinning mode)" and Pin/Unpin to KVM; the IOASID uses the TDP.]
Shared Page Table Root Update
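[Figure: shared page table root update. Recoverable content:
Initial state: the IOMMU root pointer and the VMCS EPTP of vcpu 0 … vcpu n-1 all point to one mmu root page (role). The IOMMU shares the TDP root of vcpu 0; root_count=n.
Update sequence:
1 vcpu 0 changes its root
2 kvm_mmu_unload, then kvm_mmu_load
3 root_count-- on the mmu root page (old role), root_count++ on the mmu root page (new role)
4 If !role.smm,
5 Pre-population of TDP for pinned ranges
6 Root pointer update notification to the IOMMU (labeled "Atomic update done")
SMM case: with role.smm=1 the vCPU moves to an mmu root page (smm role), while the IOMMU root pointer stays on the non-SMM mmu root page (role).]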
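
The root update sequence can be sketched as a small state machine. Everything here is a toy model under assumptions: `Root`, `SharedTdp` and `vcpu0_switch_root` are hypothetical names, and skipping the notification when vcpu 0 returns to the root already in use by the IOMMU is an assumption of this sketch, not something the slide states.

```python
from dataclasses import dataclass, field


@dataclass
class Root:
    smm: bool
    root_count: int = 0
    populated: set = field(default_factory=set)


class SharedTdp:
    def __init__(self, pinned_gfns, notify_iommu):
        self.pinned = set(pinned_gfns)
        self.notify_iommu = notify_iommu  # root update notification callback
        self.iommu_root = None

    def vcpu0_switch_root(self, old, new):
        # Steps 2-3: unload the old root, load the new one.
        if old is not None:
            old.root_count -= 1
        new.root_count += 1
        # Step 4: an SMM root is a different address space; keep sharing
        # the non-SMM root with the IOMMU.
        if new.smm:
            return
        if new is self.iommu_root:
            return  # assumption: nothing to update in this case
        # Step 5: pre-populate pinned ranges before the device can use it.
        new.populated |= self.pinned
        # Step 6: switch the IOMMU root pointer and notify.
        self.iommu_root = new
        self.notify_iommu(new)
```

The order of steps 5 and 6 matters in this sketch: the new root is only handed to the IOMMU after all pinned ranges are present in it.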
Bootup Performance
Rough performance data, without any optimization yet (8G memory):

Configuration                    Bootup time   Pre-population count
Base (no sharing)                29s           0
Sharing (huge page enabled)      32s           132
Sharing (huge page disabled)     63s           132

• All VM pages were pinned/unpinned on user memslot creation/deletion.
• TDP was pre-populated on page table changes (when switching to a new root, on memslot add, and on huge page splitting).
• IOTLB was flushed on page table root/content update notification (~1s).
• Quite a lot of time is spent on TDP pre-population
– ~2s with huge page
– ~32s when huge page is disabled
• In concept, boot time can match the non-sharing case by reducing the TDP root update count.
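
As a cross-check of the figures on this slide, the measured overhead can be compared with the two cost items the slide names. The per-pre-population average is a derived estimate that assumes the 132 pre-populations cost roughly the same each, which the slide does not claim.

```python
base_s = 29
boot_s = {"huge page enabled": 32, "huge page disabled": 63}
prepop_s = {"huge page enabled": 2, "huge page disabled": 32}  # approximate
iotlb_flush_s = 1                                               # approximate
prepop_count = 132

breakdown = {}
for name, total in boot_s.items():
    breakdown[name] = {
        # Extra boot time relative to the non-sharing base.
        "overhead_s": total - base_s,
        # Part of it explained by pre-population plus IOTLB flushes.
        "explained_s": prepop_s[name] + iotlb_flush_s,
        # Rough average cost of one pre-population, in milliseconds.
        "per_prepop_ms": 1000 * prepop_s[name] / prepop_count,
    }

for name, row in breakdown.items():
    print(name, row)
```

With huge pages the 3s overhead is fully covered by the two items; without huge pages about 33s of the 34s overhead is covered, which supports the point that pre-population dominates.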
TODOs
• Snoop bit handling

• Unified dirty page tracking

• Nested VM (vIOMMU, virtual EPT/NPT)

• Performance optimization
– Page table root update reduction,
– Huge page support for P2P, etc.
Why KVM manages the shared table
• CPU side has more restrictions on page size
– Check guest MTRR
– NX huge page workaround

• CPU side has extra GFN ranges to access
– Private memslots in kernel space

• IOMMU page tables are not always present.


Overall Design
[Figure: overall design. Components: /dev/vfio/devices/dev[n], /dev/iommu, anon_inode: kvm-vm, Device[n], IOMMU, KVM, TDP. Recoverable flows:
1 Alloc IOASID with KVM_EPT_LEVEL_4 (on /dev/iommu)
2 Attach/Detach IOASID (through vfio, for Device[n])
2 Request/Stop sharing (between IOMMU and KVM)
4a Pin All user memslot (on the kvm-vm fd); KVM then does Pin/Unpin on memslot create/delete
4b Pin/Unpin a gfn range (on the kvm-vm fd)
4c IO page fault on the IOMMU side, delivered as a page fault to KVM
3, 5 notification of root/map update (from KVM to the IOMMU)
Legends: uAPI, existing kernel interface, KVM sharing on/off interface, KVM new interface, KVM notification.]
Overall Design (alternative)
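[Figure: overall design, alternative. Components: /dev/vfio/devices/dev[n], /dev/iommu, anon_inode: kvm-vm, Device[n], IOMMU, KVM, TDP. Recoverable flows:
1 Alloc IOASID with KVM_EPT_LEVEL_4 (on /dev/iommu)
1 MAP_DMA/UNMAP_DMA (on /dev/iommu)
2 Attach/Detach IOASID (through vfio, for Device[n])
2 Request/Stop sharing (between IOMMU and KVM)
4a Pin All (from IOMMU to KVM); KVM then does Pin/Unpin on memslot create/delete
4b Pin/Unpin a gfn range (from IOMMU to KVM)
4c page fault (from IOMMU to KVM)
3, 5 notification of root/map update (from KVM to the IOMMU)]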
