Xvisor: A Lightweight Hypervisor
[Figure: hardware resources a hypervisor must virtualize: Instruction Set & CPU, MMU, Memory, On-Chip Busses, I/O Devices, Interrupt Controllers, Clocks & Timers]
Agenda
(1) Virtualization Concepts
(2) Xvisor for ARM
(3) Device Virtualization
Virtualization Concepts
CPU Performance
• 1965 - IBM S/360 – 0.1 MIPS (133,300 IPS)
– Provided full hardware virtualization with the ability to
run 14 OS instances.
• 1972 - IBM S/370 – 1.0 MIPS (1,000,000 IPS)
• 2000 - 1 GHz Intel P3 – 3,000 MIPS (3,000,000,000 IPS)
• 2009 - Qualcomm Snapdragon A8 – 2,000 MIPS
• 2010 - Intel Core i7 – 4 x 147,600 MIPS
• 2010 - Qualcomm Snapdragon MP – 2 x 2,500 MIPS
• 2011 - Qualcomm/Samsung/nVidia A9 MP – 2 x 5,000 MIPS
• 2012 – ARM Cortex A15 MP – 4 x 25,000 MIPS
ARM processors are now capable of virtualization in a much cheaper and more powerful way!
Operating System Level Virtualization
• Why add another layer below existing operating systems?
– No OS is perfect: compatibility, stability, security
• Workload consolidation
– Increase server utilization
– Reduce capital, hardware, power, space, heat costs
• Legacy OS support
– Especially with large 3rd-party software products
• Migration
– Predicted hardware downtime
– Workload balancing
Use Case: Low-cost 3G Phone
• Mobile Handsets
– Major applications run on Linux
– The 3G modem software stack runs in an RTOS domain
• Virtualization in multimedia devices
– Reduces BOM (bill of materials)
– Enables reuse of legacy code/applications
– Reduces system development time
• Instrumentation, Automation
– Run an RTOS for measurement and analysis
– Run a GPOS for the graphical interface
• Real case: Motorola Evoke QA4
General Classification of Virtualization
• Type 1: Bare-metal hypervisor
– A pure hypervisor that runs directly on the hardware and hosts guest OSes.
• Type 2: OS "hosted"
– A hypervisor that runs within a host OS and hosts guest OSes inside it, using the host OS services to provide the virtual environment.
[Figure: Type 1 stack: VM0 … VMn (guest OS and apps) on a hypervisor (scheduler, MMU, device drivers/models) on host HW (I/O, memory, CPUs). Type 2 stack: VM0 … VMn (guest OS and apps) and user apps on a user-level VMM with device models, on a host OS with a ring-0 VM monitor "kernel" and device drivers, on host HW (I/O, memory, CPUs)]
Problematic Instructions (1)
• Type 1
Instructions that cause an undefined instruction exception when executed in user mode
• Example
MCR p15, 0, r0, c2, c0, 0
Move r0 into coprocessor p15 (the system control coprocessor); opc1 = 0, CRn = c2, CRm = c0 and opc2 = 0 select the target register, here TTBR0 (Translation Table Base Register 0)
– MRC: from coprocessor to register
– MCR: from register to coprocessor
• Problem:
– Operand-dependent operation: whether the access traps depends on which coprocessor register the operands select
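To see why the behavior is operand dependent, it helps to pull apart the fields of the MCR/MRC encoding. The following Python sketch decodes a 32-bit instruction word using the A32 MCR/MRC layout from the ARM Architecture Reference Manual; `decode_mcr_mrc` and `is_ttbr0_write` are hypothetical helpers, not part of Xvisor. For `MCR p15, 0, r0, c2, c0, 0` (encoded as 0xEE020F10) it recovers coproc = 15, CRn = 2, CRm = 0 and opc1 = opc2 = 0.

```python
def decode_mcr_mrc(word):
    """Split a 32-bit A32 MCR/MRC instruction word into its fields.

    Layout: cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16]
            Rt[15:12] coproc[11:8] opc2[7:5] 1[4] CRm[3:0]
    """
    # Bits[27:24] == 0b1110 with bit 4 set distinguishes MCR/MRC from CDP.
    if ((word >> 24) & 0xF) != 0xE or ((word >> 4) & 0x1) != 1:
        raise ValueError("not an MCR/MRC encoding: 0x%08x" % word)
    return {
        "cond": (word >> 28) & 0xF,
        "opc1": (word >> 21) & 0x7,
        "is_mrc": bool((word >> 20) & 0x1),  # L bit: 1 = MRC (coproc -> register)
        "crn": (word >> 16) & 0xF,
        "rt": (word >> 12) & 0xF,
        "coproc": (word >> 8) & 0xF,
        "opc2": (word >> 5) & 0x7,
        "crm": word & 0xF,
    }


def is_ttbr0_write(fields):
    """True for MCR p15, 0, Rt, c2, c0, 0, i.e. a write to TTBR0."""
    return (not fields["is_mrc"] and fields["coproc"] == 15 and fields["opc1"] == 0
            and fields["crn"] == 2 and fields["crm"] == 0 and fields["opc2"] == 0)


# MCR p15, 0, r0, c2, c0, 0 assembles to 0xEE020F10.
fields = decode_mcr_mrc(0xEE020F10)
print(fields, is_ttbr0_write(fields))
```

A trap handler that sees the undefined instruction exception can use exactly these fields to decide which virtual CP15 register the guest meant to touch.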
Problematic Instructions (2)
• Type 2
Instructions that have no effect when executed in user mode
• Example
MSR cpsr_c, #0xD3
Switch to SVC mode (M[4:0] = 0x13) and disable IRQ and FIQ (I = F = 1)
CPSR layout: N Z C V Q -- J -- GE[3:0] -- E A I F T M[4:0]
• Type 3
Instructions that cause unpredictable behavior when executed in user mode
• Example
MOVS PC, LR
The exception-return instruction: it copies LR into the PC and restores the CPSR from the SPSR, typically switching back to user mode.
• User mode has no SPSR, so this instruction is UNPREDICTABLE when executed in user mode.
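A minimal model of the Type 2 problem, assuming a simplified ARMv7-A CPSR: `decode_cpsr_c` splits the control byte written by `MSR cpsr_c, #0xD3`, and `msr_cpsr_c` (a hypothetical helper) applies the write only in a privileged mode. In user mode the function returns the CPSR unchanged, which is why a deprivileged guest kernel executing this instruction would silently fail to mask interrupts instead of trapping.

```python
# ARMv7-A CPSR mode encodings, M[4:0].
MODES = {0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC", 0x16: "MON",
         0x17: "ABT", 0x1A: "HYP", 0x1B: "UND", 0x1F: "SYS"}
USR = 0x10


def decode_cpsr_c(byte):
    """Decode the CPSR control byte [7:0]: I, F, T and M[4:0]."""
    return {
        "I": bool(byte & 0x80),  # 1 = IRQs masked
        "F": bool(byte & 0x40),  # 1 = FIQs masked
        "T": bool(byte & 0x20),  # 1 = Thumb state
        "mode": MODES.get(byte & 0x1F, "reserved"),
    }


def msr_cpsr_c(cpsr, imm8):
    """Simplified model of 'MSR cpsr_c, #imm8' (ignores execution-state subtleties).

    Privileged modes replace the control byte; in user mode the write to
    the control field is ignored, so the instruction neither traps nor
    takes effect.
    """
    if (cpsr & 0x1F) == USR:
        return cpsr
    return (cpsr & ~0xFF) | (imm8 & 0xFF)


print(decode_cpsr_c(0xD3))                 # SVC mode, IRQ and FIQ masked
print(hex(msr_cpsr_c(0x60000010, 0xD3)))   # user mode: CPSR unchanged
```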
ARM Sensitive Instructions
• Coprocessor Access Instructions
MRC / MCR / CDP / LDC / STC
• SIMD/VFP System Register Access Instructions
VMRS / VMSR
• TrustZone Secure State Entry Instructions
SMC
• Memory-Mapped I/O Access Instructions
Load/Store instructions from/into memory-mapped I/O locations
• Direct (Explicit/Implicit) CPSR Access Instructions
MRS / MSR / CPS / SRS / RFE / LDM (exception return) / DPSPC (data-processing instructions with the S bit set and PC as destination)
• Indirect CPSR Access Instructions
LDRT / STRT – Load/Store Unprivileged (“As User”)
• Banked Register Access Instructions
LDM/STM (User mode registers)
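The categories above can be turned into a first-pass scanner. This sketch (a hypothetical `classify` helper, not Xvisor code) matches A32 mnemonics and tolerates a trailing condition code such as `mcrne`. It also shows the limits of a mnemonic-only scan: LDM/STM are sensitive only for certain operands, and memory-mapped I/O loads/stores look like ordinary LDR/STR, so those cases need techniques such as MMU-enforced traps.

```python
# Mnemonic groups taken from the list of ARM sensitive instructions above.
SENSITIVE = {
    "coprocessor access": {"mrc", "mcr", "cdp", "ldc", "stc"},
    "SIMD/VFP system register access": {"vmrs", "vmsr"},
    "TrustZone secure state entry": {"smc"},
    "direct CPSR access": {"mrs", "msr", "cps", "cpsie", "cpsid", "srs", "rfe"},
    "indirect CPSR access": {"ldrt", "strt"},
}
# LDM/STM are sensitive only for some operands (user-mode register
# lists or exception-return forms), so the mnemonic alone cannot decide.
OPERAND_DEPENDENT = {"ldm", "stm"}
COND_CODES = {"eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl", "vs",
              "vc", "hi", "ls", "ge", "lt", "gt", "le", "al"}


def classify(mnemonic):
    """Return the sensitive-instruction category of an A32 mnemonic, or None."""
    m = mnemonic.lower()
    candidates = [m]
    if len(m) > 3 and m[-2:] in COND_CODES:
        candidates.append(m[:-2])  # e.g. "mcrne" -> "mcr"
    for c in candidates:
        for category, names in SENSITIVE.items():
            if c in names:
                return category
        if c[:3] in OPERAND_DEPENDENT:  # ldm, ldmia, stmfd, ...
            return "operand-dependent"
    return None


for m in ("mcr", "mcrne", "cpsid", "ldmia", "add"):
    print(m, "->", classify(m))
```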
Solutions to Problematic Instructions
[ Hardware Techniques ]
• Privileged Instruction Semantics dictated/translated
by instruction set architecture
• MMU-enforced traps
– Example: page fault
• Tracing/debug support
– Example: bkpt (breakpoint)
• Hardware-assisted Virtualization
– Example: extra privileged mode, HYP, in ARM
Cortex-A15
Solutions to Problematic Instructions
[ Software Techniques ]
Complexity        Binary translation   Hypercall
Design            High                 Low
Implementation    Medium               High
Runtime           High                 Medium

/* In Guest OS */
    BL   TLB_FLUSH_DENTRY
    …
TLB_FLUSH_DENTRY:
    MOV  R1, R0
    MOV  R0, #CMD_FLUSH_DENTRY
    SWI  #HYPER_CALL_TLB
    …

/* SWI Handler -> Hypercall Handler */
    ……
    LDR  R1, [SP, #4]            @ saved guest R1 (the address to flush)
    MCR  p15, 0, R1, C8, C6, 1   @ invalidate data TLB entry by MVA
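The hypercall path above can be modelled in a few lines of Python. In this sketch the guest stub mirrors `TLB_FLUSH_DENTRY` (move the address into R1, put the command in R0, issue the SWI), and the hypervisor reads the saved R1 from the register frame, as `LDR R1, [SP, #4]` does, before invalidating the entry. The numeric values of `CMD_FLUSH_DENTRY` and `HYPER_CALL_TLB`, the dict-based TLB, and the 4 KB page size are assumptions for illustration.

```python
HYPER_CALL_TLB = 0x10     # hypothetical SWI number
CMD_FLUSH_DENTRY = 0x1    # hypothetical command code
PAGE_MASK = ~0xFFF        # assumes 4 KB pages


class Hypervisor:
    def __init__(self):
        self.dtlb = {}  # toy data TLB: virtual page -> physical page
        self.hypercalls = {CMD_FLUSH_DENTRY: self.flush_dentry}

    def swi_handler(self, imm, frame):
        """Entry point for SWI; frame[n] is guest Rn as saved on the stack."""
        if imm != HYPER_CALL_TLB:
            raise ValueError("not a hypercall: SWI #0x%x" % imm)
        self.hypercalls[frame[0]](frame)  # R0 selects the hypercall

    def flush_dentry(self, frame):
        mva = frame[1]  # like LDR R1, [SP, #4]: saved R1 is the 2nd word
        # Stands in for MCR p15, 0, R1, c8, c6, 1 (invalidate D-TLB entry by MVA).
        self.dtlb.pop(mva & PAGE_MASK, None)


def guest_tlb_flush_dentry(hv, regs):
    """Guest-side stub corresponding to TLB_FLUSH_DENTRY."""
    regs[1] = regs[0]                            # MOV R1, R0
    regs[0] = CMD_FLUSH_DENTRY                   # MOV R0, #CMD_FLUSH_DENTRY
    hv.swi_handler(HYPER_CALL_TLB, list(regs))   # SWI #HYPER_CALL_TLB


hv = Hypervisor()
hv.dtlb = {0xC0001000: 0x80001000, 0xC0002000: 0x80002000}
regs = [0] * 16
regs[0] = 0xC0001234   # address whose TLB entry the guest wants flushed
guest_tlb_flush_dentry(hv, regs)
print({hex(k): hex(v) for k, v in hv.dtlb.items()})
```

The design point is that the guest never executes the privileged MCR itself; it only asks, and the hypervisor decides whether and how to perform the operation.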
• File: arch/arm/cpu/arm32/elf2cpatch.py
– Script that generates a cpatch script from the guest OS ELF
• What happens before the final ELF image is generated:
– Encode every privileged instruction as an SVC (software interrupt) instruction
– For each privileged instruction, generate a primitive to replace it
– cpatch reads the directives generated by elf2cpatch and mangles (patches) the target binary file
– The patched image contains no privileged instructions and can run in user mode
elf2cpatch.py :
...
# w: one disassembly line split into fields (w[1] = instruction word, w[2] = mnemonic)
if (len(w)==3):
    if (w[2]=="wfi"):
        print("\t#", w[2])
        print("\twrite32,0x%x,0x%08x" % (addr, convert_wfi_inst(w[1])))
elif (len(w)==4):
    if (w[2]=="cps" or w[2]=="cpsie" or w[2]=="cpsid"):
        print("\t#", w[2], w[3])
        print("\twrite32,0x%x,0x%08x" % (addr, convert_cps_inst(w[1])))
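The dispatch logic in the snippet can be made self-contained and runnable. This sketch assumes each input line is one line of `objdump -d` output (address, instruction word, mnemonic, operands), as the `w[1]`/`w[2]` indexing suggests. `emit_cpatch` and `placeholder_convert` are hypothetical names; the placeholder stands in for the real converters such as `convert_wfi_inst` by emitting a bare `SVC #0` (0xEF000000).

```python
def placeholder_convert(hxstr):
    """Hypothetical stand-in for convert_wfi_inst/convert_cps_inst: emits SVC #0."""
    int(hxstr, 16)  # validate that the field is a hex instruction word
    return 0xEF000000


def emit_cpatch(lines, converters):
    """Turn objdump -d lines into cpatch 'write32' directives.

    converters maps a mnemonic to a function taking the hex instruction
    word (as text) and returning the replacement 32-bit word.
    """
    out = []
    for line in lines:
        w = line.split()
        if len(w) < 3 or not w[0].endswith(":"):
            continue                      # not an instruction line
        convert = converters.get(w[2])
        if convert is None:
            continue                      # instruction needs no patching
        addr = int(w[0][:-1], 16)
        out.append("\t# " + " ".join(w[2:]))
        out.append("\twrite32,0x%x,0x%08x" % (addr, convert(w[1])))
    return out


listing = [
    "c0008000:\te320f003 \twfi",
    "c0008004:\te1a00000 \tnop",
    "c0008008:\tf10c0080 \tcpsid\ti",
]
converters = {"wfi": placeholder_convert, "cpsid": placeholder_convert}
print("\n".join(emit_cpatch(listing, converters)))
```

Using a mnemonic-to-converter table instead of nested `if` chains keeps each instruction's encoding logic in its own function, which is how `convert_msr_i_inst` is organized as well.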
How does Xvisor handle problematic
instructions like MSR?
# MSR (immediate)
# Syntax:
#   msr<c> <spec_reg>, #<const>
# Fields:
#   cond  = bits[31:28]
#   R     = bits[22:22]
#   mask  = bits[19:16]
#   imm12 = bits[11:0]
# Hypercall Fields:
#   inst_cond[31:28]   = cond
#   inst_op[27:24]     = 0xf
#   inst_id[23:20]     = 0
#   inst_subid[19:17]  = 2
#   inst_fields[16:13] = mask
#   inst_fields[12:1]  = imm12
#   inst_fields[0:0]   = R
def convert_msr_i_inst(hxstr):
    hx = int(hxstr, 16)
    inst_id = 0
    inst_subid = 2
    cond = (hx >> 28) & 0xF
    R = (hx >> 22) & 0x1
    mask = (hx >> 16) & 0xF
    imm12 = (hx >> 0) & 0xFFF
    rethx = 0x0F000000
    rethx = rethx | (cond << 28)
    rethx = rethx | (inst_id << 20)
    rethx = rethx | (inst_subid << 17)
    rethx = rethx | (mask << 13)
    rethx = rethx | (imm12 << 1)
    rethx = rethx | (R << 0)
    return rethx

Xvisor utilizes cpatch to convert all problematic instructions for OS image files (ELF format).
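To check the packing, the hypervisor side has to undo it. This sketch repeats the conversion in condensed form, then decodes the hypercall word and expands `imm12` with the standard ARM modified-immediate rule (rotate `imm12[7:0]` right by twice `imm12[11:8]`). For `msr cpsr_c, #0xD3` (A32 encoding 0xE321F0D3) the converted word is 0xEF0421A6, i.e. an SVC carrying the original condition code. `decode_hypercall` and `arm_expand_imm` are illustrative helpers, not Xvisor functions.

```python
def convert_msr_i_inst(hxstr):
    """Condensed version of the converter above (same bit layout)."""
    hx = int(hxstr, 16)
    cond, R = (hx >> 28) & 0xF, (hx >> 22) & 0x1
    mask, imm12 = (hx >> 16) & 0xF, hx & 0xFFF
    inst_id, inst_subid = 0, 2
    return ((cond << 28) | 0x0F000000 | (inst_id << 20) | (inst_subid << 17)
            | (mask << 13) | (imm12 << 1) | R)


def decode_hypercall(word):
    """Recover the fields packed by convert_msr_i_inst."""
    return {
        "cond": (word >> 28) & 0xF,
        "op": (word >> 24) & 0xF,      # 0xF: the word is an SVC
        "id": (word >> 20) & 0xF,
        "subid": (word >> 17) & 0x7,
        "mask": (word >> 13) & 0xF,    # bit 0 set = control field (cpsr_c)
        "imm12": (word >> 1) & 0xFFF,
        "R": word & 0x1,               # 0 = CPSR, 1 = SPSR
    }


def arm_expand_imm(imm12):
    """ARM modified immediate: rotate imm12[7:0] right by 2 * imm12[11:8]."""
    rot = 2 * ((imm12 >> 8) & 0xF)
    imm8 = imm12 & 0xFF
    return ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF


word = convert_msr_i_inst("e321f0d3")   # msr cpsr_c, #0xd3
fields = decode_hypercall(word)
print(hex(word), fields, hex(arm_expand_imm(fields["imm12"])))
```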
But this is not enough...
• The hypervisor also has to virtualize the CPU, memory, and devices.
Boot ARM/Linux under Xvisor
[Figure: timeline of booting ARM/Linux under Xvisor on QEMU: Xvisor image, Xvisor shell, QEMU virtual I/O, guest instance 0, Xvisor device, MUGGLE TTY, attach, mangling]
Scheduling
• Xvisor is basically an RTOS
• A thread in Xvisor is a "vcpu"
• Xvisor provides a priority-based, time-sliced scheduling policy
• The guest OS knows nothing about Xvisor
[Figure: Xvisor scheduling guest VCPUs alongside hypervisor tasks; guest memory]
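A toy model of a priority-based, time-sliced policy: the highest-priority ready VCPU runs for one time slice, and VCPUs of equal priority rotate round-robin. This is a hypothetical sketch, not Xvisor's scheduler; it assumes larger numbers mean higher priority and that a VCPU leaves the ready set when it blocks.

```python
from collections import deque


class PriorityRRScheduler:
    """Toy priority-based, time-sliced scheduler (larger number = higher priority)."""

    def __init__(self, time_slice_ms):
        self.time_slice_ms = time_slice_ms
        self.ready = {}  # priority -> deque of ready VCPU names

    def add(self, vcpu, priority):
        self.ready.setdefault(priority, deque()).append(vcpu)

    def block(self, vcpu):
        """Remove a VCPU from the ready queues (e.g. it waits for an event)."""
        for queue in self.ready.values():
            if vcpu in queue:
                queue.remove(vcpu)

    def pick_next(self):
        """Return (vcpu, time slice) for the next VCPU to run, or (None, 0)."""
        for prio in sorted(self.ready, reverse=True):
            queue = self.ready[prio]
            if queue:
                vcpu = queue.popleft()
                queue.append(vcpu)  # round robin among equal priorities
                return vcpu, self.time_slice_ms
        return None, 0


sched = PriorityRRScheduler(time_slice_ms=10)
sched.add("hv_worker", priority=3)
sched.add("guest0/vcpu0", priority=1)
sched.add("guest1/vcpu0", priority=1)
first = sched.pick_next()[0]   # the hypervisor task wins while it is ready
sched.block("hv_worker")
order = [sched.pick_next()[0] for _ in range(3)]
print(first, order)
```

Because a guest VCPU is just another schedulable thread here, the guest OS needs no knowledge of the scheduler at all.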
Xvisor Virtual Memory
[Figure: guest processes P2 and P3 mapped at 0300, 0500 and 0800, with a privilege virtualizer]
• What if P1 does something like memset(pagetable, 0, …)?
compatible = "ARMv7a,cortexa8";
start_pc = <0x40000000>;
mem1 {
    manifest_type = "real";
    address_type = "memory";
    guest_physical_addr = <0x70000000>;
    host_physical_addr = <0x82000000>;
    physical_size = <0x06000000>; /* 96 MB */
}
(region containing start_pc)
    guest_physical_addr = <0x40000000>;
    host_physical_addr = <0x80800000>;
    physical_size = <0x00800000>; /* 8 MB */
The vmm.bin parses the configuration file and finds that guest address 0x40000000 is mapped to host address 0x80800000. It then sets up the mapping and resumes the guest program.
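The lookup the hypervisor performs for this configuration can be sketched as a guest-physical to host-physical translation over the two regions above. The `Region` tuple and `gpa_to_hpa` are hypothetical; a real hypervisor programs these ranges into page tables rather than searching a list on each access.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    guest_physical_addr: int
    host_physical_addr: int
    physical_size: int


# The two regions from the guest configuration above.
REGIONS = [
    Region(0x40000000, 0x80800000, 0x00800000),  # 8 MB, contains start_pc
    Region(0x70000000, 0x82000000, 0x06000000),  # mem1, 96 MB
]


def gpa_to_hpa(gpa: int) -> Optional[int]:
    """Translate a guest-physical address, or return None if it is unmapped."""
    for r in REGIONS:
        if r.guest_physical_addr <= gpa < r.guest_physical_addr + r.physical_size:
            return r.host_physical_addr + (gpa - r.guest_physical_addr)
    return None


print(hex(gpa_to_hpa(0x40000000)))  # start_pc -> 0x80800000
```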
Device Virtualization
Device Virtualization: Big Picture
• userspace device emulation
• Paravirtualized device drivers (VirtIO)
    address_type = "memory";
    guest_physical_addr = <0x1E000000>;
    physical_size = <0x2000>;
    device_type = "pic";
    compatible = "realview,gic";
    parent_irq = <6>;
};
For every device region, Xvisor sets up an MMU mapping with a "no read/write permission" attribute in the page table.
Every time the guest program accesses device memory, a data abort (page fault) is triggered and the CPU jumps into the data_abort handler.
By decoding the faulting instruction (found at the guest PC that caused the abort) together with the fault address, the data-abort handler emulates the device behavior before resuming the guest VCPU.
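A minimal trap-and-emulate sketch, assuming the fault handler can fetch the guest instruction at the faulting PC and that only word-sized LDR/STR with an immediate offset need handling. `EmulatedDevice`, `decode_ldr_str_imm`, and `data_abort_handler` are hypothetical; the emulated device just stores and returns register values, whereas a real GIC model would implement the interrupt controller's semantics. The base address and size come from the gic node above.

```python
GIC_BASE, GIC_SIZE = 0x1E000000, 0x2000  # from the gic node above


class EmulatedDevice:
    """Hypothetical device model: a bank of 32-bit registers that read back writes."""

    def __init__(self, base, size):
        self.base, self.size, self.regs = base, size, {}

    def contains(self, addr):
        return self.base <= addr < self.base + self.size

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF


def decode_ldr_str_imm(word):
    """Decode A32 LDR/STR (word, immediate offset, no writeback)."""
    is_single_transfer = ((word >> 25) & 0x7) == 0b010
    p, b, w = (word >> 24) & 1, (word >> 22) & 1, (word >> 21) & 1
    if not is_single_transfer or p != 1 or b != 0 or w != 0:
        raise NotImplementedError("only word LDR/STR with offset addressing is modelled")
    imm12 = word & 0xFFF
    return {
        "load": bool((word >> 20) & 1),                   # L bit: 1 = LDR, 0 = STR
        "rn": (word >> 16) & 0xF,
        "rt": (word >> 12) & 0xF,
        "offset": imm12 if (word >> 23) & 1 else -imm12,  # U bit: add/subtract
    }


def data_abort_handler(vcpu, fault_addr, dev, fetch):
    """Emulate the trapped access, then resume after the faulting instruction."""
    insn = decode_ldr_str_imm(fetch(vcpu["pc"]))
    addr = vcpu["r"][insn["rn"]] + insn["offset"]
    if addr != fault_addr or not dev.contains(addr):
        raise RuntimeError("abort not caused by an emulated device access")
    if insn["load"]:
        vcpu["r"][insn["rt"]] = dev.read(addr - dev.base)
    else:
        dev.write(addr - dev.base, vcpu["r"][insn["rt"]])
    vcpu["pc"] += 4  # skip the emulated instruction (ARM state)


gic = EmulatedDevice(GIC_BASE, GIC_SIZE)
guest_code = {0x8000: 0xE5801004,   # STR r1, [r0, #4]
              0x8004: 0xE5902004}   # LDR r2, [r0, #4]
vcpu = {"pc": 0x8000, "r": [0] * 16}
vcpu["r"][0], vcpu["r"][1] = GIC_BASE, 0xABCD
data_abort_handler(vcpu, GIC_BASE + 4, gic, guest_code.__getitem__)
data_abort_handler(vcpu, GIC_BASE + 4, gic, guest_code.__getitem__)
print(hex(vcpu["r"][2]), hex(vcpu["pc"]))
```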
Device Emulator
• Since the device is actually plain memory whose functionality is emulated in software, multiplexing it among guests is easy to implement, as follows:
[Figure: guest0 and guest1 each access their own v_uart; the accesses trap into the VMM, which binds the v_uarts to the single physical uart]
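One way the binding in the figure could work, as a hypothetical sketch: every guest write to its v_uart traps into the VMM, which forwards characters from the currently bound v_uart to the physical UART and buffers the rest until that v_uart is bound. The class names and the buffering policy are assumptions, not Xvisor's console implementation.

```python
class PhysicalUart:
    def __init__(self):
        self.wire = []  # characters actually transmitted

    def putc(self, ch):
        self.wire.append(ch)


class VirtualUart:
    def __init__(self, name):
        self.name = name
        self.pending = []  # output produced while not bound


class Vmm:
    """Hypothetical multiplexer: only the bound v_uart reaches the real UART."""

    def __init__(self, uart):
        self.uart = uart
        self.bound = None

    def bind(self, vuart):
        self.bound = vuart
        for ch in vuart.pending:  # flush what the guest wrote earlier
            self.uart.putc(ch)
        vuart.pending.clear()

    def trap_write(self, vuart, ch):
        """Called from the data-abort path when a guest writes its v_uart TX register."""
        if vuart is self.bound:
            self.uart.putc(ch)
        else:
            vuart.pending.append(ch)


uart = PhysicalUart()
vmm = Vmm(uart)
v0, v1 = VirtualUart("guest0"), VirtualUart("guest1")
vmm.bind(v0)
vmm.trap_write(v0, "a")
vmm.trap_write(v1, "b")   # guest1 is not bound yet: buffered
vmm.bind(v1)              # switching the console flushes guest1's output
print(uart.wire)
```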