Lec16 Virtualization

Virtualization enables multiple operating systems to run on a single physical system, sharing hardware resources and providing user flexibility while isolating users. The document discusses various virtualization architectures, including hosted and bare-metal types, and highlights challenges faced in x86 architecture such as correctness and performance. It also covers advancements like dynamic binary translation, shadow page tables, and the evolution of virtualization technologies, including containers for resource isolation.


CSCI3150 Introduction to

Operating Systems

Lecture 16: Virtualization

Hong Xu
https://github.com/henryhxu/CSCI3150
Acknowledgement

 Based on slides from Brad Campbell @ U. Virginia for CS6456

CSCI3150 Intro to OS 2
What is virtualization?

 Virtualization is the ability to run multiple operating systems on a single physical system and share the underlying hardware resources [1]

 Allows one computer to provide the appearance of many computers.

 Goals:
 Provide flexibility for users

 Amortize hardware costs

 Isolate completely separate users

[1] VMware white paper, Virtualization Overview

3
Requirements for Virtualizable Architectures

 “First, the VMM provides an environment for programs which is essentially identical with the original machine;

 second, programs run in this environment show at worst only minor decreases in speed;

 and last, the VMM is in complete control of system resources.”

 VMM: virtual machine monitor (from VMware); a.k.a. hypervisor

4
VMM Platform Types

 Hosted Architecture

 Install as application on existing x86 “host” OS, e.g. Windows, Linux, OS X

 Small context-switching driver

 Leverage host I/O stack and resource management

 Examples: VMware Player/Workstation/Server, Microsoft Virtual PC/Server, Parallels Desktop

 Bare-Metal Architecture

 “Hypervisor” installs directly on hardware

 Acknowledged as preferred architecture for high-end servers

 Examples: VMware ESX Server, Xen, Microsoft Viridian (2008)

6
Virtualization: rejuvenation

 1960s: first wave of virtualization


 Time and resource sharing on expensive mainframes

 IBM VM/370

 Late 1970s and early 1980s: became unpopular


 Cheap hardware and multiprocessing OS

 Late 1990s: became popular again


 Wide variety of OS and hardware configurations

 VMWare

 Since 2000: hot and important


 Cloud computing

 Docker containers

7
IBM VM/370

 Technology: trap-and-emulate

[Figure: normal application code runs directly on the CPU; privileged kernel instructions trap, and the VMM emulates them]
8
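The trap-and-emulate loop above can be sketched as a toy model. This is a conceptual illustration only, not real VM/370 or VMM code: the instruction names, the `Trap` exception, and the virtual CPU state are all invented for the example.

```python
# Toy model of trap-and-emulate (illustrative sketch, not real hypervisor code).

class Trap(Exception):
    """Raised when deprivileged guest code hits a privileged instruction."""

PRIVILEGED = {"cli", "sti", "load_cr3"}  # hypothetical privileged instructions

def guest_execute(instr, state):
    """Guest runs deprivileged: privileged instructions trap instead of executing."""
    if instr in PRIVILEGED:
        raise Trap(instr)
    state["acc"] = state.get("acc", 0) + 1  # stand-in for normal, direct execution

def vmm_run(instructions):
    """The VMM lets the guest run directly, catching and emulating each trap."""
    state = {"interrupts_enabled": True}
    for instr in instructions:
        try:
            guest_execute(instr, state)
        except Trap as t:
            # Emulate the privileged instruction against the *virtual* CPU state.
            if str(t) == "cli":
                state["interrupts_enabled"] = False
            elif str(t) == "sti":
                state["interrupts_enabled"] = True
    return state

state = vmm_run(["add", "cli", "add", "sti"])
```

The key property the sketch shows: unprivileged work never enters the VMM, while every privileged operation is intercepted and applied only to the guest's virtual state.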
Trap and Emulate Virtualization on x86 architecture

 Challenges
 Correctness: not all sensitive instructions trap!

 Performance:
 System calls: trap on both entry and exit (~10x overhead)

 I/O performance: high CPU overhead

 Virtual memory: no software-controlled TLB

9
Virtualization on x86 architecture

 Solutions:
 Dynamic binary translation & shadow page table

 Para-virtualization (Xen)

 Hardware extension

10
Dynamic binary translation

 Idea: intercept privileged instructions by changing the binary

 Cannot patch the guest kernel directly (would be visible to guests)

 Solution: make a copy, change it, and execute it from there


 Use a cache to improve the performance

11
Binary translation

 Directly execute unprivileged guest application code


 Will run at full speed until it traps, we get an interrupt, etc.
 “Binary translate” all guest kernel code, run it unprivileged
 Since x86 has non-virtualizable instructions, proactively transfer control to the VMM (no need for traps)
 Safe instructions are emitted without change
 For “unsafe” instructions, emit a controlled emulation sequence
 VMM translation cache for good performance

12
How does VMWare do this?

 Binary – input is x86 “hex”, not source

 Dynamic – interleave translation and execution

 On Demand – translate only what is about to execute (lazy)

 System Level – makes no assumptions about guest code

 Subsetting – full x86 to safe subset

 Adaptive – adjust translations based on guest behavior

13
Convert unsafe operations and cache them

Input: a basic block (BB), e.g. 55 ff 33 c7 03 ...

Each translator invocation:
• Consumes a basic block (BB)
• Produces a compiled code fragment (CCF)

Store the CCF in the translation cache:
• Future reuse
• Captures the working set of the guest kernel
• Amortizes translation costs
• Not “patching in place”

Output: a CCF, e.g. 55 ff 33 c7 03 ...

14
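The BB-to-CCF pipeline above can be sketched in a few lines. This is a toy model, not VMware's translator: the "instructions" are strings rather than x86 bytes, and the unsafe-instruction set and `vmm_emulate_*` callout names are invented for illustration.

```python
# Minimal sketch of a dynamic binary translator's cache (hypothetical IR, not x86).

UNSAFE = {"popf"}        # hypothetical non-virtualizable instruction

translation_cache = {}   # BB start address -> compiled code fragment (CCF)

def translate_bb(addr, basic_block):
    """Translate one basic block on demand, caching the result."""
    if addr in translation_cache:                    # reuse: amortize translation cost
        return translation_cache[addr]
    ccf = []
    for instr in basic_block:
        if instr in UNSAFE:
            ccf.append(f"call vmm_emulate_{instr}")  # controlled emulation sequence
        else:
            ccf.append(instr)                        # safe instructions emitted unchanged
    translation_cache[addr] = ccf                    # not patched in place: stored in cache
    return ccf

ccf = translate_bb(0x1000, ["mov", "add", "popf", "jmp"])
```

A second lookup of the same address returns the cached fragment without re-translating, which is how the cache captures the guest kernel's working set.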
Dynamic binary translation

 Pros:
 Make x86 virtualizable

 Can reduce traps

 Cons:
 Overhead

 Hard to improve system calls, I/O operations

 Hard to handle complex code

15
Shadow page table

[Figure: the guest page table maps virtual pages to guest-physical pages; the VMM keeps a shadow page table, walked by the hardware, mapping the same virtual pages to machine frames]
16
Shadow page table

 Pros:
 Transparent to guest VMs

 Good performance when working set is stable

 Cons:
 Big overhead of keeping two page tables consistent

 Introduces more issues: hidden faults, double paging, …

17
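The consistency cost above comes from mirroring every guest page-table write into the shadow table. A minimal sketch, with all the mappings and names invented for illustration (real shadow paging intercepts writes via traps, which this toy model skips):

```python
# Toy shadow page table sketch: guest virtual page -> guest "physical" frame,
# which the VMM translates to a real machine frame. All numbers are made up.

p2m = {0: 100, 1: 101, 2: 102}  # guest pseudo-physical frame -> machine frame

guest_pt = {}    # what the guest thinks its page table is (VPN -> guest PFN)
shadow_pt = {}   # what the hardware actually walks (VPN -> machine frame)

def guest_map(vpn, gpfn):
    """A guest page-table write; the VMM intercepts it and updates the shadow copy."""
    guest_pt[vpn] = gpfn
    shadow_pt[vpn] = p2m[gpfn]   # keeping the two tables consistent is the overhead

guest_map(0, 2)
guest_map(1, 0)
```

The guest only ever sees `guest_pt`, so the mechanism stays transparent; performance is good once the working set stops changing, because the shadow entries stop being updated.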
Xen

18
Xen and the art of virtualization

 SOSP’03

 Very high impact (data collected in 2013)

[Bar chart: Google Scholar citation counts for SOSP’03 papers; the Xen paper leads with 8807 citations]
19
Para-virtualization

 Full vs. para virtualization

20
Overview of the Xen approach

 Support for unmodified application binaries (but not OS)


 Keep Application Binary Interface (ABI)

 Modify guest OS to be aware of virtualization


 Get around issues of x86 architecture

 Better performance

 Keep hypervisor as small as possible


 Device driver is in Dom0

21
Xen architecture

22
Virtualization on x86 architecture

 Challenges
 Correctness: not all sensitive instructions trap!

 Performance:
 System calls: trap on both entry and exit (~10x overhead)

 I/O performance: high CPU overhead

 Virtual memory: no software-controlled TLB

23
CPU virtualization

 Protection
 Xen in ring 0, guest kernel in ring 1

 Privileged instructions are replaced with hypercalls

 Exception and system calls


 Guest OS registers handlers validated by Xen

 Allowing direct system calls from app into guest OS

 Page faults: redirected by Xen

24
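The hypercall idea above can be sketched as follows. This is a conceptual toy, not the Xen ABI: the function name, the reserved-region constant, and the page-table representation are all invented for the example.

```python
# Illustrative para-virtualization sketch: the guest kernel is modified to issue
# hypercalls instead of privileged instructions, and Xen validates each request.

XEN_RESERVED_START = 64  # pretend the hypervisor owns machine frames >= 64

def hypercall_update_page_table(page_table, vpn, frame):
    """Hypervisor-side handler: validate, then apply the guest's update."""
    if frame >= XEN_RESERVED_START:
        return False          # reject mappings into the hypervisor's region
    page_table[vpn] = frame   # validated update applied on the guest's behalf
    return True

pt = {}
ok = hypercall_update_page_table(pt, 0, 10)    # legal mapping: accepted
bad = hypercall_update_page_table(pt, 1, 70)   # targets Xen's region: rejected
```

Because the guest kernel cooperates (it calls into the hypervisor explicitly), no binary translation is needed, but the guest kernel source must be modified.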
Memory virtualization

 Xen exists in a 64MB section at the top of every address space

 Guest sees the mapping to real machine address

 Guest kernels are responsible for allocating and managing the hardware page tables.

 After registering the page table with Xen, all subsequent updates must be validated.

25
Porting effort is quite low

26
Evaluation

27
Conclusion

 x86 architecture makes virtualization challenging

 Full virtualization
 unmodified guest OS; good isolation

 Performance issue (especially I/O)

 Para virtualization:
 Better performance (potentially)

 Need to update guest kernel

 Full and para virtualization will keep evolving together

28
Instead: Leverage hardware support

 First generation - processor

 Second generation - memory

 Third generation – I/O device


 In progress

29
Protection Rings

 Actually, x86 has four protection levels, not two (kernel/user).
 x86 rings (CPL – CPU Privilege Level):
  Ring 0 – “Kernel mode” (most privileged)
  Ring 3 – “User mode”
  Rings 1 & 2 – Other
 Linux only uses 0 and 3: “kernel vs. user mode”
 Pre-VT Xen was modified to run the guest kernel in Ring 1, reserving Ring 0 for the hypervisor.

[Figure: concentric rings 0–3, privilege increasing toward Ring 0]

[Fischbach] 30
Why aren’t protection rings good enough?

[Figure: pre-VT layout — hypervisor at CPL 0 (Ring 0), guest kernel deprivileged to CPL 1, VM applications at CPL 3]

31
A short list of pre-VT problems

Early hypervisors (VMware, Xen) had to emulate various machine behaviors and generally bend over backwards.
 IA32 page protection does not distinguish CPL 0-2.
 Segment-grained memory protection only.

 Ring aliasing: some IA instructions expose CPL to guest!


 Or fail silently…

 Syscalls don’t work properly and require emulation.


 sysenter always transitions to CPL 0. (D’oh!)
 sysexit faults if the core is not in CPL 0.

 Interrupts don’t work properly and require emulation.


 Interrupt disable/enable reserved to CPL0.

32
First generation: Intel VT-x & AMD SVM

 Eliminates the need for binary translation or OS modification

[Figure: VMRUN switches the CPU from host mode to guest mode; VMEXIT returns to host mode — each mode has its own rings 0–3]

33
VT in a Nutshell

 New VM mode bit, orthogonal to CPL (e.g., kernel/user mode)

 If VM mode is off → host mode


 Machine “looks just like it always did” (“VMX root”)

 If VM bit is on → guest mode


 Machine is running a guest VM: “VMX non-root mode”

 Machine “looks just like it always did” to the guest, BUT:

 Various events trigger gated entry to hypervisor (in VMX root)

 A “virtualization intercept”: exit VM mode to VMM (VM Exit)

 Hypervisor (VMM) can control which events cause intercepts

 Hypervisor can examine/manipulate guest VM state and return to VM (VM Entry)

34
CPU Virtualization With VT-x

 Two new VT-x operating modes
  Less-privileged mode (VMX non-root) for guest OSes
  More-privileged mode (VMX root) for the VMM

 Two new transitions
  VM entry to non-root operation
  VM exit to root operation

[Figure: VMs run apps in Ring 3 and guest OSes in Ring 0 of VMX non-root mode; VM Exit / VM Entry transition to the VM Monitor (VMM) in VMX root mode]

• Execution controls determine when exits occur
• Access to privileged state, occurrence of exceptions, etc.
• Flexibility provided to minimize unwanted exits
• The VM Control Structure (VMCS) controls VT-x operation
• Also holds guest and host state
35
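The exit-control idea above can be modeled in a few lines. This is a conceptual sketch only: the `VMCS` class, event names, and `rip` bookkeeping are stand-ins invented for illustration, not Intel's actual structures.

```python
# Conceptual sketch of VT-x style VM exits driven by execution controls.

class VMCS:
    """Toy stand-in for the VM Control Structure: holds guest state plus the
    set of events configured (by the hypervisor) to cause VM exits."""
    def __init__(self, exit_on):
        self.exit_on = exit_on            # execution controls: which events exit
        self.guest_state = {"rip": 0}

def run_guest(vmcs, events):
    """Run the guest until an event the controls care about forces a VM exit."""
    for event in events:
        vmcs.guest_state["rip"] += 1      # guest progresses at native speed
        if event in vmcs.exit_on:
            return event                  # VM exit: control returns to the VMM
    return None

vmcs = VMCS(exit_on={"cpuid", "io"})      # hypervisor picks the intercepts
reason = run_guest(vmcs, ["add", "mul", "io", "add"])
```

The point of the flexibility bullet above: by shrinking `exit_on`, the hypervisor keeps the guest running natively for longer between exits.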
Second generation: Intel EPT & AMD NPT

 Eliminates the need for shadow page tables

36
Containers and isolation

42
Containers: idea

 Run a process with a restricted view of system resources


 Hide other processes

 Limit access to system resources

 Not full virtualization


 Process must use existing kernel and OS

 Benefits
 Consistent environment (runtime, dependencies, etc.)

 Low overhead, rapid launch/terminate

 Performance

43
Pre-container isolation features in Linux

 chroot
 Set the current root directory for processes
 Added to Unix in 1979

 Namespaces
 Provide processes with their own view of resources
 Process IDs, networking sockets, hostnames, etc.

 Akin to virtual address spaces


 Originally introduced in 2002

 Copy-on-Write Filesystem
 Allow a process to view the existing filesystem, but any modifications result in copies, then updates
 Akin to virtual memory after fork

44
Key technologies

45
Namespace

 “A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.” – Linux man-pages

 Controls what resources a process can see, and what they are called

 Restrict visibility -> solve “inconsistent distribution”

46
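The "own isolated instance, own names" idea can be sketched with a toy PID namespace. This is a conceptual model only — real namespaces are created in the kernel via clone()/unshare(), not like this — and the process table and class names are invented for the example.

```python
# Toy model of namespace-style visibility: each namespace sees only its own
# members, renumbered from 1 (conceptual sketch, not the kernel mechanism).

all_processes = {1: "init", 40: "sshd", 41: "nginx", 42: "worker"}  # global view

class PidNamespace:
    """A restricted view: member processes get their own PID numbering."""
    def __init__(self, members):
        # map inside-namespace PIDs (1, 2, ...) to global PIDs
        self.pid_map = {i + 1: pid for i, pid in enumerate(sorted(members))}

    def ps(self):
        """What `ps` would show from inside this namespace."""
        return {ns_pid: all_processes[g] for ns_pid, g in self.pid_map.items()}

ns = PidNamespace({41, 42})   # a container holding only nginx and its worker
view = ns.ps()
```

Inside the namespace the first member appears as PID 1, mirroring how a containerized process sees itself as the init process while the host still sees the global PIDs.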
cgroups in Linux

 Enforce policies to restrict how processes use resources


 Example: A process can only use 1 Mbps of network bandwidth

 Policies are enforced on groups of processes

 Controllers enforce policies for various resources

 Restrict usage -> solves “resource isolation”

47
Controllers enforce restrictions for cgroups

 Different controllers for different system resources

 Each provides a policy for how processes should be restricted

48
Controllers in Linux

 io
 Limit I/O requests, either as a per-process cap or proportionally.

 memory
 Enforce memory caps on processes

 pids
 Limit number of new processes in a cgroup

 perf_event
 Allow monitoring performance

 cpu
 Limit CPU usage when CPU is busy

 freezer
 Allow suspending all processes in a cgroup
49
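The controller list above can be illustrated with a toy version of the pids controller. This is a conceptual model, not the kernel's cgroup interface (which is driven by writing to files like a `pids.max` knob under /sys/fs/cgroup); the class and method names are invented for the example.

```python
# Toy sketch of a cgroup "pids" controller: cap how many processes a group
# may contain (conceptual model only, not the real kernel mechanism).

class PidsController:
    def __init__(self, limit):
        self.limit = limit      # analogous to a pids.max setting
        self.members = set()

    def fork(self, pid):
        """Admit a new process only if the group is still under its limit."""
        if len(self.members) >= self.limit:
            return False        # fork denied once the cgroup is at its cap
        self.members.add(pid)
        return True

cg = PidsController(limit=2)
results = [cg.fork(pid) for pid in (100, 101, 102)]  # third fork is rejected
```

The policy applies to the group as a whole, not to any single process — which is the point of the "policies are enforced on groups of processes" bullet above.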
Linux containers = combinations of namespaces & cgroups

50
