0% found this document useful (0 votes)

13 views95 pages

Class 2 3 5

The document discusses virtualization in operating systems, detailing its history, types, and mechanisms, including hypervisors and their roles in managing virtual machines. It explains the differences between Type 1 (bare metal) and Type 2 (hosted) hypervisors, as well as various virtualization techniques such as full virtualization, paravirtualization, and hardware-assisted virtualization. Additionally, it covers the importance of isolation and security in cloud computing environments, emphasizing the need for efficient resource management and the role of the hypervisor in this context.

Uploaded by

Aaryan Bhagat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views95 pages

Class 2 3 5

Uploaded by

Aaryan Bhagat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 95

Cloud Computing and

Cloud Networking
K. K. Ramakrishnan
[email protected]
CS 208, Winter 2025
Tue-Thur 2 pm - 3:20 pm
Class 2, 3
Virtualization in Operating
Systems
Class 2

Page
2
Virtualization

• Virtualization: extend or replace an existing interface to

mimic the behavior of another system.
– Introduced in 1970s: run legacy software on newer mainframe
hardware
• Handle platform diversity by running apps in VMs
– Portability and flexibility
Thanks to Prof. Prashant Shenoy, CS Umass
for many of the slides here
Papers to Read

• M. Rosenblum, T. Garfinkel,
Virtual machine monitors: current technology and future
trends
, IEEE Computer, 38:5, May 2005.
• Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand,
Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and
Andrew Warfield, Xen and the art of Virtualization,
Proc. of SOSP, October 2003
Types of Interfaces

• Different types of interfaces

– Assembly instructions
– System calls
– APIs
• Depending on what is replaced /mimic-ed, we obtain
different forms of virtualization
5
Types of Virtualization
Application
s
Application Application Guest OS Non-
s s privileged
Applicati Guest OS VMM VMM mode
on
OS VMM Host OS Host VMM Privileged
(Hypervisor OS mode
Hardware Hardware
) Hardware
Hardware
Physical Hosted VM
machine Native VM Dual-Mode VM

Virtual machines can be built in multiple ways

1) With a VMM on top of the bare-metal: so hypervisor is in
privileged mode, and mediates access to CPU, disk etc.
2) VMM runs on top of Host OS.
3) VMM implemented in dual mode: Part of VMM in user mode, part
in kernel (OS needs modifications).
Page
6
Two broad types of hypervisors

• Type 1: Native (bare metal) - – example of Full/Native Virtualization

– Hypervisor runs on top of the bare metal machine
– e.g., VMWare Workstation
• Type 2: Hosted Virtual Machines
– Hypervisor is an emulator
– e.g., KVM, QEMU 7
Types of Hypervisors

• Type 1: hypervisor runs on “bare metal”

• Type 2: hypervisor runs on a host OS (like hosted VMs)
– Guest OS runs inside hypervisor
• Both VM types act like real hardware from application
standpoint
8
Types of Virtualization

• Full/native Virtualization (Type 1)

– VM simulates “enough” hardware to allow an unmodified
guest OS to be run in isolation (isolation from one VM to
another is an important characteristic for users sharing a cloud
platform)
• Same hardware CPU
– IBM VM family, VMWare Workstation, Parallels, VirtualBox
• Emulation (Type 2)
– VMM emulates/simulates complete hardware
– Unmodified guest OS for a different 'PC' can be run
• Bochs, VirtualPC for Mac, QEMU
9
Protection through Isolation
• Excerpted from Johanna Ullrich, Edgar R. Weippl, in “The Cloud
Security Ecosystem”, 2015 (Elsevier)
• Hypervisor provides efficient, isolated ‘duplicate’ of
physical machine for VMs.
• All sensitive instructions, (those changing
resource availability or configuration), must be privileged
instructions, to build effective hypervisor (Popek & Goldberg, ‘74)
– all sensitive instructions cross (go through) the hypervisor, which
is able to control VMs appropriately.
– This is full virtualization: advantage is host OS does not have to
be adapted, i.e., it is unaware of its virtualized environment.
• But, often need # additional actions to be virtualizable, 
paravirtualization, binary translation & hardware-assisted
virtualization
Hybrid organizations

User space

VMM

• Hybrid hypervisors are popular: e.g., Xen

– Mostly bare metal virtualization.
– But, VM0/Dom0 to keep device drivers out of VMM
– Paravirtualization (PV) front-end talks to Dom0 11
More on Types of virtualization
• Para-virtualization
– VMM does not simulate all of the hardware capabilities
– Use special API that a modified guest OS must use
– Hypercalls trapped by the Hypervisor and serviced through Dom0
– Xen, VMWare ESX Server
• OS-level virtualization
– OS allows multiple 'secure' virtual servers to be run
– Guest OS is the same as the host OS, but appears isolated
• apps see an isolated OS
– Solaris Containers, BSD Jails, Linux Vserver, Linux containers, Docker
• Application level virtualization
– Application is given its own copy of components that are not shared
• (E.g., own registry files, global objects) - VE prevents conflicts
– JVM, Rosetta on Mac (also emulation), WINE

12
Paravirtualization
• Paravirtualization: changes to the system in order to
redirect privileged sensitive, instructions over to the
hypervisor to regain full control on the resources
– Redirect rather than ‘trap’
• Privileged instructions: storage protection setting, interrupt handling,
timer control, I/O, special processor status-setting instructions -
executed only in special privileged mode for OS but not user program
• Guest OS modified to work with hypervisor. OS is aware
that it is virtualized.
– Apps that run on top of altered OS - no change required
• OS mods require work; but performance improves
– See: Crosby & Brown, ACM Queue 2006 (
https://dl.acm.org/doi/pdf/10.1145/1189276.1189289)
• Paravirtualization, generally able to run on any system
A note on Operating System Rings
• On most OSs, Ring 0 is level with most
privileges and interacts most directly with
physical hardware - CPU and memory.
• Special gates between rings are provided
to allow an outer ring to access an inner
ring's resources in a predefined manner,
as opposed to allowing arbitrary usage.
• Linux x86 ring usage: Linux kernel only
uses 0 and 3: Ring 0

– 0 for kernel – can do anything

– 3 for users
Focus only on
• ring 3 cannot run several instructions and Ring3: user space
Ring0: kernel,

write to several registers Can run privileged instructions

How does Virtualization work?
• CPU supports kernel and user mode (ring0, ring3)
– There is a set of instructions that can only be executed in kernel mode
• I/O, change MMU settings etc -- sensitive instructions
– Privileged instructions: cause a trap when executed in user mode
• Result: type 1 virtualization is feasible if sensitive instruction
subset are handled by the VMM, since it runs in Privileged mode
• Intel 386: ignores sensitive instructions in user mode
• Recent (well – ‘old’ now) Intel/AMD CPUs have hardware support
– Intel VT, AMD SVM
– Intel® Virtualization Technology provides hardware assist to the
virtualization software, reducing its size, cost, and complexity.
• Create an environment where a VM and guest OS can run
• Hypervisor uses hardware bitmap to specify which inst. should trap
• So, Sensitive instruction in guest traps to hypervisor

15
Hardware Assisted Virtualization
• Hardware-assisted virtualization: achieved by additional
functionality included into CPU,
– specifically an additional execution mode called guest mode,
dedicated to virtual instances
– Requires specific hardware
• Intel & AMD - realized full virtualization & paravirtualization were
major challenges and created new processor extensions: VT-x &
AMD-V
• virtualization-aware hardware provides the support to build the
VMM and also ensures isolation of a guest OS.
• representative of this virtualization type is the Kernel-
based Virtual Machine (KVM)
Type 1 hypervisor

• Unmodified OS is running in user mode

– But it thinks it is running in kernel mode (virtual kernel mode)
– privileged instructions trap; sensitive inst-> use VT to trap
– Hypervisor is the “real kernel”
• Upon trap, executes privileged operations
• emulates what the hardware would do

17
Binary Translation

• What did we do before hardware assist was available?

• VMware example
– Upon loading program: scans code for basic blocks
• (basic block: straight-line code sequence with no branches)
– If sensitive instructions, replace by VMware procedure
• Binary translation
– Conversion can be expensive: Cache the modified basic block
in VMware cache for subsequent reuse
• Execute; load next basic block etc.
Type 2 Hypervisor
Applications
Hosted VM
VMM

Host OS

Hardware

• Type 2 hypervisors can work without virtualization

technology (VT) support
– Host OS responsible for executing privileged instructions and
is running in kernel mode (with privileges)

19
True VMs vs. Paravirtualization
User Redirect/hypercall
space

• Both type 1 and 2 hypervisors work on unmodified OS

• Paravirtualization: modify OS kernel to replace all
sensitive instructions with hypercalls
– OS behaves like a user program making system calls
– Hypervisor executes the privileged operation invoked by
hypercall.
20
Standard Virtual machine Interface

Type 1

• Standardize the VM interface (VMIL) so kernel can run

on bare hardware or any hypervisor
– Can help a VM run on bare hardware (like a Type 1)
– Allows for VMs using binary translation (like a Type 2) – as if
the VMWare is the OS after translation is performed
– Allows for VMs to run on Xen (Paravirtualization) 21
Memory virtualization
• OS manages page tables
– Creating a new pagetable is sensitive priv. operation -> traps to
hypervisor
• Hypervisor manages multiple OSs
– Need a second shadow page table – perform two level mapping:
– A typical OS translates VM virtual pages to VM’s physical pages
– Hypervisor maps to actual page in shadow page table
– Need to catch changes to page table (but that is not a privileged
instruction)
– Change Page Table to read-only - page fault on update
• Paravirtualized system - use hypercalls to inform changes to
page table

22
I/O Virtualization

• Each guest OS thinks it “owns” the disk

• Hypervisor creates “virtual disks”
– Large empty files on the physical disk that appear as “disks” to
the guest OS
• Hypervisor converts block # to file offset for I/O
– But DMA needs physical addresses
• Hypervisor needs to do the translation to physical addresses

23
More Details on
Virtualization
Functionality
Slides from Prof. Nael Abu-Ghazaleh @UCR in his OS Course
NUTS AND BOLTS

25
Full virtualization
• Idea: run guest operating systems unmodified

• However, hypervisor is the real privileged

software

• When OS executes privileged instruction, trap

to hypervisor who executes it for the OS

• This can be very expensive

• Also, subject to quirks of the architecture

– Example, x86 fails silently if some privileged
instructions execute without privilege 26

–
Example of Hypervisor Intervention:
Disable Interrupts
• Guest OS tries to disable interrupts
– the instruction is trapped by the VMM which
makes a note that interrupts are disabled for that
VM

• Interrupts arrive for that machine

– Buffered at the VMM layer until the guest OS
enables interrupts.

• Other interrupts are directed to VMs that

have not disabled them
• This action can be expensive
Binary translation--making full
virtualization practical
• Use binary translation to modify OS to
rewrite silent failure instructions
• More aggressive translation can be used
– Translate OS mode instructions to
equivalent VMM instructions
• Some operations still expensive
• Cache for future use
• Used by VMWare ESXi and Microsoft Virtual Server
• Performance on x86 typically ~80-95% of
native
28
Binary Translation Example
Guest OS
Assembly Translated Assembly
do_atomic_operation: do_atomic_operation:
cli call
mov eax, 1 [vmm_disable_interrupts]
xchg eax, [lock_addr] mov eax, 1
test eax, eax xchg eax, [lock_addr]
jnz spinlock test eax, eax
… jnz spinlock
… …
mov [lock_addr], 0 …
sti mov [lock_addr], 0
ret call
[vmm_enable_interrupts]
ret
CLI: Clear Interrupt Flag; STI — Set Interrupt Flag 29
Paravirtualization
• Modify the Guest OS to make it aware
of the hypervisor
– Can avoid these tricky features
– OS is aware of the fact it is virtualized
• Can implement optimizations
• How does it Compare to binary
translation?
• Amount of code change?
– 1.36% of Linux, 0.04% for Windows 30
Hardware supported virtualization
(Intel VT-x, AMD-V)
• Hardware support for virtualization
– Intel® Virtualization Technology
hardware assist for virtualization
• Makes implementing VMMs much
simpler
• Streamlines communication between
VM and OS
• Removes the need for
paravirtualization/binary translation
• EPT: Support for shadow page tables31
Virtualization Tasks
• Virtualize hardware
– Memory hierarchy
– CPUs
– Devices
• Implement data and control transfer
between guests and hypervisor
• We’ll cover this by example – Xen paper
– Slides modified from presentation by
Jianmin Chen
32
Xen
• Design principles:
– Unmodified applications: essential
– Full-blown multi-task O/Ss: essential
– Paravirtualization: necessary for
performance and isolation
• Paul Barham, Boris Dragovic, Keir Fraser,
Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew
Warfield, Xen and the art of Virtualization
, Proc. of SOSP, October 2003
Xen
Domain 0
Guest OS Implementation summary

35
Xen VM interface: Memory
• Memory management
– Guest cannot install highest privilege
level segment descriptors; top end of
linear address space is not accessible
• Kernel direct mapping space
– Guest has direct (not trapped) read
access to hardware page tables; writes
are trapped and handled by the VMM
– Physical memory presented to guest is
not necessarily contiguous
• ‘Guest’  Guest OS
Two Layers of Virtual Memory Physical address  machine

address Host OS’s

Virtual address  physical View of RAM

address 0xFFFFFFFF
Guest OS’s

View of RAM
Guest App’s
0xFFFF Page 2
View of RAM
Page 0 Page 0
0xFF
Page 3

Page 2 Page 1
Page 3
Page 1

Page 0
Page 3
Page 1
0x00 Page 2
Unknown to the
Known to the guest guest OS
0x0000
OS
0x00000000
Guest’s Page Tables Are Invalid
• Guest OS page tables map virtual page
numbers (VPNs) to physical frame
numbers (PFNs)
• Problem: the guest is virtualized, doesn’t
actually know the true PFNs (locations)
– That true location is the machine frame number
(MFN)
– MFNs are known to the VMM and the host OS
• Guest page tables cannot be installed in cr3
– Map VPNs to PFNs, but the PFNs are incorrect
• How can the MMU translate addresses used
by the guest (VPNs) to MFNs? 38
Shadow Page Tables
• Solution: VMM creates shadow page tables
that map VPN  MFN (as opposed to
VPNPFN)
Guest Page Table Physical Memory

VPN PFN •
64
Page 2 3 Maintained by the
00 (0) 01 (1)
48 2 guest OS
01 (1) 10 (2) Page 1
• Invalid for the MMU
32
Page 0 1
Virtual Memory 10 (2) 11 (3)
16
64 11 (3) 00 (0) Page 3 0
Page 3 0
48
Page 2
32 Shadow Page Table Machine Memory
Page 1
VPN MFN
16
Page 0 64 3 • Maintained by the
00 (0) 10 (2) Page 1
0 48 2 VMM
01 (1) 11 (3) Page 0
•
32 1 Valid for the MMU
10 (2) 00 (0) Page 3
16 0 39
11 (3) 01 (1) Page 2
0
Building Shadow Tables
• Problem: how can the VMM maintain
consistent shadow pages tables?
– The guest OS may modify its page tables at any
time
– Modifying the tables is a simple memory write, not
a privileged instruction
• Thus, there are no helpful CPU exceptions to trap this
action :(
• Solution: mark the hardware pages containing
the guest’s tables as read-only (guest page
table)
– If the guest updates a table, an exception is
generated
– VMM catches the exception, examines the faulting
40
More VMM Tricks
• The VMM can play tricks with virtual
memory just like an OS can
• Balooning:
– The VMM can page parts of a guest, or even an
entire guest, to disk
– A guest can be written to disk and brought
back online on a different machine!
• Deduplication:
– The VMM can share read-only pages between
guests
– Example: two guests both running Windows XP
41
Xen VM interface: CPU
• CPU
– Guest runs at lower privilege than VMM
– Exception handlers must be registered with
VMM
– Fast system call handler can be serviced
without trapping to VMM
• Allow direct calls from application to Guest OS,
rather than directing it through the VMM
– Hardware interrupts replaced by lightweight
event notification system
– Timer interface: both for real and virtual time
Details: CPU
• Frequent exceptions:
– Software interrupts for system calls
– Page faults
• Allow “guest” to register a ‘fast’
exception handler for system calls
that can be accessed directly by CPU in
ring 1, without switching to ring-0/Xen
– Handler is validated before installing in
hardware exception table: To make sure
nothing executed in Ring 0 privilege.
• Not used for Page Fault
Xen VM interface: I/O
• I/O
– Virtual devices exposed as
asynchronous I/O rings to guests
– Event notification replaces interrupts
Details: I/O 1
• Xen does not emulate hardware devices
– Exposes device abstractions for simplicity
and performance
– I/O data transferred to/from guest via Xen
using shared-memory buffers
– Virtualized interrupts: light-weight event
delivery mechanism from Xen-guest
• Update a bitmap in shared memory
• Optional call-back handlers registered by guest
O/S
NIC data structure

NIC manages incoming and

outgoing packets using circular
queues (ring) of buffer descriptors

Each slot in the ring contains the

length and physical address of the
buffer

NIC registers (CPU accessible)

indicate the portion of ring available
for transmission and reception
Details: I/O 2
• I/O Descriptor Ring:
Data Transfer: Descriptor Ring
• Descriptors are allocated by a
domain (guest) and accessible
from Xen
• Descriptors do not contain I/O
data; instead, point to data buffers
also allocated by domain (guest)
– Facilitate zero-copy transfers of I/O
data into a domain
OS Porting Cost
• Number of lines of code modified
or added compared with original
x86 code base (excluding device
drivers)
– Linux: 2995 (1.36%)
– Windows XP: 4620 (0.04%)
• Re-writing of privileged routines;
• Removing low-level system
initialization code
Control Transfer
• Guest synchronously calls into VMM
– Explicit control transfer from guest O/S to VM
monitor/hypervisor, similar to system calls
– “hypercalls”
• VMM delivers notifications to guest O/S
– E.g. data from network or when an I/O device is
ready
– Asynchronous event mechanism; guest O/S does
not see hardware interrupts, only Xen
notifications
Event notification
• Pending events stored in per-domain
bitmask
– E.g. incoming network packet received
– Updated by Xen before invoking guest OS
handler
– Xen-readable flag may be set by a domain
• To defer handling, based on time or number of
pending requests
• Analogous to interrupt disabling
Network Virtualization
• Each domain has 1+ network interfaces:
virtual interfaces (VIFs)
– Each VIF has 2 I/O rings (send, receive)
– Each direction also has rules of the form
(<pattern>,<action>) that are inserted by
domain 0 (management)
• Xen’s role: models/acts as a virtual
firewall + router (VFR) to which all
domain VIFs connect
Network Virtualization
• Packet transmission:
– Guest adds request to I/O ring
– Xen copies packet header, applies
matching filter rules
• E.g. change header - IP source address for
NAT
• No change to payload; pages with payload
must be pinned to physical memory until
DMA to physical NIC for transmission is
complete
– Round-robin packet scheduler
Network Virtualization
• Packet reception:
– Xen applies pattern-matching rules to determine
destination VIF
– Guest O/S required to exchange unused page
frame for each packet received
• Xen exchanges packet buffer for page frame
in VIF’s receive ring
• If no receive frame is available, the packet is
dropped
• Avoids Xen-to-guest copies; requires page
aligned receive buffers to be queued at VIF’s
receive ring
Disk virtualization
• Domain0 has access to physical
disks
– Currently: SCSI and IDE
• All other domains: virtual block
device (VBD)
– Created & configured by management
software at domain0
– Accessed via I/O ring mechanism
– Possible reordering by Xen based on
knowledge about disk layout
Disk virtualization
• Xen maintains translation tables
for each virtual block device (VBD)
– Used to map requests for VBD
(ID,offset) to corresponding physical
device and sector address
– Zero-copy data transfers take place
using DMA between memory pages
pinned by requesting domain
• Scheduling: batches of requests in
round-robin fashion across
domains
Evaluation
Microbenchmarks
• Stat, open, close, fork, exec, etc
• Xen shows overheads of up to 2x with
respect to native Linux
– (context switch across 16 processes; mmap
latency)
• VMware shows up to 20x overheads
– (context switch; mmap latencies)
• User Mode Linux shows up to 200x
overheads
– Fork, exec, mmap; better than VMware in
context switches
CPU Virtualization with VT-x

59
VT-x : Motivation
• To solve the problem that the x86 architecture
instructions cannot be virtualized.
• Simplify VMM software by closing virtualization
holes by design.
– Ring Compression
– Non-trapping instructions
– Excessive trapping
• Eliminate need for software virtualization (i.e.,
paravirtualization, binary translation).

60
VMX
• Virtual Machine Extensions define processor-
level support for virtual machines on the x86
platform by a new form of operation called VMX
operation.
• Kinds of VMX operation:
– root: VMM (hypervisor) runs in VMX root
operation
– non-root: Guest runs in VMX non-root
operation
• Eliminate de-privileging of Ring 0 for guest
OS.
61
Pre VT-x
Post without VT-x with VT-x
VMM ring de-privileging of guest OS VMM executes in VMX root-mode
Guest OS aware its not at Ring 0 Guest OS de-privileging eliminated
Guest OS views as if it “runs directly
on hardware”

62
VMX Transitions
• Transitions between VMX root operation and
VMX non-root operation.
• Kinds of VMX transitions:
– VM Entry: Transitions into VMX non-root
operation. Allows Guest OS to execute Priv.
instructions as if it is in Ring 0
– VM Exit: Transitions from VMX non-root
operation to VMX root operation.
• Registers and address space swapped in
one atomic operation.
63
VM 1 VM 2 VM n

VMX Non-Root Operation: Ring 3 Ring 3 Ring 3 App

Guest OS has access to priv. Ring 0 Ring 0 Ring 0 OS

instructions
VM Entry

VMCS 1 VMCS 2 VMCS n

VM Exit

Ring 3
VMX Root Operation
vmlaunch /
Ring 0
(hypervisor has access vmresume

to priv. instr.)
VMX Transitions

VMCS: VM Control Structure 64

VMCS: VM Control Structure
• Data structure to manage VMX non-root operation
and VMX transitions.
• Specifies guest OS state.
• Configured by VMM.
• Controls when VM exits occur.

• On VM entry, execution mode (guest state), dedicated

to virtual instances allows Guest OS to execute
privileged instructions without trapping to hypervisor
– Defined by VM-execution control fields that control
processor operation in VMX non-root operation

65
VMCS: VM Control Structure
The VMCS consists of six logical groups:
• Guest-state area: Processor state loaded on VM
entries from guest-state area; saved into guest-state
area on VM exits.
• Host-state area: Processor state loaded from the
host-state area on VM exits.
• VM-execution control fields: Fields controlling
processor operation in VMX non-root operation.
• VM-exit control fields: Fields that control VM exits.
• VM-entry control fields: Fields that control VM
entries.
• VM-exit information fields: Read-only fields to
receive information on VM exits describing the cause
66

and the nature of the VM exit.

CPU Virtualization with VT-x

67
Source: [2]
MMU Virtualization with VT-x

68
VPID: Motivation
• First generation VT-x forces TLB flush on each
VMX transition.
• Performance loss on all VM exits.
• Performance loss on most VM entries
– Guest page tables not modified always
• Better VMM software control of TLB
flushes is beneficial.
A translation lookaside buffer is part of the chip's memory-management unit:
TLB contains page table entries that have been most recently used.

69
VPID: Virtual Processor
Identifier
• 16-bit virtual-processor-ID field in the VMCS.
• Cached linear translations tagged with VPID
value.
• No flush of TLBs on VM entry or VM exit if VPID
active.
• TLB entries of different virtual machines can all
co-exist in the TLB.

70
Virtualizing Memory in
Software
0
• Three abstractions of memory: 4GB
Virtual
Current Guest Process Guest OS
Address Spaces

0 4GB
Virtual
Virtual Virtual Physical
Frame
Virtual RAM
Devices ROM
Buffer
Address Spaces

0 4GB
Frame Machine
RAM Devices ROM
Buffer
Address Space

71
Shadow Page Tables
• VMM maintains shadow page tables that
map guest-virtual pages directly to
machine pages.
• Guest modifications to V->P tables
synced to VMM V->M shadow page
tables.
– Guest OS page tables marked as read-only.
– Modifications of page tables by guest OS ->
trapped to VMM.
– Shadow page tables synced to the guest OS
tables 72
Drawbacks: Shadow Page
Tables
• Under shadow paging, in order to provide
transparent MMU virtualization, the VMM
intercepts guest page table updates to keep the
shadow page tables coherent with the guest page
tables.
• Maintaining consistency between guest page
tables and shadow page tables leads to
overhead: VMM traps
• Loss of performance due to TLB flush on every
“world-switch”.
• Memory overhead due to shadow copying of guest
page tables.
73
Guest Guest Guest
Virtual CR3
Page Table Page Table Page Table

Real CR3

Shadow Shadow Shadow

Page Table Page Table Page Table

Set CPU Rings CR3 by guest OS (1)

Guest Guest Guest
Virtual CR3
Page Table Page Table Page Table

Real CR3

Shadow Shadow Shadow

Page Table Page Table Page Table

Set CPU Rings CR3 by guest OS (2)

75
Nested / Extended Page
Tables
• With the introduction of EPT, the VMM can now rely on
hardware to eliminate the need for shadow page
tables. This removes much of the overhead incurred
to keep the shadow page tables up-to-date.
• Extended page-table mechanism (EPT) is used to
support the virtualization of physical memory.
• Translates the guest-physical addresses used in VMX
non-root operation.
• Guest-physical addresses are translated by traversing
a set of EPT paging structures to produce physical
addresses that are used to access memory.
• REF: https://www.vmware.com/pdf/Perf_ESX_Intel-EPT-
eval.pdf
76
Use of EPT
• guest operating system continues to maintain LPN-
>PPN mappings in the guest page tables, but
• VMM maintains PPN->MPN mappings in an
additional level of page tables, called nested page
tables.
• Both guest page tables and the nested page tables
are exposed to hardware.
• When logical address accessed, hardware walks
guest page tables like native execution
– for every PPN accessed during the guest page
table walk, hardware also walks nested page
tables to determine corresponding MPN.
• composite translation eliminates need to
77
maintain shadow page tables and synchronize them
Nested / Extended Page Tables

78
Nested / Extended Page Tables

79
Source: [4]
Advantages: EPT
• Simplified VMM design.
• Guest page table modifications need not be
trapped, hence VM exits reduced.
• Reduced memory footprint compared to
shadow page table algorithms.

80
Disadvantages: EPT
• TLB miss is very costly since guest-physical
address to machine address needs an extra
EPT walk for each stage of guest-virtual
address translation.

81
Virtual Appliances & Multi-Core
• Virtual appliance: pre-configured VM with OS/ apps
pre-installed
– Just download and run (no need to install/configure)
– Software distribution using appliances
(see: “SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing”, H.
Andrés Lagar-Cavilla et. al., Eurosys 2009)
• Multi-core CPUs
– Run multiple VMs on multi-core systems
– Each VM assigned one or more vCPU
– Mapping from vCPUs to physical CPUs

• Today: Virtual appliances have evolved into Docker containers

82
Examples

• Application-level virtualization: “process virtual

machine”
• VMM /hypervisor

83
OS Virtualization

• Emulate OS-level interface with native interface

• “Lightweight” virtual machines
– No hypervisor, OS provides necessary support

• Referred to as containers
– Solaris containers, BSD jails, Linux containers
84
Linux Containers (LXC)

• Containers share OS kernel of the host

– OS provides resource isolation
• Benefits
– Fast provisioning, bare-metal like performance, lightweight

Material courtesy of
“Realizing Linux
Containers” by Boden
Russell, IBM

85
OS Mechanisms for LXC
• OS mechanisms for resource isolation and
management
• namespaces: process-based resource isolation
• Cgroups: limits, prioritization, accounting, control
• chroot: change the apparent root directory for the
current running process and its children.
– Program run in such modified environment cannot name
files outside designated directory tree.
• Linux security module, access control
• Tools (e.g., docker) for easy management

86
Linux Namespaces
• Namespace: restrict what a container can see
– Provide process level isolation of global resources
• Processes have illusion they are the only processes in
the system
• MNT: mount points, file systems (what files, dir are
visible)?
• PID: what other processes are visible?
• NET: NICs, routing
• Users: what uid, gid are visible?

• chroot: change root directory

87
Linux cgroups
• Resource isolation
– what and how much can a container use?
• Set upper bounds (limits) on resources that can be used
• Fair sharing of certain resources
• Examples:
– cpu: weighted proportional share of CPU for a group
– cpuset: cores that a group can access
– block io: weighted proportional block IO access
– memory: max memory limit for a group

88
Proportional Share Scheduling
• Resource allocation
– Uses a variant of proportional-share scheduling
• Share-based scheduling:
– Assign each process a weight w_i (a “share”)
– E.g., CPU Allocation is in proportion to the ‘share'
– fairness: reused unused cycles to others in proportion to weight
– Examples: fair queuing, start time fair queuing
• Hard limits: assign upper bounds (e.g., 30%), no
reallocation
• Credit-based: allocate credits every time T, can
accumulate credits, and can burst up-to credit limit
– can a process starve other processes?
89
Share-based Schedulers

90
Putting it all together
• Images: files/data for a container
– can run different distributions/apps on a host
• Linux security modules and access control
• Linux capabilities: per process privileges

91
Docker and Linux Containers

• Linux containers are a set of kernel features

– Need user space tools to manage containers
– Virtuoze, OpenVZm, VServer, Lxc-tools, Wardenm Docker
• What does Docker add to Linux containers?
– Portable container deployment across machines
– Application-centric: geared for app deployment
– Automatic builds: create containers from build files
– Component re-use
• Docker containers are self-contained: no dependencies

92
Docker

• Docker uses Linux containers

93
LXC Virtualization Using Docker
• Portable: docker images run anywhere Docker runs
• Docker decouples LXC provider from operations
– uses virtual resources (LXC virtualization)
– fair share of physical NIC vs use virtual NICs that are fair-
shared

94
Docker Images and Use

• Docker uses a union file system (AuFS)

– allows containers to use host FS safely
• Essentially a copy-on-write file system
– read-only files shared (e.g., share glibc)
– make a copy upon write
• Allows for small efficient container images
• Docker Use Cases
– “Run once, deploy anywhere”
– Images can be pulled/pushed to repository
– Containers can be a single process (useful for
microservices) or a full OS
95

Isaacv1 7 9b J839
No ratings yet
Isaacv1 7 9b J839
3 pages
Unit1 Virtualization
No ratings yet
Unit1 Virtualization
35 pages
CloudComputing Lect2
No ratings yet
CloudComputing Lect2
63 pages
Virtualization PART 2
No ratings yet
Virtualization PART 2
128 pages
OS Virtualization: CSC 456 Final Presentation Brandon D. Shroyer
No ratings yet
OS Virtualization: CSC 456 Final Presentation Brandon D. Shroyer
36 pages
Virtualization Structures
No ratings yet
Virtualization Structures
73 pages
Virtulization
No ratings yet
Virtulization
29 pages
Lecture 3 - Visualization
No ratings yet
Lecture 3 - Visualization
28 pages
Unit1 Virtualization
No ratings yet
Unit1 Virtualization
38 pages
Unit 1 Continued 1
No ratings yet
Unit 1 Continued 1
29 pages
CS6456: Graduate Operating Systems: Bradjc@virginia - Edu
No ratings yet
CS6456: Graduate Operating Systems: Bradjc@virginia - Edu
45 pages
A1386673771 17671 22 2020 Lecture1-21 PDF
No ratings yet
A1386673771 17671 22 2020 Lecture1-21 PDF
201 pages
Lecture 3 - Visualization
No ratings yet
Lecture 3 - Visualization
31 pages
Module 2 Virtualization
No ratings yet
Module 2 Virtualization
65 pages
11 Biondi1-Hypervisors
No ratings yet
11 Biondi1-Hypervisors
9 pages
U2 Session 6
No ratings yet
U2 Session 6
28 pages
Virtualization and Cloud Computing: Vera Asodi Vmware
100% (1)
Virtualization and Cloud Computing: Vera Asodi Vmware
39 pages
Unit1 Virtualization
No ratings yet
Unit1 Virtualization
35 pages
Virtualization Technologies: IBM Haifa Research Lab
No ratings yet
Virtualization Technologies: IBM Haifa Research Lab
28 pages
Virtualization Technologies: IBM Haifa Research Lab
No ratings yet
Virtualization Technologies: IBM Haifa Research Lab
28 pages
Virtualization-1&2&week1-2-3-4-6-10 (9 Files Merged)
No ratings yet
Virtualization-1&2&week1-2-3-4-6-10 (9 Files Merged)
360 pages
Intro PDF
No ratings yet
Intro PDF
93 pages
Unit 2 Lec 4 Cloud Computing
No ratings yet
Unit 2 Lec 4 Cloud Computing
36 pages
Module 3 Virtualization
No ratings yet
Module 3 Virtualization
56 pages
FALLSEM2024-25 ITA2012 ETH VL2024250103465 2024-07-19 Reference-Material-II
No ratings yet
FALLSEM2024-25 ITA2012 ETH VL2024250103465 2024-07-19 Reference-Material-II
29 pages
Virtualization
No ratings yet
Virtualization
28 pages
Virtual Ization
No ratings yet
Virtual Ization
6 pages
Virtualization
No ratings yet
Virtualization
16 pages
Unit 2-Virtualization Basics - 2.2
No ratings yet
Unit 2-Virtualization Basics - 2.2
10 pages
Unit II Notes - Virtualization
No ratings yet
Unit II Notes - Virtualization
49 pages
Introduction To Virtualization
No ratings yet
Introduction To Virtualization
29 pages
INTRO
No ratings yet
INTRO
8 pages
3 Virtualization
No ratings yet
3 Virtualization
65 pages
AL3452 OS Unit-5
No ratings yet
AL3452 OS Unit-5
25 pages
OS Unit 5
No ratings yet
OS Unit 5
29 pages
Introduction of Virtualization Technology
No ratings yet
Introduction of Virtualization Technology
18 pages
Virtualization Basics
No ratings yet
Virtualization Basics
8 pages
Unit1 and Unit 2
No ratings yet
Unit1 and Unit 2
201 pages
Types of Virtual Machines and Implementations
No ratings yet
Types of Virtual Machines and Implementations
14 pages
Unit 1-3
No ratings yet
Unit 1-3
201 pages
Hypervisor Updated
No ratings yet
Hypervisor Updated
25 pages
Lecture 2 Virtualization Techniques
No ratings yet
Lecture 2 Virtualization Techniques
28 pages
Lec 7
No ratings yet
Lec 7
19 pages
Virtualization Structure and Tools
No ratings yet
Virtualization Structure and Tools
16 pages
Cloud Lect03
No ratings yet
Cloud Lect03
32 pages
Virtualization
No ratings yet
Virtualization
17 pages
Virtual Ization
No ratings yet
Virtual Ization
14 pages
LINUX Virtualization
No ratings yet
LINUX Virtualization
8 pages
Types of Virtualization
No ratings yet
Types of Virtualization
3 pages
CC Unit 2 Notes
No ratings yet
CC Unit 2 Notes
26 pages
Unit - 2 - Virtualization
No ratings yet
Unit - 2 - Virtualization
57 pages
Virtualization Techniques
No ratings yet
Virtualization Techniques
22 pages
CPU Virtualization
No ratings yet
CPU Virtualization
37 pages
CC Unit 3
No ratings yet
CC Unit 3
64 pages
Assignment No:2: Virtualization Basics, Benefits of Virtualization in Cloud Using Open Source Operating System
No ratings yet
Assignment No:2: Virtualization Basics, Benefits of Virtualization in Cloud Using Open Source Operating System
14 pages
Lec16 Virtualization
No ratings yet
Lec16 Virtualization
44 pages
Understanding Full Virtualiz Ation, Paravirtualization, A ND Hardware Assist
No ratings yet
Understanding Full Virtualiz Ation, Paravirtualization, A ND Hardware Assist
16 pages
Virtualization Technique - Digital Studio - 03-02-22
No ratings yet
Virtualization Technique - Digital Studio - 03-02-22
21 pages
Virtualization
No ratings yet
Virtualization
33 pages
Mastering KVM Virtualization
From Everand
Mastering KVM Virtualization
Humble Devassy Chirammal
5/5 (1)
Mastering Proxmox
From Everand
Mastering Proxmox
Wasim Ahmed
5/5 (1)
Online Placement Management
No ratings yet
Online Placement Management
22 pages
Principles of GIS
No ratings yet
Principles of GIS
51 pages
List of EDA Tools
No ratings yet
List of EDA Tools
4 pages
Simulink Starting Guide
No ratings yet
Simulink Starting Guide
89 pages
Unity in Action Multiplatform Game Development in C With Unity 5 1st Edition Joseph Hocking Instant Download
No ratings yet
Unity in Action Multiplatform Game Development in C With Unity 5 1st Edition Joseph Hocking Instant Download
52 pages
COA Assgnment Based On William Stallings Computer Organization and Architecture
No ratings yet
COA Assgnment Based On William Stallings Computer Organization and Architecture
2 pages
ORBITER User Manual
100% (1)
ORBITER User Manual
71 pages
Lastexception 63855515782
No ratings yet
Lastexception 63855515782
4 pages
Apple Iphone 12
No ratings yet
Apple Iphone 12
6 pages
Grade 4 Second Quarter Lecture 2 2020
No ratings yet
Grade 4 Second Quarter Lecture 2 2020
7 pages
Notes 1
No ratings yet
Notes 1
12 pages
Treci Dan - Tipovi I Metode Testiranja
No ratings yet
Treci Dan - Tipovi I Metode Testiranja
27 pages
Pathways RW SB Sample Unit Level 3 Unit 6
No ratings yet
Pathways RW SB Sample Unit Level 3 Unit 6
13 pages
D - Copia - 3524 3524mfplus 4023 4024 4024mfplus - Eng
No ratings yet
D - Copia - 3524 3524mfplus 4023 4024 4024mfplus - Eng
4 pages
UID Module 1
No ratings yet
UID Module 1
15 pages
Digital Image Forgery and Techniques of Image Forgery Detection 1
No ratings yet
Digital Image Forgery and Techniques of Image Forgery Detection 1
7 pages
CH 2
No ratings yet
CH 2
48 pages
Add The Up Button To The Explorer Command Bar in Windows 7: by Greg Shultz
No ratings yet
Add The Up Button To The Explorer Command Bar in Windows 7: by Greg Shultz
11 pages
IB Computer Science (Year 1) 2024 Sem 2 EOY Markscheme
No ratings yet
IB Computer Science (Year 1) 2024 Sem 2 EOY Markscheme
9 pages
UD21938B Baseline User Manual of DS 6900UDIB Series Decoder V2.6.1 20201109
No ratings yet
UD21938B Baseline User Manual of DS 6900UDIB Series Decoder V2.6.1 20201109
68 pages
Image Captioning Final
No ratings yet
Image Captioning Final
31 pages
CCMS Spec
No ratings yet
CCMS Spec
9 pages
Vector Spatial Data Types
No ratings yet
Vector Spatial Data Types
5 pages
Chapter 1
No ratings yet
Chapter 1
6 pages
Aspire AX1430 Desktop Computer Service Guide: Printed in Taiwan
No ratings yet
Aspire AX1430 Desktop Computer Service Guide: Printed in Taiwan
126 pages
Felicia Lorraine Walker CV 1
No ratings yet
Felicia Lorraine Walker CV 1
3 pages
iOS 18 Jailbreak (Latest Methods, and Safety Tips) - Pangu8
No ratings yet
iOS 18 Jailbreak (Latest Methods, and Safety Tips) - Pangu8
1 page
The Right Tools For Professionals: Nvidia Workstation Gpus
No ratings yet
The Right Tools For Professionals: Nvidia Workstation Gpus
4 pages
Đề bài tập 1-đã gộp-đã gộp
No ratings yet
Đề bài tập 1-đã gộp-đã gộp
17 pages

Class 2 3 5

Uploaded by

Class 2 3 5

Uploaded by

Cloud Computing and

• Virtualization: extend or replace an existing interface to

• Different types of interfaces

Virtual machines can be built in multiple ways

• Type 1: Native (bare metal) - – example of Full/Native Virtualization

• Type 1: hypervisor runs on “bare metal”

• Full/native Virtualization (Type 1)

• Hybrid hypervisors are popular: e.g., Xen

– 0 for kernel – can do anything

write to several registers Can run privileged instructions

• Unmodified OS is running in user mode

• What did we do before hardware assist was available?

• Type 2 hypervisors can work without virtualization

• Both type 1 and 2 hypervisors work on unmodified OS

• Standardize the VM interface (VMIL) so kernel can run

• Each guest OS thinks it “owns” the disk

• However, hypervisor is the real privileged

• When OS executes privileged instruction, trap

• This can be very expensive

• Also, subject to quirks of the architecture

• Interrupts arrive for that machine

• Other interrupts are directed to VMs that

address Host OS’s

Virtual address  physical View of RAM

NIC manages incoming and

Each slot in the ring contains the

NIC registers (CPU accessible)

VMX Non-Root Operation: Ring 3 Ring 3 Ring 3 App

Guest OS has access to priv. Ring 0 Ring 0 Ring 0 OS

VMCS 1 VMCS 2 VMCS n

VMCS: VM Control Structure 64

• On VM entry, execution mode (guest state), dedicated

and the nature of the VM exit.

Shadow Shadow Shadow

Set CPU Rings CR3 by guest OS (1)

Shadow Shadow Shadow

Set CPU Rings CR3 by guest OS (2)

• Today: Virtual appliances have evolved into Docker containers

• Application-level virtualization: “process virtual

• Emulate OS-level interface with native interface

• Containers share OS kernel of the host

• chroot: change root directory

• Linux containers are a set of kernel features

• Docker uses Linux containers

• Docker uses a union file system (AuFS)

You might also like