Cs498 Week 13 Slide
Virtualization
Prof. Reza Farivar
Sharing Resources
[Figure: User, Self-Service Portal, Application; Cloud Automation; Network Virtualization (Network Controller and Scheduler); Storage Virtualization (Storage Controller and Scheduler); Compute Virtualization (CPU, Hypervisors)]
Types of Virtualization
• Emulation
• Full
• Software
• Binary Translation
• Paravirtualization
• Hardware assisted
• MicroVMs
• OS level
• Containers
Cloud Computing
Virtualization: Background
Brief History Lesson
• Single-program computers
• VERY early mainframes (1950s)
• MS-DOS
• A single user program gets access to everything the hardware has
• The OS is really a thin wrapper around the BIOS
• No real notion of a process
• Multi-user / multitasking
• Need to isolate programs
• Need to isolate users
• Notion of a process
• “An executing program and its context”
• Normal applications should not use any of these instructions
• Some instructions executed at user level will not cause a trap, and will execute straight on
Virtualization: Paravirtualization
Software-only Virtualization
• Problem: x86 processors were not virtualizable, in the Popek and Goldberg sense, until the mid-2000s
• Software-only virtualization is a technique to work around the limits of the classic trap-and-emulate design of Popek and Goldberg
• Does not need special hardware support, e.g., the Intel "VT-x" or "AMD-V" features
• Examples:
• VMware Fusion, ESX
• Parallels Desktop for Mac
• Parallels Workstation
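To make the trap-and-emulate problem concrete, here is a toy Python model; it is not how VMware or Parallels are implemented. The instruction names and the PRIVILEGED/SENSITIVE sets are illustrative assumptions loosely based on x86, where POPF executed at user level silently ignores changes to the interrupt flag instead of trapping. Running the guest kernel deprivileged loses that effect, while a binary translator that rewrites every sensitive instruction into a VMM call keeps the virtual CPU state correct.

```python
from dataclasses import dataclass, field
from typing import List

# Toy ISA: names loosely follow x86, but these sets are illustrative only.
PRIVILEGED = {"CLI", "STI", "HLT"}         # trap when executed outside ring 0
SENSITIVE = {"CLI", "STI", "HLT", "POPF"}  # read or change state the VMM must own


def popek_goldberg_ok(sensitive, privileged):
    """Classic trap-and-emulate works only if every sensitive instruction is privileged."""
    return set(sensitive) <= set(privileged)


@dataclass
class VCPU:
    interrupts_enabled: bool = True
    native_log: List[str] = field(default_factory=list)


def emulate(vcpu, instr):
    """VMM emulation of a sensitive instruction against the virtual CPU state."""
    if instr == "CLI":
        vcpu.interrupts_enabled = False
    elif instr in ("STI", "POPF"):  # toy POPF: assume the popped flags have IF=1
        vcpu.interrupts_enabled = True
    # HLT: a real VMM would deschedule the VCPU; ignored in this toy


def run_trap_and_emulate(vcpu, code):
    """Guest kernel deprivileged to user level: only privileged instructions trap."""
    for instr in code:
        if instr in PRIVILEGED:
            emulate(vcpu, instr)           # trap into the VMM
        elif instr in SENSITIVE:
            pass                           # no trap: the effect is silently lost
        else:
            vcpu.native_log.append(instr)  # innocuous: runs directly
    return vcpu


def translate(code):
    """Binary translation: rewrite sensitive instructions into explicit VMM calls."""
    return [("vmm", i) if i in SENSITIVE else ("native", i) for i in code]


def run_translated(vcpu, code):
    for kind, instr in translate(code):
        if kind == "vmm":
            emulate(vcpu, instr)
        else:
            vcpu.native_log.append(instr)
    return vcpu
```

With the guest sequence ["MOV", "CLI", "ADD", "POPF"], trap-and-emulate leaves the virtual interrupt flag cleared, whereas the translated run re-enables it as the guest kernel intended.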
First Generation Hardware Virtualization
• First introduced on x86 in the mid-2000s
• Intel VT-x, AMD-V
• Virtual machine control block (VMCB)
• In-memory data structure
• The VMCB combines control state with a subset of the guest VCPU state
• A new, less privileged execution mode, guest mode, supports direct execution of guest code, including privileged code
• Specialized OS
• Boot time in less than 5 ms
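The guest-mode and VM-exit cycle of first-generation hardware virtualization can be sketched as a toy model. It assumes a simplified VMCB whose field names (intercepts, guest_rip, exit_reason) are hypothetical and bear no relation to the real AMD layout. Guest code runs directly, privileged instructions included, until it reaches an event that the control state says to intercept; the hypervisor then handles the exit and resumes the guest.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class VMCB:
    """Toy VMCB: control state (what to intercept) plus a slice of guest VCPU state.
    Field names are hypothetical; the real structure is far larger."""
    intercepts: Set[str]               # control state
    guest_rip: int = 0                 # guest state: index of the next instruction
    exit_reason: Optional[str] = None


def vmrun(vmcb: VMCB, guest_code: List[str]) -> None:
    """Guest mode: run guest code directly, including privileged code, until an intercept."""
    while vmcb.guest_rip < len(guest_code):
        instr = guest_code[vmcb.guest_rip]
        if instr in vmcb.intercepts:
            vmcb.exit_reason = instr   # VM exit: guest state stays recorded in the VMCB
            return
        vmcb.guest_rip += 1            # executes natively, no hypervisor involvement
    vmcb.exit_reason = "done"


def hypervisor(vmcb: VMCB, guest_code: List[str]) -> List[str]:
    """Host mode: enter the guest, handle each exit, and resume."""
    handled = []
    while True:
        vmrun(vmcb, guest_code)
        if vmcb.exit_reason == "done":
            return handled
        handled.append(vmcb.exit_reason)  # emulate the intercepted instruction here
        vmcb.guest_rip += 1               # skip past it and re-enter guest mode
```

Here a privileged instruction such as CLI runs without any exit unless it is listed in intercepts; only the intercepted IN and HLT reach the hypervisor.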
Containers
Isolation
• “I once heard that hypervisors are the living proof of operating system's incompetence”
• Glauber Costa, 2012
• Hypervisors have indeed provided a remedy for certain deficiencies in operating system design
• For some cases, containers may be an even better remedy for those deficiencies
• Not chroot!
• chroot() does NOT provide secure isolation
Containers: cgroups
Cgroups
• Control Groups
• Linux kernel feature that limits, isolates, and measures the resource usage of a group of processes
• Since Linux kernel 2.6.24
• Resource quotas for memory, CPU, network, and I/O
• Create a control group and assign resource limits to it:
• e.g., a 3 GB memory limit and 70% of CPU
• Add a process ID to the group
• The process's resource usage will be throttled
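A minimal sketch of the three steps above (create a group, set limits, add a PID) through the cgroup v2 file interface, assuming the unified hierarchy is mounted at /sys/fs/cgroup and the code runs as root. memory.max takes a byte count and cpu.max takes "<quota> <period>" in microseconds, so 70% of one CPU is 70000 µs out of every 100000 µs. The function and group names are hypothetical, and the root parameter exists only so the logic can be tried against an ordinary directory.

```python
from pathlib import Path

# Assumption: cgroup v2 (unified hierarchy) mounted here; writing requires root.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cpu_max(percent: float, period_us: int = 100_000) -> str:
    """Value for cpu.max: '<quota> <period>', both in microseconds."""
    return f"{int(period_us * percent / 100)} {period_us}"


def create_limited_cgroup(name, mem_bytes, cpu_percent, pid, root=CGROUP_ROOT):
    root = Path(root)
    # Make the memory and cpu controllers available to child groups
    (root / "cgroup.subtree_control").write_text("+memory +cpu\n")
    cg = root / name
    cg.mkdir(exist_ok=True)                          # creating the directory creates the group
    (cg / "memory.max").write_text(f"{mem_bytes}\n")
    (cg / "cpu.max").write_text(cpu_max(cpu_percent) + "\n")
    (cg / "cgroup.procs").write_text(f"{pid}\n")     # move the process into the group
    return cg
```

For example, create_limited_cgroup("mytest", 3 * 1024**3, 70, os.getpid()) would cap the calling process at 3 GiB of memory and 70% of one CPU.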
• freezer: Can suspend and restore (resume) all processes in a cgroup. Freezing a cgroup /A also causes its children, for example, processes in /A/B, to be frozen.
• io: Controls and limits access to specified block devices by applying I/O control in the form of throttling and upper limits against leaf nodes and intermediate nodes in the storage hierarchy. Two policies are available. The first is a proportional-weight, time-based division of disk implemented with CFQ; it is in effect for leaf nodes using CFQ. The second is a throttling policy that specifies upper I/O rate limits on a device.
• memory: Supports reporting and limiting of process memory, kernel memory, and swap used by cgroups.
• perf_event: Allows perf monitoring of the set of processes grouped in a cgroup.
• pids: Permits limiting the number of processes that may be created in a cgroup (and its descendants).
• rdma: Permits limiting the use of RDMA/IB-specific resources per cgroup.
Detailed documentation at: https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst
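The reporting side of these controllers can be sketched by comparing a usage file with its limit. This assumes cgroup v2 file names (memory.current/memory.max and pids.current/pids.max), where a .max file holds the literal string max when no limit is set; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_max(path) -> Optional[int]:
    """cgroup v2 *.max files hold a number, or the literal 'max' for no limit."""
    value = Path(path).read_text().strip()
    return None if value == "max" else int(value)


def usage_ratio(cgroup_dir, resource: str) -> Optional[float]:
    """Fraction of the limit in use for a controller exposing <resource>.current
    and <resource>.max (e.g. 'memory' or 'pids'); None when unlimited."""
    cg = Path(cgroup_dir)
    current = int((cg / f"{resource}.current").read_text())
    limit = read_max(cg / f"{resource}.max")
    return None if limit is None else current / limit
```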
Cgroup example
• Controllers mounted in the cgroups file system
• /cgroup directory
• /sys/fs/cgroup/memory
• /sys/fs/cgroup/cpu
• Making a control group
• mkdir /sys/fs/cgroup/memory/mytestcgroup
• Setting limits
• echo 2097152 > /sys/fs/cgroup/memory/mytestcgroup/memory.limit_in_bytes
• echo 2097152 > /sys/fs/cgroup/memory/mytestcgroup/memory.memsw.limit_in_bytes
• Sets the memory limit to 2 MiB (2097152 bytes) and the combined memory-plus-swap limit to the same 2 MiB, so the group gets no swap
• Running a process
• cgexec -g memory:mytestcgroup ./<binary_name>
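cgexec places the new process in the group before the program starts. A rough Python analogue, assuming the control group directory already exists and the caller may write to it: the child writes its own PID to the group's cgroup.procs file between fork() and exec(), so everything the program allocates is charged to the group. run_in_cgroup is a hypothetical helper, not part of libcgroup, and preexec_fn is not safe to use from multithreaded programs.

```python
import os
import subprocess
from pathlib import Path


def run_in_cgroup(cgroup_dir, argv):
    """Start argv inside an existing cgroup, roughly what cgexec does (sketch)."""
    procs = Path(cgroup_dir) / "cgroup.procs"

    def join_cgroup():
        # Runs in the child after fork() and before exec(), so the target
        # program starts already inside the cgroup.
        with open(procs, "w") as f:
            f.write(str(os.getpid()))

    proc = subprocess.Popen(argv, preexec_fn=join_cgroup)
    proc.wait()
    return proc
```

For example, run_in_cgroup("/sys/fs/cgroup/memory/mytestcgroup", ["./my_binary"]) would mirror the cgexec command above, with my_binary as a placeholder name.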
Containers: Namespaces
Namespaces
• A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource
• Linux processes form a single hierarchy, with all processes rooted at init
• Usually, privileged processes in this tree can trace or kill other processes
• Linux namespaces enable us to have many hierarchies of processes with their own “subtrees”, such that processes in one subtree can NOT access or even know of those in another
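Namespace membership is visible from user space. Each entry under /proc/<pid>/ns is a symbolic link whose target looks like pid:[4026531836], and two processes share a namespace exactly when those targets match. A small Linux-only Python sketch; the helper names are hypothetical.

```python
import os
import re
from typing import Dict, Tuple

_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """'pid:[4026531836]' -> ('pid', 4026531836): namespace type and inode number."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespaces_of(pid="self") -> Dict[str, int]:
    """Map each entry in /proc/<pid>/ns (e.g. 'net', 'uts') to its inode number."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))[1]
        for name in sorted(os.listdir(ns_dir))
    }


def same_namespace(pid_a, pid_b, kind: str) -> bool:
    """True when both processes are in the same namespace of the given kind."""
    return namespaces_of(pid_a)[kind] == namespaces_of(pid_b)[kind]
```

Comparing a containerized process with a host process this way shows which namespaces they do and do not share; reading another user's /proc/<pid>/ns links typically requires privileges.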
• IPC: Isolates System V IPC and POSIX message queues. The common characteristic of these IPC mechanisms is that IPC objects are identified by mechanisms other than filesystem pathnames.
• Network*: Network devices, stacks, ports, etc. Each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, etc.
• Mount*: Mount points. Processes in different mount namespaces can have different views of the filesystem hierarchy.
• PID*: Isolates process IDs. In other words, processes in different PID namespaces can have the same PID.
• Time: Boot and monotonic clocks.
• User*: User and group IDs. In other words, a process's user and group IDs can be different inside and outside a user namespace.
• UTS: Hostname and NIS domain name. Allows each container to have its own hostname and NIS domain name. Affects the nodename and domainname returned by the uname() system call.
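The util-linux unshare command creates most of the namespaces in this list from the shell. The hypothetical helper below only builds its argument list; the flag mapping assumes a util-linux version recent enough to support --time. Running the result usually requires root unless a user namespace is included and the system permits unprivileged user namespaces.

```python
from typing import Dict, List, Sequence

# Namespace (as named in the list above) -> util-linux `unshare` long options
UNSHARE_FLAGS: Dict[str, List[str]] = {
    "ipc": ["--ipc"],
    "network": ["--net"],
    "mount": ["--mount"],
    # --fork: the forked child, not unshare itself, becomes PID 1 of the new namespace
    # --mount-proc: remount /proc so tools like ps show only the new namespace
    "pid": ["--pid", "--fork", "--mount-proc"],
    "time": ["--time"],
    "user": ["--user", "--map-root-user"],
    "uts": ["--uts"],
}


def unshare_argv(kinds: Sequence[str], command: Sequence[str]) -> List[str]:
    """Build an `unshare` command line that runs `command` in new namespaces."""
    argv = ["unshare"]
    for kind in kinds:
        for flag in UNSHARE_FLAGS[kind]:
            if flag not in argv:  # avoid duplicate flags
                argv.append(flag)
    return argv + list(command)
```

Run as root, the command built from unshare_argv(["uts", "pid"], ["sh"]) starts a shell that ps inside it lists as PID 1, and whose hostname changes do not affect the host.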
PID Namespace example
• Without namespaces, all processes descend hierarchically from PID 1 (init)
• If we create a PID namespace and run a process in it, that first process becomes PID 1 in that namespace
• The process that creates the namespace remains in the parent namespace, but makes its child the root of the new process tree
• Processes within the new namespace cannot see the processes of the parent namespace, but processes in the parent namespace can see those in the child namespace
• Processes within the new namespace have two PIDs: one in the new namespace and one in the global namespace
• PID namespaces also allow each container to have its own init (PID 1), the "ancestor of all processes", which manages various system initialization tasks and reaps orphaned child processes when they terminate
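The PID namespace rules above can be checked against a toy model in which each namespace hands out its own numbers starting at 1, and every process receives one PID in its own namespace and one in each ancestor. This is an illustrative Python model, not kernel code; on a real Linux system the per-level PIDs appear in the NSpid: line of /proc/<pid>/status.

```python
import itertools
from typing import Dict, Iterator, Optional


class PidNamespace:
    """Toy PID namespace: hands out its own PIDs, starting at 1."""

    def __init__(self, parent: Optional["PidNamespace"] = None):
        self.parent = parent
        self._next_pid = itertools.count(1)

    def lineage(self) -> Iterator["PidNamespace"]:
        """This namespace, then each ancestor up to the root."""
        ns = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Process:
    def __init__(self, name: str, ns: PidNamespace):
        self.name = name
        self.ns = ns
        # One PID per level: in its own namespace and in every ancestor namespace
        self.pids: Dict[PidNamespace, int] = {
            level: next(level._next_pid) for level in ns.lineage()
        }

    def can_see(self, other: "Process") -> bool:
        # Visible iff the other process has a PID in this process's namespace
        return self.ns in other.pids
```

In this model a process started in a new child namespace gets PID 1 there and the next free PID in the parent, the parent-side shell can see it, and it cannot see the shell.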
Docker CLI
Containers: Networking
Container Network Model
• libnetwork implements the Container Network Model (CNM)
• Formalizes the steps required to provide networking for containers, while providing an abstraction that can be used to support multiple network drivers
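The driver abstraction can be illustrated with a toy Python model: a controller owns the networks and delegates the actual plumbing to whichever driver a network was created with. Class and method names are hypothetical and do not mirror libnetwork's Go API; the bridge and null drivers only echo the names of Docker's bridge and none (null) drivers.

```python
from abc import ABC, abstractmethod


class NetworkDriver(ABC):
    """Hypothetical driver interface: the core decides what to create, the driver decides how."""

    @abstractmethod
    def create_network(self, network: str) -> str: ...

    @abstractmethod
    def create_endpoint(self, network: str, endpoint: str) -> str: ...


class BridgeDriver(NetworkDriver):
    def create_network(self, network):
        return f"bridge for {network}"

    def create_endpoint(self, network, endpoint):
        return f"veth {endpoint} on {network}"


class NullDriver(NetworkDriver):
    def create_network(self, network):
        return f"no-op for {network}"

    def create_endpoint(self, network, endpoint):
        return "loopback only"


class NetworkController:
    """Toy stand-in for the CNM core: tracks networks and delegates to drivers."""

    def __init__(self):
        self.drivers = {}
        self.networks = {}

    def register_driver(self, name, driver):
        self.drivers[name] = driver

    def create_network(self, network, driver_name):
        driver = self.drivers[driver_name]
        self.networks[network] = driver
        return driver.create_network(network)

    def connect(self, network, container):
        # Each container joining a network gets an endpoint from that network's driver
        return self.networks[network].create_endpoint(network, f"{container}-eth0")
```

Adding a new kind of network then means registering one more driver class, without changing the controller.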
production use
Host : Container