Computer Organization

The document discusses coprocessors and network processors. Coprocessors are used to supplement the primary processor by offloading processor-intensive tasks like floating point arithmetic, graphics processing, and I/O interfacing. Network processors are specialized hardware that can process high volumes of network packets at wire speed through multithreading and pipelining.

Uploaded by

kz33252000

CST-32107

Computer
Organization
Coprocessors &
Processor Virtualization

Group-1
Group members

3CS-1013 Kaung Min Set
3CS-959 Satt Paing Thu
3CS-1222 Kaung Khant Nyein
3CS-968 Minn Khant Ko
3CS-1162 Kay Zin Thant
3CS-1286 May Shun Lae Naing
Coprocessors - What are they? Why do we need them?

● A coprocessor is a processor used to supplement the functions of the primary processor (the CPU).
● A coprocessor's work can be floating-point arithmetic (FPU), graphics processing (GPU), signal processing such as a digital signal processor (DSP), string processing, cryptography, or I/O interfacing.
● By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance.
● For example, instead of the CPU handling time-consuming I/O formatting and processing itself, dedicating relatively simple sub-processors to such tasks yields higher overall system performance.
Two modes of coprocessor operation

● The CPU gives the coprocessor an instruction or set of instructions and tells it to execute them.
● The coprocessor is more independent and runs pretty much on its own.

- Sony's PlayStation 2 Emotion Engine contains a DSP-like SIMD vector-unit coprocessor that can run in both modes of operation.

● Three areas where such speed-ups are possible:
○ network processing
○ multi-media processing
○ cryptography
Network Processors
● As a result of technological progress in network hardware, networks are now so
fast that it has become increasingly difficult to process all the incoming and
outgoing data in software.

● As a result, special network processors have been developed to handle the traffic.

● To discuss how network processors work, we first need a general overview of how networking works. Let's start with a brief intro to networking!
Introduction to networking
● Computer networks are categorized into
○ Local-area Network (LAN) and Wide-area Network (WAN)
● Ethernet is the most popular LAN technology, connecting multiple computers within a building or campus. Its original form used a vampire tap; modern versions attach computers to a central switch.
● Speed of Ethernet
○ Original Ethernet – 3 Mbps
○ First commercial Ethernet (10M) – 10 Mbps
○ Fast Ethernet (100M) – 100 Mbps
○ 1G – 1 Gbps
○ 10G – 10 Gbps
○ 40G, 100G, 400G, etc.
Introduction to networking (cont’d), WAN

● The Internet is a collection of many WANs connected together.
● A WAN consists of specialized computers called routers connected by wires or optical fibers.
● Packets travel through a WAN by store-and-forward packet switching.
Introduction to networking (cont’d), Life of a packet

[Diagram: User Computer → ISP → Internet → Server]
● The user computer breaks the data down into packets.
● Packets travel from the user computer to the ISP over an ADSL or broadband connection, then across the Internet to the server over high-speed (fiber-optic) connections.
Introduction to networking (cont’d), Network Software
● Network software consists of multiple protocols.
● A protocol is a set of formats, exchange sequences, and rules about what the packets mean.
● Say you want to see a cute kitten pic on the Internet. Let's observe what happens when you click on a web page that contains many pics of kittens.
1. Your browser first establishes a connection to the Web server using TCP (Transmission Control Protocol).
2. The browser sends a packet containing a GET PAGE request using HTTP (HyperText Transfer Protocol) to the server.
Introduction to networking (cont’d), Network Software (cont’d)
3. The Web browser formats the GET PAGE request as a correct HTTP message and then hands it to the TCP software to transmit over the connection.
4. The TCP software adds a header in front of the message containing a sequence number and other information. This header is naturally called the TCP header.
5. The TCP software takes the TCP header and payload (containing the GET PAGE request) and passes it to another piece of software that implements the IP (Internet Protocol).
6. This software attaches an IP header to the front. It contains
● the source address (the machine the packet is coming from)
● the destination address (the machine the packet is supposed to go to)
● how many more hops the packet may live (to prevent lost packets from living forever)
● a checksum (to detect transmission and memory errors)
● and other fields.
Introduction to networking (cont’d), Network Software (cont’d)

7. The resulting packet (the IP header, TCP header, and GET PAGE request) is passed down to the data link layer, where a data link header is attached to the front for actual transmission.

8. The data link layer also adds a checksum to the end, called a CRC (Cyclic Redundancy Code), used to detect transmission errors.

[Figure: a packet as it appears on the Ethernet]
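The layering in steps 3–8 can be sketched in Python. The header layouts below (field names, sizes, and the 14-byte Ethernet header stand-in) are simplified placeholders, not the real TCP/IP wire formats; only the trailer uses a real algorithm (CRC-32 via `zlib`):

```python
import struct
import zlib

def build_packet(payload: bytes, seq: int, src: int, dst: int, ttl: int = 64) -> bytes:
    """Wrap a payload in simplified TCP-, IP-, and Ethernet-style layers.
    Header layouts are illustrative only, not the real wire formats."""
    tcp_segment = struct.pack(">I", seq) + payload                 # TCP-like: sequence number
    ip_packet = struct.pack(">IIB", src, dst, ttl) + tcp_segment   # IP-like: src, dst, hop limit
    frame = b"\x00" * 14 + ip_packet                               # Ethernet-like 14-byte header
    crc = struct.pack(">I", zlib.crc32(frame))                     # CRC trailer for error detection
    return frame + crc

packet = build_packet(b"GET /kitten.jpg", seq=1, src=0x0A000001, dst=0xC0A80001)
```

Each layer simply prepends its own header, which is why the receiver can peel them off in the opposite order.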

Introduction to Network Processors
● An incoming packet may need various kinds of processing before being forwarded to either the outgoing line or the application program.
● At 40 Gbps with 1-KB packets, a networked computer might have to process almost 5 million packets/second. With 64-byte packets, that number rises to nearly 80 million.
● Intermediate systems such as routers, switches, firewalls, web proxies, and load balancers are the most demanding.
● Performing the various functions mentioned above in 12–200 nsec is not doable with software alone; hardware assistance is essential.
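The packet-rate figures above follow from simple arithmetic, sketched here:

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second a link delivers at full (wire) speed."""
    return link_bps / (packet_bytes * 8)  # 8 bits per byte

# A 40 Gbps link:
rate_1kb = packets_per_second(40e9, 1024)  # ~4.9 million 1-KB packets/s
rate_64b = packets_per_second(40e9, 64)    # ~78 million 64-byte packets/s
```

At 5 million packets/second, a processor has only about 200 ns per packet; at 78 million, under 13 ns, which matches the 12–200 nsec budget above.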
Introduction to Network Processors (cont’d)
Three kinds of hardware solutions
1. Application-Specific Integrated Circuit (ASIC)
○ a hardwired chip that does whatever set of processing functions it was designed for
○ has many problems: a long time to design and manufacture, rigidity, a bug-management nightmare, and high cost
2. Field-Programmable Gate Array (FPGA)
○ a collection of gates that can be organized into the desired circuit by rewiring them in the field
○ can be rewired in the field by removing the board from the system and inserting it into a special reprogramming device
○ its problems: FPGAs are complex, slow, and expensive
Introduction to Network Processors (cont’d)
3. Network Processors
○ programmable devices that can handle incoming and outgoing packets at wire speed
○ The following picture is a typical network processor board and chip.
Introduction to Network Processors (cont’d)
● Static Random Access Memory (SRAM) holds routing tables and other key data structures.
● Synchronous Dynamic Random Access Memory (SDRAM) holds the actual packets being processed.
● Protocol/Programmable/Packet Processing Engines (PPEs) each contain a Reduced Instruction Set Computer (RISC) core and a small amount of internal memory for holding the program and some variables.
● A control processor is usually just a standard general-purpose RISC CPU for performing all work not related to packet processing, such as updating the routing tables. Its program and data are in the local on-chip memory.
● Specialized processors, which are really small Application-Specific Integrated Circuits (ASICs) good at one simple operation, handle pattern matching and other critical operations.
Introduction to Network Processors (cont’d)
Two ways of organizing PPEs

1. Simplest organization – all the PPEs are identical.
○ When a packet arrives, incoming or outgoing, it is handed to an idle PPE for processing.
○ If no PPE is idle, the packet is queued in the SDRAM until a PPE frees up.
○ The horizontal connections in the picture above are not needed in this organization.
○ PPEs are completely programmable.
Introduction to Network Processors (cont’d)
Two ways of organizing PPEs (cont’d)

2. Pipeline organization
○ Each PPE performs one processing step and then feeds a pointer to its output packet to the next PPE in the pipeline.
○ The PPE pipeline acts very much like a CPU pipeline.
○ PPEs are completely programmable.
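The pipeline organization can be modeled as a chain of stage functions, each standing in for one PPE. The stage names and fields below are hypothetical, chosen only to illustrate how one PPE's output becomes the next PPE's input:

```python
# Each function plays the role of one PPE stage in the pipeline.
def verify_checksum(pkt: dict) -> dict:
    pkt["verified"] = True        # stage 1: checksum verification
    return pkt

def classify(pkt: dict) -> dict:
    pkt["class"] = "data"         # stage 2: packet classification
    return pkt

def route(pkt: dict) -> dict:
    pkt["out_line"] = 3           # stage 3: route lookup picks an outgoing line
    return pkt

PIPELINE = [verify_checksum, classify, route]

def process(pkt: dict) -> dict:
    """Pass the packet through every stage, like pointer-passing between PPEs."""
    for stage in PIPELINE:
        pkt = stage(pkt)
    return pkt

result = process({"dst": "192.168.1.5"})
```

In real hardware the stages run concurrently on different packets, so the pipeline's throughput is set by its slowest stage, just as in a CPU pipeline.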
Introduction to Network Processors (cont’d)
PPE multithreading

● In advanced designs, the PPEs support multithreading.
● This feature allows multiple programs to run at the same time, switching between programs (i.e., threads) just by changing the "current register set" variable.
Packet Processing
● When a packet arrives, it goes through many processing stages, independent of whether the network processor has a parallel or pipeline organization.
● Network processors divide these steps into two phases: ingress processing (incoming packets) and egress processing (outgoing packets).
● Every packet goes through ingress processing first and then egress processing, but some steps (e.g., collecting traffic statistics) can be done in either phase, because the boundary between ingress and egress processing is flexible.
Checksum Verification
● Ethernet check (CRC): the device recomputes the CRC and compares it to the one in the packet to verify there was no transmission error.
● IP checksum: if the Ethernet part has no error, it recalculates the IP checksum and compares it to the one in the packet to make sure there was no damage in transit.
● If both the CRC and IP checksum are correct (or absent), the packet is accepted; otherwise it is discarded.
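The IP checksum is the ones'-complement sum of the header's 16-bit words (RFC 1071); a minimal Python version, verified below against the well-known example IP header:

```python
def ip_checksum(header: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(header) % 2:
        header += b"\x00"                      # pad odd-length input
    total = sum(int.from_bytes(header[i:i + 2], "big")
                for i in range(0, len(header), 2))
    while total >> 16:                         # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                     # ones' complement of the sum
```

Recomputing the checksum over a header that already contains a correct checksum yields 0, which is exactly the verification test the network processor performs.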
Field Extraction
● An Ethernet switch looks at only the Ethernet header to find the key fields.
● An IP router checks the IP header to determine where the packet should go.
● The extracted fields are stored in the PPE (Packet Processing Engine) or in SRAM (Static Random Access Memory).
Packet Classification
● The packet is classified according to programmable rules.
● The simplest classification is to distinguish data packets from control packets.

Path Selection
● Most network processors use a special fast path to handle plain old garden-variety data packets.
● Other packets need to be managed differently and more carefully by the control processor, so they are sent to the slow path.
● Each packet goes through whichever path suits it best.
Destination Network Determination
● IP packets contain a 32-bit destination address.
● It is not practical to have a 2^32-entry table to look up the destination of each IP packet.
● The leftmost part of the address is the network number and the rest specifies a machine on that network.
● Network numbers can be any length, so determining the destination network number is nontrivial, and multiple matches are possible (the longest match wins).
● A custom ASIC (Application-Specific Integrated Circuit) is often used to handle this step.
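Longest-prefix matching can be sketched with Python's standard ipaddress module. The routing table and line names below are made up for illustration, and real network processors use tries or TCAM/ASIC hardware rather than this linear scan:

```python
import ipaddress

# Hypothetical routing table: (network, outgoing line).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "line0"),       # default route
    (ipaddress.ip_network("192.168.0.0/16"), "line1"),
    (ipaddress.ip_network("192.168.1.0/24"), "line2"),
]

def lookup(dst: str) -> str:
    """Among all networks containing dst, pick the one with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, line) for net, line in ROUTES if addr in net]
    return max(matches)[1]    # longest prefix wins
```

Note that 192.168.1.5 matches all three entries, which is exactly the "multiple matches" problem the slide mentions; the /24 entry wins because it is the most specific.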
Route Lookup
● Once the number of the destination network is known, the outgoing line can be looked up in a table in SRAM.
● A custom ASIC may be used in this step too.
Fragmentation and Reassembly

● When sending large amounts of data over a network, programs want to send it in as few units as possible.
● However, each network layer (TCP, IP, Ethernet) has a size limit on what it can send in one go.
● So data must be fragmented at the sending side and reassembled at the receiving side.
Computation
● Heavy-duty computation on the payload is sometimes needed.
● For example, data compression/decompression and encryption/decryption.

Header Management
● Sometimes headers need to be added, removed, or have some of their fields modified.
● For example, the IP header contains a field counting how many more hops the packet may make before being discarded.
● Every time the packet is forwarded, this field must be decremented.
Queue Management
● Incoming and outgoing packets have to be queued while waiting their turn to be processed.
● Certain types of applications, like video or games, need specific interpacket spacing to work smoothly (i.e., to avoid jitter).

Checksum Generation
● Outgoing packets need to be checksummed.
● The IP checksum can be generated by the network processor, but the Ethernet CRC is generally computed by hardware.
Accounting
● Accounting for packet traffic is needed in some cases, especially when one network is forwarding traffic for other networks.

Statistics Gathering
● Finally, many organizations like to collect statistics about their traffic.
● The network processor is a good place to collect statistics such as how many packets came in and went out, at what times of day, and more.
Improving Performance
❖ Improving performance of network processors is crucial for efficient data
handling.
❖ Performance can be measured using metrics like packets forwarded per second
and bytes forwarded per second.
Key Strategies For Enhancing Performance
❖ Clock Speed and Parallelism: Boosting the clock speed of the processor can help, but the relationship isn't always straightforward due to memory cycle time and heat issues. Incorporating more packet processing engines (PPEs) and deeper pipelines can enhance performance, especially in parallel PPE configurations.

❖ Specialized Hardware: Introducing dedicated processors or ASICs for repetitive, time-consuming tasks like lookups, checksums, and cryptography can significantly speed up operations that are slower in software.
Key Strategies For Enhancing Performance (cont’d)
❖ Bus Improvements: Adding internal buses and widening existing buses can quicken packet movement through the system, aiding overall speed.

❖ Memory Type Optimization: Replacing slower SDRAM with faster SRAM can generally enhance performance, but at a cost.
Graphics Processors
The second area where coprocessors are used is high-resolution graphics processing, such as 3D rendering.

Since an ordinary CPU cannot handle graphics processing, which requires massive computation over large amounts of data, many modern processors come equipped with GPUs (Graphics Processing Units).
Nvidia Fermi GPU
So, let's talk about the Fermi GPU, which you may have heard of.
Architecture of the Fermi GPU
● The GPU is organized into 16 Streaming Multiprocessors (SMs), its basic building blocks.
● Each SM contains a private L1 cache and dedicated shared memory.
● Each SM is composed of 32 CUDA (Compute Unified Device Architecture) cores, the heart of the GPU.
● A CUDA core is a simple processor that renders images and graphical interfaces for monitors and TVs, and is also used for computer-vision applications.
Architecture of the Fermi GPU (cont’d)
● The SMs share a single unified 768-KB L2 cache, which connects to multiported DRAM (Dynamic Random Access Memory).
● The host processor interface connects the host system and the GPU through a shared DRAM bus interface, typically PCIe.
SIMD (Single-Instruction Multiple-Data) processing
● The Fermi architecture is designed to efficiently execute graphics, video, and image-processing code, which tends to perform the same computation over and over on many data items.
● This lets all the cores of an SM execute the same operation in a given cycle.
● This style of processing is called SIMD computation.
Advantages of SIMD processing
● Each SM fetches and decodes only one instruction per cycle, shared by all its cores.
● Only by sharing instruction processing across all cores could NVIDIA cram 512 CUDA cores onto a single piece of silicon.
● If programmers are clever enough to utilize the computational resources, the system provides significant advantages over traditional scalar architectures.
Problems with SIMD
● SIMD processing puts constraints on the code programmers can execute.
● To be exact, each CUDA core must run the same code in lock-step for 16 operations to execute simultaneously.
● This is a burden for programmers.
The CUDA language
● To ease this burden, NVIDIA created the CUDA programming language (with C, C++, and Fortran variants).
● The CUDA language expresses program parallelism using threads.
● Threads are grouped into blocks, which are then assigned to SMs.
● As long as all the threads in a block execute the same code sequence (all branches make the same decision), 16 operations can be executed simultaneously.
Branch Divergence
● If the threads in an SM make different branch decisions, branch divergence occurs, which degrades performance.
● Divergence forces threads with different code paths to execute serially on the SM.
● This reduces parallelism and slows GPU processing.
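A toy model of this serialization: assume a SIMD group of threads must make one serial pass per distinct code path taken by its threads. The group size of 16 matches the lock-step figure above; everything else here is a simplification for illustration:

```python
def warp_passes(branch_taken: list) -> int:
    """Serial passes an SM needs for one SIMD group of threads:
    one pass if all threads take the same path, more if they diverge."""
    return len(set(branch_taken))  # number of distinct code paths

uniform = [True] * 16                          # all threads agree: full speed
divergent = [i % 2 == 0 for i in range(16)]    # alternating paths: serialized
```

With two distinct paths the group takes twice as long, which is the performance-degrading effect the slide describes.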
Countering Branch Divergence & GPGPUs
● Fortunately, many workloads can avoid branch divergence and achieve good speed-ups.
● SIMD-style graphics processors benefit fields such as medical imaging, proof solving, financial prediction, and graph analysis.
● Because of this wide range of applications, such GPUs are nicknamed GPGPUs (General-Purpose Graphics Processing Units).
Fermi GPU Memory Hierarchy

● The Fermi GPU, with its 512 CUDA cores, would grind to a halt without significant memory bandwidth.
● To solve the bandwidth problem, Fermi implements a modern memory hierarchy.
Fermi GPU Memory Hierarchy (cont’d)
● Each SM contains dedicated shared memory and a private L1 cache.
● The dedicated shared memory is directly accessible by the CUDA cores and provides fast data sharing between threads in an SM.
● The L1 cache speeds up access to data in DRAM.
● Two configurations accommodate the varying data-usage patterns of programs:
○ 16-KB shared memory & 48-KB L1 cache
○ 48-KB shared memory & 16-KB L1 cache
Fermi GPU Memory Hierarchy (cont’d)
● All SMs share a single unified 768-KB L2 cache.
● The L2 cache provides much faster access than DRAM for data that does not fit in the L1 cache, which holds at most 64 KB.
● Additionally, the L2 cache enables data sharing between SMs.
● Below the L2 cache is DRAM, which holds the data, imagery, and textures used by programs running on the Fermi GPU.
● It is important to note that efficient programs try to avoid accessing DRAM at all costs, since each access waits hundreds of cycles.
Fermi GPU (Conclusion)
● A single-fan GTX 580, built on the Fermi architecture and running its 512 CUDA cores at 772 MHz, achieved a computational rate of 1.5 teraflops while consuming only 250 W, which were impressive figures at the time.
● Even more impressive, it cost only $600 at the time.
● In 1990, the fastest computer of the day, the Cray-2, cost $30 million, was far bigger, and consumed 150 kW of power.
● By comparison, it is no exaggeration to say that Fermi-based GPUs earned their renown.
Cryptoprocessors
Introduction to Cryptoprocessors
● The third area in which coprocessors are popular is network security.
● For authentication and privacy, an encrypted connection has to be established between client and server to transfer data securely.
● The problem is how to achieve this.
● The answer is cryptography.
Cryptography
● Cryptography is the practice of securing communication through techniques that protect information by converting it into a code or cipher, making it unreadable to unauthorized users.
● Two general types of cryptography:
○ symmetric-key cryptography
○ public-key cryptography
Symmetric-key cryptography
● A type of cryptography where the same key is used for both encryption and decryption of data.
● The sender and receiver share a common secret key that is kept confidential.
● This key is used to transform plaintext into ciphertext during encryption, and the process is reversed during decryption to retrieve the original data.
● Examples of symmetric-key algorithms include AES (Advanced Encryption Standard) and DES (Data Encryption Standard).
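The defining property, that a single shared key both encrypts and decrypts, can be illustrated with a toy XOR-keystream construction. This is for illustration only and is not secure; real systems use ciphers like AES:

```python
import hashlib

def keystream(key: bytes):
    """Derive an endless byte stream from the key by chained hashing.
    Toy construction for illustration only -- NOT a secure cipher."""
    block = hashlib.sha256(key).digest()
    while True:
        yield from block
        block = hashlib.sha256(block).digest()

def crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream: the very same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, keystream(key)))

ciphertext = crypt(b"shared-secret", b"attack at dawn")
plaintext = crypt(b"shared-secret", ciphertext)   # same key reverses it
```

Because XOR is its own inverse, applying the keystream twice recovers the plaintext, which is the symmetry that gives symmetric-key cryptography its name.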
Public-key cryptography
● Also known as asymmetric cryptography; it uses a pair of mathematically related keys: a public key and a private key.
● The public key is freely shared and used for encryption, while the private key is kept secret and used for decryption.
● Data encrypted with the public key can only be decrypted using the corresponding private key, and vice versa.
● Public-key cryptography forms the basis for digital signatures, secure online communication, and other cryptographic applications.
● Examples of public-key algorithms include RSA, ECC (Elliptic Curve Cryptography), and Diffie-Hellman key exchange.
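The public/private key relationship can be shown with textbook RSA using the classic tiny-prime example (p = 61, q = 53). Keys this small are trivially breakable and real RSA also requires padding, so this is a sketch of the math, not a usable implementation:

```python
# Textbook RSA with tiny primes -- far too small for real security,
# but it shows how the public and private keys relate.
p, q = 61, 53
n = p * q                         # modulus, part of both keys
phi = (p - 1) * (q - 1)           # Euler's totient of n
e = 17                            # public exponent, coprime to phi
d = pow(e, -1, phi)               # private exponent: modular inverse of e

message = 65
ciphertext = pow(message, e, n)   # encrypt with the public key (e, n)
decrypted = pow(ciphertext, d, n) # decrypt with the private key (d, n)
```

Anyone can compute the ciphertext from (e, n), but recovering d requires factoring n, which is what makes the private key private at realistic key sizes.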
Production of cryptoprocessors
To handle the computation needed to encrypt and decrypt data securely, companies produce cryptoprocessors, often as PCI bus plug-in cards.
Production of cryptoprocessors (cont’d)

● The main reason to use coprocessor hardware is that it performs the necessary cryptographic computations much faster than an ordinary CPU can.
● Since a detailed discussion of how they work is out of scope, the author suggests Gaspar et al. (2010), Haghighizadeh et al., and Shoufan et al. (2011).
Processor Virtualization
Virtualization
❖ Virtualization refers to the use of hardware and software to create an emulated
version of an environment in which a piece of software runs, as opposed to the
real environment in which the code normally expects to run.

❖ Virtualization tools enable the emulation of instruction set–accurate


representations of various computer architectures and operating systems on
general-purpose computers. Virtualization is used widely in the deployment of
real-world software applications in cloud environments.


Virtual Memory
❖ Virtual memory is a method that computers use to manage storage space to keep systems running quickly and efficiently.

❖ Using this technique, operating systems can transfer data between different types of storage, such as random access memory (RAM), also known as main memory, and hard drive or solid-state disk storage.

❖ Systems using virtual memory create multiple sandboxed environments in which each application runs without interference from other applications, except in competition for shared system resources.
Sandboxed
❖ In the virtualization context, a sandbox is an isolated environment in which code runs without interference from anything outside its boundaries, and which prevents code inside the sandbox from affecting resources external to it. This isolation between applications is rarely absolute, however.

❖ For example, even though a process in a virtual memory system cannot access another process's memory, it may do something else, such as delete a file that is needed by a second process, which may cause problems for the other process.
Types of Virtualization
❖ Virtualization is applied in businesses, universities, government organizations, and cloud service providers.
Operating System Virtualization
❖ A virtualized operating system runs under the control of a hypervisor.
❖ A hypervisor is a combination of software and hardware capable of instantiating and running virtual machines.
❖ There are two general types of hypervisor:

❖ A type 1 hypervisor, sometimes referred to as a bare-metal hypervisor, includes software for managing virtual machines that runs directly on the hardware of a host computer. E.g., VMware vSphere/ESXi, Microsoft Hyper-V.
❖ A type 2 hypervisor, also called a hosted hypervisor, runs as an application program that manages virtual machines under a host operating system. E.g., VMware Workstation, Oracle VirtualBox.
Hypervisor vs. Virtual Machine Monitor
❖ The term "hypervisor" is more commonly used in modern discussions about virtualization. It is a layer of software that runs directly on the physical hardware and manages the creation and execution of virtual machines.

❖ The term "Virtual Machine Monitor" (VMM) is a more traditional term that predates the popularity of "hypervisor." VMM refers to the software layer responsible for managing and creating virtual machines on a physical host. It is the precursor to the modern concept of the hypervisor, and the terms are often used interchangeably.
Application Virtualization
❖ Application virtualization abstracts the operating system from the application code and provides a degree of sandboxing.
❖ Application virtualization replaces portions of the runtime environment with a virtualization layer and performs tasks such as intercepting disk I/O calls and redirecting them to a sandboxed, virtualized disk environment.
❖ Application virtualization can encapsulate a complex software installation process, consisting of hundreds of files installed in various directories as well as numerous Windows registry modifications, in an equivalent virtualized environment contained within a single executable file.
❖ Simply copying the executable to a target system and running it brings up the application as if the entire installation process had taken place on the target.
Network Virtualization
❖ Network virtualization is the connection of software-based emulations of
network components, such as switches, routers, firewalls, and
telecommunication networks in a manner that represents a physical
configuration of these components.
❖ This allows operating systems and the applications running on them to
interact with and communicate over the virtual network in the same manner
they would on a physical implementation of the same network architecture.
❖ A single physical network can be subdivided into multiple virtual local area
networks (VLANs), each of which appears to be a complete, isolated network
to all systems connected on the same VLAN.
❖ Multiple computer systems at the same physical location can be connected to different VLANs, effectively placing them on separate networks.
Storage Virtualization
❖ A storage virtualization system manages the process of translating logical data
requests to physical data transfers. Logical data requests are addressed as block
locations within a disk partition. Following the logical-to-physical translation,
data transfers may ultimately interact with a storage device that has an
organization completely different from the logical disk partition.
❖ The process of accessing physical data given a logical address is similar to the
virtual-to-physical address translation process in virtual memory systems
❖ The logical disk I/O request includes information such as a device identifier and
a logical block number.
❖ This request must be translated to a physical device identifier and block number.
The requested read or write operation then takes place on the physical disk.
❖ Storage virtualization also brings several improvements: centralized management, replication, and data migration.
Categories of processor virtualization
❖ With full virtualization, binary code in operating systems and applications runs
in the virtual environment with no modifications whatsoever.
❖ Guest operating system code performing privileged operations executes under
the illusion that it has complete and sole access to all machine resources and
interfaces.
❖ The hypervisor manages interactions between guest operating systems and
host resources, and takes any steps needed to deconflict access to I/O devices
and other system resources for each virtual machine under its control.
Trap-and-emulate virtualization
❖ Gerald J. Popek and Robert P. Goldberg described the three properties a
hypervisor must implement to efficiently and fully virtualize a computer
system.
❖ Equivalence: Programs (including the guest operating system) running in a
hypervisor must exhibit essentially the same behavior as when they run directly
on machine hardware, excluding the effects of timing.
❖ Resource control: The hypervisor must have complete control over all of the
resources used by the virtual machine.
❖ Efficiency: A high percentage of instructions executed by the virtual machine
must run directly on the physical processor, without hypervisor intervention.
Trap-and-emulate virtualization(Con’td)
❖ The hardware and operating system of the computer on which it is running
must grant the hypervisor the power to fully control the virtual machines it
manages.
❖ In a hypervisor implementing the trap-and-emulate virtualization method,
portions of the hypervisor run with kernel privilege, while all guest operating
systems operate at the user privilege level.
❖ Kernel code within the guest operating systems executes normally until a
privileged instruction attempts to execute or a memory-access instruction
attempts to read or write memory outside the user-space address range
available to the guest operating system. When the guest attempts any of these
operations, a trap occurs.
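The trap-and-emulate flow above can be sketched as a toy simulation. The instruction names and the set of privileged operations below are invented for illustration; real hardware raises the trap, and a real hypervisor emulates at the instruction level:

```python
class Trap(Exception):
    """Raised when guest code attempts a privileged operation."""

class VirtualCPU:
    """Toy model of trap-and-emulate: the 'guest' runs at user privilege,
    and privileged instructions trap to the hypervisor for emulation."""
    PRIVILEGED = {"set_page_table", "disable_interrupts"}  # hypothetical names

    def __init__(self):
        self.log = []

    def execute(self, instr: str):
        if instr in self.PRIVILEGED:
            raise Trap(instr)                    # hardware would trap here
        self.log.append(f"guest ran {instr}")    # unprivileged: runs directly

    def run_guest(self, program):
        for instr in program:
            try:
                self.execute(instr)
            except Trap as t:
                # The hypervisor emulates the privileged instruction safely,
                # preserving the guest's illusion of sole machine ownership.
                self.log.append(f"hypervisor emulated {t}")

cpu = VirtualCPU()
cpu.run_guest(["add", "set_page_table", "load"])
```

Note how this satisfies the efficiency property: unprivileged instructions run directly, and the hypervisor intervenes only on the rare privileged ones.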
Exception types: faults, traps, and aborts
❖ A fault is an exception that ends by restarting the instruction that caused the
exception. For example, a page fault occurs when a program attempts to
access a valid memory location that is currently inaccessible. After the page
fault handler completes, the triggering instruction is restarted, and execution
continues from that point.
❖ A trap is an exception that ends by continuing the execution with the
instruction following the triggering instruction. For example, execution
resumes after the exception triggered by a debugger breakpoint by
continuing with the next instruction.
❖ An abort represents a serious error condition that may be unrecoverable.
Problems such as errors accessing memory may cause aborts.
Paravirtualization
❖ In paravirtualization, the hypervisor is installed on the device, and the guest operating systems are then installed into that environment.
❖ This virtualization method modifies the guest operating system to communicate with the hypervisor, reducing the time the operating system spends on operations that are difficult and slow in a virtual environment.
❖ Paravirtualization thus helps to increase system performance. The guest operating systems communicate with the hypervisor using API calls.
Binary translation
❖ One way to handle problematic instructions in processor architectures that lack full support for virtualization is to scan the binary code prior to execution to detect the presence of nonvirtualizable instructions. Where such instructions are found, the code is translated into virtualization-friendly instructions that produce identical effects.
❖ Static binary translation recompiles a set of executable images into a form ready for execution in the virtual environment. This translation takes some time, but it is a one-time process providing a set of system and user images that will continue to work until new image versions are installed, necessitating a recompilation procedure for the new images.
❖ Dynamic binary translation scans sections of code during program execution to locate problematic instructions. When such instructions are encountered, they are replaced with virtualizable instruction sequences.
Hardware emulation
❖ When emulating processor hardware, each instruction executing in an
emulated guest system must be translated to an equivalent instruction or
sequence of instructions in the host ISA.
❖ One example of hardware emulation tools is the open source QEMU
machine emulator and virtualizer, which supports the running of operating
systems for a wide variety of processor architectures on an impressive list
of differing architectures, with reasonably good performance.
Virtualization challenges
In this section, we will focus on the hosted (type 2) hypervisor because this mode of
operation presents a few added challenges that a bare-metal hypervisor may not
face because the type 1 hypervisor has been optimized to support virtualization.
❖ In a type 2 hypervisor, the host operating system supports kernel and user modes,
as does the guest operating system. As the guest operating system and the
applications running within it request system services, the hypervisor must
intercept each request and translate it into a suitable call to the host kernel.
❖ In a virtualized environment, the hypervisor must manage the interfaces to these
devices whenever the user requests interaction with the guest OS. The degree of
difficulty involved in implementing these capabilities depends on the instruction
set of the host computer.
Unsafe instructions
❖ Processor instructions that rely on or modify privileged system state information
are referred to as unsafe. For the trap-and-emulate method to function in a
comprehensively secure and reliable manner, all unsafe instructions must
generate exceptions that trap to the hypervisor.
❖ If an unsafe instruction is allowed to execute without trapping, the isolation of
the virtual machine is compromised and virtualization may fail.
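The trap-and-emulate idea can be sketched as follows (`LIDT`, `HLT`, and MOV-to-CR3 are real privileged x86 instructions that trap outside CPL 0; the trap mechanism here is a toy model):

```python
# Toy trap-and-emulate: privileged instructions executed by the guest (at
# CPL 3) raise an exception that the hypervisor catches and emulates.
class TrapToHypervisor(Exception):
    pass

def guest_execute(insn, cpl):
    if insn in {"LIDT", "HLT", "MOV_CR3"} and cpl != 0:
        raise TrapToHypervisor(insn)   # privileged: trap out of the guest
    return f"executed {insn}"

def run_in_vm(insn):
    try:
        return guest_execute(insn, cpl=3)             # guests run at CPL 3
    except TrapToHypervisor as trap:
        return f"hypervisor emulated {trap.args[0]}"  # emulate, then resume

print(run_in_vm("ADD"))      # executed ADD
print(run_in_vm("MOV_CR3"))  # hypervisor emulated MOV_CR3
```

The failure mode described above corresponds to an unsafe instruction taking the first branch at CPL 3: it executes directly against real system state and the hypervisor never learns about it.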
Shadow page tables
A particular problem arises in the x86 architecture due to the fact that virtual
memory page table configuration data must be stored within the processor to
properly configure the system, but that information becomes inaccessible once it
has been stored.
❖ To resolve this issue, the hypervisor maintains its own copy of the page table
configuration data, referred to as shadow page tables.
❖ Because the guest's page tables are not the actual page tables managing memory
for the host OS, it is necessary for the hypervisor to set access permission
restrictions on the guest page table memory regions and intercept the resulting
traps when the guest OS attempts to access its page tables. The hypervisor then
emulates the requested operation by interacting with the physical MMU through
calls to the host OS.
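A toy model of this mechanism (all mappings and names are invented for illustration): the guest's write to its page table is intercepted, and the hypervisor mirrors the update into the shadow table actually used by the MMU:

```python
# Toy shadow page table: the hypervisor write-protects the guest's page
# tables, intercepts the fault raised by each guest update, and mirrors the
# change into its own shadow table, which is what the physical MMU uses.
class ShadowMMU:
    def __init__(self):
        self.guest_pt = {}                                 # guest VA page -> guest PA page
        self.guest_to_host = {0x100: 0x800, 0x101: 0x801}  # hypervisor's memory map
        self.shadow_pt = {}                                # guest VA page -> host PA page

    def guest_writes_pte(self, vpage, guest_ppage):
        # The guest's write faults (its tables are write-protected); the
        # hypervisor handles the trap, applies the update on the guest's
        # behalf, and refreshes the shadow entry used by the real MMU.
        self.guest_pt[vpage] = guest_ppage
        self.shadow_pt[vpage] = self.guest_to_host[guest_ppage]

mmu = ShadowMMU()
mmu.guest_writes_pte(vpage=0x10, guest_ppage=0x100)
print(hex(mmu.shadow_pt[0x10]))  # 0x800
```

The guest believes its page tables are in control, while every address the MMU actually resolves comes from the hypervisor's shadow copy.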
Security
❖ The hypervisor provides an additional avenue that
an attacker may attempt to exploit in a virtualized
environment.
❖ If malicious users manage to penetrate and take
control of the hypervisor, they gain full access to all of the
guest operating systems, along with the applications and data
accessible from within the guests, because the guests operate
at a lower privilege level that grants the hypervisor full
control over them.
Virtualizing modern processors
➢ x86 processor virtualization
➢ x86 hardware virtualization
➢ ARM processor virtualization
➢ RISC-V processor virtualization
x86 processor virtualization
➔ The x86 architecture was not originally designed to support the execution
of virtualized operating systems
➔ It contains several unsafe but non-trapping instructions
➔ These cause problems, for example, by allowing the guest operating system
to access privileged registers that do not contain data corresponding to
the state of the virtual machine
Current Privilege Level(CPL)
➔ 0 for kernel code
➔ 3 for user applications
➔ 3 for virtual machines
Unsafe Instructions
➔ Interrupt Descriptor Table Register (IDTR)
➔ Only one IDTR exists in a physical single-core x86 processor
➔ The IDTR can be written at CPL 0
➔ The IDTR can be read at CPL 3
Host OS
➔ Write IDTR at CPL 0, with a trap
➔ Read IDTR at CPL 3, without a trap

Guest OS
➔ Write shadow register at CPL 0, with a trap
➔ Read shadow register without a trap
➔ Can’t read :(
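This asymmetry can be modeled in a few lines of Python (register values and the shadow-register structure are invented): writes trap and can be redirected to a per-guest shadow, but reads bypass the hypervisor and return the host's value:

```python
# Toy model of the x86 IDTR problem: writes to the IDTR are privileged and
# trap, so the hypervisor can redirect a guest's write into a per-guest
# shadow register -- but reads are legal at CPL 3 and do NOT trap, so the
# guest silently sees the host's value. All values here are invented.
class Cpu:
    def __init__(self):
        self.physical_idtr = 0xAAAA   # value belonging to the host OS
        self.shadow_idtr = {}         # per-guest shadow copies

    def write_idtr(self, value, guest=None):
        if guest is None:
            self.physical_idtr = value       # host writes the real register
        else:
            self.shadow_idtr[guest] = value  # trapped guest write, redirected

    def read_idtr(self):
        # No trap occurs, so the hypervisor cannot substitute the shadow value.
        return self.physical_idtr

cpu = Cpu()
cpu.write_idtr(0xBBBB, guest="vm0")  # guest's write lands in its shadow
print(hex(cpu.read_idtr()))          # 0xaaaa -- guest sees the host's IDTR
```

The guest receives a value that does not correspond to the state of its own virtual machine, which is exactly the non-virtualizable behavior described above.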
Pentium x86 architecture
➔ Of the hundreds of instructions in the Pentium ISA, 17 were found to be
unsafe but non-trapping
➔ These instructions are non-virtualizable
➔ For the Pentium x86 architecture, implementing a pure trap-and-emulate
virtualization approach is therefore not possible
x86 hardware virtualization
● Between 2005 and 2006, Intel and AMD released versions of the x86
processors containing hardware extensions supporting virtualization
● These extensions resolved the problems caused by the privileged but
non-trapping instructions
● They enable full system virtualization under the Popek and Goldberg criteria
● AMD-V in AMD processors and VT-x in Intel processors
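On Linux, the presence of these extensions is advertised through the CPU flags `vmx` (Intel VT-x) and `svm` (AMD-V) in /proc/cpuinfo. A hedged sketch of parsing a cpuinfo-style flags line (the sample text below is synthetic):

```python
# Sketch: detect the x86 hardware virtualization extensions by looking for
# the "vmx" (Intel VT-x) or "svm" (AMD-V) CPU flags, in the format used by
# /proc/cpuinfo on Linux. The sample input is synthetic.
def virt_extension(cpuinfo_text):
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return None

sample = "flags\t\t: fpu vme de pse msr vmx sse2"
print(virt_extension(sample))  # Intel VT-x
```

On a real system, the same function could be applied to the contents of /proc/cpuinfo; an absent flag may mean the extension is missing or disabled in firmware.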
ARM processor virtualization
❖ The ARMv8-A architecture supports virtualization in both its 32-bit and
64-bit execution states
❖ Full trap-and-emulate virtualization
❖ A dedicated exception category for hypervisor use
❖ Additional registers supporting hypervisor exceptions and stack pointers
❖ Supports either a type-1 or a type-2 hypervisor
RISC-V processor virtualization
★ Virtualization support was a baseline requirement from the beginning
★ Supports both type-1 and type-2 hypervisors
★ Each hardware thread in RISC-V runs at one of three privilege levels:
User (U), Supervisor (S), Machine (M)
★ An additional configuration bit, the V bit, indicates virtualized execution
V-bit
★ The V bit is set to 1 for hardware threads executing in a virtualized guest
★ User (U) => V bit set to 1 => Virtual User (VU)
★ Supervisor (S) => V bit set to 1 => Virtual Supervisor (VS)
★ Machine (M) mode only functions in a non-virtualized manner (V = 0)
★ In both VU and VS modes, RISC-V implements a two-level address translation
scheme that converts each guest virtual address first to a guest physical
address and then to a supervisor physical address
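The two-stage translation can be sketched numerically (the 4 KB page size is typical; all table contents here are invented):

```python
# Toy two-stage address translation, as performed in RISC-V VU/VS modes:
# stage 1 uses the guest's page table, stage 2 the hypervisor's table.
PAGE = 0x1000

guest_pt = {0x400: 0x010}   # guest virtual page  -> guest physical page
host_pt  = {0x010: 0x300}   # guest physical page -> supervisor physical page

def translate(gva):
    page, offset = divmod(gva, PAGE)
    gpa_page = guest_pt[page]      # stage 1: guest-managed translation
    spa_page = host_pt[gpa_page]   # stage 2: hypervisor-managed translation
    return spa_page * PAGE + offset

print(hex(translate(0x400ABC)))  # 0x300abc
```

Because the hardware performs both stages directly, the hypervisor does not need to maintain shadow page tables as it does on x86 processors that lack nested-paging support.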
Virtualization tools
VirtualBox - free, open source type 2 hypervisor from Oracle
Corporation.
VMware Workstation - first released in 1999, type 2
hypervisor, requires the purchase of licenses.
VMware Workstation Player is available at no cost but for
non-commercial purposes only.
VMware ESXi - type 1 hypervisor, for enterprise-class
deployments in data centers and cloud server farms, has a
service console through which administrators can oversee and
manage the operation of a large-scale data center.
KVM - kernel-based virtual machine, type 2 hypervisor
initially released in 2007; the host processor must include the
AMD-V or Intel VT virtualization extensions; supports
paravirtualization using the VirtIO API.
Xen - first released in 2003, free and open source type 1
hypervisor, runs at the most privileged level with full access
to the system hardware. The largest commercial cloud service
providers, including Amazon EC2, use Xen as a primary platform.
QEMU - quick emulator, free and open source emulator, can use
hardware virtualization, and can emulate at the level of a single
application or an entire computer system that may use a
different ISA. It is unique in that it requires no special
privileges, entirely emulating the guest system in software.
Thanks For
Paying Attention
R.I.P Cheems (2011 - 2023)