
CS 6210 Spring 2016 Midterm Soln

Name:____TAs plus Kishore_________________GT Number:

Note:

1. Write your name and GT number AT LEAST on the first page.


2. The test is CLOSED BOOK and NOTES.
3. Please provide the answers in the space provided. You can use scratch
paper (provided by us) to figure things out (if needed) but you get
credit only for what you put down in the space provided for each
answer.
4. For conceptual questions, concise bullets (not wordy sentences) are
preferred. YOU DON'T HAVE TIME TO WRITE WORDY SENTENCES.
5. While it is NOT REQUIRED, where appropriate use figures to convey your
points (a figure is worth a thousand words!)
6. Illegible answers are wrong answers.
7. DON'T GET STUCK ON ANY SINGLE QUESTION. FIRST PASS: ANSWER QUESTIONS
YOU CAN WITHOUT MUCH THINK TIME; SECOND PASS: DO THE REST.

Good luck!

Question number Points earned Running total


1 (0 min) (Max: 1 pts)
2 (8 min) (Max: 17 pts)
3 (7 min) (Max: 12 pts)
4 (15 min) (Max: 30 pts)
5 (15 min) (Max: 30 pts)
6 ( 5 min) (Max: 10 pts)
Total (50 min) (Max: 100 pts)

1. (0 min, 1 point) (This is a freebie, you get 1 point regardless)


The half-time musical guest for the 2016 NBA all-star game last Sunday was
(a) Beyoncé
(b) Bruce Springsteen
(c) Eminem
(d) Sting
(e) Dr. Dre
(f) How do I know...I was immersed in CS 6210 prep

OS Structures
2. (8 min, 17 points)
(a) (9 points) (Exokernel)
A process is currently executing on the processor. The process makes a
system call. List the steps involved in getting this system call serviced
by the operating system that this process belongs to. You should be precise
in mentioning the data structures involved in Exokernel that make this
service possible.
System Call traps to Exokernel. (+2)
Exokernel identifies the library OS responsible for handling this system
call using Time Slice Vector (+2)
Exokernel uses the PE (Processor Environment) data structure associated with
this library OS to get the system call context, which is the entry point
registered with Exokernel by the library OS for system calls. (+2)
Exokernel upcalls the library OS using this entry point. (+2)
The library OS services the system call (+1)
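These steps can be sketched in C; the struct layout and names below are illustrative, not from the actual Exokernel implementation:

```c
#include <stddef.h>

/* Illustrative sketch of Exokernel system-call dispatch; struct and
 * field names are hypothetical, not from the real Exokernel code. */

typedef void (*upcall_t)(int syscall_no);

struct pe {                       /* Processor Environment per library OS */
    upcall_t syscall_entry;       /* entry point registered for syscalls  */
};

struct exokernel {
    struct pe *time_slice_vector[8];  /* which library OS owns each slice */
    int current_slice;
};

/* Step 1: the trap lands in the exokernel.
 * Step 2: the time-slice vector identifies the responsible library OS.
 * Step 3: its PE holds the registered system-call entry point, which the
 * exokernel then upcalls (the upcall itself is elided here). */
static struct pe *dispatch_syscall(struct exokernel *ek) {
    return ek->time_slice_vector[ek->current_slice];
}
```

The sketch only shows the lookup; the real exokernel would then switch to the library OS context and jump through `syscall_entry`.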

(b) (4 points) (SPIN)


Explain how SPIN makes OS service extensions as cheap as a procedure call.
(Concise bullets please)
SPIN implements each service extension as a Modula-3 object: interactions to
and from this object are compile time checked and runtime verified. Thus
each object is a logical protection domain. (+2)
SPIN and the service extensions are co-located in the same hardware address
space. This means that OS service extensions do not require border crossing,
and can be as cheap as a procedure call. (+2)

(c) (4 points) (SPIN)


Give two reasons, in the form of concise bullets, explaining why SPIN's
vision of OS extensibility purely via language-enforced protection checks
may be impractical for building real operating systems.
Modifying architectural features (e.g., hardware registers in the CPU, memory-
mapped registers in device controllers) may necessitate a service extension
stepping out of its logical protection domain (i.e., Modula-3 compiler-
enforced access checks). (+2)
A significant chunk (upwards of 50%) of the OS code base (e.g., device
drivers) is from third-party OEM vendors and it is difficult if not
impossible to force everyone to use a specific programming language. (+2)

3. (7 mins, 12 points)

(a) (2 points) (L3)


What is the main assertion of the L3 microkernel?
A microkernel-based design of an OS need not be performance deficient. (+1)
With the right abstractions in the microkernel and architecture-specific
implementation of those abstractions, a microkernel-based OS can be as
performant as a monolithic kernel. (+1)

(b) (2 points) (L3)


Why is the assertion of L3 at odds with the premise of SPIN/Exokernel?
SPIN/Exokernel used Mach as an exemplar of microkernel-based OS structure,
whose design focus was on portability. (+1)
On the other hand, L3 argues that the primary goal of a microkernel should be
performance, not portability. (+1)

(c) (L3)
Consider a 64-bit paged-segmented architecture. The virtual address space
is 2^64 bytes. The TLB does not support address space IDs, so on an address
space switch, the TLB has to be flushed. The architecture has two segment
registers:
LB: lower bound for a segment
UB: upper bound for a segment
i. (4 points) There are three protection domains, each requiring 32 MiB of
address space (Note: Mi is 2^20). How would you recommend implementing
the 3 protection domains to reduce border crossing overheads among
these domains?

All three protection domains can be packed into one address space. (+1)

Each protection domain takes up 2^25 bytes (32 MiB).
LB, UB (range) for each protection domain:
0, (2^25 - 1) (+1)
(2^25), (2^26 - 1) (+1)
(2^26), (3*(2^25) - 1) (+1)

ii. (4 points) Explain with succinct bullets, what happens upon a call
from one protection domain to another.
UB and LB hardware registers are changed to correspond to the called
protection domain.
Context switch is made to transfer control to the entry point in the called
protection domain.
The architecture ensures that virtual addresses generated by the called
domain are within the bounds of legal addresses for that domain.
There is no need to flush the TLB on context switch from one protection
domain to another.
(+1 for each bullet above)
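The packing and bounds check can be sketched as follows (a minimal illustration, assuming the LB/UB registers hold inclusive byte bounds; the names are hypothetical, not from L3):

```c
#include <stdint.h>

/* Sketch: pack three 32 MiB protection domains into one address space
 * and check addresses against the LB/UB segment bounds of the currently
 * running domain. Illustrative, not L3 source. */

#define DOMAIN_SIZE (UINT64_C(1) << 25)   /* 32 MiB = 2^25 bytes */

struct seg { uint64_t lb, ub; };          /* segment bound registers */

/* Bounds for domain i (i = 0, 1, 2) within the shared address space. */
static struct seg domain_bounds(int i) {
    struct seg s = { i * DOMAIN_SIZE, (i + 1) * DOMAIN_SIZE - 1 };
    return s;
}

/* A cross-domain call just reloads LB/UB; no TLB flush is needed
 * because all domains live in the same address space. The hardware
 * rejects any virtual address outside the loaded bounds. */
static int addr_legal(struct seg s, uint64_t va) {
    return va >= s.lb && va <= s.ub;
}
```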

Virtualization
4. (15 mins, 30 points)
(a) (2 points) (interrupts)
The hypervisor gets an interrupt from the disk. How does it determine which
virtual machine it has to deliver the interrupt?

The interrupt is the result of a request that originated from some specific
VM. The hypervisor tags each request dispatched to the disk controller with
the id of the requesting VM. Using this information, the interrupt is
delivered to the appropriate VM.
(+2 if request association to the interrupt is mentioned)

(b) (2 points) (interrupts)


The hypervisor receives a packet arrival interrupt from the network
interface card (NIC). How does it determine which virtual machine it has to
deliver the interrupt?

Every packet from the network carries the MAC address of the NIC to which the
packet is destined. The MAC addresses are uniquely associated with the VMs. Based
on the MAC addresses associated with the VMs and the destination MAC address in
the packet's Ethernet header, the packet arrival interrupt is delivered to the
appropriate VM.
[Note: As an aside, the NAT protocol on your home router connecting several home
devices to the ISP works quite similarly.]
(+2 if MAC address of NICs associated with the VMs to direct the interrupt is mentioned)
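A sketch of this demultiplexing (hypothetical structure names; real hypervisors do this in the virtual switch or with hardware assistance):

```c
#include <string.h>

/* Illustrative sketch: map a packet's destination MAC address to the VM
 * whose virtual NIC owns it. Sizes and names are hypothetical. */

#define NVMS   4
#define MACLEN 6

struct vm_nic {
    unsigned char mac[MACLEN];   /* MAC of the virtual NIC given to VM i */
};

/* Return the index of the VM whose virtual NIC owns dst_mac, or -1. */
static int vm_for_packet(const struct vm_nic nics[NVMS],
                         const unsigned char dst_mac[MACLEN]) {
    for (int i = 0; i < NVMS; i++)
        if (memcmp(nics[i].mac, dst_mac, MACLEN) == 0)
            return i;
    return -1;   /* unknown destination: drop (or flood, for broadcast) */
}
```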
(c) (6 points) (ballooning)
A virtualized setting uses ballooning to cater to the dynamic memory needs
of VMs. Imagine 4 VMs currently executing on top of the hypervisor. VM1
experiences memory pressure and requests the hypervisor for 100 MB of
additional memory. The hypervisor has no machine memory available currently
to satisfy the request. List the steps taken by the hypervisor in trying to
give the requested memory to VM1 using ballooning. (Concise bullets
please).

The hypervisor keeps information on the memory allocated to and actively used
by each of the VMs. (+1)
This allows the hypervisor to decide the amount of memory to be taken from
each of the other VMs to meet VM1's request. (+1)
It communicates the amount of memory to be acquired from each VM to the
balloon driver in that VM. (+2)
The balloon drivers go to work in their respective VMs and return the released
machine pages to the hypervisor. (+2)
The hypervisor then gives the requested memory to the needy VM (VM1 in this case).
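The steps above can be sketched as a proportional-share reclaim. The policy shown (take from each VM in proportion to its idle memory) is one plausible choice; a real hypervisor's policy and all names here are illustrative:

```c
/* Illustrative ballooning sketch; numbers and names are hypothetical. */

#define NVMS 4

struct vm {
    long allocated_mb;   /* machine memory currently given to this VM */
    long active_mb;      /* memory the VM is actively using           */
};

/* Split a request across the other VMs in proportion to their idle
 * (allocated but inactive) memory; each balloon driver then inflates
 * by its share and returns the released machine pages. Returns the
 * total reclaimed and credits it to the needy VM. */
static long balloon_reclaim(struct vm vms[NVMS], int needy, long need_mb) {
    long idle_total = 0, reclaimed = 0;
    for (int i = 0; i < NVMS; i++)
        if (i != needy)
            idle_total += vms[i].allocated_mb - vms[i].active_mb;
    if (idle_total <= 0)
        return 0;                          /* nothing idle to reclaim */
    for (int i = 0; i < NVMS; i++) {
        if (i == needy) continue;
        long idle = vms[i].allocated_mb - vms[i].active_mb;
        long take = need_mb * idle / idle_total;   /* proportional share */
        vms[i].allocated_mb -= take;       /* balloon inflates: VM frees */
        reclaimed += take;                 /* 'take' MB of machine pages */
    }
    vms[needy].allocated_mb += reclaimed;  /* hand pages to VM1 */
    return reclaimed;
}
```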

(d) (2 points) (memory reclamation)
One technique for efficient use of memory, a critical resource for
performance, is to dynamically adjust the memory allocated to a VM by taking
away some or all of the idle (unused) memory from that VM. This is referred
to in the literature as taxing the VM proportional to the amount of idle
memory it has. Why is a 100% tax rate (i.e., taking away ALL the idle memory
from a VM) not a good idea?

Because any sudden increase in the working set size of the VM will result in poor
performance for that VM, potentially violating the SLA for that VM.

(+2 if working set size increase mentioned)

(e) (6 points) (para virtualization)
Using a concrete example (e.g., a disk driver), show how copying memory
buffers between guest VMs and the hypervisor is avoided in a para-
virtualized setting.

An I/O ring data structure is shared between the hypervisor and the
guest VM (+1)
Each slot in the I/O ring is a descriptor for a unique request from the
guest VM or a unique response from the hypervisor (+1)
The address pointer to the physical memory page corresponding to a data
buffer in the guest OS is placed in the descriptor (+2)
The physical memory page is pinned for the duration of the transfer (+2)
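A sketch of such a shared ring (modeled loosely on Xen-style I/O rings; the field names and sizes are illustrative):

```c
#include <stdint.h>

/* Illustrative sketch of a shared I/O ring; names are hypothetical. */

#define RING_SIZE 64

struct ring_desc {
    uint64_t buf_mfn;     /* machine frame of the guest's data buffer   */
    uint32_t offset;      /* offset of the buffer within that page      */
    uint32_t len;         /* transfer length in bytes                   */
    int      is_response; /* 0 = guest request, 1 = hypervisor response */
};

struct io_ring {
    struct ring_desc slots[RING_SIZE];
    unsigned req_prod;    /* guest advances after enqueuing a request   */
    unsigned rsp_prod;    /* hypervisor advances after posting response */
};

/* The guest enqueues a descriptor pointing at its (pinned) buffer page;
 * no data is copied into the ring, only the page address. */
static int enqueue_request(struct io_ring *r, uint64_t mfn, uint32_t len) {
    if (r->req_prod - r->rsp_prod >= RING_SIZE)
        return -1;                     /* ring full */
    struct ring_desc *d = &r->slots[r->req_prod % RING_SIZE];
    d->buf_mfn = mfn;
    d->offset = 0;
    d->len = len;
    d->is_response = 0;
    r->req_prod++;
    return 0;
}
```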

(f) (full virtualization)
In a fully-virtualized setting, Shadow Page Table (S-PT) is a data structure
maintained by the hypervisor. Answer the following questions pertaining to
S-PT.
(i) (2 point) How many S-PT data structures are there?

One per guest OS currently running. (all or nothing)

(ii) (2 points) How big is the S-PT data structure?

Proportional to the number of processes in that guest OS. (all or nothing)

(iii) (2 points) What does an entry in the S-PT contain?

In principle it is a mapping from PPN to MPN. (+1)


However, since S-PT is the real hardware page table used by the architecture for
address translation (VPN->MPN), the hypervisor keeps the VPN -> MPN mapping as each
entry in the data structure. (+1)

(iv) (6 points) The currently running guest OS switches from one process
(say P1) to another process (P2). List the sequence of steps before P2
starts running on the processor.

Guest OS executes the privileged instruction for changing the PTBR to point
to the PT for P2. Results in a trap to the hypervisor. (+2)
From the PPN of the PT for P2, the hypervisor will know the offset into the
S-PT for that VM where the PT resides in machine memory. (+2)
Hypervisor installs the MPN thus located as the PT for P2 into PTBR. (+2)
Once the other formalities associated with context switching (which also
need hypervisor intervention) are complete, such as saving the volatile
state of P1 into its PCB and loading the volatile state of P2 from its PCB
into the processor, P2 can start executing.
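The trapped PTBR write can be sketched as follows (hypothetical names; a real hypervisor would also validate the guest's page table and may construct shadow entries lazily):

```c
#include <stdint.h>

/* Illustrative sketch of the trapped PTBR write: the hypervisor maps
 * the guest's PPN for P2's page table to the machine frame recorded in
 * the shadow page table, then installs it. Names are hypothetical. */

#define MAX_PPN 128

struct spt {
    uint64_t ppn_to_mpn[MAX_PPN];  /* per-VM PPN -> MPN mapping */
};

static uint64_t hardware_ptbr;     /* stands in for the real PTBR */

/* The guest wrote 'guest_ppn' to PTBR (privileged, so it traps); the
 * hypervisor looks up the machine frame and installs that instead. */
static int handle_ptbr_write(struct spt *s, uint64_t guest_ppn) {
    if (guest_ppn >= MAX_PPN)
        return -1;                 /* bogus PPN: refuse the switch */
    hardware_ptbr = s->ppn_to_mpn[guest_ppn];
    return 0;
}
```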

Synchronization, Communication, and Scheduling in Parallel Systems
5. (15 mins, 30 points)
(a) (5 points) (Answer True/False with justification)
Sequential consistency memory model makes sense only in a non-cache coherent
(NCC) shared memory multiprocessor.

False. (+1)
Sequential consistency memory model is a contract between software and hardware.
(+2)

It is required for the programmer to reason about the correctness of the software.
(+1)

Cache coherence is only a mechanism for implementing the memory consistency model.
It can be implemented in hardware or software. (+1)

(b) (MCS lock)


Recall that the MCS lock algorithm implements a queue of lock requestors
using a linked list. The MCS algorithm uses an atomic fetch-and-store(X,Y)
primitive in its implementation. The primitive returns the old value of X,
and stores Y in X. Assume the current state of a lock is as shown below:

i. (4 points) Assume two threads T1 and T2 make a lock request
simultaneously for the same lock. What sequence of actions would have
brought the data structures to the intermediate state shown below from
the current state?

Though T1 and T2 are making the lock request simultaneously, their
attempts at queuing themselves behind the current lock holder (curr)
will get serialized through the atomic fetch-and-store operation. (+2)
In the picture above, T1 has definitely done a fetch-and-store, so the
lock is pointing to it as the last lock requestor. (+1)
As for T2, the thread has allocated its queue node data structure, but
there are two possibilities with respect to where it will be in the
lock queue (either one will get full credit): (+1)
o (Possibility 1) T2 may have done a fetch-and-store prior to T1.
o (Possibility 2) T2 is yet to do its fetch-and-store.

ii. (2 points) What does each of T1 and T2 know about the state of the
data structures at this point of time?

Possibility 1:
T2 knows its predecessor is curr (+1)
T1 knows its predecessor is T2 (+1)

Possibility 2:
T2 does not know anything about the queue associated with L (+1)
T1 knows its predecessor is curr (+1)

iii. (2 points) What sequence of subsequent actions will ensure the correct
formation of the waiting queue of lock requestors behind the current
lock holder?

Possibility 1:
T2 will set the next pointer in curr to point to T2. (+1)
T1 will set the next pointer in T2 to point to T1. (+1)

Possibility 2:
T1 will set the next pointer in curr to point to T1. (+1)
T2 will do a fetch-and-store on L->next; this will result in two
things: (+1)
o T2 will get its predecessor T1 and will set the next pointer in T1
to point to T2
o L->next will now point to T2
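The fetch-and-store-based enqueue and handoff can be sketched with C11 atomics (a simplified version of the MCS algorithm; a production implementation would add explicit memory-order annotations and pad the qnodes against false sharing):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified MCS lock sketch using C11 atomic_exchange as the
 * fetch-and-store primitive. Illustrative only. */

struct qnode {
    struct qnode *_Atomic next;
    atomic_bool locked;             /* true while waiting on predecessor */
};

typedef struct qnode *_Atomic mcs_lock;   /* points to tail of queue */

static void mcs_acquire(mcs_lock *L, struct qnode *me) {
    atomic_store(&me->next, (struct qnode *)NULL);
    atomic_store(&me->locked, true);
    /* fetch-and-store(L, me): swap ourselves in as tail, get old tail */
    struct qnode *pred = atomic_exchange(L, me);
    if (pred == NULL)
        return;                     /* queue was empty: lock acquired */
    atomic_store(&pred->next, me);  /* link behind predecessor */
    while (atomic_load(&me->locked))
        ;                           /* spin on our own qnode */
}

static void mcs_release(mcs_lock *L, struct qnode *me) {
    struct qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong(L, &expected,
                                           (struct qnode *)NULL))
            return;                 /* no successor: lock is now free */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                       /* successor mid-enqueue: wait for link */
    }
    atomic_store(&succ->locked, false);   /* hand lock to successor */
}
```

The window between the `atomic_exchange` and the `pred->next` store is exactly the intermediate state in part (i): the tail already points at the new requestor, but the predecessor link is not yet set, which is why release may have to wait for the link.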

(c) (MCS barrier)
i. (6 points) The MCS barrier uses a 4-ary arrival tree where the nodes
are the processors and the links point to children. What will the
arrival tree look like for 16 processors labeled P0-P15? Use P0 as the
root of the tree. You can either draw the tree or give your answer by
showing the children for each of the 16 processors below:
P0: [P1, P2, P3, P4 ]
P1: [P5, P6, P7, P8 ]
P2: [P9, P10, P11, P12 ]
P3: [P13, P14, P15, X ]
P4: [ ]
P5: [ ]
P6: [ ]
P7: [ ]
P8: [ ]
P9: [ ]
P10: [ ]
P11: [ ]
P12: [ ]
P13: [ ]
P14: [ ]
P15: [ ]
(+1 for each of the rows P0 thru P3)
(+2 for showing no children for P4-P15)
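The child relation can be computed directly: in this 4-ary arrival tree, the children of Pi are P(4i+1) through P(4i+4), clipped to the number of processors (a sketch matching the table above):

```c
/* Sketch: children of Pi in a 4-ary arrival tree over n processors are
 * P(4i+1) .. P(4i+4), dropping any index that exceeds n-1. */

/* Writes up to 4 child indices into out[]; returns how many exist. */
static int arrival_children(int i, int n, int out[4]) {
    int count = 0;
    for (int k = 1; k <= 4; k++) {
        int child = 4 * i + k;
        if (child < n)
            out[count++] = child;   /* e.g., P3 gets P13, P14, P15 only */
    }
    return count;
}
```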
ii. (3 points) What is the reason for such a construction of the arrival
tree? (Concise bullets please)
Unique and static location for each processor to signal barrier
completion
Spinning on a statically allocated local word-length variable by
packing data for four processors reduces bus contention
The 4-ary tree construction showed the best performance on the Sequent
Symmetry used in the experimentation in the MCS paper
(+1 for each bullet above)
(d) (4 points) (LRPC)
Recall that light-weight RPC (LRPC) is for cross-domain calls within a
single host without going across the network. The kernel allocates A-stack
in physical memory and maps this into the virtual address space of the
client and the server. It also allocates an E-stack that is visible only to
the server. What is the purpose of the E-stack? (Concise bullets please)
By procedure calling convention, the server procedure expects the actual
parameters to be in a stack in its address space. E-stack is provided for
this purpose.
The arguments placed in the A-stack by the client stub are copied into the E-
stack by the server stub. Once this is done, the server procedure can
execute as it would in a normal procedure call using the E-stack.

(+2 for each bullet)
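The server-stub copy can be sketched as follows (illustrative names; in the real LRPC design the A-stack is mapped pairwise into the client and server domains by the kernel):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative LRPC sketch: the client stub marshals arguments into the
 * shared A-stack; the server stub copies them into the server-private
 * E-stack before invoking the procedure. Names are hypothetical. */

#define ASTACK_SIZE 1024

struct astack {                 /* mapped into both client and server */
    unsigned char data[ASTACK_SIZE];
    size_t used;                /* bytes of marshaled arguments */
};

/* Server stub: copy marshaled arguments from the A-stack into the
 * E-stack, after which the server procedure runs as in a normal call. */
static size_t copy_to_estack(const struct astack *a,
                             unsigned char *estack, size_t cap) {
    size_t n = a->used < cap ? a->used : cap;
    memcpy(estack, a->data, n);
    return n;
}
```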

(e) (4 points) (multiprocessor scheduling)
A processor has 2 cores and each core is 4-way multithreaded. The last
level cache of the processor is 32 MB. The OS has the following pool of
ready to run threads:
Pool1: 8 threads each having a working set of 1 MB (medium priority)
Pool2: 3 threads each having a working set of 4 MB (highest priority)
Pool3: 4 threads each having a working set of 8 MB (medium priority)
The OS can choose any subset of the threads from the above pools to schedule
on the cores. Which threads should be scheduled that will make full use of
the available parallelism and the cache capacity while respecting the thread
priority levels?

Pool 1 - 3 threads (3 MB)
Pool 2 - 3 threads (12 MB)
Pool 3 - 2 threads (16 MB)

Total: 8 threads (one per hardware thread), cache used: 31 MB

(-1 if all Pool 2 threads not scheduled)
(-3 if only Pool 1 threads scheduled)
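The selection can be checked mechanically (a sketch; `fits` only validates a candidate selection against the hardware-thread and cache budgets, it does not search for one):

```c
/* Sketch: 2 cores x 4-way SMT gives 8 hardware threads, and the chosen
 * working sets must fit within the 32 MB last-level cache. */

struct pick { int count; int ws_mb; };   /* threads picked, WS of each */

static int fits(const struct pick p[], int npools,
                int hw_threads, int cache_mb) {
    int threads = 0, mb = 0;
    for (int i = 0; i < npools; i++) {
        threads += p[i].count;
        mb += p[i].count * p[i].ws_mb;   /* aggregate working set */
    }
    return threads <= hw_threads && mb <= cache_mb;
}
```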

Parallel System Case Studies
6. (5 mins, 10 points) (Tornado)
(a) (5 points)
Each core in the figure below is 4-way hardware multithreaded. The workload
managed by the OS consists of multiple multithreaded processes. You are
designing a thread scheduler as a clustered object. Give the design
considerations for this object. Discuss with concise bullets the
representation you will suggest for such an object from each core
(singleton, partitioned, fully replicated).

Use one representation of the thread scheduler object for each processor
core.
Each representation has its own local queue of threads.
No need for locking the queue for access since each representation is unique
to each core
Each local queue is populated with at most 8 threads since each processor is
4-way hardware multithreaded (this is just a heuristic to balance interactive
threads with compute-bound threads). The threads in a local queue could be a
mix of threads from single-threaded processes and multi-threaded processes.
Each local queue is populated with the threads of the same process (up to a
max of 4) when possible.
If a process has fewer than 16 threads, then its threads are placed in the
local queues of the 4 cores in a single NUMA node.
If a process has more than 16 threads, then its threads are split across
multiple local queues so that the threads of the same process can be co-
scheduled on the different nodes (often referred to as gang scheduling)
Implement an entry point in the thread scheduler object for peer
representations of the same object to call each other for work stealing.

(This is a bit open ended so we have to be lenient in grading.


Give full credit if their answer explains one of the following
(a) one representation per processor core
OR
(b) one representation per NUMA node (shared by the 4 cores)
AND
a good reasoning to back the choice)

(-2) if not one of (a) or (b)


(-3) if no reasoning
(-2) if incomplete reasoning

(b) (5 points)
For the above multiprocessor, we are implementing a DRAM object to physical
memory. Give the design considerations for this object. Discuss with
concise bullets the representation you will suggest for such an object from
each core (singleton, partitioned, fully replicated).

One representation of the DRAM object for each NUMA node (i.e., shared by
all the 4 cores of that node). Each representation manages the DRAM at that
node for memory allocation and reclamation. (+3)
Core- and thread-sensitive allocation of physical frames to ward off false
sharing among threads running on different cores of the NUMA node.
(+2)

(-2) if no mention of allocation for different cores


(-2) if NUMA-ness does not figure in choice of representation

