Final Exam Notes
Multiprocessor systems --> which allow multiple processes to run in parallel at the same time, are increasingly commonplace. These systems use multicore processors, in which multiple CPU cores are packed onto a single chip.
A primary problem with such systems is that a typical application (i.e., some C program you wrote) only uses a single CPU; adding more CPUs does not make that single application run faster. To remedy this problem, you'll have to rewrite your application to run in parallel, perhaps using threads. Multi-threaded applications can spread work across multiple CPUs and thus run faster when given more CPU resources.
Caches are small, fast memories that (in general) hold copies of popular data that is found in the main
memory of the system. Main memory, in contrast, holds all of the data, but access to this larger
memory is slower. By keeping frequently accessed data in a cache, the system can make the large,
slow memory appear to be a fast one.
In a system with a single CPU, there is a hierarchy of hardware caches that, in general, helps the processor run programs faster.
Consider a program that issues an explicit load instruction to fetch a value from memory, and a simple
system with only a single CPU; the CPU has a small cache (say 64 KB) and a large main memory.
The first time a program issues this load, the data resides in main memory, and thus takes a long time
to fetch (perhaps in the tens of nanoseconds, or even hundreds). The processor, anticipating that the
data may be reused, puts a copy of the loaded data into the CPU cache. If the program later fetches
this same data item again, the CPU first checks for it in the cache; if it finds it there, the data is
fetched much more quickly (say, just a few nanoseconds), and thus the program runs faster.
Caches are thus based on the notion of locality, of which there are two kinds: temporal locality and spatial locality. The idea behind temporal locality is that when a piece of data is accessed, it is likely to be accessed again in the near future; imagine variables or even instructions themselves being accessed over and over again in a loop. The idea behind spatial locality is that if a program accesses a data item at address x, it is likely to access data items near x as well; here, think of a program streaming through an array, or instructions being executed one after the other. Because locality of these types exists in many programs, hardware systems can make good guesses about which data to put in a cache and thus work well.
Cache Coherence Problem: Imagine, for example, that a program running on CPU 1 reads a data item (with value D) at address A; because the data is not in the cache on CPU 1, the system fetches it from main memory and gets the value D. The program then modifies the value at address A, just updating its cache with the new value D'; writing the data all the way through to main memory is slow, so the system will (usually) do that later. Then assume the OS decides to stop running the program and move it to CPU 2. The program then re-reads the value at address A; there is no such data in CPU 2's cache, and thus the system fetches the value from main memory and gets the old value D instead of the correct value D'.
Solution To Cache Coherence ---> Bus Snooping: Each cache pays attention to memory updates by
observing the bus that connects them to main memory. When a CPU then sees an update for a data
item it holds in its cache, it will notice the change and either invalidate its copy (i.e., remove it from its
own cache) or update it (i.e., put the new value into its cache too).
Race Conditions Problem: This problem arises when multiple CPUs want to access and update the same shared resource at the same time. Let's say that a linked list is being shared across multiple CPUs, and one CPU removes all the data from the list while, at the same time, another CPU wants to get the ith item from the list. This can lead to unexpected behaviour or inconsistency in the data.
Solution To Race Conditions ---> Mutual Exclusion. This refers to the use of locks. Whenever we are updating or accessing a shared resource in our program, we must first lock the shared resource, indicating that no other CPU can access/update it until this CPU releases it, then do the required work, and once the CPU is done using the shared resource, it will unlock it. This is studied ahead in much more detail.
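A minimal sketch of this idea using POSIX threads, where two threads updating one shared counter stand in for two CPUs touching the same shared resource (this example is not from the notes):

#include <pthread.h>
#include <stdio.h>

static int counter = 0;                            // the shared resource
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                 // lock before touching the shared counter
        counter++;                                 // critical section: one thread at a time
        pthread_mutex_unlock(&lock);               // unlock so the other thread can enter
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);             // always 200000 with the lock in place
    return 0;
}

Without the lock/unlock pair, the two increments could interleave and the final count would often be less than 200000, which is exactly the race condition described above.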
Cache Affinity Problem: When a process runs on a particular CPU, it builds up a fair bit of state in the
caches (and TLBs) of the CPU. The next time the process runs, it is often advantageous to run it on the
same CPU, as it will run faster if some of its state is already present in the caches on that CPU. If,
instead, one runs a process on a different CPU each time, the performance of the process will be
worse, as it will have to reload the state each time it runs. This idea of running a process on the same
CPU it was previously running on is known as cache affinity.
Solution To Cache Affinity --> The multiprocessor scheduler, which decides which process should get the next use of which CPU, should consider cache affinity when making its scheduling decisions, thereby preferring to keep a process on the same CPU if possible.
MULTIPROCESSOR SCHEDULING POLICIES: By these policies, the OS decides which ready process should get the next use of which CPU.
Policy: Single Queue Multiprocessor Scheduling (SQMS). There is a single queue of all ready processes, globally available to all CPUs. From this queue, the first available job at the front of the queue is sent to the first available CPU.
Problem 1: Lack Of Scalability --> To ensure the scheduler works correctly on multiple CPUs, the developers will have inserted some form of locking into the code. With this policy, locks can greatly reduce performance, particularly as the number of CPUs in the system grows, because the system spends more and more time in lock overhead (i.e., waiting for the lock to be released) and less time doing the work it should be doing.
Problem 2: Lack Of Cache Affinity ---> With this policy, the scheduler does not take into account which process ran on which CPU. It simply gives the first available CPU to the first ready process, instead of trying to run a process on the same CPU it previously ran on.
To handle this problem, most SQMS schedulers include some kind of affinity mechanism to try to make it more likely that a process will continue to run on the same CPU if possible. Specifically, one might provide affinity for some jobs, but move others around to balance load. See below:
Due to these limitations of the SQMS policy, we try to derive some other policy for multiprocessor systems:
Policy: Multi-Queue Multiprocessor Scheduling (MQMS). There are multiple queues for handling ready processes. Each CPU has one queue, and each CPU takes on processes from its own queue only. Each queue will likely follow a particular scheduling discipline, such as round robin (give each process in the queue a specific time slice to run on the CPU). When a job enters the system, it is placed on exactly one scheduling queue, according to some heuristic (e.g., random, or picking one with fewer jobs than others). Then it is scheduled essentially independently, thus avoiding the problems of information sharing and synchronization found in the single-queue approach.
Assume we have a system where there are just two CPUs (labeled CPU 0 and CPU 1), and some
number of jobs enter the system: A, B, C, and D for example. Given that each CPU has a scheduling
queue now, the OS has to decide into which queue to place each job. It might do something like this:
Depending on the queue scheduling policy, each CPU now has two jobs to choose from when deciding
what should run. For example, with round robin, the system might produce a schedule that looks like
this:
We can see that cache affinity is always provided here: since each CPU has its own queue, when a process leaves a CPU, it goes back into the queue of the CPU it was running on, so the next time it runs it will be loaded from the same queue onto the same CPU. Moreover, MQMS is much more scalable, because as the number of CPUs grows, the number of queues also grows, due to which lock and cache contention should not become a problem.
Problem: Load Imbalance ---> It is not always the case that all queues have the same number of processes at all times. Some queues might have fewer or smaller processes, due to which the CPU serving that queue will finish its processes much more quickly than the others; its queue will then be empty, and that CPU will just sit idle, doing no useful work (i.e., wasting CPU cycles).
Let’s assume we have the same set up as above (four jobs, two CPUs), but then one of the jobs (say C)
finishes. We now have the following scheduling queues:
As you can see from this diagram, A gets twice as much CPU as B and D, which is not the desired
outcome. Even worse, let’s imagine that both A and C finish, leaving just jobs B and D in the system.
The two scheduling queues, and resulting timeline, will look like this:
Solution: Migration ---> Move/migrate jobs from the CPU which has a higher load to the CPU which has a comparatively lower load. In this way, a true load balance can be achieved, i.e., each CPU having roughly the same amount of work.
In the case where CPU 0 is left idle, the OS should simply move one of B or D to CPU 0. The result of this single job migration is an evenly balanced load, and everyone is happy.
In the earlier case (A alone on CPU 0 while B and D share CPU 1), a single migration does not solve the problem; it requires continuous migration of one or more jobs. We can keep switching jobs as seen below:
So there are multiple different techniques for migration which the OS can use. One basic approach is a technique known as work stealing. With a work-stealing approach, a (source) queue that is low on jobs will occasionally peek at another (target) queue to see how full it is. If the target queue is (notably) more full than the source queue, the source will "steal" one or more jobs from the target to help balance load. A small sketch of this check follows.
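A rough, hypothetical sketch of that check in C (queue_t and its single njobs field are made up purely for illustration; a real scheduler's queues and locking are far more involved):

typedef struct { int njobs; } queue_t;             // illustrative only: just a job count

// Run occasionally by a CPU whose own (source) queue is low on jobs.
void maybe_steal(queue_t *source, queue_t *target) {
    if (target->njobs > source->njobs + 1) {       // target is notably fuller than source
        target->njobs--;                           // "steal" one job from the target...
        source->njobs++;                           // ...and place it on our own queue
    }
}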
No single common multiprocessor scheduler has emerged; in Linux, three different kinds of multiprocessor schedulers exist:
O(1) scheduler,
Completely Fair Scheduler (CFS),
BF Scheduler (BFS)
Both O(1) and CFS use multiple queues, whereas BFS uses a single queue, showing that both
approaches can be successful. O(1) scheduler is a priority-based scheduler (similar to the MLFQ
discussed before), changing a process’s priority over time and then scheduling those with highest
priority, thereby making sure interactivity is a particular focus. CFS and BFS, in contrast, use a
deterministic proportional-share approach.
CHAPTER 13:
EARLY COMPUTER SYSTEMS:
The computer systems back in the day ran only one program at a time, meaning that only one process could be running at any moment. Therefore, they were relatively simple and easy to build. Moreover, not much abstraction was provided to the users.
The operating system was basically a library (a set of routines), which sat in memory at physical address 0. There would be exactly one running program (i.e., one process), which took up all of the remaining physical memory.
Reason For Such A Simple Structure: Back in the day, tasks were relatively simple and users did not expect much. There were no headaches of reliability, high performance, or ease of use.
Back in the day, machines were expensive, so people wanted them to be used much more effectively. Thus the era of multiprogramming was born:
Multiple processes were READY to run at a given time, and the OS would switch between these ready processes when the running process became blocked. For example, there would be exactly one process which is running (i.e., using the CPU), and it requests some I/O from the hard disk, which is time consuming. So the OS would block this running process until its I/O is finished, and in the meantime give the CPU to some other ready process so that it can run. This prevented the waste of CPU cycles during I/O etc.
Thus in this approach, there is exactly one running process using the CPU, but multiple ready processes present in physical memory. By switching between ready and running processes and thereby reducing wasted CPU cycles, the CPU was utilized much more effectively, which increased efficiency.
To further increase efficiency, the era of time sharing was born:
Instead of switching between processes only when I/O was requested, time sharing would allow one process to run for a short while on the CPU, then stop it, load some other ready process, and give it its time slice for which it would run. Thus a context switch would be made after the desired time slice; one way to implement this would be to stop the currently running process, save its entire state to some kind of disk, load another process's state from the disk, and run it for a while. And this process repeats.
Saving and restoring a process's whole state to and from disk on every context switch is, however, far too slow. To solve this problem, instead of keeping the ready processes on the hard disk, all the ready processes are kept in physical memory itself. Therefore, when a context switch is made, the state of the process is stored in physical memory, and the state of another ready process is loaded from physical memory itself. Since access to physical memory is much faster than access to the disk, this approach is much faster.
For this approach to work safely, without one process interfering with any other process, the OS needs to create some form of abstraction of physical memory which allows a process to access only its own space within physical memory and not the space of some other process. This form of abstraction is called the ADDRESS SPACE ---> the running program's view of memory.
ADDRESS SPACE:
The address space of a process contains all of the memory state of the running program. This includes
the following:
The code of the program lives in the Code Segment of the address space.
When a program is running, a stack is used to keep track of where it is in the function call chain, as well as to allocate local variables and pass parameters and return values to/from routines.
The heap is used for dynamically-allocated, user-managed memory, e.g., objects created using malloc (studied ahead).
Free space is available to allow the stack and heap to grow.
The code segment resides at the top of the address space. Below the code segment is the heap, followed by the free space, with the stack at the very end (bottom) of the address space. This arrangement matters because the heap grows positively (i.e., downwards, towards higher addresses), whereas the stack grows negatively (i.e., upwards, towards lower addresses).
The address space we saw above runs from 0KB to 16KB. This means that the addresses from 0KB to 16KB are used by this process. However, these addresses from 0KB to 16KB are not the actual physical addresses where the program resides in physical memory; they are the addresses where the program thinks it is, i.e., virtual addresses.
Therefore, a program will always think it is placed at address 0 of physical memory. But in reality, the program is placed somewhere else, at some other address in physical memory, because at physical address 0 the OS itself is placed.
This creation of an abstraction/address space for each process is what we mean when we say the OS is virtualizing physical memory ---> the running program thinks it is loaded into memory at a particular address and has potentially the entire physical memory available to it, but in reality it does not.
With this virtualization provided by the OS, the OS will need to make sure the following mechanism also exists for the program to run correctly:
Let's say that the process tries to perform a load from address 0; this is the virtual address where the process thinks its data is. The OS (with some hardware support) has to make sure that the load does not actually go to physical address 0, but rather to the physical address (say 32KB) where the process is actually loaded in physical memory. This is the key to virtualization and is known as address translation (studied ahead in detail).
1) TRANSPARENCY: The virtual memory should be invisible to the running program. IE the running
program should not be aware of the fact that the memory is virtualized; it should think that it has its
own physical memory.
2) EFFICIENCY: To make virtualization efficient, both in terms of time and space, the OS will need
some kind of hardware support (studied ahead)
3) PROTECTION: The OS should make sure to protect processes from one another as well as to protect
itself from other processes IE some running process should not be able to access or affect in any way
the memory contents of any other process or the OS itself (because the OS itself is a process residing
in the physical memory). Hence a program must not access anything outside its address space.
This protection delivers the property of Isolation among processes--> each process should be running
in its own isolated space safe from other processes.
OUTPUT:
By looking at these addresses, we can see that the program thinks that its code is placed at the top, at address 0x1095afe50, followed by the heap below it at address 0x1096008c0, and finally the stack at the bottom at address 0x7fff691aea64.
If we add more things to the heap, their addresses will increase, because the heap grows positively, i.e., downwards towards higher addresses. But if we add more things to the stack, their addresses will decrease, because the stack grows negatively, i.e., upwards towards lower addresses.
Moreover, these addresses are the virtual addresses where the program thinks it is; they are NOT the physical addresses of where these segments of the program are in main memory. Thus the OS needs to make sure to take these virtual addresses and translate them into the actual physical addresses where they reside in main memory.
CHAPTER 14:
TYPES OF MEMORY:
In a running C program, there are two types of memory that are allocated,
--->Stack Memory: In this memory, allocations and de-allocations are managed implicitly by the compiler. Hence it is also called automatic memory. This means that if a function allocates any objects on the stack, then as soon as the function ends/returns, all of these objects will be removed/deallocated from the stack automatically.
--->Heap Memory: In this memory, all allocations and de-allocations are managed explicitly by the programmer. This means that if a programmer wants to allocate some object on the heap, then he himself will have to add it to the heap, and this object will be removed from the heap only when the programmer removes/deallocates it.
Example:
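The code for this example is presumably along these lines (a function that allocates an integer on its stack):

void func() {
    int x;       // x is allocated on the stack by the compiler
    x = 5;
}                // when func() returns, x is automatically deallocated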
When func() is called, then some space is made for this function in the stack memory:
The integer x is stored on this stack, and any other objects which the function creates will be allocated on this stack. As soon as we return from the function, the compiler deallocates the stack memory. This means that x and all other objects which this function created on the stack are no longer alive beyond the function's lifespan.
However, if we want objects to live beyond the function call (i.e., even after the function ends, the object created by the function should remain alive), then instead of saving them on the stack we must allocate them on the heap, because once an object is on the heap it is up to us when we want to deallocate it.
Example:
This is an example of a function which creates an object on the heap. Let's understand this instruction (shown below):
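The instruction being discussed is presumably the heap version of func(), along these lines:

#include <stdlib.h>

void func() {
    int *x = (int *) malloc(sizeof(int));   // x (on the stack) points to an int on the heap
}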
--->sizeof(int) returns the number of bytes that an integer takes in memory, i.e., (typically) 4 bytes
--->malloc(sizeof(int)) simply reserves enough bytes on the heap to store one integer.
So now the space is created in heap memory. Now we must have something on the stack by which we can refer to this heap space. Therefore, we create a pointer to this heap space. This pointer is created on the stack and will point to an integer on the heap. For this we used
---> int* x .... thus x is now a pointer pointing to this memory we created on the heap
---> (int *) is called type casting; it ensures that whatever value is returned from the malloc() call is first converted into type int *, so that the returned value from malloc() has the same data type as our x.
[Diagram: x on the STACK points to the newly allocated integer on the HEAP]
The malloc() call is not a system call but a library call belonging to the library "stdlib.h".
-->The malloc() call takes exactly one parameter, which is the size in bytes of the space that you want to create on the heap.
---> The call will simply create a space of the given number of bytes on the heap and return the address of this heap space. Hence, we store the address of the space we created in some pointer variable, so that we can later use the pointer on the stack to refer to this heap space.
--->If the call was not able to allocate the space on the heap, then it will return NULL.
We do not directly pass the size in bytes of the space we want to create (like 1, 2, 4, 5 bytes, etc.). Instead, we use the sizeof() operator on the data type which we want this space to hold to get its size in bytes, and pass this value to malloc().
Moreover, we can see that the return type of malloc() is void*. This means that it can return a pointer to any data type, i.e., it can return the address of any kind of object. Thus, to make malloc() generally usable for all data types, void* is used. Since the return type is void*, we used type casting during the malloc() call to convert the returned void* pointer into the data type of the object we are creating. In C the cast is not strictly required (a void* converts to other object-pointer types automatically), so leaving it out is not an error; the cast mainly serves as documentation, and C++ compilers would otherwise complain about the type mismatch.
double* D = (double*)malloc(sizeof(double));
char* C = (char*)malloc(sizeof(char));
//The pointer Array will be pointing to the very first element of the array
//Array++ will move the pointer to point to the next element of the array
//(*Array)++ will increment the value pointed to by Array by 1
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *x = (int*)malloc(10 * sizeof(int));
printf("%zu\n" , sizeof(x));
}
//Above we are printing sizeof(x). In this program x acts as a simple pointer to an integer. Hence sizeof(x) will return the size of the pointer x itself, which is 4 bytes on a 32-bit system (8 bytes on a typical 64-bit system), not the size of the allocated array. Hence this program will print 4 (or 8).
#include <stdio.h>
#include <stdlib.h>
int main()
{
int x[10];
printf("%zu\n" , sizeof(x));
}
//This code simply creates an integer array of 10 elements on the stack, the name of which is x. Here x is not a pointer to an integer; instead, x is an array. Therefore, when sizeof(x) is called here, it returns the size of the whole array, which is 40 bytes (10 * 4). Hence 40 will be printed here.
#include <stdlib.h>
#include <string.h>
int main()
{
char* Original = "Mannan";
char* Copy = (char* )malloc(strlen(Original) );
strcpy(Copy , Original);
}
[Diagram: Original points to the characters M a n n a n followed by the terminating '\0']
In C, a '\0' is called a null terminator; it is placed at the end of a string to indicate that the string has ended. It is important that this is included at the end of strings, because the string library of C relies on the null terminator to perform its tasks, and if it is not present it can lead to errors.
strlen() gives the length of the string in characters, not including the null terminator (which is not counted as part of the string's length). This string has 6 characters, each occupying 1 byte; therefore the characters occupy 6 bytes of space, and with the null terminator the string needs 7 bytes.
[Diagram: Copy on the STACK points to the 6-byte space allocated on the HEAP]
--->strcpy(Copy , Original);
This simply copies the characters of Original into Copy. But note that Copy only has space for the characters of Original; it does not include space for the null terminator. strcpy() still writes a null terminator, but it lands one byte past the end of the space we allocated, i.e., a buffer overflow.
[Diagram: Copy points to M a n n a n on the HEAP, with the terminating '\0' falling outside the 6-byte allocation]
This is dangerous, because when Copy is used later in the program it can cause errors due to the overflow and the possibly missing or corrupted null terminator. Therefore, while copying strings, always allocate space for 1 more byte to leave room for the null terminator, which occupies 1 byte.
#include <stdlib.h>
#include <string.h>
int main()
{
char* Original = "Mannan";
char* Copy = (char* )malloc(strlen(Original) + 1 );
strcpy(Copy , Original);
}
Now the null terminator will automatically be added at the end of Copy (strcpy writes it), because there is space left for it.
So we have learned that by using malloc() we can allocate space for an object on the heap. But once we are done using this object, it is our responsibility to free the space that we allocated using malloc().
To do so, i.e., to free the space for an object on the heap, we use the library call:
free(PointerToHeapObject);
This function simply takes the pointer which points to our space on the heap and releases the heap space pointed to by this pointer:
int* x = (int*)malloc(sizeof(int));
….
….
free(x); //releasing the heap space
x = NULL; // prevent x from dangling, i.e., pointing to freed memory
SOME COMMON ERRORS WE CAN ENCOUNTER WHILE WORKING WITH malloc AND free
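(The program being discussed below is presumably the earlier copy example, written without room for the null terminator:)

#include <stdlib.h>
#include <string.h>
int main()
{
char* Original = "Mannan";
char* Copy = (char* )malloc(strlen(Original));   // no +1: no room left for the '\0'
strcpy(Copy , Original);
}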
This program will appear to work, but there is a logical error here, because Copy does not have room for the null terminator that marks the end of the string. This can lead to overreading (or writing past the end of) the Copy buffer later on, because nothing inside the allocation marks the end of the string.
When you forget to free space allocated on the heap, memory leaks build up and the heap may eventually become full. This is an operational problem in long-running applications rather than in small short-lived ones:
REASON: When a program completes its execution, the OS will automatically clean up everything related to the process, including the heap. Therefore, if you forget to clean up the heap in a small program, then since the program is small, it will complete its execution before the heap becomes full, due to which no errors will occur. But for a long-running application, forgetting to free heap objects might cause the heap to become full while the program is still running, leading to memory-leak errors where there is no more space left on the heap.
The system performs two levels of memory management. The first level is done by the OS, which hands out memory to processes (i.e., the address space) when they run and takes it back when the process exits/dies. The second level of management is done within each process, e.g., within the heap. Hence, even if you fail to call free(), the OS will reclaim all the memory of the process (including the stack, code, heap, etc. for that process) when the process is done running. This ensures that no memory is permanently lost despite the fact that you didn't free it.
int* x = (int*)malloc(sizeof(int));
x = NULL;
free(x);
Here, x was first pointing to an object on the heap, but setting it to NULL removes that connection, so x now points to nothing. Calling free(x) afterwards is then a no-op (freeing a NULL pointer is defined to do nothing), so the heap object we created is never freed: it remains leaked until the program finishes.
int* x = (int*)malloc(sizeof(int));
free(x);
int* x = (int*)malloc(sizeof(int));
Here we will encounter a compile-time error saying that the pointer x is declared twice. First we declared x to point to an integer; when we free the object it points to, only the heap object is destroyed, and the variable x itself still remains. Thus, when we redeclare x in the same scope while it still exists, we get a redeclaration error.
A related error is freeing heap memory before we are done using it: any later use of that freed memory (through the now-dangling pointer) can behave unpredictably.
Freeing the same memory more than once is known as double free error.
free() expects us to pass the exact pointer that we received from malloc() so it can free the memory pointed to by it. If anything else is passed into free(), the program's behaviour is undefined and it will likely misbehave.
It must be remembered that malloc() and free() are library calls and not system calls, because we have to include a library header in our program to use them, i.e., stdlib.h.
Therefore, the malloc() library manages the space within our virtual address space and itself is built
on top of some system calls which call into the OS to ask for more memory or release some back into
the system.
One such system call is 'brk', which is used to change the location of the program's break (i.e., the location of the end of the heap).
brk takes one argument, which is the address of the new break:
-->If NewBreak > CurrentBreak, then the size of the heap increases
--> If NewBreak < CurrentBreak, then the size of the heap decreases
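A tiny illustration (not from the notes) using the related sbrk() call, which moves the break by a relative amount and returns the previous break:

#include <unistd.h>
#include <stdio.h>

int main() {
    void *old_break = sbrk(0);      // current end of the heap (the break)
    sbrk(4096);                     // grow the heap by 4KB
    void *new_break = sbrk(0);
    printf("break moved from %p to %p\n", old_break, new_break);
    return 0;
}

In practice, programs never call brk or sbrk directly; they rely on malloc()/free(), which call into these for them.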
There is another system call which is used to obtain memory from the OS --> mmap().
This system call creates an anonymous memory region within the program, i.e., a region which is not associated with any particular file. This memory region can then be treated as a separate heap.
-->calloc(): This is just like malloc(), but it also zero-initializes the allocated memory, so that there are no uninitialized-read errors.
-->realloc(): This is used to change the size of an already created dynamic object, such as an array. It makes a new, larger region of memory on the heap, copies the old region into it, and returns a pointer to this new region.
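A short illustration of both calls (not from the notes):

#include <stdlib.h>

int main() {
    int *a = (int *) calloc(10, sizeof(int));            // 10 ints, all initialized to zero
    int *bigger = (int *) realloc(a, 20 * sizeof(int));  // grow to 20 ints; old contents are copied
    if (bigger != NULL)
        a = bigger;                                      // realloc may move the region, so use the new pointer
    free(a);                                             // release the heap space when done
    return 0;
}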
CHAPTER 15:
Previously we saw that memory is virtualized, i.e., each program in memory has its own address space. To virtualize memory properly, the OS needs to ensure two things: efficiency and control.
Moreover, virtualization must also provide flexibility to programs, i.e., programs should be able to use their own address space in whatever way they like.
To begin memory virtualization, we will make three assumptions, each of which we shall relax one after another in the chapters ahead:
ASSUMPTION 1) The user's address space must be placed contiguously in memory
ASSUMPTION 2) The size of the address space is less than that of physical memory
ASSUMPTION 3) Each address space is of the exact same size
We previously learned about virtual address and physical address. Virtual address is the address
where the program thinks it is in the main memory. But physical address is the actual address where
the program/process is in the main memory.
If the OS is designed such that each process is allocated an address space of 16KB, then it means that
from the program’s perspective, its address (virtual) starts from address 0KB and grows to a maximum
address of 16KB. THIS IS WHAT THE PROGRAM THINKS ie VIRTUAL ADDRESS:
[Figure: the process relocated to 32KB in physical memory; its instructions at virtual addresses 128, 132 and 135 end up at physical addresses 32896, 32900 and 32903]
Therefore, in actuality the process is not at address 0 (but at address 32KB in this case). Thus, when the process asks to fetch the instructions at addresses 128, 132 and 135, there is a need to translate these virtual addresses into the physical addresses (32896, 32900, 32903), so that the instructions are fetched from the actual addresses where they sit in main memory.
The address translation from virtual address to physical address is done by additional hardware called the Memory Management Unit (MMU). This hardware makes use of an interposition technique for address translation ---> this means that on each memory access, the hardware will interpose and translate the virtual address issued by the process into the physical address where the desired information is actually stored.
The benefit of interposition is that it maintains transparency, i.e., the process does not know that its memory is virtualized.
Why is additional hardware used for address translation, and not the OS?
Address translation is needed for every memory access issued by the process. If this were done in software by the OS, then on each memory access some CPU time would be spent doing the calculations for the address translation. This would waste a lot of clock cycles, making the system very inefficient and slow.
Therefore, to make the system efficient by not wasting clock cycles, additional hardware is set up to perform the address translations whenever a memory access is made.
-->The technique used for address translation is called Dynamic Relocation, or the Base And Bounds technique
Each CPU has two hardware registers: Base Register And Bounds/Limit Register
Each program thinks that it is loaded at address 0 of main memory. However, when the program starts running, the OS decides where in main memory the process should actually be loaded, and it sets the Base Register to hold the address at which the program is actually loaded in main memory.
In the previous example, the OS decided to load the process at address 32KB, i.e., at address 32 * 1024 = 32768 bytes. Thus the base register is set to this value: Base Register = 32768
Now when any memory reference/access is generated by the process, the actual physical address is calculated as:
Physical Address = Virtual Address + Base
In the previous example, the process made memory accesses to virtual addresses 128, 132 and 135. Thus the corresponding physical addresses are:
--> Physical Address = 128 + 32768 = 32896
--> Physical Address = 132 + 32768 = 32900
--> Physical Address = 135 + 32768 = 32903
Therefore, on a memory access to virtual address 128, the hardware will fetch data from address 32896 of main memory.
In this way, the virtual addresses are translated into the physical addresses by the MMU
EXAMPLE) A PROCESS IS LOADED AT ADDRESS 32KB. THE PROCESS MAKES A MEMORY ACCESS FOR
ADDRESS 15 KB (VIRTUAL). WHERE WILL THE MEMORY ACCESS BE MADE IN THE MAIN MEMORY?
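Working this out with the values from the running example (Base = 32KB, Bounds = 48KB): Physical Address = 15KB + 32KB = 47KB. Since 47KB is less than the bounds of 48KB, the access is legal, so the memory access will be made at physical address 47KB of main memory.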
This technique of address translation is called Dynamic Relocation because the relocation of addresses happens at runtime (i.e., the conversion occurs while the process is running).
This allows the OS to move the address space of a process even while it is still running.
The Bounds/Limit register holds the physical address of the main memory where the process’s
address space ends
For example, if a process begins at address 32KB and each process is to be allocated an address space
of 16KB, then the address space of this process will end at address 32+16 = 48KB. Thus the bounds
register will hold 48KB
The bounds/limit register is required to ensure protection and control, i.e., the processor will first check whether or not the memory access made is within the bounds, to make sure it is legal.
In the previous example, let's say that the 16KB process loaded at address 32KB wants to access the memory at virtual address 17KB; then:
Physical Address = 17KB + 32KB = 49KB, which is greater than the Bounds (48KB). This means that the process wants to make a memory access outside the bounds of its address space. Hence this is illegal, because the process cannot access anything outside its address space. This is known as a bad load. In this event, a segmentation fault will occur and the process will be terminated immediately. In this way, the bounds register is used to ensure protection and control.
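A small sketch of what this translate-and-check logic amounts to, using the values from this example (the C code is only illustrative; in reality this is done by circuitry in the MMU):

#include <stdio.h>

unsigned int base   = 32 * 1024;   // 32KB: where the address space actually begins
unsigned int bounds = 48 * 1024;   // 48KB: physical address where the address space ends

// Returns 1 and fills *paddr if the access is legal; returns 0 ("raise an exception") otherwise.
int translate(unsigned int vaddr, unsigned int *paddr) {
    *paddr = base + vaddr;
    return *paddr < bounds;
}

int main() {
    unsigned int p;
    if (translate(17 * 1024, &p))              // the 17KB access from the example
        printf("legal access at %u\n", p);
    else
        printf("out of bounds: raise an exception\n");   // this branch is taken here
    return 0;
}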
NOTE: THERE IS EXACTLY ONE BASE/BOUNDS PAIR PER CPU. THUS THE NUMBER OF BASE/BOUNDS PAIRS IS EQUAL TO THE NUMBER OF CPUs THE SYSTEM HAS
NOTE: In earlier days, instead of dynamic relocation (i.e., translation at runtime), static relocation was used ---> this means that before running the program, a piece of software called the loader takes the executable file and rewrites all of its addresses to the desired offset in physical memory. However, there are 2 disadvantages of this technique:
--> With static relocation, it becomes difficult for the OS to later relocate the address space of the process to another location, because all translations were done before run time
--> Static relocation does not give any protection, because a program can generate bad addresses and hence illegally access other processes' memory, or even the OS's
So we saw how the bounds register is used by the hardware to decide whether or not a memory access/reference is legal. If the access is legal, then the process continues. But if the access is illegal, then the CPU generates an exception. This raised exception invokes the exception handler, through which the OS will terminate the user process which made the illegal access. Moreover, an exception will also be raised if the user program tries to change the values of the base/bounds registers; that can only be done by the OS.
Now it must be noted that the updating of base/bounds registers must only be done by the OS and not by the user program. Therefore, there exists a privileged (kernel) mode which prevents user-mode processes from executing such privileged operations, like updating the base/bounds registers or registering exception handlers.
1) Privileged Mode: --> Needed to prevent the user-mode processes from executing privileged
operations
2) Privileged Instructions to update Base/Bounds: --> Enable the OS to set the values of base/bounds
registers before the program runs
3) Privileged Instruction To Register Exception Handlers: --> Enable the OS to tell the hardware which
code to run if an exception occurs
4) Base/Bounds Registers: One pair per CPU to support address translation and bounds check
5) Ability To Translate virtual addresses and check if they are within bounds: This is done, for example, using an adder and a comparator in the MMU
6) Ability To Raise Exceptions: When a process tries to access privileged instructions or makes an
illegal memory reference/access.
NOTE: To allocate memory to processes in main memory, the OS must keep track of which parts of main memory are free and which are in use. To do so, the OS uses a free list, which is simply a list of all the address ranges of physical memory that are currently not in use.
1) To start a process, the OS (in kernel mode) will allocate an entry for this process in the process table. It will then allocate memory for the process by giving it an address space: it searches the free list to find room for the new address space and marks that space as used. This space is now allocated to the process, and the OS sets the base/bounds registers of the CPU for this address space.
2) To run the process, the hardware will then restore the registers of the process, move to user mode, and jump to the process's current PC value. To fetch an instruction, the hardware will translate its virtual address into a physical address and fetch the instruction from that physical address.
The fetched instruction is then executed in user mode. If this instruction is a load/store, it will need to access main memory. Hence the hardware will again translate the virtual address of the memory access into a physical address:
--->If the physical address is within the bounds , then the memory access is valid and load/store will
be performed
--->If the physical address is outside the bounds, then an exception will be raised and the hardware
will now move in to the kernel mode to handle this exception. Go to stage 4
In this way, the process will continue to run (ie use the CPU) until a timer interrupt occurs to give the
CPU to another process. In this case of context switch, the hardware will move back to the kernel
mode and jump to the interrupt handler.
3) When a context switch is triggered by an interrupt, the OS uses its interrupt handler to do the following:
There is only one base and bounds register pair on each CPU, and their values differ for each running process, because each process is loaded at a different physical address in main memory. Thus the OS will call the switch() routine, which saves the values of all the registers (including base/bounds) for this process into a per-process structure called the Process Control Block (PCB). It then restores the values of all the registers (including base and bounds) from the PCB of the next process which now needs to use the CPU.
4) If an exception is raised, then the hardware will move into the kernel mode so that the OS can
handle the exception by calling the exception handlers. These exception handlers will likely terminate
the process which caused the exception by moving to stage 5
5) When a program has completed its execution or is killed, the OS (in kernel mode) will deallocate the memory which was previously allocated to this terminating process, reclaiming all its memory, putting it back on the free list, and cleaning up any data associated with the process. The OS will also free this process's entry from the process table.
NOTE: When a process is not running, the OS might move its address space from one location to
another location. To do so, the OS will first deschedule the process, then copy the address space from
the current location to the new location and update the saved base/bounds registers in the PCB of the
process to point to the new location and free the previous address space.
This technique of dynamic relocation is simple, but it is quite inefficient for the following reason: if a program is not very complex, it won't be using all of the (say) 14KB of free space between its heap and stack. In such cases, much of the space within the process's address space is left unused. Since this space cannot be allocated to another process (it has already been allocated to this one), and this process is not using it either, we say that the space is wasted: it exists in main memory but cannot be used.
This type of waste is called internal fragmentation --->The space within a process’s address space is
not used entirely and thus is wasted.
Therefore, dynamic relocation causes a lot of internal fragmentation. Thus we will have to change our technique of virtualizing memory a little (discussed in the next chapters).
CHAPTER 16:
So far we have been putting the entire address space of each process in memory. With the base and
bounds registers, the OS can easily relocate processes to different parts of physical memory.
However, you might have noticed something interesting about these address spaces of ours: there is a big chunk of "free" space right in the middle, between the stack and the heap. Although the space between the stack and heap is not being used by the process, it still takes up physical memory when we relocate the entire address space somewhere in physical memory; thus, the simple approach of using a base and bounds register pair to virtualize memory is wasteful. It also makes it quite hard to run a program when the entire address space doesn't fit into memory; thus, base and bounds is not as flexible as we would like.
To solve this problem of internal fragmentation, we came up with a new idea of virtualizing memory:
---->SEGMENTATION:
Therefore, instead of placing all of these segments together along with the free space (as in dynamic relocation) in main memory, we can place the segments of a process separately in main memory. This is the idea of segmentation.
In segmentation, the address space of a process is split into three logical segments: Code, Heap And
Stack. A segment is a contiguous portion of the address space of a particular length. The OS places
each one of these segments in different parts of the physical memory and now instead of having just
one base/bounds pair in our MMU, now each segment will have its own base/bounds pair.
[Figure: the MMU now holds a separate base and bounds (size) register pair for each segment]
Now only used memory is allocated space in physical memory, and thus large address spaces with
large amounts of unused address space (which we sometimes call sparse address spaces) can be
accommodated.
For address translation and bounds checking with segmentation, the hardware structure in our MMU must have a set of three base and bounds register pairs. The bounds register is now called the SIZE register, as it holds the size of the segment, as shown below:
You can see from the figure that the code segment is placed at physical address 32KB and has a size of 2KB, and the heap segment is placed at 34KB and has a size of 3KB. The size register here serves exactly the same purpose as the bounds register introduced previously; it tells the hardware exactly how many bytes are valid in this segment.
Let's say that the system uses a 14-bit virtual address space. This means that each virtual address is 14 bits long (in binary).
Up till now we have split the address space into 3 segments: Code, Heap and Stack. Thus we can use some indicator bits, like 0 (binary 00) for Code, 1 (binary 01) for Heap and 3 (binary 11) for Stack. Hence with 3 segments, 2 bits are enough (2^2 = 4 >= 3) to indicate which segment the reference was made to. Hence the top 2 most significant bits of the virtual address tell us which segment we are referring to. This approach of splitting up the address space into segments based on the top few bits of the virtual address is known as an explicit approach.
The remaining bits of the virtual address give the offset of the data to be fetched from that segment.
Hence, once we know which segment of the address space the reference was made to, we can use
the formula below to get the physical address:
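The formula being referred to is presumably the standard one:

Physical Address = Base (of the referenced segment) + Offset

with the access being legal only if the Offset is smaller than the Size of that segment.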
Whether this reference is legal or not is determined using the offset and the size of the segment. If the offset is within the size of the segment, then the access is valid and the process will continue. But if the offset is beyond the size of the segment, then the process is trying to access something which is not in its segment, and a segmentation fault will occur.
With Code and Heap (i.e., 00 and 01) everything is simple, because they both grow such that new data appears at increasing addresses. But the stack grows negatively, meaning that new data appears at decreasing addresses. In the case of the stack, the Base points to the start of the stack, which is its bottom; all the data added as the stack grows sits at smaller addresses above it. Hence with the stack, we have to go upwards. Therefore, if the segment is the stack, we have to adjust the offset as shown below:
In our 14-bit virtual address, we allocated the top 2 bits for segment identification, so the remaining 12 bits are for the offset. Thus the maximum value an offset can take is 2^12 = 4096; this is also the maximum size to which the stack and heap segments can grow.
Thus, for the stack reference with offset 3072: New (negative) Offset = 3072 - 4096 = -1024
Due to this difference in address calculation between segments that grow positively and negatively, the hardware keeps an extra bit per segment, called GrowsPositive?, to see which calculation path it should take.
For the Stack, GrowsPositive? is 0, and for Code and Heap, GrowsPositive? is 1.
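Putting the pieces together, here is a hedged C sketch of the translation for the 14-bit example (the register arrays, the GrowsPositive? flag and the stack placement used in main() are illustrative only; real hardware does this with registers and simple circuits):

#include <stdio.h>

#define SEG_SHIFT    12
#define SEG_MASK     0x3000      // top 2 bits of the 14-bit virtual address
#define OFFSET_MASK  0x0FFF      // bottom 12 bits: the offset into the segment
#define MAX_SEG_SIZE 4096        // 2^12

unsigned int base[4], size[4];   // one base/size pair per segment number
int grows_positive[4];           // 1 for Code and Heap, 0 for the Stack

// Returns the physical address, or -1 to stand in for "raise a segmentation fault".
long translate(unsigned int vaddr) {
    int seg = (vaddr & SEG_MASK) >> SEG_SHIFT;
    long offset = vaddr & OFFSET_MASK;
    if (grows_positive[seg]) {
        if (offset >= size[seg])
            return -1;                      // beyond the end of the segment
    } else {
        offset = offset - MAX_SEG_SIZE;     // e.g. 3072 - 4096 = -1024 for the stack
        if (-offset > size[seg])
            return -1;                      // beyond the (downward) end of the segment
    }
    return base[seg] + offset;
}

int main() {
    // Illustrative placement: stack segment (binary 11) at physical 28KB, 2KB in size.
    base[3] = 28 * 1024; size[3] = 2 * 1024; grows_positive[3] = 0;
    printf("virtual 15KB -> physical %ld\n", translate(15 * 1024));   // prints 27648, i.e. 27KB
    return 0;
}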
So this is how address translation works in segmentation and how segmentation avoids internal
fragmentation.
Segmentation is much more efficient than dynamic relocation, and it can be made even more useful with the following features added to the hardware support:
To save memory, sometimes it is useful to share certain memory segments between address spaces. In particular, code sharing is common and still in use in systems today. To support sharing, we need a little extra support from the hardware, in the form of protection bits. Basic support adds a few bits per segment, indicating whether or not a program can read or write a segment, or perhaps execute code that lies within the segment. By setting a code segment to read-only, the same code can be shared across multiple processes, without worry of harming isolation; while each process still thinks that it is accessing its own private memory, the OS is secretly sharing memory which cannot be modified by the process, and thus the illusion is preserved.
With protection bits, the hardware algorithm described earlier would also have to change. In addition
to checking whether a virtual address is within bounds, the hardware also has to check whether a
particular access is permissible. If a user process tries to write to a read-only segment, or execute
from a non-executable segment, the hardware should raise an exception, and thus let the OS deal
with the offending process.
Most of our examples thus far have focused on systems with just a few segments (i.e., code, stack,
heap); we can think of this segmentation as coarse-grained, as it chops up the address space into
relatively large, coarse chunks. However, some early systems (e.g., Multics [CV65, DD68]) were more
flexible and allowed for address spaces to consist of a large number of smaller segments, referred to
as fine-grained segmentation. Supporting many segments requires even further hardware support,
with a segment table of some kind stored in memory. Such segment tables usually support the
creation of a very large number of segments, and thus enable a system to use segments in more
flexible ways than we have thus far discussed.
WHAT SHOULD THE OS DO ON A CONTEXT SWITCH WITH THE SEGMENTATION TECHNIQUE:
All segment registers must be saved and restored.
WHAT SHOULD THE OS DO IF THE HEAP SEGMENT IS FULL AND NEEDS TO GROW MORE?
A program may call malloc() to allocate an object. In some cases, the existing heap will be able to
service the request, and thus malloc() will find free space for the object and return a pointer to it to
the caller. In others, however, the heap segment itself may need to grow. In this case, the memory-
allocation library will perform a system call to grow the heap (e.g., the traditional UNIX sbrk() system
call). The OS will then (usually) provide more space, updating the segment size register to the new
(bigger) size, and informing the library of success; the library can then allocate space for the new
object and return successfully to the calling program. Do note that the OS could reject the request, if
no more physical memory is available, or if it decides that the calling process already has too much.
In dynamic relocation we stuck to the assumption that each address space is of a fixed size. Through segmentation, we relaxed this assumption, as now each segment of the address space can have a different size. But there is a problem with segmentation:
Due to the different-sized segments being allocated in main memory, main memory can quickly become full of very small free spaces (e.g., 1KB) which are useless, as they are not big enough to be allocated to a segment of any process. Hence free space does exist in physical memory, but it is split up into very small, non-contiguous chunks which are useless. This is known as External Fragmentation.
One possible solution is to compact physical memory: stop the running processes, copy their segments so that they lie next to one another, update the segment registers to point to the new locations, and thereby obtain one large contiguous free region. However, compaction is very expensive, as copying segments is memory-intensive and uses a fair amount of processor time. Hence we will have to come up with another way to manage the free space in physical memory more efficiently. This is done in the next chapter, which is about free-space management.
CHAPTER 17:
Managing free space in segmentation is difficult because in segmentation we are dealing with
variable-sized units (segments are not of fixed size; they can grow/shrink).
(With Paging used to virtualize memory as discussed in next chapters, managing free space is easier
because the pages are of fixed size)
What we want to solve is the issue of external fragmentation : the free space gets chopped into little
pieces of different sizes and thus is fragmented. This will lead to the failure of subsequent requests
because there is no single contiguous space that can satisfy the request, even though the total
amount of free space exceeds the size of the request.
Before moving on to policies for managing free space, we will first look at how heap space is managed by malloc() and free() in more detail.
The heap is initially one big free space, just like main memory. Hence the methods used to manage the free space of a heap are the same as the methods used to manage the free space of main memory.
A free list contains a set of elements that describe the free space still remaining in the heap.
For Example, below shown is a 30-byte heap:
We can see that there are 2 free regions in this heap. Hence the free list for this heap would have two
elements on it: The first entry will describe the first 10-bytes free segment (ie address 0 to 9), and the
next entry will describe the next 10-bytes free segment (ie address 20 to 29):
We can see that in this heap, the largest contiguous free region is 10 bytes. Therefore, in this state any request for anything greater than 10 bytes will fail, with malloc() returning NULL, because we are assuming that, just like physical memory, the heap cannot grow.
A request for exactly that size (10 bytes) could be satisfied easily by either of the free chunks.
However, if a request for something smaller than 10 bytes (i.e., smaller than the size of the smallest free region) arrives, then Splitting is performed: first we find a free chunk of memory on the free list that can satisfy the request, using some technique (studied ahead in this chapter), and split it into two parts; the first chunk has the size that was requested, and the second chunk has the remaining space. The first chunk is returned to the caller of malloc(), i.e., assigned to the request, and the second chunk remains on the free list.
For example, let's say a request arrives to allocate memory for 1 byte. Since the length of the smallest free region on the free list is 10 and the request is for a smaller size, splitting is performed: the allocator will select one of the free regions which can accommodate the request, using some technique (studied ahead in this chapter). Let's say the allocator chooses the second free region. This free region of 10 bytes is split into two; the first chunk is of 1 byte and is given to the request, and the remaining chunk of 9 bytes remains on the free list. Hence the free list now looks something like this:
Thus, the split is commonly used in allocators when requests are smaller than the size of any
particular free chunk.
Now let's say that the free list looks something like this:
Let's say that previously the 10 bytes of space starting at address 10 were allocated to some object. When this consumed space is freed using free(), this 10-byte region starting at address 10 needs to be added back to the free list. Hence the free list now looks something like this:
We can see that the heap is free contiguously from addresses 0 to 29, but this large free space is divided into three chunks of 10 bytes each. Therefore, if a user (or the OS) requests a space of 20 bytes for an object, this request will be rejected: even though there are 30 bytes of free space available, these 30 bytes are divided into three parts of 10 bytes each, due to which none of the free regions is able to accommodate the 20-byte request.
To avoid this problem, we use the method of Coalescing the free space whenever a chunk of memory is freed: this means that when a free chunk is returned to the heap, we look carefully at the address of the chunk being returned, along with the addresses of the nearby chunks on the free list. If the newly-freed space sits contiguously with any of the existing free chunks, it is merged with them into a single larger free region.
Above we can see that address 0 to 9 and address 20 to 29 are already in the free list. Thus, when
addresses 10 to 19 are freed, all the three chunks in the free list become contiguous and thus can be
merged together into a single free region as below:
Now the request for 20-bytes can be accepted because now there is a region in the free list which is
big enough to accommodate this request.
The signature of the free function is free(void* ptr). This means that free() only takes a pointer to the heap object as an argument. So how does free() know how much space should be freed from the heap, when we only give it the address from where to free the memory?
To keep track of how much memory is allocated to a heap object, we store a little bit of extra information in a header block, which is kept in memory just above the handed-out chunk of memory. The header is a structure which contains, at a minimum, the following two things:
-->the size of the region which is allocated
--->a magic number for integrity checking
It might also contain some other information as well.
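A minimal sketch of such a header (the field names and the particular magic value are illustrative):

typedef struct {
    int size;     // number of bytes handed out to the user (not counting this header)
    int magic;    // a known constant, e.g. 1234567, checked later as an integrity test
} header_t;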
This header is created when malloc() is called and is placed at the top of the region which will be allocated. Therefore, malloc() will always allocate a little more space than was asked for, to keep space for the header as well.
For example, if malloc(20) is called, it is asking for 20 bytes of space on the heap; but with an 8-byte header, 28 bytes are actually taken from the heap, to allow the additional space to store the header for this allocation.
Hence the size of the region taken from the free list is the size of the header plus the size of the space handed to the user. Therefore, during allocation, malloc() will always look for a free chunk of memory on the free list whose size is big enough to accommodate the requested size plus the size of the header.
When free() is called to free heap memory starting from the address given to it as argument, free() first calculates the address of the header by simply moving the given pointer back by one header, because the header sits just above the handed-out region:
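A minimal sketch of what this might look like (the header_t layout and the magic value 1234567 are illustrative assumptions, and this is the library's internal free(), not a drop-in replacement):

#include <assert.h>

typedef struct {
    int size;   // size of the region handed out to the user (header not included)
    int magic;  // magic number for integrity checking
} header_t;

void free(void *ptr) {
    header_t *hptr = (header_t *) ptr - 1;   // step back by one header to find it
    assert(hptr->magic == 1234567);          // integrity check before freeing
    // ... now (hptr->size + sizeof(header_t)) bytes can be returned to the free list
}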
Hence now hptr is pointing to the header of the allocated region. From this, free() knows exactly how many bytes were allocated to the region. Thus it can now easily free (size of header + size of the allocated region) bytes starting from the header address. Moreover, before freeing, free() also compares the magic number in the header with the expected value as an integrity check, and only if they match is the region freed.
So now we know that a free list is maintained to keep track of the free regions in the heap memory. But where and how is the free list itself built?
The free list is simply a linked-list structure which is kept inside the free region of the memory itself. Therefore, we are effectively turning the free region of the heap into a linked list.
Every node of this linked list is a structure containing two things:
--> The size of this free region
--> A pointer to the next free region (the next node of the list)
In the examples below, assume that the size of the free region/heap is 4KB ie 4096 bytes and the size
of each header is 8-bytes.
Now let’s look at some code that initializes the heap and puts the first element of the free list inside
that space. We are assuming that the heap is built within some free space acquired via a call to the
system call mmap(); this is not the only way to build such a heap.
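A sketch of that initialization, assuming the node layout described above (on a 32-bit system with 4-byte ints and pointers, sizeof(node_t) is the 8 header bytes used in the examples below):

#include <sys/mman.h>
#include <stddef.h>

typedef struct __node_t {
    int size;               // size of this free region (excluding this node)
    struct __node_t *next;  // next free region in the list
} node_t;

int main(void) {
    // grab 4KB of anonymous memory via mmap() to act as the heap
    node_t *head = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_ANON | MAP_PRIVATE, -1, 0);
    head->size = 4096 - sizeof(node_t);  // 4088 usable bytes
    head->next = NULL;                   // only one free region to start with
    return 0;
}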
Initially, the free list will just have one entire free region. Hence, initially the size of the free region is 4096 - 8 header bytes = 4088 bytes, and since there is no other free region, the next pointer is NULL (0). Hence the free list/heap looks something like below:
ptr = malloc(100);
Let's say a request arrives for an allocation of 100 bytes. First the free list is searched to find the first region that accommodates this requested space. Since there is only one region in the free list, this free region of 4088 bytes is selected and split into two parts: the first is of 100 + 8 header bytes = 108 bytes, which is allocated for the request, and the remaining 3980-byte region is left in the free list:
ptr = malloc(100);
Let's say another request arrives for an allocation of 100 bytes. Again the free list is searched to find the first region that accommodates this requested space. Since there is only one region in the free list, this free region of 3980 bytes is selected and split into two parts: the first is of 100 + 8 header bytes = 108 bytes, which is allocated for the request, and the remaining 3872-byte region is left in the free list:
ptr = malloc(100);
Let's say another request arrives for an allocation of 100 bytes. Again the free list is searched to find the first region that accommodates this requested space. Since there is only one region in the free list, this free region of 3872 bytes is selected and split into two parts: the first is of 100 + 8 header bytes = 108 bytes, which is allocated for the request, and the remaining 3764-byte region is left in the free list:
Now let's say we want to free the region pointed to by sptr:
free(sptr);
The 100 bytes allocated starting at the sptr address are freed; the header of this region now becomes a node of the linked list, the head of the free list now points to this recently freed 100-byte region, and the next of this node points to the 3764-byte free region that follows it. Thus, we can see that there are now two free regions/nodes in the free list:
Now let's say another allocated region pointed to by sptr needs to be freed:
free(sptr);
So now simply the 100 bytes allocated from this sptr address will be freed and the header of this
address now becomes a node of the linked list and head of the linked list now points to this recently
freed region of 100 bytes and the next of this node will point to the previously freed region where the
head of the list previously was:
Now let's say another allocated region pointed to by sptr needs to be freed:
free(sptr);
So now simply the 100 bytes allocated from this sptr address will be freed and the header of this
address now becomes a node of the linked list and head of the linked list now points to this recently
freed region of 100 bytes and the next of this node will point to the previously freed region where the
head of the list previously was:
So after three frees there are three free regions in this non-coalesced free list. Thus the free list is split into multiple chunks, which is bad, as we studied above. Hence remember that after each free(), the free list is coalesced to merge together those free regions which are contiguous. After coalescing, the linked list will look something like this:
Above we assumed that the heap does not grow if it runs out of memory, just to simulate a fixed physical memory. But in reality, if the heap runs out of space, it will ask the OS for more space in physical memory by calling the system call sbrk(). To service the sbrk request, the OS finds free physical pages, maps them into the address space of the requesting process, and then returns the value of the end of the new heap; at that point, a larger heap is available, and the request can be successfully serviced.
So we learned that the allocator looks for a chunk in the free space which is big enough to service the request and then uses splitting. But what strategies are used to select the chunk, since there can be many chunks in the free list which are big enough to service the request? Below we will study such strategies used to select a chunk from the free list.
Idea of Segregated Lists: if a particular application has one (or a few) popular-sized requests that it makes, then keep a separate list just to manage objects of that size; all other requests are forwarded to a more general memory allocator.
-->Advantage: by having a big chunk of memory dedicated to just one particular size of request, fragmentation is much less of a concern. Moreover, allocation and free requests can be served much more quickly, as no complicated search of the free list is required.
So how much memory should be allocated for this separate list/memory that serves the specialised
requests for a given size?
STRATEGY 6: BUDDY ALLOCATION
Coalescing of free regions is important to handle allocation requests efficiently. However, coalescing is difficult to do. In order to ease the process of coalescing, buddy allocation is introduced.
Buddy allocation can only be performed on a memory whose size is 2^N, i.e. a power of 2.
The idea is to repeatedly divide the memory into two halves, forming a tree, until a leaf is obtained which is just big enough to accommodate the request; then the ENTIRE LEAF, WITHOUT SPLITTING, is assigned to this request. Thus when a region needs to be freed, this leaf can simply be merged back into the tree without any hassle.
When a request for memory is made, the search for free space recursively divides free space by two
until a block that is big enough to accommodate the request is found (and a further split into two
would result in a space that is too small). At this point, the requested block is returned to the user
Example: let's say the memory size is 64 KB (i.e. 2^6 KB), and a request arrives to allocate 7 KB of space.
We repeatedly divide the memory into two: 64 KB --> (32 KB, 32 KB), 32 KB --> (16 KB, 16 KB), 16 KB --> (8 KB, 8 KB), until we reach a block just big enough to accommodate the 7 KB request. We can see that if we divided an 8 KB block further, the halves would no longer be able to accommodate the 7 KB request. Hence the entire first 8 KB block is assigned to the 7 KB request and a pointer to it is returned.
We can see that this scheme suffers from internal fragmentation: the request was for 7 KB, but 8 KB was given, due to which 1 KB of space is wasted; it can no longer be assigned to any other process and is not used by the 7 KB request either. This happens because we are only allowed to give out power-of-two-sized blocks.
The real advantage of buddy allocation is seen when the block is freed.
When returning the 8KB block to the free list, the allocator checks whether the “buddy” 8KB is free; if
so, it coalesces the two blocks into a 16KB block. The allocator then checks if the buddy of the 16KB
block is still free; if so, it coalesces those two blocks. This recursive coalescing process continues up
the tree, either restoring the entire free space or stopping when a buddy is found to be in use.
The buddy of a block is simple to determine because the addresses of each pair of buddies differ in only a single bit, determined by the block's level in the buddy tree. This makes buddy allocation work very well.
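Concretely, the buddy's address can be found by flipping exactly that bit, which is a single XOR (a sketch, assuming block addresses are offsets from the start of the managed region and sizes are powers of two):

#include <stddef.h>

// offset of the buddy of the block at 'offset' with the given power-of-two size
size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;   // buddies differ only in this one bit
}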
Most of the above strategies make use of a free list, and searching through it can be very slow, which hurts performance. Hence advanced allocators use more complex data structures such as binary trees, splay trees or partially-ordered trees.
CHAPTER 18:
Up till now we saw two methods to virtualize main memory:
---> Dynamic Relocation (base and bounds): each address space is of a fixed size, which causes internal fragmentation.
---> Segmentation: each address space is divided into variable-sized segments and segment registers are used for translation. But the variable-sized segments cause external fragmentation. To overcome external fragmentation, the free space must be managed efficiently, for which we learned the free-space management techniques above. But managing free space is itself complex because of the variable-sized segments.
Hence now we shall learn a third approach to virtualizing physical memory: PAGING.
The idea of paging is to split the available space into fixed-sized pieces, which makes the management of free space much less complex and much more efficient.
WHAT IS PAGING:
The address space of a process is divided into fixed-sized units called pages.
(The size of each unit is called the page size).
Similarly, the physical memory is made into an array of these fixed-sized slots called page frames
(The size of each page frame is same as the page size).
Then we can assign each virtual page of the process to a page frame in the physical memory.
EXAMPLE)
The size of physical memory is 128 bytes. The page size is 16-bytes.
Therefore, the entire 128 bytes memory is divided into an array of 16-byte units called page frames.
Thus, there will be total 128/16 = 8 page frames in the physical memory:
Consider a process having a 64-byte address space. This 64-byte address space is split into 16-byte units called pages. Thus, in total the address space of this process needs 64/16 = 4 pages.
Now it must be noted that the process thinks it is assigned pages starting from 0. Like the above
process thinks that it has been assigned pages 0,1,2 and 3 in the physical memory. This is called the
virtual page number (VPN) that the process thinks it has been assigned.
But in physical memory, the process is given some other page number from the available page
frames. This page number which is actually assigned to a page of the process in the physical memory
is called the page frame number (PFN).
In the above example, the VPN 0 of the process is assigned the PFN 3, the VPN 1 of the process is
assigned the PFN 7, the VPN 2 of the process is assigned the PFN 5, and the VPN 3 of the process is
assigned the PFN 2.
To record where each virtual page of the address space is placed in physical memory, the operating
system usually keeps a per-process data structure known as a page table. Therefore, each process has
its own page table to store the address translations for each of the virtual pages of the address space,
thus letting us know where in the physical memory each page resides.
We can see that the virtual page number always starts from 0 for a process because this is where the
process thinks it begins from. Thus a page table can be simple structure like an array, where the index
of the array represents the VPN of the page and the integer value in the array element represents the
corresponding PFN where this virtual page actually is in the physical memory:
VPN:  0  1  2  3
PFN:  3  7  5  2
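For this tiny example, the page table could literally be a small array indexed by the VPN (a sketch; real page table entries carry several more bits, as discussed later):

int page_table[4] = {3, 7, 5, 2};   // index = VPN, value = PFN

int translate(int vpn) {
    return page_table[vpn];          // e.g. VPN 1 maps to PFN 7
}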
Each process will have a different mapping of its virtual pages to page frames, because a frame given to one process cannot be allocated to another. Hence each process maintains its own page table, and thus the OS has to manage the page table of the process which is currently running.
The main advantage of paging is the simplicity of free space management that it provides due to the fixed-sized units. For example, when the OS wishes to place our tiny 64-byte address space into our eight-page physical memory, it simply finds four free page frames in the physical memory and maps the pages of the address space to them. Therefore, the OS only has to keep a free list of all the free page frames in the physical memory and grab the first 4 frames off this free list.
So we can see that the process thinks it has been given virtual pages 0, 1, ... and so on, but in actual fact the pages can be spread anywhere in the physical memory. This virtualizes the memory.
So now, how is address translation done in paging?
Let's assume that the virtual address space is 64 bytes. This means the logical address is (64 = 2^6) 6 bits wide, i.e. each virtual address is 6 bits.
The page size is 16 bytes. This means that the address space of the process needs in total 64/16 = 4 virtual pages. To represent the VPNs of these four pages (0, 1, 2 and 3), we only need (4 = 2^2) 2 bits.
Therefore, when the virtual address is written as 6 binary bits, the top 2 most significant bits tell us the VPN of the page from which data needs to be accessed. From the page table of the process we can get the corresponding PFN.
Thus we now have the exact page number of the physical memory from where the access is requested. Now, within this page there are many bytes, so which byte is requested? This is indicated by the offset, which is the remaining 4 bits of the virtual address.
In this way, whenever an access is made, the exact location in the physical memory can be found using the above translation.
The first 2 bits are the VPN and the remaining bits are the offset. For example, take virtual address 21, which is 010101 in binary: VPN = 01 and offset = 0101.
In the page table above, we can see that VPN 1 is mapped to PFN 7 (111). Hence the translation is:
PFN = 7
Offset = 5
Therefore, virtual address 21 in this example means accessing the 5th byte of PFN 7 in the physical memory.
In this way the virtual address 010101 is converted into the physical address 1110101.
In coding this is done by:
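(A C-style sketch; OFFSET_MASK, the mask 001111 that keeps the low 4 offset bits, is an assumed name used here to complete the picture.)

VPN    = (VirtualAddress & VPN_MASK) >> SHIFT;   // extract the top 2 VPN bits
offset =  VirtualAddress & OFFSET_MASK;          // extract the low 4 offset bits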
where VPN_MASK is 110000 --> used to obtain the top 2 VPN bits from the 6-bit virtual address
and SHIFT = 4 ---> the number of offset bits (how far to shift right)
The start address of the page table in the main memory is held by the hardware in the PageTableBase
Register. From this the address of the page table entry corresponding to this VPN is found using the
below:
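In pseudocode this is simply:

PTEAddr = PageTableBaseRegister + (VPN * sizeof(PTE));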
Where PTEAddr is the address of the page table entry of the corresponding VPN
PageTableBaseRegister holds the starting address of the page table
From this PTEAddr address we get the page table entry of the corresponding page, from which we can
extract the PFN.
Above we have discussed a very small memory and address space just for the sake of example. But in reality, page tables can get terribly large. See a real-world example below:
Example) Let's say that the system uses a 32-bit logical address space with a 4KB (4096-byte) page size. This means the virtual addresses can range from 0 to 2^32 - 1, i.e. the size of the virtual address space is 2^32 = 4,294,967,296 bytes.
This means that each process needs a maximum of 4,294,967,296/4096 = 1,048,576 pages.
This means that out of the 32-bit virtual address, the topmost (log2(1,048,576) = ) 20 bits represent the VPN and the remaining 12 bits are used for the offset.
So a process needs a maximum of 2^20 = 1,048,576 pages for its address space, due to which there will be 2^20 = 1,048,576 entries in the page table. If we assume that each page table entry (PTE) takes 4 bytes, then the total space taken by this page table is 4 * 1,048,576 bytes = 4MB.
Now imagine that there are 100 processes running; since there is one page table per process, the OS will need 100 * 4MB = 400 MB of memory just for those address translations. And if the number of logical address bits increases, the size of the page table grows even more.
Since page tables are so big, they are not kept in some special chip in the MMU which does the translations. Instead, page tables reside somewhere in memory.
Up till now we know that a page table contains the mapping of VPN to PFN... but that's not all. It still is an array, where the index represents the VPN, but each entry of the page table array is a structure comprising the following:
--> PFN: the corresponding page frame number of the VPN.
--> Valid bit: when a program starts running, it will have code and heap at one end of the address space and the stack at the other end; between these two ends there will be unused free space. If the program accesses the code, heap or stack it should be allowed, but if it accesses the unused space, which holds no useful data, it should not be allowed. To implement this, a valid bit is kept for each page. If a page contains only unused space, it is marked invalid (valid = 0), so that if the program accesses an invalid page it is terminated. All other pages, which contain useful data, are valid (valid = 1), indicating that the program may access them. By simply marking all the unused pages in the address space invalid, we remove the need to allocate physical frames for those pages and thus save a great deal of memory.
--> Protection bits: used to indicate whether a page can be read from, written to, or executed from. If a program tries to access a page in a way not allowed by these bits, a trap into the OS is generated.
--> Present bit: indicates whether this page is present in the main memory or resides on the hard disk (i.e. it has been swapped out; discussed ahead).
--> Dirty bit: indicates whether the page has been modified since it was brought into memory.
--> Reference bit/Accessed bit: used to track whether a page has been accessed or not. It is useful for determining the popular pages which should therefore be kept in main memory.
struct PTE {
    int PFN;
    int ProtectionBits;
    int ValidBit;
    int DirtyBit;
    int PresentBit;
    int ReferenceBit;
};
So the entire process is below:
The process generates a virtual address. From the virtual address, the VPN and offset are extracted. The VPN is used to get the address of the corresponding page table entry from the page table. From this address, the page table entry for the VPN is obtained. First the valid bit of the page is checked: if it is 0, a segmentation fault occurs because the process has accessed an invalid page. Then the protection bits are checked to see whether the page is allowed to be accessed in this way; if not, a protection fault occurs. If all these checks pass, it means that the process can access the page. Hence the page table entry we obtained is used to get the corresponding PFN of the page. The PFN and the offset together are used to get the physical address of the main memory which needs to be accessed. Then the access is made to this address in main memory and the task is performed.
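The whole sequence above can be written as C-like pseudocode (a sketch combining the earlier snippets; AccessMemory, CanAccess, RaiseException and PFN_SHIFT, the number of offset bits, are assumed names, not real hardware APIs):

VPN     = (VirtualAddress & VPN_MASK) >> SHIFT;
offset  = VirtualAddress & OFFSET_MASK;
PTEAddr = PageTableBaseRegister + (VPN * sizeof(PTE));
PTE     = AccessMemory(PTEAddr);
if (PTE.Valid == 0)
    RaiseException(SEGMENTATION_FAULT);            // access to an invalid page
else if (CanAccess(PTE.ProtectionBits) == 0)
    RaiseException(PROTECTION_FAULT);              // access not allowed by the bits
else {
    PhysAddr = (PTE.PFN << PFN_SHIFT) | offset;    // combine PFN and offset
    Register = AccessMemory(PhysAddr);             // finally touch physical memory
}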
Going to memory for translation information before every instruction makes the system very slow, because access to main memory is slow (cache access is fastest, and disk access is the slowest).
So to make paging faster, the system needs some additional hardware. This hardware is called the translation-lookaside buffer, or TLB.
The TLB is part of the MMU which handles the translation, and it is simply a cache (and thus has the fastest access).
TLB is also known as an address translation cache because it’s a cache used for virtual address
translation in paging.
Instead of holding all the mappings of VPN to page table entry in only the main memory, the popular
mappings which are frequently accessed are ALSO stored in the TLB cache. Since it’s a cache,
therefore, if a popular mapping is found in the TLB then it can be extracted very quickly because
access to cache is faster than access to main memory. But if it is not found, we always have the option to look for the mapping in the page table stored in main memory. In this way, the TLB provides
faster access to the popular mappings thereby speeding up the process of address translations in
paging.
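The hit/miss decision just described can be sketched in the same pseudocode style as before (TLB_Lookup, TLB_Insert and RetryInstruction are assumed helper names):

VPN = (VirtualAddress & VPN_MASK) >> SHIFT;
(Success, TlbEntry) = TLB_Lookup(VPN);
if (Success == True) {                              // TLB hit: translate immediately
    offset   = VirtualAddress & OFFSET_MASK;
    PhysAddr = (TlbEntry.PFN << PFN_SHIFT) | offset;
    Register = AccessMemory(PhysAddr);
} else {                                            // TLB miss: consult the page table in memory
    PTEAddr = PageTableBaseRegister + (VPN * sizeof(PTE));
    PTE     = AccessMemory(PTEAddr);
    TLB_Insert(VPN, PTE.PFN, PTE.ProtectionBits);   // cache the translation
    RetryInstruction();                             // re-run the access, now a TLB hit
}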
Therefore our aim should be to avoid TLB misses as much as possible because a miss results in an
extra memory reference which is slow.
EXAMPLE) BELOW WE WILL SEE A C PROGRAM WHICH MAKES ACCESSES TO AN ARRAY. LETS SEE
HOW TLB ALGORITHM WORKS HERE:
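The program itself is not reproduced in these notes, but the kind of loop meant is a simple array sum like the following sketch (a 10-element int array a is assumed):

int a[10];
int i, sum = 0;
for (i = 0; i < 10; i++) {
    sum += a[i];   // consecutive elements usually share a page, so after the first
                   // miss on a page, the following accesses to it hit in the TLB
}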
From this example, we can see that the performance of the TLB depends upon three factors:
--> Spatial locality of the program: if the program uses structures like arrays, where data is closely packed together and usually on the same page, then the number of TLB hits will increase, as the page will likely be in the TLB, thereby making paging faster.
--> Temporal locality: if the program re-references the same structures repeatedly within a small span of time, then the page they are on is most likely in the TLB, thereby increasing TLB hits and hence the hit rate, and thus speeding up paging.
--> Page size: the larger the page size, the more data can fit into a page, thereby allowing multiple accesses to refer to the same page, which might already be in the TLB, thus increasing TLB hits and hence speeding up paging.
WHO HANDLES THE TLB?
In the old days, TLBs were managed by the hardware (MMU); hence they were known as hardware-managed TLBs. This meant that upon a TLB miss, the hardware would use the page table base register to get the start address of the process's page table, walk the page table to find the page table entry corresponding to the VPN, extract the desired translation, update the TLB with it, and retry the instruction.
The hardware was used in the old days because the hardware developers did not trust the OS people.
But in modern architectures , the TLBs are managed by the OS. Hence, TLBs in these systems are
called software-managed TLBs. This means that upon a TLB Miss, the hardware will simply raise an
exception, which will raise the privilege level to kernel mode and jump to the trap handler which is
within the OS. The trap handler code will run, which will look up the translation in the page table, and
use special privileged instructions to update the TLB and return from the trap. Now the hardware
(MMU) can retry the instruction. It must be noted here that the return from the trap must resume execution at the instruction that caused the trap, so that the instruction runs again once the TLB is updated, this time resulting in a TLB hit.
The advantage of software-managed TLBs is flexibility. Ie the OS can use any data structure it wants to
implement the page table, without necessitating hardware change. Simplicity is another advantage.
TLB CONTENTS:
The TLB is a fully-associative cache. This means that any given translation can be anywhere in the TLB, and hence the hardware searches the entire TLB in parallel to find the desired translation.
Therefore, to store a translation in the TLB, we must store the VPN along with its corresponding PFN.
Moreover, each translation/entry in the TLB also has some other bits:
--> Valid bit: the TLB valid bit is not the same as the PTE valid bit. The PTE valid bit tells whether or not that page has been allocated by the process; if not (i.e. invalid), the page should not be accessed by the running program. The TLB valid bit, on the other hand, tells whether a TLB entry holds a valid translation or not. When the system boots up, the valid bit of each TLB entry is set to invalid, because no address translations are cached there yet. Once virtual memory is enabled and processes start running and accessing their virtual address spaces, the TLB slowly gets populated, and valid entries soon fill the TLB.
--> Protection bits: same as the PTE protection bits, telling how the page may be accessed.
--> Address-space identifier (ASID)
--> Dirty bit
etc.
TLB ISSUE: the translations in the TLB are only meaningful for the process that is currently running, so on a context switch the next process must not use the previous process's entries. There are two ways to handle this.
WAY 1) Flush the TLB on each context switch: this means setting the valid bit of all the entries in the TLB to 0 to indicate that they are not valid and should not be used.
However, this method is not very efficient, because flushing the TLB on each context switch means that each time a process runs, it must incur TLB misses: initially there will be no valid translations for the process in the TLB, even though it ran before. Thus, there is a lot of overhead with this approach.
WAY 2) An ASID (address space identifier) field is added to each TLB entry to indicate that a particular TLB entry belongs to the process with a particular PID. Therefore, a TLB entry can only be used for translation if the ASID of the TLB entry matches the identifier of the running process.
In this way, the TLB can be shared across multiple processes without worrying about one process using some other process's translations.
NOTE: if the VPNs of two or more processes map to the same PFN, it means that these processes are sharing that page. Sharing of code pages is useful because it reduces the number of physical page frames in use, thereby reducing memory overheads.
If the number of pages a program accesses in a short period of time exceeds the number of pages that
fit into the TLB, the program will generate a large number of TLB misses, and thus run quite a bit more
slowly. We refer to this phenomenon as exceeding the TLB coverage, and it can be quite a problem
for certain programs.
One solution, as we’ll discuss in the next chapter, is to include support for larger page sizes; by
mapping key data structures into regions of the program’s address space that are mapped by larger
pages, the effective coverage of the TLB can be increased
REPLACEMENT POLICY:
We know that the TLB is a cache, and all caches are small due to cost constraints.
Therefore, there will come a time when the TLB becomes full.
At that point an existing translation must be replaced with a new translation in the TLB, and a replacement policy decides which entry to evict.
CHAPTER 20:
In paging we identified two problems which are unsolved up till now:
--> Page tables can get very large because of the large number of pages allocated for each process's address space.
--> If a program accesses many more pages in a short period than can fit in the TLB, there will be a lot of TLB misses, causing the program to run quite a bit more slowly (i.e. exceeding the TLB coverage).
Consider an address space that is divided into 16 pages. Therefore, there must be 16 entries in its page table, as shown below. The pages which contain useful data will be given page frames in memory and their valid bit will be 1. The remaining pages, which are not yet used by the process, will not be given any page frame in memory and their valid bit will be 0:
We can see that the smaller the page size, the more pages the process's address space will need and hence the more entries there will be in the page table.
We can see that when the page size is increased, the number of pages is reduced, so there will be fewer entries in the page table, thereby reducing the page table size.
Now let's say that the last page of a process uses only 100 bytes.
When the page size is 8KB (8192 bytes), then 8192 - 100 = 8092 bytes of the last page will be unused and wasted.
When the page size is 16KB (16384 bytes), then 16384 - 100 = 16284 bytes of the last page will be unused and wasted.
Therefore, increasing the page size increases internal fragmentation. Therefore, to use the physical memory more efficiently, many systems use a smaller page size.
We can see that the unused pages, which are invalid, take up a lot of space in the page table (even though they are not given any page frame, they are still in the page table because they might be used by the process as it grows). Hence we can now focus on removing these invalid-page entries from the page table to reduce the page table size.
This can be done by the hybrid approach. The hybrid approach is a combination of paging and segmentation: instead of having a single page table for the entire address space of the process, we will have one page table PER logical segment of the process.
This means that if we divide the address space of a process into three segments (code, heap and stack), then the process will have three page tables in total: one for code, one for stack and one for heap. In this way we can remove from the page tables the invalid pages which are not yet used by the process, and the process can still take on more pages as the stack and heap grow.
Now, just like segmentation, we will have a per-segment base register in the MMU to hold the physical address of the page table of that segment, and a bounds/limit register to hold the number of valid pages in that segment, indicating the end of that segment's page table. Memory accesses beyond the end of a segment will generate an exception and will likely terminate the process.
We can see that the unused pages of the process are no longer in the page table due to which the size
of the page table is significantly reduced.
Let's say that the system uses a 32-bit logical address space and a page size of 4KB.
Since a process's address space is divided into three segments, to represent these segments in binary we can use 2 bits (since 3 <= 4 = 2^2): let's say that 00 represents an unused segment (i.e. not applicable in our design), 01 represents the code segment, 10 represents the heap segment and 11 represents the stack (as in segmentation). Thus the top 2 most significant bits of the virtual address in binary tell which segment needs to be accessed; this segment's page table will then be used for translation.
The next bits of the virtual address represent the VPN within that segment, and the remaining bits represent the offset, i.e. the byte of the page which needs to be accessed.
The VPN will first be looked up in the TLB; on a TLB hit, the corresponding PFN will be obtained, and using the PFN and the offset the memory access will be made.
If the VPN is not present in the TLB, then there is a TLB miss, so a memory reference will be made to access the page table of the segment we found, the translation will be updated in the TLB, and the instruction will be retried, this time resulting in a TLB hit.
Multi-level paging also focuses on reducing the page table size by removing the invalid regions of the page table instead of keeping them all in memory. This is the most effective approach.
Multi-level paging converts the linear page table into a tree.
An even better solution is to use Inverted Page tables:
Inverted page tables solve the problem of large memory required to keep the page tables:
The idea is that instead of each process having its own page table, a single page table will be shared
across all the processes.
To enable this, each page table entry gets an additional field telling which process the entry belongs to, so that only that process can access this page table entry (or, if the page is shared, so that the sharing processes can be identified).
Finding the correct entry is now a matter of searching through this data structure. A linear scan would
be expensive, and thus a hash table is often built over the base structure to speed up lookups.
Now, instead of each process consuming memory for its own page table, there is only one large page table, which will not consume as much space as all the page tables of all the processes combined.
Okay, so up till now we were assuming that a process can be brought into the main memory only if it fits entirely in the main memory. However, there can be situations where there is some space left in the main memory but a process still cannot be placed there because it needs more space than is available. Of course it is not good to waste this free space of the main memory, because it is precious.
Hence we will now relax this assumption as well: a process can be placed in the main memory even if there is not enough space to accommodate the entire process. This is done through page swapping (next chapter).
CHAPTER 21:
If all the pages of a process cannot fit in the main memory, then what we can do is place as many pages as fit in the main memory and put all the remaining pages of the process on the hard disk (which has a lot of space but very slow access). Pages can then be swapped to and from the hard disk depending upon which page the process requests.
The part of the hard disk where the pages of a process are stored is called the swap space.
Therefore, the pages of a process are in the memory as well as on the hard disk. For a running process, all the pages which are currently in use and recently accessed are in the main memory, and the present bit in the PTE of each of these pages is 1 to indicate that these pages are present in memory. All the pages which are not currently in use by the running process are kept on the hard disk, even if they would fit in the main memory, so that not all of the main memory is occupied; the present bit of all these on-disk pages is 0 to indicate that they are not in the main memory.
Pages can be swapped to and from the swap space. But how does the OS know the address at which a page resides on the disk?
We know that if a page is in the main memory, it will have a page frame and thus a PFN allocated to it, which is written in its page table entry. But if the page is not in memory but on the disk, it won't have any page frame allocated to it; instead, the non-present page will have the address of where it is stored on the disk. Therefore, for all the pages that are on the disk, their page table entries will hold the disk addresses at which they are stored.
So here is how the entire process works with page swapping:
1) The process generates a virtual address, from which the VPN and offset are extracted.
2) The hardware then checks the TLB of the MMU for a matching VPN entry.
If the VPN is found in the TLB, a TLB hit occurs: the corresponding PFN of the page is obtained and the memory reference is made at the given offset of the found PFN.
However, if the VPN is not found in the TLB, we have a TLB miss. In this case, the hardware locates the page table of the process (using the page table base register) and looks up the page table entry using the VPN as index.
If the entry at the given VPN of the page table is valid and present, then the page is indeed in the main memory and can be accessed. Thus, the corresponding PFN is obtained from this page table entry, the TLB is updated with this mapping, and the instruction is executed again, this time resulting in a TLB hit.
However, if the entry at the given VPN of the page table is valid but not present (present bit = 0), then this is indeed a valid page being referenced, but the page is on the hard disk and not in the main memory. This situation is known as a page fault. The OS is invoked to service the page fault by running a particular piece of code called the page fault handler. The page fault handler looks up the disk address of the swapped-out page in the page table and issues a request to this disk address. During this disk I/O, the process is blocked, because I/O is time-consuming, thereby allowing the OS to run some other process in the meantime while the page gets swapped back into memory. When the disk I/O completes, the page is fetched into memory, the OS updates the page table entry of this page with the PFN allotted to it in main memory to record the in-memory location of the newly fetched page, and this mapping is also updated in the TLB. Then the instruction is retried, this time resulting in a TLB hit.
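The work of the page fault handler can be sketched in pseudocode (FindFreePhysicalPage, EvictPage and DiskRead are assumed helper names, not a fixed API):

PFN = FindFreePhysicalPage();
if (PFN == -1)                     // memory is full
    PFN = EvictPage();             // replacement policy picks a page to swap out
DiskRead(PTE.DiskAddr, PFN);       // process sleeps/blocks while the page is read in
PTE.present = 1;                   // the page is now in main memory
PTE.PFN     = PFN;                 // record the frame it was placed in
RetryInstruction();                // retry; the translation now succeeds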
At any point, if an invalid page is accessed by the process, the hardware raises an exception which causes the OS to terminate the process.
In this way, even if all the pages of the process cannot fit in memory, the process can still run by swapping pages to and from the swap space of the hard disk.
Now it must be noted that if the main memory is full and swapping is being used, then in order to swap a new page from disk into memory, some existing page in the main memory will need to be swapped out to the hard disk. So now we will study some policies which are used to decide which page in memory should be swapped out to the disk.
These policies are very important because they determine which pages are swapped out of memory. If we mistakenly keep swapping out some popular page which is accessed again and again, then on each access to it we will have to go to the disk to bring the page back into memory, and disk access is the most time-consuming, thereby making the system very inefficient.
NOTE that page replacements do not occur only when the entire physical memory is full. The OS must keep a small amount of memory free. In order to ensure this, the OS uses two thresholds:
--> High watermark (HW)
--> Low watermark (LW)
to decide when to start evicting pages from the main memory to the hard disk.
When the OS notices that there are fewer than LW pages available in the main memory, it runs a thread which is responsible for freeing memory by evicting pages from the main memory into the swap space of the disk until there are HW pages available in the main memory. This background thread is known as the swap daemon or page daemon.
OKAY, SO HERE IS THE SUMMARY OF OVERALL MEMORY VIRTUALIZATION:
Memory needs to be virtualized so that multiple processes can reside in the main memory while ensuring efficiency (i.e. fast accesses), protection and control (i.e. a process should not be able to access anything outside its address space) and flexibility (i.e. a process can do anything within its own address space). To virtualize memory, we have learnt three methods: dynamic relocation, segmentation and paging.
Virtualization of memory means that each process thinks it has its own memory, when in fact the OS is secretly multiplexing address spaces across physical memory and sometimes the disk.
NOW THAT WE KNOW HOW MEMORY IS VIRTUALIZED BY THE OS, WE NOW MOVE TOWARDS
LEARNING ABOUT CONCURRENCY
CHAPTER 26 & 27 COMBINED: CONCURRENCY
Now we will introduce the idea of concurrency. Concurrency is the ability to run multiple tasks or processes at the same time, in parallel.
We have learned that a single-threaded application runs on only one CPU no matter how many CPUs there are. Now, in order to make such an application run faster, we want it to run on multiple CPUs in parallel. This can be achieved by re-writing the application using THREADS.
Up till now, all the programs we have looked at were single-threaded programs. In these programs, there is a
single program counter where instructions are being fetched from and executed.
However, now we are going to learn about multi-threaded programs, which have more than one
point of execution. This means that the program has multiple program counters, each of which is
being fetched and executed from.
So in simple words: Each thread is very much like a separate process, except for one difference: all the
threads belonging to a process share the same address space and thus can access the same data.
The state of a thread is very similar to that of a process. Each thread has its own program counter that
tracks where the program is fetching instructions from. Moreover, each thread has its own private set
of registers that it uses for computation. For this reason, let's say that there are two threads running on a single processor. In order to switch from running one thread (t1) to running the other thread (t2), a context switch must take place to save the register state of t1 and restore the register state of t2, just like in the context switch of processes. With processes, the register states are saved to and restored from the process control block; with threads, each thread has its own thread control block (TCB) where its register state is stored to and restored from. However, in the context switch for processes we also used to change the address space, whereas in the case of threads the context switch keeps using the same address space, so the same page table is used by all the threads belonging to a process.
The single-threaded programs that we have seen up till now had one stack, one heap and one code segment, as shown below.
However, in a multi-threaded process, each thread will have its own stack, but the heap and the code segment will be SHARED across all the threads. Below is shown the address space of a process; we can see that the address space has two stacks, which means that this process has 2 threads:
Therefore, any variables, parameters or return values a thread creates will be placed on the stack of that thread only; thus the stack is sometimes called thread-local storage.
Threads are used to introduce parallelism to the program. This means that when a process uses
multiple threads, then the process is basically split into multiple tasks, each of which is executed in
parallel at the same time. This will cause the process to be executed much more quickly as compared
to if it used only a single thread
EXAMPLE: let's say that you want to write a program to initialize an array of 1000 elements with the integer 0.
--> Single-threaded approach: simply iterate over the array from index 0 to 999, putting a 0 in the ith index in each iteration. This program will take comparatively long to run.
--> Multi-threaded approach: here you divide the task of filling the array between multiple threads (see the sketch after this example). Let's say you are using 2 threads. The first thread will initialize elements 0 to 499 of the array and the second thread will initialize elements 500 to 999. Since the two threads can run in parallel, the entire array will be filled in roughly half the time.
Thus we can see that using threads can make processes execute faster by splitting them into tasks, each of which can run in parallel.
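A sketch of the two-thread version of this array example, using the pthread calls introduced below (the names zero_half, arr and the start indices are made up for illustration):

#include <pthread.h>

int arr[1000];

void *zero_half(void *arg) {           // each thread zeroes one half of the array
    int start = *(int *) arg;
    for (int i = start; i < start + 500; i++)
        arr[i] = 0;
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    int start0 = 0, start1 = 500;
    pthread_create(&t0, NULL, zero_half, &start0);   // elements 0..499
    pthread_create(&t1, NULL, zero_half, &start1);   // elements 500..999
    pthread_join(t0, NULL);                          // wait for both halves to finish
    pthread_join(t1, NULL);
    return 0;
}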
This task of transforming your standard single-threaded program into a program that does this sort of
work on multiple CPUs is called Parallelization. Usually one thread per CPU is used.
The second reason to use threads is to avoid blocking program progress due to slow I/O. Imagine that
you are writing a program that performs different types of I/O: either waiting to send or receive a
message, for an explicit disk I/O to complete, or even (implicitly) for a page fault to finish. Instead of
waiting, your program may wish to do something else, including utilizing the CPU to perform
computation, or even issuing further I/O requests. Using threads is a natural way to avoid getting
stuck; while one thread in your program waits (i.e., is blocked waiting for I/O), the CPU scheduler can
switch to other threads, which are ready to run and do something useful. Threading enables overlap
of I/O with other activities within a single program, much like multiprogramming did for processes
across programs; as a result, many modern server-based applications (web servers, database
management systems, and the like) make use of threads in their implementations.
In POSIX (Portable Operating System Interface), Threads are implemented using the “pthread.h”
library .
CREATING A THREAD:
A thread is created using the following function belonging to the pthread.h library:
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
The first parameter is a pointer to pthread_t structure. This is used to interact with the thread. Thus,
we will pass this into the function to indicate that this thread is to be created. The second parameter
is used to specify any attributes this thread might have, e.g. setting the stack size or perhaps information about the scheduling priority of the thread. An attribute is initialized with a separate call to pthread_attr_init(). Usually, at our level, we pass NULL for the second argument to keep things simple. The third argument is a pointer to the thread function that we will write. This thread function is
given to the thread to indicate what task(s) this thread needs to perform. Hence this thread will
perform this function when it starts. Any arguments to the function are given as the fourth
parameter.
The thread function is a function whose return type is void *, i.e. it can return a pointer to any data type. Moreover, the parameter that the thread function takes is also of type void *, indicating that the function can take a pointer to any data type as its input.
Once the threads are created, it must be kept in mind that the OS scheduler can run any thread it wants in any order. This means that even if you create thread t0 first, then t1, then t2, it is not guaranteed that t0 will run first, then t1, then t2, and so on. The run sequence of the threads is determined by the OS scheduler, so the programmer cannot determine the order in which the threads will run.
Moreover, it must also be kept in mind that main is itself a thread. So we must make sure that the main thread does not finish its execution until all the threads t0, t1, ... have executed. To do so, we join the threads that we created; join simply makes the main thread wait for the completion of all the other threads. Threads can be joined with the following function:
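int pthread_join(pthread_t thread, void **value_ptr);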
This function takes two arguments. The first argument is the pthread_t structure of the thread that we want to join. The second argument is a pointer to the location where the return value should be placed; if the thread function does not return anything, this is set to NULL, otherwise the address of the variable which needs to hold the return value is passed here.
EXAMPLE1) In this program there are three threads: the main thread , t0 and t1.
Thread t0 is given the ThreadFunct0 to print “MANNAN” and thread t1 is given the ThreadFunct1 to
print “RANGOONIA”.
Now even though t0 is created first and t1 is created next, it is the OS scheduler's choice which thread to run first: sometimes it might run t0 first and then t1, due to which MANNAN is printed by t0 followed by RANGOONIA by t1, and sometimes it runs t1 first and then t0, due to which RANGOONIA is printed by t1 followed by MANNAN by t0.
Hence the order of execution of threads is not determined.
But one thing is known: all these threads run together at the same time, in parallel.
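The program for this example is not reproduced in these notes; a sketch matching the description (using the ThreadFunct0 and ThreadFunct1 names mentioned above) might look like this:

#include <stdio.h>
#include <pthread.h>

void *ThreadFunct0(void *arg) { printf("MANNAN\n");    return NULL; }
void *ThreadFunct1(void *arg) { printf("RANGOONIA\n"); return NULL; }

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, ThreadFunct0, NULL);
    pthread_create(&t1, NULL, ThreadFunct1, NULL);
    pthread_join(t0, NULL);   // main waits so it does not exit before t0 and t1 run
    pthread_join(t1, NULL);
    return 0;
}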
EXAMPLE 2) In this program , thread t0 is given the thread function PrintChar which takes a string as
parameter and thread t1 is given the thread function PrintInt which takes an integer as parameter
EXAMPLE3) In this program, the thread t0 is given the ThreadFunct which takes a string as parameter
and returns an integer to the main thread.
In this example, we can notice that if we want the thread function to return some value, then this returned value should be placed on the heap (i.e. dynamically allocated) and not on the stack, because the thread's stack is destroyed as soon as the thread ends, due to which any object on the stack would be lost.
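A sketch of that point (the names here are illustrative, not the program from the example): the thread returns a pointer to a heap-allocated int, which survives after the thread's stack is gone.

#include <stdlib.h>
#include <string.h>

void *ThreadFunct(void *arg) {
    int *result = malloc(sizeof(int));      // lives on the heap, beyond this thread's stack
    *result = (int) strlen((char *) arg);   // e.g. return the length of the string argument
    return result;                          // returning the address of a local variable here would be a bug
}

The main thread can then collect the value with pthread_join(t0, (void **) &rv) and is responsible for eventually calling free() on it.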
We should note that not all code that is multi-threaded uses the join routine. For example, a multi-
threaded web server might create a number of worker threads, and then use the main thread to
accept requests and pass them to the workers, indefinitely. Such long-lived programs thus may not
need to join. However, a parallel program that creates threads to execute a particular task (in parallel)
will likely use join to make sure all such work completes before exiting or moving onto the next stage
of computation.