
CS330: Operating Systems

Lecture 29
Threads
Threads are almost independent execution entities within a single process. Threads of a single process can be scheduled on different CPUs concurrently. Each thread therefore has its own register state and stack, and at a given point in time the PCs of different threads can differ.

Threads are different from processes: threads of a single process share the address space, so a context switch between them does not require switching the address space.

Is multi-threading useful?
Multi-threading allows a program to leverage multi-core systems. Because threads share the address space, global variables can be accessed from the thread functions, and dynamically allocated memory can be passed as thread arguments.

Examples of parallel-computation models

Data parallel processing: data is partitioned into disjoint sets and assigned to different threads.
Task parallel processing: each thread performs a different computation on the same data.

E.g., in the case of overlapping I/O with blocking read/write calls (such as receiving packets from the network), multiple threads can be employed, each catering to a different read/receive request.

Eg.: Finding MAX

Given N elements and a function f, we have to find the element e with the maximum value of f(e).
If the computation takes a lot of time, we can employ multi-threading with K threads using the following strategy: partition the N elements into K non-overlapping sets and assign each thread to compute the MAX within its own set.

How does OS maintain thread related information?



Thread information is stored in the thread control block (TCB), which is pointed to by the PCB. The TCB contains the register state, which is used to save/restore the CPU state during a context switch.
In Linux, however, there is little distinction between a thread and a process: a thread is treated as a separate process. The difference is that constructs within the domain of the current process, such as the address space and file state, are not copied as in a vanilla fork implementation; only pointers to them are maintained and copied. A thread also differs from a standalone process in that its PID differs from its TGID, the TGID being the PID of the main thread of the process.

How are stacks for multiple threads managed?

Stacks for threads are dynamically allocated from the address space using the mmap() system call and passed to the OS during thread creation. Thread stacks can conflict with one another (nothing stops one thread from accessing another's stack), but since the threads operate in the same address space anyway, this is an accepted limitation.
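
The following is a minimal sketch (not from the lecture) of how a thread library might do this on Linux: the stack is allocated with mmap() and handed to the clone() system call. The flag set is illustrative; CLONE_THREAD is omitted here so that the parent can reap the child with waitpid().

// clone_stack.c: hypothetical sketch of an mmap'ed stack passed to clone()
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
    printf("child sees value = %d\n", *(int *)arg); /* shares parent's memory */
    return 0;
}

int main(void)
{
    int value = 42;
    /* Allocate the child's stack from the address space, as pthreads does. */
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) { perror("mmap"); exit(1); }

    /* The stack grows downwards, so pass the top of the mapped region. */
    pid_t pid = clone(child_fn, (char *)stack + STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, &value);
    if (pid < 0) { perror("clone"); exit(1); }
    waitpid(pid, NULL, 0);
    munmap(stack, STACK_SIZE);
    return 0;
}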

POSIX Thread API


pthread_create

int pthread_create(pthread_t *tid, pthread_attr_t *attr, void *(*thfunc)(void *), void *arg);

Creates a thread with tid as its handle; the thread starts executing the function pointed to by thfunc. A single void* argument can be passed to the thread. The thread attribute can be used to control the thread's behavior, e.g., stack size, stack address etc.; passing NULL sets the defaults. Returns 0 on success.
A thread terminates when its function returns, or explicitly via calls like pthread_exit(); it can also be terminated by another thread using pthread_cancel().

In Linux, both pthread_create and fork are implemented using the clone system call.

pthread_join

int pthread_join(pthread_t tid, void **retval);

This call is generally made from the 'main' thread. It waits for the thread tid to finish, and the thread's return value is captured in *retval (the thread must heap-allocate the return value, which the joining thread frees after the join).
Invoking pthread_join on an already finished thread returns immediately.
// pthreads.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

static long g_ctr[4];

void *do_inc(void *arg)
{
    int ctr, tnum = *(int *)arg;
    char *ptr = malloc(32);
    for (ctr = 0; ctr < 100; ++ctr) {
        g_ctr[tnum] += 10 + tnum;
    }
    sprintf(ptr, "Thread-%d", tnum);
    return ptr;
}

int main()
{
    int num_threads = 4, ctr;
    pthread_t threads[4];
    int tids[4];
    void *retval;

    /* Create threads */
    for (ctr = 0; ctr < num_threads; ++ctr) {
        tids[ctr] = ctr;
        if (pthread_create(&threads[ctr], NULL, do_inc, &tids[ctr]) != 0) {
            perror("pthread_create");
            exit(-1);
        }
    }

    /* Wait for threads to finish their execution */
    for (ctr = 0; ctr < num_threads; ++ctr) {
        pthread_join(threads[ctr], &retval);
        printf("Joined %s with value %ld\n", (char *)retval, g_ctr[ctr]);
        free(retval);
    }
}

Output:

Joined Thread-0 with value 1000
Joined Thread-1 with value 1100
Joined Thread-2 with value 1200
Joined Thread-3 with value 1300

To check and see the stack allocation:

// pthread_analyze.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void *thfunc(void *arg)
{
    int ctr = 0;
    printf("Thread stack pointer = %p\n", &ctr);
    return NULL;
}

int main()
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, thfunc, NULL) != 0) {
        perror("pthread_create");
        exit(-1);
    }
    printf("Main stack pointer = %p\n", &tid);
    pthread_join(tid, NULL);
}

Output:

Thread stack pointer = 0xffffb565e834
Main stack pointer = 0xffffddb63480

Running strace on this program, we observe an mmap call (returning an address close to the thread stack address) followed by a clone call. Several flags are passed to the clone() system call, some of which are: CLONE_VM (share the address space), CLONE_FS (share file-system information), and CLONE_THREAD (make the new task part of the caller's thread group).
Linux does not distinguish between fork and pthread_create in the way it creates new processes/threads; the difference lies only in the flags passed to clone.

// find_max_parallel.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <string.h>
#include <strings.h> /* bzero */
#include <math.h>
#include <assert.h>
#include <pthread.h>

#define SEED 0x74587
#define MAX_THREADS 64

#define USAGE_EXIT(s) do{ \
    printf("Usage: %s <# of elements> <# of threads>\n %s\n", argv[0], s); \
    exit(-1); \
}while(0);

#define TDIFF(start, end) ((end.tv_sec - start.tv_sec) * 1000000UL + \
                           (end.tv_usec - start.tv_usec))

struct thread_param{
    pthread_t tid;
    int *array;
    int size;
    double max;
    int max_index;
};

double function(double element)
{
    return (sqrt(element) * sin(element));
}

void *find_max(void *arg)
{
    struct thread_param *param = (struct thread_param *) arg;
    int ctr;

    param->max = function(param->array[0]);
    param->max_index = 0;

    for(ctr=1; ctr < param->size; ++ctr){
        double result = function(param->array[ctr]);
        if(result > param->max){
            param->max = result;
            param->max_index = ctr;
        }
    }
    return NULL;
}

int main(int argc, char **argv)
{
    struct thread_param *params;
    struct timeval start, end;
    int *a, *ptr;
    int num_elements, ctr, num_threads, per_thread, residue, max_index;
    double max = 0.0;

    if(argc != 3)
        USAGE_EXIT("not enough parameters");

    num_elements = atoi(argv[1]);
    if(num_elements <= 0)
        USAGE_EXIT("invalid num elements");

    num_threads = atoi(argv[2]);
    if(num_threads <= 0 || num_threads > MAX_THREADS){
        USAGE_EXIT("invalid num of threads");
    }

    per_thread = num_elements / num_threads;
    residue = num_elements % num_threads; /* Have to distribute */

    if(per_thread <= 0)
        USAGE_EXIT("invalid num of elements to threads");

    /* Parameters seem to be alright. Let's start our business. */
    a = malloc(num_elements * sizeof(int));
    if(!a){
        USAGE_EXIT("invalid num elements, not enough memory");
    }
    srand(SEED);
    for(ctr=0; ctr<num_elements; ++ctr)
        a[ctr] = rand();

    /* Allocate thread-specific parameters */
    params = malloc(num_threads * sizeof(struct thread_param));
    bzero(params, num_threads * sizeof(struct thread_param));

    ptr = a;
    gettimeofday(&start, NULL);

    for(ctr=0; ctr < num_threads; ++ctr){
        struct thread_param *param = params + ctr;
        param->size = per_thread;

        if(residue){
            param->size++;
            --residue;
        }
        param->array = ptr;
        ptr += param->size;

        if(pthread_create(&param->tid, NULL, find_max, param) != 0){
            perror("pthread_create");
            exit(-1);
        }
    }
    assert((ptr - a) == num_elements);
    num_elements = 0;

    for(ctr=0; ctr < num_threads; ++ctr){
        struct thread_param *param = params + ctr;
        pthread_join(param->tid, NULL);
        if(ctr == 0 || param->max > max){
            max = param->max;
            max_index = num_elements + param->max_index;
        }
        num_elements += param->size;
    }

    gettimeofday(&end, NULL);
    printf("Time taken = %ld microsecs\n", TDIFF(start, end));
    printf("Max = %.2f @index %d\n", max, max_index);
    free(a);
    free(params);
}

Running this program (compiled with -lpthread -lm for the pthread and math libraries), we observe that increasing the number of threads reduces the time taken.

Lecture 30
Consider the simultaneous increment to a global counter variable by multiple threads.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define MAX_THREADS 64
#define USAGE_EXIT(s)                                            \
    do                                                           \
    {                                                            \
        printf("Usage: %s <# of threads> \n %s\n", argv[0], s);  \
        exit(-1);                                                \
    } while (0);

#define OP_COUNT 10000000UL

static long g_ctr;

void *do_inc(void *arg)
{
    unsigned long ctr;
    for (ctr = 0; ctr < OP_COUNT; ++ctr)
    {
        g_ctr++;
        // asm volatile("incq %0;"
        //              : "=m"(g_ctr)
        //              :
        //              : "memory");
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int num_threads, ctr;
    pthread_t *threads;
    if (argc != 2)
        USAGE_EXIT("not enough parameters");

    num_threads = atoi(argv[1]);
    if (num_threads <= 0 || num_threads > MAX_THREADS)
    {
        USAGE_EXIT("invalid num of threads");
    }

    threads = malloc(num_threads * sizeof(pthread_t));

    for (ctr = 0; ctr < num_threads; ++ctr)
    {
        if (pthread_create(threads + ctr, NULL, do_inc, NULL) != 0)
        {
            perror("pthread_create");
            exit(-1);
        }
    }

    for (ctr = 0; ctr < num_threads; ++ctr)
    {
        pthread_join(threads[ctr], NULL);
    }

    printf("Final value = %ld\n", g_ctr);
    free(threads);
}

Output:
With two threads, the output is not 2 × 10000000. Rather, the output is non-deterministic.

Reason:

counter++ in assembly:

Mov (counter), R1
Add $1, R1
Mov R1, (counter)

A single C statement can be compiled into multiple instructions, and scheduling between them creates a problem. For example (counter initially 0):

T1: Mov (counter), R1 // R1 = 0
T1: Add $1, R1 // R1 = 1
{switch-out, R1=1 saved on PCB (T1)}
T2: Mov (counter), R1 // R1 = 0
T2: Add $1, R1 // R1 = 1
T2: Mov R1, (counter) // counter = 1
{switch-out, T1 scheduled, R1 = 1 restored}
T1: Mov R1, (counter) // counter = 1!

Assume that after T1 starts executing, it is scheduled out and T2 runs to completion. After T1 is scheduled back and completes, counter should have been 2; however, this is not the case because of the in-between scheduling.

What if counter++ is compiled to a single instruction, e.g., inc (counter)?

This solves the problem on single-core systems; however, the problem persists on multi-core systems, because two cores can perform the read-modify-write on the same memory location simultaneously (the instruction is not atomic across cores unless made so, e.g., with the x86 lock prefix).
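
One portable way to obtain such an atomic increment is C11 atomics; this is a sketch of mine, not from the lecture (on x86, atomic_fetch_add compiles to a lock-prefixed instruction):

// sketch: do_inc rewritten with C11 atomics (assumes a C11 compiler)
#include <stdatomic.h>

static atomic_long g_ctr;

void *do_inc(void *arg)
{
    unsigned long ctr;
    for (ctr = 0; ctr < 10000000UL; ++ctr)
        atomic_fetch_add(&g_ctr, 1); /* atomic read-modify-write, even across cores */
    return NULL;
}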

The main problem: accessing a shared variable in a concurrent manner can produce incorrect output. The correctness of a program is impacted because concurrent access to shared data causes race conditions.

Some definitions
• Atomic operation: An operation is atomic if it is uninterruptible and indivisible.

• Critical section: A section of code accessing one or more shared resource(s), mostly shared
memory location(s).

• Mutual exclusion: Technique to allow exactly one execution entity to execute the critical
section.

• Lock: A mechanism used to orchestrate entry into critical section.

• Race condition: Occurs when multiple threads are allowed to enter the critical section concurrently.

Critical sections of an OS
• The OS maintains shared information which can be accessed from different OS-mode execution contexts (e.g., system call handlers, interrupt handlers etc.), often executing in parallel.

For example:

1. A page table entry being updated simultaneously due to swapping and due to a change in protection flags.
2. The queue of network packets being updated concurrently to deliver packets to a process and to receive incoming packets from the network device.

Strategy to handle race conditions in OS

Contexts executing critical section             Uni-processor systems   Multi-processor systems
System calls                                    Disable preemption      Locking
System calls + interrupt handler (local CPU)    Disable interrupts      Locking + interrupt disabling
Multiple interrupt handlers (local CPU)         Disable interrupts      Locking + interrupt disabling

In the case of only system calls, disabling preemption simply prevents the thread from being scheduled out during the critical section. As seen previously, this does not fix the issue on multi-processor systems, so locking is required to prevent other entities from accessing the shared data.
In the second case, disabling interrupts is a stricter condition than disabling preemption, since it is timer interrupts that cause preemption. It not only prevents the thread from being descheduled, but also prevents any interrupt handler from 'hijacking' the CPU in the middle of the critical section. In the same case, locking alone does not work (Why?).
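
For illustration, the Linux kernel combines the two approaches with its spin_lock_irqsave/spin_unlock_irqrestore API. The surrounding code here is a hypothetical sketch; the lock macros themselves are the real kernel primitives:

/* Sketch: kernel-style critical section shared with an interrupt handler. */
static DEFINE_SPINLOCK(my_lock);  /* real kernel macro defining a spinlock_t */
static int shared_data;           /* hypothetical shared variable */

void update_shared(int v)
{
    unsigned long flags;
    spin_lock_irqsave(&my_lock, flags);      /* disable local interrupts + acquire lock */
    shared_data = v;                         /* critical section */
    spin_unlock_irqrestore(&my_lock, flags); /* release lock + restore interrupt state */
}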

Lecture 31
Concurrency issues in an OS are challenging, as finding the race condition itself is non-trivial.

Locking in pthread
pthread_mutex

pthread_mutex_t lock; // Initialized using pthread_mutex_init
static int counter = 0;

void *thfunc(void *arg) {
    int ctr = 0;
    for(ctr=0; ctr<10000; ++ctr){
        pthread_mutex_lock(&lock);   // One thread acquires the lock, others wait
        counter++;                   // Critical section
        pthread_mutex_unlock(&lock); // Release the lock
    }
    return NULL;
}
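
For completeness, a short initialization sketch (standard pthread API, matching the variable above):

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* static initialization */

/* or dynamically, with NULL attributes selecting the defaults: */
pthread_mutex_init(&lock, NULL);
/* ...and pthread_mutex_destroy(&lock) once the mutex is no longer needed */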

Design issues of locks

The lock and unlock operations have to be efficient, as they directly influence performance. For the subsequent discussion we refer to an abstract lock, similar to pthread_mutex_lock.

lock_t* L;

lock(L) {
return; // -> Lock acquired
}

unlock(L) {
return; // ->Lock released
}

We have some implementation choices:

Hardware-assisted locks use hardware synchronization primitives such as atomic operations. These are not self-sufficient, but are useful building blocks for locks.
Software locks are implemented without assuming any hardware support. They are generally not used in practice due to excessive overhead.

Lock acquisition and wasted CPU cycles

Consider the scenario where a lock is held by thread T1 and thread T2 tries to enter the critical section. Two things can happen: either T2 repeatedly retries the lock, OR T2 is rescheduled and (ideally) comes back when the lock is released.
With busy waiting (spinlock), context-switch overheads are saved, but CPU cycles are wasted in spinning. Busy waiting is preferred when the critical section is small and the context executing it is not rescheduled (e.g., does not block for I/O).

Fairness
Given N threads contending for the lock, the number of unsuccessful lock-acquisition attempts should be the same for all contending threads.

Bounded wait property: given N threads contending for the lock, there should be an upper bound on the number of attempts made by a given context to acquire the lock.

Lecture 32
Locks
We now discuss some implementation techniques for designing spinlocks.

Buggy attempt
Consider the following implementation:

1. lock_t* L = 0; // initial value = 0;


2. lock(L) {
3. while(*L);
4. *L = 1;
5. }
6. unlock(L) {
7. *L = 0;
8. }

This implementation does not work because the check (line 3) and the set (line 4) do not occur atomically. Consider a single-core system: after thread T1 calls lock(L) and passes the while loop, it is descheduled between lines 3 and 4. Thread T2 (next in line to execute) then gains the lock, because the value of *L is still 0; when T1 comes back, it too will have gained the lock (as will T2). On a multi-core system, two cores executing the while loop can both gain the lock.

Spinlock using Atomic exchange


Consider this implementation:

1. lock_t* L = 0; // initial value = 0;

2. lock(L) {
3.   while(atomic_xchg(L, 1));
4. }
5. unlock(L) {
6.   *L = 0;
7. }

Atomic exchange: exchange the value of a memory location and a register atomically.

atomic_xchg(int *PTR, int val) returns the value at PTR before the exchange. Mutual exclusion is ensured because the read of the old value and the store of val happen as one atomic step.
However, there is no fairness guarantee.
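
A sketch of the same idea in portable C11 (my addition; the lecture's reference implementation is the x86 version below):

/* spinlock via C11 atomic exchange */
#include <stdatomic.h>

typedef atomic_int lock_t;

void lock(lock_t *L) {
    while (atomic_exchange(L, 1)) /* returns the old value; spin while it was 1 */
        ;
}

void unlock(lock_t *L) {
    atomic_store(L, 0); /* mark the lock free */
}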

// assembly equivalent using XCHG in x86_64

lock(lock_t *L) {
    asm volatile(
        "mov $1, %%rax;"
        "loop: xchg %%rax, (%%rdi);"
        "cmp $0, %%rax;"
        "jne loop;"
        : : : "rax", "memory");
}
unlock(lock_t *L) { *L = 0; }

XCHG atomically exchanges the value of register R and the value at memory address M. The RDI register contains the lock argument. (Work through the various context-switch scenarios to convince yourself of correctness.)

Spinlock using compare and swap

Atomic compare and swap: perform the comparison and the swap atomically.
CAS(int *PTR, int cmpval, int newval) sets the value at PTR to newval if cmpval equals the value at PTR. Returns 0 on a successful exchange.
As in the previous case, this does not guarantee fairness.

1. lock_t* L = 0; // initial value = 0;

2. lock(L) {
3.   while(CAS(L, 0, 1));
4. }
5. unlock(L) {
6.   *L = 0;
7. }
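
The C11 analogue, again as a sketch of mine: atomic_compare_exchange_strong returns true on success and, on failure, writes the observed value back into expected, so expected must be reset each iteration.

/* spinlock via C11 compare-and-swap */
#include <stdatomic.h>

typedef atomic_int lock_t;

void lock(lock_t *L) {
    int expected;
    do {
        expected = 0; /* reset: on failure, CAS overwrote it with the observed value */
    } while (!atomic_compare_exchange_strong(L, &expected, 1));
}

void unlock(lock_t *L) {
    atomic_store(L, 0);
}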

In x86, the equivalent instruction for CAS, cmpxchg, is not atomic by default, but can be made atomic using the lock prefix.

cmpxchg source[Reg], destination[Mem/Reg]

Implicit operands: rax and the flags register.

// happens atomically with the lock prefix; ZF = zero flag
1. if rax == [destination]
2. then
3.     flags[ZF] = 1
4.     [destination] = source
5. else
6.     flags[ZF] = 0
7.     rax = [destination]

// usage in x86
lock(lock_t *L) {
    asm volatile(
        "mov $1, %%rcx;"
        "loop: xor %%rax, %%rax;"
        "lock cmpxchg %%rcx, (%%rdi);"
        "jnz loop;"
        : : : "rcx", "rax", "memory");
}
unlock(lock_t *L) { *L = 0; }

The value of RAX (= 0) is compared against the value at the address in register RDI; if they are equal, the memory location is exchanged with RCX (= 1).

Load Linked (LL) and Store Conditional (SC)

LoadLinked(R, M): like a normal load, it loads R with the value at M. Additionally, the hardware keeps track of future stores to M.
StoreConditional(R, M): stores the value of R to M if no stores happened to M after the execution of the LL instruction (after execution, R = 1). Otherwise, the store is not performed (after execution, R = 0).
Supported in RISC architectures like MIPS, RISC-V etc.

1. lock_t* L = 0; // initial value = 0;

2. lock(L) {
3.   while(LoadLinked(L) || !StoreConditional(L, 1));
4. }
5. unlock(L) {
6.   *L = 0;
7. }

// assembly equivalent
lock: LL R1, (R2)      // R2 = lock address
      BNEQZ R1, lock   // lock held, retry
      ADDUI R1, R0, #1 // R1 = 1
      SC R1, (R2)      // attempt the conditional store
      BEQZ R1, lock    // SC failed, retry

This is efficient, as the hardware avoids memory traffic for unsuccessful lock-acquire attempts. A context switch between LL and SC causes the SC to fail (the architecture guarantees that).

Spinlocks: Reducing wasted cycles

Spinning for locks can introduce significant CPU overheads and increase energy consumption. One strategy is to back off after every failure (exponential backoff is most commonly used). Note that backing off does not ensure any fairness.

lock(lock_t* L) {
    u64 backoff = 0;
    while(LoadLinked(L) || !StoreConditional(L, 1)){
        if (backoff < 63) backoff++;
        pause(1 << backoff); // hint to the processor to pause for a while
    }
}

Fairness in spinlocks
To ensure fairness, some notion of ordering is required. What if the threads are granted the lock in
the order of their arrival to the lock contention loop? A single lock variable may not be sufficient.
Example solution: Ticket spinlocks

Atomic fetch-and-add
xadd R, M

TmpReg T = R + [M]
R = [M]
[M] = T

The fetch-and-add instruction atomically increments a value in memory while returning the old value at that address. It requires the lock prefix to be atomic.

// ticket lock

struct lock_t {
    long ticket;
    long turn;
};

void init_lock(struct lock_t* L) {
    L->ticket = 0;
    L->turn = 0;
}

void unlock(struct lock_t* L) {
    L->turn += 1;
}

void lock(struct lock_t* L) {
    long myturn = xadd(&L->ticket, 1);
    while(myturn != L->turn)
        pause(myturn - L->turn); // back off proportionally to distance from our turn
}

Consider the following example:

order of arrival: T1, T2, T3
T1 (in CS): myturn=0; L={ticket=1, turn=0}
T2: myturn=1; L={2, 0}
T3: myturn=2; L={3, 0}
T1 unlocks: L={3, 1}; T2 enters the CS.

The local variable myturn reflects the order of arrival. If a thread is in the CS, its myturn must be equal to turn.
# threads waiting = ticket - turn - 1

The value of turn is incremented on lock release, so the thread which arrived just after the current thread enters the CS. When a new thread arrives, it gets the lock after all the threads ahead of it have acquired and released the lock.

Ticket spinlocks guarantee bounded waiting: if N threads are contending for the lock and execution of the CS consumes T cycles, then the bound is N * T cycles (assuming negligible context-switch overhead).

Spinlock with yield

void lock(struct lock_t *L){
    long myturn = xadd(&L->ticket, 1);
    while(myturn != L->turn)
        sched_yield();
}

Why spin if the thread's turn is yet to come? Yield the CPU and allow the thread whose turn it is (or other non-contending threads) to run.
Further optimization: allow the thread whose "myturn" value is one more than "L->turn" to continue spinning.

Reader-Writer Locks
Allow multiple readers but only a single writer to enter the CS.
Consider the following working example of a BST:

struct BST {
    struct Node* root;
    rwlock_t* lock;
};

struct Node {
    item_t item;
    struct Node* left;
    struct Node* right;
};

void insert(BST *t, item_t item);
void lookup(BST *t, item_t item);

If multiple threads call lookup(), they may be allowed to proceed in parallel.
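
As a sketch (my addition, assuming the read/write lock operations implemented below), the BST operations would wrap themselves as follows:

void insert(BST *t, item_t item) {
    write_lock(t->lock);      /* exclusive: writers modify the tree */
    /* ... insert the item into the tree ... */
    write_unlock(t->lock);
}

void lookup(BST *t, item_t item) {
    read_lock(t->lock);       /* shared: many readers may hold this at once */
    /* ... search the tree for the item ... */
    read_unlock(t->lock);
}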

Implementation of reader-writer lock

struct rwlock_t {
    Lock read_lock;
    Lock write_lock;
    int num_readers;
};

init_lock(rwlock_t* rL) {
    init_lock(&rL->read_lock);
    init_lock(&rL->write_lock);
    rL->num_readers = 0;
}

This serves as the baseline; the writers and the readers use different lock/unlock implementations.

For the writers: the write lock behaves like a typical lock; only one thread is allowed to acquire it.

void write_lock(rwlock_t *rL) {
    lock(&rL->write_lock);
}
void write_unlock(rwlock_t *rL) {
    unlock(&rL->write_lock);
}

For the readers: the first reader acquires the write lock, to keep writers out. The last reader releases the write lock to let writers in again.

void read_lock(rwlock_t *rL) {
    lock(&rL->read_lock);
    rL->num_readers++;
    if(rL->num_readers == 1)      /* first reader blocks the writers */
        lock(&rL->write_lock);
    unlock(&rL->read_lock);
}
void read_unlock(rwlock_t *rL) {
    lock(&rL->read_lock);
    rL->num_readers--;
    if(rL->num_readers == 0)      /* last reader lets the writers in */
        unlock(&rL->write_lock);
    unlock(&rL->read_lock);
}

Software attempts at locking

Buggy attempt #1

int flag[2] = {0, 0};

void lock(int id) { // id = 0 or 1
    while(flag[id^1]);
    flag[id] = 1;
}

void unlock(int id) {
    flag[id] = 0;
}

Consider this solution for two threads only, say T0 and T1. It is buggy because both threads can acquire the lock: the while-condition check and the setting of the flag are not atomic.

Buggy attempt #2

int flag[2] = {0, 0};

void lock(int id) { // id = 0 or 1
    flag[id] = 1;
    while(flag[id^1]);
}

void unlock(int id) {
    flag[id] = 0;
}

This solution doesn't work either: it can lead to a deadlock (flag[0] = flag[1] = 1). In other words, the "progress" requirement is not met.
Progress: if no one has acquired the lock and there are contending threads, one of the threads must acquire the lock within a finite time.

Buggy attempt #3

int turn = 0;

void lock(int id) { // id = 0 or 1
    while(turn == (id^1));
}

void unlock(int id) {
    turn = id ^ 1;
}

Assuming T0 applies for the lock first, this attempt does guarantee mutual exclusion. However, there is another issue: the two threads must enter the critical section strictly alternately. Thus the progress requirement is not met, and a thread can get stuck in an infinite loop while the other thread is executing non-CS code.

Peterson’s solution

int turn = 0;
int flag[2] = {0, 0};

void lock(int flag) { // id = 0 or 1


flag[id] = 1;
turn = id ^ 1;
while(flag[id^1] && turn == (id^1));
}

void unlock(int id) {


flad[id] = 0;
}

Mutual exclusion and fairness is guaranteed (The lock is fair because if two threads are con-
tending, they acquire the lock in an alternate manner.)
19

Lecture 33
Semaphores
Consider a scenario where a finite array of size N is accessed by a set of producer and consumer threads. In this case:
- At most N concurrent producers are allowed if the array is empty
- At most N concurrent consumers are allowed if the array is full
- With mutual exclusion techniques, only one producer or consumer would be allowed at any point of time

Operations on semaphores

typedef struct semaphore {
    int value;
    spinlock_t* lock;
    queue* waitQ;
} sem_t;

sem_init(sem_t* sem, int pshared, unsigned int value);
sem_wait(sem_t* sem);
sem_post(sem_t* sem);

Semaphores can be initialized by passing an initial value.

1. int sem_wait(sem_t *s) {


2. decrement the value of semaphore s by one
3. wait if value of semaphore s is negative
4. }
5.
6. int sem_post(sem_t *s) {
7. increment the value of semaphore s by one
8. if there are one or more threads waiting, wake one
9. }

Semaphores can be used within a multi-threaded process or across multiple processes. The second argument of sem_init indicates the sharing mode: if it is 0, the semaphore is shared between threads of the same process.
Semaphores initialized with a value of 1 are called binary semaphores, and can be used to implement blocking (waiting) locks.
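
A minimal sketch of that last point (my snippet, using the API above):

sem_t m;
sem_init(&m, 0, 1);  /* initial value 1: exactly one thread may proceed */

/* in each thread: */
sem_wait(&m);        /* "lock": value 1 -> 0; subsequent waiters block */
/* ... critical section ... */
sem_post(&m);        /* "unlock": wakes one waiter, if any */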

Semaphore usage example: wait for child

Assuming that the semaphore s is accessible from multiple processes (pshared = 1):

child() {
    ...
    sem_post(s);
    exit(0);
}

int main(void) {
    sem_init(s, 1, 0); // shared across processes, initial value 0
    if(fork() == 0)
        child();
    sem_wait(s);
}

If the parent runs sem_wait before the child posts, it waits till the child finishes (sem_wait decrements the value, which becomes negative, so the parent blocks). If the child posts before the parent waits, the parent does not block on the semaphore.

Semaphore usage example: ordering

A=0; B=0;

Thread-0 {
A = 1;
printf("B: %d\n", B);
}

Thread-1 {
B = 1;
printf("A: %d\n", A);
}

What are the possible outcomes?


{A=1,B=1}, {A=1,B=0}, {A=0,B=1}.
How to guarantee {A=1, B=1}?

Attempt 1:
sem_init(&s1, 0, 0); // pshared=0, initial value 0
A=0; B=0;

Thread-0 {
A = 1;
sem_wait(&s1);
printf("B: %d\n", B);
}

Thread-1 {
B = 1;
sem_post(&s1);
printf("A: %d\n", A);
}

What are the possible outcomes?


{A=1,B=1}, {A=0,B=1}

Attempt 2:
sem_init(&s1, 0, 0);
sem_init(&s2, 0, 0);

A=0; B=0;

Thread-0 {
    A=1;
    sem_post(&s1);
    sem_wait(&s2);
    printf("B: %d\n", B);
}

Thread-1 {
    B=1;
    sem_wait(&s1);
    sem_post(&s2);
    printf("A: %d\n", A);
}

Waiting for each other guarantees the desired output.

Producer-Consumer Problem

A buffer of size N, with one or more producers and consumers. A producer produces an element into the buffer (after processing); a consumer extracts an element from the buffer and processes it. Examples: a multi-threaded web server, network protocol layers etc.
How do we solve this problem using semaphores?

Buggy Attempt 1

item_t A[n]; int pctr=0, cctr=0;
sem_t empty=sem_init(n); sem_t used=sem_init(0);

produce(item_t item) {
    sem_wait(&empty);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    sem_post(&used);
}

item_t consume() {
    sem_wait(&used);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    sem_post(&empty);
    return item;
}

This solution is buggy because pctr and cctr are not protected and can suffer race conditions (with multiple producers or multiple consumers).

Buggy Attempt 2

item_t A[n]; int pctr=0, cctr=0; lock_t* L=init_lock();
sem_t empty=sem_init(n); sem_t used=sem_init(0);

produce(item_t item) {
    Lock(L); sem_wait(&empty);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    sem_post(&used); Unlock(L);
}

item_t consume() {
    Lock(L); sem_wait(&used);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    sem_post(&empty); Unlock(L);
    return item;
}

Consider empty = 0 with the producer having taken the lock before the consumer. This results in a deadlock: the consumer waits for L and the producer waits for empty.

A working solution

item_t A[n]; int pctr=0, cctr=0; lock_t* L=init_lock();
sem_t empty=sem_init(n); sem_t used=sem_init(0);

produce(item_t item) {
    sem_wait(&empty); Lock(L);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    Unlock(L); sem_post(&used);
}

item_t consume() {
    sem_wait(&used); Lock(L);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    Unlock(L); sem_post(&empty);
    return item;
}

This solution is deadlock-free and ensures correct synchronization, but it is heavily serialized: all producers and consumers contend for the single lock L inside produce and consume.

Solution with different mutexes

item_t A[n]; int pctr=0, cctr=0; lock_t* P=init_lock(); lock_t* C=init_lock();
sem_t empty=sem_init(n); sem_t used=sem_init(0);

produce(item_t item) {
    sem_wait(&empty); Lock(P);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    Unlock(P); sem_post(&used);
}

item_t consume() {
    sem_wait(&used); Lock(C);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    Unlock(C); sem_post(&empty);
    return item;
}

Check the correctness of this solution (producers now serialize only with producers, and consumers only with consumers).

Lecture 34
Some common issues in concurrent programs: atomicity violations, failures of ordering assumptions, and deadlocks.

Atomicity issues
Consider the following program
char* ptr; // allocated before use

void T1() {
...
strcpy(ptr, "hello world");
...
}

void T2() {
    ...
    if (some_condition) {
        free(ptr); ptr=NULL;
    }
}

This code is buggy because ptr can be freed before the strcpy, which results in a segmentation fault. A first attempt at rectifying it:
char* ptr; // allocated before use

void T1() {
...
if(ptr) strcpy(ptr, "hello world");
...
}

void T2() {
...
if (some_condition) {
free(ptr); ptr=NULL;
}
}

This, however, does not fix the issue. Consider the following order of execution:

T1: if(ptr) → T2: free(ptr); ptr=NULL → T1: strcpy → SEGFAULT
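
The check and the use must be made atomic, e.g., with a mutex. A sketch (the mutex m is my addition, not from the lecture):

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
char* ptr; // allocated before use

void T1() {
    pthread_mutex_lock(&m);
    if (ptr)                        /* check and use are now one atomic unit */
        strcpy(ptr, "hello world");
    pthread_mutex_unlock(&m);
}

void T2() {
    pthread_mutex_lock(&m);
    if (some_condition) {
        free(ptr); ptr = NULL;      /* cannot race with the strcpy above */
    }
    pthread_mutex_unlock(&m);
}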

Ordering issues
The following code works under the assumption that line #4 of T2 executes only after line #4 of T1. If this ordering is violated (T2 sets pending=False before T1 sets pending=True), T1 is stuck in the while loop forever.
1. bool pending;
2. void T1()
3. {
4. pending=True;
5. do_large_processing();
6. while(pending);
7. }

1. void T2()
2. {
3. do_some_processing();
4. pending=False;
5. some_other_processing();
6. }
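
One way to remove the ordering assumption is to replace the flag with a semaphore (a sketch of mine, not from the lecture): sem_wait/sem_post behave correctly in either execution order.

sem_t done; // sem_init(&done, 0, 0) before starting the threads

void T1()
{
    do_large_processing();
    sem_wait(&done);   /* blocks until T2 posts, no matter who runs first */
}

void T2()
{
    do_some_processing();
    sem_post(&done);   /* replaces pending = False */
    some_other_processing();
}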

Deadlocks

struct acc_t {
    lock_t* L;
    id_t acc_no;
    long balance;
};

void txn_transfer(acc_t* src, acc_t* dst, long amount) {
    lock(src->L); lock(dst->L);
    check_and_transfer(src, dst, amount);
    unlock(src->L); unlock(dst->L);
}

Consider this situation (which results in a deadlock):

T1: txn_transfer(a1, a2, 1000) ⇒ lock(a1), lock(a2)
T2: txn_transfer(a2, a1, 1000) ⇒ lock(a2), lock(a1). Deadlock!
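
A standard remedy (a sketch; the ordering rule is my assumption, not from the lecture) is to impose a global lock order, e.g., always acquire the lock of the account with the smaller acc_no first:

void txn_transfer(acc_t* src, acc_t* dst, long amount) {
    acc_t* first  = (src->acc_no < dst->acc_no) ? src : dst;
    acc_t* second = (first == src) ? dst : src;
    lock(first->L);
    lock(second->L);   /* every thread locks in the same order: no cycle, no deadlock */
    check_and_transfer(src, dst, amount);
    unlock(second->L);
    unlock(first->L);
}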
