CS330 Operating System Part VI
Lecture 29
Threads
Threads are almost independent execution entities of a single process. Threads of a single process
can be scheduled on different CPUs in a concurrent manner.
Therefore, each thread has its own register state and stack, and at a given point in time the PC of different threads can differ.
Threads are different from processes. Threads of a single process share the address space, and
therefore context switch between them does not require switching the address space.
Is multi-threading useful?
Multithreading allows a process to leverage multi-core systems.
Threads share the address space, so global variables can be accessed from the thread functions. Also, dynamically allocated memory can be passed as thread arguments.
Thread information is stored in the thread control block (TCB) which is pointed to by the PCB.
TCB contains the register state, which is used to save/restore the CPU state during context switch.
In Linux, however, there is less distinction between a thread and a process: a thread is treated as a separate process (task). The difference is that constructs within the domain of the current process, like the address space, file state etc., do not have to be copied as in a vanilla fork implementation; only pointers to them are maintained and copied. A thread also differs from a process in that its PID differs from the TGID (thread group ID), which equals the PID of the main thread.
The stack for a thread is dynamically allocated from the address space using the mmap() system call and passed to the OS during thread creation. Since all threads operate in the same address space, one thread can potentially access (or corrupt) the stack of another; this is an accepted trade-off.
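As a sketch of such an allocation (STACK_SIZE and the flag set are illustrative; pthread_create performs an equivalent allocation internally):

#include <sys/mman.h>
#define STACK_SIZE (8 * 1024 * 1024)   /* 8 MB, a typical default (assumed) */
/* Allocate a thread stack; MAP_STACK marks the mapping as suitable for a stack on Linux */
void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);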
pthread_create creates a thread with tid as its handle, and the thread starts executing the function pointed to by thfunc. A single void* argument can be passed to the thread. The thread attribute can be used to control thread behavior, e.g., stack size, stack address etc.; passing NULL sets the defaults. Returns 0 on success.
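For reference, the prototype:

int pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                   void *(*thfunc)(void *), void *arg);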
A thread can also be terminated using calls like pthread_exit() or pthread_cancel().
In Linux, both pthread_create and fork are implemented using the clone system call.
pthread_join
This call is generally made from the ‘main’ thread. It waits for the thread tid to finish, and the thread’s return value is captured in retval (the thread must allocate the return value, which is freed after the join).
Invoking pthread_join on an already finished thread returns immediately.
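For reference, the prototype:

int pthread_join(pthread_t tid, void **retval);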
// pthreads.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<pthread.h>
/* Worker function (body reconstructed; the notes only show its use) */
void *do_inc(void *arg)
{
    int id = *(int *)arg;
    printf("Thread %d running\n", id);
    return NULL;
}
int main()
{
    int num_threads = 4, ctr;
    pthread_t threads[4];
    int tids[4];
    void *retval;
    /*Create threads*/
    for(ctr=0; ctr < num_threads; ++ctr){
        tids[ctr] = ctr;
        if(pthread_create(&threads[ctr], NULL, do_inc, &tids[ctr]) != 0){
            perror("pthread_create");
            exit(-1);
        }
    }
    /*Wait for threads to finish their execution*/
    for(ctr=0; ctr < num_threads; ++ctr)
        pthread_join(threads[ctr], &retval);
    return 0;
}
The next example compares the stack addresses of the main thread and a created thread:
#include<stdio.h>
#include<stdlib.h>
#include<pthread.h>
/* thfunc reconstructed: print an address on the thread's stack */
void *thfunc(void *arg) {
    int local;
    printf("Thread stack pointer = %p\n", (void *)&local);
    return NULL;
}
int main()
{
    pthread_t tid;
    if(pthread_create(&tid, NULL, thfunc, NULL) != 0){
        perror("pthread_create");
        exit(-1);
    }
    printf("Main stack pointer = %p\n", &tid);
    pthread_join(tid, NULL);
    return 0;
}
The printed addresses show that the thread’s stack lies far from the main stack, in the mmap region. Running strace on this program shows an mmap call (returning an address close to the thread stack address) followed by a clone call. Several flags are passed to clone(), some of which are: CLONE_VM (share the address space), CLONE_FS (share file-system state), CLONE_THREAD (place the new task in the same thread group).
Linux does not distinguish between the way fork and pthread_create create new processes/threads; both go through clone. The difference lies in the flags that are passed.
// find_max_parallel.c (missing pieces reconstructed; the per-element function is assumed)
#include<stdio.h>
#include<stdlib.h>
#include<sys/time.h>
#include<string.h>
#include<math.h>
#include<assert.h>
#include<pthread.h>

#define MAX_THREADS 64
#define SEED 20              /* assumed seed value */
#define TDIFF(s, e) (((e).tv_sec - (s).tv_sec) * 1000000L + ((e).tv_usec - (s).tv_usec))
#define USAGE_EXIT(s) do { \
    printf("Usage: %s <# of elements> <# of threads>\n %s\n", argv[0], s); \
    exit(-1); \
} while(0)
#define function(x) sqrt((double)(x))  /* per-element work; exact function assumed */

struct thread_param{
    pthread_t tid;
    int *array;
    int size;
    double max;
    int max_index;
};

/* Thread function (reconstructed): compute the max over this thread's chunk */
static void *find_max(void *arg)
{
    struct thread_param *param = arg;
    int ctr;
    param->max = function(param->array[0]);
    param->max_index = 0;
    for(ctr=1; ctr < param->size; ++ctr){
        double val = function(param->array[ctr]);
        if(val > param->max){
            param->max = val;
            param->max_index = ctr;
        }
    }
    return NULL;
}

int main(int argc, char **argv)
{
    struct thread_param *params, *param;
    struct timeval start, end;
    int *a, *ptr;
    int num_elements, ctr, num_threads, per_thread, residue, max_index = 0;
    double max = 0.0;
    if(argc !=3)
        USAGE_EXIT("not enough parameters");
    num_elements = atoi(argv[1]);
    if(num_elements <=0)
        USAGE_EXIT("invalid num elements");
    num_threads = atoi(argv[2]);
    if(num_threads <=0 || num_threads > MAX_THREADS){
        USAGE_EXIT("invalid num of threads");
    }
    per_thread = num_elements / num_threads;
    residue = num_elements % num_threads;
    if(per_thread <= 0)
        USAGE_EXIT("invalid num of elements to threads");
    a = malloc(num_elements * sizeof(int));
    if(!a){
        USAGE_EXIT("invalid num elements, not enough memory");
    }
    params = calloc(num_threads, sizeof(struct thread_param));
    if(!params)
        USAGE_EXIT("not enough memory");
    srand(SEED);
    for(ctr=0; ctr<num_elements; ++ctr)
        a[ctr] = rand();
    ptr = a;
    gettimeofday(&start, NULL);
    /* Partition the array across threads; spread the residue over the first few */
    for(ctr=0; ctr<num_threads; ++ctr){
        param = &params[ctr];
        param->size = per_thread;
        if(residue){
            param->size++;
            --residue;
        }
        param->array = ptr;
        ptr += param->size;
        pthread_create(&param->tid, NULL, find_max, param);
    }
    assert((ptr - a) == num_elements);
    /* Join the threads and reduce their per-chunk results */
    for(ctr=0; ctr<num_threads; ++ctr){
        pthread_join(params[ctr].tid, NULL);
        if(params[ctr].max > max){
            max = params[ctr].max;
            max_index = (int)(params[ctr].array - a) + params[ctr].max_index;
        }
    }
    gettimeofday(&end, NULL);
    printf("Time taken = %ld microsecs\n", TDIFF(start, end));
    printf("Max = %.2f @index %d\n", max, max_index);
    free(a);
    free(params);
    return 0;
}
Running this program, we observe that increasing the number of threads reduces the time taken.
Lecture 30
Consider the simultaneous increment to a global counter variable by multiple threads.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#define MAX_THREADS 64
#define OP_COUNT 10000000UL   /* 10^7 increments per thread */
#define USAGE_EXIT(s) \
    do \
    { \
        printf("Usage: %s <# of threads> \n %s\n", argv[0], s); \
        exit(-1); \
    } while (0)

unsigned long g_ctr = 0;      /* shared global counter */

/* Thread function and main reconstructed; the notes show only fragments */
void *do_inc(void *arg)
{
    unsigned long ctr;
    for (ctr = 0; ctr < OP_COUNT; ++ctr)
    {
        g_ctr++;
        // asm volatile("incq %0;"
        //              : "=m"(g_ctr)
        //              :
        //              : "memory");
    }
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t threads[MAX_THREADS];
    int num_threads, ctr;
    if (argc != 2)
        USAGE_EXIT("not enough parameters");
    num_threads = atoi(argv[1]);
    if (num_threads <= 0 || num_threads > MAX_THREADS)
    {
        USAGE_EXIT("invalid num of threads");
    }
    for (ctr = 0; ctr < num_threads; ++ctr)
        pthread_create(&threads[ctr], NULL, do_inc, NULL);
    for (ctr = 0; ctr < num_threads; ++ctr)
        pthread_join(threads[ctr], NULL);
    printf("Counter = %lu\n", g_ctr);
    return 0;
}
Output: with two threads, the final counter is not 2 × 10000000. Rather, the result is non-deterministic and varies across runs.
Reason:
counter++ in assembly:
    Mov (counter) R1
    Add 1 R1
    Mov R1 (counter)
A single C line can compile to multiple instructions, and scheduling between them creates the problem. For example: T1 loads counter (say 0) into a register and is then scheduled out; T2 runs and completes its full increment, making counter 1. When T1 is scheduled back, it adds 1 to its stale register value and stores 1. The counter should have been 2, but it is not, because of the in-between scheduling.
Some definitions
• Atomic operation: An operation is atomic if it is uninterruptible and indivisible.
• Critical section: A section of code accessing one or more shared resource(s), mostly shared
memory location(s).
• Mutual exclusion: Technique to allow exactly one execution entity to execute the critical
section.
• Race condition: Occurs when multiple threads are allowed to enter the critical section concurrently, making the result depend on the order of execution.
Critical sections of an OS
• The OS maintains shared information that can be accessed from different OS-mode execution contexts (e.g., system call handlers, interrupt handlers etc.), which often run in parallel. For example:
1. A page table entry being updated simultaneously due to swapping and due to a change in protection flags.
2. The queue of network packets being updated concurrently to deliver packets to a process and to receive incoming packets from the network device.
In the first case (system calls only), disabling preemption prevents the thread from being scheduled out during the course of the critical section. As seen previously, this does not rectify the issue on a multi-processor system, so locking is required to keep other entities from accessing the shared data.
In the second case, disabling interrupts is a stricter condition than disabling preemption, since it is timer interrupts that cause preemption. It not only prevents the thread from being descheduled, but also prevents any interrupt handler from ‘hijacking’ and taking over execution during the critical section. In this case, locking alone does not work (Why?).
Lecture 31
Concurrency issues in an OS are challenging because even finding the race condition is non-trivial.
Locking in pthread
pthread_mutex
pthread_mutex_t lock; // Initialized using pthread_mutex_init
static int counter = 0;
void *thfunc(void *arg) {
    int ctr;
    for(ctr=0; ctr<10000; ++ctr){
        pthread_mutex_lock(&lock); // One thread acquires the lock, others wait
        counter++; // Critical section
        pthread_mutex_unlock(&lock); // Release the lock
    }
    return NULL;
}
A generic lock interface:
lock_t* L;
lock(L) {
    return; // returns only after the lock is acquired
}
unlock(L) {
    return; // returns after the lock is released
}
Fairness
Given N threads contending for the lock, the number of unsuccessful lock-acquisition attempts should be the same for all contending threads.
Bounded wait property: Given N threads contending for the lock, there should be an upper
bound on the number of attempts made by a given context to acquire the lock.
Lecture 32
Locks
We now look at some implementation techniques for designing spinlocks.
Buggy attempt
Consider the following implementation:
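The code itself is not reproduced in the notes; a standard sketch consistent with the discussion below (assuming *L == 0 means free):

void lock(lock_t *L) {
    while (*L == 1);   // spin while the lock is held
    *L = 1;            // then mark it held -- not atomic with the check!
}
void unlock(lock_t *L) {
    *L = 0;
}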
This implementation does not work because the compare (the while check) and the set must occur atomically. Consider a single-core system: after thread T1 calls lock(L) and passes the while check, it is descheduled before setting *L. Thread T2, next in line of execution, also passes the check because *L has not yet been set, and gains the lock; when T1 comes back, it too completes the acquisition, so both T1 and T2 hold the lock. In a multi-core system, two cores executing the while check concurrently can both gain the lock.
One hardware remedy is an atomic exchange instruction (x86 xchg), which exchanges the value of register R and the value at memory address M as a single atomic operation. In the lock implementations here, the RDI register contains the lock argument. (One can work through various context-switch scenarios to verify correctness.)
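A sketch of a spinlock built on atomic exchange (the exact code is not in the notes; *L == 0 means free, and the lock pointer sits in RDI per the calling convention):

lock(lock_t *L) {
    asm volatile(
        "mov $1, %%rax;"
        "1: xchg %%rax, (%%rdi);"   // atomically swap RAX with *L; xchg with memory is locked
        "test %%rax, %%rax;"
        "jnz 1b;"                   // old value was 1: lock was held, retry
        : : : "rax", "memory");
}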
In x86, the CAS equivalent cmpxchg is not atomic by default, but it can be made atomic by using the lock prefix.
// usage in x86
lock(lock_t *L) {
    asm volatile(
        "mov $1, %%rcx;"
        "loop: xor %%rax, %%rax;"
        "lock cmpxchg %%rcx, (%%rdi);"
        "jnz loop;"
        : : : "rcx", "rax", "memory");
}
unlock(lock_t *L) { *L = 0; }  // release: 0 means free (the original's *L = 1 appears to be a typo)
The value of RAX (= 0) is compared against the value at the address in register RDI; if they are equal, the memory value is replaced with RCX (= 1).
LoadLinked (R, M): Loads the value at memory address M into register R, and the hardware tracks M for intervening stores.
StoreConditional (R, M): Stores the value of R to M only if no stores happened to M after the execution of the LL instruction (after execution, R = 1). Otherwise, the store is not performed (after execution, R = 0).
Supported in RISC architectures like MIPS, RISC-V etc.
// assembly equivalent
lock: LL R1, (R2)      // R2 = lock address; R1 = current lock value
      BNEQZ R1, lock   // lock held (non-zero): retry
      ADDUI R1, R0, #1 // R1 = 1
      SC R1, (R2)      // attempt the store; R1 = 1 on success, 0 on failure
      BEQZ R1, lock    // SC failed: retry
This is efficient because the hardware avoids memory traffic for unsuccessful lock-acquire attempts. A context switch between LL and SC causes the SC to fail (the architecture guarantees that).
A variant that backs off between attempts reduces contention further:
lock (lock_t* L) {
    u64 backoff = 0;
    while(LoadLinked(L) || !StoreConditional(L, 1)){
        if (backoff < 63) backoff++;
        pause(1UL << backoff); // hint to processor; wait longer after each failure
    }
}
Fairness in spinlocks
To ensure fairness, some notion of ordering is required. What if the threads are granted the lock in the order of their arrival at the lock contention loop? A single lock variable may not be sufficient.
Example solution: Ticket spinlocks
Atomic fetch-and-add
xadd R, M:
    TmpReg T = R + [M]
    R = [M]
    [M] = T
xadd is a fetch-and-add instruction: it atomically adds R to the value at a particular address while returning the old value in R. It requires the lock prefix to be atomic.
// ticket locking
struct lock_t {
long ticket;
long turn;
};
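A sketch of the acquire/release paths, assuming a fetch_and_add() primitive built on lock xadd:

void lock(struct lock_t *L) {
    long myturn = fetch_and_add(&L->ticket, 1); // take a ticket: order of arrival
    while (L->turn != myturn);                  // spin until it is our turn
}
void unlock(struct lock_t *L) {
    L->turn++;                                  // hand over to the next ticket holder
}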
The local variable myturn records the order of arrival. A thread is in the CS only when its myturn equals turn.
# threads waiting = ticket - turn - 1
The value of turn is incremented on lock release, so the thread that arrived just after the current thread enters the CS next. When a new thread arrives, it gets the lock only after the threads ahead of it have acquired and released the lock.
Ticket spinlock guarantees bounded waiting. If N threads are contending for the lock and execution of the CS consumes T cycles, then the bound is N × T cycles (assuming negligible context-switch overhead).
Reader-Writer Locks
Allows multiple readers but a single writer to enter the CS.
Consider the following working example: a BST protected by a reader-writer lock.
struct BST {
    struct Node* root;
    rwlock_t* lock;
};
struct Node {
    item_t item;
    struct Node* left;
    struct Node* right;
};
struct rwlock_t {
    Lock read_lock;
    Lock write_lock;
    int num_readers;
};
init_lock(rwlock_t* rL) {
    init_lock(&rL->read_lock);
    init_lock(&rL->write_lock);
    rL->num_readers = 0;
}
This serves as the baseline; the writers and the readers get different implementations.
For the writers: write-lock behavior is the same as the typical lock; only one thread is allowed to acquire it.
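A minimal sketch of the writer side (using the generic lock/unlock from before):

void write_lock(rwlock_t *rL)   { lock(&rL->write_lock); }
void write_unlock(rwlock_t *rL) { unlock(&rL->write_lock); }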
For the readers: the first reader acquires the write lock, to prevent writers from entering; the last reader releases the write lock to allow writers again.
void read_lock(rwlock_t *rL) {
lock(&rL->read_lock);
rL->num_readers++;
if(rL->num_readers == 1)
lock(&rL->write_lock);
unlock(&rL->read_lock);
}
void read_unlock(rwlock_t *rL) {
    lock(&rL->read_lock);
    rL->num_readers--;
    if(rL->num_readers == 0)       // last reader out: let writers in
        unlock(&rL->write_lock);
    unlock(&rL->read_lock);
}
Buggy attempt #1
Consider a flag-based solution for two threads only, say T0 and T1: each thread waits until the other’s flag is clear and then sets its own. This is buggy because both threads can acquire the lock: the “while condition check” and “setting the flag” are not atomic.
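A sketch of this attempt (reconstructed from the description; id is the thread’s index, 0 or 1):

int flag[2] = {0, 0};
void lock(int id) {
    while (flag[id ^ 1]);  // check: wait while the other thread is in
    flag[id] = 1;          // set: mark ourselves in -- not atomic with the check
}
void unlock(int id) {
    flag[id] = 0;
}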
Buggy attempt #2
int flag[2] = {0, 0};
void lock(int id) {
    flag[id] = 1;          // announce intent first
    while(flag[id^1]);     // then wait for the other thread
}
This solution does not work either: it can lead to deadlock (flag[0] = flag[1] = 1, with both threads spinning). In other words, the “progress” requirement is not met.
Progress: If no one has acquired the lock and there are contending threads, one of the threads
must acquire the lock within a finite time.
Buggy attempt #3
This attempt uses a single shared variable and strict alternation:
int turn = 0;
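The remaining code is not in the notes; a sketch consistent with the discussion below:

void lock(int id) {
    while (turn != id);    // spin until it is this thread's turn
}
void unlock(int id) {
    turn = id ^ 1;         // pass the turn to the other thread
}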
Assuming T0 applies for the lock first, this attempt does guarantee mutual exclusion. However, the two threads must contend for the lock strictly alternately: if one thread stops contending, the other spins forever in the lock loop (in non-CS code) even though no one holds the lock. Thus the progress requirement is not met.
Peterson’s solution
int turn = 0;
int flag[2] = {0, 0};
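The lock/unlock routines (standard form of Peterson’s algorithm; not reproduced in the notes):

void lock(int id) {
    flag[id] = 1;                             // announce intent
    turn = id ^ 1;                            // yield priority to the other thread
    while (flag[id ^ 1] && turn == (id ^ 1)); // wait only if the other intends and has priority
}
void unlock(int id) {
    flag[id] = 0;
}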
Mutual exclusion and fairness are guaranteed. (The lock is fair because, if two threads are contending, they acquire the lock in an alternating manner.)
Lecture 33
Semaphores
Consider a scenario where a finite array of size N is accessed from a set of producer and consumer threads. In this case:
- At most N concurrent producers are allowed if the array is empty
- At most N concurrent consumers are allowed if the array is full
- With mutual-exclusion techniques, only one producer or consumer is allowed at any point in time.
Operations on semaphores
typedef struct semaphore {
    int value;
    spinlock_t* lock;
    queue* waitQ;
} sem_t;
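A sketch of the wait and post operations on this structure (blocking semantics; current, enqueue/dequeue, block and wakeup are assumed helpers):

void sem_wait(sem_t *s) {
    lock(s->lock);
    s->value--;
    if (s->value < 0) {              // no resource: block the caller
        enqueue(s->waitQ, current);
        unlock(s->lock);
        block(current);              // sleep until a sem_post wakes us
    } else
        unlock(s->lock);
}
void sem_post(sem_t *s) {
    lock(s->lock);
    s->value++;
    if (s->value <= 0)               // someone is waiting
        wakeup(dequeue(s->waitQ));
    unlock(s->lock);
}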
Semaphores can be used within a multi-threaded process or across multiple processes. The second argument of sem_init controls sharing: if it is 0, the semaphore is shared between threads of the same process.
Semaphores initialized with a value of 1 are called binary semaphores, and can be used to implement blocking (waiting) locks.
A semaphore initialized to 0 lets a parent wait for its child. If the parent is scheduled first after child creation, it waits until the child finishes (sem_wait decrements the value, and since the value becomes negative, the parent blocks). If the child is scheduled before the parent, the parent does not wait on the semaphore.
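A sketch of this scenario (semaphore initialized to 0; names assumed):

sem_t s;                      // sem_init(&s, 0, 0): initial value 0
void *child(void *arg) {
    /* ... child's work ... */
    sem_post(&s);             // signal completion; wakes the parent if it is waiting
    return NULL;
}
/* parent, after pthread_create(): */
sem_wait(&s);                 // blocks only if the child has not posted yet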
Semaphores can also be used to enforce ordering across threads. Consider two threads that each set one variable and print the other:
A=0; B=0;
Thread-0 {
A = 1;
printf("B: %d\n", B);
}
Thread-1 {
B = 1;
printf("A: %d\n", A);
}
Attempt 1: ensure Thread-0 prints B only after Thread-1 has set it.
sem_init(&s1, 0, 0);
A=0; B=0;
Thread-0 {
A = 1;
sem_wait(&s1);
printf("B: %d\n", B);
}
Thread-1 {
B = 1;
sem_post(&s1);
printf("A: %d\n", A);
}
Attempt 2: a rendezvous; each thread signals its own update and waits for the other’s, so both threads print 1.
sem_init(&s1, 0, 0);
sem_init(&s2, 0, 0);
A=0; B=0;
Thread-0 {
    A=1;
    sem_post(&s1);
    sem_wait(&s2);
    printf("B: %d\n", B);
}
Thread-1 {
    B=1;
    sem_wait(&s1);
    sem_post(&s2);
    printf("A: %d\n", A);
}
Producer-Consumer Problem
A buffer of size N, with one or more producers and consumers. A producer inserts an element into the buffer (after processing); a consumer extracts an element from the buffer and processes it. Examples: a multi-threaded web server, network protocol layers etc.
How to solve this problem using semaphores?
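The shared state assumed by the attempts below (the declarations are not shown in the notes):

item_t A[N];             // shared buffer, n == N slots
int pctr = 0, cctr = 0;  // producer/consumer positions
sem_t empty;             // count of empty slots; sem_init(&empty, 0, N)
sem_t used;              // count of filled slots; sem_init(&used, 0, 0)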
Buggy Attempt 1
produce(item_t item) {
    sem_wait(&empty);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    sem_post(&used);
}
item_t consume() {
    sem_wait(&used);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    sem_post(&empty);
    return item;
}
This solution is buggy because pctr and cctr are not protected: concurrent producers (or consumers) can race on them.
Buggy Attempt 2
produce(item_t item) {
    Lock(L); sem_wait(&empty);
    A[pctr]=item;
    pctr=(pctr+1)%n;
    sem_post(&used); Unlock(L);
}
item_t consume() {
    Lock(L); sem_wait(&used);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    sem_post(&empty); Unlock(L);
    return item;
}
Consider empty = 0 when the producer takes the lock before the consumer. This results in a deadlock: the consumer waits for L while the producer, holding L, waits on empty.
A working solution
produce(item_t item) {
sem_wait(&empty); Lock(L);
A[pctr]=item;
pctr=(pctr+1)%n;
Unlock(L); sem_post(&used);
}
item_t consume() {
    sem_wait(&used); Lock(L);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    Unlock(L); sem_post(&empty);
    return item;
}
The solution is deadlock-free and ensures correct synchronization, but it is heavily serialized: every produce and consume goes through the single lock L. Using separate locks for producers (P) and consumers (C) lets a producer and a consumer proceed concurrently:
produce(item_t item) {
sem_wait(&empty); Lock(P);
A[pctr]=item;
pctr=(pctr+1)%n;
Unlock(P); sem_post(&used);
}
item_t consume() {
    sem_wait(&used); Lock(C);
    item_t item = A[cctr];
    cctr=(cctr+1)%n;
    Unlock(C); sem_post(&empty);
    return item;
}
Lecture 34
Some common issues in concurrent programs: atomicity violations, failures of ordering assumptions, and deadlocks.
Atomicity issues
Consider the following program
char* ptr; // allocated before use
void T1() {
...
strcpy(ptr, "hello world");
...
}
void T2() {
...
if (some_condition) {
free(ptr); ptr=NULL;
}
}
This code is buggy because ptr can be freed (by T2) before the strcpy in T1 completes, which results in a segmentation fault. A first attempt at a fix:
char* ptr; // allocated before use
void T1() {
...
if(ptr) strcpy(ptr, "hello world");
...
}
void T2() {
...
if (some_condition) {
free(ptr); ptr=NULL;
}
}
This, however, does not fix the issue. Consider the following order of execution:
T1: if(ptr)  →  T2: free(ptr); ptr = NULL  →  T1: strcpy  →  Result: SEGFAULT.
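The check and the use must be made atomic, for example by holding a lock across both (a sketch; not the notes’ own fix):

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
void T1() {
    pthread_mutex_lock(&m);
    if (ptr) strcpy(ptr, "hello world");  // check and use under one lock
    pthread_mutex_unlock(&m);
}
void T2() {
    pthread_mutex_lock(&m);
    if (some_condition) { free(ptr); ptr = NULL; }
    pthread_mutex_unlock(&m);
}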
Ordering issues
This code works under the assumption that line #4 of T2 is executed after line #4 of T1. If this ordering is violated, T1 is stuck in the while loop.
1. bool pending;   // requires <stdbool.h>
2. void T1()
3. {
4.     pending = true;
5.     do_large_processing();
6.     while(pending);
7. }

1. void T2()
2. {
3.     do_some_processing();
4.     pending = false;
5.     some_other_processing();
6. }
Deadlocks
struct acc_t {
lock_t* L;
id_t acc_no;
long balance;
};
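A sketch of how a deadlock can arise with such per-account locks (the transfer code itself is not in the notes): if T1 runs transfer(A, B) while T2 runs transfer(B, A), each acquires its first lock and then waits forever for the other’s.

void transfer(struct acc_t *from, struct acc_t *to, long amount) {
    lock(from->L);            // T1 locks A while T2 locks B ...
    lock(to->L);              // ... then each waits for the other's lock: deadlock
    from->balance -= amount;
    to->balance   += amount;
    unlock(to->L);
    unlock(from->L);
}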