CS330: Operating Systems
Locks

Recap: Synchronization and locking
- Locking is necessary when multiple contexts access shared resources
  - Example: Multiple threads, multiple OS execution contexts
- Efficiency of lock and unlock operations
  - Hardware-assisted lock implementations are used for efficiency
  - Lock acquisition vs. wasted CPU cycles
  - Use waiting locks and spinlocks depending on the requirement
- Fairness of the locking scheme
  - Contending threads should not starve for the lock (infinitely)

pthread_mutex_t lock;    // Initialized using pthread_mutex_init
static int counter = 0;
void *thfunc(void *arg)
{
    int ctr = 0;
    for(ctr = 0; ctr < delay; ++ctr){
        pthread_mutex_lock(&lock);    // One thread acquires the lock, others wait
        counter++;                    // Critical section
        pthread_mutex_unlock(&lock);  // Release the lock
    }
}
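A self-contained version of the recap example may be easier to experiment with; the thread count, iteration count, and the main() driver below are illustrative additions, not part of the slide.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4      /* illustrative choice */
#define DELAY 100000       /* illustrative choice */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

void *thfunc(void *arg)
{
    for (int ctr = 0; ctr < DELAY; ++ctr) {
        pthread_mutex_lock(&lock);    /* one thread acquires, others wait */
        counter++;                    /* critical section */
        pthread_mutex_unlock(&lock);  /* release the lock */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&t[i], NULL, thfunc, NULL);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(t[i], NULL);
    printf("counter = %d\n", counter);   /* always NUM_THREADS * DELAY */
    return 0;
}

Compile with gcc -pthread; without the mutex the final count is usually lower than NUM_THREADS * DELAY because of the lost-update race.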
Agenda: Spinlocks, Semaphore and mutex (waiting locks)
Spinlock: Buggy attempt
1.  lock_t *L; // Initial value = 0
2.  lock(L)
3.  {
4.      while(*L);
5.      *L = 1;
6.  }
7.  unlock(L)
8.  {
9.      *L = 0;
10. }
- Does this implementation work?
- No, it does not ensure mutual exclusion
- Why?
  - Single core: Context switch between line #4 and line #5
  - Multicore: Two cores exiting the while loop by reading lock = 0
- Core issue: Compare and swap has to happen atomically!

Spinlock using atomic exchange
lock_t *L; // Initial value = 0
lock(L)
{
    while(atomic_xchg(L, 1));
}
unlock(L)
{
    *L = 0;
}
- Atomic exchange: exchange the value of memory and register atomically
- atomic_xchg(int *PTR, int val) returns the value at PTR before exchange
- Ensures mutual exclusion if "val" is stored on a register
- No fairness guarantees
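In portable C the same idea can be expressed with compiler atomics; the sketch below uses the GCC/Clang __atomic_exchange_n builtin as a stand-in for the slide's atomic_xchg (a toolchain assumption, not something the slide specifies).

typedef int lock_t;    /* 0 = free, 1 = held */

static void lock(lock_t *L)
{
    /* Atomically store 1 and fetch the old value; spin while it was already 1. */
    while (__atomic_exchange_n(L, 1, __ATOMIC_ACQUIRE))
        ;
}

static void unlock(lock_t *L)
{
    __atomic_store_n(L, 0, __ATOMIC_RELEASE);
}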
Spinlock using compare and swap
lock_t *L; // Initial value = 0
lock(L)
{
    while( CAS(L, 0, 1) );
}
unlock(L)
{
    *L = 0;
}
- Atomic compare and swap: perform the condition check and swap atomically
- CAS(int *PTR, int cmpval, int newval) sets the value of PTR to newval if
  cmpval is equal to the value at PTR. Returns 0 on successful exchange
- No fairness guarantees!

Spinlock using CMPXCHG on X86
lock(lock_t *L)
{
    asm volatile(
        "mov $1, %%rcx;"
        "loop: xor %%rax, %%rax;"
        "lock cmpxchg %%rcx, (%%rdi);"
        "jnz loop;"
        : : : "rcx", "rax", "memory");
}
unlock(lock_t *L) { *L = 0; }
- Value of RAX (=0) is compared against the value at the address in register RDI
  and exchanged with RCX (=1), if they are equal
- Exercise: Visualize a context switch between any two instructions and analyse
  the correctness
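The CAS-based lock can also be written without inline assembly; here __atomic_compare_exchange_n plays the role of the slide's CAS primitive (again a toolchain assumption).

typedef int lock_t;    /* 0 = free, 1 = held */

static void lock(lock_t *L)
{
    int expected;
    do {
        expected = 0;   /* acquire only if the lock currently holds 0 */
    } while (!__atomic_compare_exchange_n(L, &expected, 1, 0 /* strong CAS */,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

static void unlock(lock_t *L)
{
    __atomic_store_n(L, 0, __ATOMIC_RELEASE);
}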
Spinlock using XCHG on X86
lock(lock_t *L)
{
    asm volatile(
        "mov $1, %%rax;"
        "loop: xchg %%rax, (%%rdi);"
        "cmp $0, %%rax;"
        "jne loop;"
        : : : "memory");
}
unlock(int *L) { *L = 0; }
- XCHG R, M ⇒ Exchange value of register R and value at memory address M
- RDI register contains the lock argument
- Exercise: Visualize a context switch between any two instructions and analyse
  the correctness

CAS on X86: cmpxchg
cmpxchg source[Reg] destination[Mem/Reg]
Implicit registers: rax and flags
    if rax == [destination]
    then
        flags[ZF] = 1
        [destination] = source
    else
        flags[ZF] = 0
        rax = [destination]
- "cmpxchg" is not atomic in X86, should be used with a "lock" prefix
Load Linked (LL) and Store Conditional (SC)
- LoadLinked (R, M)
  - Like a normal load, it loads R with the value of M
  - Additionally, the hardware keeps track of future stores to M
- StoreConditional (R, M)
  - Stores the value of R to M if no stores happened to M after the execution
    of the LL instruction (after execution, R = 1)
  - Otherwise, the store is not performed (after execution, R = 0)
- Supported in RISC architectures like MIPS, RISC-V etc.

Spinlock using LL and SC
lock_t *L; // initial value = 0
lock(lock_t *L)
{
    while(LoadLinked(L) || !StoreConditional(L, 1));
}
unlock(lock_t *L) { *L = 0; }

lock:  LL R1, (R2);       // R2 = lock address
       BNEQZ R1, lock;
       ADDUI R1, R0, #1;  // R1 = 1
       SC R1, (R2)
       BEQZ R1, lock
- Efficient as the hardware avoids memory traffic for unsuccessful lock
  acquire attempts
- Context switch between LL and SC results in SC to fail

Spinlocks: reducing wasted cycles
- Spinning for locks can introduce significant CPU overheads and increase
  energy consumption
- How to reduce spinning in spinlocks?
- Strategy: Back-off after every failure, exponential back-off used mostly
lock(lock_t *L) {
    u64 backoff = 0;
    while(LoadLinked(L) || !StoreConditional(L, 1)){
        if(backoff < MAX_BACKOFF) ++backoff;   // cap the exponent
        pause(1 << backoff);                   // Hint to processor
    }
}
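LL/SC is not directly exposed in C; a rough equivalent of the back-off idea can be sketched with C11 atomics, where atomic_flag_test_and_set stands in for the LL/SC pair. The back-off cap and the use of sched_yield() as the "pause" hint are illustrative choices, not taken from the slide.

#include <stdatomic.h>
#include <sched.h>

typedef atomic_flag lock_t;          /* ATOMIC_FLAG_INIT => unlocked */

static void lock(lock_t *L)
{
    unsigned spins = 1;
    while (atomic_flag_test_and_set_explicit(L, memory_order_acquire)) {
        for (unsigned i = 0; i < spins; ++i)
            sched_yield();           /* stand-in for a pause/back-off hint */
        if (spins < (1u << 16))
            spins <<= 1;             /* exponential back-off */
    }
}

static void unlock(lock_t *L)
{
    atomic_flag_clear_explicit(L, memory_order_release);
}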
Atomic fetch and add (xadd on X86)
xadd R, M
    TmpReg T = R + [M]
    R = [M]
    [M] = T
- Example: M = 100; RAX = 200. After executing "lock xadd %RAX, M",
  value of RAX = 100, M = 300
- Requires the lock prefix to be atomic

Ticket spinlock
Figure: threads Thread-0 ... Thread-N arrive and receive myturn = 0 ... N;
Ticket = N + 1, Turn = K (Thread-K is in the CS, threads with myturn > K are
contending, threads with myturn < K have finished the CS)
- Local variable "myturn" is equivalent to the order of arrival
- If a thread is in the CS ⇒ its local "myturn" must be the same as "Turn"
- Threads waiting = Ticket - Turn - 1
Fairness in spinlocks
- Spinlock implementations discussed so far are not fair, no bounded waiting
- To ensure fairness, some notion of ordering is required
- What if the threads are granted the lock in the order of their arrival at the
  lock contention loop?
  - A single lock variable may not be sufficient
  - Example solution: Ticket spinlocks

Ticket spinlocks (OSTEP Fig. 28.7)
struct lock_t{
    long ticket;
    long turn;
};
void init_lock (struct lock_t *L){
    L → ticket = 0; L → turn = 0;
}
void lock(struct lock_t *L){
    long myturn = xadd(&L → ticket, 1);
    while(myturn != L → turn)
        pause(myturn - L → turn);
}
void unlock(struct lock_t *L){
    L → turn++;
}
- Example: Order of arrival: T1 T2 T3
  - T1 (in CS): myturn = 0, L = {1, 0}
  - T2: myturn = 1, L = {2, 0}
  - T3: myturn = 2, L = {3, 0}
  - T1 unlocks, L = {3, 1}. T2 enters CS
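A compilable version of the ticket lock can use the GCC/Clang __atomic_fetch_add builtin in place of the slide's xadd wrapper; the builtin and the plain spin (instead of pause) are assumptions made here.

struct ticket_lock {
    long ticket;   /* next ticket number to hand out */
    long turn;     /* ticket number currently allowed into the CS */
};

void init_lock(struct ticket_lock *L) { L->ticket = 0; L->turn = 0; }

void lock(struct ticket_lock *L)
{
    /* Atomic fetch-and-add plays the role of x86 "lock xadd". */
    long myturn = __atomic_fetch_add(&L->ticket, 1, __ATOMIC_ACQUIRE);
    while (__atomic_load_n(&L->turn, __ATOMIC_ACQUIRE) != myturn)
        ;   /* spin; could back off or yield as discussed earlier */
}

void unlock(struct ticket_lock *L)
{
    __atomic_fetch_add(&L->turn, 1, __ATOMIC_RELEASE);
}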
Ticket spinlock
Figure: threads Thread-0 ... Thread-N+1 hold myturn = 0 ... N+1;
Ticket = N + 2, Turn = K + 1
- Ticket spinlock guarantees bounded waiting
- If N threads are contending for the lock and execution of the CS consumes
  T cycles, then the bound = N * T (assuming negligible context switch overhead)
Ticket spinlock
Figure: Ticket = N + 1, Turn = K + 1 after Thread-K releases the lock
- Value of turn is incremented on lock release
- The thread which arrived just after the current thread enters the CS
- When a new thread arrives, it gets the lock after the other threads ahead of
  it acquire and release the lock

Ticket spinlock (with yield)
void lock(struct lock_t *L){
    long myturn = xadd(&L → ticket, 1);
    while(myturn != L → turn)
        sched_yield( );
}
- Why spin if the thread's turn is yet to come?
- Yield the CPU and allow the thread whose ticket is due (or other
  non-contending threads) to run
- Further optimization
  - Allow the thread with "myturn" value one more than "L → turn" to continue
    spinning
Reader-writer locks
- Allows multiple readers or a single writer to enter the CS
- Example: Insert, delete and lookup operations on a search tree
struct BST{
    struct node *root;
    rwlock_t *lock;
};
struct node{
    item_t item;
    struct node *left;
    struct node *right;
};
void insert(BST *t, item_t item);
void lookup(BST *t, item_t item);
- If multiple threads call lookup( ), they may traverse the tree in parallel

Implementation of read-write locks (writers)
struct rwlock_t{
    Lock read_lock;
    Lock write_lock;
    int num_readers;
}
init_lock(rwlock_t *rL)
{
    init_lock(&rL → read_lock);
    init_lock(&rL → write_lock);
    rL → num_readers = 0;
}
void write_lock(rwlock_t *rL)
{
    lock(&rL → write_lock);
}
void write_unlock(rwlock_t *rL)
{
    unlock(&rL → write_lock);
}
- Write lock behavior is the same as the typical lock, only one thread is
  allowed to acquire the lock
Implementation of read-write locks (readers)
void read_lock(rwlock_t *rL)
{
    lock(&rL → read_lock);
    rL → num_readers++;
    if(rL → num_readers == 1)
        lock(&rL → write_lock);
    unlock(&rL → read_lock);
}
void read_unlock(rwlock_t *rL)
{
    lock(&rL → read_lock);
    rL → num_readers--;
    if(rL → num_readers == 0)
        unlock(&rL → write_lock);
    unlock(&rL → read_lock);
}
- The first reader acquires the write lock, which prevents writers from
  acquiring the lock
- The last reader releases the write lock to allow writers
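POSIX already provides reader-writer locks; a small usage sketch for the BST example follows (the wrapper function names are illustrative, only the pthread_rwlock_* calls are standard).

#include <pthread.h>

static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;

void lookup_locked(void)        /* many readers may hold the lock together */
{
    pthread_rwlock_rdlock(&tree_lock);
    /* ... traverse the tree ... */
    pthread_rwlock_unlock(&tree_lock);
}

void insert_locked(void)        /* writers get exclusive access */
{
    pthread_rwlock_wrlock(&tree_lock);
    /* ... modify the tree ... */
    pthread_rwlock_unlock(&tree_lock);
}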
Software lock: Buggy #1
int flag[2] = {0, 0};
void lock (int id) /* id = 0 or 1 */
{
    while(flag[id ^ 1]); // ^ → XOR
    flag[id] = 1;
}
void unlock (int id)
{
    flag[id] = 0;
}
- Solution for two threads, T0 and T1 with id 0 and 1, respectively
- We have seen that this solution does not work. Why?
- Both threads can acquire the lock because the "while condition check" and
  "setting the flag" are not atomic
Software lock: Buggy #2
int flag[2] = {0, 0};
void lock (int id) /* id = 0 or 1 */
{
    flag[id] = 1;
    while(flag[id ^ 1]); // ^ → XOR
}
void unlock (int id)
{
    flag[id] = 0;
}
- Does this solution work?
- No, as this can lead to a deadlock (flag[0] = flag[1] = 1). In other words,
  the "progress" requirement is not met
- Progress: If no one has acquired the lock and there are contending threads,
  one of the threads must acquire the lock within a finite time

Software lock: Buggy #3
int turn = 0;
void lock (int id) /* id = 0 or 1 */
{
    while(turn == (id ^ 1));
}
void unlock (int id)
{
    turn = id ^ 1;
}
- Assuming T0 invokes lock( ) first, does the solution provide mutual exclusion?
- Yes it does, but there is another issue with this solution - the two threads
  must request the lock in an alternate manner
- Progress requirement is not met
  - Argument: one of the threads can be stuck in an infinite loop (in non-CS code)
Peterson's solution
int flag[2] = {0, 0}; int turn = 0;
void lock (int id) /* id = 0 or 1 */
{
    flag[id] = 1;
    turn = id ^ 1;
    while(flag[id ^ 1] && turn == (id ^ 1));
}
void unlock (int id)
{
    flag[id] = 0;
}
- Homework: Prove that mutual exclusion is guaranteed
- What about fairness?
  - The lock is fair because if two threads are contending, they acquire the
    lock in an alternate manner
- Extending the solution to N threads is possible
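One caveat worth noting: Peterson's algorithm assumes the loads and stores above become visible in program order, which modern compilers and CPUs do not guarantee for plain variables. A faithful C rendering therefore needs atomics or fences; the sequentially consistent sketch below illustrates the idea and is not part of the original slide.

#include <stdatomic.h>

static atomic_int flag[2];     /* zero-initialized: both flags start at 0 */
static atomic_int turn;

void lock(int id)              /* id = 0 or 1 */
{
    atomic_store(&flag[id], 1);
    atomic_store(&turn, id ^ 1);
    /* seq_cst ordering preserves the store/load order the proof relies on */
    while (atomic_load(&flag[id ^ 1]) && atomic_load(&turn) == (id ^ 1))
        ;
}

void unlock(int id)
{
    atomic_store(&flag[id], 0);
}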
Semaphores
- Mutual exclusion techniques allow exactly one thread to access the critical
  section, which can be restrictive
- Consider a scenario where a finite array of size N is accessed by a set of
  producer and consumer threads. In this case,
  - At most N concurrent producers are allowed if the array is empty
  - At most N concurrent consumers are allowed if the array is full
  - If we use mutual exclusion techniques, only one producer or consumer is
    allowed at any point of time

Unix semaphores
#include <semaphore.h>

main( ){
    sem_t s;
    int K = 5;
    sem_init(&s, 0, K);
    sem_wait(&s);
    sem_post(&s);
}
- Can be used in a multi-threaded process or across multiple processes
- If the second argument is 0, the semaphore can be used from multiple threads
  of the same process
- A semaphore initialized with value = 1 (third argument) is called a binary
  semaphore and can be used to implement blocking (waiting) locks
  - Initialize: sem_init(s, 0, 1)
  - lock: sem_wait(s), unlock: sem_post(s)
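A minimal sketch of the binary-semaphore-as-lock idea; the two-thread driver and the iteration count are illustrative additions.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

static sem_t s;
static int counter = 0;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; ++i) {
        sem_wait(&s);      /* "lock"   */
        counter++;         /* critical section */
        sem_post(&s);      /* "unlock" */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&s, 0, 1);    /* binary semaphore: pshared = 0, value = 1 */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);   /* 200000, since increments are protected */
    return 0;
}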
CS330: Operating Systems
Semaphore, Classical problems

Operations on semaphore
struct semaphore{
    int value;
    spinlock_t *lock;
    queue *waitQ;
}sem_t;

// Operations
sem_init(sem_t *sem, int init_value);
sem_wait(sem_t *sem);
sem_post(sem_t *sem);
- Semaphores can be initialized by passing an initial value
- sem_wait waits (if required) till the value becomes +ve and returns after
  decrementing the value
- sem_post increments the value and wakes up a waiting context
- Other notations: P-V, down-up, wait-signal
Semaphore usage example: ordering
A = 0; B = 0;
Thread-1 {
    A = 1;
    printf("B = %d\n", B);
}
Thread-2 {
    B = 1;
    printf("A = %d\n", A);
}
- What are the possible outputs?
  - (A = 1, B = 1), (A = 1, B = 0), (A = 0, B = 1)
- How to guarantee A = 1, B = 1?

Semaphore usage example: ordering
sem_init(s1, 0);
A = 0; B = 0;
Thread-1 {
    A = 1;
    sem_wait(s1);
    printf("B = %d\n", B);
}
Thread-2 {
    B = 1;
    sem_post(s1);
    printf("A = %d\n", A);
}
- What are the possible outputs?
  - (A = 1, B = 1), (A = 0, B = 1)
- How to guarantee A = 1, B = 1?
Ordering with two semaphores
sem_init(s1, 0);
sem_init(s2, 0);
A = 0; B = 0;
Thread-1 {
    A = 1;
    sem_post(s1);
    sem_wait(s2);
    printf("%d\n", B);
}
Thread-2 {
    B = 1;
    sem_wait(s1);
    sem_post(s2);
    printf("%d\n", A);
}
- Waiting for each other guarantees the desired output (A = 1, B = 1)
Semaphore usage example: wait for child
int main (void ){
    sem_init(s, 0);
    if(fork( ) == 0)
        child( );
    sem_wait(s);
}
child( ){
    ...
    sem_post(s);
    exit(0);
}
- Assume that the semaphore is accessible from multiple processes, value
  initialized to zero
- If the parent is scheduled after the child creation, it waits till the child
  finishes
- If the child is scheduled and exits before the parent, the parent does not
  wait for the semaphore
Producer-consumer problem
DoProducerWork( ){
    while(1){
        item_t item = prod_p( );
        produce(item);
    }
}
DoConsumerWork( ){
    while(1){
        item_t item = consume( );
        cons_p(item);
    }
}
- A buffer of size N, one or more producers and consumers
- Producer produces an element into the buffer (after processing)
- Consumer extracts an element from the buffer and processes it
- Example: A multithreaded web server, network protocol layers etc.
- How to solve this problem using semaphores?

Buggy #1
item_t A[n], pctr = 0, cctr = 0;
sem_t empty = sem_init(n), used = sem_init(0);
produce(item_t item){
    sem_wait(&empty);
    A[pctr] = item;
    pctr = (pctr + 1) % n;
    sem_post(&used);
}
item_t consume( ) {
    sem_wait(&used);
    item_t item = A[cctr];
    cctr = (cctr + 1) % n;
    sem_post(&empty);
    return item;
}
- This solution does not work. What is the issue?
- The counters (pctr and cctr) are not protected, which can cause race conditions
Buggy #2
item_t A[n], pctr = 0, cctr = 0; lock_t *L = init_lock( );
sem_t empty = sem_init(n), used = sem_init(0);
produce(item_t item){
    lock(L); sem_wait(&empty);
    A[pctr] = item;
    pctr = (pctr + 1) % n;
    sem_post(&used); unlock(L);
}
item_t consume( ) {
    lock(L); sem_wait(&used);
    item_t item = A[cctr];
    cctr = (cctr + 1) % n;
    sem_post(&empty); unlock(L);
    return item;
}
- What is the problem?
- Consider empty = 0 and the producer has taken the lock before the consumer.
  This results in a deadlock: the consumer waits for L and the producer for empty

A working solution
item_t A[n], pctr = 0, cctr = 0; lock_t *L = init_lock( );
sem_t empty = sem_init(n), used = sem_init(0);
produce(item_t item){
    sem_wait(&empty); lock(L);
    A[pctr] = item;
    pctr = (pctr + 1) % n;
    unlock(L); sem_post(&used);
}
item_t consume( ) {
    sem_wait(&used); lock(L);
    item_t item = A[cctr];
    cctr = (cctr + 1) % n;
    unlock(L); sem_post(&empty);
    return item;
}
- The solution is deadlock free and ensures correct synchronization, but is very
  much serialized (inside produce and consume)
- What if we use separate locks for the producer and the consumer?
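A compilable rendering of the working solution with POSIX semaphores and a pthread mutex; the buffer size and the use of plain int items are illustrative choices.

#include <semaphore.h>
#include <pthread.h>

#define N 16                       /* illustrative buffer size */

static int A[N];
static int pctr = 0, cctr = 0;
static sem_t empty_slots, used_slots;
static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;

void pc_init(void)
{
    sem_init(&empty_slots, 0, N);  /* N free slots initially */
    sem_init(&used_slots, 0, 0);   /* nothing to consume yet */
}

void produce(int item)
{
    sem_wait(&empty_slots);        /* reserve a slot, then take the lock */
    pthread_mutex_lock(&L);
    A[pctr] = item;
    pctr = (pctr + 1) % N;
    pthread_mutex_unlock(&L);
    sem_post(&used_slots);
}

int consume(void)
{
    sem_wait(&used_slots);         /* wait for an item, then take the lock */
    pthread_mutex_lock(&L);
    int item = A[cctr];
    cctr = (cctr + 1) % N;
    pthread_mutex_unlock(&L);
    sem_post(&empty_slots);
    return item;
}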
Solution with separate mutexes
item_t A[n], pctr = 0, cctr = 0; lock_t *P = init_lock( ), *C = init_lock( );
sem_t empty = sem_init(n), used = sem_init(0);
produce(item_t item){
    sem_wait(&empty); lock(P);
    A[pctr] = item;
    pctr = (pctr + 1) % n;
    unlock(P); sem_post(&used);
}
item_t consume( ) {
    sem_wait(&used); lock(C);
    item_t item = A[cctr];
    cctr = (cctr + 1) % n;
    unlock(C); sem_post(&empty);
    return item;
}
- Does this solution work?
- Homework: Assume that the item is a large object and copying it takes a long
  time. How can we perform the copy operation without holding the lock?

CS330: Operating Systems
Concurrency bugs

Common issues in concurrent programs
- Atomicity issues
- Failure of ordering assumption
- Deadlocks

Concurrency bugs - atomicity issues
char *ptr; // Allocated before use
void T1( )
{
    ...
    strcpy(ptr, "hello world!");
    ...
}
void T2( )
{
    ...
    if(some_condition){
        free(ptr); ptr = NULL;
    }
    ...
}
- This code is buggy. What is the issue?
- T2 can free the pointer before T1 uses it
- How to fix it?
Concurrency bugs - atomicity issues
char *ptr; // Allocated before use
void T1( )
{
    ...
    if(ptr) strcpy(ptr, "hello world!");
    ...
}
void T2( )
{
    ...
    if(some_condition){
        free(ptr); ptr = NULL;
    }
    ...
}
- Does the above fix (checking ptr in T1) work?
- Not really. Consider the following order of execution:
  - T1: "if(ptr)"  T2: "free(ptr)"  T1: "strcpy"  Result: Segfault
- The check and the use of the pointer must happen atomically (e.g., under a lock)

Concurrency bugs - ordering issues
1. bool pending;
2. void T1( )
3. {
4.     pending = true;
5.     do_large_processing( );
6.     while (pending);
7. }

1. void T2( )
2. {
3.     do_some_processing( );
4.     pending = false;
5.     some_other_processing( );
6. }
- This code works with the assumption that line #4 of T2 is executed after
  line #4 of T1
- If this ordering is violated, T1 is stuck in the while loop
Concurrency bugs - deadlocks
struct acc_t{
    lock_t *L;
    id_t acc_no;
    long balance;
}
void txn_transfer( acc_t *src, acc_t *dst, long amount)
{
    lock(src → L); lock(dst → L);
    check_and_transfer(src, dst, amount);
    unlock(dst → L); unlock(src → L);
}
- Consider a simple transfer transaction in a bank
- Where is the deadlock?
  - T1: txn_transfer(iitk, cse, 10000)
    - lock (iitk), lock (cse)
  - T2: txn_transfer(cse, iitk, 5000)
    - lock (cse), lock (iitk)

Conditions for deadlock
- Mutual exclusion: exclusive control of resources (e.g., a thread holding a lock)
- Hold-and-wait: hold one resource and wait for another
- No resource preemption: resources can not be forcibly removed from the threads
  holding them
- Circular wait: a cycle of threads requesting locks held by others. Specifically,
  a cycle in the directed graph G(V, E) where V is the set of processes and
  (v1, v2) ∈ E if v1 is waiting for a lock held by v2
- All of the above conditions should be satisfied for a deadlock to occur
Dining philosophers
Figure: philosophers P0-P4 seated around a table, with forks F0-F4 placed
between adjacent philosophers
atomic_t forks[5];
Philosopher( int id)
{
    while (1) {
        think( );
        acquire(forks[id]);
        acquire(forks[(id + 1) % 5]);
        eat( );
        release(forks[(id + 1) % 5]);
        release(forks[id]);
    }
}
Concurrency bugs - avoiding deadlocks
struct acc_t{
    lock_t *L;
    id_t acc_no;
    long balance;
}
void txn_transfer( acc_t *src, acc_t *dst, long amount)
{
    lock(src → L); lock(dst → L);
    check_and_transfer(src, dst, amount);
    unlock(dst → L); unlock(src → L);
}
- Deadlock in a simple transfer transaction in a bank
- While acquiring the locks, first acquire the lock for the account with the
  lower "acc_no" value
- The account number comparison is performed before acquiring the locks
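A sketch of the ordered acquisition described above; acc_t, lock/unlock and check_and_transfer come from the slide, while the first/second pointer selection is just one way to express the comparison.

void txn_transfer(acc_t *src, acc_t *dst, long amount)
{
    /* Take the lock of the lower-numbered account first, so every thread
     * acquires the two locks in the same global order (no circular wait). */
    acc_t *first  = (src->acc_no < dst->acc_no) ? src : dst;
    acc_t *second = (first == src) ? dst : src;

    lock(first->L);
    lock(second->L);
    check_and_transfer(src, dst, amount);
    unlock(second->L);
    unlock(first->L);
}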
Solutions for deadlocks
- Remove mutual exclusion: lock-free data structures
- Either acquire all resources or none
  - trylock(lock) APIs can be used (e.g., pthread_mutex_trylock( ))
- Careful scheduling: schedule threads such that no deadlock can occur
- The most commonly used technique is to avoid circular wait. This can be
  achieved by ordering the resources and acquiring them in a particular order
  from all the threads

Dining philosophers: breaking the deadlock
Figure: philosophers P0-P4 around the table with forks F0-F4
- Cycle breaking rule: the fork with the lower index must be acquired first
atomic_t forks[5];
Philosopher( int id)
{
    while (1) {
        if(id == 4){
            acquire(forks[0]);
            acquire(forks[4]);
        }else{
            acquire(forks[id]);
            acquire(forks[id + 1]);
        }
        ...
    }
}

CS330: Operating Systems
Filesystem

Recap: file system
Figure: user-level directory tree (/ with etc, bin, sbin, home, lib; home with
code and file.txt) backed by the OS file system layer on top of storage devices
(hard disk, SSD, others)
- File system is an important OS subsystem
- Provides abstractions like files and directories
- Hides the complexity of the underlying storage devices

Recap: Process view of file
Figure: per-process file descriptor tables in the PCBs of P1 and P2 point to
"file" objects, which in turn point to in-memory inodes (Inode 1, Inode 2)
- Per-process file descriptor table with a pointer to a "file" object
- file object → inode (in-memory) is many-to-one
- How is the inode maintained in a persistent manner? How to access data at
  different offsets of a file? How is the directory structure maintained?

Step-1: Disk device partitioning
Hard disk → Logical Partitions: /dev/sda1, /dev/sda2, /dev/sda3 (created with
tools like fdisk and parted)
- Partitioning does not create a file system
- A file system is created on a partition to manage the physical device and
  present the logical view
- All file systems provide utilities to initialize the file system on the
  partition (e.g., MKFS)
File system organization
Disk layout: SB | Block bitmap | Inode bitmap | Inode table | Data blocks
Super block fields: Inode bitmap address, Inode table address, Total (Max)
inodes, Other information
- When the file system is mounted, the inode number for the root of the file
  system (the mount point) is known, so the root inode can be accessed
- Given any inode number, the inode structure can be loaded into memory.
  Specifically,
inode_t *get_inode(SB *sb, long ino){
    inode_t *inode = alloc_mem_inode( );
    read_disk(inode, sb → inode_table + ino * sizeof(inode_t));
    return inode;
}
- How to search/lookup files/directories under the root inode?

Inode
- An on-disk structure containing information regarding files/directories in
  Unix systems
- Represented by a unique number in the file system (e.g., in Linux,
  "ls -i filename" can be used to print the inode number)
- Contains access permissions, access time, file size etc.
- Most importantly, the inode contains information regarding the location of
  the file data on the device
  - Index structures in the inode are used to map file offsets to disk locations
- Directory inodes also contain information regarding their content, albeit the
  content is structured (for searching files)

Indirect block pointers
Figure: the inode holds a pointer to an indirect block, which contains pointers
(K0, K1, K2, K3, ...) to the data blocks
- The inode contains pointers to a block containing pointers to data blocks
- Advantages: flexible, random access is good
- Disadvantages: indirect block access overheads (even for small files)
Structure of an example superblock
struct superblock{
    u16 block_size;
    u64 num_blocks;
    u64 last_mount_time;
    u64 root_inode_num;
    u64 max_inodes;
    disk_off_t inode_table;
    disk_off_t blk_usage_bitmap;
    ...
};
- The superblock contains information regarding the device and the file system
  organization on the disk
- Pointers to different metadata related to the file system are also maintained
  by the superblock
  - Ex: the list of free blocks is required before adding data to a new
    file/directory

Direct block pointers
Figure: the inode holds direct pointers (K0, K1, K2, ...) to the data blocks
- The inode contains direct pointers to the data blocks
- Flexible: growth, shrink and random access are good
- Can not support files of larger size!

Hybrid block pointers: Ext2 file system
Ext2/3 inode:
- Direct pointers {PTR[0] to PTR[11]}: file block addresses 0 - 11
- Single indirect {PTR[12]}: file block addresses 12 - 1035
- Double indirect {PTR[13]}: file block addresses 1036 to 1049611
- Triple indirect {PTR[14]}: file block addresses ?? to ??
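The pointer layout above is what turns a file offset into a disk block; the sketch below shows that translation for the direct and single-indirect cases only. The in-memory inode layout, the read_block helper, and the 1024 pointers-per-block figure (chosen to match the 12-1035 range on the slide) are illustrative assumptions.

#define NDIRECT 12
#define PTRS_PER_BLOCK 1024      /* matches the 12..1035 single-indirect range */

/* Hypothetical in-memory inode layout and disk-read helper, for illustration. */
struct inode { unsigned int ptr[15]; };   /* 12 direct, 1 single, 1 double, 1 triple */
extern void read_block(unsigned int blkno, void *buf);

/* Translate a file block number into a disk block number
 * (direct and single-indirect cases only). */
unsigned int bmap(struct inode *ip, unsigned int fblock)
{
    if (fblock < NDIRECT)
        return ip->ptr[fblock];           /* direct pointer */

    fblock -= NDIRECT;
    if (fblock < PTRS_PER_BLOCK) {
        unsigned int indirect[PTRS_PER_BLOCK];
        read_block(ip->ptr[12], indirect);   /* one extra disk access */
        return indirect[fblock];
    }
    return 0;   /* double/triple indirect handling omitted */
}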
Step 2: File system creation
Physical disk /dev/sda → logical partitions /dev/sda1, /dev/sda2, /dev/sda3
(fdisk, parted); a file system is created on one partition, e.g.,
mkfs EXFS /dev/sda2
- MKFS creates the initial structures in the logical partition
- Creates the entry point to the file system (known as the super block)
- At this point the file system is ready to be mounted

Contiguous allocation
Inode: start = K, size = N (the file occupies blocks K to K+N-1)
- Works nicely for both sequential and random access
- Append operation is difficult. How to expand files? Requires relocation!
- External fragmentation is a concern

Linked allocation
Inode: Start = K, Last = M
- Every block contains a pointer to the next block
- Advantage: flexible, easy to grow and shrink. Disadvantage: random access
- Why maintain the last block and not the size? Efficient append operation!
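The random-access cost of linked allocation is easy to see in code; the on-disk block format and the read_block helper below are hypothetical, used only to illustrate the pointer chasing.

/* Hypothetical on-disk block format for linked allocation: the last word of
 * each block stores the number of the next block in the file (0 = end). */
struct disk_block {
    char data[4092];
    unsigned int next;
};
extern void read_block(unsigned int blkno, struct disk_block *buf);

/* Reaching the k-th block of a file requires k sequential disk reads. */
unsigned int nth_block(unsigned int start, unsigned int k)
{
    struct disk_block b;
    unsigned int cur = start;
    while (k-- > 0) {
        read_block(cur, &b);
        cur = b.next;
    }
    return cur;
}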
Step 3: File system mounting
mount( "/dev/sdb1", "/home", "EXFS", flags, fs_options)
USER: mount -t exfs /dev/sdb1 /home
- mount( ) loads the file system information and associates a superblock with
  the file system mount point (superblock, mount point)
- The in-memory superblock is a copy of the on-disk superblock along with other
  information
- Example: the OS will use the superblock associated with the mount point
  "/home" to reach any file/dir under "/home"

Problem: file offset to disk address mapping
Figure: the user view of a file (byte offsets 0 .. N-1) must be mapped through
the inode onto blocks of the block device (0 .. S-1)
- How to efficiently translate a file offset to a device address?
- File size can range from a few bytes to gigabytes
- Files can be accessed in a sequential or random manner
- How to design the mapping structure?
Organizing the directory content
Fixed size directory entry
struct dir_entry{
    inode_t inode_num;
    char name[FNAME_MAX];
};
- A fixed size directory entry is a simple way to organize directory content
- Advantages: avoids fragmentation, rename is easy
- Disadvantages: space wastage

Variable size directory entry
struct dir_entry{
    inode_t inode_num;
    u8 entry_len;
    char name[name_len];
};
- Variable sized directory entries contain the length explicitly
- Advantages: less space wastage (compact)
- Disadvantages: inefficient rename, requires compaction
File system organization
Disk layout: SB | Block bitmap | Inode bitmap | Inode table | Data blocks
- The file system is mounted, the inode number for the root of the file system
  (mount point) is known, so the root inode can be accessed. However,
  - How to search/lookup files/directories under the root inode?
    - Read the content of the root inode and search the next level dir/file
  - How to locate the content on disk?
    - Index structures in the inode are used to map file offsets to disk locations
  - How to keep track of size, permissions etc.?
    - The inode is used to maintain this information
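With fixed-size entries, a directory lookup is just a linear scan of the directory's data; the sketch below assumes the entries have already been read into memory, and the FNAME_MAX value and the use of inode number 0 for "not found" are illustrative choices.

#include <string.h>

#define FNAME_MAX 60               /* illustrative value */
typedef unsigned int inode_t;

struct dir_entry {
    inode_t inode_num;
    char name[FNAME_MAX];
};

/* Scan an in-memory array of directory entries for "name" and return the
 * matching inode number (0 used here to mean "not found"). */
inode_t dir_lookup(struct dir_entry *entries, int nentries, const char *name)
{
    for (int i = 0; i < nentries; ++i)
        if (strncmp(entries[i].name, name, FNAME_MAX) == 0)
            return entries[i].inode_num;
    return 0;
}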