Threads: Single and Multithreaded Processes


Threads

• Overview
• Multithreading Models
• Threading Issues
• Pthreads
• Solaris 2 Threads
• Windows 2000 Threads
• Linux Threads
• Java Threads

Single and Multithreaded Processes

Benefits

• Responsiveness
• Resource Sharing
• Economy
• Utilization of MP Architectures
User Threads

• Thread management done by user-level threads library


• Examples
- POSIX Pthreads
- Mach C-threads
- Solaris threads

Kernel Threads

• Supported by the Kernel


• Examples
- Windows 95/98/NT/2000
- Solaris
- Tru64 UNIX
- BeOS
- Linux

Multithreading Models

• Many-to-One
• One-to-One
• Many-to-Many

Many-to-One
• Many user-level threads mapped to single kernel thread.
• Used on systems that do not support kernel threads.

One-to-One

• Each user-level thread maps to kernel thread.


• Examples
- Windows 95/98/NT/2000
- OS/2

Many-to-Many Model

• Allows many user-level threads to be mapped to many kernel threads.


• Allows the operating system to create a sufficient number of kernel threads.
• Examples
- Solaris 2
- Windows NT/2000 with the ThreadFiber package

Threading Issues

• Semantics of fork() and exec() system calls.


• Thread cancellation.
• Signal handling
• Thread pools
• Thread specific data

Pthreads

• A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization.
• The API specifies the behavior of the thread library; the implementation is up to the developers of the library.
• Common in UNIX operating systems.
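To make the API concrete, here is a minimal sketch of thread creation and joining (a sketch, assuming a POSIX system; compile with -pthread; the worker function and its message are illustrative):

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid;
    int id = 1;
    /* pthread_create returns 0 on success */
    if (pthread_create(&tid, NULL, worker, &id) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL); /* wait for the thread to finish */
    return 0;
}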

Solaris 2 Threads
Solaris Process

Windows 2000 Threads

• Implements the one-to-one mapping.


• Each thread contains
- a thread id
- register set
- separate user and kernel stacks
- private data storage area

Linux Threads

• Linux refers to them as tasks rather than threads.


• Thread creation is done through the clone() system call.
• clone() allows a child task to share the address space of the parent task (process).
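A hedged sketch of thread-like creation with clone() (Linux-specific, using glibc's clone(3) wrapper; STACK_SIZE and child_fn are our names). CLONE_VM is the flag that makes the child task share the parent's address space:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg) {
    printf("child task sharing parent's address space\n");
    return 0;
}

int main(void) {
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return 1;
    /* the stack grows downward on most architectures, so pass its top;
       CLONE_FS/CLONE_FILES/CLONE_SIGHAND also share filesystem info,
       file descriptors, and signal handlers */
    int pid = clone(child_fn, stack + STACK_SIZE,
                    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                    NULL);
    if (pid == -1)
        return 1;
    waitpid(pid, NULL, 0); /* SIGCHLD in the flags lets waitpid() see the child */
    free(stack);
    return 0;
}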

Java Threads

• Java threads may be created by:


✦ Extending Thread class
✦ Implementing the Runnable interface
• Java threads are managed by the JVM.
Java Thread States
CPU Scheduling
• Basic Concepts
• Scheduling Criteria
• Scheduling Algorithms
• Multiple-Processor Scheduling
• Real-Time Scheduling
• Algorithm Evaluation

Basic Concepts

• Maximum CPU utilization obtained with multiprogramming


• CPU–I/O Burst Cycle – Process execution consists of a cycle of CPU execution
and I/O wait.
• CPU burst distribution

Alternating Sequence of CPU and I/O Bursts


Histogram of CPU-burst Times

CPU Scheduler

• Selects from among the processes in memory that are ready to execute, and
allocates the CPU to one of them.
• CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state.
2. Switches from running to ready state.
3. Switches from waiting to ready state.
4. Terminates.
• Scheduling under 1 and 4 is nonpreemptive.
• All other scheduling is preemptive.
Dispatcher

• Dispatcher module gives control of the CPU to the process selected by the short-
term scheduler; this involves:
✦ switching context
✦ switching to user mode
✦ jumping to the proper location in the user program to restart that program
• Dispatch latency – time it takes for the dispatcher to stop one process and start
another running.

Scheduling Criteria

• CPU utilization – keep the CPU as busy as possible


• Throughput – # of processes that complete their execution per time unit
• Turnaround time – amount of time to execute a particular process
• Waiting time – amount of time a process has been waiting in the ready queue
• Response time – amount of time it takes from when a request was submitted until
the first response is produced, not output (for time-sharing environment)

Optimization Criteria

• Max CPU utilization


• Max throughput
• Min turnaround time
• Min waiting time
• Min response time

First-Come, First-Served (FCFS) Scheduling


Process Burst Time
P1 24
P2 3
P3 3
• Suppose that the processes arrive in the order: P1 , P2 , P3
The Gantt Chart for the schedule is:

P1 P2 P3

0 24 27 30
• Waiting time for P1 = 0; P2 = 24; P3 = 27
• Average waiting time: (0 + 24 + 27)/3 = 17
• Suppose that the processes arrive in the order: P2, P3, P1.
• The Gantt chart for the schedule is:

P2 P3 P1

0 3 6 30

• Waiting time for P1 = 6; P2 = 0; P3 = 3


• Average waiting time: (6 + 0 + 3)/3 = 3
• Much better than previous case.
• Convoy effect – short processes stuck behind a long process.

Shortest-Job-First (SJF) Scheduling

• Associate with each process the length of its next CPU burst. Use these lengths to
schedule the process with the shortest time.
• Two schemes:
✦ nonpreemptive – once CPU given to the process it cannot be preempted
until completes its CPU burst.
✦ preemptive – if a new process arrives with a CPU burst length less than the remaining time of the currently executing process, preempt. This scheme is known as Shortest-Remaining-Time-First (SRTF).
• SJF is optimal – gives minimum average waiting time for a given set of processes.

Example of Non-Preemptive SJF

Process Arrival Time Burst Time


P1 0.0 7
P2 2.0 4
P3 4.0 1
P4 5.0 4

• SJF (non-preemptive)
P1 P3 P2 P4

0 7 8 12 16
• Average waiting time = (0 + 6 + 3 + 7)/4 = 4

Example of Preemptive SJF

Process Arrival Time Burst Time


P1 0.0 7
P2 2.0 4
P3 4.0 1
P4 5.0 4

• SJF (preemptive)

P1 P2 P3 P2 P4 P1

0 2 4 5 7 11 16

• Average waiting time = (9 + 1 + 0 + 2)/4 = 3

Determining Length of Next CPU Burst

• Can only estimate the length.


• Can be done by using the length of previous CPU bursts, using exponential
averaging.

1. tn = actual length of the nth CPU burst
2. τn+1 = predicted value for the next CPU burst
3. α, 0 ≤ α ≤ 1
4. Define:

τn+1 = α tn + (1 − α) τn
Prediction of the Length of the Next CPU Burst

Examples of Exponential Averaging

• α = 0
✦ τn+1 = τn
✦ Recent history does not count.
• α = 1
✦ τn+1 = tn
✦ Only the actual last CPU burst counts.
• If we expand the formula, we get:
τn+1 = α tn + (1 − α) α tn−1 + …
+ (1 − α)^j α tn−j + …
+ (1 − α)^(n+1) τ0
• Since both α and (1 − α) are less than or equal to 1, each successive term has less weight than its predecessor.
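A small C sketch of this recurrence (the function and variable names are ours):

/* tau is the current prediction, t the just-measured burst length;
   returns tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n */
double predict_next(double tau, double t, double alpha) {
    return alpha * t + (1.0 - alpha) * tau;
}

Feeding each observed burst back in keeps a running prediction; α = 0 freezes the estimate and α = 1 tracks only the last burst, matching the two cases above.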

Priority Scheduling

• A priority number (integer) is associated with each process


• The CPU is allocated to the process with the highest priority (smallest integer ≡
highest priority).
✦ Preemptive
✦ nonpreemptive
• SJF is a priority scheduling where priority is the predicted next CPU burst time.
• Problem ≡ Starvation – low priority processes may never execute.
• Solution ≡ Aging – as time progresses increase the priority of the process.

Round Robin (RR)

• Each process gets a small unit of CPU time (time quantum), usually 10-100
milliseconds. After this time has elapsed, the process is preempted and added to
the end of the ready queue.
• If there are n processes in the ready queue and the time quantum is q, then each
process gets 1/n of the CPU time in chunks of at most q time units at once. No
process waits more than (n-1)q time units.
• Performance
✦ q large ⇒ FIFO
✦ q small ⇒ q must be large with respect to context switch, otherwise
overhead is too high.

Example of RR with Time Quantum = 20


Process Burst Time
P1 53
P2 17
P3 68
P4 24
• The Gantt chart is:

P1 P2 P3 P4 P1 P3 P4 P1 P3 P3

0 20 37 57 77 97 117 121 134 154 162
• Typically, higher average turnaround than SJF, but better response.
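For illustration, a rough C simulation of this example (all names are ours). Since every process arrives at time 0, each waiting time is simply completion time minus burst time:

#include <stdio.h>

int main(void) {
    int burst[] = {53, 17, 68, 24}; /* P1..P4 */
    int remaining[4], completion[4];
    int n = 4, q = 20, t = 0, done = 0;
    for (int i = 0; i < n; i++)
        remaining[i] = burst[i];
    while (done < n) { /* cycle through the ready queue */
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;
            int slice = remaining[i] < q ? remaining[i] : q;
            t += slice; /* run for one quantum (or less) */
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                completion[i] = t;
                done++;
            }
        }
    }
    for (int i = 0; i < n; i++)
        printf("P%d waits %d\n", i + 1, completion[i] - burst[i]);
    return 0;
}

This reproduces the Gantt chart above (completion times 134, 37, 162, and 121).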
Time Quantum and Context Switch Time

Turnaround Time Varies With The Time Quantum


Multilevel Queue

• Ready queue is partitioned into separate queues:


foreground (interactive)
background (batch)
• Each queue has its own scheduling algorithm,
foreground – RR
background – FCFS
• Scheduling must be done between the queues.
✦ Fixed priority scheduling; (i.e., serve all from foreground then from
background). Possibility of starvation.
✦ Time slice – each queue gets a certain amount of CPU time which it can schedule amongst its processes; e.g., 80% to the foreground queue in RR and 20% to the background queue in FCFS.

Multilevel Queue Scheduling


Multilevel Feedback Queue

• A process can move between the various queues; aging can be implemented this
way.
• Multilevel-feedback-queue scheduler defined by the following parameters:
✦ number of queues
✦ scheduling algorithms for each queue
✦ method used to determine when to upgrade a process
✦ method used to determine when to demote a process
✦ method used to determine which queue a process will enter when that
process needs service

Example of Multilevel Feedback Queue

• Three queues:
✦ Q0 – time quantum 8 milliseconds
✦ Q1 – time quantum 16 milliseconds
✦ Q2 – FCFS
• Scheduling
✦ A new job enters queue Q0 which is served FCFS. When it gains CPU, job
receives 8 milliseconds. If it does not finish in 8 milliseconds, job is
moved to queue Q1.
✦ At Q1 job is again served FCFS and receives 16 additional milliseconds.
If it still does not complete, it is preempted and moved to queue Q2.

Multilevel Feedback Queues


Multiple-Processor Scheduling

• CPU scheduling more complex when multiple CPUs are available.


• Homogeneous processors within a multiprocessor.
• Load sharing
• Asymmetric multiprocessing – only one processor accesses the system data
structures, alleviating the need for data sharing.

Real-Time Scheduling

• Hard real-time systems – required to complete a critical task within a guaranteed amount of time.
• Soft real-time computing – requires that critical processes receive priority over
less fortunate ones.

Dispatch Latency
Algorithm Evaluation

• Deterministic modeling – takes a particular predetermined workload and defines the performance of each algorithm for that workload.
• Queueing models
• Implementation

Evaluation of CPU Schedulers by Simulation


Solaris 2 Scheduling

Windows 2000 Priorities


Process Synchronization
■ Background
■ The Critical-Section Problem
■ Synchronization Hardware
■ Semaphores
■ Classical Problems of Synchronization
■ Critical Regions
■ Monitors
■ Synchronization in Solaris 2 & Windows 2000

Background
■ Concurrent access to shared data may result in data inconsistency.
■ Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes.
■ The shared-memory solution to the bounded-buffer problem (Chapter 4) allows at most n – 1 items in the buffer at the same time. A solution where all n buffers are used is not simple.
✦ Suppose that we modify the producer-consumer code by adding a variable counter, initialized to 0 and incremented each time a new item is added to the buffer.
Bounded-Buffer
■ Shared data

#define BUFFER_SIZE 10
typedef struct {
...
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
int counter = 0;

■ Producer process

item nextProduced;
while (1) {
while (counter == BUFFER_SIZE)
; /* do nothing */
buffer[in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
counter++;
}
■ Consumer process

item nextConsumed;
while (1) {
while (counter == 0)
; /* do nothing */
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
counter--;
}

■ The statements
counter++;
counter--;
must be performed atomically.

■ Atomic operation – an operation that completes in its entirety without interruption.
■ The statement “counter++” may be implemented in machine language as:

register1 = counter
register1 = register1 + 1
counter = register1

■ The statement “counter--” may be implemented as:

register2 = counter
register2 = register2 – 1
counter = register2

■ If both the producer and consumer attempt to update the buffer concurrently, the assembly language statements may get interleaved.

■ Interleaving depends upon how the producer and consumer processes are scheduled.
■ Assume counter is initially 5. One interleaving of statements is:

producer: register1 = counter (register1 = 5)


producer: register1 = register1 + 1 (register1 = 6)
consumer: register2 = counter (register2 = 5)
consumer: register2 = register2 – 1 (register2 = 4)
producer: counter = register1 (counter = 6)
consumer: counter = register2 (counter = 4)

■ The value of counter may be either 4 or 6, where the correct result should be 5.
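The race is easy to reproduce. A hedged demonstration with two POSIX threads (compile with -pthread; ITERATIONS is our choice): because counter++ compiles to the load/add/store sequence shown above, the final value is usually well below the expected total.

#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000
int counter = 0; /* shared, deliberately unprotected */

void *increment(void *arg) {
    for (int i = 0; i < ITERATIONS; i++)
        counter++; /* load, add, store: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d (expected %d)\n", counter, 2 * ITERATIONS);
    return 0;
}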
Race Condition
■ Race condition: the situation where several processes access and manipulate shared data concurrently. The final value of the shared data depends upon which process finishes last.

■ To prevent race conditions, concurrent processes must be synchronized.


The Critical-Section Problem
■ n processes all competing to use some shared data
■ Each process has a code segment, called critical section, in which the shared data
is accessed.
■ Problem – ensure that when one process is executing in its critical section, no
other process is allowed to execute in its critical section.
Solution to Critical-Section Problem
1. Mutual Exclusion. If process Pi is executing in its critical section, then no other processes can be executing in their critical sections.
2. Progress. If no process is executing in its critical section and there exist some
processes that wish to enter their critical section, then the selection of the processes that
will enter the critical section next cannot be postponed indefinitely.
3. Bounded Waiting. A bound must exist on the number of times that other
processes are allowed to enter their critical sections after a process has made a request to
enter its critical section and before that request is granted.
✦ Assume that each process executes at a nonzero speed.
✦ No assumption concerning the relative speed of the n processes.
Initial Attempts to Solve Problem
■ Only 2 processes, P0 and P1
■ General structure of process Pi (other process Pj)
do {
entry section
critical section
exit section
remainder section
} while (1);
■ Processes may share some common variables to synchronize their actions.
Algorithm 1
■ Shared variables:
✦ int turn;
initially turn = 0
✦ turn == i ⇒ Pi can enter its critical section
■ Process Pi
do {
while (turn != i) ;
critical section
turn = j;
remainder section
} while (1);
■ Satisfies mutual exclusion, but not progress
Algorithm 2
■ Shared variables
✦ boolean flag[2];
initially flag [0] = flag [1] = false.
✦ flag [i] = true ⇒ Pi ready to enter its critical section
■ Process Pi
do {
flag[i] = true;
while (flag[j]) ;
critical section
flag[i] = false;
remainder section
} while (1);
■ Satisfies mutual exclusion, but not the progress requirement.
Algorithm 3
■ Combined shared variables of algorithms 1 and 2.
■ Process Pi
do {
flag[i] = true;
turn = j;
while (flag[j] && turn == j) ;
critical section
flag [i] = false;
remainder section
} while (1);
■ Meets all three requirements; solves the critical-section problem for two
processes.
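Algorithm 3 is Peterson's algorithm. A compilable sketch using C11 atomics (the names are ours; the sequentially consistent defaults of <stdatomic.h> stand in for the atomicity the pseudocode assumes):

#include <stdatomic.h>
#include <stdbool.h>

atomic_bool flag[2]; /* zero-initialized: both false */
atomic_int turn;

void enter_region(int i) { /* entry section for process i (0 or 1) */
    int j = 1 - i;
    atomic_store(&flag[i], true);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ; /* busy wait */
}

void leave_region(int i) { /* exit section */
    atomic_store(&flag[i], false);
}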
Bakery Algorithm
Critical section for n processes
■ Before entering its critical section, process receives a number. Holder of the
smallest number enters the critical section.
■ If processes Pi and Pj receive the same number, if i < j, then Pi is served first; else
Pj is served first.
■ The numbering scheme always generates numbers in increasing order of
enumeration; i.e., 1,2,3,3,3,3,4,5...
■ Notation: < denotes lexicographical order on (ticket #, process id #)
■ (a,b) < (c,d) if a < c, or if a = c and b < d
■ max(a0, …, an−1) is a number k such that k ≥ ai for i = 0, …, n − 1
■ Shared data
boolean choosing[n];
int number[n];
Data structures are initialized to false and 0 respectively
do {
choosing[i] = true;
number[i] = max(number[0], number[1], …, number [n – 1])+1;
choosing[i] = false;
for (j = 0; j < n; j++) {
while (choosing[j]) ;
while ((number[j] != 0) && ((number[j],j) < (number[i],i))) ;
}
critical section
number[i] = 0;
remainder section
} while (1);
Synchronization Hardware
■ Test and modify the content of a word atomically.

boolean TestAndSet(boolean &target) {
boolean rv = target; // save the old value
target = true; // set the word to true
return rv; // report the old value
}
Mutual Exclusion with Test-and-Set
■ Shared data:
boolean lock = false;
■ Process Pi
do {
while (TestAndSet(lock)) ; // spin until the lock was free
critical section
lock = false;
remainder section
} while (1);
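C11 exposes a genuine test-and-set as atomic_flag_test_and_set; a minimal spinlock along the lines above (a sketch, names ours):

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT; /* initially clear (unlocked) */

void acquire(void) {
    while (atomic_flag_test_and_set(&lock))
        ; /* spin until the previous value was false */
}

void release(void) {
    atomic_flag_clear(&lock); /* lock = false */
}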
Synchronization Hardware
■ Atomically swap two variables.
void Swap(boolean &a, boolean &b) {
boolean temp = a;
a = b;
b = temp;
}
Mutual Exclusion with Swap
■ Shared data (initialized to false):
boolean lock;
boolean waiting[n];
■ Process Pi
do {
key = true;
while (key == true)
Swap(lock, key);
critical section
lock = false;
remainder section
} while (1);
Semaphores
■ Synchronization tool that does not require busy waiting.
■ Semaphore S – integer variable
■ can only be accessed via two indivisible (atomic) operations
wait(S):
while (S ≤ 0)
; // no-op
S--;

signal(S):
S++;
Critical Section of n Processes
■ Shared data:
semaphore mutex; //initially mutex = 1
■ Process Pi:

do {
wait(mutex);
critical section
signal(mutex);
remainder section
} while (1);

Semaphore Implementation
■ Define a semaphore as a record
typedef struct {
int value;
struct process *L;
} semaphore;
■ Assume two simple operations:
✦ block suspends the process that invokes it.
✦ wakeup(P) resumes the execution of a blocked process P.
Implementation
■ Semaphore operations now defined as
wait(S):
S.value--;
if (S.value < 0) {
add this process to S.L;
block;
}
signal(S):
S.value++;
if (S.value <= 0) {
remove a process P from S.L;
wakeup(P);
}
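A hedged sketch of this blocking semaphore built on POSIX mutexes and condition variables (names ours). The condition variable's wait queue plays the role of S.L; in this variant the value never goes negative, but the behavior is equivalent:

#include <pthread.h>

typedef struct {
    int value;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} semaphore;

void sem_init_op(semaphore *s, int v) {
    s->value = v;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void wait_op(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0) /* block instead of busy waiting */
        pthread_cond_wait(&s->cond, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void signal_op(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->cond); /* wakeup(P) */
    pthread_mutex_unlock(&s->lock);
}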
Semaphore as a General Synchronization Tool
■ Execute B in Pj only after A executed in Pi
■ Use semaphore flag initialized to 0
■ Code:
Pi Pj
⋮ ⋮
A wait(flag)
signal(flag) B
Deadlock and Starvation
■ Deadlock – two or more processes are waiting indefinitely for an event that can
be caused by only one of the waiting processes.
■ Let S and Q be two semaphores initialized to 1
P0 P1
wait(S); wait(Q);
wait(Q); wait(S);
⋮ ⋮
signal(S); signal(Q);
signal(Q); signal(S);
■ Starvation – indefinite blocking. A process may never be removed from the
semaphore queue in which it is suspended.
Two Types of Semaphores
■ Counting semaphore – integer value can range over an unrestricted domain.
■ Binary semaphore – integer value can range only between 0 and 1; can be simpler
to implement.
■ Can implement a counting semaphore S as a binary semaphore
Implementing S as a Binary Semaphore
■ Data structures:
binary-semaphore S1, S2;
int C;
■ Initialization:
S1 = 1
S2 = 0
C = initial value of semaphore S
Implementing S
■ wait operation
wait(S1);
C--;
if (C < 0) {
signal(S1);
wait(S2);
}
signal(S1);

■ signal operation
wait(S1);
C++;
if (C <= 0)
signal(S2);
else
signal(S1);
Classical Problems of Synchronization
■ Bounded-Buffer Problem
■ Readers and Writers Problem
■ Dining-Philosophers Problem
Bounded-Buffer Problem
■ Shared data

semaphore full, empty, mutex;

Initially:

full = 0, empty = n, mutex = 1


Bounded-Buffer Problem Producer Process
do {

produce an item in nextp

wait(empty);
wait(mutex);

add nextp to buffer

signal(mutex);
signal(full);
} while (1);

Bounded-Buffer Problem Consumer Process


do {
wait(full);
wait(mutex);

remove an item from buffer to nextc

signal(mutex);
signal(empty);

consume the item in nextc

} while (1);
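Putting the two halves together, a runnable sketch with POSIX unnamed semaphores (assumes Linux; compile with -pthread; the item values and loop bounds are illustrative):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define BUFFER_SIZE 10
int buffer[BUFFER_SIZE];
int in = 0, out = 0;
sem_t full, empty, mutex;

void *producer(void *arg) {
    for (int i = 0; i < 100; i++) {
        sem_wait(&empty); /* wait for a free slot */
        sem_wait(&mutex);
        buffer[in] = i; /* add nextp to buffer */
        in = (in + 1) % BUFFER_SIZE;
        sem_post(&mutex);
        sem_post(&full); /* announce a filled slot */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < 100; i++) {
        sem_wait(&full); /* wait for a filled slot */
        sem_wait(&mutex);
        int item = buffer[out]; /* remove an item to nextc */
        out = (out + 1) % BUFFER_SIZE;
        sem_post(&mutex);
        sem_post(&empty);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    sem_init(&full, 0, 0);
    sem_init(&empty, 0, BUFFER_SIZE);
    sem_init(&mutex, 0, 1);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}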
Readers-Writers Problem
■ Shared data
semaphore mutex, wrt;
int readcount;

Initially

mutex = 1, wrt = 1, readcount = 0


Readers-Writers Problem Writer Process
wait(wrt);

writing is performed

signal(wrt);
Readers-Writers Problem Reader Process
wait(mutex);
readcount++;
if (readcount == 1)
wait(wrt);
signal(mutex);

reading is performed

wait(mutex);
readcount--;
if (readcount == 0)
signal(wrt);
signal(mutex);
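POSIX packages this pattern directly as pthread_rwlock_t; a brief sketch (names ours):

#include <pthread.h>

pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

void reader(void) {
    pthread_rwlock_rdlock(&rw); /* many readers may hold this at once */
    /* reading is performed */
    pthread_rwlock_unlock(&rw);
}

void writer(void) {
    pthread_rwlock_wrlock(&rw); /* writers get exclusive access */
    /* writing is performed */
    pthread_rwlock_unlock(&rw);
}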
Dining-Philosophers Problem
■ Shared data
semaphore chopstick[5];
Initially all values are 1
Dining-Philosophers Problem
■ Philosopher i:
do {
wait(chopstick[i]);
wait(chopstick[(i+1) % 5]);

eat

signal(chopstick[i]);
signal(chopstick[(i+1) % 5]);

think

} while (1);
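■ Note that this solution guarantees that no two neighbors eat simultaneously, but it can deadlock if all five philosophers become hungry and each picks up the left chopstick at the same time.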
Critical Regions
■ High-level synchronization construct
■ A shared variable v of type T, is declared as:
v: shared T
■ Variable v accessed only inside statement
region v when B do S

where B is a boolean expression.


■ While statement S is being executed, no other process can access variable v.
■ Regions referring to the same shared variable exclude each other in time.
■ When a process tries to execute the region statement, the Boolean expression B is
evaluated. If B is true, statement S is executed. If it is false, the process is
delayed until B becomes true and no other process is in the region associated with
v.
Example – Bounded Buffer
■ Shared data:
struct buffer {
int pool[n];
int count, in, out;
}
Bounded Buffer Producer Process
■ Producer process inserts nextp into the shared buffer
region buffer when( count < n) {
pool[in] = nextp;
in = (in+1) % n;
count++;
}
Bounded Buffer Consumer Process
■ Consumer process removes an item from the shared buffer and puts it in nextc
region buffer when (count > 0) {
nextc = pool[out];
out = (out+1) % n;
count--;
}
Implementation of region x when B do S
■ Associate with the shared variable x, the following variables:
semaphore mutex, first-delay, second-delay;
int first-count, second-count;
■ Mutually exclusive access to the critical section is provided by mutex.
■ If a process cannot enter the critical section because the Boolean expression B is
false, it initially waits on the first-delay semaphore; moved to the second-delay
semaphore before it is allowed to reevaluate B.
Implementation
■ Keep track of the number of processes waiting on first-delay and second-delay,
with first-count and second-count respectively.
■ The algorithm assumes a FIFO ordering in the queuing of processes for a
semaphore.
■ For an arbitrary queuing discipline, a more complicated implementation is
required.
Monitors
■ High-level synchronization construct that allows the safe sharing of an abstract
data type among concurrent processes.
monitor monitor-name
{
shared variable declarations
procedure body P1 (…) {
...
}
procedure body P2 (…) {
...
}
procedure body Pn (…) {
...
}
{
initialization code
}
}
■ To allow a process to wait within the monitor, a condition variable must be
declared, as
condition x, y;
■ Condition variable can only be used with the operations wait and signal.
✦ The operation
x.wait();
means that the process invoking this operation is suspended until another process invokes
x.signal();
✦ The x.signal operation resumes exactly one suspended process. If no
process is suspended, then the signal operation has no effect.
Schematic View of a Monitor

Monitor With Condition Variables


Dining Philosophers Example
monitor dp
{
enum {thinking, hungry, eating} state[5];
condition self[5];
void pickup(int i) // following slides
void putdown(int i) // following slides
void test(int i) // following slides
void init() {
for (int i = 0; i < 5; i++)
state[i] = thinking;
}
}
Dining Philosophers
void pickup(int i) {
state[i] = hungry;
test(i);
if (state[i] != eating)
self[i].wait();
}

void putdown(int i) {
state[i] = thinking;
// test left and right neighbors
test((i+4) % 5);
test((i+1) % 5);
}
Dining Philosophers
void test(int i) {
if ( (state[(i + 4) % 5] != eating) &&
(state[i] == hungry) &&
(state[(i + 1) % 5] != eating)) {
state[i] = eating;
self[i].signal();
}
}

Monitor Implementation Using Semaphores


■ Variables
semaphore mutex; // (initially = 1)
semaphore next; // (initially = 0)
int next-count = 0;
■ Each external procedure F will be replaced by
wait(mutex);

body of F;

if (next-count > 0)
signal(next);
else
signal(mutex);
■ Mutual exclusion within a monitor is ensured.
Monitor Implementation
■ For each condition variable x, we have:
semaphore x-sem; // (initially = 0)
int x-count = 0;
■ The operation x.wait can be implemented as:

x-count++;
if (next-count > 0)
signal(next);
else
signal(mutex);
wait(x-sem);
x-count--;

■ The operation x.signal can be implemented as:


if (x-count > 0) {
next-count++;
signal(x-sem);
wait(next);
next-count--;
}

■ Conditional-wait construct: x.wait(c);


✦ c – integer expression evaluated when the wait operation is executed.
✦ value of c (a priority number) stored with the name of the process that is
suspended.
✦ when x.signal is executed, process with smallest associated priority
number is resumed next.
■ Check two conditions to establish correctness of system:
✦ User processes must always make their calls on the monitor in a correct
sequence.
✦ Must ensure that an uncooperative process does not ignore the mutual-
exclusion gateway provided by the monitor, and try to access the shared
resource directly, without using the access protocols.
Solaris 2 Synchronization
■ Implements a variety of locks to support multitasking, multithreading (including
real-time threads), and multiprocessing.
■ Uses adaptive mutexes for efficiency when protecting data from short code
segments.
■ Uses condition variables and readers-writers locks when longer sections of code
need access to data.
■ Uses turnstiles to order the list of threads waiting to acquire either an adaptive
mutex or reader-writer lock.
Windows 2000 Synchronization
■ Uses interrupt masks to protect access to global resources on uniprocessor
systems.
■ Uses spinlocks on multiprocessor systems.
■ Also provides dispatcher objects, which may act as either mutexes or semaphores.
■ Dispatcher objects may also provide events. An event acts much like a condition
variable.
