Concurrency
What is concurrent programming?
• Concurrency generally refers to events or circumstances that are happening or existing at the same time.
• In programming terms, concurrent programming is a technique in which two or more processes start, run in an interleaved fashion through context switching, and complete in an overlapping time period by managing access to shared resources, e.g., on a single CPU core.
• This doesn’t necessarily mean that multiple processes
will be running at the same instant – even if the results
might make it seem like it.
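To make the interleaving concrete, here is a minimal sketch in C with POSIX threads (the thread bodies and iteration counts are illustrative, not from the slides): two threads share the processor and their output interleaves in whatever order the scheduler chooses.

    #include <pthread.h>
    #include <stdio.h>

    /* Each thread prints a tag several times; with no synchronization,
       the two output streams interleave in an order chosen by the scheduler. */
    static void *worker(void *arg) {
        const char *tag = arg;
        for (int i = 0; i < 5; i++)
            printf("%s step %d\n", tag, i);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "T1");
        pthread_create(&t2, NULL, worker, "T2");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Compile with -pthread and run it twice: the interleaving of T1/T2 lines will generally differ from run to run, even though the overall results look the same.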
Difference between Concurrent & Parallel Programming
• In parallel programming, parallel processing is
achieved through hardware parallelism e.g.
executing two processes on two separate CPU
cores simultaneously.
• Concurrency is everywhere in modern
programming, whether we like it or not:
– Multiple computers in a network
– Multiple applications running on one computer
– Multiple processors in a computer (today, often multiple processor cores on a single chip)
Difference …
• In fact, concurrency is essential in modern
programming:
• Web sites must handle multiple simultaneous
users.
• Mobile apps need to do some of their processing
on servers (“in the cloud”).
• Graphical user interfaces almost always require
background work that does not interrupt the
user. For example, Eclipse compiles your Java
code while you’re still editing it.
Why is concurrent programming important?
[Figure: building an executable — a source file is compiled into .o files and linked with static libraries (libc, streams…) into an executable file containing a header, code, initialized data, BSS, a symbol table, line numbers, and external references; the executable must follow a standard format, such as ELF on Linux or Microsoft PE on Windows]
Running a program
• OS creates a “process” and allocates memory for it
• The loader:
– reads and interprets the executable file
– sets process’s memory to contain code & data from executable
– pushes “argc”, “argv”, “envp” on the stack
– sets the CPU registers properly & calls “__start()”
– The program starts running at __start(), which calls main()
– we say the “process” is running, and we no longer think of the “program”
• When main() returns, the startup code calls “exit()”
– destroys the process and returns all resources
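A minimal sketch of what the startup code might look like (the names and signatures are illustrative; real C runtimes such as crt0 are more involved):

    /* Hypothetical startup routine: the loader transfers control here.
       It calls main() with the arguments the loader pushed on the stack,
       then exits with main's return value, destroying the process. */
    extern int main(int argc, char *argv[], char *envp[]);
    extern void exit(int status);

    void __start(int argc, char *argv[], char *envp[]) {
        int status = main(argc, argv, envp);
        exit(status);   /* never returns: the process is destroyed */
    }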
Process != Program
• Program is passive
– Code + data
• Process is a running program
– stack, registers, program counter
• Example: we both run IE:
– Same program
– Separate processes
[Figure: the executable (header, code, initialized data, BSS, symbol table, line numbers, ext. refs) versus the process address space (code, initialized data, BSS, heap, stack, mapped segments, DLLs)]
Process States
• Many processes in system, only one on CPU
• “Execution State” of a process:
– Indicates what it is doing
– Basically 3 states:
• Ready: waiting to be assigned to the CPU
• Running: executing instructions on the CPU
• Waiting: waiting for an event, e.g. I/O completion
• Process moves across different states
Process State Transitions
[Figure: process state diagram]
– New → Ready: admitted
– Ready → Running: dispatch
– Running → Ready: interrupt
– Running → Waiting: I/O or event wait
– Waiting → Ready: I/O or event completion
– Running → Exit: done
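A sketch of these states and transitions as they might appear inside a kernel (the type and function names are illustrative, not from any particular OS):

    /* Illustrative process states matching the diagram above. */
    enum proc_state { NEW, READY, RUNNING, WAITING, EXIT };

    struct process {
        int pid;
        enum proc_state state;
    };

    /* Transition helpers: each corresponds to one arrow in the diagram. */
    void admit(struct process *p)          { p->state = READY;   }  /* New -> Ready */
    void dispatch(struct process *p)       { p->state = RUNNING; }  /* Ready -> Running */
    void interrupt(struct process *p)      { p->state = READY;   }  /* Running -> Ready */
    void wait_event(struct process *p)     { p->state = WAITING; }  /* Running -> Waiting */
    void complete_event(struct process *p) { p->state = READY;   }  /* Waiting -> Ready */
    void terminate(struct process *p)      { p->state = EXIT;    }  /* Running -> Exit */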
What happens during execution?
[Figure: CPU with integer registers R0–R31, floating-point registers F0–F30, and a PC, fetching and executing instructions from memory (Inst0 … Inst237, data), addresses 0 through 2^32 − 1]
• Execution sequence:
– Fetch instruction at PC
– Decode
– Execute (possibly using registers)
– Write results to registers
– PC = NextInstruction(PC)
– Repeat
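The loop can be sketched directly in code; this is a toy interpreter for a made-up two-instruction machine, purely to illustrate the fetch–decode–execute cycle (nothing here comes from a real ISA):

    #include <stdio.h>

    /* Made-up instruction set: opcode + operand packed in a struct. */
    enum opcode { ADD, HALT };
    struct inst { enum opcode op; int operand; };

    int main(void) {
        struct inst memory[] = { {ADD, 3}, {ADD, 4}, {HALT, 0} };
        int acc = 0, pc = 0, running = 1;

        while (running) {
            struct inst i = memory[pc];       /* fetch instruction at PC */
            switch (i.op) {                   /* decode */
            case ADD:  acc += i.operand; break;  /* execute, write result */
            case HALT: running = 0;      break;
            }
            pc = pc + 1;                      /* PC = next instruction */
        }                                     /* repeat */
        printf("acc = %d\n", acc);            /* prints acc = 7 */
        return 0;
    }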
How can we give the illusion of multiple processors?
[Figure: multiplex the CPU in time — each thread keeps its own register values and execution stack]
Thread Benefits
• Simplified programming model per thread
– Example: Microsoft Word — one thread for grammar check, one thread for spelling check, one thread for formatting, and so on…
– Each activity can be programmed independently
• State kept per thread:
– Register values
– Running or sleeping
– Scheduling information of the thread (e.g., priority)
Dispatching Loop
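The slide title refers to the scheduler's core loop. Here is a minimal runnable sketch with a toy round-robin policy, where "threads" are reduced to step functions (all names and the policy are illustrative, not from the slides):

    #include <stdio.h>

    /* Toy dispatcher: each "thread" does one step of work per turn and
       returns; the loop round-robins between them. A real kernel would
       save and restore register state instead. */
    struct tcb { const char *name; int steps_left; };

    static struct tcb threads[] = { {"T1", 3}, {"T2", 3} };
    static int current = 0;

    static struct tcb *ChooseNextThread(void) {
        current = (current + 1) % 2;        /* round-robin policy */
        return &threads[current];
    }

    static void RunThread(struct tcb *t) {
        printf("%s runs (remaining %d)\n", t->name, t->steps_left);
        t->steps_left--;
    }

    int main(void) {
        /* Dispatching loop: pick a thread, run it until it yields,
           repeat until everyone is done. */
        for (struct tcb *t = &threads[0]; ; t = ChooseNextThread()) {
            if (threads[0].steps_left == 0 && threads[1].steps_left == 0)
                break;
            if (t->steps_left > 0)
                RunThread(t);
        }
        return 0;
    }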
When does scheduler run?
• Non-preemptive minimum
– Process runs until voluntarily relinquish CPU
• process blocks on an event (e.g., I/O or synchronization)
• process terminates
• process yields
• Preemptive minimum
– All of the above, plus:
• Event completes: process moves from blocked to ready
• Timer interrupts
• Implementation: process can be interrupted in favor of another
Process Model
• Process alternates between CPU and I/O bursts
– CPU-bound jobs: long CPU bursts (e.g., matrix multiply)
– I/O-bound jobs: short CPU bursts (e.g., emacs)
– I/O burst = process idle, switch to another “for free”
– Problem: we don’t know a job’s type before running it
Student presidential debates on Sunday
• Why watch?
– Want to see what the hype is about
– Very entertaining
– See who is going to be elected president
– Support your friend who is competing for the position
• An underlying assumption:
– “response time” matters most for interactive jobs (I/O bound)
“The perfect CPU scheduler”
• Minimize latency: response or job completion time
• Maximize throughput: jobs completed per unit time.
• Maximize utilization: keep I/O devices busy.
– Recurring theme with OS scheduling
• Fairness: everyone makes progress, no one starves
Problem Cases
• Blindness about job types
– I/O goes idle
• Optimization involves favoring jobs of type “A” over “B”.
– Lots of A’s? B’s starve
• Interactive process trapped behind others.
– Response time sucks for no reason
• Priority Inversion: A depends on B. A’s priority > B’s.
– B never runs
Scheduling Algorithms: FCFS
• First-come First-served (FCFS) (FIFO)
– Jobs are scheduled in order of arrival
– Non-preemptive
• Problem:
– Average waiting time depends on arrival order
– Arrival order P1, P2, P3: P1 runs 0–16, P2 runs 16–20, P3 runs 20–24; average wait = (0 + 16 + 20) / 3 = 12
– Arrival order P2, P3, P1: P2 runs 0–4, P3 runs 4–8, P1 runs 8–24; average wait = (0 + 4 + 8) / 3 = 4
• Advantage: really simple!
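A few lines of C reproduce the arithmetic for any arrival order (burst lengths taken from the example above):

    #include <stdio.h>

    /* FCFS: the waiting time of each job is the sum of the bursts before it. */
    double fcfs_avg_wait(const int burst[], int n) {
        int wait = 0, total_wait = 0;
        for (int i = 0; i < n; i++) {
            total_wait += wait;    /* job i waits for all earlier jobs */
            wait += burst[i];
        }
        return (double)total_wait / n;
    }

    int main(void) {
        int order1[] = {16, 4, 4};   /* P1, P2, P3 */
        int order2[] = {4, 4, 16};   /* P2, P3, P1 */
        printf("P1,P2,P3: avg wait = %.1f\n", fcfs_avg_wait(order1, 3)); /* 12.0 */
        printf("P2,P3,P1: avg wait = %.1f\n", fcfs_avg_wait(order2, 3)); /* 4.0 */
        return 0;
    }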
Convoy Effect
• A CPU-bound job will hold the CPU until done,
– or until it causes an I/O burst
• a rare occurrence, since the thread is CPU-bound
– Long periods where no I/O requests are issued, and the CPU is held
– Result: poor I/O device utilization
• Example: one CPU bound job, many I/O bound
• CPU bound runs (I/O devices idle)
• CPU bound blocks
• I/O bound job(s) run, quickly block on I/O
• CPU bound runs again
• I/O completes
• CPU bound still runs while I/O devices idle (continues…)
– Simple hack: run process whose I/O completed?
• What is a potential problem?
Scheduling Algorithms: LIFO
• Last-In First-out (LIFO)
– Newly arrived jobs are placed at head of ready queue
– Improves response time for newly created threads
• Problem:
– May lead to starvation – early processes may never get CPU
Problem
• You work as a short-order cook
– Customers come in and specify which dish they want
– Each dish takes a different amount of time to prepare
• Your goal:
– minimize average time the customers wait for their food
• What strategy would you use?
– Note: most restaurants use FCFS.
Scheduling Algorithms: SJF
• Shortest Job First (SJF)
– Choose the job with the shortest next CPU burst
– Provably optimal for minimizing average waiting time
– Arrival order P1, P3, P2: P1 runs 0–15, P3 runs 15–21, P2 runs 21–24; average wait = (0 + 15 + 21) / 3 = 12
– SJF order P2, P3, P1: P2 runs 0–3, P3 runs 3–9, P1 runs 9–24; average wait = (0 + 3 + 9) / 3 = 4
• Problem:
– Impossible to know the length of the next CPU burst
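The same waiting-time arithmetic as in the FCFS sketch, fed a sorted burst list, shows the improvement; sorting ascending by burst length is exactly SJF when all jobs are present at time 0:

    #include <stdio.h>
    #include <stdlib.h>

    static int asc(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    /* Waiting time of each job is the sum of the bursts scheduled before it. */
    static double avg_wait(const int burst[], int n) {
        int wait = 0, total = 0;
        for (int i = 0; i < n; i++) { total += wait; wait += burst[i]; }
        return (double)total / n;
    }

    int main(void) {
        int order[] = {15, 6, 3};              /* P1, P3, P2 as on the slide */
        printf("arrival order: avg wait = %.1f\n", avg_wait(order, 3)); /* 12.0 */
        qsort(order, 3, sizeof order[0], asc); /* SJF: shortest burst first */
        printf("SJF order:     avg wait = %.1f\n", avg_wait(order, 3)); /* 4.0 */
        return 0;
    }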
Scheduling Algorithms: SRTF
• SJF can be either preemptive or non-preemptive
– New, short job arrives; the current process has a long time left to execute
• Preemptive SJF is called shortest remaining time first (SRTF)
– Example: P3 (burst 6) and P1 (burst 15) are ready at time 0; P2 (burst 3) arrives at time 10
– Non-preemptive SJF: P3 runs 0–6, P1 runs 6–21, P2 runs 21–24
– SRTF: P3 runs 0–6, P1 runs 6–10, P2 preempts and runs 10–13, P1 resumes 13–24
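A tick-by-tick simulation of the SRTF schedule above (the job table encodes the slide's example; the code assumes at least one job is ready at time 0):

    #include <stdio.h>

    /* SRTF simulation: at every time unit, run the arrived job with the
       least remaining work. */
    int main(void) {
        const char *name[] = {"P1", "P2", "P3"};
        int arrival[] = {0, 10, 0};
        int remain[]  = {15, 3, 6};
        int n = 3, done = 0;

        for (int t = 0; done < n; t++) {
            int pick = -1;
            for (int i = 0; i < n; i++)
                if (arrival[i] <= t && remain[i] > 0 &&
                    (pick < 0 || remain[i] < remain[pick]))
                    pick = i;
            remain[pick]--;                       /* run one tick */
            if (remain[pick] == 0) {
                printf("%s finishes at t=%d\n", name[pick], t + 1);
                done++;
            }
        }
        return 0;
    }

Running it prints P3 finishing at t=6, P2 at t=13, and P1 at t=24, matching the SRTF timeline on the slide.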
Shortest Job First Prediction
• Approximate the next CPU-burst duration
– from the durations of the previous bursts
• The past can be a good predictor of the future
• No need to remember the entire past history
• Use an exponential average:
– t_n = duration of the nth CPU burst
– τ_{n+1} = predicted duration of the (n+1)st CPU burst
– τ_{n+1} = α t_n + (1 − α) τ_n, where 0 ≤ α ≤ 1
– α determines the relative weight placed on recent versus past behavior
Prediction of the Length of the Next CPU Burst
Examples of Exponential Averaging
• α = 0
– τ_{n+1} = τ_n
– Recent history does not count
• α = 1
– τ_{n+1} = t_n
– Only the actual last CPU burst counts
• If we expand the formula, we get:
τ_{n+1} = α t_n + (1 − α) α t_{n−1} + … + (1 − α)^j α t_{n−j} + … + (1 − α)^{n+1} τ_0
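In code, the predictor is a one-liner applied after each burst. Here α = 0.5 and τ_0 = 10; the burst sequence is a common textbook example, not data from these slides:

    #include <stdio.h>

    /* Exponential average: tau_next = alpha*t + (1 - alpha)*tau. */
    double predict(double alpha, double t, double tau) {
        return alpha * t + (1.0 - alpha) * tau;
    }

    int main(void) {
        double alpha = 0.5, tau = 10.0;          /* tau_0: initial guess */
        double bursts[] = {6, 4, 6, 4, 13, 13, 13};
        for (int i = 0; i < 7; i++) {
            tau = predict(alpha, bursts[i], tau);
            printf("after burst %.0f: predict %.2f\n", bursts[i], tau);
        }
        return 0;
    }

Note how the prediction tracks the shift from short bursts to long ones after a few samples: recent behavior dominates, but the old history never fully disappears.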
[Example timeline: timeslices allocated in the order P1 P2 P3 P4 P1 P3 P4 P1 P3 P3]
Moral:
Context-switch overhead is usually negligible (< 1% per timeslice); otherwise you are context switching too frequently and losing all productivity.
Scheduling Algorithms
• Multi-level Queue Scheduling
• Implement multiple ready queues based on job “type”
– interactive processes
– CPU-bound processes
– batch jobs
– system processes
– student programs
• Different queues may be scheduled using different algorithms
• Inter-queue CPU allocation is either strict or proportional
• Problem: Classifying jobs into queues is difficult
– A process may have CPU-bound phases as well as interactive ones
Multilevel Queue Scheduling
[Figure: queues arranged by priority]
– Highest priority: System Processes
– Interactive Processes
– Batch Processes
– Lowest priority: Student Processes
Scheduling Algorithms
• Multi-level Feedback Queues
• Implement multiple ready queues
– Different queues may be scheduled using different algorithms
– Just like multilevel queue scheduling, but assignments are not static
• Jobs move from queue to queue based on feedback
– Feedback = The behavior of the job,
• e.g. does it require the full quantum for computation, or
• does it perform frequent I/O ?
[Figure: three feedback queues — quantum = 4 (highest priority), quantum = 8, FCFS (lowest priority)]
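A sketch of the feedback rule itself; the levels and quanta mirror the figure, but the demotion policy and names are illustrative rather than taken from a specific OS:

    #include <stdio.h>

    /* MLFQ feedback sketch: a job that burns its full quantum is demoted
       one level; a job that blocks for I/O early keeps its level. */
    static const int quantum[] = {4, 8, 0};   /* 0 = FCFS, no quantum */

    struct job { int level; };

    static void feedback(struct job *j, int used) {
        int q = quantum[j->level];
        if (q > 0 && used >= q && j->level < 2)
            j->level++;          /* CPU-bound behavior: push down a level */
        /* else: blocked before the quantum expired -> keep level,
           preserving good response time for interactive jobs */
    }

    int main(void) {
        struct job j = { .level = 0 };
        feedback(&j, 4);         /* used the full quantum of 4 -> level 1 */
        feedback(&j, 2);         /* blocked early -> stays at level 1 */
        printf("job level = %d\n", j.level);   /* prints 1 */
        return 0;
    }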
A Multi-level System
[Figure: queues arranged by priority (high to low) versus timeslice (low to high); CPU-bound jobs sink toward the low-priority, long-timeslice queues]
Multiple-Processor Scheduling
• CPU scheduling more complex when multiple CPUs are available
• Homogeneous processors within a multiprocessor
• Asymmetric multiprocessing – only one processor accesses the
system data structures, alleviating the need for data sharing
• Symmetric multiprocessing (SMP) – each processor is self-
scheduling, all processes in common ready queue, or each has
its own private queue of ready processes
• Processor affinity – process has affinity for processor on which
it is currently running
– soft affinity
– hard affinity
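On Linux, hard affinity can be requested with sched_setaffinity(); a minimal sketch pinning the calling process to CPU 0 (Linux-specific, error handling trimmed):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);               /* allow CPU 0 only */
        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof mask, &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to CPU 0\n");
        return 0;
    }

Soft affinity, by contrast, is just the scheduler's default preference to keep a process on its current CPU (warm caches); it requires no call at all.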
NUMA and CPU Scheduling
Multicore Processors
[Figure: threads T1 and T2 run over time; each calls CSEnter(), executes its critical section, then calls CSExit()]
Critical Section Goals
• Perhaps they loop (perhaps not!)
[Figure: the same two threads, T1 and T2, now repeatedly entering and leaving their critical sections]
Critical Section Goals
• We would like
– Safety (aka mutual exclusion)
• No more than one thread can be in a critical section at any time.
– Liveness (aka progress)
• A thread that is seeking to enter the critical section will eventually succeed.
– Bounded waiting
• A bound must exist on the number of times that other threads are allowed
to enter their critical sections after a thread has made a request to enter its
critical section and before that request is granted
• Assume that each process executes at a nonzero speed
• No assumption concerning relative speed of the N processes
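One common way to realize CSEnter()/CSExit() with these properties is a pthread mutex; this is a standard realization for illustration, not necessarily the mechanism these slides build up to:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;            /* shared resource */

    /* CSEnter/CSExit realized with a mutex: at most one thread holds the
       lock at a time (safety); a blocked thread acquires it once it is
       released (liveness). */
    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);      /* CSEnter() */
            counter++;                      /* critical section */
            pthread_mutex_unlock(&lock);    /* CSExit() */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* always 200000 with the lock */
        return 0;
    }

Remove the lock/unlock pair and the final count will usually come out below 200000: the unsynchronized increments interleave and lose updates, which is exactly the violation of mutual exclusion the goals above rule out.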