Concurrent Programming Tutorial-1
Dr A Sahu
Dept of Computer Science & Engineering
IIT Guwahati

• Course Structure & Book
• Basics of thread and process
• Coordination and synchronization
• Example of Parallel Programming
– Shared memory: C/C++ Pthread, C++11 thread, OpenMP, Cilk
– Distributed memory: MPI
• Concurrent Objects
– Concurrent Queue, List, Stack, Tree, Priority Queue, Hash, SkipList
• Use of concurrent objects

• Concurrent Programming of system
– Threads and processes
– Synchronization and monitors
– Concurrent objects
– Concurrent programming in Java/MPI/Cilk/C++
• Book
– Maurice Herlihy, Nir Shavit, Art of Multiprocessor Programming, Elsevier, 2009
– Anthony Williams, C++ Concurrency in Action: Practical Multithreading, Dreamtech Publication, 2012

• Programming to simulate concurrent behavior
– Multi-threading
– Doing many tasks simultaneously
• Platform of concurrent programming
– May be a uni-processor
– May be a shared or distributed memory multiprocessor
• Parallel Programming
– Enhancing the performance of an application by running the program in parallel on a multiprocessor

• Process
– A sequential computation with its own thread of control
– A process can have many threads
• Thread
– A sequential computation is the sequence of program points reached as control flows through the source text
– Lightweight process
– Shares many things with its parent process

• Exchange of data between threads/processes
– Either by explicit message passing
– Or through the values of shared variables
• Between processes
– Message passing
– Message Passing Interface (MPI): MPI_Send(), MPI_Recv()
• Between threads
– Through the values of shared variables
9/23/2014
• Synchronization relates the thread of one process to the threads of others
• If p is a point in the thread of a process P, and q is a point in the thread of another process Q, then synchronization can be used to constrain the order in which P reaches p and Q reaches q.
• Synchronization involves the exchange of control information between processes.

• Time-shared programs appear to run in parallel
– Even if they run on a uni-processor system
– Recall the Pentium PC with RR (round-robin) scheduling
• Interrupts (hardware)
– Allowed the activity of a central CPU to be synchronized with data channels
– If a program P needed to read a card, the CPU could initiate the read action on a data channel and start executing another program Q. Once the card had been read, the channel sent INT to the CPU to resume execution of P.
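
The ordering constraint described above (Q may not pass point q until P has reached point p) can be sketched with a C++11 condition variable. This is only an illustrative sketch: the function name `ordered_run`, the flag `reached_p`, and the log entries are assumptions made for the example and do not come from the slides.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Runs two threads P and Q and forces P's point p to come before Q's point q.
// Returns the order in which the two points were reached.
std::vector<std::string> ordered_run() {
    std::mutex m;
    std::condition_variable cv;
    bool reached_p = false;          // control information exchanged by P and Q
    std::vector<std::string> log;    // protected by m

    std::thread Q([&] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return reached_p; });  // block until P has reached p
        log.push_back("q");                      // point q
    });
    std::thread P([&] {
        {
            std::lock_guard<std::mutex> lk(m);
            log.push_back("p");                  // point p
            reached_p = true;
        }
        cv.notify_one();                         // let Q proceed
    });

    P.join();
    Q.join();
    return log;
}
```

Whichever thread the scheduler happens to run first, the returned log lists p before q, because Q waits on the shared flag rather than relying on timing.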

• Reactive System: potential for parallelism occurs in the system
– User interface: keyboard, mice, and display supporting multiple windows
– Network, game, processor controller

• No need to specify
• Process networks in Unix (pipe)
P1 | P2 | … | Pn
– Each primitive process does a simple job, perhaps a trivial job
– But a short pipeline of processes can do what would otherwise be done by a substantial program
• Example
$ bc | number | speak
$ ls | wc -l
$ ps -A | grep mozilla

• Concurrent computation
– Can be described in terms of events, where an event is an uninterruptible action
– Event: execution of an assignment, call, or expression evaluation

• Interleaving: the relative order of atomic events
– An interleaving of two sequences S and T is any sequence U formed from the events of S and T
– Subject to the constraint that the events of S retain their order in U, and so do the events of T
• Example: S = {a, b, c, d, …}, T = {1, 2, 3, 4, …}
– One U can be {1, a, b, c, d, 2, 3, 4, e, 5, f, g, …}
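
The interleaving constraint can be made concrete with a small checker. The sketch below is illustrative only: the function name `is_interleaving` is hypothetical, events are represented as single characters, and it assumes S and T share no events (as in the example above), which is what makes a simple two-pointer scan sufficient.

```cpp
#include <cstddef>
#include <string>

// Returns true if u is an interleaving of s and t: u consists of exactly the
// events of s and t, and each sequence keeps its own internal order.
// Assumption: s and t have no events in common.
bool is_interleaving(const std::string& s, const std::string& t,
                     const std::string& u) {
    std::size_t i = 0, j = 0;           // next unmatched event of s and of t
    for (char e : u) {
        if (i < s.size() && e == s[i]) {
            ++i;                        // e is the next event of s
        } else if (j < t.size() && e == t[j]) {
            ++j;                        // e is the next event of t
        } else {
            return false;               // e is out of order or unknown
        }
    }
    return i == s.size() && j == t.size();
}
```

For example, with S = "abcdefg" and T = "12345", the sequence "1abcd234e5fg" is accepted, while "b1a2c" is rejected for S = "abc" and T = "12" because b appears before a.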
• Sharing Data
– Reader and Writer
– 1R, 1W, MR, 1R1W, MR1W, 1RMW, MRMW
– Synchronization is necessary when more than one process shares the data and at least one of them is a writer
– Mutual Exclusion: Critical Section Problem
• Barrier or Fence
– Wait until something happens
– Synchronized
– Example: phase-wise execution
    For_all_N_threads DoWork1();
    waits(N);
    For_all_N_threads DoWork2();
    waits(N);

• Sharing Data: Reader and Writer
• Locking and unlocking
– Mutex
• Hardware instructions to ensure locking
– Atomic instructions: TAS, LL/SC pair, XCHG, SWAP
– TAS (test-and-set)
– TTAS (test-and-test-and-set)
– TTAS with backoff
• Atomic Register
• Safe Register
• Relative power of synchronization operations
(This part will be discussed towards the end of this course, in Nov.)
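
The phase-wise barrier pattern (DoWork1, wait for all N threads, DoWork2, wait again) can be sketched in C++11 with a mutex and a condition variable. This is a minimal sketch under stated assumptions: the `Barrier` class and `run_phases` function are hypothetical names written for this example, and the "work" in each phase is a placeholder.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// A small reusable barrier for a fixed number of threads.
class Barrier {
public:
    explicit Barrier(int n) : n_(n), waiting_(0), generation_(0) {}
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        int gen = generation_;
        if (++waiting_ == n_) {      // the last thread to arrive releases all
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lk, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int n_, waiting_, generation_;
};

// Runs two phases on n threads. Returns true if every thread saw all
// phase-1 results when it started phase 2.
bool run_phases(int n) {
    Barrier barrier(n);
    std::vector<int> phase1(n, 0), ok(n, 0);
    std::vector<std::thread> threads;
    for (int id = 0; id < n; ++id) {
        threads.emplace_back([&, id] {
            phase1[id] = 1;                 // placeholder for DoWork1()
            barrier.wait();                 // waits(N)
            int sum = 0;                    // placeholder for DoWork2()
            for (int v : phase1) sum += v;
            ok[id] = (sum == n);
            barrier.wait();                 // waits(N)
        });
    }
    for (auto& t : threads) t.join();
    for (int v : ok) if (!v) return false;
    return true;
}
```

The generation counter is what makes the barrier reusable: a thread that races ahead into the second `wait()` cannot be confused with the stragglers of the first one.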

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* Thread body: prints the message passed in through ptr. */
void *thr_func( void *ptr ){
    char *message;
    message = (char *) ptr;
    printf("%s \n", message);
    return NULL;
}

int main() {
    pthread_t thr1, thr2;
    const char *MSG1="Thr 1", *MSG2="Thr 2";
    int iret1, iret2;
    /* Create two threads, each running thr_func with its own message. */
    iret1 = pthread_create( &thr1, NULL, thr_func, (void*) MSG1);
    iret2 = pthread_create( &thr2, NULL, thr_func, (void*) MSG2);
    /* Wait for both threads to finish before exiting. */
    pthread_join(thr1, NULL);
    pthread_join(thr2, NULL);
    exit(0);
}

$ g++ -pthread pthread1.c -o pthread1
int counter = 0;
int main() {
thread_t th[NTH];
int i, j;
[Figure: threads spin, then enter the critical section (CS)]
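
As a separate illustration of a critical section around a shared counter, the following C++11 sketch lets several threads increment one variable under a mutex. It is not a reconstruction of the incomplete fragment above; the function name `run_counter` and its parameters are assumptions made for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Each of nthreads threads increments a shared counter iters times.
// The mutex turns every increment into a critical section.
long run_counter(int nthreads, int iters) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> th;
    for (int i = 0; i < nthreads; ++i) {
        th.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                std::lock_guard<std::mutex> lk(m);  // lock on entry, unlock on exit
                ++counter;                          // critical section
            }
        });
    }
    for (auto& t : th) t.join();
    return counter;
}
```

With the lock in place, `run_counter(4, 10000)` returns 40000. Without it, increments from different threads can overwrite each other and the total is unpredictable.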
Review: Test-and-Set
• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
[Figure: threads spin before the critical section (CS); the lock suffers from contention. Contention → ???]

// Simplified model of java.util.concurrent.atomic.AtomicBoolean
public class AtomicBoolean {
    boolean value;

    // Atomically swap in newValue and return the prior value
    public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;
        value = newValue;
        return prior;
    }
}

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false
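
The same locking protocol can be sketched in C++11, where `std::atomic<bool>::exchange` plays the role of `getAndSet`. This is an illustrative sketch, not the course's implementation: the class name `TASLock` and the helper `tas_count` are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Test-and-set spin lock: false means free, true means taken.
class TASLock {
public:
    void lock() {
        // exchange(true) swaps in true and returns the prior value;
        // a prior value of false means this thread won the lock.
        while (state_.exchange(true, std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        state_.store(false, std::memory_order_release);  // release by writing false
    }
private:
    std::atomic<bool> state_{false};
};

// Uses the lock to protect a shared counter incremented by several threads.
long tas_count(int nthreads, int iters) {
    TASLock lock;
    long counter = 0;
    std::vector<std::thread> th;
    for (int i = 0; i < nthreads; ++i) {
        th.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lock.lock();
                ++counter;          // critical section
                lock.unlock();
            }
        });
    }
    for (auto& t : th) t.join();
    return counter;
}
```

Every failed `exchange` is still a write to the shared flag, which is one reason this lock behaves poorly under contention.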
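
Test-and-set Lock
class TASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (state.getAndSet(true)) {}   // spin until the prior value was false
    }

    void unlock() {
        state.set(false);                  // release the lock by writing false
    }
}

Graph
[Graph: time against threads; the ideal curve is annotated "no speedup because of sequential bottleneck"]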
Mystery #1
[Graph: time against threads for the TAS lock and the ideal curve. What is going on?]

• Lurking stage
– Wait until lock "looks" free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock "looks" available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking

class TTASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) {}          // Wait until lock looks free
            if (!state.getAndSet(true))     // Then try to acquire it
                return;
        }
    }
}

Mystery #2
[Graph: time against threads for the TAS lock, the TTAS lock, and the ideal curve]
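
A C++11 counterpart of the test-and-test-and-set lock is sketched below, with the lurking and pouncing steps marked in comments. The names `TTASLock` and `ttas_count` are hypothetical, and the memory orderings are one reasonable choice rather than the only correct one.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Test-and-test-and-set spin lock.
class TTASLock {
public:
    void lock() {
        while (true) {
            // Lurk: spin on plain reads while the lock looks taken.
            while (state_.load(std::memory_order_relaxed)) { /* spin */ }
            // Pounce: the lock looks free, so try to take it with TAS.
            if (!state_.exchange(true, std::memory_order_acquire)) {
                return;             // prior value was false: lock acquired
            }
            // TAS lost the race: go back to lurking.
        }
    }
    void unlock() {
        state_.store(false, std::memory_order_release);
    }
private:
    std::atomic<bool> state_{false};
};

// Uses the lock to protect a shared counter incremented by several threads.
long ttas_count(int nthreads, int iters) {
    TTASLock lock;
    long counter = 0;
    std::vector<std::thread> th;
    for (int i = 0; i < nthreads; ++i) {
        th.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lock.lock();
                ++counter;          // critical section
                lock.unlock();
            }
        });
    }
    for (auto& t : th) t.join();
    return counter;
}
```

While lurking, a thread only reads the flag, so waiting threads do not write to the shared location until the lock actually looks free.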
Spin-Waiting Overhead
[Graph: time against threads for the TTAS lock]

• Shared memory
– Pthread, C++11 thread
– Java
– OpenMP
– Cilk

#include <future>   // for std::async
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout<<message;
}

int main() {
    // Run write_message asynchronously; f becomes ready when it finishes
    auto f=std::async(write_message,
                      "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}

#include <thread>   // for std::thread
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout<<message;
}

int main() {
    // Start a new thread running write_message
    std::thread t(write_message,
                  "hello world from std::thread\n");
    write_message("hello world from main\n");
    t.join();       // wait for the thread to finish
}

$ g++ -std=c++0x -pthread test.cpp
• std::launch::async => "as if" in a new thread.
• std::launch::deferred => executed on demand.
• std::launch::async | std::launch::deferred => implementation chooses (default).

#include <future>
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout<<message;
}

int main() {
    // Explicitly request execution as if in a new thread
    auto f=std::async(
        std::launch::async, write_message,
        "hello world from std::async\n");
    write_message("hello world from main\n");
    f.wait();
}
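
The difference between the two launch policies can be observed by comparing thread ids. The sketch below is illustrative; the helper name `runs_on_caller` is an assumption made for this example.

```cpp
#include <future>
#include <thread>

// Returns true if a task launched with the given policy was executed on the
// thread that called this function.
bool runs_on_caller(std::launch policy) {
    std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // For a deferred task, get() is the point where the task actually runs.
    return f.get() == caller;
}
```

With `std::launch::deferred` the task runs on demand inside `get()`, on the calling thread, so the function returns true. With `std::launch::async` the task runs as if in a new thread, so it returns false.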

#include <future>
#include <iostream>

int find_the_answer() {
    return 42;
}

int main() {
    auto f=std::async(find_the_answer);
    // get() waits for the task and returns its result
    std::cout<<"the answer is "
             <<f.get()<<"\n";
}

class NewThread implements Runnable {
    Thread t;

    NewThread() {
        t = new Thread(this, "Demo Thread");
        System.out.println("Child thread: " + t);
        t.start(); // Start the thread
    }

    public void run() {
        for(int i = 5; i > 0; i--) {
            System.out.println("Child Thread: " + i);
        }
    }
}

public class ThreadDemo {
    public static void main(String args[]) {
        new NewThread();   // the constructor starts the child thread
        for(int i = 5; i > 0; i--) {
            System.out.println("Main Thread: " + i);
        }
    }
}

$ javac ThreadDemo.java
$ java ThreadDemo

• We need to identify parallelism
– How to extract parallelism manually
– Parallel decomposition
• Code in the threaded model
• The OS is responsible for running it efficiently
– Less control over runtime
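
Manual parallel decomposition can be illustrated by splitting a summation into one chunk per thread. This is a minimal sketch under stated assumptions: the function name `parallel_sum` is hypothetical, `nthreads` is assumed to be at least 1, and the choice of contiguous equal-sized chunks is just one possible decomposition.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums data by giving each thread one contiguous chunk, then combining the
// per-thread partial sums on the calling thread.
long parallel_sum(const std::vector<int>& data, int nthreads) {
    std::vector<long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    std::size_t chunk = (data.size() + nthreads - 1) / nthreads;  // ceiling
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = std::min(data.size(), t * chunk);
            std::size_t end = std::min(data.size(), begin + chunk);
            // Each thread writes only its own slot, so no lock is needed.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

The programmer decides how the work is split. How the threads are mapped onto processors is left to the operating system.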