Thread-Level Parallelism and Synchronization Issues

Data Races and Synchronization
• Two memory accesses form a data race if they are from
different threads to the same location, at least one is a
write, and they occur one after another with no
synchronization ordering them
• If there is a data race, result of program can vary
depending on chance (which thread ran first?)
• Avoid data races by synchronizing writing and
reading to get deterministic behavior
• Synchronization done by user-level functions that
rely on hardware synchronization instructions
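As a rough illustration of the last two bullets, the C sketch below guards a shared counter with a POSIX mutex, a user-level function that is itself built on hardware synchronization instructions. The names run_synchronized_counter, NUM_THREADS, and INCREMENTS are made up for this example and do not come from the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* hypothetical thread count for this sketch */
#define INCREMENTS  100000  /* hypothetical amount of work per thread */

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds to the shared counter; the mutex orders every
   read-modify-write so that no two accesses race. */
static void *add_to_counter(void *unused) {
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs all threads to completion and returns the final counter value. */
long run_synchronized_counter(void) {
    pthread_t threads[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, add_to_counter, NULL);
        assert(rc == 0);  /* sanity check: thread was created */
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return shared_counter;
}
```

With the lock/unlock pair in place the total is the same on every run; without it, the increments would race and the result could vary by chance.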

Lock and Unlock Synchronization
• Lock used to create region (critical section) where only
one thread can operate
• Given shared memory, use memory location as
synchronization point: lock or semaphore
• Thread reads lock to see if it must wait, or OK to go into
critical section (and set to locked)
– 0 => lock is free / open / unlocked / lock off
– 1 => lock is set / closed / locked / lock on

Diagram: Set the lock → Critical section (only one thread gets
to execute this section of code at a time), e.g., change shared
variables → Unset the lock

Possible Lock/Unlock Implementation

• Lock (aka busy wait):

      addiu $t1,$zero,1       # t1 = 1 means locked
Loop: lw    $t0,lock($s0)     # load lock
      bne   $t0,$zero,Loop    # loop if locked
Lock: sw    $t1,lock($s0)     # unlocked, so lock

• Unlock:

      sw    $zero,lock($s0)   # 0 means unlocked

Possible Lock Problem

Time  Thread 1                    Thread 2
 |          addiu $t1,$zero,1
 |    Loop: lw    $t0,lock($s0)
 |                                      addiu $t1,$zero,1
 |                                Loop: lw    $t0,lock($s0)
 |          bne   $t0,$zero,Loop
 |                                      bne   $t0,$zero,Loop
 |    Lock: sw    $t1,lock($s0)
 v                                Lock: sw    $t1,lock($s0)

Both threads think they have set the lock.
Exclusive access not guaranteed!
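To make the failure concrete, the C sketch below replays that interleaving step by step in a single thread, so the outcome is deterministic. It is only a model of the two threads, not real concurrent code, and the function name replay_naive_lock and its local variables are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Replays the problematic interleaving of the load-then-store lock.
   Returns how many "threads" end up believing they hold the lock. */
int replay_naive_lock(void) {
    int lock = 0;                /* 0 = free, 1 = locked */
    int t0_thread1, t0_thread2;  /* each thread's private copy of $t0 */
    bool thread1_in = false, thread2_in = false;

    t0_thread1 = lock;           /* Thread 1: lw  $t0,lock($s0) reads 0 */
    t0_thread2 = lock;           /* Thread 2: lw  $t0,lock($s0) reads 0 too */

    if (t0_thread1 == 0) {       /* Thread 1: bne falls through */
        lock = 1;                /* Thread 1: sw  $t1,lock($s0) */
        thread1_in = true;
    }
    if (t0_thread2 == 0) {       /* Thread 2: bne tests its stale 0 */
        lock = 1;                /* Thread 2: sw  $t1,lock($s0) */
        thread2_in = true;
    }

    assert(lock == 1);           /* the lock word looks correctly "locked" */
    return (int)thread1_in + (int)thread2_in;
}
```

Both simulated threads enter the critical section because each decision is based on a value loaded before the other thread's store; the load and the store are not one indivisible step.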

Help! Hardware Synchronization
• Hardware support required to prevent
interloper (either thread on other core or
thread on same core) from changing the value
– Atomic read/write memory operation
– No other access to the location allowed between
the read and write
• Could be a single instruction
– E.g., atomic swap of register ↔ memory
• Or an atomic pair of instructions (e.g., MIPS ll/sc: load
linked, store conditional)
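One way to picture the atomic swap of register ↔ memory from a high-level language is the C11 atomic_exchange operation, sketched below as a spin lock. This assumes a compiler with <stdatomic.h> and POSIX threads; the names swap_lock, run_swap_lock_demo, SWAP_THREADS, and SWAP_ITERS are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SWAP_THREADS 4      /* hypothetical thread count */
#define SWAP_ITERS   20000  /* hypothetical iterations per thread */

static atomic_int swap_lock = 0;  /* 0 = free, 1 = locked */
static long protected_total = 0;  /* only touched inside the critical section */

static void swap_lock_acquire(void) {
    /* Atomically store 1 and get the old value back in one step.
       Old value 0 means the lock was free and is now ours. */
    while (atomic_exchange(&swap_lock, 1) != 0) {
        /* busy wait: someone else holds the lock */
    }
}

static void swap_lock_release(void) {
    atomic_store(&swap_lock, 0);
}

static void *swap_worker(void *unused) {
    (void)unused;
    for (int i = 0; i < SWAP_ITERS; i++) {
        swap_lock_acquire();
        protected_total++;        /* critical section */
        swap_lock_release();
    }
    return NULL;
}

/* Runs the workers and returns the total accumulated under the lock. */
long run_swap_lock_demo(void) {
    pthread_t threads[SWAP_THREADS];
    protected_total = 0;
    for (int i = 0; i < SWAP_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, swap_worker, NULL);
        assert(rc == 0);          /* sanity check: thread was created */
        (void)rc;
    }
    for (int i = 0; i < SWAP_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return protected_total;
}
```

Because the read of the old value and the write of 1 happen as one indivisible operation, no other access to the lock word can slip in between them.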

Test-and-Set
• In a single atomic operation:
– Test to see if a memory location is set
(contains a 1)
– Set it (to 1) if it isn't (it contained a
zero when tested)
– Otherwise indicate that the Set
failed, so the program can try again
– No other instruction can modify the
memory location, including another
Test-and-Set instruction
• Useful for implementing lock
operations
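C11 exposes an operation with this shape as atomic_flag_test_and_set, which sets the flag and reports whether it was already set. The sketch below wraps it in a try-lock and a busy-wait lock; the names tas_try_lock, tas_lock_acquire, tas_unlock, and tas_walkthrough are assumptions made for this example, not part of the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;  /* starts clear (unlocked) */

/* One attempt: returns true if the flag was clear and we set it,
   false if it was already set (the Set failed, so try again). */
bool tas_try_lock(void) {
    return !atomic_flag_test_and_set(&tas_lock);
}

/* Busy-wait lock built from repeated test-and-set attempts. */
void tas_lock_acquire(void) {
    while (!tas_try_lock()) {
        /* spin until the holder clears the flag */
    }
}

void tas_unlock(void) {
    atomic_flag_clear(&tas_lock);
}

/* Single-threaded walk-through of the possible outcomes. */
int tas_walkthrough(void) {
    bool first  = tas_try_lock();  /* flag was clear: Set succeeds */
    bool second = tas_try_lock();  /* flag already set: Set fails */
    tas_unlock();
    bool third  = tas_try_lock();  /* clear again: Set succeeds */
    tas_unlock();
    assert(first && !second && third);
    return first && !second && third;
}
```

The failed attempt leaves the flag set and simply tells the caller to retry, which is the behavior the lock loop relies on.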

Multithreading on Multicore
• Basic idea: Processor resources are expensive and
should not be left idle
• Long latency to memory on a cache miss?
• Hardware switches threads to bring in other useful
work while waiting for cache miss
• Cost of thread context switch must be much less than
cache miss latency
• Put in redundant hardware so don’t have to save
context on every thread switch:
– PC, Registers?
• Attractive for apps with abundant TLP

Concluding
• Sequential software is slow software
– Multiprocessors only path to higher performance
• Multiprocessor (Multicore) uses Shared Memory
(single address space) for TLP
• Cache coherency keeps data coherent.
1. Snooping protocols
2. Directory based protocols
– False sharing a concern in cache coherence
• Synchronization via hardware primitives