
Thread-Level Parallelism and Synchronization Issues
Data Races and Synchronization
• Two memory accesses form a data race if they come from different threads, target the same location, at least one is a write, and they occur one after another (nothing guarantees which runs first)
• If there is a data race, the result of the program can vary by chance (which thread ran first?)
• Avoid data races by synchronizing writes and reads to get deterministic behavior
• Synchronization is done by user-level functions that rely on hardware synchronization instructions
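As a concrete illustration (a sketch, not from the slides): in this minimal C/pthreads program, two threads increment a shared counter with no synchronization, so the two read-modify-write sequences race and the final value varies from run to run.

  #include <pthread.h>
  #include <stdio.h>

  static long counter = 0;                /* shared location, no lock */

  static void *worker(void *arg) {
      for (int i = 0; i < 1000000; i++)
          counter++;                      /* read-modify-write: races with the other thread */
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld\n", counter); /* expected 2000000, but updates are lost */
      return 0;
  }

Compiled with cc -pthread, the printed value is usually well below 2000000 and differs across runs, which is exactly the non-deterministic behavior the slide describes.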
Lock and Unlock Synchronization
• Lock used to create a region (critical section) where only one thread can operate
• Given shared memory, use a memory location as the synchronization point: a lock or semaphore
• Thread reads the lock to see if it must wait, or if it is OK to go into the critical section (and then sets the lock to locked)
  – 0 => lock is free / open / unlocked / lock off
  – 1 => lock is set / closed / locked / lock on
[Diagram] Set the lock → critical section (only one thread gets to execute this section of code at a time, e.g., to change shared variables) → unset the lock
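This lock/unlock pattern is what a mutex provides. A minimal sketch using the POSIX pthreads API (an API choice of this example; the slide describes the pattern, not a particular library):

  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static long shared = 0;

  static void *worker(void *arg) {
      for (int i = 0; i < 1000000; i++) {
          pthread_mutex_lock(&lock);      /* set the lock (wait if already set) */
          shared++;                       /* critical section: one thread at a time */
          pthread_mutex_unlock(&lock);    /* unset the lock */
      }
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("shared = %ld\n", shared);   /* always 2000000: no lost updates */
      return 0;
  }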
Possible Lock/Unlock Implementation
• Lock (a.k.a. busy wait):

        addiu $t1,$zero,1      ; t1 = 1 means Locked
  Loop: lw    $t0,lock($s0)    ; load lock
        bne   $t0,$zero,Loop   ; loop if locked
  Lock: sw    $t1,lock($s0)    ; Unlocked, so lock

• Unlock:

        sw    $zero,lock($s0)
Possible Lock Problem
Time runs downward; the two threads interleave as follows:

  Thread 1                     Thread 2
  addiu $t1,$zero,1
  Loop: lw $t0,lock($s0)
                               addiu $t1,$zero,1
                               Loop: lw $t0,lock($s0)
  bne $t0,$zero,Loop
                               bne $t0,$zero,Loop
  Lock: sw $t1,lock($s0)
                               Lock: sw $t1,lock($s0)

Both threads read 0 from the lock, so both fall through the branch and both store 1: each thread thinks it has set the lock. Exclusive access is not guaranteed!
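The same failure expressed in C (a sketch, not from the slides): when the test and the set are separate ordinary loads and stores, both threads can pass the test before either performs the store.

  /* BROKEN: the test (load) and the set (store) are separate,
     non-atomic steps, just like the lw/bne/sw sequence above. */
  static volatile int lock = 0;

  void broken_acquire(void) {
      while (lock != 0)    /* lw + bne: both threads can read 0 here... */
          ;                /* busy wait */
      lock = 1;            /* sw: ...then both store 1 and both proceed */
  }

  void broken_release(void) {
      lock = 0;
  }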
Help! Hardware Synchronization
• Hardware support is required to prevent an interloper (a thread on another core, or another thread on the same core) from changing the value
  – Atomic read/write memory operation
  – No other access to the location is allowed between the read and the write
• Could be a single instruction
  – E.g., atomic swap of register ↔ memory (sketched below)
  – Or an atomic pair of instructions
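C11 exposes the atomic-swap idea as atomic_exchange; a lock built on it might look like the following sketch (the slide describes the hardware primitive, not this particular API):

  #include <stdatomic.h>

  static atomic_int lock = 0;

  void swap_acquire(atomic_int *l) {      /* e.g., swap_acquire(&lock); */
      /* Atomically swap 1 into the lock and get the old value back;
         no other access to the location can occur between the read
         and the write. Old value 0 means the lock was free and is
         now ours; 1 means someone else holds it, so try again. */
      while (atomic_exchange(l, 1) != 0)
          ;                               /* busy wait */
  }

  void swap_release(atomic_int *l) {
      atomic_store(l, 0);                 /* unset the lock */
  }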
Test-and-Set
• In a single atomic operation:
  – Test to see if a memory location is set (contains a 1)
  – Set it (to 1) if it isn't (it contained a zero when tested)
  – Otherwise indicate that the set failed, so the program can try again
  – No other instruction can modify the memory location, including another test-and-set instruction
• Useful for implementing lock operations (sketched below)
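C11 provides test-and-set directly as atomic_flag_test_and_set. A minimal spin lock built on it, as a sketch assuming C11 atomics are available:

  #include <stdatomic.h>

  static atomic_flag lock = ATOMIC_FLAG_INIT;   /* starts unset (0) */

  void tas_acquire(atomic_flag *l) {
      /* Atomically sets the flag to 1 and returns its previous value:
         0 (false) means the set succeeded and we hold the lock;
         1 (true) means it was already set, so the set failed: retry. */
      while (atomic_flag_test_and_set(l))
          ;                                     /* busy wait */
  }

  void tas_release(atomic_flag *l) {
      atomic_flag_clear(l);                     /* unset the lock */
  }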
Multithreading on Multicore
• Basic idea: processor resources are expensive and should not be left idle
• Long latency to memory on a cache miss?
• Hardware switches threads to bring in other useful work while waiting for the cache miss
• Cost of a thread context switch must be much less than the cache-miss latency
• Put in redundant hardware so context doesn't have to be saved on every thread switch:
  – PC, registers
• Attractive for apps with abundant TLP
Concluding
• Sequential software is slow software
  – Multiprocessors are the only path to higher performance
• A multiprocessor (multicore) uses shared memory (a single address space) for TLP
• Cache coherency keeps data coherent:
  1. Snooping protocols
  2. Directory-based protocols
  – False sharing is a concern in cache coherence
• Synchronization via hardware primitives
