Hardware Multithreading
MULTITHREADING
JAHANGIR ABBAS 15091519-091
Coarse-grain multithreading
Fine-grain multithreading
Simultaneous Multi-Threading
Coarse-grain Multithreading
• Threads are switched upon ‘expensive’ operations
• A single thread runs until a costly stall
– E.g. 2nd level cache miss
• Another thread starts while the first is stalled
– Refilling the pipeline after a switch takes several cycles!
• Does not hide short stalls, since the refill cost outweighs the stall
• Less likely to slow down an individual thread, since it is switched out only when it would stall anyway
• Needs hardware support
– PC and register file for each thread
– Little other hardware
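The coarse-grain policy can be sketched as a small cycle-level model. This is an illustration under stated assumptions, not how any particular processor works: the latencies MISS_LATENCY and SWITCH_PENALTY, the op names "alu" and "miss", and the names Thread and run_coarse are all hypothetical. A thread keeps the pipeline until it misses; the core then pays a refill penalty to switch, and idles if no other thread is ready.

```python
from dataclasses import dataclass

MISS_LATENCY = 10   # hypothetical latency of a 2nd-level cache miss, in cycles
SWITCH_PENALTY = 3  # hypothetical cost of refilling the pipeline after a switch


@dataclass
class Thread:
    name: str
    ops: list          # each op is "alu" (one cycle) or "miss" (costly stall)
    pc: int = 0        # per-thread program counter (replicated state)
    ready_at: int = 0  # first cycle in which the thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.ops)

    def runnable(self, cycle: int) -> bool:
        return not self.done() and self.ready_at <= cycle


def run_coarse(threads):
    """Return (total_cycles, trace) for coarse-grain multithreading.

    The current thread keeps the pipeline until it cannot issue (it is
    waiting on a miss, or it has finished). Only then does the core switch
    to another runnable thread, spending SWITCH_PENALTY cycles on the refill.
    """
    cycle, cur, trace = 0, 0, []
    while not all(t.done() for t in threads):
        t = threads[cur]
        if t.runnable(cycle):
            op = t.ops[t.pc]
            t.pc += 1
            trace.append(t.name)
            cycle += 1
            if op == "miss":
                t.ready_at = cycle + MISS_LATENCY  # wait for the data
            continue
        others = [i for i, o in enumerate(threads) if i != cur and o.runnable(cycle)]
        if others:
            cur = others[0]
            trace.extend(["switch"] * SWITCH_PENALTY)
            cycle += SWITCH_PENALTY
        else:
            trace.append("idle")  # no thread can issue: this stall is exposed
            cycle += 1
    return cycle, trace


a = Thread("A", ["alu", "alu", "miss", "alu"])
b = Thread("B", ["alu"] * 5)
cycles, trace = run_coarse([a, b])
print(cycles, trace)
```

Here the trace is three A cycles, three switch cycles, five B cycles, two idle cycles (the part of A's miss that B could not cover) and a second switch back to A, 17 cycles in total.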
Fine-grain Multithreading
• Threads are switched every single cycle among the ‘ready’
threads
• Two or more threads interleave instructions
– Round-robin fashion
– Skip stalled threads
• Needs hardware support
– Separate PC and register file for each thread
– Hardware to control the alternating pattern
• Naturally hides delays
– Data hazards, cache misses
– Pipeline runs with rare stalls
• Does not make full use of a multi-issue architecture: only one thread issues in any given cycle
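The thread-selection step of fine-grain multithreading (round-robin, skipping stalled threads) can be modelled as below. A minimal sketch: which threads are stalled in each cycle is given as illustrative input rather than derived from real hazards, and fine_grain_schedule is a hypothetical name.

```python
def fine_grain_schedule(stalled, n_threads):
    """Choose the thread that issues in each cycle.

    stalled[c] is the set of thread IDs that cannot issue in cycle c.
    Selection rotates round-robin, starting after the last thread that
    issued, and skips stalled threads. None means every thread was stalled.
    """
    schedule = []
    last = n_threads - 1  # so that thread 0 is tried first in cycle 0
    for blocked in stalled:
        choice = None
        for k in range(1, n_threads + 1):
            tid = (last + k) % n_threads
            if tid not in blocked:
                choice = tid
                break
        if choice is not None:
            last = choice
        schedule.append(choice)
    return schedule


# Three threads; thread 2 stalls in cycle 2, thread 0 in cycle 3,
# and all three are stalled in cycle 4.
stalls = [set(), set(), {2}, {0}, {0, 1, 2}, set()]
print(fine_grain_schedule(stalls, 3))
```

For this stall pattern the issuing threads are 0, 1, 0, 1, then none (all stalled), then 2: a stalled thread simply loses its turn instead of stalling the pipeline.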
Simultaneous Multi-Threading
• The main idea is to exploit instruction-level
parallelism (ILP) and thread-level parallelism
(TLP) at the same time
• In a superscalar processor, issue instructions from
different threads in the same cycle
– Schedule as many ‘ready’ instructions as possible
– Operand reading and result saving become
much more complex
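The advantage on a multi-issue machine can be made concrete by counting issue slots. In this sketch (an illustration, not a performance model), each thread offers some number of independent ready instructions per cycle, any that do not issue are simply dropped rather than carried over, and the counts in ready are made up for the example. fine_grain_slots and smt_slots are hypothetical names.

```python
def fine_grain_slots(ready, width):
    """Issue slots used when only one thread may issue per cycle."""
    n = len(ready[0])
    used, last = 0, n - 1
    for counts in ready:
        for k in range(1, n + 1):
            tid = (last + k) % n  # round-robin among threads with work
            if counts[tid] > 0:
                used += min(counts[tid], width)
                last = tid
                break
    return used


def smt_slots(ready, width):
    """Issue slots used when threads may share the slots of one cycle."""
    return sum(min(sum(counts), width) for counts in ready)


# ready[c][t]: independent ready instructions thread t offers in cycle c.
ready = [(1, 2), (0, 3), (2, 1), (1, 0)]
WIDTH = 4
total = WIDTH * len(ready)
print(fine_grain_slots(ready, WIDTH), "of", total)  # one thread per cycle
print(smt_slots(ready, WIDTH), "of", total)         # slots shared by threads
```

With a 4-wide issue stage, one-thread-per-cycle issue fills 7 of 16 slots while SMT fills 10, because SMT can combine both threads' instructions in the same cycle.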
Simultaneous Multi-Threading
• Let’s look only at instruction issue:
[Figure: pipeline diagram over cycles 1 to 10 for instructions a, b, M, N, c, P, Q, d, e, R, each passing through IF, ID, EX, MEM, WB]
We want to run these two threads (time runs down the rows):
Thread A   Thread B   SMT (issue as many ready instrs. as possible)
1          a          1 a
2          b          2 b
ICM        c          c d
ICM        d          e f
3          e          3 4
4          f          5 6
5          ICM        … …
6          ICM
…          …
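The SMT column of the two-thread example can be reproduced with a small issue model. Assumptions not stated on the slide: the machine has 2 issue slots per cycle, each ICM entry (read here as an instruction-cache miss) blocks its thread for exactly one cycle, and slots are shared by first giving each ready thread one slot and then filling the rest. smt_issue and STALL are hypothetical names.

```python
from collections import deque

STALL = "ICM"  # one cycle in which the thread cannot issue


def smt_issue(threads, width):
    """Cycle-by-cycle SMT issue schedule.

    Each thread is a list of instruction names and STALL markers; one STALL
    marker blocks that thread for one cycle.
    """
    queues = [deque(t) for t in threads]
    schedule = []
    while any(queues):
        # A thread whose next entry is a stall spends this whole cycle stalled.
        ready = []
        for i, q in enumerate(queues):
            if q and q[0] == STALL:
                q.popleft()
            else:
                ready.append(i)
        issued = []
        # First pass: at most one instruction from each ready thread.
        for i in ready:
            q = queues[i]
            if len(issued) < width and q and q[0] != STALL:
                issued.append(q.popleft())
        # Second pass: fill any remaining slots from the ready threads.
        for i in ready:
            q = queues[i]
            while len(issued) < width and q and q[0] != STALL:
                issued.append(q.popleft())
        schedule.append(issued)
    return schedule


thread_a = ["1", "2", STALL, STALL, "3", "4", "5", "6"]
thread_b = ["a", "b", "c", "d", "e", "f", STALL, STALL]
for cycle, slots in enumerate(smt_issue([thread_a, thread_b], width=2), 1):
    print(cycle, " ".join(slots))
```

The printed schedule is 1 a, 2 b, c d, e f, 3 4, 5 6: while one thread waits on its miss, the other thread's instructions take both slots.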
SMT ISSUES WITH IN-ORDER PROCESSORS
• Asymmetric pipeline stall
• One part of the pipeline stalls; we want the other pipeline to
continue
• Overtaking – non-stalled threads should progress
• What happens if a ready thread
SMT issues with in-order processors
• Cache misses: abort the missing instruction (and the instructions
in its shadow, for a D-cache miss)
• Most existing implementations are for out-of-order,
register-renamed architectures (akin to
Tomasulo's algorithm)
– e.g. PowerPC, Intel Hyper-Threading
SIMULTANEOUS MULTI-THREADING
• Extracts the most parallelism from instructions and threads
• Implemented mostly in out-of-order processors because they
are the only ones able to exploit that much parallelism
• Has a significant hardware overhead
• Replicate (and MUX) thread state (registers, TLBs, etc.)
• Operand reading and result saving increase datapath
complexity
• Per-thread instruction handling/scheduling engine in out-of-order
implementations
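A minimal sketch of what "replicate (and MUX) thread state" means: every hardware thread gets its own PC and register file, while the datapath (here a single adder) is shared, and the thread ID selects which copy is read or written. It models only a PC and registers (no TLBs) and ignores register renaming; ThreadContext, ToySMTCore, NUM_REGS and the 4-byte instruction size are hypothetical.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # hypothetical number of architectural registers


@dataclass
class ThreadContext:
    """State replicated once per hardware thread."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)


class ToySMTCore:
    """Replicated thread contexts in front of one shared adder.

    The thread ID plays the role of the MUX select signal: it picks which
    copy of the PC and register file the shared datapath reads and writes.
    """

    def __init__(self, n_threads: int):
        self.contexts = [ThreadContext() for _ in range(n_threads)]

    def read(self, tid: int, r: int) -> int:
        return self.contexts[tid].regs[r]

    def write(self, tid: int, r: int, value: int) -> None:
        self.contexts[tid].regs[r] = value

    def add(self, tid: int, rd: int, rs: int, rt: int) -> None:
        """Execute rd <- rs + rt for thread tid, then advance its PC."""
        self.write(tid, rd, self.read(tid, rs) + self.read(tid, rt))
        self.contexts[tid].pc += 4  # assumes 4-byte instructions


core = ToySMTCore(2)
core.write(0, 1, 5)
core.write(1, 1, 7)
core.add(0, 2, 1, 1)  # thread 0: r2 = 5 + 5
core.add(1, 2, 1, 1)  # thread 1: r2 = 7 + 7, held in a separate copy of r2
print(core.read(0, 2), core.read(1, 2))
```

The same register name r2 holds 10 for thread 0 and 14 for thread 1; this duplication of state is the hardware overhead listed above.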
BENEFITS OF HW MT
• Multithreading techniques improve the utilisation of
processor resources and, hence, the overall performance
• If the different threads access the same input data,
they may use the same regions of memory
• Cache efficiency improves in these cases, since data brought
into the cache by one thread can be reused by another
DISADVANTAGES OF HW MT