Simultaneous Multi-Threaded Design
Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: [email protected]

EE-739: Processor Design

Lecture 35 (11 April 2013) CADSL
Simultaneous Multi-threading ...
[Figure: functional-unit occupancy per cycle (cycles 1 to 9) for 8 units (M, M, FX, FX, FP, FP, BR, CC). Left panel: one thread, 8 units. Right panel: two threads, 8 units.]
M = Load/Store, FX = Fixed Point, FP = Floating Point, BR = Branch, CC = Condition Codes

11 Apr 2013 EE-739@IITB 2 CADSL

Multithreaded Categories
[Figure: issue slots over time (processor cycles) for five organizations: Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, Simultaneous Multithreading. Legend: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, idle slot.]


Design Challenges in SMT
• Since SMT makes sense only with a fine-grained implementation, what is the impact of fine-grained scheduling on single-thread performance?
– Does a preferred-thread approach sacrifice neither throughput nor single-thread performance?
– Unfortunately, with a preferred thread, the processor is likely to sacrifice some throughput when the preferred thread stalls
• A larger register file is needed to hold multiple contexts
• Not affecting clock cycle time, especially in
– Instruction issue: more candidate instructions need to be considered
– Instruction completion: choosing which instructions to commit may be challenging
• Ensuring that cache and TLB conflicts generated by SMT do not degrade performance


Basic Out-of-order Pipeline


SMT Pipeline


Simultaneous Multithreading


Simultaneous Multithreading (SMT)
• Simultaneous multithreading (SMT): the insight that a dynamically scheduled processor already has many HW mechanisms to support multithreading
– A large set of virtual registers that can be used to hold the register sets of independent threads
– Register renaming provides unique register identifiers, so instructions from multiple threads can be mixed in the datapath without confusing sources and destinations across threads
– Out-of-order completion allows the threads to execute out of order and get better utilization of the HW
• SMT needs just a per-thread renaming table and separate PCs
– Independent commitment can be supported by logically keeping a separate reorder buffer for each thread
Source: Microprocessor Report, December 6, 1999, "Compaq Chooses SMT for Alpha"
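
The renaming idea on this slide can be sketched in a few lines of Python. This is an illustrative model only, not the design of any real processor: the class name `SmtRenamer`, its fields, and the simple free-list policy are assumptions made for the example, and register reclamation at commit is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SmtRenamer:
    """Per-thread rename tables over one shared pool of physical registers."""
    num_threads: int
    num_arch_regs: int
    num_phys_regs: int
    free_list: list = field(default_factory=list)
    tables: list = field(default_factory=list)

    def __post_init__(self):
        # Every architectural register of every thread gets an initial mapping,
        # so the pool must be larger than threads * architectural registers.
        needed = self.num_threads * self.num_arch_regs
        if self.num_phys_regs <= needed:
            raise ValueError("need spare physical registers for renaming")
        regs = iter(range(self.num_phys_regs))
        self.tables = [
            [next(regs) for _ in range(self.num_arch_regs)]
            for _ in range(self.num_threads)
        ]
        self.free_list = list(regs)  # whatever is left is available for renaming

    def rename(self, thread, dest, srcs):
        """Return (physical dest, physical sources) for one instruction."""
        table = self.tables[thread]
        phys_srcs = [table[s] for s in srcs]  # read current mappings first
        phys_dest = self.free_list.pop(0)     # fresh register for the result
        table[dest] = phys_dest
        return phys_dest, phys_srcs


renamer = SmtRenamer(num_threads=2, num_arch_regs=4, num_phys_regs=12)
# Both threads write architectural register 1 from registers 2 and 3,
# yet the physical names never collide.
print(renamer.rename(0, dest=1, srcs=[2, 3]))
print(renamer.rename(1, dest=1, srcs=[2, 3]))
```

Because each thread looks up its own table, the same architectural register number in two threads resolves to different physical registers, which is what lets their instructions share the datapath.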


SMT Architecture
• Straightforward extension to a conventional superscalar design:
– multiple program counters and some mechanism by which the fetch unit selects one each cycle,
– a separate return stack for each thread for predicting subroutine return destinations,
– per-thread instruction retirement, instruction queue flush, and trap mechanisms,
– a thread id with each branch target buffer entry to avoid predicting phantom branches, and
– a larger register file, to support logical registers for all threads plus additional registers for register renaming.
• The size of the register file affects the pipeline and the scheduling of load-dependent instructions.
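
To illustrate why each branch target buffer entry carries a thread id, here is a minimal sketch of a direct-mapped BTB. The class `TaggedBtb`, its direct-mapped indexing, and the full-PC tag are assumptions chosen for brevity; real BTB organizations differ.

```python
class TaggedBtb:
    """Direct-mapped branch target buffer whose entries carry a thread id."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # each entry: (thread, pc, target)

    def update(self, thread, pc, target):
        """Record a taken branch of `thread` at `pc` going to `target`."""
        self.entries[pc % self.num_entries] = (thread, pc, target)

    def predict(self, thread, pc):
        """Return the predicted target, or None on a miss."""
        entry = self.entries[pc % self.num_entries]
        if entry is None:
            return None
        owner, tag, target = entry
        # Both the pc and the thread id must match. Without the thread check,
        # another thread fetching the same address would see a phantom branch.
        if owner == thread and tag == pc:
            return target
        return None


btb = TaggedBtb(num_entries=16)
btb.update(thread=0, pc=0x40, target=0x100)
print(btb.predict(thread=0, pc=0x40))  # hit for the thread that owns the entry
print(btb.predict(thread=1, pc=0x40))  # miss for a different thread
```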


SMT Performance
Tullsen ‘96


Implementing SMT
Most hardware on current out-of-order processors can be used as is
Out-of-order renaming & instruction scheduling mechanisms
• physical register pool model
• renaming hardware eliminates false dependences both within a thread (just like a superscalar) & between threads
• map thread-specific architectural registers onto a pool of thread-independent physical registers
• operands are thereafter called by their physical names
• an instruction is issued when its operands become available & a functional unit is free
• the instruction scheduler does not consider thread IDs when dispatching instructions to functional units (unless threads have different priorities)
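
The issue rule above can be sketched as follows. The `Entry` record, the oldest-first selection order, and the function name `select_for_issue` are assumptions for illustration; the point is only that the selection loop never inspects the thread field.

```python
from collections import namedtuple

# Hypothetical instruction-queue entry: operands are physical register names,
# and `unit` is the class of functional unit the instruction needs.
Entry = namedtuple("Entry", "thread op unit srcs dest")


def select_for_issue(queue, ready_regs, free_units):
    """Pick instructions, oldest first, whose operands and unit are available.

    queue:      list of Entry, oldest first
    ready_regs: set of physical registers whose values are available
    free_units: dict mapping unit class to the number free this cycle
    """
    remaining = dict(free_units)  # do not mutate the caller's dict
    issued = []
    for entry in queue:
        operands_ready = all(src in ready_regs for src in entry.srcs)
        # The thread id is never consulted: after renaming, physical
        # register names alone identify the dependences.
        if operands_ready and remaining.get(entry.unit, 0) > 0:
            remaining[entry.unit] -= 1
            issued.append(entry)
    return issued


queue = [
    Entry(0, "add", "FX", (1, 2), 8),
    Entry(1, "ld", "M", (5,), 9),
    Entry(0, "mul", "FX", (8,), 10),   # waits for physical register 8
    Entry(1, "sub", "FX", (6, 7), 11),
]
issued = select_for_issue(queue, ready_regs={1, 2, 5, 6, 7},
                          free_units={"FX": 2, "M": 1})
print([e.op for e in issued])
```

Supporting thread priorities would mean reordering `queue` before the loop, which is the one case where the scheduler would need to look at thread ids.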
From Superscalar to SMT
Extra pipeline stages for accessing thread-shared register files
• 8 threads * 32 registers + renaming registers

SMT instruction fetcher (ICOUNT)
• fetch from 2 threads each cycle
• count the number of instructions for each thread in the pre-execution stages
• pick the 2 threads with the lowest number
• in essence fetching from the two highest throughput threads
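
The ICOUNT selection step can be sketched as a small function. The name `icount_select` and the tie-break by lowest thread id are assumptions for the example; the slide specifies only that the two threads with the fewest pre-execution instructions are chosen.

```python
def icount_select(pre_issue_counts, num_fetch_threads=2):
    """Return the ids of the threads to fetch from this cycle.

    pre_issue_counts[t] is the number of instructions of thread t that are
    currently in the pre-execution stages of the pipeline.
    """
    # Sort thread ids by their count; ties go to the lower thread id
    # (an assumed policy, chosen only to make the result deterministic).
    by_count = sorted(range(len(pre_issue_counts)),
                      key=lambda t: (pre_issue_counts[t], t))
    return by_count[:num_fetch_threads]


# Threads 1 and 3 have the fewest instructions waiting, so they are fetched.
print(icount_select([12, 3, 7, 3]))
```

A thread with few instructions waiting before execution is one whose instructions are moving through the machine quickly, which is why the policy in effect favors the highest-throughput threads.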


From Superscalar to SMT
Per-thread hardware
• small stuff
• all part of current out-of-order processors
• none endangers the cycle time
• other per-thread processor state, e.g.,
– program counters
– return stacks
– thread identifiers, e.g., with BTB entries, TLB entries
• per-thread bookkeeping for
– instruction retirement
– trapping
– instruction queue flush
This is why there is only a 10% increase to Alpha 21464 chip area.
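
As a rough picture of how little state is replicated, the per-thread items listed above fit in one small record. The `ThreadContext` fields and the `flush_thread` helper are hypothetical names for this sketch, and the queue is modeled as a list of `(thread_id, op)` pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """State that is replicated per hardware thread in this sketch."""
    thread_id: int
    pc: int = 0
    return_stack: list = field(default_factory=list)
    retired: int = 0            # per-thread retirement bookkeeping
    pending_trap: bool = False  # per-thread trap bookkeeping


def flush_thread(instruction_queue, thread_id):
    """Drop only one thread's entries from a shared instruction queue."""
    return [entry for entry in instruction_queue if entry[0] != thread_id]


contexts = [ThreadContext(thread_id=t) for t in range(2)]
shared_queue = [(0, "add"), (1, "ld"), (0, "br")]
# A flush for thread 0 leaves thread 1's instruction in place.
print(flush_thread(shared_queue, thread_id=0))
```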


Implementing SMT
Thread-shared hardware:
• fetch buffers
• branch prediction structures
• instruction queues
• functional units
• active list
• all caches & TLBs
• MSHRs
• store buffers
This is why there is little single-thread performance degradation (~1.5%).


Thank You
