
Simultaneous Multithreading

Pratyusa Manadhata, Vyas Sekar

{pratyus,vyass}@cs.cmu.edu

1 Introduction

Current research in processor technology and computer architecture is motivated primarily by the need for greater performance. In this context, it is well understood that the performance gain from improving the memory system alone is limited, and that system-level integration (such as supporting graphics/sound on chip) can only lead to marginal performance benefits. The most significant gain can be achieved by increasing parallelism in execution.

There exist two kinds of parallelism in typical programming workloads: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). Modern superscalar architectures are designed to capture ILP in programs, while multithreaded and multiprocessor systems are designed to capture TLP, that is, parallelism across threads/processes.

The better solution then would be to exploit both ILP and TLP: TLP from either multithreaded parallel programs or a multiprogramming workload, and ILP from each thread.

Neither a superscalar nor a multiprocessor (MP) can capture ILP and TLP in its entirety, and both are inherently incapable of adapting to dynamic levels of ILP and TLP. This is the primary motivation for a new processor architecture called Simultaneous Multithreading (SMT).

2 SMT

In this section we identify some of the key characteristics of an SMT architecture and some of the design requirements that can facilitate the implementation of an SMT over a conventional superscalar architecture. The characteristics of SMT processors are

1. inherited from superscalar: issue multiple instructions per cycle
2. inherited from multithreaded: maintain hardware state for multiple threads

In Fig 1 we can see that there is a significant amount of wastage of issue slots in the superscalar and the multithreaded system. There are essentially two kinds of waste: vertical waste (an entire cycle is unused) and horizontal waste (some issue slots within a cycle are unused). Superscalar processors look at multiple instructions from the same process, and have both horizontal waste (as a result of insufficient ILP) and vertical waste (due to data dependencies and long latency operations). The MT system minimizes vertical waste as it can look at multiple threads to fetch from in each cycle, and thus it can tolerate long latency operations within each thread.

Figure 1: Horizontal and Vertical Wastage

3 SMT Model

1. Consider a superscalar that fetches 8 instructions per cycle from the instruction cache (IC).

2. SMT hardware modifications required over a conventional superscalar:
   1. state for the hardware contexts of the threads
   2. per-thread exception/retirement mechanisms

3. Most hardware resources remain available to a single thread, unlike under static resource allocation. This implies that a non-parallelizable program will still run efficiently on an SMT.

4. Fetch mechanism:
   a. 2.8 scheme: select 2 threads and fetch up to 8 instructions from each (cf. the 2.4 scheme); out of these, choose a subset that matches the hardware decode bandwidth.
   b. Hardware cost: an additional port on the IC (2.8 performs better than 2.4).
   c. icount technique: when selecting threads, give higher priority to those that have the fewest instructions in the decode, rename and queue pipeline stages. This distributes instructions evenly across threads, prevents starvation, etc. Other options are MISSCOUNT, BRCOUNT, etc.

5. Caveat: the hardware register file is larger, so a register access takes two clock cycles, which requires 2-cycle register read and write stages.

3.1 SMT Disadvantages

• There is greater register pressure and greater per-thread latency due to the longer pipeline.

• On a multiprogrammed workload there is greater stress on shared structures such as the branch prediction buffer (BPB), cache, TLB, etc.

• A parallel workload tends to stress the functional units more.
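
The icount fetch policy and the 2.8 fetch scheme listed under the fetch mechanism in Section 3 can be illustrated with a small model. This is only a sketch under stated assumptions: the names ThreadState, icount_select and fetch_cycle are hypothetical, the shared fetch bandwidth of 8 mirrors the 8-instruction superscalar baseline, and letting the higher-priority thread fill the bandwidth first is an assumed tie-in between the two ideas, not a detail given in the text.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Front-end occupancy of one hardware context (hypothetical model)."""
    tid: int
    in_flight: int   # instructions currently in decode, rename and the queues
    fetchable: int   # instructions this thread could supply this cycle


def icount_select(threads, num_threads=2):
    """Return the threads with the fewest in-flight front-end instructions.

    Ties are broken by thread id (an assumption made for determinism).
    """
    ranked = sorted(threads, key=lambda t: (t.in_flight, t.tid))
    return ranked[:num_threads]


def fetch_cycle(threads, num_threads=2, per_thread=8, bandwidth=8):
    """Model one fetch cycle of a 2.8-style scheme with icount priority.

    Up to `per_thread` instructions are taken from each selected thread,
    and the total is trimmed to the shared `bandwidth`. Returns a dict
    mapping thread id to the number of instructions fetched.
    """
    fetched = {}
    remaining = bandwidth
    for t in icount_select(threads, num_threads):
        n = min(t.fetchable, per_thread, remaining)
        fetched[t.tid] = n
        t.in_flight += n   # fetched instructions now occupy the front end
        remaining -= n
    return fetched


if __name__ == "__main__":
    contexts = [ThreadState(0, 12, 8), ThreadState(1, 3, 5), ThreadState(2, 7, 8)]
    print(fetch_cycle(contexts))   # {1: 5, 2: 3}
```

Thread 0 is skipped in this cycle because it already has the most instructions in the front end. As the other threads fill up, its relative priority rises, which is how a policy of this kind evens out the distribution and avoids starvation.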

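
The distinction between vertical and horizontal waste from Section 2 can be made concrete by counting unused issue slots over a trace of cycles. The sketch below is illustrative only: the function name slot_waste, the 4-wide issue width and the sample trace are assumptions, not data from the papers discussed.

```python
def slot_waste(issued_per_cycle, issue_width):
    """Split unused issue slots into vertical and horizontal waste.

    issued_per_cycle: number of instructions issued in each cycle.
    issue_width: maximum number of instructions that can issue per cycle.
    """
    vertical = 0
    horizontal = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= issue_width:
            raise ValueError("issued count must lie in [0, issue_width]")
        if issued == 0:
            vertical += issue_width            # the entire cycle is unused
        else:
            horizontal += issue_width - issued  # only part of the cycle is unused
    cycles = len(issued_per_cycle)
    used = sum(issued_per_cycle)
    return {
        "vertical": vertical,
        "horizontal": horizontal,
        "used": used,
        "ipc": used / cycles if cycles else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 6-cycle trace on a 4-issue machine.
    stats = slot_waste([3, 0, 1, 4, 0, 2], issue_width=4)
    print(stats["vertical"], stats["horizontal"], stats["used"])   # 8 6 10
```

Used slots plus both kinds of waste always add up to the issue width times the number of cycles, so reducing either kind of waste translates directly into a higher IPC.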
4 Results and Observations

It has been observed that superscalars give an IPC of approximately 1-2, whereas the results shown indicate that SMT can reach an IPC of up to 6.7 (for an 8-issue architecture). Even though the SMT pipeline is longer, implying a longer latency for a single thread, this is observed not to have a significant effect on performance. The reason performance does not degrade in the presence of conflicts and a longer pipeline is essentially the system's ability to absorb additional conflicts, i.e., to hide latency by issuing instructions from multiple threads. The multiprocessor architectures MP2 and MP4 were observed to be hindered by static resource partitioning, while SMT dynamically partitions resources among threads. A comparison between MP2 and MP4 also shows that MP2 can better adapt to ILP, while MP4 is better suited for utilizing TLP. This is quite intuitive, as there are more functional units per processor in the MP2, while there are more parallel units in the MP4. SMT can also lead to increased cache misses/conflicts and greater stress on the branching hardware. However, the impact on overall program performance is not significant, as SMT, efficient hardware design, and compiler optimizations can hide much of the latency and many of the conflicts. The key insight is that SMT achieves a better performance gain than superscalar, multithreaded, and multiprocessor architectures because it can ignore the distinction between ILP and TLP, which implies that resources are not statically partitioned.

4.1 Discussion of Issues in SMTs

• Cost vs Performance: It is necessary to determine which architecture makes the best use of the chip area and provides enhanced performance with minimal hardware overhead.

• Quantitative Comparisons: It is difficult to quantify in absolute terms the performance gain that an SMT processor can deliver. Often this depends heavily on the design cycle time, the actual hardware implementation, etc., which are hard to predict given the technology trends.

• Compilers: One of the earlier claims was that SMT is easier for compilers and programmers, as the hardware can dynamically repartition resources. But the general feeling is that, in order to guarantee performance no worse than the competing architectures and to ensure maximum processor utilization, one does need compiler support for identifying sources of parallelism and for help with static scheduling.

• OS: It is important to consider OS issues such as thread scheduling, thread priority, etc. that will be necessary in a realistic implementation of an SMT. The interaction between thread priority and the fetch/issue logic is an interesting issue.
• Another observation is that, more than the static partitioning of resources in multiprocessors, communication overhead is a significant reason why SMTs perform better than MPs.

• The question also arises whether SMT needs a branch prediction mechanism at all. The answer is yes, which is again consistent with the design philosophy that a non-parallelizable program still needs to achieve good performance.

• Is the performance gain adequate given the additional resource cost? It has been shown that an SMT outperforms an equally resource-equipped multiprocessor running at its maximum number of supported threads, which shows that the SMT achieves the better resource utilization.

What does the future hold for SMTs? Each processor in an SMP can use SMT. This is a direct extension of the SMP and SMT architectures that can create small to massively parallel systems in which each processor employs SMT to minimize execution time. It has been observed that next-generation architectures will be driven by design considerations that make the best use of power and chip area, which would mean that on-chip multiprocessing (MP, MT or SMT) is more efficient than a wider superscalar.

An interesting observation is that even though the research on SMT was done in the mid-to-late 90s, the actual commercial implementation of SMT on a processor has been delayed until now (the Intel "Hyperthreading" Pentium). This shows that chip design in reality is far more complex, and that other economic factors come into play.

References

[1] Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. Simultaneous Multithreading: A Platform for Next-generation Processors, in IEEE Micro, September/October 1997, pages 12-18.

[2] Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, and Dean Tullsen. Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading, in ACM Transactions on Computer Systems, August 1997, pages 322-354.

[3] Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo, and Rebecca Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, in Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996, pages 191-202.
