Comp Architecture 101
Who am I?
• 7 years at Intel, 17 years in industry
• Managing compiler teams (GCC, Go)
• 10 years teaching
Why are we here?
• To better understand how a CPU works
Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved. 2
*Other names and brands may be claimed as the property of others.
Textbooks and References
Lecture Outline
• Pipeline
• Memory Hierarchy (Caches: +1 lecture later)
• Out-of-order execution
• Branch prediction
• Real example: Haswell Microarchitecture
Layers of Abstraction
• Software: Application, Algorithms, Programming Languages, Operating Systems/Libraries
• Interface between HW and SW: Instruction Set Architecture
• Hardware: Microarchitecture, Gates/Register-Transfer Level (RTL), Circuits, Physics
Basic CPU Actions
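• Stages: F (fetch), D (decode), E (execute), M (memory access), W (write-back)
[Figure: timeline of the five stages; time axis marked at 4 ns and 8 ns]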
Non-Pipelined Processing
Pipeline vs Non-Pipeline
Pipeline Limitations
• Various types of hazards:
• read after write (RAW), a true dependency
• write after read (WAR), an anti-dependency
• write after write (WAW), an output dependency
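The three hazard types can be told apart by comparing the registers that two instructions read and write. A minimal sketch, assuming a hypothetical `Instr` record with one destination register and a tuple of source registers (these names are illustrative and not from the lecture):

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    dst: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the hazards that `second` has with respect to the earlier `first`."""
    found = set()
    if first.dst in second.srcs:
        found.add("RAW")  # second reads what first writes: true dependency
    if second.dst in first.srcs:
        found.add("WAR")  # second overwrites what first reads: anti-dependency
    if second.dst == first.dst:
        found.add("WAW")  # both write the same register: output dependency
    return found


add = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
sub = Instr("r4", ("r1", "r5"))  # r4 = r1 - r5, reads r1 after add writes it
mov = Instr("r2", ("r6",))       # r2 = r6, overwrites a source of add

print(hazards(add, sub))  # only the RAW hazard applies
print(hazards(add, mov))  # only the WAR hazard applies
```

Only RAW reflects a real flow of data; WAR and WAW arise from reusing a register name.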
Motivation for Memory Hierarchy
Memory Tradeoffs
Superscalar: Wide Pipeline
Is Superscalar Good Enough?
• Theoretically can execute multiple instructions in parallel
• Wide pipeline => more performance
• But…
• Only independent subsequent instructions can be executed in parallel
• Whereas subsequent instructions are often dependent
• So the utilization of the second pipe is often low
• Solution: out-of-order execution
• Execute instructions based on the “data flow” graph, rather than
program order
• Still need to preserve the appearance of in-order execution
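The gap between in-order superscalar issue and data-flow order can be seen on a toy program. This is a rough sketch under simplifying assumptions that are not from the lecture: every instruction takes one cycle, only true (RAW) dependencies matter, and the data-flow model has unlimited width. The function names and the `(dst, srcs)` tuple format are hypothetical.

```python
def in_order_schedule(program, width=2):
    """Issue cycle per instruction for an in-order machine of the given width."""
    ready = {}            # register -> first cycle its value can be read
    cycles = []
    cycle, issued = 0, 0
    for dst, srcs in program:
        if issued == width:                    # this cycle's issue slots are full
            cycle, issued = cycle + 1, 0
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        if earliest > cycle:                   # stall until the operands are ready
            cycle, issued = earliest, 0
        cycles.append(cycle)
        issued += 1
        ready[dst] = cycle + 1                 # unit latency (assumption)
    return cycles


def dataflow_schedule(program):
    """Issue cycle per instruction when only true dependencies constrain order."""
    ready = {}
    cycles = []
    for dst, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        cycles.append(start)
        ready[dst] = start + 1
    return cycles


# Two independent two-instruction chains, interleaved badly for in-order issue.
program = [
    ("r1", ("r0",)),  # chain A
    ("r2", ("r1",)),  # chain A, depends on the previous instruction
    ("r3", ("r0",)),  # chain B
    ("r4", ("r3",)),  # chain B, depends on the previous instruction
]
print(in_order_schedule(program))  # [0, 1, 1, 2]
print(dataflow_schedule(program))  # [0, 1, 0, 1]
```

In this example the in-order machine needs three cycles because the second instruction blocks the independent third one, while data-flow order starts both chains in cycle 0 and finishes in two.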
Data Flow Analysis
Instruction “Grinder”
• Technology later allowed building wide HW, but the code representation
remained sequential
• Decision: recover the parallelism in hardware
• Compatibility burden: it still needs to look like sequential hardware
Why Is Order Important?
Maintaining Architectural State
Dependency Check
How Large Should the Window Be?
Limitation: False Dependencies
Register Renaming
Limitation: Branches
Dynamic Branch Prediction
How to Predict a Branch?
Using History Patterns
Local Predictor
Global Predictor
Concepts Covered
Intel Processor Roadmap
Haswell Floorplan
Block Diagram
FrontEnd
• Instruction Fetch and Decode
• 32 KB 8-way Icache
• 4 decoders, up to 4 inst/cycle
• CISC to RISC transformation
• Decode Pipeline supports 16
bytes per cycle
FrontEnd: Instruction Decode
FrontEnd: Decode UOP Cache
FrontEnd: Loop Stream Detector
• LSD detects small loops that fit in the
Decode Queue
• The loop streams from the uop queue,
with no more fetching, decoding, or
reading uops from any of the caches
• Works until a branch misprediction
• The loops with the following attributes
qualify for LSD replay
• Up to 56 uops
• All uops are also resident in the UC
• No more than eight taken branches
• No CALL or RET
• No mismatched stack operations (e.g.
more PUSH than POP)
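The qualification rules listed above can be written as a single predicate. A minimal sketch, assuming a hypothetical `LoopInfo` summary of a loop body (the class and its field names are invented for illustration; the hardware does not expose such a structure), and reading "no mismatched stack operations" as equal PUSH and POP counts:

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    """Hypothetical summary of one loop body."""
    uops: int                    # number of uops in the loop body
    all_uops_in_uop_cache: bool  # every uop is also resident in the UC
    taken_branches: int          # taken branches per iteration
    has_call_or_ret: bool        # loop contains CALL or RET
    pushes: int                  # PUSH count per iteration
    pops: int                    # POP count per iteration


def qualifies_for_lsd(loop: LoopInfo) -> bool:
    """Apply the slide's LSD replay conditions to a loop summary."""
    return (
        loop.uops <= 56
        and loop.all_uops_in_uop_cache
        and loop.taken_branches <= 8
        and not loop.has_call_or_ret
        and loop.pushes == loop.pops  # assumption: "mismatched" means unequal counts
    )


small_loop = LoopInfo(uops=24, all_uops_in_uop_cache=True, taken_branches=1,
                      has_call_or_ret=False, pushes=0, pops=0)
big_loop = LoopInfo(uops=80, all_uops_in_uop_cache=True, taken_branches=1,
                    has_call_or_ret=False, pushes=0, pops=0)
print(qualifies_for_lsd(small_loop))  # True
print(qualifies_for_lsd(big_loop))    # False: more than 56 uops
```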
FrontEnd: Macro-Fusion
• Merge two instructions into a single uop
• Increased decode, rename and retire
bandwidth
• Power savings from representing
more work in fewer bits
• The first instruction of a macro-fused pair
modifies flags
• CMP, TEST, ADD, SUB, AND, INC, DEC
• The 2nd inst of a macro-fusible pair is a
conditional branch
• For each first instruction, some
branches can fuse with it
• These pairs are common in many apps
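The effect of macro-fusion on uop count can be illustrated with a toy pass over a list of mnemonics. This is a simplified sketch, not the real decoder rule: it assumes every instruction is otherwise one uop, and it lets any conditional branch fuse with any of the listed first instructions, whereas the slide notes that only some branches fuse with each first instruction. The function names are hypothetical.

```python
# First instructions of a macro-fusible pair, as listed on the slide.
FLAG_SETTERS = {"CMP", "TEST", "ADD", "SUB", "AND", "INC", "DEC"}


def is_conditional_branch(mnemonic: str) -> bool:
    """Simplification: any Jcc counts; the unconditional JMP does not."""
    return mnemonic.startswith("J") and mnemonic != "JMP"


def macro_fuse(mnemonics):
    """Return the uop stream after fusing flag-setter + conditional-branch pairs."""
    uops = []
    i = 0
    while i < len(mnemonics):
        cur = mnemonics[i].upper()
        nxt = mnemonics[i + 1].upper() if i + 1 < len(mnemonics) else None
        if cur in FLAG_SETTERS and nxt is not None and is_conditional_branch(nxt):
            uops.append(cur + "+" + nxt)  # the pair becomes a single uop
            i += 2
        else:
            uops.append(cur)
            i += 1
    return uops


loop_body = ["MOV", "CMP", "JNE", "ADD", "INC", "JL"]
print(macro_fuse(loop_body))  # ['MOV', 'CMP+JNE', 'ADD', 'INC+JL']
```

Here six instructions become four uops, which is where the extra decode, rename and retire bandwidth comes from.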
OOO Structures
OOO: Renamer
OOO: Dependency Breaking Idiom
EXE
Core Cache Size/Latency/BW
ST vs MT