Comp Architecture 101

Areg Melik-Adamyan, an Engineering Manager at Intel, presents an overview of CPU architecture, focusing on concepts such as pipelining, memory hierarchy, out-of-order execution, and branch prediction. The lecture aims to enhance understanding of CPU operations and performance optimization. Key topics include the limitations of pipelining, memory trade-offs, and the role of superscalar architecture in improving instruction execution efficiency.


Areg Melik-Adamyan, PhD

Engineering Manager, Intel Developer Products Division


Introduction

Who am I?
• 7 years at Intel, 17 years in industry
• Managing compiler teams (GCC, Go)
• 10 years teaching
Why are we here?
• To better understand how the CPU works

Textbooks and References

• We only touch the tip of the iceberg


• Explain main concepts only
• Not enough to develop your own microprocessor…
• But enough to better understand the behavior and performance of your programs
• Hennessy, Patterson, Computer Architecture: A Quantitative Approach, 6th Ed.
• Blaauw, Brooks, Computer Architecture: Concepts and Evolution

Lecture Outline

• Pipeline
• Memory Hierarchy (Caches: +1 lecture later)
• Out-of-order execution
• Branch prediction
• Real example: Haswell Microarchitecture

Layers of Abstraction
Application
Algorithms
Software
Programming Languages
Operating Systems/Libraries
Interface between
HW and SW
Instruction Set Architecture
Microarchitecture
Gates/Register-Transfer Level (RTL)
Hardware
Circuits
Physics

Basic CPU Actions

F D E M W

1. Fetch instruction by PC from memory


2. Decode it and read its operands from registers
3. Execute calculations
4. Read/write memory
5. Write the result into registers and update PC
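
A minimal sketch in C of the five steps above, for a hypothetical toy machine; the opcodes, register count, and instruction encoding are invented for illustration and do not correspond to any real ISA:

#include <stdint.h>

/* Toy machine state: a register file, a program counter, and a small
   unified instruction/data memory. The 3-operand encoding is invented. */
enum { OP_ADD = 0, OP_LOAD = 1, OP_STORE = 2 };

typedef struct {
    uint32_t regs[16];
    uint32_t pc;
    uint32_t mem[1024];
} Cpu;

void step(Cpu *c) {
    /* 1. Fetch the instruction at the PC from memory */
    uint32_t inst = c->mem[c->pc];

    /* 2. Decode it and read its operands from registers */
    uint32_t op  = inst >> 24, rd  = (inst >> 16) & 0xF,
             rs1 = (inst >> 8) & 0xF, rs2 = inst & 0xF;
    uint32_t a = c->regs[rs1], b = c->regs[rs2];

    /* 3. Execute the calculation */
    uint32_t result = (op == OP_ADD) ? a + b : a;

    /* 4. Read or write memory if the instruction needs it */
    if (op == OP_LOAD)  result = c->mem[result];
    if (op == OP_STORE) c->mem[result] = c->regs[rd];

    /* 5. Write the result into the register file and update the PC */
    if (op != OP_STORE) c->regs[rd] = result;
    c->pc += 1;
}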

Non-Pipelined Processing

• Instructions are processed sequentially, one per cycle


• How to speed up?
• SW: decrease the number of instructions
• HW: decrease the time to process one instruction, or overlap the processing of instructions, i.e., build a pipeline
Pipeline

• Processing is split into several steps called “stages”


• Each stage takes one cycle
• The clock cycle is determined by the longest stage
• Instructions are overlapped
• A new instruction occupies a stage as soon as the previous one leaves it
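
A back-of-envelope model of the effect; the stage count, stage latency, and instruction count below are example values, not measurements of any particular CPU:

#include <stdio.h>

int main(void) {
    const int stages = 5;          /* example: a 5-stage pipeline        */
    const double stage_ns = 1.0;   /* example: 1 ns per stage            */
    const long n = 1000;           /* example: 1000 instructions         */

    /* Non-pipelined: each instruction passes through all stages
       before the next one starts. */
    double serial_ns = n * stages * stage_ns;

    /* Pipelined (ideal, no hazards): once the pipeline is full,
       one instruction completes per cycle. */
    double pipelined_ns = (stages + n - 1) * stage_ns;

    printf("serial: %.0f ns, pipelined: %.0f ns, speedup: %.2fx\n",
           serial_ns, pipelined_ns, serial_ns / pipelined_ns);
    return 0;
}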

Pipeline vs Non-Pipeline

Pipeline Limitations

• The maximum throughput of the pipeline is one instruction per clock cycle
• This rate is rarely reached, due to dependencies among instructions (data or control) and in-order processing

Pipeline Limitations
• Various types of hazards:
• read after write (RAW), a true dependency
• write after read (WAR), an anti-dependency
• write after write (WAW), an output dependency
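
A C-level sketch of the three hazard types; the assumption is that a, b, c, d live in registers, so the constraints apply to the machine instructions a compiler would emit for these statements:

int hazards(int a, int b, int c, int d) {
    c = a + b;   /* writes c                                               */
    d = c + 1;   /* RAW: reads c after the write above (true dependency)   */
    a = b * 2;   /* WAR: writes a, which the first statement read (anti)   */
    c = d - b;   /* WAW: writes c again after the first write (output)     */
    return a + c + d;
}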

Motivation for Memory Hierarchy

Memory Tradeoffs

• Large memories are slow


• Small memories are fast, but expensive and consume high power
• Goal: give the processor the illusion of a memory that is fast, large, cheap, and consumes little energy
• Solution: Hierarchy of Memories
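
A classic illustration of why the hierarchy matters, as a sketch assuming a row-major C array: the first loop walks memory sequentially and reuses cache lines, while the second jumps a whole row between accesses and, for large matrices, misses far more often:

#define N 2048
static double m[N][N];

/* Good locality: consecutive iterations touch consecutive addresses,
   so most accesses hit in the upper levels of the hierarchy. */
double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Poor locality: each access is N*sizeof(double) bytes away from the
   previous one, so the caches help much less and DRAM latency dominates. */
double sum_col_major(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}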

Superscalar: Wide Pipeline

• Pipeline exploits instruction level parallelism (ILP)


• Can we improve? Execute two instructions in parallel
• Need to double HW structures
• Max speedup is 2 instructions per cycle (IPC=2)
• The real speedup is less due to dependencies and in-order execution

Is Superscalar Good Enough?
• Theoretically can execute multiple instructions in parallel
• Wide pipeline => more performance
• But…
• Only independent subsequent instructions can be executed in parallel
• Whereas subsequent instructions are often dependent
• So the utilization of the second pipe is often low
• Solution: out-of-order execution
• Execute instructions based on the “data flow” graph rather than program order
• Still need to preserve the illusion of in-order execution
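
A small sketch of what the hardware exploits: the two chains below are independent in the data-flow graph, so an out-of-order core can interleave their operations even though the program lists one chain after the other:

/* Chain 1: b depends on a, c depends on b (serial in the data-flow graph). */
/* Chain 2: y depends on x, z depends on y (serial, but independent of 1).  */
long dataflow(long a, long x) {
    long b = a * 3;      /* chain 1                                     */
    long c = b + 7;      /* chain 1, must wait for b                    */
    long y = x * 5;      /* chain 2: can execute in parallel with chain 1 */
    long z = y + 11;     /* chain 2, must wait for y                    */
    return c ^ z;        /* joins both chains                           */
}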
Data Flow Analysis

Instruction “Grinder”

• By then, technology allowed building wide hardware, but the code representation remained sequential
• Decision: extract the parallelism back by means of hardware
• Compatibility burden: the machine must still appear to execute sequentially

Why Is Order Important?

Maintaining Architectural State

Dependency Check

How Large Should the Window Be?

Limitation: False Dependencies

Register Renaming

Limitation: Branches

Dynamic Branch Prediction

How to Predict a Branch?

Using History Patterns

Local Predictor

Global Predictor

Concepts Covered

Intel Processor Roadmap

Haswell Floorplan

Block Diagram

FrontEnd
• Instruction Fetch and Decode
• 32 KB 8-way Icache
• 4 decoders, up to 4 inst/cycle
• CISC to RISC transformation
• Decode pipeline supports 16 bytes per cycle
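
A quick sanity check of the Icache geometry, assuming the usual 64-byte line size (the slide does not state the line size, so that value is an assumption):

#include <stdio.h>

int main(void) {
    const int size_bytes = 32 * 1024;  /* 32 KB L1 instruction cache */
    const int ways = 8;                /* 8-way set associative      */
    const int line_bytes = 64;         /* assumption: 64-byte lines  */
    int sets = size_bytes / (ways * line_bytes);
    printf("sets = %d\n", sets);       /* 64 under these assumptions */
    return 0;
}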

FrontEnd: Instruction Decode

• Four decoding units decode instructions into uops
• The first can decode all instructions up to four uops in size
• Uops emitted by the decoders are directed to the Decode Queue and to the Decoded Uop Cache
• Instructions with more than 4 uops generate their uops from the MSROM
• The MSROM bandwidth is 4 uops per cycle

FrontEnd: Decode UOP Cache

FrontEnd: Loop Stream Detector
• The LSD detects small loops that fit in the Decode Queue
• The loop streams from the uop queue, with no more fetching, decoding, or reading uops from any of the caches
• Works until a branch misprediction
• Loops with the following attributes qualify for LSD replay:
• Up to 56 uops
• All uops are also resident in the Uop Cache
• No more than eight taken branches
• No CALL or RET
• No mismatched stack operations (e.g., more PUSHes than POPs)
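
A hedged example of the kind of loop that can qualify: a tight, call-free loop whose body compiles to only a handful of uops with a single backward branch. Whether it actually replays from the LSD depends on the compiled uop count and the other criteria above:

/* A small, call-free loop body: typically just a few uops per iteration
   (load, add, increment, compare-and-branch), well under the 56-uop limit. */
long sum(const long *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}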

FrontEnd: Macro-Fusion
• Merge two instructions into a single uop
• Increased decode, rename, and retire bandwidth
• Power savings from representing more work in fewer bits
• The first instruction of a macro-fused pair modifies the flags
• CMP, TEST, ADD, SUB, AND, INC, DEC
• The second instruction of a macro-fusible pair is a conditional branch
• For each first instruction, only certain branches can fuse with it
• These pairs are common in many applications
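
A hedged example: for a loop like the one below, compilers commonly emit a flag-setting compare immediately followed by a conditional branch for the exit test, which is exactly the kind of pair that can macro-fuse into one uop:

/* The "i < n" test typically becomes a CMP followed by a conditional
   branch in the generated code, a fusible flag-setter + Jcc pair. */
int count_positive(const int *a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)     /* loop-exit compare + branch          */
        if (a[i] > 0)               /* another compare + branch, also potentially fusible */
            count++;
    return count;
}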
OOO Structures

OOO: Renamer

OOO: Dependency Breaking Idiom

EXE

Core Cache Size/Latency/BW

ST vs MT

