ECE 463/563
Microprocessor Architecture
Overview
Prof. Eric Rotenberg
ECE 463/563, Microprocessor Architecture, 1
Prof. Eric Rotenberg
Computer Architecture & Systems
Microprocessor Architecture (CPUs)
Hard: Correct & Fast CPU
Easy: Correct CPU
Simple Processor Datapath
[Figure: datapath with stages IF (instr. fetch), ID (instr. decode), EX (execute), MEM (memory), WB (writeback), plus a Register File and Memory (DRAM & Disk); a single instruction (1) is in flight]
Invention #1 Pipelining
[Figure: the same five-stage datapath (IF, ID, EX, MEM, WB) with instructions 1 to 6 overlapped in time, each occupying a different stage]
Problem: Data-Dependent Instructions
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with instructions 1 to 6 in flight, some of which depend on results of earlier instructions still in the pipeline]
Invention #2 Register File Bypasses
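[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with instructions 1 to 6 in flight and bypass paths that forward results from later stages back to earlier ones, so dependent instructions need not wait for the Register File]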
Problem: Branch Instructions
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with a branch (1) in flight; IF does not know which of two possible next instructions (2) to fetch (?) until the branch resolves]
Invention #3 Branch Prediction
[Figure: the same datapath with a Branch Predictor at the IF stage; instructions 2, 3, 4 are fetched along the predicted path behind branch 1 (?) before it resolves]
Problem: “Memory Wall”
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with a Branch Predictor; IF and MEM access Memory (DRAM & Disk) directly]
Invention #4 Caches
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with a Branch Predictor, an Instr. Cache serving IF and a Data Cache serving MEM, both backed by Memory (DRAM & Disk)]
Caches (cont.)
• Locality of reference
– Temporal locality: if you access an item, you are likely to access it again in the near future
– Spatial locality: if you access an item, you are likely to access a nearby item in the near future
Problem: Stalled Instructions
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) with a Branch Predictor, Instr. Cache, and Data Cache; a cache miss stalls instruction 1, and instructions 2, 3, 4 stall behind it]
Invention #5 Out-of-Order Execution
[Figure: the same datapath with a Dynamic Scheduler added; while a cache miss is outstanding, later instructions (5, 6, 7) proceed ahead of a waiting earlier instruction (4) instead of stalling behind it]
Superscalar Execution
[Figure: superscalar datapath with a Branch Predictor, Dynamic Scheduler, Instr. Cache, and Data Cache; several instructions occupy each stage in the same cycle (e.g., instructions 2, 5, and 8 together in IF)]
Deep Pipelining
[Figure: deeply pipelined datapath in which each stage is split in two (IF1 IF2 ID1 ID2 EX1 EX2 M1 M2 W1 W2), with a Branch Predictor, Dynamic Scheduler, Instr. Cache, Data Cache, and Memory (DRAM & Disk)]
[Figure: processor chip with regions labeled BRANCH PREDICTION, L1 Instr. Cache, OOO EXECUTION SUPPORT, R.F. BYPASSES (two regions), and L1 Data Cache]
Computer System
Layers, from software down to hardware, with related courses:
• Application: ECE 209, 309
• Operating System: ECE 306, CSC 501 or ECE 465/565
• Compiler: ECE 466/566
• Firmware
• Instruction Set Architecture and Machine Organization (Processor, Mem., I/O system): ECE 109, 463/563; also ECE 721 (ECE 506, 786)
• Datapath & Control: ECE 310, 464/564
• Digital Design: ECE 212
• Circuit Design: ECE 211, 403
• Layout: ECE 546
Computer Architecture = Instruction Set Architecture + Machine Organization (Microarchitecture)
What is Computer Architecture?
Computer Architecture = Instruction Set Architecture + Machine Organization
Instruction Set Architecture:
• Programmable storage (registers, memory)
• Data types and their encodings (integer, floating-point, SIMD)
• Instruction set
• Instruction formats and encodings
• Modes of addressing and accessing data and instructions
• Exceptions and interrupts
• Virtual memory
• Memory consistency model
Machine Organization:
• Capabilities and performance characteristics of principal components
– e.g., register files, ALUs, memory system, etc.
• Ways in which these components are interconnected
• Choreography of components to realize the ISA
• Performance-enhancing techniques and components
– e.g., pipelining, caches, predictors, dynamic scheduling, superscalar execution, etc.
CPU time equation (brief version)
• CPU time = time to execute a program on CPU
• Two factors
– # cycles = number of clock cycles to execute a program
– Cycle Time (CT) = clock period = 1 / (clock frequency)
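$$\text{CPU time} = (\#\,\text{cycles}) \times (\text{CT})$$
Example:
$$
\begin{aligned}
\#\,\text{cycles} &= 10^{9}\ \text{cycles (1 billion cycles)} \\
\text{clock frequency } f &= 1\ \text{GHz} = 10^{9}\ \text{Hz} = 10^{9}\ \text{cycles/s} \\
\text{CT} &= 1/f = 10^{-9}\ \text{s/cycle} = 1\ \text{ns/cycle} \\
\text{CPU time} &= (10^{9}\ \text{cycles}) \times (10^{-9}\ \text{s/cycle}) = 1\ \text{s}
\end{aligned}
$$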
Static vs. Dynamic Instructions
```c
#include <stdio.h>
#include <inttypes.h>

#define HUGE 1000000

int main(int argc, char *argv[]) {
    uint64_t a[HUGE];
    uint64_t sum = 0;
    // I know, a[] uninitialized... just a demo.
    for (uint64_t i = 0; i < HUGE; i++)
        sum += a[i];
    printf("sum = %" PRIu64 "\n", sum);  // PRIu64: portable format for uint64_t
    return(0);
}
```

The loop compiles to 4 static instructions (listed below). Dynamically, at run time, the loop body executes 1 million times.
• # static instructions = 4
• # dynamic instructions = 4 × 1 million = 4 million
• Dynamic instruction count influences the number of cycles.

Register usage in the loop:
a5: starts with the address of the first element of array a[] (&a[0])
a3: contains the address just after the last element of array a[] (&a[1000000])
a1: contains the running sum, initialized to 0

```
1a538: ld   a4,0(a5)                  // a4 = a[i]
1a53c: addi a5,a5,8                   // increment address to point to next element of a[]
1a540: add  a1,a1,a4                  // sum += a[i]
1a544: bne  a5,a3,1a538 <main+0x34>   // branch to top of loop if a5 has not reached &a[1000000]
```
Influence on CPU time
• Programmer influence (affects: # cycles)
– Algorithm affects dynamic instruction count: more (fewer) instructions to fetch and execute may cause more (fewer) cycles
– Algorithm affects temporal and spatial locality: more (fewer) cache misses may cause more (fewer) cycles
• Compiler influence (affects: # cycles)
– Compiler optimizations affect dynamic instruction count (up or down): more (fewer) instructions to fetch and execute may cause more (fewer) cycles
– Instruction scheduling aims to increase instruction-level parallelism and hence reduce cycles
• Influence of instruction-set architecture (ISA) (affects: ?)
– Complex interactions between ISA, compiler, and microarchitecture. Unresolved debate on whether the ISA (e.g., CISC vs. RISC) influences performance as much as the microarchitecture does.
• Microarchitecture influence (affects: # cycles, CT)
– Pipeline optimizations (e.g., pipelining, data bypassing, branch prediction, caches, dynamic scheduling, superscalar, etc.) aim to reduce cycles by increasing instruction-level parallelism (ILP), i.e., the number of concurrently executing instructions and the extent of their overlapped execution
– Pipeline optimizations may increase CT due to increased logic complexity
– Deeper pipelining aims to decrease CT
• Circuit design influence (affects: CT)
– Faster circuits aim to decrease CT
• Technology influence (affects: CT)
– Faster transistors and wires aim to decrease CT
Overview of Topics in 463/563
1. Measuring Performance and Cost
2. Caches and Memory Hierarchies
3. Instruction-Set Architecture (ISA)
– Defines software/hardware interface
4. Simple Pipelining
– Data and control (branch) dependencies
– Register file bypasses
– Branch prediction
Overview of Topics in 463/563 (cont.)
5. Complex Pipelining and Instruction-Level
Parallelism (ILP)
– Data hazards
– Issue Queue (IQ): from in-order to out-of-order
scheduling
– Reorder Buffer (ROB): speculation and register
renaming
– Precise interrupts
– Superscalar, VLIW, and vector processors
Projects
• Three projects
– Cache simulator
– Branch predictor simulator
– Superscalar pipeline simulator
• Programming for projects is harder than
anything many of you have encountered
before
Course Grading
• 40% projects
– Project 1, caches: 15%
– Project 2, branch predictors: 10%
– Project 3, superscalar pipeline: 15%
• 20% Moodle quizzes
– Can be thought of as homework assignments
– Each quiz tests a unit of knowledge or a whole topic, or gives practice on manually
doing a computer architecture simulation
– The current plan is 10 quizzes, but there could be more
– A quiz may be assigned with a group of lectures (most common) or a pre-recorded
video
• 20% Midterm
– Covers Performance/Cost, Caches
• 20% Final (NOT comprehensive)
– Covers ISA, Simple and Complex Pipelining, ILP
Course Web Page (Moodle site)
• wolfware.ncsu.edu
– Login
– Select ECE 463/563
• Content
– Link to Panopto for live-stream (webcast) and recordings
– Syllabus
– Schedule
– Contact information and office hours
– Q&A and discussion forums
– Projects: specifications, benchmark traces, validation runs, etc.
– Moodle (on-line) quizzes and exams
• Check frequently (I announce updates)