Lecture 2 CPU Fundamentals
Injae Yoo
VLSI Research Lab, PNU
TECHNOLOGY TRENDS
2
Technology Trends
Semiconductor technology continues to evolve
Increased capacity and performance
Reduced cost
3
Technology Trends: Logic
Integrated circuit technology (Moore’s Law)
Transistor density: 35%/year
Die size: 10-20%/year
Integration overall: 40-55%/year
4
Technology Trends: Memory
DRAM capacity: 25-40%/year (slowing)
8 Gb (2014), 16 Gb (2019), …
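The two data points above imply an average growth rate that can be compared with the quoted 25-40%/year range. A minimal Python sketch, assuming the 2014 and 2019 figures are the endpoints of a steady compound growth; the function name `annual_growth` is illustrative only:

```python
def annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

# 8 Gb in 2014 to 16 Gb in 2019: one doubling in five years.
rate = annual_growth(start=8, end=16, years=2019 - 2014)
print(f"{rate:.1%} per year")  # about 14.9%, below the 25-40% range (hence "slowing")
```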
5
Power Trends
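In CMOS IC technology:
$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$
Power ×30, voltage 5V → 1V, frequency ×1000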
6
Power Trends
Intel 80386 consumed ~2W (1986)
3.3 GHz Intel Core i7 consumes ~130W
Heat must be dissipated from 1.5 x 1.5 cm chip
Hits the air cooling limit
7
Reducing Power
Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
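The effect of these three changes can be checked with the CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. A minimal Python sketch; the function name `relative_power` and its parameter names are illustrative and not taken from the slides:

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Return P_new / P_old for dynamic power P = C * V^2 * f."""
    return cap_scale * volt_scale ** 2 * freq_scale

# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_power(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")  # 0.85**4, roughly 0.52
```

Under these assumptions the new CPU draws about half the power of the old one, with voltage contributing twice because it enters the relation squared.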
8
Uniprocessor Performance
9
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction-level parallelism (ILP)
Hardware executes multiple instructions at once
Hidden from the programmer
Explicitly parallel programming is hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
10
WHAT’S A COMPUTER?
(+ CPU, ISA)
11
The Computer Revolution
Progress in computer technology
Underpinned by domain-specific accelerators
Makes novel applications feasible
World Wide Web
Search Engines
Smartphones
Computers in automobiles
Computers are pervasive
12
Classes of Computers
Personal computers
General purpose, variety of software
Subject to cost/performance tradeoff
Server computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Supercomputers
Type of server
High-end scientific and engineering calculations
Highest capability but represent a small fraction of the overall market
Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints
13
The PostPC Era
14
The PostPC Era
Personal Mobile Device (PMD)
Battery operated
Connects to the Internet
Hundreds of dollars
Smart phones, tablets, electronic glasses
Apple and Samsung
Cloud computing
Warehouse Scale Computers (WSC)
Software as a Service (SaaS)
A portion of the software runs on a PMD and a portion runs in the cloud
Amazon, Microsoft, and Google
15
Modern Computer in a Nutshell
From the perspective of a program running on the CPU, the computer is just a large memory space.
16
Apple iPad Pro Teardown
17
Apple iPad Pro Teardown
“The processor”
(CPU+GPU+NPU)
Storage
Memory
I/O
Power
18
Inside the Processor
Apple A12 processor
19
Inside the CPU
Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory
Small fast SRAM memory for immediate access to data
[Figure: CPU (microprocessor) containing an instruction cache I$ (SRAM), a data cache D$ (SRAM), and a register file]
20
What We Will Learn (Briefly) Today
How programs are translated into the machine language
And how the CPU executes them
The hardware/software interface: the instruction set architecture (ISA)
What determines program performance
And how it can be improved
How CPU hardware designers improve performance
21
Below Your Program
Application software
Written in high-level language (HLL) – C, Python, …
System software
Compiler: Translates HLL code to machine code
Operating System
Handling system input/output
Managing memory and storage
Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
22
Levels of Program Code
High-level language
Level of abstraction closer to the problem (or algorithm) domain
Provides for productivity and portability
Assembly language
Textual representation of instructions
Hardware representation
Binary digits (bits)
Encoded instructions and data
23
What is a Program (or Software)?
Sequences of instructions to do a certain task
Example: Finding length of a text string
In a high-level programming language (like C):
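As an illustration of this task, here is a minimal sketch in Python (used here instead of C). The function name `string_length` and the explicit NUL terminator are assumptions, chosen to mimic how a C-style string is scanned character by character:

```python
def string_length(text):
    """Count characters up to (not including) the first NUL terminator."""
    length = 0
    # Walk the string one character at a time, as a C loop over char* would.
    while length < len(text) and text[length] != "\0":
        length += 1
    return length

print(string_length("hello\0"))  # 5
```

Each loop iteration reads one character, compares it, and increments a counter, which is the pattern a compiler turns into load, arithmetic, and branch instructions.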
24
What is an Instruction?
A unit of microprocessor operations
Instruction set architecture (ISA):
Set of instructions, defining a microprocessor (CPU) architecture
x86 (Intel) vs ARM vs RISC-V
[Figure: assembly listing with each instruction labeled by type: load/store, arithmetic, or branch]
25
What is an Instruction?
Three types of instructions
Load/Store
Read/Write data from/to memory
(or external devices through memory-mapped IO)
Arithmetic
Process data inside CPU
Add, subtract, shift, …
Branch
Change program counter
(or change the program’s execution sequence)
For example, leaving a for loop after some iterations
[Figure: CPU (microprocessor) with I$ (SRAM) and D$ (SRAM), annotated with branch, load, arithmetic, and store instructions]
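The three instruction types can be made concrete with a toy interpreter. This is a minimal Python sketch under stated assumptions: the opcode names (`load`, `store`, `addi`, `beqz`, `jump`), the register names, and the tuple encoding are all hypothetical and do not correspond to x86, ARM, or RISC-V. The sample program computes the length of a zero-terminated string held in memory.

```python
def run(program, memory, registers):
    """Execute a toy program; each instruction is a tuple (opcode, operands...)."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1  # default: fall through to the next instruction
        if op == "load":       # load/store type: rd = memory[registers[ra]]
            rd, ra = args
            registers[rd] = memory[registers[ra]]
        elif op == "store":    # load/store type: memory[registers[ra]] = rs
            rs, ra = args
            memory[registers[ra]] = registers[rs]
        elif op == "addi":     # arithmetic type: rd = rs + constant
            rd, rs, imm = args
            registers[rd] = registers[rs] + imm
        elif op == "beqz":     # branch type: change pc if rs == 0
            rs, target = args
            if registers[rs] == 0:
                pc = target
        elif op == "jump":     # branch type: always change pc
            (target,) = args
            pc = target
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers, memory

# Memory: the characters of "hello", a 0 terminator, then spare cells.
memory = [ord(c) for c in "hello"] + [0, 0, 0]
# r0 = pointer into the string, r1 = length so far, r3 = address for the result.
registers = {"r0": 0, "r1": 0, "r2": 0, "r3": 7}
program = [
    ("load", "r2", "r0"),      # 0: r2 = current character
    ("beqz", "r2", 5),         # 1: leave the loop at the terminator
    ("addi", "r1", "r1", 1),   # 2: length += 1
    ("addi", "r0", "r0", 1),   # 3: pointer += 1
    ("jump", 0),               # 4: next iteration
    ("store", "r1", "r3"),     # 5: write the length back to memory
]
run(program, memory, registers)
print(memory[7])  # 5
```

Only the branch instructions modify the program counter directly; loads and stores are the only instructions that touch memory, and arithmetic works purely on registers.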
26
PERFORMANCE OF A CPU
27
Understanding “Performance”
Algorithm (or a program) to execute
Determines number of operations executed
Programming language, compiler, architecture
Determine number of machine instructions executed per operation
Processor and memory system
Determine how fast instructions are executed
28
Defining Performance
Which airplane has the best performance?
29
Response Time and Throughput
Response time
How long it takes to do a task
Throughput
Total work done per unit time
e.g., tasks/transactions/… per hour
30
Relative Performance
Define Performance = 1 / Execution Time
“X is n times faster than Y”
$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
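A small numeric illustration of this definition, sketched in Python. The execution times (10 s for X, 15 s for Y) are hypothetical values chosen for the example, and `speedup` is an illustrative name:

```python
def speedup(time_x, time_y):
    """Return n such that X is n times faster than Y (ratio of execution times)."""
    return time_y / time_x

# Hypothetical measurements: the same program takes 10 s on X and 15 s on Y.
n = speedup(time_x=10.0, time_y=15.0)
print(f"X is {n} times faster than Y")  # X is 1.5 times faster than Y
```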
31
CPU Clocking
Operation of digital hardware governed by a constant-rate clock
[Figure: clock waveform marking the clock period; within each clock cycle, data transfer and computation are followed by a state update]
Example: a 2 GHz clock has a clock period of 0.5 ns
32
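CPU Time (= execution time)
$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$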
Performance improved by
Reducing number of clock cycles
Increasing clock rate (frequency)
Hardware designer must often trade off clock rate against cycle count
33
CPU Time Example
Computer A: 2 GHz clock, 10 s CPU time
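From these two numbers, the clock cycle count of the program on Computer A follows from CPU Time = Clock Cycles / Clock Rate. A minimal Python sketch; the function name `clock_cycles` is illustrative, and only the 2 GHz and 10 s figures come from the slide:

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz

cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)
period_a = 1 / 2e9  # clock period in seconds (0.5 ns)
print(f"{cycles_a:.1e} cycles")  # 2.0e+10 cycles
```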
36
CPI in More Detail
If different instruction classes take different numbers of cycles
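$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$
Weighted average CPI
$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
Relative frequency: $\frac{\text{Instruction Count}_i}{\text{Instruction Count}}$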
37
CPI Example
Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
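The same arithmetic can be checked in a few lines of Python. This is a sketch: the per-class CPI values and instruction counts are those of the example, while the names `CPI_PER_CLASS` and `average_cpi` are illustrative.

```python
# Cycles per instruction for each instruction class in the example.
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}

def average_cpi(instruction_counts):
    """Return (total clock cycles, weighted average CPI) for a code sequence."""
    total_cycles = sum(CPI_PER_CLASS[cls] * count
                       for cls, count in instruction_counts.items())
    total_instructions = sum(instruction_counts.values())
    return total_cycles, total_cycles / total_instructions

seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
print(average_cpi(seq1))  # (10, 2.0)
print(average_cpi(seq2))  # (9, 1.5)
```

Sequence 2 executes more instructions (6 versus 5) yet needs fewer clock cycles, so instruction count alone does not determine execution time.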
38
Performance Summary
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Software, through the ISA, affects IC and CPI
ISA: the interface between hardware (HW) and software (SW)
39
SPEC CPU Benchmark
Programs used to measure performance
Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
Develops benchmarks for CPU, I/O, …
SPEC CPU2006
Elapsed time to execute a selection of programs
Negligible I/O, so focuses on CPU performance
Normalize relative to reference machine
Summarize as geometric mean of performance ratios
CINT2006 (integer) and CFP2006 (floating-point)
$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
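The geometric mean of the per-benchmark execution time ratios can be sketched in Python as follows. The ratio values in the example are hypothetical, not SPEC results, and `geometric_mean` is an illustrative helper name (`math.prod` requires Python 3.8 or later):

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n execution time ratios."""
    return math.prod(ratios) ** (1 / len(ratios))

# Hypothetical ratios (reference time / measured time) for two benchmarks.
summary = geometric_mean([2.0, 8.0])
print(summary)  # approximately 4.0, whereas the arithmetic mean would be 5.0
```

A useful property of the geometric mean is that the ratio of two machines' summaries does not depend on which reference machine was used for normalization.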
40
SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L
41
Amdahl’s Law
$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$
Example: multiply accounts for 80s/100s
How much improvement in multiply performance to get 5× overall?
$20 = \frac{80}{n} + 20$
Can’t be done!
Corollary: make the common case fast!
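The multiply example can be explored numerically. A minimal Python sketch using the 80 s / 100 s split from the example; the function name `improved_time` and the trial improvement factors are illustrative choices:

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: only the affected part shrinks by the improvement factor."""
    return t_affected / factor + t_unaffected

total, multiply = 100.0, 80.0
other = total - multiply  # 20 s that the multiply speedup cannot touch

speedups = {}
for n in (2, 10, 1000):
    speedups[n] = total / improved_time(multiply, other, n)
    print(f"multiply {n}x faster -> overall {speedups[n]:.2f}x")

# Even with an arbitrarily large n, the overall speedup stays below this bound.
print(f"upper bound: {total / other}x")  # 5.0x, approached but never reached
```

The overall speedup approaches 5× only as the improvement factor grows without limit, which is why the target in the example cannot be met by speeding up multiply alone.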
42
Concluding Remarks
Cost/performance is improving
Due to underlying technology development
Hierarchical layers of abstraction
In both hardware and software
Instruction set architecture
The hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
Use parallelism to improve performance
43