
Lecture 2 CPU Fundamentals

This document discusses CPU fundamentals, focusing on technology trends, power consumption, and performance metrics in computer architecture. It outlines the evolution of semiconductor technology, the impact of multi-core processors, and the importance of instruction set architecture (ISA) in determining program performance. Additionally, it covers the relationship between clock speed, instruction count, and cycles per instruction (CPI) in evaluating CPU efficiency.

Uploaded by

chw7407

COMPUTER ORGANIZATION AND DESIGN

The Hardware/Software Interface

Lecture 2: CPU Fundamentals


Computer Architecture

Injae Yoo
VLSI Research Lab, PNU

TECHNOLOGY TRENDS

Technology Trends
 Semiconductor technology continues to evolve
 Increased capacity and performance
 Reduced cost

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000

Technology Trends: Logic
 Integrated circuit technology (Moore’s Law)
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall: 40-55%/year

Transistor size↓ Performance (clock rate)↑ Power ↓

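To get a feel for the yearly growth rates above, they can be converted into doubling times. The sketch below assumes simple compound growth; the helper name `doubling_time_years` is made up for illustration and is not from the lecture.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound growth rate (0.35 means 35%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


# Growth rates quoted on the slide; the labels are only for printing.
rates = [
    ("Transistor density (35%/year)", 0.35),
    ("Integration overall, low end (40%/year)", 0.40),
    ("Integration overall, high end (55%/year)", 0.55),
]

for label, rate in rates:
    print(f"{label}: doubles roughly every {doubling_time_years(rate):.1f} years")
```

Under this assumption, 40-55%/year corresponds to doubling roughly every one and a half to two years.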
Technology Trends: Memory
• DRAM capacity: 25-40%/year (slowing)
• 8 Gb (2014), 16 Gb (2019), …
• Flash capacity: 50-60%/year
• 8-10X cheaper/bit than DRAM
• Magnetic disk capacity: recently slowed to 5%/year
• Density increases may no longer be possible; capacity may instead grow by going from 7 to 9 platters
• 8-10X cheaper/bit than Flash
• 200-300X cheaper/bit than DRAM

Power Trends
• In CMOS IC technology

$$\text{Power} \propto \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

• Annotations: power ×30, voltage 5V → 1V, frequency ×1000

Power Trends
 Intel 80386 consumed ~2W (1986)
 3.3 GHz Intel Core i7 consumes ~130W
 Heat must be dissipated from 1.5 x 1.5 cm chip
 Hits the air cooling limit

Reducing Power
 Suppose a new CPU has
 85% of capacitive load of old CPU
 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
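The same ratio can be checked numerically. This is a minimal sketch assuming power is proportional to C × V² × F; the function name and the baseline values of 1.0 are placeholders chosen for illustration, and they cancel in the ratio.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic CMOS power up to a constant factor: C * V^2 * F."""
    return capacitive_load * voltage ** 2 * frequency


# Arbitrary baseline values (assumptions); only the ratio matters.
c_old, v_old, f_old = 1.0, 1.0, 1.0

p_old = dynamic_power(c_old, v_old, f_old)
# New CPU: 85% of the capacitive load, 15% lower voltage and frequency.
p_new = dynamic_power(0.85 * c_old, 0.85 * v_old, 0.85 * f_old)

power_ratio = p_new / p_old
print(f"P_new / P_old = {power_ratio:.2f}")
```

Voltage enters squared, so the 15% voltage reduction contributes two of the four factors of 0.85.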

 The power wall


 We can’t reduce voltage further
 We can’t remove more heat
 How else can we improve performance?

Uniprocessor Performance

Constrained by power, instruction-level parallelism, and memory latency

Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Compare with instruction level parallelism
 Hardware executes multiple instructions at once
 Hidden from the programmer
 Hard to do
 Programming for performance
 Load balancing
 Optimizing communication and synchronization


WHAT’S A COMPUTER?
(+ CPU, ISA)

The Computer Revolution
 Progress in computer technology
 Underpinned by domain-specific accelerators
 Makes novel applications feasible
 World Wide Web
 Search Engines
 Smartphones
 Computers in automobiles
 Computers are pervasive

Classes of Computers
 Personal computers
 General purpose, variety of software
 Subject to cost/performance tradeoff

 Server computers
 Network based
 High capacity, performance, reliability
 Range from small servers to building sized

 Supercomputers
 Type of server
 High-end scientific and engineering calculations
 Highest capability but represent a small fraction of the overall market

 Embedded computers
 Hidden as components of systems
 Stringent power/performance/cost constraints

The PostPC Era
 Personal Mobile Device (PMD)
 Battery operated
 Connects to the Internet
 Hundreds of dollars
 Smart phones, tablets, electronic glasses
 Apple and Samsung

 Cloud computing
 Warehouse Scale Computers (WSC)
 Software as a Service (SaaS)
 Portion of software run on a PMD and a portion run in the Cloud
 Amazon, Microsoft, and Google

Modern Computer in a Nutshell
From the perspective of programs running on the CPU, it's just a bunch of memory space. So, we will call everything just “memory”.

[Figure: block diagram of Storage (NAND Flash Memory), Main Memory (DRAM), I$ (SRAM), D$ (SRAM), CPU (Microprocessor), and Register File]

• Storage (NAND Flash memory): where the entire OS (Android), the apps, and all data are stored while your phone is idle or off
• Main memory (DRAM): where chunks of the OS, apps, and data are loaded while your phone is running
• I$ (instruction cache): where the programs (OS / app) being executed are loaded
• D$ (data cache): where the data being processed / created is loaded
• CPU: where data is processed / created according to programs
• Register file: a scribble pad of the CPU

Apple iPad Pro Teardown

[Figure: iPad Pro teardown with labeled components: “The processor” (CPU+GPU+NPU), storage, memory, I/O, power]

Inside the Processor
 Apple A12 processor

• 2 + 4 ARM CPU cores


• On-chip cache memories
• GPU, NPU
• DRAM controllers

Inside the CPU
 Datapath: performs operations on data
 Control: sequences datapath, memory, ...
 Cache memory
 Small fast SRAM memory for immediate access to data

[Figure: CPU (Microprocessor) with I$ (SRAM), D$ (SRAM), and Register File]

What We Will Learn (Briefly) Today
• How programs are translated into the machine language
• And how the CPU executes them
• The hardware/software interface: the ISA
• What determines program performance
• And how it can be improved
• How CPU hardware designers improve performance

Below Your Program
 Application software
 Written in high-level language (HLL) – C, Python, …
 System software
 Compiler: Translates HLL code to machine code
 Operating System
 Handling system input/output
 Managing memory and storage
 Scheduling tasks & sharing resources

 Hardware
 Processor, memory, I/O controllers

Levels of Program Code
• High-level language
• Level of abstraction closer to problem (or algorithm) domain
• Provides for productivity and portability
• Assembly language
• Textual representation of instructions
• Hardware representation
• Binary digits (bits)
• Encoded instructions and data

What is a Program (or Software)?
• Sequences of instructions to do a certain task
• Example: Finding length of a text string
• In a high-level programming language (like C): [code listing not preserved]
• Compiled into a low-level language (or assembly), where each line is one microprocessor instruction: [code listing not preserved]

What is an Instruction?
• A unit of microprocessor operations
• Instruction set architecture (ISA): Set of instructions, defining a microprocessor (CPU) architecture
• x86 (Intel) vs ARM vs RISC-V
• Usually 3 types of instructions
• Load/Store
• Arithmetic
• Branch

[Figure: The entire set of basic RISC-V instructions, grouped into load, store, arithmetic, and branch]

What is an Instruction?
• 3 types of instructions
• Load/Store
• Read/Write data from/to memory (or external devices through memory-mapped I/O)
• Arithmetic
• Process data inside CPU
• Add, subtract, shift, …
• Branch
• Change program counter (or change the program's execution sequence)
• For example, leaving a for loop after some iterations

[Figure: CPU (Microprocessor) with I$ (SRAM) and D$ (SRAM), annotated with Branch, Load, Store, and Arithmetic]

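The three instruction types can be illustrated with a toy interpreter. This is only a sketch under loud assumptions: the tuple encoding, the register names, and the four opcodes (`load`, `store`, `addi`, `beq`) are made up for illustration and loosely borrow RISC-V mnemonics; it is not a real RISC-V simulator. The program repeats the earlier string-length example.

```python
def run(program, memory):
    """Execute a toy program and return the register file when it ends."""
    regs = {"x0": 0}  # x0 is used as a constant zero (hypothetical convention)
    pc = 0            # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1  # default: fall through to the next instruction
        if op == "load":        # Load: read memory into a register
            rd, addr_reg = args
            regs[rd] = memory[regs[addr_reg]]
        elif op == "store":     # Store: write a register to memory
            rs, addr_reg = args
            memory[regs[addr_reg]] = regs[rs]
        elif op == "addi":      # Arithmetic: process data inside the CPU
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "beq":       # Branch: change the program counter
            rs1, rs2, target = args
            if regs[rs1] == regs[rs2]:
                pc = target
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


text = "hello"
memory = [ord(ch) for ch in text] + [0, 0]  # string, terminator, result slot
result_addr = len(text) + 1

strlen_program = [
    ("addi", "a0", "x0", 0),            # 0: a0 = 0 (running length)
    ("load", "t0", "a0"),               # 1: t0 = memory[a0]
    ("beq", "t0", "x0", 5),             # 2: terminator found, leave the loop
    ("addi", "a0", "a0", 1),            # 3: a0 = a0 + 1
    ("beq", "x0", "x0", 1),             # 4: always taken, back to the load
    ("addi", "t1", "x0", result_addr),  # 5: t1 = address of the result slot
    ("store", "a0", "t1"),              # 6: memory[t1] = a0
]

regs = run(strlen_program, memory)
print(regs["a0"], memory[result_addr])
```

Instruction 2 is the loop exit: the branch overwrites the program counter instead of letting it advance, which is how the program leaves the loop once the terminator is loaded.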

PERFORMANCE OF A CPU

Understanding “Performance”
 Algorithm (or a program) to execute
 Determines number of operations executed
 Programming language, compiler, architecture
 Determine number of machine instructions executed per operation
 Processor and memory system
 Determine how fast instructions are executed

Defining Performance
 Which airplane has the best performance?

Response Time and Throughput
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
 e.g., tasks/transactions/… per hour

 How are response time and throughput affected by


 Replacing the processor with a faster version?
 Adding more processors?
 We’ll focus on response time for now…

Relative Performance
• Define Performance = 1 / Execution Time
• “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: Time taken to run a program
• 10s on A, 15s on B
• Execution Time_B / Execution Time_A = 15s / 10s = 1.5
• So A is 1.5 times faster than B

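The definition above translates directly into code. A minimal sketch; the function name `relative_performance` is an assumption made for illustration.

```python
def relative_performance(time_x, time_y):
    """Return n such that 'X is n times faster than Y' (times in seconds)."""
    # Performance = 1 / execution time, so the ratio of performances
    # is the inverse ratio of execution times.
    return time_y / time_x


n = relative_performance(time_x=10.0, time_y=15.0)  # A takes 10s, B takes 15s
print(f"A is {n} times faster than B")
```

Note the inversion: the slower machine's time goes in the numerator.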
CPU Clocking
• Operation of digital hardware governed by a constant-rate clock

[Figure: clock waveform marking the clock period; within each clock cycle, data transfer and computation, then state update]

• Clock period: duration of a clock cycle
• e.g., 250ps = 0.25ns = 250×10⁻¹² s
• Clock frequency (rate): cycles per second
• e.g., 4.0GHz = 4000MHz = 4.0×10⁹ Hz

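Clock period and clock rate are reciprocals of each other, which is why the two example values on the slide describe the same clock. A small sketch, with helper names chosen for illustration:

```python
def period_to_rate(period_s):
    """Clock rate in Hz for a clock period given in seconds."""
    return 1.0 / period_s


def rate_to_period(rate_hz):
    """Clock period in seconds for a clock rate given in Hz."""
    return 1.0 / rate_hz


rate = period_to_rate(250e-12)  # 250 ps period
period = rate_to_period(4.0e9)  # 4.0 GHz clock
print(f"{rate / 1e9:.1f} GHz, {period * 1e12:.0f} ps")
```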
CPU Time (= Execution Time)

• Performance improved by
• Reducing number of clock cycles
• Increasing clock rate (frequency)
• Hardware designer must often trade off clock rate against cycle count

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

CPU Time Example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
• Aim for 6s CPU time
• Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$
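The steps of this example can be replayed in a few lines. This is a sketch using the slide's numbers; the variable and function names are chosen for illustration.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)

# Computer B needs 1.2x the clock cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
clock_rate_b = cycles_b / 6.0

print(f"Clock cycles A: {cycles_a:.3g}")
print(f"Required clock rate B: {clock_rate_b / 1e9:.1f} GHz")
```

Cutting CPU time from 10s to 6s would need only a 10/6 ≈ 1.67× faster clock at equal cycle count; the extra 20% of cycles pushes the requirement to 2× (4GHz).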
Instruction Count and CPI
• Instruction Count (IC) for a program
• Determined by program, ISA and compiler
• Average cycles per instruction (CPI)
• Determined by CPU hardware
• If different instructions have different CPI
• Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• In short: CPU Time = IC × CPI × clock period (Tc)
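CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Assume the same ISA → same instruction count (I)
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

• A is faster, by this much:

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$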

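The comparison can be reproduced with CPU Time = IC × CPI × cycle time. A minimal sketch: the instruction count of one million is an arbitrary assumption (it cancels in the ratio), and the names are illustrative.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time in seconds: IC x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


INSTRUCTION_COUNT = 1_000_000  # assumed; same ISA, so the same count on A and B

time_a = cpu_time(INSTRUCTION_COUNT, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time(INSTRUCTION_COUNT, cpi=1.2, cycle_time_s=500e-12)

speedup = time_b / time_a
print(f"A: {time_a * 1e6:.0f} us, B: {time_b * 1e6:.0f} us")
print(f"A is {speedup:.1f} times faster than B")
```

B has the lower CPI, but its cycle is twice as long, so A wins overall.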
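CPI in More Detail
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

• Instruction Count_i / Instruction Count is the relative frequency of class i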

CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5

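The weighted-average CPI of the two code sequences can be computed as follows. A sketch using the table's numbers; the dictionary layout and function names are assumptions made for illustration.

```python
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(instruction_counts):
    """Sum of CPI_i x IC_i over all instruction classes."""
    return sum(CPI_PER_CLASS[cls] * count
               for cls, count in instruction_counts.items())


def average_cpi(instruction_counts):
    """Total clock cycles divided by total instruction count."""
    return total_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in [("Sequence 1", sequence_1), ("Sequence 2", sequence_2)]:
    print(name, "cycles:", total_cycles(seq), "avg CPI:", average_cpi(seq))
```

Sequence 2 executes more instructions (6 vs 5) yet needs fewer clock cycles (9 vs 10), so instruction count alone does not decide which sequence is faster.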
Performance Summary
• Performance depends on
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc (clock cycle time)
• Hardware design: affects Tc

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

SPEC CPU Benchmark
• Programs used to measure performance
• Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
• Develops benchmarks for CPU, I/O, …
• SPEC CPU2006
• Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
• Normalize relative to reference machine
• Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

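The geometric mean summary can be sketched as below. The three ratios are invented placeholder values, not SPEC results, and `geometric_mean` is an illustrative helper (Python's `statistics.geometric_mean` offers the same computation).

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution time ratios (reference time / measured time).
example_ratios = [2.0, 8.0, 4.0]

summary = geometric_mean(example_ratios)
print(f"Geometric mean of ratios: {summary:.2f}")
```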
SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L

Amdahl’s Law
• Pitfall: Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

• Example: multiply accounts for 80s/100s
• How much improvement in multiply performance to get 5× overall?
• A 5× overall speedup means a total time of 100s / 5 = 20s, but the unaffected 20s alone already uses all of it:

$$20 = \frac{80}{n} + 20 \quad \text{Can’t be done!}$$

• Corollary: make the common case fast!

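The formula and the multiply example can be checked with a short sketch. The function names are assumptions made for illustration; `required_factor` simply solves the formula above for the improvement factor.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: only the affected part shrinks by the improvement factor."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # the unaffected part alone already takes too long
    return t_affected / remaining


# Multiply accounts for 80 s of a 100 s run; the other 20 s is unaffected.
print(required_factor(80.0, 20.0, target_time=100.0 / 5))    # 5x overall
print(required_factor(80.0, 20.0, target_time=100.0 / 2.5))  # 2.5x overall
print(improved_time(80.0, 20.0, factor=1000.0))
```

Even a 1000× faster multiply leaves the total just above 20s, so the overall speedup stays below 5×.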
Concluding Remarks
 Cost/performance is improving
 Due to underlying technology development
 Hierarchical layers of abstraction
 In both hardware and software
 Instruction set architecture
 The hardware/software interface
 Execution time: the best performance measure
 Power is a limiting factor
 Use parallelism to improve performance
