CPU Performance and Power Consumption
CPU performance.
CPU power consumption.
01/22/25
Elements of CPU performance
Cycle time.
CPU pipeline.
Memory system.
Pipelining
Performance measures
ARM7 pipeline
ARM pipeline execution
[Figure: pipeline execution diagram; time axis labeled 1, 2, 3.]
Pipeline stalls
ARM multi-cycle LDMIA instruction
[Figure: pipeline diagram; horizontal axis is time.]
Control stalls
ARM pipelined branch
[Figure: pipeline diagram; horizontal axis is time.]
Delayed branch
Example: ARM execution time
FIR filter ARM code
; loop initiation code
        MOV r0,#0        ; use r0 for i, set to 0
        MOV r8,#0        ; use a separate index for arrays
        ADR r2,N         ; get address for N
        LDR r1,[r2]      ; get value of N
        MOV r2,#0        ; use r2 for f, set to 0
        ADR r3,c         ; load r3 with address of base of c
        ADR r5,x         ; load r5 with address of base of x
; loop body
loop    LDR r4,[r3,r8]   ; get value of c[i]
        LDR r6,[r5,r8]   ; get value of x[i]
        MUL r4,r4,r6     ; compute c[i]*x[i]
        ADD r2,r2,r4     ; add into running sum
; update loop counter and array index
        ADD r8,r8,#4     ; advance array index by one word (4 bytes)
        ADD r0,r0,#1     ; add 1 to i
; test for exit
        CMP r0,r1
        BLT loop         ; if i < N, continue loop
loopend ...
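For reference, the computation performed by the assembly loop above can be sketched in Python. This is an illustrative restatement, not part of the original slides: the function name fir is hypothetical, and the list length stands in for N. Like the assembly, the sketch tests for exit at the bottom of the loop, so it assumes N >= 1.

```python
def fir(c, x):
    """Return the sum of c[i] * x[i], mirroring the structure of the ARM loop.

    Hypothetical helper for illustration; assumes len(c) == len(x) >= 1.
    """
    n = len(c)   # plays the role of N (held in r1)
    f = 0        # running sum (held in r2)
    i = 0        # loop counter (held in r0)
    while True:
        f += c[i] * x[i]   # loop body: two loads, multiply, accumulate
        i += 1             # update: advance the counter (and array index)
        if not i < n:      # test: CMP r0,r1 followed by BLT loop
            break
    return f


if __name__ == "__main__":
    print(fir([1, 2, 3], [4, 5, 6]))
```

The bottom-tested loop is why the exit test in the assembly runs once per iteration, with the branch taken on every iteration except the last.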
FIR filter performance by block
Block            Variable   # instructions   # cycles
Initialization   t_init     7                7
Body             t_body     4                4
Update           t_update   2                2
Test             t_test     2                [2,4]
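The per-block cycle counts in the table can be combined into a total cycle count for the loop. The sketch below is a rough illustration under stated assumptions: it reads the range [2,4] as 2 cycles when the loop-test branch falls through and 4 cycles when it is taken, and it assumes the loop runs N >= 1 times, so the branch is taken N - 1 times and falls through once. The constant and function names are hypothetical.

```python
# Cycle counts per block, taken from the table.
T_INIT = 7            # initialization block
T_BODY = 4            # loop body
T_UPDATE = 2          # counter and index update
T_TEST_NOT_TAKEN = 2  # assumed: lower bound of [2,4], branch falls through
T_TEST_TAKEN = 4      # assumed: upper bound of [2,4], branch taken


def fir_loop_cycles(n):
    """Estimate total cycles for n iterations of the FIR loop (n >= 1)."""
    if n < 1:
        raise ValueError("the loop tests at the bottom, so n must be >= 1")
    return (T_INIT
            + n * (T_BODY + T_UPDATE)
            + (n - 1) * T_TEST_TAKEN   # branch back to loop
            + T_TEST_NOT_TAKEN)        # final fall-through to loopend


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, fir_loop_cycles(n))
```

Under these assumptions the estimate grows by 10 cycles per additional iteration: 6 for the body and update, plus 4 for the taken branch.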
C55x organization
[Figure: C55x block diagram.]
Units: instruction unit, program flow unit, address unit, data unit.
3 data read busses (B, C, D busses): 16 bits.
3 data read address busses: 24 bits.
Program address bus: 24 bits.
Program read bus: 32 bits.
2 data write busses: 16 bits.
2 data write address busses: 24 bits.
Paths labeled in the figure: instruction fetch, single operand read, dual operand read, coefficient, data from memory, writes.
C55x pipeline hazards
Processor structure:
Three computation units.
14 operators.
Can perform two operations per instruction.
Some combinations of operators are not legal.
C55x hazards
A-unit ALU/A-unit ALU.
A-unit swap/A-unit swap.
D-unit ALU, shifter, MAC/D-unit ALU, shifter, MAC.
D-unit shifter/D-unit shift, store.
D-unit shift, store/D-unit shift, store.
D-unit swap/D-unit swap.
P-unit control/P-unit control.
Memory system performance
Types of cache misses
CPU power consumption
CMOS power consumption
CPU power-saving strategies
C55x low power features
Parallel execution units: longer idle shutdown times.
Multiple data widths:
16-bit ALU vs. 40-bit ALU.
Instruction cache minimizes main memory accesses.
Power management:
Function unit idle detection.
Memory idle detection.
User-configurable IDLE domains allow programmer control of what hardware is shut down.
Power management styles
Application: PowerPC 603 energy features
PowerPC 603 activity
Percentage of time units are idle for SPEC integer/floating-point:
Unit              SPECint92   SPECfp92
D cache           29%         28%
I cache           29%         17%
load/store        35%         17%
fixed-point       38%         76%
floating-point    99%         30%
system register   89%         97%
Power-down costs
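SA-1100 power state machine
[Figure: power state machine with three states: run (Prun = 400 mW), idle, and sleep. Transition times labeled in the figure: 10 µs, 160 ms, 90 µs, 10 µs, 90 µs.]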