CPU Performance and Power Consumption

The document discusses CPU performance, focusing on factors such as cycle time, pipelining, and memory systems. It details the ARM7 and C55x pipeline architectures, their execution processes, and the impact of pipeline stalls and hazards on performance. Additionally, it covers CPU power consumption strategies, types of cache misses, and specific power-saving features of processors like PowerPC 603 and StrongARM SA-1100.

Uploaded by

kavyapirangi

CPUs

CPU performance
CPU power consumption.

01/22/25
Elements of CPU performance

Cycle time.
CPU pipeline.
Memory system.

Pipelining

• Several instructions are executed simultaneously at different stages of completion.
• Various conditions can cause pipeline bubbles that reduce utilization:
  - branches;
  - memory system delays;
  - etc.

Performance measures

• Latency: time it takes for an instruction to get through the pipeline.
• Throughput: number of instructions executed per time period.
• Pipelining increases throughput without reducing latency.
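
The difference between the two measures can be sketched numerically. This is a minimal model that assumes an ideal pipeline (every stage takes one cycle, no stalls); the function name and the sample instruction counts are illustrative, not from the slides.

```python
def pipeline_timing(num_instructions, num_stages):
    """Ideal pipeline: one cycle per stage, no stalls (an assumption).

    Returns (latency, total_cycles, throughput):
      latency      - cycles one instruction spends in the pipe
      total_cycles - cycles until the last instruction completes
      throughput   - instructions completed per cycle over the run
    """
    latency = num_stages
    total_cycles = num_stages + num_instructions - 1
    throughput = num_instructions / total_cycles
    return latency, total_cycles, throughput


if __name__ == "__main__":
    for n in (1, 3, 100):
        latency, cycles, ipc = pipeline_timing(n, num_stages=3)
        print(f"n={n}: latency={latency}, total={cycles}, throughput={ipc:.2f}/cycle")
```

Latency stays at the pipe depth for every instruction, while throughput approaches one instruction per cycle as the instruction count grows.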

ARM7 pipeline

• ARM7 has a 3-stage pipe:
  - fetch instruction from memory;
  - decode opcode and operands;
  - execute.

ARM pipeline execution

time              1        2        3        4        5
add r0,r1,#5      fetch    decode   execute
sub r2,r3,r6               fetch    decode   execute
cmp r2,#3                           fetch    decode   execute

Pipeline stalls

• If not every step can be completed in the same amount of time, the pipeline stalls.
• Bubbles introduced by a stall increase latency and reduce throughput.

ARM multi-cycle LDMIA instruction

time              1         2         3         4         5         6
ldmia r0,{r2,r3}  fetch     decode    ex ld r2  ex ld r3
sub r2,r3,r6                fetch     decode    (stall)   ex sub
cmp r2,#3                             fetch     (stall)   decode    ex cmp
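
The LDMIA stall can be reproduced with a small scheduling sketch. It assumes a simplified in-order 3-stage pipe in which fetch and decode take one cycle each, each stage holds one instruction, and only the execute stage can take several cycles. The function name and tuple layout are illustrative.

```python
def schedule(execute_cycles):
    """Schedule instructions through a simplified 3-stage in-order pipe.

    execute_cycles[i] is the number of cycles instruction i spends in
    the execute stage. Returns one tuple per instruction:
    (fetch_cycle, decode_cycle, exec_start, exec_end), 1-indexed.
    """
    timeline = []
    for i, ex in enumerate(execute_cycles):
        if i == 0:
            fetch, decode, ex_start = 1, 2, 3
        else:
            _, prev_decode, prev_ex_start, prev_ex_end = timeline[-1]
            # Fetch stage frees up when the previous instruction enters decode.
            fetch = prev_decode
            # Decode stage frees up when the previous instruction enters execute.
            decode = max(fetch + 1, prev_ex_start)
            # Execute stage frees up when the previous instruction finishes.
            ex_start = max(decode + 1, prev_ex_end + 1)
        timeline.append((fetch, decode, ex_start, ex_start + ex - 1))
    return timeline


if __name__ == "__main__":
    names = ["ldmia r0,{r2,r3}", "sub r2,r3,r6", "cmp r2,#3"]
    for name, row in zip(names, schedule([2, 1, 1])):
        print(f"{name:18s} fetch={row[0]} decode={row[1]} execute={row[2]}..{row[3]}")
```

With execute times of 2, 1, 1 the last instruction finishes in cycle 6 instead of cycle 5, which is the one-cycle bubble caused by the second load.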

Control stalls

• Branches often introduce stalls (branch penalty).
  - Stall time may depend on whether the branch is taken.
• May have to squash instructions that already started executing.

ARM pipelined branch

time              1         2         3         4         5         6
bne foo           fetch     decode    ex bne    ex bne    ex bne
sub r2,r3,r6                fetch     decode
foo add r0,r1,r2                                fetch     decode    ex add

Delayed branch

• To increase pipeline efficiency, a delayed branch mechanism requires that the n instructions after the branch always be executed, whether or not the branch is taken.

Example: ARM execution time

• Determine execution time of FIR filter:
    for (i=0; i<N; i++)
        f = f + c[i]*x[i];
• Only the branch in the loop test may take more than one cycle.
  - BLT loop takes 1 cycle best case, 3 worst case.

FIR filter ARM code

; loop initiation code
        MOV r0,#0       ; use r0 for i, set to 0
        MOV r8,#0       ; use a separate index for arrays
        ADR r2,N        ; get address for N
        LDR r1,[r2]     ; get value of N
        MOV r2,#0       ; use r2 for f, set to 0
        ADR r3,c        ; load r3 with address of base of c
        ADR r5,x        ; load r5 with address of base of x
; loop body
loop    LDR r4,[r3,r8]  ; get value of c[i]
        LDR r6,[r5,r8]  ; get value of x[i]
        MUL r4,r4,r6    ; compute c[i]*x[i]
        ADD r2,r2,r4    ; add into running sum
; update loop counter and array index
        ADD r8,r8,#4    ; add one word (4 bytes) to array index
        ADD r0,r0,#1    ; add 1 to i
; test for exit
        CMP r0,r1
        BLT loop        ; if i < N, continue loop
loopend ...

FIR filter performance by block

Block            Variable    # instructions   # cycles
Initialization   t_init      7                7
Body             t_body      4                4
Update           t_update    2                2
Test             t_test      2                [2,4]

$$t_{loop} = t_{init} + N\,(t_{body} + t_{update}) + (N-1)\,t_{test,worst} + t_{test,best}$$

• Worst case for the test: the loop test succeeds (BLT is taken).
• Best case for the test: the loop test fails (BLT falls through on the last iteration).

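
The loop-time formula can be evaluated directly. This is a minimal sketch using the per-block cycle counts from the table as defaults; the function name is illustrative, and it assumes N >= 1 because the ARM code tests at the bottom of the loop.

```python
def fir_loop_cycles(n, t_init=7, t_body=4, t_update=2,
                    t_test_best=2, t_test_worst=4):
    """Cycle count for the FIR loop, per the per-block table.

    The test block costs t_test_worst on the first n-1 iterations
    (branch taken) and t_test_best on the last one (branch not taken).
    """
    if n < 1:
        raise ValueError("the loop body runs at least once, so n must be >= 1")
    return (t_init
            + n * (t_body + t_update)
            + (n - 1) * t_test_worst
            + t_test_best)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"N={n}: {fir_loop_cycles(n)} cycles")
```

For large N the count grows by 10 cycles per iteration, of which 4 come from the loop test.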
C55x pipeline

• C55x has a 7-stage pipe:
  - fetch;
  - decode;
  - address: computes data/branch addresses;
  - access 1: reads data;
  - access 2: finishes data read;
  - read: puts operands on internal busses;
  - execute.

C55x organization

[Figure: block diagram of the instruction unit, program flow unit, address unit, and data unit. Busses shown: 3 data read busses (B, C, D; 16 bits), 3 data read address busses (24 bits), program address bus (24 bits), program read bus (32 bits), 2 data write busses (16 bits), 2 data write address busses (24 bits). Labeled paths: instruction fetch, single operand read, dual operand read, coefficient read from memory, writes.]

C55x pipeline hazards

• Processor structure:
  - Three computation units.
  - 14 operators.
• Can perform two operations per instruction.
• Some combinations of operators are not legal.

C55x hazards

• A-unit ALU / A-unit ALU.
• A-unit swap / A-unit swap.
• D-unit ALU, shifter, MAC / D-unit ALU, shifter, MAC.
• D-unit shifter / D-unit shift, store.
• D-unit shift, store / D-unit shift, store.
• D-unit swap / D-unit swap.
• P-unit control / P-unit control.

Memory system performance

• Caches introduce indeterminacy in execution time.
  - Depends on order of execution.
• Cache miss penalty: added time due to a cache miss.

Types of cache misses

• Compulsory miss: location has not been referenced before.
• Conflict miss: two locations are fighting for the same block.
• Capacity miss: working set is too large.
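
One common way to tell the three kinds apart is to compare the real cache against a fully associative LRU cache of the same size. The sketch below does this for a direct-mapped cache with one block per line. The function name and the cache model are assumptions for illustration, not part of the slides.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_lines):
    """Classify each access to a direct-mapped cache of num_lines lines.

    compulsory - first reference to the block
    capacity   - would also miss in a fully associative LRU cache
                 of the same size
    conflict   - misses only because of the direct-mapped placement
    """
    direct = [None] * num_lines      # the cache being studied
    fully = OrderedDict()            # reference fully associative LRU cache
    seen = set()
    counts = {"hit": 0, "compulsory": 0, "conflict": 0, "capacity": 0}

    for block in block_addresses:
        index = block % num_lines
        direct_hit = direct[index] == block
        fully_hit = block in fully

        # Update the reference cache (LRU replacement).
        if fully_hit:
            fully.move_to_end(block)
        else:
            if len(fully) >= num_lines:
                fully.popitem(last=False)
            fully[block] = True
        direct[index] = block

        if direct_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif fully_hit:
            counts["conflict"] += 1
        else:
            counts["capacity"] += 1
        seen.add(block)
    return counts


if __name__ == "__main__":
    # Blocks 0 and 4 map to the same line of a 4-line cache.
    print(classify_misses([0, 4, 0, 4], num_lines=4))
    # Four distinct blocks cycling through a 2-line cache.
    print(classify_misses([0, 1, 2, 3, 0], num_lines=2))
```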

CPU power consumption

• Most modern CPUs are designed with power consumption in mind to some degree.
• Power vs. energy:
  - heat depends on power consumption;
  - battery life depends on energy consumption.

CMOS power consumption

• Voltage drops: power consumption proportional to $V^2$.
• Toggling: more activity means more power.
• Leakage: basic circuit characteristics; can be eliminated by disconnecting power.
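
The voltage and toggling effects are often combined in the first-order dynamic power model P = a·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). That model and the function names below are assumptions added for illustration; the slide itself only states the V² dependence. Leakage is ignored here.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power in watts: a * C * V^2 * f.

    Leakage is not modeled (an assumption of this sketch).
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def relative_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of dynamic power after scaling supply voltage and clock."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Halving the supply voltage alone cuts dynamic power to one quarter.
    print(relative_power(1.5, 3.0))
    # Halving both voltage and frequency cuts it to one eighth.
    print(relative_power(1.5, 3.0, f_new=0.5, f_old=1.0))
```

Lowering frequency alone reduces power but not the energy per operation, since the work takes proportionally longer. Lowering voltage reduces both.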

CPU power-saving strategies

• Reduce power supply voltage.
• Run at lower clock frequency.
• Disable function units with control signals when not in use.
• Disconnect parts from power supply when not in use.

C55x low power features

• Parallel execution units: longer idle shutdown times.
• Multiple data widths:
  - 16-bit ALU vs. 40-bit ALU.
• Instruction cache minimizes main memory accesses.
• Power management:
  - Function unit idle detection.
  - Memory idle detection.
  - User-configurable IDLE domains allow programmer control of what hardware is shut down.

Power management styles

• Static power management: does not depend on CPU activity.
  - Example: user-activated power-down mode.
• Dynamic power management: based on CPU activity.
  - Example: disabling function units that are not in use.

Application: PowerPC 603 energy features

• Provides doze, nap, sleep modes.
• Dynamic power management features:
  - Uses static logic.
  - Can shut down unused execution units.
  - Cache organized into subarrays to minimize amount of active circuitry.

PowerPC 603 activity

• Percentage of time units are idle for SPEC integer/floating-point:

unit              SPECint92   SPECfp92
D cache           29%         28%
I cache           29%         17%
load/store        35%         17%
fixed-point       38%         76%
floating-point    99%         30%
system register   89%         97%

Power-down costs

• Going into a power-down mode costs:
  - time;
  - energy.
• Must determine if going into mode is worthwhile.
• Can model CPU power states with power state machine.

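
Whether a power-down is worthwhile can be framed as a break-even idle time. The sketch below assumes a simple two-state model: staying on costs p_on for the whole idle period, while powering down costs a fixed transition time and energy (entry plus exit combined) and p_low for the remainder. All names and the sample numbers are hypothetical.

```python
def break_even_time(p_on_w, p_low_w, transition_time_s, transition_energy_j):
    """Shortest idle period (seconds) for which powering down saves energy.

    Solves  E_tr + p_low * (T - t_tr) = p_on * T  for T, and never
    returns less than the transition time itself.
    """
    if p_low_w >= p_on_w:
        raise ValueError("low-power state must draw less than the on state")
    t = (transition_energy_j - p_low_w * transition_time_s) / (p_on_w - p_low_w)
    return max(transition_time_s, t)


def worth_powering_down(idle_s, p_on_w, p_low_w,
                        transition_time_s, transition_energy_j):
    """True if the expected idle period exceeds the break-even time."""
    return idle_s > break_even_time(p_on_w, p_low_w,
                                    transition_time_s, transition_energy_j)


if __name__ == "__main__":
    # Hypothetical part: 0.4 W on, 0 W off, 0.1 s and 0.08 J to go down and back.
    print(break_even_time(0.4, 0.0, 0.1, 0.08))
    print(worth_powering_down(1.0, 0.4, 0.0, 0.1, 0.08))
    print(worth_powering_down(0.05, 0.4, 0.0, 0.1, 0.08))
```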
Application: StrongARM SA-1100 power saving

• Processor takes two supplies:
  - VDD is main 3.3V supply.
  - VDDX is 1.5V.
• Three power modes:
  - Run: normal operation.
  - Idle: stops CPU clock, with logic still powered.
  - Sleep: shuts off most of chip activity; 3 steps, each about 30 µs; wakeup takes > 10 ms.

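SA-1100 power state machine

• State power:
  - run: P_run = 400 mW
  - idle: P_idle = 50 mW
  - sleep: P_sleep = 0.16 mW
• Transition times:
  - run to idle: 10 µs
  - idle to run: 10 µs
  - run to sleep: 90 µs
  - idle to sleep: 90 µs
  - sleep to run: 160 ms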
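
The power state machine can be encoded as two lookup tables and used to estimate the energy of a sequence of states. The sketch takes the state powers as 400 mW, 50 mW, and 0.16 mW and the transition times as 10 µs (run/idle in either direction), 90 µs (run or idle to sleep), and 160 ms (sleep to run). The figure gives no transition energies, so the code assumes a transition draws the larger of the two state powers. That assumption and the function name are illustrative only.

```python
POWER_MW = {"run": 400.0, "idle": 50.0, "sleep": 0.16}

# Transition times in seconds; pairs not listed are not allowed.
TRANSITION_S = {
    ("run", "idle"): 10e-6,
    ("idle", "run"): 10e-6,
    ("run", "sleep"): 90e-6,
    ("idle", "sleep"): 90e-6,
    ("sleep", "run"): 160e-3,
}


def trace_energy(trace):
    """Return (total_seconds, energy_mJ) for a list of (state, dwell_s)."""
    total_s = 0.0
    energy_mj = 0.0
    previous = None
    for state, dwell_s in trace:
        if previous is not None:
            if (previous, state) not in TRANSITION_S:
                raise ValueError(f"no transition {previous} -> {state}")
            t = TRANSITION_S[(previous, state)]
            # ASSUMPTION: a transition draws the larger of the two powers.
            p = max(POWER_MW[previous], POWER_MW[state])
            total_s += t
            energy_mj += p * t          # mW * s = mJ
        total_s += dwell_s
        energy_mj += POWER_MW[state] * dwell_s
        previous = state
    return total_s, energy_mj


if __name__ == "__main__":
    for middle in ("idle", "sleep"):
        t, e = trace_energy([("run", 1.0), (middle, 10.0), ("run", 1.0)])
        print(f"run/{middle}/run: {t:.5f} s, {e:.3f} mJ")
```

Under these assumptions a 10 s pause costs far less in sleep than in idle, even after paying for the 160 ms wakeup at run power.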
