Chip Basics: Time, Area, Power, Reliability, Configurability

The document discusses chip design basics related to time, area, power, reliability and configurability. It covers topics like clocking, pipelining, tradeoffs in IP selection, optimal pipelining, die floorplanning methodology, wafer manufacturing and yield modeling.

Chapter 2

Chip Basics: Time, Area, Power, Reliability, Configurability

Computer System Design: System-on-Chip
by M. Flynn & W. Luk
Pub. Wiley 2011 (copyright 2011)

soc 2.1
Basic design issue: Time

• clocking
• pipelining
– optimal pipelining
– pipeline partitioning
– wave pipelining and low overhead clocking

soc 2.2
SIA roadmap

soc 2.3
Tradeoffs in IP selection and
design: performance, area, power

soc 2.4
Clock parameters
• parameters
– Pmax: maximum delay through logic
– Pmin: minimum delay through logic
– $\Delta t$: cycle time (in seconds per cycle)
– tw: clock pulse width
– tg: data setup time
– td: register output delay
– C: total clocking overhead
• $\Delta t = P_{max} + C$
[Figure: timing diagram of one cycle $\Delta t$, with td, Pmax, tg and tw marked]
soc 2.5
Skew
• skew: uncertainty in the clock arrival time
• two types of skew
– depends on $\Delta t$: skew = $k P_{max}$, where k is a fraction and Pmax is the segment delay that determines $\Delta t$
• large segments may have longer delay and skew
• part of the skew varies with Leff, like segment delay
– independent of $\Delta t$: skew = $\delta$
• can relate to clock routing, jitter from environmental conditions,
and other effects unrelated to segment delay
• effect of skew = $k P_{max} + \delta$
– skew range adds directly to the clock overhead
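A minimal sketch of how the two skew terms could be folded into the cycle time, assuming they simply add to a base clock overhead that excludes skew. The function name and the numeric values are illustrative assumptions, not taken from the slides.

```python
def cycle_time_with_skew(p_max, c, k, delta):
    """Cycle time when skew adds directly to the clock overhead.

    p_max : longest segment delay (ns)
    c     : clock overhead excluding skew (ns), an assumption of this sketch
    k     : skew as a fraction of p_max (the part that scales with delay)
    delta : fixed skew, independent of the cycle time (ns)
    """
    skew = k * p_max + delta
    return p_max + c + skew

# Illustrative values only (assumed).
print(cycle_time_with_skew(p_max=2.0, c=0.3, k=0.05, delta=0.1))
```

Raising k penalises long segments more than short ones, while delta costs the same at any pipeline depth.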
soc 2.6
Optimal pipelining
• let the total instruction execution without pipelining and
associated clock overhead be T
• in a pipelined processor, let S be the number of segments
– S - 1 is the number of cycles lost due to a pipeline break
• let b = probability of a break, C = clock overhead including fixed skew

soc 2.7
Optimum pipelining

[Figure: execution time T divided into functional units P1, P2, P3, P4]

$P_{max,i}$ = delay of the i-th functional unit

suppose $T = \sum_i P_{max,i}$ without clock overhead

S = number of pipeline segments
C = clock overhead
$T/S \ge \max_i (P_{max,i})$ [quantization]
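A small sketch of the quantization constraint, reading the inequality as an upper bound on S when functional units cannot be split. The unit delays are hypothetical values chosen for illustration.

```python
def max_segments(unit_delays):
    """Largest S with T/S >= max(Pmax_i), treating each unit as atomic.

    Returns (S_max, T, longest unit delay).
    """
    total = sum(unit_delays)        # T, ignoring clock overhead
    longest = max(unit_delays)      # slowest functional unit
    return int(total // longest), total, longest

# Four hypothetical functional-unit delays in ns (assumed values).
s_max, t_total, longest = max_segments([3.0, 4.0, 5.0, 3.0])
print(s_max, t_total, longest)
```

With these numbers the slowest unit caps the design at three segments, even if the optimum computed from b and C is larger.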

soc 2.8
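$\Delta t = T/S + C$ (cycle time = average time per segment + clock overhead)
performance = $1 / (1 + (S - 1) b)$ [IPC]
throughput = G = performance / $\Delta t$ [IPS]

$G = \dfrac{1}{1 + (S - 1) b} \cdot \dfrac{1}{T/S + C}$

Find S for optimum performance by setting $dG/dS = 0$ and solving for S:

we get $S_{opt} = \sqrt{\dfrac{(1 - b)\, T}{b\, C}}$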
soc 2.9
Find Sopt
• estimate b
– use instruction traces
• find T and C from design details
– feasibility studies

• example:

b     k      T (ns)   C (ns)   Sopt   G (MIPS)   f (MHz)   CPI    Clock overhead %
0.1   0.05   15       0.5      16.8   270        697       2.58   34.8%
0.1   0.05   15       1        11.9   206        431       2.09   43.1%
0.2   0.05   15       0.5      11.2   173        525       3.04   26.3%
0.2   0.05   15       1        7.9    140        335       2.39   33.5%
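The example rows can be reproduced with a short script. This sketch assumes the variable skew k stretches each segment, so the cycle time is (1 + k)T/S + C and T is replaced by (1 + k)T in the optimum; that reading is an assumption, chosen because it matches the tabulated values. The function name is illustrative.

```python
import math

def optimum_pipeline(b, k, t_total, c):
    """Return (S_opt, G in MIPS, f in MHz, CPI, clock overhead fraction).

    Assumes cycle time = (1 + k) * T / S + C, with all times in ns.
    """
    s_opt = math.sqrt((1 - b) * (1 + k) * t_total / (b * c))
    cycle = (1 + k) * t_total / s_opt + c   # ns per cycle
    cpi = 1 + (s_opt - 1) * b               # cycles per instruction
    f_mhz = 1e3 / cycle                     # 1/ns is GHz, so scale to MHz
    g_mips = f_mhz / cpi
    return s_opt, g_mips, f_mhz, cpi, c / cycle

for b, c in [(0.1, 0.5), (0.1, 1.0), (0.2, 0.5), (0.2, 1.0)]:
    s, g, f, cpi, ovh = optimum_pipeline(b, 0.05, 15.0, c)
    print(f"b={b} C={c}: S_opt={s:.1f} G={g:.0f} MIPS "
          f"f={f:.0f} MHz CPI={cpi:.2f} overhead={ovh:.1%}")
```

Doubling either b or C shrinks the optimum depth by a factor of about 1.4, since both appear under the square root.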
soc 2.10
Quantization + other considerations
• quantization effects
– T cannot be arbitrarily divided into segments
– segments defined by functional unit delays
– some segments cannot be divided; others can be
divided only at particular boundaries
• some functional operations are atomic
– a cycle usually does not cross a functional unit boundary
• Sopt
– ignores cost/area of extra pipeline stages
– ignores quantization loss
– is the largest S that should be used
soc 2.11
Microprocessor design practice
• tradeoff around design target
• optimal in-order integer RISC: 5-10 stages
– performance: relatively flat across this range
– deeper for out-of-order or complex ISA
(e.g. Intel Architectures)
• use longer pipeline (higher frequency) if
– FP/multimedia vector performance important
– clock overhead low
• else use shorter pipeline
– especially if area/power/effort are critical
soc 2.12
Advanced circuit techniques
• asynchronous or self-timed clocking
– avoids clock distribution problems
but has its own overhead
• multi-phase domino clocking
– skew tolerant and low clock overhead;
lots of power required and extra area
• wave pipelining
– ultimate limit on $\Delta t$:
$\Delta t = P_{max} - P_{min} + C$

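A brief sketch comparing the conventional cycle time with the wave-pipelining limit above. The delay values are assumed for illustration only.

```python
def conventional_cycle(p_max, c):
    """Cycle time when each stage must finish before the next wave starts."""
    return p_max + c

def wave_pipelined_cycle(p_max, p_min, c):
    """Limit when several data waves share the logic: only the delay spread counts."""
    return p_max - p_min + c

# Assumed delays in ns.
p_max, p_min, c = 2.0, 1.5, 0.25
print(conventional_cycle(p_max, c), wave_pipelined_cycle(p_max, p_min, c))
```

The gain depends entirely on how closely Pmin can be matched to Pmax; if Pmin is near zero the two cycle times coincide.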
soc 2.13
Basic Design Issues: Silicon Area,
Power, Reliability, Reconfiguration
• die floorplanning methodology
• area-cost model
• power analysis and model
• reliability
• reconfigurable design
• soft processors

soc 2.14
AMD Barcelona multicore

http://www.techwarelabs.com/reviews/processors/barcelona/
soc 2.15


Die floorplanning methodology
• pick target cost based on market requirements
• determine total area available within cost budget
– defect and yield model
• compute net available area for processors,
caches and memory
– account for I/O, buses, test hooks, I/O pads etc.
• select core processors and assess area and
performance
• re-allocate area to optimize performance
– cache, signal processors, multimedia processors, etc.

soc 2.16
Wafers and chips

suppose the wafer has diameter d and each die is square with area A
soc 2.17
Wafers and chips: example
If N is the number of dice on the wafer,
$N = \pi (d - \sqrt{A})^2 / (4A)$ [Gross Yield]

Let $N_G$ be the number of good dice and $N_D$ be the number of defects on a wafer.
Given N dice of which NG are good.....suppose we randomly add
1 new defect to the wafer. What’s the probability that it strikes a
good die....and changes NG ?

soc 2.18
Probability of the defect hitting a good die = $N_G / N$
The change in $N_G$ is $dN_G / dN_D = -N_G / N$
Rewriting this we get $dN_G / N_G = -(1/N)\, dN_D$
Integrating and solving: $\ln(N_G) = -N_D / N + C$

Since $N_G = N$ when $N_D = 0$, C must be $\ln(N)$

$N_G / N = \text{Yield} = e^{-N_D / N}$

let defect density (defects/cm²) = D

$N_D = D \times \text{wafer area} = D \times A \times N$

$\text{Yield} = N_G / N = e^{-DA}$
typically D = 0.3 to 1.0 defects/cm²
soc 2.19
Using yield to size a die
to find the cost per die:
1. find N, the number of dice on a wafer
2. find Yield
3. find NG = Yield x N
4. cost/die = wafer cost / NG

Wafer diameter (cm)   Defect density (per cm²)   Wafer cost ($)   Die size (cm)   Gross yield   Yield   Good dice   Cost per good die ($)
21                    1                          5000             1               314           0.37    116         43
21                    1                          5000             1.5             133           0.11    14          357
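The four steps can be sketched as one function. It assumes square dice, the exponential yield model, and a gross die count of π(d − √A)²/(4A); that gross-count form is an assumption here, chosen because it reproduces the 314 and 133 in the table. The function name is illustrative.

```python
import math

def cost_per_good_die(wafer_diameter, defect_density, wafer_cost, die_side):
    """All lengths in cm. Returns (gross dice, yield, good dice, cost per good die)."""
    area = die_side ** 2
    gross = math.pi * (wafer_diameter - math.sqrt(area)) ** 2 / (4 * area)
    y = math.exp(-defect_density * area)     # Yield = exp(-D * A)
    good = gross * y
    return gross, y, good, wafer_cost / good

for side in (1.0, 1.5):
    gross, y, good, cost = cost_per_good_die(21, 1.0, 5000, side)
    print(f"side={side} cm: gross={gross:.0f} yield={y:.2f} "
          f"good={good:.0f} cost=${cost:.0f}")
```

Going from a 1 cm to a 1.5 cm die side multiplies the area by 2.25 but the cost per good die by more than 8, because fewer dice fit and fewer of them are good.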

soc 2.20
Effect of defect density

soc 2.21
What can be put on the die?
• depends on the lithography and die area
• lithography is determined by f, the minimum
feature size
• feature size is related to the mask
registration variation $\lambda$
– $f = 2\lambda$

soc 2.22
Smallest device: $5\lambda \times 5\lambda$
[Figure: layout of the smallest device with dimensions marked in $\lambda$]


soc 2.23
Area Units: rbe and A
• rbe: small area unit for sizing functional units
of the processor
• suppose we define another larger unit, A, as
$1\,A = f^2 \times 10^6$; then $1\,A = 10^6 / 675 = 1481$ rbe
• since 1481 is close to 1444 we can also refer
to the simple register file as occupying 1 A

Unit   Description               Relative size
λ      mask registration         λ
f      minimum feature size      f = 2λ
rbe    register bit equivalent   rbe = 2700 λ² = 675 f²
A      functional unit area      A = 10⁶ f² = 1481 rbe
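The unit relationships can be checked with a few lines. This is a sketch; the constant and function names are illustrative, and the 0.13 micron feature size is used only as an example input.

```python
RBE_IN_F2 = 675        # 1 rbe = 675 f^2 = 2700 lambda^2
A_IN_F2 = 10 ** 6      # 1 A = 10^6 f^2

def rbe_per_a():
    """Number of register bit equivalents in one A unit."""
    return A_IN_F2 / RBE_IN_F2

def a_unit_mm2(f_um):
    """Physical area of one A unit in mm^2 for a feature size f in microns."""
    return A_IN_F2 * (f_um * 1e-3) ** 2

print(round(rbe_per_a()))
print(a_unit_mm2(0.13))
```

Because A is defined in terms of f², the same design keeps its size in A units across process generations while its physical area shrinks.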
soc 2.24
Area of other cells
• 1 register bit = 1 rbe
• 1 CAM bit = 2 rbe
• 1 cache bit (6 tx cell) = 0.6 rbe
• 1 SRAM bit = 0.6 rbe
• 1 DRAM bit = 0.1 rbe = 67.5 $f^2$

These are the parameters for basic cells in most design tradeoffs
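A rough sketch of how these cell sizes turn a memory capacity into area. It counts raw cell area only; tags, decoders and sense amplifiers are deliberately ignored, which is an assumption of this example, so real arrays are larger. The names are illustrative.

```python
RBE_PER_BIT = {"register": 1.0, "cam": 2.0, "cache": 0.6, "sram": 0.6, "dram": 0.1}
RBE_PER_A = 1481       # from 1 A = 10^6 f^2 and 1 rbe = 675 f^2

def memory_area_a(kind, n_bytes):
    """Raw cell area in A units for n_bytes of storage (no array overhead)."""
    return n_bytes * 8 * RBE_PER_BIT[kind] / RBE_PER_A

print(memory_area_a("cache", 8 * 1024))     # an 8kB cache
print(memory_area_a("dram", 8 * 1024))      # the same capacity in DRAM
```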

soc 2.25
Floorplan and area allocation

[Figure: die floorplan with area allocated to core processors, signal processor, cache, bus, memory, clock and test]

soc 2.26
The baseline: I
• suppose d is 0.2 defects /cm2 and we target
80% yield
• then A = 110 mm2 gross or (allowing 20%) guard
88 mm2 net
• if f = 0.13  we have 5200 A area units for our
design
• we want to realize
– a 32b core processor (w 8kB I & 16kB D cache)
– 2 32b Vector proc. W 16 x 1k x 32 vector memory
+ I and D cache
– 128kB ROM
– anything else is SRAM
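The baseline numbers follow from inverting the yield equation. In this sketch the 110 mm² figure is taken from the slide (the unrounded value is slightly higher), and the variable names are illustrative.

```python
import math

def die_area_for_yield(target_yield, defect_density):
    """Invert Yield = exp(-D * A); returns the die area in cm^2."""
    return -math.log(target_yield) / defect_density

gross_mm2 = die_area_for_yield(0.80, 0.2) * 100    # cm^2 to mm^2
net_mm2 = 0.8 * 110                                # slide's rounded gross area less a 20% guard
a_unit_mm2 = 1e6 * (0.13e-3) ** 2                  # one A unit at f = 0.13 micron, in mm^2
area_units = net_mm2 / a_unit_mm2
print(gross_mm2, net_mm2, area_units)
```

The result is a little over 5200 A units, which the slide rounds down to 5200.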
soc 2.27
The baseline: II

This leaves 5200 - 2462 = 2538 A available for data SRAM.
This implies about 512kB of SRAM.
soc 2.28
Example SOC floorplan

soc 2.29
Die area summary
• cost: an exponential function of area
• successful business model
– targets initial production at relatively low yield (~0.3)
– ride learning curve and leverage technology to
reduce cost and improve performance
• technical innovation and analysis
– intersect with business decisions to make a product
– use design feasibility studies and empirical targets
– methodology for cost and performance evaluation
– marketing targets: determine weighting of
performance metrics
soc 2.30
Power consumption
• power consumption:
becoming a key design issue
• increased power:
largely due to higher frequency
operation

soc 2.31
Bipolar and CMOS clock frequency

[Figure: bipolar and CMOS clock frequency; annotation: bipolar power limit]

soc 2.32
Bipolar cooling technology (ca ’91)

Hitachi M880: 500 MHz; one processor/module,
40 dice sealed in helium then cooled by a water jacket.

Power consumed: about 800 watts per module.

F. Kobayashi, et al. "Hardware technology for Hitachi M-880." Proceedings Electronic Components and Tech Conf., 1991.

soc 2.33
Power: real price of performance
Total power = dynamic power + static power, where dynamic power is $C V^2 f / 2$ and static power is $I_{leakage} V$.

As feature size and C (capacitance) decrease, the electric fields force a reduction in V. To maintain performance we also reduce Vth.

As Vth decreases, Ileakage and static power increase.

Static power is now a big problem in high-performance designs.

Static power can be controlled by maintaining Vth and using lower frequencies; lowering V also reduces dynamic power.
soc 2.34
Power and frequency
• I = C dV/dt: a smaller C enables a higher dV/dt
(frequency)
• but I $\propto (V - V_{th})^{1.25} / V$, and I also directly determines the maximum
frequency
• for Vth = 0.6 V, halving V also roughly halves the frequency (e.g. if V
goes from 3 V to 1.5 V then the frequency is about ½)
• so halving the voltage (VDD or the signal V) halves the
frequency BUT reduces the power to 1/8 ($C V^2 f / 2$)
• so power scales with the cube of frequency: $P_2 / P_1 = (f_2 / f_1)^3$
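A minimal check of the 1/8 figure, assuming dynamic power of the form C V² f / 2 and taking frequency to scale linearly with voltage as the slide states. The capacitance, voltage and frequency values are arbitrary illustrative inputs.

```python
def dynamic_power(c, v, f):
    """Dynamic power C * V^2 * f / 2."""
    return c * v ** 2 * f / 2

# Assumed illustrative values: 1 nF switched capacitance, 3 V, 100 MHz.
c, v, f = 1e-9, 3.0, 100e6
ratio = dynamic_power(c, v / 2, f / 2) / dynamic_power(c, v, f)
print(ratio)
```

The factor of 1/8 is 1/4 from the V² term times 1/2 from the frequency term, independent of the starting values.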

soc 2.35
Power: a new frontier
• cooled high power: >70 W/die
• high power: 10-50 W/die ... plug-in supply
• low power: 0.1-2 W/die ... rechargeable battery
• very low power: 1-100 mW/die ... AA size
batteries
• extremely low power: 1-100 µW/die and
below (nanowatts) ... button batteries
• no power: extract from local EM field,
... O(1 µW/die)

soc 2.36
Battery energy and usage
type           energy capacity   time                     power
rechargeable   10,000 mAh        50 hours (10-20% duty)   400 mW - 4 W
2 x AA         4000 mAh          ½ year (10-20% duty)     1-10 mW
button         40 mAh            5 years (always on)      1 µW
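The power column can be sanity-checked to within an order of magnitude from capacity and lifetime. The slide does not give battery voltages, so the values used here (3.7 V rechargeable, 1.5 V for the AA pair, 3 V button cell) are assumptions, as are the function names.

```python
def average_power_mw(capacity_mah, voltage, hours):
    """Average power in mW that drains the battery in the given time."""
    return capacity_mah * voltage / hours

def active_power_mw(capacity_mah, voltage, hours, duty):
    """Power while active if the device runs only a fraction `duty` of the time."""
    return average_power_mw(capacity_mah, voltage, hours) / duty

HOURS_PER_YEAR = 24 * 365
# Voltages below are assumed, not given on the slide.
print(active_power_mw(10_000, 3.7, 50, 0.2))                   # rechargeable
print(active_power_mw(4_000, 1.5, HOURS_PER_YEAR / 2, 0.2))    # 2 x AA
print(average_power_mw(40, 3.0, 5 * HOURS_PER_YEAR) * 1000)    # button cell, in microwatts
```

Under these assumptions the results land in the watt, milliwatt and microwatt ranges respectively, in line with the table.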

soc 2.37
Power is important!

• by scaling alone a 1000x slower implementation
may need only $10^{-9}$ as much power
• gating power to functional units and other
techniques should enable 100 MHz processors to
operate at $O(10^{-3})$ watts
• goal: $O(10^{-6})$ watts ... implies about 10 MHz
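These figures are consistent with power scaling as the cube of frequency when voltage is lowered along with frequency. A sketch under that assumption (the function name is illustrative):

```python
def scaled_power(p_ref, f_ref, f_new):
    """Power after scaling frequency and voltage together, assuming P ~ f^3."""
    return p_ref * (f_new / f_ref) ** 3

print(scaled_power(1.0, 1000.0, 1.0))       # 1000x slower implementation
print(scaled_power(1e-3, 100e6, 10e6))      # 100 MHz at a milliwatt, slowed to 10 MHz
```

A tenfold reduction in frequency buys three orders of magnitude in power, which is how the microwatt goal maps to roughly 10 MHz.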

soc 2.38
Reliability + computational integrity
• design for reliability using
– redundancy
– error detect and correct
– process recoverability
– fail-safe computation
• failure: a deviation from a design specification
• error: a failure that results in an incorrect signal
value
• fault: an error that manifests as an incorrect logical
result
• faults
– do not necessarily produce incorrect program execution
– can be masked by detection/correction logic, e.g. error-correcting
codes (ECC)
• types of faults:
soc 2.39
Redundancy: carefully applied
• $P(t) = e^{-t/\tau}$, where $\tau$ is the expected failure time for a single module
– derived in the same way as the yield equation
• TMR (triple modular redundancy) system
– additional reliability over a time much less than
$\tau$, the expected failure time for a single module
• additional hardware
– makes the occurrence of multiple module failures
more probable
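The trade-off can be made concrete with the standard 2-out-of-3 majority formula. This sketch assumes independent module failures and a perfect voter; `tau` stands for the expected failure time of a single module, and the function names are illustrative.

```python
import math

def simplex_reliability(t, tau):
    """Probability that a single module is still working at time t."""
    return math.exp(-t / tau)

def tmr_reliability(t, tau):
    """Probability that at least 2 of 3 modules work (perfect voter assumed)."""
    p = simplex_reliability(t, tau)
    return 3 * p ** 2 - 2 * p ** 3

tau = 1.0
for t in (0.1, 0.5, math.log(2), 1.0):
    print(f"t={t:.3f}  simplex={simplex_reliability(t, tau):.4f}  "
          f"tmr={tmr_reliability(t, tau):.4f}")
```

Under these assumptions TMR is better than a single module only while the module reliability exceeds 0.5, that is for t below about 0.69 of the expected failure time; beyond that the extra hardware makes things worse.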

soc 2.40
Highly reliable designs
• typical usage
– error detection: parity, residue, block codes;
sanity & bounds checks
– action (instruction) retry
– error correction: code or alternate path compute
– reconfiguration

soc 2.41
Why reconfigurable design?
• manage design complexity based on
high-performance IP blocks
– avoid the risk and delay of fabrication
• time – support highly pipelined designs
• area – the regularity of FPGAs makes it easy to
advance to better process technology
• reliability – FPGAs provide redundant cells
and interconnections that help avoid run-time faults

soc 2.42
Area estimate of FPGAs
• use rbe model as the basic measure
– one slice ≈ 7000 transistors = 700 rbe
– one logic element (LE) ≈ 12000 transistors = 1200 rbe
– Xilinx Virtex XC2V6000 = 33,792 slices
• 23.65 million rbe or 16400 A
• 8 x 8 multiplier: around 35 slices
– equivalent to 24500 rbe or 17 A
– a 1-bit multiplier cell in VLSI contains a full adder and an AND
gate; an 8 x 8 multiplier of 64 such cells ≈ 3840 transistors = 384 rbe, around 60 times smaller
than the reconfigurable version
• block multipliers in FPGAs: more efficient

soc 2.43
Soft processors: using FPGAs
• soft processors: how soft are they?
– an instruction processor design in bit-stream
format, used to program an FPGA device
– cost reduction, design reuse, …
• major soft processors include:
– Altera: Nios
– Xilinx: MicroBlaze
– open-source: OpenRISC, Leon
– all are 32-bit RISC architectures with 5-stage
pipelines that connect to different bus standards

soc 2.44
Features: soft processors

soc 2.45
Summary
• best optimise: time, area, power
• cycle time: optimized pipelining
• area: die floorplanning, rbe model
• power: cooling + battery implications
• reliability: computational integrity, redundancy
• reconfiguration: reduce risks and delays
– area overhead alleviated by coarse-grained blocks
– soft processors: instruction processors in FPGA

soc 2.46
