L15: Custom and ASIC VLSI Integration
Acknowledgements:
Materials in this lecture are courtesy of the following people and used with permission.
- Rabaey, J., A. Chandrakasan, B. Nikolic. Digital Integrated Circuits: A Design Perspective.
Prentice Hall, 2003.
- Curt Schurgers
[Figure: CMOS inverter shown as a circuit representation and as a layout. Labels: IN, OUT, VDD, GND, G, S, D, Wp, Lp, Wn, Ln, n-type well, metal/pdiff contact, contact from metal to ndiff. Layers: metal, poly, n+ diff, p+ diff.]
Follow simple design rules (contract between process and circuit designers)
(Courtesy of Chris Terman. Used with permission.)
L15: 6.111 Spring 2004 Introductory Digital Systems Laboratory 3
Custom Design/Layout
Itanium has 6 integer execution units like this
[Figure: block diagram of one integer execution unit. Labels: a, b, 9-1 Mux, 5-1 Mux, 2-1 Mux, CARRYGEN (g64), SUMGEN (s0, s1), SUMSEL, node1, sum, sumb, REG, ck1, to Cache, LU (Logical Unit).]
[Figure: die photograph of the Itanium integer datapath, 1000um scale. Labels: Multiplexers, Shifter, Adder stage 1, Adder stage 2, Adder stage 3, Wiring, Loopback Bus, Bit slice 0 through Bit slice 63.]
Hand-crafting the layout to achieve maximum clock rates (> 1 GHz)
Exploits regularity in datapath structure to optimize interconnects
The ASIC Approach
[Figure: ASIC design flow]
1. Design Capture: Verilog (or VHDL)
2. Pre-Layout Simulation
3. Logic Synthesis
4. Floorplanning
5. Placement
6. Routing
7. Circuit Extraction
8. Post-Layout Simulation
9. Tape-out
Abstraction levels shown alongside the flow: Behavioral, Structural, Physical
Design Iteration: simulation results feed back into earlier steps
Most common design approach for designs with clock rates up to 500 MHz
Standard Cell Example
Each library cell (FF, NAND, NOR, INV, etc.), along with its size variants
(gate drive strengths), is fully characterized across temperature, loading, etc.
Standard Cell Layout Methodology
module adder64 (a, b, sum);
  input [63:0] a, b;
  output [63:0] sum;
  assign sum = a + b;
endmodule
[Figures: the adder64 layout after synthesis, after placement, and after routing.]
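A reference model is handy when checking pre-layout and post-layout simulation results against expected values. The following is a minimal sketch in Python (not part of the original slides) of what adder64 computes, assuming unsigned operands; the names MASK64 and the Python function adder64 are illustrative.

```python
MASK64 = (1 << 64) - 1  # 64 ones, matching the width of output [63:0] sum


def adder64(a: int, b: int) -> int:
    """Behavioral model of `assign sum = a + b` with a 64-bit result."""
    if not (0 <= a <= MASK64 and 0 <= b <= MASK64):
        raise ValueError("operands must fit in 64 bits")
    # The carry out of bit 63 is discarded because sum is only 64 bits wide.
    return (a + b) & MASK64
```

For example, adder64(MASK64, 1) wraps around to 0, which is the kind of corner case worth including in a simulation test vector set.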
[Figure: standard cell rows with a VDD bus, and a wire capacitance model. Labels: CL, CI, d1, l1, d2, l2, λ = 5.]
Wire-to-wire capacitance causes
inter-wire delay dependencies
[Figure: driver and receiver connected to the VDD grid and the ground grid through pads. Labels: D, Q, Ccoup, Cint, Rd, Cd, up, down.]
Conventional Multiplication
                            X3     X2     X1     X0
                     ×      Y3     Y2     Y1     Y0
                            X3·Y0  X2·Y0  X1·Y0  X0·Y0
                     X3·Y1  X2·Y1  X1·Y1  X0·Y1
              X3·Y2  X2·Y2  X1·Y2  X0·Y2
       X3·Y3  X2·Y3  X1·Y3  X0·Y3
Z7     Z6     Z5     Z4     Z3     Z2     Z1     Z0
Z = X·Y
Multiplication by a constant: Y = (1001)_2 = 2^3 + 2^0, so Z = (X << 3) + X
Shifts are implemented using wiring
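The constant-multiplication idea above can be sketched in a few lines of Python. This is an illustrative model, not code from the lecture: the function names are hypothetical, and the constant is assumed to be non-negative. Each 1 bit of the constant contributes one shifted copy of X, so only adders cost hardware.

```python
def shift_add_multiply(x: int, y: int) -> int:
    """Multiply x by a non-negative constant y using only shifts and adds."""
    if y < 0:
        raise ValueError("constant must be non-negative")
    z = 0
    shift = 0
    while y >> shift:
        if (y >> shift) & 1:
            z += x << shift  # in hardware the shift is just wiring
        shift += 1
    return z


def adders_needed(y: int) -> int:
    """Two-input adders required: one fewer than the number of 1 bits in y."""
    return max(bin(y).count("1") - 1, 0)
```

With Y = 0b1001 the model computes (X << 3) + X and adders_needed reports a single adder.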
Transform: Canonical Signed Digits (CSD)
0 1 1 0 1 1 1 1  =  0 1 1 1 0 0 0 -1  =  1 0 0 -1 0 0 0 -1
(digits are drawn from {-1, 0, 1}; here 01101111 = 2^7 - 2^4 - 2^0)
Z = (X << 7) - (X << 4) - X
Shift translates to re-wiring
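As a rough illustration of the transform, the sketch below converts a non-negative integer to signed digits with no two adjacent nonzero digits (the usual CSD property) and then multiplies using only shifts, adds and subtracts. The function names are hypothetical, and the conversion shown is one common textbook recipe rather than anything prescribed by the slides.

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """Return signed digits (-1, 0, 1) of non-negative n, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        if n & 1:
            d = 2 - (n % 4)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]


def csd_multiply(x: int, digits: List[int]) -> int:
    """Multiply x by the constant encoded in digits using shift, add, subtract."""
    z = 0
    for d in digits:  # most significant digit first
        z <<= 1
        if d == 1:
            z += x
        elif d == -1:
            z -= x
    return z
```

For the constant 01101111 this yields the digits 1 0 0 -1 0 0 0 -1, so the multiplier needs two add/subtract units instead of the five adders implied by the six 1 bits of the plain binary form.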
Algebraic Transformations
Commutativity: A + B = B + A
Distributivity: (A + B) C = AC + BC
Associativity: (A + B) + C = A + (B + C)
[Figures: dataflow graphs showing each pair of equivalent forms.]
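To see why such identities matter for hardware, the sketch below counts operators in two equivalent forms of the same expression. It is a toy model under stated assumptions: expressions are nested tuples of the form (op, left, right), and the names evaluate, count_ops, expanded and factored are illustrative.

```python
def evaluate(expr, env):
    """Evaluate a nested-tuple expression such as ("+", "A", "B")."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


def count_ops(expr):
    """Count adders ("+") and multipliers ("*") needed for a direct mapping."""
    if isinstance(expr, str):
        return {"+": 0, "*": 0}
    op, left, right = expr
    counts = count_ops(left)
    for key, value in count_ops(right).items():
        counts[key] += value
    counts[op] += 1
    return counts


expanded = ("+", ("*", "A", "C"), ("*", "B", "C"))  # AC + BC
factored = ("*", ("+", "A", "B"), "C")              # (A + B) C
```

Both forms evaluate to the same value, but the factored form needs one multiplier where the expanded form needs two.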
Transforms for Efficient Resource Utilization
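[Figure: dataflow graph with inputs A, B, C, D, E, F, G, H, I.]
Time multiplexing: mapped to 3 multipliers and 3 adders
[Figure: the graph after applying distributivity, with inputs ordered A, C, B, D, E, F, G, H, I.]
[Figure: graph with delay elements D, illustrating cutset retiming.]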
Cutset retiming: A cutset is a set of edges that, when cut, splits the graph into two disjoint
partitions. To retime, delays are moved from the ingoing edges of the cut to the outgoing
edges, or vice versa.
Benefits of retiming:
• Modify critical path delay
• Reduce total number of registers
[Figure: filter from x(n) to y(n) with delay elements D. Transformation steps shown include associativity of the addition and (4) retime.]
Note: this is a first-cut analysis that assumes the delay of a chain of operators is the sum
of their individual delays, which is not accurate.
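The effect of retiming on the critical path can be sketched with a small graph model. This is an illustration only: the three-node graph, its delays, and the function names are made up, the zero-register part of the graph is assumed to be acyclic, and operator delays are simply summed as in the first-cut analysis noted above.

```python
def retime(edges, lag):
    """Apply lags: registers on edge u->v become w + lag[v] - lag[u]."""
    result = {}
    for (u, v), w in edges.items():
        new_w = w + lag.get(v, 0) - lag.get(u, 0)
        if new_w < 0:
            raise ValueError("illegal retiming: negative register count")
        result[(u, v)] = new_w
    return result


def critical_path(delay, edges):
    """Longest sum of node delays along any path containing no registers."""
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in delay)


# Hypothetical example: A -> B has no register, B -> C has two.
delay = {"A": 2, "B": 2, "C": 1}
edges = {("A", "B"): 0, ("B", "C"): 2}
retimed = retime(edges, {"B": 1})  # move one register across node B
```

In this example the critical path drops from 4 to 2 time units while the total register count stays at 2, which matches the first benefit listed above. Other graphs can also end up with fewer registers.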
Pipelining, Just Another Transformation
(Pipelining = Adding Delays + Retiming)
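Contrary to retiming, pipelining adds extra registers to the system
How to pipeline:
1. Add extra registers at all inputs
2. Retime
[Figure: a graph with delay elements D, first with input registers added, then retimed (register labels D and 2D).]
How about pipelining this structure?
[Figure: recursive structure from x(n) to y(n) with multiplier A and delay D in the loop, transformed by associativity and retiming into a structure with multipliers A and A2 and delays D and 2D; A2 is precomputed.]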
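The figure labels suggest a first-order recursion whose loop is rewritten so that it contains two delays and a precomputed A squared. The sketch below assumes the structure is y(n) = x(n) + A·y(n-1), which the extracted text does not state explicitly, and checks that the rewritten form y(n) = x(n) + A·x(n-1) + A^2·y(n-2) produces the same output. Function names and the zero initial state are assumptions.

```python
def direct(xs, a):
    """Assumed original recursion: y(n) = x(n) + a * y(n-1), zero initial state."""
    ys, prev = [], 0
    for x in xs:
        prev = x + a * prev
        ys.append(prev)
    return ys


def look_ahead(xs, a):
    """Rewritten recursion: y(n) = x(n) + a * x(n-1) + a2 * y(n-2)."""
    a2 = a * a  # precomputed once, outside the loop
    ys = []
    for n, x in enumerate(xs):
        x_prev = xs[n - 1] if n >= 1 else 0
        y_prev2 = ys[n - 2] if n >= 2 else 0
        ys.append(x + a * x_prev + a2 * y_prev2)
    return ys
```

Because the feedback loop now spans two delays, one of them can be retimed into the multiplier path, which is what makes the recursive structure pipelinable.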
Scan Testing
Idea: have a mode in which all registers are chained into one giant shift register which can be
loaded/read out bit-serially. Test the remaining (combinational) logic by:
(1) in "test" mode, shift in new values for all register bits, thus setting up the inputs to the
combinational logic
(2) clock the circuit once in "normal" mode, latching the outputs of the combinational logic back
into the registers
(3) in "test" mode, shift out the values of all register bits and compare against expected results.
[Figure: registers with 0/1 input multiplexers selected by ScanShift, clocked by CLK, and chained from shift in to shift out.]
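The three steps above can be sketched as a small behavioral model. This is an illustration under assumptions: the ScanChain class, its clock method, and the scan_test helper are hypothetical names, and the combinational logic is modeled as a Python function from the current register bits to the next register bits.

```python
class ScanChain:
    """Registers with a mux in front of each: scan neighbor or logic output."""

    def __init__(self, n_bits, comb_logic):
        self.regs = [0] * n_bits
        self.comb_logic = comb_logic  # maps current bits to next bits

    def clock(self, scan_shift, scan_in=0):
        """One clock edge; returns the bit leaving the last register."""
        scan_out = self.regs[-1]
        if scan_shift:  # "test" mode: act as one long shift register
            self.regs = [scan_in] + self.regs[:-1]
        else:           # "normal" mode: latch the combinational outputs
            self.regs = list(self.comb_logic(self.regs))
        return scan_out


def scan_test(chain, pattern):
    """Shift a pattern in, clock once in normal mode, shift the result out."""
    for bit in reversed(pattern):       # (1) last register's bit goes in first
        chain.clock(True, bit)
    chain.clock(False)                  # (2) one normal-mode clock
    out = [chain.clock(True) for _ in pattern]  # (3) last register comes out first
    return out[::-1]


# Hypothetical 3-bit logic block under test.
logic = lambda q: [q[0] ^ q[1], q[1] & q[2], q[0] | q[2]]
observed = scan_test(ScanChain(3, logic), [1, 1, 0])
```

The observed bits are then compared against the expected response of the logic to the pattern, here logic([1, 1, 0]).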
[Figure: datapath with S reg and X reg and functional units Add, Sub, Shift, Mult1, Mult2, Mac1, Mac2.]