Processor Intro PPT

Uploaded by

Chand1891
Copyright
© © All Rights Reserved

CMOS VLSI Design

Lecture 3: Simple Processor Example
Learning Objectives
At the end of this lecture, you should be able to:
• Describe and apply techniques for managing the design of complex systems.
• Describe the implementation of a simple processor at abstraction levels including architecture, microarchitecture, logic design, circuit design, physical design, and verification & test.

© 2020 Arm Limited


Outline
• Design Partitioning
• Simplified Processor Example
• Architecture
• Microarchitecture
• Logic Design
• Circuit Design
• Physical Design
• Fabrication, Packaging, Testing


Coping with Complexity
• How do you design a System-on-Chip?
• Many millions (even billions!) of transistors
• Tens to hundreds of engineers
• Structured Design
• Design Partitioning


Structured Design
• Hierarchy: Divide and Conquer
• Recursively divide the system into modules
• Regularity
• Reuse modules wherever possible
• E.g., standard cell library
• Modularity: well-formed interfaces
• Allows modules to be treated as black boxes
• Locality
• Physical and temporal


Design Partitioning
• Architecture: User’s perspective, what does it do?
• Instruction set, registers
• ARM, MIPS, x86, Alpha, PIC, RISC-V, …
• Microarchitecture
• Single cycle, multicycle, pipelined, superscalar?
• Logic: how are functional blocks constructed
• Ripple carry, carry lookahead, carry select adders
• Circuit: how are transistors used
• Complementary CMOS, pass transistors, domino
• Physical: chip layout
• Datapaths, memories, random logic


Simplified Architecture
• Example: simplified processor architecture
• Drawn from Digital Design and Computer Architecture (DDCA), ARM Edition (Harris & Harris)
• ARM is a 32-bit architecture with 16 registers
• Consider 8-bit subset using 8-bit datapath for simplified processor
• Only implement 8 registers (R0 - R7)
• 8-bit program counter in R7
• You’ll build this simplified processor in the labs
• Illustrate the key concepts in VLSI design


Instruction Set

• Data Processing
– ADD, SUB, AND, OR
– Each accepts a register or an immediate as its second source
• Memory
– LDR, STR
– Positive immediate offset
• Branch
– B


Instruction Encoding
• 32-bit instruction encoding
• Requires four cycles to fetch on 8-bit datapath
• Three instruction formats shown below
• Distinguished by their op fields
• Rd: destination register
• Rn: first source register
• Rm: second source register
• Cond: conditional execution
• Imm: immediate
cmd instr
0100 ADD
0010 SUB
0000 AND
1100 OR
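
The field layout itself is not reproduced in the text, so the following Python sketch assumes the standard 32-bit ARM data-processing format: cond in bits 31:28, op in 27:26, I in 25, cmd in 24:21, S in 20, Rn in 19:16, Rd in 15:12, and Src2 in 11:0. The function names are illustrative, and the least-significant-byte-first order in fetch_bytes is an assumption. It decodes one instruction word and splits it into the four bytes that an 8-bit datapath would fetch in four cycles.

```python
# cmd encodings taken from the table above.
CMD_NAMES = {0b0100: "ADD", 0b0010: "SUB", 0b0000: "AND", 0b1100: "OR"}


def decode_data_processing(instr):
    """Split a 32-bit data-processing word into fields (assumed ARM layout)."""
    return {
        "cond": (instr >> 28) & 0xF,
        "op": (instr >> 26) & 0x3,
        "I": (instr >> 25) & 0x1,    # 1 = Src2 is an immediate
        "cmd": (instr >> 21) & 0xF,
        "S": (instr >> 20) & 0x1,    # 1 = update condition flags
        "Rn": (instr >> 16) & 0xF,
        "Rd": (instr >> 12) & 0xF,
        "src2": instr & 0xFFF,
    }


def fetch_bytes(instr):
    """Return the four bytes of a word, least significant first (assumed order)."""
    return [(instr >> (8 * i)) & 0xFF for i in range(4)]


if __name__ == "__main__":
    word = 0xE2803008  # ADD R3, R0, #8 in the standard ARM encoding
    fields = decode_data_processing(word)
    print(CMD_NAMES[fields["cmd"]], fields)
    print([hex(b) for b in fetch_bytes(word)])
```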


Fibonacci (C)
f_0 = 1; f_{-1} = -1
f_n = f_{n-1} + f_{n-2}
f = 1, 1, 2, 3, 5, 8, 13, …
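
The loop itself is not reproduced in the text. A minimal Python sketch of the same computation, using the seed values from this slide and the variable names n, f1, and f2 from the assembly version, might look like this. The function name is illustrative.

```python
def fib(n):
    """Apply the recurrence n times, starting from f1 = 1 and f2 = -1."""
    f1, f2 = 1, -1
    while n != 0:
        f1 = f1 + f2   # new term
        f2 = f1 - f2   # recovers the previous value of f1
        n -= 1
    return f1


if __name__ == "__main__":
    print(fib(8))  # 13
```

Computing f2 as f1 - f2 avoids a temporary variable, which is why the assembly version needs only two registers for the running terms.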


Fibonacci (Assembly)
# fib.asm
# Register usage: R0 = 0, R3 = n, R4 = f1, R5 = f2
# Put result in address 255
fib   add  R3, R0, #8        # initialize n = 8
      add  R4, R0, #1        # initialize f1 = 1
      add  R5, R0, #-1       # initialize f2 = -1
loop  subs R3, R3, R0        # n = 0? (sets the Z flag)
      beq  done              # then done
      add  R4, R4, R5        # f1 = f1 + f2
      sub  R5, R4, R5        # f2 = f1 - f2
      add  R3, R3, #-1       # n = n - 1
      b    loop              # repeat until done
done  str  R4, [R0, #255]    # store result in adr 255
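
To see what the 8-bit datapath does with this program, here is a hedged register-level emulation in Python. It assumes every result wraps modulo 256, that R0 holds 0 as the register-usage comment states, and that the loop test sets a zero flag when n equals 0. It ignores instruction fetch and the program counter in R7. The function name run_fib is illustrative.

```python
MASK = 0xFF  # 8-bit datapath: every result wraps modulo 256


def run_fib(n=8):
    """Emulate fib.asm register by register and return the data memory."""
    r = [0] * 8          # R0..R7; R0 is assumed to hold 0
    mem = [0] * 256      # 256-byte data memory (8-bit addresses)

    r[3] = (r[0] + n) & MASK                  # add R3, R0, #n
    r[4] = (r[0] + 1) & MASK                  # add R4, R0, #1
    r[5] = (r[0] - 1) & MASK                  # add R5, R0, #-1 (wraps to 0xFF)
    while True:
        zero = ((r[3] - r[0]) & MASK) == 0    # flag-setting test: n = 0?
        if zero:                              # beq done
            break
        r[4] = (r[4] + r[5]) & MASK           # add R4, R4, R5
        r[5] = (r[4] - r[5]) & MASK           # sub R5, R4, R5
        r[3] = (r[3] - 1) & MASK              # add R3, R3, #-1
    mem[(r[0] + 255) & MASK] = r[4]           # str R4, [R0, #255]
    return mem


if __name__ == "__main__":
    print(run_fib()[255])  # 13
```

The initial value -1 is stored as 0xFF, and the first addition 1 + 0xFF wraps to 0, so two's-complement wraparound gives the same result as signed arithmetic.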


ARM Microarchitecture
• Multicycle microarchitecture (Harris & Harris, DDCA ARM Ed.)


Multicycle Controller


Multicycle FSM


Logic Design
• Start at top level
• Hierarchically decompose simplified processor into units


Hierarchical Design

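[Figure: design hierarchy of the simplified processor (top-level module: mips). Rows of the tree, top to bottom:]
controller, alucontrol, datapath
standard cell library, bitslice, zipper
alu, inv4x, flop, ramslice
fulladder, or2, and2, mux4
nor2, inv, nand2, mux2
tri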


HDLs
• Hardware Description Languages
• Widely used in logic design
• Verilog and VHDL
• Describe hardware using code
• Document logic functions
• Simulate logic before building
• Synthesize code into gates and layout
– Requires a library of standard cells


Verilog Example
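module fulladder(input  a, b, c,
                 output s, cout);

  sum   s1(a, b, c, s);      // sum bit
  carry c1(a, b, c, cout);   // carry-out bit
endmodule

module sum(input  a, b, c,
           output s);

  assign s = a ^ b ^ c;      // 1 when an odd number of inputs are 1
endmodule

module carry(input  a, b, c,
             output cout);

  assign cout = (a&b) | (a&c) | (b&c);  // majority of the three inputs
endmodule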
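
As a quick software cross-check of this hierarchy, the Python sketch below models the carry and sum functions, composes them into a full adder, and chains eight full adders into a ripple-carry adder. The sum is modelled as the XOR of the three inputs, the standard full-adder sum. The function names and the width parameter are illustrative and are not part of the lab design.

```python
def carry(a, b, c):
    """Majority function, as in the Verilog carry module."""
    return (a & b) | (a & c) | (b & c)


def sum_bit(a, b, c):
    """Sum output: XOR of the three inputs (standard full-adder sum)."""
    return a ^ b ^ c


def fulladder(a, b, c):
    """Return (s, cout) for one-bit inputs a, b and carry-in c."""
    return sum_bit(a, b, c), carry(a, b, c)


def ripple_add(x, y, width=8):
    """Add two integers by chaining `width` full adders, LSB first."""
    c = 0
    result = 0
    for i in range(width):
        s, c = fulladder((x >> i) & 1, (y >> i) & 1, c)
        result |= s << i
    return result, c   # (width-bit sum, carry out)


if __name__ == "__main__":
    # Exhaustive check: the two outputs encode a + b + c in binary.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = fulladder(a, b, c)
                assert a + b + c == 2 * cout + s
    print(ripple_add(200, 100))  # (44, 1)
```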


Circuit Design
• How should logic be implemented?
• NANDs and NORs vs. ANDs and ORs?
• Fan-in and fan-out?
• How wide should transistors be?
• These choices affect speed, area, power
• Logic synthesis makes these choices for you
• Good enough for many applications
• Hand-crafted circuits are still better


Example: Carry Logic
• assign cout = (a&b) | (a&c) | (b&c);

Transistors? Gate Delays?


Gate-level Netlist

module carry(input  a, b, c,
             output cout);

  wire x, y, z;

  and g1(x, a, b);
  and g2(y, a, c);
  and g3(z, b, c);
  or  g4(cout, x, y, z);
endmodule
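
As a rough answer to the "Transistors?" question, the sketch below counts transistors for the gate-level netlist above. It assumes static CMOS in which an N-input NAND or NOR uses 2N transistors and each AND or OR is a NAND or NOR followed by a 2-transistor inverter. These per-gate costs are assumptions about the cell implementation, not figures from the slides, and the names are illustrative.

```python
INVERTER = 2  # one nMOS plus one pMOS


def and_or_cost(fan_in):
    """Transistors in a static CMOS AND or OR gate: NAND/NOR plus inverter."""
    return 2 * fan_in + INVERTER


# Gates in the netlist above: (instance and type, number of inputs)
GATES = [("g1 and", 2), ("g2 and", 2), ("g3 and", 2), ("g4 or", 3)]


def netlist_transistors(gates):
    """Total transistor count for a list of AND/OR gates."""
    return sum(and_or_cost(fan_in) for _, fan_in in gates)


if __name__ == "__main__":
    print(netlist_transistors(GATES))  # 26
```

Under these assumptions the gate-level version needs 26 transistors, compared with the 12 devices (n1 to n6 and p1 to p6) in the transistor-level netlist.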


Transistor-Level Netlist

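module carry(input  a, b, c,
             output cout);

  wire    i1, i2, i3, i4, cn;
  supply0 gnd;   // ground net for the nMOS sources
  supply1 vdd;   // power net for the pMOS sources

  // nMOS pull-down network: cn goes low when a majority of inputs are 1
  tranif1 n1(i1, gnd, a);
  tranif1 n2(i1, gnd, b);
  tranif1 n3(cn, i1, c);
  tranif1 n4(i2, gnd, b);
  tranif1 n5(cn, i2, a);
  // pMOS pull-up network: cn goes high when a majority of inputs are 0
  tranif0 p1(i3, vdd, a);
  tranif0 p2(i3, vdd, b);
  tranif0 p3(cn, i3, c);
  tranif0 p4(i4, vdd, b);
  tranif0 p5(cn, i4, a);
  // output inverter: cout = ~cn
  tranif1 n6(cout, gnd, cn);
  tranif0 p6(cout, vdd, cn);
endmodule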
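
The switch network above can be sanity-checked in software. This Python sketch treats each transistor as an ideal switch, evaluates the pull-down and pull-up paths to the internal node cn, and then applies the output inverter. It is a simplified model under that ideal-switch assumption, not a circuit simulation, and the function name is illustrative.

```python
from itertools import product


def carry_switch_level(a, b, c):
    """Evaluate the netlist as ideal switches and return cout (0 or 1)."""
    # nMOS (tranif1) conduct when the gate is 1.
    # Paths from cn to ground: (n1 or n2) in series with n3, or n4 in series with n5.
    pull_down = ((a | b) & c) | (a & b)
    # pMOS (tranif0) conduct when the gate is 0.
    # Paths from cn to the supply: (p1 or p2) in series with p3, or p4 in series with p5.
    na, nb, nc = 1 - a, 1 - b, 1 - c
    pull_up = ((na | nb) & nc) | (na & nb)
    # Exactly one network must conduct, otherwise cn would float or be shorted.
    assert pull_down != pull_up
    cn = 0 if pull_down else 1
    return 1 - cn  # output inverter n6/p6


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        assert carry_switch_level(a, b, c) == (a & b) | (a & c) | (b & c)
    print("switch-level model matches the majority function")
```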


SPICE Netlist
.SUBCKT CARRY A B C COUT VDD GND
MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P
MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P
MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P
MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P
MP4 I4 B VDD VDD PMOS W=2U L=0.18U AD=0.3P AS=1P
MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P
MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P
MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P
CI1 I1 GND 2FF
CI3 I3 GND 3FF
CA A GND 4FF
CB B GND 4FF
CC C GND 2FF
CCN CN GND 4FF
CCOUT COUT GND 2FF
.ENDS


Physical Design
• Floorplan
• Standard cells
• Place & route
• Datapaths
• Slice planning
• Area estimation


Simplified Processor Floorplan


8-bit Processor Layout


Standard Cells
• Uniform cell height
• Uniform well height
• M1 VDD and GND rails
• M2 Access to I/Os
• Well/substrate taps
• Exploits regularity


Synthesized Controller
• Synthesize HDL into gate-level netlist
• Place & Route using standard cell library


Pitch Matching
• Synthesized controller area is mostly wires
• Design is smaller if wires run through/over cells
• Smaller = faster, lower power as well!
• Design snap-together cells for datapaths and arrays
• Plan wires into cells
• Connect by abutment
– Exploits locality
– Takes lots of effort


Simple 8-bit Processor Datapath

• 8-bit datapath built from wordslices
• Zipper at top drives control signals to datapath

[Figure: datapath layout. Labels: Register File, Decoder, Zipper, Bitslice 7, Bitslice 1, Flop Wordslice, Adder Wordslice]


Slice Plans
• Slice plan for bitslice
• Cell ordering, dimensions, wiring tracks
• Arrange cells for wiring locality


Area Estimation
• Need area estimates to make floorplan
• Compare to another block you already designed
• Or estimate from transistor counts
• Budget room for large wiring tracks
• Your mileage may vary; derate by 2x for class.
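
One way to read these bullets is as a scaling rule: take the area per transistor of a reference block, multiply by the new block's transistor count, and multiply by the derating factor. The sketch below does that under the assumption that "derate by 2x" means doubling the estimate. The function and parameter names are illustrative, and the result is only meaningful when the reference block uses the same process and a similar design style.

```python
def estimate_area_mm2(transistors, ref_transistors, ref_area_mm2, derate=2.0):
    """Scale a reference block's area by transistor count, then derate."""
    area_per_transistor = ref_area_mm2 / ref_transistors
    return transistors * area_per_transistor * derate


if __name__ == "__main__":
    # Reference figures: the ARM1 transistor count and die area quoted in
    # these slides, used here only to have concrete numbers.
    print(estimate_area_mm2(5000, 24_800, 49.0))
```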


Design Verification
• Fabrication is slow & expensive
• MOSIS 0.6 µm: $1000, 3 months
• 65 nm: $3M, 1 month
• Debugging chips is very hard
• Limited visibility into operation
• Prove design is right before building!
• Logic simulation
• Circuit simulation/formal verification
• Layout vs. schematic comparison
• Design & electrical rule checks
• Verification is >50% of effort on most chips!


Fabrication & Packaging
• Tape out final layout
• Fabrication
• 6-, 8-, and 12-inch wafers
• Optimized for throughput, not latency (10 weeks!)
• Cut into individual dice
• Packaging
• Bond gold wires from die I/O pads to package


Testing
• Test that chip operates
• Design errors
• Manufacturing errors
• A single dust particle or wafer defect kills a die
• Yields from 90% to <10%
• Depends on die size, maturity of process
• Test each part before shipping to customer
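
The slides give no yield formula. One commonly used first-order model, shown here only to illustrate why yield depends on die size and process maturity, is the Poisson model Y = exp(-A * D), where A is the die area and D is the defect density. The function name and the example numbers are assumptions for illustration and are not data from the lecture.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of good dice under a Poisson defect model: exp(-A * D)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    # Hypothetical defect densities: a mature and an immature process.
    for density in (0.1, 2.0):
        for area in (0.5, 1.0, 2.0):
            y = poisson_yield(area, density)
            print(f"D={density}/cm^2  A={area} cm^2  yield={y:.2f}")
```

In this model, doubling the die area squares the yield fraction, so large dice on an immature process lose yield quickly.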


ARM1 Processor
• 1st generation ARM chip (1985)
• 24,800 transistors
• 3-stage pipeline
• 8 MHz
• 3 µm process
• 49 mm²
• 82-pin PLCC package
• Average power = 0.1 W
• 2-phase nonoverlapping clocks
• Built for Acorn Computer
© 1985 ARM Ltd.
