RTL2GDS Synthesis Intro


RTL2GDS

DIGITAL IC DESIGN FLOWS

Dina Tantawy
TRAINING PLAN
• M.1:
  • Motivation.
  • Different Design Flows (FPGA vs ASIC).
  • Logic Synthesis.
• M.2:
  • Static Timing Analysis & SDC.
• M.3:
  • Dynamic Timing Analysis & Utility scripting (TCL).
• M.4:
  • Design for Testability.
  • Floor Planning & Power Planning.
• M.5:
  • Placement.
  • Clock Tree Synthesis.
• M.6:
  • Clock Tree Synthesis (continued).
  • Routing.
• M.7:
  • Physical Verification: DRC
  • Physical Verification: LVS
  • Physical Verification: PEX
AGENDA

• Motivation
• Design Flows
• PLDs
• FPGAs
• ASIC
• Semi ASIC Flow
• Design Abstraction
• Standard Library
• Synthesis
SEMICONDUCTORS INDUSTRY

Source: SIA_State-of-Industry-Report_Nov-2022
EDA: DRIVER

SEMICONDUCTORS INDUSTRY
Market Share prediction

SEMICONDUCTORS INDUSTRY

SEMICONDUCTORS INDUSTRY

Billions of transistors, trillions of polygons.
[Figure: the internal layout of the Core i7 microprocessor. Source: Atropos, Wikipedia*]
SEMICONDUCTORS INDUSTRY

Moore’s Law

The number of transistors incorporated in a chip doubles approximately every 24 months (stated in 1965, revised in 1975).

Rock’s Law

The cost of a semiconductor chip fabrication plant doubles every 4 years (1985).
SEMICONDUCTORS
PROCESS(TECHNOLOGY)

• Process technology used to be measured in microns (µm = 10⁻⁶ m), which are a millionth of a meter, or a thousandth of a millimeter.
  • For example, 2 µm (0.000002 meter).
  • These figures referred to the length of the transistor.
• It is now measured in nanometers (nm = 10⁻⁹ m).
  • 130 nm, 45 nm, 28 nm, 16 nm, 7 nm
• Research has gone below 7 nm, and no one knows how far it can go.
INCREASING COMPLEXITY
The number of transistors on a chip has grown exponentially, due to advances in fabrication and enabled by powerful CAD tools.

Automation has become a need for:
• Fast time-to-market, with corresponding productivity gains.
• Reasonable engineering effort.

The greatest challenge in modern VLSI design is not in designing the individual transistors but rather in managing system complexity.
Modern System-on-Chip (SoC) designs combine memories, processors, high-speed I/O interfaces, and dedicated application-specific logic on a single chip.
MOTIVATION

• Technology is everywhere.
• New technologies.
• More complexities.
• New Needs (Chips).
• Political conflicts: new foundries.
• More demand for optimizing speed, power, and area.
DIFFERENT DESIGN FLOWS
Digital Chip Implementation Choices
Digital Circuit Implementation Approaches
• Custom
• Semicustom
  • Cell-based
  • Array-based
    • Pre-diffused (Gate Arrays)
    • Pre-wired (FPGAs)
(Figure labels: FPGA, ASIC)
PROGRAMMABLE LOGIC DEVICES (GATE-ARRAYS)
• An IC that contains large numbers of gates, flip-flops, etc. that can be configured by the user to perform different functions is called a Programmable Logic Device (PLD).
• It permits elaborate digital logic designs to be implemented by the user on a single device.
• The internal logic gates and/or connections of PLDs can be changed/configured by a programming process.
• In one-time programmable devices, the logic cannot be changed once the device has been manufactured and programmed. Such devices are well suited to repeated, fixed tasks.
TYPES OF PLDS

PROGRAMMABLE LOGIC ARRAY (PLA)

COMPLEX PROGRAMMABLE LOGIC
DEVICES (CPLDS)

HOW TO PROGRAM?

• To be programmable, we need some mechanism that allows us to configure (program) a prebuilt silicon chip.
• Techniques:
  • Fuse based
  • Anti-fuse based
  • E(E)PROM
• The major difference is in how the chip is programmed, that is, how the logic is stored on the chip and how the chip retains its programming.
FUSIBLE LINK TECHNOLOGY
Example:

FUSIBLE LINK TECHNOLOGY

Design engineers can selectively remove undesired fuses by applying pulses of relatively high voltage and current to the device’s inputs.

Devices based on fusible-link technologies are said to be one-time programmable, or OTP, because once a fuse has been blown, it cannot be replaced and there’s no going back.
ANTI-FUSE TECHNOLOGY

Antifuses can be selectively “grown” (programmed) by applying pulses of relatively high voltage and current to the device’s inputs.

Devices based on antifuse technologies are OTP, because once an antifuse has been grown, it cannot be removed.
ANTI-FUSE TECHNOLOGY

An antifuse commences life as a microscopic column of amorphous (non-crystalline) silicon linking two metal tracks.

In its un-programmed state, the amorphous silicon acts as an insulator with a very high resistance, in excess of one billion ohms.
E(E)PROM
• The advantage of EEPROM is that it can be reprogrammed multiple times.
• The stored data is non-volatile and can be erased on a byte-by-byte basis.
• Unlike other types, EEPROM chips do not have to be removed from the computer to be modified.

EXERCISE

• Can you think of an application for PLDs or CPLDs?
• What are the advantages/disadvantages of PLDs?

WHAT ARE FPGAS??

• Field programmable gate arrays (FPGAs) are digital integrated circuits (ICs) that contain configurable (programmable) blocks of logic along with configurable interconnects between these blocks.

• Design engineers can configure (program) such devices to perform a tremendous variety of tasks. (Flash & SRAM)
HOW FPGA WORKS?

• A configurable logic block (CLB) consists of two slices, each of which contains look-up tables (LUTs), registers, multiplexers, and carry logic.
  • LUTs implement logic functions.
  • Registers store data.
  • Multiplexers select the desired output.
  • Carry logic enables fast arithmetic functions.
• Interconnections are routing resources including channels, switch boxes, clock distribution networks, etc.

Source: Hao Wang and Jyh-Charn (Steve) Liu

CLB ARCHITECTURE

• Arrangement of slices within the CLB

SLICE DIAGRAM

3-LUT
[Figure: a 3-input LUT with ports input[0:2], config_in, config_out, clock, and output; the stored configuration bits shown are 0 1 1 0 1 0 0 1.]

HOW LUT WORKS

• A 2-input LUT configured as follows implements an AND gate:

  Input 1   Input 2   Output
     0         0        0
     0         1        0
     1         0        0
     1         1        1

• It can be easily reconfigured to implement OR, XOR, NAND, and NOR gates, which are the basics to build up more complex functions.
• Modern FPGAs have 6-input LUTs, which can be cascaded to implement complex digital circuits.
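The LUT idea above can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the helper name `make_lut` and the bit ordering (first input is the most significant address bit) are assumptions made for this example.

```python
from itertools import product


def make_lut(config_bits):
    """Build a LUT from its configuration bits (hypothetical helper).

    config_bits[i] is the output for the input combination whose binary
    encoding is i, with the first input as the most significant bit.
    """
    n_inputs = (len(config_bits) - 1).bit_length()
    if len(config_bits) != 2 ** n_inputs:
        raise ValueError("config_bits length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        address = 0
        for bit in inputs:
            # Shift in each input bit to form the table address.
            address = (address << 1) | (bit & 1)
        return config_bits[address]

    return lut


# Same hardware model, different configuration bits, different gate.
and_gate = make_lut([0, 0, 0, 1])
xor_gate = make_lut([0, 1, 1, 0])

for a, b in product((0, 1), repeat=2):
    print(a, b, and_gate(a, b), xor_gate(a, b))
```

Reconfiguring the device amounts to loading a different list of bits; the lookup logic itself never changes.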
[Figure: four CLBs (CLB0 to CLB3) connected through direct connections, general-purpose lines, a switching matrix, horizontal long lines, and global vertical long lines.]
INTERCONNECTION ARCHITECTURE

• Switch box topology

Which takes more chip area, logic or interconnects?
• Logic: 20% ~ 30%
• Interconnect: 70% ~ 80%

IOB

MORE COMPONENTS

• Additional components
• RAM blocks
• Dedicated multipliers
• Tri-state buffers
• Transceivers
• Processor cores
• DSP blocks

Dedicated Arithmetic Structures in FPGAs

QuickLogic

Altera
Xilinx
HOW TO RECONFIGURE FPGA

• Logic block functions and interconnections are specified using a Hardware Description Language (HDL).
• The HDL code is synthesized, mapped, placed, routed and downloaded onto the chip by vendor-provided tools.
HOW TO RECONFIGURE FPGA

• Each time a new configuration is downloaded, the FPGA behaves like a new chip implementing a new function.

• The configuration is lost when the chip is powered off, because it is stored in SRAM.
  • However, the configuration file can be stored in external storage such as a flash card or PROM so that the chip can automatically load it when powered on.
HOW TO PROGRAM?

• In order to be programmable, we need some mechanism that allows us to configure (program) a prebuilt silicon chip.

• Techniques:
  • Flash technology
  • SRAM based

FLASH TECHNOLOGY

• Flash technology is a variant of EEPROM (electrically erasable programmable read-only memory). The method of storing data is based on transistors with a “floating gate”, which controls the behavior of the transistor.

• The floating gate is fully insulated and therefore holds a charge without a power supply. Writing and erasing are performed using a high voltage.

• Flash memory withstands only a limited number of program/erase (P/E) cycles, ranging from a typical value of 100 000 up to a million.
SRAM TECHNOLOGY
• An SRAM-based chip holds the logic in memory only while the chip is powered, that is, it is volatile memory. Such chips must be reconfigured every time the system is powered up. This requires the use of either a special external memory device or an on-board microprocessor.

• SRAM technology holds the programming as the state of groups of cross-coupled transistors.

• A chip based on flash technology is reprogrammable like SRAM, but it retains the programming without power.
SRAM TECHNOLOGY

• Compared to the other technologies, the SRAM cell is large (6–12 transistors) and dissipates significant static power because of leakage current.
• This method has become popular because it provides fast and effectively unlimited reconfiguration.

CATEGORY OF FPGA DEVICES

• Coarse grained: in a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic.

• Fine grained: in a fine-grained architecture, each logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.).
IMPORTANCE OF FPGAS

• Can implement extremely large and complex functions that previously could be realized only using ASICs.
• Implementing design changes is much easier in FPGAs, and the time-to-market for such designs is much faster.

WHY USE FPGA?

• Compared to CPU, FPGA has the following benefits:
  • Performance
  • Throughput
  • Reliability
• Compared to ASIC, FPGA has the following benefits:
  • Reconfigurability
  • Cost
  • Time to market
BENEFIT OVER CPU: PERFORMANCE

• FPGA logic runs in real time, with delays on the order of nanoseconds.
• All the logic blocks inside an FPGA can run concurrently in parallel, since they are real hardware circuits.
• If fully utilized, an FPGA can implement a high-performance many-core system.

BENEFIT OVER CPU : THROUGHPUT

• An FPGA has many high-speed I/O pins to interface with memory, Ethernet, etc. It has dedicated channels for peripherals, unlike a CPU, whose peripherals share a bus.
• FPGA vendors provide many high-speed communication IP cores.

BENEFIT OVER CPU : RELIABILITY

• Software programs are usually built upon several layers of abstraction (driver, OS, etc.) to help schedule tasks and share resources. They are at risk of incompatibility, resource contention, deadline violation, etc.
• FPGA circuitry is a hard implementation, which is deterministic and predictable.

BENEFIT OVER ASIC : RECONFIGURABILITY

• An ASIC can only do a specific job. If you want an extra function after an ASIC has been manufactured, you have to design another one. For an FPGA, it only takes adding a piece of code and recompiling.
• Certain bugs are caught only after the ASIC has been manufactured and distributed to customers. A recall may bankrupt the company. For an FPGA, developers can fix the bug in the HDL code and distribute the update to customers.

BENEFIT OVER ASIC : COST

• ASIC development incurs a high non-recurring engineering (NRE) cost. Product fabrication also costs a lot of money.
• FPGA vendors produce large volumes of chips, so end users do not need to bear the fabrication cost. The reconfigurability of an FPGA makes it reusable for different projects.

BENEFIT OVER ASIC : TIME TO MARKET

• An ASIC costs many man-years to design, test, validate, and fabricate.
• FPGA development boards usually come with a rich set of peripherals and IP cores, which makes them ideal for rapid prototyping.

EXERCISE

• What are the disadvantages of using an FPGA?

ASIC
- Hundreds of millions of logic gates can be integrated on the same chip using ASICs to create incredibly large and complex functions.

- Due to the extremely high cost of building a new silicon foundry, many companies work in design only, and a few companies specialize in fabricating those designs.

ASIC: DESIGN APPROACHES

Full Custom
A design methodology in which the resistors, transistors, digital logic, capacitors and analog circuits are all positioned by hand in the circuit layout [“handcrafted” designs].
Pros:
• Maximum performance,
• minimized area and
• highest degree of flexibility.
Cons:
• Huge design effort,
• high design cost and NRE cost,
• design is frozen in silicon, and
• long time to market.
ASIC: DESIGN APPROACHES
Semi-Custom [Standard Cell Based ASICs]
• Components from a predesigned
standard cell library are used.
• All logic cells are predesigned and
some mask layers are only
customized.
• Standard cell libraries are usually
designed using full custom
approach.

ASIC: DESIGN APPROACHES
Semi-Custom [Standard Cell Based ASICs]

Pros over Full Custom:
• Easier,
• automatable/less design effort,
• practical to use for large designs,
• reduced risk.

STANDARD CELLS: THE MAIN BUILDING
BLOCKS

ASIC COST
Total Product Cost = NRE + (Number of parts * recurring cost per part)

NRE: fixed non-recurring engineering cost.
• EDA tools and training.
• Project development/verification cost.
  • Analog design and layout design, logic synthesis, DFT/ATPG, PnR.
  • Formal verification, logic simulation, functional simulation, co-simulation, physical verification, EMIR, static timing analysis.
  • A single EDA tool license can cost hundreds of thousands of dollars.
• ASIC vendor costs (ex. masks).

Cost-per-part
• Wafer cost.
• Wafer processing.
• Production yield.
• Packaging.
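The total-cost formula above can be turned into a small break-even estimate. The sketch below is illustrative only: the function names are hypothetical and every dollar figure is a made-up assumption, not data from the slides or from any vendor.

```python
import math


def total_cost(nre, unit_cost, volume):
    """Total Product Cost = NRE + (number of parts * recurring cost per part)."""
    return nre + volume * unit_cost


def break_even_volume(asic_nre, asic_unit, fpga_unit, fpga_nre=0.0):
    """Smallest volume at which the ASIC is no more expensive than the FPGA.

    Returns None when the ASIC unit cost is not lower than the FPGA unit
    cost, because the NRE can then never be recovered.
    """
    if asic_unit >= fpga_unit:
        return None
    volume = math.ceil((asic_nre - fpga_nre) / (fpga_unit - asic_unit))
    return max(volume, 0)


# Hypothetical numbers, chosen only to illustrate the calculation.
ASIC_NRE = 2_000_000   # masks, tools, engineering
ASIC_UNIT = 5          # recurring cost per ASIC part
FPGA_UNIT = 50         # recurring cost per FPGA part

volume = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_UNIT)
print("break-even volume:", volume)
print("ASIC total:", total_cost(ASIC_NRE, ASIC_UNIT, volume))
print("FPGA total:", total_cost(0, FPGA_UNIT, volume))
```

Below the break-even volume the FPGA is the cheaper option; above it the lower ASIC unit cost pays back the NRE.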
FPGA vs ASIC
Advantages
• FPGA:
  • Faster time to market.
  • No NRE.
  • Simpler design cycle, more predictable.
  • Re-programmability.
  • Reusability.
  • Perfect for prototyping.
  • Has built-in blocks like MACs, memories, high-speed IOs.
• ASIC:
  • Lower unit cost for mass production.
  • Faster than FPGA.
  • Lower power than FPGA.
  • More flexible; analog and mixed-signal designs can be created, which is not possible in FPGA.
Disadvantages
• FPGA:
  • Higher unit cost.
  • Slower than ASIC.
  • Higher power than ASIC.
  • No control over power optimization.
  • Design size limited by FPGA resources.
• ASIC:
  • Longer time to market.
  • High NRE, very expensive tools.
  • Design cycle has to analyze/enhance more aspects like DFM, crosstalk, EMIR, LVS, ERC/PERC.
  • Design is frozen in silicon.
Typical usage
• FPGA: complex products in small volume (ex. medical and defense); digital HW prototyping.
• ASIC: products in large volume.
DIE PHOTOS: VIRTEX VS. PENTIUM IV

The FPGA Virtex chip looks remarkably well structured: a very dense, very regular structure.

The full-custom Pentium chip is somewhat more random in structure. Large on-chip memories (caches) are visible.
EXERCISE
What is the best design approach for implementing each of the following: [hint: you can use a mix of approaches when
needed]

• ADC
• Microcontroller
• AI accelerator HW testing and prototyping.
• Mobile application chip.
• Medical and Aerospace applications. [Low volume and high complexity]

EXERCISE

• When does the high NRE cost of ASIC development make sense economically?

SEMICONDUCTORS PLAYERS

MAIN SEMICONDUCTOR MARKET PLAYERS:
FABLESS VS FABS
- Fabless semiconductor companies design only the layout, which specifies the placement of the different layers inside the IC (n-diffusion, p-diffusion, metal1, metal2, etc.). These placements define the transistors and the connections between them.
- For example:
  Chip vendors: Qualcomm, Broadcom, Nvidia, Infineon, Freescale and Renesas.
  IP vendors: Arm, Synopsys, Cadence, Imagination Technologies and CEVA.

- Fabrication foundries first ensure that this layout satisfies their design rules, then start fabricating it. They play the role of a pure-play foundry, specializing in fabrication and not competing with their customers [chip vendors].
- For example: TSMC, Samsung, UMC.
- Intel is now joining the market to fabricate for other chip vendors.
TAPE-OUT
• The layout is sent to the fab house in GDSII or OASIS file format.
• The process of delivering the design to the fab is called "tapeout", as it used to be sent on a magnetic tape. Now it is sent electronically.

EXERCISE

• What are the benefits of IPs?


• What are the benefits of integrating more components into a single
chip, moving system blocks from board level to chip level?
[cost, performance, power, integration]

DESIGN ABSTRACTION

DESIGN ABSTRACTIONS
It is all about hiding details until they become necessary.
The practice of structured design, which is also used in large software projects, uses the principles of hierarchy, regularity, modularity, and locality to manage the complexity.

DESIGN ABSTRACTIONS
• System level (Electronic System Level, ESL)
  • Modeling object: structural circuit.
  • Example of modeling object: RAM, bus, CPU.
  • Hardware description language used: C/C++, SystemVerilog, SystemC.
• Register-Transfer Level (RTL)
  • Modeling object: functional circuits on the level of multibit devices (registers) and data transfer between them.
  • Example of modeling object: adder, accumulator, input, command register, +1, command counter.
  • Hardware description language used: Verilog, VHDL.
• Gate level (gate-level netlist, logic circuit)
  • Modeling object: circuit containing logic gates (AND, OR, etc.) and flip-flops.
  • Example of modeling object: JK flip-flop.
  • Hardware description language used: Verilog, VHDL.
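DESIGN ABSTRACTIONS
• Circuit level (transistor level, SPICE netlist)
  • Modeling object: electrical circuit.
  • Hardware description language used: SPICE, CDL.
• Device level
  • Modeling object: IC components.
  • Example of modeling object: [Figure: device cross-section with n+, p+, n and p regions.]
  • Hardware description language used: -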
NOTES
These levels are interdependent and all influence each of the design
objectives.
For example, choices of microarchitecture and logic are strongly
dependent on the number of transistors that can be placed on the chip,
which depends on the physical design and process technology.

Digital VLSI design favors the engineer who can evaluate how
choices in one part of the system impact other parts of the
system.

DIGITAL ASIC DESIGN FLOW
Usually, for electronic systems, whether analog or digital, circuit design is separated from layout design. Each job is performed by a different engineering team.

Circuit design (RTL design for digital systems) is done by the frontend team.
Layout design (PnR for digital systems) is done by the backend team [also called digital implementation, chip implementation or ASIC design].

DIGITAL BACKEND DESIGN
It’s the transformation of a digital circuit design (RTL in Verilog or VHDL) into a physical representation (layout) for manufacturing (GDS/OASIS).

The design, after being represented in the physical layout, has to meet all signoff criteria:
A. Logically equivalent to the RTL [Formality clean vs. RTL].
B. Timing signoff.
C. Physical signoff (DRC, LVS, Antenna, etc.).
D. Power integrity signoff.
DESIGN VERIFICATION
• Formal (static) verification:
  • Checks logic equivalence between different abstraction levels.
• Static Timing Analysis:
  • Checks timing against constraints:
    • setup/hold/recovery/removal
    • max transition / max capacitance
• Physical transistor-level verification:
  • Layout vs. schematic (LVS)
  • Design rule check (DRC)
• EMIR/power verification:
  • Electromigration and IR drop analysis
DIGITAL IMPLEMENTATION / PNR ENGINEER ROLE
• Digital Implementation (DI) is the last step in the VLSI design flow.
  • Basically, transform RTL to GDSII and verify your design.
  • Any mistake in the flow ripples to the DI team!
  • Interface with other design teams, ex: digital design-verification / analog layout / other DI teams / fab house / CAD team / packaging team / lab team / ...

• Automation has limitations!
  • EDA (Electronic Design Automation) tools are limited and the design problem is very hard!
  • Tradeoffs to consider (mainly PPA: Power, Performance, Area):
    • Ex: reduce congestion by placing cells apart vs. improve timing by placing cells closer.
    • More performance usually requires more power and area.
Overall Design Flow
1. Concept + Market Research
2. Architectural Specs & RTL coding
3. RTL simulation (VCS / ModelSim / Verilator)
4. Logic Synthesis (DC / Yosys)
5. DFT (DFT Compiler / Fault)
6. Formal verification (Formality)
7. Floorplanning
8. Power Planning & boundary cell insertion
9. Placement
10. CTS and post-CTS logical optimization
11. Routing & Chip Finishing
    (Steps 7 to 11: ICC / ICC2 / OpenROAD)
12. Signoff checks:
    • Physical Verification (ICV / Calibre / Magic / KLayout)
    • Formal verification
    • Extraction (StarRC)
    • STA (PT / OpenSTA)
    • EM / IR (RedHawk)
13. Clean?
    • No: ECO (re-PnR / re-synthesis / RTL/SDC update), then repeat.
    • Yes: Signoff/Tapeout
EXERCISE 4
LIST 4 MAIN RESPONSIBILITIES OF DIGITAL
FRONTEND TEAMS, AND DIGITAL BACKEND
TEAMS.

DIGITAL ASIC DESIGN FLOW
RESPONSIBILITIES: FRONTEND VS BACKEND
Digital Frontend team (System → algorithms → RTL)
- Choosing microarchitecture, suitable algorithms to use, number of pipeline stages, etc. according to the system specifications.
- Developing synthesizable RTL and constraints.
- Developing suitable DC scripts for synthesis and DFT.
- Specifying power domains and generating UPF maps.
- Performing verification for the RTL.
- Performing gate-level simulations to check backend deliverables.
Digital Backend team (RTL → logic → logic cells and FFs → standard cells)
- Meeting timing requirements of setup and hold according to SC library specifications.
- Meeting special timing requirements asked by the digital team (ex. skew balancing).
- Matching the digital team's intended design: gate-level netlist matching RTL.
- Meeting physical design rules specified by the foundry (DRC, LVS, Antenna, DFM).
- Minimizing IR drop over the design so that it is below a defined threshold (~2%).
- Minimizing power consumption so that it’s comparable
EXERCISE 5
For timing closure: is it the responsibility of FE, BE, or both?

How can we confirm that our design is ready for tapeout/fabrication? [3 criteria]

STANDARD LIBRARY

STANDARD CELL LIBRARY

A good application of design abstraction.


A standard cell is a group of transistor and interconnect structures that provides a
boolean logic function (e.g., AND, OR, inverters) or a storage function (flipflop or latch).

STANDARD CELL LIBRARY
• Cell categories
All basic and universal gates (AND, OR, NOT, NAND, NOR, XOR etc)
Complex gates (MUX, HA, FA, Comparators, AOI, OAI etc)
Clock tree cells (Clock buffers, clock inverters, ICG cells etc)
Flip flops and latches
Delay cells
Physical only cells
Scannable Flip flops, Latches.

CELL VARIANTS: DRIVE STRENGTHS
Each logic cell (NAND, NOR, INV…) is implemented in the SC library in:
A. Multiple sizes (x1, x2, x4, x8, etc.).
B. Multiple flavors (LVT, SVT).

Each cell has various drive strengths for effective speed vs. area optimization.
• A larger output stage leads to better driving of fanouts and better delay/performance, at the cost of increased area and leakage power.
• A smaller drive strength means less area, leakage and input capacitance.
• The larger the transistor width, the more current goes through the channel.
CELL VARIANTS: MT-CMOS
One additional mask can provide more or less doping in a transistor channel, shifting the threshold voltage.
Most libraries provide equivalent cells with three VTs (SVT, HVT, LVT) to trade off speed vs. leakage.
All threshold varieties have the same footprint and therefore can be swapped without any placement/routing iterations. [Footprint: pins and obstructions]
EXERCISE
What are the preferred cells to use for the following applications? [size and flavor]
- Heart pacemaker.
- Datacenter processor.
- Battery-powered IoT device.

Compare the following cells in terms of speed, area, and leakage power.
- svt_x2_buf, svt_x8_buf, svt_x16_buf
- svt_x2_buf, lvt_x2_buf [lvt: low V-threshold cell]

- Compare using a complex “AOI” cell with implementing the same logic using equivalent NAND2 cells, w.r.t. overall delay and performance.

STANDARD CELL LIBRARY: TYPICAL VIEWS
Behavioral views
Gate-level netlist (.v): used for simulation and logic equivalence.
Timing/Power views (.lib): contains characterization of library used for STA and EMIR
analysis. Also input to logic synthesis and PnR tools for optimization.

Physical views
.lef: abstract format for modeling of cells in PnR tools.
.gds: graphical representation of the layout going to be fabricated, used for DRC and
LVS.
.sp: spice netlist contains transistor level representation of cells, used for LVS.

STANDARD CELL LAYOUT
• At the top of the standard cell there is a VDD rail, and at the bottom there is a VSS rail.
• An nwell region near the VDD rail, where pMOS transistors are built.
• A gap between the nwell and pwell, usually dedicated to wiring.
• A pwell region near the VSS rail, where nMOS transistors are built.

STANDARD CELL LAYOUT [EXAMPLE]

LIBERTY FILE (.LIB OR .DB FILE)
• The .lib file is a readable ASCII format that characterizes the standard cell library cells in terms of timing, area, power and other parameters. The .db file is the compiled binary form of the same data, used by Synopsys tools.

• Each cell is characterized using simulation, and timing and power results are obtained under a variety of conditions.

1. LIBERTY FILE (.LIB OR .DB FILE)
Why use .lib?
- To know whether the design meets timing or not:
  • Running SPICE would consume a lot of time and computing resources.
  • Instead, we use a timing model that abstracts cell behavior and simplifies calculations.

For every timing arc, the .lib enables us to calculate:
A. Propagation delay
B. Output transition
Based on:
C. Input transition
D. Output load capacitance

For each signoff corner, we use the .lib file provided for that corner to perform timing analysis (STA) and power analysis as well.
LIBERTY FILE (.LIB OR .DB FILE)

NON-LINEAR DELAY MODEL (NLDM)
- Non-linear delay models use SPICE-derived timing at several input_transition and output_load points.
- Data points not found in the tables are linearly interpolated.
- Note the presence of two tables:
  A. One for cell delay,
  B. and another for output transition (true rise/fall time).

Why do we need to calculate output transition?
The output transition is needed in addition to the cell delay, because it is used as the input transition on the downstream cell to calculate its delay.
Output transition is also used to check against the max_transition design rule, if applicable in the library.

Input transition also affects output transition in this model, which in turn affects the cell delay and output transition downstream.
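The table lookup described above can be sketched as a bilinear interpolation. The sketch below is a simplified model, not an STA tool's algorithm: the names `nldm_lookup` and `_bracket` are hypothetical, the table is the INVX1 cell_rise table that appears later in this deck, and units are whatever the library header defines. Points outside the table are linearly extrapolated from the nearest table segment, which is an assumption of this sketch.

```python
from bisect import bisect_right

# index_1: input transition, index_2: output load (units per the library header)
INPUT_TRANSITION = [0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024]
OUTPUT_LOAD = [0.1, 0.25, 0.5, 1, 2, 4, 8]
CELL_RISE = [  # one row per input transition, one column per output load
    [0.0168610, 0.0179019, 0.0195185, 0.0229259, 0.0296588, 0.0431451, 0.0702328],
    [0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211],
    [0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339],
    [0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571],
    [0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640],
    [0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010],
    [0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452],
]


def _bracket(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def nldm_lookup(table, axis1, axis2, x1, x2):
    """Bilinear interpolation of table at (x1, x2)."""
    i = _bracket(axis1, x1)
    j = _bracket(axis2, x2)
    t = (x1 - axis1[i]) / (axis1[i + 1] - axis1[i])
    u = (x2 - axis2[j]) / (axis2[j + 1] - axis2[j])
    return ((1 - t) * (1 - u) * table[i][j]
            + (1 - t) * u * table[i][j + 1]
            + t * (1 - u) * table[i + 1][j]
            + t * u * table[i + 1][j + 1])


# Delay for an input transition and load that fall between table points.
print(nldm_lookup(CELL_RISE, INPUT_TRANSITION, OUTPUT_LOAD, 0.1, 0.75))
```

A second table with the same axes would be looked up in the same way to get the output transition, which then becomes the input transition of the downstream cell.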
1. LIBERTY FILE (.LIB OR .DB FILE)
Timing data of standard cells is provided in the Liberty format.

• Library
  • General information common to all cells in the library.
  • For example: operating conditions, wire load models, look-up tables.
• Cell
  • Specific information about cell characterization.
  • For example: function, area, leakage power.

1. LIBERTY FILE (.LIB OR .DB FILE)
Timing data of standard cells is provided in the Liberty format.

• Pin
  • Timing, power, capacitance, leakage, functionality, design rules and other characteristics of each pin in each cell.

PARASITIC ESTIMATION: WLM

• Parasitics are inevitable.
• Parasitics are not known without layout.
• Delay and area estimates will be incorrect/optimistic without an estimation of parasitics.
• To calculate the output load capacitance of a cell, we need to calculate both:
  1. Input capacitance of the load(s).
  2. Wire capacitance of the interconnect.
• During logic synthesis, we don’t have actual placed cells, so we use a wire load model (WLM) to estimate interconnect parasitics, based on the fanout of the net.

$R = \text{length} \cdot R_{\text{unit length}}$
$C = \text{length} \cdot C_{\text{unit length}}$

ESTIMATING PARASITICS
Generalization
• All parasitics depend on interconnect length:
  $R = \text{length} \cdot R_{\text{unit length}}$
  $C = \text{length} \cdot C_{\text{unit length}}$
  $\text{Area} = \text{length} \cdot \text{Area}_{\text{unit length}}$
  $\text{length} = f(\text{gate count}, \text{fanout})$
Length components
• The more the fanout (output connections), the larger the length.
• The larger the chip (the more gates it has), the larger the length.
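The wire load estimate above can be sketched as a fanout-to-length table followed by the per-unit-length multiplications. Everything in this sketch is an assumption for illustration: the dictionary layout, the function names and all numeric values are hypothetical, and the linear extrapolation beyond the table (using a slope) is one common convention rather than something stated on the slides.

```python
# Hypothetical wire load model; all numbers are illustrative only.
WLM = {
    "r_per_unit": 0.01,   # resistance per unit length
    "c_per_unit": 0.02,   # capacitance per unit length
    "slope": 1.5,         # extra length per fanout beyond the table
    "fanout_length": {1: 1.0, 2: 2.0, 3: 3.5, 4: 5.0},
}


def wlm_length(fanout, model):
    """Estimated net length as a function of fanout."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    table = model["fanout_length"]
    max_fanout = max(table)
    if fanout <= max_fanout:
        # Assumes the table lists every fanout from 1 to max_fanout.
        return table[fanout]
    # Beyond the table, extrapolate linearly with the slope.
    return table[max_fanout] + model["slope"] * (fanout - max_fanout)


def net_parasitics(fanout, model):
    """R = length * R_unit_length, C = length * C_unit_length."""
    length = wlm_length(fanout, model)
    return {
        "length": length,
        "R": length * model["r_per_unit"],
        "C": length * model["c_per_unit"],
    }


def output_load(load_pin_caps, model):
    """Load seen by the driver: input caps of the loads plus estimated wire cap."""
    wire = net_parasitics(len(load_pin_caps), model)
    return sum(load_pin_caps) + wire["C"]


print(net_parasitics(6, WLM))
print(output_load([0.01, 0.01], WLM))
```

The resulting load capacitance is what a synthesis tool would feed, together with the input transition, into the cell's delay tables.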
EXERCISE

For calculating the propagation delay of a specific cell, how can DC/ICC calculate its input transition and load
capacitance?

For critical timing paths, which cells should be used by logic synthesis tools? (size/flavor/complex or simple cells)

TIMING GROUP NAMES
1. Rise transition time (rise_transition)
   • Unit: ns. Symbol: tR.
   • Definition: the time it takes a driving pin to make a transition from kVDD to (1-k)VDD. Usually k=0.1 (k=0.2, 0.3, etc. are also possible).
2. Fall transition time (fall_transition)
   • Unit: ns. Symbol: tF.
   • Definition: the time it takes a driving pin to make a transition from (1-k)VDD to kVDD. Usually k=0.1 (k=0.2, 0.3, etc. are also possible).
3. Propagation delay low-to-high (rise) (cell_rise)
   • Unit: ns. Symbol: tPLH (tPR).
   • Definition: the time difference between the input signal crossing 0.5VDD and the output signal crossing 0.5VDD when the output signal is changing from low to high.
4. Propagation delay high-to-low (fall) (cell_fall)
   • Unit: ns. Symbol: tPHL (tPF).
   • Definition: the time difference between the input signal crossing 0.5VDD and the output signal crossing 0.5VDD when the output signal is changing from high to low.
CELL TIMING DATA
library () {
  lu_table_template ("del_1_7_7") {
    variable_1 : "input_net_transition";
    index_1("1, 2, 3, 4, 5, 6, 7");
    variable_2 : "total_output_net_capacitance";
    index_2("1, 2, 3, 4, 5, 6, 7");
  }

  cell (INVX1) {
    pin (Y) {
      timing () {
        related_pin : "A";
        timing_type : "combinational";
        timing_sense : "negative_unate";
        cell_rise ("del_1_7_7") {
          index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
          index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
          values("0.0168610, 0.0179019, 0.0195185, 0.0229259, 0.0296588, 0.0431451, 0.0702328", \
                 "0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211", \
                 "0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339", \
                 "0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571", \
                 "0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640", \
                 "0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010", \
                 "0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452");
        }

COMBINATIONAL TIMING ARC SYNTAX
Combinational timing arc between input A and output Y, with negative dependence, i.e. when A is rising Y is falling and vice versa.

cell (INVX1) {
  pin (Y) {
    timing () {
      related_pin : "A";
      timing_type : "combinational";
      timing_sense : "negative_unate";
      cell_rise ("del_1_7_7") {
        index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
        index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
        values("0.0168610, 0.0179019, 0.0195185, 0.0229259, 0.0296588, 0.0431451, 0.0702328", \
               "0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211", \
               "0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339", \
               "0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571", \
               "0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640", \
               "0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010", \
               "0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452");
      }

DELAY ANALYSIS
Calculation of each timing arc’s value: a cell delay or a net delay.
• A positive unate timing arc combines rise delays with rise delays and fall delays with fall delays.
• A negative unate timing arc combines incoming rise delays with local fall delays and vice versa.
• A non-unate timing arc combines the local delay with the worst-case incoming delays. It applies to logic functions whose output value change cannot be predicted.
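The three cases above can be sketched as a small arrival-time propagation function. This is a simplified model for latest-arrival (setup-style) analysis only: the function name `propagate` is hypothetical, the sense strings follow the Liberty timing_sense values, and the delays are plain integers in picoseconds chosen for illustration.

```python
def propagate(in_rise, in_fall, cell_rise, cell_fall, sense):
    """Return (out_rise, out_fall) latest arrival times through one timing arc.

    in_rise / in_fall are the arrival times of the rising and falling input
    edges; cell_rise / cell_fall are the arc delays for a rising and a
    falling output edge.
    """
    if sense == "positive_unate":
        # Rising input produces rising output, falling produces falling.
        return in_rise + cell_rise, in_fall + cell_fall
    if sense == "negative_unate":
        # Rising output is caused by a falling input, and vice versa.
        return in_fall + cell_rise, in_rise + cell_fall
    if sense == "non_unate":
        # Output direction cannot be predicted: use the worst incoming edge.
        worst_in = max(in_rise, in_fall)
        return worst_in + cell_rise, worst_in + cell_fall
    raise ValueError(f"unknown timing sense: {sense}")


# Illustrative numbers in picoseconds.
print(propagate(100, 120, 10, 20, "positive_unate"))
print(propagate(100, 120, 10, 20, "negative_unate"))
print(propagate(100, 120, 10, 20, "non_unate"))
```

A buffer or AND gate would use the positive unate case, an inverter or NAND gate the negative unate case, and an XOR gate the non-unate case.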
TIMING CONSTRAINTS: TIMING TYPES
Setup/Hold, Recovery/Removal Constraints

N | Parameter | Unit | Symbol | Figure | Definition
1 | Setup time (only for flip-flops or latches): setup_rising, setup_falling | ns | tSU | DATA and CLOCK, measured at 0.5VDD | The minimum period in which the input data to a flip-flop or a latch must be stable before the active edge of the clock occurs
2 | Hold time (only for flip-flops or latches): hold_rising, hold_falling | ns | tH | DATA and CLOCK, measured at 0.5VDD | The minimum period in which the input data to a flip-flop or a latch must remain stable after the active edge of the clock has occurred
3 | Removal time (only for asynchronous Set or Reset): removal_rising, removal_falling | ns | tREM | SET (RESET) and CLOCK, measured at 0.5VDD | The minimum time in which the asynchronous Set or Reset pin to a flip-flop or latch must remain enabled after the active edge of the clock has occurred
4 | Recovery time (only for asynchronous Set or Reset): recovery_rising, recovery_falling | ns | tREC | SET (RESET) and CLOCK, measured at 0.5VDD | The minimum time in which Set or Reset must be held stable after being deasserted before the next active edge of the clock occurs
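To make the setup and hold definitions concrete, the sketch below checks a list of data transition times against one active clock edge. The function name and the numbers in the usage note are invented for illustration; recovery and removal checks follow the same pattern, with the deassertion time of Set/Reset in place of the data transition.

```python
def check_setup_hold(clock_edge, data_changes, t_su, t_h):
    """Return a list of (kind, time) for every data transition inside the forbidden window.

    Data must be stable from clock_edge - t_su (setup) to clock_edge + t_h (hold).
    A transition exactly on the window boundary is treated as legal.
    """
    violations = []
    for t in data_changes:
        if clock_edge - t_su < t <= clock_edge:
            violations.append(("setup", t))
        elif clock_edge < t < clock_edge + t_h:
            violations.append(("hold", t))
    return violations
```

With a clock edge at 10.0 ns, tSU = 0.5 ns and tH = 0.25 ns, a data change at 9.75 ns is reported as a setup violation and one at 10.125 ns as a hold violation, while changes at 9.0 ns or 10.25 ns pass.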
LEF: LIBRARY EXCHANGE FORMAT (.LEF FILE)
• It’s a readable ASCII format that contains detailed PIN information that is
used later by PnR tools to guide routing.

• The LEF file abstracts the following information to PnR tools:


A. Cell size and shape.
B. Pin locations and layer.
C. Metal blockages (OBS section), that represent internal metal shapes of the
cell not to be touched by routing.

TECHNOLOGY LEF
The technology .lef contains simplified information about the technology to be used by the PnR tool (physical synthesis tool):
• Layers
• Via definitions
• Design rules
• Antenna data

SPICE AND GDS
• The SPICE netlist is the netlist of a cell in SPICE format, used for simulation.
• Typically used in digital implementation for LVS checking.

• The GDSII file is a binary file format representing planar geometric shapes, text labels, and other information about the layout in hierarchical form.
• A better alternative widely used now: the OASIS format.
.DEF: DESIGN EXCHANGE FORMAT
The .def file holds both physical and logical information of the design.
It is used for exchanging information between tools, enabling interoperability within the ASIC flow. For example, doing floorplanning and placement with one tool, CTS with a second one, and parasitic extraction with a third one.
This file has information about:
Die/block size and dimensions
Row height
Nets and NDRs (non-default rules)
Placement blockages and routing blockages
Macro and standard cell locations
Pin locations, etc.
It can be used as an input to parasitic extraction tools (.def/.lef flows).
SYNTHESIS

Logic Synthesis: Background
Hardware description languages (HDLs) such as VHDL and Verilog have emerged as the primary means to capture functionality and deal with a range of design issues.

The use of logic synthesis has made it possible to effectively translate designs captured in these high-level languages to designs optimized for area and speed.

Besides enabling control over design parameters such as silicon real estate and timing, logic synthesis tools facilitate the capture of designs in a parameterizable and reusable form. Moreover, logic synthesis makes it possible to retarget a given design to new and emerging semiconductor technologies.
Logic Synthesis Definition
“Synthesis is achieving an optimal gate-level netlist from HDL code.”

The logic synthesis process consists of two steps: translation and optimization.
Translation involves transforming an HDL (RTL) description to gates; optimization involves selecting the optimal combination of ASIC technology library cells to achieve the required functionality.

Synthesis is an iterative process aimed at achieving design goals.

Logic synthesis and Formality: Basic Flow

Inputs: Uncertainties, Constraints, RTL & header, SC .libs
↓
Logic Synthesis (DC)
↓
Inputs to formal verification: RTL & header, .svf files, Prefloorplan netlist (.v), SC .libs
↓
Formal Verification (Formality)

Basic Steps of Logic Synthesis

RTL: y=(a+b)&(c⊕d)&e
↓ Logic Synthesis
Gate (netlist with inputs a, b, c, d, e and output y)
↓ Physical Synthesis
Layout

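Whatever gates the tool picks, the netlist has to compute the same function as the RTL. The sketch below checks this by brute force for the example y=(a+b)&(c⊕d)&e, reading + as OR. The gate-level version is a hypothetical mapping chosen for illustration, not the netlist from the slide, and a real equivalence checker such as Formality does not enumerate inputs like this.

```python
from itertools import product


def rtl(a, b, c, d, e):
    """RTL view: y = (a + b) & (c xor d) & e, with + meaning OR."""
    return (a | b) & (c ^ d) & e


def gates(a, b, c, d, e):
    """One possible gate-level mapping (hypothetical cell choice)."""
    n1 = 1 - (a | b)      # NOR2
    n2 = 1 - (c ^ d)      # XNOR2
    n3 = 1 - (n1 | n2)    # NOR2: equals (a | b) & (c ^ d)
    return n3 & e         # AND2


def equivalent(f, g, n_inputs):
    """Exhaustively compare two single-output functions over all input combinations."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))
```

Calling equivalent(rtl, gates, 5) compares all 32 input combinations; swapping any gate for a different one makes the check fail.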
Synthesis and Optimization
Synthesis/Translation
The process which converts an abstract form of desired circuit behavior, e.g. y=(a+b)&(c⊕d)&e, into a design implementation in terms of logic gates.
Optimization
Changing the design to achieve a design goal (required by the specification).
For example, meeting a design rule (e.g. max_transition, max_fanout).
For example, optimizing the design to meet a certain design criterion (e.g. minimum leakage, minimum area, etc.).
Logic synthesis: Basic Flow
• Develop HDL files
• Specify libraries
  Library objects: link_library, target_library, symbol_library
• Read design
  analyze/elaborate, read_file
• Set design constraints
  Design optimization constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area
  Design rule constraints: set_max_transition, set_max_fanout, set_max_capacitance
• Define design environment
• Optimize the design
  compile
• Analyze and resolve design problems
  check_design, report_area, report_constraint, report_timing
• Save the design database
  write_file

Library Requirements
1. Logic Libraries
2. Symbol Libraries
3. DesignWare Libraries
4. Physical Libraries

Logic Libraries
Logic libraries:
Are maintained and distributed by semiconductor vendors.
Contain information about the characteristics and functions of each cell, such as cell names, pin names, area, delay arcs, and pin loading.
Define design rule constraints.
Specify the operating conditions and wire load models for a specific technology.

Logic Libraries
DC uses logic libraries for the following purposes:
Implementing the design function
Resolving cell references
Calculating timing values and path delays
Calculating power consumed

Note
DC Explorer requires the logic libraries to be in .db format. A .db file is a compiled version of the ASCII .lib file; its smaller size results in faster tool execution.

Logic Libraries – Setup
Target libraries
They contain the cells used to generate the netlist.
DC Explorer selects functionally correct gates from the target libraries to build a circuit during mapping.
The target libraries that are used to map a design become the local link libraries for the design. DC saves this information in the design's local_link_library attribute.
To specify the target libraries, use the target_library variable.

When optimizing the design, Design Compiler uses this library to search for alternative cells that can be used to optimize the design for better timing, less power, or less area.

Logic Libraries – Setup
Link libraries
They are used to resolve cell references and macros.
Link libraries contain the descriptions of library cells and sub-designs in a mapped netlist and can also contain design files.
Link libraries include local link libraries defined in the local_link_library attribute and system link libraries specified in the link_library variable.
Link libraries define the delay models that are used to calculate timing values and path delays.

Logic Libraries – Setup
search_path
This variable specifies a list of directory paths that the tool uses to find logic libraries and other files when you specify a plain file name without a path. It also sets the paths where DC Explorer can continue the search for unresolved references after it searches the link libraries.

You can use the which command to list the library files in the order in which DC Explorer finds them.
dc_shell> which my_lib.db
/usr/lib/my_lib.db, /usr/vhdl/my_lib.db

set search_path "../ref/models ../ref/icons"

set link_library "* saed90nm_typ_ht.db"
set target_library "saed90nm_typ_ht.db"
set symbol_library "saed90nm.sdb"

Logic Libraries – Link Command
For a design to be complete, it needs to be connected to all of the library components and designs it references.
The references must be located and linked to the current design in order for the design to be functional.

The purpose of this command is to locate all of the designs and library components referenced in the current design and connect (link) them to the current design.

Search order during a link is:
1. local_link_library → 2. link_library → 3. search_path
The first occurrence of a design reference is used.
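The "first occurrence wins" rule can be modelled in a few lines. This is a toy model only: each of the three locations is represented as an ordered list of (container name, set of design names), and the function name resolve and the library names in the usage note are invented for illustration.

```python
def resolve(ref, local_link_library, link_library, search_path):
    """Return (location, container) of the first match for ref, or None if unresolved.

    Each argument after ref is an ordered list of (container_name, set_of_design_names).
    The three locations are searched in the order stated for the link command.
    """
    ordered = (
        ("local_link_library", local_link_library),
        ("link_library", link_library),
        ("search_path", search_path),
    )
    for location, containers in ordered:
        for container, names in containers:
            if ref in names:
                return location, container  # first occurrence is used
    return None
```

If a cell named NAND2X1 exists both in a library listed in link_library and in a file reachable through search_path, the copy from link_library is the one that gets linked.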

Reading Design – analyze & elaborate commands
The analyze command performs the following tasks:
Reads an HDL source file
Checks for errors without building logic for the design
Creates HDL library objects in an HDL-independent
intermediate format
Stores the intermediate files in a location you define
analyze -format verilog -lib work "${verilog_files}"

Reading Design – analyze & elaborate
The elaborate command performs the following tasks:
Translates the intermediate design into a technology-independent logic
design using generic technology (GTECH) library elements
Allows changing of parameter values defined in the source code
Allows VHDL architecture selection
Replaces HDL arithmetic operators in the code with DesignWare
components
elaborate <.syn file> -arch "<architecture>" -param "<parameter>"
elaborate <top_cell_name> -arch "<architecture>" -param "<parameter>"

Reading Design – read_file
The read_file command
Reads several different formats
Performs the same operations as the analyze and elaborate commands in
a single step
Creates .mr and .st intermediate files for VHDL
Does not create any intermediate files for Verilog
read_file -format verilog adder.v
read_verilog adder.v

Overview of the Optimization Process
Optimization is the step in the synthesis process that attempts to
implement a combination of library cells that meets the functional,
speed, and area requirements of your design.
Optimization transforms the design into a technology-specific
circuit based on the attributes and constraints you place on the
design.
Design Compiler performs the following levels of optimization in
the following order:
Architectural Optimization
Logic-Level Optimization
Gate-Level Optimization

Logic-Level Optimization
Logic-level optimization works on the GTECH netlist. It
consists of the following two processes:
Structuring
During structuring, Design Compiler searches for subfunctions that can be
factored out and evaluates these factors, based on the size of the factor and
the number of times the factor appears in the design.
Structuring is constraint based
It can result in reduced design area.

Flattening
The goal of this process is to convert combinational logic paths of the design to a two-level, sum-of-products representation.
Flattening is carried out independently of constraints.
It is useful for speed optimization because it leads to just two levels of combinational logic.
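To see what a two-level form looks like, the sketch below expands a Boolean function into its canonical sum of products by listing every input combination that makes the output 1. It is a naive enumeration for illustration, not the algorithm Design Compiler uses, and it does not minimize the result. The function name flatten_to_sop is an assumption.

```python
from itertools import product


def flatten_to_sop(func, names):
    """Return the product terms (minterms) of func as strings; their OR equals func."""
    terms = []
    for bits in product((0, 1), repeat=len(names)):
        if func(*bits):
            literals = [n if b else "~" + n for n, b in zip(names, bits)]
            terms.append(" & ".join(literals))
    return terms


def y(a, b, c, d, e):
    """Example function from the earlier slides: y = (a + b) & (c xor d) & e."""
    return (a | b) & (c ^ d) & e


sop = flatten_to_sop(y, ["a", "b", "c", "d", "e"])
```

Here sop holds six product terms: one level of AND gates feeding a single OR, which is shallow in logic depth but usually larger in area than the factored form.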
Gate-Level Optimization
Gate-level optimization works on the generic netlist created by
logic synthesis to produce a technology-specific netlist. It includes
the following processes:
Mapping
Design rule fixing
Delay optimization
Area optimization

Cost Functions
The synthesis tool performs optimization by minimizing two cost functions: one for design rule costs and the other for optimization costs.

The optimization cost function consists of four parts in the following order of importance:
Max delay cost
Min delay cost
Max power cost
Max area cost
Notes:
You can change the priority of the constraints by using the set_cost_priority command.
You can disable design rule fixing by specifying the -no_design_rule option when you run the compile command.
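One simple way to picture an "order of importance" is a lexicographic comparison: a later cost only matters when all earlier ones are equal. The sketch below is a deliberate simplification and not how the tool computes or weighs its costs; it also assumes that the design rule cost is compared first, which the slide does not state. The class and function names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    design_rule: float  # assumed to take precedence over the optimization costs
    max_delay: float
    min_delay: float
    max_power: float
    max_area: float

    def key(self):
        # tuple comparison in Python is lexicographic, matching the priority order
        return (self.design_rule, self.max_delay, self.min_delay,
                self.max_power, self.max_area)


def better(a, b):
    """True if candidate a is preferred over candidate b."""
    return a.key() < b.key()
```

Under this model a candidate with zero max-delay cost beats one with a small timing violation even if the first is larger in area.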
Compile Command
The compile command performs logic-level and gate-level
synthesis and optimization on the current design.
The compile command reports progress in real time by
displaying a report

DC Analysis Commands
report_cell/report_reference
To get information about the cells/references used in your design.
report_timing
To get the timing information for paths.
report_constraint
To get the endpoints that are violating your constraints.
report_area
To get information about your design area.
check_design
To check the current design for problems; it issues error and warning messages to explain them.
check_timing
To check for possible timing problems in the current design.

Write Command
Use the write command to save and export the design from memory to disk, in the required format.
The write command is actually an alias for the write_file command.
write -hierarchy -format verilog -output gatenetlist.v

Options for this command:
-format file_format : Specifies the output format of the design. Supported output formats and their descriptions are as follows:
ddc - Synopsys internal database format (the default format)
verilog - IEEE Standard Verilog
svsim - SystemVerilog netlist wrapper
vhdl - IEEE Standard VHDL
-hierarchy : Writes out all the designs in the hierarchy.

RECOMMENDED DIRECTORY STRUCTURE FOR DC

Verilog for Synthesis - revisited

SOME THINGS WE MAY HAVE MISSED
• Now that we’ve seen how synthesis works, let’s revisit some of the things we may have skipped or only briefly mentioned earlier…
• Let’s take a simple 4x2 encoder as an example:
  • Take a one-hot encoded vector and output the position of the ‘1’ bit.
• One possibility would be to describe this logic with a nested if-else block:
always @(x)
begin : encode
  if (x == 4'b0001) y = 2'b00;
  else if (x == 4'b0010) y = 2'b01;
  else if (x == 4'b0100) y = 2'b10;
  else if (x == 4'b1000) y = 2'b11;
  else y = 2'bxx;
end
• The result is known as “priority logic”
  • i.e., some bits have priority over others
SOME THINGS WE MAY HAVE MISSED
• It would have been better to use a case construct:
always @(x)
begin : encode
  case (x)
    4'b0001 : y = 2'b00;
    4'b0010 : y = 2'b01;
    4'b0100 : y = 2'b10;
    4'b1000 : y = 2'b11;
    default : y = 2'bxx;
  endcase
end
• All cases are matched in parallel
• And better yet, synthesis can optimize away the constants and other Boolean equalities.

SOME THINGS WE MAY HAVE MISSED
• In the previous example, if the encoding was wrong (i.e., not one-hot), we would have propagated an x in the logic simulation.
• But what if we guarantee that the input was one-hot encoded?
• Then we could write our code differently…
always @(x)
begin : encode
  if (x[0]) y = 2'b00;
  else if (x[1]) y = 2'b01;
  else if (x[2]) y = 2'b10;
  else if (x[3]) y = 2'b11;
  else y = 2'bxx;
end
• In fact, we have implemented a “priority encoder” (the least significant ‘1’ gets priority)
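The difference between the two coding styles shows up only for inputs that are not one-hot. The sketch below models both in Python, returning None where the Verilog code assigns 2'bxx. The function names are made up for this comparison.

```python
def encode_priority(x):
    """Model of the if / else-if chain on x[0]..x[3]: the least significant '1' wins."""
    for pos in range(4):
        if (x >> pos) & 1:
            return pos
    return None  # corresponds to y = 2'bxx


def encode_onehot_case(x):
    """Model of the case statement: only exact one-hot patterns match."""
    return {0b0001: 0, 0b0010: 1, 0b0100: 2, 0b1000: 3}.get(x)
```

Both functions agree on the four one-hot inputs. For x = 4'b0110 the priority version returns 1, while the case version falls through to the default.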
A FEW POINTS ABOUT OPERATORS
• Logical operators map into primitive logic gates
• Arithmetic operators map into adders, subtractors, etc.
  • Unsigned or signed 2’s complement
  • Model carry: target is one bit wider than source
  • Watch out for *, %, and /
• Relational operators generate comparators
• Shifts by constant amount are just wire connections
  • No logic involved
• Variable shift amounts are a whole different story → shifter
(Figure: Y = ~X << 2, with inputs X[3], X[2], X[1], X[0] and outputs Y[5] to Y[0].)
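The claim that a constant shift costs no logic can be checked on the example Y = ~X << 2. The sketch below assumes X is 4 bits wide and Y is 6 bits wide, as the figure labels suggest; the function names are invented. It lists where each bit of Y comes from and compares that wiring against the arithmetic result.

```python
def y_bit_sources(x_width=4, shift=2):
    """Map every bit of Y = ~X << shift to its source: a constant 0 or one inverted X bit."""
    wiring = {}
    for i in range(x_width + shift):
        wiring[f"Y[{i}]"] = "1'b0" if i < shift else f"~X[{i - shift}]"
    return wiring


def evaluate(x, x_width=4, shift=2):
    """Compute Y = ~X << shift, truncated to x_width + shift bits."""
    mask = (1 << (x_width + shift)) - 1
    return ((~x) << shift) & mask
```

The only gates left are the four inverters; the shift itself is just a relabelling of wires, with the two low bits tied to 0.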
DATAPATH SYNTHESIS
• Complex operators (Adders, Multipliers, etc.) are implemented in a special way

• Pre-written descriptions can be found in Synopsys DesignWare or Cadence ChipWare IP libraries.
CLOCK GATING
• As you know, since a clock is continuously toggling, it is a major consumer of dynamic power.
• Therefore, in order to save power, we will try to turn off the clock for gates that are not in use.
• Block level (Global) clock-gating
  • If certain operating modes do not use an entire module/component, a clock gate should be defined in the RTL.
• Register level (Local) clock-gating
  • However, even at the register level, if a flip-flop doesn’t change its output, internal power is still dissipated due to the clock toggling.
  • This is very typical of an enabled signal sampling, and therefore can be automatically detected and gated by the synthesis tool.
(Figure: Global Clock Gating, with enables enF, enE and enM gating clk for the FSM, Execution Unit and Memory Control blocks. Local Clock Gating, with a flip-flop using din, en, clk and dout.)
CLOCK GATING
• Local clock gating: 3 methods
  • Logic synthesizer finds and implements local gating opportunities
  • RTL code explicitly specifies clock gating
  • Clock gating cell explicitly instantiated in RTL
• Global clock gating: 2 methods
  • RTL code explicitly

• Conventional RTL Code
//always clock the register
always @ (posedge clk) begin
  if (enable) q <= din;
end

• Low Power Clock Gated RTL
//only clock the ff when enable is true
assign gclk = enable && clk;
always @ (posedge gclk) begin
  q <= din;
end

• Instantiated Clock Gating Cell
//instantiate a clock gating cell
clkgx1 i1 (.en(enable), .cp(clk), .gclk_out(gclk));
always @ (posedge gclk) begin
  q <= din;
end

CLOCK GATING – GLITCH PROBLEM
• What happens if there is a glitch on the enable signal?
• What if the glitch happened during the high phase?
(Waveforms: clk, en, gclk.)
“Ah, we live in a perfect world!” “Not so fast!” “Maybe the world ain’t so perfect after all…”
SOLUTION: GLITCH-FREE CLOCK GATE
• By latching the enable signal during the positive phase, we can eliminate glitches:
//clock gating with glitch prevention latch
always @ (enable or clk)
begin
  if (!clk)
    en_out <= enable;
end
assign gclk = en_out && clk;
(Waveforms: clk, en, en_out, gclk.)
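A quick way to convince yourself that the latch removes the glitch is to step through sampled waveforms. The sketch below is a zero-delay model with made-up sample values, not a timing-accurate simulation; the function name simulate is an assumption.

```python
def simulate(samples, use_latch):
    """samples: list of (clk, enable) pairs per time step. Returns the gclk value per step."""
    en_out = 0
    gclk = []
    for clk, enable in samples:
        if use_latch:
            if not clk:              # latch is transparent only while clk is low
                en_out = enable      # while clk is high, en_out holds its value
            gclk.append(en_out & clk)
        else:
            gclk.append(enable & clk)  # plain AND gate: enable passes straight through
    return gclk


# enable glitches high for one step while clk is high
glitch = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0)]
# enable is asserted cleanly while clk is low
normal = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 1), (0, 0)]
```

With the plain AND gate the glitch appears on gclk as a short extra pulse; with the latch, gclk stays low for the glitch and still follows clk when enable is set during the low phase.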
MERGING CLOCK ENABLE GATES
• Clock gates with common enable can be merged
  • Lower clock tree power, fewer gates
  • May impact enable signal timing and skew.
(Figure: clock gates E driven by clk and en/enable, before and after merging.)
DATA GATING
• While clock gating is very well understood and automated, a similar situation occurs due to the toggling of data signals that are not used.
• These situations should be recognized and data gated.
assign add_out = A+B;
assign shift_out = A<<B;
assign out = shift_add ? shift_out : add_out;

// gate the shifter inputs bit by bit (a logical && would collapse A to one bit)
assign shift_in_A = shift_add ? A : 0;
assign shift_in_B = shift_add ? B : 0;
assign shift_out = shift_in_A << shift_in_B;
assign out = shift_add ? shift_out : add_out;
REFERENCE

• Introduction to FPGA Technology: Top 5 Benefits. http://www.ni.com/white-paper/6984/en#toc1
• FPGA introduction, Ovind Harboe
• http://www.fpga4fun.com/FPGAinfo1.html
• http://en.wikipedia.org/wiki/Field-programmable_gate_array#Architecture
• The Design Warrior's Guide to FPGAs
• Virtex-6 FPGA Configurable Logic Block User Guide
• http://www.cse.unsw.edu.au/~cs4211/seminars/va/VirtexArchitecture.html
• http://www.1-core.com/library/digital/fpga-logic-cells/
• http://www.ni.com/white-paper/7440/en