SoC Mod2 Notes
Module 2
Karun Menon
The Macro Design Process (Ch 4)
• RTL code must also meet the testability requirements for the macro. Most macros
will use a full-scan test methodology and require 95% coverage (99% preferred).
• Use a test insertion tool to perform scan insertion and automatic test pattern
generation for the macro.
• Use a static timing analysis tool to verify the final timing of the macro.
• The macro design flow does not imply a rigid, top-down design methodology. For
example, some detailed design work must frequently be done before the
specification is complete, just to make sure that the design can be implemented.
• Rigid, top-down methodology: one phase cannot start until the preceding one
is completed.
• Mixed methodology: one phase cannot complete until the preceding one is
completed.
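The coverage requirement above can be checked with a few lines of Python. This is only a sketch: the function names and the fault counts are hypothetical, and a real ATPG tool reports coverage in its own format.

```python
def fault_coverage(detected_faults: int, total_faults: int) -> float:
    """Return fault coverage as the fraction of faults detected."""
    if total_faults <= 0:
        raise ValueError("total_faults must be positive")
    return detected_faults / total_faults


def coverage_status(coverage: float,
                    required: float = 0.95,
                    preferred: float = 0.99) -> str:
    """Classify coverage against the 95% required / 99% preferred targets."""
    if coverage >= preferred:
        return "preferred"
    if coverage >= required:
        return "acceptable"
    return "insufficient"


# Hypothetical ATPG summary: 9,720 of 10,000 faults detected.
cov = fault_coverage(9_720, 10_000)
print(f"{cov:.2%} -> {coverage_status(cov)}")
```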
Soft Macro Productization
Productizing the macro - creating the remaining deliverables that system integrators
will require for reuse of the macro:
• Versions of the code, testbenches, and tests that work in both Verilog and VHDL
environments
• Supporting scripts for the design - installation scripts and synthesis scripts required
to build the different configurations of the macro.
• Documentation - includes updating all the functional specifications and generating
the final user documentation from them.
• Final version locked in a version control system
All deliverables must be in a revision control system to allow future maintenance.
(page 265)
Develop a prototype chip using the macro
• Developing a chip using the macro and testing it in a real application
with real application software allows us to:
• Verify that the design is functionally correct.
• Verify that the design complies with the appropriate standards (for
instance, we can take a PCI test chip to the PCI SIG for compliance
testing).
• Verify that the design is compatible with the kind of
hardware/software environment that other integrators are likely to
use.
Hard Macros
• Hard macros are macros that have a physical representation, and are
delivered in the form of a GDSII file.
• As a result, hard macros are more predictable than soft macros in
terms of timing, power, and area.
• Hard macros do not have the flexibility of soft macros; they provide
only a fixed configuration, and are not user-configurable.
Hard Macros (Ch 8)
• Every macro starts out as soft, for RTL has to be the reference
implementation model.
• Every macro ends up in GDSII, and thus in hard form
• The only real distinction between hard and soft macros is at which
stage of design the developer hands the macro over to the chip
designer.
• Hard macros are just soft macros that are hardened before they are
integrated into the chip design
• In some cases, however, a hard macro may include full-custom design
• The silicon vendor provides the timing and functional models as well
as block outlines and pin locations to the chip designer.
• The chip designer uses the timing and functional models for the hard
macro while designing the rest of the chip - RTL is not provided
When (and Why) to Use Hard Macros
• The design is pushing performance to the limit of the silicon process -
Physical design needs to be done by the designer to get optimal
performance
• The chip designer wants to reuse a macro without having to perform
functional or physical verification.
• The design requires some full-custom design, and so cannot be
delivered in soft (that is, synthesizable) form
• The macro provider does not want the chip designers to have access
to the RTL – IP protection.
• To prevent the possibility of the user modifying the macro.
The aspect ratio of a hard macro affects the ease with which the macro can be
integrated into the final chip.
A large hard macro with an extreme aspect ratio can present significant problems
in placing and routing an SoC design.
An aspect ratio close to 1:1 minimizes the burden on the integrator. Aspect
ratios of 1:2 and 1:4 are also commonly used.
A non-square aspect ratio (for example, a tall, narrow block), means that
there will be more routing in vertical direction than in the horizontal.
This asymmetric demand on routing resources can lead to problems during
place and route.
This is another reason why macro designers typically try for a 1:1 aspect ratio.
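To see why extreme aspect ratios are harder on the integrator, the sketch below computes the outline of a block of fixed area at the ratios mentioned above. The area value and the function name are assumptions for illustration only.

```python
import math


def block_dimensions(area_um2: float, ratio: float) -> tuple[float, float]:
    """Return (width, height) in um for a rectangle with height = ratio * width."""
    width = math.sqrt(area_um2 / ratio)
    return width, ratio * width


AREA_UM2 = 4_000_000.0  # hypothetical macro area (4 mm^2)

for ratio in (1, 2, 4):
    w, h = block_dimensions(AREA_UM2, ratio)
    print(f"1:{ratio}  width={w:.0f} um  height={h:.0f} um  "
          f"perimeter={2 * (w + h):.0f} um")
```

For the same area, the 1:4 block has a long edge twice that of the square block, so it blocks a longer stretch of the chip in one direction.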
The hard macro designer has to implement a clock structure in the hard macro without
knowing in advance the clocking structure of the chip in which the macro will be
used.
The designer should provide full clock buffering in the hard macro, and provide a
minimal load on the clock input(s) to the macro.
One key problem is that the hard macro will have a clock tree insertion delay; that
is, the delay from the clock input pin of the macro, through the clock buffers, before
the clock arrives at the internal flops.
This delay affects the setup and hold times at the macro’s inputs and its clock-to-
output delays. The chip designer needs to account for this when integrating the macro
into the chip.
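The effect of insertion delay on the pin-level timing can be sketched as follows. This is a simplified single-path model: the class, field names, and numbers are hypothetical, and a real timing model uses min/max values per path.

```python
from dataclasses import dataclass


@dataclass
class MacroTiming:
    """Simplified timing of one input path and one output path (all in ns)."""
    clock_insertion: float   # macro clock pin -> internal flop clock pin
    input_path: float        # macro input pin -> flop D pin
    flop_setup: float
    flop_hold: float
    flop_clk_to_q: float
    output_path: float       # flop Q pin -> macro output pin

    def pin_setup(self) -> float:
        # Clock insertion delay gives the data more time, reducing setup.
        return self.flop_setup + self.input_path - self.clock_insertion

    def pin_hold(self) -> float:
        # The delayed clock means data must be held longer at the pin.
        return self.flop_hold + self.clock_insertion - self.input_path

    def pin_clock_to_output(self) -> float:
        # Insertion delay adds directly to the clock-to-output delay.
        return self.clock_insertion + self.flop_clk_to_q + self.output_path


# Hypothetical values for illustration.
t = MacroTiming(clock_insertion=1.0, input_path=0.4, flop_setup=0.1,
                flop_hold=0.05, flop_clk_to_q=0.2, output_path=0.3)
print(f"setup={t.pin_setup():.2f} hold={t.pin_hold():.2f} "
      f"clk-to-out={t.pin_clock_to_output():.2f}")
```

A larger insertion delay lowers (or makes negative) the setup time seen at the input pin, but raises both the hold time and the clock-to-output delay.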
Porosity
• Hard Macros can present real challenges to the integrator if they
completely block all routing. Solutions:
• Provide routing channels through the macro, or
• Reserve routing layers above the macro for chip-level routing.
Antenna Checking
• During the fabrication process, metal layers are built up one by one, so a
transistor gate can end up with a long piece of metal attached to it
which, until other layers are added, is not connected to any path to
ground.
• This piece of metal can act like an antenna, developing a large static
charge and damaging the transistor.
• Antenna rules (provided by the foundry) are used to determine how
long an unconnected wire is acceptable.
• Antenna checking and fixing are painful and time-consuming tasks in
physical verification
• See diagrams
• The routing tools do not know the geometries inside the hard macro,
so the routing to and from the clock pins, combined with the internal
signal routing, can produce antenna violations that the router cannot
detect or fix.
• Insert diodes of sufficient size on I/O pins to avoid violating the metal-
to-diode ratio rule.
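An antenna check is essentially a ratio test. The sketch below assumes a simplified rule in which the allowed metal-to-gate area ratio grows linearly with the attached diode area. The rule form, the limits, and the areas are all hypothetical; real antenna rules come from the foundry and are usually more detailed.

```python
def antenna_ratio(metal_area_um2: float, gate_area_um2: float) -> float:
    """Ratio of metal area connected to a gate to the gate area itself."""
    if gate_area_um2 <= 0:
        raise ValueError("gate_area_um2 must be positive")
    return metal_area_um2 / gate_area_um2


def allowed_ratio(base_max_ratio: float,
                  diode_area_um2: float = 0.0,
                  credit_per_um2: float = 0.0) -> float:
    """Assumed rule: each um^2 of protection diode raises the allowed ratio."""
    return base_max_ratio + credit_per_um2 * diode_area_um2


def has_violation(metal_area_um2: float, gate_area_um2: float,
                  base_max_ratio: float,
                  diode_area_um2: float = 0.0,
                  credit_per_um2: float = 0.0) -> bool:
    limit = allowed_ratio(base_max_ratio, diode_area_um2, credit_per_um2)
    return antenna_ratio(metal_area_um2, gate_area_um2) > limit


# Hypothetical net: 5000 um^2 of metal on a 10 um^2 gate, base limit 400.
print(has_violation(5000, 10, base_max_ratio=400))
# Same net with a 0.5 um^2 diode and an assumed credit of 400 per um^2.
print(has_violation(5000, 10, base_max_ratio=400,
                    diode_area_um2=0.5, credit_per_um2=400))
```

In this model the first check reports a violation and the second does not, which is why a diode of sufficient size on the pin clears the rule.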
Output and Bidirectional Pins
• Every output/bidirectional pin of the hard macro should have antenna
diodes inserted.
• Connections from the pin to the diode should be very short and
connect down to Metal 1
Input Pins
• Should not have antenna violations when physical verification is run
stand-alone
• Use very short wires from the macro input pin to the receiving cell
A circuit net consists of a driver and at least one receiver; each receiver
consists of a gate electrode over a thin gate dielectric.
Since the gate dielectric is very thin, only a few molecules thick, a big worry
is breakdown of this layer.
This can happen if the net somehow acquires a voltage somewhat higher than the
normal operating voltage of the chip.
Once the chip is fabricated, breakdown due to antenna effect cannot happen,
since every net has at least some source/drain implant connected to it. The
source/drain implant forms a diode, which breaks down at a lower voltage
than the oxide (either forward diode conduction, or reverse breakdown), and
does so non-destructively. This protects the gate oxide.
A diode can be formed away from a MOSFET source/drain, for example,
with an n+ implant in a p-substrate or with a p+ implant in an n-well. If the
diode is connected to metal near the gate(s), it can protect the gate oxide.
This can be done only on nets with violations
Model Development for Hard Macros
• Functional models: used by the integrator to develop and verify the RTL for the
rest of the chip
• Timing model: used by the integrator to run static timing analysis on the entire
chip
• Power model: used by the integrator to estimate power dissipation and IR drop
for the entire chip
• Test model: used by the integrator to develop a complete manufacturing test for
the chip
• Physical models: used by the integrator during physical design of the rest of the
chip
System Integration with Reusable Macros (Ch 10)
• Integrating completed macros into the final (SoC)
• Two key tasks remaining: physical design and functional verification.
Challenges:
• Physical design
- Achieving timing closure - this chapter
• Functional verification
- Knowing when we are done (verification complete) - next chapter
Integrating blocks and doing physical design
Process:
• Selecting IP blocks and preparing them for integration
• Integrating all the blocks into the top-level RTL
• Floor planning and timing model generation
• Synthesis and initial timing analysis
• Physical synthesis and timing analysis, with iteration until timing closure
• Detailed route, timing verification, and power analysis
• Physical verification of the design
(process similar to hard macro ..)
A representative process for integrating the various blocks into the final
version of the chip and getting the chip through physical design is shown in
the next slide.
Design Planning
At the start of the design, before blocks are designed or IP is selected, the team
should make an initial estimate of
• Die size (the silicon die before packaging),
• Number of I/O pads, and
• Power dissipation.
This information is key for determining package type.
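One common first-order way to make the die size estimate is to compare a core-limited edge length with a pad-limited one. The sketch below assumes a square die with pads spread evenly over four sides. The function name and all numbers are hypothetical.

```python
import math


def die_edge_mm(core_area_mm2: float, num_pads: int,
                pad_pitch_mm: float, pad_ring_mm: float) -> tuple[float, str]:
    """Estimate the edge of a square die and report what limits it."""
    # Core-limited: the core plus a pad ring on both sides.
    core_limited = math.sqrt(core_area_mm2) + 2 * pad_ring_mm
    # Pad-limited: pads per side at the given pitch, plus the two corners.
    pads_per_side = math.ceil(num_pads / 4)
    pad_limited = pads_per_side * pad_pitch_mm + 2 * pad_ring_mm
    if pad_limited > core_limited:
        return pad_limited, "pad-limited"
    return core_limited, "core-limited"


# Hypothetical chip: 25 mm^2 core, 400 pads, 80 um pitch, 250 um pad ring.
edge, limit = die_edge_mm(25.0, 400, 0.08, 0.25)
print(f"die edge ~ {edge:.2f} mm ({limit})")
```

With these numbers the pad count, not the logic, sets the die size, which is the kind of result that influences the choice of package.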
The preliminary floorplan:
• should include a rough placement of blocks and I/O pads
• some preliminary planning for the power and clock distribution.
• can be used to provide more accurate wire load models and timing budgets for
synthesis.
If only the outputs of each block are
registered, then the relative placement of
the block on the chip affects both the wire
load model and the time budget of the
block.
• Scan insertion is typically done by a DFT (design for test) tool. Note
that at this point, the DFT tool merely replaces standard flops with scan
flops.
Physical Placement
• Block placement
• I/O pad placement
• Placement of the I/O cells for each block
Flat vs. Hierarchical Placement
Hierarchical Placement
• Floorplan is developed early, and a location for each major block
identified. Pin locations for the I/O of each block are assigned.
• Some room between blocks is reserved for top-level routing; all routing
between blocks is restricted to this area
• Top-level routing is performed before place and route of the blocks; as a
result, the wire length and capacitive loading for each top-level wire is
fixed.
• Based on the information from the steps above, each block is placed and
routed independently, and then placed in the top-level design
Flat Placement
• Floorplan is developed early, and a location for each major block
identified.
• Based on the information from the steps above, timing constraints
are developed for the design.
• Placement constraints are developed based on the floorplan, and the
entire design is placed as a unit.
• Detailed routing is performed on the chip as a single unit
• Note that a detailed floorplan is still needed; it allows us to develop
the timing constraints for placement.
Recommendation:
• The only strong recommendation in this area is that the physical
hierarchy should reflect the logical hierarchy.
• A physical block may consist of several logical blocks, but a single
logical block should never be split across several physical blocks
unless absolutely necessary to meet timing.
Splitting a logical block results in name changes that make it very difficult
to work with the post-layout netlist and to troubleshoot problems.
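The recommendation can be checked mechanically from a floorplan description. The sketch below assumes the floorplan is available as a mapping from physical block to the logical blocks it contains. The block names are hypothetical.

```python
from collections import defaultdict


def split_logical_blocks(
        physical_to_logical: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return logical blocks that appear in more than one physical block."""
    owners: dict[str, set[str]] = defaultdict(set)
    for physical, logicals in physical_to_logical.items():
        for logical in logicals:
            owners[logical].add(physical)
    return {logical: sorted(physicals)
            for logical, physicals in owners.items()
            if len(physicals) > 1}


# Hypothetical floorplan: "cache" is split across two physical blocks.
floorplan = {
    "phys_a": ["cpu", "cache"],
    "phys_b": ["dma", "cache"],
    "phys_c": ["uart"],
}
print(split_logical_blocks(floorplan))
```

A physical block holding several logical blocks is fine; only a logical block with more than one owner is flagged.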
Clock Tree Insertion
• Balance the insertion delay from the clock source to leaf cells by
buffering the clock(s)
• Since these are the most critical nets in the design, and need to be
balanced to minimize clock skew, they are routed first.
• Buffering clock trees requires a very large number of buffers. For this
reason, designers reserve a buffer site near each group of flops. If the
site is not needed, it takes up some small incremental area, but this is
well worth it if it speeds convergence of clock tree insertion.
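Clock skew is the spread in insertion delay across the leaf cells, so the goal of balancing can be expressed as a small check. The instance names and delays below are hypothetical.

```python
def clock_skew_ns(insertion_delays_ns: dict[str, float]) -> float:
    """Skew = largest minus smallest insertion delay over all leaf cells."""
    if not insertion_delays_ns:
        raise ValueError("at least one leaf cell is required")
    delays = list(insertion_delays_ns.values())
    return max(delays) - min(delays)


# Hypothetical insertion delays (ns) from clock source to three leaf flops.
leaves = {
    "u_cpu/ff1": 1.20,
    "u_dma/ff7": 1.35,
    "u_uart/ff3": 1.10,
}
print(f"skew = {clock_skew_ns(leaves):.2f} ns")
```

Adding buffers on the faster branches raises their insertion delay toward the slowest one, which is what reduces this number.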
Detailed Route
• After clock tree insertion, we do a detailed route of the design. This is the
first time we have a complete physical design, which allows a more accurate
assessment of the timing and power
Parasitic Extraction
• A 3-D extraction engine is used to calculate the actual load of each metal
interconnect segment in the design. These extraction engines are full
field solvers that give much more accurate delay modeling than the
statistical interconnect load models (min, typ, max) used in synthesis
and placement
• Interconnect capacitances of submicron technologies are primarily
determined by sidewall (or coupling) capacitance. With multi-metal
process, capacitances need to be addressed in three dimensions. The
different capacitances include wire-to-wire (same layer), wire-to-
substrate, and wire-to-wire (different layers).
• Thus, it becomes increasingly crucial and necessary to have a reliable 3-
D extraction engine to accurately model the timing of the design.
• With the extracted data, we can now do a full static timing analysis
and determine the timing of the design
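The three capacitance components listed above add up to the load on a net. The sketch below sums them and applies a lumped-RC delay approximation (0.69 x R x C). This is a rough illustration and not what a field solver or a timing tool computes. The class name and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedNet:
    """Extracted parasitics of one net (capacitance in fF, resistance in ohm)."""
    same_layer_coupling_ff: float   # wire-to-wire, same layer
    substrate_ff: float             # wire-to-substrate
    inter_layer_coupling_ff: float  # wire-to-wire, different layers
    resistance_ohm: float

    def total_cap_ff(self) -> float:
        return (self.same_layer_coupling_ff
                + self.substrate_ff
                + self.inter_layer_coupling_ff)

    def lumped_rc_delay_ps(self) -> float:
        # ohm * fF = femtoseconds; divide by 1000 for picoseconds.
        return 0.69 * self.resistance_ohm * self.total_cap_ff() / 1000.0


# Hypothetical net where sidewall (same-layer) coupling dominates.
net = ExtractedNet(same_layer_coupling_ff=30.0, substrate_ff=10.0,
                   inter_layer_coupling_ff=15.0, resistance_ohm=200.0)
print(f"C = {net.total_cap_ff():.1f} fF, "
      f"delay ~ {net.lumped_rc_delay_ps():.2f} ps")
```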
Static Timing Analysis
Timing Fixes