CORE DESIGN

Core Design and System-on-a-Chip Integration

ANN MARIE RINCON
CORY CHERICHETTI
JAMES A. MONZEL
DAVID R. STAUFFER
MICHAEL T. TRICK
IBM Microelectronics Corp.

IBM’s experience with core-based designs provides insight into methodology, SOC design styles, core design trade-offs, and ASIC design processes. The authors describe a prototype cosimulation system developed for the PowerPC core and present SOC designs to illustrate their methods.

ALTHOUGH SYSTEM-ON-A-CHIP (SOC) design takes various forms, we can identify three main approaches (as described by Susan Mason of Information Architects, speaking at the 1996 Design Automation Conference): vendor design, partial integration, and desktop. These approaches provide a wide range of design flexibility and time-to-market scenarios. In the vendor design approach, an ASIC vendor or a design services group designs the core and ASIC to meet a customer’s functional specification. This approach gives the system designer little or no control over the design schedule and may result in a longer time to market and less design flexibility than other approaches. Although vendor design can lead to the lowest production cost per die, it may also result in high engineering costs, thus lending itself only to high-volume applications.

Using the partial integration approach, the system designer creates most or all the ASIC gates, and the ASIC vendor designs and integrates the core with the customer’s ASIC logic. This approach provides a more flexible division of labor, and the system designer has some control over the design schedule.

But the desktop approach gives the system designer the most flexibility and enables the fastest time to market. In this scenario, the ASIC vendor designs the core, and the designer builds the ASIC logic and integrates the core, using a standard ASIC design flow. The ASIC vendor is no longer the design schedule gatekeeper. The desktop approach incurs the lowest engineering cost and is suitable for lower-volume solutions.

In the past, vendor design and partial integration were the dominant approaches. Recent strides in core modeling and ASIC design methodology allow system developers to choose any of the three approaches. They must consider many variables in selecting an approach appropriate for a project and/or a design team. For example, designers capable of taking control of the process should have that option; for them, the desktop approach is most appropriate. From the core developer’s point of view, designing a core to support the desktop approach facilitates all three scenarios.

Here, we describe IBM designs that cover the spectrum of SOC design approaches. Two designs used the desktop approach. The third design, developed by both the ASIC vendor (IBM) and the customer, represents the partial integration approach. The fourth design, implemented entirely by IBM to meet the system designer’s functional specification, exemplifies the vendor design approach.

Hard cores (see “Terminology” box) in these designs include a 32-bit PCI (peripheral component interconnect) core, a PowerPC 401 CPU, a memory controller from Rambus, Inc., a video PLL (phase-locked loop), a video DAC (digital-analog converter), and two high-speed SRAMs. Several soft cores serve either as stand-alone functions or to integrate several hard cores into a single function. All the cores are 0.35-µm technology implementations.

Design-for-integration techniques

Core design techniques such as parameterization, function partitioning, on-chip buses, built-in modularity, and use of firm cores address the challenges of integrating cores into ASICs.

0740-7475/97/$10.00 © 1997 IEEE    IEEE DESIGN & TEST OF COMPUTERS, OCTOBER–DECEMBER 1997

Authorized licensed use limited to: University of Illinois. Downloaded on September 9, 2009 at 23:30 from IEEE Xplore. Restrictions apply.

Terminology: core types

An industry standard for core terminology is just beginning to emerge. It may differ from the terminologies in use at IBM and other companies, and likewise those terminologies may differ from each other. Therefore, we define the following terms as we use them in this article.

A synthesizable core comes in a technology-independent high-level description language form. The core’s layout is completely flexible, but its speed and density are limited by the characteristics of the ASIC cell library in which it is implemented. Synthesizable cores require ground-up synthesis, test, static timing analysis, and user verification. Their flexibility limits their “drop-in” design usability and their ability to leverage performance and area optimization characteristics.

A soft core is a technology-dependent gate-level netlist. In some cases, a soft core also includes a small amount of high-level technology-independent code that a designer can use to parameterize the core during synthesis. Because of the core’s technology-dependent nature, its size and speed are more predictable than those of synthesizable cores. The soft core’s layout is flexible, but floor-planning guidelines may be necessary to achieve performance targets.

A firm core reaches the customer in the form of an encrypted or abstracted black box that protects the core’s intellectual property content. Designers incorporate a firm core into an ASIC design in the same manner as a library element. The ASIC vendor lays out the core, using a technology-dependent gate-level netlist. This provides flexibility in the chip layout process because the core form factor is not fixed. A firm core’s size, aspect ratio, and pin location can be changed to meet a customer’s chip layout needs, and floor-planning guidelines assist the chip designer in making trade-offs. The technology-specific nature of a firm core makes it highly predictable in performance and area. And, because layout uses a gate-level netlist, a firm core has the same porosity and routability as a soft core.

A hard core is also an encrypted or abstracted black box, which designers incorporate into an ASIC design in the same manner as a standard cell library element. Unlike firm cores, however, hard cores have a fixed, custom physical layout. The technology-specific layout allows maximum optimization in terms of performance and density. Hard cores, however, have the most limited vendor portability and greatest difficulty of reuse when moving to a new process technology. A hard core may contain significant routing blockages (poor porosity), making the placement of other blocks and chip-level routing difficult.

Parameterization. Parameterization is a key technique for customizing soft cores. By using VHDL generics or Verilog parameters and the gate-level netlist, designers can select or eliminate commonly customized features during synthesis. A parameter set defined and verified by the core developer limits allowed modifications, greatly reducing the need for subsequent reverification by the customer and simplifying or minimizing the tasks of supporting multiple core configurations. Examples of features that can be customized through parameterization are the core base address, baud rate clock, bus interface selection, number of DMA channels, and number of FIFOs.

Function partitioning. An alternative to implementing a function as a single, monolithic hard core is to partition it into multiple hard, firm, and soft cores. Portions of the function that are timing-critical (such as a CPU) or function-critical (analog elements) are best implemented as hard cores. A firm-core implementation is appropriate for a function without timing requirements that dictate custom layout, but requiring intellectual property encryption. Functions that do not require fixed layout or IP encryption, or that are candidates for frequent customization, are best implemented as soft cores.

The reduced size and increased flexibility of the core elements resulting from partitioning alleviate chip-level routing problems that occur with large, monolithic hard cores. The core developer can provide the partitioned functions to the designer as individual cores or as a top-level soft-core netlist containing the hard-, firm-, and soft-core submodules. Designers can modify the top-level netlist by following guidelines set forth in each core’s documentation.

An example of effective function partitioning is the RAMDAC core, which originated from the IBM 526DB palette DAC standard product chip. The palette DAC function consists of digital logic, high-speed SRAMs, several analog components, two PLLs, and a 10-bit video DAC.

It is possible to implement the palette DAC as a single, monolithic hard core. But doing so complicates floor planning, forcing the layout engineer to work with a large, inflexible block with little wiring porosity. The core must be close to the appropriate chip signal and test pads. It must not block signal wires in other logic blocks on the chip. And the sensitive analog circuitry must not have any interference from noisy signals routed near the core. Partitioning the palette DAC makes it easier to place each subcomponent next to the required pad locations without blocking other signals or receiving interference.

The single-core implementation prevents chip designers from customizing the function by substituting custom logic for the IBM digital portion of the palette DAC. We decided to partition the standard product function into four distinct blocks that designers could use either separately or as an integrated solution. We implemented the video DAC, PLL, and high-speed SRAM as individual hard cores in the range of 10,000 to 30,000 cells each. We implemented the remaining digital logic as a soft-core netlist (RAMDAC) that included the other hard cores as components and was the delivery vehicle for the integrated solution.

Figure 1 shows an example of the successful use of the RAMDAC integrated solution. Other chip designs, requiring only the PLL, video DAC, or SRAM, could use these cores as stand-alone functions without sacrificing valuable silicon to unused palette DAC functions.

Figure 1. Partitioned RAMDAC core integrated with customer logic. Unlabeled blocks are SRAMs and register arrays.

Another example of multiple cores derived from a standard product chip is the PowerPC core product line (based on the PowerPC 40X chip series). We divided the PowerPC microcontroller chip into a hard core and several soft cores. The timing-critical CPU became a hard core; peripheral functions such as the DMA controller, external bus interface unit (EBIU), timers, and serial port unit (SPU) became soft cores.

The first chip to use these PowerPC cores contained the 401 CPU hard core and the SPU soft core (Figure 2). It did not use the off-chip memory interface core (EBIU). The application called for the Rambus high-speed memory interface, provided as a separate, mixed analog-digital hard core. Because we had partitioned the 40X PowerPC function into multiple individual cores, the customer could select and use only the functions required for the design.

Figure 2. PowerPC-core-based design. Unlabeled blocks are a PLL, RAMs, and register arrays.

Interfacing with on-chip buses. Designers can use standard on-chip buses to eliminate the suboptimal glue logic required to integrate one or more cores with customer logic or with each other. Successfully used in the world of printed circuit board design for several years, this method is extendable to SOC designs. Standard buses ease integration of peripherals and independent design of user modules by providing a standard interface and communication protocol. Core logic and customer logic designed to a consistent protocol can quickly interconnect without requiring additional glue logic gates.

As in board design, the latency, bandwidth, and interface compatibility trade-offs among blocks of differing traffic characteristics dictate the need for a hierarchy of buses. For the PowerPC cores, IBM devised a dual on-chip bus architecture (Figure 3): The processor local bus (PLB) serves high-speed devices, and the on-chip peripheral bus (OPB) serves lower-speed peripheral devices. A separate core provides arbiter logic for each bus, and a bridge core transfers data between the two buses to further enhance usability.

In addition to the PowerPC peripheral cores, many other cores interface to the PLB/OPB bus structures. These include a UART (universal asynchronous receiver/transmitter), a time division multiplexer, a universal serial bus core, an Ethernet core, an HDLC (high-level data link control) controller, a MAL (memory access layer), an IIC (inter-integrated circuit) serial bus interface, and an IEEE 1284 parallel port unit.

Figure 4 represents the architecture of a design using the 401 core, several PowerPC soft-core peripherals, and cores specific to wired communication applications. All cores interface to each other through the PLB/OPB bus structures. The PowerPC EBIU and code decompression cores connect directly to the PLB; the Ethernet, HDLC, and MAL cores interface with the OPB. The chip designer chose a custom UART design and created a general-purpose I/O macro, designing these blocks for the OPB interface and verifying them with the PLB/OPB model toolkit.
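As a rough illustration of the bus hierarchy in this design, the sketch below records which bus each core attaches to and derives the path a transfer would take. It is a minimal Python sketch, not part of IBM's toolkit. The identifiers (`ATTACHMENTS`, `transfer_path`, `PLB_OPB_bridge`) are hypothetical, the two UART entries follow the labels in Figure 4, and attaching the CPU to the PLB is an assumption based on the PLB's role as the processor local bus.

```python
PLB = "PLB"  # processor local bus: high-speed devices
OPB = "OPB"  # on-chip peripheral bus: lower-speed peripherals
BRIDGE = "PLB_OPB_bridge"  # hypothetical name for the bridge core

# Bus attachment of each core in the Figure 4 design, as stated in the text.
ATTACHMENTS = {
    "EBIU": PLB,
    "code_decompression": PLB,
    "Ethernet": OPB,
    "HDLC": OPB,
    "MAL": OPB,
    "UART_1": OPB,  # custom UART designed for the OPB interface
    "UART_2": OPB,
    "GPIO": OPB,    # general-purpose I/O macro
    "PowerPC_401_CPU": PLB,  # assumption: not stated explicitly in the text
}


def transfer_path(source, target):
    """Return the sequence of blocks a transfer passes through.

    Cores on the same bus communicate over that bus alone; cores on
    different buses go through the bridge core.
    """
    src_bus = ATTACHMENTS[source]
    dst_bus = ATTACHMENTS[target]
    if src_bus == dst_bus:
        return [source, src_bus, target]
    return [source, src_bus, BRIDGE, dst_bus, target]
```

Under these assumptions, `transfer_path("PowerPC_401_CPU", "UART_1")` passes through the bridge, while `transfer_path("Ethernet", "MAL")` stays on the OPB.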
Model toolkits for standard board-level buses such as ISA, EISA, PCI, and SCSI have been available for several years to help chip designers verify their bus communication logic. These toolkits are equally important to the SOC designer, providing an efficient design environment by allowing design and verification of macro functions without simulation of the complete project. The PLB/OPB toolkit includes bus master and slave models and a bus arbiter model for each bus. Also included is a bus monitor for verifying all bus transactions. Using this toolkit, the designer of the chip in Figure 4 saved an estimated two to three months of design and verification time.

Figure 3. Dual on-chip bus architecture. Blocks shown: PowerPC CPU core, DMA controller, custom logic, PLB arbiter, processor local bus (PLB), OPB bridge, on-chip peripheral bus (OPB), OPB arbiter, external bus interface unit (DRAM controller, SRAM controller), serial port, and timers.
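To make the role of a bus monitor concrete, here is a minimal Python sketch of a monitor that observes transactions and records violations. The article does not describe the checks performed by the PLB/OPB toolkit, so the two rules used here (only the currently granted master may drive the bus, and every address must decode to exactly one slave) are illustrative assumptions, and all names (`Transaction`, `BusMonitor`, `grant`, `observe`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    master: str
    address: int
    is_write: bool


class BusMonitor:
    """Passive checker: logs every transaction and records rule violations."""

    def __init__(self, slave_ranges):
        # slave name -> (base address, size in bytes); hypothetical address map
        self.slave_ranges = slave_ranges
        self.granted = None  # master currently granted by the arbiter model
        self.log = []
        self.errors = []

    def grant(self, master):
        """Record the arbiter's current grant."""
        self.granted = master

    def decode(self, address):
        """Return the names of all slaves whose range contains the address."""
        return [
            name
            for name, (base, size) in self.slave_ranges.items()
            if base <= address < base + size
        ]

    def observe(self, txn):
        """Check one transaction; return the addressed slave or None."""
        self.log.append(txn)
        if txn.master != self.granted:
            self.errors.append(f"{txn.master} drove the bus without a grant")
        hits = self.decode(txn.address)
        if len(hits) != 1:
            self.errors.append(
                f"address {txn.address:#x} decodes to {len(hits)} slaves"
            )
            return None
        return hits[0]
```

A test bench built this way can exercise a single macro against master, slave, and arbiter models and then inspect `errors`, which is the kind of block-level verification without full-chip simulation that the toolkit is described as enabling.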
Predesigned cores, which constitute over 90% of the chip logic, shortened the design time by an estimated 50% to 75%. Wired in four levels of metal, the design achieved over 80% utilization of available cells on the die. Only the CPU core presented significant blockage to the layout system. All the other cores, which were soft, were flat-routed with standard layout software and techniques.
Hard-core modularity. Core developers can provide design flexibility, even for a timing-critical hard core, by partitioning its function into several small, individual hard cores, or “chiplets.” The timing-critical layout blocks are retained in smaller subblocks, enabling the designer to customize the core by selecting only the needed chiplets. Automatic generation of placement records, based on the selected configuration, maintains timing at the core’s top level.


Meeting timing constraints on the 32-bit, 66-MHz PCI core shown in Figure 5 required a fixed layout, which dictates a hard-core implementation. Because the PCI function is highly configurable, we partitioned the design into nine chiplets that can be individually selected as required by an application.
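The selection of chiplets can be pictured as choosing a validated subset of a fixed catalog. The Python sketch below is only an illustration: the nine chiplet names are transcribed from the labels of Figure 5, the two example subsets follow the PCI controller and PCI bridge customer designs described in this section, and the rule that every configuration must include the PCI engine is an assumption drawn from those two examples, not a documented constraint. The function and constant names are hypothetical.

```python
# The nine chiplets, named after the block labels in Figure 5.
CHIPLETS = frozenset({
    "pci_engine",
    "configuration_registers",
    "master_fifo_write_server",
    "master_fifo_read_server",
    "master_dma_write_server",
    "master_dma_read_server",
    "slave_write_port",
    "slave_read_port",
    "slave_delayed_read_port",
})

# Assumption: both example designs include the engine, so require it here.
ALWAYS_REQUIRED = frozenset({"pci_engine"})


def select_chiplets(requested):
    """Validate a chiplet selection and report what is included and omitted."""
    requested = frozenset(requested)
    unknown = requested - CHIPLETS
    if unknown:
        raise ValueError(f"unknown chiplets: {sorted(unknown)}")
    missing = ALWAYS_REQUIRED - requested
    if missing:
        raise ValueError(f"missing required chiplets: {sorted(missing)}")
    return {
        "included": sorted(requested),
        "omitted": sorted(CHIPLETS - requested),
    }


# Selections matching the two customer designs described in the text.
PCI_CONTROLLER = {
    "pci_engine",
    "configuration_registers",
    "master_fifo_write_server",
    "master_fifo_read_server",
}
PCI_BRIDGE = {
    "pci_engine",
    "configuration_registers",
    "master_dma_write_server",
    "master_dma_read_server",
}
```

In a real flow, the validated selection would feed the automatic generation of placement records; that step is not modeled here because the article gives no detail on the record format.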
IBM customers have designed several chips containing
chiplets, using the desktop approach. For example, one de- UART 2
PowerPC
sign requiring a PCI controller used a PCI engine chiplet in CPU core
combination with register configuration, master FIFO write
server, and master FIFO read server chiplets. Another design General-
purpose
requiring a PCI bridge also used the engine and register con- I/O
figuration chiplets but no FIFO servers. Instead, it used the
master DMA write server and master DMA read server
chiplets. The PCI function implemented in the design using Figure 4. O n-chip bus implementation.

OCTOBER–DECEMBER 1 9 9 7 29

Authorized licensed use limited to: University of Illinois. Downloaded on September 9, 2009 at 23:30 from IEEE Xplore. Restrictions apply.
.

CORE DESIGN

Master
form additional technology-specific manufacturability
DMA checks using an IBM-developed set of checking routines.
read Functional verification of the design is left entirely to the
server
designer. IBM does not resimulate ASIC designs before or
Configuration
Master registers after layout. We provide an updated postlayout design netlist
FIFO
read
to the ASIC customer for functional verification, with an SDF
server Slave (standard delay file) on request, to support delay simula-
write tion. While many customers still rely on gate-level simula-
Master port
tion for functional verification after layout, we recommend
DMA
write the use of formal verification tools for this task.
server Slave We have functionally verified pre- to postlayout versions
PCI read port
engine
of designs exceeding 800,000 gates with a 24-hour turn-
Master FIFO around using the DesignVerifyer tool from Chrysalis. With
write server Slave delayed the advent of chips containing complex embedded cores,
read port
verification through gate-level simulation has become even
less practical, and formal verification methods have become
Figure 5. PC I chiplets. more essential.
Our timing-driven layout system handles both hierarchi-
cal and flat designs. Timing assertions (description of ex-
the FIFO servers was approximately 35% smaller than a sin- pected arrival times, clock cycles, false paths, and so forth)
gle hard core containing all nine chiplets. The PCI function for static sign-off generate timing targets for the place-and-
using the DMA servers was over 40% smaller. route system. To close the final postlayout chip timing, we
use a series of in-place optimization tools for drive strength
Firm cores. Using firm cores increases design flexibility optimization, buffer insertion, clock tree placement, and
and reduces layout problems. A core provider who requires scan chain reordering.
IP protection for functions without critical timing require- Figure 6 also shows how we have updated the basic ASIC
ments that drive a fixed hard-core layout should implement methodology with core-specific deliverables. Although the
such functions as firm cores. Firm cores provide abstracted basic flow and sign-off points remain the same, the design
core views to the designer; the vendor replaces the firm kit has changed substantially. In addition to the base library
cores with the gate-level netlist during chip layout. By al- of ANDs, NANDs, latches, and SRAMs, the ASIC vendor now
lowing core size, aspect ratio, and pin locations to be altered supplies large pieces of the customer design in the form of
during layout, this method facilitates an optimal chip lay- soft-core netlists and black box models for firm- and hard-
out. It alleviates tiling problems and reduces the unused sil- core functions. This requires the creation, delivery, and sup-
icon space that sometimes occurs when several large cores port of several new core-specific models. New tools and
reside on the same chip. design techniques are also needed to address the verifica-
tion requirements of complex cores (such as embedded
SOC design methodology controllers) that now reside on ASIC silicon.
Figure 6 illustrates an ASIC design methodology proven
on very large (500,000 to 2 million gates), high-performance Simulation
(over 100 MHz) designs. IBM ASIC customers used this The customer receives soft-core functions as gate-level
methodology to create the designs described in this article. netlists. These netlists are mapped to the same ASIC library
The most significant factor in achieving success on these used by the customer logic. Simulation of the core netlist,
large, complex designs is the sign-off criteria. We base de- therefore, requires no unique support beyond the design kit
sign sign-off on static timing analysis and DFT compliance provided for customer logic simulation at the gate level.
with fully automatic test pattern generation rather than ex- Simulation models for hard and firm cores fall into two
haustive delay simulation and functional manufacturing test major categories: full-function models (FFMs) and bus func-
vectors. We enforce DFT rules through test structure verifi- tion models (BFMs). Each hard- or firm-core macro requires
cation (TSV) software provided by the ASIC vendor. an FFM. An FFM, derived from the core design source, ac-
We run the TSV against the design at both the release-to- curately models the core hardware’s behavior. The design
layout and release-to-manufacturing checkpoints. Similarly, source may be Verilog or VHDL and may be register-transfer
we use a “golden” static timing analysis tool (Einstimer) and level, gate level, or a mixture of the two. Because these core
library for timing sign-off at these same checkpoints. We per- macros require IP protection, the design source must be en-

30 IEEE DESIGN & TEST OF CO MPUTERS

Authorized licensed use limited to: University of Illinois. Downloaded on September 9, 2009 at 23:30 from IEEE Xplore. Restrictions apply.
.

Customer design kit


ASIC vendor sign-off tool Initial
Core-specific deliverables design review

Core model kit


Design entry
Design source (soft only)
Bus models (core and system bus)

Full-function models
Hardware-software High-level Core instruction Synthesis, timing, test,
cosimulation simulation set simulator floor planning

Test benches, bus


command files

Floor planning Logic and


test synthesis

ASIC library
component models

Static Gate-level Test and Formal Prelayout Synthesis, simulation,


timing analysis simulation macro structure verification technology checks timing, test, floor planning
verification

Release
Timing to layout SmartModel (Swift) interface. This interface is supported by
assertions most event-based simulators on the market and requires
uniquely compiled versions on a platform basis only (Sun,
Timing-driven
layout HP, IBM RS/6000 ), not a simulator-specific basis.
We supply bus function models for processor cores and
on-chip bus structures. A BFM is not derived from the design
Static source; it is based on the processor’s bus specification docu-
timing analysis
SDF, RC, ment and is written in VHDL or Verilog. A translator can cre-
capacitance
values ate the alternate form. In either case, the customer receives
Postlayout unencrypted HDL source for simulation. In contrast to the
technology checks
FFM, which accurately represents the entire core function, a
BFM drives simulation with the core’s bus response without
modeling the internal implementation. Because it represents
Release to Automatic test only a subset of core function, a BFM generally simulates faster
manufacturing pattern generation
than an FFM and is useful in the early stages of SOC design.

Figure 6. C ore-based ASIC design methodology. Synthesis


The customer receives a soft core as a technology-
dependent VHDL or Verilog netlist mapped to the same ASIC
crypted but still function in simulation. library as the customer logic. Soft-core synthesis usually con-
We protect the hard- and firm-core IP by compiling the sists of instantiating the netlist directly into the customer’s
model to a non-human-readable form, using Viewlogic’s design. In some cases, a small portion of a soft core may be
Verilog Model Compiler (VMC). This model can simulate di- parameterizable.
rectly with Cadence’s Verilog XL and Viewlogic’s Verilog Elements within soft-core netlists used in the Synopsys
Compiled Simulator tool. Communication between the mod- synthesis environment carry the company’s “dont_touch”
el and the simulators takes place via the Verilog programming annotation to prevent changes to the soft-core design during
language interface (PLI). To support use of the compiled mod- synthesis optimization. This annotation preserves the func-
el with VHDL simulators, we integrated the model output from tion and performance guaranteed by IBM. A customer who
the VMC with a set of PLI routines known as the Synopsys chooses to change a soft core beyond the allowed parame-

OCTOBER–DECEMBER 1 9 9 7 31

Authorized licensed use limited to: University of Illinois. Downloaded on September 9, 2009 at 23:30 from IEEE Xplore. Restrictions apply.
.

CORE DESIGN

terization must explicitly remove the don’t_touch annota- capacitance files to generate firm-core timing rules because
tion and subsequently accepts ownership of the core’s func- the firm-core layout varies with each implementation.
tion and timing. Instead, we use the timing assertions to generate fixed pin-
Hard and firm cores are modeled as black box library el- to-pin delays that exactly match the delays published in the
ements in synthesis and pass through the synthesis process core’s functional specification. We capture these delays,
unchanged. The PowerPC CPU core is an exception to this with variable temperature and voltage factors, and the ap-
rule. A synthesizable logic macro called the test mode ma- propriate timing checks in DCL statements. We compile DCL
trix (TMM), required to support functional testing of the CPU, statements for both hard and firm cores into a non-human-
accompanies the PowerPC black box model. The customer readable executable form, which is provided to the cus-
synthesizes this logic to elements in the target ASIC library tomer for static timing analysis. From the abstracted DCL
and can optimize it for both area and performance. timing model, an IBM program called gensyn uses Einstimer
to create timing information in the Synopsys synthesis mod-
Timing el and core timing wrappers for simulation back-annotation.
The customer performs timing analysis on soft cores using
the timing models for the ASIC library elements. Timing as- Testability
sertions that specify false or don’t-care paths in the design Soft and firm cores are designed to meet the same DFT
come with the soft-core netlist. The customer incorporates requirements as the customer design. All soft and firm cores
these assertions into the chip-level assertions used for timing must pass through the TSV sign-off tool without generating
sign-off. errors or warnings. All core scan chains and test clocks must
The basic criterion for a soft core is that it meet perfor- be connected correctly, and untestable faults are not al-
mance requirements using the standard timing-driven lay- lowed. An edge clock at the core boundary drives the core.
out system. Therefore, IBM provides core-specific wire-load A clock splitter element in the design splits the edge clock
models or area constraints on an exception basis only. The into the required master/slave clocks. The customer con-
customers who designed the chips shown in Figures 2 and nects the edge clock, scan clocks, and scan chain inputs
4 integrated the soft cores with their custom logic and per- and outputs in the customer logic to the corresponding pins
formed chip-level timing analysis. IBM placed and routed at the core boundary. Once the core is fully integrated into
these chips, without soft-core region constraints, using the the customer design, the customer uses ASIC library com-
timing-driven layout approach. In contrast, the RAMDAC soft ponent test models to check it again for DFT compliance at
core in Figure 1 required region constraints and early floor the chip level.
planning, in addition to timing-driven layout, to meet the We take either an integrated or an isolated approach to
220-MHz performance target. testing hard-core macros, depending on how the core was
For hard and firm cores, core developers must create designed. If the hard-core design complies with the DFT re-
black box timing models or timing abstracts to protect the quirements, we can test the core with the same test meth-
intellectual content. The timing model must contain values ods used in the standard ASIC flow (Figure 7). IBM includes
for all pin-to-pin paths as well as all appropriate timing the full gate-level core model and automatically generates
checks such as setup, hold, and minimum pulse width. test patterns for the customer and core logic concurrently.
Although the timing model is an abstracted representation To maintain IP protection for hard and firm cores, we send
(that is, it does not contain detailed design data), it must be an encrypted model to the customer. A special cloaking fea-
accurate enough for static timing sign-off. Because of core ture in the sign-off TSV tool prevents core models from be-
designs’ complexity and time-to-market requirements, an ing viewed via the graphical interface. We used the
automated method of creating these models is necessary to integrated core test method on the customer designs con-
eliminate human error. taining the PCI core chiplets described earlier.
We create the hard-core timing models for static timing In isolated testing, we test the customer logic separately
sign-off using the IBM Einstimer tool’s design abstraction from the core using a complex set of multiplexing and gat-
process. Originally developed to support efficient timing of ing logic, called an isolation matrix, or test mode matrix
hierarchical designs, we extended this capability for cores. (Figure 8). Control signals put the core into a series of
We read a detailed design netlist, RC (resistance-capacitance) modes. The nontest or functional mode allows communi-
and capacitance files from layout, and the design timing as- cation between the core and customer logic. In this mode,
sertions into Einstimer. The tool generates a pin-to-pin timing the isolation matrix logic is transparent to the core’s func-
abstraction for the design and writes out the information in tional use. In core test modes, the core is accessible via the
Delay Calculation Language (DCL). chip I/Os, and the customer logic is fenced off and stable.
We do not use the detailed netlist and the layout RC and We can apply functional patterns to the core using the chip

32 IEEE DESIGN & TEST OF CO MPUTERS

Authorized licensed use limited to: University of Illinois. Downloaded on September 9, 2009 at 23:30 from IEEE Xplore. Restrictions apply.

Figure 7. Integrated core-testing approach.

Figure 8. Isolated core-testing approach.

I/Os. We can also apply stored scan patterns directly to the core. In user-defined logic test mode, the customer logic is accessible via the chip I/Os, and the core is fenced off and stable. We test the customer logic by applying scan patterns using the chip I/Os.

Multiple cores on a single chip that require isolated testing will each need an associated isolation matrix. Adding this matrix may add several days to the design phase by increasing logic design and circuit wiring complexities.
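
As a rough illustration of the three isolation modes described above (functional, core test, and user-defined logic test), the following Python sketch models an isolation matrix for one signal in each direction. It is a simplified behavioral model, not the IBM implementation: the names `Mode`, `Routing`, `isolation_matrix`, and `FENCE`, and the choice of a constant 0 as the fenced (stable) value, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    FUNCTIONAL = "functional"   # matrix is transparent
    CORE_TEST = "core_test"     # core reachable from chip I/Os
    UDL_TEST = "udl_test"       # user-defined logic reachable from chip I/Os


# Assumed constant driven onto a fenced-off block to keep it stable.
FENCE = 0


@dataclass(frozen=True)
class Routing:
    core_in: int             # value seen at the core input
    udl_in: int              # value seen at the user-defined logic input
    chip_out: Optional[int]  # value observed at the chip I/O (None if unused)


def isolation_matrix(mode: Mode, udl_out: int, core_out: int, chip_in: int) -> Routing:
    """Select which source drives each destination for the given mode."""
    if mode is Mode.FUNCTIONAL:
        # Core and customer logic talk to each other directly.
        return Routing(core_in=udl_out, udl_in=core_out, chip_out=None)
    if mode is Mode.CORE_TEST:
        # Chip I/Os drive and observe the core; customer logic is fenced.
        return Routing(core_in=chip_in, udl_in=FENCE, chip_out=core_out)
    if mode is Mode.UDL_TEST:
        # Chip I/Os drive and observe the customer logic; core is fenced.
        return Routing(core_in=FENCE, udl_in=chip_in, chip_out=udl_out)
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch each mode is just a different multiplexer selection, which is why a chip with several isolated cores would need one such selection structure per core.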
We used the isolated test approach on the PowerPC CPU, the video DAC, and the Rambus cores. Analog cores often have additional unique test requirements, as was the case for the Rambus and video DAC. These cores must connect to specific, predetermined chip I/Os that are contacted by analog testers during manufacturing test.

Floor planning and layout
The customer performs floor planning of soft cores as part of chip-level design, using the ASIC library floor-planning models. It may be necessary to constrain the placement of soft-core logic, but the automatic placement tools often find a good solution without extraordinary intervention. RAMDAC, running at 220 MHz, was the only soft core that required advanced floor-planning and grouping restrictions to meet its performance target. We successfully placed, routed, and timed all the other soft cores in the designs described here using the timing-driven layout system with chip-level timing assertions.

IBM provides black box floor-plan models for each firm and hard core. These models accurately represent core size, aspect ratio, and pin locations. Firm cores resemble any other hierarchical block, except that processing them involves added complexity for the ASIC vendor, who must insert and remove the gate-level netlist. After inserting the netlist, we place and route each firm core and time it to meet the same timing assertions used to generate the static timing model. We remove the netlist information and replace it with the firm-core library element before returning the design to the customer for postlayout functional verification.

Hard cores complicate floor planning because of their size, wiring blockage, noise isolation requirements, peripheral test circuitry, and proximity to chip I/O pads. Although the PowerPC CPU core uses standard I/O cells and has no placement restrictions, it can use more than a quarter of a small chip’s area. The core can be placed anywhere on the chip, but placement in a corner of the die allows for maximum wiring of the remaining chip logic. Corner placement also minimizes the chance of splitting a functional block and placing the parts on opposite sides of the core. We placed CPU pins that communicate with on-chip logic primarily on the north and east sides of the rectangular core structure. A small number of CPU pins (those that communicate with the chip I/Os) appear on the south and west sides of the core. If a critical timing path traverses the core, the designer can use region constraints to force logic to lie on one side of the core or block the area on one side to avoid the long wires.

Floor planning of chips with large hard cores is essential because of the wiring blockage these macros create. The PowerPC 401, the Rambus, and the PCI chiplets block all chip wiring tracks on the first two levels of metal. Porosity on

OCTOBER–DECEMBER 1997 33


the third level of metal ranges from approximately 15% in the PCI chiplets to zero in the Rambus. In the 0.35-µm ASIC product, this leaves a minimum of one and a maximum of two wiring planes for routing over these cores. The limited porosity is caused primarily by the densely packed underlying core logic. Other core areas may complete the block wiring to prevent routing noise over the analog circuitry (in the Rambus) and instruction and data caches (in the 401 CPU). This causes highly congested wiring in other chip areas. It may be necessary to reserve area around the cores to help alleviate wiring congestion in those regions.

Certain hard cores, particularly those with analog functions, have additional characteristics affecting placement. The Rambus, for example, is a high-speed analog/digital memory controller with unique test and high-performance I/O cell requirements. The I/O cells must occupy predetermined locations on the die to allow access by wafer-level test equipment. The Rambus core requires wiring several sensitive high-current analog signals to the chip’s I/O pads. Performance requirements on the signal nets between the core and the I/O pads and the need to prevent coupling to other noisy wires limit placement of the core: It must be placed in a predefined area directly adjacent to the test-specific I/O cells.

The chip shown in Figure 2 required both the 401 and Rambus cores to fit on a small die. Because the Rambus had to occupy a location in the middle of one side of the chip, the 401 was forced into the corner on the opposite side. This placement created a long narrow area between the two cores that caused significant chip-level routing challenges on the four-level metal design. The design was highly populated (using over 80% of the available silicon) and contained logic that needed to communicate across the core to other on-chip logic. Because routing across the core was available in only the vertical direction, we had to create additional reserve areas around the cores to allow for horizontal wires. Detailed floor planning and wiring congestion analysis highlighted these routing problems early in the design process. Subsequent versions of the 401 CPU contain wiring channels through the core on the third wiring level to help alleviate the chip-level routing problem.

Large cores sometimes require additional peripheral circuitry such as test logic, which must be arranged around the core. When test signals are multiplexed with functional signals around the chip’s edge, the floor plan must account for the additional wire demand.

Logic and layout optimization of hard-core contents translates into area and performance improvements within the core. Inefficiencies at the chip level will partially offset these gains unless the cores and logic on the chip are arranged to avoid timing and wiring congestion problems.

With a good floor plan, laying out a chip with cores is much easier than laying out a chip without cores. For example, large static RAMs have been available as cores for many years, and these enable the layout engineer to add large amounts of memory to a chip without difficulty. Likewise, well-designed cores solve timing, clock skew, and congestion problems for high-performance circuits without additional layout effort on core contents.

Hardware-software cosimulation
Simulation and verification can quickly become the bottleneck in the design of large, complex core-based systems. A range of simulation models of varying accuracy, including BFMs and FFMs, address specific needs at various stages of the ASIC hardware design process. But none fulfills the needs of the software designer developing code for the embedded processor. Using an instruction set simulator (ISS) with an instruction set architecture (ISA) model has been a traditional method of software designers for code debug and execution time analysis. In stand-alone mode, an ISS runs several orders of magnitude faster than an FFM, executing an average of 100,000 instructions per second (IPS) versus the FFM’s 5 to 20 IPS. An ISS also gives the software developer visibility into the internal registers of the processor as it executes instructions and provides breakpoint and single-step functions for controlling the execution stream.

Because the processor core is embedded on the same piece of silicon as the customer-designed ASIC logic, testing the interaction of processor and ASIC gates is critical. In most cases, the processor must correctly execute a stream of initialization code before it can begin to interact with the surrounding ASIC logic. We can debug this hardware-dependent software by translating processor instructions into a memory image and running an ASIC simulation with the FFM fetching instructions from the memory model. This method’s productivity is constrained by the FFM’s performance and the limited visibility this model provides into the processor’s internals. As a result, many vendors are creating cosimulation products that link a processor ISS with an ASIC HDL simulator by using the processor BFM. These products execute processor instructions faster and concurrently simulate the activity of the remaining logic.

IBM used a prototype cosimulation system developed for the PowerPC 401 during the design of several core-based chips. The system consists of a 401 ISA model running in the PowerPC Virtual Simulator (PVS), linked with the VHDL simulator from Model Tech, Inc. (MTI), executing on an RS/6000 workstation. The ISA model does not include the concept of functional pins and uses the BFM to model output pin behavior. The BFM accepts bus commands such as read and write and translates them into bus transactions that model the interface’s signal sequencing and timing.

The ISA model in PVS executes instructions at a relatively high code throughput (much higher than an FFM would
execute object code). It sends simple bus commands to the bus model in the MTI environment. It can also send commands to a file, so that a stand-alone BFM-based event-level simulation can run without requiring PVS at a later time. Designers debugging the software execution retain the visibility and instruction stream controls provided by PVS. The ASIC designer can debug problems detected in the ASIC logic in a familiar HDL simulator environment.

The design team for the chip in Figure 4 used the PVS/MTI cosimulation prototype to debug the instructions initializing the 401 processor and to configure ASIC logic functions on the PLB and OPB buses. Attempting a similar level of code debugging with the FFM model alone would have been virtually impossible. The cosimulation system provided visibility, control, and interaction with actual ASIC gates, all essential to the team’s understanding of processor functions and to the correct coding of chip initialization routines.

SOC design is a process of system architecture design, block selection, integration, and verification. The designer is responsible for the integration of components and verification, as well as design of custom modules and modules not available in the core library. This shifts ASIC design to a block-based methodology that offers potentially huge productivity increases over RTL synthesis of the equivalent chip. Critical methodology components include static sign-off using static timing analyzers, structured DFT schemes, synthesis, and floor planning. SOC design also requires new core models, hardware-software cosimulation techniques, consistent core development practices, and the availability of cores themselves. Cores must be designed for reuse. Reusable cores require a balance of completeness, flexibility, performance, and optimization. In the future, cores will consume most of the die area, so core evaluation and selection will be at least as significant as custom logic design. The SOC design methodology is still evolving. An important area of future research is the development of standards to foster core development, exchange, and interoperability.

Acknowledgments
We derived this article from two earlier works: “Core+ASIC Methodology: The Pursuit of System-on-a-Chip,” A. Rincon et al., Proc. Wescon IC Expo 97, and “Design Environment for System-on-a-Chip,” A. Rincon et al., Proc. On-Chip System Conf., Design SuperCon, 1997.

Ann Marie Rincon is a senior engineer in the ASIC Products Group of IBM Microelectronics in Essex Junction, Vermont. She is also the Core+ASIC Methodology Team leader. Rincon received a BS degree in mathematics and computer science from Saint Joseph’s College in Indiana.

Cory Cherichetti is a staff engineer in the IBM ASIC Products Group, where he is working on the design of several cores and the supporting methodology. Cherichetti has a BSCSE degree from Rensselaer Polytechnic Institute.

James A. Monzel is a senior engineer in the IBM ASIC Products Group, where he is the Core+ASIC Test Methodology Team leader. He is the IEEE Computer Society’s Test Technology chair for tutorials and the general chair of the North Atlantic Test Workshop. He is also on the steering committees of the VLSI Test Symposium and the Workshop on Testing of Embedded Core-Based Systems. Monzel received a BS from Case Western Reserve University and a BS in physics from Marietta College. He is a senior member of the IEEE.

David R. Stauffer is a senior engineer, responsible for core and ASIC design, in the IBM ASIC Products Group. Stauffer holds a BSEE degree from Pennsylvania State University and an MSEE degree from the University of Houston.

Michael T. Trick is a senior engineer in the IBM ASIC Products Group and the leader of the Physical Design Team for the ASIC Design Center. Trick received a BS degree in electrical engineering from the University of Illinois and a PhD in computer engineering from Carnegie Mellon University.

Address questions or comments about this article to Ann Marie Rincon, IBM Corp., 1000 River St., Dept. G07V, Bldg. 863-1, Essex Junction, VT 05452-4299; [email protected].
