SUN Microsystems
410 N. Mary Ave
Sunnyvale, Ca. 94085
Abstract
The Niagara2 System-on-Chip is SUN Microsystems' latest
processor in the eco-sensitive CoolThreads line of
multi-threaded servers. This DFT survey of the Niagara2 chip
introduces the RAWWCas memory test, a hybrid flop design,
and a fast, efficient bitmapping architecture called DMO. It also
showcases some excellent DFT results for this challenging system-on-chip design project.
1.0 Introduction
The Niagara2 SPARC chip is truly a chip-multiprocessor, core-multithreaded system on a chip. It has eight processor cores, each supporting eight threads. The cores run at
1.4GHz and are connected to 4MB of on-chip L2 cache. The
L2 cache connects to four on-chip DRAM controllers, which
directly interface to a pair of fully-buffered DIMM channels.
Additionally, N2 has an on-chip Network Interface Unit with
two 1Gb/10Gb Ethernet MACs, and an on-chip PCI-EX controller.
The system-on-chip nature of Niagara2 poses a number
of DFT challenges. There are eight different clock domains
within the chip and a mixture of full-custom, semi-custom, and
ASIC design styles. SERDES blocks [3] are hard IP, acquired
complete with their own DFT architecture, which must be integrated into the overall Niagara2 DFT architecture. The entire
functional IO space of the chip is high-speed SERDES, which makes testing with functional vectors difficult and limited. This
makes providing high-quality stuck-at and transition test vectors imperative. There are more than 300 SRAMs of various
sizes and functionalities, each of which has an MBIST test plan.
There are over one million flops in the design, managed in over
forty different scan string configurations and three different
clocking scenarios. Welcome to 65nm SOC design!
Paper 1.2
1-4244-1128-9/07/$25.00 © 2007 IEEE
[Figure: hybrid flop scan timing. L1clk, Aclk, and Bclk waveforms with scan input SI over time: clocks off, load master, clocks off, load slave, clocks off.]
[Figure: clock control block diagram. Labels: PLL, RCLK, L2Clk, ACTestTrig, sync flop, Clk_Stop, start counter, AClk and BClk counters, TCU, Package Pins, Cluster Header.]
scannable input flops. The fact that each SRAM has boundary
flops allows the logic designers to perform a full cycle of logic
work between the previous flop stage and the boundary of the
SRAM. Because of this design, it is necessary to understand
the upstream logic and to load the upstream flops appropriately
to perform a Macro Test operation. This adds an undesirable
level of complexity to the SRAM access process.
[Figure: MBIST access path to an SRAM. Labels: MBIST ENGINE, MBIST Data(7:0), MBIST RW Cntl, MBIST Address, Func RW Cntl, Func Address, MBIST_ON, Logic, SRAM; note: "At rising edge of clock SRAM operates on data here".]
three reads the zeros from the target location. At this point we
have executed a read immediately after a write of opposite
polarity with the worst-case background: RAWWCas. Action
four reads the ones from the non-target location to provide a
read-recovery bitline test. Finally, action five writes the target
location to all ones to match the background, and the test is
ready to move on to the next target in the address space. When
march element seven is complete, the SRAM is initialized to
the opposite polarity in element eight, and the RAWWCas test
is repeated in element nine with the opposite test polarity.
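The march sequence described above can be exercised on a toy memory model. This is a minimal sketch under stated assumptions, not the Niagara2 MBIST engine: elements 1-6 are March C- at inverted polarity, and their address orders are assumed to follow the standard March C- ordering (either-order elements run ascending here); elements 7-9 are read off the sub-bank variant in section 8.4.4 with the T/B superscripts dropped, where a starred operation targets the non-target location. The `partner()` mapping of the non-target location (adjacent address) and the stuck-at fault model are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One march operation: (kind, value, on_partner). kind is "W" or "R";
# on_partner=True marks a starred (*) operation on the non-target location.
Op = Tuple[str, int, bool]

# Elements 1-6: March C- at inverted polarity (address orders assumed).
# Elements 7-9: RAWWCas, (W0 W*1 R0 R*1 W1); (W0); (W1 W*0 R1 R*0 W0).
ALGORITHM: List[Tuple[str, List[Op]]] = [
    ("any", [("W", 1, False)]),
    ("up", [("R", 1, False), ("W", 0, False)]),
    ("up", [("R", 0, False), ("W", 1, False)]),
    ("down", [("R", 1, False), ("W", 0, False)]),
    ("down", [("R", 0, False), ("W", 1, False)]),
    ("any", [("R", 1, False)]),
    ("any", [("W", 0, False), ("W", 1, True), ("R", 0, False),
             ("R", 1, True), ("W", 1, False)]),
    ("any", [("W", 0, False)]),
    ("any", [("W", 1, False), ("W", 0, True), ("R", 1, False),
             ("R", 0, True), ("W", 0, False)]),
]


class Memory:
    """One-bit-per-address memory model with optional stuck-at faults."""

    def __init__(self, size: int, stuck: Optional[Dict[int, int]] = None):
        self.cells = [0] * size
        self.stuck = stuck or {}  # address -> value the faulty cell is stuck at

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck.get(addr, self.cells[addr])


def partner(addr: int) -> int:
    """Hypothetical non-target location: the adjacent address (addr XOR 1)."""
    return addr ^ 1


def run_march(mem: Memory) -> List[Tuple[int, int, int]]:
    """Apply ALGORITHM; return (element, address, value_read) per miscompare."""
    n = len(mem.cells)
    assert n % 2 == 0, "partner() pairs addresses, so the size must be even"
    fails = []
    for element, (order, ops) in enumerate(ALGORITHM, start=1):
        addrs = range(n - 1, -1, -1) if order == "down" else range(n)
        for target in addrs:
            for kind, value, on_partner in ops:
                loc = partner(target) if on_partner else target
                if kind == "W":
                    mem.write(loc, value)
                else:
                    got = mem.read(loc)
                    if got != value:
                        fails.append((element, loc, got))
    return fails
```

A fault-free `Memory(8)` returns an empty list; `Memory(8, {3: 0})` first miscompares at `(2, 3, 0)`, the first read of 1 in element two. Bit-level cells keep the sketch small; the real engine drives byte-wide data values.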
[Figure: MBIST engine signals MBIST_ON, MBIST_Address_Mix, FAIL.]
8.2 CAM Test
Each CAM in Niagara2 can be read and written like a
RAM. Figure 8.2.1 illustrates the test model we used for
CAMs. The March C- with RAWWCas test is applied to each
CAM during the RAM test phase. Once RAM testing is complete, the CAMs are further exercised to test the logic gates
that generate the HIT signal.
[Figure: CAM model. Labels: CAM Data Register, Memory Cell, Hit.]
{(W0); (W1CAM1W0); (W1); (W0CAM0W1); (CAMwalking0);}
Figure 8.2.1: 4-bit 2-row CAM Model with CAM Test Algorithm
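One reading of the Figure 8.2.1 algorithm can be sketched on the 4-bit, 2-row model. This is an interpretation, not the documented engine behaviour: CAM1 and CAM0 are assumed to compare an all-ones or all-zeros key (so only the row just rewritten may hit), and CAMwalking0 is assumed to compare keys with a single walking 0 against rows holding all ones (so no row may hit). The `broken_bits` fault model, a comparator bit stuck at "match", is hypothetical.

```python
from typing import List, Optional, Set


class CAM:
    """Behavioural CAM: each row holds a `width`-bit word; compare() returns one hit per row."""

    def __init__(self, rows: int = 2, width: int = 4,
                 broken_bits: Optional[Set[int]] = None):
        self.width = width
        self.rows = [0] * rows
        # Hypothetical fault model: comparators for these bit positions always report "match".
        self.broken_bits = broken_bits or set()

    def write(self, row: int, value: int) -> None:
        self.rows[row] = value

    def compare(self, key: int) -> List[bool]:
        care = (1 << self.width) - 1
        for bit in self.broken_bits:
            care &= ~(1 << bit)
        return [((word ^ key) & care) == 0 for word in self.rows]


def cam_test(cam: CAM) -> List[str]:
    """Run {(W0);(W1CAM1W0);(W1);(W0CAM0W1);(CAMwalking0);} and list failing steps."""
    ones = (1 << cam.width) - 1
    n = len(cam.rows)
    errors: List[str] = []

    def expect(key: int, hits: List[bool], step: str) -> None:
        if cam.compare(key) != hits:
            errors.append(step)

    for r in range(n):                       # (W0): clear every row
        cam.write(r, 0)
    for r in range(n):                       # (W1 CAM1 W0): only row r may hit
        cam.write(r, ones)
        expect(ones, [i == r for i in range(n)], f"CAM1 row {r}")
        cam.write(r, 0)
    for r in range(n):                       # (W1): set every row
        cam.write(r, ones)
    for r in range(n):                       # (W0 CAM0 W1): only row r may hit
        cam.write(r, 0)
        expect(0, [i == r for i in range(n)], f"CAM0 row {r}")
        cam.write(r, ones)
    for b in range(cam.width):               # (CAMwalking0): one mismatching bit, no row may hit
        expect(ones & ~(1 << b), [False] * n, f"walking0 bit {b}")
    return errors
```

Under this reading, a comparator bit that never reports a mismatch survives the CAM1 and CAM0 steps, because other bits still mismatch, and is caught only by the walking-0 step: `cam_test(CAM(broken_bits={2}))` reports `["walking0 bit 2"]`.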
Data Values - The MBIST engines have four sets of data values programmed for use: AA/55, CC/33, 99/66, FF/00. In
default MBIST mode, the engine delivers the eight-bit data
value AA wherever 1 appears in the data field of the algorithm
and 55 wherever 0 appears. The test is run twice with this pair
of data values: once in standard addressing mode and a second
time with Address Mix active. When the second run is complete, the data value set is changed to CC/33 and the test is run
twice again, then with 99/66, and finally with FF/00. There is a user data
register in the MBIST engine that can be loaded with an arbitrary eight-bit value. When MBIST is run in user data mode,
the value in the user data register is used along with its
bitwise complement. In this mode the test is complete after
executing with this one data set.
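The run sequence described above can be enumerated as a schedule. This is a minimal sketch: the function names are hypothetical, and user data mode is assumed to be a single run in standard addressing, since the text does not say whether Address Mix is applied in that mode.

```python
from typing import List, Optional, Tuple

# The four programmed data-value sets; each is paired with its bitwise complement.
DATA_SETS = [0xAA, 0xCC, 0x99, 0xFF]


def complement(value: int) -> int:
    """Eight-bit bitwise complement, e.g. 0xAA -> 0x55."""
    return ~value & 0xFF


def expand(bit: int, data_for_1: int) -> int:
    """Byte driven for a 1 or a 0 in the algorithm's data field."""
    return data_for_1 if bit else complement(data_for_1)


def run_schedule(user_data: Optional[int] = None) -> List[Tuple[int, int, bool]]:
    """List the (data_for_1, data_for_0, address_mix) used for each MBIST run."""
    if user_data is not None:
        # Assumption: user data mode is a single run in standard addressing.
        return [(user_data, complement(user_data), False)]
    # Default mode: each data set twice, without and then with Address Mix.
    return [(d, complement(d), mix) for d in DATA_SETS for mix in (False, True)]
```

The default schedule has eight runs, starting with `(0xAA, 0x55, False)` and ending with `(0xFF, 0x00, True)`. Because every set is a value plus its complement, each cell sees both polarities under every background.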
User Address - The MBIST engine is equipped with three user
registers, Address_Start, Address_Stop, and Address_Increment,
to exercise control over addressing. Using
these registers, the user can specify an address range to
exercise as well as the size of the increments taken through
that space. This feature provides an excellent means of detecting intermittent failures in a known address space. In conjunction with the loop feature, it also provides a means to create a
tight at-speed loop around a failing address, thus facilitating
efficient backside probing.
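The effect of the three address registers and loop mode can be modelled in a few lines. This is a sketch with hypothetical function names; whether Address_Stop is inclusive is not stated, and it is assumed inclusive here.

```python
from itertools import cycle, islice
from typing import Iterator, List


def user_addresses(start: int, stop: int, increment: int) -> List[int]:
    """Addresses visited from Address_Start to Address_Stop (inclusive, assumed)."""
    if increment <= 0:
        raise ValueError("Address_Increment must be positive")
    return list(range(start, stop + 1, increment))


def loop_mode(start: int, stop: int, increment: int) -> Iterator[int]:
    """Endless loop mode: repeat the same address sequence indefinitely."""
    return cycle(user_addresses(start, stop, increment))


# Tight loop around one failing address: set start == stop.
# list(islice(loop_mode(0x1F3, 0x1F3, 1), 3)) -> [0x1F3, 0x1F3, 0x1F3]
```

Setting Address_Start equal to Address_Stop collapses the range to the failing address, which is what makes the at-speed loop tight enough for backside probing.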
Loop - The MBIST engine can be put into endless loop mode.
This is especially useful for backside probing activities.
BISI - The MBIST engine has a Built-In Self Initialization
Mode. This is run as part of Power On Reset to initialize all
[Figure: MBIST data path. Labels: MBIST Data, SRAM, MBIST_Compare_Select, MBIST_Array_Select, Functional Data Path; bus widths 256, 128, and 64.]
tion to this need that uses high-speed one-port SRAMs that are
used in other parts of the chip. A 2X clock domain is created
inside the SRAM custom block, and the memory is run at twice
the clock speed of the surrounding logic. In this way, a read
and a write operation can occur on consecutive clock cycles
within the SRAM custom block and appear to the surrounding
logic as if they had been executed on the same clock cycle.
Since the MBIST engines supporting these double-pumped
memories are designed as part of the NIU logic, they run at
the normal NIU clock frequency. The vanilla MBIST algorithm
has therefore been modified to test simultaneous read and write
operations. March elements 2-5 in the test algorithm shown in
figure 8.1.2 consist of consecutive Read and Write operations
at the same address. For these double-pumped arrays, those
operations occur simultaneously.
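A behavioural model shows how a read and a write to the same address can share one logic cycle. This is a sketch, not the circuit: the class name is hypothetical, and the read-then-write ordering inside the 2X clock is an assumption. It is, however, the ordering a simultaneous (R1 W0) march element needs, so that the read returns the old data.

```python
from typing import List, Optional


class DoublePumpedSRAM:
    """One-port SRAM clocked at 2x the surrounding logic (behavioural sketch).

    Assumption: within each logic cycle the first 2x-clock cycle performs the
    read and the second performs the write, so a read and a write to the same
    address in one logic cycle return the old data.
    """

    def __init__(self, size: int):
        self.data: List[int] = [0] * size
        self.sram_cycles = 0  # 2x-clock cycles consumed so far

    def logic_cycle(self, read_addr: Optional[int] = None,
                    write_addr: Optional[int] = None,
                    write_data: int = 0) -> Optional[int]:
        value = self.data[read_addr] if read_addr is not None else None
        self.sram_cycles += 1           # first half: read slot
        if write_addr is not None:
            self.data[write_addr] = write_data
        self.sram_cycles += 1           # second half: write slot
        return value
```

For example, after `mem.logic_cycle(write_addr=2, write_data=1)`, a march step `mem.logic_cycle(read_addr=2, write_addr=2, write_data=0)` returns 1 and leaves address 2 holding 0. The surrounding logic sees one cycle, while the SRAM has used two.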
8.4.4 L2Data Cache: Two Cycle Access
The 4MB L2Data Cache is partitioned into eight banks,
with each bank partitioned into two sub-banks, top and bottom.
Circuit specifications of this SRAM prohibit access to the
same sub-bank on consecutive cycles. Top and bottom sub-banks do, however, share some address decoder circuitry, so
it is desirable to exercise consecutive-cycle accesses on one
sub-bank and then the other. The test algorithm of figure 8.1.2 was modified to jump back and forth between the top sub-bank and the
bottom sub-bank on consecutive cycles. The modified algorithm is shown here in figure 8.4.4.1. A superscript has been
added to each operation to indicate the top (T) or bottom (B) sub-bank.
{(W1^T W1^B); (R1^T R1^B W0^T W0^B); (R0^T R0^B W1^T W1^B);
(R1^T R1^B W0^T W0^B); (R0^T R0^B W1^T W1^B); (R1^T R1^B);
(W0^T W0^B W*1^T W*1^B R0^T R0^B R*1^T R*1^B W1^T W1^B);
(W0^T W0^B); (W1^T W1^B W*0^T W*0^B R1^T R1^B R*0^T R*0^B W0^T W0^B);}
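The top/bottom alternation can be generated and checked mechanically. This is a small sketch with hypothetical names: each operation of a march element is issued to the top and then the bottom sub-bank, and a checker confirms that no sub-bank is touched on two consecutive cycles.

```python
from typing import List, Tuple

Access = Tuple[str, str]  # (operation, sub-bank), e.g. ("R1", "T")


def interleave(element_ops: List[str]) -> List[Access]:
    """Issue each operation to the top then the bottom sub-bank: R1 -> R1^T, R1^B."""
    return [(op, bank) for op in element_ops for bank in ("T", "B")]


def obeys_subbank_rule(sequence: List[Access]) -> bool:
    """True if no sub-bank is accessed on two consecutive cycles."""
    return all(a[1] != b[1] for a, b in zip(sequence, sequence[1:]))
```

`interleave(["R1", "W0"])` yields R1^T R1^B W0^T W0^B, and repeating it for the next address still alternates, because each address ends on B and the next starts on T. The unmodified order R1^T W0^T would violate the rule.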
8.4.6 TCAM
There is a Ternary CAM located in the NIU block which
is responsible for Ethernet packet processing. Each TCAM cell
has a mask bit (m) and a data bit (d). When a mask bit is set
to 0, a match occurs regardless of the state of the data bit. The
mask and data bits share common data-in and data-out lines for
read and write operations. In the event of a multiple-row hit,
priority encoder logic reports the smallest matched address.
An MBIST engine has been designed to test the read,
write, and compare operations of the TCAM. The basic read/write
operations are covered in the same manner as described
in section 8.1. The TCAM's compare operation is exercised by
implementing the algorithm shown in table 8.4.6.1, based on
the vanilla CAM test presented in section 8.2. Each test
sequence is designed such that it can be run independently for
characterization and bring-up purposes.
Test Sequence
(Wm1)
(Wd0); (Wd1CAM1Wd0);
(Wd1); (Wd0CAM0Wd1);
(Wd1); (CAMwalking0);
(Wd0); (CAMwalking1);
(Wd1); (Wm0CAM0Wm1);
(Wd1); (CAM1Wd0);
(Wd0); {CAM0Rm1};
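The TCAM match rule and priority encoding described above can be written as a reference model that a compare test checks against. This is a minimal sketch with a hypothetical function name: a mask bit of 0 makes its position match regardless of the data bit, and among multiple hits the smallest row index wins.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (mask, data) for one TCAM entry


def tcam_search(rows: List[Row], key: int) -> Optional[int]:
    """Return the smallest matching row index, or None on a miss.

    Only bit positions whose mask bit is 1 are compared; a mask bit of 0
    matches regardless of the stored data bit.
    """
    for index, (mask, data) in enumerate(rows):  # lowest address has priority
        if ((key ^ data) & mask) == 0:
            return index
    return None
```

With rows `[(0b1111, 0b1010), (0b1100, 0b1000), (0b0000, 0)]`, key `0b1010` hits row 0, key `0b1011` hits row 1 (its low two bits are masked), and key `0b0101` falls through to the fully masked row 2.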
the-pins observation for the tester. There are two major challenges to applying this approach: excessive global wiring and a
speed mismatch between SRAMs and IO.
The largest arrays on the chip were targeted for DMO:
the ICache and DCache in the cores, the L2Data and L2Tag,
which are partitioned into eight banks as seen in figure 1.0.1,
and fifteen of the largest SRAMs in the Ethernet Network
Interface Unit. Bit reduction is performed on the MBIST legs
of all of the SRAM data output buses as described in section
8.4.1. DMO data buses from SRAMs within a cluster are
muxed together before leaving the cluster. In this way, the
DMO data bus from any cluster back to the TCU is no more
than forty bits wide. The DMO data bus from the TCU to the IO
block is forty bits wide.
The IO blocks for the debug port operate at 350MHz,
while the SRAMs in the cores and L2 operate at 1.4GHz, a
speed mismatch. To solve this problem, the TCU is equipped
with time-multiplexing logic. It takes the SRAM data in at
speed and performs sampling based on a user-programmable register. In one configuration, it may sample and hold 1.4GHz
SRAM data for four clock cycles and thereby present a data
rate of 350MHz to the IOs, within the capability of the IOs to
respond and of the tester to strobe. To get all the SRAM
data to the IOs, it is necessary to run the MBIST four times,
changing the sampling offset while keeping the same sampling
rate for each run.
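The sample-and-hold scheme described above can be modelled as strided sampling of the at-speed output stream. This is a sketch with hypothetical names. It assumes MBIST is deterministic, so each rerun reproduces the same stream, which is what lets four runs at offsets 0 to 3 together capture every SRAM cycle.

```python
from typing import List, Optional


def sample(stream: List[int], ratio: int, offset: int) -> List[int]:
    """Values the TCU presents to the IOs in one MBIST run: every `ratio`-th
    at-speed SRAM output starting at `offset`, each held for `ratio` cycles."""
    return stream[offset::ratio]


def reassemble(stream: List[int], ratio: int) -> List[Optional[int]]:
    """Rerun once per offset (same ratio) and interleave the captures.

    Assumption: MBIST is deterministic, so every rerun reproduces the same
    at-speed output stream.
    """
    full: List[Optional[int]] = [None] * len(stream)
    for offset in range(ratio):
        full[offset::ratio] = sample(stream, ratio, offset)
    return full


SRAM_MHZ, RATIO = 1400, 4
IO_MHZ = SRAM_MHZ / RATIO  # 350.0, matching the debug-port IO rate
```

With `RATIO = 4`, each run delivers a quarter of the data at 350MHz, and `reassemble` shows that the four offsets together reconstruct the full 1.4GHz stream.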
[Figure: LBIST structure. Labels: PRPG, Scan Strings 1, 2, 3 through 14, MISR.]
12.0 Summary
The SUN Microsystems Niagara2 System on a Chip is a
significant DFT challenge. With well-chosen testability guidelines in place, the team has been able to achieve greater than
98% stuck-at test coverage. Embedded SRAMs are covered
completely by at-speed MBIST equipped with a rich feature
set supporting debug, bitmapping, and failure analysis. We
have introduced a number of original DFT solutions. We have presented a hybrid
flop design that combines the design advantages of edge-triggered flip-flops with the hold-time immunity of LSSD master-slave latch designs, and the RAWWCas weak-bit
memory test. We have also presented Direct Memory
Observe, a useful combination of MBIST with direct pin
access to greatly facilitate embedded SRAM bitmapping.
Clock domain management across the chip posed a significant
challenge for debug clock control and for the reliable application
of scan test vectors crossing clock domain boundaries. LBIST
Acknowledgments
The authors would like to acknowledge SUN colleagues
Ray Heald and PJ Tan for their role in developing the RAWWCas memory test, and Paul Dickinson for his participation in
the development and productization of the Direct Memory
Observe feature.
References
[1] P. H. Bardell and W. H. McAnney, Self-Testing of Multichip
Logic Modules, Proceedings of the 1982 IEEE International Test
Conference, Nov. 1982, pp. 200-204.
[2] C. Pyron, M. Alexander, J. Golab, G. Joos, B. Long, R. Molyneaux,
R. Raina, N. Tendolkar, DFT Advances in the Motorola
MPC7400, a PowerPC G4 Microprocessor, Proceedings of the
1999 IEEE International Test Conference, Sept. 1999, pp. 141
[3] I. Robertson, G. Hetherington, T. Leslie, I. Parulkar and R. Lesnikoski, Testing High-Speed, Large Scale Implementation of SerDes
I/Os on Chips, Proceedings of the IEEE International Test Conference,
2005.
[4] IEEE Std 1149.6-2003, IEEE Standard for Boundary-Scan Testing of Advanced Digital Networks.
[5] R. Muench, T. Munns, W. C. Shields, Bitmapping the PowerPC
604 Cache Using ABIST, Teradyne User Group Conference, April
1996.
[6] A. J. van de Goor, Testing Semiconductor Memories: Theory and
Practice, John Wiley & Sons, New York, USA, 1991.