Euro Designcon 2005
Keywords: “Design For Test”, DFT, At-Speed Tests, Delay Tests, Transition Delay, Path
Delay, On chip PLL.
Author(s) Biography
Eric Haioun is a Digital Design engineer currently working on Design Verification. Eric is
a member of the Freescale DFT Methodology Development Council and was previously
DFT leader on a complex digital design for Freescale Networking & Computer System
Group. Eric has a Master's degree in Electrical Engineering from the National School of
Electronics and its Applications (E.N.S.E.A) in Cergy Pontoise, France. His main interests
are in chip design DFT, Integration, Verification and Methodology.
Colin Renfrew is a Digital Design engineer for Freescale Networking & Computing
System Group. Colin has an MSc in System Level Integration from the ISLI (University
of Edinburgh) and a BEng(Hons) in Computer and Electronic Systems from the
University of Strathclyde. His main interests are in SoC chip design, DFT and
Verification.
Robert Gach is Design Manager of the Freescale Networking & Computing System
Group in Europe. Robert has a BSc (Hons) in Electrical and Electronic Engineering. His
main interests are in SoC chip design from the Front-End definition, design and
verification to the Physical implementation.
1 Introduction
As an alternative to using a more expensive tester, the preferred solution for this project
was to generate the at-speed clock on-chip. An innovative and silicon-proven DFT
technique to do this had been designed for LSSD (Level Sensitive Scan Design, with
Latches) style, co-developed by Mentor and Freescale Semiconductor (formerly
Motorola) on the PowerPC(TM) microprocessors [1]. Since the device for this project
was designed with Scan Flip-Flops (MUX-D), the objective was to adapt the concept and
develop a suitable, generic solution that could be applied to all MUX-D style devices.
This paper will present the design of the on-chip clock generation and control logic, the
delay fault models adopted, the DFT implementation and the verification, describing the
results before tape-out and also on silicon.
The device on which we applied this technique was fabricated in a 130nm process with
eight layers of metal, incorporating over 62 Million transistors with >150k Flip-Flops and
4Mb of distributed Memory.
In order to enable full speed transition scan testing it was decided to implement an on-
chip method for generation of the high speed clock pulse for the transition launch and
capture. The internal core clock frequency for this device, 250 MHz, was already above
the normal ATE speed in terms of accurate high frequency clock generation and was well
beyond the limits of wafer probe capability. The implementation of an on-chip solution
was also deemed better suited for re-use and the migration of device test to lower cost
ATE platforms. In addition, it offered greater control of the transition pulse timing via the
PLL, which was later used as an aid for device characterization.
2.1 Requirements
1. Secure operation
o Allow simple ATE interfacing
o Easy to meet Interface Timing requirements
o Meta-stability protection for multiple clock domains and asynchronous
Inputs
o Single “At Speed” Clock Domain
2. Technology Independent
o Implementation should not be specific to a particular technology
o Independent of Standard Cell Library
o Independent of PLL features/operation
3. No custom design or Layout
o Must allow Timing Driven P+R Backend Flow & Timing Closure Flow
4. Synthesizable
o Implementation in Verilog/VHDL for synthesis
5. Flexible pulse generation
o Capable of generating 2 or 3 Transition Clock Pulses; 3 pulses are used to
increase the sequential depth available to the ATPG tools
6. Extendable
o Easily extendable to increase the number of transition clock pulses
2.2 Implementation
The Transition Pulse Generator (TPG) has been implemented in synthesizable Verilog.
For simplicity of visual description, a schematic representation will be used in this paper.
It should be noted that a standard SoC backend flow was used: synthesis; timing driven
Place & Route; clock tree insertion; back-annotation and Static Timing Analysis (STA).
No custom timing or layout techniques or in house tools were used.
The logic is by nature non-scan-able since it itself is used for running the device in scan
mode. For this reason the “core_clk” was made visible on a Primary Output Pin in a test
mode to allow operation of the circuitry to be verified albeit at a lower clock frequency.
The simplified schematic representation is as follows:-
[Figure: simplified TPG schematic. Inputs: sys_clk, scan_clk, scan_en, scan_tf_en,
pulse_2b_3_sel. Internal elements: flip-flops M1-M4 and S1-S4, logic P1-P9, nodes EN1 and
EN2, signals pulse_trigger and transition_pulses. Output: core_clk to the Clock Tree.]
Signal Descriptions
• sys_clk – Primary Input, System clock used as the reference input clock for the
PLL, continuous free running clock, synchronized to “scan_clk”
• pll_out_clk – Output reference clock from the PLL, internal core clock frequency
• scan_clk - Primary Input, Scan chain clock input source, discontinuous clock
• core_clk - Primary Input, Internal Core clock
• scan_tf_en - Primary Input, Scan mode Transition Fault enable
• scan_en - Primary Input, Scan chain shift enable, low = capture & high = shift
• pulse_2b_3_sel - Primary Input, Transition Pulse number select, low = 2 & high = 3
• pulse_trigger – sampled “scan_en” used to start pulse generation sequence
• transition_pulse – extracted 2 or 3 pulse sequence for combination with the
“scan_clk”
2.3 Operation
The following timing diagram shows the required input/output waveforms. The diagram
shows how the primary input signals, “sys_clk”, “scan_clk” and “scan_en”, are used to
transform the internal “core_clk” signal to contain both the “scan_clk” for shift and
two/three PLL clock pulses for the high speed transition capture.
[Figure: timing diagram of the required input/output waveforms. Signals shown: sys_clk,
scan_clk, scan_en, pulse_trigger, pll_ref_clk, P5, core_clk.]
The next timing diagram takes a closer look at how the pulses, in this case 2 pulses, are
extracted to form “transition_pulse” (P5). These are generated by the shift register S1-4
and the logic P1-5. M1-4 are used as meta-stability barriers since the signal
“pulse_trigger” crosses clock domains.
[Figure: timing diagrams of the pulse extraction. Signals shown: pll_out_clk,
pulse_trigger, M3 (meta-stability), M4 (meta-stability), S1, S2, S3, S4, P2,
transition_pulses (P5).]
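As a rough illustration of the extraction mechanism, the following Python sketch models a two-stage synchronizer feeding a four-stage shift register, all clocked by pll_out_clk, and derives a window that passes 2 or 3 PLL pulses. The register names follow the paper, but the gating expression and the active-high trigger polarity are assumptions made for illustration; the actual P1-P5 logic is defined by the schematic and is not reproduced here.

```python
def tpg_gate(trigger_samples, three_pulses=False):
    """Return the per-cycle pulse window for a stream of trigger samples.

    trigger_samples: value of the trigger seen at each pll_out_clk edge
    (assumed active high here). M3/M4 act as the synchronizer and
    S1..S4 as the shift register. The window expression is a
    hypothetical stand-in for the P1-P5 logic.
    """
    m3 = m4 = 0
    s = [0, 0, 0, 0]  # S1..S4
    gate = []
    for trig in trigger_samples:
        # All registers update on the same edge: next state uses old values.
        s = [m4] + s[:3]
        m3, m4 = trig, m3
        # Window opens when S1 is set and closes when S3 (or S4) is set.
        closing_stage = s[3] if three_pulses else s[2]
        gate.append(s[0] & (1 - closing_stage))
    return gate


if __name__ == "__main__":
    samples = [0, 0] + [1] * 8
    print(tpg_gate(samples))                     # two-cycle window
    print(tpg_gate(samples, three_pulses=True))  # three-cycle window
```

In this sketch the window width depends only on which shift register stage closes it, which is one way the number of pulses could be made extendable by lengthening the register.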
The device targeted 130nm technology using a commercial standard cell library with a
target operational frequency of 250MHz. This meant that we had 2ns (half a clock
period) as the upper limit for propagation delays within the pulse generation logic.
This logic path equates to 4 levels of logic and the 2ns target was easily attainable with
synthesis and timing driven Place & Route. Indeed STA showed that frequencies in
excess of 600MHz were easily attainable with a 130nm technology. Data from a 90nm
SOI process indicate that 1GHz operation would be attainable.
If custom layout techniques such as hand placement and hardening were used then
significantly higher frequencies could be attained. This has not been quantified since it is
outside the requirements defined for the implementation of this device.
The accuracy of the resulting Transition Pulses must also be quantified. The error sources
identified were:-
1) Input clock jitter
2) PLL jitter, which would be unique to the internal solution
3) Clock tree skew, which would also be present in either an internal or external
solution
For our implementation we achieved ~300ps for 2) and 3) combined, i.e. less than 10% of
the 4ns clock period. This potential pulse spacing error must be taken into account during
the cross-correlation exercise between silicon and simulation.
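The timing figures quoted in this section can be checked with a few lines of arithmetic. The sketch below, with illustrative function names that are not part of the design flow, computes the half-period budget for the pulse generation logic and the pulse spacing error as a fraction of the clock period.

```python
def half_period_ns(freq_mhz):
    """Half of the clock period in nanoseconds."""
    return 1e3 / freq_mhz / 2


def spacing_error_fraction(error_ps, freq_mhz):
    """Pulse spacing error as a fraction of the clock period."""
    period_ps = 1e6 / freq_mhz
    return error_ps / period_ps


if __name__ == "__main__":
    # 250 MHz target: 4 ns period, 2 ns budget for the pulse logic.
    print(half_period_ns(250))
    # ~300 ps of combined PLL jitter and clock tree skew at 250 MHz.
    print(f"{spacing_error_fraction(300, 250):.1%}")
```

For the 250MHz target this gives a 2ns budget and an error of 7.5% of the period, in line with the "less than 10%" figure above.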
3 “Design For Test” Delay Tests using On-Chip PLL solution
This section will cover the “Design For Test” (DFT) implementation of the Delay Tests
using the internal PLL solution described in the previous section. Before describing the
requirements and implementation, we will go over a brief description of the Delay Test
concept.
The purpose of a delay test is to verify that a chip operates correctly at the specified clock
speed. Researchers have proposed two fault models for generating test patterns that detect
delay defects. In the transition fault model, a gate output has a slow-to-rise and a
slow-to-fall fault associated with it. In the other delay fault model, called the path
delay fault model, a chip contains a path delay fault if it has a path whose delay exceeds
a specified value.
The Transition delay fault model looks for a gross delay potential at each gate terminal
(Paths are automatically selected by the ATPG tool). A faulty node is shown below in
red:
The Path Delay fault model looks for combined delay through all gates of a path
(Paths are explicitly loaded into the ATPG tool). A faulty path is shown below in red:
Below is a real example showing a resistive via defect seen on silicon, resulting in a
failing transition delay test pattern. Figure 8 shows the resistive via between two metal
layers and figure 9 shows the signal transition on both good and bad devices.
Figure 8 – FA report of resistive via between two metal layers
Figure 9 – FA report of timing diagram: slow to fall silicon defect
The schematic below (figure 10) describes how the low speed external clock
“scan_clock” and the high speed internal clock “transition_pulses” are combined into a
single “core_clock” that drives all of the scan registers on this clock domain.
[Figure 10: clock combination schematic. Blocks: PADS, PLL, TPG. Signals: sys_clock,
scan_clock, scan_enable, scan_tf_en, pulse_2b_3_sel, transition_pulses, core_clock.]
During the shift or load/unload phase (scan_enable=1) the tester drives “scan_clock”,
which is connected to the clock input of the scan cells (“core_clock”). This clock can be
at low speed, since only the scan chains are activated to load/unload values in/from the
scan cells.
During the capture phase (scan_enable=0) the TPG provides the internal clock for the
transition pulses, connected to the clock input of the scan cells (“core_clock”). This clock
is at functional speed, programmed via the PLL setup, and “scan_clock” is inactive
during this phase. In general two pulses are generated, the launch clock and the capture
clock. An option to generate three pulses is available in the TPG to increase the
sequential depth which may improve the test coverage in certain cases.
1. The ATPG tool needs to have a clock defined as a scan clock from a PAD; in this
case the tool also needs to know that the scan clock used for capture is an internal
signal.
2. The transition delay setup has to be in Broadside mode and cannot be in Last Shift
Launch mode. Broadside mode means that both the launch and capture pulses
happen during the capture phase (scan_enable=0). Last Shift Launch
mode means that the launch pulse happens during the shift phase (scan_enable=1)
and the capture pulse during the capture phase. In Broadside mode, scan_enable
timing is not critical but the coverage is usually lower, depending on the design
and the ATPG tool. On the device we are referring to in this paper, the coverage
difference was less than 2%.
3. The external port sys_clock must be free running, to keep the PLL locked.
To handle the first requirement, the ATPG tool that we selected used a “named capture
procedure” that defines two states for a scan clock, either external (driven from a PAD)
or internal (from an internal node in the design). For the second requirement, the tool was
forced to work in broadside mode with a command switch when selecting the transition
or path delay fault models. For the third requirement, we chose to constrain the PAD
sys_clock to 1 in the ATPG tool and to change the format of this signal to “Non Return to
Zero” in the test patterns, in order to create a free running clock.
Several input pins, shared in scan mode, were used to program the PLL ratio to 10:1. The
external clock was set to 25MHz to provide an internal clock at 250MHz, the targeted
functional speed of the device.
Internal mode:
• cycles 1 to 2 at low speed (25MHz): configure TPG and force all clocks (internal
and external) to their off-state
• cycles 3 to 5 at high speed (250MHz): wait cycles (wait until TPG
transition_pulses clock signal is active)
• cycles 6 and 7 at high speed (250MHz): force a pulse on the transition_pulses
internal signal
• cycles 8 to 12 at high speed (250MHz): wait cycles
• cycle 13 at low speed (25MHz): wait cycle
External mode:
• cycles 1 to 2 at low speed (25MHz): configure TPG and force all clocks (internal
and external) to their off-state
• cycles 3 to 5 at high speed (250MHz): wait cycles (wait until TPG
transition_pulses clock signal is active)
• cycles 6 and 7 at high speed (250MHz): empty cycles (the pulses are generated by
the TPG block)
• cycles 8 to 12 at high speed (250MHz): wait cycles
• cycle 13 at low speed (25MHz): wait cycle
The internal and external modes have the same duration, the only difference being that in
internal mode the design internal node transition_pulses is forced by the tool (as if it was
a PAD input of the device). When in external mode, no action is taken since the launch
and capture pulses on the internal clock are generated by the TPG block. The internal
mode is used by the ATPG tool to generate test patterns for simulation without the TPG
and PLL models. The external mode is used to generate test patterns for simulation with
the TPG and PLL models and for silicon test.
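The cycle list above can be written down as a small table and its timing derived from the 25MHz external clock and the 10:1 PLL ratio. The following Python sketch is only an illustration of that arithmetic; the table layout and function names are assumptions and do not reflect the ATPG tool's own capture procedure syntax.

```python
SLOW_MHZ = 25.0                   # external clock
PLL_RATIO = 10                    # programmed PLL ratio
FAST_MHZ = SLOW_MHZ * PLL_RATIO   # 250 MHz internal clock

# (first cycle, last cycle, frequency in MHz, purpose)
CAPTURE_PROCEDURE = [
    (1, 2, SLOW_MHZ, "configure TPG, clocks in off-state"),
    (3, 5, FAST_MHZ, "wait for transition_pulses to become active"),
    (6, 7, FAST_MHZ, "launch and capture pulses"),
    (8, 12, FAST_MHZ, "wait"),
    (13, 13, SLOW_MHZ, "wait"),
]


def cycle_start_ns(cycle):
    """Time at which the given cycle starts, from the start of the procedure."""
    t = 0.0
    for first, last, mhz, _ in CAPTURE_PROCEDURE:
        period_ns = 1e3 / mhz
        for c in range(first, last + 1):
            if c == cycle:
                return t
            t += period_ns
    raise ValueError(f"cycle {cycle} is not in the procedure")


def total_duration_ns():
    """Total length of the procedure; identical for both modes."""
    return sum((last - first + 1) * 1e3 / mhz
               for first, last, mhz, _ in CAPTURE_PROCEDURE)


if __name__ == "__main__":
    print(cycle_start_ns(6), cycle_start_ns(7), total_duration_ns())
```

Under these assumptions the whole procedure lasts 160ns, which equals four 25MHz cycles, and cycles 6 and 7 start at 92ns and 96ns.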
The ATPG process produced the following number of patterns and resulting
coverage:-
An ATPG pattern consists of the load/unload and capture phases, which corresponds to
388 clock cycles.
4 Pattern and Transition Pulse Generator Verification
Before applying the at-speed ATPG test patterns to real silicon, it was important that they
were verified on the design in simulation. In addition to this, for this specific
implementation and use of the PLL, it was imperative that the functionality of the on-chip
logic that supports the at-speed test capabilities was verified before tape-out of the
design, since any problems would be much more difficult and time-consuming to debug
on silicon. Through simulation of the generated test patterns and analysis of the results,
the TPG logic and the ATPG patterns were verified together. The ATPG patterns were
run through a standard Verilog simulator and also through a Virtual Tester before being
applied to the silicon. The methodology behind the ATPG pattern and TPG logic
verification, the results from running the ATPG patterns through simulation and the
results on silicon are described below.
In order to verify that the at-speed ATPG patterns have been generated correctly they
must be fully simulated on a version of the design as close as possible to the real silicon
and the results cross-checked with those expected by the ATPG tool. The chart below
shows the ATPG pattern verification flow used on this design.
[Figure: ATPG pattern verification flow. Elements: PLL Model, Memory Models, Simulation
Tool, Logfiles, Waveforms, Analysis (PASS/FAIL), Final ATPG Patterns.]
The ATPG patterns were provided to the simulator as stimulus along with the appropriate
SDF (Standard Delay Format) timing files and Gate Level Netlist. For testing the patterns
at-speed, it was essential to use SDF back-annotated timing on a post-layout version of
the design in order to have an image of the device as close as possible to real silicon. The
simulations were run with best and worst case timing in order to verify the extreme
corners of the device specifications and process.
Behavioral models with timing characteristics were used for the memories and the PLL.
The model for the PLL accurately estimated the real delay in waiting for the PLL to lock
to the configured clock speed (approximately 1µs). For the purposes of verification and
debug, this resulted in an undesired delay in the simulation before any of the actual
results could be seen, and so the lock time was reduced to zero inside the model without
affecting the functionality of the PLL itself.
Following the generation of the ATPG patterns by the ATPG tool, they are then validated
by running them on the design through simulation. The number of potential device faults,
and therefore the number of patterns required to test a device, increases with the number
of transistors. This particular device resulted in many thousands of patterns being
generated for each pattern type (stuck-at, transition delay and path delay). This, combined
with the long simulation time, meant that it was impossible to simulate and validate all of
them serially. Instead, the majority of the patterns were simulated in “parallel” mode and
only the first 10 patterns of each set were simulated in “serial” mode.
Parallel mode cannot be applied to the silicon; it is for simulation purposes only. The
patterns are loaded directly into the scan chains, omitting the time-consuming shift phase
of the test. In this particular design, the maximum scan chain
length was 384 flip-flops and the capture phase was 4 cycles long giving a total of 388
cycles per pattern. Simulating the patterns in parallel was therefore approximately 97
times faster than in serial mode. Though the patterns are simulated much faster in parallel
mode, potential problems with the shift phases (both in and out) of the patterns cannot be
observed and so some patterns must be run in serial mode. The first 10 patterns of each
pattern set were run serially to catch any issues within the scan chains that would result in
other patterns failing.
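The speed-up quoted above follows directly from the cycle counts. A minimal Python check, with illustrative function names, is:

```python
def serial_cycles(chain_length, capture_cycles):
    """Cycles per pattern when the scan chains are shifted serially."""
    return chain_length + capture_cycles


def parallel_speedup(chain_length, capture_cycles):
    """Ratio of serial to parallel cycles, assuming parallel mode
    only spends time on the capture phase."""
    return serial_cycles(chain_length, capture_cycles) / capture_cycles


if __name__ == "__main__":
    print(serial_cycles(384, 4))     # cycles per pattern in serial mode
    print(parallel_speedup(384, 4))  # approximate simulation speed-up
```

This treats the parallel load as taking no simulation cycles, which is a simplification; the measured wall-clock gain may differ.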
For at-speed testing of a device, two sets of patterns are normally generated: path delay
and transition delay. For this particular device, the memories were put into a “bypass
mode” for scan in order to increase the test coverage of the device. However, this bypass
mode resulted in the scan paths through the logic around the memories being twice the
length of a normal path between two flip-flops because the scan chains had to be routed
around the actual memories themselves. In order to test this logic the options were
therefore either to declare the paths around the memories as multi-cycle to the ATPG tool
and let it generate patterns accordingly, or to declare the paths normally but run the
patterns at half of the operating speed, in this case 125MHz. Using multi-cycle paths for
ATPG testing can be problematic, especially when they are not real functional paths, and
can result in extremely long pattern generation and simulation time. A separate set of
transition delay patterns was therefore generated for the logic around the memories and
run at 125MHz.
The results of the simulation were produced in waveform format for debug and in a log
file format for comparison against the expected values. From the log files it could be
analysed where in the device any errors occurred by tracing the first failure of the pattern
along the chain to the flip-flop that captured it. Analysis of the logic between the flip-
flops would reveal what was causing the failure.
For any at-speed patterns the capture phase is the most important part of the test because
it is where the actual speed of the design is proven. For this particular device, the capture
phase was also important in verifying the correct functionality of the TPG, since it is at
this point in the test where the two pulses generated from the PLL clock are used. Within
the capture phase, there is a specific window where the correct number of capture clock
pulses must occur and the time between these two pulses must also be correct in order to
properly test the speed of the design. The waveforms generated by the simulation tool
were therefore used to carefully analyse the capture phase of the test and to assist in
debugging any failing patterns.
As described above, the patterns were run through simulation in both serial and parallel
modes. The results from the parallel simulations were used to solve general ATPG testing
issues such as pattern generation difficulties, problems with the tools and in obtaining the
test coverage values for the design. The serial patterns, representing the way that the
patterns would run on real silicon, were examined more closely in order to verify both the
ATPG patterns and the TPG logic.
The TPG logic was tested simultaneously with the 250MHz transition delay patterns.
From the design of the TPG (Figure 2), the theoretical time for changing the core clock
speed from the scan clock frequency (directly from the pin) to the capture clock
frequency derived by the PLL was two scan-speed clock cycles and four capture-speed
cycles. Once the core clock has been switched to the capture clock speed, the two capture
clock pulses should occur within the expected timing window. The ATPG tool must be told
the precise cycles in which these two capture pulses are expected. This was done through
the capture procedure described previously, where the capture phase consisted of 3
scan-speed equivalent cycles and 7 capture-speed cycles, with the capture clocks
occurring during cycles 6 and 7. In simulation, on the post-layout design with SDF timing,
the first capture clock occurred 96 nanoseconds after entering the capture phase of the
test (two 40ns scan-speed cycles plus four 4ns capture-speed cycles), corresponding to the
characteristics of the design declared to the ATPG tool. The second capture pulse occurred
at 100ns, one 4ns period later, proving the distance between the two pulses to be correct.
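The expected pulse times can be reproduced from the switch-over latency stated above. The sketch below is a simple calculation under the assumption that the latency is exactly two scan-speed cycles plus four capture-speed cycles; the function name and default arguments are illustrative.

```python
def capture_pulse_times_ns(scan_mhz, capture_mhz,
                           scan_cycles=2, capture_cycles=4, pulses=2):
    """Expected times of the capture pulses after entering the capture phase.

    Assumes a switch-over latency of `scan_cycles` scan-speed cycles plus
    `capture_cycles` capture-speed cycles, then one pulse per capture period.
    """
    scan_period = 1e3 / scan_mhz
    capture_period = 1e3 / capture_mhz
    first = scan_cycles * scan_period + capture_cycles * capture_period
    return [first + i * capture_period for i in range(pulses)]


if __name__ == "__main__":
    # 25 MHz scan clock, 250 MHz capture clock, two pulses.
    print(capture_pulse_times_ns(25, 250))
```

With a 25MHz scan clock and a 250MHz capture clock this gives 96ns and 100ns, matching the simulated values.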
For the patterns around the memories a separate timing script for the ATPG tool was
created and the TPG had to generate the capture clock pulses at a speed of 125MHz
correspondingly. The clock speed of 125MHz for the logic around the memories was
based on a theoretical maximum of the length of the paths generated by putting the
memories in bypass mode. In reality, the path was slightly shorter and in simulation the
first fails on these paths were actually at a frequency of 173 MHz. Since the specification
for these paths however was 125MHz, this was the initial speed maintained for running
the tests on silicon.
The bypass mode of the memories and the resulting restrictions for testing the
surrounding logic at-speed were initially unknown. They were discovered during
debug of a selection of patterns that failed at 250MHz. During this debug process, the
capture clock speed was varied over a range from 25MHz to 300MHz. In all cases, the
generation of both of the capture clock pulses at the desired time by the TPG was verified
to be correct.
Between the generation of the test pattern set by the ATPG tool and application of the
patterns on silicon, the patterns go through several conversion tools and adjustment
steps in order to be compliant with the exact tester in use. To verify that no errors were
introduced into the patterns during these steps, an in-house Virtual Tester tool was used
to verify the final patterns in tester format before applying them to the actual silicon.
Comparison of the simulation log files showed that the results matched those from the ATPG
tool, proving that the conversion process was successful.
The ATPG patterns were applied to the “first silicon” version of the device, running them
on devices through probe and also through final test directly on “blind assembled” parts.
In testing each pattern set on the devices in final test, the initial results were closely
analysed through waveforms generated by the tester software to ensure that the capture
clock pulses were being correctly generated by the TPG at the correct speed and time.
The patterns were run across a range of speeds for device characterisation and for each
the performance of the logic was verified to be correct, corresponding to the results of the
simulation.
During simulation of the 125MHz patterns, the first fails started to appear at speeds of
173MHz. On the tester, the speed of the capture clock was incremented from 125MHz by
steps of 8.25MHz to find the failing speed on real silicon. The first real fails on silicon
appeared at speeds of around 150MHz for low Vcc and high temperature (worst case
conditions), 181MHz for normal Vcc and high temperature and 199 MHz for high Vcc
and high temperature.
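The stepping procedure can be sketched as a simple sweep. In the Python example below, the pass/fail callback and the 145MHz limit are hypothetical stand-ins for running a pattern set on the tester; only the 125MHz start point and the 8.25MHz step come from the text.

```python
def first_failing_frequency(passes_at, start_mhz=125.0, step_mhz=8.25,
                            limit_mhz=400.0):
    """Step the capture clock upward and return the first failing frequency.

    passes_at: callable taking a frequency in MHz and returning True if the
    pattern set passes (hypothetical stand-in for a tester run).
    Returns None if no failure is seen up to limit_mhz.
    """
    step = 0
    while True:
        freq = start_mhz + step * step_mhz
        if freq > limit_mhz:
            return None
        if not passes_at(freq):
            return freq
        step += 1


if __name__ == "__main__":
    # Hypothetical device that passes up to 145 MHz.
    print(first_failing_frequency(lambda f: f <= 145.0))
```

Because the sweep is quantized, the reported first-failure speed overestimates the true limit by up to one step.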
[Figure 12: first-failure frequency FMAX (MHz, axis 0 to 250) per split: HHH_w19,
HNN_w18, NNN_w9, LNN_w21, LNN_w5, NNL_w10, NLL_w25, LLL_w6.]
Figure 12 shows an example of the results obtained on first silicon across a range of
blind-assembled devices through final test under worst case conditions, showing at what
speed the first failure occurred for each transition delay pattern sub-set on each device.
Taking the average of the results across the split shows that the first failure speed of
173MHz found through simulation using back-annotated SDF timing was reasonably
accurate in estimating the target frequency of the device.
For the 250MHz transition delay patterns, the first failures on the device were seen at
speeds of 248MHz (see Figure 13), 274MHz and 296MHz at high temperature for low,
average and high Vcc respectively.
[Figure 13: Speed 250MHz Transition Delay. FMAX (MHz, axis 0 to 400) per split: HHH_w19,
HNN_w18, NNN_w9, LNN_w21, LNN_w5, NNL_w10, NLL_w25, LLL_w6. Series: pattern sub-sets
tr_fs_0_199 to tr_fs_2400_end and tr_tk_0_499 to tr_tk_2000_end, each suffixed
_X_250mhz.]
The results for the path delay patterns were similar to those of the 250MHz patterns, with
the first failures occurring on the device at speeds of 271MHz, 297MHz and 317MHz
under the same test conditions as the transition delay patterns.
Since the path delay patterns exercise, theoretically, the most timing critical paths in the
design, the frequency at which the first failure occurs should always be lower than that of
the transition delay patterns. As can be observed from the initial results, this was not the
case for this device and there was a slight inconsistency between the results of the
250MHz transition delay patterns and the path delay patterns. After extensive analysis of
the failing patterns through failure analysis tools, the cause of the discrepancy was found
to be the way that the ATPG tool had generated the transition delay test patterns. In the
design, the tool had found a logical path in scan mode through the JTAG logic in order to
test a handful of potential faults that would otherwise be un-testable. For stuck-at patterns
this was not a problem. However, the path that it created in scan mode through the logic
was longer than 4ns, causing the patterns to fail at-speed. Since the path was not used in
functional mode, the faults covered by the path were removed from the transition delay
pattern set, after which the results corresponded to those of the path delay patterns.
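The inconsistency described above can be expressed as a simple check on the first-failure speeds reported in this section (all at high temperature, before the JTAG-path faults were removed). The data layout and function name below are illustrative only.

```python
# First-failure speeds in MHz at high temperature, from the initial results.
FIRST_FAIL_MHZ = {
    "low Vcc": {"transition": 248, "path": 271},
    "average Vcc": {"transition": 274, "path": 297},
    "high Vcc": {"transition": 296, "path": 317},
}


def inconsistent_corners(results):
    """Corners where the path delay patterns fail at a higher speed than
    the transition delay patterns, contrary to expectation."""
    return [corner for corner, r in results.items()
            if r["path"] > r["transition"]]


if __name__ == "__main__":
    print(inconsistent_corners(FIRST_FAIL_MHZ))
```

All three corners are flagged, which is the pattern that prompted the failure analysis of the transition delay set.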
5 Conclusion
The Transition Pulse Generator implementation provided a simple, safe and easy to use
method of on-chip pulse generation for use in scan Transition and Path Delay tests. It
meets the requirements for implementation in any standard SoC design flow, is
extendable and can be easily migrated to other technologies. Despite the learning curve
associated with the ATPG tool, the TPG logic and PLL setup, this method was easy to
adopt from a DFT perspective. The ATPG At-Speed patterns and the related TPG logic
were both verified through simulation. The patterns were run through a standard
simulation tool and also through a Virtual Tester, both using back-annotated SDF timing,
to ensure the correctness of the patterns before applying them to the tester. The results on
silicon showed a strong correlation with the results from simulation in terms of device
timing and confirmed the correct functionality of the TPG.
The benefits of using this technique were that, with very small logic overhead, production
test costs were significantly lowered whilst increasing the testability of the device for
delay faults. The only drawback was the learning curve for TPG and PLL control in order
to generate the test patterns. This solution can be applied to any MUX-D scan-able device
that needs to be tested at a frequency higher than that which the targeted test platform can
accurately provide.
With higher transistor densities and more complex fabrication techniques of upcoming
devices, ATPG tools will need to be more effective at finding ways to cover as many
potential faults as possible. Where the design described in this paper had test coverage
targets of 98% SA and 70% transition delay, future devices (in particular very deep sub-
micron) will need to have at-speed targets closer to 100%. This, together with the
development and adoption of lower-cost ATE platforms, suggests that an on-chip
solution for generation and control of a clock for at-speed testing will become essential.
6 References