
Euro DesignCon 2005

At-Speed Scan Transition and Path Delay Testing Using On-chip PLL for High Frequency Device and Low Frequency Tester

Eric Haioun, Freescale
[email protected], +33 (0) 561 199 483

Colin D. Renfrew, Freescale
[email protected], +33 (0) 561 191 101

Robert Gach, Freescale
[email protected], +33 (0) 561 199 412

Abstract
In this paper we present a Design-For-Test (DFT) technique implemented on a high
speed VLSI device that allows us to use a low speed/cost tester to perform at-speed scan
transition and path delay testing.

The concept is to:-

• Use an internal PLL controller so that a slow tester clock can be used during the scan
shift phase and the internal PLL, running at the functional speed of the device, can be used
during the capture phase.
• Adopt an ATPG tool that allows definition of an external tester driven clock
during the shift phase and an internal clock driven by the PLL controller during the
capture procedure.
• Verify the Test Patterns and the PLL controller design by simulation prior to tape-out
and validate the results on silicon.

Keywords: “Design For Test”, DFT, At-Speed Tests, Delay Tests, Transition Delay, Path
Delay, On chip PLL.

Author(s) Biography

Eric Haioun is a Digital Design engineer currently working on Design Verification. Eric is
a member of the Freescale DFT Methodology Development Council and was previously
DFT leader on a complex digital design for Freescale Networking & Computer System
Group. Eric has a Master's degree in Electrical Engineering from the National School of
Electronics and its Applications (E.N.S.E.A) in Cergy Pontoise, France. His main interest
is in chip design DFT, Integration, Verification and Methodology.

Colin Renfrew is a Digital Design engineer for Freescale Networking & Computing
System Group. Colin has an MSc in System Level Integration from the ISLI (University
of Edinburgh) and a BEng(Hons) in Computer and Electronic Systems from the
University of Strathclyde. His main interests are in SoC chip design, DFT and
Verification.

Robert Gach is Design Manager of the Freescale Networking & Computing System
Group in Europe. Robert has a BSc (Hons) of Electrical and Electronic Engineering. His
main interests are in SoC chip design from the Front-End definition, design and
verification to the Physical implementation.
1 Introduction

At-speed testing is becoming progressively more important as the technology of devices
gets smaller. In parallel, the increasing complexity of modern designs requires a move
from functional testing towards structural testing using advanced “Design For Test”
(DFT) techniques. As device frequencies become higher, the ability to test the device
at-speed becomes limited by the capabilities of the production test equipment.

As an alternative to using a more expensive tester, the preferred solution for this project
was to generate the at-speed clock on-chip. An innovative and silicon-proven DFT
technique to do this had been designed for LSSD (Level Sensitive Scan Design, with
Latches) style, co-developed by Mentor and Freescale Semiconductor (formerly
Motorola) on the PowerPC(TM) microprocessors [1]. Since the device for this project
was designed with Scan Flip-Flops (MUX-D), the objective was to adapt the concept and
develop a suitable, generic solution that could be applied to all MUX-D style devices.

This paper will present the design of the on-chip clock generation and control logic, the
delay fault models adopted, the DFT implementation and the verification, describing the
results before tape-out and also on silicon.
The device on which we applied this technique was fabricated in a 130nm process with
eight layers of metal, incorporating over 62 Million transistors with >150k Flip-Flops and
4Mb of distributed Memory.

Figure 1 – The Device Under Test


2 Transition Pulse Generator Design

In order to enable full speed transition scan testing it was decided to implement an
on-chip method for generation of the high speed clock pulses for the transition launch and
capture. The internal core clock frequency for this device, 250 MHz, was already above
the normal ATE speed in terms of accurate high frequency clock generation and was well
beyond the limits of wafer probe capability. The implementation of an on-chip solution
was also deemed better suited for re-use and the migration of device test to lower cost
ATE platforms. In addition, it offered greater control of the transition pulse timing via the
PLL, which was later used as an aid for device characterization.

2.1 Requirements

Before implementation started, we determined specific requirements for the solution to
ensure that the resulting functionality was easy to use and presented no integration issues
to our SoC design flow. These requirements were as follows:-

1. Secure operation
o Allow simple ATE interfacing
o Easy to meet Interface Timing requirements
o Meta-stability protection for multiple clock domains and asynchronous
Inputs
o Single “At Speed” Clock Domain
2. Technology Independent
o Implementation should not be specific to a particular technology
o Independent of Standard Cell Library
o Independent of PLL features/operation
3. No custom design or Layout
o Must allow Timing Driven P+R Backend Flow & Timing Closure Flow
4. Synthesizable
o Implementation in Verilog/VHDL for synthesis
5. Flexible pulse generation
o Capable of generating 2 or 3 Transition Clock Pulses; 3 pulses are used to
increase the sequential depth available to the ATPG tools
6. Extendable
o Easily extendable to increase the number of transition clock pulses

2.2 Implementation

The Transition Pulse Generator (TPG) has been implemented in synthesizable Verilog.
For simplicity of visual description, a schematic representation will be used in this paper.
It should be noted that a standard SoC backend flow was used: synthesis; timing driven
Place & Route; clock tree insertion; back-annotation and Static Timing Analysis (STA).
No custom timing or layout techniques or in house tools were used.
The logic is by nature non-scan-able since it itself is used for running the device in scan
mode. For this reason the “core_clk” was made visible on a Primary Output Pin in a test
mode to allow operation of the circuitry to be verified albeit at a lower clock frequency.
The simplified schematic representation is as follows:-

(Schematic not reproduced. Labelled signals: scan_clk, sys_clk, pll_out_clk, scan_en, scan_tf_en, pulse_2b_3_sel, pulse_trigger, transition_pulses and core_clk, which drives the Clock Tree. Labelled cells: enables EN1 and EN2, flip-flops M1-M4 and S1-S4, and logic gates P1-P9.)

Figure 2 – Transition Pulse Generator Schematic

Signal Descriptions

• sys_clk – Primary Input, System clock used as the reference input clock for the
PLL, continuous free running clock, synchronized to “scan_clk”
• pll_out_clk – Output reference clock from the PLL, internal core clock frequency
• scan_clk - Primary Input, Scan chain clock input source, discontinuous clock
• core_clk - Primary Input, Internal Core clock
• scan_tf_en - Primary Input, Scan mode Transition Fault enable
• scan_en - Primary Input, Scan chain shift enable, low = capture & high = shift
• pulse_2b_3_sel - Primary Input, Transition Pulse number select, low =2 & high=3
• pulse_trigger – sampled “scan_en” used to start pulse generation sequence
• transition_pulse – extracted 2 or 3 pulse sequence for combination with the
“scan_clk”

2.3 Operation

The following timing diagram shows the required input/output waveforms. The diagram
shows how the primary input signals, “sys_clk”, “scan_clk” and “scan_en”, are used to
transform the internal “core_clk” signal to contain both the “scan_clk” for shift and
two/three PLL clock pulses for the high speed transition capture.

(Timing diagram not reproduced. Signals shown: sys_clk, scan_clk, scan_en, pulse_trigger, pll_ref_clk, P5 and core_clk.)

Figure 3 – Transition Pulse Generator Timing Diagram 1

The next timing diagram takes a closer look at how the pulses are extracted, in this case 2
pulses, “transition_pulse” (P5). This is generated by the shift register S1-4 and the logic
P1-5. M1-4 are used as meta-stability barriers since the signal “pulse_trigger” is
crossing clock domains.

(Timing diagram not reproduced. Signals shown: pll_out_clk, pulse_trigger, M3 (meta-stability), M4 (meta-stability), S1, S2, S3, S4, P2 and transition_pulses (P5).)

Figure 4 – Transition Pulse Generator Timing Diagram 2


All that remains is to combine the “scan_clk” and “transition_pulse” signals using a
simple multiplexer, since the timing/state of the signals are known and stable. This
multiplexer is described by P6-P9. Finally, the resulting hybrid clock is used as the root
for the chip wide clock tree for the “core_clk”. For three transition pulses the waveform
would be as follows:-

(Timing diagram not reproduced. Signals shown: pll_out_clk, pulse trigger, M3 (meta-stability), M4 (meta-stability), S1, S2, S3, S4, P2 and transition_pulses (P5).)

Figure 5 – Transition Pulse Generator Timing Diagram 3
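
The pulse extraction described in this section can be illustrated with a small cycle-based model. This is only a sketch in Python, not the synthesizable Verilog used on the device. The exact gating performed by P1-P5 is not given in the text, so the model assumes that the window opens when S1 goes high and closes when S3 (two pulses) or S4 (three pulses) goes high. It also assumes that “pulse_trigger” goes high at the start of the capture phase. The function name is hypothetical.

```python
def transition_pulse_window(trigger, three_pulses=False):
    """Return the clock-gating window, one entry per pll_out_clk rising edge.

    trigger: sequence of 0/1 samples of pulse_trigger, one per clock edge.
    A 1 in the result means one pll_out_clk pulse is passed to the clock tree.
    """
    m3 = m4 = s1 = s2 = s3 = s4 = 0
    window = []
    for t in trigger:
        # All flip-flops update on the same edge: the right-hand side is
        # evaluated from the old state before any value is overwritten.
        m3, m4, s1, s2, s3, s4 = t, m3, m4, s1, s2, s3
        # ASSUMED gating: open on S1, close on S3 (2 pulses) or S4 (3 pulses).
        stop = s4 if three_pulses else s3
        window.append(s1 & (1 - stop))
    return window


trigger = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(transition_pulse_window(trigger))                     # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(transition_pulse_window(trigger, three_pulses=True))  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

In this model the two flip-flops M3 and M4 only delay the trigger by two clock edges, which is the cost of the meta-stability barrier; the number of pulses is set purely by which shift register stage closes the window.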

2.4 Technology Considerations

The device targeted 130nm technology using a commercial standard cell library with a
target operational frequency of 250MHz. This meant that we had 2ns (half a clock
period) as the upper limit for propagation delays within the pulse generation logic.

The critical logic path was:-

“pll_out_clk” -> S1,2,3 or4 -> P2 or 3 -> P4 -> P5-> “transition_pulse”

This logic path equates to 4 levels of logic and the 2ns target was easily attainable with
synthesis and timing driven Place & Route. Indeed STA showed that frequencies in
excess of 600MHz were easily attainable with a 130nm technology. Data from a 90nm
SOI process indicate that 1GHz operation would be attainable.
If custom layout techniques such as hand placement and hardening were used then
significantly higher frequencies could be attained. This has not been quantified since it is
outside the requirements defined for the implementation of this device.

The accuracy of the resulting Transition Pulses must also be quantified. The error sources
identified were:-
1) Input clock jitter
2) PLL jitter, which would be unique to the internal solution
3) Clock tree skew, which would be present in either an internal or an external
solution

For our implementation we achieved ~300ps for 2) and 3) combined, i.e. less than 10% of
the 4ns clock period. This potential pulse spacing error must be taken into account during
the cross-correlation exercise between silicon and simulation.
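
The timing figures quoted in this section follow directly from the clock frequencies. The short Python sketch below reproduces the arithmetic; the variable and function names are chosen here for illustration only.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


core_period_ns = period_ns(250)        # 4.0 ns at the 250MHz target frequency
half_period_ns = core_period_ns / 2    # 2.0 ns budget for the pulse generation logic

pulse_error_ns = 0.3                   # ~300ps for PLL jitter plus clock tree skew
error_fraction = pulse_error_ns / core_period_ns

print(core_period_ns, half_period_ns)            # 4.0 2.0
print(f"{error_fraction:.1%} of the clock period")  # 7.5% of the clock period
```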
3 “Design For Test” Delay Tests using On-Chip PLL solution

This section will cover the “Design For Test” (DFT) implementation of the Delay Tests
using the internal PLL solution described in the previous section. Before describing the
requirements and implementation, we will go over a brief description of the Delay Test
concept.

The purpose of a delay test is to verify that a chip operates correctly at the specified clock
speed. Researchers have proposed two types of fault models for generating
test patterns that detect delay defects. In the transition fault model, a gate output has a
slow-to-rise and a slow-to-fall fault associated with it. In the other delay fault model,
called the path delay fault model, a chip contains a path delay fault if it has a path whose
delay exceeds a specified value.

The Transition delay fault model looks for a gross delay potential at each gate terminal
(Paths are automatically selected by the ATPG tool). A faulty node is shown below in
red:

Figure 6 – Transition Delay fault model

The Path Delay fault model looks for combined delay through all gates of a path
(Paths are explicitly loaded into the ATPG tool). A faulty path is shown below in red:

Figure 7 – Path Delay fault model
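
The difference between the two fault models can be made concrete with a small Python sketch. It is illustrative only and does not represent how an ATPG tool stores faults: the class, the function names, the node names and the gate delays are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionFault:
    node: str   # gate output the fault is attached to
    kind: str   # "slow-to-rise" or "slow-to-fall"


def transition_faults(gate_outputs):
    """Transition model: two faults per gate output."""
    kinds = ("slow-to-rise", "slow-to-fall")
    return [TransitionFault(node, kind) for node in gate_outputs for kind in kinds]


def has_path_delay_fault(gate_delays_ns, limit_ns):
    """Path delay model: the summed delay along one path exceeds the limit."""
    return sum(gate_delays_ns) > limit_ns


faults = transition_faults(["u1/z", "u2/z"])            # hypothetical node names
print(len(faults))                                      # 4
print(has_path_delay_fault([1.2, 1.5, 1.6], 4.0))       # True
print(has_path_delay_fault([1.0, 1.0, 1.0], 4.0))       # False
```

The transition model scales with the number of gate outputs and needs no path list, whereas the path delay model needs each path to be named explicitly, which matches the way the two are loaded into the ATPG tool.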

Below is a real example showing a resistive via defect seen on silicon, resulting in a
failing transition delay test pattern. Figure 8 shows the resistive via between two metal
layers and Figure 9 shows the signal transition on both good and bad devices.

Figure 8 – FA report of resistive via between two metal layers

Figure 9 – FA report of timing diagram: slow to fall silicon defect

3.1 Using the On-Chip PLL, principle of operation

The schematic below (figure 10) describes how the low speed external clock
“scan_clock” and the high speed internal clock “transition_pulses” are combined into a
single “core_clock” that drives all of the scan registers on this clock domain.

(Schematic not reproduced. Labelled elements: PADS, PLL and TPG blocks; signals scan_clock, sys_clock, scan_tf_en, pulse_2b_3_sel, scan_enable, transition_pulses and core_clock.)

Figure 10 – Schematics of the clock connections

During the shift or load/unload phase (scan_enable=1) the tester drives “scan_clock”,
which is connected to the clock input of the scan cells (“core_clock”). This clock can be
at low speed, since only the scan chains are activated to load/unload values in/from the
scan cells.

During the capture phase (scan_enable=0) the TPG provides the internal clock for the
transition pulses, connected to the clock input of the scan cells (“core_clock”). This clock
is at functional speed, programmed via the PLL setup, and “scan_clock” is inactive
during this phase. In general two pulses are generated, the launch clock and the capture
clock. An option to generate three pulses is available in the TPG to increase the
sequential depth which may improve the test coverage in certain cases.

3.2 DFT Requirements

1. The ATPG tool normally needs a scan clock that is defined from a PAD; in this
case the tool also needs to know that the scan clock used for capture is an internal
signal.
2. The transition delay setup has to be in Broadside mode and cannot be in Last Shift
Launch mode. Broadside mode means that both the launch and capture pulses
need to happen during the capture phase (scan_enable=0). Last Shift Launch
mode means that the launch pulse happens during the shift phase (scan_enable=1)
and the capture pulse during the capture phase. In Broadside mode, scan_enable
timing is not critical but the coverage is usually lower, depending on the design
and the ATPG tool. On the device we are referring to in this paper, the coverage
difference was less than 2%.
3. The external port sys_clock must be free running, to keep the PLL locked.

3.3 DFT Implementation

To handle the first requirement, the ATPG tool that we selected used a “named capture
procedure” that defines two states for a scan clock, either external (driven from a PAD)
or internal (from an internal node in the design). For the second requirement, the tool was
forced to work in broadside mode with a command switch when selecting the transition
or path delay fault models. For the third requirement, we chose to constrain the PAD
sys_clock to 1 in the ATPG tool and to change the format of this signal to “Non Return to
Zero” in the test patterns, in order to create a free running clock.

Several input pins, shared in scan mode, were used to program the PLL ratio to 10:1. The
external clock was set to 25MHz to provide an internal clock at 250MHz, the targeted
functional speed of the device.

Named Capture Procedure used for this device:

Internal mode:
• cycles 1 to 2 at low speed (25MHz): configure TPG and force all clocks (internal
and external) to their off-state
• cycles 3 to 5 at high speed (250MHz): wait cycles (wait until TPG
transition_pulses clock signal is active)
• cycles 6 and 7 at high speed (250MHz): force a pulse on the internal
transition_pulses signal
• cycles 8 to 12 at high speed (250MHz): wait cycles
• cycle 13 at low speed (25MHz): wait cycle

External mode:
• cycles 1 to 2 at low speed (25MHz): configure TPG and force all clocks (internal
and external) to their off-state
• cycles 3 to 5 at high speed (250MHz): wait cycles (wait until TPG
transition_pulses clock signal is active)
• cycles 6 and 7 at high speed (250MHz): empty cycles (the pulses are generated by
the TPG block)
• cycles 8 to 12 at high speed (250MHz): wait cycles
• cycle 13 at low speed (25MHz): wait cycle

The internal and external modes have the same duration, the only difference being that in
internal mode the design internal node transition_pulses is forced by the tool (as if it was
a PAD input of the device). When in external mode, no action is taken since the launch
and capture pulses on the internal clock are generated by the TPG block. The internal
mode is used by the ATPG tool to generate test patterns for simulation without the TPG
and PLL models. The external mode is used to generate test patterns for simulation with
the TPG and PLL models and for silicon test.
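
The two variants of the capture procedure listed above can be written down as data to show that they differ only in cycles 6 and 7 and have the same duration. The Python sketch below is an illustration of the cycle list only. It is not the syntax of any ATPG tool, and the function names and action strings are hypothetical.

```python
LOW_SPEED_NS = 1000 / 25     # 40 ns cycle at 25MHz
HIGH_SPEED_NS = 1000 / 250   # 4 ns cycle at 250MHz


def capture_procedure(mode):
    """Return (cycle, period_ns, action) tuples for 'internal' or 'external' mode."""
    if mode == "internal":
        pulse_action = "force pulse on transition_pulses"
    elif mode == "external":
        pulse_action = "empty cycle (pulses come from the TPG)"
    else:
        raise ValueError(f"unknown mode: {mode}")

    steps = []
    for cycle in range(1, 14):
        if cycle <= 2:
            steps.append((cycle, LOW_SPEED_NS, "configure TPG, all clocks off"))
        elif 6 <= cycle <= 7:
            steps.append((cycle, HIGH_SPEED_NS, pulse_action))
        elif cycle <= 12:
            steps.append((cycle, HIGH_SPEED_NS, "wait"))
        else:
            steps.append((cycle, LOW_SPEED_NS, "wait"))
    return steps


def duration_ns(steps):
    """Total duration of the procedure in nanoseconds."""
    return sum(period for _, period, _ in steps)


internal = capture_procedure("internal")
external = capture_procedure("external")
print(duration_ns(internal), duration_ns(external))   # 160.0 160.0
```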

The ATPG process generated the following numbers of patterns, with the resulting
coverage:-

• 4000 to obtain 98% coverage for StuckAt fault model
• 2200 to obtain 70% coverage for Transition Delay fault model
• 320 to obtain 2569 paths tested among the most critical paths for Path Delay fault
model

An ATPG pattern consists of the load/unload and capture phases, which corresponds to
388 clock cycles.
4 Pattern and Transition Pulse Generator Verification

Before applying the at-speed ATPG test patterns to real silicon, it was important that they
were verified on the design in simulation. In addition to this, for this specific
implementation and use of the PLL, it was imperative that the functionality of the on-chip
logic that supports the at-speed test capabilities was verified before tape-out of the
design, since any problems would be much more difficult and time-consuming to debug
on silicon. Through simulation of the generated test patterns and analysis of the results,
the TPG logic and the ATPG patterns were verified together. The ATPG patterns were
run through a standard Verilog simulator and also through a Virtual Tester before being
applied to the silicon. The methodology behind the ATPG pattern and TPG logic
verification, the results from running the ATPG patterns through simulation and the
results on silicon are described below.

4.1 Verification Methodology

In order to verify that the at-speed ATPG patterns have been generated correctly they
must be fully simulated on a version of the design as close as possible to the real silicon
and the results cross-checked with those expected by the ATPG tool. The chart below
shows the ATPG pattern verification flow used on this design.

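(Flow chart not reproduced. Labelled elements: SDF Timing, ATPG Patterns, PLL Model and Memory Models as inputs to the Simulation Tool; Logfiles and Waveforms as its outputs; an Analysis step with FAIL and PASS outcomes; Final ATPG Patterns.)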

Figure 11 – ATPG Pattern Verification Methodology


4.1.1 Verification Environment

The ATPG patterns were provided to the simulator as stimulus along with the appropriate
SDF (Standard Delay Format) timing files and Gate Level Netlist. For testing the patterns
at-speed, it was essential to use SDF back-annotated timing on a post-layout version of
the design in order to have an image of the device as close as possible to real silicon. The
simulations were run with best and worst case timing in order to verify the extreme
corners of the device specifications and process.

Behavioral models with timing characteristics were used for the memories and the PLL.
The model for the PLL accurately estimated the real delay in waiting for the PLL to lock
to the configured clock speed (approximately 1µs). For the purposes of verification and
debug, this resulted in an undesired delay in the simulation before being able to see any
of the actual results, and so the lock time was reduced to zero inside the model without
affecting the functionality of the PLL itself.

4.1.2 ATPG Pattern Format

Following the generation of the ATPG patterns by the ATPG tool, they are then validated
by running them on the design through simulation. The number of potential device faults,
and therefore the number of patterns required to test a device, increases with the number
of transistors. This particular device resulted in many thousands of patterns being
generated for each pattern type (stuck-at, transition delay and path delay). This, combined
with the long simulation time, meant that it was impossible to simulate and validate all of
them serially. Instead, the majority of the patterns were simulated in “parallel” mode and
only the first 10 patterns of each set were simulated in “serial” mode.

Essentially, parallel mode cannot be applied to the silicon and is for simulation purposes
only where the patterns are loaded directly into the scan chains, omitting the time-
consuming shift phase of the test. In this particular design, the maximum scan chain
length was 384 flip-flops and the capture phase was 4 cycles long giving a total of 388
cycles per pattern. Simulating the patterns in parallel was therefore approximately 97
times faster than in serial mode. Though the patterns are simulated much faster in parallel
mode, potential problems with the shift phases (both in and out) of the patterns cannot be
observed and so some patterns must be run in serial mode. The first 10 patterns of each
pattern set were run serially to catch any issues within the scan chains that would result in
other patterns failing.
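
The speed-up quoted above is simply the ratio of simulated cycles per pattern in the two modes. The few lines of Python below reproduce the arithmetic, with variable names chosen here for illustration.

```python
max_chain_length = 384   # shift cycles needed to load/unload the longest scan chain
capture_cycles = 4       # cycles in the capture phase

serial_cycles = max_chain_length + capture_cycles   # shift phase plus capture phase
parallel_cycles = capture_cycles                    # chains are loaded directly, no shifting

speedup = serial_cycles / parallel_cycles
print(serial_cycles, speedup)   # 388 97.0
```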

For at-speed testing of a device, two sets of patterns are normally generated: path delay
and transition delay. For this particular device, the memories were put into a “bypass
mode” for scan in order to increase the test coverage of the device. However, this bypass
mode resulted in the scan paths through the logic around the memories being twice the
length of a normal path between two flip-flops because the scan chains had to be routed
around the actual memories themselves. In order to test this logic the options were
therefore either to declare the paths around the memories as multi-cycle to the ATPG tool
and let it generate patterns accordingly, or to declare the paths normally but run
the patterns at half of the operating speed, in this case 125MHz. Using multi-cycle
paths for ATPG testing can be problematic, especially when they are not real
functional paths, and can result in extremely long pattern generation and simulation time.
A separate set of transition delay patterns was therefore generated for the
logic around the memories and run at 125MHz.

4.1.3 Analyzing the Results

The results of the simulation were produced in waveform format for debug and in a log
file format for comparison against the expected values. From the log files it could be
analysed where in the device any errors occurred by tracing the first failure of the pattern
along the chain to the flip-flop that captured it. Analysis of the logic between the flip-
flops would reveal what was causing the failure.

For any at-speed patterns the capture phase is the most important part of the test because
it is where the actual speed of the design is proven. For this particular device, the capture
phase was also important in verifying the correct functionality of the TPG, since it is at
this point in the test where the two pulses generated from the PLL clock are used. Within
the capture phase, there is a specific window where the correct number of capture clock
pulses must occur and the time between these two pulses must also be correct in order to
properly test the speed of the design. The waveforms generated by the simulation tool
were therefore used to carefully analyse the capture phase of the test and to assist in
debugging any failing patterns.

4.2 Validation of the Patterns through Simulation

As described above, the patterns were run through simulation in both serial and parallel
modes. The results from the parallel simulations were used to solve general ATPG testing
issues such as pattern generation difficulties, problems with the tools and in obtaining the
test coverage values for the design. The serial patterns, representing the way that the
patterns would run on real silicon, were examined more closely in order to verify both the
ATPG patterns and the TPG logic.

The TPG logic was tested simultaneously with the 250MHz transition delay patterns.
From the design of the TPG (Figure 2), the theoretical time for changing the core clock
speed from the scan clock frequency (directly from the pin) to the capture clock
frequency derived by the PLL was two scan-speed clock cycles and four capture-speed
cycles. Once the core clock has been switched to the capture clock speed, the two capture
clock pulses should occur within the expected timing window. The ATPG tool must be
told the precise cycle in which these two capture pulses are expected. This
was provided in the format of the capture procedure described previously,
where the capture phase consisted of 3 scan-speed equivalent cycles and 7 capture-speed
cycles, with the capture clocks occurring during cycles 6 and 7. In simulation, on the
post-layout design with SDF timing, the first capture clock occurred 96 nanoseconds
after entering the capture phase of the test, corresponding to the characteristics of the design
declared to the ATPG tool. The second capture pulse occurred at 100ns, proving the
distance between the two pulses to be correct.
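
The 96ns and 100ns figures are consistent with the theoretical switching time of two scan-speed cycles plus four capture-speed cycles, followed by one further capture-speed period. The Python lines below check this arithmetic, assuming the 25MHz scan-speed and 250MHz capture-speed clocks used in the capture procedure; the variable names are illustrative.

```python
scan_period_ns = 1000 / 25       # 40 ns at the 25MHz scan speed
capture_period_ns = 1000 / 250   # 4 ns at the 250MHz capture speed

# Two scan-speed cycles plus four capture-speed cycles to switch the core clock.
first_pulse_ns = 2 * scan_period_ns + 4 * capture_period_ns
# The second pulse follows one capture-speed period later.
second_pulse_ns = first_pulse_ns + capture_period_ns

print(first_pulse_ns, second_pulse_ns)   # 96.0 100.0
```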

For the patterns around the memories a separate timing script for the ATPG tool was
created and the TPG had to generate the capture clock pulses at a speed of 125MHz
correspondingly. The clock speed of 125MHz for the logic around the memories was
based on a theoretical maximum of the length of the paths generated by putting the
memories in bypass mode. In reality, the path was slightly shorter and in simulation the
first fails on these paths were actually at a frequency of 173 MHz. Since the specification
for these paths however was 125MHz, this was the initial speed maintained for running
the tests on silicon.

The bypass mode of the memories and the resulting restrictions for testing the
surrounding logic at-speed were initially unknown. The discovery of this came from
debug of a selection of patterns that failed at 250MHz. During this debug process, the
capture clock speed was varied over a range of 25MHz to 300MHz. In all cases, the
generation of both of the capture clock pulses at the desired time by the TPG was verified
to be correct.

4.3 Results on a Virtual Tester

Between the generation of the test pattern set by the ATPG tool and application of the
patterns on silicon, the patterns go through several conversion tools and improvement
steps in order to be compliant with the exact tester in use. To verify that no errors were
introduced into the patterns during these steps, an in-house Virtual Tester tool was used
to verify the final patterns in tester format before applying them to the actual silicon.
Through comparison of the simulation log files, the results matched those from the ATPG
tool, proving that the conversion process was successful.

4.4 Results on Silicon

The ATPG patterns were applied to the “first silicon” version of the device, running them
on devices through probe and also through final test directly on “blind assembled” parts.
In testing each pattern set on the devices in final test, the initial results were closely
analysed through waveforms generated by the tester software to ensure that the capture
clock pulses were being correctly generated by the TPG at the correct speed and time.
The patterns were run across a range of speeds for device characterisation and for each
the performance of the logic was verified to be correct, corresponding to the results of the
simulation.

During simulation of the 125MHz patterns, the first fails started to appear at speeds of
173MHz. On the tester, the speed of the capture clock was incremented from 125MHz by
steps of 8.25MHz to find the failing speed on real silicon. The first real fails on silicon
appeared at speeds of around 150MHz for low Vcc and high temperature (worst case
conditions), 181MHz for normal Vcc and high temperature and 199 MHz for high Vcc
and high temperature.
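
The stepping search described above can be sketched in Python as follows. The starting speed and step size come from the text. The upper limit, the function name and the pass/fail callback are hypothetical stand-ins for the tester program, and the example device is invented purely to exercise the loop.

```python
def first_failing_speed(passes_at, start_mhz=125.0, step_mhz=8.25, limit_mhz=300.0):
    """Raise the capture clock in fixed steps; return the first failing speed or None."""
    step = 0
    while True:
        speed = start_mhz + step * step_mhz
        if speed > limit_mhz:
            return None            # no failure found below the assumed limit
        if not passes_at(speed):
            return speed
        step += 1


# Hypothetical device that passes every pattern below 160MHz.
def example_device(speed_mhz):
    return speed_mhz < 160.0


print(first_failing_speed(example_device))   # 166.25
```

Because the search only visits speeds on the 8.25MHz grid, the reported failing speed is an upper bound on the true limit with a resolution of one step.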

(Chart not reproduced. Title: Speed 125MHz Transition Delay. Vertical axis: FMAX MHz, 0 to 250. Horizontal axis: Split, with entries HHH_w19, HNN_w18, NNN_w9, LNN_w21, LNN_w5, NNL_w10, NLL_w25 and LLL_w6. One series per pattern sub-set, named tr_fs_X_<range>_125mhz and tr_tk_X_<range>_125mhz.)

Figure 12 – 125MHz Transition Delay Results on Silicon

Figure 12 shows an example of the results obtained on first silicon across a range of
blind-assembled devices through final test under worst case conditions, showing at what
speed the first failure occurred for each transition delay pattern sub-set on each device.
Taking the average of the results across the split shows that the first failure speed of
173MHz found through simulation using back-annotated SDF timing was reasonably
accurate in estimating the target frequency of the device.

For the 250MHz transition delay patterns, the first failures on the device were seen at
speeds of 248MHz (see Figure 13), 274MHz and 296MHz at high temperature for low,
average and high Vcc respectively.

(Chart not reproduced. Title: Speed 250MHz Transition Delay. Vertical axis: FMAX MHz, 0 to 400. Horizontal axis: Split, with entries HHH_w19, HNN_w18, NNN_w9, LNN_w21, LNN_w5, NNL_w10, NLL_w25 and LLL_w6. One series per pattern sub-set, named tr_fs_<range>_X_250mhz and tr_tk_<range>_X_250mhz.)

Figure 13 – 250MHz Transition Delay Results on Silicon

The results for the path delay patterns were similar to those of the 250MHz patterns, with
the first failures occurring on the device at speeds of 271MHz, 297MHz and 317MHz
under the same test conditions as the transition delay patterns.

Since the path delay patterns exercise, theoretically, the most timing critical paths in the
design, the frequency at which the first failure occurs should always be lower than that of
the transition delay patterns. As can be observed from the initial results, this was not the
case for this device and there was a slight inconsistency between the results of the
250MHz transition delay patterns and the path delay patterns. After extensive analysis of
the failing patterns through failure analysis tools, the cause of the discrepancy was found
to be the way that the ATPG tool had generated the transition delay test patterns. In the
design, the tool had found a logical path in scan mode through the JTAG logic in order to
test a handful of potential faults that would otherwise be un-testable. For stuck-at patterns
this was not a problem. However, the path that it created in scan mode through the logic
was longer than 4ns, causing the patterns to fail at-speed. Since the path was not used in
functional mode, the faults covered by the path were removed from the transition delay
pattern set, after which the results corresponded to those of the path delay patterns.
5 Conclusion

The Transition Pulse Generator implementation provided a simple, safe and easy to use
method of on-chip pulse generation for use in scan Transition and Path Delay tests. It
meets the requirements for implementation in any standard SoC design flow, is
extendable and can be easily migrated to other technologies. Despite the learning curve
associated with the ATPG tool, the TPG logic and PLL setup, this method was easy to
adopt from a DFT perspective. The ATPG At-Speed patterns and the related TPG logic
were both verified through simulation. The patterns were run through a standard
simulation tool and also through a Virtual Tester, both using back-annotated SDF timing,
to ensure the correctness of the patterns before applying them to the tester. The results on
silicon showed a strong correlation with the results from simulation in terms of device
timing and confirmed the correct functionality of the TPG.

The benefits of using this technique were that, with very small logic overhead, production
test costs were significantly lowered whilst increasing the testability of the device for
delay faults. The only drawback was the learning curve for TPG and PLL control in order
to generate the test patterns. This solution can be applied to any MUX-D scan-able device
that needs to be tested at a frequency higher than that which the targeted test platform can
accurately provide.

With higher transistor densities and more complex fabrication techniques of upcoming
devices, ATPG tools will need to be more effective at finding ways to cover as many
potential faults as possible. Where the design described in this paper had test coverage
targets of 98% SA and 70% transition delay, future devices (in particular very deep sub-
micron) will need to have at-speed targets closer to 100%. This, together with the
development and adoption of lower-cost ATE platforms, suggests that an on-chip
solution for generation and control of a clock for at-speed testing will become essential.
6 References

[1] “At-Speed Testing of Delay Faults for the PowerPC(TM) G4 Microprocessor”,
Nandu Tendolkar, Robert Molyneaux, Carol Pyron, and Rajesh Raina, Freescale
