
DFT Guide

The document is a beginner-friendly guide on Design for Testability (DFT) in VLSI, covering key concepts such as fault models, scan architecture, and automatic test pattern generation (ATPG). It explains the importance of DFT techniques in detecting manufacturing defects and outlines various components like on-chip clock controllers and logic built-in self-test (LBIST). Additionally, it discusses modifications needed for clocking architecture to support scan operations and provides SystemVerilog code for on-chip clock controllers.

Uploaded by

prashanthp436
Copyright: © All Rights Reserved

DFT

Design for Testability

MADE SIMPLE
A Complete Beginner Friendly
VLSI Guide

Prasanthi Chanda
CONTENT
CHAPTER 1: DFT, Scan & ATPG
What is DFT
Fault Models
Stuck-at Faults
At-speed Faults
Scan & ATPG

CHAPTER 2: On-Chip Clock Controller


Scan Clocking Architecture
Modifications Required
Glitches Concepts
Basic On-Chip Clock Architecture
SystemVerilog Code

CHAPTER 3: LFSR & Ring Generator


Types of LFSR
How to transform a modular type LFSR to Ring Generator

CHAPTER 4: Logic Built in Self Test (LBIST)


Basics of LBIST
Test Pattern Generator
Phase Shifter
Response Analyzer of LBIST
Role of RA
Characteristics of RA
Aliasing
Probability of Aliasing
LFSR based Serial RA
CRC Theory
LFSR based Parallel RA (MISR)
Masking in MISR

CHAPTER 5: Test Compression


Basics of test compression and EDT
EDT – Decompressor
EDT – Compactor
CHAPTER 1
DFT, Scan & ATPG

The chip manufacturing process is prone to defects, and these
defects are commonly modeled as faults. A fault is testable
if there exists a well-specified procedure to expose it in the
actual silicon. To detect as many faults as possible in a
design, we need to add extra logic. Design for Testability
(DFT) refers to the design techniques that make the task of
testing feasible. The most common DFT techniques for logic
test are Scan and ATPG.

Fault Models
Fault models abstract the behavior of manufacturing
defects so that test vectors can be generated to detect
them.
• Functional defects: Stuck-at Fault Model
• Current defects: Pseudo Stuck-at Fault Model (IDDQ)
• Speed defects: At-speed Fault Model, Path Delay Fault Model
The two most common fault models are the stuck-at and
at-speed fault models.
1. Stuck-at Faults
This is the most common fault model used in industry.
It models manufacturing defects that occur when a
circuit node is permanently shorted to VDD (stuck-at-1
fault) or GND (stuck-at-0 fault).
The fault can be at an input or at the output of a gate.
Thus a simple 2-input AND gate has six possible stuck-at
faults (three nodes, two stuck values each).

In the above circuit, suppose there is a stuck-at-0 fault
at the output of the AND gate.
The circuit has three input ports, so there are eight
possible input patterns: {000, 001, 010, 011, 100, 101,
110, 111}.
Out of the eight patterns, only two, {011, 111}, detect
this fault. For every other pattern the expected output
is the same as the actual circuit output in the presence
of the s-a-0 fault.
This is a small circuit, so we can easily find the patterns
that detect the fault by hand, but what about much bigger
circuits?
We don’t have to worry about that, because the CAD tools
(ATPG tools) do it for us.
The ATPG tools use complex algorithms to generate the
stuck-at patterns needed to test all possible fault
locations. If a tool cannot find a pattern for some
faults, it classifies those faults as untestable.
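
The pattern search described above can be sketched as a tiny fault simulation. The circuit figure is not reproduced here, so this sketch assumes a hypothetical circuit, out = A XOR (B AND C), chosen only because it yields the same two detecting patterns quoted above. The function names are illustrative.

```python
from itertools import product

def circuit(a, b, c, and_stuck_at=None):
    """Hypothetical 3-input circuit: out = a XOR (b AND c).

    and_stuck_at forces the AND-gate output to 0 or 1 to model a stuck-at fault.
    """
    and_out = b & c
    if and_stuck_at is not None:
        and_out = and_stuck_at
    return a ^ and_out

def detecting_patterns(stuck_value):
    """Return every input pattern whose good and faulty outputs differ."""
    found = []
    for a, b, c in product((0, 1), repeat=3):
        if circuit(a, b, c) != circuit(a, b, c, and_stuck_at=stuck_value):
            found.append(f"{a}{b}{c}")
    return found

sa0_patterns = detecting_patterns(0)
print(sa0_patterns)
```

A real ATPG tool does not enumerate all patterns like this; exhaustive search only works here because the circuit has three inputs.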

2. At-speed Faults
This model covers manufacturing defects that behave as
gross delays at gate input and output ports.
Each port is tested for a logic 0-to-1 transition delay
(slow-to-rise fault) and a logic 1-to-0 transition delay
(slow-to-fall fault).
Like stuck-at faults, an at-speed fault can be at an
input or at the output of a gate, thus a simple 2-input AND
gate has six possible at-speed faults.
Assume initial value stored in Flip Flop 1 = 1
Assume initial value stored in Flip Flop 2 = 1
After 1st clock pulse (Launch Edge)
Captured value in Flip Flop 1 = 0
(1 to 0 transition occurs at the output of the AND gate)
Captured value in Flip Flop 2 = 1
After 2nd clock pulse (Capture Edge)
Expected captured value in Flip Flop 2 = 0
Actual captured value in Flip Flop 2 = 1
The mismatch shows that the 1-to-0 transition did not
reach Flip Flop 2 within one clock period, which exposes
a slow-to-fall fault.

SCAN and ATPG


Scan is the internal modification of the design’s circuitry
to increase its testability.
ATPG stands for Automatic Test Pattern Generation; as
the name suggests, it is the generation of test patterns.
In other words, Scan makes it easier to generate
patterns that detect the faults.
To test a fault we need to initialize the flops to the
required values.
In a bigger sequential circuit (without scan), it is difficult
to control the flops’ values through the primary inputs and
to observe the captured response at the primary outputs.
To solve this issue we do ‘Scan Insertion’ during
synthesis.
The goal of ‘Scan Insertion’ is to make a difficult-to-test
sequential circuit behave (during the testing process) like
an easier-to-test combinational circuit.
Achieving this goal involves two steps:

1. Converting Regular Flop to Scan Flop

All the flops in the design are converted into scan flops,
as shown in the image below, except:
The ones that are excluded by the user. These are
called non-scan flops.
The ones that have DFT DRC violation(s).
2. Stitching the Scan Flops to form Scan Chains
The scan flops are stitched to form scan chain(s), as
shown in the image below.
The number of scan chains depends on various user
inputs, such as:
Length of scan chain
Clock domain mixing
Power domain mixing
Voltage domain mixing

To initialize any flop to a value, as shown in the above
image, we simply set SE = 1 so that the SI to Q path
is activated, and shift in the required values serially
through a top-level primary input called Scan-Input.
Once the required values are loaded into the flops, we
capture the values from the combinational circuit by
setting SE = 0 and applying a capture clock.
To observe the captured response, we set SE = 1 again
and serially shift out the captured data through a
primary output called Scan-Output.
Thus the scan flops’ outputs (Q) act as pseudo primary
inputs to the combinational logic, and the scan flops’
inputs (D) act as pseudo primary outputs of it, which
makes the design behave like a pseudo-combinational circuit.
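
The shift-in, capture, shift-out sequence can be sketched with a small behavioral model. This is only an illustration: the class, the three-flop chain, and `example_logic` (a made-up combinational block) are assumptions, not part of the original design.

```python
class ScanChain:
    """Behavioral model of one scan chain; all names here are illustrative."""

    def __init__(self, length):
        self.q = [0] * length          # q[0] is the flop closest to Scan-Input

    def shift(self, scan_in_bit):
        """SE = 1: one shift clock. Returns the bit leaving through Scan-Output."""
        scan_out_bit = self.q[-1]
        self.q = [scan_in_bit] + self.q[:-1]
        return scan_out_bit

    def capture(self, comb_logic):
        """SE = 0: one capture clock loads every D input from the logic."""
        self.q = comb_logic(self.q)

def run_pattern(chain, pattern, comb_logic):
    # Shift in the last cell's value first so that pattern[i] ends up in q[i].
    for bit in reversed(pattern):
        chain.shift(bit)
    loaded = list(chain.q)
    chain.capture(comb_logic)
    # Shift out the response (zeros are shifted in behind it).
    response = [chain.shift(0) for _ in range(len(pattern))]
    return loaded, response[::-1]      # response[i] is what q[i] captured

def example_logic(q):
    """Hypothetical combinational logic between the flop outputs and inputs."""
    return [q[1] & q[2], q[0] ^ q[2], q[0] | q[1]]

loaded, response = run_pattern(ScanChain(3), [1, 0, 1], example_logic)
print(loaded, response)
```

On a tester, `response` is what gets compared against the expected values computed in pre-silicon simulation.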

Once the patterns are generated, the expected
response of the circuit for each pattern is obtained in
pre-silicon simulation.
The expected responses, along with the patterns, are
then stored in the memory of the Automatic Test Equipment
(ATE).
In post-silicon, the manufactured chip is tested using
the ATE, which applies each pattern and compares the
chip’s response with the expected response to decide
pass or fail.
CHAPTER 2
On-Chip Clock Controller
On-chip Clock Controllers (OCC) are also known as Scan
Clock Controllers (SCC). An OCC is the logic inserted on the
SoC for controlling clocks during silicon testing on ATE
(Automatic Test Equipment). At-speed testing requires
two clock pulses in capture mode with a frequency equal to
the functional clock frequency. Without an OCC, we would
need to provide these at-speed pulses through I/O pads, but
these pads are limited in the maximum frequency they
can support. An OCC, on the other hand, uses the internal
PLL clock to generate the clock pulses for test. During
stuck-at testing, the OCC ensures that only one clock pulse
is generated in the capture phase. Similarly, during
at-speed testing, the OCC ensures that two clock pulses are
generated in the capture phase, at the frequency of the
functional clock.

Therefore all the test clocks in a scan-friendly design are
routed through an OCC, which controls the clock operation
in scan mode (both in stuck-at and at-speed testing) and
passes the functional clock through unchanged in functional
mode.
Scan Clocking Architecture - Modified with OCCs to
Support Scan
The clocking architecture of a design needs to be
modified to support ‘Scan’ operation.
We take an example of a very generic functional
clocking architecture, as shown in the image below, and
modify it.

In the above image, there is a PLL that generates
three different clocks (of frequency 500 MHz, 400 MHz
and 100 MHz).
The cloud-like structures in the figure indicate different
clock domains (containing the logic we want to test).
There are two dividers:
DIV (2), which divides the input clock by 2. So the
frequency of the clock at the output of the divider is
200 MHz.
DIV (1 or 4), which either divides the input clock by 4
or passes it through without any division, depending on the
functional requirement. So the frequency of the clock at
the output of the divider is either 500 MHz or 125 MHz.

The dividers have a functional control that determines
the division ratio of the divider.
The divider ‘DIV (2)’ always divides the input clock by a
constant value of 2; in such dividers the functional
control (excluding the divider reset) is typically
tied to a constant value in the design.
But the divider ‘DIV (1 or 4)’ can either divide the input
by 4 or pass it through; the functional control of such
dividers is typically driven by an FSM (or any other
controlling logic).
There is also a clock mux, which has a functional
control that selects which clock it propagates to
its output.
The frequency of the clock at the clock mux output can
be either 200 MHz (from the divider) or 100 MHz (from
the PLL).
Like the dividers, the functional control of a clock mux is
typically driven by an FSM (or any other controlling
logic).
Modifications Required:

The modifications needed in the clocking architecture to
make the design ‘Scan’ friendly:

In stuck-at testing, the frequency of the clock domain
under test doesn’t matter; but in at-speed testing,
we should test the clock domain at the maximum
frequency it supports, because at-speed faults are only
exposed when the logic is clocked at its functional
frequency.
The frequency shown in red inside the clock domains
(cloud-like structures) in the figure below indicates the
maximum clock frequency of that clock domain.
Since we have two clock dividers and one clock mux in
our design, we have to ensure that the clock with the
highest frequency is propagated to the outputs of the
dividers and the clock mux, so that at-speed testing runs
at the correct frequency.

Clock mux –
The maximum possible frequency at the output is 200 MHz.
Since the FSM controlling the select pin of the clock mux
will be part of the scan chains, it will toggle during
testing.
Hence the clock at the output of the clock mux
becomes unpredictable and can be any one of its inputs
at any instant.
To prevent this, we need to add a simple mux, as shown
in the previous image, which masks the functional
control in scan mode (Test Mode = 1) and selects the
clock with the highest frequency (in this case the 200 MHz
clock).

‘DIV (2)’
The 200 MHz clock at the output of the clock mux comes
from the clock divider ‘DIV (2)’, thus ‘DIV (2)’ should
function as a divider throughout scan mode, so that
we get the required 200 MHz clock.
If we scan the divider, the logic responsible for dividing
the clock becomes part of a scan chain and toggles
during scan mode, resulting in a clock of unpredictable
frequency at the output of the divider; so we should not
scan this divider.
We also have to mask the reset and any other functional
control that is likely to affect the functionality of the
divider, as shown in the figure below.
‘DIV (1 or 4)’
We need the undivided clock of 500 MHz (fastest clock),
thus we need to mask the functional control to select
the undivided clock in scan mode, as shown in the
figure below.
Since we are bypassing the divider, we can scan this
divider, as doing so will not affect the divider output.
Then we need to modify the clocking architecture to
add an On-chip Clock Controller (OCC) for every clock
domain, as shown in the previous image.
We have six clock domains, thus six OCCs.
The scan clocking architecture shown in the
previous image could be further optimized for this
particular example, but it is a cleaner and more
generic representation to illustrate how to
define a scan-friendly clocking architecture.
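
The effect of masking the functional controls can be summarized in a few lines. This is a minimal sketch using the frequencies from the example; the argument names `mux_sel_200` and `div_by_4` are illustrative stand-ins for the FSM-driven controls.

```python
def clock_frequencies(test_mode, mux_sel_200=False, div_by_4=False):
    """Return (clock mux output, DIV (1 or 4) output) in MHz.

    mux_sel_200 and div_by_4 model the functional (FSM) controls.
    In test mode they are masked so the fastest clock is always selected.
    """
    pll_500, pll_400, pll_100 = 500, 400, 100
    div2_out = pll_400 // 2                      # DIV (2) is never scanned
    if test_mode:
        mux_sel_200, div_by_4 = True, False      # masking muxes override the FSM
    mux_out = div2_out if mux_sel_200 else pll_100
    div14_out = pll_500 // 4 if div_by_4 else pll_500
    return mux_out, div14_out

print(clock_frequencies(test_mode=False, mux_sel_200=False, div_by_4=True))
print(clock_frequencies(test_mode=True, mux_sel_200=False, div_by_4=True))
```

With Test Mode = 1 the FSM values no longer matter, which is exactly why the FSM flops can safely toggle as part of the scan chains.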

We will look at a very basic OCC design, with the sole
purpose of demonstrating how it works.
Industry-standard OCCs are much more advanced and
more robust against clock glitches than the OCC
discussed here.

Glitch Free Clock Mux

In modern chips, it is sometimes necessary to switch
between two different clocks while the chip is running.
What happens if we use a normal mux to switch
clocks?
In the first waveform below everything is fine, but in
the second waveform there are glitches.
Why? In the first waveform the ‘select’ signal changes
its value when both clocks are low, but in the second
waveform that is not the case.

Waveform of normal mux implementation of clock switching (no glitch)

Waveform of normal mux implementation for clock switching (with glitch)
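
Since the waveforms are not reproduced here, the following sketch recreates the effect numerically. It samples two clocks on an integer time grid, switches a plain mux at a chosen tick, and looks for high pulses that are shorter than any legal half period. The periods (8 and 12 ticks) and switch times are arbitrary assumptions.

```python
def clock(t, period):
    """Square wave with 50% duty cycle: high during the first half period."""
    return 1 if (t % period) < period // 2 else 0

def mux_output(select_change_time, period1=8, period2=12, end_time=48):
    """Plain mux: clk1 before the select change, clk2 from that tick onward."""
    return [
        clock(t, period2) if t >= select_change_time else clock(t, period1)
        for t in range(end_time)
    ]

def high_pulse_widths(wave):
    """Widths (in ticks) of every high pulse in the sampled waveform."""
    widths, run = [], 0
    for level in wave + [0]:          # trailing 0 closes a final pulse
        if level:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

def has_glitch(select_change_time, period1=8, period2=12):
    """True if any output pulse is narrower than the shortest legal high phase."""
    shortest_legal = min(period1, period2) // 2
    widths = high_pulse_widths(mux_output(select_change_time, period1, period2))
    return any(w < shortest_legal for w in widths)

print(has_glitch(22))   # select changes while both clocks are low
print(has_glitch(17))   # select changes while clk1 is high and clk2 is about to fall
```

Only high-pulse glitches are checked here; a fuller model would also look for truncated low phases.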

This kind of glitch may lead to unwanted behavior in
the circuit.
One way to avoid it is to gate both clocks just
before changing ‘select’, so that both clocks are low
when the switch occurs.
However, there is a better option: a glitch free clock
mux, commonly just called a clock mux.
One method of implementing a glitch free clock mux is
shown below.
Waveform of glitch free clock mux implementation for clock switching

Although the above implementation of a glitch free clock
mux serves our purpose, there is a catch.
The ‘select’ pin could be asynchronous to clk1 and clk2,
and if it changes its value very close to the capturing
edge of the flop, this may lead to metastability.
So it is better to use a double synchronizer instead of a
single flop in a glitch free clock mux, as shown in the
image below.
Waveform of glitch free clock mux implementation (using double synchronizer) for clock switching

Basic On-Chip Clock Controller Structure


When the circuit is in functional mode (Test Mode = 0),
the OCC passes the functional clock straight through.
During the shift phase (Shift Enable = 1), the Scan
Clock is propagated to the output of the OCC.
In the capture phase (Shift Enable = 0), the shift register
starts shifting in ‘1’ and enables the Clock Gate to let
through a single pulse or a double pulse, depending on the
type of testing.
The OCC generates one clock pulse in stuck-at testing
(At-speed Mode = 0) and two clock pulses in at-speed
testing (At-speed Mode = 1).
The behavior of this OCC (with a 5-bit shift register) in
at-speed testing is shown in the image below.
The two capture pulses come after 5 positive edges of
the functional clock (as we are using a 5-bit shift
register).

Simulation waveform of the OCC structure shown in Figure 1 (having a 5-bit shift register)

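SystemVerilog Code of OCC

module occ
#(
    parameter SHIFT_REG_BITS = 5
)
(
    input  logic test_mode,
    input  logic atspeed_mode,
    input  logic shift_en,
    input  logic scan_clk,
    input  logic func_clk,
    output logic occ_out_clk
);

    logic cg_en;
    logic cg_out_clk;
    logic sync_flop;
    logic [SHIFT_REG_BITS-1:0] shift_reg;

    // Clock gate: pass func_clk only while cg_en is high.
    // Simple AND-style gate for illustration; it is not glitch-safe.
    always_comb begin
        if (cg_en == 1)
            cg_out_clk = func_clk;
        else
            cg_out_clk = 0;
    end

    // Register the inverted shift enable; it goes high in the capture phase.
    always_ff @(posedge scan_clk) begin
        sync_flop <= ~shift_en;
    end

    // Shift register on func_clk: shifts left and loads sync_flop into bit 0.
    always_ff @(posedge func_clk) begin
        shift_reg <= {shift_reg[SHIFT_REG_BITS-2:0], sync_flop};
    end

    // Functional mode: func_clk. Shift phase: scan_clk. Capture phase: gated func_clk.
    assign occ_out_clk = test_mode ? (shift_en ? scan_clk : cg_out_clk) : func_clk;

    // At-speed: enable window of two func_clk cycles. Stuck-at: one cycle.
    assign cg_en = atspeed_mode ?
        (~shift_reg[SHIFT_REG_BITS-1] & shift_reg[SHIFT_REG_BITS-3]) :
        (~shift_reg[SHIFT_REG_BITS-1] & shift_reg[SHIFT_REG_BITS-2]);

endmodule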
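
To see why the two `cg_en` expressions give two pulses and one pulse respectively, the shift register can be stepped cycle by cycle. This is a rough behavioral check in Python, not a replacement for simulating the module. It assumes the shift register was flushed to 0 during the shift phase and that `sync_flop` is already 1 when the capture phase starts.

```python
def capture_pulse_count(atspeed_mode, shift_reg_bits=5, cycles=12):
    """Count func_clk cycles for which cg_en is high after shift_en drops.

    Assumes shift_reg starts at 0 and sync_flop is constantly 1 during capture.
    """
    mask = (1 << shift_reg_bits) - 1
    shift_reg, pulses = 0, 0
    for _ in range(cycles):
        shift_reg = ((shift_reg << 1) | 1) & mask       # shift in sync_flop = 1
        msb = (shift_reg >> (shift_reg_bits - 1)) & 1
        tap = shift_reg_bits - 3 if atspeed_mode else shift_reg_bits - 2
        cg_en = (1 - msb) & ((shift_reg >> tap) & 1)     # same expression as the RTL
        pulses += cg_en
    return pulses

print(capture_pulse_count(atspeed_mode=True))
print(capture_pulse_count(atspeed_mode=False))
```

The enable window opens when the chosen tap sees the shifted-in 1 and closes when the 1 reaches the most significant bit, so the tap position sets the number of pulses.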
CHAPTER 3
LFSR & Ring Generator
An n-bit Linear Feedback Shift Register (LFSR) consists of n
memory elements (flops) and XOR gates. There are
basically two types of LFSR:

1. Standard Form (also known as External Feedback LFSR)

2. Modular Form (also known as Internal Feedback LFSR)

An LFSR can be represented by its characteristic polynomial
$h_n x^n + h_{n-1} x^{n-1} + \dots + h_1 x + h_0$, where the
term $h_i x^i$ refers to the $i$-th flop of the register. In a
standard form LFSR, if $h_i = 1$, then there is a feedback tap
taken from this flop; in a modular form LFSR, if $h_i = 1$,
then there is a feedback to the output of this flop.

Standard Form LFSR

Modular Form LFSR


To avoid these issues, EDT uses a Ring LFSR structure
(called a Ring Generator).
This is a simple LFSR structure folded back on itself to
form a ring with multiple tap points.
Shown below is an example of a simple 8-bit Ring
Generator implementing the polynomial
$f(x) = x^8 + x^5 + x^2 + 1$.

Ring Form LFSR


A Ring LFSR has fewer levels of logic than
its corresponding external feedback LFSR and a smaller
fan-out than its corresponding internal feedback LFSR.
In the internal feedback LFSR, Q0 has 3 fan-outs, but in
the corresponding Ring LFSR implementation each flop has
at most 2 fan-outs.
The reduction in fan-out is significant in large LFSRs.
Thus a Ring LFSR minimizes XOR gates, has low fan-out
and also has an efficient physical implementation.

Ring LFSRs are obtained by transforming conventional
LFSRs in such a way that many realizations having the
same characteristic polynomial are generated.
Shown below is an example of how a conventional LFSR
is transformed into a Ring LFSR.

Modular LFSR structure, $f(x) = x^8 + x^5 + x^2 + 1$

Fold the LFSR to make it look like a Ring


Horizontally flip the Ring

Elementary Shift Left transformation (1)


The XOR position is shifted left by 2 positions [Q2 to Q4 and Q5 to Q7]
The feedback origin position is shifted left by 2 positions [Q0 to Q2]

Elementary Shift Left transformation (2)


The XOR position is shifted left by 1 position [Q4 to Q5]
The feedback origin position is shifted left by 1 position [Q2 to Q3]
CHAPTER 4
Logic Built In Self Test (LBIST)
LBIST is a form of built-in self-test (BIST) in which the logic
inside a chip is tested on the chip itself, without any
expensive Automatic Test Equipment (ATE). A BIST engine is
built inside the chip and requires only an access mechanism
like the Test Access Port (TAP) to start.

TAP and TAP Controller

Test Access Port (TAP)


It is the interface used for JTAG control.
The IEEE 1149.1 standard defines four mandatory TAP
signals and one optional TRST signal.
1. TDI (Test Data Input): used to feed data
serially to the target.
2. TDO (Test Data Output): used to collect data
serially from the target.
3. TCK (Test Clock): the clock to the registers.
4. TMS (Test Mode Select): controls the TAP
controller state transitions.
5. [Optional] TRST (Test Reset): resets the TAP
controller.
TAP Controller
It controls the JTAG operation.
It is basically a 16-state Finite State Machine (FSM)
whose state transitions are controlled by the TMS
signal.
The TAP controller can change state only at the rising
edge of TCK and the next state is determined by the
logic level of TMS and the present state.

This shows a very basic top-level view of TAP controller.


TMS, TCK and the optional TRST signals go to a 16-state
FSM, which produces various control signals depending
upon the FSM’s state.
These output signals include dedicated control signals
for Instruction Register (IR): CaptureIR, ShiftIR, UpdateIR
and generic control signals for all Data Registers (DR):
CaptureDR, ShiftDR, UpdateDR along with other control
signals.
State transition diagram of TAP Controller FSM
A brief description about the different states of the TAP
controller –

Test-Logic-Reset:
It resets the JTAG circuits.
Whenever the (optional) TRST signal is asserted, the
controller goes back to this state.
Also notice that, whatever state the TAP controller
is in, it will go back to this state if TMS is held at 1
for 5 consecutive TCK cycles.
Thus even if we don’t have the TRST signal, we can still
reset the circuit.

Run-Test/Idle:
This is a state in which the FSM is waiting for some test
operations to complete.

Select-DR/Scan and Select-IR/Scan:


This is a temporary state to allow the test data
sequence for the corresponding Register (the IR in
Select-IR/Scan state and the selected DR in Select-
DR/Scan state) to be initiated.

Capture-DR and Capture-IR:


In this state, data can be loaded in parallel to the
corresponding Register
Shift-DR and Shift-IR:
In this state, the required test data is loaded (or
unloaded) serially into (or from) the corresponding
Register.
When the TAP controller is in this state, it will stay at
this state as long as TMS=0.
For each clock cycle, one data bit is shifted into (or out
of) the selected Register through TDI (or TDO).
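
The TAP controller can be written down as a next-state table indexed by TMS. The sketch below follows the standard IEEE 1149.1 state diagram (including the Exit1, Pause, Exit2 and Update states, which are not described individually above) and checks the "five TCK cycles with TMS = 1" reset property. The dictionary and helper names are illustrative.

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR/Scan"),
    "Select-DR/Scan":   ("Capture-DR", "Select-IR/Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR/Scan"),
    "Select-IR/Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR/Scan"),
}

def run(state, tms_bits):
    """Apply one TMS value per rising TCK edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

# From any state, five cycles of TMS = 1 end in Test-Logic-Reset.
reset_from_all = all(
    run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT
)
print(reset_from_all)
print(run("Test-Logic-Reset", [0, 1, 0, 0]))   # path into Shift-DR
```

Four cycles are not always enough: starting from Shift-DR or Pause-DR, the fifth TMS = 1 is the one that reaches Test-Logic-Reset.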

BIST Architecture

The general architecture of an on-chip BIST consists of 3
major components:
1. BIST controller
2. TPG (Test Pattern Generator)
3. RA (Response Analyzer)

The basic mechanism of LBIST is to use a Linear
Feedback Shift Register (LFSR) to generate the inputs to
the device’s internal scan chains, initiate a functional
cycle to capture the response of the device, and then
compress the captured response using a multiple input
signature register (MISR).
The compressed response that comes out of the MISR is
called the signature.
Any mismatch between the output signature and the
expected signature indicates a defect in the device.
Test Pattern Generation (TPG)

It generates the test patterns required to sensitize the
faults and propagate their effect to the outputs (of the
CUT).
Typically a standard form LFSR is used to generate
pseudo-random patterns, which act as the input test
vectors.
Each flop in the LFSR feeds a scan chain, as shown in the
image below.
Pseudo-random patterns repeat after a certain number of
cycles, which is why we use the term ‘pseudo’.

As discussed earlier, an LFSR is represented by its
characteristic polynomial.
Suppose we initialize two degree-4 LFSRs, with
characteristic polynomials $f(x) = x^4 + x^3 + 1$ and
$f(x) = x^4 + x^2 + 1$, to 1000.
The sequence generated by the LFSR with characteristic
polynomial $f(x) = x^4 + x^3 + 1$ repeats itself after 15
states, thus it has a period of $2^4 - 1 = 15$.
But the sequence generated by the LFSR with characteristic
polynomial $f(x) = x^4 + x^2 + 1$ repeats itself after
6 states.

If the sequence generated by the LFSR has a period of
$2^N - 1$, where N = number of flops in the LFSR (or degree
of the LFSR), then the sequence is called a maximum-length
sequence or m-sequence.
The characteristic polynomial of an LFSR that generates an
m-sequence is called a primitive polynomial.
The choice of polynomial has a great impact on the
cycle length.
A longer cycle makes a better TPG, thus an LFSR
implementing a primitive polynomial is best suited
for Test Pattern Generation.
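
The two periods quoted above can be checked with a short simulation. This sketch models the LFSR purely as a bit recurrence, so the mapping of polynomial exponents to flops is an assumption and may be mirrored relative to the figures; the period does not depend on that choice for these two polynomials.

```python
def lfsr_period(tap_exponents, seed=(1, 0, 0, 0)):
    """Period of the recurrence a[n+N] = XOR of a[n+e] for e in tap_exponents.

    tap_exponents lists the exponents of the characteristic polynomial below
    its degree N = len(seed); for example x^4 + x^3 + 1 -> (3, 0).
    """
    start = tuple(seed)
    state, steps = start, 0
    while True:
        new_bit = 0
        for e in tap_exponents:
            new_bit ^= state[e]
        state = state[1:] + (new_bit,)     # slide the 4-bit window by one
        steps += 1
        if state == start:
            return steps

print(lfsr_period((3, 0)))   # x^4 + x^3 + 1, primitive
print(lfsr_period((2, 0)))   # x^4 + x^2 + 1, not primitive
```

The second polynomial factors as $(x^2 + x + 1)^2$, which is why it cannot reach the full $2^4 - 1$ states.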

Why do we need a Phase Shifter?

One of the major disadvantages of an LFSR is that there is
not enough randomness between its outputs.
Consider the same m-sequence LFSR we discussed
earlier, $f(x) = x^4 + x^3 + 1$. As shown in the image below,
in the absence of a phase shifter there exists a diagonal
relationship between adjacent bit streams.
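
The diagonal relationship is easy to reproduce: in a standard form LFSR the flops form a plain shift register, so each flop's bit stream is the previous flop's stream delayed by one cycle. The sketch below assumes the orientation Q0 → Q1 → Q2 → Q3 with feedback from Q3 XOR Q2; the exact tap positions are an assumption and do not affect the result.

```python
def lfsr_streams(cycles=15, seed=(1, 0, 0, 0)):
    """Bit streams seen at Q0..Q3 of a 4-bit standard form LFSR.

    Assumed orientation: Q0 -> Q1 -> Q2 -> Q3, feedback Q3 XOR Q2 into Q0.
    """
    q = list(seed)
    streams = [[] for _ in q]
    for _ in range(cycles):
        for i, bit in enumerate(q):
            streams[i].append(bit)
        feedback = q[3] ^ q[2]
        q = [feedback] + q[:3]
    return streams

streams = lfsr_streams()
# Stream i+1 at cycle t+1 always equals stream i at cycle t (the diagonal).
diagonal = all(
    streams[i + 1][t + 1] == streams[i][t]
    for i in range(3)
    for t in range(14)
)
print(diagonal)
```

A phase shifter breaks this correlation by feeding each scan chain an XOR of several flops instead of a single flop.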
How to find the seed of an LFSR?

The LFSR feeds a scan chain with 10 flops. S0-S9 are
the test pattern bits specified by the ATPG and Q0-Q3 are
the seed bits of the LFSR.

There are two methods –


1. Cycle by cycle tracing
2. System of Linear equations

Initial conditions (S9 is the first input to the chain from
the LFSR, thus equal to Q0, and similarly down to S6):
S9 = Q0
S8 = Q1
S7 = Q2
S6 = Q3
S5 = S9 ⊕ S6 = Q3 ⊕ Q0 = Q0 ⊕ Q3
S4 = S8 ⊕ S5 = Q1 ⊕ Q3 ⊕ Q0 = Q0 ⊕ Q1 ⊕ Q3
S3 = S7 ⊕ S4 = Q2 ⊕ Q1 ⊕ Q3 ⊕ Q0 = Q0 ⊕ Q1 ⊕ Q2 ⊕ Q3
S2 = S6 ⊕ S3 = Q3 ⊕ Q3 ⊕ Q1 ⊕ Q2 ⊕ Q0 = Q0 ⊕ Q1 ⊕ Q2
S1 = S5 ⊕ S2 = Q3 ⊕ Q0 ⊕ Q1 ⊕ Q2 ⊕ Q0 = Q1 ⊕ Q2 ⊕ Q3
S0 = S4 ⊕ S1 = Q1 ⊕ Q3 ⊕ Q0 ⊕ Q1 ⊕ Q2 ⊕ Q3 = Q0 ⊕ Q2

Thus we have 10 equations and 4 variables: Q0, Q1, Q2
and Q3.
Suppose the pattern S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 is
1 X X X 0 1 X X 1 0 respectively.
Then we write the 10 equations in the form of a matrix, as
shown below.
From the matrix:
S9 = Q0 = 0
S8 = Q1 = 1
S5 = Q0 ⊕ Q3 = 1, but Q0 = 0, thus Q3 = 1
S0 = Q0 ⊕ Q2 = 1, but Q0 = 0, thus Q2 = 1
(S4 = Q0 ⊕ Q1 ⊕ Q3 = 0 is then satisfied as well.)

Thus the required seed of the LFSR is 1110 (written as Q3 Q2 Q1 Q0)
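
For an LFSR this small, the seed can also be found by trying all 16 candidates against the specified bits. The sketch below reuses the recurrence from the equations above (for example S5 = S9 ⊕ S6); the function names are illustrative, and a real tool would solve the linear system rather than enumerate.

```python
from itertools import product

def expand(seed):
    """Scan cell values S0..S9 produced by the seed (Q0, Q1, Q2, Q3)."""
    s = [None] * 10
    s[9], s[8], s[7], s[6] = seed
    for k in range(5, -1, -1):
        s[k] = s[k + 4] ^ s[k + 1]      # e.g. S5 = S9 xor S6
    return s

def find_seeds(pattern):
    """pattern: string for S0..S9 using '0', '1' and 'X' (don't care)."""
    matches = []
    for seed in product((0, 1), repeat=4):
        cells = expand(seed)
        if all(p == "X" or int(p) == c for p, c in zip(pattern, cells)):
            matches.append(seed)
    return matches

seeds = find_seeds("1XXX01XX10")
print(seeds)   # each tuple is (Q0, Q1, Q2, Q3)
```

If the pattern had more care bits than the LFSR has flops, the list could come back empty, which is the situation where ATPG needs a longer LFSR or reseeding.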

Response Analyzer
What is the role of the Response Analyzer (RA)?
It compresses the CUT output responses into a small
signature, so that it can be stored on-chip.
It compares the signature (generated in silicon) with the
gold signature (generated in pre-silicon) to determine
Pass/Fail.
It is also called a signature analyzer or output response
analyzer.

What are the characteristics of a good RA?


The generated signature should be as small as possible,
so that the gold signature occupies little memory when
stored on-chip.
It should make the correct Pass/Fail decision, i.e. have
low aliasing.
The RA logic should be as small as possible, which means
less area overhead.
It should support diagnosis, meaning that when the CUT
fails it should help locate the source of the failure.
Aliasing
Aliasing occurs when the signature generated by a
faulty output is the same as the gold signature.

Signature(faulty output) = Signature(good output)

Thus aliasing can lead to a loss in fault coverage, as we
cannot detect the faults that generate the gold
signature.

Probability of Aliasing (PAL)


It can be defined as –
LFSR based Response Analyzer (RA)
Serial: compresses one bit at a time
Parallel: compresses multiple bits at a time

LFSR based RA – serial


Consider an LFSR with the characteristic polynomial
$f(x) = x^4 + x + 1$.
The external input to the LFSR is a bit stream coming
from a CUT.
The final signature is available once the entire input bit
stream has been shifted in, and in this case the gold
signature is 1011.
The gold signature size is equal to the number of flops
in the LFSR.
But computing it this way is too slow.
Let’s say the input bit stream has 200 bits; you don’t
want to calculate the LFSR value at each cycle, as that
takes a long time.
Therefore we use CRC theory to calculate the signature.
CRC Theory
It represents the input bit stream by a polynomial.

Example: The bit stream 011011011 (the rightmost bit
is the first bit to enter the LFSR) can be represented as
$x + x^2 + x^4 + x^5 + x^7 + x^8$.
Now consider the same LFSR, whose polynomial is
$f(x) = x^4 + x + 1$.

The modular LFSR acts as a modulo-2 divider, where:

Dividend = LFSR input bit stream = $x + x^2 + x^4 + x^5 + x^7 + x^8$
Divisor = LFSR characteristic polynomial = $x^4 + x + 1$
Quotient = obtained from the division
Remainder = the Signature
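
The division can be carried out in a few lines, and compared against a bit-by-bit model of the modular LFSR. In this sketch a polynomial is stored as an integer whose bit i is the coefficient of $x^i$; that encoding and the function names are choices made for the example.

```python
def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; bit i of an int is the x^i term."""
    deg = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        shift = dividend.bit_length() - 1 - deg
        dividend ^= divisor << shift       # subtract (XOR) the aligned divisor
    return dividend

def serial_signature(bits_in_arrival_order, divisor):
    """Cycle-by-cycle model of a modular LFSR fed with one bit per clock."""
    deg = divisor.bit_length() - 1
    r = 0
    for b in bits_in_arrival_order:
        r = (r << 1) | b
        if (r >> deg) & 1:                 # bit fell off the top: apply feedback
            r ^= divisor
    return r

stream = "011011011"                       # rightmost bit enters the LFSR first
dividend = int(stream[::-1], 2)            # x^8 + x^7 + x^5 + x^4 + x^2 + x
divisor = 0b10011                          # x^4 + x + 1

remainder = poly_mod(dividend, divisor)
stepped = serial_signature([int(c) for c in reversed(stream)], divisor)
print(bin(remainder), bin(stepped))
```

Both routes give $x^3 + x^2 + 1$. Listed from the $x^0$ coefficient to the $x^3$ coefficient, that reads 1011.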
Probability of Aliasing (PAL) Estimate of LFSR based serial RA

M = length of input bit stream
N = degree of polynomial

Studies show that the PAL of a primitive polynomial
converges to its final steady-state value faster than that
of a non-primitive polynomial.
Thus it is good to use a primitive polynomial.
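
The formula image is not reproduced here, so the following is a sketch under the usual textbook assumption that every non-zero error stream is equally likely. Because the LFSR is linear, a faulty stream aliases exactly when its error polynomial is divisible by the characteristic polynomial, which gives $(2^{M-N} - 1)/(2^M - 1) \approx 2^{-N}$. The code counts those error streams directly for M = 8 and N = 4.

```python
def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; bit i of an int is the x^i term."""
    deg = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        shift = dividend.bit_length() - 1 - deg
        dividend ^= divisor << shift
    return dividend

def aliasing_count(stream_length, divisor):
    """Return (aliasing error streams, all non-zero error streams)."""
    total = (1 << stream_length) - 1
    aliased = sum(
        1 for error in range(1, total + 1) if poly_mod(error, divisor) == 0
    )
    return aliased, total

aliased, total = aliasing_count(8, 0b10011)     # M = 8, f(x) = x^4 + x + 1
print(aliased, total, aliased / total)
```

For long streams the ratio approaches $2^{-N}$, so each extra signature bit roughly halves the chance of aliasing.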

LFSR based RA – parallel

A serial RA compresses only one CUT output at a time, so
compressing multiple CUT outputs at a time would need
one LFSR for each CUT output, which leads
to too much hardware overhead.
Therefore we use a parallel LFSR based RA called a
Multiple Input Signature Register (MISR).
A MISR has a structure similar to an LFSR, except that
parallel inputs feed XOR gates between the stages.
The MISR characteristic polynomial is the same as that of
the LFSR.
Equivalent LFSR of a MISR

A MISR and its input bit streams can be mapped to an
equivalent LFSR by phase shifting the input bit streams
and adding (XORing) them together.
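
The mapping can be checked numerically. This sketch assumes a modular (internal feedback) MISR in which input i is XORed into stage i, and it delays input i by N − 1 − i cycles before XORing the streams together; the figure may use a mirrored numbering. The resulting serial stream has length K + N − 1.

```python
def misr_signature(words, poly, n):
    """words[t] is the n-bit parallel input at cycle t (bit i feeds stage i)."""
    r = 0
    for d in words:
        r = (r << 1) ^ d
        if (r >> n) & 1:
            r ^= poly
    return r

def serial_signature(bits, poly, n):
    """Single-input modular LFSR fed one bit per cycle, first bit first."""
    r = 0
    for b in bits:
        r = (r << 1) ^ b
        if (r >> n) & 1:
            r ^= poly
    return r

def equivalent_stream(words, n):
    """XOR the n input streams after delaying input i by (n - 1 - i) cycles."""
    stream = [0] * (len(words) + n - 1)
    for t, d in enumerate(words):
        for i in range(n):
            stream[t + (n - 1 - i)] ^= (d >> i) & 1
    return stream

POLY, N = 0b10011, 4                            # x^4 + x + 1
words = [0b1011, 0b0110, 0b1110, 0b0001, 0b1001]   # arbitrary example data, K = 5
stream = equivalent_stream(words, N)
print(len(stream), misr_signature(words, POLY, N), serial_signature(stream, POLY, N))
```

The equality holds because both structures compute the same polynomial remainder; the MISR simply injects several coefficients per clock.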

Probability of Aliasing (PAL) Estimate of MISR

K = length of input bit stream


M = length of equivalent LFSR bit stream = K + N – 1
N = degree of polynomial
Masking in MISR

Masking means that one error bit cancels another error bit
before reaching the MISR feedback tap points.
Consider the scenario shown below.

Assuming there is no aliasing, the signature generated
in Case 1 will be different from the golden signature.
However, the signature generated in Case 2 will be the
same as the golden signature, because the equivalent LFSR
input bit stream in the presence of the errors is the same
as the golden input bit stream.
This is known as Masking.
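
The figure with the two cases is not reproduced here, so this sketch builds its own pair of cases on an assumed 4-bit modular MISR with $f(x) = x^4 + x + 1$. Case 1 has a single error on input 0. Case 2 adds a second error on input 1 one cycle later, which lands on the first error as it shifts from stage 0 to stage 1.

```python
def misr_signature(words, poly=0b10011, n=4):
    """words[t] is the n-bit parallel input at cycle t (bit i feeds stage i)."""
    r = 0
    for d in words:
        r = (r << 1) ^ d
        if (r >> n) & 1:
            r ^= poly
    return r

good = [0b1011, 0b0110, 0b1110, 0b0001, 0b1001]    # arbitrary fault-free data

case1 = list(good)
case1[1] ^= 0b0001          # error on input 0 at cycle 1

case2 = list(case1)
case2[2] ^= 0b0010          # extra error on input 1 at cycle 2

golden = misr_signature(good)
print(misr_signature(case1) != golden)   # single error is caught
print(misr_signature(case2) == golden)   # the two errors cancel
```

The second case escapes detection even though two response bits are wrong, which is why masking is counted separately from aliasing.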
CHAPTER 5
Test Compression
The test data volume grows rapidly as circuit size
increases. For large circuits, the growing test data
volume causes a significant increase in test cost because of
much longer test time and larger tester memory
requirements to store the test data. Therefore test
compression techniques are essential to reduce the test
cost by reducing the scan test data while trying to keep
the same test quality.

Test Data Volume ≈ Number of Scan Cells in all the Scan Chains × Number of Scan Patterns

In the International Technology Roadmap for
Semiconductors (ITRS) 2013, it was predicted that more than
1000x test compression would be needed by 2020. There are
software techniques for test compression, implemented
by Automatic Test Pattern Generation (ATPG) tools in the
form of complex algorithms, but they are not enough to
achieve high test compression.
Therefore we turn to hardware-based test compression
techniques, which add extra logic to the circuit at the
cost of increased area.
A typical implementation without test compression logic

A typical implementation with test compression logic


(Note: N2 < N1 and M2 > M1)
Embedded Deterministic Test (EDT)

One of the most common hardware test compression
techniques is EDT.
Tessent TestKompress is the tool that can generate the
decompressor and compactor logic at the RTL level.
The decompressor drives the scan chain inputs and the
compactor is driven by the scan chain outputs.

Typically, when an ATPG tool generates a pattern, it
targets a group of faults, and as a result only a small
number of scan flops need to take specific values.
It then uses random values to fill the unspecified scan
flops, which do not improve detection of the targeted
faults.
Thus in conventional ATPG the patterns consist of
many ‘x’ or don’t care bits that increase the test data
volume, and loading and unloading these bits through the
scan chains increases the tester time.

EDT instead processes only the specified bits of the ATPG
pattern and determines how to load them through the
decompressor in the form of an EDT pattern.
After processing, the resulting compressed pattern (or
EDT pattern) is loaded through the decompressor, and
the specified bits of the ATPG pattern get loaded into
their respective scan flops.
A side effect of the decompressor is that all the
unspecified bits get loaded with random data; this side
effect is in fact what makes the compression possible.
Thus test volume decreases because these unspecified bits
are not stored, and many tester cycles are saved because
random data does not have to be loaded explicitly.

Decompressor
The decompressor consists of a ring generator, which is
basically a Ring LFSR with external inputs.
The external inputs feeding the ring generator are
commonly referred to as EDT channels.
The outputs of the ring generator flops connect to the
scan chain inputs through a phase shifter consisting of
XOR gates.
The phase shifter also makes it possible to drive
more scan chains than the degree of the LFSR.
Creating the compressed pattern from the original
ATPG test pattern consists of solving a set of linear
equations based on the ring generator polynomial and
the phase shifter connections.
The inputs to the ring generator are driven from the
compressed pattern stored on the ATE.
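
The idea of "finding channel bits that reproduce the care bits" can be illustrated on a toy decompressor. Everything in this sketch is hypothetical: a 4-bit modular LFSR for $x^4 + x + 1$ with one channel injected into stage 0, a two-output XOR phase shifter, and two scan chains of six cells. It also searches exhaustively, whereas a real tool solves the linear equations directly.

```python
from itertools import product

def decompress(channel_bits):
    """Expand one channel stream into the streams entering two scan chains."""
    q = [0, 0, 0, 0]
    chain0, chain1 = [], []
    for cin in channel_bits:
        fb = q[3]
        q = [fb ^ cin, q[0] ^ fb, q[1], q[2]]   # modular LFSR step with injection
        chain0.append(q[0] ^ q[2])              # phase shifter output 0
        chain1.append(q[1] ^ q[3])              # phase shifter output 1
    return chain0, chain1

def compress(care_bits, shift_cycles=6):
    """care_bits maps (chain, cycle) -> required value. Exhaustive search."""
    for candidate in product((0, 1), repeat=shift_cycles):
        chains = decompress(candidate)
        if all(chains[ch][cyc] == val for (ch, cyc), val in care_bits.items()):
            return candidate
    return None                                  # care bits cannot be encoded

# Five specified bits out of twelve scan cells; the rest are don't cares.
care_bits = {(0, 0): 1, (1, 1): 1, (0, 3): 1, (1, 4): 0, (1, 5): 1}
edt_pattern = compress(care_bits)
print(edt_pattern)
print(decompress(edt_pattern))
```

Six channel bits are stored instead of twelve scan cell values, and the unspecified cells simply receive whatever the LFSR happens to produce.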
A typical decompressor structure

LFSR with External chains


Compactor
Basically there are two types of test response compactor:

1. Spatial compactor [reduces the number of output pins
compared to input pins]

2. Time compactor [reduces the length of the output bit
stream compared to the length of the input bit stream]
EDT uses a spatial compactor, which consists of a group
of XOR trees.
It allows multiple scan chains to be observed at the
same time on a given scan output channel.
Several scan chains are XOR-combined into individual
scan channels, as shown below.

But there are two problems that we may encounter

1. ‘X’ contamination due to unknown value propagation


2. Fault Aliasing due to bad Probability of Aliasing
‘X’ contamination due to unknown value propagation

Scan cells can capture unknown or ‘X’ values from black
boxes, non-scan cells, false paths, etc.
Let’s assume we have two scan chains that are
compacted into one scan channel using one XOR gate,
as shown below.
An X captured in one of the chains will then block the
corresponding cell in the other chain, resulting in a loss
of observability.

Fault Aliasing due to bad Probability of Aliasing


A fault is aliased when it is observed by an even number
of scan cells that happen to line up at the same
position in different scan chains that are compacted to
the same output channel.
The example shown below illustrates this case. In this
scenario it is not possible to see the difference between
a good and a faulty circuit.
To deal with these issues, a mask controller is also
included as part of the compactor logic.
This mask controller, along with masking logic at the
scan chain outputs, can selectively mask scan chains
based on a few bits (called the mask code) that are
shifted in at the end of the pattern and are not passed
to the decompressor.
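
Both compactor problems, and the effect of masking one chain, can be shown with a single XOR gate. This is a minimal sketch with made-up chain contents; the string "X" stands for an unknown value and the `mask_a` flag is an illustrative stand-in for the mask code.

```python
def xor_compact(chain_a, chain_b, mask_a=False):
    """Compact two chain outputs into one channel; 'X' marks an unknown bit.

    mask_a=True forces chain A to 0 before the XOR, as the masking logic would.
    """
    out = []
    for a, b in zip(chain_a, chain_b):
        if mask_a:
            a = 0
        out.append("X" if "X" in (a, b) else a ^ b)
    return out

good_a, good_b = [1, 0, 1, 1], [0, 0, 1, 0]

# X contamination: chain A holds an X at position 2, so a fault effect at
# position 2 of chain B cannot be seen on the channel.
x_a = [1, 0, "X", 1]
faulty_b = [0, 0, 0, 0]
blocked = xor_compact(x_a, good_b) == xor_compact(x_a, faulty_b)

# Aliasing: the same fault flips position 1 in both chains, and the flips cancel.
aliased = xor_compact([1, 1, 1, 1], [0, 1, 1, 0]) == xor_compact(good_a, good_b)

# Masking chain A makes the fault effect in chain B visible again.
visible = xor_compact(x_a, good_b, mask_a=True) != xor_compact(x_a, faulty_b, mask_a=True)

print(blocked, aliased, visible)
```

Masking costs observability of the masked chain for that pattern, so the tool has to observe that chain's cells with other patterns.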