Test Point Insertion for Test Coverage Improvement in DFT
Hongwei Wang
STMicroelectronics (Shenzhen) R&D Co., Ltd.
[email protected]
Abstract
In a complex ASIC design there is usually some logic that cannot be controlled or observed. Because this logic is difficult or impossible to control and/or observe, it is difficult or impossible to test. The consequence is low test coverage, which leads to reliability problems in the part. Test Point Insertion is an efficient technique that improves a design's testability and raises its test coverage by adding a small amount of controllable and/or observable logic.
This paper presents an example of Test Point Insertion, drawn from a real project that used DFT Compiler and TetraMAX to improve test coverage. We analyze and explain the main causes of low test coverage, and then provide a solution for improving it. By comparing pre- and post-Test Point Insertion results, we show that test coverage and test efficiency are greatly improved with just a few test points and a few additional logic gates. This paper analyzes the causes of low test coverage and introduces a test design flow that uses the Test Point Insertion technique. Test Point Insertion solves two typical issues that lower test coverage: shadow logic between digital logic and a black box, and un-bonded pads in a multi-die chip.
Keywords: Test Point Insertion, Test Coverage, DFT.
1. Frequently Encountered Problems
To speed up ASIC design and reduce the time to market and mass production of an electronics product, the development schedules of Very Large Scale Integration (VLSI) design and manufacturing become shorter and shorter. Design engineers must anticipate possible manufacturing defects and be able to debug them. Design for Testability (DFT) plays an important role in improving product yield.
Test coverage and test efficiency are the most important measures of design quality when we use DFT techniques. A good design with high test coverage must be observable and controllable. Understandably, 100% test coverage is difficult to achieve. To save test cost, design engineers should aim for fewer test patterns with higher coverage; test efficiency is therefore also very important.
In addition, DFT engineers must guarantee that the function remains unchanged by the DFT design. In some designs, certain mission-mode logic is uncontrollable and unobservable, and this logic potentially causes many test problems. For such designs, test coverage cannot be improved simply by increasing the number of test patterns; advanced test techniques are required.
Test coverage is defined as the percentage of detected faults out of total detectable faults; it is the more meaningful measure of test pattern quality. Fault coverage is defined as the percentage of detected faults out of all faults, including the undetectable faults. By definition, test coverage is always higher than (or equal to) fault coverage. The formulas below show the calculation of test coverage and fault coverage.
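Test Coverage = Detected Faults / (All Faults - Undetectable Faults)
Fault Coverage = Detected Faults / All Faults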
Generally speaking, the primary input and output pads provide controllability and observability in a DFT design. It is usually impossible to reserve enough dedicated pads for every design because of the limited number of IO pads. We normally treat analog macros such as PLLs, ADCs and memories as black boxes during ATPG. Figure 2 shows a digital black box interface; the RAM macro below is an example of a black box. Since addr_bus, din_bus and net_1 go directly into the pins of the memory, the related logic cone "Com. Logic 1" sinks into the memory input pins; this logic cannot be observed directly and therefore cannot be tested, so the test coverage of the design is low. Meanwhile, at the output pins of the memory, the nets dout_bus, net_2 and net_3 cannot be controlled directly because they are driven by the memory's outputs; these pins are treated as "X" during Automatic Test Pattern Generation, so "Com. Logic 2" cannot be tested either. Because of these problems, the test coverage of this design is not high enough to meet the DFT target.
Clock gating may also cause testability problems for Automatic Test Pattern Generation. To make the clock output of a clock gating cell controllable, we can use a control signal to bypass the cell, or to make it transparent, during test or scan shift. There are two choices for this control signal: the test mode signal or the test enable signal. It is recommended to use the test enable signal as the clock gating control, since test mode stays at "1" throughout all test procedures while test enable is "1" only during shift. Sometimes we have to use the test mode signal because of interaction with other modes; in that case we can insert test points to recover the test coverage.
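As a hedged sketch (DFT Compiler, XG-mode era; TE and TM are this design's test enable and test mode ports, and the exact option spellings should be checked against the user guide), the control signals can be declared before scan insertion so that the tool connects them to the clock gating cells:
set_dft_signal -view spec -type ScanEnable -port TE -active_state 1
set_dft_signal -view spec -type TestMode   -port TM -active_state 1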
2. Solution
In order to fix the low global test coverage problem, we focus on making the related logic controllable and observable, either during RTL design or during scan chain insertion. This also helps us understand how to write testable RTL code during functional design, and how to use the DFT Compiler features for Test Point Insertion (TPI).
Figure 3 gives a solution that improves test coverage by adding test points for the un-bonded pads of Figure 1. One multiplexed register is inserted at the output of each input pad, as illustrated. When the test enable signal TE is "0", the circuit works in normal operation mode and the functional logic receives its data from the primary inputs. When TE is "1", the circuit works in test mode: during shift, the preload bits are shifted through the scan chain; during capture, a pull-down or pull-up is applied to prevent the test logic from global "X" propagation.
For the output pads, XOR cells and multiplexer cells are inserted. When TE is "0", the circuit works in normal operation mode and the related pads output the normal functional response. When TE is "1", the circuit works in test mode: the XOR cells fold the responses destined for the un-bonded pads into observable points, so these ports can be observed equivalently.
Figure 3: Test Point Insertion to Improve Test Coverage (for Un-bonded Pins)
As for the interface between the logic and the black box shown in Figure 2: the black box input signals come from combinational logic, and since they terminate at the black box, that logic is not observable, so the test coverage is low. The black box output signals directly drive the next level of combinational logic; they are uncontrollable in test mode because they come from the black box, so that logic is not controllable, which also lowers the test coverage. Similar to the solution for un-bonded pads, Figure 4 shows the solution to improve testability: we insert test points and put the above-mentioned signals into the scan chain to make them controllable and observable, which improves the test coverage and efficiency.
In Figure 4, XOR cells and multiplexer cells are added at the input pins of the black box; these cells control the inputs of the black box and are stitched into the scan chain. When the control signal is "0", the circuit works in normal operation mode and the black box inputs receive their functional signals. When the control signal is "1", the circuit works in test mode: the previously unobservable signals pass through the XOR cells and can be observed transparently.
Also, one multiplexed register is added at each output pin of the black box to control the next level of combinational logic. When the control signal is "0", the circuit works in normal operation mode and the functional logic receives its data from the black box. When the control signal is "1", the circuit works in test mode: preload data are shifted through the scan chain during shift; during capture, a ground connection is used to prevent the test logic from global "X" propagation.
For a design with black box modules, it is recommended to take them into account early, during RTL design, so that design engineers can balance functionality and testability simultaneously. If testability is not considered during functional design, TPI techniques can solve the design's testability problems; they are flexible and easy to use as a common solution.
3. TPI Application
From the previous analysis of the testability problems and the solutions provided, test points can be inserted to bring the uncontrollable or unobservable logic into the scan chain. With this technique that logic can be tested, and test coverage and test efficiency improve greatly.
Test Point Insertion (TPI) is a useful technique for solving potential testability problems and improving the test coverage of a design by making its uncontrollable logic controllable and its unobservable logic observable. It also improves test efficiency, since higher coverage is obtained with only a few additional test vectors. The technique is easy to put into application, since only a few commands are added to the existing scripts.
For a multi-die package design, the "add net connections" command can be used to handle the un-bonded pads before pattern generation: pads with embedded pull-up or pull-down cells can be defined as TIE1 or TIE0, and pads not connected to any pin during packaging can be defined as floating. The following example removes such primary input or inout pads, so that ATPG excludes them during pattern generation.
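A hedged sketch of the corresponding TetraMAX commands (the pad names are hypothetical, and the exact argument order of "add net connections" is an assumption to be checked in the user guide):
add net connections TIE0 pad_in_a
add net connections TIE1 pad_in_b
remove primary inputs pad_in_c pad_io_d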
Before inserting test points it is important to check the global test coverage and find where the bottleneck is; otherwise the test point efficiency will not be good enough to meet the test target. TetraMAX is recommended for reporting the global test coverage of the scan-inserted netlist, which then guides the TPI. Figure 6 shows the test coverage and fault coverage of the design.
According to the formulas for test coverage and fault coverage, UD (Undetectable) faults are excluded from the test coverage calculation, but AU (ATPG Untestable) faults are included. This is why we focus on AU faults for test point insertion, to make them testable. With the "report faults -class AU" command and its verbosity options, the coverage can be reported per module. This report outlines the test coverage of every module; we can find which macros contribute most to the low coverage, analyze the related logic carefully, and apply TPI where it yields the maximum return with the least added logic. As an example, Figure 7 shows the script command that reports the AU faults in the design.
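A hedged sketch of this analysis step in TetraMAX (legacy command syntax, after the netlist has been read and DRC has passed; flags should be verified against the tool version):
run atpg -auto_compression
report summaries
report faults -class AU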
Figure 8 shows the AU fault report produced by this command. In this example, the main causes of low test coverage are the untestable logic between the memories and the digital interface, and the untestable logic between the analog and digital portions. Because the RAM address and data buses are dedicated to a specific function and share no logic with anything else, these logic cones cannot be controlled from the primary ports in capture mode; they act as sinking points and need test point insertion to raise the coverage.
We recommend including TPI in the traditional scan insertion flow when using DFT Compiler. A few commands are added to the script to define which instances require test points, and then the scan chains are inserted with the original configuration. This flow is convenient for both design review and checking.
Figure 9 below shows the script file for test point insertion. Many options are available for control point and observe point insertion, according to the requirements. In our experience, both observe points and control points matter for test coverage improvement; the most important thing is to choose the right points for the highest test efficiency.
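As a hedged sketch of what such a DFT Compiler TPI script can look like (command names follow the XG-mode documentation of that era, but the exact option spellings and the U_RAM pin names are assumptions to be checked against the user guide):
set_dft_configuration -test_points enable
set_test_point_element [get_pins U_RAM/D*] -type observe
set_test_point_element [get_pins U_RAM/Q*] -type control_0
insert_dft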
The scan chains can then be inserted with the traditional configuration. Figures 10 and 11 show the post-TPI netlist with the added observe points and control points. The inserted DFF instances follow the naming rule "udtp_sink***" by default, where "udtp" stands for "user-defined test point"; the instance named "EOLL" is the XOR gate required by the design.
With the same ATPG options, Figure 12 shows the test coverage and fault coverage of the post-TPI netlist. With only a few additional test patterns, the global test coverage improves greatly and meets our test target. Figure 13 shows the coverage increase for the modules with TPI.
Figure 12: ATPG Test Coverage Report with Post-TPI Netlist
The reports from DFT Compiler and TetraMAX show that a little additional logic can mean a large test coverage improvement. Because only a few instances are added, TPI has little impact on the back-end design flow. There is also no functional difference between the pre-TPI and post-TPI netlists: the added test points consist only of a few multiplexers and XOR gates that are transparent to the design function. The design passes formal equivalence checking with Formality smoothly, and TetraMAX verifies the integrity of the scan chains during ATPG processing.
After analyzing the post-TPI netlist, we conclude that the TPI technique and DFT Compiler are very useful for inserting observe and control points in a test-unfriendly design, improving the test coverage with little area overhead.
4. Conclusion
In the project described above, there are more than ten thousand registers in the original design. With the TPI technique we added only 12 registers and a little combinational logic, and the test coverage increased from 95% to 98.3%. Obviously this technique is efficient and easy to use; more test points can be inserted if higher coverage is needed, and in theory the test coverage can approach 100%.
We strongly recommend that design engineers use the TPI technique in their design flow. By doing so, we can anticipate the different design structures required for functional design and design for testability: on one hand, the functional design remains clean in operation mode; on the other hand, most uncontrollability and unobservability problems are avoided in the DFT design. If such issues remain after the RTL is frozen, DFT tools can insert user-defined test points; DFT Compiler and TetraMAX from Synopsys are capable of this job.
It is also important to choose the right locations for the test points. If a location is chosen improperly, the test coverage will not improve, and it cannot be increased further even with more test patterns.
Based on the analysis and implementation in this project, test coverage and test efficiency were improved greatly with just a few logic gates. The methodology is easy to use: we only needed to add a few commands to our existing scripts. It is useful for almost all DFT designs, especially those requiring higher test coverage; we strongly recommend this technique for improving test coverage.
5. References
[1] DFT Compiler User Guide, Vol. 1: Scan (XG Mode), Version X2005.09, September 2005.
[2] TetraMAX ATPG User Guide, Version X2005.09, August 2005.
The on-chip IEEE 1149.1 JTAG interface and TAP offer an option to share some of their pins during scan test: TCK, TDI and TDO can serve as the scan clock, scan input and scan output respectively. The TMS pin is not shared for scan test, as it is used to sequence the TAP controller (the fifth pin, TRSTN, is optional). In many cases these pins are used to configure the device into different test modes, which makes it difficult to share them for scan test. A simple solution is to put the TAP controller in the Run_Test_Idle state upon entering scan test mode and to force the internal TMS and TRSTN signals of the TAP controller to an appropriate value while releasing the DUT pins. However, control of the TAP FSM is then lost, and a power cycle is required to reset it and regain control of the pins.
Such a power-up has many implications in an SoC, including ATE and DUT-internal test time spent initializing several on-chip functions, including embedded power management. (Our focus, however, is not so much on reducing the test mode control pins, namely JTAG, as on eliminating the scan control pins.) A novel mechanism has been developed through which these two pins, TMS and TRSTN, are also shared dynamically for scan test. At any point during or after scan test, functional control of these two pins can be regained without an additional power-up.
In the proposed solution, the TAP controller is kept in the Run_Test_Idle state; when the scan enable pin is asserted, the internal TMS and TRSTN signals at the TAP controller are forced to suitable logic levels, allowing these two pins to be shared for scan test. During the capture phase, when scan enable is de-asserted, functional control of these two pins is regained and they are held at the desired functional values to improve coverage. With such an implementation it is possible to share these pins for scan test, reducing the number of tester-contacted pins or increasing the scan channel bandwidth. Additionally, it is possible to combine pattern-detection-based internal generation of the SE signal with the proposed dynamic sharing of all JTAG pins.
For details:
https://ieeexplore.ieee.org/document/8326907/
Posted 8th January by Raj
1. Introduction
The main issue with the classical scan pattern verification flow comes from its first step: validating a bypass pattern on a big System on Chip requires so much runtime that it prevents any quick debug. Any change in a pattern or in the design forces a simulation restart that can take dozens of hours before the first bit is shifted out.
Figure 1 : switching a design in scan bypass mode
The first drawback of the classical flow is therefore the runtime of bypass scan pattern simulation. It nevertheless has to be the first step because, on the other hand, the debug capability on compressed patterns is close to nil with a classical verification flow. While a bypass pattern allows identifying a clear relationship between a bit shifted at the input of the design and a given scan chain flip-flop, this is no longer the case with compressed patterns: from the simulation's point of view, the decompactor and the compressor become a scan data encoder. A simulation failure on a scan channel behind the decompressor cannot be linked to a given flip-flop, and the pattern then cannot be debugged.
As illustrated by the following table, with the classical flow it is in most cases impossible to know which flip-flop is failing when running a simulation in serial mode.
As explained before, there is almost no debug capability when simulating compressed scan patterns in serial mode. But considering the runtime required to get the result of a scan shift simulation, a compressed pattern is much faster. An easy and accurate estimate of the runtime difference between the two simulations (bypass shift versus compressed-mode shift) can be made by comparing the maximum scan chain length in each mode. An SoC with 1,000 internal scan segments of 256 flip-flops in compressed mode, connected to the tester through 16 scan inputs and 16 scan outputs, has 16 bypass scan chains of 16,000 flip-flops. The bypass pattern therefore requires 62.5 times more shift cycles than the compressed pattern; neglecting the setup phase, the bypass pattern simulation is around 60 times slower to produce a result than the compressed one.
                         Debug Capability    Run Time
Bypass Simulation               +                -
Compressed Simulation           -                +
TetraMAX can generate both serial and parallel STIL patterns for simulation, so for the same set of patterns it is possible to obtain equivalent serial and parallel simulation data. As explained earlier, serial simulations require significant runtime, particularly bypass simulations; yet in a serial simulation, pattern debug is only possible on bypass patterns. In this chapter we describe a flow that makes it possible to debug compressed patterns during a serial simulation.
The STIL format contains a full description of each internal scan segment. The proposed flow is based on a test bench generator that combines the information available in both the serial and the parallel STIL files, and it aims to provide the same debug capability for a serial compressed pattern as for a bypass one. The debug is made possible by the parallel information included in the parallel STIL pattern.
To make this conversion, the first information required is the composition of each internal scan segment, including the order of the cells in each segment and any potential inversion between flip-flops. The STIL format provides this information: for each internal scan segment, the STIL file gives the full path to each scan element, and the character ! marks any inversion along the scan path.
Figure 2 : description of internal scan segment
The parallel format of a pattern describes the expected value at the end of each internal segment; it therefore provides the value to spy before the compactor when running a serial simulation. Combined with the serial pattern, it tells us, at each shift-out cycle, the expected value at each internal scan segment output as well as the expected value on the scan-out port of the design.
Figure 4 : Update the internal scan check with the number of shift cycles
Integrating these checks into a serial compressed-mode simulation makes it possible to link a failure occurring on a scan output of the design to an internal scan segment.
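As a minimal Tcl sketch of the spy idea (names are hypothetical): given the expected value at the end of an internal segment (from the parallel STIL) and the number of ! inversions between that segment output and the design scan-out (from the serial STIL), the bit expected on the scan-out at the corresponding shift cycle is:
proc expected_scanout_bit {segment_end_value num_inversions} {
    # each ! along the path inverts the shifted bit once
    if {$num_inversions % 2} {
        return [expr {1 - $segment_end_value}]
    }
    return $segment_end_value
}
puts [expected_scanout_bit 1 3]   ;# an odd number of inversions flips 1 to 0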
Then, thanks to the reconstruction of the internal scan segments illustrated in Figure 2, the generated test bench can provide important debug information:
The shift cycle where the fail occurs
The failing internal scan segments (there may be one or more)
The failing DFF, with the expected and simulated values (thanks to the inversion information along the scan chain)
With this debug information it is possible to clearly identify the reason for a fail on any compressed pattern.
The following figure shows the overall proposed flow. As described, most of the required data are classical TetraMAX outputs. The basis of the test bench is the one provided by Max Testbench (the STIL2VERILOG script). An extra script, called "Internal Scan Segment Spy TB Generation", combines all the data TetraMAX can provide in order to create an efficient test bench for serial simulation of compressed patterns. This is the script that was specifically developed to support the proposed methodology.
The test bench produced by this custom script keeps the same debug capabilities as a bypass pattern serial simulation.
4. Conclusion
Accelerating scan pattern debug and the related debug capabilities represents real value in System on Chip development. The gain is even bigger when multiplied by the number of compressed modes required in a system.
In some particular cases you may have no choice but to run most of your debug in serial mode. In such cases the classical debug approach leads to the inability to debug efficiently, and time is wasted for hypothetical results.
Thanks to the serial and parallel pattern formats provided by TetraMAX, it is possible to achieve the same level of debug in serial simulation whatever the type of scan pattern (bypass or compressed).
The overall gain of this simulation flow depends on the number of serial simulations required by your design test plan. But there is at least one gain common to all projects: it is no longer necessary to start debugging the scan shift with a bypass pattern. Since the same level of debug capability is kept on compressed patterns, you can speed up shift-phase debug by running a compressed pattern, which delivers the first shift-out much faster than the standard initial debug through a bypass scan check.
Problem
Why does the post layout netlist get low coverage if scan reordering is done
during place & route?
Solution
If a scan chain contains both positive- and negative-edge-triggered flops, there is a possibility that the reordered design will suffer from lower coverage.
When the netlist is synthesized, the synthesis tool inserts a lockup latch wherever a positive-edge-triggered flop is followed by a negative-edge-triggered flop in the same scan chain, or wherever two consecutive flops are driven by different clock domains. The synthesis tool also tries to place all negative-edge flops before all positive-edge flops so that lockup latches can be avoided.
Now consider the case where the scan chain is reordered during Place and Route. The P&R tool does not account for the positive- or negative-edge nature of the flops during reordering, so the result may be a positive-edge flop followed by a negative-edge flop. A lockup latch is needed in that case, but P&R will not insert it by itself, and this causes the low coverage.
The solution is to examine the scanDEF after P&R and check whether lockup latches are needed. If possible, switch off reordering of the scan chains; the problem can also be avoided by modifying the scanDEF passed from synthesis to P&R.
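For example, in the Encounter Digital Implementation System (whose -reorderScan option appears later on this page), reordering can be switched off before placement:
setPlaceMode -reorderScan false
placeDesign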
Problem
What needs to be considered before doing scan chain reordering on a post-layout netlist?
Why is there a difference in scan chain length between the pre-layout and post-layout netlists when scan chain reordering is done on a post-layout netlist?
Solution
While doing scan chain reordering you need to take care that, wherever the transmitting flop is positive-edge-triggered and the receiving flop is negative-edge-triggered, a lockup latch is inserted. Otherwise the receiving flop may not be seen as a scannable flop, resulting in a reduced scan chain length and coverage loss.
Even if the original design has lockup latches before reordering, you should check that the final netlist has handled them properly.
What is the best way to reorder the scan chains within a partition (swap the
regs between chains)?
Problem
I define my scan chains using a scan DEF file, in which I use the PARTITION keyword to identify compatible scan chains. How do I enable scan reordering in the Encounter Digital Implementation System so that registers are swapped between compatible scan chains? I want to enable swapping because it reduces the length of the scan route and thereby reduces congestion in the design. I do not see much improvement with the following flow:
setPlaceMode -reorderScan true # default is true
placeDesign
optDesign -preCTS
scanReorder
The following is an example of my scan chain definitions in the DEF:
VERSION 5.5 ;
NAMESCASESENSITIVE ON ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
DESIGN shift_reg ;
SCANCHAINS 2 ;
- Chain1_seg2_clk_rising
+ PARTITION p_clk_rising
# MAXBITS 248
+ START Q_reg Q
+ FLOATING
q_reg[0] ( IN SI ) ( OUT Q )
q_reg[1] ( IN SI ) ( OUT Q )
...
+ STOP q_reg[249] SI
- Chain1_seg4_clk_rising
+ PARTITION p_clk_rising
# MAXBITS 248
+ START q_reg[249] Q
+ FLOATING
q_reg[250] ( IN SI ) ( OUT Q )
q_reg[251] ( IN SI ) ( OUT Q )
q_reg[252] ( IN SI ) ( OUT Q )
q_reg[253] ( IN SI ) ( OUT Q )
q_reg[254] ( IN SI ) ( OUT Q )
q_reg[255] ( IN SI ) ( OUT Q )
I have also confirmed that these are compatible, by
running reportScanChainPartition:
<CMD> reportScanChainPartition
Info: Scan Chain Partition Group set to:
Partition group: p_clk_rising
Chain: Chain1_seg4_clk_rising
Chain: Chain1_seg2_clk_rising
How do I enable the swapping of cells between compatible scan chains?
Problem
The scan DEF written from RC/Genus is generally different from the scan chain report. Although both describe the same chains, the segments and the numbers of sequential elements in the two reports differ, which leads to much confusion.
Solution
This solution explains the possible differences between the scan DEF and the scan chain report, and the reasons behind those differences.
Basic difference between the scan DEF and the scan chain report
The scan DEF is not a connectivity report, whereas the scan chain report provides connectivity information.
The chain reported from the netlist can differ from the scan DEF because the scan DEF lists only the re-orderable sequential elements, whereas the scan chain report lists the complete chain with all its sequential elements.
Scan DEF report
+ START BLOCK_2/DFT_lockup_g1 Q
+ FLOATING
BLOCK_3/Q_reg ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[0] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[1] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[2] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so2
The reason the number of elements in the scan DEF differs from the scan chain report
The scan DEF generally has fewer elements than the scan chain report. Flops belonging to preserved segments, abstract segments, or any other non-re-orderable segment do not become part of the scan DEF, whereas the scan chain report includes every sequential element that passes the DFT rules and is scan-mapped.
Scan chain report
Chain 1: sc1
scan_in: si1
scan_out: so
shift_enable: SE (active high)
clock_domain: test_domain1 (edge: rise)
length: 11
START segment abs1 (type: abstract)
# @ bit 1, length: 7
pin BLOCK_1/SI_1 <clk1 (rise)>
pin BLOCK_1/SO_1 <clk1 (rise)>
END segment abs1
llatch 7 DFT_lockup_g1 <clk1 (low)>
bit 8 BLOCK_2/Q_reg <clk2 (rise)>
bit 9 BLOCK_2/q_reg[3] <clk2 (rise)>
bit 10 BLOCK_2/q_reg[4] <clk2 (rise)>
bit 11 BLOCK_2/q_reg[5] <clk2 (rise)>
Chain 2: sc2
scan_in: si2
scan_out: so2
shift_enable: SE (active high)
clock_domain: test_domain1 (edge: rise)
length: 10
START segment fixed_Segment_1 (type: fixed)
# @ bit 1, length: 3
bit 1 BLOCK_2/q_reg[0] <clk2 (rise)>
bit 2 BLOCK_2/q_reg[1] <clk2 (rise)>
bit 3 BLOCK_2/q_reg[2] <clk2 (rise)>
END segment fixed_Segment_1
llatch 3 BLOCK_2/DFT_lockup_g1 <clk2 (low)>
bit 4 BLOCK_3/Q_reg <clk3 (rise)>
bit 5 BLOCK_3/q_reg[0] <clk3 (rise)>
bit 6 BLOCK_3/q_reg[1] <clk3 (rise)>
bit 7 BLOCK_3/q_reg[2] <clk3 (rise)>
bit 8 BLOCK_3/q_reg[3] <clk3 (rise)>
bit 9 BLOCK_3/q_reg[4] <clk3 (rise)>
bit 10 BLOCK_3/q_reg[5] <clk3 (rise)>
Scan DEF report
- sc1_seg2_test2_rising
# + PARTITION p_test2_rising
# MAXBITS 4
+ START DFT_lockup_g1 Q
+ FLOATING
BLOCK_2/Q_reg ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so;
- sc2_seg2_test3_rising
# + PARTITION p_test3_rising
# MAXBITS 7
+ START BLOCK_2/DFT_lockup_g1 Q
+ FLOATING
BLOCK_3/Q_reg ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[0] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[1] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[2] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so2;
In the above reports, the scan chain report contains both abstract and fixed segments; the scan DEF does not, because those segments are not re-orderable.
The reason for differences in the number of scan_partition
Scan DEF report
SCANCHAINS 52 ;
- sc1_seg6_test1_rising
# + PARTITION p_test1_rising
# MAXBITS 3
+ START BLOCK_1/Q_reg QN
+ FLOATING
BLOCK_1/q_reg[0] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[1] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[2] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[3] SI;
- sc1_seg8_test1_rising
# + PARTITION p_test1_rising
# MAXBITS 3
+ START BLOCK_1/q_reg[4] QN
+ FLOATING
BLOCK_1/q_reg[5] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[6] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[7] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[8] SI;
- sc1_seg10_test1_rising
# + PARTITION p_test1_rising
# MAXBITS 3
+ START BLOCK_1/q_reg[9] QN
+ FLOATING
BLOCK_1/q_reg[10] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[11] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[12] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[13] SI;
The reason the scan DEF starts at a combinational output while the scan chain starts from a flop
If there is an ordered segment at the beginning of the chain containing a non-re-orderable element and an inverter, then after the scan DEF is written out, the scan DEF chain starts at the inverter output. See the following example for the differences:
Scan DEF report
- wrp_in_chain1_seg1_clk_rising
+ PARTITION clk_rising
MAXBITS 13
+ START u_input_stage_testpoint_115 ZN
In this case, because the ordered scan segment at the beginning of the scan chain is a preserved segment, the connection between the flop and the inverter has to be preserved. This is why the START of the scan DEF chain is at the inverter output; and because the preserved segment does not become part of the scan DEF, only the inverter appears.
The reason for an empty scan DEF after insertion of compression logic
With compression, it is the channels, not the chains, that are dumped into the scan DEF. If the compression ratio is such that the total number of flops in all the channels is three or fewer, the scan DEF will be empty because nothing is re-orderable.
Posted 29th August 2017 by Raj
http://www.synopsys.com/tools/implementation/rtlsynthesis/capsulemodule/dft_wpa4.pdf
Posted 30th May 2016 by Raj
Test procedure files describe the operation of the scan circuitry to the ATPG tool. They contain cycle-based procedures and timing definitions that tell the DFT tools how to operate the scan structures in the design. Before running ATPG you must have a test procedure file ready.
To specify a test procedure file in setup mode, use the Add Scan Groups command. The tools can also read procedure files via the Read Procfile command, or via the Save Patterns command when not in setup mode. When you load more than one test procedure file, the tool merges the timing and procedure data.
The shift and load_unload procedures define how the design must be configured to allow shifting data through the scan chains. Each procedure names the timeplate it uses and the scan group it references. The following are example timeplate definitions and shift and load_unload procedures:
timeplate gen_tp1 =
force_pi 0;
measure_po 1;
pulse clk1;
pulse clk2;
period <integer>;
end;
procedure shift =
scan_group grp1 ;
timeplate gen_tp1 ;
cycle =
force_sci ;
measure_sco ;
pulse clk ;
end;
end;
procedure load_unload =
scan_group grp1 ;
timeplate gen_tp1 ;
cycle =
force <control_signal1> <off_state> ;
force <control_signal2> <off state> ;
... ;
force scan_en 1 ;
end ;
apply shift <number of scan cells in scan chain>;
end;
TimePlate Definition
The timeplate definition describes a single tester cycle and specifies where within that cycle all event edges are placed. A procedure file must have at least one timeplate definition.
pulse <pin_name> <time> <width>
A statement that pulses the given pin at the specified offset and width. This statement is required for any pin that is not defined as a clock pin by the Add Clocks command but will be pulsed within this timeplate.
force_pi / measure_po <time>
A literal and string pair that specifies the force time for all primary inputs (force_pi) or the measure time for all primary outputs (measure_po).
bidi_force_pi / bidi_measure_po <time>
A literal and string pair that specifies the force or measure time for all bidirectional pins. This allows the bidirectional pins to be forced after the tri-state control signal has been applied, so the system avoids bus contention. These statements override force_pi and measure_po.
period <time>
A literal and string pair that defines the period of a tester cycle.
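As an illustration, a filled-in timeplate might look like this (the times are arbitrary, in tester time units, and must be adapted to the actual tester cycle):
timeplate tp_example =
    force_pi 0 ;
    measure_po 10 ;
    pulse clk 20 10 ;
    period 40 ;
end ;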
Parallel patterns force all flops in parallel (at the same instant) at their SI pins and measure at their SO pins. These patterns are used to simulate the patterns faster: only two cycles are required per pattern, one to force all the flops and one to capture.
Serial patterns are the ones used at the tester: they are shifted in serially, captured, and shifted out.
Posted 24th May 2016 by Raj
If a chip has a hold violation, one question will arise in the mind: 'Will it work if we change the frequency?'
As shown in the figure, data must remain stable within the Tsetup and Thold window.
Fig: Hold violation
The figure shows a hold violation: the data changes inside the Thold timing window. By definition, data should remain stable for some minimum time after the active edge of the clock.
Fig: data traveling from ff1 to ff2
data1 = data at ff1
data2 = data at ff2
clock1 = launch clock edge
clock2 = capture clock edge
At clock1, data1 is launched from ff1 toward ff2; at clock2, data2 (the data previously launched from ff1) should be captured at ff2. If the new data arrives too fast, it overrides the data captured by the previous clock edge before ff2 has safely stored it, and the functionality of the chip fails.
If the sum of the combinational logic delay (Tcombo) and the clock-to-Q delay (Tc2q) is less than the hold time of ff2, the data arrives too fast: there is no setup violation, but there is a hold violation, because the data already captured at ff2 (data2) is overridden by data1 at the clock2 edge.
So a hold violation depends only on Tcombo and Tc2q, and neither of these depends on the clock period or operating frequency; changing the frequency therefore cannot fix it.
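As a quick numeric illustration (values assumed): if Tc2q = 0.1 ns, Tcombo = 0.05 ns and the hold requirement of ff2 is 0.3 ns, the data path delay is 0.15 ns, which is less than 0.3 ns, so the hold check fails. No term in this comparison contains the clock period, so lowering the frequency changes nothing.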
To understand why setup and hold times arise in a flip-flop, one needs to begin by looking at its basic function. The flop considered here is positive-edge-triggered, because its output changes at the positive edge of clk.
Setup time: the minimum amount of time before the clock's active edge for which the data must be stable in order to be latched correctly.
Hold time: the minimum amount of time after the clock's active edge during which the data must remain stable.
Here, setup and hold times are measured with respect to the active clock edge only.
This defines the reason for the setup time within a flip-flop: the data must be stable before the active edge of the clock for at least the delay from the D input to the Z node of the latch unit of the flip-flop, and this delay defines the setup time of the register.
Note:
When clock = 0, the LHS part of the flop is active and the RHS part is inactive, because the clock is inverted in the RHS region. Likewise, when clock = 1, the LHS part is inactive and the RHS part is active and reflects the result of the D input.
The flop is built from two latch units working in master-slave fashion, so we can regard the LHS part as Latch-1 and the RHS part as Latch-2.
There may be combinational logic sitting before the first transmission gate (here, the inverter on the input path from D to W). It introduces a certain delay in the path from the input D to the transmission gate, and this delay establishes whether the hold time is positive, negative or zero: the relationship between this delay and the time at which the transmission gate switches on and off (after clk and clkbar) gives rise to the various kinds of hold time that exist, positive, negative or zero.
Placement
Congestion: if the number of routing tracks available for routing is less than the number required, the design is congested.
Standard cells: the designer uses predesigned logic cells such as AND gates, NOR gates, etc.; these are called standard cells. The advantage of standard cell ASICs is that designers save time and money and reduce risk by using a predesigned, pre-tested standard cell library.
Tie cells: tie cells connect floating inputs to either VDD or VSS without changing the logic functionality of the circuit.
Spare cells: whenever a functional ECO (Engineering Change Order) needs to be performed, spare cells are used. These are extra cells floating in the ASIC design; they are part of the standard cell library and allow additional functionality to be added after the base tapeout of the chip.
End cap cells: end caps are placed at the ends of cell rows and handle end-of-row well tie-off requirements. They are used to connect power and ground rails across an area, and to ensure that gaps do not occur between well or implant layers, which could cause design rule violations.
Prerequisites of CTS include ensuring that the design is placed and optimized, that the clock tree can be routed (i.e., congestion issues are taken care of), and that the power and ground nets are pre-routed. The inputs are the placement database (or the design exchange format file from the placement stage) and the clock tree constraints.
Congestion effect:
During placement, optimization may make the scan chain difficult to route due to congestion, so the tool reorders the chain to reduce congestion.
Timing effect:
Reordering sometimes increases hold problems in the chain; buffers may have to be inserted into the scan path to fix them. The tool may not be able to keep the scan chain length exactly the same, and it cannot swap cells between different clock domains.
Because of scan chain reordering, the patterns generated earlier are no longer usable. This is not a real problem, since ATPG can simply be rerun on the new netlist.
Posted 10th December 2015 by Raj
It is a well-known fact that DFT shifting is done at a slower frequency. Here are some arguments against this.
The lower the frequency, the longer the test time. In modern SoCs, tester cost (which is directly proportional to tester time) accounts for roughly 40% of the selling price of a single chip. It would seem pragmatic to decrease test time by increasing the frequency, no?
Increasing the frequency would not pose any timing issue either: hold would be met anyway (hold checks are independent of frequency), and setup would never be in the critical path, since during shift the scan chain only involves the direct path from a flop's output to the scan input pin of the next flop, devoid of any logic.
Then why not test at a higher frequency, at least one closer to the functional frequency? What could possibly be the reason for testing at a slower frequency?
Answer:
Unlike functional mode, where different paths have varying combinational logic between registers, in shift mode there is absolutely no logic at all, so all the flops tend to switch at the same time. Imagine all the flops switching simultaneously: the peak power consumption, which is directly proportional to the switching frequency, would shoot up, maybe up to the point that the IC might catch fire!
Also, in functional mode the entire SoC does not operate simultaneously: depending on the use case, some portions either do not work or work in tandem.
You might argue that shift could be run the same way, i.e., different parts in tandem. But that would mean longer test times, which is exactly what we intended to reduce by increasing the shift frequency in the first place.
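As a hedged back-of-envelope illustration of the test-time argument above (all numbers assumed):
set num_patterns 10000      ;# assumed pattern count
set chain_length 16000      ;# assumed longest scan chain, in flops
set shift_hz     25.0e6     ;# assumed 25 MHz shift clock
# shift cycles ~ patterns x chain length; test time = cycles / frequency
puts [expr {$num_patterns * $chain_length / $shift_hz}]   ;# ~6.4 seconds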
2. In the question below, we intend to check node X for a stuck-at-0 fault. Can you tell what input vector (A, B, C) we would need to apply to do so?
Answer:
The vector ABC = 100 detects the s@0 fault at that node. In general, a vector that detects a stuck-at-0 fault must drive the node to 1 (the opposite of the stuck value) and propagate the node's value to an observable output.
Setup and hold checks are the most common types of timing checks used in timing verification. Synchronous inputs (e.g., D) have setup and hold time specifications with respect to the clock input. These checks require that the data input remain stable for a specified interval before and after the clock input changes.
- Setup time: the amount of time the data at the synchronous input (D) must be stable before the active edge of the clock.
- Hold time: the amount of time the data at the synchronous input (D) must be stable after the active edge of the clock.
Both setup and hold times for a flip-flop are specified in the library.
Setup Time
Setup time is the amount of time the synchronous input (D) must arrive, and be stable, before the capturing edge of the clock, so that the data can be stored successfully in the storage device.
Setup violations can be fixed either by slowing down the clock (increasing the period) or by decreasing the delay of the data path logic.
Setup information in the .lib:
timing () {
  related_pin : "CK";
  timing_type : setup_rising;
  fall_constraint(Setup_3_3) {
    index_1 ("0.000932129,0.0331496,0.146240");
    index_2 ("0.000932129,0.0331496,0.146240");
    values ("0.035190,0.035919,0.049386", \
            "0.047993,0.048403,0.061538", \
            "0.082503,0.082207,0.094815");
  }
}
Hold Time
Hold time is the amount of time the synchronous input (D) must stay stable after the capturing edge of the clock, so that the data can be stored successfully in the storage device.
Hold violations can be fixed by increasing the delay of the data path, or by decreasing the clock uncertainty (skew) if specified in the design.
Hold information in the .lib:
timing () {
  related_pin : "CK";
  timing_type : hold_rising;
  fall_constraint(Hold_3_3) {
    index_1 ("0.000932129,0.0331496,0.146240");
    index_2 ("0.000932129,0.0331496,0.146240");
    values ("-0.013960,-0.014316,-0.023648", \
            "-0.016951,-0.015219,-0.034272", \
            "0.108006,0.110026,0.090834");
  }
}
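In these lookup tables, index_1 and index_2 are typically the data and clock pin transition times, and values holds the constraint (in library time units) that the tool looks up and interpolates at the actual transitions. Note that most of the hold values above are negative: a negative hold time means the data may change slightly before the clock edge and still be captured correctly.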
Timing paths
Timing Path
A timing path is the path between a start point and an end point, defined as follows:
Start Point: all input ports, and the clock pins of sequential elements, are valid start points.
End Point: all output ports, and the D pins of sequential elements, are valid end points.
For Static Timing Analysis (STA), the design is split into timing paths, and the delay of each path is calculated from gate delays and net delays. In a timing path, data is launched, traverses combinational elements, and stops when it encounters a sequential element. In general (there are exceptions), the delay requirement of a path must be satisfied within one clock cycle.
For a timing path whose start point and end point are both sequential elements, if the two elements are triggered by two different but related clocks, the least common multiple (LCM) of the two clock periods is considered to find the launch edge and capture edge for setup and hold timing analysis.
Different Timing Paths
Any synchronous design is split into various timing paths, and each timing path is verified for its timing requirements. In general, four types of timing paths can be identified in a synchronous design:
- Input to Register
- Input to Output
- Register to Register
- Register to Output
Input to Output: starts at an input port and ends at an output port. This is a purely combinational path; you will rarely find one in a synchronous design.
Input to Register: semi-synchronous; the register is controlled by the clock, but the input data can arrive at any time.
Register to Register: purely sequential; both the launching and capturing flops are controlled by the clock.
Register to Output: starts at a register and ends at an output port, where the data can leave at any point in time.
Clock path
The path along which the clock travels is the clock path. A clock path contains only clock buffers and clock inverters. A clock path may pass through a gating element to achieve additional advantages; in that case the characteristics and definition of the clock change accordingly, and we call it a "gated clock path". The main advantage of clock gating is dynamic power saving.
Data path
The path along which data travels is the data path. A data path is purely combinational and can contain any basic combinational gates or groups of gates.
Launch path
The launch path is the part of the clock path responsible for launching the data at the launch flip-flop. The launch path and the data path together make up the arrival time of the data at the input of the capture register.
Capture path
The capture path is the part of the clock path responsible for capturing the data at the capture flip-flop. The capture clock period and the capture path delay together make up the required time of the data at the input of the capture register.
Slack
Slack is defined as the difference between the achieved time and the desired time for a timing path; it determines whether the design works at the specified speed or frequency.
The data path delay is the time required for data to travel through the data path, and the clock path delay is the time taken for the clock to traverse the clock path. Setup (or hold) slack is defined as the difference between the data required time and the data arrival time.
Zero setup slack means the design works exactly at the specified frequency, with no margin available. Negative setup slack means the design does not achieve the constrained frequency and timing; this is called a setup violation.
Register to Register:
Data arrival time is the time required for data to propagate through the source flip-flop, travel through the combinational logic and routing, and arrive at the destination flip-flop before the next clock edge occurs.
Arrival Time = Tclk-q + Tcombo
Required Time = Tclock - Tsetup
Setup Slack = Required Time - Arrival Time = (Tclock - Tsetup) - (Tclk-q + Tcombo)
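For example, with assumed values Tclock = 10 ns, Tsetup = 0.2 ns, Tclk-q = 0.3 ns and Tcombo = 8 ns: Required Time = 10 - 0.2 = 9.8 ns, Arrival Time = 0.3 + 8 = 8.3 ns, and Setup Slack = 9.8 - 8.3 = +1.5 ns, so the path meets setup with 1.5 ns of margin.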
Reg to Output:
Data arrival time is the time required for data to leave source flip-flop, travel
through combinational logic and interconnects and leave the chip through
output port.
Input to Register:
Data arrival time is the time required for the data to start from the input port, propagate through the combinational logic, and end at the data pin of the flip-flop.
Arrival Time = Tcombo
Required Time = Tclock - Tsetup
Setup Slack = Required Time - Arrival Time = (Tclock - Tsetup) - Tcombo