
Test Point Insertion for Test Coverage Improvement in DFT

Hongwei Wang
STMicroelectronics (Shenzhen) R&D Co., Ltd.
Abstract
In a complex ASIC design there is usually some logic that is uncontrollable or
unobservable. Because this logic is difficult or impossible to control and/or
observe, it is difficult or impossible to test, and the result is low test
coverage, which in turn creates a reliability risk for the part. Test Point
Insertion is an efficient technique that improves a design's testability and
test coverage by adding a small amount of simple control and/or observation
logic.
This paper presents an example of Test Point Insertion, drawn from a real
project that used DFT Compiler and TetraMAX to improve test coverage. We analyze
and explain the main causes of low test coverage and then provide a solution for
improving it. By comparing the pre- and post-Test Point Insertion results, we
can see that test coverage and test efficiency are greatly improved with just a
few test points and a few added logic gates. The paper analyzes the causes of
low test coverage and introduces a test design flow using the Test Point
Insertion technique. Test Point Insertion can solve two typical test issues that
lower test coverage: shadow logic between digital logic and a black box, and
un-bonded pads in a multi-die chip.
Key words: Test Point Insertion, Test Coverage, DFT.
1. Frequently Encountered Problems
To speed up ASIC design and reduce the time to market and mass production of an
electronics product, the development schedules for Very Large Scale Integrated
circuit (VLSI) design and manufacturing have become shorter and shorter. Design
engineers must anticipate possible manufacturing defects and be able to debug
them, and Design for Testability (DFT) plays an important role in improving
product yield.
Test coverage and test efficiency are the most important measures of design
quality when DFT techniques are used. A good design with high test coverage
should be observable and controllable. It is understandable that 100% test
coverage is difficult to achieve. To save test cost, design engineers should aim
for fewer test patterns with higher coverage, so test efficiency is also very
important.
In addition, DFT engineers must guarantee that the function remains unchanged
during DFT design. In some designs, mission-mode functional logic is
uncontrollable or unobservable, and this logic potentially causes many test
problems. Test coverage for such designs cannot be improved without advanced
test techniques; simply increasing the number of test patterns is not enough.
Test coverage is defined as the percentage of detected faults out of the total
detectable faults; it is the more meaningful measure of test pattern quality.
Fault coverage is defined as the percentage of detected faults out of all
faults, including the undetectable ones, so by definition test coverage is
always at least as high as fault coverage. With the default credits listed
below, the two metrics are calculated as:

    test coverage  = (DT + pt_credit x PT) / (total faults - UD)
    fault coverage = (DT + pt_credit x PT) / (total faults)

DT: Detected; AU: ATPG Untestable; UD: Undetectable;
ND: Not Detected; PT: Possible Detected
Default values: pt_credit = 50%; au_credit = 0%
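As a worked illustration (the numbers here are invented, not taken from the
project): suppose a design has 100,000 total faults, of which DT = 94,000,
PT = 1,000, AU = 3,000, ND = 1,000 and UD = 1,000. With the default credits,
test coverage = (94,000 + 0.5 x 1,000) / (100,000 - 1,000) = 95.45%, while
fault coverage = (94,000 + 0.5 x 1,000) / 100,000 = 94.5%. The 3,000 AU faults
remain in the denominator of test coverage, which is exactly why turning AU
faults into detectable ones with test points raises the reported coverage.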
In a real design there are usually two typical types of logic that impact test
coverage. The first resides in the input/output shadow logic between the digital
logic and a black box. The second sits behind the input/output pads that are
left un-bonded in a multi-die package.
Figure 1 shows a multi-die package. In this application, all pads except test_si
and test_so are not bonded out; the unbonded input and output pads are tied to
ground or left floating. Therefore the logic cones related to port_A, port_B and
port_C are uncontrollable, while the logic related to port_Z1, port_Z2 and
port_Z3 is unobservable. This in turn lowers test coverage, because much of the
logic in the design is not testable.
Figure 1: Uncontrollable or Unobservable Logic Caused by Packages

Generally speaking, the primary input and output pads provide controllability
and observability in a DFT design. It is usually impossible to reserve enough
dedicated pads for every design because of the limited number of IO pads. Analog
macros such as PLLs, ADCs and memories are normally treated as black boxes
during ATPG. Figure 2 shows a digital/black-box interface; the RAM macro below
is an example of a black box. Since addr_bus, din_bus and net_1 go directly into
the memory pins, the related logic cone "Com. Logic 1" sinks into the memory
input pins; this logic cannot be observed directly and therefore cannot be
tested, so the test coverage of the design is low. Meanwhile, at the memory
output pins, the nets dout_bus, net_2 and net_3 cannot be controlled directly
because they are driven by the memory's outputs, which are treated as "X" during
Automatic Test Pattern Generation; therefore "Com. Logic 2" cannot be tested
either. Because of these problems, the test coverage of this design is not high
enough to meet the DFT target.

Figure 2: Uncontrollable or Unobservable Logic Caused by Black-Box Interface

Clock gating may also cause testability problems for Automatic Test Pattern
Generation. To make the clock output of a clock-gating cell controllable, we can
use a control signal to bypass the cell or make it transparent during test or
scan shift. There are two choices for this control signal: test mode or test
enable. Test enable is the recommended clock-gating control, because test mode
stays at "1" throughout all test procedures while test enable is "1" only during
shift. Sometimes we have to use the test mode signal because of interactions
with other modes; in that case, test points can be inserted to recover the lost
test coverage.
2. Solution
To fix the low global test coverage, we focus on making the related logic
controllable and observable, either during RTL design or during scan chain
insertion. This helps us understand how to write testable RTL code during
functional design and how to use the Test Point Insertion (TPI) features of DFT
Compiler.
Figure 3 gives a solution that improves test coverage by adding test points for
the un-bonded pads of Figure 1. A multiplexed register is inserted at the output
of the input pads, as illustrated in the figure. When the test enable signal TE
is "0", the circuit works in normal operation mode and the functional logic
receives its normal data from the primary inputs. When TE is "1", the circuit
works in test mode: during the shift process the pre-load bits are shifted
through the scan chain, and during the capture process a pull-down or pull-up is
applied to prevent the test logic from global "X" propagation.
For the output pads, some XOR cells and multiplexer cells are inserted. When TE
is "0", the circuit works in normal operation mode and the pads output the
normal functional response. When TE is "1", the circuit works in test mode: the
XOR cells account for the un-bonded pads, so these ports can be observed
equivalently.

Figure 3: Test Point Insertion to Improve Test Coverage (for Un-bonded Pins)

As for the interface between the logic and the black box shown in Figure 2, the
black box input signals come from combinational logic and go straight into the
black box; this logic is not observable, so test coverage is low. The black box
output signals directly drive the next level of combinational logic; since they
come from the black box, they are uncontrollable in test mode, so that logic is
not controllable either, which also lowers coverage. Similar to the solution for
un-bonded pads, Figure 4 shows the solution to improve testability: we insert
test points and bring the above-mentioned signals into the scan chain to make
them controllable and observable, which finally improves test coverage and
efficiency.
In Figure 4, some XOR cells and multiplexer cells are added at the input pins of
the black box; these cells tap the inputs of the black box and are stitched into
the scan chain. When the control signal is "0", the circuit works in normal
operation mode and the black box inputs receive their signals from the
functional input ports. When the control signal is "1", the circuit works in
test mode: the previously unobservable signals pass through the XOR cells and
can be observed transparently.
Also, one multiplexed register is added at each output pin of the black box to
control the next-level combinational logic. When the control signal is "0", the
circuit works in normal operation mode and the functional logic receives its
normal data from the black box. When the control signal is "1", the circuit
works in test mode: pre-load data are shifted through the scan chain during the
shift process, and in the capture process a ground connection is used to prevent
the test logic from global "X" propagation.

Figure 4: TPI Solution for Test Coverage Improvement (Digital-Analog Interface)

For a design with black box modules, it is recommended to take them into account
early, during RTL design, so that design engineers can balance functionality and
testability at the same time. If testability was not considered during
functional design, TPI techniques can be used to solve the design's testability
problems; they are flexible and easy to use as a common solution.
3. TPI Application
From the previous analysis of the testability problems and the solutions
provided, test points can be inserted to bring the uncontrollable or
unobservable logic into the scan chain. With this technique, that logic can be
tested, and test coverage and test efficiency improve greatly.
Test Point Insertion (TPI) is a useful technique for solving potential
testability problems and improving a design's test coverage by making its
uncontrollable logic controllable and its unobservable logic observable. It also
improves test efficiency, since higher coverage is obtained with only a small
increase in test vectors, and it is very easy to apply, since only a few
commands need to be added to the existing scripts.
For the multi-die package design, the "add net connections" command can be used
to deal with the unbonded pads before pattern generation. These unbonded pads
can be defined as TIE0 or TIE1 when embedded pull-up or pull-down cells are
present, or as floating if they are not connected to any pin during packaging.
The following example removes the primary input and inout pads so that ATPG
excludes them during pattern generation.

Figure 5: Unbonded Pads Removal from Imported Design
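Figure 5 itself is not reproduced in this copy of the post. As a rough, hedged
sketch of the kind of TetraMAX setup it describes (the port names are invented,
and the exact commands and options used in the original script may differ; an
alternative is the "add net connections" command mentioned above):

    # Tie off unbonded input pads according to their embedded pull cells
    # (value first, then the port list; the ports here are hypothetical)
    add_pi_constraints 0 port_A
    add_pi_constraints 0 port_B
    add_pi_constraints 1 port_C

    # Mask measurements on output pads that are not bonded out
    add_po_masks port_Z1 port_Z2 port_Z3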

Before inserting test points, it is important to check the global test coverage
and analyze where the bottleneck is; otherwise the inserted points will not be
efficient enough to meet the test target. TetraMAX is recommended for reporting
the global test coverage of the netlist after scan insertion, which then guides
the TPI work. Figure 6 shows the test coverage and fault coverage of the design.

Figure 6: ATPG Test Coverage Report with Pre-TPI Netlist

According to the formulas for test coverage and fault coverage, UD
(Undetectable) faults are excluded from the test coverage calculation, but AU
(ATPG Untestable) faults are included; this is why we focus on AU faults when
inserting test points, to make them testable. With the "report faults -class AU"
command and its depth/summary options, the AU faults can be reported
hierarchically. This report outlines the test coverage of every module, so we
can find out which macros contribute most to the low coverage, analyze the
related logic carefully, and apply TPI where it gives the maximum return for the
least added logic. As an example, Figure 7 shows the script command used to
report the AU faults in the design.

Figure 7: AU Faults Report Command
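The figure is not reproduced here; a minimal sketch of the kind of TetraMAX
reporting it shows (the exact options for a per-module breakdown vary by tool
version, so treat the syntax as an approximation):

    # List the faults that ATPG classified as AU (ATPG Untestable)
    report_faults -class AU

    # A per-class fault summary is also useful for judging the overall picture
    report_summaries faults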

Figure 8: AU Faults Report for Low Test Coverage

Figure 8 shows the AU fault report produced by that command for test coverage
analysis. In this example, the main causes of low test coverage are the
untestable logic between the memories and the digital interface, and the
untestable logic between the analog and digital parts. Because the RAM address
and data buses are dedicated to a special function and share nothing with other
logic, these logic cones cannot be controlled from the primary ports in capture
mode; they act like sinking points and need test point insertion to raise the
test coverage.
We recommend including TPI in the traditional scan insertion flow when using DFT
Compiler. A few commands are added to the script to define which instances
require test points, and then scan chains are inserted with the original
configuration. This flow is convenient for both design review and checking.
Figure 9 below shows the script file for test point insertion. Many options are
available for control point and observe point insertion, depending on the
requirements. In our experience, both observe points and control points are
sensitive levers for test coverage improvement; the most important thing is to
choose the right points to obtain the highest test efficiency. Scan chains can
then be inserted with the traditional configuration. Figures 10 and 11 show the
post-TPI netlist with the added observation and control points; the inserted DFF
instances follow the naming rule "udtp_sink***" by default, where "udtp" stands
for "user defined test point". The instances named "EOLL" are the XOR gates
added per the design request.

Figure 9: DFT Compiler Test Point Insertion Script File
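Figure 9 is likewise missing from this copy. The following is only a
hypothetical sketch of what a DFT Compiler test-point script of this kind can
look like; the instance names are invented, and the test-point command names and
option values are assumptions that must be checked against the DFT Compiler User
Guide [1]:

    # Declare the existing scan/test signals (standard DFT Compiler commands)
    set_dft_signal -view existing_dft -type ScanClock -port CLK -timing {45 55}
    set_dft_signal -view spec -type ScanEnable -port SCAN_EN
    set_dft_signal -view spec -type TestMode   -port TEST_MODE

    # Enable test point insertion and name the nodes to treat
    # (command and option names below are assumptions, not verified syntax)
    set_dft_configuration -test_points enable
    set_test_point_element -type observe   {U_MEM/addr_bus U_MEM/din_bus}
    set_test_point_element -type control_0 {U_MEM/dout_bus}

    # Usual protocol creation, DRC and scan insertion steps
    create_test_protocol
    dft_drc
    preview_dft
    insert_dft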

Figure 10: Inserted Test Instances for Observation Purpose

Figure 11: Inserted Test Instances for Control Purpose

With the same ATPG options, Figure 12 shows the test coverage and fault coverage
of the post-TPI netlist. With only a small increase in test patterns, the global
test coverage improves greatly and meets our test target. Figure 13 shows the
coverage increase for the modules that received test points.
Figure 12: ATPG Test Coverage Report with Post-TPI Netlist

Figure 13: AU Faults Report with Post-TPI

The reports from DFT Compiler and TetraMAX show that a small amount of
additional logic can bring a large improvement in test coverage and satisfy the
test target. Because only a few instances are added, the TPI has little impact
on the back-end design flow. There is also no functional difference between the
pre-TPI and post-TPI netlists; the added test points consist only of a few
multiplexers and XOR gates, so they are transparent to the design function. The
design passes formal verification in functional mode using Formality, and
TetraMAX verifies the integrity of the scan chains during ATPG processing.
After analyzing the post-TPI netlist, we concluded that the TPI technique and
DFT Compiler are very useful for inserting observe and control points in a
test-unfriendly design, and therefore for improving test coverage with little
area overhead.
4. Conclusion
In the project mentioned above, there are more than ten thousand registers in
the original design. With the TPI technique we added only 12 registers and a few
combinational gates, and the test coverage increased from 95% to 98.3%.
Obviously, this technique is efficient and easy to use; more test points can be
inserted if necessary for higher coverage, and in theory the test coverage can
approach 100%.
We strongly recommend that design engineers use the TPI technique in their
design flow. By doing so, the differing structural needs of functional design
and design for testability can be anticipated: the functional design stays clean
in operation mode, while most controllability and observability problems are
avoided in the DFT design. If such issues remain after the RTL is frozen, DFT
tools can be used to insert user-defined test points; DFT Compiler and TetraMAX
from Synopsys have the capability to do this job.
It is also important to choose the right locations for the test points. If a
location is chosen improperly, test coverage will not improve, and it cannot be
increased further even with more test patterns.
Based on the analysis and implementation in this project, test coverage and test
efficiency were improved greatly with just a few logic gates. The methodology is
very easy to use; we only needed to add a few commands to our existing scripts.
It is useful for almost all DFT designs, especially those requiring higher test
coverage, and we strongly recommend it for improving test coverage.
5. References
[1] DFT Compiler User Guide Vol. 1: Scan (XG Mode), X2005.09, September 2005.
[2] TetraMAX ATPG User Guide, Version X2005.09, August 2005.

Posted 2nd April 2012 by Raj


 


Increasing Scan Channel Bandwidth by Sharing All JTAG Pins

The on-chip IEEE 1149.1 JTAG interface and TAP offer an option to share some of
their pins during scan test. These include TCK, TDI and TDO for the clock, scan
input and scan output respectively. The additional pin, TMS, is not shared for
scan test because it is used to sequence the TAP controller (the fifth pin,
TRSTN, is optional). In many cases these pins are used to configure the device
into different test modes, which makes it difficult to share them for scan test.

A simple solution is to put the TAP controller in the Run_Test_Idle state upon
entering scan test mode and force the internal TMS and TRSTN signals of the TAP
controller to appropriate values while releasing the DUT pins. However, control
of the TAP FSM is then lost, and a power-up is required to reset it and regain
control of the pins. Such a power-up has many implications in an SoC, including
ATE and DUT-internal test time spent initializing several on-chip functions,
including embedded power management. (However, our focus is not so much the
reduction of test-mode control pins, namely JTAG, as the elimination of scan
control pins.)

A novel mechanism has been developed through which these two pins, TMS and
TRSTN, are also shared dynamically for scan test. At any point during or after
scan test, functional control of these two pins can be regained without any
additional power-up. In the proposed solution, the TAP controller is kept in the
Run_Test_Idle state, and when the scan enable pin is asserted, the internal TMS
and TRSTN signals at the TAP controller are driven to suitable logic levels,
thereby allowing these two pins to be shared for scan test. During the capture
phase, when scan enable is de-asserted, functional control of the two pins is
regained and they are kept at the desired functional values to improve coverage.

With such an implementation it is possible to share these pins for scan test,
thereby reducing the number of tester-contacted pins or increasing the scan
channel bandwidth. Additionally, the pattern-detection-based internal generation
of the SE signal can be combined with the proposed method of sharing all JTAG
pins dynamically.

For details:
https://ieeexplore.ieee.org/document/8326907/
Posted 8th January by Raj
 


Serial simulations - debug of compressed scan patterns


The current ATPG pattern validation flow is mainly driven by a two-step
approach. The first step focuses on compression-bypass scan patterns, which
allow full debug capability at the expense of simulation time. The second step
verifies the compressed patterns in simulation, with no real possibility of
debug.
The flow presented here is based on dual STIL patterns and allows fast and easy
debug of compressed patterns in serial mode. It combines the internal scan chain
definitions, the standard serial patterns and their associated parallel patterns
to simulate compressed patterns and speed up ATPG verification, with the same
debug capabilities as classical bypass-pattern serial verification.

1.    Introduction

Debugging ATPG patterns in simulation is a time-consuming activity, especially
for the very first step, the verification of the shift phase. A parallel
simulation approach speeds up most of the scan pattern simulation, but it can
validate only the capture phase. For the shift, the standard approach of
simulating the pattern serially remains mandatory at every level of simulation:
0-delay simulation to validate the scan structure, and back-annotated simulation
to cross-check timing.
In a real design, scan compression is mandatory in order to significantly reduce
the overall test time. This results in two different scan modes for a given
design: bypass mode and compressed mode (and for SoC-specific reasons you might
have even more than these two). For each mode, a serial simulation of the shift
phase is required to make sure everything works as expected in every situation.
The standard approach for debugging the shift phase is to focus first on a
bypass pattern. The reason is that only bypass patterns travel unmodified along
the scan chains, meaning a data bit shifted in from a scan input can be followed
flip-flop after flip-flop until it leaves the design through a scan output. Once
the bypass shift has been validated, you can move on to bypass capture
simulation and to the other (compressed) scan modes. All of those simulations
are performed in serial mode; most of the patterns are then simulated in
parallel mode, since only the capture phase remains to be verified.

Figure 1 : standard approach for SCAN patterns simulation


This approach is then applied to the different scan pattern flavors (stuck-at,
transition fault, bridging, ...), at least on a 0-delay netlist and afterwards
on back-annotated netlists in the different corners.
2.    Simulation time and debug capabilities

The main issue with the classical scan pattern verification flow described above
comes from its first step: validating bypass patterns on a big System on Chip
requires so much runtime that it prevents any quick debug. Any change to a
pattern or to the design forces a simulation restart that can take dozens of
hours before the first bit is shifted out.
The first drawback of the classical flow is therefore the runtime of the bypass
scan pattern simulation. It nevertheless has to be the first step because, on
the other hand, the debug capability on compressed patterns is close to nil with
a classical verification flow. Whereas a bypass pattern lets you identify a
clear relationship between a bit shifted at the input of the design and a given
scan chain flip-flop, this is no longer the case with compressed patterns: from
the simulation point of view, the decompressor and the compressor act as a scan
data encoder. A simulation failure seen on a scan output after the compressor
cannot be linked to a given flip-flop, and debugging the pattern is then not
possible.
As illustrated by the table below, in most cases the classical flow cannot tell
you which flip-flop is failing when running a simulation in serial mode.
As explained before, there is almost no debug capability when simulating a
compressed scan pattern in serial mode. However, regarding the runtime needed to
get the result of a scan shift simulation, a compressed pattern is much faster.
An easy and reasonably accurate estimate of the difference between the two
simulations (bypass shift versus compressed-mode shift) can be made by comparing
the maximum scan chain length in each mode. An SoC with 1,000 internal scan
segments of 256 flip-flops in compressed mode, connected to the tester through
16 scan inputs and 16 scan outputs, will have 16 scan chains of 16,000
flip-flops in bypass mode. The bypass pattern therefore needs 62.5 times more
shift cycles than the compressed pattern; neglecting the setup phase, the bypass
simulation is around 60 times slower to produce a result than the compressed
one.
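A quick way to redo this kind of estimate for another design is a few lines of
Tcl (the numbers below are simply the ones from the example above):

    # The longest chain in each mode sets the number of shift cycles per pattern
    set ff_per_segment 256        ;# compressed-mode internal segment length
    set num_segments   1000
    set scan_ports     16         ;# scan inputs (= scan outputs) in bypass mode

    set total_ff      [expr {$ff_per_segment * $num_segments}]
    set bypass_length [expr {$total_ff / $scan_ports}]           ;# 16000
    set ratio         [expr {double($bypass_length) / $ff_per_segment}]
    puts "bypass chain length: $bypass_length, shift-cycle ratio: $ratio"  ;# 62.5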
                          Debug Capability    Run Time
Bypass simulation                +                -
Compressed simulation            -                +

Table 2: Pros and Cons of serial SCAN pattern simulation

3.    Optimized verification flow in serial mode

TetraMAX can generate both serial and parallel STIL patterns for simulation, so
for a given set of patterns it is possible to obtain equivalent serial and
parallel simulation data. As explained earlier, serial simulations require a lot
of runtime, particularly bypass simulations; on the other hand, pattern debug in
serial simulation is only possible on bypass patterns. In this chapter we
describe a flow that makes it possible to debug compressed patterns during a
serial simulation.
The STIL format provides a full description of each internal scan segment. The
proposed flow is based on a test bench generator that processes the information
available in both the serial and the parallel STIL files. The goal is to provide
the same debug capability for a serial compressed pattern as for a bypass one.
The debug is made possible by the parallel information included in the parallel
STIL pattern. To make the conversion, the first piece of information required is
the composition of each internal scan segment, including the order of the cells
and any inversion between flip-flops. The STIL format provides this: for each
internal scan segment it lists the full path to each scan element, and the
character ! marks an inversion along the scan path.
Figure 2 : description of internal scan segment
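As a small illustration of what the test bench generator has to do with this
information, here is a hedged Tcl sketch; the input format is an assumption (a
flat list of cell paths in shift order, with "!" tokens marking inversions,
roughly as extracted from the STIL ScanStructures section), and the instance
paths in the example are invented:

    # Build an ordered list of {cell_path polarity} pairs for one internal scan
    # segment, so that expected shift-out values can be corrected for the
    # inversions sitting between a flip-flop and the segment output.
    proc build_segment_map {scan_cells} {
        set segment {}
        set invert 0
        foreach token $scan_cells {
            if {$token eq "!"} {
                set invert [expr {1 - $invert}]   ;# toggle polarity at an inversion
            } else {
                lappend segment [list $token $invert]
            }
        }
        return $segment
    }

    puts [build_segment_map {core/u1/ff_0 ! core/u1/ff_1 core/u1/ff_2}]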

The parallel format of a pattern describes the expected value at the end of each
internal segment; in other words, it provides the values to spy on at the inputs
of the compactor when running a serial simulation. Combined with the serial
pattern, it then tells us, for every shift-out cycle, what the expected value
should be on each internal scan segment output and on the scan-out ports of the
design.

Figure 3 : Spying data extract from the different STIL files


The parallel STIL file gives the values that are shifted in at the beginning of
each scan segment and those that are shifted out at its end. For each pattern,
the scan-in data are listed after the keyword internal_scn_i, while the scan-out
data are introduced by the keyword internal_scn_o.
To stay aligned with the real number of shift cycles required in serial
simulation, the expected values from the parallel pattern must be padded with
masking bits, as illustrated in the following figure:

Figure 4 : Update the internal scan check with the number of shift cycles
Integrating these checks into a serial compressed-mode simulation makes it
possible to link a failure observed on a scan output of the design to an
internal scan segment.
Then, thanks to the rebuilt internal scan segment description illustrated in
Figure 2, the generated test bench can provide the important debug information:
 - The shift cycle where the failure occurs
 - The failing internal scan segments (there may be one or more)
 - The failing DFF, with the expected and simulated values (thanks to the
   inversion information along the scan chain)
With this debug information it is possible to clearly identify the reason for a
failure on any compressed pattern.
The following figure shows the overall proposed flow. As described, most of the
required data are standard TetraMAX outputs. The basis of the test bench is the
one provided by Max Testbench (the STIL2VERILOG script). An extra script, called
"Internal Scan Segment Spy TB Generation", processes all the data TetraMAX can
provide in order to create an efficient test bench for serial simulation of
compressed patterns. This is the script that was specifically developed to
support the proposed methodology.
The test bench produced by this custom script keeps the same debug capabilities
as a bypass-pattern serial simulation.

Figure 5 : Flow overview


It is important to note that this methodology can be applied whatever the
compression structure of the design, since the debug features come from
observing the inputs of the compressor. Whatever the number of compressors, and
whether or not pipeline stages are used, the parallel pattern provides the data
needed to easily debug the compressed pattern during serial simulation. The
script that processes the serial and parallel STIL data will have to be adapted,
but the philosophy behind the flow remains valid.
Considering the serializer capability of DFTMAX, the same remark applies: since
the spied data used for debug are located at the inputs of the compressor, the
methodology is unaffected whether or not the structure includes a serializer.
The presented flow was developed for an industrial application that did not
require a serializer but, as mentioned earlier, no impact of a serializer on the
flow has been identified.

Figure 6 : Structure including a serializer


Finally, the proposed flow is simulation-tool independent, since it is based on
STIL information only; it can therefore be used (and adapted) with any
simulation tool. In our case, the flow was developed to target simulation with
Modelsim®.

4.    Conclusion

Accelerating scan pattern debug, and the debug capability that goes with it,
represents real value in System-on-Chip development. The gain is even bigger as
the number of compressed modes required in your system multiplies.
In some particular cases you may have no choice but to run most of your debug in
serial mode. In such cases the classical approach leaves you with no efficient
debug at all and wastes a lot of time for hypothetical results.
Thanks to the serial and parallel pattern formats provided by TetraMAX, it is
possible to achieve the same level of debug in serial simulation whatever the
type of scan pattern (bypass or compressed).
The overall gain from this simulation flow depends on how many serial
simulations your design test plan requires, but there is at least one common
gain across projects: it is no longer necessary to start debugging the scan
shift with a bypass pattern. Since you keep the same debug capability on
compressed patterns, you can speed up shift-phase debug by running a compressed
pattern, which produces its first shift-out much faster than the standard
initial debug through a bypass scan check.

Posted 7th September 2017 by Raj


 


Why does the post layout netlist get low coverage if scan reordering is done
during place & route?
Problem

Why does the post layout netlist get low coverage if scan reordering is done
during place & route?

Solution

If a scan chain contains both positive- and negative-edge-triggered flops, there
is a possibility that the modified design will end up with lower coverage.
When the netlist is synthesized, the synthesis tool inserts a lockup latch
wherever a positive-edge-triggered flop is followed by a negative-edge-triggered
flop in the same scan chain, or wherever two consecutively connected flops are
driven by different clock domains. The synthesis tool will also try to place all
negative-edge flops before all positive-edge flops so that lockup latches can be
avoided.
Now consider what happens when the scan chain is reordered during place and
route. The P&R tool does not recognize the positive- or negative-edge nature of
the flops during reordering, so the result may be a positive-edge flop followed
by a negative-edge flop. A lockup latch is needed in this case, but P&R will not
insert it by itself, and this causes the low-coverage problem.
The solution is to check the scan DEF after P&R and see whether lockup latches
are needed. If possible, try switching off reordering of the scan chains; the
problem can also be avoided by modifying the scan DEF handed from synthesis to
P&R. As a simple illustration, reordering can be disabled as shown below.
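A minimal sketch, using the placement option that appears later in this post
(availability and default value may vary with the Encounter/EDI release):

    # Keep the synthesis-time scan order (and its lockup latches) intact by
    # disabling scan-chain reordering during placement
    setPlaceMode -reorderScan false
    placeDesign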

Problem

What needs to be considered before doing scan chain reordering in a post-layout
netlist?

Why is there a difference in scan chain length between the pre-layout and
post-layout netlists when scan chain reordering is done in a post-layout
netlist?

Solution

While doing scan chain reordering you need to make sure that, wherever the
transmitting flop is positive-edge triggered and the receiving flop is
negative-edge triggered, a lockup latch is inserted. Otherwise the receiving
flops may not be seen as scannable, resulting in a shorter scan chain and a loss
of coverage.
Even if the original design had lockup latches before reordering, you should
check that the final netlist still handles these crossings properly with lockup
latches.

What is the best way to reorder the scan chains within a partition (swap the
regs between chains)?
Problem
I define my scan chains using a scan DEF file. In this file I use
the PARTITION keyword when defining chains to identify the compatible
scan chains. How do I enable scan reordering in the Encounter Digital
Implementation System to swap registers between the compatible scan
chains? I want to enable swapping because it reduces the length of the scan
route and thereby reduces congestion in the design. I do not see much
improvement with the following flow:
setPlaceMode -reorderScan true # default is true
placeDesign
optDesign -preCTS
scanReorder
The following is an example of my scan chain definitions in the DEF:
VERSION 5.5 ;
NAMESCASESENSITIVE ON ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
DESIGN shift_reg ;
SCANCHAINS 2 ;
  - Chain1_seg2_clk_rising
    + PARTITION p_clk_rising
#       MAXBITS 248
    + START Q_reg Q
    + FLOATING
       q_reg[0] ( IN SI ) ( OUT Q )
       q_reg[1] ( IN SI ) ( OUT Q )
...
+ STOP q_reg[249] SI
- Chain1_seg4_clk_rising
    + PARTITION p_clk_rising
#       MAXBITS 248
    + START q_reg[249] Q
    + FLOATING
       q_reg[250] ( IN SI ) ( OUT Q )
       q_reg[251] ( IN SI ) ( OUT Q )
       q_reg[252] ( IN SI ) ( OUT Q )
       q_reg[253] ( IN SI ) ( OUT Q )
       q_reg[254] ( IN SI ) ( OUT Q )
       q_reg[255] ( IN SI ) ( OUT Q )
I have also confirmed that these are compatible, by
running reportScanChainPartition:
<CMD> reportScanChainPartition
Info: Scan Chain Partition Group set to:
      Partition group: p_clk_rising
                Chain: Chain1_seg4_clk_rising
                Chain: Chain1_seg2_clk_rising
How do I enable the swapping of cells between compatible scan chains?

Solution

To enable swapping, set the following prior to placeDesign/place_opt_design:


setScanReorderMode -reset
setScanReorderMode -scanEffort high -allowSwapping true
This reorders the scan chains. After this you can observe the following to
confirm that cell swapping is occurring:
First, the scan chains are traced:
Successfully traced scan chain "Chain1_seg2_clk_rising" (248 scan bits).
Successfully traced scan chain "Chain1_seg4_clk_rising" (248 scan bits).
*** Scan Trace Summary (runtime: cpu: 0:00:00.0 , real: 0:00:00.0):
Successfully traced 2 scan chains (total 496 scan bits).
Start applying DEF ordered sections ...
Successfully applied all DEF ordered sections.
*** Scan Sanity Check Summary:
*** 2 scan chains passed sanity check
The scan reordering is performed within the same START – STOP points or
intra chain:
INFO: High effort scan reorder.
Reordered scan chain "Chain1_seg2_clk_rising". Wire length: initial:    
5849.080; final:     5412.760.
Reordered scan chain "Chain1_seg4_clk_rising". Wire length: initial:    
5695.180; final:     5400.100.
*** Summary: Scan Reorder within scan chain
        Total scan reorder time: cpu: 0:00:00.1 , real: 0:00:00.0
Successfully reordered 2 scan chains.
Initial total scan wire length:    11544.260
Final   total scan wire length:    10812.860
Improvement:      731.400   percent  6.34
The scan chain reordering starts swapping within the scan chains, meaning
the compatible partitions are considered:
*** Start scan chain refinement by swapping scan elements in same partition.
INFO: Initial Scan Partition Length
Scan Partition "p_clk_rising":
    Scan chain "Chain1_seg4_clk_rising": 248 scan bits; wire length 5400.100
um.
    Scan chain "Chain1_seg2_clk_rising": 248 scan bits; wire length 5412.760
um.
Scan Partition "p_clk_rising": total 496 scan bits, total wire length 10812.860
um
INFO: Final Scan Partition Length
Scan Partition "p_clk_rising":
    Scan chain "Chain1_seg4_clk_rising": 248 scan bits; wire length 5413.994
um.
    Scan chain "Chain1_seg2_clk_rising": 248 scan bits; wire length 5365.898
um.
Scan Partition "p_clk_rising": total 496 scan bits, total wire length 10779.892
um
INFO: Finished netlist update for 2 scan groups.
*** Summary: Scan Reorder between scan chains
The scan reordering is performed within the same START – STOP points or
intra chain:
INFO: High effort scan reorder.
Reordered scan chain "Chain1_seg2_clk_rising". Wire length: initial:    
5383.720; final:     5383.720.
Reordered scan chain "Chain1_seg4_clk_rising". Wire length: initial:    
5422.240; final:     5391.340.
*** Summary: Scan Reorder within scan chain
        Total scan reorder time: cpu: 0:00:00.0 , real: 0:00:00.0
Successfully reordered 2 scan chains.
Initial total scan wire length:    10805.960
Final   total scan wire length:    10775.060
Improvement:       30.900   percent  0.29
In the end, the number of registers in each chain remains the same by default;
however, a register may now belong to a different chain if it was swapped. The
only time the number of elements in a chain may change is if the scan DEF file
defines a MAXBITS value larger than the current number of elements in the chain.
By default, scan reordering uses a MAXBITS value equal to the original number of
bits in the chain, so the number of elements remains the same; if MAXBITS is set
higher, swapping can increase the number of elements in a chain.

Difference between Scan Chain and Scan DEF reporting


Problem

The scan DEF written from RC/Genus is generally different from the scan chain
report. The two reports describe chains with different numbers of sequential
elements, and the segments and chains in the two reports often lead to
confusion.

Solution

This solution explains the possible differences between the scan DEF and the
scan chain report, and the reasons behind those differences.
Basic difference between the scan DEF and the scan chain report

The scan DEF is not a report on connectivity, whereas the scan chain report
provides connectivity information. The chain reported from the netlist can
therefore differ from the scan DEF: the scan DEF lists only the re-orderable
sequential elements, whereas the scan chain report covers the complete chain
with all of its sequential elements.
Scan chain report

llatch 3 BLOCK_2/DFT_lockup_g1 <clk2 (low)>


bit 4     BLOCK_3/Q_reg  <clk3 (rise)>
bit 5     BLOCK_3/q_reg[0]  <clk3 (rise)>
bit 6     BLOCK_3/q_reg[1]  <clk3 (rise)>
bit 7     BLOCK_3/q_reg[2]  <clk3 (rise)>
bit 8     BLOCK_3/q_reg[3]  <clk3 (rise)>
bit 9     BLOCK_3/q_reg[4]  <clk3 (rise)>
bit 10    BLOCK_3/q_reg[5]  <clk3 (rise)> 

Scan DEF report

+ START BLOCK_2/DFT_lockup_g1 Q
+ FLOATING
BLOCK_3/Q_reg ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[0] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[1] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[2] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so2

The reason for the number of elements in the scan DEF being different from the
scan chain report
The scan DEF will generally have fewer elements than the scan chain report. All
flops belonging to preserved segments, abstract segments and any other
non-re-orderable segment will not become part of the scan DEF, whereas the scan
chain report contains all the sequential elements that pass DFT rules and are
scan mapped.
Scan chain report

Chain 1: sc1
scan_in:      si1
scan_out:     so
shift_enable: SE (active high)
clock_domain: test_domain1 (edge: rise)
length: 11
START segment abs1 (type: abstract)
# @ bit 1, length: 7
pin     BLOCK_1/SI_1 <clk1 (rise)>
pin     BLOCK_1/SO_1 <clk1 (rise)>
END segment abs1
llatch 7 DFT_lockup_g1 <clk1 (low)>
bit 8     BLOCK_2/Q_reg  <clk2 (rise)>
bit 9     BLOCK_2/q_reg[3]  <clk2 (rise)>
bit 10    BLOCK_2/q_reg[4]  <clk2 (rise)>
bit 11    BLOCK_2/q_reg[5]  <clk2 (rise)>  
Chain 2: sc2
scan_in:      si2
scan_out:     so2
shift_enable: SE (active high)
clock_domain: test_domain1 (edge: rise)
  length: 10
START segment fixed_Segment_1 (type: fixed)
# @ bit 1, length: 3
bit 1   BLOCK_2/q_reg[0] <clk2 (rise)>
bit 2   BLOCK_2/q_reg[1] <clk2 (rise)>
bit 3   BLOCK_2/q_reg[2] <clk2 (rise)>
END segment fixed_Segment_1
llatch 3 BLOCK_2/DFT_lockup_g1 <clk2 (low)>
bit 4     BLOCK_3/Q_reg  <clk3 (rise)>
bit 5     BLOCK_3/q_reg[0]  <clk3 (rise)>
bit 6     BLOCK_3/q_reg[1]  <clk3 (rise)>
bit 7     BLOCK_3/q_reg[2]  <clk3 (rise)>
bit 8     BLOCK_3/q_reg[3]  <clk3 (rise)>
bit 9     BLOCK_3/q_reg[4]  <clk3 (rise)>
bit 10    BLOCK_3/q_reg[5]  <clk3 (rise)> 
Scan DEF report

sc1_seg2_test2_rising
#     + PARTITION p_test2_rising
#       MAXBITS 4
+ START DFT_lockup_g1 Q
+ FLOATING
BLOCK_2/Q_reg ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_2/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so;
- sc2_seg2_test3_rising
#     + PARTITION p_test3_rising
#       MAXBITS 7
+ START BLOCK_2/DFT_lockup_g1 Q
+ FLOATING
BLOCK_3/Q_reg ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[0] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[1] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[2] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[3] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[4] ( IN SI ) ( OUT Q )
BLOCK_3/q_reg[5] ( IN SI ) ( OUT Q )
+ STOP PIN so2;
In the report above, the scan chain report contains both abstract segments and
fixed segments; these segments are not present in the scan DEF because they are
not re-orderable.
The reason for differences in the number of scan_partition

1. For compression, the different compression channels cannot be allowed to
reorder into different segments in the scan DEF. This results in the total
number of scan_partition entries in the scan DEF being higher than the number of
scan chains. The following example demonstrates the difference:
Scan chain report

Chain 1: sc1 (compressed)


scan_in:      SI1
scan_out:     SO1 
shift_enable: SE (active high)
clock_domain: test_domain1 (edge: rise)
length: 117
START segment DFT_segment_4 (type: abstract)
# @ bit 1, length: 13
pin     COMPACTOR/msi[3] <DFT_mask_clk (rise)>
pin     COMPACTOR/mso[3] <DFT_mask_clk (rise)>
END segment DFT_segment_4
START segment DFT_segment_3 (type: abstract)
# @ bit 14, length: 13
pin     COMPACTOR/msi[2] <DFT_mask_clk (rise)>
pin     COMPACTOR/mso[2] <DFT_mask_clk (rise)>
END segment DFT_segment_3
START segment DFT_segment_2 (type: abstract)
# @ bit 27, length: 13
pin     COMPACTOR/msi[1] <DFT_mask_clk (rise)>
pin     COMPACTOR/mso[1] <DFT_mask_clk (rise)>
END segment DFT_segment_2
START segment DFT_segment_1 (type: abstract)
# @ bit 40, length: 13
pin     COMPACTOR/msi[0] <DFT_mask_clk (rise)>
pin     COMPACTOR/mso[0] <DFT_mask_clk (rise)>
END segment DFT_segment_1
llatch 52 DFT_lockup_g1 <DFT_mask_clk (low)>
<START compressed internal chain sc1_0 (sdi:
COMPACTOR/SWBOX_SI[0])>
bit 53    BLOCK_1/Q_reg  <clk1 (rise)>
bit 54    BLOCK_1/q_reg[0]  <clk1 (rise)>
bit 55    BLOCK_1/q_reg[1]  <clk1 (rise)>
bit 56    BLOCK_1/q_reg[2]  <clk1 (rise)>
bit 57    BLOCK_1/q_reg[3]  <clk1 (rise)>
<END   compressed internal chain sc1_0 (sdo:
COMPACTOR/SWBOX_SO[0]) (length: 5)>
<START compressed internal chain sc1_1 (sdi:
COMPACTOR/SWBOX_SI[1])>
bit 58    BLOCK_1/q_reg[4]  <clk1 (rise)>
bit 59    BLOCK_1/q_reg[5]  <clk1 (rise)>
bit 60    BLOCK_1/q_reg[6]  <clk1 (rise)>
bit 61    BLOCK_1/q_reg[7]  <clk1 (rise)>
bit 62    BLOCK_1/q_reg[8]  <clk1 (rise) >

Scan DEF report

SCANCHAINS 52 ;
- sc1_seg6_test1_rising
#     + PARTITION p_test1_rising
#       MAXBITS 3
+ START BLOCK_1/Q_reg QN
+ FLOATING
BLOCK_1/q_reg[0] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[1] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[2] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[3] SI;
- sc1_seg8_test1_rising
#     + PARTITION p_test1_rising
#       MAXBITS 3
+ START BLOCK_1/q_reg[4] QN
+ FLOATING
BLOCK_1/q_reg[5] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[6] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[7] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[8] SI;
- sc1_seg10_test1_rising
#     + PARTITION p_test1_rising
#       MAXBITS 3
+ START BLOCK_1/q_reg[9] QN
+ FLOATING
BLOCK_1/q_reg[10] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[11] ( IN SI ) ( OUT QN )
BLOCK_1/q_reg[12] ( IN SI ) ( OUT QN )
+ STOP BLOCK_1/q_reg[13] SI;

2. If you want flops of different edges to be stitched into a single scan chain,
set the following attribute to true:
    set_attribute dft_mix_clock_edges_in_scan_chain true /
If there is an edge change from positive to negative, there will be a lockup
element. Because of this, the scan DEF partitions the flops into two segments
based on the corresponding active clock edges. Re-ordering is not allowed
between flops with different active edges or across the lockup element, so
RC/Genus will split and keep these in different segments, with the lockup
element placed after the +STOP element of the segment.

The reason for the scan DEF starting at a combinational output while the scan
chain starts from a flop
If there is an ordered segment at the beginning of the chain that contains a
non-re-orderable element and an inverter, then after the scan DEF is written
out, the scan DEF chain starts at the inverter output. See the following example
for the differences:
Scan chain report

Chain 797: wrp_in_chain1


scan_in: scan_input1 (hookup: u1/Z)
scan_out: scan_output1 (hookup: u2/I)
shift_enable: NONE
clock_domain: clk (edge: rise)
length: 15
START segment wrp_2912 (type: preserved/core_wrapper)
# @ bit 1, length: 1
bit 1 u_input_stage_reg[9] <clk (rise)>
bit 2 u_input_stage_testpoint_115 <combinational_instance>
END segment wrp_2912
Scan DEF report

- wrp_in_chain1_seg1_clk_rising
+ PARTITION clk_rising
MAXBITS 13
+ START u_input_stage_testpoint_115 ZN
In this case, because the ordered scan segment at the beginning of the scan
chain is a preserved segment, the connection between the flop and the inverter
has to be preserved. This is why the START of the scan DEF chain is at the
inverter output and, because the preserved segment will not become part of
the scan DEF, it will have only the inverter.

The reason for an empty scan DEF after the insertion of compression logic
For compression, it is not the chains but the channels that get dumped into the
scan DEF. If the compression ratio is such that the number of flops in the
channels is three or less, the scan DEF will be empty because nothing is
re-orderable.
Posted 29th August 2017 by Raj
 


scan shift and capture timing analysis


scan shift mode timing analysis:
Posted 31st May 2016 by Raj
 


What is DFT Closure? Why is it Important Now?


Achieving successful DFT closure requires that RTL designers and DFT engineers
work in concert on a unified view of the design, using integrated tools and
flows. It also requires that DFT tools have zero impact on critically important
timing closure flows. The technologies necessary to support this wide-ranging
view of testability are:
 • EDA test tools that begin at the RT level and are integrally linked to
   synthesis
 • Test synthesis cognizant of layout issues and well integrated with physical
   design tools
 • Test synthesis capable of directly synthesizing all DFT architectures, with
   full constraint optimization
 • Completely automated creation, verification and management of design data
   created and consumed by EDA test tools
These are the next steps in DFT tools that will be necessary to achieve the new
requirement of DFT closure.
What is DFT Closure? Why is it Important Now?
Simply put, DFT closure is the ability to rapidly and predictably meet all
mandated DFT requirements through every phase of an SoC design flow, with no
risk of design iterations caused by unanticipated test impact. As ICs get more
sophisticated, not embracing a reliable DFT closure methodology may result in
designs that substantially miss market windows and still fail to meet required
functionality, performance and manufacturability goals. DFT closure assumes a
top-down hierarchical design approach that predictably proceeds from RT-level,
pre-synthesis planning all the way to physical implementation. Traditional
over-the-wall methodologies requiring design handoffs between discrete
processes, such as between synthesis and scan insertion, are becoming
intractable. In over-the-wall approaches, it is all too easy to lack knowledge
and understanding of the integration issues between discrete design processes,
which leads to schedule-killing iterations. Figure 1 depicts an "over-the-wall"
gate-level DFT flow with many iteration loops due to the likelihood of finding
problems later in the design flow.

In this approach there are numerous opportunities for the designer to
unknowingly break DFT design rules, and thus incur unacceptably long iteration
loops to fix these problems. To avoid this situation, each design process in a
more robust flow must follow two new rules:
 1. Each design process must be self-contained: it cannot rely on a subsequent
    process to completely satisfy its goals.
 2. Each design process must perform its task with a full understanding of the
    requirements of the subsequent process, and transfer only valid information
    to it.
For example, today's design tools and flows all strive to achieve timing
closure. Advanced design flows using common timing engines that forward-annotate
timing constraints from high-level design to physical synthesis can eliminate
design iterations and enable huge productivity gains for cutting-edge devices.
Because these advanced designs must also be testable, complete DFT closure
should be achieved in parallel. Applying these rules in a DFT context, Figure 2
illustrates the benefits of an up-to-date test synthesis-based design flow: the
long iteration loops caused by the lack of DFT knowledge between synthesis and
the separate test activities are partially eliminated. Design flow "closure" is
achievable when these requirements are met for all steps in the flow.
Finally, a new design flow supporting complete DFT closure has two additional
requirements:
 3. Each design process is cognizant of all relevant DFT issues, and is able to
    meet all relevant design and DFT requirements simultaneously.
 4. Each design process transfers only DFT design-rule-correct databases to
    subsequent processes.
Figure 3 shows a state-of-the-art design flow that supports DFT closure. Smart
partitioning of the design flow eliminates long iteration loops.
The Road to DFT Closure
Achieving successful DFT closure is a process that will evolve and strengthen as
new tools and EDA methodologies come to market. With existing technology there
are currently two requirements:
 1. Implementing a flow that satisfies all design requirements in a predictable
    manner.
 2. Being able to do this very quickly, without excess design iterations and
    wasted designer effort.
For complex ASICs and SoCs, both of these requirements must be met. So, in
addition to the required, intelligent, up-front planning of design and test
architectures and design flows, key design and test technologies must be
deployed as well. These may include:
 • Test-enabled RT-level code checking
 • In-depth RT-level DFT design rule checking, analysis and verification,
   integrated with design synthesis and consistent with downstream test
   synthesis and ATPG tools
 • Comprehensive test synthesis capabilities within a timing closure flow
 • DFT links to physical synthesis, placement and routing
 • Synthesis- and gate-level manufacturing testability analysis
 • Design tools "sign-off" to ATPG
 • ATPG "sign-off" to vector validation and production test
Each of these technologies contributes to DFT closure by enabling completion of
all relevant design and test tasks within a single process, and transfer of
complete and valid design data to the following process. By doing so, designers
can eliminate the risk of long iteration loops between processes. Figure 4 shows
the benefit of each of these technologies in enabling DFT closure in a design
flow.
DFT Closure and Test Reuse 
Test tools that enable DFT closure offer other benefits as well. Provided they
are truly automatic and transparent to the user, scan synthesis tools make it
easy for the designer to implement test without having to learn the intricacies
of test engineering. Implementing scan during synthesis also means that
designers on different teams, working on different blocks of a complex design,
can individually be responsible for the testability of their subdesigns, and
know the whole device will be testable when it is assembled at the top level.
This is especially important for companies that have embraced design reuse,
are using pre-designed intellectual property (IP) cores, and are following new
core-based design flows. Truly automated scan synthesis tools are critical in
these new IP-based design methodologies to enable DFT closure for the most
complex systems-on-chip. 
Beyond DFT Closure in the ASIC and SoC Design Flow: Future Possibilities
The ultimate goal of implementing strong DFT methodologies is to enable the very
best results and productivity in the manufacturing test environment.
Implementing DFT closure to eliminate iteration loops between the entire design
activity and the test floor itself is the logical next step. However, with the
existing "over the wall" relationship between the design world and the automatic
test equipment (ATE) world, achieving effective DFT closure between these two
worlds will be challenging. The catalyst for change will be the type of paradigm
shift that now enables DFT closure in the RTL-to-GDSII flow:
 • Knowledge of DFT must be built directly into ATE, and ATE requirements must
   be built directly into design and DFT tools.
 • Design, DFT and ATE must conform to common standards, methodologies and/or
   pre-negotiated requirements.
This will eliminate many of the inefficiencies incurred by the many design and
data transfers that are now a requirement. Once this is accomplished, the
industry will realize the full productivity, cost savings and designer impact
benefits of DFT closure. In addition, comprehensive DFT closure can enable the
development of a new class of "DFT-aware" automatic test equipment, which can
lead to dramatic reductions in the cost of test.

http://www.synopsys.com/tools/implementation/rtlsynthesis/capsulemodule/dft_wpa4.pdf
Posted 30th May 2016 by Raj
 

MAY
24

Test Procedure file

Writing a Test Procedure file

Test procedure files describe the scan circuitry's operation for the ATPG tool. They contain cycle-based procedures and timing definitions that tell the DFT tools how to operate the scan structures in the design.

Before running ATPG, you must have a test procedure file ready.

To specify a test procedure file in setup mode, use the Add Scan Groups
command. The tools can also read in procedure files by using the Read
Procfile command or the Save Patterns command when not in Setup mode.
When you load more than one test procedure file, the tool merges the timing
and procedure data.

The following are the standard test procedures: test_setup, load_unload, and shift.

The shift and load_unload procedures define how the design must be configured to allow shifting data through the scan chains. The procedures define the timeplate that will be used and the scan group that they reference. The following are some examples of timeplate definitions and of shift and load_unload procedures:

timeplate gen_tp1 =
   force_pi 0;
   measure_po 1;
   pulse clk1;
   pulse clk2;
   period <integer>;
end;

procedure shift =
   scan_group grp1;
   timeplate gen_tp1;
   cycle =
      force_sci;
      measure_sco;
      pulse clk;
   end;
end;

procedure load_unload =
   scan_group grp1;
   timeplate gen_tp1;
   cycle =
      force <control_signal1> <off_state>;
      force <control_signal2> <off_state>;
      ………………………………. ;
      force scan_en 1;
   end;
   apply shift <number of scan cells in scan chain>;
end;
TimePlate Definition

The timeplate definition describes a single tester cycle and specifies where in that cycle all event edges are placed. A procedure file must have at least one timeplate definition.

offstate <pin_name> <off_state>

This statement is required for any pin that is not defined as a clock pin by the
Add Clocks command but will be pulsed within this timeplate.

force/measure_pi/po <time>

A literal and string pair that specifies the force/measure time for all primary
inputs/outputs

bidi_force/measure_pi/po <time>
A literal and string pair that specifies the force/measure time for all
bidirectional pins. This statement allows the bi-directional pins to be forced
after applying the tri-state control signal, so the system avoids bus contention.
This statement overrides “force_pi” and “measure_po”.

force/measure pin_name time


A literal and double string that specifies the force/measure time for a specific named pin. This force/measure time overrides the time specified in force/measure_pi/po for this specific pin.

pulse <pin_name> <time> <width>


A literal and triple string that specifies the pulse timing for a specific named pin. The time value specifies the leading edge of the pulse and the width value specifies the width of the pulse. This statement can only reference two kinds of pins:

• Pins defined as clock pins by the Add Clocks command.
• Pins not defined as clock pins by the Add Clocks command, but which provide a pulse signal and have an offstate specified by the “offstate” statement.

The sum of the time and the width must be less than the period.

period time
A literal and string pair that defines the period of a tester cycle.
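
For instance, the pulse-versus-period rule above can be sanity-checked with a small script. The following is a minimal Python sketch using hypothetical timeplate data (not any DFT tool's API):

# Minimal sketch (hypothetical data, not a DFT tool's API) checking the rule
# stated above: for every pulsed pin, time + width must be less than the period.

timeplate = {
    "period": 100,                                    # tester cycle length (ns), assumed
    "pulses": {"clk1": (45, 30), "clk2": (60, 50)},   # pin -> (leading edge, width), assumed
}

def check_timeplate(tp):
    ok = True
    for pin, (time, width) in tp["pulses"].items():
        if time + width >= tp["period"]:
            print(f"ERROR: pulse on {pin} ends at {time + width} ns, "
                  f"outside the {tp['period']} ns period")
            ok = False
    return ok

print("timeplate OK" if check_timeplate(timeplate) else "timeplate has violations")
# clk2 ends at 110 ns, beyond the 100 ns period, so this example flags an error.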

What is the difference between Serial and Parallel Patterns?

Parallel patterns are forced in parallel (at the same instant of time) at the SI pin of each flop and measured at SO. These patterns are mainly used to simulate the patterns faster: only two cycles are required to simulate a pattern, one to force all the flops and one for capture.
Serial patterns are the ones used at the tester. They are serially shifted in, captured, and shifted out.
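
As a rough illustration of the difference in simulation effort, here is a short Python sketch with hypothetical numbers (real testbenches add overhead cycles):

# Rough comparison of simulation cycles for parallel vs. serial patterns
# (hypothetical numbers; real testbenches add overhead cycles).

CHAIN_LENGTH = 1_000     # scan cells in the longest chain (assumed)
NUM_PATTERNS = 500       # patterns to simulate (assumed)

parallel_cycles = NUM_PATTERNS * 2                      # force all flops + capture
serial_cycles   = NUM_PATTERNS * (CHAIN_LENGTH + 1)     # full shift + capture

print(f"parallel: {parallel_cycles:,} cycles")
print(f"serial  : {serial_cycles:,} cycles "
      f"(~{serial_cycles / parallel_cycles:.0f}x more)")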
Posted 24th May 2016 by Raj
 

DEC
10

DFT related PD Questions


How does MBIST logic affect placement? Will knowing the algorithms used to assign controllers help in floorplanning? How does the scan chain affect PD?

MBIST (Memory Built-In Self-Test) logic is inserted to test the memories. It contains an MBIST processor and wrappers around the memories. The MBIST processor controls the wrappers and generates various control signals during memory testing. A single block may have multiple processors depending on the number of memories, memory size, power and memory placement. Memories placed nearby are grouped together and controlled by a single processor.
Memory placement information needs to be given to the DFT team in the form of a DEF and, optionally, a floorplan snapshot.
If memories are not grouped properly according to their physical location, i.e., memories under the same processor sit far apart, the MBIST logic will be spread out. This may impact MBIST timing due to long paths, or increase congestion due to a lot of criss-crossing routes.

Why is hold independent of frequency?

In physical design there is one critical issue: if a chip has a setup violation, we can compromise on performance and make the chip work at a lower frequency than it was designed for.

But if the chip has a hold violation, the question that arises is:
             'Will it work if we change the frequency?'

The answer is no, because the hold check is independent of frequency.

So, if a hold violation is found after the chip is built, the effort is wasted: slowing the clock cannot fix it.

Fig: Setup time and hold time are met in the following figure

As shown in the figure, the data must remain stable within the Tsetup and Thold window.
Fig: Hold violation

The figure shows that there is a hold violation because the data changes inside the Thold timing window.

Now, one solution comes to mind:

'Can we fix the hold violation by reducing the frequency?'

The answer is no.

Now, let's understand why hold is independent of frequency (i.e., why changing the frequency cannot fix a hold violation).

By definition, the data should remain stable for some minimum time after the active edge of the clock.

Fig: Data traveling from ff1 to ff2
data1 = data at ff1
data2 = data at ff2
clock1 = launch clock
clock2 = capture clock

At clock1, ff1 launches data1 towards ff2; at the same time, ff2 captures data2 (the value that arrived from the previous launch).

From the figure, the setup check equation is:

Tc2q(ff1) + Tcombo <= Tclk - Tsetup

i.e., data1 launched from ff1 at clock1 must reach ff2 before the setup window of ff2 at clock2.

The hold check equation is:

Tc2q(ff1) + Tcombo >= Thold

i.e., data1 must not arrive at ff2 before the hold time after the capture edge, because it would override data2.

In other words, if the data arrives too fast, it overrides the previous data that is being captured at that clock edge, and the functionality of the chip fails.

If the sum of the combinational delay and Tc2q is less than the hold requirement of ff2, the data arrives too fast: there is no setup violation, but there is a hold violation, because data2, which is being captured at ff2, is overridden by data1 at the clock2 edge.

Notice that the hold check involves only Tcombo and Tc2q, and neither Tcombo nor Tc2q depends on the clock period or operating frequency.

So, hold is independent of frequency.
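
To see this numerically, here is a minimal Python sketch with assumed delay values (not from any real library); note that the clock period appears only in the setup check:

# Minimal sketch (assumed, illustrative numbers) showing that the setup check
# depends on the clock period while the hold check does not.

T_C2Q   = 0.12   # clock-to-Q delay of launch flop ff1 (ns), assumed
T_COMBO = 0.35   # combinational delay between ff1 and ff2 (ns), assumed
T_SETUP = 0.05   # setup requirement of capture flop ff2 (ns), assumed
T_HOLD  = 0.04   # hold requirement of capture flop ff2 (ns), assumed

def setup_slack(t_clk):
    # Setup check: Tc2q + Tcombo <= Tclk - Tsetup
    return (t_clk - T_SETUP) - (T_C2Q + T_COMBO)

def hold_slack():
    # Hold check: Tc2q + Tcombo >= Thold  (no Tclk term anywhere)
    return (T_C2Q + T_COMBO) - T_HOLD

for t_clk in (1.0, 2.0, 5.0):  # sweep the clock period (ns)
    print(f"Tclk={t_clk:>4} ns  setup slack={setup_slack(t_clk):+.2f} ns  "
          f"hold slack={hold_slack():+.2f} ns")
# The setup slack changes with Tclk; the hold slack never does.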


Why do setup/hold times come into the picture for a register?
                        Sequential Circuit Timing
This section covers several timing considerations encountered in the design of synchronous sequential circuits.

Why do setup time and hold time arise in a flip-flop?

To understand why setup and hold time arise in a flip-flop, one needs to begin by looking at its basic function.

These flip-flop building blocks include inverters and transmission gates.

    Fig: Inverter diagram

 Inverters are used to invert the input

Fig: Transmission gate (Tx)


It is a parallel connection of an nMOS and a pMOS with complementary inputs on their gates.
It is bidirectional; it carries current in either direction. Depending on the voltage on the gate, the connection between the input and the output is either low-resistance (Ron = 100 Ω or less) or high-resistance (Roff > 5 MΩ). The high-resistance state effectively isolates the output from the input.

The transistor-level structure of a D flip-flop contains two 'back-to-back' inverters known as a 'latching circuit,' since it retains a logic value. Immediately after the D input, an inverter may or may not be present (see figure).

Fig: The transistor-level structure of a D flip-flop contains two back-to-back inverters known as a 'latching circuit.'

It is a positive-edge-triggered flip-flop because the output changes at the positive edge of clk.

When clk = 0, if D changes, the change is reflected only up to node Z.

When clk = 1, it appears at the output.

This is where setup and hold time come into the picture.

Let's refresh: what are setup and hold time?

Setup time: it is defined as the minimum amount of time before the clock's active edge that the data must be stable for it to be latched correctly.

Hold time: it is defined as the minimum amount of time after the clock's active edge during which the data must be stable.

Here, setup and hold time are measured with respect to the active clock edge only.

Why does setup time come into the picture?

See the following figure carefully.

Fig: The node D to Z delay is called setup time

When clk = 0, input D propagates towards node Z, taking some time to travel along the path D-W-X-Y-Z. The time that data D takes to reach node Z is the setup time.

This is the reason for the setup time within a flip-flop: the data must be stable before the active edge of the clock for at least the D-to-Z delay of the latch unit, and this delay defines the setup time of the register.

Note:
When clock = 0, the LHS part of the flop is active and the RHS part is inactive, because the clock is inverted in the RHS region.
Similarly, when clock = 1, the LHS part of the flop is inactive and the RHS part is active, and it reflects the result of the D input.

Fig: Where the setup time comes from

Why does hold time come into the picture?

The flop is made of two latch units working in a master-slave fashion, so we can call the LHS part Latch-1 and the RHS part Latch-2.

See the figure carefully.

The two latches work on inverted clocks, so:
when Latch-1 is active, Latch-2 is inactive;
when Latch-2 is active, Latch-1 is inactive.

This is where hold time comes into the picture.

The time taken by a latch to come into active mode from inactive mode contributes to the hold time; this switching is where the hold requirement comes from.
Put another way: there is a finite delay between clk and clkbar, so the transmission gates take some time to switch on and off. In the meantime it is necessary to maintain a stable value at the input to ensure a stable value at node W, which in turn translates to the output; this is the reason for the hold time within a flop.

There may be combinational logic sitting before the first transmission gate (here you can see the inverter before the transmission gate on the input path from D to W). This introduces a certain delay for the input data D to reach the transmission gate, and this delay establishes whether the hold time is positive, negative or zero. The relationship between this combinational delay and the time taken for the transmission gate to switch on and off after clk/clkbar gives rise to the various types of hold time that exist: positive, negative or zero.

Here:
         Tcombo is the delay before the first transmission gate
         Tx is the time taken for the transmission gate to switch on and off
         CLK represents the clock with an active rising edge
         D1, D2 and D3 represent various data signals
         S represents the setup margin
         H1, H2 and H3 denote the respective hold margins

Fig: Hold time due to Tx and Tcombo

Placement

• Placement is the step in the physical implementation process of placing the standard cells in standard cell rows in order to meet timing, congestion, and utilization targets.
• The input to placement is the floorplan database or DEF.
• For placement, the complete standard cell area is divided into pieces known as bins (also called buckets). The size of a bin may vary from design to design.
• There are two steps in placement:
1. Global placement
2. Detail placement
Global Placement: As a part of global placement, all the standard cells are placed in standard cell rows, but there may be some overlap between standard cells.
Detail Placement: All standard cells on the standard cell rows are legalized and refined so that there are no overlaps.
• Once placement is done, we have to check timing as well as congestion.
• Outputs from placement are the netlist, DEF and SPEF.
NOTES:
Standard Cell Row Utilization: It is defined as the ratio of the area of the standard cells to the area of the chip minus the area of the macros and the area of the blockages:

Utilization = Area (Standard Cells) / [Area (Chip) - Area (Macros) - Area (Region Blockages)]

Congestion: If the number of routing tracks available for routing is less than the number of required tracks, it is known as congestion.
• Checks to perform after placement: congestion issues, HFN synthesis, capacitance fixing, transition fixing, setup fixing.
• Based on timing and congestion, the tool optimally places the standard cells. While doing so, if scan chains are detached, it can break the chain ordering and reorder it for optimization; the number of flops in each chain is maintained.
• During placement, the optimization may make the scan chain difficult to route due to congestion, so the tool will re-order the chain to reduce congestion. This sometimes increases hold-time problems in the chain; to overcome these, buffers may have to be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly, and it cannot swap cells between different clock domains. Because of scan chain reordering, the patterns generated earlier are of no use.
• It is in the placement stage that the different types of special cells are added, such as spare cells, end cap cells, tie cells, etc.

Standard cells: the designer uses predesigned logic cells such as AND gates, NOR gates, etc. These gates are called standard cells. The advantage of standard cell ASICs is that the designers save time and money and reduce risk by using a predesigned and pre-tested standard cell library.
Tie cells: Tie cells are used to connect a floating input to either VDD or VSS without any change in the logic functionality of the circuit.
Spare cells: Whenever it is required to perform a functional ECO (engineering change order), spare cells are used. These are extra cells floating in an ASIC design; they are also a part of the standard cell library, and they allow additional functionality to be included after the base tape-out of the chip.
End cap cells: End caps are placed at the end of cell rows and handle end-of-row well tie-off requirements. End caps are used to connect power and ground rails across an area and are also used to ensure gaps do not occur between well or implant layers, which could cause design rule violations.
Prerequisites of CTS include ensuring that the design is placed and optimized, ensuring that the clock tree can be routed (i.e., taking care of congestion issues), and ensuring that power and ground nets are pre-routed. The inputs are the placement database or design exchange format (DEF) file after the placement stage, and the clock tree constraints.

Scan Chain Reordering

It is the process of reconnecting the scan chains in a design to optimize routing, by reordering the scan connections; this improves timing and congestion.
Since logic synthesis connects the scan chain arbitrarily, we need to perform scan reordering after placement so that the scan chain routing is optimal.
Based on timing and congestion, the tool optimally places the standard cells. While doing so, if scan chains are detached, it can break the chain ordering (which was done by a scan insertion tool such as DFT Compiler from Synopsys) and reorder it for optimization, while maintaining the number of flops in each chain.
The physical netlist is reordered based on placement.
A reordered scan chain requires much less routing resource in the example design.

Congestion Effect:
During placement, the optimization may make the scan chain difficult to route due to congestion, so the tool will re-order the chain to reduce congestion.

Timing Effect:
This sometimes increases hold-time problems in the chain. To overcome these, buffers may have to be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly, and it cannot swap cells between different clock domains.
Because of scan chain reordering, the patterns generated earlier are of no use. But this is not a problem, as ATPG can be redone by reading the new netlist.
Posted 10th December 2015 by Raj
 

OCT
20

Test memories at-speed with a slow clock


Because of an error in the design, the engineer couldn’t run a memory test
because the high-speed system clock wasn’t available. In this case, only the
relatively slow test clock was available. Of course, the user was very
concerned about the quality of the memory test and was even more concerned
about the potential increased DPPM (defective parts per million) number of
his product. Fortunately, most memory tests aren’t dependent on the high-
speed clock signal.
Using a slow-speed clock, the chance of detecting memory defects reduces
very little, which results in slightly higher DPPM levels. Whether this higher
DPPM level is significant for the product depends more on the product’s
application than on the test. For automotive or medical products, even the
slightest increase in DPPM is unacceptable, but the same DPPM increase for
low-cost consumer electronics might very well be within the contractual
obligations.
The ability of modern memories to self-time is at the core of the mystery. Self-timing is the ability to execute a read or write operation at a certain speed without dependency on an external clock stimulus. The timing starts when a change on certain memory control input ports signals the start of a read or write operation, and stops when the operation is complete.
There are two important paths in the memory that determine the test: the
path the data needs to take and the self-timing path. The purpose of the self-
timing path is to always produce the same delay, within margins, and then
trigger the sensing of the data coming out of the memory array. Together,
these paths set the speed at which a system’s memory operates reliably.
To be precise for the context here, synchronous, embedded SRAM (Static
Random Access Memory), used in today’s microelectronics, are virtually all
self-timed. Figure 1 depicts a self-timed memory. The blocks and gates shaded
gray are the key components of the self-timing feature. The delay through the
Model Row and Column Decode logic determines how long the write drivers
are turned on during a write operation and when the latches should capture
the output of the sense amplifiers during a read operation, after the
occurrence of a rising clock edge. Once the operation is complete, the address
precoders are reset and the bit lines are precharged in preparation for the
next memory access.
Figure 1. Diagram of a self-timed memory.
Memory test algorithms, like the so-called “Serial March” algorithm, are
essentially a very specific sequence of writing to and reading from memory
cells. For example, such a memory test algorithm may write a logic 1 into cell
“a,” then write a logic 0 into cell “b.” If everything is OK, reading from cell
“a” should result in a 1. Reading a 0 indicates a defect.
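For illustration only, here is a minimal Python sketch of a march-style test on a toy memory model. This is not the actual SMarchCHKBcil algorithm, just the write/read-sequence idea described above, with hypothetical read/write callbacks:

# Minimal sketch of a march-style memory test (illustrative only; not the
# SMarchCHKBcil algorithm used by real MBIST controllers).

def march_test(mem_size, write, read):
    # Element 1: ascending, write 0 everywhere.
    for addr in range(mem_size):
        write(addr, 0)
    # Element 2: ascending, read 0 then write 1.
    for addr in range(mem_size):
        if read(addr) != 0:
            return f"fail: expected 0 at address {addr}"
        write(addr, 1)
    # Element 3: descending, read 1 then write 0.
    for addr in reversed(range(mem_size)):
        if read(addr) != 1:
            return f"fail: expected 1 at address {addr}"
        write(addr, 0)
    return "pass"

# Fault-free reference model: a plain array behaves like a good SRAM.
good_mem = [0] * 16
print(march_test(16, lambda a, v: good_mem.__setitem__(a, v),
                 lambda a: good_mem[a]))                 # -> pass

# Inject a stuck-at-0 cell at address 5: writes of 1 are silently lost.
bad_mem = [0] * 16
def bad_write(a, v):
    bad_mem[a] = 0 if a == 5 else v
print(march_test(16, bad_write, lambda a: bad_mem[a]))   # -> fail at address 5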
If the time that a read or write operation takes depends only on the memory
itself, why do we need a high-speed clock in the first place? The speed of the
clock becomes important for the MBIST logic itself. That is, it determines the
speed at which the test logic can fire off these read and write commands at
consecutive clock cycles towards the memory. That creates a specific sequence
of operations, which forms the memory-test algorithm.
Based on this information, let’s analyze the quality implications for testing a
memory through MBIST running off a slow clock and then for running off a
fast clock. With a slow clock, you have to look out for defects that require
consecutive read/write operations fired off quickly; those defects aren’t
detectable by MBIST when using a slow clock. Table 1 shows a list of typical
defect types and the expected coverage with a slow clock. Most defects are
very well covered thanks to the memory’s self-timing.
Table 1. Defect coverage of the “SMarchCHKBcil” Algorithm under slow-
speed test clock assumption.
Look at two examples from the Table. (1) A Dynamic Coupling defect is a
defect where writing a value 0 to one cell forces another cell to assume a 0 as
well (analogously for 1). This defect doesn’t depend on how quickly the read
follows the write. Thus, this type of defect is fully detectable. (2) A Write
Recovery fault occurs when a value is read from a cell just after the opposite
value has been written to a cell (along the same column and the bitline
precharge has not been performed correctly). Obviously, the read operation
must immediately follow the write operation. Therefore, Write Recovery
faults aren’t detectable using MBIST off a slow clock.
You might have noticed that the entire argument hinges off the memory’s self-
timing aspect. What if this self-timing logic itself is defective, such that the
speed of the memory is reduced only a little, not enough to cause a
catastrophic test outcome of the memory? There are three ways of answering
the question. The first involves looking at the statistical defect distribution.
Given the size and density of the memory array itself, the relative probability
of a defect with exactly these properties in the self-timing logic is very, very
small.
The second answer is that you may be able to place two edges of the slow test
clock so close to each other that you effectively get an at-speed clock effect
lasting exactly two pulses. This will give you some confidence that the self-
timing logic is operational, but the availability of this solution depends on the
capabilities of the ATE (Automated Test Equipment).
Lastly, if the memory has such a speed-reducing global defect, even the
simplest functional test will uncover its presence. So, we are pretty well
covered at this end as well.
Special thanks to
https://blogs.mentor.com/tessent/blog/2015/09/24/test-memories-at-speed-with-a-slow-clock/
Posted 20th October 2015 by Raj
 

SEP
23

It is a well known fact that DFT shifting is done at a slower frequency. Well, I'm going to list some arguments against this.
• The lower the frequency, the greater the test time. In modern SoCs, tester cost (which is directly proportional to tester time) accounts for roughly 40% of the selling price of a single chip. It would be pragmatic to decrease the test time by increasing the frequency, no? (A rough estimate of the effect is sketched just after this list.)
• Increasing the frequency would not pose any timing issue: hold would be met anyway (the hold check is independent of frequency), and setup would never be in the critical path, considering that scan chains only involve a direct path from the output of a flop to the scan input pin of the next flop, devoid of any logic.
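Here is a rough, back-of-the-envelope Python sketch (assumed pattern count and chain length, tester overhead ignored) of how the shift frequency scales the test time:

# Rough scan test time estimate (assumed numbers, tester overhead ignored)
# showing how the shift frequency drives tester time for a pattern set.

def scan_test_time(num_patterns, chain_length, shift_freq_hz, capture_cycles=1):
    # Each pattern needs chain_length shift cycles to load (the previous
    # response unloads at the same time) plus a few capture cycles.
    cycles = num_patterns * (chain_length + capture_cycles) + chain_length
    return cycles / shift_freq_hz

PATTERNS, CHAIN = 10_000, 2_000          # assumed pattern count and chain length
for f in (10e6, 50e6, 100e6):            # shift clock frequencies to compare
    print(f"{f/1e6:>5.0f} MHz shift -> {scan_test_time(PATTERNS, CHAIN, f)*1e3:7.1f} ms")
# Going from 10 MHz to 100 MHz cuts the shift-dominated test time by ~10x.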
Then why not test at a higher frequency, one at least closer to the functional frequency? What could possibly be the reason for testing at a slower frequency?
Answer:
Unlike functional mode, where different paths have varying amounts of combinational logic between any two registers, in shift mode there is absolutely no logic at all between the scan cells. Hence, all the flops tend to switch at the same time. Imagine all the flops switching at the same time: the peak power consumption, which is directly proportional to the switching frequency, would shoot up, maybe up to the point that the IC might catch fire!
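
A back-of-the-envelope dynamic-power estimate makes the point. The numbers below are assumed, and P = alpha * C * V^2 * f is just the standard dynamic-power expression, not anything tool-specific:

# Minimal sketch of the dynamic-power argument (assumed, illustrative numbers):
# P_dyn = alpha * C * V^2 * f, where alpha is the switching activity factor.

def dynamic_power(alpha, c_total_farads, vdd, freq_hz):
    return alpha * c_total_farads * vdd**2 * freq_hz

C_TOTAL, VDD = 5e-9, 0.9                 # assumed total switched capacitance (F) and supply (V)
FUNC_ALPHA, SHIFT_ALPHA = 0.15, 0.5      # assumed toggle rates: functional vs. scan shift

for f in (10e6, 100e6, 500e6):
    p_func  = dynamic_power(FUNC_ALPHA,  C_TOTAL, VDD, f)
    p_shift = dynamic_power(SHIFT_ALPHA, C_TOTAL, VDD, f)
    print(f"{f/1e6:>5.0f} MHz: functional {p_func*1e3:6.2f} mW, shift {p_shift*1e3:6.2f} mW")
# At the same frequency, shift mode burns far more power because nearly every
# flop toggles every cycle; raising the shift frequency compounds the problem.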

Also, in functional mode, the entire SoC does not operate simultaneously. Depending on the use-case, some portions will either not work at all, or will work in tandem.

You might argue that one could run shift the same way, i.e., different parts in tandem. But that would mean higher test times, which is exactly what we intended to reduce by increasing the shift frequency in the first place.

2. In the question below, we intend to check the node X for a stuck-at-0 fault. Can you tell what input vector (A, B, C) would need to be applied to do so?

Answer:
The vector ABC = 100 will detect the stuck-at-0 fault at that node.
Posted 23rd September 2015 by Raj


 

SEP
23

Setup and hold time definition

Setup and hold checks are the most common types of timing checks used in timing verification. Synchronous inputs (e.g., D) have setup and hold time specifications with respect to the clock input. These checks specify that the data input must remain stable for a specified interval before and after the clock input changes.

Ø  Setup Time: the amount of time the data at the synchronous input (D) must be stable before the active edge of the clock.

Ø  Hold Time: the amount of time the data at the synchronous input (D) must be stable after the active edge of the clock.

Both the setup and hold times for a flip-flop are specified in the library.

Setup Time

Setup time is the amount of time the synchronous input (D) must arrive and be stable before the capturing edge of the clock, so that the data can be stored successfully in the storage device.
Setup violations can be fixed either by slowing down the clock (increasing the period) or by decreasing the delay of the data path logic.
Setup information in .lib:

timing () {
    related_pin        : "CK";
    timing_type        : setup_rising;
    fall_constraint(Setup_3_3) {
        index_1 ("0.000932129,0.0331496,0.146240");
        index_2 ("0.000932129,0.0331496,0.146240");
        values ("0.035190,0.035919,0.049386", \
                "0.047993,0.048403,0.061538", \
                "0.082503,0.082207,0.094815");
    }
}

Hold Time

Hold time is the amount of time the synchronous input (D) must stay stable after the capturing edge of the clock, so that the data can be stored successfully in the storage device.

Hold violations can be fixed by increasing the delay of the data path or by decreasing the clock uncertainty (skew), if specified in the design.
Hold information in .lib:

timing () {
    related_pin      : "CK";
    timing_type      : hold_rising;
    fall_constraint(Hold_3_3) {
        index_1 ("0.000932129,0.0331496,0.146240");
        index_2 ("0.000932129,0.0331496,0.146240");
        values ("-0.013960,-0.014316,-0.023648", \
                "-0.016951,-0.015219,-0.034272", \
                "0.108006,0.110026,0.090834");
    }
}
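
The index_1/index_2 arrays above are the lookup-table axes (their exact meaning, typically clock and data transition times, comes from the library's table template), and STA tools interpolate between the table points. Below is a minimal Python sketch of that bilinear interpolation, using the setup table shown earlier and assuming both axes are transition times in ns:

import bisect

# Minimal sketch of the bilinear interpolation an STA tool performs on a 2-D
# .lib constraint table. Axis meanings depend on the library template; here we
# assume index_1/index_2 are transition times in ns.

INDEX_1 = [0.000932129, 0.0331496, 0.146240]   # e.g. clock transition (assumed meaning)
INDEX_2 = [0.000932129, 0.0331496, 0.146240]   # e.g. data transition (assumed meaning)
VALUES  = [[0.035190, 0.035919, 0.049386],
           [0.047993, 0.048403, 0.061538],
           [0.082503, 0.082207, 0.094815]]     # setup constraint values (ns)

def _bracket(axis, x):
    # Pick the two table points surrounding x (extrapolates at the edges).
    i = min(max(bisect.bisect_left(axis, x), 1), len(axis) - 1)
    return i - 1, i

def lookup_setup(t1, t2):
    i0, i1 = _bracket(INDEX_1, t1)
    j0, j1 = _bracket(INDEX_2, t2)
    fx = (t1 - INDEX_1[i0]) / (INDEX_1[i1] - INDEX_1[i0])
    fy = (t2 - INDEX_2[j0]) / (INDEX_2[j1] - INDEX_2[j0])
    v00, v01 = VALUES[i0][j0], VALUES[i0][j1]
    v10, v11 = VALUES[i1][j0], VALUES[i1][j1]
    top = v00 + (v01 - v00) * fy
    bot = v10 + (v11 - v10) * fy
    return top + (bot - top) * fx

print(f"interpolated setup @ (0.02 ns, 0.05 ns) = {lookup_setup(0.02, 0.05):.4f} ns")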

Timing paths

Any digital circuit can be represented as a “timing path” modeled between two flip-flops.

Timing Path
A timing path is defined as the path between a start point and an end point, where the start point and end point are defined as follows:
Start Point:
All input ports or clock pins of sequential elements are considered valid start points.
End Point:
All output ports or D pins of sequential elements are considered valid end points.
For Static Timing Analysis (STA), the design is split into different timing paths, and each timing path's delay is calculated based on gate delays and net delays. In a timing path, data gets launched, traverses through combinational elements, and stops when it encounters a sequential element. In any timing path, in general (there are exceptions), the delay requirements should be satisfied within a clock cycle.
In a timing path whose start point and end point are both sequential elements, if the two sequential elements are triggered by two different clocks (i.e., asynchronous clocks), then the least common multiple (LCM) of the two clock periods should be considered to find the launch edge and capture edge for setup and hold timing analysis.
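
As a quick illustration of the LCM point (assumed clock periods, nothing design-specific), the common base period of two clocks can be computed as follows:

from fractions import Fraction
from math import lcm

# Minimal sketch (assumed periods in ns) of finding the common base period of
# two asynchronous clocks via the LCM of their periods, as mentioned above.

def common_period_ns(p1, p2):
    f1 = Fraction(p1).limit_denominator()
    f2 = Fraction(p2).limit_denominator()
    d = lcm(f1.denominator, f2.denominator)      # put both on a common denominator
    return Fraction(lcm(int(f1 * d), int(f2 * d)), d)

print(common_period_ns(10, 15))    # 30  -> launch/capture edges repeat every 30 ns
print(common_period_ns(3.3, 5))    # 165 -> launch/capture edges repeat every 165 ns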
Different Timing Paths
Any synchronous design is split into various timing paths and each timing
path is verified for its timing requirements. In general four types of timing
paths can be identified in a synchronous design. They are:
Ø  Input to Register
Ø  Input to Output
Ø  Register to Register
Ø  Register to Output
Input to Output:
It starts at an input port and ends at an output port. This is a pure combinational path; you can hardly find one in a synchronous design.

Input to Register:
Semi-synchronous; the register is controlled by the clock, while the input data can come at any time.

Register to Register:
Purely sequential; both the launching and the capturing flops are controlled by the clock.

Register to Output:
The data can leave through the output port at any point of time.

Clock path
The path through which the clock traverses is known as the clock path. The clock path can have only clock inverters and clock buffers as its elements. The clock path may pass through a “gating element” to achieve additional advantages; in this case, the characteristics and definitions of the clock change accordingly. We call this type of clock path a “gated clock path”. The main advantage of clock gating is dynamic power saving.

Data path
The path through which the data traverses is known as the data path. The data path is a pure combinational path; it can contain any basic combinational gates or groups of gates.

Launch path
The launch path is part of the clock path. It is the launch clock path, which is responsible for launching the data at the launch flip-flop.
The launch path and the data path together constitute the arrival time of data at the input of the capture register.

Capture path
The capture path is part of the clock path. It is the capture clock path, which is responsible for capturing the data at the capture flip-flop.
The capture clock period and the capture path delay together constitute the required time of data at the input of the capture register.
Slack
Slack is defined as the difference between the required (desired) time and the actual (achieved) time for a timing path. For a timing path, the slack determines whether the design is working at the specified speed or frequency.
 

Data Arrival Time

This is the time required for the data to travel through the data path.

Data Required Time

This is the time taken for the clock to traverse through the clock path.

Setup and hold slack are defined as the difference between the data required time and the data arrival time:

setup slack = Data Required Time - Data Arrival Time

hold slack = Data Arrival Time - Data Required Time


A positive setup slack means the design is working at the specified frequency and has some additional margin as well.

Zero setup slack means the design is working exactly at the specified frequency, with no margin available.

Negative setup slack implies that the design does not achieve the constrained frequency and timing. This is called a setup violation.

Reg to Reg path

Data arrival time is the time required for the data to propagate through the source flip-flop, travel through combinational logic and routing, and arrive at the destination flip-flop before the next clock edge occurs.
Arrival Time = Tclk-q + Tcombo
Required Time = Tclock - Tsetup
Setup slack = Required Time - Arrival Time = (Tclock - Tsetup) - (Tclk-q + Tcombo)

Reg to Output path

Data arrival time is the time required for the data to leave the source flip-flop, travel through combinational logic and interconnects, and leave the chip through an output port.

Input to Register path

Data arrival time is the time required for the data to start from an input port, propagate through combinational logic, and end at the data pin of the flip-flop.
Arrival Time = Tcombo
Required Time = Tclock - Tsetup
Setup slack = Required Time - Arrival Time = (Tclock - Tsetup) - Tcombo
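
To make the arithmetic concrete, here is a minimal Python sketch (with assumed, illustrative delay values, not from any real design) applying the slack formulas above to a reg-to-reg path and an input-to-register path:

# Minimal sketch applying the slack formulas above (assumed, illustrative
# values in ns, not from any real library or design).

T_CLK, T_SETUP = 2.0, 0.05      # clock period and capture-flop setup requirement
T_CLK2Q        = 0.12           # clock-to-Q delay of the launch flop
T_COMBO_R2R    = 1.10           # combinational delay, reg-to-reg path
T_COMBO_IN2REG = 2.10           # combinational delay, input-to-register path

required = T_CLK - T_SETUP                      # Required Time = Tclock - Tsetup

# Reg-to-reg: slack = (Tclock - Tsetup) - (Tclk-q + Tcombo)
arrival_r2r = T_CLK2Q + T_COMBO_R2R
print(f"reg-to-reg   setup slack = {required - arrival_r2r:+.2f} ns")

# Input-to-register: slack = (Tclock - Tsetup) - Tcombo
arrival_in = T_COMBO_IN2REG
print(f"input-to-reg setup slack = {required - arrival_in:+.2f} ns")
# The negative result on the second path indicates a setup violation there.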

Posted 23rd September 2015 by Raj


 
