
Debugging Low Test-Coverage Situations


Rick Fisette | Nov 24, 2009

Scan is a structured test approach in which the overall function of an integrated circuit (IC) is broken into smaller
structures that are tested individually. Every state element (D flip-flop or latch) is replaced with a scan cell that operates
as an equivalent state element and, in scan mode, is concatenated into long shift registers called “scan chains.” This
makes all the internal state elements controllable and observable, which greatly reduces the complexity of testing an IC:
only the small combinational logic segments between scan cells need to be tested. Automatic test pattern generation
(ATPG) tools take advantage of scan to produce high-quality scan patterns.

The combination of scan and ATPG tools has been shown to successfully detect the vast majority of manufacturing
defects. When you use an ATPG tool, your goal should be to achieve the highest defect coverage possible. Because high
test coverage directly correlates with the quality of the parts shipped, many companies demand coverage of at least
99% for single stuck-at faults and at least 90% for transition-delay faults.

When the coverage report falls short of these goals, your task is to figure out why the coverage is not high enough and
perform corrective actions where possible. Debugging low defect coverage has historically required a significant amount
of manual technique and intimate knowledge of the ATPG tool, as well as design experience, especially as device
complexity increases.

Automating more of the debug process during ATPG greatly simplifies this effort. I have seen some cases in which
automation saved hours, even days, of manual debugging effort and other cases in which the tool provided answers
when no feasible manual technique was available. Before exploring why you might be getting low coverage and why
further automation is needed, I’ll explain how ATPG tools in general classify and report the different categories of faults.

INTERPRETING THE MYSTERIES OF ATPG STATISTICS


The ATPG tool generates a “statistics report” that tells you what the tool has done and provides the fault category
information that you have to interpret to debug coverage problems. If you’re an expert at using an ATPG tool, you’ll
probably have little problem understanding the fault categories listed in the statistics report. But if you’re not a design-
for-test expert, this data may as well be written in hieroglyphics (Fig. 1). Although the statistics report contains a lot of
information, it can be difficult to interpret and rarely gives enough useful information to determine the reasons for low
coverage, even for an ATPG expert.

When debugging low coverage, you’ll need to understand some of the basic fault categories that are listed in most
typical ATPG statistics reports. The first and broadest category is what is sometimes referred to as the “fault universe.”
This is the total number of faults in a design. For example, when dealing with single stuck-at faults, there are two faults
for each instance pin, stuck-at logic 1 and stuck-at logic 0, where the instance is the full hierarchical path name of a
library cell instantiated in the design netlist.

This total number of faults is really only important when comparing different ATPG tools against each other. The total
can vary depending on whether “internal” faulting is turned on and whether “collapsed” faults are used. Internal faulting
extends the fault sites down to the ATPG-model level, rather than limiting them to the library-cell level. For efficiency,
ATPG tools are designed to collapse equivalent faults whenever possible. Typically, you’ll want to have the
internal faults setting turned off and uncollapsed faults setting turned on. These settings most closely match the faults
represented in the design netlist.
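
To make these fault-accounting terms concrete, here is a minimal Python sketch (a toy model, not any vendor’s fault representation) that builds the uncollapsed fault universe of two stuck-at faults per instance pin and then applies one classic equivalence collapse: for a 2-input AND gate, stuck-at-0 on either input is equivalent to stuck-at-0 on the output.

def uncollapsed_faults(cells):
    """Two single stuck-at faults (SA0, SA1) for every instance pin."""
    faults = []
    for inst, pins in cells.items():
        for pin in pins:
            faults.append((f"{inst}/{pin}", "SA0"))
            faults.append((f"{inst}/{pin}", "SA1"))
    return faults

def collapse_and2(faults, inst):
    """Drop input SA0 faults of a 2-input AND gate; they are equivalent
    to SA0 on its output, which remains as the representative fault."""
    equivalent = {(f"{inst}/A", "SA0"), (f"{inst}/B", "SA0")}
    return [f for f in faults if f not in equivalent]

cells = {"top/u_core/u1": ["A", "B", "Y"]}    # one hypothetical AND2 instance
unc = uncollapsed_faults(cells)               # 3 pins x 2 faults = 6
col = collapse_and2(unc, "top/u_core/u1")     # 6 - 2 equivalent faults = 4
print(len(unc), len(col))                     # 6 4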

SHOULD YOU CARE ABOUT UNTESTABLE/UNDETECTABLE FAULTS?


Faults that cannot possibly be tested are reported as untestable or undetectable. This includes faults that are typically
referred to as unused, tied, blocked, and redundant. For example, a tied fault is one in which the designer has purposely
tied a pin to logic high or logic low. If a stuck-at-1 defect were to occur on a pin that is tied high, you could not test for it
because that would require the tool to be able to toggle the pin to logic low. This cannot be done because of the design
restriction, so the fault is categorized as “untestable.”

Untestable/undetectable faults are significant for two reasons. First, they distinguish “fault coverage” from “test
coverage,” both of which are reported by ATPG tools. When most tools calculate coverage, fault coverage includes all
the faults in the design.

Test coverage subtracts the untestable/undetectable faults from the total number of faults when calculating coverage.
For this reason, the reported number for test coverage is typically higher than fault coverage.
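
As a rough sketch of the difference (the exact formulas and rounding vary by tool), the two numbers share a numerator and differ only in the denominator:

def fault_coverage(detected, total_faults):
    # Fault coverage: detected faults over ALL faults in the design.
    return 100.0 * detected / total_faults

def test_coverage(detected, total_faults, untestable):
    # Test coverage: untestable/undetectable faults are removed from
    # the denominator, so this number is usually the higher of the two.
    return 100.0 * detected / (total_faults - untestable)

# Illustrative numbers only.
print(fault_coverage(90_000, 100_000))         # 90.0
print(test_coverage(90_000, 100_000, 4_000))   # 93.75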

The second reason that untestable/undetectable faults are important is that nothing can be done to improve the
coverage of these faults; therefore, you should direct your debugging efforts elsewhere.

One last thing to be aware of regarding untestable/undetectable faults is that ATPG-tool vendors vary in how they
categorize these faults. These differences can result in coverage discrepancies when comparing the results of each tool.

WHAT IS MORE IMPORTANT—TEST COVERAGE OR FAULT COVERAGE?


This raises the question of which is the more critical figure: test coverage or fault coverage? Most engineers, but not all,
rely on the higher test coverage number. The justification for ignoring untestable/undetectable faults is that any defect
that occurs at one of those fault locations will not cause the device to functionally fail. For example, if a stuck-at 1 defect
occurred on a pin that is tied high by design, the part will not fail in functional operation. Others would argue that fault
coverage is more important because any defect, even an untestable defect, is significant because it represents a
problem in the manufacturing of the device. That debate won’t be explored here though.

Some faults are testable, meaning that a defect at these fault sites would result in a functional failure. Unfortunately,
ATPG tools cannot produce patterns to detect all of the testable faults. These testable but undetected faults are called
“ATPG_untestable” (AU).

Of all the fault categories listed in an ATPG statistics report, AU is the most significant category that negatively affects
test coverage and fault coverage. Determining the reasons why ATPG is unable to produce a pattern to detect these
faults and coming up with a strategy to improve the coverage is the biggest challenge to debugging low-coverage
problems.

Here are some of the most common reasons why faults may be ATPG_untestable:

Pin constraints: At least one input signal (usually more than one) is required to be constrained to a constant value to
enable test mode. While this constraint makes testing possible, it also results in blocking the propagation of some faults
because the logic is held in a constant state. Unless you have special knowledge to the contrary, these pin constraints
must be adhered to, which means you cannot recover this coverage loss.

Determining the resulting coverage loss is not as simple as counting the faults on the constrained net. The loss also
extends to all the logic gates that have an input tied and to whatever upstream faults are blocked by that constraint.
Faults downstream from the tied logic have limited controllability, which further reduces coverage.
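
A hypothetical sketch of the kind of tracing this involves: starting from a constrained net, walk forward through everything it feeds to approximate the affected “shadow.” The netlist representation and names below are made up, and a real analysis would also trace backward for blocked upstream faults and check whether each affected fault is still detected through an overlapping cone.

from collections import deque

# Toy netlist: gate -> (input nets, output net). All names are hypothetical.
NETLIST = {
    "u1": (["test_en", "d0"], "n1"),
    "u2": (["n1", "d1"], "n2"),
    "u3": (["d1", "d2"], "n3"),
}

def forward_shadow(netlist, constrained_net):
    """Approximate the set of gates downstream of a constrained net."""
    affected, frontier = set(), deque([constrained_net])
    while frontier:
        net = frontier.popleft()
        for gate, (ins, out) in netlist.items():
            if net in ins and gate not in affected:
                affected.add(gate)
                frontier.append(out)   # keep walking through the fan-out
    return affected

print(forward_shadow(NETLIST, "test_en"))   # {'u1', 'u2'}; u3 is unaffected

The same cone-tracing idea applies to the black-box and RAM shadows described next.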

Black-box models: When an ATPG model is not available for a module, a library cell, or, more commonly, a memory, ATPG
tools treat it as a “black box,” which propagates a fixed value (often an “X,” or unknown, value). Faults in the “shadow” of
these black boxes (i.e., faults whose control and observation are affected by their proximity to the black box) will not be
detected. This includes faults in the logic cone driving each black-box input as well as the logic cones driven by the
outputs. Obtaining an exact count of undetected faults is complicated by the fact that some of those faults may also lie in
other, overlapping cones where they are detected. The solution is to ensure that everything in the design is modeled.

Random access memory: In the absence of either bypass logic or the ability to write and read through RAMs, faults in the
shadow of the RAM may go undetected. As with black-box faults, it is difficult to determine exactly which faults are not
detected because of potentially overlapping cones of logic.

If you can make design changes, adding bypass logic may address this problem. Some ATPG tools can generate special
“RAM-sequential” patterns that propagate faults through memories, so long as the applicable design rule checks (DRCs)
are satisfied. This may be a way to improve coverage without having to modify the design.

Cell constraints: Sometimes you need to constrain scan cells with regard to what values they are capable of loading and
capturing (usually for timing-related reasons). These constraints imposed on the ATPG tool will prevent some faults from
being detected. If the cell constraint is one that limits capturing, then to determine the effect, you’ll need to look at the
cone of logic that drives the scan cell and sift out faults that are detected by overlapping cones.

If found early enough in the design cycle, the underlying timing issue can possibly be corrected, which makes cell
constraints unnecessary. However, this type of timing problem is often found too late in the design cycle to be changed.
Using cell constraints is a bandage approach to getting patterns to pass, and the resulting test coverage loss is the price
to be paid.

ATPG constraints: You may impose additional constraints on the ATPG tool to ensure that certain areas of the design are
held in a desired state. For example, let’s say you need to hold an internal bus so that it drives in only one direction. As
with all types of constraints, parts of the design will be prevented from toggling, which limits test coverage. As with pin
constraints, if the assumption is that these constraints are necessary for the test to work, the coverage loss cannot be
recovered.

False/multicycle paths: Some limitations to test coverage are specific to at-speed testing. False paths cannot be tested at
functional frequencies; therefore, ATPG must be prevented from targeting them at speed to avoid failures on the
automatic test equipment. Because transition-delay fault (TDF) patterns use only one at-speed cycle to propagate faults,
multicycle paths (which require more than one cycle) must also be masked out. Determining which faults go undetected
in false paths is complicated by the manner in which false paths are defined.

Delay-constraint files usually specify a path by designating “-from”, “-to” and possibly “-through” to describe a start and
end point of the path. In between those points, there can be a significant amount of logic to trace and potentially
multiple paths if you don’t use “-through” to specify the exact path.
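
To see why a “-from”/“-to” pair without “-through” is ambiguous, the toy connectivity below (all names hypothetical) has two structural paths between the same endpoints, and any fault-accounting script would have to consider both:

# Toy fan-out map: two structural routes between the same start and end points.
FANOUT = {
    "regA/Q":  ["u_mux/A", "u_and/A"],
    "u_mux/A": ["u_mux/Y"],
    "u_and/A": ["u_and/Y"],
    "u_mux/Y": ["regB/D"],
    "u_and/Y": ["regB/D"],
}

def enumerate_paths(graph, start, end, path=None):
    """Depth-first enumeration of every structural path from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        paths.extend(enumerate_paths(graph, nxt, end, path))
    return paths

for p in enumerate_paths(FANOUT, "regA/Q", "regB/D"):
    print(" -> ".join(p))
# regA/Q -> u_mux/A -> u_mux/Y -> regB/D
# regA/Q -> u_and/A -> u_and/Y -> regB/D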

STEPS TO IDENTIFY AND QUANTIFY COVERAGE ISSUES


There are three aspects of the debug challenge:

 How you identify which coverage issues (as described above) exist,
 How you determine the effect each issue has on the coverage, and
 What, if anything, you can do to improve the coverage.

Typically, we have had to rely on a significant amount of design experience as well as ATPG tool proficiency to manually
determine and quantify the effects of design characteristics or ATPG settings that limit coverage. The usual steps that
are required to manually debug fault coverage are:

1. Identify a common thread in the AU faults.
2. Investigate a single representative fault.
3. Rely on your experience to recognize trends.
4. Determine the effect of the issue on test coverage.

Let’s look at these steps in turn. When it comes to identifying a common thread in the AU faults, it is extremely difficult,
if not impossible, to identify a single problem by looking at a list of AU faults. You have to recognize trends in either the
text listing of faults or graphical view of faults relative to the design hierarchy. For example, a long list of faults that are
obviously contained in the design hierarchy of the boundary scan logic may be caused by a single problem.

At some point, you’ll need to focus your analysis efforts on one fault at a time, so pick one you think might represent a
larger group of faults. You might zero in on design elements like registers or memories, but this is usually based more on
intuition than anything else. ATPG tools have different reporting capabilities that can be used to report on the inherent
controllability and observability of a fault location, which can help but often provide limited information. Interpreting
the reports at this level requires an in-depth knowledge of the ATPG tool’s capabilities and a fair amount of instinct
regarding where to focus efforts.

As is often the case, your success with debugging relies on having been through the process and identifying similar
situations. For example, if a significant number of boundary scan faults are listed as AU, this may be an indication that
the boundary-scan logic has been initialized to a certain desired state and must be held in that state to operate properly.
Making connections like this between the trends you identify in the list of AU faults and what you know about designs
and design practices in general requires a fair amount of experience.

Once an issue is identified, how you determine its significance will be different depending on the issue. As previously
described, you often need to keep track of backward and forward cones of logic fanning out from a single constrained
point to determine the potential group of affected faults. From there, you also need to evaluate each of those potential
faults to assess if it is possibly observed in another overlapping cone of logic.

Some other techniques can approximate the effect of certain issues. For pin constraints, it may be possible to have the
tool temporarily treat them like tied-untestable faults so that coverage can be recalculated and compared to the original
coverage number. Whole design modules can be no-faulted (for example, memory built-in self-test [MBIST] logic) to see
the difference in coverage.

All of these approaches require a combination of special scripts to trace logic paths backward and/or forward, multiple
runs of the ATPG tool with different settings, and a high level of tool expertise. Even then, the actual effect is usually still
based on an approximation.
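
A minimal sketch of the reclassify-and-recompute approximation described above, assuming you can already produce the list of AU faults in a constraint’s shadow (all numbers are illustrative):

def test_coverage(detected, total, untestable):
    return 100.0 * detected / (total - untestable)

# Hypothetical baseline numbers from an ATPG run.
total, untestable, detected = 400_000, 5_000, 330_000
baseline = test_coverage(detected, total, untestable)

# Approximate one pin constraint's impact: reclassify the previously
# undetected faults in its shadow as tied-untestable and recompute.
shadow_au = 8_000
what_if = test_coverage(detected, total, untestable + shadow_au)

print(f"baseline {baseline:.2f}%, with shadow written off {what_if:.2f}%")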

AUTOMATED DEBUG ANALYSIS


Recently, ATPG tools have been improved to automatically identify issues that affect test coverage and quantify just how
much each issue affects the coverage. The most common method to display this information is through a modified
version of the traditional statistics report that you can access in the command line mode of ATPG tools. Mentor
Graphics’ ATPG tools FastScan and TestKompress are used as an example here to demonstrate what’s available for
automated analysis of low test coverage.

Without any additional ATPG tool runs or any of the manual debug steps, the new statistics report automatically
provides details about coverage issues (Fig. 2). Note the list of the total number of uncollapsed faults in the design,
which is then broken down into various ATPG categories (Fig. 2, arrow #1). The percentage listed within the parentheses
is based on the total number of faults.

The next important area of the report is the test coverage achieved by the patterns generated (Fig. 2, arrow #2). In this
case, the coverage is 83.67%, which may not be acceptable. If that test coverage is unacceptable, the next place to look
is the line in the statistics report that indicates the number of atpg_untestable or AU (Fig. 2, arrow #3). This line points
out that 57,563 faults (or 14.56% of the total number of faults) are AU.
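
As a quick check of how the percentages in the report are derived (the 395,480 total comes from the same figure, discussed below), the AU percentage is simply the AU count divided by the total number of faults:

total_faults = 395_480   # total uncollapsed faults reported in Fig. 2
au_faults = 57_563       # atpg_untestable (AU) faults
print(f"{100.0 * au_faults / total_faults:.2f}%")   # 14.56%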

Up to this point, the information is very typical of what you would find in a traditional report. Moving down to the
“Untested Faults” section (Fig. 2, arrow #4), you can now get a detailed breakdown of which AU categories have a
significant effect on test-coverage loss. The most significant category of test-coverage loss is TC, or tied cells (Fig. 2,
arrow #5). This category of AU faults accounts for 4.46% of the total number of faults. In this case, “tied cells” refers to
registers that are tied to a particular state as a result of the ATPG tool having performed DRCs and simulating an
initialization or “test_setup” procedure.

The report also lists the most significant individual tied cells (as well as the state to which they are tied), so that you can
evaluate the severity of the effect on test coverage at a fine level of detail. A quick review of the instance path names of
these tied cells suggests that it’s all test-related logic (boundary scan and MBIST). Although you must still perform
additional manual analysis to determine whether this category of AU faults can be reduced, this report gives a clear
indication of where to look in the design. If it is determined that nothing can be corrected because the test mode
requires this logic to be tied, then at least you will be able to explain why 4.46% of the faults will remain untestable.

The next significant category of AU faults is FP, or “false_path,” faults (Fig. 2, arrow #6). This transition-fault pattern set
was generated with false-path definitions applied, so the coverage will be lower. From this report, you can see that
5.37% of the faults cannot be tested because of the false-path definitions. Many test engineers believe that test coverage
should not be penalized as a result of false paths, because they are functionally false paths that, by definition, cannot be
tested at speed.

A relatively significant number of multicycle-path faults (1.01%) hurt the test coverage (Fig. 2, arrow #7). Given this
information, you may choose to address these faults by targeting them with another pattern set using a clock cycle that
will exercise them at a lower frequency. There is no guarantee that all of these faults will be detected at a different
frequency, because other issues may prevent detection. What the report tells you is that these faults definitely cannot
be tested for the reason listed; this is true for all the categories.

The SEQ (sequential_depth) category (Fig. 2, arrow #8) refers to faults that cannot be detected because the sequential
depth of the ATPG tool has not been set high enough. This implies that there may be some non-scan logic or memories
that require an increased sequential depth to propagate and detect faults. You can affect this number by changing some
of the settings during pattern generation.

Right after the SEQ category is another category called “Unclassified.” This is a group of faults that does not fall into any
of the pre-defined categories that the ATPG tool can determine. They are faults that traditional statistics reports would
normally indicate as AU—there’s just no additional detailed analysis available to determine why they are AU. These
faults will require manual analysis.

I previously mentioned that many test engineers do not believe false path faults should be included in the calculation of
test coverage while others do. To satisfy these differing requirements, a new column of information called “total
relevant” has been added to the statistics report (Fig. 2, arrow #9).

Faults that were not considered relevant were deleted, which results in a lower number of total faults (374,238)
compared to the total number of faults in the neighboring column (395,480). How can you tell which faults were deleted
from the relevant-coverage calculation? If you trace down the “Total Relevant” column, you will eventually see the word
“deleted” next to the false-path category. This means that the 21,242 false-path faults were deleted from the total
relevant faults, and the coverage was recalculated. The relevant coverage is 88.46%, compared to 83.67% (Fig. 2, arrow
#10). You can see both coverage numbers side by side and decide which one should be used.
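
The arithmetic behind the two columns can be reconstructed roughly as follows. The detected and untestable counts below are not given in the article; they are back-solved so that the reported 83.67% and 88.46% come out, and they assume all 21,242 deleted false-path faults had been counted, undetected, in the original test-coverage denominator.

total          = 395_480   # total faults (original column)
deleted_fp     = 21_242    # false-path faults removed as not relevant
total_relevant = total - deleted_fp             # 374,238, as reported

# Hypothetical counts, back-solved to reproduce the reported coverages.
untestable = 3_190
detected   = 328_229

test_cov     = 100.0 * detected / (total - untestable)
relevant_cov = 100.0 * detected / (total_relevant - untestable)
print(f"{test_cov:.2f}% vs. {relevant_cov:.2f}%")   # 83.67% vs. 88.46%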
Another way to slice the coverage information is to view it with respect to the clock domains (Fig. 2, arrow #11). The
next column to the right indicates what percentage of the total number of faults falls in that clock domain (e.g.,
58.71% of the faults in the design are in the clk1 clock domain).

The next column indicates the test coverage of that clock domain’s fault population. In this case, 94.88% of the clk1
faults were detected. The point in listing both the percentage of total faults and percentage coverage of each clock
domain is so that you can investigate low coverage for clock domains that represent a significant percentage of the
design. Additional reporting capability is available so that a detailed analysis of the AU faults can be shown for the fault
universe of each individual clock domain.
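
A small sketch of how those two per-domain numbers combine into an overall view; only the clk1 figures come from the report, the other domains are made up, and the weighted sum ignores the untestable-fault adjustment, so it is only an approximation:

# (fraction of total faults in the domain, coverage within that domain)
domains = {
    "clk1": (0.5871, 0.9488),   # from the report
    "clk2": (0.3000, 0.7000),   # hypothetical
    "clk3": (0.1129, 0.6000),   # hypothetical
}

for name, (frac, cov) in domains.items():
    print(f"{name}: {frac:.2%} of all faults, {cov:.2%} covered,"
          f" contributing {frac * cov:.2%} overall")

overall = sum(frac * cov for frac, cov in domains.values())
print(f"approximate overall coverage: {overall:.2%}")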

Some tools provide more graphical means of viewing this information relative to design hierarchies as well as the
design’s clock domains. In addition to the traditional statistics report viewed on the tool’s command line, you can look at
the coverage analysis graphically. An example shows how the AU analysis categories can be displayed relative to the
design hierarchy (Fig. 3, top left panel). The bottom panel displays the same statistics report as shown on the command
line, but design instances are hyperlinked so that you can bring up the schematic view of that instance (Fig. 3, top right
panel). You can also overlay the fault category information on the schematic view. The example shown here is the same
one discussed earlier in which boundary-scan logic is tied because of the initialization procedure, which resulted in a loss
of 0.24% test coverage.

The additional information provided in detailed statistics reports like this provides valuable insight into how to identify
and address potential test coverage issues. Debug automation in an ATPG tool means that the most significant test-
coverage issues are quickly highlighted along with the effect on coverage. In many cases (such as pin constraints and tied
cells), you will be able to immediately determine that nothing can be done to fix the issue and you can easily determine
what the test-coverage ceiling will be.

Further automation within the ATPG tool eliminates significant manual effort and debug time required to sift through an
otherwise nonsensical listing of untestable faults. As a result, you are freed to focus on the task of resolving the
identified problems.
