
Multiple Scan Chains for Power Minimization During

Test Application in Sequential Circuits


Nicola Nicolici and Bashir M. Al-Hashimi
Electronic Systems Design Group
Department of Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ, UK
{nn99r,bmah}@ecs.soton.ac.uk

Abstract

This paper presents a new technique for power minimization during test application in sequen-
tial circuits using multiple scan chains. The technique is based on a new design for test (DFT)
architecture and a novel test application strategy which reduces spurious transitions in the cir-
cuit under test. To facilitate the reduction of spurious transitions, the proposed DFT architecture
is based on classifying scan latches into compatible, incompatible and independent scan latches.
Based on their classification, scan latches are partitioned into multiple scan chains and a single
extra test vector associated with each scan chain is computed. A new test application strategy
which applies the extra test vector to primary inputs while shifting out test responses for each
scan chain, minimizes power dissipation by eliminating the spurious transitions which occur
in the combinational part of the circuit. The newly introduced multiple scan chain-based tech-
nique which relies on extra test vectors and multiple scan chains does not introduce performance
degradation and minimizes clock tree power dissipation with minimal impact on both test area
and test data overhead. Unlike previous approaches which are test set dependent and hence are
not able to handle large circuits due to the complexity of the design space, this paper shows
that with low test area and test data overhead substantial savings in power dissipation during
test application are achieved in very low computational time for both small and large tests. For
example, in the case of benchmark circuit s15850 it takes  600s in computational time and
 1% in test area and test data overhead to achieve over 80% savings in power dissipation.
1 Introduction

Minimization of power dissipation in very large scale integrated (VLSI) circuits is important
to improve the reliability and reduce packaging costs [1]. This indicates that future successful
portable applications will depend not only on low-power design methods but also on new design
for testability (DFT) techniques targeting low-power VLSI circuits. Numerous techniques for
investigating power minimization during the normal (functional) mode [2] have been proposed.
Also, it is important to examine the power dissipation during the testing mode [3, 4], mainly for
the following two reasons. Firstly, it was outlined in [1] that power dissipated during test application
is substantially higher than power dissipated during functional operation, which can
decrease the reliability of the circuit under test due to higher temperature and current density.
Secondly, the excessive power/ground noise caused by the high rate of current flowing in power
and ground lines can erroneously change the logic state of circuit lines, causing some good dies
to fail the test [5] and leading to yield loss. While minimizing power dissipation in full scan sequential
circuits is the focus of this paper, in order to provide a meaningful understanding of the
novel proposed approach a comprehensive review of sources of higher power dissipation during
test application and low power testing techniques is given in sections 1.1 and 1.2. Motivations
and objectives of the proposed work are presented in section 1.3.

1.1 Sources of higher power dissipation during test application


Depending on the level of abstraction and circuit type, high power dissipation during test appli-
cation is due to the following problems:

i. Systems which comprise modern memory systems and multichip modules (MCMs) em-
ploy power-conscious architectural decisions where blocks are not simultaneously acti-
vated under functional operation [6]. Hence, inactive blocks do not contribute to power
dissipation during the functional operation. However, when the system is in the test mode
of operation, concurrent execution of tests in many blocks will result in substantially
higher power dissipation when compared to functional operation.

ii. Low power combinational circuits are synthesized by algorithms [2] which seek to op-
timize the signal or transition probability of circuit nodes using only the spatial depen-
dencies inside the circuit assuming the transition probabilities of primary inputs to be
given. However, the complex spatiotemporal correlations which occur at the primary inputs
must be considered [2]. This is of further importance during test application since
correlation between consecutive test vectors generated by an automatic test pattern gener-
ator (ATPG) is very low, because a test vector is generated for a given target fault without
any consideration of the previous test vector in the test sequence. The low correlation be-
tween consecutive test vectors during test application leads to substantially higher power
dissipation when compared to functional operation.

iii. Low power sequential circuits are synthesized by state assignment algorithms which use
state transition probabilities [2]. The state transition probabilities are computed assuming
input probability distribution and state transition graph which is valid during functional
operation. These two assumptions are not valid during the test mode of operation when
scan DFT technique is employed. While shifting out test responses, the scan latches
are assigned uncorrelated values that destroy the correlation between successive states.
Furthermore, in the case of data path circuits with large number of states that are syn-
thesized for low power using the correlations between data transfers [2], in the test mode
scan registers are assigned uncorrelated values which are never reached during functional
operation leading to substantially higher power dissipation.

1.2 Previous work on low power testing


This section gives a comprehensive review of recently proposed solutions for solving problems
(i) - (iii) of section 1.1.
Problem (i): To overcome the problem of high power dissipation during test application at
the system level, numerous power-constrained test scheduling algorithms have been proposed
under built-in self-test (BIST) environment [1, 6–11]. The approach in [1] schedules the tests
under power constraints by grouping and ordering based on floorplan information. A further
exploration in the solution space of the scheduling problem is provided in [6] where a resource
graph formulation for the test problem is given and tests are scheduled concurrently without
exceeding their power ratings during test application. To avoid identifying all the
cliques in a graph and solving the covering table minimization problem applied in [6], which are well
known NP-hard problems, the solution proposed in [7] uses the left edge algorithm and a tree
growing technique as a heuristic for the block test scheduling problem. Several solutions for
scheduling tests under power and area constraints [8–11] have recently been proposed. However,
all the previous approaches assume a BIST environment, which achieves lower power dissipation
during testing at the expense of high test area overhead and test application time.

Problem (ii): A new ATPG tool [5] was proposed to overcome the low correlation between
consecutive test vectors during test application in combinational circuits. Despite achieving the
objectives of safe and inexpensive testing of low power circuits the approach in [5] increased
the test application time. A different approach for minimizing power dissipation during test
application in combinational circuits (problem ii) is based on test vector ordering [12–15]. Test
vector ordering is done in a post-ATPG phase with no overhead in test application time since
test vectors are reordered such that correlation between consecutive test vectors matches the
assumed transition probabilities of primary inputs used for switching activity computation during
low power logic synthesis. However, the computational time in [12] is very high due to the
complexity of the test vector ordering problem, which is reduced to finding a minimum cost Hamiltonian
path in a complete, undirected, and weighted graph. The high computational time is
overcome by the techniques proposed in [13–15], where test vector ordering assumes high correlation
between switching activity in the circuit under test and the Hamming distance [13, 14]
or transition density [15] at circuit primary inputs. For combinational circuits employing BIST
several techniques for minimizing power dissipation have been proposed recently [16–23]. In
[16] the use of dual speed linear feedback shift register (LFSR) lowers the transition density at
the circuit inputs leading to minimized power dissipation. Optimal weight sets for input signal
distribution are determined in order to minimize average power [17], while the peak power is
reduced by finding the best initial conditions in the cellular automata (CA) cells used for pattern
generation [18]. It has been proved in [19] that all primitive polynomial LFSRs of the
same size produce the same power dissipation in the circuit under test, which suggests using the
LFSR with the smallest number of XOR gates since it yields the lowest power dissipation by itself. A
mixed solution based on reseeding LFSRs and test vector inhibiting to filter a few non-detecting
subsequences of a pseudorandom test sequence has been proposed in [20]. An enhancement
of the test vector inhibiting technique has been proposed in [21], where all the non-detecting
subsequences are filtered. A different approach for filtering non-detecting vectors inspired by the
precomputation architecture is presented in [22]. An improvement in area overhead associated
with filtering non-detecting vectors without penalty in fault coverage or test length has been
achieved using non-linear hybrid cellular automata [23]. Regardless of the type of test pattern
generator, BIST architectures significantly differ from one another in terms of power dissipation
as outlined in [24]. Thus, circuit partitioning for low power BIST and test session planning have
an important influence on power dissipation as shown in [25]. Regularity of multiplier modules
and linear sized test set required to achieve high fault coverage lead to efficient low power BIST
implementations for data paths [26]. Although the techniques proposed for minimizing power
dissipation during test application in combinational circuits achieve good results, different ap-
proaches are required for sequential circuits where both DFT methodology and test application
strategy have a strong impact on power dissipation.
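To make the Hamming distance idea behind test vector ordering concrete, the following is a minimal sketch of a greedy nearest-neighbour ordering. It is an illustration only, not the algorithm of [13, 14]: the function names `hamming`, `greedy_order` and `total_distance` and the four example vectors are assumptions introduced here, and the Hamming distance between consecutive vectors is used as a stand-in for switching activity.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_order(vectors: List[str]) -> List[str]:
    """Nearest-neighbour heuristic (hypothetical helper): start from the first
    vector and repeatedly append the remaining vector that is closest in
    Hamming distance to the last one placed."""
    remaining = list(vectors)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def total_distance(seq: List[str]) -> int:
    """Sum of Hamming distances between consecutive vectors in a sequence."""
    return sum(hamming(a, b) for a, b in zip(seq, seq[1:]))


# Illustrative test set (made up for this sketch).
tests = ["0000", "1111", "0001", "1110"]
ordered = greedy_order(tests)

print(ordered)                 # ['0000', '0001', '1111', '1110']
print(total_distance(tests))   # 11
print(total_distance(ordered)) # 5
```

The greedy pass avoids the minimum cost Hamiltonian path search mentioned above, at the price of giving no optimality guarantee.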
Problem (iii): To minimize power dissipation in non scan sequential circuits during test appli-
cation a test pattern generation methodology for low power dissipation has been proposed in
[27]. The methodology is based on three independent steps comprising redundant test pattern
generation, power dissipation measurement and optimal test sequence selection. The methodology,
which is based on genetic algorithms, achieves considerable savings in power dissipation;
however, it cannot be applied to scan sequential circuits where shifting power dissipation is the
major contributor to total power dissipation. To minimize shifting power dissipation in scan
sequential circuits, test vector inhibiting techniques proposed for combinational circuits are ex-
tended to scan sequential circuits [28]. In [29] the test vector inhibiting technique is extended
where the modules and modes with the highest power dissipation are identified, and gating
logic to reduce power dissipation has been introduced. Despite substantial savings in power
dissipation, vector detection and gating logic introduce not only significant area overhead but
also considerable performance degradation for the modified scan cell design. In [30] a new scan
BIST structure has been proposed based on the experimental observation that a very high fault
coverage can be obtained by a small number of clusters of test vectors. Although not targeted
specifically at low power dissipation during test application, the approach in [30] yields high
fault coverage with correlated scan patterns, which will also lead to lower power dissipation.
A similar approach is employed in the low transition random test pattern generator (LT-RTPG)
proposed in [31], where neighbouring bits of the test vectors are assigned identical values in
most test vectors. A simple and fast procedure to compact scan vectors as much as possible
without exceeding power dissipation has been proposed in [32]. All the previous scan-based
BIST techniques [28–32] introduce test area overhead and/or further performance degradation
when compared to scan DFT methodology. A different technique [12] based on test vector
and scan latch ordering minimizes power dissipation in full scan sequential circuits without any
overhead in test area or performance degradation. Further benefit of the post-ATPG technique
proposed in [12] is that minimization of power dissipation during test application is achieved
without any decrease in fault coverage and/or increase in test application time. However, the
technique is test set dependent and cannot significantly reduce power dissipation despite a large
computational time required to explore the large design space. Furthermore, for circuits with
large number of scan latches the technique proposed in [12] is infeasible since computational
time required to compute the cost function of each solution in the large design space is
unacceptably large. A further enhancement of the technique proposed in [12] can be achieved by
defining novel test application strategies since the value of primary inputs is irrelevant while
shifting out test responses. Hence, an improvement to scan latch and test vector ordering based
on primary input freezing has been proposed in [33]. The approach does not introduce area
overhead or further performance degradation, however it requires high computational times for
large circuits. A different approach to achieve power savings is the use of extra primary input
test vectors and hence supplementary volume of test data [34, 35]. The technique proposed in
[34] exploits the redundant information that occurs during scan shifting, test application and
response capture to minimize switching activity in the circuit under test. Despite achieving
considerable power savings the technique requires long test application time and large volume
of test data. The volume of test data is reduced in [35] where a D-algorithm like pattern gen-
erator [36] is developed to generate a single control pattern to mask the circuit activity while
shifting out responses. The input control technique proposed in [35] can further be combined
with the previously proposed scan latch and test vector ordering [12], although this achieves only modest
savings in power dissipation. Moreover, both approaches based on extra test vectors [34, 35]
require high computational time and hence are infeasible for large sequential circuits.

1.3 Motivation and objectives


The aim of this paper is to reduce power dissipation in scan sequential circuits (problem iii).
Despite their benefits in lowering power dissipation during test application, the previously de-
scribed techniques [12, 28–35] are inefficient due to one or more of the following problems:

a. test area overhead associated with detection logic [28, 29] required to find non-essential
vectors (i.e. vectors which do not contribute to an increase in fault coverage).

b. performance degradation associated with modified scan cell design [29].

c. large test application time required to achieve significant power savings [29–32, 34].

d. clock tree power dissipation is tackled by clock gating only for nonessential test vectors
[29].

e. high number of extra test vectors [34] emerges as a problem to testers which need to
change to support the large volume of test data [37].

f. computational time may be prohibitively large hindering the exploration for large sequen-
tial circuits [12, 33–35].

The previous techniques [12, 28–35] proposed separate solutions for solving one of the prob-
lems (a) - (f) at the expense of the other problems. For example while test vector inhibiting
techniques [28, 29] achieve good savings in power dissipation, considerable area overhead for
detection logic is introduced (problem a) or further performance degradation is incurred (prob-
lem b). On the other hand techniques based on adjacent patterns [30–32] require considerable
test application time (problem c). Furthermore, clock tree power dissipation (problem d) which
can be up to one third of total power dissipation [38] is tackled only in [29] where the clock
is gated only for non-essential test vectors. This implies that for essential vectors there are no
savings in clock tree power dissipation. The technique proposed in [34] necessitates an increase
of  m  p   m p  in the volume of test data where m is the number of scan latches and p is
the number of primary inputs. While volume of test data (problem e) was not a concern in the
past for small to medium sized circuits it is recently emerging as a problem for testers which
need to change to support the large volume of test data [37]. The technique proposed in [35] overcomes
the problem with large volume of test data by computing a single extra vector. However,
it yields modest savings in power dissipation due to its inability to fully mask the activity in the
combinational part of the circuit. Furthermore, to achieve good fault coverage both techniques
based on extra vectors [34, 35] require longer test sequences and hence both higher test applica-
tion time (problem c) and computational time (problem f). Finally techniques which operate in
a post-ATPG phase [12, 33] using compact test sets for high fault coverage require huge com-
putational time (problem f) since they are strongly test set dependent and require probabilistic
optimization.
The aim of this paper is to introduce a new technique for power minimization during test
application in full scan sequential circuits based on a novel DFT architecture which eliminates
all the above mentioned problems (a) - (f). The proposed DFT architecture is based on partition-
ing scan latches into multiple scan chains which reduces the clock tree power dissipation and
does not have performance penalty. A new test application strategy for the proposed DFT ar-
chitecture which applies a single extra test vector while shifting out test responses for each scan
chain is presented. The multiple scan chain-based approach for power minimization which is
test set independent, is applicable to both non-compact and compact test sets leading to low test
application time. This paper shows that with low test area and test data overhead high savings
in power dissipation during test application in large full scan sequential circuits are achieved in
low computational time.

2 Background and Definitions
In the following, a brief review of the standard test terminology and power dissipation concepts
which will be used throughout the paper is presented.
The controlling value for a gate is a single input value that uniquely determines the output
to a known value independent of the other inputs to the gate. For example, the controlling value
for OR gate is 1, and for AND gate is 0. If the value of an input is the complement of the
controlling value, then the input has a noncontrolling value. A path is a set of connected gates
and wires. A path is defined by a single input wire and a single output wire per gate. A signal
is an on-input if it is on the target path. A signal is an off-input (side input) if it is an input to
a gate which is on a target path but is not an on-input. If two faults can be detected by a single
test vector, they are called compatible faults. Consequently, two faults are called incompatible
faults, if they cannot be detected by a single test vector. A test vector from a given test set is
called an essential test vector, if it detects at least one fault that is not detected by any other test
vector in this test set. A test vector is non-essential with respect to a given test set if all the faults
detected by it are also detected by other test vectors in the given test set. A test set dependent
approach for power minimization is dependent on the size and type of the test set employed
during test application. A test set independent approach for power minimization depends only
on circuit structure and savings are guaranteed regardless of the size and type of test set.
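The definitions of essential and non-essential test vectors can be illustrated with a short sketch. The fault detection table below is made up for illustration, and `essential_vectors` is a hypothetical helper name; in practice the table would come from fault simulation.

```python
from typing import Dict, List, Set


def essential_vectors(detects: Dict[str, Set[str]]) -> List[str]:
    """Return the vectors that detect at least one fault which no other
    vector in the same test set detects."""
    essential = []
    for vec, faults in detects.items():
        # Union of the faults detected by every other vector in the set.
        others: Set[str] = set()
        for other, other_faults in detects.items():
            if other != vec:
                others |= other_faults
        if faults - others:
            essential.append(vec)
    return essential


# Hypothetical detection table: vector name -> set of detected faults.
detects = {
    "v0": {"f1", "f2"},
    "v1": {"f2", "f3"},
    "v2": {"f2"},
}

print(essential_vectors(detects))  # ['v0', 'v1']
```

Here v2 is non-essential with respect to this test set because its only detected fault, f2, is also detected by v0 and v1.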
Power dissipation in digital CMOS circuits is divided into static and dynamic power. The
static power is considered negligible when compared to the dynamic power in digital CMOS
circuits [39]. If the gate is part of a synchronous digital circuit controlled by global clock, it
follows that the dynamic power dissipation $P_d$ is calculated using:

$$P_d = 0.5 \cdot C_{load} \cdot \frac{V_{DD}^2}{T_{cyc}} \cdot N_G \qquad (1)$$

where $C_{load}$ is the load capacitance, $V_{DD}$ is the supply voltage, $T_{cyc}$ is the global clock cycle, and
$N_G$ is the total number of gate output transitions (0 → 1 or 1 → 0). Since supply voltage $V_{DD}$
and global clock cycle $T_{cyc}$ are design constraints, they are not under designer control. Thus,
node transition count

$$NTC = \sum_{\text{for all gates } G} N_G \cdot C_{load} \qquad (2)$$

is reported as quantitative measure for power dissipation throughout the paper. It has been
assumed that load capacitance for each combinational gate is equal to the number of fan-outs.
The node transition count in scan latches $N_{SL}$ is considered as in [12], where it was shown that
for input changes 0 → 0 and 1 → 1, $N_{SLmin} = 2$, whilst for input changes 0 → 1 and 1 → 0,
$N_{SLmax} = 6$.
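As a small worked instance of the node transition count, the sketch below evaluates equation (2) with the load capacitance of each gate taken as its number of fan-outs, and applies the scan latch weights 2 and 6 quoted from [12]. The gate names, transition counts and fan-outs are invented for illustration, and the helper names are assumptions of this sketch.

```python
from typing import Dict, Sequence


def node_transition_count(transitions: Dict[str, int], fanout: Dict[str, int]) -> int:
    """NTC over combinational gates: sum of (output transitions * fan-out),
    with fan-out standing in for load capacitance."""
    return sum(n * fanout[g] for g, n in transitions.items())


def scan_latch_ntc(bits: Sequence[int]) -> int:
    """NTC of one scan latch over a sequence of input values:
    2 when the input repeats (0->0, 1->1), 6 when it toggles (0->1, 1->0)."""
    return sum(2 if a == b else 6 for a, b in zip(bits, bits[1:]))


# Hypothetical per-gate data for a three-gate circuit.
transitions = {"g1": 3, "g2": 1, "g3": 0}
fanout = {"g1": 2, "g2": 1, "g3": 4}

ntc_comb = node_transition_count(transitions, fanout)  # 3*2 + 1*1 + 0*4
ntc_latch = scan_latch_ntc([0, 1, 1, 0])               # 6 + 2 + 6

print(ntc_comb)   # 7
print(ntc_latch)  # 14
```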

3 Power Minimization in Full Scan Sequential Circuits Based
on Multiple Scan Chains
In this section a new technique for power minimization in full scan sequential circuits based on
multiple scan chains is introduced. Section 3.1 overviews the proposed design for testability
(DFT) architecture for power minimization. Section 3.2 defines compatible, incompatible and
independent scan latches and explains through examples their importance for partitioning scan
latches into multiple scan chains, which is described in section 4. Although a
previous approach [34] used the term "independent", it classified primary inputs as
independent and not scan latches as is the case in section 3.2. Therefore, there is no similarity
between the previous approach [34] and the proposed classification beyond the coincidental use
of the same terminology. Finally section 3.3 gives an important theoretical result showing the advantage
of the proposed DFT architecture from the clock tree power dissipation standpoint.

3.1 Proposed Design for Testability Architecture Using Multiple Scan Chains
The proposed DFT architecture using multiple scan chains SC0 ... SCk-1 is illustrated in Figure
1. The scan input ScanIn is routed to all scan chains while the scan output ScanOut is selected
from the output of each scan chain. Scan chains SC0 ... SCk-1 are operated using nonoverlapping
clock signals CLK0 ... CLKk-1. The nonoverlapping clock signals are obtained by gating the system clock CLK using
a scan control register which has the number of latches equal to the number of scan chains.
While shifting out test responses through scan chain SCi, only bit position i of the scan control
register is set to 1 while the other positions are set to 0. This is easily implemented by shifting
the value of 1 through the scan control register using the extra scan clock SCLK. Before starting
the first scan cycle, the initial vector 10...00 is set up in the scan control register using the scan
input ScanIn. Thereafter, for each scan cycle, the 10...00 value is propagated circularly through
the scan control register as shown in Figure 1. It should be noted that when the circuit under
test is in the test mode all the faults in the extra logic are observable through the ScanOut line using
the test data which is shifted through the k scan chains and control data shifted through the scan
control register. Therefore the extra test hardware does not cause any decrease in fault coverage.
During the normal operation of the circuit CLK0 ... CLKk-1 are active at the same time since
when the normal/test signal N/T is 1 the output of the extra OR gates is 1 and the system clock CLK is
not gated by the scan control register. To provide a brief overview of the test application strategy
for the proposed DFT architecture, while shifting out test responses present in scan latches
from scan chain SCi, primary inputs are set to extra test vector EVi which eliminates the spurious
transitions (Definition 1 from section 3.2) that originate from scan latches from scan chain
SCi. Note that the proposed DFT architecture does not introduce performance degradation since
extra test hardware is not inserted on critical paths. Further, the extra test hardware required by
the scan control register and selection logic can be specified at the gate level and synthesized
with the rest of the circuit which makes the proposed DFT architecture easily embeddable in
the existing VLSI design flow. The algorithm for partitioning scan latches into multiple scan
chains is described in section 4.1, while the new test application strategy using multiple scan
chains and extra test vectors is described later in section 4.2. Before describing generation of
multiple scan chains, scan latches need to be classified into three broad classes as described in
the following section.
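The behaviour of the scan control register described in this section can be sketched as a small simulation. This is a behavioural illustration under stated assumptions, not the gate-level design: the text states that OR gates combine N/T with the register bits, while modelling each chain clock as CLK gated by that OR output is an assumption of this sketch, as are the names `rotate` and `chain_clock_enables` and the choice k = 3.

```python
from typing import List


def rotate(reg: List[int]) -> List[int]:
    """Circular shift by one position: the single 1 moves to the next latch."""
    return [reg[-1]] + reg[:-1]


def chain_clock_enables(reg: List[int], normal_mode: bool) -> List[int]:
    """enable_i = N/T OR reg[i]. Chain clock CLK_i is assumed to be the
    system clock CLK passed through only when enable_i is 1."""
    return [1 if (normal_mode or bit) else 0 for bit in reg]


k = 3                      # number of scan chains (illustrative)
reg = [1] + [0] * (k - 1)  # initial vector 10...0 loaded through ScanIn

active = []                # index of the chain being shifted in each scan cycle
for _ in range(k):
    enables = chain_clock_enables(reg, normal_mode=False)
    active.append(enables.index(1))
    reg = rotate(reg)      # one SCLK pulse moves the 1 to the next position

print(active)  # [0, 1, 2]
print(reg)     # [1, 0, 0]
print(chain_clock_enables([0, 1, 0], normal_mode=True))  # [1, 1, 1]
```

In test mode exactly one chain clock is enabled per scan cycle, so only the latches of that chain toggle. In normal mode every enable is 1 and all chains receive the system clock.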

[Figure 1, image not reproduced. Labels in the figure: extra test vectors <EV0 ... EVk-1>, X, Z, C, multiple scan chains SC0 ... SCk-1 with clocks CLK0 ... CLKk-1, Scan In, Scan Out, scan control register (positions 0 ... k-1), CLK, N/T, SCLK.]

Figure 1: Proposed DFT architecture based on multiple scan chains.

3.2 Compatible, Incompatible and Independent Scan Latches
In order to partition scan latches into multiple scan chains, they need to be classified into three
broad classes: compatible, incompatible and independent scan latches. It should be noted that
scan latch classification is not done explicitly by enumeration or exhaustive search, but it is
done implicitly by the partitioning algorithm as explained later in Figure 7 of section 4.1. The
proposed classification is also important for computing extra test vectors associated with each
scan chain that eliminate spurious transitions which are defined as follows.

Definition 1 A spurious transition during test application in scan sequential circuits is a transi-
tion which occurs in the combinational part of the circuit under test while shifting out the test
response and shifting in the present state part of the next test vector. These transitions do not
have any influence on test efficiency since the values at the input and output of the combinational
part are not useful test data.

Having defined the spurious transitions, now the compatible and incompatible scan latches are
introduced.

Definition 2 Two scan latches Si and Sj are compatible if all primary inputs xk can be assigned
values ck that eliminate the spurious transitions which originate from both Si and Sj. The values
ck of primary inputs xk constitute the extra test vector which eliminates spurious transitions
originating from both Si and Sj.

Note that the sole purpose of extra test vectors is to reduce the spurious transitions during test
application; they have no effect on fault coverage, which is determined by the original test set. The
application of extra test vectors defines a novel test application strategy for power minimization
which is detailed in section 4.2. Further, since a single extra test vector is used for each scan
chain regardless of values loaded in scan latches then the volume of extra test data is dependent
only on the number of scan chains and not on the number of scan latches and/or the size of the
original test set.

Definition 3 Two scan latches Si and Sj are incompatible if at least one primary input xk that
is assigned value ik to eliminate the spurious transitions which originate from Si will propagate
the transitions which originate from Sj. Two incompatible scan latches cannot be assigned to
the same scan chain since there is no extra test vector that can eliminate spurious transitions
which originate from both of them.

[Figure 2, image not reproduced. Labels in the figure: ScanIn, scan latches S0 ... S5, present state lines y0 ... y5, outputs z0 ... z5, primary inputs x0, x1, x2, ScanOut, Selection Logic.]

Figure 2: Example circuit illustrating compatible and incompatible scan latches.

The following example illustrates compatible and incompatible scan latches.

Example 1 Consider the simple circuit of Figure 2. The {x0, x1, x2} are primary inputs, {S0, S1,
S2, S3, S4, S5} are scan latches, {y0, y1, y2, y3, y4, y5} are present state lines, and {z0, z1, z2, z3, z4, z5}
are circuit outputs. To eliminate spurious transitions at gate z0 while shifting out test responses
through scan latch S0, primary input x0 must be assigned the controlling value 0 of gate z0.
Similarly, to eliminate spurious transitions that originate from scan latch S1, primary input x0
must be assigned the controlling value 1 of gate z1. Different values must be assigned to x0
to eliminate spurious transitions which originate from scan latches S0 and S1. Therefore scan
latches S0 and S1 are incompatible and are assigned to different scan chains SC0 = {S0} and
SC1 = {S1}. On the other hand, by assigning x1 to the controlling value 0 of gates z2 and z3
the spurious transitions which originate from both scan latches S2 and S3 are eliminated. Thus,
by introducing S2 and S3 into SC0 and applying for example extra test vector x0 x1 x2 = 000
while shifting out test responses from SC0 = {S0, S2, S3} no spurious transitions will occur at
gates z0, z2 and z3. Similarly, scan latches S4 and S5 are compatible since assigning 1 to the
primary input x2 eliminates spurious transitions at gates z4 and z5. By introducing S4 and S5
into SC1 and applying extra test vector x0 x1 x2 = 111 while shifting out test responses from
SC1 = {S1, S4, S5} no spurious transitions will occur at gates z1, z4 and z5. It should be noted
that there is a strict interrelation between extra test vector value x0 x1 x2 = 000 and scan chain
SC0 = {S0, S2, S3}, and x0 x1 x2 = 111 and scan chain SC1 = {S1, S4, S5}. While for the sake
of simplicity, the extra test vectors x0 x1 x2 = 000 and x0 x1 x2 = 111 have been described
explicitly in this particular example, the extra test vectors and hence the multiple scan chains
are derived implicitly using a reduced circuit, specified fault list and ATPG tool as described
later in the algorithms of section 4.1. Finally, note that output signals z3 of scan chain SC0 and
z5 of SC1 are fed into the selection logic of the proposed DFT architecture from Figure 1.
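The compatibility reasoning of Example 1 can be sketched in code. This is an explicit illustration only: the paper derives extra test vectors implicitly with an ATPG tool on a reduced circuit, whereas the sketch assumes each latch comes with a known table of required primary input values. The table `reqs` restates the requirements from Example 1, and the helper names `merge`, `partition` and `extra_vector`, together with the first-fit policy, are assumptions introduced here.

```python
from typing import Dict, List, Optional, Tuple

Req = Dict[str, int]  # primary input name -> value that blocks the transitions


def merge(a: Req, b: Req) -> Optional[Req]:
    """Combine two requirements, or return None if they need different
    values on a shared primary input (the latches are incompatible)."""
    for pi, val in b.items():
        if a.get(pi, val) != val:
            return None
    return {**a, **b}


def partition(reqs: Dict[str, Req]) -> Tuple[List[List[str]], List[Req]]:
    """First-fit (hypothetical policy): put each latch in the first chain
    whose accumulated requirement it does not contradict."""
    chains: List[List[str]] = []
    vectors: List[Req] = []
    for latch, req in reqs.items():
        for i, vec in enumerate(vectors):
            merged = merge(vec, req)
            if merged is not None:
                chains[i].append(latch)
                vectors[i] = merged
                break
        else:  # no existing chain is compatible: open a new one
            chains.append([latch])
            vectors.append(dict(req))
    return chains, vectors


def extra_vector(req: Req, inputs: List[str], fill: int) -> str:
    """Write a requirement as a bit string, filling don't-care inputs."""
    return "".join(str(req.get(pi, fill)) for pi in inputs)


# Requirements restated from Example 1.
reqs = {"S0": {"x0": 0}, "S1": {"x0": 1}, "S2": {"x1": 0},
        "S3": {"x1": 0}, "S4": {"x2": 1}, "S5": {"x2": 1}}
inputs = ["x0", "x1", "x2"]

chains, vectors = partition(reqs)
print(len(chains))  # 2

# The partition used in Example 1, checked with the same compatibility rule.
sc0 = merge(merge(reqs["S0"], reqs["S2"]), reqs["S3"])
sc1 = merge(merge(reqs["S1"], reqs["S4"]), reqs["S5"])
print(extra_vector(sc0, inputs, fill=0))  # 000
print(extra_vector(sc1, inputs, fill=1))  # 111
```

First-fit places S4 and S5 in the first chain rather than the second, so its chains differ from those of Example 1. Both partitions use two chains and both satisfy the compatibility rule.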

The previous example has assumed a simple circuit where all the spurious transitions are elim-
inated by partitioning scan latches in two scan chains SC0 and SC1 . However, some of the
spurious transitions cannot be eliminated as described in the following example.

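[Figure 3, image not reproduced. Labels in the figure: scan latches S0, S1, present state lines y0, y1, primary inputs x0, x1, gates t0, t1, z0.]

Figure 3: Example circuit illustrating spurious transitions which cannot be eliminated.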

Example 2 Consider the circuit shown in Figure 3. The spurious transitions which originate in
scan latches S0 and S1 cannot be eliminated at gate t0 since both inputs are present state lines.
However, by assigning x0 and/or x1 to the controlling value 0 of gate t1 the spurious transitions
will be eliminated at gate t1. Scan latches S0 and S1 are compatible since the same primary input
values eliminate the spurious transitions of gate t1.

Example 2 has illustrated that some of the spurious transitions cannot be eliminated since all
the gate inputs depend on present state lines. Computing primary input values that eliminate
spurious transitions (extra test vectors introduced in Definition 2) can be viewed as an ATPG
problem on a reduced circuit with a specified fault list, both of which are detailed in the algorithms
presented in section 4.1. The following example briefly illustrates the generation of the reduced
circuit required to compute extra test vectors.

Figure 4: Reduced circuit of the example circuit from Figure 3 illustrating the steps required to
compute extra test vectors.

Example 3 For the circuit shown in Figure 3 the reduced circuit is generated as follows. Initially
the signal t1 at the input of gate z0 is identified to eliminate spurious transitions that
originate from scan latches S0 and S1. Then scan latches S0 and S1, and the AND gate t0 are
excluded from the reduced circuit as shown in Figure 4. Furthermore, gate z0 is modified to a
buffer (signals t1 and z0 are identical). The targeted fault in the reduced circuit is t1 sa-1 which
eliminates the spurious transitions at gate z0 in the original circuit. Finally, the extra test vectors
(Definition 2) that eliminate the spurious transitions during test application are computed:
x0 x1 = {0X, X0}.

A particular case of spurious transitions which cannot be eliminated using a single extra test
vector are those that originate in self-incompatible scan latches and are defined as follows.

Definition 4 A scan latch Si is self-incompatible if at least one primary input xk that is assigned
value ik to eliminate the spurious transitions which originate from Si on one fanout path will
propagate the transitions which originate from Si on a different fanout path.

The question which now arises is whether the spurious transitions which originate from
self-incompatible scan latches can be eliminated. In order to provide an answer consider the
following example.

Example 4 Consider the circuit of Figure 5 where {x0, x1} are primary inputs, S0 is a scan latch,
y0 is a present state line, and {t0, t1, t2} are circuit lines. To eliminate spurious transitions at gate
t0 while shifting out test responses through scan latch S0, primary input x0 must be assigned the
controlling value 1 of gate t0. However, to eliminate spurious transitions at gate t1, primary input
x0 must be assigned the controlling value 0 of gate t1. Different values must be assigned to x0 to
eliminate spurious transitions which originate from the same scan latch S0 and hence scan latch S0 is
self-incompatible. However, if primary input x1 is assigned the controlling value 0 of gate t2 the
spurious transitions which originate in S0 and propagate on path {S0, t1, t2} will be eliminated.
Therefore by assigning extra test vector x0 x1 = {10} spurious transitions propagating on both
paths {S0, t0} and {S0, t1, t2} will be eliminated. This leads to the conclusion that most of the
spurious transitions originating in self-incompatible scan latches can be eliminated by examining
the fanout paths of self-incompatible scan latches and assigning a single extra test vector
while shifting out the test responses. However, the single extra test vector is at the expense of
a small number of spurious transitions that cannot be eliminated, as in the case of transitions on
line t1 in the simple circuit of Figure 5.

Figure 5: Example circuit illustrating self-incompatible scan latches (scan latch S0, present
state line y0, primary inputs x0 and x1, circuit lines t0, t1 and t2).

The previous example has shown that following a careful examination of fanout branches of
self-incompatible scan latches, most of the spurious transitions originating in self-incompatible
scan latches can be eliminated using a single value for the extra test vector.
Finally, independent scan latches are introduced.

Definition 5 A scan latch Si is independent if none of the gates on any of the paths which originate
from Si has a side input which can be justified by primary inputs.

The independent scan latches are grouped in the extra scan chain (ESC) for which no extra test
vector can be computed and hence the spurious transitions cannot be eliminated. The following
example illustrates independent scan latches.

Example 5 Consider the circuit shown in Figure 6. Output z0 depends only on scan latches
S0 and S1 , and the next state Y4 of scan latch S4 depends only on scan latches S0 , S1 , S2 and
S3 . There are no side inputs of gates t0 and t1 that can be justified by primary inputs such that
spurious transitions originated from S0 , S1 , S2 and S3 are eliminated. Therefore scan latches S0 ,
S1 , S2 and S3 are independent.

Figure 6: Example circuit illustrating independent scan latches (scan latches S0 to S4, present
state lines y0 to y4, gates t0 and t1, output z0, next state line Y4).

3.3 Power dissipated by the buffered clock tree


Previous research has established that power dissipated in the clock tree is typically one third
of the total power dissipation [38] and hence it is necessary to minimize power dissipated in the
clock tree not only during functional operation but also during test application. Unlike previous
approaches which do not consider power dissipated by the buffered clock tree [12, 28, 30–35] or
gate the clock tree only for non-essential test vectors from a large test set [29] the proposed DFT
architecture using multiple scan chains (Figure 1 from section 3.1) reduces clock tree power for
all the test vectors of a very small test set where each test vector is essential (i.e. detects at least
one fault). This is explained by the following theorem which gives an upper bound on power
reduction.

Theorem 1 Consider k scan chains in the design for testability architecture of Figure 1. Then
the power reduction of the buffered clock tree over the standard full scan architecture is upper
bounded by $(k-1)/k$.
Proof: Let $\{m_0, \ldots, m_{k-1}\}$ be the sizes of the scan chains, with $\sum_{i=0}^{k-1} m_i = m$, where $m$ is the total
number of scan latches. Since for large dies the clock power dissipation transitions from a square-root
dependence on the number of scan latches to a linear dependence [38], the power dissipated by
each scan chain $SC_i$ can be approximated by $\lambda \cdot m_i$, where $\lambda$ is dependent on clock frequency,
supply voltage and wire lengths. The power dissipated while shifting test responses over an
entire scan cycle ($m$ clock cycles) for the proposed architecture is $P_{MSC} = \lambda \sum_{i=0}^{k-1} m_i^2$, since over
$m_i$ clock cycles only the buffered clock tree feeding $SC_i$ is active. On the other hand, the power
dissipated in the traditional full scan architecture is
$P_{FULL} = \lambda \cdot m^2 = \lambda \cdot \left(\sum_{i=0}^{k-1} m_i\right) \cdot \left(\sum_{i=0}^{k-1} m_i\right)$.
Therefore the reduction in power dissipation is

$$Red = \frac{P_{FULL} - P_{MSC}}{P_{FULL}} = 1 - \frac{\lambda \cdot \sum_{i=0}^{k-1} m_i^2}{\lambda \cdot \left(\sum_{i=0}^{k-1} m_i\right) \cdot \left(\sum_{i=0}^{k-1} m_i\right)}.$$

Following the Cauchy-Schwarz inequality [40], where

$$\left(\sum_{i=0}^{k-1} m_i\right) \cdot \left(\sum_{i=0}^{k-1} m_i\right) \le k \cdot \left(\sum_{i=0}^{k-1} m_i^2\right),$$

the power reduction is upper bounded by $Red \le 1 - 1/k = (k-1)/k$.

The previous theorem shows that a power reduction of up to $(k-1)/k$ can be achieved in the
buffered clock tree, with the maximum reduction achieved when scan chains have an equal number
of scan latches. It should be noted that by gating the clock of each scan chain not only average
power reduction is achieved but also savings in peak power are guaranteed since while shifting
out test responses only a single buffered clock tree is active.
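
As a small numerical check of Theorem 1, the sketch below evaluates the reduction 1 - (sum of m_i^2) / (sum of m_i)^2 under the linear clock power model used in the proof. The function name `clock_tree_reduction` and the chain sizes are illustrative assumptions, not values from the experiments.

```python
from typing import Sequence


def clock_tree_reduction(chain_sizes: Sequence[int]) -> float:
    """Fractional clock tree power reduction under the linear model of
    Theorem 1: Red = 1 - sum(m_i^2) / (sum(m_i))^2."""
    total = sum(chain_sizes)
    if total <= 0:
        raise ValueError("at least one scan latch is required")
    return 1.0 - sum(m * m for m in chain_sizes) / (total * total)


# Hypothetical partitions of 100 scan latches into k = 4 scan chains.
balanced = clock_tree_reduction([25, 25, 25, 25])  # reaches (k - 1) / k
skewed = clock_tree_reduction([70, 10, 10, 10])    # stays below the bound
print(balanced, skewed)
```

With four equal chains the bound (k - 1)/k = 0.75 is reached, while the skewed partition gives only 0.48, which illustrates why balanced scan chains are preferable for clock tree power.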

4 Multiple Scan Chains Generation and New Test Application Strategy
In this section, partitioning of scan latches in multiple scan chains based on their classification,
as described in section 3.2, is given. Then, a new test application strategy for power minimization
during test application, based on the DFT architecture described in section 3.1, is introduced.

4.1 Partitioning Scan Latches into Multiple Scan Chains


Multiple Scan Chain Partitioning (MSC-PARTITIONING) algorithm identifies compatible scan
latches introduced by Definition 2 of section 3.2, groups them in scan chains and computes an
extra test vector for each scan chain. Figure 7 gives the flow of the proposed MSC-PARTITIONING
algorithm which is divided in five parts identified in boxes marked from (a) to (e). In order to
facilitate the elimination of spurious transitions by computing an extra test vector for each scan
chain the initial circuit C needs to be transformed to a reduced circuit C’ (box (a)). A byproduct
of the reduction procedure is a specified fault list L (box (b)) which is targeted by an automatic
test pattern generation (ATPG) process on the reduced circuit C’ (box (c)). Associated with
each fault FSi sa-nci in the specified fault list L is a set of scan latches whose spurious transitions
will be eliminated in the original circuit C by applying extra test vector EVi which detects
FSi sa-nci in the reduced circuit C’. Therefore based on fault compatibility in the reduced
circuit C’, scan latch classification in the original circuit C is done implicitly which leads to
several partitions of the initial single scan chain (box (d)). However, some scan latches may
be self-incompatible (Definition 4) which leads to iterations through the ATPG process with a
respecified fault list (box (e)) until no self-incompatible scan latches are left. At the end of the
MSC-PARTITIONING algorithm the multiple scan chains {SC0, ..., SCk-1, ESC} and the extra test
set ES = {EV0, ..., EVk-1} will be used by the novel test application strategy described in section 4.2.
In the following each part of the MSC-PARTITIONING algorithm is explained in detail.

Figure 7: Proposed algorithm for partitioning scan latches in multiple scan chains. Flow:
(a) reduce initial circuit C to reduced circuit C’; (b) specify fault list L for C’; (c) ATPG for
C’ using fault list L, derive extra test vectors; (d) classify scan latches in C using fault
compatibility from C’ and generate multiple scan chains based on classification; if
self-incompatible latches remain, (e) respecify fault list L for C’ and return to (c); otherwise
output scan chains {SC0, ..., SCk-1, ESC} and extra test set ES = {EV0, ..., EVk-1}.

a. In the first part of the MSC-PARTITIONING algorithm of Figure 7 the initial circuit C is transformed
into a reduced circuit C’ as described in the CIRCUIT-REDUCTION algorithm of Figure 8.
The algorithm also identifies the freezing signals, which are the signals that depend on
primary inputs and should be set to the controlling value as side inputs to the gates which
eliminate transitions that originate from scan latches, as described in the following parts.
Two lists, eliminated_gates and modified_gates, contain the gates which ought to be
eliminated and modified respectively in the reduced circuit C’. Initially eliminated_gates
is set to all the scan latches whereas modified_gates is void (lines 1-2). The circuit
is traversed in breadth first search order using two lists, current_frontier and new_frontier.
While current_frontier is set initially to all the scan latches of C (line 3), new_frontier
is initially void (line 4). In the inner loop (lines 6-13), for every gate neighbouring the
current_frontier it is checked whether all its inputs already belong to eliminated_gates (i.e.
depend only on scan latches). If this is the case then the currently evaluated gate is introduced
into eliminated_gates, removed from modified_gates (if applicable) and introduced
into new_frontier. If at least one input does not belong to eliminated_gates then the currently
evaluated gate is introduced into modified_gates. The outer loop (lines 5-16) repeats the inner
loop while current_frontier is not void (i.e. while more gates may still need to be eliminated).
At the end of each iteration of the outer loop current_frontier and new_frontier are
updated (lines 14 and 15). Finally, using eliminated_gates and modified_gates the initial
circuit C is modified to the reduced circuit C’ (lines 17-20) as follows: gates that
belong to eliminated_gates (depend only on scan latches) are excluded; gates that belong
to modified_gates (depend on both scan latches and primary inputs) are modified to gates
with input signals dependent only on primary inputs (in the case of gates with two inputs
of which one is a freezing signal the gate is modified to a buffer); all the freezing signals
identified in the first step are set as primary outputs in the reduced circuit. Freezing signals
{FS0, ..., FSp-1}, which are the outputs of the gates present in modified_gates,
are determined simultaneously with identifying independent scan latches (Definition 5).
The independent scan latches are grouped into the extra scan chain (ESC) which consists
of scan latches whose spurious transitions cannot be eliminated by computing an extra
test vector. The algorithm returns not only the reduced circuit C’ but also the list of freezing
signals that will be used in the following part of the MSC-PARTITIONING algorithm of Figure
7.

b. In the second part a specified fault list L is created which will be provided together with
the reduced circuit C’ to an automatic test pattern generation (ATPG) tool. The specified
fault list L comprises, for each freezing signal FSi, the stuck-at fault at the non-controlling value
(sa-nci) of the gate gi from the modified_gates list of algorithm CIRCUIT-REDUCTION of
Figure 8. It is important to note that each fault FSi sa-nci has attached a list of scan
latches {Si0, ..., Sim-1} whose spurious transitions in the initial circuit C are eliminated
when setting FSi to the controlling value of gate gi. The list of scan latches is required during
generation of scan chains in part (d) of the MSC-PARTITIONING algorithm.

c. In the third part, having generated the reduced circuit C’ and the specified fault list L,
any state of the art combinational ATPG tool can be used to generate test vectors for the
faults from L for C’. Test vectors for the faults from L are the extra test vectors required
to eliminate spurious transitions while shifting test responses in the initial circuit C as
described in part (d). Since the freezing signals are primary outputs in C’ as described
in part (a), L contains faults only on primary outputs. This will clearly speed up
the ATPG process since only backward justification and no forward propagation is
required. Moreover, the specified fault list is significantly smaller than the entire fault
set which will further reduce ATPG computational time for computing extra test vectors.

ALGORITHM: CIRCUIT-REDUCTION
INPUT:  Circuit C
OUTPUT: Reduced Circuit C’
        Freezing Signals {FS0, ..., FSp-1}

 1  eliminated_gates = {S0, ..., Sm-1}
 2  modified_gates = Ø
 3  current_frontier = {S0, ..., Sm-1}
 4  new_frontier = Ø
 5  while (current_frontier ≠ Ø) {
 6      for all gx ∈ neighbours(current_frontier)
 7          if (all inputs of gx ∈ eliminated_gates) {
 8              add gx to eliminated_gates
 9              remove gx from modified_gates
10              add gx to new_frontier
11          }
12          else
13              add gx to modified_gates
14      current_frontier = new_frontier
15      new_frontier = Ø
16  }
17  generate reduced circuit C’ as follows {
18      eliminate all the gates gx ∈ eliminated_gates
19      modify all the gates gy ∈ modified_gates
20  }
21  freezing signals {FS0, ..., FSp-1} are
    output signals of {g0, ..., gp-1} = modified_gates
22  {Se0, ..., Sem-1} for which no freezing signal
    exists are introduced in the extra scan chain ESC
23  return Reduced Circuit C’
           Freezing signals {FS0, ..., FSp-1}

Figure 8: Proposed algorithm for circuit reduction for extra test vector computation

It should be noted that some faults from L are redundant which implies that no extra
test vector can be computed to stop the propagation of the spurious transitions from scan
latches associated with the respective fault. However, these scan latches are treated as
self-incompatible and handled by re-specifying the fault list as described in the last part (e) of
the MSC-PARTITIONING algorithm of Figure 7.

d. Given the extra test set with the list of faults from L detected by each extra test vector EVi,
scan latch classification according to the definitions from section 3.2 is done as follows. If
two faults FSi sa-nci and FSj sa-ncj from L are incompatible (i.e. they are not detected
by the same extra test vector) then each element of the two lists of scan latches associated
with the two faults, {Si0, ..., Sim-1} and {Sj0, ..., Sjq-1} respectively, are incompatible
(otherwise they are compatible). This leads to grouping all the scan latches associated
with faults detectable by a single extra test vector into a single scan chain. However, this
may lead to self-incompatible scan latches (Definition 4 of section 3.2) when different
extra test vectors eliminate spurious transitions from the same scan latch. Consequently,
while there are self-incompatible scan latches the MSC-PARTITIONING algorithm will iterate through
parts (e), (c), (d) as explained next.

e. In the case that there are self-incompatible scan latches after the generation of multiple
scan chains, the problem needs to be addressed as briefly explained in Example
4 of section 3.2. The faults FSi sa-nci which have attached self-incompatible scan
latches are removed from fault list L and new faults are specified on the lines in the fanout
paths of FSi. Thus, the respecified fault list L will be provided back to the ATPG process
for computing extra test vectors (part (c)) which will be followed by new multiple scan
chain generation based on fault compatibility (part (d)). This iterative process continues
until there are no self-incompatible scan latches left.
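
The breadth first traversal of part (a) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the netlist encoding (a dictionary from gate name to input signal names), the use of scan latch names in place of present state lines, and the helpers `attached_latches` and `independent_latches` are all assumptions. The two small netlists are one plausible reading of Figures 3 and 6.

```python
from typing import Dict, List, Set, Tuple

Netlist = Dict[str, List[str]]  # gate name -> names of its input signals


def circuit_reduction(gates: Netlist, latches: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (eliminated, modified) as in CIRCUIT-REDUCTION."""
    fanout: Dict[str, List[str]] = {}
    for gate, inputs in gates.items():
        for src in inputs:
            fanout.setdefault(src, []).append(gate)
    eliminated, modified = set(latches), set()
    frontier = set(latches)
    while frontier:
        new_frontier: Set[str] = set()
        for node in frontier:
            for gate in fanout.get(node, []):
                if gate in eliminated:
                    continue
                if all(src in eliminated for src in gates[gate]):
                    eliminated.add(gate)      # depends only on scan latches
                    modified.discard(gate)
                    new_frontier.add(gate)
                else:
                    modified.add(gate)        # has a side input from primary inputs
        frontier = new_frontier
    return eliminated, modified


def attached_latches(gate: str, gates: Netlist, eliminated: Set[str],
                     latches: Set[str]) -> Set[str]:
    """Scan latches reaching `gate` through eliminated logic only."""
    found, seen, stack = set(), set(), [gate]
    while stack:
        for src in gates.get(stack.pop(), []):
            if src in latches:
                found.add(src)
            elif src in eliminated and src not in seen:
                seen.add(src)
                stack.append(src)
    return found


def independent_latches(gates: Netlist, latches: Set[str]) -> Set[str]:
    """Latches with no freezing signal; these would go to the ESC."""
    eliminated, modified = circuit_reduction(gates, latches)
    covered: Set[str] = set()
    for gate in modified:
        covered |= attached_latches(gate, gates, eliminated, latches)
    return set(latches) - covered


FIG3 = {"t0": ["S0", "S1"], "t1": ["x0", "x1"], "z0": ["t0", "t1"]}
FIG6 = {"t0": ["S0", "S1"], "t1": ["t0", "S2", "S3"]}  # assumed wiring
elim3, mod3 = circuit_reduction(FIG3, {"S0", "S1"})
esc3 = independent_latches(FIG3, {"S0", "S1"})
esc6 = independent_latches(FIG6, {"S0", "S1", "S2", "S3"})
print(sorted(elim3), sorted(mod3), sorted(esc3), sorted(esc6))
```

For the Figure 3 netlist only z0 ends up in the modified set, matching Example 3, while for the assumed Figure 6 netlist every latch is independent, matching Example 5.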

The MSC-PARTITIONING algorithm of Figure 7 returns the scan chains of compatible scan
latches, the extra scan chain ESC and the extra test set of extra test vectors used to define a new
test application strategy, as explained in the following section.
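
Part (d) and the self-incompatibility check that triggers part (e) can be illustrated with the short sketch below. It assumes the ATPG results are already available as a mapping from each extra test vector to the faults it detects; the function name `generate_scan_chains` and the fault labels (simply named after gates) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def generate_scan_chains(
    detected_by: Dict[str, List[str]],
    attached: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Group scan latches by the extra test vector that detects their faults
    and report latches that land in more than one chain (self-incompatible)."""
    chains: Dict[str, Set[str]] = {}
    for vector, faults in detected_by.items():
        latches: Set[str] = set()
        for fault in faults:
            latches |= attached[fault]
        chains[vector] = latches
    seen: Set[str] = set()
    self_incompatible: Set[str] = set()
    for latches in chains.values():
        self_incompatible |= seen & latches
        seen |= latches
    return chains, self_incompatible


# Example 1: two extra test vectors, one attached latch per (hypothetical) fault.
chains1, bad1 = generate_scan_chains(
    {"000": ["z0", "z2", "z3"], "111": ["z1", "z4", "z5"]},
    {"z0": {"S0"}, "z1": {"S1"}, "z2": {"S2"},
     "z3": {"S3"}, "z4": {"S4"}, "z5": {"S5"}},
)

# Example 4 before the fault list is respecified: S0 is attached to two
# faults that need opposite values on x0.
chains4, bad4 = generate_scan_chains(
    {"1X": ["t0"], "0X": ["t1"]},
    {"t0": {"S0"}, "t1": {"S0"}},
)
print(chains1, bad1, bad4)
```

In the second call S0 is flagged, which is the condition under which the fault list would be respecified on the fanout paths and the ATPG step repeated.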

4.2 New Test Application Strategy Using Multiple Scan Chains and Extra
Test Vectors
Having partitioned the scan latches into multiple scan chains with an extra test vector for each
scan chain (section 4.1), this section introduces a new test application strategy for power mini-
mization during test application in full scan sequential circuits.

ALGORITHM: MSC-TEST APPLICATION
INPUT:  Test Set S = {V0, ..., Vn-1}, Circuit C
        Scan Chains {SC0, ..., SCk-1, ESC}
        Extra Test Set ES = {EV0, ..., EVk-1}
OUTPUT: Node transition count NTC

NTC ← 0
for every Vi from S with i = 0, ..., n-1 {
    for every SCj with j = 0, ..., k-1 {
        apply EVj to primary inputs
        compute NTCi,j by simulating C when
            shifting in the present state part of test
            vector Vi through scan latches from SCj
        NTC ← NTC + NTCi,j
    }
    apply primary input part of Vi to primary inputs
    compute NTCi,ESC by simulating C when
        shifting in the present state part of test
        vector Vi through scan latches from ESC
    NTC ← NTC + NTCi,ESC
    NTC ← NTC + NTCi,LOAD
}
return NTC

Figure 9: Proposed test application strategy using multiple scan chains and extra test vectors

The Multiple Scan Chain Test Application (MSC-TEST APPLICATION) algorithm computes
the node transition count NTC (section 2) during the entire test application period for the
given test set S, circuit C, multiple scan chains {SC0, ..., SCk-1, ESC}, and extra test set ES
= {EV0, ..., EVk-1}. Figure 9 gives the pseudocode of the proposed MSC-TEST APPLICATION
algorithm. The value of NTC is 0 at the beginning of the algorithm and it is gradually
increased as the entire test set is traversed. The outer loop represents the traversal of all the
test vectors Vi, with i = 0, ..., n-1, from test set S. Shifting out test responses through all the
scan chains is then considered in the inner loop. For each scan chain SCj, circuit C is simulated
by applying the extra test vector EVj to primary inputs and NTCi,j is added to the node
transition count NTC. NTCi,j stands for the node transition count while shifting in the present state
part of test vector Vi through scan chain SCj and applying extra test vector EVj to the primary
inputs. After shifting out the test responses through each scan chain SCj the primary input part
of test vector Vi is applied to primary inputs and NTCi,ESC is computed while shifting out the test
response through the extra scan chain ESC. Finally the entire test vector Vi is applied to the
circuit under test and NTCi,LOAD, required to load the test response in the scan latches, is added
to NTC. After the completion of the inner loop, the outer loop continues until the entire test set
is examined. The algorithm returns the value of NTC over the entire test application period. It
should be noted that the algorithms presented in this section are independent of test vector and scan
latch order. Unlike the algorithms from [12, 33–35] whose computational time is prohibitively
large, hindering the exploration for large sequential circuits, the proposed MSC-PARTITIONING
and MSC-TEST APPLICATION algorithms have low computational time and can handle large
circuits as shown in the following section.
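
The accumulation performed by MSC-TEST APPLICATION can be sketched as below. The logic simulator is deliberately left out: `shift_ntc` and `load_ntc` are hypothetical callbacks standing in for the zero delay simulation of circuit C, and the toy callbacks at the end only exercise the loop structure; they do not model a real circuit.

```python
from typing import Callable, List, NamedTuple, Sequence


class ScanVector(NamedTuple):
    primary: str  # primary input part of the test vector
    state: str    # present state part, shifted in through the scan chains


# Assumed simulator interfaces (not from the paper):
# shift_ntc(vector, chain, primary_inputs) -> transitions while shifting `chain`
# load_ntc(vector) -> transitions when the test response is loaded
ShiftSim = Callable[[ScanVector, Sequence[str], str], int]
LoadSim = Callable[[ScanVector], int]


def msc_test_application(test_set: Sequence[ScanVector],
                         chains: Sequence[Sequence[str]],
                         extra_vectors: Sequence[str],
                         esc: Sequence[str],
                         shift_ntc: ShiftSim,
                         load_ntc: LoadSim) -> int:
    """Total node transition count over the whole test application period."""
    if len(chains) != len(extra_vectors):
        raise ValueError("one extra test vector is needed per scan chain")
    ntc = 0
    for vector in test_set:
        # Each scan chain SCj is shifted with its own extra test vector EVj.
        for chain, extra in zip(chains, extra_vectors):
            ntc += shift_ntc(vector, chain, extra)
        # The extra scan chain is shifted with the primary input part applied.
        ntc += shift_ntc(vector, esc, vector.primary)
        # Finally the whole vector is applied and the response is loaded.
        ntc += load_ntc(vector)
    return ntc


def toy_shift(vector: ScanVector, chain: Sequence[str], primary_inputs: str) -> int:
    return len(chain)  # placeholder: one transition per shifted latch


def toy_load(vector: ScanVector) -> int:
    return 1           # placeholder: constant cost per load


tests: List[ScanVector] = [ScanVector("000", "101010"), ScanVector("111", "010101")]
total = msc_test_application(tests, [["S0", "S2", "S3"], ["S1", "S4", "S5"]],
                             ["000", "111"], [], toy_shift, toy_load)
print(total)
```

Passing the simulator as a callback keeps the sketch independent of any particular delay model, in line with the remark that the strategy applies to zero, unit and variable delay models alike.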

5 Experimental Results
This section demonstrates through a set of benchmark examples that multiple scan chains com-
bined with extra test vectors, as outlined in section 3, yield savings in power dissipation during
test application. The algorithms described in section 4 have been implemented on a 500 MHz
Pentium III PC with 128 MB RAM running Linux and using GNU CC version 2.91. The av-
erage value of node transition count (NTC) reported throughout this section is calculated using
the formulas from section 2 under the assumption of the zero delay model. The use of zero
delay model is motivated by the very rapid computation of NTC and by the observation that power
dissipation under the zero delay model has a high correlation with power dissipation under the
general delay model [41]. However, the proposed technique applies equally to other general delay
models such as unit and variable delay models [42]. Furthermore, due to elimination of spurious
transitions (Definition 1) the propagation of hazards and glitches is eliminated leading to even
greater reductions for power dissipation in the case of unit and variable delay model. Besides,
the aim of this paper is not to give exact values of power dissipation during test application, but
to define a new design for testability architecture and a new test application strategy for power
minimization that applies equally to every delay model.
| circuit | ATALANTA [43] TV | ATALANTA trad. NTC | ATALANTA prop. NTC | ATOM [44] TV | ATOM trad. NTC | ATOM prop. NTC | MINTEST [45] TV | MINTEST trad. NTC | MINTEST prop. NTC | SC | ESC length | CPU time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s208 | 34 | 54.54 | 26.67 | 65 | 55.82 | 25.93 | 27 | 54.94 | 24.07 | 2 | 0 | 1 |
| s298 | 33 | 103.56 | 39.23 | 52 | 115.74 | 46.36 | 23 | 108.88 | 43.81 | 3 | 6 | 1 |
| s344 | 24 | 130.36 | 42.54 | 62 | 131.58 | 48.03 | 13 | 124.77 | 46.69 | 4 | 4 | 1 |
| s349 | 22 | 131.90 | 52.59 | 65 | 132.58 | 53.63 | 13 | 128.91 | 55.86 | 4 | 1 | 1 |
| s382 | 32 | 133.91 | 50.99 | 72 | 145.63 | 51.81 | 25 | 148.40 | 55.54 | 3 | 6 | 1 |
| s386 | 74 | 81.31 | 63.75 | 109 | 86.31 | 58.92 | 63 | 85.45 | 59.09 | 3 | 0 | 1 |
| s400 | 33 | 135.97 | 51.88 | 98 | 107.24 | 43.82 | 43 | 100.60 | 40.11 | 3 | 6 | 1 |
| s420 | 73 | 111.69 | 54.46 | 98 | 107.24 | 43.82 | 43 | 100.60 | 40.11 | 2 | 0 | 1 |
| s444 | 33 | 139.92 | 47.68 | 77 | 150.01 | 49.74 | 24 | 156.59 | 51.87 | 4 | 6 | 1 |
| s510 | 60 | 123.89 | 64.38 | 90 | 115.23 | 65.86 | 54 | 114.04 | 66.14 | 4 | 0 | 1 |
| s526 | 60 | 170.61 | 63.05 | 107 | 186.24 | 67.38 | 49 | 183.15 | 67.95 | 4 | 6 | 1 |
| s641 | 58 | 166.32 | 60.03 | 99 | 184.13 | 60.31 | 21 | 176.95 | 62.92 | 3 | 0 | 1 |
| s713 | 58 | 173.34 | 57.15 | 100 | 196.92 | 59.92 | 21 | 192.82 | 63.50 | 3 | 0 | 1 |
| s820 | 110 | 137.52 | 111.08 | 190 | 139.01 | 110.31 | 93 | 137.89 | 112.44 | 5 | 0 | 1 |
| s832 | 115 | 139.83 | 115.50 | 200 | 138.07 | 114.29 | 94 | 138.61 | 115.29 | 3 | 0 | 1 |
| s838 | 148 | 227.46 | 108.24 | 183 | 199.88 | 81.15 | 75 | 187.63 | 70.41 | 2 | 0 | 1 |
| s953 | 90 | 158.50 | 76.43 | 138 | 169.70 | 76.37 | 76 | 169.09 | 76.02 | 3 | 23 | 1 |
| s1196 | 140 | 101.31 | 68.12 | 227 | 105.37 | 68.37 | 113 | 105.47 | 68.83 | 4 | 2 | 1 |
| s1238 | 151 | 101.50 | 65.15 | 240 | 107.46 | 66.11 | 121 | 103.88 | 65.24 | 4 | 2 | 1 |
| s1423 | 70 | 453.58 | 137.63 | 135 | 509.96 | 150.41 | 20 | 507.21 | 150.82 | 5 | 3 | 2 |
| s1488 | 119 | 340.75 | 225.81 | 196 | 347.17 | 227.33 | 101 | 366.18 | 234.11 | 3 | 0 | 3 |
| s1494 | 125 | 329.98 | 266.05 | 191 | 353.12 | 237.43 | 100 | 371.16 | 235.81 | 4 | 0 | 3 |
| s5378 | 259 | 1772.07 | 527.87 | 358 | 1786.60 | 531.44 | 97 | 1809.42 | 537.51 | 5 | 33 | 49 |
| s9234 | 366 | 3160.16 | 760.35 | 660 | 3123.35 | 754.58 | 105 | 3045.64 | 751.093 | 6 | 20 | 201 |
| s13207 | 461 | 5949.81 | 2051.55 | 709 | 5972.92 | 2056.18 | 233 | 5977.48 | 2047.51 | 5 | 330 | 472 |
| s15850 | 436 | 5260.90 | 942.07 | 643 | 5487.29 | 952.92 | 94 | 5481.82 | 947.82 | 6 | 62 | 596 |
| s35932 | 65 | 11067.50 | 5440.19 | 129 | 13039.30 | 6291.21 | 12 | 10860.50 | 5374.45 | 2 | 0 | 1903 |
| s38417 | 904 | 15920.00 | 7159.88 | 1458 | 15849.20 | 7136.23 | 68 | 14199.90 | 6486.23 | 5 | 1079 | 8151 |
| s38584 | 658 | 12766.30 | 3914.41 | 989 | 12871.30 | 3912.55 | 110 | 12901.50 | 3896.92 | 7 | 7 | 3543 |

Table 1: Experimental results using multiple scan chains for power minimization.

Table 1 shows the experimental results for all circuits from the ISCAS89 benchmark set [46]
using three different ATPG test tools [43–45]. The first and second columns give the circuit
name and the number of test vectors (TV) respectively generated using the ATALANTA test
tool [43]. Third column shows the initial average value of NTC (trad. NTC), which is the total
value of NTC using the traditional single scan chain design [36] divided by the total number of
clock cycles over the entire test application period. Column 4 shows the final average
value of NTC (prop. NTC) when using multiple scan chains and extra test vectors (MSC-TEST
APPLICATION algorithm from section 4.2). The same experiment has been completed for
non compact test sets generated by ATOM test tool [44] (columns 5-7) and compact test sets
generated by MINTEST compaction tool [45] (columns 8-10) respectively. It should be noted
that all the three test sets [43–45] achieve 100% fault coverage. Columns 11 and 12 give the
number of scan chains (SC) and the length of the extra scan chain (ESC) respectively computed
using the MSC-PARTITIONING algorithm outlined in section 4.1. The number of scan chains
varies from 2 as in the case of s208 up to 7 as in the case of s38584. The small number of scan
chains implies that both area overhead required to control multiple scan chains and test data
overhead caused by extra test vectors are very low since they are proportional to the number
of scan chains. For most of the examples the size of the extra scan chain (ESC length) is nil
or very low. However, there are two extreme cases, s13207 and s38417, where
the number of independent scan latches (Definition 5 from section 3.2) is very high leading to
an increase in ESC length and hence insignificant penalty in power reduction. It can be clearly
seen that the proposed test application strategy (MSC-TEST APPLICATION from section 4.2)
has a significantly smaller average value of NTC for all the benchmark circuits when compared
to the initial value of NTC computed using the test application strategy from [36] which employs
a single scan chain. Furthermore, the computational time is very low (≤ 1 s) for small circuits.
Moreover, for large circuits which are not handled by previous approaches [12, 34, 35], as in the
case of s38584, it takes < 3600 s to achieve a substantial reduction in the average value of NTC.

| circuit | power reduction (%) ATALANTA | power reduction (%) ATOM | power reduction (%) MINTEST | test data overhead (%) ATALANTA | test data overhead (%) ATOM | test data overhead (%) MINTEST | test area overhead (%) |
|---|---|---|---|---|---|---|---|
| s208 | 51.09 | 53.54 | 56.18 | 3.26 | 1.70 | 4.11 | 10.00 |
| s298 | 62.10 | 59.94 | 59.75 | 1.06 | 0.67 | 1.53 | 11.33 |
| s344 | 67.36 | 63.49 | 62.57 | 4.68 | 1.81 | 8.65 | 11.11 |
| s349 | 60.12 | 59.54 | 56.66 | 5.11 | 1.73 | 8.65 | 11.11 |
| s382 | 61.91 | 64.42 | 62.57 | 0.78 | 0.34 | 1.00 | 9.09 |
| s386 | 21.59 | 31.73 | 30.84 | 2.18 | 1.48 | 2.56 | 10.33 |
| s400 | 61.84 | 59.13 | 60.12 | 0.75 | 1.08 | 2.46 | 10.63 |
| s420 | 51.23 | 59.13 | 60.12 | 1.45 | 1.08 | 2.46 | 9.52 |
| s444 | 65.91 | 66.84 | 66.87 | 1.13 | 0.48 | 1.56 | 11.04 |
| s510 | 48.03 | 42.84 | 42.01 | 5.06 | 3.37 | 5.62 | 10.52 |
| s526 | 63.04 | 63.82 | 62.89 | 0.62 | 0.35 | 0.76 | 11.11 |
| s641 | 63.90 | 67.24 | 64.43 | 3.35 | 1.96 | 9.25 | 8.00 |
| s713 | 67.02 | 69.57 | 67.06 | 3.35 | 1.94 | 9.25 | 8.00 |
| s820 | 19.22 | 20.64 | 18.45 | 3.55 | 2.05 | 4.20 | 12.50 |
| s832 | 17.39 | 17.22 | 16.82 | 2.04 | 1.17 | 2.49 | 10.71 |
| s838 | 52.41 | 59.39 | 62.47 | 0.69 | 0.56 | 1.37 | 4.65 |
| s953 | 51.77 | 54.99 | 55.03 | 0.79 | 0.51 | 0.93 | 6.81 |
| s1196 | 32.75 | 35.11 | 34.74 | 0.93 | 0.57 | 1.16 | 6.81 |
| s1238 | 35.81 | 38.47 | 37.19 | 0.86 | 0.54 | 1.08 | 4.16 |
| s1423 | 69.65 | 70.50 | 70.26 | 1.06 | 0.55 | 3.73 | 3.79 |
| s1488 | 33.73 | 34.51 | 36.06 | 1.44 | 0.87 | 1.69 | 4.16 |
| s1494 | 19.37 | 32.76 | 36.46 | 1.82 | 1.19 | 2.28 | 6.12 |
| s5378 | 70.21 | 70.25 | 70.29 | 0.25 | 0.18 | 0.67 | 2.60 |
| s9234 | 75.93 | 75.84 | 75.33 | 0.19 | 0.11 | 0.69 | 2.06 |
| s13207 | 65.51 | 65.57 | 65.74 | 0.07 | 0.04 | 0.15 | 0.91 |
| s15850 | 82.09 | 82.63 | 82.70 | 0.14 | 0.09 | 0.67 | 0.92 |
| s35932 | 50.84 | 51.75 | 50.51 | 0.06 | 0.03 | 0.33 | 0.25 |
| s38417 | 55.02 | 54.97 | 54.32 | 0.01 | 0.01 | 0.09 | 0.38 |
| s38584 | 69.33 | 69.60 | 69.79 | 0.02 | 0.01 | 0.14 | 0.42 |

Table 2: Power reduction and overhead in test area and test data.

To give an indication of the reductions in power dissipation, Table 2 shows the percentage
reduction in power dissipation (columns 1-3) and percentage overhead in test data (columns
4-6) and test area (column 7). The power dissipation is considered directly proportional to the
average value of NTC. The test area overhead represents the extra logic required to multiplex
the scan output signal (Figure 1) and it is computed accurately by synthesizing and technology
mapping the ISCAS89 circuits to AMS 0.35 micron technology [47]. The test data overhead
represents the number of extra bits required for the extra test vectors (the number of scan chains
multiplied by the number of primary inputs). Note that test area overhead decreases as the
complexity of the circuit increases. This is due to the fact that the extra area occupied by the scan
control register and selection logic (Figure 1 from section 3.1) required to control multiple scan
chains is very small when compared to the size of large sequential circuits. The power reduction
varies from approximately 82% as in the case of s15850 down to under 17% as in the case of
s832. It should be noted that the moderate power reduction in the case of s386, s510, s820, s832,
s1488 and s1494 is due to the very small number of scan latches (5 to 6 scan latches only) which are
difficult to partition into multiple scan chains. However, for modern complex digital circuits
where the number of scan latches is significantly higher (thousands as in the case of s38584)
the power reduction is up to 69% at the expense of an insignificant (< 1%) test data and test area
overhead. This clearly shows the advantage of the proposed technique for power minimization
using multiple scan chains for large sequential circuits.
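
Since power dissipation is taken to be directly proportional to the average NTC, the percentages in Table 2 can be recomputed from the traditional and proposed NTC values of Table 1. The helper name below is an assumption; the two data points are the ATALANTA entries for s15850 and s5378.

```python
def power_reduction_percent(trad_ntc: float, prop_ntc: float) -> float:
    """Percentage reduction in average NTC (taken as proportional to power)."""
    if trad_ntc <= 0:
        raise ValueError("traditional NTC must be positive")
    return 100.0 * (trad_ntc - prop_ntc) / trad_ntc


# ATALANTA columns of Table 1 (trad. NTC, prop. NTC).
s15850 = power_reduction_percent(5260.90, 942.07)
s5378 = power_reduction_percent(1772.07, 527.87)
print(f"s15850: {s15850:.2f}%  s5378: {s5378:.2f}%")
```

Both values agree with the corresponding ATALANTA entries of Table 2 to two decimal places.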
Figure 10: Curve illustrating test set independent final value of NTC (vertical axis: NTC from 0
to 8000; horizontal axis: circuits s5378, s9234, s15850, s13207, s38584, s35932 and s38417; one
curve each for the MINTEST, ATALANTA and ATOM test sets).

A further advantage of the proposed technique is that due to test set independence the final
average value of NTC is predictable within a given range of values regardless of the test vectors
applied to the circuit. This is justified by the fact that the proposed low area overhead multiple
scan chain architecture introduced in section 3.1 is not overly sensitive to the values of test vectors
since only a single chain is active at a time and the spurious transitions within the combinational
circuit are eliminated by the extra primary input vector regardless of the value loaded in non
active scan chains. This is shown in Figure 10 where the graphs for the average value of NTC for
the 7 largest ISCAS89 benchmark circuits under three test sets of different size are given. For all three
test sets MINTEST [45], ATALANTA [43] and ATOM [44] the average values of NTC
are approximately equal. This implies that the proposed technique can further be applied to
more DFT methodologies such as scan-based BIST [36] where, regardless of the value of
the pseudorandom test set, the savings in power dissipation are guaranteed and the final values of
NTC are predictable.
It should be noted that the experimental results reported in this section, which use the simplified
power model from section 2, do not consider the power dissipated by the clock tree, which is
typically one third of the total power dissipation [38]. However, the power dissipated by the clock
tree can be substantially reduced using a low power buffered clock tree design [38], which
successfully handles both the scan clock gating and the scan clock trees required by the proposed
design for testability architecture using multiple scan chains, as shown in Theorem 1 of section 3.3.
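Because the reported figures exclude the clock tree, it can help to see how a reduction measured on the modelled part of the circuit might translate into a whole-chip figure. The sketch below is a back-of-envelope illustration and not part of the paper's power model. It assumes that total power splits linearly into a clock tree share and a remaining share, and that the reduction measured through NTC applies to the remaining share. The function name `total_power_reduction` and its parameters are hypothetical.

```python
def total_power_reduction(logic_reduction, clock_reduction=0.0, clock_share=1.0 / 3.0):
    """Blend separate savings for the logic and the clock tree into one figure.

    All arguments are fractions in [0, 1]. clock_share is the fraction of
    total power dissipated by the clock tree (one third is the typical value
    quoted in the text); the rest is attributed to the modelled logic.
    The linear split is an assumption made for illustration only.
    """
    for name, value in (("logic_reduction", logic_reduction),
                        ("clock_reduction", clock_reduction),
                        ("clock_share", clock_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (1.0 - clock_share) * logic_reduction + clock_share * clock_reduction
```

Under these assumptions, a 69% reduction in the modelled part with an untouched clock tree corresponds to roughly 46% of total power, and any saving obtained in the clock tree adds to that figure.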

6 Conclusions
This paper has presented a new technique for power minimization during test application in
sequential circuits using multiple scan chains. The technique is based on a new design for
test (DFT) architecture and a novel test application strategy which reduces spurious transitions
(Definition 1 of section 3.2) in the circuit under test. Compared to the traditional approach, which
consists of a single scan chain [36], the proposed technique employs a novel DFT architecture
based on multiple scan chains, leading to a substantial reduction in power dissipation. The
proposed technique, which is test set independent, overcomes the large test application time
required to achieve significant power savings [29–32, 34], since substantial power reductions are
achieved for both compact and non-compact test sets as shown in section 5. The newly introduced
DFT architecture (Figure 1 from section 3.1) does not introduce any performance degradation, in
contrast to previous approaches employing a modified scan cell design [29]. Unlike previous
approaches, which either do not consider clock tree power dissipation [12, 28, 30–35] or reduce it
only for nonessential test vectors [29], the proposed technique reduces clock tree power for all the
test vectors of a very small test set where each test vector is essential, as described in section 3.3.
While previous approaches [28, 29] required considerable test area overhead associated with
detection logic, the proposed DFT architecture requires very low extra area to control the multiple
scan chains, which are successfully combined with extra test vectors in the newly introduced
test application strategy in section 4.2. Since a high number of extra test vectors [34] is a
problem for testers, which need to change to support the large volume of test data [37], the
proposed technique, based on a small number of extra test vectors, introduces very low overhead
in test data as shown in section 5. Moreover, due to the efficient algorithms described in section 4,
the proposed technique is computationally inexpensive, unlike previous approaches [12, 33–35]
whose computational time is prohibitively large, hindering exploration for large sequential
circuits. Finally, the synthesizable extra hardware required by the new DFT architecture
introduced in section 3.1, the efficient algorithms given in section 4.1, and the novel test
application strategy described in section 4.2 make the technique proposed in this paper easy to
embed in the existing VLSI design flow using state-of-the-art third-party electronic design
automation tools.

Acknowledgement
The authors wish to acknowledge Dr. Dong S. Ha of Virginia Polytechnic Institute and State
University for providing the ATALANTA ATPG tool. The authors also acknowledge the Center for
Reliable and High-Performance Computing at the University of Illinois at Urbana-Champaign for
providing the test sets generated by the ATOM and MINTEST tools.

References
[1] Y. Zorian, “A distributed BIST control scheme for complex VLSI devices,” in Proc. 11th
IEEE VLSI Test Symposium, pp. 4–9, 1993.

[2] M. Pedram, “Power minimization in IC design: Principles and applications,” ACM Trans-
actions on Design Automation of Electronic Systems, vol. 1, pp. 3–56, Jan 1996.

[3] Semiconductor Industry Association (SIA), The International Technology Roadmap for
Semiconductors (ITRS): 1999 Edition. http://public.itrs.net/1999_SIA_Roadmap/Home.htm, 1999.

[4] P. Girard, “Low power testing of VLSI circuits: Problems and solutions,” in First Interna-
tional Symposium on Quality of Electronic Design (ISQED), pp. 173–180, 2000.

[5] S. Wang and S. Gupta, “ATPG for heat dissipation minimization during test application,”
IEEE Transactions on Computers, vol. 47, pp. 256–262, Feb 1998.

[6] R. Chou, K. Saluja, and V. Agrawal, “Scheduling tests for VLSI systems under power
constraints,” IEEE Transactions on VLSI, vol. 5, pp. 175–184, Jun 1997.

[7] V. Muresan, V. Muresan, X. Wang, and M. Vladutiu, “The left edge algorithm and the tree
growing technique in block-test scheduling under power constraints,” in Proc. of the 18th
IEEE VLSI Test Symposium, 2000.

[8] K. Chakrabarty, “Test scheduling for core-based systems,” in Proc. IEEE/ACM Interna-
tional Conference on Computer-Aided Design (ICCAD), pp. 391–394, 1999.

[9] C. Ravikumar, G. Chandra, and A. Verma, “Simultaneous module selection and schedul-
ing for power-constrained testing of core based systems,” in 13th International Conference
on VLSI Design, pp. 462–467, 2000.

[10] E. Larsson and Z. Peng, “Test infrastructure design and test scheduling optimization,” in
IEEE European Test Workshop, 2000.

[11] N. Nicolici and B. Al-Hashimi, “Power conscious test synthesis and scheduling for BIST
RTL data paths,” in Proc. IEEE International Test Conference (ITC 2000), (Atlantic City,
New Jersey), October 2000.

[12] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S. Reddy, “Techniques for minimizing
power dissipation in scan and combinational circuits during test application,” IEEE
Transactions on CAD, vol. 17, pp. 1325–1333, Dec 1998.

[13] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac, “Reduction of power
consumption during test application by test vector ordering,” IEE Electronics Letters, vol. 33,
no. 21, pp. 1752–1754, 1997.

[14] P. Flores, J. Costa, H. Neto, J. Monteiro, and J. Marques-Silva, “Assignment and reorder-
ing of incompletely specified pattern sequences targeting minimum power dissipation,” in
12th International Conference on VLSI Design, pp. 37–41, 1999.

[15] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “A test vector ordering tech-
nique for switching activity reduction during test operation,” in 9th Great Lakes Sympo-
sium on VLSI (GLSVLSI99), pp. 24–27, 1999.

[16] S. Wang and S. Gupta, “DS-LFSR: A new BIST TPG for low heat dissipation,” in Proc.
IEEE International Test Conference, pp. 848–857, 1997.

[17] X. Zhang, K. Roy, and S. Bhawmik, “POWERTEST: A tool for energy conscious weighted
random pattern testing,” in 12th International Conference on VLSI Design, pp. 416–422,
1999.

[18] X. Zhang and K. Roy, “Design and synthesis of low power weighted random pattern gen-
erator considering peak power reduction,” in International Symposium on Defect and Fault
Tolerance in VLSI Systems, pp. 148–156, 1999.

[19] M. Brazzarola and F. Fummi, “Power characterization of LFSRs,” in International
Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 138–146, 1999.

[20] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “A test vector inhibiting tech-
nique for low energy BIST design,” in Proc. 17th IEEE VLSI Test Symposium, pp. 407–
412, 1999.

[21] S. Manich, A. Gabarro, M. Lopez, J. Figueras, P. Girard, L. Guiller, C. Landrault,
S. Pravossoudovitch, P. Teixeira, and M. Santos, “Low power BIST by filtering
non-detecting vectors,” Journal of Electronic Testing: Theory and Applications (JETTA),
vol. 16, pp. 193–202, June 2000.

[22] F. Corno, M. Rebaudengo, M. S. Reorda, and M. Violante, “A new BIST architecture for
low power circuits,” in IEEE European Test Workshop (ETW99), pp. 160–164, 1999.

[23] F. Corno, M. Rebaudengo, M. S. Reorda, G. Squillero, and M. Violante, “Low power
BIST via hybrid cellular automata,” in 18th IEEE VLSI Test Symposium, 2000.

[24] C. Ravikumar and N. Prasad, “Evaluating BIST architectures for low power,” in 7th Asian
Test Symposium, pp. 430–434, 1998.

[25] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “Circuit partitioning for low
power BIST design with minimized peak power consumption,” in 8th Asian Test Sympo-
sium (ATS99), pp. 89–94, 1999.

[26] D. Gizopoulos, N. Kranitis, A. Paschalis, M. Psarakis, and Y. Zorian, “Effective low
power BIST for datapaths,” in Proc. of the Design, Automation and Test in Europe
Conference (DATE), p. 757, 2000.

[27] F. Corno, P. Prinetto, M. Rebaudengo, and M. Sonza-Reorda, “A test pattern generation
methodology for low power consumption,” in 16th IEEE VLSI Test Symposium, pp. 453–460,
1998.

[28] F. Corno, M. Rebaudengo, M. S. Reorda, and M. Violante, “Optimal vector selection for
low power BIST,” in IEEE International Symposium on Defect and Fault Tolerance in
VLSI Systems, pp. 219–226, 1999.

[29] S. Gerstendorfer and H. Wunderlich, “Minimized power consumption for scan-based
BIST,” Journal of Electronic Testing: Theory and Applications (JETTA), vol. 16, pp. 203–
212, June 2000.

[30] K.-H. Tsai, S. Hellebrand, J. Rajski, and M. Marek-Sadowska, “STARBIST: Scan auto-
correlated random pattern generation,” in 34th IEEE/ACM Design Automation Conference,
pp. 472–477, 1997.

[31] S. Wang and S. Gupta, “LT-RTPG: A new test-per-scan BIST TPG for low heat dissipa-
tion,” in Proc. IEEE International Test Conference, pp. 85–94, 1999.

[32] R. Sankaralingam, R. R. Oruganti, and N. A. Touba, “Static compaction techniques to
control scan vector power dissipation,” in Proc. of the 18th IEEE VLSI Test Symposium,
2000.

[33] N. Nicolici and B. Al-Hashimi, “Minimisation of power dissipation during test application
in full scan sequential circuits using primary input freezing,” IEE Proceedings - Computers
and Digital Techniques, vol. 147, no. to appear, 2000.

[34] S. Wang and S. Gupta, “ATPG for heat dissipation minimization during scan testing,” in
Proc. 34th Design Automation Conference, pp. 614–619, 1997.

[35] T.-C. Huang and K.-J. Lee, “An input control technique for power reduction in scan cir-
cuits during test application,” in Proc. 8th Asian Test Symposium, pp. 315–320, 1999.

[36] M. Abramovici, M. Breuer, and A. Friedman, Digital Systems Testing and Testable Design.
IEEE Press, 1990.

[37] R. Kapur and T. Williams, “Tough challenges as design and test go nanometer,” Computer,
vol. 32, pp. 42–45, November 1999.

[38] A. Vittal and M. Marek-Sadowska, “Low-power buffered clock tree design,” IEEE Trans-
actions on CAD, vol. 16, pp. 965–975, Sep 1997.

[39] A. Chandrakasan and R. Brodersen, Low Power Digital CMOS Design. Kluwer Academic
Publishers, 1995.

[40] L. Rade and B. Westergren, Mathematics Handbook for Science and Engineering.
Springer-Verlag, 4th ed., 1999.

[41] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, “On average power dissipation and ran-
dom pattern testability of CMOS combinational logic networks,” in Proc. IEEE/ACM In-
ternational Conference on Computer Aided Design, pp. 402–407, 1992.

[42] M. Hsiao, E. Rudnick, and J. Patel, “Effects of delay models on peak power estimation of
VLSI sequential circuits,” in Proc. International Conference on Computer Aided Design,
pp. 45–51, 1997.

[43] H. K. Lee and D. S. Ha, “On the generation of test patterns for combinational circuits,”
Tech. Rep. No. 12-93, Department of Electrical Engineering, Virginia Polytechnic Insti-
tute and State University, 1991.

[44] I. Hamzaoglu and J. Patel, “New techniques for deterministic test pattern generation,”
Journal of Electronic Testing: Theory and Application (JETTA), vol. 15, pp. 63–73, Aug
1999.

[45] I. Hamzaoglu and J. Patel, “Test set compaction algorithms for combinational circuits,”
IEEE Transactions on CAD, vol. 19, Aug 2000.

[46] F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential
benchmark circuits,” in Proc. International Symposium on Circuits and Systems, pp. 1929–1934,
1989.

[47] AMS, 0.35 Micron CMOS Process Parameters. Austria Mikro Systeme International AG,
1998.
