
International Test Conference

Test Resource Partitioning for SOCs


Anshuman Chandra and Krishnendu Chakrabarty
Duke University

A new test-resource-partitioning approach, based on test data compression and on-chip decompression, reduces data volume, decreases testing time, and accommodates slower (less expensive) testers without decreasing test quality.
SHRINKING PROCESS TECHNOLOGIES and increasing design sizes have led to highly complex, billion-transistor integrated circuits. Testing these complex ICs to weed out defective parts has become a major challenge. To reduce design and manufacturing costs, testing must be quick and effective. The time it takes to test an IC depends on test data volume. The rapidly increasing number of transistors in ICs has spurred enormous growth in test data volume. Techniques that decrease test data volume and testing time are necessary to increase production capacity and reduce test cost.

The latest system-on-a-chip designs integrate multiple ICs (microprocessors, memories, DSPs, and I/O controllers) on a single piece of silicon. SOCs consist of several reusable embedded intellectual-property (IP) cores provided by third-party vendors and stitched into designs by system integrators. Testing all these circuits when they are embedded in a single device is far more difficult than testing them separately. Achieving satisfactory SOC test quality at an acceptable cost and with minimal effect on the production schedule is also becoming increasingly difficult. High transistor counts and aggressive clock frequencies require expensive automatic test equipment (ATE). More important, they introduce many problems into test development and manufacturing test that decrease product quality and increase cost and time to market.

ATE costs have been rising steeply. A tester that can accurately test today's complex ICs costs several million dollars. According to the 1999 International Technology Roadmap for Semiconductors (http://public.itrs.net/Files/1999_SIA_Roadmap/Home.htm), the cost of a high-speed tester will exceed $20 million by 2010, and the cost of testing an IC with conventional methods will exceed fabrication cost. Conventional direct-probe testing methods have become inadequate and are no longer commercially practical. The increasing ratio of internal node count to external pin count makes most chip nodes inaccessible from system I/O pins, so controlling and observing these nodes and exercising the numerous internal states in the circuit under test is difficult. ATE I/O channel capacity, speed, accuracy, and data memory are limited. Therefore, design and test engineers need new techniques for decreasing data volume.

Test resource partitioning (TRP) offers a promising solution to these problems by moving some test resources from ATE to chip. Our new TRP approach, based on test data compression and on-chip decompression, reduces test data volume, decreases testing time, and accommodates slower testers without decreasing test quality.
IEEE Design & Test of Computers, September-October 2001

0740-7475/01/$10.00 © 2001 IEEE

Overview
There are three main TRP techniques:
- Test set compaction. This technique reduces test data volume by compacting the partially specified test cubes generated by automatic test pattern generation (ATPG) algorithms. It requires no additional hardware investment. The test set is compacted through dynamic or static compaction procedures [1, 2]. However, test set compaction results in the application of fewer patterns to the SOC. Because every modeled fault is thus detected by fewer patterns, this approach can reduce unmodeled-fault coverage [3].
- Built-in self-test. BIST, an alternative to ATE-based external testing, offers several advantages: It lets precomputed test sets be embedded in test sequences generated by on-chip hardware, supports test reuse and at-speed testing, and protects intellectual property. Although BIST is now extensively used for memory testing, it is not as common for logic testing. This is particularly true for nonscan and partial-scan designs in which test vectors cannot be reordered and applying pseudorandom vectors can lead to serious bus contention problems during testing. Moreover, BIST can be applied to SOC designs only if the IP cores are BIST-ready. Because most currently available IP cores are not BIST-ready, BIST insertion in SOCs containing these circuits is expensive and requires considerable redesign.
- Test data compression. Another way to reduce test data volume is through data compression techniques such as statistical, run-length, Golomb, and frequency-directed run-length (FDR) coding [4-7]. These techniques compress precomputed test set TD, provided by the core vendor, into the much smaller test set TE, which is stored in ATE memory. Figure 1 shows a TRP scheme using test data compression. An on-chip decoder performs pattern decompression to generate TD from TE during pattern application. Compressing difference-vector sequence Tdiff determined from TD decreases test set size and reduces testing time [5, 6]. Figure 2 shows test architectures based on TD and on Tdiff with cyclical scan registers (CSRs). However, using Tdiff and CSRs is not always necessary. Directly encoding TD can also achieve significant compression.

Figure 1. TRP scheme for testing a SOC by storing encoded test data in ATE memory and decoding it with on-chip decoders. (Blocks shown: a smaller ATE memory and the test head, a lower-bandwidth link to the chip, on-chip decoders, the test access mechanism, and cores including a CPU, DSP core, I/O controller, memory, embedded RAM, and a legacy core.)

Our TRP approach uses the third technique, which reduces test data volume more than test set compaction and is less expensive than BIST.

Run-length codes
To encode SOC test data, we first decompose it into either fixed-length or variable-length blocks. We then assign each block a code word, also of either fixed or variable length. Assigning a fixed-length code word to fixed-length data blocks doesn't lead to significant compression, so we must consider variable-to-fixed-length and variable-to-variable-length encoding.

Figure 2. Decompression architectures using precomputed test set TD (a) and a cyclical scan register (CSR) and difference-vector test set Tdiff (b).

Variable-to-fixed-length: conventional run-length coding
The first step in encoding test set TD is to generate a fully specified test set with long runs of 0s, each followed by a single 1. Run-length codes can be used to compress both difference-vector sequence Tdiff and TD. Let TD = {t1, t2, t3, ..., tn} be the (ordered) precomputed test set. A straightforward heuristic procedure determines the ordering [6]. We say that

$$T_{diff} = \{d_1, d_2, \ldots, d_n\} = \{t_1,\ t_1 \oplus t_2,\ t_2 \oplus t_3,\ \ldots,\ t_{n-1} \oplus t_n\}$$

where a bitwise exclusive-or operation is carried out between patterns ti and ti+1. If uncompacted test set TD is used for compression, all the don't-care bits in TD are mapped to 0s to obtain a fully specified test set before compression.

The next step is to select block size b. Once b is determined, the runs of 0s are mapped to groups of size M + 1 = 2^b. The length of the longest run of 0s determines the number of groups. The set of run lengths {0, 1, 2, ..., M - 1} and a run of M 0s form group A1; the set {M, M + 1, M + 2, ..., 2M - 1} and a run of 2M 0s form group A2; and so on. In general, the set of run lengths {(k - 1)M, (k - 1)M + 1, (k - 1)M + 2, ..., kM - 1} and a run of kM 0s comprise group Ak. The code word size for the kth group is kb = k log2(M + 1) bits. Table 1 shows the encoding.

Table 1. An example of conventional run-length encoding for block size b = 3.

Group   Run length   Code word
A1      0            000
        1            001
        2            010
        3            011
        4            100
        5            101
        6            110
        7            111
A2      8            111000
        9            111001
        10           111010
        11           111011

Variable-to-variable-length: Golomb coding
The first step in the encoding procedure is to select Golomb code parameter m. For certain distributions of the input data stream (Tdiff, in our case), group size m can be optimally determined. For example, if the input data stream is random with 0-probability p, then m should be chosen such that p^m ≈ 0.5. However, because the difference vectors for precomputed test sets do not satisfy the randomness assumption, the best value of m for test data compression must be determined experimentally.

Once group size m is determined, the runs of 0s in the precomputed test set are mapped to groups of size m (each group corresponding to a run length). The length of the longest run of 0s in the precomputed test set determines the number of groups. The set of run lengths {0, 1, 2, ..., m - 1} forms group A1; the set {m, m + 1, m + 2, ..., 2m - 1} forms group A2; and so on. In general, the set of run lengths {(k - 1)m, (k - 1)m + 1, (k - 1)m + 2, ..., km - 1} comprises group Ak. To each group Ak, we assign a group prefix of (k - 1) 1s followed by a 0. We denote this by 1^(k-1)0. If m is determined to be a power of 2 (that is, m = 2^N), each group contains 2^N members, and a sequence (a tail) of log2(m) bits uniquely identifies each member in the group. Thus, the final code word for run length L that belongs to group Ak is composed of two parts: a group prefix and a tail. The prefix is 1^(k-1)0, and the tail is the log2(m)-bit binary representation of L mod m. The group index follows from k - 1 = ⌊L/m⌋; that is, k = ⌊L/m⌋ + 1. Table 2 shows an example of Golomb encoding.

Table 2. An example of Golomb encoding for group size m = 4.

Group   Run length   Group prefix   Tail   Code word
A1      0            0              00     000
        1                           01     001
        2                           10     010
        3                           11     011
A2      4            10             00     1000
        5                           01     1001
        6                           10     1010
        7                           11     1011
A3      8            110            00     11000
        9                           01     11001
        10                          10     11010
        11                          11     11011

Variable-to-variable-length: FDR coding
The need for FDR coding arises from the distribution of runs of 0s in typical test sets. We conducted a series of experiments on the large benchmark circuits from the International Symposium on Circuits and Systems (ISCAS) and studied the distribution of runs of 0s in Tdiff obtained from complete single stuck-at test sets for these circuits. Figure 3 illustrates this distribution for benchmark s9234. We found that the distributions were similar for other circuits' test sets. Figure 3 shows that the frequency of runs of 0s of length l

- is high for 0 ≤ l ≤ 20,
- is very low for l > 20, and
- decreases rapidly with increasing l even within the range 0 ≤ l ≤ 20.
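As a rough illustration of how such a run-length distribution might be gathered, the following Python sketch builds difference vectors from an ordered test set and counts runs of 0s. The function names (`difference_vectors`, `zero_runs`) and the three-pattern test set are hypothetical and are not taken from the experiments reported here; the sketch also assumes that a trailing run of 0s with no terminating 1 is simply reported separately.

```python
from collections import Counter


def difference_vectors(patterns):
    """Return [t1, t1^t2, t2^t3, ...] for equal-length bit strings."""
    diffs = [patterns[0]]
    for prev, cur in zip(patterns, patterns[1:]):
        # Bitwise exclusive-or of consecutive patterns.
        diffs.append("".join("1" if a != b else "0" for a, b in zip(prev, cur)))
    return diffs


def zero_runs(bits):
    """Return (runs, trailing): lengths of runs of 0s that end in a 1,
    plus the number of 0s left over after the last 1 (an assumption of
    this sketch; the caller decides how to flush them)."""
    runs, count = [], 0
    for bit in bits:
        if bit == "0":
            count += 1
        else:
            runs.append(count)
            count = 0
    return runs, count


# Hypothetical fully specified test set (don't-care bits already mapped to 0).
t_d = ["00010010", "00010110", "01010110"]
t_diff = difference_vectors(t_d)
runs, trailing = zero_runs("".join(t_diff))
histogram = Counter(runs)  # run length -> frequency, as plotted in a figure like Figure 3
```

With real test sets, `histogram` is the quantity that would be inspected to decide whether short runs dominate.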

If we use conventional run-length coding with block size b for compressing such test sets, every run of l 0s, 0 ≤ l ≤ 2^b - 1, is mapped to a b-bit code word. This is clearly inefficient for the large number of short runs of 0s. Likewise, if we use Golomb coding with code parameter m, a run of l 0s is mapped to a code word with ⌊l/m⌋ + 1 + log2(m) bits. This is also inefficient for short runs of 0s. Clearly, test data compression is more efficient if the more frequently occurring runs of 0s are mapped to shorter code words. This leads us to the notion of FDR codes.

FDR code is constructed as follows: The runs of 0s are divided into groups A1, A2, A3, ..., Ak, where k is determined by length lmax of the longest run (2^k - 3 < lmax ≤ 2^(k+1) - 3). Also, a run of length l is mapped to group Aj, where j = ⌈log2(l + 3)⌉ - 1. The ith group's size equals 2^i; that is, Ai contains 2^i members. Each code word consists of two parts: a group prefix and a tail. The group prefix identifies the group to which the run belongs, and the tail identifies the group's members. Table 3 shows an example of FDR encoding.

Figure 3. Distribution of runs of 0s for ISCAS benchmark circuit s9234. (Horizontal axis: length of runs of 0s, 1 to 91; vertical axis: frequency of runs, 0 to 900.)

Table 3. An example of FDR encoding.

Group   Run length   Group prefix   Tail   Code word
A1      0            0              0      00
        1                           1      01
A2      2            10             00     1000
        3                           01     1001
        4                           10     1010
        5                           11     1011
A3      6            110            000    110000
        7                           001    110001
        8                           010    110010
        9                           011    110011
        10                          100    110100
        11                          101    110101
        12                          110    110110
        13                          111    110111

FDR code has the following properties:

- For any code word, prefix and tail are of equal length. For example, they are each one bit long for A1, two bits long for A2, and so on.
- The length of the prefix for group Ai equals i. For example, the prefix is 2 bits long for group A2.
- For any code word, the prefix is identical to the binary representation of the run length corresponding to the group's first element. For example, run-length 8 is mapped to group A3, and this group's first element is run-length 6. Hence, the prefix of the code word for run-length 8 is 110.
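The FDR construction and the properties listed above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names are hypothetical, code words are represented as bit strings, and `fdr_decode` assumes a well-formed input stream. `golomb_length` is included solely to compare code word sizes using the ⌊l/m⌋ + 1 + log2(m) expression.

```python
def fdr_encode(run_length):
    """Return the FDR code word (bit string) for a run of 0s ended by a 1."""
    if run_length < 0:
        raise ValueError("run length must be non-negative")
    # Group index j = ceil(log2(l + 3)) - 1. For n >= 1,
    # (n - 1).bit_length() equals ceil(log2(n)), so no floats are needed.
    group = (run_length + 2).bit_length() - 1
    first = 2 ** group - 2                      # first run length in group Aj
    prefix = "1" * (group - 1) + "0"            # j bits; binary form of `first`
    tail = format(run_length - first, f"0{group}b")  # j bits; offset in group
    return prefix + tail


def fdr_decode(bits):
    """Return the list of run lengths encoded in a well-formed FDR stream."""
    runs, i = [], 0
    while i < len(bits):
        group = 1
        while bits[i] == "1":                   # count the prefix 1s
            group += 1
            i += 1
        i += 1                                  # skip the 0 that ends the prefix
        runs.append(2 ** group - 2 + int(bits[i:i + group], 2))
        i += group
    return runs


def golomb_length(run_length, m):
    """Golomb code word size in bits for power-of-two parameter m."""
    return run_length // m + 1 + (m.bit_length() - 1)


table3 = [fdr_encode(length) for length in range(14)]
```

For instance, `fdr_encode(8)` yields the prefix 110 followed by the tail 010, and for a run of 24 the FDR word is one bit shorter than the Golomb word with m = 4.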

FDR code word size also increases by 2 bits (1 bit for the prefix and 1 bit for the tail) as we move from group Ai to group Ai+1.

Run lengths are also mapped to groups in conventional run-length and Golomb coding. In run-length coding with block size b, the groups are of equal size, each containing 2^b elements. The number of code bits to which runs of 0s are mapped increases by b bits as we move from one group to another. In Golomb coding, the groups are likewise of equal size (m elements each), tails for code words in different groups are of equal length (log2(m), where m is the code parameter), and the prefix increases by only 1 bit as we move from one group to another. Hence, Golomb coding is less effective when the runs of 0s spread far from an effective range determined by m.

Figure 4 compares the three codes, showing the number of bits per code word for different-length runs of 0s. Conventional run-length code's performance is worse than that of Golomb code when run-length l exceeds 7. Golomb code's performance is worse than that of FDR code for l ≥ 24. FDR code outperforms the other two types for runs of lengths 0 and 1. Since these runs' frequencies are very high for precomputed test sets (Figure 3), FDR codes outperform run-length and Golomb codes for SOC test data compression.

Figure 4. Comparison of code word size (bits) for different run lengths for FDR code, Golomb code (code parameter m = 4), and conventional run-length code (block size b = 3). (Horizontal axis: length of runs of 0s; vertical axis: number of code word bits.)

Test data compression and decompression
Although the on-chip decoder designs are similar for the three codes we've described, we discuss only the Golomb decoder in this article. The decoder is simple, scalable, and independent of the core under test and the precomputed test set. Moreover, because it is small, it does not introduce significant hardware overhead.

The decoder decompresses encoded test set TE and outputs Tdiff. The exclusive-or gate and the CSR generate test patterns from the difference vectors. A counter of log2(m) bits and a finite-state machine can efficiently implement the decoder. Figure 5 shows the decoder's block diagram. The input to the FSM is bit_in, and enable signal en is used to input the bit whenever the decoder is ready. Signal inc increments the counter, and rs indicates that the counter has finished counting. Signal out is the decoder output, and v indicates when the output is valid.

Figure 5. Block diagram of the decoder used for decompression. (An FSM with signals bit_in, en, out, v, inc, rs, and clk, connected to an i-bit counter, where i = log2(m).)

The decoder operates as follows:

- When the input is 1, the counter counts up to m. Signal en is low while the counter is busy counting and enables the input at the end of m cycles to accept another bit. The decoder outputs m 0s during this operation and makes v high.
- When the input is 0, the FSM starts decoding the input code word's tail. The number of output 0s depends on the binary value of the tail bits.
- The en and v signals synchronize the decoder's input and output operations.
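The decoder behavior described above can be mirrored by a small software model. The Python sketch below is a behavioral approximation only: it does not model the en, v, inc, and rs signals or individual clock cycles, the function names are hypothetical, and it assumes m is a power of 2 (at least 2) and that the input stream is well formed. An encoder following the prefix-and-tail construction is included so the round trip can be checked.

```python
def golomb_encode(runs, m):
    """Encode run lengths (runs of 0s, each ended by a 1) as a bit string."""
    tail_bits = m.bit_length() - 1              # log2(m) for a power of two
    if m < 2 or m != 1 << tail_bits:
        raise ValueError("m must be a power of 2 and at least 2")
    words = []
    for run in runs:
        prefix = "1" * (run // m) + "0"         # (k - 1) ones, then separator 0
        tail = format(run % m, f"0{tail_bits}b")
        words.append(prefix + tail)
    return "".join(words)


def golomb_decode(bits, m):
    """Expand a Golomb-coded bit string back into runs of 0s ended by 1s."""
    tail_bits = m.bit_length() - 1
    out, i = [], 0
    while i < len(bits):
        if bits[i] == "1":
            out.append("0" * m)                 # each prefix 1 stands for m 0s
            i += 1
        else:
            i += 1                              # separator 0 marks the tail
            value = int(bits[i:i + tail_bits], 2)
            out.append("0" * value + "1")       # remaining 0s, then the 1
            i += tail_bits
    return "".join(out)


runs = [0, 3, 4, 8, 11]
encoded = golomb_encode(runs, 4)
decoded = golomb_decode(encoded, 4)
```

Here `encoded` is the concatenation of the code words for run lengths 0, 3, 4, 8, and 11 with m = 4, and `decoded` is the original sequence of runs of 0s, each closed by a 1.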

84

IEEE Design & Test of Computers

Figure 6 shows the FSM state diagram corresponding to the decoder for m = 4. States S0 to S3 and S4 to S8 correspond to prefix and tail decoding, respectively. We synthesized the FSM using the Synopsys Design Compiler to assess the decoder's hardware overhead. The synthesized circuit contains only four flip-flops and 34 combinational gates. For a circuit whose test set is compressed using m = 4, the logic shown in the gate-level schematic is the only additional hardware required other than the counter. Thus, the decoder is independent of not only the core under test but also its precomputed test set. The amount of extra logic required for decompression is small and can be implemented easily, in contrast to the run-length decoder, which is not scalable and becomes increasingly complex for higher values of block length b.

Figure 6. Decoder FSM state diagram. (States S0 to S8; transitions are labeled bit_in, rs / en, out, inc, v.)

Test application time
Here, we analyze the testing time for a single scan chain using Golomb coding with the test architecture shown in Figure 2. The Golomb decoder's state diagram indicates that

- each 1 in the prefix takes m cycles for decoding,
- each separator 0 takes one cycle, and
- the tail takes a maximum of m cycles and a minimum of γ = log2(m) + 1 cycles.

We let nC be the total number of bits in TE, and r the number of 1s in Tdiff. TE contains r tails and r separator 0s, and the number of prefix 1s in TE equals nC - r[1 + log2(m)]. Therefore, maximum and minimum testing times Tmax and Tmin, measured by the number of cycles, are

$$T_{max} = \{n_C - r[1 + \log_2 m]\}m + r + mr = m n_C - r[m\log_2 m - 1]$$

$$T_{min} = \{n_C - r[1 + \log_2 m]\}m + r + \gamma r = m n_C - rm[1 + \log_2 m] + r(1 + \gamma)$$

Therefore, the difference between Tmax and Tmin is

$$\delta = T_{max} - T_{min} = r[m - \log_2 m - 1]$$
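As a quick numerical check of these cycle counts, the following Python sketch evaluates the maximum and minimum testing times from the per-bit costs stated above (m cycles per prefix 1, one cycle per separator 0, and between log2(m) + 1 and m cycles per tail). The function name is hypothetical, m is assumed to be a power of 2, and the sample values are those listed for s9234 in Table 6 (m = 4, r = 5,039, nC = 22,250).

```python
def golomb_test_cycles(n_c, r, m):
    """Return (t_max, t_min) in scan clock cycles.

    n_c: number of bits in the encoded test set TE
    r:   number of 1s in Tdiff (equals the number of tails and separators)
    m:   Golomb code parameter, assumed to be a power of 2
    """
    log_m = m.bit_length() - 1
    prefix_ones = n_c - r * (1 + log_m)         # bits that are prefix 1s
    t_max = prefix_ones * m + r + m * r          # every tail takes m cycles
    t_min = prefix_ones * m + r + (log_m + 1) * r  # every tail takes log2(m)+1
    return t_max, t_min


t_max, t_min = golomb_test_cycles(n_c=22250, r=5039, m=4)
spread = t_max - t_min                           # r * (m - log2(m) - 1)
```

For these inputs the gap between the two bounds equals r, because m - log2(m) - 1 = 1 when m = 4.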

A major advantage of Golomb coding is that on-chip decoding can be carried out at scan clock frequency fscan while TE is fed to the core under test with external clock frequency fext < fscan. This lets us use slower testers without increasing test application time. The external clock and the scan clocks must be synchronized such that fscan = m·fext, where Golomb code parameter m is usually a power of 2 [8]. Therefore, the decoder can generate the bits of Tdiff at fscan.

Next, we analyze testing time, using fscan = m·fext, and compare our method's testing time with that of external testing applying ATPG-compacted patterns. We let the ATPG-compacted test set contain p patterns and the scan chain's length be n bits. Therefore, the ATPG-compacted test set's size is pn bits, and testing time TATPG equals pn external clock cycles. Next, we let difference-vector sequence Tdiff obtained from the uncompacted test set contain r 1s and its Golomb-coded test set TE contain nC bits. Therefore, the maximum number of scan clock cycles required for applying test patterns in the Golomb coding scheme is Tmax = m·nC - r[m log2(m) - 1].


Figure 7. SOC channel selector for application to multiple cores and multiple scan chains. (An FSM with signals data_in, vin, clk, data_out, vout, and clk_stop feeds a demultiplexer whose select lines are driven by an i-bit counter, i = log2(m); each demultiplexer output feeds the decoder for one core's scan chain, core 1 through core m.)

The maximum testing time (seconds) with Golomb coding is

$$\tau = T_{max}/f_{scan} = \{m n_C - r[m\log_2 m - 1]\}/f_{scan}$$

and testing time (seconds) for external testing with ATPG-compacted patterns is

$$\tau' = pn/f_{ext} = pnm/f_{scan}$$

If we are to accomplish testing in τ* seconds with Golomb coding, scan clock frequency fscan must equal Tmax/τ*. That is, fscan = {m·nC - r[m log2(m) - 1]}/τ*. We meet this requirement using a slow external tester operating at frequency fext = fscan/m. On the other hand, if we use only external testing with p ATPG-compacted patterns, the required external tester clock frequency f'ext is pn/τ*. The ratio between f'ext and fext is

$$f'_{ext}/f_{ext} = (pn/\tau^*)/(f_{scan}/m) = pn/[n_C - r\log_2 m + r/m]$$

Our experimental results show that in all cases the ratio is greater than 1, demonstrating that Golomb-code-based TRP lets us decrease test data volume and use a slower tester without increasing testing time.

Interleaving decompression architecture


Our interleaving decompression architecture based on Golomb coding enables testing of multiple cores in parallel with a single ATE I/O channel. The architecture reduces testing time and test data volume stored in ATE memory and increases ATE I/O channel capacity. As discussed earlier, when Golomb coding is applied to a data block containing a run of 0s followed by a single 1, the code word contains two parts: prefix and tail. For given code parameter m (group size), the tail's length log2(m) is independent of the run length. Every 1 in the prefix corresponds to m 0s in the decoded difference vector. Thus, the prefix consists of a string of 1s followed by a 0, and the 0 identifies the tail's beginning.

Whenever the decoder FSM receives a 1, it runs the counter for m decode cycles and starts decoding the tail as soon as it receives a 0. Tail decoding takes at most m cycles. During prefix decoding, the FSM must wait m cycles before the next prefix bit can be decoded. Therefore, we can test m cores in parallel by using interleaving to feed each core's decoder with encoded prefix data every m cycles. Interleaving can also be used to feed multiple scan chains in parallel, as long as their capture cycles are synchronized. Whenever a tail is to be decoded, the respective decoder is fed with the entire tail of log2(m) bits in a single burst of log2(m) cycles.

Figure 7 shows the SOC channel selector used for interleaving. It consists of a demultiplexer, an i-bit counter, and an FSM. Interleaving proceeds as follows: First, the SOC integrator combines the encoded test data for m cores to generate composite bit stream TC, which is stored in the ATE. Next, TC is fed to the FSM, which detects the beginning of each tail and feeds the demultiplexer. An i-bit counter, where i = log2(m), selects the outputs to the various cores' decoders. TC is obtained by interleaving the prefixes of each core's compressed test sets, but the tails are included unchanged. In the example in Figure 8, compressed data for two cores (generated using group size m = 2) have been interleaved to obtain TC. The final encoded test set will be applied to multiple cores through decompression.

Figure 8. Interleaving composite encoded test data for two cores with group size m = 2. (Core1: 1101111100; Core2: 1110011100; final encoded test data TC: 11110111001111110000.)

Let's describe the SOC channel selector in greater detail. The FSM detects the tail's beginning and generates clk_stop to stop the i-bit counter. Signals vin and vout indicate that data_in and data_out are valid. The i-bit counter connects to the demultiplexer's select lines, and the demultiplexer's outputs connect to the different scan chains' decoders. Each scan chain has a dedicated decoder. This decoder receives either a 1 or the tail of the compressed data corresponding to the various cores connected to the scan chain. If the FSM detects that a portion of the tail has arrived, the 0 that identifies the tail passes to the decoder, and clk_stop goes high for the next m cycles. The demultiplexer's output doesn't change during this period, and the entire tail passes continuously to the appropriate core.

Figures 9a and 9b show the FSM state diagram for m = 4 and the corresponding timing diagram. The FSM is fed TC corresponding to four different cores. It remains in state S0 as long as it receives 1s corresponding to the prefixes. As soon as it receives a 0, it outputs the entire tail unchanged and makes clk_stop high. This stops the i-bit counter and prevents any change at the demultiplexer output. The timing diagram shows that whenever the FSM receives a 0, SOC channel selection remains unchanged for the next m + 1 cycles.

The difference between maximum and minimum testing times for a single tail is δt = m - log2(m) - 1. If we restrict m to be small, say m ≤ 8, then δt ≤ 4. In this case, we can easily modify the decoder FSM by introducing additional states to the Golomb decoder FSM such that the tail decoding always takes m cycles and δt = 0. As Figure 10 shows, three additional states are required to equalize tail and prefix decoding for m = 4. The additional states don't increase testing time and hardware overhead significantly. There are m cores in parallel, and each separator 0 and tail takes m + 1 cycles to decode. For m cores, therefore, the decoding time for the separator and tail is

$$t_{tail} = \sum_{j=1}^{m}(r_j + m r_j) = (m + 1)\sum_{j=1}^{m} r_j = (m + 1)R, \quad \text{where } R = \sum_{j=1}^{m} r_j$$
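The way composite stream TC is formed from per-core streams (prefix bits taken one per core in turn, each separator 0 passed together with its whole tail) can be sketched in Python as follows. This is a software illustration, not the channel selector hardware: the function names are hypothetical, exhausted streams are simply skipped during interleaving (an assumption of this sketch), and `deinterleave` assumes every core still has data whenever its turn comes, as in the two-core example of Figure 8.

```python
def interleave(streams, m):
    """Merge per-core Golomb-coded bit strings into one composite stream."""
    tail_bits = m.bit_length() - 1              # log2(m) for a power of two
    pos = [0] * len(streams)
    out = []
    while any(p < len(s) for p, s in zip(pos, streams)):
        for core, stream in enumerate(streams):
            p = pos[core]
            if p >= len(stream):
                continue                        # assumption: skip finished cores
            if stream[p] == "1":
                out.append("1")                 # one prefix bit per turn
                pos[core] = p + 1
            else:
                burst = stream[p:p + 1 + tail_bits]
                out.append(burst)               # separator 0 plus the whole tail
                pos[core] = p + len(burst)
    return "".join(out)


def deinterleave(tc, num_cores, m):
    """Split a composite stream back into per-core streams (round robin)."""
    tail_bits = m.bit_length() - 1
    streams = [[] for _ in range(num_cores)]
    i, core = 0, 0
    while i < len(tc):
        step = 1 if tc[i] == "1" else 1 + tail_bits
        streams[core].append(tc[i:i + step])
        i += step
        core = (core + 1) % num_cores           # counter advances to next core
    return ["".join(s) for s in streams]


cores = ["1101111100", "1110011100"]            # Core1 and Core2 from Figure 8
t_c = interleave(cores, m=2)
```

Running the sketch on the two streams of Figure 8 reproduces the composite stream shown there, and `deinterleave(t_c, 2, 2)` recovers the original per-core data.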

Because all cores' prefixes are decoded in parallel, the number of cycles tprefix required for decoding all the prefixes in TC is determined by the number of prefix 1s of the core with the largest amount of encoded test data. Therefore,

$$t_{prefix} = \max_i\left(m\{n_{C,i} - r_i[1 + \log_2 m]\}\right) = m\{n_{C,max} - r_{max}[1 + \log_2 m]\}$$

where nC,i and ri are the number of encoded bits in TE and the number of 1s in Tdiff for the ith core, respectively. Moreover, nC,max and rmax are the number of encoded bits in TE and the number of 1s in Tdiff for the core with the largest amount of encoded test data. Therefore, total testing time for m cores tested in parallel with interleaving is

$$T_I = t_{prefix} + t_{tail} = m\{n_{C,max} - r_{max}[1 + \log_2 m]\} + R(1 + m)$$

Figure 9. State diagram (a) and timing diagram (b) for the SOC channel selector FSM (m = 4). (Transitions in (a) are labeled data_in / clk_stop, vin, data_out, vout; the timing diagram in (b) shows clk, data_in, clk_stop, vin, data_out, vout, and the selected SOC channel for cores 1 to 4, with TC = 1010110011011.)

In contrast, we now find the testing time TNI (NI denotes noninterleaved) required if we test all the cores one by one using a single ATE I/O channel:

$$\begin{aligned}
T_{NI} &= \sum_{j=1}^{m}\{n_{C,j} - r_j[1 + \log_2 m]\}m + (m + 1)R \\
&= m|T_C| - m\sum_{j=1}^{m}[r_j\log_2 m + r_j] + (m + 1)R \\
&= m|T_C| - R[m\log_2 m - 1] \\
&= m|T_C| - mR\log_2 m + R
\end{aligned}$$

where |TC| denotes the number of bits in TC. The difference between the interleaved and noninterleaved testing times is

$$\begin{aligned}
T_{NI} - T_I &= m|T_C| - Rm\log_2 m + R - m n_{C,max} + m r_{max}[1 + \log_2 m] - R - mR \\
&= m(|T_C| - n_{C,max}) - m[1 + \log_2 m](R - r_{max}) \\
&= m\{(|T_C| - n_{C,max}) - [1 + \log_2 m](R - r_{max})\} \\
&\approx m(|T_C| - n_{C,max}) \gg 0
\end{aligned}$$

because nC,max >> rmax and |TC| >> R. Consider a hypothetical example of four cores with an encoded test data size equal to nC,1 = 40, nC,2 = 60, nC,3 = 80, nC,4 = 100, and r1 = 4, r2 = 6, r3 = 8, and r4 = 10. Therefore, nC,max = 100, rmax = 10, m = 4, R = 28, and |TC| = 280. Finally, TNI - TI = 4[(280 - 100) - (1 + 2)(28 - 10)] = 504. This analysis shows that the interleaving architecture reduces testing time and increases ATE channel bandwidth.

We developed a Verilog model of the FSM for m = 4 and simulated it using TC = 1010110011011. We also synthesized the gate-level circuit of the channel selector FSM with the Synopsys Design Compiler. It contains only four flip-flops and 17 gates. Thus, additional hardware overhead is small.
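The hypothetical four-core example can be reproduced with a few lines of Python. The sketch below evaluates the interleaved and noninterleaved testing times from the expressions in this section; the function names are illustrative, m is assumed to be a power of 2, and tail decoding is assumed to be equalized to m cycles as described for the modified decoder.

```python
def interleaved_cycles(n_c, r, m):
    """Testing time T_I for cores tested in parallel with interleaving."""
    log_m = m.bit_length() - 1
    # Prefix time is set by the core needing the most prefix-decoding cycles.
    t_prefix = max(m * (n - ri * (1 + log_m)) for n, ri in zip(n_c, r))
    t_tail = (m + 1) * sum(r)                   # separator + equalized tail
    return t_prefix + t_tail


def noninterleaved_cycles(n_c, r, m):
    """Testing time T_NI when the cores are tested one after another."""
    log_m = m.bit_length() - 1
    return m * sum(n_c) - sum(r) * (m * log_m - 1)


n_c = [40, 60, 80, 100]                         # encoded bits per core
r = [4, 6, 8, 10]                               # 1s in Tdiff per core
t_i = interleaved_cycles(n_c, r, m=4)
t_ni = noninterleaved_cycles(n_c, r, m=4)
saving = t_ni - t_i
```

With these inputs the saving matches the value of 504 cycles worked out in the text.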

Figure 10. Modified state diagram of the decoder FSM equalizing tail and prefix decoding cycles. (Transitions are labeled bit_in, rs / en, out, inc, v; the additional states are S9, S10, and S11.)

Experimental results
We performed TRP on the large ISCAS benchmark circuits. We considered full-scan circuits for the proposed compression and decompression schemes. For full-scan circuits, we reordered patterns to achieve higher compression. For all full-scan circuits, we considered a single scan chain. We computed the compression percentage as CP = (TD TE / TD ) 100, where TD is the test set size and TE is the encoded test set size. For our rst experiment, we used differencevector sequences (Tdiff) obtained from partially specied test sets (test cubes). Table 4 presents results for test cubes obtained using dynamic compaction with the Mintest ATPG program.9 The table compares the fully compacted Mintest test sets with the compression obtained from FDR, Golomb, and conventional run-length coding. The table lists precomputed (original) test set sizes (TD), encoded test set sizes (TE), and smallest ATPG-compacted (Mintest) test set sizes. We used a Sun Ultra 10 workstation with a 333-MHz processor and 256 Mbytes of DRAM. Table 4 shows that FDR codes provide better compression than Golomb and conventional run-length codes in all cases. (Golomb code results reported here are better than those reported in an earlier publication6 because we used an improved pattern-reordering heuristic for these experiments.) For circuit s38417, the increase for FDR codes was as much as 7% over
SeptemberOctober 2001

Golomb codes. In all but one case, the encoded test set (TE) size is much smaller than that of the Mintest-compacted test set. The test cubes we used for s35932 were already highly compacted, so we didnt obtain high compression for this circuit. Nevertheless, in contrast to FDR codes, Golomb codes provided insignicant compression, and run-length codes provided no compression for this circuit. On average, the compression obtained with FDR codes was 7.49% higher than that obtained with Golomb codes and 19.56% higher than that obtained with conventional run-length codes. Test data compression always leads to encoded test sets smaller than ATPG-compacted test sets.6 Moreover, test data compression decreases testing time by several orders of magnitude,10 and substantially reduces power consumption during scan testing. Table 5 demonstrates that using test cubes TD (with all the dont-care bits mapped to 0s) also yields high compression. The advantage

89

International Test Conference

Table 4. Compression obtained using various Tdiff sequences. Run-length TD size Circuit s5378 s9234 s13207 s15850 s35932 s38417 s38584 (bits) 23,754 39,273 165,200 76,986 28,208 164,736 199,104 compression (%) for b = 3 44.49 49.63 58.75 52.15 None 46.82 48.52 Golomb compression (%) 53.73 59.85 84.33 66.55 2.27 58.08 59.61 FDR compression (%) 61.32 60.63 87.67 71.95 25.74 65.35 64.67 TE size with FDR (bits) 9,188 15,460 20,368 21,590 20,946 57,066 70,328 Mintest test set size (bits) 20,758 25,935 163,100 57,434 19,393 113,152 161,040

Table 5. Compression obtained using TD. Run-length TD size Circuit s5378 s9234 s13207 s15850 s35932 s38417 s38584 (bits) 23,754 39,273 165,200 76,986 28,208 164,736 199,104 compression (%) for b = 3 35.72 42.12 56.83 47.98 None 32.53 42.21 Golomb compression (%) 37.11 45.25 79.74 62.82 None 28.37 57.17 FDR compression (%) 48.02 43.59 81.30 66.22 19.37 43.26 60.91 TE size with FDR (bits) 12,346 22,152 30,880 26,000 22,744 93,466 77,812 Mintest test set size (bits) 20,758 25,935 163,100 57,434 19,393 113,152 161,040

Table 6. Comparison between external clock frequency fext, required for Golomb-coded test data, and external clock frequency f ext, required for external testing with ATPGcompacted patterns (for the same testing time). Circuit s9234 s13207 s15850 s38417 s38584 m 4 16 4 4 4 r 5,039 6,716 8,702 20,165 23,320 nC 22,250 41,658 40,717 92,054 104,111 pn 25,935 163,100 57,434 113,152 161,040 f ext /fext 1.93 9.90 2.25 1.99 2.54

Table 6 shows that Golomb coding lets us use a slower tester without incurring a time penalty. In comparison with external testing using ATPGcompacted patterns, we achieved the same testing time using a much slower tester. Overall, our experimental results for the ISCAS benchmarks show that the compression technique is very efcient for full-scan circuits and that ATPG compaction is not always necessary to save ATE memory and reduce testing time.

of using TD for compression is that the decompression architecture for on-chip pattern generation doesn't require a separate CSR. For circuits with long scan chains, equally long additional CSRs increase hardware overhead significantly. Therefore, compressing TD to generate the encoded test set not only yields smaller test sets but also reduces hardware overhead.
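To make the idea of compressing TD directly more concrete, the sketch below maps the don't-care bits of a test cube to 0s and then applies an FDR code to the resulting runs of 0s. It assumes the usual FDR definition from reference 7: group k covers run lengths 2^k - 2 through 2^(k+1) - 3, with a k-bit prefix (k - 1 ones followed by a 0) and a k-bit tail. The function names, the "X" symbol for don't-cares, and the sample cube are illustrative assumptions.

```python
def fill_dont_cares_with_zeros(cube: str) -> str:
    """Map every don't-care symbol (assumed to be 'X' or 'x') to 0."""
    return cube.upper().replace("X", "0")


def fdr_encode_run(run_length: int) -> str:
    """FDR codeword for one run of 0s terminated by a 1."""
    if run_length < 0:
        raise ValueError("run_length must be non-negative")
    k = (run_length + 2).bit_length() - 1   # group index, k >= 1
    offset = run_length - (2**k - 2)        # position inside group k
    return "1" * (k - 1) + "0" + format(offset, f"0{k}b")


def fdr_encode(bits: str) -> str:
    """Encode a fully specified 0/1 string as concatenated FDR codewords."""
    encoded = []
    run = 0
    for bit in bits:
        if bit == "0":
            run += 1
        elif bit == "1":
            encoded.append(fdr_encode_run(run))
            run = 0
        else:
            raise ValueError(f"unexpected symbol {bit!r}")
    if run:
        # Simplification: a trailing run is coded as if a 1 followed it.
        encoded.append(fdr_encode_run(run))
    return "".join(encoded)


if __name__ == "__main__":
    cube = "XX0X1XXXXXX1"                   # hypothetical test cube
    filled = fill_dont_cares_with_zeros(cube)
    print(filled, "->", fdr_encode(filled))
```

The hypothetical 12-bit cube becomes 000010000001, whose runs of length 4 and 6 encode to 1010 and 110000, giving 10 bits.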

TEST DATA COMPRESSION offers a solution to the TRP problem for SOC designs. We are currently working on reduced-pin-count testing (RPCT) and BIST techniques using test data compression.

Acknowledgments
This research was supported in part by National Science Foundation grant CCR-9875324 and in part by an Intel equipment grant.
IEEE Design & Test of Computers


References
1. A. Raghunathan and S.T. Chakradhar, "Acceleration Techniques for Dynamic Vector Compaction," Proc. Int'l Conf. Computer-Aided Design, IEEE CS Press, Los Alamitos, Calif., 1995, pp. 310-317.
2. S. Bommu, S.T. Chakradhar, and K.B. Doreswamy, "Static Compaction Using Overlapped Restoration and Segment Pruning," Proc. Int'l Conf. Computer-Aided Design, IEEE CS Press, Los Alamitos, Calif., 1998, pp. 140-146.
3. I. Pomeranz and S.M. Reddy, "Stuck-at Tuple-Detection: A Fault Model Based on Stuck-at Faults for Improved Defect Coverage," Proc. IEEE VLSI Test Symp., IEEE CS Press, Los Alamitos, Calif., 1998, pp. 289-294.
4. V. Iyengar, K. Chakrabarty, and B.T. Murray, "Deterministic Built-in Pattern Generation for Sequential Circuits," J. Electronic Testing: Theory and Applications (JETTA), vol. 15, Aug.-Oct. 1999, pp. 97-115.
5. A. Jas and N.A. Touba, "Test Vector Decompression via Cyclical Scan Chains and Its Application to Testing Core-Based Design," Proc. Int'l Test Conf., IEEE CS Press, Los Alamitos, Calif., 1998, pp. 458-464.
6. A. Chandra and K. Chakrabarty, "System-on-a-Chip Test Data Compression and Decompression Architectures Based on Golomb Codes," IEEE Trans. Computer-Aided Design, vol. 20, no. 3, Mar. 2001, pp. 355-368.
7. A. Chandra and K. Chakrabarty, "Frequency-Directed Run-Length (FDR) Codes with Application to System-on-a-Chip Test Data Compression," Proc. IEEE VLSI Test Symp., IEEE CS Press, Los Alamitos, Calif., 2001, pp. 42-47.
8. D. Heidel et al., "High-Speed Serializing/De-serializing Design-for-Test Methods for Evaluating a 1 GHz Microprocessor," Proc. IEEE VLSI Test Symp., IEEE CS Press, Los Alamitos, Calif., 1998, pp. 234-238.
9. I. Hamzaoglu and J.H. Patel, "Test Set Compaction Algorithms for Combinational Circuits," Proc. Int'l Conf. Computer-Aided Design, IEEE CS Press, Los Alamitos, Calif., 1998, pp. 283-289.
10. A. Chandra and K. Chakrabarty, "Efficient Test Data Compression and Decompression for System-on-a-Chip Using Internal Scan Chains and Golomb Coding," Proc. Design, Automation and Test in Europe (DATE 01) Conf., IEEE CS Press, Los Alamitos, Calif., 2001, pp. 145-149.

For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.

Anshuman Chandra is a PhD candidate in electrical and computer engineering at Duke University. His research interests include VLSI design, digital testing, and computer architecture. Chandra has a BE in electrical engineering from the University of Roorkee, Roorkee, India, and an MS in electrical and computer engineering from Duke University. He is a student member of the IEEE and ACM SIGDA.

Krishnendu Chakrabarty is an assistant professor of electrical and computer engineering at Duke University. His research interests include system-on-a-chip testing, embedded real-time operating systems, distributed sensor networks, and architectural optimization of microelectrofluidic systems. Chakrabarty has a BTech from the Indian Institute of Technology, Kharagpur, and an MSE and a PhD from the University of Michigan, Ann Arbor, all in computer science and engineering. He is a senior member of the IEEE and a member of the ACM SIGDA and Sigma Xi. He is the vice chair of technical activities in the IEEE Computer Society's Test Technology Technical Council.

Direct questions or comments about this article to Anshuman Chandra, Duke University, Dept. of Electrical and Computer Engineering, 130 Hudson Hall, Box 90291, Durham, NC 27708; [email protected].

September–October 2001
