Embedded Testing
EMBEDDED PROCESSOR-BASED
SELF-TEST
by
DIMITRIS GIZOPOULOS
ANTONIS PASCHALIS
University of Athens, Athens, Greece
and
YERVANT ZORIAN
Virage Logic, Fremont, California, U.S.A.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4419-5252-3
ISBN 978-1-4020-2801-4 (eBook)
DOI 10.1007/978-1-4020-2801-4
CONTENTS

Contents
List of Figures
List of Tables
Preface
Acknowledgments

1. INTRODUCTION
   1.1 Book Motivation and Objectives
   1.2 Book Organization

2. DESIGN OF PROCESSOR-BASED SOC
   2.1 Integrated Circuits Technology
   2.2 Embedded Core-Based System-on-Chip Design
   2.3 Embedded Processors in SoC Architectures

3. TESTING OF PROCESSOR-BASED SOC
   3.1 Testing and Design for Testability
   3.2 Hardware-Based Self-Testing
   3.3 Software-Based Self-Testing
   3.4 Software-Based Self-Test and Test Resource Partitioning
   3.5 Why is Embedded Processor Testing Important?
   3.6 Why is Embedded Processor Testing Challenging?

4. PROCESSOR TESTING TECHNIQUES
   4.1 Processor Testing Techniques Objectives
       4.1.1 External Testing versus Self-Testing
       4.1.2 DfT-based Testing versus Non-Intrusive Testing
       4.1.3 Functional Testing versus Structural Testing
       4.1.4 Combinational Faults versus Sequential Faults Testing
       4.1.5 Pseudorandom versus Deterministic Testing
       4.1.6 Testing versus Diagnosis
       4.1.7 Manufacturing Testing versus On-line/Field Testing
       4.1.8 Microprocessor versus DSP Testing
   4.2 Processor Testing Literature
       4.2.1 Chronological List of Processor Testing Research
       4.2.2 Industrial Microprocessors Testing
   4.3 Classification of the Processor Testing Methodologies

5. SOFTWARE-BASED PROCESSOR SELF-TESTING
   5.1 Software-based self-testing concept and flow
   5.2 Software-based self-testing requirements
       5.2.1 Fault coverage and test quality
       5.2.2 Test engineering effort for self-test generation
       5.2.3 Test application time
       5.2.4 A new self-testing efficiency measure
       5.2.5 Embedded memory size for self-test execution
       5.2.6 Knowledge of processor architecture
       5.2.7 Component based self-test code development
   5.3 Software-based self-test methodology overview
   5.4 Processor components classification
       5.4.1 Functional components
       5.4.2 Control components
       5.4.3 Hidden components
   5.5 Processor components test prioritization
       5.5.1 Component size and contribution to fault coverage
       5.5.2 Component accessibility and ease of test
       5.5.3 Components' testability correlation
   5.6 Component operations identification and selection
   5.7 Operand selection
       5.7.1 Self-test routine development: ATPG
       5.7.2 Self-test routine development: pseudorandom
       5.7.3 Self-test routine development: pre-computed tests
       5.7.4 Self-test routine development: style selection
   5.8 Test development for processor components
       5.8.1 Test development for functional components
       5.8.2 Test development for control components
       5.8.3 Test development for hidden components
   5.9 Test responses compaction in software-based self-testing
   5.10 Optimization of self-test routines
       5.10.1 "Chained" component testing
       5.10.2 "Parallel" component testing
   5.11 Software-based self-testing automation

6. CASE STUDIES - EXPERIMENTAL RESULTS
   6.1 Parwan processor core
       6.1.1 Software-based self-testing of Parwan
   6.2 Plasma/MIPS processor core
       6.2.1 Software-based self-testing of Plasma/MIPS
   6.3 Meister/MIPS reconfigurable processor core
       6.3.1 Software-based self-testing of Meister/MIPS
   6.4 Jam processor core
       6.4.1 Software-based self-testing of Jam
   6.5 oc8051 microcontroller core
       6.5.1 Software-based self-testing of oc8051
   6.6 RISC-MCU microcontroller core
       6.6.1 Software-based self-testing of RISC-MCU
LIST OF FIGURES

Figure 2-1: Typical System-on-Chip (SoC) architecture.
Figure 2-2: Core types of a System-on-Chip.
Figure 3-1: ATE-based testing.
Figure 3-2: Self-testing of an IC.
Figure 3-3: Self-testing with a dedicated memory.
Figure 3-4: Self-testing with dedicated hardware.
Figure 3-5: Software-based self-testing concept for processor testing.
Figure 3-6: Software-based self-testing concept for testing a SoC core.
Figure 5-1: Software-based self-testing for a processor (manufacturing).
Figure 5-2: Software-based self-testing for a processor (periodic).
Figure 5-3: Application of software-based self-testing: the three steps.
Figure 5-4: Engineering effort (or cost) versus fault coverage.
Figure 5-5: Test application time as a function of the K/W ratio.
Figure 5-6: Test application time as a function of the fµP/ftester ratio.
Figure 5-7: Software-based self-testing: overview of the four phases.
Figure 5-8: Phase A of software-based self-testing.
Figure 5-9: Phase B of software-based self-testing.
Figure 5-10: Phase C of software-based self-testing.
Figure 5-11: Phase D of software-based self-testing.
Figure 5-12: Classes of processor components.
Figure 5-13: Prioritized component-level self-test program generation.
Figure 5-14: ALU component of the MIPS-like processor.
Figure 5-15: ATPG test patterns application from memory.
Figure 5-16: ATPG test patterns application with immediate instructions.
Figure 5-17: Forwarding logic multiplexers testing.
Figure 5-18: Two-step response compaction.
Figure 5-19: One-step response compaction.
Figure 5-20: "Chained" testing of processor components.
Figure 5-21: "Parallel" testing of processor components.
Figure 5-22: Software-based self-testing automation.
Figure 7-1: Software-based self-testing for SoC.
LIST OF TABLES

Table 2-1: Soft, firm and hard IP cores.
Table 2-2: Embedded processor cores (1 of 3).
Table 2-3: Embedded processor cores (2 of 3).
Table 2-4: Embedded processor cores (3 of 3).
Table 4-1: External testing vs. self-testing.
Table 4-2: DfT-based vs. non-intrusive testing.
Table 4-3: Functional vs. structural testing.
Table 4-4: Combinational vs. sequential testing.
Table 4-5: Pseudorandom vs. deterministic testing.
Table 4-6: Testing vs. diagnosis.
Table 4-7: Manufacturing vs. on-line/field testing.
Table 4-8: Processor testing methodologies classification.
Table 5-1: Operations of the MIPS ALU.
Table 5-2: ATPG-based self-test routines test application times (case 1).
Table 5-3: ATPG-based self-test routines test application times (case 2).
Table 5-4: Characteristics of component self-test routines development.
Table 6-1: Parwan processor components.
Table 6-2: Self-test program statistics for Parwan.
Table 6-3: Fault simulation results for Parwan processor.
Table 6-4: Plasma processor components.
Table 6-5: Plasma processor synthesis for Design I.
Table 6-6: Plasma processor synthesis for Design II.
Table 6-7: Plasma processor synthesis for Design III.
Table 6-8: Fault simulation results for the Plasma processor Design I.
Table 6-9: Self-test routine statistics for Designs II and III of Plasma.
Table 6-10: Fault simulation results for Designs II and III of Plasma.
Table 6-11: Plasma processor synthesis for Design IV.
Table 6-12: Comparisons between Designs II and IV of Plasma.
Table 6-13: Meister/MIPS processor components.
Table 6-14: Meister/MIPS processor synthesis.
Table 6-15: Self-test routines statistics for Meister/MIPS processor.
Table 6-16: Fault simulation results for Meister/MIPS processor.
Table 6-17: Jam processor components.
Table 6-18: Jam processor synthesis.
Table 6-19: Self-test routine statistics for Jam processor.
Table 6-20: Fault simulation results for Jam processor.
Table 6-21: oc8051 processor components.
Table 6-22: oc8051 processor synthesis.
Table 6-23:
Table 6-24:
Table 6-25:
Table 6-26:
Table 6-27:
Table 6-28:
Table 6-29:
Table 6-30:
Table 6-31:
Table 6-32:
Table 6-33:
Table 6-34:
Table 6-35:
Preface
blocks of complex designs to test these blocks as well with such self-test programs. The already established System-on-Chip design paradigm that is based on pre-designed and pre-verified embedded cores employs one or more embedded processors of different architectures. Software-based self-testing is a very suitable methodology for manufacturing and in-field testing of embedded processors and surrounding blocks.

In this book, software-based self-testing is described as a practical, low-cost, easy-to-apply self-testing solution for processors and SoC designs. It relaxes the tight relation of manufacturing testing with high-performance, expensive IC test equipment and hence results in test cost reduction. If appropriately applied, software-based self-testing can reach very high test quality (high fault coverage) with reasonable test engineering effort, small test development cost and short test application time.

Also, this book sets a basis for comparisons among different software-based self-testing techniques. This is achieved by: describing the basic requirements of this test methodology; focusing on the basic parameters that have to be optimized; and applying it to a set of publicly available benchmark processors with different architectures and instruction sets.
Acknowledgments
The authors would like to acknowledge the support and encouragement of Dr. Vishwani D. Agrawal, the Frontiers in Electronic Testing book series consulting editor. Special thanks are also due to Carl Harris and Mark de Jongh of Kluwer Academic Publishers for the excellent collaboration in the production of this book.

The authors would like to acknowledge the help and support of several individuals at the University of Piraeus, the University of Athens and Virage Logic, and in particular the help of Nektarios Kranitis and George Xenoulis.
Chapter 1

Introduction

1.1 Book Motivation and Objectives
Electronic products are used today in the majority of our daily activities. They have enabled efficiency, productivity, enjoyment and safety.
The Integrated Circuits (ICs) realized today consist of multiple millions of logic gates and even more memory cells. They are implemented in very deep sub-micron (VDSM) process technologies and often consist of
multiple, pre-designed entities called Intellectual Property (IP) cores. This
design methodology that allowed the integration of embedded IP cores is
known as Embedded Core-Based System-on-Chip (SoC) design
methodology. SoC design flow supported by appropriate Computer Aided
Design (CAD) tools has dramatically improved design productivity and has
opened up new horizons for successful implementation of sophisticated
chips.
An important role in the architecture of complex SoCs is played by embedded processors. Embedded processors and other cores built around them constitute the basic functional elements of today's SoCs in embedded
systems. Embedded processors have optimized design (in terms of silicon
area, performance, power consumption, etc), and provide the means for the
integration of sophisticated, flexible, upgradeable and re-configurable
functionality of a complex SoC. In many cases, more than one embedded processor exists in a SoC, each of which takes over different tasks of the system and shares the processing workload.

Embedded Processor-Based Self-Test
D. Gizopoulos, A. Paschalis, Y. Zorian
Kluwer Academic Publishers, 2004
Issues such as the quality of the final SoC, the reliability of the manufactured ICs, and the reduced possibility of delivering malfunctioning chips to the end users are rapidly gaining importance today with the increasing criticality of most electronic systems applications.
In the context of these quality and reliability requirements, complex SoC designs realized in dense manufacturing technologies face serious problems that need special consideration. Manufacturing test of complex chips based on external Automatic Test Equipment (ATE), as a method to guarantee that the delivered chips are correctly operating, is becoming less feasible and more expensive than ever. The volume of test data that must be applied to each manufactured chip is becoming very large, the test application time is increasing, and the overall manufacturing test cost is becoming the dominant part of the total chip development cost.
Under these circumstances, which are expected to get worse as circuit sizes shrink and densities increase, the effective migration of the manufacturing test resources from outside the chip (ATE) to on-chip, built-in resources, and thus the effective replacement of externally applied testing with internally executed self-testing, is today the test technology of choice for all SoCs in practice. Self-testing allows at-speed testing, i.e. test execution at the actual operating speed of the chip. Thus, all physical faults that cause either timing misbehavior or an incorrect binary value can be detected. Also, self-testing drastically reduces test data storage requirements and test application time, both of which explode when external, ATE-based testing is used. Therefore, the extensive use of self-testing has a direct impact on the reduction of the overall chip test cost.
Testing of processors or microprocessors, even when they are not deeply embedded in a complex system, is known to be a challenging task in itself. Classical testing approaches used in other digital circuits are not adequate for carefully optimized processor designs, because they cannot reach the same efficiency as in other types of digital circuits. Also, self-test approaches successfully used to improve the testability of digital circuits are not very suitable for processor testing, because such techniques usually add overheads to the processor's performance, silicon area, pin count and power consumption. These overheads are often not acceptable for processors which have been specifically optimized to satisfy very strict area, speed and power consumption requirements.

This book primarily discusses the special problem of testing and self-testing of embedded processors in SoC architectures, as well as the problem of testing and self-testing other cores of the SoC using the embedded processor as test infrastructure.
1.2 Book Organization
Chapter 7 briefly discusses the extension of software-based self-testing to SoC architectures. An embedded processor can be used for the effective testing of other cores in the SoC. The details of the approach are discussed and a list of recent, related works from the literature is given.
Chapter 8 concludes the book, gives a quick summary of what
has been discussed in it and outlines the directions in the topic
that are expected to gain importance in the near future.
Chapter 2

Design of Processor-Based SoC

2.1 Integrated Circuits Technology
The half pitch of the first-level interconnect is a measure of the technology level, calculated as one half of the pitch, which is the sum of the width of the metal interconnect and the width of the space between two adjacent wires.
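As a quick worked illustration of this definition, the half pitch follows directly from the wire width and spacing; the dimensions below are hypothetical, not taken from the text:

```python
def half_pitch(wire_width_nm: float, spacing_nm: float) -> float:
    """Half pitch = half of the pitch, where the pitch is the metal
    interconnect width plus the space between two adjacent wires."""
    pitch = wire_width_nm + spacing_nm
    return pitch / 2.0

# Hypothetical example: 65 nm wide wires separated by 65 nm spaces
# give a 130 nm pitch, i.e. a 65 nm half pitch.
print(half_pitch(65.0, 65.0))  # 65.0
```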
2.2 Embedded Core-Based System-on-Chip Design
IP cores are released from IP core providers either as soft cores, firm cores or hard cores, depending on the level of changes that the SoC designer (also called the IP core user) can make to them, and the level of transparency they come with when delivered to the final SoC integrator [60]. A soft core consists of a synthesizable HDL description that can be synthesized into different semiconductor processes and design libraries. A firm core contains more structural information, usually a gate-level netlist that is ready for placement and routing. A hard core includes layout and technology-dependent timing information and is ready to be dropped into a system, but no changes are allowed to it [60].

Hard cores usually have a smaller cost, as they are final plug-and-play designs implemented in a specific technology library and no changes are allowed in them. At the opposite end, soft cores are available in HDL format and the designer can use them very flexibly, synthesizing the description using virtually any tool and any design library; thus the cost of soft cores is usually much higher than that of hard cores. A description level in between the hard and soft cores, both in terms of cost and design flexibility, is the firm core case, where the final SoC designer is supplied with a gate-level netlist of a design which can be altered in terms of technology library and placement/routing, but not in such a flexible way as a soft core. Table 2-1 summarizes the characteristics of these three description levels of IP cores used in SoC designs.
Table 2-1: Soft, firm and hard IP cores.

  Core category   Changes   Cost     Description
  soft core       Many      High     HDL
  firm core       Some      Medium   Netlist
  hard core       No        Low      Layout
[Figure 2-2: Core types of a System-on-Chip: memory cores, processor cores, DSP cores, user defined logic, peripheral cores, and analog and mixed-signal cores, connected through memory, DSP, peripheral, analog/mixed-signal and test interfaces and pin/port mapping.]
With the adoption of the SoC design paradigm, embedded core-based ICs can be designed in a more productive way than ever, and first-time-correct
design is much more likely. Electronic systems designed this way have much
shorter time-to-market than before and better chances for market success.
We should never forget the importance of time-to-market reduction in today's highly competitive electronic systems market. A product is successful if it is released to its potential users when they really need it, under the
condition, of course, that it operates acceptably as expected. A "perfect"
system on which hundreds of person-months have been invested may
potentially fail to obtain a significant market share if it is not released on
time to the target users. Therefore, successful practices at all stages of an
electronic system design flow (and a SoC design flow in particular) that
accomplish their mission in a quick and effective manner are always looked
for. The methodology discussed in this book aims to improve one of the
stages ofthe design flow.
We continue our discussion on the SoC design paradigm, emphasizing the key role of embedded processors in it.

2.3 Embedded Processors in SoC Architectures
The first task offers high flexibility to the SoC designers because they can use the processor's inherent programmability to efficiently update, improve and revise the system's functionality just by adding or modifying existing embedded software (code and data) stored in embedded memory cores. Actually, in many situations, an updated or new product version is only a new or revised embedded software module which runs on the embedded processor of the SoC and offers new functionality to the end user of the system.
The second task that embedded processors are assigned to offers excellent accessibility and communication from the processor to all internal cores of the SoC; therefore, the processor can be used for several reliability-related functions of the system, the most important of them being manufacturing testing and field testing. This strong connection that an embedded processor has with all other cores of the SoC makes it an excellent existing infrastructure for accessing all SoC internal nodes, controlling their logic states and observing them at the SoC boundaries. As we will see, embedded processors can be used as an effective vehicle for low-cost self-testing of their internal components as well as of other cores of the SoC.
In the majority of modern SoC architectures, more than one embedded processor exists; the most common situation is to have two embedded processors in a SoC. For example, an embedded microcontroller (µC), an embedded RISC (Reduced Instruction Set Computer) processor or another processor can be used for the main processing parts of the system, while a Digital Signal Processor (DSP) can take over the part of the system's functionality which is related to heavier data processing for specialized signal processing algorithms (see Figure 2-1 and Figure 2-2). In architectures where the SoC communicates with different external data channels, a separate embedded processor associated with its dedicated memory subsystem may deal with each of the communication channels, while another processor can be used to coordinate the flow of data in the entire SoC.

The extensive use of embedded processors in SoC architectures of different complexities and application domains has given new life to classic processor architectures, with word lengths as small as 8 bits, which were widely used in the past. Successful architectures of microcontrollers, microprocessors and DSPs were used for many years, in a big variety of applications, as individually packaged devices (commercial-off-the-shelf components - COTS). These classical processors are now used as embedded processor cores in complex SoC architectures and can actually
Table 2-2: Embedded processor cores (1 of 3).

DW6811 (Synopsys, http://www.synopsys.com): 8-bit microcontroller. 6811 compatible. Synopsys DesignWare core. Synthesizable soft core. 15,000 to 30,000 gates. 200 MHz frequency in 0.13 µm ASIC process.

SAM80 (Samsung Electronics, http://www.samsung.com): 8-bit microprocessor. Zilog Z180 compatible. Hard core. 0.6 µm, 0.5 µm ASIC processes.

SM8A02/SM8A03 (Samsung Electronics, http://www.samsung.com): 8-bit microcontroller. 80C51/80C52 subset compatible. Hard core. 0.8 µm ASIC processes.

eZ80 (Zilog, http://www.zilog.com): 8-bit microprocessors family. Enhanced superset of the Z80 family. 50 MHz frequency.

KL5C80A12 (Kawasaki LSI, http://www.klsi.com): 8-bit high speed microcontroller. Z80 compatible. 10 MHz frequency.

R8051 (Altera and CAST Inc., http://www.altera.com, http://www.cast-inc.com): 8-bit RISC microcontroller. Executes all ASM51 instructions. Instruction set of 80C31 embedded controller. Synthesizable soft core. 2,000 to 2,500 Altera family FPGA logic cells. 30 to 60 MHz frequency.

C8051 (Evatronix, http://www.evatronix.pl): 8-bit microcontroller. Executes all ASM51 instructions. Instruction set of 80C31 embedded controller. Synthesizable soft core. Less than 10K gates depending on technology. 80 MHz in 0.5 µm ASIC process. 160 MHz in 0.25 µm ASIC process.

DF6811CPU (Altera and Digital Core Design, http://www.altera.com, http://www.dcd.com.pl): 8-bit microcontroller CPU. Compatible with 68HC11 microcontroller. Synthesizable soft core. 2,000 to 2,300 Altera family FPGA logic cells. 40 to 73 MHz frequency.

MIPS32 M4K™ (MIPS, http://www.mips.com): 32-bit RISC microprocessor core of MIPS32™ architecture. Synthesizable soft core. 300 MHz typical frequency in 0.13 µm process. 0.3 to 1.0 mm² core size.
Table 2-3: Embedded processor cores (2 of 3).

LSI Logic core (http://www.lsilogic.com): 32-bit RISC CPU core of MIPS32™ architecture. Hard core. 167 MHz in 0.18 µm ASIC process. 200 MHz in 0.11 µm ASIC process.

ARM7TDMI (ARM, http://www.arm.com): 32-bit RISC CPU core of ARM v4T architecture. Hard core. 133 MHz frequency in 0.13 µm ASIC process. 0.26 mm² core size.

ARM7TDMI-S (ARM, http://www.arm.com): 32-bit RISC CPU core of ARM v4T architecture. Synthesizable soft core. 100 to 133 MHz frequency in 0.13 µm ASIC process. 0.32 mm² core size.

MIPS4KE (Synopsys and MIPS, http://www.synopsys.com): 32-bit RISC CPU core of MIPS32™ architecture. DesignWare Star IP Core. Synthesizable soft core. 240-300 MHz frequency. 0.4-1.9 mm² core size.

TC1MP-S (Synopsys and Infineon, http://www.synopsys.com, http://www.infineon.com): 32-bit unified microcontroller-DSP processor core. DesignWare Star IP Core. Synthesizable soft core. 166 MHz frequency in 0.18 µm ASIC process. 200 MHz frequency in 0.13 µm ASIC process.

PowerPC 440 (IBM, http://www.ibm.com): 32-bit superscalar RISC processor core. Hard core. 550 MHz / 1000 MIPS in 0.15 µm ASIC process. 4.0 mm² core size.

AT90S2313 (Atmel, http://www.atmel.com): 8-bit AVR-based RISC microcontroller. Includes 2 Kbyte flash memory. 10 MHz frequency.

AT90S1200 (Atmel, http://www.atmel.com): 8-bit AVR-based RISC microcontroller. Includes 1 Kbyte flash memory. 12 MHz frequency.

C68000 (Evatronix, http://www.evatronix.pl): 16/32-bit microprocessor. Motorola MC68000 compatible. Synthesizable soft core (VHDL and Verilog).

C32025 (Evatronix, http://www.evatronix.pl): 16-bit fixed point Digital Signal Processor. TMS320C25 compatible. Synthesizable soft core (VHDL and Verilog).

Xtensa (Tensilica, http://www.tensilica.com): 32-bit RISC configurable processor. Up to 300 MHz on 0.13 µm.

SHARC (Analog Devices, http://www.analog.com).
The information presented in Table 2-2, Table 2-3 and Table 2-4 has been retrieved from the companies' publicly available documentation. We have tried to cover a wide range of representative types of embedded processors, but of course not all available embedded processors today could be listed. The intention of the above list of embedded processors is to demonstrate that classic, cost-effective processors and modern, high-performance processors are equally present today in the embedded processors market.
When the performance that a classical, small 8-bit or 16-bit processor architecture gives to the system is not able to satisfy the particular performance requirements of a specific application, other solutions are always available, such as the high-end RISC embedded processors or DSPs which can be incorporated in the design (several such processors are listed in the Tables of the previous pages). The high performance of modern processor architectures, enriched with deep, multi-stage pipeline structures and complex performance-enhancing circuits, is able to meet any demanding application's needs (communication systems, industrial control systems, medical applications, transportation and others).
As a joint result of the recent advances in very deep submicron manufacturing technologies and design methodologies (EDA tools, HDLs and the SoC design methodology), today's complex processor-based SoC devices offer complex functionality and high performance that is able to meet the needs of the demanding users of modern technology.
Unfortunately, the sophisticated functionality and the high performance of electronic systems are not offered at zero expense. Complex modern systems based on embedded processors and realized as SoC architectures have many challenges to face and major hurdles to overcome. Many of these challenges are related to the design phases of the system, and others are related to the tasks of design verification and manufacturing testing.

These tasks have always been difficult and time-consuming, even when electronic circuits' size and complexity were much smaller than today. They are getting much more difficult in today's multi-core SoC designs, but they are also of increasing importance for the system's quality. An increasing percentage of the total system development cost is dedicated to these tasks of design verification and manufacturing testing. As a result, cost-reduction techniques for circuit testing during manufacturing and in the field of operation are gaining importance today and attract the attention of researchers. As we see in this book, the existence of (one or more) embedded processors in an SoC, although adding to the chip's complexity, also provides a
Chapter 3

Testing of Processor-Based SoC

3.1 Testing and Design for Testability
Operational faults in deep submicron technologies are classified into the following categories. Permanent faults are infinitely active at the same location and reflect irreversible physical changes. Intermittent faults appear repeatedly at the same location and cause errors in bursts only when they are active. These faults are induced by unstable or marginal hardware due to process variations and manufacturing residuals, and are activated by environmental changes. In many cases, intermittent faults precede the occurrence of permanent faults. Transient faults appear irregularly at various locations and last for a short time. These faults are induced by neutron and alpha particles, power supply and interconnect noise, electromagnetic interference and electrostatic discharge.
Fault coverage obtained by a set of test patterns is the percentage of the total faults of the chip that the test set can detect. Faults belong to a fault model, which is an abstraction of the physical defect mechanisms.
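The definition above amounts to a simple ratio; a minimal sketch, with hypothetical fault counts:

```python
def fault_coverage(detected_faults: int, total_faults: int) -> float:
    """Fault coverage: the percentage of the modeled faults of the
    chip that the applied test set detects."""
    if total_faults <= 0:
        raise ValueError("the fault list must be non-empty")
    return 100.0 * detected_faults / total_faults

# Hypothetical example: a test set detects 9,215 of the 10,000
# single stuck-at faults modeled for a design.
print(round(fault_coverage(9215, 10000), 2))  # 92.15
```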
with a smaller test set) may lead to relatively poor fault coverage. If a test set
obtains small fault coverage, this means that only a small percentage of the
faults that may exist in the circuit will be detected by the test set. The
remaining faults of the fault model may exist in the circuit but they will not
be detected by the applied test set. Therefore, the insufficiently tested device has a higher probability of malfunctioning when placed in the target system than a device which has been tested to higher fault coverage levels.
A discussion on the details of test generation and test application is given in the following paragraphs. Both tasks are becoming extremely difficult as the complexity of ICs, and in particular processor-based SoCs, increases.

Test generation for complex ICs cannot be easily handled even by the most advanced commercial combinational and sequential circuit Automatic Test Pattern Generators (ATPG). ATPG tools can be used, of course, only when a gate-level netlist of the circuit is available. The traditional fault models used in ATPG flows are the single stuck-at fault model, the transition fault model and the path delay fault model. The number of gates and memory elements (flip-flops, latches) to be handled by the ATPG is getting extremely high, and in some cases only relatively low fault coverage of the selected fault model can be obtained after many hours and many backtracks of the ATPG algorithms. As circuit sizes increase, this inefficiency of ATPG tools is becoming worse.
The difficulties that an ATPG faces in test generation have their sources in the reduced observability and controllability of the internal nodes in complex architectures. In earlier years, when today's embedded IP cores were used as packaged components in a System-on-Board design (packaged chips mounted on a Printed Circuit Board - PCB), the chips' inputs and outputs were easily accessible, and testing was significantly simpler and easier because of this high accessibility. The transition to the SoC design paradigm, and to the miniaturized systems it delivers, has given many advantages like high performance, low power consumption, small size, small weight, etc., but on the other side it has imposed serious accessibility problems for the embedded cores and, therefore, serious testability problems for the SoC. Deeply embedded functional or storage cores in an SoC need special mechanisms for the delivery of the test patterns from the SoC inputs to the core inputs and the propagation of their test responses from the core outputs to the SoC boundaries for external observation and evaluation.
It is absolutely necessary that a complex SoC architecture includes special DfT structures to improve the accessibility of its internal nodes and thus improve its testability. The inclusion of DfT structures in a chip makes the test generation process for it much easier and more effective, and the required level of test quality can be obtained. We discuss alternative DfT
structured DfT techniques are very successful in reducing the test generation costs and efforts because of the existence of EDA tools for the automatic insertion of scan structures. Manual effort is very limited and high fault coverage is usually obtained. Furthermore, other structured DfT techniques, like test point insertion (control and observation points), are widely used in conjunction with scan design to further increase circuit nodes' accessibility and ease the test generation difficulties.
The major concerns and limitations of all scan-based testing approaches,
and in general of all structured or ad-hoc DfT approaches, that make them
not directly applicable to any design without serious considerations and
planning, are the following. As we see subsequently, processors are types of
circuits where structured DfT techniques can't be applied in a
straightforward manner.
Hardware overhead.
DfT modifications in a circuit (multiplexers for test point
insertion, multiplexers for the modification of normal flip-flops
to scan flip-flops, additional primary inputs and/or outputs, etc.)
always lead to a substantial silicon area increase. In some cases
this overhead is not acceptable, for example when the larger
circuit size leads to a package change. Thus, DfT modifications
may directly increase the chip development costs.
Performance degradation.
Scan-based design and other DfT techniques make changes in the
critical paths of a design. In all cases, at least some multiplexing
stages are inserted in the critical paths. These additional delays
may not be a problem in low-speed circuits, where a moderate
increase in the delay of the critical paths, counterbalanced by
better testability of the chip, is not a serious concern. But in the
case of high-speed processors or high-performance processor-based
SoCs, such performance degradations, even at the level of 1% or
2% compared to the non-DfT design, may not be acceptable.
Processor designs, carefully optimized to deliver high
performance, of course belong to this class of circuits which are
particularly sensitive to performance impact due to DfT
modifications.
Power consumption increase.
The increase of silicon area due to DfT modifications is also
related to an increase in power consumption, a critical factor in
many low-cost, power-sensitive designs. Scan-based DfT
techniques are characterized by large power consumption
because of the high circuit activity when test patterns are scanned
in the chain and test responses are scanned out of it. Circuit
activity during the application of scan tests may be much higher
than the circuit activity during normal operation and may lead to
peak power consumption not foreseen at the design stage. This
can happen because scan tests apply non-functional input
patterns to the circuit, patterns that do not appear when the
circuit operates in normal mode. Therefore, excessive power
consumption during scan-based testing may seriously impact the
manufacturing testing of an IC, as its package limits may be
reached because of excessive heat dissipation.
Test data size (patterns and responses) and duration of test.
Scan-based testing among the structured DfT techniques is
characterized by a large amount of test data: test patterns to be
inserted in the scan chain and applied to the circuit, and test
responses captured at the circuit/module outputs and then
exported and evaluated externally. The total test application time
(in clock cycles) in scan-based testing is many times larger than
the actual number of test patterns, because of the large number of
cycles required for the scan-in of a new test pattern and the
scan-out of a captured test response. The test application time
related to the scan-in and scan-out phases grows larger as the
scan chains get longer.
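To make the scan-in/scan-out overhead concrete, the following back-of-the-envelope sketch (in Python, with made-up pattern and chain-length figures) computes total test cycles; it assumes a single scan chain and the usual arrangement where the scan-out of one response overlaps the scan-in of the next pattern:

```python
def scan_test_cycles(num_patterns: int, chain_length: int) -> int:
    """Total clock cycles for full-scan testing with a single scan chain.

    Each pattern needs `chain_length` shift-in cycles plus one capture
    cycle; scan-out of a response overlaps with the scan-in of the next
    pattern, so only the last response costs extra shift-out cycles.
    """
    return num_patterns * (chain_length + 1) + chain_length

# Hypothetical figures: 10,000 patterns on a 2,000 flip-flop chain.
# The result is dominated by shifting, not by the pattern count itself.
cycles = scan_test_cycles(10_000, 2_000)
print(cycles)
```

With these illustrative numbers the shifting inflates roughly 10,000 "useful" capture cycles into about 20 million total cycles, which is exactly the effect described above.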
As long as ATPG-based test generation produces high quality tests with
acceptable hardware and performance overheads, the primary concern is the
test application time, i.e. the portion of the manufacturing phases that each
IC spends being tested. Therefore, it is important to obtain a small test set
even at the expense of a large test generation time, because a small test set
will lead to a smaller test application time for each device.
Test application time per designed chip is a critical factor which has a
certain impact on the production cycle duration, the time-to-market and
therefore, to some extent, the chip's market success. Scan-based tests and
other DfT-supported test flows may lead to very large test sets that, although
they reach high fault coverage and test quality levels, consist of enormous
amounts of test data for test pattern application and test response evaluation.
This problem has already become very severe in complex SoC architectures
where scan-based testing consists of: (a) core-level tests that are applied to
the core itself in a scan fashion (each core may include many different scan
chains); and (b) SoC-level tests that are used to isolate a core (again by scan
techniques) and initiate the testing of the core or test the interconnections
between the cores. The sizes of scan-in and scan-out data for such complex
architectures have pushed the limits of modern Automatic Test Equipment
(ATE, Figure 3-1) traditionally used for external manufacturing testing of
ICs, because the memory capacity of the ATE is usually not enough to store
such huge amounts of test data.
[Figure 3-1: Automatic Test Equipment (ATE, external tester) holding test patterns and test responses in the ATE memory and applying them to the IC under test; f_ATE = ATE frequency, f_IC = IC frequency.]
Each test pattern stored in the tester memory is applied to the IC, its
response is captured, and finally compared with the known, correct response.
Subsequently, the next test pattern is applied to the IC, and the process is
repeated until all patterns of the test set stored in the tester memory have
been applied to the chip.
The idea of ATE-based chip testing is outlined in Figure 3-1. The tester
operates at a frequency denoted as f_ATE and the chip has an operating
frequency (when used in the field mounted on the final system) denoted as
f_IC. This means that the tester is able to apply a new test pattern at a
maximum rate of f_ATE, while the IC is able to produce correct responses
when receiving new inputs at a maximum rate of f_IC. The relation between
these two frequencies is a critical factor that determines both the quality of
the testing process with external testers and also the test application time
and subsequently the test cost. The relation between these two frequencies is
taken into serious consideration in all cases, independently of the quality and
cost of the utilized ATE (high-speed, high-cost ATE or low-speed, low-cost
ATE). We will study this relation further in this book when the use of
low-cost ATE in the context of software-based self-testing is analyzed.
The essence of the relation between the tester and the chip frequencies is
that if we want to perform high quality testing of a chip and detect all (or
most) physical failure mechanisms of modern manufacturing technologies,
we must use a tester with a frequency f_ATE which is close or equal to the
actual chip frequency f_IC. This means, in turn, that for a high frequency
chip a very expensive, high-frequency tester must be used, and this fact will
increase the overall test and development cost of the IC. A conflict between
test quality and test cost is apparent.
Another cost-related consideration for external testing is the size of the
tester's physical memory where test patterns and test responses are stored. If
the tester memory is not large enough to store all the patterns of the test set
and the corresponding test responses, it is necessary to perform multiple
loadings of the memory, so that the entire set of test vectors is eventually
applied to each manufactured chip. A larger test set requires a larger tester
memory (a more expensive tester) while multiple loadings of the tester
memory mean higher testing costs.
When a multi-million transistor chip is planned to be tested during
manufacturing with the use of an external tester, the size of the tester
memory should be large enough to avoid many loadings of new chunks of
test patterns into the tester memory and multiple unloadings of test
responses from it. In most cases this is really infeasible: a modern complex
SoC architecture can be sufficiently tested (to obtain sufficient fault
coverage and test quality) only after several loadings of new test patterns to
the tester memory. These multiple loadings lead to a significant amount of
tester time being devoted to each of the devices under test.
This is simply because, usually, the chips of a technology generation are used in the testers
of the same generation, but these testers are used to test the chips of the next generation.
The rejection of chips that have non-functional faults (like
non-functionally sensitizable path delay faults) leads to further
yield loss in addition to yield losses due to tester inaccuracy.
External testing of chips relying on ATE technology is the traditional
approach followed by most high-volume chip manufacturers. Lower
production volumes do not justify very high testing costs, and the extra
problems analyzed above paved the way to the self-testing (or built-in
self-testing - BIST) technology which is now well-respected and widely
applied in modern electronic devices, as it overcomes several of the
bottlenecks of external, ATE-based testing. Development of effective
self-testing methodologies has always been a challenging task, but it is now
much more challenging than in the past because of the complexity of the
electronic designs to which it is expected to be applied successfully.
In the following two subsections we focus on self-testing in both its
flavors: classical hardware-based self-testing and emerging software-based
(processor-based) self-testing. Software-based self-testing, which is the
focus of this book, is analyzed in detail in subsequent Chapters.
3.2 Hardware-Based Self-Testing
The increasingly high logic-to-pin (gate-to-pin or transistor-to-pin)
ratio, which severely affects the ability to control and observe
the logic values of internal circuit nodes.
The operating frequencies of chips which are increasing very
quickly. The gigahertz (GHz) frequency domain has been
reached and devices like microprocessors with multi-GHz
operating frequencies are already a common practice.
The increasingly long test pattern generation and test application
times (due to increased difficulty for test generation and the
excessively large test sets).
The extremely large amount of test data to be stored in ATE
memory.
The difficulty to perform at-speed testing with external ATE. A
large population of physical defects that can only be detected at
the actual operating speed of the circuit escape detection.
The unavailability of gate-level netlists and the unfamiliarity of
designers with gate-level details, which both make the insertion
of testability structures difficult. Especially in the SoC design era,
with the extensive use of hard, black-box cores, gate-level details
are not easy to obtain.
The lack of skilled test engineers with a comprehensive
understanding of testing requirements and testing techniques.
[Figure: a module under test inside the IC under test; f_IC = IC frequency.]
In a self-testing strategy, the test patterns (as well as the expected test
responses) are either stored in a special storage area on the chip (RAM,
ROM) and applied during the self-testing session (we call this approach
stored-patterns self-testing), or, alternatively, they are generated by special
hardware that takes over this task (we call this approach on-chip generated
patterns self-testing). Furthermore, the actual test responses are either stored
in a special storage area on the chip or compressed/combined together to
reduce the memory requirements. The latter case uses special circuits for
test response compaction and produces one or a few self-test signatures. In
either case (compacted or not compacted test responses), the analysis that
must eventually take place to decide if the chip is fault-free or faulty can be
done inside the chip or outside of it. In the extreme case where the
comparison with the expected correct response is done internally, a
single-bit error signal comes out of the chip to denote its correct or faulty
operation. The opposite extreme is the case where all test responses are
extracted out of the chip for external evaluation (no compaction). The
middle case (most usual in practice) is the one where a few self-test
signatures (sets of compacted test responses) are collected on-chip and, at
the end of self-test execution, are externally evaluated.
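The response-compaction idea behind self-test signatures can be illustrated in software. The sketch below mimics a multiple-input signature register (MISR): an LFSR is shifted one step per response word and the word is XORed into the state. The register width and tap positions are illustrative choices, not taken from any particular design:

```python
def misr_signature(responses, width: int = 16, taps=(0, 2, 3, 5)) -> int:
    """Compact a stream of test-response words into one signature,
    MISR-style: shift the LFSR once per word, XOR the word in."""
    state = 0
    mask = (1 << width) - 1
    for word in responses:
        # Linear feedback from the illustrative tap positions.
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        state ^= word & mask   # fold the new response word into the state
    return state

# A fault-free run yields the "golden" signature; any corrupted
# response stream is very likely to yield a different one.
golden = misr_signature([0x1234, 0xABCD, 0x00FF])
print(hex(golden))
```

As in hardware compaction, aliasing is possible in principle (a faulty stream may collide with the golden signature), but with a reasonable register width the probability is small; that is the trade-off the middle case above accepts.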
The advantages of self-testing strategies when compared to external,
ATE-based testing are summarized below.
The costs related to the purchase, use and maintenance of
high-end ATE are almost eliminated when self-testing is used.
There is no need to store test patterns and test responses in the
tester memory; both tasks of test application and response
capturing can be performed inside the chip by on-chip resources.
Self-testing mechanisms have much better access to internal
circuit nodes than external test mechanisms and can more likely
lead to test strategies with higher fault detection capabilities.
Self-testing usually obtains higher fault coverage than external
testing.
Not only are multiple loading sessions avoided, but all test patterns are also
applied at the chip's actual frequency (usually higher than the tester
frequency), and a higher test quality is achieved. In external, ATE-based
testing, performance-related faults (delay/transition faults) may remain
undetected because of the frequency difference between the chip and the
tester, and a serious portion of yield may be lost because of the tester's
limited measurement accuracy. An implicit assumption made for the
validity of the above statement, which compares self-testing with external
testing, is that the same test access mechanisms (like scan chains, test point
insertion and other DfT means) are used in both cases. Only under this
assumption is a comparison between external testing and self-testing fair.
Self-testing methodologies are in some situations the only feasible testing
strategy. These are cases where access to expensive ATE is not possible at
all, or where the test costs associated with the use of ATE are out of the
question for the budget of a specific design. DfT modifications to enable
self-testing may be more reasonable for the circuit designers compared with
the excessive external testing costs. Many such cases exist today, for
example in low-cost embedded applications where a reasonably good test
methodology is needed to reach a relatively high level of test quality but, on
the other side, no sophisticated solutions or expensive ATE can be used
because of budget limitations. Self-testing needs only an appropriate test
pattern generation flow and the design of on-chip test application and test
response collection infrastructure. Hardware and performance overheads due
to the employment of these mechanisms can be tailored to the specific cost
limitations of a design.
Hardware-based self-testing, although a proven and successful testing
technology for different types of small and medium sized digital circuits, is
not claimed to be a panacea, as a testing strategy, for all types of
architectures. A self-testing strategy should be planned and applied with
careful guidance from the performance, cost and quality requirements of
any given application. More specifically, the concerns that a test engineer
should keep in mind when hardware-based self-testing is the intended
testing methodology are the following.
Hardware overhead.
This is the amount or percentage of hardware overhead devoted
to hardware-based self-testing which is acceptable for the
particular design. Self-testing techniques based on scan
architectures have been widely applied to several designs. In
addition to the hardware overhead that the scan design (either full
or partial) adds to the circuit (regular storage elements modified
into scan storage elements), or the extra multiplexing for internal
[Figure: stored-patterns self-testing: a self-test memory holding test patterns and test responses for the module under test inside the IC under test.]
[Figure: an on-chip test generator and an on-chip response analyser attached to the module under test inside the IC under test.]
Figure 3-4: Self-testing with dedicated hardware.
If a circuit that operates on n-bit operand(s) can be tested with a test set of constant size
(number of test patterns) which is independent of n, we call the circuit C-testable and the
test set a C-test set.
3.3 Software-Based Self-Testing
The self-test program applies test patterns to the component under test6,
collects the component responses and finally stores them either in an
unrolled fashion (each response is stored in a separate data memory word)
or in a compacted form (one or more test signatures). In a multi-processor
SoC design, each of the embedded processors can test itself by software
routines, and they can then apply software-based self-testing to the
remaining cores of the SoC.
The concept of software-based self-testing for the processor itself is
illustrated in Figure 3-5. Self-test routines are stored in instruction memory
and the data they need for execution are stored in data memory. Both
transfers (instructions and data) are performed using external test equipment
which can be as simple as a personal computer and as complex as a high-end
tester. Tests are applied to components of the processor core (CPU core)
during the execution of the self-test programs and test responses are stored
back in the data memory.
[Figure 3-5 shows the CPU core connected through the CPU bus to the Data Memory (holding self-test data and response(s)), the Instruction Memory (holding the self-test code) and the External Test Equipment.]
Figure 3-5: Software-based self-testing concept for processor testing.
It is implied by Figure 3-5 that the external test equipment has access to
the processor bus for the transfer of self-test programs and data into the
processor memory. This is not necessary in all cases. In general, there should
be a mechanism that is able to download the self-test program and data into
the processor memory for the execution of software-based self-testing to
detect the faults of the processor.
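The download-execute-upload flow just described can be modeled with a toy sketch. Everything here is illustrative: the "component under test" is reduced to a 32-bit adder and the data memory to a Python dictionary, but the shape of the routine (apply stored operands, capture responses into data memory for later upload) mirrors the flow of Figure 3-5:

```python
def run_self_test(operand_pairs, data_memory):
    """Toy self-test routine: feed each downloaded operand pair to the
    component under test and store the response in data memory, in the
    unrolled fashion (one word per response)."""
    for i, (a, b) in enumerate(operand_pairs):
        # Component under test, modeled as a 32-bit adder.
        response = (a + b) & 0xFFFF_FFFF
        data_memory[i] = response          # store response for later upload
    return data_memory

# The external equipment downloads the operands, the routine runs on
# the processor, and the responses are later uploaded for evaluation
# against golden values.
data_mem = run_self_test([(1, 2), (0xFFFF_FFFF, 1), (7, 8)], {})
print(data_mem)
```

In a real deployment the routine would be native processor code exercising instructions and addressing modes, and the responses could equally be compacted into a signature instead of stored unrolled.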
As a step further from software-based self-testing of a processor, the
concept of software-based self-testing for the entire SoC design is depicted
in Figure 3-6. The embedded processor core, supported with appropriate
self-test routines, can apply tests to the other cores of the SoC.
The component under test may be either an internal component of the processor, the entire
processor itself or a core of the SoC other than the processor.
[Figure 3-6 shows the CPU subsystem (CPU core, Data Memory, Instruction Memory with the self-test code) applying tests to the other cores of the SoC and capturing their responses.]
Software-based self-testing is an alternative methodology to
hardware-based self-testing that has the characteristics discussed here. At
this point, we outline the overall idea of software-based self-testing, which
clearly reveals its low-cost aspects. Appropriately scaled to different
processor sizes and architectures, software-based self-testing is a generic
SoC self-testing methodology.
As we see in the next Chapter, a number of hardware-based and software-based self-testing approaches have been proposed in the past, along with
Minor flexibility may apply to hardware-based self-testing, such as the loading of different
seeds in pseudorandom pattern generators (LFSRs).
3.4 Software-Based Self-Test and Test Resource Partitioning
Software-based self-testing breaks the close relation of testing with
high-cost ATE and reduces the external test equipment requirements to
low-cost testers, needed only for downloading test programs to the on-chip
memory and uploading the final test responses from the on-chip memory to
the outside (the tester) for external evaluation.
In terms of special built-in hardware dedicated to test, software-based
self-testing is a non-intrusive test method that does not need extra circuits
just for testing purposes. On the contrary, it relies only on existing processor
resources (its instructions, addressing modes, functional and control units)
and re-uses them for testing the chip (either during manufacturing or during
on-line periodic testing in the field). Therefore, compared to hardware-based
self-testing that needs additional circuitry, software-based self-testing is
definitely a better TRP methodology.
When test application time is the test resource to be optimized,
software-based self-testing makes a significant contribution to this
optimization too. Self-testing performed by embedded software routines is a
fast test application methodology for two simple reasons:
(a)
(b)
Test time also depends on the nature and the total number of the applied
test patterns: in pseudorandom testing the number of test patterns is much
larger than in deterministic testing. Software-based self-testing that applies
deterministic test patterns executes in a shorter time than
pseudorandom-based self-testing.
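A quick sketch of the pseudorandom side may help: an LFSR-style generator in Python (the 8-bit width and the tap positions are illustrative, not a recommended polynomial). Reloading a different seed is essentially the only flexibility such a generator offers, whereas a deterministic test set is chosen pattern by pattern and is typically far smaller for the same fault coverage:

```python
def lfsr_patterns(seed: int, count: int, width: int = 8,
                  taps=(0, 2, 3, 4)):
    """Generate `count` pseudorandom test patterns from an LFSR with
    the given seed; taps and width are illustrative choices."""
    state, mask, out = seed, (1 << width) - 1, []
    for _ in range(count):
        out.append(state)
        # Linear feedback: XOR of the tapped state bits.
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & mask
    return out

print(lfsr_patterns(seed=0x1, count=5))
```

The generator is cheap and needs no gate-level knowledge of the circuit, but reaching high fault coverage usually requires applying far more of these patterns than a deterministic set would need, which is exactly the test-time trade-off stated above.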
When power consumption is the test resource under optimization,
software-based self-testing never excites the processor or the SoC in any
mode other than the normal operating mode for which it is designed and
analyzed. Other testing techniques, like external testing using scan design,
or hardware-based self-testing with scan design or test point insertion, test
the circuit during a special mode of operation (test mode), which is
completely different from normal mode. During test mode, circuit activity is
much higher than in normal mode and therefore more power is consumed. It
has been observed that circuit activity (and thus power consumption) during
testing and self-testing can be up to three times higher than during normal
mode [118], [176]. Apart from this, excessive power consumption during
testing leads to energy problems in battery-operated products and may also
stress the limits of device packages when peak power exceeds specific
thresholds.
Finally, when pin count is the test resource considered for optimization,
software-based self-testing again provides an excellent approach. Embedded
software test routines and data that just need to be downloaded from a
low-cost external tester require only a very small number of test-related chip
pins. An extreme case is the use of the existing JTAG boundary scan
interface for this purpose. This serial downloading of embedded self-test
routines may require more time than a parallel download, but if the routines
are sufficiently small this is not a serious problem. Self-test routine size is a
metric that we will extensively consider when discussing software-based
self-testing in this book.
3.5 Why is Embedded Processor Testing Important?
Embedded processors have played a key role in the development of
digital circuits for a long time and remain the central elements in all kinds of
applications. Processors are today even more important because of their
increasing usage in all developed embedded systems. The answer to the
question of this subsection (why processor testing is important) seems easy,
but all the details must be pointed out.
The embedded processors in SoC architectures are the circuits that will
execute the critical algorithms of the system and will co-ordinate the
communication between all the other components/cores. The embedded
processors are also expected to execute the self-test routines during
manufacturing testing and in the field (on-line periodic testing), as well as
other functions like system debug and diagnosis.
As a consequence, the criticality and importance of processor testing is
equivalent to the criticality and importance of the processor's own existence
in a system or a SoC. When a fault appears in an embedded processor, for
example in one of its registers, then all programs that use this specific
register (maybe all programs to be executed) will malfunction and give
incorrect results. Although the fault exists only inside the processor
(actually in a very small part of it), the entire system is very likely to be
completely useless in the field, because the system functionality expected to
be executed on the processor will give erroneous output.
Other system components or SoC cores are not as critical as the processor
in terms of the correct functionality of the system. For example, if a memory
word contains a fault, only writes and reads to this specific location will be
erroneous, and this will lead to just a few programs (if any program uses this
memory location at a specific point in time) malfunctioning. This may of
course not be a perfectly performing system, but the implication of a fault in
the memory module does not have results as catastrophic for the system as a
fault in the embedded processor. The same reasoning holds for other cores
like peripheral device controllers. If a fault exists in some peripheral device
controller, then the system may have trouble accessing the specific device,
but it will be otherwise fully usable.
The task of embedded processor testing is very important because, if the
processor is not free of manufacturing faults, it can't be used as a vehicle for
the efficient software-based self-testing of the surrounding modules, and as a
result the entire process of software-based self-testing will not be applicable
to the particular system.
The importance of comprehensive and high quality processor testing is
not at all related to the size of an embedded processor used in a SoC
architecture or to its performance characteristics. A small 8-bit or 16-bit
microcontroller is equally important for a successful software-based
self-testing strategy as a high-end 32-bit RISC processor with an advanced
pipeline structure and other built-in performance-improving mechanisms.
Both types of processors must be comprehensively tested and their correct
operation must be guaranteed before they are used for self-testing the
remaining parts of the system.
Of course, the equal importance of all types and architectures of
embedded processors for the purposes of software-based self-testing does
not mean that all embedded processor architectures are tested with the same
difficulty.
3.6 Why is Embedded Processor Testing Challenging?
We note that processor faults that are not functionally detectable, i.e.
that can't be detected when the circuit operates in normal mode, will not be
detected by software-based self-testing. This is natural, since software-based
self-testing only applies normal processor instructions for fault detection.
To give an idea of the criticality and difficulty of processor testing in
software-based self-testing, it is useful to outline a small but informative
example. Consider a SoC consisting of an embedded processor that occupies
only 20% of the total silicon area, several memory modules occupying 70%
of the total area (it is a usual situation to have large embedded memories)
and other components for the remaining 10% of the SoC area.
First, it is much more important to develop a self-testing strategy that
reaches a fault coverage of more than 90% or 95% for the processor than for
the embedded memories, although the processor size is more than three
times smaller than the memories. As we explained earlier, faults in the
processor that escape detection will lead to the vast majority of the programs
malfunctioning (independently of the exact location of the fault in the
processor: it may be a register fault, a functional unit fault or a fault in the
control unit). On the contrary, faults that escape detection in any of the
memory cores will only lead to a small number of programs not operating
correctly. Moreover, one should not forget that the overall idea of
processor-based self-testing or software-based self-testing for SoC
architectures can be applied only if the processor itself has been thoroughly
tested first.
Chapter 4
Processor Testing Techniques
Intensive research has been performed in the field of processor testing
since the appearance of the first microprocessor. A variety of generic
methodologies as well as several ad hoc solutions have been presented in the
literature. In this Chapter we provide an overview of the open literature in
processor testing, putting emphasis on the different characteristics of the
approaches and the requirements that each of them tries to meet.
This Chapter consists of two parts. In the first part, we discuss the
characteristics of a set of different classes of processor testing techniques
(external testing vs. self-testing; functional testing vs. structural testing; etc.)
along with the benefits and drawbacks of each one. In the second part, we
briefly discuss the most important works in the area in chronological order
of publication. The Chapter concludes with the linking of the two parts,
where each of the works presented in the literature is associated with the
class or classes it belongs to, aiming to provide a quick reference for those
interested in studying the area of processor testing.
4.1 Processor Testing Techniques Objectives
4.1.1 External Testing versus Self-Testing
External testing of a processor (or any IC) means that test patterns are
applied to it by an external tester (ATE). The test patterns along with the
expected test responses have been previously stored in the ATE memory.
This is the classical manufacturing testing technique used in digital circuits.
Functional test patterns previously developed for functional verification can
be re-used in this scenario, potentially enhanced with ATPG patterns to
increase the fault coverage.
On the other hand, self-testing of a processor means that test patterns are
applied to it and test responses are evaluated for correctness without the use
of external ATE, but rather using internal resources of the processor. Internal
resources may be either existing hardware and memory resources, or extra
hardware particularly synthesized for test-related purposes (on-chip test
generation and response capturing).
The benefits and drawbacks of external testing and self-testing of
processors are common to those of any other digital circuit and are
summarized in Table 4-1.
Table 4-1: External testing vs. self-testing.
External Testing - Benefits: small on-chip hardware overhead; small chip performance impact. Drawbacks: not at-speed testing; high-cost ATE; only for manufacturing testing.
Self-Testing - Benefits: at-speed testing; low-cost ATE; re-usable during the product's life cycle. Drawbacks: hardware overhead; performance impact.
4.1.2 DfT-based Testing versus Non-Intrusive Testing
Table 4-2: DfT-based vs. non-intrusive testing.
DfT-based Testing - Benefits: high fault coverage; extensive use of EDA tools. Drawbacks: non-trivial hardware, performance and power consumption overheads.
Non-Intrusive Testing - Benefits: no hardware, performance or power consumption overhead. Drawbacks: limited EDA use; low fault coverage.
4.1.3 Functional Testing versus Structural Testing
[Table (partial): drawbacks of functional vs. structural testing. Functional testing - Drawbacks: no relation with structural faults; low defect coverage; pseudorandom operands; long test sequences; long test programs. Structural testing - Drawbacks: needs gate-level model of the processor; higher test development cost.]
Processor testing may focus on the detection of faults that belong either
to a combinational fault model, like the industry-standard single stuck-at
fault model, or to a sequential fault model, i.e. a fault model whose faults
lead to a sequential behavior of the faulty circuit and require two-pattern
tests for fault detection. Delay fault models like the path delay fault model
are the most usual sequential fault models. Appropriate selection of the
targeted delay faults (such as the selection of the path delay faults) must be
done to reduce the ATPG and fault simulation time. ATPG and fault
simulation times for sequential fault models are much larger than for
combinational faults. In some cases, the number of path delay faults to be
simulated is so large that it requires an excessively long fault simulation
time.
EDA tools for combinational fault models (particularly for the stuck-at
fault model) have been in use for many years. Their sequential fault model
counterparts are much less mature.
In general, combinational faults alter the behavior of a circuit in such a way that
combinational parts of it still behave as combinational ones, but with a different, faulty
function instead of the correct one.
In general, sequential faults change the behavior of combinational parts of a circuit into a
sequential one: outputs depend on the current inputs as well as on previous inputs.
[Table (partial): combinational vs. sequential faults testing. Combinational faults testing - Benefits: small test sets or test programs; short test application time; short test generation and fault simulation time; EDA tools maturity. Drawbacks: less defect coverage. Sequential faults testing - Benefits: higher test quality; higher defect coverage. Drawbacks: needs gate-level model of the processor.]
4.1.5 Pseudorandom versus Deterministic Testing
The fault coverage that a processor testing technique can obtain depends
on the number, type and nature of the test patterns applied to the processor.
Pseudorandom testing for processors can be based on pseudorandom
Pseudorandom testing. Benefits: easy development of test sequences; no gate-level details needed for test development. Drawbacks: long test sequences; low fault coverage.
Deterministic testing. Benefits: high fault coverage; short test sequences.
Testing. Benefits: small test sequences; EDA tools support; high test quality. Drawbacks: only pass/fail indication.
Diagnosis. Benefits: supports yield improvement. Drawbacks: large test sequences; less EDA tools support.
4.1.7 Manufacturing Testing versus On-line/Field Testing

4.1.8 Microprocessor versus DSP Testing
Drawbacks: may not be reusable; more expensive.
4.2 Processor Testing Literature

4.2.1 Chronological List of Processor Testing Research
One of the first papers in the open literature that addressed the subject of microprocessor testing was B.Williams' paper in 1974 [169]. The internal operation of microprocessors was reviewed to illustrate problems in their testing. LSI test equipment, its requirements and its application were discussed, as well as issues on wafer testing.
In 1975, M.Bilbault described and compared five methods for microprocessor testing [17]. The first method was called 'Autotest' and it assembled the microprocessor in its natural environment. A test program was running that could generate 'good' or 'bad' responses. The second method was similar to the first one, but it was based on comparison of the responses using a reference microprocessor whose output was compared with the microprocessor under test after each instruction cycle. The third method was
called 'real time algorithmic' and used a prepared program to send a series of instructions to the microprocessor and to compare its response with that expected. The fourth method was called 'recorded pattern' and had two phases. In the first one, the microprocessor was simulated and the responses were recorded, while in the second one the responses of the microprocessor under test were compared with the recorded responses created during the first phase. The fifth method, from Fairchild, was called 'LEAD' (learn, execute and diagnose), where the test program was transferred to the memory of the tester, together with the response found from a reference microprocessor, and the memory also contained all the details of the microprocessor's environment. Advantages of speed and thoroughness were claimed for 'LEAD'.
In 1975, R.Regalado introduced the concept of user testing of
microprocessors claiming that microprocessor testing in the user's
environment does not have to be difficult or laborious, even though it is
inherently complex [135]. The basic concepts of the 'people' (user) oriented
approach for microprocessors testing were: (1) "The test system's computer
performs every function that it is capable of performing" - this is the concept
of functional testing; and (2) "The communication link between the
computer and people (test engineers for example) is interactive".
In 1976, E.C.Lee proposed a simple microprocessor testing technique
based on microprocessor substitution [107]. Because of its low cost in
hardware and software development, this technique was suitable for user
testing in a simple user environment. A tester for the Intel 4004
microprocessor was described.
In 1976, D.H.Smith presented a critical study of four of the more widely accepted, during that period, methods for microprocessor testing, with a view to developing a general philosophy which could be implemented as a test with minimum effort [147]. The considered microprocessor testing methods were: actual use of the microprocessor, test pattern generation based on algorithms to avoid test pattern storage, stored-response testing, and structural verification.
The pioneering work of S.Thatte and J.Abraham in functional microprocessor testing was first presented in 1978 [156]. In this paper, the task of fault detection in microprocessors, a very difficult problem because of the processors' ever-increasing complexity, was addressed. A general microprocessor model was presented in terms of a data processing section (simple datapath) and a control processing section, as well as a functional fault model for microprocessors. Based on this functional fault model, the authors presented a set of test generation procedures capable of detecting all considered functional faults.
In general, an m-out-of-n code consists of code words which have m 1's and n-m 0's.
excessively large test sequences without being able to reach high fault
coverage.
In 1988, L.Shen and S.Y.H.Su presented in [146] (and in a preliminary version of the work in 1984 [145]) a functional testing approach for microprocessors based on a Register Transfer Level control fault model, which they also introduced. As a first step, the read and write operations to the processor registers are tested (these operations are called the kernel) and subsequently all the processor instructions are tested using the kernel operations. k-out-of-m codes are utilized to reduce the total number of functional tests applied to the processor. Details of the functional fault model and the procedure to derive the tests are provided in this work.
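The m-out-of-n codes mentioned above are easy to enumerate. The following Python sketch (ours, purely illustrative, not from this work) lists all code words of a 2-out-of-4 code:

```python
from itertools import combinations

def m_out_of_n_codewords(m, n):
    """Enumerate all n-bit code words containing exactly m ones."""
    words = []
    for ones in combinations(range(n), m):
        word = 0
        for pos in ones:
            word |= 1 << pos
        words.append(format(word, f"0{n}b"))
    return sorted(words)

# A 2-out-of-4 code has C(4,2) = 6 code words.
print(m_out_of_n_codewords(2, 4))
# → ['0011', '0101', '0110', '1001', '1010', '1100']
```

Because every valid word carries exactly m ones, any single bit flip produces an invalid word, which is what makes such codes attractive for checking functional test responses.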
In 1989, E.-S.A.Talkhan, A.M.H.Ahmed and A.E.Salama focused on the reduction of test sequences used for microprocessor functional testing, so that instruction sequences as short as possible are developed to cover all the processor instructions [152]. Application of the method to TI's TMS32010 Digital Signal Processor showed the obtained reductions in terms of the number of functional tests that must be executed.
In 1990, A.Noore and B.E.Weinrich presented a microprocessor functional testing method in which three test generation approaches were given: the modular block approach, the comprehensive instruction set approach and the microinstruction set approach [119]. These approaches were presented as viable alternatives to the exhaustive testing of all instructions, all addressing modes and all data patterns, a strategy that is getting more infeasible and impractical as processor sizes increase. Papers of this type make obvious how difficult the problem of functional testing of processors becomes as processor sizes increase. In this work, some examples of the approach's application to Intel's 8085 processor are given, but with no specifics on test program size, execution time or fault coverage obtained.
In 1992, A.J. van de Goor and Th.J.W. Verhallen [58] presented a functional testing approach for microprocessors which extends the functional fault model introduced in [21] and [158] in the 1980s, to cover functional units not described by the earlier functional fault model. Memory testing algorithms were integrated into the functional testing methodology to detect more complex types of faults (like coupling faults and transition faults) in the memory units of the microprocessor. The approach has been applied to Intel's i860 processor, but no data is provided regarding the number of tests applied, or the size and execution time of the testing program.
J.Lee and J.H.Patel in 1992 [108] and 1994 [109] treated the problem of functional testing of microprocessors as a high level test generation problem for hierarchical designs. The test generation procedure was split into two phases, the path analysis phase and the value analysis phase. In path
patterns are used) and automatic test program synthesis. The target is the detection of delay faults in functionally testable paths of the processor. The entire flow requires knowledge of the processor's instruction set, its micro-architecture and RTL netlist, as well as the gate-level netlist for the identification of the functionally testable paths. Experiments have been performed for the Parwan educational processor [116] as well as for the DLX RISC processor [59]. In Parwan, a self-test program of 5,861 instructions (bytes) obtained a 99.8% coverage of all the functionally testable path delay faults, while in DLX, a self-test program of 34,694 instructions (32-bit words) obtained a 96.3% coverage of all functionally testable path delay faults.
The contribution of the work presented by L.Chen and S.Dey in 2001 [28] (a preliminary version was presented in [27]) is twofold. First, it demonstrates the difficulties and inefficiencies of Logic BIST (LBIST) application to embedded processors. This is shown by applying Logic BIST to a very simple 8-bit accumulator-based educational processor (Parwan [116]) and a stack-based 32-bit soft processor core that implements the Java Virtual Machine (picoJava [127]). In both cases, Logic BIST adds more hardware overhead compared to full scan, but is not able to obtain satisfactory structural fault coverage even when a very high number of test patterns is applied. Secondly, a structural software-based self-testing approach is proposed in [28] based on the use of self-test signatures. Self-test signatures provide a compacted way to download previously prepared test patterns for the processor components into on-chip memory. The self-test signatures are expanded by embedded software routines into test patterns which are in turn applied to the processor components, and test responses (either individually or in a compacted signature) are collected for external evaluation. The component test sets are either previously generated by an ATPG and then embedded in pseudorandom sequences, or are generated by software-implemented pseudorandom generators (LFSRs). Experimental results on the Parwan educational processor show that 91.42% fault coverage is obtained for single stuck-at faults with a test program consisting of 1,129 bytes, running for a total of 137,649 clock cycles.
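A software-implemented pseudorandom generator of the kind mentioned above can be sketched as a simple Fibonacci LFSR in Python. This is an illustrative model only; the register width, seed and tap positions below are our assumptions (chosen to give a maximal-length sequence), not parameters taken from [28]:

```python
def lfsr_patterns(seed, taps, width, count):
    """Generate pseudorandom test patterns with a Fibonacci LFSR.

    seed: non-zero initial state; taps: tap bit positions;
    width: register width in bits; count: patterns to produce.
    """
    state = seed
    patterns = []
    for _ in range(count):
        patterns.append(state)
        # Feedback bit is the XOR of the tapped state bits.
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        # Shift left and insert the feedback bit at position 0.
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return patterns

# 8-bit LFSR; these taps yield a maximal-length sequence, so the
# register cycles through all 255 non-zero states before repeating.
pats = lfsr_patterns(seed=0x1, taps=(7, 5, 4, 3), width=8, count=255)
```

In a self-test program the same loop would be written in the processor's own instruction set, with each generated state applied as an operand to the component under test.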
In 2001, the problem of processor testing and processor-based SoC testing was addressed by W.-C.Lai and K.-T.Cheng [104], where instruction-level DfT modifications to the embedded processor were introduced. Special instructions are added to the processor instruction set with the aim to reduce the length of a self-test program for the processor itself or for other cores in the SoC, and to increase the obtained fault coverage. Experimental results on two simple processor models, the Parwan [116] and the DLX [59], show that complete coverage of all functionally testable path delay faults can be obtained with small area DfT overheads that also reduce the overall self-test program length and its total execution time. In Parwan, the test program is
reduced by 34% and its execution time reduced by 39% with an area overhead of 4.7% compared to the case when no instruction-level DfT is applied to the same processor [105]. Moreover, complete 100% fault coverage is obtained while in [105] fault coverage was slightly lower, 99.8%. In the DLX case, the self-test program is 15% smaller and its execution time is reduced by 21% with an area overhead due to DfT of only 1.6%. Fault coverage is complete (100%) for the DfT case while it was 96.3% in the design without the DfT modifications [105]. All fault coverage numbers refer to the set of functionally testable path delay faults.
In 2001, F.Corno, M.Sonza Reorda, G.Squillero and M.Violante [32] presented a functional testing approach for microprocessors. The approach consists of two steps. In the first step, the instruction set of the processor is used for the construction of a set of macros for each instruction. Macros are responsible for the correct application of an instruction and the observation of its results. In the second step, a search algorithm is used to select a suitable set of the previously developed macros to achieve acceptable fault coverage with the use of ATPG generated test patterns. A genetic algorithm is employed in the macro selection process to define the values for each of the parameters of the macros. Experimental results are reported in [32] on the 8051 microcontroller. The synthesized circuit consists of about 6,000 gates and 85.19% single stuck-at fault coverage is obtained, compared to 80.19% for a pure random-based application of the approach. The macros that actually contributed to the above, rather low, fault coverage lead to a test program consisting of 624 processor instructions.
In 2002, L.Chen and S.Dey exploited the fault diagnosis capability of software-based self-testing [29]. A large number of appropriately developed test programs are applied to the processor core in order to partition the fault universe into smaller partitions with unique pass/fail patterns. Sufficient diagnostic resolution and quality were obtained when the approach was applied to the simple educational processor Parwan [116].
In 2002, N.Kranitis, D.Gizopoulos, A.Paschalis and Y.Zorian [94], [95], [96] introduced an instruction-based self-testing approach for embedded processors. The self-test programs are based on small deterministic sets of test patterns. First experimental results for the methodology were presented in these papers. The approach was applied to Parwan, the same small accumulator-based processor used in [28], and a 91.34% single stuck-at fault coverage was obtained with a self-test program consisting of around 900 bytes and executing for about 16,000 clock cycles (these numbers gave about 20% reduction of test program size and about 90% reduction in test execution time compared to [28]).
In 2002, P.Parvathala, K.Maneparambil and W.Lindsay presented an
approach called Functional Random Instruction Testing at Speed (FRITS)
which applies randomized instruction sequences and tries to reduce the cost of functional testing of microprocessors [125]. To this aim, DfT modifications are proposed that enable the application of the functional self-testing methodology using low-cost, low-pin-count testers. Moreover, automation of self-test program generation is considered. The basic feature of FRITS, which is also its main difference when compared with classical functional processor self-testing, is that a set of basic FRITS routines (called kernels) are loaded to the cache memory of the processor and are responsible for the generation of several programs consisting of random instruction sequences that are used to test parts of the processor. External memory cycles are avoided by appropriate exception handling that eliminates the possibility of cache misses that initiate main memory accesses. The FRITS methodology is reported to be applied in the Intel Pentium 4 processor resulting in around 70% single stuck-at fault coverage. Also, application of the approach to the Intel Itanium processor integer and floating point units led to 85% single stuck-at fault coverage. The primary limitation of this technique, like any other random-based, functional self-testing technique, is that an acceptable level of fault coverage can only be reached if very long instruction sequences are applied; this is particularly true for complex processor cores with many components.
In 2003, L.Chen, S.Ravi, A.Raghunathan and S.Dey focused on the scalability and automation of software-based self-testing [31]. The approach employs RTL simulation-based techniques for appropriate ranking and selection of self-test program templates (instruction sequences for test delivery to each of the processor components), as well as techniques from the theory of statistical regression for the extraction of the constraints imposed on the application of component-level tests by the instruction set of the processor. The constraints are modeled as virtual constraint circuits (VCC) and automated self-test program generation at the component level is performed. The approach has been applied to a relatively large combinational sub-circuit of the commercial configurable and extensible RISC processor Xtensa from Tensilica [174]. A self-test program of 20,373 bytes, running for a total of 27,248 clock cycles, obtained 95.2% fault coverage of the functionally testable faults of the component. A total of 288 test patterns generated by an ATPG were applied to the component.
In 2003, N.Kranitis, G.Xenoulis, D.Gizopoulos, A.Paschalis and Y.Zorian [99], [100] showed the applicability of software-based self-testing to larger RISC processor models, while a classification scheme for the different processor components was introduced. The processor self-testing problem is addressed by a solution focusing on test cost reduction in terms of engineering effort, self-test code size and self-test execution time, all together leading to significant reductions of the total device test time and
4.2.2 Industrial Microprocessors Testing
Apart from the research described in the previous sections, which focused on more or less generic methodologies for processor testing and self-testing, a large set of industrial case-study papers were presented at the IEEE International Test Conference (ITC) over the years. In these papers, authors from major microprocessor design and manufacturing companies summarized the manufacturing testing methodologies applied to several generations of successful, high-end microprocessors of our days. Several exciting ideas were presented in these papers although, for obvious reasons, only a small set of the details were revealed to the readers.
A list of papers of this category is included in the bibliography of this book, including microprocessor testing papers from Intel, IBM, Motorola, Sun, HP, AMD, DEC, ARM and TI. The list includes industrial papers presented at ITC between the years 1990 and 2003.
4.3 Classification of the Processor Testing Methodologies
Table 4-8 presents the classification of the works analyzed above into the categories described earlier. Some explanations are necessary for a better understanding of Table 4-8.
A paper appearing in the self-testing category means that it specifically focused on self-testing, but in many cases the approach can be applied to external testing as well.
A paper appearing in the DfT-based testing category means that some design changes are required to the processor structure, while all other works do not change the processor design.
A paper appearing in the sequential faults testing category means that the approach was developed and/or applied targeting sequential faults such as delay faults. All other works were either applied for combinational faults testing or did not use any structural fault model at all (functional testing).
Self-Testing (others are on external testing only): ...
DfT-based Testing (others are non-intrusive): ...
Functional Testing: ...
Structural Testing (including register transfer level): ...
Sequential Faults Testing (others are on combinational faults): ...
Pseudorandom Testing: ...
Deterministic Testing (including ATPG-based): [91], [105], [95], ...
Diagnosis (others are on testing only): [33], [92], [64], [107], ...
Field Testing - On-line Testing (others are on manufacturing testing): [135], [173]
DSP Testing (others are on microprocessor testing): [152], [2], [131], [175]
Table 4-8: Processor testing methodologies classification.
Chapter 5

Software-Based Processor Self-Testing
5.1 Software-Based Self-Testing Concept and Flow
[Figure: External Test Equipment connected to the processor via the CPU bus.]
access to the internal bus14. The embedded code will perform the self-testing of the processor. Alternatively, the self-test code may be "built-in" in the sense that it is permanently stored in the chip in a ROM or flash memory (this scenario is shown in Figure 5-2). In this case, there is no need for a downloading process and the self-test code can be used many times for periodic/on-line testing of the processor in the field.
The self-test data is downloaded to the embedded data memory of the processor via the same external equipment. Self-test data may consist, among others, of: (i) parameters, variables and constants of the embedded code, (ii) test patterns that will be explicitly applied to internal processor modules for their testing, (iii) the expected fault-free test responses to be compared with the actual test responses. Downloading of self-test data does not take place if on-line testing is applied and the self-test program is permanently stored in the chip.
Control is transferred to the self-test program, which starts execution. Test patterns are applied to internal processor components via processor instructions to detect their faults. Components' responses are collected in registers and/or data memory locations. Responses may be collected in an unrolled manner in the data memory or may be compacted using any known test response compaction algorithm. In the former case, more data memory is required and test application time may be longer, but, on the other hand, aliasing problems are avoided. In the latter case, data memory requirements are smaller because only one, or just a few, self-test signatures are collected, but aliasing problems may appear due to compaction.
After the self-test code completes execution, the test responses previously collected in data memory, either as individual responses for each test pattern or as compacted signatures, are transferred to the external test equipment for evaluation.
In general, it is assumed that a mechanism exists for the transfer of self-test code and data
to the embedded instruction and data memory. This mechanism can be for example a
simple serial interface or a fast Direct Memory Access (DMA) protocol.
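The compaction alternative for response collection can be illustrated with a small software signature routine. The sketch below is ours and purely illustrative; the rotate-and-XOR scheme and the 32-bit width are assumptions, not an algorithm from this book:

```python
def compact_responses(responses, width=32):
    """Compact a stream of test responses into a single signature.

    Models in software what a MISR does in hardware: each response
    word is folded into a running signature by a rotate-and-XOR step,
    so only one word needs to be stored and uploaded (at the cost of
    possible aliasing, as noted in the text).
    """
    mask = (1 << width) - 1
    sig = 0
    for r in responses:
        # Rotate the signature left by one bit, then mix in the response.
        sig = ((sig << 1) | (sig >> (width - 1))) & mask
        sig ^= r & mask
    return sig

fault_free = compact_responses([0x1234, 0xBEEF, 0x0042])
faulty = compact_responses([0x1234, 0xBEEF, 0x0043])  # one flipped bit
assert fault_free != faulty
```

With unrolled collection the three words above would all be stored and uploaded; with compaction only the single signature is, which is exactly the memory and upload-time trade-off described in the text.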
which can be used subsequently for further actions on the system (repair, reconfiguration, re-computation, etc.).
[Figure 5-2: The CPU core connects through the CPU bus to the data memory (holding the self-test data and self-test response(s)) and to a self-test memory (ROM, flash) holding the self-test code.]
15 For example, a test pattern for a two-operand operation consists of two parts, the first and the second operand value. Application of such a test pattern requires two register writes.
We note that the above steps can be partially merged depending on the
particular coding style used for a processor and also the details of its
instruction set. For example, if the processor instruction set contains
instructions that directly export the results of an operation (ALU operation,
etc) to a memory location then the steps of response collection and response
extraction are actually merged. Figure 5-3 shows in a graphical way the three
steps of software-based self-testing application to an internal component of a
processor core. All three steps are executed by processor instructions; no
additional hardware has to be synthesized for self-testing since the
application of self-testing is performed by processor instructions that use
existing processor resources. The processor is not placed in a special mode
of operation but continues its normal operation executing machine code
instructions.
16 This is the case when the test application instruction gives a multi-word result (two-word results are given by multiplication and division), or when the extraction of a data word and a status word (a status register) is necessary.
[Figure 5-3: The three steps of software-based self-testing on a CPU core component: test preparation, test application/response collection (where the fault produces a fault effect), and response extraction.]
The registers used in the example are named r1, r2, r3, while X, Y are memory locations each of which contains half of the test pattern to be applied to the ALU, and Z is the memory location where the test response of the ALU is eventually stored17. The test preparation step consists of loading the two registers r1, r2 with the parts of the test pattern to be applied to the ALU. Memory locations X and Y contain these two parts of the ALU test pattern. Registers r1 and r2 (like all general purpose registers of the processor) have access to the ALU component inputs. The test application and response collection step consists of the execution of the addition instruction, where the test pattern now stored in r1 and r2 is applied to the ALU and its response (the result of the addition) is collected into register r3 (in a single add instruction). Finally, the store instruction is the response extraction step, where the ALU response stored in register r3 is transferred from the inside of the processor to the external memory location Z. As we have already mentioned, test responses may either be collected in memory locations individually, or compacted into a single or a few signatures to reduce the memory requirements.
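To make the three steps concrete, the snippet below mimics them in Python. Only the names r1, r2, r3, X, Y, Z follow the book's example; the modeling itself, the operand values and the 16-bit ALU width are our illustrative assumptions:

```python
# A toy model of the three self-test steps for an ALU adder.
# 'memory' models the data memory: X and Y hold the two halves of the
# test pattern and Z receives the extracted response.
memory = {"X": 0x0F0F, "Y": 0x00F1, "Z": None}
regs = {}

# Step 1 (test preparation): load the two operands into registers,
# as the two load instructions from X and Y would do.
regs["r1"] = memory["X"]
regs["r2"] = memory["Y"]

# Step 2 (test application / response collection): execute the add
# instruction; the ALU response is collected in r3.
regs["r3"] = (regs["r1"] + regs["r2"]) & 0xFFFF  # 16-bit ALU assumed

# Step 3 (response extraction): store the response to data memory,
# as the store instruction to Z would do.
memory["Z"] = regs["r3"]

print(hex(memory["Z"]))  # → 0x1000
```

A fault in the adder would corrupt the value collected in r3 and hence the word extracted to Z, which is how the fault effect becomes observable outside the processor.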
Of course, not every module of the processor can be tested with such a simple code portion as the one above, but this assembly language code shows the basic idea of software-based self-testing. The effectiveness of software-based self-testing depends on the way that test generation (i.e. generation of the self-test programs) is performed to meet the specific requirements of an application (test code size, test code execution time, etc.). Moreover, it strongly depends on the Instruction Set Architecture of the processor under test. The requirements for an effective application of software-based self-testing are discussed in the following subsection.
5.2 Software-Based Self-Testing Requirements
Software-based self-testing is by its definition an effective processor self-testing methodology. Its basic objective is to be an alternative self-testing technique that eliminates or reduces several bottlenecks of other techniques. Specifically, software-based self-testing is effective because:
17 In this example, the test patterns are stored in locations of the data memory (words X and Y). As we will see, test patterns can also be stored in the instruction memory, as parts of instructions, when the immediate addressing mode is used in self-test programs.
The primary and most important decision that has to be taken in any test
development process is the level of fault coverage and test quality that is
required. All aspects of the test generation and test application processes are
affected by the intended fault coverage level: is 90%, 95% or 99% fault
coverage for single stuck-at faults sufficient?
Another issue related to test quality is the fault model that is used for test development. Comprehensive sequential fault models, such as the delay fault model, are known to lead to higher test quality and higher defect coverage than traditional, simpler combinational fault models such as the stuck-at fault model. On the other hand, sequential fault models consume significantly more test development time and also lead to much larger test application time because of the larger size of the test sets they apply to the circuit.
18 Power consumption concerns during manufacturing testing are basically related to the stressing of the chip's package and thermal issues. Power consumption concerns during on-line periodic testing are related to the life of the system's battery, which may be exhausted faster if long self-test programs are executed.
In summary, in software-based self-testing a hunt for higher fault
coverage and/or a more comprehensive fault model under consideration
means:
more test engineering effort and more CPU time for manual and
automatic (ATPG-based) self-test program generation,
respectively;
larger self-test programs and thus longer downloading time from
external test equipment to the processor's memory;
longer test application time, i.e. longer self-test program
execution intervals, thus more time that each chip spends during
testing.
5.2.2
A major concern in test generation for electronic systems is the manpower and related costs that are spent for test development. This cost factor is an increasingly important one as the complexity of modern chips increases. In processors and processor-based designs, software-based self-testing provides a low-cost test solution in terms of test development cost.
Any software-based self-testing technique can theoretically reach the maximum possible test quality in terms of fault coverage (i.e. detection of all faults of the target fault model that can appear during the system's normal operation) if unlimited time can be spent for test development. Of course, unlimited test development time is never allowed! Only a limited test engineering time and corresponding labor cost can be dedicated during chip development for the generation of self-test programs.
The ultimate target of a testing methodology is summarized in the following sentence: obtain the maximum possible fault coverage/test quality under specific test development and test application cost constraints. Therefore, if a methodology is capable of reaching a high fault coverage level with small test engineering effort in a short time, it is indeed a cost-effective test solution and particularly useful for low-cost applications. This aspect of software-based self-testing is the one that is analyzed more in this Chapter. Of course, if a particular application's cost analysis allows an unlimited (or very large) effort or time to be spent for test development, then higher fault coverage levels can always be obtained.
When discussing software-based self-test generation in subsequent sections, we focus on the importance of different parts of the processor for self-test program generation, so that high fault coverage is obtained by software self-test routines as quickly as possible. By applying this, in some sense, "greedy" approach, limited test engineering effort is devoted to test generation to get the maximum test quality under this restriction.
Figure 5-4 presents in a graphic way the main objective of software-based self-testing as a low-cost testing methodology. Between Approach A and Approach B, both of which may eventually be able to reach a high fault coverage of 99%, the best one is Approach A since it is able to obtain a 95%+ fault coverage quicker than Approach B. "Quicker" in this context means with less engineering effort (manpower) and/or with less test development cost. This is the basic objective of software-based self-testing as we describe it in this Chapter.
[Figure 5-4: Fault coverage versus effort or cost. Approach A crosses the 95% fault coverage level with less effort or cost than Approach B.]
When devising a low-cost test strategy, the most important first step is to identify the cost factor, or factors if more than one, that are the most important for the specific test development flow. Low test engineering effort may be one of the important cost factors because it is calculated in personnel cost for test development (either manual or EDA-assisted). But, on the other hand, test development is always a one-time effort that results in a self-test program which is eventually applied to all produced copies of the same processor. Therefore, the test development cost and corresponding test engineering efforts are divided by the number of devices that are finally produced. In high volume production lines, test development costs have a marginal contribution to the overall system costs. If the production volume is not very high, then test development costs should be more carefully considered in the global picture. This is the case of low-cost applications.
5.2.3
Total test application time for each device of a production line has a direct relation with the total chip manufacturing cost. In particular, test application time in software-based self-testing consists of three parts:
The relation between the frequency of the external test equipment and the frequency of the processor under test determines the percentage of the total test application time that belongs to the downloading/uploading phases or the self-test execution phase. The basic objective of software-based self-testing is to be used as a low-cost test approach that utilizes low-cost, low-memory and low pin-count external test equipment. To this aim, the most important factors that must be optimized in test application are the time for the downloading of the test program and the uploading of the test responses, i.e. the first and third parts of software-based self-testing as given above. A simple analysis of the total test application time of software-based self-testing is useful to reveal the impact of all three parts of test application time.
Let us consider that the processor under test has a maximum operating
frequency of fup while the external tester 19 has an operating frequency of
ftester. This means that a self-test program (code and data) can be
downloaded at a maximum rate of ftester and can be executed by the
processor at a maximum rate of fup.
There are two extremes for the downloading of the self-test program into
the embedded memory of the processor. One extreme is the parallel
downloading of one self-test program word (instruction or data) per cycle.
Fast downloading of self-test programs can also be assisted if the chip
contains a fast Direct Memory Access (DMA) mechanism which is able to
transfer large amounts of memory contents without the participation of the
processor in the transfer. The other extreme is the serial downloading of the
self-test program one bit per cycle. This is applicable in the case that only a
serial interface is available in the chip for self-test program downloading, but
this is a rare situation. If the total size of the self-test program (code and data) is
small enough, then even a simple serial transfer interface will not severely
impact the total self-test application time.
Let us also consider that a self-test program consisting of C instruction
words and D data words must be downloaded to the processor memory
(instruction and data memory) and that a final number of R responses must
be eventually uploaded from the processor data memory to the tester memory
for external evaluation. Finally, we assume that the execution of the entire
self-test program and the collection of all responses take in total K clock
cycles of the processor. For simplicity, we assume that the K clock cycles
include any stall cycles that may happen during self-test program execution,
either for memory accesses (memory stall cycles) or for processor internal
reasons (pipeline stall cycles).
19
We use the term "tester" to denote any external test-assisting equipment. It can be an
expensive high-frequency tester, a less expensive tester or even a simple personal
computer, depending on the specific application and design.
The total test application time T is:

T = (C + D)/ftester + K/fup + R/ftester,
or
T = (C + D + R)/ftester + K/fup,
or
T = W/ftester + K/fup,

where W = C + D + R.
It is exactly this last formula that guides the process of test generation for
software-based self-testing and the embedded software coding styles, as we
will see in subsequent sections. The relation between the ftester and fup
frequencies reveals the upper limits for the self-test program size
(instructions C, and data D) and responses size (R), as well as for the
self-test program execution time (number of cycles K).
Figure 5-5 and Figure 5-6 use the above formula to show in a graphical
way how the total test application time (T) of software-based self-testing is
affected by the relation between the frequency of the chip (fup) and the
frequency of the tester (ftester), as well as by the relation between the
number of clock cycles of the program (K) and its total size (W).
Figure 5-5 presents the application time of software-based self-testing as
a function of the ratio K/W, for three different values of the ratio fup/ftester.
A large value of the K/W ratio means that the self-test program
(instructions + data + responses) is small and compact (smaller value of W)
and/or that it consists of loops of instructions executed many times each (small
code but large execution time). Obviously, as we see in Figure 5-5, in all
three cases of fup/ftester ratios (2, 4 and 8), when the K/W ratio increases
the total test application time increases. The most important observation in
the plot is that when the fup/ftester ratio is smaller (2, for example), this test
application time increase is much faster compared to larger fup/ftester ratios
(4 or 8). In other words, when the chip is much faster than the tester, an
increase in the number of clock cycles for self-test execution (K) has a small
effect on the chip's total test application time. On the other hand, if the chip is
not very fast compared to the tester, then an increase in the number of clock
cycles leads to a significant increase in test application time.
Figure 5-5: Test application time as a function of the K/W ratio.
From the inverse point of view, Figure 5-6 shows the application time of
software-based self-testing as a function of the ratio fup/ftester, for three
different values of the ratio K/W. An increasing value of the fup/ftester
ratio denotes that the processor chip is much faster than the tester. This
means that for the same ratio K/W the test application time will be smaller
since the program will be executed faster. This reduction becomes sharper
and more significant when the K/W ratio has smaller values (2, for example),
i.e. when the self-test program is not so compact in nature and the number of
clock cycles is close to the size of the program.
Figure 5-6: Test application time as a function of the fup/ftester ratio.
In the above discussion, W is the sum of the self-test code, self-test data
and self-test responses, C, D and R, respectively. When the test responses of the
processor components are compacted by the self-test program (a special
compaction routine is part of the self-test program), then R is a small number,
possibly 1, when a single signature is eventually calculated. This leads to a
reduction of the time for uploading the test responses (signatures) but, on the
other hand, the self-test program execution time (number of clock cycles K) is
increased because of the execution of the compaction routines.
We have to point out that the above calculations and the values of Figure 5-5
and Figure 5-6 are simple and rough approximations of the test application
time of software-based self-testing, based only on the time required to
download/upload the self-test program (code, data and responses) and the
time required to execute the self-test program and collect the test responses
from the processor. A more detailed analysis is necessary for each particular
application, but the overall conclusions drawn above will still be valid for
the impact of the fup/ftester and K/W ratios on the test application time of
software-based self-testing.
In the case of on-line periodic testing of the processor, the
download/upload phases either do not exist at all or are executed only at
system start-up 20. Therefore, the self-test program size does not have an
impact on the test application time in on-line testing. On the other hand, the
size of the self-test program is still important in on-line periodic testing because
it is related to the size of the memory unit where it will be stored for
periodic execution.
5.2.4
so
= W+ K
20
If the self-test code and data are stored in a memory unit such as a ROM, then there is no
download/upload phase. If they are transferred from a disk device into the processor
RAM, this will only happen once at system start-up.
measures. The one with the smallest measure can easily be considered more
efficient at a given ratio between the frequencies of the external tester and
the processor.
5.2.5
5.2.6
21
22
Either a gate-level netlist must be available or a synthesizable model of the processor from
which a gate-level netlist can be generated after synthesis.
Even non-pipelined processors are very deep sequential circuits, and sequential ATPG
tools usually fail to generate sufficiently small test sets that reach high fault coverage for a
processor.
5.2.7
5.3
Phase A: Information Extraction
Phase B: Components Classification/Prioritization
Phase C: Component-level Self-test Routines Development
Phase D: Processor-level Self-test Program Optimization
The information extraction phase (Phase A):
- identifies all the components that the processor consists of; this is
an essential part of component-based software-based self-testing
because test development is performed at the component level;
- identifies the operations that each of the components performs;
and
- identifies instruction sequences consisting of one or more
instructions for controlling the component operations, for
applying data operands to the operations and for observing the
results of the operations at the processor outputs.
Phase A (inputs: RT-Level Info, ISA Info):
- Identify Processor Components
- Identify Component Operations
- Identify Instruction Sequences to Control/Apply/Observe each Operation
Phase B (input: information from Phase A):
- Classify Processor Components Types
- Prioritize Processor Components for Test Development
Phase C (inputs: information from Phases A and B; Component Test Library):
- Develop Self-Test Routines for High-Priority Components
Phase D (inputs: routines from Phase C; optimization criteria):
- Optimize Self-Test Routines
5.4
5.4.1
Functional components
25
In the examples of this Chapter, we use the assembly language of the MIPS processors.
We denote the registers as Rs, Rt, Rd or as R1, R2, etc., for simplicity reasons, although
traditionally these registers are denoted as $s0, $s1, etc. in the MIPS assembly language.
26
Multiplication using a parallel multiplier may take more than 1 clock cycle if, for
performance reasons of the other instructions, the multiplication is broken into more than
one phase and takes more (usually two) clock cycles.
A two-input multiplexer that feeds the second input of the ALU
implementing the bitwise-OR operation is, in this case, an identified
interconnect functional component.
Storage functional components are the easiest case of the three sub-classes
of functional components since, in most cases, their existence is
directly included in the assembly language instructions or explicitly stated in
the programmer's manual (as in the case of the Hi and Lo registers of
MIPS which, although not referred to in the assembly language, are explicitly
mentioned in the programmer's manual of the processor). Usually, an
accumulator's or general purpose register's name is part of the assembly
language format, while a status register that saves the status information
after instructions are executed is also directly implied by the format and
meaning of the special processor instructions used for its manipulation. Taking,
again, an example from the MIPS instruction set architecture, the following
assembly language instruction:
and R4, R2, R3
identifies three general purpose registers from the integer register file: R2,
R3 and R4. Moreover, an additional piece of information that can be
extracted from the existence of this instruction is that the processor contains
a register file with at least 2 read ports and at least 1 write port.
5.4.2
Control components
The control components are those that control either the flow of
instructions and data inside the processor or the flow of data from and to the
external environment (memory subsystem, peripherals, etc).
A classical control component is the component that implements the
instruction decoding and produces the control signals for the functional
components of the processor: the processor control unit. The processor
control unit may be implemented in different ways: as a Finite State
Machine (FSM), or a microprogrammed control unit. If the processor is
pipelined a more complex pipeline control unit is implemented.
Other typical control components are the instruction and data memory
controllers that are related to the task of instruction fetching from the
instruction memory and are also related to the different addressing modes of
the instruction set.
The common characteristic of control components is that they are not
directly related to specific functions of the processor or directly implied by
the processor's instruction set. The existence of control components is not
evident in the instruction format and micro-operations of the processor, and
the actual implementation of the control units of a processor may differ
significantly from one implementation to another. On the contrary, the
functional components of different processors are implemented with more or
less the same architecture.
5.4.3
Hidden components
The hidden components of the processor are those that are included in a
processor's architecture usually to increase its performance and instruction
throughput.
The hidden components are not visible to the assembly language
programmer, and user programs should functionally operate the same way
whether the hidden components are present or absent. The only difference in
program execution should be its performance since, when hidden components
are present, performance should be higher than without them.
A classical group of hidden components is the one consisting of the
components that implement pipelining, the most valuable performance-enhancing
technique devised in the last decades to increase a processor's
performance and instruction execution throughput. The hidden components
related to pipelining include the pipeline registers between the different
pipeline stages 27, the pipeline multiplexers and the control logic that
determines the operation of the processor's pipeline. These include
components involved with pipeline hazard detection, pipeline interlock and
forwarding/bypassing logic.
The hidden components of a processor may be of storage,
interconnect or control nature. Storage hidden components have an operation
similar to the sub-class of storage functional components. Pipeline
registers belong to this type of storage hidden components. The pipeline
control logic has control characteristics similar to the control components
class. Finally, there are hidden components implementing pipelining which
have an interconnect nature. These are the multiplexers of the pipeline
structure of a processor which realize the forwarding of data in the pipeline
27
Pipeline registers do not belong to the class of storage functional components because they
are not visible to the assembly language programmer.
when pipeline hazards are detected. The logic which detects the existence of
pipeline hazards consists of another type of hidden components, the pipeline
comparators, which can be considered part of the pipeline control logic.
Other cases of hidden components are those related to other performance-increasing
mechanisms like Instruction Level Parallelism (ILP) and
speculative mechanisms to improve processor performance, such as branch
prediction schemes. Such prediction mechanisms can be added to a
processor to improve program execution speed, but their malfunctioning
will only lead to reduced performance and not to functional errors in
program execution.
It is obvious from the description above that self-test routine
development for hidden processor components (or any other testing means)
may be the most difficult among the different component classes. The
situation is simplified when the processor under test does not contain such
sophisticated mechanisms (a processor without pipelining). This is true in
many cases today where previous-generation microprocessors are being
implemented as embedded processors. In a large set of SoC designs, the
performance delivered by such non-pipelined embedded processors is
sufficient, and therefore software-based self-testing for them has to deal only
with functional and control components.
In cases where the performance of classical embedded processors is not
enough for an application, modern embedded processors are employed. The
majority of modern embedded processors include performance-enhancing
mechanisms like a k-stage pipeline structure (k may range from 2 to 6 or
even more in embedded processors).
Although direct self-test routine development for hidden components is
not easy, the actual testability of some of them (like the pipeline-related
components), when self-test routines are developed for the functional
components of the processor, can be inherently very high. Intuitively, this is
true due to the fact that the pipeline structure is a "transparent" mechanism that
is not used to block the execution of instructions but rather to accelerate it.
The important factor that will determine how much test development
effort and cost will be spent on the pipeline components is their relative size
and contribution to the overall processor's testing.
5.5
The criteria used for component prioritization for low-cost
software-based self-testing are discussed and analyzed in the subsequent
paragraphs. The criteria are, in summary, the following: the relative size of a
component (its share of the processor's faults), the accessibility of the
component by processor instructions, and the fault coverage obtained for a
component as a side effect of testing other components.
5.5.1
This criterion, simply stated, gives the following advice: component-level
self-test routine development should give higher priority to large
components that contain a large number of faults.
When a processor component is identified from the instruction set
architecture and the RTL model of the processor, its relative size and its
number of faults as a percentage of the total number of processor faults are a
valuable piece of information that will guide the development of self-test
routines. Large processor components containing large numbers of faults
must be assigned higher priority compared to smaller components because
high coverage of the faults of such a component will have a significant
contribution to the overall processor fault coverage. For example,
developing a self-test routine that obtains 90% fault coverage for a
processor component that occupies 30% of the processor gate count (and
fault count) contributes 30% x 90% = 27% to the total processor fault
coverage, while 99% fault coverage on a smaller component that occupies
only 10% of the processor gate count (and faults as well) will only contribute
10% x 99% = 9.9% to the overall processor fault coverage. Needless to
say, reaching 90% fault coverage in a component is always much easier
than reaching 99% in another.
This criterion, although simple and intuitive, is the first one to be
considered if low-cost testing is the primary objective. Large components
must definitely be given higher priority than smaller ones since their
effective testing will lead more quickly to high total fault coverage.
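The arithmetic behind this criterion can be sketched in a few lines of Python (the component names and percentages are the illustrative figures from the example above, not data from a real processor):

```python
# Contribution of a component's self-test routine to the total processor
# fault coverage = (component's share of processor faults)
#                x (fault coverage achieved on that component).
components = [
    # (name, share of processor faults, fault coverage achieved)
    ("large component", 0.30, 0.90),
    ("small component", 0.10, 0.99),
]

total = 0.0
for name, share, coverage in components:
    contribution = share * coverage
    total += contribution
    print(f"{name}: {contribution:.1%} of total processor fault coverage")
print(f"combined: {total:.1%}")
```

The large component contributes 27% against 9.9% for the small one, which is why size-based prioritization pays off first.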
The actual application of this criterion for the prioritization of the
processor's components requires that the gate counts of the processor
components are known. Unfortunately, this information is not
28
Gate counts are available either when a gate-level netlist of the component is given or
when a synthesizable model of the processor (and thus the components) is available for
synthesis.
the storage functional components and at least the first reason for which they
should be first addressed by software-based self-test routines development.
5.5.2
The second criterion for the prioritization of processor components for
low-cost self-test program generation, equally important as the first, is the
component's accessibility from outside the processor using processor
instructions. The higher this accessibility is, the easier the testing of the
component is.
The development of self-test routines is much easier and requires less
engineering effort and cost when the component under test is easily
accessible via the programmer-visible, general purpose registers of the processor.
This means that the component inputs are connected to such registers and
the component outputs drive such registers as well.
In this case, the application of a single test pattern to the component
under test by a self-test program simply consists of the following three steps:
loading the processor registers with the test pattern, applying the test pattern
to the component by executing an instruction that uses it, and storing the
component's response from the registers to memory.
The component under test is a shifter and the operation tested by this
portion of assembly code is the right logical shift of the shifter. Register R2
contains the original value to be right-shifted (the first part of the test pattern to
be applied to the shifter) and register R3 contains the number of positions for
right-shifting (the second part of the test pattern for the shifter). In our example,
both these values are derived from memory (base address contained in
29 When the immediate addressing mode is used, test patterns are actually stored in the
instruction memory, i.e. as part of the instructions.
30
We remind the reader that Ri is not the usual notation for registers in MIPS assembly
language; rather, the usual notation is $s0, $s1, $t0, $t1, etc. We use the Ri notation for
simplicity.
The first instruction is the load immediate (li) instruction 31, which loads
the register R6 with the test pattern value to be applied to the component
under test: register R6. The li instruction does not apply a test pattern
stored in data memory (as a lw instruction does) but a test pattern which
is stored in instruction memory (in the instruction itself). The second
instruction (sw) stores the content of register R6 (which is now the
component's test response) to a memory location addressed by the base
address in register R1 increased by the offset.
The same simplicity in self-test program development and application
does not apply to components other than computational and storage
functional components because (a) they are either not connected to
programmer-visible registers but are rather connected to special registers not
visible to the programmer; and/or (b) they cannot be directly accessed by
processor instructions for test pattern application.
Therefore, computational and storage functional components must be
given priority for self-test routine development since they can quickly
contribute to a very high fault coverage for the entire processor with simple
self-test routines. If we combine this second criterion of component
accessibility and ease of testing with the previous criterion of relative
component size, we can see that functional components are very important
for self-test program development in a low-cost software-based self-testing
approach. The third criterion, discussed in the following subsection, further
supports this importance.
5.5.3
31
Actually, the li instruction is not a real machine instruction of the MIPS architecture but
rather an assembler pseudo-instruction (also called a macro) that the assembler decomposes
into two instructions: lui (load upper immediate) and ori (or immediate). Load upper
immediate loads a 16-bit quantity into the high order (most significant) half of the register
and ori loads the low order 16 bits of it. Therefore, the instruction li R6, test-pattern is translated to:
lui R6, high-half
ori R6, R6, low-half
where the test pattern is the concatenation of high-half and low-half.
We remark at this point that the criteria described in this and the previous
subsections are only meant to be used to prioritize the importance of
processor components for test development and are not in any sense
statements that are absolutely true in any processor architecture and
implementation. These criteria are good indications that some components
must be given priority over others for a low-cost development of self-test
routines.
Concluding with this third criterion, we mention that when components other
than functional components are being tested, for example
control components, it is very unlikely that other components are being
sufficiently tested as well. For example, when executing a self-test program
for the processor instruction decode component, what is necessary is to pass
through all (or most) of the different instructions of the processor. When
such a self-test program is executed, only a few of its instructions also detect
some faults in a computational functional unit, like an adder, which requires
a variety of data values to be applied to it. Therefore, sufficient
testing of the decoder component does not give, as a side effect, sufficient
fault coverage for the adder component or other functional components. On
the contrary, when sufficient fault coverage is obtained by a self-test routine
for the adder component then, in parallel, the decode unit is also sufficiently
tested in the part dedicated to the decoding of the addition-related
instructions.
In the global view, when separate self-test routines have been developed
for all the processor's functional units (or at least for the computational and
the storage components), then the other components (control components like
the instruction decode and instruction sequencing components, and
hidden components like the pipeline registers, multiplexers and control
logic) are also very likely to be sufficiently tested. The opposite is
not true: when self-test routines have been developed targeting the control
components or the hidden components, the functional components are
not sufficiently tested as well, simply because the variety of data required to
test them is not included in the self-test routines for the control and hidden
components.
Having described the criteria for the prioritization of the processor
components for self-test routine development, we elaborate in the next two
subsections on the identification and selection of component operations to be
tested, as well as on the selection of appropriate operands to test the selected
operations with processor instructions.
5.6
Figure 5-14: ALU component of the MIPS-like processor (inputs: operands and operation; output: result).
The set of operations OALU of the ALU is:

OALU = { add, subtract, and, or, nor, xor, set_on_less_than_unsigned, set_on_less_than_signed }
For the development of a self-test routine for the ALU, one instruction I
from each set IALU,o (eight different sets for the eight different operations of
the ALU) is needed to apply data operands for each operation o.
For example, the set IALU,NOR contains only one processor instruction and thus
the selection is straightforward. Only the following instruction can be used
to test the NOR operation of the ALU:
nor Rd, Rt, Rs
The sets of instructions IALU,OR, IALU,XOR, IALU,AND, IALU,SUBTRACT,
IALU,SET_ON_LESS_THAN_UNSIGNED and IALU,SET_ON_LESS_THAN_SIGNED all consist of two
instructions: one in the R-type (register) format of MIPS, where the operation
is applied between two registers of the processor, and the other in the I-type
(immediate) format of MIPS, where the operation is applied between a
register and an immediate operand.
The instructions in the I-type format have less controllability than the
instructions in the R-type format. Thus, the instructions in the R-type format
must be selected because they provide the best controllability and
observability characteristics due to the use of the fully controllable and
observable general purpose registers of the register file. Therefore, from
these sets, the following instructions will be selected to test the
corresponding operations of the ALU.
or Rd, Rt, Rs
xor Rd, Rt, Rs
and Rd, Rt, Rs
sub Rd, Rt, Rs
subu Rd, Rt, Rs
slt Rd, Rt, Rs
sltu Rd, Rt, Rs
In this case, the selected instruction would be either of the first two
instructions listed above that belong to the R-type format, because they
possess the best controllability and observability since they use the general
purpose registers of the register file of the processor.
5.7
Operand selection
add
subtract
and
or
nor
xor
set_on_less_than_unsigned
set_on_less_than_signed
We see in Table 5-1 that each operation excites a different part of the
ALU component (in this case, the component being an ALU, some
operations excite the arithmetic part and some others excite the logic part of it).
Different sets of test patterns are required to excite and detect the faults in
these different parts of the component.
Appropriate selection of component-level test patterns is an essential
factor for the successful development of self-test routines for components. In
this subsection the focus is on this aspect of software-based self-testing:
operand selection.
In the case that a gate-level model of the component is available (or can
be obtained from synthesis) and combinational or sequential ATPG succeeds
in producing a test set of sufficient fault coverage (with or without the
assistance of constraint extraction), the result is a set of k component test
patterns:
atpg-test-pattern-1
atpg-test-pattern-2
...
atpg-test-pattern-k
The two important properties of the set of k test patterns for self-test
development are:
atpg-tests-no-loop:
load register(s) with immediate pattern 1
apply instruction I
store result(s) to memory
load register(s) with immediate pattern 2
apply instruction I
store result(s) to memory
...
load register(s) with immediate pattern k
apply instruction I
store result(s) to memory
In the loop-based application of test patterns, the patterns occupy part of the data
segment (thus data memory) of the self-test program as variables. In the
second case, the test patterns are not stored in variables (data memory) but
rather occupy part of the code segment (instruction memory) of the self-test
program.
As an example, consider that the functional component under test is a
binary subtracter, that a gate-level netlist of the component is available, and
that the ATPG has generated a test set consisting of k test patterns. When the
ATPG-based test patterns are applied in a loop from data memory, the
following MIPS assembly language code gives an idea of how the k test
patterns can be applied to the subtracter 32.
test-subtracter: andi R4, R4, 0
next-test:       lw   R2, xstart(R4)
                 lw   R3, ystart(R4)
                 sub  R1, R2, R3
                 sw   R1, rstart(R4)
                 addi R4, R4, 4
                 slti R5, R4, 4*k
                 bne  R5, R0, next-test
32 andi, addi
counts the number of test patterns times four 33 to form the correct addresses
of data memory for loading the test patterns applied to the subtracter and
storing the test response. Register R4 also controls the number of repetitions
of the loop (k repetitions in total). Registers R2 and R3 are loaded with the
next pattern in each repetition and the result of the subtraction is put by the
subtracter in the R1 register. The R1 register, which contains the test response
of the subtraction, is finally stored to an array of consecutive memory
locations starting at address rstart, as shown in the code. At the end of
each loop iteration, the counter R4 is incremented by 4 (one word) and a check is
performed to see if the k test patterns have been exhausted. If this is not the
case, the loop is repeated.
The self-test code above consists of eight instructions (words) and 2k
words storing the k two-word test patterns (one word for operand X and one
word for operand Y). The execution time of this routine depends on the
number k of test patterns to be applied to the subtracter. The exact execution
time and number of clock cycles depend on whether the processor
implementation is pipelined or not and also on the latency of memory read
and write cycles.
For simplicity, let us consider a non-pipelined processor. If we assume
that each instruction executes in one clock cycle apart from memory reads
and writes, which take 2 clock cycles, then a rough estimate of the number
of clock cycles required for the completion of the above self-test routine is 10k
clock cycles 34.
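The 10k estimate can be cross-checked by tallying the per-iteration cycle cost of the loop body, as in the Python sketch below (our own tally, under the same 1-cycle/2-cycle assumptions; the value of k is arbitrary):

```python
# Cycle estimate for the loop-based subtracter routine on a non-pipelined
# processor: memory instructions (lw, sw) take 2 clock cycles, every
# other instruction takes 1.
CYCLES = {"lw": 2, "sw": 2, "sub": 1, "addi": 1, "slti": 1, "bne": 1}

# One iteration of the loop body from the routine above.
loop_body = ["lw", "lw", "sub", "sw", "addi", "slti", "bne"]
cycles_per_pattern = sum(CYCLES[op] for op in loop_body)

k = 64                                   # number of ATPG test patterns
total_cycles = k * cycles_per_pattern    # initial andi ignored in the estimate
print(cycles_per_pattern, total_cycles)  # 10 cycles per pattern, 10k in total
```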
Figure 5-15 presents in a visual way the application of ATPG-generated
test patterns from a data memory array using load word instructions. Two
instructions are required to load the registers with the test patterns. The test
vectors are then applied by the subtract instruction to the subtracter
component, as the code above shows.
33
In the case of MIPS, we consider words of 4 bytes and hence the addresses of sequential
words differ by 4.
34
In a pipelined implementation, one instruction will be completed in each cycle but with a
smaller clock period. Pipeline and memory stalls will increase the execution time of each loop.
Figure 5-15: ATPG test patterns application from memory (test patterns stored as words in data memory).
The alternative way to implement the self-test code for the application of
the k test patterns to the subtracter is to use the immediate operand
addressing mode that all (or most) processor instruction set architectures
include. According to the immediate addressing mode, an operand is part of
the instruction. The MIPS architecture uses 32-bit instructions in which
the immediate operand can occupy a 16-bit part of the instruction. For
example, in the immediate addressing mode instruction:
andi Rt, Rs, Imm
Therefore, for the application of the k test patterns to the subtracter the
following self-test routine can be applied.
test-subtracter: andi R4, R4, 0
                 lui  R2, xtest-1-upper
                 ori  R2, R2, xtest-1-lower
                 lui  R3, ytest-1-upper
                 ori  R3, R3, ytest-1-lower
                 sub  R1, R2, R3
                 sw   R1, rstart(R4)
                 addi R4, R4, 4
                 lui  R2, xtest-2-upper
                 ori  R2, R2, xtest-2-lower
                 lui  R3, ytest-2-upper
                 ori  R3, R3, ytest-2-lower
                 sub  R1, R2, R3
                 sw   R1, rstart(R4)
                 addi R4, R4, 4
                 ...
                 lui  R2, xtest-k-upper
                 ori  R2, R2, xtest-k-lower
                 lui  R3, ytest-k-upper
                 ori  R3, R3, ytest-k-lower
                 sub  R1, R2, R3
                 sw   R1, rstart(R4)
                 addi R4, R4, 4
Figure 5-16: ATPG test patterns application with immediate instructions.
frequency of the processor is 200 MHz (see the calculations given in Section
5.2.3). The number of test responses is k.
Coding style                  Download size  Responses  Execution  Test application
                              (words)        (words)    (cycles)   time
ATPG-based, loop from memory  2k+8           k          10k        0.11k + 0.16 μsec
ATPG-based, with immediate    7k             k          8k         0.20k μsec

Table 5-2: ATPG-based self-test routines test application times (case 1).
We can see in Table 5-2 that in this case, where the processor is faster
than the tester by a reasonable factor (the processor is 4 times faster than the
tester), the loop-based application of ATPG-based patterns from data
memory is about two times faster in test application time compared to the
immediate addressing mode application of the same k test patterns.
If, on the other hand, the tester is ten times slower than the processor chip
(consider for example a tester of 100 MHz frequency used for a 1 GHz
processor), then the difference between the two approaches becomes more
significant, as Table 5-3 shows.
Coding style                  Download size  Responses  Execution  Test application
                              (words)        (words)    (cycles)   time
ATPG-based, loop from memory  2k+8           k          10k        0.04k + 0.08 μsec
ATPG-based, with immediate    7k             k          8k         0.088k μsec

Table 5-3: ATPG-based self-test routines test application times (case 2).
execution time of the program is excessively high (this is the case for both
the loop-based approach and the immediate addressing mode approach).
We have to note at this point that the clock cycles that the load word and
store word instructions need (the CPI, clocks per instruction) have an impact
on the exact values of the previous Tables, which are only indicative. Also,
the self-test routines' execution time is affected by the existence or not of a
pipeline structure.
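The entries of Tables 5-2 and 5-3 can be reproduced with the simple cost model sketched below: each downloaded or uploaded word costs one tester cycle and each executed instruction cycle costs one processor cycle. The model and the helper names are our assumptions; the word and cycle counts are those of the tables.

```python
# Test application time (in microseconds) for the two coding styles:
# transfer words at the tester frequency, execute at the processor frequency.
def app_time_us(words, cycles, f_tester_mhz, f_proc_mhz):
    return words / f_tester_mhz + cycles / f_proc_mhz

def loop_based(k, f_tester_mhz, f_proc_mhz):
    # Download 2k pattern words + 8 program words, upload k responses,
    # execute 10k cycles.
    return app_time_us(3 * k + 8, 10 * k, f_tester_mhz, f_proc_mhz)

def immediate(k, f_tester_mhz, f_proc_mhz):
    # Download the 7k-word program, upload k responses, execute 8k cycles.
    return app_time_us(8 * k, 8 * k, f_tester_mhz, f_proc_mhz)

# Case 1: 200 MHz processor, 50 MHz tester, k = 100 patterns.
print(round(loop_based(100, 50, 200), 2))    # 11.16 usec = 0.11k + 0.16
print(round(immediate(100, 50, 200), 2))     # 20.0 usec = 0.20k
# Case 2: 1 GHz processor, 100 MHz tester.
print(round(loop_based(100, 100, 1000), 2))  # 4.08 usec = 0.04k + 0.08
print(round(immediate(100, 100, 1000), 2))   # 8.8 usec = 0.088k
```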
5.7.2
³⁶ The hardware-based self-testing scheme where test patterns are applied from the memory does not apply to pseudorandom testing because of the large number of test patterns.
In the code above, we assume that the test response is collected in R1 and
then stored to the responses array. The LFSR routine can be called as many
times as necessary for the specific operation being tested (usually two times
for two-operand operations, so that the first call generates the next
random value of one operand and the second call generates the random
value of the other operand).
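The LFSR routine itself can be emulated in software with shifts and exclusive-or operations. The sketch below uses a Fibonacci-style LFSR; the register width, tap positions and seed are illustrative choices of ours, not those of any specific routine in the book.

```python
# Software emulation of a Fibonacci LFSR: each call returns the next
# pseudorandom value. Width and taps are illustrative; taps (16, 14, 13, 11)
# correspond to a commonly used maximal-length 16-bit polynomial.
def lfsr_step(state, width=16, taps=(16, 14, 13, 11)):
    bit = 0
    for t in taps:                    # XOR the tapped bits ...
        bit ^= (state >> (t - 1)) & 1
    # ... and shift the new bit in, keeping `width` bits of state.
    return ((state << 1) | bit) & ((1 << width) - 1)

# Two calls per two-operand operation: one per pseudorandom operand.
state = 0xACE1                        # any non-zero seed
x = state = lfsr_step(state)
y = state = lfsr_step(state)
```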
We mention at this point that all code examples given in this Chapter
using the MIPS instruction set may be applied, with suitable modifications,
to other processor architectures.
5.7.3
³⁷ See, for example, the works on multiplier testing [53], [54], [95], detailed in a few pages.
Regular, deterministic test sets consist of test patterns that have a relation
between them, which makes them easy to generate on-chip (either by
hardware or by software). Each test pattern of the test set can be derived
from the previous one by a simple operation, such as a shift or a bit
complementation of the previous pattern.
One may think that pseudorandom tests are generated in a similar way
(LFSR emulation consists of shifts and exclusive-or operations). The
difference between the two approaches is that pseudorandom testing requires
the generation of a large number of test patterns, while regular, deterministic
testing relies on a small test set.
Regular, pre-computed, deterministic test sets combine the positive
properties of ATPG-based and pseudorandom-based test development for
processor components. Therefore, in cases when ATPG-based test
development is not able to obtain high fault coverage with very few test
vectors, and also in cases when pseudorandom test development is not able
to obtain high fault coverage within a reasonable amount of time (clock
cycles), regular pre-computed sets of test patterns are the solution. They reach
high fault coverage levels with a number of test patterns (and thus clock
cycles) much smaller than pseudorandom testing and a bit larger than ATPG-based
test development. On-chip generation of regular, deterministic test sets
is as easy as pseudorandom test development, while the self-test program size
is reasonably small.
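A sketch of such on-chip generation is shown below. The derivation rule used (rotate the previous pattern left by one bit, complementing it on alternate steps) is a made-up example of a "simple operation" in the spirit of the text, not a published test set.

```python
# Generate a small, regular, deterministic test set in which each pattern
# is derived from the previous one by a simple operation. The rule here
# (rotate left by one, complement on alternate steps) is illustrative only.
WIDTH = 8
MASK = (1 << WIDTH) - 1

def rotate_left(pattern):
    return ((pattern << 1) | (pattern >> (WIDTH - 1))) & MASK

def regular_test_set(seed, n):
    patterns = [seed & MASK]
    for i in range(1, n):
        nxt = rotate_left(patterns[-1])
        if i % 2:                     # complement every other pattern
            nxt = ~nxt & MASK
        patterns.append(nxt)
    return patterns

print(regular_test_set(0b00000001, 4))  # [1, 253, 251, 8]
```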
(Table: comparison of the test development approaches in terms of fault coverage, self-test code size and test application time.)

5.7.4
Now that we have analyzed and discussed the different test development
approaches for the processor components, the question to be answered is:
which one of the approaches should be selected for a particular component of a
processor? The answer may seem straightforward after the long analysis, but
it is always useful to summarize.
The actual fault coverage obtained in the pseudorandom case can only be
calculated if a gate-level netlist is available, but the test development phase
(the development of the self-test routines) can be done without any gate-level
information available.
We select the regular, pre-computed pattern based, self-test routine
development approach for processor components when:
there are known, effective, pre-computed test sets for this type of
component (even if its gate-level netlist is not available);
a gate-level netlist of the component is not available or cannot be
obtained by synthesis;
5.8
5.8.1
5.8.2
5.8.3
For example, Figure 5-17 shows part of the pipeline logic where a 3-input
multiplexer is used at the A input of the ALU to select from three
different sources. Signals A1, A2, A3 are connected to the ALU input A
depending on the forwarding path activated in the processor's pipeline (if
such a path is activated at this moment). A successful self-test program
generation process must guarantee (if RTL or structural information of the
processor is available) that all paths via the multiplexer (A1 to A, A2 to A,
A3 to A) are activated by corresponding instruction sequences, so that the
multiplexer component is sufficiently tested.
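When an RTL simulation of the self-test program can log which multiplexer source is selected at each cycle, a simple coverage check can flag forwarding paths that were never activated. The log format and names below mirror the A1-A3 example above and are hypothetical.

```python
# Coverage check for the forwarding multiplexer example: given a
# (hypothetical) simulation log of the source selected for ALU input A
# at each cycle, report the paths the self-test program never activated.
MUX_PATHS = {"A1", "A2", "A3"}

def unexercised_paths(selection_log):
    return sorted(MUX_PATHS - set(selection_log))

log = ["A1", "A1", "A3", "A1"]   # A2 -> A never selected in this run
print(unexercised_paths(log))    # ['A2']
```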
As we see in the experimental results of the next Chapter, the pipeline
structures of the processors are relatively easily tested while self-test
routines are being applied to the other processor components. Of course, we
point out that the publicly available benchmark processors we use may have
a relatively simple pipeline structure compared with high-end modern
microprocessors or embedded processors. Unfortunately, the unavailability of
such complex commercial processor models makes the evaluation of
software-based self-testing on them difficult. Fortunately, software-based
self-testing is attracting the interest of major processor manufacturers and it
is likely that results will become available in the near future. This way, the
potential limitations (if any) of software-based self-testing may be revealed.
Moreover, software-based self-testing must be evaluated for other
performance-improving mechanisms such as the branch prediction units of
processors. In such cases, due to the inherent self-correcting nature of these
mechanisms, faults inside them are not observable at all at the processor
outputs; only performance degradation is expected. Performance measuring
mechanisms can possibly be used for the detection of faults in such units,
and research in this topic is expected to gain importance in the near future.
5.9
(Figure: the self-test routines of the processor under test produce component test responses in data memory; a global compaction routine compacts these data to produce the final signature(s).)
5.10
reduce the self-test program download time from the low-cost
external tester;
reduce the self-test program execution time;
5.10.1
(Figure: chained testing of the adder and the shifter; the adder response becomes the shifter test pattern applied by Test Instruction 2.)
A sufficient test set for the shifter can be generated at the outputs
of the adder. In other words, appropriate inputs must be supplied
to the adder so that they test the adder itself and also provide a
sufficient test set for the shifter.
The shifter does not mask the propagation of errors at the adder
outputs (caused by faults in the adder) towards the primary outputs
of the processor.
Let us consider that, if separate self-test routines are developed for the
two components, adder and shifter, each of them consists of a basic loop that
applies a set of test patterns to its component. Let us assume also that the
test set for the adder applies 70 test patterns to it (this can be the case for a
carry lookahead adder) and that the test set for the shifter applies 50 test
patterns to it. Also, let us consider that the basic loop of the self-test routine
for the adder executes each iteration in 30 clock cycles, and that the basic
loop of the self-test routine for the shifter also executes each iteration in 30
clock cycles. Therefore, a self-test program that uses these two routines one
after the other will execute in approximately 70 x 30 + 50 x 30 = 3,600 clock
cycles. Applying the "chained" testing optimization technique will most
probably lead to a combined loop that applies to the adder and shifter a total
of 80 test patterns and executes each iteration in 31 clock cycles (one more
instruction is added to the loop). The total number of test patterns is larger
since it may be necessary to "expand" the set of adder tests so that their
responses produce a sufficient test set for the shifter, which is tested
subsequently. Moreover, the larger loop execution time for each iteration
(31 instead of 30) is due to the fact that an extra instruction is added to the
loop for the application of the test pattern to the shifter.
In the combined, "chained" routine, the total execution time of the combined
loop will be 80 x 31 = 2,480 clock cycles. The numbers used in this small
analysis are picked to show the situation in which the optimization technique
can lead to more efficient self-test code. There may be, of course,
situations where the optimized code is not better than the original. For
example, this may be the case when the total iterations of the combined loop
are too many. This may be caused by the inability to reasonably "expand"
the adder test set so that a sufficient test set for the shifter is produced at the
adder outputs. If, in our simple adder/shifter example, the total number of
combined loop iterations is 120 instead of 80, then the number of clock
cycles for the new loop will be 120 x 31 = 3,720, which is larger than the
original back-to-back execution of the two component routines.
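The break-even arithmetic of this example can be captured in a few lines; the figures are the illustrative ones used above.

```python
# Clock-cycle comparison of back-to-back versus "chained" self-test loops,
# using the illustrative adder/shifter figures from the text.
def separate_cycles(patterns_a, loop_a, patterns_b, loop_b):
    return patterns_a * loop_a + patterns_b * loop_b

def chained_cycles(combined_patterns, combined_loop):
    return combined_patterns * combined_loop

print(separate_cycles(70, 30, 50, 30))  # 3600: adder then shifter routine
print(chained_cycles(80, 31))           # 2480: chaining pays off
print(chained_cycles(120, 31))          # 3720: excessive pattern expansion
```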
The following pseudocodes show how "chained" component testing
can be applied. First, two separate self-test routines for components C1 and
C2 are given. Then, the combined routine for the "chained" testing of the two
components is shown. In this example we assume that the components are
both originally tested with ATPG-based test patterns applied in a loop from
data memory.
atpg-loop-C1:
    load register(s) with next test pattern for C1
    apply instruction IC1
    store result(s) to memory
    if applied-patterns < K1
        repeat atpg-loop-C1

atpg-loop-C2:
    load register(s) with next test pattern for C2
    apply instruction IC2
    store result(s) to memory
    if applied-patterns < K2
        repeat atpg-loop-C2

atpg-loop-C1-C2:
    load register(s) with next test pattern for C1
    apply instruction IC1
    apply instruction IC2
    store result(s) to memory
    if applied-patterns < max(K1, K2)+m
        repeat atpg-loop-C1-C2
components is stored to the memory and not used as a test pattern for
another component.
(Figure: the same test pattern is applied by Test Instruction 1 to the adder and by Test Instruction 2 to the subtracter; both responses are stored to memory.)
5.11
(Figure labels: automatic generation of component self-test routines; automatic generation and optimization of processor self-test programs.)
Figure 5-22: Software-based self-testing automation.
Chapter 6
Case Studies - Experimental Results
6.1
modes for memory operands. Considering that each addressing mode leads
to different instructions, Parwan's instruction set consists of 24 different
instructions.
The Parwan processor model is available in VHDL synthesizable format
and its architecture includes the components shown in Table 6-1. The
classification of each component into the classes described in the previous
Chapter is also shown in Table 6-1.
Component Name
Component Class
Functional computational
Functional computational
Functional storage
Control
Control
Control
Control
Control
Out of the eight processor components only the Arithmetic Logic Unit
and the Shifter are combinational circuits and also the only functional
computational components. It should also be noted that the only processor
data register is the accumulator. This single data processor register is fully
accessible in terms of controllability and observability by processor
instructions.
We have synthesized Parwan from its VHDL source description and the
resulting circuit consists of 1,300 gates including 53 flip-flops.
6.1.1
The Parwan processor components that have been targeted for self-test
program development are the ALU, the Shifter and the Status Register. Self-test
routine development has been done in three phases: Phase A for the
ALU, Phase B for the Shifter and Phase C for the Status Register. We have
selected this sequence because the third functional unit of the processor, the
Accumulator, is already sufficiently tested after Phase A.
Table 6-2 shows the statistics of the self-test code for the three
consecutive phases A, B and C.
                                  Phase B          Phase C
                                  target:          target:
                                  Shifter          Status Register
Number of Instructions            440              463
Self-Test Program Size (bytes)    881              923
Response Data Size (bytes)        122              124
Execution Time (cycles)           16,545           16,667
Table 6-3 shows the fault coverage for single stuck-at faults for each of
the Parwan processor components after each of the three phases.
Component Name    Fault Coverage    Fault Coverage    Fault Coverage
                  after Phase A     after Phase B     after Phase C
                  98.31%            98.48%            98.48%
                  75.56%            93.82%            93.82%
                  98.67%            98.67%            98.67%
                  87.05%            88.10%            88.10%
                  92.13%            92.13%            92.13%
                  97.22%            97.22%            97.22%
                  98.26%            98.26%            98.26%
                  82.93%            85.52%            87.68%
Total CPU         88.70%            91.10%            91.34%
6.2
Component Name           Component Class
Register File            Functional storage
Multiplier               Functional computational
Divider                  Functional computational
Arithmetic-Logic Unit    Functional computational
Shifter                  Functional computational
Memory Control           Control
Program Counter Logic    Control
Control Logic            Control
Bus Multiplexer          Functional interconnect
Pipeline                 Hidden
Component Name           Gate Count
Register File            9,906
Multiplier/Divider       3,044
Arithmetic-Logic Unit    491
Shifter                  682
Memory Control           1,112
Program Counter Logic    444
Control Logic            223
Bus Multiplexer          453
Pipeline                 885
Total CPU                17,459
Component Name           Gate Count
Register File            9,905
Multiplier/Divider       11,601
Arithmetic-Logic Unit    491
Shifter                  682
Memory Control           1,119
Program Counter Logic    444
Control Logic            230
Bus Multiplexer          453
Pipeline                 885
Total CPU                26,080
Component Name           Gate Count
Register File            11,905
Multiplier/Divider       13,358
Arithmetic-Logic Unit    900
Shifter                  834
Memory Control           1,163
Program Counter Logic    493
Control Logic            361
Bus Multiplexer          623
Pipeline                 961
Total CPU                30,896
6.2.1
Component Name           Fault Coverage for Design I
Register File            97.7%
Multiplier/Divider       87.5%
Arithmetic-Logic Unit    96.6%
Shifter                  98.4%
Memory Control           88.3%
Program Counter Logic    53.1%
Control Logic            78.9%
Bus Multiplexer          65.7%
Pipeline                 91.9%
Total CPU                92.2%

Table 6-8: Fault simulation results for the Plasma processor Design I.
Component Name           Self-Test Routine    Execution Time
                         Size (words)         (cycles)
Register File            319                  582
Parallel Multiplier      28                   3,122
Serial Divider           41                   1,154
Arithmetic-Logic Unit    79                   275
Shifter                  210                  340
Memory Control           76                   160
Control Logic            100                  164
Total CPU                853⁴¹                5,797

Table 6-9: Self-test routine statistics for Designs II and III of Plasma.
⁴¹ In Design I with the serial multiplier, the total program was 965 words and the total cycles were 3,552, because the serial multiplier needs a larger test program that is executed for more clock cycles than the one for the parallel multiplier.
We notice in the self-test routine statistics of Table 6-9 that there exist
components with very small self-test routines, such as the multiplier and the
divider, whose routines take a very large percentage of the overall test
execution time. This is because these routines consist of small, compact
loops that are executed a large number of times and apply a large number
of test patterns to the component they test. On the other hand, there are
components like the Shifter with a relatively large self-test code which is
executed very fast, because the code does not contain loops but rather
applies a small set of test patterns using immediate addressing mode
instructions for all shifter operations. The self-test routine for the Arithmetic-Logic
Unit consists of segments for every ALU operation that combine small
compact loops and immediate addressing mode instructions. The self-test
routine of the Memory Control consists of load instructions with specific
data previously stored in data memory, and store instructions that generate
the final responses in data memory as well. Finally, the self-test routine of
the Control Logic component is based on an exhaustive application of all the
processor's instruction opcodes not already applied in the routines of the
previous components. This functional testing approach at the end of the
methodology is very naturally applied to the control unit of the processor.
At this point we remark that a similar self-test routine development
strategy has been adopted for the remaining benchmark processors, for their
components that are similar to the components of Plasma/MIPS.
The fault simulation results for the two designs of Plasma/MIPS, Designs
II and III, are shown in Table 6-10.
Component Name           Fault Coverage    Fault Coverage
                         for Design II     for Design III
Register File            97.8%             97.8%
Multiplier/Divider       96.3%             95.2%
Arithmetic-Logic Unit    96.8%             95.8%
Shifter                  98.4%             95.3%
Memory Control           87.9%             90.3%
Program Counter Logic    54.9%             55.9%
Control Logic            89.3%             85.3%
Bus Multiplexer          71.8%             71.3%
Pipeline                 98.4%             96.0%
Total CPU                95.3%             94.5%

Table 6-10: Fault simulation results for Designs II and III of Plasma.
We see that in the two implementations with the parallel multiplier the
overall fault coverage is higher than in the case of Design I, which contains a
serial implementation of the multiplier. This fault coverage increase is
simply due to the fact that a large component like the parallel multiplier, with
very high testability, has been inserted in the processor. The same design
change could be done for the division operation and another large
component, the parallel divider, could be added. This would lead to a further
increase of the processor's fault coverage. We did not implement this design
change because the division operation is not as common as multiplication,
and therefore the cost of adding a large parallel component that is
infrequently used is not justified in low-cost applications. Of course, in a
special implementation of the processor for an application with many
divisions, the inclusion of a parallel divider will lead to increased
performance of the processor, as well as increased fault coverage
obtained by software-based self-testing. In such a case, where a parallel
divider is also implemented, fault coverage can be as high as 98% for single
stuck-at faults.
The overall processor fault coverage is, in both designs, very high (more
than 94.5%), while the particular fault coverage levels of each component
may slightly differ because of the different synthesis optimization
parameters that lead to a different gate-level implementation of the
component.
We also note that the pipeline logic is tested as a side-effect of testing the
remaining processor components, achieving very high fault coverage. This
fact is due to the simple pipeline structure that Plasma realizes.
Component Name           Gate Count
Register File            11,558
Multiplier/Divider       11,654
Arithmetic-Logic Unit    558
Shifter                  636
Memory Control           1,120
Program Counter Logic    449
Control Logic            244
Bus Multiplexer          431
Pipeline                 876
Total CPU                27,824
Component Name           Fault Coverage    Fault Coverage
                         for Design II     for Design IV
Register File            97.8%             97.8%
Multiplier/Divider       96.3%             96.1%
Arithmetic-Logic Unit    96.8%             97.5%
Shifter                  98.4%             99.9%
Memory Control           87.9%             88.5%
Program Counter Logic    54.9%             54.9%
Control Logic            89.3%             88.3%
Bus Multiplexer          71.8%             72.0%
Pipeline                 98.4%             96.9%
Total CPU                95.3%             95.3%
We note that the total fault coverage for the processor is exactly the same:
95.3% of all single stuck-at faults. For each of the components of the
processor, fault coverage results may slightly differ, up to a maximum of 1.5%
of the component's faults (shifter, pipeline). These small differences are due
to the different cells that each implementation library contains. The same
self-test program reaches slightly different structural fault coverage in each
of the processor components.
Some useful conclusions can be drawn from this first application of
software-based self-testing to a reasonable-size processor model.
6.3
Component Name           Component Class
Register File            Functional storage
Multiplier/Divider       Functional computational
ALU                      Functional computational
Shifter                  Functional computational
Hi-Lo Registers          Functional storage
Controller               Control
Data Memory Controller   Control
Program Counter Logic    Control
Instruction register     Hidden
Pipeline registers       Hidden

Table 6-13: Meister/MIPS processor components.
Component Name           Gates Count
Register File            11,414
Multiplier/Divider       12,564
ALU                      658
Shifter                  633
Hi-Lo Registers          536
Controller               2,352
Data Memory Controller   1,086
Program Counter Logic    644
Instruction register     275
Pipeline registers       5,693
Total CPU                37,402
6.3.1
Component Name           Self-Test Routine    Execution Time
                         Size (words)         (cycles)
Register File            720                  859
Parallel Multiplier      68                   5,855
Serial Divider           65                   1,396
Arithmetic-Logic Unit    192                  1,188
Shifter                  378                  437
Hi-Lo Registers          30                   35
Control Logic            275                  291
Total CPU                1,728                10,061

Fault Coverage: 99.8%, 95.2%, 98.4%, 99.8%, 100.0%, 79.2%, 58.2%, 58.5%, 97.4%, 91.0%; Total CPU: 92.6%.
increased test program size (almost double) and test execution time (almost
double), and smaller fault coverage (92.6% compared to more than 94.5% for
Plasma). Otherwise (if the pipeline logic were completely implemented), the test
program statistics would have been very similar to the case of Plasma/MIPS
and the fault coverage would have been much higher.
6.4
Component Class
Functional storage
Functional computational
Functional computational
Control
Control
Control
Hidden
Component Name                                       Gate Count
Register File (REGS)                                 22,917
Integer Unit (IU)                                    5,698
Immediate Extender (IMM EXT)                         269
Memory Access Unit (MAU 1) for Instruction Memory    576
Memory Access Unit (MAU 2) for Data Memory           576
Control Logic (CONTROL)                              388
Pipeline Registers                                   3,771
Total CPU                                            43,208
The Jam processor benchmark gives us the ability to evaluate software-based
self-testing on a more complex RISC processor model with a fully
implemented pipeline architecture which realizes hazard detection and
forwarding.
We have developed self-test routines for the list of components shown in
Table 6-19. In this Table we can also see the self-test routine size and
execution time for each component. These routines together compose a self-test
program of 897 words executed in 4,787 clock cycles for the Jam
processor.
Component Name                                       Self-Test Routine    Execution Time
                                                     Size (words)         (cycles)
Register File (REGS)                                 478                  550
Integer Unit (IU)                                    147                  3,920
Immediate Extender (IMM EXT)                         32                   38
Memory Access Unit (MAU 1) for Instruction Memory    120                  135
Memory Access Unit (MAU 2) for Data Memory           120                  144
Total CPU                                            897                  4,787
We have not developed any special self-test routines for the pipeline
logic, since pipeline forwarding is already activated many times during the
execution of the self-test program, and the pipeline logic, multiplexers and
registers are sufficiently tested as a side-effect of testing the remaining
components. The fault simulation results for the Jam processor, after
evaluation of 454 responses in data memory, are shown in Table 6-20. The
achieved overall processor fault coverage is very high: 94% with respect to
single stuck-at faults.
Component Name                                       Fault Coverage
Register File (REGS)                                 98.1%
Integer Unit (IU)                                    98.9%
Immediate Extender (IMM EXT)                         98.5%
Memory Access Unit (MAU 1) for Instruction Memory    69.4%
Memory Access Unit (MAU 2) for Data Memory           81.7%
Control Logic (CONTROL)                              81.2%
Pipeline Registers                                   89.7%
Total CPU                                            94.0%

Table 6-20: Fault simulation results for Jam processor.
Component Class
Functional computational
Functional interconnect
Control
Functional storage
Functional storage
Control
Hidden

Gate Count
1,147
269
970
4,507
635
2,703
Total CPU: 10,305
The Arithmetic Logic Unit in the oc8051 processor core implements the
following integer operations: addition, subtraction, bitwise OR, bitwise
AND, bitwise XOR, shift and multiplication.
The ALU Source Select component selects the ALU input sources. The
Special Function Registers component contains 18 registers of one or two
bytes each (accumulator, B register, Program Status Word, Stack Pointer,
etc.). The Indirect Address component implements the indirect addressing
Component Name                       Self-Test Code    Self-Test Code
                                     Size (bytes)      Time (cycles)
Arithmetic Logic Unit (ALU)          1,452             2,964
ALU Source Select (ASS)              512               541
Decoder (DEC)                        548               598
Special Function Registers (SFR)     560               614
Indirect Address (INDI ADDR)         360               324
Memory Interface (MEM)               328               370
Total CPU                            3,760             5,411
Component Name                       Fault Coverage
Arithmetic Logic Unit (ALU)          98.4%
ALU Source Select (ASS)              97.1%
Decoder (DEC)                        81.5%
Special Function Registers (SFR)     96.2%
Indirect Address (INDI ADDR)         90.9%
Memory Interface (MEM)               89.9%
Total CPU                            93.1%
We see that for oc8051 as well, high fault coverage of 93.1% with respect
to single stuck-at faults is obtained with a relatively small and fast program.
6.6
Component Class
Functional computational
Functional storage
Functional computational
Control
Control
Control
Control
Hidden
Gate Count
127
1,513
406
777
645
185
294
650
Total CPU: 4,693
This smaller RISC processor benchmark has almost the same functional
components as larger 32-bit RISC processors, but for a smaller word length.
We can see that the functional components of all types in RISC-MCU
occupy a total of 43.60% of the entire processor area. This percentage is
much smaller than in the case of the larger 32-bit RISC processors described in
the previous sections. This is due to the fact that, although the control logic
components remain of the same order of magnitude for a small word length
(8 bits) or a large word length (32 bits), the datapath components that
perform the operations (computational functional components) and store the
data (storage functional components) are significantly larger for the 32-bit
word length. For example, the General Purpose Registers (GPR) component
of RISC-MCU occupies only 1,513 gates, while the corresponding register
files of Plasma, Meister and Jam occupy more than 9,000, 11,000 or even
22,000 gates (Jam).
6.6.1
Component Name    Self-test Code    Self-test Code
                  Size (words)      Time (cycles)
                  360               410
                  510               674
                  128               1,032
                  200               240
                  60                90
Total CPU         1,258             2,446

Fault Coverage: 99.2%, 99.0%, 98.4%, 91.1%, 72.5%, 91.4%, 74.0%, 92.4%; Total CPU: 91.2%.
We see that high fault coverage of 91.2% with respect to single stuck-at
faults is obtained with a relatively small and fast self-test program. The fault
coverage is lower than in the case of the larger 32-bit RISC processors due to
the smaller scale of the functional components.
6.7
The oc54x DSP core is a 16/32, dual-16-bit DSP core which is available
in synthesizable Verilog format [120]. Oc54x is an implementation of a
popular family of DSPs designed by Texas Instruments and is software
compliant with the original TI C54x DSP.
The oc54x DSP processor consists of the identifiable components shown
in Table 6-29. We can also see in the list below the characterization of each
of the components into the classes described in the previous Chapter.
Component Name                          Component Class                         Gates Count
Accumulator (ACC)                       Functional computational                687
Arithmetic Logic Unit (ALU)             Functional computational                3,145
Barrel Shifter (BSFT)                   Functional computational                1,682
Compare/Select and Store Unit (CSSU)    Functional interconnect                 444
Exponent Decoder (EXP)                  Functional computational                799
Multiply/Accumulate Unit (MAC)          Functional computational and storage    3,819
Temporary Register (TREG)               Functional storage                      130
Control                                 Control                                 905
Total CPU                                                                       11,611
We can see that the synthesis statistics of the oc54x DSP make it a very
suitable architecture for the application of software-based self-testing,
because of the existence of many large functional components. A total of
92.21% of the entire DSP area is occupied by functional units of all
subtypes. All of them are well-accessible units that can be easily targeted
with software-based self-testing routines.
6.7.1
Our DSP benchmark oc54x achieves very high fault coverage because it
consists of many large functional components and a small control logic.
We have developed self-test routines for the functional
components of oc54x shown in Table 6-31. In this list we can also see the
self-test routine size and execution time for each component and the overall
DSP. These routines together compose a self-test program of 1,558 words
executed in 7,742 clock cycles for the oc54x DSP benchmark.
Component Name                          Self-test Routine    Execution
                                        Size (words)         Time (cycles)
Accumulator (ACC)                       64                   78
Arithmetic Logic Unit (ALU)             102                  956
Barrel Shifter (BSFT)                   700                  778
Compare/Select and Store Unit (CSSU)    180                  264
Exponent Decoder (EXP)                  256                  340
Multiply/Accumulate Unit (MAC)          24                   5,020
Temporary Register (TREG)               112                  152
Control                                 120                  154
Total CPU                               1,558                7,742

Table 6-31: Self-test routines statistics for oc54x DSP.
The targeted functional components of the oc54x DSP are very classical
components of a DSP datapath, and the generation of self-test routines for
them is similar to that for the corresponding components of the previous
processor benchmarks. The fault simulation results after evaluation of 572 test
responses in data memory are shown in Table 6-32.
We see that very high fault coverage of 96.0% with respect to single
stuck-at faults is obtained with a relatively small and fast self-test program.
The very high fault coverage is achieved because the vast majority of the
processor area is occupied by its functional components.
Component Name                          Fault Coverage
Accumulator (ACC)                       99.2%
Arithmetic Logic Unit (ALU)             98.0%
Barrel Shifter (BSFT)                   99.4%
Compare/Select and Store Unit (CSSU)    89.0%
Exponent Decoder (EXP)                  91.1%
Multiply/Accumulate Unit (MAC)          98.9%
Temporary Register (TREG)               98.0%
Control                                 84.0%
Total CPU                               96.0%

Table 6-32: Fault simulation results for oc54x DSP.
6.8
Benchmark    Execution time without       Execution time with
             compaction (clock cycles)    compaction (clock cycles)
Plasma       5,797                        9,874
Meister      10,061                       23,865
Jam          4,787                        8,876
oc8051       5,411                        9,513
RISC-MCU     2,446                        5,122
oc54x        7,742                        12,893
6.9
Summary of Benchmarks
Benchmark Processor    Description                          HDL
Parwan                 8-bit, accumulator-based processor   VHDL
Plasma/MIPS            32-bit RISC processor                VHDL
Meister/MIPS           32-bit RISC processor                VHDL
Jam                    32-bit RISC processor                VHDL
oc8051                 8-bit microcontroller                Verilog
RISC-MCU               8-bit RISC microcontroller           VHDL
oc54x                  16/32-bit DSP                        Verilog
We remark that all the benchmarks we used in this Chapter are publicly
available and represent some very good efforts to implement common,
classic and popular instruction set architectures. The success of the
application of software-based self-testing to these benchmarks gives strong
evidence for the practicality and usefulness of the methodology, but this
success does not mean that the approach can be applied straightforwardly to
any commercial embedded processor, microprocessor or DSP, nor that
the same self-test programs will obtain the same fault coverage in the real,
commercial implementations of the same instruction set architectures.
Table 6-35 demonstrates the effectiveness of software-based self-testing
on each of the selected benchmark processors that can be used in practical
low-cost embedded systems. Table 6-35 summarizes for each benchmark:
the processor size in gate equivalents, the functional components percentage
with respect to the total processor area, the size of the self-test program, the
execution time of the self-test program (without responses compaction) and
the total single stuck-at fault coverage for the entire processor.
Benchmark     Gate      Functional     Self-test    Execution    Fault
              Count     Components     program      Time         coverage
                        Percentage     size         (cycles)
Plasma I      17,458    83.48%         965 w        3,552        92.2%
Plasma II     26,080    88.69%         853 w        5,797        95.3%
Plasma III    30,896    89.39%         853 w        5,797        94.5%
Plasma IV     27,824    89.26%         853 w        5,797        95.3%
Meister       37,402    68.99%         1,728 w      10,061       92.6%
Jam           43,208    66.85%         897 w        4,787        94.0%
oc8051        10,305    63.64%         3,760 b      5,411        93.1%
RISC-MCU      4,693     43.60%         1,258 w      2,446        91.2%
oc54x         11,611    92.21%         1,558 w      7,742        96.0%

(w = words, b = bytes)
From the contents of Table 6-35 we can draw the following conclusions
that outline the effectiveness of software-based self-testing on the selected
processor benchmarks.
The self-test program sizes are very small, in the range of just a
few kilobytes.
The execution time of the self-test programs is, in all cases, about
10,000 clock cycles or less.
High fault coverage is obtained in all cases, and the highest
figures are obtained for the benchmarks that have a larger
percentage of functional components.
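These observations can be cross-checked with a short script. The data below is transcribed from Table 6-35; the script only restates the trends discussed in the text:

```python
# Figures transcribed from Table 6-35:
# name: (gate count, functional components %, execution cycles, fault coverage %)
table_6_35 = {
    "Plasma I":   (17458, 83.48,  3552, 92.2),
    "Plasma II":  (26080, 88.69,  5797, 95.3),
    "Plasma III": (30896, 89.39,  5797, 94.5),
    "Plasma IV":  (27824, 89.26,  5797, 95.3),
    "Meister":    (37402, 68.99, 10061, 92.6),
    "Jam":        (43208, 66.85,  4787, 94.0),
    "oc8051":     (10305, 63.64,  5411, 93.1),
    "RISC-MCU":   ( 4693, 43.60,  2446, 91.2),
    "oc54x":      (11611, 92.21,  7742, 96.0),
}

# The benchmark with the largest functional-component percentage also reaches
# the highest fault coverage; the one with the smallest percentage, the lowest.
top_pct = max(table_6_35, key=lambda n: table_6_35[n][1])
top_cov = max(table_6_35, key=lambda n: table_6_35[n][3])
low_pct = min(table_6_35, key=lambda n: table_6_35[n][1])
low_cov = min(table_6_35, key=lambda n: table_6_35[n][3])
print(top_pct, top_cov)   # oc54x oc54x
print(low_pct, low_cov)   # RISC-MCU RISC-MCU

# Longest self-test execution time across all benchmarks (Meister):
print(max(v[2] for v in table_6_35.values()))   # 10061
```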
Chapter 7
Processor-Based Testing of SoC
7.1 The concept
tests to other cores of the SoC to which it has access and captures the core
responses, playing the role of an "internal" tester.
[Figure: SoC test setup with External Test Equipment and the processor's Memory.]
processor core, which will be supplied with the core test patterns and will
effectively apply them to the core at the actual operating speed of the SoC
(at-speed testing).
In other cases, an existing core may come with an autonomous testing
mechanism, such as hardware-based self-testing. This is a very common
situation in embedded memory cores, which usually contain a memory BIST
mechanism that applies a comprehensive memory test set to the core.
Memory BIST mechanisms do not occupy excessive silicon area and most
memory core providers deliver their cores with such embedded BIST
facilities. In this case, the only role assigned to the embedded processor in
software-based self-testing of the SoC is to start and stop (if necessary) the
memory BIST execution at the embedded memory core and capture the final
BIST signature for further evaluation. The availability of an autonomous
test mechanism in an embedded core is, of course, not restricted to the
case of memories.
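The processor's role in this memory-BIST scenario is little more than start, poll and compare. A minimal sketch, assuming a hypothetical memory-mapped register map (control, status and signature registers; a real memory BIST core defines its own interface):

```python
# Hypothetical register map of an embedded memory BIST engine.
BIST_CTRL, BIST_STATUS, BIST_SIG = 0x00, 0x04, 0x08
START, DONE = 0x1, 0x1

class MemoryBistEngine:
    """Simulation stand-in; on silicon these would be volatile device registers."""
    def __init__(self, signature=0xDEADBEEF):
        self.regs = {BIST_CTRL: 0, BIST_STATUS: 0, BIST_SIG: 0}
        self._sig = signature
    def write(self, addr, value):
        self.regs[addr] = value
        if addr == BIST_CTRL and value & START:   # starting the engine runs
            self.regs[BIST_STATUS] = DONE         # the whole march test here
            self.regs[BIST_SIG] = self._sig
    def read(self, addr):
        return self.regs[addr]

def run_memory_bist(engine, golden_signature):
    """What the embedded processor executes: start, poll, compare signature."""
    engine.write(BIST_CTRL, START)
    while not engine.read(BIST_STATUS) & DONE:
        pass                                      # poll until the engine finishes
    return engine.read(BIST_SIG) == golden_signature

print(run_memory_bist(MemoryBistEngine(), 0xDEADBEEF))   # True
```

The golden signature would normally be a constant embedded in the self-test program, computed once by simulating the fault-free memory BIST run.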
A serious factor that determines the efficiency of software-based self-testing
in SoC designs is the mechanism used for the communication between the
embedded processor and the external test equipment. Since this
communication is restricted to the frequency of the external test equipment,
it has an impact on the overall SoC test application time. If the self-test code
and self-test data/patterns/responses that must be transferred to the processor
memory are downloaded over a low-speed serial interface, the total
download time will be long and will seriously affect the overall test
application time. Improvements can be seen if a parallel downloading
mechanism is used, or, even better, if the external test equipment can
communicate with the processor's memory using a Direct Memory Access
(DMA) mechanism which does not interfere with the processor.
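The impact of the download interface is easy to quantify with back-of-the-envelope arithmetic; the program size, interface widths and tester frequency below are illustrative assumptions, not figures from any of the cited works:

```python
# Illustrative download-time comparison for a self-test program.
program_bytes = 4 * 1024              # ~4 KB, typical of the sizes in Chapter 6

def download_seconds(nbytes, bits_per_cycle, tester_mhz):
    """Time to shift nbytes into the processor memory at the tester's clock."""
    cycles = nbytes * 8 / bits_per_cycle
    return cycles / (tester_mhz * 1e6)

serial   = download_seconds(program_bytes, 1, 10)    # 1-bit serial, 10 MHz tester
parallel = download_seconds(program_bytes, 32, 10)   # 32-bit parallel, same clock
print(f"serial: {serial*1e3:.2f} ms, parallel: {parallel*1e3:.3f} ms")
```

With a DMA mechanism the same transfer additionally overlaps with processor execution, so its cost can largely disappear from the critical path of the test session.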
Apart from performing self-testing on the embedded cores of the SoC,
software-based self-testing can also be applied to the SoC buses and
interconnects. The overall performance of the SoC strongly depends on the
performance of the interconnections, and hence the detection of
crosstalk-related defects/faults in them must be achieved during
manufacturing testing. The use of software-based self-testing for crosstalk
testing has been studied in the literature, as we will see in the next section.
Apart from the "as-is" use of an embedded processor for the self-testing
of the cores and interconnects of a SoC, two other approaches may be used
(and have also been studied in the literature): (a) an existing embedded
processor may be modified to include additional instructions
(instruction-level DfT) to assist the testing task of the processor itself and
the SoC overall; (b) a new processor, dedicated to SoC testing, may be
synthesized and used in the SoC just for the application of the SoC test
strategy. In both scenarios (a) and (b) the objective is the reduction of the
overall test application time of the SoC.
Both solutions may be very efficient if they do not add excessive area and
performance overheads to the SoC architecture.
7.1.1
The fault models used for the different cores of the SoC may significantly differ. While
digital core testing is based on the single stuck-at fault model or a more comprehensive
structural fault model such as the path delay fault model, memory cores may be tested for
a variety of memory-related fault models.
7.2 Literature review
embedded processor to test the cores. Fault coverage results are provided in
the paper for the testing of ISCAS circuits used as example cores of the SoC.
C.-H.Tsai and C.-W.Wu proposed in 2001 [164] a processor-programmable
built-in self-test scheme suitable for embedded memory
testing in a SoC. The proposed self-test circuit can be programmed via an
on-chip microprocessor. Upon receiving the commands from the
microprocessor, the test circuit generates pre-defined test patterns and
compares the memory outputs with the expected outputs. Most popular
memory test algorithms can be realized by properly programming the
self-test circuit using the processor instructions. Compared with
processor-based memory self-testing schemes that use a test program to
generate test patterns and compare the memory outputs, the test time of the
proposed memory BIST scheme was greatly reduced.
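For contrast, the software baseline that [164] is compared against (a test program that generates the march patterns and compares the memory outputs itself) can be sketched as follows; this is a simplified, bit-oriented March C-, and word-oriented memories would repeat it with several data backgrounds:

```python
def march_c_minus(mem, n):
    """Simplified March C- over cells mem[0..n-1]; True if no fault observed.
    Elements: up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); down(r0)."""
    for i in range(n):                 # up: w0
        mem[i] = 0
    for i in range(n):                 # up: r0, w1
        if mem[i] != 0: return False
        mem[i] = 1
    for i in range(n):                 # up: r1, w0
        if mem[i] != 1: return False
        mem[i] = 0
    for i in reversed(range(n)):       # down: r0, w1
        if mem[i] != 0: return False
        mem[i] = 1
    for i in reversed(range(n)):       # down: r1, w0
        if mem[i] != 1: return False
        mem[i] = 0
    for i in reversed(range(n)):       # down: r0
        if mem[i] != 0: return False
    return True

class StuckAtZero(list):
    """Fault-injected memory: cell 5 ignores writes (stuck-at-0)."""
    def __setitem__(self, i, v):
        super().__setitem__(i, 0 if i == 5 else v)

print(march_c_minus([0] * 16, 16))               # True  (fault-free memory)
print(march_c_minus(StuckAtZero([0] * 16), 16))  # False (fault detected)
```

In the hardware scheme of [164] the same march elements are issued as commands to the self-test circuit, which removes the read-compare-branch overhead that dominates the software version's test time.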
C.Galke, M.Pflanz and H.T.Vierhaus proposed in 2002 [49] the concept
of designing a dedicated processor for the self-test of SoCs based on
embedded processors. A minimum-sized test processor was designed in
order to perform on-chip test functions. Its architecture contained specially
adapted registers to realize LFSR or MISR functions for test pattern
de-compaction and pattern filtering. The proposed test processor architecture is
scalable and based on a standard RISC architecture in order to facilitate the
use of standard compilers with it.
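The LFSR and MISR functions realized by such adapted registers are standard structures, and a software model of both fits in a few lines. The 16-bit Galois polynomial below (taps x^16 + x^14 + x^13 + x^11 + 1, mask 0xB400) is a well-known maximal-length example, not necessarily the one used in [49]:

```python
def lfsr_step(state, taps=0xB400):
    """One step of a 16-bit Galois LFSR (maximal-length taps 16, 14, 13, 11)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= taps
    return state

def misr_step(state, data_in, taps=0xB400):
    """MISR = LFSR step with parallel response data XORed in (compaction)."""
    return lfsr_step(state, taps) ^ (data_in & 0xFFFF)

# Pseudorandom pattern generation ...
state, patterns = 0xACE1, []
for _ in range(8):
    state = lfsr_step(state)
    patterns.append(state)

# ... and response compaction into a single signature.
sig = 0
for response in patterns:        # compacting the patterns themselves as a demo
    sig = misr_step(sig, response)
print(hex(sig))
```

In the test processor these operations are single register updates per clock; the software model above is only a functional reference.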
A.Krstic, W.-C.Lai, L.Chen, K.-T.Cheng and S.Dey presented in 2002
[101], [102] a review of the group's work in the area of software-based
self-testing for processors and SoC designs. The software-based self-testing
concept is presented in detail in these works and its advantages are clearly
outlined. The discussion covers self-testing of the embedded processor (for
stuck-at and delay faults), self-diagnosis of the embedded processor,
self-testing of buses and interconnects, self-testing of other SoC cores using the
processor, and also instruction-level DfT. The authors have worked on these
subtopics of software-based self-testing, and those related to processor
self-testing have been discussed in more detail in Chapter 4.
S.Hwang and J.A.Abraham discussed in 2002 [71] an optimal BIST
technique for SoC using an embedded microprocessor. The approach aims at
the reduction of memory requirements and test application time in
scan-based core testing using pseudorandom and deterministic test patterns. This
is achieved using a new test data compression technique that uses the embedded
processor of the SoC. Experimental results are given using Intel's x86
instruction set (programs are developed in C and a compiler is used for their
translation into assembly/machine language) on several ISCAS circuits.
Comparisons are also reported with the previous works of [62], [80], and the
superiority of the approach of [71] is shown in terms of the total test data
size of the three approaches on the ISCAS circuits studied.
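The general mechanism (the tester stores compressed test data and the embedded processor regenerates the patterns on-chip) can be illustrated with a trivial run-length scheme; the actual compression technique of [71] is more sophisticated, and this sketch only conveys the shape of the idea:

```python
def rle_compress(bits):
    """[value, run-length] pairs; scan data with long constant runs packs well."""
    out = []
    for b in bits:
        if out and out[-1][0] == b:
            out[-1][1] += 1
        else:
            out.append([b, 1])
    return out

def rle_decompress(pairs):
    """What the embedded processor would run to regenerate scan data on-chip."""
    bits = []
    for value, run in pairs:
        bits.extend([value] * run)
    return bits

scan_slice = [0] * 12 + [1] * 3 + [0] * 17    # don't-care-filled vectors are runny
packed = rle_compress(scan_slice)
assert rle_decompress(packed) == scan_slice    # lossless round trip
print(len(scan_slice), "bits ->", len(packed) * 2, "numbers stored")
```

The win comes from deterministic scan vectors being mostly don't-care bits: filled intelligently, they compress far below their raw size, so both tester memory and download time shrink.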
L.Chen, X.Bai and S.Dey proposed in 2002 [26] a new software-based
self-test methodology for SoC based on embedded processors that utilizes an
on-chip embedded processor to test at-speed the system-level interconnects
for crosstalk. Testing of long interconnects for crosstalk in SoCs is important
because crosstalk effects degrade the integrity of signals traveling on long
interconnects. They demonstrated the feasibility of the proposed method by
applying it to test the interconnects of a processor-memory system. The
defect coverage was evaluated using a system-level crosstalk defect
simulation environment.
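A common way to excite worst-case crosstalk on an interconnect is a maximal-aggressor style of test: the victim line makes a transition while all its neighbor (aggressor) lines switch in the opposite direction. The sketch below generates such vector pairs generically; it is not the specific pattern set of [26]:

```python
def maximal_aggressor_pairs(width):
    """For each victim line of a width-bit bus, a two-vector test in which the
    victim rises while every aggressor line falls, the classic worst case for
    a glitch or delay on the victim. The falling-victim duals would be
    generated symmetrically."""
    tests = []
    for victim in range(width):
        v1 = [1] * width; v1[victim] = 0   # vector 1: aggressors high, victim low
        v2 = [0] * width; v2[victim] = 1   # vector 2: aggressors fall, victim rises
        tests.append((v1, v2))
    return tests

for v1, v2 in maximal_aggressor_pairs(4):
    print(v1, "->", v2)
```

In a software-based scheme the processor writes vector 1 and vector 2 to the bus in consecutive cycles and then reads back what arrived, so the transition happens at the functional operating speed, which is essential for observing crosstalk effects.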
M.H.Tehranipour, M.Nourani, S.M.Fakhraie and A.Afzali-Kusha in 2003
[154] outlined the use of the embedded processor as a test controller in a SoC
to test itself and the other cores. It is assumed that a DMA mechanism is
available for efficient downloading of test programs into the processor
memory. The flexibility of the approach of processor-based SoC testing is
discussed. The approach is based on appropriate use of processor
instructions that have access to the SoC cores. The approach is evaluated on
a SoC design based on a DSP core called UTS-DSP which is compatible
with TI's TMS320C54x DSP family. The SoC also contains an SRAM, a
ROM, a Serial Port Interface and a Host Port Interface. A test program has
been developed for the entire SoC, consisting of a total of 689 bytes and
with a test execution time of 84.25 msec. The test program reached a 95.6%
fault coverage for the DSP core, 100% for the two memories, and 86.1% and
81.3% for the two interface cores, respectively. Although no sufficient
details of the approach are given in [154], the interest of this paper is that it
gives an idea of the overall SoC testability that can be obtained by a very
small embedded software program.
7.3
Chapter 8
Conclusions
References
[44] T.G.Foote, D.E.Hoffman, W.V.Huott, T.J.Koprowski, B.J.Robbins, M.P.Kusko, "Testing the 400 MHz IBM generation-4 CMOS chip", Proceedings of the IEEE International Test Conference (ITC) 1997, pp. 106-114.
[45] J.F.Frenzel, P.N.Marinos, "Functional Testing of Microprocessors in a User Environment", Proceedings of the Fault Tolerant Computing Symposium (FTCS) 1984, pp. 219-224.
[46] R.A.Frohwerk, "Signature Analysis: A New Digital Field Service Method", Hewlett-Packard Journal, vol. 28, no. 9, pp. 2-8, May 1977.
[47] R.Fujii, J.A.Abraham, "Self-test for microprocessors", Proceedings of the International Test Conference (ITC) 1985, pp. 356-361.
[48] S.B.Furber, ARM System-on-Chip Architecture (2nd Edition), Addison-Wesley, August 2000.
[49] C.Galke, M.Pflanz, H.T.Vierhaus, "A test processor concept for systems-on-a-chip", Proceedings of the IEEE International Conference on Computer Design (ICCD) 2002, pp. 210-212.
[50] M.G.Gallup, W.Ledbetter, R.McGarity, S.McMahan, K.C.Scheuer, C.G.Shepard, L.Sood, "Testability features of the 68040", Proceedings of the IEEE International Test Conference (ITC) 1990, pp. 749-757.
[51] S.Ghosh, Hardware Description Languages: Concepts and Principles, New York: IEEE Press, 2000.
[52] D.Gizopoulos, A.Paschalis and Y.Zorian, "An Effective BIST Scheme for Datapaths", Proceedings of the IEEE International Test Conference (ITC) 1996, pp. 76-85.
[53] D.Gizopoulos, A.Paschalis and Y.Zorian, "An Effective Built-In Self-Test Scheme for Booth Multipliers", IEEE Design & Test of Computers, vol. 15, no. 3, pp. 105-111, July-September 1998.
[54] D.Gizopoulos, A.Paschalis and Y.Zorian, "An Effective Built-In Self-Test Scheme for Array Multipliers", IEEE Transactions on Computers, vol. 48, no. 9, pp. 936-950, September 1999.
[55] D.Gizopoulos, A.Paschalis, Y.Zorian and M.Psarakis, "An Effective BIST Scheme for Arithmetic Logic Units", Proceedings of the IEEE International Test Conference (ITC) 1997, pp. 868-877.
[128] Plasma/MIPS CPU Model, http://www.opencores.org/projects/mips
[129] C.Pyron, J.Prado, J.Golab, "Next generation PowerPC microprocessor test strategy improvements", Proceedings of the IEEE International Test Conference (ITC) 1997, pp. 414-423.
[130] C.Pyron, M.Alexander, J.Golab, G.Joos, B.Long, R.Molyneaux, R.Raina, N.Tendolkar, "DFT Advances in Motorola's MPC7400, a PowerPC G4 Microprocessor", Proceedings of the IEEE International Test Conference (ITC) 1999, pp. 137-146.
[131] K.Radecka, J.Rajski, J.Tyszer, "Arithmetic Built-In Self-Test for DSP Cores", IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 16, no. 11, pp. 1358-1369, November 1997.
[132] R.Raina, R.Bailey, D.Belete, V.Khosa, R.Molyneaux, J.Prado, A.Razdan, "DFT Advances in Motorola's Next-Generation 74xx PowerPC Microprocessor", Proceedings of the IEEE International Test Conference (ITC) 2000, pp. 132-140.
[133] J.Rajski, J.Tyszer, Arithmetic Built-In Self-Test for Embedded Systems, Prentice-Hall, Upper Saddle River, New Jersey, 1998.
[134] R.Rajsuman, "Testing A System-on-Chip with Embedded Microprocessors", Proceedings of the IEEE International Test Conference (ITC) 1999, pp. 499-508.
[135] R.Regalado, "A 'people oriented' approach to microprocessor testing", Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS) 1975, pp. 366-368.
[136] RISC-MCU CPU model, http://www.opencores.org/projects/riscmcu/
[137] C.Robach, C.Bellon, G.Saucier, "Application oriented test for dedicated microprocessor systems", Microprocessors and their Applications, 1979, pp. 275-283.
[138] C.Robach, G.Saucier, "Application Oriented Microprocessor Test Method", Proceedings of the Fault Tolerant Computing Symposium (FTCS) 1980, pp. 121-125.
[139] C.Robach, G.Saucier, R.Velazco, "Flexible test method for microprocessors", Proceedings of the 6th EUROMICRO Symposium on Microprocessing and Microprogramming, 1980, pp. 329-339.
Index
ATE  31, 57
At-speed testing  2, 28
Automatic test equipment
Boundary scan  25, 48
Built-in self-test  32, 37, 137, 190
C
CAD
Computer aided design  1, 8
Core type
  firm  10, 99, 186
  hard  10, 99, 125, 186
  soft  10, 99, 157
Design-for-Testability  23, 43, 57, 88, 142, 186
Deterministic testing  39, 60, 72, 101
Diagnosis  45, 48, 62, 191
Digital signal processor  9, 12, 18, 64, 72, 178
Direct Memory Access  92, 187
Embedded processor
  benchmark  22
Engineering effort  55, 58, 98, 165
Fault coverage  8, 10
Functional testing  32, 41, 53, 133, 147, 187
LFSR
Linear feedback shift register
Low-cost testing  3, 41, 51, 63, 83, 189
O
On-line testing  26, 72
Overhead
  hardware  36, 50, 81, 188
  performance
P
Power consumption  1, 14, 23, 37, 44, 57, 81, 101
Pre-computed test  39, 61, 101, 137
Processor component
  "chained" testing  149
  "parallel" testing  152
  computational  108, 116, 127, 137, 159
  control  111, 120, 141, 165
  functional  108, 124, 137, 141, 163
  hidden  112, 121, 143
  interconnect  109, 112, 138, 161
  operation  103, 121
  prioritization
  size
  storage
Processor model
  Jam
  Meister
  oc54x
  oc8051
  Parwan
  Plasma
  RISC-MCU
Pseudorandom testing
R
Register file
Register transfer level
RTL
S
SBST
Scan design
Self-test execution time
Self-test routine
  optimization
  size
  style
Sequential fault
Software-based self-testing
  embedded memory
  phases
  requirements
  sbst-duration
  SoC
  test application time
Standard
  Core test language (CTL)
  IEEE 1500
System-on-Chip  30, 193
Test application  22, 30, 50, 73, 93
Test cost  46
V
VDSM
Verilog
Very deep sub-micron
VHDL
Y
Yield  35, 62, 89, 188
  inaccuracy  31
  overtesting  31, 89