Software-Based Self-Testing of Embedded Processors: IEEE Transactions on Computers, vol. 54, no. 4, April 2005
Software-Based Self-Testing
of Embedded Processors
Nektarios Kranitis, Student Member, IEEE, Antonis Paschalis, Member, IEEE,
Dimitris Gizopoulos, Senior Member, IEEE, and George Xenoulis
Abstract—Embedded processor testing techniques based on the execution of self-test programs have been recently proposed as an
effective alternative to classic external tester-based testing and pure hardware built-in self-test (BIST) approaches. Software-based
self-testing is a nonintrusive testing approach and provides at-speed testing capability without any hardware or performance
overheads. In this paper, we first present a high-level, functional component-oriented, software-based self-testing methodology for
embedded processors. The proposed methodology aims at high structural fault coverage with low test development and test
application cost. Then, we validate the effectiveness of the proposed methodology as a low-cost alternative over structural software-
based self-testing methodologies based on automatic test pattern generation and pseudorandom testing. Finally, we demonstrate the
effectiveness and efficiency of the proposed methodology by completely applying it on two different processor implementations of a
popular RISC instruction set architecture including several gate-level implementations.
1 INTRODUCTION

[Fig. 1. Software-based self-testing concept outline.]

Self-test routines and data are downloaded into instruction and data memories, respectively, from a low-speed, low-cost external mechanism. Subsequently, these self-test routines are executed at the processor's actual speed (at-speed testing) and the test responses are stored back in the on-chip RAM. These test responses may be in a compacted (self-test signatures) or uncompacted form and can be unloaded by the low-cost external ATE during manufacturing testing. Since today's microprocessors have a reasonable amount of on-chip cache, execution from the on-chip cache is considered a further improvement when targeting low-cost testers, provided that a cache-loader mechanism exists to load the test program and unload the test response [6]. As an alternative, self-test routines may be executed from an on-chip ROM dedicated to this task when periodic online testing is performed for the device.

In this paper, we first propose a high-level, functional component-oriented SBST methodology suitable for complex embedded processors. Test development in the proposed methodology is performed at a high level and aims at high structural fault coverage with low test development and test application cost. Low test development cost is accomplished by keeping the engineer's manual test development effort as low as possible while minimizing computational effort. Low test application cost is accomplished by developing low-cost (small and fast) test routines for the most important and critical-to-test components of the processor. Then, we validate the effectiveness of the proposed methodology as a low-cost alternative to software-based self-testing methodologies based on automatic test pattern generation or pseudorandom testing. Finally, the effectiveness of the proposed methodology with respect to low cost and high level of application is demonstrated by successfully applying it to two different processor implementations of a complex, popular RISC instruction set architecture, including several postsynthesis gate-level implementations.

The outline of the remainder of the paper is as follows: In Section 2, an overview of previous related work is given. In Section 3, we present our SBST methodology and describe its three phases (information extraction, component classification and test priority, test routine development). In Section 4, we compare the proposed SBST methodology with existing component-based SBST methodologies which are based on automatic test pattern generation and pseudorandom testing. In Section 5, we present the application of the proposed methodology and the experimental results.

2 PREVIOUS WORK

Traditionally, processor testing resorted to functional testing approaches. Pioneering among those traditional functional testing approaches, [12] is considered a landmark paper in processor functional testing. Based on the RTL description of the processor, the authors introduced a functional fault model and considered the processor as a theoretical graph model, the S-graph. Since then, many processor functional testing methodologies have been proposed. Those approaches were either based on a functional fault model (many were based on the model of [12]) or based on verification principles without assuming any functional fault model. Those traditional functional test approaches had a high level of abstraction, but required a large amount of manual test writing effort; usually, very little fault grading was done on structural processor netlists, while high fault coverage was not guaranteed. Furthermore, most of them relied on external ATE to feed the input test patterns and monitor the responses, in contrast with the SBST approaches that apply at-speed in a self-test mode.

The SBST approaches found so far in the literature can be classified in two different categories. The first category includes the SBST approaches [4], [5], [6] that have a high level of abstraction and are functional in nature. The second category includes the SBST approaches [3], [7], [8], [9], [10], which are structural in nature and require structural fault-driven test development.

In [4], a functional test methodology is proposed that generates a random sequence of instructions that enumerates all the combinations of the operations and systematically selected operands. Test development is performed at a high level of abstraction based on the Instruction Set Architecture (ISA), which is highly desirable. However, since test development is not based on an a priori fault model, the generated tests (applicable for design validation as well) cannot achieve high fault coverage without the use of large code sequences and a serious amount of effort for the user to determine the heuristics for the operands. Furthermore, the use of large code sequences results in excessive test application time and very long fault simulation time required for fault grading. When applied on the GL85 processor model, consisting of 6,300 gates and 244 FFs, a test program consisting of 360,000 instructions was derived and the attained fault coverage was 90.2 percent, dropping to 86.7 percent when the responses were compressed in a signature.

In [5], a self-test method is proposed which combines the execution of microprocessor instructions with a small amount of on-chip hardware that is used to provide a pseudorandom sequence of instructions, thus creating a randomized test program along with randomized data operands. Besides the fact that the proposed methodology cannot be considered a "pure" SBST methodology due to the insertion of test hardware, the manual effort required for test program development is high, while the pseudorandom philosophy results in a very long test application time. Application on a RISC processor core similar to the DLX
machine, consisting of 27,860 gates, resulted in 94.8 percent fault coverage with 220,000 cycles after an iterative process considering different parameters.

In [6], an automated functional self-test methodology is proposed, based on the generation of random instruction sequences with pseudorandom data generated by software LFSRs, while it uses the on-chip cache for test application. Constraints are extracted and built into the generator to ensure generation of valid instruction sequences, ensuring that no cache misses and/or bus access cycles are produced. The high-level functional nature of the proposed approach makes generation of instruction sequences and data that achieve fault coverage goals with reasonable instruction counts very difficult, while the huge number of cycles inherent in the methodology makes fault grading a nontrivial task. Although it compares favorably with the manual test generation effort of classic functional testing, user expertise and knowledge of the processor units under test are required, as well as merging with manual tests, in order to achieve high fault coverage. The methodology achieved 70 percent fault coverage when applied on the Intel Pentium 4 processor.

In [7], a partially automated test development approach based on gate-level tuning is proposed. First, a library of macros is generated manually by experienced assembly programmers from the ISA, consisting of instruction sequences using operands as parameters. Then, a greedy search and a genetic algorithm are used to optimize the process of random macro selection among the macro set, along with selecting the most suitable macro parameters, to build a test program that maximizes the attained fault coverage when the test program is applied on the gate-level netlist of the processor. Thus, test program development is fine-tuned to a specific gate-level netlist. The approach attained 85.2 percent fault coverage when applied on an 8051 8-bit microcontroller design of 6,000 gates using 624 instructions.

In [8], an automated test development approach is proposed based on evolutionary theory techniques (MicroGP) that maximizes the attained fault coverage when the evolved test program is applied on the gate-level netlist of the processor. It utilizes a directed acyclic graph for representing the syntactical flow of an assembly program and an instruction library for describing the assembly syntax of the processor ISA. Manual effort is required for the enumeration of all available instructions and their possible operands. Experiments on an 8051 8-bit microcontroller design of 12,000 gates resulted in 90 percent fault coverage.

In [3], an SBST methodology is proposed which is structural in nature, targeting specific components and fine-tuning the test development to the low, gate-level details of the processor core. First, pseudorandom pattern sequences are developed for each processor component in an iterative method, taking into consideration manually extracted constraints imposed by its instruction set. Then, test sequences are encapsulated into self-test signatures that characterize each component. Alternatively, component tests can be extracted by structural ATPG and downloaded directly into embedded memory by the tester. The component self-test signatures are then expanded on-chip by a software-emulated LFSR (test generation program) into pseudorandom test patterns, stored in embedded memory, and, finally, applied to the component by software test application programs. The pseudorandom approach in [3] does not consider the regular structure of critical processor components and, hence, leads to large self-test code, large memory requirements, and excessive test application time, even when applied to a small processor model. Application to an accumulator-based CPU core, Parwan, consisting of 888 gates and 53 FFs, resulted in 91.4 percent fault coverage in 137,649 cycles using a test program of 1,129 bytes. Nevertheless, the gate-level-based test strategy with manual constraint extraction, as presented in [3], is a generalized approach targeting every processor component. However, in the case of large "real" processors, manual constraint extraction for the targeted components is rather impractical. Furthermore, the step that follows constraint extraction, constraint-based test generation for the processor modules using sequential ATPG, is a very time-consuming task, if at all achievable, since, usually, only constrained test generation for combinational components can be easily handled by commercial ATPG tools.

In [9], the authors present a methodology that extends and improves their previous work [3] in component-based, fault-driven SBST by providing solutions that automate the complex constraint extraction phase while emphasizing ATPG-based test development instead of a pseudorandom one. The key is RTL simulation-based constraint extraction using statistical regression analysis on test program templates, followed by constrained ATPG (adopting previous work in virtual constraint circuit (VCC) techniques) at the gate level in an iterative way. Although automation is achieved in several steps, with manual effort required for the selection and construction of a set of representative test program templates among a quite large space of possible templates, this structural SBST methodology based on gate-level ATPG applies only to combinational processor components, without exploiting any possible regular structure of critical-to-test combinational components. Furthermore, it results in large test program sizes (thus, high test cost) due to the fact that the test programs cannot be in a compact form, since they are automatically generated using test program templates, and also because of the large number of ATPG test patterns required to be downloaded from the slow external ATE to the on-chip memory. Application of the methodology to a combinational component of the Tensilica Xtensa processor with 24,962 faults resulted in 288 ATPG test patterns and 90.1 percent fault coverage after constrained ATPG. When the tests are applied using processor instructions in a test program of 20,373 bytes, the fault coverage for the targeted component is increased (due to collateral coverage) to 95.2 percent in 27,248 cycles.

The requirement of a gate-level netlist cannot be considered a negative attribute in general since it helps to improve test quality (higher fault coverage). However, test generation tuned on the gate level can be very computationally costly if it is applied in an iterative way with successive fault simulations (especially at the total processor level) and the gate-level netlist is in the test generation loop. Thus, SBST methodologies based on gate-level details can achieve high fault coverage, but require significant test development cost due to the tremendous increase in the circuit size and complexity of modern processors. Therefore, it is highly desirable that an SBST methodology be based on a high Register Transfer (RT) level description of the processor and
its instruction set architecture, thus providing a low-cost and technology-independent test development strategy.

In [10], a high-level structural SBST methodology was introduced, showing for the first time that small deterministic test sets applied by compact test routines provide significant improvement when applied to the same simple accumulator-based processor design, Parwan, which was used in [3]. Compared to [3], the methodology described in [10] requires a 20 percent smaller test program using 923 bytes, 75 percent smaller test data, and almost 90 percent smaller test application time using 16,667 cycles. Both methodologies achieve single stuck-at fault coverage slightly higher than 91 percent for the simple accumulator-based Parwan processor.

Despite the fact that [10] presents the new perspective of deterministic test development at high level for SBST of embedded processors, scaling from simple accumulator-based processor architectures to more realistic ones in terms of complexity, like contemporary complex processors implementing commercially successful ISAs (i.e., RISC ISAs), brings out several test challenges that remain to be solved. These challenges arise when high-level test development is applied to complex processor architectures that contain large functional components (i.e., fast parallel multipliers, barrel shifters, etc.) and large register banks, while trying to keep the test cost as low as possible.

The SBST methodology introduced in this paper addresses these test challenges by defining different test priorities for processor components and classifying them according to the defined priorities, proving that high-level test development based on the ISA and RTL description of a complex processor can lead to low test cost without sacrificing high fault coverage, independently of the processor implementation and postsynthesis (gate-level) results.

The proposed SBST methodology shares desirable characteristics with functional-based SBST methodologies, like test development at high level using the ISA, but goes one step deeper, using RTL information and a divide-and-conquer approach targeting individual components with respect to the stuck-at fault model, thus providing very high fault coverage.

Our SBST methodology is also fundamentally different when compared with other SBST methodologies targeting structural faults [3], [7], [8], [9] since it is based only on the high RTL description of the processor and its ISA and, thus, it provides a technology-independent test development strategy, with the gate-level netlist required only for fault grading purposes. On the contrary, in the other structural SBST methodologies [3], [7], [8], [9], the gate-level netlist is not just a simple step of the test generation process (in the traditional way of ATPG or pseudorandom TPG), but it is also a key part of iterative test generation loops, which leads to increased computational cost. Focus on low test application cost (test program size and test program cycles) is another fundamental difference between this work and previous works in SBST targeting structural faults [3], [7], [8], [9], [10]. Assigning different test priorities to the processor components and then developing low-cost (small and fast) test routines for the most critical-to-test components of the processor results in smaller on-chip memory requirements and shorter test program download and application time, while the fault simulation time required for fault grading is minimized, thus providing an efficient low-cost alternative structural approach.

3 THE PROPOSED HIGH-LEVEL COMPONENT-BASED SBST METHODOLOGY

The key characteristics of our high RT-level, component-based SBST methodology for complex embedded processors are the following:

- A divide-and-conquer approach is applied using component-based test development.
- Test development is based only on the Instruction Set Architecture (ISA) of the processor and its RT-level description, which is, in almost all cases, available, without the need of low gate-level fine-tuning.

Although the main target always remains high structural fault coverage (stuck-at fault model), test cost should be considered a very important aspect. The proposed methodology has two main objectives, both aiming at low test cost: 1) the generation of as small and as fast as possible code routines with 2) as small as possible engineering effort and test development time. The first objective leads to smaller download times at the low frequency of the external tester, as well as to smaller test execution times of the routines, therefore reducing the total processor test time. The second objective reduces test development cost and time-to-market, leading to significant improvements in product cost-effectiveness and market success.

The high RT-level test development of the proposed approach is well suited to the high RT-level flow of the design cycle. Since design, simulation, and synthesis are usually carried out at the high RT level, test development can also be carried out at the same level, providing high convenience and flexibility. In this case, processor cores can be easily integrated into an SoC environment, configured, and retargeted in a variety of silicon technologies without any specific need for fine-tuning the test development to specific synthesis optimization parameters using a specific technology library. Experimental results show that the proposed gate-level-independent test strategy is very efficient and achieves more or less the same high fault coverage results for different gate-level implementations of the RTL processor core.

The SBST methodology for complex embedded processors introduced in this work consists of the three phases shown in Fig. 2 and explained in the following.

Phase A: Identification of processor components and component operations, as well as instructions that excite component operations and instructions (or instruction sequences) for controlling or observing processor registers.

Phase B: Categorization of processor components in classes with the same properties and component prioritization for test development.

Phase C: Development of self-test routines emphasizing the use of compact loops of instructions, based on reusing a library of test algorithms for generic functional components, written in pseudocode and tailored to the ISA of the processor under test, that generate small precomputed test sets. These algorithms provide very high fault coverage for most types and architectures of processor components, independently of word length.

These phases are explained in more detail in the following subsections.
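To make the three phases more concrete, the information that phase A extracts and that phase C later consumes can be pictured as a small table mapping each component operation to its instruction set I_{C,O}. The following Python sketch is purely illustrative (the instruction subset, names, and data structures are our own invention for a MIPS-like ISA, not code or data from the paper); it also shows how a basic test instruction per operation might be picked by preferring the R-type format, as the instruction-selection step of phase C suggests:

```python
# Hypothetical sketch of phase A output for an ALU component:
# for each operation O in O_C, the set I_{C,O} of instructions that
# make the component perform O. Names/formats are illustrative only.
from collections import namedtuple

Instr = namedtuple("Instr", ["mnemonic", "fmt"])  # fmt: "R" or "I" type

# I_{ALU,O} for a few operations of a MIPS-like ALU (invented subset).
I_ALU = {
    "add": [Instr("add", "R"), Instr("addi", "I")],
    "and": [Instr("and", "R"), Instr("andi", "I")],
    "nor": [Instr("nor", "R")],  # only one instruction in this set
}

def basic_test_instruction(ops):
    """Pick one basic test instruction per operation, preferring the
    R-type format, whose register operands are fully controllable and
    observable through the general-purpose register file."""
    choice = {}
    for op, instrs in ops.items():
        r_type = [i for i in instrs if i.fmt == "R"]
        choice[op] = (r_type or instrs)[0]
    return choice
```

A test routine generator would then emit, for each selected instruction, the peripheral instructions that load its operands and store its result, which is exactly the role of the last two items of the phase A information.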
3.1 Phase A: Information Extraction

In phase A, the following information is extracted from the ISA and the RT-level description of the processor:

1. The set of all the processor components C.
2. The set O_C of all the operations of each component C, along with the corresponding control signals that the processor control unit drives to C for the execution of each operation.
3. The set of instructions I_{C,O} that, during their execution, excite the same control signals and drive component C to perform the same operation O.
4. Appropriate instructions or instruction sequences that set the values of processor registers.
5. Appropriate instructions or instruction sequences that make the values of processor registers observable at primary outputs.

The last two parts of the information extracted in phase A are used in building the peripheral instructions usually required to apply the basic test instruction I from the set of instructions I_{C,O} and observe the result at the processor primary outputs.

3.2 Phase B: Component Classification and Test Priority

Using the information extracted in the previous phase A, we classify and prioritize the processor components as summarized in Fig. 4.

We classify the components that appear in a processor core RTL description in the following three classes:

- Functional components. The processor components that are directly related to the execution of instructions (data processing/data storage) and are in some sense visible to the assembly language programmer. These components are either:
  - computational components, which perform the data processing operations of the processor, each operation being selected by appropriate control signals generated after instruction decoding, or
  - storage components, which serve as data storage elements that feed the data to the inputs of the computational components and capture their output. Components classified in this subcategory include special processor registers visible to the assembly language programmer and the register file.

  Our methodology acquires the information on the number and types of functional components from the RTL description of the processor.
- Control components. The components that control either the flow of instructions/data inside the processor core or from/to the external environment (memory, peripherals). These components include the processor control unit, the instruction and data memory controllers that implement instruction fetching and memory handshaking, and similar components. These components are not directly related to specific functions of the processor and their existence is not evident in the instruction format and micro-operations of the processor.
- Hidden components. The components that are added to the processor architecture, usually to increase its performance, but are not visible to the assembly language programmer. These components include pipeline logic (pipeline registers, pipeline multiplexers, and control) and other performance-increasing components related to Instruction Level Parallelism (ILP) techniques, branch prediction techniques, etc.

For a low test cost, high-level SBST methodology that aims to develop small test routines in a small test development time, the three classes of components have different test priorities. Test priority determines the order in which test routines will be developed for each component. High-priority components will be considered first, while low-priority components will be considered afterward, and only if the achieved overall fault coverage is not adequate. In many cases, test development for top-priority components leads to sufficient fault coverage for non-targeted components as well, due to collateral coverage. This is particularly true in processor architectures where the execution of instructions in functional units excites many of the control subsystem components as well. The characteristics of a module that determine its priority in our methodology are the relative size and the accessibility (controllability and observability) of the component by processor instructions.

First of all, we deal with the processor components that have the largest contribution to the overall processor fault coverage. It is obvious that these are the largest components of the processor, which include most of the stuck-at faults of the entire processor.

The following three observations are valid in the majority of processor architectures:

- The register file of the processor is one of the largest components. This fact is particularly true in RISC processors with a load/store architecture, which have many general-purpose registers. Large register files offer many advantages, enabling compilers to increase performance by reducing memory traffic.
- The parallel multiplier, if present, is also one of the largest components. This fact is particularly true in RISC processors, as well as in DSPs.
- The functional components of the processor that perform all arithmetic and logic operations of the processor are much larger than the corresponding control logic which controls their operation. This size difference got larger as processor generations moved from internal buses of 8 and 16 bits to those of 32 or 64 bits. Additionally, in DSPs, where many functional components of the same type coexist, the dominating factor in processor size is the size of the functional components.

Therefore, the functional components have the highest test priority due to their relative size.

The other characteristic of a module that determines its priority in our methodology is the accessibility (controllability and observability) of the component by processor instructions. The controllability of a processor component relates to how easily an instruction sequence can apply a test pattern to the component inputs, while the observability relates to how easily an instruction sequence propagates component output values to the primary outputs of the processor. Usually, the controllability (observability) of processor components is directly mapped to the controllability (observability) of the registers that drive (are driven by) the component's inputs (outputs). Processor functional components usually provide easy and full accessibility. On the contrary, the sequential nature of control components imposes several accessibility difficulties, while control components that implement interface functions with memory (instruction fetch, memory handshaking, etc.) impose additional serious controllability problems. Testability of hidden components, like branch prediction units, is very limited using SBST since such components are not visible to the assembly language programmer. Manufacturing faults in such components do not lead to functional differences, while the detection of the introduced performance faults is not possible using SBST, and special DFT hardware is usually required.

Therefore, according to our methodology, the functional components of the processor have the highest test priority for test development, since their size dominates the processor area and they are easily accessible. When the total processor fault coverage is not sufficient, test development should proceed to the other two classes of components.

3.3 Phase C: Test Routine Development

In this subsection, we elaborate on the test routine development at the component level, which is performed as shown in Fig. 5.

The development of the dedicated self-test routines for testing specific processor components according to our test methodology is performed in two steps as follows:

Step C1: Instruction Selection. In Phase A, we have identified the set I_{C,O} which consists of those processor instructions I that, during execution, cause component C to perform operation O. The instructions that belong to the same set I_{C,O} have different controllability/observability properties since, when operation O is performed, the inputs of component C are driven by internal processor registers with different controllability characteristics, while the outputs of component C are forwarded to internal processor registers with different observability characteristics. Therefore, for every component operation O_C derived in phase A, we select an instruction I from the set I_{C,O} (basic test instruction) that results in selecting the shortest instruction sequences (peripheral instructions) required to apply the specific operand to the component inputs and propagate the component outputs to the processor primary outputs. This way, we do not need further constraint analysis.

As an example, we highlight instruction selection for the functional component ALU. Let us consider a common ALU, like the ALU used in the case of our second benchmark, a MIPS R3000-compatible processor. Such an ALU has two 32-bit inputs, one 32-bit output, and a control input that specifies the component operations. The set of operations O_ALU that the ALU performs is:

O_ALU = {add, subtract, and, or, nor, xor, set on less than unsigned, set on less than signed}.

For every operation O, we identify the corresponding set of processor instructions I_{ALU,O} that, during execution, cause the ALU to perform operation O. According to the proposed test routine development, one instruction I from each set I_{ALU,O} is needed in order to apply the deterministic data operands that each operation O imposes. The set I_{ALU,NOR} has only one processor instruction and, thus, the selection is straightforward. The sets I_{ALU,OR}, I_{ALU,XOR},
I_ALU,AND, I_ALU,SUB, I_ALU,SET_ON_LESS_THAN_UNSIGNED, and I_ALU,SET_ON_LESS_THAN_SIGNED consist of two instructions each, one in the R-type (register) format and the other in the I-type (immediate) format. In this case, the instructions in the R-type format are selected, since they provide the best controllability and observability characteristics due to the use of the fully controllable and observable general purpose registers of the register file. Finally, the I_ALU,ADD set consists of a large number of instructions, since the ALU is also used in memory reference instructions. In this case, we also select, for the same reasons, the ADD instruction in the R-type format.

At this point, it should be noted that the proposed test development is component-based, and the ALU is the component considered in this example. Although using the I-type format would result in collateral coverage for components external to the ALU (like MUXes, the sign extender, the control unit, etc.), use of the I-type format for the ALU could cost more (a 32-bit load would require two instructions, increasing test program size and cycles). Collateral coverage for nontargeted components would require fault simulation at the processor level to estimate the trade-off between total processor fault coverage increase and test application cost.

Step C2: Operand Selection. In this step, we consider the deterministic operands that must be applied to each component to achieve high structural fault coverage. The key to selecting the most appropriate deterministic operands lies in the architecture of the most critical-to-test processor components. Such components have an inherent regularity, which can lead to very efficient test algorithms for any gate-level implementation. This inherent regularity is exploited neither by pseudorandom nor by ATPG-based test development approaches. Many processor components, in particular the vast majority of functional components like computational (arithmetic and logic operation modules), interconnect (multiplexers), and storage components (registers, register files), have a very regular or semiregular structure. Regular structure appears in several forms, such as arrays of identical cells (linear or rectangular), tree-like structures of multiplexers, memory element arrays, etc. Such components can be efficiently tested with small and regular test sets that are gate-level independent, i.e., they provide high fault coverage for any gate-level implementation.

We have developed a component test library of test algorithms that generate small deterministic tests and provide very high fault coverage for most types and architectures of functional processor components. The nature of these deterministic tests is fundamentally different from ATPG-generated test patterns, since a gate-level ATPG tool is not "smart" enough to identify regular structures and generate patterns optimized for compact-code software-based test application. Based on these small deterministic tests, we have developed test routines that, besides applying a small number of tests, take advantage of the test vectors' regularity and algorithmic nature, resulting in efficient compact loops. These loop-based compact test routines require a very small number of bytes, while the small number of tests results in low test routine execution time. The test library routines, which describe a test algorithm in assembly pseudocode for almost all generic functional components, are tailored each time to the instruction set and assembly language of the processor under test. We remark that, although constructing such a test library seems like a time-consuming task that requires substantial manual effort, it is a one-time cost that cannot be directly associated with the manual effort and test development cost for a specific processor under test, since it provides reusability. Different processors require only tailoring the generic processor component tests to their ISA. Of course, the component test library can be enriched at any time with new deterministic tests, like tests that deal with new architectures and algorithms implementing, for example, arithmetic/logic operations.

As already mentioned, for low-cost test development, it would be very desirable for an SBST methodology to be gate-level independent. Since our methodology is component-based, targeting structural (stuck-at) faults, it would be very desirable if implementation-independent component tests were available. In fact, previous work endorses such an approach [13].

In the remainder of this section, we describe how regular structures in datapath functional components that implement the most usual arithmetic, logic, interconnect, and storage operations can be efficiently tested with a few deterministic tests, without the need for gate-level details.

Arithmetic components testing. For components that implement arithmetic operations, we have developed deterministic tests for every type of arithmetic operation, like addition, subtraction, multiplication, and division, for various word lengths. Deterministic tests are also available for several
468 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 4, APRIL 2005
architecture and algorithm alternatives. For example, for the addition operation, precomputed tests exist for ripple-carry adder (RCA), carry-look-ahead (CLA), etc., architectures. Likewise, for the multiplication operation, precomputed tests exist for carry-save array, Booth-encoded, Wallace tree summation, etc., architectures [14], [15].

Interconnect/multiplexer testing. An n-to-1 multiplexer is usually decomposed into and implemented by smaller multiplexers in a tree structure. For example, an 8-to-1 multiplexer can be classically implemented as a tree of seven 2-to-1 multiplexers. The implementation of an n-to-1 multiplexer tree is not unique and, in an RTL design, logic synthesis tools can generate different implementations depending on the technology mapping algorithms and technology libraries. For the 2-to-1, m-bit wide multiplexer, a test set provides complete single stuck-at fault testability if it applies the following four input combinations (S, A, B): (1, 0...0, 1...1), (0, 1...1, 0...0), (1, 1...1, 0...0), (0, 0...0, 1...1), where S is the select signal of the multiplexer and A, B are the two m-bit wide data inputs, which are selected when S = 0 and S = 1, respectively. We have extended this test set to the case of n-to-1, m-bit wide multiplexers. This implementation-independent deterministic test set is linear: the minimum number of 2n test vectors is required, and it provides 100 percent single stuck-at fault coverage.

Register file testing. The register file component can be considered as a hierarchical design with subcomponents. These subcomponents are the write address decoder, the register array, and the two large read ports. We have developed deterministic tests for the two large multiplexer trees (usually 32-input, 32-bit wide multiplexers for processor architectures with 32 registers, each register 32 bits wide) that implement the two read ports, as well as deterministic tests for the flip-flops and write enable multiplexers that implement the register array. This implementation-independent deterministic test set results in near complete (>99.5 percent) stuck-at fault coverage.

Logic array testing. Testing a logic array that implements multibit Boolean logic operations like and, or, nor, xor, not is performed by applying well-known necessary patterns while propagating the test response to well observable registers and primary outputs. For example, a multibit AND logic operation is tested by applying the following data patterns (A, B): (0...0, 1...1), (1...1, 0...0), (1...1, 1...1), where A, B are the multibit input buses.

In the case where the overall processor fault coverage, measured by fault simulation at the processor level after targeting the datapath functional components according to the proposed methodology, is not adequate, test development should proceed with the other classes of components (i.e., control, hidden). However, it is very important to note that, in many practical cases, the control components and, especially, the control unit are sufficiently tested due to collateral coverage resulting from the instructions used to deliver and propagate tests to the rest of the targeted components. If this is not the case, for such control-oriented subsystems, we adopt standard verification-based functional testing techniques, with test development still performed at high level. Many times, verification-based tests cannot guarantee acceptable fault coverage, while the manual effort required to derive simple instruction sequences that verify control subsystem functionality is higher than the effort required for processor functional components, where reusing our test library minimizes costly manual effort. Satisfaction of high-level RTL verification metrics supported by industry-standard simulation tools, like RTL statement, branch, condition, and expression coverage, helps to reduce the manual verification effort. Besides the overhead in test development cost, testing control and hidden subsystems imposes a small overhead in the test program size and test application time as well. Such overheads characterize functional testing approaches, since it is very difficult for such components to be tackled by small and fast deterministic routines. Thus, we adopt such verification-based functional testing techniques for the control and hidden subsystems with the test cost/fault coverage trade-off in mind. This can be considered a limitation of the proposed methodology. We should remark that, in many practical cases, verification-based test routines are developed at the design verification phase; therefore, they can be reused for the testing of control and hidden components with no additional manual effort involved or, in the worst case, substantially alleviating any manual self-test routine development effort.

4 COMPARISON WITH ATPG-BASED AND PSEUDORANDOM-BASED SBST

Deterministic tests take advantage of the inherent regularity of several processor functional components' architecture. These patterns are applied by compact self-test routines with high structural test coverage for any gate-level implementation. This inherent regularity is not exploited either by pseudorandom test development or by ATPG-based test development approaches.

The comparison with other software self-test approaches is favorable since, even if complex constraint extraction and constrained TPG (pseudorandom and structural ATPG) were feasible, they would require the application of long pseudorandom pattern sequences (increased test execution time, very long fault simulation time for fault grading) for random pattern resistant components, or a large amount of test data (structural ATPG patterns) downloaded from the tester to on-chip memory (long test download time).

To validate the former argument, a set of experiments was performed targeting two processor components: one of the most critical-to-test processor components, the parallel multiplier, and one of the two register file read ports. Besides the significance of the contribution of these components to the total processor area and fault coverage, these two components were selected as case studies for another major reason as well. Test development according to other structural SBST approaches applied on the gate level for these case study components was applicable in a feasible way. That is:

- Manual constraint extraction was performed so that pseudorandom component tests apply on independent inputs.
- For structural ATPG generated tests, a commercial ATPG tool that was able to generate test patterns under constraints was provided with a set of manually extracted constraints.

We remark that, for other functional units, particularly sequential ones like the Register file as a whole and the
TABLE 1
Multiplier Test Routine Statistics Comparing with ATPG and Pseudorandom SBST

deterministic patterns that, due to their deterministic nature, provide the highest fault coverage (slightly higher than the proposed algorithmic deterministic ones), is a cost-efficient solution only when the number of test patterns is very small. Besides, when the component under test is pure random logic, ATPG-based (if feasible) SBST is the only practical solution.
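The precomputed adder tests mentioned in Section 3 are not listed in the text. As an illustrative sketch (our own construction, not the authors' library routine), the following brute-force fault simulation shows the principle behind such deterministic tests: eight patterns that drive every full-adder cell of a ripple-carry adder through all eight of its input combinations detect all single stuck-at faults of a simple gate-level RCA, independent of the word length.

```python
# Sketch: 8 deterministic patterns suffice for an n-bit ripple-carry adder
# (RCA), independent of n. Gate-level full-adder model (XOR/AND/OR) is
# assumed for illustration only.

def rca(a_bits, b_bits, cin, fault=None):
    """Simulate an n-bit RCA; fault = (line_name, bit_index, stuck_value)."""
    out, c = [], cin
    for i in range(len(a_bits)):
        a, b = a_bits[i], b_bits[i]
        def f(name, val):  # inject a stuck-at value on one internal line
            return fault[2] if fault and fault[:2] == (name, i) else val
        a, b, c = f("a", a), f("b", b), f("c", c)
        s1 = f("s1", a ^ b)
        out.append(f("sum", s1 ^ c))              # sum bit is observable
        c = f("cout", (a & b) | (s1 & c))
    out.append(c)                                  # final carry-out observable
    return tuple(out)

def patterns(n):
    zeros, ones = [0] * n, [1] * n
    alt  = [i % 2 for i in range(n)]               # 0,1,0,1,...
    alt2 = [(i + 1) % 2 for i in range(n)]         # 1,0,1,0,...
    # Together these give every full adder all 8 (a,b,cin) combinations.
    return [(zeros, zeros, 0), (ones, ones, 1), (ones, zeros, 0),
            (zeros, ones, 0), (ones, zeros, 1), (zeros, ones, 1),
            (alt2, alt2, 0), (alt, alt, 1)]

n = 8
tests = patterns(n)
good = [rca(a, b, c) for a, b, c in tests]
faults = [(line, i, v) for line in ("a", "b", "c", "s1", "sum", "cout")
          for i in range(n) for v in (0, 1)]
detected = sum(any(rca(a, b, c, fault=fl) != g
                   for (a, b, c), g in zip(tests, good)) for fl in faults)
print(f"{detected}/{len(faults)} single stuck-at faults detected by 8 patterns")
```

The same eight patterns work for any word length, which is exactly the kind of regularity-driven, gate-level independent test set the methodology relies on.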
TABLE 2
Register File Read Port Test Routine Statistics Comparing with ATPG-Based SBST
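The extended n-to-1 multiplexer test set is described in Section 3 only by its properties (linear, 2n vectors, 100 percent single stuck-at coverage). The sketch below is our reconstruction of one plausible such set, not necessarily the authors' exact vectors: each input is selected twice, once carrying all 0s against all-1s neighbors and once the reverse, and the set is brute-force fault-simulated on a gate-level tree of 2-to-1 multiplexers of the kind that implements a register file read port.

```python
# Sketch: a linear 2n-vector test set for an n-to-1, m-bit multiplexer tree
# built from 2-to-1 muxes (out = a & ~s | b & s), checked by exhaustive
# single stuck-at fault simulation. Construction assumed for illustration.

def mux_tree(inputs, sel_bits, fault=None, path=""):
    """Evaluate the tree; fault = (gate_id, stuck_value)."""
    def f(gid, val):
        return fault[1] if fault and fault[0] == gid else val
    if len(inputs) == 1:
        return tuple(inputs[0])
    half = len(inputs) // 2
    s = f(("s", path), sel_bits[-1])          # top select bit splits the tree
    lo = mux_tree(inputs[:half], sel_bits[:-1], fault, path + "L")
    hi = mux_tree(inputs[half:], sel_bits[:-1], fault, path + "H")
    out = []
    for k in range(len(lo)):                  # one 2-to-1 mux slice per bit
        gid = path + str(k)
        ns = f(("ns", gid), 1 - s)
        g1 = f(("g1", gid), lo[k] & ns)
        g2 = f(("g2", gid), hi[k] & s)
        out.append(f(("o", gid), g1 | g2))
    return tuple(out)

def gate_ids(n_in, m, path=""):               # enumerate every gate output line
    if n_in == 1:
        return []
    ids = [("s", path)] + [(g, path + str(k))
                           for k in range(m) for g in ("ns", "g1", "g2", "o")]
    return (ids + gate_ids(n_in // 2, m, path + "L")
                + gate_ids(n_in // 2, m, path + "H"))

def linear_tests(n, m):                       # 2n vectors: selected input is
    tests = []                                # all-0s vs all-1s neighbors, twice
    for i in range(n):
        for v in (0, 1):
            data = [[1 - v] * m for _ in range(n)]
            data[i] = [v] * m
            tests.append((data, i))
    return tests

n, m, BITS = 8, 4, 3
sel = lambda i: [(i >> b) & 1 for b in range(BITS)]
tests = linear_tests(n, m)
good = [mux_tree(d, sel(i)) for d, i in tests]
faults = [(gid, sv) for gid in gate_ids(n, m) for sv in (0, 1)]
det = sum(any(mux_tree(d, sel(i), fault=(gid, sv)) != g
              for (d, i), g in zip(tests, good)) for gid, sv in faults)
print(f"{len(tests)} vectors detect {det}/{len(faults)} single stuck-at faults")
```

Because the selected input always differs from every other input, any misrouting or stuck data line along the selected path changes the observed output, which is why the set stays implementation independent across different tree decompositions.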
TABLE 3
Plasma/MIPS Components Classification
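The three AND-array patterns quoted in Section 3 can be checked mechanically. The short sketch below (an illustration we constructed, using a bitwise-AND array model) fault-simulates an m-bit AND array and confirms that the three patterns detect every single stuck-at fault on its input and output lines.

```python
# Sketch: verify that (A,B) = (0...0,1...1), (1...1,0...0), (1...1,1...1)
# detect all single stuck-at faults of an m-bit AND array (out[k] = A[k] & B[k]).

def and_array(a, b, fault=None):
    """fault = (line, bit_index, stuck_value), line in {'a','b','out'}."""
    out = []
    for k in range(len(a)):
        ak, bk = a[k], b[k]
        if fault and fault[1] == k:
            line, _, v = fault
            if line == "a":
                ak = v
            elif line == "b":
                bk = v
        o = ak & bk
        if fault and fault[1] == k and fault[0] == "out":
            o = fault[2]
        out.append(o)
    return tuple(out)

m = 8
tests = [([0] * m, [1] * m), ([1] * m, [0] * m), ([1] * m, [1] * m)]
good = [and_array(a, b) for a, b in tests]
faults = [(l, k, v) for l in ("a", "b", "out") for k in range(m) for v in (0, 1)]
det = sum(any(and_array(a, b, fl) != g for (a, b), g in zip(tests, good))
          for fl in faults)
print(f"{det}/{len(faults)} faults detected by 3 patterns")
```

The pattern count is constant in the operand width, which is what makes these logic-array tests fit into the compact loop-based routines described earlier.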
TABLE 4
Plasma/MIPS Components Gate Counts
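The MISR compaction routine of the test library, discussed in Section 5, is characterized in the text only by its statistics (58 extra words, seven signature words, aliasing below 0.2 percent). As a minimal illustration of software response compaction, the sketch below folds 32-bit response words into a single signature using an assumed CRC-32-style feedback polynomial; the actual polynomial and signature width of the library routine are not specified in the text.

```python
# Sketch: a software multiple-input signature register (MISR) that compacts
# a stream of 32-bit test-response words into one signature word.
# The feedback polynomial is an assumption for illustration.

POLY = 0x04C11DB7  # assumed CRC-32-style feedback polynomial

def misr_update(sig, word):
    """One MISR step: shift with polynomial feedback, then XOR in the word."""
    msb = (sig >> 31) & 1
    sig = ((sig << 1) & 0xFFFFFFFF) ^ (POLY if msb else 0)
    return sig ^ (word & 0xFFFFFFFF)

def compact(responses, seed=0):
    """Fold a response stream into a single 32-bit signature."""
    sig = seed
    for w in responses:
        sig = misr_update(sig, w)
    return sig

# A single corrupted response word always changes the signature, because the
# MISR is a linear, invertible map (aliasing needs a multiword error pattern).
ref = compact([0xDEADBEEF, 0x00000000, 0xFFFFFFFF, 0x12345678])
bad = compact([0xDEADBEEF, 0x00000001, 0xFFFFFFFF, 0x12345678])
print(hex(ref), ref != bad)
```

In the paper's case study, tailoring such a routine to the MIPS ISA trades 58 extra downloaded words for unloading seven signature words instead of 610 response words.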
different synthesis optimization parameters, test development is not fine-tuned each time to the gate level. Otherwise, the conventional gate-level specific ATPG method has to repeat the high-cost and time-consuming test generation for every different implementation. This fact demonstrates that, by exploiting RT-level architectural characteristics, the proposed methodology is efficient in targeting different gate-level netlists where technology remapping is required, driving down the test development cost. This is certainly not the case in previously proposed approaches.

The proposed methodology provides at-speed testing using low-cost external testers. Self-test routines and data are downloaded by a low-cost external tester to the on-chip memory, and the processor core executes the test program at speed. The test responses, which are stored back in on-chip memory, are unloaded by the low-cost tester. The amount of test code downloaded from the tester to the on-chip memory and the number of test responses unloaded from the on-chip memory to the tester determine the test time required to download/unload the test program and test responses, respectively. This process dominates the total test application time over the test execution time, since the processor executes the test program at speed. Thus, if the number of test responses is large, test response compaction may be required so that the external tester unloads only a very small number of compressed signatures. Since the test execution time depends on the processor speed and the processor cycles required to access the on-chip memory, the use of test response compaction is a trade-off with parameters determined by the specific processor architecture and the adopted test methodology. The proposed test methodology applies a small number of deterministic tests, so test response compaction might not be necessary. However, this is not the case for a pseudorandom-based test methodology, where the very large number of test responses would normally require compaction.

The proposed methodology achieves very high fault coverage of about 95 percent by downloading a small test program of less than 1K words (853 words). The number of test responses stored to on-chip memory after the execution of the self-test program is 610 words. This number is very small compared to other pseudorandom-based SBST methodologies that would require the application of a large number of pseudorandom patterns and storage of the corresponding test responses.

Our test library provides an MISR test response compaction routine which can be used to compress the targeted components' test responses to signatures with negligible (less than 0.2 percent) aliasing probability, at the cost of downloading an additional 58 words in our case study. Thus, when the test library MISR routine is tailored to the MIPS ISA and used to compress the test responses stored in on-chip memory, the number of words downloaded is 911 and the number of signature words unloaded is seven, while the total test program execution time becomes 14,427 cycles. Of course, whether compaction is used to shorten test application time depends on the relation between the internal processor frequency and the external tester
TABLE 5
Self-Test Routines Statistics on Plasma/MIPS
TABLE 6
Fault Coverage on Plasma/MIPS with Test Development for Targeted Components
frequency. The proposed SBST methodology applies a small number of deterministic tests, thus test response compaction might not be necessary.

5.2 Case Study B: A 5-Stage Pipeline MIPS Compatible Processor Model

We have designed a MIPS R3000 [17] compatible processor with a 5-stage pipeline, using the Application Specific Instruction set Processor (ASIP) design environment of [18], for evaluating the proposed methodology on a different processor implementation of the same MIPS I ISA. A 52-instruction subset of the MIPS R3000 instruction set [17] was implemented, while coprocessor and interrupt instructions were not implemented in this experiment. It should be noted that, in the current educational release version of the ASIP design environment of [18], data hazard detection and register bypassing are not implemented. The generated RTL VHDL model has been synthesized, optimized for area, targeting a 0.35 um technology library; the design runs at a clock frequency of 44 MHz with a total gate count of 37,400 gates. The resulting gate counts for each component, and for the processor overall, are shown in Table 7. Again, the Mentor Graphics suite was used for VHDL synthesis, functional simulation, and fault simulation (Leonardo, ModelSim, and FlexTest products, respectively).

A test program was constructed targeting the functional modules and was further enhanced by instruction sequences targeting the processor controller, following a verification-based functional testing approach. Test program statistics are illustrated in Table 8. It should be noted that, although this processor benchmark implements almost the same MIPS I ISA as the previous one, with more complex control and pipeline, the fact that the VHDL RTL design generated by the ASIP design environment does not support data hazard detection and register bypassing imposes a requirement for careful assembly instruction scheduling, along with insertion of nop instructions wherever necessary. This resulted in increased test program size and test execution time. Otherwise, test program statistics would be very similar to those of the previous case study A.

Fault simulation on the synthesized gate-level processor model after execution of the test program resulted in the fault coverage (stuck-at) figures given in Table 9 for some of the processor components, including the targeted components, which are marked. The fault coverage for the register file is the expected almost complete coverage (>99.5 percent) that our gate-level independent test library provides for a generic register file, while the fault coverage of the parallel Booth-recoded Wallace tree multiplier subcomponent of the Parallel Multiplier/Serial Divider component is 99 percent. The total processor fault coverage is less than that of the previous case study benchmark due to the more complex pipeline (5-stage) and pipeline control, which are not targeted. However, by targeting only the functional components, which constitute about 70 percent of the total processor area, one can very easily reach high total
TABLE 7
MIPS R3000 Compatible Processor Component Gate Counts
TABLE 8
Self-Test Routine Statistics on MIPS R3000 Compatible Processor
TABLE 9
Fault Coverage Results on MIPS R3000 Compatible Processor
fault coverage and then proceed to other classes of components if required.

As in the previous benchmark, test development for the targeted components is performed at a high RT-level and results in high structural fault coverage that does not depend on the gate-level synthesis results.

6 CONCLUSIONS

We have analyzed a high-level, component-based, software-based self-test methodology and demonstrated its application on two complex embedded processor architectures that implement a very popular RISC instruction set, including different gate-level implementations of these processors. The proposed SBST methodology does not require either manual or automatic constraint extraction for the processor components or any sequential ATPG at the gate level. It can be applied when just an RT-level description and the instruction set architecture of the processor are available. The postsynthesis gate-level netlist is required only for fault grading and is not part of any test generation iterative loop. The methodology aims to construct small and fast self-test routines for the most important and test-critical components of the processor in order to achieve a low test application cost. Low test
development cost is accomplished by keeping the engineers' manual test writing effort as low as possible through test routine library reuse, while, at the same time, minimizing computational effort. Low-speed, low-cost external testers are excellently utilized by the proposed methodology, since the test program downloading from the tester memory to the on-chip memory, performed at the low frequency of the tester, is completed in very short time intervals.

We have demonstrated that high test quality (more than 95 percent fault coverage for the first benchmark CPU and more than 92 percent for the second) is achieved with low test development and test application costs for different architectures (3-stage and 5-stage pipeline, respectively) of a popular RISC instruction set architecture and different gate-level implementations of the benchmark CPUs used in our experiments.

REFERENCES

[1] ITRS, 2001 ed., http://public.itrs.net/Files/2001ITRS/Home.htm.
[2] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, "Logic BIST for Large Industrial Designs: Real Issues and Case Studies," Proc. Int'l Test Conf., pp. 358-367, 1999.
[3] L. Chen and S. Dey, "Software-Based Self-Testing Methodology for Processor Cores," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 3, pp. 369-380, Mar. 2001.
[4] J. Shen and J. Abraham, "Native Mode Functional Test Generation for Microprocessors with Applications to Self-Test and Design Validation," Proc. Int'l Test Conf., pp. 990-999, 1998.
[5] K. Batcher and C. Papachristou, "Instruction Randomization Self Test for Processor Cores," Proc. VLSI Test Symp., pp. 34-40, 1999.
[6] P. Parvathala, K. Maneparambil, and W. Lindsay, "FRITS—A Microprocessor Functional BIST Method," Proc. Int'l Test Conf., pp. 590-598, 2002.
[7] F. Corno, M. Sonza Reorda, G. Squillero, and M. Violante, "On the Test of Microprocessor IP Cores," Proc. Design Automation & Test in Europe Conf., pp. 209-213, 2001.
[8] F. Corno, G. Cumani, M. Sonza Reorda, and G. Squillero, "Fully Automatic Test Program Generation for Microprocessor Cores," Proc. Design Automation & Test in Europe Conf., pp. 1006-1011, 2003.
[9] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, "A Scalable Software-Based Self-Testing Methodology for Programmable Processors," Proc. Design Automation Conf., pp. 548-553, 2003.
[10] N. Kranitis, A. Paschalis, D. Gizopoulos, and Y. Zorian, "Instruction-Based Self-Testing of Processor Cores," J. Electronic Testing: Theory and Applications, no. 19, pp. 103-112, 2003.
[11] A. Krstic, L. Chen, W.C. Lai, K.T. Cheng, and S. Dey, "Embedded Software-Based Self-Test for Programmable Core-Based Designs," IEEE Design and Test of Computers, pp. 18-26, July/Aug. 2002.
[12] S.M. Thatte and J.A. Abraham, "Test Generation for Microprocessors," IEEE Trans. Computers, vol. 29, no. 6, pp. 429-441, June 1980.
[13] H. Kim and J. Hayes, "Realization-Independent ATPG for Designs with Unimplemented Blocks," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 2, pp. 290-306, Feb. 2001.
[14] A. Paschalis, D. Gizopoulos, N. Kranitis, M. Psarakis, and Y. Zorian, "An Effective BIST Architecture for Fast Multiplier Cores," Proc. Design Automation & Test in Europe Conf., pp. 117-121, 1999.
[15] D. Gizopoulos, A. Paschalis, and Y. Zorian, "An Effective Built-In Self-Test Scheme for Parallel Multipliers," IEEE Trans. Computers, vol. 48, no. 9, pp. 936-950, Sept. 1999.
[16] Plasma CPU Model, http://www.opencores.org/projects/mips, 2005.
[17] G. Kane and J. Heinrich, MIPS RISC Architecture. Prentice Hall, 1992.
[18] ASIP Meister, http://www.eda-meister.org, 2005.
[19] J. Pihl and E. Sand, Arithmetic Module Generator, http://www.fysel.ntnu.no/modgen, 2005.

Nektarios Kranitis received the BSc degree in physics from the University of Patras, Greece. He is currently completing his work toward the PhD degree in computer science in the Department of Informatics & Telecommunications at the University of Athens, Greece. His current research interests include self-testing of microprocessor cores and SoCs. He is a student member of the IEEE and the IEEE Computer Society.

Antonis Paschalis received the BSc degree in physics, the MSc degree in electronics and computers, and the PhD degree in computers, all from the University of Athens. He is an associate professor in the Department of Informatics and Telecommunications, University of Athens, Greece. Previously, he was a senior researcher at the Institute of Informatics and Telecommunications of the National Research Centre "Demokritos" in Athens. His current research interests are logic design and architecture, VLSI testing, processor testing, and hardware fault tolerance. He has published more than 100 papers and holds a US patent. He is a member of the editorial board of JETTA and has served the test community as vice chair of the Communications Group of the IEEE Computer Society TTTC and by participating in several organizing and program committees of international events in the area of design and test. He is a member of the IEEE and the IEEE Computer Society.

Dimitris Gizopoulos received the computer engineering degree from the University of Patras, Greece, and the PhD degree from the University of Athens, Greece. He is an assistant professor in the Department of Informatics, University of Piraeus, Greece. His research interests include processor testing, design-for-testability, self-testing, online testing, and fault tolerance of digital circuits. He is the author of more than 60 technical papers in transactions, journals, book chapters, and conferences, author and editor of two books, and co-inventor of a US patent, all in test technology topics. He is a member of the editorial board of IEEE Design and Test of Computers and guest editor of special issues in IEEE publications. He is a member of the steering, organizing, and program committees of several test technology technical events, a member of the Executive Committee of the IEEE Computer Society Test Technology Technical Council (TTTC), a senior member of the IEEE, and a Golden Core Member of the IEEE Computer Society.

George Xenoulis received the BSc degree in computer science from the University of Piraeus, Greece. He is currently a PhD student in the same department and his research interests include testing of high-speed floating-point arithmetic units, self-testing of embedded microprocessor cores, and online testing.