
Software-Based Self-Testing of Embedded Processors

Article in IEEE Transactions on Computers · May 2005
DOI: 10.1109/TC.2005.68 · Source: IEEE Xplore



IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 4, APRIL 2005 461

Software-Based Self-Testing
of Embedded Processors
Nektarios Kranitis, Student Member, IEEE, Antonis Paschalis, Member, IEEE,
Dimitris Gizopoulos, Senior Member, IEEE, and George Xenoulis

Abstract—Embedded processor testing techniques based on the execution of self-test programs have recently been proposed as an effective alternative to classic external tester-based testing and pure hardware built-in self-test (BIST) approaches. Software-based self-testing is a nonintrusive testing approach and provides at-speed testing capability without any hardware or performance overheads. In this paper, we first present a high-level, functional component-oriented, software-based self-testing methodology for embedded processors. The proposed methodology aims at high structural fault coverage with low test development and test application cost. Then, we validate the effectiveness of the proposed methodology as a low-cost alternative to structural software-based self-testing methodologies based on automatic test pattern generation and pseudorandom testing. Finally, we demonstrate the effectiveness and efficiency of the proposed methodology by applying it in full to two different processor implementations of a popular RISC instruction set architecture, including several gate-level implementations.

Index Terms—Embedded processors, processor self-testing, software-based self-testing, low-cost testing.

1 INTRODUCTION

Consumer demands for high performance and rich functionality have driven the semiconductor manufacturing industry to the integration of multiple complex components on a single chip. Deep-submicron technologies, along with the use of the System-on-Chip (SoC) design paradigm, have made this possible. System development based on the use of a core-based architecture, where cores are interconnected by an industry-standard on-chip bus, is very common. This design strategy has proven to be effective in terms of development time and productivity since it reuses existing intellectual property (IP) cores. In the multimillion-gate SoC era, design and test engineers face signal integrity problems, serious power consumption concerns, and, maybe more than these, serious testability challenges.

Manufacturing testing of an SoC architecture built around one or more processor cores is a difficult task. The test data volume required for external testing of embedded processors is becoming excessively large [1], resulting in long test application time, while the cost of test time using high-performance Automatic Test Equipment (ATE) is very high. Furthermore, new types of defects appearing in deep-submicron technologies require at-speed testing in order to achieve high test quality. However, the increasing gap between ATE frequencies and SoC operating frequencies makes external at-speed testing almost infeasible. Moreover, ATE accuracy problems can lead to serious yield loss [1]. Thus, high-quality external testing is becoming increasingly difficult even with today's multimillion dollar ATE.

Traditional hardware self-test (or built-in self-test, BIST) moves the testing task from external resources (ATE) to internal hardware, synthesized to generate test patterns and evaluate test responses of the circuit under test. Hardware self-test achieves at-speed testing, reducing the overall test costs of the chip [1]. Recent applications of commercial hardware-based Logic BIST techniques in large industrial designs [2] and microprocessors [3] reveal that extensive and manual design changes have to be performed in order to make the design BIST-ready. These include changes to prevent the design from getting into an unknown state that would corrupt the compressed response, and extensive test point insertion that is necessary to achieve acceptable fault coverage in random-pattern-resistant circuits. However, these changes increase the circuit area and degrade its performance. Therefore, Logic BIST has limited applicability to high-performance and power-optimized embedded processors.

Several approaches can be grouped together under the term software-based self-testing (SBST), and various SBST techniques have been proposed recently as an effective alternative to hardware self-test for embedded processors. SBST is nonintrusive in nature since it utilizes existing processor resources and instructions to perform self-testing. Therefore, SBST can potentially provide sufficient testing quality without impact on performance, area, or power consumption during normal operation. SBST approaches targeting single stuck-at faults have been proposed in [3], [4], [5], [6], [7], [8], [9], [10], and a review of some of them was given in [11]. Apart from this, [11] discusses the application of SBST to testing path delay faults and interconnect crosstalk defects, as well as fault diagnosis. Experimental results provided in [3] demonstrate the superiority of SBST for processors over both full-scan design and hardware Logic BIST. An outline of the embedded SBST concept is shown in Fig. 1.

. N. Kranitis and A. Paschalis are with the Department of Informatics & Telecommunications, University of Athens, Panepistimiopolis, 15784, Athens, Greece. E-mail: {nkran, paschali}@di.uoa.gr.
. D. Gizopoulos and G. Xenoulis are with the Department of Informatics, University of Piraeus, 80 Karaoli & Dimitrou Street, 18534, Piraeus, Greece. E-mail: {dgizop, gxen}@unipi.gr.

Manuscript received 22 Sept. 2003; revised 10 Sept. 2004; accepted 3 Nov. 2004; published online 15 Feb. 2005.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0161-0903.

0018-9340/05/$20.00 © 2005 IEEE Published by the IEEE Computer Society
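The response-compaction idea behind this concept can be illustrated with a short sketch (our own illustration, not code from the paper): deterministic operand pairs are applied to a component operation and every response is folded, MISR-style, into a single self-test signature that a low-cost tester compares against a golden value. The toy `alu` model, the CRC-32 polynomial, and the operand set below are all illustrative assumptions.

```python
MASK = 0xFFFFFFFF
POLY = 0x04C11DB7  # CRC-32 polynomial; an arbitrary illustrative choice

def misr_step(sig, response):
    """Fold one 32-bit test response into the running signature
    (software emulation of a multiple-input signature register)."""
    fb = POLY if (sig >> 31) & 1 else 0
    return ((sig << 1) & MASK) ^ fb ^ (response & MASK)

def alu(op, a, b):
    """Toy stand-in for the ALU component under test."""
    if op == "add":
        return (a + b) & MASK
    if op == "and":
        return a & b
    if op == "xor":
        return a ^ b
    raise ValueError(op)

def self_test(patterns):
    """Apply deterministic operand pairs to each ALU operation and
    compact all responses into a single self-test signature."""
    sig = 0
    for op in ("add", "and", "xor"):
        for a, b in patterns:
            sig = misr_step(sig, alu(op, a, b))
    return sig

# Small deterministic operand set (marching-ones/zeros style, illustrative).
PATTERNS = [(0x00000000, 0xFFFFFFFF), (0xAAAAAAAA, 0x55555555),
            (0xFFFFFFFF, 0xFFFFFFFF), (0x00000001, 0x80000000)]
GOLDEN = self_test(PATTERNS)  # recorded once from a known-good device
```

A faulty component would produce at least one corrupted response, which (with high probability) changes the final signature, so only one word per component needs to be unloaded by the tester.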

Fig. 1. Software-based self-testing concept outline.

Self-test routines and data are downloaded into instruction and data memories, respectively, from a low-speed, low-cost external mechanism. Subsequently, these self-test routines are executed at the processor's actual speed (at-speed testing) and test responses are stored back in the on-chip RAM. These test responses may be in a compacted (self-test signatures) or uncompacted form and can be unloaded by the low-cost external ATE during manufacturing testing. Since today's microprocessors have a reasonable amount of on-chip cache, execution from the on-chip cache is considered a further improvement when targeting low-cost testers, provided that a cache-loader mechanism exists to load the test program and unload the test response [6]. As an alternative, self-test routines may be executed from an on-chip ROM dedicated to this task when periodic online testing is performed for the device.

In this paper, we first propose a high-level, functional component-oriented SBST methodology suitable for complex embedded processors. Test development in the proposed methodology is performed at a high level and aims at high structural fault coverage with low test development and test application cost. Low test development cost is accomplished by keeping manual test development effort as low as possible while minimizing computational effort. Low test application cost is accomplished by developing low-cost (small and fast) test routines for the most important and critical-to-test components of the processor. Then, we validate the effectiveness of the proposed methodology as a low-cost alternative to software-based self-testing methodologies based on automatic test pattern generation or pseudorandom testing. Finally, the effectiveness of the proposed methodology with respect to low cost and high level of application is demonstrated by successfully applying it to two different processor implementations of a complex, popular RISC instruction set architecture, including several postsynthesis gate-level implementations.

The outline of the remainder of the paper is as follows: In Section 2, an overview of previous related work is given. In Section 3, we present our SBST methodology and describe its three phases (information extraction, component classification and test priority, test routine development). In Section 4, we compare the proposed SBST methodology with existing component-based SBST methodologies which are based on automatic test pattern generation and pseudorandom testing. In Section 5, we provide experimental results for various implementations of two RISC processor model benchmarks. Finally, Section 6 concludes the paper.

2 PREVIOUS WORK

Traditionally, processor testing resorted to functional testing approaches. Pioneering among these, [12] is considered a landmark paper in processor functional testing. Based on the RTL description of the processor, the authors introduced a functional fault model and considered the processor as a theoretical graph model, the S-graph. Since then, many processor functional testing methodologies have been proposed. Those approaches were either based on a functional fault model (many were based on the model of [12]) or based on verification principles without assuming any functional fault model. These traditional functional test approaches had a high level of abstraction but required a large amount of manual test writing effort; usually, very little fault grading was done on structural processor netlists, while high fault coverage was not guaranteed. Furthermore, most of them relied on external ATE to feed the input test patterns and monitor the responses, in contrast with the SBST approaches that apply at-speed in a self-test mode.

The SBST approaches found so far in the literature can be classified in two different categories. The first category includes the SBST approaches [4], [5], [6] that have a high level of abstraction and are functional in nature. The second category includes the SBST approaches [3], [7], [8], [9], [10], which are structural in nature and require structural fault-driven test development.

In [4], a functional test methodology is proposed that generates a random sequence of instructions that enumerates all the combinations of the operations and systematically selected operands. Test development is performed at a high level of abstraction based on the Instruction Set Architecture (ISA), which is highly desirable. However, since test development is not based on an a priori fault model, the generated tests (applicable for design validation as well) cannot achieve high fault coverage without the use of large code sequences and a serious amount of effort for the user to determine the heuristics for the operands. Furthermore, the use of large code sequences results in excessive test application time and very long fault simulation time for fault grading. When applied to the GL85 processor model, consisting of 6,300 gates and 244 FFs, a test program consisting of 360,000 instructions was derived and the attained fault coverage was 90.2 percent, dropping to 86.7 percent when the responses were compressed into a signature.

In [5], a self-test method is proposed which combines the execution of microprocessor instructions with a small amount of on-chip hardware that is used to provide a pseudorandom sequence of instructions, thus creating a randomized test program along with randomized data operands. Besides the fact that the proposed methodology cannot be considered a "pure" SBST methodology due to the insertion of test hardware, the manual effort required for test program development is high, while the pseudorandom philosophy results in a very long test application time. Application on a RISC processor core similar to the DLX

machine, consisting of 27,860 gates, resulted in 94.8 percent fault coverage with 220,000 cycles after an iterative process considering different parameters.

In [6], an automated functional self-test methodology is proposed based on the generation of random instruction sequences with pseudorandom data generated by software LFSRs, using the on-chip cache for test application. Constraints are extracted and built into the generator to ensure generation of valid instruction sequences, ensuring that no cache misses and/or bus access cycles are produced. The high-level functional nature of the approach makes it very difficult to generate instruction sequences and data that achieve fault coverage goals with reasonable instruction counts, while the huge number of cycles inherent in the methodology makes fault grading a nontrivial task. Although it compares favorably with the manual test generation effort of classic functional testing, user expertise and knowledge of the processor units under test are required, as well as merging with manual tests in order to achieve high fault coverage. The methodology achieved 70 percent fault coverage when applied to the Intel Pentium 4 processor.

In [7], a partially automated test development approach based on gate-level tuning is proposed. First, a library of macros is generated manually by experienced assembly programmers from the ISA, consisting of instruction sequences using operands as parameters. Then, a greedy search and a genetic algorithm are used to optimize the process of random macro selection from the macro set, along with selecting the most suitable macro parameters, to build a test program that maximizes the attained fault coverage when the test program is applied to the gate-level netlist of the processor. Thus, test program development is fine-tuned to a specific gate-level netlist. The approach attained 85.2 percent fault coverage when applied to an 8051 8-bit microcontroller design of 6,000 gates using 624 instructions.

In [8], an automated test development approach is proposed based on evolutionary techniques (MicroGP) that maximizes the attained fault coverage when the evolved test program is applied to the gate-level netlist of the processor. It utilizes a directed acyclic graph for representing the syntactical flow of an assembly program and an instruction library for describing the assembly syntax of the processor ISA. Manual effort is required for the enumeration of all available instructions and their possible operands. Experiments on an 8051 8-bit microcontroller design of 12,000 gates resulted in 90 percent fault coverage.

In [3], an SBST methodology is proposed which is structural in nature, targeting specific components and fine-tuning the test development to the low, gate-level details of the processor core. First, pseudorandom pattern sequences are developed for each processor component in an iterative method, taking into consideration manually extracted constraints imposed by its instruction set. Then, test sequences are encapsulated into self-test signatures that characterize each component. Alternatively, component tests can be extracted by structural ATPG and downloaded directly into embedded memory by the tester. The component self-test signatures are then expanded on-chip by a software-emulated LFSR (test generation program) into pseudorandom test patterns, stored in embedded memory, and, finally, applied to the component by software test application programs. The pseudorandom approach in [3] does not consider the regular structure of critical processor components and, hence, leads to large self-test code, large memory requirements, and excessive test application time, even when applied to a small processor model. Application to an accumulator-based CPU core, Parwan, consisting of 888 gates and 53 FFs, resulted in 91.4 percent fault coverage in 137,649 cycles using a test program of 1,129 bytes. Nevertheless, the gate-level-based test strategy with manual constraint extraction, as presented in [3], is a generalized approach targeting every processor component. However, in the case of large "real" processors, manual constraint extraction for the targeted components is rather impractical. Furthermore, the step that follows constraint extraction, constraint-based test generation for the processor modules using sequential ATPG, is a very time-consuming task, if achievable at all, since usually only constrained test generation for combinational components can be easily handled by commercial ATPG tools.

In [9], the authors present a methodology that extends and improves their previous work [3] in component-based, fault-driven SBST by providing solutions that automate the complex constraint extraction phase while emphasizing ATPG-based test development instead of a pseudorandom one. The key is RTL simulation-based constraint extraction using statistical regression analysis on test program templates, followed by constrained ATPG (adopting previous work in virtual constraint circuit (VCC) techniques) at the gate level in an iterative way. Although automation is achieved in several steps, with manual effort required for the selection and construction of a set of representative test program templates among a quite large space of possible templates, this structural SBST methodology based on gate-level ATPG applies only to combinational processor components, without exploiting any possible regular structure of critical-to-test combinational components. Furthermore, it results in large test program sizes (thus, high test cost) because the test programs cannot be in a compact form, since they are automatically generated from test program templates, and because of the large number of ATPG test patterns that must be downloaded from the slow external ATE to the on-chip memory. Application of the methodology to a combinational component of the Tensilica Xtensa processor with 24,962 faults resulted in 288 ATPG test patterns and 90.1 percent fault coverage after constrained ATPG. When the tests are applied using processor instructions in a test program of 20,373 bytes, the fault coverage for the targeted component is increased (due to collateral coverage) to 95.2 percent in 27,248 cycles.

The requirement of a gate-level netlist cannot be considered a negative attribute in general, since it helps to improve test quality (higher fault coverage). However, test generation tuned at the gate level can be very computationally costly if applied in an iterative way with successive fault simulations (especially at the total processor level) with the gate-level netlist in the test generation loop. Thus, SBST methodologies based on gate-level details can achieve high fault coverage, but require significant test development cost due to the tremendous increase in the size and complexity of modern processors. Therefore, it is highly desirable that an SBST methodology be based on a high Register Transfer (RT) level description of the processor and

its instruction set architecture, thus providing a low-cost and technology-independent test development strategy.

In [10], a high-level structural SBST methodology was introduced, showing for the first time that small deterministic test sets applied by compact test routines provide significant improvement when applied to the same simple accumulator-based processor design, Parwan, which was used in [3]. Compared to [3], the methodology described in [10] requires a 20 percent smaller test program using 923 bytes, 75 percent smaller test data, and almost 90 percent smaller test application time using 16,667 cycles. Both methodologies achieve single stuck-at fault coverage slightly higher than 91 percent for the simple accumulator-based Parwan processor.

Although [10] presents the new perspective of deterministic test development at a high level for SBST of embedded processors, scaling from simple accumulator-based processor architectures to more realistic ones in terms of complexity, like contemporary complex processors implementing commercially successful ISAs (i.e., RISC ISAs), brings out several test challenges that remain to be solved. These challenges arise when high-level test development is applied to complex processor architectures that contain large functional components (e.g., fast parallel multipliers, barrel shifters) and large register banks while trying to keep the test cost as low as possible.

The SBST methodology introduced in this paper addresses these test challenges by defining different test priorities for processor components and classifying them according to the defined priorities, proving that high-level test development based on the ISA and RTL description of a complex processor can lead to low test cost without sacrificing high fault coverage, independently of the processor implementation and postsynthesis (gate-level) results.

The proposed SBST methodology shares desirable characteristics with functional SBST methodologies, like test development at a high level using the ISA, but goes one step deeper, using RTL information and a divide-and-conquer approach targeting individual components with respect to the stuck-at fault model, thus providing very high fault coverage.

Our SBST methodology is also fundamentally different from other SBST methodologies targeting structural faults [3], [7], [8], [9] since it is based only on the high RTL description of the processor and its ISA and, thus, provides a technology-independent test development strategy, with the gate-level netlist required only for fault grading purposes. On the contrary, in the other structural SBST methodologies [3], [7], [8], [9], the gate-level netlist is not just a simple step of the test generation process (in the traditional way of ATPG or pseudorandom TPG) but a key part of iterative test generation loops, which leads to increased computational cost. Focus on low test application cost (test program size and test program cycles) is another fundamental difference between this work and previous works in SBST targeting structural faults [3], [7], [8], [9], [10]. Assigning different test priorities to the processor components and then developing low-cost (small and fast) test routines for the most critical-to-test components of the processor results in smaller on-chip memory requirements and shorter test program download and application time, while the fault simulation time required for fault grading is minimized, thus providing an efficient low-cost alternative structural approach.

3 THE PROPOSED HIGH-LEVEL COMPONENT-BASED SBST METHODOLOGY

The key characteristics of our high RT-level, component-based SBST methodology for complex embedded processors are the following:

- A divide-and-conquer approach is applied using component-based test development.
- Test development is based only on the Instruction Set Architecture (ISA) of the processor and its RT-level description, which is, in almost all cases, available, without the need for low gate-level fine-tuning.

Although the main target always remains high structural fault coverage (stuck-at fault model), test cost should be considered a very important aspect. The proposed methodology has two main objectives, both aiming at low test cost: 1) the generation of as small and as fast as possible code routines with 2) as small as possible engineering effort and test development time. The first objective leads to smaller download times at the low frequency of the external tester as well as smaller test execution times of the routines, therefore reducing the total processor test time. The second objective reduces test development cost and time-to-market, leading to significant improvements in product cost-effectiveness and market success.

The high RT-level test development of the proposed approach is well-suited to the high RT-level flow of the design cycle. Since design, simulation, and synthesis are usually carried out at the high RT level, test development can also be carried out at the same level, providing high convenience and flexibility. In this case, processor cores can be easily integrated into an SoC environment, configured, and retargeted in a variety of silicon technologies without any specific need for fine-tuning the test development to specific synthesis optimization parameters using a specific technology library. Experimental results show that the gate-level-independent proposed test strategy is very efficient and achieves more or less the same high fault coverage results for different gate-level implementations of the RTL processor core.

The SBST methodology for complex embedded processors introduced in this work consists of the three phases shown in Fig. 2 and explained in the following.

Phase A: Identification of processor components and component operations, as well as instructions that excite component operations and instructions (or instruction sequences) for controlling or observing processor registers.

Phase B: Categorization of processor components in classes with the same properties and component prioritization for test development.

Phase C: Development of self-test routines using compact loops of instructions, based on reusing a library of test algorithms for generic functional components in pseudocode, tailored to the ISA of the processor under test, that generate small precomputed test sets. These algorithms provide very high fault coverage for most types and architectures of the processor components, independently of word length.

These phases are explained in more detail in the following subsections.
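As a rough sketch of how the three phases fit together (the data model and all names below are our own illustrative assumptions, not identifiers from the paper): phase A yields components with their operation-to-instruction sets, phase B orders them by class priority, and phase C maps the prioritized functional components to generic test-algorithm templates from a reusable library.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    cls: str                                 # "functional" | "control" | "hidden"
    ops: dict = field(default_factory=dict)  # operation -> instruction set I_{C,O}

# Phase B test priorities: functional components first (lowest number).
PRIORITY = {"functional": 0, "control": 1, "hidden": 2}

def phase_a(extracted):
    """Phase A: collect components, their operations, and the instruction
    sets I_{C,O} that excite each operation (supplied here directly)."""
    return [Component(**e) for e in extracted]

def phase_b(components):
    """Phase B: classify components and order them by test priority."""
    return sorted(components, key=lambda c: PRIORITY[c.cls])

def phase_c(components, library):
    """Phase C: assign a generic test-algorithm template to each
    functional component; other classes are deferred."""
    return {c.name: library.get(c.name, "generic_deterministic_loop")
            for c in components if c.cls == "functional"}

parts = phase_a([
    {"name": "alu", "cls": "functional",
     "ops": {"add": ["ADD", "ADDI"], "and": ["AND", "ANDI"]}},
    {"name": "branch_predictor", "cls": "hidden"},
    {"name": "memory_controller", "cls": "control"},
])
ordered = phase_b(parts)
plans = phase_c(ordered, {"alu": "operand_pattern_loop"})
```

In this sketch only functional components receive routines initially; control and hidden components would be revisited only if the overall fault coverage turned out to be inadequate, mirroring the priority scheme described above.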

Fig. 2. Proposed SBST methodology phases.

3.1 Phase A: Information Extraction


The starting point of our component-based SBST test
strategy is the instruction format (derived from the
processor ISA) and the processor micro-operations (derived
from the processor RTL description). Based on these, we
extract the following information, as summarized in Fig. 3: Fig. 3. Proposed SBST methodology (phase A).

1. The set of all the processor components C. by appropriate control signals generated after
2. The set OC of all the operations of each component C, instruction decoding.
along with the corresponding control signals that the . storage components, which serve as data storage
processor control unit drives to C for the execution elements that feed the data to the inputs of the
of this operation. computational components and capture their
3. The set of instructions IC;O that, during their output. Components classified in this subcate-
execution, excite the same control signals and drive gory include special processor registers visible
component C to perform the same operation O. to the assembly language programmer and the
4. Appropriate instructions or instruction sequences register file.
that set the values of processor registers.
5. Appropriate instructions or instruction sequences Our methodology acquires the information on the
that make the values of processor registers obser- number and types of functional components from
vable at primary outputs. the RTL description of the processor.
The last two parts of the information extracted in phase A, . Control components. The components that control
are used in building the peripheral instructions usually either the flow of instructions/data inside the
required to apply the basic test instruction I from the set of processor core or from/to the external environment
instructions IC;O and observe the result in processor (memory, peripherals). These components include
primary outputs. the processor control unit, the instruction and data
memory controllers that implement instruction
3.2 Phase B: Component Classification and Test fetching and memory handshaking, and similar
Priority components. These components are not directly
Using the information extracted in the previous phase A, related to specific functions of the processor and
we classify and prioritize the processor components as their existence is not evident in the instruction
summarized in Fig. 4. format and micro-operations of the processor.
We classify the components that appear in a processor . Hidden components. The components that are added
core RTL description in the following three classes: in the processor architecture usually to increase its
performance, but they are not visible to the
. Functional components. The processor components assembly language programmer. These components
that are directly related to the execution of instruc- include pipeline logic (pipeline registers, pipeline
tions (data processing/data storage) and are in some
sense visible to the assembly language programmer.
These components are either:

. computational components, which perform speci-


fic arithmetic/logic operations on data. Compo-
nents classified in this subcategory include
Arithmetic Logic Units (ALUs), shifters, barrel
shifters, multipliers, dividers, etc.
. interconnect components between processor com-
ponents, which serve the data flow in a
processor datapath. Components classified in
this subcategory include multiplexers controlled Fig. 4. Proposed SBST methodology (phase B).
466 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 4, APRIL 2005

multiplexers, and control) and other performance-increasing
components related to Instruction Level Parallelism (ILP)
techniques, branch prediction techniques, etc.

For a low test cost, high-level SBST methodology that aims to
develop small test routines in a small test development time,
the three classes of components have different test priorities.
Test priority determines the order in which test routines will
be developed for each component. High priority components will
be considered first, while low priority components will be
considered afterward and only if the achieved overall fault
coverage result is not adequate. In many cases, test
development for top priority components leads to sufficient
fault coverage for not targeted components as well due to
collateral coverage. This is particularly true in processor
architectures where the execution of instructions in functional
units excites many of the control subsystem components as well.
The characteristics of a module that determine its priority in
our methodology are the relative size and the accessibility
(controllability and observability) of the component by
processor instructions.

First of all, we deal with the processor components that have
the largest contribution to the overall processor fault
coverage. Obviously, these are the largest components of the
processor, which include most of the stuck-at faults of the
entire processor.

The following three observations are valid in the majority of
processor architectures:

. The register file of the processor is one of the largest
components. This fact is particularly true in RISC processors
with a load/store architecture, which have many general-purpose
registers. Large register files offer many advantages, enabling
compilers to increase performance by reducing memory traffic.

. The parallel multiplier, if present, is also one of the
largest components. This fact is particularly true in RISC
processors, as well as in DSPs.

. The functional components of the processor that perform all
arithmetic and logic operations of the processor are much
larger than the corresponding control logic which controls
their operation. This size difference grew larger when
processor generations moved from internal buses of 8 and
16 bits to those of 32 or 64 bits. Additionally, in DSPs, where
many functional components of the same type coexist, the
processor size dominating factor is the size of the functional
components.

Therefore, the functional components have the highest test
priority due to their relative size.

The other characteristic of a module that determines its
priority in our methodology is the accessibility
(controllability and observability) of the component by
processor instructions. The controllability of a processor
component relates to how easily an instruction sequence can
apply a test pattern to the component inputs, while the
observability relates to how easily an instruction sequence
propagates component output values to the primary outputs of
the processor. Usually, the controllability (observability) of
processor components is directly mapped to the controllability
(observability) of the registers that drive (are driven by) the
component inputs (outputs). Processor functional components
usually provide easy and full accessibility. On the contrary,
the sequential nature of control components imposes several
accessibility difficulties, while control components that
implement interface functions with memory (instruction fetch,
memory handshaking, etc.) impose additional serious
controllability problems. Testability of hidden components,
like branch prediction units, is very limited using SBST since
such components are not visible to the assembly language
programmer. Manufacturing faults in such components do not lead
to functional differences, while the detection of the
introduced performance faults is not possible using SBST and
special DFT hardware is usually required.

Therefore, according to our methodology, the functional
components of the processor have the highest test priority for
test development since their size dominates the processor area
and they are easily accessible. When total processor fault
coverage is not sufficient, test development should proceed to
the other two classes of components.

3.3 Phase C: Test Routine Development

In this subsection, we elaborate on the test routine
development at the component level, which is performed as shown
in Fig. 5.

The development of the dedicated self-test routines for testing
specific processor components according to our test methodology
is performed in two steps as follows:

Step C1: Instruction Selection. In Phase A, we have identified
the set I_{C,O} which consists of those processor instructions
I that, during execution, cause component C to perform
operation O. The instructions that belong to the same set
I_{C,O} have different controllability/observability properties
since, when operation O is performed, the inputs of component C
are driven by internal processor registers with different
controllability characteristics, while the outputs of component
C are forwarded to internal processor registers with different
observability characteristics. Therefore, for every component
operation O_C derived in phase A, we select an instruction I
from the set I_{C,O} (basic test instruction) that results in
selecting the shortest instruction sequences (peripheral
instructions) required to apply the specific operand to
component inputs and propagate the component outputs to the
processor primary outputs. This way we do not need further
constraint analysis.

As an example, we highlight instruction selection for the
functional component ALU. Let us consider a common ALU, like
the ALU used in the case of our second benchmark, a MIPS R3000
compatible processor. Such an ALU has two 32-bit inputs, one
32-bit output, and a control input that specifies the component
operations. The set of operations O_ALU that component ALU
performs is:

O_ALU = {add, subtract, and, or, nor, xor,
         set on less than unsigned, set on less than signed}.

For every operation O, we identify the corresponding set of
processor instructions I_{ALU,O} that, during execution, cause
component ALU to perform operation O. According to the proposed
test routine development, one instruction I from each set
I_{ALU,O} is needed in order to apply the deterministic data
operands that each operation O imposes. The set I_{ALU,NOR} has
only one processor instruction and, thus, the selection is
straightforward. The sets I_{ALU,OR}, I_{ALU,XOR},
KRANITIS ET AL.: SOFTWARE-BASED SELF-TESTING OF EMBEDDED PROCESSORS 467

Fig. 5. Proposed SBST methodology (phase C).

I_{ALU,AND}, I_{ALU,SUB}, I_{ALU,SET ON LESS THAN UNSIGNED},
and I_{ALU,SET ON LESS THAN SIGNED} consist of two
instructions, one in the R-type (register) format and the other
in the I-type (immediate) format. In this case, the
instructions in the R-type format are selected since they
provide the best controllability and observability
characteristics due to the use of the fully controllable and
observable general purpose registers of the register file.
Finally, the I_{ALU,ADD} set consists of a large number of
instructions since the ALU is also used in memory reference
instructions. In this case, we also select, for the same
reasons, the ADD instruction in the R-type format.

At this point, it should be noted that the proposed test
development is component-based and the ALU component is the one
considered in this example. Although using the I-type format
would result in collateral coverage for other components
external to the ALU (like MUXes, the sign extender, the control
unit, etc.), use of the I-type format for the ALU could cost
more (a 32-bit load would require two instructions, increasing
test program size and cycles). Collateral coverage for
not-targeted components would require fault simulation at the
processor level to estimate the trade off between total
processor fault coverage increase and test application cost.

Step C2: Operand Selection. In this step, we consider the
deterministic operands that must be applied to each component
to achieve high structural fault coverage.

The key to selecting the most appropriate deterministic
operands lies in the architecture of the most critical-to-test
processor components. Such components have an inherent
regularity, which can lead to very efficient test algorithms
for any gate-level implementation. This inherent regularity is
not exploited either by pseudorandom test development or by
ATPG-based test development approaches. Many processor
components, in particular the vast majority of functional
components like computational (arithmetic and logic operation
modules), interconnect (multiplexers), and storage components
(registers, register files), have a very regular or semiregular
structure. Regular structure appears in several forms, such as
arrays of identical cells (linear or rectangular), tree-like
structures of multiplexers, memory element arrays, etc. Such
components can be efficiently tested with small and regular
test sets that are gate-level independent, i.e., provide high
fault coverage for any different gate-level implementation.

We have developed a component test library of test algorithms
that generate small deterministic tests and provide very high
fault coverage for most types and architectures of functional
processor components. The nature of these deterministic tests
is fundamentally different from ATPG-generated test patterns
since a gate-level ATPG tool is not "smart" enough to identify
regular structures and generate patterns optimized for
compact-code software-based test application. Based on these
small deterministic tests, we have developed test routines
that, besides the small number of tests, take advantage of the
test vectors' regularity and algorithmic nature, resulting in
efficient compact loops. These loop-based compact test routines
require a very small number of bytes, while the small number of
tests results in low test routine execution time. The test
library routines, which describe a test algorithm in assembly
pseudocode for almost all generic functional components, are
tailored each time to the instruction set and assembly language
of the processor under test. We remark that, although
constructing such a test library seems like a time-consuming
task that requires substantial manual effort, it is a one-time
cost that cannot be directly associated with the manual effort
and test development cost for a specific processor under test,
since it provides reusability. Different processors require
only tailoring the generic processor component tests to their
ISA. Of course, the component test library can be enriched
anytime with new deterministic tests, like tests that deal with
new architectures and algorithms implementing, for example,
arithmetic/logic operations.

As already mentioned, for low-cost test development, it would
be very desirable for an SBST methodology to be gate-level
independent. Since our methodology is component-based,
targeting structural (stuck-at) faults, it would be very
desirable if implementation independent component tests were
available. In fact, previous work endorses such an approach
[13].

In the remainder of this section, we describe how regular
structures in datapath functional components that implement the
most usual arithmetic, logic, interconnect, and storage
operations can be efficiently tested with a few deterministic
tests without the need for gate-level details.

Arithmetic components testing. For components that implement
arithmetic operations, we have developed deterministic tests
for every type of arithmetic operation, like addition,
subtraction, multiplication, and division, for various word
lengths. Deterministic tests are also available for several
architecture and algorithm alternatives. For example, for the
addition operation, precomputed tests exist for ripple-carry
adder (RCA), carry-look-ahead (CLA), etc., architectures.
Likewise, for the multiplication operation, precomputed tests
exist for carry-save array, Booth-encoded, Wallace tree
summation, etc., architectures [14], [15].

Interconnect/multiplexer testing. An n-to-1 multiplexer is
usually decomposed and implemented by smaller multiplexers in a
tree structure. For example, an 8-to-1 multiplexer can be
classically implemented as a tree of seven 2-to-1 multiplexers.
The implementation of an n-to-1 multiplexer tree is not unique
and, in an RTL design, logic synthesis tools can generate
different implementations depending on the technology mapping
algorithms and technology libraries. For the 2-to-1, m-bit wide
multiplexer, a test set provides complete single stuck-at fault
testability if it applies the following four input combinations
(S, A, B): (1,0...0,1...1), (0,1...1,0...0), (1,1...1,0...0),
(0,0...0,1...1), where S is the select signal of the
multiplexer and A, B are the two m-bit wide data inputs which
are selected when S = 0 and S = 1, respectively. We have
extended this test set for the case of n-to-1, m-bit wide
multiplexers. This implementation independent deterministic
test set is linear: the minimum number of 2n test vectors is
required, and it provides 100 percent single stuck-at fault
coverage.

Register file testing. The register file component can be
considered as a hierarchical design with subcomponents. These
subcomponents are the write address decoder, the register
array, and the two large read ports. We have developed
deterministic tests for the two large multiplexer trees
(usually 32-input, 32-bit wide multiplexers for processor
architectures with 32 registers, each register 32 bits wide)
that implement the two read ports, as well as deterministic
tests for the flip-flops and write enable multiplexers that
implement the register array. This implementation independent
deterministic test set results in near complete (>99.5 percent)
stuck-at fault coverage.

Logic array testing. Testing a logic array that implements
multibit Boolean logic operations like and, or, nor, xor, not
is performed by applying well-known necessary patterns while
propagating the test response to well observable registers and
primary outputs. For example, a multibit AND logic operation is
tested by applying the following data patterns (A, B):
(0...0,1...1), (1...1,0...0), (1...1,1...1), where A, B are the
multibit input buses.

In the case where the overall processor fault coverage,
measured by fault simulation at the processor level after
targeting the datapath functional components according to the
proposed methodology, is not adequate, test development should
proceed with the other classes of components (i.e., control,
hidden). However, it is very important to note that, in many
practical cases, the control components and, especially, the
control unit are sufficiently tested due to collateral coverage
resulting from the instructions used to deliver and propagate
tests to the rest of the targeted components. If this is not
the case, for such control-oriented subsystems, we adopt
standard verification-based functional testing techniques with
test development still performed at high level. Many times,
verification-based tests cannot guarantee acceptable fault
coverage, while the manual effort required to derive simple
instruction sequences that verify control subsystem
functionality is higher than the effort required for processor
functional components, where reusing our test library minimizes
costly manual effort. Satisfaction of high-level RTL
verification metrics supported by industry standard simulation
tools, like RTL statement, branch, condition, and expression
coverage, helps to reduce the verification manual effort.

Besides the overhead in test development cost, testing the
control and hidden subsystems imposes a small overhead in the
test program size and test application time as well. Such
overheads characterize functional testing approaches since it
is very difficult for such components to be tackled by small
and fast deterministic routines. Thus, we adopt such
verification-based functional testing techniques for the
control and hidden subsystems with the test cost/fault coverage
trade off in mind. This can be considered as a limitation of
the proposed methodology. We should remark that, in many
practical cases, verification-based test routines are developed
at the design verification phase; therefore, they can be reused
for the testing of control and hidden components with no
additional manual effort involved or, in the worst case,
substantially alleviating any manual self-test routine
development effort.

4 COMPARISON WITH ATPG-BASED AND PSEUDORANDOM-BASED SBST

Deterministic tests take advantage of the inherent regularity
of several processor functional components' architecture. These
patterns are applied by compact self-test routines with high
structural test coverage for any gate-level implementation.
This inherent regularity is not exploited either by
pseudorandom test development or by ATPG-based test development
approaches.

The comparison with other software self-test approaches is
favorable since, even if complex constraint extraction and
constrained TPG (pseudorandom and structural ATPG) were
feasible, they would require application of long pseudorandom
pattern sequences (increased test execution time, very long
fault simulation time for fault grading) for random pattern
resistant components, or a large amount of test data
(structural ATPG patterns) downloaded from the tester to
on-chip memory (long test download time).

To validate the former argument, a set of experiments was
performed targeting two processor components: one of the most
critical-to-test processor components, the parallel multiplier,
and one of the two register file read ports. Besides the
significance of the contribution of these components to the
total processor area and fault coverage, these two components
were selected as case studies for another major reason as well.
Test development according to other structural SBST approaches
applied at the gate level was feasible for these case study
components. That is:

. Manual constraint extraction was performed so that
pseudorandom component tests apply on independent inputs.

. For structural ATPG generated tests, a commercial ATPG tool
that was able to generate test patterns under constraints was
provided with a set of manually extracted constraints.

We remark that, for other functional units, particularly
sequential ones like the register file as a whole and the
serial divider, manual constraint extraction and sequential
ATPG is not an easy task, if at all possible.

TABLE 1
Multiplier Test Routine Statistics Comparing with ATPG and
Pseudorandom SBST

4.1 Comparisons on the Parallel Multiplier

A set of 168 test patterns was generated for the multiplier by
constrained structural ATPG using a commercial tool. Two
different test routines (TRs) were developed to apply the
deterministic ATPG test patterns using different approaches:

1. TR_ATPG-LOOP applies the ATPG test patterns to the parallel
multiplier using a load-apply-store test program template,
using loops and memory accesses to retrieve the ATPG test
patterns.

2. TR_ATPG-IMM applies the ATPG test patterns using the
immediate addressing format template.

The first approach would lead to less TR code but increased
test application time, while the second one would lead to
increased TR code but fewer test cycles.

Apart from the ATPG generated structural patterns, a software
LFSR was coded in assembly language to apply pseudorandom
patterns to the parallel multiplier (similarly to the software
LFSR used in [3]). Test routine statistics and fault coverage
results for each TR, along with the proposed methodology TR,
TR_PROPOSED, are given in Table 1.

The conclusion behind the contents of Table 1 is that, even
when:

. gate-level tuned, computationally costly test development is
performed,

. constraint extraction and constraint-based structural ATPG
and pseudorandom test development is possible,

the test routine size and test application time of such
low-level structural approaches are severely increased.

If we consider the test routine size as the most important
factor due to the low speed tester download time, the proposed
method provides the smallest test routine since it applies the
test library algorithmic deterministic tests [14], [15] using
compact loops. Only the coding of the software LFSR provides
similar test routine size, since pseudorandom pattern
generation can also be implemented algorithmically using a
compact loop of instructions, but at the expense of severely
increased (order of magnitude) test application time.
Furthermore, it should be noted that the pseudorandom approach
fails to reach acceptable fault coverage after application of
600 test patterns. The only solution would be application of
multiple LFSR sequences using different seeds, increasing the
test application time and test routine size further. On the
other side, applying structural ATPG deterministic patterns
that, due to their deterministic nature, provide the highest
fault coverage (slightly higher than the proposed algorithmic
deterministic ones) is a cost-efficient solution only when the
number of test patterns is very small. Besides, when the
component under test is pure random logic, ATPG-based (if
feasible) SBST is the only practical solution.

4.2 Comparisons on the Register-File Read Ports

In this subsection, we provide details on how the deterministic
high-level approach is applied on an important processor
component and show how deterministic tests that provide
100 percent fault coverage result in an order of magnitude
improvement in test program statistics, like test program size
and test application time, when compared with low-level ATPG
generated tests. This is due to the fact that the proposed test
library's deterministic tests have been developed with SBST
application in mind from the beginning and have an inherent
regularity. No ATPG tool is "smart" enough to explore component
regular structures and take advantage of them.

Since the proposed methodology's test development is
component-based, targeting structural faults, the register file
needs to be considered as a hierarchical design with
subcomponents. Each read port subcomponent is implemented by an
n-to-1 multiplexer where each of the n inputs is m bits wide.
The n-to-1 multiplexer m-bit wide inputs are driven from the
n registers. A multiplexer (mux) has two sets of inputs, the
data inputs (D_0, D_1, ..., D_{n-1}) and select inputs
(S_0, S_1, ..., S_{p-1}), where p = ceil(log2 n). An n-to-1 mux
is usually decomposed and implemented by smaller muxes in a
tree structure. The implementation of an n-to-1 mux tree is not
unique and, in a register-transfer-level (RTL) design, logic
synthesis tools can generate different implementations
depending on the technology mapping algorithms and technology
libraries.

It is highly desirable to provide a test set that achieves
100 percent fault coverage with respect to the stuck-at fault
model for a high-level RTL description of the n-to-1 mux while
being independent of the gate-level implementation.
Deterministic testing of the n-to-1 mux requires the
application of 2n test patterns, being linear to the number of
data inputs n.

At this point, we introduce the term Data Input Configuration
(DIC), which denotes the vector that is applied to the data
inputs of the mux. We denote the number of DICs for a mux test
set as N_DIC. Since the n mux data inputs are driven from the
n registers, it is of essential importance that the number of
DICs is minimized, since each DIC would require a different
instruction sequence for the setting of the n registers that
correspond to each mux data input.

The deterministic test set we propose for an n-to-1 mux that
results in the minimum of two DICs is the following:

Proposed Minimum DIC Test Patterns. Apply all 2^p possible
select input vectors, while each data input D_i
(i = 0, 1, ..., n-1) should have the parity of the
corresponding p-bit select input vector (S_0, S_1, ..., S_{p-1})
(even as the first DIC and odd as the second DIC).

It should be noted that the number of test patterns is 2n, but
N_DIC, which is very important for test code size and test
program cycles, is now minimized. The proposed minimum DIC test
patterns are illustrated in Fig. 6 for a 4-to-1 mux. The
scaling of the above patterns for the 1-bit wide, 4-to-1 mux to
the m-bit wide, n-to-1 mux that implements each
register bank read port is very simple. For example, if we
consider a register bank with n = 4 registers, 32 bits wide
(32-bit wide 4-to-1 mux), an assembly instruction sequence
similar to the following could set the four registers for the
minimum of two DICs (we use MIPS assembly language notation):

; Instruction sequence for DIC#1
li $0, 0x00000000
li $1, 0xFFFFFFFF
li $2, 0xFFFFFFFF
li $3, 0x00000000
; Test application and propagation
; to primary output
sw $0, memory
sw $1, memory
sw $2, memory
sw $3, memory
; Instruction sequence for DIC#2
li $0, 0xFFFFFFFF
li $1, 0x00000000
li $2, 0x00000000
li $3, 0xFFFFFFFF
; Test application and propagation
; to primary output
sw $0, memory
sw $1, memory
sw $2, memory
sw $3, memory

Fig. 6. Minimum DIC deterministic test patterns for 4-to-1 mux.

In the above code, the instruction store word (sw), with
instruction format sw rt, offset(base), that stores a word to
memory is used. It is a single instruction used for test
application since the rt field in its instruction format is
directly mapped to the read-port select inputs, while at the
same time it propagates the read port test response to a
primary output. The read port data inputs are driven from the
register array, which is fully controllable (i.e., using the
immediate addressing format). The test routine size of this
deterministic approach that leads to 100 percent fault
coverage, calculated as a function of n (where n equals the
number of registers as well as the number of inputs to the
n-to-1 mux that implements each register bank read port), is
6n words (considering that li is a pseudoinstruction
implemented as two instructions), while the number of cycles is
10n (considering that the sw instruction requires three cycles
and the other instructions one cycle). For the n = 4 register
example given above, 24 words (12 for applying DIC#1 and 12 for
applying DIC#2) and 40 cycles (20 for applying DIC#1 and 20 for
applying DIC#2) are required.

We can apply structural ATPG on the register file read port
since it is a combinational component easily handled by
commercial ATPG tools, while the ATPG is unconstrained since
both the read port inputs and outputs are controllable and
observable, respectively, using processor instructions. Our
goal is to apply the generated structural ATPG patterns using
processor instructions, as other structural SBST methodologies
propose, for comparison with the proposed SBST approach.

The ATPG tool will provide k (k >= 2n) test patterns, where 2n
is the minimum number of test patterns required by the stuck-at
fault model. Each of the k test patterns will be a vector of
p + (m x n) bits, where p = ceil(log2 n). For example,
considering a register file with n = 4 registers, m = 32 bits
wide (32-bit wide 4-to-1 mux read port), each of the k ATPG
test patterns will be a 130-bit vector similar to the
following:

130-bit vector: S1 S0 (R0_31 R0_30 ... R0_0)
(R1_31 R1_30 ... R1_0) (R2_31 R2_30 ... R2_0)
(R3_31 R3_30 ... R3_0).

We can easily draw the conclusion that each of the k test
patterns would require a different DIC for the multiplexer,
resulting in a total of k (k >= 2n) different DICs required for
100 percent fault coverage.

A test program template using the immediate addressing format
required to apply the ATPG generated patterns of each DIC,
where the fields settable by the ATPG patterns in the
instruction format are shown in bold, would be similar to the
following:

; Instruction sequence for DIC#i (i=0,..,k-1)
li $0, R0_31 R0_30 .. R0_0
li $1, R1_31 R1_30 .. R1_0
li $2, R2_31 R2_30 .. R2_0
li $3, R3_31 R3_30 .. R3_0
; Test application and propagation to
; primary output
sw $(X), memory    ; X is the register defined by S1 S0

A test routine consisting of a total of k instruction sequences
similar to the above has to be coded for the generated ATPG
test patterns to be applied following other structural SBST
methodologies. The test routine size of this ATPG-based
approach, which also leads to 100 percent fault coverage, is
(2n + 1)k words, while the number of cycles is (2n + 3)k.

We summarize the comparison in test program statistics for the
testing of the register file read ports in Table 2, as a
function of the number of registers in the register file (n)
and the number of patterns generated by ATPG (k), along with
figures for n = 32 registers and the minimum number of ATPG
patterns k = k_min = 2n.

Again, as in the case of the parallel multiplier, the
conclusion behind the figures of Table 2 is that, even when
gate-level tuned and constraint-based structural ATPG test
development is possible, the test routine size and test
application time of such low-level structural approaches are
severely increased. In the register-file read ports, use of
low-level gate-level ATPG test development results in an order
of magnitude increase in both test program size and test
application time, in other words, in much higher test cost when
compared with the proposed deterministic methodology.
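The minimum-DIC construction of Section 4.2 can be sanity-checked mechanically. The sketch below (Python, used purely as an illustration; the netlist builder, the particular mux2 gate decomposition, and the net-level stuck-at fault model are our own assumptions, not the paper's commercial fault simulation flow) builds a 1-bit n-to-1 mux as a tree of 2-to-1 muxes, generates the 2n patterns from the parity rule, and fault-simulates every single stuck-at fault on every net:

```python
from itertools import count

def mux_tree_netlist(n):
    # 1-bit n-to-1 mux as a tree of 2-to-1 muxes, each realized as
    # y = (a AND NOT s) OR (b AND s).  Nets are integers; gates are
    # (op, out_net, in_nets) tuples listed in evaluation order.
    p = (n - 1).bit_length()
    data = list(range(n))                 # nets 0..n-1: D0..Dn-1
    sel = [n + j for j in range(p)]       # nets n..n+p-1: S0..Sp-1
    ids, gates = count(n + p), []
    def mux2(s, a, b):
        ns, t0, t1, y = next(ids), next(ids), next(ids), next(ids)
        gates.extend([("not", ns, (s,)), ("and", t0, (a, ns)),
                      ("and", t1, (b, s)), ("or", y, (t0, t1))])
        return y
    layer = data
    for j in range(p):                    # tree level j selects with Sj
        layer = [mux2(sel[j], layer[i], layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return gates, data, sel, layer[0]

def simulate(gates, in_vals, fault=None):
    # Evaluate the netlist; fault = (net, v) forces that net stuck-at v.
    val = dict(in_vals)
    if fault and fault[0] in val:
        val[fault[0]] = fault[1]
    for op, out, ins in gates:
        a = [val[i] for i in ins]
        val[out] = (1 - a[0] if op == "not"
                    else a[0] & a[1] if op == "and" else a[0] | a[1])
        if fault and out == fault[0]:
            val[out] = fault[1]
    return val

def min_dic_patterns(n):
    # Proposed test set: all 2^p select vectors, with Di set to the parity
    # of i for the first DIC and to its complement for the second DIC.
    p = (n - 1).bit_length()
    for dic in (0, 1):
        data = [bin(i).count("1") % 2 ^ dic for i in range(n)]
        for s in range(2 ** p):
            yield s, data

n = 4
gates, data_nets, sel_nets, out_net = mux_tree_netlist(n)

def output(s, data, fault=None):
    in_vals = {data_nets[i]: data[i] for i in range(n)}
    in_vals.update({sel_nets[j]: (s >> j) & 1 for j in range(len(sel_nets))})
    return simulate(gates, in_vals, fault)[out_net]

nets = set(data_nets) | set(sel_nets) | {g[1] for g in gates}
faults = [(net, v) for net in sorted(nets) for v in (0, 1)]
detected = {f for s, d in min_dic_patterns(n) for f in faults
            if output(s, d, f) != output(s, d)}
print(f"{2 * n} patterns, {len(detected)}/{len(faults)} stuck-at faults detected")
```

For n = 4 this reports 8 patterns detecting all 36 single stuck-at faults of this particular tree, and the first DIC it generates, (0, 1, 1, 0), is exactly the 0x00000000/0xFFFFFFFF/0xFFFFFFFF/0x00000000 register loading of the assembly sequence above. Note this checks only one decomposition; the text's claim is the stronger, implementation-independent one.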
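The closed-form comparison of Section 4.2 is easy to make concrete. The short computation below (Python, used only as a calculator for the formulas derived in the text: 6n words and 10n cycles for the proposed deterministic routine versus (2n + 1)k words and (2n + 3)k cycles for the ATPG-based one) reproduces the kind of figures Table 2 tabulates, for n = 32 and the ATPG best case k = 2n; the exact entries of Table 2 itself are not reproduced here:

```python
def deterministic_cost(n):
    # Proposed min-DIC routine: per DIC, n `li` pseudoinstructions
    # (2 words, 2 cycles each) plus n `sw` (1 word, 3 cycles); 2 DICs.
    return 6 * n, 10 * n                      # (words, cycles)

def atpg_cost(n, k):
    # ATPG-based routine: per pattern, n `li` (2 words, 2 cycles each)
    # plus one `sw` (1 word, 3 cycles), repeated for all k patterns.
    return (2 * n + 1) * k, (2 * n + 3) * k   # (words, cycles)

n = 32          # 32-register file -> 32-to-1 read-port mux
k = 2 * n       # best case: ATPG emits only the minimum 2n patterns
dw, dc = deterministic_cost(n)
aw, ac = atpg_cost(n, k)
print(f"deterministic: {dw} words, {dc} cycles")
print(f"ATPG (k = {k}): {aw} words, {ac} cycles")
print(f"ratio: {aw / dw:.1f}x words, {ac / dc:.1f}x cycles")
```

Even in the ATPG tool's best case (k = 2n = 64), the routine is roughly 20x larger and 13x slower than the 192-word, 320-cycle deterministic routine, i.e., the order-of-magnitude gap the text describes; any k > 2n widens it further.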
TABLE 2
Register File Read Port Test Routine Statistics Comparing with
ATPG-Based SBST

5 EXPERIMENTAL RESULTS

The proposed methodology is demonstrated through its complete
application to different processor implementations of a very
common instruction set architecture (ISA), the MIPS ISA. In
Section 5.1, a publicly available RISC processor model, the
Plasma/MIPS CPU [16], is used as a benchmark, while, in
Section 5.2, a second benchmark processor is designed using a
publicly available Application Specific Instruction set
Processor (ASIP) design environment [18] that implements a
5-stage pipeline version of the MIPS R3000 ISA.

5.1 Case Study A: The Plasma/MIPS Architecture Processor Model

The proposed methodology is demonstrated through its complete
application to different gate-level implementations of a
publicly available RISC processor model, called the Plasma/MIPS
CPU core. It supports interrupts and all MIPS I user mode
instructions except unaligned load and store operations (which
are patented) and exceptions. The synthesizable CPU core is
implemented in VHDL with a 3-stage pipeline [16].

We have enhanced the CPU model of [16] by adding a parallel
multiplier to provide closer correspondence with real-world
high-performance RISC processor core models. The fast
multiplier module was generated using a publicly available
module generator [19] and has the following characteristics:
Booth recoding, Wallace trees for partial product summation,
and fast carry-lookahead addition at the final stage. Table 3
shows the classification of the Plasma/MIPS components in the
classes described earlier.

The Plasma/MIPS processor VHDL model was synthesized in two
different technology libraries with different synthesis
parameters. Synthesis A was optimized for area, targeted a
0.35 um technology library, and the design runs at a clock
frequency of 57 MHz. Synthesis B was also optimized for area,
targeted a 0.50 um technology library, and the design runs at a
clock frequency of 42 MHz. Finally,

program was enhanced by verification-based functional testing
routines targeting the memory control and control logic
components.

The self-test routine statistics are given in Table 5 in number
of 32-bit words and number of clock cycles for routine
execution. Along with the total self-test routine statistics,
partial statistics are given for each targeted component as
well. The test cost in terms of test routine size and test
application time is very small, while test development is
performed at high RT-level.

Fault simulation results after the application of the developed
test routines to the processor netlists led to the fault
coverage results given in Table 6. The components that have
been targeted for test development are marked on their left
side in the table. The third, fifth, and seventh columns of
Table 6 show, for each synthesis result (Synthesis A, B, and
C), the percentage of the processor overall fault coverage
which is missing from each of the components.

Downloading from the ATE to memory a test program of less than
1K words, one can reach a high fault coverage of about
95 percent very fast with a very low test cost. Test
development time and cost are minimized (fulfilling tight
time-to-market) due to the one-time-cost precomputed test
library reuse for components described at high RT-level. Only
tailoring to the MIPS ISA and assembly language is required,
while the tests are gate-level independent. Test application
cost is proven to be very small since the small test program
size is directly related to the download time from the tester
to the on-chip memory, which usually dominates the total test
time, while the other component of the total test time, test
execution time (clock cycles), is also very small.

It is important to note that the fault coverage for the
register file (97.8 percent) is less than the almost complete
coverage (>99.5 percent) that our test library provides for a
generic register file because of extra logic implemented,
including external interrupt handling, a situation that
software self-test cannot handle without additional test
hardware associated with test instruction insertion. The
fault coverage of the parallel Booth recoded Wallace Tree
Synthesis C was optimized for delay, targeted a 0.35 um
multiplier is 99 percent. However, the fault coverage of the
technology library, and the design runs at a clock frequency
Parallel Multiplier/Serial divider component is 96 percent
of 74 MHz. The resulting gate counts for each component
due to the existence of not targeted control random logic
and the processor overall are shown in Table 4. A 2-input within this complex module.
NAND gate is the gate count unit. Mentor Graphics suite As is shown in Table 6, we obtained very similar fault
was used for VHDL synthesis, functional and fault coverage results when the processor was synthesized in
simulation (Leonardo, ModelSim, and FlexTest products, different technology libraries with different optimization
respectively). scripts, optimizing for area or performance. This set of
The components of highest priority for test development experiments was performed to show that test development
are the five functional components of the Plasma/MIPS for the targeted components which is performed at high
processor. We developed different self-test routines for RT-level results in high structural fault coverage that does
these functional components since they represent about not depend in the gate-level synthesis results. These
87 percent of the total processor area according to the experiments ensure that, in the real design cycle, where
proposed methodology. Although very high fault coverage processor cores in an SoC environment must be configured
(> 94.5 percent) was achieved in a first phase, the test and retargeted in different silicon technologies using
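The per-component accounting reported in Table 6 is simple weighted arithmetic over stuck-at fault counts. A sketch of that bookkeeping follows; the component names and fault counts below are hypothetical illustrations, not the values of Table 6:

```python
# Hypothetical per-component stuck-at fault counts: (detected, total).
faults = {
    "ALU":           (4800, 4900),
    "register file": (5870, 6000),
    "multiplier":    (9900, 10000),
    "control logic": (1500, 2100),
}

total_faults = sum(t for _, t in faults.values())
detected = sum(d for d, _ in faults.values())
overall_fc = 100.0 * detected / total_faults  # overall fault coverage, percent

# Share of overall coverage "missing" due to each component:
# its undetected faults as a percentage of all processor faults.
missing = {name: 100.0 * (t - d) / total_faults
           for name, (d, t) in faults.items()}

# Sanity check: overall coverage plus all missing shares equals 100 percent.
assert abs(overall_fc + sum(missing.values()) - 100.0) < 1e-9
```

This is why a small untargeted block of random control logic can cap the overall figure: its undetected faults are charged against the whole processor's fault population.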
472 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 4, APRIL 2005

TABLE 3
Plasma/MIPS Components Classification

TABLE 4
Plasma/MIPS Components Gate Counts
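The Booth recoding used by the added multiplier halves the number of partial products that the Wallace tree must then reduce. The radix-4 recoding step can be sketched as follows; this is a generic textbook illustration, not the module generator's actual code [19]:

```python
def booth_radix4_digits(y, n_bits):
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier y.

    Returns digits in {-2, -1, 0, 1, 2}; sum(d * 4**i) reconstructs y.
    Each digit selects one partial product, so n_bits/2 partial products
    remain for the Wallace tree to sum before the final fast addition.
    """
    assert n_bits % 2 == 0
    # Two's-complement bits of y (Python's >> sign-extends negative ints),
    # with the implicit y[-1] = 0 prepended.
    bits = [0] + [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    for i in range(0, n_bits, 2):
        b_m1, b0, b1 = bits[i], bits[i + 1], bits[i + 2]
        digits.append(b_m1 + b0 - 2 * b1)  # overlapping 3-bit group
    return digits
```

For a multiplicand x, partial product i is then digits[i] * x shifted left by 2*i bits; the Wallace tree reduces these and a carry-lookahead adder produces the final product.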

Otherwise, the conventional gate-level-specific ATPG method has to repeat the high-cost and time-consuming test generation for every different implementation. This fact demonstrates that the proposed methodology is efficient in targeting different gate-level netlists where technology remapping is required, by exploiting RT-level architectural characteristics and driving down the test development cost. This is certainly not the case in previously proposed approaches.

The proposed methodology provides at-speed testing using low-cost external testers. Self-test routines and data are downloaded by a low-cost external tester to the on-chip memory and the processor core executes the test program at speed. The test responses, which are stored back in on-chip memory, are unloaded by the low-cost tester. The amount of test code downloaded from the tester to the on-chip memory and the number of test responses unloaded from the on-chip memory to the tester determine the test time required to download and unload the test program and test responses, respectively. This process dominates the total test application time over the test execution time, since the processor executes the test program at speed. Thus, if the number of test responses is large, test response compaction may be required so that the external tester unloads only a very small number of compressed signatures. Since the test execution time depends on the processor speed and the processor cycles required to access the on-chip memory, the use of test response compaction is a trade-off with parameters determined by the specific processor architecture and the adopted test methodology. The proposed test methodology applies a small number of deterministic tests, so test response compaction might not be necessary. However, this is not the case for a pseudorandom-based test methodology, where the very large number of test responses would normally require compaction.

The proposed methodology achieves very high fault coverage of about 95 percent by downloading a small test program of less than 1K words (853 words). The number of test responses stored in on-chip memory after the execution of the self-test program is 610 words. This number is very small compared to other pseudorandom-based SBST methodologies, which would require the application of a large number of pseudorandom patterns and storage of their test responses.

Our test library provides an MISR test response compaction routine which can be used to compress the targeted components' test responses into signatures with negligible (less than 0.2 percent) aliasing probability, at the cost of downloading an additional 58 words in our case study. Thus, when the test library MISR routine is tailored to the MIPS ISA and used to compress the test responses stored in on-chip memory, the number of words downloaded is 911 and the number of signatures unloaded is seven words, while the total test program execution time becomes 14,427 cycles. Of course, whether compaction is used to shorten test application time depends on the relation between the internal processor frequency and the external tester frequency.
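A software MISR of the kind described above can be modeled compactly; the following is a minimal sketch assuming a 32-bit signature register with the CRC-32 feedback polynomial (the paper does not specify which polynomial its library routine uses):

```python
POLY = 0x04C11DB7  # CRC-32 polynomial; an assumed choice for illustration


def misr_compact(responses, width=32, poly=POLY, seed=0):
    """Fold a stream of response words into one signature word.

    Each step shifts the signature left by one bit, feeds the tap
    polynomial back when the top bit falls out, and XORs in the next
    response word (the "multiple input" part of an MISR).
    """
    mask = (1 << width) - 1
    sig = seed
    for word in responses:
        msb = (sig >> (width - 1)) & 1
        sig = ((sig << 1) & mask) ^ (poly if msb else 0) ^ (word & mask)
    return sig
```

For a well-chosen k-bit MISR the aliasing probability is roughly 2^-k, far below the 0.2 percent bound quoted above, which is why 610 response words can be condensed into a handful of signature words at negligible risk of masking a fault.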
KRANITIS ET AL.: SOFTWARE-BASED SELF-TESTING OF EMBEDDED PROCESSORS 473

TABLE 5
Self-Test Routines Statistics on Plasma/MIPS

TABLE 6
Fault Coverage on Plasma/MIPS with Test Development for Targeted Components
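The trade-off between tester frequency and processor frequency discussed above can be made concrete with a back-of-the-envelope model. The word counts and the 14,427-cycle figure come from the case study; the tester frequency, transfer rate of one word per tester cycle, and the execution cycle count without the MISR routine are assumptions for illustration:

```python
def sbst_time_s(words_down, words_up, exec_cycles, tester_hz, cpu_hz):
    """Total SBST application time: download + at-speed execution + unload.

    Transfers run at the tester clock (one word per tester cycle is an
    assumption); test execution runs at the processor clock.
    """
    transfer = (words_down + words_up) / tester_hz
    execute = exec_cycles / cpu_hz
    return transfer + execute


TESTER_HZ = 10e6  # assumed low-cost tester frequency
CPU_HZ = 57e6     # Synthesis A clock frequency

# With the MISR routine: 911 words down, 7 signature words up.
t_misr = sbst_time_s(911, 7, 14_427, TESTER_HZ, CPU_HZ)

# Without compaction: 853 words down but 610 response words up;
# the execution cycle count here is an assumption, not a reported value.
t_plain = sbst_time_s(853, 610, 13_000, TESTER_HZ, CPU_HZ)
```

With these assumed frequencies the MISR variant moves 545 fewer words across the tester interface, so it wins whenever the extra cycles it executes cost less than the transfer time it saves; with a much faster tester the balance can tip the other way.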

The proposed SBST methodology applies a small number of deterministic tests, thus test response compaction might not be necessary.

5.2 Case Study B: A 5-Stage Pipeline MIPS Compatible Processor Model

We have designed a MIPS R3000 [17] compatible processor with a 5-stage pipeline using the Application Specific Instruction set Processor (ASIP) design environment of [18], for evaluating the proposed methodology on a different processor implementation of the same MIPS I ISA. A 52-instruction subset of the MIPS R3000 instruction set [17] was implemented, while coprocessor and interrupt instructions were not implemented in this experiment. It should be noted that, in the current educational release version of the ASIP design environment of [18], data hazard detection and register bypassing are not implemented. The generated RTL VHDL model has been synthesized, optimized for area, targeting a 0.35 um technology library; the design runs at a clock frequency of 44 MHz with a total gate count of 37,400 gates. The resulting gate counts for each component and for the processor overall are shown in Table 7. Again, the Mentor Graphics suite was used for VHDL synthesis, functional simulation, and fault simulation (the Leonardo, ModelSim, and FlexTest products, respectively).

A test program was constructed targeting the functional modules and was further enhanced by instruction sequences targeting the processor controller following a verification-based functional testing approach. Test program statistics are illustrated in Table 8. It should be noted that, although this processor benchmark implements almost the same MIPS I ISA as the previous one, with more complex control and pipeline, the fact that the VHDL RTL design generated by the ASIP design environment does not support data hazard detection and register bypassing imposes a requirement for careful assembly instruction scheduling, along with the insertion of nop instructions wherever necessary. This resulted in increased test program size and test execution time. Otherwise, the test program statistics would be very similar to those of the previous case study A.

Fault simulation on the synthesized gate-level processor model after execution of the test program resulted in the fault coverage (stuck-at) figures given in Table 9 for some of the processor components, including the targeted components, which are marked. The fault coverage for the register file is the expected almost complete coverage (> 99.5 percent) that our gate-level-independent test library provides for a generic register file, while the fault coverage of the parallel Booth-recoded Wallace-tree multiplier subcomponent of the Parallel Multiplier/Serial Divider component is 99 percent. The total processor fault coverage is less than that of the previous case study benchmark due to the more complex pipeline (5 stages) and pipeline control, which are not targeted. However, by targeting only the functional components, which constitute about 70 percent of the total processor area, one can very easily reach high total fault coverage and then proceed to other classes of components if required.
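The nop padding forced by the missing hazard detection and bypassing can be automated during test program assembly. A scheduling sketch follows; the two-slot hazard window and the instruction tuple format are assumptions about a generic 5-stage pipeline without bypassing, not the ASIP tool's actual behavior:

```python
def insert_nops(program, window=2):
    """Pad a program so no instruction reads a register written by one
    of the previous `window` instructions.

    Each instruction is (name, dest, srcs); dest may be None.
    A 2-slot window models a pipeline whose results become visible to
    readers only after write-back (an assumption for illustration).
    """
    out = []
    for name, dest, srcs in program:
        # Keep emitting nops until no recent instruction writes a source.
        while any(d is not None and d in srcs
                  for _, d, _ in out[-window:]):
            out.append(("nop", None, ()))
        out.append((name, dest, srcs))
    return out


program = [
    ("addi", "r1", ()),            # r1 <- immediate
    ("add",  "r2", ("r1", "r1")),  # reads r1 just written: needs padding
    ("sub",  "r3", ("r2",)),       # reads r2 just written: needs padding
]
scheduled = insert_nops(program)   # two nops land before each dependent use
```

Every such nop pair inflates both the routine size in words and the cycle count, which is exactly the growth visible in Table 8 relative to case study A.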

TABLE 7
MIPS R3000 Compatible Processor Component Gate Counts

TABLE 8
Self-Test Routine Statistics on MIPS R3000 Compatible Processor

TABLE 9
Fault Coverage Results on MIPS R3000 Compatible

As in the previous benchmark, test development for the targeted components is performed at high RT level and results in high structural fault coverage that does not depend on the gate-level synthesis results.

6 CONCLUSIONS

We have analyzed a high-level, component-based software-based self-test methodology and demonstrated its application on two complex embedded processor architectures that implement a very popular RISC instruction set, including different gate-level implementations of these processors. The proposed SBST methodology does not require either manual or automatic constraint extraction for the processor components or any sequential ATPG at the gate level. It can be applied when just an RT-level description and the instruction set architecture of the processor are available. A postsynthesis gate-level netlist is required only for fault grading and does not constitute part of any test generation iterative loop. The methodology aims to construct small and fast self-test routines for the most important and test-critical components of the processor in order to achieve a low test application cost.

Low test development cost is accomplished by keeping the engineer's manual test writing effort as low as possible through test routine library reuse, while, at the same time, minimizing computational effort. Low-speed, low-cost external testers are utilized to excellent effect by the proposed methodology since the test program downloading from the tester memory to the on-chip memory, performed at the low frequency of the tester, is completed in very short time intervals.

We have demonstrated that high test quality (more than 95 percent fault coverage for the first benchmark CPU and more than 92 percent for the second) is achieved with low test development and test application costs for different architectures (3-stage and 5-stage pipeline, respectively) of a popular RISC instruction set architecture and for different gate-level implementations of the benchmark CPUs used in our experiments.

REFERENCES

[1] ITRS, 2001 edition, http://public.itrs.net/Files/2001ITRS/Home.htm.
[2] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, "Logic BIST for Large Industrial Designs: Real Issues and Case Studies," Proc. Int'l Test Conf., pp. 358-367, 1999.
[3] L. Chen and S. Dey, "Software-Based Self-Testing Methodology for Processor Cores," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 3, pp. 369-380, Mar. 2001.
[4] J. Shen and J. Abraham, "Native Mode Functional Test Generation for Microprocessors with Applications to Self-Test and Design Validation," Proc. Int'l Test Conf., pp. 990-999, 1998.
[5] K. Batcher and C. Papachristou, "Instruction Randomization Self Test for Processor Cores," Proc. VLSI Test Symp., pp. 34-40, 1999.
[6] P. Parvathala, K. Maneparambil, and W. Lindsay, "FRITS—A Microprocessor Functional BIST Method," Proc. Int'l Test Conf., pp. 590-598, 2002.
[7] F. Corno, M. Sonza Reorda, G. Squillero, and M. Violante, "On the Test of Microprocessor IP Cores," Proc. Design Automation & Test in Europe Conf., pp. 209-213, 2001.
[8] F. Corno, G. Cumani, M. Sonza Reorda, and G. Squillero, "Fully Automatic Test Program Generation for Microprocessor Cores," Proc. Design Automation & Test in Europe Conf., pp. 1006-1011, 2003.
[9] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, "A Scalable Software-Based Self-Testing Methodology for Programmable Processors," Proc. Design Automation Conf., pp. 548-553, 2003.
[10] N. Kranitis, A. Paschalis, D. Gizopoulos, and Y. Zorian, "Instruction-Based Self-Testing of Processor Cores," J. Electronic Testing: Theory and Applications, no. 19, pp. 103-112, 2003.
[11] A. Krstic, L. Chen, W.C. Lai, K.T. Cheng, and S. Dey, "Embedded Software-Based Self-Test for Programmable Core-Based Designs," IEEE Design and Test of Computers, pp. 18-26, July/Aug. 2002.
[12] S.M. Thatte and J.A. Abraham, "Test Generation for Microprocessors," IEEE Trans. Computers, vol. 29, no. 6, pp. 429-441, June 1980.
[13] H. Kim and J. Hayes, "Realization-Independent ATPG for Designs with Unimplemented Blocks," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 2, pp. 290-306, Feb. 2001.
[14] A. Paschalis, D. Gizopoulos, N. Kranitis, M. Psarakis, and Y. Zorian, "An Effective BIST Architecture for Fast Multiplier Cores," Proc. Design Automation & Test in Europe Conf., pp. 117-121, 1999.
[15] D. Gizopoulos, A. Paschalis, and Y. Zorian, "An Effective Built-In Self-Test Scheme for Parallel Multipliers," IEEE Trans. Computers, vol. 48, no. 9, pp. 936-950, Sept. 1999.
[16] Plasma CPU Model, http://www.opencores.org/projects/mips, 2005.
[17] G. Kane and J. Heinrich, MIPS RISC Architecture. Prentice Hall, 1992.
[18] ASIP Meister, http://www.eda-meister.org, 2005.
[19] J. Pihl and E. Sand, Arithmetic Module Generator, http://www.fysel.ntnu.no/modgen, 2005.

Nektarios Kranitis received the BSc degree in physics from the University of Patras, Greece. He is currently completing his work toward the PhD degree in computer science in the Department of Informatics & Telecommunications at the University of Athens, Greece. His current research interests include self-testing of microprocessor cores and SoCs. He is a student member of the IEEE and the IEEE Computer Society.

Antonis Paschalis received the BSc degree in physics, the MSc degree in electronics and computers, and the PhD degree in computers, all from the University of Athens. He is an associate professor in the Department of Informatics and Telecommunications, University of Athens, Greece. Previously, he was a senior researcher at the Institute of Informatics and Telecommunications of the National Research Centre "Demokritos" in Athens. His current research interests are logic design and architecture, VLSI testing, processor testing, and hardware fault tolerance. He has published more than 100 papers and holds a US patent. He is a member of the editorial board of JETTA and has served the test community as vice chair of the Communications Group of the IEEE Computer Society TTTC and by participating in several organizing and program committees of international events in the area of design and test. He is a member of the IEEE and the IEEE Computer Society.

Dimitris Gizopoulos received the computer engineering degree from the University of Patras, Greece, and the PhD degree from the University of Athens, Greece. He is an assistant professor in the Department of Informatics, University of Piraeus, Greece. His research interests include processor testing, design-for-testability, self-testing, online testing, and fault tolerance of digital circuits. He is the author of more than 60 technical papers in transactions, journals, book chapters, and conferences, author and editor of two books, and co-inventor of a US patent, all in test technology topics. He is a member of the editorial board of IEEE Design and Test of Computers and guest editor of special issues in IEEE publications. He is a member of the steering, organizing, and program committees of several test technology technical events, a member of the Executive Committee of the IEEE Computer Society Test Technology Technical Council (TTTC), a senior member of the IEEE, and a Golden Core Member of the IEEE Computer Society.

George Xenoulis received the BSc degree in computer science from the University of Piraeus, Greece. He is currently a PhD student in the same department and his research interests include testing of high-speed floating-point arithmetic units, self-testing of embedded microprocessor cores, and online testing.
