A Flexible Software-Based Framework For Online Detection of Hardware Defects
Abstract—This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of
instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor’s internal state. Special
firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a
hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration.
The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off
performance against reliability without requiring any change to the hardware. We describe and evaluate different execution models for
using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of
post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-
multiprocessor based on Sun’s Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects
detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on
a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be
modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip’s overall power consumption.
Index Terms—Reliability, hardware defects, online defect detection, testing, online self-test, post-silicon debugging, manufacturing
test.
1 INTRODUCTION

THE impressive growth of the semiconductor industry over the last few decades is fueled by continuous silicon
scaling, which offers smaller, faster, and cheaper transistors with each new technology generation. However, challenges
in producing reliable components in these extremely dense technologies are growing, with many device experts
warning that continued scaling will inevitably lead to future generations of silicon technology being much less
reliable than present ones [4], [53]. Processors manufactured in future technologies will likely experience failures
in the field due to silicon defects occurring during system operation. In the absence of any viable alternative
technology, the success of the semiconductor industry in the future will depend on the creation of cost-effective
mechanisms to tolerate silicon defects in the field (i.e., during operation).

The challenge: tolerating hardware defects. To tolerate permanent hardware faults (i.e., silicon defects) encountered
during operation, a reliable system requires the inclusion of three critical capabilities: 1) mechanisms for detection
and diagnosis of defects, 2) recovery techniques to restore correct system state after a fault is detected, and
3) repair mechanisms to restore correct system functionality for future computation. Fortunately, research in
chip-multiprocessor (CMP) architectures already provides for the latter two requirements. Researchers have pursued the
development of global checkpoint and recovery mechanisms; examples of these include SafetyNet [52] and ReVive [42],
[39]. These low-cost checkpointing mechanisms provide the capabilities necessary to implement system recovery.
Additionally, the highly redundant nature of future CMPs will allow low-cost repair through the disabling of defective
processing elements [48]. With a sufficient number of processing resources, the performance of a future parallel
system will gracefully degrade as manifested defects increase.

Given the existence of low-cost mechanisms for system recovery and repair, the remaining major challenge in the
design of a defect-tolerant CMP is the development of low-cost defect detection techniques. Existing online
hardware-based defect detection and diagnosis techniques can be classified into two broad categories: 1) continuous:
those that continuously check for execution errors and 2) periodic: those that periodically check the processor’s
logic.

Existing defect tolerance techniques and their shortcomings. Examples of continuous techniques are Dual Modular
Redundancy (DMR) [51], lockstep systems [27], and DIVA [2]. These techniques detect silicon defects by validating the
execution through independent redundant computation. However, independent redundant computation requires significant
hardware cost in terms of silicon area (100 percent extra hardware in the case of DMR and lockstep systems).
Furthermore, continuous checking consumes significant energy and requires part of the power envelope to be dedicated
to it. In contrast, periodic techniques check periodically the integrity of the hardware without requiring redundant
execution [50]. These techniques rely on checkpointing and recovery mechanisms that provide computational epochs and a
substrate for speculative unchecked execution. At the end of each computational epoch, the hardware is checked by
on-chip testers. If

• K. Constantinides, T. Austin, and V. Bertacco are with the University of Michigan, Ann Arbor, 2260 Hayward, 2773 CSE,
MI 48109. E-mail: {kypros, austin, valeria}@umich.edu.
• O. Mutlu is with the Carnegie Mellon University, 5000 Forbes Avenue, ECE-HH-A305, Pittsburgh, PA 15213.
E-mail: [email protected].
Manuscript received 18 Feb. 2008; revised 30 Aug. 2008; accepted 20 Nov. 2008; published online 20 Mar. 2009.
Recommended for acceptance by C. Bolchini.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS
Log Number TC-2008-02-0078.
Digital Object Identifier no. 10.1109/TC.2009.52.
0018-9340/09/$25.00 © 2009 IEEE Published by the IEEE Computer Society
Authorized licensed use limited to: Carnegie Mellon Libraries. Downloaded on September 11, 2009 at 22:02 from IEEE Xplore. Restrictions apply.
1064 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 8, AUGUST 2009
CONSTANTINIDES ET AL.: A FLEXIBLE SOFTWARE-BASED FRAMEWORK FOR ONLINE DETECTION OF HARDWARE DEFECTS 1065
TABLE 1
The ACE Instruction Set Extensions
TABLE 2
Algorithmic Flow of ACE-Based Testing
in a Checkpoint/Recovery Environment
Fig. 7. Fault coverage of basic core functional testing: The pie chart on the right shows the distribution of the outcomes of a fault injection campaign
on a five-stage in-order core running the purely software-based preliminary functional tests.
therefore, high performance overhead for our proposal. To investigate the performance overhead due to such frequent
I/O operations, we simulated some I/O-intensive filesystem and network processing benchmarks. We evaluated
microbenchmarks Bonnie and IOzone to exercise the filesystem by performing frequent disk read/write operations. We
also used Netperf benchmarks [20] to exercise the network interface by performing very frequent packet send/receive
operations. In addition to the Netperf suite, we evaluated three other benchmarks, NetIO, NetPIPE, and ttcp, which are
commonly used to measure the network performance. In these experiments, the execution of an irrecoverable I/O
operation is preceded by a checkpoint termination and the new checkpoint interval begins right after the execution of
the I/O operation. Section 5.5 presents our results.

RTL implementation. We implemented the ACE tree structure in RTL using Verilog in order to obtain a detailed and
accurate estimate of the area and power consumption overheads of the ACE framework. We synthesized our design of the
ACE tree using the same tools, cell library, and methodology that we used for synthesizing the OpenSPARC T1 modules,
as described earlier in this section. Section 5.6 evaluates and quantifies the area overhead of the ACE framework
while Section 5.7 evaluates its power consumption.

5 EXPERIMENTAL EVALUATION

5.1 Basic Core Functional Testing

Before running the ACE testing firmware, we first run a software functional test to check the core for defects that
would prevent the correct execution of the testing firmware. If this test does not report success in a timely manner
to an independent auditor (i.e., the OS running on other cores), the test is repeated to verify that the failing
cause was not transient. If the test fails again, then an irrecoverable core defect is assumed, the core is disabled,
and the targeted tests are canceled.

The software functional test we used to check the core consists of three self-validating phases. The total size of
the software functional test is approximately 700 dynamic instructions. To evaluate the effectiveness of the basic
core test, we performed a stuck-at fault injection campaign on the gate-level netlist of a synthesized five-stage
in-order core (similar to the SPARC core with the exception of multithreading support). Fig. 7 shows the distribution
of the outcomes of the fault injection campaign. Overall, the basic core test successfully detected 62.14 percent of
the injected faults. The remaining 37.86 percent of the injected faults lie in parts of the core’s logic that do not
affect the core’s capability of executing simple programs such as the basic core test and the ACE testing firmware.
ACE testing firmware will subsequently test these untested areas of the design to provide full core coverage.

5.2 ACE Testing Latency, Coverage, and Storage Requirements

An important metric for measuring the efficiency of our technique is how long it takes to fully check the underlying
hardware for defects. The latency of testing an ACE domain depends on 1) the number of ACE segments it consists of
and 2) the number of test patterns that need to be applied. In this experiment, we generate test patterns for each
individual ACE domain in the design using three different fault models (stuck-at, path-delay, and N-detect) and the
methodology described in Section 4. Table 3 lists the number of test instructions needed to test each of the major
modules in the design (based on the ACE firmware code shown in Fig. 4).

For the stuck-at fault model, the most demanding module is the SPARC core, requiring about 150 K dynamic test
instructions to complete the test. Modules dominated by combinational logic, such as the SPARC core, the DRAM
controller, the FPU, and the I/O bridge, are more demanding in terms of test instructions. On the other hand, the
CPU-cache crossbar, which consists mainly of buffer queues and interconnect, requires far fewer instructions to
complete the tests.

For the path-delay fault model, we generate test pattern sets for the critical paths that are within 5 percent of the
clock period. The required number of test instructions to

TABLE 3
Number of Test Instructions Needed to Test Each of the Major Modules in the Design
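The retry policy that Section 5.1 describes for the basic core functional test (run the test, repeat it once on failure to rule out a transient cause, then disable the core) can be sketched as follows. This is a minimal illustration only: `basic_core_check`, the `run_test` callable, and the returned strings are hypothetical names introduced here, not part of the ACE firmware, and the timely report to the independent auditor is assumed to be folded into the boolean result.

```python
from typing import Callable


def basic_core_check(run_test: Callable[[], bool], max_attempts: int = 2) -> str:
    """Decide what to do with a core based on the basic functional test.

    run_test is a hypothetical callable that returns True when the test
    reports success in time, and False on a failure or a timeout.
    """
    for _ in range(max_attempts):
        if run_test():
            # The core can execute simple programs, so the directed
            # ACE tests may run on it.
            return "run_ace_tests"
    # Two consecutive failures: assume an irrecoverable core defect,
    # disable the core, and cancel the targeted tests.
    return "disable_core"
```

With the default of two attempts, a single transient failure followed by a pass still lets the ACE tests proceed, while two failures in a row disable the core.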
TABLE 4
Test Pattern/Response Storage Requirements per Fault Model and Design Module
TABLE 5
Number of Test Instructions Needed by Each Core Pair in Full-Chip Distributed Testing: The Testing Process Is
Distributed over the Chip’s Eight SPARC Cores
Each core is assigned to test its resources and some parts of the surrounding noncore modules as shown in this table.
Fig. 11. Performance overhead of interleaved ACE testing in the shadow of L2 cache misses.
actual physical design of the product meets all the performance and functionality specifications as they were
defined in the design phase. The goal of post-silicon debugging is to find all design errors, also known as design
bugs, and to eliminate them through design changes or other means before selling the product to the customer [24],
[22], [25].

The first phase of post-silicon debugging is to run extended tests to validate the functional and electrical
operation of the design. The validation content commonly consists of focused software test programs written to
exercise specific functionalities of the design or randomly generated tests that exercise different parts of the
design. We refer to these test programs as the validation test suite. These tests are applied under different
operating conditions (i.e., voltage, clock frequency, and temperature) in order to electrically characterize the
product. When the observed behavior diverges from the expected prespecified correct behavior (i.e., when a failure is
found), further investigation is required by the post-silicon debugging team. During a failure investigation, the
post-silicon debug engineer tries to 1) isolate the failure, 2) find the root cause of the failure, and 3) fix the
failure, using features hardwired into the design to support debugging as well as tools external to the design [24].

Motivation. The trends of higher device integration into a single chip and the high complexity of modern processor
designs make the post-silicon debugging phase a significantly costly process, both in terms of resources and time.
For modern processors, the post-silicon debugging phase can easily cost $15-20 million and take six months to
complete [16]. The post-silicon debugging phase is estimated to take up to 35 percent of the chip design cycle [8],
resulting in a lengthy time-to-market. As the level of device integration continues to rise and the complexity of
modern processor designs increases [15], this problem will be exacerbated leading to either 1) very expensive and
long post-silicon debugging phases, which would adversely affect the processor cost and/or time-to-market or 2) more
buggy designs being released to the customers due to poor post-silicon debugging [61], [46], which would likely
increase the fraction of chips that fail in the field.

There are two major challenges in the post-silicon debugging of modern highly integrated processors. First, because
the internal signals of the microarchitecture have limited observability to the testing software, it is difficult to
isolate a failure and find its root cause. Second, because the hardware design is not easily or flexibly alterable by
the post-silicon debug engineer, it is difficult to evaluate whether or not a potential fix to the design eliminates
the cause of the failure [25]. Existing techniques that are used to address these two challenges are not adequate, as
briefly explained below.

Traditional techniques used to address the limited signal observability problem are built-in scan chains [62], [25]
and optical probing tools [63]. Unfortunately, both have significant shortcomings. The use of built-in scan chains to
monitor internal signals is very slow due to the serial nature of external scan testing [19]. The effectiveness of
optical probing tools reduces with each technology generation as direct probing becomes very difficult, if not
impossible, with more metal layers and smaller devices [60]. Furthermore, it is very hard to integrate these two
techniques into an automated post-silicon debugging environment [60].

The traditional technique used to evaluate design fixes is the Focused Ion Beam (FIB) [24] technique, which
temporarily alters the design by physically changing the metal layers of the chip. Unfortunately, FIB is limited in
two ways. First, FIB typically can only change metal layers of the chip and cannot create any new transistors.
Therefore, some potential design fixes are not possible to make or evaluate using this technology. Second, FIB’s
effectiveness is projected to diminish with further technology scaling as the access to lower metal layers is
becoming increasingly difficult due to the introduction of more metal layers in modern designs [8], [24].

Recently proposed mechanisms try to address the limitations of these traditional techniques. Specifically, recently
proposed solutions suggest the use of reconfigurable programmable logic cores and flexible on-chip networks that will
improve both signal observability and the ability to temporarily alter the design [43]. However, these solutions have
considerable area overheads [43] and still do not provide complete accessibility to all of the processor’s internal
state [43].

Solution: ACE framework for post-silicon debugging. The ACE framework can be an effective low-overhead framework that
provides the post-silicon debug engineers with full accessibility and controllability of the processor’s internal
microarchitectural state at runtime. This capability can be helpful to post-silicon debug engineers in isolating
design bugs and finding their root causes. Furthermore, once a design bug is isolated and its causes have been
identified, the ACE framework can be used to dynamically overwrite the microarchitectural state, and thus, emulate a
potential hardware fix. This allows the debug engineer to quickly observe the effects of a potential design fix and
verify its correctness without any physical hardware modification.

Specifically, the event that triggers a failure investigation by a post-silicon debug engineer is an incorrect design
output during the execution of the validation test suite. However, by just observing the incorrect output, it is very
hard to pinpoint the root cause of the failure. Therefore, further debugging of the failure is required. The first
step in this process is the reproduction of the conditions under which the failure occurred. Once the failure is
reproduced, debugging tools can be used to analyze the design’s internal state and pinpoint the design bug. This is
where the ACE firmware could be very useful to a post-silicon debug engineer. The debug engineer can run the ACE
firmware as an independent thread (called the ACE debugging thread) that runs in conjunction with the validation test
thread to identify the root cause of the failure and evaluate a potential design fix. We first describe the required
extensions to the ACE framework to support post-silicon debugging using the ACE firmware, then provide a detailed
example of how the debug engineer uses the ACE framework.

ACE instructions for post-silicon debugging. Table 7 shows the ACE instruction set extensions that enable the
synchronization between the validation test thread and the ACE debugging thread.

The ACE_pause instruction pauses the execution of the running validation test thread after it is executed for a given
number of clock cycles and switches execution to the ACE debugging thread. The execution switch between the
validation test thread and the ACE debugging thread is scheduled by setting an interrupt counter to the parameter
value of the ACE_pause instruction. This interrupt counter decrements every clock cycle during the execution of the
validation test thread. Once the counter becomes zero, the processor state and scan state get swapped, thus, taking a
snapshot of the running microarchitectural state of the
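The counter-driven switch described for ACE_pause can be modeled in a few lines. The sketch below is a toy software model under stated assumptions, not the hardware mechanism itself: the class `ToyCore`, its fields, and the dictionary representation of processor and scan state are all hypothetical names chosen for illustration. It only captures the three behaviors stated in the text: the counter is set from the instruction's parameter, it decrements once per validation-thread cycle, and reaching zero swaps the processor state with the scan state.

```python
from dataclasses import dataclass


@dataclass
class ToyCore:
    processor_state: dict        # state of the thread that is executing
    scan_state: dict             # state held in the scan cells
    interrupt_counter: int = 0   # 0 means no switch is scheduled
    running: str = "validation"  # which thread currently executes

    def ace_pause(self, cycles: int) -> None:
        """Schedule the switch by loading the counter with the parameter."""
        self.interrupt_counter = cycles

    def tick(self) -> None:
        """Advance one clock cycle of the validation test thread."""
        if self.running != "validation":
            return  # the counter only runs while the validation thread runs
        self.processor_state["cycle"] += 1
        if self.interrupt_counter > 0:
            self.interrupt_counter -= 1
            if self.interrupt_counter == 0:
                # Swap the two states: the scan state now holds a snapshot
                # of the validation thread, and the debugging thread runs.
                self.processor_state, self.scan_state = (
                    self.scan_state,
                    self.processor_state,
                )
                self.running = "ace_debug"
```

For example, after `core.ace_pause(3)` followed by any number of `core.tick()` calls greater than or equal to three, the model's `scan_state` holds the validation thread's state as it was after its third cycle, which the debugging thread could then inspect.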
achievable using traditional post-silicon debugging techniques that were described previously. However, the use of
the ACE framework provides a promising post-silicon debugging tool that can ease, shorten, and reduce the cost of the
post-silicon design process. The main advantages of ACE framework-based post-silicon debugging are the following:

1. It eases the debugging process: ACE framework-based debugging is closer to software, very similar to the software
   debugging process, and therefore, is easy for the debug engineer to understand and use. This ease in debugging is
   achieved by providing complete accessibility and controllability of the hardware state to the debug engineer.
2. It can test potential design bug fixes without physically and permanently modifying the underlying hardware. This
   reduces both the cost and difficulty of post-silicon debugging by reducing the manual labor involved in fixing the
   design bugs.
3. It can accelerate the post-silicon debugging process because it does not require very slow procedures such as
   scan-out of the whole microarchitectural state or manual modification of the underlying hardware using the
   aforementioned FIB technique to evaluate potential design fixes.

6.2 ACE Framework for Manufacturing Testing

Manufacturing testing is the phase that follows chip fabrication and screens out parts with defective or weak
devices. Today, most complex microprocessor designs use scan chains as the fundamental design for test (DFT)
methodology. During the manufacturing testing phase, the design’s scan chains are driven by external automatic test
equipment (ATE) that applies pregenerated test patterns to check the chip under test [7]. During the manufacturing
testing phase, every single chip has to go through this testing process multiple times at different voltage,
temperature, and frequency levels. Therefore, the manufacturing testing cost for each chip can be as high as 25-30
percent of the total manufacturing cost [19].

Motivation. Although this testing methodology served the semiconductor industry well for the last few decades, it has
started to face an increasing number of challenges due to the exponential increase in the complexity of modern
microprocessors [15], a product of the continuous silicon process technology scaling.

Specifically, the external ATE testers have a limited number of channels to drive the design’s scan chains due to
package pin limitations [19]. Furthermore, the speed of test pattern loading is limited by the maximum scan frequency
that is usually much lower than the chip’s operating frequency [19], [7]. The limited throughput of the scan
interface between the external tester and the design under test constitutes the main bottleneck. These limitations in
combination with the larger set of test patterns required for testing modern multimillion gate designs lead to longer
time spent on the tester per chip. Even today, the amount of time a chip spends on a tester can be several seconds
[19]. Considering that the amortized testing cost of high-end test equipment is estimated to be at thousands of
dollars per hour [5], [19], the conventional manufacturing testing process can be very cost-ineffective for
microprocessor vendors.

Alternative solutions. Logic BIST is a testing methodology based on pseudorandom test pattern generation and test
response compaction. To speed up manufacturing testing, logic BIST techniques use the scan infrastructure to apply
the on-chip pseudorandomly generated test patterns and employ specialized hardware to compact the test responses [7].
Furthermore, the control signals used for testing are driven by an on-chip test controller. Therefore, a clear
advantage of logic BIST over the traditional manufacturing testing methodology is that it significantly reduces the
amount of data that is communicated between the tester and the chip. This leads to shorter testing times and, as a
result, lower testing cost. Logic BIST also allows the manufacturing test to be performed at-speed (i.e., at the
chip’s normal operating frequency rather than the frequency of the automatic test equipment), which improves both the
speed and quality of testing.

Although logic BIST addresses major challenges of the traditional manufacturing testing methodology, it also imposes
some new challenges. First, logic BIST requires the on-chip storage of a very large amount of pseudorandomly
generated test patterns. Second, because logic BIST uses pseudorandomly generated test patterns, it often provides
significantly lower fault coverage than that provided by a much smaller number of high-quality, ATPG-pregenerated
test patterns [7]. Third, the use of the logic BIST methodology requires significantly more stringent design rules
than conventional manufacturing testing [19]. For example, bus conflicts must be eliminated and the circuit must be
made random-pattern testable [19]. Therefore, logic BIST techniques significantly increase both the hardware cost and
the design complexity, while resulting in lower test coverage.

Proposed solution: use of the ACE framework for manufacturing testing. The ACE infrastructure incorporates the
advantages of both the scan-based and logic BIST testing methodologies, while it also can effectively address their
limitations. Specifically, the ACE infrastructure provides two capabilities that are not present together in previous
manufacturing testing techniques. First, the ACE framework is a built-in solution for fast loading of high-quality
pregenerated ATPG test patterns into the scan-chain structures through software. This capability can eliminate the
need for expensive and slow external equipment, currently needed for test pattern loading. Second, the ACE framework
allows the test patterns to be loaded and applied at-speed at the chip’s normal operating frequency rather than the
much slower operating frequency of the automatic test equipment, which results in higher quality testing.

With these two capabilities, the ACE framework provides the best of both existing manufacturing testing techniques:
1) fast loading of test patterns to reduce testing time, 2) at-speed testing of the chip to improve testing quality
as well as to reduce testing time, and 3) testing with ATPG-pregenerated test patterns rather than the use of
pseudorandomly generated test patterns, to improve testing quality. Thus, if employed by the future integrated
circuit manufacturing testing methodologies, it can greatly improve the speed, cost, and test coverage of the costly
manufacturing testing phase of the microprocessor development cycle.

7 RELATED WORK

Hardware-based reliability techniques. The previous work most closely related to this work is [50]. In [50], we
proposed a hardware-based technique that utilizes microarchitectural checkpointing to create epochs of execution
during which on-chip distributed BIST-like testers validate the integrity of the underlying hardware.
To lower silicon cost, the testers were customized to the tested modules. However, this leads to increased design
complexity because a specialized tester needs to be designed for each module.

A traditional defect detection technique that is predominantly used for manufacturing testing is logic BIST [7].
Logic BIST incorporates pseudorandom pattern generation and response validation circuitry on the chip. Although
on-chip pseudorandom pattern generation removes any need for pattern storage, such designs require a large number of
random patterns and often provide lower fault coverage than ATPG patterns [7].

This work improves on these previous works due to the following major reasons:

1. It effectively removes the need for on-chip test pattern generation and validation circuitry and moves this
   functionality to software;
2. It is not hardwired in the design, and therefore, has ample flexibility to be modified/upgraded in the field;
3. It has higher test coverage and shorter testing time because it uses ATPG instead of pseudorandomly generated
   patterns;
4. In contrast to [50], it can uniformly be applied to any microprocessor module with low design complexity because
   it does not require module-specific customizations; and
5. It provides wider coverage across the whole chip, including noncore modules.

Software-based reliability techniques. A very recent approach proposes the detection of silicon defects by employing
low overhead detection strategies that monitor for simple software symptoms at the operating system level [33]. These
software-based detection techniques rely on the premise that silicon defects manifested in some microarchitectural
structures have a high probability (95 percent) to propagate detectable symptoms through the software stack to the
operating system [33].

The main differences between [33] and our work are: 1) unlike the probabilistic software symptom-based defect
detection, our technique checks the underlying hardware in a deterministic process through a structured high-quality
test methodology with very high fault coverage (99 percent) and can be executed on demand, 2) software symptom-based
defect detection techniques can flag the possible existence of a hardware failure, but they do not have the
capability to diagnose which part of the underlying hardware is defective. In our technique, by employing ATPG test
patterns, it is trivial to diagnose the defective device at a very fine granularity.

Instruction-based functional testing. A large amount of work has been performed in functional testing [6], [26], [31]

testing of the processor. The proposed scheme requires very little additional hardware cost. It requires an LFSR for
generating randomized operands for test instructions and an MISR for generating the result signature. Also, a minor
modification of the ISA is required for the test instructions to read/write from the LFSR/MISR. Similarly, Kranitis
et al. [29] use the knowledge of the ISA and the RTL-level model of a processor to select high fault coverage
instructions and their operands to include in self-test software routines. Batcher and Papachristiou [3] employ
instruction randomization hardware to generate randomized instructions to be used in self-test software routines for
functional testing. Brahme and Abraham [6] describe how to generate randomized instruction sequences to be used in
self-test software routines. Building upon these works, Chen and Dey [9] propose a mechanism that generates
instruction sequences to exercise structural test patterns designed to test processor components and applies such
instruction sequences in the software-based self-test routines to achieve higher coverage than other approaches that
randomly generate instruction sequences.

Our technique is fundamentally different from these instruction-based functional testing techniques in that it is a
structural testing approach that uses software routines to apply test patterns. We introduce new instructions that
are capable of applying high-quality ATPG-generated structural test patterns to every processor segment by exposing
the scan chain to the instruction set architecture. Software self-test routines that use these instructions can
therefore directly apply test patterns to processor structures and read test responses, which results in the fast and
high-coverage structural testing of each processor component. In contrast, none of the previously proposed
instruction-based functional testing techniques are capable of directly applying test patterns to processor
components. Instead, they execute existing ISA instruction sequences to indirectly (functionally) test the hardware
for faults. As such, previous instruction-based functional test approaches, in general, lead to higher testing times
or lower fault coverage since they rely on (randomized) functional testing.

One recent previous work [41] employed purely software-based functional testing techniques during the manufacturing
testing of the Intel Pentium 4 processor. In our approach, we use a similar functional testing technique (our “basic
core functional test” program) to check the basic core functionality before running the ACE firmware to perform
directed, high-quality testing. In fact, any of the previously proposed instruction-based functional testing
approaches can be used as the basic core functional test within the ACE framework.

8 SUMMARY AND CONCLUSIONS
of microprocessors. The most relevant of these to our We introduced a novel, flexible software-based technique,
approach are the instruction-based functional self-test ISA extensions, and microarchitecture support to detect and
techniques. In general, these techniques apply randomly diagnose hardware defects during online operation of a
generated or automatically selected instruction sequences chip-multiprocessor. Our technique uses the Access-Control
and/or combinations of instruction sequences and ran- Extension (ACE) framework that allows special ISA
domly or automatically generated operands to test for instructions to access and control virtually any part of the
hardware defects. If the result of the test sequence does not processor’s internal state. Based on this framework, we
match the expected output of the instruction sequence, then proposed the use of special firmware that periodically
a hardware fault is declared. suspends the processor’s execution and performs high-
We briefly describe the state-of-the-art approaches that quality testing of the underlying hardware to detect defects.
work in this manner: In [58], a self-test program written in We described several execution models for the interaction
processor assembly language and the expected results of the of the special testing firmware with the applications
program are stored in on-chip ROM memory. When running on the processor for both single-threaded and
invoked, the self-test program performs at-speed functional multithreaded processing cores.
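The kind of test loop that such firmware runs can be illustrated with a small software model. The sketch below is only a toy under stated assumptions: the names `ScanSegment`, `scan_in`, `capture`, `scan_out`, `run_segment_tests`, and `find_defective_segments` are hypothetical and do not correspond to the actual ACE instructions, a full adder stands in for a real processor segment, and an exhaustive pattern table stands in for ATPG-generated patterns with precomputed expected responses.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Bits = Tuple[int, ...]
Test = Tuple[Bits, Bits]  # (pattern to scan in, expected response)


def full_adder(bits: Bits) -> Bits:
    """Golden model of the toy logic under test: (a, b, cin) -> (sum, cout)."""
    a, b, cin = bits
    total = a + b + cin
    return (total & 1, total >> 1)


class ScanSegment:
    """Toy model of one scan-chain segment wrapped around combinational logic."""

    def __init__(self, name: str, logic: Callable[[Bits], Bits],
                 stuck_at: Optional[Tuple[int, int]] = None) -> None:
        self.name = name
        self.logic = logic
        self.stuck_at = stuck_at  # optional injected defect: (output bit, forced value)
        self.state: Bits = ()

    def scan_in(self, pattern: Bits) -> None:
        """Hypothetical stand-in for loading a test pattern into the scan cells."""
        self.state = tuple(pattern)

    def capture(self) -> None:
        """One functional clock: the logic's response is latched into the scan cells."""
        out = list(self.logic(self.state))
        if self.stuck_at is not None:
            bit, value = self.stuck_at
            out[bit] = value  # model a stuck-at fault on one output
        self.state = tuple(out)

    def scan_out(self) -> Bits:
        """Hypothetical stand-in for reading the captured response back."""
        return self.state


def run_segment_tests(segment: ScanSegment,
                      tests: Sequence[Test]) -> Optional[Tuple[Bits, Bits, Bits]]:
    """Return the first failing (pattern, expected, observed), or None if all pass."""
    for pattern, expected in tests:
        segment.scan_in(pattern)
        segment.capture()
        observed = segment.scan_out()
        if observed != expected:
            return pattern, expected, observed
    return None


def find_defective_segments(segments: Sequence[ScanSegment],
                            tests: Sequence[Test]) -> List[str]:
    """Test every segment and report the names of those with a mismatch."""
    return [s.name for s in segments if run_segment_tests(s, tests) is not None]


# Exhaustive patterns for the 3-input toy logic; a real flow would use ATPG output.
TESTS: List[Test] = [(p, full_adder(p)) for p in product((0, 1), repeat=3)]

if __name__ == "__main__":
    healthy = ScanSegment("alu0", full_adder)
    faulty = ScanSegment("alu1", full_adder, stuck_at=(1, 0))  # carry-out stuck at 0
    print(find_defective_segments([healthy, faulty], TESTS))
```

Because each segment is tested with its own patterns and responses, a mismatch identifies the failing segment directly, which is the property that makes fine-grained diagnosis straightforward in a structural approach.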
1078 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 8, AUGUST 2009
Using a commercial ATPG tool and three different fault models, we experimentally evaluated our ACE testing technique on a commercial chip multiprocessor design based on Sun’s Niagara. Our experimental results showed that ACE testing is capable of performing high-quality hardware testing for 99.22 percent of the chip area. Based on our detailed RTL implementation, implementing the ACE framework requires a 5.8 percent increase in Sun Niagara’s chip area and a 4 percent increase in its power consumption envelope.

We demonstrated how ACE testing can seamlessly be coupled with a coarse-grained checkpointing and recovery mechanism to provide a complete defect tolerance solution. Our evaluation shows that, with coarse-grained checkpoint intervals, the average performance overhead of ACE testing is only 5.5 percent. Our results also show that the software-based nature of ACE testing provides ample flexibility to dynamically tune the performance-reliability trade-off at runtime based on system requirements.

We also described how the ACE framework can be used to improve the quality and reduce the cost of two critical phases of microprocessor development: post-silicon debugging and manufacturing testing. Our descriptions showed that the flexibility provided by the ACE framework can significantly ease and accelerate the post-silicon debugging process by making the microarchitecture state easily accessible and controllable by the post-silicon debug engineers. Similarly, the flexibility of the ACE framework can eliminate the need for expensive automatic test equipment or costly yet lower coverage hardware changes (e.g., logic BIST) needed for manufacturing testing. We conclude that the ACE framework is a general framework that can be used for multiple purposes to enhance the reliability and to reduce the design/testing cost of modern microprocessors.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their feedback. This work was supported by grants from the National Science Foundation (NSF), SRC, and GSRC, and is an extended and revised version of [12].

REFERENCES

[1] A. Agarwal, B.-H. Lim, D.A. Kranz, and J. Kubiatowicz, “April: A Processor Architecture for Multiprocessing,” Proc. 17th Ann. Int’l Symp. Computer Architecture (ISCA-17), 1990.
[2] T.M. Austin, “DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design,” Proc. 32nd Ann. Int’l Symp. Microarchitecture (MICRO-32), 1999.
[3] K. Batcher and C. Papachristiou, “Instruction Randomization Self Test for Processor Cores,” Proc. Very Large Scale Integration (VLSI) Test Symp. (VTS), 1999.
[4] S. Borkar, T. Karnik, and V. De, “Design and Reliability Challenges in Nanometer Technologies,” Proc. 41st Ann. Conf. Design Automation (DAC-41), 2004.
[5] B. Bottoms, “The Third Millennium’s Test Dilemma,” IEEE Design and Test of Computers, vol. 15, no. 4, pp. 7-11, Oct.-Dec. 1998.
[6] D. Brahme and J.A. Abraham, “Functional Testing of Microprocessors,” IEEE Trans. Computers, vol. 33, no. 6, pp. 475-485, June 1984.
[7] M.L. Bushnell and V.D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Kluwer Academic Publishers, 2000.
[8] K.-H. Chang, I.L. Markov, and V. Bertacco, “Automating Post-Silicon Debugging and Repair,” Proc. Int’l Conf. Computer-Aided Design (ICCAD), Nov. 2007.
[9] L. Chen and S. Dey, “Software-Based Self-Testing Methodology for Processor Cores,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 3, pp. 369-380, Mar. 2001.
[10] K. Constantinides, J. Blome, S. Plaza, B. Zhang, V. Bertacco, S. Mahlke, T. Austin, and M. Orshansky, “BulletProof: A Defect-Tolerant CMP Switch Architecture,” Proc. 12th Int’l Symp. High Performance Computer Architecture (HPCA-12), 2006.
[11] K. Constantinides, O. Mutlu, and T. Austin, “Online Design Bug Detection: RTL Analysis, Flexible Mechanisms, and Evaluation,” Proc. 41st Ann. Int’l Symp. Microarchitecture (MICRO-41), 2008.
[12] K. Constantinides, O. Mutlu, T. Austin, and V. Bertacco, “Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation,” Proc. 40th Ann. Int’l Symp. Microarchitecture (MICRO-40), 2007.
[13] W.J. Dally, L.R. Dennison, D. Harris, K. Kan, and T. Xanthopoulos, “The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers,” Proc. Parallel Computer Routing and Comm. Workshop (PCRCW), 1994.
[14] N. Durrant and R. Blish, “Semiconductor Device Reliability Failure Models,” http://www.sematech.org/, 2000.
[15] M.J. Flynn and P. Hung, “Microprocessor Design Issues: Thoughts on the Road Ahead,” IEEE Micro, vol. 25, no. 3, pp. 16-31, May/June 2005.
[16] R. Goering, “Post-Silicon Debugging Worth a Second Look,” Electronic Eng. Times, Feb. 2007.
[17] R. Guo, S. Mitra, E. Amyeen, J. Lee, S. Sivaraj, and S. Venkataraman, “Evaluation of Test Metrics: Stuck-At, Bridge Coverage Estimate and Gate Exhaustive,” Proc. Very Large Scale Integration (VLSI) Test Symp. (VTS), 2006.
[18] P. Guptan and A.B. Kahng, “Manufacturing-Aware Physical Design,” Proc. Int’l Conf. Computer-Aided Design (ICCAD), 2003.
[19] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, “Logic BIST for Large Industrial Designs: Real Issues and Case Studies,” Proc. Int’l Test Conf. (ITC), Sept. 1999.
[20] NetPerf: A Network Performance Benchmark. Hewlett-Packard Company, 1995.
[21] H. Hirata, K. Kimura, S. Nagamine, Y. Mochizuki, A. Nishimura, Y. Nakase, and T. Nishizawa, “An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads,” Proc. 19th Int’l Symp. Computer Architecture (ISCA-19), 1992.
[22] H. Holzapfel and P. Levin, “Advanced Post-Silicon Verification and Debug,” EDA Tech Forum, vol. 3, no. 3, Sept. 2006.
[23] A.M. Ionescu, M.J. Declercq, S. Mahapatra, K. Banerjee, and J. Gautier, “Few Electron Devices: Towards Hybrid CMOS-SET Integrated Circuits,” Proc. Design Automation Conf. (DAC), 2002.
[24] D. Josephson, “The Good, the Bad, and the Ugly of Silicon Debug,” Proc. 43rd Design Automation Conf. (DAC-43), pp. 3-6, 2006.
[25] D. Josephson and B. Gottlieb, “The Crazy Mixed up World of Silicon Debug,” Proc. IEEE Custom Integrated Circuits Conf. (IEEE-CICC), 2004.
[26] H. Klug, “Microprocessor Testing by Instruction Sequences Derived from Random Patterns,” Proc. Int’l Test Conf. (ITC), 1988.
[27] C. Kong, “A Hardware Overview of the NonStop Himalaya (K10000),” Tandem Systems Overview, vol. 10, no. 1, pp. 4-11, 1994.
[28] P. Kongetira, K. Aingaran, and K. Olukotun, “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, no. 2, pp. 21-29, Mar./Apr. 2005.
[29] N. Kranitis, A. Paschalis, D. Gizopoulos, and Y. Zorian, “Instruction-Based Self-Test of Processor Cores,” Proc. Very Large Scale Integration (VLSI) Test Symp. (VTS), 2002.
[30] R. Kuppuswamy, P. DesRosier, D. Feltham, R. Sheikh, and P. Thadikaran, “Full Hold-Scan Systems in Microprocessors: Cost/Benefit Analysis,” Intel Technology J., vol. 8, no. 1, pp. 63-72, Feb. 2004.
[31] J. Lee and J.H. Patel, “An Instruction Sequence Assembling Methodology for Testing Microprocessors,” Proc. Int’l Test Conf. (ITC), Sept. 1992.
[32] A.S. Leon, K.W. Tam, J.L. Shin, D. Weisner, and F. Schumacher, “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 7-16, Jan. 2007.
[33] M.-L. Li, P. Ramachandran, S.K. Sahoo, S.V. Adve, V.S. Adve, and Y. Zhou, “Understanding the Propagation of Hard Errors to Software and Implications for Resilient System Design,” Proc. 13th Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-XIII), 2008.
[34] Y. Li, S. Makar, and S. Mitra, “CASP: Concurrent Autonomous Chip Self-Test Using Stored Test Patterns,” Proc. Conf. Design, Automation and Test in Europe (DATE), 2008.
[35] C.-K. Luk, R.S. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, and K. Hazelwood, “Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation,” Proc. Conf. Programming Language Design and Implementation (PLDI), 2005.
[36] E.J. McCluskey and C.-W. Tseng, “Stuck-Fault Tests vs. Actual Defects,” Proc. Int’l Test Conf. (ITC), pp. 336-343, Oct. 2000.
[37] M. Meterelliyoz, H. Mahmoodi, and K. Roy, “A Leakage Control System for Thermal Stability during Burn-In Test,” Proc. Int’l Test Conf. (ITC), 2005.
[38] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K.S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, Feb. 2005.
[39] J. Nakano, P. Montesinos, K. Gharachorloo, and J. Torrellas, “ReViveI/O: Efficient Handling of I/O in Highly-Available Rollback-Recovery Servers,” Proc. Int’l Symp. High-Performance Computer Architecture (HPCA), 2006.
[40] E.B. Nightingale, P.M. Chen, and J. Flinn, “Speculative Execution in a Distributed File System,” ACM Trans. Computer Systems, vol. 24, no. 4, pp. 361-392, Nov. 2006.
[41] P. Parvathala, K. Maneparambil, and W. Lindsay, “FRITS - A Microprocessor Functional BIST Method,” Proc. Int’l Test Conf. (ITC), 2002.
[42] M. Prvulovic, Z. Zhang, and J. Torrellas, “ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors,” Proc. 29th Int’l Symp. Computer Architecture (ISCA-29), 2002.
[43] B.R. Quinton and S.J.E. Wilton, “Post-Silicon Debug Using Programmable Logic Cores,” Proc. Conf. Field-Programmable Technology (FPT), pp. 241-248, 2005.
[44] J.M. Rabaey, Digital Integrated Circuits: A Design Perspective. Prentice-Hall, Inc., 1996.
[45] J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Privulovic, L. Ceze, S. Sarangi, P. Sack, K. Stauss, and P. Montesinos, “SESC Simulator,” http://sesc.sourceforge.net, 2002.
[46] S. Sarangi, S. Narayanasamy, B. Carneal, A. Tiwari, B. Calder, and J. Torrellas, “Patching Processor Design Errors with Programmable Hardware,” IEEE Micro, vol. 27, no. 1, pp. 12-25, Jan./Feb. 2007.
[47] M.J. Serrano, W. Yamamoto, R.C. Wood, and M. Nemirovsky, “A Model for Performance Estimation in a Multistreamed, Superscalar Processor,” Proc. Seventh Int’l Conf. Modeling Techniques and Tools for Computer Performance Evaluation, 1994.
[48] P. Shivakumar, S.W. Keckler, C.R. Moore, and D. Burger, “Exploiting Microarchitectural Redundancy for Defect Tolerance,” Proc. Int’l Conf. Computer Design (ICCD), 2003.
[49] M. Shulz, “The End of the Road for Silicon,” Nature Magazine, June 1999.
[50] S. Shyam, K. Constantinides, S. Phadke, V. Bertacco, and T. Austin, “Ultra Low-Cost Defect Protection for Microprocessor Pipelines,” Proc. 12th Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-12), pp. 73-82, 2006.
[51] D.P. Siewiorek and R.S. Swarz, Reliable Computer Systems: Design and Evaluation, third ed. AK Peters, Ltd., 1998.
[52] D.J. Sorin, M.M.K. Martin, M.D. Hill, and D.A. Wood, “SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery,” Proc. 29th Int’l Symp. Computer Architecture (ISCA-29), 2002.
[53] J. Srinivasan, S.V. Adve, P. Bose, and J.A. Rivers, “The Impact of Technology Scaling on Lifetime Reliability,” Proc. Int’l Conf. Dependable Systems and Networks (DSN-34), 2004.
[54] J.H. Stathis, “Reliability Limits for the Gate Insulator in CMOS Technology,” IBM J. Research and Development, vol. 46, nos. 2/3, pp. 265-286, 2002.
[55] OpenSPARC T1 Microarchitecture Specification. Sun Microsystems, Inc., Aug. 2006.
[56] TetraMAX ATPG User Guide, version 2002.05. Synopsys, http://www.synopsys.com, 2002.
[57] D. Tarjan, S. Thoziyoor, and N.P. Jouppi, “Cacti 4.0.,” Technical Report hpl-2006-86, Hewlett-Packard, 2006.
[58] M.H. Tehranipour, S. Fakhraie, Z. Navabi, and M. Movahedin, “A Low-Cost At-Speed Bist Architecture for Embedded Processor and Sram Cores,” J. Electronic Testing: Theory and Applications, vol. 20, no. 2, pp. 155-168, 2004.
[59] D. Tullsen, S. Eggers, and H. Levy, “Simultaneous Multithreading: Maximizing On-Chip Parallelism,” Proc. 22nd Int’l Symp. Computer Architecture (ISCA-22), June 1995.
[60] D.P. Vallett, “Future Challenges in IC Testing and Fault Isolation,” Proc. IEEE Ann. Meeting of Lasers and Electro-Optics Society (LEOS), vol. 2, pp. 539-540, Oct. 2003.
[61] I. Wagner, V. Bertacco, and T. Austin, “Shielding against Design Flaws with Field Repairable Control Logic,” Proc. 43rd Design Automation Conf. (DAC-43), 2006.
[62] T.J. Wood, “The Test and Debug Features of the AMD-K7 Microprocessor,” Proc. Int’l Test Conf. (ITC), pp. 130-136, 1999.
[63] W.M. Yee, M. Paniccia, T. Eiles, and V. Rao, “Laser Voltage Probe (LVP): A Novel Optical Probing Technology for Flip-Chip Packaged Microprocessors,” Proc. Int’l Symp. Physical and Failure Analysis of Integrated Circuits (IPFA-7), 1999.

Kypros Constantinides received the BS degree in computer science from the University of Cyprus, in 2004, and the MS degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 2006. He is currently working toward the PhD degree in electrical engineering and computer science at the University of Michigan, Ann Arbor. He is interested in computer architecture research with a focus in reliable system design. He previously worked at Microsoft Research and Intel Corporation. He received the Intel Foundation PhD Fellowship in 2008. He is a student member of the IEEE.

Onur Mutlu received the BS degree in computer engineering and psychology from the University of Michigan, Ann Arbor, and the MS and PhD degrees in ECE from the University of Texas at Austin. He is currently an assistant professor of ECE at Carnegie Mellon University. Prior to Carnegie Mellon, he worked at Microsoft Research, Intel Corporation, and Advanced Micro Devices. He is interested in computer architecture and systems research, especially in the interactions between languages, operating systems, compilers, and microarchitecture. He was a recipient of the Intel PhD Fellowship in 2004, the University of Texas George H. Mitchell Award for Excellence in Graduate Research in 2005, the Microsoft Gold Star Award in 2008, and five “Computer Architecture Top Pick” Paper Awards by the IEEE Micro Magazine. He is a member of the IEEE.

Todd Austin received the PhD degree in computer science from the University of Wisconsin, Madison, in 1996. He is an associate professor of electrical engineering and computer science at the University of Michigan, Ann Arbor. Prior to joining academia, he was a senior computer architect at Intel’s Microprocessor Research Labs, a product-oriented research laboratory in Hillsboro, Oregon. His research interests include computer architecture, compilers, computer system verification, and performance analysis tools and techniques. He is a member of the IEEE.

Valeria Bertacco received the Laurea degree in computer engineering from the University of Padova, Italy, and the MS and PhD degrees in electrical engineering from Stanford University in 2003. She is an assistant professor of electrical engineering and computer science at the University of Michigan. She joined the faculty at Michigan after being at Synopsys for four years. Her research interests are in the areas of formal and semiformal design verification with emphasis on full design validation and digital system reliability. She is an associate editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems and has served on the program committees for DAC and ICCAD. She is a recipient of the US National Science Foundation (NSF) CAREER Award and the University of Michigan’s Outstanding Achievement Award. She is a member of the IEEE.