
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 31, NO. 4, APRIL 2023 543

A Framework for Reliability Analysis of Combinational Circuits Using Approximate Bayesian Inference

Shivani Bathla, Graduate Student Member, IEEE, and Vinita Vasudevan, Member, IEEE

Abstract: A commonly used approach to compute the error
rate at the primary outputs (POs) of a circuit is to compare
the fault-free and faulty copies of the circuit using XOR gates.
This model results in poor accuracies with nonsampling-based
methods for reliability estimation. An alternative is to use a
single copy of the circuit with a four-valued representation for
each net corresponding to the correct and incorrect signals.
One problem in this formulation is the accurate propagation
of associated probabilities. We use the framework of Bayesian
inference (BI) to address this issue. We derive the conditional
probability distribution (CPD) corresponding to the four-valued
signals and find the output error rate using various approximate
BI techniques. With our formulation, we demonstrate that the
output error rate scales with the gate error probabilities. It is
guaranteed to be zero when the gate error probability is zero,
provided approximate BI algorithms based on sum-product belief
propagation (BP) are used. Although inaccuracies increase at
very low gate error probabilities, it is able to capture the relative
reliability of outputs with respect to each other. We also propose
a new method for finding the overall circuit error rate as the
partition function for a fixed state of POs. This method provides
a significant improvement in accuracy when compared with the
existing method using OR gates.
Index Terms: Bayesian inference (BI), Bayesian networks
(BNs), error rate, reliability, signal probability.

Manuscript received 12 December 2022; revised 9 January 2023; accepted 14 January 2023. Date of publication 26 January 2023; date of current version 22 March 2023. (Corresponding author: Shivani Bathla.)
The authors are with the Department of Electrical Engineering, IIT Madras, Chennai 600036, India (e-mail: [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2023.3237885.
Digital Object Identifier 10.1109/TVLSI.2023.3237885
1063-8210 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

AGGRESSIVE scaling of CMOS technology has led to a sharp increase in manufacturing defects and transient faults in gates due to low threshold voltages, process variations, electromigration, and crosstalk. This has resulted in unreliable logic gates, whose outputs are not completely determined by the inputs to the gate. Rather, given the inputs, one can only specify the probability that the output of a gate is a zero or a one. This probabilistic behavior can also occur due to the aging of circuits. It is exploited by imprecise circuits for error-resilient applications to get energy savings.

The most commonly used model for an unreliable gate is the Von Neumann error model [1]. It is essentially a model of a binary symmetric channel in which the probability of getting a one instead of zero is the same as that of getting a zero instead of one. The errors in each gate are assumed to be independent of each other. At the gate level, this model is equivalent to representing a faulty gate as a fault-free gate followed by an XOR gate, with the other input of the XOR gate connected to an independent source of error ($E$), as shown in Fig. 1(a).

Fig. 1. Models for reliability analysis of circuits. $F$ denotes the error-free circuit, and $\hat{F}$ denotes the faulty circuit. $E_i$ denotes the independent source of error for the $i$th gate in the faulty circuit, and $E_g$ is the set of all error sources. $E_i$ denotes the error signal corresponding to the $i$th PO, and $E$ denotes error in at least one of the POs. (a) Model of a faulty gate. (b) Circuit model used to compute error rate at outputs. (c) Miter used to compute circuit error rate.

The outputs of circuits designed using these unreliable gates have a nonzero probability of error. The design, synthesis, and optimization of these circuits require efficient methods to compute this error probability. This computation involves a comparison of the outputs of the error-free ($F$) and faulty circuit ($\hat{F}$) using XOR gates, as shown in Fig. 1(b). The most commonly used metric for evaluation of these circuits is the error rate at each primary output (PO) of the circuit [2], [3], [4], [5]. It is defined as the probability of error at a particular output, averaged over all possible values of the primary inputs (PIs). It is computed as the signal probability of the outputs of the XOR gates $\{E_1, E_2, \ldots, E_m\}$ in Fig. 1(b), with the signal probabilities of the PIs ($\mathbf{X}$) set to 0.5. The other metric that is used is the circuit error rate, which is the probability of getting an error in at least one of the outputs. This is typically computed after connecting a tree of OR gates to the XOR gates, as shown in Fig. 1(c).

The problem of reliability analysis is therefore a problem of computing signal probabilities in a system that has both fault-free and faulty circuits. The computation of signal probabilities in a circuit is known to be #P-complete [6] and is difficult even in error-free circuits. Reliability analysis is significantly more difficult due to the additional signals and gates needed to model erroneous circuits, as well as due to the additional reconvergent loop created by connecting XOR gates at the output. Exact methods for signal probability computation include methods based on probabilistic transfer matrices (PTMs) [7], [8], probabilistic gate models (PGMs) [9], binary decision diagrams (BDDs) [3], [10], weighted model counting (WMC) [3], and Bayesian networks (BNs) [11], [12]. However, the associated exponential time/space complexity limits the applicability of these methods to relatively small circuits.

Several approximate methods have been proposed in the literature. The simulation-based methods include logic simulation using the Monte Carlo (MC) framework, stochastic computation model (SCM) [13], and sampling-based Bayesian inference (BI) methods [5], [14]. In these methods, each erroneous gate requires generation of an additional random number stream. Since the accuracy depends on the number of samples used, the time complexity increases as the gate error probabilities reduce. Therefore, at low gate error probabilities, the sampling-based techniques become unsuitable for use within an optimization framework that requires error rate computations in each iteration. Another disadvantage is that these methods are inflexible in the sense that any change to the circuit requires a complete reevaluation.

An alternative is to use deterministic approximate methods based on signal probability computation. The methods proposed in [9] and [15] assume that the inputs to a gate are independent and thus have limited accuracy. To improve accuracy, signal correlation coefficients proposed in [16] and [17] are used in the methods proposed in [4], [18], [19], and [20].

The main challenge in deterministic approximate techniques is accounting for correlations between nets. This includes the correlations between the error-free and corresponding erroneous nets as well as the correlations due to reconvergent fanouts. The approximate analysis methods proposed in [4], [15], [20], [21], [22], [23], [24], and [25] have an interesting feature. Instead of using two copies of the circuit, they use a single copy in which each net is associated with additional probabilities that take into account the error-free and erroneous values of the signal. These methods are attractive since they not only remove the additional reconvergent loop created by the XOR gates but also obviate the requirement for maintaining correlations between the error-free and corresponding erroneous nets. In [4], every net is associated with two conditional probabilities that give the probability of occurrence of error in a net, given its correct value. In the methods proposed in [15], [20], [21], [22], [23], [24], and [25], each net is modeled as a four-valued random variable, corresponding to its erroneous and error-free values. The associated probabilities are the joint probabilities of the error-free and the corresponding erroneous net. The problem here is the propagation of the four probabilities. The existing methods either assume independence between gate inputs [15], [21] or compute signal correlation coefficients [4], [20] or reliability correlation coefficients [23], [24], [25]. The main issue with these methods is thus the inaccuracies that arise due to the estimation and propagation of these correlations. Typically, pairwise signal correlation coefficients are used, further limiting the accuracy.

In this article, we propose a novel algorithm for reliability analysis based on approximate BI. As in [15], we use a single copy of the circuit, with each net modeled as a four-valued random variable with the associated probabilities. However, instead of propagating correlation coefficients, we cast the problem as a BI problem. For each gate in the circuit, we derive the conditional probability distribution (CPD) that determines the probability of each of the four states of the output, given all possible states of the inputs. Approximate (and wherever possible, exact) inference methods are then used to find the error rate at the POs. Although sampling-based methods can also be used, the focus in this article is the approximate deterministic methods.

With our formulation, we demonstrate that the output error rate scales with the gate error probabilities. It is also guaranteed to be zero when the gate error probability is zero, provided approximate BI algorithms based on sum-product belief propagation (BP) are used. This property does not hold good if the model in Fig. 1(b) is used for reliability computation. In this case, very often the computed error rate at the output is 0.5, which is just noise. In contrast, although inaccuracies do increase at low gate error probabilities, our method is able to capture the relative reliability of the outputs with respect to each other for gate error probabilities as low as $10^{-6}$. Hence, it can be used to identify POs that are more susceptible to error.

We also propose a new formulation for finding the circuit error rate as the partition function corresponding to a fixed state of POs. This method gets rid of the additional OR gates connected to the XOR outputs in Fig. 1(c) and results in a significant improvement in accuracy. It also has very good run times and is suitable for use within an optimization routine.

The rest of this article is organized as follows. Section II has the notation used and the background on BI. We discuss the proposed formulation in Section III, the results obtained in Section IV, and a more detailed comparison with related work in Section V. Finally, we present our conclusions.
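As a rough illustration of the per-output error rate defined in this section, the following is a minimal Monte Carlo sketch. It assumes a hypothetical three-NAND circuit and a single gate error probability shared by all gates; the circuit and the names `nand`, `noisy`, and `estimate_error_rate` are illustrative and are not taken from the article. Each gate in the faulty copy is modeled as a fault-free gate whose output is XORed with an independent error bit, the PIs are drawn uniformly (signal probability 0.5), and the outputs of the two copies are compared with an XOR.

```python
import random


def nand(a, b):
    """Fault-free two-input NAND on bits."""
    return 1 - (a & b)


def noisy(value, p_gate, rng):
    """Von Neumann model: flip a fault-free gate output with probability p_gate."""
    return value ^ int(rng.random() < p_gate)


def estimate_error_rate(p_gate, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the output error rate of a hypothetical circuit.

    Assumed circuit (illustrative only):
        Z1 = NAND(X1, X2), Z2 = NAND(X2, X3), Y = NAND(Z1, Z2)
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        # Equally likely primary inputs (signal probability 0.5).
        x1, x2, x3 = (rng.getrandbits(1) for _ in range(3))
        # Error-free copy of the circuit.
        y = nand(nand(x1, x2), nand(x2, x3))
        # Faulty copy: every gate has its own independent error source.
        z1_hat = noisy(nand(x1, x2), p_gate, rng)
        z2_hat = noisy(nand(x2, x3), p_gate, rng)
        y_hat = noisy(nand(z1_hat, z2_hat), p_gate, rng)
        # XOR comparison of the two outputs.
        errors += y ^ y_hat
    return errors / n_samples


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1):
        print(p, estimate_error_rate(p))
```

Because the estimate is a sample mean of Bernoulli trials, its standard deviation is sqrt(p(1 - p)/N) for N samples, so resolving small error rates needs many samples. This is in line with the observation above that sampling cost grows as the gate error probabilities reduce.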


II. BACKGROUND

A. Notation

We use the following notation. We use capital letters to denote random variables, small letters to denote the states or values that a random variable can take, and boldface letters to denote a set of random variables. Therefore, $P(Z = z)$ denotes the probability that the random variable $Z$ takes on the value $z$. $\mathbf{X}$ and $\hat{\mathbf{X}}$ are used to denote the set of error-free and erroneous PIs. Similarly, $\mathbf{Y}$ and $\hat{\mathbf{Y}}$ denote the set of error-free and erroneous POs of the circuit, respectively.

B. Bayesian Networks

A BN is a probabilistic graphical model that captures the joint probability distribution (JPD) over a set of variables ($\mathbf{Z}$) using a directed acyclic graph (DAG). Each node in the DAG is a random variable, and edges capture the causal relationship between variables. The BN corresponding to a circuit is a DAG that has directed edges from each input to the output of a gate. The variables in the BN thus correspond to nets in the circuit that are either PIs or outputs of gates in the circuit. Variables that are connected by an incoming edge to a variable $Z_i$ are referred to as the parents of $Z_i$ and denoted as $\mathrm{Pa}_{Z_i}$. Each variable is associated with a CPD that specifies the probability of the variable, given the state of its parents. In circuit parlance, the CPD specifies the probability that the output of a gate takes on a particular value, given the values of the input. Fig. 2 shows an example of a BN for a circuit and the CPD for an error-free NAND gate.

Fig. 2. (a) Sample circuit. (b) BN model of the circuit. (c) CPD of an error-free NAND gate.

A fundamental property satisfied by a BN is the following.

Property 1: Given the state of its parent variables, a variable in the BN is conditionally independent of all the nonsuccessors.

For the network in Fig. 2(b), this property would mean, for example, $P(Y \mid Z_1, Z_2, X_1) = P(Y \mid Z_1, Z_2)$ and $P(Z_1, Z_2 \mid X_1, X_2) = P(Z_1 \mid X_1, X_2) P(Z_2 \mid X_1, X_2)$. As a result of this property, the JPD of all the variables in the network ($\mathbf{Z}$) can be written in a factorized form as follows:

$$P(\mathbf{Z}) = \prod_{Z_i \in \mathbf{Z}} P(Z_i \mid \mathrm{Pa}_{Z_i}) \tag{1}$$

where $\mathrm{Pa}_{Z_i}$ are the parents of $Z_i$, $P(Z_i \mid \mathrm{Pa}_{Z_i})$ is the CPD associated with $Z_i$, and $\mathbf{Z}$ is the set of all the variables in the BN. Each CPD is said to be a factor over the set of variables $\{Z_i \cup \mathrm{Pa}_{Z_i}\}$, and the product of CPDs is computed as a factor product (defined in the Appendix). The probability distribution of a variable $Z_i$, $P(Z_i)$, is referred to as the marginal probability of $Z_i$. The signal probability of $Z_i$ is defined as $P(Z_i = 1)$.

The task of computing the marginal probabilities of all the nets in the circuit is one of the inference tasks possible in a BN. The deterministic methods for approximate BI are variational techniques that optimize an energy functional using BP algorithms. In a majority of these methods, BP is performed using the sum-product BP algorithm [26, Ch. 11], which performs message-passing on an undirected cluster graph. Each node in this graph is associated with a cluster or a set of variables in the BN, and it is denoted as $C_i$. The edge between two clusters $C_i$ and $C_j$ is associated with a subset of variables contained in both the clusters, i.e., each edge is associated with a weight $S_{i,j} \subseteq C_i \cap C_j$. The weights $S_{i,j}$ are referred to as sep-sets. Each CPD of the BN, called an initial factor, is assigned to a single cluster in the undirected cluster graph that contains the corresponding variable and its parents. The BP algorithm is a message-passing algorithm in which a cluster $C_i$ transmits a message to $C_j$ based on messages from all its other neighbors. This is depicted in Fig. 3. The message $m_{i \to j}$ from cluster $C_i$ to cluster $C_j$ is computed using the following sum-product operation:

$$m_{i \to j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \in \mathrm{Neighbors}(i) \setminus j} m_{k \to i}(S_{i,k}) \tag{2}$$

where $\psi_i$ is the product of initial factors assigned to cluster $C_i$. The factor product of $\psi_i$ and incoming messages from the neighbors of $C_i$ is marginalized over variables that are not present in the sep-set $S_{i,j}$.

Fig. 3. Depiction of message-passing from cluster $C_i$ to cluster $C_j$.

In exact inference techniques, the cluster graph is constrained to be a tree with sep-set variables $S_{i,j} = C_i \cap C_j$. This tree is called the junction tree or the join tree (JT). Exact inference requires two rounds of message-passing: a downward pass from a randomly chosen root node to the leaves of the tree and an upward pass back from the leaves to the root. The time and space complexity of exact inference using BP is exponential in the maximum cluster size. Circuits with long and nested reconvergent loops tend to have larger clusters, making exact inference infeasible. The approximate methods limit the maximum cluster size, but allow for cycles in the cluster graph. These include loopy BP (LBP) [27] and its variants [28], [29], generalized BP (GBP) [30], and iterative join graph propagation (IJGP) [31]. In these methods, the message-passing algorithm is run iteratively on a loopy graph until convergence. The following properties are satisfied after convergence.

1) Each cluster is associated with a valid but possibly approximate JPD ($Q_i(C_i)$), which is computed as follows:

$$Q_i(C_i) = \psi_i \cdot \prod_{j \in \mathrm{Neighbors}(i)} m_{j \to i}(S_{i,j}). \tag{3}$$

2) The marginal probability of a variable is the same in all the clusters in which it is present.

Note that not all the BP algorithms use the exact sum-product operation to compute messages. One example is the weighted mini-bucket (WMB) method proposed in [32]. Another algorithm for approximate inference is the incremental build-infer-approximate technique (IBIA) [33], which uses a sequence of junction trees. The maximum cluster size in each of the junction trees is set to a user-specified value. Sum-product BP is used in each junction tree to infer probabilities. In most of these algorithms, it is possible to trade off runtime for increased accuracy by having a larger number of variables in a cluster.

The other techniques used for inference are methods based on WMC [34]. These are meant for exact inference and can be used only for small circuits. There are also several sampling-based techniques such as probabilistic logic sampling (PLS) [14], evidence prepropagated importance sampling [35], and Gibbs sampling [36]. The accuracy obtained with these methods relies on the method used for sample generation and the number of samples used.

III. PROPOSED FORMULATION

A. Motivation

As discussed in Section I, the error model that is used for reliability analysis is the Von Neumann error model. If $Z$ and $\hat{Z}$ denote the correct and incorrect values of the output of a gate and $p_\epsilon$ denotes the gate error probability, the CPD $P(\hat{Z} \mid Z)$ specified by this model is as shown in Fig. 4(a). The figure also has the corresponding gate-level representation of an erroneous signal, which is an XOR gate that has an independent error signal $E$. $p_\epsilon$ is thus the signal probability of $E$, i.e., $p_\epsilon = P(E = 1)$.

Fig. 4. Error model and the CPD for an erroneous signal and a faulty NAND gate. (a) Erroneous signal. (b) Faulty NAND gate.

Based on this, the gate error model consisting of an error-free gate followed by an XOR gate [shown in Fig. 1(a)] can equivalently be represented as a CPD of the outputs, given the input values, denoted as $P(\hat{Z} \mid X)$. This can be derived as follows:

$$P(\hat{Z} \mid X) = \sum_{z, \epsilon} P(Z \mid X) P(\hat{Z} \mid Z, E) P(E) = \sum_{z} P(Z \mid X) P(\hat{Z} \mid Z). \tag{4}$$

For example, the CPD for a faulty two-input NAND gate obtained using (4) is shown in Fig. 4(b). Note that all these products are obtained using the factor product as defined in the Appendix.

We motivate our formulation using an example. Fig. 5(a) and (b) shows the error model and the corresponding BN for reliability analysis of the circuit shown in Fig. 2(a). Since all the variables in this BN are binary-valued, we denote this BN model as the two-valued BN. As explained in Section II-B, the JPD of all the variables, $\mathbf{Z}$, in the BN can be written as a product of the CPDs of all the variables [using (1)]. For this example, therefore,

$$P(\mathbf{Z}) = P(X_1) P(X_2) P(X_3)\, f_1 f_2 f_3\, P(E \mid Y, \hat{Y})$$

where

$$\begin{aligned}
f_1 &= P(Z_1 \mid X_1, X_2) P(\hat{Z}_1 \mid X_1, X_2) = P(Z_1, \hat{Z}_1 \mid X_1, X_2) \\
f_2 &= P(Z_2 \mid X_2, X_3) P(\hat{Z}_2 \mid X_2, X_3) = P(Z_2, \hat{Z}_2 \mid X_2, X_3) \\
f_3 &= P(Y \mid Z_1, Z_2) P(\hat{Y} \mid \hat{Z}_1, \hat{Z}_2) = P(Y, \hat{Y} \mid Z_1, \hat{Z}_1, Z_2, \hat{Z}_2).
\end{aligned} \tag{5}$$

This result follows from Property 1 for BNs. Here, for ease of explanation, the PIs ($\mathbf{X} = \{X_1, X_2, X_3\}$) are assumed to be error-free. The error probability of the circuit, $P(E)$, can be obtained by summing the JPD, $P(\mathbf{Z})$, over the states of all the other variables in the circuit, in a process called variable elimination [26], [37].

Each of the factors $f_1$, $f_2$, and $f_3$ in (5) can be considered to be a CPD, representing the JPD of the error-free and erroneous outputs given the values of the error-free and erroneous inputs. CPDs $P(Y \mid Z_1, Z_2)$ and $P(\hat{Y} \mid \hat{Z}_1, \hat{Z}_2)$ are $4 \times 2$ arrays, and therefore, the factor product will give $P(Y, \hat{Y} \mid Z_1, Z_2, \hat{Z}_1, \hat{Z}_2)$ as a $16 \times 4$ array.

Based on this grouping of CPDs, it can be easily seen that the system for reliability analysis can be represented as a single BN, with each variable $\tilde{Z}$ taking on four values $\{00, 01, 10, 11\}$ that denote the values of the corresponding signal in the fault-free and faulty circuits. This is shown in Fig. 5(c). We refer to the model in Fig. 5(c) as the four-valued BN. In the example, variables $\tilde{Z}_1$, $\tilde{Z}_2$, and $\tilde{Y}$ are four-valued. Since the PIs are assumed to be accurate for error rate computation, they are two-valued. In general, the PIs can also be four-valued. The CPD of each gate represents the JPD of the error-free and erroneous outputs, given the values of the error-free and erroneous inputs. Therefore,

$$P(\tilde{Y} \mid \tilde{Z}_1, \tilde{Z}_2) = P(Y, \hat{Y} \mid Z_1, \hat{Z}_1, Z_2, \hat{Z}_2). \tag{6}$$
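Equations (4) to (6) can be sketched numerically for a two-input NAND gate. The snippet below is an illustrative construction, not the authors' implementation. It assumes NumPy, encodes the four states 00, 01, 10, 11 as the integers 0 to 3 with the fault-free bit first, and uses a hypothetical helper name `four_valued_nand_cpd`. The faulty-gate CPD is obtained by marginalizing the fault-free output as in (4), and the four-valued CPD is the factor product of the fault-free and faulty CPDs with the input variables already grouped per net as in (6).

```python
import numpy as np


def four_valued_nand_cpd(p_gate):
    """Return P(Y~ | Z1~, Z2~) for a NAND gate as an array indexed [y~, z1~, z2~].

    Assumed state encoding (illustrative): state = 2 * fault_free_bit + faulty_bit,
    so 0, 1, 2, 3 stand for 00, 01, 10, 11.
    """
    # Fault-free CPD P(Y | Z1, Z2), indexed [y, z1, z2].
    nand = np.zeros((2, 2, 2))
    for z1 in (0, 1):
        for z2 in (0, 1):
            nand[1 - (z1 & z2), z1, z2] = 1.0

    # Von Neumann error model P(Y^ | Y), indexed [y_hat, y].
    flip = np.array([[1.0 - p_gate, p_gate],
                     [p_gate, 1.0 - p_gate]])

    # Eq. (4): faulty-gate CPD, summing over the fault-free output y.
    # Result is indexed [y_hat, input_a, input_b].
    faulty = np.einsum("hy,yab->hab", flip, nand)

    # Eqs. (5)-(6): factor product of the fault-free CPD (inputs z1, z2)
    # and the faulty CPD (inputs z1_hat, z2_hat), with axes ordered as
    # [y, y_hat, z1, z1_hat, z2, z2_hat].
    joint = np.einsum("yab,hcd->yhacbd", nand, faulty)

    # Merge each (fault-free, faulty) pair of binary axes into one four-valued axis.
    return joint.reshape(4, 4, 4)


if __name__ == "__main__":
    cpd = four_valued_nand_cpd(0.1)
    # Each column (fixed parent state) is a distribution over the four output states.
    print(cpd.sum(axis=0))
    # Parents both in state 00: the output is 11 unless the gate errs (state 10).
    print(cpd[:, 0, 0])
```

With `p_gate = 0`, the entries for output states 01 and 10 vanish whenever both parents are in states 00 or 11, which is the sense in which a gate CPD is error-free in the four-valued model.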


Fig. 5. (a) Error model for computation of error rate for the circuit shown in Fig. 2(a). (b) Two-valued BN for the error model. (c) Four-valued BN for the
error model. (d) CPD of a two-input NAND gate in the four-valued BN. (e) Change in variable order implemented using the permutation matrix.

B. CPDs in the Four-Valued BN Definition 1 (Error Rate ( p Ei )): It is the probability of


Practically, the CPD of a two-input gate in the four- error at a PO Yi , averaged over all possible input values. It is
valued BN, P(Ỹ | Z̃ 1 , Z̃ 2 ), can be obtained as follows. Given assumed that all the inputs are equally likely. It is computed
P(Y |Z 1 , Z 2 ) and P(Ŷ | Ẑ 1 , Ẑ 2 ), we have as follows:

P(Y, Ŷ |Z 1 , Z 2 , Ẑ 1 , Ẑ 2 ) = P(Y |Z 1 , Z 2 )P(Ŷ | Ẑ 1 , Zˆ2 ). (7) p Ei = P(Ỹ = 01) + P(Ỹ = 10). (9)

It is seen that CPDs in (6) and (7) differ only in the order Definition 2 (Reliability (rYi )): The reliability of an out-
of the input variables. Specifically, Zˆ1 and Z 2 have to be put Yi is defined as rYi = 1 − p Ei .
interchanged as shown in Fig. 5(e). A change in the order The process of averaging over equally likely inputs is
of variables implies a permutation of the rows of the CPD. equivalent to computing the probability of error at the output
This can be done by premultiplying by a permutation matrix after setting the signal probability of the PIs to 0.5.
as follows: As seen from (7) and (8), the CPD of net Ỹ in the
four-valued formulations is simply a permutation of the factor
P(Ỹ | Z̃ 1 , Z̃ 2 ) = Pm × P(Y, Ŷ |Z 1 , Z 2 , Ẑ 1 , Ẑ 2 ). (8) product of the CPDs of the error-free net (Y ) and the erroneous
net (Ŷ ) in the two-valued formulation. Therefore, the factor
Here, Pm is a permutation matrix and × is used to represent
product of all CPDs in the four-valued will give the same
matrix multiplication. Using the procedure described in [8], the
overall JPD as the two-valued formulation. Exact computation
permutation matrix can be written as follows:
of marginal probabilities P(Ỹ ) and P(Y, Ŷ ) in the four and
Pm = P(Z 1 , Ẑ 1 , Z 2 , Zˆ2 |Z 1 Z 2 Ẑ 1 Zˆ2 ) two-valued formulation involves summation of the JPD over
= P(Z 1 |Z 1 )P( Ẑ 1 , Z 2 |Z 2 , Ẑ 1 )P( Ẑ 2 | Ẑ 2 ) the state of all other variables. Therefore, P(Ỹ ) = P(Y, Ŷ ).
  Hence, the error rate [(9)] estimated using both the formula-
1 0 0 0
    tions is the same if exact inference methods are used.
1 0 0 0 1 0  1 0
= ⊗  ⊗ . The following propositions show that for approximate infer-
0 1 0 1 0 0 0 1
ence using sum-product BP, the four-valued BN is guaranteed
0 0 0 1
to give an error rate of zero when the PIs and gates are error-
Here, ⊗ denotes the tensor product or Kronecker product of free. The significance of this theorem is discussed after the
the two matrices. The conditional probabilities shown above proofs.
are PTMs that are obtained using the approach described Definition 3: A net Z̃ i is said to be error-free with respect
in [8]. The procedure for a gate with an arbitrary number of to a probability distribution P if it satisfies P( Z̃ i = 01) =
inputs is similar. P( Z̃ i = 10) = 0.
Definition 4: The CPD of a net Ỹ with parents PaỸ is said
C. Computation of Error Rate and Reliability of the POs to be error-free with respect to a distribution P if it satisfies
the following:
The error rate and reliability of the POs of a circuit are
defined as follows. P(Ỹ = 01|PaỸ = s) = P(Ỹ = 10|PaỸ = s) = 0

Authorized licensed use limited to: Amrita School of Engineering. Downloaded on February 02,2024 at 05:03:01 UTC from IEEE Xplore. Restrictions apply.
548 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 31, NO. 4, APRIL 2023

where TABLE I
O UTPUT E RROR R ATE E STIMATED U SING LBP ON THE T WO -VALUED
s ∈ S = { Z̃ i ∈ {00, 11} ∀ Z̃ i ∈ PaỸ }. BN [F IG . 5( B )] AND F OUR -VALUED BN [F IG . 5( C )] FOR
D IFFERENT G ATE E RROR P ROBABILITIES
Proposition 1: Given a set of error-free nets Z̃ =
{ Z˜1 , . . . , Z˜n }, the joint probability P( Z̃ ) evaluates to zero if
the state of any Z i ∈ Z̃ is either 01 or 10.
Proof: The marginal probability of a net Z i can be
obtained from the joint probability P( Z̃ ) as follows: P( Z̃ i ) =
Z̃ ). Using Definition 3, P( Z̃ i ) evaluates to zero for
P
Z̃ \ Z̃ i P(
states 01 and 10. Since probabilities are nonnegative, P( Z̃ ) parents of Z̃ are also error-free with respect to Q 1 . Since both
should be zero for all the states such that Z̃ i ∈ {01, 10}. the CPD and inputs in (10) are error-free, using Proposition 2,
Proposition 2: If both the set of parents PaỸ and the CPD the marginal probability obtained after sum-product is also
P(Ỹ |PaỸ ) of a net Ỹ are error-free, Ỹ is also error-free. error-free. Thus, going to each cluster in topological order
Proof: Let S denote the set of all possible states of of the nets, all the nets are error-free with respect to the
PaỸ . S can be split into two disjoint sets S1 and S2 , where approximate distribution obtained after sum-product BP.
S1 comprises states where all the nets are either 00 or 11 In particular, this is also true for POs. Therefore, the output
(accurate states of the net) and S2 = S \ S1 (states where at error rate at all the POs evaluates to zero.
least one net is inaccurate). The sum-product operations to There are several variants of sum-product BP that differ
find P(Ỹ = 01) can be written as follows: in the construction of the cluster graph used. In LBP and its
variants, each node in the cluster graph contains a variable
X
P(Ỹ = 01) = P(Ỹ = 01|PaỸ = s)P(PaỸ = s)
s∈S1 and its parents, and all the sep-sets contain a single variable.
X In GBP and IJGP, clusters and sep-sets can have larger sizes.
+ P(Ỹ = 01|PaỸ )P(PaỸ = s). IBIA uses sum-product BP in a sequence of approximate JTs.
s∈S2
Since the underlying message-passing algorithm is the same,
Since the CPD is error-free, the first term in the summation the estimates obtained with all these methods obey Theorem 1.
is zero. In the second term, the summation is over states of In contrast, this is not guaranteed for the two-valued formula-
PaỸ such that at least one Z̃ i ∈ PaỸ is either 01 or 10. tion. As a result, the estimated error rate using it does not scale
Using Proposition 1, the joint probability P(PaỸ ) = 0 well with gate error probabilities even for the small circuit in
for these states. Therefore, P(Ỹ = 01) = 0. Similarly, Fig. 5(a), as shown in Table I.
P(Ỹ = 10) = 0. Corollary 1: The output error rate estimated after assuming
Theorem 1: The output error rate estimated using independence between inputs of a gate in the four-valued
sum-product BP in the four-valued BN is zero if all the PIs formulation is zero if all PIs and gates are error-free.
and gates are error-free. Proof: In LBP, each sep-set contains a single variable
Proof: In this proof, we use P to denote the exact CPDs and messages are in terms of marginals of variables. Each
present as initial factors and Q(C) to denote the approximate CPD is multiplied by the marginals of inputs to get the
JPD of a cluster C obtained after convergence of sum- marginals of the output. This is the same as estimation of
product BP. Let Z̃ be a net in the circuit, Ck = { Z̃ , Pa Z̃ , R̃} be marginals assuming independence between inputs. Therefore,
the cluster that is assigned the CPD of net Z̃ (P( Z̃ |Pa Z̃ )), and using Theorem 1, the estimated error rate is zero when the PIs
Q k (Ck ) be the JPD of variables in Ck . Using (3), the marginal and gates are error-free.
distribution Q k ( Z̃ ) can be written as follows: Note that this can also be shown by traversing the nets in
X X Y the topological order and applying Proposition 2.
Q k ( Z̃ ) = Q k (Ck ) = ψi m j→i
Ck \ Z̃ Ck \ Z̃ j∈N eight (i)
X D. Computation of Circuit Error Rate
= P( Z̃ |Pa Z̃ )Q k (Pa Z̃ )Q k ( R̃| Z̃ , Pa Z̃ )
Definition 5: Circuit Error Rate: ( p E ) It is the probability
Pa Z̃ , R̃
X of getting an error in at least one of the outputs. Equivalently,
= P( Z̃ |Pa Z̃ )Q k (Pa Z̃ ). (10) circuit reliability is 1 − p E .
Pa Z̃ p E is computed after connecting a tree of OR gates to the
POs, as shown in Fig. 1(c). We know that
The second step in the equation follows from the chain rule
of probability and because one of the initial factors assigned p E = P(E 1 ∨ E 2 ∨ · · · ∨ E m = 1)
to Ck is the CPD of Z̃ . = 1 − P(E 1 = 0, E 2 = 0, . . . , E m = 0). (11)
Let X̃ ∈ C0 be a PI, so that Pa X̃ = ∅ and the corre-
sponding initial factor is P( X̃ ). Therefore, from (10), we have Here, ∨ represents the OR operator. Therefore, the prob-
Q 0 (C0 ) = P( X̃ ). Consider a net Z̃ ∈ C1 that is driven by ability of getting an error in any one of the outputs can be
the PIs. As described in Section II, after convergence in sum- computed from the probability that there is no error in any
product BP, the marginal probability of a variable is the same of the outputs. In the framework of BI, the joint probability
in all the clusters in which it is present. Therefore, all the of getting no error in any of the outputs can be computed

Authorized licensed use limited to: Amrita School of Engineering. Downloaded on February 02,2024 at 05:03:01 UTC from IEEE Xplore. Restrictions apply.
BATHLA AND VASUDEVAN: FRAMEWORK FOR RELIABILITY ANALYSIS OF COMBINATIONAL CIRCUITS 549

by setting $E_1 = E_2 = \cdots = E_m = 0$ as evidence states and computing the probability of evidence. The probability of evidence is referred to as the partition function in the BN literature. The partition function computation can be done using exact as well as many of the approximate BI algorithms. Note that LBP for computation of partition function will typically require a larger number of iterations for convergence and is not the same as assuming independence among the gate inputs.

To compute the required partition function using the four-valued formulation, we connect an additional dummy gate to each output with an associated CPD $P(E_i \mid \tilde{Y}_i)$ given by the following equation:

$$
\begin{array}{c|cc}
P(E_i \mid \tilde{Y}_i) & E_i = 0 & E_i = 1 \\ \hline
\tilde{Y}_i = 00 & 1 & 0 \\
\tilde{Y}_i = 01 & 0 & 1 \\
\tilde{Y}_i = 10 & 0 & 1 \\
\tilde{Y}_i = 11 & 1 & 0
\end{array}
$$

The outputs of the dummy gates can be set as evidence states.

IV. RESULTS

We evaluated the proposed formulation for reliability analysis by computing individual output error rates and the overall circuit error rate for various values of gate error probabilities. We used circuits belonging to two combinational benchmark suites, namely, ISCAS’85 [38] and the recent EPFL’15 [39] benchmarks. The benchmarks were synthesized using the Cadence Genus v15.2 tool using the Faraday 55-nm technology library. All the experiments were carried out on a 3.7-GHz Intel i7-8700 Linux system with 64-GB memory.

To compute the error rates of the outputs, the PIs are assumed to be error-free, with a signal probability of 0.5. However, the model itself supports any signal probability/reliability for the PIs. We have assumed that the error probabilities of all the gates are equal. Once again, the model supports any arbitrary gate error probabilities. As in all previous works [4], [16], [40], gates with fan-in greater than two are replaced by equivalent combinations of two-input gates. In the equivalent combination, all the gates other than the gate driving the final output are assumed to be error-free. For example, in a four-input AND gate replaced by a combination of three two-input AND gates, only the final AND gate is erroneous.

A. Baseline

To validate the proposed model, we use error rates obtained from logic simulation as the baseline. The simulation is carried out on the circuit configuration shown in Fig. 1(b), where the PIs feed both the ideal logic block ($F$) and the erroneous logic block ($\hat{F}$). Each gate in $\hat{F}$ has an independent input error signal with static probability $p_\epsilon$ as input. The corresponding outputs from $F$ and $\hat{F}$ are XORed to determine the error rates (or equivalently the reliabilities) of the circuit outputs. The total number of inputs to the model is $N_{PI} + N_g$ where $N_{PI}$ is the number of PIs and $N_g$ is the number of gates in the circuit. The number of samples ($N_s$) used for the simulation must be large enough to ensure gate and output error probabilities are estimated well. The runtime required for the MC simulations is $O(N_g N_s)$.

For many of the large EPFL benchmarks (multiplier, div, and so on), Monte Carlo (MC) simulation with $10^6$ vectors was not possible since it ran out of memory. Therefore, we chose the simulation results obtained with $10^5$ vectors as the baseline in all our evaluations. This implies that for output error rates of the order of $10^{-3}$, the standard deviation ($\sqrt{p(1-p)/10^5}$) in the estimated value is an order of magnitude lower. However, lower error rates cannot be estimated very reliably. Also for a gate error probability of $10^{-6}$, this comparison was not possible, since at least $10^7$ samples would be required.

B. Choice of Deterministic Inference Algorithms

We evaluated deterministic approximate BI techniques based on sum-product BP that are implemented in two publicly available tools, libDAI [41] and Merlin [42], and a more recent method IBIA [33] on the proposed BN formulation. Table II shows a comparison of the required runtime and relative error in the error rate averaged across all POs and all ISCAS’85 benchmarks for different BI methods. The maximum cluster size was set to 10 for methods IBIA [33], WMB [32], and IJGP [31], and loop depth of 3 was set for HAK [43] which is the double-loop variant of GBP. The parameters were chosen based on runtime and memory constraints. In these methods, the runtime for inference is exponential in the maximum cluster size. For IBIA, it also depends on the number of junction trees constructed, while for HAK and IJGP, it depends on the number of iterations until convergence.

TABLE II
Average Relative Error (in %) in Output Error Rate and Required Runtime for ISCAS’85 Benchmarks Using Different Deterministic Approximate Inference Techniques on the Four-Valued BN Formulation. The Maximum Cluster Size for IBIA, WMB, and IJGP Was Set to 10, and the Loop Depth for HAK Was Set to 3

For a gate error probability of 0.1, the average relative error is less than 5% with IBIA, LBP, and HAK. On the other hand, for gate error probability of $10^{-3}$, IBIA gives the least error followed by LBP. For WMB, the sum-product operation is approximated while computing messages and it is not guaranteed to satisfy Theorem 1. As seen from the table, it does not scale well with gate error probabilities.

For both the gate error probabilities, the runtime obtained with LBP is the least, followed by IBIA. It is seen from the table that the runtimes are relatively independent of the gate error probabilities. We have chosen LBP and IBIA


for evaluation in this work since these methods give good accuracies with reasonable runtimes.

C. Comparison With the Two-Valued BN Formulation

As mentioned previously, we refer to the model in Fig. 5(b) as the two-valued model and the model in Fig. 5(c) as the four-valued model. Table III has the maximum and root-mean-square (rms) error in the estimation of the output error rates for the ISCAS’85 benchmarks using the two-valued and four-valued formulations. Since errors obtained using the two-valued formulation were very large, we report the absolute value of errors instead of relative errors. We have tabulated the results obtained with IBIA. Estimates obtained with LBP also follow a similar trend. The table has the results for gate error probabilities $p_\epsilon = 0.1$, $10^{-3}$, and 0. It is seen that the computation using the two-valued formulation does not scale well with the gate error probability ($p_\epsilon$). For $p_\epsilon$ of $10^{-3}$, the max-error for many outputs is close to 0.5, which is just noise. This is also true for $p_\epsilon = 0$. Both the max-error and RMSE are significantly lower when the four-valued formulation is used. The output error rates are zero when $p_\epsilon$ is 0, as guaranteed by Theorem 1. Fig. 6 shows the error rates at various outputs in the benchmark c5315 for different gate error probabilities and it is seen that it scales well with gate error probabilities.

TABLE III
Comparison of Maximum Error and RMSE Obtained Using IBIA on the Two-Valued and Four-Valued BNs for Different Gate Error Probabilities ($p_\epsilon$). For $p_\epsilon = 0$, the Four-Valued BN Gives an Error Rate of Zero for All Outputs. (a) Maximum Error. (b) RMSE

Fig. 6. Output error rate inferred using IBIA on the four-valued BN formulation for different gate error probabilities.

As the gate error probability reduces, the correlation between the error-free and the corresponding erroneous signal increases. For discrete random variables, the mutual information (MI) between the two signals is a measure of this correlation. The MI between a signal $S$ and the corresponding erroneous signal $\hat{S}$ is defined as follows:

$$
MI_S = \sum_{S, \hat{S}} P(S, \hat{S}) \log \frac{P(S, \hat{S})}{P(S)\,P(\hat{S})}. \tag{12}
$$

Table IV shows the average MI between the corresponding signals for some of the small benchmarks, computed using IBIA. For these benchmarks, IBIA either performs exact inference or the error in the estimation is very small (of the order of $10^{-4}$). It is clearly seen that the MI increases as the gate error probability reduces. This increased MI is taken care of in the four-valued formulation but the two-valued formulation struggles (and fails) to capture it. As seen from Table III, the estimation error obtained with the two-valued model increases drastically as the gate error probability is reduced.

TABLE IV
Average Mutual Information Between the Error-Free and the Corresponding Erroneous Nets for Various Gate Error Probabilities ($p_\epsilon$)

D. Output Error Rate With the Four-Valued BN Model

Table V has the relative error in the computed error rate at the POs and the runtime for various benchmarks. The average (maximum) relative error is the average (maximum) of the absolute relative error over all the outputs. The percentage error in the corresponding reliability ($r_{Y_i}$) is shown in brackets. For IBIA, we used cluster sizes (i.e., the maximum number of variables in a cluster) of 7 and 10. The results are reported for two different gate error probabilities ($p_\epsilon = 0.1, 10^{-3}$). The table also shows the number of POs and the total number of gates obtained after replacing gates with fan-in greater than two with equivalent combinations of two-input gates.

Runtime: For both the values of $p_\epsilon$, the same number of samples was used for logic simulation. As discussed, the runtime of IBIA and LBP is also relatively independent of gate error probabilities. So the table has only one set of runtimes.


As seen from the table, the runtime for both LBP and MC simulations is approximately linear in the number of gates, which is as expected. There are some deviations in MC simulations since we have used Cadence Incisive which is an event-driven simulator. The runtime of IBIA also increases with the number of gates. But the exact dependency is more difficult to predict. As mentioned previously, IBIA constructs a sequence of junction trees. As $N_g$ increases, the number of junction trees also increases, but the exact number depends on the structure of the graph and the reconvergent loops in it. Due to this, there are some fluctuations in the runtimes.

TABLE V
Statistics of Benchmarks Used for Evaluation, and Trade-Off Between Runtime (s) and Accuracy in Terms of Average and Maximum Relative Error (%) in the Error Rate (ER) Using IBIA and LBP. The Relative Error in Reliability (R) Is Indicated in Brackets. Results Obtained With LBP, IBIA Using Max-Cluster Size of 7 and 10, and MC Simulations Are Shown in Columns Marked as “L,” “I7,” “I10,” and “MC,” Respectively. ISCAS’85 and EPFL’15 Circuits Are Separated by a Line

For the same circuit, the runtime for IBIA increases with the maximum cluster size in nearly all the cases, as expected. For these cluster sizes, the time required is dominated by the time required to construct the junction trees rather than inference time. The exceptions occur because IBIA takes longer to construct a junction tree with the specified maximum cluster size.

The runtime for IBIA is much better than the MC simulations for most circuits and is comparable for a couple of large benchmarks (square and multiplier). However, if the gate error probability is reduced further, the runtime of IBIA and LBP will not change much, but the MC simulations will take significantly larger times since the number of samples required would be larger. For example, the average runtime over all the benchmarks for gate error probabilities of $10^{-1}$, $10^{-3}$, and $10^{-6}$ with IBIA is 276, 240, and 303 s, respectively. But full logic simulation is not possible for $p_\epsilon = 10^{-6}$.

Accuracy: For $p_\epsilon = 0.1$, the mean maximum relative error reduces from 15% with LBP to 8% with IBIA using a cluster size of 10. However, the mean average relative error obtained with all three methods is small (2%–4%). Therefore, for this and larger gate error probabilities reasonably accurate estimates can be obtained within very small runtimes if we assume independence between inputs of a gate.

When the gate error probabilities are reduced to $10^{-3}$, the relative error in the estimates of the error rates is larger, with IBIA giving significantly better estimates than LBP with a mean max-relative error of 52% and 95%, respectively. For some large benchmarks in EPFL arithmetic circuits, the maximum relative error is greater than 100%. However, the relative error in the corresponding reliabilities (shown in brackets) is much lower. This is because for small gate error probabilities, the output error rate is small and the corresponding reliability is large. Therefore, though the absolute error in both the reliability and error rate is the same, the percentage error with respect to reliability is much smaller. We have reported both since some of the earlier works report results with error rates, while others for reliability. On an average, the maximum and average relative error in reliability obtained with IBIA is about 7% and 4%, respectively. As expected, for both the gate error probabilities, we observe that the accuracy improves as the cluster size increases.

Among the existing methods [4], [23], [24], [25], [44], the results have been reported for ISCAS’85 benchmarks with gate error probabilities of 0.01 or larger. We have not seen any results for large EPFL benchmarks for any gate error probabilities. A fair comparison of results obtained with the existing methods is difficult since the synthesized netlists may vary in each case. However, just to compare the overall trend, we observe that the average relative error in the reliability of POs reported with these methods is between 1.2% and 3.8%


for gate error probabilities of 0.1 and 0.01. In contrast, the average error in reliability with the proposed formulation is 0.7% with IBIA (cluster size = 10) and 2.5% if we assume independence between inputs of a gate. Very few results are available for gate error probabilities $p_\epsilon \le 0.001$. Both [4] and [20] report errors averaged across several gate error probabilities. Thus, a direct comparison is not possible. In [20], with zero gate error probabilities, the relative error in the error rate is 45% for a small circuit like c2670. In contrast, we show that our method is guaranteed to give zero error rates if algorithms based on sum-product BP (such as LBP, GBP, and IBIA) are used for inference.

Fig. 7 shows the computed error rates at each PO using MC, IBIA (with cluster size = 10), and LBP for some of the benchmarks in which the percentage error in the estimated error rate is large. It is seen that while IBIA and LBP overestimate the error rates, they are able to capture the error rates of outputs relative to each other quite well in most cases. In some testcases like div and c6288, the accuracy improves when larger clusters are used. The accuracy obtained with IBIA is comparable to LBP for square and c3540. Since the relative error rate among the outputs is approximately preserved by both the inference methods (with greater accuracy by IBIA than LBP), these can be used to identify the outputs that are more susceptible to error. This will enable targeted application of techniques to improve the reliability of the circuit.

Fig. 7. Error rate at the POs obtained using MC simulations, IBIA, and LBP for $p_\epsilon = 10^{-3}$.

Fig. 8 shows the logarithm of the error rates at the POs obtained using LBP and IBIA when the gate error probability is further reduced to $p_\epsilon = 10^{-6}$. The MC simulations were not possible for this gate error probability due to memory errors. Therefore, we used PLS [14], a sampling-based approximate BI technique. The implementation of PLS was taken from the SMILE toolkit [45], and the number of samples was set to $10^7$. It is seen that the estimated error rate scales well when the gate error probability is reduced. Once again, the relative reliability among the outputs is captured well by both LBP and IBIA. The error rates obtained using LBP are consistently larger than with IBIA. While PLS is possibly more accurate, the runtime is about an order of magnitude larger than IBIA. For example, benchmark div requires 440 min with PLS, while the runtime with IBIA is only 11 min.

Fig. 8. Error rate at the POs obtained using PLS, IBIA, and LBP for $p_\epsilon = 10^{-6}$.

E. Circuit Error Rate With the Four-Valued BN Model

Table VI shows the relative error in the circuit error rate (Definition 5) for $p_\epsilon = 10^{-3}$. It also shows the estimates obtained after the MC simulations on the circuit configuration shown in Fig. 1(c). For $p_\epsilon = 0.1$, the circuit error rate becomes one and both the methods give close to accurate estimates. The table has a comparison of the two methods used, namely, (a) connecting a tree of OR gates and (b) using partition function computation [see (11)]. It is seen that the relative error in the estimation is significantly lower when PR formulation is used for both the methods, especially for the smaller ISCAS’85 benchmarks. Although the average over all the benchmarks is comparable, it is seen that LBP


gives significantly lower relative errors for a large number of benchmarks. For testcases square and multiplier, the circuit error rate is very close to one and both the methods perform well. Though LBP requires more iterations to converge when the PR formulation is used, it is quite fast, thus making it suitable for use in an optimization framework. The runtime for IBIA is expected to be larger since it uses larger cluster sizes. That said, the reported runtimes for LBP and IBIA are not directly comparable since both are implemented in different programming languages. The average runtime for LBP and IBIA for $p_\epsilon = 0.1$ is 17 and 377 s, respectively, which is similar to the runtimes for $p_\epsilon = 0.001$.

TABLE VI
Circuit Error Rate ($p_E$) Obtained Using MC Simulations, the Required Runtime (s), and the Relative Error (in %) in the Circuit Error Rate Obtained Using Two Inference Methods. IBIA/LBP-OR Refers to Computation After Connecting a Tree of OR Gates and IBIA/LBP-PR Refers to Computation Using the Partition Function

V. COMPARISON WITH RELATED WORK

We compare our approach with the existing methods that either use (a) a four-valued formulation or (b) BI techniques for reliability estimation.

Several existing approaches [4], [15], [20], [21], [23], [24], [25] use a single copy of the circuit with additional probabilities for each net. The problem here is the accurate propagation of these probabilities in the presence of reconvergent fanouts. Quick estimates can be obtained if gate inputs are assumed to be independent [15], [21]. However, as seen from the results, the accuracy of this method drops as the gate error probability reduces. Methods in [4] and [20] use an extension of the correlation coefficient method (CCM) to compute signal correlation coefficients. In [23], [24], and [25], correlation coefficients are computed with respect to signal reliabilities. These correlations are estimated using analytical methods [4], [16], [23] or simulation-based methods [20], or hybrid methods that combine these two approaches [46]. Typically, pairwise correlations are computed which limits the accuracy of these methods. In [20] and [46], the accuracy is limited by the length of the bitstreams used to estimate the correlation coefficients. The methods proposed in [4], [23], [24], [25], and [44] require the computation of signal probabilities in error-free circuits to estimate correlations in reliability. This itself is a #P-complete problem. Accurate estimates obtained using BDDs have been used in [4]. However, this limits the scalability of the method to relatively small circuits. In contrast, our method does not require these probabilities. We avoid inaccuracies in the estimation of correlation coefficients by deriving CPDs corresponding to the four-valued signals and using BI techniques that give approximate joint distribution over larger clusters of variables. Unlike the existing approaches, in our approach, it is possible to trade off runtime and accuracy by increasing the cluster sizes.

The BI techniques for reliability analysis have been used in [5], [11], [12], [47], and [48]. In [5], the two-valued formulation has been used along with the sampling-based approximate inference techniques. As with all the sampling methods, this approach is inflexible in the sense that any change in the circuit requires a complete reevaluation and it becomes expensive as the gate error probability reduces. Since the time complexity for sampling techniques scales linearly with the number of nodes in the network, it can be reduced to half using the proposed four-valued BN instead of the two-valued formulation. Exact BI methods have also been used for the estimation of gate error probabilities based on device-level parameters [11], [47], and for the computation of bounds on reliability by identifying the worst case input vector [12], [48]. However, exact inference is possible only for small circuits.

VI. CONCLUSION

We propose a novel algorithm for the estimation of error rate/reliability in probabilistic and unreliable circuits. Our method scales well with gate error probabilities and preserves the relative reliability of the outputs. We also propose a novel method for computing the overall circuit error rate by casting it as a problem of estimation of partition function in BNs. This formulation gives good accuracies within reasonable runtimes, making it suitable for use in an optimization framework.

Although we have demonstrated results for CMOS circuits, the methods proposed are general and can be used for circuits that are built with post-CMOS devices. They can also be used for the analysis and design of approximate circuits. For these circuits, the four-valued formulation can be applied directly to the accurate and imprecise truth tables, since the error rate is independent of the implementation.

APPENDIX

A conditional distribution $P(Y \mid X)$ is a factor $\phi$ over the set of variables $Y \cup X$. Let $X, Y, Z$ be disjoint sets of variables and $\phi_1(X, Y)$, $\phi_2(Y, Z)$ be two factors. The factor product [26, Ch. 4] $\phi_1 \phi_2$ gives a factor $\psi$ which is obtained as follows: $\forall x, y, z \in \mathrm{Domain}(X, Y, Z)$

$$
\psi(X, Y, Z = x, y, z) = \phi_1(x, y)\, \phi_2(y, z).
$$

If factors $\phi_1$ and $\phi_2$ contain disjoint sets of variables, then the factor product is the same as the tensor product.
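The factor product defined above can be illustrated with a minimal sketch. It assumes factors stored as plain Python dictionaries keyed by assignments; the helper names `factor_product` and `noisy_buffer_cpd`, and the 0.1 flip probability, are hypothetical choices made for illustration and are not taken from the paper or from any of the inference tools it uses.

```python
from itertools import product


def factor_product(phi1, phi2, dom_x, dom_y, dom_z):
    """Return psi(x, y, z) = phi1(x, y) * phi2(y, z) for every joint assignment."""
    return {
        (x, y, z): phi1[(x, y)] * phi2[(y, z)]
        for x, y, z in product(dom_x, dom_y, dom_z)
    }


def noisy_buffer_cpd(p_err):
    """Hypothetical CPD P(out | in) of a buffer that flips its input with
    probability p_err, stored as a factor keyed by (in, out)."""
    return {
        (a, b): (1.0 - p_err if a == b else p_err)
        for a, b in product((0, 1), repeat=2)
    }


dom = (0, 1)
phi1 = noisy_buffer_cpd(0.1)  # phi1(X, Y) = P(Y | X), illustrative values
phi2 = noisy_buffer_cpd(0.1)  # phi2(Y, Z) = P(Z | Y), illustrative values
psi = factor_product(phi1, phi2, dom, dom, dom)  # psi(X, Y, Z) = P(Y, Z | X)

# Summing out the shared variable Y leaves a factor over (X, Z).
p_z_given_x = {}
for (x, y, z), value in psi.items():
    p_z_given_x[(x, z)] = p_z_given_x.get((x, z), 0.0) + value

# Disjoint scopes: the factor product reduces to an outer (tensor) product.
f_x = {0: 0.5, 1: 0.5}
f_z = {0: 0.2, 1: 0.8}
outer = {(x, z): f_x[x] * f_z[z] for x, z in product(dom, dom)}
```

In this toy chain of two buffers, summing out Y gives P(Z = X) = 0.9 · 0.9 + 0.1 · 0.1 = 0.82, and each row of `p_z_given_x` sums to one, as expected for a product of CPDs.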


REFERENCES

[1] J. Von Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata Stud., vol. 34, pp. 43–98, Dec. 1956.
[2] L. Tan, Z. Li, G. Su, and D. Wang, “Asymptotically linear analysis and gate probability allocation schemes in probabilistic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 2, pp. 596–606, Feb. 2020.
[3] N.-Z. Lee and J.-H.-R. Jiang, “Towards formal evaluation and verification of probabilistic design,” IEEE Trans. Comput., vol. 67, no. 8, pp. 1202–1216, Aug. 2018.
[4] M. R. Choudhury and K. Mohanram, “Reliability analysis of logic circuits,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 3, pp. 392–405, Mar. 2009.
[5] T. Rejimon, K. Lingasubramanian, and S. Bhanja, “Probabilistic error modeling for nano-domain logic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 55–65, Jan. 2009.
[6] B. Krishnamurthy and I. G. Tollis, “Improved techniques for estimating signal probabilities,” IEEE Trans. Comput., vol. 38, no. 7, pp. 1041–1045, Jul. 1989.
[7] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes, “Accurate reliability evaluation and enhancement via probabilistic transfer matrices,” in Proc. Design, Autom. Test Eur., Mar. 2005, pp. 282–287.
[8] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes, “Probabilistic transfer matrices in symbolic reliability analysis of logic circuits,” ACM Trans. Des. Automat. Electron. Syst., vol. 13, no. 1, pp. 1–35, 2008.
[9] J. Han, H. Chen, E. Boykin, and J. Fortes, “Reliability evaluation of logic circuits using probabilistic gate models,” Microelectron. Rel., vol. 51, no. 2, pp. 468–476, 2011.
[10] O. Keszocze, “BDD-based error metric analysis, computation and optimization,” IEEE Access, vol. 10, pp. 14013–14028, 2022.
[11] W. Ibrahim, V. Beiu, and A. Beg, “GREDA: A fast and more accurate gate reliability EDA tool,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 4, pp. 509–521, Apr. 2012.
[12] W. Ibrahim, M. Shousha, and J. W. Chinneck, “Accurate and efficient estimation of logic circuits reliability bounds,” IEEE Trans. Comput., vol. 64, no. 5, pp. 1217–1229, May 2015.
[13] J. Han, H. Chen, J. Liang, P. Zhu, Z. Yang, and F. Lombardi, “A stochastic computational approach for accurate and efficient reliability evaluation,” IEEE Trans. Comput., vol. 63, no. 6, pp. 1336–1350, Jun. 2014.
[14] M. Henrion, “Propagating uncertainty in Bayesian networks by probabilistic logic sampling,” in Uncertainty in Artificial Intelligence (Machine Intelligence & Pattern Recognition), vol. 5, J. F. Lemmer and L. N. Kanal, Eds. Amsterdam, The Netherlands: North Holland, 1988, pp. 149–163.
[15] D. T. Franco, M. C. Vasconcelos, L. Naviner, and J.-F. Naviner, “Reliability analysis of logic circuits based on signal probability,” in Proc. 15th IEEE Int. Conf. Electron., Circuits Syst., Aug. 2008, pp. 670–673.
[16] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, “Estimate of signal probability in combinational logic networks,” in Proc. 1st Eur. Test Conf., Jan. 1989, pp. 132–133.
[17] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, “Testability measures in pseudorandom testing,” IEEE Trans. Comput.-Aided Design Integr., vol. 11, no. 6, pp. 794–800, Jun. 1992.
[18] S. Sivaswamy, K. Bazargan, and M. Riedel, “Estimation and optimization of reliability of noisy digital circuits,” in Proc. 10th Int. Symp. Quality Electron. Design, Mar. 2009, pp. 213–219.
[19] N. Mohyuddin, E. Pakbaznia, and M. Pedram, “Probabilistic error propagation in a logic circuit using the Boolean difference calculus,” in Advanced Techniques in Logic Synthesis, Optimizations and Applications. Cham, Switzerland: Springer, 2011, pp. 359–381.
[20] H. Jahanirad, “CC-SPRA: Correlation coefficients approach for signal probability-based reliability analysis,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 927–939, Apr. 2019.
[21] D. T. Franco, M. C. Vasconcelos, L. Naviner, and J.-F. Naviner, “Reliability of logic circuits under multiple simultaneous faults,” in Proc. 51st Midwest Symp. Circuits Syst., Aug. 2008, pp. 265–268.
[22] J. T. Flaquer, J. M. Daveau, L. Naviner, and P. Roche, “Fast reliability analysis of combinatorial logic circuits using conditional probabilities,” Microelectron. Rel., vol. 50, pp. 1215–1218, Sep./Nov. 2010.
[23] C. Chen and R. Xiao, “A fast model for analysis and improvement of gate-level circuit reliability,” Integration, vol. 50, pp. 107–115, Jun. 2015.
[24] J. Cai and C. Chen, “Circuit reliability analysis using signal reliability correlations,” in Proc. IEEE Int. Conf. Softw. Qual., Rel. Secur. Companion (QRS-C), Jul. 2017, pp. 171–176.
[25] K. Sikander, S. Zhan, and C. Chen, “An analytical model for circuit reliability estimation,” in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2021, pp. 84–87.
[26] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[27] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Uncertainty in Artificial Intelligence. Amsterdam, The Netherlands: Elsevier, 1999, pp. 467–475.
[28] W. Wiegerinck and T. Heskes, “Fractional belief propagation,” in Advances in Neural Information Processing Systems, vol. 15. Cambridge, MA, USA: MIT Press, 2003.
[29] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching,” in Proc. Int. Workshop Artif. Intell. Statist., 2003, pp. 308–315.
[30] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Generalized belief propagation,” in Proc. NIPS, vol. 13, 2000, pp. 689–695.
[31] R. Mateescu, K. Kask, V. Gogate, and R. Dechter, “Join-graph propagation algorithms,” J. Artif. Intell. Res., vol. 37, pp. 279–328, Mar. 2010.
[32] Q. Liu and A. Ihler, “Bounding the partition function using holder’s inequality,” in Proc. Int. Conf. Mach. Learn., 2011, pp. 849–856.
[33] S. Bathla and V. Vasudevan, “IBIA: Bayesian inference via incremental build-infer-approximate operations on clique trees,” 2022, arXiv:2202.12003.
[34] M. Chavira and A. Darwiche, “On probabilistic inference by weighted model counting,” Artif. Intell., vol. 172, nos. 6–7, pp. 772–799, Apr. 2008.
[35] C. Yuan and M. J. Druzdzel, “An importance sampling algorithm based on evidence pre-propagation,” in Uncertainty in Artificial Intelligence. Amsterdam, The Netherlands: Elsevier, 2002, pp. 624–631.
[36] A. E. Gelfand, “Gibbs sampling,” J. Amer. Stat. Assoc., vol. 95, no. 452, pp. 1300–1304, Dec. 2000.
[37] N. L. Zhang and D. Poole, “Exploiting causal independence in Bayesian network inference,” J. Artif. Intell. Res., vol. 5, pp. 301–328, Dec. 1996.
[38] F. Brglez, “A neutral netlist of 10 combinational benchmark circuits,” Proc. IEEE Special Session ATPG Fault Simulation, Jun. 1985, pp. 151–158.
[39] L. Amarú, P.-E. Gaillardon, and G. De Micheli, “The EPFL combinational benchmark suite,” in Proc. Int. Workshop Log. Synth. (IWLS), 2015, pp. 1–5.
[40] S. Bhanja and N. Ranganathan, “Cascaded Bayesian inferencing for switching activity estimation with correlated inputs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 12, pp. 1360–1370, Dec. 2004.
[41] J. M. Mooij, “libDAI: A free and open source C++ library for discrete approximate inference in graphical models,” J. Mach. Learn. Res., vol. 11, pp. 2169–2173, Aug. 2010.
[42] R. Marinescu. (2016). Merlin. Accessed: Oct. 15, 2021. [Online]. Available: https://github.com/radum2275/merlin/
[43] T. Heskes, K. Albers, and B. Kappen, “Approximate inference and constrained optimization,” in Uncertainty in Artificial Intelligence. Amsterdam, The Netherlands: Elsevier, 2003, pp. 313–320.
[44] C. Chen, J. Cai, and S. Zhan, “A triple-point model for circuit-level reliability analysis,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–4.
[45] M. J. Druzdzel, “Smile: Structural modeling, inference, and learning engine and genie: A development environment for graphical decision-theoretic models,” in Proc. AAAI Conf. Artif. Intell., 1999, pp. 902–903.
[46] S. Zhan and C. Chen, “A hybrid method for signal probability and reliability estimation with combinational circuits,” Integration, vol. 87, pp. 275–283, Nov. 2022.
[47] W. Ibrahim and V. Beiu, “Using Bayesian networks to accurately calculate the reliability of complementary metal oxide semiconductor gates,” IEEE Trans. Rel., vol. 60, no. 3, pp. 538–549, Sep. 2011.
[48] W. Ibrahim and H. Ibrahim, “Multithreaded and reconvergent aware algorithms for accurate digital circuits reliability estimation,” IEEE Trans. Rel., vol. 68, no. 2, pp. 514–525, Jun. 2019.
