A Framework For Reliability Analysis of Combinational Circuits Using Approximate Bayesian Inference
I. INTRODUCTION
Authorized licensed use limited to: Amrita School of Engineering. Downloaded on February 02,2024 at 05:03:01 UTC from IEEE Xplore. Restrictions apply.
544 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 31, NO. 4, APRIL 2023
circuit ($\hat{F}$) using XOR gates, as shown in Fig. 1(b). The most commonly used metric for evaluation of these circuits is the error rate at each primary output (PO) of the circuit [2], [3], [4], [5]. It is defined as the probability of error at a particular output, averaged over all possible values of the primary inputs (PIs). It is computed as the signal probability of the outputs of the XOR gates $\{E_1, E_2, \ldots, E_m\}$ in Fig. 1(b), with the signal probabilities of the PIs ($X$) set to 0.5. The other metric that is used is the circuit error rate, which is the probability of getting an error in at least one of the outputs. This is typically computed after connecting a tree of OR gates to the XOR gates, as shown in Fig. 1(c).

The problem of reliability analysis is therefore a problem of computing signal probabilities in a system that has both fault-free and faulty circuits. The computation of signal probabilities in a circuit is known to be #P-complete [6] and is difficult even in error-free circuits. Reliability analysis is significantly more difficult due to the additional signals and gates needed to model erroneous circuits, as well as due to the additional reconvergent loop created by connecting XOR gates at the output. Exact methods for signal probability computation include methods based on probabilistic transfer matrices (PTMs) [7], [8], probabilistic gate models (PGMs) [9], binary decision diagrams (BDDs) [3], [10], weighted model counting (WMC) [3], and Bayesian networks (BNs) [11], [12]. However, the associated exponential time/space complexity limits the applicability of these methods to relatively small circuits.

Several approximate methods have been proposed in the literature. The simulation-based methods include logic simulation using the Monte Carlo (MC) framework, stochastic computation model (SCM) [13], and sampling-based Bayesian inference (BI) methods [5], [14]. In these methods, each erroneous gate requires generation of an additional random number stream. Since the accuracy depends on the number of samples used, the time complexity increases as the gate error probabilities reduce. Therefore, at low gate error probabilities, the sampling-based techniques become unsuitable for use within an optimization framework that requires error rate computations in each iteration. Another disadvantage is that these methods are inflexible in the sense that any change to the circuit requires a complete reevaluation.

An alternative is to use deterministic approximate methods based on signal probability computation. The methods proposed in [9] and [15] assume that the inputs to a gate are independent and thus have limited accuracy. To improve accuracy, signal correlation coefficients proposed in [16] and [17] are used in the methods proposed in [4], [18], [19], and [20].

The main challenge in deterministic approximate techniques is accounting for correlations between nets. This includes the correlations between the error-free and corresponding erroneous nets as well as the correlations due to reconvergent fanouts. The approximate analysis methods proposed in [4], [15], [21], [22], [23], [24], [25], and [20] have an interesting feature. Instead of using two copies of the circuit, they use a single copy in which each net is associated with additional probabilities that take into account the error-free and erroneous values of the signal. These methods are attractive since they not only remove the additional reconvergent loop created by the XOR gates but also obviate the requirement for maintaining correlations between the error-free and corresponding erroneous nets. In [4], every net is associated with two conditional probabilities that give the probability of occurrence of error in a net, given its correct value. In the methods proposed in [15], [20], [21], [22], [23], [24], and [25], each net is modeled as a four-valued random variable, corresponding to its erroneous and error-free values. The associated probabilities are the joint probabilities of the error-free and the corresponding erroneous net. The problem here is the propagation of the four probabilities. The existing methods either assume independence between gate inputs [15], [21] or compute signal correlation coefficients [4], [20] or reliability correlation coefficients [23], [24], [25]. The main issue with these methods is thus the inaccuracies that arise due to the estimation and propagation of these correlations. Typically, pairwise signal correlation coefficients are used, further limiting the accuracy.

In this article, we propose a novel algorithm for reliability analysis based on approximate BI. As in [15], we use a single copy of the circuit, with each net modeled as a four-valued random variable with the associated probabilities. However, instead of propagating correlation coefficients, we cast the problem as a BI problem. For each gate in the circuit, we derive the conditional probability distribution (CPD) that determines the probability of each of the four states of the output, given all possible states of the inputs. Approximate (and wherever possible, exact) inference methods are then used to find the error rate at the POs. Although sampling-based methods can also be used, the focus in this article is the approximate deterministic methods.

With our formulation, we demonstrate that the output error rate scales with the gate error probabilities. It is also guaranteed to be zero when the gate error probability is zero, provided approximate BI algorithms based on sum-product belief propagation (BP) are used. This property does not hold good if the model in Fig. 1(b) is used for reliability computation. In this case, very often the computed error rate at the output is 0.5, which is just noise. In contrast, although inaccuracies do increase at low gate error probabilities, our method is able to capture the relative reliability of the outputs with respect to each other for gate error probabilities as low as $10^{-6}$. Hence, it can be used to identify POs that are more susceptible to error.

We also propose a new formulation for finding the circuit error rate as the partition function corresponding to a fixed state of POs. This method gets rid of the additional OR gates connected to the XOR outputs in Fig. 1(c) and results in a significant improvement in accuracy. It also has very good run times and is suitable for use within an optimization routine.

The rest of this article is organized as follows. Section II has the notation used and the background on BI. We discuss the proposed formulation in Section III, the results obtained in Section IV, and a more detailed comparison with related work in Section V. Finally, we present our conclusions.
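The two metrics described in this section (the per-output error rate and the circuit error rate) can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch, not the setup used in the article: the three-NAND circuit, the function name `simulate`, and the symmetric bit-flip model at each gate output are all assumptions made for illustration.

```python
import random


def nand(a, b):
    return 1 - (a & b)


def simulate(p_eps, n_samples=20000, seed=0):
    """Estimate per-output and circuit error rates by sampling.

    Hypothetical circuit: n1 = NAND(a, b), y1 = NAND(n1, c), y2 = NAND(n1, a).
    Assumed error model: each gate output flips independently with
    probability p_eps. Primary inputs are uniform (signal probability 0.5).
    """
    rng = random.Random(seed)
    flip = lambda: 1 if rng.random() < p_eps else 0
    err = [0, 0]      # error counts at the two primary outputs
    any_err = 0       # samples with an error in at least one output
    for _ in range(n_samples):
        a, b, c = (rng.getrandbits(1) for _ in range(3))
        # Error-free copy of the circuit
        n1 = nand(a, b)
        y1, y2 = nand(n1, c), nand(n1, a)
        # Erroneous copy: one extra random draw per gate
        n1h = nand(a, b) ^ flip()
        y1h = nand(n1h, c) ^ flip()
        y2h = nand(n1h, a) ^ flip()
        # XOR of error-free and erroneous outputs, then OR over outputs
        e1, e2 = y1 ^ y1h, y2 ^ y2h
        err[0] += e1
        err[1] += e2
        any_err += e1 | e2
    return [e / n_samples for e in err], any_err / n_samples
```

Calling `simulate(1e-6)` with the default sample count draws only 60 000 gate-flip decisions, so flips are rare and the estimate is typically zero or very coarse. This is the sampling cost at low gate error probabilities that the text refers to.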
BATHLA AND VASUDEVAN: FRAMEWORK FOR RELIABILITY ANALYSIS OF COMBINATIONAL CIRCUITS 545
Fig. 5. (a) Error model for computation of error rate for the circuit shown in Fig. 2(a). (b) Two-valued BN for the error model. (c) Four-valued BN for the
error model. (d) CPD of a two-input NAND gate in the four-valued BN. (e) Change in variable order implemented using the permutation matrix.
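The four-valued NAND CPD of Fig. 5(d) and the variable reordering of Fig. 5(e) can be reproduced numerically. The sketch below builds the product in (7) as a Kronecker product of two PTMs and applies the permutation matrix of (8). The helper names (`kron`, `matmul`, `nand_ptm`, `four_valued_nand_cpd`) are hypothetical, and the gate error model (output flipped with probability `p`) is an assumption for illustration.

```python
from itertools import product


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    """Plain matrix product A x B."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def nand_ptm(p):
    """4x2 PTM of a NAND gate whose output flips with probability p (assumed).

    Rows are the input states 00, 01, 10, 11; columns are output 0 and 1.
    """
    rows = []
    for z1, z2 in product((0, 1), repeat=2):
        y = 1 - (z1 & z2)
        rows.append([p, 1 - p] if y == 1 else [1 - p, p])
    return rows


I2 = [[1, 0], [0, 1]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def four_valued_nand_cpd(p):
    # (7): rows ordered (Z1, Z2, Z1_hat, Z2_hat), columns ordered (Y, Y_hat)
    joint = kron(nand_ptm(0.0), nand_ptm(p))
    # (8): interchange Z2 and Z1_hat so rows are ordered (Z1, Z1_hat, Z2, Z2_hat)
    Pm = kron(kron(I2, SWAP), I2)
    return matmul(Pm, joint)
```

Row `8*z1 + 4*z1_hat + 2*z2 + z2_hat` and column `2*y + y_hat` of the returned 16x4 matrix then hold the probability of the four-valued output state given the four-valued input states.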
$$P(Y, \hat{Y} \mid Z_1, Z_2, \hat{Z}_1, \hat{Z}_2) = P(Y \mid Z_1, Z_2)\, P(\hat{Y} \mid \hat{Z}_1, \hat{Z}_2). \tag{7}$$

It is seen that CPDs in (6) and (7) differ only in the order of the input variables. Specifically, $\hat{Z}_1$ and $Z_2$ have to be interchanged as shown in Fig. 5(e). A change in the order of variables implies a permutation of the rows of the CPD. This can be done by premultiplying by a permutation matrix as follows:

$$P(\tilde{Y} \mid \tilde{Z}_1, \tilde{Z}_2) = P_m \times P(Y, \hat{Y} \mid Z_1, Z_2, \hat{Z}_1, \hat{Z}_2). \tag{8}$$

Here, $P_m$ is a permutation matrix and $\times$ is used to represent matrix multiplication. Using the procedure described in [8], the permutation matrix can be written as follows:

$$\begin{aligned}
P_m &= P(Z_1, \hat{Z}_1, Z_2, \hat{Z}_2 \mid Z_1, Z_2, \hat{Z}_1, \hat{Z}_2) \\
&= P(Z_1 \mid Z_1)\, P(\hat{Z}_1, Z_2 \mid Z_2, \hat{Z}_1)\, P(\hat{Z}_2 \mid \hat{Z}_2) \\
&= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\end{aligned}$$

Here, $\otimes$ denotes the tensor product or Kronecker product of the two matrices. The conditional probabilities shown above are PTMs that are obtained using the approach described in [8]. The procedure for a gate with an arbitrary number of inputs is similar.

C. Computation of Error Rate and Reliability of the POs

The error rate and reliability of the POs of a circuit are defined as follows.

$$p_{E_i} = P(\tilde{Y} = 01) + P(\tilde{Y} = 10). \tag{9}$$

Definition 2 (Reliability ($r_{Y_i}$)): The reliability of an output $Y_i$ is defined as $r_{Y_i} = 1 - p_{E_i}$.

The process of averaging over equally likely inputs is equivalent to computing the probability of error at the output after setting the signal probability of the PIs to 0.5.

As seen from (7) and (8), the CPD of net $\tilde{Y}$ in the four-valued formulation is simply a permutation of the factor product of the CPDs of the error-free net ($Y$) and the erroneous net ($\hat{Y}$) in the two-valued formulation. Therefore, the factor product of all CPDs in the four-valued formulation will give the same overall JPD as the two-valued formulation. Exact computation of marginal probabilities $P(\tilde{Y})$ and $P(Y, \hat{Y})$ in the four-valued and two-valued formulations involves summation of the JPD over the state of all other variables. Therefore, $P(\tilde{Y}) = P(Y, \hat{Y})$. Hence, the error rate [(9)] estimated using both the formulations is the same if exact inference methods are used.

The following propositions show that for approximate inference using sum-product BP, the four-valued BN is guaranteed to give an error rate of zero when the PIs and gates are error-free. The significance of this theorem is discussed after the proofs.

Definition 3: A net $\tilde{Z}_i$ is said to be error-free with respect to a probability distribution $P$ if it satisfies $P(\tilde{Z}_i = 01) = P(\tilde{Z}_i = 10) = 0$.

Definition 4: The CPD of a net $\tilde{Y}$ with parents $\mathrm{Pa}_{\tilde{Y}}$ is said to be error-free with respect to a distribution $P$ if it satisfies the following:

$$P(\tilde{Y} = 01 \mid \mathrm{Pa}_{\tilde{Y}} = s) = P(\tilde{Y} = 10 \mid \mathrm{Pa}_{\tilde{Y}} = s) = 0$$
where

$$s \in S = \{\tilde{Z}_i \in \{00, 11\}\ \ \forall\, \tilde{Z}_i \in \mathrm{Pa}_{\tilde{Y}}\}.$$

Proposition 1: Given a set of error-free nets $\tilde{Z} = \{\tilde{Z}_1, \ldots, \tilde{Z}_n\}$, the joint probability $P(\tilde{Z})$ evaluates to zero if the state of any $\tilde{Z}_i \in \tilde{Z}$ is either 01 or 10.

Proof: The marginal probability of a net $\tilde{Z}_i$ can be obtained from the joint probability $P(\tilde{Z})$ as follows: $P(\tilde{Z}_i) = \sum_{\tilde{Z} \setminus \tilde{Z}_i} P(\tilde{Z})$. Using Definition 3, $P(\tilde{Z}_i)$ evaluates to zero for states 01 and 10. Since probabilities are nonnegative, $P(\tilde{Z})$ should be zero for all the states such that $\tilde{Z}_i \in \{01, 10\}$.

Proposition 2: If both the set of parents $\mathrm{Pa}_{\tilde{Y}}$ and the CPD $P(\tilde{Y} \mid \mathrm{Pa}_{\tilde{Y}})$ of a net $\tilde{Y}$ are error-free, $\tilde{Y}$ is also error-free.

Proof: Let $S$ denote the set of all possible states of $\mathrm{Pa}_{\tilde{Y}}$. $S$ can be split into two disjoint sets $S_1$ and $S_2$, where $S_1$ comprises states where all the nets are either 00 or 11 (accurate states of the net) and $S_2 = S \setminus S_1$ (states where at least one net is inaccurate). The sum-product operations to find $P(\tilde{Y} = 01)$ can be written as follows:

$$P(\tilde{Y} = 01) = \sum_{s \in S_1} P(\tilde{Y} = 01 \mid \mathrm{Pa}_{\tilde{Y}} = s)\, P(\mathrm{Pa}_{\tilde{Y}} = s) + \sum_{s \in S_2} P(\tilde{Y} = 01 \mid \mathrm{Pa}_{\tilde{Y}} = s)\, P(\mathrm{Pa}_{\tilde{Y}} = s).$$

Since the CPD is error-free, the first term in the summation is zero. In the second term, the summation is over states of $\mathrm{Pa}_{\tilde{Y}}$ such that at least one $\tilde{Z}_i \in \mathrm{Pa}_{\tilde{Y}}$ is either 01 or 10. Using Proposition 1, the joint probability $P(\mathrm{Pa}_{\tilde{Y}}) = 0$ for these states. Therefore, $P(\tilde{Y} = 01) = 0$. Similarly, $P(\tilde{Y} = 10) = 0$.

Theorem 1: The output error rate estimated using sum-product BP in the four-valued BN is zero if all the PIs and gates are error-free.

Proof: In this proof, we use $P$ to denote the exact CPDs present as initial factors and $Q(C)$ to denote the approximate JPD of a cluster $C$ obtained after convergence of sum-product BP. Let $\tilde{Z}$ be a net in the circuit, $C_k = \{\tilde{Z}, \mathrm{Pa}_{\tilde{Z}}, \tilde{R}\}$ be the cluster that is assigned the CPD of net $\tilde{Z}$ ($P(\tilde{Z} \mid \mathrm{Pa}_{\tilde{Z}})$), and $Q_k(C_k)$ be the JPD of variables in $C_k$. Using (3), the marginal distribution $Q_k(\tilde{Z})$ can be written as follows:

$$\begin{aligned}
Q_k(\tilde{Z}) &= \sum_{C_k \setminus \tilde{Z}} Q_k(C_k) = \sum_{C_k \setminus \tilde{Z}} \psi_i \prod_{j \in \mathrm{Neigh}(i)} m_{j \to i} \\
&= \sum_{\mathrm{Pa}_{\tilde{Z}},\, \tilde{R}} P(\tilde{Z} \mid \mathrm{Pa}_{\tilde{Z}})\, Q_k(\mathrm{Pa}_{\tilde{Z}})\, Q_k(\tilde{R} \mid \tilde{Z}, \mathrm{Pa}_{\tilde{Z}}) \\
&= \sum_{\mathrm{Pa}_{\tilde{Z}}} P(\tilde{Z} \mid \mathrm{Pa}_{\tilde{Z}})\, Q_k(\mathrm{Pa}_{\tilde{Z}}).
\end{aligned} \tag{10}$$

The second step in the equation follows from the chain rule of probability and because one of the initial factors assigned to $C_k$ is the CPD of $\tilde{Z}$.

Let $\tilde{X} \in C_0$ be a PI, so that $\mathrm{Pa}_{\tilde{X}} = \emptyset$ and the corresponding initial factor is $P(\tilde{X})$. Therefore, from (10), we have $Q_0(C_0) = P(\tilde{X})$. Consider a net $\tilde{Z} \in C_1$ that is driven by the PIs. As described in Section II, after convergence in sum-product BP, the marginal probability of a variable is the same in all the clusters in which it is present. Therefore, all the parents of $\tilde{Z}$ are also error-free with respect to $Q_1$. Since both the CPD and inputs in (10) are error-free, using Proposition 2, the marginal probability obtained after sum-product is also error-free. Thus, going to each cluster in topological order of the nets, all the nets are error-free with respect to the approximate distribution obtained after sum-product BP. In particular, this is also true for POs. Therefore, the output error rate at all the POs evaluates to zero.

There are several variants of sum-product BP that differ in the construction of the cluster graph used. In LBP and its variants, each node in the cluster graph contains a variable and its parents, and all the sep-sets contain a single variable. In GBP and IJGP, clusters and sep-sets can have larger sizes. IBIA uses sum-product BP in a sequence of approximate JTs. Since the underlying message-passing algorithm is the same, the estimates obtained with all these methods obey Theorem 1. In contrast, this is not guaranteed for the two-valued formulation. As a result, the estimated error rate using it does not scale well with gate error probabilities even for the small circuit in Fig. 5(a), as shown in Table I.

TABLE I
Output error rate estimated using LBP on the two-valued BN [Fig. 5(b)] and four-valued BN [Fig. 5(c)] for different gate error probabilities

Corollary 1: The output error rate estimated after assuming independence between inputs of a gate in the four-valued formulation is zero if all PIs and gates are error-free.

Proof: In LBP, each sep-set contains a single variable and messages are in terms of marginals of variables. Each CPD is multiplied by the marginals of inputs to get the marginals of the output. This is the same as estimation of marginals assuming independence between inputs. Therefore, using Theorem 1, the estimated error rate is zero when the PIs and gates are error-free.

Note that this can also be shown by traversing the nets in the topological order and applying Proposition 2.

D. Computation of Circuit Error Rate

Definition 5 (Circuit Error Rate ($p_E$)): It is the probability of getting an error in at least one of the outputs. Equivalently, circuit reliability is $1 - p_E$.

$p_E$ is computed after connecting a tree of OR gates to the POs, as shown in Fig. 1(c). We know that

$$p_E = P(E_1 \vee E_2 \vee \cdots \vee E_m = 1) = 1 - P(E_1 = 0, E_2 = 0, \ldots, E_m = 0). \tag{11}$$

Here, $\vee$ represents the OR operator. Therefore, the probability of getting an error in any one of the outputs can be computed from the probability that there is no error in any of the outputs. In the framework of BI, the joint probability of getting no error in any of the outputs can be computed
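The identity in (11) can be checked by brute force on a small example. The following is a minimal sketch under stated assumptions: the three-NAND circuit, the function name `exact_error_rates`, and the symmetric bit-flip gate error model are hypothetical and are not taken from the article. The code enumerates every input vector and every gate-flip pattern exactly, so no inference algorithm is involved.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def exact_error_rates(p_eps):
    """Exact error rates of a hypothetical circuit by full enumeration.

    Circuit: n1 = NAND(a, b), y1 = NAND(n1, c), y2 = NAND(n1, a).
    Assumed model: each gate output flips independently with probability
    p_eps, and the primary inputs are uniform.
    """
    p_e1 = p_e2 = p_or = p_none = 0.0
    for a, b, c in product((0, 1), repeat=3):
        n1 = nand(a, b)
        y1, y2 = nand(n1, c), nand(n1, a)
        for f1, f2, f3 in product((0, 1), repeat=3):
            k = f1 + f2 + f3
            # Probability of this input vector and this flip pattern
            w = (p_eps ** k) * ((1 - p_eps) ** (3 - k)) / 8.0
            n1h = nand(a, b) ^ f1
            e1 = y1 ^ nand(n1h, c) ^ f2   # XOR at output 1
            e2 = y2 ^ nand(n1h, a) ^ f3   # XOR at output 2
            p_e1 += w * e1
            p_e2 += w * e2
            p_or += w * (e1 | e2)                             # left side of (11)
            p_none += w * (1 if (e1 == 0 and e2 == 0) else 0)
    return {"p_E1": p_e1, "p_E2": p_e2, "p_E": p_or, "p_none": p_none}
```

For any `p_eps`, the returned `p_E` agrees with `1 - p_none` up to rounding, and it lies between the largest per-output error rate and the sum of the per-output error rates.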
TABLE III
Comparison of maximum error and RMSE obtained using IBIA on the two-valued and four-valued BNs for different gate error probabilities ($p_\epsilon$). For $p_\epsilon = 0$, the four-valued BN gives an error rate of zero for all outputs. (a) Maximum error. (b) RMSE
TABLE IV
Average mutual information between the error-free and the corresponding erroneous nets for various gate error probabilities ($p_\epsilon$)
TABLE V
Statistics of benchmarks used for evaluation, and trade-off between runtime (s) and accuracy in terms of average and maximum relative error (%) in the error rate (ER) using IBIA and LBP. The relative error in reliability (R) is indicated in brackets. Results obtained with LBP, IBIA using max-cluster size of 7 and 10, and MC simulations are shown in columns marked as "L," "I7," "I10," and "MC," respectively. ISCAS'85 and EPFL'15 circuits are separated by a line
As seen from the table, the runtime for both LBP and MC simulations is approximately linear in the number of gates, which is as expected. There are some deviations in MC simulations since we have used Cadence Incisive which is an event-driven simulator. The runtime of IBIA also increases with the number of gates. But the exact dependency is more difficult to predict. As mentioned previously, IBIA constructs a sequence of junction trees. As $N_g$ increases, the number of junction trees also increases, but the exact number depends on the structure of the graph and the reconvergent loops in it. Due to this, there are some fluctuations in the runtimes.

For the same circuit, the runtime for IBIA increases with the maximum cluster size in nearly all the cases, as expected. For these cluster sizes, the time required is dominated by the time required to construct the junction trees rather than inference time. The exceptions occur because IBIA takes longer to construct a junction tree with the specified maximum cluster size.

The runtime for IBIA is much better than the MC simulations for most circuits and is comparable for a couple of large benchmarks (square and multiplier). However, if the gate error probability is reduced further, the runtime of IBIA and LBP will not change much, but the MC simulations will take significantly larger times since the number of samples required would be larger. For example, the average runtime over all the benchmarks for gate error probabilities of $10^{-1}$, $10^{-3}$, and $10^{-6}$ with IBIA is 276, 240, and 303 s, respectively. But full logic simulation is not possible for $p_\epsilon = 10^{-6}$.

Accuracy: For $p_\epsilon = 0.1$, the mean maximum relative error reduces from 15% with LBP to 8% with IBIA using a cluster size of 10. However, the mean average relative error obtained with all three methods is small (2%–4%). Therefore, for this and larger gate error probabilities reasonably accurate estimates can be obtained within very small runtimes if we assume independence between inputs of a gate.

When the gate error probabilities are reduced to $10^{-3}$, the relative error in the estimates of the error rates is larger, with IBIA giving significantly better estimates than LBP with a mean max-relative error of 52% and 95%, respectively. For some large benchmarks in EPFL arithmetic circuits, the maximum relative error is greater than 100%. However, the relative error in the corresponding reliabilities (shown in brackets) is much lower. This is because for small gate error probabilities, the output error rate is small and the corresponding reliability is large. Therefore, though the absolute error in both the reliability and error rate is the same, the percentage error with respect to reliability is much smaller. We have reported both since some of the earlier works report results with error rates, while others for reliability. On an average, the maximum and average relative error in reliability obtained with IBIA is about 7% and 4%, respectively. As expected, for both the gate error probabilities, we observe that the accuracy improves as the cluster size increases.

Among the existing methods [4], [23], [24], [25], [44], the results have been reported for ISCAS'85 benchmarks with gate error probabilities of 0.01 or larger. We have not seen any results for large EPFL benchmarks for any gate error probabilities. A fair comparison of results obtained with the existing methods is difficult since the synthesized netlists may vary in each case. However, just to compare the overall trend, we observe that the average relative error in the reliability of POs reported with these methods is between 1.2% and 3.8% for gate error probabilities of 0.1 and 0.01. In contrast, the average error in reliability with the proposed formulation is 0.7% with IBIA (cluster size = 10) and 2.5% if we assume independence between inputs of a gate. Very few results are available for gate error probabilities $p_\epsilon \le 0.001$. Both [4] and [20] report errors averaged across several gate error probabilities. Thus, a direct comparison is not possible. In [20], with zero gate error probabilities, the relative error in the error rate is 45% for a small circuit like c2670. In contrast, we show that our method is guaranteed to give zero error rates if algorithms based on sum-product BP (such as LBP, GBP, and IBIA) are used for inference.

Fig. 7 shows the computed error rates at each PO using MC, IBIA (with cluster size = 10), and LBP for some of the benchmarks in which the percentage error in the estimated error rate is large. It is seen that while IBIA and LBP overestimate the error rates, they are able to capture the error rates of outputs relative to each other quite well in most cases. In some testcases like div and c6288, the accuracy improves when larger clusters are used. The accuracy obtained with IBIA is comparable to LBP for square and c3540. Since the relative error rate among the outputs is approximately preserved by both the inference methods (with greater accuracy by IBIA than LBP), these can be used to identify the outputs that are more susceptible to error. This will enable targeted application of techniques to improve the reliability of the circuit.

Fig. 7. Error rate at the POs obtained using MC simulations, IBIA, and LBP for $p_\epsilon = 10^{-3}$.

Fig. 8 shows the logarithm of the error rates at the POs obtained using LBP and IBIA when the gate error probability is further reduced to $p_\epsilon = 10^{-6}$. The MC simulations were not possible for this gate error probability due to memory errors. Therefore, we used PLS [14], a sampling-based approximate BI technique. The implementation of PLS was taken from the SMILE toolkit [45], and the number of samples was set to $10^7$. It is seen that the estimated error rate scales well when the gate error probability is reduced. Once again, the relative reliability among the outputs is captured well by both LBP and IBIA. The error rates obtained using LBP are consistently larger than with IBIA. While PLS is possibly more accurate, the runtime is about an order of magnitude larger than IBIA. For example, benchmark div requires 440 min with PLS, while the runtime with IBIA is only 11 min.

Fig. 8. Error rate at the POs obtained using PLS, IBIA, and LBP for $p_\epsilon = 10^{-6}$.

E. Circuit Error Rate With the Four-Valued BN Model

Table VI shows the relative error in the circuit error rate (Definition 5) for $p_\epsilon = 10^{-3}$. It also shows the estimates obtained after the MC simulations on the circuit configuration shown in Fig. 1(c). For $p_\epsilon = 0.1$, the circuit error rate becomes one and both the methods give close to accurate estimates. The table has a comparison of the two methods used, namely, (a) connecting a tree of OR gates and (b) using partition function computation [see (11)]. It is seen that the relative error in the estimation is significantly lower when PR formulation is used for both the methods, especially for the smaller ISCAS'85 benchmarks. Although the average over all the benchmarks is comparable, it is seen that LBP
gives significantly lower relative errors for a large number of benchmarks. For testcases square and multiplier, the circuit error rate is very close to one and both the methods perform well. Though LBP requires more iterations to converge when the PR formulation is used, it is quite fast, thus making it suitable for use in an optimization framework. The runtime for IBIA is expected to be larger since it uses larger cluster sizes. That said, the reported runtimes for LBP and IBIA are not directly comparable since both are implemented in different programming languages. The average runtime for LBP and IBIA for $p_\epsilon = 0.1$ is 17 and 377 s, respectively, which is similar to the runtimes for $p_\epsilon = 0.001$.

TABLE VI
Circuit error rate ($p_E$) obtained using MC simulations, the required runtime (s), and the relative error (in %) in the circuit error rate obtained using two inference methods. IBIA/LBP-OR refers to computation after connecting a tree of OR gates and IBIA/LBP-PR refers to computation using the partition function

V. COMPARISON WITH RELATED WORK

We compare our approach with the existing methods that either use (a) a four-valued formulation or (b) BI techniques for reliability estimation.

Several existing approaches [4], [15], [20], [21], [23], [24], [25] use a single copy of the circuit with additional probabilities for each net. The problem here is the accurate propagation of these probabilities in the presence of reconvergent fanouts. Quick estimates can be obtained if gate inputs are assumed to be independent [15], [21]. However, as seen from the results, the accuracy of this method drops as the gate error probability reduces. Methods in [4] and [20] use an extension of the correlation coefficient method (CCM) to compute signal correlation coefficients. In [23], [24], and [25], correlation coefficients are computed with respect to signal reliabilities. These correlations are estimated using analytical methods [4], [16], [23] or simulation-based methods [20], or hybrid methods that combine these two approaches [46]. Typically, pairwise correlations are computed which limits the accuracy of these methods. In [20] and [46], the accuracy is limited by the length of the bitstreams used to estimate the correlation coefficients.

The methods proposed in [4], [23], [24], [25], and [44] require the computation of signal probabilities in error-free circuits to estimate correlations in reliability. This itself is a #P-complete problem. Accurate estimates obtained using BDDs have been used in [4]. However, this limits the scalability of the method to relatively small circuits. In contrast, our method does not require these probabilities. We avoid inaccuracies in the estimation of correlation coefficients by deriving CPDs corresponding to the four-valued signals and using BI techniques that give approximate joint distribution over larger clusters of variables. Unlike the existing approaches, in our approach, it is possible to tradeoff runtime and accuracy by increasing the cluster sizes.

The BI techniques for reliability analysis have been used in [5], [11], [12], [47], and [48]. In [5], the two-valued formulation has been used along with the sampling-based approximate inference techniques. As with all the sampling methods, this approach is inflexible in the sense that any change in the circuit requires a complete reevaluation and it becomes expensive as the gate error probability reduces. Since the time complexity for sampling techniques scales linearly with the number of nodes in the network, it can be reduced to half using the proposed four-valued BN instead of the two-valued formulation. Exact BI methods have also been used for the estimation of gate error probabilities based on device-level parameters [11], [47], and for the computation of bounds on reliability by identifying the worst case input vector [12], [48]. However, exact inference is possible only for small circuits.

VI. CONCLUSION

We propose a novel algorithm for the estimation of error rate/reliability in probabilistic and unreliable circuits. Our method scales well with gate error probabilities and preserves the relative reliability of the outputs. We also propose a novel method for computing the overall circuit error rate by casting it as a problem of estimation of partition function in BNs. This formulation gives good accuracies within reasonable runtimes, making it suitable for use in an optimization framework. Although we have demonstrated results for CMOS circuits, the methods proposed are general and can be used for circuits that are built with post-CMOS devices. They can also be used for the analysis and design of approximate circuits. For these circuits, the four-valued formulation can be applied directly to the accurate and imprecise truth tables, since the error rate is independent of the implementation.

APPENDIX

A conditional distribution $P(\mathbf{Y} \mid \mathbf{X})$ is a factor $\phi$ over the set of variables $\mathbf{Y} \cup \mathbf{X}$. Let $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be disjoint sets of variables and $\phi_1(\mathbf{X}, \mathbf{Y})$, $\phi_2(\mathbf{Y}, \mathbf{Z})$ be two factors. The factor product [26, Ch. 4] $\phi_1 \phi_2$ gives a factor $\psi$ which is obtained as follows:

$$\forall\, x, y, z \in \mathrm{Domain}(\mathbf{X}, \mathbf{Y}, \mathbf{Z}): \quad \psi(\mathbf{X}, \mathbf{Y}, \mathbf{Z} = x, y, z) = \phi_1(x, y)\, \phi_2(y, z).$$

If factors $\phi_1$ and $\phi_2$ contain disjoint sets of variables, then the factor product is the same as the tensor product.
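The factor product defined in the Appendix can be sketched in a few lines. This is an illustrative implementation only: the factor representation (a tuple of variable names plus a dictionary keyed by state tuples), the function name `factor_product`, the restriction to binary variables, and the numeric values are all assumptions made for this example.

```python
from itertools import product


def factor_product(f1, f2):
    """Multiply two factors over binary variables.

    A factor is (variables, table): a tuple of variable names and a dict
    that maps each joint state (a tuple of 0/1 values) to a number.
    """
    vars1, t1 = f1
    vars2, t2 = f2
    # Scope of the result: union of both scopes, shared variables kept once
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for states in product((0, 1), repeat=len(out_vars)):
        assign = dict(zip(out_vars, states))
        table[states] = (t1[tuple(assign[v] for v in vars1)]
                         * t2[tuple(assign[v] for v in vars2)])
    return out_vars, table


# phi1(X, Y) and phi2(Y, Z) share the variable Y
phi1 = (("X", "Y"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
phi2 = (("Y", "Z"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6})
psi_vars, psi = factor_product(phi1, phi2)
```

When the two scopes are disjoint, listing the entries of the result in lexicographic state order gives the Kronecker product of the two input tables, which is the tensor-product case mentioned above.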
REFERENCES
[1] J. Von Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata Stud., vol. 34, pp. 43–98, Dec. 1956.
[2] L. Tan, Z. Li, G. Su, and D. Wang, “Asymptotically linear analysis and gate probability allocation schemes in probabilistic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 2, pp. 596–606, Feb. 2020.
[3] N.-Z. Lee and J.-H.-R. Jiang, “Towards formal evaluation and verification of probabilistic design,” IEEE Trans. Comput., vol. 67, no. 8, pp. 1202–1216, Aug. 2018.
[4] M. R. Choudhury and K. Mohanram, “Reliability analysis of logic circuits,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 3, pp. 392–405, Mar. 2009.
[5] T. Rejimon, K. Lingasubramanian, and S. Bhanja, “Probabilistic error modeling for nano-domain logic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 55–65, Jan. 2009.
[6] B. Krishnamurthy and I. G. Tollis, “Improved techniques for estimating signal probabilities,” IEEE Trans. Comput., vol. 38, no. 7, pp. 1041–1045, Jul. 1989.
[7] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes, “Accurate reliability evaluation and enhancement via probabilistic transfer matrices,” in Proc. Design, Autom. Test Eur., Mar. 2005, pp. 282–287.
[23] C. Chen and R. Xiao, “A fast model for analysis and improvement of gate-level circuit reliability,” Integration, vol. 50, pp. 107–115, Jun. 2015.
[24] J. Cai and C. Chen, “Circuit reliability analysis using signal reliability correlations,” in Proc. IEEE Int. Conf. Softw. Qual., Rel. Secur. Companion (QRS-C), Jul. 2017, pp. 171–176.
[25] K. Sikander, S. Zhan, and C. Chen, “An analytical model for circuit reliability estimation,” in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2021, pp. 84–87.
[26] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[27] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Uncertainty in Artificial Intelligence. Amsterdam, The Netherlands: Elsevier, 1999, pp. 467–475.
[28] W. Wiegerinck and T. Heskes, “Fractional belief propagation,” in Advances in Neural Information Processing Systems, vol. 15. Cambridge, MA, USA: MIT Press, 2003.
[29] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching,” in Proc. Int. Workshop Artif. Intell. Statist., 2003, pp. 308–315.
[30] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Generalized belief propagation,” in Proc. NIPS, vol. 13, 2000, pp. 689–695.
[8] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes, [31] R. Mateescu, K. Kask, V. Gogate, and R. Dechter, “Join-graph
“Probabilistic transfer matrices in symbolic reliability analysis of logic propagation algorithms,” J. Artif. Intell. Res., vol. 37, pp. 279–328,
circuits,” ACM Trans. Des. Automat. Electron. Syst., vol. 13, no. 1, Mar. 2010.
pp. 1–35, 2008. [32] Q. Liu and A. Ihler, “Bounding the partition function using holder’s
inequality,” in Proc. Int. Conf. Mach. Learn., 2011, pp. 849–856.
[9] J. Han, H. Chen, E. Boykin, and J. Fortes, “Reliability evaluation of logic
circuits using probabilistic gate models,” Microelectron. Rel., vol. 51, [33] S. Bathla and V. Vasudevan, “IBIA: Bayesian inference via incre-
no. 2, pp. 468–476, 2011. mental build-infer-approximate operations on clique trees,” 2022,
arXiv:2202.12003.
[10] O. Keszocze, “BDD-based error metric analysis, computation and opti- [34] M. Chavira and A. Darwiche, “On probabilistic inference by weighted
mization,” IEEE Access, vol. 10, pp. 14013–14028, 2022. model counting,” Artif. Intell., vol. 172, nos. 6–7, pp. 772–799,
[11] W. Ibrahim, V. Beiu, and A. Beg, “GREDA: A fast and more accurate Apr. 2008.
gate reliability EDA tool,” IEEE Trans. Comput.-Aided Design Integr. [35] C. Yuan and M. J. Druzdzel, “An importance sampling algorithm based
Circuits Syst., vol. 31, no. 4, pp. 509–521, Apr. 2012. on evidence pre-propagation,” in Uncertainty in Artificial Intelligence.
[12] W. Ibrahim, M. Shousha, and J. W. Chinneck, “Accurate and efficient Amsterdam, The Netherlands: Elsevier, 2002, pp. 624–631.
estimation of logic circuits reliability bounds,” IEEE Trans. Comput., [36] A. E. Gelfand, “Gibbs sampling,” J. Amer. Stat. Assoc., vol. 95, no. 452,
vol. 64, no. 5, pp. 1217–1229, May 2015. pp. 1300–1304, Dec. 2000.
[13] J. Han, H. Chen, J. Liang, P. Zhu, Z. Yang, and F. Lombardi, [37] N. L. Zhang and D. Poole, “Exploiting causal independence in Bayesian
“A stochastic computational approach for accurate and efficient relia- network inference,” J. Artif. Intell. Res., vol. 5, pp. 301–328, Dec. 1996.
bility evaluation,” IEEE Trans. Comput., vol. 63, no. 6, pp. 1336–1350, [38] F. Brglez, “A neural netlist of 10 combinational benchmark cir-
Jun. 2014. cuits,” Proc. IEEE Special Session ATPG Fault Simulation, Jun. 1985,
[14] M. Henrion, “Propagating uncertainty in Bayesian networks by prob- pp. 151–158.
abilistic logic sampling,” in Uncertainty in Artificial Intelligence [39] L. Amarú, P.-E. Gaillardon, and G. De Micheli, “The EPFL combi-
(Machine Intelligence & Pattern Recognition), vol. 5, J. F. Lemmer and national benchmark suite,” in Proc. Int. Workshop Log. Synth. (IWLS),
L. N. Kanal, Eds. Amsterdam, The Netherlands: North Holland, 1988, 2015, pp. 1–5.
pp. 149–163.
[40] S. Bhanja and N. Ranganathan, “Cascaded Bayesian inferencing for
[15] D. T. Franco, M. C. Vasconcelos, L. Naviner, and J.-F. Naviner, switching activity estimation with correlated inputs,” IEEE Trans. Very
“Reliability analysis of logic circuits based on signal probability,” Large Scale Integr. (VLSI) Syst., vol. 12, no. 12, pp. 1360–1370,
in Proc. 15th IEEE Int. Conf. Electron., Circuits Syst., Aug. 2008, Dec. 2004.
pp. 670–673. [41] J. M. Mooij, “libDAI: A free and open source C++ library for discrete
[16] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, “Estimate approximate inference in graphical models,” J. Mach. Learn. Res.,
of signal probability in combinational logic networks,” in Proc. 1st Eur. vol. 11, pp. 2169–2173, Aug. 2010.
Test Conf., Jan. 1989, pp. 132–133. [42] R. Marinescu. (2016). Merlin. Accessed: Oct. 15, 2021. [Online].
[17] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, “Testability Available: https://github.com/radum2275/merlin/
measures in pseudorandom testing,” IEEE Trans. Comput.-Aided Design [43] T. Heskes, K. Albers, and B. Kappen, “Approximate inference and
Integr., vol. 11, no. 6, pp. 794–800, Jun. 1992. constrained optimization,” in Uncertainty in Artificial Intelligence. Ams-
[18] S. Sivaswamy, K. Bazargan, and M. Riedel, “Estimation and optimiza- terdam, The Netherlands: Elsevier, 2003, pp. 313–320.
tion of reliability of noisy digital circuits,” in Proc. 10th Int. Symp. [44] C. Chen, J. Cai, and S. Zhan, “A triple-point model for circuit-level
Quality Electron. Design, Mar. 2009, pp. 213–219. reliability analysis,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS),
[19] N. Mohyuddin, E. Pakbaznia, and M. Pedram, “Probabilistic error May 2018, pp. 1–4.
propagation in a logic circuit using the Boolean difference calculus,” [45] M. J. Druzdzel, “Smile: Structural modeling, inference, and learning
in Advanced Techniques in Logic Synthesis, Optimizations and Applica- engine and genie: A development environment for graphical decision-
tions. Cham, Switzerland: Springer, 2011, pp. 359–381. theoretic models,” in Proc. AAAI Conf. Artif. Intell., 1999, pp. 902–903.
[20] H. Jahanirad, “CC-SPRA: Correlation coefficients approach for signal [46] S. Zhan and C. Chen, “A hybrid method for signal probability and
probability-based reliability analysis,” IEEE Trans. Very Large Scale reliability estimation with combinational circuits,” Integration, vol. 87,
Integr. (VLSI) Syst., vol. 27, no. 4, pp. 927–939, Apr. 2019. pp. 275–283, Nov. 2022.
[21] D. T. Franco, M. C. Vasconcelos, L. Naviner, and J.-F. Naviner, [47] W. Ibrahim and V. Beiu, “Using Bayesian networks to accurately
“Reliability of logic circuits under multiple simultaneous faults,” in Proc. calculate the reliability of complementary metal oxide semiconductor
51st Midwest Symp. Circuits Syst., Aug. 2008, pp. 265–268. gates,” IEEE Trans. Rel., vol. 60, no. 3, pp. 538–549, Sep. 2011.
[22] J. T. Flaquer, J. M. Daveau, L. Naviner, and P. Roche, “Fast reliability [48] W. Ibrahim and H. Ibrahim, “Multithreaded and reconvergent aware
analysis of combinatorial logic circuits using conditional probabilities,” algorithms for accurate digital circuits reliability estimation,” IEEE
Microelectron. Rel., vol. 50, pp. 1215–1218, Sep./Nov. 2010. Trans. Rel., vol. 68, no. 2, pp. 514–525, Jun. 2019.