
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, No. 4, April 2024

Counterexample Guided Neural Network Quantization Refinement

João Batista P. Matos Jr., Eddie B. de Lima Filho, Iury Bessa, Edoardo Manino, Xidan Song, and Lucas C. Cordeiro

Manuscript received 14 April 2023; revised 30 September 2023; accepted 13 November 2023. Date of publication 21 November 2023; date of current version 21 March 2024. This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/T026995/1 and Grant EP/V000497/1; in part by the Soteria Project awarded by the U.K. Research and Innovation for the Digital Security by Design (DSbD) Programme; in part by the Cal-Comp Electronic by the R&D project of the Cal-Comp Institute of Technology and Innovation; in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES-PROEX)—Finance Code 001; and in part by the Amazonas State Research Support Foundation—FAPEAM—through the POSGRAD Project. This article was recommended by Associate Editor P. A. Beerel. (Corresponding author: Lucas C. Cordeiro.)

João Batista P. Matos Jr. is with the Graduate Program in Informatics, Federal University of Amazonas, Manaus 69067-005, Brazil (e-mail: [email protected]). Eddie B. de Lima Filho is with the R&D Department, TPV Technology, Manaus 69058-581, Brazil (e-mail: [email protected]). Iury Bessa is with the Department of Electricity, Federal University of Amazonas, Manaus 69067-005, Brazil (e-mail: [email protected]). Edoardo Manino, Xidan Song, and Lucas C. Cordeiro are with the Department of Computer Science, University of Manchester, M13 9PL Manchester, U.K. (e-mail: [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCAD.2023.3335313

© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Abstract—Deploying neural networks (NNs) in low-resource domains is challenging because of their high computing, memory, and power requirements. For this reason, NNs are often quantized before deployment, but such an approach degrades their accuracy. Thus, we propose the counterexample-guided neural network quantization refinement (CEG4N) framework, which combines search-based quantization and equivalence checking. The former minimizes computational requirements, while the latter guarantees that the behavior of an NN does not change after quantization. We evaluate CEG4N on a diverse set of benchmarks, including large and small NNs. Our technique successfully quantizes the networks in the chosen evaluation set, while producing models with up to 163% better accuracy than state-of-the-art techniques.

Index Terms—Equivalent quantization (EQ), neural network equivalence (NNE), neural network (NN) quantization.

I. INTRODUCTION

NEURAL networks (NNs) are becoming essential in many applications, such as autonomous driving [1], medicine, security, and other safety-critical domains [2]. However, current state-of-the-art NNs often require substantial computing, memory, and power resources, limiting their applicability [3]. As a result, resource-constrained systems may be unable to run complex NNs, leading to high opportunity costs for businesses.

Quantization techniques can help reduce the resource requirements of NNs [3], [4], [5] by decreasing the bit width required to represent their parameters and intermediate computation steps [4]. In general, different quantization strategies can be used. On the one hand, some studies consider only the quantization of NN weights [6], [7], [8]. On the other hand, other studies provide entire NN frameworks in integer precision, including weights, activation functions, and convolutional layers [4], [9].

The goal is compressing an NN to the smallest possible bit-width. However, doing so may affect the functional behavior of the resulting NN, making it prone to errors and loss of accuracy [5], [9]. For this reason, existing techniques usually monitor the accuracy degradation of a quantized NN (QNN) with statistical measures defined on the training set [5].

Statistical accuracy measures do not capture a network's vulnerability to adversarial inputs. Indeed, specific inputs may exist for which a network's performance degrades significantly [5], [10]. Consequently, the only way to guarantee accuracy for a QNN is reformulating the problem under the notion of equivalence checking (EC) [11], [12], [13]. This property states that two NNs are equivalent when they produce similar outputs for inputs in a given domain [12], [13].

This article extends a previous work [14], which tackles the problems of NN quantization and EC in a modular fashion within the counterexample-guided neural network quantization refinement (CEG4N) framework by iterating between two stages: searching for QNN candidates over a finite set of counterexamples and verifying the QNN to either prove equivalence or generate more counterexamples. Here, we present a number of additional contributions.

1) We describe the equivalent quantization (EQ) problem as a general optimization-verification iterative framework.
2) We show that CEG4N works with multiple verification engines by employing NN equivalence verification (NNEV) and satisfiability modulo theories (SMT).
3) We extend the experimental evaluation of CEG4N by considering a larger set of ACAS Xu networks [15], deeper fully connected networks for MNIST [16], and convolutional networks for CIFAR-10 [17]. Furthermore, we explore the effect of different architectures on small networks trained on the Iris [18] and Seeds [19] datasets.
4) We show that CEG4N can successfully quantize NNs and produce models with up to 163% better accuracy, when compared with state-of-the-art quantization techniques.

The structure of this article is as follows. In Section II, we introduce some preliminary information on QNNs and equivalence verification. In Section III, we give a broad survey of related work. In Section IV, we propose our improved CEG4N framework. In Section V, we present the results of our extensive experimental evaluation. In Section VI, we conclude and outline potential future work.

II. PRELIMINARIES

A. Neural Networks

In general, NNs are nonlinear multivariate functions

f : I ⊂ R^n → O ⊂ R^m    (1)

where I ⊂ R^n and O ⊂ R^m are the input and output domains with dimensions n and m, respectively. Internally, an NN is structured as a directed graph with a set of H hidden layers. In a feedforward NN, the neurons in each layer h = {0, 1, . . . , H + 1} are connected to those in the preceding layer h − 1. Additionally, the neurons in the first layer h = 0 are just a placeholder for the input of the NN, while the neurons in the last layer h = H + 1 hold the output of function f. Fig. 1 shows a feedforward NN with H = 3 hidden layers.

[Fig. 1. Simple feedforward NN that has three hidden layers with k neurons in each, which accepts an input of size n and produces an output of size m.]

In this article, we assume that the output a(h) of each layer is computed by combining an affine and a nonlinear transformation as follows. The nonactivated and activated outputs of layer h, respectively z(h) and a(h), are

z(h) = W(h) · a(h−1) + b(h)    (2)
a(h) = σ(z(h))    (3)

where W(h) ∈ R^{m_{h−1}×m_h} is the weights' matrix, b(h) ∈ R^{m_h} is the bias vector, σ : R^{m_h} → R^{m_h} is the nonlinear activation function, a(0) = x is the NN input, and the layer dimensions satisfy n = m_0 and m_{H+1} = m. The most popular activation functions σ are the rectified linear unit (ReLU), the sigmoid (Sigm), the hyperbolic tangent (TanH), and the max pooling operator [13]. While our framework is agnostic to the specific choice of σ, in our experiments we focus on ReLU

ReLU(z(h)) = max(0, z(h)).    (4)

Our framework supports both fully connected and convolutional layers. More specifically, (2) can be written as follows:

\begin{pmatrix} z_1^{(h)} \\ z_2^{(h)} \\ \vdots \\ z_{m_h}^{(h)} \end{pmatrix} =
\begin{pmatrix}
w_{1,0}^{(h)} & \cdots & w_{1,m_{h-1}}^{(h)} \\
w_{2,0}^{(h)} & \cdots & w_{2,m_{h-1}}^{(h)} \\
\vdots & \ddots & \vdots \\
w_{m_h,0}^{(h)} & \cdots & w_{m_h,m_{h-1}}^{(h)}
\end{pmatrix}
\begin{pmatrix} a_1^{(h-1)} \\ a_2^{(h-1)} \\ \vdots \\ a_{m_{h-1}}^{(h-1)} \end{pmatrix} +
\begin{pmatrix} b_1^{(h)} \\ b_2^{(h)} \\ \vdots \\ b_{m_h}^{(h)} \end{pmatrix}

where the matrix W(h) is dense in fully connected layers and sparse in convolutional ones [20].

B. Quantization

Quantization is the process of constraining high-precision values (e.g., single-precision floating-point values) to a finite range of low-precision ones (e.g., integers) [5], [21]. The quantization quality is usually determined by a scalar n (the available number of bits) that defines a finite range's lower and upper bounds. Let us define quantization as a mapping function Q_n : R^{m×p} → I^{m×p}, which is formulated as

Q(A, n) = clip(⌊A / q(A, n)⌉, −2^{n−1}, 2^{n−1} − 1)    (5)

where A ∈ R^{m×p} denotes a high-precision (i.e., single scalar, vector, or matrix) value; n is the number of bits used for quantization; q(A, n) means a function that computes the scaling factor for A concerning a number of bits n; clip denotes a clipping function that ensures the values being mapped by the quantization function are bounded by some lower and upper values; ⌊·⌉ denotes rounding to the nearest integer; and −2^{n−1} and 2^{n−1} − 1 indicate the lower and upper bounds of the clipping function, respectively. Defining a scaling factor [see (6)] is an important aspect when dealing with uniform quantization [9], [22]. Moreover, the original high-precision values to be quantized are floating-point ones, given that NNs are usually designed using this representation.

The scaling factor divides a given range of values A into an arbitrary number of partitions. Thus, let us define a scaling factor function q(A, n), a number of bits (bit-width) n to be used for quantization, and a clipping range given by [α, β], which leads to a scaling factor defined as

q(A, n) = (β − α) / (2^n − 1).    (6)

We use symmetric quantization, thus the clipping values are β = −α = max(|min(A)|, |max(A)|). A quantization process can produce an integer value outside the quantized range. To prevent that, an additional clipping step is necessary. A de-quantization process computes back original values as

Â = q(A, n) Q(A, n).    (7)

However, both clipping and rounding cause permanent loss of information. Consequently, de-quantization can only approximate original values, i.e., A ≈ Â.

C. Neural Network Quantization

When we quantize an NN, we can follow a number of different strategies [4], [9]. Here, we will consider only the strategy of storing all NN weights in quantized format to reduce the memory requirements [6], [7], [8]. At inference time, we assume that the weights are de-quantized and all the operations in the NN are executed in floating point.

Consider a feedforward NN f as defined in (2) and (3). Call N_f the set whose elements n(h) ∈ N_f are the bit widths for each layer h in f.
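For concreteness, the following is a minimal NumPy sketch of the symmetric scheme in (5)–(7); it is our own illustration and not the CEG4N implementation. Weights are mapped to n-bit integers and de-quantized back before inference.

```python
import numpy as np

def scaling_factor(A, n):
    """q(A, n) in (6) with the symmetric clipping range beta = -alpha = max|A|."""
    beta = np.max(np.abs(A))
    return (2.0 * beta) / (2 ** n - 1)

def quantize(A, n):
    """Q(A, n) in (5): scale, round to the nearest integer, clip to the n-bit range."""
    q = scaling_factor(A, n)
    return np.clip(np.round(A / q), -(2 ** (n - 1)), 2 ** (n - 1) - 1)

def dequantize(A, n):
    """A_hat in (7): only an approximation of A, due to rounding and clipping."""
    return scaling_factor(A, n) * quantize(A, n)
```

Applying `dequantize` to each weight matrix W(h) with its per-layer bit width n(h) yields the de-quantized weights used at inference time in the strategy described above.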

Given the de-quantization process in (7) and the nonactivated output in (2), we can describe the nonactivated output ẑ(h) of a quantized layer h as

ẑ(h) = q(W(h), n(h)) Q(W(h), n(h)) · a(h−1) + q(b(h), n(h)) Q(b(h), n(h)).    (8)

D. Neural Network Equivalence

Let f : I → O and f′ : I → O be two arbitrary NNs, where I ⊂ R^n and O ⊂ R^m are their common input and output spaces, respectively. Currently, the literature reports the following definitions of equivalence [11], [12], [13], [23], [24].

Definition 1 (Top-Equivalence): Consider two NNs f : I → O and f′ : I → O. Then, f and f′ are Top-1-equivalent, i.e., f ≡ f′, if and only if the following holds:

∀ x ∈ I, f(x) = f′(x).    (9)

Definition 2 (ε-Equivalence): Consider two NNs f : I → O and f′ : I → O, and an ε > 0. Then, f and f′ are ε-equivalent, i.e., f ≈_{p,ε} f′, if and only if the following holds:

∀ x ∈ I, ||f(x) − f′(x)||_p ≤ ε.    (10)

Definition 1 is a strict form of equivalence and imposes a hard requirement [13]. Definition 2, in turn, is a flexible form of equivalence [12]. As noted by Eleftheriadis et al. [13], Top-Equivalence is a true equivalence relation, that is, it is reflexive (f ≡ f for any NN f), symmetric (f ≡ f′ iff f′ ≡ f), and transitive (f ≡ f′ and f′ ≡ f″ implies f ≡ f″). However, ε-Equivalence is only reflexive and symmetric.

E. Verification of Equivalence Properties

The goal of NNEV is to check if f ∼ f′. Then, we define NNEV as follows.

Definition 3 (NN Verification Problem): Given two NNs f and f′ and an equivalence relation ∼ ∈ {≡, ≈_{p,ε}}, NNEV consists in checking if f ∼ f′.

This article uses two paradigms: 1) SMT and 2) reachability analysis (RA). With SMT, the equivalence property (EP) and the NN model are encoded as a first-order logic formula. SMT restricts the full expressive power of first-order logic to a decidable fragment. With RA, in the form of geometric path enumeration (GPE), the property to be checked and the model are encoded as linear constraints.

1) SMT Encoding: SMT formulas can capture the complex relationship between variables holding real, integer, and other data types. If it is possible to assign values to them so that a formula is evaluated as true, then it is said to be satisfiable. However, if assigning such values is impossible, the mentioned formula is considered unsatisfiable.

The following steps show how to reduce NN equivalence (NNE) to a logical satisfiability problem: 1) encoding f into an SMT formula φ; 2) encoding f′ into an SMT formula φ′; 3) encoding the relation f ∼ f′ into an SMT formula Φ, such that f ∼ f′ iff Φ is not satisfiable; and 4) checking, via an SMT solver, whether Φ is satisfiable. If the latter is true, f and f′ are not equivalent, and the solver provides a counterexample. Otherwise, f and f′ are equivalent.

Checking NNE becomes possible by relying on the negation of f ∼ f′, i.e., by encoding it as a formula that asserts the existence of an input x ∈ I and two outputs y, y′ ∈ O (y = f(x) and y′ = f′(x)) such that they do not satisfy the conditions imposed by ∼. Indeed, we check if

∃ x ∈ I; y, y′ ∈ O; y = f(x) ∧ y′ = f′(x) ∧ y ≠ y′

or, regarding the ε-Equivalence definition, if

∃ x ∈ I; y, y′ ∈ O; y = f(x) ∧ y′ = f′(x) ∧ ||y − y′||_p > ε.

In summary, checking whether the formula y = f(x) ∧ y′ = f′(x) ∧ y ≠ y′ is unsatisfiable can be further expressed as

φ := y = f(x)
φ′ := y′ = f′(x)
Φ := φ ∧ φ′ ∧ y ≠ y′.    (11)

Similarly, checking whether the formula y = f(x) ∧ y′ = f′(x) ∧ ||y − y′||_p > ε is unsatisfiable can be expressed as

φ := y = f(x)
φ′ := y′ = f′(x)
Φ := φ ∧ φ′ ∧ ||y − y′||_p > ε.    (12)

In addition, the input of an NN f is a vector x = [x_1, . . . , x_n] ∈ R^n, and some limitation regarding it may be necessary. This constraint can then be added as

⋀_{j=1}^{n} (x̂_j − r ≤ x_j ≤ x̂_j + r).    (13)

In practice, we define a limiting region between x̂_j − r and x̂_j + r around every component x̂_j of a given input point, where equivalence is more likely. It works as another relaxation factor for equivalence because the associated properties should hold only for a restricted input domain. It is also corroborated by the notion that we expect the same behavior from a close neighbor of an input. In addition, it can also be linked to real conditions of a given application, such as its equivalence method and input deviation and magnitude. However, this does not mean that our technique is limited to small input ranges. Instead, it is important to choose a range that preserves the relationship between x and y and is also appropriate for a specific application.

A possible way to use SMT is to employ a verifier based on it and then provide a suitable model. One example is the efficient SMT-based bounded model checker (ESBMC), which supports SMT solvers natively. It generates verification conditions for a given C or C++ program, i.e., its input model, encodes them using different SMT background theories (i.e., linear-integer, real arithmetic, and bit-vectors), and employs different solvers (e.g., Boolector [25] and Z3 [26]).

2) Geometric Path Enumeration Encoding: Tran et al. [27] proposed GPE, a methodology for verifying NNs' safety properties and the verification approach used by the tool proposed by Teuber et al. [12], namely, NNEQUIV. We briefly describe NNEQUIV and how it encodes NNs and EP into a verification problem.
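Independently of the verifier, a candidate counterexample x returned by either encoding can be validated by simply evaluating both networks on it. The following minimal sketch (our own illustration, not part of the paper's tooling) shows the pointwise checks that mirror (11) and (12), with Top-Equivalence interpreted as equal top-1 classes.

```python
import numpy as np

def violates_top1(f, f_prime, x):
    """Pointwise check of (11): x witnesses non-equivalence under
    Top-Equivalence if the two networks assign different top-1 classes."""
    return np.argmax(f(x)) != np.argmax(f_prime(x))

def violates_eps(f, f_prime, x, eps, p=np.inf):
    """Pointwise check of (12): x witnesses non-equivalence under
    epsilon-Equivalence if the p-norm of the output gap exceeds eps."""
    return np.linalg.norm(f(x) - f_prime(x), ord=p) > eps
```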

Definition 4 (Generalized Star Set [27]): A generalized star set Θ is a tuple ⟨c, G, P⟩ where c ∈ R^n is the center, G = (g_1, . . . , g_m) ∈ R^{n×m} is the generator matrix, and P ⊆ R^m is a polytope defining a conjunction of linear constraints. The set represented by Θ is then defined as

Θ = {x ∈ R^n | ∃ α ∈ P : x = c + Gα}.

Assume we have two NNs f and g representing piecewise linear functions. Furthermore, assume that we want to verify whether f(x) = g(x) for the input domain I ≡ ⟨c, G, P⟩. Since f and g are piecewise linear, there exists a tiling T ≡ {P′} of the input domain I such that I = {x ∈ R^n | ∃ P′ ∈ T ∧ α ∈ P′ : x = c + Gα}. Moreover, for each tile P′ ∈ T, we require both f and g to be linear, i.e., f(x) = c_f + G_f α and g(x) = c_g + G_g α for α ∈ P′, where (c_f, c_g, G_f, G_g) are specific to each tile P′. The NNs f and g are equivalent if c_f = c_g and G_f = G_g for each tile P′.

The NNEQUIV algorithm [12] computes the tiling T via repeated RA as follows. First, the input domain I is propagated through NN f, outputting a union of star sets. Then, each set is projected back onto the input space and propagated through NN g, leading to a further union of star sets whose elements, once projected onto the input domain, represent the tiles P′ ∈ T.

III. RELATED WORK

A. Neural Network Quantization

There are many aspects to consider when deciding to deploy a quantization scheme [5]. For instance, if the goal is to reduce an NN's size, one can consider quantizing only its weights and biases [5], [9], [28]. However, if the goal is to reduce computation and memory requirements, one can consider quantizing weights, biases, and activation functions [9], [28]. Indeed, the quantization of activation functions can reduce computational and memory costs [9], [28]. However, it raises challenges as it usually requires a calibration step using representative data, prior to quantization, to correctly compute quantization ranges [5], [28]. Conservative approaches based on neurons' transfer functions can also be used [29], but they might lead to poorly quantized regions. In this regard, the technique proposed here aims to reduce an NN's size as it applies a method that only quantizes its weights and biases.

In fact, several studies have focused specifically on quantizing weights of NNs [6], [8], [30], [31], [32], [33]. For instance, Courbariaux et al. [30] proposed a method called binarized NNs, which lowers the storage costs of an NN's weight parameters by reducing them to binary values. Other studies have also explored mixed-precision quantization techniques, which quantize NN weights while retaining activation functions in full precision. Yuan et al. [32] proposed EvoQ, which uses evolutionary search to achieve mixed-precision quantization without access to complete training datasets. Zhou et al. [6] proposed the incremental network quantization, which converts a convolutional NN into a lower precision version whose weights can only be either powers of two or zero, considering weight importance to keep high accuracy. Finally, some studies store quantized weights with floating-point precision, thus facilitating integration for inference and generalization [8], [33].

B. Quantization-Aware Training

Another important aspect to consider is whether to employ a post-training quantization strategy, as we do in the present paper, or allow for some form of weight retraining. The latter may recover some of the performance lost due to the quantization and has been a very active area of research [34], [35], [36]. One of the fundamental problems is expressing the underlying optimization problem in a gradient-friendly form. In this way, the quantization objective can be included in the regular loss function during training [35].

As the size of NNs has grown larger, more recent work attempts to quantize the training gradients too. As an example, Zhou et al. [37] proposed a method called DoReFa-Net for training convolutional NNs with low bitwidth weights, activations, and gradients. During the training process, parameter gradients are quantized to low bitwidth values, allowing faster training and inference using bit convolution kernels. This approach is efficient on various hardware platforms like CPU, FPGA, ASIC, and GPU.

Alternatively, higher rates of compression can be achieved by using different quantization schemes on different regions of the input. For instance, Huang et al. [38] proposed a dynamic quantization strategy that avoids a nonuniform usage of the available computational resources. Their technique is particularly suited to deployment on hardware accelerators.

Unfortunately, quantization-aware training may break the assumptions of the existing NNE verification tools [11], [12], [13], [39]. Thus, in this article, we focus on post-training quantization.

C. Verification of Quantized Neural Networks

Giacobbe et al. [40] were the first to formally investigate the impact of quantization on NNs. They explore how quantization affects the NNs' robustness and formal verification. They propose a bit-precise SMT-solving approach for determining the satisfiability of first-order logic formulas where variables represent fixed-size bit-vectors. Their study shows that there is no simple and direct correlation or pattern between the robustness and the number of bits of a QNN and makes several significant contributions, including revealing nonmonotonicity in QNN robustness, introducing a complete verification method, and highlighting the limitations of existing approaches.

Henzinger et al. [41] proposed an SMT-based verification method for QNNs, using bit-vector specifications. It requires translating an NN and its safety properties into closed quantifier-free formulas over the theory of fixed-size bit-vectors. It performs verification only, focusing on robustness, while CEG4N tackles both NN quantization and verification, trying to find a more compact representation that is sound, with NN translation into formulae done by a verifier.

Mistry et al. [42] discussed the formal verification of QNNs implemented using fixed-point arithmetic. The authors propose a novel methodology for encoding the verification problem into a mixed-integer linear programming (MILP) problem, focusing on the bit-precise semantics of QNNs. Their results demonstrate that their MILP-based technique outperforms state-of-the-art bit-vector encodings by a significant margin.

Song et al. [43] proposed QNNVerifier, which performs SMT-based verification of QNNs. Their technique relies on fixed-point operational models for using the C language as an abstract model, which allowed operations to be encoded in their quantized form, explicitly, thus providing compatibility with SMT solvers. Although this study presents some similarities with ours, the main difference lies in the properties being verified. It checks if a QNN is invariant to adversarial inputs, while CEG4N iteratively verifies if an NN is invariant to quantization, with decoupled quantization and verification.

D. Neural Network Equivalence

Our CEG4N framework can in principle accommodate multiple equivalence verification techniques. In addition to those we mention in Section II, all the following are viable alternatives. First, Büning et al. [11] defined the notion of relaxed equivalence because exact equivalence (see Section II-D) is hard to solve. They choose to encode EPs into MILP. First, the input domain is restricted to radii around a point, where equality is more likely. Then, a less strict relation is used. Finally, two NNs R and T are equivalent if the classification result of R is amongst the top-K largest results of T. It was later extended by Teuber et al. [12].

Furthermore, Eleftheriadis et al. [13] proposed an SMT-based NNE checking scheme based on strict ε-Equivalence. The key differences between their work and the EC techniques we use are as follows. We also consider Top-Equivalence, and we encode NNs, EP, and equivalence relation either as a C program or Python code along with a structured description.

More recently, Zhang et al. [39] introduced QEBVerif, a method that is capable of verifying the equivalence between an NN and its quantized counterpart when both the weights and the activation tensors are quantized. QEBVerif consists of differential RA (DRA) and MILP-based verification. Similar to the work by Eleftheriadis et al. [13], they mainly focus on ε-Equivalence, and their technique cannot be easily extended to Top-Equivalence.

IV. COUNTEREXAMPLE GUIDED NEURAL NETWORK QUANTIZATION REFINEMENT

This work aims to provide a methodology for creating compressed NNs that are as small as possible, from a quantization point of view. We provide the definition of EQ as follows.

Definition 5 (NN Quantization Vector): Let N = (n_0, . . . , n_{H−1}) be a vector that contains the bit width n_h for the weights of each layer h of a QNN f^q.

Definition 6 (EQ): Let f be the reference NN, H ⊂ R^n be a set of input instances, and ∼ ∈ {≡, ≈_{p,ε}} an NNE relation. A vector of bit widths N can be used to quantize f and obtain its quantized version f^q. Thus, EQ searches for a vector N for which f and f^q satisfy the following equivalence constraint: f(x) ∼ f^q(x) ∀ x ∈ H.

From the definitions of NNE discussed in Section II-D, we preserve the equivalence between the mathematical functions f and f^q associated with the original and QNNs, respectively. In more detail, consider an NN f with H layers. As stated in Definition 5, its quantization assumes that there is a vector N whose elements n_h represent the bit width that should be used to quantize the hth layer in f, with h = [0, 1, 2, . . . , H − 1]. In our EQ problem, we obtain a vector N whose elements n_h are minimized while keeping f and f^q equivalent. To obtain N, one can apply an optimization algorithm.

A. EQ as Minimization Problem

We consider the EQ processing of an NN as an iterative minimization problem. Specifically, each iteration is composed of two complementary subproblems. First, we need to optimize the numbers of bits for quantization, i.e., finding a candidate vector N that holds all minimum bit widths. Second, we need to verify the EP, i.e., checking if an NN quantized with the bit widths in N is equivalent to its original model. If the latter fails, we iteratively return to the minimization subproblem with additional information. More formally, we define the first subproblem as follows.

Optimization Subproblem o:

Objective:  N^o = arg min_{n_o^(0), . . . , n_o^(H−1)}  Σ_{h < H} p(h) · n(h)
s.t.:  f(x) ∼ f^q(x)   ∀ x ∈ H_CE^o
       n(h) ≥ N_min    ∀ n(h) ∈ N^o
       n(h) ≤ N_max    ∀ n(h) ∈ N^o.    (14)

Here, f is the function associated with NN F, f^q is the quantized function associated with NN F, and H_CE^o is a set of counterexamples that may be available at iteration o.

Consider N_min and N_max as the minimum and maximum bit widths allowed for quantization, which ensure two aspects as follows. First, they provide lower and upper bounds for the quantization bit width, which guarantees correctness for the quantization process and the generation of valid quantized models. Second, they can be regarded as initialization and termination criteria, the latter if a candidate N^o such that n(h) = N_max for every n(h) ∈ N^o is reached. When it happens, the optimization process is stopped as no valid quantized model could be generated. In any case, if CEG4N proposes a quantization solution where n = N_max for every n ∈ N^o, it is verified as well. Besides, if the verification process returns a counterexample, CEG4N finishes with failure. Finally, note that H_CE^o is an iterative parameter updated at each iteration o and based on the verification subproblem.

Moreover, the function being minimized represents a weighted summation of the bit widths in the solution candidate N^o, where p(h) is a constant value associated with the bit width n(h). These constants allow the optimization algorithm to prioritize layers based on their sizes. In our case, we have defined weights proportional to the number of neurons in a given layer h: more extensive layers, in terms of neurons, have bigger weights. This way, the optimizer searches for solutions with smaller bit widths associated with larger layers, which directly reflects their complexity.
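As an illustration of (14), the following sketch shows how a GA-style search could score a candidate bit-width vector; it is our own simplification with hypothetical helper names (`quantize`), not the CEG4N code. Candidates that violate equivalence on any stored counterexample are penalized, and the weighted bit-width sum is minimized otherwise.

```python
import numpy as np

def fitness(bit_widths, f, quantize, counterexamples, p, penalty=1e6):
    """Score a candidate vector N^o = bit_widths for the optimization
    subproblem in (14). `quantize(f, bit_widths)` is assumed to return the
    de-quantized model f^q, and p[h] is the per-layer weight p(h)."""
    f_q = quantize(f, bit_widths)
    # Weighted sum of bit widths: larger layers push toward smaller widths.
    cost = float(np.dot(p, bit_widths))
    # Penalize candidates that break equivalence on the known counterexamples
    # (here, Top-1 equivalence; an epsilon-based check could be used instead).
    for x in counterexamples:
        if np.argmax(f(x)) != np.argmax(f_q(x)):
            cost += penalty
    return cost
```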

[Fig. 2. Overview of CEG4N's architecture, highlighting the relationship between main modules and their inputs and outputs.]

Our search-based technique is very simple as a unique precision is adopted for an entire NN layer, i.e., we do not differentiate channels in a single convolution layer. Since it favors the combination of quantization and formal verification, we prioritize the feasibility of the overall framework.

Verification Subproblem o:

Φ := φ ∧ φ′ ∧ ¬(y ∼ y′).

In the verification subproblem o, we check whether N^o generated by the optimization subproblem o satisfies one of the properties presented in Section II-E, depending on the chosen form of equivalence. If the property holds for the candidate N^o, the optimization halts and N^o is presented as a solution; otherwise, a new counterexample x_CE is generated. Then, iteration o + 1 starts where iteration o stops. Consequently, the optimization subproblem o + 1 receives as parameter a set H_CE^{o+1} such that H_CE^{o+1} = H_CE^o ∪ {x_CE}, which is used as additional information for successful execution.

B. CEG4N Framework Implementation

We propose the CEG4N framework, a counterexample-guided optimization approach to solve the EQ problem presented here. It integrates different techniques to tackle the two subproblems described in Section IV-A: the optimization of bit widths and the verification of NN equivalence. Indeed, CEG4N is designed to combine three modules, each characterized by a specific role in EQ. Fig. 2 illustrates its architecture.

1) Bits Search Module: The first element, namely, the bits search module (BSM), is an instance of the optimization algorithm, e.g., a genetic algorithm (GA). BSM expects three main inputs: the original NN, a set of counterexamples H_CE, and a set of EPs [see (14)]. Its output is a vector of bits containing the bit width for each layer in a given NN. We can also specify lower and upper bounds to restrict the possible widths, and, depending on the chosen optimization algorithm, other parameters may also be required. For instance, if we use GAs, we may need to specify the maximum number of generations they are allowed to run. Besides, there is no limitation regarding optimization algorithms, i.e., any technique could be used. However, in this article, we only support and describe a GA module.

2) Abstractions Module: The second element, namely, the abstractions module (AM), encodes the original and QNNs into a format the third module can handle. Depending on the choice for the latter, the functional behavior of an NN is represented at different levels of abstraction. Specific examples include the open NN exchange (ONNX) file format, which stores the architecture of an NN and its weights, and a C/C++ source file, which provides a low-level NN implementation. Generally, we encode the original NN and its quantized counterpart separately.

At this point, it is important to explain the internal NN representations used in CEG4N. Specifically, it handles two functional versions of the same NN, at the same time:
1) a version for optimization written in Python, in BSM;
2) a version for verification written in C, when ESBMC is used, or Python, when NNEQUIV is employed, in the verifier module (VM).
Such versions are equivalent as they share the same parameters (i.e., weights, bias, and activation functions), while the major difference resides in how their operations are implemented. BSM works with an NN representation written in Python, in addition to an ONNX model. It is loaded along with its weights into the PyTorch framework, which implements all NN mathematical operations. The other two representations are created in AM and used in VM. If ESBMC is chosen as verifier (see Section II-E1), a functional NN version written in C/C++ and adapted to it is used. However, if NNEQUIV is chosen (see Section II-E2), a functional version written in Python is employed. The latter implements all mathematical operations directly in Python, with no additional framework.

3) Verifier Module: The third and final element, which we call the VM, receives as inputs the previously mentioned NN abstractions and a set of EPs. Then, it checks whether the latter holds for the former. When any given property does not hold, a counterexample is provided by VM. Currently, CEG4N supports two options: 1) a bounded model checker, namely, ESBMC [44], [45]; and 2) a GPE encoder, namely, NNEQUIV [12].

In summary, NN equivalence is checked using SMT or GPE. In the first, an NN is encoded into a C program that is translated into SMT by ESBMC. In the second, NNEQUIV takes an NN description in Python, encodes it into star sets, and handles the result as a linear programming problem. However, there is no essential restriction regarding verification techniques as long as counterexamples are provided. That is, one may choose a compromise configuration regarding any desired aspect, such as scalability, accuracy, or speed.
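To make the interplay of the modules concrete, the following schematic sketch outlines the refinement loop; it is our own illustration, and the helper names (`initial_samples`, `search_bit_widths`, `quantize`, `verify_equivalence`, `predictions_diverge`) are hypothetical placeholders for the BSM, AM, and VM interfaces rather than the actual CEG4N API.

```python
def ceg4n(f, eq_properties, n_min=2, n_max=32, max_iters=100):
    """Counterexample-guided quantization refinement (schematic).
    Returns a bit-width vector N such that quantizing f with N satisfies
    the equivalence properties, or None if no candidate is found."""
    counterexamples = initial_samples()          # H_CE^0: one sample per class
    for _ in range(max_iters):                   # a time limit is used in practice
        # Optimization subproblem: BSM searches for minimal bit widths that
        # keep f and f^q equivalent on the current counterexample set.
        N = search_bit_widths(f, counterexamples, n_min, n_max)
        if N is None:                            # quantization failure (QF)
            return None
        f_q = quantize(f, N)
        # Verification subproblem: VM checks the equivalence property.
        result = verify_equivalence(f, f_q, eq_properties)
        if result.equivalent:                    # success: N is a solution
            return N
        x_ce = result.counterexample
        # Keep only genuine counterexamples; spurious ones are discarded.
        if predictions_diverge(f, f_q, x_ce):
            counterexamples.append(x_ce)
    return None                                  # timeout
```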

In addition, we believe CEG4N is the first framework to combine quantization and EC, with formal guarantees for the equivalence between quantized and original NNs.

An important aspect is that spurious counterexamples may be generated in VM (see Section II-E2). To prevent them from being added to H_CE^{o+1}, after each step o, we double-check them using the original and QNNs. If their predictions do not diverge, the respective counterexamples are discarded.

4) High-Level Overview of CEG4N Run: A successful CEG4N run can be summarized in the following steps.
1) CEG4N starts with an NN, a set of counterexamples, the BSM's parameters, and the VM's EPs.
2) BSM runs to look for a set of bit widths.
3) If none is found, CEG4N stops.
4) Otherwise, AM creates NN abstractions based on the set of bit widths found by BSM.
5) VM runs with the previously generated NN abstractions and the EPs specified in step 1).
6) If these properties hold, CEG4N stops and produces a QNN based on the bit widths found in step 2).
7) Otherwise, a new set of counterexamples, consisting of elements specified in step 1) and also new ones produced by VM in step 5), is created.
8) The execution goes back to step 2), until a timeout is reached.

5) Discussion: We will now discuss some aspects of CEG4N. First, BSM relies on a GA module to find QNN candidates. It guarantees termination, using timeouts, but not that the most compact model will be found. Then, we need to consider VM, which, due to verification aspects, does not ensure completeness; however, it is sound. If a given property is violated for a QNN candidate, VM will produce a counterexample as proof that it is not equivalent to its original counterpart; otherwise, this QNN candidate satisfies the specified EPs. Therefore, CEG4N can find, when possible, a QNN that satisfies the equivalence requirements regarding its original counterpart.

One may also argue that encoding the quantization problem into SMT formulae could be used to replace our search-based module. However, it is clear that handling only NN verification with SMT solvers already leads to large state spaces [29], complex encoding [41], and long run times, the latter shown here, which indicates that a possible unified framework including quantization would be even more complex. Consequently, we believe that the decomposition strategy resulting from decoupling quantization and verification makes EQ tractable.

V. EXPERIMENTAL EVALUATION

A. Description of the Benchmarks

We evaluate our methodology based on feedforward and convolutional NN classification models extracted from [13], [15], and [16]. We have chosen them mainly based on their popularity in previous verification studies [12], [13]. Additionally, we have included other NN models to cover a broader range of NN architectures (e.g., size and number of neurons). The chosen benchmarks are presented below.

1) ACAS Xu: This dataset is derived from eight specifications, including features, decision boundaries, and expected outputs [15]. Its features are sensor data indicating the speed and course of the monitored aircraft and the position and speed of any nearby intruder. Its NNs are expected to give appropriate navigation advisories for input sensor data. The expected outputs indicate that either the aircraft is clear of conflict or it should take soft or hard turns to avoid a collision. We have evaluated CEG4N on nine pretrained NNs [46], each containing six layers and a total of 300 ReLU nodes, which were obtained from the VNN-COMP 2021 benchmarks.¹

¹The pretrained weights for the ACAS Xu benchmarks can be found in the following repository: https://github.com/stanleybak/vnncomp2021

2) MNIST: This is a popular dataset for image classification [16]. It contains 70 000 gray-scale images of size 28 × 28, where the original integer pixel values ([0, 255]) are rescaled to the floating-point range [0, 1]. We have evaluated CEG4N on nine NNs, of which three models contain a single layer with 10, 25, and 50 ReLU nodes, following the architecture described by Eleftheriadis et al. [13]. Three other models, obtained from the VNN-COMP 2021 benchmarks, have 2, 4, and 6 layers, each with 256 ReLU nodes. The remaining three models were trained using resized MNIST images, similar to the first three single-layer ones we described here. In addition, we have employed 8 × 8 resized images to reduce dimensionality and also give invariance to small image distortions. In summary, three of the mentioned models were pretrained, the VNN-COMP 2021 ones, while the remaining elements were trained specifically for our experiments.

3) Seeds: This dataset consists of 210 samples of wheat grain belonging to three different species, namely, Kama, Rosa, and Canadian [19]. Its input features include seven measurements of the wheat kernel geometry scaled between [0, 1]. We have evaluated CEG4N on four NNs containing a single layer with 4, 6, 10, and 15 ReLU nodes. These four NNs were specifically trained for evaluating CEG4N.

4) Iris: This dataset consists of 50 samples from each of three species of Iris flower (Iris setosa, Iris virginica, and Iris versicolor) [18]. It is a popular benchmark in machine learning for classification, where data is composed of records of real-value measurements of the width and length of sepals and petals of flowers. Its data were scaled to [0, 1]. We have evaluated CEG4N on three NNs containing two layers with 4, 10, and 15 ReLU nodes in each. These three NNs were trained specifically for evaluating CEG4N.

5) CIFAR-10: This is a dataset for image classification composed of 60 000 color images with 32 × 32 pixels for 10 different classes [17]. We have evaluated CEG4N on two pretrained NNs obtained from VNN-COMP 2021. One has 3 convolutional and 2 linear layers, each with 250 neurons, while the other has 2 convolutional and 2 linear layers, each with 250 neurons. Both use only ReLU activations.

B. Setup

1) BSM: As explained in Section IV-A, we use a search-based optimization algorithm to find bit widths for NN quantization. We have experimented with the nondominated sorting GA II (NSGA-II) [47], with the lower and upper bounds for the allowed bit widths set to 2 and 32.

The choice for the lower bound relies on the first valid integer that does not break our quantization scheme. In addition, the mentioned upper bound was selected to match the maximum number of bits usually employed for integer representation in many different NN frameworks, such as PyTorch and ONNX, and programming languages, such as C. Moreover, the upper bound could also be higher, depending on the precision of NN weights. However, the single-precision floating-point format is the standard choice to train NNs and store their weights. Another important factor backing our choice is that some NN frameworks may not support other floating-point precision types, which is the case of ONNX. Since the NNs used in our experiments are stored using it, their associated floating-point values are actually represented with single precision.

Furthermore, we have allowed our GA instance to run for 1000 generations every time BSM was run. Such a figure was found by empirically and incrementally checking if it was able to produce QNNs in initial experiments with our benchmarks. That was done by providing an empty H_CE to BSM and a given number of generations: if the outputted N matches a vector such that n = 2 ∀ n ∈ N, then the GA can find a solution with the given number of generations. In our case, the GA was able to find solutions for every benchmark when allowing it to run for 1000 generations. Notice that this number of generations may not be optimal for smaller NNs, but it does not negatively impact the correctness of CEG4N.

Lastly, we randomly selected the initial set H_CE^0 for each dataset, with one sample for each class (e.g., we have selected 10 samples for MNIST and 3 for Iris). The samples in H_CE^0 do not necessarily have to be selected from the benchmark's dataset (train or test), and any concrete input can indeed be specified (e.g., synthetic data). Specifically, in our experiments, we have employed only real data, i.e., data from the chosen datasets. Consequently, we have opted to keep one sample per class. In summary, our choice is further justified by three conditions: 1) the practical aspect of using samples from the benchmark set; 2) a test set that holds representative data; and 3) the attempt to add variability to our experiments.

2) Equivalence Properties: Regarding equivalence, we need to choose the input samples and input constraints of each property, under Definitions 1 and 2.

The EPs for Iris, Seeds, MNIST, and CIFAR-10 were defined by: 1) selecting one real input sample for each class (similar to the definition of H_CE^0), at random; 2) choosing Top-Equivalence; and 3) setting input constraints. Regarding the latter, we have defined three possible values for r [see (13)]: r = [0.01, 0.03, 0.05]. Such values were selected empirically, based on input data and experiments, and reflect the full structure here: benchmarks with narrow input range and Top-Equivalence, which usually leads to tighter input regions. In addition, studies in the literature usually adopt only one value, while our work provides a margin for its discussion. Taking as an example MNIST, which has 10 output classes, we were able to define a total of 3 sets with 10 input constraints each.

For Acas Xu, we followed the same strategy used by Teuber et al. [12], i.e., ε-Equivalence as equivalence form, while r = [0.1, 0.3, 0.5]. Again, three different values were adopted for r, also empirically, but now taking into account Acas Xu's aspects: broader input range and ε-Equivalence.

If we revisit Definition 2, we must choose two additional parameters, namely, p and ε. The value for ε is usually chosen according to the application domain of the NNs being verified, in such a way that it is possible to prove equivalence and, at the same time, the resulting NNs are useful, i.e., they present tolerable output differences. It is also possible to find an optimal ε by incrementally looking at counterexamples and deciding if, from the user perspective, their outputs are equivalent [46]. Ultimately, we decided on ε = 0.05 as it means a maximum difference of 10% in Acas Xu's scores. In addition, such a value was also adopted by Teuber et al. [12]. Finally, we chose p = ∞ due to efficiency reasons [48], also aiming at consistency across different verifiers; this choice was likewise adopted by Teuber et al. [12].

3) Time Limits: A timeout is important to ensure termination and should not be arbitrary. In our case, each EP verification takes at most 20 min, which is consistent across all verifiers used in our experiments. It was based on hardware configuration, expected run time, and other aspects, but different limits can be set to suit distinct scenarios.

4) Availability of Data and Tools: Our experiments are based on publicly available benchmarks. All tools, benchmarks, and results employed here are available on the supplementary web page https://zenodo.org/record/7126601.

C. Objectives

This work explores the concept of EQ, i.e., a safety property that defines an equivalence relation between an original NN and its quantized form. Then, we propose CEG4N, a framework that quantizes NNs while accounting for EQ.

EG1: Is the CEG4N framework able to generate QNNs that respect the EQ concept?
EG2: How is CEG4N comparable to other quantization techniques?

Due to our research's novelty, no existing similar techniques lend themselves to a fair comparison. Indeed, the present framework is a pioneering one and intends to pave the way for integrating compression into practical NN deployment cycles.

D. Results

In our first set of experiments, we want to achieve our first experimental goal, EG1. We want to show that CEG4N can successfully generate QNNs that are verifiably equivalent to their original counterparts, i.e., respecting the EQ concept. As secondary goals, we want to: 1) perform an empirical scalability study to help us evaluate the computational demands for quantizing and verifying the equivalence of NN models and 2) evaluate different EC techniques and their impacts on the performance of CEG4N.

Our findings are summarized in Tables I–V, which show results for benchmarks Iris, Seeds, MNIST, CIFAR, and ACAS Xu, respectively.

[TABLE I: Summary of the CEG4N's results for the Iris benchmark.]
[TABLE II: Summary of the CEG4N's results for the Seeds benchmark.]
[TABLE III: Summary of the CEG4N's results for the MNIST benchmark.]
[TABLE IV: Summary of the CEG4N's results for the CIFAR benchmark.]

Regarding the columns available in each table, Model tells the specific benchmark, Verifier informs if ESBMC or NNEQUIV was used, r discloses the parameter r [see (13)], No. Iter. shows the number of iterations taken by our GA instance, Bits informs the last bit width passed to VM, and Status tells the final CEG4N's status. Moreover, the proposed methodology presents inherent flexibility, such that verifiers and constraints can be easily changed and promptly evaluated, as follows.

Our experiments show results for CEG4N executions classified into four possible outcomes: 1) success (S), meaning that CEG4N ran for one or more iterations and was able to produce a QNN that respects the EQ property; 2) timeout (TO), which means CEG4N was unable to verify the EP within a given time limit previously set; 3) quantization failure (QF), which means that CEG4N was unable to find a suitable set of bit widths to quantize a given NN; and 4) verification failure (VF), which means that some error occurred during the equivalence verification step, e.g., exceptions thrown by the VM.

[TABLE V: Summary of the CEG4N's results for the ACAS Xu benchmark.]

In summary, CEG4N running with ESBMC (CEG4N+ESBMC) was able to successfully generate QNNs for 17 out of 81 runs, considering all datasets, which accounts for 20.99% of all processes. In addition, CEG4N running with NNEQUIV (CEG4N+NNEQUIV) was successful in 33 out of 81 runs, representing 40.74% of all processes. Most of the CEG4N's failures with ESBMC were due to timeouts, with 55 occurrences, representing 67.90% of the total. In contrast, CEG4N with NNEQUIV resulted in 30 timeouts, i.e., 37% of the total.

Such a difference in timeouts can be attributed to many factors. For example, ESBMC, as an approach based on bounded model checking (BMC), is known to suffer from scalability issues, which greatly diminishes its ability to support larger NNs [29], as can be seen in more detail in Tables I–V. In the first two, which show information regarding CEG4N+ESBMC runs for the Iris and Seeds datasets (at most two layers with less than 20 nodes each), respectively, only three timeouts were noticed. However, if we take a look at Tables III–V, which show information regarding CEG4N+ESBMC runs for MNIST, CIFAR-10, and ACAS Xu, respectively, which are composed mostly of medium to large NNs (at most eight layers), the number of timeouts presents a significant increase. Indeed, no successful execution could even be identified. Moreover, the rare runs with a timeout, with the dataset Seeds processed by CEG4N+ESBMC, also happened with: 1) NNs containing more than 10 neurons per layer; 2) more than 4 iterations; and 3) larger constraint regions (r = 0.03 and 0.05), which reinforces the explanation based on NN complexity and BMC scalability. We can check such cases in detail in Table II. In summary, combining factors 1) and 3) mostly contributes to timeouts. The more neurons an NN has, the more operations it has to perform (and so does CEG4N). Besides, the bigger the r value, the bigger the search space a verifier has to cover.

Although runs with CEG4N+NNEQUIV suffered from fewer timeouts, it is interesting to notice that CEG4N+ESBMC required overall more iterations than CEG4N+NNEQUIV. Considering only the successful runs, CEG4N+ESBMC needed, on average, 4 iterations to produce a QNN, while CEG4N+NNEQUIV used 2. The explanation for this behavior is that ESBMC can find counterexamples that NNEQUIV cannot. Since ESBMC and NNEQUIV use different verification approaches, i.e., SMT and RA, respectively, the obtained results are expected to eventually diverge for some verification instances. In addition, NNEQUIV produces a high number of spurious counterexamples, which is not beneficial to our scheme. Indeed, CEG4N already expects that the verifiers can eventually produce spurious counterexamples, which are ruled out as explained in Section IV-B3. Therefore, our results are entirely based on valid counterexamples only.

QFs are another possible type of error, which can occur only in BSM. One may notice that CEG4N+ESBMC presented only 9 cases, i.e., 11.11%, while 12 were identified for CEG4N+NNEQUIV, i.e., 14.81%. A possible explanation is that the search for the bits sequence is highly nonlinear and highly dependent on the set of counterexamples H_CE that BSM receives at a given iteration, which makes this optimization problem a hard one to solve. We should also consider that new constraints are added to the search problem after each iteration, which makes it even harder to solve. This is corroborated by the fact that most QFs occurred after 2 or more runs.

Our experiments show a high number of quantization-failure events for Acas Xu's benchmarks. One explanation for that relies on the fact that some of those NNs are highly sensitive to errors introduced by quantization processes, in such a way that it affects those NNs' behaviors. Thus, in such a context, NNs may easily violate the constraint f(x) ∼ f^q(x) ∀ x ∈ H_CE^o during the step performed by BSM. In addition, for other NNs, a higher value for ε can also help increase the chance of a successful quantization. However, in the specific scenario used here, we used an ε that should result in applicable QNNs [12].

Indeed, the conditions presented in the last paragraph probably prevented CEG4N from producing QNN candidates for VM. Moreover, the remaining runs mostly failed with timeouts.

[TABLE VI: Comparison using Top-1 accuracy for NNs from dataset Iris quantized using CEG4N and GPFQ.]
[TABLE VII: Comparison using Top-1 accuracy for NNs from dataset Seeds quantized using CEG4N and GPFQ.]
[TABLE VIII: Comparison using Top-1 accuracy for NNs from dataset MNIST quantized using CEG4N and GPFQ.]

The main factors behind the latter are: 1) the size of the input NNs, in neurons, and 2) the number of features in the input space. Indeed, Acas Xu's NNs are mostly affected by the NN size problem, as they present 300 neurons in total. MNIST NNs, in turn, have 64 or 784 features in the input space, which, from the verification perspective, results in a very large search space dimension to cover. In addition, CIFAR-10's NNs are affected by both factors, having 1024 features in the input space and thousands of neurons in total.

Regarding ACAS Xu, we find it possible to tune p and ε so we mitigate most of the associated QFs. However, it should be done carefully so the resulting NNs are still applicable and the chosen verifier can perform accordingly.

The last possible failures are the verification ones, which only occur in VM. We have noticed them only when running CEG4N+NNEQUIV, with a total of six cases, i.e., a failure rate of 7.4%. They were caused by exceptions thrown by NNEQUIV's software dependencies. However, it is unclear whether a flaw in NNEQUIV caused those exceptions. As we did not notice any VFs with ESBMC, this can indicate its maturity when comparing it with NNEQUIV.

Finally, the VM choice is also important. As shown here, CEG4N+ESBMC and CEG4N+NNEQUIV present different behaviors, which may be suitable for a given scenario. For instance, regarding large NNs, CEG4N+NNEQUIV seems to be a clear choice. Moreover, if new verification techniques are introduced, one may consider other VM options, which our framework can accommodate due to its modularity.

These results answer our EG1. Overall, these experiments show that CEG4N can successfully produce equivalent QNNs. However, scalability should be a point of concern for larger NN models, which is very related to the chosen verifier.

In our second set of experiments, we want to achieve our second experimental goal, i.e., EG2. We primarily want to understand the impact of quantization processes performed by CEG4N on the accuracy of the resulting NNs compared with other post-training quantization techniques.

We selected the models for which CEG4N presented successful quantization processes and proceeded to quantize them also using GPFQ. Next, we collected the accuracy of the original and QNNs to compare them. Tables VI–VIII summarize the accuracy of the models quantized with CEG4N+NNEQUIV, CEG4N+ESBMC, and GPFQ, using the Iris, Seeds, and MNIST benchmarks. However, there was no successful run regarding CEG4N+ESBMC with MNIST, as already mentioned. Note that we do not report the accuracy of CIFAR-10's NNs from VNN-COMP since CEG4N could not quantize them. Also, we do not report the accuracy of Acas Xu's models. This happened partially because of the same problem identified with the VNN-COMP models, but mainly because GPFQ requires access to training and test datasets, which are not public for Acas Xu.

The tables are organized as follows. Column Model shows the name of the NN models, Quantizer tells the name of the quantization technique (i.e., CEG4N+ESBMC, CEG4N+NNEQUIV, or GPFQ), r informs the value used to define the quantization EPs, and columns Original Accuracy, Quantized Accuracy, and Accuracy Drop show the accuracy of the original and quantized NNs and their difference, respectively. Lastly, the column Equivalence Status tells whether the original and quantized NNs are equivalent. The Accuracy Drop is positive if the QNN has a worse accuracy than its original model; otherwise, it is negative.
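As an illustration of how the last three columns relate, the snippet below is a minimal sketch (not part of the CEG4N toolchain) of computing top-1 accuracy for an original model and its quantized counterpart and deriving the Accuracy Drop with the sign convention above; the function names and the random stand-in data are ours.

import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Fraction of samples whose highest-scoring class matches the label.
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def accuracy_drop(original_logits, quantized_logits, labels) -> float:
    # Positive when the quantized NN is worse than the original one.
    return 100.0 * (top1_accuracy(original_logits, labels)
                    - top1_accuracy(quantized_logits, labels))

# Illustrative usage with random data standing in for a test set.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=150)                 # e.g., three Iris classes
original_logits = rng.normal(size=(150, 3))
quantized_logits = original_logits + rng.normal(scale=0.05, size=(150, 3))
print(f"Accuracy drop: {accuracy_drop(original_logits, quantized_logits, labels):.2f}%")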
Our findings show that the highest drops in accuracy for NNs generated by CEG4N+NNEQUIV were 3.3% for Iris, 21.43% for Seeds, and −6.20% for MNIST. For NNs quantized with CEG4N+ESBMC, the highest drops were 3.3% for Iris and 21% for Seeds. Regarding GPFQ, the highest noticed drops were 53.3% for Iris, −47.61% for Seeds, and −3.41% for MNIST. Such results are interesting and show that an increase in accuracy is even possible, which will be briefly discussed in the following text.

On the one hand, the highest accuracy drop for CEG4N-generated NNs coincides with r = 0.01 (see Table VI). Indeed, small r values increase the chance of not finding counterexamples to drive the quantization process, which favors the generation of poorly quantized NNs. For higher r values (r = 0.03 and r = 0.05), we notice that the highest accuracy drop is 3.3%. On the other hand, for GPFQ, the highest accuracy drops involve the Iris and Seeds benchmarks. We believe GPFQ's performance is due to two factors: 1) GPFQ relies on representative data (e.g., data from the training dataset), which is more difficult to obtain for small datasets, and 2) small models are more sensitive to quantization.
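To make the role of r more tangible, the following is a small, self-contained sketch of a sampling-based check. It assumes, purely for illustration, that each EP constrains inputs to an L∞ ball of radius r around a reference sample and that a counterexample is any input on which the two NNs disagree on the top-1 class; this is not the formal check performed by ESBMC or NNEQUIV, and all names and the toy models are ours. It merely shows why a larger r exposes more disagreement regions.

import numpy as np

def find_counterexample(f_orig, f_quant, x_ref, r, n_samples=1000, seed=0):
    # Sample the L-infinity ball of radius r around x_ref and return the first
    # input on which the two networks disagree on the top-1 class, if any.
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = x_ref + rng.uniform(-r, r, size=x_ref.shape)
        if np.argmax(f_orig(x)) != np.argmax(f_quant(x)):
            return x
    return None

# Toy 1-D classifiers whose decision boundaries differ slightly (0.50 vs. 0.52),
# mimicking an original network and a coarsely quantized copy of it.
f_orig = lambda x: np.array([x[0] - 0.50, 0.0])
f_quant = lambda x: np.array([x[0] - 0.52, 0.0])
x_ref = np.array([0.56])

for r in (0.01, 0.05):
    cex = find_counterexample(f_orig, f_quant, x_ref, r)
    print(f"r={r}: {'counterexample found' if cex is not None else 'no counterexample found'}")

With r = 0.01, every sampled input stays on the same side of both boundaries and no counterexample exists, whereas r = 0.05 reaches the region where the two toy models disagree.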
Overall, the accuracy of models quantized with CEG4N is better for the Iris and Seeds benchmarks, while the accuracy of models quantized with GPFQ is better for the MNIST benchmarks, but only by a small margin. We find that CEG4N's performance, in terms of NN accuracy, presents interesting results, given that it can produce QNNs relying only on a small set of representative data.

We have also conducted another experiment, in which NNEQUIV was used to verify equivalence between QNNs generated by GPFQ and their original counterparts. For every case where at least one counterexample was found, we considered the QNN not equivalent to the original one. We have noticed that, out of 31 NNs generated by GPFQ, only 8 were in fact equivalent, which represents only 25.8% of the total amount of resulting NNs. Indeed, GPFQ was not designed to consider EPs in its quantization approach, as happened with CEG4N. Anyway, these experiments serve as evidence that statistical accuracy measures do not capture equivalence aspects. Moreover, they reinforce the benefits of formulating guarantees for properties (e.g., equivalence and robustness) that formal verification techniques can offer. In addition, we noticed that both CEG4N and GPFQ produced QNNs with better accuracy than their original counterparts. Indeed, that is possible and has already been reported in the literature, since quantization techniques act as favorable weight regularization mechanisms that help NNs prevent biased behavior and provide better generalization [49].

These results answer our EG2. Overall, these experiments show that CEG4N can successfully produce QNNs that present accuracy figures similar to what is obtained with other state-of-the-art techniques.

E. Limitations

Although CEG4N can generate QNNs while keeping equivalence with the original ones, the NNs we have used for evaluation may not fully reflect the state-of-the-art. Indeed, they have a few layers and hundreds of ReLU nodes, while state-of-the-art ones may have hundreds of layers and thousands of ReLU nodes. The main bottleneck lies in the state-of-the-art verification algorithms (e.g., SMT or GPE), which currently do not scale to large NNs. Consequently, CEG4N+ESBMC and CEG4N+NNEQUIV were able to quantize only 20% and 40% of the chosen benchmarks, respectively, due to timeouts. Still, we have shown that CEG4N is both viable and flexible, which opens room for improvements with verifiers beyond ESBMC and NNEQUIV.

The research field of NN equivalence is relatively new, and there is no well-established set of benchmarks [13]. In this respect, our choices are a good starting point, but there is ample scope for further contributions. Additionally, our work is innovative in proposing a framework for NN quantization that integrates NNEV as an essential part of the process. As such, there is no similar methodology in the literature with which we can directly compare our approach. Also, our work focuses on the practical and feasible aspects of this new quantization approach and on the formalization of NN equivalence in terms of functional equivalence. Thus, we do not center the discussion at a conceptual level and, for such, could not discuss in depth the relationship between NN equivalence, functional equivalence, robustness verification, their impact, and their implications for NN accuracy and error.

VI. CONCLUSION

We have presented a new method for NN quantization, named CEG4N, which is a post-training quantization technique that provides formal guarantees regarding NN equivalence. It relies on a counterexample-guided optimization technique, where an optimization-based quantizer produces compressed NN candidates. A state-of-the-art verifier then checks such candidates to either prove the equivalence between quantized and original NNs or refute it by providing a counterexample. The latter is then fed back to the quantizer to guide it in the search for a viable candidate.
formulating guarantees for properties (e.g., equivalence and In the proposed framework, scalability is tightly related to
robustness) that formal verification techniques can offer. In the underlying verifier, which, in our experiments, took two
addition, we noticed that both CEG4N and GPFQ produced forms: SMT solver and GPE. SMT solvers are sensitive to
QNNs with better accuracy when compared with their original NN complexity [29], which leads to large state spaces, while
counterparts. Indeed, that is possible and has already been GPE may face exponential growth in the number of associated
reported in the literature, since the quantization techniques star sets [12]. Although that may look like an obstacle, both
act as favorable weight regularization mechanisms that help optimization efforts and different quantization strategies have
Although that may look like an obstacle, both optimization efforts and different quantization strategies have the potential to alleviate these issues. At the same time, it is worth mentioning that our main target was to demonstrate the feasibility of our methodology, which employed the mentioned verifiers as possible solutions but is not restricted to them.

In our future work, we will explore other quantization approaches, not limited to the search-based ones, and different equivalence-verification techniques based on RA [12] and SMT encoding [13]. For instance, [39] provides a new perspective and interpretation of NN behavior centered around error bound verification. This interpretation is different from ours and may have implications and provide new perspectives and conclusions for our quantization problem. Possible future works may provide a more concise formalization of NN equivalence, incorporate the QEBVerif approach into CEG4N, and provide a comparative study. Combining new quantization and equivalence-verification techniques will help CEG4N achieve better results while providing a more suitable compromise between accuracy and scalability. Furthermore, we will consider quantization approaches that operate entirely on integer arithmetic, which can potentially improve the scalability of CEG4N's verification step. The SMT encoding of the quantization problem will also be considered, with the goal of both comparing its cost with the one presented by our current proposal and devising a unified framework for NN compression and equivalence.
REFERENCES

[1] M. Bojarski et al., "End to end learning for self-driving cars," 2016, arXiv:1604.07316.
[2] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, "State-of-the-art in artificial neural network applications: A survey," Heliyon, vol. 4, no. 11, 2018, Art. no. e00938.
[3] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, "A survey of model compression and acceleration for deep neural networks," 2017, arXiv:1710.09282.
[4] D. Lin, S. Talathi, and S. Annapureddy, "Fixed point quantization of deep convolutional networks," in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 2849–2858.
[5] A. Gholami, S. Kim, Z. Dong, Z. Yao, M. W. Mahoney, and K. Keutzer, "A survey of quantization methods for efficient neural network inference," in Low-Power Computer Vision. Boca Raton, FL, USA: Chapman and Hall/CRC, 2022, pp. 291–326.
[6] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental network quantization: Towards lossless CNNs with low-precision weights," in Proc. 5th Int. Conf. Learn. Represent., 2017, pp. 1–14.
[7] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding," in Proc. 4th Int. Conf. Learn. Represent., 2016, pp. 1–14.
[8] J. Zhang, Y. Zhou, and R. Saab, "Post-training quantization for neural networks with provable guarantees," SIAM J. Math. Data Sci., vol. 5, no. 2, pp. 373–399, 2023.
[9] B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 2704–2713.
[10] X. Huang et al., "A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability," Comput. Sci. Rev., vol. 37, Aug. 2020, Art. no. 100270.
[11] M. K. Büning, P. Kern, and C. Sinz, "Verifying equivalence properties of neural networks with ReLU activation functions," in Principles and Practice of Constraint Programming (Lecture Notes in Computer Science), vol. 12333. Cham, Switzerland: Springer, 2020, pp. 868–884.
[12] S. Teuber, M. K. Büning, P. Kern, and C. Sinz, "Geometric path enumeration for equivalence verification of neural networks," in Proc. ICTAI, 2021, pp. 1–9.
[13] C. Eleftheriadis, N. Kekatos, P. Katsaros, and S. Tripakis, "On neural network equivalence checking using SMT solvers," in Formal Modeling and Analysis of Timed Systems. Cham, Switzerland: Springer, 2022, pp. 237–257.
[14] J. B. P. Matos, I. Bessa, E. Manino, X. Song, and L. C. Cordeiro, "CEG4N: Counter-example guided neural network quantization refinement," in Software Verification and Formal Methods for ML-Enabled Autonomous Systems. Cham, Switzerland: Springer, 2022, pp. 29–45.
[15] K. D. Julian, J. Lopez, J. S. Brush, M. P. Owen, and M. J. Kochenderfer, "Policy compression for aircraft collision avoidance systems," in Proc. 35th DASC, 2016, pp. 1–10.
[16] C. J. B. Yann, Y. LeCun, and C. Cortes, "The MNIST database of handwritten digits." [Online]. Available: http://yann.lecun.com/exdb/mnist/
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.
[18] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annal. Eugen., vol. 7, pp. 179–188, Sep. 1936.
[19] M. Charytanowicz, J. Niewczas, P. Kulczycki, P. A. Kowalski, S. Łukasik, and S. Żak, Complete Gradient Clustering Algorithm for Features Analysis of X-Ray Images. Berlin, Germany: Springer, 2010, pp. 15–24.
[20] C. M. Bishop and N. M. Nasrabadi, Pattern Recognition and Machine Learning, vol. 4. New York, NY, USA: Springer, 2006.
[21] A. Abate et al., "Sound and automated synthesis of digital stabilizing controllers for continuous plants," in Proc. 20th HSCC, 2017, pp. 197–206.
[22] R. Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper," 2018, arXiv:1806.08342.
[23] B. Paulsen, J. Wang, J. Wang, and C. Wang, "NEURODIFF: Scalable differential verification of neural networks using fine-grained approximation," in Proc. 35th IEEE/ACM Int. Conf. Autom. Softw. Eng. (ASE), 2020, pp. 784–796.
[24] B. Paulsen, J. Wang, and C. Wang, "ReluDiff: Differential verification of deep neural networks," in Proc. ICSE, 2020, pp. 714–726.
[25] R. Brummayer and A. Biere, "Boolector: An efficient SMT solver for bit-vectors and arrays," in Tools and Algorithms for the Construction and Analysis of Systems. Berlin, Germany: Springer, 2009, pp. 174–177.
[26] L. De Moura and N. Bjørner, "Z3: An efficient SMT solver," in Tools and Algorithms for the Construction and Analysis of Systems. Berlin, Germany: Springer, 2008, pp. 337–340.
[27] H.-D. Tran et al., "Star-based reachability analysis of deep neural networks," in Proc. FM, 2019, pp. 670–686.
[28] M. Nagel, M. Fournarakis, R. A. Amjad, Y. Bondarenko, M. van Baalen, and T. Blankevoort, "A white paper on neural network quantization," 2021, arXiv:2106.08295.
[29] L. Sena, X. Song, E. H. da S. Alves, I. V. Bessa, E. Manino, and L. C. Cordeiro, "Verifying quantized neural networks using SMT-based model checking," 2021, arXiv:2106.05997.
[30] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1," 2016, arXiv:1602.02830.
[31] M. Á. Carreira-Perpiñán and Y. Idelbayev, "Model compression as constrained optimization, with application to neural nets. Part II: Quantization," 2017, arXiv:1707.04319.
[32] Y. Yuan, C. Chen, X. Hu, and S. Peng, "EvoQ: Mixed precision quantization of DNNs via sensitivity guided evolutionary search," in Proc. Int. Jt. Conf. Neural Netw., 2020, pp. 1–8.
[33] E. Lybrand and R. Saab, "A greedy algorithm for quantizing neural networks," J. Mach. Learn. Res., vol. 22, no. 1, pp. 1–38, 2021.
[34] Y. Xu, Y. Wang, A. Zhou, W. Lin, and H. Xiong, "Deep neural network compression with single and multiple level quantization," in Proc. 32nd AAAI Conf. Artif. Intell./13th Innovat. Appl. Artif. Intell. Conf./8th AAAI Symp. Educ. Adv. Artif. Intell., 2018, pp. 1–8.
[35] J. Yang et al., "Quantization networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 7308–7316.
[36] Q. Jin, L. Yang, and Z. Liao, "AdaBits: Neural network quantization with adaptive bit-widths," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 2146–2156.
[37] S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," 2016, arXiv:1606.06160.
[38] K. Huang et al., "Structured dynamic precision for deep neural networks quantization," ACM Trans. Design Autom. Electron. Syst., vol. 28, no. 1, pp. 1–24, Jan. 2023.
[39] Y. Zhang, F. Song, and J. Sun, "QEBVerif: Quantization error bound verification of neural networks," in Computer Aided Verification. Cham, Switzerland: Springer, 2023, pp. 413–437.
[40] M. Giacobbe, T. A. Henzinger, and M. Lechner, "How many bits does it take to quantize your neural network?" in Tools and Algorithms for the Construction and Analysis of Systems. Cham, Switzerland: Springer Int. Publ., 2020, pp. 79–97.
[41] T. A. Henzinger, M. Lechner, and D. Žikelić, "Scalable verification of quantized neural networks," in Proc. 35th AAAI Conf. Artif. Intell., vol. 35, pp. 3787–3795, May 2021.
[42] S. Mistry, I. Saha, and S. Biswas, "An MILP encoding for efficient verification of quantized deep neural networks," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 11, pp. 4445–4456, Nov. 2022.
[43] X. Song et al., "QNNVerifier: A tool for verifying neural networks using SMT-based model checking," 2021, arXiv:2111.13110.
[44] M. R. Gadelha, F. R. Monteiro, J. Morse, L. C. Cordeiro, B. Fischer, and D. A. Nicole, "ESBMC 5.0: An industrial-strength C model checker," in Proc. 33rd IEEE/ACM Int. Conf. Autom. Softw. Eng. (ASE), 2018, pp. 888–891.
[45] F. R. Monteiro, M. R. Gadelha, and L. C. Cordeiro, "Model checking C++ programs," Softw. Test., Verif. Rel., vol. 32, no. 1, Jan. 2022, Art. no. e1793.
[46] S. Bak, C. Liu, and T. Johnson, "The second international verification of neural networks competition (VNN-COMP 2021): Summary and results," 2021, arXiv:2109.00498.
[47] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Trans. Evol. Comput., vol. 6, no. 2, pp. 182–197, Apr. 2002.
[48] G. Katz, C. W. Barrett, D. L. Dill, K. D. Julian, and M. J. Kochenderfer, "Reluplex: An efficient SMT solver for verifying deep neural networks," in Proc. 29th Int. Conf. Comput. Aided Verif., 2017, pp. 97–117.
[49] Q. Jin, L. Yang, and Z. A. Liao, "Towards efficient training for neural network quantization," 2019, arXiv:1912.10207.

João Batista P. Matos Jr. is currently pursuing the Ph.D. degree with the Institute of Computation, Federal University of Amazonas, Manaus, Brazil.
He is a research student with the Systems and Software Security Group, University of Manchester, Manchester, U.K., and focuses on automatic verification of neural networks. He has a background in recommendation systems, having studied user-engagement metrics for social networks.

Eddie B. de Lima Filho received the M.Sc. and D.Sc. degrees in electrical engineering from the Federal University of Rio de Janeiro (COPPE/UFRJ), Rio de Janeiro, Brazil, in 2004 and 2008, respectively.
He is currently a Leader of the Digital Convergence Group, Research and Development Department, TPV Technology, Manaus, Brazil, where he conducts research in Consumer Electronics. In addition, he is affiliated with the Post-Graduate Program in Electrical Engineering with the Federal University of Amazonas, Manaus. Before joining TPV Technology, he worked as a Researcher with the Science, Technology, and Innovation Center, Manaus Industrial Pole of Manaus/NXP Semiconductors and Genius Institute of Technology. His work includes software model checking, automated testing, embedded and cyber–physical systems, channel/source coding, and video/image processing.

Iury Bessa received the B.Sc. and master's degrees in electrical engineering from the Federal University of Amazonas, Manaus, Brazil, in 2014 and 2015, respectively, and the Ph.D. degree in electrical engineering from the D!FCOM Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil.
During the Ph.D. degree, he was a Visiting Scholar with the Advanced Control Systems Group, Polytechnic University of Catalonia, Barcelona, Spain, from February to December 2020. Since 2015, he has been an Assistant Professor with the Department of Electricity and is part of the e-Controls Research Group, Federal University of Amazonas. In October 2016, he was a Visiting Scholar with the Department of Computer Science, University of Oxford, Oxford, U.K. His research interests include control theory, fault-tolerant control, fault detection, diagnosis and prognosis, formal verification and synthesis, learning-based control, cyber–physical systems, and computational intelligence.

Edoardo Manino received the Ph.D. degree in Bayesian machine learning from the University of Southampton, Southampton, U.K., in 2020.
He is a Research Associate with the Department of Computer Science, University of Manchester, Manchester, U.K., where he is part of the Systems and Software Security Group and focuses on automated verification of neural networks. Throughout his research career, he published on a wide number of topics, including algorithmic game theory, multiagent reinforcement learning, network science, crowdsourcing, analog computing, and automated software testing.

Xidan Song is currently pursuing the Ph.D. degree with the Department of Computer Science, University of Manchester, Manchester, U.K.
He is a research student with the Systems and Software Security Group, University of Manchester, and focuses on the verification and repair of neural networks.

Lucas C. Cordeiro received the B.Sc. degree in electrical engineering and the M.Sc. degree in computer engineering from the Federal University of Amazonas (UFAM), Manaus, Brazil, in 2005 and 2007, respectively, and the Ph.D. degree in computer science from the University of Southampton, Southampton, U.K., in 2011.
He is a Reader with the Department of Computer Science, University of Manchester (UoM), where he leads the Systems and Software Security Research Group. He is also the Arm Centre of Excellence Director of UoM. He is also affiliated with the Trusted Digital Systems Cluster with the Centre for Digital Trust and Society, the Formal Methods Group, UoM, and the Post-Graduate Programs in Electrical Engineering and Informatics, UFAM. His work focuses on software model checking, automated testing, program synthesis, software security, embedded and cyber–physical systems.
Dr. Cordeiro has received various international awards, including the Most Influential Paper at IEEE/ACM ASE'23, the Distinguished Paper Award at ACM ICSE'11, and 39 awards from the international competitions on software verification (SV-COMP) and testing (Test-Comp) 2012–2023.