
Trust the Process: Zero-Knowledge Machine Learning to Enhance Trust in Generative AI Interactions

Bianca-Mihaela Ganescu (1), Jonathan Passerat-Palmbach (1, 2)
(1) Imperial College London
(2) Flashbots
[email protected], [email protected]

arXiv:2402.06414v1 [cs.LG] 9 Feb 2024

Abstract

Generative AI, exemplified by models like transformers, has opened up new possibilities in various domains but also raised concerns about fairness, transparency and reliability, especially in fields like medicine and law. This paper emphasizes the urgency of ensuring fairness and quality in these domains through generative AI. It explores using cryptographic techniques, particularly Zero-Knowledge Proofs (ZKPs), to address concerns regarding performance fairness and accuracy while protecting model privacy. Applying ZKPs to Machine Learning models, known as ZKML (Zero-Knowledge Machine Learning), enables independent validation of AI-generated content without revealing sensitive model information, promoting transparency and trust. ZKML enhances AI fairness by providing cryptographic audit trails for model predictions and ensuring uniform performance across users. We introduce snarkGPT, a practical ZKML implementation for transformers, to empower users to verify output accuracy and quality while preserving model privacy. We present a series of empirical results studying snarkGPT's scalability and performance to assess the feasibility and challenges of adopting a ZKML-powered approach to capture quality and performance fairness problems in generative AI models.

Introduction

The latest developments in Generative AI have opened new possibilities for AI to become more pervasive in multiple fields. For example, GPT-4 can now pass internationally recognized exams in medicine and law (Nori et al. 2023; OpenAI 2023). Such advancements in transformer-based models introduce new challenges, particularly in ensuring the uniform quality and execution of these models for all users when deployed and accessed through remote cloud APIs. This notion is commonly known as performance fairness (Jiang et al. 2023).

Over the past few years, the field of AI fairness has witnessed substantial progress in addressing bias mitigation within machine learning models. This collective effort has yielded methods that strive to create fairer models, reducing the impact of biases in various applications. This paper explores a novel algorithmic approach that complements AI fairness strategies by focusing on ensuring the uniform performance and execution of deployed AI models across users.

We present a protocol that empowers users to engage with remote servers hosting AI models, ensuring that the model they interact with behind a black-box API is indeed the precise model they have requested from the provider.

The implications of this approach extend across domains, with particular relevance in fields such as healthcare and finance, where the outcomes produced by AI models can profoundly affect the well-being of end-users, especially those belonging to underrepresented populations.

Our approach leverages zkSNARKs (Zero-Knowledge Succinct Non-interactive ARguments of Knowledge), a cryptographic tool that bolsters the integrity of computations while preserving user privacy by not disclosing sensitive information during the verification process.

Recent developments have given rise to a burgeoning body of research known as ZKML (Zero-Knowledge Machine Learning) (Feng et al. 2021; Lee et al. 2020; Kang et al. 2022), which focuses on leveraging zkSNARKs to verify the integrity of the inference pass in AI models. ZKML serves two primary purposes:

1. Verification of Model Output: In scenarios where users submit their data to remote services hosting AI models, ZKML provides users with a cryptographic proof that the output they receive originates from the exact model they have requested or paid for. This is particularly pertinent in Generative AI applications like ChatGPT, where multiple subscription tiers promise varying levels of model performance. Currently, users lack guarantees that the output aligns with their subscription tier.

2. Enhancing Data Privacy: ZKML is also valuable for enhancing data privacy. In cases where users are unwilling or unable to upload their data to remote third-party models, model owners can deploy models directly to users' devices. While this deployment strategy is unproblematic in scenarios like predictive text on a smartphone keyboard, more sensitive situations exist, such as financial services' Know Your Customer (KYC) processes. ZKML enables model providers to confidently deploy their models to users' devices, ensuring that the model's predictions are genuinely derived from their model and not simulated by the user using alternative software. The combination of data locality and the Zero-Knowledge property preserves user data privacy.

Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
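The second purpose above presumes that anyone can check that deployed weights match what the provider published. A minimal sketch of such a fingerprint check (the function names, the serialization scheme and the choice of SHA-256 are illustrative assumptions, not the paper's protocol):

```python
import hashlib

def model_fingerprint(weights: list) -> str:
    """Hash a canonical serialization of the weights (illustrative only)."""
    blob = ",".join(f"{w:.8f}" for w in weights).encode()
    return hashlib.sha256(blob).hexdigest()

# A provider publishes the fingerprint of the model it claims to serve.
published = model_fingerprint([0.25, -1.5, 3.0])

# A user or auditor recomputes it over the weights they received.
assert model_fingerprint([0.25, -1.5, 3.0]) == published
# Any tampering with the weights changes the fingerprint.
assert model_fingerprint([0.25, -1.5, 3.1]) != published
```

On its own, a fingerprint only ties an artifact to a commitment; the zkSNARK machinery discussed next is what links a specific output to that commitment without revealing the weights themselves.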
In the context of AI fairness, ZKML directly contributes to performance fairness, ensuring that all users of a particular service experience consistent quality. Model owners need not reveal their model's weights, preserving their valuable intellectual property. Instead, they can publish a model fingerprint that is verified as an additional step during user output generation. This has far-reaching implications, including addressing equality-of-outcomes concerns in applications like credit scoring models, which have faced criticism for perpetuating discrimination against underrepresented communities, such as racial minorities. By employing ZKML, banks can make their creditworthiness assessments fully transparent and trustworthy, guaranteeing that the same model is used for all users. Auditors can evaluate model decisions alongside the proof provided by users disputing their scores, thereby eliminating discrimination rooted in users' backgrounds.

The remainder of this paper demonstrates the viability of our approach. In the background section, we introduce "snarkGPT," a verifiable ZKML pipeline. In the evaluation section, we provide empirical results that underscore the effectiveness of our approach, specifically showcasing its applicability to a GPT2-size model.

Background

zkSNARKs

Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (zkSNARKs) (Bitansky et al. 2017) represent advanced cryptographic tools that allow verifying the correctness of a computation without revealing the inputs or intermediate steps. The foundation of zkSNARKs is rooted in a rich mathematical framework characterised by their advantageous properties, including zero-knowledge, succinctness, non-interactivity, argument soundness, knowledge-soundness, and completeness.

In the field of verifiable computing (VC), zkSNARKs offer an effective solution by enabling clients to delegate computationally intensive tasks to providers. After delegation, the provider (prover) sends a concise cryptographic proof to the client (verifier), allowing the client to verify the correctness of the computation without the need for a full re-execution (Petkus 2019). Most notably, the size of the proof is significantly smaller than the size of the original computation. This succinct proof size is a fundamental characteristic of zkSNARKs, making them suitable for situations where executing the whole computation is not feasible, as is the case for generative AI. Moreover, in scenarios involving outsourced AI computations, zkSNARKs provide a robust mechanism for clients to verify the correctness of model inferences without compromising data confidentiality. This is particularly relevant in the context of secure model inference-as-a-service.

To describe zkSNARKs for generative AI, we can define them formally as follows: given a function evaluation f(x; w) = y, where x is a public input, w is a private input called the witness and y is the output, zkSNARKs allow the prover to generate a proof π such that a verifier who knows x, y and π can check that the prover knows w such that f(x; w) = y (Kang et al. 2022). For example, in the case of transformers, x can represent an input prompt, w the model weights and y the output of executing the model using input prompt x and weights w.

Most zkSNARK protocols proceed in three steps (Nitulescu 2020):

1. Arithmetisation: producing a system of polynomial equations over a large prime field (an arithmetic circuit) for which finding a solution is equivalent to computing f(x; w). Therefore, given (f, y, x, w), the circuit constraints are met if and only if y = f(x; w).

2. Building an information-theoretic proof system, that is, a proof system that guarantees soundness even against a computationally unbounded prover. This system usually relies on idealized components (oracles) or is inefficient.

3. Compiling the information-theoretic proof system into an efficient one using cryptographic tools, at the cost of considering only computationally bounded adversaries.

There are two main classes of zkSNARKs: Quadratic Arithmetic Program (QAP)-based and Polynomial Interactive Oracle Proof (PIOP)-based. QAP-based zkSNARKs utilize arithmetic circuits and divisibility checks. PIOP-based zkSNARKs compile circuits into polynomial constraints and employ polynomial commitment schemes. Recent years have seen a dominance of PIOP-based SNARKs, mainly due to their increased flexibility.

This work uses one of the most popular and well-documented PIOP-based proof systems: Halo2.

Halo2. Halo2 (hal) is an instance of a PIOP made non-interactive via the Fiat-Shamir heuristic (Fiat and Shamir 1987). The scheme uses Plonkish arithmetisation (Gabizon, Williamson, and Ciobotaru 2019) to represent a relation and a satisfying witness to that relation as a low-degree polynomial. The constructed polynomial is then fed into a polynomial commitment scheme, where the prover commits to the polynomial and can provably evaluate it at arbitrary points chosen by the verifier. Halo2 uses the inner product argument (Bowe, Grigg, and Hopwood 2019) as its polynomial commitment scheme. However, other frameworks building on Halo2 can choose a different commitment scheme, as is the case of EZKL (introduced subsequently in this section), which uses KZG (Kate, Zaverucha, and Goldberg 2010). Using Halo2, a verifier can use an accumulation scheme to batch the instances it wants to evaluate.

Plonkish arithmetisation allows polynomial constraints with certain restricted forms of randomness and supports custom gates and lookup arguments. It conceptualises the arithmetic circuit as a matrix (referred to as the circuit matrix) of m columns and n rows over a given finite field F (the cells therefore contain elements of F). Each column j encodes a wire and corresponds to a Lagrange interpolation polynomial p_j(X) over the powers (rows) of an nth primitive root of unity ω, so that p_j(ω^i) = x_ij. A permutation argument is used to enforce the equality of cells.
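As a toy illustration of this circuit-matrix view (an informal sketch with invented values, not Halo2's actual data structures), consider the relation y = w·x + b over a small prime field: each column plays the role of a wire, every row must satisfy the gate constraint, and an equality check ties the output cell to the claimed public output.

```python
# Toy Plonkish-style circuit matrix for the relation y = w*x + b over F_p.
# Columns play the role of wires; each row must satisfy the gate polynomial
# w*x + b - y = 0 (mod P). Informal sketch only; Halo2's layout differs.
P = 97  # small prime standing in for Halo2's large field modulus

def check_rows(matrix: list) -> bool:
    """Return True iff every row satisfies the gate constraint mod P."""
    return all((row["w"] * row["x"] + row["b"] - row["y"]) % P == 0
               for row in matrix)

# One row per gate evaluation; here a single multiply-add step.
matrix = [{"w": 3, "x": 10, "b": 5, "y": (3 * 10 + 5) % P}]
claimed_output = 35

assert check_rows(matrix)                        # gate constraints hold
assert matrix[0]["y"] % P == claimed_output % P  # equality check on the output cell
```

A real prover never hands this matrix to the verifier; it commits to the column polynomials and proves that the constraints hold at random evaluation points.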
The matrix defines a sequence of polynomial constraints, which are multivariate polynomials over F that must evaluate to zero for each row. In a polynomial constraint, the variables can refer to a cell in a given column of the current row or of another row relative to this one.

To improve performance, Halo2 can trade memory for CPU by pre-computing and storing lookup tables for some part of the computation. The lookup argument enforces a relation between variables, where the relation is expressed as a table. The lookup table consists of two advice columns and two fixed columns of the matrix, where every expression in the set of advice columns is equal to some expression in the set of fixed columns. Therefore, the lookup argument is a more permissive version of the permutation argument. It enforces that all the input-output pairs in the witness are valid input-output pairs in the fixed columns. Lookup tables are often leveraged to represent more complex arithmetic relationships in the circuit matrix that would otherwise be hard to capture with additions and multiplications alone. A prime example is non-linearities in neural networks, which would otherwise have to be numerically approximated.

EZKL. EZKL (Camuto and Morton 2022, 2023) is a Rust library and command-line tool for constructing zkSNARKs for the inference phase of machine-learning models. EZKL uses Halo2 with KZG in the backend, with certain modifications that make the library compatible with larger models compared to existing solutions implementing zkSNARKs for machine learning models. EZKL converts pre-trained models from ONNX to Halo2 circuit matrices. Instead of directly implementing each of the 100+ ONNX operations individually, EZKL builds upon the tract library (Sonos 2020), which reduces and prunes complex ONNX graphs to compositions of a smaller set of operations. By leveraging the Einstein summation operation, EZKL can represent numerous linear operations such as matrix multiplication, dot product, transposition, and tensor contraction with a single operation. This approach allows EZKL to support a wide range of models, including those with LSTM and self-attention layers, while only having to support 20 operations as equivalent circuit constraints. Notably, EZKL improves the memory costs of the Halo2 circuit matrix by allocating additional columns once all of the available rows have been filled. By the definition of Halo2, there can be arbitrarily many columns in the circuit matrix, but the number of rows has to be a power of 2. This comes at the cost of increasing the proving time.

Generative AI: from GPT down to nanoGPT

Text generation within generative AI relies on advanced models such as Transformers, introduced in 2017, which underpin the GPT family of language models (Radford et al. 2018). The GPT architecture consists of 12 transformer blocks with masked multi-head self-attention. GPT-2 and GPT-3 are improved versions with larger datasets and more parameters.

GPT-2 largely follows the architecture of GPT-1 and has four trained models available for use, each having a different number of blocks and embedding sizes. Their numbers of parameters are 117M, 345M, 762M and 1.5B, respectively. GPT-3 (Brown et al. 2020) follows the same architecture as GPT-2, with a few hyperparameters modified to improve performance. The model has configurations ranging from 175M to 175B parameters. GPT-4 extends this line of models and accepts multiple modalities, but its authors have yet to document its architecture thoroughly in a peer-reviewed publication.

In this work, we focus on the GPT-2 architecture (the same as the GPT-3 architecture), as it is the latest architecture in the GPT series for which a Python implementation has been made public at the time of writing. We propose a zkSNARK for the transformer architecture as a proof-of-concept, using nanoGPT: a smaller version of the GPT-2 architecture. NanoGPT's code aims to be shorter and easier to interpret (Karpathy 2022, 2020), but remains compatible with the original GPT-2 and can be used to train or finetune medium-sized GPTs.

Related work

ZKML

Previous research has proposed zkSNARK protocols for the inference phase of smaller neural networks. Lee et al. (Lee et al. 2020) propose a protocol for verifiable convolutional neural networks, using QAP-based zkSNARKs for pooling and ReLU, QPP (Quadratic Polynomial Program)-based zkSNARKs for convolutions, and CP (Commit-and-Prove)-SNARKs for interconnecting the layers. They argue that their scheme improves the key generation/proving time by 25X compared to the state-of-the-art zkSNARK scheme (Groth 2016) for a small example of the MNIST model consisting of a single convolutional layer with ReLU and a single pooling layer. Moreover, for VGG16, they argue that their scheme improves the performance by 1800X compared with (Groth 2016).

Another approach was proposed by Weng et al. (Weng et al. 2022) for privacy-preserving and verifiable CNNs, using a modified version of QAP called QMP (Quadratic Matrix Program). The authors use a QMP-based arithmetic circuit to express convolutional relations and generate zkSNARK proofs based on Homomorphic Encryption (HE) and collaborative inference, which promises to protect both model and data privacy. They argue that they obtain 17.6X faster setup time and 13.9X faster proving time than the QAP-based method in their experiments.

ZEN is an optimizing compiler for verifiable neural networks using zkSNARKs, introduced by Feng et al. (Feng et al. 2021). It proposes two privacy-preserving, verifiable inference schemes for neural networks, ZENacc and ZENinfer, which promise to provide privacy guarantees for both the model and the data. The authors also introduce two optimizations for R1CS: R1CS-friendly quantization, which they argue "brings up to 73.9× savings in R1CS constraints for a convolution layer and up to 8.4× reduction for fully connected kernel without any additional accuracy loss" ((Feng et al. 2021), page 2), and Stranded encoding of R1CS constraints, which they argue "leads to up to 2.2× improvement in R1CS constraints for convolution kernel and 3.4× improvement for fully connected kernel" ((Feng et al. 2021), page 2).
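Quantization of the kind ZEN optimizes maps real-valued weights to integers that circuits can handle with field arithmetic. The sketch below shows plain uniform quantization only; the scale factor and function names are illustrative assumptions, not ZEN's actual algorithm.

```python
def quantize(values: list, scale: int = 2**8) -> list:
    """Uniformly map floats to integers so a circuit can use field arithmetic."""
    return [round(v * scale) for v in values]

def dequantize(values: list, scale: int = 2**8) -> list:
    """Map quantized integers back to approximate floats."""
    return [v / scale for v in values]

weights = [0.5, -0.25, 0.125]
q = quantize(weights)
assert q == [128, -64, 32]
# Exact binary fractions round-trip without loss at this scale.
assert dequantize(q) == weights
```

Note that multiplying two quantized values accumulates the scale factor (scale squared), which is why R1CS-friendly schemes must also rescale intermediate results inside the circuit.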
Kang et al. (Kang et al. 2022) followed the steps of EZKL and also constructed Halo2 proofs for verifying the inference phase of a machine learning model. They propose an ImageNet-scale (Deng et al. 2009) zkSNARK using Halo2 and present three protocols: verifying the ML model accuracy, verifying the ML model predictions, and trustless retrieval of items matching a predicate, respectively. They argue that they can achieve up to 79% accuracy on ImageNet-scale while simultaneously taking as little as 10 s and 5,952 bytes to verify, and that the zkSNARK can be scaled down to take as little as 0.7 s to verify at 59% accuracy. Comparing their results to (Lee et al. 2020), (Weng et al. 2022), (Feng et al. 2021) and (Liu, Xie, and Zhang 2021), they claim that the proving time for the prior work is at least 10X higher than their Halo2 method, and up to 1,000X higher.

Fairness

Fairness has garnered significant attention within the field of machine learning, encompassing a wide range of investigations into the unintended behaviours exhibited by machine learning models (Barocas, Hardt, and Narayanan 2017). The literature has extensively explored various facets of fairness, including the concepts of group fairness and performance fairness. The former strives to mitigate model bias with respect to specific protected attributes (Du et al. 2021; Gálvez et al. 2021; Chu et al. 2021), while the latter necessitates the uniformity of performance distributions across different recipients (Zhang, Kou, and Wang 2020; Deng, Kamani, and Mahdavi 2020; Pentyala et al. 2022). This section concentrates on the notion of performance fairness.

Performance fairness has garnered particular prominence within Federated Learning, wherein client data originating from multiple sources exhibits high heterogeneity and spans geographical diversity. Mohri et al. introduced an initial framework for optimizing the performance of the poorest-performing device through a minimax optimization scheme (Mohri, Sivek, and Suresh 2019). Subsequently, the q-FedAvg algorithm was introduced by Li et al. (Li et al. 2020), offering a more flexible optimization objective tailored to achieve varying degrees of fairness. More recently, Ditto was introduced as an approach to engender fairness by enabling the learning of personalized models (Li et al. 2021).

As we can see, the works mentioned above primarily centre on narrowing the performance gap across heterogeneous clients. This pursuit aligns with our approach, which is complementary. Our method empowers clients to identify and conscientiously report performance fairness disparities without compromising the sensitivity and confidentiality of their private data.

A provably fair GPT prototype: snarkGPT

This section describes ZKML in the context of performance fairness and how zkSNARKs can ensure that all clients of a generative AI service are treated equally. We demonstrate the practical viability of this solution by introducing snarkGPT, a verifiable ZKML pipeline for the GPT-2 model, and suggest an accompanying protocol to capture potential fairness deviations.

A protocol fostering performance fairness

In the context of remote services hosting generative AI models, we set the challenge for a provider to instil confidence in consumers regarding the uniform quality and execution of the models it serves.

Starting from a simple case, we question how a premium user (or institution) of ChatGPT can obtain the guarantee that they use the premium version of the model at all times, and not a cheaper version sometimes. With the growing popularity of ChatGPT, it is unclear how the service deals with the high number of requests. At the moment of writing, the only option for paying users is to trust that OpenAI always serves the premium model to premium users.

One option is for the model provider to share the model weights. However, this poses problems. Sharing these weights means leaking substantial intellectual property encapsulated within these models, and not all users may have the necessary computing power to use these weights effectively (Butler 2023).

We believe that a promising solution to the lack of guarantees in these cases is using zkSNARKs, which can prove, beyond a reasonable doubt, the correct execution of a computation. Indeed, as discussed in the related work section, previous work has proposed zkSNARK solutions for verifying the correct execution and accuracy of other machine learning models, such as CNNs, DNNs and ImageNet-scale models.

One possible ZKML protocol is as follows:

1. The provider of a generative AI model publishes a commitment (for example, a hash) to the model weights using a Credible Commitment Device (Kalai et al. 2010) (for instance, in a public domain).

2. A client can then send some input data to the provider and ask the provider to evaluate the model on that input data.

3. The provider generates the output of the model evaluated on the client's data, as usual, and generates a zkSNARK. The proof takes as parameters the weights of the model (private), the client's input data (public) and the output of the model (public). The zkSNARK enables anyone to validate the correct execution of the model with significantly smaller costs. The proof attests that the model produces the reported output given the parameters above and the public commitment to the model.

4. The client can then run a verification protocol to accept or reject the proof without access to the model weights.

To evaluate the viability of ZKML to equip generative AI with performance fairness and quality guarantees, we introduce "snarkGPT", a zkSNARK prototype for the GPT-2 architecture, in the following sections.
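The four steps of the protocol can be sketched end-to-end as follows. This is a data-flow illustration under stated assumptions: the commitment is a plain SHA-256 hash, the model is a stand-in linear function, and `make_proof` merely packages the public statement a real zkSNARK prover (such as EZKL over Halo2) would attest to succinctly and in zero knowledge; it provides no cryptographic soundness by itself.

```python
import hashlib
import json

def commit(weights):          # Step 1: provider publishes a commitment.
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()

def model(x, weights):        # Stand-in for a GPT forward pass.
    return sum(w * xi for w, xi in zip(weights, x))

def make_proof(x, y, weights):  # Step 3: placeholder for zkSNARK proving.
    # A real proof is succinct and hides the weights; here we only record
    # the public statement (x, y, commitment) the proof would attest to.
    return {"x": x, "y": y, "commitment": commit(weights)}

def verify(proof, x, y, published_commitment):  # Step 4: client-side check.
    return (proof["x"] == x and proof["y"] == y
            and proof["commitment"] == published_commitment)

weights = [1.0, 2.0, 3.0]
published = commit(weights)             # step 1: publish commitment
x = [4.0, 5.0, 6.0]                     # step 2: client sends input
y = model(x, weights)                   # step 3: provider evaluates...
proof = make_proof(x, y, weights)       # ...and "proves"
assert verify(proof, x, y, published)   # step 4: client accepts
```

The point of the real cryptography is that the verification step succeeds only if the output genuinely came from the committed weights, without the client ever seeing them.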
nanoGPT adaptations

The architecture of the nanoGPT model is illustrated in Table 1.

Table 1: The architecture of the nanoGPT model (Karpathy 2022).

Layer | Parameters | Role
Token Embedding | vocabulary size, embedding size | Creates token embeddings for the input sequence.
Positional Embedding | block size, embedding size | Creates positional embeddings for the input sequence.
Dropout | dropout rate | Takes as input the token and positional embeddings and zeroes rate% random elements.
List [Transformer Blocks] | number of layers | The model's hidden layers/transformer blocks. Takes as input the token and positional embeddings after dropout.
Layer Normalisation | embedding size | Applies Layer Normalization to the output of the hidden layers, as described in (Ba, Kiros, and Hinton 2016).
Language Modelling Head (Linear Layer) | embedding size, vocabulary size | A linear layer with weights tied to the input embedding.

The model takes as input a prompt, which can be empty, and generates text based on the input sequence. Anchoring this setting to a medical diagnosis, a patient's symptoms represent the input prompt, and the medical prescription represents the generated text. To achieve quality guarantees, we must adapt the nanoGPT model to make it compatible with zkSNARKs.

First, since zkSNARKs operate over finite fields, all model values must be mapped to a value in the chosen finite field. In the self-attention mechanism of transformers, the elements in the upper-triangular portion of the self-attention matrix are set to -inf to eliminate the information that follows in the sequence. -inf is not a valid element of a finite field. Therefore, we set the elements in the upper-triangular portion of the self-attention matrix to the smallest value covered in the Halo2 lookup table. More specifically, let B be the lookup table's logarithmic size. The lookup table stores all values in the given finite field between -2^(B-1) + 1 and 2^(B-1) - 1. Thus, we mask out the elements in the self-attention matrix by setting them to -2^(B-1) + 1. Then, we remove all optimizations that are compatible with neither ONNX (flash attention (Dao et al. 2022)) nor EZKL (array slicing, cube operation). Note that these operations are only used in the nanoGPT Python code to accelerate computing self-attention on GPU; removing them has thus no impact on the accuracy of the model.

Empirical feasibility evaluation

Overview

This part of the study aims to shed light on the feasibility of our cryptographic solution for complementing fairness in machine learning models, specifically focusing on nanoGPT. As the time and memory costs for the verifier are constant in Halo2, we evaluate the performance of our proposed method based on the time and memory costs for the prover, who has to generate the zkSNARK proof. The server hosting the model would bear the burden of generating the proof when the user wants to obtain quality guarantees for the service they're consuming. Our experiments provide empirical evidence of the overhead and challenges in adopting ZKML as a de facto approach to improve performance fairness in a client-server setting.

We first test to what extent we can scale up two of the main components of our nanoGPT architecture when generating the proof: the number of layers and the size of the embeddings. We consider these parameters in our tests because they are core parameters of the self-attention mechanism. To be able to focus on the parameters we select to increase, we choose relatively small values for the rest of the parameters for simplicity. More specifically, we set the vocabulary size to 65, the block size to 64, the number of heads to 4, the batch size to 1 and the dropout rate to 0. All our tests are performed using a fixed version of the EZKL framework (commit 8f122bf [1]). We perform all our experiments on a machine with the following specifications: Intel i7-9700K CPU, 3.60 GHz / 4.90 GHz, 64 GB 3600 MHz RAM and a 200 GB swap area on an NVMe SSD.

We are also interested in how the circuit size (the log number of rows pre-allocated for the Halo2 circuit matrix to represent the model) influences the time and memory costs for the prover. Therefore, in our tests, we create multiple proofs for the same nanoGPT model configuration using EZKL, wherein we systematically vary the logarithmic dimensions allocated to the matrix representation of the circuit. We expect that a larger matrix will result in significantly higher proving costs.

Scaling Up the Model

Tables 2 and 3 present a detailed breakdown of the runtime incurred during proof generation for varying embedding sizes and layers, respectively. As shown in Tab. 2, the computational demands appear to increase nonlinearly with the size of the embeddings. For instance, the proof generation time increases from ~5 minutes for an embedding size of 64 to ~20 minutes for an embedding size of 144. Similarly, an augmentation in the number of layers corresponds to an increase in the time required for proof generation. These combined results emphasize the complex factors that must be considered when optimizing the architecture of generative AI models within cryptographic frameworks. ZKML practitioners must balance the model's representational capacity with a tractable computational efficiency.

[1] https://github.com/zkonduit/ezkl/commit/8f122bf2eb5794b681364eb182d7cdd8a7350fe4
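The finite-field masking adaptation described in the nanoGPT adaptations section can be illustrated with plain integers standing in for field elements (a simplified sketch; the value of B and the attention scores are invented for illustration, and real EZKL circuits operate on fixed-point field elements rather than Python ints):

```python
B = 8                     # hypothetical logarithmic lookup-table size
MASK = -2**(B - 1) + 1    # smallest value covered by the lookup table: -127

def causal_mask(scores):
    """Replace upper-triangular attention scores (future positions) with MASK,
    mimicking the -inf masking of standard transformers while staying inside
    the range representable in the Halo2 lookup table."""
    n = len(scores)
    return [[scores[i][j] if j <= i else MASK for j in range(n)]
            for i in range(n)]

scores = [[3, 7, 1],
          [2, 5, 9],
          [4, 6, 8]]
masked = causal_mask(scores)
assert masked[0] == [3, -127, -127]  # row 0 may only attend to position 0
assert masked[1] == [2, 5, -127]
assert masked[2] == [4, 6, 8]        # last row sees the whole prefix
```

After a (quantized) softmax, positions holding the most negative representable value receive negligible weight, approximating the effect of -inf; the lookup table then keeps the non-linearity itself cheap to express as constraints.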
In other words, the findings highlight the need for a nuanced approach that considers both the sophistication of the model's representation and the practical constraints imposed by zkSNARKs.

Circuit Matrix Size Impact on Proof Generation

Table 4 illustrates the impact of the (logarithmic) size of the circuit matrix representation of the nanoGPT model on proof generation time and memory costs, that is, of the number of rows we have to pre-allocate in memory for the circuit matrix representation when using EZKL and Halo2. As illustrated, the runtime exhibits a non-linear progression, initially improving from 3 minutes and 55 seconds to 3 minutes and 11 seconds as the logarithmic size increases from 14 to 18. Beyond this point, however, there is a notable rise in runtime, with a substantial increase observed for matrix logarithmic sizes of 24 and 25. The increased runtime for larger circuit matrices aligns with our expectations: if we allocate significantly more rows in the matrix than needed for the number of polynomial constraints, those rows will be filled with 0s and still be used in the process (as each column describes a wire in the circuit), adding overhead. Similarly, an upward trend in memory cost is apparent as the logarithmic size of the circuit matrix increases. This is expected, as the entire matrix representation of the circuit is generated and stored in RAM when generating the proof. Notably, the memory cost escalates more significantly than the runtime, reaching 148 GB for a matrix logarithmic size of 25. This finding underscores the intricate considerations involved in managing memory resources when dealing with larger circuit matrices.

Generalisation beyond GPT architectures

In this section, we present the empirical results of our investigation into the relationship between a model's architecture, its number of parameters and the number of constraints generated within zkSNARK proofs. The number of constraints directly reflects the proving costs. It is thus crucial in our context to understand how generalizable our results are to other models beyond generative LLMs.

Experimental Setup. To assess the generality of the parameter-constraint relationship, we designed experiments based on a model configuration previously examined by Modulus Labs (Labs 2023). This configuration consists of multiple Multi-Layer Perceptron (MLP) layers, each comprising a linear layer, the Rectified Linear Unit (ReLU) activation function, and a scaling-down factor. We selected this setup to assess whether the observed parameter-constraint correlation extends to transformer models like nanoGPT.

Notably, transformers such as nanoGPT exhibit a pronounced increase in the ratio of circuit constraints (M) to model parameters (N). For instance, with an embedding size of 64, the M/N ratio approximates 64. In practical terms, a one-million-parameter model generates a staggering 64 million constraints, while a 250,000-parameter model still generates 16 million constraints. In stark contrast, a four-layer convolutional network with 3,047 parameters generates merely 13,152 constraints, yielding an M/N ratio of approximately 4.

Factors Impacting Constraint Generation. We recognize that the time and memory costs associated with zkSNARK proof generation depend not only on the number of parameters but also on various architectural and methodological factors. Specifically, the M/N ratio within a ZK-circuit is influenced by 1) the specific gates and constraints chosen to represent neural network layers; 2) the network architecture, as we have seen from the difference between transformers and other types of networks; and 3) the chosen proof system and its embedded features, such as lookup arguments, which may yield fewer constraints than systems without them.

Addressing the challenge of minimizing this M/N ratio in zkSNARK proof generation remains an open area of research and exploration. Our findings underscore the need for innovative architectural designs and proof methodologies to enhance the efficiency and practicality of our cryptographic solution for fairness in machine learning models.

Conclusion

This work introduced a pioneering approach that leverages Zero-Knowledge Machine Learning (ZKML) to address the critical issues of performance fairness and reliability in generative AI models. By harnessing zkSNARKs, our protocol empowers users to confidently engage with AI models through remote cloud APIs, ensuring that the model they interact with aligns precisely with their expectations.

The practical viability of our proposal is demonstrated through "snarkGPT", a verifiable ZKML pipeline for GPT2-like models. Our empirical evaluations indicate the effectiveness of our approach, particularly in scaling up the model and its generalization beyond GPT architectures. We showed how snarkGPT could be inserted in an end-to-end protocol where the user obtains a proof of correct execution alongside the model's regular output and verifies it in constant time, regardless of the input size or model complexity.

This paper lays the foundation for a new era of transparent and reliable AI, emphasizing ZKML as a vital component in guaranteeing quality and achieving performance fairness. By leveraging ZKML, we bridge the gap between AI model providers and users, ensuring that AI is a truly equitable and
Results Our empirical findings reveal a striking contrast transparent tool for all.
in constraint generation between nanoGPT and the Modu-
lus Labs MLP model. Specifically, while the number of con-
straints (M) in the zkSNARK proof for the Modulus Labs
model roughly aligns with the number of parameters (N)
in the model, our nanoGPT model exhibits a significantly
higher M / N ratio, ranging from approximately 58X to 85X
more constraints than parameters. These observations are
summarized in Table 5.
Table 2: Runtime during proof generation for nanoGPT configured with different embedding sizes.

Embedding size                64      80      96      112      128     144
Runtime for proof generation  5m 2s   6m 7s   8m 33s  11m 46s  16m 9s  20m 28s
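As a rough, back-of-the-envelope reading of Table 2 (purely illustrative, not part of the snarkGPT pipeline), one can estimate how proving runtime grows with embedding size by fitting a power law between the smallest and largest configurations:

```python
import math

# Proof-generation runtimes from Table 2, in seconds, keyed by embedding size.
runtimes = {64: 302, 80: 367, 96: 513, 112: 706, 128: 969, 144: 1228}

# Assume runtime ~ (embedding size)^k and estimate k from the endpoints.
k = math.log(runtimes[144] / runtimes[64]) / math.log(144 / 64)
print(f"runtime grows roughly as (embedding size)^{k:.2f}")
```

The estimated exponent lies between 1 and 2, i.e. runtime grows superlinearly but sub-quadratically in the embedding size over this range.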
Table 3: Runtime during proof generation for nanoGPT configured with different layer configurations.

Number of layers              4      8       12       16      20
Runtime for proof generation  5m 2s  8m 30s  12m 19s  17m 4s  29m 9s
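A quick arithmetic check on the figures in Table 3 (again illustrative only) shows that the marginal proving cost of each additional layer is itself growing, which is consistent with the sharp jump at 20 layers:

```python
# Proof-generation runtimes from Table 3, in seconds, keyed by layer count.
runtimes = {4: 302, 8: 510, 12: 739, 16: 1024, 20: 1749}

# Average extra proving time per additional layer between consecutive configs.
sizes = sorted(runtimes)
marginal = {(a, b): (runtimes[b] - runtimes[a]) / (b - a)
            for a, b in zip(sizes, sizes[1:])}
for (a, b), cost in marginal.items():
    print(f"{a} -> {b} layers: +{cost:.0f} s per extra layer")
```

The per-layer increment roughly triples between the 4-to-8-layer step and the 16-to-20-layer step.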
Table 4: Time and memory costs incurred during proof generation for nanoGPT for different sizes of the circuit matrix.

Log size of the circuit matrix    14      16      18      20      22      24      25
Runtime for proof generation      3m 55s  3m 26s  3m 11s  3m 34s  4m 40s  9m 53s  14m 26s
Memory cost for proof generation  44 GB   45 GB   46 GB   49 GB   63 GB   118 GB  148 GB
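The zero-row padding effect discussed in the runtime analysis can be sketched numerically. In a Halo2-style proof system the circuit matrix has 2^k rows, so any rows beyond those the constraints actually occupy are zero-filled yet still committed to. The `used_rows` figure below is hypothetical, chosen only to illustrate how quickly the padded fraction grows with k:

```python
# Illustrative sketch (not the paper's code): in a Halo2-style system the
# circuit matrix has 2**k rows; rows beyond those the constraints occupy are
# zero-padded but still processed. 'used_rows' is an assumed figure.
used_rows = 3_000_000  # hypothetical number of rows occupied by constraints

for k in (22, 24, 25):
    total_rows = 2 ** k
    padded = total_rows - used_rows
    print(f"k={k}: {padded / total_rows:.0%} of rows are zero padding")
```

Under this assumption, stepping from k=22 to k=25 pushes the padded share from under a third of the matrix to over nine-tenths of it, which mirrors the runtime and memory jumps in Table 4.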
Table 5: The results of generating a zkSNARK proof for our nanoGPT model versus the Modulus Labs model.

Model     Number of parameters  Runtime   Memory cost  Number of constraints
nanoGPT   0.2 M                 5m 2s     63 GB        17 M
          0.31 M                6m 7s     76 GB        24 M
          0.45 M                8m 33s    108 GB       34 M
          0.79 M                11m 46s   132 GB       45 M
          1.01 M                16m 9s    173 GB       58 M
MLP       0.2 M                 ~1m 30s   ~26 GB       0.2 M
          0.31 M                ~1m 30s   ~26 GB       0.3 M
          0.45 M                ~1m 30s   ~26 GB       0.45 M
          0.79 M                ~1m 30s   ~26 GB       0.8 M
          1.01 M                ~1m 30s   ~26 GB       1 M
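The M / N contrast in Table 5 can be checked directly from the reported figures (a simple reading of the table, nothing more):

```python
# (parameters N, constraints M) for nanoGPT, both in millions, from Table 5.
nanogpt = [(0.2, 17), (0.31, 24), (0.45, 34), (0.79, 45), (1.01, 58)]

ratios = [m / n for n, m in nanogpt]
print([round(r) for r in ratios])  # ratios range from roughly 57x to 85x
```

For the Modulus Labs MLP the same ratio stays close to 1 across all five sizes, which is exactly the contrast highlighted in the Results discussion.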
References

The halo2 Book. https://zcash.github.io/halo2/. Accessed: 2023-01-25.
Ba, J. L.; Kiros, J. R.; and Hinton, G. E. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
Barocas, S.; Hardt, M.; and Narayanan, A. 2017. Fairness in machine learning. NIPS tutorial, 1: 2.
Bitansky, N.; Canetti, R.; Chiesa, A.; Goldwasser, S.; Lin, H.; Rubinstein, A.; and Tromer, E. 2017. The hunting of the SNARK. Journal of Cryptology, 30(4): 989–1066.
Bowe, S.; Grigg, J.; and Hopwood, D. 2019. Recursive Proof Composition without a Trusted Setup. Cryptology ePrint Archive, Paper 2019/1021. https://eprint.iacr.org/2019/1021.
Butler, A. 2023. Building Trust in AI with Zero-knowledge proofs (ZKP).
Camuto, A. D.; and Morton, J. 2022. EZKL. https://github.com/zkonduit/ezkl. Accessed: 2023-06-19.
Camuto, A. D.; and Morton, J. 2023. What is EZKL? https://docs.ezkl.xyz/. Accessed: 2023-06-19.
Chu, L.; Wang, L.; Dong, Y.; et al. 2021. Fedfair: Training fair models in cross-silo federated learning. arXiv preprint arXiv:2109.05662.
Dao, T.; Fu, D. Y.; Ermon, S.; Rudra, A.; and Ré, C. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv:2205.14135.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
Deng, Y.; Kamani, M. M.; and Mahdavi, M. 2020. Distributionally robust federated averaging. In NeurIPS, volume 33, 15111–15122.
Du, W.; Xu, D.; Wu, X.; and Tong, H. 2021. Fairness-aware agnostic federated learning. In SDM, 181–189. SIAM.
Feng, B.; Qin, L.; Zhang, Z.; Ding, Y.; and Chu, S. 2021. ZEN: An optimizing compiler for verifiable, zero-knowledge neural network inferences. Cryptology ePrint Archive.
Fiat, A.; and Shamir, A. 1987. How to prove yourself: Practical solutions to identification and signature problems. In Advances in Cryptology—CRYPTO'86: Proceedings 6, 186–194. Springer.
Gabizon, A.; Williamson, Z. J.; and Ciobotaru, O. 2019. Plonk: Permutations over lagrange-bases for oecumenical noninteractive arguments of knowledge. Cryptology ePrint Archive.
Gálvez, B. R.; Granqvist, F.; van Dalen, R.; and Seigel, M. 2021. Enforcing fairness in private federated learning via the modified method of differential multipliers. In NeurIPS Workshop Privacy in Machine Learning.
Groth, J. 2016. On the Size of Pairing-based Non-interactive Arguments. Cryptology ePrint Archive, Paper 2016/260. https://eprint.iacr.org/2016/260.
Jiang, M.; Roth, H. R.; Li, W.; Yang, D.; Zhao, C.; Nath, V.; Xu, D.; Dou, Q.; and Xu, Z. 2023. Fair Federated Medical Image Segmentation via Client Contribution Estimation. arXiv:2303.16520 [cs].
Kalai, A. T.; Kalai, E.; Lehrer, E.; and Samet, D. 2010. A commitment folk theorem. Games and Economic Behavior, 69(1): 127–137.
Kang, D.; Hashimoto, T.; Stoica, I.; and Sun, Y. 2022. Scaling up Trustless DNN Inference with Zero-Knowledge Proofs.
Karpathy, A. 2020. karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (generative pretrained transformer) training. https://github.com/karpathy/minGPT. Accessed: 2023-06-19.
Karpathy, A. 2022. karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. https://github.com/karpathy/nanoGPT. Accessed: 2023-06-19.
Kate, A.; Zaverucha, G. M.; and Goldberg, I. 2010. Constant-size commitments to polynomials and their applications. In Advances in Cryptology—ASIACRYPT 2010: 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 5-9, 2010. Proceedings 16, 177–194. Springer.
Labs, M. 2023. The Cost of Intelligence. https://drive.google.com/file/d/1tylpowpaqcOhKQtYolPlqvx6R2Gv4IzE/view. Accessed: 2023-06-19.
Lee, S.; Ko, H.; Kim, J.; and Oh, H. 2020. vCNN: Verifiable convolutional neural network based on zk-SNARKs. Cryptology ePrint Archive.
Li, T.; Hu, S.; Beirami, A.; and Smith, V. 2021. Ditto: Fair and robust federated learning through personalization. In ICML, 6357–6368.
Li, T.; Sanjabi, M.; Beirami, A.; and Smith, V. 2020. Fair resource allocation in federated learning. In ICLR.
Liu, T.; Xie, X.; and Zhang, Y. 2021. zkCNN: Zero Knowledge Proofs for Convolutional Neural Network Predictions and Accuracy. Cryptology ePrint Archive, Paper 2021/673. https://eprint.iacr.org/2021/673.
Mohri, M.; Sivek, G.; and Suresh, A. T. 2019. Agnostic federated learning. In International Conference on Machine Learning, 4615–4625. PMLR.
Nitulescu, A. 2020. zk-SNARKs: A Gentle Introduction. Technical report.
Nori, H.; King, N.; McKinney, S. M.; Carignan, D.; and Horvitz, E. 2023. Capabilities of GPT-4 on medical challenge problems. arXiv preprint arXiv:2303.13375.
OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774.
Pentyala, S.; Neophytou, N.; Nascimento, A.; De Cock, M.; and Farnadi, G. 2022. PrivFairFL: Privacy-preserving group fairness in federated learning. arXiv preprint arXiv:2205.11584.
Petkus, M. 2019. Why and How zk-SNARK Works. arXiv:1906.07221.
Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.; et al. 2018. Improving language understanding by generative pre-training.
Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I.; et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8).
Sonos. 2020. Tract: Tiny, no-nonsense, self-contained, TensorFlow and ONNX inference. Accessed: 2023-06-19.
Weng, J.; Weng, J.; Tang, G.; Yang, A.; Li, M.; and Liu, J. 2022. pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing. CoRR, abs/2201.09186.
Zhang, D. Y.; Kou, Z.; and Wang, D. 2020. FairFL: A fair federated learning approach to reducing demographic bias in privacy-sensitive classification models. In Big Data, 1051–1060. IEEE.