A Fingerprint for Large Language Models

Zhiguang Yang and Hanzhou Wu
School of Communication & Information Engineering, Shanghai University, Shanghai 200444, China
[email protected], [email protected]
Corresponding author: Hanzhou Wu (contact email: [email protected])

arXiv:2407.01235v1 [cs.CR] 1 Jul 2024
Abstract—Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to become a hot research topic in the field of artificial intelligence. However, due to the resource-intensive nature of training LLMs from scratch, it is urgent and crucial to protect the intellectual property of LLMs against infringement. This has motivated the authors of this paper to propose a novel black-box fingerprinting technique for LLMs, which requires neither model training nor model fine-tuning. We first demonstrate that the outputs of LLMs span a unique vector space associated with each model. We model the problem of ownership authentication as the task of evaluating the similarity between the victim model's space and the output space of the suspect model. To deal with this problem, we propose two solutions: the first verifies whether the outputs of the suspected large model lie in the same space as those of the victim model, enabling rapid identification of model infringement, and the second reconstructs the union of the vector spaces of the LLM outputs and the victim model to address situations where the victim model has undergone Parameter-Efficient Fine-Tuning (PEFT) attacks. Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks. This work reveals inherent characteristics of LLMs and provides a promising solution for ownership verification of LLMs in black-box scenarios, ensuring efficiency, generality and practicality.

Index Terms—Intellectual property protection, large language model, fingerprint, security, Transformer, deep learning.

I. INTRODUCTION

Large Language Models (LLMs) have emerged as a cornerstone of modern artificial intelligence, exhibiting remarkable performance across various natural language processing tasks due to their capability to generate human-like texts and comprehend human language. Despite the substantial data samples and computational resources required to train LLMs, numerous developers continue to advance research by open-sourcing their models. Many organizations and researchers, including those behind well-known models such as LlaMA, Gemma and Mistral [1]–[4], have released their well-trained LLMs to the public. The thriving universe of LLMs is driven by its entirely open-source nature. However, malicious users may exploit developed LLMs for illegal purposes, such as fine-tuning with another pre-trained model without attribution or stealing a model and claiming it as their own asset. Therefore, safeguarding the intellectual property rights of LLMs is essential, not solely to maintain commercial value but also to promote the sustainable development of the open-source community.

Numerous studies have employed digital watermarking and fingerprinting methods to protect deep neural network (DNN) models. Uchida et al. [5] first introduced a regularization term to constrain the network weights for embedding watermarks. Subsequently, scholars have proposed various methodologies to embed watermarks into the model parameters in white-box scenarios, where the extractor can access the entirety of the model parameters [6]–[8]. However, the challenge of accessing all the model parameters in various scenarios has led to a focus on black-box techniques for model watermarking. Many watermarking methods utilize backdoor techniques, constructing specific input-output mappings and observing the output of the DNN model for verification [9]–[11]. Generative models, particularly image processing models, often produce contents with high entropy and sufficient information capacity to accommodate additional watermark information, which remains highly imperceptible. Wu et al. [12] introduced a watermark framework to make the output contents of the model contain a certain watermark. Lukas et al. [13] proposed embedding watermarks by fine-tuning the image generator, ensuring that all images produced are watermarked. Fernandez et al. [14] extended this to diffusion models. Embedding watermarks through backdoor and fine-tuning techniques compromises the primary functionality of the model to a certain extent and requires significant computational resources. Song et al. [15] distinguished different generative models based on their artifacts and fingerprints, which can help alleviate this problem.

The emergence of superior reasoning capabilities in LLMs, which require significant computational overhead, poses new challenges for their protection. Xu et al. [16] specified a confidential private key and embedded it as an instructional backdoor, serving as a fingerprint. Zeng et al. [17] utilized the internal parameters of the Transformer [18] as a fingerprint to identify LLMs. Existing methods typically require white-box access or fine-tuning to verify the copyright information. In contrast, our proposed fingerprinting method can be implemented in a black-box scenario without any fine-tuning.

We present a novel fingerprinting method for LLMs that analyzes the output of the LLMs for model authentication. LLMs generate semantically coherent and reasonable texts by sampling from logits, which themselves contain substantial model-related information.
In black-box scenarios, LLM providers often offer complete or partial logits vectors, enabling users to apply various sampling methods to generate realistic contents. Carlini et al. [19] demonstrated the ability to extract portions of a model solely through API access. We implement LLM fingerprinting from a novel perspective, identifying unique attributes of each LLM by analyzing its logits output. Previous works have shown that the output of LLMs resides in a linear subspace defined by their parameters [20]–[22]. We retain the parameters of the victim model, specifically the last linear layer. By querying the suspect model and obtaining its output, we use the retained parameters to determine the unique attribution of the suspect model. We cast ownership authentication as comparing the similarity between the victim model's vector space and the suspect model's output space. We first propose a method to swiftly ascertain whether the output originates from the victim model by determining its compatibility with the vector space formed by the retained parameters. To address scenarios involving Parameter-Efficient Fine-Tuning (PEFT) attacks, we develop an alignment-verification method to determine if the suspect model was derived through PEFT by comparing similarities. Furthermore, through API access, we are able to reconstruct complete logits from partial ones to verify ownership. Our method enables ownership verification in black-box scenarios solely through API access and does not depend on the specific structure of LLMs. This ensures generalizability and provides an effective and promising approach for copyright protection of LLMs. Our experiments demonstrate that the proposed method achieves superior verification performance and robustness against PEFT attacks, without any compromise in model functionality.

In summary, the main contributions of this work include:

• By analyzing the characteristics of LLMs from a novel perspective, we demonstrate that their outputs can serve as fingerprints for ownership verification.

• We propose two distinct methods for implementing ownership verification of LLMs: one quickly determines whether the output originates from the victim model, and the other verifies whether the suspected model has been derived from the victim model through a PEFT attack.

• We present a method to recover complete information for fingerprint verification by obtaining partial logits through API access.

The remainder of this paper is organized as follows: Section II provides an overview of fingerprinting, PEFT and the threat model. Section III explains the LLM fingerprinting method. In Section IV, we detail the two proposed ownership verification methods, followed by our experimental results and analysis in Section V. Finally, we conclude this paper in Section VI.

II. PRELIMINARIES

A. Fingerprinting

The model provider releases the model Mθ, including its complete structure and parameters. Once a model is released, it may be stolen by malicious users who may either directly steal it or fine-tune it and claim ownership. We protect the legitimate rights of the model publisher by analyzing distinctive features found in the parameters or output of the model as fingerprints. In this study, we utilize the output of LLMs as fingerprints for model verification.

B. Parameter-Efficient Fine-Tuning

Training or fine-tuning DNNs based on pre-training requires substantial resources, particularly for LLMs, which demand significant GPU memory. Recent research focuses on efficient fine-tuning methods that achieve optimization with fewer parameters [23]–[25]. Parameter-Efficient Fine-Tuning (PEFT) is used to fine-tune a model with minimal parameters. Low-Rank Adaptation (LoRA) [24] has become the de facto method for PEFT, serving as the foundation for many other approaches such as [25], [26]. In LoRA, the weight matrix W_O is updated by the formula W_N = W_O + ΔW = W_O + AB. During training, W_O remains fixed, while the two low-rank matrices A and B contain the few trainable parameters. This study uses LoRA to mimic the PEFT attack.
API access.
each containing a multi-head self-attention mechanism and a
The remainder of this paper is organized as follows: Section feed-forward layer, to calculate the intrinsic representation of
II provides an overview of fingerprinting, PEFT and the threat the input and produce the intermediate representation z. The
model. Section III explains the LLM fingerprinting method. In last linear layer maps z to logits s ∈ R|V| . The output of
Section IV, we detail the two proposed ownership verification LLMs is generated by sampling from the logits s. For clarity,
methods, followed by our experimental results and analysis in this analysis excludes Layer Normalization [27] and RMS
Section V. Finally, we conclude this paper in Section VI. Normalization [28], as they primarily introduce additional
II. P RELIMINARIES multiplication terms, which do not affect the conclusions.

A. Fingerprinting
The model provider releases the model Mθ , including its
complete structure and parameters. Once a model is released, it
may be stolen by malicious users who may either directly steal Fig. 1. The illustration of the LLMs pipeline.
It is worth noting that the logits satisfy s = Wz, where W ∈ R^(|V|×h) is the weight matrix of the last linear layer, so the rank of W is at most h. Every output produced through W therefore corresponds to a vector s that always lies within a subspace of R^|V| spanned by the columns of W, which is at most h-dimensional. All possible logits outputs of the LLM span a vector space L isomorphic to the space spanned by the columns of W, since they have the same dimensionality. Consequently, each LLM is associated with a distinctive vector space L. This uniqueness arises because the vector space R^|V| encompasses an exceedingly large number of potential subspaces of dimension h. For example, in the case of Gemma [3], with h = 2048 and |V| = 256000, the number of such subspaces is extremely large. Due to variations in training initialization, datasets, configurations and hardware, it is practically impossible for two different LLMs to cover the same vector space. This property allows us to use the space as a fingerprint for LLMs and identify their ownership. The model provider only needs to retain the parameters of the last linear layer of the model. Accessing the API of the suspected model enables them to retrieve the logits, thereby verifying the ownership.
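As a quick numerical illustration of the claim that all logits lie in the (at most h-dimensional) column space of W, the following NumPy sketch stacks many simulated logits vectors and checks their numerical rank; W and the hidden states are random stand-ins for a real model, and the sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, h = 1000, 32                     # toy vocabulary and hidden sizes (assumptions)
W = rng.normal(size=(V, h))         # stand-in for the last linear layer of one model

Z = rng.normal(size=(h, 500))       # 500 hidden states from arbitrary queries
S = W @ Z                           # the corresponding logits, each a vector in R^V

print(np.linalg.matrix_rank(S))     # at most h (here 32), far below V
```

A second model with its own last layer would generically produce logits spanning a different 32-dimensional subspace of R^V, which is what makes the span usable as a fingerprint.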
B. Vector Space Reconstruction via API

We have already demonstrated that the logits outputs of an LLM span a vector space denoted as L, which we then utilize as the fingerprint. Obtaining the full logits of a model is not always feasible, as attackers aim to disclose minimal information to evade detection by the victim. In this section, we investigate practical scenarios in which our approach reconstructs the vector space for fingerprint verification while relying only on API access, covering the retrieval of complete vocabulary probabilities, top-k probabilities, or the top-1 probability.

1) Complete probabilities of the vocabulary: The API provides the complete probabilities p over the vocabulary, where p = softmax(s), all elements of which are non-negative and sum to 1. This means that p is a point in the simplex Δ^(|V|−1), which lies within a (|V| − 1)-dimensional subspace of R^|V|. Additionally, p is also constrained by s. Due to normalization, the softmax function does not have a well-defined inverse transformation. However, if we omit the normalization temporarily and use the CLR (centered log-ratio) transformation, we can obtain p′, which differs from the original logits only by a constant bias. From the perspective of spatial dimensions, this introduces a one-dimensional deviation. We can manually apply a one-dimensional correction, which does not affect the results, as h and |V| are both significantly greater than 1. Therefore, we directly reconstruct the vector space P′ spanned by p′. The value of p′ is computed using Equation (1), where g(·) represents the geometric mean of the input.

p′ = CLR(p) = log(p / g(p))    (1)
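A minimal sketch of the CLR recovery in Equation (1), assuming the API returns full probability vectors; the toy logits are an assumption used only to check that the transform recovers the logits up to a constant shift.

```python
import numpy as np

def clr(p, eps=1e-12):
    """Centered log-ratio transform: log(p / geometric_mean(p))."""
    logp = np.log(p + eps)
    return logp - logp.mean()       # subtracting the mean log equals dividing by the geometric mean

rng = np.random.default_rng(0)
s = rng.normal(size=10)             # hidden logits of a toy model
p = np.exp(s) / np.exp(s).sum()     # what the API would return

p_prime = clr(p)
print(np.allclose(p_prime, s - s.mean()))   # True: p' equals the logits up to a constant offset
```

The leftover constant offset is exactly the one-dimensional deviation discussed above, which is later absorbed by the correction of Equation (6).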
2) Top-k probabilities: The API provides the top-k probabilities, which are the k largest elements of p. In this scenario, we assume that the provider allows the user to alter token probabilities using a logit bias through the API. This is reasonable and common in practice; for example, OpenAI includes it in their API.¹ The bias is added to the logits of the specified tokens before the softmax operation, and the API returns the top-k probabilities. Without loss of generality, we assume that 1, 2, . . . , m are the indices of the selected tokens, and the biased probability distribution p^b is calculated by Equation (2).

p^b = softmax(s^b), where s^b_i = s_i + b if i ∈ {1, . . . , m}, and s^b_i = s_i otherwise.    (2)

To address this scenario, we propose a method to recover the complete probabilities p. For a given prompt to the model, we can obtain the top-k probabilities. Suppose the reference token has the highest probability without bias, denoted as the reference probability p_ref. We then add a large bias b to another k − 1 tokens, pushing them into the top-k. This gives us the biased probabilities p^b_i for those k − 1 tokens and the biased probability p^b_ref for the reference token. Finally, we can calculate the unbiased probabilities p_i for the k − 1 tokens using Equation (3). The complete probabilities p can be obtained by repeatedly querying the model with the same prompt and adding the bias b to all tokens in the vocabulary, k − 1 tokens at a time.

p_i = (p^b_i / p^b_ref) × p_ref    (3)

¹ https://help.openai.com/en/articles/5247780-using-logit-bias-to-alter-token-probability-with-the-openai-api
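The following toy simulation sketches this ratio-based recovery against a stand-in softmax "API"; the API wrapper, vocabulary size and bias value are assumptions made for illustration. Note that the sketch explicitly divides out the e^b factor contributed by the bias, so that the recovered values match the unbiased probabilities exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=50)                        # hidden logits of a toy "suspect model"
p = np.exp(s) / np.exp(s).sum()                # true probabilities we want to recover

def api_top_k(logit_bias, k=5):
    """Toy stand-in for an API that adds a logit bias before softmax and returns top-k probabilities."""
    pb = np.exp(s + logit_bias)
    pb /= pb.sum()
    return {int(i): pb[i] for i in np.argsort(pb)[-k:]}

k, b = 5, 20.0
unbiased = api_top_k(np.zeros_like(s), k)
ref = max(unbiased, key=unbiased.get)          # reference token: the unbiased top-1
p_ref = unbiased[ref]

recovered = {ref: p_ref}
others = np.array([i for i in range(s.size) if i != ref])
for group in np.array_split(others, int(np.ceil(len(others) / (k - 1)))):
    bias = np.zeros_like(s)
    bias[group] = b                            # push k-1 tokens into the top-k, as in Eq. (2)
    out = api_top_k(bias, k)                   # contains the biased group plus the reference token
    for i in group:
        # ratio-based recovery in the spirit of Eq. (3); exp(-b) removes the factor added by the bias
        recovered[int(i)] = out[int(i)] / out[ref] * p_ref * np.exp(-b)

p_hat = np.array([recovered[i] for i in range(s.size)])
print(np.allclose(p_hat, p))                   # True
```

Each query recovers k − 1 unbiased probabilities, so a vocabulary of size |V| needs roughly |V| / (k − 1) queries per prompt.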
3) Top-1 probability: The API yields only the highest probability for a single output, akin to the top-k probabilities with k set to 1. In this scenario, we present a method to recover the complete probabilities p. Suppose we introduce a substantial bias b to token i, thereby elevating it to the top position, resulting in the biased probability p^b_i. We can calculate the unbiased probability p_i for token i using Equation (4). Then, we can obtain the complete probabilities p by querying the model with the same prompt and adding the bias b to each token in the vocabulary in turn.

p_i = 1 / (e^(b − log p^b_i) − e^b + 1)    (4)

It is noteworthy that, theoretically, this method can also recover the unbiased probabilities in the top-k scenario, albeit demanding a significantly higher number of queries. However, its practical application is impeded by the presence of exponential operations, which may induce numerical instability and consequently impact subsequent results. In contrast, the method outlined in Section III-B2 solely involves proportional operations, thus significantly mitigating issues stemming from numerical instability.
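A companion sketch for the top-1 case of Equation (4), again using a toy softmax "API"; the vocabulary size and the bias value are illustrative assumptions. With the moderate bias used here the exponentials remain well-conditioned; much larger biases run into the instability noted above.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=50)                        # hidden logits of a toy model
p = np.exp(s) / np.exp(s).sum()                # true probabilities

def api_top_1(logit_bias):
    """Toy API: adds a logit bias before softmax and returns only the top-1 index and probability."""
    pb = np.exp(s + logit_bias)
    pb /= pb.sum()
    i = int(np.argmax(pb))
    return i, pb[i]

b = 20.0                                       # large enough to push any chosen token to the top
p_hat = np.zeros_like(p)
for i in range(s.size):
    bias = np.zeros_like(s)
    bias[i] = b
    j, pb_i = api_top_1(bias)
    assert j == i                              # token i is now the top-1
    p_hat[i] = 1.0 / (np.exp(b - np.log(pb_i)) - np.exp(b) + 1.0)   # Eq. (4)

print(np.allclose(p_hat, p))                   # True, at the cost of one query per token
```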
IV. OWNERSHIP VERIFICATION

A. Overview

The framework of ownership verification is illustrated in Fig. 2. We propose two distinct methods to verify the ownership of LLMs. The first method verifies whether the outputs of the suspected model occupy the same space as those of the victim model. This would mean that the suspected model and the victim model share the same last linear layer, facilitating the rapid identification of model infringement. Once the model has been fine-tuned, however, the parameters of the last linear layer have been altered, i.e., changes occur in the vector space, and this kind of verification becomes impossible. Therefore, we introduce an alignment verification method to resolve this challenge. This method calculates the dimension of the joint space formed by the output vector space of the suspected model and the parameter space of the victim model. Infringement is determined by comparing the similarity of these two spaces.

Fig. 2. The framework of ownership verification.
B. Compatibility Testing

As demonstrated in Section III-A, the outputs of LLMs span a vector space L isomorphic to the space spanned by the columns of the last linear layer W. To simplify the expression, we slightly abuse the notation by using L to also represent the space spanned by the columns of the last linear layer W in the following discussion. We retain the last linear layer of the LLM as private, serving as the fingerprint of the victim model. The suspected model is queried through the API to obtain its logits output s. For a suspected model that is stolen from the victim model and whose last linear layer has not been altered, the output s will lie within the vector space L. We can verify this by solving the equation Wx = s to determine whether s lies within the space L. If the equation has a solution, the suspected model is considered to be derived from the victim model. However, due to numerical instability introduced by floating-point calculations, we instead calculate the Euclidean distance d between s and the space L to determine whether the suspected model is derived from the victim model. The distance d is calculated by Equation (5), where x̂ is the least-squares solution of the equation Wx = s.

d = ∥s − Wx̂∥    (5)

We set an error term e to represent the error introduced by numerical instability. If d < e, this implies that s is compatible with the space L, suggesting that the suspected model is derived from the victim model. If we only have access to probabilities p and recover p′ using the method described in Section III-B, we manually correct the one-dimensional deviation by appending a constant column vector to W, obtaining W′ as shown in Equation (6).

W′ = [W, 1]    (6)

We then use W′ instead of W to verify the ownership using the same method described above.
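A minimal least-squares sketch of this compatibility test, assuming the defender holds W and has collected a logits vector s from the suspect's API; the toy sizes and the error threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, h = 2000, 64                                 # toy vocabulary / hidden sizes (assumptions)
W = rng.normal(size=(V, h))                     # retained last linear layer of the victim model

def distance_to_space(W, s):
    """Euclidean distance from s to the column space of W, Eq. (5), via least squares."""
    x_hat, *_ = np.linalg.lstsq(W, s, rcond=None)
    return np.linalg.norm(s - W @ x_hat)

s_victim = W @ rng.normal(size=h)               # a logits vector produced with the victim's own W
s_other = rng.normal(size=V)                    # a logits vector from an unrelated model

eps = 1e-6                                      # assumed error term e for numerical instability
print(distance_to_space(W, s_victim) < eps)     # True: compatible with the victim's space
print(distance_to_space(W, s_other) < eps)      # False: incompatible

# If only probabilities are available, test the CLR-recovered p' against W' = [W, 1]
# so that the one-dimensional offset of Eq. (6) is absorbed.
W_prime = np.hstack([W, np.ones((V, 1))])
```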
C. Alignment Verification

In PEFT attacks, the suspect model has undergone fine-tuning that alters its last linear layer. The attack changes the victim model's vector space L, preventing ownership verification by the method described in Section IV-B. According to the manifold hypothesis, high-dimensional data can be compressed into a low-dimensional latent space, and different tasks are governed by distinct features in that space. Therefore, fine-tuning for a specific task is assumed to impact only a portion of the space L, not the entire space. In particular, in PEFT attacks like LoRA, the model is fine-tuned with a low-rank matrix: the parameters are updated by W_N = W_O + ΔW, where ΔW is a matrix whose rank is no more than k. If there is substantial overlap between the output space L of the suspect model and the space spanned by the columns of the victim model's W, it indicates that the suspect model closely resembles the victim model and may have been derived through unauthorized replication.

In practice, we calculate the dimension of the space formed by the union of the output vector space of the suspect model and the parameter space of the victim model, denoted as L_sum. If the dimension of L_sum is close to that of L, it indicates that the suspect model is derived from the victim model; otherwise, it is not. For matrices containing numerous floating-point numbers, directly calculating the rank is not advisable, as it can lead to significant errors due to numerical inaccuracies. Instead, we introduce Algorithm 1 to calculate the dimension difference Δr between L_sum and L.

Algorithm 1: Pseudocode for dimension difference calculation
Input: W: the parameter matrix of the last linear layer in the victim model; M: the suspected model; Q: the query set; N: the least number of samples; e: the error term.
Output: Δr: the dimension difference.
 1: Initialize Δr = 0, n = 0, S = ∅.
 2: while n ≤ N do
 3:   Randomly sample a query q from Q
 4:   Get the logits outputs O = {s_1, s_2, . . .} by querying the suspected model M with q
 5:   S ← S ∪ O
 6:   n = size(S)
 7: end while
 8: for i = 1, 2, . . . , n do
 9:   Solve Wx_i = s_i to obtain x̂_i
10:   Call Eq. (5) to determine d_i
11:   if d_i > e then
12:     Δr = Δr + 1
13:   end if
14: end for
15: return Δr

Once Δr is obtained, it can be used to determine whether the suspect model is derived from the victim model. If Δr is smaller than the hidden size h by an order of magnitude or more, the suspect model is indeed derived from the victim model. In probability-access scenarios, Δr will only experience a numerical disturbance of one, which will not affect our results.
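The sketch below gives a hypothetical NumPy rendering of the dimension-difference idea: it counts how many suspect outputs add a new direction beyond the victim's column space, appending each such output to the working basis so that the same extra direction is not counted twice. The appending step and all sizes are this sketch's assumptions, not a literal transcription of Algorithm 1.

```python
import numpy as np

def dimension_difference(W, suspect_logits, eps=1e-6):
    """Estimate Δr = dim(L_sum) − dim(L): the number of suspect outputs that lie
    outside the span of the victim's last layer and of previously counted outputs."""
    basis = np.array(W, dtype=float)
    delta_r = 0
    for s in suspect_logits:
        x_hat, *_ = np.linalg.lstsq(basis, s, rcond=None)
        if np.linalg.norm(s - basis @ x_hat) > eps:     # residual test of Eq. (5)
            delta_r += 1
            basis = np.column_stack([basis, s])         # grow the basis by the new direction
    return delta_r

# Toy check: a rank-r update of the last layer should yield Δr close to r.
rng = np.random.default_rng(0)
V, h, r = 1500, 64, 8                                   # assumed toy sizes and LoRA rank
W = rng.normal(size=(V, h))                             # victim's last layer
W_peft = W + rng.normal(size=(V, r)) @ rng.normal(size=(r, h))   # low-rank tampering
S = (W_peft @ rng.normal(size=(h, 50))).T               # 50 logits vectors from the suspect

print(dimension_difference(W, S))                       # ≈ r (here 8), far below h
```

This mirrors the behaviour reported in Table II, where models fine-tuned on the last layer produce Δr values on the order of the LoRA rank rather than of the hidden size.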
V. EXPERIMENTAL RESULTS AND ANALYSIS

A. Experimental Setup

We employed Gemma [3] with a hidden size of 2048 as the victim model and fine-tuned it with LoRA [24] as the attack method. The LoRA module was applied either to the query, key and value modules in the attention mechanism [18] or to the linear module in the last layer. The ranks of LoRA were set to 16, 32 and 64. The models were fine-tuned on the Alpaca dataset [29] and the SAMSum dataset [30] to simulate different scenarios. Additionally, we also compared against a newer version termed Gemma-2, which shares the same structure but was trained with a novel training method, leading to substantial improvements and completely different outputs for identical inputs.
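For readers who want to reproduce a PEFT attack of this kind, a hypothetical configuration sketch using the Hugging Face peft library is shown below; the checkpoint name, the attention module names and all hyperparameters are assumptions for illustration, not the authors' exact settings.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; Gemma 2B has a hidden size of 2048, matching the setup above.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

config = LoraConfig(
    r=16,                                            # LoRA rank (the paper uses 16, 32 and 64)
    lora_alpha=32,                                    # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj"],    # 'qkv' variant; assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()                    # only the low-rank adapters are trainable
```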
B. Compatibility Testing Results

In the compatibility testing, we assume that malicious users might release their stolen model either in its original form or after fine-tuning it, where the fine-tuning targets either the attention mechanism in the intermediate layers or the last linear layer. We generated (or recovered) 300 complete logits (or probabilities) through random queries and calculated the average Euclidean distance d. We retained the parameters of Gemma's last linear layer and conducted verification experiments across various models. The results of the compatibility testing method are presented in Table I. Each column corresponds to the results of d under a different API scenario, while each row represents the outcomes for a different model. In the table, 'qkv' and 'linear' indicate that the LoRA module was applied to the query, key and value modules in the attention mechanism or to the linear module in the last layer, respectively, with the suffixes indicating the rank of LoRA. The experimental results clearly reveal that a significant difference exists between the models fine-tuned on the last layer and all other models. This distinction is adequate for ascertaining ownership, verifying the effectiveness of the proposed method. It is worth noting that although these are average results, this consistency holds even with only a few samples in practice, thus also enabling rapid verification.

TABLE I
COMPATIBILITY TESTING RESULTS ON VARIOUS MODEL VERSIONS AND MODELS FINE-TUNED WITH DIFFERENT MODULES AND RANKS.

Model     | full logits | full probabilities | top-5 probabilities | top-1 probability
Gemma     | 8.0 × 10^-5 | 3.4 × 10^-3        | 3.4 × 10^-3         | 8.3 × 10^-5
Gemma-2   | 5.5 × 10^5  | 5.5 × 10^5         | 5.5 × 10^5          | 5.5 × 10^5
SAMSum
qkv-16    | 9.7 × 10^-5 | 1.4 × 10^-4        | 1.4 × 10^-4         | 1.0 × 10^-4
qkv-32    | 8.5 × 10^-5 | 3.6 × 10^-3        | 3.6 × 10^-3         | 9.6 × 10^-5
qkv-64    | 1.1 × 10^-4 | 9.5 × 10^-4        | 9.5 × 10^-4         | 9.9 × 10^-5
linear-16 | 7.3 × 10^5  | 7.2 × 10^5         | 7.2 × 10^5          | 7.2 × 10^5
linear-32 | 6.9 × 10^5  | 6.8 × 10^5         | 6.8 × 10^5          | 6.9 × 10^5
linear-64 | 7.6 × 10^5  | 7.5 × 10^5         | 7.5 × 10^5          | 7.5 × 10^5
Alpaca
qkv-16    | 8.3 × 10^-5 | 5.7 × 10^-4        | 5.7 × 10^-4         | 8.4 × 10^-4
qkv-32    | 7.0 × 10^-5 | 2.8 × 10^-3        | 2.8 × 10^-3         | 8.5 × 10^-5
qkv-64    | 6.8 × 10^-5 | 1.6 × 10^-4        | 1.6 × 10^-4         | 8.1 × 10^-5
linear-16 | 6.8 × 10^5  | 6.7 × 10^5         | 6.7 × 10^5          | 6.7 × 10^5
linear-32 | 6.7 × 10^5  | 6.6 × 10^5         | 6.6 × 10^5          | 6.6 × 10^5
linear-64 | 6.9 × 10^5  | 6.8 × 10^5         | 6.8 × 10^5          | 6.9 × 10^5

TABLE II
DIMENSION DIFFERENCE RESULTS ON VARIOUS MODEL VERSIONS AND MODELS FINE-TUNED WITH DIFFERENT MODULES AND RANKS.

Model     | full probabilities | top-5 probabilities | top-1 probability
Gemma     | 1   | 1   | 1
Gemma-2   | 300 | 300 | 300
SAMSum
qkv-16    | 1   | 1   | 1
qkv-32    | 1   | 1   | 1
qkv-64    | 1   | 1   | 1
linear-16 | 11  | 11  | 11
linear-32 | 12  | 12  | 11
linear-64 | 11  | 11  | 11
Alpaca
qkv-16    | 1   | 1   | 1
qkv-32    | 1   | 1   | 1
qkv-64    | 1   | 1   | 1
linear-16 | 16  | 16  | 16
linear-32 | 21  | 21  | 20
linear-64 | 22  | 22  | 21

C. Alignment Verification Results

To minimize computational and time overhead, the value of N was set to 300 in Algorithm 1, considering that the value of Δr is strictly constrained under the PEFT attacks. We obtained 300 complete probabilities through random queries and calculated Δr to evaluate the effectiveness of the alignment verification method. We preserved the parameters of Gemma's last linear layer and computed Δr across various models. Table II presents the results of the dimension difference calculation. Each row depicts the results of Δr for a different model using Algorithm 1, while each column shows the outcomes across the various API scenarios. The configuration of the PEFT attacks is the same as that in Section V-B. In the table, 'qkv' and 'linear' respectively denote that the LoRA module was applied to the query, key and value modules in the attention mechanism or to the linear module in the last layer, with the suffixes indicating the rank of LoRA. We observe that the Δr value for Gemma and for the models fine-tuned on the intermediate layers equals 1, which is the result of the CLR transformation. For the models fine-tuned on the last layer,
the Δr value is close to the rank of LoRA and much less than h, providing evidence to ascertain that the suspected model originates from the victim model. It is noteworthy that models fine-tuned on the last layer exhibit approximately equal values of Δr despite having larger ranks, which is consistent with the assumption of PEFT that the weight updates during model fine-tuning have a low intrinsic rank. The intrinsic rank is also influenced by the datasets and is reflected in the values of Δr. The experiments show that the proposed method can verify the copyright despite PEFT attacks.

Furthermore, to investigate the relationship between Gemma and Gemma-2, we obtained 3000 probability vectors from Gemma-2 and found that the value of Δr is 2052. This result implies that Gemma-2 has undergone extensive fine-tuning or training from scratch compared to Gemma, and it also confirms the uniqueness of the proposed fingerprint. Note that the value of Δr is not exactly the same as h due to the accuracy of numerical calculation, but they are of the same magnitude.

VI. CONCLUSION

In this paper, we propose a novel fingerprinting method for LLMs that is general and capable of verifying the fingerprint in a black-box scenario without requiring model training or compromising model functionality. Specifically, we analyze the model's outputs and present two methods for copyright verification. First, we introduce a compatibility testing method to swiftly ascertain whether the suspect model is identical to the victim model. To address scenarios involving PEFT attacks, we also propose an alignment verification method for copyright verification. Furthermore, we develop a reconstruction method to recover the complete information under API access. Our empirical study shows that the proposed method achieves superior verification performance and robustness against PEFT attacks. Our proposed method also reveals inherent characteristics of LLMs and offers a promising solution for advancing ownership verification of LLMs.

ACKNOWLEDGEMENT

This work was supported by the Natural Science Foundation of China under Grant Number U23B2023.

REFERENCES

[1] H. Touvron, T. Lavril, G. Izacard et al., "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023.
[2] H. Touvron, L. Martin, K. Stone et al., "Llama 2: Open foundation and fine-tuned chat models," arXiv preprint arXiv:2307.09288, 2023.
[3] G. Team, T. Mesnard, C. Hardin et al., "Gemma: Open models based on Gemini research and technology," arXiv preprint arXiv:2403.08295, 2024.
[4] A. Q. Jiang, A. Sablayrolles, A. Mensch et al., "Mistral 7B," arXiv preprint arXiv:2310.06825, 2023.
[5] Y. Uchida, Y. Nagai, S. Sakazawa et al., "Embedding watermarks into deep neural networks," in Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, 2017, pp. 269–277.
[6] J. Wang, H. Wu, X. Zhang et al., "Watermarking in deep neural networks via error back-propagation," Electronic Imaging, vol. 32, pp. 1–9, 2020.
[7] B. Darvish Rouhani, H. Chen, and F. Koushanfar, "DeepSigns: An end-to-end watermarking framework for ownership protection of deep neural networks," in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019, pp. 485–497.
[8] P. Fernandez, G. Couairon, T. Furon et al., "Functional invariants to watermark large transformers," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2024, pp. 4815–4819.
[9] Y. Adi, C. Baum, M. Cisse et al., "Turning your weakness into a strength: Watermarking deep neural networks by backdooring," in 27th USENIX Security Symposium, 2018, pp. 1615–1631.
[10] X. Zhao, H. Wu, and X. Zhang, "Watermarking graph neural networks by random graphs," in IEEE International Symposium on Digital Forensics and Security, 2021, pp. 1–6.
[11] Y. Liu, H. Wu, and X. Zhang, "Robust and imperceptible black-box DNN watermarking based on Fourier perturbation analysis and frequency sensitivity clustering," IEEE Transactions on Dependable and Secure Computing, 2024.
[12] H. Wu, G. Liu, Y. Yao, and X. Zhang, "Watermarking neural networks with watermarked images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 7, pp. 2591–2601, 2021.
[13] N. Lukas and F. Kerschbaum, "PTW: Pivotal tuning watermarking for pre-trained image generators," in 32nd USENIX Security Symposium, 2023, pp. 2241–2258.
[14] P. Fernandez, G. Couairon, H. Jégou et al., "The stable signature: Rooting watermarks in latent diffusion models," in International Conference on Computer Vision, 2023, pp. 22409–22420.
[15] H. J. Song, M. Khayatkhoei, and W. AbdAlmageed, "ManiFPT: Defining and analyzing fingerprints of generative models," arXiv preprint arXiv:2402.10401, 2024.
[16] J. Xu, F. Wang, M. D. Ma et al., "Instructional fingerprinting of large language models," arXiv preprint arXiv:2401.12255, 2024.
[17] B. Zeng, C. Zhou, X. Wang et al., "HuRef: Human-readable fingerprint for large language models," arXiv preprint arXiv:2312.04828, 2023.
[18] A. Vaswani, N. Shazeer, N. Parmar et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017.
[19] N. Carlini, D. Paleka, K. D. Dvijotham et al., "Stealing part of a production language model," arXiv preprint arXiv:2403.06634, 2024.
[20] Z. Yang, Z. Dai, R. Salakhutdinov et al., "Breaking the softmax bottleneck: A high-rank RNN language model," in International Conference on Learning Representations, 2018.
[21] M. Finlayson, J. Hewitt, A. Koller et al., "Closing the curious case of neural text degeneration," in International Conference on Learning Representations, 2024.
[22] M. Finlayson, S. Swayamdipta, and X. Ren, "Logits of API-protected LLMs leak proprietary information," arXiv preprint arXiv:2403.09539, 2024.
[23] N. Houlsby, A. Giurgiu, S. Jastrzebski et al., "Parameter-efficient transfer learning for NLP," in International Conference on Machine Learning, vol. 97, 2019, pp. 2790–2799.
[24] E. J. Hu, Y. Shen, P. Wallis et al., "LoRA: Low-rank adaptation of large language models," in International Conference on Learning Representations, 2022.
[25] T. Dettmers, A. Pagnoni, A. Holtzman et al., "QLoRA: Efficient finetuning of quantized LLMs," Advances in Neural Information Processing Systems, vol. 36, 2024.
[26] F. Meng, Z. Wang, and M. Zhang, "PiSSA: Principal singular values and singular vectors adaptation of large language models," arXiv preprint arXiv:2404.02948, 2024.
[27] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016.
[28] B. Zhang and R. Sennrich, "Root mean square layer normalization," in Advances in Neural Information Processing Systems, vol. 32, 2019.
[29] Y. Dubois, X. Li, R. Taori et al., "AlpacaFarm: A simulation framework for methods that learn from human feedback," Advances in Neural Information Processing Systems, vol. 36, 2024.
[30] B. Gliwa, I. Mochol, M. Biesek et al., "SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization," in 2nd Workshop on New Frontiers in Summarization, 2019, pp. 70–79.
