A Fingerprint For Large Language Models
arXiv:2407.01235v1 [cs.CR] 1 Jul 2024

Abstract—Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to ...

... not solely to maintain commercial value but also to promote the sustainable development of the open-source community. Numerous studies have employed digital watermarking and ...

A. Fingerprinting
The model provider releases the model Mθ, including its complete structure and parameters. Once a model is released, it may be stolen by malicious users who may either directly steal ...

Fig. 1. The illustration of the LLMs pipeline.
It is worth noting that the logits satisfy s = Wz, where W ∈ R^(|V|×h) is the weight matrix of the last linear layer, so the rank of W is at most h. Every output produced by W corresponds to a vector s that always lies within a subspace of R^|V| spanned by the columns of W, which is at most h-dimensional. All possible logits outputs of the LLM span a vector space L isomorphic to the space spanned by the columns of W, since they have the same dimensionality. Consequently, each LLM is associated with a distinctive vector space L. This uniqueness arises because R^|V| contains an exceedingly large number of potential subspaces of dimension h. For example, in the case of Gemma [3], with h = 2048 and |V| = 256000, the number of such subspaces is astronomically large. Due to variations in training initialization, datasets, configurations and hardware, it is practically impossible for two different LLMs to cover the same vector space. This property allows us to use the space as a fingerprint for LLMs and to identify their ownership. The model provider only needs to retain the parameters of the last linear layer; accessing the API of a suspected model then enables retrieval of the logits, thereby verifying ownership.

B. Vector Space Reconstruction via API

We have already demonstrated that the logits outputs of an LLM span a vector space L, which we use as the fingerprint. Obtaining the full logits of a model is not always feasible, as attackers aim to disclose minimal information to evade detection by the victim. In this section, we investigate practical scenarios in which our approach reconstructs the vector space for fingerprint verification while relying only on API access: retrieval of the complete vocabulary probabilities, the top-k probabilities, or the top-1 probability.

1) Complete probabilities of the vocabulary: The API provides the complete probabilities p over the vocabulary, p = softmax(s); all elements of p are non-negative and sum to 1. This means that p is a point in the simplex ∆^(|V|−1), which lies within a (|V| − 1)-dimensional subspace of R^|V|. Additionally, p is also constrained by s. Due to normalization, the softmax function does not have a well-defined inverse transformation. However, if we temporarily omit the normalization and apply the CLR (centered log-ratio) transformation, we can obtain p′, which differs from the original logits by a constant bias. From the perspective of spatial dimensions, this introduces a one-dimensional deviation; we can correct this one dimension manually without affecting the results, since h and |V| are both significantly greater than 1. Therefore, we directly reconstruct the vector space P′ spanned by p′. The value of p′ is computed using Equation (1), where g(·) denotes the geometric mean of its input.

p′ = CLR(p) = log(p / g(p))    (1)

2) Top-k probabilities: The API provides the top-k probabilities, i.e., the k largest elements of p. In this scenario, we assume that the provider allows the user to alter token probabilities through a logit bias in the API. This is reasonable and common in practice; for example, OpenAI includes it in their API¹. The bias is added to the logits of the specified tokens before the softmax operation, and the API returns the top-k probabilities. Without loss of generality, we assume that 1, 2, . . . , m are the indices of the selected tokens; the biased probability distribution p^b is then calculated by Equation (2).

p^b = softmax(s^b), where s^b_i = s_i + b for i ∈ {1, . . . , m}, and s^b_i = s_i otherwise    (2)

¹ https://help.openai.com/en/articles/5247780-using-logit-bias-to-alter-token-probability-with-the-openai-api

To address this scenario, we propose a method to recover the complete probabilities p. For a given prompt, we can obtain the top-k probabilities. Suppose the reference token is the one with the highest probability without bias, whose probability we denote as the reference probability p_ref. We then add a large bias b to k − 1 other tokens, pushing them into the top-k. This yields the biased probabilities p^b_i for the k − 1 tokens and the biased probability p^b_ref for the reference token. Finally, we calculate the unbiased probabilities p_i for the k − 1 tokens using Equation (3). The complete probabilities p can be obtained by repeatedly querying the model with the same prompt while applying the bias b to all tokens in the vocabulary, k − 1 at a time.

p_i = (p^b_i / p^b_ref) × p_ref    (3)

3) Top-1 probability: The API yields only the highest probability for a single output, akin to the top-k probabilities with k specifically set to 1. In this scenario, we present a method to recover the complete probabilities p. Suppose we introduce a substantial bias b to token i, elevating it to the top position and yielding the biased probability p^b_i. We can calculate the unbiased probability p_i for token i using Equation (4), and then obtain the complete probabilities p by querying the model with the same prompt and applying the bias b to each token in the vocabulary in turn.

p_i = 1 / (e^(b − log p^b_i) − e^b + 1)    (4)

It is noteworthy that, in theory, this method can also recover the unbiased probabilities in the top-k scenario, albeit at the cost of a significantly higher number of queries. Its practical application, however, is impeded by the exponential operations, which may induce numerical instability and thereby affect subsequent results. In contrast, the method outlined in Section III-B2 involves only proportional operations, which significantly mitigates issues stemming from numerical instability.

IV. OWNERSHIP VERIFICATION

A. Overview

The framework of ownership verification is illustrated in Fig. 2. We propose two distinct methods to verify the ownership of LLMs. The first method involves verifying whether the outputs of the suspected model occupy the same space as those of the victim model.
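The recovery formulas of Section III-B can be sanity-checked numerically. Below is a minimal sketch on a toy 10-token vocabulary; the `softmax` helper, the bias value b = 5, and the chosen token indices are assumptions of this example, not values from the paper. Note that when only the reference token is left unbiased, the ratio in Equation (3) carries a constant factor e^b relative to the true probability; in log space this is an additive constant, i.e., exactly the one-dimensional deviation the verification later corrects for.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def recover_topk(pb_i, pb_ref, p_ref):
    # Eq. (3): ratio-based recovery from biased top-k probabilities.
    return pb_i / pb_ref * p_ref

def recover_top1(pb_i, b):
    # Eq. (4): invert the effect of a bias b that lifted token i to the top.
    return 1.0 / (np.exp(b - np.log(pb_i)) - np.exp(b) + 1.0)

rng = np.random.default_rng(0)
s = rng.normal(size=10)   # hidden unbiased logits of a toy 10-token vocabulary
p = softmax(s)            # true probabilities we try to recover
b = 5.0                   # logit bias added through the API

# Top-1 scenario: bias token 3 to the top, observe only its probability.
s_b = s.copy(); s_b[3] += b
p3_rec = recover_top1(softmax(s_b)[3], b)             # recovers p[3]

# Top-k scenario: reference token = unbiased argmax; bias two other tokens.
ref = int(np.argmax(p))
others = [i for i in range(10) if i != ref][:2]
s_b = s.copy(); s_b[others] += b
pb = softmax(s_b)
p_rec = recover_topk(pb[others[0]], pb[ref], p[ref])  # exp(b) * p[others[0]]
```

The top-1 inversion is exact up to floating-point error, while the top-k ratio reproduces the true probability up to the uniform factor e^b, which does not change the span of the reconstructed vectors.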
This means that the suspected model and the victim model share the same last linear layer, facilitating the rapid identification of model infringement. Once the model has been fine-tuned, however, the parameters of the last linear layer have been altered, i.e., the vector space has changed, making this verification impossible. Therefore, we introduce an alignment verification method to resolve this challenge. This method calculates the dimension of the joint space formed by the output vector space of the suspected model and the parameter space of the victim model; infringement is determined by comparing the similarity of these two spaces.

Fig. 2. The framework of ownership verification.

B. Compatibility Testing

As demonstrated in Section III-A, the outputs of LLMs span a vector space L isomorphic to the space spanned by the columns of the last linear layer W. To simplify the exposition, we slightly abuse notation and use L to also denote the space spanned by the columns of W in the following discussion. We retain the last linear layer of the LLM as private, serving as the fingerprint of the victim model. The suspected model is queried through the API to obtain its logits output s. For a suspected model that is stolen from the victim model without altering the last linear layer, the output s will lie within the vector space L. We can verify this by solving the equation Wx = s to determine whether s lies within L. If the equation has a solution, the suspected model is considered to be derived from the victim model. However, due to numerical instability introduced by floating-point calculations, we instead calculate the Euclidean distance d between s and the space L using Equation (5), where x̂ is the least-squares solution of Wx = s.

d = ∥s − Wx̂∥    (5)

We set an error term e to represent the error introduced by numerical instability. If d < e, then s is compatible with the space L, suggesting that the suspected model is derived from the victim model. If we only have access to probabilities p and recover p′ using the method described in Section III-B, we manually correct the one-dimensional deviation by appending a constant column vector to W, obtaining W′ as shown in Equation (6).

W′ = [W, 1]    (6)

We then use W′ instead of W to verify ownership with the same method described above.

C. Alignment Verification

In PEFT attacks, the suspect model has undergone fine-tuning that alters its last linear layer. Such an attack changes the victim model's vector space L, preventing ownership verification by the method described in Section IV-B. According to the manifold hypothesis, high-dimensional data can be compressed into a low-dimensional latent space, and different tasks are governed by distinct features in that space. Therefore, fine-tuning for a specific task is assumed to impact only a portion of the space L, not the entire space. In particular, in PEFT attacks such as LoRA, the model is fine-tuned with a low-rank matrix: the parameters are updated by WN = WO + ∆W, where ∆W is a matrix whose rank is at most k. If there is substantial overlap between the output space L of the suspect model and the space spanned by the columns of the victim model's W, it indicates that the suspect model closely resembles the victim model and may have been derived through unauthorized replication.

In practice, we calculate the dimension of the space formed by the union of the output vector space of the suspect model and the parameter space of the victim model, denoted as Lsum. If the dimension of Lsum is close to that of L, it indicates that the suspect model is derived from the victim model; otherwise, it is not. For matrices containing numerous floating-point numbers, directly calculating the rank is not advisable, as numerical inaccuracies can lead to significant errors. Instead, we introduce Algorithm 1 to calculate the dimension difference ∆r between Lsum and L.

Once ∆r is obtained, it can be used to determine whether the suspect model is derived from the victim model: if ∆r is smaller than the hidden size h by well over an order of magnitude, the suspect model is indeed derived from the victim model. In probabilities-access scenarios, ∆r only experiences a numerical disturbance of one, which does not affect our results.

V. EXPERIMENT RESULTS AND ANALYSIS

A. Experimental Setup

We employed Gemma [3] with a hidden size of 2048 as the victim model and fine-tuned it with LoRA [24] as the attack method. The LoRA module was applied to the query, key and value modules in the attention mechanism [18] or to the linear module in the last layer. The ranks of LoRA were set to 16, 32 and 64. The models were fine-tuned on the Alpaca dataset [29] and the SAMSum dataset [30] to simulate different scenarios. Additionally, we compared against a newer version, Gemma-2, which shares the same structure but was trained with a different method, yielding completely different outputs for identical inputs.
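The compatibility test of Section IV-B reduces to a least-squares residual check against Equations (5) and (6). The sketch below illustrates it on a toy last layer; the sizes V and h, the threshold e, and all matrices are invented for this example and are not the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(1)
V, h = 1000, 32                      # toy vocabulary and hidden sizes
W = rng.normal(size=(V, h))          # victim's (private) last linear layer

def distance_to_space(W, s):
    # Eq. (5): least-squares residual of Wx = s, i.e. the Euclidean
    # distance from s to the column space L of W.
    x_hat, *_ = np.linalg.lstsq(W, s, rcond=None)
    return np.linalg.norm(s - W @ x_hat)

e = 1e-6                             # error term for float instability

# Logits from a model sharing the victim's last layer lie in L ...
s_same = W @ rng.normal(size=h)

# ... while logits from an unrelated model generally do not.
s_other = rng.normal(size=V)

# Probability access (Section III-B): CLR output differs from the logits
# by a constant, so Eq. (6) appends an all-ones column, W' = [W, 1].
W_prime = np.hstack([W, np.ones((V, 1))])
s_shifted = s_same + 3.7             # constant shift from the CLR step
```

In this setup `distance_to_space(W, s_same)` falls below e, `distance_to_space(W, s_other)` is far above it, and the shifted vector is accepted only against W′, matching the one-dimensional correction described above.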
Algorithm 1 Pseudocode for dimension difference calculation
Input: W: the parameter matrix of the last linear layer in the victim model; M: the suspected model; Q: the query set; N: the least number of samples; e: the error term.
Output: ∆r: the dimension difference.
1:  Initialize ∆r = 0, n = 0, S = ∅.
2:  while n ≤ N do
3:    Randomly sample a query q from Q
4:    Get the logits outputs O = {s1, s2, . . . } by querying the suspected model M with q
5:    S ← S ∪ O
6:    n = size(S)
7:  end while
8:  for i = 1, 2, . . . , n do
9:    Solve Wxi = si to obtain x̂i
10:   Call Eq. (5) to determine di
11:   if di > e then
12:     ∆r = ∆r + 1
13:   end if
14: end for
15: return ∆r

TABLE II
DIMENSION DIFFERENCE RESULTS ON VARIOUS MODEL VERSIONS AND MODELS FINE-TUNED WITH DIFFERENT MODULES AND RANKS.

             full           top-5          top-1
             probabilities  probabilities  probability
Gemma        1              1              1
Gemma-2      300            300            300
SAMSum
  qkv-16     1              1              1
  qkv-32     1              1              1
  qkv-64     1              1              1
  linear-16  11             11             11
  linear-32  12             12             11
  linear-64  11             11             11
Alpaca
  qkv-16     1              1              1
  qkv-32     1              1              1
  qkv-64     1              1              1
  linear-16  16             16             16
  linear-32  21             21             20
  linear-64  22             22             21
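Algorithm 1 translates directly into numpy. The following sketch stubs the suspected model behind a `query_model` callable; that name, the toy shapes, and the two simulated APIs are assumptions of this example, not part of the paper.

```python
import numpy as np

def dimension_difference(W, query_model, Q, N, e):
    """Algorithm 1: count how many sampled logits vectors fall outside
    the column space L of the victim's last linear layer W."""
    S = []
    while len(S) < N:                          # lines 2-7: collect samples
        q = Q[np.random.randint(len(Q))]       # randomly sample a query
        S.extend(query_model(q))               # logits outputs from the API
    delta_r = 0
    for s in S:                                # lines 8-14: test each sample
        x_hat, *_ = np.linalg.lstsq(W, s, rcond=None)
        d = np.linalg.norm(s - W @ x_hat)      # Eq. (5): distance to L
        if d > e:
            delta_r += 1
    return delta_r

# Toy demonstration: a 200 x 16 "last layer" as the victim fingerprint.
np.random.seed(0)
W = np.random.normal(size=(200, 16))
Q = list(range(10))                            # dummy query set

# A model sharing the victim's last layer: every s lies in L.
same_api = lambda q: [W @ np.random.normal(size=16)]
dr_same = dimension_difference(W, same_api, Q, N=20, e=1e-6)

# A model whose last layer got a rank-1 update W + u v^T: s leaves L.
u = np.random.normal(size=(200, 1)); v = np.random.normal(size=(1, 16))
tuned_api = lambda q: [(W + u @ v) @ np.random.normal(size=16)]
dr_tuned = dimension_difference(W, tuned_api, Q, N=20, e=1e-6)
```

Here dr_same stays 0 while dr_tuned counts essentially every sample, mirroring the small-versus-large ∆r contrast between qkv- and linear-module fine-tuning in Table II.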