Autonomous and Self-Adapting System for Synthetic Media Detection and Attribution
… technologies evolve.

Open-set attribution methods [1, 17, 45] improve detection by differentiating known generators from unknown sources. However, these methods struggle to determine whether unknown images originate from a single emerging source or multiple distinct sources. Additionally, since open-set systems rely on fixed models trained on initial distributions, their performance deteriorates as new generators continually emerge, rendering them ineffective in dynamically evolving environments.

Continual learning methods [4, 28] incrementally update synthetic media identification systems with new data, helping maintain or recover performance. However, these methods require manual data collection, curation, and labeling for each new source, making them slow, resource-intensive, and impractical. Moreover, rapid proliferation of new generators—some intentionally concealed—further limits the scalability and responsiveness of human-dependent continual learning approaches.

Each existing approach addresses only a specific aspect of the broader challenge posed by continuously emerging generative models. Transferable and zero-shot methods initially generalize but degrade over time; open-set methods identify unknown samples but fail to discern individual emerging sources; and continual learning methods rely heavily on impractical human intervention. Consequently, there is a critical need for synthetic media identification systems that autonomously detect the emergence of previously unseen generators, adapt their internal models, and maintain robust detection and attribution performance over time.

In this paper, we introduce the concept of an autonomous self-adaptive synthetic media identification system and propose its first concrete implementation. Our system automatically detects synthetic images, identifies new generative sources, and continuously adapts without human intervention. We employ an open-set identification strategy leveraging an evolvable embedding space to discriminate known sources and detect unknown samples. To autonomously discover new generative sources, we propose an unsupervised clustering method that aggregates unknown samples into high-confidence clusters corresponding to emerging generators. Subsequently, the embedding space and decision boundaries are automatically refined based on these clusters, with each update validated to ensure consistent improvement. Extensive experiments demonstrate our method's consistently superior detection and attribution accuracy, whereas existing approaches experience significant performance degradation over time.

In summary, our main contributions are:
• We introduce the concept of an autonomous self-adaptive synthetic media identification system, capable of automatically detecting and adapting to newly emerging generative sources without human intervention.
• We propose the first concrete implementation of this concept, combining open-set source identification with an evolvable embedding space to robustly discriminate known sources and detect unknown sources.
• We develop an unsupervised clustering approach that autonomously discovers new generative sources by identifying high-confidence subsets within unknown samples.
• We design an automated update and validation mechanism that continuously refines the embedding space and decision boundaries, ensuring system improvements with each adaptation step.
• We conduct extensive experiments demonstrating that our approach achieves consistently superior detection and attribution performance as new generators continually emerge, while existing methods experience significant degradation over time.

2. Background and Related Work

The rapid rise of generative AI has resulted in highly realistic synthetic images, necessitating advances in the detection and source attribution of AI-generated media. Despite significant progress, most existing methods remain static, assuming that all relevant generative sources are known at training time. Consequently, they often struggle to adapt when they encounter images from novel or unseen generators.

Forensic Microstructures. Extensive prior research has demonstrated that the process by which an image is captured or generated leaves behind distinctive forensic microstructures [11, 25, 26, 42, 47]. Cameras and synthetic image generators each imprint unique patterns on images due to their different methods of forming pixel values. Although these microstructures are not directly observable, various techniques have been developed to estimate them for synthetic image detection [6, 10, 12, 23, 31]. However, while these methods capture model-specific traits, they typically do not address how to adapt when new generative processes with distinct microstructures are introduced.

Supervised and Open-Set Approaches. Supervised detectors, trained on large labeled datasets [4, 8, 27, 39, 44, 46], perform well on data similar to the training set but falter against unseen generators due to distributional shifts [11]. Open-set methods extend these approaches by rejecting images that do not match any known source [17, 32, 45], yet they lack mechanisms to integrate novel sources once deployed.

Zero-Shot Detection. In contrast, zero-shot detection methods aim to identify synthetic images without prior exposure to outputs from specific generators, relying solely on training with real images [14, 32]. By focusing on generalizable features rather than generator-specific artifacts, zero-shot approaches mitigate biases introduced by synthetic training data and offer a promising direction for robust detection. However, these methods still assume a fixed decision framework and do not inherently adapt when new, unseen generative techniques emerge.
Motivation for Self-Adaptive Systems. These limitations highlight the need for self-adaptive forensic systems that can dynamically update their detection and attribution modules as new generative models emerge. Our work builds on these advances, proposing a framework that continuously adapts to an ever-changing generative landscape.

3. Problem Formulation

When a synthetic image identification system is created, it is trained using images made by a set of known generative sources. After this initial training phase, however, new image generators will continue to emerge. Despite this, the system must maintain its performance. Furthermore, we assume that a human will not be available to identify the emergence of a new generative source, collect training data from this source, and use this training data to update the synthetic media identification system. As a result, any measures that the system uses to maintain its performance must be performed autonomously.

For a synthetic image identification system to maintain high performance as new generators emerge, it must continuously update itself. To achieve successful autonomous updating, the system must:
1. Detect synthetic images—including those produced by previously unseen generators. This requires the system to have strong transferability or zero-shot detection capabilities.
2. Attribute synthetic media to their correct source or flag them as originating from a new, unknown source.
3. Identify the emergence of a new generator by analyzing data labeled as unknown for self-similar forensic microstructures.
4. Update its detection and attribution models accordingly, as sketched below.
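Taken together, these four requirements form a closed loop: detection feeds attribution, rejected samples feed discovery, and discovery triggers a model update. The sketch below illustrates this loop in Python; the `system` object and all of its methods (`is_synthetic`, `attribute`, `embed`, `discover_source`, `update`) are hypothetical placeholders used only to show the control flow, not part of our implementation.

```python
# Hypothetical sketch of the closed loop formed by requirements 1-4.
# `system` and every method on it are illustrative placeholders.

def autonomous_update_loop(image_stream, system, check_every=1000):
    buffer = []  # embeddings of samples attributed to the unknown source
    for i, image in enumerate(image_stream):
        if system.is_synthetic(image):                  # requirement 1: detection
            source = system.attribute(image)            # requirement 2: attribution
            if source == "unknown":
                buffer.append(system.embed(image))
        if i % check_every == 0 and buffer:             # analyze buffer periodically
            candidate = system.discover_source(buffer)  # requirement 3: discovery
            if candidate is not None:
                system.update(candidate)                # requirement 4: model update
                buffer.clear()
```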
4. Proposed Approach

At a high level, our autonomous self-adapting synthetic media identification system first establishes an open-set framework capable of detecting synthetic images and attributing them to known generators. It then collects images flagged as originating from unknown sources into a buffer. Images in this buffer are clustered to identify potential new generative sources, and the cluster with the highest confidence is designated as a newly discovered generator. Finally, images from this newly discovered generator are used to update and refine the underlying open-set system. This iterative process continues as new generative sources emerge. An overview of our system is presented in Fig. 2.

4.1. Open-Set Detection & Source Attribution

We begin by establishing an open-set synthetic media identification system capable of both detection and attribution. This system is initialized at update step ℓ = 0 and incrementally updated as new generative sources are discovered. At each update step ℓ, the system operates over a set of known synthetic media sources, denoted S^(ℓ), and an associated training dataset, D_T^(ℓ).

As an open-set framework, our system is explicitly designed to detect synthetic images—including those originating from previously unseen sources—and either attribute them to a known source in S^(ℓ) or label them as unknown (s^u). Throughout this section, we assume that D_T^(ℓ) consists of a labeled set of Forensic Self-Descriptions (FSDs) [32], each capturing the underlying forensic microstructures of an image.

Capturing Forensic Microstructures. To accurately model the microstructures present in an image, we leverage a recent approach called Forensic Self-Description (FSD). FSDs are learned in a self-supervised manner and have shown strong zero-shot detection and open-set source attribution capabilities. To construct the FSD, a series of predictive filters are applied to produce residuals that emphasize microstructures while suppressing scene content. A multi-scale parametric model is then used to characterize the statistical relationships among these residuals, and the resulting model parameters, denoted by ϕ, serve as the FSD of the image.
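To make this construction concrete, the following is a deliberately simplified, single-scale sketch: one fixed high-pass filter plays the role of the predictive filters, and the coefficients of a least-squares predictor fit to the resulting residual stand in for the parameters ϕ. The actual FSD of [32] uses a series of learned predictive filters and a multi-scale parametric model, so this should be read only as an illustration of the residual-then-model idea.

```python
import numpy as np
from scipy.signal import convolve2d

def toy_fsd(image, patch=3):
    """Single-scale stand-in for a Forensic Self-Description.

    The real FSD [32] applies a series of predictive filters and fits a
    multi-scale parametric model; here one fixed high-pass filter and
    one least-squares predictor are used purely for illustration.
    """
    # 1. Suppress scene content: high-pass residual via a predictive filter.
    hp = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    residual = convolve2d(np.asarray(image, dtype=float), hp, mode="valid")

    # 2. Characterize the residual's statistics: fit a linear predictor of
    #    each residual pixel from its neighbors; the coefficients are phi.
    h, w = residual.shape
    r = patch // 2
    center = (patch * patch) // 2
    neighborhoods, targets = [], []
    for i in range(r, h - r):
        for j in range(r, w - r):
            win = residual[i - r:i + r + 1, j - r:j + r + 1].ravel()
            targets.append(win[center])
            neighborhoods.append(np.delete(win, center))
    A, b = np.asarray(neighborhoods), np.asarray(targets)
    phi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return phi  # toy forensic self-description of the image
```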
Enhanced Source Separation Embedding Space. As demonstrated by Nguyen et al. [32], while FSDs provide very robust zero-shot detection capabilities, they are not explicitly optimized to accurately discriminate between different synthetic sources.

To address this limitation, we propose learning an enhanced embedding space that more effectively separates image sources. To do this, we must design our embedding space such that: (1) embeddings from a common source must be close together; (2) embeddings from different sources must be far apart; and (3) all aspects of the forensic microstructures must be preserved to retain strong detection capabilities.

To achieve these goals, first, let the FSD be denoted as ϕ_k and a new projection at time step ℓ be f^(ℓ) such that ψ_k = f^(ℓ)(ϕ_k), where ψ_k is an enhanced embedding. We can attain goals (1) and (2) by training f^(ℓ) with a loss function as follows:

L_att = Σ_{T_j ∈ T} [ ‖f^(ℓ)(ϕ_j^a) − f^(ℓ)(ϕ_j^p)‖_2^2 − ‖f^(ℓ)(ϕ_j^a) − f^(ℓ)(ϕ_j^n)‖_2^2 + m ]_+ ,   (1)

where each triplet T_j consists of an anchor FSD ϕ_j^a, a positive ϕ_j^p from the same source, and a negative ϕ_j^n from a different source, m is a margin, and [·]_+ denotes max(·, 0).
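A minimal PyTorch rendering of Eq. (1) is shown below. Here `f_ell` stands for the projection network f^(ℓ), and the anchor, positive, and negative batches are assumed to be mined beforehand (Sec. 5 reports using the 64 hardest positives and negatives per anchor); the exact network architecture is not specified in this excerpt.

```python
import torch

def attribution_loss(f_ell, phi_a, phi_p, phi_n, m=1.0):
    """Triplet attribution loss L_att of Eq. (1).

    phi_a, phi_p, phi_n: (B, d) batches of anchor FSDs, positives from
    the same source, and negatives from a different source. The hinge
    [.]_+ enforces a margin m between same- and different-source pairs.
    """
    a, p, n = f_ell(phi_a), f_ell(phi_p), f_ell(phi_n)
    d_ap = (a - p).pow(2).sum(dim=1)   # ||f(phi_a) - f(phi_p)||_2^2
    d_an = (a - n).pow(2).sum(dim=1)   # ||f(phi_a) - f(phi_n)||_2^2
    return torch.clamp(d_ap - d_an + m, min=0).sum()  # sum over triplets
```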
Figure 2. System diagram of our proposed self-adaptive forensic framework, which continuously identifies and buffers unknown samples,
clusters them to discover emerging sources, and updates the embedding space and open-set model as needed.
The total embedding loss is then calculated as:

L_embed = L_att + λ L_pres ,   (3)

where λ is a hyperparameter that balances the contribution of the preservation loss L_pres relative to the attribution loss.
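The definition of L_pres is not reproduced in this excerpt, so the combination below uses a hypothetical reconstruction-style preservation term: a decoder `g_dec` must recover the original FSD from the embedding, which is one plausible way to satisfy goal (3). Treat L_pres here as an assumption rather than the authors' formulation; only the L_att + λ·L_pres structure comes from Eq. (3).

```python
import torch

def embedding_loss(f_ell, g_dec, phi_a, phi_p, phi_n, m=1.0, lam=0.1):
    """L_embed = L_att + lambda * L_pres, as in Eq. (3).

    L_att is the triplet loss of Eq. (1) (see the sketch above).
    L_pres below is a HYPOTHETICAL reconstruction term: g_dec must
    recover the FSD phi from the embedding psi, so the embedding keeps
    all forensic microstructure information (goal 3). The paper's
    actual preservation loss is not reproduced in this excerpt.
    """
    l_att = attribution_loss(f_ell, phi_a, phi_p, phi_n, m)
    psi_a = f_ell(phi_a)
    l_pres = (g_dec(psi_a) - phi_a).pow(2).sum(dim=1).mean()
    return l_att + lam * l_pres
```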
Fig. 3 shows 2D t-SNE [43] plots comparing the original FSD space and our enhanced embedding space, demonstrating improved separability among sources. Notably, real images (labeled 0 in the blue cluster) remain well separated from generative sources, preserving detection capability while enhancing discrimination.

Figure 3. 2D t-SNE plot of the embedding space before and after applying the enhanced source separation projection.
Detecting and Attributing Image Sources. After obtaining our enhanced embeddings, we use them to perform open-set synthetic image detection and source attribution. To accomplish this, we first build a model of the distribution of embeddings from each known source. Here, we model each of these distributions using a Gaussian mixture model such that p(ψ_k | s) = GMM(ψ_k | {π_{s,i}, µ_{s,i}, Σ_{s,i}}_{i=1}^{M_s}), where M_s is the number of mixture components for source s, and π_{s,i}, µ_{s,i}, and Σ_{s,i} represent the mixture weights, means, and covariance matrices, respectively.

At test time, for each new image x_k with embedding ψ_k, the system computes p(ψ_k | s) for every s ∈ S^(ℓ) using the learned GMMs. We then define the candidate source s^c as:

s^c = arg max_{s ∈ S^(ℓ)} p(ψ_k | s).   (4)

Subsequently, we define the attribution function α(ψ_k) by applying a confidence threshold τ_reject:

α(ψ_k) = { s^c   if p(ψ_k | s^c) > τ_reject ,
         { s^u   otherwise.   (5)

Here, s^u ∉ S^(ℓ) denotes the "unknown" source. We then collect and store unknown examples in a buffer B for later analysis and identification of a potential new source.
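These two rules map directly onto scikit-learn's GaussianMixture. The sketch below assumes embeddings are stored per source as (N, d) NumPy arrays and works with log-likelihoods, so the threshold passed in is log(τ_reject); the helper names and the fixed component count are our own simplifications (the per-source M_s may differ in practice).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_source_models(embeddings_by_source, n_components=5):
    """Fit one GMM per known source in S^(l) (Sec. 4.1)."""
    return {
        s: GaussianMixture(n_components=n_components, covariance_type="full").fit(X)
        for s, X in embeddings_by_source.items()
    }

def attribute(psi_k, gmms, log_tau_reject):
    """Eq. (4) then Eq. (5): pick the most likely known source, but
    reject to the unknown source s^u if its log-likelihood is too low."""
    psi_k = np.atleast_2d(psi_k)
    log_liks = {s: gmm.score_samples(psi_k)[0] for s, gmm in gmms.items()}
    s_c = max(log_liks, key=log_liks.get)  # candidate source, Eq. (4)
    return s_c if log_liks[s_c] > log_tau_reject else "unknown"  # Eq. (5)
```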
4.2. New Source Discovery

Periodically, the buffer is analyzed by a dedicated subsystem designed to detect the emergence of new generative sources. When a new source appears, a significant portion of the buffered data exhibits similar forensic microstructures. To identify these, we apply a clustering algorithm that groups the enhanced embeddings based on their forensic patterns. This process yields a set of clusters C such that

C = {C_1, C_2, . . . , C_m}, with |C_1| ≥ |C_2| ≥ · · · ≥ |C_m|,

where |·| denotes the set cardinality operator.
Figure 4. 2D t-SNE plots of the embedding space after updating at time step 0, 1, 2, and 3 (from left to right).
To select a candidate cluster from C that may correspond to a new source, we begin with the cluster containing the most elements, denoted C*. To confirm that C* represents a coherent, novel source, we perform a sufficiency check: C* is deemed sufficient if |C*| > τ_s; otherwise, it is rejected. If C* passes this test, we conclude that a new source has been identified and proceed to update the detection and attribution subsystem. If not, no new source is discovered, and the system continues operating while awaiting additional data in the buffer.
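With the DBSCAN configuration reported in Sec. 5 (minimum sample size 7, epsilon searched between 5 and 20), the discovery-and-sufficiency step can be sketched as follows; the buffer is assumed to be an (N, d) NumPy array of unknown-labeled embeddings, and the function name is ours.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def discover_new_source(buffer_embeddings, eps, tau_s, min_samples=7):
    """Cluster buffered unknown embeddings, take the largest cluster C*,
    and return its members only if it passes the sufficiency check
    |C*| > tau_s (Sec. 4.2); otherwise return None."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(buffer_embeddings)
    cluster_ids = [c for c in np.unique(labels) if c != -1]  # -1 is DBSCAN noise
    if not cluster_ids:
        return None
    sizes = {c: int(np.sum(labels == c)) for c in cluster_ids}
    c_star = max(sizes, key=sizes.get)  # largest cluster C*
    if sizes[c_star] <= tau_s:          # sufficiency check fails
        return None
    return buffer_embeddings[labels == c_star]
```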
4.3. System Update

Once a new source, denoted as s_n, is discovered, we update our set of known sources as follows:

S^(ℓ+1) = S^(ℓ) ∪ {s_n}.   (6)

We then partition the candidate cluster C* into training (C*_T) and validation (C*_V) subsets, which are merged with our existing training and validation sets. The detection and attribution models are then retrained on the augmented training set, and the update is validated on three criteria: (1) detection accuracy, (2) source attribution accuracy on S^(ℓ), and (3) source attribution accuracy on the new source.

If the update meets all three criteria, it is accepted. Otherwise, we first attempt to select a different candidate cluster from C; if that fails, we adjust the clustering hyperparameters and try again. Should these attempts prove unsuccessful, the system reverts to normal operation and waits for additional data.
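Procedurally, the accept-or-fall-back logic reads as below. `system` and its methods (`validate`, `recluster`, `rollback`) are hypothetical helpers: `validate` is assumed to split a candidate cluster into C*_T and C*_V, merge them into the existing sets, retrain, check the three criteria, and restore the previous model on failure.

```python
def try_system_update(system, candidate_clusters, eps_grid):
    """Sketch of the accept-or-fall-back update logic of Sec. 4.3.

    `system.validate(cluster)` is a hypothetical helper that retrains on
    the augmented training set and returns True only if all three
    criteria (detection, attribution on S^(l), attribution on the new
    source) are met, restoring the previous model otherwise.
    """
    for cluster in candidate_clusters:  # candidate clusters in size order
        if system.validate(cluster):
            return True                 # update accepted
    for eps in eps_grid:                # adjust clustering hyperparameters
        for cluster in system.recluster(eps):
            if system.validate(cluster):
                return True
    system.rollback()                   # revert; wait for more data
    return False
```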
5. Experiments and Results

Implementation Details. We train our embedding model using AdamW [24] (10^−4 learning rate, 0.01 weight decay) with mini-batches of 256 samples and 64 hardest positives/negatives per anchor. The embeddings are modeled using GMMs [29] with a confidence threshold set to maximize TPR−FPR. For clustering, we employ DBSCAN [16] with a minimum sample size of 7 and epsilon searched between 5 and 20. Additional implementation details are provided in the supplemental materials.
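The confidence threshold "set to maximize TPR−FPR" is the operating point that maximizes Youden's J statistic, which can be computed from a held-out set as follows (variable names are ours; in our setting the scores would be the GMM log-likelihoods):

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_confidence_threshold(y_true, scores):
    """Return the threshold maximizing TPR - FPR (Youden's J) on a
    held-out set, as used for the GMM confidence threshold in Sec. 5."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return thresholds[np.argmax(tpr - fpr)]
```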
Type  Method        |        Initial               |  Average over each new source  |         Final
                    | Det. Acc.  AUC%   Att. Acc.  | Det. Acc.  AUC%   Att. Acc.    | Det. Acc.  AUC%   Att. Acc.
Det.  CNNDet [44]   |   83.9     95.2      -       |   58.2     65.7      -         |   65.2     73.8      -
      UFD [33]      |   78.7     93.6      -       |   64.8     77.4      -         |   68.6     81.8      -
SA    POSE [45]     |   95.8     99.3     88.0     |   49.4     45.7     0.0        |   60.0     58.1     29.3
      FSD [32]      |   89.1     98.8     71.1     |   85.2     97.3     0.0        |   86.3     97.7     23.7
      Ours          |   99.5     99.9     99.5     |   98.4     97.6     85.3       |   97.8     98.3     83.0

Table 1. Comparison of detection and source attribution performance. The table reports detection accuracy, AUC, and source attribution accuracy initially, on newly introduced sources (averaged), and overall performance after all sources are integrated. Our proposed self-adaptive (SA) framework consistently outperforms both detection-only and static open-set attribution methods, demonstrating robust continuous detection and superior attribution on emerging generative models.
                                   Emerging Sources
Metric       Progan  Dalle3  MJv6   Firefly  SD1.4  SD1.5  SD3   SDXL  Avg.
Det. Acc.    95.9    98.4    99.1   99.2     97.1   98.5   99.2  92.1  97.4
Det. AUC%    99.9    99.9    98.7   98.1     98.7   96.5   99.6  90.0  97.7
Att. Acc.    92.4    83.6    94.0   67.6     49.6   90.8   79.2  82.0  79.9
Component      Method            Det. Acc.  Att. Acc.
               Proposed          98.0       83.0
Cluster. Alg.  k-means           87.9       46.1
Embd. Spac.    No Preservation   95.2       47.0
               Overly Preserved  96.0       42.2
Val. Crit.     No Validate       92.7       49.9
               No New Crit.      97.9       60.1
               No Det. Crit.     96.6       45.4
               No Att. Crit.     95.0       51.9
Figure 7. 2D t-SNE plots of the original FSD space (left), enhanced separation embedding space with preservation loss term (right), and
without this term (middle).
The effects of each term in our embedding loss function (Eq. 3) are shown in Tab. 4 and illustrated in Fig. 7. We examine three embedding configurations: the original native FSD embedding space, enhancement using only the contrastive loss term, and enhancement using both contrastive and preservation loss terms. The original FSD embedding clearly distinguishes real images from synthetic ones but struggles to differentiate among multiple synthetic sources. Introducing the contrastive loss significantly improves the separation between synthetic sources, increasing detection accuracy to 98.1% and attribution accuracy to 99.8%. However, robust rejection of unknown sources (AU-CRR: 59.5%→61.9%) remains limited without the preservation term. Incorporating both contrastive and preservation terms leads to the best overall performance (64.2% AU-CRR), highlighting that the preservation loss is crucial for maintaining robust and balanced performance in open-set detection and attribution tasks.

Embedding Space           Det. Acc.  Det. AUC%  Att. Acc.  AU-CRR  AU-OSCR
FSDs only                 92.4       96.3       90.7       59.5    56.0
No Preservation           98.1       99.6       99.8       61.9    61.9
Contrastive+Preservation  99.5       99.9       99.5       64.2    64.2

Table 4. Embedding space selection. Our approach with preservation loss outperforms FSDs and an embedding space without the preservation loss in detection and source attribution accuracy, and open-set metrics.
Continuously Evolving Embedding Space. The evolution of our embedding space over successive update steps is illustrated in Fig. 4. Initially, the embedding space separates known sources into distinct, well-defined clusters. As new generative sources emerge at each subsequent update step, our system automatically integrates these sources by refining its embedding representation, clearly illustrated by the progressive addition of well-separated clusters. Notably, each incremental update effectively accommodates the newly discovered source without disrupting existing embeddings, highlighting the stability of our adaptation approach. This continuous evolution of the embedding space allows our system to robustly distinguish among multiple emerging sources, sustaining high detection and attribution accuracy over time, even as the landscape of generative models becomes increasingly complex.
Limitations & Future Work. Our autonomous self-
adaptive synthetic media identification system while strong, References
can still face certain limitations. In particular, our ap- [1] Lydia Abady, Jun Wang, Benedetta Tondi, and Mauro Barni.
proach relies on unsupervised clustering to identify emerg- A siamese-based verification system for open-set architec-
ing sources, which may occasionally group together dis- ture attribution of synthetic images. Pattern Recognition Let-
tinct sources sharing similar forensic patterns. Addition- ters, 180:75–81, 2024. 2
ally, while the embedding space evolves continuously, per- [2] Adobe. Adobe Firefly. https://www.adobe.com/
[3] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. Technical report, Stanford, 2006.
[4] Aref Azizpour, Tai D. Nguyen, Manil Shrestha, Kaidi Xu, Edward Kim, and Matthew C. Stamm. E3: Ensemble of expert embedders for adapting synthetic image detectors to new generators using limited data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4334–4344, 2024.
[5] Quentin Bammey. Synthbuster: Towards detection of diffusion model generated images. IEEE Open Journal of Signal Processing, 5:1–9, 2024.
[6] Mauro Barni, Kassem Kallas, Ehsan Nowroozi, and Benedetta Tondi. Cnn detection of gan-generated face images based on cross-band co-occurrences analysis. In 2020 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2020.
[7] James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, et al. Improving image generation with better captions. Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf, 2(3):8, 2023.
[8] Tu Bui, Ning Yu, and John Collomosse. Repmix: Representation mixing for robust attribution of synthesized images. In European Conference on Computer Vision, pages 146–163. Springer, 2022.
[9] Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? understanding properties that generalize. In European Conference on Computer Vision, pages 103–120. Springer, 2020.
[10] Umur Aybars Ciftci, Ilke Demir, and Lijun Yin. Fakecatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2020.
[11] Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. On the detection of synthetic images generated by diffusion models. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
[12] Davide Cozzolino, Diego Gragnaniello, and Luisa Verdoliva. Image forgery detection through residual-based local descriptors and block-matching. In 2014 IEEE International Conference on Image Processing (ICIP), pages 5297–5301, 2014.
[13] Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, and Luisa Verdoliva. Raising the bar of ai-generated image detection with clip. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4356–4366, 2024.
[14] Davide Cozzolino, Giovanni Poggi, Matthias Nießner, and Luisa Verdoliva. Zero-shot detection of ai-generated images. In European Conference on Computer Vision, pages 54–72. Springer, 2025.
[15] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
[16] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pages 226–231, 1996.
[17] Shengbang Fang, Tai D. Nguyen, and Matthew C. Stamm. Open set synthetic image source attribution. In 34th British Machine Vision Conference 2023, BMVC 2023, Aberdeen, UK, November 20-24, 2023. BMVA, 2023.
[18] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In International Conference on Machine Learning, pages 3247–3258. PMLR, 2020.
[19] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
[20] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
[21] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
[22] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34:852–863, 2021.
[23] Yun Liu, Zuliang Wan, Xiaohua Yin, Guanghui Yue, Aiping Tan, and Zhi Zheng. Detection of gan generated image using color gradient representation. Journal of Visual Communication and Image Representation, 95:103876, 2023.
[24] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[25] Jan Lukas, Jessica Fridrich, and Miroslav Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2):205–214, 2006.
[26] Francesco Marra, Diego Gragnaniello, Davide Cozzolino, and Luisa Verdoliva. Detection of gan-generated fake images over social networks. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 384–389. IEEE, 2018.
[27] Francesco Marra, Diego Gragnaniello, Luisa Verdoliva, and Giovanni Poggi. Do gans leave artificial fingerprints? In 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 506–511. IEEE, 2019.
[28] Francesco Marra, Cristiano Saltori, Giulia Boato, and Luisa Verdoliva. Incremental learning for the detection and classification of gan-generated images. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6. IEEE, 2019.
[29] G. McLachlan and K. Basford. Mixture Models: Inference and Applications to Clustering. 1988.
[30] Midjourney. Midjourney. https://www.midjourney.com/home. Accessed: March 7, 2025.
[31] Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury, and B. S. Manjunath. Detecting gan generated fake images using co-occurrence matrices. Electronic Imaging, 2019.
[32] Tai Nguyen, Aref Azizpour, and Matthew C. Stamm. Forensic self-descriptions are all you need for zero-shot detection, open-set source attribution, and clustering of ai-generated images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
[33] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24480–24489, 2023.
[34] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
[35] Jonas Ricker, Denis Lukovnikov, and Asja Fischer. Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9130–9140, 2024.
[36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[37] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[38] Zeyang Sha, Zheng Li, Ning Yu, and Yang Zhang. De-fake: Detection and attribution of fake images generated by text-to-image generation models. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 3418–3432, New York, NY, USA, 2023. Association for Computing Machinery.
[39] Sergey Sinitsa and Ohad Fried. Deep image fingerprint: Towards low budget synthetic image detection and model lineage analysis. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4067–4076, 2024.
[40] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, and Yunchao Wei. Learning on gradients: Generalized artifacts representation for gan-generated images detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12105–12114, 2023.
[41] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28130–28139, 2024.
[42] Diangarti Tariang, Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. Synthetic image verification in the era of generative artificial intelligence: What works and what isn't there yet. IEEE Security & Privacy, 22(3):37–49, 2024.
[43] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(86):2579–2605, 2008.
[44] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. Cnn-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[45] Tianyun Yang, Danding Wang, Fan Tang, Xinying Zhao, Juan Cao, and Sheng Tang. Progressive open space expansion for open-set model attribution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15856–15865, 2023.
[46] Ning Yu, Larry S. Davis, and Mario Fritz. Attributing fake images to gans: Learning and analyzing gan fingerprints. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[47] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in gan fake images. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2019.
Autonomous and Self-Adapting System for Synthetic Media Detection and Attribution
Supplementary Material
Appendix A: Implementation Details
Appendix B: Datasets
each subset are provided in Tab. 5.