Paper 2 Level Up The Deepfake Detection
ABSTRACT

The image deepfake detection task has been widely addressed by the scientific community as a binary classification problem: discriminating real images from those generated by Artificial Intelligence (AI) models. In this work, the deepfake detection and recognition task was investigated by collecting a dedicated dataset of pristine images and fake ones generated by 9 different Generative Adversarial Network (GAN) architectures and by 4 additional Diffusion Models (DMs). A hierarchical multi-level approach was then introduced to solve three different deepfake detection and recognition tasks: (i) Real vs. AI-generated; (ii) GANs vs. DMs; (iii) recognition of the specific AI architecture. Experimental results demonstrated, in each case, more than 97% classification accuracy, outperforming state-of-the-art methods.

Index Terms— Deepfake Detection, Generative Adversarial Nets, Diffusion Models, Multimedia Forensics

1. INTRODUCTION

The term deepfake refers to all multimedia content generated by an AI model. The most common deepfake creation solutions are those based on GANs [1], which are able to create multimedia data from scratch or to manipulate existing data. In a nutshell, GANs are composed of two neural networks: the Generator (G) and the Discriminator (D). G creates new data samples that resemble the training data, while D evaluates whether a sample is real (belonging to the training set) or fake (generated by G). A GAN is trained until D is no longer able to detect samples generated by G, in other words, until D is fooled by G. Surveys of GAN-based approaches for the creation and detection of deepfakes are available in [2, 3].

Recently, DMs [4, 5] have been attracting interest thanks to their photo-realism and to the wide range of output control given to the user. In contrast to GANs, DMs are a class of probabilistic generative models that model complex data distributions by iteratively adding noise to the data and learning to reverse this process, so that new realistic samples can be generated starting from a random noise vector. Stable Diffusion [6] and DALL-E 2 [7] are the most famous state-of-the-art DMs, both based on text-to-image translation. As demonstrated in [8], DMs can produce even more realistic images than GANs: GANs generate high-quality samples but have been shown to fail to cover the entire training data distribution.

To effectively counteract the illicit use of synthetic data generated by GANs and DMs, new deepfake detection and recognition algorithms are needed. State-of-the-art image deepfake detection methods mostly focus on binary detection (Real vs. AI-generated) [9, 10]. Some state-of-the-art methods have already been shown to discriminate effectively between different GAN architectures [11, 12, 13]. Methods to detect and recognize DMs have been proposed only recently [14, 15].

In order to level up the deepfake detection and recognition task, the objective and main contribution of this paper is to classify an image among 14 different classes: 9 GAN architectures, 4 DM engines and 3 pristine datasets (labeled as belonging to the same "real" class). First, a dedicated dataset of images was collected. Then, a novel multi-level hierarchical approach exploiting ResNet models was developed and trained. The proposed approach consists of 3 levels of classification: (Level 1) Real vs. AI-generated images; (Level 2) GANs vs. DMs; (Level 3) recognition of the specific AI (GAN/DM) architecture among those represented in the collected dataset. Experimental results demonstrated the effectiveness of the proposed solution, which achieved more than 97% accuracy on average for each task, exceeding the state of the art. Moreover, the hierarchical approach can be used to analyze multimedia data in depth to reconstruct its history (forensic ballistics) [16], a task poorly addressed by the scientific community on synthetic data.

This paper is organized as follows: Section 2 and Section 3 describe the dataset and the proposed approach built upon it, respectively. Experimental results and comparisons are presented in Section 4. Finally, Section 5 concludes the paper.

Fig. 1. Examples of images collected from different datasets and images generated by different GANs and DMs.
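
The three-level hierarchy outlined in the introduction can be illustrated with a minimal routing sketch. This is not the authors' implementation: the function name hierarchical_predict, the label strings ("real", "ai", "gan", "dm") and the use of one Level 3 classifier per generator family are assumptions made for illustration, and the toy classifiers are hypothetical stand-ins for trained ResNet models.

```python
from typing import Any, Callable, Dict

# A classifier is modeled as any callable mapping an image to a label string.
Classifier = Callable[[Any], str]


def hierarchical_predict(
    image: Any,
    level1: Classifier,
    level2: Classifier,
    level3: Dict[str, Classifier],
) -> Dict[str, str]:
    """Route an image through the three levels, stopping early if it is real."""
    result = {"level1": level1(image)}  # "real" or "ai" (assumed labels)
    if result["level1"] == "real":
        return result  # pristine image: Levels 2 and 3 are skipped
    result["level2"] = level2(image)  # "gan" or "dm" (assumed labels)
    # Assumption: a separate Level 3 classifier per generator family.
    result["level3"] = level3[result["level2"]](image)
    return result


# Hypothetical stand-ins for trained models: they read the answer from a
# toy sample dictionary instead of analyzing pixels.
def toy_level1(sample: Dict[str, Any]) -> str:
    return "real" if sample["family"] is None else "ai"


def toy_level2(sample: Dict[str, Any]) -> str:
    return sample["family"]


def toy_level3(sample: Dict[str, Any]) -> str:
    return sample["arch"]


TOY_LEVEL3 = {"gan": toy_level3, "dm": toy_level3}
```

With these placeholders, a sample tagged as coming from a DM passes through all three levels, while a pristine sample stops at Level 1. Stopping early is what lets each level be evaluated as a separate task.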