Abstract—The fast and continuous growth in the number and quality of deepfake videos calls for the development of reliable detection systems capable of automatically warning users on social media and on the Internet about the potential untruthfulness of such contents. While algorithms, software, and smartphone apps are getting better every day at generating manipulated videos and swapping faces, the accuracy of automated systems for face forgery detection in videos is still quite limited and generally biased toward the dataset used to design and train a specific detection system. In this paper we analyze how different training strategies and data augmentation techniques affect CNN-based deepfake detectors when training and testing on the same dataset or across different datasets.

WIFS'2020, December 6-11, 2020, New York, USA. 978-1-7281-9930-6/20/$31.00 ©2020 IEEE. This work was supported by the PREMIER project, funded by the Italian Ministry of Education, University, and Research within the PRIN 2017 program. Hardware support was generously provided by the NVIDIA Corporation.

I. INTRODUCTION

As the number of techniques and algorithms to generate deepfake videos and swap faces grows rapidly, the effort of the forensic community is steering ever more towards the development of reliable, robust, and automated deepfake detection methods. Techniques and pipelines for facial manipulation [1] and facial expression transfer between videos [2], [3] are rapidly improving [4], while the availability of source code (Deepfake [5], FaceSwap [6]) and even smartphone apps (Impressions [7], Doublicat [8]) makes face swapping available to a wider audience with either legitimate or harmful intents. Tampered video detection is not a novel task for the forensics community [9]–[11]. Codec history [12], [13], copy-move detection [14], [15], and frame duplication or deletion [16], [17] are just a few examples of the many contributions of the last decades. The main drawback of the earlier systems developed by the community is that the exploited traces are inherently subtle and vanish with compression or multiple editing operations [10]. The first generation of deepfake detection methods exploited several semantic traces, including eye blinking [18], face warping [19], head poses [20], or lighting inconsistencies [21]. Due to the improvement of new and more accurate generation techniques, methods based on semantic artifacts began to fail, leading to the proposal of data-driven solutions capable of providing localization information through multi-task learning [22], attention mechanisms [23], and ensembles of CNNs [24].

As detecting manipulated faces in videos becomes more important [25], [26], many deepfake detection systems proposed in the literature and in challenges are based on data-driven approaches, often backed by one or more CNNs trained on a specific dataset. However, the black-box model of data-driven CNN-based methods is notoriously prone to a drawback: over-fitting. Oftentimes, a bare train/validation/test split done within a single dataset, collected with a uniform methodology and by a single team, proves insufficient to avoid over-fitting on that very same dataset's conditions and scenarios. A recent example is shown in [27], where the winning model of the Facebook/Kaggle DeepFake Detection Challenge [28] scored an Average Precision of 82.56% on the public dataset used for the temporary leaderboard of the challenge, and then dropped to 65.18% on the sequestered dataset used for the final evaluation. Moreover, it is known that data dependency creates the risk of developing solutions unable to generalize to unseen methods or contexts.

While most detectors prove to be very effective on a test subset coming from the same data distribution they are trained on, what is the detection performance in a cross-dataset scenario? What happens when a CNN trained for deepfake detection on dataset A is tested on datasets B, C, and D? As it is difficult to gain direct insights about what happens inside a CNN black-box model, in this paper we offer a set of preliminary analyses of the cross-dataset performance of CNN-based deepfake detection approaches. Rather than focusing on developing a new technique optimized for a specific dataset, we train one of the most popular architectures used by competitors in the DeepFake Detection Challenge [28] and evaluate how different training approaches [24] and data augmentation techniques [29] affect the intra-dataset and cross-dataset detection performance. We base our experiments on publicly accessible datasets, i.e., FaceForensics++ [30], the DeepFake Detection Challenge Dataset [28], and CelebDF(v2) [31]. We focus on faces extracted from deepfake videos rather than just deepfake images, as video compression is usually stronger than image compression. We also perform some analysis taking into account a limited availability of training data. Far from being an exhaustive evaluation or overview of all the available techniques and datasets, we wish to share with the readers some insights to consider when developing a new deepfake detection system.
II. METHODOLOGY

In order to effectively compare the intra-dataset and cross-dataset detection performances, we first need to define a homogeneous training and testing methodology. The process of determining whether a face in a video is manipulated starts with a face detection and extraction phase. We rely on BlazeFace [32], a fast and GPU-enabled face detector, and we extract the face with the highest confidence from 32 frames for each video, uniformly sampled over time. This choice follows from [24], thus taking into account that time and computational power may be a limited resource. As the extracted faces have different scales and aspect ratios, we crop the faces with a fixed aspect ratio of 1:1 before resizing to a fixed size of 256 × 256 pixels. Once faces are extracted and uniform in size, we train an EfficientNetB4 [33] architecture as reference CNN, due to its popularity in the DeepFake Detection Challenge. The trained model is used to predict the likelihood of each face being fake. Results are reported at frame level as the Area Under the Curve (AUC) of a Receiver Operating Characteristic (ROC) curve.

TABLE I
ROC AUC FOR BASELINE INTRA- AND CROSS-DATASET DETECTION PERFORMANCE

Train\Test   CelebDF   DF      DFD     DFDC
CelebDF      0.998     0.615   0.708   0.665
DF           0.734     0.960   0.844   0.695
DFD          0.754     0.636   0.987   0.669
DFDC         0.755     0.722   0.891   0.922

TABLE II
ROC AUC FOR TRIPLET TRAINING INTRA- AND CROSS-DATASET DETECTION PERFORMANCE

Train\Test   CelebDF   DF      DFD     DFDC
CelebDF      0.995     0.557   0.554   0.619
DF           0.717     0.960   0.829   0.684
DFD          0.759     0.709   0.882   0.666
DFDC         0.773     0.714   0.886   0.907
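All values reported in this paper, including Tables I and II above, are frame-level ROC AUC scores. As a minimal illustration (not part of the original pipeline) of how such a score can be computed, assuming scikit-learn is available and per-frame fake-likelihood scores have already been collected:

    # Hedged example: frame-level ROC AUC from per-frame CNN outputs.
    from sklearn.metrics import roc_auc_score

    labels = [1, 1, 0, 0, 1, 0]                     # placeholder ground truth (1 = fake face)
    scores = [0.92, 0.71, 0.18, 0.40, 0.66, 0.07]   # placeholder fake-likelihood scores

    print(roc_auc_score(labels, scores))            # area under the ROC curve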
loss. Training and validation batches are always balanced, with
Among the several available datasets, we select the fol-
randomly selected equal amounts of real and fake faces. No
lowing four, due to their availability and ease of access and
data augmentation is performed at this stage. We train four
download:
CNN models on the training sets of the four datasets, then
• DF: FaceForensics [30], in its original version with 1000
test each model against the test set of each dataset.
real videos and 4000 fake videos generated with four
Results are reported in Table I, where the header column
different methods.
denotes the training dataset, while the header row reports the
• DFD: Actors-based videos added to FaceForensics [34],
test dataset. Reading the table by rows, we observe how on
with 363 real and 3068 fake videos.
CelebDF and DFD the intra-dataset detection is very accurate,
• DFDC: The DeepFake Detection Challenge [28], with
with an AUC above 0.98. This, however, is not reflected on
19154 real and 100000 fake videos.
cross-dataset performance, as the model trained on CelebDF
• CelebDF: The Celeb-DF(v2) dataset [31], with 890 real
and tested on DFD presents an AUC of just 0.708 (29% gap
and 5639 fake videos.
compared to intra-dataset AUC), while the model trained on
The four dataset are divided into disjoint train, validation, and DFD and tested on CelebDF reaches an AUC of 0.754 (23%
test sets at video level. In particular, for DF and DFD we gap). The model trained on DF has a slightly lower AUC
follow the 720/140/140 split proportion as suggested in [30]. when tested on the same dataset (0.960) with a 12% gap when
For DFDC we use the folders from 40 to 49 as test set and tested on DFD. DFDC is the dataset presenting the lowest
the folders from 35 to 39 as validation set. The remaining 40 intra-dataset AUC (0.922) being at the same time the one
folders are the training set. For CelebDF we use the test set that generalizes better, with 3%, 17%, and 20% gap to DFD,
provided by the dataset itself, and we randomly select 15% CelebDF, and DF, respectively. The baseline results are in
of the videos as validation set, with the remaining 85% for line with what expected from data-driven methods: the largest
training. For both DF and DFD we consider only the videos dataset (i.e., DFDC) seems to provide more variety during the
compressed with H.264 at CRF 23. training phase, thus better generalization on unseen data.
We run all our experiments with the PyTorch [35] frame-
work on a workstation equipped with two Intel Xeon E5-
IV. T RAINING STRATEGY
2687W-v4 and several NVIDIA Titan V.
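To make the extraction step concrete, the following is a minimal sketch (not the authors' code) of the preprocessing described above: 32 uniformly sampled frames per video, the highest-confidence face per frame, a square crop, and resizing to 256 × 256 pixels. The detect_faces callable is a placeholder standing in for BlazeFace [32]; only OpenCV and NumPy are assumed.

    import cv2
    import numpy as np

    def extract_faces(video_path, detect_faces, n_frames=32, out_size=256):
        """Sample n_frames uniformly; return square face crops resized to out_size."""
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        crops = []
        for idx in np.linspace(0, total - 1, n_frames, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if not ok:
                continue
            dets = detect_faces(frame)  # assumed to return (x, y, w, h, confidence) tuples
            if not dets:
                continue
            x, y, w, h, _ = max(dets, key=lambda d: d[-1])   # keep highest-confidence face
            side = max(w, h)                                  # enforce a 1:1 aspect ratio
            cx, cy = x + w // 2, y + h // 2
            x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
            crop = frame[y0:y0 + side, x0:x0 + side]
            crops.append(cv2.resize(crop, (out_size, out_size)))
        cap.release()
        return crops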
III. BASELINE

As a baseline for the upcoming experiments, we first evaluate the deepfake detection performance of EfficientNetB4 trained as a classifier with the Binary Cross Entropy (BCE) loss. The network is initialized with a model pre-trained on ImageNet and trained with batches of 32 faces and the Adam optimizer, with an initial learning rate of 10^-4 multiplied by a factor 0.1 after 2000 batch iterations with no reduction in validation loss. The training ends when the learning rate falls below 10^-8. The final model is the one at the iteration that minimizes the validation loss. Training and validation batches are always balanced, with randomly selected equal amounts of real and fake faces. No data augmentation is performed at this stage. We train four CNN models on the training sets of the four datasets, then test each model against the test set of each dataset.

Results are reported in Table I, where the header column denotes the training dataset, while the header row reports the test dataset. Reading the table by rows, we observe that on CelebDF and DFD the intra-dataset detection is very accurate, with an AUC above 0.98. This, however, is not reflected in cross-dataset performance, as the model trained on CelebDF and tested on DFD presents an AUC of just 0.708 (a 29% gap compared to the intra-dataset AUC), while the model trained on DFD and tested on CelebDF reaches an AUC of 0.754 (23% gap). The model trained on DF has a slightly lower AUC when tested on the same dataset (0.960), with a 12% gap when tested on DFD. DFDC is the dataset presenting the lowest intra-dataset AUC (0.922), while being at the same time the one that generalizes best, with 3%, 17%, and 20% gaps to DFD, CelebDF, and DF, respectively. The baseline results are in line with what is expected from data-driven methods: the largest dataset (i.e., DFDC) seems to provide more variety during the training phase, and thus better generalization on unseen data.
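For reference, a minimal sketch of this baseline training setup, assuming PyTorch and the timm implementation of EfficientNetB4 (the original code may differ); data loading and the balanced sampling of real and fake faces are omitted.

    import torch
    import timm  # assumption: timm is used to instantiate EfficientNetB4

    model = timm.create_model("efficientnet_b4", pretrained=True, num_classes=1)
    criterion = torch.nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Learning rate multiplied by 0.1 when the validation loss stops improving;
    # here patience is counted in validation checks rather than batch iterations.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10)

    def train_step(faces, labels):
        """faces: (B, 3, 256, 256) tensor; labels: (B,) tensor with 1 = fake, 0 = real."""
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(faces).squeeze(1), labels.float())
        loss.backward()
        optimizer.step()
        return loss.item()

    # After each validation pass: scheduler.step(val_loss); stop once
    # optimizer.param_groups[0]["lr"] < 1e-8 and keep the checkpoint with the
    # lowest validation loss as the final model.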
[Figure 1: grid of plots, one row per training dataset (CelebDF, DF, DFD, DFDC) and one column per test dataset (Test on CelebDF, DF, DFD, DFDC); x-axis: # videos in train set; y-axis: ROC AUC; legend: BCE vs. Triplet.]

Fig. 1. ROC AUC for BCE and triplet training in data-limited conditions. For each dataset two CNNs are trained selecting an increasing number of videos with BCE and triplet loss. Interestingly, we can see that the cross-dataset performances are generally higher on DFD. This might be related to the overall quality of the dataset: while DFD consists generally of high-resolution videos, the other datasets are more varied and also contain low-quality samples. Training for detection in such difficult settings might therefore be helpful in generalizing to different, yet higher-quality, datasets.
[Figure 2: four two-dimensional MDS scatter plots of REAL and FAKE frames. (a) Trained on CelebDF with BCE loss; (b) Trained on CelebDF with triplet loss; (c) Trained on DFDC with BCE loss; (d) Trained on DFDC with triplet loss.]

Fig. 2. MDS projection of 10 pairs of REAL/FAKE videos from the CelebDF test dataset. Each point represents a frame in a video; 32 frames are extracted from each video. Projections are produced starting from the features extracted by an EfficientNetB4 architecture trained on different datasets with binary cross entropy (BCE) or triplet loss.
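For readers interested in reproducing this kind of visualization, here is a minimal sketch (assuming scikit-learn) of projecting per-frame CNN feature vectors onto two dimensions; obtaining the feature matrix from the trained network is left as a placeholder.

    import numpy as np
    from sklearn.manifold import MDS

    def project_features(features, random_state=0):
        """features: (N, 1792) array of per-frame CNN feature vectors.
        Returns an (N, 2) array of points to scatter-plot, colored by REAL/FAKE."""
        return MDS(n_components=2, random_state=random_state).fit_transform(features)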
IV. TRAINING STRATEGY

The first analysis we perform is related to the training strategy adopted for the CNN. Instead of relying on the Binary Cross Entropy (BCE) loss, we train the CNN with a triplet loss [36], by running the CNN up to the last-minus-one layer (the features layer). Considering triplets as (anchor sample, positive sample, negative sample), we generate the training triplets as (fake face, fake face, real face) and (real face, real face, fake face) in equal numbers for each batch, so as to balance the batch itself. The training proceeds in a two-step fashion.

In the first step the CNN is trained with the triplet loss up to the features layer. In the second step, only the last layer (classifier) of the CNN is trained (fine-tuned) with binary cross entropy. In the EfficientNetB4 architecture, feature vectors have 1792 elements, while the classifier has 1793 weights (1792 multipliers and one bias coefficient). This means the classification layer accounts for less than 0.01% of the network coefficients. Triplet training is initialized with the model trained through BCE from the baseline, as this provides faster convergence and prevents the model from falling into a trivial solution (an all-zeros feature vector). The batch size is 10 triplets to fit into 12 GB of GPU memory; the initial learning rate is set to 10^-5 and is dropped by a factor 10 after 500 batch iterations with no improvement on the validation loss. The fine-tuning of the classifier is initialized with the triplet-trained model, with an initial learning rate of 10^-6 dropped by a factor 10 after 100 iterations with no validation loss improvement. Both the triplet training and the fine-tuning process are stopped when the learning rate falls below 10^-8. For both steps, the model at the iteration with the smallest validation loss is selected as the final one.

As for the baseline, we are interested in understanding both the intra- and cross-dataset detection performance, as reported in Table II. The results for DF, DFD, and CelebDF show almost the same intra-dataset detection AUC as with BCE training, with a loss in generalization capability more marked for the CelebDF dataset. For the model trained on DFDC, the intra-dataset detection AUC is similar to the BCE training, with slightly better cross-dataset AUC (a modest 2% increase with respect to the same combination in BCE training) only when testing on CelebDF.

A different perspective on the differences between BCE and triplet losses is offered in Figure 1, where EfficientNetB4 is trained in data-limited conditions by sub-sampling the training dataset. In this context, triplet loss proves beneficial in intra-dataset detection (DFDC, CelebDF, and DF) as well as in cross-dataset detection, and outperforms BCE.
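A minimal sketch of this two-step procedure, again assuming PyTorch and timm (not the authors' code): step one optimizes the feature extractor with a triplet margin loss on (anchor, positive, negative) face batches; step two freezes the backbone and fine-tunes only the final classifier with BCE. The margin value and other unstated details are assumptions.

    import torch
    import timm  # assumption: timm is used to instantiate EfficientNetB4

    # num_classes=0 makes timm return the pooled 1792-d feature vector; in the
    # paper the backbone is initialized from the BCE-trained baseline checkpoint.
    backbone = timm.create_model("efficientnet_b4", pretrained=True, num_classes=0)
    classifier = torch.nn.Linear(1792, 1)  # 1792 multipliers and one bias, as stated above

    # Step 1: triplet training of the feature extractor.
    triplet_loss = torch.nn.TripletMarginLoss(margin=1.0)  # margin value is an assumption
    opt_feat = torch.optim.Adam(backbone.parameters(), lr=1e-5)

    def triplet_step(anchor, positive, negative):
        """Triplets are (fake, fake, real) or (real, real, fake) face batches."""
        opt_feat.zero_grad()
        loss = triplet_loss(backbone(anchor), backbone(positive), backbone(negative))
        loss.backward()
        opt_feat.step()
        return loss.item()

    # Step 2: freeze the backbone and fine-tune only the classifier with BCE.
    for p in backbone.parameters():
        p.requires_grad = False
    opt_cls = torch.optim.Adam(classifier.parameters(), lr=1e-6)
    bce = torch.nn.BCEWithLogitsLoss()

    def finetune_step(faces, labels):
        opt_cls.zero_grad()
        loss = bce(classifier(backbone(faces)).squeeze(1), labels.float())
        loss.backward()
        opt_cls.step()
        return loss.item()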
Even though the triplet training procedure is not revolutionary in terms of AUC, we are interested in analyzing the differences in the representations learned at feature level with BCE and triplet loss, on the same dataset and across different datasets. To this end, Figure 2 shows the Multidimensional Scaling (MDS) projection onto two components of the features extracted with four differently trained EfficientNetB4 models. All four subplots project faces from the very same 10 pairs of real/fake videos randomly extracted from the CelebDF test set. Figure 2a uses features extracted with the CNN trained on the CelebDF dataset with BCE, while Figure 2b uses features extracted with the CNN trained on the same dataset with triplet loss instead. While in both cases the separation between real and fake frames is quite evident, in the triplet case the overlapping frames are fewer. This improvement could prove useful when aggregating the predictions from several frames at video level. Figures 2c and 2d are generated with features extracted by CNNs trained on DFDC with BCE and triplet loss, respectively. While the overlap between real and fake frames is certainly more evident than in Figures 2a and 2b, the triplet loss seems to offer a bit more separation between the two classes, despite the feature extractor being trained on a different dataset.

V. DATA AUGMENTATION

The second batch of experiments is devoted to understanding the effect of different data augmentation techniques. It is known that for deepfake images [29] some types of data augmentation prove beneficial in terms of robustness and cross-dataset generalization. Among the many possible data augmentation techniques, we focus on the subset that could represent the transformations a face undergoes in the wild. The following augmentations are considered:

• HF: Horizontal Flip
• BC: Brightness and Contrast changes
• HSV: Hue, Saturation and Value changes
• ISO: Addition of ISO noise
• GAUS: Addition of Gaussian noise
• DS: Downscaling with a factor between 0.7 and 0.9
• JPEG: JPEG compression with a random quality factor between 50 and 99

We test the aforementioned augmentations independently, training with BCE on the DFDC dataset. All the proposed experiments are performed with the Albumentations [37] framework. Results are reported in Figure 3, ordered left to right in decreasing order of AUC on the DFDC test set.

[Figure 3: ROC AUC per augmentation (HF, BC, NONE, JPEG, HSV, GAUS, ISO, DS), one curve per test dataset (CelebDF, DF, DFD, DFDC); y-axis: ROC AUC between 0.6 and 1.0.]

Fig. 3. ROC AUC of EfficientNetB4 trained with BCE on DFDC with different augmentation techniques. HF: Horizontal Flip. BC: Brightness and Contrast change. HSV: Hue, Saturation and Value changes. ISO: Addition of ISO noise. GAUS: Addition of Gaussian noise. DS: Down-scaling. JPEG: JPEG compression. MIX: baseline mix of all the other single augmentations. NONE: no augmentations. The horizontal lines are the AUC values when no augmentations are used.

Two interesting considerations can be drawn in light of these results.

First, augmentations do not seem to help much in increasing intra-dataset detection, maybe due to the cross-contamination between train, validation, and test sets in terms of video settings and scenarios. The only exception is the HF augmentation, which provides a boost of just 0.7% in AUC.

Second, some augmentations are beneficial (at times by a large margin) in terms of cross-dataset generalization. In particular, HF, BC, HSV, and JPEG provide an AUC increase on both CelebDF and DFD. DF does not seem to benefit much from augmentations, maybe due to the very different scenes depicted in DFDC compared to the ones in DF. While the former has actors at a distance, moving in the scene, often two actors, the latter has almost only a single actor, in the center of the scene, in a TV studio or during an interview with studio-level lights.
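For reference, here is a minimal sketch of how these augmentations, and the combined HF+BC+HSV+JPEG pipeline introduced next, can be expressed with Albumentations [37]. The application probabilities and any parameter not stated above are assumptions or library defaults.

    import albumentations as A

    # Single augmentations, each tested independently while training on DFDC.
    single_augs = {
        "HF":   A.HorizontalFlip(p=0.5),
        "BC":   A.RandomBrightnessContrast(p=0.5),
        "HSV":  A.HueSaturationValue(p=0.5),
        "ISO":  A.ISONoise(p=0.5),
        "GAUS": A.GaussNoise(p=0.5),
        "DS":   A.Downscale(scale_min=0.7, scale_max=0.9, p=0.5),
        "JPEG": A.ImageCompression(quality_lower=50, quality_upper=99, p=0.5),
    }

    # Combined pipeline built from the augmentations that helped cross-dataset AUC.
    combined = A.Compose([
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.5),
        A.ImageCompression(quality_lower=50, quality_upper=99, p=0.5),
    ])

    # Usage on a face crop (H x W x 3 uint8 array): augmented = combined(image=face)["image"]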
In light of the results for single augmentations, we build a data augmentation pipeline based on HF, BC, HSV, and JPEG, and re-train the CNN with both BCE and triplet loss. Table III reports the results for BCE loss. The fusion of augmentations brings important improvements in terms of cross-dataset detection AUC, with up to +9% when training on DFD and testing on CelebDF, and when training on CelebDF and testing on DFD. The intra-dataset detection performance is instead mostly unaffected. With augmentations applied to the CNN trained with triplet loss, Table IV shows how the few beneficial effects of triplet loss observed when training on the full dataset are not visible anymore. In fact, triplet loss with data augmentations provides a lower AUC for almost all combinations compared to BCE loss with data augmentation.

VI. CONCLUSIONS

Two main conclusions can be drawn from the experiments presented in this paper. First, a carefully built and tested data-augmentation pipeline can prove useful in increasing the generalization of a CNN model for deepfake video detection across different datasets. Not all augmentations are beneficial though, and checking the usefulness of each type of augmentation could be an important step in the workflow of developing a detection pipeline. Second, triplet loss proves to be helpful in terms of both intra-dataset and cross-dataset detection performance under limited availability of training data. When large datasets are available, data augmentation on a BCE-trained CNN architecture proves to be the winning combination.

REFERENCES

[1] M. Zollhöfer, J. Thies, P. Garrido, D. Bradley, T. Beeler, P. Pérez, M. Stamminger, M. Nießner, and C. Theobalt, "State of the art on monocular 3D face reconstruction, tracking, and applications," Computer Graphics Forum, vol. 37, pp. 523–550, 2018.
[2] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2Face: Real-time face capture and reenactment of RGB videos," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[3] J. Thies, M. Zollhöfer, and M. Nießner, "Deferred neural rendering: Image synthesis using neural textures," ACM Transactions on Graphics (TOG), vol. 38, pp. 1–12, 2019.
[4] L. Li, J. Bao, H. Yang, D. Chen, and F. Wen, "Advancing high fidelity identity swapping for forgery detection," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[5] "Deepfakes GitHub," https://github.com/deepfakes/faceswap.
[6] "FaceSwap," https://github.com/MarekKowalski/FaceSwap/.
[7] "Impressions," https://impressions.app/.
[8] "Doublicat," https://doublicat.com/.
[9] A. Rocha, W. Scheirer, T. Boult, and S. Goldenstein, "Vision of the unseen: Current trends and challenges in digital image and video forensics," ACM Computing Surveys, vol. 43, pp. 1–42, 2011.
[10] S. Milani, M. Fontani, P. Bestagini, M. Barni, A. Piva, M. Tagliasacchi, and S. Tubaro, "An overview on video forensics," APSIPA Transactions on Signal and Information Processing, vol. 1, p. e2, 2012.
[11] M. C. Stamm, M. Wu, and K. J. R. Liu, "Information forensics: An overview of the first decade," IEEE Access, vol. 1, pp. 167–200, 2013.
[12] P. Bestagini, S. Milani, M. Tagliasacchi, and S. Tubaro, "Codec and GOP identification in double compressed videos," IEEE Transactions on Image Processing (TIP), vol. 25, pp. 2298–2310, 2016.
[13] D. Vázquez-Padín, M. Fontani, D. Shullani, F. Pérez-González, A. Piva, and M. Barni, "Video integrity verification and GOP size estimation via generalized variation of prediction footprint," IEEE Transactions on Information Forensics and Security (TIFS), vol. 15, pp. 1815–1830, 2020.
[14] P. Bestagini, S. Milani, M. Tagliasacchi, and S. Tubaro, "Local tampering detection in video sequences," in IEEE International Workshop on Multimedia Signal Processing (MMSP), 2013.
[15] L. D'Amiano, D. Cozzolino, G. Poggi, and L. Verdoliva, "A patchmatch-based dense-field algorithm for video copy-move detection and localization," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 29, pp. 669–682, 2019.
[16] M. C. Stamm, W. S. Lin, and K. J. R. Liu, "Temporal forensics and anti-forensics for motion compensated video," IEEE Transactions on Information Forensics and Security (TIFS), vol. 7, pp. 1315–1329, 2012.
[17] A. Gironi, M. Fontani, T. Bianchi, A. Piva, and M. Barni, "A video forensic technique for detecting frame deletion and insertion," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[18] Y. Li, M. Chang, and S. Lyu, "In ictu oculi: Exposing AI created fake videos by detecting eye blinking," in IEEE International Workshop on Information Forensics and Security (WIFS), 2018.
[19] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
[20] X. Yang, Y. Li, and S. Lyu, "Exposing deep fakes using inconsistent head poses," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[21] F. Matern, C. Riess, and M. Stamminger, "Exploiting visual artifacts to expose deepfakes and face manipulations," in IEEE Winter Applications of Computer Vision Workshops (WACVW), 2019.
[22] H. H. Nguyen, F. Fang, J. Yamagishi, and I. Echizen, "Multi-task learning for detecting and segmenting manipulated facial images and videos," CoRR, vol. abs/1906.06876, 2019.
[23] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. Jain, "On the detection of digital face manipulation," CoRR, vol. abs/1910.01717, 2019.
[24] N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini, and S. Tubaro, "Video face manipulation detection through ensemble of CNNs," in International Conference on Pattern Recognition (ICPR), 2020.
[25] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li, "Protecting world leaders against deep fakes," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
[26] L. Verdoliva, "Media forensics and deepfakes: an overview," CoRR, vol. abs/2001.06564, 2020.
[27] "DeepFake Detection Challenge results," https://ai.facebook.com/blog/deepfake-detection-challenge-results-an-open-initiative-to-advance-ai.
[28] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. Canton Ferrer, "The DeepFake Detection Challenge dataset," CoRR, vol. abs/2006.07397, 2020.
[29] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, "CNN-generated images are surprisingly easy to spot... for now," CoRR, vol. abs/1912.11035, 2019.
[30] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in International Conference on Computer Vision (ICCV), 2019.
[31] Y. Li, P. Sun, H. Qi, and S. Lyu, "Celeb-DF: A large-scale challenging dataset for DeepFake forensics," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[32] V. Bazarevsky, Y. Kartynnik, A. Vakunov, K. Raveendran, and M. Grundmann, "BlazeFace: Sub-millisecond neural face detection on mobile GPUs," CoRR, vol. abs/1907.05047, 2019.
[33] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning (ICML), 2019.
[34] "FaceForensics++," https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html.
[35] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems (NIPS), 2019.
[36] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu, "Learning fine-grained image similarity with deep ranking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[37] A. V. Buslaev, A. Parinov, E. Khvedchenya, V. I. Iglovikov, and A. A. Kalinin, "Albumentations: fast and flexible image augmentations," CoRR, vol. abs/1809.06839, 2018.