Optimization of DeepFake Video Detection Using Image Preprocessing
2023 Fifth International Conference on Advances in Computational Tools for Engineering Applications (ACTEA) | 979-8-3503-0995-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ACTEA58025.2023.10193954
Ali Berjawi, Cybersecurity and Forensics Department, Arab Open University, Beirut, Lebanon, [email protected]
Khouloud Samrouth, Cybersecurity and Forensics Department, Arab Open University, Beirut, Lebanon, [email protected]
Olivier Deforges, IETR, INSA, Rennes, France, [email protected]
Abstract—Deep learning has evolved rapidly in recent years, which has allowed it to handle complex problems in areas such as big data, computer vision, and human-level control. One recently emerged deep learning-powered application is the "deepfake". Deepfake algorithms have been a controversial development in Artificial Intelligence because they use deep learning to generate fake yet realistic content based on an input dataset. As a result, many are concerned about the potential cybersecurity risks, as deepfakes threaten privacy, democracy, and national security. Multiple techniques have been proposed to detect deepfake videos; however, most cannot cope with the variety of deepfake generation techniques. Therefore, in this study, we optimize one of the best existing deepfake detection methods, which is based on the Xception model. In particular, our proposed optimization scheme consists of a pre-processing phase that performs advanced image enhancement on the videos in order to highlight the face features for better feature extraction as well as fake content detection, and which is preceded by a close-up dataset cleansing. Our experiments show that the proposed pre-processing optimization scheme improves the performance of the Xception Binary Classifier-Inference model from 94% to 96%.

Keywords—Deep Learning, Image Processing, Deepfake, Pre-processing, Xception Binary Classifier-Inference model

I. INTRODUCTION

Artificial Intelligence (AI) refers to sophisticated, intelligent computer systems that are capable of making computations similar to those that the human brain routinely performs. It includes methods, tools, and systems devoted to simulating human methods of logical and inductive knowledge acquisition for solving problems [1]. Machine learning has been advancing in recent years, and we have now reached the technology of deep learning, an advance of machine learning that contains hidden layers [2].

Like everything, deep learning can be used for good, but it can also be harmful, with an impact as wide as nations and continents [3]. One of the technologies that uses deep learning is the so-called "deepfake video", which is spreading like wildfire around the world, even though it has an innocent side. Fake generated videos of politicians and celebrities are becoming a cybersecurity threat with a huge impact on social security [4]. Deepfakes have even been used to target high-ranking political figures, as happened with US President Joe Biden: a faked video of him talking about the riot of 6 January 2021 made a huge impact on the White House, while it was considered to be part of a wider conspiracy theory [5]. Other examples include the FakeNude App and many other tools that use Faceswap, puppet-master, lip-sync, etc. [6], which have targeted multiple political figures as well as celebrities. The biggest problem is that, with the advancement of deep learning, deepfake videos can now use highly sophisticated methods with very high accuracy, as illustrated in Fig. 1 [7].

Fig. 1: Real vs DeepFake video for Jennifer Lawrence

In this paper, we optimize a recently developed intelligent system for detecting deepfake videos, called the Xception Binary Classifier-Inference model, and find a way to improve it in order to reach a higher accuracy. This will help limit the spread of deepfake videos and therefore reduce their harmful effects at the social, political, and security levels. In particular, our proposed optimization scheme is a pre-processing phase that changes some characteristics of the files within the dataset used. Our experiments show that this pre-processing phase improves the deepfake detection results of the Xception Binary Classifier-Inference model.

The remainder of the paper is organized as follows. Section II summarizes previous studies focused on deepfake detection. Section III explains our proposed optimization scheme. Section IV describes the experimental setup and the results achieved. Section V concludes our work.

II. RELATED WORK

Deep learning has achieved great success in deepfake detection, where a set of features extracted from the image or video is fed to a deep neural network that decides whether it is real or fake [8].
In [9], image descriptors are extracted using Speeded Up Robust Features (SURF). These features are used to train an SVM classifier [10]. This approach improved deepfake image detection performance; however, it is unable to detect manipulated videos. The 3D head position can also be estimated from 2D facial landmarks, with the computed difference among the head poses used as a feature vector to train the SVM classifier in order to differentiate between original and forged content. This technique exhibits good performance for deepfake detection but has a limitation in estimating landmark orientation in blurred images, which degrades its performance under such scenarios.

In [11], Güera et al. presented a method for detecting synthesized faces in videos. Multimedia stream descriptors were used to extract the features, which were then used to train SVM and random forest classifiers to differentiate between real and manipulated faces in the videos [12]. This technique gives an effective solution to deepfake detection; however, it is unable to perform well against video re-encoding attacks.

Li et al. [13] extracted the facial landmarks using the dlib software package. Next, deep CNN-based models such as ResNet152, ResNet101, ResNet50, and VGG16 were trained to detect forged content in videos. This approach is more robust in detecting forensic changes; however, it exhibits low performance on multi-time compressed videos [14].

An RNN was trained on the set of extracted features to detect deepfakes in the input videos. This work achieves good

Fig. 2: Xception Architecture

TABLE I: Existing Deepfake detection models

| Authors | Model | Description | Accuracy (%) |
|---|---|---|---|
| Juman Noor, Muneeba Daud, Raima Rashid, Hira Mir, Saima Nazir, Sergio A. Velastin | Support Vector Machine (SVM) | Facial Expression Recognition using Hand-Crafted Features and Supervised Feature Encoding | 90.79 |
| David Güera, Sritam Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp | Random Forest, Support Vector Machine (SVM) | We need no pixels: Video manipulation detection using Stream Descriptors | 91.97 |
| Yuezun Li, Siwei Lyu | Convolutional Neural Network (CNN) | Exposing Deepfake videos by detecting Face warping artifacts | 93.2 |
| François Chollet (Google, Inc.) | Extreme Inception (Xception) | Deep learning with depthwise separable convolutions | 94.5 |
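The accuracies listed in Table I can be set against the figures this paper reports in Section IV with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary keys and the helper name `gains_over` are hypothetical labels chosen here, while the numbers are the ones stated in Table I and in the results subsections.

```python
# Accuracies (%) as listed in Table I of this paper.
existing_models = {
    "SVM (Noor et al.)": 90.79,
    "Random Forest / SVM (Guera et al.)": 91.97,
    "CNN (Li and Lyu)": 93.2,
    "Xception (Chollet)": 94.5,
}

# Accuracies (%) reported in Section IV for the pre-processing variants.
preprocessing_results = {
    "dataset cleansing only": 95.1,
    "CLAHE only": 95.6,
    "cleansing + CLAHE": 96.0,
}


def gains_over(baseline, results):
    """Gain of each result over the baseline, in percentage points."""
    return {name: round(acc - baseline, 2) for name, acc in results.items()}


baseline = existing_models["Xception (Chollet)"]
gains = gains_over(baseline, preprocessing_results)
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points over Xception")
```

Measured against the 94.5% entry of Table I, the combined scheme gains 1.5 percentage points, which the text reports as an increase of around 2%.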
of 200 fake videos were removed (9% of the fake videos were removed). To keep the dataset balanced, we removed the same number of real videos from the dataset. Hence, 36 of 400 videos were removed from the whole dataset (9% of the videos).

B. Image Enhancement

The second block of our pre-processing phase consists of enhancing the quality of the cleaned dataset images in order to better highlight their features. Multiple image enhancement methods can be used efficiently. In particular, we use Contrast Limited Adaptive Histogram Equalization (CLAHE) to increase the contrast within the files in the dataset.

As the fake videos differ in brightness and background details from one video to another, the challenge is to find a suitable configuration that emphasizes feature extraction and avoids any feature loss. To get the best results, we had to make several trials before selecting the final values. In particular, CLAHE has two parameters, Clip-limit and Window-size, which should be carefully selected. From a mathematical point of view, the search for the optimal pair of values is highly complex. Therefore, for the sake of our optimization model, we first fixed the Clip-limit parameter in order to find the optimal value of the Window-size. Then, based on the selected Window-size, we searched for the optimal value of the Clip-limit parameter.

More precisely, we first varied the Window-size from 8 to 21 with the Clip-limit fixed to 7. We noticed that when the Window-size is less than 11, defects start to appear in the image, which costs accuracy, and that for Window-size values between 11 and 16, the brightness and the white regions decrease. Therefore, we empirically set the Window-size to 15, which yielded the most extracted features.

After finding the optimal Window-size, we needed to find the optimal Clip-limit. We fixed the Window-size to its optimum value (15) and tried different Clip-limit values. We noticed that when the Clip-limit is set to 7, the edges and contours are very clear, as illustrated in Fig. 6.

Therefore, the best optimization was obtained with a Window-size of 15 and a Clip-limit of 7. Fig. 5 and Fig. 6 show examples of real and fake video frames, respectively, enhanced using CLAHE with the selected parameters, together with their corresponding histograms.

Fig. 5: Real Images Enhanced Using CLAHE with their histograms

IV. EXPERIMENTAL RESULTS

A. Dataset

The original dataset used for testing the model consists of 400 videos, split into 200 real and 200 fake, taken as is from Kaggle's deepfake challenge.

B. Development Environment

We implemented the original Xception Binary Classifier-Inference model and our proposed optimization scheme, including the different enhancement methods, in a Jupyter notebook on a system with an NVIDIA GeForce GTX 1660 Ti with Max-Q Design GPU, an Intel Core i7-9750H 2.60 GHz CPU, 16 GB of RAM, and the Windows 11 Pro operating system. We installed NVIDIA CUDA Toolkit 11.8.0 522.06 and NVIDIA cuDNN v8.5.0. CUDA takes advantage of the parallel processing power of the GPU to accelerate compute-intensive deep learning applications, and cuDNN is a GPU-accelerated library of primitives for deep neural networks that provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

C. Results

To improve the experimental analysis, we first tested the impact of each step of the pre-processing phase separately on the binary classification accuracy of the Xception Binary Classifier-Inference model. Subsequently, we combined the two steps to further increase the accuracy.
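To make the role of the two CLAHE parameters from Section III-B more concrete, the sketch below shows, in plain Python, a simplified clip-limited histogram equalization of a single window together with the two-stage parameter search (Window-size first, then Clip-limit). It is an illustration under stated assumptions, not the implementation used in this work: the clip limit is treated as an absolute per-bin count, the interpolation between neighbouring windows that full CLAHE performs is omitted, and `score` is a hypothetical quality measure standing in for the empirical inspection described in the text. The candidate Clip-limit values 1 to 10 are likewise an assumption.

```python
def clipped_histogram(tile, clip_limit, levels=256):
    """Histogram of `tile` with every bin first capped at `clip_limit`.

    The clipped excess is then spread evenly over all bins, so the total
    count is unchanged. Real CLAHE implementations differ in how they
    express the limit and redistribute the excess; this is a simplification.
    """
    hist = [0] * levels
    for value in tile:
        hist[value] += 1
    excess = sum(max(0, count - clip_limit) for count in hist)
    hist = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, levels)
    return [count + share + (1 if i < remainder else 0)
            for i, count in enumerate(hist)]


def equalize_tile(tile, clip_limit, levels=256):
    """Map each pixel through the cumulative clipped histogram of its tile."""
    running, lookup = 0, []
    for count in clipped_histogram(tile, clip_limit, levels):
        running += count
        lookup.append(round((levels - 1) * running / len(tile)))
    return [lookup[value] for value in tile]


def two_stage_search(score, window_sizes, clip_limits, initial_clip):
    """Pick the best window size at a fixed clip limit, then the best clip
    limit at that window size. `score` is a hypothetical quality measure."""
    best_window = max(window_sizes, key=lambda w: score(w, initial_clip))
    best_clip = max(clip_limits, key=lambda c: score(best_window, c))
    return best_window, best_clip


def toy_score(window_size, clip_limit):
    """Toy stand-in for visual inspection; peaks at (15, 7) by construction."""
    return -((window_size - 15) ** 2) - (clip_limit - 7) ** 2


# Toy window with 16 gray levels, dominated by a single value.
tile = [5] * 14 + [6] * 2
enhanced = equalize_tile(tile, clip_limit=4, levels=16)

# Window sizes 8..21 as in the text; clip-limit candidates are assumed.
best = two_stage_search(toy_score, range(8, 22), range(1, 11), initial_clip=7)
print(enhanced, best)
```

The two-stage search evaluates the sum of the two candidate counts rather than their product, which is the practical reason for fixing one parameter at a time; the trade-off is that it can miss a better pair when the two parameters interact.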
Fig. 6: Fake Images Enhanced Using CLAHE with their histograms

D. Results with separated pre-processing steps

We first examined, separately, the effect of each pre-processing step on the binary classification accuracy of deepfake detection using the original Xception Binary Classifier-Inference model. Starting with the dataset cleansing, removing the dirty data while keeping the dataset balanced increased the accuracy to 95.1% (an increase of around 0.6% compared with the original Xception-based model).

Regarding the contrast enhancement step, the CLAHE method increased the accuracy of the Xception-based model to 95.6% (an increase of around 1.1% compared with the original Xception-based model). The reason is the increased visibility of the different edges and contours, which leads to better feature extraction.

E. Results with combined pre-processing steps

Then, we tested the combination of the dataset cleansing and the contrast enhancement method. In particular, after enhancing the contrast of the frames in the cleaned dataset using CLAHE, the enhanced cropped faces are fed to the pre-trained Xception model. Our proposed two-step pre-processing phase further increased the accuracy of the Xception-based model to 96% (an increase of around 2% compared with the original Xception Binary Classifier-Inference model).

We note that we did not conduct any time complexity analysis, as we believe that the adopted image enhancement tools are lightweight and do not add any layer of complexity. An accurate quantitative measurement is left as future work.

V. CONCLUSION

In this paper, we contributed to the existing Xception Binary Classifier-Inference model for deepfake detection by adding a two-step pre-processing phase. We first cleaned the dataset by removing videos with major visual artifacts, especially in the face region. Then, we applied image enhancement using CLAHE to amplify the contrast within the frames of the videos. Our pre-processing phase resulted in clearly visible edges and contours, in addition to an increase in the overall accuracy of deepfake detection. Therefore, this optimized model can be implemented on any processing system for deepfake detection to assist forensic investigators and increase the integrity of the evidence provided to courts of law.

As future work, we will verify whether our contribution remains valid for the recent identity-aware deepfake detection approaches. A possible challenge would be the absence of a fake dataset, as this recent approach relies only on a real dataset.

REFERENCES

[1] R. Stroup and K. Niewoehner, "Application of artificial intelligence in the national airspace system – a primer," pp. 1–14, 04 2019.
[2] M. Deshmukh, "Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research," Apr 2014.
[3] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research 1, 2009.
[4] N. Anantrasirichai and D. Bull, Artificial Intelligence in the Creative Industries: A Review. 07 2020.
[5] J. Horton and S. Sardarizadeh, "False claims of 'deepfake' president Biden go viral," Jul 2021.
[6] T. Nguyen, C. M. Nguyen, T. Nguyen, D. Nguyen, and S. Nahavandi, "Deep learning for deepfakes creation and detection: A survey," 09 2019.
[7] A. Almars, "Deepfakes detection techniques using deep learning: A survey," Journal of Computer and Communications, vol. 09, pp. 20–35, 01 2021.
[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Jun 2014.
[9] J. Noor, M. Daud, R. Rashid, H. Mir, S. Nazir, and S. A. Velastin, "Facial expression recognition using hand-crafted features and supervised feature encoding," in 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), pp. 1–5, 2020.
[10] S. L. Happy and A. Routray, "Automatic facial expression recognition using features of salient facial patches," IEEE Transactions on Affective Computing, vol. 6, 05 2015.
[11] D. Güera, S. Baireddy, P. Bestagini, S. Tubaro, and E. J. Delp, "We need no pixels: Video manipulation detection using stream descriptors," 2019.
[12] K. Jack, "Chapter 13 - MPEG-2," in Video Demystified (Fifth Edition) (K. Jack, ed.), pp. 577–737, Burlington: Newnes, fifth edition ed., 2007.
[13] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," 2018.
[14] R. Pourreza, A. Ghodrati, and A. Habibian, "Recognizing compressed videos: Challenges and promises," pp. 999–1007, 10 2019.
[15] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," 2016.
[16] Z. Akhtar, "Xception: Deep learning with depth-wise separable convolutions," Mar 2021.
[17] M. Fabien, "Xception model and depthwise separable convolutions," Mar 2019.