CNN Based Deep Learning Model for Deepfake Detection
Abstract—In recent years there has been massive progress in synthetic image generation and manipulation, which significantly raises concerns about its ill applications towards society. This can result in the spread of false information, leading to a loss of trust in digital content. This paper introduces an automated and effective approach to detecting facial manipulation in videos, especially focused on the latest methods used to produce hyper-realistic fake videos: Deepfake. Using the FaceForensics++ dataset for training our model, we achieved a more than 99% successful detection rate on Deepfake, Face2Face, FaceSwap and NeuralTextures. Regular image forensics techniques are usually not very useful because of the strong deterioration of the data due to compression. Thus, this paper follows a layered approach: first detecting the subject with the help of existing facial recognition networks, then extracting facial features using a CNN, then passing them through an LSTM layer, where we exploit the temporal sequence to detect face manipulation between frames. Finally, we use Recycle-GAN, which internally makes use of generative adversarial networks, to merge spatial and temporal data.

Index Terms—Face Detection, FaceForensics++, DeepFake, Face2Face, FaceSwap, Neural Texture, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM)

I. INTRODUCTION

Over the decades, the popularity of smartphones and the growth of social media have turned digital photos and videos into the most popular digital assets. On YouTube alone, 300 hours of video are uploaded every minute. Every day, 5 billion videos are viewed and 1 billion hours are streamed, with Facebook and Netflix combined. This heavy use of digital photography has been followed by a rise in photo editing techniques, with editing software such as Photoshop as an example. The proliferation of deepfakes in recent years raises serious concerns about the authenticity of digital content in the media and other online forums. For example, Deepfake (a blend of "deep learning" and "fake") is a method that can transfer a person's facial expressions from a source video to create a video of a target person acting or speaking like the source. It has demonstrated how computer graphics and visual effects can be used to defame people by making their faces look like those of other persons. A basic way to create deepfakes is with deep learning models such as autoencoders and generative adversarial networks, which are widely used in the field of computer vision. These models are used to analyze a person's facial expressions and movements and to synthesize images of another person's face making similar expressions and movements [1]. Deepfake methods often require large amounts of image and video data to train models to produce realistic photos and videos. Because public figures such as celebrities and politicians have a large number of videos and photos available online, they were the first deepfake victims [1]. Many politicians and actors became victims of deepfakes. For criminal purposes, videos are manipulated using novel methods such as face swap and faceswap-GAN. Analyzing this issue, several methods of detecting deceptive images have been proposed; most of them either analyze inconsistencies with respect to the conventional camera pipeline or rely on the artifacts that image manipulation leaves in the resulting image. To detect video forgeries, a few algorithms using handcrafted features, deep learning algorithms, and more recently GAN-based methods are being tested. For example, handcrafted methods include steganalysis and the detection of 3D head-pose inconsistencies. However, there is still room for improvement in deepfake detection, especially on challenging data such as the FaceForensics++ (FF++) database. In this paper we have selected ResNet18 as the base architecture of our model. ResNet18 was chosen because it takes care of a major problem: the vanishing and exploding gradient. It does this by using what are known as skip connections. The advantage of adding this type of connection is that if any layer hurts the performance of the architecture, it can simply be skipped. The biggest
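The skip connections described above can be illustrated with a minimal sketch. This is illustrative only: ResNet18's real blocks use convolutions and batch normalization, while the toy block below uses two small dense layers so the effect of the identity path is easy to see.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A simplified residual (skip-connection) block: out = relu(F(x) + x).

    If the learned branch F contributes nothing (weights near zero),
    the block falls back to roughly the identity, so a harmful layer
    can effectively be skipped. This is the intuition behind ResNet's
    resistance to vanishing/exploding gradients.
    """
    f = relu(x @ w1) @ w2   # the learned residual branch F(x)
    return relu(f + x)      # skip connection adds the input back

# With zero weights the block reduces to relu(x): the input passes through.
x = np.array([1.0, -2.0, 3.0])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)
print(out)  # [1. 0. 3.]
```

Because the identity path bypasses the learned branch, gradients can flow through the addition unchanged, which is what keeps very deep networks trainable.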
Authorized licensed use limited to: K K Wagh Inst of Engg Education and Research. Downloaded on August 21,2024 at 16:22:36 UTC from IEEE Xplore. Restrictions apply.
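The layered detection approach outlined in the abstract (per-frame CNN features fed to an LSTM over the temporal sequence, with the last hidden state scored as real or fake) can be sketched as follows. This is a toy illustration with random weights and random stand-in "CNN features", not the paper's trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b pack the input, forget, output and cell gates."""
    z = W @ x + U @ h + b          # shape (4H,)
    H = h.shape[0]
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c + i * g              # update cell state
    h = o * np.tanh(c)             # update hidden state
    return h, c

def classify_sequence(frame_features, W, U, b, w_out):
    """Run per-frame CNN features through the LSTM; score the final state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in frame_features:       # temporal sequence over video frames
        h, c = lstm_step(x, h, c, W, U, b)
    return sigmoid(w_out @ h)      # probability that the clip is fake

rng = np.random.default_rng(0)
D, H, T = 8, 4, 5                  # feature dim, hidden size, number of frames
features = rng.normal(size=(T, D)) # stand-in for per-frame CNN outputs
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(size=H)
p_fake = classify_sequence(features, W, U, b, w_out)
```

In the actual pipeline the `features` array would come from the CNN backbone applied to detected face crops, and the weights would be learned rather than random.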
b) FaceSwap: It is used for facial identity manipulation. It is a graphics-based approach that uses each frame to build a model of the source face and then projects this model onto the target by minimizing the distance between the two frames.
c) NeuralTextures: This is a technique that makes use of GANs for facial reenactment.

thus increasing the accuracy of our model.
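The graphics-based fitting in (b), projecting a source face model onto the target by minimizing the distance between corresponding points, is essentially a least-squares similarity (Procrustes/Umeyama) alignment. Below is a minimal numpy sketch of that alignment step for 2D landmark sets; it is an illustration of the principle, not the actual FaceSwap code.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping src landmarks onto dst, i.e. minimizing ||s * R @ p + t - q||
    over corresponding points. Classic Umeyama/Procrustes solution."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # cross-covariance of the sets
    Um, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(Um @ Vt))       # guard against reflections
    D = np.diag([1.0] * (len(S) - 1) + [d])
    R = Um @ D @ Vt                           # optimal rotation
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()  # optimal scale
    t = mu_d - s * R @ mu_s                   # optimal translation
    return s, R, t

# Sanity check: recover a known transform (rotate a unit square by 90
# degrees, scale by 2, shift by (3, 4)).
src = np.array([[0.0, 0], [1, 0], [1, 1], [0, 1]])
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
dst = 2.0 * src @ R_true.T + np.array([3.0, 4.0])
s, R, t = similarity_transform(src, dst)
print(round(float(s), 6))  # 2.0
```

Real face-swap pipelines fit a full 3D face model rather than a 2D similarity transform, but the "minimize the distance between corresponding points" step is the same idea.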
with the specific correction related to color applied accurately. We apply these steps to all the individual pairs and target pairs until one of the videos ends. The frames detected for output are then used to build a surface populated with high concentration and density (refer Fig. 4a, 4b). Then these collected frames are used to match properly with the dataset faces under various facial expressions and lighting conditions. To analyze the dataset videos precisely, we used the Face2Face method in order to duplicate the frames and achieve the required result. We process each video through a pre-processing stage; here, we use the first frames to obtain a temporary face identity (i.e., a 3D model) and track it over the remaining frames.

Fig. 4. Result Set 1

2) YouTube dataset for trained models: Models trained with one-to-one YouTube data learn to find real-world deepfakes, but also learn to find simple deepfakes in paper databases. These models, however, failed to detect any other type of manipulation (such as NeuralTextures). The large FaceForensics++ database enables us to train a modern forged-image detector (refer Fig. 5a, 5b). In this case, we use three default facial expressions, which are used in our database. To mimic real-life situations, we have chosen to collect videos from anywhere online and on YouTube. The initial testing with the mentioned methods made us realize that the dataset face must be redirected with minimum delay in order for the test not to fail and thereby produce accurate results. So, we did a manual review of the resulting clips to ensure the selection of high-quality video and to avoid videos with an occluded face. We have selected approximately 300,000 images for our dataset to be processed by the three above-mentioned algorithms. All tests are performed using the dataset videos from the set. The NeuralTextures method is based on the geometry that is used during train and test times. The Face2Face module was used to produce the gathered information. It was used to identify and correct the expressions using the mouth regions only. The other parts, like the eye area, were not modified, since if they were to be modified then the network would additionally require extra input based on the movements of the eye.

Fig. 5. Result set 2

TABLE I
ACCURACY OF DIFFERENT ALGORITHMS TESTED

Method           Train   Validation  Test    Raw     HQ      LQ
Bayar and Stamm  280374  52359       56382   98.74%  82.97%  66.84%
Rahmouni et al.  280342  52356       56371   97.03%  79.08%  61.18%
MesoNet          295164  55317       60540   95.23%  83.10%  70.47%
XceptionNet      295578  55384       60614   99.26%  95.73%  81.00%

VI. CONCLUSION

Deepfakes have led people to trust the media less and to see it as less trustworthy and consistent with its contents. They may cause distress and harm to those who are targeted, fuel misinformation and hate speech, and they may provoke political unrest, inflaming society, violence, or war. This is especially important these days, as the technology for creating deepfakes is readily available and social media can spread such untrue content quickly. Sometimes deepfakes do not need to be distributed to a large audience to create harmful effects. People who build deepfakes with malicious intent only need to deliver them to the target audience as part of their destructive strategy, without using a social media platform. As new methods of deception emerge by the day, it is necessary to develop methods that can detect fakes with minimal training data. Our benchmark is already being used for this transfer learning process, where the knowledge of one source of manipulation is transferred to another target domain. We hope that the database and benchmark will be a stepping stone for future research in the field of digital media forensics, especially with a focus on facial forgery. To summarize, we were able to propose an automated benchmark for facial manipulation detection under random compression for standardized comparison, including a human baseline. Comprehensive experiments with modern handcrafted and learned forgery detectors in a variety of settings are also shown, along with a modern method for detecting forgeries designed for facial manipulation.
REFERENCES

[1] T. T. Nguyen, C. M. Nguyen, D. T. Nguyen, D. T. Nguyen and S. Nahavandi, "Deep Learning for Deepfakes Creation and Detection: A Survey," 2019.
[2] P. Korus and J. Huang, "Multi-Scale Analysis Strategies in PRNU-Based Tampering Localization," IEEE Transactions on Information Forensics and Security, doi: 10.1109/TIFS.2016.2636089.
[3] D. Afchar, V. Nozick, J. Yamagishi and I. Echizen, "MesoNet: a Compact Facial Video Forgery Detection Network," 2018 IEEE International Workshop on Information Forensics and Security (WIFS), 2018, pp. 1-7, doi: 10.1109/WIFS.2018.8630761.
[4] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies and M. Niessner, "FaceForensics++: Learning to Detect Manipulated Facial Images," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1-11, doi: 10.1109/ICCV.2019.00009.
[5] K. Zhang, M. Sun, T. X. Han, X. Yuan, L. Guo and T. Liu, "Residual Networks of Residual Networks: Multilevel Residual Networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 6, pp. 1303-1314, June 2018, doi: 10.1109/TCSVT.2017.2654543.
[6] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[7] A. Almars, "Deepfakes Detection Techniques Using Deep Learning: A Survey," Journal of Computer and Communications, vol. 9, pp. 20-35, 2021, doi: 10.4236/jcc.2021.95003.
[8] B. Malolan, A. Parekh and F. Kazi, "Explainable Deep-Fake Detection Using Visual Interpretability Methods," 2020 3rd International Conference on Information and Computer Technologies (ICICT), 2020, pp. 289-293, doi: 10.1109/ICICT50521.2020.00051.
[9] S. Agarwal, N. Girdhar and H. Raghav, "A Novel Neural Model based Framework for Detection of GAN Generated Fake Images," 2021 11th International Conference on Cloud Computing, Data Science Engineering (Confluence), 2021, pp. 46-51, doi: 10.1109/Confluence51648.2021.9377150.
[10] N. S. Ivanov, A. V. Arzhskov and V. G. Ivanenko, "Combining Deep Learning and Super-Resolution Algorithms for Deep Fake Detection," 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), 2020, pp. 326-328, doi: 10.1109/EIConRus49466.2020.9039498.