
2023 4th International Conference on Artificial Intelligence and Data Sciences (AiDAS)

Comparative Analysis and Evaluation of CNN Models for Deepfake Detection
Pattrick Ritter, Devan Lucian, Anderies, and Andry Chowanda
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480

2023 4th International Conference on Artificial Intelligence and Data Sciences (AiDAS) | 979-8-3503-1843-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/AIDAS60501.2023.10284611

Abstract—Deepfake technology has become a significant concern due to its ability to create highly realistic fake videos and images, leading to the potential deception of individuals. Detecting deepfakes has become a critical research area in computer vision and multimedia forensics. This paper presents a comparative analysis of deepfake detection models, focusing on evaluating their accuracy and robustness. Four CNN models, namely ResNet-152, MobileNetV3, ConvNeXt Large, and EfficientNetB7, were implemented and trained using a custom dataset obtained from FaceForensics++. The models were evaluated based on training accuracy, average loss, and testing accuracy. An LSTM layer was also incorporated into each model's architecture to leverage sequential information. The results demonstrate varying performance among the models, with EfficientNetB7 achieving the highest testing accuracy of 75%. The findings of this study provide insights for future research in this critical area.

Index Terms—deepfake detection, CNN models, comparative analysis, accuracy, LSTM layer

I. INTRODUCTION

Deepfake refers to the alteration of digital media, such as photos and videos, through manipulations that replace the appearance of one person with that of another. Deepfakes have raised significant concerns because they enable the creation of highly realistic fake videos and images that can deceive individuals. The detection of deepfake media has therefore become a critical research area in computer vision and multimedia forensics.

Convolutional neural networks (CNNs) have been widely used in deepfake detection methods because they effectively extract and analyze visual features from images and videos [1]. These methods utilize the hierarchical structure of CNNs to identify patterns and anomalies in manipulated media, thus enhancing deepfake detection accuracy. However, the rapid evolution of deepfake generation techniques presents challenges for existing deepfake detection approaches [2].

Several studies have shown the effectiveness of CNN models in various computer vision tasks, including image classification [3] [4] [5]. In this study, we aim to evaluate the performance of these models, namely EfficientNetB7, MobileNetV3, and ConvNeXt, in the specific context of deepfake detection. As a baseline model, we use ResNet-152, a widely adopted CNN model in deep-learning research [6]. The objective of this study is to assess the accuracy of these models in detecting various types of deepfakes. The findings of this research will contribute to mitigating the possible threats associated with deepfakes [7]. By conducting a comprehensive evaluation of state-of-the-art models, this study aims to advance deepfake detection techniques and provide guidance for future research in this critical area.

II. LITERATURE REVIEW

Deepfake technology has become a growing concern in recent years due to its potential to manipulate and deceive individuals through the creation of realistic fake videos or images. Initially developed for legitimate applications in the entertainment industry, deepfake technology utilizes deep learning techniques, particularly Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN) [8], to generate highly convincing and manipulative media content. However, the misuse of this technology for spreading misinformation, fake news, and malicious propaganda has raised significant alarms, necessitating research efforts to detect and combat deepfake media.

A. Deepfake Detection Methods

Deepfake forensics, one of the deepfake detection methods, is a technique that analyzes various media features such as patterns, noise, and confusion matrix. This method involves the use of machine learning approaches, specifically deep learning and neural networks. The objective is to develop a model that can differentiate between real and fake media. After being trained on a set of real and fake media, the model is tested

Authorized licensed use limited to: K K Wagh Inst of Engg Education and Research. Downloaded on August 21,2024 at 16:22:36 UTC from IEEE Xplore. Restrictions apply.

using a test set. If the accuracy falls below the expected level, the model is retrained until it achieves the desired accuracy.

One of the more common detection methods that use deep learning techniques is binary classification, a task in which the network predicts which of two class labels an input belongs to. In the context of deepfake video detection, binary classification typically uses two labels, real and fake (0 or 1). While most deepfake detection methods utilize deep-learning techniques, several methods have been proposed without them. Successive subspace learning (SSL), features distilled using spatial dimension reduction and channel-wise soft classification, and a combined multi-region and multi-frame ensemble have been tested to produce a lightweight, highly efficient deepfake detector without traditional deep-learning-based methods [9].

Various deepfake detection models, primarily based on Convolutional Neural Networks (CNN), have been proposed in the literature. EfficientNetB7, MobileNetV3, ResNet-152, and ConvNeXt are notable models with promising performance in deepfake detection. EfficientNetB7, a state-of-the-art CNN model, has demonstrated exceptional accuracy in various computer vision tasks [3]. MobileNetV3 has also shown promising results in deepfake detection while maintaining computational efficiency [4]. ResNet-152 is a widely adopted baseline CNN model in deepfake detection research [6]. ConvNeXt, a modernized pure-convolutional architecture, has shown excellent performance in computer vision tasks [5].

B. Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a deep-learning architecture used to recognize features and patterns, which makes it applicable to detecting deepfakes. The main advantage of a CNN is that it can automatically learn valuable features from raw inputs using its convolutional layers. These layers collect spatial information from the inputs and perform feature extraction. The extracted features are then used by the fully connected layers of the network to identify deepfakes.

Although it has the benefit of learning features automatically, a CNN is still vulnerable to attacks specifically created to evade CNN-based detection. Therefore, additional research is required to increase the generalization and robustness of CNN-based detection models, so that they can detect even deepfakes created to evade them.

While CNNs can be used to detect deepfakes, they can also be used to generate deepfakes, sometimes leaving a trace. Research conducted by Guarnera et al. [10] used convolutional traces, a unique identifier, to detect deepfake media and even identify the GAN architecture that produced a given deepfake. They used the Expectation-Maximization algorithm to extract the convolutional traces left by the CNN. Based on their results, their proposed model achieves better accuracy than other models such as FakeSpotter [11] and AutoGAN [12].

C. Long Short-Term Memory

Long Short-Term Memory (LSTM) is a modified version of a Recurrent Neural Network (RNN) [13]. LSTM allows models to learn temporal information that would otherwise be lost with regular RNNs, which cannot preserve long-term dependencies. It does so by extending the constant error carousel (CEC) with input and output gates connected to the input layer, which addresses the problem of conflicting weight updates [14].

LSTM has been proposed as an excellent addition to most machine learning tasks because of its ability to preserve temporal information [15]; it adds parameters that the model can learn and can improve the outcomes of the model.

In 2019, Li et al. conducted an experiment to create a facial expression analysis model that detects human emotions, enhancing human-computer interaction for integrating robots into daily life [16]. The study highlights the superiority of LSTM and CNN-LSTM architectures in capturing temporal and contextual facial expression information compared to an MLP and a single CNN. LSTM's ability to retain temporal context makes it a valuable addition to deep-learning-based classification models.

D. Vulnerabilities and Challenges

Deepfake detection models are known to be vulnerable to adversarial attacks. Adversaries can manipulate deepfake videos to evade, or even fool, the detection models into misclassifying fake content as real [17] [18]. Developing generalized models capable of detecting different types of deepfakes remains a challenge. Deepfakes can vary in quality, manipulation technique, and characteristics, making it difficult to develop a one-size-fits-all detection solution.

Most current deepfake detection methods focus on analyzing the facial features contained in videos, prioritizing visual elements. This reliance on facial features raises a concern: strong anti-forensics measures applied to facially manipulated images, together with non-visual deepfake media, could render current deepfake detection techniques highly ineffective [19].

E. Comparative Review

In this section, we examine a comparative review that evaluated the performance of deep CNNs in detecting distracted drivers. A comparative review involves analyzing and comparing multiple models or methods to determine their effectiveness, with the goal of identifying the best-performing approach. Kathiravan et al. compared deep convolutional neural networks in 2021 [20] by giving three deep CNN models a set of pictures of distracted drivers. The paper notes that several road accidents have happened because drivers were not paying attention while driving, and suggests that a system capable of accurately predicting driver distraction could potentially reduce road accidents.


During this comparative study, the three chosen deep CNN models were ResNet, Xception, and VGG16, and all models were evaluated based on precision, recall, and F1 score. After analyzing the experimental results, the study concluded that ResNet was the best-performing deep CNN model and VGG16 the worst-performing one, since VGG16 is a primitive CNN model (although its results were still satisfactory), with Xception landing in between. We therefore concluded that ResNet should be used as the baseline for our experiment and for future experiments.
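The three metrics used in [20], and reported for our models in Table 2, follow the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The following is a minimal Python sketch of those definitions, assuming "fake" (label 1) is the positive class. The names `ConfusionCounts`, `counts_from_labels`, and `precision_recall_f1`, as well as the sample labels, are hypothetical illustrations and not taken from the paper or its code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'fake' (label 1) as positive."""
    tp: int  # fake videos correctly flagged as fake
    fp: int  # real videos wrongly flagged as fake
    fn: int  # fake videos missed (predicted real)
    tn: int  # real videos correctly predicted real


def counts_from_labels(y_true: list[int], y_pred: list[int]) -> ConfusionCounts:
    """Tally confusion counts from 0/1 labels (0 = real, 1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return ConfusionCounts(tp, fp, fn, tn)


def precision_recall_f1(c: ConfusionCounts) -> tuple[float, float, float]:
    """Return (precision, recall, F1); a zero denominator yields 0.0."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels for illustration only (not the paper's data).
    y_true = [1, 1, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
    c = counts_from_labels(y_true, y_pred)
    p, r, f1 = precision_recall_f1(c)
    print(c, f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
    # ConfusionCounts(tp=4, fp=1, fn=1, tn=2) precision=0.8000 recall=0.8000 f1=0.8000
```

Because F1 is symmetric in precision and recall, it cannot by itself reveal which of the two is high; the confusion matrix (or both component scores) is needed for that.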
III. METHODOLOGY

The primary focus of this research is to investigate the performance of various convolutional neural network models in detecting deepfake videos. We therefore used a quantitative approach, detecting manipulated videos under equal parameters for every model.

Fig. 1. Research Methodology

A. Identifying Models for Experimentation

Based on the literature review and existing work, we imported our models from PyTorch libraries. ResNet-152 was chosen as the baseline, as it is one of the older models in the roster. The remaining models, MobileNetV3, ConvNeXt Large, and EfficientNetB7, are much newer and are considered more efficient and accurate than the baseline model.

B. Collection and Analysis

We utilized the FaceForensics++ dataset [21]. FaceForensics++ is a comprehensive public set comprising 1000 original videos from the public internet and manipulated versions of them generated with four automated face manipulation methods. By using this established set, we can ensure the inclusion of diverse and realistic deepfake scenarios. Fig. 2 shows an example of a FaceForensics++ video.

Fig. 2. FaceForensics++ Video Example Adapted from [21]

To ensure a fair comparison, each deepfake detection model used the same set and underwent the same preprocessing. The preprocessing steps involved extracting frames from each randomly selected video and ensuring equal video dimensions and parameters for each video. This approach creates a level playing field for all models, allowing for a fair and unbiased evaluation of their performance.

By leveraging the FaceForensics++ set and applying consistent preprocessing techniques, we aim to facilitate an objective comparison of the deepfake detection models and gain meaningful insights into their accuracy and robustness.

C. Preprocessing Methods

Each randomly chosen video was split into frames, and the frames were labeled as real or fake based on the video's metadata. The frames were resized to 112x112 before further preprocessing, and the frames of each video in the training and test sets were standardized using the mean and standard deviation.

D. Experimenting with Models

The experiments for this research paper were conducted in Google Colaboratory Pro using a preprocessed custom dataset obtained from the FaceForensics++ set, and the source code is a modified version of [22]. The set consisted of real and fake videos in a 1:4 ratio. The architecture of each model consisted of a deep CNN followed by one LSTM layer, incorporated to enhance efficiency and leverage sequential information; the LSTM processes the features produced by the deep CNN.

The experiments employed four CNN models: ResNet-152, MobileNetV3 Large, ConvNeXt Large, and EfficientNetB7. Each model was trained separately to ensure independent training processes and reliable results. The Adam optimizer was used with a learning rate of 0.00001, and training was performed over 20 epochs. Following the training phase, the models were evaluated in terms of training accuracy, average loss, and testing accuracy.

Finally, we wrote additional code to create a test dataset, on which each newly trained model was evaluated to determine its accuracy. By employing this experimental setup, the research aimed to compare the performance of the selected models in deepfake detection. Using custom sets derived from FaceForensics++


and including an LSTM layer were intended to provide a fair evaluation and to improve the models' effectiveness in detecting deepfakes.
IV. RESULTS & DISCUSSION

This section discusses the results obtained for each of the four implemented models: ResNet-152, MobileNetV3, ConvNeXt Large, and EfficientNetB7. The source code and results can be found in the GitHub repository https://github.com/DevanLucian15741/ComparisonDF RM

Table 1. Accuracy Results of FaceForensics++ Dataset.

CNN Model | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss | Testing Accuracy
EfficientNetB7 | 87.88% | 0.3877 | 67.5% | 0.6260 | 75%
MobileNetV3 | 85.25% | 0.4079 | 70.75% | 0.6299 | 61.33%
ConvNeXt | 94.44% | 0.2077 | 73.25% | 1.2783 | 67.17%
ResNet-152 | 91.19% | 0.2993 | 80.75% | 0.6678 | 61%

Table 2. F1-Score Results.

CNN Model | Precision | Recall | F1-Score
EfficientNetB7 | 0.8623 | 0.7228 | 0.7853
MobileNetV3 | 0.8306 | 0.8022 | 0.8160
ConvNeXt | 0.8025 | 0.8852 | 0.8414
ResNet-152 | 0.8266 | 0.9556 | 0.8852

A. ResNet-152 Results

Fig. 3. ResNet-152 Training and Validation Graph

As shown in Fig. 3, combining the ResNet-152 model with one LSTM layer yielded a training accuracy of 91.19%. This high accuracy indicates excellent performance in classifying deepfake videos and distinguishing them from real videos. The model's training loss of 0.2993 indicates that it made relatively few mistakes during classification, further affirming its strong performance.

Fig. 4. ResNet-152 Confusion Matrix

Fig. 4 shows the true positive, false positive, false negative, and true negative values predicted by the model, represented as a confusion matrix. From Fig. 4, we can determine that the precision and F1 score of the model during training are 0.9556 and 0.8866, respectively, which indicates that the model performed well on the training dataset.

However, the model's testing accuracy (61%) is considerably lower than its training and validation accuracy, which suggests the presence of some degree of overfitting within the model. Despite this observation, the model still achieved satisfactory accuracy on the training set and can be considered to have performed reasonably well, with a significant number of true positives.

B. MobileNetV3 Results

Fig. 5. MobileNetV3 Training and Validation Graph

As shown in Fig. 5, the MobileNetV3 model achieved an overall training accuracy of 85.25% with a training loss of 0.407. These results indicate that the model performed well in training, demonstrating high accuracy and few errors.

Fig. 6. MobileNetV3 Confusion Matrix

Fig. 6 shows the confusion matrix for MobileNetV3. The precision and F1 score calculated from the confusion matrix are 0.8019 and 0.8157, respectively. This further indicates that the model performed well on the training set, as it maintains a good balance between


precision and recall, even though ResNet-152 yielded better results during training.

However, when evaluated on the validation and test sets (Table 1), the model's performance was significantly worse, with a testing accuracy of 61.33%, much lower than its training accuracy. This suggests the presence of overfitting, where the model may have memorized the training data too well and struggled to generalize to unseen data. Despite this, the model still exhibited moderate performance.

Comparing MobileNetV3 with ResNet-152, MobileNetV3 had a slightly higher testing accuracy. However, the difference between the two models is minimal (61.33% versus 61%), making them practically identical in performance.

C. ConvNeXt Results

Fig. 7. ConvNeXt Training and Validation Graph

The ConvNeXt model used in this experiment is the Large version. It exhibited a notable symptom of overfitting, performing well on the training set with a high accuracy of 94.44% and a low loss of 0.207732, as shown in Fig. 7. These results indicate that the model was able to accurately classify the majority of the training data with few errors.

Fig. 8. ConvNeXt Confusion Matrix

Despite being the best-performing model during training, ConvNeXt does not have the highest precision or F1 score: these are 0.8851 and 0.8419, respectively, as calculated from the confusion matrix in Fig. 8, notably lower than those of ResNet-152.

Interestingly, when evaluated on the testing set, the ConvNeXt model achieved significantly higher accuracy than ResNet-152 and MobileNetV3. As shown in Table 1, it achieved a testing accuracy of 67.17%, meaning it correctly classified about two-thirds of the test set. This performance surpassed the other two models, suggesting that ConvNeXt had a greater capability to generalize to unseen data.

D. EfficientNetB7 Results

Fig. 9. EfficientNetB7 Training and Validation Graph

EfficientNetB7 was selected as the final model for this experiment. As shown in Fig. 9, the EfficientNetB7 model achieved a training accuracy of 87.88% with a loss of 0.387696, indicating strong performance on the training set, similar to the previous models. However, the model's accuracy on the validation set dropped slightly to 71%, which was still the highest accuracy among the models.

Fig. 10. EfficientNetB7 Confusion Matrix

From Fig. 10, EfficientNetB7 yields the lowest precision and F1 score of all the models tested, at 0.7221 and 0.7862, respectively, significantly lower than those of ResNet-152.

Table 1 shows that, in terms of testing accuracy, EfficientNetB7 performed the best of all four models, achieving a testing accuracy of 75%, the highest accuracy in classifying whether a video is a deepfake or real. It is important to note that while EfficientNetB7 performed well, its testing accuracy was still lower than its training accuracy, suggesting some overfitting.

E. Discussion

The experiment results suggest that EfficientNetB7 is the most effective CNN model for detecting whether a video is real or manipulated, based on its superior performance in


binary classification. One factor that contributed to the success of EfficientNetB7 is its compound scaling method, which jointly scales network depth, width, and input resolution and played a role in enhancing its performance. The second-best-performing model in the experiment was ConvNeXt, which exhibited the second-highest testing accuracy. On the other hand, ResNet-152 and MobileNetV3 demonstrated similar results with comparable accuracies.

An important observation from the experiment is that all models exhibited signs of overfitting, as seen from their testing accuracy scores being lower than their training accuracy scores. This indicates that the models struggled to generalize beyond the training data.

To address overfitting and improve the generalization ability of the models, further enhancements can be implemented. One approach could involve utilizing additional sets beyond FaceForensics++ to introduce more diverse features. Increasing the size of the training and testing sets can also be beneficial, as it provides more information for the models to learn from and improve their performance.

V. CONCLUSION

This study conducted a comparative analysis of deepfake detection models (ResNet-152, MobileNetV3, ConvNeXt, and EfficientNetB7) on the FaceForensics++ dataset. EfficientNetB7 achieved the highest testing accuracy, outperforming the other models and making it the best-performing model for detecting deepfakes among those tested. However, all models showed signs of overfitting, indicating the need for further improvements in generalization ability. To enhance deepfake detection, future research should explore techniques such as data augmentation, regularization, and larger datasets. Additionally, incorporating advanced techniques like attention mechanisms, ensemble learning, and adversarial training can further improve the accuracy and robustness of deepfake detection systems. This study emphasizes the significance of deepfake detection and provides insights for selecting appropriate models and addressing challenges in the field.

REFERENCES
[1] H. S. Shad, M. M. Rizvee, N. T. Roza, S. Hoq, M. Monirujjaman Khan, A. Singh, A. Zaguia, S. Bourouis, et al., "Comparative analysis of deepfake image detection method using convolutional neural network," Computational Intelligence and Neuroscience, vol. 2021, 2021.
[2] A. Beckmann, A. Hilsmann, and P. Eisert, "Fooling state-of-the-art deepfake detection with high-quality deepfakes," 2023.
[3] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, pp. 6105–6114, PMLR, 2019.
[4] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., "Searching for MobileNetV3," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324, 2019.
[5] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[7] M. Westerlund, "The emergence of deepfake technology: A review," Technology Innovation Management Review, vol. 9, no. 11, 2019.
[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," 2014.
[9] H. G. Hong-Shuo Chen, Mozhdeh Rouhsedaghat, "DefakeHop: A light-weight high-performance deepfake detector," in IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2021.
[10] L. Guarnera, O. Giudice, and S. Battiato, "Fighting deepfake by exposing the convolutional traces on images," IEEE Access, vol. 8, pp. 165085–165098, 2020.
[11] R. Wang, F. Juefei-Xu, L. Ma, X. Xie, Y. Huang, J. Wang, and Y. Liu, "FakeSpotter: A simple yet robust baseline for spotting AI-synthesized fake faces," 2020.
[12] X. Gong, S. Chang, Y. Jiang, and Z. Wang, "AutoGAN: Neural architecture search for generative adversarial networks," 2019.
[13] A. Sherstinsky, "Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network," Physica D: Nonlinear Phenomena, vol. 404, p. 132306, Mar. 2020.
[14] R. C. Staudemeyer and E. R. Morris, "Understanding LSTM – a tutorial into long short-term memory recurrent neural networks," 2019.
[15] P. Saikia, D. Dholaria, P. Yadav, V. Patel, and M. Roy, "A hybrid CNN-LSTM model for video deepfake detection by leveraging optical flow features," in 2022 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, IEEE, 2022.
[16] T.-H. S. Li, P.-H. Kuo, T.-N. Tsai, and P.-C. Luan, "CNN and LSTM based facial expression analysis model for a humanoid robot," IEEE Access, vol. 7, pp. 93998–94011, 2019.
[17] N. Carlini and H. Farid, "Evading deepfake-image detectors with white- and black-box attacks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 658–659, 2020.
[18] S. Hussain, P. Neekhara, M. Jere, F. Koushanfar, and J. McAuley, "Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3348–3357, 2021.
[19] S. Lyu, "Deepfake detection: Current challenges and next steps," in IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1–6, 2020.
[20] D. D. Kathiravan Srinivasan, Lalit Garg, "Performance comparison of deep CNN models for detecting driver's distraction," Tech Science, vol. 68, 2021.
[21] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," 2019.
[22] A. Jadhav, A. Patange, J. Patel, H. Patil, and M. Mahajan, "Deepfake video detection using neural networks," International Journal for Scientific Research and Development, vol. 8, no. 1, pp. 1016–1019, 2020.
