
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI-590018

A Technical Seminar Report


on
“Deepfake Detection Using ResNeXt and LSTM: A Hybrid Deep
Learning Approach”
Submitted in partial fulfillment of the requirements for the final year degree of
Bachelor of Engineering in Computer Science and Engineering
of Visvesvaraya Technological University, Belagavi

Submitted by

Ruchitha MA 1RN21CS126

Under the Guidance of:


Ms. Soumya N G
Assistant Professor
Dept. of CSE, RNSIT

Department of Computer Science and Engineering


RNS Institute of Technology
Affiliated to VTU, Recognized by GOK, Approved by AICTE, New Delhi
NAAC ‘A+ Grade’ Accredited, NBA Accredited (UG-CSE, ECE, ISE, EIE and EEE)
Channasandra, Dr. Vishnuvardhan Road, Bengaluru-560098
Ph: (080) 28611880, 28611881  URL: www.rnsit.ac.in

2024-2025
RNS INSTITUTE OF TECHNOLOGY
Affiliated to VTU, Recognized by GOK, Approved by AICTE, New Delhi
NAAC ‘A+ Grade’ Accredited, NBA Accredited (UG-CSE, ECE, ISE, EIE and EEE)
Channasandra, Dr. Vishnuvardhan Road, Bengaluru-560098
Ph: (080) 28611880, 28611881  URL: www.rnsit.ac.in
Department of Computer Science and Engineering

CERTIFICATE

Certified that the Technical Seminar entitled “Deepfake Detection Using ResNeXt and LSTM: A Hybrid Deep Learning Approach” has been successfully carried out at RNSIT by Ruchitha MA, bearing USN 1RN21CS126, a bonafide student of RNS Institute of Technology, in partial fulfillment of the requirements of the final year degree of Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological University, Belagavi, during the academic year 2024-2025. The seminar report has been approved as it satisfies the academic requirements in respect of seminar work for the said degree.

—————————                              ————————-
Ms. Soumya N G                               Dr. Vidya Y
Assistant Professor                          Technical Seminar Coordinator
Dept. of CSE                                 Associate Professor, Dept. of CSE

—————————                              ————————-
Dr. Kavitha C                                Dr. Ramesh Babu H S
Dean and Head                                Principal
Dept. of CSE                                 RNSIT
Acknowledgement

At the very onset, I would like to place on record my gratitude to all those people who have helped me in making this seminar a reality. Our institution has played a paramount role in guiding me in the right direction. I would like to profoundly thank the Management of RNS Institute of Technology for providing such a healthy environment for the successful completion of this seminar.

I would like to thank our beloved Director, Dr. M K Venkatesha, for providing the necessary facilities to carry out this work.

I would like to thank our beloved Principal, Dr. Ramesh Babu H S, for providing the necessary facilities to carry out this work.

I am extremely grateful to Dr. Kavitha C, Dean and Head, Department of Computer Science and Engineering, for having agreed to patronize me in the right direction with all her wisdom.

I would like to express my sincere thanks to our Coordinator, Dr. Vidya Y, Associate Professor, and my guide, Ms. Soumya N G, Assistant Professor, for their constant encouragement, which motivated me towards the successful completion of this work. Last but not least, I am thankful to all the teaching and non-teaching staff members of the Computer Science and Engineering Department for their encouragement and support throughout this work.

Ruchitha MA (1RN21CS126)

Abstract

The growing computation power has made deep learning algorithms so powerful that creating an indistinguishable human-synthesized video, popularly called a deepfake, has become very simple. Scenarios in which these realistic face-swapped deepfakes are used to create political distress, fake terrorism events, revenge porn, and blackmail are easily envisioned. In this work, we describe a new deep learning-based method that can effectively distinguish AI-generated fake videos from real videos. Our method is capable of automatically detecting both replacement and reenactment deepfakes; in effect, we use Artificial Intelligence (AI) to fight AI. Our system uses a ResNeXt convolutional neural network to extract frame-level features, and these features are then used to train a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) that classifies whether a video has been subjected to manipulation, i.e., whether it is a deepfake or a real video. To emulate real-time scenarios and make the model perform better on real-world data, we evaluate our method on a large, balanced dataset prepared by mixing several available datasets, namely FaceForensics++ [1], the Deepfake Detection Challenge [2], and Celeb-DF [3]. We also show how our system achieves competitive results using a simple and robust approach.

Contents

Acknowledgement i

Abstract ii

List of Figures v

List of Tables vi

1 INTRODUCTION 1

1.1 INTRODUCTION ABOUT THE SEMINAR TOPIC . . . . . . . . . . . . . . . . . 2

1.2 Existing System and Its Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 SIGNIFICANCE OF TOPIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3.1 Key Reasons for the Significance of Deepfake Detection . . . . . . . . . . . 4

2 LITERATURE SURVEY 5

2.1 General Working Features of Existing Systems . . . . . . . . . . . . . . . . . . . . 5

2.2 Review of Related Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.1 Generalizable Deepfake Detection with Phase-Based Motion Analysis . . . . 6

2.2.2 Dynamic Difference Learning with Spatio-Temporal Correlation . . . . . . . 6

2.2.3 Multi-Rate Excitation Network for Deepfake Video Detection . . . . . . . . 7

2.2.4 Improved Dense CNN for Deepfake Image Detection . . . . . . . . . . . . . 7

2.2.5 Deepfake Face Mask Dataset for Detection in the Infectious Disease Era . . . 7

2.3 RELEVANT RECENT PAPERS SUMMARY . . . . . . . . . . . . . . . . . . . . . 8

2.4 CONCLUSION ABOUT LITERATURE SURVEY . . . . . . . . . . . . . . . . . . 8

3 PROBLEM STATEMENT 10

3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 METHODOLOGY 12

4.1 Problem Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4.2 Parameter Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.3 Model Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.4 Model Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.5 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.6 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.7 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5 RESULTS AND SNAPSHOTS 18

5.1 Result Based on Selected Research Papers . . . . . . . . . . . . . . . . . . . . . . . 18

5.1.1 Deepfake Detection Using ResNeXt and LSTM . . . . . . . . . . . . . . . . 18

5.1.2 Model Performance on Different Datasets . . . . . . . . . . . . . . . . . . . 18

5.1.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

5.2 Result Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

6 CONCLUSION 20

6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

6.2 Future Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

References 23
List of Figures

4.1 ResNeXt Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2 LSTM-based Temporal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

List of Tables

5.1 Trained Model Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Chapter 1

INTRODUCTION

In a world of ever-growing social media platforms, deepfakes are considered one of the major threats posed by AI. Many scenarios in which realistic face-swapped deepfakes are used to create political distress, fake terrorism events, revenge porn, and blackmail are easily envisioned; fabricated celebrity videos involving Brad Pitt and Angelina Jolie are well-known examples. It therefore becomes very important to spot the difference between a deepfake and a pristine video. We are using AI to fight AI. Deepfakes are created using tools such as FaceApp [11] and Face Swap [12], which use pre-trained neural networks such as GANs or autoencoders. Our method uses an LSTM-based artificial neural network to perform sequential, temporal analysis of the video frames and a pre-trained ResNeXt CNN to extract frame-level features. The ResNeXt convolutional neural network extracts frame-level features, and these features are further used to train a Long Short-Term Memory based recurrent neural network that classifies the video as deepfake or real. To emulate real-time scenarios and make the model perform better on real-world data, we trained our method on a large, balanced combination of available datasets, including FaceForensics++ [1], the Deepfake Detection Challenge [2], and Celeb-DF [3]. Further, to make the system ready for end users, we developed a front-end application through which the user uploads a video; the video is processed by the model, and the result, the classification of the video as deepfake or real along with the model's confidence, is returned to the user.

1.1 INTRODUCTION ABOUT THE SEMINAR TOPIC

In today’s digital age, the rise of artificial intelligence (AI) has brought about significant advancements

in media generation and manipulation. One of the most concerning applications of AI is the creation

of deepfakes—highly realistic fake videos or images that can alter a person’s face, voice, or actions in

ways that are difficult to distinguish from real content. Deepfakes are generated using sophisticated AI

models like Generative Adversarial Networks (GANs) and Autoencoders, making them increasingly

deceptive and difficult to detect.

The growing misuse of deepfake technology poses serious threats, including political misinformation, identity fraud, financial scams, blackmail, and fake news propagation. As deepfakes become

more prevalent, there is an urgent need for robust deepfake detection techniques to counteract their

harmful effects.

This seminar will explore the underlying technology behind deepfake creation, its implications

on society, and various AI-driven detection approaches. We will discuss our proposed deepfake

detection model, which leverages ResNeXt Convolutional Neural Networks (CNNs) for frame-level

feature extraction and a Long Short-Term Memory (LSTM) network for sequential video analysis.

Additionally, we will highlight the datasets used for training, real-world applications, and the

development of a user-friendly front-end tool for video verification.

By the end of this seminar, participants will gain insights into how AI is being used both to

create and combat deepfakes, the challenges in detecting manipulated content, and the future scope

of deepfake detection technology.

1.2 Existing System and Its Limitations

The current deepfake detection systems employ a variety of methods to identify manipulated

media. These approaches primarily focus on pixel-level analysis, handcrafted feature extraction,

and inconsistencies in facial expressions, eye movements, or lighting conditions. Some traditional

techniques also rely on motion analysis, frequency domain analysis, and physiological signal

detection to differentiate real videos from deepfakes.

However, despite these efforts, the existing systems face several limitations:



1. Lack of Generalization – Many existing deepfake detection models are trained on specific

datasets and fail to generalize well when tested on deepfakes generated using different

techniques or unseen data. This reduces their reliability in real-world scenarios.

2. Vulnerability to Advanced Deepfakes – With rapid advancements in deepfake generation

using Generative Adversarial Networks (GANs) and Autoencoders, traditional detection

methods struggle to keep up. Many of these systems rely on static feature-based detection,

which can be easily bypassed by more sophisticated deepfake models.

3. High False-Negative and False-Positive Rates – Some models incorrectly classify real videos

as deepfakes (false positives) or fail to detect actual deepfakes (false negatives), reducing their

effectiveness in practical applications. This leads to misclassification risks and makes the

systems less trustworthy.

4. Computational Complexity – Some existing methods require significant computational

resources, making real-time detection challenging. This limits their usability in large-

scale applications like social media monitoring, video authentication, and law enforcement

investigations.

5. Lack of Temporal Analysis – Many detection methods analyze deepfakes frame-by-frame,

ignoring the temporal consistency of videos. This makes them less effective in detecting subtle

artifacts that persist across multiple frames.

6. Inability to Detect Emerging Deepfake Techniques – New deepfake creation tools continuously improve their ability to mimic real videos with minimal artifacts, making it harder for

older detection systems to adapt without constant retraining and dataset updates.

Due to these limitations, there is a growing need for more advanced, AI-driven deepfake detection

methodologies that can adapt to evolving threats, improve accuracy, and operate efficiently in real-

time environments.



1.3 SIGNIFICANCE OF TOPIC

The rise of deepfake technology presents a major challenge in today’s digital landscape. While AI-

driven synthetic media can be used for entertainment and creativity, the misuse of deepfakes poses

significant threats to privacy, security, and trust in digital content. The ability to create highly realistic

fake videos and images has led to political misinformation, financial fraud, identity theft, cybercrimes,

and reputational damage. As a result, deepfake detection has become an essential area of research and

technological development.

1.3.1 Key Reasons for the Significance of Deepfake Detection

1. Preventing Misinformation and Fake News – Deepfakes can be used to spread false

information about public figures, governments, and global events, leading to social and political

instability. Reliable detection systems help combat fake news and media manipulation.

2. Enhancing Cybersecurity and Fraud Prevention – Attackers can use deepfakes to impersonate individuals, conduct financial scams, or create phishing attacks. Detecting such fraudulent

media is crucial for cybersecurity and digital identity protection.

3. Protecting Privacy and Preventing Harassment – Deepfake technology has been misused

to generate non-consensual explicit content, leading to harassment, blackmail, and privacy

violations. Effective detection helps mitigate such cybercrimes and supports legal enforcement.

4. Ensuring Trust in Media and Journalism – In an era where digital media plays a vital role

in communication, deepfake detection ensures that news organizations, social media platforms,

and video-sharing services can verify the authenticity of visual content.

5. Forensic and Law Enforcement Applications – Law enforcement agencies require deepfake

detection tools to analyze evidence in criminal investigations, prevent fraudulent confessions,

and safeguard against video-based deception.

6. Advancing AI Ethics and Responsible AI Development – Developing robust detection

models promotes ethical AI usage and ensures that AI technologies are used responsibly,

preventing their exploitation for malicious purposes.



Chapter 2

LITERATURE SURVEY

Deepfake technology has gained prominence in recent years due to advancements in artificial

intelligence and deep learning. While it has potential applications in entertainment and media,

deepfakes pose significant threats in terms of misinformation, privacy violations, and fraud. Various

methods have been proposed to detect and mitigate deepfake content. This survey explores existing

systems and recent research in deepfake detection.

2.1 General Working Features of Existing Systems

Existing deepfake detection systems generally employ the following features:

• Artifact Detection: Identifying inconsistencies in facial features, unnatural movements, and

other visual anomalies.

• Feature Extraction: Analyzing pixel-level changes, facial landmarks, and temporal motion

dynamics.

• Machine Learning Models: Using classifiers like Support Vector Machines (SVM) and

Decision Trees.

• Limited Scope: Some systems are specific to face swaps and do not generalize well to other

manipulations.

• Real-time Processing Challenges: High computational costs hinder real-time deepfake

detection.

• Dataset Dependency: Detection accuracy heavily depends on the quality and diversity of

training datasets.

• Confidence Scoring: Systems provide probability scores to assess video authenticity.

• User Interface: Enabling users to upload and analyze videos for deepfake detection.

• Integration with Other Technologies: Some systems integrate with social media and

streaming platforms for real-time detection.

2.2 Review of Related Research

2.2.1 Generalizable Deepfake Detection with Phase-Based Motion Analysis

(E. Prashnani, M. Goebel, and B. S. Manjunath, 2024)

• Method: Introduces PhaseForensics, leveraging phase-based motion representation.

• Key Contribution: Improves robustness by utilizing band-pass frequency components.

• Advantages: Enhances cross-dataset generalization and resists adversarial perturbations.

• Limitations: Performance may depend on the type of video content.

2.2.2 Dynamic Difference Learning with Spatio-Temporal Correlation

(Q. Yin, W. Lu, B. Li, and J. Huang, 2023)

• Method: Utilizes spatio-temporal inconsistencies for deepfake detection.

• Key Contribution: Proposes Dynamic Fine-Grained Difference Capture (DFDC) and Multi-

Scale Spatio-Temporal Aggregation (MSA) modules.

• Advantages: Improves accuracy by differentiating between natural and manipulated inter-

frame differences.

• Limitations: High computational cost for real-time applications.



2.2.3 Multi-Rate Excitation Network for Deepfake Video Detection

(G. Pang, B. Zhang, Z. Teng, Z. Qi, and J. Fan, 2023)

• Method: Proposes Multi-Rate Excitation Network (MRE-Net) to detect dynamic spatio-

temporal inconsistencies.

• Key Contribution: Introduces Bipartite Group Sampling (BGS) and Multi-Rate Branches for

detecting short-term and long-term inconsistencies.

• Advantages: Effective for hyper-realistic deepfake detection.

• Limitations: Requires a large dataset for optimal performance.

2.2.4 Improved Dense CNN for Deepfake Image Detection

(Y. Patel et al., 2023)

• Method: Uses an improved deep Convolutional Neural Network (D-CNN) for deepfake image

detection.

• Key Contribution: Captures inter-frame inconsistencies across multiple datasets.

• Advantages: Achieves high accuracy across multiple datasets (AttGAN, GDWCT, StyleGAN,

etc.).

• Limitations: Limited to image-based deepfake detection.

2.2.5 Deepfake Face Mask Dataset for Detection in the Infectious Disease Era

(N. M. Alnaim et al., 2023)

• Method: Introduces a Deepfake Face Mask Dataset (DFFMD) to enhance deepfake detection

in masked videos.

• Key Contribution: Addresses challenges posed by face masks in deepfake detection.

• Advantages: Provides a new dataset to improve model generalization.

• Limitations: Focuses only on deepfakes involving face masks.



2.3 RELEVANT RECENT PAPERS SUMMARY

Recent advancements in deepfake detection have introduced novel techniques leveraging deep

learning, spatio-temporal analysis, and phase-based motion representation. Prashnani et al. (2024)

proposed PhaseForensics, which enhances robustness by utilizing band-pass frequency components

to improve cross-dataset generalization. Yin et al. (2023) introduced a Dynamic Fine-Grained

Difference Capture (DFDC) approach with Multi-Scale Spatio-Temporal Aggregation (MSA) to

identify subtle manipulations in inter-frame differences. Similarly, Pang et al. (2023) developed

the Multi-Rate Excitation Network (MRE-Net), incorporating Bipartite Group Sampling (BGS) for

detecting inconsistencies in high-resolution deepfake videos. Patel et al. (2023) enhanced deep

Convolutional Neural Networks (D-CNN) for image-based deepfake detection, demonstrating high

accuracy across multiple datasets such as AttGAN and StyleGAN. Additionally, Alnaim et al. (2023)

addressed the emerging challenge of deepfake videos with face masks by introducing the Deepfake

Face Mask Dataset (DFFMD), enabling better detection under occluded conditions. These studies

highlight the ongoing improvements in generalization, dataset diversity, and real-time efficiency,

addressing key challenges in deepfake detection.

2.4 CONCLUSION ABOUT LITERATURE SURVEY

Deepfake detection techniques have advanced significantly, leveraging cutting-edge methodologies

such as spatio-temporal analysis, phase-based motion representations, and deep learning-driven

feature extraction. These innovations have improved detection accuracy and resilience against adversarial manipulations. However, several critical challenges persist, including the high computational

cost of real-time processing, the heavy reliance on dataset quality and diversity, and the limited

generalization of models across different types of deepfake manipulations. Addressing these issues

requires a multi-faceted approach, integrating lightweight yet robust architectures, enhancing dataset

diversity to encompass a broader range of deepfake techniques, and developing adaptive algorithms

that can efficiently detect emerging threats in real-world scenarios. Future research should also focus

on improving interpretability, reducing false positives, and fostering collaboration between academia,

industry, and regulatory bodies to establish standardized benchmarks and countermeasures, ensuring



digital integrity and security.



Chapter 3

PROBLEM STATEMENT

Convincing manipulations of digital images and videos have been demonstrated for several decades through the use of visual effects, but recent advances in deep learning have led to a dramatic increase in the realism of fake content and the accessibility with which it can be created. This so-called AI-synthesized media is popularly referred to as deepfakes. Creating deepfakes with artificially intelligent tools is a simple task, but detecting them is a major challenge. History already offers many examples in which deepfakes have been used as a powerful way to create political tension, fake terrorism events, revenge porn, and blackmail. It therefore becomes very important to detect these deepfakes and prevent their percolation through social media platforms. We take a step in this direction by detecting deepfakes using an LSTM-based artificial neural network.

3.1 Objectives

The main objectives of this seminar are:

• Detect and expose deepfake content to mitigate its negative impact on digital security and public

trust.

• Minimize the misuse of deepfake technology by providing an efficient detection mechanism.

• Develop a classification system that differentiates between authentic (pristine) and manipulated

(deepfake) videos.

• Design a user-friendly interface that allows users to upload videos and receive real-time

authenticity assessments.

• Ensure the system is adaptable to evolving deepfake generation techniques by incorporating

advanced machine learning models.

• Enhance computational efficiency to enable real-time deepfake detection with minimal processing delay.

• Improve detection accuracy by leveraging diverse datasets and state-of-the-art algorithms for

feature extraction and analysis.



Chapter 4

METHODOLOGY

The development of deepfake detection systems requires a systematic approach that involves problem

analysis, dataset processing, model training, and evaluation. This chapter details the methodologies

used in our project.

4.1 Problem Analysis

Solution Requirement: We began by analyzing the problem statement and assessing the feasibility

of developing an effective deepfake detection system. This involved an extensive literature review of

various academic papers (as discussed in Chapter 2) to understand existing approaches.

During dataset analysis, multiple training strategies were tested, including training models

exclusively on either fake or real videos. However, this introduced significant bias, leading to

inaccurate predictions. Extensive research and experimentation indicated that a balanced training

approach, incorporating both real and deepfake videos, reduced bias and variance, thereby improving

model accuracy.

Solution Constraints: To ensure practical implementation, we evaluated our solution based on

several key factors:

• Cost of implementation

• Processing speed

• Hardware and software requirements

• Level of expertise required

• Availability of computational resources

4.2 Parameter Identification

The key parameters for detecting deepfake videos were identified based on prior research and

empirical analysis:

• Irregular blinking patterns

• Inconsistencies in teeth structure

• Unusual eye distance

• Inconsistent mustaches

• Double edges on facial features (eyes, ears, nose)

• Abnormalities in iris segmentation

• Absence of natural facial wrinkles

• Discrepancies in head pose and facial angles

• Variations in skin tone

• Unnatural facial expressions

• Lighting inconsistencies

• Pose misalignment

• Presence of double chins

• Hairstyle irregularities

• Higher cheekbone structures

These parameters were leveraged to enhance model accuracy and improve deepfake detection

performance.



4.3 Model Design

Based on our research and findings, we designed a deep learning-based system architecture optimized

for deepfake detection. The model consists of multiple layers, each fine-tuned to identify facial

inconsistencies indicative of deepfakes.

4.4 Model Details

The deepfake detection model consists of multiple layers optimized for video-based forgery detection.

The primary components of the model architecture are as follows:

ResNeXt CNN

We utilize the pre-trained Residual Convolutional Neural Network (ResNeXt) model, specifically

resnext50_32x4d. This model consists of 50 layers and follows a 32x4d configuration (a cardinality of 32 with a bottleneck width of 4). ResNeXt is chosen due to its superior feature extraction capabilities, leveraging

grouped convolutions to enhance performance.

Figure 4.1: ResNeXt Architecture



The ResNeXt model extracts spatial features from individual video frames, which are further

processed for sequential analysis.
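
As an illustration of this stage, the following sketch shows one way the frame-level feature extractor could be set up in PyTorch. The use of the pretrained resnext50_32x4d model follows the report; the layer stripping, the assumed frame size, and the tensor shapes are our own assumptions rather than details specified in the report.

import torch
import torch.nn as nn
from torchvision import models

# Sketch (assumption): wrap the pretrained ResNeXt-50 (32x4d) so it returns the
# 2048-dimensional pooled feature vector for every frame instead of class scores.
class FrameFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnext50_32x4d(pretrained=True)
        # Drop the final fully connected layer; keep conv stages and average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, frames):
        # frames: (batch, seq_len, 3, 112, 112); the 112x112 frame size is assumed.
        b, t, c, h, w = frames.shape
        x = self.features(frames.reshape(b * t, c, h, w))  # (b*t, 2048, 1, 1)
        return x.reshape(b, t, -1)                          # (b, t, 2048)

These per-frame vectors are then arranged sequentially and passed on to the LSTM layer described below.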

Sequential Layer

A Sequential Layer is used to structure the feature vectors returned by ResNeXt in an ordered manner.

This ensures that the extracted feature maps are passed to the subsequent LSTM layer sequentially.

LSTM Layer

Long Short-Term Memory (LSTM) networks are employed to capture temporal dependencies

between frames. The extracted 2048-dimensional feature vectors serve as input to the LSTM layer.

Our architecture consists of:

• LSTM Layer: A single LSTM layer with 2048 latent dimensions and 2048 hidden units.

• Dropout: A dropout probability of 0.4 to prevent overfitting.

The LSTM layer processes video frames sequentially, analyzing temporal inconsistencies by

comparing frame differences over time. The model evaluates the frame at time t with previous frames

at t − n, where n represents a variable time step.

Figure 4.2: LSTM-based Temporal Analysis

This hybrid CNN-LSTM architecture effectively detects inconsistencies within video sequences,

making it well-suited for deepfake video detection.
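
A minimal sketch of the LSTM classification head described above is given next. The 2048-dimensional input, 2048 hidden units, and 0.4 dropout follow the report; averaging the LSTM outputs over time and the two-class linear layer are assumptions about details the report leaves open.

import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    # LSTM head: consumes a sequence of 2048-d ResNeXt frame features.
    def __init__(self, feat_dim=2048, hidden_dim=2048, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=1, batch_first=True)
        self.dropout = nn.Dropout(0.4)                 # dropout probability from the report
        self.fc = nn.Linear(hidden_dim, num_classes)   # REAL vs. FAKE logits

    def forward(self, features):
        # features: (batch, seq_len, 2048) from the ResNeXt extractor
        out, _ = self.lstm(features)                   # (batch, seq_len, hidden_dim)
        pooled = self.dropout(out.mean(dim=1))         # assumed temporal average pooling
        return self.fc(pooled)                         # (batch, 2)

# Usage sketch: a 100-frame sequence of 2048-d features for one video.
logits = DeepfakeClassifier()(torch.randn(1, 100, 2048))
confidence = torch.softmax(logits, dim=1)              # per-class confidence score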



4.5 Model Development

For implementation, we selected the following technologies:

• Programming Language: Python 3, due to its extensive support for AI and deep learning

libraries.

• Framework: PyTorch, chosen for its ease of use, dynamic computation graph, and compatibility with CUDA for GPU acceleration.

• Cloud Platform: Google Cloud Platform (GCP), utilized to train the model efficiently on a

large dataset.

The dataset was preprocessed by extracting frames from videos and resizing them to a uniform

resolution. Augmentation techniques such as flipping, rotation, and contrast adjustments were applied

to enhance model robustness.
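
One possible form of this preprocessing step is sketched below using OpenCV and torchvision; the frame budget, target resolution, and specific augmentation parameters are illustrative assumptions rather than values stated in the report.

import cv2
import torch
from torchvision import transforms

# Assumed augmentation pipeline: flips, small rotations, contrast jitter, uniform resize.
augment = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(contrast=0.2),
    transforms.Resize((112, 112)),
    transforms.ToTensor(),
])

def extract_frames(video_path, max_frames=100):
    # Read up to max_frames frames from a video and return them as augmented tensors.
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(augment(frame))
    cap.release()
    return torch.stack(frames)   # (num_frames, 3, 112, 112)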

4.6 Model Evaluation

To evaluate the performance of our deepfake detection model, we used a diverse dataset comprising

real and deepfake videos, including samples sourced from YouTube. We employed multiple

evaluation metrics to ensure a balanced assessment:

• Accuracy: Measures the percentage of correctly classified videos.

• Precision: Evaluates the proportion of true positive detections among all positive predictions.

• Recall: Assesses the model’s ability to correctly identify deepfake videos.

• F1-Score: Provides a harmonic mean of precision and recall for a balanced evaluation.

• Confusion Matrix: Used to analyze false positives and false negatives, ensuring reliable

performance.

The model was tested on an independent validation set to determine its generalizability.
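
The metrics listed above could be computed as in the short sketch below; scikit-learn is used here purely for illustration, and y_true / y_pred stand in for the validation labels and the model's predictions.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder labels: 1 = deepfake, 0 = real (assumed convention).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))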



4.7 Outcome

The final outcome of our project is a trained deepfake detection model capable of analyzing videos

and determining their authenticity. Our solution provides an efficient and accurate mechanism to

combat misinformation and improve trust in digital media.

Future improvements include real-time detection capabilities and integration into digital content

verification systems to enhance practical usability.



Chapter 5

RESULTS AND SNAPSHOTS

5.1 Result Based on Selected Research Papers

5.1.1 Deepfake Detection Using ResNeXt and LSTM

The proposed deepfake detection model leverages the ResNeXt Convolutional Neural Network

(CNN) for spatial feature extraction and the Long Short-Term Memory (LSTM) network for temporal

analysis. Experimental evaluations on benchmark datasets, including FaceForensics++, Deepfake

Detection Challenge (DFDC), and Celeb-DF, demonstrate promising results.

5.1.2 Model Performance on Different Datasets

The model was trained and tested on a diverse dataset comprising real and deepfake videos. The

following key observations were made:

• The model achieved high accuracy on FaceForensics++ dataset, with a peak accuracy of

97.76% for 100-frame sequences.

• Performance on the Celeb-DF dataset was slightly lower (93.97%), due to the dataset’s high-

quality deepfakes.

• When tested on the final custom dataset (6000 videos), the model maintained a balanced

accuracy of 89.34%.

5.1.3 Evaluation Metrics

To assess the model’s performance, the following evaluation metrics were employed:

• Accuracy: The proportion of correctly classified videos.

• Precision: The ratio of correctly predicted deepfakes to total predicted deepfakes.

• Recall: The ability of the model to correctly detect deepfake videos.

• F1-Score: A harmonic mean of precision and recall.

• Confusion Matrix: Used to analyze false positives and false negatives.

5.2 Result Summary

The trained model was evaluated on multiple datasets, and the results are summarized in Table 5.1.

Table 5.1: Trained Model Results

Model Name          Dataset                       Sequence Length   Accuracy
model_97_acc_100    FaceForensics++               100 frames        97.76%
model_93_acc_100    Celeb-DF + FaceForensics++    100 frames        93.97%
model_89_acc_40     Custom Dataset                40 frames         89.34%



Chapter 6

CONCLUSION

6.1 Conclusion

Deepfake detection has become a crucial area of research due to the increasing threats posed by

AI-generated synthetic media. In this project, we implemented a deep learning-based solution

using a hybrid architecture comprising ResNeXt CNN for spatial feature extraction and LSTM

for temporal analysis. The model was trained and evaluated on various datasets, including

FaceForensics++, Deepfake Detection Challenge (DFDC), and Celeb-DF, achieving high accuracy

in detecting manipulated videos.

Through extensive experimentation, we found that balanced dataset training significantly improves the model’s ability to generalize across different types of deepfake manipulations. The

evaluation metrics, including accuracy, precision, recall, and F1-score, confirmed the effectiveness

of our approach in distinguishing between real and fake videos.

While our system demonstrates strong performance, challenges remain in improving real-time

processing and handling highly sophisticated deepfakes with minimal detectable artifacts. This study

contributes to the ongoing development of robust deepfake detection systems and highlights the need

for continuous advancements in AI-driven security measures.

6.2 Future Enhancements

Despite the success of the proposed deepfake detection model, there is still room for improvement.

Future enhancements will focus on the following aspects:

• Real-time Detection: Optimizing the model for real-time video analysis to detect deepfakes as

they are streamed or uploaded online.

• Lightweight Model Architecture: Developing a more efficient, computationally lightweight

model that can run on edge devices such as smartphones and IoT systems.

• Generalization Across New Deepfake Techniques: Enhancing the model to detect emerging

deepfake generation methods that produce highly realistic fake videos.

• Multi-modal Analysis: Integrating audio and speech analysis alongside video detection to

improve the overall system robustness.

• Explainable AI (XAI) Integration: Implementing explainability features to provide insights

into why a particular video is classified as real or fake.

• Integration with Social Media Platforms: Developing APIs and plugins that can be deployed

on social media platforms to flag deepfake content automatically.

• Robust Adversarial Defense: Enhancing the system’s resistance to adversarial attacks that

attempt to bypass detection mechanisms.

• Dataset Expansion: Incorporating more diverse datasets, including high-resolution and low-

resolution deepfakes, to improve model adaptability.

By addressing these areas, future iterations of the system can significantly enhance deepfake

detection capabilities and contribute to maintaining digital integrity.

6.3 Summary

This project presented a deep learning-based approach to deepfake detection, leveraging ResNeXt for

spatial feature extraction and LSTM for temporal sequence processing. The methodology involved



problem analysis, dataset preprocessing, model training, evaluation, and system deployment. The

model demonstrated high accuracy in distinguishing between real and fake videos across multiple

datasets.

Key findings of this research include:

• A balanced dataset approach significantly improves generalization and reduces model bias.

• Combining CNN-based feature extraction with LSTM-based sequential analysis enhances

detection performance.

• While the model is effective, real-time processing remains a challenge that requires further

optimization.

In conclusion, the study highlights the importance of AI-driven solutions in countering the rising

threat of deepfake technology. Future work will focus on enhancing real-time detection, improving

generalization across new deepfake methods, and integrating multi-modal verification techniques.

By continuing to refine deepfake detection methodologies, we can contribute to a more secure and

trustworthy digital landscape.



References

[1] N. U. R. Ahmed, A. Badshah, H. Adeel, et al., “Visual Deepfake Detection: Review of Techniques, Tools, Limitations, and Future Prospects,” 2024.

[2] G. Gupta, K. Raja, M. Prasad, M. Gupta, T. Jan, and S. Thompson Whiteside, “A Comprehensive Review of DeepFake Detection Using Advanced Machine Learning and Fusion Methods.”

[3] E. Prashnani, M. Goebel, and B. S. Manjunath, “Generalizable Deepfake Detection with Phase-Based Motion Analysis,” in IEEE Transactions on Image Processing, 2024.

[4] A. Jadhav, A. Patange, J. Patel, H. Patil, and M. Mahajan, “Deepfake Video Detection using Neural Networks.”

[5] G. Pang, B. Zhang, Z. Teng, Z. Qi, and J. Fan, “MRE-Net: Multi-Rate Excitation Network for Deepfake Video Detection,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 8, pp. 3663-3676, Aug. 2023.

[6] Y. Patel et al., “An Improved Dense CNN Architecture for Deepfake Image Detection,” in IEEE Access, vol. 11, pp. 22081-22095, 2023.

[7] N. M. Alnaim, Z. M. Almutairi, M. S. Alsuwat, H. H. Alalawi, A. Alshobaili, and F. S. Alenezi, “DFFMD: A Deepfake Face Mask Dataset for Infectious Disease Era With Deepfake Detection Algorithms,” in IEEE Access, vol. 11, pp. 16711-16722, 2023.

[8] Q. Yin, W. Lu, B. Li, and J. Huang, “Dynamic Difference Learning With Spatio-Temporal Correlation for Deepfake Video Detection,” in IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4046-4058, 2023.

