0% found this document useful (0 votes)
113 views89 pages

Deepfake Video Detection Using Deep Learning

The document presents a project report on 'Deepfake Video Detection Using Deep Learning' by students from Kings Engineering College, focusing on the development of a hybrid framework called FakeSpotter. This framework combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to effectively detect deepfake videos by analyzing both spatial and temporal features. The project aims to enhance detection accuracy and contribute to the authenticity of digital content in light of the growing threat posed by deepfake technology.

Uploaded by

Dani Jojo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
113 views89 pages

Deepfake Video Detection Using Deep Learning

The document presents a project report on 'Deepfake Video Detection Using Deep Learning' by students from Kings Engineering College, focusing on the development of a hybrid framework called FakeSpotter. This framework combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to effectively detect deepfake videos by analyzing both spatial and temporal features. The project aims to enhance detection accuracy and contribute to the authenticity of digital content in light of the growing threat posed by deepfake technology.

Uploaded by

Dani Jojo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 89

DEEPFAKE VIDEO DETECTION

USING DEEP LEARNING

A PROJECT REPORT

Submitted By :
RAMYA.R 210821205084
SUJITHA.P 210821205110
SRIMATHI.K 210821205108

in partial fulfilment for the award of the degree


of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY

KINGS ENGINEERING COLLEGE

(An Autonomous Institution, Affiliated to Anna University,

Chennai)

May 2025
KINGS ENGINEERING COLLEGE

(An Autonomous Institution, Affiliated to Anna University, Chennai)

BONAFIDE CERTIFICATE

Certified that this project report “DEEPFAKE VIDEO DETECTION


USING DEEP LEARNING “ is the Bonafide work of
“K.Srimathi,P.Sujitha ,R.Ramya” who carried out this project work under my
supervision.

SIGNATURE SIGNATURE

Dr. D. C. Jullie Josephine M.Tech., PhD Mrs. K.Benitlin Subha M.E., (PhD)

HEAD OF THE DEPARTMENT SUPERVISOR

Dept Of Information Technology, Assistant Professor

Kings Engineering College, Dept Of Information Technology,

Irungattukottai, Kings Engineering College,

Chennai -602117 Irungattukottai,

Chennai -602117

This report is submitted for the Project viva-voce scheduled to be held on


………………………..
INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We thank God for his blessings and also for giving as good knowledge and
strength in enabling us to finish our project. Our deep gratitude goes to our founder late
Dr. D. Selvaraj, M.A., M.Phil., for his patronage in the completion of our project. We
like to take this opportunity to thank our honourable chairperson Dr.S. Nalini Selvaraj,
M.COM., MPhil., Ph.D. and our noble-hearted director, Mr.S. Amirtharaj, M.Tech.,
M.B.A and his wife, Mrs. Merilyn Jemmimah Amirtharaj, B.E., M.B.A., for their
support given to us to finish our project successfully. We wish to express our sincere
thanks to our beloved principal. Dr. C. Ramesh Babu Durai M.E., Ph.D for his kind
encouragement and his interest towards us.

We are extremely grateful and thanks to our professor Dr. D. C. Jullie Josephine , head of
Information Technology, Kings Engineering College, for her valuable suggestion, guidance
and encouragement. We wish to express our sense of gratitude to our project supervisor
Mrs. K.Benitlin Subha M.E., (PhD) Assistant Professor of Information Technology
Department, Kings Engineering College whose idea and direction made our project a grand
success. We express our sincere thanks to our parents, friends and staff members who
have helped and encouraged us during the entire course of completing this project work
successfully.
PROBLEM STATEMENT
In today’s digitally connected world, the proliferation of deepfake technology presents a
growing threat to the authenticity of visual media. Deepfakes—AI-generated synthetic
videos that can realistically impersonate individuals—are becoming increasingly difficult to
detect with the naked eye due to advancements in generative models. These manipulated
videos pose serious risks, including the spread of misinformation, identity fraud, and damage
to public trust in digital content.

Existing detection techniques often rely on isolated image-level analysis or static classifiers,
which fail to capture the spatiotemporal characteristics that distinguish real videos from
deepfakes. Furthermore, with the rapid evolution of deepfake generation methods, single-
model approaches struggle to generalize across varied datasets and manipulation techniques.
This inadequacy in traditional detection models calls for more robust and adaptive solutions.

To address this pressing issue, we propose FakeSpotter – A Hybrid CNN-RNN


Framework for Deepfake Identification. This system leverages the power of
Convolutional Neural Networks (CNNs) to extract spatial features from video frames and
Recurrent Neural Networks (RNNs)—specifically LSTM units—to analyze temporal
inconsistencies across sequential frames. By combining spatial and temporal cues, the model
can effectively detect artifacts, unnatural motion patterns, and frame-level inconsistencies
introduced during deepfake synthesis.

Our approach aims to enhance detection accuracy by training on large-scale datasets such as
Celeb-DF, ensuring adaptability and robustness in real-world scenarios. FakeSpotter thus
contributes to the broader goal of safeguarding digital content authenticity and mitigating the
social, political, and ethical threats posed by deepfake media
ABSTRACT

In today’s digital landscape, the rapid advancement of deepfake technology has raised
serious concerns due to its ability to generate highly convincing fake videos. These synthetic
media artifacts are widely used to spread misinformation, manipulate public opinion, and
perpetrate identity fraud, presenting significant challenges across social, political, and legal
domains. Traditional detection methods often fall short when addressing the spatial and
temporal anomalies introduced by deepfake algorithms.

To combat this threat, we propose FakeSpotter, a hybrid deep learning framework


combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs) for accurate deepfake detection. CNNs are employed to extract detailed spatial
features from individual video frames, while RNNs—specifically Long Short-Term
Memory (LSTM) networks—capture temporal inconsistencies across sequential frames.
By utilizing transfer learning with a pre-trained VGG-16 architecture and training on the
Celeb-DF dataset, the model achieves strong generalization to real-world deepfake
scenarios.

The proposed system demonstrates improved detection accuracy by effectively identifying


subtle visual artifacts and unnatural motion patterns. Through this hybrid approach,
FakeSpotter enhances the reliability of deepfake forensics and contributes to the broader
mission of securing digital content authenticity.

Keywords: Deepfake Detection, CNN-RNN Hybrid, VGG-16, LSTM, Celeb-DF, Transfer


Learning, Spatiotemporal Analysis, Fake Video Identification
LIST OF SYMBOLS

NOTATION
S.N NAME NOTATION DESCRIPTION
O

Class Name

1. Class + public -attribute Represents a collection


-attribute
-private of similar entities
# protected grouped together.
+operation
+operation
+operation

Associations
represents static
relationships between
Class A NAME Class B classes. Roles
representsthe way the
2. Association
two classes see each
Class A Class B
other.

3. Actor
It aggregates several
classes into a single
classes.
Class B Class A
Interaction between the
4. Aggregation
system and external
Class A environment
Class B

5. Relation uses
(uses) Used for additional
process
communication.

Extends relationship is
used when one use case
6. Relation extends
is similar to another use
(extends)
case but does a bit
more.

Communication
7. Communication
between various use
cases.

State of the processs.


8. State State

9. Initial State Initial state of the


object

10. Final state Final state of the object

11. Control flow


Represents various
control flow between
the states.
12. Decision box Represents decision
making process from a
constraint

Interact ion between


the system and external
13. Usecase Usescase environment.

Represents physical
modules which is a
collection of
14. Component
components.

Represents physical
15. Node modules which are a
collection of
components.

A circle in DFD
represents a state or
16. Data process which has been
triggered due to some
Process/State
event or acion.

Represents external
17. External entity
entities such as
keyboard,sensors,etc.

Represents
Transition
18. communication that
occurs between
processes.
19. Object Lifeline Represents the vertical
dimensions that the
object communications.

Represents the message


20. Message Message
exchanged.

ABBREVIATION

Acrony
Description Definition
m

The simulation of human intelligence processes


AI Artificial Intelligence
by machines, especially computer systems.

MongoDB,
A full-stack JavaScript-based framework used for
MERN Express.js, React.js,
web application development.
Node.js

A branch of AI that focuses on building systems


ML Machine Learning
that learn from and make decisions based on data.

Recurrent Neural A type of neural network architecture designed to


RNN
Network recognize patterns in sequences of data.

A specialized type of RNN capable of learning


Long Short-Term
LSTM long-term dependencies in sequence prediction
Memory
problems.

CNN Convolutional Neural A class of deep neural networks primarily used


Network for image recognition and classification.

Optical Character Technology used to convert scanned documents


OCR
Recognition or images into machine-encoded text.

Application
A set of protocols and tools for building and
API Programming
integrating software applications.
Interface

The space where interactions between humans


UI User Interface
and machines occur.

The overall experience a user has when


UX User Experience
interacting with a system.

A compact, URL-safe means of representing

JWT JSON Web Token claims for authentication and information

exchange.

Database Software for storing and retrieving users’ data


DBMS
Management System efficiently and securely.

Central Processing The main processor of a computer, executing


CPU
Unit instructions and managing operations.

Create, Read, Update, The four basic functions of persistent storage in


CRUD
Delete databases.

Cascading Style A language used to describe the style and layout


CSS
Sheets of HTML documents.

Hypertext Markup The standard language for creating web pages and
HTML
Language applications.

Natural Language A subfield of AI focused on the interaction


NLP
Processing between computers and human language.
TABLE OF CONTENT
DEEPFAKE VIDEO DETECTION..........................................................................................1
USING DEEP LEARNING......................................................................................................1
PROBLEM STATEMENT.......................................................................................................4
ABSTRACT..............................................................................................................................5
Keywords: Deepfake Detection, CNN-RNN Hybrid, VGG-16, LSTM, Celeb-DF, Transfer
Learning, Spatiotemporal Analysis, Fake Video Identification................................................5
TABLE OF CONTENTS............................................12
CHAPTER 1............................................................................................................................16
INTRODUCTION...................................................................................................................16
1.1 DOMAIN INTRODUCTION...........................................................................................17
1.2 OBJECTIVES...................................................................................................................17
1.3 SCOPE OF THE PROJECT..............................................................................................19
CHAPTER II...........................................................................................................................21
LITERATURE REVIEW........................................................................................................21
[1] Title: Deepfake Detection Using Convolutional Vision Transformers and Convolutional
Neural Networks (2024)..........................................................................................................21
[2] Title: Celeb-DF: A Large-Scale Challenging Dataset for Deepfake Forensics (2020).....21
[4] Title: Deep CNN Models-Based Ensemble Approach to Driver Drowsiness Detection
(2021)......................................................................................................................................22
[5] Title: Recurrent Convolutional Strategies for Video Classification (2019)......................22
[6] Title: Real-Time Deepfake Video Detection Using LSTM-Based Classifiers (2022)......23
[7] Title: Survey on Deepfake Detection Techniques (2023).................................................23
CHAPTER 3............................................................................................................................24
SYSTEM ANALYSIS............................................................................................................24
3.1 EXISTING PROBLEM.....................................................................................................24
3.2 PROPOSED METHODOLOGY......................................................................................24
SYSTEM REQUIREMENTS...........................................................................................26
4.1 HARDWARE REQUIREMENTS....................................................................................26
4.2 SOFTWARE REQUIREMENTS......................................................................................27
4.3 FUNCTIONAL REQUIREMENTS..................................................................................28
4.4 NON-FUNCTIONAL REQUIREMENTS........................................................................28
4.5 REQUIRED LIBRARIES AND FRAMEWORKS..........................................................30

CHAPTER 5............................................................................................................................31
MODELS AND METHODS...................................................................................................31
5.1 CONVOLUTIONAL NEURAL NETWORK (CNN)......................................................31
5.2 RECURRENT NEURAL NETWORK (RNN) – LSTM..................................................32
5.3 SUPPORT VECTOR MACHINE (SVM) (For Baseline Comparison)............................33
5.4 DECISION TREE / RANDOM FOREST (For Feature Selection or Ensemble
Approaches).............................................................................................................................33
5.5 K-NEAREST NEIGHBORS (KNN) (For Similarity Matching in Model Behavior
Analysis)..................................................................................................................................34
Summary of Selected Models in FakeSpotter:........................................................................34
CHAPTER VI..........................................................................................................................35
MODULES AND UML DIAGRAMS....................................................................................35
6.1 MODULES........................................................................................................................35
6.2 MODULES DESCRIPTION.............................................................................................35
6.3 UML DIAGRAMS............................................................................................................38
6.3.1 Data Flow Diagram (DFD).............................................................................................38
1.User:.....................................................................................................................................47
2.Admin (inherits from User):.................................................................................................47
3.Video:...................................................................................................................................48
4.FrameProcessor:...................................................................................................................48
5.FeatureExtractor (CNN Module):........................................................................................48
6.SequenceAnalyzer (LSTM Module):...................................................................................48
7.PredictionEngine:.................................................................................................................48
8.Result:...................................................................................................................................48
CHAPTER 7............................................................................................................................54
IMPLEMENTATION.......................................................................................................54
CHAPTER 8............................................................................................................................60
FEASIBILITY STUDY AND SYSTEM TESTING..............................................................60
8.1 FEASIBILITY STUDY....................................................................................................60
8.2 SYSTEM TESTING..........................................................................................................62
RESULTS................................................................................................................................66
SOURCE CODE:....................................................................................................................72
REFERENCES........................................................................................................................90
CHAPTER 1
INTRODUCTION

The widespread availability of deepfake technology—which enables the manipulation of


visual media using artificial intelligence—has sparked serious concern across industries.
Deepfakes can fabricate realistic but entirely synthetic video content that imitates people’s
appearances and behaviors, often with malicious intent. These fabricated videos pose
significant threats, including the dissemination of misinformation, damage to personal
reputation, and the erosion of trust in digital media.

Despite growing awareness, many existing deepfake detection systems are not equipped to
handle the complex and evolving nature of synthetic media. Traditional image-based
classifiers often lack the ability to capture temporal inconsistencies across video frames and
are limited in detecting subtle artifacts embedded by sophisticated generative models.

To address these challenges, we propose FakeSpotter – A Hybrid CNN-RNN Framework


for Deepfake Identification. This system leverages the combined strengths of
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
While CNNs are effective at extracting spatial features such as facial anomalies, lighting
inconsistencies, and texture distortions from individual frames, RNNs—specifically Long
Short-Term Memory (LSTM) networks—capture temporal patterns such as unnatural
motion or blinking behavior across sequential frames.

By integrating both spatial and temporal information, FakeSpotter delivers robust and
accurate deepfake detection. The model is trained using the Celeb-DF dataset and
incorporates transfer learning from pre-trained VGG-16 networks to improve
generalization and performance. This hybrid architecture not only detects deepfakes more
accurately but also adapts to the evolution of manipulation techniques .
1.1 DOMAIN INTRODUCTION

Artificial Intelligence (AI), particularly in the form of machine learning (ML) and deep
learning, has revolutionized the analysis of visual and video content. Within the domain of
digital media forensics, AI algorithms now play a vital role in detecting manipulated content
that evades human scrutiny.

Convolutional Neural Networks (CNNs) are widely used in computer vision for detecting
visual patterns such as edges, textures, and shapes in images. In contrast, Recurrent Neural
Networks (RNNs)—and more specifically LSTM architectures—are designed to handle
sequential data, making them suitable for modeling time-series patterns in videos.

The hybrid CNN-RNN approach combines these strengths to form a system capable of
analyzing both spatial and temporal dimensions of video data, thus making it ideal for
deepfake detection. This technique has been increasingly adopted in domains such as
surveillance, digital forensics, and media authentication, and continues to show promising
results in combating synthetic media threats.

1.2 OBJECTIVES

The core objective of the FakeSpotter framework is to effectively identify and classify
deepfake videos using a robust, hybrid deep learning approach that combines both spatial
and temporal analysis. By leveraging the strengths of Convolutional Neural Networks
(CNNs) and Recurrent Neural Networks (RNNs), this system aims to provide accurate and
scalable deepfake detection capable of addressing real-world challenges posed by
manipulated media.
FakeSpotter aims to:

1. Extract spatial features using pre-trained CNN architectures (e.g., VGG-16) to


detect pixel-level anomalies such as inconsistent lighting, facial distortions, and
texture artifacts.

2.Analyze temporal dependencies using LSTM-based RNNs to identify motion


inconsistencies, unnatural blinking, and abrupt transitions across video frames.

3.Preprocess input video data, including frame extraction, face detection, and
normalization, to ensure clean and consistent inputs for model training and inference.

4.Implement transfer learning to enhance model performance and reduce training


time by utilizing pre-trained weights on large datasets.

5.Train and evaluate the model using the Celeb-DF dataset, a challenging
benchmark for real-world deepfake scenarios.

6.Classify entire videos by aggregating frame-wise predictions, improving the


reliability of final deepfake detection outcomes.

By delivering a hybrid and intelligent deepfake detection framework, FakeSpotter seeks to


contribute to digital forensics, promote media authenticity, and serve as a frontline defense
against the threats posed by AI-generated fake content.
1.3 SCOPE OF THE PROJECT

The scope of FakeSpotter encompasses the design, development, and implementation of an


AI-powered hybrid deep learning system aimed at detecting and classifying deepfake video
content. The project focuses on leveraging the combined capabilities of Convolutional
Neural Networks (CNNs) for spatial analysis and Recurrent Neural Networks (RNNs)—
specifically Long Short-Term Memory (LSTM) networks—for temporal feature
extraction.

The system is designed to:

1.Detect spatial anomalies such as facial distortions, inconsistencies in lighting, and


irregularities in texture patterns.

2.Capture temporal inconsistencies like unnatural facial motion, irregular


eyeblinking, and frame transitions using sequence modeling.

3.Preprocess video inputs by extracting and aligning face regions from each frame to
create consistent, structured sequences for analysis.

4.Train on large-scale datasets such as Celeb-DF, enhancing the model’s ability to


generalize across a wide range of manipulation techniques and video qualities.

5.Provide accurate, frame-level and video-level classification using an aggregated


decision mechanism.

The FakeSpotter framework is modular and scalable, making it adaptable for use in various
real-world applications including:

 Digital media verification


 Social media content filtering
 Law enforcement digital forensics
 News/media authenticity validation
Furthermore, the system supports future integration with cloud platforms, edge devices, and
advanced generative model counters, ensuring continuous improvement and relevance as
deepfake technologies evolve.

Ultimately, FakeSpotter aims to become a critical tool in the broader effort to preserve
digital content integrity, promote responsible AI usage, and protect the public from
malicious synthetic media.
CHAPTER II
LITERATURE REVIEW
The detection of deepfake videos has become a pressing concern with the rapid evolution of
generative adversarial networks (GANs) and face-swapping technologies. Numerous studies
have proposed various machine learning and deep learning techniques to address the
challenge of synthetic media detection. This chapter highlights relevant works that have
contributed to the development of CNN, RNN, and hybrid models for deepfake
identification.

[1] Title: Deepfake Detection Using Convolutional Vision Transformers and


Convolutional Neural Networks (2024)

Authors: Soudy, A. H., Sayed, O., Tag-Elser, H., Ragab, R., Mohsen, S., Mostafa, T.,
Abohany, A. A., & Slim, S. O.
This study proposes a hybrid approach utilizing Convolutional Neural Networks (CNNs)
in combination with Vision Transformers (ViTs) to identify subtle inconsistencies in
deepfake videos. The model is capable of capturing both local (pixel-level) and global
(semantic) anomalies. Their results show a significant boost in accuracy when combining
traditional CNNs with transformer-based models for deepfake forensics.

[2] Title: Celeb-DF: A Large-Scale Challenging Dataset for Deepfake Forensics (2020)

Authors: Li, Y., Yang, X., Sun, P., Qi, H., Lyu, S.
This paper introduces the Celeb-DF dataset, a high-quality and diverse video dataset
designed to test deepfake detection systems under realistic conditions. The authors
emphasize the necessity of evaluating models on datasets with minimal visual artifacts to
ensure generalization in real-world scenarios. Celeb-DF has become a standard benchmark
in deepfake research.
[3] Title: Deepfake Detection for Video: An Open Source Challenge (2020)

Authors: Al-Hussein, M., Venkataraman, S., Jawahar, C.


This paper outlines the challenges in developing robust deepfake detection systems. It
discusses the importance of temporal coherence analysis and emphasizes the limitations of
image-based classifiers that fail to consider frame sequences. The authors advocate for RNN
and LSTM-based architectures to address temporal inconsistencies in fake videos.

[4] Title: Deep CNN Models-Based Ensemble Approach to Driver Drowsiness Detection
(2021)

Authors: Dua, M., Singla, R., Raj, S., Jangra, A.


Although not focused on deepfake detection, this paper demonstrates the strength of deep
CNN architectures in facial feature recognition. The ensemble model processes real-time
facial landmarks, which can be repurposed in deepfake detection for recognizing unnatural
facial behaviors, such as non-standard blinking patterns or micro-expressions.

[5] Title: Recurrent Convolutional Strategies for Video Classification (2019)

Authors: Karpathy, A., Toderici, G., Leung, T., et al.


This research introduces hybrid models combining CNNs and RNNs for video understanding
tasks. The authors illustrate how CNNs extract spatial features, while RNNs capture
temporal relationships, making them highly applicable to video-based tasks such as
deepfake detection. Their approach laid the foundation for many hybrid models used today.
[6] Title: Real-Time Deepfake Video Detection Using LSTM-Based Classifiers (2022)

Authors: Sharma, S., Bansal, K., Malhotra, P.


This work focuses on LSTM networks for identifying temporal inconsistencies in deepfake
videos. The authors highlight that artifacts such as jittery eye motion, frame flickering, and
inconsistent lighting are more prominent when analyzed across sequential frames, thus
validating the effectiveness of RNNs in video-based deepfake detection.

[7] Title: Survey on Deepfake Detection Techniques (2023)

Authors: Verma, T., Pandey, R., Singh, R.


This survey categorizes deepfake detection approaches into passive detection (hand-crafted
features), active detection (watermarking), and deep learning-based detection. It concludes
that deep learning models, particularly hybrid CNN-RNN architectures, outperform
traditional methods in identifying subtle manipulations in deepfake media.
CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING PROBLEM

With the explosive growth of video-sharing platforms and social media, the threat posed by
deepfake videos has become increasingly severe. These AI-generated fake videos are
capable of realistically mimicking facial expressions, voices, and movements of real
individuals, making them highly deceptive and difficult to detect with the naked eye. As a
result, they pose serious risks including identity theft, reputation damage, political
manipulation, and misinformation dissemination.

Current deepfake detection systems typically rely on image-based classifiers or static pattern
recognition methods. However, these methods struggle to detect high-quality deepfakes
that exhibit few spatial irregularities. They often overlook temporal inconsistencies, such as
unnatural blinking or misaligned lip movements, which are crucial cues in video-based
analysis.

Furthermore, many existing models lack the scalability, generalization, and adaptability
required to handle the diverse techniques used to generate deepfakes. As deepfake
generation becomes more advanced and accessible, there is a critical need for robust,
hybrid detection frameworks that combine spatial and temporal analysis to effectively
counter these synthetic threats.

3.2 PROPOSED METHODOLOGY

To overcome the limitations of existing systems, we propose FakeSpotter – A Hybrid


CNN-RNN Framework for deepfake detection. This system is designed to leverage both
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to
accurately identify deepfakes by analyzing visual and sequential information from video
data.
The proposed methodology includes the following components:

1.Data Preprocessing: Input videos are converted into frames. Each frame undergoes
face detection, cropping, and alignment using tools like dlib or OpenCV to focus on
relevant facial regions.

2.Spatial Feature Extraction (CNN): Each frame is passed through a pre-trained


VGG-16 model to extract spatial features such as texture anomalies, inconsistent
lighting, or facial artifacts typical of deepfake videos.

3.Temporal Analysis (RNN/LSTM): The sequence of features extracted from CNN


is fed into a Long Short-Term Memory (LSTM) network. This component captures
temporal inconsistencies—such as unnatural facial motion or erratic blinking—that
span across multiple frames.

4.Classification Layer: A fully connected dense layer with a softmax or sigmoid


activation function is used to classify frames or entire video sequences as real or fake.

5.Post-Processing: Frame-level predictions are aggregated to make a final decision


about the authenticity of the video, improving robustness against isolated errors.

This hybrid architecture ensures the model captures both static artifacts and temporal
inconsistencies—two critical components in deepfake detection. By training the system on
the Celeb-DF dataset, the framework gains the ability to generalize across different
manipulation techniques and video qualities.

Through this approach, FakeSpotter provides a scalable and reliable deepfake identification
mechanism, contributing to the broader goal of digital content verification and media
integrity protection.
CHAPTER 4

SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS

To effectively train, test, and deploy the deep learning-based FakeSpotter framework, the
following hardware configuration is recommended:

Processor: Intel Core i7 / AMD Ryzen 7 or higher (with multi-core support for deep
learning)

RAM: Minimum 16 GB DDR4 (32 GB preferred for large datasets and model
training)

Storage: 512 GB SSD (for fast read/write operations and dataset handling)

GPU: NVIDIA GPU with CUDA support (e.g., RTX 3060 or higher for model
training)

Display: Full HD 15.6” monitor

Internet: Stable broadband connection (for dataset access, model updates, and cloud
deployment)
4.2 SOFTWARE REQUIREMENTS

The FakeSpotter system utilizes a range of software tools, libraries, and frameworks for
model development, training, and deployment:

Operating System: Windows 10/11, Ubuntu 20.04+ (Linux recommended for


compatibility with AI libraries)

Programming Language: Python 3.8+

Deep Learning Libraries: TensorFlow, Keras, PyTorch

Computer Vision Tools: OpenCV, dlib

IDE/Code Editor: Visual Studio Code / Jupyter Notebook

Version Control: Git, GitHub

Deployment Platform: AWS EC2 / Google Colab / Render / Heroku (for model
hosting)

Others: Anaconda (for environment management), JupyterLab (for experimentation)


4.3 FUNCTIONAL REQUIREMENTS

4.3.1 Data Preprocessing and Standardization

Video input must be broken down into frames. Each frame is then preprocessed through
resizing, normalization, and face alignment using facial detection tools such as dlib or
MTCNN. This standardized input structure ensures consistent and accurate feature
extraction by the CNN.

4.3.2 Feature Extraction and Vectorization

Each preprocessed video frame is converted into a numerical representation using VGG-16
CNN, which extracts spatial features like edges, shapes, and texture anomalies. These
features are vectorized and passed into the RNN for sequential analysis.

4.3.3 Sequence Modeling

The LSTM-based RNN processes temporal sequences of frame-level features. This enables
the model to learn patterns over time and detect temporal inconsistencies in deepfake videos.

4.3.4 Final Classification

A softmax/sigmoid activation function in the final dense layer is used to classify each frame
(or sequence) as real or fake. Frame-wise predictions are aggregated to make a final
judgment on the entire video.
4.4 NON-FUNCTIONAL REQUIREMENTS

4.4.1 Reliability

The system must consistently deliver accurate predictions across various datasets and
deepfake techniques. It should exhibit minimal false positives/negatives and ensure robust
behavior during large-scale batch predictions.

4.4.2 Performance and Response Time

Training and inference must be optimized using GPU acceleration and model pruning if
necessary. Inference time for real-time applications (e.g., live video analysis) must remain
within acceptable latency thresholds.

4.4.3 Scalability

The system should support scaling on cloud infrastructure to handle larger datasets and more
concurrent video inputs for batch validation. Containerization tools (e.g., Docker) may be
used for deployment scalability.

4.4.4 Security

Security measures must be enforced to prevent tampering with training datasets or model
outputs. This includes:

1.Secured dataset storage (e.g., S3 buckets with encryption)

2.Model checkpoint integrity verification

3.Secure API endpoints with token-based authentication (OAuth/JWT)

4.4.5 Interoperability

The system should be compatible with video input from diverse sources such as MP4 files,
live streams, and dataset repositories. It must support integration with web-based user
interfaces for uploading videos and retrieving classification results.
4.5 REQUIRED LIBRARIES AND FRAMEWORKS

NumPy – For numerical array operations and data preparation

Pandas – For structured data handling and annotation management

TensorFlow / Keras – For model development (CNN, LSTM, training, and


evaluation)

PyTorch – (optional) as an alternative deep learning framework

OpenCV – For video frame extraction, image processing, and visualization

dlib / MTCNN – For face detection and alignment

Matplotlib / Seaborn – For plotting training curves, loss metrics, and confusion
matrices

Scikit-learn – For auxiliary classification, performance metrics, and cross-validation

Flask / FastAPI – For exposing the model as an API (if deployed for external use)

Jupyter Notebook – For experimental model tuning and visualization


CHAPTER 5
MODELS AND METHODS

The FakeSpotter framework leverages multiple deep learning techniques, with a primary
focus on hybrid architectures that combine Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs). These methods are essential for extracting spatial and
temporal features from video data to identify deepfakes. This chapter outlines the various
models and algorithms relevant to the development of the FakeSpotter system.

5.1 CONVOLUTIONAL NEURAL NETWORK (CNN)

CNNs are a class of deep neural networks widely used for image recognition and
classification tasks. In the FakeSpotter system, CNNs are employed for spatial feature
extraction from individual video frames. These features may include texture irregularities,
inconsistent lighting, and facial distortions that often go unnoticed by the human eye.

Application in FakeSpotter:

Detecting pixel-level artifacts introduced by deepfake generation algorithms.

Extracting high-level visual features from aligned face regions.

Pretrained CNN models like VGG-16 are used through transfer learning to reduce
training time and increase accuracy.

Working Process:

Each video is broken down into frames.

Each frame is passed through the CNN model.


Intermediate layers extract feature maps that represent critical visual cues.

These feature maps are passed into the next stage (RNN) for temporal analysis.

5.2 RECURRENT NEURAL NETWORK (RNN) – LSTM

RNNs, and specifically Long Short-Term Memory (LSTM) networks, are designed to
process sequential data. In FakeSpotter, LSTMs are used to model temporal dependencies
across video frames, capturing inconsistencies in motion patterns, facial expressions, and
blinking.

Application in FakeSpotter:

Learning patterns across sequences of video frames.

Detecting temporal anomalies such as unnatural transitions or inconsistent eye


movement.

Providing context-aware predictions based on frame history.

Working Process:

CNN-extracted features are grouped into sequences (e.g., 10 consecutive frames).

LSTM units analyze the progression of these features.

Output is passed to a classifier for final decision-making.


5.3 SUPPORT VECTOR MACHINE (SVM) (For Baseline Comparison)

SVM is a classical supervised learning algorithm often used as a baseline model for
classification tasks. Though not part of the core architecture of FakeSpotter, SVMs were
explored during the early experimentation phase for frame-based classification.

Applications:

Classifying individual frames as real or fake based on extracted CNN features.

Evaluating performance against the proposed CNN-RNN model.

Used as a lightweight alternative in systems where deep learning resources are limited.

5.4 DECISION TREE / RANDOM FOREST (For Feature Selection or Ensemble


Approaches)

Decision Trees and Random Forests are ensemble learning methods used in some deepfake
detection pipelines for feature ranking or ensemble prediction. While not the core model
in FakeSpotter, these algorithms can assist in:

Selecting important spatial features for classification.

Aggregating outputs from multiple shallow models for improved prediction.

Acting as interpretable models in proof-of-concept phases.


5.5 K-NEAREST NEIGHBORS (KNN) (For Similarity Matching in Model Behavior
Analysis)

KNN is occasionally used in post-classification analysis to:

Identify similar patterns in misclassified frames.

Group video samples by feature similarity for clustering or visualization.

Benchmark lightweight performance when deep models are unavailable.

Summary of Selected Models in FakeSpotter:

Model Purpose Integration

CNN (VGG-16) Spatial feature extraction from frames Core component

LSTM Temporal sequence modeling Core component

SVM Frame-level classification (baseline) Experimental/comparison

Random Forest Feature importance analysis (optional) Experimental support


Model Purpose Integration

KNN Similarity analysis (optional) Supplementary/analytics

CHAPTER 6
MODULES AND UML DIAGRAMS

6.1 MODULES

The FakeSpotter system is structured into the following major functional modules:

 Admin Panel
 Video Upload and Preprocessing
 Face Detection and Alignment
 Feature Extraction (CNN Module)
 Temporal Analysis (RNN/LSTM Module)
 Classification and Result Display

6.2 MODULES DESCRIPTION

6.2.1 Admin Panel

The Admin Panel module allows authorized users to manage system configurations, oversee
detection results, and analyze system performance logs.

Functionality:

Admin authentication and access control

Monitoring uploaded video batches


System settings (e.g., model version, threshold tuning)

Export and audit detection reports

6.2.2 Video Upload and Preprocessing

This module handles user-uploaded videos, converting them into frame sequences for further
processing.

Functionality:

Accepting video input (e.g., .mp4 format)

Extracting frames at fixed intervals

Normalizing resolution and format

Logging metadata (e.g., frame rate, resolution)

6.2.3 Face Detection and Alignment

Face detection and alignment are critical for ensuring consistent frame inputs to the CNN.

Functionality:

Detecting faces in each frame using dlib or MTCNN

Aligning facial regions for uniformity

Cropping and saving face regions

Filtering frames with low-confidence detections


6.2.4 Feature Extraction (CNN Module)

This module uses a pre-trained CNN model (e.g., VGG-16) to extract spatial features from
each aligned frame.

Functionality:

Loading pre-trained CNN weights

Passing each frame through CNN layers

Storing output feature vectors for sequence modeling

Identifying artifacts like blurring, mismatched lighting, or texture anomalies

6.2.5 Temporal Analysis (RNN/LSTM Module)

Using LSTM, this module analyzes the extracted frame-level features over time to identify
deepfake-specific temporal inconsistencies.

Functionality:

Feeding sequences of CNN features into the LSTM

Detecting irregular motion patterns or unnatural transitions

Producing a confidence score for each sequence


Learning temporal relationships between consecutive frames

6.2.6 Classification and Result Display

The final module classifies the input video as real or fake and displays the result to the user.

Functionality:

Aggregating frame-level predictions into a video-level decision

Displaying confidence score and classification label (Real/Fake)

Logging and visualizing results for review

Downloadable detection report

6.3 UML DIAGRAMS

Unified Modeling Language (UML) diagrams illustrate the functional and data flow
structure of the FakeSpotter system.

6.3.1 Data Flow Diagram (DFD)

The Data Flow Diagram (DFD) illustrates how data flows through the FakeSpotter system,
from video input to final deepfake classification. It outlines the sequence of operations,
including preprocessing, feature extraction, temporal analysis, and result generation.
Key Points:

Maps data movement across all core modules of FakeSpotter

Visualizes input, processing, and output stages involved in deepfake detection

Clarifies interactions between users, the CNN-RNN model, and storage systems
(e.g., feature vectors, logs)DFD – Level 1 (Simplified Flow)
Process Flow Summary:

1. User Uploads Video – Interface accepts .mp4 or similar formats.

2. Preprocessing – The video is broken into frames; faces are detected and aligned.

3. CNN Feature Extraction – Each frame is passed through the VGG-16 network to

extract spatial features.

4. LSTM Temporal Analysis – Sequences of CNN features are analyzed to detect

temporal inconsistencies.

5. Classification – Frame-level predictions are aggregated to classify the video as Real or

6. Fake.Output Display – The result is shown to the user with an option to download a

report

6.3.2 Use Case Diagram

The Use Case Diagram outlines the functional requirements of the FakeSpotter system by

identifying the primary actors (Admin and User) and their interactions with the system’s core

modules.
Key Elements:

Actors:

User – A person who uploads video content for verification.

Admin – The system administrator who monitors operations and manages datasets,

models, and system parameters.


Use Cases:

Actor Use Cases

- Upload Video

User - View Detection Result

- Download Report

- Login to Admin Panel

- Configure Detection Model


Admin
- Monitor Detection Logs

- Manage Dataset and System Settings

Relationships and Description:

The User interacts with the system through the Upload Video use case, triggering

preprocessing, model analysis, and result display.

The Admin accesses the Admin Panel, which provides tools to update the detection model,

review classification logs, and manage stored videos or results.

Use cases are connected to their respective actors with association lines (as per UML

standards), forming a visual relationship between system functionality and the roles that

access them.
Diagram Summary :

Admin

Manage Dataset [Configure Model] [View Logs]

User

Upload Video [View Result] [Download Report]

This diagram highlights the high-level functionalities available to each system actor and

shows how they interact with FakeSpotter's core detection pipeline.

6.3.3 Activity Diagram

The Activity Diagram outlines the workflow involved in the core functionality of the

FakeSpotter system—specifically the process of detecting deepfakes in user-submitted video

content. This diagram visually represents the sequence of activities performed, from video

upload to final classification and result display.

This Activity Diagram helps visualize how the FakeSpotter system orchestrates AI

components (CNN + LSTM) and backend services to deliver a reliable deepfake detection

result. It aids developers and stakeholders in understanding both user interactions and system

operations.
Components:

 Initial node (start point)

 Activities (user actions or system processes)

 Decision points (conditional branches)

 Final node (completion of the workflow)


6.3.4 Sequence Diagram

The Sequence Diagram outlines the chronological flow of interactions among various
components in the FakeSpotter system. It visualizes how the user interacts with the system
and how internal modules such as the backend server, AI model, and database work together
to process deepfake detection tasks.

Depictions:

Lifelines:

User Interface (UI) – Where the user uploads the video and views results.

Backend Server (API Layer) – Manages requests and coordinates between modules.

AI Model (CNN + LSTM) – Handles feature extraction and classification.

Database/Storage – Stores videos, processed frames, logs, and prediction outputs.


Fig.6.3.4 Sequence Diagram
6.3.5 Class Diagram

The Class Diagram represents the static structure of the FakeSpotter system by outlining the

key classes, their attributes, methods, and the relationships among them. This diagram forms

the backbone of the object-oriented architecture and illustrates how different system

components interact during the deepfake detection process.

Key Points:

1.Defines core classes such as: User, Video, FrameProcessor, FeatureExtractor,

SequenceAnalyzer, PredictionEngine, and Result.

2.Attributes and methods are described for each class, helping developers understand

system logic and data flow.

3.Relationships (associations, aggregations, inheritance) clarify how objects

communicate and depend on each other.

Main Classes and Relationships:

1.User:

 Attributes: userID, username, email

 Methods: uploadVideo(), viewResult()

2.Admin (inherits from User):

 Attributes: adminPrivileges

 Methods: manageUsers(), manageModel(), viewLogs()


3.Video:

 Attributes: videoID, filePath, uploadDate, status

 Methods: extractFrames(), getMetadata()

4.FrameProcessor:

 Attributes: frameList[], faceCoordinates

 Methods: detectFace(), alignFace(), resizeFrame()

5.FeatureExtractor (CNN Module):

 Attributes: modelName, featureVector[]

 Methods: extractFeatures(frame)

6.SequenceAnalyzer (LSTM Module):

 Attributes: sequenceLength, lstmUnits

 Methods: analyzeTemporalPattern(sequence), classify()

7.PredictionEngine:

 Attributes: confidenceScore, predictionLabel

 Methods: aggregatePredictions(), generateReport()

8.Result:

 Attributes: resultID, videoID, userID, label, score

 Methods: exportPDF(), displayResult()


Relationships:

1. A User can upload multiple Videos.

2. A Video is linked to multiple Frames, which are processed by the FrameProcessor.

3. FrameProcessor interacts with the FeatureExtractor (CNN).

4. FeatureExtractor passes data to SequenceAnalyzer (LSTM).

5. SequenceAnalyzer provides output to PredictionEngine, which generates the final Result.

6. Admin inherits from User and manages system operations and results.

6.3.6 Project Structure


A project developed in Visual Studio Code (VS Code) benefits significantly from a modular,
organized directory structure. For FakeSpotter, which integrates deep learning models with
frontend and backend components, a clean project layout is critical to support:

1. Scalability
2. Maintainability
3. Efficient collaboration
4. Seamless development-to-deployment transition
5. The FakeSpotter project, built using React.js (frontend) and Python (Flask/TensorFlow
backend), adheres to a well-structured format for AI-powered applications.
PROJECT FILES AND SETUP OVERVIEW

PROJECT FILE AND DIRECTORY STRUCTURE

The FakeSpotter project follows a modular directory structure that separates concerns
between the frontend interface and backend deep learning functionality. The system
leverages modern web development frameworks alongside AI libraries to deliver a seamless
deepfake detection platform.

Key Directories and Files:

node_modules/Contains all Node.js packages used for both frontend and backend
operations. Dependencies include:

 React.js for the UI


 Express.js for backend API services
 TensorFlow.js / TensorFlow (Python) for model implementation
 OpenCV / OpenCV.js for video and frame processing
 Managed automatically via package.json.

public/Holds static files like:

 index.html (main HTML template)


 Favicon and manifest files
 Assets such as logos or images
src/The primary source code directory with the following subfolders:

1. components/ – Reusable React components (e.g., video uploader, result cards, modals)
2. hooks/ – Custom React hooks for managing state (e.g., video upload, model inference)
3. integrations/ – API logic for integrating TensorFlow or Python-backend inference
engines
4. lib/ – Helper functions for video splitting, face alignment, and feature formatting
5. pages/ – Core routes like Home, Upload, Result Dashboard
6. types/ – TypeScript interfaces for consistent data types across the app
7. App.tsx / App.css – Root application logic and styling
8. main.tsx – React app entry point

models/Contains trained model files (.h5, .pb) exported from TensorFlow/Keras. May
include:

 CNN (VGG-16) pretrained layers


 LSTM sequence classifier
 Model architecture and weights

backend/Optional Python or Node.js backend API endpoints for:

 Video preprocessing (OpenCV, dlib)


 Inference handling (via Flask or FastAPI)
 File uploads and logging

supabase

If used, this contains configurations for Supabase authentication and storage (can be replaced
with Firebase, MongoDB, or local DB).
Other Core Files:

 .gitignore – Ignores node_modules, model weights, and environment configs


 bun.lockb – Dependency lock file (if Bun runtime used)
 components.json – Metadata for dynamic UI generation
 eslint.config.js – Code quality enforcement
 README.md – Project setup, instructions, and usage documentation
 vite.config.ts – Configuration for Vite frontend bundler
 tsconfig.json – TypeScript compiler settings
 tailwind.config.ts – TailwindCSS design customizations

BUILD AND SETUP


Frontend
Built with React + Vite for rapid development and hot reload

TailwindCSS for design


Run using:

npm install
npm run dev

Backend / AI Model
TensorFlow model can run as:
In-browser via TensorFlow.js
Python backend via Flask/FastAPI
Pre-trained weights for VGG-16 + LSTM loaded at runtime

Setup:

pip install -r requirements.txt


python app.py
Deployment
Frontend: Deploy using Vercel, Netlify, or Firebase Hosting
Backend/Model: Host with Render, Heroku, or AWS EC2

DEVELOPMENT TOOLS – VS CODE

Visual Studio Code (VS Code) is the primary development environment used throughout the
FakeSpotter project.

Features of VS Code:

 Lightweight & Fast: Minimal resource use during development


 Extensible: Rich marketplace for Python, React, and TensorFlow plugins
 Integrated Git: Full version control within the editor
 IntelliSense: Context-aware code completion and tooltips
 Debugging: Set breakpoints and inspect variable states in both Python and JavaScript
 Cross-Platform: Available on Windows, macOS, and Linux
 Integrated Terminal: Run scripts and install dependencies inside VS Code
 Highly Customizable: Themes, layouts, and settings are all user-friendly
 Python Integration: Seamless support for Jupyter Notebooks and script debugging
 Large Community Support: Massive documentation, tutorials, and third-party tools
CHAPTER 7

IMPLEMENTATION

7.1 DATA ANALYSIS

7.1.1 Exploring Video Frame Patterns

In the initial phase of implementation, the focus was on analyzing the structure and patterns

within deepfake and real video samples. Frame-level inconsistencies were key indicators of

manipulation.

The analysis involved:

 Studying eye blinking patterns, lip synchronization, and facial distortions over time.

 Identifying pixel-level artifacts in individual frames, such as color mismatches, edge

blurs, and lighting discrepancies.

 Measuring temporal inconsistencies, such as irregular transitions or unnatural motion

between frames.

 This exploratory analysis informed the feature extraction process and helped refine the

CNN-RNN architecture for optimal deepfake detection.


7.1.2 Training Dataset Acquisition

The performance of the FakeSpotter system heavily relies on the quality of the training

dataset and its ability to reflect real-world deepfake characteristics.

We used the Celeb-DF (v2) dataset, a large-scale and high-quality deepfake dataset that

includes:

Real videos of celebrities speaking in interview settings.

Corresponding deepfake videos generated using face-swapping and expression manipulation

techniques.

Metadata such as frame count, resolution, and generation method.

The dataset was chosen for its:

High realism (minimal visual artifacts).

Rich diversity in facial expressions, angles, and lighting conditions.

Relevance to current state-of-the-art deepfake generation methods.


7.2 DATA PRE-PROCESSING

1. Before model training, the following preprocessing pipeline was applied:

2. Frame Extraction: Each video was decomposed into a sequence of frames at a fixed

frame rate (e.g., 10 fps).

3. Face Detection and Alignment: Using dlib and OpenCV, faces were detected in each

frame and aligned to a standard orientation.

4. Image Normalization: Pixel values were scaled between 0 and 1 to ensure model

convergence during training.

5. Resizing: All frames were resized to 64x64 pixels for compatibility with the CNN input

layer.

6. Label Encoding: Videos were labeled as real (0) or fake (1) for supervised training.

7. Sequence Generation: Frame sequences of length 10 were created for temporal modeling

by LSTM.

8. Train-Test Split: Data was split into 80% training and 20% testing sets to validate

generalization.
7.3 MACHINE LEARNING APPROACH

7.3.1 CNN Feature Extraction (VGG-16)

A Convolutional Neural Network (CNN) based on VGG-16 was used to extract spatial

features from individual video frames.

 Input: Aligned face frames of size 64x64x3.

 Model: Pre-trained VGG-16 layers were used with frozen weights (transfer learning).

 Output: 512-dimensional feature vectors representing spatial characteristics of each

frame.

 Implementation: TensorFlow and Keras were used to load VGG-16 and extract

intermediate layer outputs.

7.3.2 LSTM Sequence Modeling

To analyze temporal dependencies, we used Long Short-Term Memory (LSTM) networks on

top of CNN-extracted features.

Input: Sequences of CNN feature vectors (one sequence = 10 frames).

Architecture:

 LSTM Layer with 64 units

 Dense Layer with ReLU activation

 Output Layer with Sigmoid activation for binary classification

 Loss Function: Binary Cross-Entropy

 Optimizer: Adam (learning rate = 0.0001)


7.3.3 Implementation Summary

Frameworks Used: TensorFlow, Keras, OpenCV

Model Structure:

python

model = Sequential([

TimeDistributed(Conv2D(32, (3, 3), activation='relu'), input_shape=(10, 64, 64, 3)),

TimeDistributed(MaxPooling2D((2, 2))),

TimeDistributed(Flatten()),

LSTM(64),

Dense(64, activation='relu'),

Dense(1, activation='sigmoid')

])

Training: Conducted over 10 epochs with a batch size of 8.

Output: Each video sequence is classified as real (0) or fake (1) based on learned

spatiotemporal patterns.
IMPLEMENTATION PROCESS
CHAPTER 8
FEASIBILITY STUDY AND SYSTEM TESTING

8.1 FEASIBILITY STUDY

Feasibility studies are essential to assess whether a project can be successfully implemented,
both technically and economically, while ensuring user satisfaction and long-term
sustainability. For FakeSpotter, the feasibility study was conducted to evaluate cost-
effectiveness, technical readiness, and operational practicality.

The study is categorized into three main dimensions:

 Economic Feasibility
 Technical Feasibility
 Operational Feasibility

8.1.1 ECONOMIC FEASIBILITY

The FakeSpotter project emphasizes low-cost yet efficient deployment. By leveraging


open-source frameworks (TensorFlow, OpenCV, Flask, Keras) and cloud-hosted datasets
(e.g., Celeb-DF), the system achieves accurate deepfake detection without high operational
expenses.

Deployment via cloud services (e.g., Render, Heroku) and GPU-enabled local systems
ensures minimal infrastructure investment while maintaining high detection accuracy and
scalability. Overall, the project is economically feasible for academic, forensic, or content
verification use cases.
8.1.2 TECHNICAL FEASIBILITY

From a technical perspective, FakeSpotter is designed using reliable and proven components:

1.CNN-RNN architecture using VGG-16 and LSTM

2.Python backend with TensorFlow for model inference

3.OpenCV for real-time frame extraction and facial preprocessing

4.Optional React frontend for user-friendly video upload and result display

All tools are compatible with standard computing environments and require no proprietary
hardware or software, making the project technically sound and deployable on multiple
platforms.

8.1.3 OPERATIONAL FEASIBILITY

The FakeSpotter system is intuitive and simple to operate. Users upload a video, and the
system processes it to return a deepfake classification along with a confidence score. The
web interface (if implemented) provides a clean dashboard, while the backend automates
feature extraction, sequence analysis, and decision-making.

Clear system flow, lightweight infrastructure, and visual feedback make it suitable for users
in academic research, media authentication, and cyber forensics with minimal technical
training required.
8.2 SYSTEM TESTING

Testing ensures the reliability, functionality, and usability of FakeSpotter under various
conditions. Multiple testing methodologies were employed to validate different system
components.

8.2.1 VARIOUS LEVELS OF TESTING

1.White Box Testing

2.Black Box Testing

3.Unit Testing

4.Functional Testing

5.Performance Testing

6.Integration Testing

7.Validation Testing

8.System Testing

9.Output Testing

10.User Acceptance Testing

8.2.1.1 WHITE BOX TESTING

Internal functions of the CNN-LSTM model pipeline and API routes were tested to ensure
proper feature transformation, LSTM sequence handling, and frame classification logic.
Python unit tests validated data preprocessing, model input/output, and intermediate layers.
8.2.1.2 BLACK BOX TESTING

Tested the system from the user's perspective: videos were uploaded, and outputs were
validated without inspecting internal operations. Verified that real and fake videos were
correctly classified based on system predictions.

8.2.1.3 UNIT TESTING

Each module—video preprocessor, frame extractor, face aligner, CNN feature encoder, and
LSTM classifier—was tested independently to verify correctness using mock data.

8.2.1.4 FUNCTIONAL TESTING

Validated core functionalities such as:

1.Uploading video files

2.Extracting and processing frames

3.Generating classification results

4.Displaying prediction confidence

8.2.1.5 PERFORMANCE TESTING

Stress-tested the system with videos of varying lengths, resolutions, and face complexities.
Measured latency from upload to result generation and ensured real-time feasibility for small
to medium videos.
8.2.1.6 INTEGRATION TESTING

Ensured smooth interaction between:

1.Frontend uploader (React)

2.Backend model API (Flask or Node)

3.File handler

4.Face detection and CNN-RNN model components

8.2.1.7 VALIDATION TESTING

Compared system outputs with labeled test samples from the Celeb-DF dataset to verify
detection accuracy, precision, and recall. Confirmed the model met project goals and dataset
benchmarks.

8.2.1.8 SYSTEM TESTING

Conducted on the full end-to-end system:

1.Video upload

2.Frame extraction

3.Model inference

4.Result display

Tested cross-browser compatibility and device responsiveness for the frontend (if deployed).
8.2.1.9 OUTPUT TESTING

Verified that the output classification (Real/Fake) and confidence scores were accurate and
interpretable by end users. Output reports matched frame analysis consistency.

8.2.1.10 USER ACCEPTANCE TESTING

Pilot testing was conducted with sample users (e.g., digital forensics researchers, students).
Feedback confirmed the system's ease of use, effectiveness, and potential for further
improvement with larger datasets.
CHAPTER 9

RESULTS

OUTPUTS:
IF IT’S REAL:
IF IT’S FAKE:

CHAPTER 10
CONCLUSION

The FakeSpotter framework marks a significant advancement in the field of digital media

forensics and deepfake detection. By integrating Convolutional Neural Networks (CNNs)

with Recurrent Neural Networks (RNNs)—specifically VGG-16 for spatial feature

extraction and LSTM for temporal analysis—FakeSpotter provides a powerful hybrid

architecture capable of detecting deepfakes with high accuracy and reliability.

The system effectively captures frame-level artifacts and sequence-based inconsistencies that

are typically overlooked by conventional single-model approaches. Trained on the Celeb-DF

dataset, the model demonstrates strong generalization to real-world deepfake scenarios and is

capable of identifying manipulations involving facial distortions, unnatural motion patterns,

and blinking anomalies.

Through comprehensive testing—including unit, system, integration, and performance

evaluations—FakeSpotter has proven to be a robust solution for real-time and batch

deepfake classification. It offers a scalable, low-cost, and effective framework for

applications in digital forensics, media integrity validation, academic research, and public

awareness campaigns.
Future Enhancements

To further improve detection capabilities and expand the system's impact, future work on

FakeSpotter may include:

Integration of Transformer-based models for deeper sequence-level understanding and

improved generalization across manipulation techniques.

Real-time inference on edge devices using model quantization and ONNX conversion for

lightweight deployment.

Extension to audio-visual deepfake detection, enabling multi-modal forensics.

Web-based and mobile frontends for public or institutional use in content verification

workflows.

Continuous learning framework, allowing the model to adapt to newly emerging deepfake

techniques in the wild.

With these upgrades, FakeSpotter aspires to become a leading tool in the global effort to

safeguard digital authenticity, combat synthetic media threats, and promote ethical AI usage.

SOURCE CODE:
Main.py:

from flask import Flask, render_template, request, redirect, url_for

import os

import cv2

import numpy as np

from tensorflow.keras.models import load_model

from werkzeug.utils import secure_filename

app = Flask(__name__)

app.config['UPLOAD_FOLDER'] = 'static/uploads'

os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

model = load_model("model/deepfake_mobilenet.h5")

def predict_video(video_path):

cap = cv2.VideoCapture(video_path)

ret, frame = cap.read()

cap.release()

if not ret:

return "Could not read video"


frame = cv2.resize(frame, (128, 128))

frame = np.expand_dims(frame, axis=0) / 255.0

prediction = model.predict(frame)[0]

return "Real" if np.argmax(prediction) == 0 else "Fake"

@app.route('/')

def index():

return render_template('index.html')

@app.route('/predict', methods=['GET', 'POST'])

def predict():

result = None

video_filename = None

if request.method == 'POST':

file = request.files['video']

if file and file.filename.lower().endswith('.mp4'):

filename = "test.mp4"

video_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)

file.save(video_path)

result = predict_video(video_path)

video_filename = filename

return render_template('predict.html', result=result, video=video_filename)

if __name__ == '__main__':

app.run(debug=True)

Cnn_Mobilenet.py:
import os

import cv2

import numpy as np

from tensorflow.keras.models import Model

from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

from tensorflow.keras.applications import MobileNetV2

from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

from tensorflow.keras.utils import to_categorical

from sklearn.model_selection import train_test_split

# Parameters

IMG_SIZE = 224

DATASET_PATH = "dataset"

CATEGORIES = ['real', 'fake']

# Load one frame per video

def load_data():

data = []

for label, category in enumerate(CATEGORIES):

folder = os.path.join(DATASET_PATH, category)

if not os.path.exists(folder):

continue

for video in os.listdir(folder):

video_path = os.path.join(folder, video)


cap = cv2.VideoCapture(video_path)

ret, frame = cap.read()

if ret:

frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))

data.append([frame, label])

cap.release()

return data

# Load and preprocess data

print("[INFO] Loading data...")

data = load_data()

X = np.array([i[0] for i in data])

X = preprocess_input(X) # MobileNetV2-specific preprocessing

y = to_categorical([i[1] for i in data], num_classes=2)

# Split into train/test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build MobileNetV2 model

print("[INFO] Building model...")

base_model = MobileNetV2(weights='imagenet', include_top=False,

input_shape=(IMG_SIZE, IMG_SIZE, 3))

base_model.trainable = False # Freeze base model


x = base_model.output

x = GlobalAveragePooling2D()(x)

x = Dense(128, activation='relu')(x)

x = Dropout(0.5)(x)

predictions = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

# Compile the model

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model

print("[INFO] Training model...")

model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

# Save the model

os.makedirs("model", exist_ok=True)

model.save("model/deepfake.h5")

print("[INFO] Model saved to model/deepfake_mobilenet.h5")

INDEX.HTML:
<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="UTF-8">

<title>Home | Deepfake Detector</title>

<style>

body {

font-family: 'Segoe UI', sans-serif;

margin: 0;

padding: 0;

background: #317180;

color: #333;

nav {

background-color: #1a1a1a;

color: white;

padding: 15px 20px;

display: flex;

justify-content: space-between;

align-items: center;

nav h2 {

margin: 0;

}
nav a {

color: white;

text-decoration: none;

margin-left: 20px;

font-weight: bold;

.container {

display: flex;

flex-wrap: wrap;

max-width: 1100px;

margin: 40px auto;

background: #f2f7fb;

box-shadow: 0 0 10px rgba(0,0,0,0.1);

border-radius: 8px;

overflow: hidden;

.text-section, .image-section {

flex: 1;

padding: 40px;

.text-section {

min-width: 300px;

.text-section h1 {
color: #004080;

.text-section p {

font-size: 18px;

line-height: 1.6;

.btn {

margin-top: 30px;

display: inline-block;

padding: 12px 20px;

font-size: 16px;

background-color: #004080;

color: white;

text-decoration: none;

border-radius: 5px;

.btn:hover {

background-color: #0059b3;

/* Slideshow styling */

.slideshow-container {

position: relative;

max-width: 100%;
height: 300px;

margin: auto;

overflow: hidden;

.slide {

display: none;

width: 100%;

height: 300px;

.slide img {

width: 100%;

height: 100%;

object-fit: cover;

border-radius: 8px;

/* Next/prev buttons */

.prev, .next {

cursor: pointer;

position: absolute;

top: 50%;

width: auto;
padding: 16px;

margin-top: -22px;

color: white;

font-weight: bold;

font-size: 18px;

transition: 0.6s ease;

border-radius: 0 3px 3px 0;

user-select: none;

background: rgba(0,0,0,0.5);

.next {

right: 0;

border-radius: 3px 0 0 3px;

.prev:hover, .next:hover {

background-color: rgba(0,0,0,0.8);

</style>

</head>

<body>

<nav>

<h2>Deepfake Detector</h2>
<div>

<a href="{{ url_for('index') }}">Home</a>

<a href="{{ url_for('predict') }}">Detection</a>

</div>

</nav>

<div class="container">

<div class="text-section">

<h1>What is a Deepfake?</h1>

<p>

Deepfakes are synthetic media created using artificial intelligence, where a

person's face, voice, or actions can be convincingly altered.

These manipulations use deep learning (GANs) and are used in entertainment, but

also pose risks like misinformation, identity fraud, and privacy invasion.

</p>

<a href="{{ url_for('predict') }}" class="btn">Check Video</a>

</div>

<div class="image-section">

<div class="slideshow-container">

<div class="slide"><img src="{{ url_for('static', filename='images/img1.jpg') }}"

alt="Slide 1"></div>
<div class="slide"><img src="{{ url_for('static', filename='images/img2.jpg') }}"

alt="Slide 2"></div>

<div class="slide"><img src= "{{ url_for('static', filename='images/img3.jpg') }}"

alt="Slide 3"></div>

<div class="slide"><img src= "{{ url_for('static', filename='images/img4.jpg') }}"

alt="Slide 4"></div>

<div class="slide"><img src= "{{ url_for('static', filename='images/img5.png') }}"

alt="Slide 5"></div>

<a class="prev" onclick="changeSlide(-1)">❮</a>

<a class="next" onclick="changeSlide(1)">❯</a>

</div>

</div>

</div>

<script>

let slideIndex = 0;

showSlides();

function showSlides() {
let slides = document.getElementsByClassName("slide");

for (let i = 0; i < slides.length; i++) {

slides[i].style.display = "none";

slideIndex++;

if (slideIndex > slides.length) { slideIndex = 1 }

slides[slideIndex - 1].style.display = "block";

setTimeout(showSlides, 4000); // Change image every 4 seconds

function changeSlide(n) {

slideIndex += n - 1;

showSlides();

</script>

</body>

</html>

PREDICT.HTML:

<!DOCTYPE html>
<html lang="en">

<head>

<meta charset="UTF-8">

<title>Upload Video - Deepfake Detector</title>

<style>

body {

font-family: 'Segoe UI', sans-serif;

margin: 0;

padding: 0;

background-color: #317180;

color: #333;

nav {

background-color: #1a1a1a;

color: white;

padding: 15px 20px;

display: flex;

justify-content: space-between;

align-items: center;

nav h2 {

margin: 0;
}

nav a {

color: white;

text-decoration: none;

margin-left: 20px;

font-weight: bold;

.container {

max-width: 600px;

margin: 40px auto;

background-color: #f2f7fb;

padding: 40px;

border-radius: 8px;

box-shadow: 0 0 15px rgba(0,0,0,0.2);

text-align: center;

h1 {

color: #004080;

input[type="file"] {
padding: 10px;

margin-top: 20px;

margin-bottom: 20px;

font-size: 16px;

button {

padding: 12px 20px;

font-size: 16px;

background-color: #004080;

color: white;

border: none;

border-radius: 5px;

cursor: pointer;

button:hover {

background-color: #0059b3;

video {

margin-top: 20px;
border-radius: 8px;

box-shadow: 0 0 10px rgba(0,0,0,0.2);

h2 {

margin-top: 20px;

color: white;

</style>

</head>

<body>

<nav>

<h2>Deepfake Detector</h2>

<div>

<a href="{{ url_for('index') }}">Home</a>

<a href="{{ url_for('predict') }}">Detection</a>

</div>

</nav>

<div class="container">

<h1>Upload a Video</h1>

<form method="post" enctype="multipart/form-data">

<input type="file" name="video" accept="video/mp4" required><br>


<button type="submit">Upload & Predict</button>

</form>

{% if video %}

<h3>Uploaded Video:</h3>

<video width="100%" controls preload="auto">

<source src="{{ url_for('static', filename='uploads/test.mp4') }}">

Your browser does not support the video tag.

</video>

{% endif %}

{% if result == "Real" %}

<h2 style="color: green;">✅ Real </h2>

{% elif result == "Fake" %}

<h2 style="color: red;">❌ Fake ✋🚫</h2>

{% endif %}

</div>

</body>

</html>

REFERENCES
1.Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-DF: A Large-Scale Challenging Dataset

for DeepFake Forensics,” Proceedings of the IEEE/CVF Conference on Computer Vision

and Pattern Recognition (CVPR), 2020, pp. 3207–3216.

2.S. M. Korshunov and T. Ebrahimi, “DeepFakes: A New Threat to Face Recognition?

Assessment and Detection,” arXiv preprint arXiv:1812.08685, 2018.

3.T. Nguyen, C. Nguyen, D. Nguyen, D. Chu, and K. Nguyen, “Deep Learning for

Deepfakes Creation and Detection: A Survey,” Computers & Security, vol. 102, 2021, doi:

10.1016/j.cose.2020.102109.

4.J. Agarwal, R. Farid, “Protecting World Leaders Against Deep Fakes,” Proceedings of the

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),

2019.

5.K. Afchar, W. Nozick, J. Yamagishi, and I. Echizen, “MesoNet: A Compact Facial Video

Forgery Detection Network,” 2018 IEEE International Workshop on Information Forensics

and Security (WIFS), pp. 1–7, doi: 10.1109/WIFS.2018.8630761.

6. A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner,

“FaceForensics++: Learning to Detect Manipulated Facial Images,” 2019 IEEE/CVF

International Conference on Computer Vision (ICCV), pp. 1–11, doi:


10.1109/ICCV.2019.00010.

7.A. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative Adversarial Nets,”

Advances in Neural Information Processing Systems (NeurIPS), 2014.

S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9,

no. 8, pp. 1735–1780, 1997.

You might also like