
DEEPFAKE VIDEO DETECTION SYSTEM USING

DEEP LEARNING

Done By:
J. Dhanush
C. Arjoun Ram
TABLE OF CONTENTS

EXP.NO. TITLE PAGE NO.

ABSTRACT 1
1 SOFTWARE REQUIREMENTS SPECIFICATIONS 2
1.1 Introduction 2
1.1.1 Purpose 2
1.1.2 Intended Audience 2
1.2. Overall Description 2
1.2.1 Product Perspective 2
1.3. Scope 2
1.3.1 System Scope 2
1.3.2 Limitations 3
1.4. Abstract 3
1.5. Objectives 4
1.6. Functional Requirements 4
1.7. Non-Functional Requirements 5
1.8. High Level System Architecture 5
1.8.1 Explanation of System Architecture Components 5
1.9. Data Flow Diagram 11
1.10. Use Case Diagram 12
1.11. Cost Estimation and Time Scheduling 12
1.11.1 Time Scheduling 12
1.11.2 Cost Estimation 12
2 SOFTWARE DESIGN SPECIFICATION 13
2.1 Introduction 13
2.1.1 Problem Statement 13
2.2 Software Engineering Methodology 13
2.2.1 Agile Model 13
2.2.2 Phases of Development 13
2.3 Decomposition of Modules 16
2.3.1 Video Upload & Preprocessing Module 16
2.3.2 Dataset Handling Module 17
2.3.3 Deepfake Detection Model 18
2.3.4 Model Training and Evaluation Module 19
2.3.5 Prediction Module 20
2.3.6 Verification and Testing 21
2.3.7 User Interface 22
2.4 Verification Testing from Activity to Module Level 23
2.4.1 Levels of Verification Testing 23
2.4.2 Detailed Breakdown of Verification Testing 24
2.5 Conclusion 26
3 SOFTWARE TESTING 27
3.1 Test Plan 27
3.1.1 Test Strategy 27
3.1.2 Test Environment 29
3.1.3 Test Deliverables 30
3.1.4 Resources & Responsibilities 30
3.2 Test Design and Coverage Analysis 31
3.2.1 Test Cases Design 31
3.2.2 Test Coverage Analysis 32
3.2.3 Test Objectives & Criteria 32
3.3 Test Schedule and Estimations 33
3.3.1 Test Case Schedule 33
3.3.2 Test Case Estimations 33
3.4 Test Criteria 33
3.4.1 Functional Criteria 33
3.4.2 Non-Functional Criteria 34
3.5 Test Strategy 34
3.5.1 Test Levels 34
3.5.2 Testing Tools and Technologies 35
3.5.3 Testing Approach 36
3.6 Test Objectives 36
3.7 Plan Test Environment 37
3.8 Unit Testing 39
3.8.1 Unit Test Cases 39
3.8.2 Unit Test Results 39
3.8.3 Test Execution 40
3.9 Integrated Testing 40
3.9.1 Integration Test Cases 40
3.9.2 Integration Test Results 41
3.9.3 Test Execution 47
3.10 Validation Testing 47
3.10.1 Functional Validation 47
3.10.2 Non-Functional Validation 48
3.11 Defect Management 49
3.11.1 Identified Issues 49
3.11.2 Resolution Status 49
3.12 Test Deliverables 50
3.13 Conclusion 51
Glossary 52
References 53
Appendices 55
DEEPFAKE VIDEO DETECTION SYSTEM USING DEEP LEARNING
ABSTRACT
This project focuses on developing an accurate Deepfake Video Detection System using
advanced deep learning techniques to distinguish between real and manipulated videos. The
system accepts input videos in formats such as MP4, AVI, and MOV and operates in two primary
modes: training and prediction. In the training mode, it adds new videos to the dataset, while in
prediction mode, it analyzes a video to determine its authenticity. The input videos are
preprocessed through systematic steps including frame extraction, face detection using MTCNN
or Haar cascades, face cropping and alignment, and storing the cleaned data for further use. The
architectural overview of the system is depicted in Figure 1.1.
The dataset is composed of real videos (authentic recordings) and fake videos (AI-
manipulated through tools like FaceSwap and DeepFaceLab), sourced from public datasets such
as FaceForensics++, DFDC, and Celeb-DF, along with custom-collected samples for specialized
cases. After preprocessing, the data is split into training, validation, and testing sets to ensure the
model’s generalization to unseen examples. The processed frames are efficiently fed to the model
using data generators with augmentations to improve robustness.
The core of the detection model comprises the Xception architecture, which is a deep
convolutional neural network based on depthwise separable convolutions, known for its high
accuracy and computational efficiency in image classification tasks. This model is trained to detect
subtle artifacts in facial regions across individual frames, without relying on sequential modeling.
The classification head with a sigmoid activation function outputs whether a given frame is real or
fake. A majority-vote mechanism is applied across frames to classify the entire video. The detailed
data flow is illustrated in Figure 1.2(a) and Figure 1.2(b), showing the training and prediction
pipelines respectively.
The deepfake detection model (deepfake_detection_xception_180k_14epochs.h5)
achieves over 95% accuracy on benchmark datasets, as measured by metrics such as accuracy,
precision, recall, AUC-ROC, and the confusion matrix. It is lightweight, suitable for real-time use, and
can be optimized for deployment via ONNX on platforms like AWS SageMaker or edge devices.
The system offers frame-level visual explanations for transparency. Future work includes
improving detection of occluded faces and ultra-realistic GAN deepfakes using audio-visual fusion
and self-adaptive learning.

1. SOFTWARE REQUIREMENTS SPECIFICATIONS
1.1. INTRODUCTION
1.1.1 Purpose
The Deepfake Detection System is designed to identify manipulated images and videos
generated using AI techniques such as Generative Adversarial Networks (GANs). With the
increasing misuse of deepfakes in media, fraud, and misinformation, this system aims to provide
a reliable and automated solution for deepfake detection. The system leverages the Xception
architecture, a deep convolutional neural network known for its high performance in image
classification tasks, to extract spatial features from facial regions in video frames. By analyzing
frame-level inconsistencies and artifacts introduced during deepfake generation, the system
ensures high accuracy in identifying synthetic content.
1.1.2 Intended Audience
This document is intended for developers, researchers, cybersecurity professionals, and
stakeholders interested in implementing or using a deepfake detection system. It may also benefit
academic audiences and machine learning practitioners exploring real-world applications of deep
learning in multimedia forensics.
1.2. OVERALL DESCRIPTION
1.2.1 Product Perspective
This project falls under the domain of AI-based media forensics. Unlike traditional forensic
techniques that rely on metadata analysis or heuristic-based methods, this system employs deep
learning to identify manipulated facial features and artifacts in digital video content. Specifically,
it utilizes the Xception convolutional neural network, which is optimized for high-accuracy image
classification and excels at detecting subtle spatial inconsistencies common in deepfake videos.
The system is designed to function both as a standalone web application and as a modular
component that can be integrated into larger media verification or content authenticity platforms.
1.3. SCOPE
1.3.1 System Scope
The Deepfake Detection System provides the following functionalities:
 Video Analysis: Users can upload videos in formats such as MP4, AVI, or MOV to
check for deepfake manipulation.

 Deepfake Detection: The system utilizes the Xception deep convolutional neural
network for extracting spatial features from facial regions in video frames, enabling
the detection of synthetic content through image-level analysis.
 Confidence Scoring: Each analysis result includes a confidence score indicating the
likelihood of the media being fake, based on aggregated frame-level predictions.
 Frame Visualization: The system extracts key frames from the video and highlights
those with detected anomalies, providing visual insight into the deepfake regions.
 Report Generation: Users can download detailed analysis reports that include per-
frame predictions, overall video classification, and confidence scores with visual
annotations.
 API Integration: The system offers RESTful API endpoints for integration into third-
party applications or digital content verification platforms.
1.3.2 Limitations
 Real-time Deepfake Prevention: The system is not designed for real-time deepfake
prevention, as it relies on batch processing for video analysis, which may introduce
delays.
 Adaptation to New Deepfake Techniques: As deepfake generation techniques
continue to evolve, the system may require periodic retraining or updates to maintain
its effectiveness against emerging manipulation methods.
 Media Quality and Size: The performance of the system is sensitive to the quality and
resolution of the uploaded media. Low-quality or compressed videos may lead to
reduced accuracy in detecting deepfakes.
1.4. ABSTRACT
The deepfake detection system is a web-based application that leverages advanced deep
learning models to detect manipulated media. It employs the Xception convolutional neural
network for spatial feature extraction, known for its high accuracy in image classification, and
performs detailed analysis to identify synthetic content in both images and videos. The system
provides a user-friendly interface for uploading media, displays analysis results with confidence
scores, and generates detailed reports that include visualizations of detected anomalies. This tool
is designed to assist cybersecurity experts, media professionals, and the general public in
identifying and mitigating the risks associated with deepfake technology.

1.5. OBJECTIVES
The primary objectives of the system are:
 High Accuracy Detection: Develop an AI-powered system capable of detecting deepfakes
with high accuracy using the Xception convolutional neural network for spatial feature
extraction and deep analysis of synthetic content in images and videos.
 User-Friendly Interface: Provide a simple and intuitive web interface for users to easily
upload and analyze media files, ensuring accessibility for both technical and non-technical
users.
 Efficient Processing: Ensure fast processing times, with results delivered within seconds
for images and within minutes for videos, without compromising accuracy.
 Scalability: Design the system to handle high user traffic and large media files efficiently,
allowing for smooth operation even during peak usage times.
 Security: Implement robust security measures to protect user data and prevent adversarial
attacks, ensuring the integrity of both the system and the analyzed media.
 Detailed Reporting: Generate comprehensive reports that include confidence scores,
visualizations of detected anomalies, and analysis summaries to provide actionable insights
for the user.
1.6 FUNCTIONAL REQUIREMENTS
 Allow Users to Upload Media:
Users can upload image and video files (JPEG, PNG, MP4). The system will validate file
type and size before processing.
 Process Media Files:
The system will extract frames from videos and preprocess them to fit the Xception
model's input requirements, including face detection and alignment.
 Detect Deepfakes:
The Xception model will analyze the media for inconsistencies, classifying it as Real or
Fake based on spatial features.
 Display Results:
The system will show a confidence score for deepfake detection and highlight sample
frames with detected anomalies.

 Generate Reports:
Users can download PDF reports with analysis results, confidence scores, visualizations,
and summaries.
 Provide API Access:
The system will offer REST APIs for media upload, result retrieval, and easy integration
with external platforms.
1.7 NON-FUNCTIONAL REQUIREMENTS
 Performance:
The system will process images within 5 seconds and videos within 2 minutes, depending
on video length.
 Scalability:
The system will support up to 1,000 concurrent users and allow media uploads up to 500
MB.
 Security:
User data will be encrypted during transmission and storage to ensure security and
prevent unauthorized access.
 Reliability:
The system will maintain 99.9% uptime and ensure a false positive rate under 5% for
deepfake detection.
 Usability:
The system will provide an intuitive user interface with clear instructions and helpful
error messages.
1.8. HIGH LEVEL SYSTEM ARCHITECTURE
1.8.1 Explanation of System Architecture Components
1.8.1.1 Upload Video
The system begins by accepting videos uploaded by users, forming the primary input for
analysis. The upload module is designed to support widely used video formats such as MP4, AVI,
and MOV, utilizing standard codecs like H.264 to ensure broad compatibility. Uploaded videos
serve two different purposes depending on the operating mode. In training mode, new videos are
added to the dataset to expand the training data available to the model, allowing it to learn more
diverse patterns associated with real and fake videos. In prediction mode, uploaded videos are

immediately subjected to analysis, where the system checks the authenticity of the content to
determine if it is real or fake. This dual-purpose functionality makes the upload module a vital
entry point for both system development and real-world application.


Figure 1.1: High Level System Architecture.


1.8.1.2. Dataset (Fake / Real Videos)
The dataset forms the foundation upon which the deepfake detection model is trained and
validated. It consists of two types of videos: real videos, which are authentic recordings of human

faces without any manipulations, and fake videos, which have been synthetically altered using
deepfake techniques like FaceSwap or DeepFaceLab. These manipulations may involve replacing
faces, altering expressions, or modifying head movements. To ensure the quality and diversity of
the dataset, videos are collected from widely recognized public sources such as FaceForensics++,
DFDC (Deepfake Detection Challenge dataset), and Celeb-DF. Additionally, custom-collected
samples may be introduced to adapt the system for specific use cases or to handle newer types of
deepfake attacks. A balanced and high-quality dataset is critical for the model’s ability to
differentiate subtle cues between genuine and manipulated content.
1.8.1.3. Preprocessing
Before feeding the videos into the model, preprocessing ensures that only the most relevant
information — primarily human faces — is extracted and standardized. Preprocessing involves
multiple steps, each crucial for preparing clean, high-quality data that enhances model
performance.
1.8.1.3.1 Splitting Video into Frames
The first step in preprocessing involves extracting individual frames from the uploaded
videos at a consistent rate, typically 30 frames per second. This ensures that enough temporal
information is captured while maintaining manageable data volume. Frame extraction is performed
using tools such as OpenCV or FFmpeg, which are capable of handling high-resolution videos
efficiently. By analyzing videos frame-by-frame, the system preserves temporal consistency,
enabling later stages to observe changes and movements crucial for detecting deepfakes.
1.8.1.3.2 Face Detection
After frame extraction, the next step is to locate and isolate faces within each frame. This
is achieved using face detection algorithms such as MTCNN (Multi-Task Cascaded Convolutional
Networks), known for high accuracy and its ability to detect key facial landmarks like eyes, nose,
and mouth. Alternatively, Haar Cascades may be used for faster processing, although they offer
lower precision compared to MTCNN. To ensure high data quality, only faces detected with high
confidence are retained, while false or low-confidence detections are filtered out. Accurate face
detection is vital for focusing the analysis on facial regions, where deepfake manipulations are
most likely to occur.
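A minimal sketch of this step using the open-source mtcnn package is shown below; the 0.95 confidence threshold is an assumed value, not one taken from the report.

import cv2
from mtcnn import MTCNN  # open-source MTCNN implementation (pip install mtcnn)

detector = MTCNN()

def detect_faces(frame_bgr, min_confidence=0.95):
    """Return bounding boxes (x, y, w, h) of faces detected above min_confidence."""
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # MTCNN expects RGB input
    detections = detector.detect_faces(frame_rgb)
    return [d["box"] for d in detections if d["confidence"] >= min_confidence]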

1.8.1.3.3 Face Cropping
Once faces are detected, each facial region is cropped from its respective frame, effectively
removing background clutter and distractions. The cropped faces are then resized to a standardized
size, commonly 256×256 pixels, to ensure uniformity across all samples. Furthermore, facial
alignment is performed based on key landmarks like the eyes and nose, which helps in correcting
for head tilts or rotations, thereby normalizing the face orientation. This consistent preprocessing
allows the model to focus entirely on learning facial features without being influenced by
variations in pose or scale.
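A minimal cropping-and-resizing sketch is shown below; the 256×256 target size follows the text, while landmark-based alignment is omitted for brevity.

import cv2

def crop_face(frame, box, size=256):
    """Crop the detected face region and resize it to a size x size square."""
    x, y, w, h = box
    x, y = max(x, 0), max(y, 0)               # MTCNN can return negative coordinates
    face = frame[y:y + h, x:x + w]
    return cv2.resize(face, (size, size))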
1.8.1.3.4 Saving Processed Data
After cropping and aligning the faces, the processed data is systematically organized and
saved into structured folders. Each folder corresponds to a specific video and maintains the
temporal order of frames. This structured organization enables efficient data loading and
management during training and prediction phases. Saving only the face-focused data also reduces
storage requirements and speeds up subsequent processing, making the system more efficient.
1.8.1.4. Data Splitting (Train / Test Split)
To ensure robust model development, the processed dataset is divided into three distinct
subsets: the training set, validation set, and test set. Typically, 70% of the data is allocated to the
training set, where the model learns patterns and adjusts its internal weights. Another 15% is
reserved for the validation set, which is used during training to fine-tune hyperparameters and
avoid overfitting. The remaining 15% is assigned to the test set, which provides an unbiased
evaluation of the model’s final performance. This structured data splitting strategy ensures that the
model generalizes well to unseen videos and performs reliably under real-world conditions.
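One way to realize the 70/15/15 split with scikit-learn is sketched below; the variable names and the stratified, seeded splitting are assumptions.

from sklearn.model_selection import train_test_split

# `samples` holds face-crop file paths, `labels` holds 0 (real) / 1 (fake).
def split_dataset(samples, labels, seed=42):
    """70% training, 15% validation, 15% test, stratified by class label."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        samples, labels, train_size=0.70, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)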
1.8.1.5. Data Loader
To manage large volumes of video frame data efficiently, a data loader system is employed.
The data loader dynamically feeds batches of processed frames into the model during training and
prediction. It also applies data augmentations, such as random flips and rotations, which help to
artificially increase the dataset's diversity and improve the model's robustness against variations.
Furthermore, batching the data allows efficient utilization of GPU resources, enabling faster
training times and better scalability. The data loader acts as a critical bridge between the raw
dataset and the deepfake detection model.
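A possible data-loader setup using the Keras ImageDataGenerator is sketched below; the directory layout (one sub-folder per class), batch size, and augmentation parameters are assumptions.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: data/train/real, data/train/fake, data/val/real, data/val/fake.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,          # random flips and small rotations as augmentation
    rotation_range=10,
).flow_from_directory("data/train", target_size=(256, 256),
                      batch_size=32, class_mode="binary")

val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/val", target_size=(256, 256), batch_size=32, class_mode="binary")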

1.8.1.6. Deepfake Detection Model
The Deepfake Detection System uses a deep learning architecture built on the Xception
convolutional neural network to analyze individual video frames for tampering.
1.8.1.6.1 Xception Feature Extraction
The system uses the Xception model, a depthwise separable convolutional network, to
extract spatial features from each face-cropped frame.
It effectively detects manipulation patterns such as unnatural textures, inconsistent lighting, and
boundary artifacts introduced by deepfake algorithms.
Each frame is processed independently, and the model outputs a probability score indicating
whether the frame is real or fake.
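A sketch of how such an Xception backbone with a sigmoid classification head could be assembled in Keras follows; the head size, dropout rate, and optimizer settings are assumptions, not the trained model's exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

def build_detector(input_shape=(256, 256, 3)):
    """Xception backbone with a small fully connected head and a sigmoid output."""
    backbone = Xception(weights="imagenet", include_top=False,
                        pooling="avg", input_shape=input_shape)
    x = layers.Dense(256, activation="relu")(backbone.output)
    x = layers.Dropout(0.5)(x)                          # assumed regularization
    output = layers.Dense(1, activation="sigmoid")(x)   # P(frame is fake) in [0, 1]
    model = models.Model(backbone.input, output)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
    return model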
1.8.1.6.2 Frame-Level Classification
Unlike models using temporal networks like LSTM, this system classifies frames
independently without sequence modeling.
A video is classified as deepfake or real based on aggregated predictions across all analyzed frames
(e.g., majority vote or average confidence).
This approach allows faster inference and is highly effective when trained on large datasets like
DFDC or FaceForensics++.
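The aggregation step might look like the following sketch, which combines majority voting with an average confidence score; the 0.5 decision threshold is an assumed default.

import numpy as np

def classify_video(frame_scores, threshold=0.5):
    """Aggregate per-frame fake probabilities into one video-level verdict."""
    scores = np.asarray(frame_scores, dtype=float)
    fake_votes = int((scores >= threshold).sum())        # majority vote over frames
    label = "FAKE" if fake_votes > len(scores) / 2 else "REAL"
    return label, float(scores.mean())                   # average confidence score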
1.8.1.6.3 Classification Head
After feature extraction, the classification head, consisting of fully connected layers,
produces the final prediction. A sigmoid activation function at the output layer transforms the
prediction into a probability value between 0 and 1. If the output is closer to 0.0, the video is
classified as real; if it approaches 1.0, the video is labeled as fake. This simple yet effective binary
classification enables intuitive interpretation of the model’s output.
1.8.1.7. Model Evaluation
After training, the Xception-based model is evaluated using a confusion matrix that
captures true positives, true negatives, false positives, and false negatives. From this, metrics such
as accuracy, precision, recall, and AUC-ROC are calculated to assess detection quality. The
system aims for over 95% accuracy, ensuring dependable performance. The model is saved in .h5
format and can be converted to lightweight formats like ONNX for broader deployment
compatibility.
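These metrics can be computed from the test-set predictions with scikit-learn, as in the sketch below; the variable names and the 0.5 decision threshold are assumptions.

from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_prob, threshold=0.5):
    """y_true: 0 (real) / 1 (fake) labels; y_prob: predicted fake probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auc_roc": roc_auc_score(y_true, y_prob),
    }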

1.8.1.8. Prediction Flow
The prediction flow describes the steps followed when a user uploads a new video for
analysis. Refer to Figure 1.2 (a) for the prediction data flow diagram.
1.8.1.8.1 Load Trained Model
The system loads the pre-trained Xception.h5 model into memory. This allows for
immediate prediction without additional training or setup.
1.8.1.8.2 Process New Videos
The uploaded video is processed through frame extraction, followed by face detection
and cropping, mirroring the training preprocessing. Each cropped frame is passed through the
Xception model to extract features and classify whether the frame appears real or manipulated. A
final prediction (real or fake) is generated along with a confidence score, and key frames are
displayed for transparency.
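A simplified prediction pipeline, reusing the helper functions from the earlier sketches (detect_faces, crop_face, classify_video) and the model file named in the abstract, might look as follows; the [0, 1] input scaling is an assumption.

import numpy as np
from tensorflow.keras.models import load_model

# Model file name as quoted in the abstract; detect_faces, crop_face and
# classify_video are the helper sketches introduced earlier.
model = load_model("deepfake_detection_xception_180k_14epochs.h5")

def predict_video(frames):
    """frames: iterable of BGR frames; returns a ('REAL'/'FAKE', confidence) pair."""
    scores = []
    for frame in frames:
        for box in detect_faces(frame):
            face = crop_face(frame, box).astype("float32") / 255.0   # assumed scaling
            scores.append(float(model.predict(face[np.newaxis, ...], verbose=0)[0][0]))
    return classify_video(scores) if scores else ("UNKNOWN", 0.0)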
1.8.1.8.3 Output Results
The prediction result is returned in the form of a JSON response indicating the
classification outcome (Real or Fake). Additionally, the system may generate visual explanations,
highlighting suspicious frames or regions where deepfake artifacts were detected, making the
decision process transparent and explainable to users.
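A minimal Flask endpoint returning such a JSON response is sketched below; the route path, field names, and temporary file handling are assumptions, and predict_video refers to the previous sketch.

import cv2
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/predict", methods=["POST"])      # route path is an assumption
def predict_endpoint():
    upload = request.files.get("video")
    if upload is None:
        return jsonify({"error": "no video uploaded"}), 400
    path = "uploaded_video.mp4"                    # temporary local copy
    upload.save(path)
    frames, capture = [], cv2.VideoCapture(path)
    ok, frame = capture.read()
    while ok:
        frames.append(frame)
        ok, frame = capture.read()
    capture.release()
    label, confidence = predict_video(frames)      # helper from the previous sketch
    return jsonify({"prediction": label, "confidence": round(confidence, 3)})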
1.8.1.9. Key Strengths of the System
The system offers robust deepfake detection using the Xception convolutional neural
network, known for its high accuracy in spotting manipulation artifacts in facial regions. It
processes individual frames effectively, detecting inconsistencies like unnatural textures and
blending issues. The model delivers fast and reliable predictions and is lightweight enough to be
deployed on standard machines. Additionally, the system enhances explainability by showing
frame-level evidence, helping users understand the classification result.
1.8.1.10. Limitations & Future Improvements
While the system performs well on common deepfakes, it may face difficulty with videos
where facial features are obstructed—such as by sunglasses, masks, or heavy makeup. Its frame-
based analysis may miss subtle temporal anomalies spread across frames. Future improvements
can include integrating temporal modeling, using more diverse datasets, and exploring
advanced architectures like Vision Transformers (ViTs) or audio-visual fusion to detect
deepfakes more comprehensively.

1.9. DATA FLOW DIAGRAM

Figure 1.2 (a): Prediction Data Flow Diagram.

Figure 1.2 (b): Model Training Data Flow Diagram.

1.10. USE CASE DIAGRAM

Figure 1.3: Use Case Diagram.


1.11. COST ESTIMATION AND TIME SCHEDULING
Table 1.1: Time Scheduling
Task Duration
Dataset Collection 2 Weeks
Model Training 2 Weeks
Backend Development 2 Weeks
UI & Integration 1 Week
Testing & Deployment 1 Week
Table 1.2: Cost Estimation
Component Estimated Cost
CPU Usage ~ 97%
Development Tools Zero-Cost
Python IDLE Zero-Cost

2. SOFTWARE DESIGN SPECIFICATION
2.1 INTRODUCTION
2.1.1 Problem Statement
The growing prevalence of deepfake videos threatens digital security and public trust.
These AI-generated manipulations can convincingly replicate real people, enabling the spread of
misinformation, fraud, and cyber abuse. Manual detection is unreliable and slow. An automated
solution is urgently needed to accurately detect such manipulated content.
2.2 SOFTWARE ENGINEERING METHODOLOGY
2.2.1 Agile Model
The project adopts the Agile Software Development Life Cycle (SDLC), ensuring iterative
development and continuous improvement:
 Requirement Analysis – Gather requirements related to deepfake detection, media
processing, and web interface functionality.
 Design & Prototyping – Define architecture, select the Xception model for detection, and
create UI wireframes.
 Implementation – Build components like face detection, video preprocessing, Xception-
based prediction, and upload interface.
 Testing & Validation – Conduct unit and system testing to verify detection accuracy and
performance.
 Deployment – Host the model and web interface on a suitable server for user testing and
feedback.
 Maintenance & Monitoring – Regularly update the model, monitor performance, and
address issues based on user input.
2.2.2 Phases of Development
2.2.2.1 Requirement Analysis
To identify the key functional and non-functional requirements of the deepfake detection
system.
Activities:
 Conduct research on existing deepfake techniques and detection methods.
 Analyze common datasets and model architectures.
 Identify user requirements (researchers, media companies, general users).

 Define system requirements, including model accuracy, latency, hardware, and dataset
dependencies.
2.2.2.2 System Design & Architecture
To define the architecture, modular components, and data flow for the system.
Activities:
 Create High-Level Design (HLD) and Low-Level Design (LLD).
 Define the modular structure of the system (Preprocessing, Model Training, Prediction,
UI).
 Choose appropriate technologies for deep learning, video processing, and cloud/local
deployment.
 Develop UML diagrams, including:
o Use Case Diagrams (Identifying user interactions)
o Class Diagrams (Describing components and data handling)
o Dataflow Diagrams (Illustrating the processing pipeline)
2.2.2.3 Implementation & Development
To build and code the system in a modular, scalable, and efficient manner.
Activities:
 Develop individual modules using Python, OpenCV, and TensorFlow.
 Train the Xception model on curated face datasets.
 Implement video splitting, face detection, and normalization functions.
 Integrate the trained model with the frontend for real-time predictions.
2.2.2.4 Testing & Verification
To verify that the system functions correctly, meets the requirements, and performs
optimally under various video conditions.
Activities:
• Unit Testing: Validate each component (frame extraction, face cropping, Xception.h5 model
inference, output prediction).
• Integration Testing: Ensure all modules work together from input to output.
• Performance Testing: Measure model accuracy, frame processing speed, and prediction
consistency.

Verification Levels:
• Activity-Level Testing – Validate specific processes (e.g., face detection accuracy).
• Module-Level Testing – Ensure each module (e.g., data loader, Xception.h5 model, UI) works
independently.
• System-Level Testing – End-to-end testing using real and fake video samples.
2.2.2.5 Deployment & Integration
To deploy the system in a testing environment for user evaluation and feedback.
Activities:
 Package and deploy the model using Flask or FastAPI with a web-based UI.
 Enable video upload, prediction display, and feedback collection.
 Integrate cloud services or local file systems for data handling.
2.2.2.6 Maintenance & Continuous Improvement
To ensure long-term system reliability and improvements based on user feedback and
model advancements.
Activities:
 Monitor system performance through logging and analytics.
 Update detection models with new datasets.
 Add support for more complex deepfakes or video enhancements.
 Fix bugs and optimize performance for better scalability and user experience.

2.3 DECOMPOSITION OF MODULES

Figure 2.1: Module Diagram.


2.3.1 Video Upload & Preprocessing Module
This module is responsible for handling video files that users upload and preprocessing them
to make them suitable for deepfake detection.
 Splitting Videos into Frames: Videos are a collection of individual frames (images)
played in rapid succession. The first step in this module is to extract these frames from the
uploaded video. This is important because deepfake detection works by analyzing
individual frames to spot inconsistencies in facial features and movements.
 Face Detection and Cropping: After splitting the video into frames, face detection
algorithms are applied to identify faces within each frame. Once detected, the faces are
cropped out of the frame so that the model can focus only on the face's details, which are
crucial for identifying deepfakes. This ensures that irrelevant background information is
discarded.


Figure 2.2: Video Upload & Preprocessing Module Diagram.


2.3.2 Dataset Handling Module
This module is responsible for managing the data needed for training and testing the deepfake
detection model.
 Data Splitting (Train/Test): To train and evaluate the model, the dataset is divided into
two parts: a training set and a testing set. The training set is used to train the model, while
the testing set is used to assess its performance. A common practice is to split the data in
an 80-20 ratio, with 80% used for training and 20% for testing.
 Data Loader: The data loader is responsible for feeding data into the model during training.
It ensures that batches of data are loaded efficiently, which is crucial when dealing with
large datasets. It also handles the preprocessing of frames (such as resizing or normalizing)
before they are fed into the model.

Figure 2.3: Dataset Handling Module Diagram.
2.3.3 Deepfake Detection Model
This module contains the core deepfake detection model responsible for analyzing facial
features and classifying videos.
 Xception Feature Extraction and Classification: The system uses the Xception model, a
deep convolutional neural network known for its high performance in image classification
tasks. Each video frame is processed through Xception to extract detailed spatial features.
The model then directly classifies whether the input is real or fake based on learned patterns
like texture irregularities, unnatural facial features, or blending artifacts.
The Xception model eliminates the need for separate temporal analysis by leveraging
frame-wise prediction and aggregating results across the video.

Figure 2.4: Deepfake Detection Model Module Diagram.
2.3.4 Model Training and Evaluation Module
This module handles training the deepfake detection model and assessing its performance.
 Training the Model: The model is trained using the training dataset, where each input
frame is processed by the Xception network to learn distinguishing features of real versus
fake content. The model updates its weights by minimizing the error between predictions
and true labels.
 Confusion Matrix: After training, a confusion matrix is generated to compare predicted
results with actual labels. This helps evaluate metrics such as precision, recall, accuracy,
and F1 score to understand model performance.
 Model Export: Once validated, the trained model is saved (e.g., as Xception.h5) so it can
be reused during prediction without retraining; a brief training-and-export sketch follows.
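A minimal training-and-export sketch is shown below; it assumes the build_detector, train_gen, and val_gen objects from the earlier sketches, while the epoch count and file name follow the report.

# Assumes build_detector, train_gen and val_gen from the earlier sketches.
model = build_detector()
model.fit(train_gen, validation_data=val_gen, epochs=14)   # 14 epochs, per the report
model.save("deepfake_detection_xception_180k_14epochs.h5")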

Figure 2.5: Model Training and Evaluation Module Diagram.
2.3.5 Prediction Module
This module is responsible for using the trained model to make predictions on new video data.
 Load Trained Model: The trained model, saved during the "Model Export" step, is loaded
into memory to be used for making predictions. This step ensures that the latest model is
available for inference.
 Output Prediction (Real/Fake): When a new video is uploaded for prediction, the system
processes it in the same way as the training data (splitting into frames, face detection,
feature extraction, etc.). The model then outputs a prediction of whether the video is real
or fake based on the processed frames.

Figure 2.6: Prediction Module Diagram.
2.3.6 Verification and Testing
This module is designed to ensure the system functions as expected and meets quality standards
before it is deployed.
 Verification: In this step, the entire system is verified to ensure that each module works
correctly. This includes checking if videos are uploaded and preprocessed correctly, if the
model is trained properly, and if the prediction module produces accurate results. Any
discrepancies or bugs found during this phase are fixed.
 Testing: After verification, the system undergoes testing. This involves testing the model
on a separate dataset (different from the training and validation datasets) to evaluate how
well it generalizes to new data. Testing can include cross-validation, performance on edge
cases, and ensuring the model performs well across different video qualities, lighting
conditions, and face angles.
Figure 2.7: Verification and Testing Module Diagram.
2.3.7 User Interface
This module focuses on the user experience, providing a front-end interface where users can
interact with the system.
 Upload Interface: Users should be able to upload videos through a simple interface. The
system should support various video formats and provide feedback on upload progress.
 Prediction Results Display: Once the video is processed and a prediction is made, the
results (whether the video is real or fake) are displayed to the user. The interface should
present this information clearly and in a user-friendly way.
 Error Handling: The user interface should handle errors gracefully, such as when the
uploaded video cannot be processed or if the system fails to generate a prediction. It should
provide clear error messages and possible solutions.
 User Interaction: The interface should also allow users to view previous results, download
predictions, and interact with other features like model retraining or feedback submission
for improving the model.

Figure 2.8: User Interface Module Diagram.


2.4. VERIFICATION TESTING FROM ACTIVITY TO MODULE LEVEL IN
DEEPFAKE DETECTION SYSTEM
Verification testing is essential to ensure the Deepfake Detection System functions
accurately and efficiently before deployment. The process involves testing at multiple levels, from
low-level individual functions to the overall system performance. Each level of testing plays a
critical role in validating the system's robustness, accuracy, and scalability.
2.4.1. Levels of Verification Testing
 Activity-Level Testing – Verifies small, individual functions like frame extraction or face
detection.
 Module-Level Testing – Validates standalone components such as dataset handling and
model inference.
 Integration Testing – Ensures data flow and interaction between interconnected modules
function correctly.
 System-Level Testing – Evaluates the complete system’s behavior with real-world inputs
and output conditions.
2.4.2. Detailed Breakdown of Verification Testing
2.4.2.1 Activity-Level Testing (Unit Testing)
This level focuses on validating individual functionalities and logic blocks within the
codebase. Each activity is tested independently to ensure it performs the intended operation. The
goal is to verify the correctness of atomic operations used across the system.
Examples of Activities Tested:
 Extracting frames from uploaded videos.
 Detecting faces using a pre-trained face detector.
 Cropping and aligning faces from video frames.
 Splitting long videos into smaller clips.
Testing Approach:
 White Box Testing to test function logic directly.
 Automated Unit Testing using tools like PyTest and Unittest; a minimal example follows.
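For example, an activity-level test of the frame-extraction function (from the earlier preprocessing sketch) could be written with pytest as follows; the synthetic test clip and expected frame count are assumptions.

import cv2
import numpy as np

def test_extract_frames(tmp_path):
    """Write a 30-frame synthetic clip, then check that every frame is recovered."""
    video_path = str(tmp_path / "sample.avi")
    writer = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*"MJPG"), 10, (64, 64))
    for _ in range(30):
        writer.write(np.zeros((64, 64, 3), dtype=np.uint8))
    writer.release()

    saved = extract_frames(video_path, str(tmp_path / "frames"))  # earlier sketch
    assert saved == 30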
2.4.2.2 Module-Level Testing
Each major component or module of the system is validated to ensure it produces expected
results when provided with standard inputs. The goal is to ensure that individual modules, such as
dataset preparation, preprocessing, and model prediction, work as standalone units.
Examples of Module Tests:
 Validate the dataset splitting function (train/test/validation) for correct proportions.
 Test that face detection and alignment work with various video formats.
 Confirm that model outputs have the correct tensor shape and label probability values.
Testing Approach:
 Black Box Testing to assess input-output behavior.
 Boundary Value Analysis for edge cases (e.g., no face detected, poor video quality).

2.4.2.3 Integration Testing
This phase tests whether different modules work correctly when integrated into a single
pipeline. The goal is to ensure smooth communication and data transfer between dependent modules.
Key Integration Scenarios:
 Verify that the output of video frame extraction feeds properly into the face detection
module.
 Confirm that cropped face images are correctly passed into the deepfake prediction model.
 Ensure prediction results are properly routed to the UI or alert system (if any).
Testing Approach:
 End-to-End Workflow Testing from video upload to final prediction.
 Interface Testing to validate intermediate data structures and handoffs between
components.
2.4.2.4 System-Level Testing
The full system is evaluated using real-world deepfake and authentic videos to test overall
performance under realistic conditions. The goal is to validate system performance, reliability, and
accuracy before deployment.
Test Scenarios:
 Test with real and fake videos the system has not encountered before.
 Evaluate classification accuracy, precision, recall, and F1 score.
 Measure system’s average processing time for a single video.
 Simulate load by submitting multiple videos in parallel.
 Assess robustness by running the system for extended periods (e.g., 12–24 hours
continuously).
Testing Approach:
 Load Testing to check performance under high data volume.
 Stress Testing to assess failure behavior under extreme conditions.
 User Acceptance Testing (UAT) where end users (e.g., security officials or testers)
evaluate usability and reliability.

2.4.3. Summary of Verification Testing Process
The Deepfake Detection System undergoes a rigorous multi-level verification process,
beginning with isolated function testing and culminating in full-system performance validation.
This structured approach ensures that each function, module, and their integration work flawlessly
under real-world conditions. With thorough testing at every stage, the system is prepared to handle
dynamic video inputs, deliver accurate results, and operate reliably in production environments.
2.5. CONCLUSION
The Deepfake Detection System offers a reliable solution to the increasing threat of
manipulated videos by utilizing a robust and modular deep learning-based architecture. By
leveraging the Xception model for spatial feature extraction and a well-structured preprocessing
pipeline, the system accurately distinguishes fake content from authentic videos. The architecture
incorporates stages such as data acquisition, face detection, preprocessing, and classification,
functioning cohesively for end-to-end deepfake detection. Developed using an Agile methodology
and validated through rigorous activity, module, integration, and system-level testing, the solution
ensures high accuracy, scalability, and performance. Its real-time prediction capability makes it a
valuable tool across digital forensics, media verification, and security domains.

3. SOFTWARE TESTING
3.1 TEST PLAN
3.1.1 Test Strategy
The Deepfake Detection System was tested using a comprehensive strategy that covered
both functional and non-functional aspects. The objective of the testing process was to ensure the
correctness, robustness, and reliability of the application across its individual components as well
as the integrated system. All testing was performed entirely on a local system, using free and
open-source software and datasets, without relying on any paid or online/cloud services.
The types of testing conducted are as follows:
Unit Testing
Each core component was independently tested using unittest and pytest:
 Frame Extraction Module: Validated consistent and timestamp-aligned extraction from
input videos.
 Face Detection Module (MTCNN): Tested accuracy and robustness across lighting,
angles, and motion blur.
 Xception Model (.h5): Verified that the model produces stable predictions for identical
inputs and correctly handles invalid or corrupted frames.
 Preprocessing Pipeline: Ensured correct normalization, resizing, and input shaping
expected by the Xception model.
Integration Testing
 Video Upload and Reading: Verified successful ingestion of various video formats
without corruption.
 Frame Extraction and Preprocessing: Ensured consistent frame extraction and proper
normalization.
 Face Detection (MTCNN): Checked accurate detection across frames under diverse
conditions.
 Input Formatting and Batch Preparation: Confirmed correct resizing and batching of
face crops.
 Xception Model Inference: Validated model predictions on processed input batches.
 Output Classification and Result Handling: Ensured accurate mapping and display of
real/fake results.

 Intermediate Data Consistency: Verified data shape and format compatibility between
modules.
 Error Handling: Tested pipeline resilience to corrupted inputs and detection failures.
 End-to-End Consistency: Confirmed predictions match expected outputs on test videos.
System Testing
The complete application was tested as a whole using actual video files. System testing
focused on:
 Functional correctness of all workflows.
 Correct prediction rendering on the web interface.
 Appropriate error handling for corrupted, unsupported, or short video files.
 Frontend-backend data exchange, using Flask routes and JavaScript/AJAX communication.
This helped validate the usability and stability of the full pipeline in real-world usage.
Performance Testing
Performance testing was performed to determine how efficiently the system handled
inference. Key parameters evaluated:
 Model Inference Time per Video: Measured using Python’s time module.
 Memory Consumption: Tracked using memory_profiler.
 System Load: Observed CPU and RAM usage during processing of different video lengths
and resolutions.
Since the test machine used an AMD GPU, which does not support CUDA, performance
evaluation was conducted using CPU-only execution.
Adversarial Testing
To test robustness, modified deepfake videos were used. These videos were generated by
applying the following transformations to original deepfakes:
 Addition of Gaussian noise and compression artifacts.
 Partial occlusion of the face region.
 Varying brightness and contrast levels.
 Motion blur effects.
The system’s ability to maintain high detection accuracy under these adversarial conditions
was measured and compared to performance on unaltered deepfake videos.

Regression Testing
Each time improvements or bug fixes were introduced, the system underwent regression
testing. Previously passed test cases were re-executed to ensure no unintended side effects or
functionality loss occurred. This was crucial after refactoring frame extraction or enhancing error
handling mechanisms.
3.1.2 Test Environment
All tests were conducted on a personal computer with a Windows 11 operating system,
using freely available software and locally stored datasets. The following describes the complete
configuration:
Table 3.1: Hardware Environment
Specification Details
Processor AMD Ryzen 5 5500 @ 3.60 GHz
Installed RAM 16 GB (15.9 GB usable)
System Type 64-bit Operating System, x64-based CPU
Graphics Card AMD Radeon RX 6000 Series
GPU Utilization Used up to 60%
Table 3.2: Software Environment
Component Configuration and Purpose
Operating System Windows 11
Programming Language Python 3.12
Deep Learning Library PyTorch (CPU-only version)
Web Framework Flask (for the local interface and REST API development)
Computer Vision OpenCV, MTCNN (for image processing and face detection)
Model Modules Xception for frame-level feature extraction and classification
Testing Tools pytest, unittest, Postman, memory-profiler
Development Tools Visual Studio Code (IDE), Python virtual environments
Dataset Handling moviepy, ffmpeg, os, and shutil for reading/storing test videos, video-to-frame conversion, and preprocessing
All required Python packages were installed via pip from official repositories.

Table 3.3: Datasets Used for Testing
Dataset Name Description Source
DFDC (Deepfake Detection Challenge) Contains labeled real and fake videos. Used for general accuracy testing. Kaggle
Celeb-DF Deepfake videos of celebrities generated using advanced techniques. GitHub
FakeAVCeleb Audio-visual fake data (optional use for further variation). GitHub

3.1.3 Test Deliverables


Once the testing process was completed, the following deliverables were generated and kept
within the local environment:
 Test Plan Document: Detailed report outlining the entire testing strategy, objectives, and
methodology.
 Unit and Integration Test Logs: Logs that track the results of unit and integration tests
on a local machine.
 Performance Metrics Summary: Summary of latency, throughput, and accuracy,
benchmarked locally on the system.
 Defect Logs (JIRA/Excel): A log of bugs and issues, with status updates.
 Final Test Report (PDF): A report summarizing the outcomes of all tests, including
successful and failed test cases.
 Benchmark & Graphs: Performance graphs generated from local testing results, showing
latency, accuracy, and throughput.
3.1.4 Resources & Responsibilities
The following table lists the roles and responsibilities involved in testing, all carried out on
the local machine:

Table 3.4: Resources & Responsibilities
Role Responsibility
QA Test Lead Coordinate test efforts and approve the testing plan.
Deep Learning Engineer Address any model-related issues, optimize performance locally on the system.
Backend Developer Assist in API integration and model inference in a local environment.
Test Engineer Execute manual and automated tests locally, log issues, and ensure proper test execution.

3.2 Test Design and Coverage Analysis


3.2.1 Test Cases Design
The following table outlines the minimal test cases that were designed for the system.
These tests focus on validating core functionality and ensuring correct behavior under standard
conditions:
Table 3.5: Test Cases Design
Test ID Module Description Expected Output
TC-001 Frame Extraction Extract N frames from a 30-second video at 10 FPS 300 frames
TC-002 MTCNN Detection Detect faces in both normal and low-light video conditions using MTCNN or Haar cascades ≥95% detection accuracy
TC-003 Feature Extraction Validate output from the Xception feature extraction layer Tensor [Batch × 2048]
TC-004 Deepfake Prediction Predict deepfake probability using the Xception model (no LSTM involved) Value in range [0, 1]
TC-005 API Output Ensure the Flask API returns a JSON response with prediction and associated frame HTTP 200 with correct result
3.2.2 Test Coverage Analysis
The system was tested locally to ensure the following:
 Code Coverage: Achieved 92% coverage using pytest-cov, ensuring that the majority of
the codebase was thoroughly tested.
 Functional Modules: The following functional modules were tested to verify their
correctness:
o Frame extraction from input videos.
o Face detection using MTCNN or Haar cascades.
o Feature extraction via Xception.
o Model classification to determine authenticity.
o API interaction to ensure the model prediction is served correctly.
 Boundary Testing: The system was validated with video duration limits:
o Videos with a minimum duration of 2 seconds.
o Videos with a maximum duration of 120 seconds.
 Adversarial Testing: The model's robustness was tested against tampered videos with
various challenges, including:
o Blurred faces.
o Occluded faces.
o Noise-injected videos.
3.2.3 Test Objectives & Criteria
The test objectives were tailored to achieve the following outcomes:
 Accuracy: The model must detect deepfakes with an AUC (Area Under Curve) ≥ 0.98,
ensuring high classification accuracy across the validation and testing sets.
 Face Detection: The system must accurately detect faces in varying lighting conditions
and environments, utilizing MTCNN or Haar cascades to maintain high detection
accuracy (≥95%) even in low-light situations.
 Video Format Compatibility: The system should be able to process local video formats,
including MP4, AVI, and MOV, without issues, ensuring compatibility with various video
file types commonly used in deepfake detection tasks.

 Processing Speed: The system should process at least 10 videos per minute locally using
an AMD Ryzen 5 processor and an AMD RX 6000 GPU, ensuring real-time performance
suitable for practical use cases.
3.3 TEST SCHEDULE AND ESTIMATIONS
3.3.1 Test Case Schedule
The test cases were executed on a local system as follows:
Table 3.6: Test Case Schedule
Test Type Number of Test Cases
Unit Tests 5
Integration Tests 5
System Tests 5
Regression Tests 5
Adversarial Tests 5
3.3.2 Test Case Estimations
Below is the estimation of test cases for each module, focusing on the core functionalities:
Table 3.7: Test Case Estimations
Module Test Cases
Frame Extraction 1
Face Detection 1
Feature Extraction & Classification 2
API & System Flow 1
Each test case focused on validating the core functionality of the system, with minimal but essential
tests to ensure reliability and performance when running locally.
3.4 TEST CRITERIA
3.4.1 Functional Criteria
The following functional criteria were defined to ensure the system meets its intended
objectives when tested locally:
 Face Detection Accuracy: The system must correctly detect faces in a video with an
accuracy of ≥95% under varying conditions, including low light, using MTCNN or Haar
cascades for robust face detection.

 Inference Model Accuracy: The deepfake detection model (based on Xception, not
LSTM and ResNeXt) should achieve an AUC ≥ 98% on the validation dataset,
demonstrating its ability to effectively differentiate between real and deepfake content.
 Output JSON: The system must return a JSON response containing:
o Prediction result: Indicates whether the frame is Real or Fake.
o Confidence score: A confidence score for the prediction, representing the model's
certainty.
o Frame number: The frame number where the prediction was made, allowing
traceability of results.
3.4.2 Non-Functional Criteria
These non-functional criteria ensure that the system performs well within local hardware
limitations and meets the following expectations:
 Latency: The system must process each frame with latency ≤ 500ms/frame during video
inference. This ensures that real-time predictions are possible, even on a local machine,
maintaining high responsiveness.
 Memory Usage: The system should utilize ≤ 8GB of RAM when processing 1080p videos.
Given the local hardware specifications (AMD Ryzen 5 5500, 16GB RAM), this ensures
that the system is optimized for efficient memory usage and performs well under normal
conditions.
 Throughput: The system should be capable of processing ≥ 10 videos per minute locally
using the AMD RX 6000 GPU, optimizing GPU utilization for fast processing and real-
time video analysis.
 Security: Proper input sanitization and file-type verification must be implemented to
prevent the system from processing unsupported or malicious video files. This ensures that
only valid video formats (e.g., MP4, AVI) are accepted, preventing potential security
vulnerabilities from being exploited.
3.5 TEST STRATEGY
3.5.1 Test Levels
The testing levels are organized to validate core functionalities of the Deepfake Detection
System while considering local performance constraints:

 Unit Testing: Individual core modules (frame extraction, face detection, feature extraction,
model inference) are validated for accuracy and correct behavior using minimal test cases.
This ensures isolated functionality and correctness for each component (e.g., frame
extraction with MTCNN, face detection, Xception feature extraction).
 Integration Testing: Validates the end-to-end flow from video upload to final prediction.
This ensures that data correctly flows between modules, such as from frame extraction to
deepfake classification, and that the system functions cohesively.
 System Testing: Comprehensive end-to-end testing is performed using local datasets to
ensure the system works as expected in a local environment, from video upload to output
prediction. This ensures proper functionality under typical operating conditions, including
video processing, face detection, feature extraction, and prediction.
 Regression Testing: After bug fixes or updates, regression testing is conducted to recheck
core functionalities, ensuring that changes don’t negatively affect existing features. This
will be focused on major features like video upload, face detection, and model inference.
 Adversarial Testing: Tests the system's robustness by introducing input distortions such
as blurred faces, low-quality videos, and noise-injected deepfake samples. This ensures
that the model can detect adversarial attempts to bypass detection and is resilient against
tampered inputs.
3.5.2 Testing Tools and Technologies
These tools are used to conduct tests locally, ensuring minimal resource consumption and ease
of integration:
 Testing Tools:
o PyTest: For unit and integration tests to validate the accuracy and correctness of
individual components like frame extraction, face detection, and model inference.
o Locust: For performance testing, simulating load and video upload scenarios to
test system scalability and stability under local conditions.
o Bandit: For security testing, ensuring the system is secure against common
vulnerabilities like file-type manipulation and ensuring safe input handling.
 Coverage Tool:
o pytest-cov: To measure test coverage and ensure that key modules (e.g., frame
extraction, feature extraction, classification) are being properly tested.

 UI Testing (Optional):
o Selenium: While primarily for web-based applications, Selenium could be used if
a minimal web UI is incorporated in the future. This step is currently optional, as
the focus is primarily on backend testing and video processing.
 CI/CD:
o GitHub Actions: Although the system isn't deployed online, GitHub Actions can
be set up to run testing scripts (unit tests, integration tests) locally during commits
and to automate local tests without deployment.
3.5.3 Testing Approach
This approach focuses on key areas to ensure the system is efficient and performs as expected
in a local setup:
 Black Box Testing: Focuses on testing the system's outputs (predictions, JSON response)
without delving into the inner logic. The goal is to ensure that the system correctly
identifies real vs. deepfake videos, returning accurate predictions and associated
confidence scores, regardless of the internal processes (e.g., frame extraction, face
detection, Xception inference).
 Boundary Testing: Tests edge cases related to video length (e.g., minimum 2s, maximum
120s) and resolution (e.g., 1080p, 720p). The goal is to ensure the system can handle videos
within the expected range of input parameters, processing them efficiently without errors.
 Stress Testing: A burst of videos is uploaded to measure the system’s stability and
performance under load. This will evaluate the GPU's ability to handle multiple video files
being processed in parallel on your local hardware (AMD RX 6000 GPU), testing
throughput and processing speed.
 Exploratory Testing: Testing the system with deepfakes from unseen datasets (such as
new techniques or real-world data) to explore its ability to handle diverse inputs and assess
its resilience in detecting new or evolving deepfake techniques. This will test the model's
generalization ability beyond the training data.
3.6 TEST OBJECTIVES
The primary objectives of the testing phase are to ensure that the Deepfake Detection System
performs as expected in a local environment with minimal resources. The test objectives are:
 Ensure Accuracy and Stability of Model Inference:
o Validate that the deepfake detection model (Xception) provides accurate
predictions (Real or Fake) with a high level of reliability and consistency across a
variety of input video datasets.
o Verify that the model inference (using Xception) produces the correct results
within a reasonable amount of time, even under local testing conditions with limited
resources (e.g., CPU/GPU).
 Test Under Realistic Conditions with Distorted Videos:
o Evaluate the system’s ability to handle and detect deepfake videos under less-than-ideal
conditions, including distorted inputs such as blurred faces, low-resolution content, and
noise-injected deepfake videos.
o Ensure that the system maintains its accuracy and handles real-world challenges such as
lighting variations, video compression artifacts, and partial occlusions while still making
reliable predictions.
 Secure and Error-Free Local Testing Environment:
o Ensure that the testing environment is secure, with proper input validation to prevent
vulnerabilities (e.g., malicious video files or unsupported formats such as MOV and AVI).
o Ensure that the system operates without errors during video upload, processing, and
prediction generation, providing a smooth user experience in a local setup.
 Reliable Real/Fake Prediction with Frame-Level Confidence:
o Validate that the system correctly predicts whether a video is real or fake, with reliable
frame-level confidence to help the user interpret the result.
o Ensure that the output JSON is correctly structured, including both the prediction result
and the associated confidence level for each frame, for clarity and transparency.
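One possible shape for that JSON output is sketched below; the field names are illustrative assumptions rather than the system's fixed schema.

{
  "prediction": "Fake",
  "confidence": 0.93,
  "frames": [
    {"index": 0, "confidence": 0.91},
    {"index": 1, "confidence": 0.95}
  ]
}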
3.7 Plan Test Environment
Local Environment
 Operating System: Windows 11
 Programming Language: Python 3.9
 Deep Learning Framework: PyTorch (for model inference and training)
 GPU: AMD Radeon RX 6000 series (local setup with the available AMD GPU, utilizing
PyTorch's compatibility with AMD hardware)
 Hardware:
 Processor: AMD Ryzen 5 5500 (3.60 GHz)
 RAM: 16 GB (15.9 GB usable)
 Graphics Card: AMD Radeon RX 6000 series (for local GPU acceleration)
 Libraries: OpenCV, MTCNN, Flask (for API endpoints)
 Testing Tools:
 Unit Testing: pytest
 Integration Testing: Postman (manual API testing) + cURL scripts (for API
request automation)
 Coverage: pytest-cov (for measuring code coverage)
 Monitoring Tools:
 System Monitoring: Task Manager (Windows)
 Python Profiler: cProfile (for profiling performance during local testing)
 GPU Monitoring: AMD Software: Adrenalin Edition (or equivalent AMD GPU monitoring tool)
 Cloud Environment (Optional for future scaling)
 Cloud Provider: AWS EC2 (optional for future scalability)
 Instance Type: GPU instance with an NVIDIA Tesla T4 (only used if testing extends to cloud deployment)
 Dataset:
 DFDC Test Set (Deepfake Detection Challenge dataset)
 Custom Real/Fake Video Dataset (Videos sourced from open-source datasets or
custom video collection)
 Test APIs: Postman + cURL scripts (For testing and API requests in a controlled
environment)
 Monitors and Profiling Tools
 System Monitoring:
 nvidia-smi: Used for monitoring GPU usage (only if an NVIDIA GPU is used in
cloud setups)
 Python Profiler: To identify bottlenecks in the code, especially during inference
and video processing stages.
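As a sketch of how cProfile can be applied during this profiling step (analyze_video is an assumed project entry point, not the system's fixed API):

import cProfile
import pstats

from detector.pipeline import analyze_video  # hypothetical project helper

# Profile one full prediction run and report the ten slowest call paths.
cProfile.run("analyze_video('samples/test_clip.mp4')", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)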
3.8 UNIT TESTING
3.8.1 Unit Test Cases
Unit test cases were designed to validate the correctness of core modules used in the
Deepfake Detection System. Each test focused on a single component, such as frame extraction,
face detection, feature extraction (Xception), and model classification (using Xception for
classification).
Tests were created based on expected inputs and outputs, using clearly defined criteria for
success such as output shape, probability range, or detection rate. As this project runs completely
offline, all test cases were executed within the local environment using Python-based test
frameworks.
The scope of unit tests was intentionally kept minimal and targeted, covering the most
critical functions needed for the end-to-end operation of the system.
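The sketch below shows the pattern used for the model-output checks: a lightweight stand-in module takes the place of the Xception classifier so the test remains runnable in isolation, while the assertions (output shape, probability range) mirror the criteria described above.

import torch
import torch.nn.functional as F

class StandInClassifier(torch.nn.Module):
    # Stand-in for the Xception classifier, used only to demonstrate the test pattern.
    def __init__(self):
        super().__init__()
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(3, 2)  # two logits: Real / Fake

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))

def test_output_shape_and_probability_range():
    model = StandInClassifier().eval()
    batch = torch.rand(4, 3, 299, 299)  # Xception-style 299x299 RGB crops
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=1)
    assert probs.shape == (4, 2)
    assert torch.all((probs >= 0) & (probs <= 1))
    assert torch.allclose(probs.sum(dim=1), torch.ones(4), atol=1e-5)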
3.8.2 Unit Test Results
Table 3.8: Unit Test Results
Module Pass Rate
Frame Extraction 100%
Face Detection (MTCNN) 96%
Feature Extraction (Xception) 100%
Model Output Validation 100%
Figure 3.1: Test Results
3.8.3 Test Execution
 All unit tests were executed manually or via locally scheduled scripts within the Windows
11 environment.
 The test framework used supports logging of results into readable formats, such as text or
HTML, for easy review.
 Test execution was carried out entirely within the local system, with no cloud-based or
internet-based testing tools involved.
 Failures, if any, were carefully analyzed and documented for internal debugging.
 Testing was strictly offline, ensuring complete isolation from external systems, with no
reliance on online platforms, CI/CD pipelines, or cloud deployment.
3.9 Integrated Testing
3.9.1 Integration Test Cases
Integration testing was conducted to validate the interaction between major system components
and ensure seamless end-to-end processing from video input to prediction output. The test cases
focused on the core operational flow:
 Video Input → Frame Splitting
Ensures that uploaded videos are correctly divided into frames.
 Frame → Face Detection → Tensor Conversion
Verifies face detection on each frame and proper feature extraction.
 Tensor → Model Prediction → Output Classification
Confirms the integration of deep feature analysis and classification via the Xception model.
 Final Output → JSON API Response with Prediction and Frame Confidence
Tests the ability of the backend to package and return prediction results via API.
These integration flows represent the core logic of the offline Deepfake Detection System and
were selected for their critical role in the inference pipeline.
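A condensed sketch of this flow is given below; the classifier is passed in as model (assumed to return two logits per face crop), and the frame stride and output field names are illustrative rather than the system's fixed design.

import cv2
import torch
from facenet_pytorch import MTCNN

def run_pipeline(video_path, model, frame_stride=10, device="cpu"):
    # Video -> frames -> face crops -> tensor batch -> classification -> JSON-ready dict.
    mtcnn = MTCNN(image_size=299, device=device)   # 299x299 crops match the Xception input size
    cap = cv2.VideoCapture(video_path)
    faces, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            face = mtcnn(rgb)                      # cropped face tensor, or None if no face found
            if face is not None:
                faces.append(face)
        idx += 1
    cap.release()
    if not faces:
        return {"prediction": "Unknown", "frames": []}
    batch = torch.stack(faces).to(device)
    with torch.no_grad():
        fake_probs = torch.softmax(model(batch), dim=1)[:, 1]   # per-frame probability of "Fake"
    verdict = "Fake" if fake_probs.mean().item() > 0.5 else "Real"
    return {"prediction": verdict,
            "frames": [{"index": i, "confidence": float(p)} for i, p in enumerate(fake_probs)]}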
3.9.2 Integration Test Results
Table 3.9: Test Results
Test Flow Status
Video to Frames Passed
Frames to Face Detection Passed
Face to Feature Vector (Xception) Passed
Features to Model Classification (Xception) Passed
Prediction to API Output Passed
 Overall Pass Rate: 100%
 API Response Time: Maintained acceptable latency under batch input tests
 Model Stability: No crashes or memory overflow observed during integration runs
Figure 3.2: System Execution
Figure 3.3: User Interface
Figure 3.4: Input of Real Video
Figure 3.5: Result of Real Video
Figure 3.6: Input of Fake Video
Figure 3.7: Result of Fake Video
Figure 3.8: Input Processed by the System
3.9.3 Test Execution
 All integration tests were run locally within the development system using Python scripts
and test videos.
 Inputs were provided as local MP4 files ranging from 10 to 60 seconds in duration.
 API calls were tested using tools like Postman and cURL to validate JSON responses and
latency (see the request sketch below).
 Results were recorded manually in logs stored on the local system.
 No external servers, CI/CD environments, or internet access were involved at any stage.
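The request sketch referenced above, using Python's requests library as a scripted alternative to Postman; the endpoint URL and the form-field name "file" are assumptions for illustration.

import requests

url = "http://127.0.0.1:5000/predict"            # local Flask endpoint (assumed host/port)
with open("samples/test_clip.mp4", "rb") as f:
    resp = requests.post(url, files={"file": f}, timeout=300)

resp.raise_for_status()                          # fail fast on non-2xx responses
payload = resp.json()
print(payload["prediction"], payload.get("confidence"))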
3.10 Validation Testing
Validation testing was conducted to ensure the system meets its core functional and non-
functional requirements in an offline, locally hosted environment. The tests verified real-world
usability, reliability, and performance when handling deepfake detection tasks on local video files.
3.10.1 Functional Validation
The following steps were executed to validate the core functionality of the system:
 Test Samples:
 Used real and fake video samples from publicly available datasets such as DFDC
(DeepFake Detection Challenge) and CelebDF, stored locally.
 A custom test set containing known real and tampered videos was also created to
evaluate real-world behavior.
 Validation Method:
 Each video was uploaded through the local API interface.
 The system output was manually inspected and programmatically compared
against the ground truth labels (Real/Fake).
 Prediction confidence scores and corresponding frame-level results were recorded
and reviewed.
 Acceptance Criteria:
 The system correctly identifies deepfake content with AUC ≥ 0.98 (a sketch of this check appears at the end of this subsection).
 Each prediction includes frame confidence and binary classification output (Real
or Fake).
 The system successfully validated functional requirements across all test samples without
internet access or online tools.
3.10.2 Non-Functional Validation
To ensure the system performs reliably under real-world conditions, the following non-
functional aspects were tested:
 Latency:
 Inference time was measured per frame using test videos ranging from 30s to 120s.
 The system maintained an average latency of ≤ 500 ms per frame on a local AMD
Ryzen 5 system with 16 GB RAM and a Radeon RX 6000 series GPU (a measurement sketch appears at the end of this subsection).
 Memory Usage:
 System memory usage remained within 8 GB during peak load, including video
loading, face detection, and model inference.
 Scalability (File Size):
 Successfully processed video files up to 300MB in size without crashing or
significant slowdowns.
 Offline Capability:
 All validations were completed on a local setup using:
 Windows 11 (64-bit)
 Python 3.9, PyTorch, OpenCV, Flask
 No online deployment, cloud computing, or API hosting
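The latency figure quoted above was obtained with simple wall-clock timing; a minimal sketch of that measurement is shown below, where predict_fn stands in for the project's single-frame inference call.

import time

def average_frame_latency_ms(frames, predict_fn):
    # Wall-clock time of one inference call per frame, averaged over the clip.
    durations = []
    for frame in frames:
        start = time.perf_counter()
        predict_fn(frame)
        durations.append(time.perf_counter() - start)
    avg_ms = 1000.0 * sum(durations) / len(durations)
    assert avg_ms <= 500, f"average latency {avg_ms:.1f} ms exceeds the 500 ms target"
    return avg_ms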
3.11 Defect Management
Defect management was conducted in a structured and manual format, using locally
maintained spreadsheets and logging systems (e.g., Excel) to track, classify, and resolve issues
discovered during testing. All debugging and issue resolution were done offline.
3.11.1 Identified Issues
Table 3.10: Identified Issues
ID Description Severity Status
DEF-01 Classification model fails on rotated face videos High Fixed
DEF-02 API crashes when handling unsupported/corrupt video formats Medium Fixed
DEF-03 Performance drops with videos above 1080p resolution Low Optimization pending
3.11.2 Resolution Status
 DEF-01:
o The Xception-based model was retrained using augmented datasets containing
rotated face samples.
o This improved the model's robustness and accuracy across varying face
orientations, eliminating classification failures.
 DEF-02:
o A pre-validation module was integrated using OpenCV to verify video file integrity
and supported formats before processing (a sketch of this check appears at the end of this subsection).
o This prevented crashes caused by unsupported or corrupt video files.
 DEF-03:
o Asynchronous batch processing was introduced to better manage higher-resolution
(above 1080p) video frames.
o Full optimization was deferred due to its limited impact on the core use case, which
primarily operates within 720p–1080p.
All issues were identified, resolved, and logged manually within the local development
environment. No cloud-based bug tracking or automation tools were used.
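A sketch of the pre-validation check introduced for DEF-02 is shown below; the accepted-extension list and function name are illustrative, and the integrity test simply verifies that OpenCV can decode a first frame.

import os
import cv2

SUPPORTED_EXTENSIONS = {".mp4"}          # formats accepted by the offline pipeline (illustrative)

def validate_video(path: str) -> bool:
    # Reject unsupported or corrupt files before they reach face detection and inference.
    if os.path.splitext(path)[1].lower() not in SUPPORTED_EXTENSIONS:
        return False
    cap = cv2.VideoCapture(path)
    ok, _ = cap.read()                   # a corrupt or unreadable file fails to yield a first frame
    cap.release()
    return ok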
3.12 Test Deliverables
The following deliverables were produced as part of the Deepfake Detection System's offline
testing process. All files were prepared and stored locally, with no reliance on cloud platforms or
external repositories:
 Test Case Document (.XLSX): Contains detailed unit and integration test cases, expected
outputs, and validation notes for every tested feature.
 PyTest Logs & Test Reports: Logs from all unit tests executed using pytest, including test
summaries, failure details, and assertion issues; saved locally for analysis.
 CI Pipeline Output (Local GitHub Actions Simulation): Simulated execution of a CI
pipeline through GitHub Actions, without actual cloud syncing; logs were archived for
version control and consistency.
 Model Evaluation Summary: A document that includes key performance metrics
(precision, recall, accuracy, AUC) derived from local validation datasets.
 Final Project Testing Report (PDF): A comprehensive offline report summarizing the test
strategy, results, performance benchmarks, and validation procedures for the entire testing
phase.
 Bug Tracker (Excel): A spreadsheet recording defects, including severity, current status,
and resolution steps, allowing easy tracking and debugging.
3.13 Conclusion
The Deepfake Detection System underwent a thorough offline testing process, which
included unit testing, integration testing, system-level validation, adversarial input handling, and
regression testing. All testing phases were conducted in a controlled, offline environment using
open-source tools and manually prepared datasets.
Key findings include:
 High Accuracy: Achieved ≥ 98% AUC on both real and deepfake video samples.
 Stable Performance: Consistent behavior across various input formats and video
resolutions.
 Robust Detection: The system successfully resisted basic adversarial manipulations, such
as blurring, rotation, and brightness alterations.
All critical defects were identified, documented, and resolved during the testing process. The
system is now operationally ready for use in local environments, making it suitable for research,
academic, or offline surveillance applications.
GLOSSARY
A deepfake refers to synthetic media where a person’s likeness is altered or swapped using
artificial intelligence techniques. To detect such manipulations, methods like MTCNN (Multi-
task Cascaded Convolutional Neural Network) are used for accurate face detection in images
or videos, which is essential in identifying manipulated faces in deepfakes. Additionally, LSTM
(Long Short-Term Memory), a type of recurrent neural network, is employed to analyze temporal
features and classify time-series data, making it particularly useful for capturing frame-level
temporal dependencies in deepfake detection. For high-accuracy feature extraction, ResNeXt, a
convolutional neural network (CNN) architecture, is often utilized in deepfake detection systems.
Lastly, FPS (Frames Per Second) is a key metric that indicates both the video’s frame rate and
the detection model’s processing speed, helping assess the system’s performance and efficiency.
REFERENCES
[1] Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated Residual Transformations
for Deep Neural Networks (ResNeXt). This paper introduced a powerful architecture for feature
extraction used in the detection system.
[2] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory (LSTM). This
foundational work on sequence modeling guided the temporal component of the detection system.
[3] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint Face Detection and Alignment using
Multi-task Cascaded Convolutional Networks (MTCNN). Provided the method for real-time,
accurate face detection.
[4] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition
(ResNet). Informed the design and optimization of deep residual networks for image-based feature
extraction.
[5] Facebook AI. (n.d.). DFDC Dataset – Deepfake Detection Challenge. Facebook AI released
this large-scale dataset to support training and evaluation of deepfake detection models by
providing a diverse set of real and manipulated videos.
Retrieved from: https://ai.facebook.com/datasets/dfdc
[6] Li, Y., Chang, M., & Lyu, S. (n.d.). Celeb-DF: A Large-scale Dataset for DeepFake Detection.
This dataset improves deepfake detection by providing high-quality real and synthetic videos of
celebrities.
Retrieved from: https://github.com/yuezunli/Celeb-DF
[7] OpenCV. (n.d.). Haar Cascade Classifier. Referenced as a traditional baseline face detection
technique in comparison to deep learning methods like MTCNN.
Retrieved from: https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html
[8] PyTorch. (n.d.). PyTorch Documentation. The deep learning framework used for implementing
and training the detection model, offering tools for tensor operations, GPU acceleration, and model
building.
Retrieved from: https://pytorch.org/docs/
[9] OpenCV. (n.d.). OpenCV Documentation. A computer vision library used for video frame
extraction, face detection, and preprocessing before deepfake analysis.
Retrieved from: https://docs.opencv.org/
[10] Docker. (n.d.). Docker Documentation. Used to containerize the development and deployment
environments, ensuring consistency across systems.
Retrieved from: https://docs.docker.com/
[11] Flask. (n.d.). Flask Documentation. A lightweight web framework used to build the API for
uploading videos and serving predictions.
Retrieved from: https://flask.palletsprojects.com/
[12] ONNX. (n.d.). Open Neural Network Exchange. Used for optimizing and deploying models
across platforms like AWS SageMaker or edge devices.
Retrieved from: https://onnx.ai/
[13] AWS. (n.d.). Amazon SageMaker Documentation. Platform used for scalable model
deployment and inference.
Retrieved from: https://aws.amazon.com/sagemaker/
APPENDICES
Appendix A – Dataset Details: All datasets were downloaded and prepared locally, without the
use of APIs or cloud access. The videos were preprocessed and stored in the appropriate directories
for test execution. The main datasets used were DFDC, which contains approximately 120,000
videos in 720p or 1080p MP4 format, and CelebDF, with around 10,000 720p MP4 videos. For
testing, subsets of these datasets were created to match the system’s RAM (16 GB) and GPU
capabilities (AMD Radeon RX 6000 series).
Dataset Number of Videos Resolution Format
DFDC ~120,000 720p/1080p MP4
CelebDF ~10,000 720p MP4
Appendix B – API Endpoint Specification (Local Use Only): The API endpoint for prediction
was designed as follows:
 Endpoint: /predict
 Method: POST
 Input: An MP4 video file, which can be uploaded via a local interface or tested using tools
like Postman or cURL.
 Output: A JSON file containing the prediction (Real or Fake), confidence score, and a
breakdown of analyzed frames.