
DEEPFAKE DETECTION USING DEEP LEARNING
Submitted by

VIGNESHWAR P (810422243119)
SABARI KANNAN R (810422243088)
SAYOOJ KUMAR V.S (810422243095)

of
DHANALAKSHMI SRINIVASAN ENGINEERING COLLEGE (AUTONOMOUS)
PERAMBALUR – 621 212

A MINI PROJECT REPORT

Submitted to the

FACULTY OF INFORMATION AND COMMUNICATION ENGINEERING

In partial fulfillment of the requirements


for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

ANNA UNIVERSITY CHENNAI - 600 025


MAY 2025

BONAFIDE CERTIFICATE

Certified that this mini project report titled “DEEPFAKE DETECTION USING DEEP LEARNING” is the bonafide work of “VIGNESHWAR P (810422243119), SABARI KANNAN R (810422243088), SAYOOJ KUMAR V.S (810422243095)”, who carried out the research under my supervision. Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE
SUPERVISOR
Mr. SARAVANAN S, M.Tech., M.B.A., (Ph.D.)
Assistant Professor,
Department of Artificial Intelligence and Data Science,
Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur – 621 212.

SIGNATURE
HEAD OF THE DEPARTMENT
Dr. K.V.M. SHREE, M.E., Ph.D.
Professor & Head,
Department of Artificial Intelligence and Data Science,
Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur – 621 212.

Submitted for Mini Project Viva-Voce Examination held on ______________

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

It is with immense pleasure that I present my first venture in the field of real application of computing in the form of project work. First, I am indebted to the Almighty for His choicest blessings showered on me in completing this endeavor.

I express my sincere thanks to Shri. A. SRINIVASAN, Chancellor, Dhanalakshmi Srinivasan University, for having given me an opportunity to study in this institution.

I would also like to acknowledge Dr. SHANMUGASUNDARAM, M.E., Ph.D., Principal, and Dr. K. ANBARASAN, M.E., Ph.D., Dean, Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur, for the moral support and encouragement they have rendered throughout the course. I express my sincere thanks to the Head of the Department, Dr. K.V.M. SHREE, M.E., Ph.D., for having provided us with all the necessary specifications.

I owe my heartfelt thanks to my internal guide, Mr. S. SARAVANAN, M.Tech., M.B.A., (Ph.D.), for his guidance and suggestions during this project work.

We render our thanks to all the staff members and programmers of the Department of ARTIFICIAL INTELLIGENCE AND DATA SCIENCE for their timely assistance.

ABSTRACT

With the rise of manipulated media, deepfake content has become a serious concern in digital security, social media, and public trust. These synthetic images and videos, generated using AI, can be indistinguishable from real ones to the human eye, posing significant ethical and security threats. Manual identification of deepfake images is not only challenging but also impractical at scale. Human evaluation is time-consuming, prone to error, and insufficient to counter the rapid spread of fake content online. This calls for an automated, reliable solution to detect deepfakes accurately and efficiently. This project proposes the use of Convolutional Neural Networks (CNNs), a powerful class of deep learning models, to automate the detection of deepfake images. The system is trained on a labeled dataset of real and fake images, using a custom-built CNN architecture consisting of convolutional, max-pooling, and dense layers. The model takes preprocessed 128×128 RGB images as input and is trained using binary cross-entropy loss with the Adam optimizer. The system achieves high classification accuracy and effectively distinguishes between real and synthetic images. The proposed CNN-based deepfake detection system provides a fast and scalable solution for identifying manipulated images. It can serve as a valuable tool in digital forensics, content moderation, and media authentication, helping reduce the spread of misinformation and enhancing online trust.

TABLE OF CONTENTS

CHAPTER NO TITLE PAGE NO

ABSTRACT iv

LIST OF FIGURES vii

LIST OF ABBREVIATIONS viii

1 INTRODUCTION 1

1.1 INTRODUCTION 1

1.2 PURPOSE 1

1.3 PROBLEM STATEMENT 2

1.4 MOTIVATION 2

1.5 OBJECTIVES 3

2 LITERATURE SURVEY 4

3 SYSTEM ANALYSIS 6

3.1 EXISTING SYSTEM 6

3.2 DISADVANTAGES OF EXISTING SYSTEM 6

3.3 PROPOSED SYSTEM 6

3.4 ADVANTAGES OF PROPOSED SYSTEM 7

4 SYSTEM SPECIFICATIONS 8

4.1 HARDWARE REQUIREMENTS 8

4.2 SOFTWARE REQUIREMENTS 8

5 SYSTEM IMPLEMENTATION 9

5.1 LIST OF MODULES 9

5.2 MODULE DESCRIPTION 9

5.2.1 DATASET COLLECTION 9

5.2.2 DATA PREPROCESSING 9

5.2.3 MODEL DESIGN AND TRAINING 10

5.2.4 DEEPFAKE DETECTION 10

5.2.5 EVALUATION AND RESULT VISUALIZATION 10

6 SYSTEM DESIGN 12

6.1 SYSTEM ARCHITECTURE 12

6.2 USE CASE DIAGRAM 13

6.3 CLASS DIAGRAM 14

6.4 SEQUENCE DIAGRAM 15

6.5 ACTIVITY DIAGRAM 16

7 SOFTWARE DESCRIPTION 17

7.1 OVERVIEW 17

7.2 SOFTWARE MODULES 17

7.2.1 DATA INGESTION MODULE 17

7.2.2 PREPROCESSING MODULE 17

7.2.3 CNN MODEL MODULE 17

7.2.4 TRAINING AND VALIDATION MODULE 18

7.2.5 INFERENCE MODULE 18

7.2.6 EXPLAINABILITY MODULE 18

7.2.7 STREAMLIT INTERFACE MODULE 18

7.2.8 UTILITY MODULE 18

7.3 FRAMEWORK OVERVIEW 19

7.4 FEATURES 19

8 SOFTWARE TESTING 20

8.1 AIM OF TESTING 20

8.2 TEST CASES 20

8.2.1 VALID IMAGE WITH CLEAR FACE 20

8.2.2 VALID DEEPFAKE IMAGE 20

8.2.3 NO FACE IN IMAGE 20

8.2.4 LOW CONFIDENCE OUTPUT 21

8.2.5 LARGE FILE HANDLING 21

8.2.6 ADVERSARIAL INPUT 21

8.3 TYPES OF TESTING 21

8.3.1 UNIT TESTING 21

8.3.2 INTEGRATION TESTING 21

8.3.3 FUNCTIONAL TESTING 22

8.3.4 REGRESSION TESTING 22

8.3.5 PERFORMANCE TESTING 23

8.3.6 USABILITY TESTING 23

8.3.7 BLACK BOX TESTING 23

8.3.8 WHITE BOX TESTING 24

8.3.9 OUTPUT TESTING 24

8.3.10 USER ACCEPTANCE TESTING 24

8.4 TESTING TOOLS USED 25

8.5 MODEL VALIDATION METRICS 25

8.6 ERROR HANDLING 26

9 CONCLUSION AND FUTURE ENHANCEMENT 28

9.1 CONCLUSION 28

9.2 FUTURE ENHANCEMENT 28

APPENDICES 29

APPENDIX 1 - SOURCE CODE 30

APPENDIX 2 - SCREENSHOTS 35

REFERENCES 42

LIST OF FIGURES

FIGURE NO TITLE PAGE NO

6.1 SYSTEM ARCHITECTURE 12

6.2 USE CASE DIAGRAM 13

6.3 CLASS DIAGRAM 14

6.4 SEQUENCE DIAGRAM 15

6.5 ACTIVITY DIAGRAM 16

10.1 STREAMLIT WEBPAGE FOR DEEPFAKE DETECTION 35

10.2 DEEPFAKE DETECTION PREDICTION 1: REAL 36

10.3 DEEPFAKE DETECTION PREDICTION 2: REAL 37

10.4 DEEPFAKE DETECTION PREDICTION 3: REAL 38

10.5 DEEPFAKE DETECTION PREDICTION 4: DEEPFAKE 39

10.6 DEEPFAKE DETECTION PREDICTION 5: DEEPFAKE 40

10.7 DEEPFAKE DETECTION PREDICTION 6: DEEPFAKE 41

LIST OF ABBREVIATIONS

ABBREVIATION FULL FORM

AI - ARTIFICIAL INTELLIGENCE

GANS - GENERATIVE ADVERSARIAL NETWORKS

CNN - CONVOLUTIONAL NEURAL NETWORK

DFDC - DEEPFAKE DETECTION CHALLENGE

LSTM - LONG SHORT-TERM MEMORY

3D CNNS - THREE-DIMENSIONAL CONVOLUTIONAL NEURAL NETWORKS

MTCNN - MULTI-TASK CASCADED CONVOLUTIONAL NEURAL NETWORKS

ROC-AUC - RECEIVER OPERATING CHARACTERISTIC - AREA UNDER CURVE

CPU - CENTRAL PROCESSING UNIT

GPU - GRAPHICS PROCESSING UNIT

RAM - RANDOM ACCESS MEMORY

SSD - SOLID STATE DRIVE

OS - OPERATING SYSTEM

VS CODE - VISUAL STUDIO CODE

JSON - JAVASCRIPT OBJECT NOTATION

UI - USER INTERFACE

API - APPLICATION PROGRAMMING INTERFACE

OPENCV - OPEN SOURCE COMPUTER VISION LIBRARY

TPU - TENSOR PROCESSING UNIT

BCE - BINARY CROSS ENTROPY

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

In recent years, the rise of deepfakes—synthetically altered or generated images and videos
using artificial intelligence—has emerged as a major threat to digital authenticity. These
manipulated media can be used maliciously to spread misinformation, commit fraud, or manipulate
public opinion. Deepfakes are typically generated using techniques such as Generative Adversarial
Networks (GANs), which create realistic but fake content that is difficult for humans to distinguish
from authentic material.

As the quality of deepfakes continues to improve, the challenge of detecting them becomes
increasingly complex. This has led to a surge of interest in developing reliable and automated
systems that can detect such manipulations. Deep learning, a subset of artificial intelligence that
excels at pattern recognition, offers promising tools for tackling this problem. By training neural
networks on large datasets of real and fake media, models can learn subtle artifacts and
inconsistencies indicative of tampering.

This project aims to build an effective deepfake detection system using deep learning
techniques. The proposed system analyzes facial and visual features in video frames and uses a
convolutional neural network (CNN) model to classify content as real or fake. This technology has
applications in digital forensics, social media moderation, and media verification.

1.2 PURPOSE

The purpose of this project is to design and implement a deep learning-based system
capable of accurately detecting deepfake videos and images. With the increasing availability of
tools that allow even non-experts to create highly realistic fake media, the integrity of digital
content has become a growing concern. This project aims to combat the misuse of such synthetic
content by developing a reliable and automated solution that can identify tampered visuals through
analysis of facial and visual inconsistencies. By leveraging convolutional neural networks (CNNs)
and other deep learning architectures, the system is expected to detect subtle artifacts introduced
during the generation of deepfakes. Beyond technical implementation, the broader purpose of this
work is to support digital forensics, social media moderation, and public awareness efforts by
offering a scalable method for verifying the authenticity of digital media. This contributes to
safeguarding individuals, organizations, and societies from the potentially harmful consequences
of misinformation, impersonation, and fraud caused by deepfakes.

1.3 PROBLEM STATEMENT

Deepfakes present a serious threat to digital content authenticity, with potentially severe
implications for individuals, corporations, and governments. These AI-generated videos and
images can be manipulated to falsely represent people saying or doing things they never did,
leading to misinformation, defamation, identity theft, and political manipulation. The quality of
deepfakes has advanced to the point where they are nearly indistinguishable from genuine media,
making manual detection by human observers unreliable. Existing detection methods are often
limited in scope, lack real-time performance, and struggle to keep pace with rapidly evolving
deepfake generation techniques. Moreover, traditional forensic analysis techniques are time-
consuming and require expert intervention, making them unsuitable for large-scale content
verification. This project addresses these challenges by developing an intelligent, automated
system that uses deep learning to detect deepfakes with high accuracy. By training models on a
combination of real and fake media datasets, the system aims to identify subtle features that
distinguish authentic content from manipulated media, thus helping to restore trust in digital
communications.

1.4 MOTIVATION

The motivation behind this project stems from the growing misuse of AI-generated content
in malicious contexts such as political misinformation, fake news, financial scams, and personal
reputation damage. By developing a deepfake detection model, we can help combat these threats
and promote trust in digital content, while contributing to AI accountability and responsible media
practices.

1.5 OBJECTIVES

• To study and understand deepfake generation and detection techniques.
• To collect and preprocess a dataset consisting of real and fake images/videos.
• To implement and train a convolutional neural network (CNN) or other deep learning model for detection.
• To evaluate the model’s performance using accuracy, precision, recall, and F1-score.
• To build a prototype system that can flag potential deepfakes.

CHAPTER 2
LITERATURE SURVEY

With the rapid development of generative adversarial networks (GANs) and related
technologies, deepfakes have become one of the most challenging threats to digital media
authenticity. Consequently, researchers have actively explored various techniques for detecting
such manipulations using machine learning and deep learning approaches. This chapter reviews
significant existing work in the field of deepfake detection.

Chollet (2017) introduced XceptionNet, a deep convolutional neural network architecture that later became a foundational model for deepfake detection. Trained on datasets like FaceForensics++, XceptionNet proved highly effective in identifying manipulated facial regions due to its depthwise separable convolutions and robust feature extraction capabilities.

Afchar et al. (2018) proposed MesoNet, a lightweight CNN architecture designed specifically for detecting deepfakes in compressed video formats. The model showed that deepfake content often contains subtle inconsistencies in mesoscopic features, which can be effectively captured by shallow neural networks for classification.

Nguyen et al. (2019) explored the use of capsule networks for deepfake detection,
highlighting their ability to preserve spatial hierarchies in facial structures. Their work showed
promise in scenarios where traditional CNNs struggled due to geometric transformations.

Li et al. (2020) proposed Face X-ray, a technique that identifies blending artifacts in
deepfakes by treating the problem as an image segmentation task. This method detects whether a
given image contains a combination of two facial regions—a common trait in face-swapping
deepfakes.

The DeepFake Detection Challenge (DFDC), launched by Facebook and hosted on Kaggle, provided one of the largest benchmark datasets for evaluating detection models. This challenge spurred innovation in the development of models that could generalize across different deepfake generation techniques and compression levels.

More recent approaches have leveraged transformer-based models and attention
mechanisms to improve detection accuracy. These models focus on capturing long-range
dependencies and facial expressions more effectively, which are often difficult to fake consistently
across frames.

Despite these advancements, many studies have identified generalization as a key challenge: models trained on specific datasets often struggle when tested on new, unseen deepfake generation methods. This highlights the need for robust, adaptive detection frameworks.

CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

In the current landscape, deepfake detection systems face significant challenges due to the
rapid evolution of deepfake generation techniques. Many existing systems rely on manual
observation or traditional video forensics, which are time-consuming, inconsistent, and often
ineffective against high-quality deepfakes. While several machine learning approaches have been
introduced, many of them lack generalization capability and fail to maintain high accuracy across
various deepfake datasets.

Earlier detection methods depended on handcrafted features such as inconsistencies in eye blinking, head pose, or lighting conditions. Although useful in some cases, these approaches are limited by their dependence on specific artifacts, making them vulnerable to more advanced or novel deepfake techniques that do not exhibit those artifacts. Moreover, these systems typically lack scalability and cannot process large volumes of content efficiently.

3.2 DISADVANTAGES OF EXISTING SYSTEMS

• Rely heavily on specific artifacts or features.
• Poor generalization to unseen deepfake generation methods.
• Require manual intervention or expert knowledge.
• Lack real-time processing capabilities.
• Struggle with performance on compressed or low-resolution media.

3.3 PROPOSED SYSTEM

The proposed system leverages deep learning, specifically convolutional neural networks
(CNNs), to detect deepfake content based on learned features rather than manually crafted ones.
This system is designed to automatically analyze image or video frames and identify subtle facial
distortions, pixel-level inconsistencies, or blending artifacts commonly found in manipulated
media. By training the model on a diverse dataset of real and fake videos/images, the system aims
to achieve high accuracy and robustness.

The architecture of the system may include a pretrained model (e.g., XceptionNet,
EfficientNet, or ResNet) fine-tuned on deepfake datasets such as FaceForensics++, DFDC, or
Celeb-DF. The model extracts deep visual features from each frame and classifies them as "real"
or "fake." The system can be extended to include temporal models (e.g., LSTM or 3D CNNs) to
analyze video frame sequences and improve detection based on motion inconsistencies.

This solution is scalable, fast, and capable of detecting both known and emerging deepfake types. It can be deployed in content moderation systems, mobile apps, or browser extensions to verify media authenticity in real time.
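
As an illustration, a minimal Keras sketch of the fine-tuning approach described above is shown below. The choice of XceptionNet, the 224×224 input size, the frozen base, and the dropout rate are assumptions for the example, not the only possible configuration:

import tensorflow as tf
from tensorflow.keras import layers, models

# Load XceptionNet pretrained on ImageNet, without its classification head
base = tf.keras.applications.Xception(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze pretrained features for the initial fine-tuning pass

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')  # real (0) vs. fake (1)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

After the new head converges, the base can be unfrozen and trained at a lower learning rate to adapt the pretrained features to deepfake artifacts.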

3.4 ADVANTAGES OF THE PROPOSED SYSTEM

• Learns features automatically from large datasets.
• Performs well even on high-quality or compressed deepfakes.
• Can generalize across various manipulation techniques.
• Suitable for large-scale deployment in real-world scenarios.
• Supports both image- and video-based detection.

CHAPTER 4
SYSTEM SPECIFICATIONS

4.1 HARDWARE REQUIREMENTS

• Processor : Intel i5 / AMD Ryzen 5

• RAM : 8-16 GB

• Storage : 256 GB

• Keyboard : Standard keyboard

• Monitor : 15-inch color monitor

4.2 SOFTWARE REQUIREMENTS

• Operating systems : Windows 10/11, Ubuntu 20.04+

• Programming language : Python 3.8+

• Python libraries : NumPy, Pandas, OpenCV, Matplotlib, scikit-learn, TensorFlow

• Development tools : VS Code / Jupyter Notebook

• Dataset : Kaggle deepfake detection dataset

• Web-based interface : Streamlit / Django / Flask

CHAPTER 5
SYSTEM IMPLEMENTATION

5.1 LIST OF MODULES

1. Dataset Collection
2. Data Preprocessing
3. Model Design and Training
4. Deepfake Detection
5. Result Evaluation and Visualization

5.2 MODULE DESCRIPTION

5.2.1. Dataset Collection

The Dataset Collection module involves gathering a diverse and comprehensive dataset
that includes both real and deepfake images or video frames. Public datasets such as
FaceForensics++, DFDC, or Celeb-DF are often used for this purpose. These datasets provide a
wide range of manipulated content, ensuring variety in terms of facial expressions, lighting
conditions, backgrounds, and manipulation techniques. The goal is to collect enough data to train
a robust model capable of generalizing to different types of deepfake content.

5.2.2. Data Preprocessing

This module is responsible for preparing the collected data for training. It includes
extracting frames from video files, detecting and aligning faces using tools like MTCNN or
OpenCV, resizing images to the desired input size for the CNN (typically 224x224 pixels), and
normalizing pixel values. To enhance model generalization and reduce overfitting, data
augmentation techniques such as rotation, flipping, brightness adjustment, and noise addition are
also applied. Proper preprocessing ensures consistency and improves the efficiency of the model
training process.
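
A minimal sketch of the face extraction and normalization step using OpenCV's Haar cascade detector is shown below (the report also mentions MTCNN as an alternative). The 128×128 target size matches the model in Appendix 1, and the single-face assumption is illustrative:

import cv2

# Haar cascade face detector shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def extract_face(image_path, size=(128, 128)):
    img = cv2.imread(image_path)
    if img is None:
        return None  # unreadable file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found
    x, y, w, h = faces[0]  # take the first detected face
    face = cv2.resize(img[y:y + h, x:x + w], size)
    return face.astype('float32') / 255.0  # normalize pixel values to [0, 1]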

5.2.3. Model Design and Training

In this module, a Convolutional Neural Network (CNN) is designed and trained for binary
classification—determining whether an input is real or fake. This can involve building a custom
CNN architecture or fine-tuning a pre-trained model such as VGG16 or ResNet. The model is
trained using a binary crossentropy loss function and an optimizer like Adam. The dataset is split
into training, validation, and test sets to monitor the model's performance and prevent overfitting.
Training is carried out over multiple epochs, and key metrics such as training accuracy and loss
are recorded.
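
A brief sketch of the split-and-train step described above, assuming X and y hold the preprocessed images and labels and model is the compiled CNN (the split ratios are illustrative):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing, then 10% of the remainder for validation
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.1, stratify=y_tmp, random_state=42)

history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                    validation_data=(X_val, y_val))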

5.2.4. Deepfake Detection

Once the model is trained, it is used in this module to classify new or unseen media. The
input image or video frame undergoes the same preprocessing steps and is then passed through the
CNN model to predict the likelihood of it being real or fake. Based on the output probability, the
system labels the input accordingly. This module represents the core functionality of the system—
real-time or batch detection of deepfakes using the trained model.

5.2.5 Evaluation and Result Visualization

The final module of the system is designed to thoroughly evaluate model performance and present
the results through intuitive and insightful visualizations. It begins with the confusion matrix,
which clearly outlines the distribution of true positives, true negatives, false positives, and false
negatives, helping identify the model’s strengths and the nature of its misclassifications. To further
assess the quality of the classification, the module includes a Receiver Operating Characteristic
(ROC) curve, which demonstrates the trade-off between sensitivity and specificity across different
threshold values, providing a visual guide for selecting an optimal decision boundary.

In addition, accuracy and loss graphs are plotted to monitor the model’s training process
over time, comparing training and validation metrics to detect issues like overfitting or
underfitting.
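
For illustration, these evaluation plots could be produced with scikit-learn and Matplotlib along the following lines, assuming model, X_test, and y_test from the training step:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve, auc

y_prob = model.predict(X_test).ravel()   # sigmoid probabilities
y_pred = (y_prob > 0.5).astype(int)      # threshold at 0.5

print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class

fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc(fpr, tpr):.3f})')
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()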

As an added feature, an optional interactive interface is available, allowing users to upload
video files and observe real-time detection results. This interface displays visual outputs such as
bounding boxes and labels on each frame, offering a hands-on, user-friendly way to test and
explore the system’s functionality. Collectively, these tools not only deliver a complete evaluation
of the model’s performance but also enhance its interpretability and accessibility for both technical
and non-technical users.

CHAPTER 6
SYSTEM DESIGN

6.1 SYSTEM ARCHITECTURE

Figure 6.1 System Architecture

6.2 USE CASE DIAGRAM

Figure 6.2 Use case diagram

6.3 CLASS DIAGRAM

Figure 6.3 Class diagram

6.4 SEQUENCE DIAGRAM

Figure 6.4 Sequence diagram

6.5 ACTIVITY DIAGRAM

Figure 6.5 Activity diagram

CHAPTER 7
SOFTWARE DESCRIPTION

7.1 OVERVIEW

Deepfake technology is becoming increasingly sophisticated, posing serious threats to authenticity and trust in digital media. To combat this, the proposed system uses deep learning techniques to detect manipulated media. The solution is based on Convolutional Neural Networks (CNNs), which excel at recognizing visual patterns in images and videos.

The project leverages Python due to its robust ecosystem of AI and image processing
libraries. Using datasets containing both genuine and deepfake videos (e.g., FaceForensics++,
Celeb-DF), the system is trained to distinguish real from fake content. It provides a seamless
pipeline—from media upload to prediction output—through modular components.

7.2 SOFTWARE MODULES

7.2.1. Data Ingestion Module

The Data Ingestion Module handles the loading of datasets and the extraction of frames
from video files. It organizes data into training, validation, and test sets while managing associated
labels, enabling the model to learn from real-world examples of deepfakes.

7.2.2. Preprocessing Module

The Preprocessing Module is responsible for preparing the input data. It detects and crops
faces from images or frames using face detection libraries like MTCNN or dlib, then resizes and
normalizes them. This module also performs data augmentation to enhance model generalization.

7.2.3. CNN Model Module

The CNN Model Module defines the structure of the neural network used for classification.
It allows for the use of custom or pre-trained models, such as Xception or EfficientNet, and
includes functionality for compiling, training, saving, and loading the model architecture and
weights.

7.2.4. Training & Validation Module

The Training & Validation Module manages the model training loop, monitors
performance metrics like accuracy and F1 score, and applies callbacks such as early stopping. This
module ensures the model learns effectively while avoiding overfitting.

7.2.5. Inference Module

The Inference Module is used to predict whether new input media is real or fake. It
processes the input image or video, extracts faces, and applies the trained CNN model. It returns a
classification label with a confidence score.

7.2.6. Explainability Module

The Explainability Module enhances trust in model predictions by generating visual interpretations. Using Grad-CAM, it highlights the regions of the face that influenced the model’s decision, which can help users understand why an image or video was flagged.
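
A condensed Grad-CAM sketch for a Keras CNN is shown below; the layer name 'last_conv' is a placeholder for the model's final convolutional layer, and the code assumes a sigmoid output where higher scores mean "fake":

import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name='last_conv'):
    # image: preprocessed batch of shape (1, H, W, 3)
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(image)
        score = prediction[:, 0]                      # "fake" score
    grads = tape.gradient(score, conv_out)            # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # per-channel importance
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)                             # keep positive influence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)           # normalize to [0, 1]
    return cam.numpy()  # upscale to image size and overlay as a heatmap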

7.2.7. Streamlit Interface Module

The Streamlit Interface Module provides a lightweight, user-friendly interface where users
can upload images or videos and receive deepfake predictions in real time. It displays results,
confidence levels, and visualizations directly in the browser.

7.2.8. Utility Module

The Utility Module supports various background tasks such as configuration management,
file handling, logging, and formatting. It helps streamline development and debugging by
consolidating reusable functions and settings in one place.

7.3 FRAMEWORK OVERVIEW

TensorFlow / Keras

• Offers high-level APIs for rapid prototyping of CNNs.
• Used for model training, validation, and deployment.
• Transfer learning from XceptionNet significantly boosts accuracy and speeds up training.

OpenCV

• Essential for video handling and image manipulation.
• Detects and crops face regions from images.
• Captures webcam streams in real-time detection mode (see the sketch below).
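
For the webcam mode, a capture loop along these lines is typical (a sketch; classify_frame stands in for the hypothetical preprocessing-plus-prediction step):

import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # label = classify_frame(frame)  # hypothetical: detect face, preprocess, predict
    cv2.imshow('Deepfake detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()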

Streamlit / Flask (Optional UI)

• Enables simple deployment of the model as a web app.
• Users can upload videos, trigger detection, and view results in the browser.
• Easy to integrate backend Python logic with frontend controls.

7.4 FEATURES

• High Detection Accuracy: Leverages state-of-the-art CNN models trained on large datasets.
• Real-time Processing: Supports on-the-fly frame analysis using a webcam (optional).
• Modular Design: Clear separation of components allows easy debugging and enhancement.
• Dataset Flexibility: Compatible with multiple datasets for robust training and benchmarking.
• Explainability: Possibility to integrate Grad-CAM for heatmap visualization (optional enhancement).
• Lightweight Deployment: With tools like TensorFlow Lite, the model can be optimized for edge devices.

CHAPTER 8
SOFTWARE TESTING

8.1 AIM OF TESTING

Software testing in the context of deepfake detection is critical to ensure that the system
accurately and reliably differentiates between real and manipulated media. The aim is to identify
defects in the model logic, data preprocessing, and the user interface, and to validate that the deep
learning model generalizes well across unseen data. This chapter details the testing approaches
applied across all levels of the system to validate functionality, performance, and usability.

8.2 TEST CASES

8.2.1. Valid Image with Clear Face:

This test case involves providing a high-quality image with a clearly visible human face. The input
image should be in a supported format like .jpg or .png. The purpose is to verify that the model
accurately identifies real faces. The expected result is a "Real" prediction with high confidence,
typically above 90%. This confirms the model performs well with ideal input conditions.

8.2.2. Valid Deepfake Image:

In this case, the model is tested using a confirmed deepfake image. The goal is to ensure the CNN
correctly classifies fake content. The model should return a prediction of "Fake" with high
confidence, validating that it has learned to distinguish synthetic facial features and manipulation
artifacts effectively.

8.2.3. No Face in Image:

This test evaluates the system’s response when an image without a human face is submitted. For
example, images of landscapes, objects, or animals can be used. The model should return an error
message like "No face detected," demonstrating that the face detection preprocessing step is
functioning correctly and that unnecessary processing is avoided.

8.2.4. Low Confidence Output:

To test how the model handles uncertainty, a low-quality, blurry, or partially obscured face image is input. The model should still attempt a prediction but may return an output flagged as a "Low confidence prediction." This helps the user understand that the model is uncertain and provides guidance for corrective action.

8.2.5. Large File Handling:

A very high-resolution image or a 4K video is used as input to assess how the model handles large
data. The goal is to ensure the system does not crash due to memory overload or processing
timeouts. The output should still be correct, and the performance should remain stable, confirming
system scalability.

8.2.6. Adversarial Input:

This advanced test uses adversarial examples—images that are subtly modified to confuse the
model, often with added noise or slight distortions. The goal is to check if the model is robust
against minor perturbations. Ideally, the system should still classify the input as "Fake" if it's
indeed a deepfake, showing resilience against manipulation.

8.3 TYPES OF TESTING

In the development of a robust and reliable media classification system, particularly one aimed at detecting whether images or videos are real or manipulated (e.g., deepfakes), a comprehensive and methodical testing strategy is essential. Testing ensures the system meets performance, usability, and accuracy standards while minimizing the risk of errors and inconsistencies in real-world applications. Below is an extended overview of the key types of testing employed to validate such a system:

8.3.1. Unit Testing

Unit testing involves validating individual components or functions of the system in isolation to ensure they operate correctly on their own. In the context of this media classification system, unit tests are written for core functions such as frame extraction, face detection, image preprocessing, and model prediction. Each of these components is tested using Python’s unittest framework and assertions, which check whether the actual output matches the expected result for various test cases. For example, the face detection function may be tested by passing in an image with a known face and verifying that it returns the correct bounding box, as in the sketch below. Unit testing enables early detection of bugs, simplifies debugging, and helps maintain code quality during ongoing development.

8.3.2. Integration Testing

Once individual components have been tested, integration testing is conducted to ensure
that these modules interact correctly when combined. This type of testing examines the
flow of data across modules—specifically from the frame extractor to the preprocessing
unit, then to the deep learning model, and finally to the output generation system. The main
goal is to verify that each component correctly passes formatted and expected data to the
next. For example, integration testing checks whether the preprocessed face images output
by one module are properly formatted and compatible with the input expected by the
classification model. This helps detect interface mismatches, improper data handling, and
communication failures between modules.

8.3.3. Functional Testing

Functional testing evaluates whether the overall system behaves as expected from a user’s
perspective. This includes testing the complete pipeline starting from the user uploading
an image or video, followed by system processing, classification (real or fake), and finally
the display of results on the interface. Test scenarios include valid uploads, invalid file
types, corrupted media, and edge cases like extremely small or blurry faces. The goal is to
ensure the system meets functional requirements, such as successful file uploads, accurate
deepfake detection, and timely feedback. This testing is vital in validating the end-to-end
functionality of the system in realistic usage scenarios.

8.3.4. Regression Testing

As the system evolves, with new features added or existing algorithms improved,
regression testing ensures that these changes do not unintentionally disrupt previously
working functionality. For example, if the face detection algorithm is enhanced or the
model is retrained for better accuracy, regression tests are used to retest all critical
features—such as correct image classification and proper result rendering—that were
already working in earlier versions. Automated test scripts are often used for this purpose
to quickly verify that nothing has been broken in the process of updates.

8.3.5. Performance Testing

Performance testing evaluates the system’s responsiveness, efficiency, and scalability under various workloads. Key metrics include the average prediction time (ideally less than 1 second per image) and the model loading time (targeted at under 3 seconds). This type of testing also simulates multiple users uploading media simultaneously to check how well the system performs under stress.

Tools and scripts are used to simulate concurrent uploads and track the system’s ability to
maintain consistent response times, handle memory efficiently, and recover from overload
situations. A well-performing system ensures users experience minimal delays even during
peak usage.

8.3.6. Usability Testing

Usability testing focuses on the design and user interface of the application, ensuring that
it is intuitive and accessible for users with varying levels of technical expertise. Test
participants are asked to perform common tasks such as uploading files, interpreting
results, and troubleshooting errors. During testing, evaluators look for signs of confusion,
difficulty, or hesitation. Elements such as clear instructions, helpful tooltips, informative
error messages, and easy navigation are essential. For instance, if a user uploads an
unsupported file format, the system should provide a clear message indicating the accepted
formats. Based on usability feedback, the interface is adjusted to ensure a smooth and user-
friendly experience.

8.3.7. Black Box Testing

Black box testing treats the system as a "black box" where the internal code and
architecture are not considered. Instead, testing focuses purely on inputs and outputs.
Testers provide a variety of input media files and observe the output (real or fake
classification, error messages, etc.) to ensure correctness. They also evaluate how the
system responds to unexpected or incorrect input, such as uploading text files or extremely
large videos. The goal is to ensure the application behaves correctly and predictably from
a user's perspective, regardless of the underlying implementation.

8.3.8. White Box Testing

In contrast to black box testing, white box testing involves a detailed examination of the
internal workings of the system. This includes checking the structure of the code, data
transformations, model layer outputs, and normalization processes. For example, testers
may verify that pixel values are normalized to the correct range before being fed into the
model, or that the intermediate outputs of convolutional layers fall within expected
distributions. This kind of testing is particularly useful for debugging and optimizing model
performance and verifying that the architecture and data handling conform to design
specifications.

8.3.9. Output Testing

Output testing validates the accuracy and clarity of the final system outputs. The
classification results (i.e., "Real" or "Fake") are compared against a labeled test dataset to
assess prediction accuracy. Additionally, the visual presentation of results is examined—
for instance, checking whether the predicted label is displayed near the detected face along
with a confidence score overlay. The correctness of the overlay, font clarity, color-coding
(e.g., red for fake, green for real), and alignment with detected features are tested to ensure
users can easily understand the results.

8.3.10. User Acceptance Testing (UAT)

User Acceptance Testing is the final phase where the system is tested by actual end users—
typically a representative group of the intended audience. These users interact with the
system by uploading various media files and interpreting the detection results. Their
feedback is collected on several parameters, including the clarity of classification results,
usefulness of confidence levels, and ease of navigation and interaction. Based on this
feedback, minor enhancements are often implemented, such as more descriptive file format
alerts, better result styling, and improved layout responsiveness. UAT ensures that the
system is ready for deployment and meets real-world user expectations.

8.4 TESTING TOOLS USED

In any machine learning or AI-based system, testing is a critical phase that ensures
reliability, correctness, and performance under various conditions. In this project, several tools
have been employed to support both unit-level and system-level testing. The Python modules
unittest and pytest serve as automated unit testing frameworks. These tools allow the developer to
create test cases for individual components such as data loading, preprocessing, face detection,
and model prediction. They help maintain the integrity of the codebase by ensuring that newly
added functions do not break existing features. pytest in particular provides a more scalable and
user-friendly syntax and supports advanced features like fixtures and parameterized testing,
making it ideal for complex deep learning projects.

TensorBoard, a visualization toolkit provided by TensorFlow, is used extensively for monitoring the training process. It allows the developer to visualize metrics such as training loss, validation accuracy, precision, recall, and other custom scalars. This visualization is crucial for identifying issues such as overfitting, underfitting, or vanishing gradients. Additionally, TensorBoard’s interactive graphs and histograms help in understanding how weights and biases evolve during the training process.

To complement automated tools, manual verification is performed using OpenCV, a powerful open-source computer vision library. With OpenCV, individual video frames or images can be visually inspected to ensure that face detection and alignment processes are functioning as intended. It also helps in detecting anomalies that may not be captured through code-based tests, such as incorrect face cropping or lighting inconsistencies. Lastly, Jupyter Notebook serves as the primary environment for code development and debugging. Its interactive interface allows developers to experiment with different model configurations, run cell-by-cell execution, and view outputs in real time, which is highly advantageous during model tuning and testing.

8.5 MODEL VALIDATION METRICS

Evaluating the performance of a deep learning model goes beyond simply reporting
accuracy. For a binary classification task such as deepfake detection, it is vital to use a set of robust
evaluation metrics that account for different types of prediction errors. Accuracy, while commonly
used, only indicates the overall correctness of predictions. It can be misleading in imbalanced
datasets where one class may dominate. For instance, if most videos are real, a model predicting
everything as real might still appear accurate.

To address this limitation, Precision is used to measure the number of correctly identified
fake instances divided by the total instances the model predicted as fake. High precision indicates
that when the model claims something is fake, it is likely correct—important in minimizing false
accusations of authenticity. Conversely, Recall focuses on the model’s ability to detect actual fake
content. It is calculated as the number of correctly predicted fake instances divided by the total
number of actual fake samples. High recall ensures the model doesn't miss potential threats in the
form of deepfakes.

The F1-Score serves as a balanced metric that considers both precision and recall. It is
especially useful when dealing with uneven class distributions or when both false positives and
false negatives carry significant consequences. An ideal deepfake detection model should aim for
a high F1-score to maintain balance between caution and coverage. Finally, the ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) metric is employed to evaluate the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across various threshold settings.
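
These metrics can be computed directly with scikit-learn, assuming y_test and the model's predicted probabilities are available from the evaluation step:

from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_prob = model.predict(X_test).ravel()
y_pred = (y_prob > 0.5).astype(int)  # 1 = fake, 0 = real

print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))
print("ROC-AUC:  ", roc_auc_score(y_test, y_prob))  # uses raw probabilities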

8.6 ERROR HANDLING

Building a user-facing AI system demands that it not only performs accurately but also handles
unexpected situations gracefully. The error handling module in the deepfake detection system is
designed to provide informative and user-friendly responses to a variety of potential issues,
ensuring robustness and enhancing user experience.

One common scenario is when a user uploads an image or video where no recognizable
human face is present. In such cases, the system returns the message: “No human face detected.”
This prevents the model from processing irrelevant or non-human content, which could lead to
misleading outputs. This check is implemented early in the pipeline using face detection
algorithms like MTCNN or Haar cascades.

Another error addressed is invalid file formats. The system is designed to work with
specific media formats (e.g., .jpg, .png, .mp4), and when an unsupported file is uploaded, it
prompts the message: “Unsupported file type.” This safeguards the application from crashing due
to unrecognized data structures and guides the user toward acceptable input types.

If the system encounters a failure in loading the trained model—either due to file
corruption, incorrect path, or missing files—it raises an alert with the message: “Model loading
error.” This is a critical failure point, and the error message informs the user or developer to
recheck the deployment files.

Lastly, the system incorporates a confidence threshold mechanism. If the model makes a
prediction but with a confidence level below 60%, it triggers a warning: “Low confidence. Re-
upload suggested.” This acts as a safeguard against unreliable outputs and encourages users to
submit better-quality inputs, such as clearer images or videos with good lighting and frontal faces.
Collectively, these error-handling features make the system more reliable, user-oriented, and
capable of functioning well in real-world scenarios.
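
Put together, these checks amount to a guard function of roughly the following shape. This is a sketch: extract_face follows the preprocessing sketch in Section 5.2.2, and the 60% threshold comes from the text above:

import numpy as np

CONFIDENCE_THRESHOLD = 0.60  # predictions below this are flagged as unreliable

def classify_with_guards(image_path, model):
    if not image_path.lower().endswith(('.jpg', '.jpeg', '.png')):
        return "Unsupported file type."
    face = extract_face(image_path)  # returns None when no face is found
    if face is None:
        return "No human face detected."
    prediction = model.predict(np.expand_dims(face, axis=0))[0][0]
    confidence = prediction if prediction > 0.5 else 1 - prediction
    label = "Deepfake" if prediction > 0.5 else "Real"
    if confidence < CONFIDENCE_THRESHOLD:
        return f"Low confidence. Re-upload suggested. ({label}, {confidence:.0%})"
    return f"{label} ({confidence:.0%})"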

CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT

9.1 CONCLUSION

The increasing prevalence of deepfake media poses a significant threat to digital content
authenticity, personal identity, and information security. This project presents an effective solution
to detect deepfake videos and images using deep learning models. By leveraging convolutional
neural networks (CNNs), the system can learn and extract complex visual features from input
media to distinguish between real and fake content with high accuracy.

Throughout the project, various aspects of deepfake generation and detection were
explored. The proposed system was trained and tested on benchmark datasets and demonstrated
promising results in identifying synthetic facial manipulations. Unlike traditional manual or rule-
based methods, this system relies on learned features, making it more scalable, adaptive, and
suitable for real-world applications.

This work contributes to the broader field of digital forensics and can assist platforms, law
enforcement, and the general public in countering misinformation, fraud, and media tampering.
The proposed model successfully meets the core objectives of detecting manipulated media and
improving awareness regarding the risks of deepfake content.

9.2 FUTURE ENHANCEMENT

While the proposed deepfake detection system demonstrates effective performance, there
are opportunities for further development and improvement in future work. Some key areas of
enhancement include:

Incorporating Temporal Features: Current models often analyze frames individually. Adding
temporal models like 3D CNNs or LSTMs will enable better video-level analysis by capturing
motion-based inconsistencies.

Multi-modal Detection: Integrating both audio and video features will provide more robust
detection, particularly in detecting deepfakes that also manipulate voice and speech patterns.

Real-time Detection Capabilities: Optimization of the system for real-time processing can allow
for implementation in web applications, browser extensions, or mobile platforms for on-the-fly
deepfake analysis.

Cross-Dataset Generalization: Enhancing the model’s ability to generalize across different datasets and manipulation techniques will improve reliability against novel or unseen deepfake generation methods.

User Interface Development: Building a simple and interactive front-end interface would allow
non-technical users to upload and check media content for authenticity.

This project lays a solid foundation for future advancements in automated deepfake detection and
has the potential to evolve into a full-fledged system that plays a key role in combating the spread
of synthetic misinformation.

APPENDICES

APPENDIX 1
SOURCE CODE

1.1 Training.py

import os

import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers  # type: ignore

# Define the model architecture
def build_model():
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(1, activation='sigmoid')  # 0 = real, 1 = fake
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Load data from the Train/Test directories
def load_data(data_dir):
    def process_folder(folder):
        images, labels = [], []
        for label, category in enumerate(['real', 'fake']):
            category_path = os.path.join(folder, category)
            for img_name in os.listdir(category_path):
                img_path = os.path.join(category_path, img_name)
                img = cv2.imread(img_path)
                if img is None:
                    continue  # skip unreadable files
                images.append(cv2.resize(img, (128, 128)))
                labels.append(label)
        return np.array(images) / 255.0, np.array(labels)

    train_images, train_labels = process_folder(os.path.join(data_dir, 'Dataset', 'Train'))
    test_images, test_labels = process_folder(os.path.join(data_dir, 'Dataset', 'Test'))
    return train_images, train_labels, test_images, test_labels

# Train the model
data_dir = r'D:\jp\dataset'  # update with the actual dataset path
X_train, y_train, X_test, y_test = load_data(data_dir)
model = build_model()
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))
model.save("deepfake_model.h5")

1.2 App.py

import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

# Load the trained model
def load_model():
    return tf.keras.models.load_model(r"D:\jp\deepfake_model.h5")

# Preprocess image
def preprocess_image(image):
    image = image.convert("RGB")  # drop any alpha channel
    image = image.resize((128, 128))
    image = np.array(image) / 255.0  # normalize
    image = np.expand_dims(image, axis=0)  # add batch dimension
    return image

# Prediction function
def predict_image(image, model):
    processed_image = preprocess_image(image)
    prediction = model.predict(processed_image)[0][0]
    confidence = prediction if prediction > 0.5 else 1 - prediction
    result = "Deepfake" if prediction > 0.5 else "Real"
    return result, confidence

# Streamlit UI
st.title("Deepfake Detection System")
st.write("Upload an image to check if it's real or a deepfake.")

model = load_model()

# Image upload
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "png", "jpeg"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)
    result, confidence = predict_image(image, model)
    st.write(f"### Prediction: {result} (Confidence: {confidence:.2%})")

if __name__ == "__main__":
    st.write("Deepfake Detection Ready!")

1.3 Deepfake Prediction Function

import numpy as np

def predict_image(model, image_path):
    # extract_face() is the face-detection helper from the preprocessing
    # module: it detects, crops, resizes, and normalizes a face, or returns None.
    face = extract_face(image_path)
    if face is not None:
        face = np.expand_dims(face, axis=0)  # add batch dimension
        prediction = model.predict(face)[0][0]
        return "Fake" if prediction > 0.5 else "Real"
    else:
        return "No face detected"

APPENDIX 2
SCREENSHOTS

INITIAL WEBPAGE

Figure 10.1 Streamlit webpage for deepfake detection

DEEPFAKE DETECTION

1. REAL IMAGES

Figure 10.2 Deepfake detection prediction 1: Real

Figure 10.3 Deepfake detection prediction 2: Real

Figure 10.4 Deepfake detection prediction 3: Real

2. FAKE IMAGES

Figure 10.5 Deepfake detection prediction 4: Deepfake

Figure 10.6 Deepfake detection prediction 5: Deepfake

Figure 10.7 Deepfake detection prediction 6: Deepfake

REFERENCES

1. Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
2. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: a Compact Facial
Video Forgery Detection Network. In Proceedings of the IEEE International Workshop
on Information Forensics and Security (WIFS), 1–7.
3. Chollet, F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 1251–1258.
4. Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Capsule-forensics: Using capsule
networks to detect forged images and videos. In ICASSP 2019 - IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2307–2311.
5. Li, Y., Chang, M. C., & Lyu, S. (2020). Face X-ray for more general face forgery
detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 5001–5010.
6. Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C.
(2020). The Deepfake Detection Challenge (DFDC) Dataset. arXiv preprint
arXiv:2006.07397.
https://www.kaggle.com/c/deepfake-detection-challenge
7. Li, Y., & Lyu, S. (2019). Exposing DeepFake Videos By Detecting Face Warping Artifacts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
8. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-
Scale Image Recognition. arXiv preprint arXiv:1409.1556.
9. Abavisani, M., & Patel, V. M. (2020). Exploring the Space of Deepfake Detection. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), 1–8.
10. Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. In
Proceedings of the International Conference on Learning Representations (ICLR).
https://arxiv.org/abs/1412.6980
