
DEEPFAKE DETECTION USING DEEP LEARNING
Submitted by

DINESHWAR C (810422243026)
ABINESH M (810422243002)
JEYARAJ K (810422243038)
AMRITH ALOISHIOUS A (810422243004)

Of
DHANALAKSHMI SRINIVASAN ENGINEERING COLLEGE (AUTONOMOUS)
PERAMBALUR – 621 212

A MINI PROJECT REPORT

Submitted to the

FACULTY OF INFORMATION AND COMMUNICATION ENGINEERING

In partial fulfillment of the requirements


for the award of the degree
Of
BACHELOR OF TECHNOLOGY
In
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

ANNA UNIVERSITY CHENNAI - 600 025


MAY 2025
BONAFIDE CERTIFICATE

Certified that this mini project report titled “DEEPFAKE DETECTION USING DEEP LEARNING”
is the bonafide work of “DINESHWAR C (810422243026), ABINESH M (810422243002),
JEYARAJ K (810422243038), AMRITH ALOISHIOUS A (810422243004)”, who carried out the
research under my supervision. Certified further, that to the best of my knowledge the work
reported herein does not form part of any other project report or dissertation on the basis of
which a degree or award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE                                          SIGNATURE

SUPERVISOR                                         HEAD OF THE DEPARTMENT

Dr. SHREE K.V.M, M.E., (Ph.D.)                     Dr. SHREE K.V.M, M.E., Ph.D.

Professor & Head,                                  Professor & Head,

Department of Artificial Intelligence              Department of Artificial Intelligence
and Data Science,                                  and Data Science,

Dhanalakshmi Srinivasan Engineering College        Dhanalakshmi Srinivasan Engineering College
(Autonomous), Perambalur-621 212.                  (Autonomous), Perambalur-621 212.

Submitted for Mini Project Viva-Voce Examination held on

INTERNAL EXAMINER                                  EXTERNAL EXAMINER

ACKNOWLEDGEMENT

It is with immense pleasure that I present my first venture in the field of real application of computing in the form of project work. First, I am indebted to the Almighty for his choicest blessings showered on me in completing this endeavor.

I express my sincere thanks to Shri. A. SRINIVASAN, Chancellor, Dhanalakshmi Srinivasan University, for having given me an opportunity to study in this Institution.

I would also like to acknowledge Dr. SHANMUGASUNDARAM, M.E., Ph.D., Principal, and Dr. K. ANBARASAN, M.E., Ph.D., Dean, Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur, for the moral support and encouragement they have rendered throughout the course. I express my sincere thanks to the Head of the Department, Dr. K.V.M SHREE, M.E., Ph.D., for having provided us with all the necessary specifications.

I owe my heartfelt thanks to my Internal Guide, Dr. SHREE K.V.M, M.E., (Ph.D.), for his guidance and suggestions during this project work.

We render our thanks to all the staff members and programmers of the Department of ARTIFICIAL INTELLIGENCE AND DATA SCIENCE for their timely assistance.

ABSTRACT

With the rise of manipulated media, deepfake content has become a serious concern in digital security, social media, and public trust. These synthetic images and videos, generated using AI, can be indistinguishable from real ones to the human eye, posing significant ethical and security threats. Manual identification of deepfake images is not only challenging but also impractical at scale. Human evaluation is time-consuming, prone to error, and insufficient to counter the rapid spread of fake content online. This calls for an automated, reliable solution to detect deepfakes accurately and efficiently. This project proposes the use of Convolutional Neural Networks (CNNs), a powerful class of deep learning models, to automate the detection of deepfake images. The system is trained on a labeled dataset of real and fake images, using a custom-built CNN architecture consisting of convolutional, max-pooling, and dense layers. The model takes preprocessed 128×128 RGB images as input and is trained using binary cross-entropy loss with the Adam optimizer. The system achieves high classification accuracy and effectively distinguishes between real and synthetic images. The proposed CNN-based deepfake detection system provides a fast and scalable solution for identifying manipulated images. It can serve as a valuable tool in digital forensics, content moderation, and media authentication, helping reduce the spread of misinformation and enhancing online trust.

TABLE OF CONTENTS

CHAPTER NO    TITLE

              ABSTRACT
              LIST OF FIGURES
              LIST OF ABBREVIATIONS

1             INTRODUCTION
              1.1 INTRODUCTION
              1.2 PURPOSE
              1.3 PROBLEM STATEMENT
              1.4 MOTIVATION
              1.5 OBJECTIVES

2             LITERATURE SURVEY

3             SYSTEM ANALYSIS
              3.1 EXISTING SYSTEM
              3.2 DISADVANTAGES OF EXISTING SYSTEMS
              3.3 PROPOSED SYSTEM
              3.4 ADVANTAGES OF THE PROPOSED SYSTEM

4             SYSTEM SPECIFICATIONS
              4.1 HARDWARE REQUIREMENTS
              4.2 SOFTWARE REQUIREMENTS

5             SYSTEM IMPLEMENTATION
              5.1 LIST OF MODULES
              5.2 MODULE DESCRIPTION
              5.2.1 DATASET COLLECTION
              5.2.2 DATA PREPROCESSING
              5.2.3 MODEL DESIGN AND TRAINING
              5.2.4 DEEPFAKE DETECTION
              5.2.5 EVALUATION AND RESULT VISUALIZATION

6             SYSTEM DESIGN
              6.1 SYSTEM ARCHITECTURE
              6.2 USE CASE DIAGRAM
              6.3 CLASS DIAGRAM
              6.4 SEQUENCE DIAGRAM
              6.5 ACTIVITY DIAGRAM

7             SOFTWARE DESCRIPTION
              7.1 OVERVIEW
              7.2 SOFTWARE MODULES
              7.2.1 DATA INGESTION MODULE
              7.2.2 PREPROCESSING MODULE
              7.2.3 CNN MODEL MODULE
              7.2.4 TRAINING AND VALIDATION MODULE
              7.2.5 INFERENCE MODULE
              7.2.6 EXPLAINABILITY MODULE
              7.2.7 STREAMLIT INTERFACE MODULE
              7.2.8 UTILITY MODULE
              7.3 FRAMEWORK OVERVIEW
              7.4 FEATURES

8             SOFTWARE TESTING
              8.1 AIM OF TESTING
              8.2 TEST CASES
              8.2.1 VALID IMAGE WITH CLEAR FACE
              8.2.2 VALID DEEPFAKE IMAGE
              8.2.3 NO FACE IN IMAGE
              8.2.4 LOW CONFIDENCE OUTPUT
              8.2.5 LARGE FILE HANDLING
              8.2.6 ADVERSARIAL INPUT
              8.3 TYPES OF TESTING
              8.3.1 UNIT TESTING
              8.3.2 INTEGRATION TESTING
              8.3.3 FUNCTIONAL TESTING
              8.3.4 REGRESSION TESTING
              8.3.5 PERFORMANCE TESTING
              8.3.6 USABILITY TESTING
              8.3.7 BLACK BOX TESTING
              8.3.8 WHITE BOX TESTING
              8.3.9 OUTPUT TESTING
              8.3.10 USER ACCEPTANCE TESTING
              8.4 TESTING TOOLS USED
              8.5 MODEL VALIDATION METRICS
              8.6 ERROR HANDLING

9             CONCLUSION AND FUTURE ENHANCEMENT
              9.1 CONCLUSION
              9.2 FUTURE ENHANCEMENT

              APPENDICES
              APPENDIX 1 - SOURCE CODE
              APPENDIX 2 - SCREENSHOTS

              REFERENCES

LIST OF FIGURES

FIGURE NO     TITLE

6.1           SYSTEM ARCHITECTURE
6.2           USE CASE DIAGRAM
6.3           CLASS DIAGRAM
6.4           SEQUENCE DIAGRAM
6.5           ACTIVITY DIAGRAM
10.1          STREAMLIT WEBPAGE FOR DEEPFAKE DETECTION
10.2          DEEPFAKE DETECTION PREDICTION 1: REAL
10.3          DEEPFAKE DETECTION PREDICTION 2: REAL
10.4          DEEPFAKE DETECTION PREDICTION 3: REAL
10.5          DEEPFAKE DETECTION PREDICTION 4: DEEPFAKE
10.6          DEEPFAKE DETECTION PREDICTION 5: DEEPFAKE
10.7          DEEPFAKE DETECTION PREDICTION 6: DEEPFAKE

LIST OF ABBREVIATIONS

ABBREVIATION FULL FORM

AI - ARTIFICIAL INTELLIGENCE

GANS - GENERATIVE ADVERSARIAL NETWORKS

CNN - CONVOLUTIONAL NEURAL NETWORK

DFDC - DEEPFAKE DETECTION CHALLENGE

LSTM - LONG SHORT-TERM MEMORY

3D CNNS - THREE-DIMENSIONAL CONVOLUTIONAL NEURAL NETWORKS

MTCNN - MULTI-TASK CASCADED CONVOLUTIONAL NEURAL NETWORKS

ROC-AUC - RECEIVER OPERATING CHARACTERISTIC - AREA UNDER CURVE

CPU - CENTRAL PROCESSING UNIT

GPU - GRAPHICS PROCESSING UNIT

RAM - RANDOM ACCESS MEMORY

SSD - SOLID STATE DRIVE

OS - OPERATING SYSTEM

VS CODE - VISUAL STUDIO CODE

JSON - JAVASCRIPT OBJECT NOTATION

UI - USER INTERFACE

API - APPLICATION PROGRAMMING INTERFACE

OPENCV - OPEN SOURCE COMPUTER VISION LIBRARY

TPU - TENSOR PROCESSING UNIT

BCE - BINARY CROSS ENTROPY

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

In recent years, the rise of deepfakes—synthetically altered or generated images and videos using artificial intelligence—has emerged as a major threat to digital authenticity. These
manipulated media can be used maliciously to spread misinformation, commit fraud, or
manipulate public opinion. Deepfakes are typically generated using techniques such as
Generative Adversarial Networks (GANs), which create realistic but fake content that is difficult
for humans to distinguish from authentic material.

As the quality of deepfakes continues to improve, the challenge of detecting them becomes increasingly complex. This has led to a surge of interest in developing reliable and
automated systems that can detect such manipulations. Deep learning, a subset of artificial
intelligence that excels at pattern recognition, offers promising tools for tackling this problem.
By training neural networks on large datasets of real and fake media, models can learn subtle
artifacts and inconsistencies indicative of tampering.

This project aims to build an effective deepfake detection system using deep learning
techniques. The proposed system analyzes facial and visual features in video frames and uses a
convolutional neural network (CNN) model to classify content as real or fake. This technology
has applications in digital forensics, social media moderation, and media verification.

1.2 PURPOSE

The purpose of this project is to design and implement a deep learning-based system
capable of accurately detecting deepfake videos and images. With the increasing availability of
tools that allow even non-experts to create highly realistic fake media, the integrity of digital
content has become a growing concern. This project aims to combat the misuse of such synthetic
content by developing a reliable and automated solution that can identify tampered visuals
through analysis of facial and visual inconsistencies. By leveraging convolutional neural
networks (CNNs) and other deep learning architectures, the system is expected to detect subtle artifacts introduced during the generation of deepfakes. Beyond technical implementation, the broader purpose of
this work is to support digital forensics, social media moderation, and public awareness efforts
by offering a scalable method for verifying the authenticity of digital media. This contributes to
safeguarding individuals, organizations, and societies from the potentially harmful consequences
of misinformation, impersonation, and fraud caused by deepfakes.

1.3 PROBLEM STATEMENT

Deepfakes present a serious threat to digital content authenticity, with potentially severe
implications for individuals, corporations, and governments. These AI-generated videos and
images can be manipulated to falsely represent people saying or doing things they never did,
leading to misinformation, defamation, identity theft, and political manipulation. The quality of
deepfakes has advanced to the point where they are nearly indistinguishable from genuine media,
making manual detection by human observers unreliable. Existing detection methods are often
limited in scope, lack real-time performance, and struggle to keep pace with rapidly evolving
deepfake generation techniques. Moreover, traditional forensic analysis techniques are time-
consuming and require expert intervention, making them unsuitable for large-scale content
verification. This project addresses these challenges by developing an intelligent, automated
system that uses deep learning to detect deepfakes with high accuracy. By training models on a
combination of real and fake media datasets, the system aims to identify subtle features that
distinguish authentic content from manipulated media, thus helping to restore trust in digital
communications.

1.4 MOTIVATION

The motivation behind this project stems from the growing misuse of AI-generated
content in malicious contexts such as political misinformation, fake news, financial scams, and
personal reputation damage. By developing a deepfake detection model, we can help combat
these threats and promote trust in digital content, while contributing to AI accountability and
responsible media practices.

1.5 OBJECTIVES

 To study and understand deepfake generation and detection techniques.
 To collect and preprocess a dataset consisting of real and fake images/videos.
 To implement and train a convolutional neural network (CNN) or other deep learning model for detection.
 To evaluate the model’s performance using accuracy, precision, recall, and F1-score.
 To build a prototype system that can flag potential deepfakes.

CHAPTER 2
LITERATURE SURVEY

With the rapid development of generative adversarial networks (GANs) and related
technologies, deepfakes have become one of the most challenging threats to digital media
authenticity. Consequently, researchers have actively explored various techniques for detecting
such manipulations using machine learning and deep learning approaches. This chapter reviews
significant existing work in the field of deepfake detection.

Chollet (2017) introduced XceptionNet, a deep convolutional neural network architecture that later became a foundational model for deepfake detection. Trained on datasets like FaceForensics++, XceptionNet proved highly effective in identifying manipulated facial regions due to its depthwise separable convolutions and robust feature extraction capabilities.

Afchar et al. (2018) proposed MesoNet, a lightweight CNN architecture designed specifically for detecting deepfakes in compressed video formats. The model showed that deepfake content often contains subtle inconsistencies in mesoscopic features, which can be effectively captured by shallow neural networks for classification.

Nguyen et al. (2019) explored the use of capsule networks for deepfake detection,
highlighting their ability to preserve spatial hierarchies in facial structures. Their work showed
promise in scenarios where traditional CNNs struggled due to geometric transformations.

Li et al. (2020) proposed Face X-ray, a technique that identifies blending artifacts in
deepfakes by treating the problem as an image segmentation task. This method detects whether a
given image contains a combination of two facial regions—a common trait in face-swapping
deepfakes.

The DeepFake Detection Challenge (DFDC), launched by Facebook and hosted on Kaggle, provided one of the largest benchmark datasets for evaluating detection models. This challenge spurred innovation in the development of models that could generalize across different deepfake generation techniques and compression levels.

More recent approaches have leveraged transformer-based models and attention
mechanisms to improve detection accuracy. These models focus on capturing long-range
dependencies and facial expressions more effectively, which are often difficult to fake
consistently across frames.

Despite advancements, many studies have identified generalization as a key challenge—models trained on specific datasets often struggle when tested on new, unseen deepfake generation methods. This highlights the need for robust, adaptive detection frameworks.

CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

In the current landscape, deepfake detection systems face significant challenges due to
the rapid evolution of deepfake generation techniques. Many existing systems rely on manual
observation or traditional video forensics, which are time-consuming, inconsistent, and often
ineffective against high-quality deepfakes. While several machine learning approaches have been
introduced, many of them lack generalization capability and fail to maintain high accuracy across
various deepfake datasets.

Earlier detection methods depended on handcrafted features such as inconsistencies in eye blinking, head pose, or lighting conditions. Although useful in some cases, these approaches are limited by their dependence on specific artifacts, making them vulnerable to more advanced or novel deepfake techniques that do not exhibit those artifacts. Moreover, these systems typically lack scalability and cannot process large volumes of content efficiently.

3.2 DISADVANTAGES OF EXISTING SYSTEMS

 Rely heavily on specific artifacts or features.
 Poor generalization to unseen deepfake generation methods.
 Require manual intervention or expert knowledge.
 Lack real-time processing capabilities.
 Struggle with performance on compressed or low-resolution media.

3.3 PROPOSED SYSTEM

The proposed system leverages deep learning, specifically convolutional neural networks
(CNNs), to detect deepfake content based on learned features rather than manually crafted ones.
This system is designed to automatically analyze image or video frames and identify subtle facial
distortions, pixel-level inconsistencies, or blending artifacts commonly found in manipulated media. By training the model on a diverse dataset of real and fake videos/images, the system
aims to achieve high accuracy and robustness.

The architecture of the system may include a pretrained model (e.g., XceptionNet,
EfficientNet, or ResNet) fine-tuned on deepfake datasets such as FaceForensics++, DFDC, or
Celeb-DF. The model extracts deep visual features from each frame and classifies them as "real"
or "fake." The system can be extended to include temporal models (e.g., LSTM or 3D CNNs) to
analyze video frame sequences and improve detection based on motion inconsistencies.

This solution is scalable, fast, and capable of detecting both known and emerging
deepfake types. It can be deployed in content moderation systems, mobile apps, or browser
extensions to verify media authenticity in real-time.
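As a minimal sketch of the temporal extension mentioned above, the following Keras snippet applies a small per-frame CNN via TimeDistributed and aggregates the frame sequence with an LSTM. The clip length (16 frames), frame size, and layer widths are illustrative assumptions, not values fixed by this report.

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sketch: per-frame CNN features aggregated by an LSTM.
# Clip length and layer sizes are assumptions for demonstration only.
frames = tf.keras.Input(shape=(16, 128, 128, 3))            # 16-frame clips
x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(frames)
x = layers.TimeDistributed(layers.MaxPooling2D())(x)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.LSTM(64)(x)                                      # captures motion inconsistencies
outputs = layers.Dense(1, activation="sigmoid")(x)          # real vs. fake
temporal_model = tf.keras.Model(frames, outputs)
temporal_model.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["accuracy"])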

3.4 ADVANTAGES OF THE PROPOSED SYSTEM

 Learns features automatically from large datasets.
 Performs well even on high-quality or compressed deepfakes.
 Can generalize across various manipulation techniques.
 Suitable for large-scale deployment in real-world scenarios.
 Supports both image and video-based detection.

CHAPTER 4
SYSTEM SPECIFICATIONS

4.1 HARDWARE REQUIREMENTS

 Processor : Intel i5 / AMD Ryzen 5

 RAM : 8-16 GB

 Storage : 256 GB

 Keyboard : Standard keyboard

 Monitor : 15-inch color monitor

4.2 SOFTWARE REQUIREMENTS

 Operating system : Windows 10/11, Ubuntu 20.04+

 Programming language : Python 3.8+

 Python libraries : NumPy, Pandas, OpenCV, Matplotlib, Scikit-learn, TensorFlow

 Development tools : VS Code / Jupyter Notebook

 Dataset : Kaggle deepfake detection dataset

 Web-based interface : Streamlit / Django / Flask

CHAPTER 5
SYSTEM IMPLEMENTATION

5.1 LIST OF MODULES

1. Dataset Collection
2. Data Preprocessing
3. Model Design and Training
4. Deepfake Detection
5. Result Evaluation and Visualization

5.2 MODULE DESCRIPTION

5.2.1. Dataset Collection

The Dataset Collection module involves gathering a diverse and comprehensive dataset
that includes both real and deepfake images or video frames. Public datasets such as
FaceForensics++, DFDC, or Celeb-DF are often used for this purpose. These datasets provide a
wide range of manipulated content, ensuring variety in terms of facial expressions, lighting
conditions, backgrounds, and manipulation techniques. The goal is to collect enough data to train
a robust model capable of generalizing to different types of deepfake content.

5.2.2. Data Preprocessing

This module is responsible for preparing the collected data for training. It includes
extracting frames from video files, detecting and aligning faces using tools like MTCNN or
OpenCV, resizing images to the desired input size for the CNN (typically 224x224 pixels), and
normalizing pixel values. To enhance model generalization and reduce overfitting, data
augmentation techniques such as rotation, flipping, brightness adjustment, and noise addition are
also applied. Proper preprocessing ensures consistency and improves the efficiency of the model
training process.
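A minimal sketch of this face-crop step is given below, using OpenCV’s bundled Haar cascade as the detector (MTCNN would slot into the same place). The helper name extract_face and the 224×224 target size are assumptions for illustration; the training script in Appendix 1 uses 128×128 whole images instead.

import cv2
import numpy as np

# Sketch of the face detection + crop + normalize step using OpenCV's
# bundled Haar cascade. MTCNN could replace the detector unchanged.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face(image_path, size=(224, 224)):
    img = cv2.imread(image_path)
    if img is None:
        return None                          # unreadable file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                          # caller reports "No face detected"
    x, y, w, h = faces[0]                    # take the first detected face
    face = cv2.resize(img[y:y + h, x:x + w], size)
    return face.astype("float32") / 255.0    # normalize pixel values to [0, 1]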

10
5.2.3. Model Design and Training

In this module, a Convolutional Neural Network (CNN) is designed and trained for
binary classification—determining whether an input is real or fake. This can involve building a
custom CNN architecture or fine-tuning a pre-trained model such as VGG16 or ResNet. The
model is trained using a binary crossentropy loss function and an optimizer like Adam. The
dataset is split into training, validation, and test sets to monitor the model's performance and
prevent overfitting. Training is carried out over multiple epochs, and key metrics such as training
accuracy and loss are recorded.
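For the fine-tuning variant described above, the following is a minimal sketch assuming a ResNet50 backbone with frozen ImageNet weights and a 224×224 input; VGG16 or Xception would follow the same pattern.

import tensorflow as tf
from tensorflow.keras import layers

# Sketch: fine-tune a pre-trained backbone for binary real/fake
# classification. Backbone choice and input size are assumptions.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze the backbone initially

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),   # real (0) vs. fake (1)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])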

5.2.4. Deepfake Detection

Once the model is trained, it is used in this module to classify new or unseen media. The
input image or video frame undergoes the same preprocessing steps and is then passed through
the CNN model to predict the likelihood of it being real or fake. Based on the output probability,
the system labels the input accordingly. This module represents the core functionality of the
system— real-time or batch detection of deepfakes using the trained model.

5.2.5 Evaluation and Result Visualization

The final module of the system is designed to thoroughly evaluate model performance and
present the results through intuitive and insightful visualizations. It begins with the confusion
matrix, which clearly outlines the distribution of true positives, true negatives, false positives,
and false negatives, helping identify the model’s strengths and the nature of its
misclassifications. To further assess the quality of the classification, the module includes a
Receiver Operating Characteristic (ROC) curve, which demonstrates the trade-off between
sensitivity and specificity across different threshold values, providing a visual guide for selecting
an optimal decision boundary.

In addition, accuracy and loss graphs are plotted to monitor the model’s training process
over time, comparing training and validation metrics to detect issues like overfitting or
underfitting.
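A minimal sketch of these evaluation plots is given below, assuming X_test, y_test, and the trained model from the training script, and reusing the 0.5 decision threshold applied at inference.

import matplotlib.pyplot as plt
from sklearn.metrics import (ConfusionMatrixDisplay, auc,
                             confusion_matrix, roc_curve)

# Sketch: confusion matrix and ROC curve for the trained classifier.
# X_test, y_test, and `model` are assumed from the training script.
probs = model.predict(X_test).ravel()        # sigmoid "fake" probabilities
preds = (probs > 0.5).astype(int)            # same threshold as inference

cm = confusion_matrix(y_test, preds)
ConfusionMatrixDisplay(cm, display_labels=["Real", "Fake"]).plot()

fpr, tpr, _ = roc_curve(y_test, probs)
plt.figure()
plt.plot(fpr, tpr, label=f"ROC (AUC = {auc(fpr, tpr):.3f})")
plt.plot([0, 1], [0, 1], linestyle="--")     # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()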

As an added feature, an optional interactive interface is available, allowing users to
upload video files and observe real-time detection results. This interface displays visual outputs
such as bounding boxes and labels on each frame, offering a hands-on, user-friendly way to test
and explore the system’s functionality. Collectively, these tools not only deliver a complete
evaluation of the model’s performance but also enhance its interpretability and accessibility for
both technical and non-technical users.

CHAPTER 6
SYSTEM DESIGN

6.1 SYSTEM ARCHITECTURE

Figure 6.1 System Architecture

6.2 USE CASE DIAGRAM

Figure 6.2 Use case diagram

6.3 CLASS DIAGRAM

Figure 6.3 Class diagram

6.4 SEQUENCE DIAGRAM

Figure 6.4 Sequence diagram

6.5 ACTIVITY DIAGRAM

Figure 6.5 Activity diagram

CHAPTER 7
SOFTWARE DESCRIPTION

7.1 OVERVIEW

Deepfake technology is becoming increasingly sophisticated, posing serious threats to authenticity and trust in digital media. To combat this, the proposed system uses deep learning
techniques to detect manipulated media. The solution is based on Convolutional Neural
Networks (CNNs), which excel at recognizing visual patterns in images and videos.

The project leverages Python due to its robust ecosystem of AI and image processing
libraries. Using datasets containing both genuine and deepfake videos (e.g., FaceForensics++,
Celeb-DF), the system is trained to distinguish real from fake content. It provides a seamless
pipeline—from media upload to prediction output—through modular components.

7.2 SOFTWARE MODULES

7.2.1. Data Ingestion Module

The Data Ingestion Module handles the loading of datasets and the extraction of frames
from video files. It organizes data into training, validation, and test sets while managing
associated labels, enabling the model to learn from real-world examples of deepfakes.

7.2.2. Preprocessing Module

The Preprocessing Module is responsible for preparing the input data. It detects and crops
faces from images or frames using face detection libraries like MTCNN or dlib, then resizes and
normalizes them. This module also performs data augmentation to enhance model generalization.

7.2.3. CNN Model Module

The CNN Model Module defines the structure of the neural network used for
classification. It allows for the use of custom or pre-trained models, such as Xception or EfficientNet, and includes functionality for compiling, training, saving, and loading the model architecture and weights.

7.2.4. Training & Validation Module

The Training & Validation Module manages the model training loop, monitors
performance metrics like accuracy and F1 score, and applies callbacks such as early stopping.
This module ensures the model learns effectively while avoiding overfitting.

7.2.5. Inference Module

The Inference Module is used to predict whether new input media is real or fake. It
processes the input image or video, extracts faces, and applies the trained CNN model. It returns
a classification label with a confidence score.

7.2.6. Explainability Module

The Explainability Module enhances trust in model predictions by generating visual interpretations. Using Grad-CAM, it highlights the regions of the face that influenced the model’s decision, which can help users understand why an image or video was flagged.
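A minimal Grad-CAM sketch for the custom CNN from Appendix 1 is shown below. The default layer name 'conv2d_2' is an assumption (the name Keras gives the third convolution of a freshly built model) and should be read off model.summary() in practice.

import numpy as np
import tensorflow as tf

# Sketch of Grad-CAM for a binary sigmoid classifier. `image` is a
# preprocessed 128x128x3 float array; the layer name is an assumption.
def grad_cam(model, image, last_conv_layer_name="conv2d_2"):
    grad_model = tf.keras.models.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, 0]                           # "fake" probability
    grads = tape.gradient(score, conv_out)            # gradient w.r.t. feature maps
    weights = tf.reduce_mean(grads, axis=(1, 2))      # channel importance weights
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]                          # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # heatmap in [0, 1]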

7.2.7. Streamlit Interface Module

The Streamlit Interface Module provides a lightweight, user-friendly interface where users can upload images or videos and receive deepfake predictions in real time. It displays results, confidence levels, and visualizations directly in the browser.

7.2.8. Utility Module

The Utility Module supports various background tasks such as configuration management, file handling, logging, and formatting. It helps streamline development and debugging by consolidating reusable functions and settings in one place.

7.3 FRAMEWORK OVERVIEW

TensorFlow / Keras

 Offers high-level APIs for rapid prototyping of CNNs.
 Used for model training, validation, and deployment.
 Transfer learning from XceptionNet significantly boosts accuracy and speeds up training.

OpenCV

 Essential for video handling and image manipulation.
 Detects and crops face regions from images.
 Captures webcam streams in real-time detection mode.

Streamlit / Flask (Optional UI)

 Enables simple deployment of the model as a web app.
 Users can upload videos, trigger detection, and view results in the browser.
 Easy to integrate backend Python logic with frontend controls.

7.4 FEATURES

 High Detection Accuracy: Leverages state-of-the-art CNN models trained on large datasets.
 Real-time Processing: Supports on-the-fly frame analysis using webcam (optional).
 Modular Design: Clear separation of components allows easy debugging and enhancement.
 Dataset Flexibility: Compatible with multiple datasets for robust training and benchmarking.
 Explainability: Possibility to integrate Grad-CAM for heatmap visualization (optional enhancement).
 Lightweight Deployment: With tools like TensorFlow Lite, the model can be optimized for edge devices.

CHAPTER 8
SOFTWARE TESTING

8.1 AIM OF TESTING

Software testing in the context of deepfake detection is critical to ensure that the system
accurately and reliably differentiates between real and manipulated media. The aim is to identify
defects in the model logic, data preprocessing, and the user interface, and to validate that the
deep learning model generalizes well across unseen data. This chapter details the testing
approaches applied across all levels of the system to validate functionality, performance, and
usability.

8.2 TEST CASES

8.2.1. Valid Image with Clear Face:

This test case involves providing a high-quality image with a clearly visible human face. The input
image should be in a supported format like .jpg or .png. The purpose is to verify that the model
accurately identifies real faces. The expected result is a "Real" prediction with high confidence,
typically above 90%. This confirms the model performs well with ideal input conditions.

8.2.2. Valid Deepfake Image:

In this case, the model is tested using a confirmed deepfake image. The goal is to ensure the CNN
correctly classifies fake content. The model should return a prediction of "Fake" with high
confidence, validating that it has learned to distinguish synthetic facial features and manipulation
artifacts effectively.

8.2.3. No Face in Image:

This test evaluates the system’s response when an image without a human face is submitted. For
example, images of landscapes, objects, or animals can be used. The model should return an
error message like "No face detected," demonstrating that the face detection preprocessing step is
functioning correctly and that unnecessary processing is avoided.

8.2.4. Low Confidence Output:

To test how the model handles uncertainty, a low-quality, blurry, or partially obscured face
image is input. The model should still attempt a prediction but may return an output with "Low
confidence prediction." This helps the user understand that the model is uncertain and provides
guidance for corrective action.

8.2.5. Large File Handling:

A very high-resolution image or a 4K video is used as input to assess how the model handles
large data. The goal is to ensure the system does not crash due to memory overload or processing
timeouts. The output should still be correct, and the performance should remain stable,
confirming system scalability.

8.2.6. Adversarial Input:

This advanced test uses adversarial examples—images that are subtly modified to confuse the
model, often with added noise or slight distortions. The goal is to check if the model is robust
against minor perturbations. Ideally, the system should still classify the input as "Fake" if it's
indeed a deepfake, showing resilience against manipulation.

8.3 TYPES OF TESTING

In the development of a robust and reliable media classification system—particularly one aimed at detecting whether images or videos are real or manipulated (e.g., deepfakes)—a comprehensive and methodical testing strategy is essential. Testing ensures the system meets performance, usability, and accuracy standards while minimizing the risk of errors and inconsistencies in real-world applications. Below is an extended overview of the key types of testing employed to validate such a system:

8.3.1. Unit Testing

Unit testing involves validating individual components or functions of the system in isolation to ensure they operate correctly on their own. In the context of this media classification system, unit tests are written for core functions such as frame extraction, face detection, image preprocessing, and model prediction. Each of these components is tested using Python’s unittest framework and assertions, which check whether the actual output matches the expected result for various test cases. For example, the face detection function may be tested by passing in an image with a known face and verifying that it returns the correct bounding box. Unit testing enables early detection of bugs, simplifies debugging, and helps maintain code quality during ongoing development.
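A minimal unittest sketch for the preprocess_image helper from Appendix 1 is given below, assuming the helper has been moved into its own module (here called preprocessing.py) so it can be imported without launching the Streamlit app; the synthetic image stands in for a real upload.

import unittest

import numpy as np
from PIL import Image

from preprocessing import preprocess_image  # assumed standalone module

class TestPreprocessing(unittest.TestCase):
    def test_output_shape_and_range(self):
        # Synthetic RGB image standing in for an uploaded file.
        image = Image.fromarray(np.uint8(np.random.rand(256, 256, 3) * 255))
        batch = preprocess_image(image)
        self.assertEqual(batch.shape, (1, 128, 128, 3))  # one 128x128 RGB image
        self.assertLessEqual(batch.max(), 1.0)           # normalized to [0, 1]
        self.assertGreaterEqual(batch.min(), 0.0)

if __name__ == "__main__":
    unittest.main()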

8.3.2. Integration Testing

Once individual components have been tested, integration testing is conducted to ensure
that these modules interact correctly when combined. This type of testing examines the
flow of data across modules—specifically from the frame extractor to the preprocessing
unit, then to the deep learning model, and finally to the output generation system. The
main goal is to verify that each component correctly passes formatted and expected data
to the next. For example, integration testing checks whether the preprocessed face images
output by one module are properly formatted and compatible with the input expected by
the classification model. This helps detect interface mismatches, improper data handling,
and communication failures between modules.

8.3.3. Functional Testing

Functional testing evaluates whether the overall system behaves as expected from a
user’s perspective. This includes testing the complete pipeline starting from the user
uploading an image or video, followed by system processing, classification (real or fake),
and finally the display of results on the interface. Test scenarios include valid uploads,
invalid file types, corrupted media, and edge cases like extremely small or blurry faces.
The goal is to ensure the system meets functional requirements, such as successful file
uploads, accurate deepfake detection, and timely feedback. This testing is vital in
validating the end-to-end functionality of the system in realistic usage scenarios.

8.3.4. Regression Testing

As the system evolves, with new features added or existing algorithms improved,
regression testing ensures that these changes do not unintentionally disrupt previously
working functionality. For example, if the face detection algorithm is enhanced or the
model is retrained for better accuracy, regression tests are used to retest all critical
features—such as correct image classification and proper result rendering—that were
already working in earlier versions. Automated test scripts are often used for this purpose
to quickly verify that nothing has been broken in the process of updates.

8.3.5. Performance Testing

Performance testing evaluates the system’s responsiveness, efficiency, and scalability under various workloads. Key metrics include the average prediction time (ideally less
than 1 second per image) and the model loading time (targeted at under 3 seconds). This
type of testing also simulates multiple users uploading media simultaneously to check
how well the system performs under stress.

Tools and scripts are used to simulate concurrent uploads and track the system’s ability to
maintain consistent response times, handle memory efficiently, and recover from
overload situations. A well-performing system ensures users experience minimal delays
even during peak usage.

8.3.6. Usability Testing

Usability testing focuses on the design and user interface of the application, ensuring that
it is intuitive and accessible for users with varying levels of technical expertise. Test
participants are asked to perform common tasks such as uploading files, interpreting
results, and troubleshooting errors. During testing, evaluators look for signs of confusion,
difficulty, or hesitation. Elements such as clear instructions, helpful tooltips, informative
error messages, and easy navigation are essential. For instance, if a user uploads an unsupported file format, the system should provide a clear message indicating the
accepted formats. Based on usability feedback, the interface is adjusted to ensure a
smooth and user-friendly experience.

8.3.7. Black Box Testing

Black box testing treats the system as a "black box" where the internal code and
architecture are not considered. Instead, testing focuses purely on inputs and outputs.
Testers provide a variety of input media files and observe the output (real or fake
classification, error messages, etc.) to ensure correctness. They also evaluate how the
system responds to unexpected or incorrect input, such as uploading text files or
extremely large videos. The goal is to ensure the application behaves correctly and
predictably from a user's perspective, regardless of the underlying implementation.

8.3.8. White Box Testing

In contrast to black box testing, white box testing involves a detailed examination of the
internal workings of the system. This includes checking the structure of the code, data
transformations, model layer outputs, and normalization processes. For example, testers
may verify that pixel values are normalized to the correct range before being fed into the
model, or that the intermediate outputs of convolutional layers fall within expected
distributions. This kind of testing is particularly useful for debugging and optimizing
model performance and verifying that the architecture and data handling conform to
design specifications.

8.3.9. Output Testing

Output testing validates the accuracy and clarity of the final system outputs. The
classification results (i.e., "Real" or "Fake") are compared against a labeled test dataset to
assess prediction accuracy. Additionally, the visual presentation of results is examined—
for instance, checking whether the predicted label is displayed near the detected face along with a confidence score overlay. The correctness of the overlay, font clarity, color-coding
(e.g., red for fake, green for real), and alignment with detected features are tested to
ensure users can easily understand the results.

8.3.10. User Acceptance Testing (UAT)

User Acceptance Testing is the final phase where the system is tested by actual end users
— typically a representative group of the intended audience. These users interact with the
system by uploading various media files and interpreting the detection results. Their
feedback is collected on several parameters, including the clarity of classification results,
usefulness of confidence levels, and ease of navigation and interaction. Based on this
feedback, minor enhancements are often implemented, such as more descriptive file
format alerts, better result styling, and improved layout responsiveness. UAT ensures that
the system is ready for deployment and meets real-world user expectations.

8.4 TESTING TOOLS USED

In any machine learning or AI-based system, testing is a critical phase that ensures
reliability, correctness, and performance under various conditions. In this project, several tools
have been employed to support both unit-level and system-level testing. The Python modules
unittest and pytest serve as automated unit testing frameworks. These tools allow the developer
to create test cases for individual components such as data loading, preprocessing, face detection,
and model prediction. They help maintain the integrity of the codebase by ensuring that newly
added functions do not break existing features. pytest in particular provides a more scalable and
user-friendly syntax and supports advanced features like fixtures and parameterized testing,
making it ideal for complex deep learning projects.

TensorBoard, a visualization toolkit provided by TensorFlow, is used extensively for monitoring the training process. It allows the developer to visualize metrics such as training loss,
validation accuracy, precision, recall, and other custom scalars. This visualization is crucial for
identifying issues such as overfitting, underfitting, or vanishing gradients. Additionally, TensorBoard’s interactive graphs and histograms help in understanding how weights and biases
evolve during the training process.

To complement automated tools, manual verification is performed using OpenCV, a powerful open-source computer vision library. With OpenCV, individual video frames or images
can be visually inspected to ensure that face detection and alignment processes are functioning as
intended. It also helps in detecting anomalies that may not be captured through code-based tests,
such as incorrect face cropping or lighting inconsistencies. Lastly, Jupyter Notebook serves as
the primary environment for code development and debugging. Its interactive interface allows
developers to experiment with different model configurations, run cell-by-cell execution, and
view outputs in real time, which is highly advantageous during model tuning and testing.

8.5 MODEL VALIDATION METRICS

Evaluating the performance of a deep learning model goes beyond simply reporting
accuracy. For a binary classification task such as deepfake detection, it is vital to use a set of
robust evaluation metrics that account for different types of prediction errors. Accuracy, while
commonly used, only indicates the overall correctness of predictions. It can be misleading in
imbalanced datasets where one class may dominate. For instance, if most videos are real, a
model predicting everything as real might still appear accurate.

To address this limitation, Precision is used to measure the number of correctly identified
fake instances divided by the total instances the model predicted as fake. High precision
indicates that when the model claims something is fake, it is likely correct—important in
minimizing false accusations of authenticity. Conversely, Recall focuses on the model’s ability
to detect actual fake content. It is calculated as the number of correctly predicted fake instances
divided by the total number of actual fake samples. High recall ensures the model doesn't miss
potential threats in the form of deepfakes.

The F1-Score serves as a balanced metric that considers both precision and recall. It is
especially useful when dealing with uneven class distributions or when both false positives and
false negatives carry significant consequences. An ideal deepfake detection model should aim for
a high F1-score to maintain balance between caution and coverage. Finally, the ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) metric is employed to evaluate the
trade-off between sensitivity (true positive rate) and specificity (false positive rate) across
various threshold settings.
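In scikit-learn terms, the four metrics read as in the sketch below; y_test, probs, and preds are assumed to come from the evaluation step sketched in Chapter 5.

from sklearn.metrics import (f1_score, precision_score,
                             recall_score, roc_auc_score)

# Sketch: the metrics described above, computed with scikit-learn.
precision = precision_score(y_test, preds)   # TP / (TP + FP)
recall = recall_score(y_test, preds)         # TP / (TP + FN)
f1 = f1_score(y_test, preds)                 # harmonic mean of precision and recall
roc_auc = roc_auc_score(y_test, probs)       # threshold-independent ranking quality
print(f"Precision {precision:.3f}  Recall {recall:.3f}  "
      f"F1 {f1:.3f}  ROC-AUC {roc_auc:.3f}")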

8.6 ERROR HANDLING

Building a user-facing AI system demands that it not only performs accurately but also handles
unexpected situations gracefully. The error handling module in the deepfake detection system is
designed to provide informative and user-friendly responses to a variety of potential issues,
ensuring robustness and enhancing user experience.

One common scenario is when a user uploads an image or video where no recognizable
human face is present. In such cases, the system returns the message: “No human face detected.”
This prevents the model from processing irrelevant or non-human content, which could lead to
misleading outputs. This check is implemented early in the pipeline using face detection
algorithms like MTCNN or Haar cascades.

Another error addressed is invalid file formats. The system is designed to work with
specific media formats (e.g., .jpg, .png, .mp4), and when an unsupported file is uploaded, it
prompts the message: “Unsupported file type.” This safeguards the application from crashing due
to unrecognized data structures and guides the user toward acceptable input types.

If the system encounters a failure in loading the trained model—either due to file
corruption, incorrect path, or missing files—it raises an alert with the message: “Model loading
error.” This is a critical failure point, and the error message informs the user or developer to
recheck the deployment files.

Lastly, the system incorporates a confidence threshold mechanism. If the model makes a
prediction but with a confidence level below 60%, it triggers a warning: “Low confidence. Re-
upload suggested.” This acts as a safeguard against unreliable outputs and encourages users to
submit better-quality inputs, such as clearer images or videos with good lighting and frontal
faces. Collectively, these error-handling features make the system more reliable, user-oriented,
and capable of functioning well in real-world scenarios.
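A minimal sketch of these checks is shown below; the message strings and the 60% threshold mirror the behavior described above, while the helper name check_upload and its call site are assumptions for illustration.

import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4"}
CONFIDENCE_THRESHOLD = 0.60   # below this, the result is flagged as unreliable

def check_upload(path, face, confidence):
    """Return the first applicable error/warning message, or None if the
    input is acceptable. `face` is the face extractor's output (None when
    no face was found); `confidence` is the model's confidence score."""
    if os.path.splitext(path)[1].lower() not in ALLOWED_EXTENSIONS:
        return "Unsupported file type."
    if face is None:
        return "No human face detected."
    if confidence < CONFIDENCE_THRESHOLD:
        return "Low confidence. Re-upload suggested."
    return None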

CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT

9.1 CONCLUSION

The increasing prevalence of deepfake media poses a significant threat to digital content
authenticity, personal identity, and information security. This project presents an effective
solution to detect deepfake videos and images using deep learning models. By leveraging
convolutional neural networks (CNNs), the system can learn and extract complex visual features
from input media to distinguish between real and fake content with high accuracy.

Throughout the project, various aspects of deepfake generation and detection were
explored. The proposed system was trained and tested on benchmark datasets and demonstrated
promising results in identifying synthetic facial manipulations. Unlike traditional manual or rule-
based methods, this system relies on learned features, making it more scalable, adaptive, and
suitable for real-world applications.

This work contributes to the broader field of digital forensics and can assist platforms,
law enforcement, and the general public in countering misinformation, fraud, and media
tampering. The proposed model successfully meets the core objectives of detecting manipulated
media and improving awareness regarding the risks of deepfake content.

9.2 FUTURE ENHANCEMENT

While the proposed deepfake detection system demonstrates effective performance, there
are opportunities for further development and improvement in future work. Some key areas of
enhancement include:

Incorporating Temporal Features: Current models often analyze frames individually. Adding
temporal models like 3D CNNs or LSTMs will enable better video-level analysis by capturing
motion-based inconsistencies.

Multi-modal Detection: Integrating both audio and video features will provide more robust
detection, particularly in detecting deepfakes that also manipulate voice and speech patterns.

Real-time Detection Capabilities: Optimization of the system for real-time processing can
allow for implementation in web applications, browser extensions, or mobile platforms for on-
the-fly deepfake analysis.

Cross-Dataset Generalization: Enhancing the model’s ability to generalize across different datasets and manipulation techniques will improve reliability against novel or unseen deepfake
generation methods.

User Interface Development: Building a simple and interactive front-end interface would allow
non-technical users to upload and check media content for authenticity.

This project lays a solid foundation for future advancements in automated deepfake detection
and has the potential to evolve into a full-fledged system that plays a key role in combating the
spread of synthetic misinformation.

APPENDICES

APPENDIX 1
SOURCE CODE

Training.py

import os

import cv2
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers  # type: ignore

# Define the model architecture
def build_model():
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(1, activation='sigmoid')  # binary output: real (0) vs. fake (1)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Load data from the Train/Test directories; each contains 'real' and 'fake' subfolders
def load_data(data_dir):
    def process_folder(folder):
        images, labels = [], []
        for label, category in enumerate(['real', 'fake']):
            category_path = os.path.join(folder, category)
            for img_name in os.listdir(category_path):
                img_path = os.path.join(category_path, img_name)
                img = cv2.imread(img_path)
                if img is None:  # skip unreadable files
                    continue
                images.append(cv2.resize(img, (128, 128)))
                labels.append(label)
        # Normalize pixel values to [0, 1]
        return np.array(images) / 255.0, np.array(labels)

    train_images, train_labels = process_folder(os.path.join(data_dir, 'Train'))
    test_images, test_labels = process_folder(os.path.join(data_dir, 'Test'))
    return train_images, train_labels, test_images, test_labels

# Train the model
data_dir = r'D:\jp\dataset\Dataset'  # update with the actual dataset root
X_train, y_train, X_test, y_test = load_data(data_dir)
model = build_model()
model.fit(X_train, y_train, epochs=10, batch_size=32,
          validation_data=(X_test, y_test))
model.save("deepfake_model.h5")

APP.PY

import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

# Load the trained model
def load_model():
    return tf.keras.models.load_model(r"D:\jp\deepfake_model.h5")

# Preprocess the uploaded image
def preprocess_image(image):
    image = image.convert("RGB")           # drop any alpha channel
    image = image.resize((128, 128))       # match the training input size
    image = np.array(image) / 255.0        # normalize to [0, 1]
    image = np.expand_dims(image, axis=0)  # add batch dimension
    return image

# Prediction function
def predict_image(image, model):
    processed_image = preprocess_image(image)
    prediction = model.predict(processed_image)[0][0]
    confidence = prediction if prediction > 0.5 else 1 - prediction
    result = "Deepfake" if prediction > 0.5 else "Real"
    return result, confidence

# Streamlit UI
st.title("Deepfake Detection System")
st.write("Upload an image to check if it's real or a deepfake.")

model = load_model()

# Image upload
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "png", "jpeg"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)
    result, confidence = predict_image(image, model)
    st.write(f"### Prediction: {result} (Confidence: {confidence:.2%})")

if __name__ == "__main__":
    st.write("Deepfake Detection Ready!")

1.3 Deepfake Prediction Function

import numpy as np

def predict_image(model, image_path):
    # extract_face is the face detection/crop helper from the
    # preprocessing module; it returns None when no face is found.
    face = extract_face(image_path)
    if face is not None:
        face = np.expand_dims(face, axis=0)  # add batch dimension
        prediction = model.predict(face)[0][0]
        return "Fake" if prediction > 0.5 else "Real"
    else:
        return "No face detected"

APPENDIX 2
SCREENSHOTS

INITIAL WEBPAGE

Figure 10.1 Streamlit webpage for deepfake detection

DEEPFAKE DETECTION

1. REAL IMAGES

Figure 10.2 DeepFake Detection prediction 1: Real

Figure 10.3 DeepFake Detection prediction 2: Real

Figure 10.4 DeepFake Detection prediction 3: Real

2. FAKE IMAGES

Figure 10.5 DeepFake Detection prediction 4: Deepfake

Figure 10.6 DeepFake Detection prediction 5: Deepfake


Figure 10.7 DeepFake Detection prediction 6: Deepfake

REFERENCES

1. Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019).
FaceForensics++: Learning to Detect Manipulated Facial Images. In Proceedings of the
IEEE/CVF International Conference on Computer Vision (ICCV), 1–11.
2. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: a Compact Facial
Video Forgery Detection Network. In Proceedings of the IEEE International Workshop
on Information Forensics and Security (WIFS), 1–7.
3. Chollet, F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 1251–1258.
4. Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Capsule-forensics: Using capsule
networks to detect forged images and videos. In ICASSP 2019 - IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2307–2311.
5. Li, Y., Chang, M. C., & Lyu, S. (2020). Face X-ray for more general face forgery
detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), 5001–5010.
6. Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C.
(2020). The Deepfake Detection Challenge (DFDC) Dataset. arXiv preprint
arXiv:2006.07397.
https://www.kaggle.com/c/deepfake-detection-challenge
7. Li, Y., & Lyu, S. (2019). Exposing DeepFake Videos By Detecting Face Warping
Artifacts. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW).
8. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-
Scale Image Recognition. arXiv preprint arXiv:1409.1556.
9. Abavisani, M., & Patel, V. M. (2020). Exploring the Space of Deepfake Detection. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), 1–8.
10. Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. In
Proceedings of the International Conference on Learning Representations (ICLR).
https://arxiv.org/abs/1412.6980
