PROJECT REPORT
DEEPFAKE DETECTION USING DEEP LEARNING
Submitted by
VIGNESHWAR P (810422243119)
SABARI KANNAN R (810422243088)
SAYOOJ KUMAR V.S (810422243095)
Of
DHANALAKSHMI SRINIVASAN ENGINEERING COLLEGE (AUTONOMOUS)
PERAMBALUR – 621 212
Submitted to the
BONAFIDE CERTIFICATE
Certified that this mini project report titled “DEEPFAKE DETECTION USING DEEP LEARNING” is the bonafide work of VIGNESHWAR P (810422243119), SABARI KANNAN R (810422243088), and SAYOOJ KUMAR V.S (810422243095), who carried out the project work under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.
SIGNATURE SIGNATURE
ACKNOWLEDGEMENT
It is with immense pleasure that we present our first venture into the real-world application of computing in the form of this project work. First, we are indebted to the Almighty for His choicest blessings. We thank Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur for the moral support and encouragement rendered throughout the course. We express our sincere thanks to the Head of the Department, Dr. K.V.M SHREE, M.E., Ph.D., for having provided us with all the necessary facilities, and to our guide, M.B.A., (Ph.D.), for his guidance and suggestions during this project work.
We render our thanks to all the staff members and programmers of the department.
ABSTRACT
With the rise of manipulated media, deepfake content has become a serious concern in
digital security, social media, and public trust. These synthetic images and videos, generated using
AI, can be indistinguishable from real ones to the human eye, posing significant ethical and
security threats. Manual identification of deepfake images is not only challenging but also time-consuming, making it difficult to counter the rapid spread of fake content online. This calls for an automated, reliable solution to
detect deepfakes accurately and efficiently. This project proposes the use of Convolutional Neural
Networks (CNNs), a powerful class of deep learning models, to automate the detection of deepfake
images. The system is trained on a labeled dataset of real and fake images, using a custom-built
CNN architecture consisting of convolutional, max-pooling, and dense layers. The model takes
preprocessed 128×128 RGB images as input and is trained using binary cross-entropy loss with
the Adam optimizer. The system achieves high classification accuracy and effectively
distinguishes between real and synthetic images. The proposed CNN-based deepfake detection
system provides a fast and scalable solution for identifying manipulated images. It can serve as a
valuable tool in digital forensics, content moderation, and media authentication, helping reduce the spread of misinformation and restore trust in digital media.
TABLE OF CONTENTS
ABSTRACT iv
1 INTRODUCTION 1
1.1 INTRODUCTION 1
1.2 PURPOSE 1
1.3 PROBLEM STATEMENT 2
1.4 MOTIVATION 2
1.5 OBJECTIVES 3
2 LITERATURE SURVEY 4
3 SYSTEM ANALYSIS 6
4 SYSTEM SPECIFICATIONS 8
5 SYSTEM IMPLEMENTATION 9
5.2.1 DATASET COLLECTION 9
6 SYSTEM DESIGN 12
7 SOFTWARE DESCRIPTION 17
7.1 OVERVIEW 17
7.2.5 INFERENCE MODULE 18
7.2.6 EXPLAINABILITY MODULE 18
7.2.7 STREAMLIT INTERFACE MODULE 18
7.2.8 UTILITY MODULE 18
7.3 FRAMEWORK OVERVIEW 19
7.4 FEATURES 19
8 SOFTWARE TESTING 20
9 CONCLUSION AND FUTURE ENHANCEMENT 28
9.1 CONCLUSION 28
APPENDICES 29
APPENDIX 1 - SOURCE CODE 29
APPENDIX 2 - SCREENSHOTS 35
REFERENCES 42
LIST OF FIGURES
LIST OF ABBREVIATIONS
AI - ARTIFICIAL INTELLIGENCE
OS - OPERATING SYSTEM
UI - USER INTERFACE
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
In recent years, the rise of deepfakes—synthetically altered or generated images and videos
using artificial intelligence—has emerged as a major threat to digital authenticity. These
manipulated media can be used maliciously to spread misinformation, commit fraud, or manipulate
public opinion. Deepfakes are typically generated using techniques such as Generative Adversarial
Networks (GANs), which create realistic but fake content that is difficult for humans to distinguish
from authentic material.
As the quality of deepfakes continues to improve, the challenge of detecting them becomes
increasingly complex. This has led to a surge of interest in developing reliable and automated
systems that can detect such manipulations. Deep learning, a subset of artificial intelligence that
excels at pattern recognition, offers promising tools for tackling this problem. By training neural
networks on large datasets of real and fake media, models can learn subtle artifacts and
inconsistencies indicative of tampering.
This project aims to build an effective deepfake detection system using deep learning
techniques. The proposed system analyzes facial and visual features in video frames and uses a
convolutional neural network (CNN) model to classify content as real or fake. This technology has
applications in digital forensics, social media moderation, and media verification.
1.2 PURPOSE
The purpose of this project is to design and implement a deep learning-based system
capable of accurately detecting deepfake videos and images. With the increasing availability of
tools that allow even non-experts to create highly realistic fake media, the integrity of digital
content has become a growing concern. This project aims to combat the misuse of such synthetic
content by developing a reliable and automated solution that can identify tampered visuals through
analysis of facial and visual inconsistencies. By leveraging convolutional neural networks (CNNs)
and other deep learning architectures, the system is expected to detect subtle artifacts introduced
during the generation of deepfakes. Beyond technical implementation, the broader purpose of this
work is to support digital forensics, social media moderation, and public awareness efforts by
offering a scalable method for verifying the authenticity of digital media. This contributes to
safeguarding individuals, organizations, and societies from the potentially harmful consequences
of misinformation, impersonation, and fraud caused by deepfakes.
1.3 PROBLEM STATEMENT
Deepfakes present a serious threat to digital content authenticity, with potentially severe
implications for individuals, corporations, and governments. These AI-generated videos and
images can be manipulated to falsely represent people saying or doing things they never did,
leading to misinformation, defamation, identity theft, and political manipulation. The quality of
deepfakes has advanced to the point where they are nearly indistinguishable from genuine media,
making manual detection by human observers unreliable. Existing detection methods are often
limited in scope, lack real-time performance, and struggle to keep pace with rapidly evolving
deepfake generation techniques. Moreover, traditional forensic analysis techniques are time-
consuming and require expert intervention, making them unsuitable for large-scale content
verification. This project addresses these challenges by developing an intelligent, automated
system that uses deep learning to detect deepfakes with high accuracy. By training models on a
combination of real and fake media datasets, the system aims to identify subtle features that
distinguish authentic content from manipulated media, thus helping to restore trust in digital
communications.
1.4 MOTIVATION
The motivation behind this project stems from the growing misuse of AI-generated content
in malicious contexts such as political misinformation, fake news, financial scams, and personal
reputation damage. By developing a deepfake detection model, we can help combat these threats
and promote trust in digital content, while contributing to AI accountability and responsible media
practices.
1.5 OBJECTIVES
CHAPTER 2
LITERATURE SURVEY
With the rapid development of generative adversarial networks (GANs) and related
technologies, deepfakes have become one of the most challenging threats to digital media
authenticity. Consequently, researchers have actively explored various techniques for detecting
such manipulations using machine learning and deep learning approaches. This chapter reviews
significant existing work in the field of deepfake detection.
Nguyen et al. (2019) explored the use of capsule networks for deepfake detection,
highlighting their ability to preserve spatial hierarchies in facial structures. Their work showed
promise in scenarios where traditional CNNs struggled due to geometric transformations.
Li et al. (2020) proposed Face X-ray, a technique that identifies blending artifacts in
deepfakes by treating the problem as an image segmentation task. This method detects whether a
given image contains a combination of two facial regions—a common trait in face-swapping
deepfakes.
More recent approaches have leveraged transformer-based models and attention
mechanisms to improve detection accuracy. These models focus on capturing long-range
dependencies and facial expressions more effectively, which are often difficult to fake consistently
across frames.
CHAPTER 3
SYSTEM ANALYSIS
In the current landscape, deepfake detection systems face significant challenges due to the
rapid evolution of deepfake generation techniques. Many existing systems rely on manual
observation or traditional video forensics, which are time-consuming, inconsistent, and often
ineffective against high-quality deepfakes. While several machine learning approaches have been
introduced, many of them lack generalization capability and fail to maintain high accuracy across
various deepfake datasets.
The proposed system leverages deep learning, specifically convolutional neural networks
(CNNs), to detect deepfake content based on learned features rather than manually crafted ones.
This system is designed to automatically analyze image or video frames and identify subtle facial
distortions, pixel-level inconsistencies, or blending artifacts commonly found in manipulated
media. By training the model on a diverse dataset of real and fake videos/images, the system aims
to achieve high accuracy and robustness.
The architecture of the system may include a pretrained model (e.g., XceptionNet,
EfficientNet, or ResNet) fine-tuned on deepfake datasets such as FaceForensics++, DFDC, or
Celeb-DF. The model extracts deep visual features from each frame and classifies them as "real"
or "fake." The system can be extended to include temporal models (e.g., LSTM or 3D CNNs) to
analyze video frame sequences and improve detection based on motion inconsistencies.
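As an illustration of this design, the following sketch shows how a pretrained backbone could be fine-tuned for binary real/fake classification in Keras. The choice of Xception, the 299x299 input size, the frozen backbone, and the classification head are illustrative assumptions rather than the project's fixed configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch: fine-tune a pretrained Xception backbone for real/fake classification.
base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # freeze the backbone first; unfreeze later for fine-tuning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # probability that a frame is fake
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])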
This solution is scalable, fast, and capable of detecting both known and emerging deepfake
types. It can be deployed in content moderation systems, mobile apps, or browser extensions to
verify media authenticity in real-time.
CHAPTER 4
SYSTEM SPECIFICATIONS
• RAM : 8-16 GB
• Storage : 256 GB
• Libraries : Scikit-learn, TensorFlow
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 MODULES
The system is implemented as the following modules:
1. Dataset Collection
2. Data Preprocessing
3. Model Design and Training
4. Deepfake Detection
5. Result Evaluation and Visualization
5.2 MODULES DESCRIPTION
5.2.1. Dataset Collection
The Dataset Collection module involves gathering a diverse and comprehensive dataset
that includes both real and deepfake images or video frames. Public datasets such as
FaceForensics++, DFDC, or Celeb-DF are often used for this purpose. These datasets provide a
wide range of manipulated content, ensuring variety in terms of facial expressions, lighting
conditions, backgrounds, and manipulation techniques. The goal is to collect enough data to train
a robust model capable of generalizing to different types of deepfake content.
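A minimal loading sketch is shown below; it assumes the collected images have already been organised into real/ and fake/ sub-folders of a dataset/ directory, and the 80/20 validation split is likewise an assumption.

import tensorflow as tf

# Load a labelled image dataset from dataset/real and dataset/fake folders,
# keeping 20% of the images aside for validation.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", labels="inferred", label_mode="binary",
    image_size=(128, 128), batch_size=32,
    validation_split=0.2, subset="training", seed=42)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", labels="inferred", label_mode="binary",
    image_size=(128, 128), batch_size=32,
    validation_split=0.2, subset="validation", seed=42)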
5.2.2. Data Preprocessing
This module is responsible for preparing the collected data for training. It includes
extracting frames from video files, detecting and aligning faces using tools like MTCNN or
OpenCV, resizing images to the desired input size for the CNN (typically 224x224 pixels), and
normalizing pixel values. To enhance model generalization and reduce overfitting, data
augmentation techniques such as rotation, flipping, brightness adjustment, and noise addition are
also applied. Proper preprocessing ensures consistency and improves the efficiency of the model
training process.
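The sketch below outlines one possible implementation of this stage using OpenCV for frame extraction and the MTCNN package for face detection; the frame-sampling rate and the 224x224 target size are assumptions.

import cv2
import numpy as np
from mtcnn import MTCNN  # assumes the 'mtcnn' package is installed

detector = MTCNN()

def extract_frames(video_path, every_n=10):
    # Sample every n-th frame from a video file.
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

def extract_face(frame_bgr, size=(224, 224)):
    # Detect the most confident face, crop it, resize it and normalise pixels.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = detector.detect_faces(rgb)
    if not detections:
        return None  # later reported as "No face detected"
    x, y, w, h = max(detections, key=lambda d: d["confidence"])["box"]
    face = rgb[max(y, 0):y + h, max(x, 0):x + w]
    face = cv2.resize(face, size)
    return face.astype("float32") / 255.0  # scale pixel values to [0, 1]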
5.2.3. Model Design and Training
In this module, a Convolutional Neural Network (CNN) is designed and trained for binary
classification—determining whether an input is real or fake. This can involve building a custom
CNN architecture or fine-tuning a pre-trained model such as VGG16 or ResNet. The model is
trained using a binary crossentropy loss function and an optimizer like Adam. The dataset is split
into training, validation, and test sets to monitor the model's performance and prevent overfitting.
Training is carried out over multiple epochs, and key metrics such as training accuracy and loss
are recorded.
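A compact training sketch is given below. The random arrays stand in for the preprocessed dataset so the snippet runs on its own, and the layer sizes, epoch count, and early-stopping patience are assumptions rather than the project's exact settings.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

# Placeholder data: replace with the preprocessed face images (X) and
# labels (y, 0 = real / 1 = fake) produced by the previous module.
X = np.random.rand(64, 128, 128, 3).astype("float32")
y = np.random.randint(0, 2, size=(64,)).astype("float32")

model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    X, y,
    validation_split=0.2,  # hold out part of the data to watch for overfitting
    epochs=5,
    batch_size=16,
    callbacks=[EarlyStopping(patience=2, restore_best_weights=True)])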
5.2.4. Deepfake Detection
Once the model is trained, it is used in this module to classify new or unseen media. The
input image or video frame undergoes the same preprocessing steps and is then passed through the
CNN model to predict the likelihood of it being real or fake. Based on the output probability, the
system labels the input accordingly. This module represents the core functionality of the system—
real-time or batch detection of deepfakes using the trained model.
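A possible inference routine is sketched below; the model file name, the 128x128 input size, and the 0.5 decision threshold follow the rest of this report but remain assumptions.

import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("deepfake_model.h5")  # trained model from the previous module

def classify_image(path, threshold=0.5):
    # Apply the same preprocessing used at training time, then predict.
    img = cv2.imread(path)
    if img is None:
        raise ValueError(f"Could not read image: {path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (128, 128)).astype("float32") / 255.0
    prob_fake = float(model.predict(np.expand_dims(img, axis=0))[0][0])
    label = "Fake" if prob_fake >= threshold else "Real"
    return label, prob_fake

label, prob = classify_image("sample.jpg")
print(f"{label} (probability of being fake = {prob:.2f})")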
5.2.5. Result Evaluation and Visualization
The final module of the system is designed to thoroughly evaluate model performance and present
the results through intuitive and insightful visualizations. It begins with the confusion matrix,
which clearly outlines the distribution of true positives, true negatives, false positives, and false
negatives, helping identify the model’s strengths and the nature of its misclassifications. To further
assess the quality of the classification, the module includes a Receiver Operating Characteristic
(ROC) curve, which demonstrates the trade-off between sensitivity and specificity across different
threshold values, providing a visual guide for selecting an optimal decision boundary.
In addition, accuracy and loss graphs are plotted to monitor the model’s training process
over time, comparing training and validation metrics to detect issues like overfitting or
underfitting.
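The sketch below shows how the confusion matrix and ROC curve could be produced with scikit-learn and Matplotlib; the small hand-made label and probability arrays are placeholders rather than results from the trained model.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc

# Placeholder test-set results (1 = fake, 0 = real); replace with real outputs.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.9, 0.6, 0.2])
y_pred = (y_prob >= 0.5).astype(int)

# Confusion matrix of real vs. fake predictions.
ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred),
                       display_labels=["Real", "Fake"]).plot()

# ROC curve with its area under the curve.
fpr, tpr, _ = roc_curve(y_true, y_prob)
plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()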
As an added feature, an optional interactive interface is available, allowing users to upload
video files and observe real-time detection results. This interface displays visual outputs such as
bounding boxes and labels on each frame, offering a hands-on, user-friendly way to test and
explore the system’s functionality. Collectively, these tools not only deliver a complete evaluation
of the model’s performance but also enhance its interpretability and accessibility for both technical
and non-technical users.
CHAPTER 6
SYSTEM DESIGN
6.2 USE CASE DIAGRAM
6.3 CLASS DIAGRAM
6.4 SEQUENCE DIAGRAM
6.5 ACTIVITY DIAGRAM
CHAPTER 7
SOFTWARE DESCRIPTION
7.1 OVERVIEW
The project leverages Python due to its robust ecosystem of AI and image processing
libraries. Using datasets containing both genuine and deepfake videos (e.g., FaceForensics++,
Celeb-DF), the system is trained to distinguish real from fake content. It provides a seamless
pipeline—from media upload to prediction output—through modular components.
7.2 MODULES
7.2.1 DATA INGESTION MODULE
The Data Ingestion Module handles the loading of datasets and the extraction of frames
from video files. It organizes data into training, validation, and test sets while managing associated
labels, enabling the model to learn from real-world examples of deepfakes.
7.2.2 PREPROCESSING MODULE
The Preprocessing Module is responsible for preparing the input data. It detects and crops
faces from images or frames using face detection libraries like MTCNN or dlib, then resizes and
normalizes them. This module also performs data augmentation to enhance model generalization.
7.2.3 CNN MODEL MODULE
The CNN Model Module defines the structure of the neural network used for classification.
It allows for the use of custom or pre-trained models, such as Xception or EfficientNet, and
includes functionality for compiling, training, saving, and loading the model architecture and
weights.
7.2.4 TRAINING AND VALIDATION MODULE
The Training & Validation Module manages the model training loop, monitors
performance metrics like accuracy and F1 score, and applies callbacks such as early stopping. This
module ensures the model learns effectively while avoiding overfitting.
7.2.5 INFERENCE MODULE
The Inference Module is used to predict whether new input media is real or fake. It
processes the input image or video, extracts faces, and applies the trained CNN model. It returns a
classification label with a confidence score.
7.2.7 STREAMLIT INTERFACE MODULE
The Streamlit Interface Module provides a lightweight, user-friendly interface where users
can upload images or videos and receive deepfake predictions in real time. It displays results,
confidence levels, and visualizations directly in the browser.
7.2.8 UTILITY MODULE
The Utility Module supports various background tasks such as configuration management,
file handling, logging, and formatting. It helps streamline development and debugging by
consolidating reusable functions and settings in one place.
7.3 FRAMEWORK OVERVIEW
TensorFlow / Keras
OpenCV
7.4 FEATURES
CHAPTER 8
SOFTWARE TESTING
Software testing in the context of deepfake detection is critical to ensure that the system
accurately and reliably differentiates between real and manipulated media. The aim is to identify
defects in the model logic, data preprocessing, and the user interface, and to validate that the deep
learning model generalizes well across unseen data. This chapter details the testing approaches
applied across all levels of the system to validate functionality, performance, and usability.
8.2.1. Real Image Input:
This test case involves providing a high-quality image with a clearly visible human face. The input
image should be in a supported format like .jpg or .png. The purpose is to verify that the model
accurately identifies real faces. The expected result is a "Real" prediction with high confidence,
typically above 90%. This confirms the model performs well with ideal input conditions.
8.2.2. Deepfake Image Input:
In this case, the model is tested using a confirmed deepfake image. The goal is to ensure the CNN
correctly classifies fake content. The model should return a prediction of "Fake" with high
confidence, validating that it has learned to distinguish synthetic facial features and manipulation
artifacts effectively.
8.2.3. Non-Face Image Input:
This test evaluates the system’s response when an image without a human face is submitted. For
example, images of landscapes, objects, or animals can be used. The model should return an error
message like "No face detected," demonstrating that the face detection preprocessing step is
functioning correctly and that unnecessary processing is avoided.
8.2.4. Low Confidence Output:
To test how the model handles uncertainty, a low-quality, blurry, or partially obscured face image
is input. The model should still attempt a prediction but may return an output labelled "Low confidence
prediction." This helps the user understand that the model is uncertain and provides guidance for
corrective action.
8.2.5. Large Input Handling:
A very high-resolution image or a 4K video is used as input to assess how the model handles large
data. The goal is to ensure the system does not crash due to memory overload or processing
timeouts. The output should still be correct, and the performance should remain stable, confirming
system scalability.
8.2.6. Adversarial Input:
This advanced test uses adversarial examples—images that are subtly modified to confuse the
model, often with added noise or slight distortions. The goal is to check if the model is robust
against minor perturbations. Ideally, the system should still classify the input as "Fake" if it's
indeed a deepfake, showing resilience against manipulation.
8.3.1. Unit Testing
To verify the correctness of individual components of the deepfake classification system, unit tests are written for core functions such as frame extraction, face
detection, image preprocessing, and model prediction. Each of these components is tested
using Python’s unittest framework and assertions, which check whether the actual output
matches the expected result for various test cases. For example, the face detection function
may be tested by passing in an image with a known face and verifying that it returns the
correct bounding box. Unit testing enables early detection of bugs, simplifies debugging,
and helps maintain code quality during ongoing development.
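A unit-test sketch in this spirit is shown below; the preprocess_image helper and its 128x128 normalised output are assumptions about the project's preprocessing module rather than its actual interface.

import unittest
import numpy as np
from preprocessing import preprocess_image  # hypothetical project module

class TestPreprocessing(unittest.TestCase):
    def test_output_shape_and_range(self):
        # A synthetic 480x640 frame stands in for a real input image.
        dummy = (np.random.rand(480, 640, 3) * 255).astype("uint8")
        out = preprocess_image(dummy)
        self.assertEqual(out.shape, (128, 128, 3))   # resized correctly
        self.assertGreaterEqual(out.min(), 0.0)      # normalised to [0, 1]
        self.assertLessEqual(out.max(), 1.0)

if __name__ == "__main__":
    unittest.main()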
8.3.2. Integration Testing
Once individual components have been tested, integration testing is conducted to ensure
that these modules interact correctly when combined. This type of testing examines the
flow of data across modules—specifically from the frame extractor to the preprocessing
unit, then to the deep learning model, and finally to the output generation system. The main
goal is to verify that each component correctly passes formatted and expected data to the
next. For example, integration testing checks whether the preprocessed face images output
by one module are properly formatted and compatible with the input expected by the
classification model. This helps detect interface mismatches, improper data handling, and
communication failures between modules.
8.3.3. Functional Testing
Functional testing evaluates whether the overall system behaves as expected from a user’s
perspective. This includes testing the complete pipeline starting from the user uploading
an image or video, followed by system processing, classification (real or fake), and finally
the display of results on the interface. Test scenarios include valid uploads, invalid file
types, corrupted media, and edge cases like extremely small or blurry faces. The goal is to
ensure the system meets functional requirements, such as successful file uploads, accurate
deepfake detection, and timely feedback. This testing is vital in validating the end-to-end
functionality of the system in realistic usage scenarios.
8.3.4. Regression Testing
As the system evolves, with new features added or existing algorithms improved,
regression testing ensures that these changes do not unintentionally disrupt previously
working functionality. For example, if the face detection algorithm is enhanced or the
model is retrained for better accuracy, regression tests are used to retest all critical
features—such as correct image classification and proper result rendering—that were
already working in earlier versions. Automated test scripts are often used for this purpose
to quickly verify that nothing has been broken in the process of updates.
8.3.5. Performance Testing
Performance testing examines how the system behaves under load. Tools and scripts are used to simulate concurrent uploads and track the system’s ability to
maintain consistent response times, handle memory efficiently, and recover from overload
situations. A well-performing system ensures users experience minimal delays even during
peak usage.
8.3.6. Usability Testing
Usability testing focuses on the design and user interface of the application, ensuring that
it is intuitive and accessible for users with varying levels of technical expertise. Test
participants are asked to perform common tasks such as uploading files, interpreting
results, and troubleshooting errors. During testing, evaluators look for signs of confusion,
difficulty, or hesitation. Elements such as clear instructions, helpful tooltips, informative
error messages, and easy navigation are essential. For instance, if a user uploads an
unsupported file format, the system should provide a clear message indicating the accepted
formats. Based on usability feedback, the interface is adjusted to ensure a smooth and user-
friendly experience.
8.3.7. Black Box Testing
Black box testing treats the system as a "black box" where the internal code and
architecture are not considered. Instead, testing focuses purely on inputs and outputs.
Testers provide a variety of input media files and observe the output (real or fake
classification, error messages, etc.) to ensure correctness. They also evaluate how the
system responds to unexpected or incorrect input, such as uploading text files or extremely
large videos. The goal is to ensure the application behaves correctly and predictably from
a user's perspective, regardless of the underlying implementation.
8.3.8. White Box Testing
In contrast to black box testing, white box testing involves a detailed examination of the
internal workings of the system. This includes checking the structure of the code, data
transformations, model layer outputs, and normalization processes. For example, testers
may verify that pixel values are normalized to the correct range before being fed into the
model, or that the intermediate outputs of convolutional layers fall within expected
distributions. This kind of testing is particularly useful for debugging and optimizing model
performance and verifying that the architecture and data handling conform to design
specifications.
8.3.9. Output Testing
Output testing validates the accuracy and clarity of the final system outputs. The
classification results (i.e., "Real" or "Fake") are compared against a labeled test dataset to
assess prediction accuracy. Additionally, the visual presentation of results is examined—
for instance, checking whether the predicted label is displayed near the detected face along
with a confidence score overlay. The correctness of the overlay, font clarity, color-coding
(e.g., red for fake, green for real), and alignment with detected features are tested to ensure
users can easily understand the results.
8.3.10. User Acceptance Testing
User Acceptance Testing is the final phase where the system is tested by actual end users—
typically a representative group of the intended audience. These users interact with the
system by uploading various media files and interpreting the detection results. Their
feedback is collected on several parameters, including the clarity of classification results,
usefulness of confidence levels, and ease of navigation and interaction. Based on this
feedback, minor enhancements are often implemented, such as more descriptive file format
alerts, better result styling, and improved layout responsiveness. UAT ensures that the
system is ready for deployment and meets real-world user expectations.
In any machine learning or AI-based system, testing is a critical phase that ensures
reliability, correctness, and performance under various conditions. In this project, several tools
have been employed to support both unit-level and system-level testing. The Python modules
unittest and pytest serve as automated unit testing frameworks. These tools allow the developer to
create test cases for individual components such as data loading, preprocessing, face detection,
and model prediction. They help maintain the integrity of the codebase by ensuring that newly
added functions do not break existing features. pytest in particular provides a more scalable and
user-friendly syntax and supports advanced features like fixtures and parameterized testing,
making it ideal for complex deep learning projects.
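For illustration, a parameterised pytest sketch is shown below, reusing the same hypothetical preprocess_image helper; the input resolutions are arbitrary.

import numpy as np
import pytest
from preprocessing import preprocess_image  # hypothetical project module

@pytest.mark.parametrize("height,width", [(256, 256), (480, 640), (1080, 1920)])
def test_preprocess_handles_any_resolution(height, width):
    # The same check runs once per parameter set.
    dummy = (np.random.rand(height, width, 3) * 255).astype("uint8")
    out = preprocess_image(dummy)
    assert out.shape == (128, 128, 3)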
In addition, TensorBoard is used to monitor training, tracking metrics such as loss and accuracy across epochs.
TensorBoard’s interactive graphs and histograms help in understanding how weights and biases
evolve during the training process.
Evaluating the performance of a deep learning model goes beyond simply reporting
accuracy. For a binary classification task such as deepfake detection, it is vital to use a set of robust
evaluation metrics that account for different types of prediction errors. Accuracy, while commonly
used, only indicates the overall correctness of predictions. It can be misleading in imbalanced
datasets where one class may dominate. For instance, if most videos are real, a model predicting
everything as real might still appear accurate.
To address this limitation, Precision is used to measure the number of correctly identified
fake instances divided by the total instances the model predicted as fake. High precision indicates
that when the model claims something is fake, it is likely correct—important in minimizing false
accusations of authenticity. Conversely, Recall focuses on the model’s ability to detect actual fake
content. It is calculated as the number of correctly predicted fake instances divided by the total
number of actual fake samples. High recall ensures the model doesn't miss potential threats in the
form of deepfakes.
The F1-Score serves as a balanced metric that considers both precision and recall. It is
especially useful when dealing with uneven class distributions or when both false positives and
false negatives carry significant consequences. An ideal deepfake detection model should aim for
a high F1-score to maintain balance between caution and coverage. Finally, the ROC-AUC
(Receiver Operating Characteristic - Area Under the Curve) metric is employed to evaluate the
trade-off between sensitivity (true positive rate) and the false positive rate (1 − specificity) across various
threshold settings.
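These metrics can be computed directly with scikit-learn, as in the sketch below; the small label and probability lists are a worked placeholder example rather than results from the trained model.

from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Placeholder ground truth (1 = fake, 0 = real) and predicted fake-probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("Precision:", precision_score(y_true, y_pred))  # of predicted fakes, how many are truly fake
print("Recall:   ", recall_score(y_true, y_pred))     # of actual fakes, how many were caught
print("F1-score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("ROC-AUC:  ", roc_auc_score(y_true, y_prob))    # threshold-independent ranking quality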
Building a user-facing AI system demands that it not only performs accurately but also handles
unexpected situations gracefully. The error handling module in the deepfake detection system is
designed to provide informative and user-friendly responses to a variety of potential issues,
ensuring robustness and enhancing user experience.
One common scenario is when a user uploads an image or video where no recognizable
human face is present. In such cases, the system returns the message: “No human face detected.”
This prevents the model from processing irrelevant or non-human content, which could lead to
misleading outputs. This check is implemented early in the pipeline using face detection
algorithms like MTCNN or Haar cascades.
Another error addressed is invalid file formats. The system is designed to work with
specific media formats (e.g., .jpg, .png, .mp4), and when an unsupported file is uploaded, it
prompts the message: “Unsupported file type.” This safeguards the application from crashing due
to unrecognized data structures and guides the user toward acceptable input types.
If the system encounters a failure in loading the trained model—either due to file
corruption, incorrect path, or missing files—it raises an alert with the message: “Model loading
error.” This is a critical failure point, and the error message informs the user or developer to
recheck the deployment files.
Lastly, the system incorporates a confidence threshold mechanism. If the model makes a
prediction but with a confidence level below 60%, it triggers a warning: “Low confidence. Re-
upload suggested.” This acts as a safeguard against unreliable outputs and encourages users to
submit better-quality inputs, such as clearer images or videos with good lighting and frontal faces.
Collectively, these error-handling features make the system more reliable, user-oriented, and
capable of functioning well in real-world scenarios.
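The following sketch summarises these checks in one routine; the message strings mirror the ones quoted above, while the helper names (detect_fn, predict_fn) and the dictionary-based return format are assumptions.

import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4"}
CONFIDENCE_THRESHOLD = 0.60  # warn below 60% confidence, as described above

def check_upload(filename, media, model, detect_fn, predict_fn):
    # Validate the file type before any processing.
    if os.path.splitext(filename)[1].lower() not in ALLOWED_EXTENSIONS:
        return {"error": "Unsupported file type."}
    # Guard against a missing or corrupted model file.
    if model is None:
        return {"error": "Model loading error."}
    # Reject inputs that contain no recognisable human face.
    faces = detect_fn(media)
    if not faces:
        return {"error": "No human face detected."}
    label, confidence = predict_fn(model, faces[0])
    if confidence < CONFIDENCE_THRESHOLD:
        return {"warning": "Low confidence. Re-upload suggested.",
                "label": label, "confidence": confidence}
    return {"label": label, "confidence": confidence}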
CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT
9.1 CONCLUSION
The increasing prevalence of deepfake media poses a significant threat to digital content
authenticity, personal identity, and information security. This project presents an effective solution
to detect deepfake videos and images using deep learning models. By leveraging convolutional
neural networks (CNNs), the system can learn and extract complex visual features from input
media to distinguish between real and fake content with high accuracy.
Throughout the project, various aspects of deepfake generation and detection were
explored. The proposed system was trained and tested on benchmark datasets and demonstrated
promising results in identifying synthetic facial manipulations. Unlike traditional manual or rule-
based methods, this system relies on learned features, making it more scalable, adaptive, and
suitable for real-world applications.
This work contributes to the broader field of digital forensics and can assist platforms, law
enforcement, and the general public in countering misinformation, fraud, and media tampering.
The proposed model successfully meets the core objectives of detecting manipulated media and
improving awareness regarding the risks of deepfake content.
9.2 FUTURE ENHANCEMENT
While the proposed deepfake detection system demonstrates effective performance, there
are opportunities for further development and improvement in future work. Some key areas of
enhancement include:
Incorporating Temporal Features: Current models often analyze frames individually. Adding
temporal models like 3D CNNs or LSTMs will enable better video-level analysis by capturing
motion-based inconsistencies.
Multi-modal Detection: Integrating both audio and video features will provide more robust
detection, particularly in detecting deepfakes that also manipulate voice and speech patterns.
Real-time Detection Capabilities: Optimization of the system for real-time processing can allow
for implementation in web applications, browser extensions, or mobile platforms for on-the-fly
deepfake analysis.
User Interface Development: Building a simple and interactive front-end interface would allow
non-technical users to upload and check media content for authenticity.
This project lays a solid foundation for future advancements in automated deepfake detection and
has the potential to evolve into a full-fledged system that plays a key role in combating the spread
of synthetic misinformation.
APPENDICES
APPENDIX 1
SOURCE CODE
Training.py
import os
import cv2
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 128  # input size of the CNN (128x128 RGB)

def build_model():
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

def load_data(data_dir):
    # Expects data_dir/real and data_dir/fake sub-folders (assumed layout).
    images, labels = [], []

    def process_folder(folder, label):
        for fname in os.listdir(folder):
            img_path = os.path.join(folder, fname)
            img = cv2.imread(img_path)
            if img is None:
                continue
            img = cv2.resize(img, (IMG_SIZE, IMG_SIZE)) / 255.0
            images.append(img)
            labels.append(label)

    process_folder(os.path.join(data_dir, "real"), 0)
    process_folder(os.path.join(data_dir, "fake"), 1)
    return np.array(images, dtype=np.float32), np.array(labels, dtype=np.float32)

# Train Model
X, y = load_data("dataset")
model = build_model()
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
model.save("deepfake_model.h5")
APP.PY
import streamlit as st
import tensorflow as tf
import numpy as np
import cv2
from PIL import Image

IMG_SIZE = 128  # model input size (128x128 RGB)

@st.cache_resource
def load_model():
    # Path to the trained model produced by Training.py; adjust as needed.
    return tf.keras.models.load_model(r"D:\jp\deepfake_model.h5")

# Preprocess image
def preprocess_image(image):
    image = np.array(image.convert("RGB"))
    image = cv2.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return np.expand_dims(image.astype(np.float32), axis=0)

# Prediction function
def predict(model, image):
    processed_image = preprocess_image(image)
    prediction = model.predict(processed_image)[0][0]
    label = "Fake" if prediction >= 0.5 else "Real"
    confidence = prediction if prediction >= 0.5 else 1 - prediction
    return label, float(confidence)

# Streamlit UI
def main():
    st.title("Deepfake Detection System")
    model = load_model()

    # Image Upload
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded_file is not None:
        image = Image.open(uploaded_file)
        st.image(image, caption="Uploaded image")
        label, confidence = predict(model, image)
        if confidence < 0.6:
            st.warning("Low confidence. Re-upload suggested.")
        st.write(f"Prediction: {label} (confidence: {confidence:.2%})")

if __name__ == "__main__":
    main()
APPENDIX 2
SCREENSHOTS
INITIAL WEBPAGE
DEEPFAKE DETECTION
1. REAL IMAGES
Figure 10.3 DeepFake Detection prediction 2: Real
Figure 10.4 DeepFake Detection prediction 3: Real
2. FAKE IMAGES
Figure 10.6 DeepFake Detection prediction 5: Deepfake
REFERENCES
1. Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019).
FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1–11.
2. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: a Compact Facial
Video Forgery Detection Network. In Proceedings of the IEEE International Workshop
on Information Forensics and Security (WIFS), 1–7.
3. Chollet, F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 1251–1258.
4. Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Capsule-forensics: Using capsule
networks to detect forged images and videos. In ICASSP 2019 - IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2307–2311.
5. Li, Y., Chang, M. C., & Lyu, S. (2020). Face X-ray for more general face forgery
detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 5001–5010.
6. Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C.
(2020). The Deepfake Detection Challenge (DFDC) Dataset. arXiv preprint
arXiv:2006.07397.
https://www.kaggle.com/c/deepfake-detection-challenge
7. Li, Y., & Lyu, S. (2019). Exposing DeepFake Videos By Detecting Face Warping
Artifacts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
8. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-
Scale Image Recognition. arXiv preprint arXiv:1409.1556.
9. Abavisani, M., & Patel, V. M. (2020). Exploring the Space of Deepfake Detection. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), 1–8.
10. Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. In
Proceedings of the International Conference on Learning Representations (ICLR).
https://arxiv.org/abs/1412.6980