Project Report
On
DEEPFAKE DETECTION USING LSTM AND RESNEXT
Submitted by,
ARADHYULA SHASHANK 20J41A1201
MAY-2024
MALLA REDDY ENGINEERING COLLEGE
Maisammaguda, Secunderabad, Telangana, India 500100
BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of project work carried out
under my/our supervision during the academic year 2023 – 2024, and that this
work has not been submitted elsewhere for a degree.
DECLARATION
A SAHITHI 20J41A1202
B VISHNUVARDHAN 20J41A1210
G SRIKANTH 20J41A1218
ACKNOWLEDGEMENT
We are extremely thankful and indebted to our internal guide, Dr. Deena
Babu Mandru, Professor and HOD, Department of Information Technology,
MREC (A) for his constant guidance, encouragement and moral support
throughout the project.
Finally, we would also like to thank all the faculty and staff of the IT
Department who helped us directly or indirectly, parents and friends for their
cooperation in completing the project work.
ABSTRACT
Growing computation power has made it very simple to create highly realistic
synthetic media, so-called deepfakes. Scenarios in which realistic face-swapped
deepfakes are used to create political distress, fake terrorism events, revenge
porn, or blackmail are easily envisioned. In this work, we describe a new deep
learning-based method that can effectively distinguish AI-generated fake videos
from real ones, detecting both replacement and reenactment deepfakes. We are
trying to use Artificial Intelligence (AI) to fight Artificial Intelligence
(AI). Our system uses a pre-trained ResNeXt CNN to extract frame-level
features, and these features are further used to train a Long Short-Term
Memory (LSTM) based Recurrent Neural Network (RNN) to classify whether the
video has been subject to any kind of manipulation, i.e., whether the video is
deepfake or real. To emulate real-time scenarios and make the model perform
better on real-time data, we evaluate our method on a large, balanced, and
mixed dataset drawn from FaceForensics++ [1], the Deepfake Detection
Challenge [2], and Celeb-DF [3]. We also show that our system achieves
competitive results using a very simple and robust approach.
PROJECT ACKNOWLEDGEMENT
This work would not have been possible without the contribution of the
university. We are indebted to the teachers who offered continuous support
while we prepared the project.
We are also grateful to all those with whom we had the opportunity to
work on and complete this project. Each member of the dissertation committee
offered professional guidance and gave us great advice while we completed the
project.
TABLE OF CONTENTS

ACKNOWLEDGMENT
ABSTRACT
PROJECT ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
2 BACKGROUND STUDY
  2.1 Literature Review
  2.2 Existing System
    2.2.1 DFDC Baseline Model
    2.2.2 FaceForensics++
    2.2.3 Deepfake Detection with Capsule Networks
    2.2.4 Hybrid Architectures with Attention Mechanisms
3 METHODOLOGY
  3.1 Proposed Methodology
    3.1.1 System Architecture
  3.2 Modules
    3.2.1 Data-Set Gathering
    3.2.2 Pre-processing
    3.2.3 Data-Set Split
    3.2.4 Architecture
    3.2.5 Hyperparameter Tuning
  3.3 System Design
    3.3.1 Use Case Diagram
    3.3.2 Activity Diagram
    3.3.3 Sequence Diagram
    3.3.4 Workflow
4 RESULT AND ANALYSIS
5 CONCLUSION
6 REFERENCES

LIST OF FIGURES

LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 Introduction
The advent of deepfake technology has blurred the line between reality and
fabrication, raising profound concerns about the integrity of visual and auditory
information. Deepfake algorithms leverage deep neural networks to synthesize
realistic images, videos, and audio recordings by seamlessly superimposing one
person's likeness onto another or manipulating existing content to convey false
narratives. With the democratization of AI tools and the widespread availability
of training data, the barrier to creating convincing deepfakes has diminished,
amplifying the risk of malicious exploitation for disinformation, blackmail, or
other nefarious purposes.
Despite the formidable obstacles ahead, there is cause for optimism in the
collective efforts to combat deepfake manipulation. The interdisciplinary nature
of this endeavor, spanning fields such as artificial intelligence, cybersecurity,
psychology, and law, underscores the complexity of the problem and the
necessity for a multifaceted approach. By harnessing the power of technological
innovation, ethical considerations, and regulatory frameworks, we can strive
towards a future where the integrity of digital content is safeguarded, and trust in
media authenticity is restored.
Fake news is not just one type of content coming from one source but a
different type of manipulated media created to misinform or deceive readers
deliberately. Usually, this news is visually sensational, made to either influence
people’s views, push a political agenda, or cause confusion. Viewers see that
fake reality and share it through social media, thus becoming responsible for
the virality of false news. Fake news is increasingly being shared via social
media platforms like Twitter and Facebook. For example, the polygamy hoax
stating that the government of Eritrea had made it mandatory for each man to
marry two wives was shared in at least four countries, namely Kenya, Nigeria,
Eritrea, and Sudan. The Eritrean Embassy in Nairobi later refuted this story,
which initially sounded genuine and trended on Twitter.
The term "deepfake", a combination of "deep learning" and fake content, was
named after the Reddit user "Deepfake", who in late 2017 claimed to have
developed a machine learning algorithm for transferring celebrity faces into
adult content.
Deepfake technology was rapidly extended to create fake news, including
newscasts, pictures, and even leaders' speeches. A deepfake is a video or still
image manipulated from the original to replace a person's identity (face) with
another identity, with or without an audio track, using a generative
adversarial network (GAN). A GAN is a type of neural network in which two
different networks coordinate with each other: while one network generates the
fakes, the other evaluates whether the generated content is real or fake. With
the evolution of deepfake technology, three categories of deepfakes have
emerged.
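As an illustration of the two coordinating networks, the sketch below trains a toy GAN on synthetic 2-D data. This is not part of this project's pipeline; the network sizes, learning rate, and toy data are arbitrary choices for illustration.

```python
# Toy GAN sketch: a generator and a discriminator trained adversarially.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator maps 8-D noise to a 2-D "sample"; discriminator scores samples.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
discriminator = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_step(real_batch: torch.Tensor) -> tuple[float, float]:
    batch = real_batch.size(0)
    fake_batch = generator(torch.randn(batch, 8))

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = loss_fn(discriminator(real_batch), torch.ones(batch, 1)) + \
             loss_fn(discriminator(fake_batch.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 on fakes.
    g_loss = loss_fn(discriminator(fake_batch), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Repeating `train_step` on batches of real data drives the two networks against each other, which is the coordination described above.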
These are face swap (swapping the face of one subject in an image or a video
with another while keeping the rest of the body and background context), lip
sync (matching a subject's lip movements with pre-recorded audio while
maintaining the expressions on the rest of the face), and puppet-master
(recording the movements of an "actor" and superimposing that image onto
another subject).
With the rapid growth of open-source deepfake generation tools such as
DeepFaceLab, Faceswap, and Wav2Lip, any person with a laptop and some basic
understanding of programming can create their own deepfake. Recently, mobile
applications (e.g., ZAO and WOMBO) have been trending on the Internet due to
their ingenious capability of creating deepfakes within seconds. Taking into
account the availability and accessibility of these open-source tools and
applications, their possible malicious usage (such as disinformation or online
harassment) by expert or novice users, and the improvement in the quality of
deepfakes and deepfake generation algorithms, there is a need for a deepfake
detection system to determine whether an image or a video is deepfake or
original content.
Legal attention to deepfakes has also grown over the past year. Different
states across the US are defining procedures to criminalize deepfake
pornography and prohibit the use of deepfakes, especially in the voting
context.
In one study, the authors inserted inputs called adversarial examples into
every video frame to show that the current state-of-the-art deepfake detection
methods can be easily bypassed. Another strategy to counter fake videos and
images posted online is to embed a digital watermark in the content. However,
digital watermarks are not foolproof, and this problem can be countered by
incorporating a blockchain to hold a tamper-proof record of watermark and
content features. In the literature, only one paper can be found that has
proposed a proof-of-concept deepfake video news detection and prevention
system using watermarking and blockchain technologies. Digimarc's robust
audio and image watermarking techniques are employed to embed watermarks in
the audio and video tracks of video news clips before their distribution.
Blockchain technology is used to store the video and its metadata for
performing forensic analysis. The watermarks can be detected at social media
networks' portals, nodes, and back ends. For deepfake generation, the authors
used a face-swapping algorithm to replace the subject's face with a target
person. A two-stage authentication process is performed to detect fake news
videos generated from watermarked video clips. While the first stage uses
Digimarc's watermark reader on the frames of the deepfake video, the second
stage uses the information stored in the blockchain. The proposed scheme
provides an informal security analysis of the deepfake detection scheme and
presents the simulation results of
the face-swap algorithm for proof of concept. However, the transparency and
robustness of the embedded watermarks have not been evaluated.
A related work presents a proof-of-concept system that detects fake news video
clips generated using voice impersonation. A hybrid watermarking method is
proposed that combines robust and fragile speech watermarking schemes, thus
providing copyright protection and tamper-proofing. Cross-referencing of
speech and video features is used to provide resistance against possible copy
attacks. The metadata related to the embedded watermarks and the content
features is stored in the blockchain for tamper-proof recording. Simulations
are performed to evaluate the embedded watermark's robustness against common
signal processing and video integrity attacks.
It thus becomes very important to spot the difference between a deepfake and a
pristine video. We are using AI to fight AI. Deepfakes are created using tools
like FaceApp and FaceSwap, which use pre-trained neural networks such as GANs
or autoencoders. Our method uses an LSTM-based artificial neural network to
perform sequential temporal analysis of the video frames and a pre-trained
ResNeXt CNN to extract the frame-level features. The ResNeXt convolutional
neural network extracts the frame-level features, and these features are
further used to train a Long Short-Term Memory based recurrent neural network
to classify the video as deepfake or real. To emulate real-time scenarios and
make the model perform better on real-time data, we trained our model on a
large, balanced combination of the available datasets FaceForensics++, the
Deepfake Detection Challenge, and Celeb-DF.
Further, to make the system ready for end users, we have developed a front-end
application where the user uploads a video. The video is processed by the
model, and the output is rendered back to the user with the classification of
the video as deepfake or real along with the confidence of the model.
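As an illustration of such a front end (the web framework, route name, and the `predict` stub are our own assumptions; the report does not specify them), a minimal Flask sketch might look like:

```python
# Hypothetical sketch of the upload-and-classify front end described above.
from flask import Flask, request, jsonify

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 100 * 1024 * 1024  # reject uploads over 100 MB

def predict(video_bytes: bytes) -> tuple[str, float]:
    """Placeholder for the trained LSTM + ResNeXt model (label, confidence)."""
    return "REAL", 0.97

@app.route("/upload", methods=["POST"])
def upload():
    video = request.files.get("video")
    if video is None:
        return jsonify(error="No file selected"), 400
    label, confidence = predict(video.read())
    return jsonify(result=label, confidence=confidence)
```

The real application would replace `predict` with a call into the trained model and render the result in the page rather than returning JSON.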
CHAPTER 2
BACKGROUND STUDY
In the context of deepfake detection, the following is a brief survey of key
papers and approaches; newer developments may not be covered here.
[2] Rana, M. S., Nobi, M. N., Murali, B., & Sung, A. H. (2022). Deepfake
detection: A systematic literature review. IEEE Access, 10, 25494-25513.
Over the last few decades, rapid progress in AI, machine learning, and
deep learning has resulted in new techniques and various tools for manipulating
multimedia. Though the technology has been mostly used in legitimate
applications such as for entertainment and education, etc., malicious users have
also exploited them for unlawful or nefarious purposes. For example,
high-quality and realistic fake videos, images, or audio have been created to spread
misinformation and propaganda, foment political discord and hate, or even harass
and blackmail people. The manipulated, high-quality and realistic videos have
become known recently as Deepfake. Various approaches have since been
described in the literature to deal with the problems raised by Deepfake. To
provide an updated overview of the research works in Deepfake detection, we
conduct a systematic literature review (SLR) in this paper, summarizing 112
relevant articles from 2018 to 2020 that presented a variety of methodologies.
We analyze them by grouping them into four different categories: deep learning-
based techniques, classical machine learning-based methods, statistical
techniques, and blockchain-based techniques. We also evaluate the performance
of the detection capability of the various methods with respect to different
datasets and conclude that the deep learning-based methods outperform other
methods in Deepfake detection.
[3] "A system for mitigating the problem of deepfake news videos using
watermarking," Electronic Imaging, no. 4, pp. 117-1–117-10, 2020. Deepfakes
constitute fake content (generally in the form of video clips and other media
formats such as images or audio) created using deep learning algorithms. With
the rapid development of artificial intelligence (AI) technologies, the deepfake
content is becoming more sophisticated, with the developed detection techniques
proving to be less effective. So far, most of the detection techniques in the
literature are based on AI algorithms and can be considered as passive. This
paper presents a proof-of-concept deepfake detection system that detects fake
news video clips generated using voice impersonation. In the proposed scheme,
digital watermarks are embedded in the audio track of a video using a hybrid
speech watermarking technique. This is an active approach for deepfake
detection. A standalone software application can perform the detection of robust
and fragile watermarks. Simulations are performed to evaluate the embedded
watermark's robustness against common signal processing and video integrity
attacks. As far as we know, this is one of the first few attempts to use digital
watermarking for fake content detection.
Online social media platforms deploy methods to filter fake content. Although
this can be an effective method, its centralized approach gives enormous power
to the managers of these services. Considering the above, this paper outlines
the main principles and research approach of the ongoing DISSIMILAR project,
which is focused on the detection of fake news on social media platforms using
information hiding techniques, in particular digital watermarking, combined
with machine learning approaches.
Recurrent convolutional models can map variable-length inputs (i.e., video
frames) to variable-length outputs (i.e., natural language text) and can model
complex temporal dynamics, yet they can be optimized with backpropagation. Our
recurrent long-term models are directly connected to state-of-the-art visual
convnet models and can be jointly trained, updating temporal dynamics and
convolutional perceptual representations simultaneously. Our results show such
models have distinct advantages over state-of-the-art models for recognition
or generation which are separately defined and/or optimized.
[8] Zi, B., Chang, M., Chen, J., Ma, X., & Jiang, Y. G. (2020, October).
WildDeepfake: A challenging real-world dataset for deepfake detection. In
Proceedings of the 28th ACM International Conference on Multimedia (pp.
2382-2390). In recent years, the abuse of a face swap technique called deepfake
has raised enormous public concerns. So far, a large number of deepfake videos
(known as "deepfakes") have been crafted and uploaded to the internet, calling
for effective countermeasures. One promising countermeasure against deepfakes
is deepfake detection. Several deepfake datasets have been released to support
the training and testing of deepfake detectors, such as DeepfakeDetection [1] and
FaceForensics++ [23]. While this has greatly advanced deepfake detection, most
of the real videos in these datasets are filmed with a few volunteer actors in
limited scenes, and the fake videos are crafted by researchers using a few popular
deepfake software tools. Detectors developed on these datasets may become less
effective against real-world deepfakes on the internet. To better support detection
against real-world deepfakes, in this paper, we introduce a new dataset
WildDeepfake, which consists of 7,314 face sequences extracted from 707
deepfake videos collected completely from the internet. WildDeepfake is a small
dataset that can be used, in addition to existing datasets, to develop and test the
effectiveness of deepfake detectors against real-world deepfakes. We conduct a
systematic evaluation of a set of baseline detection networks on both existing and
our WildDeepfake datasets, and show that WildDeepfake is indeed a more
challenging dataset, where the detection performance can decrease drastically.
[10] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection
and alignment using multitask cascaded convolutional networks," IEEE Signal
Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016. Face detection and
alignment in unconstrained environments are challenging due to various poses,
alignment in unconstrained environment are challenging due to various poses,
illuminations, and occlusions. Recent studies show that deep learning approaches
can achieve impressive performance on these two tasks. In this letter, we propose
a deep cascaded multitask framework that exploits the inherent correlation
between detection and alignment to boost up their performance. In particular, our
framework leverages a cascaded architecture with three stages of carefully
designed deep convolutional networks to predict face and landmark location in a
coarse-to-fine manner. In addition, we propose a new online hard sample mining
strategy that further improves the performance in practice. Our method achieves
superior accuracy over the state-of-the-art techniques on the challenging Face
Detection Dataset and Benchmark (FDDB) and WIDER FACE benchmarks for face
detection, and the Annotated Facial Landmarks in the Wild (AFLW) benchmark for
face alignment, while keeping real-time performance.
[11] Ahmed, S. R., Sonuç, E., Ahmed, M. R., & Duru, A. D. (2022,
June). Analysis survey on deepfake detection and recognition with convolutional
neural networks. In 2022 International Congress on Human-Computer
Interaction, Optimization and Robotic Applications (HORA) (pp. 1-7). IEEE.
Deep Learning (DL) is the most efficient technique to handle a wide range of
challenging problems such as data analytics, diagnosing diseases, detecting
anomalies, etc. The development of DL has raised some privacy, justice, and
national security issues. Deepfake is a DL-based application that has been very
popular in recent years and is one of the reasons for these problems. Deepfake
technology can create fake images and videos that are difficult for humans to
recognize as real or not. Therefore, automated methods need to be proposed so
that devices can detect and evaluate threats. In other words, digital and
visual media must maintain their integrity. A set of rules used for Deepfake and
some methods to detect the content created by Deepfake have been proposed in
the literature. This paper summarizes what we have in the critical discussion
about the problems, opportunities, and prospects of Deepfake technology. We
aim for this work to be an alternative guide to getting knowledge of Deepfake
detection methods. First, we cover Deepfake history and Deepfake techniques.
Then, we present how a better and more robust Deepfake detection method can
be designed to deal with fake content.
[12] Deepfake defense not only requires research on detection but also
requires efforts on generation methods. However, current deepfake methods
suffer from obscure workflows and poor performance. To solve this problem, we
present DeepFaceLab, the current dominant deepfake framework for face-swapping. It
provides the necessary tools as well as an easy-to-use way to conduct high-quality
face-swapping. It also offers a flexible and loose coupling structure for people who
need to strengthen their pipeline with other features without writing complicated
boilerplate code. We detail the principles that drive the implementation of
DeepFaceLab and introduce its pipeline, through which every aspect of the pipeline
can be modified painlessly by users to achieve their customization purpose. It is
noteworthy that DeepFaceLab could achieve cinema-quality results with high fidelity.
We demonstrate the advantage of our system by comparing our approach with other
face-swapping methods.
[16] Zhao, T., Xu, X., Xu, M., Ding, H., Xiong, Y., & Xia, W. (2021).
Learning self-consistency for deepfake detection. In Proceedings of the
IEEE/CVF international conference on computer vision (pp. 15023-15033). We
propose a new method to detect deepfake images using the cue of the source
feature inconsistency within the forged images. It is based on the hypothesis that
images' distinct source features can be preserved and extracted after going
through state-of-the-art deepfake generation processes. We introduce a novel
representation learning approach, called pair-wise self-consistency learning
(PCL), for training ConvNets to extract these source features and detect deepfake
images. It is accompanied by a new image synthesis approach, called
inconsistency image generator (I2G), to provide richly annotated training data for
PCL. Experimental results on seven popular datasets show that our models
improve averaged AUC from 96.45% to 98.05% over the state of the art in the
in-dataset evaluation and from 86.03% to 92.18% in the cross-dataset evaluation.
2.2 Existing System

2.2.1 DFDC Baseline Model
Temporal analysis, on the other hand, focuses on the temporal flow and
coherence of the video sequence. This facet is particularly crucial for detecting
deepfakes, as they often struggle to maintain consistency over time, leading to
irregularities in motion or facial dynamics. Long Short-Term Memory (LSTM)
networks play a central role in this aspect, enabling the model to capture and
analyze temporal patterns effectively. By considering the progression of actions
and expressions throughout the video, the baseline model can discern whether the
observed behaviors are natural or artificially generated.
DISADVANTAGES
Detecting deepfakes using the DFDC (DeepFake Detection Challenge) Baseline
Model has several disadvantages, including the following:
2. High False Positive Rate: Due to its reliance on specific features, the baseline
model may misclassify authentic videos as deepfakes, leading to a high false
positive rate. This can cause unnecessary concern or mistrust, particularly in
contexts where accurate identification is crucial.
2.2.2 FaceForensics++
DISADVANTAGES
Detecting deepfakes using FaceForensics++ has several limitations and
disadvantages:
5. Inference Time: The time it takes to analyze and detect deepfakes using
FaceForensics++ could be relatively high, especially for high-resolution videos
or when using complex detection models. This latency may not be acceptable for
applications where real-time detection is critical.
2.2.3 Deepfake Detection with Capsule Networks
DISADVANTAGES
Detecting deepfakes using Capsule Networks (CapsNets), despite being a
promising approach, has its own set of disadvantages, including the following:
Capsule networks therefore require further research and development to
overcome these limitations and realize their full potential in combating the
spread of deepfake content.
2.2.4 Hybrid Architectures with Attention Mechanisms

Hybrid architectures extend single-backbone detectors with additional modules
for feature fusion and decision fusion. Feature fusion modules enable the
integration of spatial and temporal information extracted by different
components of the network, facilitating a comprehensive understanding of the
input data. Decision fusion mechanisms combine the outputs of multiple
branches or modalities within the network, further enhancing the model's
discriminative power.
CHAPTER 3
METHODOLOGY
LSTM: Long Short-Term Memory, a recurrent neural network variant whose gated
memory cells allow it to learn long-range temporal dependencies across video
frames.

ResNeXt: a residual convolutional neural network that aggregates a set of
parallel transformations with the same topology (its "cardinality"), used here
as a pre-trained frame-level feature extractor.
The system architecture is summarized by a block diagram. The system first
uploads a video to a server. The server then preprocesses the video, which
involves splitting the video into frames and cropping the faces out of the
frames. The cropped faces are saved in a dataset, which is then split into
training and testing data. The training data is used to train a deep learning
model to detect deepfakes, and the testing data is used to evaluate the
performance of the model. The deep learning model used in the system is a
combination of a Long Short-Term Memory (LSTM) network and a ResNeXt network:
the LSTM network learns the temporal features of the video, while the ResNeXt
network learns the spatial features. Once the model is trained, it is used to
predict whether a new video is real or a deepfake. The prediction is made by
loading the trained model and feeding the video into it; the model then
outputs a classification, which is either "real" or "fake".

Here is a more detailed explanation of the steps involved in the system:
Upload video: The first step in the system is to upload the video that you want
to be analyzed. The video can be uploaded from a local file or from a remote
server.
Data splitting: The dataset is then split into training and testing data. The
training data is used to train the deep learning model to detect deepfakes. The
testing data is used to evaluate the performance of the model.
Data loader: The data loader is responsible for loading the training and testing
data into the deep learning model.
Train/Test the Model: The training data is used to train the deep learning
model. The testing data is used to evaluate the performance of the model.
Deepfake detection Model: The deep learning model used in the system is a
combination of a Long Short-Term Memory (LSTM) network and a ResNext
network. The LSTM network is used to learn the temporal features of the video,
while the ResNext network is used to learn the spatial features of the video.
Load Trained Model: Once the model is trained, it is saved to a file. The
model can then be loaded back into memory when you want to use it to detect
deepfakes.
Prediction Flow: The prediction flow is the part of the system that is used to
make predictions on new videos. To make a prediction on a new video, you
simply load the trained model and then feed the video into the model. The model
will then output a classification, which is either "real" or "fake".
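The prediction flow above can be summarized in a small sketch. The `Prediction` type, the 0.5 decision threshold, and the assumption that the model returns a probability of "real" are our own illustrative choices:

```python
# Schematic of the prediction flow: loaded model in, label + confidence out.
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str         # "REAL" or "FAKE"
    confidence: float  # model confidence in [0, 1]

def predict_video(model, frames) -> Prediction:
    """Feed preprocessed face frames through the loaded model."""
    score = model(frames)  # assumed: probability that the video is real
    label = "REAL" if score >= 0.5 else "FAKE"
    confidence = score if label == "REAL" else 1.0 - score
    return Prediction(label, confidence)
```

In the actual system, `model` would be the trained LSTM + ResNeXt network loaded from disk, and `frames` the face-cropped frame sequence produced by preprocessing.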
3.2 MODULES
3.2.2 Pre-processing

In this step, the videos are preprocessed and all unrequired content and noise
are removed. Only the required portion of the video, i.e., the face, is
detected and cropped. The first step in the preprocessing of the video is to
split the video into frames. After splitting the video into frames, the face
is detected in each frame and the frame is cropped along the face. The cropped
frames are then recombined into a new video. This process is followed for each
video, which leads to the creation of a processed dataset containing face-only
videos. Frames that do not contain a face are ignored during preprocessing.

To maintain a uniform number of frames, we selected a threshold value based on
the mean of the total frame count of each video. Another reason for selecting
a threshold value is limited computation power: a video of 10 seconds at 30
frames per second (fps) has 300 frames in total, and it is computationally
very difficult to process all 300 frames at once in the experimental
environment.
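The frame-threshold logic above can be sketched as follows. The function names and the uniform-sampling strategy are illustrative assumptions; the report only states that the threshold is based on the mean frame count:

```python
# Sketch: choose a sequence length from the mean frame count, then pick
# uniformly spaced frame indices so long videos need not be processed in full.
def sequence_length(frame_counts: list[int]) -> int:
    """Threshold = mean of the per-video frame counts."""
    return sum(frame_counts) // len(frame_counts)

def sample_indices(total_frames: int, seq_len: int) -> list[int]:
    """Uniformly spaced frame indices covering the whole video."""
    step = total_frames / seq_len
    return [int(i * step) for i in range(seq_len)]
```

For a 300-frame video and a threshold of 30, this selects every tenth frame rather than all 300.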
3.2.3 Data-Set Split

The dataset is split into train and test sets with a ratio of 70% train videos
(4,200) and 30% test videos (1,800). The split is balanced, i.e., each split
contains 50% real and 50% fake videos.
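The balanced 70/30 split can be sketched as follows (the helper name and seed are illustrative); shuffling real and fake videos separately guarantees each partition keeps the 50/50 class balance:

```python
# Sketch: balanced train/test split, shuffling each class independently.
import random

def balanced_split(real: list, fake: list, train_ratio: float = 0.7,
                   seed: int = 42) -> tuple[list, list]:
    rng = random.Random(seed)
    train, test = [], []
    for videos in (real, fake):
        shuffled = videos[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)  # 70% of each class to train
        train += shuffled[:cut]
        test += shuffled[cut:]
    return train, test
```

With 3,000 real and 3,000 fake videos this yields the 4,200 / 1,800 split described above.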
3.2.4 Architecture
Our model is a combination of a CNN and an RNN. We have used the pre-trained
ResNeXt CNN model to extract features at the frame level, and based on the
extracted features an LSTM network is trained to classify the video as
deepfake or pristine. Using the Data Loader on the training split, the videos
and their labels are loaded and fed into the model for training.

ResNeXt

Instead of writing the code from scratch, we used the pre-trained ResNeXt
model for feature extraction. ResNeXt is a residual CNN network optimized for
high performance on deeper neural networks.
The system begins with three datasets, FaceForensics++, the DFDC dataset, and
Celeb-DF, containing a total of 6,000 videos. After preprocessing, which
involves face detection and cropping, these videos become the ‘Face-cropped
Dataset’. This dataset is then split into training videos (4,200) and test
videos (1,800).
A Data Loader component feeds the videos and labels into a ResNext CNN
(Convolutional Neural Network) and an LSTM (Long Short-Term Memory
network), likely to train a deep learning model that can identify patterns in real
and deepfake videos. A confusion matrix then evaluates the performance of
the model, providing an accuracy rating. Finally, the system exports the trained
model, presumably for deployment to identify deepfakes in future videos.
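The confusion-matrix evaluation mentioned above can be sketched in plain Python; the label convention (1 = fake, 0 = real) and function names are our own assumptions:

```python
# Sketch: binary confusion matrix and accuracy for the deepfake classifier.
def confusion_matrix(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    cm = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:    cm["tp"] += 1  # fake correctly flagged
        elif t == 0 and p == 0:  cm["tn"] += 1  # real correctly passed
        elif t == 0 and p == 1:  cm["fp"] += 1  # real misflagged as fake
        else:                    cm["fn"] += 1  # fake missed
    return cm

def accuracy(cm: dict[str, int]) -> float:
    return (cm["tp"] + cm["tn"]) / sum(cm.values())
```

The false-positive count here corresponds directly to the high-false-positive concern raised for the baseline model in Chapter 2.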
The system has a user who uploads a video. The system then takes the video
through preprocessing steps. After preprocessing, the video is fed into a
ResNext CNN (Convolutional Neural Network) and an LSTM (Long Short-
Term Memory network). These are likely deep learning models that analyze
the video to determine if it's real or fake (REAL/FAKE). The system then
outputs a result, presumably categorizing the video as real or fake.
3.3.4 WORKFLOW
The diagram shows the deepfake detection workflow operating on a user video.
The first step is to pre-process the video, which likely involves resizing and
converting the video to a format suitable for analysis. Then, the system
performs face detection and cropping, isolating the faces in the video. Next, a
trained model is loaded to analyze the preprocessed faces. This model was
likely trained on a large dataset of faces. Finally, the system outputs a
prediction, classifying the faces as real or fake.
CHAPTER 4
RESULT AND ANALYSIS

Finally, the trained model is ready for inference. Given a new video, it
outputs a real/fake classification together with the model's confidence.
Figure 4.2 shows the next step: click the Choose File button on the home page
and select the video you want to check for deepfake manipulation. The selected
video will be visible on the interface; next, click the Upload button to
upload the video.

Figure 4.3 shows that after uploading, the video is analyzed with the LSTM and
ResNeXt model and the output indicates whether it is a fake or a real video.
Figure 4.4 shows the upload of a fake video; the model starts analyzing the
video using the LSTM and ResNeXt model.

Figure 4.5 shows the output for an uploaded video classified as fake by the
LSTM and ResNeXt model.
Figure 4.6 explains that you should always select a video that contains a face.
Figure 4.7 shows that there will be no output if we upload a video with no
faces.

Figure 4.8 shows the error displayed if we upload a video larger than 100 MB,
regardless of whether the video is fake or original.

Figure 4.9 shows the warning displayed under the Choose File box if we click
the Upload button without selecting a file.
CHAPTER 5
CONCLUSION
REFERENCES:
[2] Rana, M. S., Nobi, M. N., Murali, B., & Sung, A. H. (2022). Deepfake
detection: A systematic literature review. IEEE Access, 10, 25494-25513.

[4] Güera, D., & Delp, E. J. (2018). Deepfake video detection using recurrent
neural networks. In 15th IEEE International Conference on Advanced Video and
Signal Based Surveillance (pp. 1-6).

[8] Zi, B., Chang, M., Chen, J., Ma, X., & Jiang, Y. G. (2020, October).
WildDeepfake: A challenging real-world dataset for deepfake detection. In
Proceedings of the 28th ACM International Conference on Multimedia (pp.
2382-2390).

[10] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and
alignment using multitask cascaded convolutional networks. IEEE Signal
Processing Letters, 23(10), 1499-1503.

[11] Ahmed, S. R., Sonuç, E., Ahmed, M. R., & Duru, A. D. (2022, June).
Analysis survey on deepfake detection and recognition with convolutional
neural networks. In 2022 International Congress on Human-Computer Interaction,
Optimization and Robotic Applications (HORA) (pp. 1-7). IEEE.

[16] Zhao, T., Xu, X., Xu, M., Ding, H., Xiong, Y., & Xia, W. (2021). Learning
self-consistency for deepfake detection. In Proceedings of the IEEE/CVF
International Conference on Computer Vision (pp. 15023-15033).