
A MAJOR PROJECT REPORT

On
DEEPFAKE DETECTION USING LSTM AND
RESNEXT

Submitted by,
ARADHYULA SHASHANK 20J41A1201

ANNABATHULA SAHITHI 20J41A1202

BOMPALLY VISHNUVARDHAN 20J41A1210

GUGULOTH SRIKANTH 20J41A1218

In partial fulfillment of the requirements for the award of the degree


of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

Under the Guidance of


DR. DEENA BABU MANDRU
Professor & HOD-IT

DEPARTMENT OF INFORMATION TECHNOLOGY


MALLA REDDY ENGINEERING COLLEGE
An UGC Autonomous Institution (Approved by AICTE, New Delhi & Affiliated to JNTUH, Hyderabad)
Maisammaguda, Secunderabad, Telangana, India 500100

MAY-2024
MALLA REDDY ENGINEERING COLLEGE
Maisammaguda, Secunderabad, Telangana, India 500100

BONAFIDE CERTIFICATE

This is to certify that this major project work entitled “DEEPFAKE DETECTION USING LSTM AND RESNEXT”, submitted by ARADHYULA SHASHANK (20J41A1201), ANNABATHULA SAHITHI (20J41A1202), BOMPALLY VISHNUVARDHAN (20J41A1210), and GUGULOTH SRIKANTH (20J41A1218) to Malla Reddy Engineering College, affiliated to Jawaharlal Nehru Technological University, Hyderabad, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Information Technology, is a bonafide record of project work carried out under my supervision during the academic year 2023 – 2024, and that this work has not been submitted elsewhere for a degree.

Under the Guidance of


Dr. Deena Babu Mandru
Professor and HOD
Department of Information Technology
Malla Reddy Engineering College
Secunderabad, 500 100

Internal Examiner External Examiner

Submitted for Major Project Viva-Voce Examination held on



DECLARATION

We hereby declare that the project titled ‘DEEPFAKE DETECTION USING LSTM AND RESNEXT’, submitted to Malla Reddy Engineering College (Autonomous), affiliated with JNTUH, Hyderabad, in partial fulfillment of the requirements for the award of a Bachelor of Technology in Information Technology, represents our ideas in our own words. Wherever others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity, and we have not misrepresented, fabricated, or falsified any idea, data, fact, or source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute. It is further declared that the project report, or any part thereof, has not been previously submitted to any University or Institute for the award of a degree or diploma.

Name of the Student    Roll No.    Signature


A SHASHANK 20J41A1201

A SAHITHI 20J41A1202

B VISHNUVARDHAN 20J41A1210

G SRIKANTH 20J41A1218

i

ACKNOWLEDGEMENT

We are extremely thankful to our beloved Chairman and Founder of Malla Reddy Group of Institutions, Sri. Ch. Malla Reddy, for providing the infrastructure facilities necessary to complete the project work successfully.

We express our sincere thanks to our Principal, Dr. A. Ramaswami Reddy, who took a keen interest and encouraged us in every effort during the project work.

We express our heartfelt thanks to Dr. Deena Babu Mandru, Professor and HOD, Department of Information Technology, MREC (A), for all the kind support and valuable suggestions during the period of our project.

We are extremely thankful to our project coordinator, N. Satish Kumar, for the constant guidance and support to complete the project work.

We are extremely thankful and indebted to our internal guide, Dr. Deena Babu Mandru, Professor and HOD, Department of Information Technology, MREC (A), for his constant guidance, encouragement, and moral support throughout the project.

Finally, we would also like to thank all the faculty and staff of the IT Department who helped us directly or indirectly, and our parents and friends for their cooperation in completing the project work.

ARADHYULA SHASHANK 20J41A1201

ANNABATHULA SAHITHI 20J41A1202

BOMPALLY VISHNUVARDHAN 20J41A1210

GUGULOTH SRIKANTH 20J41A1218

ii
ABSTRACT
The growing computation power has made deep learning algorithms so powerful that creating indistinguishable synthesized human videos, popularly called deepfakes, has become very simple. Scenarios where these realistic face-swapped deepfakes are used to create political distress, fake terrorism events, revenge porn, or to blackmail people are easily envisioned. In this work, we describe a new deep learning-based method that can effectively distinguish AI-generated fake videos from real videos. Our method is capable of automatically detecting replacement and reenactment deepfakes. We are trying to use Artificial Intelligence (AI) to fight AI. Our system uses a ResNeXt convolutional neural network to extract frame-level features, and these features are then used to train a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) to classify whether the video has been subjected to any kind of manipulation, i.e., whether the video is a deepfake or a real video. To emulate real-time scenarios and make the model perform better on real-time data, we evaluate our method on a large, balanced, and mixed dataset prepared by combining various available datasets such as FaceForensics++ [1], the Deepfake Detection Challenge [2], and Celeb-DF [3]. We also show how our system can achieve competitive results using a very simple and robust approach.

Keywords: ResNeXt Convolutional Neural Network, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Computer Vision

iii
PROJECT ACKNOWLEDGEMENT

This work would not have been possible without the contribution of the university. We are indebted to the teachers who offered continuous support while we prepared the project.

We are also grateful to all those with whom we had the opportunity to work in completing this project. Each member of the dissertation committee offered professional guidance and gave us great advice while we completed the project.

On a personal note, we are also grateful to our family members, who offered us continuous support while we completed the project. Without their help and support, this project would not have been completed.

iv
TABLE OF CONTENTS

DESCRIPTION PAGE NO
ACKNOWLEDGMENT ii
ABSTRACT iii
PROJECT ACKNOWLEDGEMENT iv
TABLE OF CONTENTS v
LIST OF FIGURES vi
LIST OF ABBREVIATIONS vii

S NO CHAPTER PAGE NO
1 INTRODUCTION…………………………………………... 01-08
2 BACKGROUND STUDY…………………………………… 09-30
2.1 Literature Review………………………………………. 09-18
2.2 Existing System…………………………………………. 18-30
2.2.1 DFDC Baseline Model……………………………. 18-21
2.2.2 FaceForensics++…………………………………... 21-24
2.2.3 Deepfake Detection with Capsule Networks……… 24-26
2.2.4 Hybrid Architectures with Attention Mechanisms… 27-30
3 METHODOLOGY…………………………………………... 31-46
3.1 Proposed Methodology…………………………………. 31-33
3.1.1 System Architecture………………………………. 34-36
3.2 Modules…………………………………………………. 36-42
3.2.1 Data-Set-Gathering………………………………... 36-37
3.2.2 Pre-processing……………………………………... 37-38
3.2.3 Data-Set-Split……………………………………… 39-41
3.2.5 Hyper parameter tuning…………………………… 41-42
3.3 System Design…………………………………………... 42-46
3.3.1 Use Case Diagram………………………………… 42
3.3.2 Activity Diagram………………………………….. 43-44
3.3.3 Sequence Diagram………………………………… 45
3.3.4 Workflow…………………………………………... 46

v
4 RESULT AND ANALYSIS……………………………… 47-52
5 CONCLUSION…………………………………………… 53
6 REFERENCES……………………………………………. 54-55

vi
LIST OF FIGURES

F NO. TITLE PAGE NO.

3.1.1 System Architecture 35
3.1.2 Face Swap Deepfake Generation 36
3.2.1 Dataset 37
3.2.2 Preprocessing of video 38
3.2.3 Train Test Split 39
3.2.4 Overview of model 41
3.3.1 Use case diagram for deepfake detection 42
3.3.2 Activity diagram for deepfake detection 44
3.3.3 Sequence diagram for deepfake detection 45
3.3.4 Workflow diagram for deepfake detection 46
4.1 Home page 48
4.2 Selecting and uploading 49
4.3 Real Video Output 49
4.4 Uploading fake video 50
4.5 Fake video output 50
4.6 Selecting video with no faces 51
4.7 Output of uploaded video with no faces 51
4.8 Uploading video of size greater than 100MB 52
4.9 Pressing upload without selecting video 52

vii
LIST OF ABBREVIATIONS

LSTM Long Short-Term Memory
ResNeXt Aggregated Residual Transformations Network
CNN Convolutional Neural Network
GAN Generative Adversarial Network
RNN Recurrent Neural Network
LRCN Long-term Recurrent Convolutional Network
SVM Support Vector Machine

viii
Deepfake detection using LSTM and ResNext

CHAPTER 1
INTRODUCTION
1.1 Introduction

In recent years, the proliferation of deepfake technology has presented a


formidable challenge to the veracity of digital content. Deepfakes, synthetic
media generated through deep learning algorithms, have emerged as a potent tool
for manipulating audiovisual material, enabling the fabrication of hyper-realistic
videos and audios that can deceive even the most discerning observers. As the
capabilities of deepfake technology continue to evolve, the need for robust
detection mechanisms has become increasingly urgent. This paper delves into the
intricacies of deepfake detection, exploring the methodologies, advancements,
and challenges inherent in safeguarding against synthetic manipulation across
various digital platforms.

The advent of deepfake technology has blurred the line between reality and
fabrication, raising profound concerns about the integrity of visual and auditory
information. Deepfake algorithms leverage deep neural networks to synthesize
realistic images, videos, and audio recordings by seamlessly superimposing one
person's likeness onto another or manipulating existing content to convey false
narratives. With the democratization of AI tools and the widespread availability
of training data, the barrier to creating convincing deepfakes has diminished,
amplifying the risk of malicious exploitation for disinformation, blackmail, or
other nefarious purposes.

In response to the escalating threat posed by deepfakes, researchers and


technologists have embarked on a quest to develop effective detection
mechanisms capable of discerning authentic content from synthetic
manipulations. These efforts encompass a multifaceted approach that combines
machine learning algorithms, computer vision techniques, and forensic analysis
to scrutinize the subtle artifacts and inconsistencies inherent in deepfake media.
One prominent strategy involves leveraging deep learning architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to discern patterns indicative of tampering or synthesis within digital content.

Furthermore, researchers have devised innovative techniques for detecting


deepfakes by analyzing physiological signals, such as heartbeat patterns or micro-
expressions, that betray the synthetic nature of manipulated media. Moreover,
advancements in blockchain technology have enabled the creation of
decentralized verification systems to authenticate the provenance of digital assets
and mitigate the dissemination of fraudulent content. Despite these pioneering
efforts, deepfake detection remains an ongoing arms race, with adversaries
continually refining their techniques to evade detection and perpetrate deception
at scale.

Moreover, the proliferation of low-cost, user-friendly deepfake generation


tools has democratized the creation of synthetic media, amplifying the challenge
of distinguishing authentic content from sophisticated forgeries. To address this
evolving threat landscape, interdisciplinary collaboration between researchers,
policymakers, and industry stakeholders is imperative to develop robust detection
frameworks that can adapt to emerging deepfake techniques and mitigate their
societal impact.

Furthermore, as deepfake technology continues to evolve, so do the challenges


in detecting these synthetic manipulations. While current detection methods
primarily rely on analyzing visual and auditory cues, such as inconsistencies in
facial expressions or unnatural speech patterns, the rapid advancement of
generative adversarial networks (GANs) poses new hurdles. GANs can produce
increasingly convincing deepfakes by continuously refining their output based on
feedback from discriminative models, making it difficult for traditional detection
algorithms to keep pace.

In response, researchers are exploring novel approaches that go beyond


traditional cues, such as incorporating contextual information and behavioral
analysis. Contextual clues, such as discrepancies in lighting or spatial
inconsistencies within a scene, can provide valuable insights into the authenticity
of media content. Additionally, behavioral analysis techniques, such as examining
user interactions and engagement patterns, can help identify suspicious content dissemination strategies employed by malicious actors.

Moreover, the global spread of deepfake disinformation campaigns has


prompted calls for regulatory intervention and policy initiatives to combat the
societal impact of synthetic media. Governments and regulatory bodies are
grappling with the ethical and legal implications of deepfake technology,
exploring frameworks for content moderation, platform accountability, and data
privacy protection. By fostering collaboration between stakeholders across
academia, industry, and policymaking spheres, we can develop holistic strategies
that address the multifaceted challenges posed by deepfake proliferation.

Despite the formidable obstacles ahead, there is cause for optimism in the
collective efforts to combat deepfake manipulation. The interdisciplinary nature
of this endeavor, spanning fields such as artificial intelligence, cybersecurity,
psychology, and law, underscores the complexity of the problem and the
necessity for a multifaceted approach. By harnessing the power of technological
innovation, ethical considerations, and regulatory frameworks, we can strive
towards a future where the integrity of digital content is safeguarded, and trust in
media authenticity is restored.

Fake news is not just one type of content coming from one source but different types of manipulated media created to deliberately misinform or deceive readers. Usually, this news is visually sensational, made either to influence people's views, push a political agenda, or cause confusion. Viewers see that fake reality and share it through social media, thus becoming responsible for the virality of false news. Fake news is increasingly being shared via social media platforms like Twitter and Facebook. For example, the polygamy hoax stating that the government of Eritrea had made it mandatory for each man to marry two wives was shared in at least four countries, namely Kenya, Nigeria, Eritrea, and Sudan. The Eritrean Embassy in Nairobi later refuted this story, which initially sounded genuine and trended on Twitter.

However, with the advancement in artificial intelligence and computer graphics, a more sophisticated technology called Deepfake has emerged, making it possible to create fake content that looks realistic to the human eye. DeepFake, a combination of deep, from the machine learning term “deep learning”, and fake content, was named after the Reddit user “Deepfake”, who in late 2017 claimed to have developed a machine learning algorithm for transferring celebrity faces into adult content.

The deepfake technology was rapidly extended to create fake news, including newscasts, pictures, and even leaders' speeches. A deepfake is a video or still image manipulated from the original one to replace a person's identity (face) with another identity, with or without an audio track, using a generative adversarial network (GAN). A GAN is a type of neural network in which two different networks coordinate with each other: while one network generates the fakes, the other evaluates whether the generated sample is real or fake. With the evolution of deepfake technology, three categories of deepfakes have emerged.
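As an illustration of this two-network coordination (a toy sketch in PyTorch, not code from this report; the layer sizes and data are placeholders), the generator maps random noise to a fake sample while the discriminator scores whether a sample looks real:

```python
import torch
import torch.nn as nn

# Toy GAN sketch (illustrative sizes): the generator maps noise to a fake
# sample; the discriminator scores whether a sample is real or generated.
generator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 64))
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

noise = torch.randn(8, 16)            # a batch of 8 random noise vectors
fake = generator(noise)               # generator produces fake samples
score = discriminator(fake)           # discriminator judges them
# Training alternates: the discriminator is pushed to score fakes as 0 (fake),
# while the generator is pushed to make the discriminator score them as 1 (real).
d_loss = nn.BCEWithLogitsLoss()(score, torch.zeros(8, 1))
g_loss = nn.BCEWithLogitsLoss()(score, torch.ones(8, 1))
```

Alternating these two objectives is what makes each network improve against the other.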

Face swap (swapping the face of one subject in an image or a video with another while keeping the rest of the body and background context), lip sync (matching a subject's lip movements with pre-recorded audio while maintaining the expressions on the rest of the face), and puppet-master (recording the movements of an “actor” and superimposing that image with another subject). With the rapid growth of open-source deepfake generation tools, such as DeepFaceLab, FaceSwap, and Wav2Lip, any person with a laptop and some basic understanding of programming can create their own deepfake. Recently, mobile applications (e.g., ZAO and WOMBO) have been trending on the Internet due to their ingenious capability of creating deepfakes within seconds. Taking into account the availability and accessibility of these open-source tools and applications, their possible malicious usage (such as disinformation or online harassment) by an expert or a novice user, and the improvement in the quality of deepfakes and deepfake generation algorithms, there is a need for a deepfake detection system to determine whether an image or a video is deepfake or original content.

Detection of deepfake content generated through sophisticated deep learning technology is almost impossible for an untrained eye. Various approaches have been proposed and adopted, including regulations, artificial intelligence, digital forensics, and authentication, to combat deepfakes. For example, in the United States (US), new legislation and policies related to deepfakes have been enforced over the past year. Different states across the US are defining procedures to criminalize deepfake pornography and prohibit the use of deepfakes, especially in the voting context.

Many researchers have proposed and developed deepfake detectors to automatically extract salient and discriminative features using artificial intelligence algorithms, such as support vector machines (SVM), convolutional neural networks (CNNs), long-term recurrent convolutional networks (LRCN), and recurrent neural networks (RNN). Recently, Facebook has developed an artificial intelligence-based method of detecting and attributing deepfakes in collaboration with researchers at Michigan State University. Though these AI-trained techniques can detect and identify deepfakes with a reasonable degree of accuracy, they will become less effective (in terms of accuracy) with the continuous improvement of deepfake generation algorithms.

Researchers have inserted inputs called adversarial examples into every video frame to show that current state-of-the-art deepfake detection methods can be easily bypassed. Another strategy to counter fake videos and images posted online is to embed a digital watermark in the content. However, digital watermarks are not foolproof, and this problem can be countered by incorporating a blockchain to hold a tamper-proof record of watermark and content features. In the literature, only one paper can be found that has proposed a proof-of-concept deepfake video news detection and prevention system using watermarking and blockchain technologies. Digimarc's robust audio and image watermarking techniques are employed to embed watermarks in the audio and video tracks of video news clips before their distribution.

Blockchain technology is used to store the video and its metadata for performing forensic analysis. The watermarks can be detected at social media networks' portals, nodes, and back ends. For deepfake generation, the authors have used a face-swapping algorithm to replace the subject's face with a target person. A two-stage authentication process is performed to detect fake news videos generated from watermarked video clips. While the first stage uses Digimarc's watermark reader on the frames of the deepfake video, the second stage uses the information stored in the blockchain. The proposed scheme provides an informal security analysis of the deepfake detection scheme and presents the simulation results of the face-swap algorithm for proof of concept. However, the transparency and robustness of the embedded watermarks have not been evaluated.

This project presents a proof-of-concept system that detects fake news video
clips generated using voice impersonation. A hybrid watermarking method is
proposed that combines robust and fragile speech watermarking schemes, thus
providing copyright protection and tamper-proofing. Cross-referencing of speech
and video features is used to provide resistance against possible copy attacks.
The metadata related to the embedded watermarks and the content features is
stored in the blockchain for tamper-proof recording. The simulations are
performed to evaluate the embedded watermark’s robustness against common
signal processing and video integrity attacks. The rest of the paper is organized
as follows. Section II defines the basic building blocks of the scheme. In Section
III, the design and functionality of the proposed system are described in detail.
The results of the experiments designed to evaluate the performance of the
proposed scheme are presented in Section IV. Finally, in Section V, we present
the conclusions and possible future research directions of this work.
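The tamper-proof recording idea behind the blockchain component can be illustrated in miniature (the record fields below are hypothetical stand-ins, not the scheme's actual metadata): each entry is hashed together with its predecessor's hash, so any later edit to a record invalidates every link after it.

```python
import hashlib
import json

def add_record(chain, metadata):
    """Append a metadata record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(metadata, sort_keys=True) + prev_hash
    chain.append({"meta": metadata,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return chain

def verify(chain):
    """Recompute every hash; an edited record breaks all links after it."""
    prev_hash = "0" * 64
    for rec in chain:
        payload = json.dumps(rec["meta"], sort_keys=True) + prev_hash
        if hashlib.sha256(payload.encode()).hexdigest() != rec["hash"]:
            return False
        prev_hash = rec["hash"]
    return True

chain = []
add_record(chain, {"clip": "news_001", "watermark_id": "wm_a"})  # hypothetical fields
add_record(chain, {"clip": "news_002", "watermark_id": "wm_b"})
```

A real blockchain adds distributed consensus on top of this hash-chaining, but the tamper-evidence property is the same.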

In earlier times, the manipulation of visual content predominantly relied on


conventional image editing tools such as Photoshop and Pixlr. These tools were
commonly used to adjust subjects' features, including slimming figures or
transforming them into animated characters. However, the landscape drastically
transformed with the rapid progression of artificial intelligence (AI) and
computer graphics technologies. This evolution gave rise to a considerably more
sophisticated tool known as Deepfake.

The term "Deepfake" is a portmanteau derived from "deep," representing the


complex layers of machine learning, particularly deep learning, and "fake,"
denoting the artificial nature of the content produced. It gained prominence after
a Reddit user, going by the pseudonym "Deepfake," claimed in late 2017 to have
developed a machine learning algorithm capable of seamlessly transferring
celebrity faces into adult content. This marked the inception of a technology that
has since revolutionized the landscape of fake content creation.

Deepfake technology has introduced a paradigm shift in the creation of


deceptive media. By leveraging AI algorithms and sophisticated computer graphics techniques, it generates content that appears remarkably realistic to the human eye. Initially associated with adult content, Deepfake applications rapidly expanded to encompass a wide array of uses, including the fabrication of fake news. This expansion saw the creation of counterfeit newscasts, manipulated images, and even simulated speeches delivered by influential figures.

As Deepfake technology continues to advance, its implications for media


integrity and authenticity become increasingly profound. The ease with which
realistic yet fabricated content can be produced raises significant concerns
regarding misinformation, privacy violations, and potential threats to societal
stability. Thus, understanding and addressing the challenges posed by Deepfake
technology remain critical in safeguarding the integrity of visual content in the
digital age.

It becomes very important to spot the difference between a deepfake and a pristine video. We are using AI to fight AI. Deepfakes are created using tools like FaceApp and FaceSwap, which use pre-trained neural networks like GANs or autoencoders for deepfake creation. Our method uses an LSTM-based artificial neural network to perform sequential temporal analysis of the video frames and a pre-trained ResNeXt CNN to extract the frame-level features. The ResNeXt convolutional neural network extracts the frame-level features, and these features are then used to train the LSTM-based recurrent neural network to classify the video as deepfake or real. To emulate real-time scenarios and make the model perform better on real-time data, we trained our method on a large, balanced combination of various available datasets, including FaceForensics++, the Deepfake Detection Challenge, and Celeb-DF.
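The pipeline just described can be sketched in PyTorch as follows. This is a minimal illustration: a small stand-in backbone replaces the pre-trained ResNeXt, and the layer sizes are illustrative rather than the report's actual configuration.

```python
import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    """Sketch: a CNN backbone yields one feature vector per frame, an LSTM
    reads the frame sequence, and a final layer outputs real/fake logits."""
    def __init__(self, feature_dim=128, hidden_dim=64):
        super().__init__()
        # Stand-in for the pre-trained ResNeXt frame-level feature extractor
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, feature_dim))
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)    # two classes: real / fake

    def forward(self, video):                   # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)            # fold frames into the batch
        feats = self.backbone(frames).view(b, t, -1)  # per-frame features
        _, (h, _) = self.lstm(feats)            # temporal analysis of the sequence
        return self.head(h[-1])                 # logits for the whole video

model = DeepfakeClassifier()
logits = model(torch.randn(2, 10, 3, 64, 64))   # 2 videos, 10 frames each
```

In the actual system, the backbone would be a torchvision ResNeXt with pre-trained weights and the input frames would come from the face-cropped preprocessing step.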

The increasing sophistication of mobile camera technology and the ever-growing reach of social media and media sharing portals have made the creation and propagation of digital videos more convenient than ever before. Deep learning has given rise to technologies that would have been thought impossible only a handful of years ago. Modern generative models are one example of these, capable of synthesizing hyper-realistic images, speech, music, and even video. These models have found use in a wide variety of applications, including making the world more accessible through text-to-speech and helping generate training data for medical imaging.

To overcome such a situation, deepfake detection is very important. So, we describe a new deep learning-based method that can effectively distinguish AI-generated fake videos (deepfake videos) from real videos. It is incredibly important to develop technology that can spot fakes, so that deepfakes can be identified and prevented from spreading over the Internet.

Further, to make the system ready to use for customers, we have developed a front-end application where the user uploads a video. The video is processed by the model, and the output is rendered back to the user with the classification of the video as deepfake or real, along with the confidence of the model.
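The server-side response logic can be sketched as follows (the function and field names here are hypothetical illustrations, not the application's actual code): the model's fake-probability for the uploaded video is mapped to the label and confidence shown to the user.

```python
def render_result(fake_probability, threshold=0.5):
    """Map the model's fake-probability for an uploaded video to the
    label and confidence percentage shown back to the user."""
    if fake_probability >= threshold:
        return {"label": "FAKE", "confidence": round(fake_probability * 100, 2)}
    return {"label": "REAL", "confidence": round((1 - fake_probability) * 100, 2)}

# e.g. a video the model scores as 90% likely fake
result = render_result(0.9)
```

In the deployed application this dictionary would be returned by the web framework's upload handler after frame extraction and model inference.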


CHAPTER 2
BACKGROUND STUDY

2.1 Literature Review

In the context of deepfake detection, here is a brief survey of key papers and approaches that are relevant. Keep in mind that there might be newer developments that are not covered here:

[1] L. Borges, B. Martins, and P. Calado, “Combining similarity features


and deep representation learning for stance detection in the context of checking
fake news,” vol. 11, no. 3, 2019. Fake news is nowadays an issue of pressing concern, given its recent rise as a potential threat to high-quality journalism and
well-informed public discourse. The Fake News Challenge (FNC-1) was
organized in early 2017 to encourage the development of machine-learning-based
classification systems for stance detection (i.e., for identifying whether a
particular news article agrees, disagrees, discusses, or is unrelated to a particular
news headline), thus helping in the detection and analysis of possible instances of
fake news. This article presents a novel approach to tackle this stance detection
problem, based on the combination of string similarity features with a deep
neural network architecture that leverages ideas previously advanced in the
context of learning efficient text representations, document classification, and
natural language inference. Specifically, we use bi-directional Recurrent Neural
Networks (RNNs), together with max-pooling over the temporal/sequential
dimension and neural attention, for representing (i) the headline, (ii) the first two
sentences of the news article, and (iii) the entire news article. These
representations are then combined/compared, complemented with similarity
features inspired by other FNC-1 approaches, and passed to a final layer that
predicts the stance of the article toward the headline. We also explore the use of
external sources of information, specifically large datasets of sentence pairs
originally proposed for training and evaluating natural language inference
methods to pre-train specific components of the neural network architecture (e.g.,

the RNNs used for encoding sentences). The obtained results attest to the effectiveness of the proposed ideas and show that our model, particularly when considering pre-training and the combination of neural representations together with similarity features, slightly outperforms the previous state of the art.
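The encoder described in [1] can be sketched as follows (a minimal PyTorch illustration with made-up dimensions): a bi-directional LSTM reads token embeddings, and max-pooling over the time dimension yields one fixed-size vector per text, which can then be combined and compared.

```python
import torch
import torch.nn as nn

class BiLSTMMaxPool(nn.Module):
    """Bi-directional LSTM encoder with max-pooling over the sequence,
    as used in [1] for headlines and article text (sizes illustrative)."""
    def __init__(self, emb_dim=32, hidden=16):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):              # tokens: (batch, seq_len, emb_dim)
        states, _ = self.rnn(tokens)        # (batch, seq_len, 2 * hidden)
        return states.max(dim=1).values     # max-pool over the time dimension

enc = BiLSTMMaxPool()
headline = enc(torch.randn(4, 12, 32))      # e.g. 4 headlines of 12 tokens
article = enc(torch.randn(4, 50, 32))       # and the matching article texts
# Combine/compare the representations before the final stance classifier
combined = torch.cat([headline, article, headline - article], dim=1)
```

The combined vector would then be concatenated with the hand-crafted similarity features and passed to the final stance-prediction layer.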

[2] Rana, M. S., Nobi, M. N., Murali, B., & Sung, A. H. (2022). Deepfake detection: A systematic literature review. IEEE Access, 10, 25494–25513. Over the last few decades, rapid progress in AI, machine learning, and deep learning has resulted in new techniques and various tools for manipulating multimedia.
applications such as for entertainment and education, etc., malicious users have
also exploited them for unlawful or nefarious purposes. For example, high-
quality and realistic fake videos, images, or audios have been created to spread
misinformation and propaganda, foment political discord and hate, or even harass
and blackmail people. The manipulated, high-quality and realistic videos have
become known recently as Deepfake. Various approaches have since been
described in the literature to deal with the problems raised by Deepfake. To
provide an updated overview of the research works in Deepfake detection, we
conduct a systematic literature review (SLR) in this paper, summarizing 112
relevant articles from 2018 to 2020 that presented a variety of methodologies.
We analyze them by grouping them into four different categories: deep learning-
based techniques, classical machine learning-based methods, statistical
techniques, and blockchain-based techniques. We also evaluate the performance
of the detection capability of the various methods with respect to different
datasets and conclude that the deep learning-based methods outperform other
methods in Deepfake detection.

[3] “A system for mitigating the problem of deepfake news videos using
watermarking,” Electronic Imaging, no. 4, pp. 117–1–117–10, 2020. Deepfakes
constitute fake content -generally in the form of video clips and other media
formats such as images or audio- created using deep learning algorithms. With
the rapid development of artificial intelligence (AI) technologies, the deepfake
content is becoming more sophisticated, with the developed detection techniques
proving to be less effective. So far, most of the detection techniques in the
literature are based on AI algorithms and can be considered as passive. This
paper presents a proof-of-concept deepfake detection system that detects fake
news video clips generated using voice impersonation. In the proposed scheme,
digital watermarks are embedded in the audio track of a video using a hybrid
speech watermarking technique. This is an active approach for deepfake
detection. A standalone software application can perform the detection of robust
and fragile watermarks. Simulations are performed to evaluate the embedded
watermark's robustness against common signal processing and video integrity
attacks. As far as we know, this is one of the first few attempts to use digital
watermarking for fake content detection.
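The active-detection idea above can be sketched with a toy fragile watermark: watermark bits are written into the least-significant bits of integer audio samples, so any re-synthesis or edit of the samples destroys the mark. This is only an illustrative stand-in, not the paper's hybrid speech watermarking scheme; the sample values and function names below are invented for the example.

```python
def embed_fragile(samples, bits):
    """Overwrite the least-significant bit of each sample with a
    watermark bit (the bit pattern is cycled over the samples)."""
    return [(s & ~1) | bits[i % len(bits)] for i, s in enumerate(samples)]

def is_tampered(samples, bits):
    """A fragile mark survives only if the samples are bit-exact."""
    return any((s & 1) != bits[i % len(bits)] for i, s in enumerate(samples))

audio = [100, 101, 102, 103, 104, 105, 106, 107]   # toy PCM samples
mark = [1, 0, 1, 1]
marked = embed_fragile(audio, mark)

print(is_tampered(marked, mark))                   # False: untouched clip
forged = marked[:4] + [s + 1 for s in marked[4:]]  # +1 flips the LSBs
print(is_tampered(forged, mark))                   # True: edit detected
```

A robust watermark would instead be spread redundantly across the signal so it survives compression; the paper combines both kinds.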

[4] D. Güera and E. J. Delp, “Deepfake video detection using recurrent


neural networks,” in 15th IEEE International Conference on Advanced Video
and Signal Based Surveillance, 2018, pp. 1–6. In recent months a machine
learning based free software tool has made it easy to create believable face swaps
in videos that leaves few traces of manipulation, in what are known as
"deepfake" videos. Scenarios where these realistic fake videos are used to create
political distress, blackmail someone or fake terrorism events are easily
envisioned. This paper proposes a temporal-aware pipeline to automatically
detect deepfake videos. Our system uses a convolutional neural network (CNN)
to extract frame-level features. These features are then used to train a recurrent
neural network (RNN) that learns to classify if a video has been subject to
manipulation or not. We evaluate our method against a large set of deepfake
videos collected from multiple video websites. We show how our system can
achieve competitive results in this task while using a simple architecture.

[5] D. Megías, M. Kuribayashi, A. Rosales, and W. Mazurczyk,


“DISSIMILAR: Towards fake news detection using information hiding, signal
processing and machine learning,” in 16th International Conference on
Availability, Reliability and Security (ARES), 2021. Digital media have changed
the classical model of mass media that considers the transmitter of a message and
a passive receiver, to a model where users of the digital media can appropriate
the contents, recreate, and circulate them. In this context, online social media are
a suitable circuit for the distribution of fake news and the spread of
disinformation. Particularly, photo and video editing tools and recent advances in
artificial intelligence allow non-professionals to easily counterfeit multimedia
documents and create deep fakes. To avoid the spread of disinformation, some
online social media deploy methods to filter fake content. Although this can be
an effective method, its centralized approach gives an enormous power to the
manager of these services. Considering the above, this paper outlines the main
principles and research approach of the ongoing DISSIMILAR project, which is
focused on the detection of fake news on social media platforms using
information hiding techniques, in particular, digital watermarking, combined
with machine learning approaches.

[6] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S.


Guadarrama, K. Saenko, and T. Darrell, “Long-term recurrent convolutional
networks for visual recognition and description,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 4, pp. 677–691, 2017. Models
comprised of deep convolutional network layers have dominated recent image
interpretation tasks; we investigate whether models which are also
compositional, or "deep", temporally are effective on tasks involving visual
sequences or label sequences. We develop a novel recurrent convolutional
architecture suitable for large-scale visual learning which is end-to-end trainable,
and demonstrate the value of these models on benchmark video recognition
tasks, image to sentence generation problems, and video narration challenges. In
contrast to current models which assume a fixed spatio-temporal receptive field
or simple temporal averaging for sequential processing, recurrent convolutional
models are "doubly deep" in that they can be compositional in spatial and
temporal "layers". Such models may have advantages when target concepts are
complex and/or training data are limited. Learning long-term dependencies is
possible when nonlinearities are incorporated into the network state updates.
Long-term RNN models are appealing in that they directly can map variable
length inputs (i.e. video frames) to variable length outputs (i.e. natural language
text) and can model complex temporal dynamics; yet they can be optimized with
backpropagation. Our recurrent long-term models are directly connected to state-
of-the-art visual convnet models and can be jointly trained, updating temporal
dynamics and convolutional perceptual representations simultaneously. Our
results show such models have distinct advantages over state-of-the-art models
for recognition or generation which are separately defined and/or optimized.

[7] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, and P.


Natarajan, “Recurrent convolutional strategies for face manipulation detection in
videos,” in IEEE Conference on Computer Vision and Pattern Recognition
Workshops, CVPR Workshops, 2019, pp. 80–87. The spread of misinformation
through synthetically generated yet realistic images and videos has become a
significant problem, calling for robust manipulation detection methods. Despite
the predominant effort of detecting face manipulation in still images, less
attention has been paid to the identification of tampered faces in videos by taking
advantage of the temporal information present in the stream. Recurrent
convolutional models are a class of deep learning models which have proven
effective at exploiting the temporal information from image streams across
domains. We thereby distill the best strategy for combining variations in these
models along with domain specific face preprocessing techniques through
extensive experimentation to obtain state-of-the-art performance on publicly
available videobased facial manipulation benchmarks. Specifically, we attempt to
detect Deepfake, Face2Face and FaceSwap tampered faces in video streams.
Evaluation is performed on the recently introduced FaceForensics++ dataset,
improving the previous state-of-the-art by up to 4.55% in accuracy

[8] Zi, B., Chang, M., Chen, J., Ma, X., & Jiang, Y. G. (2020, October).
Wilddeepfake: A challenging real-world dataset for deepfake detection.
In Proceedings of the 28th ACM International Conference on Multimedia (pp.
2382-2390). In recent years, the abuse of a face swap technique called deepfake
has raised enormous public concerns. So far, a large number of deepfake videos
(known as "deepfakes") have been crafted and uploaded to the internet, calling
for effective countermeasures. One promising countermeasure against deepfakes
is deepfake detection. Several deepfake datasets have been released to support
the training and testing of deepfake detectors, such as DeepfakeDetection [1] and
FaceForensics++ [23]. While this has greatly advanced deepfake detection, most
of the real videos in these datasets are filmed with a few volunteer actors in
limited scenes, and the fake videos are crafted by researchers using a few popular
deepfake softwares. Detectors developed on these datasets may become less
effective against real-world deepfakes on the internet. To better support detection
against real-world deepfakes, in this paper, we introduce a new dataset
WildDeepfake, which consists of 7,314 face sequences extracted from 707
deepfake videos collected completely from the internet. WildDeepfake is a small
dataset that can be used, in addition to existing datasets, to develop and test the
effectiveness of deepfake detectors against real-world deepfakes. We conduct a
systematic evaluation of a set of baseline detection networks on both existing and
our WildDeepfake datasets, and show that WildDeepfake is indeed a more
challenging dataset, where the detection performance can decrease drastically.

[9] S. Hussain, P. Neekhara, M. Jere, F. Koushanfar, and J. McAuley,


“Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to
adversarial examples,” in IEEE Winter Conference on Applications of Computer
Vision, 2021, pp. 3347–3356. Recent advances in video manipulation techniques
have made the generation of fake videos more accessible than ever before.
Manipulated videos can fuel disinformation and reduce trust in media. Therefore
detection of fake videos has garnered immense interest in academia and industry.
Recently developed Deepfake detection methods rely on deep neural networks
(DNNs) to distinguish AI-generated fake videos from real videos. In this work,
we demonstrate that it is possible to bypass such detectors by adversarially
modifying fake videos synthesized using existing Deepfake generation methods.
We further demonstrate that our adversarial perturbations are robust to image and
video compression codecs, making them a real-world threat. We present
pipelines in both white-box and black-box attack scenarios that can fool DNN
based Deepfake detectors into classifying fake videos as real.
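To make the attack surface concrete, here is a minimal, hedged sketch of the white-box idea: given a differentiable detector, perturb the input a small step against the gradient of its "fake" score (the FGSM pattern). The two-feature logistic "detector" below is an invented toy, not one of the DNN detectors evaluated in the paper.

```python
import math

W = [2.0, -1.0]   # toy detector weights; score > 0.5 means "fake"
B = 0.0

def fake_prob(x):
    """Logistic 'detector': probability that the clip features x are fake."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def adversarial(x, eps=0.3):
    """FGSM-style step: move each feature by eps against the gradient of
    the fake score (for a logistic model, the gradient sign is sign(W[i]))."""
    return [xi - eps * (1.0 if w > 0 else -1.0) for xi, w in zip(x, W)]

fake_clip = [1.0, 0.5]
adv_clip = adversarial(fake_clip)
print(fake_prob(fake_clip))   # ~0.82: flagged as fake
print(fake_prob(adv_clip))    # ~0.65: score pushed toward "real"
```

With a real DNN detector the gradient comes from backpropagation, and the black-box case in the paper estimates it from queries, but the perturbation logic is the same.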

[10] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection
and alignment using multitask cascaded convolutional networks,” IEEE Signal
Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016. Face detection and
alignment in unconstrained environment are challenging due to various poses,
illuminations, and occlusions. Recent studies show that deep learning approaches
can achieve impressive performance on these two tasks. In this letter, we propose
a deep cascaded multitask framework that exploits the inherent correlation
between detection and alignment to boost up their performance. In particular, our
framework leverages a cascaded architecture with three stages of carefully
designed deep convolutional networks to predict face and landmark location in a
coarse-to-fine manner. In addition, we propose a new online hard sample mining
strategy that further improves the performance in practice. Our method achieves
superior accuracy over the state-of-the-art techniques on the challenging FDDB
and WIDER FACE benchmarks for face detection, and the AFLW (annotated facial
landmarks in the wild) benchmark for face alignment, while keeping real-time
performance.

[11] Ahmed, S. R., Sonuç, E., Ahmed, M. R., & Duru, A. D. (2022,
June). Analysis survey on deepfake detection and recognition with convolutional
neural networks. In 2022 International Congress on Human-Computer
Interaction, Optimization and Robotic Applications (HORA) (pp. 1-7). IEEE.
Deep Learning (DL) is the most efficient technique to handle a wide range of
challenging problems such as data analytics, diagnosing diseases, detecting
anomalies, etc. The development of DL has raised some privacy, justice, and
national security issues. Deepfake is a DL-based application that has been very
popular in recent years and is one of the reasons for these problems. Deepfake
technology can create fake images and videos that are difficult for humans to
recognize as real or not. Therefore, automated methods need to be proposed for
devices to detect and evaluate threats. In other words, digital and
visual media must maintain their integrity. A set of rules used for Deepfake and
some methods to detect the content created by Deepfake have been proposed in
the literature. This paper summarizes what we have in the critical discussion
about the problems, opportunities, and prospects of Deepfake technology. We
aim for this work to be an alternative guide to getting knowledge of Deepfake
detection methods. First, we cover Deepfake history and Deepfake techniques.
Then, we present how a better and more robust Deepfake detection method can
be designed to deal with fake content.

[12] Deepfake defense not only requires the research of detection but also
requires the efforts of generation methods. However, current deepfake methods suffer
the effects of obscure workflow and poor performance. To solve this problem, we
present DeepFaceLab, the current dominant deepfake framework for face-swapping. It
provides the necessary tools as well as an easy-to-use way to conduct high-quality
face-swapping. It also offers a flexible and loose coupling structure for people who
need to strengthen their pipeline with other features without writing complicated
boilerplate code. We detail the principles that drive the implementation of
DeepFaceLab and introduce its pipeline, through which every aspect of the pipeline
can be modified painlessly by users to achieve their customization purpose. It is
noteworthy that DeepFaceLab could achieve cinema-quality results with high fidelity.
We demonstrate the advantage of our system by comparing our approach with other
face-swapping methods.

[13] Y. Mirsky and W. Lee, “The creation and detection of deepfakes: A


survey,” ACM Computing Survey, vol. 54, no. 1, 2021. Generative deep learning
algorithms have progressed to a point where it is difficult to tell the difference between
what is real and what is fake. In 2018, it was discovered how easy it is to use this
technology for unethical and malicious applications, such as the spread of
misinformation, impersonation of political leaders, and the defamation of innocent
individuals. Since then, these “deepfakes” have advanced significantly. In this article,
we explore the creation and detection of deepfakes and provide an in-depth view as to
how these architectures work. The purpose of this survey is to provide the reader with
a deeper understanding of (1) how deepfakes are created and detected, (2) the current
trends and advancements in this domain, (3) the shortcomings of the current defense
solutions, and (4) the areas that require further research and attention.

[14] Guarnera, Luca, Oliver Giudice, and Sebastiano Battiato.
“Deepfake detection by analyzing convolutional traces.” Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops,
2020. The Deepfake phenomenon has become very popular nowadays thanks
to the possibility to create incredibly realistic images using deep learning tools,
based mainly on ad-hoc Generative Adversarial Networks (GAN). In this work
we focus on the analysis of Deepfakes of human faces with the objective of
creating a new detection method able to detect a forensics trace hidden in images:
a sort of fingerprint left in the image generation process. The proposed technique,
by means of an Expectation Maximization (EM) algorithm, extracts a set of local


features specifically addressed to model the underlying convolutional generative
process. Ad-hoc validation has been employed through experimental tests with
naive classifiers on five different architectures (GDWCT, STARGAN,
ATTGAN, STYLEGAN, STYLEGAN2) against the CELEBA dataset as
ground-truth for non-fakes. Results demonstrated the effectiveness of the
technique in distinguishing the different architectures and the corresponding
generation process.
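The Expectation-Maximization loop at the heart of that approach can be illustrated with a generic two-component 1-D Gaussian mixture. The actual features in the paper are local convolutional traces; the data and initialization below are invented purely to show the E-step/M-step alternation.

```python
import math

def em_two_gaussians(data, mu_init, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM; returns the means."""
    mu1, mu2 = mu_init
    var1 = var2 = 1.0
    pi = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = pi * math.exp(-(x - mu1) ** 2 / (2 * var1)) / math.sqrt(var1)
            p2 = (1 - pi) * math.exp(-(x - mu2) ** 2 / (2 * var2)) / math.sqrt(var2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate mixture weight, means and variances
        n1 = sum(r)
        n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        var1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1 + 1e-6
        var2 = sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2 + 1e-6
        pi = n1 / len(data)
    return mu1, mu2

# Two well-separated clusters (deterministic, for reproducibility)
data = [0.0, 0.2, -0.1, 0.1, 9.9, 10.1, 10.0, 9.8]
m1, m2 = em_two_gaussians(data, mu_init=(0.5, 9.0))
print(round(m1, 2), round(m2, 2))   # ~0.05 and ~9.95
```

In the paper, the same alternation is used to estimate the parameters of the convolutional generation process left behind in GAN images, rather than cluster means.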

[15] F. Deguillaume, S. Voloshynovskiy, and T. Pun, “Secure hybrid


robust watermarking resistant against tampering and copy attack.” Digital
watermarking appears today as an efficient means of securing multimedia documents.
Several application scenarios in the security of digital watermarking have been pointed
out, each of them with different requirements. The three main identified scenarios are:
copyright protection, i.e. protecting ownership and usage rights; tamper proofing,
aiming at detecting malicious modifications; and authentication, the purpose of which
is to check the authenticity of the originator of a document. While robust watermarks,
which survive any change or alteration of the protected documents, are typically
used for copyright protection, tamper proofing and authentication generally require
fragile or semi-fragile watermarks in order to detect modified or faked documents.
Further, most of robust watermarking schemes are vulnerable to the so-called copy
attack, where a watermark can be copied from one document to another by any
unauthorized person, making these schemes inefficient in all authentication
applications. In this paper, we propose a hybrid watermarking method joining a robust
and a fragile or semi-fragile watermark, and thus combining copyright protection and
tamper proofing. As a result this approach is at the same time resistant against copy
attack. In addition, the fragile information is inserted in a way which preserves
robustness and reliability of the robust part. The numerous tests and the results
obtained according to the Stirmark benchmark demonstrate the superior performance
of the proposed approach.

[16] Zhao, T., Xu, X., Xu, M., Ding, H., Xiong, Y., & Xia, W. (2021).
Learning self-consistency for deepfake detection. In Proceedings of the
IEEE/CVF international conference on computer vision (pp. 15023-15033). We
propose a new method to detect deepfake images using the cue of the source
feature inconsistency within the forged images. It is based on the hypothesis that
images' distinct source features can be preserved and extracted after going
through state-of-the-art deepfake generation processes. We introduce a novel
representation learning approach, called pair-wise self-consistency learning
(PCL), for training ConvNets to extract these source features and detect deepfake
images. It is accompanied by a new image synthesis approach, called
inconsistency image generator (I2G), to provide richly annotated training data for
PCL. Experimental results on seven popular datasets show that our models
improve averaged AUC from 96.45% to 98.05% over the state of the art in the
in-dataset evaluation and from 86.03% to 92.18% in the cross-dataset evaluation.
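The AUC figures quoted above measure ranking quality: the probability that a randomly chosen fake receives a higher score than a randomly chosen real. For small evaluation sets it can be computed directly from this pairwise definition (the labels and scores below are invented):

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = fake, 0 = real; higher score means "more likely fake"
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

Because AUC depends only on the ranking of scores, it is insensitive to the detector's decision threshold, which is why it is the standard metric for cross-dataset comparisons like the ones above.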

2.2 Existing System


Several existing systems for deepfake detection utilize LSTM (Long Short-Term
Memory) networks and ResNext CNN (Residual Next Convolutional Neural
Network) architectures due to their effectiveness in capturing temporal and
spatial patterns in multimedia data. Here are a few examples:

[1] DFDC Baseline Model

[2] FaceForensics++

[3] DeepFake Detection with Capsule Networks

[4] Hybrid Architectures with Attention Mechanisms

2.2.1 DFDC Baseline Model

The DFDC (DeepFake Detection Challenge) Baseline Model is a pivotal tool


in the ongoing battle against the proliferation of deepfake videos, which are
digitally altered videos often created with malicious intent. This model serves as
a foundational framework for detecting such manipulations. At its core, the
baseline model utilizes a combination of advanced machine learning techniques,
particularly deep neural networks, to analyze and scrutinize the authenticity of
videos. It employs a multifaceted approach, leveraging both spatial and temporal
cues within the multimedia content.

Spatial analysis involves examining individual frames of the video to identify
discrepancies or anomalies that may indicate manipulation. This aspect often


relies on convolutional neural networks (CNNs), such as ResNet (Residual
Network) architectures, which excel in extracting intricate features from images.
These networks are adept at scrutinizing facial expressions, subtle distortions,
and inconsistencies that may betray the artificial nature of the content.

Temporal analysis, on the other hand, focuses on the temporal flow and
coherence of the video sequence. This facet is particularly crucial for detecting
deepfakes, as they often struggle to maintain consistency over time, leading to
irregularities in motion or facial dynamics. Long Short-Term Memory (LSTM)
networks play a central role in this aspect, enabling the model to capture and
analyze temporal patterns effectively. By considering the progression of actions
and expressions throughout the video, the baseline model can discern whether the
observed behaviors are natural or artificially generated.
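For concreteness, the gating that lets an LSTM carry facial dynamics across frames can be written out for a single scalar cell. Real models use weight matrices over CNN feature vectors; the scalar weights and input values here are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One LSTM step: x = current frame feature, (h, c) = previous state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate memory
    c_new = f * c + i * g           # keep part of the old memory, add new
    h_new = o * math.tanh(c_new)    # expose a gated view of the memory
    return h_new, c_new

w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
h = c = 0.0
for x in [0.2, 0.4, 0.1]:           # a short run of frame-level features
    h, c = lstm_step(x, h, c, w)
print(h, c)                         # final state summarizes the clip so far
```

The forget gate is what allows the model to weigh earlier facial behavior against the current frame, which is exactly the temporal-coherence signal the baseline model exploits.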

Furthermore, the DFDC Baseline Model incorporates robust training strategies


to enhance its accuracy and generalization capabilities. It leverages large-scale
datasets containing both authentic and deepfake videos to train the neural
network models comprehensively. Through iterative training and validation
procedures, the model learns to distinguish between genuine and manipulated
content across various scenarios and contexts.

Overall, the DFDC Baseline Model represents a crucial advancement in the


field of deepfake detection, offering a sophisticated and adaptive solution to
combat the growing threat posed by synthetic media. By combining spatial and
temporal analysis techniques within a deep learning framework, this model
exemplifies the interdisciplinary synergy between computer vision, machine
learning, and cybersecurity, paving the way for more resilient defenses against
digital deception.

DISADVANTAGES
Detecting deepfakes using the DFDC (DeepFake Detection Challenge) Baseline
Model has several disadvantages:

1. Limited Detection Capability: The DFDC Baseline Model may not be
equipped to detect deepfakes generated using advanced techniques or those


created with newer algorithms. It relies on certain features and patterns that may
not capture all variations of deepfake manipulation.

2. High False Positive Rate: Due to its reliance on specific features, the baseline
model may misclassify authentic videos as deepfakes, leading to a high false
positive rate. This can cause unnecessary concern or mistrust, particularly in
contexts where accurate identification is crucial.

3. Resource Intensive: Implementing the DFDC Baseline Model may require


significant computational resources, particularly for large-scale video analysis.
This could pose challenges for real-time detection or applications operating
under resource constraints.

4. Limited Generalization: The model's effectiveness may vary across different


types of videos, ethnicities, or languages. Its performance may degrade when
applied to videos that differ significantly from those in its training dataset,
limiting its generalization capability.

5. Susceptibility to Adversarial Attacks: Like many machine learning models,


the DFDC Baseline Model is vulnerable to adversarial attacks. Attackers can
manipulate videos in subtle ways to evade detection, exploiting weaknesses in
the model's decision boundaries.

6. Privacy Concerns: Deepfake detection often involves analyzing video


content, which raises privacy concerns, particularly when dealing with sensitive
or personal data. Deployment of the baseline model may raise questions about
data privacy and user consent.

7. Ethical Considerations: Misidentification of authentic content as deepfakes


can have serious consequences, including reputational damage or legal
ramifications. Ethical concerns arise regarding the potential impact of false
accusations resulting from the model's limitations.

8. Maintenance and Updates: Keeping the DFDC Baseline Model up-to-date


with emerging deepfake techniques and evolving threats requires continuous
maintenance and updates. Failure to adapt to new methods could render the
model obsolete or less effective over time.

9. Complexity of Implementation: Integrating deepfake detection systems into


existing platforms or workflows may be complex and time-consuming. It may
require specialized knowledge and resources for deployment and maintenance.

10. Scalability Issues: Scaling deepfake detection systems to handle large


volumes of video content can be challenging, especially if the baseline model's
performance deteriorates under increased workload or if the infrastructure lacks
scalability.

Addressing these disadvantages requires a comprehensive understanding of


the limitations of the DFDC Baseline Model and the development of more robust
and adaptive deepfake detection techniques.

2.2.2 FaceForensics++

Detecting deepfakes is crucial in today's digital landscape, where manipulated


videos can spread misinformation or cause harm. FaceForensics++ stands out as
a prominent tool for this purpose, offering a comprehensive approach to
identifying manipulated facial imagery. It employs advanced machine learning
techniques to scrutinize videos for signs of tampering.

At the heart of FaceForensics++ lies a deep convolutional neural network


(CNN), a type of artificial intelligence specifically designed to analyze visual
data. This network has been trained on a vast dataset of both authentic and
manipulated facial images, enabling it to recognize subtle discrepancies that may
indicate a deepfake. Through this extensive training, the network has learned to
distinguish between genuine facial expressions and those artificially generated by
deepfake algorithms.

One key feature of FaceForensics++ is its utilization of spatial and temporal


information. By considering not only individual frames but also the sequence of
frames over time, the system can detect inconsistencies that might be overlooked
in a single image. This approach mirrors the way humans perceive videos, taking
into account the dynamic nature of facial movements and expressions.
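As a deliberately simplified stand-in for these learned temporal cues, the sketch below scores each frame transition by mean absolute pixel change and flags transitions that jump well above the clip's typical motion. The frames, threshold rule, and factor are all invented for illustration; FaceForensics++ itself learns such cues rather than thresholding them.

```python
def transition_scores(frames):
    """Mean absolute pixel change between consecutive frames.
    frames: list of equal-length flattened grayscale pixel lists."""
    return [sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
            for f1, f2 in zip(frames, frames[1:])]

def suspicious_transitions(frames, factor=2.0):
    """Indices of transitions whose motion far exceeds the clip average."""
    scores = transition_scores(frames)
    typical = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > factor * typical]

clip = [
    [10, 10, 10, 10],
    [11, 10, 11, 10],   # small, natural motion
    [11, 11, 11, 10],
    [90, 95, 90, 95],   # abrupt discontinuity, e.g. a bad face splice
]
print(suspicious_transitions(clip))   # [2]: the jump into the last frame
```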

Moreover, FaceForensics++ incorporates state-of-the-art techniques such as


optical flow analysis, which tracks the motion of pixels between frames. This
allows the system to identify anomalies in facial movements that are


characteristic of deepfake videos. By analyzing the flow of pixels, it can detect
unnatural transitions or distortions that suggest manipulation. Additionally,
FaceForensics++ leverages ensemble learning, a strategy that combines multiple
detection models to enhance accuracy. By aggregating the predictions of
different algorithms, the system can achieve greater robustness against various
types of deepfake techniques. This ensemble approach helps mitigate the risk of
false positives or negatives, ensuring reliable detection performance across
diverse scenarios.
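The aggregation step itself is simple to sketch: given per-model fake probabilities for one clip, combine them by averaging and by majority vote. The probabilities below are invented; a real ensemble would typically weight models by validation performance.

```python
def average_ensemble(probs):
    """Mean of the per-model fake probabilities."""
    return sum(probs) / len(probs)

def majority_vote(probs, threshold=0.5):
    """True if more than half of the models call the clip fake."""
    votes = sum(1 for p in probs if p > threshold)
    return votes > len(probs) / 2

model_outputs = [0.91, 0.48, 0.77]      # three detectors, one clip
print(round(average_ensemble(model_outputs), 2))   # ~0.72
print(majority_vote(model_outputs))                # True: 2 of 3 say fake
```

Averaging smooths out a single over-confident model, while voting is more robust to one model's score being wildly miscalibrated; ensembles often report both.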

In summary, FaceForensics++ offers a sophisticated framework for deepfake


detection, leveraging deep learning, temporal analysis, optical flow, and
ensemble learning techniques. By scrutinizing both spatial and temporal aspects
of facial imagery, it can effectively identify manipulated videos with a high
degree of accuracy. As deepfake technology continues to evolve, tools like
FaceForensics++ play a crucial role in safeguarding against the spread of
misinformation and preserving the integrity of digital media.

DISADVANTAGES
Detecting deepfakes using FaceForensics++ has several limitations and
disadvantages:

1. Limited Dataset: FaceForensics++ relies on a dataset of manipulated videos


to train its detection algorithms. However, this dataset might not encompass the
full spectrum of deepfake techniques, making the detection system less effective
against newer and more sophisticated deepfake methods.

2. Generalization Issues: The algorithms trained on the FaceForensics++ dataset


might struggle to generalize well to unseen or slightly different types of deepfake
videos. This could result in false positives or false negatives when detecting
deepfakes that deviate significantly from the training data.

3. Adversarial Attacks: Deepfake creators can potentially exploit vulnerabilities


in the detection algorithms by intentionally crafting deepfakes to evade detection.
Adversarial attacks could target specific weaknesses in the FaceForensics++
detection system, making it less reliable in real-world scenarios.

4. Resource Intensive: Deepfake detection using FaceForensics++ may require significant computational resources, particularly if processing large volumes of video data in real-time. This could pose challenges for applications that require efficient and timely deepfake detection, such as social media platforms or video hosting websites.

5. Inference Time: The time it takes to analyze and detect deepfakes using FaceForensics++ could be relatively high, especially for high-resolution videos or when using complex detection models. This latency may not be acceptable for applications where real-time detection is critical.

6. False Positives/Negatives: Like any detection system, FaceForensics++ is susceptible to producing false positives (identifying authentic videos as deepfakes) or false negatives (failing to detect actual deepfakes). These errors can undermine trust in the detection system and lead to unintended consequences, such as the misidentification of legitimate content.

7. Privacy Concerns: Deepfake detection techniques like FaceForensics++ often rely on analyzing facial features and other visual cues, raising privacy concerns regarding the potential misuse or unauthorized access to personal data captured in videos.

8. Ethical Considerations: There are ethical implications associated with deepfake detection, particularly concerning the potential for false accusations or the infringement of individuals' rights to freedom of expression. Implementing deepfake detection measures must be balanced with respect for privacy and civil liberties.

9. Continuous Evolution of Deepfake Technology: As deepfake technology continues to evolve, detection methods like FaceForensics++ may become less effective over time. Newer deepfake techniques could surpass the capabilities of existing detection algorithms, necessitating ongoing research and development to keep pace with emerging threats.

10. Accessibility and Implementation Challenges: Deploying deepfake detection systems like FaceForensics++ across various platforms and applications may pose implementation challenges, particularly for organizations with limited resources or technical expertise.

In summary, while FaceForensics++ and similar tools play a crucial role in combating deepfake proliferation, they are not without limitations. Overcoming these disadvantages requires ongoing research, innovation, and collaboration across various disciplines.

2.2.3 DeepFake Detection with Capsule Networks

Detecting deepfake videos is a crucial challenge in today's digital landscape, where manipulated multimedia content can spread rapidly and deceive viewers. One advanced approach gaining traction in this field involves the utilization of Capsule Networks, a relatively novel type of neural network architecture. Unlike traditional Convolutional Neural Networks (CNNs), Capsule Networks excel at capturing hierarchical relationships between features, making them well-suited for tasks requiring precise spatial relationships and object recognition.

Capsule Networks operate on the principle of capsules, which are groups of neurons specifically designed to encode various attributes of an object or feature. These capsules work together to represent different parts of an image, allowing the network to understand the spatial arrangement and pose of objects within the scene. This capability is particularly valuable in deepfake detection, where subtle inconsistencies in facial expressions, lighting, or shadows can indicate manipulation.

One of the key advantages of Capsule Networks in deepfake detection lies in their ability to preserve spatial relationships between features. Traditional CNNs often struggle with this aspect, as they rely on pooling layers that discard spatial information. In contrast, Capsule Networks maintain detailed information about the relative positions and orientations of features, enabling them to identify discrepancies indicative of deepfake manipulation.

Furthermore, Capsule Networks are inherently robust to certain types of adversarial attacks, which are techniques used to fool neural networks into making incorrect predictions. By encoding information in the form of capsules and routing signals based on spatial agreements between features, Capsule Networks can better resist such attacks, enhancing the reliability of deepfake detection systems.

In practice, deepfake detection using Capsule Networks involves training the network on a diverse dataset of both authentic and manipulated videos. During training, the network learns to distinguish between genuine and fake content by analyzing the spatial relationships and patterns encoded in the capsule representations. Once trained, the network can then be deployed to identify potential deepfake videos with a high degree of accuracy.

Overall, Capsule Networks offer a promising avenue for improving the effectiveness of deepfake detection systems. Their ability to capture spatial relationships and resist adversarial attacks makes them well-suited for discerning subtle inconsistencies characteristic of manipulated videos.

DISADVANTAGES

Detecting deepfakes using Capsule Networks (CapsNets) has its own set of disadvantages, despite being a promising approach. Below are some detailed points highlighting these:

1. Limited Data: Capsule Networks require a significant amount of data to train effectively. However, deepfake datasets might not be as extensive or diverse as needed to train CapsNets robustly. This could lead to overfitting or poor generalization to unseen deepfake variations.
2. Computational Complexity: Capsule Networks are computationally intensive
compared to traditional neural networks like CNNs (Convolutional Neural
Networks). Training CapsNets for deepfake detection requires substantial
computational resources, which might not be feasible for all organizations or
researchers, especially those with limited computing infrastructure.
3. Training Instability: Capsule Networks are relatively new compared to
CNNs and might suffer from training instability issues, such as vanishing or
exploding gradients. This instability can make it challenging to converge to an
optimal solution during training, affecting the detection accuracy of deepfakes.
4. Interpretability: While Capsule Networks offer improved interpretability compared to traditional CNNs, they might still lack transparency in certain cases. Understanding the inner workings of CapsNets and how they make decisions regarding deepfake detection can be challenging, which could hinder their adoption and trustworthiness in real-world applications.
5. Domain Adaptation: Deepfake detection using Capsule Networks might
struggle with domain adaptation issues. Deepfakes can vary significantly in terms
of visual quality, resolution, and content, and CapsNets might not generalize well
across these diverse domains without extensive domain adaptation techniques.
6. Adversarial Attacks: Capsule Networks, like other deep learning models, are
susceptible to adversarial attacks. Attackers can manipulate deepfake videos to
evade detection by exploiting vulnerabilities in the CapsNet architecture.
Adversarial training and robustness techniques may be necessary to mitigate
these attacks effectively.
7. Resource Requirements: Training Capsule Networks for deepfake detection
requires large amounts of annotated data, computational power, and time.
Acquiring labeled datasets for deepfake detection can be expensive and time-consuming, especially considering the rapid evolution of deepfake techniques,
which necessitates frequent retraining.
8. Scalability: Capsule Networks might face scalability challenges when applied
to large-scale deepfake detection tasks. As the volume of deepfake content
continues to increase, CapsNets might struggle to maintain detection
performance without sacrificing efficiency or requiring significant architectural
modifications.
9. Model Size: Capsule Networks tend to have more parameters compared to
CNNs, leading to larger model sizes. Deploying deepfake detection systems
based on CapsNets in resource-constrained environments, such as mobile devices
or edge devices, could be impractical due to memory and computational
constraints.
10. Limited Research and Development: Compared to CNNs, Capsule
Networks have received less research and development attention in the context of
deepfake detection. As a result, there might be fewer pre-trained models, fewer
readily available implementation frameworks, and less community support for
CapsNets in this domain, making it challenging to adopt them effectively.
These disadvantages highlight the complexities and challenges associated with using Capsule Networks for deepfake detection, underscoring the need for further research and development to overcome these limitations and realize their full potential in combating the spread of deepfake content.

2.2.4 Hybrid Architectures with Attention Mechanisms


Deepfake detection has become a crucial area of research due to the
proliferation of manipulated media content. Hybrid architectures incorporating
attention mechanisms have emerged as a promising approach to tackle this
challenge effectively. These architectures combine the strengths of different
neural network components to discern subtle cues indicative of deepfakes while
leveraging attention mechanisms to focus on relevant features.

At the core of these hybrid architectures lie Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks. CNNs excel at capturing spatial patterns within images, making them adept at identifying visual anomalies or inconsistencies typical of deepfake manipulation. On the other hand, RNNs, with their ability to model sequential data over time, are instrumental in discerning temporal patterns such as facial expressions and movements.

In the context of deepfake detection, attention mechanisms play a pivotal role in enhancing the model's ability to discern relevant features while suppressing irrelevant ones. By dynamically weighting the importance of different regions or frames within the input data, attention mechanisms enable the network to focus on salient details crucial for discrimination. This selective attention mechanism not only improves detection accuracy but also enhances the model's interpretability by highlighting the areas contributing most to its decisions.

One common implementation of attention mechanisms in hybrid architectures is through the integration of attention layers within the network structure. These attention layers operate in conjunction with convolutional and recurrent layers, allowing the model to adaptively attend to relevant spatial and temporal features at different levels of abstraction. Through this hierarchical attention mechanism, the network can effectively capture both local and global cues indicative of deepfake manipulation.

Moreover, hybrid architectures with attention mechanisms often incorporate additional modules for feature fusion and decision fusion. Feature fusion modules enable the integration of spatial and temporal information extracted by different components of the network, facilitating a comprehensive understanding of the input data. Decision fusion mechanisms combine the outputs of multiple branches or modalities within the network, further enhancing the model's discriminative power.

In practice, the training of hybrid architectures with attention mechanisms requires large-scale datasets containing both authentic and manipulated media samples. These datasets serve as the foundation for training deep learning models to differentiate between genuine and fake content accurately. Furthermore, continual refinement and validation of these models are essential to adapt to evolving deepfake generation techniques and ensure robust detection performance in real-world scenarios.

In conclusion, hybrid architectures incorporating attention mechanisms represent a state-of-the-art approach to deepfake detection. By synergistically combining CNNs, RNNs, and attention mechanisms, these models achieve superior performance in discerning subtle cues indicative of manipulation while enhancing interpretability and robustness. As deepfake technology continues to advance, the development of innovative detection methodologies remains crucial in combating the proliferation of synthetic media manipulation.

DISADVANTAGES

Detecting deepfakes using hybrid architectures with attention mechanisms can offer significant advantages, but there are also several notable disadvantages associated with this approach:

1. Complexity: Hybrid architectures with attention mechanisms typically involve intricate neural network structures combining various components such as convolutional layers, recurrent layers, and attention mechanisms. Managing the complexity of such models can be challenging, requiring extensive computational resources and expertise in both machine learning and computer vision.

2. Training Data Requirements: Building effective deepfake detection models requires large amounts of diverse training data encompassing both real and fake videos. Curating such datasets can be labor-intensive and may raise ethical concerns regarding the collection and use of potentially sensitive content.

3. Limited Generalization: Deepfake detection models based on hybrid architectures with attention mechanisms may struggle to generalize well to unseen deepfake variations or distribution shifts. They might overfit to specific types of manipulations present in the training data, leading to reduced performance on novel deepfake techniques.

4. Computational Intensity: Hybrid architectures with attention mechanisms often demand significant computational resources for both training and inference. Deploying such models in real-time applications or resource-constrained environments may not be feasible due to high computational costs.

5. Vulnerability to Adversarial Attacks: Deep learning models, including those based on hybrid architectures with attention mechanisms, are susceptible to adversarial attacks. Malicious actors can manipulate input data to evade detection, undermining the reliability of the detection system.

6. Interpretability: The intricate nature of hybrid architectures with attention mechanisms can hinder the interpretability of the deepfake detection process. Understanding how these models arrive at their decisions may be challenging, limiting the transparency and trustworthiness of the detection system.

7. Robustness to Manipulation: Deepfake creators continually evolve their techniques to produce more convincing forgeries. Hybrid architectures with attention mechanisms may struggle to keep pace with these advancements, leading to decreased effectiveness in detecting sophisticated deepfakes.

8. Resource Consumption: Implementing and maintaining deepfake detection systems based on hybrid architectures with attention mechanisms require significant financial and human resources. Organizations must allocate funds for hardware infrastructure, software development, and personnel training, potentially straining budgets and manpower.

9. Ethical Considerations: Deepfake detection efforts must navigate ethical considerations surrounding privacy, consent, and potential misuse of the technology. The deployment of sophisticated detection systems raises questions about surveillance, censorship, and the balance between security and individual freedoms.

10. False Positives and Negatives: Despite advancements in deepfake detection, no system is perfect. Hybrid architectures with attention mechanisms may still produce false positives (labeling authentic videos as deepfakes) or false negatives (failing to detect actual deepfakes), impacting the reliability and trustworthiness of the detection process.


CHAPTER 3
METHODOLOGY

3.1 PROPOSED METHODOLOGY


In our approach to detect deepfakes, we combine traditional computer vision
methods with advanced deep learning techniques. First, we prepare the input
videos by making sure they're in the right format and enhancing their quality.
Then, we extract important features from these videos, looking at both the spatial
and temporal aspects using methods like optical flow analysis and frame
differencing. These features are passed into a hybrid deep learning model that
includes attention mechanisms. This model has layers designed to capture patterns
that indicate manipulation, both at local and global levels. The attention
mechanisms help the model focus on key parts of the videos, making it more
accurate. We train the model on a mix of real and synthetic videos, making sure it
can detect new types of deepfakes. After training, we thoroughly test the model on
various datasets and real situations to check its accuracy, how well it handles
different situations, and how fast it runs. We fine-tune and optimize the model as
needed to improve its performance and keep up with new deepfake techniques.
Before using the model in real-world situations, we make sure it meets ethical
standards and consider its broader impacts on society.
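As a concrete illustration of the frame-differencing cue mentioned above, the sketch below scores the mean absolute pixel change between consecutive grayscale frames. This is a simplified, pure-Python illustration only; the actual pipeline would operate on OpenCV/NumPy frame arrays, and the function names here are our own.

```python
def frame_difference(prev_frame, next_frame):
    """Mean absolute pixel difference between two equally sized
    grayscale frames, given as 2-D lists of 0-255 intensities."""
    rows, cols = len(prev_frame), len(prev_frame[0])
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(prev_frame, next_frame)
        for a, b in zip(row_a, row_b)
    )
    return total / (rows * cols)


def difference_scores(frames):
    # One score per consecutive frame pair; abrupt jumps between
    # scores can hint at the temporal inconsistencies described above.
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]
```

In practice these scores would be one of several temporal features fed into the hybrid model alongside optical-flow statistics.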

LSTM:

Long Short-Term Memory (LSTM) networks, belonging to the realm of recurrent neural networks (RNNs), are exceptionally suitable for processing and scrutinizing sequences of data, finding relevance in diverse applications, including the identification of deepfakes. Within the context of deepfake detection, LSTMs serve a multifaceted purpose. Initially, they excel in the domain of sequence modeling, crucial for scrutinizing video frames to pinpoint discrepancies or irregularities. This involves the adeptness of LSTMs in grasping the temporal interdependencies between frames, with each frame represented as a feature vector, facilitating the LSTM's sequential processing to encapsulate the video's dynamics.

Moreover, LSTMs showcase prowess in feature acquisition, being proficient in extracting intricate patterns from sequential data. Specifically, in the realm of deepfake detection, LSTMs are adept at discerning pertinent features from the series of frames, including subtle cues like facial expressions, movements, and other distinctive traits delineating authentic videos from deepfake counterparts. These acquired features hold significant utility, subsequently employed in tasks like classification or anomaly detection.

Furthermore, LSTMs boast a design tailored to capture prolonged dependencies within sequential data, a critical attribute for scrutinizing videos necessitating contextual cues from preceding frames to decipher the present frame adequately. This intrinsic capability empowers LSTMs to effectively identify disparities or irregularities spanning multiple frames within a deepfake video, thus enhancing the detection process's efficacy.

In addition to their prowess in long-term dependency capture, LSTMs also exhibit robustness against variations inherent in deepfake videos. These videos may manifest a plethora of manipulations, ranging from facial morphing to expression synthesis and lip-syncing. LSTMs contribute significantly to detecting such manipulations by learning to discern patterns indicative of tampering or alteration across disparate frames, thereby fortifying the detection system's resilience to the manifold techniques employed in deepfake generation.

Finally, LSTMs can seamlessly integrate into end-to-end learning frameworks tailored for deepfake detection. These comprehensive systems encompass all facets of the detection process, spanning from feature extraction and sequence modeling to classification. By jointly learning from the dataset, this approach enables the LSTM network to adapt dynamically to the distinctive attributes of deepfake videos, potentially culminating in superior detection outcomes. Thus, within the sphere of deepfake detection, LSTMs emerge as indispensable tools, leveraging their intrinsic capabilities to combat the proliferation of deceptive multimedia content effectively.


ResNeXt:

ResNeXt, an architectural variant within the neural network domain, finds utility in the realm of deepfake detection owing to its adeptness in discerning intricate features within images. Renowned for its efficacy in image classification tasks, ResNeXt models exhibit a remarkable capacity for capturing nuanced visual cues. In the context of deepfake detection, leveraging ResNeXt architectures involves training them on datasets encompassing both authentic and deepfake images. Through this training process, the model assimilates the subtle disparities inherent in visual features, enabling it to discern between genuine and manipulated content. By scrutinizing these features, the model endeavors to ascertain the likelihood of an image or video being a deepfake. Nonetheless, the endeavor of deepfake detection remains an enduring challenge, attributable to the swift evolution of deepfake technology. As a consequence, it necessitates the adoption of more sophisticated methodologies beyond mere reliance on neural network architectures like ResNeXt.



3.1.1 System Architecture

Figure 3.1.1 shows a block diagram of the deepfake detection system. The system first uploads a video to a server. The server then preprocesses the video, which involves splitting the video into frames and then cropping the faces out of the frames. The cropped faces are then saved in a dataset. The dataset is then split into training and testing data. The training data is used to train a deep learning model to detect deepfakes. The testing data is used to evaluate the performance of the model. The deep learning model used in the system is a combination of a Long Short-Term Memory (LSTM) network and a ResNeXt network. The LSTM network is used to learn the temporal features of the video, while the ResNeXt network is used to learn the spatial features of the video. Once the model is trained, it is used to predict whether a new video is a real video or a deepfake. The prediction is made by loading the trained model and then feeding the video into the model. The model then outputs a classification, which is either "real" or "fake". Here is a more detailed explanation of the steps involved in the system:

Upload video: The first step in the system is to upload the video that you want
to be analyzed. The video can be uploaded from a local file or from a remote
server.

Preprocessing: Once the video has been uploaded, it is preprocessed. Preprocessing involves splitting the video into frames and then cropping the faces out of the frames. The cropped faces are then saved in a dataset.

Data splitting: The dataset is then split into training and testing data. The
training data is used to train the deep learning model to detect deepfakes. The
testing data is used to evaluate the performance of the model.

Data loader: The data loader is responsible for loading the training and testing
data into the deep learning model.

Train/Test the Model: The training data is used to train the deep learning model. The testing data is used to evaluate the performance of the model.

Deepfake detection Model: The deep learning model used in the system is a
combination of a Long Short-Term Memory (LSTM) network and a ResNext
network. The LSTM network is used to learn the temporal features of the video,
while the ResNext network is used to learn the spatial features of the video.

Load Trained Model: Once the model is trained, it is saved to a file. The
model can then be loaded back into memory when you want to use it to detect
deepfakes.

Prediction Flow: The prediction flow is the part of the system that is used to
make predictions on new videos. To make a prediction on a new video, you
simply load the trained model and then feed the video into the model. The model
will then output a classification, which is either "real" or "fake".

Real/Fake: The final output of the system is a classification of whether the video is a real video or a deepfake.
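The final Real/Fake step amounts to reading off the larger of the two SoftMax outputs. A minimal sketch follows; note that the class-index order (0 = real, 1 = fake) is an assumption, since it depends on how the labels were encoded during training.

```python
def interpret_prediction(probabilities):
    """Map a 2-way softmax output to a label and confidence percentage.

    Assumes index 0 = "REAL" and index 1 = "FAKE" (an assumption; the
    actual order follows the label encoding used during training).
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = "REAL" if best == 0 else "FAKE"
    confidence = round(100.0 * probabilities[best], 2)
    return label, confidence
```

For example, a SoftMax output of [0.93, 0.07] would be reported as REAL with 93% confidence.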

Figure 3.1.1: System Architecture


Figure 3.1.2: Face-swapped deepfake generation

3.2 MODULES

3.2.1. Data-set Gathering


For making the model efficient for real-time prediction, we have gathered the data from different available datasets like FaceForensics++ (FF), the Deepfake Detection Challenge (DFDC), and Celeb-DF. Further, we have mixed the collected datasets and created our own new dataset, for accurate and real-time detection on different kinds of videos. To avoid training bias in the model we have considered 50% real and 50% fake videos.

The Deepfake Detection Challenge (DFDC) dataset consists of certain audio-altered videos; as audio deepfakes are out of scope for this paper, such videos were not considered.


Figure 3.2.1: Dataset

After preprocessing of the DFDC dataset, we have taken 1,500 real and 1,500 fake videos from the DFDC dataset, 1,000 real and 1,000 fake videos from the FaceForensics++ (FF) dataset, and 500 real and 500 fake videos from the Celeb-DF dataset. This makes our total dataset 3,000 real and 3,000 fake videos, 6,000 videos in total.

3.2.2. Pre-processing

In this step, the videos are preprocessed and all the unrequired content and noise is removed from the videos. Only the required portion of the video, i.e., the face, is detected and cropped. The first step in the preprocessing of the video is to split the video into frames. After splitting the video into frames, the face is detected in each frame and the frame is cropped along the face. Later the cropped frames are combined again into a new video. This process is followed for each video, which leads to the creation of a processed dataset containing face-only videos. Frames that do not contain a face are ignored during preprocessing. To maintain uniformity in the number of frames, we have selected a threshold value based on the mean of the total frame count of each video. Another reason for selecting a threshold value is limited computation power. A video of 10 seconds at 30 frames per second (fps) will have 300 frames in total, and it is computationally very difficult to process 300 frames at a single time in the experimental environment.

Figure 3.2.2: Pre-processing of video

So, based on our Graphics Processing Unit (GPU) computational power in the experimental environment, we have selected 150 frames as the threshold value. While saving the frames to the new dataset we have only saved the first 150 frames of each video. To demonstrate the proper use of Long Short-Term Memory (LSTM) we have considered the frames in sequential order, i.e., the first 150 frames, and not randomly. The newly created video is saved at a frame rate of 30 fps and a resolution of 112 x 112.
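The frame-selection rule described above can be sketched as follows. This is a simplified illustration in plain Python; the helper names are our own, and the hard cap of 150 reflects the GPU budget discussed above.

```python
def choose_frame_threshold(frame_counts, hard_cap=150):
    """Pick a per-video frame threshold from the mean frame count,
    capped by what the experimental GPU can process at once (150 here)."""
    mean_count = sum(frame_counts) / len(frame_counts)
    return min(int(mean_count), hard_cap)


def select_frames(frames, threshold):
    # Keep the FIRST `threshold` frames in their original order, so the
    # LSTM later sees a genuine temporal sequence rather than a shuffle.
    return frames[:threshold]
```

With 10-second clips at 30 fps (300 frames each), the cap dominates and the first 150 frames of each video are kept.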


3.2.3. Data-set split

The dataset is split into train and test sets with a ratio of 70% train videos (4,200) and 30% test videos (1,800).

Figure 3.2.3: Train test split

The train and test split is a balanced split, i.e., 50% real and 50% fake videos in each split.
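A balanced 70/30 split as described can be sketched in plain Python. This is a simplified stand-in for the project's actual split code; `balanced_split` is a hypothetical helper, and the per-class split is what keeps each set at 50% real / 50% fake.

```python
import random


def balanced_split(real_videos, fake_videos, train_ratio=0.7, seed=42):
    """Split each class separately with the same ratio, so both the
    train and the test set stay 50% real / 50% fake."""
    rng = random.Random(seed)
    splits = {"train": [], "test": []}
    for label, videos in (("real", real_videos), ("fake", fake_videos)):
        shuffled = videos[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        splits["train"] += [(v, label) for v in shuffled[:cut]]
        splits["test"] += [(v, label) for v in shuffled[cut:]]
    return splits
```

Applied to 3,000 real and 3,000 fake videos, this yields 4,200 training and 1,800 test videos, 2,100 real and 2,100 fake in the training split.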


3.2.4 Architecture

Our model is a combination of CNN and RNN. We have used the pre-trained ResNeXt CNN model to extract features at the frame level, and based on the extracted features an LSTM network is trained to classify the video as deepfake or pristine. Using the Data Loader on the training split of videos, the labels of the videos are loaded and fitted into the model for training.

LSTM for Sequence Processing

The 2048-dimensional feature vectors are fed as the input to the LSTM. We are using 1 LSTM layer with an input size of 2048 and a hidden dimension of 2048, along with a dropout probability of 0.4, which is sufficient to achieve our objective. The LSTM is used to process the frames in a sequential manner so that a temporal analysis of the video can be made, by comparing the frame at time 't' with the frame at time 't-n', where n can be any number of frames before t.

The model also uses the Leaky ReLU activation function. A linear layer with 2048 input features and 2 output features is used to make the model capable of learning the correlation between the input and output. An adaptive average pooling layer with output parameter 1 is used in the model, which gives a target output size of the form H x W. For sequential processing of the frames a Sequential layer is used. A batch size of 4 is used to perform batch training. A SoftMax layer is used to get the confidence of the model during prediction.
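The layer stack described above can be sketched in PyTorch as follows. This is a hedged reconstruction from the description, not the project's exact code: the class name is ours, and it consumes precomputed 2048-dimensional ResNeXt frame features rather than raw frames.

```python
import torch
import torch.nn as nn


class SequenceClassifier(nn.Module):
    """Sketch of the LSTM head described above: it consumes a sequence of
    2048-d frame features (e.g. from ResNeXt) and outputs REAL/FAKE scores.
    Layer sizes follow the text; the actual project code may differ."""

    def __init__(self, feature_dim=2048, hidden_dim=2048, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim,
                            num_layers=1, batch_first=True)
        self.relu = nn.LeakyReLU()
        self.dropout = nn.Dropout(0.4)
        self.pool = nn.AdaptiveAvgPool1d(1)  # average over the time axis
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                    # x: (batch, frames, feature_dim)
        out, _ = self.lstm(x)                # (batch, frames, hidden_dim)
        out = self.pool(out.transpose(1, 2)).squeeze(-1)  # (batch, hidden_dim)
        out = self.dropout(self.relu(out))
        return self.fc(out)                  # raw logits; SoftMax at inference
```

At inference, `torch.softmax(model(features), dim=1)` yields the REAL/FAKE confidence scores.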

ResNeXt
Instead of writing the code from scratch, we used the pre-trained ResNeXt model for feature extraction. ResNeXt is a residual CNN network optimized for high performance on deeper neural networks. For the experimental purpose we have used the resnext50_32x4d model, a ResNeXt of 50 layers with 32 x 4 dimensions. Following this, we fine-tune the network by adding the extra required layers and selecting a proper learning rate to properly converge the gradient descent of the model. The 2048-dimensional feature vector after the last pooling layer of ResNeXt is used as the sequential LSTM input.

Figure 3.2.4: Overview of our model

3.2.5 Hyper-parameter tuning

It is the process of choosing the best hyper-parameters for achieving maximum accuracy. After iterating many times on the model, the best hyper-parameters for our dataset were chosen. To enable an adaptive learning rate, the Adam[21] optimizer with the model parameters is used. The learning rate is tuned to 1e-5 (0.00001) to achieve a better global minimum of gradient descent. The weight decay used is 1e-3. As this is a classification problem, the cross-entropy approach is used to calculate the loss. To use the available computation power properly, batch training is used. A batch size of 4 was tested to be the ideal size for training in our development environment.

The User Interface for the application is developed using the Django framework. Django is used to enable the scalability of the application in the future. The first page of the User Interface, i.e., index.html, contains a tab to browse and upload the video. The uploaded video is then passed to the model and a prediction is made by the model. The model returns the output, whether the video is real or fake, along with the confidence of the model. The output is rendered in predict.html on the face of the playing video.
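The training setup described above (Adam at 1e-5, weight decay 1e-3, cross-entropy loss, batch size 4) can be sketched as a simple epoch loop. The `train_one_epoch` helper and the stand-in linear model are our own illustrative names, not the project's actual code.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    """One pass over the training batches; returns the mean batch loss."""
    model.train()
    total_loss = 0.0
    for features, labels in loader:
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / max(len(loader), 1)


# Hyper-parameters as chosen above: Adam, lr 1e-5, weight decay 1e-3,
# cross-entropy loss; the DataLoader would use batch_size=4.
model = nn.Linear(8, 2)          # stand-in for the ResNeXt+LSTM model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-3)
criterion = nn.CrossEntropyLoss()
```

Creating the optimizer once outside the loop preserves Adam's running moment estimates across epochs.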

3.3 SYSTEM DESIGN

3.3.1 USE CASE DIAGRAM


The system starts with a user uploading a video to the server. Then, the video undergoes preprocessing, which involves preparing the video for analysis by converting it into a format suitable for the deep learning model. Next, the model analyzes the video to detect signs of manipulation indicative of a deepfake. Finally, the system outputs a prediction, categorizing the video as "Real" or "Fake" along with a confidence percentage.

Figure 3.3.1: Use case diagram for deepfake detection


3.3.2 ACTIVITY DIAGRAM

The system begins with three datasets labeled FaceForensic, DFDC Dataset,
and Celeb-DF, containing a total of 6,000 videos. After preprocessing, which
involves face detection and cropping, these videos become the ‘Face-cropped
Dataset’. This dataset is then split into training videos (4,200 videos) and test
videos (1,800 videos).

A Data Loader component feeds the videos and labels into a ResNext CNN
(Convolutional Neural Network) and an LSTM (Long Short-Term Memory
network), likely to train a deep learning model that can identify patterns in real
and deepfake videos. A confusion matrix then evaluates the performance of
the model, providing an accuracy rating. Finally, the system exports the trained
model, presumably for deployment to identify deepfakes in future videos.
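The confusion-matrix evaluation mentioned above can be sketched as follows; the label lists are invented examples, not our actual test results:

```python
# Evaluate predictions with a 2x2 confusion matrix (0 = real, 1 = fake).
# The label lists below are hypothetical examples, not real results.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# matrix[i][j] counts samples whose true class is i and predicted class is j.
matrix = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    matrix[t][p] += 1

# Accuracy is the fraction of samples on the matrix diagonal.
accuracy = (matrix[0][0] + matrix[1][1]) / len(y_true)
print(matrix)    # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```

The off-diagonal cells expose the two error types separately: real videos flagged as fake, and fakes that slip through as real.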

Department Of Information Technology, MREC


Deepfake detection using LSTM and ResNext

Figure 3.3.2: Activity diagram for deepfake detection


3.3.3 SEQUENCE DIAGRAM

The system has a user who uploads a video. The system then takes the video
through preprocessing steps. After preprocessing, the video is fed into a
ResNext CNN (Convolutional Neural Network) and an LSTM (Long Short-Term
Memory network). These deep learning models analyze the video to determine
whether it is real or fake (REAL/FAKE), and the system then outputs the
result accordingly.

Figure 3.3.3: Sequence diagram for deepfake detection


3.3.4 WORKFLOW

The diagram shows the face-processing pipeline applied to a user video. The
first step is to pre-process the video, which involves resizing and converting
it into a format suitable for analysis. Then, the system performs face
detection and cropping, isolating the faces in the video. Next, the trained
model, which was trained on a large dataset of face videos, is loaded to
analyze the preprocessed faces. Finally, the system outputs a prediction,
classifying the video as real or fake.

Figure 3.3.4: Workflow diagram for deepfake detection



CHAPTER 4
RESULT ANALYSIS

To address the problem of fake videos, we designed a web page called Deepfake
Detection that detects fake videos using the LSTM and ResNext algorithms.
First, the user uploads a video of a certain length; the web page then
processes the video and concludes whether it is fake or not.
Initially, the videos are preprocessed into frames. Each frame is then passed
through the ResNext model, a convolutional neural network (CNN) architecture
known for its ability to capture intricate details and patterns within images,
to extract high-level spatial features. The output of the ResNext model is a
set of high-dimensional feature vectors, one per frame. After spatial features
are extracted from the individual frames, the LSTM model captures the temporal
dependencies across frames. An LSTM is a type of recurrent neural network
(RNN) that can model sequential data effectively; the per-frame feature
vectors from ResNext are fed into the LSTM to capture the temporal dynamics of
the video.
The combined LSTM-ResNext model is trained on a dataset containing both
real and fake videos. During training, the model learns to distinguish between
the temporal patterns present in real and fake videos, adjusting the
parameters of both the ResNext and LSTM components. Once the model is trained,
it undergoes fine-tuning and validation phases to optimize its performance
further. Fine-tuning involves adjusting hyperparameters and possibly
retraining the model on a subset of the data. Validation is done on a separate
set of videos to assess the model's generalization ability and detect any
overfitting issues.

Finally, the trained model is ready for inference. Given a new video as
input, the combined LSTM-ResNext model processes the frames, extracts spatial
features using ResNeXt, captures temporal dynamics using the LSTM, and then
makes a prediction regarding the video's authenticity (real or fake).
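At inference time, the prediction and its confidence can be derived from the model's two output logits via a softmax; the helper name and the logit values below are made-up illustrations:

```python
import math

def predict_from_logits(logits):
    """Convert [real, fake] logits into a label and a confidence percentage.
    The softmax turns the raw scores into probabilities summing to 1.
    (Hypothetical helper, for illustration only.)"""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [v / total for v in exps]
    label = "FAKE" if probs[1] > probs[0] else "REAL"
    confidence = max(probs) * 100.0
    return label, confidence

# Hypothetical logits from the model for one uploaded video.
label, confidence = predict_from_logits([0.3, 2.1])
print(label, round(confidence, 1))  # FAKE 85.8
```

The confidence shown on predict.html corresponds to the larger of the two softmax probabilities, expressed as a percentage.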

Figure 4.1: Home page

Figure 4.1 shows the home page of the deepfake detection web application.
When the user opens the web page, this interface is displayed, including an
upload button. By clicking the upload button, the user can select and upload a
video file.


Figure 4.2: Selecting and uploading

Figure 4.2: in the next step, click the Choose File button visible on the
home page and select the video you want to check for a deepfake. The selected
video is shown in the interface; next, click the Upload button to upload the
video.

Figure 4.3: Real video output

Figure 4.3 shows that after uploading, the video is analyzed with the LSTM
and ResNext algorithms, and the output indicates whether it is a fake or a
real video.


Figure 4.4: Uploading a fake video

Figure 4.4 shows the upload of a fake video; the model then begins analyzing
it using the LSTM and ResNext algorithms.

Figure 4.5: Fake video output

Figure 4.5 shows the output for an uploaded video classified as fake by the
LSTM and ResNext algorithms.

Figure 4.6: Selecting a video with no faces

Figure 4.6 explains that the user should always select a video that contains
a face, so that the detector can process it.

Figure 4.7: Output for an uploaded video with no faces

Figure 4.7 shows that no output is produced if a video with no faces is
uploaded.

Figure 4.8: Uploading a file greater than 100 MB

Figure 4.8 shows that an error is raised if a video greater than 100 MB is
uploaded, regardless of whether the video is fake or original.
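A simple server-side check behind errors like these might look as follows; the 100 MB limit comes from the report, while the function name, accepted extensions, and messages are illustrative assumptions:

```python
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100 MB limit from the report

def validate_upload(filename, size_bytes):
    """Reject uploads that are missing, not videos, or over the size cap.
    Returns (ok, message); names and messages here are illustrative."""
    if not filename:
        return False, "Please select a file to upload."
    if not filename.lower().endswith((".mp4", ".avi", ".mov")):
        return False, "Unsupported file type."
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "Video exceeds the 100 MB limit."
    return True, "OK"

print(validate_upload("clip.mp4", 50 * 1024 * 1024))   # (True, 'OK')
print(validate_upload("clip.mp4", 150 * 1024 * 1024))  # rejected: too large
print(validate_upload("", 10))                         # rejected: no file
```

The empty-filename branch corresponds to the warning in Figure 4.9, shown when the Upload button is pressed without choosing a file.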


Figure 4.9: Pressing the Upload button without selecting a video

Figure 4.9 shows that if the Upload button is clicked without selecting a
video, a warning appears under the Choose File box asking the user to select a
file.

CHAPTER 5
CONCLUSION

In conclusion, our work presents a significant step forward in the ongoing
battle against the proliferation of deepfake videos. With the exponential
growth of computational power, the creation of indistinguishable
human-synthesized videos has become alarmingly simple, posing serious threats
such as political manipulation, fake terrorism events, revenge porn, and
blackmail. To address this pressing issue, we propose a novel deep
learning-based method capable of effectively detecting AI-generated fake
videos, specifically targeting replacement and reenactment deepfakes.

Our approach leverages a ResNext Convolutional Neural Network to extract
frame-level features, which are then used to train a Long Short-Term Memory
(LSTM) based Recurrent Neural Network (RNN) for classification. Using this
architecture, we aim to discern whether a video has undergone manipulation or
is genuine. To ensure the


practicality and efficacy of our model in real-world scenarios, we
extensively evaluate its performance on a large dataset encompassing a
balanced mix of existing datasets such as FaceForensics++, the Deepfake
Detection Challenge, and Celeb-DF.

Through our experiments, we demonstrate that our system achieves competitive
results using a straightforward and robust approach. By combining
sophisticated deep learning techniques with comprehensive dataset curation, we
pave the way for more reliable detection methods against the growing threat of
deepfake videos. Our work underscores the importance of leveraging artificial
intelligence to combat the negative consequences of artificial intelligence,
ultimately contributing to the preservation of trust and integrity in digital
media landscapes.


REFERENCES:

[1] L. Borges, B. Martins, and P. Calado, “Combining similarity features and
deep representation learning for stance detection in the context of checking
fake news,” vol. 11, no. 3, 2019.

[2] M. S. Rana, M. N. Nobi, B. Murali, and A. H. Sung, “Deepfake detection: A
systematic literature review,” IEEE Access, vol. 10, pp. 25494–25513, 2022.

[3] A. Qureshi, D. Megías, and M. Kuribayashi, “Detecting deepfake videos
using digital watermarking,” in 2021 Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo,
Japan, 2021, pp. 1786–1793.

[4] D. Güera and E. J. Delp, “Deepfake video detection using recurrent neural
networks,” in 15th IEEE International Conference on Advanced Video and
Signal Based Surveillance, 2018, pp. 1–6.

[5] D. Megías, M. Kuribayashi, A. Rosales, and W. Mazurczyk, “DISSIMILAR:
Towards fake news detection using information hiding, signal processing and
machine learning,” in 16th International Conference on Availability,
Reliability and Security (ARES), 2021.

[6] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama,
K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for
visual recognition and description,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 39, no. 4, pp. 677–691, 2017.

[7] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, and P. Natarajan,
“Recurrent convolutional strategies for face manipulation detection in
videos,” in IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPR Workshops), 2019, pp. 80–87.

[8] B. Zi, M. Chang, J. Chen, X. Ma, and Y.-G. Jiang, “WildDeepfake: A
challenging real-world dataset for deepfake detection,” in Proceedings of the
28th ACM International Conference on Multimedia, 2020, pp. 2382–2390.


[9] S. Hussain, P. Neekhara, M. Jere, F. Koushanfar, and J. McAuley,
“Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to
adversarial examples,” in IEEE Winter Conference on Applications of Computer
Vision, 2021, pp. 3347–3356.

[10] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and
alignment using multitask cascaded convolutional networks,” IEEE Signal
Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.

[11] S. R. Ahmed, E. Sonuç, M. R. Ahmed, and A. D. Duru, “Analysis survey on
deepfake detection and recognition with convolutional neural networks,” in
2022 International Congress on Human-Computer Interaction, Optimization and
Robotic Applications (HORA), 2022, pp. 1–7.

[12] I. Perov, D. Gao, N. Chervoniy, K. Liu, S. Marangonda, C. Umé, M. Dpfks,
C. S. Facenheim, L. RP, J. Jiang, S. Zhang, P. Wu, B. Zhou, and W. Zhang,
“DeepFaceLab: Integrated, flexible and extensible face-swapping framework,”
arXiv, 2021.

[13] Y. Mirsky and W. Lee, “The creation and detection of deepfakes: A
survey,” ACM Computing Surveys, vol. 54, no. 1, 2021.

[14] L. Guarnera, O. Giudice, and S. Battiato, “Deepfake detection by
analyzing convolutional traces,” in Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition Workshops, 2020.

[15] F. Deguillaume, S. Voloshynovskiy, and T. Pun, “Secure hybrid robust
watermarking resistant against tampering and copy attack.”

[16] T. Zhao, X. Xu, M. Xu, H. Ding, Y. Xiong, and W. Xia, “Learning
self-consistency for deepfake detection,” in Proceedings of the IEEE/CVF
International Conference on Computer Vision, 2021, pp. 15023–15033.
