Project Report
Growing computational power has made deep learning algorithms so powerful that creating indistinguishable synthesized human videos, popularly called deepfakes, has become very simple. Scenarios in which these realistic face-swapped deepfakes are used to create political distress, fake terrorism events, revenge porn, and blackmail are easily envisioned. In this work, we describe a new deep learning-based method that can effectively distinguish AI-generated fake videos from real videos. Our method automatically detects both replacement and reenactment deepfakes. We are using Artificial Intelligence (AI) to fight Artificial Intelligence. Our system uses a ResNext convolutional neural network to extract frame-level features, and these features are then used to train a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) to classify whether the video has been subjected to any kind of manipulation, i.e., whether the video is a deepfake or a real video. To emulate real-time scenarios and make the model perform better on real-time data, we evaluate our method on a large, balanced, mixed dataset prepared by combining various available datasets such as FaceForensics++ [1], the Deepfake Detection Challenge [2], and Celeb-DF [3]. We also show how our system can achieve competitive results using a very simple and robust approach.
Keywords:
ResNext Convolutional Neural Network
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
Computer Vision
Contents
1 Synopsis 1
2 Technical Keywords 2
2.1 Area of Project . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Technical Keywords . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Introduction 3
3.1 Project Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.2 Motivation of the Project . . . . . . . . . . . . . . . . . . . . . . 4
3.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4 Problem Definition and Scope 7
4.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1.1 Goals and objectives . . . . . . . . . . . . . . . . . . . . . 7
4.1.2 Statement of scope . . . . . . . . . . . . . . . . . . . . . . 8
4.2 Major Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.3 Methodologies of Problem Solving . . . . . . . . . . . . . . . . . . 9
4.3.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.3.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3.3 Development . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.4 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.6 Hardware Resources Required . . . . . . . . . . . . . . . . . . . . 10
4.7 Software Resources Required . . . . . . . . . . . . . . . . . . . . 11
5 Project Plan 12
5.1 Project Model Analysis . . . . . . . . . . . . . . . . . . . . . . . 12
5.1.1 Reconciled Estimates . . . . . . . . . . . . . . . . . . . . . 13
5.1.2 Cost Estimation using COCOMO (Constructive Cost) Model 13
List of Figures
5.1 Spiral Methodology SDLC . . . . . . . . . . . . . . . . . . . . . . 12
6.1 Use case diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.2 DFD Level 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.3 DFD Level 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.4 DFD Level 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.5 Training Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.6 Testing Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.7 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 1
Synopsis
Deepfake is a technique for human image synthesis based on neural network tools such as GANs (Generative Adversarial Networks) or autoencoders. These tools superimpose target images onto source videos using deep learning techniques and create realistic-looking deepfake videos. These deepfake videos are so real that it becomes impossible to spot the difference with the naked eye. In this work, we describe a new deep learning-based method that can effectively distinguish AI-generated fake videos from real videos. We use the limitations of the deepfake creation tools as a powerful way to distinguish between pristine and deepfake videos: during creation, the current deepfake tools leave distinguishable artifacts in the frames that may not be visible to a human being, but that trained neural networks can spot. Deepfake creation tools leave distinctive artefacts in the resulting deepfake videos, and we show that they can be effectively captured by a ResNext convolutional neural network.
Chapter 2
Deepfake Video Detection
Technical Keywords
• Deep learning
• Computer vision
• OpenCV
• Face Recognition
• PyTorch
Chapter 3
Introduction
Scenarios in which these realistic face-swapped deepfakes are used to create political distress, fake terrorism events, revenge porn, and blackmail are easily envisioned. Some examples are the fake Brad Pitt and Angelina Jolie nude videos. It therefore becomes very important to spot the difference between deepfake and pristine video. We are using AI to fight AI. Deepfakes are created using tools like FaceApp [11] and Face Swap [12], which use pre-trained neural networks such as GANs or autoencoders for deepfake creation. Our method uses an LSTM-based neural network for sequential, temporal analysis of the video frames and a pre-trained ResNext CNN to extract frame-level features. The ResNext convolutional neural network extracts the frame-level features, and these features are then used to train the Long Short-Term Memory based Recurrent Neural Network to classify the video as deepfake or real. To emulate real-time scenarios and make the model perform better on real-time data, we trained our method on a large, balanced combination of various available datasets such as FaceForensics++ [1], the Deepfake Detection Challenge [2], and Celeb-DF [3].
Further, to make the system ready to use for customers, we have developed a front-end application where the user uploads a video. The video is processed by the model, and the output is rendered back to the user with the classification of the video as deepfake or real, along with the confidence of the model.
Like any transformative technology, deepfakes have created new challenges. So-called "deepfakes" are produced by deep generative models that can manipulate video and audio clips. Since their first appearance in late 2017, many open-source deepfake generation methods and tools have emerged, leading to a growing number of synthesized media clips. While many are likely intended to be humorous, others could be harmful to individuals and society. The number of fake videos and their degree of realism have been increasing due to the availability of editing tools and the little domain expertise required.
The spread of deepfakes over social media platforms has become very common, leading to spamming and the circulation of wrong information on these platforms. Just imagine a deepfake of our prime minister declaring war against neighboring countries, or a deepfake of a reputed celebrity abusing their fans. These kinds of deepfakes would be terrible and would lead to the threatening and misleading of common people.
Detection by Eye Blinking [16] describes a method for detecting deepfakes using eye blinking as a crucial parameter for classifying videos as deepfake or pristine. A Long-term Recurrent Convolutional Network (LRCN) was used for temporal analysis of cropped frames of eye blinking. However, deepfake generation algorithms have now become so powerful that a lack of eye blinking cannot be the only clue for detection. Other parameters must also be considered, such as teeth enhancement, wrinkles on faces, wrong placement of eyebrows, etc.
Capsule networks to detect forged images and videos [17] uses a capsule network to detect forged, manipulated images and videos in different scenarios, such as replay-attack detection and computer-generated video detection. In their method, they used random noise in the training phase, which is not a good option: the model performed well on their dataset but may fail on real-time data because of the noise in training. Our method is proposed to be trained on noiseless, real-time datasets.
Recurrent Neural Network [18] (RNN) based deepfake detection used RNNs for sequential processing of the frames along with an ImageNet pre-trained model. Their process used the HOHA [19] dataset, consisting of just 600 videos. Because their dataset contains a small number of videos of the same type, the approach may not perform well on real-time data. We will be training our model on a large amount of real-time data.
Chapter 4
Problem Definition and Scope
• Our project aims at discovering the distorted truth behind deepfakes.
• Our project will reduce the abuse and misleading of common people on the World Wide Web.
• Our project will distinguish and classify videos as deepfake or pristine.
• We provide an easy-to-use system for users to upload a video and determine whether it is real or fake.
There are many tools available for creating deepfakes, but hardly any tool is available for deepfake detection. Our approach to detecting deepfakes will be a great contribution to avoiding the percolation of deepfakes over the World Wide Web. We will provide a web-based platform for the user to upload a video and classify it as fake or real. This project can be scaled up from a web-based platform to a browser plugin for automatic deepfake detection. Even big applications like WhatsApp and Facebook can integrate this project for easy pre-detection of deepfakes before a video is sent to another user. A description of the software, with size of input, bounds on input, input validation, input dependency, i/o state diagram, and major inputs and outputs, is given without regard to implementation detail.
• User: The user of the application will be able to detect whether the uploaded video is fake or real, along with the model's confidence in the prediction.
• Prediction: The user will be able to see the playing video with the output on the face, along with the confidence of the model.
4.3.1 Analysis
• Solution Requirement
We analysed the problem statement and established the feasibility of a solution. We read the different research papers mentioned in Section 3.3. After checking the feasibility of the problem statement, the next step was dataset gathering and analysis. We analysed different training approaches, such as training the model negatively or positively, i.e., with only fake or only real videos, but found that this may add extra bias to the model, leading to inaccurate predictions. After a lot of research, we found that balanced training of the algorithm is the best way to avoid bias and variance and to get a good accuracy.
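The balanced-training idea above can be sketched as a simple sampling helper; this is an illustrative sketch (the function name and seed are assumptions, not part of the project code) showing how an equal number of real and fake videos could be drawn:

```python
import random

def balanced_subset(real_videos, fake_videos, seed=42):
    """Take an equal number of real (label 1) and fake (label 0) samples
    so neither class dominates training -- the 50/50 split discussed above."""
    n = min(len(real_videos), len(fake_videos))
    rng = random.Random(seed)  # fixed seed for reproducibility
    sample = [(v, 1) for v in rng.sample(real_videos, n)] + \
             [(v, 0) for v in rng.sample(fake_videos, n)]
    rng.shuffle(sample)  # mix the classes before batching
    return sample
```

With, say, 10 real and 6 fake videos, this yields 12 samples, 6 per class.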
• Solution Constraints
We analysed the solution in terms of cost, speed of processing, requirements, level of expertise, and availability of equipment.
• Parameter Identified
1. Blinking of eyes
2. Teeth enhancement
3. Bigger distance for eyes
4. Moustaches
5. Double edges, eyes, ears, nose
6. Iris segmentation
7. Wrinkles on face
8. Inconsistent head pose
9. Face angle
10. Skin tone
11. Facial Expressions
12. Lighting
13. Different Pose
14. Double chins
15. Hairstyle
16. Higher cheek bones
4.3.2 Design
After research and analysis, we developed the system architecture of the solution as described in Chapter 6. We decided on the baseline architecture of the model, including the different layers and their numbers.
4.3.3 Development
After analysis, we decided to use the PyTorch framework along with the Python 3 language for programming. PyTorch was chosen for its good CUDA support, i.e., Graphics Processing Unit (GPU) acceleration, and its customizability. Google Cloud Platform is used for training the final model on a large dataset.
4.3.4 Evaluation
We evaluated our model on a large real-time dataset that includes YouTube videos. A confusion-matrix approach is used to evaluate the accuracy of the trained model.
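The confusion-matrix evaluation mentioned above can be sketched in a few lines; this is an illustrative helper (function names are assumptions) for binary real/fake labels:

```python
def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for REAL (0) / FAKE (1) labels.
    Returns (true positives, true negatives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Accuracy = correct predictions over all predictions."""
    tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, with true labels [1, 0, 1, 0] and predictions [1, 0, 0, 0], the accuracy is 0.75.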
4.4 Outcome
The outcome of the solution is a trained deepfake detection model that helps users check whether a new video is a deepfake or real.
4.5 Applications
A web-based application lets the user upload a video and submit it for processing. The model pre-processes the video and predicts whether the uploaded video is a deepfake or a real video.
Chapter 5
Project Plan
The spiral model is capable of handling risks; that is the reason we are using it for product development.
Since we have a small team, less-rigid requirements, and a long deadline, we are using the organic COCOMO [23] model.
1. Effort Applied:

Effort Applied (E) = a_b * (KLOC)^(b_b)
E = 2.4 * (20.5)^1.05
E = 57.22 person-months

2. Development Time: The amount of time required for the completion of the job, which is, of course, proportional to the effort applied. It is measured in units of time such as weeks or months.

Development Time (D) = c_b * (E)^(d_b)
D = 2.5 * (57.22)^0.38
D = 11.6 months

People Required (P) = E / D
P = 4.93
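The organic-mode COCOMO estimates above can be checked with a few lines of arithmetic (coefficients a_b = 2.4, b_b = 1.05, c_b = 2.5, d_b = 0.38, and a 20.5 KLOC estimate, as used in this section):

```python
# Organic-mode COCOMO coefficients and the project's size estimate.
KLOC = 20.5

effort = 2.4 * KLOC ** 1.05        # Effort Applied, in person-months
dev_time = 2.5 * effort ** 0.38    # Development Time, in months
people = effort / dev_time         # People Required

print(round(effort, 2), round(dev_time, 1), round(people, 2))
```

This reproduces the values quoted above: roughly 57.22 person-months of effort, about 11.6 months of development time, and about 5 people.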
Before training, thousands of images of both persons need to be prepared. One can take a shortcut and use a face detection library to scrape facial pictures from their videos, then spend significant time improving the quality of the facial pictures, as this significantly impacts the final result:
1. Remove any picture frames that contain more than one person.
2. Make sure there is an abundance of video footage, and extract facial pictures containing different poses, face angles, and facial expressions.
3. Some resemblance between both persons may help, such as a similar face shape.
The deepfake tool creates a mask on the generated face so it can blend in with the target video and further eliminate the artifacts.
• Task 3: Pre-processing
Pre-processing includes the creation of a new dataset that contains only face-cropped videos.
• Task 8: Testing
The complete application is tested using unit testing.
Chapter 6
6.1 Introduction
This document lays out a project plan for the development of deepfake video detection using neural networks. The intended readers of this document are current and future developers working on the project and the sponsors of the project. The plan includes, but is not restricted to, a summary of the system functionality, the scope of the project from the perspective of the "Deepfake video detection" team (me and my mentors), a use case diagram, data flow diagrams, an activity diagram, functional and non-functional requirements, project risks and how those risks will be mitigated, the process by which we will develop the project, and the metrics and measurements that will be recorded throughout the project.
DFD Level 0 indicates the basic flow of data in the system. In this system, input is given equal importance as output.
Hence, the data flow diagram visualizes the system with its input and output flow.
DFD Level-1
• DFD Level 1 gives more detailed in-and-out information about the system.
• It describes the procedures taking place in the system in more detail.
DFD Level-2
Safety Requirement
• Data integrity is preserved. Once a video is uploaded to the system, it is only processed by the algorithm. The videos are kept secure from human intervention, as the uploaded video is not available for human manipulation.
• To extend safety, videos uploaded by the user are deleted from the server after 30 minutes.
Security Requirement
• While uploading, the video is encrypted using a symmetric encryption algorithm. On the server, too, the video is stored only in encrypted form. The video is decrypted only from preprocessing until we get the output; after getting the output, the video is encrypted again.
• This cryptography helps maintain the security and integrity of the video.
Chapter 7
7.1 Introduction
7.1.1 System Architecture
In this system, we have trained our PyTorch deepfake detection model on an equal number of real and fake videos in order to avoid bias in the model. The system architecture of the model is shown in the figure. In the development phase, we took a dataset, preprocessed it, and created a new processed dataset that includes only the face-cropped videos.
Deepfake creation tools first split the video into frames, detect the face in each frame, and replace the source face with the target face. The replaced frames are then combined using different pre-trained models. These models also enhance the quality of the video by removing the leftover traces of the deepfake creation model, which results in a deepfake that looks realistic in nature. We have used the same insight to detect deepfakes. Deepfakes created using pre-trained neural network models are so realistic that it is almost impossible to spot the difference with the naked eye. But in reality, the deepfake creation tools leave traces or artifacts in the video that may not be noticeable to the naked eye. The motive of this paper is to identify these unnoticeable traces and distinguishable artifacts and to classify the video as deepfake or real.
1. Faceswap
2. Faceit
To make the model efficient for real-time prediction, we gathered data from different available datasets such as FaceForensics++ (FF) [1], the Deepfake Detection Challenge (DFDC) [2], and Celeb-DF [3]. Further, we mixed the collected datasets and created our own new dataset to enable accurate, real-time detection on different kinds of videos. To avoid training bias, the model uses 50% real and 50% fake videos.
The Deepfake Detection Challenge (DFDC) dataset [2] contains some audio-altered videos; as audio deepfakes are out of scope for this paper, we preprocessed the DFDC dataset and removed the audio-altered videos by running a Python script.
After preprocessing the DFDC dataset, we took 1500 real and 1500 fake videos from DFDC, 1000 real and 1000 fake videos from the FaceForensics++ (FF) [1] dataset, and 500 real and 500 fake videos from the Celeb-DF [3] dataset. This makes our total dataset 3000 real and 3000 fake videos, 6000 videos in total. Figure 2 depicts the distribution of the datasets.
In this step, the videos are preprocessed and all unrequired content and noise are removed. Only the required portion of the video, i.e., the face, is detected and cropped. The first step in preprocessing is to split the video into frames. After splitting, the face is detected in each frame and the frame is cropped along the face. The cropped frames are then recombined into a new video. This process is followed for each video, which leads to a processed dataset containing face-only videos. Frames that do not contain a face are ignored during preprocessing.
To maintain a uniform number of frames, we selected a threshold value based on the mean total frame count of the videos. Another reason for selecting a threshold is limited computational power: a 10-second video at 30 frames per second (fps) has 300 frames in total, and it is computationally very difficult to process 300 frames at a single time in our experimental environment. So, based on the Graphics Processing Unit (GPU) computational power available in the experimental environment, we selected 150 frames as the threshold value. While saving frames to the new dataset, we saved only the first 150 frames of each video. To make proper use of the Long Short-Term Memory (LSTM), we considered the frames sequentially, i.e., the first 150 frames, not a random selection. The newly created video is saved at a frame rate of 30 fps and a resolution of 112 x 112.
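The frame-selection logic above can be sketched as follows; this is a minimal sketch (the helper names are assumptions) of how the threshold and the sequential first-N selection could be computed:

```python
def frames_to_keep(frame_counts, cap=150):
    """Pick a uniform sequence length: the mean frame count across the
    dataset, capped at 150 frames for GPU memory reasons."""
    mean_count = sum(frame_counts) / len(frame_counts)
    return min(int(mean_count), cap)

def select_frames(frames, n):
    """Keep the first n frames in order (sequential, not random),
    so the LSTM sees a contiguous temporal window."""
    return frames[:n]
```

For example, videos with 300, 120, and 240 frames have a mean of 220, which the 150-frame cap reduces to 150.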
The dataset is split into train and test sets with a ratio of 70% train videos (4,200) to 30% test videos (1,800). The split is balanced, i.e., 50% real and 50% fake videos in each split.
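The balanced 70/30 split can be sketched by splitting each class separately; this is an illustrative helper (names and seed are assumptions), not the project's actual data-loading code:

```python
import random

def split_dataset(real, fake, train_ratio=0.7, seed=0):
    """70/30 split applied per class, so both the train and test sets
    stay 50% real / 50% fake."""
    rng = random.Random(seed)

    def split(items):
        items = items[:]          # copy so the caller's list is untouched
        rng.shuffle(items)
        k = int(len(items) * train_ratio)
        return items[:k], items[k:]

    real_tr, real_te = split(real)
    fake_tr, fake_te = split(fake)
    return real_tr + fake_tr, real_te + fake_te
```

With 3000 real and 3000 fake videos this yields 4,200 training and 1,800 test videos, 2,100 real in each training half.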
Our model is a combination of a CNN and an RNN. We used a pre-trained ResNext CNN model to extract features at the frame level, and based on the extracted features an LSTM network is trained to classify the video as deepfake or pristine. Using the Data Loader on the training split, the videos and their labels are loaded and fed into the model for training.
ResNext:
Instead of writing the code from scratch, we used a pre-trained ResNext model for feature extraction. ResNext is a residual CNN optimized for high performance on deeper networks. For the experiments, we used the resnext50_32x4d model: a ResNext with 50 layers and 32 x 4 dimensions.
Following this, we fine-tune the network by adding the extra required layers and selecting a proper learning rate so that the gradient descent of the model converges properly. The 2048-dimensional feature vectors after the last pooling layer of ResNext are used as the sequential input to the LSTM.
The 2048-dimensional feature vectors are fed as input to the LSTM. We use one LSTM layer with 2048 latent dimensions and 2048 hidden units, along with a 0.4 chance of dropout.
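The LSTM head described here can be sketched in PyTorch. This is a minimal sketch, not the project's exact code: the layer ordering and the final Linear layer are assumptions, and the 2048-d frame features are assumed to come from a pre-trained resnext50_32x4d backbone (here random tensors stand in for them):

```python
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    """One LSTM layer over 2048-d frame features, ReLU, 0.4 dropout,
    and a 2-class (real/fake) output, as described in the text."""

    def __init__(self, feature_dim=2048, hidden_dim=2048, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=1,
                            batch_first=True)  # one LSTM layer
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.4)         # 0.4 chance of dropout
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):
        # features: (batch, seq_len, feature_dim) frame-level vectors
        out, _ = self.lstm(features)
        # classify from the last time step of the sequence
        return self.fc(self.dropout(self.relu(out[:, -1, :])))
```

A forward pass on a batch of frame-feature sequences produces one pair of logits per video.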
The Adam [21] optimizer is used with the model parameters. The learning rate is tuned to 1e-5 (0.00001) to achieve a better global minimum of gradient descent, and the weight decay used is 1e-3.
As this is a classification problem, the cross-entropy approach is used to calculate the loss. To use the available computational power properly, batch training is used with a batch size of 4, which we found to be the ideal size for training in our development environment.
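The optimizer and loss configuration above can be sketched as a single training step. This is an illustrative sketch: the tiny Linear model is a stand-in (an assumption) for the ResNext+LSTM network so the step can run on its own; the hyperparameters are the ones stated in the text:

```python
import torch
import torch.nn as nn

# Hyperparameters from the text: Adam, lr 1e-5, weight decay 1e-3,
# cross-entropy loss, batch size 4.
model = nn.Linear(2048, 2)  # stand-in for the ResNext+LSTM classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(features, labels):
    """One mini-batch update: forward pass, loss, backward pass, step."""
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Each epoch would call `train_step` once per batch of 4 videos.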
The user interface for the application is developed using the Django framework; Django was chosen to enable future scalability of the application.
The first page of the user interface, index.html, contains a tab to browse and upload a video. The uploaded video is then passed to the model, which predicts whether the video is real or fake along with its confidence. The output is rendered in predict.html on the face of the playing video.
Chapter 8
Project Implementation
8.1 Introduction
There are many examples where deepfake creation technology has been used to mislead people on social media platforms by sharing false deepfake videos of famous personalities, such as Mark Zuckerberg on the eve of the House A.I. hearing, Donald Trump's Breaking Bad parody where he was introduced as James McGill, Barack Obama's public service announcement, and many more [5]. These types of deepfakes create huge panic among ordinary people, which raises the need to spot deepfakes accurately so that they can be distinguished from real videos.
Recent advances in technology have changed the field of video manipulation. The advances in modern open-source deep learning frameworks like TensorFlow, Keras, and PyTorch, along with cheap access to high computation power, have driven this paradigm shift. Conventional autoencoders [10] and Generative Adversarial Network (GAN) pre-trained models have made the tampering of realistic videos and images very easy. Moreover, access to these pre-trained models through smartphone and desktop applications like FaceApp and Face Swap has made deepfake creation child's play. These applications generate highly realistic synthesized transformations of faces in real videos and also provide the user with additional functionality, such as changing face, hair style, gender, age, and other attributes; they allow the user to create very high-quality, indistinguishable deepfakes. Although some malignant deepfake videos exist, so far they remain a minority. The released tools [11, 12] that generate deepfake videos are being extensively used to create fake celebrity pornographic videos or revenge porn [13]; some examples are the Brad Pitt and Angelina Jolie nude videos. The realistic nature of deepfake videos makes celebrities and other famous personalities targets of pornographic material, fake surveillance videos, fake news, and malicious hoaxes. Deepfakes are also very popular for creating political tension [14], which makes it very important to detect deepfake videos and avoid the percolation of deepfakes on social media platforms.
8.2.1 Planning
1. OpenProject
1. draw.io
1. Python3
2. JavaScript
1. PyTorch
2. Django
8.2.5 IDE
1. Google Colab
2. Jupyter Notebook
1. Git
8.2.9 Libraries
1. torch
2. torchvision
3. os
4. numpy
5. cv2
6. matplotlib
7. face_recognition
8. json
9. pandas
10. copy
11. glob
12. random
13. sklearn
Refer 7.2.1
• Using glob, we imported all the videos in the directory into a Python list.
• cv2.VideoCapture is used to read the videos and get the mean number of frames in each video.
• The video is split into frames and the frames are cropped at the face location.
• The face-cropped frames are written to a new video using VideoWriter.
• The new video is written at 30 frames per second with a resolution of 112 x 112 pixels in mp4 format.
• Instead of selecting random frames, the first 150 frames are written to the new video, to make proper use of the LSTM for temporal sequence analysis.
• LSTM Layer: The LSTM is used for sequence processing and to spot temporal changes between frames. The 2048-dimensional feature vectors are fed as input to the LSTM. We use one LSTM layer with 2048 latent dimensions and 2048 hidden units, along with a 0.4 chance of dropout, which is sufficient to achieve our objective. The LSTM processes the frames sequentially so that a temporal analysis of the video can be made by comparing the frame at time t with the frame at time t-n, where n can be any number of frames before t.
• ReLU: A Rectified Linear Unit is an activation function that outputs 0 if the input is less than 0, and the raw input otherwise; that is, if the input is greater than 0, the output equals the input. The operation of ReLU is closer to the way our biological neurons work, and ReLU is non-linear.
• Dropout Layer: A dropout layer with a rate of 0.4 is used to avoid overfitting in the model. It helps the model generalize by randomly setting the output of a given neuron to 0. With outputs set to 0, the cost function becomes more sensitive to neighbouring neurons, changing the way the weights are updated during backpropagation.
• Train/Test Split: The dataset is split into train and test sets with a ratio of 70% train videos (4,200) to 30% test videos (1,800). The split is balanced, i.e., 50% real and 50% fake videos in each split. Refer to figure 7.6.
• Data Loader: Used to load the videos and their labels with a batch size of 4.
• Training: The training is done for 20 epochs with a learning rate of 1e-5 (0.00001) and a weight decay of 1e-3 (0.001), using the Adam optimizer.
• Cross Entropy: The cross-entropy approach is used to calculate the loss function because we are training a classification problem.
• Export Model: After the model is trained, we export it so that it can be used for prediction on real-time data.
• A new video for prediction is preprocessed (refer to 8.3.2 and 7.2.2) and passed to the loaded model for prediction.
• The trained model performs the prediction and returns whether the video is real or fake, along with the confidence of the prediction.
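One common way to derive such a confidence value is a softmax over the model's two output logits; this is an illustrative sketch (the function and label names are assumptions, and softmax confidence is one plausible choice, not necessarily the project's exact method):

```python
import math

def prediction_with_confidence(logits, labels=("REAL", "FAKE")):
    """Turn a pair of class logits into a label plus a softmax
    confidence percentage."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]     # softmax probabilities
    idx = probs.index(max(probs))         # winning class
    return labels[idx], round(100 * probs[idx], 2)
```

For example, logits of (0.0, 2.0) yield the label "FAKE" with roughly 88% confidence.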
Chapter 9
Software Testing
1. Unit Testing
2. Integration Testing
3. System Testing
4. Interface Testing
Non-functional Testing
1. Performance Testing
2. Load Testing
3. Compatibility Testing
No. | Test Case | Expected Output | Actual Output | Result
5 | Upload a deepfake video | Fake | Fake | Pass
6 | Enter /predict in URL | Redirect to /upload | Redirect to /upload | Pass
7 | Press upload button without selecting video | Alert message: Please select video | Alert message: Please select video | Pass
8 | Upload a real video | Real | Real | Pass
9 | Upload a face-cropped real video | Real | Real | Pass
10 | Upload a face-cropped fake video | Fake | Fake | Pass
Chapter 10
10.2 Outputs
Model Name | Dataset | No. of Videos | Frames | Accuracy
…_20_frames_FF_data | FaceForensic++ | … | 20 | …
model_95_acc_40_frames_FF_data | FaceForensic++ | 2000 | 40 | 95.22613
model_97_acc_60_frames_FF_data | FaceForensic++ | 2000 | 60 | 97.48743
model_97_acc_80_frames_FF_data | FaceForensic++ | 2000 | 80 | 97.73366
model_97_acc_100_frames_FF_data | FaceForensic++ | 2000 | 100 | 97.76180
model_93_acc_100_frames_celeb_FF_data | Celeb-DF + FaceForensic++ | 3000 | 100 | 93.97781
model_87_acc_20_frames_final_data | Our Dataset | 6000 | 20 | 87.79160
model_84_acc_10_frames_final_data | Our Dataset | 6000 | 10 | 84.21461
model_89_acc_40_frames_final_data | Our Dataset | 6000 | 40 | 89.34681
Chapter 11
11.1 Deployment
Following are the steps to be followed for the deployment of the application.
Note: As it is a private repository, only authorized users will be able to see the code and perform the deployment.
11.2 Maintenance
Following are the steps for updating the code to the latest version of the application.
2. git pull
Note: As it is a private repository, only authorized users will be able to see the code and perform the deployment.
5. Copy all the trained models into the models folder (optional: do this if any new models are added).
Chapter 12
12.1 Conclusion
We presented a neural network-based approach to classify a video as deepfake or real, along with the confidence of the proposed model. Our method is capable of predicting the output by processing 1 second of video (10 frames per second) with good accuracy. We implemented the model using a pre-trained ResNext CNN to extract frame-level features and an LSTM for temporal sequence processing to spot the changes between frame t and frame t-1. Our model can process the video in frame sequences of 10, 20, 40, 60, 80, or 100.
• The web-based platform can be upscaled to a browser plugin for ease of access for the user.
• Currently only face deepfakes are detected by the algorithm, but it can be enhanced to detect full-body deepfakes.
Appendix A
References
[1] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images," arXiv:1901.08971.
[18] D. Güera and E. J. Delp, "Deepfake Video Detection Using Recurrent Neural Networks," 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 2018, pp. 1-6.
[19] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, June 2008, Anchorage, AK.
[20] U. A. Ciftci, I. Demir, and L. Yin, "Detection of Synthetic Portrait Videos using Biological Signals," arXiv:1901.02212v2.
[21] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, Dec. 2014.
[22] ResNext model: https://pytorch.org/hub/pytorch_vision_resnext/ (accessed 06 April 2020).
[23] COCOMO model: https://www.geeksforgeeks.org/software-engineering-cocomo-model/ (accessed 15 April 2020).
[24] "Deepfake Video Detection using Neural Networks," http://www.ijsrd.com/articles/IJSRDV8I10860.pdf.
[25] International Journal for Scientific Research and Development, http://ijsrd.com/.
Appendix B
Project Planner
Appendix C
4. Publication Details: