
2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS) | December 03-05, 2020 | Trivandrum

Deepfake Detection in Media Files - Audios, Images and Videos

Bismi Fathima Nasar, Dept. of Computer Science and Engineering, ER & DCI Institute of Technology, Thiruvananthapuram, Kerala, India - [email protected]
Sajini T, Knowledge Resource Center, CDAC, Thiruvananthapuram, Kerala, India - [email protected]
Elizabeth Rose Lason, Dept. of Computer Science and Engineering, ER & DCI Institute of Technology, Thiruvananthapuram, Kerala, India - [email protected]

978-1-7281-9052-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/RAICS51191.2020.9332516

Abstract—Recent advancements in deep learning have been applied to solve various complex problems, ranging from big data analytics to computer vision and human-level control. One among them is deepfake technology, which has become a real threat to privacy, democracy, and national security. Deepfakes are hyper-realistic, digitally manipulated videos that depict people saying and doing things that never actually happened. This technology has been used in many fields: in the film industry for recreating videos without re-shooting, and in awareness video generation, such as creating voices for those who have lost theirs or updating episodes of movies without re-shooting them, at very low cost. It also has many harmful usages on social media, pornographic sites, etc., to defame people, and these largely dominate the positive side of this application of deep learning. Also, the creation and spreading of these videos are increasing rapidly across all kinds of media files. Therefore, it is very important to develop efficient tools that can automatically detect deepfakes in these videos and thus reduce the public harm caused by them. In the early stages of deepfake detection, traditional technologies like signal processing, image processing, lip-syncing, etc., were used, but these provide very little accuracy when compared with the recent technologies of deep learning. So, here a system is proposed that can automatically detect deepfakes in media files such as images, videos, and audio. It uses image processing approaches combined with deep learning to detect the inconsistencies that exist in fake media.

Index Terms—GAN, CNN, LSTM

I. INTRODUCTION

The first known attempt at a "deepfake" dates back to 1865. It can be found in one of the iconic portraits of U.S. President Abraham Lincoln: the lithograph mixes Lincoln's head with the body of the southern politician John Calhoun. After Lincoln's assassination, demand for lithographs of him was so great that engravings of his head on other bodies appeared almost overnight. Much more recently, in just over 18 months, a small subculture on Reddit focused on combining and superimposing images and videos with realistic and believable results exploded onto the national scene. Deepfake technologies are a combination of Artificial Intelligence and Machine Learning that allows the creation of fake videos which are very difficult to differentiate from authentic ones. Deepfake algorithms use many deep learning methods; autoencoders and GANs have been widely used to train on large datasets to generate the training models. These models are used in the testing phase on these videos to examine the facial expressions, the movements of a person, etc. The initial targets of deepfake videos were public figures such as celebrities and politicians, since they have a large number of videos and images available online. These videos can thus be a threat to democracy, since such methods have been used to create fake speeches of political leaders, which can impact election campaigns. Evidence has even been obtained that deepfakes have misled military troops into losing a battle by depicting a fake bridge across a river, although the bridge was not there in reality. There is also a positive side to deepfakes: the method can be used to recreate videos that were lost or to update episodes of movies without re-shooting them. But the negative impact of these videos dominates their positive usage. The various methods used in the creation of deepfakes include face replacement, face re-enactment, face generation using GANs, face generation using the attributes that exist in the source image, and speech synthesis. The process of creating such manipulated images and videos is also much simpler today, as it needs as little as an identity photo or a short video of a target individual; recent technology can even create a deepfake video from still images. Deepfakes are thus not only after celebrities but also ordinary people. The development of various applications like DeepNude, FakeApp, etc. shows further spreading of the threats caused by deepfakes, as they can transfer a person into non-consensual porn. These forms of falsification cause a real threat to privacy, identity, and national security, and affect many human lives. This paper consists of seven sections. Section I is the introduction; Section II covers the recent techniques involved in the creation and detection of deepfake multimedia; Section III covers the various works that have come up based on deepfake technology; Section IV states the proposed system, detailing it with a block diagram; Section V details the various stages in the development of the proposed system; Section VI covers the various challenges that we faced during the development of the proposed system, along with the future scope; and finally Section VII brings the entire paper to a conclusion.
II. TECHNICAL APPROACH IN DEEPFAKES

With the advancement of deep learning techniques, state-of-the-art approaches have also put their footprint on deepfake creation and detection. These approaches mainly include AI-based techniques like CNN architectures, GANs, etc., explained in this section. Also, code snippets are readily available, so it is very easy to generate our own user-friendly software; it requires very little expert knowledge, and the process is less time-consuming. Many studies have been done to bring the results closer to user expectations and to increase their usage by improving the techniques. The basic techniques involved in the state-of-the-art approach used in this work are covered below:

• CNN Network - A CNN, or Convolutional Neural Network, is a deep learning algorithm that takes images of various categories as input and detects the points of difference between the categories, which helps to differentiate them from one another. This technique is deployed in both the creation and the detection of deepfakes. Also, the pre-processing required in a ConvNet is much lower compared to other classification algorithms. A CNN basically consists of two parts. The first is a convolution tool layer, containing convolution, max-pooling, and activation layers, which separates out the various features in the image and uses them for analysis. The second is a fully connected layer, which uses the output of the convolution tool layer to predict the best description for the image. The CNN uses the predictions from the layers to produce a final output that presents a vector of probability scores representing the likelihood of the distinct features of each class. The figure showing an example of the analysis over a CNN is shown below.

Fig. 1. CNN Network
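The convolution and max-pooling operations described above can be sketched in plain Python (an illustrative toy on nested lists, not the paper's implementation; a real system would use a framework such as Keras):

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Sum of element-wise products of the kernel and the image patch.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the strongest response per window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

For example, a [[1, -1]] kernel responds strongly at vertical intensity edges, and pooling then shrinks the feature map while keeping the strongest activations, which is exactly the "split features, then summarize" role of the convolution tool layer.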
• Generative Adversarial Network (GAN) - Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. Deepfake videos are usually created using two competing GAN-based AI systems: one is called the generator and the other is called the discriminator. The generator creates fake videos, and the discriminator is made to distinguish between fake and real video samples. Each time the discriminator accurately identifies a fake video sample, it gives the generator a hint about what not to do when creating the next fake samples. Conversely, as the discriminator gets better at detecting fake video samples, the generator gets better at creating them. Together, the generator and discriminator form what is called a generative adversarial network (GAN). The figure describing the working of the GAN is given below.

Fig. 2. GAN Network
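The alternating generator/discriminator game can be illustrated with a deliberately tiny hand-written sketch (a hypothetical 1-D toy, not the paper's setup): the "generator" is a single scalar theta, the "discriminator" a logistic score, and each takes gradient steps on the adversarial objective in turn.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

REAL = 4.0                     # the "real data" is just the value 4.0
theta, w, c = 0.0, 0.0, 0.0    # generator output; discriminator weight, bias
lr = 0.1

for _ in range(300):
    d_real = sigmoid(w * REAL + c)   # discriminator score on the real sample
    d_fake = sigmoid(w * theta + c)  # discriminator score on the fake sample
    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    w += lr * ((1 - d_real) * REAL - d_fake * theta)
    c += lr * ((1 - d_real) - d_fake)
    # Generator step: ascend log D(fake), i.e. move theta toward
    # whatever region the discriminator currently scores as "real".
    d_fake = sigmoid(w * theta + c)
    theta += lr * (1 - d_fake) * w

# theta should have drifted from 0.0 toward the real value 4.0.
```

The dynamics mirror the text: while the discriminator can tell 0.0 from 4.0 it pushes its weight up, and that same weight is the hint the generator follows; once theta is near the real data the discriminator can no longer separate the two and its gradient collapses.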
III. RELATED WORKS

In this section, we discuss various literature in the deepfake creation and detection domain, which we specified in our survey paper on deepfakes [1]. The paper [2] gives a detailed survey of various passive blind video content authentication techniques, with a main focus on forgery detection, video recapture, and phylogeny detection. Autoencoders had the following disadvantages when compared to the more recently used convolutional networks [3]. The first disadvantage is the lack of temporal awareness, which is the basic source of multiple abnormalities in autoencoders. Next are the inconsistencies with the face encoder, i.e., the encoder is unaware of the skin tone or other background information. The third disadvantage is the visual inconsistency that arises from the use of multiple cameras, different lighting conditions, or simply the use of different video codecs, which makes it tough for the autoencoder to create very accurate and realistic videos under different conditions. Finally, there is the inconsistency in choosing the illuminants between the different backgrounds across frames, which usually leads to blinking in the face region in most deepfake videos. So, to overcome these disadvantages of the autoencoder, other deep learning techniques like the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Generative Adversarial Network (GAN), etc. were developed.

With the advancement of convolutional networks, many other schemes were developed for the creation and detection of deepfakes using Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and even hybrid approaches of all the recent deep learning algorithms. David Guera and Edward J. Delp [3] bring up the first approach, where frame-level features are extracted after each processing step in a convolutional neural network. These features are then fed into a recurrent neural network as training samples, and the output from this RNN is the classification result. Along with the combination of CNN and RNN, a set of encoder-decoders is also used for dimensionality reduction and image compression in the training and generation phases. Another such approach was brought up by Ekraam Sabir in [4], where face manipulation detection using RNN strategies was discussed. They use a combination of variations of RNN models along with domain-specific face pre-processing techniques to obtain state-of-the-art performance on publicly available facial manipulation videos; the experimental evaluation reports a figure of 4.55%. A similar approach was stated in [5] for swapped-face detection using deep learning and subjective assessment. In this paper, a swapped-face detection system was proposed which shows 96% positive results with few false alarms when compared with other existing systems. Along with the detection of face-swapping, this model also evaluates the uncertainty in each prediction, which is very critical in evaluating the performance of a system. To improve this predictability, they set up a website to review the human response over the dataset by collecting pairwise comparisons of images from the videos by humans. Based on these comparisons, images are classified as real or fake. The experimental results show that this proposed model is much better when compared to the existing systems.

When it comes to the most advanced version of this deep neural network, a hybrid approach was implemented in [6]: a two-stream neural network trained with GoogLeNet. In the first stream, tampering artifacts like strong edges near the lips, blurred areas on the forehead, etc., are detected in an image-classification stream, and in the second stream, a patch-based three-layer network is trained for capturing local noise components and camera characteristics; this network is designed to determine whether a patch comes from the same image or not.

Another concept in the hybrid model was pairwise learning [7], where a deep learning-based approach is used to identify manipulated images with a contrastive loss. First, a state-of-the-art GAN is used to generate pairs of fake and real images. Then, these pairs of image samples are fed into a common fake features network (CFFN) to learn the distinguishing features between the fake image and the real image as paired information. In the final stage, a small network is used to combine these features and decide whether an image is fake or real. Experimental results show that the proposed method has high performance when compared to existing state-of-the-art image detection techniques. In the paper [8], a task-oriented GAN for PolSAR image classification and clustering was used, which consists of a Triplet network: along with the generator and the discriminator, there is another network called the task network, or T-net. The network in this proposed system basically has two task networks - one called the classifier and the other a clustering network. The first stage is the learning stage, which has the two competing generator and discriminator networks working hand in hand as in a GAN. In the second phase, the generator network is adjusted and oriented as a task network, where some samples from the training samples are assigned a specific task, that is, the generation of the manipulated data. This takes up the advantage of a GAN network and also overcomes its disadvantage.

Later on, a hybrid combination of an LSTM and an encoder-decoder architecture [9] was developed for image forgery detection. In this system, a high-confidence architecture is used which utilizes re-sampling features to capture artifacts; these artifacts include JPEG quality loss, up-sampling, down-sampling, rotation, etc. Long Short-Term Memory (LSTM) cells and an encoder-decoder network distinguish whether a specific area of the image has been tampered with or not. Here they use a spatial map and the frequency-domain correlation to determine the distinct characteristics of the manipulated and non-manipulated regions by combining the effort of the LSTM and encoder networks. Finally, the decoder network learns how to map from low-resolution feature maps to pixel-wise predictions for tamper detection. Through this work, they also present a dataset that can be used in further research on media forensics. With several experiments conducted on different datasets, they came to the conclusion that their scheme efficiently segmented various types of manipulation, including copy-move, object removal, and splicing.

The active learning approach was also stated in this field of deep learning [10] as a wider advancement to acquire annotations for data from a human reaction by selecting informative samples with a high probability of enhancing performance. This model is implemented to generate labels for the data in a cheaper manner. Here, for each sample, a reward is assigned by the classifier trained with the pre-existing labels, and these rewards can be used to guide a conditional GAN to generate useful and informative samples with a higher probability for a certain label. Finally, with the evaluation of this model, its effectiveness can be estimated, showing that the generated samples are capable of improving the classification performance in popular image classification tasks. Then, certain pre-processed authentic or fake images [11] can be used to train the CNN network during generation, which destroys the unstable low-level noise cues on the manipulated images, and the discriminative network is forced to learn more distinct features to classify the manipulated and real face images. A key difference from other GAN-related methods is that here an image pre-processing step is used in the training stage to destroy the low-level unstable artifacts of GAN images and force the discriminator to focus on more distinct clues, and by doing so they improve the generalization capabilities. But this approach was difficult to implement and produced only preliminary results. Thus, to improve the discriminative capabilities, a face warping artifact detection technique was developed [12]. This method was developed based on the observation that current deepfake algorithms can generate images only at low resolution, and further warping is essential to match the manipulated image with the original one.
So, such a transform leaves an artifact called resolution inconsistency along the outline of the fake one, and these artifacts can be effectively captured using a CNN for detecting the authenticity of the video.

Many other approaches with indirect implementations of deep learning were also developed in parallel to the direct approaches, but show less accuracy when compared to other existing systems. One such approach was proposed in [13], where an automated system is developed that can detect forgery in videos being recorded by a camera and in its audio channel. Here they detect audio-visual inconsistencies through certain artifacts like lip-syncing and dubbing inconsistencies. In the experimental evaluation, the proposed system is evaluated with various classifiers like LSTM, GMM, PCA, etc., but gives a better result only with LSTM, which stands as the drawback of the system. Another such approach for detecting forgery in images and videos was a capsule-forensics approach stated in the article [14], where a capsule network is used to detect anomalies ranging from replay attacks using printed images and videos to computer-generated video produced with deep convolutional neural networks. This experiment shows the feasibility of building a common detection technique that can be used to detect forgery in videos as well as images. Here the capsule network is used in the computer vision domain with random noise samples in the training phase; the main aim of this work was to protect against machine attacks as well as mixed attacks using the random samples. Another approach is eye-blinking detection [15], where the temporal features of the eye and the inconsistency in eye blinking are detected to identify the manipulation in the sample file using an LSTM.

Finally, we provide detail on a rare approach, completely outside the deep learning domain, that can be used for forgery detection: in [16], PRNU (Photo Response Non-Uniformity) analysis is used for detecting deepfake video manipulation. In this approach, the videos are divided into different frames, and the frames corresponding to the face are cropped. Then the mean correlation between the authentic and deepfake ones is calculated to determine which one is fake and which is not. This method was also used to determine the amount of tampering in each deepfake video. PRNU analysis shows a notable difference in mean normalized cross-correlation scores between real and deepfake media. In the early stage of implementation of the detection techniques, not many academic papers were found on the detection of deepfakes, although efforts had been brought up to detect and remove these kinds of videos from websites such as Gfycat [Matsakis, 2018], which attempts to use artificial intelligence and facial recognition software to mark inconsistencies in the facial region of an uploaded video.
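The mean normalized cross-correlation score used in the PRNU approach above can be written out directly (a stdlib sketch over 1-D noise residuals; actual PRNU forensics operates on 2-D sensor-noise arrays extracted from the cropped face frames):

```python
import math

def ncc(a, b):
    """Normalized cross-correlation between two equal-length residuals."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def mean_ncc(reference, frames):
    """Average correlation of each cropped face frame against a reference."""
    return sum(ncc(reference, f) for f in frames) / len(frames)
```

A genuine video's residuals should correlate strongly with the camera's reference pattern, while deepfake frames, whose face region was synthesized by a different process, should score notably lower on this mean.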
IV. PROPOSED SYSTEM

Deepfake technology has advanced a great deal in recent years with the development of neural networks, with DeepMind doing an especially good job of creating realistic human-like voices, videos, or images. In order to reduce the harmful effects caused by these kinds of media files, we propose a system that can detect forgery in media files such as audio, video, and images. Many systems have come up in this domain recently. The major issue with the existing techniques is that they deploy a detection method that can detect forgery in only one of the media types. Most of them use a traditional approach which needs expert knowledge, and the process involved is time-consuming. In this work, we propose a system that uses a combinational approach built on a CNN architecture for image processing. The main objective of this system is to reduce the time involved in processing and to improve its performance when compared to the existing systems. The block diagram corresponding to the proposed system is shown in the figure below.

Fig. 3. Block Diagram of the Proposed System

The system consists of four modules, namely the Data Preparation module, Image Enhancement, CNN Model Generation, and the Testing (or Detection) phase. Initially, the user inputs the media file in the form of audio, video, or images. In Data Preparation, the input media files get converted into images: an image is taken as such, videos are converted into keyframes (keyframes are the complete frames on which the data stream of a video is saved), and audio is converted into a spectrogram image. These image files are then given to an Image Enhancement module, where the image gets enhanced by removing the noise factor, if present, from the image files; this process is called denoising. The enhanced image is given for detection against the models generated in the CNN Model Generation module. The decision based on the testing is given out in the form of a prediction probability over both classes. This prediction probability is calculated based on the comparison between the distinct features extracted from the input media and the features saved in its corresponding model file. Based on this probability, the decision on forgery is displayed over the user interface.
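The module flow above can be sketched as a small dispatcher (the file extensions and return labels are illustrative assumptions; the actual conversions would use OpenCV for keyframes and Matplotlib for spectrograms):

```python
from pathlib import Path

def prepare_input(path):
    """Data Preparation: map each media type to its image representation."""
    ext = Path(path).suffix.lower()
    if ext in {".jpg", ".jpeg", ".png"}:
        return "image"        # images are used as-is
    if ext in {".mp4", ".avi"}:
        return "keyframes"    # videos -> keyframe images
    if ext in {".wav", ".mp3"}:
        return "spectrogram"  # audio -> spectrogram image
    raise ValueError(f"unsupported media type: {ext!r}")

def decide(p_fake, threshold=0.5):
    """Turn the model's prediction probability into the displayed verdict."""
    return "fake" if p_fake >= threshold else "real"
```

Routing everything through a single image representation is what lets one CNN-based detection pipeline serve all three media types.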

The CNN Model Generation module consists of two phases, namely the training and testing phases. In the training phase, the patterns in the classes are learned by the model. Basically, three models are generated in this proposed system: one for images, another for videos, and the final one for audio files. The model for images is generated using a simple CNN network. The input to these network layers consists of a set of fake and real images, and the output file is a Keras model called "model_image.h5", which contains features that can be used to classify between these two classes, fake and real. This model file is further used in the detection process in the case of images. The model for video detection is generated from the model for image forgery detection, but as an improved version using transfer learning; it is given the name "model_video.h5". For audio detection, we have generated another model called "model_audio.h5". This was generated by feeding spectrogram images to the CNN network, which learns the variation in the normal energy or intensity distribution and is further used in the testing phase. In the testing phase, the input sample is tested against the generated models and the decision is made based on the prediction probability. Then, based on this prediction probability, the decision on fake or real is displayed. Display of the metadata corresponding to the input media file is added as an additional feature; this is stored in HTML format. The metadata file is generated using ExifTool, and the location of the corresponding metadata file is also displayed over the interface.

V. RESULTS AND DISCUSSIONS

The main challenge we faced during the development of the proposed system is the lack of datasets for training and testing the CNN network. To address this challenge, an image dataset was generated using a technique called face-swapping. This technique begins by detecting the face region in both the source and target images and then cutting out the face region from both of them. The face from the source is then placed over the empty space in the target image. As a base for the generation of the fake images, we used the face images of 50k celebrities from the CelebA dataset. In the case of audio dataset generation, we used various signal processing and manipulation techniques over real human voice recordings.
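The cut-and-paste face-swapping step can be illustrated on toy "images" stored as nested lists (face localisation itself, e.g. with an OpenCV face detector, is assumed to have already produced the region coordinates):

```python
def swap_face_region(target, source, top, left, height, width):
    """Copy the detected face region of `source` into `target` at the same
    coordinates, leaving the rest of the target image untouched."""
    out = [row[:] for row in target]   # work on a copy, keep the original
    for i in range(top, top + height):
        for j in range(left, left + width):
            out[i][j] = source[i][j]
    return out
```

Each such composite becomes a "fake" training sample, while the untouched source and target images supply the "real" class.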
After the dataset generation, we start our implementation with the data preparation. In this proposed system, we use a basic image classifier in all three cases. So, initially we need to convert all three media types to images for the training and testing in the CNN network for the development of the Keras models. The videos are converted into sequences of keyframes using OpenCV, and the audio is converted into spectrogram images using Matplotlib. These converted image samples then undergo image enhancement with the Librosa package. The next phase is the training and testing phase over a binary categorical CNN network. In the CNN training for images, the CNN network consists of two sequential layers and two dense layers, one for each class of images. The generated dataset is fed to the CNN network to generate the model file named "model_image.h5". The generated model file has 538,508 parameters, of which 524,428 are trainable and the remaining are non-trainable. The model training provided an accuracy ranging from 0.6 to 1.0 and a loss ranging from 0.54 to 0.0. The plots corresponding to accuracy and loss are given in the figures below.

Fig. 4. Accuracy and Loss Corresponding to Image Modelling
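A parameter count such as the 538,508 above is the sum of per-layer counts; for dense and convolution layers these follow simple formulas (a generic sketch — the exact layer shapes of model_image.h5 are not given in the text, so the example numbers are illustrative):

```python
def dense_params(n_in, n_out):
    """A fully connected layer has n_in*n_out weights plus n_out biases."""
    return n_in * n_out + n_out

def conv2d_params(kh, kw, c_in, c_out):
    """A conv layer has kh*kw*c_in weights per filter, plus one bias each."""
    return (kh * kw * c_in + 1) * c_out
```

For instance, a 3x3 convolution over an RGB input with 32 filters contributes (3*3*3 + 1) * 32 = 896 parameters; non-trainable parameters typically come from layers such as batch normalization whose statistics are not updated by the optimizer.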
In the case of video, the CNN training was done by adding more specific samples as training data, which improved the performance of the model generated in image detection. The generated model file was named "model_video.h5". This model file was also tested against different datasets like Deepfake TIMIT, FaceForensics++, the VidTimit dataset, etc., and its performance evaluation is tabulated as shown in the figure below.

Fig. 5. Evaluation Result corresponding to Video Modelling

In the case of audio, the CNN classification is done by analyzing the variation in the normal energy or intensity distribution in the spectrogram. The training starts with the generation of the spectrogram images of 1000+ samples of fake and real audio, which are given to a CNN network containing three convolution layers, three max-pooling layers, and five activation layers. The output from these layers is given to the fully connected layer and is trained with epochs = 15. The characteristics of the red patterns corresponding to the fake and real samples are extracted to generate the "model_audio.h5" file. This training provides an accuracy of 0.8611 and a loss of 0.290 when trained with 100 samples; to increase the accuracy further, the number of samples was increased to more than 1000. The accuracy and loss plot is shown in the figure below.
Fig. 6. Accuracy and Loss Corresponding to Audio Modelling

Finally, there is the detection module, where the real-time detection against different media files taken from various platforms takes place. The accuracy of the model was also evaluated by testing against various datasets like the Deepfake TIMIT dataset, FaceForensics++, etc., and we got an accuracy of almost 0.9 for all three model files.

In the testing phase, with all three trained Keras models, we tested with the generated dataset as well as some available real-world datasets, like the VidTimit dataset, FaceForensics++, etc., and calculated various parameters like the true positives, true negatives, false positives, false negatives, accuracy, precision, and recall. In all three cases, we got an accuracy of nearly 0.9.
[6] Peng Zhou, Xintong Han, Vlad I. Morariu Larry S. Davis,” TwoStream
VI. CHALLENGES AND FUTURE SCOPES Neural Networks for Tampered Face Detection”, IEEE Conference on
Computer Vision and Pattern Recognition, 2019.
The main challenges faced is the lack of availability [7] Chih-Chung Hsu, Yi-Xiu Zhuang, and Chia-Yen Lee,” Deep Fake Image
of Deepfakes with high quality, properly classified and Detection based on Pairwise Learning”, MDPI, Applied Science,2020,
doi:10.3390/app10010370.
the original data. Both these data is required for training [8] Fang Liu, Licheng Jiao, Fellow, IEEE, and Xu Tang, Member”
a supervised model. Available dataset normally contains TaskOriented GAN for PolSAR Image Classification and Clustering”,
either original or the fake. Another important challenge is IEEE Transactions On Neural Networks and Learning Systems, Volume
30, Issue 9, 2019.
the incompatibility of these detection techniques and their [9] Jawadul H. Bappy, Cody Simons, Lakshmanan Nataraj, B.S. Manjunath,
associated packages in the generic systems. Since, the system and Amit K. Roy-Chowdhury,” Hybrid LSTM and EncoderDecoder
that is being used generally require high specification like Architecture for Detection of Image Forgeries”, IEEE Transaction on
Image Processing, Volume: 28 , Issue: 7 ,pp. 1-14, 2019.
high-quality graphics card, interfaces that support machine [10] Xinsheng Xuan, Bo Peng, Wei Wang and Jing Dong,” On the
learning packages, high memory for the training process, etc. Generalization of GAN Image Forensics”, Computer Vision and Pattern
These challenges always pull the research works backward. Recognition, Cornell University, Volume 1, pp. 1-8, 2019.
[11] Yuezun Li, Siwei Lyu,” Exposing DeepFake Videos by Detecting
This system with better dataset and training, it can be used in Face Warping Artifacts”, In Proceedings of the IEEE Xplore Final
various fields like for women safety, Fake news detection, Fake Publication, pp. 46- 52, 2019.
checker over identity of an individuals and on investigation of [12] Pavel Korshunov, S´ebastien Marcel,” Speaker Inconsistency Detection
in Tampered Video”, 26th European Signal Processing Conference
various cyber crimes in future. (EUSIPCO), 2018, ISBN 978-90-827970-1-5.
[13] Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen,”
VII. CONCLUSION CapsuleForensics: Using Capsule Networks to detect Forged Images
and Videos”, ICASSP, pp. 2307 – 2311, 2019
Deepfakes are hyper-realistic video forgeries in which [14] Li, Y., Chang, M. C., and Lyu, S, “Exposing AI created fake videos by
people say or do things that they have never actually said or detecting eye blinking”, In IEEE International Workshop on Information
done which have become a real threat to society. Mere visual Forensics and Security (WIFS) (pp. 1-7). 2018.
[15] Steven Fernandes, Sunny Raj, Rickard Ewetz, Jodh Singh Pannu,
verification is not enough to make a judgment on this kind of Sumit Kumar Jha, Eddy Ortiz, Iustina Vintila, Margaret Salte,”
forgery. Since the visual quality of Deepfakes has become so Detecting deepfake videos using attribution-based confidence metric”,
flawless and most of the current technologies develop really In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops (pp. 308-309), 2020.
high quality ones. Digger’s solution is to develop a toolkit [16] A.M. Rodriguez, Z. Geradts,” Detection of Deepfake Video
that can detect deep fakes in videos, images, and audios Manipulation”, In Proceedings of the 20th Irish Machine Vision
using the state of the art approach. So, here a scheme is and Image Processing conference, Belfast, Northern Ireland, pp.
133-136, 2018, ISBN 978-0-9934207-3-3.
developed to detect the forgery in the Deepfakes by deploying [17] Rohini Sawant and Manoj Sabnis,” A Review of Video Forgery and Its
a combinational approach that uses image processing with Detection”, IOSR Journal of Computer Engineering (IOSR-JCE) eISSN:
the CNN network. The proposed system basically has four 2278-0661, Volume 20, Issue 2, p-ISSN: 2278-8727, 2018.
[18] Thanh Thi Nguyen, Cuong M. Nguyen, Dung Tien Nguyen, Duc Thanh
modules. The first one is the Data preparation module where Nguyen and Saeid Nahavandi,” Deep Learning for Deepfakes Creation
the input samples are converted into image samples. Secondly and Detection”, IEEE, pp. 1-12, 2019.
is the Data Enhancement module where the noise component
from the image is removed. Next is the CNN Network where
the training and testing take place to generate the model and
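The evaluation described above can be made concrete with a minimal sketch that derives accuracy, precision, and recall from the four confusion-matrix counts. The label vectors below are hypothetical examples, not the paper's test results.

```python
# Derive accuracy, precision and recall from binary predictions
# (1 = fake, 0 = real). Labels below are purely illustrative.

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

if __name__ == "__main__":
    # Hypothetical labels for 10 test clips.
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(accuracy(tp, tn, fp, fn))   # 0.8 for these hypothetical labels
```

The same counts give precision and recall directly, which is how the per-model figures reported in the testing phase would be computed.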
