Fig. 3. Detailed design of the primary capsule. Upper numbers indicate the number of filters (depth), while lower numbers indicate the sizes of the outputs of the corresponding filters. (Diagram: convolution → batch normalization → ReLU → statistical pooling; filter depths 64, 16, 2, 8, 1; output sizes 64 × 64, 64 × 64, 16, 8, 8.)

Fig. 4. Average results calculated by the primary capsules and the output capsules from real and fake images generated with the Face2Face method [1]. The three primary capsules (cap. 1, cap. 2, cap. 3) have significantly different reactions to real and fake inputs. Although their weights are also different, there is strong agreement in the output capsules.
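As one concrete reading of the statistical pooling step shown in Fig. 3, the sketch below reduces each feature map to per-channel statistics in PyTorch. The choice of mean and variance and the tensor shapes are assumptions drawn from the figure labels, not the authors' released code.

```python
import torch
import torch.nn as nn

class StatisticalPooling(nn.Module):
    """Reduce each feature map to per-channel statistics (here: mean and
    variance) -- a sketch of the 'statistical pooling' box in Fig. 3;
    the exact statistics and shapes are assumptions, not the paper's code."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) -> (batch, channels, 2)
        flat = x.flatten(start_dim=2)  # (batch, channels, H*W)
        return torch.stack([flat.mean(dim=2), flat.var(dim=2)], dim=2)
```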
Each output capsule $\mathbf{v}_j$ is computed from its total input $\mathbf{s}_j$ using the squash function [15]:

$$\mathbf{v}_j = \operatorname{squash}(\mathbf{s}_j) = \frac{\|\mathbf{s}_j\|^2}{1 + \|\mathbf{s}_j\|^2}\,\frac{\mathbf{s}_j}{\|\mathbf{s}_j\|}. \quad (1)$$

Unlike Sabour et al.'s work [15], we use the cross-entropy loss function:

$$L = -\bigl(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\bigr), \quad (2)$$

where $y$ is the ground-truth label and $\hat{y}$ is the predicted label calculated using Eq. (3), in which $m$ is the dimension of the output capsules $\mathbf{v}_j$:

$$\hat{y} = \operatorname{softmax}\!\left(\frac{1}{m}\sum_{i=1}^{m}\begin{bmatrix}\mathbf{v}_1^\top \\ \mathbf{v}_2^\top\end{bmatrix}_{:,i}\right). \quad (3)$$

Table 1. Half total error rate (HTER) of state-of-the-art detection methods on the REPLAY-ATTACK dataset [7].

    Method                       HTER (%)
    Chingovska et al. [7]        17.17
    Pereira et al. [8]            8.51
    Kim et al. [17]              12.50
    Yang et al. [18]              2.30
    Menotti et al. [19]           0.75
    Alotaibi et al. [20]         10.00
    Ito et al. [21]               0.43
    Nguyen et al. [12]            0.00
    Capsule-Forensics             0.28
    Capsule-Forensics-Noise       0.00
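To make Eqs. (1)-(3) concrete, here is a minimal PyTorch sketch, assuming two output capsules (real and fake) of dimension m. The tensor shapes and function names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Eq. (1): scale the vector's length into [0, 1) while keeping direction.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def predict(v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    # Eq. (3): stack the two output capsules row-wise (shape (2, m)),
    # average over the m dimensions, then softmax over the two classes.
    stacked = torch.stack([v1, v2])               # (2, m)
    return F.softmax(stacked.mean(dim=1), dim=0)  # (2,) = (p_real, p_fake)

def cross_entropy(y_hat: torch.Tensor, y: float) -> torch.Tensor:
    # Eq. (2): binary cross-entropy; y_hat is taken here as the predicted
    # probability of the positive class (an assumption for illustration).
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))
```

For example, with v1 = squash(s1) and v2 = squash(s2), the quantity predict(v1, v2)[1] would give the fake probability fed into the loss.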
4.2. Face Swapping Detection

We determined the ability of our proposed method to detect face swapping using a deepfake technique on the deepfake dataset proposed by Afchar et al. [13] at both the frame and video levels. As shown in Tables 2 and 3, our proposed method with random noise (Capsule-Forensics-Noise) had the highest accuracy in both cases.

Table 2. Accuracy of face swapping detection at the frame level on the deepfake dataset [13].

    Method                       Accuracy (%)
    Meso-4 [13]                  89.10
    MesoInception-4 [13]         91.70
    Nguyen et al. [12]           92.36
    Capsule-Forensics            94.47
    Capsule-Forensics-Noise      95.93

Table 4. Accuracy of state-of-the-art facial reenactment detection methods at the frame level on the FaceForensics dataset [11] with three levels of compression: no compression (No-C), easy compression (23, Easy-C), and strong compression (40, Hard-C).

    Method                       Accuracy (%)
                                 No-C     Easy-C   Hard-C
    Fridrich & Kodovsky [10]     99.40    75.87    58.16
    Cozzolino et al. [22]        99.60    79.80    55.77
    Bayar & Stamm [24]           99.53    86.10    73.63
    Rahmouni et al. [25]         98.60    88.50    61.50
    Raghavendra et al. [23]      97.70    93.50    82.13
    Zhou et al. [28]             99.93    96.00    86.83
    Rössler et al. [11]          99.93    98.13    87.81
    Meso-4 [13]                  94.60    92.40    83.20
    MesoInception-4 [13]         96.80    93.40    81.30
    Nguyen et al. [12]           98.80    96.10    76.40
    Capsule-Forensics            99.13    97.13    81.20
    Capsule-Forensics-Noise      99.37    96.50    81.00
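Since the video-level results in Table 3 are built from frame-level predictions, a natural question is how those predictions are combined. The sketch below shows one common aggregation, averaging the per-frame fake probabilities over a video; the averaging rule and the function name are assumptions for illustration, not necessarily the aggregation used in the paper.

```python
from typing import Sequence

def video_level_prediction(frame_fake_probs: Sequence[float],
                           threshold: float = 0.5) -> bool:
    """Aggregate per-frame fake probabilities into one video-level decision
    by simple averaging -- an assumed rule, not confirmed by the paper."""
    avg = sum(frame_fake_probs) / len(frame_fake_probs)
    return avg >= threshold  # True => classify the whole video as fake
```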
7. REFERENCES

[1] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner, "Face2Face: Real-time face capture and reenactment of RGB videos," in CVPR. IEEE, 2016.

[2] Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, and Christian Theobalt, "Deep video portraits," in SIGGRAPH. ACM, 2018.

[3] Hadar Averbuch-Elor, Daniel Cohen-Or, Johannes Kopf, and Michael F Cohen, "Bringing portraits to life," ACM TOG, 2017.

[4] Joon Son Chung, Amir Jamaludin, and Andrew Zisserman, "You said that?," arXiv preprint arXiv:1705.02966, 2017.

[5] Supasorn Suwajanakorn, Steven M Seitz, and Ira Kemelmacher-Shlizerman, "Synthesizing Obama: learning lip sync from audio," ACM TOG, 2017.

[6] "Terrifying high-tech porn: Creepy 'deepfake' videos are on the rise," https://www.foxnews.com/tech/terrifying-high-tech-porn-creepy-deepfake-videos-are-on-the-rise, Accessed: 2018-02-17.

[7] Ivana Chingovska, André Anjos, and Sébastien Marcel, "On the effectiveness of local binary patterns in face anti-spoofing," in BIOSIG, 2012.

[8] Tiago de Freitas Pereira, André Anjos, José Mario De Martino, and Sébastien Marcel, "Can face anti-spoofing countermeasures work in a real world scenario?," in ICB. IEEE, 2013.

[9] Yuezun Li, Ming-Ching Chang, Hany Farid, and Siwei Lyu, "In ictu oculi: Exposing AI generated fake face videos by detecting eye blinking," arXiv preprint arXiv:1806.02877, 2018.

[10] Jessica Fridrich and Jan Kodovsky, "Rich models for steganalysis of digital images," IEEE TIFS, 2012.

[11] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner, "FaceForensics: A large-scale video dataset for forgery detection in human faces," arXiv preprint arXiv:1803.09179, 2018.

[12] Huy H Nguyen, Ngoc-Dung T Tieu, Hoang-Quoc Nguyen-Son, Vincent Nozick, Junichi Yamagishi, and Isao Echizen, "Modular convolutional neural network for discriminating between computer-generated images and photographic images," in ARES. ACM, 2018.

[13] Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen, "MesoNet: a compact facial video forgery detection network," in WIFS. IEEE, 2018.

[14] Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang, "Transforming auto-encoders," in ICANN. Springer, 2011.

[15] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton, "Dynamic routing between capsules," in NIPS, 2017.

[16] Geoffrey E Hinton, Sara Sabour, and Nicholas Frosst, "Matrix capsules with EM routing," in ICLRW, 2018.

[17] Wonjun Kim, Sungjoo Suh, and Jae-Joon Han, "Face liveness detection from a single image via diffusion speed model," IEEE TIP, 2015.

[18] Jianwei Yang, Zhen Lei, and Stan Z Li, "Learn convolutional neural network for face anti-spoofing," arXiv preprint arXiv:1408.5601, 2014.

[19] David Menotti, Giovani Chiachia, Allan Pinto, William Robson Schwartz, Helio Pedrini, Alexandre Xavier Falcao, and Anderson Rocha, "Deep representations for iris, face, and fingerprint spoofing detection," IEEE TIFS, 2015.

[20] Aziz Alotaibi and Ausif Mahmood, "Deep face liveness detection based on nonlinear diffusion using convolution neural network," Signal, Image and Video Processing, 2017.

[21] Koichi Ito, Takehisa Okano, and Takafumi Aoki, "Recent advances in biometrics security: A case study of liveness detection in face recognition," in APSIPA ASC. IEEE, 2017.

[22] Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva, "Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection," in IH&MMSEC. ACM, 2017.

[23] R. Raghavendra, Kiran B. Raja, Sushma Venkatesh, and Christoph Busch, "Transferable deep-CNN features for detecting digital and print-scanned morphed face images," in CVPRW. IEEE, 2017.

[24] Belhassen Bayar and Matthew C Stamm, "A deep learning approach to universal image manipulation detection using a new convolutional layer," in IH&MMSEC. ACM, 2016.

[25] Nicolas Rahmouni, Vincent Nozick, Junichi Yamagishi, and Isao Echizen, "Distinguishing computer graphics from natural images using convolution neural networks," in WIFS. IEEE, 2017.

[26] Weize Quan, Kai Wang, Dong-Ming Yan, and Xiaopeng Zhang, "Distinguishing between natural and computer-generated images using convolutional neural networks," IEEE TIFS, 2018.

[27] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.

[28] Peng Zhou, Xintong Han, Vlad I Morariu, and Larry S Davis, "Two-stream neural networks for tampered face detection," in CVPRW. IEEE, 2017.