DOI: https://doi.org/10.1109/ICPR56361.2022.9956272
PRNU-Net: a Deep Learning Approach for Source
Camera Model Identification based on Videos Taken
with Smartphone
Younes Akbari∗, Noor Almaadeed∗, Somaya Al-maadeed∗, Fouad Khelifi†, and Ahmed Bouridane‡
∗Department of Computer Science and Engineering, Qatar University, Doha, Qatar
†Department of Computer and Information Sciences, Northumbria University, UK
‡Center for Data Analytics and Cybernetics, University of Sharjah, Sharjah, UAE
Abstract—Recent advances in digital imaging mean that every smartphone has a video camera that can record high-quality video freely and without restrictions. In addition, rapidly developing Internet technology has contributed significantly to the widespread distribution of digital video via web-based multimedia systems and mobile applications such as YouTube, Facebook, Twitter, and WhatsApp. However, as the recording and distribution of digital video have become so affordable, security issues have become threatening and have spread worldwide. One of these security issues is the identification of the source camera of a video. Generally, two common categories of methods are used in this area, namely Photo Response Non-Uniformity (PRNU) and machine learning approaches. To exploit the power of both approaches, this work adds a new PRNU-based layer to a convolutional neural network (CNN), called PRNU-Net. To explore the new layer, the main structure of the CNN is based on MISLnet, which has been used in several studies to identify the source camera. The experimental results show that PRNU-Net is more successful than MISLnet and that the PRNU extracted by the layer from low-level features, namely edges or textures, is more useful than PRNU extracted from high- and mid-level features, namely parts and objects, in classifying source camera models. On average, the network improves the results on a new database by about 4%.
I. INTRODUCTION

In the last two decades, cell phone technology has evolved significantly due to its economic advantages, functionality and accessibility [1]. The ability to create digital audiovisual content without constraints such as time, objects, or locations is a clear advantage of the technology [2]. For forensic investigations and crime prosecutions, smartphone devices provide important information in crucial ways [1], [3]. In areas such as medicine, law, and surveillance systems, where images and videos are examined for authenticity, these types of investigations have potential significance. Lossy video compression complicates the forensic analysis of videos much more than the analysis of images, since the relevant traces can be erased or significantly damaged by high compression rates, making it difficult or impossible to recover the entire processing history. While numerous forensic methods have been developed for digital images [4], [5], [6], [7], [8], [9], the forensic analysis of videos has been less explored. It should also be noted that methods based on images cannot be applied directly to videos [10], [11], [12]. This is due to challenges such as compression, stabilization, scaling, and cropping, as well as the differences between frame types that occur when producing a video. By analyzing the video produced by digital cameras, video identification algorithms can identify and distinguish camera types. During the last few years, forensic specialists have been particularly interested in this topic. In general, images and videos can be identified in two ways: by extracting a unique fingerprint from the images or videos, or by examining the metadata associated with them (the DNA of the video). Lopez et al. [13] demonstrated that the internal elements and metadata of a video can be used for source video identification. Since metadata can be removed from an image or video, identification based on the fingerprint is the more reliable method. Moreover, two scenarios are considered for identifying the camera: individual source camera identification (ISCI) and source camera model identification (SCMI). ISCI distinguishes cameras from both the same and different camera models, while SCMI is a subset of ISCI that distinguishes a particular camera model from others but cannot distinguish between individual devices of the same camera model. In this paper, we focus on the SCMI scenario.

Two common categories of methods are used in the field, namely Photo Response Non-Uniformity (PRNU) [14], [15], [16], [17], [18] and machine learning approaches. PRNU, which is understood to be the unique fingerprint of the camera, is often referred to as residual noise or sensor pattern noise (SPN). PRNU is generated when the CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor processes the input signal (light) and converts it into a digital signal. The output of these methods can be considered low-level features. In deep learning methods, which are a popular category of machine learning, a training step is performed to extract the fingerprint of the video captured by the camera. The main challenge for these methods is the separation of content from noise. This challenge can be addressed by introducing methods and algorithms, for example by adding new layers and loss functions. The architecture introduced by the Multimedia and Information Security Lab (MISL) [19] is one such architecture. The MISLnet network is based on a so-called constrained convolutional layer. A constrained convolutional layer is added at the beginning of a CNN that is to perform forensic tasks, as shown in Figure 1 (a). As a result of the layer, low-level features are extracted that suppress the image content. To design the layer, the first-layer convolutional filters are enforced to satisfy the following constraints:

$$
\begin{cases}
\omega_{kj}^{(1)}(0,0) = -1\\
\sum_{m,n \neq 0} \omega_{kj}^{(1)}(m,n) = 1
\end{cases}
\qquad (1)
$$

where j = {1, 2, 3}. Moreover, ω_kj^(1) denotes the jth kernel of the kth filter in the first layer of the network. Despite the promising results of this method [20], [21], given the sensitivity of the application, further improvement is always essential.
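To make constraint (1) concrete, the following is a minimal NumPy sketch of how the first-layer kernels can be re-projected onto the constraint after each weight update; the function name, array shapes, and the re-projection schedule are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def constrain_first_layer_kernels(w):
    """Project kernels onto constraint (1): centre weight = -1,
    remaining weights rescaled so that they sum to 1.

    w: array of shape (num_filters, num_channels, k, k).
    """
    w = w.copy()
    c = w.shape[-1] // 2                 # index of the (0, 0) centre tap
    for f in range(w.shape[0]):
        for j in range(w.shape[1]):
            kernel = w[f, j]
            kernel[c, c] = 0.0           # exclude the centre from the sum
            s = kernel.sum()
            if s != 0.0:
                kernel /= s              # off-centre weights now sum to 1
            kernel[c, c] = -1.0          # centre weight fixed to -1
    return w
```

Applied after every optimiser step, such a projection keeps the first layer acting as a prediction-error filter, which is the behaviour that MISLnet exploits.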
To add the benefits of PRNU approaches to CNNs, this paper presents a new PRNU-based layer that can improve the results of deep learning architectures in this application. The PRNU-based layer, which can be inserted into CNNs, adds an advantageous attribute to the network by taking into account the fingerprint information extracted from the frames. The layer can pass the extracted fingerprint (low-level features) from each layer to the next layers. This means that the fingerprint can be extracted from layers carrying high-, mid- or low-level features. An overview of the structure is given in Figure 1 (b). In this structure, the new layer can be placed at any point in the network and repeated several times, like a convolutional layer. The goal of evaluating the new layer at different locations in the network is to clarify the effect of the PRNU extracted from high-, mid-, and low-level features during learning. Forward propagation and backpropagation are based on the PRNU method and on the derivatives of the loss with respect to the input data of the layer, respectively. Two scenarios were examined, relating to the location and the number of repetitions of the layer.

To evaluate the approaches, the frames need to be extracted. Generally, a video consists of intra-coded pictures (I-frames), predictive coded pictures (P-frames), and bi-predictive coded pictures (B-frames), with I-frames showing promising results in previous work [12], [22]. In our work, I-frames are extracted from the Qatar University Forensic Video Database (QUFVD), which was created as part of this investigation. The database includes 6000 videos from 20 modern smartphones representing five brands; each brand has two models, and each model is represented by two identical smartphone devices. The experiments show that the new layer can improve the results of the CNN without it (MISLnet [19]). It is worth noting that, like all deep learning methods used to identify source cameras, this study deals with videos at the frame level instead of considering the video in a feature space representation.
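As a practical note, I-frames can be dumped directly with ffmpeg's select filter; the sketch below shows that step, assuming ffmpeg is installed and using placeholder file names. This is an illustration, not the exact extraction pipeline used to build QUFVD.

```python
import subprocess

def extract_iframes(video_path, out_pattern="iframe_%04d.png"):
    """Extract only intra-coded (I) frames from a video with ffmpeg.

    The select filter keeps frames whose picture type is I, and
    -vsync vfr writes one output image per selected frame.
    """
    cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", "select='eq(pict_type,I)'",
        "-vsync", "vfr",
        out_pattern,
    ]
    subprocess.run(cmd, check=True)
```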
The paper is organized as follows. Section II reviews available deep learning methods for source camera identification from videos. Our new approach is then presented in Section III. Section IV describes the database used to identify source cameras from videos. Section V discusses the evaluation of the proposed approach, while the last section concludes this work.
II. LITERATURE REVIEW

Source camera identification from videos can be classified into two categories: PRNU and deep learning methods. Since we focus on deep learning in this paper, we address these methods in this section.
In [23], a CNN-based sensor pattern noise (SPN) method, called SPN-CNN, was presented. The authors built the architecture on the idea that a CNN has the ability to extract noise-like signals from a set of images [24]; the network was therefore trained to obtain a noise pattern. The method was tested on the VISION database [25], and the experimental results showed that it outperforms the wavelet denoiser. The authors also showed that, when I-frames were used to feed the CNN, the results were further improved.
References [20] and [21] proposed a deep learning method (the MISLnet architecture) for source camera identification using video frames to train the network. They extended a version of the constrained convolutional layer introduced in [19], as mentioned in Section I. Moreover, a majority vote over the frames fed into the network was used to make the decision at the video level. The constrained convolutional layer was added as the first layer and used three kernels of size 5. This layer is constructed in such a way that relationships between adjacent pixels are modelled independently of the content of the scene. The methods were tested on the VISION database [25]. The experiments showed that the layer can improve results compared with deep learning architectures without it. The key differences between the two methods relate to the image size and the colour mode: [20] and [21] used RGB and grayscale modes, respectively, and the former used patches of size 480 while the latter fed patches of size 256.
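The video-level decision rule described above is a simple majority vote over frame-level predictions; a minimal sketch follows (function and variable names are illustrative).

```python
from collections import Counter

def video_level_label(frame_predictions):
    """Majority vote over the per-frame class predictions of one video."""
    return Counter(frame_predictions).most_common(1)[0][0]

# Example: five I-frames of one video, three of which are attributed to class 2.
print(video_level_label([2, 2, 0, 2, 1]))  # -> 2
```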
Mayer et al. [26] used the CNN proposed in [19], like the two previous studies, to extract features, together with a similarity network to verify the source camera. The similarity network maps two input deep feature vectors to a 2D similarity vector. To achieve this, the authors follow the design of the similarity network developed in [27]. To obtain a decision at the video level, a fusion approach based on the mean of the unactivated outputs of the similarity network was presented. This method was tested on the SOCRatES dataset [28]. The experiments showed that the method improves on traditional methods such as [29].

The structure of the CNN used in the three studies is shown in Figure 1 (a). As shown in the figure, a constrained convolutional layer is added to a simple CNN.
III. PRNU-NET

Figure 1 (b) shows the structure of the network used in this study. As can be seen in the figure, only the constrained layer proposed by [19] (the MISLnet architecture) is removed from our structure; the rest of our structure is the same as MISLnet. A new PRNU-based layer has been designed to replace the constrained convolutional layer, and it can be placed elsewhere in the network. The layer can extract PRNU from the raw images (input layer) and from the feature maps of each convolutional layer. To design our new layer, forward propagation and backward propagation are considered, which are explained in the following two subsections. Since PRNU is extracted from grayscale
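As an illustration of the forward step of such a layer, the sketch below computes a denoising residual of a grayscale frame or of a single feature map. The choice of a Gaussian denoiser here is an assumption made for brevity and is not necessarily the filter used inside PRNU-Net.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def prnu_residual(x, sigma=1.0):
    """Noise residual (PRNU-style fingerprint estimate) of a 2-D array.

    x: grayscale frame or a single feature map.
    The residual (input minus its denoised version) suppresses scene
    content and keeps the low-level signal that the PRNU layer forwards.
    """
    x = x.astype(np.float32)
    return x - gaussian_filter(x, sigma=sigma)
```

In a trainable setting, the backward pass would propagate the loss gradient through this residual computation, consistent with the derivative-of-the-loss formulation described above.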
[Figure 1 appears here: panels (a) and (b) show the layer stacks (convolutional blocks with Batch Normalization, TanH and Max Pooling, followed by fully connected layers and SoftMax; panel (b) additionally contains the PRNU Layer).]
Fig. 1. Overview of (a) The CNN (MISLnet) presented in [19] with a constrained layer in the first layer of the network and (b) PRNU network structure with
a layer based on PRNU (the dotted rectangle shows that different locations of the layer can be considered).
Brand Model Resolution Number of videos Number of I-frames Length in Secs Operating system
Samsung Galaxy A50 (device #1) 1080 × 1920 300 3654 11-15 Android 10.0
Samsung Galaxy A50 (device #2) 1080 × 1920 300 3782 11-15 Android 10.0
Samsung Note9 (device #1) 1080 × 1920 300 3956 11-15 Android 10.0
Samsung Note9 (device #2) 1080 × 1920 300 3962 12-15 Android 10.0
Huawei Y7 (device #1) 720 × 1280 300 3630 11-15 Android 9.0
Huawei Y7 (device #2) 720 × 1280 300 3642 11-15 Android 9.0
Huawei Y9 (device #1) 720 × 1280 300 4146 11-14 Android 9.0
Huawei Y9 (device #2) 720 × 1280 300 4011 11-15 Android 9.0
iPhone 8 Plus (device #1) 1080 × 1920 300 3991 11-15 iOS 13
iPhone 8 Plus (device #2) 1080 × 1920 300 4080 11-14 iOS 13
iPhone XS Max (device #1) 1080 × 1920 300 3893 11-15 iOS 13
iPhone XS Max (device #2) 1080 × 1920 300 4074 11-15 iOS 13
Nokia 5.4 (device #1) 1080 × 1920 300 3350 11-13 Android 10.0
Nokia 5.4 (device #2) 1080 × 1920 300 3531 11-14 Android 10.0
Nokia 7.1 (device #1) 1080 × 1920 300 3904 11-13 Android 10.0
Nokia 7.1 (device #2) 1080 × 1920 300 3819 11-14 Android 10.0
Xiaomi Redmi Note8 (device #1) 1080 × 1920 300 3776 11-14 Android 11.0
Xiaomi Redmi Note8 (device #2) 1080 × 1920 300 3598 11 Android 11.0
Xiaomi Redmi Note9 Pro (device #1) 1080 × 1920 300 3888 11-15 Android 11.0
Xiaomi Redmi Note9 Pro (device #2) 1080 × 1920 300 3838 11-13 Android 11.0
TABLE II
The results of the frame and video levels in terms of accuracy (%) for the SCMI scenario for each smartphone model.
TABLE III
Impact of place and repetition of the PRNU layer in the network (l)
Place (l)       l=1    l=2    l=3    l=4    l=5    l={1,2}    l={1,2,3}    l={1,2,3,4}    l={1,2,3,4,5}
Accuracy (%)    77.4   77.6   74.1   73.8   73.0   77.4       73.9         72.5           72.3
On this premise, Figure 4 provides a more comprehensive picture of camera identification performance, comparing the quality of PRNU-Net with that of MISLnet by presenting the Receiver Operating Characteristic (ROC) curves for a selected group of ten classes (smartphone models) from our database. Two values are calculated for each threshold: the True Positive Ratio (TPR) and the False Positive Ratio (FPR). The TPR of a given class, e.g. Huawei Y7, is the number of outputs whose actual and predicted class is Huawei Y7 divided by the number of outputs whose actual class is Huawei Y7. The FPR is calculated by dividing the number of outputs whose actual class is not Huawei Y7 but whose predicted class is Huawei Y7 by the number of outputs whose actual class is not Huawei Y7.
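For concreteness, the following is a small sketch of how such per-class TPR/FPR points could be computed from frame-level class scores by sweeping a threshold; the variable names and the use of per-class scores are assumptions, not a description of the authors' evaluation code.

```python
import numpy as np

def roc_points(scores, labels, positive_class, thresholds):
    """Per-class ROC points from frame-level scores.

    scores: score assigned to positive_class for each frame.
    labels: true class of each frame.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    pos = labels == positive_class
    neg = ~pos
    points = []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr = (predicted_pos & pos).sum() / max(pos.sum(), 1)  # TP / actual positives
        fpr = (predicted_pos & neg).sum() / max(neg.sum(), 1)  # FP / actual negatives
        points.append((fpr, tpr))
    return points
```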
The impact of the place and repetition of the PRNU layer in the network (l) is explored in Table III, which indicates whether the layer is more suitable after layers carrying high-, mid-, or low-level features.

A. Result discussion

Recently, deep learning methods have been introduced to solve source camera identification. These methods can help to improve the results obtained with traditional methods such as the PRNU methods. Overall, the results obtained at the frame and video levels suggest that PRNU-Net is more successful than MISLnet for the SCMI problem for all device models. For both methods, when the results are reported at the video level, an improvement can be clearly observed. In addition, the results of PRNU-Net and MISLnet with the constrained layer, when compared against MISLnet without the constrained layer, clearly show that the separation of content and noise is useful for source camera identification. The results are discussed in more detail below. As can be seen in Table II, at the frame level a few devices are hard to identify, such as the Y7, 8 Plus, and Redmi Note9 Pro, and this requires further analysis to find the reason for the differences, which could be the resolution of the videos or the imaging technology used by the devices. However, judging from the other results, resolution does not seem to be the reason, since the Y7 and Y9 have the lowest resolution but their identification results are not worse. The biggest improvement is for the Redmi Note9 Pro, about 9% compared to MISLnet. At the video level, an overall improvement is achieved for all devices. The best result, 95%, is obtained for the Note9 by both methods. The best improvement at the video level is also for the Redmi Note9 Pro, about 12% compared to MISLnet. Figure 4 shows the TPR against the FPR for the two methods at different frame-level thresholds. As can be seen from the figure, with PRNU-Net all models achieve a larger Area Under the Curve (AUC) value than with MISLnet. Analysing the devices in terms of TPR and FPR, as shown in the figure, the best performance is obtained by the Nokia 5.4, with AUC = 0.991 for PRNU-Net (Figure 4 (a)) compared to AUC = 0.989 for MISLnet (Figure 4 (b)). Table III shows that the best results at the frame level are obtained when the layer position is l = 2 and, among the repeated settings, l = {1, 2}. When the repetition is set for all layers, a drop in performance is observed, showing that the extracted fingerprint can be affected by the convolutional layers. This also shows that the layer gives better results when placed after layers with low-level features. This may be because PRNU extracts low-level features, and these features may be more accurate if the input is low level.

Fig. 4. True and false positive rates (ROC) obtained in the SCMI scenario: (a) 10 classes with PRNU-Net, (b) 10 classes with MISLnet.
VI. CONCLUSION

This paper has presented a new layer based on PRNU extracted from videos taken with a smartphone to identify the camera source. In general, PRNU methods extract low-level features from frames, and we have studied this feature extraction using a deep learning approach. For the new layer, forward propagation and backpropagation are defined based on the extracted PRNU and the derivative of the loss with respect to the input data, respectively. The method is evaluated on a database containing five popular smartphone brands, with two models per brand and two devices per model, 6000 original videos, and 76,531 I-frames. The results show that the approach achieves promising results compared to MISLnet, one of the most popular deep learning methods in the field. The best results are obtained when the layer is located after low-level inputs. However, it is clear that further improvement of the results is essential in future work.

To improve the results, especially when the layer is repeated, defining new learnable parameters can help to reduce the impact of the convolutional layers. The parallel use of other PRNU methods and filters can be considered as a bank of operators to be used instead of convolutional layers. It is also possible to add the layer to other deep learning architectures. It would be a worthwhile endeavor for the future to change the architecture so that videos are seen as a sequence of frames rather than focusing on single frames. Finally, the PRNU network should be tested in other scenarios, such as ISCI, to obtain a more accurate analysis.

ACKNOWLEDGMENT

This publication was made possible by NPRP grant # NPRP12S-0312-190332 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

REFERENCES
[1] H. Tian, Y. Xiao, G. Cao, Y. Zhang, Z. Xu, and Y. Zhao, "Daxing smartphone identification dataset," IEEE Access, vol. 7, pp. 101046-101053, 2019.
[2] S. Milani, M. Fontani, P. Bestagini, M. Barni, A. Piva, M. Tagliasacchi, and S. Tubaro, "An overview on video forensics," APSIPA Transactions on Signal and Information Processing, vol. 1, 2012.
[3] Y. Akbari, S. Al-maadeed, O. Elharrouss, F. Khelifi, A. Lawgaly, and A. Bouridane, "Digital forensic analysis for source video identification: A survey," Forensic Science International: Digital Investigation, vol. 41, p. 301390, 2022.
[4] J. Lukas, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205-214, 2006.
[5] M. Chen, J. Fridrich, M. Goljan, and J. Lukás, "Determining image origin and integrity using sensor noise," IEEE Transactions on Information Forensics and Security, vol. 3, no. 1, pp. 74-90, 2008.
[6] A. Lawgaly and F. Khelifi, "Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification," IEEE Transactions on Information Forensics and Security, vol. 12, no. 2, pp. 392-404, 2016.
[7] A. Lawgaly, F. Khelifi, and A. Bouridane, "Weighted averaging-based sensor pattern noise estimation for source camera identification," in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 5357-5361.
[8] X. Kang, Y. Li, Z. Qu, and J. Huang, "Enhancing source camera identification performance with a camera reference phase sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 393-402, 2011.
[9] F. Ahmed, F. Khelifi, A. Lawgaly, and A. Bouridane, "Comparative analysis of a deep convolutional neural network for source camera identification," in 2019 IEEE 12th International Conference on Global Security, Safety and Sustainability (ICGS3). IEEE, 2019, pp. 1-6.
[10] M. Iuliani, M. Fontani, D. Shullani, and A. Piva, "Hybrid reference-based video source identification," Sensors, vol. 19, no. 3, p. 649, 2019.
[11] S. Mandelli, P. Bestagini, L. Verdoliva, and S. Tubaro, "Facing device attribution problem for stabilized video sequences," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 14-27, 2019.
[12] E. Altinisik and H. T. Sencar, "Source camera verification for strongly stabilized videos," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 643-657, 2020.
[13] R. R. López, E. A. Luengo, A. L. S. Orozco, and L. J. G. Villalba, "Digital video source identification based on container's structure analysis," IEEE Access, vol. 8, pp. 36363-36375, 2020.
[14] S. McCloskey, "Confidence weighting for sensor fingerprinting," in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2008, pp. 1-6.
[15] W.-H. Chuang, H. Su, and M. Wu, "Exploring compression effects for improved source camera identification using strongly compressed video," in 2011 18th IEEE International Conference on Image Processing. IEEE, 2011, pp. 1953-1956.
[16] M. Goljan, M. Chen, P. Comesaña, and J. Fridrich, "Effect of compression on sensor-fingerprint based camera identification," Electronic Imaging, vol. 2016, no. 8, pp. 1-10, 2016.
[17] A. Mahalanobis, B. V. Kumar, and D. Casasent, "Minimum average correlation energy filters," Applied Optics, vol. 26, no. 17, pp. 3633-3640, 1987.
[18] L. J. G. Villalba, A. L. S. Orozco, R. R. López, and J. H. Castro, "Identification of smartphone brand and model via forensic video analysis," Expert Systems with Applications, vol. 55, pp. 59-69, 2016.
[19] B. Bayar and M. C. Stamm, "Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection," IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2691-2706, 2018.
[20] D. Timmerman, S. Bennabhaktula, E. Alegre, and G. Azzopardi, "Video camera identification from sensor pattern noise with a constrained convnet," arXiv preprint arXiv:2012.06277, 2020.
[21] B. Hosler, O. Mayer, B. Bayar, X. Zhao, C. Chen, J. A. Shackleford, and M. C. Stamm, "A video camera model identification system using deep learning and fusion," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 8271-8275.
[22] B. C. Hosler, X. Zhao, O. Mayer, C. Chen, J. A. Shackleford, and M. C. Stamm, "The video authentication and camera identification database: A new database for video forensics," IEEE Access, vol. 7, pp. 76937-76948, 2019.
[23] M. Kirchner and C. Johnson, "SPN-CNN: Boosting sensor-based source camera attribution with deep learning," in 2019 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 2019, pp. 1-6.
[24] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142-3155, 2017.
[25] D. Shullani, M. Fontani, M. Iuliani, O. Al Shaya, and A. Piva, "VISION: a video and image dataset for source identification," EURASIP Journal on Information Security, vol. 2017, no. 1, pp. 1-16, 2017.
[26] O. Mayer, B. Hosler, and M. C. Stamm, "Open set video camera model verification," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 2962-2966.
[27] O. Mayer and M. C. Stamm, "Forensic similarity for digital images," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1331-1346, 2019.
[28] C. Galdi, F. Hartung, and J.-L. Dugelay, "SOCRatES: A database of realistic data for source camera recognition on smartphones," in ICPRAM, 2019, pp. 648-655.
[29] M. Goljan, J. Fridrich, and T. Filler, "Large scale test of sensor fingerprint camera identification," in Media Forensics and Security, vol. 7254. International Society for Optics and Photonics, 2009, p. 72540I.
[30] Y. Akbari, S. Al-Maadeed, N. Al-Maadeed, A. Al-Ali, F. Khelifi, A. Lawgaly et al., "A new forensic video database for source smartphone identification: Description and analysis," IEEE Access, vol. 10, pp. 20080-20091, 2022.