Autoencoder-convolutional neural network-based embedding and extraction model for image watermarking
Abstract. Watermarking consists of embedding in, and later extracting from, a digital cover a design called a watermark to prove the image's copyright/ownership. In watermarking, the use of deep-learning approaches is extremely beneficial due to their strong learning ability and accurate, superior results. Taking advantage of deep learning, we designed an autoencoder convolutional neural network (CNN)-based watermarking algorithm to maximize robustness while ensuring the invisibility of the watermark. A two-network model, comprising embedding and extraction, is introduced to comprehensively analyze the performance of the algorithm. The embedding network architecture is composed of convolutional autoencoders. Initially, a CNN is used to obtain feature maps from the cover and mark images. Subsequently, the feature maps of the mark and cover are concatenated. In the extraction model, block-level transposed convolution and the rectified linear unit (ReLU) activation are applied to the extracted features of the watermarked and cover images to obtain the hidden mark. Extensive experiments demonstrate that the proposed algorithm has high invisibility and good robustness against several attacks at a low cost. Further, our proposed scheme outperforms other state-of-the-art schemes in terms of robustness with good invisibility. © 2022 SPIE and IS&T
[DOI: 10.1117/1.JEI.32.2.021604]
Keywords: digital image watermarking; deep-learning; convolutional neural network; autoencoders; deep neural networks.
Paper 220607SS received Jun. 16, 2022; accepted for publication Aug. 30, 2022; published
online Sep. 19, 2022.
1 Introduction
Deep-learning technology is constantly evolving and becoming more popular among intelligent
multimedia data processing services. However, the alteration and unauthorized distribution of
multimedia content has become easier, especially regarding images, and the major issue of
copyright violation and ownership conflicts has attracted the attention of numerous research
practitioners.1 Therefore, the protection of the intellectual property rights of these images is
crucial. There are many existing studies on image protection. Many researchers have developed
an efficient watermarking scheme used in versatile applications to resolve the security and
ownership conflict issues of media data.2 The applications include hospitals, surveillance, government, e-commerce, academics, crime prevention, etc. This scheme embeds secret information, called a watermark, that is visually invisible in a media object and identifies its ownership by extracting the mark.1,2 The primary aim of a watermarking scheme is to balance the three features of invisibility, capacity, and robustness, maintaining a good trade-off among them.1
Generally, watermarking methods either alter the pixel values of an image in the spatial domain or alter the transform coefficients in the transformed domain.3 Compared with transformed-domain schemes, spatial-domain schemes lack robustness and flexibility. Therefore, many researchers have proposed digital watermarks in the transformed domain to protect images for various potential applications. A schematic diagram of the traditional image watermarking system in the transformed domain is shown in Fig. 1.
Fig. 1 A traditional image watermarking model; I_C: cover image, I_w: watermark image, M: marked image, M′: received marked image after transmission through a channel, and I_w′: extracted watermark image.
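For contrast with the transform-domain pipeline of Fig. 1, the following is a minimal sketch of the simplest spatial-domain method, least-significant-bit (LSB) substitution, which alters pixel values directly. It only illustrates the paragraph above; it is not the scheme proposed in this paper.

```python
# Minimal spatial-domain watermarking sketch: LSB substitution.
# Illustrative only; not the autoencoder-CNN scheme proposed here.
import numpy as np

def lsb_embed(cover: np.ndarray, mark_bits: np.ndarray) -> np.ndarray:
    """Embed a flat bit array into the least-significant bits of the cover."""
    flat = cover.flatten().astype(np.uint8)
    flat[: mark_bits.size] = (flat[: mark_bits.size] & 0xFE) | mark_bits
    return flat.reshape(cover.shape)

def lsb_extract(marked: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits least-significant bits."""
    return marked.flatten()[:n_bits] & 1

cover = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
bits = np.random.randint(0, 2, 64 * 64).astype(np.uint8)
marked = lsb_embed(cover, bits)
assert np.array_equal(lsb_extract(marked, bits.size), bits)
```

Because such pixel-level changes are destroyed by almost any processing of the marked image, spatial-domain schemes lack robustness, motivating the transform- and learning-based approaches discussed next.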
In watermarking, the use of convolutional neural network (CNN)-based approaches instead of transform-based approaches is extremely beneficial due to their strong learning ability and
more accurate and superior results.4–9 Figure 2 shows a block diagram for our proposed deep-
learning-based watermarking scheme.
In more recent works,6,10 models inspired by complex deep neural network (DNN) architec-
tures have been employed in image watermarking. Although these methods may be able to
provide better visual quality of watermarked images, they are computationally expensive and
require very long training times.
In this paper, we present an autoencoder CNN-based watermarking algorithm to maximize
the robustness while ensuring the invisibility of the watermark. A two network model, including
embedding and extraction, is introduced to comprehensively analyze the performance of the
algorithm. Extensive experiments demonstrate that the proposed algorithm has high invisibility
and good robustness against several attacks. Further, our proposed scheme outperforms other
state-of-the-art schemes in terms of robustness with good invisibility.
The remainder of this paper is organized as follows. The related literature is described in
Sec. 2. The proposed image watermarking scheme is presented in Sec. 3. Results and analysis
are given in Sec. 4. Finally, in Sec. 5, we provide some concluding remarks.
2 Literature Survey
Deep-learning models are capable of finding hidden patterns within the data, making them very
successful in image-based applications. Recent years have witnessed the application of DNNs
in image watermarking, which is capable of obtaining a good visual quality as well as providing
robustness against commonly known attacks. In this section, we provide a summary of some
recent works in this area. Initial works focused more on enhancing the quality of the water-
marked image with the features extracted from deep-learning models. Kandi et al.11 proposed
a secure non-blind image watermarking mechanism based on CNNs that outperformed existing
state-of-the-art transform-domain techniques. This scheme has a good level of invisibility and
a security approach, and its robustness was proved against certain basic image attacks. Vukotić
et al.4 proved the utility of feature extraction to yield good imperceptibility to watermarked
images in zero-bit watermarking (or a watermark detection strategy) using the hyper-cone detec-
tor and the concept of adversarial images.
Fierro-Radilla et al.5 used CNNs to extract features from the cover image and combined
these features with the watermark data using XOR operations that make their zero-watermarking
scheme faster. Han et al.6 designed another zero-watermarking algorithm based on the pre-
trained network Visual Geometry Group (VGG)-19. This scheme addressed the issues related
to medical image security and is robust to geometric attacks. Latent feature maps from the medi-
cal images were extracted using VGG-19, and two-dimensional discrete Fourier transform was
used for watermark generation. In addition to this, hashing enhanced the robustness of the sys-
tem and with the Hermite chaotic neural network provided additional security. The watermarking
scheme in Ref. 7 is a simple, lightweight CNN-based method that uses an iterative learning framework for image watermarking. The training process is a loop consisting of three stages: embedding the watermark, followed by attack simulation, and finally the weight update. Attack simulation allows the network to adaptively capture invariant features for various
attacks. The weights of the network were updated repeatedly to extract the watermark from the
given image. Chen et al.8 in their work on JSNet used a simulation network to enhance the
robustness against JPEG lossy compression attacks. This attack simulation was implemented
in a robust end-to-end CNN-based watermarking scheme, which achieved a considerable
performance improvement. HiDDeN12 is one of the most notable works in this domain. It is a framework that can be trained in an end-to-end manner and used for both data hiding and watermarking. Based on CNNs, the cover image and the message to be encoded are first passed through an encoder network and then processed by the decoder network to extract the message. The adversary network is used to compute the loss and to confirm whether a given image con-
tains any hidden data. Another work inspired by this framework is ROMark9 in which adversarial
learning and min-max optimizations were applied to provide robustness to a CNN-based
watermarking model. ReDMark13 is an end-to-end framework that simulates a discrete cosine
transform in their deep network architecture using convolutional autoencoders. It exhibits an
enhanced robustness against JPEG attacks and outperforms the HiDDeN framework. Zhong
et al.14 used an architecture based on CNNs to adaptively learn the rules of embedding a water-
mark in an image. In this model, the encoder and decoder networks perform the stages of encoding, embedding, and decoding the watermark, and an extractor retrieves the original watermark image by reversing these processes. Additionally, to provide robustness to this method,
an invariance layer composed of a DNN is applied to handle the distortions in the watermarked
image. A practical application of this scheme was demonstrated in Ref. 15 that enables secure
and authorized internet of things device onboarding by embedding user credentials into images,
such as printed QR codes on these devices. The marked images obtained using the scheme can tolerate distortions of up to 85% to 90%, thereby demonstrating its robustness.
A common issue in most of the schemes discussed above is maintaining robustness, owing to the fragile nature of deep-learning architectures. When any change is made to the marked image, it becomes difficult for the extractor network to recover the watermark correctly. For this reason, several of these works rely on attack simulation by training the networks iteratively. However, this adds complexity to the scheme and increases training time.
We address this issue in our work in the subsequent sections.
3 Proposed Method
A two network model, including embedding and extraction, is introduced to comprehensively
analyze the performance of the algorithm. The entire proposed mechanism is expressed as fol-
lows: (a) autoencoder-based CNN embedding and (b) extraction network model. The simplified
embedding and extraction network models are shown in Figs. 3 and 4, respectively. Further, the
detailed steps in both network models are presented in Algorithm 1 and Algorithm 2, respec-
tively. The embedder network configuration details are given in Table 1, hyperparameters used in the embedder and extraction networks are given in Table 2, and the extraction network configuration details are given in Table 3. Some commonly used notations in the algorithms are listed in Table 4.
Algorithm 1 Embedding network model.
Input: training samples consisting of cover images I_c and watermark images I_w.
Output: marked (watermarked) images У.
1. Initialize:
   S ← 32 (batch size)
   η ← 0.001 (learning rate)
   e ← 50 (epochs)
   Watermark_SIZE ← (64 × 64 × 1)
2. Read data:
   Load Dataset_Ic
   Load Dataset_Iw
3. Select the number of kernels and the kernel size for each layer, α and β
4. Extract features:
   f_c ← Ψ(Dataset_Ic, α, β)
   f_w ← Ψ(Dataset_Iw, α, β)
5. Concatenate features:
   ω(f_cw, α, β)
6. Compile model:
   Ɱ ← compile(Ψ, ω, A, M)
7. Train:
   for i = 0 to e do
      for j = 0 to S do
         Ω ← Ɱ(Dataset_Ic, Dataset_Iw)
         l ← M(Dataset_Ic, Ω)
         Ȣ ← A(l, α, β)
         α, β ← A(Ȣ, α, β)
      end
   end
8. Test:
   У ← Ɱ(Dataset_Test_Ic, Dataset_Test_Iw)
9. Calculate:
   PSNR(Dataset_Ic, У)
   SSIM(Dataset_Ic, У)
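To make Algorithm 1 concrete, a hedged Keras sketch of such an embedding network follows. The input sizes (128 × 128 × 1 cover, 64 × 64 × 1 mark), batch size S = 32, learning rate η = 0.001, and e = 50 epochs follow the algorithm; the specific layer counts, filter numbers, and activations are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of the embedding network in Algorithm 1: CNN feature maps
# of the cover and mark are concatenated, then decoded into the marked image.
import tensorflow as tf
from tensorflow.keras import layers, Model

cover_in = layers.Input((128, 128, 1), name="cover")
mark_in = layers.Input((64, 64, 1), name="mark")

# Psi: convolutional feature extraction (stride-2 convolutions downsample).
f_c = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(cover_in)
f_c = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(f_c)   # (32, 32, 32)
f_w = layers.Conv2D(32, 3, strides=1, padding="same", activation="relu")(mark_in)
f_w = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(f_w)   # (32, 32, 32)

# omega: concatenate cover and mark feature maps along the channel axis.
f_cw = layers.Concatenate()([f_c, f_w])

# Decoder: transposed convolutions upsample back to the cover resolution.
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(f_cw)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
marked_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid", name="marked")(x)

embedder = Model([cover_in, mark_in], marked_out)
embedder.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
# embedder.fit([cover_imgs, mark_imgs], cover_imgs, batch_size=32, epochs=50)
```

The loss is computed against the cover image, as in step 7 of the algorithm, so the network learns to hide the mark while keeping the output visually close to the cover.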
Algorithm 2 Extraction network model.
Input: marked images У and cover images I_c.
Output: extracted watermark E_w.
1. Read data:
   Load Dataset_У
   Load Dataset_Ic
2. Extract features:
   f_y ← Ψ(Dataset_У, α, β)
   f_c ← Ψ(Dataset_Ic, α, β)
3. Subtract features:
   f_w′ ← f_y − f_c
4. Collect and reshape features:
   features ← empty_list()
   for j = 1 to 4 do
      features.append(y_j)
   end
   Y ← reshape(features) to 16 × 16 × 16
5. Train:
   for i = 0 to e do
      for j = 0 to S do
         Ω ← Ɛ(Dataset_Ic, Dataset_У)
         l ← M(Dataset_Iw, Ω)
         Ȣ ← A(l, α, β)
         α, β ← A(Ȣ, α, β)
      end
   end
6. Test:
   E_w ← Ɛ(Dataset_Ic, Dataset_У)
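Similarly, a hedged Keras sketch of the extraction network in Algorithm 2 is given below: the marked and cover inputs are subtracted, the features are condensed and reshaped to 16 × 16 × 16 as in step 4, and transposed convolutions with ReLU rebuild the 64 × 64 × 1 mark. Again, the filter counts and dense-layer size are assumptions beyond what the algorithm and tables state.

```python
# Hedged sketch of the extraction network in Algorithm 2:
# f_w' = f_y - f_c, reshape to 16x16x16, then transposed conv + ReLU.
import tensorflow as tf
from tensorflow.keras import layers, Model

marked_in = layers.Input((128, 128, 1), name="marked")
cover_in = layers.Input((128, 128, 1), name="cover")

diff = layers.Subtract()([marked_in, cover_in])   # f_y - f_c
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(diff)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)      # dense block, (4096,)
x = layers.Reshape((16, 16, 16))(x)               # 16 * 16 * 16 = 4096

# Block-level transposed convolutions + ReLU rebuild the hidden mark.
x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)  # 32 x 32
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)   # 64 x 64
mark_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid", name="mark")(x)

extractor = Model([marked_in, cover_in], mark_out)
extractor.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
```

Here the training loss is computed against the original watermark (step 5 of Algorithm 2), so the decoder learns to reconstruct the mark from the marked/cover feature difference.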
Tables 1–3 Network configuration and hyperparameters (entries recovered from the original two-column layout; layer / parameters → output shape).
Input 1: 128 × 128 × 1 grayscale images → (128, 128, 1)
Input 2: 128 × 128 × 1 grayscale images → (128, 128, 1)
Subtract: subtracts the two input images → (128, 128, 1)
Conv 1-1: 64 (3 × 3) convolutions, stride = 2 → (64, 64, 64)
Conv 2-1: 32 (3 × 3) convolutions, stride = 2 → (32, 32, 32)
Dense block B 1: 1024 nodes → (1024,)
Dense block B 2: 1024 nodes → (1024,)
Dense block B 3: 1024 nodes → (1024,)
Dense block D 1: 4096 nodes → (4096,)
Dense block D 2: 4096 nodes → (4096,)
Conv transpose: 64 (3 × 3) convolutions, stride = 2 → (64, 64, 64)
Conv transpose: 64 (3 × 3) convolutions, stride = 1 → (32, 32, 64)
Conv transpose: 128 (3 × 3) convolutions, stride = 1 → (32, 32, 128)
Conv transpose: 128 (3 × 3) convolutions, stride = 2 → (64, 64, 128)
Epochs (Table 2): 50 (embedder); 300 (extractor)
Table 4 Commonly used notations (recovered entries).
Dataset_Ic = {c1, c2, ..., cn}: the set of cover images for training the model
Ω: model outputs during the training process
l: calculated loss
SSIM: structural similarity index measure function
4 Results and Analysis
Fig. 5 Test images: (a) cover, (b) mark, and (c) marked.
The embedding time during model loading is 9.6 s, and the extraction and cleaner times are 29.4 and 3.19 s, respectively. Figure 6 indicates that the training and validation losses decrease as the number of training steps (epochs) increases. Because the gap between the training and validation losses is very small, the model does not overfit. In Fig. 7, we present some samples of the extracted watermarks alongside their corresponding original watermarks.
A performance comparison of our proposed model was carried out to test the robustness of the proposed watermarking scheme. Table 7 contains the NC values against different attacks. The NC score is above 0.7 in every case, which shows that the proposed algorithm is robust to the considered attacks. Additionally, in Fig. 8, we plot the robustness performance of the proposed algorithm against the different attacks.
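For reference, a hedged sketch of how such attacks can be simulated on a marked image is shown below; the parameter values (noise levels, rotation angle, JPEG quality) are illustrative assumptions, not the settings used in our experiments.

```python
# Hedged attack simulations for the robustness test: salt and pepper,
# speckle, Gaussian noise, rotation, and JPEG compression on uint8 images.
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)

def salt_and_pepper(img: np.ndarray, density: float = 0.02) -> np.ndarray:
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < density / 2] = 0          # pepper
    out[mask > 1 - density / 2] = 255    # salt
    return out

def speckle(img: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    noisy = img.astype(np.float64) * (1.0 + rng.normal(0.0, sigma, img.shape))
    return np.clip(noisy, 0, 255).astype(np.uint8)

def gaussian_noise(img: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def rotate(img: np.ndarray, degrees: float = 5.0) -> np.ndarray:
    return np.asarray(Image.fromarray(img).rotate(degrees))

def jpeg_compress(img: np.ndarray, quality: int = 50) -> np.ndarray:
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(buf))
```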
Major characteristics of a watermarking scheme and how to measure them (recovered table entries):

S. No. 1 — Invisibility: measures the visual similarity between the plain and marked images.17 Measured by $\mathrm{PSNR} = 10 \log_{10}\!\left[\frac{(\text{size of image})^2}{\mathrm{MSE}}\right]$, where the mean square error is $\mathrm{MSE} = \frac{1}{M \times N} \sum_{p=1}^{M} \sum_{q=1}^{N} \left(H_{pq} - I_{pq}\right)^2$.

S. No. 3 — Time cost evaluation: the cost associated with embedding and extracting the digital watermark from the cover media. Measured by embedding time and extraction time.
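To make these measures concrete, here is a hedged sketch of the three metrics used in this section. The PSNR/MSE functions follow the definitions above, with the conventional peak value of 255 substituted for "size of image"; the NC function is the standard normalized correlation, which this excerpt does not define explicitly, so treat it as an assumption.

```python
# Hedged metric helpers: MSE/PSNR per the table above, SSIM via
# scikit-image, and NC as standard normalized correlation (assumption).
import numpy as np
from skimage.metrics import structural_similarity

def mse(h: np.ndarray, i: np.ndarray) -> float:
    # MSE = (1 / (M x N)) * sum_{p,q} (H_pq - I_pq)^2
    return float(np.mean((h.astype(np.float64) - i.astype(np.float64)) ** 2))

def psnr(h: np.ndarray, i: np.ndarray, peak: float = 255.0) -> float:
    m = mse(h, i)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def ssim(h: np.ndarray, i: np.ndarray) -> float:
    return float(structural_similarity(h, i, data_range=255))

def nc(w: np.ndarray, w_ext: np.ndarray) -> float:
    # Normalized correlation between original and extracted watermarks.
    a = w.astype(np.float64).ravel()
    b = w_ext.astype(np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```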
Quality scores of the proposed scheme (recovered table values): PSNR = 31.34 dB; SSIM = 0.9940; NC = 0.9937.
Fig. 6 Training and validation loss curves for (a) the embedder network and (b) the extractor network.
Table 7 (recovered header): NC values under salt and pepper, speckle, Gaussian, filtering attacks, JPEG compression, and rotation.
The proposed scheme was further evaluated on the Canadian Institute For Advanced Research-10 (CIFAR-10)18 and Modified National Institute of Standards and Technology (MNIST)19 datasets in terms of PSNR and SSIM, and the results are shown in Table 8. For this evaluation, a subset of 8000 images was randomly chosen from the selected datasets; 6000 images were used for training and 2000 for testing. The PSNR and SSIM scores of the proposed algorithm and those from Rahim and Nadeem20 and Ding et al.21 are given in Table 8. The SSIM of the proposed algorithm is higher than both of these schemes,20,21 although its PSNR is lower than both; even so, the reported values indicate good invisibility of the proposed algorithm. Further, the NC scores of the proposed algorithm and of Ding et al.21 are given in Table 9. Here, median filter, JPEG, and rotation attacks are applied to the cover image containing the secret to test the robustness. Table 9 shows that the NC score obtained by our algorithm is higher and closer to 1, indicating better robustness than the Ding et al.21 scheme. Additionally, detailed comparisons of our algorithm with this scheme21 are listed in Table 10.
Fig. 8 Test for different attacks: (a) salt and pepper noise, (b) speckle, (c) Gaussian, (d) rotation, and (e) JPEG.
Table 9 (recovered header): NC values (%).
Table 10 Comparison of the proposed scheme with the Ding et al.21 scheme (recovered entries).
Number of cover images — Ding et al.21: 1703 images (training); 427 images (testing). Proposed: 1629 images (training); 697 images (testing).
Image dimensions — Ding et al.21: 320 × 240. Proposed: 128 × 128 (cover); 64 × 64 (mark).
Extraction process — Ding et al.21: a CNN reconstructs the mark image; being sensitive to noise, this affects the robustness of the scheme. Proposed: DNN blocks capture invariant features from the marked image, and the mark is reconstructed using Tr-CNNs.
Image quality — Ding et al.21: low-quality mark. Proposed: good-quality mark.
5 Conclusion
This article presented a CNN-based watermarking technique for preventing the infringement of digital images. A two-network model, comprising embedding and extraction, was introduced
to comprehensively analyze the performance of the algorithm. The embedding network architecture is composed of convolutional autoencoders. Initially, a CNN is used to obtain the feature maps from the cover and mark images. Subsequently, the feature maps of the mark and cover are concatenated. In the extraction model,
block-level transposed convolution and a rectified linear unit activation are applied to the extracted features of the watermarked and cover images to obtain the hidden mark. The experimental analysis demonstrates that our proposed technique maintains satisfactory marked-image quality and resistance to the considered attacks. Future work will investigate the performance with color images, improve the robustness and security performance, and identify potential threats for further analysis.
References
1. A. K. Singh, “Data hiding: current trends, innovation and potential challenges,” ACM Trans.
Multimedia Comput. Commun. Appl. 16(3s), 1–16 (2021).
2. O. P. Singh et al., “SecDH: security of COVID-19 images based on data hiding with PCA,”
Comput. Commun. 191, 367–377 (2022).
3. O. P. Singh et al., “Image watermarking using soft computing techniques: a comprehensive
survey,” Multimedia Tools Appl. 80(20), 30367–30398 (2020).
4. V. Vukotić, V. Chappelier, and T. Furon, “Are deep neural networks good for blind image
watermarking?” in IEEE Int. Workshop on Inf. Forensics and Security, December, IEEE,
pp. 1–7 (2018).
5. A. Fierro-Radilla et al., “A robust image zero-watermarking using convolutional neural
networks,” in 7th Int. Workshop on Biometrics and Forensics, pp. 1–5 (2019).
6. B. Han et al., “Zero-watermarking algorithm for medical image based on VGG19 deep
convolution neural network,” J. Healthcare Eng. 2021, 12 (2021).
7. S. M. Mun et al., “A robust blind watermarking using convolutional neural network,”
arXiv-1704 (2018).
8. B. Chen et al., “JSNet: a simulation network of JPEG lossy compression and restoration
for robust image watermarking against JPEG attack,” Comput. Vis. Image Underst. 197,
103015 (2020).
9. B. Wen and S. Aydore, “ROMark: a robust watermarking system using adversarial training,”
arXiv:1910.01221 (2019).
10. M. Bagheri et al., “Image watermarking with region of interest determination using deep
neural networks,” in 19th IEEE Int. Conf. Mach. Learn. and Appl., pp. 1067–1072 (2020).
11. H. Kandi, D. Mishra, and S. R. S. Gorthi, “Exploring the learning capabilities of convolu-
tional neural networks for robust image watermarking,” Comput. Security 65, 247–268
(2017).
12. J. Zhu et al., “HiDDeN: hiding data with deep networks,” Lect. Notes Comput. Sci. 11219,
682–697 (2018).
13. M. Ahmadi et al., “ReDMark: framework for residual diffusion watermarking based on deep
networks,” Expert Syst. Appl. 146, 113157 (2020).
14. X. Zhong et al., “An automated and robust image watermarking scheme based on deep
neural networks,” IEEE Trans. Multimedia 23, 1951–1961 (2021).
15. S. Mastorakis et al., “DLWIoT: deep learning-based watermarking for authorized IoT
onboarding,” in IEEE 18th Annu. Consumer Commun. & Netw. Conf., IEEE, pp. 1–7
(2021).
16. “Kaggle Dogs vs. Cats,” https://www.kaggle.com/c/dogs-vs-cats (accessed 13 September
2022).
17. O. P. Singh and A. K. Singh, “Data hiding in encryption–compression domain,” Complex
Intell. Syst. 1–14 (2021).
18. A. Krizhevsky and G. Hinton, Learning Multiple Layers of Features from Tiny Images,
Technical Report, University of Toronto (2009).
19. Y. LeCun et al., “Gradient-based learning applied to document recognition,” Proc. IEEE
86(11), 2278–2324 (1998).
20. R. Rahim and S. Nadeem, “End-to-end trained CNN encoder-decoder networks for image
steganography,” Lect. Notes Comput. Sci. 11132, 723–729 (2019).
21. W. Ding et al., “A generalized deep neural network approach for digital watermarking analy-
sis,” IEEE Trans. Emerg. Top. Comput. Intell. 6(3), 613–627 (2022).
Debolina Mahapatra is currently pursuing her MTech degree in computer science and engi-
neering at National Institute of Technology (NIT), Patna, Bihar, India. Her research interests
include data hiding techniques and cryptography.
Preetam Amrit is currently pursuing his PhD in computer science and engineering at NIT Patna,
Bihar, India. His research interests include multimedia hiding techniques and deep learning
methodology.
Om Prakash Singh is currently working as a temporary faculty member at NIT Patna, Bihar, India. He pursued his PhD at NIT Patna, Bihar, India. His research interests include data hiding
techniques and cryptography.
Amit Kumar Singh is an assistant professor in the Computer Science and Engineering Department at NIT Patna, Bihar, India. His research interests include watermarking and image
processing.
Amrit Kumar Agrawal is an assistant professor in the Computer Science and Engineering
Department at the Galgotias College of Engineering & Technology, Greater Noida, Uttar
Pradesh, India. His research interests include security, computer vision, and biometrics.