
Cross-Platform Compatibility Protocol

This paper presents a robust image steganography system that utilizes an invertible neural network (INN) to hide messages in the discrete cosine transform (DCT) coefficients of images, enhancing resilience against JPEG compression. The proposed method features a mutual information loss and a two-way fusion module to improve message extraction while maintaining low error rates, achieving a 0% error rate for 1 bit per pixel embedding. Experimental results demonstrate significant robustness, with error rates 22% lower than existing methods under aggressive JPEG compression conditions.


The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

Robust Image Steganography: Hiding Messages in Frequency Coefficients


Yuhang Lan¹, Fei Shang¹, Jianhua Yang², Xiangui Kang¹*, Enping Li³
¹ Guangdong Key Laboratory of Information Security Technology, School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China
² Guangdong Polytechnic Normal University, Guangzhou, China
³ Computer Science Department, Bridgewater State University, Massachusetts, USA
{lanyh5, shangf5}@mail2.sysu.edu.cn, [email protected], [email protected], [email protected]

Abstract
Steganography is a technique that hides secret messages in a public multimedia object without raising suspicion from third parties. However, most existing works cannot provide good robustness against lossy JPEG compression while maintaining a relatively large embedding capacity. This paper presents an end-to-end robust steganography system based on the invertible neural network (INN). Instead of hiding in the spatial domain, our method directly hides secret messages in the discrete cosine transform (DCT) coefficients of the cover image, which significantly improves the robustness and anti-steganalysis security. A mutual information loss is first proposed to constrain the flow of information in the INN. Besides, a two-way fusion module (TWFM) is implemented, utilizing spatial and DCT domain features as auxiliary information to facilitate message extraction. These two designs aid in recovering secret messages from the DCT coefficients losslessly. Experimental results demonstrate that our method yields significantly lower error rates than other existing hiding methods. For example, our method achieves reliable extraction with 0 error rate for 1 bit per pixel (bpp) embedding payload; and under JPEG compression with quality factor QF = 10, the error rate of our method is about 22 percentage points lower than that of the state-of-the-art robust image hiding methods, which demonstrates remarkable robustness against JPEG compression.

* Corresponding author: Xiangui Kang
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: The universal pipeline of robust image steganography. Our model has robustness against JPEG compression with different quality factors.

Introduction
A safe and robust image steganography system attempts to hide the secret messages within the cover image, and the generated stego image is more inclined to evade malicious distortion and detection. Figure 1 shows the universal pipeline of robust image steganography. In real-world applications, the stego image inevitably encounters distortions during transmission on social networks, such as JPEG compression, which dramatically complicates extracting messages. As modifying different pixel positions of the image has different embedding effects, traditional steganography algorithms combine the Syndrome Trellis Codes (Filler, Judas, and Fridrich 2011) and additive distortion cost functions (Holub, Fridrich, and Denemark 2014; Li et al. 2014; Pevnỳ, Filler, and Bas 2010) to minimize the overall distortion of the steganographic framework. However, to evade statistical steganalysis detection, these traditional methods only have a low embedding payload. Encouraged by the representational power of convolutional networks, learning-based steganography methods achieve higher embedding capacity. Work (Hayes and Danezis 2017) proposes the first generative adversarial network (GAN) based steganography framework by simultaneously training both a steganographic generator and a steganalyzer. SteganoGAN (Zhang et al. 2019a) proposes a flexible image steganography method that expands the binary secret information into a three-dimensional tensor, significantly increasing the embedding capacity. ABDH (Yu 2020) utilizes a spatial attention mechanism to improve the visual quality of generated stego images, embedding the secret messages into locations that are not sensitive to human vision. CHAT-GAN (Tan et al. 2021) designs a channel attention module to make the network concentrate on the critical channel features. However, most of these methods are vulnerable to malicious JPEG compression when transmitting stego images on lossy channels. That is, a slight disturbance usually results in poor recovery of the secret message. This substantial drop in performance makes them inapplicable in practical scenarios.
This paper proposes an end-to-end robust steganography system in which the concealing and revealing of secret messages are realized by the INN's forward and backward processes. Instead of choosing spatial image pixels, our method utilizes the DCT coefficients for message embedding. As the quantization and rounding operations in JPEG compression act directly on the DCT coefficients, the modification of DCT coefficients by JPEG compression is intuitive and easier to model than in the spatial domain or wavelet domain. However, one DCT coefficient of a block is calculated from all pixel values of the block in the spatial domain, which indicates that
modifications in pixel values may significantly affect the corresponding DCT coefficients. This complex statistical characteristic of DCT coefficients makes robust steganography in the DCT domain difficult. Our proposed method efficiently achieves JPEG robustness under any quality factor, and it meets the low error rate requirements of a robust steganography algorithm. The main contributions are summarized as follows:
• We introduce an end-to-end robust steganography system based on INN, which directly hides/extracts secret messages in the DCT coefficients. Our method ensures artifact imperceptibility of the generated stego image and robustness against JPEG compression.
• We propose a mutual information loss in the concealing process and devise a two-way fusion module (TWFM) used in the revealing process. These two designs minimize the lost information and utilize the reserved crucial features to reveal the secret messages as much as possible.
• Experimental results demonstrate the superior robustness of our steganography system in lossless or distorted conditions. Specifically, our method can reliably achieve a 0 error for message extraction at an embedding payload of 1 bpp. Moreover, even with an aggressive JPEG compression of a very low quality factor, such as QF = 10, the error rate does not exceed 3%.

Related Work

Robust Image Hiding
Robust image steganography methods have been developed to meet the application requirements of covert communication in lossy channels, in which the model is resistant to distortions. HiDDeN (Zhu et al. 2018) proposes an end-to-end framework incorporating typical digital attacks into the encoder-decoder structure to simulate distortions. ReDMark (Ahmadi et al. 2020) presents a deep diffusion framework to improve the robustness. However, these two simple network structures generalize poorly to different distortions. Work (Wengrowski and Dana 2019) focuses on particular robustness in light field messaging (LFM): an encoded image displayed on a screen and captured by a camera. StegaStamp (Tancik, Mildenhall, and Ng 2020) uses a set of differentiable image augmentations to simulate the print-shooting process and hides hyperlinks in a physical photograph. To model non-differentiable distortion, work (Liu et al. 2019) designs a two-stage separable framework, which jointly trains the encoder and decoder without noise in the first stage, and then presents noise attacks in the second stage but restricts the loss to only propagate back through the decoder. Later, work (Jia, Fang, and Zhang 2021) randomly chooses one of simulated JPEG, real JPEG, and a noise-free layer as the noise layer for each mini-batch to enhance the robustness against JPEG compression. Besides, work (Zhang et al. 2021) uses a forward attack simulation layer to make the pipeline compatible with non-differentiable distortion. As for adaptive steganography algorithms in the frequency domain, works (Zhang et al. 2019b; Zhu et al. 2021) hide secret messages in compressed DCT coefficients. These works achieve promising robustness and security with small embedding capacity.

Invertible Neural Network
The affine coupling layer (Kingma and Dhariwal 2018) is the fundamental building block of the invertible neural network (INN) (Dinh, Krueger, and Bengio 2014). The encoding and decoding processes share the same parameters, making the model lightweight. Since the reversible network is information lossless in theory, it can preserve details of the input as much as possible. Due to these exceptional properties, many works with invertible architectures achieve better performance than traditional autoencoder frameworks, especially for image-related tasks with inherent invertibility. Work (Van der Ouderaa and Worrall 2019) applies a reversible network as the central workhorse of the image-to-image translation task, which reduces memory consumption and generates images with high fidelity. For the image rescaling task, work (Xiao et al. 2020) utilizes INN to discover a bijective mapping between high-resolution and low-resolution images, which alleviates the ill-posed problem of image upscaling reconstruction.
Image steganography, composed of concealing and revealing processes, can be considered an inverse problem. Work (Lu et al. 2021) utilizes the INN to conduct high-capacity spatial image steganography, in which multiple secret images can be hidden in one cover image. HiNet (Jing et al. 2021) attempts to hide the secret image in the wavelet domain using INN for high invisibility. It presents a low-frequency loss to confine hidden messages to high-frequency wavelet subbands. Work (Xu et al. 2022) achieves a certain degree of robustness under various distortions using an invertible structure. Despite managing a large number of bits, all of the above INN methods incur high error rates for the recovered messages. Applying them directly to a scenario with zero tolerance of even a single incorrectly recovered bit is therefore impractical.

Proposed Method

Overview of the Framework
The traditional neural network utilizes two independent networks, an encoder and a decoder, to model the mappings among secret message $M$, cover image $C$, and stego image $S$, which can be denoted as $(M, C) \rightarrow S$ and $S \rightarrow (M, C)$, respectively. This separate encoder-decoder structure results in an inaccurate bijective mapping and may accumulate the error of one mapping into the other. In our proposed method, we attempt to find an invertible and bijective network that can hide and extract the secret message simultaneously, denoted as $(M, C) \leftrightarrow S$. As shown in Figure 2, the framework of our method consists of several components: INN, JPEG compression layer, discriminator, and two-way fusion module. We employ affine coupling layers to structure the concealing and revealing blocks. There are 12 such reversible blocks for both the concealing and revealing processes of the INN. These two processes share parameters during training, which drives the model more effectively than the traditional encoder-decoder
architecture. The JPEG compression layer is a simulator that approximates the non-differentiable JPEG compression in a differentiable form; it is introduced into the end-to-end network to alleviate the influence of distortion and improve the robustness. The adversarial discriminator is utilized to distinguish whether the generated stego image is similar to the original cover image, which takes advantage of adversarial learning to improve the visual fidelity and steganographic security of the generated stego image. The TWFM is designed to fully combine spatial and DCT domain features to recover the messages.

Figure 2: Overview of the proposed steganography architecture.

Invertible Concealing/Revealing Process
We denote the cover image with $c$ channels and size $h \times w$ as $C \in \{0, \ldots, 255\}^{c \times h \times w}$, where $c = 3$, $h = 256$, and $w = 256$ in our work. The secret message is denoted as $m \in \{0, 1\}^{l}$ of length $l$. To ensure that the two inputs of the INN are the same size, we arrange the secret message as a three-dimensional volume, padding each bit with the same value into a small 8×8 block if the payload is less than 1 bpp. The shaped secret message is represented as $M \in \{0, 1\}^{c' \times h \times w}$, where $c'$ is the number of channels. In the forward concealing process, the cover image is converted by the DCT into a three-dimensional DCT coefficient cube of the same size as the spatial image. Then we utilize concealing blocks to embed the secret message $M$ into the DCT coefficients of the cover image $C_{dct}$. Considering the $i$-th concealing block in Figure 2, we input $C_{dct}^{i}$ and $M^{i}$, and output $C_{dct}^{i+1}$ and $M^{i+1}$, which can be formulated as:

$$
\begin{aligned}
C_{dct}^{i+1} &= C_{dct}^{i} + \phi(M^{i}), \\
M^{i+1} &= M^{i} \odot \exp\left(\rho(C_{dct}^{i+1})\right) + \eta(C_{dct}^{i+1}),
\end{aligned} \tag{1}
$$

where $\odot$ indicates the Hadamard product operation, and $\phi(\cdot)$, $\rho(\cdot)$, and $\eta(\cdot)$ denote arbitrary functions that are not required to be invertible. Here, we employ the 5-layer Dense block (Wang et al. 2018) to represent these three functions for satisfactory performance in image processing. Note that after a series of concealing blocks, the model outputs the DCT coefficients of the stego image $S_{dct}$ and the lost information matrix $L$. To obtain the corresponding stego image $S$, the IDCT module receives the frequency features $S_{dct}$ and transforms them back to the spatial domain.
In the backward revealing process, the DCT module receives the attacked stego image $S_a$ and converts it into DCT features $S_{dct\_a}$. Since only the stego image can be transmitted on the communication channel, only the attacked stego image $S_a$ is available at the start of the revealing process. Therefore, to substitute the lost information matrix $L$ and refine essential information for the revealing, the TWFM incorporates the attacked stego image $S_a$ and its frequency coefficients $S_{dct\_a}$ to generate the auxiliary variable matrix $Z$. The detailed structure of TWFM will be introduced next. Note that the concealing blocks and the revealing blocks are almost identical, except that the direction of information flow is opposite. The $i$-th revealing block takes $S_{dct\_a}^{i+1}$ and $Z^{i+1}$ as inputs and obtains $S_{dct\_a}^{i}$ and $Z^{i}$ as outputs according to the following formula:

$$
\begin{aligned}
Z^{i} &= \left(Z^{i+1} - \eta(S_{dct\_a}^{i+1})\right) \oslash \exp\left(\rho(S_{dct\_a}^{i+1})\right), \\
S_{dct\_a}^{i} &= S_{dct\_a}^{i+1} - \phi(Z^{i}),
\end{aligned} \tag{2}
$$

where $\oslash$ denotes the element-wise matrix division operation. The last revealing block outputs the recovered DCT coefficients of the cover image $C'_{dct}$ and the recovered message $M'$. Eventually, $C'_{dct}$ goes through the IDCT module to be transformed into the recovered cover image $C'$, and $M'$ is reshaped and mapped back to the binary message $m'$.
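The invertibility of the coupling structure in Eqs. (1) and (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the functions `phi`, `rho`, and `eta` are hypothetical smooth stand-ins for the 5-layer Dense blocks, and the inputs are flat lists of toy values rather than DCT coefficient cubes.

```python
import math
import random

# Hypothetical stand-ins for phi, rho and eta. The paper uses 5-layer Dense
# blocks; any function works, because the inverse never has to invert them.
def phi(xs):
    return [0.5 * math.tanh(x) for x in xs]

def rho(xs):
    return [0.3 * math.tanh(x) for x in xs]

def eta(xs):
    return [0.1 * x for x in xs]

def conceal_block(c_dct, m):
    """Forward pass of one block, following Eq. (1)."""
    c_next = [c + p for c, p in zip(c_dct, phi(m))]
    m_next = [mi * math.exp(r) + e
              for mi, r, e in zip(m, rho(c_next), eta(c_next))]
    return c_next, m_next

def reveal_block(s_next, z_next):
    """Backward pass of the same block, following Eq. (2)."""
    z = [(zn - e) / math.exp(r)
         for zn, r, e in zip(z_next, rho(s_next), eta(s_next))]
    s = [sn - p for sn, p in zip(s_next, phi(z))]
    return s, z

random.seed(0)
c0 = [random.gauss(0.0, 1.0) for _ in range(64)]        # toy DCT coefficients
m0 = [float(random.randint(0, 1)) for _ in range(64)]   # toy message bits

c1, m1 = conceal_block(c0, m0)
c_rec, m_rec = reveal_block(c1, m1)
max_err = max(abs(a - b) for a, b in zip(c_rec + m_rec, c0 + m0))
print(max_err)  # round-trip error, at floating-point precision
```

The round trip is exact only when both outputs of the forward pass are fed back. In the proposed system the second input of the revealing process is the auxiliary matrix $Z$ produced by the TWFM rather than the lost matrix $L$, and the stego image is attacked in between, so recovery in practice is approximate.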

The Two-Way Fusion Module (TWFM)
When hiding secret messages in the cover image, high-capacity embedding will inevitably damage the carrier. Besides, to avoid visual distortion of the generated stego image, it is difficult to embed the secret message into the carrier entirely. These two information losses constitute the lost information matrix $L$. Work (Jing et al. 2021) randomly samples from a Gaussian distribution to make up the auxiliary variable matrix $Z$. Then every sampled $Z$ is trained to ensure that the INN can recover the secret message. However, this approach does not fully consider that $L$ contains valid information from the input features, which should be utilized as much as possible in the revealing process. That is, discarding $L$ will degrade the accuracy of secret message recovery.
Motivated by the self-attention mechanism (Vaswani et al. 2017) that maps the input into several different patches to capture the internal correlation of the features, TWFM aims at effectively extracting and fusing spatial domain and frequency domain features to improve the accuracy of secret message recovery. The architecture of TWFM is shown in Figure 3. The inputs are the attacked stego image $S_a$ and its frequency features $S_{dct\_a}$. The attention weight map assigns weights between 0 and 1 to the corresponding features in the spatial and frequency domains through an element-wise product. It selects meaningful features from $S_a$ and $S_{dct\_a}$ while suppressing some irrelevant details. The overall process can be depicted as follows:

$$
\begin{aligned}
W &= \sigma\left[\delta\left(f_c(S_a), f_c(S_{dct\_a})\right) \otimes \delta\left(f_c(S_{dct\_a}), f_c(S_a)\right)\right], \\
Z &= f_c\left(W \otimes S_a + W \otimes S_{dct\_a}\right),
\end{aligned} \tag{3}
$$

where $\otimes$ is an element-wise product operation, and $\sigma(\cdot)$, $\delta(\cdot)$, and $f_c(\cdot)$ denote the softmax function, the concat operation, and the convolution, respectively. The attention weight map $W$ indicates the importance of the regional information obtained from the input feature matrix, where more attention needs to be paid to the high-weight elements. Note that TWFM is plug-and-play, that is, it can be trained end-to-end jointly with the INN.

Figure 3: The architecture of the two-way fusion module.

Design of the JPEG Compression Layer
The JPEG compression process contains a non-differentiable step: the rounding function is a piecewise step function that truncates the gradient propagation. Thus, JPEG is not appropriate for direct end-to-end training optimization. To solve this problem, we adopt the smooth rounding function $R(x)$ from work (Shin and Song 2017) to simulate the rounding step, which can be formulated as:

$$
R(x) = [x] + (x - [x])^{3}, \tag{4}
$$

where $[x]$ is the rounding operation on $x$, and $R(x)$ represents the quantized and simulated rounded DCT coefficient. The simulated rounding function $R(x)$ is approximately continuous, and its derivative is nonzero almost everywhere. In this way, the non-differentiable JPEG compression can be simulated in a differentiable form to keep gradient propagation in the training process.

Design of the Discriminator
The discriminator evaluates the difference between cover and stego images and provides feedback on generator performance, which further pushes the generated instances closer to the data from the original class. In our proposed method, the discriminator module is composed of 6 groups. From group 1 to group 5, each group consists of a convolutional layer (kernel size = 3, stride = 2, padding = 1), a BN layer, and a LeakyReLU activation function. Group 6 contains a global average pooling (GAP) layer and a linear layer to output the classification probability. Benefiting from the basic principle of adversarial learning, the generated stego images have higher visual fidelity and steganographic security against statistical detection.

Loss Function
To generate stego images with high fidelity and recover messages with low error, the overall optimization objective is:

$$
L_{total} = \lambda_c L_c + \lambda_r L_r - \lambda_d L_d + \lambda_m L_m, \tag{5}
$$

where $\lambda_c$, $\lambda_r$, $\lambda_d$ and $\lambda_m$ are the weight factors to balance different loss terms.
Concealing loss $L_c$. Concealing loss $L_c$ indicates the difference between cover image $C$ and stego image $S$. The concealing loss needs to be kept low to obtain good imperceptibility. We employ mean square error (MSE) to measure the difference between them, which can be defined as:

$$
L_c = MSE(C, S) = \frac{1}{c \times h \times w} \lVert C - S \rVert_2^2, \tag{6}
$$

Revealing loss $L_r$. Revealing loss $L_r$ states the difference between the recovered message $M'$ and the original secret message $M$. A low revealing loss is desired to obtain a high accuracy in message recovery, which can be expressed as:

$$
L_r = MSE(M, M') = \frac{1}{c' \times h \times w} \lVert M - M' \rVert_2^2, \tag{7}
$$

Discrimination loss $L_d$. The discrimination loss is adopted to enhance the visual fidelity of the generated stego image and the anti-steganalysis ability of the network, as described by the following cross-entropy loss function:

$$
L_d = -\frac{1}{N} \sum_{i} \left[ y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \right], \tag{8}
$$

where $y_i$ refers to the ground truth label (the cover as 0 and stego as 1), and $p_i$ represents the classification probability of the discriminator.
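The loss terms of Eqs. (5)-(8) can be written out directly. The sketch below is illustrative only: it works on flattened Python lists instead of tensors, takes the mutual information term $L_m$ as a precomputed number, and uses as defaults the weight factors reported in the experimental settings (1.0, 15.0, 3.0, 5.0). The clamping constant `eps` is an added numerical safeguard, not something the paper specifies.

```python
import math

def mse(a, b):
    """Mean squared error over flattened inputs, as in Eqs. (6) and (7)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def discrimination_loss(labels, probs, eps=1e-12):
    """Binary cross-entropy of Eq. (8); labels: cover = 0, stego = 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp (assumed safeguard) to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

def total_loss(l_c, l_r, l_d, l_m,
               lam_c=1.0, lam_r=15.0, lam_d=3.0, lam_m=5.0):
    """Weighted sum of Eq. (5); note the minus sign on the discrimination term."""
    return lam_c * l_c + lam_r * l_r - lam_d * l_d + lam_m * l_m

# Toy values: one pixel differs by 2, and the discriminator is undecided.
l_c = mse([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])   # 4 / 4 = 1.0
l_r = mse([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])             # perfect recovery
l_d = discrimination_loss([0, 1], [0.5, 0.5])           # ln 2
loss = total_loss(l_c, l_r, l_d, l_m=0.0)
print(l_c, l_r, l_d, loss)
```

Because $L_d$ enters Eq. (5) with a negative sign, a discriminator that is more confused (larger cross-entropy) lowers the total loss of the hiding network, which is the adversarial pressure described above.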

Methods      Cover/Stego PSNR(dB)/SSIM   Secret/Recovery BER(%)
HiDDeN       36.61/0.922                 24.72
Hayes        35.54/0.914                 26.29
SteganoGAN   40.47/0.971                 1.43
ABDH         42.38/0.988                 1.95
HiNet        47.43/0.993                 0.47
Ours         48.41/0.996                 0

Table 1: Comparison between our proposed method and other steganography methods with 1 bpp embedding payload.

Mutual information loss $L_m$. Mutual information reflects the correlation between two variables: the amount of information contained in one variable about the other variable. In Figure 2, the forward concealing process has the secret message $M$ as the input and the lost information matrix $L$ as the output. Ideally, when the mutual information between them is close to 0, it can be assumed that the distribution of $L$ is independent of the distribution of the input $M$. In this case, we can discard $L$ without incurring information loss for revealing. Thus, to preserve valid information in embedding, a mutual information loss $L_m$ is proposed to constrain the direction of information flow. It can be defined as follows:

$$
L_m = H(M) + H(L) - H(M, L), \tag{9}
$$

where $H(M)$ and $H(L)$ represent information entropy, and $H(M, L)$ represents the joint entropy of $M$ and $L$. The information entropy and joint entropy are calculated as:

$$
H(M) = -\sum_{i=0}^{N-1} P_i \log P_i, \tag{10}
$$

$$
H(M, L) = -\sum_{i,j} P_{ML}(i, j) \cdot \log P_{ML}(i, j), \tag{11}
$$

where $N$ is the number of distinct pixel values in the matrix, and $P_i$ denotes the probability that a pixel with value $i$ appears in the matrix. $P_{ML}(i, j)$ is the probability that the pixel at the same position has a value of $i$ in matrix $M$ and a value of $j$ in matrix $L$.

Experimental Results
Datasets and implementation details. The dataset for training and testing is MSCOCO (Lin et al. 2014). We randomly select 5000 images as cover images for training and 1000 images for testing. The cover images used for training and testing do not overlap and are all cropped to a resolution of 256 × 256 using the center-cropping strategy with MATLAB. The Adam optimizer (Kingma and Ba 2014) with standard parameters is adopted to optimize our network. The initial learning rate is set to 0.0001 and the batch size to 2 to fit our devices. The whole training process includes 120 epochs. The whole framework is implemented in PyTorch and executed on an NVIDIA GeForce RTX 2080 Ti. At the end of the training, the model has already converged sufficiently. The weight factors $\lambda_c$, $\lambda_r$, $\lambda_d$ and $\lambda_m$ are set to 1.0, 15.0, 3.0, and 5.0, respectively.

Figure 4: Visual comparison of our proposed method and other steganography methods: (a)/(c) the generated stego images, (b)/(d) the differences between the cover and stego images (magnified 10 times).

Figure 5: The visual effect of our proposed method under JPEG compression with different quality factors: (a) the generated stego images, (b) the difference between the cover and compressed stego images (magnified 5 times).

Evaluation metrics. We evaluate the performance from the following aspects. The peak signal-to-noise ratio (PSNR) (Welstead 1999) and structural similarity index measure (SSIM) (Wang et al. 2004) are adopted for image quality assessment. The bit error rate (BER) is utilized to measure the robustness, and a lower BER means the recovered message $m'$ is closer to the original message $m$. Since the values of the recovered message $m'$ are floating-point numbers, we map the values in $m'$ greater than 0.5 to 1, and the rest to 0. As for steganographic security, the detection error indicates the extent to which the steganalysis network is unable to distinguish cover/stego images yielded by a specific steganography method.

Performances Analysis
Steganographic imperceptibility. From the perspective of quantitative results, Table 1 shows the significant superiority of our method compared with HiDDeN (Zhu et al. 2018), Hayes (Hayes and Danezis 2017), SteganoGAN (Zhang et al. 2019a), ABDH (Yu 2020), and HiNet (Jing et al. 2021). The frameworks of ABDH and HiNet are mainly designed to hide a color image in one carrier image. To make them usable at a capacity of 1 bpp, we finetuned the networks for hiding message bits. For cover/stego image pairs, the average PSNR and SSIM of our generated stego images are distinctly higher than those of the comparison methods. Notably, for secret/recovery pairs, our method achieves 100% recovery accuracy, which can be applied to the scenario to extract the

Metrics         Identity      JPEG 100      JPEG 90       JPEG 70       JPEG 50       JPEG 30       JPEG 10
BER(%)          0             0.19          0.31          0.66          0.92          1.47          2.92
PSNR(dB)/SSIM   48.41/0.996   45.54/0.992   44.13/0.985   41.39/0.977   37.08/0.967   36.27/0.955   34.58/0.948

Table 2: The average metric values of our method for cover/stego pairs and secret/recovery pairs with the payload of 1 bpp. The QFs chosen for training and testing are the same.
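To help read the PSNR figures in Table 2, the sketch below computes PSNR from its common definition for 8-bit images, $10 \log_{10}(255^2 / MSE)$. This is an assumption about the metric's exact form: the paper cites (Welstead 1999) without restating the formula, and the averaging over images is not shown here.

```python
import math

def psnr(cover, stego, peak=255.0):
    """PSNR in dB between two flattened images with values in [0, peak]."""
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Every pixel is off by exactly one gray level, so the MSE is 1.
cover = [10, 20, 30, 40]
stego = [11, 19, 31, 39]
value = psnr(cover, stego)
print(round(value, 2))
```

Under this definition, an average squared error of one gray level corresponds to roughly 48.13 dB, which gives a sense of scale for the Identity column of Table 2.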

Methods   Payload (bit)   Identity   JPEG 90   JPEG 70   JPEG 50   JPEG 30   JPEG 10
HiDDeN    120             37.41      38.18     38.83     38.11     44.58     49.01
TSR       30              9.29       15.12     29.21     34.64     39.88     46.59
MBRS      256             0          0.00063   0.0098    1.35      8.58      33.17
Ours      256×4           0.0022     0.0072    0.081     0.74      3.47      10.93

Table 3: Comparison of the BER (%) of secret message restoration. We chose JPEG compression with quality factor 50 in training, then tested with different quality factors. The PSNR value of the stego image is adjusted to about 33.5 dB for a fair comparison.
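The BER values in Tables 2 and 3 depend on how the real-valued network output is turned into bits. A minimal sketch of the thresholding rule stated under the evaluation metrics (values greater than 0.5 map to 1, the rest to 0) follows; the function names and the toy inputs are illustrative.

```python
def bits_from_soft(soft, threshold=0.5):
    """Map values greater than the threshold to 1 and the rest to 0."""
    return [1 if v > threshold else 0 for v in soft]

def bit_error_rate(original_bits, recovered_soft):
    """BER in percent between the original bits and the thresholded output."""
    recovered = bits_from_soft(recovered_soft)
    errors = sum(o != r for o, r in zip(original_bits, recovered))
    return 100.0 * errors / len(original_bits)

m = [1, 0, 1, 1, 0, 0, 1, 0]                               # original message
m_soft = [0.93, 0.12, 0.48, 0.77, 0.51, 0.05, 0.66, 0.5]   # toy network output
ber = bit_error_rate(m, m_soft)
print(ber)  # two of eight bits flip (0.48 -> 0 and 0.51 -> 1), so 25.0
```

Note that a value of exactly 0.5 maps to 0 under this rule, since only values strictly greater than 0.5 become 1.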

hidden message without error. Figure 4 compares the stego images of our method with the other four methods. The difference between cover and stego images shows the extent of damage to the cover image after embedding secret messages. We can see that the stego images generated by our proposed method differ only slightly from the cover images and have no obvious text-copying artifacts or color distortion. In contrast, HiDDeN, Hayes, and SteganoGAN show apparent differences between cover and stego images, especially in complex texture and edge regions. The above comparison demonstrates the effectiveness of our proposed method in generating stego images with pleasing visual fidelity and high reconstruction quality.

Robustness against JPEG compression. We analyze the robustness of our proposed method with the embedding payload of 1 bpp. We train the model using the simulated JPEG compression with different quality factors (QFs) and test with the same QFs. Figure 5 shows the generated stego images, and the difference between cover and stego images after JPEG compression. The stego images yielded by our method maintain high visual fidelity without block-blurring artifacts, and the difference between cover and stego images is nearly invisible. The quantitative results are displayed in Table 2. The QFs chosen for training and testing are the same, which means the testing is under a known channel. Our method reaches satisfactory reconstruction results under different QFs. In particular, our model achieves a BER of less than 3% even when subjected to high-intensity JPEG compression with a quality factor of 10, which indicates that our method has strong resistance against severe JPEG compression distortion. Given such a low BER for extracting messages, error correction codes can be utilized to achieve 100% extraction accuracy in practical applications.
To better illustrate the robustness of our method, we compare the performance of our method with several advanced robust image hiding methods: HiDDeN (Zhu et al. 2018), TSR (Liu et al. 2019), and MBRS (Jia, Fang, and Zhang 2021). All the methods apply JPEG compression with QF = 50 in the training process, while the QFs used for testing vary from 10 to 90. That is, the testing is under an unknown channel. Note that there is a trade-off between message embedding and extracting. To make the comparative experiment fair, we maintain the PSNR of the stego images generated by the embedding process at 33.5 dB, then compare the BER of the recovered secret message. In addition, the embedding payload of our method is reduced to 256×4 bits to achieve better robustness. As seen in Table 3, although the embedding payload of our method is 4 times that of MBRS, our method provides much better BER performance when the JPEG compression QF is decreased to 50 or lower. As the intensity of JPEG compression increases, the superiority of our method over MBRS becomes more pronounced. The above experimental results demonstrate that our method provides outstanding robustness for aggressive JPEG compression as low as QF = 10, whether the transmission channel is known or unknown. Rather than holding reconstruction capability at a specific compression factor, the robustness of our method is more stable and general in practical applications.

Methods      Detection Error(%)
             XuNet    SRNet   WISERNet
HiDDeN       0.22     0       2.55
Hayes        0.04     0       1.13
SteganoGAN   2.35     0       1.67
ABDH         10.11    2.86    4.89
Ours         19.27    5.14    9.37

Table 4: Comparison on detection error using three steganalysis networks with the embedding payload of 1 bpp.

Security analysis. Steganographic security is usually evaluated by measuring the detection error of a certain steganalyzer in distinguishing the images generated by a steganographic algorithm. Current research (Yang et al. 2019) has demonstrated that a CNN-based steganalyzer is able to reduce the detection error dramatically compared with a conventional steganalyzer. To verify the security, we adopt three advanced steganalysis networks, XuNet (Xu, Wu, and Shi 2016), SRNet (Boroumand, Chen, and Fridrich 2018), and WISERNet (Zeng et al. 2019), to assess the steganographic security for different steganography methods with

Discrete cosine transform   Ld loss   Lm loss + TWFM   Cover/Stego PSNR(dB)/SSIM   Secret/Recovered BER(%)   Detection Error(%)
✗                           ✗         ✗                36.75/0.961                 1.4                       4.81
✓                           ✗         ✗                43.29/0.975                 0.23                      10.36
✓                           ✓         ✗                48.13/0.988                 0.39                      19.01
✓                           ✓         ✓                48.41/0.996                 0                         19.27

Table 5: Ablation study on discrete cosine transform, discrimination loss Ld, mutual information loss Lm and TWFM.

the embedding payload of 1 bpp. We generate 5,000 cover/stego image pairs for each method to re-train the steganalysis networks. Table 4 presents the detection error using three steganalysis networks for different methods. Our proposed

DCT domain is nearly invisible, which means embedding into frequency domain coefficients will better adapt to distortion and maintain the detailed content of the cover image. From the first and second rows in Table 5, the PSNR and SSIM with DCT transform increase by 6.54 dB and 0.014, respectively, and the detection error increases from 4.81% to 10.36%. This ablation further demonstrates that the DCT transform is inherently robust to JPEG compression and plays an indispensable role in the robust framework.

Figure 6: Visual results of ablation study on DCT. The differences are between the cover and compressed stego images with QF = 50.

Effectiveness of Ld. The discriminator provides progressive feedback to the generator, enabling it to enhance the imperceptibility and security of the generated stego images. As shown in the second and third rows in Table 5, with the Ld loss, the visual quality of the stego image is improved by 4.84 dB in terms of PSNR and by 0.013 in terms of SSIM. In addition, the steganographic security of our method is enhanced: the detection error increases by 8.65 percentage points.
Effectiveness of Lm and TWFM. The Lm loss guides the information in the INN to flow more to the branch that yields the stego image, thereby preserving the valid information for the revealing process. The TWFM makes full use of this information to facilitate the recovery of secret messages. The training loss curves with and without Lm and TWFM are shown in Figure 7. It can be seen that these two designs accelerate the convergence of the network in training and reduce the total loss of the entire network to a certain extent. From the third and fourth rows in Table 5, the BER in secret/recovered pairs decreases, which also highlights the contribution of the Lm loss and TWFM.

Figure 7: The total loss curve of ablation study on Lm loss and TWFM. Note that when the training process reaches 120 epochs, the two models have converged.

Conclusion
method increases the detection error by 9.16%, 2.28%, and This paper presents a robust end-to-end image steganogra-
4.48% compared to the second best result, respectively. The phy system based on INN, which is resistant to lossy JPEG
considerably higher detection error than other methods a- compression. Unlike other works with INN, we are the first
gainst three steganalysis networks demonstrates that our to consider from the viewpoint of reducing the valid infor-
method has relatively better steganographic security. mation loss in the forward process and utilizing it in the
recovery process by designing mutual information loss and
Ablation Study TWFM module. Experiments demonstrate that our method
Effectiveness of discrete cosine transform. To conduct can yield high-fidelity stego images and achieve 0 error for
this ablation, we embedded secret messages into spatial do- message extraction with 1 bpp embedding capacity. In addi-
main and DCT domain respectively. In Figure 6, under the tion, our method has robustness against JPEG compression
JPEG compression with QF = 50, the generated stego us- under the QFs of the transmission channel are known or un-
ing spatial domain (without using DCT) shows visible block known, which shows the generalization performance. More-
artifacts. The difference between the cover and stego image over, compared with other advanced steganography method-
contains some apparent regions with color distortion. In con- s, our method achieves the best security performance under
trast, the difference between the cover and stego image using three deep learning-based steganalysis network detections.
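As a rough illustration of the DCT ablation discussed above, the following sketch contrasts spatial LSB embedding with parity embedding in quantized DCT coefficients under a toy JPEG-like channel. It is not the proposed INN-based method. The parity embedding (in the style of quantization index modulation), the uniform quantization step q = 16 used in place of a real JPEG quantization table, the omission of clipping and chroma processing, and all function names are assumptions made for illustration only.

```python
import numpy as np

N = 8  # JPEG block size


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row u is frequency u, column x is sample x."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


D = dct_matrix()


def dct2(block):
    """2-D DCT of an N x N block."""
    return D @ block @ D.T


def idct2(coef):
    """Inverse 2-D DCT (D is orthogonal, so its inverse is its transpose)."""
    return D.T @ coef @ D


def jpeg_like_channel(block, q):
    """Toy lossy channel (assumption): integer pixels, uniform DCT quantization
    with step q, then integer pixels again. No clipping, no chroma handling."""
    coef = dct2(np.round(block))
    return np.round(idct2(np.round(coef / q) * q))


def ber_percent(sent, received):
    """Bit error rate in percent."""
    return float(100.0 * np.mean(sent != received))


def demo(q=16, seed=0):
    """Return (spatial-LSB BER, DCT-parity BER) in percent for one random block."""
    rng = np.random.default_rng(seed)
    cover = rng.integers(64, 192, size=(N, N))
    bits = rng.integers(0, 2, size=(N, N))

    # Spatial baseline: overwrite each pixel's least significant bit.
    stego_spatial = (cover & ~1) | bits
    rec_spatial = jpeg_like_channel(stego_spatial, q).astype(int) & 1

    # DCT embedding: force the parity of each quantized coefficient index.
    k = np.round(dct2(cover) / q).astype(int)
    k_stego = k + ((k % 2) != bits)
    stego_dct = idct2(k_stego * q)
    received = jpeg_like_channel(stego_dct, q)
    rec_dct = np.round(dct2(received) / q).astype(int) % 2

    return ber_percent(bits, rec_spatial), ber_percent(bits, rec_dct)
```

Under these assumptions the DCT-domain bits survive the channel, because rounding an 8 × 8 block to integer pixels perturbs each orthonormal DCT coefficient by at most 4, which is below q/2 = 8. The spatial LSB baseline has no such margin, so its bits are generally not preserved. A real JPEG channel uses per-frequency quantization tables and clipping, so this sketch only conveys the intuition behind embedding in frequency coefficients.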

Acknowledgements
This work was supported partly by the NSFC under Grant 62072484 and 62102462, partly by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515010108, and partly by the Research Project of Guangdong Polytechnic Normal University under Grant 2022SDKYA027.

References
Ahmadi, M.; Norouzi, A.; Karimi, N.; Samavi, S.; and Emami, A. 2020. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications, 146: 113157.
Boroumand, M.; Chen, M.; and Fridrich, J. 2018. Deep residual network for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 14(5): 1181–1193.
Dinh, L.; Krueger, D.; and Bengio, Y. 2014. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516.
Filler, T.; Judas, J.; and Fridrich, J. 2011. Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes. IEEE Transactions on Information Forensics and Security, 6(3): 920–935.
Hayes, J.; and Danezis, G. 2017. Generating steganographic images via adversarial training. Advances in neural information processing systems, 30.
Holub, V.; Fridrich, J.; and Denemark, T. 2014. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014(1): 1–13.
Jia, Z.; Fang, H.; and Zhang, W. 2021. Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In Proceedings of the 29th ACM International Conference on Multimedia, 41–49.
Jing, J.; Deng, X.; Xu, M.; Wang, J.; and Guan, Z. 2021. HiNet: deep image hiding by invertible network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4733–4742.
Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P.; and Dhariwal, P. 2018. Glow: Generative flow with invertible 1x1 convolutions. Advances in neural information processing systems, 31.
Li, B.; Wang, M.; Huang, J.; and Li, X. 2014. A new cost function for spatial image steganography. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), 4206–4210. IEEE.
Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft coco: Common objects in context. In Proceedings of the European conference on computer vision, 740–755. Springer.
Liu, Y.; Guo, M.; Zhang, J.; Zhu, Y.; and Xie, X. 2019. A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, 1509–1517.
Lu, S.-P.; Wang, R.; Zhong, T.; and Rosin, P. L. 2021. Large-capacity image steganography based on invertible neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10816–10825.
Pevný, T.; Filler, T.; and Bas, P. 2010. Using high-dimensional image models to perform highly undetectable steganography. In Proceedings of the International workshop on information hiding, 161–177. Springer.
Shin, R.; and Song, D. 2017. Jpeg-resistant adversarial images. In Proceedings of the NIPS 2017 Workshop on Machine Learning and Computer Security, volume 1, 8.
Tan, J.; Liao, X.; Liu, J.; Cao, Y.; and Jiang, H. 2021. Channel attention image steganography with generative adversarial networks. IEEE Transactions on Network Science and Engineering, 9(2): 888–903.
Tancik, M.; Mildenhall, B.; and Ng, R. 2020. Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2117–2126.
Van der Ouderaa, T. F.; and Worrall, D. E. 2019. Reversible gans for memory-efficient image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4720–4728.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in neural information processing systems, 30.
Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; and Loy, C. C. 2018. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision, 63–79. Springer.
Wang, Z.; Bovik, A. C.; Sheikh, H. R.; and Simoncelli, E. P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4): 600–612.
Welstead, S. T. 1999. Fractal and wavelet image compression techniques, volume 40. Spie Press.
Wengrowski, E.; and Dana, K. 2019. Light field messaging with deep photographic steganography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1515–1524.
Xiao, M.; Zheng, S.; Liu, C.; Wang, Y.; He, D.; Ke, G.; Bian, J.; Lin, Z.; and Liu, T.-Y. 2020. Invertible image rescaling. In Proceedings of the European Conference on Computer Vision, 126–144. Springer.
Xu, G.; Wu, H.-Z.; and Shi, Y. Q. 2016. Ensemble of CNNs for steganalysis: An empirical study. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, 103–107.
Xu, Y.; Mou, C.; Hu, Y.; Xie, J.; and Zhang, J. 2022. Robust Invertible Image Steganography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7875–7884.
Yang, J.; Ruan, D.; Huang, J.; Kang, X.; and Shi, Y.-Q. 2019. An embedding cost learning framework using GAN. IEEE Transactions on Information Forensics and Security, 15: 839–851.
Yu, C. 2020. Attention based data hiding with generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 1120–1128.
Zeng, J.; Tan, S.; Liu, G.; Li, B.; and Huang, J. 2019. WISERNet: Wider separate-then-reunion network for steganalysis of color images. IEEE Transactions on Information Forensics and Security, 14(10): 2735–2748.
Zhang, C.; Karjauv, A.; Benz, P.; and Kweon, I. S. 2021. Towards Robust Deep Hiding Under Non-Differentiable Distortions for Practical Blind Watermarking. In Proceedings of the 29th ACM International Conference on Multimedia, 5158–5166.
Zhang, K. A.; Cuesta-Infante, A.; Xu, L.; and Veeramachaneni, K. 2019a. SteganoGAN: High capacity image steganography with GANs. arXiv preprint arXiv:1901.03892.
Zhang, Y.; Luo, X.; Guo, Y.; Qin, C.; and Liu, F. 2019b. Multiple robustness enhancements for image adaptive steganography in lossy channels. IEEE Transactions on Circuits and Systems for Video Technology, 30(8): 2750–2764.
Zhu, J.; Kaplan, R.; Johnson, J.; and Fei-Fei, L. 2018. Hidden: Hiding data with deep networks. In Proceedings of the European conference on computer vision (ECCV), 657–672.
Zhu, L.; Luo, X.; Yang, C.; Zhang, Y.; and Liu, F. 2021. Invariances of JPEG-quantized DCT coefficients and their application in robust image steganography. Signal Processing, 183: 108015.
