Signal Processing: Fengyong Li, Zongliang Yu, Chuan Qin
Article history: Received 14 July 2021; Revised 3 September 2021; Accepted 19 September 2021; Available online 26 September 2021.

Keywords: Information hiding; Image steganography; Generative adversarial network; Cross feedback; Distortion.

Abstract: Steganography has been widely used to realize covert communication with multimedia content. By simulating the competition between a generator and a discriminator in a generative adversarial network (GAN), a good distortion measurement can be learned automatically to help design highly undetectable steganography. Nevertheless, existing GAN-based steganography always uses linearly connected convolutional neural networks to construct generators, where the pixel modification information in deep layers cannot be transmitted in real time to the neurons of the shallow layers, resulting in low undetectability due to insufficient network training. To address this problem, we propose a new GAN-based spatial steganographic scheme. Different from existing GAN-based steganography, we build multiple cross feedback channels between the contraction path and the expansion path of the generator, which allows the down-sampling information to be sent directly to the expansion layers through these feedback channels. Since the detailed information from deep layers can be captured effectively by referring to the information from the cover image, the generator can finally learn a sophisticated probability map to guide highly undetectable steganography. Comprehensive experimental results demonstrate that, against the same steganalyzers, our scheme is superior to existing GAN-based steganographic schemes in terms of anti-steganalysis capability.

© 2021 Elsevier B.V. All rights reserved.
1. Introduction

As an efficient private communication method, steganography [1] aims to embed secret messages into digital media, such as digital images, video, or audio files, for covert communication. Traditional steganography mainly focuses on embedding messages in a digital image while keeping the embedding modifications to a minimum [2]. However, with the development of steganalysis techniques [3,4], whose main purpose is to detect the existence of secret information, traditional steganography faces additional challenges. For instance, by combining various classifiers trained in a supervised learning fashion, steganalysis can effectively find the abnormal statistical characteristics introduced by message embedding. To avoid these problems, users try to measure each pixel's modification effect [5] and assign a cost to quantify it [6]. Since a smaller cost value is usually assigned to pixels located in complex texture or edge regions of images, these steganographic schemes concentrate the data embedding operations in such regions, while smooth regions are avoided for data embedding [7]. Accordingly, more sophisticated steganographic schemes can be designed by combining a reasonable distortion measurement with syndrome trellis codes (STC) [8]. We term this approach [9] content-adaptive steganography, which has been studied in a series of works, such as HUGO [10], WOW [11], S-UNIWARD [12], MiPOD [13] and HILL [14].

Corresponding to these sophisticated steganographic methods, a large number of steganalysis schemes based on high-dimensional features, such as the Spatial Rich Model (SRM) [15] and its variant maxSRMd2 [16], are used to evaluate the security of steganography; they have been experimentally confirmed to effectively capture slight embedding modifications. With the development of deep learning based on convolutional neural networks (CNNs), researchers have introduced CNNs into steganalysis and obtained considerable achievements. Qian et al. [17] designed a CNN structure based on a Gaussian nonlinear activation function, and claimed that a low-payload model can be constructed from a CNN model pre-trained with a high payload. Xu et al. [18] introduced batch normalization layers and 1 × 1 convolution layers to construct a new CNN structure, named Xu-Net, which is verified to achieve a security performance similar to the SRM scheme.

∗ Corresponding author.
E-mail address: [email protected] (C. Qin).
https://doi.org/10.1016/j.sigpro.2021.108341
0165-1684/© 2021 Elsevier B.V. All rights reserved.
F. Li, Z. Yu and C. Qin Signal Processing 190 (2022) 108341
Ye et al. [19] considered domain knowledge in terms of the number of high-pass filters, a truncated linear unit, and selection channel awareness to design a ten-layer CNN structure, which can significantly improve the detection performance against adaptive steganography. Yedroudj et al. [20] further improved the detection performance by combining Xu-Net and Ye-Net. Additionally, Deng et al. [21] first introduced global covariance pooling into steganalysis in 2019 to accelerate the fitting speed of the network. Zhu et al. [22] used two different depth-separable convolution modules (Zhu-Net for short) to obtain spatial and channel residual characteristic information, which further improves the accuracy of steganalysis.

Compared with applying deep learning to steganalysis, only a few works [23] using deep networks in steganography have been reported. Tang et al. [24] proposed a GAN-based steganographic method named ASDL-GAN (Automatic Steganographic Distortion Learning framework with Generative Adversarial Network). They constructed a special data embedding simulator by introducing a dedicated Ternary Embedding Simulator (TES), and their framework can automatically learn the embedding costs and implement reliable data extraction. Based on the ASDL-GAN framework, Yang et al. [25] further used a differentiable Tanh-simulator activation function instead of the non-differentiable embedding simulation network (TES) [8,26] to generate the modification location map. Yang's scheme can significantly accelerate the convergence of the steganographic model and save training time. Wu et al. [27] reconstructed the generator of the GAN by combining multiple feature maps to further optimize ASDL-GAN and Yang's scheme. Tang et al. [28] proposed another novel cost learning framework based on deep reinforcement learning, called SPAR-RL, which uses an agent to learn the optimal embedding policy that maximizes the rewards from the steganalytic environment and can thus produce better content-adaptive costs. The aforementioned steganographic schemes provide new steganography design ideas and deliver secure steganographic performance. Nevertheless, these schemes are hard, or at least inconvenient, to improve further due to some potential limitations. Firstly, for multi-layer convolutional neural networks, the linear connection between the contraction convolutions and the expansion convolutions means that the down-sampling information in the contraction path cannot be fed back to the expansion layers in real time. Therefore, the optimization objectives may not fully utilize the learning capacity towards pixel-level embedding costs. Secondly, some important fine-grained pixel-level information of the cover image may be ignored and cannot be effectively exploited by the generator.

Facing the aforementioned problems, we introduce a cross feedback mechanism into the generative adversarial network and conduct GAN-based secure steganography by automatically learning the embedding cost. Compared with previous methods, the main contributions of this paper are as follows:

• The proposed GAN-based steganography reconstructs the structure of the generative adversarial network and simulates the competition by alternately updating a generator and a discriminator in the reconstructed GAN. The embedding costs can be learned automatically to guide effective and secure data embedding.
• Multiple cross feedback channels are built between the contraction path and the expansion path of the generator, which allows the down-sampling information to be sent directly to the expansion layers through these feedback channels. Since the information from deep layers in the GAN is captured effectively by the generator during training, a more sophisticated probability map can be generated to guide highly undetectable steganography.
• We perform comprehensive experiments over different image sets, including the large-scale natural image set Place365 and BOSSbase v1.01. Experimental results show that the proposed cross feedback mechanism can significantly improve the effectiveness and accuracy of cost learning, and the proposed scheme outperforms state-of-the-art steganographic methods in terms of anti-steganalysis capability.

The rest of this paper is organized as follows. Section 2 reviews existing deep learning based steganographic works. Section 3 provides the detailed procedure of our proposed scheme. Subsequently, we evaluate the performance of the proposed scheme and present corresponding discussions in Section 4. Finally, Section 5 concludes the paper.

2. Related work

2.1. ASDL-GAN model

The ASDL-GAN model [24] was the first to form an actual steganographic framework by introducing a steganographic generative sub-network and a steganalytic discriminative sub-network. The former generates the modification probability map for the cover image, while the latter discriminates cover images from stego images and provides effective feedback.

In the ASDL-GAN model, ternary embedding, i.e., the modification directions φ ∈ {+1, 0, −1}, is employed to construct a TES activation function as the embedding simulator. The given change probability p_{i,j} (p_{i,j} ∈ (0, 0.5)) and a random number n_{i,j} (n_{i,j} ∈ [0, 1]) are taken as the input, and the corresponding modification m_{i,j} is calculated as follows:

    m_{i,j} = −1, if n_{i,j} < p_{i,j}/2
    m_{i,j} = +1, if n_{i,j} > 1 − p_{i,j}/2                                  (1)
    m_{i,j} = 0, otherwise

As a result, by combining the given total capacity, the modification probabilities p^{+1}_{i,j}, p^{−1}_{i,j}, p^{0}_{i,j} for {+1, −1, 0} are accumulated to meet the embedding capacity of the corresponding stego image. With Xu-Net [18] as the discriminator, the ASDL-GAN model automatically learns embedding change probabilities for each pixel and converts them into steganographic distortion, which is combined with a minimal-distortion framework, e.g., the Syndrome Trellis Codes (STC) framework [8], to construct GAN-based steganography. Nevertheless, the ASDL-GAN model always shows inferior undetectability compared with S-UNIWARD and is time-consuming due to long pre-training.

2.2. UT-GAN model

Inspired by the ASDL-GAN model, Yang et al. [25] proposed another adversarial steganographic scheme using a more compact U-NET [29] based generator, called the UT-GAN model. With a series of extensive experiments, the authors verified that this generator improves security performance significantly and reduces training time dramatically. In order to avoid the problem that the parameters in TES are not easy to back-propagate, the UT-GAN model changes the activation function by using a double-Tanh simulator instead of TES, and can save two thirds of the training time per epoch due to a linear calculation. The double-Tanh simulator is defined as follows:

    m_{i,j} = −0.5 × tanh(λ(p_{i,j} − 2 × n_{i,j})) + 0.5 × tanh(λ(p_{i,j} − 2 × (1 − n_{i,j})))      (2)

    tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})                                 (3)

where p_{i,j} and n_{i,j} have the same meaning as in Eq. (1) and λ controls the slope at the junction of the stairs.
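Purely as an illustration (our own NumPy sketch with invented function names, not code from either paper), the hard ternary simulator of Eq. (1) and its differentiable double-tanh relaxation of Eq. (2) can be written as follows; for a large slope λ, the relaxation approaches the hard simulator, which is what lets gradients flow through the sampling step:

```python
import numpy as np

def tes_hard(p, n):
    """Hard ternary embedding simulator of Eq. (1); not differentiable."""
    m = np.zeros_like(p)
    m[n < p / 2] = -1.0           # -1 with probability p/2
    m[n > 1.0 - p / 2] = 1.0      # +1 with probability p/2
    return m                      # 0 otherwise

def double_tanh(p, n, lam=1000.0):
    """Differentiable double-tanh relaxation of Eq. (2) used by UT-GAN."""
    return (-0.5 * np.tanh(lam * (p - 2.0 * n))
            + 0.5 * np.tanh(lam * (p - 2.0 * (1.0 - n))))
```

For example, with p = 0.4 the random draws n = 0.05, 0.5 and 0.95 fall into the −1, 0 and +1 regions of Eq. (1), and the relaxed simulator returns values close to the same triple.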
By combining channel selection in the discriminator, a more sophisticated distortion measurement is generated to guide the design of secure steganography. Compared with ASDL-GAN, the UT-GAN model achieves a significant improvement w.r.t. training time, but its undetectability still has much room for improvement.

2.3. R-GAN model

Since the U-NET structure in the UT-GAN model only considers one feature-map concatenation, this structure does not involve enough feature information, especially in the first few layers of the contraction path. As a result, image textures in finer detail may not be effectively learned. Facing the aforementioned problem, Wu et al. [27] reconstructed a new U-NET to obtain a more sophisticated probability map, which is used to generate a more accurate distortion measurement. The corresponding structure is called the R-GAN model.

In the U-NET structure of the R-GAN model, the maximum number of feature channels is adjusted from 128 to 256 in the contracting path, and the two feature maps before and after convolution in the corresponding contracting layer are combined in the expanding path, which ensures that the image features learned from the contracting path can be used effectively for image reconstruction.

Although the R-GAN model can achieve a more secure steganographic scheme, it is slightly more time-consuming compared with the UT-GAN model, because the combination of multiple feature maps requires extra processing steps in each iteration.

2.4. SPAR-RL model

3. Proposed steganographic model

3.1. Notation

To ease description, we denote by X and Y the M × N cover and corresponding stego images, respectively, where X = (x_{i,j})_{M×N} and Y = (y_{i,j})_{M×N}, with pixel values x_{i,j}, y_{i,j} ∈ {0, 1, . . . , 255}, 1 ≤ i ≤ M, 1 ≤ j ≤ N. P = (p_{i,j})_{M×N} is the embedding probability map, while M = (m_{i,j})_{M×N} represents the corresponding modification map, with m_{i,j} ∈ I_{i,j} = {−1, 0, 1}.

3.2. The overall framework

Our scheme reconstructs the structure of the GAN by building multiple cross feedback channels between the contraction path and the expansion path of the generator. The reconstructed framework is used to automatically learn the embedding probability of each pixel of the cover image and then deduce the optimal distortion measurement to guide a more secure steganographic scheme.

The proposed framework contains two parts: a steganographic generator G and a steganalytic discriminator D. In the former, a cross feedback channel is built between each contraction layer and its corresponding expansion layer. The texture information from the extension layers can be sent back in time to the contraction layers, and the modification probability can be learned automatically over multiple iterations. Subsequently, the Tanh simulator is employed to generate the modification map, which is used to guide the stego image generation. In the latter, a steganalytic network, e.g., Xu-Net, is used to provide real-time feedback indicating the anti-steganalysis capability of the stego image. An optimal probability map can be effectively learned by the generator and discriminator competing against each other. The overall framework is shown in Fig. 1.
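Using the notation above, the embedding-time forward pass (probability map → modification map → stego image) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions: P is random here rather than produced by the trained generator G, and the clipping to the valid 8-bit range is our own safeguard, not part of the paper's formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 cover X and probability map P; in a real run, P would
# come from the trained generator G. Here both are random for illustration.
X = rng.integers(0, 256, size=(8, 8)).astype(np.int64)
P = rng.uniform(0.01, 0.49, size=(8, 8))

# Sample the ternary modification map M from P (hard-simulator form).
N = rng.uniform(0.0, 1.0, size=P.shape)
M = np.where(N < P / 2, -1, np.where(N > 1 - P / 2, 1, 0))

# Stego image Y = X + M, clipped back to the valid 8-bit pixel range.
Y = np.clip(X + M, 0, 255)
```

Each stego pixel differs from its cover pixel by at most ±1, which is why a well-placed probability map is essential for undetectability.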
Fig. 1. The framework of the proposed scheme (using the gray '1013.pgm' image in BOSSbase as an example).
Table 1
Specific network structure of our generator.
constructed generator (i.e., the total number of layers along a path) is L = 16. The detailed structure of the reconstructed generator is shown in Table 1.

Second, in the contracting path, we replace the traditional pooling layer with a convolution layer of stride 2, which can increase the channel number of the feature maps and further reduce the loss of features. As the convolution modules increase, both the detailed features and the contour features can be gradually learned from the cover image.

Third, in the expanding path, the loss of feature information may affect whether the image features learned from the contracting path can be used effectively for image reconstruction. To address this problem, we implement pixel-level fusion by building a cross feedback between a contracting block and the corresponding expanding block. The feature maps of the same size in the expanding layer can be directly sent back to the corresponding contracting layer in real time. For example, in Fig. 2, a feedback channel can be constructed between the ith (i ∈ [1, 7)) expanding layer and the (15 − i)th contracting layer. The specific network structure of the generator is shown in Fig. 2.

3.4. Design of the discriminator

An adversarial network can be trained by steganography and steganalysis competing against each other. In general, a stronger discriminator gives better detection performance, and in turn motivates more secure steganography through a series of adversarial training. With the reconstructed generator, an efficient discriminator (also regarded as a steganalytic tool) needs to be built as the adversarial tool in our framework. Considering the above requirement, we
Fig. 2. The structure of the proposed generator. The red line represents the constructed feedback channels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 2
Specific network structure of our discriminator.
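Returning to the generator, the feedback-channel pairing described above (ith expanding layer ↔ (15 − i)th contracting layer, i ∈ [1, 7)) and the pixel-level fusion can be sketched in NumPy. This is our own toy illustration, not the paper's network code; the channel-first array shapes are invented for the example:

```python
import numpy as np

def feedback_pairs():
    """Pair the i-th expanding layer (i in [1, 7)) with the (15 - i)-th
    contracting layer, following the rule stated in the text (L = 16)."""
    return [(i, 15 - i) for i in range(1, 7)]

def cross_feedback_fuse(expand_feat, contract_feat):
    """Pixel-level fusion: concatenate two feature maps of the same spatial
    size along the channel axis (axis 0 for channel-first arrays)."""
    assert expand_feat.shape[1:] == contract_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([expand_feat, contract_feat], axis=0)
```

For instance, fusing a 16-channel expanding-layer map with an 8-channel contracting-layer map of the same 32 × 32 spatial size yields a 24-channel map, which the next convolution block can then consume.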
for the generator G should not only consider the adversarial training between the discriminator and the generator, but also the actual embedding capacity. From this perspective, the loss function of the generator should contain the adversarial loss l^1_G and the entropy loss l^2_G, where the adversarial loss makes the steganographic images more difficult to detect and is recorded as the inverse loss of the discriminator, while the entropy loss guarantees that the stego image is embedded with approximately the given embedding rate Q and thus involves the actual capacity C calculated from the embedding probabilities. Accordingly,

    l^1_G = −L_D                                                              (5)

    l^2_G = (C − H × W × Q)^2                                                 (6)

where H and W are the height and width of the image, and the actual embedding capacity C needs to consider p^{+1}_{i,j}, p^{−1}_{i,j} and p^{0}_{i,j}, which stand for the probabilities that the pixel x_{i,j} is modified by +1, by −1, or left unchanged, respectively, that is,

    p^{+1}_{i,j} = p^{−1}_{i,j} = p_{i,j}/2,  p_{i,j} ∈ (0, 0.5)              (7)

    p^{0}_{i,j} = 1 − p_{i,j}                                                 (8)

Accordingly, the actual embedding capacity C can be easily computed by the following equation:

    C = Σ_{i=1}^{H} Σ_{j=1}^{W} (−p^{+1}_{i,j} log2 p^{+1}_{i,j} − p^{−1}_{i,j} log2 p^{−1}_{i,j} − p^{0}_{i,j} log2 p^{0}_{i,j})      (9)

Finally, the total loss function L_G of the generator G is defined by combining the adversarial loss l^1_G and the entropy loss l^2_G:

    L_G = α × l^1_G + β × l^2_G                                               (10)

where α and β are two tuning parameters, which can be set to the empirical values α = 1 and β = 10^{−7} mentioned in [25].

When the loss functions are set up with a series of initial parameters, we subsequently train the reconstructed generator G and the discriminator D round by round with the following steps.

Step 1: The covers are first divided into mini-batches, e.g., 8 cover images per mini-batch, which are sequentially put into the reconstructed generator G. For any cover X, the embedding probability map P is generated by the reconstructed generator G.

Step 2: By combining the embedding probability map P and a random matrix R with entries ranging from 0 to 1, the modification map M is generated by using the double-tanh function mentioned in [25]:

    M = T(P, R)                                                               (11)

where T(·, ·) represents the double-tanh function.

Step 3: Generate the stego image Y by applying the modification map M to the cover image X:

    Y = X + M                                                                 (12)

Step 4: Repeating Steps 2–3, a mini-batch is sequentially processed to generate the corresponding stego images. Then, both the cover images and their stego images are put into the discriminator D.

Step 5: Alternately update the parameters with a mini-batch gradient descent optimizer to minimize the losses of the generator and the discriminator; the GAN stops training when the loss functions attain the convergence criteria.

Notably, the specific number of training rounds is not fixed; it needs to be adjusted adaptively according to the above-mentioned convergence criteria during the training process. In addition, the double-tanh function is mainly used to convert the probability map into the modification map, which is finally used to generate the corresponding stego image by adding it to the cover image.

3.7. Practical steganography with STC framework

In this section, we present the actual steganographic procedure using the reconstructed GAN network. Essentially, the GAN-based steganography model is still based on the distortion-minimizing framework, whose main goal is to design the distortion function that determines the distortion distribution over all pixels. However, since the GAN network can only output the optimal embedding probabilities of all pixels through a series of adversarial training, before performing practical steganography the embedding probability map must be converted into the actual distortion (also called the embedding cost). Accordingly, practical steganography is easily achieved by using the STC framework [8].

Figure 4 presents a simple data embedding and extraction procedure using our reconstructed generator with the cross feedback mechanism. When the GAN network is fully trained, the generator G can output the embedding modification probability p_{i,j} of each pixel x_{i,j}, which is used to calculate the actual steganographic distortion (embedding cost) ρ_{i,j} by the following equation:

    ρ_{i,j} = ln(1/p_{i,j} − 2),  p_{i,j} ∈ (0, 0.5)                          (13)

With the STC encoder, the secret messages can be embedded into the cover image with the embedding cost map and a secret key. Correspondingly, due to the symmetry of the STC framework, the receiver can extract the secret message correctly with the secret key.

4. Experimental results and discussion

In order to validate the proposed scheme and evaluate its security performance, we implement a series of experiments with the following settings.

4.1.1. Image sources and pre-processing

Three classical image sources, BOSSbase v1.01¹, BOWS2² and Place365-Standard³, are used to evaluate the performance of the proposed scheme. The former two contain 10,000 images each, which are down-sampled with the Matlab "resize" function, while the latter contains a huge number of shared pictures, from which the standard high-resolution image database including 140,000 pictures (MD5: 41a4b6b724b1d2cd862fb3871ed59913) is downloaded in batch. In our experiments, we convert all experimental images to 8-bit grayscale and central-crop them to 256 × 256. Notably, we do not consider the color scenario, because our scheme can also be easily applied to color images if the three pixel channels of a color image are processed simultaneously.

4.1.2. Tested steganographic and steganalysis schemes

In our experiments, several state-of-the-art steganographic schemes, WOW [11], SUNIWARD [12], MiPOD [13], HILL [14], ASDL-GAN [24], UT-GAN [25], R-GAN [27] and SPAR-RL [28], are tested in a series of experiments.

¹ BOSSbase v1.01. Available from: http://www.dde.binghamton.edu/download/.
² BOWS2. Available from: http://bows2.ec-lille.fr/.
³ Place365. Available from: http://www.places2.csail.mit.edu/.
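Eqs. (7)–(9) and (13) map a learned change-probability map to its payload capacity and to STC embedding costs. Purely as an illustrative sketch (the function names are ours, not the authors' code):

```python
import numpy as np

def ternary_probs(p):
    """Split the change probability p in (0, 0.5) per Eqs. (7)-(8):
    returns (p(+1), p(-1), p(0))."""
    return p / 2.0, p / 2.0, 1.0 - p

def capacity_bits(p_map):
    """Actual embedding capacity C of Eq. (9): the ternary entropy, in bits,
    summed over the whole probability map."""
    pp, pm, p0 = ternary_probs(np.asarray(p_map, dtype=float))
    h = lambda q: -q * np.log2(q)
    return float(np.sum(h(pp) + h(pm) + h(p0)))

def embedding_cost(p):
    """Convert a learned probability into the STC embedding cost of Eq. (13)."""
    return np.log(1.0 / np.asarray(p, dtype=float) - 2.0)
```

The entropy loss of Eq. (6) then compares `capacity_bits(P)` with the target payload H × W × Q, and Eq. (13) is monotonically decreasing in p: pixels with a higher change probability receive a lower embedding cost.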
Among these schemes, WOW, SUNIWARD, MiPOD and HILL are four traditional hand-crafted distortion based steganographic schemes, while ASDL-GAN, UT-GAN, R-GAN and SPAR-RL are deep learning distortion based steganographic schemes. We define the relative payload as the message length divided by the size of the cover image and denote it as bits per pixel (bpp for short). The secret messages are embedded into each cover at the above relative payloads.

In order to demonstrate the advantage of our proposed steganographic method, we carry out a series of experiments using different steganalysis schemes, including SRM [15], maxSRMd2 [16], Xu-Net [18], Ye-Net [19], Yedroudj-Net [20] and Deng-Net [21]. In general, SRM and maxSRMd2 are two traditional hand-crafted rich-feature-based steganalytic schemes, in which image residuals from different high-pass filters are used to form high-order co-occurrence matrix features and classification is implemented with an ensemble classifier [26], while Xu-Net, Ye-Net, Yedroudj-Net and Deng-Net are four deep learning-based steganalytic schemes, in which multiple convolutions are constructed as convolutional masks with high-pass filters, and different activation functions are also designed to boost the detection performance.

In each experiment, the experimental images are divided into three parts, a training set, a validation set, and a testing set, whose proportion is kept at 5 : 1 : 4. The overall security performance [26], denoted as the detection error rate P̄_E over the testing set, is calculated by combining the false alarm rate P_FA and the missed detection rate P_MD as follows:

    P̄_E = min_{P_FA} (1/2)(P_FA + P_MD(P_FA))                                 (14)

4.2. Evaluation for the proposed cross feedback mechanism

Note that our scheme reconstructs the U-NET structure of the generator by introducing a cross feedback mechanism between the contraction path and the expansion path, which can achieve pixel-level fusion and improve the learning capability for the distortion measurement of individual pixels. In order to verify the effectiveness of the cross feedback mechanism, we test the learning capability of the GAN network before and after using it.

Since the proposed cross feedback mechanism can optimize the U-NET structure of the generator in the GAN network, we first employ the classical deep learning distortion model based steganographic scheme, UT-GAN [25], to evaluate the performance before and after using the cross feedback mechanism. Two state-of-the-art traditional steganographic algorithms, SUNIWARD and HILL, are also used to provide a reference. We carry out a series of experiments to test the modification map for steganography at a payload of 0.40 bpp. Each experiment is run for 20,000, 60,000 and 120,000 iterations, respectively, and the corresponding experimental results are shown in Fig. 5. As can be seen from this figure, with the increase of training rounds, the embedding modifications of the proposed scheme (corresponding to Fig. 5(g)–(i)) become more and more concentrated, and the outline of the image becomes clearer. This matches the expectation that the modification map should always be concentrated in texture areas. Since these comparisons only add the cross feedback steps while keeping the other parameters unchanged, they conclusively demonstrate that the proposed cross feedback mechanism obtains a faster convergence speed and produces a clearer modification map. In fact, this phenomenon can easily be explained by two reasons. First, adding the cross feedback procedure between the contraction path and the expansion path significantly reduces the loss of detailed information in the texture feature concatenation process. Second, since the neurons of the contraction path are more sensitive to feedback information, the learning capability of the generator can be improved significantly, resulting in a more accurate modification map.

4.3. Comparison with the state of the arts

In this section, we carry out a series of experiments to present the overall advantages of our scheme.
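The security metric of Eq. (14) can be computed as below; this is a minimal NumPy sketch under our own assumption that the steganalyzer's ROC is given as paired false-alarm and missed-detection rates, one pair per decision threshold:

```python
import numpy as np

def detection_error(p_fa, p_md):
    """Overall detection error P_E of Eq. (14): the minimum, over decision
    thresholds, of the averaged false-alarm and missed-detection rates.
    A value near 0.5 means the steganalyzer is close to random guessing."""
    p_fa = np.asarray(p_fa, dtype=float)
    p_md = np.asarray(p_md, dtype=float)
    return float(np.min(0.5 * (p_fa + p_md)))
```

Since the steganographer wants to evade detection, a higher P̄_E means a more secure scheme, which is why the tables below report P̄_E for each steganalyzer and payload.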
Fig. 5. Illustration of modification maps at 0.4 bpp for the BOSSbase cover image "1013.pgm". (a) Cover. (b) Modification map of S-UNIWARD. (c) Modification map of HILL. (d)–(f) Modification maps of UT-GAN after 20,000, 60,000, and 120,000 iterations, respectively. (g)–(i) Modification maps of UT-GAN with the cross feedback mechanism (CF-UT-GAN) after 20,000, 60,000, and 120,000 iterations, respectively.
Four traditional hand-crafted distortion based steganographic schemes, e.g., WOW, SUNIWARD, MiPOD and HILL, and four recently-proposed deep learning based steganographic schemes, e.g., ASDL-GAN, UT-GAN, R-GAN and SPAR-RL, are successively tested to compare the proposed scheme with the state of the art.

We first evaluate the proposed scheme against the traditional hand-crafted distortion based steganographic schemes. Since traditional steganographic schemes are consistently verified over the classical BOSSbase v1.01 database, the corresponding experiments are also implemented on this dataset to build a fair comparison. The performance P̄_E is evaluated by the 34,671-dimensional SRM [15] and maxSRMd2 [16] steganalyzers with ensemble classifiers [26]. Each experiment is performed for 10 rounds to obtain the average detection error rate, and five relative payloads from 0.10 bpp to 0.50 bpp are used to provide more insight. The overall detection error rates for the five steganographic algorithms are presented in Fig. 6. As can be seen from this figure, the P̄_E of the proposed scheme always shows a substantial improvement whatever the payload is. This implies that the proposed scheme provides more secure steganography against the same steganalyzer. We also present the actual experimental results in Table 3. In general, the proposed scheme always achieves the highest P̄_E among the five schemes. Specifically, compared with the HILL scheme (the second best method in this table), the average gain is about 1.1%–2.0% at 0.1 bpp and 1.1%–2.3% at 0.5 bpp, no matter which steganalyzer is used. These results demonstrate that, compared with traditional hand-crafted distortion based steganography, our scheme can significantly improve the security performance against rich-model based steganalyzers. For example, for maxSRMd2 at a 0.4 bpp payload, the proposed scheme achieves a 2.54% higher detection error rate than WOW, and 2.68%, 1.29% and 0.78% higher than S-UNIWARD, MiPOD and HILL, respectively.

To gain more insight, we also compare against the recently-proposed deep learning based distortion-minimizing steganography. Four deep learning based steganalyzers, Xu-Net [18], Ye-Net [19], Yedroudj-Net [20] and Deng-Net [21], are utilized to verify the overall performance. Since such steganalyzers require massive numbers of images for model training, we randomly select 90,000 JPEG color images from the Places365 standard database and transform them into 256 × 256 gray images to build a fair comparison. Notably, the training images could be set to a larger size, but too large a size may consume excessive GPU resources. Then, the training set is divided into two parts. One part, including 40,000 images, is used to train the steganographic modification map, and the other part, including 50,000 images, is used to generate the stego images. Accordingly, there are in total 50,000 cover-stego pairs, which are subsequently segmented into a training set (including 40,000 cover-stego pairs) and a testing set (including 10,000 cover-stego pairs). In addition, for the different GAN networks, we fix the Adam optimizer with a learning rate of 0.0001 and run 120,000 iterations to train each model.
Fig. 6. Anti-steganalysis performance comparison between the proposed scheme and four hand-crafted distortion-based schemes, WOW, SUNIWARD, HILL and MiPOD. The experiments are implemented over the BOSSbase image database and five payloads are used.
Table 3
Overall detection error rate of different steganographic schemes with the STC embedding technology over BOSSbase. Two steganalyzers, SRM and maxSRMd2, are used in this experiment.

Steganalyzer    Scheme     0.10 bpp        0.20 bpp        0.30 bpp        0.40 bpp        0.50 bpp
SRM [15]        WOW        .4236 ± .0037   .3645 ± .0042   .2933 ± .0029   .2557 ± .0031   .2031 ± .0033
                SUNI       .4276 ± .0024   .3657 ± .0031   .2921 ± .0037   .2513 ± .0025   .2011 ± .0028
                MiPOD      .4293 ± .0026   .3708 ± .0025   .3096 ± .0035   .2724 ± .0018   .2219 ± .0024
                HILL       .4333 ± .0032   .3762 ± .0039   .3177 ± .0024   .2784 ± .0031   .2271 ± .0031
                Proposed   .4423 ± .0035   .3859 ± .0029   .3362 ± .0033   .2950 ± .0042   .2482 ± .0025
maxSRMd2 [16]   WOW        .3541 ± .0032   .2893 ± .0020   .2411 ± .0027   .2037 ± .0032   .1624 ± .0027
                SUNI       .3578 ± .0031   .2917 ± .0028   .2414 ± .0025   .2023 ± .0030   .1627 ± .0032
                MiPOD      .3592 ± .0029   .2963 ± .0032   .2527 ± .0041   .2162 ± .0028   .1748 ± .0037
                HILL       .3687 ± .0025   .3052 ± .0037   .2582 ± .0033   .2213 ± .0025   .1857 ± .0031
                Proposed   .3850 ± .0031   .3116 ± .0027   .2642 ± .0020   .2291 ± .0028   .1963 ± .0029
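The per-payload gains of the proposed scheme over HILL can be recomputed directly from the Table 3 means. A small arithmetic sketch (the dictionaries simply transcribe the table rows; the variable names are introduced here for illustration):

```python
# Per-payload gains of the proposed scheme over HILL, recomputed from the
# Table 3 means (payloads 0.10-0.50 bpp); dictionaries transcribe the table.
hill = {"SRM":      [.4333, .3762, .3177, .2784, .2271],
        "maxSRMd2": [.3687, .3052, .2582, .2213, .1857]}
proposed = {"SRM":      [.4423, .3859, .3362, .2950, .2482],
            "maxSRMd2": [.3850, .3116, .2642, .2291, .1963]}

gains = {k: [round(p - h, 4) for p, h in zip(proposed[k], hill[k])] for k in hill}
print(gains["SRM"])       # absolute gain in P_E per payload
print(gains["maxSRMd2"])
```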
Table 4
Overall detection error rate of different steganographic schemes over the Places365-standard image database. Four steganalyzers, Xu-Net, Ye-Net, Yedrouj-Net and Deng-Net, are used in this experiment and five payloads are used.

Steganalyzer       Scheme     0.10 bpp        0.20 bpp        0.30 bpp        0.40 bpp        0.50 bpp
Xu-Net [18]        ASDL-GAN   .4497 ± .0024   .3877 ± .0035   .3119 ± .0029   .2931 ± .0033   .2347 ± .0023
                   UT-GAN     .4651 ± .0029   .4134 ± .0034   .3767 ± .0030   .3253 ± .0020   .2912 ± .0027
                   R-GAN      .4692 ± .0027   .4198 ± .0032   .3798 ± .0026   .3329 ± .0041   .2967 ± .0036
                   SPAR-RL    .4697 ± .0033   .4213 ± .0025   .3821 ± .0034   .3348 ± .0041   .2996 ± .0027
                   Proposed   .4783 ± .0033   .4267 ± .0022   .3863 ± .0028   .3393 ± .0042   .3038 ± .0026
Ye-Net [19]        ASDL-GAN   .4360 ± .0031   .3894 ± .0026   .3074 ± .0025   .2886 ± .0026   .2183 ± .0033
                   UT-GAN     .4587 ± .0036   .4117 ± .0028   .3669 ± .0032   .3210 ± .0034   .2857 ± .0025
                   R-GAN      .4636 ± .0027   .4156 ± .0032   .3710 ± .0028   .3287 ± .0038   .2914 ± .0025
                   SPAR-RL    .4655 ± .0028   .4189 ± .0031   .3723 ± .0027   .3320 ± .0023   .2936 ± .0030
                   Proposed   .4747 ± .0032   .4243 ± .0022   .3837 ± .0020   .3361 ± .0036   .3013 ± .0024
Yedrouj-Net [20]   ASDL-GAN   .4007 ± .0032   .3511 ± .0041   .2986 ± .0033   .2511 ± .0028   .1982 ± .0040
                   UT-GAN     .4286 ± .0027   .3758 ± .0035   .3166 ± .0031   .2683 ± .0028   .2224 ± .0024
                   R-GAN      .4352 ± .0034   .3792 ± .0039   .3193 ± .0026   .2731 ± .0043   .2279 ± .0033
                   SPAR-RL    .4385 ± .0031   .3798 ± .0028   .3221 ± .0030   .2754 ± .0018   .2316 ± .0026
                   Proposed   .4448 ± .0034   .3844 ± .0026   .3271 ± .0020   .2796 ± .0042   .2345 ± .0027
Deng-Net [21]      ASDL-GAN   .3562 ± .0036   .2931 ± .0029   .2311 ± .0036   .1771 ± .0030   .1652 ± .0028
                   UT-GAN     .3877 ± .0028   .3209 ± .0041   .2587 ± .0035   .2093 ± .0034   .1921 ± .0026
                   R-GAN      .3865 ± .0034   .3237 ± .0041   .2611 ± .0022   .2132 ± .0043   .1982 ± .0032
                   SPAR-RL    .3932 ± .0035   .3291 ± .0028   .2657 ± .0037   .2207 ± .0018   .2011 ± .0027
                   Proposed   .3954 ± .0034   .3335 ± .0027   .2711 ± .0029   .2223 ± .0042   .2067 ± .0026
The corresponding experimental results are shown in Fig. 7 and Table 4. As can be observed from Fig. 7, although the improvement may seem a minor alteration, the proposed scheme always obtains the highest P̄E compared with existing deep learning-based schemes, whichever steganalyzer is used. We believe that these benefits come from the introduction of the cross feedback mechanism, because more detailed information can be preserved in the texture features concatenation process to generate a more accurate modification map, which finally guides more secure steganographic embedding. Moreover, we can also observe the actual results from Table 4. Under the steganalyzer Xu-Net with payload 0.40 bpp, the detection error rate P̄E of the proposed scheme is higher than that of ASDL-GAN with a gain of 4.62%, and achieves 1.40%, 0.76% and 0.45% higher P̄E than UT-GAN, R-GAN and SPAR-RL, respectively.
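The Xu-Net comparison at 0.40 bpp can be cross-checked against the Table 4 means with a few lines (the values are transcribed from the table; the variable names are illustrative only):

```python
# Gains of the proposed scheme at 0.40 bpp under Xu-Net, in percentage
# points, recomputed from the Table 4 means (values transcribed).
pe_040 = {"ASDL-GAN": .2931, "UT-GAN": .3253, "R-GAN": .3329,
          "SPAR-RL": .3348, "Proposed": .3393}
gains = {name: round((pe_040["Proposed"] - pe) * 100, 2)
         for name, pe in pe_040.items() if name != "Proposed"}
print(gains)  # gain over each baseline, in percentage points of P_E
```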
Fig. 7. Anti-steganalysis performance comparison between the proposed scheme and four deep learning-based schemes, ASDL-GAN, UT-GAN, R-GAN and SPAR-RL. The experiments are implemented over the Places365-standard image database and five payloads are used.
Interestingly, it seems that several steganographic schemes are more secure when Xu-Net is used (corresponding to a higher P̄E than with the other steganalyzers). This is mainly because the generators of these steganographic schemes are trained using Xu-Net as the discriminator. Indeed, when Deng-Net is used as the steganalyzer, the detection error rates P̄E of the five steganographic schemes show a significant reduction. Nevertheless, the proposed scheme still obtains the highest P̄E among them. This also implies that the proposed scheme outperforms existing deep learning-based schemes, no matter which steganalyzer is used.

4.4. Cover source mismatch testing

Since the proposed scheme belongs to GAN-based steganography, it needs massive images to train the distortion model. Accordingly, if the training images do not come from the same source as the testing set, GAN-based steganography may encounter the cover source mismatch problem. In this section, we further evaluate the robustness of the different steganographic schemes when facing cover source mismatch.

To present the performance under cover source mismatch, we first employ the downloaded Places365-standard image database (randomly selecting 90,000 images) to train the distortion model and form five different steganographic schemes: ASDL-GAN, UT-GAN, R-GAN, SPAR-RL and the proposed scheme. Then, we evaluate the overall detection error rate by combining SRM with the ensemble classifier over BOSSbase v1.01. The corresponding results are shown in Table 5. We can see from this table that the proposed scheme still keeps the best performance with respect to the existing deep learning-based schemes, although the overall security performance shows a slight reduction compared with the case where the training and testing sets come from the same image source. In addition, to gain more insights, we also use BOSSbase v1.01 (expanding this set to 40,000 images) to train the distortion model and form another five steganographic schemes, WOW, SUNIWARD, MiPOD, HILL and the proposed scheme, and then evaluate them over the BOWS2 database. The corresponding results are shown in Table 6. As can be observed, the proposed scheme still works well across different-source datasets and presents superior performance compared with the state-of-the-art steganographic methods, for both traditional and GAN-based steganography. This demonstrates again that the proposed scheme obtains effective anti-steganalysis performance; meanwhile, even if the training set and the testing set are not from the same source, the proposed scheme still shows stronger robustness than the other schemes.

4.5. Complexity for different schemes

GAN-based steganography is always time-consuming during the training procedure. To test the computing efficiency, we investigate the complexity of several steganographic schemes under different running conditions. To build a fair comparison, all training procedures for the different steganographic schemes are implemented in the same experimental environment, that is, a DELL Workstation with an Intel Core i7 CPU, 16GB RAM and an RTX2080 GPU. Since these steganographic schemes converge well with high iteration counts, we set the stop condition as 120,000 training iterations (72 epochs).

The testing results are shown in Table 7. As can be seen from this table, the training time of the ASDL-GAN scheme is significantly longer than that of our scheme, approximately 3.06 times. This is because the ASDL-GAN scheme has a deeper and wider generator, leading to higher time consumption. In addition, although our scheme borrows the structure of UT-GAN, it has a lower training time than UT-GAN, saving approximately 0.19 h. We believe that this interesting phenomenon indeed benefits from the proposed cross feedback mechanism. Cross feedback between the contraction path and the expansion path significantly reduces the loss of detailed information in the texture features concatenation process, and thus makes the texture information of the image be fully learnt during the training process. Accordingly, the convergence
Table 5
Cover source mismatch testing of four existing deep learning-based steganographic schemes (ASDL-GAN, UT-GAN, R-GAN, SPAR-RL) and the proposed scheme. In this test, the training set is Places365, while the testing set is BOSSbase v1.01. The steganalyzer is the combination of SRM and the ensemble classifier.

Scheme          0.10 bpp        0.20 bpp        0.30 bpp        0.40 bpp        0.50 bpp
ASDL-GAN [24]   .3723 ± .0023   .3228 ± .0035   .2646 ± .0036   .2262 ± .0030   .1721 ± .0021
UT-GAN [25]     .4146 ± .0025   .3529 ± .0031   .2982 ± .0044   .2471 ± .0036   .2030 ± .0020
R-GAN [27]      .4217 ± .0031   .3607 ± .0025   .3093 ± .0040   .2589 ± .0027   .2168 ± .0034
SPAR-RL [28]    .4227 ± .0026   .3622 ± .0035   .3081 ± .0023   .2573 ± .0031   .2172 ± .0028
Proposed        .4267 ± .0031   .3638 ± .0029   .3127 ± .0037   .2619 ± .0024   .2206 ± .0034
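From the Table 5 means one can recompute, per payload, the margin of the proposed scheme over the strongest competing baseline under cover source mismatch (a transcription-and-arithmetic sketch, not the authors' code):

```python
# Margin of the proposed scheme over the strongest baseline at each payload,
# recomputed from the Table 5 means (Places365-trained, tested on BOSSbase).
baselines = {"ASDL-GAN": [.3723, .3228, .2646, .2262, .1721],
             "UT-GAN":   [.4146, .3529, .2982, .2471, .2030],
             "R-GAN":    [.4217, .3607, .3093, .2589, .2168],
             "SPAR-RL":  [.4227, .3622, .3081, .2573, .2172]}
proposed = [.4267, .3638, .3127, .2619, .2206]

margins = [round(p - max(vals[i] for vals in baselines.values()), 4)
           for i, p in enumerate(proposed)]
print(margins)  # positive margin at every payload
```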
Table 6
Cover source mismatch testing of four existing traditional steganographic schemes (WOW, SUNIWARD, MiPOD, HILL) and the proposed scheme. In this test, the training set is BOSSbase v1.01, while the testing set is BOWS2. The steganalyzer is the combination of SRM and the ensemble classifier.

Scheme          0.10 bpp        0.20 bpp        0.30 bpp        0.40 bpp        0.50 bpp
WOW [11]        .4246 ± .0039   .3656 ± .0029   .2926 ± .0026   .2551 ± .0034   .2035 ± .0039
SUNIWARD [12]   .4241 ± .0030   .3650 ± .0039   .2923 ± .0035   .2523 ± .0021   .2018 ± .0024
MiPOD [13]      .4257 ± .0024   .3712 ± .0020   .3092 ± .0034   .2727 ± .0016   .2215 ± .0027
HILL [14]       .4336 ± .0028   .3746 ± .0034   .3146 ± .0021   .2773 ± .0030   .2276 ± .0038
Proposed        .4463 ± .0037   .3863 ± .0022   .3358 ± .0038   .2946 ± .0044   .2490 ± .0026
Table 7
Overall running time (hours) for our proposed scheme and three deep learning based schemes, measured on a DELL Workstation with an Intel Core i7 CPU, 16GB RAM and an RTX2080 GPU.
speed of the training procedure is accelerated while the accuracy of the distortion measurement can also be guaranteed, resulting in a significant improvement in the overall steganography performance.

5. Conclusions and future work

In this paper, we introduced a cross feedback mechanism into the GAN network to design a new GAN-based spatial steganographic scheme. In the GAN network, multiple cross feedback channels between the contraction path and the expansion path of the generator were built to directly send the down-sampling information to the expansion layer. By repeated iterations of the texture information returned during the training process, the detailed information from the deep layers of the GAN network can be captured effectively, and a more sophisticated probability map was thus generated to finally guide highly undetectable steganography. Compared to existing steganographic schemes, our scheme significantly improved the anti-steganalysis capability against different steganalyzers.

While our proposed scheme reconstructs the U-Net structure to design a more secure steganographic scheme and has shown better performance in terms of anti-steganalysis capability, we should note that our scheme is essentially a supervised learning method; that is, it still needs a large number of training samples to learn the optimized discriminant model. More importantly, due to the large amount of iterative learning, it always requires a longer training time than traditional supervised learning methods such as SVM and ensemble classifiers. This may be a common problem for GAN-based steganography. Nevertheless, this limitation need not be a major concern, because high-performance equipment offers a practical way to shorten the training time. Moreover, we only performed the simulation experiments over grayscale images, not color images, because we believe that the proposed scheme can be easily transferred to the color image scenario if the three pixel channels of a color image are processed simultaneously.

We plan to further investigate GAN-based steganography in two ways. First, the architecture of adversarial training is not yet fully understood. We will investigate the relationship between the generator and the discriminator in the GAN network to boost the security performance of our framework. Second, the proposed scheme is built over the spatial domain. Is it possible to transfer the GAN-based scheme to the JPEG domain? These two issues are left as our future research.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Fengyong Li: Conceptualization, Methodology, Writing – original draft. Zongliang Yu: Software, Validation. Chuan Qin: Supervision, Formal analysis.

Acknowledgments

This work was sponsored by the Natural Science Foundation of Shanghai (20ZR1421600, 21ZR1444600), the Natural Science Foundation of China (62172280, U20B2051) and the Shanghai Science and Technology Committee Capability Construction Project for Shanghai Municipal Universities (20060502300).

References

[1] J. Fridrich, Steganography in Digital Media: Principles, Algorithms, and Applications, Cambridge University Press, 2009.
[2] M. Yu, X. Yin, W. Liu, W. Lu, Secure halftone image steganography based on density preserving and distortion fusion, Signal Process. (2021) 108227, doi:10.1016/j.sigpro.2021.108227.
[3] F. Li, K. Wu, J. Lei, M. Wen, Z. Bi, C. Gu, Steganalysis over large-scale social networks with high-order joint features and clustering ensembles, IEEE Trans. Inf. Forensics Secur. 11 (2) (2015) 344–357, doi:10.1109/TIFS.2015.2496910.
[4] J. Yu, F. Li, H. Cheng, X. Zhang, Spatial steganalysis using contrast of residuals, IEEE Signal Process. Lett. 23 (7) (2016) 989–992, doi:10.1109/LSP.2016.2575100.
[5] X. Yu, K. Chen, Y. Wang, W. Li, W. Zhang, N. Yu, Robust adaptive steganography based on generalized dither modulation and expanded embedding domain, Signal Process. 168 (2020) 107343, doi:10.1016/j.sigpro.2019.107343.
[6] L. Zhu, X. Luo, C. Yang, Y. Zhang, F. Liu, Invariances of JPEG-quantized DCT coefficients and their application in robust image steganography, Signal Process. 183 (2021) 108015, doi:10.1016/j.sigpro.2021.108015.
[7] W. Liu, X. Yin, W. Lu, J. Zhang, J. Zeng, S. Shi, M. Mao, Secure halftone image steganography with minimizing the distortion on pair swapping, Signal Process. 167 (2020) 107287, doi:10.1016/j.sigpro.2019.107287.
[8] T. Filler, J. Fridrich, Minimizing additive distortion functions with non-binary embedding operation in steganography, in: 2010 IEEE International Workshop on Information Forensics and Security, 2010, pp. 1–6, doi:10.1109/WIFS.2010.5711444.
[9] T. Qiao, S. Wang, X. Luo, Z. Zhu, Robust steganography resisting JPEG compression by improving selection of cover element, Signal Process. 183 (2021) 108048, doi:10.1016/j.sigpro.2021.108048.
[10] T. Pevný, T. Filler, P. Bas, Using high-dimensional image models to perform highly undetectable steganography, in: International Workshop on Information Hiding, Springer, 2010, pp. 161–177, doi:10.1007/978-3-642-16435-4_13.
[11] V. Holub, J. Fridrich, Designing steganographic distortion using directional filters, in: 2012 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2012, pp. 234–239, doi:10.1109/WIFS.2012.6412655.
[12] V. Holub, J. Fridrich, T. Denemark, Universal distortion function for steganography in an arbitrary domain, EURASIP J. Inf. Secur. 2014 (1) (2014) 1–13, doi:10.1186/1687-417X-2014-1.
[13] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Secur. 11 (2) (2015) 221–234, doi:10.1109/TIFS.2015.2486744.
[14] B. Li, S. Tan, M. Wang, J. Huang, Investigation on cost assignment in spatial image steganography, IEEE Trans. Inf. Forensics Secur. 9 (8) (2014) 1264–1277, doi:10.1109/TIFS.2014.2326954.
[15] J. Fridrich, J. Kodovsky, Rich models for steganalysis of digital images, IEEE Trans. Inf. Forensics Secur. 7 (3) (2012) 868–882, doi:10.1109/TIFS.2012.2190402.
[16] T. Denemark, V. Sedighi, V. Holub, R. Cogranne, J. Fridrich, Selection-channel-aware rich model for steganalysis of digital images, in: 2014 IEEE International Workshop on Information Forensics and Security (WIFS), IEEE, 2014, pp. 48–53, doi:10.1109/WIFS.2014.7084302.
[17] Y. Qian, J. Dong, W. Wang, T. Tan, Deep learning for steganalysis via convolutional neural networks, in: Media Watermarking, Security, and Forensics 2015, vol. 9409, International Society for Optics and Photonics, 2015, p. 94090J, doi:10.1117/12.2083479.
[18] G. Xu, H.-Z. Wu, Y.-Q. Shi, Structural design of convolutional neural networks for steganalysis, IEEE Signal Process. Lett. 23 (5) (2016) 708–712, doi:10.1109/LSP.2016.2548421.
[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Secur. 12 (11) (2017) 2545–2557, doi:10.1109/TIFS.2017.2710946.
[20] M. Yedroudj, F. Comby, M. Chaumont, Yedroudj-net: an efficient CNN for spatial steganalysis, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 2092–2096, doi:10.1109/ICASSP.2018.8461438.
[21] X. Deng, B. Chen, W. Luo, D. Luo, Fast and effective global covariance pooling network for image steganalysis, in: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, 2019, pp. 230–234, doi:10.1145/3335203.3335739.
[22] R. Zhang, F. Zhu, J. Liu, G. Liu, Depth-wise separable convolutions and multi-level pooling for an efficient spatial CNN-based steganalysis, IEEE Trans. Inf. Forensics Secur. 15 (2019) 1138–1150, doi:10.1109/TIFS.2019.2936913.
[23] L. Li, W. Zhang, C. Qin, K. Chen, W. Zhou, N. Yu, Adversarial batch image steganography against CNN-based pooled steganalysis, Signal Process. 181 (2021) 107920, doi:10.1016/j.sigpro.2020.107920.
[24] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (10) (2017) 1547–1551, doi:10.1109/LSP.2017.2745572.
[25] J. Yang, D. Ruan, J. Huang, X. Kang, Y.-Q. Shi, An embedding cost learning framework using GAN, IEEE Trans. Inf. Forensics Secur. 15 (2019) 839–851, doi:10.1109/TIFS.2019.2922229.
[26] J. Kodovsky, J. Fridrich, V. Holub, Ensemble classifiers for steganalysis of digital media, IEEE Trans. Inf. Forensics Secur. 7 (2) (2011) 432–444, doi:10.1109/TIFS.2011.2175919.
[27] H. Wu, F. Li, X. Zhang, K. Wu, GAN-based steganography with the concatenation of multiple feature maps, in: International Workshop on Digital Watermarking, Springer, 2019, pp. 3–17, doi:10.1007/978-3-030-43575-2_1.
[28] W. Tang, B. Li, M. Barni, J. Li, J. Huang, An automatic cost learning framework for image steganography using deep reinforcement learning, IEEE Trans. Inf. Forensics Secur. 16 (2020) 952–967, doi:10.1109/TIFS.2020.3025438.
[29] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241, doi:10.1007/978-3-319-24574-4_28.