Abstract
The goal of video watermarking is to embed a message within a video file in a
way such that it minimally impacts the viewing experience but can be recovered
even if the video is redistributed and modified, allowing media producers to assert
ownership over their content. This paper presents RivaGAN, a novel architecture
for robust video watermarking which features a custom attention-based mechanism
for embedding arbitrary data as well as two independent adversarial networks which
critique the video quality and optimize for robustness. Using this technique, we are
able to achieve state-of-the-art results in deep learning-based video watermarking
and produce watermarked videos which have minimal visual distortion and are
robust against common video processing operations.
1 Introduction
Video watermarking is a set of techniques that aims to hide information in a video stream in such a
way that it is hard to remove or tamper with, all while preserving the quality and fidelity of the content.
Video watermarking allows content creators to prove ownership after distribution and enables movie
producers to identify leaks by embedding unique identifiers into preview copies of films [2]. Other
examples of applications include everything from identifying copyright infringement and embedding
tags for filtering content to automated broadcast monitoring for commercials.
Effective video watermarking is both invisible and robust. However, existing techniques rarely
achieve both of these goals at once. Invisible watermarking is much more challenging in videos than
in still images because perturbing frames independently may result in highly visible distortions such
as flickering. Also, classical watermarking techniques based on algorithms such as the discrete cosine
transform or discrete wavelet transform are typically not robust to video processing operations like
cropping and scaling. If the leaked video has undergone any of these geometric transformations, the
watermark may be destroyed.
The goal of this paper is to design a deep learning-based, multi-bit video watermarking process that
is both robust and invisible. We are motivated by the recent success of deep learning and adversarial
training methods in data hiding tasks, as shown by [32, 28]. In this paper, we propose a novel
architecture that goes beyond the standard convolutional layers and operations used in related deep
learning-based systems for image steganography and watermarking.
Our paper is organized as follows: Section 2 discusses related work in watermarking and steganogra-
phy, Section 3 introduces our approach to video watermarking, Section 4 presents some results on
benchmark datasets, and Section 5 provides additional insights into how our model functions.
3 RivaGAN
In this section, we introduce our model for robust video watermarking. Our goal is to encode a D-bit
data vector, where D ∈ {32, 64}, into an arbitrary video of T frames in such a way that (1) the bit
vector can be reliably recovered given one or more frames of the watermarked video, (2) there are no
visible distortions, (3) the watermark cannot be easily removed by watermark removal tools, and (4)
the watermark is robust against common video processing operations.
To achieve these goals, we design our architecture with two adversaries: a critic which evaluates the
quality of the watermarked video, and an adversary network which attempts to remove the watermark.
These two networks are trained alongside an encoder network, which adds the watermark to a video, and a decoder
network, which extracts the watermark. We present our architecture in Figure 2 and describe the
individual modules in Section 3.1.
In addition, we introduce a new mechanism for combining the data and image representations
which is more robust against common video processing operations and is easier to train. Currently,
existing approaches to deep learning-based data hiding operate by concatenating
the binary data to a feature map derived from the image and applying additional convolution layers to
generate the output. We propose a different attention-based mechanism (shown in Figure 1) which
learns a probability distribution over the data dimensions for each pixel and uses that distribution
to select the bits to pay attention to during the embedding process. This biases our model towards
learning to hide different bits in different objects and textures, making it easier to train and resulting
in robustness against operations such as cropping, scaling, and compression.
[Figure 1 graphic: the left panel, “Spatial Repetition”, shows a data bit naively repeated across the spatial dimensions; the right panel, “Attention”, shows (1) an attention mask for bit 1 and (2) an attention distribution for pixel 1.]
Figure 1: This figure shows the difference between what related deep learning-based approaches (left)
to this task use to represent their data and what our attention-based approach (right) uses. Unlike
existing approaches, which naively repeat the data across the spatial dimensions, we learn a probability
distribution over the data for each pixel (i.e. the attention distribution) and use that to generate a
more compact data representation. This operation also has the advantage of being interpretable as an
“attention mask”: we can see which bits each pixel is paying attention to and encourage the model to
pay attention to different bits based on the content of the image.
Notation. Let X ∈ R^{T×W×H×C} be a tensor and Y ∈ R^{C′} be a vector. Then let Cat : (X, Y) → Φ ∈ R^{T×W×H×(C+C′)} be the concatenation of X and Y, where Y is expanded to a T × W × H × C′ dimensional tensor.

Let ConvD→D′ : X → Φ be a 3D convolutional block that takes an input tensor X ∈ R^{T×W×H×D} and maps it to a feature tensor Φ ∈ R^{T×W×H×D′}, where T, W, and H are the time, width, and height dimensions respectively, and D and D′ are the feature depths. The convolutional block applies a 1 × K × K convolution kernel with K = 11, followed by a TanH activation and a batch normalization operation [12].

Let Pool : X → Φ be an adaptive mean pooling operation which takes an input tensor X ∈ R^{T×W×H×D} and maps it to a feature tensor Φ ∈ R^D by averaging over the T, W, and H dimensions.

Let LinearD→D′ : X ∈ R^{...×D} → Φ ∈ R^{...×D′} be a linear transformation of the last dimension of a tensor, so Φ and X have the same shape except for the last dimension, which changes from D to D′.
Finally, let V and V̂ be the original video and the watermarked video, both of which have the same length
and resolution T × W × H and use the RGB color space. Let M ∈ {0, 1}^{32} be the 32-bit watermark,
and let M̂ be the watermark recovered from V̂. Let T be the attention module, E the encoder,
D the decoder, C the critic, and A the adversary.
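To make the notation concrete, the following is a minimal PyTorch sketch of these building blocks. It assumes PyTorch's channel-first (N, C, T, H, W) layout rather than the (T, W, H, C) layout used above, and the names ConvBlock, spatial_temporal_pool, and cat_data are our own; this is an illustration of the definitions, not the released implementation.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv D->D': a 1 x K x K 3D convolution (K = 11) followed by TanH and batch normalization."""
    def __init__(self, d_in, d_out, k=11):
        super().__init__()
        self.conv = nn.Conv3d(d_in, d_out, kernel_size=(1, k, k), padding=(0, k // 2, k // 2))
        self.norm = nn.BatchNorm3d(d_out)

    def forward(self, x):  # x: (N, D, T, H, W)
        return self.norm(torch.tanh(self.conv(x)))

def spatial_temporal_pool(x):
    """Pool: average over the T, H, and W dimensions, returning an (N, D) tensor."""
    return x.mean(dim=(2, 3, 4))

def cat_data(x, y):
    """Cat: expand the vector y across the T, H, and W dimensions and concatenate it with x."""
    n, _, t, h, w = x.shape
    y = y.view(n, y.shape[1], 1, 1, 1).expand(-1, -1, t, h, w)
    return torch.cat([x, y], dim=1)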
3.1 Architecture
Attention. The attention module is a pair of convolutional layers shared between the encoder and
decoder. It takes the source frames, applies two convolutional blocks, and generates an attention
mask of size (T, W, H, D), where D is the data dimension and (T, W, H) are the time and spatial
dimensions. The attention mask allows the model to use the content of the
image at a particular location to determine which dimensions of the data vector to pay attention to.
As shown in Figure 2, the output is an attention mask where the vector at each pixel can be interpreted
as a multinomial distribution over the D data dimensions. The attention module can be formally
expressed as follows:
a = Conv3→32 (V )
b = Conv32→D (a) (1)
T (V ) = Softmax(b)
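A minimal PyTorch sketch of Equation (1), reusing the ConvBlock helper sketched in the notation section; the class name and tensor layout are our own assumptions, not the released code.

import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Equation (1): two conv blocks followed by a softmax over the data dimension."""
    def __init__(self, data_dim):
        super().__init__()
        self.conv1 = ConvBlock(3, 32)         # Conv 3->32
        self.conv2 = ConvBlock(32, data_dim)  # Conv 32->D

    def forward(self, frames):                # frames: (N, 3, T, H, W)
        a = self.conv1(frames)
        b = self.conv2(a)
        return torch.softmax(b, dim=1)        # per-pixel distribution over the D data bits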
The attention mechanism biases the model towards hiding data in textures and objects, which are less
affected by common video processing operations than lower-level features. In this way, it helps
encourage robustness against scaling and compression.
[Figure 2 graphic: tensor-level diagrams of the Attention, Encoder, and Decoder modules. The Attention module maps the (T, W, H, 3) video through two Conv blocks and a softmax over the data dimension to a (T, W, H, D) attention mask; the Encoder multiplies the attention mask by the data, concatenates the resulting (T, W, H, 1) tensor with the video, and applies two Conv blocks to produce the watermarked video; the Decoder applies two Conv blocks to the watermarked video, multiplies the (T, W, H, D) output by the attention mask, and pools it to a D-dimensional prediction.]
Figure 2: This figure shows how the attention, encoder, and decoder modules operate on a tensor
level. The attention module uses two convolutional blocks to create an attention mask, which is
then used by the encoder and decoder modules to determine which bits to pay attention to at each
pixel. The encoder module uses the attention mask to compute a compacted form of the data tensor
and concatenates it to the image before applying additional convolutional blocks to generate the
watermarked video. The decoder module extracts the data from each pixel and then weights the
prediction using the attention mask before averaging to try to recover the original data.
Empirically, we find that this attention-based approach achieves faster convergence and better performance
than other approaches such as concatenation or multiplication as in [32]. We compare our attention-based
approach to these competing approaches in Section 4.
Encoder. The encoder network is responsible for taking a fixed-length data vector and embedding
it into a sequence of video frames. The encoder uses the attention module to generate a compact
data tensor of shape (T, W, H, 1), where the D data dimensions have been reduced to a single real
value using the attention weights, and concatenates this compact data tensor to the image. It then
applies two convolutional blocks and generates a residual mask. We constrain the residual such that
an individual pixel can be perturbed by no more than ±0.01 and add the residual mask to the original
video to generate the watermarked output (we represent pixel intensities as floating point numbers in
the range [−1.0, 1.0] rather than as integers in the set {0, ..., 255}). It can be formally expressed as follows:

a = T(V) × M
b = Conv4→32(Cat(V, a))
c = Conv32→3(b)                                   (2)
V̂ = E(V, M) = V + 0.01 · TanH(c)
Consider an extreme example: for a given pixel, the attention module generates an attention vector
where all the values are 0 except in the first dimension, which is 1. In this case, the compacted data
vector would simply contain the first bit of the data. This operation allows the encoder to learn to pay
attention to different dimensions of the data conditioned on the content of the image at each pixel.
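Continuing the sketch (same assumptions as above, reusing ConvBlock and a shared AttentionModule instance), the encoder of Equation (2) might look as follows:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Equation (2): compact the data with the attention mask, concatenate it with the frames,
    and add a residual bounded to +/-0.01 per pixel."""
    def __init__(self, attention):
        super().__init__()
        self.attention = attention     # shared with the decoder
        self.conv1 = ConvBlock(4, 32)  # Conv 4->32
        self.conv2 = ConvBlock(32, 3)  # Conv 32->3

    def forward(self, frames, data):   # frames: (N, 3, T, H, W), data: (N, D) in {0, 1}
        mask = self.attention(frames)                                      # (N, D, T, H, W)
        a = (mask * data.view(*data.shape, 1, 1, 1)).sum(1, keepdim=True)  # compacted data, (N, 1, T, H, W)
        b = self.conv1(torch.cat([frames, a], dim=1))
        c = self.conv2(b)
        return frames + 0.01 * torch.tanh(c)                               # watermarked frames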
Decoder. The decoder network is responsible for taking a sequence of video frames and extracting
the watermark. As shown in Figure 2, we (1) attempt to extract all D bits of data from every location
in the video and (2) reuse the attention module from the encoder, performing what we refer to as an
“attention pooling" operation to aggregate over the spatial dimensions and generate a D-dimensional
prediction of the watermark bits. This operation can be formally expressed as:

a = Conv3→32(V̂)
b = Conv32→D(a)
c = T(V̂) × b                                      (3)
M̂ = D(V̂) = Pool(c)
This operation is designed to take advantage of the fact that if the encoder paid a lot of attention to
bit d at a particular pixel, then the value of that pixel is more likely to contain information about bit
d than about some arbitrary bit. Therefore, we weight the predictions generated by the decoder by the
amount of attention paid to each bit at each location, and take the average.
We note that the decoder module does not require access to the original source video since the
attention module is applied to the watermarked version of the video; as a result, this decoder satisfies
the criteria for our system to be classified as a blind video watermarking algorithm.
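A corresponding sketch of the decoder in Equation (3); as noted above, the attention mask is computed from the watermarked frames themselves, so no reference to the source video is needed:

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Equation (3): per-pixel bit predictions weighted by the attention mask
    ("attention pooling"), then averaged over space and time."""
    def __init__(self, attention, data_dim):
        super().__init__()
        self.attention = attention            # the same module used by the encoder
        self.conv1 = ConvBlock(3, 32)         # Conv 3->32
        self.conv2 = ConvBlock(32, data_dim)  # Conv 32->D

    def forward(self, watermarked):           # watermarked: (N, 3, T, H, W)
        a = self.conv1(watermarked)
        b = self.conv2(a)
        c = self.attention(watermarked) * b   # weight each pixel's prediction by its attention
        return spatial_temporal_pool(c)       # (N, D) soft estimate of the watermark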
Critic. The critic network is responsible for taking a sequence of video frames and detecting the
presence of a watermark. It encourages the encoder to watermark the video in such a way that the
distortion is less visible and can fool the critic. This module consists of two convolutional blocks,
followed by an adaptive spatial pooling layer and a linear classification layer which produces the
critic score.
Adversary. The adversary network attempts to imitate an attacker trying to remove the watermark.
Specifically, the adversary network is responsible for taking a sequence of video frames and removing
the watermark to generate a clean sequence of video frames. This module closely resembles the
encoder without the data tensor: it consists of two convolutional blocks followed by a linear layer
which generates a residual mask. We then apply a scaled TanH activation function
to constrain the maximum amount by which an individual pixel can be perturbed to ±0.01, and add
the residual mask to the watermarked video to generate the output.
3.2 Noise Layers

In order to encourage robustness against common video transforms, we apply several noise layers to
the watermarked video before it is passed to the decoder, forcing the encoder and decoder to learn
representations that are invariant to these transforms.
Scaling. The scaling layer is designed to re-scale the video to a random size where the width and
height are between 80-100% of the original. By inserting this noise layer between the encoder and
decoder, we ensure that our model learns to embed data bits in a scale-invariant manner.
Cropping. The cropping layer is designed to randomly select a sub-window that contains 80-100%
of the video frame. By inserting this noise layer between the encoder and decoder, we ensure that our
model learns to embed the data bits with sufficient spatial redundancy that cropping will not remove
the message.
Compression. The compression layer uses the discrete cosine transform (DCT) to provide a dif-
ferentiable approximation of video compression algorithms such as H.264 [25]. By converting the
video into the YCrCb color space, applying the 3D DCT transform, zeroing out 0-10% of the highest
frequency components, applying the inverse DCT transform, and then converting the video back
into the RGB color space, we can force our model to embed watermarks in a compression-resistant
manner.
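As an illustration, the scaling and cropping layers can be sketched as follows; the DCT-based compression layer is omitted, and the function names and the exact way the scale/crop fraction is sampled are our own choices rather than details from the paper.

import random
import torch.nn.functional as F

def random_scale(frames, min_frac=0.8):
    """Scaling layer: resize H and W to a random 80-100% of the original size."""
    n, c, t, h, w = frames.shape
    s = random.uniform(min_frac, 1.0)
    size = (t, max(1, int(h * s)), max(1, int(w * s)))
    return F.interpolate(frames, size=size, mode="trilinear", align_corners=False)

def random_crop(frames, min_frac=0.8):
    """Cropping layer: keep a random sub-window covering 80-100% of each frame."""
    n, c, t, h, w = frames.shape
    s = random.uniform(min_frac, 1.0)
    ch, cw = max(1, int(h * s)), max(1, int(w * s))
    y, x = random.randint(0, h - ch), random.randint(0, w - cw)
    return frames[..., y:y + ch, x:x + cw]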
3.3 Optimization
Loss Functions. In order to train the encoder E and decoder D in our video watermarking model, we
minimize the following loss functions. The cross-entropy loss between the bit vector and the decoded
data
Ld = EV,M [CrossEntropy(M, D(E(V, M )))]
The cross-entropy loss between the bit vector and the decoded data after the watermarked video
is processed by the non-differentiable MJPEG compression operation, which takes the sequence of
frames generated by the model, saves it to disk using the MJPEG compression format, and reads it
back for the decoder to process:

L∗d = EV,M [CrossEntropy(M, D(MJPEG(E(V, M))))]
Table 1: This table shows the results for a model trained to embed 32 bits of data with or without our
attention mechanism. We find that models trained with our attention masking and pooling operations
outperform models trained without it and are significantly more robust against geometric transforms.
The average PSNR of the attention-based models is 42.65 while the average PSNR of the models
without attention is 42.73.
Model MJPEG Cropped Scaled
No Attention 0.595 0.588 0.589
No Attention + Noise 0.973 0.970 0.915
Attention 0.997 0.981 0.985
Attention + Noise 0.997 0.995 0.987
The realism of the watermarked video according to the critic network
Lc = EV,M [C(E(V, M ))]
The cross-entropy loss between the bit vector and the data that is recovered from the watermarked
video after the adversary has tampered with it
La = EV,M [CrossEntropy(M, D(A(E(V, M ))))]
To optimize the critic C and adversary A modules, we also use the following loss functions. The
Wasserstein loss to distinguish between source and watermarked videos
Lw = EV [C(V )] − EV,M [C(E(V, M ))]
The negative cross-entropy loss to teach the adversary to remove the watermark
Lr = −EV,M [CrossEntropy(M, D(A(E(V, M ))))]
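A sketch of how these losses might be computed, using binary cross-entropy with logits as a stand-in for the CrossEntropy term and a caller-supplied mjpeg_roundtrip function for the non-differentiable save/reload step; all names here are assumptions, not the released API.

import torch.nn.functional as F

def watermark_losses(encoder, decoder, critic, adversary, frames, bits, mjpeg_roundtrip):
    wm = encoder(frames, bits)                                               # E(V, M)
    l_d = F.binary_cross_entropy_with_logits(decoder(wm), bits)              # L_d
    l_d_star = F.binary_cross_entropy_with_logits(
        decoder(mjpeg_roundtrip(wm)), bits)                                  # L*_d (no gradient through MJPEG)
    l_c = critic(wm).mean()                                                  # L_c
    l_a = F.binary_cross_entropy_with_logits(decoder(adversary(wm)), bits)   # L_a
    # critic / adversary objectives; wm is detached so these do not backpropagate into the encoder
    l_w = critic(frames).mean() - critic(wm.detach()).mean()                 # L_w
    l_r = -F.binary_cross_entropy_with_logits(
        decoder(adversary(wm.detach())), bits)                               # L_r
    return {"l_d": l_d, "l_d_star": l_d_star, "l_c": l_c, "l_a": l_a, "l_w": l_w, "l_r": l_r}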
Training Procedure. We optimize these loss functions using the Adam optimizer with an initial
learning rate of 10−3 which is decayed when the loss function plateaus; furthermore, we clip the
critic weights to [−0.1, 0.1] and train our model for 300 epochs. During the training stage, we use
standard data augmentation procedures including random horizontal flipping (where we flip all frames
in a given video) and random cropping (where we select a random sub-image from all frames). We
operate on batches of size N = 12 and our procedure for generating the batches involves selecting
N/2 videos from the training dataset and pairing each video with (1) a randomly generated bit vector
M and (2) its complement M̄. We refer to these paired samples as Hamming vector pairs to denote
that the two bit vectors differ in every bit, i.e. by the maximum possible Hamming distance. We find
that this procedure results in faster convergence and improves model performance significantly.
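A minimal sketch of this batch construction, assuming videos holds N/2 clips in a single tensor; the function name is our own.

import torch

def hamming_pair_batch(videos, data_dim=32):
    """Pair each of the N/2 sampled videos with a random bit vector and its bitwise complement."""
    half = torch.randint(0, 2, (videos.shape[0], data_dim)).float()  # one random vector per clip
    bits = torch.cat([half, 1.0 - half], dim=0)                      # append the complements
    frames = torch.cat([videos, videos], dim=0)                      # each clip appears twice
    return frames, bits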
Table 2: This table shows the video quality and watermarking accuracy when embedding D bits of
random data into videos from the test set. The MJPEG column indicates the accuracy obtained by
the decoder after the video is compressed, saved, and read back. The Cropped column indicates the
accuracy obtained after the video is randomly cropped down to 80% of its original size, compressed,
saved, and read back. Similarly, the Scaled column indicates the accuracy obtained after the video is
randomly scaled down to 80% of its original size, compressed, saved, and read back.
Quality Accuracy
Model D PSNR SSIM MJPEG Cropped Scaled
Attention 32 42.71 0.954 0.997 0.981 0.985
Attention + Noise 32 42.61 0.960 0.997 0.995 0.987
Attention + Noise + Critic 32 42.08 0.948 0.998 0.998 0.991
Attention + Noise + Critic + Adversary 32 42.05 0.960 0.992 0.988 0.981
Attention 64 42.20 0.944 0.993 0.980 0.961
Attention + Noise 64 42.22 0.953 0.971 0.966 0.917
Attention + Noise + Critic 64 42.06 0.945 0.991 0.989 0.961
Attention + Noise + Critic + Adversary 64 41.99 0.950 0.983 0.972 0.958
Furthermore, even when noise layers are used, we find that our attention-based models still outperform
concatenation-based approaches.
How effective is our approach? We show some examples of video frames in Figure 3 and note
that the watermarked video does not contain any noticeable artifacts. Our results are presented in
Table 2, which shows the video quality and our ability to recover the watermark for different model
configurations and video processing operations.
We find that when the watermarked video is transmitted without modification, the receiver is able
to decode the 32-bit watermark with above 95% accuracy in all cases. We note that this low error
rate can easily be compensated for through error correcting codes, allowing our system to be used in
real-world applications. Furthermore, we find that the cropping and scaling noise layers are effective
at encouraging robustness against the corresponding video processing operations. When these layers
are applied, the receiver is able to decode the watermark with approximately 99% accuracy despite
cropping and scaling.
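For illustration only, a toy repetition code shows how error correction can absorb a small residual bit-error rate; a practical deployment would likely use a more efficient code (e.g. BCH) and budget for the payload overhead. With an independent 1% bit-error rate, a 3x repetition code brings the per-bit error probability down to roughly 3 × 10^-4, at the cost of tripling the number of embedded bits.

def repetition_encode(bits, k=3):
    """Repeat each payload bit k times before it is embedded."""
    return [b for b in bits for _ in range(k)]

def repetition_decode(noisy_bits, k=3):
    """Majority-vote over each group of k recovered bits."""
    return [int(sum(noisy_bits[i:i + k]) > k / 2) for i in range(0, len(noisy_bits), k)]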
Can humans identify the watermarked video? To further establish the invisibility of our water-
marking scheme, we asked workers on the Mechanical Turk platform to watch a random selection of
videos and try to distinguish the source videos from the watermarked videos. For this experiment, we
generated pairs of source and watermarked videos for all 884 videos in our test set and asked workers
who possessed the “masters" qualification to review each pair and identify which video contained the
watermark.
We present the results of this experiment in Table 3 and note that the human workers are only slightly
better than random guessing. Furthermore, we find evidence to suggest that the critic module reduces
the visibility of the watermark, as the detection rate for watermarked videos generated by the critic
model is 5% lower than for those generated by the baseline models.

Table 3: This table shows the detection rate by workers on Mechanical Turk for a randomly selected
subset of test videos generated by each model.

Model Detection Rate
Attention + Noise 0.541
Attention + Noise + Critic 0.514
Attention + Noise + Critic + Adversary 0.515
5 Additional Insights
What does the watermark look like? Next, we examine where the watermark data is being hidden by
visually inspecting the residual that is generated by the encoder and added to the source video.
Figure 3: This figure shows the watermarked video (top) and the residual masks (bottom). The
residual masks were generated by the encoder module and added to the source video to produce the
watermarked video.
Figure 4: This figure shows the original source video and two example “difference masks” for the
first and second bit of the data tensor. Bright regions indicate that flipping a single bit caused that
pixel to change in the watermarked output. The three images on the top correspond to a model trained
with the attention mechanism and we note that the two difference masks look significantly different.
The three images on the bottom correspond to a model trained without the attention mechanism and
the two difference masks are virtually identical.
Figure 3 shows an example of a source video and the corresponding residuals. We note that the
residual values appear to be fairly evenly distributed across the frame.
How does changing a single bit change the watermark? Finally, we examine the impact of flipping
a single bit in the data tensor by examining the resulting “difference mask”. We compute each difference
mask by taking a fixed data tensor D1, embedding it in the video to generate a watermarked video W1,
changing a single bit in D1 to create D2, and embedding that in the video to generate a watermarked
video W2. Then, we visualize the difference between the two watermarked videos, |W1 − W2|, to
highlight the regions of the watermarked video that are affected by that particular bit.
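A sketch of this procedure, assuming an encoder with the interface from the earlier sketches:

import torch

def difference_mask(encoder, frames, bits, bit_index):
    """Embed a data vector and a copy with one bit flipped, then show where the outputs differ."""
    flipped = bits.clone()
    flipped[:, bit_index] = 1.0 - flipped[:, bit_index]
    with torch.no_grad():
        w1 = encoder(frames, bits)
        w2 = encoder(frames, flipped)
    return (w1 - w2).abs()  # bright regions are the pixels affected by that bit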
We perform this process with a randomly selected image for the first and second bits in our data
tensor and present the results in Figure 4. This figure provides evidence to support our hypothesis
that the attention mechanism allows our model to pay attention to different dimensions of the data
tensor depending on the content of the image as different bits appear to affect different parts of the
watermark. We observe that the difference masks for the two bits are significantly different in the
attention-based model but are not significantly different in the model without attention, suggesting
that this phenomenon can be attributed to the attention mechanism.
Do we need to use Hamming vector pairs? In our initial explorations, we trained our model by
iterating over all of the videos in our dataset and training our model to encode and decode a randomly
generated bit vector into each video. Despite experimenting with multiple optimizers, batch sizes,
and learning rates, we found that our model often failed to converge within a reasonable number of
epochs. This is shown in Figure 5, where the model trained with a high learning rate and without
Hamming vector pairs fails to converge.
[Figure 5 plots: training loss (left) and test accuracy (right) versus wall clock time for Hamming Vector (LR = 0.001), Hamming Vector (LR = 0.0001), No Hamming Vector (LR = 0.001), and No Hamming Vector (LR = 0.0001).]
Figure 5: This figure shows the training loss and test accuracy for the same model architecture, learning
rate, and optimizer, trained with and without Hamming vector pairs (the bit-inverse trick). We find that
including the bit inverse within the same batch results in dramatically faster convergence as well as
better model performance.
In order to overcome this instability, we introduced the concept of Hamming vector pairs and found
that the model converges significantly faster and, in the case of a high initial learning rate, achieves
higher test accuracy. We hypothesize that this is due to the fact that the gradients produced by
Hamming vector pairs are less noisy than the gradients produced by a simple random sample.
6 Conclusion
In this paper, we introduced a new class of attention-based architectures for data hiding tasks such
as steganography and watermarking which is superior to existing approaches such as [32] as it (1)
uses less memory, (2) is easier to train, and (3) is robust against common video processing operations
such as scaling, cropping, and compression. We demonstrated the effectiveness of our approach on
the video watermarking task, achieving near perfect accuracy with minimal visual distortion when
hiding an arbitrary 32-bit watermark into video files. Our code is publicly available and can be
found online at https://github.com/DAI-Lab/RivaGAN.
References
[1] H. O. Altun, A. Orsdemir, G. Sharma, and M. F. Bocko. Optimal spread spectrum watermark
embedding via a multistep feasibility formulation. IEEE Trans. on Image Processing, 18(2):371–
387, Feb 2009.
[2] M. Asikuzzaman and M. R. Pickering. An overview of digital video watermarking. IEEE
Transactions on Circuits and Systems for Video Technology, 28(9):2131–2153, Sep. 2018.
[3] Zhila Bahrami and Fardin Akhlaghian Tab. A new robust video watermarking algorithm based
on surf features and block classification. Multimedia Tools and Applications, 77(1):327–345,
2018.
[4] Shumeet Baluja. Hiding Images in Plain Sight: Deep Steganography. In Proc. of the Conf. on
Neural Information Processing Systems (NIPS), 2017.
[5] S. Biswas, S. R. Das, and E. M. Petriu. An adaptive compressed mpeg-2 video watermarking
scheme. IEEE Trans. on Instrumentation and Measurement, 54(5):1853–1861, Oct 2005.
[6] A. Ferdowsi and W. Saad. Deep learning-based dynamic watermarking for secure signal
authentication in the internet of things. In Proc. of the IEEE Int. Conf. on Communications
(ICC), pages 1–6, May 2018.
[7] Garima Gupta, V. K. Gupta, and Mahesh Chandra. Review on video watermarking techniques
in spatial and transform domain. In Suresh Chandra Satapathy, Jyotsna Kumar Mandal, Siba K.
Udgata, and Vikrant Bhateja, editors, Information Systems Design and Intelligent Applications,
pages 683–691, 2016.
[8] Frank Hartung and Bernd Girod. Watermarking of uncompressed and compressed video. Signal
Processing, 66(3):283–301, 1998.
[9] Jamie Hayes and George Danezis. Generating steganographic images via adversarial training.
In NIPS, 2017.
[10] Dajun He, Qibin Sun, and Qi Tian. A semi-fragile object based video authentication system. In
Proc. of the 2003 Int. Symposium on Circuits and Systems, volume 3, 2003.
[11] J. R. Hernandez, M. Amado, and F. Perez-Gonzalez. Dct-domain watermarking techniques for
still images: detector performance analysis and a new structure. IEEE Transactions on Image
Processing, 9(1):55–68, Jan 2000.
[12] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training
by Reducing Internal Covariate Shift. arXiv e-prints, page arXiv:1502.03167, Feb 2015.
[13] Nie Jie and Wei Zhiqiang. A new public watermarking algorithm for rgb color image based
on quantization index modulation. In 2009 Int. Conf. on Information and Automation, pages
837–841, June 2009.
[14] S. Kadu, C. Naveen, V. R. Satpute, and A. G. Keskar. Discrete wavelet transform based
video watermarking technique. In Proc. of the Int. Conf. on Microelectronics, Computing and
Communications (MicroCom), pages 1–6, Jan 2016.
[15] N. K. Kalantari and S. M. Ahadi. A logarithmic quantization index modulation for perceptually
better data hiding. IEEE Transactions on Image Processing, 19(6):1504–1517, June 2010.
[16] Haribabu Kandi, Deepak Mishra, and Subrahmanyam R.K. Sai Gorthi. Exploring the learning
capabilities of convolutional neural networks for robust image watermarking. Computers &
Security, 65:247 – 268, 2017.
[17] Ashish M. Kothari and Ved Vyas Dwivedi. Transform domain video watermarking: Design,
implementation and performance analysis. In Proc. of the Int. Conf. on Communication Systems
and Network Technologies, pages 133–137, 2012.
[18] Jung-Soo Lee and Whoi-Yul Kim. A new object-based image watermarking robust to geometri-
cal attacks. In Pacific-Rim Conference on Multimedia, pages 58–64. Springer, 2004.
[19] S. P. Maity and S. Maity. Multistage spread spectrum watermark detection technique using
fuzzy logic. IEEE Signal Processing Letters, 16(4):245–248, April 2009.
[20] Marcin Marszałek, Ivan Laptev, and Cordelia Schmid. Actions in context. In IEEE Conference
on Computer Vision & Pattern Recognition, 2009.
[21] Bijan G. Mobasseri and Domenick Cinalli. Reversible watermarking using two-way decodable
codes. In Proc. of the Int. Society for Optical Engineering, Security, Steganography, and
Watermarking of Multimedia (VI), pages 397–404, 2004.
[22] N. Mohaghegh and O. Fatemi. H.264 copyright protection with motion vector watermarking.
In Int. Conf. on Audio, Language and Image Processing, pages 1384–1389, July 2008.
[23] M. Noorkami and R. M. Mersereau. A framework for robust watermarking of h.264-encoded
video with controllable detection performance. IEEE Trans. on Information Forensics and
Security, 2(1):14–23, March 2007.
[24] S. Pereira, J. J. K. O. Ruanaidh, F. Deguillaume, G. Csurka, and T. Pun. Template based
recovery of fourier-based watermarks using log-polar and log-log maps. In Proc. of the IEEE
Int. Conf. on Multimedia Computing and Systems, volume 1, pages 870–874, June 1999.
[25] Iain E. Richardson. The H.264 Advanced Video Compression Standard. Wiley Publishing, 2nd
edition, 2010.
[26] Mathias Schlauweg, Dima Pröfrock, Benedikt Zeibich, and Erika Müller. Self-synchronizing
robust texel watermarking in gaussian scale-space. In Proceedings of the 10th ACM Workshop
on Multimedia and Security, MM&Sec ’08, pages 53–62, New York, NY, USA, 2008. ACM.
[27] M. D. Swanson, Bin Zhu, B. Chau, and A. H. Tewfik. Multiresolution video watermarking
using perceptual models and scene segmentation. In Proceedings of International Conference
on Image Processing, volume 2, pages 558–561 vol.2, Oct 1997.
[28] Matthew Tancik, Ben Mildenhall, and Ren Ng. Stegastamp: Invisible hyperlinks in physical
photographs. CoRR, abs/1904.05343, 2019.
[29] V. Vukotić, V. Chappelier, and T. Furon. Are Deep Neural Networks good for blind image
watermarking? In Proc. of the IEEE Int. Workshop on Information Forensics and Security
(WIFS), pages 1–7, Dec 2018.
[30] Xinyu Weng, Yongzhi Li, Lu Chi, and Yadong Mu. Convolutional video steganography with
temporal residual modeling. CoRR, abs/1806.02941, 2018.
[31] B. Yann, L. Nathalie, and D. Jean-Luc. A comparative study of different modes of perturbation
for video watermarking based on motion vectors. In Proc. of the 12th Euro. Signal Processing
Conf., pages 1501–1504, 2004.
[32] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. HiDDeN: Hiding Data With Deep
Networks. In Proc. 15th Euro. Conf. on Computer Vision (ECCV) Part XV, pages 682–697,
2018.
[33] Wenwu Zhu, Zixiang Xiong, and Ya-Qin Zhang. Multiresolution watermarking for images and
video. IEEE Trans. on Circuits and Systems for Video Technology, 9(4):545–550, June 1999.