Water Marking Audio Files With Copyrights
Farida Aboelezz
The German University in Cairo
All content following this page was uploaded by Farida Aboelezz on 11 June 2022.
Bachelor Thesis
(ii) due acknowledgement has been made in the text to all other material used.
–––––––––––––––––––––
Farida Aboelezz
June 6th, 2022
Abstract
The internet hosts billions of pirated audio files, including songs, audiobooks,
podcasts, and voice recordings. People who illegally upload audio files to the
internet disregard copyright law. Digital watermarking is a widely used
technology for copyright protection and content authentication. This thesis
addresses the business problem of protecting audio files from copyright
infringement through audio watermarking. An overview of audio files and digital
watermarking is presented, and the existing literature on audio watermarking
techniques is discussed and contrasted. An audio watermarking scheme is proposed,
and its performance is measured and compared with previously proposed and
implemented schemes.
Acknowledgement
–––––––––––––––––––––
Farida Aboelezz
June 6th, 2022
Contents
1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Organization
2.4.6 Time–Domain Properties
2.4.7 Frequency–Domain Properties
2.4.8 Basic Histogram Properties
2.4.9 Information Entropy
2.5 State–of–the–Art
2.6 Technology Comparison
4 Conclusions
References
List of Acronyms and Initialisms
2-D Two-Dimensional
ADC Analog-to-Digital Converter
BPS Bits Per Second
CC Creative Commons
CPTWG Copy Protection Technical Working Group
CF Crest Factor
DAC Digital-to-Analog Converter
DB Decibel
DCT Discrete Cosine Transform
DFT Discrete Fourier Transform
DFRST Discrete Fractional Sine Transform
DST Discrete Sine Transform
DWT Discrete Wavelet Transform
FFT Fast Fourier Transform
HAS Human Auditory System
IDFT Inverse Discrete Fourier Transform
IMF Intrinsic Mode Function
ISO International Organization for Standardization
KNN K-Nearest Neighbor
LSB Least Significant Bit
LWT Lifting Wavelet Transform
MPEG Moving Picture Experts Group
MSE Mean Squared Error
ODG Objective Difference Grade
PAPR Peak–to–Average Power Ratio
PSNR Peak Signal to Noise Ratio
P2P Peer-to-Peer
QIM Quantization Index Modulation
RMS Root Mean Square
SDMI Secure Digital Music Initiative
SIM Similarity
SNR Signal-to-Noise Ratio
SSIM Structural Similarity Index Measure
SD Standard Deviation
SV Singular Value
SVD Singular Value Decomposition
TC Temporal Centroid
ZC Zero Crossings
Chapter 1
Introduction
In this chapter, I introduce the thesis by explaining the motivation behind it,
the business problem that it attempts to solve, my contributions, and how the
thesis is organized.
1.1 Motivation
The online world has given rise to innovative ways to share art and create
digital works. Online streaming has created a multi-billion-dollar industry that
changed the way people listen to audio. However, it faces a major problem:
online piracy, the illegal copying and distribution of copyrighted content
without the consent of its owners. Because audio streaming websites are easily
accessible and have a massive worldwide user base, protecting audio files from
copyright infringement is extremely difficult.
The problem continues to grow, and only limited research has approached it using
digital watermarking. A digital watermark is a marker embedded in an audio,
video, text, or image file. Through audio watermarking, content creators can
prove ownership of their content and protect their copyrights.
1.2 Contributions
In this thesis, I explain the business problem of copyright infringement and how
digital watermarking helps to combat it. I also provide an extensive literature
review of 30 previously implemented audio watermarking schemes and compare them
with one another. I propose a double-layer message security scheme, with
cryptography in the first layer and steganography in the second layer,
implemented in Wolfram Mathematica®. The proposed scheme’s performance is
evaluated and compared with similar, previously implemented schemes.
1.3 Organization
Chapter 2
2.1 Copyright Infringement
As a creator, one has the right to distribute, modify, perform, and display
one’s works as one prefers. If someone else does any of these things with the
work without permission, they commit copyright infringement. According to the
U.S. Copyright Office, “copyright infringement occurs when a copyrighted work is
reproduced, distributed, performed, publicly displayed, or made into a
derivative work without the permission of the copyright owner”. Moreover, anyone
who knowingly induces or helps someone else to do so is guilty of contributory
infringement. Copyright laws provide severe civil and criminal penalties for
copyright infringement [1, 2, 3].
The exceptions to copyright infringement laws include fair use, the public
domain, Creative Commons licenses, and direct licensing. Fair use allows
copyrighted works to be used without permission or payment for a limited
purpose, such as education, commentary, criticism, or parody. For example,
quoting a paragraph from an author in an article that criticizes that author is
considered fair use. Copyright lasts for the lifetime of its owner plus 70
years; when the term expires, the work enters the public domain and may be used
by anyone in any way. Creative Commons (CC), an internationally active
non-profit organization, provides creators with licenses that communicate which
rights they reserve for themselves and which rights they grant to others.
Finally, the most direct route is to ask the creator for a license before using
a copyrighted work; if the creator agrees, the result is a direct license
[1, 2, 3].
The significance of this kind of copyright infringement lies in its negative
impact on different industries. The music industry, for example, has lost
billions of dollars to music piracy. While downloading one song may not feel
like a serious crime, the cumulative impact of downloading and streaming
millions of songs can be devastating. When a digital work is pirated and
distributed illegally online, its creator is not compensated. One credible study
by the Institute for Policy Innovation puts the annual harm of music piracy at
12.5 billion dollars in losses to the U.S. economy, more than 71,000 lost jobs,
and 2.7 billion dollars in lost wages to American workers [5, 6, 7].
In the following sections, I briefly define what an audio file is and give an overall
view of audio watermarking and some performance evaluation metrics of water-
marking schemes. I also review the existing literature for audio watermarking and
audio steganography. Finally, I suggest different technologies that can be used to
implement an audio watermarking scheme.
Before discussing audio watermarking, this section defines what an audio file is
and explains the basics of digital audio.
Figure 2.1: Audio signals conversion [8].
Audio signals represent sound in either analog or digital form. Analog signals
are continuous electrical signals, whereas digital signals are binary
representations. Digital audio is a technology used to record, store,
manipulate, generate, and reproduce sound using audio signals encoded in digital
form [8].
Figure 2.1 shows how an analog signal is converted into a digital signal for
digital storage and how a digital signal is converted back into an analog one
for output.
waveform. Each sample represents the intensity of the waveform at that instant.
The more samples taken, the better the representation and the higher the quality
of the digital audio. The samples are stored in binary (a numbering scheme in
which each digit has only two possible values, 0 and 1), like any digital data.
The samples are merged into a single data file in the correct format, and the
digital audio signal can then be transmitted or stored. Digital storage can be a
CD, an MP3 player, a hard drive, a USB flash drive, or any other digital data
storage device. Audio compression techniques reduce the file size so that
digital audio signals can be streamed easily to other devices [8].
For digital audio signals to be played back and heard, they must be converted
back to an analog signal by a digital-to-analog converter (DAC), which, like an
ADC, runs at a specific sampling rate and bit resolution [8].
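As a rough illustration of sampling and bit resolution, the following sketch samples a sine wave and quantizes it to 16-bit integers. The 440 Hz tone, the 8 kHz sampling rate, and the names `tone` and `quantize16` are assumptions made only for this example; they do not come from [8] or from the proposed scheme.

```mathematica
(* Hypothetical parameters chosen only for illustration *)
sampleRate = 8000;   (* samples per second *)
freq = 440;          (* tone frequency in Hz *)
numSamples = 80;     (* 10 ms of audio at 8 kHz *)

(* "Sample" a sine wave at the evenly spaced instants n/sampleRate *)
tone = Table[N[Sin[2 Pi freq n/sampleRate]], {n, 0, numSamples - 1}];

(* Quantize each value in [-1, 1] to a signed 16-bit integer *)
quantize16[x_] := Clip[Round[x*(2^15 - 1)], {-2^15, 2^15 - 1}]
samples = quantize16 /@ tone;

Length[samples]                              (* 80 *)

(* The second sample as 16 binary digits, offset by 2^15 so it is non-negative *)
IntegerDigits[samples[[2]] + 2^15, 2, 16]
```

A higher `sampleRate` gives more samples per second of audio, and a larger bit depth gives finer amplitude steps; both increase the size of the stored data.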
The main advantage of digital audio over analog is that computers can manipulate
numbers without error by performing calculations on the long list of numbers.
Computers can make perfect copies of that list, and thus of the audio file. They
can also combine different recordings by adding the numbers in one list to those
in another, producing a third list that contains both sounds at the same time.
This is known as digital mixing. Moreover, computers can manipulate the stream
of numbers in a digital audio file to make it quieter or louder, add effects,
play recordings at different speeds, remove noise or echo, and so on [8].
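The list operations described above can be sketched in a few lines. The two short "recordings" below are made-up sample values, and the names `voice`, `music`, and `mix` are hypothetical.

```mathematica
(* Two short hypothetical recordings as lists of sample values *)
voice = {0.10, -0.20, 0.30, 0.05};
music = {0.05, 0.10, -0.10, 0.20};

mix = voice + music;    (* digital mixing: element-wise sum of the two lists *)
quieter = 0.5*mix;      (* scaling every sample makes the audio quieter *)
copy = mix;             (* copying the list copies the audio exactly *)

copy === mix            (* True: the copy is identical to the original *)
```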
2.3 Audio Watermarking
Figure 2.2: Process of digital watermarking [13].
Figure 2.3: Process of digital audio watermarking [16].
How watermarking audio files works is demonstrated in Fig. 2.3. In the most
general form, audio watermarking hides a user-specified bit-stream (the water-
mark) in digital audio. The original audio and the bit-stream are both inputted
into the encoder with a secret key known only to the person creating the water-
mark, producing the watermarked digital audio. This key is used again with the
watermarked audio in order to extract and recover the watermark [16].
The next application of watermarking was introduced in the eighteenth century
to combat cash forgery. Later, in 1954, Emil Hembrooke filed a patent for
watermarking musical works, which became the basis for later advancements in
watermarking technology. Particularly after 1988, the term “computerized
watermarking” became known [18], although according to [11], the term “digital
watermarking” dates back to 1979.
The popularity of watermarking surged around the world after 1990, and many
organizations began to consider implementing it. The Secure Digital Music
Initiative (SDMI) started a watermarking framework for music, the Copy
Protection Technical Working Group (CPTWG) considered watermarking for DVDs, and
the International Organization for Standardization (ISO) showed interest in
watermarking for MPEG [17, 18]. By 1998, digital watermarking had become a
well-established technology [11].
2.3.2 Applications
Digital watermarking can be used to achieve various objectives other than copy-
right protection including fingerprinting, content authentication, and broadcast
monitoring [19, 20]. In this section, I discuss the different applications of audio
watermarking.
identifies the owner of the digital media in order to prevent others from
claiming it [19]. Hence, in copyright disputes, ownership of digital media can
be established through the embedded watermark [14]. If the watermark includes
copyright regulations related to the fair use of content, it also achieves the
objective of copy control, which is to prevent unauthorized parties from
modifying, copying, or redistributing the content without permission [12].
toring of content broadcast on different networks. This is referred to as
broadcast monitoring: an identification code is embedded as a watermark in the
work being broadcast, and a computer-based monitoring system detects this
watermark to verify whether the content was broadcast. It is especially useful
in the entertainment industry for checking whether content is broadcast
according to contracts with broadcasters [12, 14, 19].
2.3.3 Attacks
Even though watermarking can achieve the objectives of copyright protection and
content authentication, attackers may still attempt to manipulate the protected
media as well as the watermark. In this section, I define what an attack is in the
realm of audio watermarking and overview its different types and categories.
cannot be detected, false watermarks are detected, or unauthorized detection of
watermarks takes place. From this we can categorize different types of attacks,
which depend on the knowledge of the attacker, the tools at their disposal, and
the availability of watermarked versions of the same or different works [21].
Attacks that produce “no detection” of watermarks can be divided into removal
attacks, which erase the watermark from the watermarked data, and
desynchronization attacks, which misalign the watermark detector and the
watermark without removing the watermark information. Removal attacks can result
from normal signal processing operations, such as noise addition, resampling,
filtering, echo addition, and data compression, for which attackers need no
special knowledge of the underlying algorithms or of signal processing [10].
Specifically designed attacks, by contrast, rely on knowledge of the watermark
embedding mechanism, which lets the attacker find and exploit its weaknesses.
Collusion attacks are used when the attacker has no knowledge of the embedding
mechanism but has access to more than one watermarked work carrying the same
watermark, as long as the added watermark signal is not a function of the
original work. Finally, an attacker who has no knowledge of the embedding
algorithm and only one watermarked work, but who has access to a watermark
detector, can apply oracle attacks [21].
As a consequence, the detector will not be able to detect the watermark. These
attacks consist of global and local transformations, as well as scrambling attacks.
Examples of these types of attacks are Random Samples Cropping and Zeros In-
serting, Jittering, Pitch-invariant Time Stretching, and Tempo-preserved Pitch
Shifting [10, 21].
2.3.4 Properties
There are multiple properties to look for in a watermarking scheme. The basic
and most important requirements of any watermarking scheme are imperceptibility,
robustness, security, data payload, and computational complexity
[10, 11, 14, 19]. There are trade-offs between these properties, since many of
them conflict with one another [10]. The relative importance of each property
depends on the requirements of the application [19]. Table 2.1 summarizes the
key requirements of each watermarking application.
Table 2.1: Key requirements for each watermarking application.
Watermarking application     Prioritized parameter
Copyright protection         High security and imperceptibility
Content description          High data payload
Content authentication       Low robustness
Real–Time watermarking       Low computational complexity
cation of a watermarking system [19]. In some instances, a fragile watermark is
favored over a robust one [11].
The data payload is the number of bits that a watermark embeds per unit of
time. In other words, it describes how much data is embedded as a watermark for
effective detection [10, 14]. There is a trade-off between data payload and
robustness: higher payloads result in lower robustness and vice versa [10].
Different applications require different payloads; copy control applications,
for example, may need only a few bits embedded in the cover work [19].
The last property discussed in this section is the computational complexity, or
cost, of a watermarking scheme: the effort and time needed to embed and detect
the watermark [11, 19]. Complexity is directly related to the desired level of
security; stronger security requires more computation. However, where speed is
favored over security, as in real-time applications, lower computational
complexity is required [14].
As the above discussion shows, some properties are always prioritized and
others compromised, depending on the intended application of the watermarking
scheme. Every watermarking scheme is designed with these requirements in mind
and optimized to achieve its goal [14].
Performance evaluation metrics are used to measure the performance, speed and
effectiveness of a watermarking scheme. Moreover, statistical tests are carried out
to make sure a watermarking scheme is robust against attacks. In this section, I
highlight some of the metrics and tests that I will be using to evaluate the perfor-
mance and security of my proposed watermarking scheme.
2.4.1 Hearing Test
The Human Auditory System (HAS) is sensitive enough to be used for evaluating
differences between a cover audio and a watermarked audio. Therefore, the
simplest test for alterations introduced by watermarking is to listen
attentively to both files. A watermarking scheme is effective if there are no
audible differences between the two versions of an audio file.
Waveform plots of a cover audio file and a watermarked audio file can be
examined by eye to determine whether there are any differences. If the plots
look identical, this supports the effectiveness of the watermarking scheme used.
The Mean Squared Error (MSE) is calculated by comparing the samples of a cover
audio and a watermarked audio as follows
\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (c_i - s_i)^2 \tag{2.1} \]
where $N$ is the number of samples, $c_i$ is the sample value of the cover audio
and $s_i$ is the sample value of the watermarked audio. If the original audio
file and the watermarked audio file are totally identical, the MSE would be
equal to zero.
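Equation (2.1) translates almost directly into Mathematica. The following is a minimal sketch; the function name `mse` and the four-sample lists are hypothetical and are not taken from the implementation in Section 3.2.

```mathematica
(* Mean squared error between cover samples c and watermarked samples s *)
mse[c_List, s_List] /; Length[c] == Length[s] := Mean[(c - s)^2]

cover  = {100, -200, 300, 400};   (* hypothetical cover samples *)
marked = {101, -200, 299, 400};   (* two samples changed by 1 *)

mse[cover, marked]   (* (1 + 0 + 1 + 0)/4 = 1/2 *)
mse[cover, cover]    (* 0 for identical signals *)
```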
2.4.4 Peak Signal–to–Noise Ratio
The Peak Signal to Noise Ratio (PSNR) is calculated in decibels (dB) by dividing
the square of the maximum sample value of a cover audio by the MSE as follows
\[ \mathrm{PSNR} = 10 \times \log_{10} \frac{I_{\max}^2}{\mathrm{MSE}} \tag{2.2} \]
where $I_{\max}$ is the maximum value of the samples in the cover audio.
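As a worked example of Equation (2.2), the sketch below computes the PSNR for a tiny made-up signal. The names `mse` and `psnr` and the sample values are assumptions for illustration only.

```mathematica
mse[c_List, s_List] := Mean[(c - s)^2]

(* PSNR in dB: squared maximum cover sample divided by the MSE *)
psnr[c_List, s_List] := 10*Log10[Max[c]^2/mse[c, s]]

cover  = {100, -200, 300, 400};   (* hypothetical cover samples *)
marked = {101, -200, 299, 400};   (* two samples changed by 1 *)

N[psnr[cover, marked]]   (* 10 Log10[400^2/(1/2)], about 55.05 dB *)
```

Note that the PSNR is undefined when the two signals are identical, because the MSE in the denominator is then zero.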
Intensity properties of an audio file include power, root mean square (RMS) of
values, and loudness. Power is given in terms of the mean of the squared values
and the loudness is computed with Stevens’ power law.
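The power and RMS definitions above can be sketched as follows; the function names `power` and `rms` and the sample list are hypothetical. Loudness is left out because the exponent used with Stevens’ power law is not stated here.

```mathematica
power[x_List] := Mean[x^2]        (* mean of the squared values *)
rms[x_List] := Sqrt[Mean[x^2]]    (* root mean square *)

x = {3, -4, 3, -4};               (* hypothetical samples *)
power[x]     (* (9 + 16 + 9 + 16)/4 = 25/2 *)
N[rms[x]]    (* Sqrt[25/2], about 3.536 *)
```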
Time-domain properties of an audio file include the crest factor (CF), the peak to
average power ratio (PAPR), the temporal centroid (TC) values, and the number
of zero crossings (ZC). The CF is calculated as the maximum divided by the RMS
and the PAPR is calculated as the maximum power divided by the average power.
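A minimal sketch of three of these time-domain measures is given below. It assumes that “maximum” in the CF means the maximum absolute sample value and that a zero crossing is a sign change between consecutive non-zero samples; the function names are hypothetical.

```mathematica
crestFactor[x_List] := Max[Abs[x]]/Sqrt[Mean[x^2]]   (* peak divided by RMS *)
papr[x_List] := Max[x^2]/Mean[x^2]                   (* peak power divided by average power *)

(* Count sign changes between consecutive samples, ignoring exact zeros *)
zeroCrossings[x_List] :=
  Count[Differences[Sign[DeleteCases[x, 0 | 0.]]], Except[0]]

x = {1, -1, 2, -2};   (* hypothetical samples *)
N[crestFactor[x]]     (* 2/Sqrt[5/2], about 1.265 *)
papr[x]               (* 4/(5/2) = 8/5 *)
zeroCrossings[x]      (* 3 *)
```

Under these assumptions the PAPR is simply the square of the CF.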
maximum power spectrum divided by the mean of the power spectrum. The spec-
tral flatness is calculated as the geometric mean divided by the mean of the power
spectrum. The spectral kurtosis is computed as the kurtosis of the magnitude
spectrum [25]. The frequency below which most of the energy is concentrated is
the spectral roll off. The spectral spread is a measure of the bandwidth of the
power spectrum.
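The two power-spectrum ratios described above can be sketched with the built-in `Fourier` function. This is an illustrative assumption about how they could be computed, not the method used in the thesis; the names `powerSpectrum`, `peakToMean`, and `flatness` are hypothetical.

```mathematica
(* Power spectrum as the squared magnitude of the discrete Fourier transform *)
powerSpectrum[x_List] := Abs[Fourier[x]]^2

(* Maximum of the power spectrum divided by its mean *)
peakToMean[x_List] := With[{p = powerSpectrum[x]}, Max[p]/Mean[p]]

(* Geometric mean of the power spectrum divided by its mean *)
flatness[x_List] := With[{p = powerSpectrum[x]}, GeometricMean[p]/Mean[p]]

impulse = {1., 0., 0., 0., 0., 0., 0., 0.};   (* a single click has a flat spectrum *)
peakToMean[impulse]    (* approximately 1 *)
flatness[impulse]      (* approximately 1 *)
```

A noise-like signal gives a flatness close to 1, whereas a signal dominated by a few frequencies gives a flatness close to 0 and a large peak-to-mean ratio.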
The basic histogram properties of an audio file include the maximum value, the
minimum value, and statistical measures. Statistical measures include the mean,
median and the standard deviation (SD) of the values.
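These measures correspond to built-in Mathematica functions, as the following sketch on a made-up list of sample values shows.

```mathematica
x = {3, -1, 4, 1, -5};                  (* hypothetical sample values *)

{Max[x], Min[x], Mean[x], Median[x]}    (* {4, -5, 2/5, 1} *)
N[StandardDeviation[x]]                 (* sample standard deviation, about 3.58 *)
```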
2.5 State–of–the–Art
This section provides an overview of the latest advancements in digital audio
watermarking and steganography. The existing literature is reviewed, and the
schemes proposed by different authors are compared. Digital steganography and
audio steganography are included in this discussion. Papers covering digital
watermarking and audio watermarking are also reviewed.
In [30], the authors designed a new watermarking system using the discrete
Fourier transform (DFT): the audio file is segmented into non-overlapping
frames, and watermarks are embedded into the highest peak in the magnitude
spectrum of each frame. For this scheme, the Similarity (SIM) values range from
13 to 20 and the SNR values range from 20 dB to 28 dB for different watermarked
sounds, which is much lower than in [29].
The authors of [31] proposed a watermarking method in which the audio file is
transformed into the Discrete Cosine Transform (DCT) domain. The absolute values
of the DCT coefficients are divided into an arbitrary number of segments, and
each segment’s energy is calculated. Watermarks are then embedded into selected
peaks of the highest-energy segment. Simulation results show that this method is
highly robust against different kinds of attacks and achieves SIM values ranging
from 13 to 32 and SNR values ranging from 13 dB to 24 dB for different
watermarked sounds, which are close to the results in [30].
increased up to a certain level by selecting longer audio files or through insertion
of the watermark signal multiple times in an audio file. Furthermore, upon inter-
preting the results, it has been found that SNR values depend on type of audio
with loud pitch.
A novel steganography method is introduced in [35] which is based on LSB ma-
nipulation and inclusion of redundant noise as secret key in the message. In this
method, the high frequency DCT coefficients of the cover audio file are replaced
with the low frequency DCT coefficients of the watermark audio file. This method
exhibited a very high watermark channel capacity; however, the main disadvan-
tages are the extremely low robustness of the method and the unlikelihood that
the embedded watermark would survive digital to analogue or analogue to digital
conversions.
In [36], the authors present a new high-bit-rate LSB audio watermarking method.
The two-step algorithm embeds watermark bits into higher LSB layers, which
increases robustness, while keeping the embedding distortion of the host audio
minimal. Results of both objective and subjective tests demonstrate that this
algorithm outperforms the standard LSB insertion algorithm, as presented in [34]
and [35], with higher SNR values and higher perceptual quality of the
watermarked audio.
sage is encrypted with AES-128 and is then embedded in the LSBs of the audio
file using a sequence generated by a Tan Logistic Map. Applied to classical
music, this scheme achieves a better MSE (0.06822) and PSNR (101.971) than the
scheme in [37], indicating that it is an effective audio steganographic
technique.
Like the previous schemes, the one in [39] starts with AES-128 encryption of the
secret data. The steganographic layer uses LSB embedding with an audio file as
the cover: the audio data is represented as samples with a very wide range, the
range itself is manipulated, and the secret data is then added to it. This makes
it practically impossible to extract the secret data without manipulating the
range in the same way. The PSNR values obtained for the tested music files range
from 74.1758 dB to 74.2525 dB, which is reasonably high.
composition is proposed in [40]. The audio signal is divided into frames where each
frame is decomposed adaptively, by EMD, into intrinsic oscillatory components
called Intrinsic Mode Functions (IMFs). The watermark and the synchronization
codes are then embedded into the extrema of the last IMF of the audio signal,
a very low frequency mode. Simulations are performed on different audio signals
sampled at 44.1 kHz resulting in SNR values above 20 dB and Objective Differ-
ence Grade (ODG) values between -1 and 0. These results demonstrate the good
quality of the watermarked signals.
based on SVD in the DWT domain using synchronization code. A watermark is
embedded by applying a quantization-index-modulation process to the singular
values in the SVD of the wavelet-domain blocks. The SNR values of the audio
files on which the algorithm was tested range from 22.11 dB to 26.84 dB, with a
payload of 45.9 bps. The scheme performs better against MP3 compression than
earlier audio watermarking schemes.
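To give a feel for quantization index modulation, the sketch below embeds one bit into a single number by snapping it to an even or odd multiple of a step size. This is generic textbook scalar QIM, not the specific procedure of [45]; the function names, the step size, and the value 10.3 are hypothetical.

```mathematica
(* Embed a bit by quantizing v to an even (bit 0) or odd (bit 1) multiple of step *)
qimEmbed[v_, bit_, step_] := 2 step*Round[(v - bit*step)/(2 step)] + bit*step

(* Recover the bit from the parity of the nearest multiple of step *)
qimExtract[v_, step_] := Mod[Round[v/step], 2]

step = 1;                             (* hypothetical quantization step *)
marked0 = qimEmbed[10.3, 0, step]     (* 10 *)
marked1 = qimEmbed[10.3, 1, step]     (* 11 *)
qimExtract[marked1 + 0.4, step]       (* 1: the bit survives a change smaller than step/2 *)
```

Extraction needs only the step size and not the original value, which is why QIM-based schemes can be blind. A larger step tolerates stronger distortion but changes the host value more.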
A different watermarking algorithm that is also based on DWT and SVD techniques
is proposed in [46]. The authors proposed new signal framing, DWT matrix
formation, and embedding methods and implemented them to enhance the quality of
the watermarked audio. The SNR values for different audio files, under different
attacks, range between 38.9659 dB and 47.3899 dB, which is higher than the
results of the scheme in [45].
Transform (DST) coefficients with the watermark image before re-constituting the
signal. SNR results for 8 different watermarked audio signals range from 25.15
dB to 31.07 dB. Moreover, the scheme is secure because it uses secret keys
generated during the watermark insertion process. The scheme showed robustness
and imperceptibility; however, its main drawback is that the original signal is
required for extracting the watermark, unlike in [46].
In [50], the proposed algorithm uses DWT, SVD, and a secret sharing method to
watermark a given audio file. The algorithm was tested on audio files with
different sampling rates and a scaling factor of 0.04. Different attacks were
simulated, and the algorithm was evaluated by its accuracy rate. Results show
that this technique is robust against different attacks and that increasing the
scaling factor improves accuracy, but at the cost of making the watermark
audible.
(LWT) and SVD is proposed in [51]. The watermark data is inserted into the LWT
coefficients of the low-frequency sub-band using SVD, Quantization Index
Modulation (QIM), and a synchronization code technique. The use of QIM makes
this scheme blind. Experimental results show that the scheme is inaudible and
robust to general signal processing and desynchronization attacks, with SNR
values all above 20 dB. Moreover, the scheme outperforms the scheme proposed in
[27].
The authors in [54] propose a novel robust, transparent, and high-capacity blind
audio watermarking scheme. Watermark embedding is performed by modulating the
vectors in the DCT domain subject to an auditory masking constraint, implemented
on a frame basis. The resulting payload capacity is as high as 848 bps, which is
much higher than in [43], [44], and [45]. However, the SNR is as low as 17.51
dB, which is a poor result.
In this section, I briefly mention five different programming languages that can
be utilized for implementing an audio watermarking scheme. These include MAT-
LAB, Python, Java, Wolfram Mathematica, and Maple.
Java is a general-purpose, class-based, object-oriented programming language
and computing platform released in 1995 by Sun Microsystems. In comparison
with other programming languages, it is fairly easy to learn. Moreover, it provides
a reliable platform to build services, products and applications. It is free to down-
load for personal use as well as for development.
Maple is a mathematics software package released in 1982. It combines a math
engine with a user-friendly interface for analyzing, exploring, visualizing, and
solving mathematical problems. It also requires an expensive license to download
and use, although free trials exist.
mentioned languages. Symbolic manipulation is easier in Mathematica, and its
user interface is simpler and easier to build than in MATLAB. Mathematica is
also better at handling calculus and differential equations. On the other hand,
MATLAB is more data-oriented and better at design functions than Mathematica.
Furthermore, Mathematica offers a general-purpose language that can be used for
any programming structure, whereas Maple is a software tool geared toward
mathematical calculations.
Chapter 3
In this chapter, I discuss my approach to solving the business problem of this
thesis: watermarking audio files with copyrights for copyright protection.
Countless websites illegally host uploaded audio files, disregarding ethics and
copyright laws. The solution proposed in this thesis is to digitally watermark
audio files with copyright information so that the owners of such files can be
protected and appropriate measures taken against anyone who violates the owners’
rights. Furthermore, the performance of the scheme is evaluated and compared
with other schemes from the literature.
The proposed scheme is a double-layer message security scheme with cryptography
in the first layer and steganography in the second layer. It starts with a
secret descriptive text, a cover audio file, and a 2D Tan Logistic Map. First,
the cover audio is split into left and right channels. In the cryptography
layer, the secret text is encrypted using a chaotic function, a Varied Tan
Logistic Map, at the sender side and decrypted in the same way at the receiver
side. The steganography layer generates two sequences from the 2D Tan Logistic
Map, which are used to embed the secret message in the left and right channels
of the cover audio using LSB substitution. The scheme is explained in further
detail in the following sections.
Figure 3.1: The proposed watermarking scheme for audio files.
Figure 3.2: The resulting sequences from the 2D Tan Logistic Map.
3.1.1 Data Encryption and Embedding
First, the cover audio is converted into 16-bit signed integers, which range
from $-2^{15}$ to $2^{15} - 1$. Because this range includes negative values,
$2^{15}$ is added to every sample, shifting the range to $0$ through
$(2^{15} - 1) + 2^{15} = 2^{16} - 1$. This shift allows the samples of the cover
audio to be converted into binary. Then, the cover audio is split into left and
right channels.
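The shift and the binary conversion can be sketched on a handful of values. The five samples below are hypothetical and were chosen to cover both ends of the 16-bit range; the variable names are not taken from the thesis code.

```mathematica
samples = {-32768, -1, 0, 1, 32767};   (* hypothetical signed 16-bit samples *)

shifted = samples + 2^15               (* {0, 32767, 32768, 32769, 65535} *)

(* Every shifted sample now fits in exactly 16 binary digits *)
bits = Map[IntegerDigits[#, 2, 16] &, shifted];
bits[[3]]                              (* 1 followed by fifteen 0s, i.e. 32768 *)

(* Subtracting 2^15 again restores the original samples *)
Map[FromDigits[#, 2] &, bits] - 2^15 == samples   (* True *)
```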
where $x_n$ is 0.5 for the proposed scheme. This results in a seemingly random
sequence of real numbers. A threshold, set to 0.6 for the proposed scheme, is
compared with the generated real numbers to produce the secret key bits as
follows
\[ b_i = \begin{cases} 0, & \text{if } x_{n+1} < \text{threshold} \\ 1, & \text{if } x_{n+1} > \text{threshold} \end{cases} \tag{3.2} \]
This process iterates until the key is as long as the binary stream of the
secret message. The key is then XORed with the binary stream of the secret
message, generating the encrypted secret message.
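The thresholding and XOR steps can be sketched as follows. Note the stand-in: the classic logistic map with a parameter of 3.99 replaces the Varied Tan Logistic Map purely for illustration, so the generated bits differ from those of the actual scheme. The names `standInMap` and `keyBits` and the eight message bits are hypothetical; the starting value 0.5 and the threshold 0.6 are the ones stated above.

```mathematica
(* Stand-in chaotic map: the classic logistic map, NOT the Varied Tan Logistic Map *)
standInMap[x_] := 3.99*x*(1 - x)

(* Iterate the map n times and threshold each new value as in Equation (3.2);
   a value exactly equal to the threshold, which (3.2) leaves undefined, maps to 0 *)
keyBits[n_Integer, x0_ : 0.5, threshold_ : 0.6] :=
  Map[Boole[# > threshold] &, Rest[NestList[standInMap, x0, n]]]

messageBits = {1, 0, 1, 1, 0, 0, 1, 0};    (* hypothetical message bits *)
key = keyBits[Length[messageBits]];        (* one key bit per message bit *)
cipherBits = BitXor[messageBits, key];     (* encryption *)

BitXor[cipherBits, key] == messageBits     (* True: XOR with the same key decrypts *)
```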
Before the encrypted secret message is embedded in the cover audio file, two
chaotic sequences are generated from the 2D Tan Logistic Map, which can be
written as two equations as follows
and
\[ y_{n+1} = \tan[(\pi r x_n + 3)(y_n (1 - y_n))] \tag{3.4} \]
Finally, the previously added $2^{15}$ is subtracted from the sample values,
forming the stego audio file.
To extract the secret message, $2^{15}$ is first added to the stego audio
samples, which are then converted to binary values. Then, the LSBs are extracted
using the 2D Tan Logistic Map sequences in order to retrieve the binary form of
the encrypted secret message.
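LSB substitution and its inverse can be sketched in Python as below. This is a
minimal sketch, not the thesis code: the list `positions` stands in for the sample
indices derived from the chaotic sequences, the indices are 0-based (Mathematica
lists are 1-based), and all names and values are hypothetical.

```python
OFFSET = 2 ** 15  # shift between signed 16-bit samples and 0..65535

def embed_lsb(samples, positions, bits):
    """Return a copy of samples with each bit written into the LSB at its position."""
    if len(positions) != len(bits):
        raise ValueError("one position is needed per message bit")
    if len(set(positions)) != len(positions):
        raise ValueError("positions must be distinct")
    stego = list(samples)
    for pos, bit in zip(positions, bits):
        shifted = stego[pos] + OFFSET      # move into the non-negative range
        shifted = (shifted & ~1) | bit     # overwrite the least significant bit
        stego[pos] = shifted - OFFSET      # undo the shift
    return stego

def extract_lsb(stego, positions):
    """Read the LSBs back in the same order in which they were embedded."""
    return [(stego[pos] + OFFSET) & 1 for pos in positions]

cover = [-32768, -3, 0, 7, 1200, 32767]   # made-up samples of one channel
positions = [4, 0, 5, 2]                   # hypothetical embedding order
bits = [1, 0, 0, 1]                        # encrypted message bits

stego = embed_lsb(cover, positions, bits)
extracted = extract_lsb(stego, positions)
```

Each modified sample differs from the original by at most 1, and the receiver
recovers the bits only when it visits the positions in the same order as the
sender.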
Finally, the encrypted secret message is decrypted using the same chaotic
function, the Varied Tan Logistic Map, that was used for encryption. The key for
decryption is generated using the same equation and the same parameters as for
encryption; in other words, the encryption and decryption keys are identical and
can be easily generated. Then, the key is XORed with the encrypted secret
message, producing the secret message itself.
3.2 Implementation
This section presents the code used for the proposed watermarking scheme. The
scheme is implemented in Wolfram Mathematica® 13.0.
Out[1]=
In[3]:= Length[a]
Out[3]= 2
In[4]:= a[[1]];
In[5]:= a[[2]];
In[6]:= aP = a + 2^15;
In[7]:= Length[aP]
Out[7]= 2
Out[10]= False
In[11]:= message =
StringTake[ExampleData[{"Text", "AliceInWonderland"}],30000]
Out[11]= I--DOWN THE RABBIT-HOLE Alice was beginning
to get very tired of sitting by her sister
on the bank, and of having nothing to do.
Once or twice she had peeped into the book
her sister was reading, but it had no pictures
or conversations in it, "and what is the use
of a book," thought Alice, "without pictures
or conversations?" So she was considering in
her own mind (as well as she could, for the day
made her feel very sleepy and stupid), whether
the pleasure of making a daisy-chain would be
worth the trouble of getting up and picking the
daisies, when suddenly a White Rabbit with pink
eyes ran close by her . . .
We prepare the key, which is used to encrypt the message, using the Tan Logistic
Map function.
Out[14]= 240000
We generate a key with the same length as the message bit stream, using a
threshold value of 0.6.
Out[16]= 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 1, 0, 1, 1, 0, 1, 0,
1, 0, 0, 1, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 0, 1, 1, . . .
In[18]:= (* Import the 26 spreadsheets FinalSequenceA.xlsx to FinalSequenceZ.xlsx *)
s = Table[
   Import["C:\\Users\\Laila\\Desktop\\Bachelor\\Audio\\FinalSequence" <>
     letter <> ".xlsx"],
   {letter, CharacterRange["A", "Z"]}];
In[19]:= ss = Flatten /@ s; (* one flat list per spreadsheet *)
In[20]:= seqL = Join @@ ss; (* left-channel sequence *)
In[21]:= (* Import the 26 spreadsheets FinalSequenceAR.xlsx to FinalSequenceZR.xlsx *)
sR = Table[
   Import["C:\\Users\\Laila\\Desktop\\Bachelor\\Audio\\FinalSequence" <>
     letter <> "R.xlsx"],
   {letter, CharacterRange["A", "Z"]}];
In[22]:= ssR = Flatten /@ sR; (* one flat list per spreadsheet *)
We embed the secret message in the LSBs of the left and right channels of the
cover audio using the two chaotic sequences generated above.
In[32]:= f = (FromDigits[#, 2] & /@ c) - 2^15;
Out[33]= 2646000
In[35]:= AudioStego == a
Out[35]= False
After finishing the embedding process, we obtain the stego audio file.
Out[36]=
The extraction process is the inverse of the embedding process. First, $2^{15}$
is added to the stego audio samples so that they can be converted back to
binary.
In[38]:= aEnc = AudioData[StegoAudio, "SignedInteger16"];
Out[44]= True
Out[45]= True
The LSBs are extracted using the chaotic sequences previously generated to
retrieve the binary form of the encrypted secret message.
Out[48]= True
In[50]:= For[r = 1, r <= Length[encSecMsg], r++,
secMessageRight[[r]] = binRightEncP[[seqR[[r]], 16]]]
Out[51]= True
Then, we decrypt the encrypted secret message with the generated key.
of a book," thought Alice, "without pictures
or conversations?" So she was considering in
her own mind (as well as she could, for the day
made her feel very sleepy and stupid), whether
the pleasure of making a daisy-chain would be
worth the trouble of getting up and picking the
daisies, when suddenly a White Rabbit with pink
eyes ran close by her . . .
The following links contain the audio file before and after watermarking with
the secret message using the proposed scheme.
Cover audio: shorturl.at/dfuTV
Watermarked audio: shorturl.at/etB39
A listening test reveals no audible differences between the two files, which
indicates that the embedding is imperceptible.
The waveform plots for the left and right channels of the cover audio file and
the stego audio file are shown in Fig. 3.3 and Fig. 3.4. Visual inspection shows
no observable differences between the plots, which indicates that the proposed
scheme hides the embedded message well.
Figure 3.3: Waveform plot of the left and right channels of the cover audio file.
Figure 3.4: Waveform plot of the left and right channels of the watermarked audio
file.
Table 3.1: MSE and PSNR values employing the proposed algorithm.
                  MSE         PSNR
Proposed scheme   0.0453658   101.011
Table 3.1 shows the values of MSE and PSNR obtained from (2.1) and (2.2)
respectively. The values show very good performance of the proposed scheme with
a low value for MSE and a high value for PSNR.
Table 3.2 compares the intensity properties of the cover audio and the stego
audio files, where the results are identical for both files across the 3 metrics.
Table 3.3 compares the time–domain properties of the cover audio and the stego
audio files, where the results are identical for both files across the metrics
except for the ZC value.
Table 3.4 compares the frequency–domain properties of the cover audio and the
stego audio files, where the results are near-identical for both files across
all 7 metrics.
Table 3.5 compares the basic histogram properties of the cover audio and the
stego audio files. The results are identical for both files, except for the
mean value.
Table 3.3: Time-domain properties.
File    CF        PAPR      TC         ZC
Cover   9.28116   86.1399   0.635737   98502.0
Stego   9.28116   86.1399   0.635737   98514.0
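The following Python sketch suggests why only the ZC value changes in Table 3.3.
It assumes common textbook definitions (crest factor as peak over RMS, PAPR as its
square, and a zero crossing as a strict sign change between consecutive samples);
the definitions used in this thesis are given in Chapter 2 and may differ in
detail. The CF and PAPR values in Table 3.3 are consistent with PAPR being the
square of CF. The sample values below are made up.

```python
import math

def crest_factor(samples):
    """Peak absolute value divided by the RMS value."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms

def papr(samples):
    """Peak-to-average power ratio, taken here as the squared crest factor."""
    return crest_factor(samples) ** 2

def zero_crossings(samples):
    """Count consecutive pairs whose signs are strictly opposite."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)

cover = [-1, 0, 2, -3]
stego = [-1, 1, 2, -3]   # one LSB changed: the sample 0 became 1

zc_cover = zero_crossings(cover)
zc_stego = zero_crossings(stego)
```

A one-unit change in a sample near zero can create or remove a sign change, while
the peak and the RMS value, which are dominated by large samples, are essentially
unaffected.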
Table 3.6 compares the information entropy of the cover audio and the stego
audio files. The cover audio file shows a slightly lower entropy value, since it
contains less information than its watermarked version.
Table 3.5: Basic histogram properties.
File    Max        Min         Mean            Median   SD
Cover   0.571701   -0.529785   -0.0000308313   0.0      0.061598
Stego   0.571701   -0.529785   -0.000030731    0.0      0.061598
Overall, the scheme proposed in this thesis exhibits very good performance.
Table 3.7: A comparison of the PSNR value obtained from the proposed scheme with
the PSNR values obtained from other schemes in the literature.
3.4 Conclusions
In this chapter, I have proposed and discussed the mechanism I used to address
the business problem of this thesis: copyright protection of audio files. I have
used a 1-minute classical music .wav file with a sampling rate of 44.1 kHz as my
cover audio for a double-layer message security scheme, with cryptography in the
first layer and steganography in the second layer. The secret message embedded
in the cover audio, acting as the watermark, was the first 30,000 characters of
the novel Alice in Wonderland. The results have shown almost no differences
between the cover audio file and the stego audio file, demonstrating the
effectiveness of my proposed scheme.
Chapter 4
Conclusions
is very good, with barely any differences between the audio files before and
after watermarking. Furthermore, I have compared my results with those of similar
schemes from the literature and found them to be comparable.
65
References
[1] I. T. Hardy, “Criminal copyright infringement,” Wm. & Mary Bill Rts. J.,
vol. 11, p. 305, 2002.
[5] R. K. Sinha and N. Mandel, “Preventing digital music piracy: The carrot or
the stick?” Journal of Marketing, vol. 72, no. 1, pp. 1–15, 2008.
[6] C. Koester, “Combating music piracy: The recording industry’s legal pursuit
of online copyright infringers,” 2009.
[7] S. E. Siwek, The true cost of sound recording piracy to the us economy, 2007.
[8] B. Fries and M. Fries, Digital audio essentials. ” O’Reilly Media, Inc.”, 2005.
66
[10] Y. Lin and W. H. Abdulla, “Audio watermarking for copyright protection,”
University of Auckland, Auckland, New Zealand, Tech. Rep, 2007.
[14] R. Patel and P. Bhatt, “A review paper on digital watermarking and its
techniques,” International Journal of Computer Applications, vol. 110, no. 1,
pp. 10–13, 2015.
67
view,” Journal of Ambient Intelligence and Humanized Computing, pp. 1–9,
2019.
[21] ——, “Analysis of risks and attacks on digital audio watermarks,” Journal
of New Music Research, vol. 34, no. 2, pp. 197–208, 2005.
[23] M Agbaje and A. Adebayo, “Robustness and security issues in digital au-
dio watermarking,” International Journal of Engineering and Information
Systems (IJEAIS), vol. 1, pp. 1–10, 2017.
68
[26] F. Hemeida, W. Alexan, and S. Mamdouh, “A comparative study of audio
steganography schemes,” International Journal of Computing and Digital
Systems, vol. 10, pp. 555–562, 2021.
[28] P. K. Dhar and J.-M. Kim, “Digital watermarking scheme based on fast
fourier transformation for audio copyright protection,” International Journal
of Security and Its Applications, vol. 5, no. 2, pp. 33–48, 2011.
[29] X. Wen, X. Ding, J. Li, L. Gao, and H. Sun, “An audio watermarking al-
gorithm based on fast fourier transform,” in 2009 International Conference
on Information Management, Innovation Management and Industrial Engi-
neering, IEEE, vol. 1, 2009, pp. 363–366.
[30] P. K. Dhar, M. I. Khan, and J Kim, “A new audio watermarking system using
discrete fourier transform for copyright protection,” International journal of
computer science and network security, vol. 10, no. 6, pp. 35–40, 2010.
69
[33] I. Hussain, “A novel approach of audio watermarking based on s-box trans-
formation,” Mathematical and Computer Modelling, vol. 57, no. 3-4, pp. 963–
969, 2013.
[35] A. Chadha and N. Satam, “An efficient method for image and audio steganog-
raphy using least significant bit (lsb) substitution,” arXiv preprint arXiv:1311.1083,
2013.
[38] M. T. Elkandoz and W. Alexan, “Logistic tan map based audio steganogra-
phy,” in 2019 international conference on electrical and computing technolo-
gies and applications (ICECTA), IEEE, 2019, pp. 1–5.
70
[40] K. Khaldi and A.-O. Boudraa, “Audio watermarking via emd,” IEEE trans-
actions on audio, speech, and language processing, vol. 21, no. 3, pp. 675–
680, 2012.
[43] V. Bhat, I. Sengupta, and A. Das, “An adaptive audio watermarking based
on the singular value decomposition in the wavelet domain,” Digital Signal
Processing, vol. 20, no. 6, pp. 1547–1558, 2010.
[44] B. Y. Lei, Y. Soon, and Z. Li, “Blind and robust audio watermarking scheme
based on svd–dct,” Signal Processing, vol. 91, no. 8, pp. 1973–1984, 2011.
[46] J. Mishra, M. Patil, and J. Chitode, “An effective audio watermarking using
dwt-svd,” International Journal of Computer Applications, vol. 70, no. 8,
2013.
[47] X.-Y. Wang and H. Zhao, “A novel synchronization invariant audio wa-
termarking scheme based on dwt and dct,” IEEE Transactions on signal
processing, vol. 54, no. 12, pp. 4835–4840, 2006.
71
[48] A. Elshazly, M. Fouad, and M. Nasr, “Secure and robust high quality dwt do-
main audio watermarking algorithm with binary image,” in 2012 seventh in-
ternational conference on computer engineering & systems (ICCES), IEEE,
2012, pp. 207–212.
[49] H. Yassine, B. Bachir, and K. Aziz, “A secure and high robust audio water-
marking system for copyright protection,” International journal of computer
applications, vol. 53, no. 17, 2012.
[51] B. Lei, Y. Soon, F. Zhou, Z. Li, and H. Lei, “A robust audio watermarking
scheme based on lifting wavelet transform and singular value decomposition,”
Signal processing, vol. 92, no. 9, pp. 1985–2001, 2012.
[52] M. Fan and H. Wang, “Chaos-based discrete fractional sine transform do-
main audio watermarking scheme,” Computers & Electrical Engineering,
vol. 35, no. 3, pp. 506–516, 2009.
[54] H.-T. Hu and L.-Y. Hsu, “Robust, transparent and high-capacity audio wa-
termarking in dct domain,” Signal Processing, vol. 109, pp. 226–235, 2015.
72