Review Paper

Security analysis of DNA based steganography techniques


Omnia Abdullah Al‑Harbi1 · Walaa Essa Alahmadi1 · Asia Othman Aljahdali1

Received: 2 October 2019 / Accepted: 20 December 2019 / Published online: 9 January 2020
© Springer Nature Switzerland AG 2020

Abstract
This study investigates the most recent data hiding techniques based on DNA steganography, including the highly improved DNA-based steganography technique, the data hiding using double DNA sequences method, and the enhanced DNA-based steganography technique. The strengths and weaknesses of these techniques are discussed. Additionally, the security of these techniques is analyzed based on several security parameters that measure the quality of DNA steganography with respect to many factors, including, but not limited to, cracking probability, blindness, modification rate and expansion rate, and layers of security. The goal of the comparison between the investigated techniques is to highlight the advantages and disadvantages of the existing data hiding algorithms and to motivate future research in this field. Moreover, the paper evaluates the discussed techniques based on several parameters, including capacity, payload, and bits per nucleotide (bpn). The results show that the enhanced DNA-based steganography technique hides 2 bpn, whereas the highly improved method can hide on average 1.46 bpn, which is higher than the 0.574 bpn that the data hiding using double DNA sequences method can hide on average. The paper also presents suggestions for how each technique can be optimized to achieve a higher security level for hiding data within DNA sequences.

Keywords Hiding data · Security · Steganography · DNA sequence

* Asia Othman Aljahdali; Omnia Abdullah Al‑Harbi; Walaa Essa Alahmadi | 1College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia.

SN Applied Sciences (2020) 2:172 | https://doi.org/10.1007/s42452-019-1930-1

1 Introduction

Cryptography and steganography are closely interrelated and share the common aim of preserving the confidentiality, integrity, and availability of information, which are among the most significant concerns in computer security [1–3]. Both are methods that allow information to be sent securely [4]. Cryptography is a historical science that began in Egypt around 1900 B.C. with hieroglyphic writing [5]. It uses encryption to scramble the secret information in such a way that only the sender and the intended receiver can reveal it [6]. Steganography, on the other hand, began in ancient Greece around 440 B.C. [7, 8]. It hides the secret information in different carriers so that the private information is not visible to unauthorized users. This is done by concealing the sensitive information within cover media such as images, video, and DNA in such a way that it becomes difficult to detect [5, 9]. Several algorithms have been proposed in image steganography for hiding secret information inside an image. However, the embedding capacity of an image is low, so it cannot hide a large amount of data [10]. To overcome this capacity limitation, DNA steganography has been introduced. DNA steganography is a research direction of DNA cryptography, which started in 1999. This approach uses DNA sequences as carriers to enable the secure transfer of critical data [4, 11]. The principal idea is to encrypt and conceal messages in a large number of DNA strands to prevent adversaries from reading and deciphering the messages. This can be achieved only if the original sequences are kept from adversaries [3, 11, 12]. Hiding data in DNA sequences is a new and evolving scientific field. This paper discusses three different techniques recently presented for hiding data in DNA sequences and investigates their security based on different factors. The paper suggests ways of enhancing and improving each technique with respect to the security requirements. The paper is organized as follows. Sect. 2 defines a DNA sequence and its elements, while Sect. 3 discusses in detail three different techniques that are used to hide information within DNA sequences. Sect. 4 analyzes each technique, clarifies its strengths and weaknesses, and compares the techniques. The final sections report experimental results, discuss each technique, and propose ideas for further improvements.

2 Elements of DNA

In biology, deoxyribonucleic acid (DNA) is a large molecule that exists within the cells of all living organisms and contains the genetic information that allows the functioning, reproduction, and evolution of these organisms [13]. DNA is made up of many small subunits called nucleotides, which carry four types of bases: adenine (A), thymine (T), guanine (G), and cytosine (C) [14, 15]. The two strands of a DNA molecule are held together by bonds between the bases; adenine binds to thymine, and cytosine binds to guanine [1, 14, 16]. Every three neighboring nucleotides make up a codon, so there are $4^3 = 64$ possible codon combinations. In living organisms, the arrangement of these combinations determines the structure and function of the resulting protein [17]. DNA encoding techniques are binary coding schemes for the purpose of DNA computation. The most popular binary mapping of digital coding is given in Table 1.

Table 1  DNA digital coding [11]

DNA nucleotide   Decimal   Binary
A                0         00
C                1         01
G                2         10
T                3         11

3 DNA data hiding techniques

Over the years, different algorithms have been proposed for hiding sensitive data within DNA sequences. In this section, we investigate and discuss the strengths and weaknesses of recently proposed DNA-based data hiding techniques. The analysis aims to help future research in designing more reliable and secure data hiding techniques.

3.1 Highly improved DNA‑based steganography techniques

Malathi et al. [18] modify the insertion algorithm to decrease the cracking probability of the fake DNA sequence. The algorithm uses two different keys. The first key (K1) is a number in the range 0 to 255, which is XORed with the last character of the message (M); the result is then XORed with the character preceding the last one in M, and so on. Accordingly, the first key is used to encrypt the message. The second key (K2) is randomly generated and is used to divide the DNA sequence into segments of the same length. The resulting cipher characters are inserted as binary bits, one by one, at the beginning of each segment. Then, the binary sequence is converted into DNA bases using Table 1. The second key should preferably be a small number so that the DNA sequence has a minimum length while hiding the secret message.

The encryption process The proposed algorithm [18] follows several steps to encrypt and hide messages inside a DNA sequence. The encryption process steps are as follows:

1. Split M into characters, M = m1, m2, m3, …, mn, and convert each character into its 8-bit binary equivalent based on the ASCII standard.
2. Randomly generate a number between 0 and 255 to form K1, and convert the key into an 8-bit binary sequence.
3. XOR the last character in M with K1.
4. XOR the result with the character preceding the last one in M; repeat the XORing until all the characters are converted, and store the result in A.
5. Convert the binary sequence A into a protein sequence.
6. Randomly select a sample DNA sequence S and convert it into a binary bit sequence using Table 1.
7. Generate a random number K2, preferably a small one, and divide the binary form of S into segments whose length equals K2.
8. Insert the first binary value of A at the beginning of the first binary segment of S, insert the second binary value of A at the beginning of the second binary segment, and so on.
9. Concatenate all the binary segments, and then convert the result into a fake DNA sequence using Table 1.

An illustrative example is given in Fig. 1 showing the encryption process.

Fig. 1  Example of the encryption process
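The encryption steps of [18] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: characters are taken as 8-bit ASCII, the XOR-chained output A is assumed to begin with the encrypted last character (which is consistent with decryption step 4), and only the first |A| × K2 bits of S are assumed to be used, so that every segment carries exactly one message bit. Step 5 (the protein-sequence view of A) is omitted, and the names `xor_chain` and `hide` are hypothetical.

```python
import random

# Table 1 (DNA digital coding): A=00, C=01, G=10, T=11.
TABLE1 = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in TABLE1.items()}


def dna_to_bits(seq: str) -> str:
    """Convert a DNA string to its binary form using Table 1."""
    return "".join(TABLE1[base] for base in seq)


def bits_to_dna(bits: str) -> str:
    """Convert an even-length bit string back to DNA using Table 1."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def xor_chain(message: str, k1: int) -> str:
    """Steps 1-4: XOR the last character with K1, then each preceding
    character with the previous result. The returned A starts with the
    encrypted last character (ordering is an assumption)."""
    blocks, prev = [], k1
    for ch in reversed(message):
        prev = ord(ch) ^ prev
        blocks.append(format(prev, "08b"))
    return "".join(blocks)


def hide(message: str, reference: str, k1: int, k2: int) -> str:
    """Steps 6-9: put one bit of A in front of each K2-bit segment of S."""
    a = xor_chain(message, k1)
    s = dna_to_bits(reference)
    needed = len(a) * k2  # assumption: use only as much of S as needed
    if len(s) < needed:
        raise ValueError("reference DNA is too short for this message and K2")
    segments = [s[i:i + k2] for i in range(0, needed, k2)]
    fake_bits = "".join(bit + seg for bit, seg in zip(a, segments))
    return bits_to_dna(fake_bits)


# Hypothetical usage: random keys and a random stand-in for a database sequence.
rng = random.Random(2020)
k1, k2 = rng.randrange(256), 3
reference = "".join(rng.choice("ACGT") for _ in range(200))
fake = hide("Hi", reference, k1, k2)
print(hide("A", "ACGT", 0x0F, 1))  # AGACTGTC
```

Because one bit is inserted per segment, the fake sequence grows by |A|/2 nucleotides, which is the length expansion discussed in Sect. 4.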

The decryption process The receiver must know K1 and K2 in order to decrypt the message. Additionally, the receiver must obtain the original DNA sequence from the sender. The receiver performs the following steps:

1. Convert the received fake DNA sequence into a binary sequence using Table 1.
2. Divide the binary DNA sequence into segments of K2 + 1 bits each.
3. Take the first bit of each segment and concatenate these bits to produce the significant bits B.
4. XOR the first 8 bits of B with K1, then XOR the second 8 bits of B with the previous 8 bits of B, and so on.
5. Convert the resulting binary bits into ASCII text [18].

An illustrative example is given in Fig. 2 showing the decryption process.

Fig. 2  Example of the decryption process

Probability of cracking The number of candidate reference DNA sequences is about 163 million; thus, the probability of predicting the reference DNA sequence is $1/(1.63 \times 10^{8})$. The probability of guessing the binary coding of (A, C, G, T) among the $4! = 24$ possible combinations is $1/24$. The probability of finding the message within the reference DNA sequence is $1/(n-1)$, where n is the number of bits in the fake DNA sequence. The message and the DNA are segmented using a random key; thus, the probability of guessing the segmentation of the message is $1/(2^{m}-1)$, where m is the number of bits in the secret message, and the probability of guessing the segmentation of the DNA is $1/2^{s-1}$, where s is the number of bits in the reference DNA sequence. The XOR operation is used to encode the data inside the DNA sequence, and the probability of guessing the XOR combination is $1/2^{8m}$ [18]. Thus, the probability of finding the message hidden in the DNA sequence is:

$$\frac{1}{1.63\times 10^{8}} \times \frac{1}{24} \times \frac{1}{n-1} \times \frac{1}{2^{m}-1} \times \frac{1}{2^{s-1}} \times \frac{1}{2^{8m}} \qquad (1)$$
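Returning to the decryption steps of [18], a matching sketch is given below. It relies on the same assumptions as our encryption illustration (8-bit ASCII, a chained output that begins with the message's last character, and exactly one message bit in every (K2 + 1)-bit segment); `extract` is a hypothetical helper name, not part of [18].

```python
# Table 1 (DNA digital coding): A=00, C=01, G=10, T=11.
TABLE1 = {"A": "00", "C": "01", "G": "10", "T": "11"}


def extract(fake: str, k1: int, k2: int) -> str:
    """Recover a message hidden by the insertion method (sketch)."""
    bits = "".join(TABLE1[base] for base in fake)                 # step 1
    # Steps 2-3: keep the first bit of every (K2 + 1)-bit segment.
    b = "".join(bits[i] for i in range(0, len(bits), k2 + 1))
    chars, prev = [], k1
    for i in range(0, len(b), 8):                                 # step 4
        block = int(b[i:i + 8], 2)
        # First block is XORed with K1, later ones with the previous block of B.
        chars.append(chr(block ^ prev))
        prev = block
    # Step 5: the first recovered character is the message's last one (assumption).
    return "".join(reversed(chars))


print(extract("AGACTGTC", 0x0F, 1))  # A
```

The chained XOR means that each 8-bit block of B acts as the key for the next one, so a single wrong bit in B corrupts two recovered characters.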

3.2 Data hiding using double DNA sequences techniques

Ibrahim, Abdalkader, and Moussa [19] proposed an algorithm that uses a double DNA sequences technique. The main idea is to pick a random pair of DNA sequences (S, Ś) from the DNA database. The proposed algorithm consists of two phases. In the first phase, the secret message P is encoded into the DNA sequence DP, in which each letter is replaced by three nucleotides. The first selected DNA sequence S is used for the encryption of DP. In the second phase, the other DNA sequence Ś is used to hide the encrypted secret data. The encryption and decryption processes are explained below.

The encryption process Two inputs are used in the encryption process: the secret message P and the DNA sequence pair (S, Ś). The encryption steps are as follows:

1. Encode the secret message P into the DNA sequence DP using Algorithm 1, which generates a total of 64 DNA codons. NUM_FORMAT is a combination of three digits, and DNA(NUM_FORMAT) converts the resulting number into a DNA codon (e.g., 122 converts into CGG).
2. Generate the encrypted message DṔ by performing the bitwise XOR of the secret message DP and the reference sequence S, and then delete the extra unused nucleotides (DṔ = DP XOR S).
3. Attach the ID of S to the beginning of the resulting DṔ to get IDsDṔ.
4. Read the second DNA sequence Ś and mark the second repeated characters.
5. Replace the second repeated characters with the encrypted message characters DṔ.
6. Hide the characters of the encrypted message DṔ in Ś using the replacement rules in Table 2. We refer to the nucleotides in DṔ as {A(DṔ), C(DṔ), G(DṔ), T(DṔ)}. Table 2 is used to hide DṔ in Ś by replacing the second repeated letter in Ś with one of the four letters {A, C, G, T} according to the encrypted message.
7. Attach the ID of Ś to the beginning of the result to obtain S̈ = (IDś (IDsDṔ)′), and send it to the receiver [19].

An illustrative example is given in Fig. 3 showing the encryption process.

Table 2  Hiding and recovery

A            C            G            T
Msg  sbs     Msg  sbs     Msg  sbs     Msg  sbs
A    A       A    C       A    G       A    T
C    C       C    A       C    T       C    G
G    G       G    T       G    A       G    C
T    T       T    G       T    C       T    A

Fig. 3  Example of the encryption process
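To make Algorithm 1, the XOR of step 2, and Table 2 concrete, the following Python sketch may help. It is a hedged illustration: we read each column header of Table 2 as the nucleotide of Ś being replaced and each Msg/sbs pair as a message nucleotide and its substitute, and we implement nucleotide XOR through the 2-bit codes of Table 1. Under that reading, every cell of Table 2 equals the XOR of the two codes (for example, in column C, G gives 10 XOR 01 = 11 = T). How [19] assigns the 64 characters to the codons, and how the "second repeated characters" of Ś are located, are not reproduced; the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"                      # index = 2-bit code from Table 1
CODE = {b: i for i, b in enumerate(BASES)}

# Table 2 as printed: column header -> {message nucleotide: substitute}.
TABLE2 = {
    "A": {"A": "A", "C": "C", "G": "G", "T": "T"},
    "C": {"A": "C", "C": "A", "G": "T", "T": "G"},
    "G": {"A": "G", "C": "T", "G": "A", "T": "C"},
    "T": {"A": "T", "C": "G", "G": "C", "T": "A"},
}


def codons() -> list:
    """Algorithm 1: DNA(NUM_FORMAT) for every digit triple ijk, i, j, k in 0..3."""
    return ["".join(BASES[d] for d in ijk) for ijk in product(range(4), repeat=3)]


def xor_base(x: str, y: str) -> str:
    """Nucleotide XOR through the Table 1 codes (e.g. C XOR T = G)."""
    return BASES[CODE[x] ^ CODE[y]]


def xor_seq(dp: str, s: str) -> str:
    """Step 2: DP' = DP XOR S; zip drops the unused tail of S."""
    return "".join(xor_base(x, y) for x, y in zip(dp, s))


def hide_base(cover: str, msg: str) -> str:
    """Table 2 lookup: substitute for `msg` at a position of Ś holding `cover`."""
    return TABLE2[cover][msg]


def recover_base(cover: str, sub: str) -> str:
    """Inverse Table 2 lookup, as used in decryption step 3."""
    return next(m for m, s in TABLE2[cover].items() if s == sub)


print(codons()[26])           # CGG, i.e. NUM_FORMAT 122
print(xor_seq("CGG", "TAC"))  # GGT
```

Since XOR is its own inverse, applying `xor_seq` twice with the same S returns DP, which is exactly decryption step 5.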

Algorithm 1: Generating DNA codons
1: for i = 0 to 3 do
2:   for j = 0 to 3 do
3:     for k = 0 to 3 do
4:       NUM_FORMAT = ijk
5:       codon = DNA(NUM_FORMAT)
6:     end for
7:   end for
8: end for

The decryption process The input of the decryption process is a fake DNA sequence S̈ = (IDś (IDsDṔ)′) with a hidden secret message. The decryption process is as follows:

1. Extract the first bases, which represent the ID of the Ś used by the sender to hide the data.
2. Find the second repeated nucleotides in Ś.
3. Extract the DṔ sequence from S̈ using the inverse replacement rules in Table 2.
4. Extract the first bases from DṔ, which represent the ID of S.
5. Decrypt DṔ using the associativity of XOR and the fact that S XOR S = 0: DṔ XOR S = (DP XOR S) XOR S = DP XOR (S XOR S) = DP.
6. Decode DP into letters, with each group of three nucleotides representing a letter from the English alphabet.
7. Get the plaintext P [19].

An illustrative example is given in Fig. 4 showing the decryption process.

Fig. 4  Example of the decryption process

Probability of cracking The number of candidate reference DNA sequences is about 163 million; thus, the probability of predicting the reference DNA sequence S is $1/(1.63 \times 10^{8})$. The probability of guessing the second selection Ś, the reference DNA sequence used to hide the secret message, is also $1/(1.63 \times 10^{8})$. There are $24^4$ possible hiding situations based on Table 2; thus, the probability of an attacker making a successful guess is $1/24^4$. The total number of permutations between the letters of the English alphabet (capital and small letters), the ten digits, the two punctuation marks, and the 64 codons generated by Algorithm 1 is $P^{64}_{64} = 64!$ [19]. Thus, the probability of inferring the secret message is:

$$\left(\frac{1}{1.63\times 10^{8}}\right)^{2} \times \frac{1}{24^{4}} \times \frac{1}{P^{64}_{64}} \qquad (2)$$

3.3 Enhanced DNA‑based steganography technique with a higher hiding capacity

Marwan, Shawish, and Nagaty [20] introduced this approach to simplify the current techniques and obtain a higher hiding capacity. This technique follows two phases. The first phase is the encryption phase, which uses a modified version of the 5 × 5 Playfair cipher grid called the 4 × 4 Playfair cipher grid. The result of this phase is an encrypted message. The second phase is the hiding phase, which is a substitution process used for hiding the encrypted message. The result of this phase is a fake DNA sequence. The encryption and decryption processes are described below.

The encryption process There are four inputs to the encryption process: a message, a key, initial values of the 4 × 4 binary grid, and initial values of the 4 × 4 DNA grid. The 4 × 4 binary grid and DNA grid must be shared between the sender and the receiver before the encryption and decryption processes. The encryption process steps are as follows:

1. Generate 16 random English letters to create the 4 × 4 Playfair cipher grid, using the given key input as a seed value; an example of a Playfair cipher grid is given in Table 3.
2. Shuffle the initial values of the 4 × 4 binary grid and the 4 × 4 DNA grid using the key; an example of a shuffled 4 × 4 binary grid and DNA grid is given in Tables 4 and 5, respectively.

3. Convert the input message into a binary sequence (B).
4. Find each 4-bit value of B in the 4 × 4 binary grid, map its position to the corresponding position in the 4 × 4 cipher grid, and fetch the English letter. The result of this step is a sequence of English letters (E).
5. Apply the Playfair cipher technique to the sequence of English letters (E) to get the encrypted text (C).
6. For each letter of (C), map its position in the 4 × 4 Playfair cipher grid to the corresponding position in the 4 × 4 DNA grid and get the values. The outcome of this step is a DNA sequence (Enc).
7. Pick a reference DNA sequence from the DNA database.
8. Hide the encrypted DNA sequence in the chosen reference DNA sequence using the substitution process [20].

An illustrative example is given in Fig. 5 showing the encryption process.

Fig. 5  Example of the encryption process

Table 3  Example of 16 randomly generated English letters

H  C  M  U
D  G  Z  B
I  A  X  J
Q  V  W  F

Table 4  4 × 4 shuffled binary grid

0110  0001  1000  0000
1001  0101  1010  1110
0100  1111  1011  1101

Table 5  4 × 4 shuffled DNA grid

GT  CG  CA  TG
TA  TC  GG  AA
AT  TT  CT  AG
GC  GA  CC  AC

The decryption process There are two inputs to the decryption process: the encrypted DNA sequence and the key. The receiver obtains these inputs through a secure channel. The initial values of the 4 × 4 binary grid and DNA grid should be shared before the encryption and decryption processes.

1. Use the reverse of the substitution process to extract the hidden encrypted DNA sequence.
2. Shuffle the initial values of the 4 × 4 binary grid and the 4 × 4 DNA grid using the key.
3. Generate 16 random English letters to create the 4 × 4 Playfair cipher grid, using the received key as a seed value.
4. For each group of two letters of the DNA sequence (Enc), map its position in the 4 × 4 DNA grid to the corresponding position in the 4 × 4 cipher grid and get the values. The outcome of this step is the encrypted text (C).
5. Apply the inverse of the Playfair cipher technique to the encrypted text (C) to get a sequence of English letters (E).
6. For each English letter in (E), map its position in the 4 × 4 cipher grid to the corresponding position in the 4 × 4 binary grid and get the values. The outcome of this step is a binary sequence (B).
7. Convert the binary sequence (B) into the original message [20].

An illustrative example is given in Fig. 6 showing the decryption process.

Probability of cracking In this technique, the attacker needs four types of information to decrypt a message: the binary representation, the reference DNA, the complementary rule, and the ciphering technique. Since there are 4 DNA bases, the number of possible binary schemes is 4!; thus, the probability of getting the binary scheme b is $1/4!$. The probability of guessing the reference DNA r is $1/(1.6 \times 10^{8})$. The probability of guessing the complementary rule c is $1/16$. Thus, the probability of cracking the key k is the product of these three probabilities, as given in Eq. (3).
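Encryption steps 3 to 6 (and their inverses, decryption steps 4 to 7) amount to three grid lookups around a 4 × 4 Playfair step. The Python sketch below uses the example grids of Tables 3 and 5; because Table 4 as reproduced lists only 12 of its 16 values, it substitutes a hypothetical binary grid holding the 16 nibbles in ascending order. It also assumes standard Playfair rules adapted to a 4 × 4 grid (same row: shift right; same column: shift down; otherwise swap columns) and treats a doubled letter as a same-row pair so that the mapping stays invertible; [20] may handle these cases differently. Key-seeded grid generation (steps 1 and 2) and the final substitution into a reference DNA (steps 7 and 8) are omitted.

```python
# Table 3: example 4 x 4 Playfair grid.
PLAYFAIR = ["HCMU", "DGZB", "IAXJ", "QVWF"]
# Table 5: example shuffled 4 x 4 DNA grid.
DNA_GRID = [["GT", "CG", "CA", "TG"],
            ["TA", "TC", "GG", "AA"],
            ["AT", "TT", "CT", "AG"],
            ["GC", "GA", "CC", "AC"]]
# HYPOTHETICAL binary grid (stand-in for Table 4): nibbles 0000..1111 in order.
BIN_GRID = [[format(4 * r + c, "04b") for c in range(4)] for r in range(4)]


def positions(grid):
    """Map each cell value of a 4 x 4 grid to its (row, column)."""
    return {v: (r, c) for r, row in enumerate(grid) for c, v in enumerate(row)}


POS_BIN, POS_LETTER, POS_DNA = positions(BIN_GRID), positions(PLAYFAIR), positions(DNA_GRID)


def playfair_pair(a: str, b: str, step: int) -> str:
    """4 x 4 Playfair on one letter pair; step=+1 encrypts, step=-1 decrypts."""
    (r1, c1), (r2, c2) = POS_LETTER[a], POS_LETTER[b]
    if r1 == r2:  # same row (also covers a doubled letter): shift along the row
        return PLAYFAIR[r1][(c1 + step) % 4] + PLAYFAIR[r2][(c2 + step) % 4]
    if c1 == c2:  # same column: shift along the column
        return PLAYFAIR[(r1 + step) % 4][c1] + PLAYFAIR[(r2 + step) % 4][c2]
    return PLAYFAIR[r1][c2] + PLAYFAIR[r2][c1]  # rectangle: swap columns (self-inverse)


def encrypt(message: str) -> str:
    bits = "".join(format(byte, "08b") for byte in message.encode("ascii"))       # step 3: B
    letters = [PLAYFAIR[r][c] for r, c in
               (POS_BIN[bits[i:i + 4]] for i in range(0, len(bits), 4))]          # step 4: E
    cipher = "".join(playfair_pair(letters[i], letters[i + 1], +1)
                     for i in range(0, len(letters), 2))                           # step 5: C
    return "".join(DNA_GRID[r][c] for r, c in (POS_LETTER[ch] for ch in cipher))  # step 6: Enc


def decrypt(enc: str) -> str:
    cipher = [PLAYFAIR[r][c] for r, c in
              (POS_DNA[enc[i:i + 2]] for i in range(0, len(enc), 2))]              # step 4: C
    letters = "".join(playfair_pair(cipher[i], cipher[i + 1], -1)
                      for i in range(0, len(cipher), 2))                           # step 5: E
    bits = "".join(BIN_GRID[r][c] for r, c in (POS_LETTER[ch] for ch in letters))  # step 6: B
    return bytes(int(bits[i:i + 8], 2)
                 for i in range(0, len(bits), 8)).decode("ascii")                  # step 7


enc = encrypt("Hi")
print(len(enc), decrypt(enc))  # 8 Hi
```

In this sketch each 8-bit character becomes two letters and therefore four nucleotides of Enc, i.e. 2 bits per nucleotide, which is in line with the 2 bpn figure reported for this technique in Sect. 5.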

$$\frac{1}{4! \times 1.6\times 10^{8} \times 16} \qquad (3)$$

Fig. 6  Example of the decryption process

4 Security analysis

In this section, we analyze the security of all the algorithms discussed in the previous section and investigate whether they fulfill the security requirements summarized by the factors below.

The quality of a DNA steganography technique depends on many factors, including the following:

• Cracking probability of the algorithm: the probability is analyzed to ensure that it is minimal, since a minimal cracking probability leads to a secure steganography technique.
• Layer of security, which refers to the number of DNA sequences used in the data hiding technique; the algorithm can have a single or a double hiding layer. A double hiding layer technique is more secure than a single one [4].
• Blindness, which means that the algorithm does not require sending the original DNA sequence to the recipient. The security degree is maximized by minimizing the data that must be sent to the recipient.
• Modification rate and payload: a low modification rate and a payload equal to zero result in higher-quality DNA steganography.
• Encrypting the secret data into ciphertext before embedding it into a DNA sequence is more secure than embedding the original data.
• Preserving the original functionality of the reference DNA, so that its function of producing proteins is not affected.
• The key, the most essential factor in a data hiding technique, which makes attacking the data hiding system much more difficult.
• The capacity, defined as the amount of data hidden within the carrier. A data hiding technique must maintain a sizable hiding capacity.
• Bits per nucleotide (bpn), the total number of bits hidden per nucleotide [21–25].

Highly improved DNA-based steganography techniques The algorithm has many features distinguishing it from other algorithms used for hiding data in DNA sequences. One of the most important advantages is that the probability of an adversary recovering the data inside the DNA sequence is very small. In addition, this algorithm encrypts the information before it is hidden in a DNA sequence, and the encryption process is based on a randomly chosen key (K1). Another random key (K2) is used to divide the DNA sequence into segments. The ciphertext bits are then placed between these segments, with the two keys kept secret, which increases the strength of this algorithm. In contrast, other data hiding algorithms place the message inside the DNA sequence as one contiguous series, which is insecure. In this technique, however, after the message is encrypted, the ciphertext is divided into binary bits, and each bit is inserted between the segments of the DNA sequence. Thus, it is difficult to extract the ciphertext and distinguish it from a long DNA sequence. Although this technique has many advantages, it also has several defects. The algorithm changes the length of the DNA sequence, whereas a human genome consists of 3 × 10^9 pairs of nucleotides [11]. This is because the algorithm does not remove DNA stretches to replace them with the encrypted message but directly inserts the ciphertext, thereby increasing the length of the DNA sequence; thus, it would be easy for an adversary to identify the fake DNA. Also, this technique does not take into account preserving the functionality of the DNA sequence.

Data hiding using double DNA sequences techniques The technique is highly secure for several reasons. First, the secret data is encrypted before being embedded in the DNA sequence. Moreover, it is a double-layer technique that uses two different DNA sequences (S, Ś) to ensure the

security. This technique also possesses a very high data hiding capacity and preserves the reference DNA's original function of producing proteins. The output of the encryption process is a fake DNA sequence with a modification rate of approximately 28.4%, which is low [19]. The expansion rate is equal to zero, which means that after embedding the secret data, the length of the reference DNA sequence is not expanded. A low modification rate and a zero expansion rate ensure security and result in a better-quality fake DNA sequence. Furthermore, this technique is a blind technique, meaning that there is no need to send the original DNA to the receiver, so the security degree is maximized. Finally, the probability of cracking is low. On the other hand, this technique has some weaknesses. The replacement rules must be sent to the receiver, and the plaintext must contain only capital letters, small letters, 0, …, 9, a period, and a dot; it cannot contain other punctuation marks. Also, the algorithm does not use any type of key.

Enhanced DNA-based steganography technique with a higher hiding capacity The security of this technique is based on several elements. First, the secret data is encrypted before being embedded into the DNA sequence. Moreover, the encryption and hiding of the secret data are done using the Playfair and substitution methods, respectively. The Playfair method provides a higher hiding capacity and stronger security, besides being fast and simple. The substitution method preserves the length of the DNA sequence, so the payload is always zero. Furthermore, this technique uses a secret key, which grants a higher security level to the data hiding system. Finally, preserving the reference DNA's original function of producing proteins is a considerable asset of this technique. On the other hand, the technique is not a blind technique: the sender and receiver must share some data before the encryption and decryption processes. The purpose of blindness is to maximize the security level by reducing as much as possible the data that must be transferred to the receiver.

A comparison between the investigated techniques is given to highlight the advantages and disadvantages of the existing data hiding algorithms and to provide motivation for future research in this field. Table 6 presents the strengths and weaknesses of each previously explained technique.

5 Experimental results

The techniques were tested in [18–20] using eight real DNA sequences from the NCBI database (https://www.ncbi.nlm.nih.gov/). The goal of the experiments was to evaluate the discussed techniques based on several parameters, including capacity, payload, and bpn. As mentioned before, the capacity refers to the total length of the extended reference sequence after the secret message is hidden within it, which can be calculated as $|S| + |M|/2$ [18]. The payload is the remaining length of the new sequence after extracting out the reference DNA sequence, and can be calculated as $|M|/2$ [18]. The bpn is the number of bits hidden per nucleotide, which can be calculated as $\mathrm{bpn} = |M|/C$ [19], where |M| is the length of the secret message in bits, C is the capacity, and |S| is the length of the reference DNA sequence.

We now show and compare the experimental results of the three techniques. Table 7 shows the performance of the data hiding using double DNA sequences technique and the highly improved DNA-based steganography technique for hiding a 20,000-byte secret message in the DNA sequence in terms of capacity, payload, and bits per nucleotide (bpn).

Table 6  Comparing the discussed techniques based on different quality factors

Quality factor | Highly improved DNA-based steganography | Data hiding using double DNA sequences | Enhanced DNA-based steganography
Cracking probability | Very low cracking probability | Low cracking probability | Low cracking probability
Security layer | Double layer | Double layer | Double layer
Blindness | Does not support blindness | Supports blindness | Does not support blindness
Modification rate | Low | Low | Low
Payload | Not equal to zero | Always equal to zero | Always equal to zero
Expansion rate | Different DNA length | Same DNA length | Same DNA length
Encrypting the secret data | Yes (XOR) | Yes | Yes
Preserving DNA functionality | Changes DNA functionality | Preserves DNA functionality | Preserves DNA functionality
Using keys | Uses two keys | Does not use a key | Uses a key
High capacity | Yes | Yes | Yes
Easy to apply | Easy to implement | Not easy to implement | Easy to implement
Number of used DNA sequences | One | Two | One
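The qualitative "Cracking probability" ratings in Table 6 can be compared against Eqs. (1)–(3) by evaluating the formulas in log10 form, since the value of Eq. (1) underflows double-precision floating point. The sketch below is illustrative only: for Eq. (1) it assumes a 20,000-byte message (m = 160,000 bits) hidden in the 200,117-nucleotide sequence AC153526 of Table 7 (s = 400,234 bits, n = s + m); these parameter choices are ours, not taken from [18].

```python
import math

LOG10_2 = math.log10(2)


def log10_eq1(n: int, m: int, s: int) -> float:
    """log10 of Eq. (1): 1/1.63e8 * 1/24 * 1/(n-1) * 1/(2^m - 1) * 1/2^(s-1) * 1/2^(8m)."""
    return -(math.log10(1.63e8) + math.log10(24) + math.log10(n - 1)
             + math.log10(2 ** m - 1) + (s - 1) * LOG10_2 + 8 * m * LOG10_2)


def log10_eq2() -> float:
    """log10 of Eq. (2): (1/1.63e8)^2 * 1/24^4 * 1/64!."""
    log10_64_factorial = math.lgamma(65) / math.log(10)  # lgamma(65) = ln(64!)
    return -(2 * math.log10(1.63e8) + 4 * math.log10(24) + log10_64_factorial)


def log10_eq3() -> float:
    """log10 of Eq. (3): 1 / (4! * 1.6e8 * 16)."""
    return -math.log10(math.factorial(4) * 1.6e8 * 16)


# Hypothetical parameters for Eq. (1): 20,000-byte message, sequence AC153526.
m = 20_000 * 8
s = 2 * 200_117
n = s + m
for name, value in [("Eq. (1)", log10_eq1(n, m, s)),
                    ("Eq. (2)", log10_eq2()),
                    ("Eq. (3)", log10_eq3())]:
    print(f"{name}: about 10^{value:.1f}")
```

Under these assumptions Eq. (1) is smaller than Eqs. (2) and (3) by many orders of magnitude, which agrees with the "Very low" versus "Low" ratings in Table 6.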

Table 7  The capacity, payload, and bpn for each technique

                               Highly improved DNA-based          Data hiding using double DNA
                               steganography techniques           sequences techniques
Sequence   No. of nucleotides  Capacity   Payload   Bpn           Capacity   Payload   Bpn
AC153526   200,117             280,117    80,000    1.52          200,117    0         0.577
AC166252   149,884             229,884    80,000    1.2           149,884    0         0.580
AC167221   204,841             284,841    80,000    1             204,841    0         0.563
AC168874   206,488             286,488    80,000    1.38          206,488    0         0.560
AC168897   200,203             280,203    80,000    1.49          200,203    0         0.565
AC168901   191,456             271,456    80,000    1.99          191,456    0         0.583
AC168907   194,226             274,226    80,000    1.6           194,226    0         0.580
AC168908   218,028             298,028    80,000    1.52          218,028    0         0.583
Average bpn                                         1.46                                 0.574
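The capacity and payload formulas of Sect. 5 can be checked against Table 7 with a few lines of Python. This minimal sketch assumes |S| is counted in nucleotides and |M| in bits, so the 20,000-byte message gives |M| = 160,000 and |M|/2 = 80,000 nucleotides at 2 bits per nucleotide; under that reading the formulas reproduce the Capacity and Payload columns of Table 7. The function names are hypothetical, and the bpn column (reported from [18, 19]) is not recomputed here.

```python
MESSAGE_BITS = 20_000 * 8  # 20,000-byte secret message used in Table 7

# (sequence, |S| in nucleotides) from Table 7
SEQUENCES = [("AC153526", 200_117), ("AC166252", 149_884), ("AC167221", 204_841),
             ("AC168874", 206_488), ("AC168897", 200_203), ("AC168901", 191_456),
             ("AC168907", 194_226), ("AC168908", 218_028)]


def insertion_metrics(ref_nt: int, msg_bits: int) -> tuple:
    """Highly improved (insertion) method: capacity |S| + |M|/2, payload |M|/2."""
    payload = msg_bits // 2  # 2 message bits per inserted nucleotide
    return ref_nt + payload, payload


def substitution_metrics(ref_nt: int, msg_bits: int) -> tuple:
    """Double DNA sequences (substitution) method: length unchanged, payload 0.
    msg_bits is accepted only to mirror insertion_metrics."""
    return ref_nt, 0


for name, ref_nt in SEQUENCES:
    print(name, insertion_metrics(ref_nt, MESSAGE_BITS),
          substitution_metrics(ref_nt, MESSAGE_BITS))
```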

The enhanced DNA-based steganography technique this technique, an asymmetric encryption schema could
hides 2 bits per nucleotide; for example, a reference DNA be used and implemented by encrypting the message
sequence of 149,884 bp can hide a message up to 36.56 using any schema, and then starting the encryption
Kb [20], whereas the highly improved method can hide process of data hiding using the double DNA sequences
on average 1.46 which is higher than the 0.574 bpn that technique. This will increase the security degree and
the data hiding using double DNA sequences method eliminate one of its vulnerabilities.
can hide on average. As mentioned before, the enhanced DNA-based steg-
The data hiding using double DNA sequences method anography technique technique [20] has a weakness. The
preserves the length of the original DNA sequence (the initial values of the 4 × 4 binary grid and DNA grid must
payload is always 0), whereas the highly improved method be shared before the encryption and decryption pro-
and the enhanced DNA-based steganography method cesses, which means that this technique is not a blind-
increase the length of the reference DNA sequence. ness technique. To overcome this weakness, it is required
to minimize the shared data. Furthermore, using a public
key rather than a secret key would improve the algo-
6 Discussion rithm’s security.

The modified insertion technique [18] hides secret messages in DNA sequences by merging the message into protein sequences. To solve the problem of increasing DNA length, we recommend concatenating the bits obtained by XORing the message with K1, adding the encoded text to the DNA sequence, and then deleting the same number of bits from the original DNA sequence, taken from the part of the sequence that follows the fake DNA. In addition, this technique does not support blindness: the original DNA sequence must be sent to the recipient, which lowers the level of security. To solve this problem, we suggest using a primer key (Ks) that determines the position at which the ciphertext is embedded into the fake DNA. With this change, the technique supports blindness and no longer needs to send the original DNA to the recipient.

Data hiding using double DNA sequences [19] is considered a highly secure technique, but, like many techniques, it has some weaknesses. To improve the security and efficiency of any technique, we should focus on its vulnerabilities. One weakness of this technique is that it does not use a key. To provide more security to

7 Conclusion

Hiding data in DNA sequences is a new and evolving field. This paper investigates several recently proposed DNA-based steganography techniques and analyzes each one separately, specifying its advantages and disadvantages. The techniques were compared on the security and quality factors that are important for developing efficient and secure DNA-based data hiding techniques. The analysis shows that all the reviewed techniques meet most of the security and quality requirements: all three have a low cracking probability, a double layer of security, a low modification rate, and a high hiding capacity. On the other hand, the highly improved DNA-based steganography technique achieves neither blindness nor a low expansion rate, and does not preserve the original functionality of the reference DNA sequence. Moreover, the data hiding using double DNA sequences technique does not use a key, which is
the most essential element in data hiding. Furthermore, the enhanced DNA-based steganography technique does not possess the blindness property. The aim of the comparison in this study is to support the design of efficient and secure DNA data hiding techniques; accordingly, the paper suggests ways of improving the investigated techniques with respect to the security requirements.

Compliance with ethical standards

Conflicts of interest The authors declare that they have no conflict of interest.

References

1. Information Resources Management Association (2018) Cyber security and threats: concepts, methodologies, tools, and applications. IGI Global
2. Provos N, Honeyman P (2003) Hide and seek: an introduction to steganography. IEEE Secur Priv 1(3):32–44
3. Krishnan RB, Thandra PK, Sai Baba M (2017) An overview of text steganography. In: 2017 4th international conference on signal processing, communication and networking (ICSCN). IEEE
4. Sokół B, Yarmolik VN (2005) Cryptography and steganography: teaching experience. In: Enhanced methods in computer security, biometric and artificial intelligence systems. Springer, Boston, pp 83–92
5. Siper A, Farley R, Lombardo C (2005) The rise of steganography. In: Proceedings of student/faculty research day, CSIS, Pace University
6. Selvaraj D (2017) Development of a secure communication system based on steganography for mobile devices. p 3
7. Vinodhini RE, Malathi P, Gireesh Kumar T (2017) A survey on DNA and image steganography. In: 2017 4th international conference on advanced computing and communication systems (ICACCS). IEEE
8. Kahn D (1996) The history of steganography. In: International workshop on information hiding. Springer, Berlin
9. Petitcolas FAP, Anderson RJ, Kuhn MG (1999) Information hiding—a survey. Proc IEEE 87(7):1062–1078
10. Malathi P, Gireeshkumar T (2016) Relating the embedding efficiency of LSB steganography techniques in spatial and transform domains. Procedia Comput Sci 93:878–885
11. Clelland CT, Risca V, Bancroft C (1999) Hiding messages in DNA microdots. Nature 399:533–534
12. Sharma A (2016) Security and information hiding based on DNA steganography. A Monthly J Comput Sci Inf Technol 5(3):827–832
13. Ginu A, Jeenu J, Vishnu P, Jerin D (2017) DNA based cryptography and steganography. GRD J Glob Res Dev J Eng 2:249–253
14. Kiss G (2018) How to teach the history of cryptography and steganography. Educaţia Plus 20(2):13–23
15. Abbasy MR et al (2012) DNA base data hiding algorithm. Int J New Comput Archit Appl 2(1):183–192
16. Khalifa A (2013) LSBase: a key encapsulation scheme to improve hybrid crypto-systems using DNA steganography. In: 2013 8th international conference on computer engineering & systems (ICCES). IEEE
17. Petsko GA, Ringe D (2004) Protein structure and function. New Science Press, Beijing
18. Malathi P, Manoaj M, Manoj R, Vaikunth R, Vinodhini R (2017) Highly improved DNA based steganography. Procedia Comput Sci 115:651–659
19. Ibrahim FE, Abdalkader HM, Moussa MI. Enhancing the security of data hiding using double DNA sequences. In: Industry Academia collaboration conference (IAC)
20. Marwan S, Shawish A, Nagaty K (2015) An enhanced DNA-based steganography technique with a higher hiding capacity. Bioinformatics 1:150–157
21. S Sajisha K (2017) An encryption based on DNA cryptography and steganography. In: International conference on electronics, communication and aerospace technology (ICECA)
22. Jain S, Bhatnagar V (2014) Analogy of various DNA based security algorithms using cryptography and steganography. In: 2014 international conference on issues and challenges in intelligent computing techniques (ICICT). IEEE
23. Hamed G et al (2016) Comparative study for various DNA based steganography techniques with the essential conclusions about the future research. In: 2016 11th international conference on computer engineering & systems (ICCES). IEEE
24. Hamed G et al (2015) Hybrid technique for steganography-based on DNA with n-bits binary coding rule. In: 2015 7th international conference of soft computing and pattern recognition (SoCPaR). IEEE
25. Dilovan Z, Habibollah H, Subhi Z (2017) Security issues in DNA based on data hiding: a review. Int J Appl Eng Res ISSN

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
