DNA Cryptography

Uploaded by

Shahriar Hassan



A Hybrid Encryption Technique based on DNA Cryptography and Steganography

Shahriar Hassan¹, Md. Asif Muztaba¹, Md. Shohrab Hossain¹ and Husnu S. Narman²
¹Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Bangladesh
²Department of Computer Sciences and Electrical Engineering, Marshall University, Huntington, WV, USA
Email: [email protected], [email protected], [email protected], [email protected]

Abstract—The importance of data and its transmission rate are increasing as the world is moving towards online services every day. Thus, providing data security is becoming of utmost importance. This paper proposes a secure data encryption and hiding method based on DNA cryptography and steganography. Our approach uses DNA for encryption and data hiding processes due to its high capacity and simplicity in securing various kinds of data. Our proposed method has two phases. In the first phase, it encrypts the data using DNA bases along with Huffman coding. In the second phase, it hides the encrypted data into a DNA sequence using a substitution algorithm. Our proposed method is blind and preserves biological functionality. The result shows a decent cracking probability with comparatively better capacity. Our proposed method has eliminated most limitations identified in the related works. Our proposed hybrid technique can provide a double layer of security to sensitive data.

Index Terms—DNA Cryptography, DNA Steganography, Hybrid Encryption, Huffman Coding.

978-1-6654-6316-4/22/$31.00 ©2022 IEEE

I. INTRODUCTION

In this new era of information technology, the security and confidentiality of information are becoming crucial. The need for confidential information transmission is increasing (such as online transactions). Therefore, we need a strong encryption model. For this purpose, researchers aim to find a more robust system of data encryption. Moreover, some information needs to be transferred by hiding the encrypted data in some medium, such as images, audio, video, etc., to avoid intruders' attention because of security concerns. Therefore, the hybridization of encryption and data hiding is getting more research attention.

Researchers are concentrating on Deoxyribonucleic Acid (DNA) to develop a more robust encryption model because of its advantages such as ultra-high storage density, ultra-low energy consumption, and the potential of ultra-large-scale parallel computing to realize the cryptographic functions of information encryption and authentication [1]. Furthermore, a DNA sequence comprises only four symbols, which can be used to encrypt any data. The DNA sequence is also interesting for data hiding. There are around 163 million DNA sequences available in the public database. Hence, using a DNA sequence as a medium significantly lowers the system cracking probability and makes the system robust.

Finding a specific algorithm to encrypt and hide the data in such a way that it does not get intruders' attention is challenging because no extra information is sent (blind technique), and it should have a decent cracking probability.

Most previous works focused on developing only the encryption method [2], and others focused on developing only the data hiding method [3]–[8], thereby providing only a single layer of protection. The previously proposed hybrid methods [2], [4], [9] used DNA cryptography and steganography with the Playfair cipher method, which decreases capacity. The works in [3], [9], [10] have expanded the reference DNA size, which will get the attention of intruders, and the works in [5], [6], [9], [10] have not preserved the biological functionality of the DNA.

The works using the Playfair cipher method generate ambiguous bits and transfer them in the reference sequence because the bits are like a key to deciphering the text. Hence, that affects the capacity of the method. Our aim is to develop a method that improves the data hiding capacity while remaining blind, which means that no information other than the reference sequence needs to be sent, and while preserving the biological functionality of the DNA. Thus, our method addresses the limitations of the previous works.

The objective of this work is to propose a robust method of data encryption by using DNA sequences so that data can be transmitted securely without getting the attention of the intruder. The contributions of this work are: (i) proposing a hybrid method for data encryption based on DNA cryptography and steganography, (ii) performing security analysis of our proposed hybrid model, and (iii) comparing the proposed model with existing works to analyze the efficiency.

In our proposed method, we used the DNA cryptography concept and Huffman coding for data encryption, and used DNA as a medium to hide encrypted data with the substitution method. Results show that our method has a decent system cracking probability and capacity. Moreover, the payload is zero for our method. The comparison result shows that our method has overcome the limitations of the previous methods. Our proposed approach will help secure data transmission, especially in banking, e-commerce, authentication, and secure server-client communication.

The rest of the paper is organized as follows. In Section II, we have explained some of the terminologies used in this paper and briefly discussed some existing works with their advantages and disadvantages. In Section III, the proposed approach is explained along with its strengths and limitations. Section IV presents the security analysis of the proposed method. In Section V, we have compared our method with some related works and discussed the outcome. Section VI presents the implementation details and results. Finally, we conclude the paper in Section VII.

II. BACKGROUND AND LITERATURE REVIEW

A. Terminology

Security in data communication is required when message transfer between sender and receiver needs to be confidential. Cryptography is the process of achieving confidentiality in message transfer. Cryptography can be thought of as a process of secret writing in order to protect data or messages from various intruders' attacks. Secret writing is achieved through the process of transforming a message called plaintext into cipher text using a cryptographic algorithm. Security is concerned with protecting messages or data while transmitting over networks. DNA stands for Deoxyribonucleic Acid. It contains biological information about every living being. A DNA sequence is a sequence of nucleotides. The nucleotides are Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). DNA cryptography refers to converting plain text into a sequence of nucleotides based on some specified rules. DNA can be used to hide data. Hiding data in some medium, like image, audio, video, etc., is known as steganography. Hiding data in a DNA sequence is known as DNA steganography.

B. Related Works based on DNA Steganography

Shiu et al. [3] proposed three methods of hiding messages based on DNA and considered them the main methods. The first method is the insertion method, which inserts a message bit in random places of a DNA reference sequence. It obviously expands the real sequence. The second method is based on complementary rules, which detect the longest complementary pair in a DNA sequence. They insert message bits before them, which also increases DNA size. The third one is based on the substitution method, where some DNA nucleotides are substituted based on secret message bits. Guo et al. [4] proposed a substitution data hiding technique using motif finding in a DNA sequence. Repeated nucleotides in a DNA sequence are known as motifs. They [4] find those motifs and substitute them with other nucleotides based on message bits. Yunus et al. [5] also proposed a substitution method based on motif finding in a DNA sequence. This method does not expand the DNA length but is not blind. Also, high modification may be required if the number of motifs is high in a DNA sequence. Hamed et al. [6] also proposed a complementary rule-based steganography method. The complementary rule is the rule that specifies the strand of DNA directly opposite to a specified sequence. It does not expand the length of the sequence and is blind. However, it does not preserve the biological function of the DNA. Mousa et al. [7] proposed a hiding technique that preserves the biological functionality of the DNA sequence using the reverse mapping method. The method is based on the substitution method and does not expand the length of the sequence. Vijaykumar et al. [8] proposed a DNA steganography model for image encryption. This method first converts the 3*3 matrix of pixels of the cover image to nucleotides. Then, nucleotides are converted to binary. Then, the XOR operation is performed between the message and the cover image.

C. Related Works based on DNA Cryptography

Namdev et al. [2] proposed a method that significantly modifies the older DNA and amino acid based approach with the Playfair cipher by keeping the same approach but using a different encryption algorithm, i.e., a Foursquare cipher, at the core of the ciphering process. In this study, a binary form of data, such as plaintext messages or images, is transformed into sequences of DNA nucleotides. Subsequently, these nucleotides pass through a Foursquare encryption process based on amino acid structure. The fundamental idea behind this encryption process is to strengthen other conventional cryptographic algorithms that proved to be broken and to open the door for applying the DNA and amino acid concepts to more conventional cryptographic algorithms to enhance their security features.

D. Hybrid Methods based on DNA Cryptography and Steganography

Mitras et al. [10] proposed a hybrid method based on the RSA algorithm and DNA encryption. They mapped the message bits into a DNA sequence and amino acids. Then they used the insertion method to hide the encrypted data into an actual DNA sequence. The method is not blind. Taur et al. [9] proposed a hybrid method that uses a Playfair cipher based on DNA and amino acids followed by data hiding using the insertion method. They used the 5*5 Playfair cipher method. The method is blind and has a low cracking probability. Yadav et al. [11] proposed a hybrid technique that uses images to hide a message and DNA to encrypt a message. They first convert the message into DNA and from DNA to cipher text. Then, they take a cover image and manipulate the pixel values according to the cipher text. The algorithm used for hiding in the image is a well-known algorithm named KIMLA.

E. Gap analysis

There have been a few works [3], [9], [10] which hide data based on the insertion method and expand the DNA sequence's size. Hence, it might draw the attention of the intruder to the transmission. A few other works [4], [5], [7] used substitution methods for data hiding, and there was no expansion of the DNA sequence. However, they did not use any encryption method. Others [3], [8] did not use any encryption before steganography. Again, [2] did not use steganography. Thus, they provide a single layer of protection.

In some works [8], [11], an image has been used for data hiding. Image resolution is changed when it is used as a cover image, and it may get the attention of the intruder. Also, a long message cannot be hidden in images of low resolution. On the other hand, DNA sequences can hide a long message, and the cracking probability of DNA steganography is very low since there are 163 million DNA sequences available in the public database [12].

In [6], the biological functionality of the DNA sequence is not preserved. Again, [2], [4], [9] used the Playfair cipher method for encryption, generating ambiguity and ambiguity bits that must be passed to the receiver for decryption. The ambiguous bits often decrease the capacity of the algorithm.

F. Novelty of Our Work

The novelty of our approach is that it provides double-layer security incorporating encryption and steganography techniques, and the technique is blind. It does not expand the DNA size so that it does not get the intruder's attention and preserves the sequence's biological functionality. It also decreases cracking probability and increases hiding capacity compared to the methods that use the Playfair cipher for data encryption. In short, it eliminates all the disadvantages mentioned above in the related works and incorporates their advantages.

III. PROPOSED APPROACH

Our proposed method has two phases:
• In the first phase (data encryption phase), we encode the plain text message into ASCII binary and then encode it with only DNA bases using 2-bit binary encoding. Then we apply the Huffman coding scheme to further encode the encoded message with a variable length code for the bases.
• In the second phase (data hiding phase), we hide the encoded message into an actual DNA sequence using the 3:1 LS Base method. Here we are using a modified 3:1 LS Base method to hide both data and key, making it very hard to break.

Thus, our first contribution is to encode the plain text message with DNA Cryptography and the Huffman Coding scheme. Moreover, our second contribution is innovatively hiding the encoded message and key into actual DNA sequences. The whole process is shown in Fig. 1 and described in the following subsections.

[Fig. 1: flowchart. Sender End: Convert Text to ASCII Binary; Convert ASCII to DNA Format Using 2-bit Binary Encoding; Encrypt Data (Huffman Coding Scheme, Encryption Key); Apply LSBase Steganography Method (Real DNA Sequence from the Publicly Available NCBI Database); Cipher Text. Receiver End: Cipher Text; Apply LSBase Reverse Steganography Method; Decrypt Data; Convert DNA Format to ASCII Using 2-bit Binary Encoding; Convert ASCII Binary to Text.]
Fig. 1. Flowchart of the proposed method: encryption process is shown on the left side and decryption process on the right side.

A. Phase I: Data Encryption

The data encryption process starts with converting the plain text message P containing letters, numbers, and special characters into ASCII binary. Then we take each two binary digits from left to right and convert two bits into one DNA base according to the 2-bit binary encoding rule. Table I shows the digital DNA base coding. In this way, we convert the plain text P into DNA bases M. Next, we calculate the frequencies of the DNA bases in the encoded message. Based on the frequencies, we apply the Huffman coding rule to get a variable length code for each base. After that, we convert M into MBIN using that variable length code. The algorithms of this encryption method and the Huffman Coding scheme are given below:

TABLE I
DIGITAL DNA BASE CODING.

| DNA Base | Binary Code |
|----------|-------------|
| A | 00 |
| T | 01 |
| G | 10 |
| C | 11 |

Algorithm: Encryption Procedure
Step 1: Convert the plain text message P into ASCII binary PBIN.
Step 2: Convert PBIN into DNA sequence M using 2-bit binary encoding.
Step 3: Derive the variable length Huffman code for each DNA base, i.e., A, T, G, C.
Step 4: Convert M into binary cipher text MBIN using the variable length code from the Huffman scheme.

Algorithm: Huffman Coding
Step 1: Obtain the frequency of each DNA base (A, T, C, G) from the DNA encoded string M.
Step 2: Sort the bases in ascending order based on frequencies.
Step 3: Take the two minimum frequencies and add them.
Step 4: Make the resultant frequency the root and the minimum frequencies its left and right children.
Step 5: Repeat steps 3-4 until a single tree is constructed.
Step 6: Starting from the root, label the left child with 0 and the right child with 1.
Step 7: Obtain the binary code for A, T, G, C.

The process is explained using a flowchart in Fig. 2. Assume our text message is "hello", which we want to send to our receiver securely. Hence, we want to encrypt it first in the above-mentioned way. Thus, we have P=hello. The ASCII binary of P is PBIN=01101000 01100101 01101100 01101100 01101111.

We convert PBIN into M by substituting every two bits with the corresponding DNA base. Thus, we get M= TGGA TGTT TGCA TGCA TGCC. Next, we derive the variable length code for A, T, G, and C with Huffman coding.
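The encryption procedure and the "hello" example above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments. The function names (`text_to_dna`, `sorted_bases`, `encrypt`, `decrypt`) are hypothetical, and ties between equally frequent bases are broken alphabetically as an assumption, since the text does not specify a tie rule. The codes 000, 001, 01, and 1 are assigned directly from the ascending frequency order, which matches the tree obtained for this example; a general Huffman construction could yield other code lengths for other frequency distributions.

```python
from collections import Counter

# Table I: 2-bit binary code for each DNA base.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Assumed fixed code shape, assigned from the least to the most frequent base.
CODES_BY_RANK = ("000", "001", "01", "1")


def text_to_dna(plain_text: str) -> str:
    """Steps 1-2: text -> 8-bit ASCII binary -> DNA bases (2 bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in plain_text.encode("ascii"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def sorted_bases(dna: str) -> list[str]:
    """Bases in ascending order of frequency; this ordering acts as the key.

    Assumption: ties are broken alphabetically (not specified in the paper).
    """
    counts = Counter(dna)
    return sorted("ACGT", key=lambda base: (counts[base], base))


def encrypt(plain_text: str) -> tuple[str, list[str]]:
    """Steps 1-4: return the cipher bit string MBIN and the sorted-base key."""
    dna = text_to_dna(plain_text)
    key = sorted_bases(dna)
    code = dict(zip(key, CODES_BY_RANK))
    return "".join(code[base] for base in dna), key


def decrypt(cipher_bits: str, key: list[str]) -> str:
    """Invert encrypt(): prefix-decode MBIN to bases, then bases to text."""
    decode = dict(zip(CODES_BY_RANK, key))
    bases, buffer = [], ""
    for bit in cipher_bits:
        buffer += bit
        if buffer in decode:  # the four codes are prefix-free
            bases.append(decode[buffer])
            buffer = ""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")
```

For P = hello, `encrypt` returns the 40-bit string 10101000 10111 101001000 101001000 101001001 (spaces added for readability) together with the key order A, C, G, T, and `decrypt` restores the plain text from these two values.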

In this example, the base frequencies are A=3, T=7, G=6, and C=4. Hence, the sorted order of the bases is A - C - G - T. From the sorted order, we construct a tree as in Fig. 3. We get the variable length code for A, T, G, and C, which is shown in Table II. Next, we convert M to MBIN according to the Table II values. Therefore, we get MBIN = Cipher Text = 10101000 10111 101001000 101001000 101001001.

[Fig. 2: flowchart. Plain text P = hello; ASCII conversion gives PBIN = 01101000 01100101 01101100 01101100 01101111; DNA conversion (based on digital DNA base coding) gives M = TGGA TGTT TGCA TGCA TGCC; Huffman coding (A = 000, C = 001, G = 01, T = 1) gives the cipher text MBIN = 10101000 10111 101001000 101001000 101001001.]
Fig. 2. Flowchart explaining the data encryption process with an example.

[Fig. 3: Huffman tree. The root 20 splits into 13 (label 0) and T = 7 (label 1); 13 splits into 7 (label 0) and G = 6 (label 1); 7 splits into A = 3 (label 0) and C = 4 (label 1).]
Fig. 3. Huffman Code generation for DNA bases based on frequency as described in Huffman Coding Algorithm.

TABLE II
VARIABLE LENGTH CODE FOR DNA BASES.

| DNA Base | Huffman Code |
|----------|--------------|
| A | 000 |
| C | 001 |
| G | 01 |
| T | 1 |

B. Phase II: Data Hiding

In this phase of our hybrid algorithm, we hide our cipher text, which is the encrypted string of our plain text, into an actual DNA sequence. There are millions of natural DNA sequences available in the public database. We can get our DNA sequence from the NCBI (National Center for Biotechnology Information) database. Then, we hide the cipher text into that actual DNA sequence using the 3:1 LS (Least Significant) base method. However, we have modified the method to increase capacity and security. The process is straightforward. First, from the left, we select each base of the actual DNA sequence placed at a position that is a multiple of 3, i.e., 3, 6, 9, 12, 15, and so on. We substitute each of them with another base based on the binary value of our cipher text from left to right. One of every 3 bases in the actual DNA sequence contains cipher text, and that base is the least significant among the 3. Hence, it is called the 3:1 LS base method. If the base is a Purine base (A or G), then we substitute it with A to encode 0 and G to encode 1 from the cipher text. If the base is a Pyrimidine base (T or C), then we substitute it with C to encode 1 and U to encode 0 from the cipher text. We encode this way until the end of the cipher text or of the actual DNA sequence is reached. We hide our key (the variable length code from the Huffman code) into the first 5 bases of the actual DNA sequence, leaving the third base for cipher text encoding.

We get this opportunity because, in the case of the Huffman coding scheme, the set of variable length codes is actually fixed, but their assignment changes with the bases' frequencies. That means the variable length codes can only be 000, 001, 01, and 1 every time. However, which base is represented by which code depends on the frequency only. The least frequent base is encoded with 000, then 001, then 01, and the most frequent one with 1. Thus, if we send only the sorted order of the bases, the code can be obtained. We substitute the 1st, 2nd, 4th and 5th bases with the sorted list of the bases based on the frequency. It also gives us another benefit in security, which we will discuss in the security analysis section. Therefore, in this way, we get our cipher text with the key hidden in our actual DNA sequence. The process is explained using a flowchart in Fig. 4.

[Fig. 4: flowchart. Plain text P = hello is encrypted to the cipher text MBIN; a DNA sequence D is taken from NCBI; data hiding applies the 3:1 substitution rules (as shown in the figure: A/G with bit 0 becomes A, A/G with bit 1 becomes G, T/C with bit 0 becomes U, T/C with bit 1 becomes T) and substitutes the 1st, 2nd, 4th and 5th bases by the sorted bases from the Huffman code, giving the final cipher text DC.]
Fig. 4. Flowchart explaining the data hiding process with an example.

From the previous example, we saw that the length of the cipher text MBIN is 40 bits. To hide it in a DNA sequence, the length of that sequence needs to be at least 120bp. Let the DNA sequence be

D=AATTCCAAAGAAACAGACTCTACAGCCAGCGAAGGCATGGATTTGCTGGCTGGGGCAAACAGGCAAAGAGAGAGCAAGCCTTCTTCTTCCATATCCTTTATATAGACTGCCAACTAAAGG.

We check the 3rd base, which is T, and the 1st bit of our cipher text is 1. Thus, we substitute it with C. Then, we go to the 6th base, which is C. The 2nd bit of the cipher text is 0. Thus, we substitute it with U. The process continues in this way. After all the cipher text is hidden, we hide the key. The sorted list of the bases from the previous example was A - C - G - T. Hence, we substitute the 1st base of the DNA sequence with A, the 2nd one with C, the 4th one with G and the 5th one with T. The final DNA sequence is

DC=AACTCUAAGGAAACGGAUTCUACAGCCAGUGAGGGUATGGACTTGCTAGCUGGAGCAAAUAGACAAAGAGCCTUACCGCACTCCTCCCGTACCCCTTATAUAGACTGCCAACCUAAAAGG

C. Data Extraction - Receiver Side

At the receiver end, the received message looks just like an actual DNA sequence, which contains a hidden encrypted message and a key to decrypt it. We need to do the opposite procedure to get back the actual message from the received message. First, we visit every base at a position that is a multiple of 3 and check it. If it is A or U, then the cipher bit was 0. If it is C or G, the cipher bit was 1. In that way, we first extract the cipher text. Next, we scan the cipher text and identify which of the codes 000, 001, 01, and 1 appears at each place. Then, according to the Huffman coding scheme, we get the DNA encrypted message back. We learn the Huffman representation of the bases from the first 5 bases. Next, we convert the DNA encrypted message to binary using 2-bit binary encoding. That means we check each base of the DNA sequence and write the digital code of that base. In this way, we get the ASCII representation of our actual message. Finally, we convert that to characters. That is the actual message we wanted to send securely. The whole process is shown in Fig. 1.

D. Strength of Our Approach

Our approach has several strong points described below:
1. This algorithm ensures three layers of protection against intruders.
a. Conversion of plain text to a DNA sequence.
b. Encoding it again with variable length coding.
c. Hiding it into an actual DNA sequence.
2. The process of hiding the key or Huffman code into the DNA sequence makes the fake DNA sequence unique and makes it difficult for the intruder to find the actual DNA sequence in the database.
3. On the receiver side, the decryption process is simpler and takes less effort, which benefits this model in a server-client network, where client-side machines are less powerful and therefore have less work to do.
4. Though our model introduces data redundancy, it decreases cracking probability.

E. Limitations of Our Approach

Data redundancy is the main drawback of our approach. As we use the 3:1 LS base method, we need a DNA sequence three times as long as the cipher text to be hidden. Still, the number of processing steps for hiding remains within the length of the cipher text.

F. Cost Benefit Analysis

Though our model introduces redundancy, it makes the data sending highly secure. The machines today contain high processing power, and the internet connections are high speed. Hence, data redundancy is not the primary problem. From the security analysis below, we will see that the system cracking probability is very low; thus, it can be used to secure the transmission of highly sensitive data.

IV. SECURITY ANALYSIS

An intruder needs three vital pieces of information to recover the message from the encrypted message we sent: the DNA reference, the encoding rule, and the LSB substitution permutation. The analysis of these parameters is as follows.

A. DNA Reference Sequence

To decode the information, the intruder needs to guess the correct reference DNA so that they can analyze the changes in it to decode the message. This process is the toughest for our model as there are around 163 million DNA sequences in the public database. Again, the first 6 bases of the sequence might be fully changed in our model. Therefore, the intruder needs to analyze the remaining n-6 bases of a DNA sequence of length n to find the most related sequence. Therefore, the probability of making a correct guess of the DNA reference is:

$$P(\mathrm{DNARef}) = \frac{1}{1.63 \times 10^{8} \times (n - 6)} \tag{1}$$

B. Binary Encoding Rule

Let us assume that the intruder knows the number of symbols used in the encoding process; as it is a DNA sequence, the number of symbols is 4. The Huffman codes for the four symbols can be 000, 001, 01, and 1. Each of the four bases can get any of these codes. The 2-bit binary encoding for DNA bases also creates 4 codes: 00, 01, 10, and 11. Each of the bases can have any of these codes. Thus, the probability of guessing the right code each time, P(BER), is:

$$P(\mathrm{BER}) = \frac{1}{4! \times 4!} \tag{2}$$

C. The Least Significant Base Substitution Rule

The LS Base method is applied by substituting a pyrimidine base by 'U' to encode the secret bit '0' or by 'C' to encode '1'. However, it can also encode '0' by 'C' and '1' by 'U', and the same holds for the Purine base. Briefly, the '0' secret bit can be encoded by substituting the Pyrimidine base with 'U' or 'C'. If 'U' is selected for '0', then 'C' will be used to substitute the Pyrimidine base to encode '1'.
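The 3:1 LS base substitution of Phase II, together with the choice of bit assignment discussed in this subsection, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation evaluated in the paper: the function names `hide` and `extract` are hypothetical, and the receiver is assumed to know the number of hidden bits (`n_bits`), which the description above does not specify. The default maps follow the Phase II text (A/G for purine positions, U/C for pyrimidine positions); passing different maps models the alternative assignments an intruder would have to guess.

```python
PURINES = {"A", "G"}
# Default bit assignment taken from the Phase II description:
# purine position -> A for 0, G for 1; pyrimidine position -> U for 0, C for 1.
PURINE_MAP = {"0": "A", "1": "G"}
PYRIMIDINE_MAP = {"0": "U", "1": "C"}


def hide(cover: str, cipher_bits: str, key: list[str],
         purine_map=PURINE_MAP, pyrimidine_map=PYRIMIDINE_MAP) -> str:
    """Write one cipher bit into every third base, then the key into bases 1, 2, 4, 5."""
    if len(cover) < max(3 * len(cipher_bits), 5):
        raise ValueError("cover sequence too short for a 3:1 embedding")
    stego = list(cover)
    for i, bit in enumerate(cipher_bits):
        pos = 3 * i + 2  # 0-based index of the 3rd, 6th, 9th, ... base
        table = purine_map if stego[pos] in PURINES else pyrimidine_map
        stego[pos] = table[bit]  # purine stays purine, pyrimidine stays pyrimidine
    for pos, base in zip((0, 1, 3, 4), key):  # sorted-base key; 3rd base is left free
        stego[pos] = base
    return "".join(stego)


def extract(stego: str, n_bits: int,
            purine_map=PURINE_MAP, pyrimidine_map=PYRIMIDINE_MAP) -> tuple[str, list[str]]:
    """Recover the cipher bits and the sorted-base key from a stego sequence.

    Assumption: n_bits (the cipher text length) is known to the receiver.
    """
    to_bit = {base: bit
              for table in (purine_map, pyrimidine_map)
              for bit, base in table.items()}
    bits = "".join(to_bit[stego[3 * i + 2]] for i in range(n_bits))
    key = [stego[pos] for pos in (0, 1, 3, 4)]
    return bits, key
```

With the default maps and a cover starting with AATTCC, the first two cipher bits 1 and 0 turn the 3rd base T into C and the 6th base C into U, as in the worked example of Phase II. Swapping the roles of U and C in `pyrimidine_map` (or of A and G in `purine_map`) gives the other rule combinations considered in this subsection.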

TABLE III
C OMPARISON BETWEEN RELATED WORKS .

Comparison Cri- P1: Enhanced Double P2: DNA Base Data En- P3: Proposed Steganogra- P4: A New Data Hiding P5: The Proposed Method
teria Layer Security using RSA cryption and Hiding us- phy Approach using DNA Scheme Based on DNA
over DNA based Data ing Playfair and Insertion Properties [6] Sequence [5]
Encryption [10] Techniques [9]
Secret Text Type Any Type of Data Any Type of Data Any Type of Data Binary Data Any Type of Data
Binary Coding 2-Bit Binary Coding Rule 2-Bit Binary Coding Rule 2-Bit Binary Coding Rule Binary Coding Rule Inde- 2-Bit Binary Coding Rule
Rule pendent
Encryption Type Symmetric Asymmetric Not Applicable Not Applicable Symmetric
Encryption Algo- Encrypting secret data by 5*5 Playfair cipher based No Encryption No Encryption DNA Based Huffman
rithm mapping it to DNA and on DNA and amino acids Coding Encryption
amino acids
Data Hiding Al- Insertion Insertion Complementary rules Substitution method us- Substitution method using
gorithm based hiding method, ing repeated nucleotides the least significant base
which is the rule that to hide the secret message of each codon in the DNA
specifies the strand of bits reference sequence
DNA directly opposite a
specified sequence
Blind/Not Blind Not Blind Blind Not Blind Not Blind Blind
System Cracking P (S) = 1/(1.63 ∗ 108 ∗ P (S) = 1/(1.63 ∗ 108 ∗ P (S) = 1/(1.63 ∗ 108 ∗ P (S) = 1/(1.63 ∗ 108 ∗ P (S) = 1/(1.63 ∗ 108 ∗
Probability (n − 1) ∗ 24 ∗ 2( m − 1) ∗ (n − 1) ∗ 24 ∗ 2( m − 1) ∗ (n − 1) ∗ 24 ∗ 24) (n − 1) ∗ 24 ∗ 6) (n − 6) ∗ 4! ∗ 4! ∗ 4)
2( s − 1)) 2( s − 1))
Security Level Double Layer Double Layer Single Layer Single Layer Double Layer
Modification High High Moderate High Low
Rate
Biological Func- Does not Preserve Does not preserve Does not preserve Does not preserve Preserves
tionality
Capacity High High Moderate Moderate Moderate

of possibilities is 2*1 guesses, and the same will be done for the data by converting it to DNA then amino acids form. P2
the Purine base. Thus, the probability of making a successful encrypts the secret data using DNA and amino acids Playfair
guess for the substituted nucleotides N is: cipher. P3 and P4 hide the original format of the data without
encryption; hence it increases the cracking probability and
1
P (N ) = (3) decreases processing overhead. Our proposed method uses
4 Huffman coding scheme-based encryption followed by DNA
Using the proposed method, the probability of an attacker encryption, providing extra protection against intruders.
making a correct guess or the system cracking probability P(S) The fifth parameter shows which data hiding algorithm is
is: used. P1 and P2 use the insertion method to hide the secret
1 message in the DNA sequence, increasing the DNA sequence’s
P (S) = (4) length. P3 hides the secret message using complementary
1.63 ∗ 108 ∗ (n − 6) ∗ 4! ∗ 4! ∗ 4
rules. P4 and P5 hide the secret message by substituting DNA
V. C OMPARATIVE S TUDY nucleotides based on the cipher text bits.
In this section, we have compared our proposed model with The sixth parameter shows us if the message can be
some of the recent DNA-based steganography algorithms, and retrieved without needing extra information other than the
the result is shown in Table IV-B. For the comparison, we have reference DNA sequence during data extraction. P2 and the
chosen some crucial parameters [13], [14] as shown in Table proposed scheme P5 are blind algorithms. The seventh pa-
IV-B. The first parameter of our consideration is the secret rameter is the cracking probability of each algorithm in the
text type. That shows us if an algorithm hides all data formats table. The eighth parameter shows the security level offered
comprising letters, symbols, or numbers. We can see that all by each algorithm. Our proposed method provides a double
the algorithms mentioned, excluding P4, support all types of layer of security as it encrypts the data before hiding it.
data. P4 supports only binary data. The second parameter is The ninth parameter shows us the modification rate. P1, P2,
the type of binary coding rule used in the conversion from and P4 have high modification rates. The modification rate for
the binary format of the message to DNA. All methods in P3 is moderate. Our proposed model has a low modification
the table use the 2-bit binary coding rule. The third parameter rate, as it only modifies the reference sequence for the length
shows the type of encryption used in every algorithm that we of the cipher text. The tenth parameter is the preservation of
mention in Table IV-B. Our proposed method uses symmetric Biological functionality. It is also crucial to avoid intruders’
key encryption. The fourth parameter shows if the method attention. We can see that only our method preserves the
encrypts the secret data before hiding it or not. P1 encrypts biological functionality of reference DNA. This is because

6
we substitute Purine bases with Purine bases and pyrimidine
bases with pyrimidine bases at the time of the steganographic
process. The eleventh parameter shows the capacity, and we
can see that only P1 and P2 have a high capacity, while our
method also gives moderate capacity. Although we consider
the method 3:1 LS base method, our method utilizes the
maximum capacity that can be given in this method.
Fig. 5. The effect on encryption and decryption time based on the length of the reference DNA sequence. From left to right, the length of the DNA sequences increases. It shows that the encryption and decryption times increase as the length increases. Also, encryption takes longer than decryption.

After considering all the aspects, we found that our method has a decent cracking probability, though it is not the best. P1 and P2 show the best cracking probability. However, they use the insertion method, which increases the length of the fake DNA sequence and may catch the eye of an intruder. Also, P1 is not a blind method. Our method gives a double layer of security, making it better than P3 and P4. Moreover, our method is the only one that preserves the biological functionality of the reference DNA sequence while having a low modification rate. We can conclude that our proposed algorithm is decently strong compared to the other algorithms presented here.
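The length argument above (insertion lengthens the fake DNA sequence, substitution does not) can be made concrete with a small Python sketch. Everything here is an illustrative assumption rather than the method of P1, P2, or the proposed scheme: the 2-bit coding assignment, the fixed gap of three bases, and the helper names `bits_to_dna`, `hide_by_insertion`, `hide_by_substitution`, and `expansion` are all hypothetical.

```python
# Assumed 2-bit coding rule; the paper states that a 2-bit rule is used,
# but this particular assignment is only an example.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits: str) -> str:
    """Convert an even-length bit string to DNA bases, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def hide_by_insertion(reference: str, secret: str, gap: int = 3) -> str:
    """Insert one secret base after every `gap` reference bases."""
    if len(secret) > len(reference) // gap:
        raise ValueError("secret too long for this reference")
    remaining = iter(secret)
    out = []
    for i, base in enumerate(reference, start=1):
        out.append(base)
        if i % gap == 0:
            out.append(next(remaining, ""))  # nothing left: insert nothing
    return "".join(out)


def hide_by_substitution(reference: str, secret: str, gap: int = 3) -> str:
    """Overwrite every `gap`-th reference base with one secret base."""
    if len(secret) > len(reference) // gap:
        raise ValueError("secret too long for this reference")
    out = list(reference)
    for k, base in enumerate(secret):
        out[(k + 1) * gap - 1] = base
    return "".join(out)


def expansion(reference: str, stego: str) -> int:
    """Number of bases by which the stego strand outgrew the reference."""
    return len(stego) - len(reference)
```

Under these assumptions, insertion always grows the strand by one base per hidden base, whereas substitution leaves the length untouched, which is the property the comparison relies on.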
VI. EXPERIMENTAL RESULT
In this section, we show the performance of the proposed algorithm based on predefined parameters that are used in the literature to evaluate the performance of an encryption algorithm. The proposed algorithm was tested on a personal computer with an Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz and 8 GB RAM. The implementation was carried out with Jupyter Notebook version 6.1.4. We experimented on a message kept in a file of size 5 kilobytes. The message contains letters, symbols, and numbers.

A. Used Dataset
The eight real DNA sequences in Table IV were used; they are publicly available from the NCBI database [10]. In Table IV, the left-most column shows the locus of the DNA sequence, and the middle column shows the number of nucleotides in it. The right-most column shows the species definition for the locus.

TABLE IV
SPECIFICATION OF EIGHT REAL DNA SEQUENCES USED IN OUR EXPERIMENT.

Locus      Number of Nucleotides (bp)   Species Definition
AC166252   149,884                      Mus musculus 6 BAC RP23-100G10
AC168901   191,456                      Bos taurus clone CH240-1851
AC168907   194,226                      Bos taurus clone CH240-19517
AC153526   200,117                      Mus musculus 10 BAC RP23-383C2
AC168897   200,203                      Bos taurus clone CH240-190B15
AC167221   204,481                      Mus musculus 10 BAC RP23-3P24
AC168874   206,488                      Bos taurus clone CH240-209N9
AC168908   218,028                      Bos taurus clone CH240-195K23

B. Performance Metrics
We have used some parameters that are commonly used in evaluating the system’s performance [2]–[11]. The first parameter is ’Capacity’. Capacity refers to the total length of the modified DNA sequence after hiding the encrypted message and key into it. The second parameter is ’Payload’. Payload refers to the remaining length of the new DNA sequence after extracting the data from it. The third parameter is ’bpn’, which stands for bits per nucleotide, i.e., the number of bits hidden per nucleotide. It is the ratio of the total length of the message and key bits to the capacity in bits. The last two parameters show the encryption and decryption times in seconds.

C. Summary of Findings
Table V displays the experimental results in terms of the capacity, payload, and bpn parameters to evaluate the system’s performance. In the proposed algorithm, the capacity includes hiding the secret message and the Huffman code (key) in the sequence. Payload is zero, meaning that the length of the fake DNA reference sequence is not expanded after hiding the message bits within it, which avoids drawing attention to it. This is achieved by hiding the secret data by substituting the nucleotides. Furthermore, bpn is within [2.5, 3.6], and the proposed scheme has a sufficient embedding capacity distributed over both the message and the Huffman code (key), which increases the total number of nucleotides required relative to hiding the message bits only. Finally, the execution time to encrypt and hide 5 KB of data is calculated. Fig. 5 plots the relation found in Table V: the capacity and the execution time are affected by the length of the DNA sequence used, i.e., the execution time is directly proportional to the DNA sequence’s length. As the DNA sequence’s length increases, its hiding capacity increases and, consequently, so does the execution time, and vice versa, as shown in Fig. 5.

TABLE V
EXPERIMENTAL RESULTS.

Locus      Capacity (bits)   Payload   bpn = (M+K)/C   Encryption Time (sec)   Decryption Time (sec)
AC166252   49965             0         3.6             0.049                   0.038
AC168901   63822             0         2.8             0.063                   0.048
AC168907   64746             0         2.8             0.063                   0.048
AC153526   66709             0         2.7             0.065                   0.050
AC168897   66738             0         2.7             0.065                   0.050
AC167221   68284             0         2.6             0.067                   0.052
AC168874   68833             0         2.6             0.068                   0.053
AC168908   72680             0         2.5             0.071                   0.055

VII. CONCLUSION
In this paper, we have proposed a novel cryptographic technique combining DNA cryptography and steganography. The technique encrypts the data in its first stage and then hides the encrypted message into an actual DNA sequence. The encryption method uses DNA bases to encrypt the message,
followed by a variable-length code generation and assignment for each DNA base using Huffman coding. The proposed method is blind, as it does not need to send the actual reference DNA sequence with the fake one. Also, it does not expand the actual DNA sequence while keeping its biological functionality. From our security analysis and comparison with a number of promising methods from the literature, we found that our proposed method gives a decent level of security, which is nearly impossible to break without full knowledge of the steps involved in the particular encryption. The proposed method can be modified in our future work to increase its data hiding capabilities and security.

REFERENCES

[1] Y. Niu, K. Zhao, X. Zhang, and G. Cui, “Review on DNA cryptography,” in Bio-inspired Computing: Theories and Applications. Singapore: Springer Singapore, 2020, pp. 134–148.
[2] S. Namdev and V. Gupta, “A DNA and amino-acids based implementation of four-square cipher,” Journal of Engineering Research and Applications, vol. 6, pp. 90–96, January 2016.
[3] H. Shiu, K. Ng, J. Fang, R. Lee, and C. Huang, “Data hiding methods based upon DNA sequences,” Journal of Information Sciences: an International Journal, vol. 180, pp. 2196–2208, June 2010.
[4] C. Guo, C. Change, and Z. Wang, “A new data hiding scheme based on DNA sequence,” International Journal of Innovative Computing, Information and Control, vol. 8, pp. 139–149, January 2014.
[5] Y. A. Yunus, S. Ab-Rahman, and J. Ibrahim, “Steganography: A review of information security research and development in Muslim world,” American Journal of Engineering Research, vol. 11, pp. 122–128, 2013.
[6] H. Ghada, M. Mohammed, E. S., and T. Fahmy, DNA Based Steganography: Survey and Analysis for Parameters Optimization. Springer International Publishing, 2016, pp. 47–89.
[7] H. Mousa, K. Moustafa, W. Abdel-Wahed, and M. Hadhoud, “Data hiding based on contrast mapping using DNA medium,” The International Arab Journal of Information Technology, vol. 8, pp. 147–154, April 2011.
[8] P. Vijayakumar, V. Vijayalakshmi, and R. Rajashree, “Increased level of security using DNA steganography,” Int. J. Advanced Intelligence Paradigms, vol. 10, pp. 74–82, January 2018.
[9] H. L. J. Taur, H. Lin and C. Tao, “Data hiding in DNA sequences based on table lookup substitution,” Journal of Innovative Computing, Information and Control, vol. 8, pp. 6585–6598, October 2012.
[10] B. A. Mitras and A. K. Abo, “Proposed steganography approach using DNA properties,” International Journal of Information Technology and Business Management, vol. 14, pp. 96–102, June 2013.
[11] V. Yadav and I. Gupta, “A hybrid approach to metamorphic cryptography using kimla and DNA concept,” Int. J. Computational Systems Engineering, vol. 5, pp. 218–229, January 2019.
[12] R. E. Vinodhini, P. Malathi, and T. G. Kumar, “A survey on DNA and image steganography,” in 4th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6-7 Jan, 2017, pp. 1–7.
[13] G. Hamed, M. Marey, S. A. El-Sayed, and M. F. Tolba, “Hybrid technique for steganography-based on DNA with n-bits binary coding rule,” in 7th International Conference of Soft Computing and Pattern Recognition, Fukuoka, Japan, November 2015.
[14] K. S. Sajisha and S. Mathew, “An encryption based on DNA cryptography and steganography,” in International conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, April 2017, pp. 162–167.

978-1-6654-6316-4/22/$31.00 ©2022 IEEE 1
