Next Generation Big Data Storage
Abstract- In today's world, information is produced faster than storage capacity grows; the volume is on the order of a few zettabytes. Efficient storage of this Big Data is a pressing issue. This paper presents an efficient way of storing Big Data in DNA. DNA, an organic material, uniquely identifies a living organism; it is not redundant and is extremely small. Exploiting these properties, we store Big Data by converting the 0s and 1s of digital data into sequences of the four building blocks of DNA, namely A, G, C and T. The approach also provides an efficient method for DNA computing compared with the various existing methods.

Keywords- Storage Based DNA, Coding theory, Encoding, DNA cloud.

I. INTRODUCTION

Our information networks face an unavoidable problem: preserving the data that is stored and retrieved. The current annual data creation rate is 16.3 ZB. The research group International Data Corporation (IDC) recently made the striking prediction that the world will be creating 163 zettabytes of data a year by 2025; a zettabyte is one trillion gigabytes. Data was once stored on DVDs and CDs and has since shifted to portable hard drives and USB flash drives. Yet none of these technologies can keep up with the rapid growth of data. Moreover, silicon and other non-biodegradable materials can pollute the environment, and they are limited resources that will one day be exhausted. By 2015, data demand was projected to rise to 8000 exabytes, while the maximum storage density on these devices is 1 terabyte per square inch. For archival purposes, libraries, corporations and file-sharing systems favor shifting to newer technologies, because current storage technologies cannot handle this volume efficiently.

II. RELATED WORK

The cells of every organism contain deoxyribonucleic acid (DNA), a double-stranded helix of nucleotides. Each nucleotide consists of a five-carbon sugar, a phosphate group and one of four nitrogenous bases: adenine, cytosine, guanine and thymine. An immense amount of information can be stored in DNA, and DNA computing approaches have been applied to a significant number of combinatorial problems. A few grams of DNA could possibly store all the data in the world, since 1 gram of DNA contains about 10^21 DNA bases. DNA can also be preserved in dry, cold and dark conditions. For the storage problem, DNA is attractive because of its ubiquity and its very small size.

In their initial work, Lipton, Adleman [1] and several other researchers suggested that DNA-based approaches can solve problems such as SAT and the traveling salesman problem. In 1994, Adleman indicated that DNA might be used both to store information and to perform computations, and stated that DNA uses four bases for each input parameter [1]. In 1995, Lipton described how all possible assignments of values in a SAT problem could be represented as a graph and then translated into DNA, in the same way as in the traveling salesman example. Researchers from NEC reported results on using molecular computing to solve an NP-complete problem. The research team and Adleman worked with a graph of six vertices in which every node was connected to each of the others. They then employed a method called "thermal cycling", which generated all possible paths between the nodes. In 2001, Adleman and other researchers solved a 20-variable 3-SAT problem, the largest problem solved using a DNA computer [3]. The team performed this exhaustive search over about 1 million possibilities using an encoding scheme similar to the one Lipton proposed in 1995 for solving SAT.

In 1999, Clelland, Risca and Bancroft developed the idea of storing data in DNA by encoding it in DNA strands. To encrypt the information they used a DNA nucleotide polymer, PCR and a key. The DNA sequence then needs to be repeated a hundred times to give a strong background for storing the data at the speed of the reproduction rate. As in a digital representation, they used oligonucleotide sequences as 1s and 0s to reproduce the ASCII scheme that represents text in silicon devices. Out of the 10 billion sequences that can be found, safe sequences that do not compromise the security of the encoding must be chosen carefully. After this step, a restriction enzyme is used by the
d. Reverse complement the next 100 nucleotides.
e. These 100 nucleotides are now the data.
f. The last nucleotide can be used to confirm the type of segment.
6. If the 1st nucleotide is T then:
a. Remove the 1st nucleotide.
b. The next 4 nucleotides give the segment number.
c. Reverse complement the next 100 nucleotides.
d. These 100 nucleotides are now the data.
e. The last nucleotide can be used to confirm the type of segment.
7. If the sequence TTTT is found, it denotes the end of the file. The new character starts from the next nucleotide.
8. Using the same Huffman tree, the data can now be converted back into the original characters.

It is possible to generate a different Huffman tree for each file or a single Huffman tree for the whole data set. This compresses the data, and decoding cannot be done unless one has the original tree. Because nucleotides indicating a specific orientation are used in the strands, twice the number of segments can be read with the same number of indexes. The user can read the strand from either direction.

[1] Leonard M. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266(5187):1021–1024, 1994.

[2] Masanori Arita. Writing information into DNA. In Aspects of Molecular Computing, 2004.

[3] Carter Bancroft, Timothy Bowler, Brian Bloom, and Catherine Taylor Clelland. Long-term storage of information in DNA.

[4] George M. Church, Yuan Gao, and Sriram Kosuri. Next-generation digital information storage in DNA. Science, 337(6102):1628–1628, 2012.

[5] Daniel G. Gibson, John I. Glass, Carole Lartigue, Vladimir N. Noskov, Ray-Yuan Chuang, Mikkel A. Algire, Gwynedd A. Benders, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science.

[6] Jack Parker. Computing with DNA. EMBO Reports, 2003.

[7] Siddant Shrivastava and Roshan Badlani. Data Storage in DNA. International Journal of Electrical Energy.

[8] David A. Huffman. A method for the construction of minimum-redundancy codes.
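
As a rough illustration of the T-type segment handling in step 6 of the decoding procedure, the following Python sketch parses one segment of the form T + 4 index nucleotides + 100 data nucleotides + 1 confirmation nucleotide. The function names are hypothetical, and the base-4 digit mapping used for the segment number (A=0, C=1, G=2, T=3) is an assumption made only for this example; the text does not specify how the four index nucleotides encode a number or how the confirmation nucleotide is checked, so it is returned unchanged.

```python
# Watson-Crick base pairing, used for the reverse complement in step 6c.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# ASSUMPTION: base-4 digit value of each nucleotide in the segment number.
# This mapping is illustrative and is not taken from the paper.
BASE4 = {"A": 0, "C": 1, "G": 2, "T": 3}

INDEX_LEN = 4    # step 6b: next 4 nucleotides give the segment number
DATA_LEN = 100   # step 6c: next 100 nucleotides carry the data


def reverse_complement(strand: str) -> str:
    """Return the strand read backwards with every base complemented."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def parse_t_segment(segment: str):
    """Split a T-type segment into (segment number, data, confirmation base)."""
    if not segment.startswith("T"):
        raise ValueError("not a T-type segment")
    body = segment[1:]  # step 6a: remove the 1st nucleotide
    if len(body) != INDEX_LEN + DATA_LEN + 1:
        raise ValueError("unexpected segment length")

    # Step 6b: read the segment number as a 4-digit base-4 value (assumed).
    number = 0
    for base in body[:INDEX_LEN]:
        number = number * 4 + BASE4[base]

    # Steps 6c and 6d: the reverse complement of the next 100 bases is the data.
    data = reverse_complement(body[INDEX_LEN:INDEX_LEN + DATA_LEN])

    # Step 6e: the last nucleotide confirms the segment type.
    confirmation = body[-1]
    return number, data, confirmation
```

For example, a segment can be built from a 100-nucleotide payload as `"T" + "ACGT" + reverse_complement(payload) + "T"`; `parse_t_segment` then recovers the payload, because applying the reverse complement twice returns the original strand. The recovered data from all segments would still have to be ordered by segment number and passed through the Huffman tree of step 8.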