Lecture 3
STORAGE OF BIOLOGICAL SEQUENCE INFORMATION
• A DNA sequence is written with the four nucleotides A, C, G and T, and an
RNA sequence with A, C, G and U. A protein sequence is written with the
one-letter codes A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y and V,
which stand for the 20 standard amino acids that make up proteins.
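The three alphabets above can be written down directly and used to check whether a string is a plausible DNA, RNA or protein sequence. The following is a minimal sketch; the names `DNA_ALPHABET`, `RNA_ALPHABET`, `PROTEIN_ALPHABET`, `invalid_symbols` and `is_valid` are chosen here for illustration and do not come from any library. It assumes the 20 standard one-letter amino acid codes, including V for valine.

```python
# Alphabets as described in the text: 4 DNA bases, 4 RNA bases,
# and the 20 standard amino acids (one-letter codes).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ARNDCEQGHILKMFPSTWYV")


def invalid_symbols(sequence, alphabet):
    """Return the sorted list of characters in `sequence` that are not in `alphabet`."""
    return sorted(set(sequence.upper()) - alphabet)


def is_valid(sequence, alphabet):
    """True when the sequence is non-empty and every symbol belongs to the alphabet."""
    return bool(sequence) and not invalid_symbols(sequence, alphabet)
```

For example, `is_valid("AUGC", DNA_ALPHABET)` is false because U occurs only in RNA, while `is_valid("AUGC", RNA_ALPHABET)` is true.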
• When DNA and RNA (for example mRNA) are sequenced in the lab, the
resulting sequences contain a large number of nucleotides in widely
varying order.
• Protein sequences are likewise long: each is a chain of many amino acid
residues (not bases), and proteins are complex molecules.
Biological Databases
• This volume of sequence data is too large to store and manage on a single
computer, so it is held in public sequence databases. For DNA and RNA,
the public database is GenBank (maintained by NCBI).
• For proteins, the public database is UniProt (maintained by the UniProt Consortium).
• Both GenBank and UniProt are online databases; their DNA, RNA and
protein sequences are freely available to researchers and the public.
• GenBank is an online database where researchers can access DNA and RNA
sequences, together with the protein translations annotated on them.
• To find a sequence, go to the NCBI GenBank website, a public database site:
• www.ncbi.nlm.nih.gov/genbank
• For example, suppose we want to find the sequence for
Mycobacterium tuberculosis, a species of pathogenic bacteria and
the causative agent of tuberculosis (TB).
• Sequences can be searched in GenBank by typing any of the following:
• Sequence name
• ID
• Name
• Species
• Locus
• Accession Number
• Author
• Journal
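Searches like these can also be expressed as a query string, where each value is tagged with the field it should match. The sketch below builds such a term and the matching NCBI E-utilities ESearch URL without contacting the network. The helper names `build_term` and `build_search_url` are made up for this example; the field tags `[Organism]`, `[Accession]`, `[Author]` and `[Journal]` are Entrez search fields, and the choice of `db="nucleotide"` and `retmax=5` is an assumption for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(**fields):
    """Join field=value pairs into an Entrez term, e.g. '"x"[Organism] AND "y"[Author]'."""
    parts = [f'"{value}"[{field}]' for field, value in fields.items()]
    return " AND ".join(parts)


def build_search_url(term, db="nucleotide", retmax=5):
    """Return an ESearch URL for the term; no request is sent here."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


# Search by species, using the example organism from the text.
# Other fields work the same way, e.g. build_term(Accession=..., Author=...).
term = build_term(Organism="Mycobacterium tuberculosis")
url = build_search_url(term)
```

Opening `url` in a browser, or fetching it with an HTTP client, should return a list of matching record identifiers; the same term can also be typed into the search box on the website.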
• UniProt is a public database used to search for protein sequences.
• www.Uniprot.org
• For example, to look up the sequence of human insulin, a protein that
plays an important role in regulating blood sugar, go to the website
www.Uniprot.org.
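Protein sequences are commonly downloaded in FASTA format: a header line starting with ">" followed by one or more lines of sequence. The sketch below shows how a FASTA address for a UniProt entry could be formed and how the returned text could be parsed. It assumes the UniProt REST address pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` and the accession P01308 for human insulin; `fasta_url` and `parse_fasta` are hypothetical helper names, and the toy record is made up and is not the real insulin sequence.

```python
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"
HUMAN_INSULIN = "P01308"  # assumed UniProtKB accession for human insulin


def fasta_url(accession):
    """Return the address of the FASTA file for one UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only; NOT the real insulin sequence.
toy = ">toy_record made-up example\nMKV\nLAA\n"
records = parse_fasta(toy)
```

To work with the real entry, the text downloaded from `fasta_url(HUMAN_INSULIN)` (for instance with `urllib.request.urlopen`) could be passed to `parse_fasta` in the same way.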