A. Kamble and R. Khairkar (2016) Int J Appl Sci Biotechnol, Vol 4(4): 425-429


DOI: 10.3126/ijasbt.v4i4.16252

Mini Review

BASICS OF BIOINFORMATICS IN BIOLOGICAL RESEARCH


Ashwini Kamble1* and Rajesh Khairkar2
1 Department of Biochemistry, Mahatma Gandhi Institute of Medical Sciences, Sevagram, Wardha, Maharashtra, India.
2 Department of CE/IT, NUVA College of Engineering, Kalmeshwar, Nagpur, India.

Corresponding author’s email: [email protected]

Abstract
The notion that the laboratory rat is giving way to the computer mouse took hold after the famous Clinton-Blair announcement on the human genome in 2000 and the completion of the human genome sequence in April 2003. Bioinformatics is defined as the application of computational techniques to understand and organize the information associated with biological macromolecules.
Large databases of genomic information are now available, and they have enabled research into methods for the diagnosis and treatment of human diseases using DNA microarrays and proteomics experiments. Several problems remain, however: it is always challenging to develop proper and sophisticated analysis methods that can make good use of genomic databases, given their applications and the heterogeneity of the data.
The main purpose of this first paper is to explore and explain bioinformatics in a more scientific way and to highlight applications of bioinformatics in the medical sector.

Keywords: Bioinformatics; Microarrays; Proteomics.

Introduction
Since the birth of bioinformatics in the 1980s, the field has expanded rapidly, keeping pace with the growth of genome sequence data. Bioinformatics conceptualizes biology in terms of molecules and applies informatics techniques to understand and organize the information associated with those molecules. In other words, it is the application of computational techniques to understand and organize the information associated with biological macromolecules (Pandey and Divyasheesh, 2016).
Bioinformatics uses information technology for biological purposes. In simple terms, life itself can be viewed as an information technology, with the genes that determine an organism's physiology acting as its most basic digital storehouse of information. Traditionally, bioinformatics has had a structural orientation, mainly because of its use in Rational Drug Design (RDD) and Structure-Based Drug Design (SBDD). Both SBDD and RDD use computational methods to discover new compounds with good selectivity, efficacy and safety. The notion that the laboratory rat is giving way to the computer mouse took hold after the famous Clinton-Blair announcement on the human genome in 2000 and the completion of the human genome sequence in April 2003 (Ouzounis, 2012). In many countries, wet-lab experiments and bioinformatics now go hand in hand in clinical and biological research (Daisuke and Troy, 2006).
The literature is replete with large databases of genomic information, which have enabled research into methods for the diagnosis and treatment of human diseases using DNA microarrays and proteomics experiments. Several problems remain, however: it is always challenging to develop proper and sophisticated analysis methods that can make good use of genomic databases, given their applications and the heterogeneity of the data.
The main purpose of this first paper is to explore and explain bioinformatics in a more scientific way and to highlight applications of bioinformatics in the medical sector, such as oncological research.

Basics of Bioinformatics Tools
Bioinformatics has numerous applications, broadly defined as:

This paper can be downloaded online at http://ijasbt.org & http://nepjol.info/index.php/IJASBT


1. Organization of data in such a way that researchers can access existing information and submit new information they have produced, e.g. the DNA Data Bank of Japan (National Institute of Genetics);
2. Development of appropriate tools and resources for the analysis of data, such as FASTA (Pearson and Lipman, 1988) and PSI-BLAST (Altschul et al., 1997); and
3. Use of these tools to interpret the results in a biologically meaningful manner.
4. With the help of bioinformatics, anyone can now perform global analyses of all the available data to uncover common principles that apply across many systems and to highlight novel features.

Tools for Systematic Collection and Organisation of Biological Data
Biological databases are meant for this purpose. They are libraries of life sciences information collected from scientific experiments, published literature, high-throughput experimental technology and computational analysis (Attwood et al., 2011).
Creating these databases, however, requires raw material.

a) Source of information for databases
Raw DNA sequences, protein sequences, macromolecular structures, genome sequences and other whole-genome data form the sources.
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects (Benson et al., 2013).
This database is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at NCBI. These three organizations exchange data on a daily basis (Benson et al., 2013).
Researchers have emphasized genome sequencing and shown that genomes range in size from 1.6 million bases in Haemophilus influenzae to 3 billion in humans (Luscombe et al., 2001). Scientists have now reached the stage where the expression levels of almost every gene in a given cell can be measured on a whole-genome scale, although public availability of such data is still limited. These measurements are carried out under different environmental conditions, at different stages of the cell cycle and in different cell types in multicellular organisms (Luscombe et al., 2001).
Apart from the primary nucleotide databases, databases have been prepared in a variety of other areas, including protein sequence, proteomic, protein structure, protein model, RNA, carbohydrate structure, protein-protein and other molecular interaction, signal transduction pathway, metabolic pathway and protein function, and gene expression (mostly microarray data) databases.

b) Classification of databases
Biological databases can be classified into primary, secondary and composite databases.
Primary databases contain information on the sequence or structure alone, e.g. GenBank and DDBJ for genome sequences.
Secondary databases contain information derived from primary databases, such as conserved sequences, signature sequences and active-site residues of protein families. Some of these databases, such as SCOP, developed at Cambridge University, CATH, developed at University College London, and eMOTIF at Stanford, are created and hosted by individual researchers in their own laboratories.
Composite databases combine a variety of primary database sources, which obviates the need to search multiple resources. The NCBI (National Center for Biotechnology Information) hosts nucleotide and protein databases on its large, highly available arrays of computer servers and provides free access to those involved in research.
We now discuss some of the well-known databases.

1. Nucleotide and genome sequences
The GenBank (Benson et al., 2000), EMBL (Baker et al., 2000) and DDBJ (Okayama et al., 1998) databases contain DNA sequences for individual genes that encode protein and RNA products. The biggest excitement currently lies with the availability of complete genome sequences for different organisms. The three databases use a uniform (but not identical) data format and exchange data on a daily basis.
The composite Entrez nucleotide database (Schuler et al., 1996) compiles sequence data from these primary databases. These resources provide not only the raw nucleotide sequence but also detailed information on all chromosomes in an organism, detailed views of single chromosomes marking coding and non-coding regions, lists of completed genomes, and single genes. In addition, at each level there are graphical presentations, precomputed analyses and links to other sections of Entrez (Luscombe et al., 2001).
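Nucleotide records from databases such as GenBank are commonly exchanged as FASTA-formatted text, in which a header line beginning with `>` is followed by the sequence. As a minimal sketch, assuming that format, the following Python reads such records and reports their lengths, the quantity quoted above for genome sizes. The function names and the two example records are hypothetical illustrations and do not come from any database.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(text: str) -> Dict[str, int]:
    """Map the first word of each header to the length of its sequence."""
    return {header.split()[0]: len(seq) for header, seq in parse_fasta(text)}


# Hypothetical records, invented for illustration only.
EXAMPLE = """>demo_record_1 hypothetical example, not a real accession
ATGCGC
GGAT
>demo_record_2 second hypothetical record
ttaa
"""

if __name__ == "__main__":
    for name, length in sequence_lengths(EXAMPLE).items():
        print(name, length)
```

For real work an established parser would normally be preferred; the point here is only that a record is an identifier plus a string of bases.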

Another database, COGs, classifies proteins encoded in 21 completed genomes on the basis of sequence similarity. The essential function of this database is to predict the function of uncharacterized proteins from their homology to characterized proteins and to identify phylogenetic patterns of protein occurrence (Tatusov et al., 1997).

2. Protein sequence databases
Protein sequence databases are likewise categorized into primary, secondary and composite databases. Primary protein sequence databases contain more than 300,000 protein sequences. SWISS-PROT (Bairoch and Apweiler, 2000) and PIR-International (McGarvey et al., 2000) act as primary as well as secondary databases: they serve as repositories and also describe each protein's function, domain structure and post-translational modifications.
Composite databases such as OWL (Bleasby et al., 1994) and NRDB (Bleasby and Wootton, 1990) compile and filter sequence data from different primary databases to produce more complete databases than the individual ones.
Secondary databases help the user determine whether a new sequence belongs to a known protein family. One of the most popular is PROSITE, a database of short sequence patterns and profiles that characterize biologically significant sites in proteins. PRINTS expands on this concept and provides a compendium of protein fingerprints, groups of conserved motifs that characterize a protein family. Finally, Pfam-A comprises accurate, manually compiled alignments, whereas Pfam-B is an automated clustering of the whole SWISS-PROT database (Bateman et al., 2000). These secondary databases have recently been incorporated into a single resource named InterPro (Attwood et al., 1999).

3. Structural databases
These are databases of macromolecular structures. The Protein Data Bank (PDB) provides a primary archive of all 3D structures for macromolecules such as proteins, RNA, DNA and various complexes (Bernstein et al., 1977; Berman et al., 2000). The problem with individual PDB entries is that information about them can be difficult to extract. This problem has been overcome by PDBsum (Laskowski et al., 1997), which provides a separate Web page for every structure in the PDB, with detailed structural analyses, schematic diagrams and data on interactions between different molecules in a given entry (Luscombe et al., 2001).
CATH (Pearl et al., 2000), SCOP (Lo Conte et al., 2000) and FSSP (Holm and Sander, 1998) are the three major databases that classify proteins by structure to identify structural and evolutionary relationships. Similarly, various other databases focus on particular types of macromolecules, for example the Nucleic Acids Database, NDB (Berman et al., 1992), for structures related to nucleic acids; the HIV protease database (Vondrasek and Wlodawer, 1997) for HIV-1, HIV-2 and SIV protease structures and their complexes; and ReLiBase for receptor-ligand complexes (Hendlich, 1998).

Table 1: Databases and their bioinformatics sources
Database                          Bioinformatics sources
Protein sequence (primary)        SWISS-PROT
                                  PIR-International
Protein sequence (composite)      OWL
                                  NRDB
Protein sequence (secondary)      PROSITE
                                  PRINTS
                                  Pfam
Macromolecular structures         Protein Data Bank (PDB)
                                  Nucleic Acids Database (NDB)
                                  HIV Protease Database
                                  ReLiBase
                                  PDBsum
                                  CATH
                                  SCOP
                                  FSSP
Nucleotide sequences              GenBank
                                  EMBL
                                  DDBJ
Genome sequences                  Entrez genomes
                                  GeneCensus
                                  COGs
Integrated databases              InterPro
                                  Sequence retrieval system (SRS)
                                  Entrez

Data Integration
Data integration is the most important step in bioinformatics, because an individual piece of data does not carry much significance until it is combined with the other information available about that structure. In other words, it is a way of putting individual pieces of information in context with respect to other data. Data integration, however, is not always as straightforward as it looks, because there are differences in nomenclature and file formats.
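The nomenclature problem can be shown in miniature. In the Python sketch below, two hypothetical sources name the same entity with unrelated identifiers, so their records can only be combined through an explicit mapping between them. Every identifier, field name and value is invented for illustration, and the function `integrated_view` is an assumption of this sketch rather than part of any real database interface.

```python
from typing import Dict, List, Optional

# Hypothetical sequence records keyed by an invented sequence identifier.
SEQUENCE_DB: Dict[str, dict] = {
    "SEQ0001": {"gene": "geneA", "length": 1200},
    "SEQ0002": {"gene": "geneB", "length": 860},
}

# Hypothetical structure records keyed by an unrelated identifier scheme.
STRUCTURE_DB: Dict[str, dict] = {
    "STR_9X": {"method": "X-ray", "chains": 2},
}

# Mapping from sequence identifiers to structure identifiers.
XREFS: Dict[str, List[str]] = {
    "SEQ0001": ["STR_9X"],
    "SEQ0002": [],
}


def integrated_view(seq_id: str) -> Optional[dict]:
    """Combine a sequence record with any structures linked to it."""
    record = SEQUENCE_DB.get(seq_id)
    if record is None:
        return None
    structures = [
        dict(STRUCTURE_DB[struct_id], id=struct_id)
        for struct_id in XREFS.get(seq_id, [])
        if struct_id in STRUCTURE_DB  # ignore dangling links
    ]
    return {"id": seq_id, **record, "structures": structures}


if __name__ == "__main__":
    print(integrated_view("SEQ0001"))
    print(integrated_view("SEQ0002"))
```

Without the mapping there is nothing in either record that ties `SEQ0001` to `STR_9X`, which is why integration depends on how consistently such links are maintained.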

There are several ways to overcome this problem:
1) It can be solved to some extent by providing cross-references.
2) At a more advanced level, there have been efforts to integrate access across several data sources.
3) The Sequence Retrieval System, SRS (Etzold et al., 1996), allows databases to be indexed to each other.
4) The Entrez facility (Schuler et al., 1996) provides similar gateways to DNA and protein sequences, genome mapping data, 3D macromolecular structures and the PubMed bibliographic database.
In this way, a search for a specific gene in either system allows smooth transitions to the genome it comes from, the protein sequence it encodes, its structure, bibliographic references and equivalent entries for all related genes.

Use(s) of Integrated Data
Once the available information has been integrated, the integrated data can be utilized in different areas.
As shown in Table 2, each data source can be used for different purposes with bioinformatics techniques. For example, genomic data can be used to relate specific genes to diseases and to characterize protein content and metabolic pathways. Likewise, protein sequence data can be used for multiple sequence alignment algorithms, sequence comparison algorithms, identification of conserved sequence motifs, etc.

Conclusions
Bioinformatics methods have become indispensable to biological investigations. In this review we have tried to provide baseline information on the role of bioinformatics in biomedical research. In our next article we will try to cover the role of bioinformatics in specific medical conditions.
Bioinformatics covers a wide range of subject areas, including structural biology, genomics and gene expression studies. As we have seen, its principal approach is to compare and group data according to biologically meaningful similarities and then, on that basis, to analyse one type of data to infer and understand the observations for another type. This helps us to understand biological information on a large scale, in both depth and breadth.
In total, it enables us to examine individual systems in detail, to compare them with related systems to find common principles, and to distinguish features that are unique to particular systems.

Table 2: Data sources in bioinformatics and subject areas that utilize this data.
DATA SOURCE                 RESEARCH AREAS
Genomes                     1) Phylogenetic analysis
                            2) Linkage analysis relating specific genes to diseases
                            3) Characterization of protein content and metabolic pathways
                            4) Characterization of repeats
                            5) Structural assignments to genes
Raw DNA sequence            1) Identification of introns and exons
                            2) Separating coding and non-coding regions
                            3) Forensic analysis
                            4) Gene product prediction
Protein sequence            1) Multiple sequence alignment algorithms
                            2) Sequence comparison algorithms
                            3) Identification of conserved sequence motifs
Gene expression             1) Mapping expression data to sequence, structural and biochemical data
                            2) Correlating expression patterns
Macromolecular structure    1) Protein geometry measurements
                            2) Secondary, tertiary structure prediction
                            3) 3D structural alignment algorithms
                            4) Surface and volume shape calculations
                            5) Intermolecular interactions
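
One of the protein-sequence tasks listed in Table 2, identification of conserved sequence motifs, can be illustrated with the short-pattern notation used by PROSITE. In that notation elements are separated by `-`, `x` stands for any residue, `[ST]` for either of the listed residues, `{P}` for any residue except those listed, and `(n)` or `(n,m)` for repetition. The Python below is a minimal sketch, assuming only this subset of the syntax, that translates a pattern into a regular expression and scans a sequence for matches. The functions `prosite_to_regex` and `find_motifs` and the test sequence are hypothetical; the pattern `N-{P}-[ST]-{P}` is the one commonly quoted for N-glycosylation sites.

```python
import re
from typing import List, Tuple

# One pattern element: a residue, "x", a [..] set or a {..} exclusion,
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (supported subset) to a regex."""
    pattern = pattern.strip().rstrip(".")
    pieces = []
    for element in pattern.split("-"):
        start = "^" if element.startswith("<") else ""  # N-terminal anchor
        end = "$" if element.endswith(">") else ""      # C-terminal anchor
        body = element.lstrip("<").rstrip(">")
        match = _ELEMENT.fullmatch(body)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(start + core + repeat + end)
    return "".join(pieces)


def find_motifs(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, invented for illustration.
    print(find_motifs("MKNGSAANPTQNVTW", "N-{P}-[ST]-{P}."))
```

The lookahead in `find_motifs` is a design choice that reports overlapping sites, which a plain left-to-right scan would skip.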

References
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17): 3389-3402. DOI: 10.1093/nar/25.17.3389
Attwood TK, Flower DR, Lewis AP, Mabey JE, Morgan SR, Scordis P et al. (1999) PRINTS prepares for the new millennium. Nucleic Acids Res. 27(1): 220-225. DOI: 10.1093/nar/27.1.220
Attwood TK, Gisel A, Eriksson NE and Bongcam-Rudloff E (2011) Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective. Bioinformatics - Trends and Methodologies. InTech. Retrieved 8 Jan 2012. DOI: 10.5772/23535
Baker W, Van den Broek A, Camon E, Hingamp P, Sterk P, Stoesser G et al. (2000) The EMBL nucleotide sequence database. Nucleic Acids Res. 28(1): 19-23. DOI: 10.1093/nar/28.1.19
Bateman A, Birney E, Durbin R, Eddy SR, Howe KL and Sonnhammer EL (2000) The Pfam protein families database. Nucleic Acids Res. 28(1): 263-266. DOI: 10.1093/nar/28.1.263
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA and Wheeler DL (2000) GenBank. Nucleic Acids Res. 28(1): 15-18. DOI: 10.1093/nar/28.1.15
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J and Sayers EW (2013) GenBank. Nucleic Acids Res. 41(Database issue): D36-42. DOI: 10.1093/nar/gks1195
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28(1): 235-242. DOI: 10.1093/nar/28.1.235
Berman HM, Olson WK, Beveridge DL, Westbrook J, Gelbin A, Demeny T et al. (1992) The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophysical Journal 63(3): 751-759. DOI: 10.1016/S0006-3495(92)81649-1
Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, Rodgers JR et al. (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. European Journal of Biochemistry 80(2): 319-24. DOI: 10.1111/j.1432-1033.1977.tb11885.x
Bleasby AJ and Wootton JC (1990) Construction of validated, non-redundant composite protein sequence databases. Protein Engineering 3(3): 153-159. DOI: 10.1093/protein/3.3.153
Bleasby AJ, Akrigg D and Attwood TK (1994) OWL - a non-redundant composite protein sequence database. Nucleic Acids Res. 22(17): 3574-3577.
Daisuke Kihara YDY, Troy H. (2006) Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools. Cancer Informatics 2: 25-35.
Etzold T, Ulyanov A and Argos P (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol. 266: 114-128. DOI: 10.1016/S0076-6879(96)66010-8
Hendlich M (1998) Databases for protein-ligand complexes. Acta Cryst. D 54(1): 1178-1182. DOI: 10.1107/S0907444998007124
Holm L and Sander C (1998) Touring protein fold space with Dali/FSSP. Nucleic Acids Res. 26(1): 316-319. DOI: 10.1093/nar/26.1.316
Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML and Thornton JM (1997) PDBsum: a Web-based database of summaries and analyses of all PDB structures. Trends in Biochemical Sciences 22(12): 488-490. DOI: 10.1016/S0968-0004(97)01140-7
Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG and Chothia C (2000) SCOP: a structural classification of proteins database. Nucleic Acids Res. 28(1): 257-259. DOI: 10.1093/nar/28.1.257
Luscombe NM, Greenbaum D and Gerstein M (2001) What is bioinformatics? An introduction and overview. Yearbook of Medical Informatics.
McGarvey PB, Huang H, Barker WC, Orcutt BC, Garavelli JS, Srinivasarao GY et al. (2000) PIR: a new resource for bioinformatics. Bioinformatics 16(3): 290-291. DOI: 10.1093/bioinformatics/16.3.290
Okayama T, Tamura T, Gojobori T, Tateno Y, Ikeo K, Miyazaki S et al. (1998) Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. Bioinformatics 14(6): 472-8. DOI: 10.1093/bioinformatics/14.6.472
Ouzounis CA (2012) Rise and demise of bioinformatics? Promise and progress. PLoS Comput. Biol. 8: e1002487. DOI: 10.1371/journal.pcbi.1002487
Pandey AS and Divyasheesh V (2016) Applications of Bioinformatics in Medical Renovation and Research. International Journal of Advanced Research in Computer Science and Software Engineering 6(3): 56-58.
Pearl FM, Lee D, Bray JE, Sillitoe I, Todd AE, Harrison AP et al. (2000) Assigning genomic sequences to CATH. Nucleic Acids Res. 28(1): 277-282. DOI: 10.1093/nar/28.1.277
Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences USA 85(8): 2444-2448. DOI: 10.1073/pnas.85.8.2444
Schuler GD, Epstein JA, Ohkawa H and Kans JA (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol. 266: 141-62. DOI: 10.1016/S0076-6879(96)66012-1
Tatusov RL, Koonin EV and Lipman DJ (1997) A genomic perspective on protein families. Science 278(5338): 631-637. DOI: 10.1126/science.278.5338.631
Vondrasek J and Wlodawer A (1997) Database of HIV proteinase structures. Trends in Biochemical Sciences 22(5): 183. DOI: 10.1016/s0968-0004(97)01024-4
