Module-I
— Unit I—
If you know your own DNA sequence, then you know everything about yourself
Proteins
The emphasis is on the use of computers because
most of the tasks in genomic data analysis are highly repetitive or
mathematically complex
January 20, 2025 Bioinformatics 3
Bioinformatics and Computational Biology
The use of computers is absolutely indispensable in mining genomes
for information gathering and knowledge building
Bioinformatics differs from a related field known as computational
biology
Bioinformatics refers to the study of large sets of biodata, biological
statistics, and results of scientific studies
Bioinformatics is limited to
Sequence, structural, functional analysis of genes & genomes and
molecular biology
Bioinformatics can also be viewed as the development and application of
computational tools for managing all kinds of biological data
Example: Prediction of protein function from DNA sequence and
structural information
Computational biology, by contrast, is concerned with solutions to
issues that have been raised by studies in bioinformatics.
However, computational biology encompasses all biological areas that
involve computation
Computational biology is more confined to the theoretical development
of algorithms used for bioinformatics
Computational biology is useful in scientific research, including
The examination of how proteins interact with each other
DNA bases pair up with each other, A with T and C with G, to form
units called base pairs
Each base is also attached to a
sugar molecule and a phosphate molecule
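The pairing rule above (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration; the function names `complement`, `reverse_complement`, and `base_pairs` are hypothetical and do not come from the slides.

```python
# Watson-Crick pairing rule from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the pairing partner of each base in seq, position by position."""
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(seq)[::-1]


def base_pairs(seq):
    """List the (base, partner) units that the text calls base pairs."""
    return list(zip(seq.upper(), complement(seq)))
```

For example, `base_pairs("AC")` gives one tuple per base pair, and `reverse_complement` returns the opposite strand as it would normally be written.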
Every person has two copies of each gene, one inherited from
each parent
Most genes are the same in all people, but a small number of
translation
of biology in which
DNA is transcribed to RNA, which is translated to proteins
sciences
It has applications in
knowledge-based drug design,
agricultural biotechnology
is called translation
The genetic code:
During translation, the nucleotide sequence of an mRNA is translated
called codons
There are 61 codons that specify amino acids
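The count of 61 amino-acid codons can be checked with a small sketch that builds the standard genetic code table. It assumes the standard code written in the usual U, C, A, G order; the names `CODON_TABLE`, `SENSE_CODONS`, `STOP_CODONS`, and `translate` are hypothetical helpers for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 three-base codons to its amino acid (or "*").
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]
STOP_CODONS = [c for c, aa in CODON_TABLE.items() if aa == "*"]


def translate(mrna):
    """Read an mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Of the 64 possible codons, the table leaves 61 that specify amino acids and 3 that act as stop signals (UAA, UAG, UGA).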
How does the ribosome "know" which amino acid to add for each
codon?
This matching is not done by the ribosome itself
organism
perhaps as a signaling molecule, structural element, or enzyme!
inefficient process
Searches through such files often cause crashes of the entire computer
system because of the memory-intensive nature of the operation
To facilitate the access and retrieval of data, sophisticated computer
software programs
for organizing, searching, and accessing data have been developed
known as
relational databases or
Relational Databases:
Instead of using a single table as in a flat file database, relational
databases use a set of tables to organize data
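A relational layout, in which data are split across several tables linked by a shared key rather than held in one flat table, might look like the following sketch. It uses Python's built-in `sqlite3` module; the table names, column names, and the `genes_for` helper are assumptions made for illustration, not a schema taken from any real biological database.

```python
import sqlite3

# In-memory database with two tables linked by organism_id (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE gene (
        gene_id     INTEGER PRIMARY KEY,
        symbol      TEXT NOT NULL,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
    );
""")
conn.executemany("INSERT INTO organism VALUES (?, ?)",
                 [(1, "Homo sapiens"), (2, "Mus musculus")])
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                 [(10, "TP53", 1), (11, "BRCA1", 1), (12, "Trp53", 2)])


def genes_for(organism_name):
    """Join the two tables on the shared key and return matching gene symbols."""
    rows = conn.execute(
        """SELECT gene.symbol FROM gene
           JOIN organism ON gene.organism_id = organism.organism_id
           WHERE organism.name = ? ORDER BY gene.symbol""",
        (organism_name,),
    )
    return [symbol for (symbol,) in rows]
```

The organism name is stored once and referenced by key from each gene row, which is what lets a query combine the two tables without scanning one large flat file.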
The database is structured such that the objects are linked by a set of
pointers
defining predetermined relationships between the objects
Databases and Types
Searching the database involves navigating through the objects with the
aid of the pointers linking different objects
Programming languages like C++ are used to create OODbs
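The idea of objects linked by pointers can be sketched as follows. The sketch is written in Python rather than C++ purely for brevity, and it models only the in-memory object graph, not a real object-oriented database system; the class names and the `proteins_of` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    name: str


@dataclass
class Gene:
    symbol: str
    product: Protein  # pointer to the encoded protein object


@dataclass
class Organism:
    name: str
    genes: list = field(default_factory=list)  # pointers to Gene objects


def proteins_of(organism):
    """Search by following pointers: organism -> genes -> protein."""
    return [gene.product.name for gene in organism.genes]


p53 = Protein("p53")
human = Organism("Homo sapiens", genes=[Gene("TP53", product=p53)])
```

Here a search is a walk along predetermined links between objects, in contrast to a relational query that matches key values across tables.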
The object-oriented database system is more flexible; data can be
structured based on hierarchical relationships
However, this type of database system lacks the rigorous mathematical
foundation of the relational databases
There is also a risk that some of the relationships between objects
may be misrepresented
Some current databases have therefore incorporated features of
both types of database programming, creating the object–relational
Relational, and
Object oriented
specialized databases
Most of the data in the databases are contributed directly by authors with
manipulation
Example: NM_031959.3
Secondary Databases:
Need of Secondary Databases:
Sequence annotation information in the primary database is often
minimal
To turn the raw sequence information into more sophisticated
primary databases
The amount of computational processing work varies greatly among the
secondary databases
Some are simple archives of translated sequence data from
organism
The content of these databases may be sequences or other types of
information
The sequences in these databases may overlap with a primary database,
category
Examples include Flybase, WormBase, AceDB, and TAIR
system
It is a gateway that allows text-based searches for a wide variety of data
sequences,
Although a small number of the amino acid sequences are derived
using BLAST
Biological Databases: Characteristics
The contents
The ontology: the list of valid terms and their definitions
The logical structure, or the expression of the inter-relationships among
the data, called schema
The format of the data
The routes for selective retrieval of data, and presentation of results, or
passing them on to a program for analysis
Links to other resources: other databases, references to original
publications of data, tutorial background etc.
The top line of the Header section is the LOCUS, which contains
A unique database identifier for a sequence location in the database
Next to the division is the date when the record was last modified
DEFINITION provides the summary information including
The name of the sequence,
information
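The fields of the LOCUS line described above can be pulled apart with a short parser. This is a minimal sketch that assumes a whitespace-separated nucleotide record; the `parse_locus` function and the `SAMPLE` line are hypothetical, and the sample values are invented for illustration rather than copied from a real entry.

```python
def parse_locus(line):
    """Split a GenBank-style nucleotide LOCUS line into named fields."""
    parts = line.split()
    if not parts or parts[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    record = {
        "name": parts[1],          # identifier of the sequence record
        "length": int(parts[2]),   # sequence length
        "unit": parts[3],          # "bp" for nucleotide records
        "molecule": parts[4],      # molecule type, e.g. DNA or mRNA
    }
    rest = parts[5:]
    # Some records carry a topology word before the division code.
    if rest and rest[0] in ("linear", "circular"):
        record["topology"] = rest.pop(0)
    # The division code is followed by the last-modified date.
    record["division"], record["modified"] = rest[0], rest[1]
    return record


# Hypothetical sample line, invented for this example.
SAMPLE = "LOCUS       EXAMPLE01    1200 bp    DNA     linear   PLN 20-JAN-2025"
```

Calling `parse_locus(SAMPLE)` returns a dictionary in which the date follows the division code, matching the order described in the text.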
There are many errors in sequence databases
All these types of errors can be passed on to other databases, causing
propagation of errors
Most errors in nucleotide sequences are caused by sequencing errors
Some of these errors cause frame shifts that make whole gene
identification difficult or protein translation impossible
Sometimes, gene sequences are contaminated with sequences from
cloning vectors
Errors are more common for sequences produced before the 1990s;
sequence quality has greatly improved since then
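The effect of a frame shift mentioned above can be seen in a toy example: dropping a single base changes every codon downstream of the error. The sequence and the helper names `codons` and `delete_base` are hypothetical and chosen only to make the shift visible.

```python
def codons(seq, frame=0):
    """Split seq into complete three-base codons starting at offset frame."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def delete_base(seq, position):
    """Simulate a sequencing error that drops one base."""
    return seq[:position] + seq[position + 1:]


original = "ATGGCCAAGTGA"                # toy reading frame ending in the stop codon TGA
with_error = delete_base(original, 4)    # lose the "C" at index 4
```

Comparing `codons(original)` with `codons(with_error)` shows that the start codon survives but the later codons differ and the final stop codon is no longer read in frame, which is why such errors hinder gene identification and translation.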
Pitfalls of Biological Databases
Therefore, exceptional care should be taken when dealing with more
dated (older) sequences
Redundancy:
There are also high levels of redundancy in the primary sequence Dbs
various reasons
The causes of redundancy include
which
Identical sequences from the same organism and associated
database
The SWISS-PROT database also has minimal redundancy for