
Bioinformatics and Omics Topic: Database and Biological Database With Examples Assignment-3

This document discusses biological databases, including their purpose and types. Biological databases store biological data like DNA sequences, protein structures, etc. in an organized way. There are three main types of biological databases: primary databases which archive experimental data with minimal annotation; secondary databases which apply computational analysis to primary data to derive more knowledge; and composite databases which merge and filter data from primary databases to make searches more efficient. Examples of prominent biological databases discussed include GenBank, SWISS-PROT, Pfam, and BLOCKS.

Bioinformatics and Omics

Topic: Database and Biological database with examples


Assignment-3

Submitted to: Dr. Shazia Haider

Submitted by: Diksha Gupta (20915008), MSc Micro, 3rd sem

A database is a collection of inter-related data that supports efficient retrieval, insertion and deletion of
data and organizes the data in the form of tables, schemas, reports etc. Databases are
effectively electronic filing cabinets, a convenient and efficient method of storing vast amounts of
information.

A database is a computerized archive used to store and organize data in such a way that information can be
retrieved easily via a variety of search criteria. It is composed of computer hardware and software
for data management. The chief objective of developing a database is to organize data in a set of
structured records to enable easy retrieval of information. Each record, also called an entry,
should contain a number of fields that hold the actual data items, for example, fields for names,
phone numbers, addresses, dates. To retrieve a particular record from the database, a user can
specify a particular piece of information, called a value, to be found in a particular field and expect
the computer to retrieve the whole data record. This process is called making a query.
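
The idea of records, fields, values and queries can be illustrated with a minimal Python sketch. The records, field names and the `query` helper below are hypothetical and chosen only for illustration; a real database would use a DBMS rather than an in-memory list.

```python
# Hypothetical records: each dict is one entry, each key is a field.
records = [
    {"name": "Asha", "phone": "555-0101", "city": "Delhi"},
    {"name": "Ravi", "phone": "555-0102", "city": "Mumbai"},
    {"name": "Meena", "phone": "555-0103", "city": "Delhi"},
]


def query(entries, field, value):
    """Return every whole record whose `field` holds exactly `value`."""
    return [entry for entry in entries if entry.get(field) == value]


# Making a query: specify a value to be found in a particular field.
matches = query(records, "city", "Delhi")
for entry in matches:
    print(entry["name"], entry["phone"])
```

The query names only one field and one value, yet each match comes back as a complete record, which is the behaviour described above.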

Although data retrieval is the main purpose of all databases, biological databases often have a
higher level of requirement, known as knowledge discovery, which refers to the identification of
connections between pieces of information that were not known when the information was first
entered. For example, databases containing raw sequence information can perform extra
computational tasks to identify sequence homology or conserved motifs. These features facilitate
the discovery of new biological insights from raw data.

Software used to manage a database is called a Database Management System (DBMS).
A DBMS is a sophisticated computer program for organizing, searching, and accessing
data.

Types

There are many different database types, depending both on the nature of the information being
stored (e.g., sequences or structures, 2D gel or 3D structure images) and on the manner of data
storage:
• Flat-file database
• Relational database
• Object-oriented database
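
The difference between the first two storage styles can be sketched in Python. The example below assumes a tiny FASTA-like flat file with made-up identifiers and sequences, parses it, and loads it into an in-memory relational table using the standard `sqlite3` module. The table and column names are illustrative assumptions, not the schema of any real biological database.

```python
import sqlite3

# A tiny flat file in FASTA-like form; IDs and sequences are made up.
flat_file = """>seq1 Homo sapiens
ATGGCC
>seq2 Mus musculus
ATGGCA
"""


def parse_flat_file(text):
    """Turn header/sequence line pairs into (id, organism, sequence) tuples."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        seq_id, organism = header[1:].split(" ", 1)
        rows.append((seq_id, organism, sequence))
    return rows


rows = parse_flat_file(flat_file)

# Relational storage: the same data as rows in a table with named columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (id TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)

result = conn.execute(
    "SELECT id FROM sequences WHERE organism = ?", ("Mus musculus",)
).fetchall()
print(result)
```

In the flat file, finding an entry means scanning the text. In the relational table, the same question is expressed as a query over a named column.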

Biological database

These are databases consisting of biological data such as protein sequences, molecular
structures, DNA sequences, etc. in an organized form.

Many of them are free to use and contain a huge collection of a variety of biological data. They are
libraries of the biological sciences, with data collected from scientific experiments, published
literature, high-throughput experimental technology, and computational analysis. They contain
information from research areas including genomics, proteomics, metabolomics, microarray gene
expression, and phylogenetics.

Information contained in biological databases includes gene function, structure, localization
(both cellular and chromosomal), clinical effects of mutations, as well as similarities of biological
sequences and structures. Current biological databases use all three types of database structures:
flat files, relational, and object-oriented.

Biological databases can be divided into three categories:

• Primary Databases
o Most of the data in these databases are contributed directly by authors with a
minimal level of annotation.
o A primary database can also be called an archival database, since it archives the
experimental results submitted by scientists.
o It is populated with experimentally derived data such as genome sequences and
macromolecular structures. The data entered here remain uncurated (no
modifications are made to the submitted data).

There are three major public sequence databases that store raw nucleic acid sequence data
produced and submitted by researchers worldwide: GenBank, the European Molecular Biology
Laboratory (EMBL) database, and the DNA Data Bank of Japan (DDBJ). Together they
constitute the International Nucleotide Sequence Database Collaboration.

For the three-dimensional structures of biological macromolecules, there is only one centralized
database, the PDB. This database archives atomic coordinates of macromolecules (both proteins
and nucleic acids) determined by X-ray crystallography and NMR.

GenBank is the most complete collection of annotated nucleic acid sequence data for almost
every organism. The content includes genomic DNA, mRNA, cDNA, ESTs, high-throughput raw
sequence data, and sequence polymorphisms. There is also a GenPept database for protein
sequences, the majority of which are conceptual translations from DNA sequences, although a
small number of the amino acid sequences are derived using peptide sequencing techniques.
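
A conceptual translation of the kind mentioned above can be sketched in a few lines of Python. The sketch assumes the standard genetic code, reading frame 1, and an input containing only A, C, G and T. The `translate` function and the example sequence are illustrative and are not GenBank's own pipeline.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding DNA in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up coding sequence: ATG GCC AAA TAA
print(translate("ATGGCCAAATAA"))  # prints "MAK"
```

The protein sequence here is inferred from the DNA rather than measured, which is why such entries are called conceptual translations.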

There are two ways to search for sequences in GenBank. One is using text-based keywords
similar to a PubMed search. The other is using molecular sequences to search by sequence
similarity using BLAST.

• Secondary Databases
o To turn raw sequence information into more sophisticated biological
knowledge, much post-processing of the sequence information is needed. This
creates the need for secondary databases, which contain computationally processed
sequence information derived from the primary databases.
o Computational algorithms are applied to the primary data, and the meaningful,
informative results are stored in the secondary database.
o A secondary database therefore typically holds more interpreted, higher-value
knowledge than the primary database it is derived from.

A prominent example of a secondary database is SWISS-PROT, which provides detailed
sequence annotation that includes structure, function, and protein family assignment. The
sequence data are mainly derived from TrEMBL, a database of translated nucleic acid sequences
stored in the EMBL database. PIR is another example of this type of database.

The Pfam and Blocks databases contain aligned protein sequence information as well as derived
motifs and patterns, which can be used for classification of protein families and inference of
protein functions. The DALI database is vital for protein structure classification and threading
analysis to identify distant evolutionary relationships among proteins.

In the Blocks database, the motifs (here called blocks) are created automatically by detecting and
highlighting the most conserved regions of each family of proteins. This database is fully
automated. Keyword and sequence searching are the two important features of this type of
database. Blocks are ungapped multiple sequence alignments representing conserved protein
regions.
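
The notion of an ungapped, conserved region can be illustrated with a simplified Python sketch. This is not the procedure used to build the Blocks database. It only scans a made-up alignment for runs of columns that contain no gaps and are sufficiently conserved, and the function name, thresholds and sequences are all assumptions for illustration.

```python
def find_blocks(alignment, min_length=3, min_identity=1.0):
    """Return (start, end) column ranges, end exclusive, that are ungapped
    and in which the most common residue reaches `min_identity`."""
    n_cols = len(alignment[0])

    def conserved(col):
        residues = [seq[col] for seq in alignment]
        if "-" in residues:  # a gap disqualifies the column
            return False
        top = max(set(residues), key=residues.count)
        return residues.count(top) / len(residues) >= min_identity

    blocks, start = [], None
    for col in range(n_cols + 1):
        if col < n_cols and conserved(col):
            if start is None:
                start = col
        else:
            # A run has ended; keep it only if it is long enough.
            if start is not None and col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


# Made-up aligned protein fragments of equal length ('-' marks a gap).
alignment = [
    "MKT-LLGAWHE",
    "MKTALLG-WHE",
    "MKTPLLGSWHE",
]
blocks = find_blocks(alignment)
for start, end in blocks:
    print(start, end, alignment[0][start:end])
```

With the default thresholds this reports three short blocks (MKT, LLG, WHE), separated by the two columns that contain gaps.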

• Composite Databases
o The data entered in these databases are first compared and then filtered
based on desired criteria.
o The initial data are taken from primary databases and then merged
together based on certain conditions.
o Composite databases contain non-redundant data, which helps in searching
sequences rapidly.
o They render sequence searching more efficient, because they obviate the need to
interrogate multiple resources.
o Because they are often curated by experts in the field, they may have unique
organizations and additional annotations associated with the sequences.

OWL, NRDB, and SWISS-PROT + TrEMBL are examples of composite databases.

NRDB (Non-Redundant Database) is built locally at the NCBI. The database is a composite of
GenPept, PDB sequences, SWISS-PROT, SPupdate, PIR, and GenPeptupdate. The database is
thus comprehensive and contains up-to-date information. Strictly speaking, it is non-identical
rather than non-redundant, i.e., only identical sequence copies are removed from the resource.
As a result, the contents of NRDB are both error-prone and, in spite of its name, redundant.
NRDB is the default database of the NCBI BLAST service.
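
The distinction between non-identical and non-redundant can be illustrated with a minimal Python sketch of merging two sources while removing only exact duplicates. The source names, identifiers and sequences are made up, and `merge_non_identical` is a hypothetical helper, not the procedure NCBI uses.

```python
def merge_non_identical(*sources):
    """Merge (id, sequence) pairs from several sources, keeping only the
    first entry seen for each exactly identical sequence."""
    first_id_for = {}
    for source in sources:
        for seq_id, sequence in source:
            key = sequence.upper()
            if key not in first_id_for:
                first_id_for[key] = seq_id
    return [(seq_id, sequence) for sequence, seq_id in first_id_for.items()]


# Two made-up source databases with one exact duplicate (A1 and B1)
# and one near-duplicate that differs by a single residue (B2).
db_a = [("A1", "MKTLLG"), ("A2", "MSTNPK")]
db_b = [("B1", "MKTLLG"), ("B2", "MKTLLA")]

merged = merge_non_identical(db_a, db_b)
for seq_id, sequence in merged:
    print(seq_id, sequence)
```

The exact copy B1 is dropped, but the near-identical B2 is kept, which shows how a resource built this way can still contain redundant entries.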
