Introduction To Databases

Biological databases play a fundamental role in bioscience and bioinformatics by organizing vast amounts of biological data for easy access and analysis. There are various types of biological databases that contain data from the nucleic acid level to whole genomes and proteomes, including nucleotide and protein sequences, gene expression patterns, metabolic pathways, and more. Maintaining comprehensive yet structured biological databases is crucial as modern biological research generates enormous volumes of raw genomic and molecular data.

Uploaded by

jonny depp
Copyright
© All Rights Reserved

Background

Biology is increasingly becoming a data-rich science, so the need for storing and
communicating large datasets has grown tremendously (e.g. nucleotide and protein
sequences, and three-dimensional structures from X-ray crystallography and NMR). A biological
database is a collection of data that is organized so that its contents can easily be accessed,
managed and updated. Biological databases play a fundamental role in bioscience, particularly
in bioinformatics. They offer scientists the opportunity to access sequence and structure data
for tens of thousands of sequences from a broad range of organisms. Biological databases
represent an invaluable resource in support of biological research.

Types of Biological Data

Biological data span levels from the nucleotide to the network, which gives rise to diverse
classes of biological databases, including:

• Nucleic acid sequence and structure
• Transcriptional regulation/gene expression patterns
• Protein sequence and structure
• Motifs and domains
• Protein-protein interactions
• Metabolic and signaling pathways
• Metabolites, enzymes, protein modification
• Viruses, bacteria, protozoa, and fungi
• Partial and whole genome sequences
• Genomic variation, diseases, and drugs
• Plant databases
• Other molecular biology databases, etc.

Definition of a Biological Database

A biological database is a collection of data that is organized so that its contents can easily be
accessed, managed, and updated.

Why Do We Need Biological Databases?

One of the hallmarks of modern genomic research is the generation of enormous amounts of
raw sequence data. As the volume of genomic data grows, sophisticated computational
methodologies are required to manage the data. Thus, the very first challenge in the genomics
era is to store and handle the overwhelming volume of information through the establishment
of computer databases. The development of databases to handle the vast amount of molecular
biological data is thus a fundamental task of bioinformatics. This chapter introduces some
basic concepts related to databases, in particular the types, designs, and architectures of
biological databases.

Objectives of Biological Databases

• Availability of biological data to the scientific community: To store, organize and share data
in a structured and searchable manner, with the aim of facilitating data retrieval and
visualization.

• Availability of biological data in computer-readable form: To maintain the data in
common formats and to provide web application programming interfaces for computers to
exchange and integrate data from various database resources in an automated manner.
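As one illustration of what "computer-readable form" means in practice, the minimal sketch below parses FASTA-style text into a dictionary. FASTA is not named in the text above and is used here only as an assumed example of a common sequence format; the function name `parse_fasta`, the identifiers and the sequences are all invented for the illustration.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a dict of {record id: sequence}."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the record opened by the last header.
            records[current_id] += line
    return records


# Hypothetical two-record input (identifiers and sequences are invented).
EXAMPLE = """>seq1 example nucleotide record
ATGGCC
TTAA
>seq2 example protein record
MKTAYIAK
"""

parsed = parse_fasta(EXAMPLE.splitlines())
print(parsed)
```

Because the format is predictable, a program can read the records without human help, which is what makes automated exchange between database resources possible.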

Features of a Biological Database

• Structured - Stored in a well-designed fashion
• Searchable (index) - Table of contents
• Updated periodically (release) - New edition
• Cross-referenced (hyperlinks) - Links with other databases

Organization of a Database

• Databases are composed of tables of data: A table is essentially a spreadsheet, that is, a
set of rows and columns.

• Each table has several records (rows): A record stores all the information for a given
individual.

• Each record has several fields (columns): A field is an individual piece of data, a single
attribute of the record.

• Each record has a unique identifier, the primary key: A primary key serves to identify the
data stored in this record across all the tables in the database.
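The organization described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`sequence`, `structure`), the column names and the accession values are hypothetical and do not come from any real biological database.

```python
import sqlite3

# Throwaway in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# A table: each row is a record, each column is a field.
# "accession" is the primary key, the unique identifier of a record.
conn.execute(
    """
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        residues  TEXT NOT NULL
    )
    """
)

# A second table that refers back to "sequence" through the same key.
conn.execute(
    """
    CREATE TABLE structure (
        structure_id TEXT PRIMARY KEY,
        accession    TEXT NOT NULL REFERENCES sequence (accession),
        method       TEXT NOT NULL
    )
    """
)

# Hypothetical records (all values are invented).
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [
        ("X0001", "Homo sapiens", "MKTAYIAK"),
        ("X0002", "Mus musculus", "MKSAYLAK"),
    ],
)
conn.execute("INSERT INTO structure VALUES (?, ?, ?)", ("S100", "X0001", "X-ray"))

# The primary key must be unique, so a second "X0001" record is rejected.
try:
    conn.execute("INSERT INTO sequence VALUES (?, ?, ?)", ("X0001", "Homo sapiens", "MK"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The shared key links the data for one record across both tables.
rows = conn.execute(
    """
    SELECT s.accession, s.organism, t.structure_id, t.method
    FROM sequence AS s
    JOIN structure AS t ON t.accession = s.accession
    """
).fetchall()

print(duplicate_rejected)
print(rows)
```

The join at the end is the relational counterpart of a cross-reference: the same key value is used to look up related data in another table.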

The ‘Perfect’ Database

A perfect database has the following qualities:

• Comprehensive, but easy to search
• Annotated, but not “too annotated”
• A simple, easy to understand structure
• Cross-referenced
• Minimum redundancy
• Easy retrieval of data

Classification of Biological Databases

Biological databases can be broadly classified into two categories:

Sequence databases:
Contain nucleic acid and protein sequence information.

Structure databases:
Contain three-dimensional structures of proteins, nucleic acids, and macromolecular complexes.

These databases are important tools that help scientists analyze and explain a host of
biological phenomena, from the structure of biomolecules and their interactions, to the whole
metabolism of organisms, to the evolution of species. This knowledge helps in the fight
against diseases, in the development of medications, in predicting certain genetic diseases,
and in discovering basic relationships among species in the history of life.

Sequences and structures are only two of the several different types of data required in the
practice of modern biology. Other important data types include metabolic pathway
networks and molecular interactions, mutations and polymorphisms in molecular sequences
and structures, organelle structure and tissue type, genetic maps, physicochemical
data, gene and mRNA expression profiles, and two-dimensional gel electrophoresis images of
protein expression.

Sequence and structure databases can be further classified into:

• Primary
• Secondary
• Composite

Primary databases:

Consist of experimentally derived data alone, such as nucleotide sequences, protein sequences
and three-dimensional structures.
Examples include UniProtKB for protein sequences, GenBank and DDBJ for nucleotide
sequences, and the Protein Data Bank for protein structures.

Secondary databases:

Contain data derived from the analysis or treatment of primary data, such as
secondary structures, hydrophobicity plots, conserved sequences, signature sequences and
domains.
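To make "derived from the analysis of primary data" concrete, the sketch below computes the values behind a hydrophobicity plot from a protein sequence. It assumes the Kyte-Doolittle hydropathy scale and a simple sliding-window average; the text above does not say which scale or window a given database uses, and the function name and the example peptide are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values per amino acid (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 5) -> List[float]:
    """Return the mean hydropathy of every window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    # Raises KeyError for any non-standard residue letter.
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Hypothetical primary-data entry: an invented 10-residue peptide.
profile = hydropathy_profile("MKTAYIAKQR", window=5)
print(len(profile))  # one value per window position
print(profile)
```

The sequence is the primary data; the profile is the kind of derived result a secondary database would store.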

A secondary structure database presents detailed information about a PDB entry in an organized
way. Example: the structural classification of proteins by class, fold, superfamily, etc.

Most secondary databases are created and hosted by researchers at their individual
laboratories. Examples: SCOP, developed at Cambridge University; CATH, developed at
University College London; BMCD, developed at NIST, USA.

Composite databases:

These merge a variety of different primary database sources, which avoids the need to search
multiple resources. Different composite databases use different combinations of primary
databases and different criteria in their search algorithms.
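The merging idea can be sketched as follows. This is a simplified illustration, not how any particular composite database works: it assumes two hypothetical sources (`source_a`, `source_b`) and treats two entries as redundant only when their sequences match exactly, whereas real composite databases apply their own criteria.

```python
from typing import Dict, Iterable, List, Tuple

# (source database, identifier, sequence); all names here are hypothetical.
Record = Tuple[str, str, str]


def build_composite(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Group records by sequence so each distinct sequence is stored once."""
    composite: Dict[str, List[Tuple[str, str]]] = {}
    for source, identifier, sequence in records:
        key = sequence.upper()  # assumed redundancy criterion: exact match
        composite.setdefault(key, []).append((source, identifier))
    return composite


# Invented entries from two invented primary sources.
records = [
    ("source_a", "A1", "MKTAYIAK"),
    ("source_b", "B7", "mktayiak"),  # same sequence, different source
    ("source_b", "B8", "MKSAYLAK"),
]

composite = build_composite(records)
print(len(composite))         # number of non-redundant sequences
print(composite["MKTAYIAK"])  # cross-references back to both sources
```

A single lookup in the merged collection returns every source identifier for a sequence, so the user does not have to query each primary database separately.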

The National Center for Biotechnology Information (NCBI), which hosts nucleotide and protein
databases, also provides OMIM (Online Mendelian Inheritance in Man), an online,
comprehensive, authoritative compendium of human genes and genetic phenotypes.

Current Status

The Database Issue of the journal “Nucleic Acids Research” is freely available and categorizes
many of the publicly available online databases related to biology and bioinformatics.
According to the 21st Nucleic Acids Research Database Issue, published in 2014, there
are 1552 databases that are publicly accessible online [ref]. The more recent 22nd Nucleic Acids
Research Database Issue reports the addition of 58 new molecular biology databases and
updates on 115 existing databases (Nucleic Acids Research, 2015, Vol. 43, Database issue,
D1–D5).

About Me

• Researcher at Rajasthan University DBT Bioinformatics Centre.

• Highly rated freelancer at https://www.teacheron.com/tutor-profile/2w79

• Owns an automatically updating chemical compound database:
https://sites.google.com/view/mud-data

• 2 years of Next-Generation Sequencing, Data Processing, and Bioinformatics experience.

• Developed septicemia and COVID-19 QSAR models using R.

• 3 years of project-based teaching experience to national and international students.

For any query, suggestion, or feedback, you can contact me at:

[email protected]

www.linkedin.com/in/shradheya-r-r-gupta-54492984

Thanks for taking the course.

My aim is to bridge the gap between life science and computer science.

Enjoy learning!
