Lecture 4 Biological Databases

The document discusses biological databases, their importance and components. Biological databases store vast amounts of complex biological data and make it accessible. They allow indexing and removal of redundant data. Biological databases have entities, fields, records and identifiers as their basic components.

Uploaded by

Veer khade

Databases, Biological Databases & Role in Bioinformatics
Lecture-4
by
Dr. Aditya Kumar Padhi, Ph.D.
Laboratory for Computational Biology & Biomolecular Design (LCBD),
School of Biochemical Engineering, IIT (BHU)
Contents
• What is a database?

• Types of databases

• Biological databases and need for biological databases

• Types of biological databases

• Interconnection between databases

• Pitfalls

• Information retrieval

• Biological databases in Indian context

• Conclusion
Need for Databases
• One of the hallmarks of modern genomic research is the generation of enormous amounts of raw sequence data (DNA & protein).

• As the volume of genomic data grows, sophisticated computational methodologies are required to manage it.

• Thus, the very first challenge in the genomics era is to store and handle this staggering volume of information through the establishment and use of computer databases.

• The development of databases to handle the vast amount of molecular biological data is therefore a fundamental task of bioinformatics.

• We will go through some basic concepts related to databases, and the types, designs, and architectures of biological databases.
Database
• A database is a computerized archive used to store and organize data in such a way that
information can be retrieved easily via a variety of search criteria.

• Databases are composed of computer hardware and software for data management.

• The chief objective of the development of a database is to organize the data in a set of structured
records for easy retrieval of information.

• Although data retrieval is the main purpose of all databases, “biological databases often have a
higher level of requirement, known as knowledge discovery (the identification of connections
between pieces of information that were not known when the information was first entered)”.

• For example, databases containing raw sequence information can perform extra computational
tasks to identify sequence homology or conserved motifs. These features facilitate the discovery
of new biological insights from raw data.
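The motif-finding idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: it assumes a small dictionary of hypothetical protein records and uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, translated by hand into a regular expression.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} (N-glycosylation site) as a regex.
# The lookahead (?=...) lets overlapping matches be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")

# Hypothetical protein sequence records: identifier -> sequence.
records = {
    "prot1": "MKNGSAVNPTL",
    "prot2": "SMEKPCYSGKLTYPS",
    "prot3": "AANVTQNLSR",
}


def find_motif(records, pattern):
    """Return {identifier: [(1-based position, matched motif), ...]} for records with hits."""
    hits = {}
    for rid, seq in records.items():
        found = [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        if found:
            hits[rid] = found
    return hits


print(find_motif(records, N_GLYC))
# {'prot1': [(3, 'NGSA')], 'prot3': [(3, 'NVTQ'), (7, 'NLSR')]}
```

Note that "NPTL" in prot1 is correctly rejected because the residue after N is P.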
Types of Databases
• To facilitate the access and retrieval of data, sophisticated computer software programs for
organizing, searching, and accessing data have been developed.

• These are called database management systems.

• These systems contain not only raw data records but also operational instructions to help
identify hidden connections among data records.

Two common types of database management systems:
• RDBMS (Relational Database Management Systems)
• OODBMS (Object-Oriented Database Management Systems)
RDBMS
• Originally, databases all used a flat-file format, which is a long text file that contains many
entries separated by a delimiter, a special character such as a vertical bar (|).

• Within each entry are a number of fields separated by tabs or commas.

• The text file can be considered a single table. Thus, to search a flat file for a particular piece of
information, a computer has to read through the entire file (obviously an inefficient process).
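A minimal sketch of why flat-file search is slow, assuming a hypothetical student file in which entries are separated by a vertical bar and fields by tabs: with no index, every query has to visit every entry.

```python
# Hypothetical flat file: entries separated by "|", fields within an entry by tabs.
FLAT_FILE = "S1\tAlice\tTexas|S2\tBob\tOhio|S3\tCarol\tTexas"


def parse_flat_file(text, entry_sep="|", field_sep="\t"):
    """Split a flat-file string into a list of field lists (one per entry)."""
    return [entry.split(field_sep) for entry in text.split(entry_sep) if entry]


def linear_search(entries, column, value):
    """Scan every entry (there is no index) and return those whose field matches."""
    hits = []
    for fields in entries:  # every entry is visited: cost grows linearly with file size
        if fields[column] == value:
            hits.append(fields)
    return hits


entries = parse_flat_file(FLAT_FILE)
print(linear_search(entries, 2, "Texas"))
# [['S1', 'Alice', 'Texas'], ['S3', 'Carol', 'Texas']]
```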

• Instead of using a single table as in a flat-file database, relational databases use a set of tables to organize data. Each table, also called a relation, is made up of columns and rows.
RDBMS

• Example of constructing a relational database for five students’ course information originally expressed
in a flat file.
• By creating 3 different tables linked by common fields, data can be easily accessed and reassembled.

• Question: which courses are students from Texas taking?
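The student example can be reproduced with Python's built-in sqlite3 module. The original figure's data are not reproduced here, so the five students, their states and the courses below are hypothetical; the point is that three tables linked by common fields (student_id, course_id) can be recombined by a join to answer the question above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three tables linked by the common fields student_id and course_id.
cur.executescript("""
CREATE TABLE students (student_id TEXT PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (student_id TEXT REFERENCES students(student_id),
                          course_id TEXT REFERENCES courses(course_id));
""")

# Hypothetical data for five students.
cur.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    ("S1", "Alice", "Texas"), ("S2", "Bob", "Ohio"), ("S3", "Carol", "Texas"),
    ("S4", "Dan", "Utah"), ("S5", "Eve", "Texas"),
])
cur.executemany("INSERT INTO courses VALUES (?, ?)", [
    ("C1", "Biochemistry"), ("C2", "Genetics"), ("C3", "Bioinformatics"),
])
cur.executemany("INSERT INTO enrollments VALUES (?, ?)", [
    ("S1", "C1"), ("S1", "C3"), ("S2", "C2"), ("S3", "C3"), ("S4", "C1"), ("S5", "C1"),
])

# Which courses are students from Texas taking?
rows = cur.execute("""
SELECT DISTINCT c.title
FROM students AS s
JOIN enrollments AS e ON e.student_id = s.student_id
JOIN courses AS c ON c.course_id = e.course_id
WHERE s.state = 'Texas'
ORDER BY c.title
""").fetchall()
texas_courses = [title for (title,) in rows]
print(texas_courses)  # ['Biochemistry', 'Bioinformatics']
```

Genetics does not appear because only the (hypothetical) Ohio student is enrolled in it.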

OODBMS
• One of the problems with relational databases is that the tables used do not describe complex hierarchical relationships between data items.

• OODBMS stores the data as objects.

• Programming languages like C++ are used to create object-oriented databases.

• The objects are linked by a set of pointers defining predetermined relationships between the objects.

[Figure: Example of the construction and query of an OODBMS using the same student information. Objects are constructed and are linked by pointers shown as arrows. Finding specific information relies upon simple navigation through the objects by way of pointers. Question: which courses are students from Texas taking?]
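Python is not an object database, but ordinary objects holding references to each other show the navigation style described above. The classes and data are hypothetical and mirror the student example: a State object points to Student objects, which in turn point to Course objects, so the Texas question is answered by following references rather than by joining tables.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Course:
    title: str


@dataclass
class Student:
    name: str
    courses: list[Course] = field(default_factory=list)  # references ("pointers") to Course objects


@dataclass
class State:
    name: str
    students: list[Student] = field(default_factory=list)  # references to Student objects


# Hypothetical objects; the same Course object is shared by several students.
biochem, genetics, bioinfo = Course("Biochemistry"), Course("Genetics"), Course("Bioinformatics")
texas = State("Texas", [
    Student("Alice", [biochem, bioinfo]),
    Student("Carol", [bioinfo]),
    Student("Eve", [biochem]),
])
ohio = State("Ohio", [Student("Bob", [genetics])])


def courses_taken_in(state: State) -> list[str]:
    """Navigate State -> Student -> Course references, collecting distinct titles in visit order."""
    titles: list[str] = []
    for student in state.students:
        for course in student.courses:
            if course.title not in titles:
                titles.append(course.title)
    return titles


print(courses_taken_in(texas))  # ['Biochemistry', 'Bioinformatics']
```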
Data types in Biology
• Primary data: sequence, e.g., DNA (AATGCGTATAGGCAG) or amino acid (SMEKPCYSGKLTYPS); stored in primary databases.

• Secondary data: "motifs" (blocks, signatures, fingerprints) and secondary protein structure, e.g., alpha-helices, beta-strands; stored in secondary databases.

• Tertiary data: atomic co-ordinates and tertiary protein structure (domains, folding units); stored in tertiary databases.
Biological databases
• Biological data are complex, vast, and incomplete.

• A collection of biological data arranged in a computer-readable form that speeds up search and retrieval and is convenient to use is called a biological database.

• A good database must keep its information up to date.

• The organized nature of a database makes it easy to access, manage, and periodically update.

• It allows the required data/information to be searched rapidly on a suitable computer system.

Importance of biological databases
• Biological science has now turned into a data-rich science.

• Gene sequences
• Amino acid sequences in proteins
• Motifs and domains in proteins
• Structural data from XRD & NMR
• Metabolic pathways
• Protein-protein interactions
• Gene expression data (e.g., from DNA microarrays)

• All this information can be retrieved by using biological databases.

• Thus, the storage and handling of this staggering information are the major challenges of the
current genomics era.

• Biological databases address this challenge: they allow data to be indexed and help remove data redundancy.
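Redundancy removal in its simplest form can be sketched as collapsing records whose sequences are identical. The helper below is hypothetical and only catches exact duplicates (after ignoring case and whitespace); it does not attempt the similarity-based clustering that some databases also perform.

```python
def remove_redundancy(records):
    """Collapse records with identical sequences.

    Returns (unique, merged): `unique` maps the first identifier seen for each
    distinct sequence to the normalized sequence; `merged` maps each kept
    identifier to the identifiers that duplicated it.
    """
    seen = {}    # normalized sequence -> identifier that was kept
    unique = {}
    merged = {}
    for rid, seq in records.items():
        key = "".join(seq.split()).upper()  # ignore whitespace and letter case
        if key in seen:
            merged[seen[key]].append(rid)
        else:
            seen[key] = rid
            unique[rid] = key
            merged[rid] = []
    return unique, merged


# Hypothetical nucleotide records.
records = {"seqA": "ATGCGT", "seqB": "atgcgt", "seqC": "ATG CGA", "seqD": "ATGCGT"}
unique, merged = remove_redundancy(records)
print(unique)  # {'seqA': 'ATGCGT', 'seqC': 'ATGCGA'}
print(merged)  # {'seqA': ['seqB', 'seqD'], 'seqC': []}
```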
Components of biological database
• Similar to other databases, a biological database also has certain basic components.

a) Entity - An entity refers to the item we want to store in a database. e.g., DNA sequences, Genes,
Bibliographic references, etc.
b) Fields - The properties of an entity are called fields. e.g., Gene name, gene sequence, mutation (if
any), etc.
c) Records - A record typically refers to the combination of all the fields for a given entity, e.g., the record for the gene BRCA1 in GenBank.
d) Identifier - The unique name that identifies a record.

Example (a table of movies):
• The entities stored are movies.

• The fields are the columns of the table, i.e., Title, Year, Director.

• The records are the rows of the table, one per movie.

• The unique identifiers are movie1, movie2, etc.
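The four components map naturally onto a small record type. In this sketch the entity is a hypothetical Movie class, its attributes are the fields (Title, Year, Director), each Movie instance is a record, and the dictionary keys movie1, movie2, ... serve as the unique identifiers; the titles and directors are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:        # entity: the kind of item being stored
    title: str      # fields: the properties of the entity
    year: int
    director: str


# Records keyed by unique identifier (values are hypothetical placeholders).
movies: dict[str, Movie] = {
    "movie1": Movie("Film A", 1999, "Director X"),
    "movie2": Movie("Film B", 2004, "Director Y"),
}


def add_record(table, identifier, record):
    """Insert a record, refusing to reuse an identifier (identifiers must stay unique)."""
    if identifier in table:
        raise KeyError(f"identifier {identifier!r} already in use")
    table[identifier] = record


add_record(movies, "movie3", Movie("Film C", 2010, "Director X"))
print(movies["movie3"].director)  # Director X
```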

Types of biological databases
Broad classes: primary databases, secondary databases and derived databases.

• Nucleotide sequence databases: 1. NCBI-GenBank 2. DDBJ 3. EMBL
• Protein sequence databases: 1. Swiss-Prot 2. PIR 3. GenPept
• Protein structure databases: 1. PDB 2. EBI-MSD 3. MMDB
• Domain and motif databases: 1. PROSITE 2. Blocks 3. COG
• Structure databases: 1. SCOPe 2. CATH
• Gene expression databases: 1. GEO 2. GXD 3. MGED
• Metabolic pathway databases: 1. KEGG 2. PathDB 3. EMP
• Specialized databases: 1. TGI 2. GSOB 3. GPCRD
Primary vs. secondary database
• Primary:
• Contains experimentally derived, original data from the researchers.

• Mostly public and open access.

• A primary database contains information on sequence or structure alone.

• Once given a database accession number, the data in primary databases are never changed: they
form part of the scientific record.
• Example: 1) Swissprot, PIR (protein sequences), 2) GenBank, DDBJ (genome sequences), 3) Protein Data
Bank (protein 3D structures).

• Secondary:
• The database is derived from the analysis or treatment of primary data.

• Manually created or automatically generated.

• It is very important for inferring protein function.

• They are highly curated, often using a complex combination of computational algorithms and manual
analysis.
• Example: 1) InterPro (protein families, motifs and domains), 2) UniProt Knowledgebase (sequence and
functional information of proteins), 3) Ensembl (variation, function, regulation and more layered onto whole
genome sequences).
Classification of databases
Nucleotide
• Nucleotide sequence databases (primary): 1. NCBI-GenBank 2. DDBJ (DNA Data Bank of Japan) 3. EMBL (The European Molecular Biology Laboratory). Together these form the INSDC (International Nucleotide Sequence Database Collaboration).
• Whole genome database: ENSEMBL

Protein
• Sequence: 1. UniProt 2. PIR 3. Swiss-Prot [all are primary]
• Interaction [protein-protein interaction]: 1. BioGRID 2. STRING
• Structure: 1. PDB (Protein Data Bank) 2. CATH (Class, Architecture, Topology, Homologous superfamily) 3. SCOPe (Structural Classification of Proteins)

Specialized
• OMIM (Online Mendelian Inheritance in Man): inherited disease database

Gene expression
• GEO (Gene Expression Omnibus): microarray database
Examples of various databases

The largest collection is housed at the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (the NLM-NCBI complex in Bethesda, MD).

A large staff of curators processes the information and compiles it into derivative databases.

NCBI maintains both primary and derivative databases.

PubMed is the premier literature database in the world.
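NCBI databases, including PubMed, can also be queried programmatically through the Entrez Programming Utilities (E-utilities). The sketch below only builds an ESearch URL with Python's standard library and sends no request; the search term is an arbitrary example. Fetching the URL returns an XML list of matching record IDs, and NCBI's usage guidelines limit how often such requests may be sent.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities ESearch URL for database `db` (no network call is made)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)  # spaces become '+'


url = esearch_url("pubmed", "BRCA1 AND breast cancer")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=BRCA1+AND+breast+cancer&retmax=20
```

Changing `db` (for example to "nucleotide") targets another Entrez database with the same URL pattern.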

GenBank
[Figure: NCBI GenBank/GenPept format showing the three major components of a sequence file.]
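Assuming the three components are the header, the feature table and the sequence, as is usual for GenBank records, they can be separated with a short parser. This sketch uses a tiny hypothetical record and relies only on the LOCUS/FEATURES/ORIGIN/"//" section markers; it ignores the fixed-column details of the real format, for which a dedicated parser such as Biopython's Bio.SeqIO.parse(handle, "genbank") is the safer choice.

```python
# Hypothetical toy record in GenBank flat-file layout (not a real entry).
RECORD = """LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy sequence.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="synthetic construct"
ORIGIN
        1 atgcgtatag gcagatgcgt atag
//
"""


def split_genbank(text):
    """Split one GenBank-style record into (header, features, sequence)."""
    header, features, seq_chunks = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"   # feature table starts on this line
        elif line.startswith("ORIGIN"):
            section = "sequence"   # sequence block follows this line
            continue
        elif line.startswith("//"):
            break                  # end of record
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # keep only the bases; drop position numbers and spaces
            seq_chunks.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(header), "\n".join(features), "".join(seq_chunks).upper()


header, features, sequence = split_genbank(RECORD)
print(sequence)  # ATGCGTATAGGCAGATGCGTATAG
```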
EMBL-EBI
[Figure panels 1-4]
UniProt
[Figure panels]
RCSB PDB
[Figure panels]
CATH
• A free, publicly available online resource that provides information on the evolutionary relationships of protein domains.
[Figure panels 1-2]
SCOPe
[Figure panels 1-2]
STRING
[Figure panels 1-2]
OMIM
[Figure panels 1-2]
India is not lagging behind!

Suggested reading:
1. A repository of web-based bioinformatics resources developed in India, Abhishek Agarwal, Piyush Agrawal, Aditi Sharma, Vinod Kumar, Chirag Mugdal, Anjali Dhall, Gajendra P.S. Raghava, bioRxiv 2020.01.21.855627; doi: https://doi.org/10.1101/2020.01.21.855627

2. https://www.natureasia.com/en/nindia/article/10.1038/nindia.2015.118

3. https://bioinformaticsreview.com/20190210/india-ranks-4th-among-the-top-20-bioinformatics-database-contributors-in-the-world/
Thank you
