Database

The document provides an overview of databases, particularly in the context of biological data, detailing types such as primary, secondary, and composite databases, as well as their structures and functions. It highlights key databases like GenBank, EMBL, and DDBJ, along with their roles in storing nucleotide sequences and protein information. Additionally, it discusses various database management systems and tools available for data retrieval and analysis in bioinformatics.

Uploaded by

anis442643

Databases

Dr. Shazia Rehman

Database
A database is a computerized archive used to store and organize data so that information can be retrieved easily.
It is a repository of information with a specific structure that enables data to be entered and extracted.
In general, this structure consists of files or tables, each containing numerous records and fields.
A database system (DBS) is an integrated collection of related files, together with details about their definition, interpretation, manipulation and maintenance.
A database system also protects the data from unauthorized access.
Database management systems
Database management systems provide several functions in addition to simple file management:
• control security
• maintain data integrity
• provide for backup and recovery
• control redundancy
• allow data independence
• perform automatic query optimization
Organisation
Biological databases are commonly organised as:
• flat files
• relational databases
Flat-file databases
The simplest form of a database, in which collections of data, such as nucleotide and amino acid sequences, are stored as a single large text file.
Relational databases
A relational database stores the data within a number of tables.
Each table consists of records and fields (rows and columns).
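The difference between the two organisations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "|"-separated flat-file layout, the table name `sequences`, and the three records are all hypothetical and do not correspond to any real database format.

```python
import sqlite3

# Hypothetical flat file: one record per line, fields separated by "|".
FLAT_FILE = """\
SEQ001|Homo sapiens|ATGGCCTAA
SEQ002|Mus musculus|ATGAAATGA
SEQ003|Homo sapiens|ATGTTTTAG
"""

def load_into_table(flat_text):
    """Parse the flat file and store each record as a row in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
    )
    rows = [line.split("|") for line in flat_text.splitlines() if line.strip()]
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_into_table(FLAT_FILE)
# A query selects records (rows) by the value of a field (column).
result = conn.execute(
    "SELECT seq_id, residues FROM sequences WHERE organism = ? ORDER BY seq_id",
    ("Mus musculus",),
).fetchall()
print(result)
```

With the flat file, finding all records for one organism means scanning the whole text; with the table, the same question is a single query on the `organism` field.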
Types of Database
Databases can be classified into three categories on the basis of the information stored: primary, secondary and composite databases.
Primary databases contain data that is derived experimentally.
They usually store information related to the sequences or structures of biological components.
They can be further divided into protein and nucleotide databases.
Primary Database
Nucleotide databases contain the raw nucleic acid sequence data produced and submitted by researchers worldwide:
• GenBank, at NCBI (National Center for Biotechnology Information)
• DDBJ (DNA Data Bank of Japan)
Protein databases:
• Swiss-Prot
• PIR (Protein Information Resource)
• PDB (Protein Data Bank)
• TrEMBL (Translated EMBL)
• MIPS
Secondary Databases
Secondary databases contain information derived from primary databases.
They store information such as conserved sequences, active site residues, and signature sequences.
Data derived from the Protein Data Bank, for example, is stored in secondary databases. Examples include:
• Class Architecture Topology Homology (CATH)
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
• Protein Families (Pfam)
• Structural Classification of Proteins (SCOP)
Composite Databases
Composite databases are collections of several primary database resources.
They provide users with various tools and software for analysing the data.
NCBI, being a composite database, stores a large number of nucleotide and protein sequences on its servers and therefore suffers from high redundancy in the deposited data.
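One simple way to picture redundancy is to pool entries from several sources and group those whose sequences are identical. The sketch below does only that; the identifiers and sequences are hypothetical, and exact string matching is an assumption made for illustration, not the method any particular database uses.

```python
def collapse_redundant(records):
    """Group (identifier, sequence) pairs by identical sequence.

    Returns a dict mapping each distinct sequence to the list of
    identifiers that share it, in first-seen order.
    """
    groups = {}
    for identifier, sequence in records:
        key = sequence.upper()  # treat upper and lower case as the same residues
        groups.setdefault(key, []).append(identifier)
    return groups

# Hypothetical entries pooled from several source databases.
pooled = [
    ("dbA:0001", "ATGGCCTAA"),
    ("dbB:x17", "atggcctaa"),
    ("dbC:77", "ATGAAATGA"),
]
nonredundant = collapse_redundant(pooled)
print(len(pooled), "entries,", len(nonredundant), "distinct sequences")
```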
Biological databases
Biological databases can be broadly classified into:
• sequence databases
• structure databases
• pathway databases
Sequence databases cover both nucleic acid and protein sequences, whereas structure databases mainly cover proteins.
Sequence databases
Nucleotide and protein sequence databases are the most widely used and among the best-established biological databases.
They serve as repositories for wet-lab results and as the primary source for experimental results.
Major public nucleotide data banks are:
• GenBank in the USA
• EMBL (European Molecular Biology Laboratory) in Europe
• DDBJ (DNA Data Bank of Japan) in Japan
Protein databases include:
• ExPASy
• UniProt
• PIR
• PDB
• Swiss-Prot
• TrEMBL
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)
Established at the National Institutes of Health (NIH) in 1988.
Part of the National Library of Medicine at the National Institutes of Health.
Provides access to a large amount of biomedical and genomic information (www.ncbi.nlm.nih.gov/home/about/mission.shtml).
It maintains a large collection of databases and bioinformatics tools and services.
One of its most popular databases is GenBank.
Mission or role
To develop novel techniques and methodologies for dealing with large and complex data, and to provide better access to analytical and computational tools.
To maintain biological databases, whether primary or secondary, including GenBank.
To provide data retrieval systems such as Entrez.
To provide computational resources for the analysis of GenBank data and other biological data.
Resources
The resources available on this site fall into two major categories:
1) databases
2) tools
The major databases maintained at NCBI are GenBank and PubMed (a bibliographic database for biomedical literature).
Other databases include Gene, Genome, Epigenomics, Gene Expression, RefSeq, Structure, the Database of Short Genetic Variation (dbSNP), Taxonomy, etc.
TOOLS at NCBI
NCBI also provides a variety of tools for searching its databases.
Entrez is the search engine of NCBI.
Other tools include:
• Genomes Browser
• BLAST
• CDTree
• Genetic Codes
• Open Reading Frame Finder (ORF Finder)
• SNP Database Specialized Search Tools
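Besides the Entrez web pages, NCBI records can be requested programmatically through the E-utilities web service, which is not mentioned in the slides and is introduced here only as an illustration. The sketch below builds a request URL for the `efetch` endpoint without contacting the network; the accession used is a placeholder, and the choice of parameters (`db`, `rettype`, `retmode`) is an assumption about what a typical GenBank flat-file request looks like.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities "efetch" service (not named in the slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Return a URL requesting one record in GenBank flat-file format."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)

# "XX000000.1" is a placeholder, not a real accession.
url = build_efetch_url("XX000000.1")
print(url)
```

Opening such a URL (for example with `urllib.request.urlopen`) would return the record as text, provided the accession exists.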
GenBank
GenBank (Genetic Sequence Databank)
GenBank® is the genetic sequence database at the National Center for Biotechnology Information (NCBI).
It was established in 1982 and is now maintained by NCBI.
It contains publicly available nucleotide sequences.
DNA sequences can be submitted to GenBank using several different methods:
• BankIt: web-based form for submitting a small number of sequences
• Sequin: more appropriate for complicated submissions containing many sequences
Structure of GenBank
A nucleotide sequence record in this database has the following structure:
• 1. Locus: A title given by GenBank itself to name the sequence entry. It includes the following:
• a. Locus Name: Similar to the accession number for the sequence.
• b. Sequence Length: The number of bases in the sequence.
• c. Molecule Type: The type of nucleic acid sequenced, for example mRNA (which is present as cDNA), rRNA, snRNA, or DNA.
• d. GB Division: The GenBank division to which the record belongs, according to GenBank's classification criteria.
• e. Modification Date: The date on which the record was last modified.
• 2. Definition: The name of the nucleotide sequence.
• 3. Accession: Covers the accession number, accession version, and GI number. The accession number is the unique identifier associated with each nucleotide sequence in the database.
• 4. Version: The identification number assigned to a single, specific sequence in the database, in the format "accession.version".
• 5. GI: Also a sequence identification number. Whenever a sequence is changed, the version number is increased and a new GI is assigned.
• 6. Keywords: Defined words used to index the entries.
• 7. Source: The organism from which the sequence was obtained.
• 8. Organism: The scientific name (usually genus and species) and phylogenetic lineage.
• 9. Reference: Citations of publications by the sequence authors, including the journal in which the sequence was reported.
• 10. Features: Information derived from the sequence, such as the biological source, exons, introns, promoters, CDS, and alternate splicing.
• 11. Base Count and Origin: The base composition of the sequence and the start of the sequence data itself.
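The header fields listed above can be pulled out of a record with a small parser. The following is a minimal sketch, not a complete GenBank parser: the sample record is hypothetical, the LOCUS line is assumed to have exactly seven whitespace-separated parts (including a topology word that the list above does not mention), and real records are laid out in fixed columns with many more fields.

```python
# Hypothetical record header laid out like the fields described above.
SAMPLE = """\
LOCUS       XX000000     9 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   XX000000
VERSION     XX000000.2
"""

def parse_header(text):
    """Split each line into a field keyword and its value."""
    fields = {}
    for line in text.splitlines():
        keyword, _, value = line.partition(" ")
        fields[keyword] = value.strip()
    return fields

def parse_locus(value):
    """Break the LOCUS value into its whitespace-separated parts."""
    name, length, _unit, molecule, _topology, division, date = value.split()
    return {"name": name, "length": int(length), "molecule": molecule,
            "division": division, "date": date}

def split_version(value):
    """Split 'accession.version' into (accession, version number)."""
    accession, _, version = value.rpartition(".")
    return accession, int(version)

header = parse_header(SAMPLE)
locus = parse_locus(header["LOCUS"])
accession, version = split_version(header["VERSION"])
print(locus["name"], locus["length"], locus["molecule"], accession, version)
```

Splitting the version from the right keeps the accession intact, so the same accession can be compared across successive versions of a sequence.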
European Molecular Biology Laboratory (EMBL)
The EMBL Nucleotide Sequence Database is maintained by the European Bioinformatics Institute (EBI), UK.
EMBL, the parent laboratory, was founded in 1974.
EBI develops and maintains a large number of databases, and scientists can access the data free of charge.
This database serves as the primary source of nucleotide sequences for Europe.
It accepts nucleotide sequence data generated by large-scale genome-sequencing projects as well as sequences available from the European Patent Office.
Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ).
Other genomic databases held at EBI include:
• Ensembl (a database of genome annotation)
• Genome Reviews
Daily releases of the database contain new submissions and updated sequence data, while the entire database is released every 3 months.
DDBJ
DDBJ (DNA Data Bank of Japan) is a biological database that collects DNA sequences submitted by researchers.
It is run by the National Institute of Genetics, Japan.
DDBJ Flat File Format
Data submitted to DDBJ is managed and retrieved in the DDBJ flat-file format.
The flat file includes the sequence together with information about the submitter, references, source organism, features, etc.
Ensembl Genome Database
Ensembl is one of several well-known genome browsers for retrieving genomic information from many organisms, including humans, plants, bacteria and animals.
It was created and is maintained by the EBI and the Sanger Centre (UK).
Databases for green plants
There are three comparative genomic databases for green plants:
• GreenPhylDB
• PLAZA
• Phytozome
These databases aim to support genomic studies of plant evolution and to provide comparative data on genomes and gene families, along with tools for their analysis.
They provide information on:
• the genomic context of plant genes
• gene homologues and paralogues
• RNA transcripts from the given genes
• peptide sequences
• functions of gene families
They also allow access to the complete genome sequences available in the database.
Protein Databases
Swiss-Prot
• A protein sequence database that strives to provide a high level of annotation:
* the function of a protein
* domain structure
* post-translational modifications
* variants
• Complete, curated, non-redundant and cross-referenced with 34 other databases.
• Its repository contains the amino acid sequence, the protein name and description, taxonomic data, and citation information.
Pfam
A database of protein families, Pfam contains annotations as well as multiple sequence alignments generated using hidden Markov models.
TrEMBL: TrEMBL (translation of the EMBL nucleotide sequence database) was introduced by the European Bioinformatics Institute in collaboration with Swiss-Prot.
• Created in 1996 as a computer-annotated supplement to Swiss-Prot.
• Contains translations of all coding sequences (CDS) in EMBL.
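Translating a coding sequence means reading it three bases at a time and looking each codon up in the genetic code. The sketch below shows that step using the standard genetic code; it is an illustration of the idea, not the pipeline actually used to build TrEMBL, and the input sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_cds(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon.

    Only the letters A, C, G and T are handled; any other symbol raises KeyError.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical 9-base coding sequence: ATG (Met), GCC (Ala), TAA (stop).
protein = translate_cds("ATGGCCTAA")
print(protein)
```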
PIR: The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies.
PIR serves the scientific community through online access and by performing offline sequence identification services for researchers.
It is a freely accessible database of protein sequences that contains high-quality data and functional information for the proteins.
Structure databases
There are many structure databases, including:
Protein Data Bank (PDB)
Important in solving real problems in molecular biology.
Established in 1971 at Brookhaven National Laboratory (BNL).
It contains structural information on macromolecules determined by X-ray crystallography and NMR methods.
PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB).
PROSITE: A database of protein domains and families.
PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify which known protein family a new sequence belongs to.
CATH: The CATH database (Class, Architecture, Topology, Homologous superfamily) is a hierarchical classification of protein domain structures, which clusters proteins at four major structural levels.
Pathway databases
A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes.
Some examples of pathway databases are:
• KEGG (Kyoto Encyclopedia of Genes and Genomes)
• BRENDA
• BioCyc
KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary resource for the Japanese GenomeNet service.
It is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals.
KEGG contains three databases: PATHWAY, GENES, and LIGAND.
• The PATHWAY database stores computerized knowledge on molecular interaction networks.
• The GENES database contains data concerning sequences of genes and proteins generated by the genome projects.
• The LIGAND database holds information about the chemical compounds and chemical reactions that are relevant to cellular processes.
BioCyc: The BioCyc Database Collection is a compilation of pathway and genome information for different organisms.
It includes two other databases:
• EcoCyc, which describes Escherichia coli K-12
• MetaCyc, which describes pathways for more than 300 organisms
