Databases Bioinformatics

Uploaded by

Sukhdeep Singh

Bioinformatics Databases

MV 2017
Data in Bioinformatics
• DNA- Sequences of nucleotides (ATGC) that
contain information in the form of triplet codons
having specific reading frames and built-in control
segments
• RNA sequences (AUGC): mRNA, tRNA, hnRNA
• Protein sequences- Strings of amino acids
(e.g., Aspartate, Glycine, Histidine,
Isoleucine, Leucine, Methionine, Serine,
Threonine, Valine, Phenylalanine, Tyrosine)
• Structure data (Protein structure: primary,
secondary, tertiary, quaternary; 3D views)
• Images of 2D Gel electrophoresis
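The idea of triplet codons read in a specific reading frame can be sketched in a few lines of Python. This is only an illustration: the function name `codons_in_frame` and the example sequence are assumptions made for this sketch, not part of the original slides.

```python
def codons_in_frame(dna: str, frame: int = 0) -> list[str]:
    """Split a DNA string into complete triplet codons.

    `frame` is the offset (0, 1 or 2) at which reading starts on the
    given strand; trailing bases that do not fill a codon are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = dna.upper()
    # Stop at len(seq) - 2 so every slice has exactly three bases.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Hypothetical nine-base example: the same bases give different codons
# depending on where reading starts.
example = "ATGGCCTAA"
for frame in range(3):
    print(frame, codons_in_frame(example, frame))
```

Only the three forward frames are shown; a fuller treatment would also read the reverse complement strand.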
Bioinformatics Databases
• Databases are a convenient system to properly
store, search and retrieve any type of data
• Databases are of different types, based on the nature of
information and the manner (complexity) of data
storage
Types of databases
Based on the nature of information, db are divided
into
• 1. Generalized db: DNA, Protein (e.g., NCBI)
– a. Sequence db: nucleotides or amino acids
– b. Structure db: structure of macromolecules
• 2. Specialized db: Expressed Sequence Tags
(EST), Single Nucleotide Polymorphisms (SNP)
Based on the manner of data storage, db are divided
into
1. Primary or archival db: data in original form, taken
as such from the source. E.g.: GenBank, Swiss-Prot
2. Secondary db: value-added db with information
derived from primary db
3. Composite db: several primary db combined
Redundant db: hold more than one copy of a sequence
Non-redundant db: keep a single copy of each sequence
Boutique db: species specific sequence data
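As a rough sketch of the redundant versus non-redundant distinction, the Python snippet below collapses entries that carry exactly the same sequence. The function name `make_non_redundant` and the entry identifiers are hypothetical, and real non-redundant databases may apply further criteria beyond exact matching.

```python
def make_non_redundant(entries):
    """Collapse entries with identical sequences.

    `entries` is an iterable of (entry_id, sequence) pairs. The result
    keeps one record per distinct sequence: the first ID seen, the
    sequence, and a list of the other IDs that shared it.
    """
    merged = {}
    for entry_id, seq in entries:
        # Compare case-insensitively so "atgc" and "ATGC" count as one.
        merged.setdefault(seq.upper(), []).append(entry_id)
    return [(ids[0], seq, ids[1:]) for seq, ids in merged.items()]


# Hypothetical redundant input: A1 and C3 hold the same sequence.
redundant = [("A1", "ATGC"), ("B2", "GGCC"), ("C3", "atgc")]
print(make_non_redundant(redundant))
```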
• Db entries composed of
– Core data: original sequence
– Supplementary data or annotation (source,
author, date, method used etc)

• Sequence formats
– PIR (Protein Information Resource)/NBRF (National
Biomedical Research Foundation) - >P, >N
– FASTA (from FAST-All) - >
– GDE (Genetic Data Environment) - %
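To make the FASTA convention concrete (a header line starting with ">" followed by sequence lines), here is a minimal Python reader. It is a sketch only: `parse_fasta` and the two sample records are invented for illustration, and it assumes the first word of each header is a unique identifier.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {identifier: sequence} for FASTA-formatted text.

    The identifier is taken as the first whitespace-delimited word
    after '>'; sequence lines are concatenated until the next header.
    """
    records: dict[str, list[str]] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            current_id = header.split()[0] if header else ""
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


# Hypothetical two-record file: core data (the sequence) follows a
# one-line annotation (the header).
sample = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical example DNA
ATGCATGC
"""
print(parse_fasta(sample))
```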
Primary Databases
• In original form, taken from the source
• Original submission by researcher
• Contents controlled by the submitter
• Data explosion in the 1980s led to the creation of many
repositories
1. Nucleic acid sequence db
2. Protein sequence db
3. Metabolite db
Secondary databases
• Derivative db
• Result of analyses of sequences in the primary db
• Secondary db built up from primary db
• Secondary db analyzed in a variety of ways and
contain different information in different formats
• Contents of secondary db controlled by a third
party
• Eg: Prosite, Prints, Blocks
Nucleic acid sequence databases
• Collection of nucleotide sequences
• Organize and distribute nucleotide sequences from
all available sources
• In the form of a text file
• Can be read by humans and computers
• Many db entries are assembled from several
publications, so they may contain overlapping
fragments of a complete sequence
• First sequence - Yeast tRNA with 77 bases in 1964
NCBI
• National Center for Biotechnology Information
• Established on November 4, 1988 as part of
the National Library of Medicine (NLM) at the
National Institutes of Health (NIH), USA
• Headquarters in Bethesda, Maryland
• Legislation sponsored by Senator Claude
Pepper
Services
• PubMed
• GenBank
• BLAST
• Entrez
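Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an `efetch` request URL and does not contact the network. The helper name `efetch_url`, its default values, and the accession string are assumptions for illustration; NCBI's usage guidelines should be checked before sending real requests.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL that asks for one record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}?{urlencode(params)}"


# "NM_000207" is used purely as an example identifier.
print(efetch_url("NM_000207"))
```

The resulting URL could then be passed to any HTTP client, for example `urllib.request.urlopen`.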
GenBank
• GenBank ® is the NIH genetic sequence
database, an annotated collection of all
publicly available DNA sequences
• GenBank is part of the International
Nucleotide Sequence Database Collaboration
(INSDC), which comprises the DNA Data Bank
of Japan (DDBJ), the European Nucleotide
Archive (ENA), and GenBank at NCBI. These
three organizations exchange data on a daily
basis.
International Nucleotide Sequence
Database Collaboration (INSDC)
• INSDC consists of
• 1. EMBL
• 2. DDBJ
• 3. GenBank
• Daily exchange of data
• The GenBank database is designed to provide
and encourage access within the scientific
community to the most up to date and
comprehensive DNA sequence information.
Therefore, NCBI places no restrictions on the
use or distribution of the GenBank data.
However, some submitters may claim patent,
copyright, or other intellectual property rights
in all or a portion of the data they have
submitted.
What is in it?
• Annotated nucleotide sequences, including
mRNA sequences with coding regions,
segments of genomic DNA with a single gene
or multiple genes, and ribosomal RNA gene
clusters
• More than 100,000 organisms
• Amino acid translations of coding sequences (CDS)
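A GenBank entry is a text record with an annotation part and a sequence part, and pulling the sequence out of one can be sketched as follows. The record shown is a made-up, heavily shortened example (not a real entry), and `parse_genbank_sequence` is a hypothetical helper that reads only the ACCESSION line and the ORIGIN block.

```python
# Hypothetical, shortened flat-file record for illustration only.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record (not a real GenBank entry).
ACCESSION   EXAMPLE01
ORIGIN
        1 atggccctgt ggatgcgcct cctg
//
"""


def parse_genbank_sequence(record: str) -> tuple[str, str]:
    """Return (accession, sequence) from one flat-file record."""
    accession = ""
    seq_parts: list[str] = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # "//" marks the end of the record.
            in_origin = False
        elif in_origin:
            # Sequence lines start with a base position, then blocks of bases.
            seq_parts.extend(line.split()[1:])
    return accession, "".join(seq_parts).upper()


print(parse_genbank_sequence(SAMPLE_RECORD))
```

Real records also carry a FEATURES table (where CDS translations live), which this sketch ignores.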
EMBL
• The European Molecular Biology Laboratory (EMBL) is
a molecular biology research institution supported by
25 member states, four prospect and two associate
member states. EMBL was constituted in 1974 and is
an intergovernmental organisation funded by public
research money from its member states. Research at
EMBL is conducted by approximately 85 independent
groups covering the spectrum of molecular biology.
EMBL groups and laboratories perform basic research
in molecular biology and molecular medicine as well as
training for scientists, students and visitors.
Stations
• The Laboratory operates from six sites: the
main laboratory in Heidelberg, and
outstations in Hinxton (the European
Bioinformatics Institute (EBI), in England),
Grenoble (France), Hamburg (Germany),
Monterotondo (near Rome)
and Barcelona (Spain).
European Molecular Biology Laboratory
(EMBL)
• From European Bioinformatics Institute (EBI), UK
• Collect and assemble data from
-Direct author submission
-Genome sequencing groups
-Patent application
-Literature
• Goal - integrate nucleotide sequence data and
annotation into the wealth of bioinformatics
resources
• By cross reference and Sequence Retrieval System
(SRS) data can be viewed in 200 local stations
• 2494 completed genomes
EMBL
• The roots of the EMBL-EBI lie in the EMBL
Nucleotide Sequence Data Library (now
known as EMBL-Bank), which was established
in 1980 at the EMBL laboratories in
Heidelberg, Germany and was the world's first
nucleotide sequence database.
• The original goal was to establish a central
computer database of DNA sequences, to
supplement sequences submitted to journals.
• The EMBL-EBI hosts a number of publicly open, free to use
life science resources, including biomedical databases,
analysis tools and bio-ontologies. These include:
• ArrayExpress - archive of gene expression experiments
• BioModels Database - a database of computational models
relevant to the life sciences
• BioStudies - a database that serves as a generic data
archive at EMBL-EBI for biomolecular datasets
• Chemical Entities of Biological Interest (ChEBI) - database
and ontology of molecular entities
• European Nucleotide Archive (ENA) - resource of
nucleotide sequencing information
• Ensembl project - genome databases for vertebrates and
other eukaryotic species (joint with Wellcome Trust Sanger
Institute)
• Europe PubMed Central - database offering free access to
collection of biomedical research literature
DNA Data Bank of Japan
• Currently, DDBJ Center is in operation at the
National Institute of Genetics (NIG) in
Mishima, Japan, with the endorsement of MEXT,
the Japanese Ministry of Education, Culture,
Sports, Science and Technology.
• DDBJ Center is reviewed and advised by its
own advisory board, DNA Database Advisory
Committee (an outside committee of NIG),
and also by the advisory board to
INSDC, International Advisory Committee.
• Started in 1986
• It is located at the National Institute of
Genetics (NIG) in the Shizuoka prefecture of
Japan. It is also a member of the INSDC. It
exchanges its data with European Molecular
Biology Laboratory at the European
Bioinformatics Institute and with GenBank at
the National Center for Biotechnology
Information on a daily basis.
• These three databanks contain the same data
at any given time.
Protein sequence databases
• Swiss-Prot, PIR
• UniProtKB/Swiss-Prot is the manually
annotated and reviewed section of the
UniProt Knowledgebase (UniProtKB).
• It is a high quality annotated and non-
redundant protein sequence database, which
brings together experimental results,
computed features and scientific conclusions.
• Since 2002, it is maintained by the UniProt
consortium and is accessible via the UniProt
website.
Swiss-Prot
• Established in 1986 by Dept. of Biochemistry, University of Geneva
• Maintenance by Swiss Institute of Bioinformatics (SIB) and EMBL

• Database composed of 2 parts

1. Core data - sequence reference and taxonomic details
2. Annotation - sequence variants, functions, 2° & 3° structures

• Provide high level annotation including functions of the protein

• Maintain high quality and structure - first choice for most research
purpose

• Swiss-Prot has been supplemented by TrEMBL (translated EMBL) since 1996

• TrEMBL has 2 sections

1. SP-TrEMBL - entries translated from EMBL that are to be incorporated into Swiss-Prot
2. REM-TrEMBL - remaining entries, which are not to be included in Swiss-Prot
• A well-defined manual curation process is
essential to ensure that all manually
annotated entries are handled in a consistent
manner. This process consists of 6 major
mandatory steps: (1) sequence curation, (2)
sequence analysis, (3) literature curation, (4)
family-based curation, (5) evidence
attribution, (6) quality assurance and
integration of completed entries. Curation is
performed by expert biologists using a range
of tools that have been iteratively developed
in close collaboration with curators.
Protein Sequence Databases
1. PIR - Protein Information Resource
• Established in 1984 by National Biomedical
Research Foundation (NBRF), Washington DC
• Aim - identification and interpretation of protein
sequence information
• Investigating evolutionary relationship among
proteins
• Help to do search and similarity analysis
• Provide integrated environment for sequence
analysis between 3 units
PIR is composed of 3 databases:
1. PSD - protein sequence database
2. NREF - Non-redundant reference database
3. iProClass - provides structural and functional
features of proteins

• PIR database is split into 4 sections, which differ in terms
of quality of data and levels of annotation provided
1. fully classified and annotated entries
2. preliminary entries, not thoroughly reviewed
3. unverified entries, not reviewed
4. genetically engineered sequences
Structure databases
PDB

• Protein Data Bank

• The Protein Data Bank (PDB) is a crystallographic
database for the three-dimensional structural data of
large biological molecules, such
as proteins and nucleic acids. The data are typically
obtained by X-ray crystallography, NMR
spectroscopy, or, increasingly, cryo-electron
microscopy.
www.rcsb.org/
• The data is freely accessible on the Internet via
the websites of its member organisations (PDBe,
PDBj, and RCSB).
• The PDB is overseen by an organization called
the Worldwide Protein Data Bank, wwPDB.
• The PDB is a key resource in areas of structural
biology, such as structural genomics. Most major
scientific journals, and some funding agencies,
now require scientists to submit their structure
data to the PDB. Many other databases use
protein structures deposited in the PDB.
• Molecular graphics display, the Brookhaven
RAster Display (BRAD), is used to visualize
protein structures in 3-D.
• The file format initially used by the PDB was
called the PDB file format.
• PDB was established in 1971, prompted by the
BRAD display (available from 1968) and a growing set
of X-ray crystallographic studies of proteins
• In October 1998, the PDB was transferred to
the Research Collaboratory for Structural
Bioinformatics (RCSB).
• In 2003, with the formation of the wwPDB,
the PDB became an international organization.
The founding members are PDBe (Europe),
RCSB (USA), and PDBj (Japan).
• Each of the three members of wwPDB can act
as deposition, data processing and
distribution centers for PDB data.
• The data processing refers to the fact that
wwPDB staff review and annotate each
submitted entry
[Figure: INSULIN]
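The PDB file format mentioned above stores each atom on a fixed-width line, so fields are read by column position rather than by splitting on spaces. The following Python sketch parses one such ATOM line; the `Atom` class, the `parse_atom_line` helper and the coordinate values are illustrative assumptions, and only a subset of the columns is read.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record.

    Slices are 0-based Python indices for the 1-based columns of the
    format: serial 7-11, atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, and x/y/z in columns 31-38, 39-46, 47-54.
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Hypothetical record: a backbone nitrogen of a glycine in chain A.
sample_line = "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N"
print(parse_atom_line(sample_line))
```

Column-based slicing is used because adjacent fields in real files can run together without separating spaces.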
NDB
• The Nucleic Acid Database (NDB; Berman et
al., 1992) was established in 1991 as a
resource for specialists in the field of nucleic
acid structure. Over the years, the NDB has
developed generalized software for
processing, archiving, querying and
distributing structural data for nucleic acid-
containing structures. The core of the NDB has
been its relational database of nucleic acid-
containing crystal structures.
• It allows researchers to perform comparative
analyses of nucleic acid-containing structures
selected from the NDB
• Structures available in the NDB include RNA and
DNA oligonucleotides with two or more bases
either alone or complexed with ligands, natural
nucleic acids such as tRNA and protein-nucleic
acid complexes. The archive stores both primary
and derived information about the structures
• The primary data include the crystallographic
coordinate data, structure factors and
information about the experiments used to
determine the structures, such as crystallization
information, data-collection and refinement
statistics.
OMIM
• Online Mendelian Inheritance in Man
• A comprehensive, authoritative and timely
compendium of human genes and genetic
phenotypes
• The full-text, referenced overviews in OMIM
contain information on all known Mendelian
disorders and over 12,000 genes.
• Initiated in the early 1960s by Dr. Victor A.
McKusick as a catalogue of Mendelian traits and
disorders, entitled Mendelian Inheritance in
Man
• World Wide Web version available since 1995
