Disclaimer

The document provides a disclaimer regarding the non-commercial and educational use of content, acknowledging potential copyright issues. It details various primary protein databases, including their purposes, structures, and examples such as PIR, SWISS-PROT, and UniProt. Additionally, it discusses the Protein Data Bank (PDB) and its significance in storing 3-D structural data of biological molecules.

Uploaded by

Sneha Ardeshna


Disclaimer

It is hereby declared that this content is produced for non-commercial, scholastic and research purposes only.

We acknowledge that some of the content or images in this channel's videos may have been obtained through routine Google image searches, and that a few of them may be under copyright protection. Any such usage is completely inadvertent.

It is quite possible that we have failed to give full scholarly credit to the copyright owners. We believe that the non-commercial, education-only use of the material may allow the video in question to fall under fair use of such content. However, we honour the copyright holders' rights, and the video shall be deleted from our channel if any such claim is received by us or reported to us.
Primary Protein Databases

Department of Microbiology
Unit no. 3: Introduction to Databases
Subject name and code: Bioinformatics & Biostatistics; 02MB0301
Dr. Purvi M. Rakhashiya
Primary Protein databases

• The PRIMARY databases hold experimentally determined protein sequences as well as sequences inferred from the conceptual translation of nucleotide sequences. An inferred sequence may or may not be supported by experimentally derived information; it arises from interpretation of the nucleotide sequence information.
• There are a number of primary protein sequence and structure databases, and each requires some specific consideration. The following are the types, with examples:

• Protein sequence databases: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D
• Protein structure databases: PDB
Protein Information Resource (PIR)

• The Protein Sequence Database was developed at the National Biomedical Research Foundation (NBRF) in the early 1960s by Margaret Dayhoff as a collection of sequences for investigating evolutionary relationships among proteins.
• Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International. The consortium includes the Protein Information Resource (PIR) at the NBRF, the Japan International Protein Information Database (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).
• In its current form, the database is split into four distinct sections, designated PIR1-PIR4, which differ in the quality of data and level of annotation provided:
• PIR1 contains fully classified and annotated entries.
• PIR2 includes preliminary entries, which have not been thoroughly reviewed and may contain redundancy.
• PIR3 contains unverified entries, which have not been reviewed.
• PIR4 entries fall into one of four categories:
• (i) conceptual translations of artefactual sequences;
• (ii) conceptual translations of sequences that are not transcribed or translated;
• (iii) protein sequences or conceptual translations that are extensively genetically engineered; or
• (iv) sequences that are not genetically encoded and not produced on ribosomes.
Martinsried Institute for Protein Sequences (MIPS)

• The Martinsried Institute for Protein Sequences (MIPS) collects and processes sequence data for the PIR-International Protein Sequence Database project.
• The database is distributed with PATCHX, a supplement of unverified protein sequences from external sources.
UniProt / SWISS-PROT

• SWISS-PROT is a protein sequence database which, from its inception in 1986, was produced collaboratively by the Department of Medical Biochemistry at the University of Geneva and the EMBL. After 1994, the collaboration moved to EMBL's UK outstation, the EBI.
• In April 1998, a further change saw a move to the Swiss Institute of Bioinformatics (SIB); hence the database is now maintained collaboratively by SIB and EBI/EMBL.
• The database endeavours to provide high-level annotations, including descriptions of the function of the protein, the structure of its domains, its post-translational modifications, variants, and so on.
• SWISS-PROT aims to be minimally redundant, and is interlinked to many other resources. In 1996, a computer-annotated supplement to SWISS-PROT was created, termed TrEMBL, which is described in more detail below.
• First, however, we will take a close look at the structure of SWISS-PROT entries.
The Structure of SWISS-PROT Entries

The structure of the database, and the quality of its annotations, set SWISS-PROT apart from other protein sequence resources and have made it the database of choice for most research purposes.

By mid-1998, the database contained approximately 70,000 entries from more than 5,000 different species, the bulk of these coming from just a small number of model organisms (e.g., Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Mus musculus and Rattus norvegicus).
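
The slides do not reproduce an entry, so the following is only a minimal sketch of how the flat-file layout can be read. It relies on the general convention that each line starts with a two-letter line code (for example ID, AC, DE, OS, SQ), that sequence lines follow the SQ line, and that "//" ends the entry. The entry text, the accession X00001 and the function name parse_flat_entry are made up for illustration, and the SQ line is abridged.

```python
# Hypothetical, abridged entry: NOT a real database record.
TOY_ENTRY = """\
ID   TOY_HUMAN               Reviewed;          12 AA.
AC   X00001;
DE   RecName: Full=Hypothetical toy protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
"""


def parse_flat_entry(text):
    """Collect annotation lines by line code and join the sequence lines."""
    fields = {}
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        code, body = line[:2], line[5:].strip()
        if in_sequence:
            # Sequence lines carry residues in space-separated blocks.
            seq_parts.append(body.replace(" ", ""))
        elif code == "SQ":
            in_sequence = True  # everything up to "//" is sequence
        else:
            fields.setdefault(code, []).append(body)
    fields["sequence"] = "".join(seq_parts)
    return fields


entry = parse_flat_entry(TOY_ENTRY)
print(entry["AC"], entry["OS"], entry["sequence"])
```

Keeping every line code as a list of lines preserves multi-line fields such as DE without the parser having to know each line type.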
UniProt

• UniProt is produced by the UniProt Consortium, a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four components:
1. The UniProt Knowledgebase (UniProtKB)
2. UniProt Reference Clusters (UniRef)
3. UniProt Archive (UniParc)
4. UniProt Metagenomic and Environmental Sequences (UniMES)
Sources for curation in UniProtKB

The default raw sequence data for UniProtKB are:
• DDBJ/ENA/GenBank coding sequence (CDS) translations,
• the sequences of PDB structures,
• sequences from Ensembl and RefSeq,
• data derived from amino acid sequences that are directly submitted to UniProtKB or scanned from the literature.
UniProt Knowledgebase

• The UniProt Knowledgebase, and in particular UniProtKB/Swiss-Prot, is used to access functional information on proteins. Every UniProtKB entry contains the amino acid sequence, protein name or description, taxonomic data and citation information; in addition, as much annotation as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, as well as clear indications of the quality of annotation in the form of evidence attribution to experimental and computational data.
• UniProtKB/Swiss-Prot contains high-quality, manually annotated and non-redundant protein sequence records. Manual annotation consists of analysis, comparison and merging of all available sequences for a given protein.
• UniProtKB/TrEMBL contains high-quality computationally analysed records enriched with automatic annotation and classification.
UniProt Reference Clusters (UniRef)

• Three UniRef databases (UniRef100, UniRef90 and UniRef50) merge sequences automatically across species.
• UniRef100 is based on all UniProtKB records. It also contains selected UniParc records, including Ensembl protein translations from chicken, cow, dog, fly, Fugu, human, mouse, rat, Tetraodon, Xenopus and zebrafish. UniRef100 is produced by clustering all these records by sequence identity. Identical sequences and sub-fragments are presented as a single UniRef100 entry with the accession numbers of all the merged entries, the protein sequence, and links to the corresponding UniProtKB and archive records.
• UniRef90 and UniRef50 are built from UniRef100 to provide records with mutual sequence identity of 90% or more, or 50% or more, respectively, with links to the corresponding UniProtKB records.
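
The merging of identical sequences and sub-fragments described for UniRef100 can be illustrated with a toy sketch. This is not the production UniRef procedure: the function name cluster_identical_and_fragments, the accessions and the sequences are hypothetical, and a fragment is simply treated as any exact substring of a longer sequence. The 90% and 50% identity levels would need an alignment-based identity measure, which is not shown.

```python
def cluster_identical_and_fragments(records):
    """Group accessions whose sequence equals, or is an exact substring of,
    a longer sequence that already represents a cluster.

    records: dict mapping accession -> amino acid sequence.
    Returns a list of {"sequence": ..., "members": [...]} dicts.
    """
    # Visit longest sequences first so that a fragment always meets its
    # full-length parent before it could start a cluster of its own.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters = []
    for accession, sequence in ordered:
        for cluster in clusters:
            if sequence in cluster["sequence"]:
                cluster["members"].append(accession)
                break
        else:
            # No existing representative contains this sequence.
            clusters.append({"sequence": sequence, "members": [accession]})
    return clusters


toy_records = {
    "A1": "MKTAYIAKQR",
    "A2": "MKTAYIAKQR",  # identical to A1
    "A3": "TAYIAK",      # sub-fragment of A1
    "B1": "MSSHHHH",     # unrelated
}
for cluster in cluster_identical_and_fragments(toy_records):
    print(cluster["members"], cluster["sequence"])
```

Each resulting cluster keeps one sequence and the accession numbers of all merged entries, which mirrors the single-entry presentation described above.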
UniProt Archive (UniParc)

• UniParc is designed to capture all publicly available protein sequence data and contains all the protein sequences from the main publicly available protein sequence databases. This makes UniParc the most comprehensive publicly accessible non-redundant protein sequence database.
• A protein sequence may exist in several databases, and more than once in a given database, thus creating redundant information. UniParc overcomes this problem by storing each unique sequence only once and assigning it a unique UniParc identifier.
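
The idea of storing each unique sequence once, with a single identifier and a list of the places it came from, can be sketched as below. The class ToyArchive, the source database names and the identifier layout ("UPI" followed by a zero-padded counter) are assumptions made for illustration, not a description of how UniParc is implemented.

```python
class ToyArchive:
    """Keep one record per distinct sequence, plus every source that supplied it."""

    def __init__(self):
        self._id_by_sequence = {}  # sequence -> archive identifier
        self._sources_by_id = {}   # archive identifier -> [(database, accession)]

    def add(self, sequence, source_db, source_accession):
        """Register a sequence and return its (new or existing) identifier."""
        key = sequence.upper()
        archive_id = self._id_by_sequence.get(key)
        if archive_id is None:
            # First time this sequence is seen: mint a new identifier.
            archive_id = f"UPI{len(self._id_by_sequence) + 1:010X}"
            self._id_by_sequence[key] = archive_id
            self._sources_by_id[archive_id] = []
        self._sources_by_id[archive_id].append((source_db, source_accession))
        return archive_id

    def sources(self, archive_id):
        return list(self._sources_by_id[archive_id])

    def __len__(self):
        return len(self._id_by_sequence)


archive = ToyArchive()
first = archive.add("MKTAYIAKQR", "db_one", "X00001")
second = archive.add("MKTAYIAKQR", "db_two", "Y00002")  # same sequence again
third = archive.add("MSSHHHH", "db_one", "X00003")
print(first == second, len(archive), archive.sources(first))
```

Because the sequence itself is the lookup key, a protein submitted to several databases produces one stored sequence and several cross-references rather than several copies.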
UniProt Metagenomic and Environmental Sequences (UniMES)

• The availability of metagenomic data has necessitated the creation of a separate database, UniMES, to store sequences that are recovered directly from environmental samples.
• The predicted proteins from this dataset are combined with automatic classification by InterPro, an integrated resource for protein families, domains and functional sites, to enhance the original information with further analysis.
TrEMBL

• TrEMBL (Translated EMBL) was created in 1996 as a computer-annotated supplement to SWISS-PROT.
• The database benefits from the SWISS-PROT format, and contains translations of all coding sequences (CDS) in EMBL.
• TrEMBL has two main sections, designated SP-TrEMBL and REM-TrEMBL:
• SP-TrEMBL (SWISS-PROT TrEMBL) contains entries that will eventually be incorporated into SWISS-PROT, but that have not yet been manually annotated.
• REM-TrEMBL (REMaining TrEMBL) contains sequences that are not destined to be included in SWISS-PROT. These include immunoglobulins and T-cell receptors, fragments of fewer than eight amino acids, synthetic sequences, patented sequences, and codon translations that do not encode real proteins.
• TrEMBL was designed to address the need for a well-structured SWISS-PROT-like resource that would allow very rapid access to sequence data from the genome projects, without having to compromise the quality of SWISS-PROT itself by incorporating sequences with insufficient analysis and annotation.
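
The conceptual translation of a coding sequence, on which TrEMBL entries are based, can be sketched with the standard genetic code. This is an illustrative sketch only: the function names translate_cds and is_short_fragment are hypothetical, the input is assumed to be an in-frame CDS, and the length check merely mirrors the "fewer than eight amino acids" criterion quoted above rather than any actual TrEMBL pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

MIN_FULL_LENGTH = 8  # threshold taken from the REM-TrEMBL description above


def translate_cds(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_short_fragment(protein):
    """True if the translation has fewer than eight residues."""
    return len(protein) < MIN_FULL_LENGTH


peptide = translate_cds("ATGGCCAAATAA")
print(peptide, is_short_fragment(peptide))
```

A translation produced this way carries no experimental evidence of its own, which is why such entries need annotation before they can be treated like curated SWISS-PROT records.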
NRL-3D

• The NRL-3D database is produced by PIR from sequences extracted from the Brookhaven Protein Data Bank (PDB).
• The titles and biological sources of the entries conform to the nomenclature standards used in the PIR.
• Bibliographic references and MEDLINE cross-references are included, together with secondary structure, active site, binding site and modified site annotations, and details of experimental method, resolution, R-factor, etc. Keywords are also provided.
• NRL-3D is a valuable resource, as it makes the sequence information in the PDB available both for keyword interrogation and for similarity searches.
• The database may be searched using the ATLAS retrieval system, a multi-database information retrieval program specifically designed to access macromolecular sequence databases.
Protein structure database

• The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, can be accessed at no charge on the internet. The PDB is overseen by an organization called the Worldwide Protein Data Bank (wwPDB).
• The PDB file format is the traditional text file format for protein structure files. It describes how the atoms of a molecule are arranged in the 3-D structure of a protein. A file contains hundreds or thousands of lines called records, each of which describes one aspect of the structure.
• File formats: PDB, mmCIF, XML.
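
To make the idea of a record concrete, the sketch below writes and reads a single ATOM record of the PDB text format, in which each field occupies fixed columns (for example x, y and z in columns 31-38, 39-46 and 47-54). The function names format_atom_line and parse_atom_line and the coordinate values are made up for illustration; only the ATOM record type is handled, and optional fields such as alternate location and insertion code are left blank.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z,
                     occupancy=1.00, temp_factor=0.00, element=""):
    """Build one fixed-width ATOM record (78 columns)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{temp_factor:6.2f}"
        f"{'':10}{element:>2}"
    )


def parse_atom_line(line):
    """Read the main fields of an ATOM record by column position."""
    if not line.startswith("ATOM"):
        return None  # other record types are ignored in this sketch
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "res_name": line[17:20].strip(), # columns 18-20
        "chain": line[21],               # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),  # columns 77-78
    }


# Illustrative values only, not taken from a deposited structure.
record = format_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614,
                          element="N")
print(record)
print(parse_atom_line(record))
```

Reading by column position rather than by splitting on whitespace matters because adjacent fields in this format may touch when values are large.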
