Protein Databases



• Protein databases are biological databases that
collect information about proteins.
• The information contained in protein databases
includes the amino acid sequence, the domain
structure, the biological function of the protein, its
three-dimensional structure, and its interactions
with other proteins.
• Several protein databases are publicly available.
Based on the type of information they store, protein
databases can be classified into several categories.
• Some of the most common categories of protein
databases are as follows:
Protein Sequence Databases

• The protein sequence database contains amino acid
sequences of proteins and related information. The
amino acid sequence of a protein is important
because it determines the protein’s three-dimensional
structure and function, as well as its identity.
• Some of the most popular protein sequence
databases are:
PIR

• PIR (Protein Information Resource) is a popular
protein sequence database that provides
information on functionally annotated protein
sequences.
• PIR maintains three databases: the Protein
Sequence Database (PSD), the Non-redundant
Reference (NREF) sequence database, and the
integrated Protein Classification (iProClass)
database, which contains annotated protein
sequences, classification information, and protein
family, function, and structure information.
SWISS-PROT

• SWISS-PROT is a protein sequence database that
provides high levels of annotation, including
information on the protein’s function, domain
structure, post-translational modifications, and
variants.
• SWISS-PROT is jointly managed by the SIB (Swiss
Institute of Bioinformatics) and the EBI (European
Bioinformatics Institute).
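SWISS-PROT entries are commonly downloaded from UniProt as FASTA text, where reviewed entries carry an `sp|` prefix in the header and unreviewed TrEMBL entries a `tr|` prefix. The sketch below, assuming the UniProt-style header layout `>db|Accession|EntryName Description OS=Organism ...`, splits such a file into records; the class and function names are illustrative, not part of any UniProt library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed UniProt-style header: >db|Accession|EntryName Description OS=... OX=...
HEADER_RE = re.compile(r"^>(\w+)\|([^|]+)\|(\S+)\s*(.*)$")
# Organism name runs from "OS=" up to the next " XX=" field or the end of the line
ORGANISM_RE = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


@dataclass
class FastaRecord:
    db: str                  # "sp" = Swiss-Prot (reviewed), "tr" = TrEMBL
    accession: str
    entry_name: str
    description: str
    organism: Optional[str]
    sequence: str


def _build_record(header: str, seq_lines: List[str]) -> FastaRecord:
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProt-style header: {header!r}")
    db, accession, entry_name, rest = match.groups()
    organism = ORGANISM_RE.search(rest)
    description = rest.split(" OS=")[0]  # text before the first key=value field
    return FastaRecord(db, accession, entry_name, description,
                       organism.group(1) if organism else None,
                       "".join(seq_lines))


def parse_fasta(text: str) -> List[FastaRecord]:
    """Split FASTA text into records; wrapped sequence lines are concatenated."""
    records: List[FastaRecord] = []
    header: Optional[str] = None
    seq_lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_build_record(header, seq_lines))
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append(_build_record(header, seq_lines))
    return records
```

For example, a made-up record headed `>sp|X00001|TEST_HUMAN Example protein OS=Homo sapiens OX=9606 PE=1 SV=1` yields `db == "sp"`, `organism == "Homo sapiens"`, and its wrapped sequence lines joined into one string.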
PDB

• PDB (Protein Data Bank) is a worldwide repository
of 3D structure data on large molecules such as
proteins, nucleic acids, and other biological
macromolecules.
• It stores three-dimensional structural models of
macromolecules obtained through three frequently
used experimental methods: X-ray crystallography,
nuclear magnetic resonance spectroscopy (NMR),
and electron microscopy (3DEM).
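Structures in the PDB have traditionally been distributed in the fixed-column PDB file format, where each atom is one `ATOM` or `HETATM` line. The sketch below reads only the columns it needs (record type, atom name, residue, chain, residue number, and x, y, z coordinates) using that format's column positions; it ignores alternate locations, insertion codes, and the newer mmCIF format, and the helper names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    record: str   # "ATOM" (standard residues) or "HETATM" (ligands, ions, water)
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records using fixed PDB-format columns (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END, etc.
        atoms.append(Atom(
            record=line[0:6].strip(),      # columns 1-6
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16, e.g. "CA"
            res_name=line[17:20].strip(),  # columns 18-20, e.g. "ALA"
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38, angstroms
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def alpha_carbons(atoms: List[Atom]) -> List[Atom]:
    """C-alpha atoms only; HETATM calcium ions are also named CA, so skip them."""
    return [a for a in atoms if a.record == "ATOM" and a.name == "CA"]


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance in angstroms between two atoms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```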
SCOP

• SCOP (Structural Classification of Proteins) is a
protein structure database that organizes protein
domains according to their structural and
evolutionary relationships.
• SCOP categorizes proteins into different levels
based on their evolutionary relationships and
structural similarities.
• Proteins with high sequence identity or similar
structure and function are grouped into families,
and families with similar structures but low
sequence identity are placed into superfamilies.
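The family/superfamily distinction above rests largely on sequence identity. A minimal sketch, assuming a pre-computed pairwise alignment (gaps written as `-`) and the roughly 30% identity guideline often quoted for SCOP families (treated here as an assumption), shows how the rule could be expressed. The function names and the yes/no structure and function flags are hypothetical simplifications of judgements that SCOP curators make manually.

```python
from typing import Optional

FAMILY_IDENTITY_THRESHOLD = 30.0  # assumed guideline, in percent


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def suggest_scop_level(identity: float, similar_structure: bool,
                       similar_function: bool) -> Optional[str]:
    """Rough, hypothetical mirror of the grouping rule described above."""
    if identity >= FAMILY_IDENTITY_THRESHOLD or (similar_structure and similar_function):
        return "family"
    if similar_structure:
        return "superfamily"  # low identity, but the structures still resemble each other
    return None  # no grouping suggested at these two levels
```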
CATH

• CATH is a database that categorizes protein
domains into hierarchical levels based on their
folding patterns.
• Protein domains are classified into the CATH
hierarchy, which consists of four levels of increasing
specificity: Class, Architecture, Topology, and
Homologous Superfamily. Domains that share the
same overall fold are grouped at the Topology level,
and those with evidence of a common ancestor are
grouped into the same Homologous Superfamily.
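A CATH classification can be written as a dotted code with one number per level, Class.Architecture.Topology.Homologous superfamily (for example `3.40.50.300`). Assuming that format, the sketch below reports the most specific level two domains share; because each number is only meaningful under its parent, the comparison stops at the first mismatch. The function names are illustrative, not part of any CATH tool.

```python
from typing import Optional, Tuple

CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")
# Names of the main CATH classes; other class numbers fall back to "Other"
CLASS_NAMES = {1: "Mainly Alpha", 2: "Mainly Beta", 3: "Alpha Beta",
               4: "Few Secondary Structures"}


def parse_cath(code: str) -> Tuple[int, int, int, int]:
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a code like '3.40.50.300', got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    return c, a, t, h


def class_name(code: str) -> str:
    return CLASS_NAMES.get(parse_cath(code)[0], "Other")


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Most specific CATH level two domains share, or None if their classes differ."""
    shared = None
    for level, x, y in zip(CATH_LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break  # lower-level numbers only mean something under the same parent
        shared = level
    return shared
```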
Protein-Protein Interaction Databases

• Protein-protein interaction databases are
collections of information on the interactions
between proteins.
• These databases provide valuable information on
the relationships between different proteins and
their functions in biological systems.
BIND

• BIND (Biomolecular Interaction Network Database)
is a database that stores detailed descriptions of
interactions, molecular complexes, and pathways
between various biomolecules, including proteins,
nucleic acids, and small molecules.
• The database is designed to be used for data
mining and can be used to study networks of
interactions and map pathways across different
species. The database can also provide information
for kinetic simulations.
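Studying interaction networks and mapping pathways usually starts by loading interaction records as a graph. A minimal sketch, assuming interactions are available as plain (protein, protein) pairs, finds the shortest chain of interactions linking two proteins with breadth-first search. The protein names in any example are placeholders, not BIND records, and extracting such pairs from BIND's own data format is not attempted here.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_network(pairs: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected interaction graph: each protein maps to its set of partners."""
    graph: Graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: fewest interactions linking start to goal."""
    if start not in graph or goal not in graph:
        return None
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            current: Optional[str] = node
            while current is not None:  # walk parent links back to start
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for reproducible output
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None  # the two proteins are in different connected components
```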
DIP

• DIP (Database of Interacting Proteins) is a database
that contains protein-protein interaction
information compiled through both manual
curation and computational methods.
• It is useful for understanding protein functions and
their relationships with other proteins. It can also
be used to study the properties of networks of
interacting proteins, evaluate predictions of
protein-protein interactions, and explore the
evolution of these interactions.
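One simple network property is each protein's degree, the number of distinct partners it has; highly connected proteins (hubs) are often examined first. The sketch below assumes interactions are listed as pairs that may repeat (for example, the same interaction reported by several sources), counts each unordered pair once, and skips self-interactions for simplicity. It illustrates the idea only and is not DIP's own analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def degrees(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Number of distinct interaction partners per protein."""
    seen = set()
    degree: Counter = Counter()
    for a, b in pairs:
        if a == b:
            continue  # self-interactions skipped in this sketch
        edge = frozenset((a, b))  # (A, B) and (B, A) are the same interaction
        if edge in seen:
            continue
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    return degree


def hubs(degree: Counter, k: int) -> List[Tuple[str, int]]:
    """The k most connected proteins, ties broken alphabetically."""
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


def degree_distribution(degree: Counter) -> Dict[int, int]:
    """How many proteins have each degree value."""
    return dict(Counter(degree.values()))
```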
MINT

• MINT (Molecular INTeraction database) is a database that
stores information on functional interactions
between biological molecules such as proteins,
RNA, and DNA.
• It also stores information on enzymatic
modifications of partner molecules.
• The database primarily focuses on experimentally
verified protein-protein interactions and considers
both direct and indirect relationships.
Protein Pattern and Profile Databases

• Protein pattern and profile databases contain
information on motifs found in sequences.
Sequence motifs correspond to structural or
functional features in proteins.
• Protein sequence patterns and profiles are
therefore valuable tools for determining the
function of proteins.
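A profile is often represented as a position-specific scoring matrix (PSSM): one column of scores per motif position, derived from aligned examples of the motif. The sketch below assumes a handful of ungapped, aligned motif occurrences, a uniform background frequency over the 20 standard amino acids, and a pseudocount of 1; real profile databases use more refined weighting and background models. It then scans a sequence for the best-scoring window.

```python
import math
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(motifs: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds score (bits) per position and residue against a uniform background."""
    if not motifs:
        raise ValueError("need at least one motif occurrence")
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("motif occurrences must be aligned and equally long")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        column = [m[pos] for m in motifs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: List[Dict[str, float]], sequence: str) -> Optional[Tuple[int, float]]:
    """Return (0-based start, score) of the highest-scoring window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # residues outside the 20 standard amino acids score 0 (neutral)
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```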
InterPro

• InterPro is a database that contains information on
protein families, domains, and functional sites.
• It was created by combining several major protein
signature databases, including PROSITE, Pfam,
PRINTS, ProDom, and SMART, into a single
comprehensive resource.
PROSITE

• PROSITE is a collection of signatures that identify
patterns or profiles in proteins, which can provide
information on their biological functions.
• The signatures in the database are linked to
annotation documents that provide information on
the protein family or domain detected, including its
name, function, 3D structure, and references.
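PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden ones, `(n)` or `(n,m)` for repeats, and `<` or `>` to anchor the N- or C-terminus. The sketch below translates that syntax into a Python regular expression and scans a sequence with it. It assumes well-formed patterns and does not cover every corner of the official syntax (for example `>` inside brackets); the function names are illustrative, and the example pattern is the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re
from typing import List, Tuple

# One pattern element: a core plus an optional repeat, e.g. "x(2,4)" -> "x", "2", "4"
_ELEMENT_RE = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core.lower() == "x":
            core_re = "."                          # any residue
        elif core.startswith("[") and core.endswith("]"):
            core_re = "[" + core[1:-1] + "]"       # any of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            core_re = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            core_re = core                         # a single fixed residue
        repeat = ""
        if low and high:
            repeat = "{" + low + "," + high + "}"
        elif low:
            repeat = "{" + low + "}"
        regex_parts.append(prefix + core_re + repeat + suffix)
    return "".join(regex_parts)


def find_sites(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
```

With the invented sequence `MKNGSAVNPTA`, `find_sites("N-{P}-[ST]-{P}.", "MKNGSAVNPTA")` reports one site, `NGSA` starting at residue 3; the second `N` is rejected because it is followed by `P`.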
Metabolic Pathway Databases

• Metabolic pathway databases contain information
about enzymes, biochemical reactions, and
metabolic pathways.
ENZYME

• ENZYME is a database that stores information on
enzyme nomenclature.
• It is used as the nomenclature source for enzyme
names and reactions by most metabolic databases
as well as by other biomolecular databases.
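Enzyme nomenclature identifies each enzyme by an EC number with four dot-separated fields, from the broad reaction class down to the specific enzyme; a `-` marks a field that is not specified (for example `3.4.-.-`). The sketch below, assuming only that layout and the seven top-level EC classes, parses an EC number and names its class. The function name is illustrative, and preliminary numbers (those containing `n`) are not handled.

```python
from typing import Optional, Tuple

EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> Tuple[str, Tuple[Optional[int], ...]]:
    """Return (class name, fields), with unspecified fields ("-") as None."""
    fields = ec.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"EC numbers have four fields: {ec!r}")
    numbers = []
    unspecified_seen = False
    for field in fields:
        if field == "-":
            unspecified_seen = True
            numbers.append(None)
        elif field.isdigit() and not unspecified_seen:
            numbers.append(int(field))
        else:
            # non-numeric field, or a number after an unspecified "-" field
            raise ValueError(f"invalid EC field {field!r} in {ec!r}")
    if numbers[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {ec!r}")
    return EC_CLASSES[numbers[0]], tuple(numbers)
```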
KEGG

• KEGG (Kyoto Encyclopedia of Genes and Genomes)
is a comprehensive database that maps out
molecular and cellular pathways involving
interactions between genes and molecules.
• It is composed of pathway maps, molecule tables,
gene tables, and genome maps, and is used to build
functional maps of metabolic and regulatory
pathways.
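A metabolic pathway map can be viewed as a directed graph of compounds linked by enzyme-catalysed reactions. The sketch below hand-codes the first three steps of glycolysis, with the EC numbers of their enzymes, as a toy stand-in for a KEGG map (it does not read KEGG's data files or use its API), and traces which enzymes are needed to get from one compound to another. Reactions are treated as one-way for simplicity, and the names `REACTIONS` and `trace_route` are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Toy, hand-written excerpt (not KEGG data): the first three steps of glycolysis
# as (substrate, product, EC number of the catalysing enzyme)
REACTIONS = [
    ("glucose", "glucose 6-phosphate", "2.7.1.1"),                      # hexokinase
    ("glucose 6-phosphate", "fructose 6-phosphate", "5.3.1.9"),         # glucose-6-phosphate isomerase
    ("fructose 6-phosphate", "fructose 1,6-bisphosphate", "2.7.1.11"),  # 6-phosphofructokinase
]


def trace_route(reactions: List[Tuple[str, str, str]], source: str,
                target: str) -> Optional[List[Tuple[str, str, str]]]:
    """Fewest reaction steps from source to target as (substrate, EC, product)."""
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for substrate, product, ec in reactions:
        outgoing.setdefault(substrate, []).append((product, ec))
    # previous[c] = (compound c was made from, EC number); None marks the source
    previous: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        if compound == target:
            steps = []
            while previous[compound] is not None:
                prior, ec = previous[compound]
                steps.append((prior, ec, compound))
                compound = prior
            return steps[::-1]
        for product, ec in outgoing.get(compound, []):
            if product not in previous:
                previous[product] = (compound, ec)
                queue.append(product)
    return None  # target not reachable with one-way reactions
```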
Applications of protein databases
• Protein databases have numerous applications,
including the following:
• Protein databases can be used in sequence analysis
to identify homologous sequences and predict
protein functions based on sequence similarity.
• Protein databases can also be used for predicting
protein structure by comparing the amino acid
sequence of a protein with known structures in the
database.
• Protein databases also include tools to study
protein-protein interactions.
• Protein pattern and profile databases can be used
for protein family identification by identifying
conserved motifs.
• Protein databases such as metabolic pathway
databases can be used in drug discovery and
disease research by studying the metabolic
pathways involved in diseases.
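Identifying homologous sequences rests on scoring how well two sequences align. As a sketch of that idea, the global alignment (Needleman-Wunsch) dynamic programme below uses deliberately simple scores assumed for illustration (+1 match, -1 mismatch, -2 per gap position). Database search tools such as BLAST instead use substitution matrices like BLOSUM62, affine gap penalties, and fast heuristic local alignment.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment score plus one optimal alignment (gaps shown as '-')."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,               # gap in b
                              score[i][j - 1] + gap)               # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j = n, m
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACDE", "ACE")` returns `(1, "ACDE", "AC-E")`: three matches (+3) and one gap (-2).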
Secondary databases

• Secondary databases comprise data derived from
the results of analysing primary data.
• They are often referred to as curated databases but
this is a bit of a misnomer because primary
databases are also curated to ensure that the data
in them is consistent and accurate.
• Secondary databases often draw upon information
from numerous sources, including other databases
(primary and secondary), controlled vocabularies
and the scientific literature.
• They are highly curated, often using a complex
combination of computational algorithms and
manual analysis and interpretation to derive new
knowledge from the public record of science.
