Zoya Bioinformatics Assignment
A) NCBI Database: -
The National Center for Biotechnology Information (NCBI) is part of the U.S. National Library
of Medicine (NLM) at the National Institutes of Health (NIH). It provides access to a wide range
of biological databases and tools and is responsible for storing, analyzing, and making
biological and biomedical data publicly available. Its most notable databases include GenBank,
PubMed, PubChem, OMIM, and the Taxonomy Database.
I) GenBank: -
II) Taxonomy Database: -
III) Gene Expression Omnibus (GEO): -
The Gene Expression Omnibus (GEO) is a public repository for functional genomics data. It stores
experimental datasets from microarray and high-throughput sequencing studies such as RNA-Seq and
ChIP-Seq. Researchers can submit, retrieve, and analyze gene expression data for their own studies.
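GEO records are identified by accession numbers whose prefix indicates the record type: GSE for a series, GSM for a sample, GPL for a platform, and GDS for a curated DataSet. The following is a minimal sketch of how such accessions could be classified; the function name `classify_geo_accession` is hypothetical and not part of any NCBI tool.

```python
import re

# Accession prefixes used by GEO and the record type each one denotes.
GEO_RECORD_TYPES = {
    "GSE": "series",
    "GSM": "sample",
    "GPL": "platform",
    "GDS": "dataset",
}

_ACCESSION = re.compile(r"^(GSE|GSM|GPL|GDS)(\d+)$")


def classify_geo_accession(accession):
    """Return (record_type, numeric_id) for a GEO accession, or None if malformed."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    prefix, number = match.groups()
    return GEO_RECORD_TYPES[prefix], int(number)
```

For example, `classify_geo_accession("GSE12345")` identifies a series record, while a string that does not follow the prefix-plus-digits pattern yields `None`.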
IV) PubChem: -
V) PubMed: -
PubMed is a biomedical literature database providing access to over 35 million citations from
MEDLINE, life sciences journals, and online books. It is widely used by researchers, doctors, and
students to access scientific publications.
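PubMed can also be searched programmatically through the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL and does not send it; the helper name `build_pubmed_search_url` and the example query are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_results=20):
    """Build an ESearch URL for a PubMed query (no network request is made)."""
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": term,            # the query string, e.g. with AND/OR operators
        "retmax": max_results,   # maximum number of PubMed IDs to return
        "retmode": "json",       # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = build_pubmed_search_url("BRCA1 AND breast cancer", max_results=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching PubMed IDs.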
VI) OMIM (Online Mendelian Inheritance in Man): -
OMIM is a comprehensive database cataloging human genes and genetic disorders, providing
information on disease-causing mutations, inheritance patterns, and clinical descriptions, making
it valuable for medical genetics research.
VII) Virus Database: -
The NCBI Virus Database provides a collection of viral genome sequences, taxonomic
information, and related research data. It covers RNA and DNA viruses, including those
affecting humans, animals, and plants. This database is essential for studying viral evolution,
outbreaks, and vaccine development.
B) EMBL Database: -
The European Molecular Biology Laboratory (EMBL) hosts a key collection of resources for
biological data storage and analysis. These are maintained by its European Bioinformatics
Institute (EMBL-EBI) and cover nucleotide sequences, protein function, chemical compounds, and
structural biology. Some of its important databases include UniProt, AlphaFold, InterPro,
ChEMBL, IntAct, IPD, and Pfam.
I) UniProt: -
UniProt (Universal Protein Resource) is a comprehensive database that provides protein sequence
and functional information. It was formed by combining Swiss-Prot, TrEMBL, and PIR. Its
knowledgebase has two sections: Swiss-Prot, which holds manually reviewed and curated entries,
and TrEMBL, which holds automatically annotated entries awaiting review. UniProt is widely used
for studying protein function, structure, and interactions.
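The source section of a UniProt entry is visible in its FASTA header, where the prefix `sp` marks a Swiss-Prot entry and `tr` a TrEMBL entry. The following is a small sketch of a header parser; the function name `parse_uniprot_header` is hypothetical, and the sample header is included only to illustrate the layout.

```python
import re

# >db|ACCESSION|ENTRY_NAME Protein name OS=... OX=... GN=... PE=... SV=...
_HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
_FIELD = re.compile(r"\b(OS|OX|GN|PE|SV)=(.+?)(?=\s(?:OS|OX|GN|PE|SV)=|$)")


def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header into its main parts."""
    match = _HEADER.match(header.strip())
    if match is None:
        raise ValueError("not a UniProtKB FASTA header: %r" % header)
    db, accession, entry_name, rest = match.groups()
    fields = dict(_FIELD.findall(rest))
    return {
        "reviewed": db == "sp",                      # sp = Swiss-Prot, tr = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": rest.split(" OS=", 1)[0],    # text before the OS= field
        "organism": fields.get("OS"),
        "gene": fields.get("GN"),
    }


example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
           "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
entry = parse_uniprot_header(example)
print(entry["accession"], entry["reviewed"], entry["organism"])
```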
II) AlphaFold Database: -
III) InterPro: -
InterPro is a database that classifies proteins into families and predicts their domains and
functional sites. It integrates information from multiple protein signature databases to provide
insights into protein structure and function. Researchers use InterPro to identify conserved
domains and evolutionary relationships.
IV) ChEMBL: -
V) IntAct Database: -
The IntAct Database is a resource for molecular interaction data, storing experimentally verified
protein-protein, protein-DNA, and protein-RNA interactions. It supports researchers in
understanding biological pathways, disease mechanisms, and protein networks.
VI) IPD Database (Immuno Polymorphism Database): -
The Immuno Polymorphism Database (IPD) is a specialized resource that provides sequence
variation data for genes of the immune system, including human leukocyte antigens (HLA),
killer-cell immunoglobulin-like receptors (KIR), and major histocompatibility complex (MHC)
genes of non-human species. It is essential for immunogenetics and transplantation research.
VII) Pfam Database: -
The Pfam Database is a collection of protein families and domains. Each family is represented by
multiple sequence alignments and a profile hidden Markov model (HMM), which is used to detect
conserved functional regions within proteins. It helps researchers study protein evolution,
domain architectures, and functional annotations across species.
C) PDB (Protein Data Bank): -
The Protein Data Bank (PDB) is a structural biology database that stores 3D atomic-level
structures of biomolecules, including proteins, DNA, RNA, and complex assemblies. It is
maintained by the Worldwide Protein Data Bank (wwPDB) consortium, whose members include RCSB
PDB (USA), PDBe (Europe), and PDBj (Japan). PDB structures are determined experimentally, mainly
by X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy. The database is widely
used for drug discovery, protein function analysis, and structural bioinformatics.
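In the legacy PDB file format, each atom is stored as a fixed-width ATOM or HETATM record in which every field occupies set columns. The sketch below reads the main fields from one such record. The function name `parse_atom_record` is hypothetical, and the sample record uses made-up coordinates that do not come from a real entry.

```python
def parse_atom_record(line):
    """Extract the main fields from one ATOM/HETATM line of a PDB file."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),           # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(),    # columns 13-16: atom name
        "residue_name": line[17:20].strip(), # columns 18-20: residue name
        "chain_id": line[21],                # column 22: chain identifier
        "residue_number": int(line[22:26]),  # columns 23-26: residue sequence number
        "x": float(line[30:38]),             # columns 31-38: x coordinate
        "y": float(line[38:46]),             # columns 39-46: y coordinate
        "z": float(line[46:54]),             # columns 47-54: z coordinate
    }


# Illustrative record assembled field by field so the column layout is visible.
sample = (
    "ATOM  "     # columns 1-6: record type
    "    1"      # columns 7-11: serial
    " "          # column 12: blank
    " N  "       # columns 13-16: atom name
    " "          # column 17: alternate location
    "MET"        # columns 18-20: residue name
    " "          # column 21: blank
    "A"          # column 22: chain
    "   1"       # columns 23-26: residue number
    "    "       # columns 27-30: insertion code and padding
    "  27.340"   # columns 31-38: x
    "  24.430"   # columns 39-46: y
    "   2.614"   # columns 47-54: z
)
atom = parse_atom_record(sample)
print(atom["residue_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

For real work, an established parser such as the one in Biopython is usually a safer choice than hand-written column slicing.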
E) Gene Ontology: -
Gene Ontology (GO) is a structured, controlled vocabulary in bioinformatics that describes gene
and protein functions across species. It offers a consistent terminology for genes and their
products under three aspects: biological process, molecular function, and cellular component.
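GO annotation files label each annotation with a one-letter aspect code: P for biological process, F for molecular function, and C for cellular component. The following is a minimal sketch of grouping annotations by aspect. The function name `group_by_aspect` and the gene label `geneA` are hypothetical, and the annotation list is illustrative rather than taken from a real annotation file.

```python
from collections import defaultdict

# One-letter aspect codes and the GO aspect each one stands for.
ASPECT_NAMES = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}


def group_by_aspect(annotations):
    """Group (gene, go_id, aspect_code) tuples by GO aspect name."""
    grouped = defaultdict(list)
    for gene, go_id, aspect in annotations:
        grouped[ASPECT_NAMES[aspect]].append((gene, go_id))
    return dict(grouped)


annotations = [
    ("geneA", "GO:0004396", "F"),  # hexokinase activity
    ("geneA", "GO:0006096", "P"),  # glycolytic process
    ("geneA", "GO:0005829", "C"),  # cytosol
]
grouped = group_by_aspect(annotations)
for aspect, items in grouped.items():
    print(aspect, items)
```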
D) KEGG (Kyoto Encyclopedia of Genes and Genomes): -
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a bioinformatics resource that
provides information on genes, proteins, metabolic pathways, diseases, and drugs. It integrates
genomic, biochemical, and systems biology data to help researchers study cellular processes, gene
functions, and metabolic interactions. KEGG is widely used in genomics, metabolomics, and
systems biology research.
I) KEGG Pathway: -
II) KEGG Enzyme: -
The KEGG Enzyme Database provides information on enzymes and their roles in biochemical
reactions. It classifies enzymes based on the Enzyme Commission (EC) number and links them to
metabolic pathways, substrates, and reactions. This database is useful for studying enzyme
mechanisms, metabolic engineering, and drug-target interactions.
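An EC number has four levels separated by dots, and the first level gives the top-level enzyme class (for example, EC 2.7.1.1 is hexokinase, a transferase). The sketch below splits an EC number and looks up its class; the function name `parse_ec_number` is hypothetical.

```python
# Top-level Enzyme Commission classes, keyed by the first EC digit.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


def parse_ec_number(ec):
    """Split an EC number into its four levels and name its top-level class."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()          # drop an optional "EC" prefix
    levels = text.split(".")
    if len(levels) != 4 or not levels[0].isdigit():
        raise ValueError("expected four dot-separated levels: %r" % ec)
    top_level = int(levels[0])
    if top_level not in EC_CLASSES:
        raise ValueError("unknown EC class: %r" % ec)
    # Lower levels stay as strings because partial numbers use "-" placeholders.
    return {"levels": levels, "enzyme_class": EC_CLASSES[top_level]}


print(parse_ec_number("EC 2.7.1.1"))
```

Keeping the lower levels as strings lets partially specified numbers such as `3.4.21.-` pass through unchanged.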