Databases of NCBI

This document provides an overview of the nucleotide and protein databases available from the National Center for Biotechnology Information (NCBI). It describes several nucleotide databases, including GenBank, RefSeq and the Sequence Read Archive. It also outlines protein resources, such as the GenPept display format and the Protein Clusters database, which groups nearly identical RefSeq protein sequences. The document aims to introduce students to the various sequence data resources available through NCBI.
Course title: Basic Bioinformatics

Course Code: ZOL-602


Credit hours: 3(2-1)
Databases of NCBI

DR. SAIRA HINA


Contents include:

• Nucleotide Databases
  • BioProject
  • Assembly
  • GenBank
  • PopSet
• Protein Databases
  • GenPept
  • Protein clusters
Nucleotide Databases

• Contain collections of nucleotide sequences from different sources.

• The databases categorized under nucleotide databases are:

BioProject
• BioProject organizes biological data by research project and provides links to the primary data from each project. Projects range from focused genome sequencing efforts to large international collaborations.

Assembly
• The Assembly database collects metadata about genome assemblies that were either submitted to GenBank or are part of the RefSeq database.
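Assembly records report summary statistics for each assembly, such as contig N50 and scaffold N50. The sketch below computes N50 under its usual definition: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the contig lengths are hypothetical illustration values, not NCBI data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    # Add contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches the full size")


# Hypothetical contig lengths (bp) for a toy assembly of 300 bp.
print(n50([100, 80, 50, 40, 30]))  # 80
```

Here 100 + 80 = 180 bp is the first running total that reaches half of 300 bp, so the N50 is 80 bp.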

Genome
• The Genome database collects genomic sequencing projects for a given species and provides links to the corresponding records in BioProject, Assembly, Nucleotide and Protein.
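Links like these can also be followed programmatically through NCBI's E-utilities. This is a minimal sketch, assuming the ELink endpoint (`elink.fcgi`) with its `dbfrom`, `db` and `id` parameters. The helper name and the UID are hypothetical, and the code only builds the request URL rather than contacting NCBI.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, uids: list[str]) -> str:
    """Build an ELink URL asking for records in `db` linked to `uids` in `dbfrom`."""
    if not uids:
        raise ValueError("at least one UID is required")
    # Multiple UIDs are sent as one comma-separated `id` value.
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(uids)}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# Hypothetical BioProject UID; fetching this URL would return XML link sets.
print(elink_url("bioproject", "assembly", ["12345"]))
```

Real requests should follow NCBI's E-utilities usage guidelines, for example by identifying the calling tool and email address and by limiting the request rate.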

RefSeq
• The RefSeq database is a non-redundant set of curated and computationally derived sequences for transcripts, proteins and genomic regions.
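RefSeq accession prefixes reflect this split between curated and computationally derived records. NM_/NP_ records are "known" RefSeq transcripts and proteins, while XM_/XP_ records are "model" sequences produced by NCBI's automated genome annotation. The sketch below encodes a subset of the documented prefixes. The dictionary and function names are our own, and the table is deliberately incomplete.

```python
# A subset of RefSeq accession prefixes: prefix -> (molecule type, record class).
REFSEQ_PREFIXES = {
    "NC": ("genomic", "complete genomic molecule"),
    "NG": ("genomic", "genomic region"),
    "NT": ("genomic", "contig or scaffold"),
    "NW": ("genomic", "contig or scaffold, primarily WGS"),
    "NM": ("mRNA", "known transcript"),
    "NR": ("non-coding RNA", "known transcript"),
    "XM": ("mRNA", "model (computationally predicted)"),
    "XR": ("non-coding RNA", "model (computationally predicted)"),
    "NP": ("protein", "known protein"),
    "XP": ("protein", "model (computationally predicted)"),
    "WP": ("protein", "non-redundant protein"),
}


def classify_refseq(accession: str) -> tuple[str, str]:
    """Look up the molecule type and record class implied by a RefSeq prefix."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a RefSeq accession this table knows: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(classify_refseq("NM_000546.6"))  # ('mRNA', 'known transcript')
```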

GenBank
• GenBank is the primary nucleotide sequence archive at NCBI and is a member of the International Nucleotide Sequence Database Collaboration (INSDC).

• Sequences from GenBank are available from three Entrez databases:
  • Nucleotide
  • EST
  • GSS
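Entrez databases such as Nucleotide can be queried with the E-utilities ESearch endpoint, which returns the UIDs of matching records. This is a minimal sketch, assuming the `esearch.fcgi` endpoint and its `db`, `term` and `retmax` parameters. The helper name and the query are illustrative, and only the `nucleotide` database name is used here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL returning up to `retmax` UIDs that match `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and the square brackets used by Entrez field tags.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("nucleotide", "BRCA1 AND Homo sapiens[Organism]", retmax=5))
```

The UIDs returned by ESearch can then be passed to EFetch (`efetch.fcgi`) to download the records themselves, for example in FASTA or GenBank format.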

PopSet
• A collection of related sequences and alignments derived from population, phylogenetic, mutation and ecosystem studies that have been submitted to GenBank.
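Because PopSet entries are sets of aligned sequences, a typical first step is to locate the alignment columns where the sampled sequences differ. This is a minimal sketch on a hypothetical three-sequence alignment that treats '-' as a gap and ignores it. Real PopSet data would first need to be downloaded and parsed from an alignment format, which is not shown here.

```python
def variable_sites(alignment: list[str]) -> list[int]:
    """Return 0-based column indices where aligned sequences differ (gaps '-' ignored)."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(width):
        # Distinct non-gap bases in this column; more than one means the site varies.
        bases = {seq[col].upper() for seq in alignment} - {"-"}
        if len(bases) > 1:
            sites.append(col)
    return sites


# Hypothetical aligned sequences from three individuals.
print(variable_sites(["ACGTTA", "ACGTCA", "AC-TCG"]))  # [4, 5]
```

Column 2 is not reported because its only difference is a gap, which this sketch deliberately ignores.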

Sequence Read Archive (SRA)
• A repository for raw sequence reads and alignments generated by high-throughput nucleic acid sequencers.
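SRA reads are commonly retrieved as FASTQ text, for example with the SRA Toolkit's `fasterq-dump`. In FASTQ, each read occupies four lines, and each quality character encodes a Phred score Q = -10 log10(P_error). The sketch assumes the common Phred+33 ASCII encoding and single-line sequences. The function name and the example reads are hypothetical.

```python
from typing import Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]


def parse_fastq(text: str, offset: int = 33) -> Iterator[Read]:
    """Yield reads from 4-line FASTQ text, decoding Phred scores (ASCII offset 33 by default)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ text must contain complete 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        # Each quality character maps to a Phred score: ord(char) - offset.
        yield Read(header[1:], seq, [ord(ch) - offset for ch in qual])


for read in parse_fastq("@read1\nACGT\n+\nII5#\n"):
    # 'I' -> 40, '5' -> 20, '#' -> 2 under Phred+33.
    print(read.name, read.qualities, sum(read.qualities) / len(read.qualities))
```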


Clone database (CloneDB)
• A resource for finding descriptions, sources, map positions and distributor information about available clones and libraries.

Probe
• A registry of nucleic acid reagents designed for use in a wide variety of biomedical research applications, including genotyping, SNP discovery, gene expression, gene silencing and gene mapping.


Protein Databases
• Include collections of protein sequences from different sources.
• Entrez Protein (the protein sequence database of NCBI) receives protein sequences from:
  • PIR (Protein Information Resource)
  • PDB (Protein Data Bank)
  • Translations of coding sequences from RefSeq (Reference Sequence)
  • GenBank
• CDD (Conserved Domain Database) is a protein annotation resource consisting of MSA (multiple sequence alignment) models of ancient domains and full-length proteins.
• Entrez is cross-linked with the Entrez Taxonomy database.
  • Contains information on more than 75,000 organisms.
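CDD's models are built from multiple sequence alignments, which NCBI turns into position-specific scoring matrices that are searched with RPS-BLAST. The sketch below shows only the simplest ingredient of such a model: per-column residue frequencies and a consensus sequence, computed on a hypothetical alignment. It omits the sequence weighting, pseudocounts and log-odds scoring that real PSSMs typically involve.

```python
from collections import Counter


def column_profile(msa: list[str]) -> list[dict[str, float]]:
    """Per-column residue frequencies for an MSA; gap characters '-' are excluded."""
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all rows of the MSA must have the same length")
    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def consensus(msa: list[str]) -> str:
    """Most frequent residue per column ('-' for all-gap columns; ties go to the first residue seen)."""
    return "".join(
        max(freqs, key=freqs.get) if freqs else "-" for freqs in column_profile(msa)
    )


# Hypothetical alignment of three short protein fragments.
alignment = ["MKV-L", "MKIAL", "MRVAL"]
print(consensus(alignment))  # MKVAL
```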

GenPept
• The format in which protein sequence records are displayed in Entrez; it is the protein counterpart of the GenBank flat-file format.
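A GenPept record is a plain-text flat file with labelled fields (such as LOCUS, DEFINITION, ACCESSION, VERSION and FEATURES), followed by an ORIGIN section that holds the numbered sequence and a closing `//`. The sketch extracts a few of these fields from a single record. The toy record is made up and heavily abbreviated, and the parser ignores features and multi-line field values.

```python
TOY_RECORD = """\
LOCUS       TOY0001                   12 aa            linear
DEFINITION  hypothetical example protein.
ACCESSION   TOY0001
VERSION     TOY0001.1
ORIGIN
        1 mkvlaagtls rv
//
"""


def parse_genpept(record: str) -> dict[str, str]:
    """Extract ACCESSION, VERSION and the ORIGIN sequence from one flat-file record."""
    fields: dict[str, str] = {}
    chunks: list[str] = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # Sequence lines: a residue number followed by blocks of 10 residues.
            chunks.extend(line.split()[1:])
        elif line.startswith(("ACCESSION", "VERSION")):
            key, _, value = line.partition(" ")
            tokens = value.split()
            if tokens:
                fields[key] = tokens[0]  # keep the primary identifier only
        elif line.startswith("ORIGIN"):
            in_origin = True
    fields["SEQUENCE"] = "".join(chunks).upper()
    return fields


print(parse_genpept(TOY_RECORD))
```

For real records, a maintained parser such as Biopython's `Bio.SeqIO.read(handle, "genbank")` handles GenPept files in full.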

Protein clusters
• The Protein Clusters database contains sets of almost identical RefSeq proteins encoded by complete genomes from prokaryotes and eukaryotic organelles.

Accession numbers
• An accession number is assigned to each nucleotide sequence record.
• It remains permanently associated with that nucleotide sequence.
• It allows easy tracking of different versions of the sequence information among the three INSDC organizations (DDBJ, EMBL-EBI and NCBI).
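INSDC records are cited as ACCESSION.VERSION. The accession stays fixed, and the version number increases each time the sequence itself changes. The sketch below splits such identifiers and keeps the newest version seen for each accession. The regular expression is a simplification that covers common GenBank and RefSeq shapes rather than the full accession specification, and the helper names and identifiers are illustrative.

```python
import re
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str
    version: int


# Simplified pattern: 1-6 capital letters, optional "_" (RefSeq), digits, ".", version.
_ACC_VER = re.compile(r"([A-Z]{1,6}_?\d+)\.(\d+)")


def parse_accession_version(text: str) -> AccessionVersion:
    """Split an identifier such as 'NM_000546.6' into its accession and version."""
    match = _ACC_VER.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected ACCESSION.VERSION, got {text!r}")
    return AccessionVersion(match.group(1), int(match.group(2)))


def latest_versions(identifiers: list[str]) -> dict[str, int]:
    """Map each stable accession to the highest version number seen."""
    latest: dict[str, int] = {}
    for ident in identifiers:
        acc, ver = parse_accession_version(ident)
        latest[acc] = max(ver, latest.get(acc, 0))
    return latest


print(latest_versions(["NM_000546.5", "NM_000546.6", "U12345.1"]))
```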
