Biological Databases For Human Research
RESOURCE REVIEW
CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences,
Beijing 100101, China
Handled by Ge Gao
Abstract The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation.
KEYWORDS Human; Database; Big data; Database category; Curation
completion of the Human Genome Project in 2003 holds significant benefits for many fields from human evolution to personalized healthcare and precision medicine. In this report, we present a collection of biological databases relevant to human research and provide a mini-review by classifying them into different categories.
Database classification
Biological databases are developed for diverse purposes, encompass various types of data at heterogeneous coverage and are curated at different levels with different methods, so that there are accordingly several different criteria applicable to database classification.
Scope of data coverage
According to the scope of data coverage, biological databases can be classified as comprehensive and specialized databases. Comprehensive databases cover different types of data from numerous species and typical examples are GenBank [2], European Molecular Biology Laboratory (EMBL) [3], and DNA Data Bank of Japan (DDBJ) [4]. These three databases were established as the International Nucleotide Sequence Database Collaboration in 1988 to collect and disseminate DNA and RNA sequences. On the other hand, specialized databases contain specific types of data or data from specific organisms. For example, WormBase [5] is for nematode biology and genomics and RiceWiki [6] is for community curation of rice genes.
Level of biocuration
According to the level of data curation, biological databases can roughly fall into primary and secondary or derivative databases. Primary databases contain raw data as archival repositories, such as the NCBI Sequence Read Archive (SRA) [7], whereas secondary or derivative databases contain curated information as added value, e.g., NCBI RefSeq [8].
Method of biocuration
As a consequence of the explosive growth of data, curation increasingly requires collective intelligence for collaborative data integration and annotation. Therefore, biological databases can also be classified as (1) expert-curated databases, e.g., RefSeq [8] and TAIR [9], and (2) community-curated databases, which are curated in a collective and collaborative manner by a number of researchers, e.g., LncRNAWiki [10] and GeneWiki [11].
Type of data managed
According to the types of data managed in different databases, biological databases can roughly fall into the following categories: (1) DNA, (2) RNA, (3) protein, (4) expression, (5) pathway, (6) disease, (7) nomenclature, (8) literature, and (9) standard and ontology.
Human-related databases
Decoding the human genome is of great significance, from a theoretical view, for unveiling human evolutionary history and, from an application view, for exploring personalized medicine against diverse diseases. Considering the heterogeneity in data type, scope and curation, biological databases can be classified into multiple categories under different criteria as presented above, making it easier for people to effectively characterize databases and identify the database(s) of interest. However, some databases become inaccessible over time, are poorly maintained or updated, or are never used [12]. In this study, therefore, we assemble a collection of human-related databases that are widely used and currently accessible via the Internet (Table 1). As database classification based on data type is informative and straightforward, we assign one major category to each database, although one database may correspond to multiple categories. In what follows, we focus on databases categorized in DNA, RNA, protein, expression, pathway and disease, respectively.
DNA databases
A DNA database centers on managing DNA data from many or some specific species. The primary functions of human DNA databases include establishment of the reference genome (e.g., NCBI RefSeq [8]), profiling of human genetic variation (e.g., dbSNP [13]), association of genotype with phenotype (e.g., EGA [14]), and identification of human microbiome metagenomes (e.g., IMG/HMP [15]). A representative example of a DNA database is GenBank [2], a collection of all publicly available DNA sequences (http://www.ncbi.nlm.nih.gov/genbank). Since its inception in 1982, GenBank has grown at an extraordinary pace and, as of December 2014, contains over 184 billion nucleotide bases in more than 179 million sequences (http://www.ncbi.nlm.nih.gov/genbank/statistics).
RNA databases
It is well acknowledged that only a tiny proportion of the human genome is transcribed into mRNAs, whereas the vast majority of the genome is transcribed into “dark matter”, that is, non-coding RNAs (ncRNAs) that do not encode proteins [16], including microRNAs (miRNAs), small nucleolar RNAs (snoRNAs), Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs). Therefore, an increasing number of human RNA databases have been built for deciphering ncRNAs (e.g., GENCODE [17]), in particular lncRNAs that are attracting rising interest (e.g., LncRNAWiki [10]), and characterizing their functions and interactions (e.g., RNAcentral [18]). A representative example of an RNA database is RNAcentral [18]. It provides unified access to the ncRNA sequence data supplied by multiple databases including Rfam [19], lncRNAdb [20], and miRBase [21] (http://rnacentral.org).
Protein databases
The purpose of constructing protein databases includes collection of universal proteins (e.g., UniProt [22]), identification of
Zou D et al / Human-related Biological Databases 57
Table 1 (continued)
Name | Link | Brief description | Refs. | Category#
ModBase | http://salilab.org/modbase | Database of comparative protein structure models | [77] |
mUbiSiDa | http://reprod.njmu.edu.cn/mUbiSiDa | Mammalian Ubiquitination Site Database | [78] |
PANTHER | http://www.pantherdb.org | Protein ANalysis THrough Evolutionary Relationships | [79] |
PDB | http://www.rcsb.org/pdb | Protein Data Bank for 3D structures of biological macromolecules | [25] |
PDBe | http://www.ebi.ac.uk/pdbe | Protein Data Bank in Europe | [80] |
Pfam | http://pfam.xfam.org | Database of conserved protein families and domains | [23] |
PhosSNP | http://phossnp.biocuckoo.org | Genetic polymorphisms that influence protein phosphorylation | [81] |
PIR | http://pir.georgetown.edu | Protein Information Resource | [82] |
PROSITE | http://www.expasy.org/prosite | Database of protein domains, families and functional sites | [83] |
SysPTM | http://lifecenter.sgst.cn/SysPTM | Post-translational modifications | [84] |
TreeFam | http://www.treefam.org | Database of phylogenetic trees of animal species | [24] |
UniPROBE | http://thebrain.bwh.harvard.edu/uniprobe | Universal PBM Resource for Oligonucleotide Binding Evaluation | [85] |
UniProt | http://www.uniprot.org | Universal protein resource | [22] |
UUCD | http://uucd.biocuckoo.org | Ubiquitin and Ubiquitin-like Conjugation Database | [86] |
ArrayExpress | http://www.ebi.ac.uk/arrayexpress | Database of functional genomics experiments | [87] | Expression
BioGPS | http://biogps.org | Portal for querying and organizing gene annotation resources | [88] |
Expression Atlas | http://www.ebi.ac.uk/gxa | Differential and baseline expression | [27] |
Human Protein Atlas | http://www.proteinatlas.org | Tissue-based map of the human proteome | [29] |
MOPED | https://www.proteinspire.org | Multi-Omics Profiling Expression Database | [89] |
NCBI GEO | http://www.ncbi.nlm.nih.gov/geo | Gene Expression Omnibus | [26] |
NRED | http://nred.matticklab.com | Database of lncRNA expression | [90] |
ONCOMINE | https://www.oncomine.org | Cancer microarray database | [91] |
PrimerBank | http://pga.mgh.harvard.edu/primerbank | Public resource for PCR primers | [92] |
PRIDE | http://www.ebi.ac.uk/pride | PRoteomics IDEntifications | [93] |
TiGER | http://bioinfo.wilmer.jhu.edu/tiger | Tissue-specific Gene Expression and Regulation | [28] |
WikiCell | http://www.wikicell.org | Unified resource for Human transcriptomics research | [94] |
CPDB | http://consensuspathdb.org | Database of human interaction networks | [95] | Pathway
HMDB | http://www.hmdb.ca | Human Metabolome Database | [96] |
KEGG PATHWAY | http://www.genome.jp/kegg/pathway.html | KEGG pathway maps | [30] |
MetaCyc | http://metacyc.org | Metabolic pathway database | [97] |
Pathway Commons | http://www.pathwaycommons.org | Pathway commons | [98] |
PID | http://pid.nci.nih.gov | Pathway Interaction Database | [99] |
Reactome | http://www.reactome.org | Curated and peer-reviewed pathway database | [100] |
UniPathway | http://www.grenoble.prabi.fr/obiwarehouse/unipathway | Universal Pathway | [101] |
AlzBase | http://alz.big.ac.cn/alzBase | Database for gene dysregulation in Alzheimer’s disease | [102] | Disease
CADgene | http://www.bioguo.org/CADgene | Coronary Artery Disease gene database | [103] |
COSMIC | http://cancer.sanger.ac.uk | Catalog Of Somatic Mutations In Cancer | [104] |
DiseaseMeth | http://bioinfo.hrbmu.edu.cn/diseasemeth | Human disease methylation database | [105] |
DisGeNET | http://www.disgenet.org/web/DisGeNET/v2.1 | Gene–disease associations | [106] |
GOBO | http://co.bmc.lu.se/gobo | Gene expression-based Outcome for Breast cancer Online | [107] |
GWAS Central | http://www.gwascentral.org | A comprehensive resource for the comparison and interrogation of genome-wide association studies | [108] |
GWASdb | http://jjwanglab.org/gwasdb | Human genetic variants identified by genome-wide association studies | [109] |
HbVar | http://globin.cse.psu.edu/hbvar | Hemoglobin variants and thalassemias | [110] |
HGMD | http://www.hgmd.org | Human Gene Mutation Database | [111] |
ICGC | http://icgc.org | International Cancer Genome Consortium | [33] |
IDbases | http://structure.bmc.lu.se/idbase | Immunodeficiency-causing variations | [112] |
LncRNADisease | http://cmbi.bjmu.edu.cn/lncrnadisease | lncRNA and disease database | [113] |
LOVD | http://www.lovd.nl | Leiden open (source) Variation Database | [114] |
MalaCards | http://www.malacards.org | Human maladies and their annotations | [115] |
MethHC | http://methhc.mbc.nctu.edu.tw | Database of DNA methylation and gene expression in human cancer | [116] |
MethyCancer | http://methycancer.psych.ac.cn | Database of human DNA Methylation and cancer | [117] |
miR2Disease | http://www.miR2Disease.org | Database for miRNA deregulation in human disease | [118] |
MITOMAP | http://www.mitomap.org/MITOMAP | Polymorphisms and mutations in human mitochondrial DNA | [119] |
NHGRI GWAS Catalog | http://www.genome.gov/gwastudies | Curated resource of SNP-trait associations | [120] |
OMIM | http://omim.org | Online Mendelian Inheritance in Man | [121] |
T2D@ZJU | http://tcm.zju.edu.cn/t2d | Connections associated with type 2 diabetes | [122] |
TCGA | http://cancergenome.nih.gov | The Cancer Genome Atlas | [32] |
Universal Mutation Database | http://www.umd.be/ | Locus-specific database | [123] |
ViRBase | http://www.rna-society.org/virbase | Virus–host ncRNA associated interactions | [124] |
GO | http://geneontology.org | Gene ontology | [125] | Standard and ontology
HGNC | http://www.genenames.org | Database of human gene names | [126] |
Europe PMC | http://europepmc.org | Literature database in Europe | [127] | Literature
PubMed | http://www.ncbi.nlm.nih.gov/pubmed | Database of biomedical literature from MEDLINE | [128] |
PubMed Central | http://www.ncbi.nlm.nih.gov/pmc | Free full-text literature archive | [129] |
Note: *This collection, however, by no means pictures the whole range of human-related databases that are currently available. Primary databases
(DDBJ/EMBL/GenBank) are not shown, as they contain raw data. #A database may correspond to multiple categories and only one major
category is shown here.
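As an illustration of how the one-major-category-per-database assignment in Table 1 might be handled programmatically, the following minimal Python sketch stores a few rows of Table 1 as records and groups them by category. The names, reference numbers and categories are copied from Table 1 and the corresponding sections of the text; the record layout (`Entry`) and the helper `group_by_category` are illustrative assumptions and do not correspond to any interface offered by the listed databases.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of Table 1 (hypothetical record layout for illustration)."""
    name: str
    ref: int        # reference number as given in Table 1
    category: str   # the single major category assigned in Table 1


# A small subset of Table 1; only one major category is kept per database.
ENTRIES = [
    Entry("ArrayExpress", 87, "Expression"),
    Entry("NCBI GEO", 26, "Expression"),
    Entry("Human Protein Atlas", 29, "Expression"),
    Entry("KEGG PATHWAY", 30, "Pathway"),
    Entry("Reactome", 100, "Pathway"),
    Entry("TCGA", 32, "Disease"),
    Entry("ICGC", 33, "Disease"),
    Entry("GO", 125, "Standard and ontology"),
    Entry("Europe PMC", 127, "Literature"),
]


def group_by_category(entries):
    """Return a dict mapping each category to a sorted list of database names."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.category].append(entry.name)
    return {category: sorted(names) for category, names in groups.items()}


if __name__ == "__main__":
    for category, names in sorted(group_by_category(ENTRIES).items()):
        print(f"{category}: {', '.join(names)}")
```

Because a database may correspond to multiple categories, a fuller representation would presumably store a list of categories per entry; the single `category` field here simply mirrors the simplification made in Table 1.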
protein families and domains (e.g., Pfam [23]), reconstruction of phylogenetic trees (e.g., TreeFam [24]), and profiling of protein structures (e.g., PDB [25]). A representative example of a protein database is PDB, the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR. Established in 1971, PDB contains 105,465 biological macromolecular structures as of 30 December 2014, of which 27,393 entries belong to human (http://www.rcsb.org/pdb). Another example is the Universal Protein Resource (UniProt). As a collaborative project among EMBL-EBI, the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR), UniProt provides a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information. Currently, UniProt includes three member databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc). In addition, UniProtKB consists of two sections: Swiss-Prot (containing a collection of 547,357 manually annotated and reviewed proteins as of January 2015) and TrEMBL (containing a collection of 89,451,166 unreviewed proteins as of January 2015) (http://www.uniprot.org).
Expression databases
Expression databases can be used for various purposes, including archiving expression data (e.g., GEO [26]), detecting differential and baseline expression (e.g., Expression Atlas [27]), exploring tissue-specific gene expression and regulation (e.g., TiGER [28]), and profiling expression information based on both RNA and protein data (e.g., Human Protein Atlas [29]). A representative case of an expression database is the Human Protein Atlas. As of 30 December 2014, it encompasses expression profiles for a large majority of human protein-coding genes based on both RNA (transcriptome analysis based on 213 tissue and cell line samples) and protein data (proteome analysis based on 24,028 antibodies) (http://www.proteinatlas.org).
Pathway databases
Pathway databases contain biological pathways for metabolic, signaling, and regulatory pathway analysis. A representative example is KEGG PATHWAY [30], a curated biological pathway resource on molecular interaction and reaction networks. As the core of KEGG, KEGG PATHWAY integrates many entities that are stored in KEGG sibling databases, including genes, proteins, RNAs, chemical compounds, and chemical reactions (http://www.genome.jp/kegg/pathway.html).
Disease databases
There are at least 200 forms of cancer in the world, causing 14.6% of all human deaths (http://en.wikipedia.org/wiki/Cancer). Thus, obtaining complete cancer genomes and identifying molecular mutations and abnormal genes can provide new insights for cancer prevention, detection, and
60 Genomics Proteomics Bioinformatics 13 (2015) 55–63
eventually, personalized treatment [31]. Toward this end, there are two well-known cancer projects, viz., The Cancer Genome Atlas (TCGA) [32] and International Cancer Genome Consortium (ICGC) [33]. TCGA, founded in 2006 by the National Cancer Institute and National Human Genome Research Institute at the National Institutes of Health, aims to collect a wide diversity of omics data (including exome, SNP, mRNA, miRNA, and methylation) for more than 20 different types of human cancer (http://cancergenome.nih.gov). Unlike TCGA, ICGC is a voluntary collaborative organization initiated in 2008 and open to all cancer and genomic researchers in the world. It aims to obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes, which are of clinical and societal importance across the globe (http://icgc.org).
Perspectives
Here we summarize a collection of biological databases relevant to human research. This collection, however, by no means pictures the whole range of human-related databases that are currently available. As primary databases store raw data, databases in this collection are mostly derivative databases, which are built from primary databases and contain curated information for different data types, and thus are of great use for studying the human genome. In the era of big data, human-related biological databases continue to grow not only in count but also in volume, posing unprecedented challenges in data storage, processing, exchange, and curation. From this point of view, it is necessary to establish a cloud computing platform to store and process such big data and to facilitate the construction and updating of secondary or derivative databases [34]. As biological databases are physically distributed and heterogeneous in data type and format, it is additionally necessary to build open web APIs to ease data exchange and sharing among different resources [35]. Last but not least is curation, which has become an indispensable part of biological databases, principally because curation adds value through standardization and quality control and accordingly enhances data interoperability and consistency [36]. Taken together, biological databases hold great utility for human research and can be regarded as an indicator of our potential to translate big data into big discovery. Considering the current situation in China when compared to other countries, it is our hope that this report may raise general awareness, although it has improved in recent years, of the significant role of human-related biological databases not only for academic studies but also for clinical applications.
Competing interests
The authors declared that there are no competing interests.
Acknowledgements
This work was supported by the “100-Talent Program” of Chinese Academy of Sciences, the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB13040500), and the National High-tech R&D Program (863 Program; Grant No. 2012AA020409) by the Ministry of Science and Technology of China awarded to ZZ.
References
[1] Fernandez-Suarez XM, Rigden DJ, Galperin MY. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection. Nucleic Acids Res 2014;42:D1–6.
[2] Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res 2014;42:D32–7.
[3] Brooksbank C, Bergman MT, Apweiler R, Birney E, Thornton J. The European Bioinformatics Institute’s data resources 2014. Nucleic Acids Res 2014;42:D18–25.
[4] Kosuge T, Mashima J, Kodama Y, Fujisawa T, Kaminuma E, Ogasawara O, et al. DDBJ progress report: a new submission system for leading to a correct annotation. Nucleic Acids Res 2014;42:D44–9.
[5] Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, et al. WormBase 2014: new views of curated biology. Nucleic Acids Res 2014;42:D789–93.
[6] Zhang Z, Sang J, Ma L, Wu G, Wu H, Huang D, et al. RiceWiki: a wiki-based database for community curation of rice genes. Nucleic Acids Res 2014;42:D1222–8.
[7] Kodama Y, Shumway M, Leinonen R, International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res 2012;40:D54–6.
[8] Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res 2014;42:D756–63.
[9] Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2012;40:D1202–10.
[10] Ma L, Li A, Zou D, Xu X, Xia L, Yu J, et al. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res 2015;43:D187–92.
[11] Good BM, Clarke EL, de Alfaro L, Su AI. The Gene wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2012;40:D1255–61.
[12] Wren JD, Bateman A. Databases, data tombs and dust in the wind. Bioinformatics 2008;24:2127–8.
[13] Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.
[14] Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tarraga A, Cheng Y, et al. The European nucleotide archive. Nucleic Acids Res 2011;39:D28–31.
[15] Markowitz VM, Chen IM, Chu K, Szeto E, Palaniappan K, Pillay M, et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res 2014;42:D568–73.
[16] Ma L, Bajic VB, Zhang Z. On the classification of long non-coding RNAs. RNA Biol 2013;10:925–33.
[17] 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56–65.
[18] The RNAcentral Consortium. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res 2015;43:D123–9.
[19] Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP, et al. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res 2013;41:D226–32.
[20] Quek XC, Thomson DW, Maag JL, Bartonicek N, Signal B, Clark MB, et al. lncRNAdb v2.0: expanding the reference
database for functional long noncoding RNAs. Nucleic Acids Res 2015;43:D168–73.
[21] Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 2014;42:D68–73.
[22] The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res 2011;39:D214–9.
[23] Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, et al. Pfam: clans, web tools and services. Nucleic Acids Res 2006;34:D247–51.
[24] Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res 2006;34:D572–80.
[25] Rose PW, Beran B, Bi C, Bluhm WF, Dimitropoulos D, Goodsell DS, et al. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res 2011;39:D392–401.
[26] Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, et al. NCBI GEO: archive for functional genomics data sets – 10 years on. Nucleic Acids Res 2011;39:D1005–10.
[27] Petryszak R, Burdett T, Fiorelli B, Fonseca NA, Gonzalez-Porta M, Hastings E, et al. Expression Atlas update – a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucleic Acids Res 2014;42:D926–32.
[28] Liu X, Yu X, Zack DJ, Zhu H, Qian J. TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics 2008;9:271.
[29] Ponten F, Schwenk JM, Asplund A, Edqvist PH. The Human Protein Atlas as a proteomic resource for biomarker discovery. J Intern Med 2011;270:428–46.
[30] Okuda S, Yamada T, Hamajima M, Itoh M, Katayama T, Bork P, et al. KEGG Atlas mapping for global analysis of metabolic pathways. Nucleic Acids Res 2008;36:W423–6.
[31] Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature 2009;458:719–24.
[32] Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
[33] International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, et al. International network of cancer genome projects. Nature 2010;464:993–8.
[34] Dai L, Gao X, Guo Y, Xiao J, Zhang Z. Bioinformatics clouds for big data manipulation. Biol Direct 2012;7:43 [discussion].
[35] Zhang Z, Bajic VB, Yu J, Cheung K-H, Townsend JP. Data integration in bioinformatics: current efforts and challenges. In: Mahdavi MA, editor. Bioinformatics – trends and methodologies. Rijeka, Croatia: InTech; 2011. p. 41–56.
[36] Zhang Z, Zhu W, Luo J. Bringing biocuration to China. Genomics Proteomics Bioinformatics 2014;12:153–5.
[37] Gonzalez-Galarza FF, Christmas S, Middleton D, Jones AR. Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Res 2011;39:D913–9.
[38] Luo H, Lin Y, Gao F, Zhang CT, Zhang R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res 2014;42:D574–80.
[39] Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids Res 2014;42:D749–55.
[40] Gilbert DG. EuGenes: a eukaryote genome information system. Nucleic Acids Res 2002;30:145–8.
[41] Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, et al. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res 2003;31:142–6.
[42] Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 2014;42:D142–7.
[43] Kodama Y, Mashima J, Kosuge T, Katayama T, Fujisawa T, Kaminuma E, et al. The DDBJ Japanese Genotype-phenotype archive for genetic and phenotypic human data. Nucleic Acids Res 2015;43:D18–22.
[44] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 2012;40:D109–14.
[45] Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, et al. An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Res 2007;35:D823–8.
[46] Bhattacharya A, Ziebarth JD, Cui Y. PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res 2014;42:D86–91.
[47] Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, et al. The UCSC Genome Browser Database: 2015 update. Nucleic Acids Res 2015;43:D670–81.
[48] Yang JH, Li JH, Jiang S, Zhou H, Qu LH. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res 2013;41:D177–87.
[49] Kiran A, Baranov PV. DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics 2010;26:1772–6.
[50] Paraskevopoulou MD, Georgakilas G, Kostoulas N, Reczko M, Maragkakis M, Dalamagas TM, et al. DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res 2013;41:D239–45.
[51] Takeda J, Suzuki Y, Sakate R, Sato Y, Gojobori T, Imanishi T, et al. H-DBAS: human-transcriptome database for alternative splicing: update 2010. Nucleic Acids Res 2010;38:D86–90.
[52] Busch A, Hertel KJ. HEXEvent: a database of Human EXon splicing Events. Nucleic Acids Res 2013;41:D118–24.
[53] Volders PJ, Helsens K, Wang X, Menten B, Martens L, Gevaert K, et al. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res 2013;41:D246–51.
[54] Jiang Q, Wang J, Wu X, Ma R, Zhang T, Jin S, et al. LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression. Nucleic Acids Res 2015;43:D193–6.
[55] Gong J, Liu W, Zhang J, Miao X, Guo AY. LncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse. Nucleic Acids Res 2015;43:D181–6.
[56] Hsu SD, Tseng YT, Shrestha S, Lin YL, Khaleel A, Chou CH, et al. miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic Acids Res 2014;42:D78–85.
[57] Dweep H, Gretz N, Sticht C. miRWalk database for miRNA-target interactions. Methods Mol Biol 2014;1182:289–305.
[58] Bu D, Yu K, Sun S, Xie C, Skogerbo G, Miao R, et al. NONCODE v3.0: integrative annotation of long noncoding RNAs. Nucleic Acids Res 2012;40:D210–5.
[59] Yuan J, Wu W, Xie C, Zhao G, Zhao Y, Chen R. NPInter v2.0: an updated database of ncRNA interactions. Nucleic Acids Res 2014;42:D104–8.
[60] Ramaswami G, Li JB. RADAR: a rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res 2014;42:D109–13.
[61] Sai Lakshmi S, Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res 2008;36:D173–7.
[62] Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res 2011;39:D301–8.
[63] Coimbatore Narayanan B, Westbrook J, Ghosh S, Petrov AI, Sweeney B, Zirbel CL, et al. The Nucleic Acid Database: new features and capabilities. Nucleic Acids Res 2014;42:D114–22.
[64] Xie J, Zhang M, Zhou T, Hua X, Tang L, Wu W. Sno/scaRNAbase: a curated database for small nucleolar RNAs and cajal body-specific RNAs. Nucleic Acids Res 2007;35:D183–7.
[65] Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2014;42:D92–7.
[66] Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res 2015;43:D153–9.
[67] Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005;120:15–20.
[68] Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, et al. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 2015;43:D376–81.
[69] Liu Z, Wang Y, Gao T, Pan Z, Cheng H, Yang Q, et al. CPLM: a database of protein lysine modifications. Nucleic Acids Res 2014;42:D531–6.
[70] Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002;30:303–5.
[71] Wang Y, Liu Z, Cheng H, Gao T, Pan Z, Yang Q, et al. EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases. Nucleic Acids Res 2014;42:D496–502.
[72] Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human Protein Reference
[82] Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, et al. The Protein Information Resource. Nucleic Acids Res 2003;31:345–7.
[83] Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 2010;38:D161–6.
[84] Li J, Jia J, Li H, Yu J, Sun H, He Y, et al. SysPTM 2.0: an updated systematic resource for post-translational modification. Database (Oxford) 2014;2014:bau025.
[85] Hume MA, Barrera LA, Gisselbrecht SS, Bulyk ML. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions. Nucleic Acids Res 2015;43:D117–22.
[86] Gao T, Liu Z, Wang Y, Cheng H, Yang Q, Guo A, et al. UUCD: a family-based database of ubiquitin and ubiquitin-like conjugation. Nucleic Acids Res 2013;41:D445–51.
[87] Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, Dylag M, et al. ArrayExpress update – an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 2011;39:D1002–4.
[88] Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, et al. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 2009;10:R130.
[89] Montague E, Janko I, Stanberry L, Lee E, Choiniere J, Anderson N, et al. Beyond protein expression, MOPED goes multi-omics. Nucleic Acids Res 2015;43:D1145–51.
[90] Dinger ME, Pang KC, Mercer TR, Crowe ML, Grimmond SM, Mattick JS. NRED: a database of long noncoding RNA expression. Nucleic Acids Res 2009;37:D122–6.
[91] Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, et al. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia 2004;6:1–6.
[92] Wang X, Spandidos A, Wang H, Seed B. PrimerBank: a PCR primer database for quantitative gene expression analysis, 2012 update. Nucleic Acids Res 2012;40:D1144–9.
Database – 2009 update. Nucleic Acids Res 2009;37:D767–72. [93] Jones P, Cote RG, Cho SY, Klie S, Martens L, Quinn AF, et al.
[73] Du Y, Xu N, Lu M, Li T. hUbiquitome: a database of PRIDE: new developments and new datasets. Nucleic Acids Res
experimentally verified ubiquitination cascades in humans. 2008;36:D878–83.
Database (Oxford) 2011;2011:bar055. [94] Zhao D, Wu J, Zhou Y, Gong W, Xiao J, Yu J. WikiCell: a
[74] Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, unified resource platform for human transcriptomics research.
Lopez R, et al. The InterPro protein families database: the Omics 2012;16:357–62.
classification resource after 15 years. Nucleic Acids Res [95] Kamburov A, Stelzl U, Lehrach H, Herwig R. The
2015;43:D213–21. ConsensusPathDB interaction database: 2013 update. Nucleic
[75] Rawlings ND, Waller M, Barrett AJ, Bateman A. MEROPS: the Acids Res 2013;41:D793–800.
database of proteolytic enzymes, their substrates and inhibitors. [96] Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y,
Nucleic Acids Res 2014;42:D503–9. et al. HMDB 3.0 – The Human Metabolome Database in 2013.
[76] Ceol A, Chatr Aryamontri A, Licata L, Peluso D, Briganti L, Nucleic Acids Res 2013;41:D801–7.
Perfetto L, et al. MINT, the molecular interaction database: [97] Caspi R, Altman T, Billington R, Dreher K, Foerster H, Fulcher
2009 update. Nucleic Acids Res 2009;2010(38):D532–9. CA, et al. The MetaCyc database of metabolic pathways and
[77] Pieper U, Webb BM, Dong GQ, Schneidman-Duhovny D, Fan enzymes and the BioCyc collection of Pathway/Genome
H, Kim SJ, et al. ModBase, a database of annotated compara- Databases. Nucleic Acids Res 2014;42:D459–71.
tive protein structure models and associated resources. Nucleic [98] Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O,
Acids Res 2014;42:D336–46. Anwar N, et al. Pathway commons, a web resource for
[78] Chen T, Zhou T, He B, Yu H, Guo X, Song X, et al. biological pathway data. Nucleic Acids Res 2011;39:D685–90.
MUbiSiDa: a comprehensive database for protein ubiquitination [99] Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay
sites in mammals. PLoS One 2014;9:e85744. T, et al. PID: the pathway interaction database. Nucleic Acids
[79] Mi H, Guo N, Kejariwal A, Thomas PD. PANTHER version 6: Res 2009;37:D674–9.
protein sequence and function evolution data with expanded [100] Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, et al.
representation of biological pathways. Nucleic Acids Res The Reactome pathway knowledgebase. Nucleic Acids Res
2007;35:D247–52. 2014;42:D472–7.
[80] Gutmanas A, Alhroub Y, Battle GM, Berrisford JM, Bochet E, [101] Morgat A, Coissac E, Coudert E, Axelsen KB, Keller G, Bairoch
Conroy MJ, et al. PDBe: Protein Data Bank in Europe. Nucleic A, et al. UniPathway: a resource for the exploration and
Acids Res 2014;42:D285–91. annotation of metabolic pathways. Nucleic Acids Res
[81] Ren J, Jiang C, Gao X, Liu Z, Yuan Z, Jin C, et al. PhosSNP for 2012;40:D761–9.
systematic analysis of genetic polymorphisms that influence [102] Bai Z, Han G, Xie B, Wang J, Song F, Peng X, et al. AlzBase: an
protein phosphorylation. Mol Cell Proteomics 2010;9:623–34. Integrative Database for gene dysregulation in Alzheimer’s
Zou D et al / Human-related Biological Databases 63
disease. Mol Neurobiol 2014. http://dx.doi.org/10.1007/s12035- diseases and their annotation. Database (Oxford)
014-9011-3. 2013;2013:bat018.
[103] Liu H, Liu W, Liao Y, Cheng L, Liu Q, Ren X, et al. CADgene: [116] Huang WY, Hsu SD, Huang HY, Sun YM, Chou CH, Weng SL,
a comprehensive database for coronary artery disease genes. et al. MethHC: a database of DNA methylation and gene
Nucleic Acids Res 2011;39:D991–6. expression in human cancer. Nucleic Acids Res
[104] Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, 2015;43:D856–61.
Boutselakis H, et al. COSMIC: exploring the world’s knowledge [117] He X, Chang S, Zhang J, Zhao Q, Xiang H, Kusonmano K,
of somatic mutations in human cancer. Nucleic Acids Res et al. MethyCancer: the database of human DNA methylation
2015;43:D805–11. and cancer. Nucleic Acids Res 2008;36:D836–41.
[105] Lv J, Liu H, Su J, Wu X, Liu H, Li B, et al. DiseaseMeth: a [118] Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al.
human disease methylation database. Nucleic Acids Res MiR2Disease: a manually curated database for microRNA
2012;40:D1030–5. deregulation in human disease. Nucleic Acids Res
[106] Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: 2009;37:D98–D104.
a cytoscape plugin to visualize, integrate, search and analyze gene- [119] Brandon MC, Lott MT, Nguyen KC, Spolim S, Navathe SB,
disease networks. Bioinformatics 2010;26:2924–6. Baldi P, et al. MITOMAP: a human mitochondrial genome
[107] Ringner M, Fredlund E, Hakkinen J, Borg A, Staaf J. GOBO: database – 2004 update. Nucleic Acids Res 2005;33:D611–3.
gene expression-based outcome for breast cancer online. PLoS [120] Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins
One 2011;6:e17911. H, et al. The NHGRI GWAS catalog, a curated resource of
[108] Beck T, Hastings RK, Gollapudi S, Free RC, Brookes AJ. SNP-trait associations. Nucleic Acids Res 2013;42:D1001–6.
GWAS Central: a comprehensive resource for the comparison [121] Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh
and interrogation of genome-wide association studies. Eur J A. OMIM.org: online mendelian inheritance in man
Hum Genet 2014;22:949–52. (OMIM(R)), an online catalog of human genes and genetic
[109] Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, et al. disorders. Nucleic Acids Res 2015;43:D789–98.
GWASdb: a database for human genetic variants identified by [122] Yang Z, Yang J, Liu W, Wu L, Xing L, Wang Y, et al.
genome-wide association studies. Nucleic Acids Res T2D@ZJU: a knowledgebase integrating heterogeneous connec-
2012;40:D1047–54. tions associated with type 2 diabetes mellitus. Database (Oxford)
[110] Giardine B, Borg J, Viennas E, Pavlidis C, Moradkhani K, Joly 2013;2013:bat052.
P, et al. Updates of the HbVar database of human hemoglobin [123] Beroud C, Collod-Beroud G, Boileau C, Soussi T, Junien C.
variants and thalassemia mutations. Nucleic Acids Res UMD (Universal mutation database): a generic software to build
2014;42:D1063–9. and analyze locus-specific databases. Hum Mutat 2000;15:86–94.
[111] Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper [124] Li Y, Wang C, Miao Z, Bi X, Wu D, Jin N, et al. ViRBase: a
DN. The Human Gene Mutation Database (HGMD) and its resource for virus-host ncRNA-associated interactions. Nucleic
exploitation in the fields of personalized genomics and molecular Acids Res 2015;43:D578–82.
evolution. Curr Protoc Bioinformatics 2012 [Chapter 1, [125] The Gene Ontology Consortium. Gene Ontology Consortium:
Unit1.13]. going forward. Nucleic Acids Res 2015;43:D1049–56.
[112] Piirila H, Valiaho J, Vihinen M. Immunodeficiency mutation [126] Gray KA, Yates B, Seal RL, Wright MW, Bruford EA.
databases (IDbases). Hum Mutat 2006;27:1200–8. Genenames.org: the HGNC resources in 2015. Nucleic Acids
[113] Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, et al. Res 2015;43:D1079–85.
LncRNADisease: a database for long-non-coding RNA-associ- [127] The Europe PMC Consortium. Europe PMC: a full-text litera-
ated diseases. Nucleic Acids Res 2013;41:D983–6. ture database for the life sciences and platform for innovation.
[114] Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, Nucleic Acids Res 2015;43:D1042–8.
den Dunnen JT. LOVD v. 2.0: the next generation in gene [128] Lu Z. PubMed and beyond: a survey of web tools for searching
variant databases. Hum Mutat 2011;32:557–63. biomedical literature. Database (Oxford) 2011;2011:baq036.
[115] Rappaport N, Nativ N, Stelzer G, Twik M, Guan-Golan Y, [129] Sequeira E, McEntyre J, Lipman D. PubMed central decentral-
Stein TI, et al. MalaCards: an integrated compendium for ized. Nature 2001;410:740.