
2a.BioinfoServerDatabase (Proteomics)

This document provides information about bioinformatics servers and databases. It lists popular bioinformatics tools and websites that host biological sequence data and analysis services, including ExPASy, UniProt, NCBI, EBI, DDBJ, GenomeNet, EMBNet, and PDB. It also lists various human genome centers and their directors. Finally, it discusses common file formats for representing and storing biological sequence data.


Bioinformatics

Servers and Databases


Trinh Hong Thai
Dept. of Biology, College of Science, VNU
Bioinformatics Application Tools
• Downloadable software
  - Freeware
  - Shareware (payware)
• On-line application
  - Free access
  - Membership (payable)
Bioinformatics Application Tools
• User friendly
• Good looking
• Good quality
• Area of interest
Bioinformatics servers and databases
• ExPASy: http://www.expasy.org/
• UniProt database: http://www.uniprot.org/
• NCBI: http://www.ncbi.nlm.nih.gov/
• EBI: http://www.ebi.ac.uk/
• DDBJ: http://www.ddbj.nig.ac.jp/
• GenomeNet: http://www.genome.ad.jp/
• EMBNet: http://www.ch.embnet.org/
• Protein Data Bank (PDB): http://www.rcsb.org/pdb/index.html
• DBTSS (Database of Transcriptional Start Sites): http://dbtss.hgc.jp/index.html
Human Genome Centers and Contacts
(Each entry lists the center, its URL, the director, and the informatics contact.)

Baylor College of Medicine Human Genome Center
http://www.hgsc.bcm.tmc.edu/
Director: Richard Gibbs. Informatics contact: Andy Arenson

Columbia University Genome Center
http://genome1.ccc.columbia.edu/%7Egenome/
Director: Arg Efstratiadis. Informatics contact: Eric Schon

Cooperative Human Linkage Center (CHLC)
http://lpg.nci.nih.gov/CHLC/
Director: Jeff Murray. Informatics contact: Ken Buetow

Eleanor Roosevelt Institute (U. Colorado)
http://eri.uchsc.edu/chromosome21
Director: David Patterson. Informatics contact: Guido Vacano

Fondation Jean Dausset - CEPH
http://www.cephb.fr/
Director: Howard Cann. Informatics contact: Mourad Sahbatou

Genethon
http://www.genethon.fr/genethon_en.html
Director: Jean Weissenbach. Informatics contact: Cecile Fizames

Lawrence Berkeley Laboratory Human Genome Center (LBL)
http://www-hgc.lbl.gov/GenomeHome.html
Director: Michael Palazzolo. Informatics contact: Nomi Harris

Lawrence Livermore National Laboratory Biology and Biotechnology Research Program (LLNL)
http://www-bio.llnl.gov/bbrp/genome/genome.html
Director: Tony Carrano. Informatics contact: Tom Slezak

Los Alamos National Laboratory Center for Human Genome Studies (LANL)
http://www-ls.lanl.gov/index.html
Director: Larry L. Deaven. Informatics contact: Robert Sutherland

Marshfield Medical Research Foundation
http://www.marshmed.org/genetics/
Director: James L. Weber. Informatics contact: Chengfeng Zhao

Resource for Molecular Cytogenetics (UCSF/LBL)
http://ioerror.ucsf.edu:8080/%7Edfdavy/rmchome.html
Director: Joe Gray. Informatics contact: Manfred Zorn

Sanger Centre
http://www.sanger.ac.uk/
Director: John Sulston. Informatics contact: Peter Rice

Stanford DNA Sequence and Technology Center
http://genome-www.stanford.edu/SDSATC/
Director: Ron Davis. Informatics contact: Mike Cherry

Stanford Human Genome Center
http://shgc-www.stanford.edu/
Directors: Richard Myers, David Cox. Informatics contact: Kate McKusick

TIGR - The Institute for Genomic Research
http://www.tigr.org/
Director: Claire M. Fraser. Informatics contact: Tony Kerlavage

University of Michigan Human Genome Center
http://seqcore.brcf.med.umich.edu/
Director: Miriam Meisler. Informatics contact: Spencer Thomas

University of Pennsylvania
http://www.cbil.upenn.edu/
Director: Beverly Emanuel. Informatics contact: Chris Overton

University of Texas Health Science Center at San Antonio Genome Center
http://apollo.uthscsa.edu/
Director: Sue Naylor. Informatics contact: Vladimir Pekkel

University of Texas Southwestern Medical Center
http://www.swmed.edu/
Director: Glen Evans. Informatics contact: Chris Davies

University of Utah
http://www.genetics.utah.edu/
Director: Ray Gesteland. Informatics contact: Peter Cartwright

University of Washington Genome Center
http://www.genome.washington.edu/uwgc/
Director: Maynard Olson. Informatics contact: Phil Green

Washington University Center for Genetics in Medicine
http://www.ibc.wustl.edu/cgm/cgm.html
Director: David States. Informatics contact: David States

Washington University Genome Sequencing Center
http://genome.wustl.edu/gsc/gschmpg.html
Director: Bob Waterston. Informatics contact: LaDeana Hillier

Weizmann Institute
http://bioinformatics.weizmann.ac.il/wis_genome_project.html
Director: Doron Lancet. Informatics contact: Jaime Prilusky

Wellcome Trust Centre for Human Genetics
http://www.well.ox.ac.uk/
Directors: John Bell, Mark Lathrop. Informatics contact: Geoff Barton

Whitehead Institute Center for Genome Research (at MIT)
http://www-genome.wi.mit.edu/
Director: Eric Lander. Informatics contact: Lincoln Stein

Albert Einstein Genome Center
http://sequence.aecom.yu.edu/chr12/
Director: Raju Kucherlapati. Informatics contact: Perry Miller
Bioinformatics servers and databases
• BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
• SWISS-MODEL: http://www.expasy.org/swissmod/SWISS-MODEL.html
• Pedro's BioMolecular Research Tools: http://www.biophys.uni-duesseldorf.de/BioNet/Pedro/research_tools.html
Multiple sequence alignment
Program ClustalX
Phylogenetic trees
TreeView
Representing and Retrieving Sequences
Definition

• A sequence is an ordered, linear string of characters (sequence elements) representing nucleotides or amino acids
  - DNA is composed of four nucleotides or "bases": A, C, G, T
  - RNA is also composed of four: A, C, G, U (T is transcribed as U)
  - Proteins are composed of 20 amino acids
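The definition above can be illustrated with a small sketch. The names DNA_ALPHABET, RNA_ALPHABET, PROTEIN_ALPHABET, is_valid and transcribe are made up for this example and do not come from any particular library.

```python
# Alphabets as described in the definition (illustrative names, not a library API).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def is_valid(sequence, alphabet):
    """Return True if every character of the sequence belongs to the alphabet."""
    return set(sequence.upper()) <= alphabet


def transcribe(dna):
    """Write a DNA sequence as RNA by replacing every T with U."""
    return dna.upper().replace("T", "U")


print(is_valid("ATGTGC", DNA_ALPHABET))
print(transcribe("ATGTGC"))
```

Treating a sequence as a plain string keeps such checks to one line each, which is the main appeal of the character representation discussed next.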
Representation of Sequences

• Characters
  - Simplest
  - Easy to read, edit, etc.
• Bit-coding
  - More compact, both on disk and in memory
  - Comparisons more efficient
  - More to come on this
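The slides do not spell out a particular bit-coding scheme, so the following is only a minimal sketch of one common choice: 2 bits per unambiguous DNA base. The code table and the function names pack and unpack are assumptions made for illustration.

```python
# Assumed 2-bit code for the four unambiguous DNA bases.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(sequence):
    """Pack an unambiguous DNA string into one integer, 2 bits per base."""
    value = 0
    for base in sequence.upper():
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # read the last base
        value >>= 2                        # then drop it
    return "".join(reversed(bases))


packed = pack("ACGT")
print(packed, unpack(packed, 4))
```

The length has to be stored alongside the packed value because leading A bases are encoded as zero bits. Two packed sequences of equal length can then be compared with a single integer comparison, which is where the efficiency gain comes from.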


Character representation of sequences
• DNA or RNA
  - use 1-letter codes (e.g., A, C, G, T)
• Protein
  - use 1-letter codes
• Can convert to/from 3-letter codes (e.g., A = Ala = Alanine, C = Cys = Cysteine)
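As a sketch of the conversion mentioned above, the standard 1-letter and 3-letter amino acid codes can be held in a lookup table. The function names and the "-" separator are choices made for this example only.

```python
# Standard 1-letter to 3-letter codes for the 20 amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(protein, sep="-"):
    """Convert a 1-letter protein string to separated 3-letter codes."""
    return sep.join(ONE_TO_THREE[aa] for aa in protein.upper())


def to_one_letter(three_letter, sep="-"):
    """Convert separated 3-letter codes back to a 1-letter protein string."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter.split(sep))


print(to_three_letter("AC"))
print(to_one_letter("Ala-Cys"))
```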
Representing uncertainty in nucleotide sequences
• We often need to represent uncertainty in a nucleotide sequence, i.e., that more than one base is “possible” at a given position
  - to express ambiguity during sequencing
  - to express variation at a position in a gene during evolution
  - to express the ability of an enzyme to tolerate more than one base at a given position of a recognition site
Representing uncertainty in
nucleotide sequences
• To do this for nucleotides, we use a set
of single character codes that represent
all possible combinations of bases
• This set was proposed and adopted by
the International Union of Biochemistry
and is referred to as the I.U.B. code
The I.U.B. Code
• A, C, G, T, U
• R = A, G (puRine)
• Y = C, T (pYrimidine)
• S = G, C (Strong hydrogen bonds)
• W = A, T (Weak hydrogen bonds)
• M = A, C (aMino group)
• K = G, T (Keto group)
• B = C, G, T (not A)
• D = A, G, T (not C)
• H = A, C, T (not G)
• V = A, C, G (not T/U)
• N = A, C, G, T/U (iNdeterminate); X or - is sometimes used instead
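One way to put the code table above to work is to expand an ambiguous recognition site into a regular expression and scan a DNA sequence with it. This is a sketch under stated assumptions: the site GTYRAC and the test sequence are chosen only for illustration, U is treated as T, and the function names are made up here.

```python
import re

# The I.U.B. codes from the table above, each mapped to the bases it allows.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(site):
    """Translate an I.U.B.-coded site into a regular expression over A, C, G, T."""
    parts = []
    for code in site.upper():
        bases = IUB[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(site, dna):
    """Return the 0-based start positions at which the site matches the DNA."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=" + to_regex(site) + ")")
    return [match.start() for match in pattern.finditer(dna.upper())]


print(to_regex("GTYRAC"))
print(find_sites("GTYRAC", "AAGTCGACTTGTTAACGG"))
```

The same table could instead be stored as 4-bit masks (one bit per base), which is the bit-coding alternative for ambiguous nucleotides.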
Representing uncertainty in protein sequences
• Given the size of the amino acid “alphabet”, it is not practical to design a set of codes for ambiguity in protein sequences
• Fortunately, ambiguity is less common in protein sequences than in nucleic acid sequences
• Could use bit-coding as for nucleic acids but rarely done
Single Letter Code (SLC) for amino acids
Sequence file formats
• Two characteristics of file formats
  - text or binary
  - minimal or annotated
• Text files use IUB codes and are readable by a
word processor (e.g., SimpleText, Microsoft
Word) or text editor (e.g., emacs)
• Binary files are usually readable only by the
program that created them (e.g., MacVector)
• Annotated files preserve information known
about the sequence (coding region start/stop,
protein features, literature references, etc.)
Examples of ASCII sequence
file formats
• Fasta
>gi|995614|dbj|D49653|RATOBESE Rat mRNA for obese.
CCAAGAAGAAGAAGACCCCAGCGAGGAAAATGTGCTGGAGACCCC
TGTGCCGGTTCCTGTGGCTTTGGTCCTGTCCTATGTTCAAGCTGTG
CCTATCCACAAAGTCCAGGATGACACCAAAACCCTCATCAAGACCAT
TGTCACCAGGATCAATGACATTTCACACACGCAGTCGGTATCCGCC
AGGCAGAGGGTCACCGGTTTGGACTTCATTCCCGGGCTTCACCCCA
TTCTGAGTTTGTCCAAGATGGACCAGACCCTGGCAGTCTATCAACA
GATCCTCACCAGCTTGCCTTCCCAAAACGTGCTGCAGATAGCTCAT
GACCTGGAGAACCTGCGAGACCTCCTCCATCTGCTGGCCTTCTCCA
AGAGCTGCTCCCTGCCGCAGACCCGTGGCCTGCAGAAGCCAGAGA
GCCTGGATGGCGTCCTGGAAGCCTCGCTCTACTCCACAGAGGTGG
TGGCTCTGAGCAGGCTGCAGGGCTCTCTGCAGGACATTCTTCAACA
GTTGGACCTTAGCCCTGAATGCTGAGGTTTC
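A FASTA record like the one above is simple enough to read with a few lines of code. The following is a minimal sketch, not a full parser: it assumes every record starts with a ">" header line, and the function names parse_fasta and split_ncbi_header are made up for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are simply concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_ncbi_header(header):
    """Split a '|'-delimited NCBI-style header into identifier fields and description."""
    identifier, _, description = header.partition(" ")
    return identifier.split("|"), description


example = ">gi|995614|dbj|D49653|RATOBESE Rat mRNA for obese.\nCCAAG\nAAGAA\n"
for header, sequence in parse_fasta(example):
    print(split_ncbi_header(header), len(sequence))
```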
Examples of ASCII sequence
file formats
• GCG
LOCUS RATOBESE.G 539 BP SS-RNA ENTERED 09/23/95
DEFINITION Rat mRNA for obese.
ACCESSION -
KEYWORDS -
SOURCE Rattus norvegicus; Norway rat
ORGANISM Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; Vertebrata;
Sarcopterygii; Mammalia; Eutheria; Rodentia; Sciurognathi;
Myomorpha; Muridae; Murinae; Rattus
REFERENCE [1]
AUTHORS Murakami, T. & Shima, K.
TITLE Cloning of rat obese cDNA and its expression in obese rats.
JOURNAL Biochem. Biophys. Res. Commun., 209, 3, 944-952, (1995)
COMMENT Database Reference:
DDBJ RATOBESE
Accession: D49653
------------
Submitted (10-Mar-1995) to DDBJ by:
Takashi Murakami
Department of Laboratory Medicine
School of Medicine
University of Tokushima
Kuramotocho 3-chome
Tokushima 770
Japan
Phone: +81-886-33-7184
Fax: +81-886-31-9495
[continued]
Examples of ASCII sequence
file formats
– GCG [continued]
FEATURES From To/Span Description
pept 30 533 obese
???? 1 539 source; /organism=Rattus norvegicus;
/strain=OLETF, LETO and Zucker;
/dev_stage=differentiated; /sequenced_mol=cDNA
to mRNA; /tissue_type=adipose
BASE COUNT 121 A 167 C 133 G 118 T 0 OTHER
ORIGIN ?
RATOBESE.G Length: 539 Jan 30, 1996 - 05:32 PM Check: 5797 ..
1 CCAAGAAGAA GAAGACCCCA GCGAGGAAAA TGTGCTGGAG ACCCCTGTGC CGGTTCCTGT
61 GGCTTTGGTC CTATCTGTCC TATGTTCAAG CTGTGCCTAT CCACAAAGTC CAGGATGACA
121 CCAAAACCCT CATCAAGACC ATTGTCACCA GGATCAATGA CATTTCACAC ACGCAGTCGG
181 TATCCGCCAG GCAGAGGGTC ACCGGTTTGG ACTTCATTCC CGGGCTTCAC CCCATTCTGA
241 GTTTGTCCAA GATGGACCAG ACCCTGGCAG TCTATCAACA GATCCTCACC AGCTTGCCTT
301 CCCAAAACGT GCTGCAGATA GCTCATGACC TGGAGAACCT GCGAGACCTC CTCCATCTGC
361 TGGCCTTCTC CAAGAGCTGC TCCCTGCCGC AGACCCGTGG CCTGCAGAAG CCAGAGAGCC
421 TGGATGGCGT CCTGGAAGCC TCGCTCTACT CCACAGAGGT GGTGGCTCTG AGCAGGCTGC
481 AGGGCTCTCT GCAGGACATT CTTCAACAGT TGGACCTTAG CCCTGAATGC TGAGGTTTC
//
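In the annotated example above, the sequence itself sits between the line ending in ".." and the "//" terminator, written in numbered rows of 10-base groups. The sketch below extracts that sequence and recomputes the base count; it assumes exactly this layout, and the function names are made up for illustration.

```python
from collections import Counter


def read_sequence_block(text):
    """Collect the sequence between the line ending in '..' and the '//' terminator."""
    bases = []
    in_sequence = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "//":
            break
        if in_sequence:
            # Keep letters only: this drops the position number and the group spacing.
            bases.append("".join(ch for ch in stripped if ch.isalpha()))
        elif stripped.endswith(".."):
            in_sequence = True
    return "".join(bases).upper()


def base_count(sequence):
    """Count A, C, G and T, as in the BASE COUNT line of the annotation."""
    counts = Counter(sequence)
    return {base: counts.get(base, 0) for base in "ACGT"}


example = "DEMO.G Length: 20 Check: 0 ..\n1 CCAAGAAGAA GAAGACCCCA\n//\n"
sequence = read_sequence_block(example)
print(len(sequence), base_count(sequence))
```

Comparing the recomputed length and base count with the Length and BASE COUNT fields of the record is a quick consistency check on a file.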
Entrez
Entrez Databases
http://www.ncbi.nlm.nih.gov/
• PubMed: the biomedical literature. The PubMed database contains MEDLINE abstracts as well as links to full-text articles on sites maintained by journal publishers
• PubMed Central: free, full-text journal articles
• Books: online books
• OMIM: Online Mendelian Inheritance in Man
• Nucleotide sequence database (GenBank)
• Protein sequence database
• Genome: complete genome assemblies
• Structure: three-dimensional macromolecular structures
Entrez Databases

• Taxonomy: organisms in GenBank
• SNP: single nucleotide polymorphisms
• PopSet: population study data sets
• And many more…
Entrez essentials

• Semi-automated entry of information into databases
• The links between databases are critical to its usefulness
Entrez literature searching

• Can find papers on a given subject
• Can find papers on a specific gene
• Can find papers related to a given paper
• Can switch between literature and sequence databases
• PubMed has links to publishers’ websites to view the full text of articles
• PubMed Central has free full-text copies
Entrez sequence searching

• Can find sequences for a given gene or protein
• Can download a copy of the sequence
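The slides describe Entrez through its web interface. The same kinds of search, download and cross-database link can also be scripted through NCBI's E-utilities, which are not covered in the slides. The sketch below only builds the request URLs and makes no network call; the helper names are made up here, and the search term and accession are examples.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """URL that searches an Entrez database for a query term."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(db, uid, rettype="fasta"):
    """URL that downloads one record as text (e.g., rettype 'fasta', or 'gp' for GenPept)."""
    query = {"db": db, "id": uid, "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(query)


def elink_url(dbfrom, db, uid):
    """URL that follows the links from a record in one database to another database."""
    return EUTILS + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


print(esearch_url("nucleotide", "BRCA1[Gene Name]"))
print(efetch_url("nucleotide", "D49653"))
```

To actually run a query, the URL would be passed to something like urllib.request.urlopen, keeping to NCBI's usage guidelines on request rates.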
Example Entrez Session
• Goal: Find literature and sequences for cystic fibrosis genes
  - Use OMIM with keyword searching.
  - Switch to the Nucleotide database to see the sequence.
  - Switch to the Protein database to see the sequence.
  - Change to GenPept format to save the sequence.
  - Use Links to find related literature in PubMed.
  - Use Related Articles to find similar articles.
  - Search the Nucleotide database by gene name.
  - Set Limits to narrow down the search.
Example Entrez Session: home of Entrez
Example Entrez Session: search OMIM for ‘BRCA1’
Example Entrez Session: first hit
Example Entrez Session: after clicking Links → Nucleotide
Example Entrez Session: after clicking Links → Protein
Example Entrez Session: Links → PubMed
Block Diagram for Entrez Literature Searching
[Figure: block diagram. Labels recoverable from the slide: Results of Previous Search; Results of Additional Search; Search Criterion; Entrez Search Engine; (List) Item Selection; Item Display; Displayed Item; Desired Output Format]
