Introduction To Bioinformatics

This document provides an introduction to bioinformatics for biomedical engineers. It defines key terms related to DNA, RNA, proteins and their relationships. It describes major genome and DNA sequencing projects. It outlines databases for genomic, protein and other biomolecular data. It introduces online tools for sequence alignment, database searching and primer design. It highlights specific databases like GenBank, OMIM and protein databases like UniProt.


Introduction to Bioinformatics

for Biomedical Engineers

Dr. Nisar A. Shar


DNA             RNA             protein
gene            ORF             secondary (2°)
intron/exon     coding region   tertiary (3°)
chromosome      mRNA            structure
genome          codon           crystal
nuclear                         catalytic site
mitochondrial

• Molecular Biology of the Cell
http://www.ncbi.nlm.nih.gov/books/NBK21054/
• Molecular Cell Biology
http://www.ncbi.nlm.nih.gov/books/NBK21475/
Genome Resource Databases and the WWW
Bioinformatics and databases

Databases are growing exponentially: > 100 gigabases


• Bibliographic (e.g. MedLine)
• Amino Acid Seq (e.g. SWISS-PROT)
• 3D Molecular Structure (e.g. PDB)
• Nucleotide Seq (e.g. GenBank, EMBL)
• Biochemical Pathways (e.g. KEGG, WIT)
• Motif Libraries (e.g. PROSITE, Blocks)
DNA Data Projects
• 1000 Genomes Project: the genomes of 1000 people
• Genome 10K Project: sequence 10,000 vertebrate genomes
• Metagenomics: environmental samples such as skin, digestive tract, soil
• Paleogenetics: ancient DNA, e.g. human history, Neanderthal DNA
RNA is sequenced as cDNA
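
As a rough illustration of what "sequenced as cDNA" means at the sequence level, the sketch below converts an mRNA string into its first-strand cDNA (the reverse complement, written in the DNA alphabet) and into the equivalent coding-strand DNA (U replaced by T). The function names and the toy mRNA are assumptions made for this example, not part of the original slides.

```python
# Complement mapping from RNA bases to DNA bases (A-T, C-G, G-C, U-A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (5'->3') complementary to an mRNA."""
    # Complement every base, then reverse so the result reads 5'->3'.
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def coding_strand_dna(mrna: str) -> str:
    """Return the mRNA sequence written in the DNA alphabet (U -> T)."""
    return mrna.upper().replace("U", "T")


toy_mrna = "AUGGCUAAAUGA"  # hypothetical toy sequence
print(first_strand_cdna(toy_mrna))  # TCATTTAGCCAT
print(coding_strand_dna(toy_mrna))  # ATGGCTAAATGA
```

Database entries for transcripts are usually stored in the coding-strand form, which is why the sequences in the following slides use T rather than U.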
How to find the coding sequence

Example: human clotting Factor IX
• Start codon: ATG
• Stop codons: TAA, TGA, TAG
• Reading the sequence as triplet codons, from the start codon to the first in-frame stop codon, shows the open reading frame (ORF)
• The translated mRNA sequence is written in the single-letter amino acid code
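
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a single forward-strand scan; the function names and the toy sequence are made up for the example (it is not the Factor IX sequence), and real gene finding also has to deal with introns, both strands, and alternative start sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_first_orf(dna):
    """Return the first ATG...stop open reading frame, or None if there is none."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this start codon.
        for i in range(start, len(dna) - 2, 3):
            if CODON_TABLE.get(dna[i:i + 3]) == "*":
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


def translate(orf):
    """Translate an ORF into single-letter amino acid code, dropping the stop."""
    codons = (orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    # Unknown codons (e.g. containing N) are shown as "X".
    return "".join(CODON_TABLE.get(c, "X") for c in codons).rstrip("*")


toy_dna = "CCATGGCTAAATGACC"  # hypothetical toy sequence
orf = find_first_orf(toy_dna)
print(orf)             # ATGGCTAAATGA
print(translate(orf))  # MAK
```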
A single base change in Factor IX can cause hemophilia B

*Note that most single base-pair changes have no effect
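
One reason a single base change within a coding region can be harmless is that the genetic code is redundant: several codons encode the same amino acid. The sketch below classifies a substitution within one codon as synonymous, missense, or nonsense. It assumes the standard genetic code; the function name and the example codons are illustrative only and are not actual Factor IX variants.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution at `position` (0-2) within a codon."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"  # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"    # premature stop codon
    if before == "*":
        return "stop-lost"   # stop codon replaced by an amino acid
    return "missense"        # different amino acid


print(classify_substitution("CTT", 2, "C"))  # synonymous (Leu -> Leu)
print(classify_substitution("GAA", 1, "T"))  # missense   (Glu -> Val)
print(classify_substitution("TGG", 2, "A"))  # nonsense   (Trp -> stop)
```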


Domains (regions of the sequence) with functions in the protein can be identified in Factor IX
Mapping Factor IX onto the chromosomes
Where is the data being generated?

The Wellcome Trust Sanger Institute (charity UK)
http://www.sanger.ac.uk/research/projects/

National Human Genome Research Institute (public USA)
http://www.genome.gov/ResearchAtNHGRI/

J. Craig Venter Institute (charity USA)
http://www.jcvi.org/cms/research/groups/

Fred Sanger
Where is all the DNA sequence data stored?

Ensembl Genome Browser
http://www.ensembl.org/index.html

UCSC Genome Browser
http://genome.ucsc.edu/cgi-bin/hgGateway

NCBI (National Center for Biotechnology Information)
http://www.ncbi.nlm.nih.gov/

MGI (Mouse Genome Informatics)
http://www.informatics.jax.org/
Online tools

NCBI BLAST (Basic Local Alignment Search Tool):
http://blast.ncbi.nlm.nih.gov/Blast.cgi

UCSC BLAT alignment tool:
http://genome.ucsc.edu/cgi-bin/hgBlat

Ensembl BLAST/BLAT alignment tool:
http://www.ensembl.org/Multi/blastview

UniProt:
http://www.uniprot.org/

iHOP:
http://www.ihop-net.org/UniPub/iHOP/

OMIM (Online Mendelian Inheritance in Man):
http://www.ncbi.nlm.nih.gov/omim/

Primer3 primer design:
http://primer3.wi.mit.edu/
GenBank
http://www.ncbi.nlm.nih.gov/genbank/

• GenBank is based at the NCBI in the US, but is synchronised with EMBL and DDBJ
• GenBank is an example of the international science community freely sharing data
• Sharing data is a condition of publication in reputable scientific journals
• GenBank also includes sequences never formally published (e.g. from forensic-DNA labs and ancestry-testing companies like FamilyTree)
• Anyone can submit their sequences, and there are no restrictions on access, much like Wikipedia
• As with Wikipedia, beware of errors (sample mix-ups, incorrect labelling, sequencing errors, etc.)
http://www.ncbi.nlm.nih.gov/gquery/
BRCA1 [Homo sapiens]

GenBank: AAC37594.1

LOCUS       AAC37594                1863 aa            linear   PRI 12-OCT-2005
DEFINITION  BRCA1 [Homo sapiens].
ACCESSION   AAC37594
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (residues 1 to 1863)
  AUTHORS   Smith,T.M., Lee,M.K., Szabo,C.I., Jerome,N., McEuen,M., Taylor,M.,
            Hood,L. and King,M.C.
  TITLE     Complete genomic sequence and analysis of 117 kb of human DNA
            containing the gene BRCA1
  JOURNAL   Genome Res. 6 (11), 1029-1049 (1996)
   PUBMED   8938427
COMMENT     Characterization of an aberrant BRCA1 cDNA clone in the original
            report (Miki, 1994)
                     /germline
     Protein         1..1863
                     /product="BRCA1"
     CDS             1..1863
ORIGIN
        1 mdlsalrvee vqnvinamqk ilecpiclel ikepvstkcd hifckfcmlk llnqkkgpsq
       61 cplcknditk rslqestrfs qlveellkii cafqldtgle yansynfakk ennspehlkd
      121 evsiiqsmgy rnrakrllqs epenpslqet slsvqlsnlg tvrtlrtkqr iqpqktsvyi
      181 elgsdssedt vnkatycsvg dqellqitpq gtrdeislds akkaacefse tdvtntehhq
      241 psnndlntte kraaerhpek yqgssvsnlh vepcgtntha sslqhenssl lltkdrmnve
      301 kaefcnkskq pglarsqhnr wagsketcnd rrtpstekkv dlnadplcer kewnkqklpc
FASTA format
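
FASTA is a plain-text format: a header line beginning with ">" that carries an identifier and an optional description, followed by the sequence wrapped over one or more lines. As a minimal sketch, the code below converts the numbered ORIGIN lines of a GenBank-style record into FASTA. The first two ORIGIN lines of the BRCA1 record are used as input; the function names and the 60-character line width are choices made for this example.

```python
# First two ORIGIN lines of the BRCA1 record (GenBank: AAC37594.1).
ORIGIN_LINES = """\
        1 mdlsalrvee vqnvinamqk ilecpiclel ikepvstkcd hifckfcmlk llnqkkgpsq
       61 cplcknditk rslqestrfs qlveellkii cafqldtgle yansynfakk ennspehlkd
"""


def origin_to_sequence(origin_text):
    """Strip position counters and spaces from GenBank ORIGIN lines."""
    residues = []
    for line in origin_text.splitlines():
        parts = line.split()
        if parts:
            # parts[0] is the position counter; the rest are 10-residue blocks.
            residues.extend(parts[1:])
    return "".join(residues).upper()


def to_fasta(identifier, description, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width sequence lines."""
    header = f">{identifier} {description}"
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *chunks]) + "\n"


sequence = origin_to_sequence(ORIGIN_LINES)
print(to_fasta("AAC37594.1", "BRCA1 [Homo sapiens]", sequence))
```

Because only two ORIGIN lines are included here, the output covers the first 120 residues of the 1863-residue protein.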
OMIM
http://www.ncbi.nlm.nih.gov/omim/
iHOP
http://www.ihop-net.org/UniPub/iHOP/
The EMBL Nucleotide Sequence Database

EMBL-EBI/NCBI-GenBank/DDBJ
Protein databases: SWISS-PROT
UniProt Protein Sequence Database
UniProt is a non-redundant composite of publicly available primary sources:
» UniProtKB/Swiss-Prot, which is manually annotated and reviewed
» UniProtKB/TrEMBL, which is automatically annotated and not reviewed

SWISS-PROT is the highest-priority source; all others are compared against it to eliminate identical and trivially different sequences.

The strict redundancy criteria render UniProt relatively "small" and hence efficient in similarity searches.
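
The idea of a priority-ordered, non-redundant composite can be sketched as follows. This is only an illustration of the principle, not UniProt's actual pipeline: it removes exact duplicates only (detecting "trivially different" sequences would need a similarity measure), and the source names, record identifiers, and sequences are hypothetical.

```python
def merge_non_redundant(sources):
    """Merge sequence sets in priority order, keeping the first copy of each sequence.

    `sources` is a list of (source_name, {record_id: sequence}) pairs,
    highest-priority source first.
    """
    seen = set()
    merged = []
    for source_name, records in sources:
        for record_id, sequence in records.items():
            key = sequence.upper()
            if key in seen:
                continue  # identical sequence already kept from a higher-priority source
            seen.add(key)
            merged.append((source_name, record_id, key))
    return merged


# Hypothetical toy data: the reviewed source is listed first, so it wins ties.
reviewed = {"REV_1": "MDLSALRVEE", "REV_2": "MAKQW"}
unreviewed = {"UNREV_1": "mdlsalrvee", "UNREV_2": "MTTPK"}

for source, record_id, sequence in merge_non_redundant(
    [("reviewed", reviewed), ("unreviewed", unreviewed)]
):
    print(source, record_id, sequence)
```

Here UNREV_1 is dropped because its sequence is identical to REV_1, so three records remain.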
http://www.ebi.ac.uk/uniprot/search/SearchTools.html
