01-Introduction Bioinformatics
Gideon Greenspan
[email protected]
TA: Oleg Rokhlenko
Lecture 1
Introduction to Bioinformatics
• What is Bioinformatics?
• Why do we need it?
• Development timeline
• Journals, books, websites
• How to access bioinformatics tools?
• Why is bioinformatics hard?
• PubMed and OMIM databases
Bioinformatics: What?
• NCBI: “Research, development, or application of
computational tools and approaches for expanding the use
of biological, medical, behavioral or health data, including
those to acquire, store, organize, archive, analyze, or
visualize such data.”
• Lincoln Stein: “Biologists using computers, or the
other way around.”
• Martin Gerstel (Compugen): “Bioinformatics is
a name which will probably disappear with time.”
Bioinformatics: Why?
• Storing large quantities of data
– Sequencing
– Crystallography
– DNA chips
• Enabling fast retrieval
– Database searching
• Data mining and analysis
– Integrate diverse sources
Human Genome Project
• Initiated in 1988, declared ‘complete’ in 2003
• Major goals
– Determine 3×10⁹ base pairs
– Identify ~30,000 genes
• Computational tasks
– Storage and indexing
– Building contigs
– Scanning for genes
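The storage task is easy to put numbers on. As a rough sketch, assuming the sequence contains only the four bases A, C, G and T, each base fits in 2 bits, so 3×10⁹ base pairs need about 750,000,000 bytes instead of about 3,000,000,000 bytes when stored as one text character per base. The `pack`, `unpack` and `packed_size` helpers below are hypothetical illustrations, not a standard file format (real formats such as UCSC's .2bit also record runs of unknown N bases and masked regions).

```python
# Minimal sketch: pack DNA into 2 bits per base (A=00, C=01, G=10, T=11).
# Assumes an uppercase sequence over A, C, G, T only; any other character
# (e.g. N for an unknown base) raises KeyError here.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def packed_size(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to whole bytes."""
    return (n_bases + 3) // 4


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)))
    # One byte per base as plain text vs. 2 bits per base when packed.
    n = 3 * 10**9
    print(f"{n:,} bytes as text, {packed_size(n):,} bytes packed")
```

Because the packed form has no separators, the caller must keep the sequence length (here passed to `unpack`) to know how many bases the last byte really holds.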
Human Genome Progress
Bioinformatics: When?
• Milestones from the 1950s to the 1990s, in approximate order
– Watson and Crick DNA model
– Sanger sequences insulin protein
– ARPANET (early Internet)
– N-W (Needleman-Wunsch) sequence alignment
– PDB (Protein Data Bank)
– Sanger dideoxy DNA sequencing
– FASTA algorithm
– Human Genome Initiative
– BLAST algorithm
– Israel’s INN
– WWW (World Wide Web)
– Europe’s EBI
Source: NCBI
PubMed Growth
[Figure: Articles in Database, by year, 1959 to 2001 (vertical axis 0 to 14,000,000)]
Bioinformatics: Where?
• Journals
• Books
– David W. Mount, Bioinformatics: Sequence and
Genome Analysis
– Cynthia Gibas, Developing Bioinformatics Computer
Skills
– Bryan P. Bergeron, Bioinformatics Computing
• World Wide Web
– USA National Center for Biotechnology
Information: www.ncbi.nlm.nih.gov
– European Bioinformatics Institute:
www.ebi.ac.uk
– ExPASy Molecular Biology Server:
www.expasy.org
– Israeli National Node: inn.org.il
– Open source news: bioinformatics.org
– German directory: bioinformatik.de
Bioinformatics: How?
• Pre-packaged tools
– Majority on the World Wide Web
– Some require downloading
– Most are free to use
• Beginning development
– Mostly Unix environment
– Perl programming language
The Trouble with Nature
• Hard to represent
• Understanding still incomplete
• Some problems insoluble?
The Trouble with Man
• Confusing choice of tools
• Developed independently
• Written by and for nerds
Making it Simpler
PubMed
• MEDLINE publication database
– Over 17,000 journals
– Some other citations
• Papers from the 1960s onward
– Over 12,000,000 entries
• Alerting services
– http://www.pubcrawler.ie/
– http://www.biomail.org/
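Besides the web interface, PubMed can be queried from a program through NCBI's E-utilities. The sketch below only builds the query URL for the `esearch` service, so it runs without network access; the helper name `pubmed_search_url` is hypothetical, while the endpoint and the `db`, `term`, `retmax` and `retmode` parameters are part of E-utilities. Fetching and decoding the response is shown only as a commented-out step.

```python
from urllib.parse import urlencode

# NCBI E-utilities "esearch" endpoint for searching Entrez databases such as PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",     # which Entrez database to search
        "term": term,       # query text, same syntax as the PubMed search box
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # response format (XML is the default)
    }
    # urlencode percent-encodes the values, e.g. spaces become "+".
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    url = pubmed_search_url("hypertension", retmax=5)
    print(url)
    # Running the query needs network access, for example:
    # import json, urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     ids = json.load(resp)["esearchresult"]["idlist"]
```

The IDs returned by `esearch` are PubMed identifiers (PMIDs); full records are retrieved in a second request to a separate E-utilities service, which is left out of this sketch.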
A PubMed Entry
• Journal reference
– Volume, number, date, pages
– Title, authors, affiliation
– Abstract
• Example reference: Cancer 2003 May 1;97(9):2248-53
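The example reference packs journal, date, volume, issue and pages into one compact string, with the page range abbreviated (2248-53 means pages 2248 to 2253). A minimal parser, assuming the layout `<journal> <year> [<month> [<day>]];<volume>[(<issue>)]:<pages>` seen in the example, might look like this; the names `parse_citation`, `expand_pages` and `Citation` are hypothetical, and other variants (seasons instead of months, supplements, online-only articles) are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed layout, based on the example reference:
#   <journal> <year> [<month> [<day>]];<volume>[(<issue>)]:<pages>
CITATION = re.compile(
    r"(?P<journal>.+?) (?P<year>\d{4})"
    r"(?: (?P<date>[A-Z][a-z]{2}(?: \d{1,2})?))?"
    r";(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?"
    r":(?P<pages>[\w-]+)"
)


@dataclass
class Citation:
    journal: str
    year: int
    date: Optional[str]
    volume: str
    issue: Optional[str]
    first_page: str
    last_page: str


def expand_pages(pages: str) -> Tuple[str, str]:
    """Expand an abbreviated range such as '2248-53' to ('2248', '2253')."""
    first, _, last = pages.partition("-")
    if not last:
        return first, first
    if first.isdigit() and last.isdigit() and len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        last = first[: len(first) - len(last)] + last
    return first, last


def parse_citation(text: str) -> Citation:
    m = CITATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised citation: {text!r}")
    first, last = expand_pages(m["pages"])
    return Citation(m["journal"], int(m["year"]), m["date"],
                    m["volume"], m["issue"], first, last)


if __name__ == "__main__":
    print(parse_citation("Cancer 2003 May 1;97(9):2248-53"))
```

The journal name is matched lazily up to the first space followed by a four-digit year, which lets multi-word abbreviations such as "J Mol Biol" pass through intact.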
Searching OMIM
• Search Fields
– Disease name, e.g. hypertension
– Cytogenetic location, e.g. 1p31.6
– Inheritance, e.g. autosomal dominant
• Browsing Interfaces
– Alphabetical by disease
– Genetic map
• Additional features like PubMed
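A cytogenetic location such as 1p31.6 encodes several fields in one string: chromosome 1, the short arm p (q is the long arm), region 3, band 1 and sub-band 6. The sketch below splits such strings apart before they are used as search terms; the pattern, `parse_location` and `Location` are hypothetical simplifications, and ranges (e.g. 1p36-p34) or terms such as pter and cen are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: chromosome (1-22, X or Y), arm (p = short, q = long),
# region digit, then an optional band digit with an optional ".sub-band".
BAND = re.compile(r"(1[0-9]|2[0-2]|[1-9]|X|Y)([pq])(\d)(?:(\d)(?:\.(\d+))?)?")


class Location(NamedTuple):
    chromosome: str
    arm: str
    region: str
    band: Optional[str]
    sub_band: Optional[str]


def parse_location(text: str) -> Location:
    """Split a single cytogenetic band like '1p31.6' into its fields."""
    m = BAND.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported cytogenetic location: {text!r}")
    return Location(*m.groups())


if __name__ == "__main__":
    for text in ("1p31.6", "Xq28", "22q11.2"):
        print(text, parse_location(text))
```

Nesting the sub-band inside the optional band group prevents a string like "1p3.1" from being read as region 3 with a sub-band but no band.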