01-Introduction Bioinformatics

Bioinformatics is a hybrid science that links biological data with techniques for information storage, distribution, and analysis to support multiple areas of scientific research, including biomedicine.

Uploaded by

Shewafera Ademu
Introduction to Bioinformatics

for Medical Research

Gideon Greenspan
TA: Oleg Rokhlenko

Lecture 1
Introduction to Bioinformatics
• What is Bioinformatics?
• Why do we need it?
• Development timeline
• Journals, books, websites
• How to access bioinformatics tools?
• Why is bioinformatics hard?
• PubMed and OMIM databases
Bioinformatics: What?
• NCBI: “Research, development, or application of
computational tools and approaches for expanding the use
of biological, medical, behavioral or health data, including
those to acquire, store, organize, archive, analyze, or
visualize such data.”
• Lincoln Stein: “Biologists using computers, or the
other way around.”
• Martin Gerstel (Compugen): “Bioinformatics is
a name which will probably disappear with time.”

Bioinformatics: Why?
• Storing large quantity of data
– Sequencing
– Crystallography
– DNA chips
• Enabling fast retrieval
– Database searching
• Data mining and analysis
– Integrate diverse sources
Human Genome Project
• Initiated in 1988, declared ‘complete’ in 2003
• Major goals
– Determine 3 × 10^9 base pairs
– Identify ~30,000 genes
• Computational tasks
– Storage and indexing
– Building contigs
– Scanning for genes
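To give a feel for the "storage and indexing" task, here is a rough size estimate for 3 × 10^9 base pairs. The 2-bits-per-base packing is an assumption made for illustration, not something stated in the slides, and the function names are hypothetical.

```python
# Back-of-the-envelope storage estimate for a genome of 3 x 10^9 base pairs.
# Assumption (not from the slides): each base (A, C, G, T) is packed into 2 bits.

GENOME_BASE_PAIRS = 3 * 10**9
BITS_PER_BASE = 2  # four possible bases, so log2(4) = 2 bits each


def packed_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed for a bit-packed sequence, rounded up to a whole byte."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8


def ascii_size_bytes(base_pairs: int) -> int:
    """Bytes needed when each base is stored as one text character."""
    return base_pairs


if __name__ == "__main__":
    print(packed_size_bytes(GENOME_BASE_PAIRS))  # 750000000 (about 750 MB)
    print(ascii_size_bytes(GENOME_BASE_PAIRS))   # 3000000000 (about 3 GB)
```

The raw sequence is therefore modest in size; the harder computational work lies in indexing it, assembling contigs, and scanning for genes.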
Human Genome Progress

Source: EMBL Genome Monitoring Table


IBM’s Blue Gene
• Task: in silico protein folding
• Announced 1999
– Expanded in 2001
• 500,000 times faster than Pentium IV
• Aim: Fold one protein per year
Bioinformatics: When?
Timeline (positions read off the slide's five-year axis, so years are approximate):
• Around 1955
– Watson and Crick DNA model
– Sanger sequences insulin protein
• 1965 to 1970
– N-W sequence alignment
– ARPANET (early Internet)
• 1970 to 1975
– PDB (Protein Data Bank)
– Sanger dideoxy DNA sequencing
• 1980 to 1985
– GenBank database
– PCR (Polymerase Chain Reaction)
• 1985 to 1990
– SWISS-PROT database
– USA’s NCBI
• Around 1990
– FASTA algorithm
– Human Genome Initiative
• 1990 to 1995
– BLAST algorithm
– Israel’s INN
• Around 1995
– WWW (World Wide Web)
– Europe’s EBI
• Around 2000
– Celera Genomics
– First human genome draft
GenBank Growth

Source: NCBI
PubMed Growth
[Figure: articles in the PubMed database (0 to 14,000,000) by year, 1959 to 2001]
Bioinformatics: Where?
• Journals

• Books
– David W. Mount, Bioinformatics: Sequence and Genome Analysis
– Cynthia Gibas, Developing Bioinformatics Computer Skills
– Bryan P. Bergeron, Bioinformatics Computing
• World Wide Web
– USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
– European Bioinformatics Institute: www.ebi.ac.uk
– ExPASy Molecular Biology Server: www.expasy.org
– Israeli National Node: inn.org.il
– Open source news: bioinformatics.org
– German directory: bioinformatik.de
Bioinformatics: How?
• Pre-packaged tools
– Majority on World Wide Web
– Some require downloading
– Most are free to use
• Beginning development
– Mostly Unix environment
– Perl programming language

The Trouble with Nature
• Hard to represent
• Understanding still incomplete
• Some problems insoluble?

The Trouble with Man
• Confusing choice of tools
• Developed independently
• Written by and for nerds

Making it Simpler

PubMed
• MEDLINE publication database
– Over 17,000 journals
– Some other citations
• Papers from 1960s
– Over 12,000,000 entries
• Alerting services
– http://www.pubcrawler.ie/
– http://www.biomail.org/
A PubMed Entry
• Journal reference
– Volume, number, date, pages
– Title, authors, affiliation
– Abstract
• Links
– Related articles
– Full text (sometimes)
– Database entries

Example entry:
Cancer 2003 May 1;97(9):2248-53
Pregnancy and early-stage melanoma.
Daryanani D, Plukker JT, De Hullu JA, Kuiper H, Nap RE, Hoekstra HJ.
Division of Surgical Oncology, University Medical Center, Groningen, The Netherlands.
BACKGROUND: Cutaneous melanomas are aggressive tumors with an unpredictable…
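The journal reference of the example entry (Cancer 2003 May 1;97(9):2248-53) packs journal, date, volume, number, and pages into one line. A minimal sketch of splitting it apart follows. The regular expression and the `Citation` type are hypothetical and only assume the "Journal Date;Volume(Number):Pages" layout seen in this one example; real citations vary more than this.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    journal: str
    date: str
    volume: str
    issue: Optional[str]
    pages: str


# Hypothetical pattern for "Journal Year [Mon [Day]];Volume(Issue):Pages".
CITATION_RE = re.compile(
    r"^(?P<journal>.+?)\s+"
    r"(?P<date>\d{4}(?:\s+[A-Z][a-z]{2}(?:\s+\d{1,2})?)?);"
    r"(?P<volume>\d+)"
    r"(?:\((?P<issue>[^)]+)\))?"
    r":(?P<pages>[\w-]+)$"
)


def parse_citation(text: str) -> Citation:
    """Split a one-line journal reference into its fields."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised citation layout: {text!r}")
    return Citation(
        journal=match.group("journal"),
        date=match.group("date"),
        volume=match.group("volume"),
        issue=match.group("issue"),  # None when the citation has no issue number
        pages=match.group("pages"),
    )


if __name__ == "__main__":
    print(parse_citation("Cancer 2003 May 1;97(9):2248-53"))
```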
Searching PubMed
• Unstructured searches
– Automatic term mapping
• Structured searches
– Field names, e.g. [au], [ta], [dp], [ti]
– Boolean operators, e.g. AND, OR, NOT, ()
• Additional features
– Subsets, limits
– Clipboard, history
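A structured search combines field-tagged terms with Boolean operators. The sketch below builds such a query string and wraps it in a URL for NCBI's E-utilities `esearch` service. The slides do not mention E-utilities, so that endpoint is an assumption added here for illustration, and the helper names (`tagged`, `all_of`, `any_of`, `esearch_url`) are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint (not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, field: str) -> str:
    """Attach a PubMed field tag, e.g. tagged('melanoma', 'ti') -> 'melanoma[ti]'."""
    return f"{term}[{field}]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND, wrapped in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR, wrapped in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not fetch) a search URL for the given query string."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    query = all_of(
        tagged("Hoekstra HJ", "au"),   # author
        tagged("melanoma", "ti"),      # title word
        tagged("2003", "dp"),          # date of publication
    )
    print(query)
    print(esearch_url(query))
```

The same query string can be typed directly into the PubMed search box; the URL form is only needed when searching from a script.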
OMIM
• Online Mendelian Inheritance in Man
– Genes and genetic disorders
– Edited by team at Johns Hopkins
– Updated daily
• Entries
– 10,670 single-locus phenotypes (*)
– 1,294 multi-locus phenotypes (#)
– 2,415 unclassified phenotypes
An OMIM Entry
• Phenotype description
– Clinical features
– Diagnosis and treatment
– Molecular genetics
• Inheritance Model
– Mapping history
– Genetic locus/loci
• References

Example entry:
CYSTIC FIBROSIS; CF
Alternative titles; symbols: MUCOVISCIDOSIS
Gene map locus 7q31.2
DESCRIPTION
Manifestations relate not only to the disruption of exocrine function of the pancreas…
Searching OMIM
• Search Fields
– Disease name, e.g. hypertension
– Cytogenetic location, e.g. 1p31.6
– Inheritance, e.g. autosomal dominant
• Browsing Interfaces
– Alphabetical by disease
– Genetic map
• Additional features similar to PubMed’s
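A cytogenetic location such as 7q31.2 reads as chromosome 7, long arm (q), band 31.2; p denotes the short arm. The sketch below splits such a string into its parts. The pattern and the `CytoLocation` type are hypothetical, cover only simple single-band human locations, and check syntax only, not whether the band actually exists.

```python
import re
from typing import NamedTuple


class CytoLocation(NamedTuple):
    chromosome: str  # "1" to "22", "X" or "Y"
    arm: str         # "p" (short arm) or "q" (long arm)
    band: str        # e.g. "31.2"


# Hypothetical pattern for simple locations such as "7q31.2" or "Xp21".
CYTO_RE = re.compile(
    r"^(?P<chrom>1\d|2[0-2]|[1-9]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<band>\d+(?:\.\d+)?)$"
)


def parse_location(text: str) -> CytoLocation:
    """Split a cytogenetic location into chromosome, arm, and band."""
    match = CYTO_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised cytogenetic location: {text!r}")
    return CytoLocation(match.group("chrom"), match.group("arm"), match.group("band"))


if __name__ == "__main__":
    print(parse_location("7q31.2"))
```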