
Bioinformatics Molecular Biology

This document discusses bioinformatics and its role in analyzing biological data from molecular biology experiments. It defines bioinformatics as using computers to analyze biological data at the molecular level, such as DNA and protein sequences. It describes some of the challenges of molecular biology that require computational analysis, such as assembling DNA sequences, finding genes within genomes, and analyzing protein sequences and structures. It also discusses how the role of biologists is changing as more biological data becomes available for analysis with bioinformatics tools and databases.

Uploaded by ajay_kumar_161
Copyright: © Attribution Non-Commercial (BY-NC)

Bioinformatics and Molecular Biology
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
What is Bioinformatics?
• The use of computers to collect, analyze, and interpret biological information at the molecular level.
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."
• A set of software tools for molecular sequence analysis
Introduction

The Human Genome Project

Challenges of Molecular Biology Computing

The changing role of the Biologist in the Age of Information

Bioinformatics software

Genomics

Impact on medicine
DNA Sequencing

Automated sequencers produce > 40 kb per day

500 bp reads must be assembled into complete genes
- sequencing errors, especially insertions and deletions
- the error rate is highest at the ends of reads, exactly where we want to overlap them
- vector sequences must be removed from the ends

Faster sequencing relies on better software
- overlapping deletions vs. shotgun approaches: TIGR
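The assembly problem can be illustrated with a toy greedy overlap merger: repeatedly join the two reads whose suffix and prefix overlap the most. This is a minimal sketch under loud assumptions: the reads are error-free, vector sequence has already been trimmed, and reads contained inside other reads are not handled. The names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, and a real assembler must tolerate the insertion/deletion errors listed above.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if there is no overlap of at least `min_len` bases.
    Assumption: reads are error-free, so only exact overlaps count.
    """
    start = 0
    while True:
        # Next place where the first min_len bases of b occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Does the remainder of a, from this point, match the start of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest exact overlap.

    Assumption: vector sequence is already trimmed and no read is
    contained in the middle of another read.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGTT"])` returns `["ATGGCGTACCGTT"]`. Greedy merging is easy to follow but can mis-assemble repeated sequence, which is one reason production assemblers use more elaborate methods.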
Raw Genome Data:
The next step is obviously to locate all of the genes and describe their functions. This will probably take another 15-20 years!
Finding Genes in Genome Sequence Is Not Easy
• About 2% of human DNA encodes functional genes.
• Genes are interspersed among long stretches of non-coding DNA.
• Repeats, pseudogenes, and introns confound matters.
Pattern Finding Tools
• It is possible to use DNA sequence patterns to predict genes:
• promoters
• translational start and stop codons (ORFs)
• intron splice sites
• codon bias

• Can also use similarity to known genes/ESTs
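A simple open reading frame (ORF) scan illustrates the start/stop-codon signal from the list. This sketch assumes the standard genetic code (start ATG; stops TAA, TAG, TGA), scans only the forward strand, and defines an ORF as an ATG followed by the first in-frame stop codon. `find_orfs` and its `min_codons` cutoff are illustrative choices, not a published gene finder. Because introns interrupt reading frames in eukaryotic genes, a real gene predictor would combine this signal with splice sites, promoters, and codon bias.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included) and must be at least `min_codons` codons long.
    The reverse strand is not scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk forward codon by codon looking for an in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # continue scanning after this stop codon
                        break
                else:
                    break  # no stop left in this frame, so no further ORFs
            i += 3
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTGGGTAACC", min_codons=5)` returns `[(2, 17)]`, the span covering `ATGAAATTTGGGTAA`.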


Phylogenetic tree construction
RNA Splicing
Primary transcripts in eukaryotes are often "spliced" to remove non-coding regions (introns) from between the coding regions (exons).
The exon regions are spliced together to form the mature mRNA.

[Figure: hnRNA → addition of 5′ cap and poly(A) tail → splicing → mature mRNA (5′ cap ... poly(A) tail)]

Table 28.6
- The consensus sequences at splice sites are conserved throughout eukaryotes.

This conservation is expected because the splice sites are recognized by base pairing with the RNA component of snRNPs.
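The splicing step can be mimicked on a sequence string when the intron coordinates are already known. This is a hypothetical helper, not a splice-site predictor: it assumes 0-based, half-open, non-overlapping intron coordinates and checks only the canonical GU...AG boundary dinucleotides at each intron's ends. The fuller consensus sequences in Table 28.6 and the snRNP base pairing described above are not modeled.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns from a transcript and join the remaining exons.

    Assumption: `introns` holds 0-based, half-open (start, end) coordinates
    that do not overlap. DNA input (T) is converted to RNA (U) first.
    """
    seq = pre_mrna.upper().replace("T", "U")
    exons = []
    prev_end = 0
    for start, end in sorted(introns):
        intron = seq[start:end]
        # Canonical introns begin with GU and end with AG.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks the GU...AG boundaries")
        exons.append(seq[prev_end:start])
        prev_end = end
    exons.append(seq[prev_end:])  # final exon
    return "".join(exons)
```

For example, `splice("AUGCCGUAAGUUUUAGGCAUAA", [(5, 16)])` removes the intron `GUAAGUUUUAG` and returns `"AUGCCGCAUAA"`.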
Similarity Searching the Databanks

What is similar to my sequence?

Searching gets harder as the databases get bigger, and quality degrades

Tools: BLAST and FASTA = time-saving heuristics (approximate)

Statistics + informed judgment of the biologist
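Much of BLAST's speed comes from seeding: it first looks for short exact word matches between the query and the database and only then extends them into alignments. This sketch shows only that seeding step, using a hash index of database words. `build_index` and `find_seeds` are hypothetical names, the default word size of 11 mirrors the classic blastn default, and extension, scoring, and statistics are omitted.

```python
from collections import defaultdict


def build_index(db_seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in the database sequence to its positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(db_seq) - k + 1):
        index[db_seq[i:i + k]].append(i)
    return index


def find_seeds(query: str, db_seq: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where query and database share a k-word.

    Seeding only: a real search would extend each seed and score it.
    """
    index = build_index(db_seq, k)
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty lists for words absent from the database.
        for d in index.get(query[q:q + k], []):
            seeds.append((q, d))
    return seeds
```

For example, `find_seeds("ACGTAC", "TTACGTACGG", k=4)` returns `[(0, 2), (1, 3), (2, 4)]`. All three seeds lie on the same diagonal (db_pos - query_pos = 2), which is the kind of cluster a search tool would try to extend into one alignment.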
>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
Length = 369

 Score = 272 bits (137), Expect = 4e-71
 Identities = 258/297 (86%), Gaps = 1/297 (0%)
 Strand = Plus / Plus

Query: 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
           |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
Sbjct: 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
           |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
            |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
           ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
           || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
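The Expect value in a report like this comes from Karlin-Altschul statistics. The raw score S (137 above) is converted to a bit score S' (272 above) using two constants, λ and K, that depend on the scoring system. The E-value then depends on the effective lengths m of the query and n of the database:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}
```

The third form follows from the first two because 2^{-S'} = K e^{-λS}. Since E grows with the database size n, the same alignment receives a larger (less significant) E-value as the databanks grow.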
Alignment

Alignment is the basis for finding similarity

Pairwise alignment = dynamic programming

Multiple alignment: protein families and functional domains

Multiple alignment is "impossible" for lots of sequences (the cost of exact dynamic programming grows exponentially with the number of sequences)

Another heuristic: progressive pairwise alignment
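Pairwise alignment by dynamic programming can be sketched with the Needleman-Wunsch global alignment recurrence. This minimal version assumes a linear gap penalty and simple match/mismatch scores. The values +1, -1 and -2 are arbitrary illustrations; protein tools typically use a substitution matrix such as BLOSUM62 together with affine gap costs.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Scores are illustrative assumptions, not those of any specific tool.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. Progressive multiple alignment reuses this kind of pairwise step: it aligns the most similar sequences first and adds the rest one at a time, trading guaranteed optimality for speed.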
Sample Multiple Alignment
Structure-Function Relationships

Can we predict the function of protein molecules from their sequence?
sequence > structure > function

Conserved functional domains = motifs

Prediction of some simple 3-D structures (α-helix, β-sheet, membrane-spanning, etc.)
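One of the simplest structure predictions on this list, flagging possible membrane-spanning segments, can be sketched with a Kyte-Doolittle hydropathy scan: average the hydropathy of residues in a sliding window and report stretches that stay strongly hydrophobic. The window of 19 residues and the 1.6 cutoff are commonly quoted values for this kind of transmembrane scan; treat them, and the helper names, as assumptions rather than a validated predictor.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; raises KeyError on non-standard residues."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge windows whose mean exceeds `threshold` into (start, end) spans.

    Assumption: window=19 and threshold=1.6 are rule-of-thumb settings.
    """
    spans: list[tuple[int, int]] = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + window)  # extend current span
            else:
                spans.append((i, i + window))
    return spans
```

For example, a leucine-rich stretch flanked by lysines, `"K" * 10 + "L" * 25 + "K" * 10`, yields the single candidate span `(5, 40)`.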
New Types of Biological Data
• Microarrays - gene expression
• Multi-level maps: genetic, physical,
sequence, annotation
• Networks of protein-protein interactions
• Cross-species relationships
• Homologous genes
• Chromosome organization
II. The Biologist in the Age of Information
The Internet provides a wealth of biological information, which can be overwhelming:
- e-mail
- USENET
- Web

Info skill = finding the information that you need efficiently
The job of the biologist is changing
• As more biological information becomes available …
– The biologist will spend more time using computers
– The biologist will spend more time on data analysis (and less doing lab biochemistry)
– Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)
