Lecture 5: Algorithm Design and Time/space Complexity Analysis

Lecture 5:

Algorithm design and time/space
complexity analysis
Torgeir R. Hvidsten

Professor
Norwegian University of Life Sciences

Guest lecturer
Umeå Plant Science Centre
Computational Life Science Cluster (CLiC)

This lecture
• Basic algorithm design: exhaustive search, greedy
algorithms, dynamic programming and randomized
algorithms
• Correct versus incorrect algorithms
• Time/space complexity analysis
• Go through Lab 3

Algorithm
• Algorithm: a sequence of instructions that one must
perform in order to solve a well-formulated problem
• Correct algorithm: translates every input instance into
the correct output
• Incorrect algorithm: there is at least one input instance
for which the algorithm does not produce the correct
output
• Many successful algorithms in bioinformatics are not
“correct” (optimal)

Search space

Sequence alignment as a search problem
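[Figure: alignment grid with the sequence v = TGCATAC along the rows (i = 0 to 7) and w = ATCTGATC along the columns (j = 0 to 8). A path through the grid corresponds to an alignment; the steps along the path are labeled as matches, insertions and deletions.]

Alignment shown in the figure:
v: -TGCAT-A-C
w: AT-C-TGATC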

Algorithm design (I)
• Exhaustive algorithms (brute force): examine every
possible alternative to find the solution
• Branch-and-bound algorithms: omit searching through
a large number of alternatives by branch-and-bound or
pruning
• Greedy algorithms: find the solution by always
choosing the currently “best” alternative
• Dynamic programming: use the solution of the
subproblems of the original problem to construct the
solution
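
The difference between a greedy and an exhaustive strategy can be sketched on a small example. The coin-change problem and the denominations below are not from the slides; they are a hypothetical illustration, chosen because the greedy choice gives a valid but non-optimal answer.

```python
from itertools import product

def greedy_change(amount, coins):
    """Greedy: always take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used if amount == 0 else None

def exhaustive_change(amount, coins):
    """Brute force: try every combination of coin counts, keep the shortest."""
    best = None
    ranges = [range(amount // c + 1) for c in coins]
    for counts in product(*ranges):
        if sum(n * c for n, c in zip(counts, coins)) == amount:
            used = [c for n, c in zip(counts, coins) for _ in range(n)]
            if best is None or len(used) < len(best):
                best = used
    return best

coins = [25, 20, 10, 5, 1]            # hypothetical denominations
greedy = greedy_change(40, coins)      # three coins: 25 + 10 + 5
optimal = exhaustive_change(40, coins) # two coins: 20 + 20
```

The greedy version is fast but returns three coins where two suffice, while the exhaustive version examines every combination and finds the minimum.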

Algorithm design (II)
• Divide-and-conquer algorithms: split the problem into
subproblems and solve the subproblems independently
• Randomized algorithms: find the solution based on
randomized choices
• Machine learning: induce models based on previously
labeled observations (examples)

Algorithm complexity
• The Big-O notation:
– the running time of an algorithm as a function of the size of
its input
– worst case estimate
– asymptotic behavior
• O(n²) means that the running time of the algorithm on
an input of size n is bounded above by a quadratic function
of n (up to a constant factor, for large enough n)

Big‐O Notation
• A function f(x) is O(g(x)) if there are positive real
constants c and x0 such that f(x) ≤ cg(x) for all values of
x ≥ x0.
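
The definition can be tried out numerically. The sketch below uses a hypothetical function f(x) = 2x² + 3x + 1 (not from the slides) and checks candidate constants c and x0 against g(x) = x². Checking a finite range of integers is only evidence, not a proof.

```python
def f(x):
    # Hypothetical running-time function, chosen for illustration only
    return 2 * x**2 + 3 * x + 1

def g(x):
    return x**2

def bound_holds(f, g, c, x0, x_max=10_000):
    """Check f(x) <= c*g(x) for every integer x in [x0, x_max]."""
    return all(f(x) <= c * g(x) for x in range(x0, x_max + 1))

works = bound_holds(f, g, c=3, x0=4)        # True: 2x² + 3x + 1 <= 3x² once x >= 4
too_early = bound_holds(f, g, c=3, x0=3)    # False: f(3) = 28 > 27 = 3*g(3)
c_too_small = bound_holds(f, g, c=2, x0=1)  # False: f(x) > 2x² for all x >= 1
```

Here c = 3 and x0 = 4 are one valid pair of witnesses, so this f(x) is O(x²). Any larger c or x0 would work as well.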
Time complexity
• Genome assembly: piece together a genome from short reads (~200 bp)
– Aspen: 300M reads
– Spruce: 3000M reads

• Pairwise all-against-all alignment for Aspen takes 3 weeks on 16 processors

• What about spruce?
– Biologist (assuming the time grows linearly with the number of reads): 30 weeks
– Bioinformatician (time complexity O(n²)): 300 weeks

[Figure: time (weeks) versus number of reads (millions), comparing the biologist's estimate for spruce (30 weeks) with the bioinformatician's O(n²) estimate (300 weeks).]
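
The two estimates for spruce follow from scaling the measured aspen time by the ratio of input sizes raised to the assumed exponent. A minimal sketch of that arithmetic, where the function name `predicted_weeks` is an illustrative choice:

```python
def predicted_weeks(base_weeks, base_reads, new_reads, exponent):
    """Scale a measured running time, assuming time grows as n**exponent."""
    return base_weeks * (new_reads / base_reads) ** exponent

# Aspen: 300M reads took 3 weeks; spruce has 3000M reads (10 times more)
linear = predicted_weeks(3, 300, 3000, exponent=1)     # 3 * 10  = 30 weeks
quadratic = predicted_weeks(3, 300, 3000, exponent=2)  # 3 * 100 = 300 weeks
```

All-against-all alignment compares every read with every other read, so ten times more reads means roughly a hundred times more comparisons.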
Sorting algorithm
Sorting problem: Sort a list of n integers a = (a_1, a_2, …, a_n)

SelectionSort(a, n)
    for i ← 1 to n-1
        j ← index of the smallest element among a_i, a_(i+1), …, a_n
        swap elements a_i and a_j
    return a
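
The pseudocode can be sketched in Python as follows. This version uses 0-based indices and sorts a copy of the input, both of which are choices made for the sketch rather than part of the slide.

```python
def selection_sort(a):
    """Return a sorted copy of a using selection sort."""
    a = list(a)  # work on a copy so the caller's list is untouched
    n = len(a)
    for i in range(n - 1):
        # Index of the smallest element among a[i], ..., a[n-1]
        j = min(range(i, n), key=a.__getitem__)
        # Move that element into position i
        a[i], a[j] = a[j], a[i]
    return a

result = selection_sort([7, 92, 87, 1, 4, 3, 2, 6])
```

For the list (7, 92, 87, 1, 4, 3, 2, 6), `result` is the sorted list (1, 2, 3, 4, 6, 7, 87, 92).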

Example run
i = 1: (7,92,87,1,4,3,2,6)
i = 2: (1,92,87,7,4,3,2,6)
i = 3: (1,2,87,7,4,3,92,6)
i = 4: (1,2,3,7,4,87,92,6)
i = 5: (1,2,3,4,7,87,92,6)
i = 6: (1,2,3,4,6,87,92,7)
i = 7: (1,2,3,4,6,7,92,87)
(1,2,3,4,6,7,87,92)

Complexity of SelectionSort
• Makes n – 1 iterations in the for loop
• Analyzes n – i + 1 elements a_i, a_(i+1), …, a_n in iteration i
• Approximate number of operations:
– n + (n-1) + (n-2) + … + 2 + 1 = n(n+1)/2
– plus the swapping: n(n+1)/2 + 3n = 1/2 n² + 7/2 n

• Thus the algorithm is O(n²)
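
The count of examined elements can be checked empirically. The sketch below instruments selection sort with a counter. The reverse-ordered input is an arbitrary choice, since the number of elements scanned does not depend on their values. Because the loop stops after n – 1 iterations, the exact count is n + (n-1) + … + 2 = n(n+1)/2 – 1, one less than the approximation above.

```python
def selection_sort_examined(n):
    """Count how many elements selection sort examines on a list of length n."""
    a = list(range(n, 0, -1))  # hypothetical input: n integers in reverse order
    examined = 0
    for i in range(n - 1):
        j = i
        for k in range(i, n):  # scan a[i..n-1], which is n - i elements (0-based i)
            examined += 1
            if a[k] < a[j]:
                j = k
        a[i], a[j] = a[j], a[i]
    return examined

counts = {n: selection_sort_examined(n) for n in (8, 100, 1000)}
```

Dividing each count by n² gives values that approach 1/2 as n grows, which is the quadratic behavior that O(n²) summarizes.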

Tractable versus intractable problems
• Some problems require polynomial time
– e.g. sorting a list of integers
– called tractable problems
• Some problems require exponential time
– e.g. listing every subset of a list
– called intractable problems
• Some problems lie in between
– e.g. the traveling salesman problem
– called NP-complete problems
– nobody has proved whether a polynomial-time algorithm
exists for these problems
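
Listing every subset shows why exponential time is unavoidable for some problems: a list of n items has 2^n subsets, so the output alone doubles with each added item. A minimal sketch using the standard library:

```python
from itertools import combinations

def all_subsets(items):
    """Yield every subset of items, from the empty set to the full set."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)

small = list(all_subsets(["a", "b", "c"]))  # 2**3 = 8 subsets
# The number of subsets for n = 5, 10, 15 items: 32, 1024, 32768
sizes = [sum(1 for _ in all_subsets(range(n))) for n in (5, 10, 15)]
```

Sorting the same 15 items takes a few hundred operations with an O(n²) algorithm, while listing their subsets already produces 32768 results.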

Traveling salesman problem

Exhaustive search:
Finding regulatory motifs in
DNA sequences

Random sample

atgaccgggatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtaca

tgagtatccctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag

gtcaatcatgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca

Implanting motif AAAAAAAAGGGGGGG

atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa

tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag

gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa
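
How such a data set can be generated is sketched below: the 15-letter motif seen in uppercase in the sequences above (AAAAAAAAGGGGGGG) overwrites one randomly chosen window in each random background sequence. The sequence length of 80, the seed and the function name `implant` are assumptions made for the sketch, not details from the lecture.

```python
import random

def implant(sequences, motif, rng):
    """Overwrite one random window of each sequence with the motif (uppercase)."""
    implanted = []
    for seq in sequences:
        start = rng.randrange(len(seq) - len(motif) + 1)
        implanted.append(seq[:start] + motif.upper() + seq[start + len(motif):])
    return implanted

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical background: 10 random lowercase DNA sequences of length 80
background = ["".join(rng.choice("acgt") for _ in range(80)) for _ in range(10)]
with_motif = implant(background, "AAAAAAAAGGGGGGG", rng)
```

Each sequence keeps its original length, and the uppercase letters mark where the motif was placed.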

Where is the implanted motif?
atgaccgggatactgataaaaaaaagggggggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataaaaaaaaaggggggga

tgagtatccctgggatgacttaaaaaaaagggggggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgaaaaaaaagggggggtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaataaaaaaaagggggggcttatag

gtcaatcatgttcttgtgaatggatttaaaaaaaaggggggggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtaaaaaaaagggggggcaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttaaaaaaaagggggggctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcataaaaaaaagggggggaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttaaaaaaaaggggggga
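
When the motif is implanted without mutations and the pattern is known, locating it is a plain substring search. A minimal sketch, using the start of the first sequence above as example input (the function name `find_exact` is illustrative):

```python
def find_exact(sequences, motif):
    """Return, per sequence, the 0-based start positions of exact motif hits."""
    motif = motif.lower()
    hits = []
    for seq in sequences:
        seq = seq.lower()
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # continue after the last hit
        hits.append(positions)
    return hits

# Start of the first sequence: left flank + implanted motif + right flank
left, right = "atgaccgggatactgat", "ggcgtacacattagat"
first = left + "aaaaaaaaggggggg" + right
hits = find_exact([first], "AAAAAAAAGGGGGGG")
```

The single hit starts right after the 17-letter left flank, at 0-based position 17.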

Implanting motif AAAAAAAAGGGGGGG
with four random mutations

atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa

tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag

gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa
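
The effect of the mutations can be measured with the Hamming distance, the number of positions at which two equal-length strings differ. The sketch below compares the motif with the implanted instances from the first two sequences above; the helper name `hamming` is illustrative.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ (case-insensitive)."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(s.upper(), t.upper()))

motif = "AAAAAAAAGGGGGGG"
first = "AgAAgAAAGGttGGG"   # implanted instance in the first sequence
second = "cAAtAAAAcGGcGGG"  # implanted instance in the second sequence

to_motif = (hamming(motif, first), hamming(motif, second))  # (4, 4)
between = hamming(first, second)                            # 7
```

Each instance is 4 mutations away from the motif, but the two instances differ from each other in 7 positions, so they resemble each other much less than either resembles the motif.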

Where is the motif?

atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga

tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag

gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga
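
With mutations, an exact substring search no longer finds anything. If the pattern were known (in the real problem it is not), an exhaustive scan could still test every window and keep those within a mismatch limit. A minimal sketch under that assumption, using the start of the first sequence above:

```python
def approximate_hits(seq, pattern, max_mismatches):
    """Test every window of seq; return starts within the mismatch limit."""
    seq, pattern = seq.upper(), pattern.upper()
    k = len(pattern)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

# Start of the first sequence: left flank + mutated instance + right flank
left, instance, right = "atgaccgggatactgat", "agaagaaaggttggg", "ggcgtacacattagat"
seq = left + instance + right
exact = approximate_hits(seq, "AAAAAAAAGGGGGGG", 0)    # no exact occurrence
relaxed = approximate_hits(seq, "AAAAAAAAGGGGGGG", 4)  # includes the implanted window
```

The scan costs about n·k character comparisons for a sequence of length n and a pattern of length k. The hard part of motif finding is that the pattern itself has to be searched for as well.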

Why finding the motif is difficult