0% found this document useful (0 votes)
71 views

Graph Theory and Its Applications

1) Graph theory has applications in bioinformatics, such as modeling gene structure and DNA sequencing problems. 2) In the 1950s, Seymour Benzer used graph theory to show that genes are linear by modeling his experiment as an interval graph. 3) DNA sequencing by hybridization can be modeled as a Hamiltonian path or Eulerian path problem in a graph to reconstruct the original DNA sequence from subsequences.

Uploaded by

Avi Shek
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
71 views

Graph Theory and Its Applications

1) Graph theory has applications in bioinformatics, such as modeling gene structure and DNA sequencing problems. 2) In the 1950s, Seymour Benzer used graph theory to show that genes are linear by modeling his experiment as an interval graph. 3) DNA sequencing by hybridization can be modeled as a Hamiltonian path or Eulerian path problem in a graph to reconstruct the original DNA sequence from subsequences.

Uploaded by

Avi Shek
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 71

Graph Theory and

Its Applications
GRAPHS IN BIO-INFORMATICS

What is a Graph?

a b

i d
e
h

f
g
Vertices
GRAPHS IN BIO-INFORMATICS

What is a Graph?

a b

i d
e
h

f
g
Edges
GRAPHS IN BIO-INFORMATICS

What is a Graph?

a b

i d
e
h Vertices

f
g
Edges
GRAPHS IN BIO-INFORMATICS

A Real-life Puzzle
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS

Can you find a


route around
the city that
crosses every
bridge only
once?
GRAPHS IN BIO-INFORMATICS

Let us map the


puzzle to a
graph.
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS

The problem
reduces to
finding an
Eulerian
Circuit in a
graph
GRAPHS IN BIO-INFORMATICS

The Königsberg Bridge Problem is a story of 1736.


Map Coloring
Frequency Assignment Problem
Resource Partitioning
Motivation: Tale of an Ancient King

Treasure Island
Tale of an Ancient King

Treasure Island
Tale of an Ancient King

Treasure Island
Tale of an Ancient King

Treasure Island
Tale of an Ancient King

Treasure Island
Tale of an Ancient King

u1

u2

# r1 = 6
G # r2 = 5
Tale of an Ancient King

u1

u2

# r1 = 6
G # r2 = 5
Applications: Grid Computing

❑ Terminals delegating subtasks are base vertices.


❑ Computing elements are resource vertices
Irrigation Canal NETWORKS
Applications

Canal
Water Network
Pump
Plots in a region to be irrigated
47
Applications

Plots in a region to be irrigated


48
Applications

Find a set of boundaries


that induces

• Connected canals

• Supply water in all the plots


of the region

• Length of the induced canals


is minimum.

Region to be irrigated
Modeling 49
Applications

Find a set of roads


that induces
• Connected road network

• Supply gas in all the regions


of the locality

Gas • Length of the induced road


Pipeline network is minimum.
Gas Field

Roads Network of a Locality


50
Modeling
VLSI LAYOUT
VLSI Layout
VLSI Floorplanning

B
A
E F

G C

Interconnection graph
VLSI Floorplanning

B
B
A A
E F F
E
G C
G C
D
D

Interconnection graph VLSI floorplan


VLSI Floorplanning

B
B
A A
E F F
E
G C
G C
D
D

Interconnection graph VLSI floorplan


VLSI Floorplanning

B
B
A A
E F F
E
G C
G C
D
D

Interconnection graph VLSI floorplan


VLSI Floorplanning

B
B
A A
E F F
E
G C
G C
D
D

Interconnection graph VLSI floorplan

B
A
E F
G C
D

Dual-like graph
VLSI Floorplanning

B
B
A A
E F F
E
G C
G C
D
D

Interconnection graph VLSI floorplan

B B
A A
F E F
E
G G C
C
D D

Dual-like graph Add four corners


VLSI Floorplanning

B
B
A A
E F F
E
G C Rectangular
G C
D drawing
D

Interconnection graph VLSI floorplan

B B
A A
F E F
E
G G C
C
D D

Dual-like graph Add four corners


GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
An Early Application
In the 1950s, Seymour Benzer applied graph theory to
show that genes are linear.
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
An Early Application
Benzer’s problem was equivalent to deciding whether the graph
obtained from his bacteriophage experiment represented an
interval graph.

1 1 2
2
3 3
6
4
5
6 5 4

Linear Structure Interval Graph


GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics

Benzer’s theorem, “genes are linear” played a significant


role in Watson and Crick’s discovery of the double-helix
structure of DNA.
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics

Solve
Problems Solution to
Graph using
in Genomics
Model Graph
Genomics Problems
Algorithms
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Problem: DNA Sequencing by Hybridization (SBH)
Given an array of subsequences,

S = { ATG TGG TGC GTG GGC GCA GCG CGT}


All subsequences of length 3 of a sequence of length 10.
Find the DNA Sequence of length 10 which contains all
the subsequences.
Ordering of the subsequences is not known.
Probable Solutions:
1. ATGCGTGGCA
2. ATGGCGTGCA
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}

A Hamiltonian path from the above graph will lead to a solution.


GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}

Graph

Solution 1:
ATGCGTGGCA

Solution 2:
ATGGCGTGCA
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem

Finding a Hamiltonian Path in a graph is difficult.

The problem is thus addressed by searching for


Eulerian Paths.
GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Modeling SBH as an Eulerian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
GT CG

TG GC
AT CA

GG

An Eulerian path of the above graph will lead to a solution.


GRAPHS IN BIO-INFORMATICS

Graphs in Bioinformatics
Modeling SBH as an Eulerian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
GT CG

TG GC
AT CA

GT CG
GT CG
GG
TG GC
TG GC
AT CA
AT CA

GG
GG
Solution 1: Solution 2:
ATGGCGTGCA ATGCGTGGCA
Pairwise Compatibility Graphs
Pairwise Compatible Graphs: Easy to Construct

h
0.5 0.5
f i
1 1 1 1 3 dmin= 4
k j
e b a
4 3
dmax= 7
d c

A pairwise compatible graph of (T, dmin , dmax )


❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct

5 ∈ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j
e b a
4 3 e/ b/
dmax= 7
d c d/ c/

T
G

A pairwise compatible graph of (T, dmin , dmax )


❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct

3 ∉ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j 3
e b a
4 3 e /
b/
dmax= 7
d c d/ c/

T
G

A pairwise compatible graph of (T, dmin , dmax )


❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct

7 ∈ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j
e b a 7
4 3 e/ b/
dmax= 7
d c d/ c/

T
G

A pairwise compatible graph of (T, dmin , dmax )


❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct

h
0.5 0.5
f i
1 dmin= 4 a/
1 1 How
1 3 hard is the
k j
e
4 3
Reverse
b a Problem? e/ b/
dmax= 7
d c d/
c/

T
G

A pairwise compatible graph of (T, dmin , dmax )


❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: How to Recognize?

h
f 0.5 0.5
i
1 1 dmin= 4 a/
1 1 3
k j
e b ,d a
2 (T,3 dmin max) ?
e/ b/
dmax= 7
d c d/
c/

T G

A pairwise compatible graph of (T, dmin , dmax )


Given a graph
❑ has G, isthat
n vertices there an edge weighted
correspond tree T of
to the n leaves andT, two
and non-negative integers
dminare
❑ two vertices , dadjacent
max, suchinthat
G ifGand
is aonly
PCGif of (T,tree
their dmin , dmax)?is in [dmin, dmax]
distance
Recognizing PCG by Constructing (T, dmin, dmax)

dmin = 2
dmax = 3.5

Output
Input
dmin = 1.5
dmax = 2

Given a graph G, is there an edge weighted tree T and two non-negative integers
dmin, dmax, such that G is
Another (T, dmin, dOutput
a PCG of possible max)?
Pairwise Compatibility Tree Construction

1
1 1

1 1
G
Pairwise-Compatibility Tree
Complete Graphs of G for dmin=2, dmax=2

1 1
1 1
1
1 1

G
Complete Bipartite Graphs Pairwise-Compatibility Tree
of G for dmin=3, dmax=3
79
Historical Background
[Kearney, Munro, Phillips 2003]
PCGs are used to model evolutionary relationships among a set of organisms.

20

10
20
20
10

Relation Graph Evolutionary Tree


Rigidity Theory

26-Jun-19 Academy Lecture 81


Combinatorial characterization of rigidity
• (J.C.Maxwell, 1870) necessary condition that plane
truss with n joints and m bars is rigid: m≧2n-3

m=13
n=8
2n-3=13

• Maxwell’s rule --- classical and famous theorem


– implies that the rigidity of a truss is determined by its
topology
26-Jun-19 Academy Lecture 83
Theorem (Katoh-Tanigawa 2009,Molecular Theorem)
~
Panel-hinge framework in generic position is rigid in 3D ⇔ G contains 6 edge-
disjoint spanning trees.

Panel-Hinge framework Panel-Hinge graph G

G is rigid panel-hinge graph⇒ (G, q) is rigid for almost all hinge configurations
Srinakharinwirot University, Bangkok
Rigidity analysis for probing protein function:
… some examples Monitor effect of mutations on rigidity

No Drug With drug

HIV Protease Non-mutant Mutant


Jacobs et al: Structure, Function, and Genetics, 2001 Nucleotide binding domain of CFTR
SLIDE – drug docking/screening software (Kuhn, Michigan) Sljoka, TVO, 2011

Simulation of Thermal Protein


Unfolding

Cytochrom c

Hespenheide et al: J. Mol. Graph. & Model., 2002


We can replicate costly and cumbersome experiments using rigidity theory.
Understanding stability of Hyperthermophile and disordered proteins:
Hydrogen-Deuterium Exchange
(HDX) data of Sso Acp -
Acylphosphatase from
hyperthermophile Sulfolobus
solfataricus (Hyperthermophile
protein – solfataricus organism lives
in hot volcanic springs around 80-
90oC)
(Sljoka and Wilson, Physical Biology,
2013)
Rigidity analysis used to
characterize structure and
stability of Tau protein – an
intrinsically flexible protein
responsible in amyloid formation
in Alzheimers, Parkinsons,
dementia and other
neurodegenerative diseases.

Difficult to study experimentally and


with standard computational techniques
(Sljoka, Wilson et al, PLOS ONE 2015)
Social Network Analysis
Can we identify
social
communities?

Nodes: Users
Edges: Friendships
Interesting!
Active users are from the
same community
Critical
Person

You might also like