Lec9 Distances
$$\mathrm{RI} = \frac{\text{MaxSteps} - \text{ObsSteps}}{\text{MaxSteps} - \text{MinSteps}}$$
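A minimal sketch of the retention index over several characters, assuming unordered characters. Under that assumption MinSteps for a character is (number of distinct states − 1) and MaxSteps is (number of taxa − count of the commonest state), its length on a star tree. The observed steps on a candidate tree are entered by hand (hypothetical values); the function names are illustrative.

```python
from collections import Counter


def min_steps(column):
    """Fewest changes any tree can need for an unordered character."""
    return len(set(column)) - 1


def max_steps(column):
    """Changes on a star tree: every taxon not in the commonest state changes."""
    return len(column) - max(Counter(column).values())


def retention_index(columns, observed_steps):
    """RI = (MaxSteps - ObsSteps) / (MaxSteps - MinSteps), summed over characters."""
    mx = sum(max_steps(c) for c in columns)
    mn = sum(min_steps(c) for c in columns)
    obs = sum(observed_steps)
    if mx == mn:
        raise ValueError("RI is undefined when MaxSteps equals MinSteps")
    return (mx - obs) / (mx - mn)


# Two characters for taxa 1-4. Hypothetical observed steps on ((1,2),(3,4)):
# "AAGG" needs 1 step and "AGAG" needs 2.
print(retention_index(["AAGG", "AGAG"], observed_steps=[1, 2]))
```

Here MaxSteps = 4, MinSteps = 2 and ObsSteps = 3, so RI = 0.5.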
Long branch attraction
[Figure: four-taxon tree with two long branches (length 1.0) and three short branches (length 0.01).]
The probability of a parsimony-informative site due to inheritance is very low (roughly 0.0003).
[Figure: the same tree with states A and G at the tips, alongside the true tree and the tree inferred by parsimony for taxa 1–4.]
Inconsistency
• Parsimony-based tree inference is not statistically consistent for some tree shapes. In fact, it can be “positively misleading”:
– the “Felsenstein zone” tree;
– many clocklike trees with short internal branches and long terminal branches (Penny et al., 1989; Huelsenbeck and Lander, 2003).
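$\Pr(\text{site pattern supporting the true tree}) \approx 0.0003$, whereas $\Pr(\text{site pattern grouping the two long-branch taxa}) \approx 0.008$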
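The two probabilities can be reproduced with a short calculation. This is a sketch under assumptions not stated on the slide: the Jukes–Cantor model, branch lengths in expected substitutions per site, and a true tree ((1,2),(3,4)) whose long (1.0) branches lead to taxa 1 and 3, with short (0.01) branches to taxa 2 and 4 and a short internal branch. The site probability sums over the unobserved states at the two internal nodes.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(start, end, t):
    """Jukes-Cantor probability of ending in `end` after a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if start == end else 0.25 - 0.25 * e


def site_prob(pattern, lengths):
    """P(tip states) on ((1,2),(3,4)), summing over internal states x and y.

    pattern = states at taxa 1..4; lengths = (t1, t2, t3, t4, t_internal).
    """
    s1, s2, s3, s4 = pattern
    t1, t2, t3, t4, ti = lengths
    total = 0.0
    for x, y in product(BASES, repeat=2):  # x joins taxa 1 and 2, y joins 3 and 4
        total += (0.25 * jc_prob(x, s1, t1) * jc_prob(x, s2, t2)
                  * jc_prob(x, y, ti)
                  * jc_prob(y, s3, t3) * jc_prob(y, s4, t4))
    return total


# Assumed Felsenstein-zone lengths: long branches lead to taxa 1 and 3.
LENGTHS = (1.0, 0.01, 1.0, 0.01, 0.01)
p_true = site_prob("AAGG", LENGTHS)   # supports the true split 12|34
p_wrong = site_prob("AGAG", LENGTHS)  # supports 13|24, grouping the long branches
print(f"{p_true:.4f} {p_wrong:.4f}")
```

Under these assumptions the misleading pattern is more than twenty times as likely as the one that reflects the true tree.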
Can we find a tree that would predict these observed character divergences?
            Species 1  Species 2  Species 3
Species 2   0.0
Species 3   0.3        0.3
Species 4   0.2        0.2        0.3
[Figure: a candidate tree for Sp. 1–4; the legible branch-length labels are 0.0 and 0.1.]
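[Figure: tree with taxa 1 and 2 (branches a and b) joined by internal branch i to taxa 3 and 4 (branches c and d).]
Parameters (path lengths predicted by the tree):
p12 = a + b
p13 = a + i + c
p14 = a + i + d
p23 = b + i + c
p24 = b + i + d
p34 = c + d
Data (observed distances):
     1     2     3
2    d12
3    d13   d23
4    d14   d24   d34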
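When the distances fit the tree exactly, the five branch lengths can be solved directly from these six equations. For example, adding p12 and p13 and subtracting p23 leaves 2a. A sketch for the tree ((1,2),(3,4)), applied to the example matrix; the function names are illustrative.

```python
import math


def solve_quartet(d):
    """Branch lengths a, b, c, d, i for ((1,2),(3,4)), assuming additive distances."""
    a = (d[1, 2] + d[1, 3] - d[2, 3]) / 2
    b = (d[1, 2] + d[2, 3] - d[1, 3]) / 2
    c = (d[3, 4] + d[1, 3] - d[1, 4]) / 2
    d4 = (d[3, 4] + d[1, 4] - d[1, 3]) / 2  # branch "d" (renamed to avoid the matrix name)
    i = (d[1, 3] + d[2, 4] - d[1, 2] - d[3, 4]) / 2
    return {"a": a, "b": b, "c": c, "d": d4, "i": i}


def path_lengths(br):
    """Predicted distances p_ij implied by the branch lengths."""
    return {
        (1, 2): br["a"] + br["b"],
        (1, 3): br["a"] + br["i"] + br["c"],
        (1, 4): br["a"] + br["i"] + br["d"],
        (2, 3): br["b"] + br["i"] + br["c"],
        (2, 4): br["b"] + br["i"] + br["d"],
        (3, 4): br["c"] + br["d"],
    }


observed = {(1, 2): 0.0, (1, 3): 0.3, (1, 4): 0.2,
            (2, 3): 0.3, (2, 4): 0.2, (3, 4): 0.3}
branches = solve_quartet(observed)
predicted = path_lengths(branches)
fits = all(math.isclose(predicted[k], v, abs_tol=1e-12) for k, v in observed.items())
print({k: round(v, 3) for k, v in branches.items()}, fits)
```

For the example matrix this gives a = b = 0, c = 0.2, d = 0.1 and i = 0.1, and every predicted p_ij matches the observed distance.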
If our pairwise distance measurements were error-free estimates
of the evolutionary distance between the sequences, then we
could always infer the tree from the distances.
Sequence divergence vs evolutionary distance
[Figure: p-dist (0.0 to 1.0) plotted against evolutionary distance (0 to ∞); the p-dist “levels off” as the evolutionary distance grows.]
“Multiple hits” problem (also known as saturation)
[Figure: p-dist against the number of substitutions (1 to 20) simulated onto a twenty-base sequence.]
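The saturation effect can be reproduced with a small simulation. This is a minimal sketch, assuming each substitution hits a uniformly chosen site and replaces its base with one of the other three bases at random; the seed and function name are arbitrary choices. Later substitutions can hit sites that have already changed, or change them back, so the p-distance grows more slowly than the number of substitutions.

```python
import random

BASES = "ACGT"


def simulate_multiple_hits(length=20, n_subs=20, seed=1):
    """Apply n_subs random substitutions to a random sequence; return p-dist after each."""
    rng = random.Random(seed)
    original = [rng.choice(BASES) for _ in range(length)]
    current = list(original)
    p_dists = []
    for _ in range(n_subs):
        site = rng.randrange(length)  # a site may be hit more than once
        current[site] = rng.choice([b for b in BASES if b != current[site]])
        differences = sum(a != b for a, b in zip(original, current))
        p_dists.append(differences / length)
    return p_dists


p = simulate_multiple_hits()
for k, value in enumerate(p, start=1):
    print(k, value)
```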
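Distance corrections
Uncorrected distances cij are converted into corrected distances dij:
     1     2     3               1     2     3
2    c12                   2    d12
3    c13   c23             3    d13   d23
4    c14   c24   c34       4    d14   d24   d34
$$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}c\right)$$
Example:
     1     2     3               1      2      3
2    0.0                   2    0.0
3    0.3   0.3             3    0.383  0.383
4    0.2   0.2   0.3       4    0.233  0.233  0.383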
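The correction above is the Jukes–Cantor formula, and the example matrix can be checked with a few lines of Python. This sketch returns infinity when c ≥ 0.75, where the logarithm is undefined (the sequences look saturated).

```python
import math


def jc_distance(c):
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4c/3) from a p-distance c."""
    if c >= 0.75:
        return math.inf  # no finite distance explains this much divergence
    # written as +3/4 ln(1 / (1 - 4c/3)), which is the same quantity
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * c / 3.0))


uncorrected = {(1, 2): 0.0, (1, 3): 0.3, (2, 3): 0.3,
               (1, 4): 0.2, (2, 4): 0.2, (3, 4): 0.3}
corrected = {pair: round(jc_distance(c), 3) for pair, c in uncorrected.items()}
print(corrected)
```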
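Least Squares Branch Lengths
$$\text{Sum of Squares} = \sum_i \sum_j \frac{(p_{ij} - d_{ij})^2}{\sigma_{ij}^k}$$
[Figure: two candidate trees with least-squares branch lengths (taxon labels include Ch, Go and Gi; the values 0.05591 and 0.05790 appear on the trees). SS = 0.00034 for the first tree and SS = 0.0003314 for the second (best) tree.]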
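A sketch of the unweighted case (k = 0, so every weight σ_ij^k equals 1) for four taxa, where least squares has a closed-form solution. It fits each of the three possible splits of the corrected example matrix and evaluates SS exactly as in the formula above. The function names are illustrative. Least-squares branch lengths are not constrained to be non-negative; here both wrong splits get a negative internal branch.

```python
def ols_quartet(d, split):
    """Unweighted least-squares branch lengths for the quartet split (w, x | y, z)."""
    (w, x), (y, z) = split

    def dist(u, v):
        return d[min(u, v), max(u, v)]

    return {
        w: dist(w, x) / 2 + (dist(w, y) + dist(w, z) - dist(x, y) - dist(x, z)) / 4,
        x: dist(w, x) / 2 + (dist(x, y) + dist(x, z) - dist(w, y) - dist(w, z)) / 4,
        y: dist(y, z) / 2 + (dist(w, y) + dist(x, y) - dist(w, z) - dist(x, z)) / 4,
        z: dist(y, z) / 2 + (dist(w, z) + dist(x, z) - dist(w, y) - dist(x, y)) / 4,
        "internal": (dist(w, y) + dist(w, z) + dist(x, y) + dist(x, z)) / 4
                    - (dist(w, x) + dist(y, z)) / 2,
    }


def sum_of_squares(d, split, lengths):
    """SS = sum over pairs of (p_ij - d_ij)^2, with every weight equal to 1."""
    (w, x), (y, z) = split
    side = {w: 0, x: 0, y: 1, z: 1}
    ss = 0.0
    for (i, j), dij in d.items():
        pij = lengths[i] + lengths[j]
        if side[i] != side[j]:  # path crosses the internal branch
            pij += lengths["internal"]
        ss += (pij - dij) ** 2
    return ss


corrected = {(1, 2): 0.0, (1, 3): 0.383, (1, 4): 0.233,
             (2, 3): 0.383, (2, 4): 0.233, (3, 4): 0.383}
SPLITS = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
for split in SPLITS:
    lengths = ols_quartet(corrected, split)
    print(split, round(sum_of_squares(corrected, split, lengths), 6))
```

The split ((1,2),(3,4)) fits these distances exactly, so its SS is essentially zero and it is the least-squares tree.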
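Minimum evolution optimality criterion
[Figure: the same two trees, labeled 0.05591 and 0.05790.]
We still use least-squares branch lengths when we use minimum evolution; the criterion then prefers the tree with the smallest total branch length.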
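A self-contained sketch of the minimum-evolution score for four taxa. It assumes unweighted least squares, reusing the closed-form quartet estimates, and scores each split by the sum of its fitted branch lengths. How negative least-squares lengths should be counted varies between programs; here they are simply summed, which is an assumption.

```python
def ols_quartet(d, split):
    """Unweighted least-squares branch lengths for the quartet split (w, x | y, z)."""
    (w, x), (y, z) = split

    def dist(u, v):
        return d[min(u, v), max(u, v)]

    return {
        w: dist(w, x) / 2 + (dist(w, y) + dist(w, z) - dist(x, y) - dist(x, z)) / 4,
        x: dist(w, x) / 2 + (dist(x, y) + dist(x, z) - dist(w, y) - dist(w, z)) / 4,
        y: dist(y, z) / 2 + (dist(w, y) + dist(x, y) - dist(w, z) - dist(x, z)) / 4,
        z: dist(y, z) / 2 + (dist(w, z) + dist(x, z) - dist(w, y) - dist(x, y)) / 4,
        "internal": (dist(w, y) + dist(w, z) + dist(x, y) + dist(x, z)) / 4
                    - (dist(w, x) + dist(y, z)) / 2,
    }


def tree_length(d, split):
    """Minimum-evolution score: total of the least-squares branch lengths."""
    return sum(ols_quartet(d, split).values())


corrected = {(1, 2): 0.0, (1, 3): 0.383, (1, 4): 0.233,
             (2, 3): 0.383, (2, 4): 0.233, (3, 4): 0.383}
splits = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
for s in splits:
    print(s, round(tree_length(corrected, s), 5))
best = min(splits, key=lambda s: tree_length(corrected, s))
print("ME tree:", best)
```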
Huson and Steel – distances that perfectly mislead
Characters:
Taxon   Characters
A       A A C A A C C
B       A C A C C A A
C       C A G G G A A
D       C G A A A G G

Pairwise distances (number of differing characters):
     A   B   C   D
A    -   6   6   5
B    6   -   5   6
C    6   5   -   6
D    5   6   6   -
Yes.
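The distance table can be checked directly from the characters. This sketch counts differing characters and compares the three pair sums; the smallest sum marks the split the distances favor. It also lists which characters are parsimony-informative, meaning at least two states each present in at least two taxa. The variable and function names are illustrative.

```python
from collections import Counter
from itertools import combinations

SEQS = {"A": "AACAACC", "B": "ACACCAA", "C": "CAGGGAA", "D": "CGAAAGG"}


def hamming(s, t):
    """Number of characters at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s, t))


dist = {(i, j): hamming(SEQS[i], SEQS[j]) for i, j in combinations("ABCD", 2)}

pair_sums = {
    "AB|CD": dist["A", "B"] + dist["C", "D"],
    "AC|BD": dist["A", "C"] + dist["B", "D"],
    "AD|BC": dist["A", "D"] + dist["B", "C"],
}
distance_split = min(pair_sums, key=pair_sums.get)


def is_informative(column):
    """Parsimony-informative: at least two states each seen in at least two taxa."""
    return sum(1 for n in Counter(column).values() if n >= 2) >= 2


columns = ["".join(SEQS[t][k] for t in "ABCD") for k in range(7)]
informative = [k for k, col in enumerate(columns) if is_informative(col)]
print(dist)
print(pair_sums, distance_split)
print("informative characters:", informative)
```

On these data the distances favor AD|BC, while the only parsimony-informative character (the first, AACC) groups A with B.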
Failure to correct distances sufficiently leads to poor performance.
Distance methods – summary
We can:
[Figure: unrooted tree with taxa A and B (terminal branches a and b) joined by internal branch i to taxa C and D (terminal branches c and d).]
If the tree above is correct then:
pAB = a + b
pAC = a + i + c
pAD = a + i + d
pBC = b + i + c
pBD = b + i + d
pCD = c + d

Observed distances:
     A     B     C
B    dAB
C    dAC   dBC
D    dAD   dBD   dCD

Compare the three sums of distances between disjoint pairs of taxa: dAC + dBD, dAB + dCD and dAD + dBC. Under the tree above, dAB + dCD = a + b + c + d, while dAC + dBD = dAD + dBC = a + b + c + d + 2i.
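[Figure: the three possible unrooted trees for A, B, C and D, with terminal branch lengths νa, νb, νc, νd and internal branch length νi.]
For the true tree, with A paired with C and B paired with D, the expected pair sums are:
dAC + dBD = νa + νb + νc + νd
dAB + dCD = νa + νb + νc + νd + 2νi
dAD + dBC = νa + νb + νc + νd + 2νi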
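If $|\epsilon_{ij}| < \nu_i/2$ for every pair (where $\epsilon_{ij}$ is the error in the estimated distance $d_{ij}$), then dAC + dBD will still be the smallest sum, so Buneman’s method will get the tree correct.
Worst case: $\epsilon_{AC} = \epsilon_{BD} = \nu_i/2$ and $\epsilon_{AB} = \epsilon_{CD} = -\nu_i/2$; then dAC + dBD = dAB + dCD = νa + νb + νc + νd + νi, a tie.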
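A sketch of the pair-sum comparison behind Buneman's method for four taxa, checking the error bound numerically. The branch lengths in `NU` are hypothetical, chosen so the true tree pairs A with C and B with D. Errors are added in the worst-case pattern described above.

```python
PAIRS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]


def pair_sums(d):
    """Sums of distances between disjoint pairs, one per possible split."""
    return {
        "AB|CD": d["A", "B"] + d["C", "D"],
        "AC|BD": d["A", "C"] + d["B", "D"],
        "AD|BC": d["A", "D"] + d["B", "C"],
    }


def four_point_split(d):
    """Split whose pair sum is smallest (the split the distances support)."""
    sums = pair_sums(d)
    return min(sums, key=sums.get)


# Hypothetical branch lengths; the true tree pairs A with C and B with D.
NU = {"A": 0.1, "B": 0.2, "C": 0.15, "D": 0.05, "i": 0.1}
CHERRIES = [{"A", "C"}, {"B", "D"}]


def noisy_distances(errors):
    """d_xy = true path length + eps_xy; unspecified errors are zero."""
    d = {}
    for x, y in PAIRS:
        path = NU[x] + NU[y] + (0.0 if {x, y} in CHERRIES else NU["i"])
        d[x, y] = path + errors.get((x, y), 0.0)
    return d


def worst_case(e):
    """Inflate the true cherries' distances and shrink dAB and dCD by e."""
    return {("A", "C"): e, ("B", "D"): e, ("A", "B"): -e, ("C", "D"): -e}


print(four_point_split(noisy_distances(worst_case(0.04))))  # 0.04 < nu_i / 2
print(pair_sums(noisy_distances(worst_case(0.05))))         # tie at nu_i / 2
print(four_point_split(noisy_distances(worst_case(0.06))))  # bound exceeded
```

With errors of 0.04 the correct split AC|BD is recovered. At 0.05 (= νi/2) the sums for AC|BD and AB|CD tie, and at 0.06 the wrong split wins.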
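Balanced minimum evolution
A tree is scored by the weighted sum of the pairwise distances,
$$L = \sum_{i<j} w_{ij}\, d_{ij},$$
where
$$w_{ij} = \frac{1}{2^{n(i,j)}}$$
and n(i, j) is the number of nodes on the path from i to j.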
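A sketch of the balanced minimum-evolution score. It assumes n(i, j) counts the internal nodes strictly between taxa i and j, so sister taxa have n = 1. With that convention the weighted sum equals the total tree length when the distances are additive (Pauplin's formula). The tree is given as a hypothetical edge list with internal nodes named "x" and "y".

```python
from collections import deque


def internal_nodes_between(adj, start, end):
    """n(i, j): number of nodes strictly between start and end on the tree path."""
    parent = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search from start
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    count, node = 0, parent[end]
    while node != start:  # walk back from end towards start
        count += 1
        node = parent[node]
    return count


def bme_score(edges, d):
    """Balanced ME score: sum over pairs of d_ij / 2**n(i, j)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return sum(dij / 2 ** internal_nodes_between(adj, i, j) for (i, j), dij in d.items())


corrected = {(1, 2): 0.0, (1, 3): 0.383, (1, 4): 0.233,
             (2, 3): 0.383, (2, 4): 0.233, (3, 4): 0.383}
tree_12_34 = [(1, "x"), (2, "x"), ("x", "y"), (3, "y"), (4, "y")]
tree_13_24 = [(1, "x"), (3, "x"), ("x", "y"), (2, "y"), (4, "y")]
print(bme_score(tree_12_34, corrected), bme_score(tree_13_24, corrected))
```

For ((1,2),(3,4)) the score equals the sum of the fitted branch lengths (0.4995), and the other topology scores higher.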
Neighbor-joining
A B C D E F
A -
B 0.258 -
C 0.274 0.204 -
D 0.302 0.248 0.278 -
E 0.288 0.224 0.252 0.268 -
F 0.250 0.160 0.226 0.210 0.194 -
Neighbor-joining (example)
Σ_k d_ik       A      B      C      D      E      F
1.372    A   0.0    0.258  0.274  0.302  0.288  0.25
1.094    B   0.258  0.0    0.204  0.248  0.224  0.16
1.234    C   0.274  0.204  0.0    0.278  0.252  0.226
1.306    D   0.302  0.248  0.278  0.0    0.268  0.21
1.226    E   0.288  0.224  0.252  0.268  0.0    0.194
1.040    F   0.25   0.16   0.226  0.21   0.194  0.0
Q(A, B)  -1.434
Q(A, C)  -1.510
Q(A, D)  -1.470
Q(A, E)  -1.446
Q(A, F)  -1.412
Q(B, C)  -1.512
Q(B, D)  -1.408
Q(B, E)  -1.424
Q(B, F)  -1.494
Q(C, D)  -1.428
Q(C, E)  -1.452
Q(C, F)  -1.370
Q(D, E)  -1.460
Q(D, F)  -1.506
Q(E, F)  -1.490
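The Q values in this table follow Q(i, j) = (n − 2) d_ij − Σ_k d_ik − Σ_k d_jk, with n = 6. For example, Q(A, B) = 4 × 0.258 − 1.372 − 1.094 = −1.434. A sketch that rebuilds the table from the example matrix and picks the pair to join; the names are illustrative.

```python
from itertools import combinations

TAXA = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the example matrix, one row per taxon after A.
LOWER = {
    "B": [0.258],
    "C": [0.274, 0.204],
    "D": [0.302, 0.248, 0.278],
    "E": [0.288, 0.224, 0.252, 0.268],
    "F": [0.250, 0.160, 0.226, 0.210, 0.194],
}


def full_matrix(taxa, lower):
    """Symmetric lookup d[i, j] built from the lower triangle."""
    d = {(t, t): 0.0 for t in taxa}
    for row in taxa[1:]:
        for col, value in zip(taxa, lower[row]):
            d[row, col] = d[col, row] = value
    return d


def q_criterion(taxa, d):
    """Q(i, j) = (n - 2) d_ij - sum_k d_ik - sum_k d_jk for every pair."""
    n = len(taxa)
    r = {t: sum(d[t, k] for k in taxa) for t in taxa}
    return {(i, j): (n - 2) * d[i, j] - r[i] - r[j] for i, j in combinations(taxa, 2)}


d = full_matrix(TAXA, LOWER)
q = q_criterion(TAXA, d)
best = min(q, key=q.get)
print(best, round(q[best], 3))  # ('B', 'C') -1.512
```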
Neighbor-joining (example)
A D E F (B,C)
A 0.0 0.302 0.288 0.25 0.164
D 0.302 0.0 0.268 0.21 0.161
E 0.288 0.268 0.0 0.194 0.136
F 0.25 0.21 0.194 0.0 0.091
(B,C) 0.164 0.161 0.136 0.091 0.0
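The reduced matrix comes from replacing B and C by a new node u, with d(u, k) = (d(B, k) + d(C, k) − d(B, C)) / 2. For example, d(u, A) = (0.258 + 0.274 − 0.204) / 2 = 0.164. The sketch also computes the two new branch lengths using the usual neighbor-joining formula v_i = d_ij/2 + (Σ_k d_ik − Σ_k d_jk) / (2(n − 2)), which is not shown on the slide. The function name `join_pair` is illustrative.

```python
TAXA = ["A", "B", "C", "D", "E", "F"]
LOWER = {
    "B": [0.258],
    "C": [0.274, 0.204],
    "D": [0.302, 0.248, 0.278],
    "E": [0.288, 0.224, 0.252, 0.268],
    "F": [0.250, 0.160, 0.226, 0.210, 0.194],
}


def full_matrix(taxa, lower):
    """Symmetric lookup d[i, j] built from the lower triangle."""
    d = {(t, t): 0.0 for t in taxa}
    for row in taxa[1:]:
        for col, value in zip(taxa, lower[row]):
            d[row, col] = d[col, row] = value
    return d


def join_pair(taxa, d, i, j):
    """Join i and j: return the reduced node list, distances and new branch lengths."""
    n = len(taxa)
    r_i = sum(d[i, k] for k in taxa)
    r_j = sum(d[j, k] for k in taxa)
    v_i = d[i, j] / 2 + (r_i - r_j) / (2 * (n - 2))  # branch from i to the new node
    v_j = d[i, j] - v_i                                # branch from j to the new node
    new = f"({i},{j})"
    rest = [k for k in taxa if k not in (i, j)]
    reduced = {(a, b): d[a, b] for a in rest for b in rest}
    reduced[new, new] = 0.0
    for k in rest:
        reduced[new, k] = reduced[k, new] = (d[i, k] + d[j, k] - d[i, j]) / 2
    return rest + [new], reduced, (v_i, v_j)


taxa2, d2, (v_b, v_c) = join_pair(TAXA, full_matrix(TAXA, LOWER), "B", "C")
print(taxa2)
print({k: round(d2["(B,C)", k], 4) for k in taxa2})
print(round(v_b, 4), round(v_c, 4))
```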
Neighbor-joining (example)
Σ_k d_ik         A      D      E      F      (B,C)
1.004    A       0.0    0.302  0.288  0.25   0.164
0.941    D       0.302  0.0    0.268  0.21   0.161
0.886    E       0.288  0.268  0.0    0.194  0.136
0.745    F       0.25   0.21   0.194  0.0    0.091
0.552    (B,C)   0.164  0.161  0.136  0.091  0.0
Neighbor-joining (example)
Q(A, D)        -1.039
Q(A, E)        -1.026
Q(A, F)        -0.999
Q(A, (B, C))   -1.064
Q(D, E)        -1.023
Q(D, F)        -1.056
Q(D, (B, C))   -1.010
Q(E, F)        -1.049
Q(E, (B, C))   -1.030
Q(F, (B, C))   -1.024
Neighbor-joining (example)
D E F (A,(B,C))
D 0.0 0.268 0.21 0.1495
E 0.268 0.0 0.194 0.13
F 0.21 0.194 0.0 0.0885
(A,(B,C)) 0.1495 0.13 0.0885 0.0
Neighbor-joining (example)
Σ_k d_ik               D       E      F       (A,(B,C))
0.6275   D             0.0     0.268  0.21    0.1495
0.592    E             0.268   0.0    0.194   0.13
0.4925   F             0.21    0.194  0.0     0.0885
0.368    (A,(B,C))     0.1495  0.13   0.0885  0.0
Neighbor-joining (example)
Q(D, E)             -0.6835
Q(D, F)             -0.7000
Q(D, (A, (B, C)))   -0.6965
Q(E, F)             -0.6965
Q(E, (A, (B, C)))   -0.7000
Q(F, (A, (B, C)))   -0.6835
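Putting the Q-criterion and the distance reduction together gives the whole procedure. This is a compact sketch, not tied to any particular program, that repeats both steps until two nodes remain and records the join order. For this matrix the joins start with B+C and then A+(B,C), matching the tables above. At the four-node stage complementary pairs always have equal Q (here Q(D, F) = Q(E, (A,(B,C)))). Which of the two the sketch joins therefore depends on rounding and iteration order, but the resulting unrooted tree is the same.

```python
from itertools import combinations

TAXA = ["A", "B", "C", "D", "E", "F"]
LOWER = {
    "B": [0.258],
    "C": [0.274, 0.204],
    "D": [0.302, 0.248, 0.278],
    "E": [0.288, 0.224, 0.252, 0.268],
    "F": [0.250, 0.160, 0.226, 0.210, 0.194],
}


def full_matrix(taxa, lower):
    """Symmetric lookup d[i, j] built from the lower triangle."""
    d = {(t, t): 0.0 for t in taxa}
    for row in taxa[1:]:
        for col, value in zip(taxa, lower[row]):
            d[row, col] = d[col, row] = value
    return d


def neighbor_joining(taxa, d):
    """Return the join order and a nested-parentheses string for the unrooted tree."""
    taxa, d, joins = list(taxa), dict(d), []
    while len(taxa) > 2:
        n = len(taxa)
        r = {t: sum(d[t, k] for k in taxa) for t in taxa}
        q = {(i, j): (n - 2) * d[i, j] - r[i] - r[j]
             for i, j in combinations(taxa, 2)}
        i, j = min(q, key=q.get)  # with exactly equal values, min() keeps the first pair
        new = f"({i},{j})"
        joins.append((i, j))
        rest = [k for k in taxa if k not in (i, j)]
        d[new, new] = 0.0
        for k in rest:  # distances from the new node to every remaining node
            d[new, k] = d[k, new] = (d[i, k] + d[j, k] - d[i, j]) / 2
        taxa = rest + [new]
    return joins, f"({taxa[0]},{taxa[1]})"


joins, tree = neighbor_joining(TAXA, full_matrix(TAXA, LOWER))
print(joins)
print(tree)
```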